For gitsm recipes, it's possible that some URL is used more than once.
e.g.,
A -> B:rev1 (B is a submodule of A)
A -> C (C is a submodule of A)
C -> B:rev2 (B is a submodule of C)
A and C both use B as a submodule, but at different revs.
Now if we have:
B:rev1 -> D
B:rev2 -> E
Then, the mirror will not be fully used.
Say we have repo mirrors for all of A, B, C, D and E. In theory there is
then no need to reach out to the network for downloading, but that is not
the case. After downloading B(rev1) and its submodule D from mirrors, the
fetch process continues to download C, and thus B(rev2) and E. Now it finds
that B needs an update because its submodule E needs an update. This is of
course true because E has not been downloaded yet. The question then is
whether to use the mirror or not. git.py defines try_premirror to return
False when ud.clonedir exists. As B has already been cloned, ud.clonedir
exists and try_premirror returns False, so the mirror is skipped and the
fetcher goes to upstream directly.
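The walk described above can be modelled in a few lines of Python. This is only a simplified sketch, not the fetcher's code: the SUBMODULES table, the placeholder revision name "rev" for A, C, D and E, and the fetch() helper are assumptions made for illustration, and try_premirror here only mimics the "clone dir already exists" rule.

```python
# Hypothetical submodule graph from the example; "rev" is a placeholder
# revision for repos whose revision the text does not name.
SUBMODULES = {
    ("A", "rev"): [("B", "rev1"), ("C", "rev")],
    ("B", "rev1"): [("D", "rev")],
    ("B", "rev2"): [("E", "rev")],
    ("C", "rev"): [("B", "rev2")],
    ("D", "rev"): [],
    ("E", "rev"): [],
}


def try_premirror(repo, clonedirs):
    """Model of the pre-patch rule: skip premirrors once a clone dir exists."""
    return repo not in clonedirs


def fetch(node, clonedirs, log):
    """Fetch a repo at a revision, then recurse into its submodules."""
    repo, rev = node
    source = "premirror" if try_premirror(repo, clonedirs) else "upstream"
    log.append((repo, rev, source))
    # One clone directory per repo URL, shared by all of its revisions.
    clonedirs.add(repo)
    for child in SUBMODULES[node]:
        fetch(child, clonedirs, log)
    return log


log = fetch(("A", "rev"), set(), [])
for entry in log:
    print(entry)
```

In this model, B at rev2 is the only entry that goes to upstream, because B's clone directory was already created while fetching rev1.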
We can see that the mirrors are not fully used. This is usually not a
problem, as the cost is only some network download. But when the following
two settings are present, we get errors.
BB_NO_NETWORK = "0"
BB_ALLOWED_NETWORKS = "*.some.allowed.domain"
In such a case, the gitsm recipe A will fail to fetch. Note that everything
A needs is in the mirrors, and yet the fetch fails. This is unexpected.
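To make the failure concrete, here is a minimal sketch of how a host allow-list of this shape could be evaluated. It is an illustrative model, not the fetcher's implementation: host_allowed() is a hypothetical helper, and the matching rules (space-separated entries, a leading "*." matching the domain and its subdomains, an empty list meaning "no restriction") are assumptions stated for the example.

```python
from urllib.parse import urlparse


def host_allowed(url, allowed_networks):
    """Return True if the URL's host matches an entry in the allow-list."""
    if not allowed_networks.strip():
        return True  # assumption: an empty list means no restriction
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return True  # assumption: URLs without a host (local paths) are fine
    for pattern in allowed_networks.lower().split():
        # "*.example.org" matches example.org and any subdomain of it.
        if pattern.startswith("*.") and ("." + host).endswith(pattern[1:]):
            return True
        if pattern == host:
            return True
    return False


allowed = "*.some.allowed.domain"
print(host_allowed("git://git.some.allowed.domain/repo.git", allowed))
print(host_allowed("gitsm://github.com/protocolbuffers/protobuf.git", allowed))
```

Under these assumptions the first URL is accepted and the second is rejected, which is the situation the gitsm fetch ends up in once it decides to skip the premirror for B.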
Note that different revs of the same repo in a gitsm recipe are not the only
way to reveal this problem. For example, there might be a recipe called B
that uses B:rev3. See the protobuf and grpc recipes for an example.
The following steps reproduce this issue. For reference, the grpc recipe in
meta-oe is at version 1.60.0 at the time of writing.
1. Add the following to local.conf and then run 'bitbake grpc -c fetch':
DL_DIR = "${TOPDIR}/downloads-premirror"
2. Comment out the DL_DIR setting in local.conf and add the following lines:
PREMIRRORS:append = " \
git://.*/.* git://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
gitsm://.*/.* gitsm://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
"
3. Set BB_NO_NETWORK = "1" and then run 'bitbake grpc -c fetch'.
This command succeeds, which shows that the premirror holds everything we need.
4. Add the following lines and then run 'bitbake grpc -c fetch'.
BB_NO_NETWORK = "0"
BB_ALLOWED_NETWORKS = "*.some.domain"
After step 4, the error message is as below:
ERROR: grpc-1.60.0-r0 do_fetch: The URL: 'gitsm://github.com/protocolbuffers/protobuf.git;protocol=https;name=third_party/protobuf;subpath=third_party/protobuf;nobranch=1;lfs=True;bareclone=1;nobranch=1' is not trusted and cannot be used
This patch fixes the problem by handling this corner case: if the URL is not
trusted according to the settings of BB_NO_NETWORK and BB_ALLOWED_NETWORKS, we
try premirrors, because trying to reach upstream is destined to fail.
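The resulting decision can be summarised with a small sketch. It is a model of the rule as described in this message, not the code in git.py: the function below is a stand-alone illustration taking two hypothetical boolean inputs instead of the fetcher's ud and d objects.

```python
def try_premirror(clonedir_exists, url_trusted):
    """Decide whether premirrors should be tried before upstream."""
    # Corner case handled by this patch: an untrusted upstream cannot be
    # reached anyway, so the premirror is the only option left.
    if not url_trusted:
        return True
    # Existing rule: once a clone directory exists, go to upstream.
    if clonedir_exists:
        return False
    return True


for clonedir_exists in (False, True):
    for url_trusted in (False, True):
        result = try_premirror(clonedir_exists, url_trusted)
        print(clonedir_exists, url_trusted, result)
```

Only the combination "clone directory exists and the URL is trusted" skips the premirror; the case from the reproduction steps (clone directory exists, URL not trusted) now tries the premirror.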
(Bitbake rev: e1be272ad105b47d3131b77168d9172386993fcb)
Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Users of the fetcher code have certain expectations of it. This file attempts to document some of the constraints that are present. Some are obvious, some are less so. They are documented in the context of how OE uses the fetcher, but the API calls are generic.
a) network access for sources is only expected to happen in the do_fetch task. This is not enforced or tested but is required so that we can:
    i) audit the sources used (i.e. for license/manifest reasons)
    ii) support offline builds with a suitable cache
    iii) allow work to continue even with downtime upstream
    iv) allow for changes upstream in incompatible ways
    v) allow rebuilding of the software in X years time
b) network access is not expected in do_unpack task.
c) you can take DL_DIR and use it as a mirror for offline builds.
d) access to the network is only made when explicitly configured in recipes (e.g. use of AUTOREV, or use of git tags which change revision).
e) fetcher output is deterministic (i.e. if you fetch configuration XXX now it will match in future exactly in a clean build with a new DL_DIR). One specific pain point is git tags: they can be replaced and changed, so the git fetcher has to resolve them over the network. We use git revisions where possible to avoid this and ensure determinism.
f) network access is expected to work with the standard Linux proxy variables so that access behind firewalls works (the fetcher sets these in the environment but only in the do_fetch tasks).
g) access during parsing has to be minimal, a "git ls-remote" for an AUTOREV git recipe might be ok but you can't expect to check out a git tree.
h) we need to provide revision information during parsing such that a version for the recipe can be constructed.
i) versions are expected to be able to increase in a way that sorts correctly, allowing package feeds to operate (a PR server is required for git revisions to sort).
j) an API to query for possible version upgrades of a url is highly desirable to allow our automated upgrade code to function (it is implied this does always have network access).
k) Where fixes or changes to behaviour in the fetcher are made, we ask that test cases are added (run with "bitbake-selftest bb.tests.fetch"). We do have fairly extensive test coverage of the fetcher as it is the only way to track all of its corner cases. Sadly, it still doesn't give complete coverage.
l) If using tools during parse time, they will have to be in ASSUME_PROVIDED in OE's context as we can't build git-native, then parse a recipe and use git ls-remote.
Not all fetchers support all features, autorev is optional and doesn't make sense for some. Upgrade detection means different things in different contexts too.
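As an illustration of points e) and g), the sketch below parses "git ls-remote" style output and resolves a tag to a commit, and separately checks whether a ref is already a full revision. It is a stand-alone example, not fetcher code: the helper names and the sample data (including the made-up hashes) are hypothetical, and it assumes 40-character SHA-1 object names.

```python
import re


def is_pinned_revision(ref):
    """True if ref looks like a full SHA-1, so no network lookup is needed."""
    return re.fullmatch(r"[0-9a-f]{40}", ref) is not None


def parse_ls_remote(output):
    """Map ref names to object names from '<sha><TAB><ref>' lines."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def resolve_tag(refs, tag):
    """Prefer the peeled '^{}' entry, which names the tagged commit."""
    peeled = refs.get(f"refs/tags/{tag}^{{}}")
    return peeled or refs.get(f"refs/tags/{tag}")


# Made-up sample output; real output comes from "git ls-remote <url>".
SAMPLE = "\n".join([
    "a" * 40 + "\trefs/heads/master",
    "b" * 40 + "\trefs/tags/v1.0",
    "c" * 40 + "\trefs/tags/v1.0^{}",
])

refs = parse_ls_remote(SAMPLE)
print(resolve_tag(refs, "v1.0"))
print(is_pinned_revision("v1.0"), is_pinned_revision("c" * 40))
```

A tag name always needs a lookup of this kind because the tag can be moved upstream, whereas a full revision identifies the same commit every time.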