poky/bitbake/lib/bb/fetch2
Alexander Kanavin 74c93ec961 bitbake: fetch2/wget: set User-Agent to 'bitbake/version' in checkstatus()
This eliminates the last usage of 'fake mozilla' in bitbake, so that it is
now truthful everywhere about how it presents itself: as bitbake, or as wget
(when that is used).

I understand this will make people nervous, so I want to provide
an extended description.

1. How was this tested?

- bitbake-selftest -k FetchCheckStatusTest
(tests a few hardcoded URIs, all passed)

- bitbake -k -c checkuri world
(runs checkstatus() over all recipes in oe-core, and all passed again;
this hopefully goes a long way to reassure everyone that hosts around
the world and various CDNs typically do not have a problem with user-agent
strings they haven't seen before, or with the bitbake user-agent specifically)

2. What about that removed cloudflare comment?

I dug into the git history, and I think the comment is not fully accurate. First, the
'fake mozilla' agent is used only for checkstatus(); actual fetching with wget
does not use it, and that has not been a problem for anyone.

Second, here's how the comment came about. Usage of 'fake mozilla' was introduced here:
https://git.yoctoproject.org/poky/commit/?h=master&id=ab26fdae9e5ae56bb84196698d3fa4fd568fe903

At that point it did not have to be specifically 'mozilla'; the commit message
indicates that any User-Agent would have been ok. Mozilla was simply copied
from the upstream version check for convenience.

Later on, the string was updated to a more recent Mozilla:
https://git.yoctoproject.org/poky/commit/?h=master&id=9f123238261a68e37cec634782e9320633cac5d4

The claim in the added comment then became something else: that the User-Agent *must* be a browser,
without evidence or tests. Yet it demonstrably doesn't have to be: wget is ok.

3. What if someone has a server that is ok with wget agent, but not ok with bitbake agent?

Please see point one. It's not impossible, but I think it's highly unlikely. I do think
we should rather tell servers the truth, and learn where the actual issues are. Then
we can consider options: whether that would be pretending to be wget, or allowing the user-agent
to be configured. We should also add such servers to bitbake-selftest so we know what they
are.

(Bitbake rev: 234f9e810494394527f59fdf22eb86435d046d53)

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2024-10-22 11:16:32 +01:00
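The sketch below shows what an honest status check looks like at the HTTP level: a HEAD request whose User-Agent is `bitbake/<version>`. It is not bitbake's checkstatus() code. The helper name `build_checkstatus_request` and the version string are placeholders, and only the standard-library `urllib.request.Request` is used. The request is built but never sent.

```python
import urllib.request


def build_checkstatus_request(url, bitbake_version):
    """Build (but do not send) a HEAD request with an honest User-Agent.

    Hypothetical helper for illustration only; it is not the code in
    bitbake's fetch2/wget.py.
    """
    return urllib.request.Request(
        url,
        method="HEAD",  # a status check only needs the response headers
        headers={"User-Agent": f"bitbake/{bitbake_version}"},
    )


req = build_checkstatus_request("https://example.com/foo-1.0.tar.gz", "0.0-placeholder")
print(req.get_method())              # HEAD
print(req.get_header("User-agent"))  # bitbake/0.0-placeholder
```

urllib normalises header names with `str.capitalize()`, which is why the lookup uses `User-agent`.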

There are expectations of users of the fetcher code. This file attempts to document some of the constraints that are present. Some are obvious, some are less so. It is documented in the context of how OE uses it but the API calls are generic.

a) network access for sources is only expected to happen in the do_fetch task. This is not enforced or tested but is required so that we can:

i) audit the sources used (i.e. for license/manifest reasons)
ii) support offline builds with a suitable cache
iii) allow work to continue even with downtime upstream
iv) allow for changes upstream in incompatible ways
v) allow rebuilding of the software in X years' time
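The text above notes that this rule is not enforced. One way it could be checked in a test harness is to make socket connections fail outside do_fetch. The sketch below assumes a hypothetical `network_policy()` context manager and `NetworkAccessError`. It only intercepts `socket.socket.connect()`, so subprocesses such as wget or git, and other socket calls, are not covered.

```python
import contextlib
import socket
from unittest import mock


class NetworkAccessError(RuntimeError):
    """Raised when a task outside the allowed set tries to connect."""


# Assumption for this sketch: only do_fetch may use the network.
NETWORK_TASKS = frozenset({"do_fetch"})


@contextlib.contextmanager
def network_policy(task_name):
    """Block socket.connect() while a non-network task runs."""
    if task_name in NETWORK_TASKS:
        yield  # network allowed, nothing to patch
        return

    def blocked_connect(sock, address):
        raise NetworkAccessError(
            f"{task_name} tried to connect to {address!r}; "
            "network access is only expected in do_fetch"
        )

    # Patch the class attribute so every socket.socket instance is affected;
    # mock.patch.object restores the original method when the block exits.
    with mock.patch.object(socket.socket, "connect", blocked_connect):
        yield
```

Wrapping a task such as `do_unpack` in `with network_policy("do_unpack"):` then turns any stray `connect()` into an immediate, descriptive failure instead of a silent download.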

b) network access is not expected in do_unpack task.

c) you can take DL_DIR and use it as a mirror for offline builds.
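The sketch below shows a lookup that never touches the network. It assumes a plain tarball URL whose file sits in DL_DIR under the URL's basename. `find_in_download_dir` and `OfflineFetchError` are hypothetical names. Git and other fetchers keep different layouts in DL_DIR, and this sketch does not model them.

```python
import os
import urllib.parse


class OfflineFetchError(LookupError):
    """Raised when a source is missing from the local download directory."""


def find_in_download_dir(dl_dir, url):
    """Map a plain-tarball URL to its expected file in dl_dir.

    Assumption for this sketch: the file is stored under the URL's
    basename. Layouts used by other fetchers (e.g. git) are not handled.
    """
    name = os.path.basename(urllib.parse.urlparse(url).path)
    path = os.path.join(dl_dir, name)
    if not os.path.isfile(path):
        raise OfflineFetchError(f"{name} is not in {dl_dir} and the network is not used")
    return path
```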

d) access to the network is only made when explicitly configured in recipes (e.g. use of AUTOREV, or use of git tags which change revision).

e) fetcher output is deterministic (i.e. if you fetch configuration XXX now, it will match exactly in a future clean build with a new DL_DIR). One specific pain point is git tags: they can be moved or replaced, so the git fetcher has to resolve them using the network. We use git revisions where possible to avoid this and ensure determinism.
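The sketch below makes the tag problem concrete. It resolves a tag from `git ls-remote --tags` style output, which has one `<sha><TAB><ref>` per line plus an extra `<ref>^{}` line giving the commit for annotated tags. The helper `resolve_tag` and the hashes are made up for illustration. Once the tag is moved, two snapshots of the same remote resolve `v1.0` differently. Recording the revision rather than the tag avoids this.

```python
def resolve_tag(ls_remote_output, tag):
    """Return the commit a tag points to, given ls-remote style output.

    Each line is "<sha><TAB><ref>". Annotated tags also have a peeled
    "<ref>^{}" line naming the commit, which is preferred when present.
    """
    refs = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    ref = f"refs/tags/{tag}"
    peeled = refs.get(ref + "^{}")
    if peeled is not None:
        return peeled
    if ref in refs:
        return refs[ref]
    raise LookupError(f"tag {tag!r} not found")


# Made-up snapshots of the same remote, before and after v1.0 was re-tagged.
before = "1111111111111111111111111111111111111111\trefs/tags/v1.0\n"
after = "2222222222222222222222222222222222222222\trefs/tags/v1.0\n"

print(resolve_tag(before, "v1.0") == resolve_tag(after, "v1.0"))  # False
```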

f) network access is expected to work with the standard Linux proxy variables so that access behind firewalls works (the fetcher sets these in the environment, but only in the do_fetch tasks).
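The sketch below illustrates "set these only in do_fetch". It assumes a hypothetical `task_environment()` helper and the conventional lower- and upper-case proxy variable names; bitbake's own list may differ.

```python
# Assumption: the conventional proxy variable names, in both cases.
PROXY_VARS = (
    "http_proxy", "https_proxy", "ftp_proxy", "all_proxy", "no_proxy",
    "HTTP_PROXY", "HTTPS_PROXY", "FTP_PROXY", "ALL_PROXY", "NO_PROXY",
)


def task_environment(task_name, base_env, proxy_config):
    """Return the environment for a task, exporting proxies only for do_fetch."""
    # Start from the base environment with any proxy settings removed...
    env = {k: v for k, v in base_env.items() if k not in PROXY_VARS}
    if task_name == "do_fetch":
        # ...and add the configured proxy variables back only for fetching.
        env.update({k: v for k, v in proxy_config.items() if k in PROXY_VARS})
    return env
```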

g) access during parsing has to be minimal; a "git ls-remote" for an AUTOREV git recipe might be ok, but you can't expect to check out a git tree.
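A parse-time AUTOREV lookup can stay minimal by asking the remote for one ref instead of cloning. The sketch below wraps `git ls-remote <url> refs/heads/<branch>`. `branch_head` is a hypothetical helper, and the `run` parameter exists only so it can be exercised without network access.

```python
import subprocess


def branch_head(url, branch, run=subprocess.run):
    """Ask the remote for a branch's current commit without cloning anything.

    `run` is injectable so the function can be exercised offline.
    """
    ref = f"refs/heads/{branch}"
    result = run(
        ["git", "ls-remote", url, ref],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.splitlines():
        sha, _, name = line.partition("\t")
        # ls-remote patterns can match longer ref names; keep the exact one.
        if name == ref:
            return sha
    raise LookupError(f"{ref} not found on {url}")
```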

h) we need to provide revision information during parsing such that a version for the recipe can be constructed.

i) versions are expected to increase in a way that sorts correctly, allowing package feeds to operate (see the PR server, which is required for git revisions to sort).
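A PR server is needed because commit hashes carry no ordering. The sketch below uses a hypothetical in-memory `RevisionCounter` as a stand-in for a PR server. It also uses an illustrative `1.0+git<N>+<hash>` format, which is not necessarily OE's exact scheme. Numbering revisions in the order they are first seen gives versions that sort, while the hashes alone do not.

```python
class RevisionCounter:
    """Hypothetical in-memory stand-in for a PR server.

    The first time a revision is seen it gets the next integer; asking
    again for the same revision returns the same number.
    """

    def __init__(self):
        self._numbers = {}

    def number_for(self, revision):
        if revision not in self._numbers:
            self._numbers[revision] = len(self._numbers)
        return self._numbers[revision]


def make_version(base, revision, counter):
    """Build a version like '1.0+git3+<short hash>' (format is illustrative)."""
    return f"{base}+git{counter.number_for(revision)}+{revision[:10]}"


def version_key(version):
    """Sort key: base as text, counter as an integer (so git10 > git9)."""
    base, git_part, _hash = version.split("+")
    return (base, int(git_part[len("git"):]))


counter = RevisionCounter()
# Revisions in the order they landed upstream (placeholder hashes).
history = ["f3a9c0d1e2", "0b7d4e5f6a", "9c1e2d3f4b"]
versions = [make_version("1.0", rev, counter) for rev in history]
print(sorted(history) == history)                     # False
print(sorted(versions, key=version_key) == versions)  # True
```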

j) an API to query for possible version upgrades of a URL is highly desirable to allow our automated upgrade code to function (it is implied that this always has network access).
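The sketch below illustrates one thing such an upgrade query can involve: picking the newest release from an already-downloaded directory listing. It is not bitbake's implementation. `latest_upstream_version` is hypothetical and assumes tarballs named `<package>-<dotted digits>.tar.<ext>`. Taking the listing as a string keeps the network step out of the sketch.

```python
import re


def _version_tuple(version):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_upstream_version(listing, package):
    """Return the highest version of `package` mentioned in a listing, or None.

    Assumes release tarballs named '<package>-<dotted digits>.tar.<ext>';
    real projects use many other naming schemes.
    """
    # The lookbehind stops 'foo' from matching inside e.g. 'libfoo'.
    pattern = re.compile(
        rf"(?<![\w.-]){re.escape(package)}-(\d+(?:\.\d+)*)\.tar\.(?:gz|bz2|xz)"
    )
    versions = {m.group(1) for m in pattern.finditer(listing)}
    if not versions:
        return None
    return max(versions, key=_version_tuple)


listing = """
<a href="foo-1.9.tar.gz">foo-1.9.tar.gz</a>
<a href="foo-1.10.tar.xz">foo-1.10.tar.xz</a>
<a href="foo-1.2.1.tar.gz">foo-1.2.1.tar.gz</a>
<a href="libfoo-3.0.tar.gz">libfoo-3.0.tar.gz</a>
"""
print(latest_upstream_version(listing, "foo"))  # 1.10
```

Comparing numeric tuples is the important design choice here. A plain string `max()` over the same three versions would return `1.9`.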

k) Where fixes or changes to behaviour in the fetcher are made, we ask that test cases are added (run with "bitbake-selftest bb.tests.fetch"). We do have fairly extensive test coverage of the fetcher, as it is the only way to track all of its corner cases; sadly, it still doesn't give complete coverage.

l) If tools are used during parse time, they have to be in ASSUME_PROVIDED in OE's context, as we can't build git-native first and then parse a recipe that uses git ls-remote.
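Nothing can be built for parse-time tools before parsing. A minimal pre-parse check, assuming a hypothetical `missing_host_tools()` helper, is therefore to confirm that the host already provides them:

```python
import shutil


def missing_host_tools(tools):
    """Return the tools that cannot be found on PATH.

    Parse-time helpers (e.g. git for ls-remote) can't be built before
    parsing, so this sketch only checks what the host already provides.
    """
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty result from `missing_host_tools(["git"])` means git ls-remote is available at parse time. A non-empty result could be reported before parsing starts.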

Not all fetchers support all features; AUTOREV is optional and doesn't make sense for some. Upgrade detection also means different things in different contexts.