This fetcher allows BitBake to fetch from a Google Cloud Storage bucket. The fetcher expects a gs:// URI of the following form:

    SSTATE_MIRRORS = "file://.* gs://<bucket name>/PATH"

The fetcher uses the Google Cloud Storage Python Client, and expects it to be installed, configured, and authenticated prior to use.

If accepted, this patch should merge in with the corresponding oe-core patch titled "Add GCP fetcher to list of supported protocols".

Some comments on the patch:

There is also documentation for the fetcher added to the User Manual.

I'm still not completely sure about the recommends_checksum() being set to True. As I've noted in the mailing list, it will throw warnings if the fetcher is used in recipes without specifying a checksum. Please let me know if this is intended behavior or if it should be modified.

Here is how this fetcher conforms to the fetcher expectations described at this link: https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README

a) Yes, network fetching only happens in the fetcher

b) The fetcher has nothing to do with the unpack phase so there is no network access there

c) This change doesn't affect the behavior of DL_DIR. The GCP fetcher only downloads to the DL_DIR in the same way that other fetchers, namely the S3 and Azure fetchers, do.

d) The fetcher is identical to the S3 and Azure fetchers in this context

e) Yes, the fetcher output is deterministic because it is downloading tarballs from a bucket and not modifying them in any way.

f) I set up a local proxy using tinyproxy and set the http_proxy variable to test whether the Python API respected the proxy. It appears that it did as I could see traffic passing through the proxy. I also did some searching online and found posts indicating that the Google Cloud Python APIs supported the classic Linux proxy variables, namely:
   - https://github.com/googleapis/google-api-python-client/issues/1260

g) Access is minimal, only checking if the file exists and downloading it if it does.

h) Not applicable, BitBake already knows which version it wants and the version information is encoded in the filename. The fetcher has no concept of versions.

i) Not applicable

j) Not applicable

k) No tests were added as part of this change. I didn't see any tests for the S3 or Azure changes either, is that OK?

l) I'm not 100% familiar but I don't believe this fetcher is using any tools during parse time. Please correct me if I'm wrong.

(Bitbake rev: 8e7e5719c1de79eb488732818871add3a6fc238b)

Signed-off-by: Emil Ekmečić <eekmecic@snap.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
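As an illustration of the gs://<bucket name>/PATH form described in the commit message, the following is a minimal sketch of how such a URI splits into a bucket name and an object path. It uses only the Python standard library and is not the fetcher's actual code; the helper name split_gs_uri and the example bucket and path are assumptions made up for this example.

```python
from urllib.parse import urlsplit


def split_gs_uri(uri):
    """Split a gs:// URI into (bucket, object_path).

    Hypothetical helper for illustration only; it is not part of BitBake.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # The bucket is the authority part; the object path is what follows it.
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    # Example bucket and path are invented for this sketch.
    bucket, path = split_gs_uri("gs://example-bucket/sstate-cache/ab/sstate-foo.tar.zst")
    print(bucket)
    print(path)
```

A real fetcher would then hand the bucket and path to the Google Cloud Storage Python Client, which this sketch deliberately leaves out.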
There are expectations of users of the fetcher code. This file attempts to document some of the constraints that are present. Some are obvious, some are less so. It is documented in the context of how OE uses it but the API calls are generic.
a) network access for sources is only expected to happen in the do_fetch task. This is not enforced or tested but is required so that we can:
   i) audit the sources used (i.e. for license/manifest reasons)
   ii) support offline builds with a suitable cache
   iii) allow work to continue even with downtime upstream
   iv) allow for changes upstream in incompatible ways
   v) allow rebuilding of the software in X years time
b) network access is not expected in do_unpack task.
c) you can take DL_DIR and use it as a mirror for offline builds.
d) access to the network is only made when explicitly configured in recipes (e.g. use of AUTOREV, or use of git tags which change revision).
e) fetcher output is deterministic (i.e. if you fetch configuration XXX now it will match in future exactly in a clean build with a new DL_DIR). One specific pain point is git tags: they can be replaced and changed, so the git fetcher has to resolve them over the network. We use git revisions where possible to avoid this and ensure determinism.
f) network access is expected to work with the standard Linux proxy variables so that access behind firewalls works (the fetcher sets these in the environment but only in the do_fetch tasks).
g) access during parsing has to be minimal: a "git ls-remote" for an AUTOREV git recipe might be ok but you can't expect to check out a git tree.
h) we need to provide revision information during parsing such that a version for the recipe can be constructed.
i) versions are expected to be able to increase in a way which sorts, allowing package feeds to operate (see the PR server, which is required for git revisions to sort).
j) API to query for possible version upgrades of a url is highly desirable to allow our automated upgrade code to function (it is implied that this always has network access).
k) Where fixes or changes to behaviour in the fetcher are made, we ask that test cases are added (run with "bitbake-selftest bb.tests.fetch"). We do have fairly extensive test coverage of the fetcher as it is the only way to track all of its corner cases, although sadly it still doesn't give complete coverage.
l) If using tools during parse time, they will have to be in ASSUME_PROVIDED in OE's context as we can't build git-native, then parse a recipe and use git ls-remote.
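To make the tag-versus-revision point in e) concrete, here is a minimal sketch of a check that tells a fully pinned git revision apart from a tag or other symbolic name. The helper is_pinned_revision is hypothetical and not part of BitBake, and it assumes SHA-1 object names (40 lowercase hex characters); repositories using other hash formats would need a different pattern.

```python
import re

# Assumption: SHA-1 object names, i.e. exactly 40 lowercase hex characters.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(rev):
    """Return True if rev looks like a full git commit hash.

    Hypothetical helper for illustration only. A tag such as "v1.2.3"
    can be moved upstream, so it must be resolved over the network;
    a full hash names one commit and needs no such lookup.
    """
    return bool(_FULL_SHA1.fullmatch(rev))


if __name__ == "__main__":
    for rev in ("8e7e5719c1de79eb488732818871add3a6fc238b", "v1.2.3"):
        print(rev, is_pinned_revision(rev))
```

A check like this only inspects the form of the string. It cannot confirm that the commit exists upstream.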
Not all fetchers support all features; autorev is optional and doesn't make sense for some. Upgrade detection means different things in different contexts too.
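Expectation f) above relies on tools honouring the standard proxy variables. As a rough sketch of what that looks like from Python, the standard library's urllib.request.getproxies_environment() reads variables such as http_proxy and https_proxy from the environment. The function name proxies_seen_by_urllib and the proxy address are assumptions for this example; other libraries may handle these variables differently, so this is only one way to check.

```python
import os
import urllib.request


def proxies_seen_by_urllib(proxy_url):
    """Export the lowercase proxy variables and report what urllib picks up.

    Hypothetical helper for illustration only; it modifies os.environ
    for the current process.
    """
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    # Returns a mapping of scheme to proxy URL built from *_proxy variables.
    return urllib.request.getproxies_environment()


if __name__ == "__main__":
    # The address is an assumption, e.g. a local test proxy.
    print(proxies_seen_by_urllib("http://127.0.0.1:8888"))
```

Watching traffic on a local test proxy remains the more direct way to confirm that a given library actually routes requests through it.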