bitbake: user-manual-metadata.xml: Added "Checksums (Signatures)" section.

Added this section to the end of the Metadata chapter.

(Bitbake rev: b653c58284cafd0b79991520543ca6239705d36b)

Signed-off-by: Scott Rifenbark <scott.m.rifenbark@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
This commit is contained in:
Scott Rifenbark
2014-01-29 12:06:15 -06:00
committed by Richard Purdie
parent 89058e1ef7
commit 3cdf800334

View File

@@ -1092,4 +1092,187 @@
deletes all the flags for a variable.
</para>
</section>
<section id='task-checksums-and-setscene'>
<title>Task Checksums and Setscene</title>
<para>
This list is a place holder of content that needs explanation here.
Items should be moved to appropriate sections below as completed.
<itemizedlist>
<listitem><para><filename>STAMP</filename></para></listitem>
<listitem><para><filename>STAMPCLEAN</filename></para></listitem>
<listitem><para><filename>BB_STAMP_WHITELIST</filename></para></listitem>
<listitem><para><filename>BB_STAMP_POLICY</filename></para></listitem>
<listitem><para><filename>BB_HASHCHECK_FUNCTION</filename></para></listitem>
<listitem><para><filename>BB_SETSCENE_VERIFY_FUNCTION</filename></para></listitem>
<listitem><para><filename>BB_SETSCENE_DEPVALID</filename></para></listitem>
<listitem><para><filename>BB_TASKHASH</filename></para></listitem>
</itemizedlist>
</para>
<section id='checksums'>
<title>Checksums (Signatures)</title>
<para>
BitBake uses checksums (or signatures) along with the setscene
to determine if a task needs to be run.
This section describes the process.
To help understand how BitBake does this, the section assumes an
OpenEmbedded metadata-based example.
</para>
<para>
The setscene code uses a checksum, which is a unique signature of a task's
inputs, to determine if a task needs to be run again.
Because it is a change in a task's inputs that triggers a rerun, the process
needs to detect all the inputs to a given task.
For shell tasks, this turns out to be fairly easy because
BitBake generates a "run" shell script for each task and
it is possible to create a checksum that gives you a good idea of when
the task's data changes.
</para>
<para>
To complicate the problem, some things should not be included in
the checksum.
First, there is the actual specific build path of a given task -
the working directory.
It does not matter if the work directory changes because it should not
affect the output for target packages.
The simplistic approach for excluding the work directory is to set
it to some fixed value and create the checksum for the "run" script.
</para>
<para>
Another problem results from the "run" scripts containing functions that
might or might not get called.
The incremental build solution contains code that figures out dependencies
between shell functions.
This code is used to prune the "run" scripts down to the minimum set,
thereby alleviating this problem and making the "run" scripts much more
readable as a bonus.
</para>
<para>
So far we have solutions for shell scripts.
What about Python tasks?
The same approach applies even though these tasks are more difficult.
The process needs to figure out what variables a Python function accesses
and what functions it calls.
Again, the incremental build solution contains code that first figures out
the variable and function dependencies, and then creates a checksum for the data
used as the input to the task.
</para>
<para>
Like the working directory case, situations exist where dependencies
should be ignored.
For these cases, you can instruct the build process to ignore a dependency
by using a line like the following:
<literallayout class='monospaced'>
PACKAGE_ARCHS[vardepsexclude] = "MACHINE"
</literallayout>
This example ensures that the <filename>PACKAGE_ARCHS</filename> variable does not
depend on the value of <filename>MACHINE</filename>, even if it does reference it.
</para>
<para>
Equally, there are cases where we need to add dependencies BitBake
is not able to find.
You can accomplish this by using a line like the following:
<literallayout class='monospaced'>
PACKAGE_ARCHS[vardeps] = "MACHINE"
</literallayout>
This example explicitly adds the <filename>MACHINE</filename> variable as a
dependency for <filename>PACKAGE_ARCHS</filename>.
</para>
<para>
Consider a case with in-line Python, for example, where BitBake is not
able to figure out dependencies.
When running in debug mode (i.e. using <filename>-DDD</filename>), BitBake
produces output when it discovers something for which it cannot figure out
dependencies.
</para>
<para>
Thus far, this section has limited discussion to the direct inputs into a task.
Information based on direct inputs is referred to as the "basehash" in the
code.
However, there is still the question of a task's indirect inputs - the
things that were already built and present in the build directory.
The checksum (or signature) for a particular task needs to add the hashes
of all the tasks on which the particular task depends.
Choosing which dependencies to add is a policy decision.
However, the effect is to generate a master checksum that combines the basehash
and the hashes of the task's dependencies.
</para>
<para>
At the code level, there are a variety of ways both the basehash and the
dependent task hashes can be influenced.
Within the BitBake configuration file, we can give BitBake some extra information
to help it construct the basehash.
The following statement effectively results in a list of global variable
dependency excludes - variables never included in any checksum.
This example uses variables from OpenEmbedded to help illustrate
the concept:
<literallayout class='monospaced'>
BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH DL_DIR \
SSTATE_DIR THISDIR FILESEXTRAPATHS FILE_DIRNAME HOME LOGNAME SHELL TERM \
USER FILESPATH STAGING_DIR_HOST STAGING_DIR_TARGET COREBASE PRSERV_HOST \
PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_DISABLE LICENSE_PATH SDKPKGSUFFIX"
</literallayout>
The previous example excludes the work directory, which is part of
<filename>TMPDIR</filename>.
</para>
<para>
The rules for deciding which hashes of dependent tasks to include through
dependency chains are more complex and are generally accomplished with a
Python function.
The code in <filename>meta/lib/oe/sstatesig.py</filename> shows two examples
of this and also illustrates how you can insert your own policy into the system
if so desired.
This file defines the two basic signature generators OpenEmbedded Core
uses: "OEBasic" and "OEBasicHash".
By default, there is a dummy "noop" signature handler enabled in BitBake.
This means that behavior is unchanged from previous versions.
<filename>OE-Core</filename> uses the "OEBasicHash" signature handler by default
through this setting in the <filename>bitbake.conf</filename> file:
<literallayout class='monospaced'>
BB_SIGNATURE_HANDLER ?= "OEBasicHash"
</literallayout>
The "OEBasicHash" <filename>BB_SIGNATURE_HANDLER</filename> is the same as the
"OEBasic" version but adds the task hash to the stamp files.
This results in any metadata change that changes the task hash, automatically
causing the task to be run again.
This removes the need to bump
<link linkend='var-PR'><filename>PR</filename></link>
values, and changes to metadata automatically ripple across the build.
</para>
<para>
It is also worth noting that the end result of these signature generators is to
make some dependency and hash information available to the build.
This information includes:
<itemizedlist>
<listitem><para><filename>BB_BASEHASH_task-&lt;taskname&gt;</filename>:
The base hashes for each task in the recipe.
</para></listitem>
<listitem><para><filename>BB_BASEHASH_&lt;filename:taskname&gt;</filename>:
The base hashes for each dependent task.
</para></listitem>
<listitem><para><filename>BBHASHDEPS_&lt;filename:taskname&gt;</filename>:
The task dependencies for each task.
</para></listitem>
<listitem><para><filename>BB_TASKHASH</filename>:
The hash of the currently running task.
</para></listitem>
</itemizedlist>
</para>
</section>
</section>
</chapter>