Openembedded Core Discussions
From: Jacob Kroon <jacob.kroon@gmail.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
	openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [RFC PATCH v2 1/2] bitbake.conf: Pad rpath and remove build ID in native binaries
Date: Thu, 2 Dec 2021 15:49:26 +0100
Message-ID: <e388fadb-04d7-e653-2dfa-ccbd6e589251@gmail.com>
In-Reply-To: <c95df2f5084fd93bb10e308e5b501c92c0779d44.camel@linuxfoundation.org>

On 12/2/21 12:09, Richard Purdie wrote:
> On Thu, 2021-12-02 at 12:03 +0100, Jacob Kroon wrote:
>> On 12/2/21 11:51, Richard Purdie wrote:
>>> On Thu, 2021-12-02 at 11:19 +0100, Jacob Kroon wrote:
>>>> On 12/2/21 00:11, Richard Purdie wrote:
>>>>> On Tue, 2021-11-30 at 23:37 +0100, Jacob Kroon wrote:
>>>>>> Try to make sure that the RUNPATH dynamic entry size is the same for all
>>>>>> binaries produced with the native compiler. This is necessary in order to
>>>>>> produce identical binaries when using differently sized buildpaths. I've
>>>>>> tried using only patchelf, and keeping the linker flags as they are, but
>>>>>> I am unable to produce identical binaries. Has anyone else managed to do
>>>>>> this with patchelf? If not, maybe we can write a new tool that can handle it?
>>>>>>
>>>>>> The build-id also needs to be removed since it is calculated based on
>>>>>> the data present at link time, which includes the STAGING_LIBDIR_NATIVE
>>>>>> and STAGING_BASE_LIBDIR_NATIVE rpaths. Both differ between build paths, and they need to be temporarily
>>>>>> preserved since some recipes execute the binaries during do_install()
>>>>>> (for example python3-native). Later on they are removed in chrpath.bbclass.
>>>>>>
>>>>>> This hack is the first step for producing identical native binaries when using
>>>>>> different build paths. 'zstd-native' is a working example.
>>>>>>
>>>>>> Signed-off-by: Jacob Kroon <jacob.kroon@gmail.com>
>>>>>> ---
>>>>>>  meta/classes/chrpath.bbclass | 3 +++
>>>>>>  meta/conf/bitbake.conf       | 5 ++++-
>>>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> I'm a little torn on this. Our other option would be to hardcode a specific
>>>>> dummy path and then edit it later to the correct value. That may be neater than
>>>>> adding the padding. It will change the end binaries but hopefully only after
>>>>> they're installed so should give the same net end result more neatly?
>>>>>
>>>>
>>>> Hmm not sure I follow. This patch adds a new dummy rpath entry,
>>>> "/rpath-padding-xxx...", then we remove it in chrpath. I don't know what
>>>> other value we would like to put there. If I understand you correctly,
>>>> we could perhaps pad one of the ones we already pass
>>>>
>>>> -Wl,-rpath,${STAGING_LIBDIR_NATIVE}
>>>> -Wl,-rpath,${STAGING_BASE_LIBDIR_NATIVE}
>>>>
>>>> with spaces, like:
>>>>
>>>> -Wl,-rpath,${STAGING_LIBDIR_NATIVE}
>>>> -Wl,-rpath,"${STAGING_BASE_LIBDIR_NATIVE}${RPATH_PADDING}"
>>>
>>>
>>> I'm wondering if:
>>>
>>> -Wl,-rpath,/not/exist/our-native-libdir-marker
>>> -Wl,-rpath,/not/exist/our-native-base-libdir-marker
>>>
>>> would work.
>>>
>>
>> Right, I'll give it a try.
>>

Unfortunately this breaks building python3-native. Although it compiles,
during the build the Python build scripts try to import the created
modules, and if this fails (which it does) they rename the modules:

> *** WARNING: renaming "_curses" since importing it failed: libncurses.so.5: cannot open shared object file: No such file or directory
> *** WARNING: renaming "_curses_panel" since importing it failed: libpanel.so.5: cannot open shared object file: No such file or directory
> *** WARNING: renaming "_ssl" since importing it failed: libssl.so.3: cannot open shared object file: No such file or directory
> *** WARNING: renaming "_hashlib" since importing it failed: libssl.so.3: cannot open shared object file: No such file or directory
> *** WARNING: renaming "nis" since importing it failed: libnsl.so.3: cannot open shared object file: No such file or directory
> *** WARNING: renaming "_ctypes" since importing it failed: libffi.so.8: cannot open shared object file: No such file or directory

I suppose it tries to import them using the newly built python, which has
those phony rpaths, so it can't find libncurses.so.5, libpanel.so.5, etc.
in the per-recipe sysroot and fails.
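To illustrate the failure mode, here is a simplified sketch of how the
dynamic loader walks the RUNPATH entries when looking for a library. It is
only a model: the real ld.so also consults LD_LIBRARY_PATH, ld.so.cache and
the default system directories, and the `resolve_library` helper and the
temporary sysroot layout are hypothetical.

```python
import os
import tempfile
from typing import Optional


def resolve_library(soname: str, runpath: str) -> Optional[str]:
    """Return the first RUNPATH entry that contains `soname`, or None.

    Simplified model of the loader search: only the colon-separated
    RUNPATH directories are consulted, in order.
    """
    for directory in runpath.split(":"):
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sysroot:
        # Hypothetical recipe-sysroot-native libdir holding the library
        libdir = os.path.join(sysroot, "usr", "lib")
        os.makedirs(libdir)
        open(os.path.join(libdir, "libncurses.so.5"), "w").close()

        markers = ("/not/exist/our-native-libdir-marker:"
                   "/not/exist/our-native-base-libdir-marker")
        # Real staging path: the library is found inside the sysroot
        print(resolve_library("libncurses.so.5", libdir))
        # Marker paths only: nothing is found, so the module import fails
        print(resolve_library("libncurses.so.5", markers))
```

With only the marker entries in RUNPATH nothing resolves, which matches the
"cannot open shared object file" errors in the log.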

The new modules will be called:

> sysroots-components/x86_64/python3-native/usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu_failed.so
> sysroots-components/x86_64/python3-native/usr/lib/python3.10/lib-dynload/nis.cpython-310-x86_64-linux-gnu_failed.so
> sysroots-components/x86_64/python3-native/usr/lib/python3.10/lib-dynload/_hashlib.cpython-310-x86_64-linux-gnu_failed.so
> sysroots-components/x86_64/python3-native/usr/lib/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu_failed.so
> sysroots-components/x86_64/python3-native/usr/lib/python3.10/lib-dynload/_curses_panel.cpython-310-x86_64-linux-gnu_failed.so
> sysroots-components/x86_64/python3-native/usr/lib/python3.10/lib-dynload/_curses.cpython-310-x86_64-linux-gnu_failed.so

which means any subsequent recipe that uses python3-native will fail to
import any of those modules.
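A check along these lines could catch the problem before a broken
python3-native ends up in sstate. This is a hypothetical sketch, not an
existing OE-core QA test: `find_failed_extensions` is a name made up here,
and it relies only on the `_failed.so` suffix visible in the file names above.

```python
import os
import tempfile
from typing import List


def find_failed_extensions(dynload_dir: str) -> List[str]:
    """List extension modules that the Python build renamed after a failed import."""
    failed = []
    for name in sorted(os.listdir(dynload_dir)):
        # Judging by the file names above, a module that could not be
        # imported is renamed to <name>_failed.so
        if name.endswith("_failed.so"):
            failed.append(name)
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("_ssl.cpython-310-x86_64-linux-gnu_failed.so",
                     "_json.cpython-310-x86_64-linux-gnu.so"):
            open(os.path.join(d, name), "w").close()
        print(find_failed_extensions(d))
```

A non-empty result could then be turned into a build error instead of
surfacing later as import failures in other recipes.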

I suspect it might not just be python that wants to run the produced
binaries during the build itself.
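For comparison, the padding approach in the original patch keeps the real
staging library directories in RUNPATH, so binaries executed during the build
can still resolve their libraries, and adds a dummy entry (later stripped in
chrpath.bbclass) that absorbs the length difference between build paths. A
minimal sketch of that length arithmetic, assuming a fixed total width agreed
on for all builds; the `padded_runpath` helper and the width value are
hypothetical, and the `/rpath-padding-` prefix mirrors the dummy entry
mentioned earlier in the thread.

```python
from typing import List

# Prefix of the dummy entry; the filler after it absorbs the length difference
PADDING_PREFIX = "/rpath-padding-"


def padded_runpath(entries: List[str], target_len: int) -> str:
    """Join RUNPATH entries and append a dummy entry so the result is exactly target_len chars."""
    base = ":".join(entries)
    # Room left after the real entries, the ":" separator and the prefix
    filler = target_len - len(base) - 1 - len(PADDING_PREFIX)
    if filler < 0:
        raise ValueError("build path too long for the chosen RUNPATH width")
    return base + ":" + PADDING_PREFIX + "x" * filler


if __name__ == "__main__":
    for topdir in ("/home/a/build", "/srv/jenkins/workspace/long-build-dir"):
        rp = padded_runpath([topdir + "/recipe-sysroot-native/usr/lib",
                             topdir + "/recipe-sysroot-native/lib"], 256)
        print(len(rp))  # same length for both build directories
```

The real entries stay first and unchanged, so the loader still finds the
staged libraries; only the dummy entry's length varies with the build path.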

>>>> If that works that would be less intrusive I think.
>>>>
>>>>> If we separate out the build-id patch we could hopefully get that piece merged
>>>>> as that shouldn't be controversial? 
>>>>>
>>>>
>>>> Yes, I can split it out into a separate patch.
>>>>
>>>> But now that I've looked at this for a while, I've asked myself what
>>>> good all this does. The only optimization I can think of is that if
>>>> we rebuild a native recipe, and the sysroot component turns out the
>>>> same, then we don't need to create a new sstate cache entry. So we save
>>>> disk space, but disk space is cheap. We still need to build it. What I
>>>> would like is to have a common sstate dir for multiple build
>>>> directories. So if I build libtool-native in one build path, then at my
>>>> other build path it would just pick it up from sstate cache when I build
>>>> there. In the end, is that something that would be possible?
>>>
>>> We originally started here with gcc-cross, so let's consider that and multiple
>>> build directories where a patch changes gcc-cross in a way that is irrelevant to
>>> the output.
>>>
>>> The "win" is that regardless of whether I build in location A or B, I get the
>>> same gcc-cross binary. Hash-equiv will then not rebuild the target binaries.
>>> Yes, I pay the price of a gcc-cross rebuild but hashequiv saves the targets
>>> rebuilding.
>>>
>>> Currently it would only happen if you always build gcc-cross in a specific build
>>> path.
>>>
>>
>> I know the build path will change if I upgrade to a new version of gcc,
>> but then the output is most definitely gonna change as well.
>>
>>> Like everything, it is a question of looking at the changes and deciding whether
>>> they are worth any maintenance burden/code complication or additional overhead
>>> they generate. I don't know the answer here yet but I do appreciate the research
>>> in helping get us data to make decisions on!
>>>
>>
>> I was thinking if it was possible to add a "build-path-does-not-matter"
>> .bbclass that would make the signatures independent of build path and
>> then scan the output to make sure it didn't contain any references to
>> the build path. Then those recipes who didn't depend on build path could
>> inherit from that class, and then maybe their sstate could be reused
>> from multiple build directories? Not sure how reliable it would be, though.
> 
> Another crazy thought is our sstate really is already path independent,
> regardless of the binary content. You could therefore make the hash function
> replace the path with a fixed string. The downside is that doesn't work well on
> binaries due to offsets, alignment and so on.
> 
> As I read the above I was reminded that insane.bbclass does sanity check the
> output for build paths and does have a configurable control mechanism. It
> doesn't do that for the populate_sysroot output though since it is for
> do_package.
> 
> Lots to think about here but you're right that adding some kind of scanner to
> mark up recipes over time would help us preserve this.
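The scanner idea, and the suggestion of hashing with the build path replaced
by a fixed string, could be prototyped roughly as below. This is a
hypothetical sketch, not existing OE-core or insane.bbclass code:
`find_build_path_refs`, `path_independent_digest` and the `@BUILD_PATH@`
placeholder are names made up for illustration.

```python
import hashlib
import os
import tempfile
from typing import List

# Hypothetical fixed marker substituted for the build path before hashing
PLACEHOLDER = b"@BUILD_PATH@"


def find_build_path_refs(root: str, build_path: str) -> List[str]:
    """Return files under root (relative, sorted) whose contents mention build_path."""
    needle = build_path.encode()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks, including dangling ones
            with open(path, "rb") as f:
                # Reads whole files; fine for a sketch, not for huge outputs
                if needle in f.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


def path_independent_digest(data: bytes, build_path: str) -> str:
    """SHA-256 of data with every occurrence of build_path replaced by PLACEHOLDER."""
    return hashlib.sha256(data.replace(build_path.encode(), PLACEHOLDER)).hexdigest()


if __name__ == "__main__":
    a = b"prefix=/build/a/tmp/sysroot\n"
    b = b"prefix=/build/bb/tmp/sysroot\n"
    # Equal digests: the build path is the only difference
    print(path_independent_digest(a, "/build/a") == path_independent_digest(b, "/build/bb"))
```

As noted above, the substitution only helps when the path bytes are the sole
difference. In binaries, a different path length can also change string table
sizes, offsets and alignment padding, so two otherwise identical ELF files
built in different directories will generally still hash differently.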

Jacob



Thread overview: 9+ messages
2021-11-30 22:37 [RFC PATCH v2 0/2] Improve native/cross reproducibility Jacob Kroon
2021-11-30 22:37 ` [RFC PATCH v2 1/2] bitbake.conf: Pad rpath and remove build ID in native binaries Jacob Kroon
2021-12-01 23:11   ` [OE-core] " Richard Purdie
2021-12-02 10:19     ` Jacob Kroon
2021-12-02 10:51       ` Richard Purdie
2021-12-02 11:03         ` Jacob Kroon
2021-12-02 11:09           ` Richard Purdie
2021-12-02 14:49             ` Jacob Kroon [this message]
2021-11-30 22:37 ` [RFC PATCH v2 2/2] Improve native reproducibility in recipes Jacob Kroon
