Openembedded Core Discussions
From: Anders Heimer <anders.heimer@est.tech>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
	Paul Barker <paul@pbarker.dev>,
	openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [PATCH 1/2] package: replace copydebugsources shell pipelines with Popen
Date: Tue, 16 Jun 2026 16:13:33 +0200
Message-ID: <a92febe8-330a-4113-9b51-aae77d317cbe@est.tech>
In-Reply-To: <0bd08ff5a06dac60d667da9f17df183a0c971e2a.camel@linuxfoundation.org>

Hi Richard,

On 6/16/26 15:44, Richard Purdie wrote:
> On Tue, 2026-06-16 at 15:35 +0200, Anders Heimer via lists.openembedded.org wrote:
>> On 6/16/26 14:12, Paul Barker wrote:
>>> On Tue, 2026-06-16 at 10:25 +0200, Anders Heimer wrote:
>>>> -        for pmap in prefixmap:
>>>> +        env = os.environ.copy()
>>>> +        env["LC_ALL"] = "C"
>>>> +
>>>> +        for pmap, prefix in prefixmap.items():
>>>> +            dstroot = dvar + prefix
>>>>                # Ignore files from the recipe sysroots (target and native)
>>>> -            cmd =  "LC_ALL=C ; sort -z -u '%s' | egrep -v -z '((<internal>|<built-in>)$|/.*recipe-sysroot.*/)' | " % sourcefile
>>>> +            sort_p = subprocess.Popen(["sort", "-z", "-u", "--", sourcefile], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, env=env)
>>>> +            egrep_p = subprocess.Popen(["egrep", "-v", "-z", "-e", r"((<internal>|<built-in>)$|/.*recipe-sysroot.*/)"], stdin=sort_p.stdout, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, env=env)
>>>> +            sort_p.stdout.close()
>>>> +
>>>>                # We need to ignore files that are not actually ours
>>>>                # we do this by only paying attention to items from this package
>>>> -            cmd += "fgrep -zw '%s' | " % prefixmap[pmap]
>>>> +            fgrep_p = subprocess.Popen(["fgrep", "-zw", "-e", prefix], stdin=egrep_p.stdout, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, env=env)
>>>> +            egrep_p.stdout.close()
>>>> +
>>>>                # Remove prefix in the source paths
>>>> -            cmd += "sed 's#%s/##g' | " % (prefixmap[pmap])
>>>> -            cmd += "(cd '%s' ; cpio -pd0mlLu --no-preserve-owner '%s%s' 2>/dev/null)" % (pmap, dvar, prefixmap[pmap])
>>>> +            sed_p = subprocess.Popen(["sed", "s#%s/##g" % prefix], stdin=fgrep_p.stdout, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, env=env)
>>>> +            fgrep_p.stdout.close()
>>>> +
>>>> +            cpio_p = subprocess.Popen(["cpio", "-pd0mlLu", "--no-preserve-owner", dstroot], stdin=sed_p.stdout, cwd=pmap, stderr=subprocess.DEVNULL, env=env)
>>>> +            sed_p.stdout.close()
>>>> +
>>>> +            for proc in (cpio_p, sed_p, fgrep_p, egrep_p, sort_p):
>>>> +                proc.wait()
>>> Hi Anders, thanks for the patches!
>>>
>>> If we're reworking this code, I think we should replace the complex
>>> sed/grep/sort pipeline with Python code. We can read into a Python list
>>> and sort/filter using the Python standard library, then pass the results
>>> to cpio.
>> Thank you. I am very happy to implement this approach instead. I
>> strongly agree with all your comments.
> There are some other things to consider here. This is a fairly
> sensitive area of code from a performance perspective. You could code
> much of this in Python, using shutil for example; however, shutil has
> traditionally been up to an order of magnitude slower. As such, this
> code was optimized to be fast, hence the use of cpio.
>
> In many cases I worry less about performance, but this is one area where
> it really matters and makes a big difference to overall build speed.
> Python can be fast if carefully written, but if this used shutil, for
> example, it likely wouldn't be.
>
> Cheers,
>
> Richard
Good point. I will avoid replacing the copy step with shutil and keep
cpio for the copy itself. I can look at whether only the sort/filter
part can move to Python while still feeding cpio.
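
As a rough illustration of that split, here is a minimal sketch, not the
proposed patch. It assumes the source list is read as NUL-separated bytes;
the names select_sources and copy_with_cpio are hypothetical. The word-boundary
regex only approximates "fgrep -w", the prefix is stripped literally (sed
treats it as a regex), and de-duplication happens after stripping rather than
before, so behaviour may differ in corner cases.

```python
import os
import re
import subprocess

# Same pattern the patch passes to egrep: skip compiler pseudo-files and
# anything under a recipe sysroot.
EXCLUDE_RE = re.compile(rb"((<internal>|<built-in>)\Z|/.*recipe-sysroot.*/)")


def select_sources(data, prefix):
    """Return sorted, de-duplicated relative paths for one prefix.

    data   -- raw bytes of the NUL-separated source list
    prefix -- bytes, the debug source prefix to keep and then strip
    """
    # Approximates "fgrep -w": the prefix must not touch a word character.
    word_re = re.compile(
        rb"(?<![A-Za-z0-9_])" + re.escape(prefix) + rb"(?![A-Za-z0-9_])"
    )
    selected = set()
    for entry in data.split(b"\0"):
        if not entry or EXCLUDE_RE.search(entry):
            continue
        if not word_re.search(entry):
            continue
        # Literal replacement; the sed expression treats the prefix as a regex.
        selected.add(entry.replace(prefix + b"/", b""))
    # Sorting bytes compares byte values, like sort under LC_ALL=C.
    return sorted(selected)


def copy_with_cpio(paths, srcdir, dstroot):
    """Feed the selected paths to cpio, using the same flags as the patch."""
    env = dict(os.environ, LC_ALL="C")
    payload = b"".join(path + b"\0" for path in paths)
    result = subprocess.run(
        ["cpio", "-pd0mlLu", "--no-preserve-owner", dstroot],
        input=payload, cwd=srcdir, stderr=subprocess.DEVNULL, env=env,
    )
    return result.returncode
```

Only the selection runs in Python here; the copy itself is still a single
cpio process per prefix, which is the part Richard flagged as
performance-sensitive.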

I had not considered the performance sensitivity here, so I will run 
some benchmarking before proposing any larger change.
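
For the timing itself, something like the following could serve as a
starting point. It is only a sketch: make_source_list and best_of are
hypothetical helpers, the paths are synthetic, and a real comparison would
need to run against the actual package step on a representative build.

```python
import time


def make_source_list(count, prefix=b"/usr/src/debug/example/1.0"):
    """Build a synthetic NUL-separated source list (hypothetical paths)."""
    return b"".join(prefix + b"/file%d.c\0" % i for i in range(count))


def best_of(func, repeat=5):
    """Return the fastest wall-clock time in seconds over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


data = make_source_list(10000)
# Time only the in-Python sort/unique step on the synthetic list.
elapsed = best_of(lambda: sorted(set(data.split(b"\0"))))
print("python sort/unique: %.6f s" % elapsed)
```

Taking the minimum over several runs reduces noise from other activity on
the build host; the same helper can wrap a subprocess-based variant for a
side-by-side number.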

Best regards,
Anders



Thread overview: 8+ messages
2026-06-16  8:25 [PATCH 0/2] package: replace copydebugsources shell pipelines with Popen Anders Heimer
2026-06-16  8:25 ` [PATCH 1/2] " Anders Heimer
2026-06-16 12:12   ` [OE-core] " Paul Barker
2026-06-16 13:35     ` Anders Heimer
2026-06-16 13:42       ` Paul Barker
2026-06-16 13:44       ` Richard Purdie
2026-06-16 14:13         ` Anders Heimer [this message]
2026-06-16  8:25 ` [PATCH 2/2] oeqa/oelib: add copydebugsources tests Anders Heimer
