From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: openembedded-core <openembedded-core@lists.openembedded.org>,
Bruce Ashfield <bruce.ashfield@gmail.com>
Subject: Re: Dilemma on changes - merge or not to merge (e.g. 6.4)
Date: Tue, 15 Aug 2023 09:08:53 -0400
Message-ID: <ZNt45UVb0Ps/kKLM@windriver.com>
In-Reply-To: <1b3bb1c747644f83156f6269de2c502660c18466.camel@linuxfoundation.org>
[Dilemma on changes - merge or not to merge (e.g. 6.4)] On 14/08/2023 (Mon 10:54) Richard Purdie wrote:
> I'm becoming a little weary/wary of some of the changes that are coming
> in. The challenge is that once they merge, issues become the problem of
> a very small number of people.
>
> My current dilemma is the 6.4 kernel. People would like it, we'd really
> ideally use it for the next release but there are issues.
>
> I've worked through a few, at least pinning down where the issues were
> then resolving them with the help of others (thanks Bruce, Jon, Ross).
>
> Remaining are:
> * an error upon boot on preempt-rt on qemux86-64
> (e.g. https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/7616/steps/36/logs/stdio)
> We'll probably just have to ignore it in parselogs as it has been
> around for a while and nobody seems interested in fixing it upstream.
Just back from vacation and I see an internal report of 10-ish at boot

  NOHZ tick-stop error: local softirq work is pending, handler #80!!!

..on the 6.1.43-rt10-yocto-preempt-rt kernel, on real hardware. So it
seems we can't blame that one entirely on v6.4 kernel (or qemu).

We used to get (late 3.x and 4.x era) pretty common "NOHZ: local softirq
pending" messages even on common/popular distro kernels. But I haven't
seen those for a long time and they didn't scream "error" or have the
alarmist three exclamation marks either.

I'll see if I can dig into that further. This instance is new to me, so
any additional context or information I might not turn up myself would
be useful.
> * some random hangs:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/349/steps/12/logs/stdio
> https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/354/steps/12/logs/stdio
>
> The latter are rare and intermittent, mainly taking out CI test builds.
> Most people aren't affected by them, find them hard to reproduce let
> alone fix and will ignore them. That will leave me/Bruce/PaulG holding
> the pieces.
Ugh. The RCU one is ugly and the Silent Boot Death one is no better.
Nobody likes SBD cases. They suck.
>
> I know Bruce spends a ton of time debugging weird things just to get
> the kernel to the point we can even consider merging and nobody ever
> really sees or appreciates that work :(.
Well, not "nobody". There are at least two people who have a good idea
of what Bruce does. :-P
Paul.
--
>
> Systemd was a similar challenge recently, multiple patches causing
> multiple issues with a significant impact on CI. In that case the
> issues weren't intermittent so resolution wasn't so bad.
>
> Rust and reproducibility was given a pass so the rest of the changes
> could merge for it. That just meant there was less pressure and the
> reproducibility issue is still there with people saying it's too hard.
> That issue is now spreading down the chain to other recipes.
>
> The toolchain test reports have thousands of failures nobody is really
> looking at. Similarly the now consistent ltp controllers failures
> (previously the reports weren't even consistent!).
>
> I'm worried the access control patches changing the tar format are
> going to destabilise and once merged, people will move on to other
> things leaving any remaining intermittent issues to me. Already we're
> seeing things like sstate being blamed as it is easiest to do that. I
> end up having to "prove" it isn't that.
>
> There are intermittent ptests on the autobuilder too. I took mdadm
> ptest patches on the basis there was help to fix them. We are still seeing
> a lot of failures in CI from there. The glib-networking intermittent
> failures continue, I know Trevor has tried to dig into those but he is
> alone in doing it in code which isn't easy to navigate (and I don't
> know how to help there).
>
> As an idea of impact, every time one of these things fails in CI,
> someone has to triage that failure. The bug triage team has to triage the
> bugs too.
>
> I don't know how we fix this but we really could do with more people
> able to dive in and help with these intermittent issues. I'm really
> really apprehensive about merging some patches as I can just tell
> they're going to cause pain :(.
>
> Cheers,
>
> Richard
>