From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: openembedded-core <openembedded-core@lists.openembedded.org>,
Bruce Ashfield <bruce.ashfield@gmail.com>
Subject: Re: Dilemma on changes - merge or not to merge (e.g. 6.4)
Date: Tue, 15 Aug 2023 09:08:53 -0400
Message-ID: <ZNt45UVb0Ps/kKLM@windriver.com>
In-Reply-To: <1b3bb1c747644f83156f6269de2c502660c18466.camel@linuxfoundation.org>
[Dilemma on changes - merge or not to merge (e.g. 6.4)] On 14/08/2023 (Mon 10:54) Richard Purdie wrote:
> I'm becoming a little weary/wary of some of the changes that are coming
> in. The challenge is that once they merge, issues become the problem of
> a very small number of people.
>
> My current dilemma is the 6.4 kernel. People would like it, we'd really
> ideally use it for the next release but there are issues.
>
> I've worked through a few, at least pinning down where the issues were
> then resolving them with the help of others (thanks Bruce, Jon, Ross).
>
> Remaining are:
> * an error upon boot on preempt-rt on qemux86-64
> (e.g. https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/7616/steps/36/logs/stdio)
> We'll probably just have to ignore it in parselogs as it has been
> around for a while and nobody seems interested in fixing it upstream.

Just back from vacation and I see an internal report of 10-ish at boot

  NOHZ tick-stop error: local softirq work is pending, handler #80!!!

..on the 6.1.43-rt10-yocto-preempt-rt kernel, on real hardware. So it
seems we can't blame that one entirely on v6.4 kernel (or qemu).

We used to get (late 3.x and 4.x era) pretty common "NOHZ: local softirq
pending" messages even on common/popular distro kernels. But I haven't
seen those for a long time and they didn't scream "error" or have the
alarmist three exclamation marks either.

I'll see if I can dig into that further. This instance is new to me, so
any additional context or information I might not turn up myself would
be useful.

> * some random hangs:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/349/steps/12/logs/stdio
> https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/354/steps/12/logs/stdio
>
> The latter are rare and intermittent, mainly taking out CI test builds.
> Most people aren't affected by them, find them hard to reproduce let
> alone fix and will ignore them. That will leave me/Bruce/PaulG holding
> the pieces.
Ugh. The RCU one is ugly and the Silent Boot Death one is no better.
Nobody likes SBD cases. They suck.
>
> I know Bruce spends a ton of time debugging weird things just to get
> the kernel to the point we can even consider merging and nobody ever
> really sees or appreciates that work :(.
Well, not "nobody". There are at least two people who have a good idea
of what Bruce does. :-P
Paul.
--
>
> Systemd was a similar challenge recently, multiple patches causing
> multiple issues with a significant impact on CI. In that case the
> issues weren't intermittent so resolution wasn't so bad.
>
> Rust and reproducibility was given a pass so the rest of the changes
> could merge for it. That just meant there was less pressure and the
> reproducibility issue is still there with people saying it's too hard.
> That issue is now spreading down the chain to other recipes.
>
> The toolchain test reports have thousands of failures nobody is really
> looking at. Similarly the now consistent ltp controllers failures
> (previously the reports weren't even consistent!).
>
> I'm worried the access control patches changing the tar format are
> going to destabilise and once merged, people will move on to other
> things leaving any remaining intermittent issues to me. Already we're
> seeing things like sstate being blamed as it is easiest to do that. I
> end up having to "prove" it isn't that.
>
> There are intermittent ptests on the autobuilder too. I took mdadm
> ptest patches on the basis there was help to fix them. We are still seeing
> a lot of failures in CI from there. The glib-networking intermittent
> failures continue, I know Trevor has tried to dig into those but he is
> alone in doing it in code which isn't easy to navigate (and I don't
> know how to help there).
>
> As an idea of impact, every time one of these things fails in CI,
> someone has to triage that failure. The bug triage team has to triage the
> bugs too.
>
> I don't know how we fix this but we really could do with more people
> able to dive in and help with these intermittent issues. I'm really
> really apprehensive about merging some patches as I can just tell
> they're going to cause pain :(.
>
> Cheers,
>
> Richard
>