Openembedded Core Discussions
From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: openembedded-core <openembedded-core@lists.openembedded.org>,
	Bruce Ashfield <bruce.ashfield@gmail.com>
Subject: Re: Dilemma on changes - merge or not to merge (e.g. 6.4)
Date: Tue, 15 Aug 2023 09:08:53 -0400
Message-ID: <ZNt45UVb0Ps/kKLM@windriver.com>
In-Reply-To: <1b3bb1c747644f83156f6269de2c502660c18466.camel@linuxfoundation.org>

[Dilemma on changes - merge or not to merge (e.g. 6.4)] On 14/08/2023 (Mon 10:54) Richard Purdie wrote:

> I'm becoming a little weary/wary of some of the changes that are coming
> in. The challenge is that once they merge, issues become the problem of
> a very small number of people.
> 
> My current dilemma is the 6.4 kernel. People would like it, we'd really
> ideally use it for the next release but there are issues.
> 
> I've worked through a few, at least pinning down where the issues were
> then resolving them with the help of others (thanks Bruce, Jon, Ross).
> 
> Remaining are:
>   * an error upon boot on preempt-rt on qemux86-64
>      (e.g. https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/7616/steps/36/logs/stdio)
>      We'll probably just have to ignore it in parselogs as it has been
>      around for a while and nobody seems interested in fixing it upstream.

Just back from vacation, and I see an internal report of 10-ish instances of this at boot:

  NOHZ tick-stop error: local softirq work is pending, handler #80!!!

..on the 6.1.43-rt10-yocto-preempt-rt kernel, on real hardware.  So it
seems we can't blame that one entirely on v6.4 kernel (or qemu).

We used to get (late 3.x and 4.x era) pretty common "NOHZ: local softirq
pending" messages even on common/popular distro kernels.  But I haven't
seen those for a long time and they didn't scream "error" or have the
alarmist three exclamation marks either.

I'll see if I can dig into that further.  This instance is new to me, so
any additional context or information I might not turn up myself would
be useful.

>   * some random hangs:
>      https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/349/steps/12/logs/stdio
>      https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/354/steps/12/logs/stdio
> 
> The latter are rare and intermittent, mainly taking out CI test builds.
> Most people aren't affected by them, find them hard to reproduce let
> alone fix and will ignore them. That will leave me/Bruce/PaulG holding
> the pieces.

Ugh.  The RCU one is ugly and the Silent Boot Death one is no better.
Nobody likes SBD cases.  They suck.

> 
> I know Bruce spends a ton of time debugging weird things just to get
> the kernel to the point we can even consider merging and nobody ever
> really sees or appreciates that work :(.

Well, not "nobody".  There are at least two people who have a good idea
of what Bruce does.  :-P

Paul.
--

> 
> Systemd was a similar challenge recently, multiple patches causing
> multiple issues with a significant impact on CI. In that case the
> issues weren't intermittent so resolution wasn't so bad.
> 
> Rust and reproducibility was given a pass so the rest of the changes
> could merge for it. That just meant there was less pressure and the
> reproducibility issue is still there with people saying it's too hard.
> That issue is now spreading down the chain to other recipes.
> 
> The toolchain test reports have thousands of failures nobody is really
> looking at. Similarly the now consistent ltp controllers failures
> (previously the reports weren't even consistent!).
> 
> I'm worried the access control patches changing the tar format are
> going to destabilise things and, once merged, people will move on to other
> things leaving any remaining intermittent issues to me. Already we're
> seeing things like sstate being blamed as it is easiest to do that. I
> end up having to "prove" it isn't that.
> 
> There are intermittent ptests on the autobuilder too. I took mdadm
> ptest patches on the basis there was help to fix them. We are still seeing
> a lot of failures in CI from there. The glib-networking intermittent
> failures continue, I know Trevor has tried to dig into those but he is
> alone in doing it in code which isn't easy to navigate (and I don't
> know how to help there).
> 
> As an idea of impact, every time one of these things fails in CI,
> someone has to triage that failure. The bug triage team has to triage the
> bugs too.
> 
> I don't know how we fix this but we really could do with more people
> able to dive in and help with these intermittent issues. I'm really
> really apprehensive about merging some patches as I can just tell
> they're going to cause pain :(.
> 
> Cheers,
> 
> Richard
> 

