public inbox for linux-kernel@vger.kernel.org
* the stuttering regression in 7.0: should I have done something different?
@ 2026-04-23 16:30 Thorsten Leemhuis
  2026-04-26 21:16 ` Greg KH
  0 siblings, 1 reply; 2+ messages in thread
From: Thorsten Leemhuis @ 2026-04-23 16:30 UTC
  To: Greg KH, Linus Torvalds; +Cc: Linux kernel regressions list, LKML

Linus, Greg, if you have a minute, please help me out with something I'm
wondering about:

Should I have done something different wrt. the periodic lockup aka
stuttering regression? The one quite a few users encountered, bisected,
and reported last week following the 7.0 release before it was fixed in
mainline & 7.0.1 during the first half of this week; see below for a
timeline of the whole thing.

I'm not asking if this could have been prevented or if the developers
did anything wrong handling this. I ask because I want to do the right
thing if similar situations arise in the future to ensure they are
handled like you folks want them to.

I for example wonder if I maybe should have made noise about this
regression earlier when I noticed that it affected quite a few (many?)
people. Or made more noise at the point where I spoke up, as then the
path to 7.0.1 might have been easier? Or should I have asked Linus for a
revert (or submitted one myself) shortly after the impact became clear --
so that the problem would have been solved quickly while an improved
version of the culprit was developed and mainlined in parallel via the
regular channels?

On a related note: Do you think it would be wise if I started
maintaining a "regression fixes git tree" that collects temporary
reverts and wip fixes while fully formed reverts and/or fixes for
regressions are developed and make it through the ranks to mainline
(which often takes a few days or a week -- or multiple weeks in some
cases)? The idea comes up every now and then, as interested users and
distros then could easily avoid known regressions through that tree; it
could also serve as a test bed for reverts I could send to Linus in case
a proper revert/fix takes too long through the regular channels.

But I'm not sure if such an approach is really a good idea. I'd prefer
our processes to be so quick and flexible that such a tree would
not be needed. At the same time I see that "quick and flexible" is often
not the case and that we are unlikely to get there any time soon
(especially for fixes that need to reach stable/longterm trees, as it
simply takes a while from initial reporting of a regression to
mainlining a fix [or at least hitting -next] to preparing and releasing
a new stable/longterm release with it).

FWIW, here is the rough timeline of the regression, just to be sure we
are all on the same page:

* The regression I'm talking about is caused by d6e152d905bdb1
("clockevents: Prevent timer interrupt starvation") [authored:
2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival:
next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12 22:48:06)]

* On Monday and thus within 24 hours of the 7.0 release the first report about
the regression came in and immediately mentioned that a revert was able
to fix things:
https://lore.kernel.org/all/68d1e9ac-2780-4be3-8ee3-0788062dd3a4@gmail.com/

* On Tuesday someone else confirmed the findings and mentioned that
"several users" were seeing the problem:
https://lore.kernel.org/all/aeb848aa-404a-40fb-bd41-329644623b1d@cachyos.org/

* A few hours later (aka within 24 hours of the first report) Thomas had
a rough fix ready https://lore.kernel.org/all/87340xfeje.ffs@tglx/ (yeah!)

* On Thursday the fix was committed to the tip tree:
https://lore.kernel.org/all/177636758252.1323100.5283878386670888513.tip-bot2@tip-bot2/

* On Sunday I asked when the fix was going to be mainlined (with Linus
in CC) -- I feared Greg would soon start preparing 7.0.1-rc1 and I
wanted to ensure the fix was included there:
https://lore.kernel.org/all/5cbb14d8-46f9-4197-917f-51da852d7500@leemhuis.info/

* On Monday morning (UTC) mingo submitted a PR with the fix:
https://lore.kernel.org/all/aeXYPt1FEbFRZNJf@gmail.com/

* On Monday Greg released 7.0.1-rc1 without the fix -- and a backport of
the culprit was in the -rc1 of various earlier series. Thomas quickly
told the stable team to not backport the culprit before the fix was
mainlined https://lore.kernel.org/all/87pl3ten5y.ffs@tglx/

* On Monday night Linus merged the PR from mingo as 4096fd0e8eaea1
("clockevents: Add missing resets of the next_event_forced flag")
[authored: 2026-04-14 22:55:01; committed: 2026-04-16 21:22:04; next
arrival: next-20260417; merged: 2026-04-21 00:30:08; v7.0-post]

* On Tuesday morning I wrote a mail to Greg about including the fix in
7.0.1; Thomas around the same time provided the necessary backport,
which Greg then included out-of-band:
https://lore.kernel.org/all/2026042105-malformed-probation-232b@gregkh/
https://lore.kernel.org/all/87jyu0de2c.ffs@tglx/

* v7.0.1 was released on Wednesday, 2026-04-22 13:32:23

I lost track of how many people reported the regression exactly, but I
noticed at least seven reports (most of them in the past week) – and a
few people mentioned to me privately that they were affected, too. So it
was something that annoyed quite a few people afaics -- and made them
bisect, just to find out that the problem was known and a fix existed
already. This widespread effect is why I was wondering if I should have
done something differently, as a quicker fix could have spared a few
people some pain.

Ciao, Thorsten


* Re: the stuttering regression in 7.0: should I have done something different?
  2026-04-23 16:30 the stuttering regression in 7.0: should I have done something different? Thorsten Leemhuis
@ 2026-04-26 21:16 ` Greg KH
  0 siblings, 0 replies; 2+ messages in thread
From: Greg KH @ 2026-04-26 21:16 UTC
  To: Thorsten Leemhuis; +Cc: Linus Torvalds, Linux kernel regressions list, LKML

On Thu, Apr 23, 2026 at 06:30:24PM +0200, Thorsten Leemhuis wrote:
> Linus, Greg, if you have a minute, please help me out with something I'm
> wondering about:
> 
> Should I have done something different wrt. the periodic lockup aka
> stuttering regression? The one quite a few users encountered, bisected,
> and reported last week following the 7.0 release before it was fixed in
> mainline & 7.0.1 during the first half of this week; see below for a
> timeline of the whole thing.
> 
> I'm not asking if this could have been prevented or if the developers
> did anything wrong handling this. I ask because I want to do the right
> thing if similar situations arise in the future to ensure they are
> handled like you folks want them to.

I think this went fine.  It was caught properly, and fixed, and pushed
out to users pretty quickly.  We can't really ask for more :)

thanks,

greg k-h
