* the stuttering regression in 7.0: should I have done something different?
@ 2026-04-23 16:30 Thorsten Leemhuis
  2026-04-26 21:16 ` Greg KH
  2026-05-08  5:51 ` John Paul Adrian Glaubitz
  0 siblings, 2 replies; 18+ messages in thread
From: Thorsten Leemhuis @ 2026-04-23 16:30 UTC
  To: Greg KH, Linus Torvalds; +Cc: Linux kernel regressions list, LKML

Linus, Greg, if you have a minute, please help me out with something I'm
wondering about:

Should I have done something different wrt. the periodic lockup aka
stuttering regression? The one quite a few users encountered, bisected,
and reported last week following the 7.0 release before it was fixed in
mainline & 7.0.1 during the first half of this week; see below for a
timeline of the whole thing.

I'm not asking if this could have been prevented or if the developers
did anything wrong handling this. I ask because I want to do the right
thing if similar situations arise in the future, to ensure they are
handled the way you folks want them to be.

I for example wonder if I maybe should have made noise about this
regression earlier when I noticed that it affected quite a few (many?)
people. Or made more noise at the point where I spoke up, as then the
path to 7.0.1 might have been easier? Or should I have asked Linus for a
revert (or submitted one myself) shortly after the impact became clear --
so that the problem would have been solved quickly while an improved
version of the culprit was developed and mainlined in parallel via the
regular channels?

On a related note: Do you think it would be wise if I started
maintaining a "regression fixes git tree" that collects temporary
reverts and wip fixes while fully formed reverts and/or fixes for
regressions are developed and make it through the ranks to mainline
(which often takes a few days or a week -- or multiple weeks in some
cases)? The idea comes up every now and then, as interested users and
distros then could easily avoid known regressions through that tree; it
could also serve as a test bed for reverts I could send to Linus in case
a proper revert/fix takes too long through the regular channels.

But I'm not sure if such an approach is really a good idea. I'd prefer
if our processes were so quick and flexible that such a tree would
not be needed. At the same time I see that "quick and flexible" is often
not the case and that we are unlikely to get there any time soon
(especially for fixes that need to reach stable/longterm trees, as it
simply takes a while from initial reporting of a regression to
mainlining a fix [or at least hitting -next] to preparing and releasing
a new stable/longterm release with it).

FWIW, here is the rough timeline of the regression, just to be sure we
are all on the same page:

* The regression I'm talking about is caused by d6e152d905bdb1
("clockevents: Prevent timer interrupt starvation") [authored:
2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival:
next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12 22:48:06)]

* On Monday, and thus within 24 hours of the 7.0 release, the first
report about the regression came in and immediately mentioned that a
revert was able to fix things:
https://lore.kernel.org/all/68d1e9ac-2780-4be3-8ee3-0788062dd3a4@gmail.com/

* On Tuesday someone else confirmed the findings and mentioned that
"several users" were seeing the problem:
https://lore.kernel.org/all/aeb848aa-404a-40fb-bd41-329644623b1d@cachyos.org/

* A few hours later (aka within 24 hours of the first report) Thomas had
a rough fix ready https://lore.kernel.org/all/87340xfeje.ffs@tglx/ (yeah!)

* On Thursday the fix was committed to the tip tree:
https://lore.kernel.org/all/177636758252.1323100.5283878386670888513.tip-bot2@tip-bot2/

* On Sunday I asked when the fix was going to be mainlined (with Linus
in CC) -- I feared Greg would soon start preparing 7.0.1-rc1 and I
wanted to ensure the fix was included there:
https://lore.kernel.org/all/5cbb14d8-46f9-4197-917f-51da852d7500@leemhuis.info/

* On Monday morning (UTC) mingo submitted a PR with the fix:
https://lore.kernel.org/all/aeXYPt1FEbFRZNJf@gmail.com/

* On Monday Greg released 7.0.1-rc1 without the fix -- and a backport of
the culprit was in the -rc1 of various earlier series. Thomas quickly
told the stable team to not backport the culprit before the fix was
mainlined https://lore.kernel.org/all/87pl3ten5y.ffs@tglx/

* On Monday night Linus merged the PR from mingo as 4096fd0e8eaea1
("clockevents: Add missing resets of the next_event_forced flag")
[authored: 2026-04-14 22:55:01; committed: 2026-04-16 21:22:04; next
arrival: next-20260417; merged: 2026-04-21 00:30:08; v7.0-post]

* On Tuesday morning I wrote a mail to Greg about including the fix in
7.0.1; Thomas provided the necessary backport around the same time,
which Greg then included out-of-band:
https://lore.kernel.org/all/2026042105-malformed-probation-232b@gregkh/
https://lore.kernel.org/all/87jyu0de2c.ffs@tglx/

* v7.0.1 was released on Wednesday, 2026-04-22 13:32:23

I lost track of how many people reported the regression exactly, but I
noticed at least seven reports (most of them in the past week) – and a
few people mentioned to me privately that they were affected, too. So it
was something that annoyed quite a few people afaics -- and made them
bisect, just to find out that the problem was known and a fix existed
already. This widespread effect is why I was wondering if I should have
done something differently, as a quicker fix could have spared a few
people some pain.

Ciao, Thorsten

Thread overview: 18+ messages
2026-04-23 16:30 the stuttering regression in 7.0: should I have done something different? Thorsten Leemhuis
2026-04-26 21:16 ` Greg KH
2026-05-08  5:51 ` John Paul Adrian Glaubitz
2026-05-08  6:33   ` Thorsten Leemhuis
     [not found]     ` <D5D19776-C809-4284-9417-F9A860877B98@gmail.com>
2026-05-08  7:50       ` Thorsten Leemhuis
2026-05-08 20:15         ` Tony Rodriguez
2026-05-08 20:21           ` Tony Rodriguez
2026-05-10 21:29           ` Thomas Gleixner
2026-05-11  3:13             ` Tony Rodriguez
2026-05-12  5:03               ` the stuttering regression in 7.0: should I have done something different Tony Rodriguez
2026-05-12  8:17                 ` Thomas Gleixner
2026-05-12 21:43                   ` Tony Rodriguez
2026-05-13 20:28                     ` Thomas Gleixner
2026-05-14  7:24                       ` Tony Rodriguez
2026-05-14 10:24                         ` Thomas Gleixner
2026-05-15  4:47                           ` Tony Rodriguez
2026-05-15 15:35                             ` Thomas Gleixner
2026-05-15 17:51                               ` John Paul Adrian Glaubitz
