public inbox for linux-kernel@vger.kernel.org
* Linux regressions report for mainline [2023-09-24]
@ 2023-09-24 16:17 Regzbot (on behalf of Thorsten Leemhuis)
  2023-09-25  8:02 ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Regzbot (on behalf of Thorsten Leemhuis) @ 2023-09-24 16:17 UTC
  To: LKML, Linus Torvalds, Linux regressions mailing list

Hi Linus. There is not much to report wrt regressions introduced
during the current cycle, as many of those I tracked were fixed over the
past week (not totally sure, but I think this includes all Guenter
reported for -rc1). There is only one regression left that I'm aware of
and a fix for that is under discussion already, so there is nothing to
worry about (FWIW, there are also three vague reports, but I'll ignore
those here).

That being said, there are two other recent regressions that I want to
tell you about, as I'm not sure if they are handled as you would like
them to be handled.

(1) Userspace nftables v1.0.6 generated incorrect bytecode that hits a
new kernel check introduced in 0ebc1064e4874d ("netfilter: nf_tables:
disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID")
[v6.5-rc4, v6.4.8, v6.1.43, v5.15.124, v5.10.190]. This causes trouble
for Debian users, as its latest version apparently ships such an
nftables release[1]. Pablo provided patches for the Debian devs to fix
their nftables package[2]; I asked if this could be fixed on the kernel
level and got this reply from Florian: "So while this can be
theoretically fixed in the kernel I don't see a sane way to do it.
Error unwinding / recovery from deeply nested errors is already too
complex for my taste."[3]. That's not really how we normally handle the
"no regression" rule; OTOH, given what Florian said, it might be the
right thing to do. But it's a judgement call, hence I wanted to tell you
about it.

[1] https://lore.kernel.org/all/20230911213750.5B4B663206F5@dd20004.kasserver.com/
[2] https://lore.kernel.org/all/ZP+bUpxJiFcmTWhy@calendula/
[3] https://lore.kernel.org/all/20230912102701.GA13516@breakpoint.cc/


(2) Nearly six weeks ago there was a report [1] that 101bd907b4244a
("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") [v6.5-rc6, v6.4.11,
v6.1.46, v5.15.127] broke booting on various laptops (many or all of
them Dell). This apparently plagues quite a few users, hence there were
multiple reports (see [2] for those I'm aware of). At least Fedora,
openSUSE, and NixOS have meanwhile reverted the change in their latest
stable kernels [3]. One and a half weeks ago, when I fully noticed its
impact, I proposed to revert the culprit, but Greg wanted to give the
developers more time [4]. We finally have a fix in sight now [5];
someone affected replied that it helps. Not sure what's the right way
forward now. But overall this to me feels a lot like "this is not how a
regression should be handled". That's why I wanted to bring it up here,
to ensure you are aware of this.

[1]
https://lore.kernel.org/all/5f968b95-6b1c-4d6f-aac7-5d54f66834a8@sapience.com/
https://lore.kernel.org/all/30b69186-5a6e-4f53-b24c-2221926fc3b4@sapience.com/

[2]
https://bugs.archlinux.org/task/79439#comment221866
https://bugzilla.kernel.org/show_bug.cgi?id=217802
https://bugzilla.suse.com/show_bug.cgi?id=1214428
https://github.com/NixOS/nixpkgs/issues/253418
https://lore.kernel.org/all/5DHV0S.D0F751ZF65JA1@gmail.com/

[3]
https://gitlab.com/cki-project/kernel-ark/-/commit/80c615ec2edb4aadded21fe924e2caa172d59577
https://github.com/openSUSE/kernel-source/commit/1b02b1528a26f4e9b577e215c114d8c5e773ee10
https://github.com/NixOS/nixpkgs/pull/255824

[4]
https://lore.kernel.org/all/2023091333-fiftieth-trustless-d69d@gregkh/

[5]
https://lore.kernel.org/all/37b1afb997f14946a8784c73d1f9a4f5@realtek.com/

Ciao, Thorsten

---

Hi, this is regzbot, the Linux kernel regression tracking bot.

Currently I'm aware of 1 regression in linux-mainline. Find the
current status below and the latest on the web:

https://linux-regtracking.leemhuis.info/regzbot/mainline/

Bye bye, hope to see you soon for the next report.
   Regzbot (on behalf of Thorsten Leemhuis)


======================================================
current cycle (v6.5.. aka v6.6-rc), culprit identified
======================================================


[ *NEW* ] mm, memcg: runc fails to gather cgroup statistics
-----------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
https://lore.kernel.org/lkml/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/

By Jeremi Piotrowski; 4 days ago; 25 activities, latest 1 day ago.
Introduced in 86327e8eb94c (v6.6-rc1)

Fix incoming:
* mm, memcg: reconsider kmem.limit_in_bytes deprecation
  https://lore.kernel.org/lkml/d44b0746-2aa6-4608-ab22-bcb9efb27a26@leemhuis.info/


=============
End of report
=============

All regressions marked '[ *NEW* ]' were added since the previous report,
which can be found here:
https://lore.kernel.org/r/169436306694.2246708.7828658786502488268@leemhuis.info

Thanks for your attention, have a nice day!

  Regzbot, your hard working Linux kernel regression tracking robot


P.S.: Wanna know more about regzbot or how to use it to track regressions
for your subsystem? Then check out the getting started guide or the
reference documentation:

https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The short version: if you see a regression report you want to see
tracked, just send a reply to the report where you Cc
regressions@lists.linux.dev with a line like this:

#regzbot introduced: v5.13..v5.14-rc1

If you want to fix a tracked regression, just do what is expected
anyway: add a 'Link:' tag with the url to the report, e.g.:

Link: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/
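
To illustrate the two conventions above, here is a minimal Python
sketch that pulls "#regzbot" commands and "Link:" tags out of a mail
body. It is not regzbot's actual implementation; the function names and
the regular expressions are assumptions made for this example, and it
only handles commands written on a line of their own.

```python
import re

# Assumption: a command sits on its own unquoted line, starts with
# "#regzbot", and is followed by a command word, an optional colon,
# and an optional argument.
REGZBOT_RE = re.compile(
    r"^#regzbot[ \t]+(?P<command>\w+):?[ \t]*(?P<argument>.*)$",
    re.MULTILINE,
)

# Assumption: a Link: tag is a line of the form "Link: <url>".
LINK_RE = re.compile(r"^Link:[ \t]*(?P<url>\S+)[ \t]*$", re.MULTILINE)


def find_regzbot_commands(body):
    """Return (command, argument) pairs found in unquoted lines."""
    return [
        (m.group("command"), m.group("argument").strip())
        for m in REGZBOT_RE.finditer(body)
    ]


def find_link_tags(body):
    """Return the URLs of all Link: tags in the text."""
    return [m.group("url") for m in LINK_RE.finditer(body)]
```

Lines quoted from earlier mails (those starting with ">") are skipped
on purpose, so a command is not picked up twice when someone replies.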


* Re: Linux regressions report for mainline [2023-09-24]
  2023-09-24 16:17 Linux regressions report for mainline [2023-09-24] Regzbot (on behalf of Thorsten Leemhuis)
@ 2023-09-25  8:02 ` Greg KH
  2023-09-25  9:11   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2023-09-25  8:02 UTC
  To: Regzbot (on behalf of Thorsten Leemhuis)
  Cc: LKML, Linus Torvalds, Linux regressions mailing list

On Sun, Sep 24, 2023 at 04:17:40PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
> (2) Nearly six weeks ago there was a report [1] that 101bd907b4244a
> ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") [v6.5-rc6, v6.4.11,
> v6.1.46, v5.15.127] broke booting on various laptops (many or all of
> them Dell). This apparently plagues quite a few users, hence there were
> multiple reports (see [2] for those I'm aware of). At least Fedora,
> openSUSE, and NixOS have meanwhile reverted the change in their latest
> stable kernels [3]. One and a half weeks ago, when I fully noticed its
> impact, I proposed to revert the culprit, but Greg wanted to give the
> developers more time [4]. We finally have a fix in sight now [5];
> someone affected replied that it helps. Not sure what's the right way
> forward now. But overall this to me feels a lot like "this is not how a
> regression should be handled". That's why I wanted to bring it up here,
> to ensure you are aware of this.

We now have confirmed testing that the proposed fix resolves the issue
so I'll be getting it to Linus in time for the next -rc.  I've been
traveling all last week and this week for conferences so my response
times have been a bit slow, sorry.

thanks,

greg k-h


* Re: Linux regressions report for mainline [2023-09-24]
  2023-09-25  8:02 ` Greg KH
@ 2023-09-25  9:11   ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-25 11:04     ` Genes Lists
  2023-09-28 13:04     ` Greg KH
  0 siblings, 2 replies; 5+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-25  9:11 UTC
  To: Greg KH; +Cc: LKML, Linus Torvalds, Linux regressions mailing list

On 25.09.23 10:02, Greg KH wrote:
> On Sun, Sep 24, 2023 at 04:17:40PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
>> (2) Nearly six weeks ago there was a report [1] that 101bd907b4244a
>> ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") [v6.5-rc6, v6.4.11,
>> v6.1.46, v5.15.127] broke booting on various laptops (many or all of
>> them Dell). This apparently plagues quite a few users, hence there were
>> multiple reports (see [2] for those I'm aware of). At least Fedora,
>> openSUSE, and NixOS have meanwhile reverted the change in their latest
>> stable kernels [3]. One and a half weeks ago, when I fully noticed its
>> impact, I proposed to revert the culprit, but Greg wanted to give the
>> developers more time [4]. We finally have a fix in sight now [5];
>> someone affected replied that it helps. Not sure what's the right way
>> forward now. But overall this to me feels a lot like "this is not how a
>> regression should be handled". That's why I wanted to bring it up here,
>> to ensure you are aware of this.
> 
> We now have confirmed testing that the proposed fix resolves the issue
> so I'll be getting it to Linus in time for the next -rc.

Many thx!

>  I've been
> traveling all last week and this week for conferences so my response
> times have been a bit slow, sorry.

No worries, I already suspected this[1]. The major aspect in this whole
episode that bugs me a lot is different anyway:

Wouldn't it have been much much better to revert[2] the culprit quickly
once it was known to cause a regression that annoyed some users a whole
lot[3, 4]?

Yes, looking back now it's easy to ask. But I encounter similar
situations all the time: developers and maintainers are
(understandably!) often quite reluctant to revert commits causing
regressions, especially when a fix seems not far off. But in the end it
often (like in this case) takes quite a while to polish the fix, get it
tested, reviewed, in -next for a day or two, into mainline, and (when
needed, like in this case) incorporation in affected stable series.

That's why I wrote the "Expectations and best practices for fixing
regressions" section in Documentation/process/handling-regressions.rst,
which mentions rough time frames to help decide when a revert is
appropriate. But nobody cares about them -- and I don't blame anyone, as
Linus never ACKed them; even parts that are directly based on statements
from Linus are ignored all the time (often because people simply don't
know about them [5]). That makes my job hard. :-/

Ciao, Thorsten

[1] Sadly I couldn't make it to Bilbao this year; ohh, and BTW, enjoy
Paris this week; wanted to be there, but that didn't work out due to
stupid reasons. :-/

[2] Or would that have caused a big regression for anyone? Doesn't look
like it from here, but maybe I'm missing something.

[3] FWIW, I consider it partly my fault that this didn't happen, as I
should have rooted for this way earlier. :-/ I was on vacation when the
report came in and only realized the full impact much later; then I
finally suggested to revert this ~11 days ago, when a fix already seemed
not too far off. OTOH I still think a revert at that point would have
been the right thing to do.

[4] And reapply it later (outside of the merge window) together with a
fix or directly in fixed form.

[5] recent example:
https://lore.kernel.org/all/a2839c37-580f-4091-8bbc-50eea96c7c8b@leemhuis.info/


* Re: Linux regressions report for mainline [2023-09-24]
  2023-09-25  9:11   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-25 11:04     ` Genes Lists
  2023-09-28 13:04     ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Genes Lists @ 2023-09-25 11:04 UTC
  To: Linux regressions mailing list, Greg KH; +Cc: LKML, Linus Torvalds

On 9/25/23 05:11, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 25.09.23 10:02, Greg KH wrote:
>> On Sun, Sep 24, 2023 at 04:17:40PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
>>> (2) Nearly six weeks ago there was a report [1] that 101bd907b4244a
>>> ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") [v6.5-rc6, v6.4.11,
>>> v6.1.46, v5.15.127] broke booting on various laptops (many or all of
>>> them Dell). This apparently plagues quite a few users, hence there were
>>> multiple reports (see [2] for those I'm aware of). At least Fedora,
>>> openSUSE, and NixOS have meanwhile reverted the change in their latest
>>> stable kernels [3]. One and a half weeks ago, when I fully noticed its
>>> impact, I proposed to revert the culprit, but Greg wanted to give the
>>> developers more time [4]. We finally have a fix in sight now [5];
>>> someone affected replied that it helps. Not sure what's the right way
>>> forward now. But overall this to me feels a lot like "this is not how a
>>> regression should be handled". That's why I wanted to bring it up here,
>>> to ensure you are aware of this.
>>
>> We now have confirmed testing that the proposed fix resolves the issue
>> so I'll be getting it to Linus in time for the next -rc.
> 
> Many thx!
> 
Thank you all for taking care of this - much appreciated.

gene



* Re: Linux regressions report for mainline [2023-09-24]
  2023-09-25  9:11   ` Linux regression tracking (Thorsten Leemhuis)
  2023-09-25 11:04     ` Genes Lists
@ 2023-09-28 13:04     ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Greg KH @ 2023-09-28 13:04 UTC
  To: Linux regressions mailing list; +Cc: LKML, Linus Torvalds

On Mon, Sep 25, 2023 at 11:11:51AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 25.09.23 10:02, Greg KH wrote:
> >  I've been
> > traveling all last week and this week for conferences so my response
> > times have been a bit slow, sorry.
> 
> No worries, I already suspected this[1]. The major aspect in this whole
> episode that bugs me a lot is different anyway:
> 
> Wouldn't it have been much much better to revert[2] the culprit quickly
> once it was known to cause a regression that annoyed some users a whole
> lot[3, 4]?

Possibly, yes.  It's a balancing act between keeping the pressure on the
developer to provide a fix, vs. the severity of the issue and how
wide-spread it is, vs. my ability to do anything at all due to
non-development issues (i.e. travel and conference work.)

Trying to pick the best thing with all of those is hard. Sometimes we
get it right, sometimes we get it wrong; usually someone is upset no
matter what we pick, including a lack of sleep for the maintainer.

So "it's complicated", as you know...

thanks,

greg k-h
