* Re: [REGRESSION] 6.8-rc process is unable to exit and consumes a lot of cpu
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-02-21 16:32 UTC
To: Christian Brauner (Microsoft), Alexander Viro
Cc: Matt Heon, Ed Santiago, Linux-fsdevel, Paul Holzinger, Linux regressions mailing list, LKML

[adding Al, Christian and a few lists to the list of recipients to
ensure all affected parties are aware of this new report about a bug for
which a fix is committed, but not yet mainlined]

Thread starts here:
https://lore.kernel.org/all/6a150ddd-3267-4f89-81bd-6807700c57c1@redhat.com/

On 21.02.24 16:56, Paul Holzinger wrote:
> Hi Thorsten,
>
> On 21/02/2024 15:42, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 21.02.24 15:31, Paul Holzinger wrote:
>>> On 21/02/2024 15:20, Paul Holzinger wrote:
>>>> we are seeing problems with the 6.8-rc kernels[1] in our CI systems,
>>>> we see random process timeouts across our test suite. It appears that
>>>> sometimes a process is unable to exit, nothing happens even if we send
>>>> SIGKILL and instead the process consumes a lot of cpu.
>>> [...]
>> Thx for the report.
>>
>> Warning, this is not my area of expertise, so this might send you in the
>> totally wrong direction.
>>
>> I briefly checked lore for similar reports and noticed this one when I
>> searched for shrink_dcache_parent:
>>
>> https://lore.kernel.org/all/ZcKOGpTXnlmfplGR@gmail.com/
>
>> Do you think that might be related? A fix for this is pending in vfs.git.
>>
> yes that does seem very relevant. Running the sysrq command I get the
> same backtrace as the reporter there so I think it is fair to assume
> this is the same bug. Looking forward to get the fix into mainline.

FWIW, "the fix" afaics is 7e4a205fe56b90 ("Revert "get rid of
DCACHE_GENOCIDE"") sitting in the 'fixes' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for more than
a week now.

I assume Al or Christian will send this to Linus soon. Christian in fact
already mentioned that he plans to send another vfs fix to Linus, but
that one iirc was sitting in another repo (but I might be mistaken there!).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

P.S.: let me update regzbot while at it:

#regzbot introduced 57851607326a2beef21e67f83f4f53a90df8445a.
#regzbot fix: Revert "get rid of DCACHE_GENOCIDE"

* Re: [REGRESSION] 6.8-rc process is unable to exit and consumes a lot of cpu
From: Thorsten Leemhuis @ 2024-02-24 7:00 UTC
To: Christian Brauner (Microsoft), Alexander Viro, Linus Torvalds
Cc: Matt Heon, Ed Santiago, Linux-fsdevel, Paul Holzinger, Linux regressions mailing list, LKML

On 21.02.24 17:32, Linux regression tracking (Thorsten Leemhuis) wrote:
> [adding Al, Christian and a few lists to the list of recipients to
> ensure all affected parties are aware of this new report about a bug for
> which a fix is committed, but not yet mainlined]
>
> Thread starts here:
> https://lore.kernel.org/all/6a150ddd-3267-4f89-81bd-6807700c57c1@redhat.com/

[adding Linus now as well]

TWIMC, the quoted mail apparently did not get delivered to Al (I got a
"48 hours on the queue" warning from my hoster's MTA ~10 hours ago).

Ohh, and there is some suspicion that the problem Calvin[1] and Paul
(this thread, see quote below for the gist) encountered also causes
problems for bwrap (used by Flatpak)[2].
[1] https://lore.kernel.org/all/ZcKOGpTXnlmfplGR@gmail.com/
[2] https://github.com/containers/bubblewrap/issues/620

Christian, Linus, all that makes me wonder if it might be wise to pick
up the revert[3] Al queued directly in case Al does not submit a PR
today or tomorrow for -rc6.

[3] https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=fixes&id=7e4a205fe56b9092f0143dad6aa5fee081139b09

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> On 21.02.24 16:56, Paul Holzinger wrote:
>> Hi Thorsten,
>>
>> On 21/02/2024 15:42, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 21.02.24 15:31, Paul Holzinger wrote:
>>>> On 21/02/2024 15:20, Paul Holzinger wrote:
>>>>> we are seeing problems with the 6.8-rc kernels[1] in our CI systems,
>>>>> we see random process timeouts across our test suite. It appears that
>>>>> sometimes a process is unable to exit, nothing happens even if we send
>>>>> SIGKILL and instead the process consumes a lot of cpu.
>>>> [...]
>>> Thx for the report.
>>>
>>> Warning, this is not my area of expertise, so this might send you in the
>>> totally wrong direction.
>>>
>>> I briefly checked lore for similar reports and noticed this one when I
>>> searched for shrink_dcache_parent:
>>>
>>> https://lore.kernel.org/all/ZcKOGpTXnlmfplGR@gmail.com/
>>
>>> Do you think that might be related? A fix for this is pending in vfs.git.
>>>
>> yes that does seem very relevant. Running the sysrq command I get the
>> same backtrace as the reporter there so I think it is fair to assume
>> this is the same bug. Looking forward to get the fix into mainline.
>
> FWIW, "the fix" afaics is 7e4a205fe56b90 ("Revert "get rid of
> DCACHE_GENOCIDE"") sitting in the 'fixes' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for more than
> a week now.
>
> I assume Al or Christian will send this to Linus soon. Christian in fact
> already mentioned that he plans to send another vfs fix to Linus, but
> that one iirc was sitting in another repo (but I might be mistaken there!).
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> P.S.: let me update regzbot while at it:
>
> #regzbot introduced 57851607326a2beef21e67f83f4f53a90df8445a.
> #regzbot fix: Revert "get rid of DCACHE_GENOCIDE"

* Re: [REGRESSION] 6.8-rc process is unable to exit and consumes a lot of cpu
From: Linus Torvalds @ 2024-02-24 23:43 UTC
To: Linux regressions mailing list, Al Viro
Cc: Christian Brauner (Microsoft), Matt Heon, Ed Santiago, Linux-fsdevel, Paul Holzinger, LKML

On Fri, 23 Feb 2024 at 23:00, Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> TWIMC, the quoted mail apparently did not get delivered to Al (I got a
> "48 hours on the queue" warning from my hoster's MTA ~10 hours ago).

Al's email has been broken for the last almost two weeks - the machine
went belly-up in a major way.

I bounced the email to his kernel.org email that seems to work, but I
also think Al ends up being busy trying to get through everything else
he missed, in addition to trying to get the machine working again...

Linus

* Re: [REGRESSION] 6.8-rc process is unable to exit and consumes a lot of cpu
From: Al Viro @ 2024-02-25 1:22 UTC
To: Linus Torvalds
Cc: Linux regressions mailing list, Christian Brauner (Microsoft), Matt Heon, Ed Santiago, Linux-fsdevel, Paul Holzinger, LKML

On Sat, Feb 24, 2024 at 03:43:43PM -0800, Linus Torvalds wrote:
> On Fri, 23 Feb 2024 at 23:00, Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
> >
> > TWIMC, the quoted mail apparently did not get delivered to Al (I got a
> > "48 hours on the queue" warning from my hoster's MTA ~10 hours ago).
>
> Al's email has been broken for the last almost two weeks - the machine
> went belly-up in a major way.
>
> I bounced the email to his kernel.org email that seems to work, but I
> also think Al ends up being busy trying to get through everything else
> he missed, in addition to trying to get the machine working again...

FWIW, I'm pretty sure that it's fixed by #fixes^ (7e4a205fe56b) in
my tree; I'll send a pull request, both for #fixes and #fixes.pathwalk.rcu

* Re: [REGRESSION] 6.8-rc process is unable to exit and consumes a lot of cpu
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-02-25 5:57 UTC
To: Al Viro, Linus Torvalds
Cc: Linux regressions mailing list, Christian Brauner (Microsoft), Matt Heon, Ed Santiago, Linux-fsdevel, Paul Holzinger, LKML

On 25.02.24 02:22, Al Viro wrote:
> On Sat, Feb 24, 2024 at 03:43:43PM -0800, Linus Torvalds wrote:
>> On Fri, 23 Feb 2024 at 23:00, Thorsten Leemhuis
>> <regressions@leemhuis.info> wrote:
>>>
>>> TWIMC, the quoted mail apparently did not get delivered to Al (I got a
>>> "48 hours on the queue" warning from my hoster's MTA ~10 hours ago).
>>
>> Al's email has been broken for the last almost two weeks - the machine
>> went belly-up in a major way.
>>
>> I bounced the email to his kernel.org email that seems to work,

Thx!

>> but I
>> also think Al ends up being busy trying to get through everything else
>> he missed, in addition to trying to get the machine working again...
>
> FWIW, I'm pretty sure that it's fixed by #fixes^ (7e4a205fe56b) in
> my tree; I'll send a pull request, both for #fixes and #fixes.pathwalk.rcu

Great, thank you, too!

Ciao, Thorsten

* Re: [REGRESSION] 6.8-rc process is unable to exit and consumes a lot of cpu
From: Al Viro @ 2024-02-25 1:17 UTC
To: Linux regressions mailing list
Cc: Christian Brauner (Microsoft), Alexander Viro, Linus Torvalds, Matt Heon, Ed Santiago, Linux-fsdevel, Paul Holzinger, LKML

On Sat, Feb 24, 2024 at 08:00:27AM +0100, Thorsten Leemhuis wrote:
> On 21.02.24 17:32, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [adding Al, Christian and a few lists to the list of recipients to
> > ensure all affected parties are aware of this new report about a bug for
> > which a fix is committed, but not yet mainlined]
> >
> > Thread starts here:
> > https://lore.kernel.org/all/6a150ddd-3267-4f89-81bd-6807700c57c1@redhat.com/
>
> [adding Linus now as well]
>
> TWIMC, the quoted mail apparently did not get delivered to Al (I got a
> "48 hours on the queue" warning from my hoster's MTA ~10 hours ago).
>
> Ohh, and there is some suspicion that the problem Calvin[1] and Paul
> (this thread, see quote below for the gist) encountered also causes
> problems for bwrap (used by Flatpak)[2].
> [1] https://lore.kernel.org/all/ZcKOGpTXnlmfplGR@gmail.com/
> [2] https://github.com/containers/bubblewrap/issues/620
>
> Christian, Linus, all that makes me wonder if it might be wise to pick
> up the revert[3] Al queued directly in case Al does not submit a PR
> today or tomorrow for -rc6.

See #fixes in my tree.