* scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Thorsten Leemhuis @ 2023-11-21 9:50 UTC
To: Greg KH, Sagar Biradar, James Bottomley, Martin K. Petersen,
Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Sasha Levin, Gilbert Wu, John Garry
* @SCSI maintainers: could you please look into the issue below?
* @Stable team: you might want to take a look as well and consider a
revert in 6.1.y (yes, I know, those are normally avoided, but here it
might make sense).
Hi everyone!
TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
hangs for a while) that was reported months ago already but is still not
fixed. Not only that, apparently more and more users have run into it
recently, as the culprit was recently integrated into 6.1.y; I wonder if
it would be best to revert it there, unless a fix for mainline comes
into reach soon.
Details:
Quite a few machines with Adaptec controllers seem to hang for a few
tens of seconds to a few minutes before things start to work normally
again for a while:
https://bugzilla.kernel.org/show_bug.cgi?id=217599
That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
commit, despite a warning of mine to Sasha, recently made it into 6.1.53
-- and that way apparently reached more users, as quite a few recently
joined that ticket.
The culprit is authored by Sagar Biradar who unless I missed something
never replied even once to the ticket or earlier mails about it. Lore
has no messages from him since early June.
Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
that didn't work out (see the ticket for details). Since then things
look stalled again, which is, ehh, unfortunate when it comes to
regressions.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Thorsten Leemhuis @ 2023-11-21 9:57 UTC
To: Greg KH, Sagar Biradar, James Bottomley, Martin K. Petersen,
Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Gilbert Wu, John Garry

On 21.11.23 10:50, Thorsten Leemhuis wrote:
> * @SCSI maintainers: could you please look into below please?
>
> * @Stable team: you might want to take a look as well and consider a
> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> might make sense).
>
> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> hangs for a while) that was reported months ago already but is still not
> fixed. Not only that, it apparently more and more users run into this
> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> it would be best to revert it there, unless a fix for mainline comes
> into reach soon.
>
> Details:
>
> Quite a few machines with Adaptec controllers seems to hang for a few
> tens of seconds to a few minutes before things start to work normally
> again for a while:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599

Quick follow up, only saw this now while posting something to the
ticket: according to one reporter the problem even causes data damage.
To quote:

'''
if you run fsck.ext4 on ext4 file system with buggy kernel it will
damage file system and its data
using buggy kernel BTRFS scrub also says that checksums are wrong
'''

Ciao, Thorsten

> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> commit despite a warning of mine to Sasha recently made it into 6.1.53
> -- and that way apparently recently reached more users recently, as
> quite a few joined that ticket.
>
> The culprit is authored by Sagar Biradar who unless I missed something
> never replied even once to the ticket or earlier mails about it. Lore
> has no messages from him since early June.
>
> Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
> that didn't work out (see the ticket for details). Since then things
> look stalled again, which is, ehh, unfortunate when it comes to
> regressions.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: John Garry @ 2023-11-21 11:30 UTC
To: Thorsten Leemhuis, Greg KH, Sagar Biradar, James Bottomley,
Martin K. Petersen, Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Gilbert Wu

On 21/11/2023 09:50, Thorsten Leemhuis wrote:
> Quite a few machines with Adaptec controllers seems to hang for a few
> tens of seconds to a few minutes before things start to work normally
> again for a while:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>
> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> commit despite a warning of mine to Sasha recently made it into 6.1.53
> -- and that way apparently recently reached more users recently, as
> quite a few joined that ticket.

Is there a full kernel log for this hanging system?

I can only see snippets in the ticket.

And what does /sys/class/scsi_host/host*/nr_hw_queues show?

Thanks,
John
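For anyone on an affected machine who wants to answer the nr_hw_queues question above, the following is a minimal sketch, not a maintainer-provided tool. It only reads the sysfs path quoted in the message; the function name `read_nr_hw_queues` and the `sysfs_root` parameter are illustrative assumptions, and hosts whose attribute is missing or unreadable are reported as None rather than guessed.

```python
import glob
import os


def read_nr_hw_queues(sysfs_root="/sys/class/scsi_host"):
    """Return {host_name: nr_hw_queues or None} for each hostN directory.

    sysfs_root is a parameter only so the function can be pointed at a
    different directory; the default is the path asked about in the thread.
    """
    result = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "host*"))):
        attr = os.path.join(path, "nr_hw_queues")
        try:
            with open(attr) as f:
                result[os.path.basename(path)] = int(f.read().strip())
        except (OSError, ValueError):
            # Attribute absent (e.g. older kernel) or not a plain integer.
            result[os.path.basename(path)] = None
    return result


if __name__ == "__main__":
    for host, queues in read_nr_hw_queues().items():
        print(host, queues)
```

Run on the hanging system, the printed values could be pasted into the ticket together with the full kernel log.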
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-11-21 12:24 UTC
To: John Garry, Greg KH, Sagar Biradar, James Bottomley,
Martin K. Petersen, Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Gilbert Wu

On 21.11.23 12:30, John Garry wrote:
> On 21/11/2023 09:50, Thorsten Leemhuis wrote:
>> Quite a few machines with Adaptec controllers seems to hang for a few
>> tens of seconds to a few minutes before things start to work normally
>> again for a while:
>> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
>> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
>> commit despite a warning of mine to Sasha recently made it into 6.1.53
>> -- and that way apparently recently reached more users recently, as
>> quite a few joined that ticket.
>
> Is there a full kernel log for this hanging system?
>
> I can only see snippets in the ticket.
>
> And what does /sys/class/scsi_host/host*/nr_hw_queues show?

Sorry, I'm just the man-in-the-middle: you need to ask in the ticket,
as the privacy policy for bugzilla.kernel.org does not allow to CC
the reporters from the ticket here without their consent.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: James Bottomley @ 2023-11-21 13:05 UTC
To: Linux regressions mailing list, John Garry, Greg KH, Sagar Biradar,
Martin K. Petersen, Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin, Hannes Reinecke, scsi, LKML,
Gilbert Wu

On Tue, 2023-11-21 at 13:24 +0100, Linux regression tracking (Thorsten
Leemhuis) wrote:
> On 21.11.23 12:30, John Garry wrote:
[...]
> > Is there a full kernel log for this hanging system?
> >
> > I can only see snippets in the ticket.
> >
> > And what does /sys/class/scsi_host/host*/nr_hw_queues show?
>
> Sorry, I'm just the man-in-the-middle: you need to ask in the ticket,
> as the privacy policy for bugzilla.kernel.org does not allow to CC
> the reporters from the ticket here without their consent.

How did you arrive at that conclusion? Tickets for linux-scsi are
vectored to the list:

https://lore.kernel.org/linux-scsi/bug-217599-11613@https.bugzilla.kernel.org%2F/

So all the email addresses in the bugzilla are already archived on our
list.

James
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-11-21 13:24 UTC
To: James Bottomley, Linux regressions mailing list, John Garry,
Greg KH, Sagar Biradar, Martin K. Petersen, Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin, Hannes Reinecke, scsi, LKML,
Gilbert Wu

On 21.11.23 14:05, James Bottomley wrote:
> On Tue, 2023-11-21 at 13:24 +0100, Linux regression tracking (Thorsten
> Leemhuis) wrote:
>> On 21.11.23 12:30, John Garry wrote:
> [...]
>>> Is there a full kernel log for this hanging system?
>>> I can only see snippets in the ticket.
>>> And what does /sys/class/scsi_host/host*/nr_hw_queues show?
>>
>> Sorry, I'm just the man-in-the-middle: you need to ask in the ticket,
>> as the privacy policy for bugzilla.kernel.org does not allow to CC
>> the reporters from the ticket here without their consent.
>
> How did you arrive at that conclusion?

To quote https://bugzilla.kernel.org/createaccount.cgi:
"""
Note that your email address will never be displayed to logged out
users. Only registered users will be able to see it.
"""

Not sure since when it's there. Maybe it was added due to EU GDPR?
Konstantin should know. But for me that's enough to not CC people. I
even heard from one well known kernel developer that his company got a
GDPR complaint because he had mentioned the reporter's name and email
address in a Reported-by: tag.

Side note: bugbot afaics can solve the initial problem (e.g. interact
with reporters in bugzilla by mail without exposing their email
address). But to use bugbot one *afaik* still has to reassign a
ticket to a specific product and component in bugzilla. Some
subsystem maintainers don't want that, as those issues then do not
show up in the usual queries.

Ciao, Thorsten
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: James Bottomley @ 2023-11-21 13:31 UTC
To: Linux regressions mailing list, John Garry, Greg KH, Sagar Biradar,
Martin K. Petersen, Adaptec OEM Raid Solutions
Cc: stable@vger.kernel.org, Sasha Levin, Hannes Reinecke, scsi, LKML,
Gilbert Wu

On Tue, 2023-11-21 at 14:24 +0100, Linux regression tracking (Thorsten
Leemhuis) wrote:
> On 21.11.23 14:05, James Bottomley wrote:
> > On Tue, 2023-11-21 at 13:24 +0100, Linux regression tracking
> > (Thorsten Leemhuis) wrote:
> > > On 21.11.23 12:30, John Garry wrote:
> > [...]
> > > > Is there a full kernel log for this hanging system?
> > > > I can only see snippets in the ticket.
> > > > And what does /sys/class/scsi_host/host*/nr_hw_queues show?
> > >
> > > Sorry, I'm just the man-in-the-middle: you need to ask in the
> > > ticket, as the privacy policy for bugzilla.kernel.org does not
> > > allow to CC the reporters from the ticket here without their
> > > consent.
> >
> > How did you arrive at that conclusion?
>
> To quote https://bugzilla.kernel.org/createaccount.cgi:
> """
> Note that your email address will never be displayed to logged out
> users. Only registered users will be able to see it.
> """

OK, so someone needs to update that to reflect reality.

> Not sure since when it's there. Maybe it was added due to EU GDPR?
> Konstantin should know. But for me that's enough to not CC people. I
> even heard from one well known kernel developer that his company got a
> GDPR complaint because he had mentioning the reporters name and email
> address in a Reported-by: tag.
>
> Side note: bugbot afaics can solve the initial problem (e.g. interact
> with reporters in bugzilla by mail without exposing their email
> address). But to use bugbot one *afaik* still has to reassign a
> ticket to a specific product and component in bugzilla. Some
> subsystem maintainers don't want that, as that issues then does not
> show up in the usual queries.

I'm not sure we need to solve a problem that doesn't exist. Switching
to email is a standard maintainer response:

https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
https://lore.kernel.org/all/20230314144145.07a3e680362eb77061fe6d0e@linux-foundation.org/
...

James
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Greg KH @ 2023-11-24 16:25 UTC
To: Thorsten Leemhuis
Cc: Sagar Biradar, James Bottomley, Martin K. Petersen,
Adaptec OEM Raid Solutions, stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Gilbert Wu, John Garry

On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
> * @SCSI maintainers: could you please look into below please?
>
> * @Stable team: you might want to take a look as well and consider a
> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> might make sense).
>
> Hi everyone!
>
> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> hangs for a while) that was reported months ago already but is still not
> fixed. Not only that, it apparently more and more users run into this
> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> it would be best to revert it there, unless a fix for mainline comes
> into reach soon.
>
> Details:
>
> Quite a few machines with Adaptec controllers seems to hang for a few
> tens of seconds to a few minutes before things start to work normally
> again for a while:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>
> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> commit despite a warning of mine to Sasha recently made it into 6.1.53
> -- and that way apparently recently reached more users recently, as
> quite a few joined that ticket.
>
> The culprit is authored by Sagar Biradar who unless I missed something
> never replied even once to the ticket or earlier mails about it. Lore
> has no messages from him since early June.
>
> Hannes Reinecke at least tried to fix it a few weeks ago (many thx), but
> that didn't work out (see the ticket for details). Since then things
> look stalled again, which is, ehh, unfortunate when it comes to
> regressions.

I am loath to revert a stable patch that has been there for so long as
any upgrade will just cause the same bug to show back up. Why can't we
just revert it in Linus's tree now and I'll take that revert in the
stable trees as well?

thanks,

greg k-h
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Martin K. Petersen @ 2023-11-24 22:44 UTC
To: Greg KH
Cc: Thorsten Leemhuis, Sagar Biradar, James Bottomley,
Martin K. Petersen, Adaptec OEM Raid Solutions,
stable@vger.kernel.org, Sasha Levin, Linux kernel regressions list,
Hannes Reinecke, scsi, LKML, Gilbert Wu, John Garry

Greg,

> I am loath to revert a stable patch that has been there for so long as
> any upgrade will just cause the same bug to show back up. Why can't we
> just revert it in Linus's tree now and I'll take that revert in the
> stable trees as well?

Hannes just posted another tentative patch. I'd prefer an incremental
fix if possible.

--
Martin K. Petersen Oracle Linux Engineering
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Thorsten Leemhuis @ 2023-11-25 7:10 UTC
To: Greg KH
Cc: Sagar Biradar, James Bottomley, Martin K. Petersen,
Adaptec OEM Raid Solutions, stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Gilbert Wu, John Garry

On 24.11.23 17:25, Greg KH wrote:
> On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
>> * @SCSI maintainers: could you please look into below please?
>>
>> * @Stable team: you might want to take a look as well and consider a
>> revert in 6.1.y (yes, I know, those are normally avoided, but here it
>> might make sense).
>>
>> Hi everyone!
>>
>> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
>> hangs for a while) that was reported months ago already but is still not
>> fixed. Not only that, it apparently more and more users run into this
>> recently, as the culprit was recently integrated into 6.1.y; I wonder if
>> it would be best to revert it there, unless a fix for mainline comes
>> into reach soon.
>>
>> Details:
>>
>> Quite a few machines with Adaptec controllers seems to hang for a few
>> tens of seconds to a few minutes before things start to work normally
>> again for a while:
>> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>>
>> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
>> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
>> commit despite a warning of mine to Sasha recently made it into 6.1.53
>> -- and that way apparently recently reached more users recently, as
>> quite a few joined that ticket.
>[...]
> I am loath to revert a stable patch that has been there for so long as
> any upgrade will just cause the same bug to show back up. Why can't we
> just revert it in Linus's tree now and I'll take that revert in the
> stable trees as well?

FWIW, I know and in general agree with that strategy, that's why I
normally wouldn't have brought a stable-only revert up for
consideration. But this issue to me looked somewhat special and urgent
for two and a half reasons: (1) that backport apparently made a lot more
people suddenly hit the issue (2) there was also this data corruption
aspect one of the reporters mentioned (not sure if that is real and/or
if this might be just a 6.1.y thing). Furthermore for 6.1.y it was
recently confirmed that reverting the change fixes things, while we iirc
had no such confirmation for recent mainline kernels at that point. So
it looked like it would take a while to get this sorted out in mainline.
But it seems we finally might get closer to that now, so yeah, maybe
it's not worth a stable revert.

Ciao, Thorsten
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Salvatore Bonaccorso @ 2023-12-29 20:13 UTC
To: Thorsten Leemhuis
Cc: Greg KH, Sagar Biradar, James Bottomley, Martin K. Petersen,
Adaptec OEM Raid Solutions, stable@vger.kernel.org, Sasha Levin,
Linux kernel regressions list, Hannes Reinecke, scsi, LKML,
Gilbert Wu, John Garry

Hi all,

On Sat, Nov 25, 2023 at 08:10:35AM +0100, Thorsten Leemhuis wrote:
> On 24.11.23 17:25, Greg KH wrote:
> > On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
> >> * @SCSI maintainers: could you please look into below please?
> >>
> >> * @Stable team: you might want to take a look as well and consider a
> >> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> >> might make sense).
> >>
> >> Hi everyone!
> >>
> >> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> >> hangs for a while) that was reported months ago already but is still not
> >> fixed. Not only that, it apparently more and more users run into this
> >> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> >> it would be best to revert it there, unless a fix for mainline comes
> >> into reach soon.
> >>
> >> Details:
> >>
> >> Quite a few machines with Adaptec controllers seems to hang for a few
> >> tens of seconds to a few minutes before things start to work normally
> >> again for a while:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> >>
> >> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> >> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> >> commit despite a warning of mine to Sasha recently made it into 6.1.53
> >> -- and that way apparently recently reached more users recently, as
> >> quite a few joined that ticket.
> >[...]
> > I am loath to revert a stable patch that has been there for so long as
> > any upgrade will just cause the same bug to show back up. Why can't we
> > just revert it in Linus's tree now and I'll take that revert in the
> > stable trees as well?
>
> FWIW, I know and in general agree with that strategy, that's why I
> normally wouldn't have brought a stable-only revert up for
> consideration. But this issue to me looked somewhat special and urgent
> for two and a half reasons: (1) that backport apparently made a lot more
> people suddenly hit the issue (2) there was also this data corruption
> aspect one of the reporters mentioned (not sure if that is real and/or
> if this might be just a 6.1.y thing). Furthermore for 6.1.y it was
> recently confirmed that reverting the change fixes things, while we iirc
> had no such confirmation for recent mainline kernels at that point. So
> it looked like it would take a while to get this sorted out in mainline.
> But it seems we finally might get closer to that now, so yeah, maybe
> it's not worth a stable revert.

If I'm not completely wrong, the commit has finally been reverted in
mainline, with c5becf57dd56 ("Revert "scsi: aacraid: Reply queue
mapping to CPUs based on IRQ affinity"").

This is what was mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=217599#c52

So should/can it be reverted now as well on the 6.1.y stable series
(and the others up as needed?)

#regzbot link: https://bugs.debian.org/1059624
#regzbot fixed-by: c5becf57dd56

Thorsten, hope I got the above right.

Regards,
Salvatore
* Re: scsi regression that after months is still not addressed and now bothering 6.1.y users, too
From: Greg KH @ 2023-12-30 10:58 UTC
To: Salvatore Bonaccorso
Cc: Thorsten Leemhuis, Sagar Biradar, James Bottomley,
Martin K. Petersen, Adaptec OEM Raid Solutions,
stable@vger.kernel.org, Sasha Levin, Linux kernel regressions list,
Hannes Reinecke, scsi, LKML, Gilbert Wu, John Garry

On Fri, Dec 29, 2023 at 09:13:18PM +0100, Salvatore Bonaccorso wrote:
> Hi all,
>
> On Sat, Nov 25, 2023 at 08:10:35AM +0100, Thorsten Leemhuis wrote:
> > On 24.11.23 17:25, Greg KH wrote:
> > > On Tue, Nov 21, 2023 at 10:50:57AM +0100, Thorsten Leemhuis wrote:
> > >> * @SCSI maintainers: could you please look into below please?
> > >>
> > >> * @Stable team: you might want to take a look as well and consider a
> > >> revert in 6.1.y (yes, I know, those are normally avoided, but here it
> > >> might make sense).
> > >>
> > >> Hi everyone!
> > >>
> > >> TLDR: I noticed a regression (Adaptec 71605z with aacraid sometimes
> > >> hangs for a while) that was reported months ago already but is still not
> > >> fixed. Not only that, it apparently more and more users run into this
> > >> recently, as the culprit was recently integrated into 6.1.y; I wonder if
> > >> it would be best to revert it there, unless a fix for mainline comes
> > >> into reach soon.
> > >>
> > >> Details:
> > >>
> > >> Quite a few machines with Adaptec controllers seems to hang for a few
> > >> tens of seconds to a few minutes before things start to work normally
> > >> again for a while:
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> > >>
> > >> That problem is apparently caused by 9dc704dcc09eae ("scsi: aacraid:
> > >> Reply queue mapping to CPUs based on IRQ affinity") [v6.4-rc7]. That
> > >> commit despite a warning of mine to Sasha recently made it into 6.1.53
> > >> -- and that way apparently recently reached more users recently, as
> > >> quite a few joined that ticket.
> > >[...]
> > > I am loath to revert a stable patch that has been there for so long as
> > > any upgrade will just cause the same bug to show back up. Why can't we
> > > just revert it in Linus's tree now and I'll take that revert in the
> > > stable trees as well?
> >
> > FWIW, I know and in general agree with that strategy, that's why I
> > normally wouldn't have brought a stable-only revert up for
> > consideration. But this issue to me looked somewhat special and urgent
> > for two and a half reasons: (1) that backport apparently made a lot more
> > people suddenly hit the issue (2) there was also this data corruption
> > aspect one of the reporters mentioned (not sure if that is real and/or
> > if this might be just a 6.1.y thing). Furthermore for 6.1.y it was
> > recently confirmed that reverting the change fixes things, while we iirc
> > had no such confirmation for recent mainline kernels at that point. So
> > it looked like it would take a while to get this sorted out in mainline.
> > But it seems we finally might get closer to that now, so yeah, maybe
> > it's not worth a stable revert.
>
> If I'm not completely wrong, finally indeed the commit has been
> reverted in mainline, with c5becf57dd56 ("Revert "scsi: aacraid: Reply
> queue mapping to CPUs based on IRQ affinity"") .
>
> This is what was mentioned here:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599#c52
>
> So should/can it be reverted it now as well on the 6.1.y stable series
> (and the others up as needed?)

Now queued up, thanks.

greg k-h