* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
@ 2024-07-13 17:52 Tim Lewis
2024-07-14 6:30 ` Greg KH
0 siblings, 1 reply; 10+ messages in thread
From: Tim Lewis @ 2024-07-13 17:52 UTC (permalink / raw)
To: Greg KH; +Cc: Niklas Neronin, Mathias Nyman, linux-usb, open list, stable
The patch
usb: xhci: prevent potential failure in handle_tx_event() for
Transfer events without TRB
https://patches.linaro.org/project/linux-usb/patch/20240429140245.3955523-11-mathias.nyman@linux.intel.com/
causes The Linux kernel
6.1.98
https://lkml.org/lkml/2024/7/9/645
to crash when plugging in a USB Seagate drive.
https://www.seagate.com/ca/en/products/gaming-drives/pc-gaming/firecuda-gaming-hub/
This is a regression.
Behavior of 6.1.98:
==============================================================================
scsi host1: uas_eh_device_reset_handler start
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 0-...!: (1 GPs behind) idle=686c/0/0x1 softirq=1841/1841 fqs=610
(detected by 4, t=5253 jiffies, g=2269, q=225 ncpus=6)
Task dump for CPU 0:
task:swapper/0 state:R running task stack:0 pid:0
ppid:0 flags:0x0000000
8
Call trace:
__switch_to+0xe4/0x160
0xd7f8f808
rcu: rcu_preempt kthread timer wakeup didn't happen for 4037 jiffies!
g2269 f0x0 RCU_GP_WAIT_FQS
(5) ->state=0x402
rcu: Possible timer handling issue on cpu=5 timer-softirq=1141
rcu: rcu_preempt kthread starved for 4043 jiffies! g2269 f0x0
RCU_GP_WAIT_FQS(5) ->state=0x402 -
>cpu=5
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is
now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:I stack:0 pid:14 ppid:2 flags:0x00000008
Call trace:
__switch_to+0xe4/0x160
__schedule+0x28c/0x710
schedule+0x5c/0xd0
schedule_timeout+0x8c/0x100
rcu_gp_fqs_loop+0x140/0x4a0
rcu_gp_kthread+0x13c/0x170
kthread+0x108/0x10c
ret_from_fork+0x10/0x20
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 5:
task:kworker/5:1 state:R running task stack:0 pid:89
ppid:2 flags:0x0000000
8
Workqueue: events xhci_handle_command_timeout
Call trace:
__switch_to+0xe4/0x160
0x0
Behavior of 6.1.97 (or 6.1.978 with the patch reverted):
==============================================================================
scsi host1: uas_eh_device_reset_handler start
usb 2-1.4.2.1: reset SuperSpeed USB device number 6 using xhci-hcd
scsi host1: uas_eh_device_reset_handler success
sd 1:0:0:0:31251759103 512-byte logical blocks: (16.0 TB/14.6 TiB)
sd 1:0:0:0:Write Protect is off
sd 1:0:0:0:Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0:Preferred minimum I/O size 512 bytes
sd 1:0:0:0:Optimal transfer size 33553920 bytes
sdb: sdb1
sd 1:0:0:0:Attached SCSI disk
sd 0:0:0:0:tag#6 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
sd 0:0:0:0:tag#6 CDB: opcode=0x9e, sa=0x10 9e 10 00 00 00 00 00 00 00
00 00 00 00 20 00 0
0
scsi host0: uas_eh_device_reset_handler start
usb 2-1.4.1: reset SuperSpeed USB device number 5 using xhci-hcd
scsi host0: uas_eh_device_reset_handler success
sd 0:0:0:0:31251759103 512-byte logical blocks: (16.0 TB/14.6 TiB)
sd 0:0:0:0:Write Protect is off
sd 0:0:0:0:Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0:Preferred minimum I/O size 512 bytes
sd 0:0:0:0:Optimal transfer size 33553920 bytes
sda: sda1
sd 0:0:0:0:Attached SCSI disk
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-13 17:52 [PATCH 6.1 000/102] 6.1.98-rc1 review Tim Lewis
@ 2024-07-14 6:30 ` Greg KH
2024-07-14 12:51 ` Tim Lewis
0 siblings, 1 reply; 10+ messages in thread
From: Greg KH @ 2024-07-14 6:30 UTC (permalink / raw)
To: Tim Lewis; +Cc: Niklas Neronin, Mathias Nyman, linux-usb, open list, stable
On Sat, Jul 13, 2024 at 01:52:52PM -0400, Tim Lewis wrote:
> The patch
> usb: xhci: prevent potential failure in handle_tx_event() for
> Transfer events without TRB
> https://patches.linaro.org/project/linux-usb/patch/20240429140245.3955523-11-mathias.nyman@linux.intel.com/
> causes The Linux kernel
> 6.1.98
> https://lkml.org/lkml/2024/7/9/645
> to crash when plugging in a USB Seagate drive.
> https://www.seagate.com/ca/en/products/gaming-drives/pc-gaming/firecuda-gaming-hub/
> This is a regression.
Ick, is this also a problem with the latest 6.6 and/or the latest 6.9
and/or Linus's tree?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 6:30 ` Greg KH
@ 2024-07-14 12:51 ` Tim Lewis
2024-07-14 14:45 ` Tim Lewis
0 siblings, 1 reply; 10+ messages in thread
From: Tim Lewis @ 2024-07-14 12:51 UTC (permalink / raw)
To: Greg KH; +Cc: Niklas Neronin, Mathias Nyman, linux-usb, open list, stable
On Sun, Jul 14, 2024 at 2:30 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> On Sat, Jul 13, 2024 at 01:52:52PM -0400, Tim Lewis wrote:
> > usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB
>
> Ick, is this also a problem with the latest 6.6 and/or the latest 6.9 and/or Linus's tree?
The problem did not occur on 6.9.9
I'll test 6.6.y next.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 12:51 ` Tim Lewis
@ 2024-07-14 14:45 ` Tim Lewis
2024-07-14 15:32 ` Michał Pecio
0 siblings, 1 reply; 10+ messages in thread
From: Tim Lewis @ 2024-07-14 14:45 UTC (permalink / raw)
To: Greg KH; +Cc: Niklas Neronin, Mathias Nyman, linux-usb, open list, stable
On Sun, Jul 14, 2024 at 8:51 AM Tim Lewis <elatllat@gmail.com> wrote:
>
> On Sun, Jul 14, 2024 at 2:30 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Sat, Jul 13, 2024 at 01:52:52PM -0400, Tim Lewis wrote:
> > > usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB
> >
> > Ick, is this also a problem with the latest 6.6 and/or the latest 6.9 and/or Linus's tree?
>
> The problem did not occur on 6.9.9
> I'll test 6.6.y next.
The problem did occur on 6.6.y
I'm going to re-test 6.9.y ...
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 14:45 ` Tim Lewis
@ 2024-07-14 15:32 ` Michał Pecio
2024-07-14 16:05 ` Salvatore Bonaccorso
0 siblings, 1 reply; 10+ messages in thread
From: Michał Pecio @ 2024-07-14 15:32 UTC (permalink / raw)
To: elatllat
Cc: gregkh, linux-kernel, linux-usb, mathias.nyman, niklas.neronin,
stable
This looks like bug 219039, please see if my suggested solution works.
The upstream commit is correct, because the call to inc_deq() has been
moved outside handle_tx_event() so there is no longer this critical
difference between doing 'goto cleanup' and 'return 0'. The intended
change of this commit also makes sense to me.
This refactor is already present in v6.9 so I don't think the commit
will have any effect besides fixing the isochronous bug which it is
meant to fix.
But it is not present in v6.6 and v6.1, so they break/crash/hang/etc.
Symptoms may vary, but I believe the root cause is the same because the
code is visibly wrong.
I would like to use this opportunity to point out that the xhci driver
is currenty undergoing (much needed IMO) cleanups and refactors and
this is not the first time when a naive, verbatim backport is attempted
of a patch which works fine on upstream, but causes problems on earlier
kernels. These things need special scrutiny, beyond just "CC:stable".
Regards,
Michal
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 15:32 ` Michał Pecio
@ 2024-07-14 16:05 ` Salvatore Bonaccorso
2024-07-14 20:41 ` Tim Lewis
2024-07-15 5:45 ` Greg KH
0 siblings, 2 replies; 10+ messages in thread
From: Salvatore Bonaccorso @ 2024-07-14 16:05 UTC (permalink / raw)
To: Michał Pecio
Cc: elatllat, gregkh, linux-kernel, linux-usb, mathias.nyman,
niklas.neronin, stable, regressions
Hi,
On Sun, Jul 14, 2024 at 05:32:39PM +0200, Michał Pecio wrote:
> This looks like bug 219039, please see if my suggested solution works.
>
> The upstream commit is correct, because the call to inc_deq() has been
> moved outside handle_tx_event() so there is no longer this critical
> difference between doing 'goto cleanup' and 'return 0'. The intended
> change of this commit also makes sense to me.
>
> This refactor is already present in v6.9 so I don't think the commit
> will have any effect besides fixing the isochronous bug which it is
> meant to fix.
>
> But it is not present in v6.6 and v6.1, so they break/crash/hang/etc.
> Symptoms may vary, but I believe the root cause is the same because the
> code is visibly wrong.
>
>
> I would like to use this opportunity to point out that the xhci driver
> is currenty undergoing (much needed IMO) cleanups and refactors and
> this is not the first time when a naive, verbatim backport is attempted
> of a patch which works fine on upstream, but causes problems on earlier
> kernels. These things need special scrutiny, beyond just "CC:stable".
For tracking I guess this should go as well to the regressions list?
#regzbot introduced: 948554f1bb16e15b90006c109c3a558c66d4c4ac
#regzbot title: freezes on plugging USB connector due to 948554f1bb16 ("usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB")
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=219039
Thorsten I hope I got the most bits correctly, how would one inform
regzbot about the regresssion for 6.1.98 and 6.6.39 but not happening
in the upper versions?
Regards,
Salvatore
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 16:05 ` Salvatore Bonaccorso
@ 2024-07-14 20:41 ` Tim Lewis
2024-07-14 20:54 ` Michał Pecio
2024-07-15 5:45 ` Greg KH
1 sibling, 1 reply; 10+ messages in thread
From: Tim Lewis @ 2024-07-14 20:41 UTC (permalink / raw)
To: gregkh
Cc: Michał Pecio, linux-kernel, linux-usb, mathias.nyman,
niklas.neronin, stable, regressions, Salvatore Bonaccorso
On Sun, Jul 14, 2024 at 11:32 AM Michał Pecio <michal.pecio@gmail.com> wrote:
> see if my suggested solution works.
On Sun, Jul 14, 2024 at 12:05 PM Salvatore Bonaccorso <carnil@debian.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219039
Not on 6.1.y:
drivers/usb/host/xhci-ring.c:2644:31: error: ~@~Xir~@~Y undeclared
(first use in this fuunction); did you mean ~@~Xidr~@~Y?
2644 | inc_deq(xhci, ir->event_ring);
On Sun, Jul 14, 2024 at 2:30 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> Ick
We now have 3 USB HDD Vendors that are reported to crash the kernel
because of this patch. I think it best we take time to sort out the
long standing minor issue this patch is for, after we quickly revert
the major issues it is causing. Can we get it removed from 6.6.y and
older branches for the hopefully soon next release?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 20:41 ` Tim Lewis
@ 2024-07-14 20:54 ` Michał Pecio
0 siblings, 0 replies; 10+ messages in thread
From: Michał Pecio @ 2024-07-14 20:54 UTC (permalink / raw)
To: Tim Lewis
Cc: gregkh, linux-kernel, linux-usb, mathias.nyman, niklas.neronin,
stable, regressions, Salvatore Bonaccorso
> > Not on 6.1.y:
> drivers/usb/host/xhci-ring.c:2644:31: error: ~@~Xir~@~Y undeclared
> (first use in this fuunction); did you mean ~@~Xidr~@~Y?
> 2644 | inc_deq(xhci, ir->event_ring);
Sorry, I didn't notice that v6.1 is different. You need this instead:
inc_deq(xhci, xhci->event_ring);
Another refactor interfering with backporting :(
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-14 16:05 ` Salvatore Bonaccorso
2024-07-14 20:41 ` Tim Lewis
@ 2024-07-15 5:45 ` Greg KH
2024-07-15 8:29 ` Greg KH
1 sibling, 1 reply; 10+ messages in thread
From: Greg KH @ 2024-07-15 5:45 UTC (permalink / raw)
To: Salvatore Bonaccorso
Cc: Michał Pecio, elatllat, linux-kernel, linux-usb,
mathias.nyman, niklas.neronin, stable, regressions
On Sun, Jul 14, 2024 at 06:05:25PM +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> On Sun, Jul 14, 2024 at 05:32:39PM +0200, Michał Pecio wrote:
> > This looks like bug 219039, please see if my suggested solution works.
> >
> > The upstream commit is correct, because the call to inc_deq() has been
> > moved outside handle_tx_event() so there is no longer this critical
> > difference between doing 'goto cleanup' and 'return 0'. The intended
> > change of this commit also makes sense to me.
> >
> > This refactor is already present in v6.9 so I don't think the commit
> > will have any effect besides fixing the isochronous bug which it is
> > meant to fix.
> >
> > But it is not present in v6.6 and v6.1, so they break/crash/hang/etc.
> > Symptoms may vary, but I believe the root cause is the same because the
> > code is visibly wrong.
> >
> >
> > I would like to use this opportunity to point out that the xhci driver
> > is currenty undergoing (much needed IMO) cleanups and refactors and
> > this is not the first time when a naive, verbatim backport is attempted
> > of a patch which works fine on upstream, but causes problems on earlier
> > kernels. These things need special scrutiny, beyond just "CC:stable".
>
> For tracking I guess this should go as well to the regressions list?
>
> #regzbot introduced: 948554f1bb16e15b90006c109c3a558c66d4c4ac
> #regzbot title: freezes on plugging USB connector due to 948554f1bb16 ("usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB")
> #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=219039
>
> Thorsten I hope I got the most bits correctly, how would one inform
> regzbot about the regresssion for 6.1.98 and 6.6.39 but not happening
> in the upper versions?
I'll handle this and go release new kernels with just this reverted in
it. Let my morning coffee kick in first...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6.1 000/102] 6.1.98-rc1 review
2024-07-15 5:45 ` Greg KH
@ 2024-07-15 8:29 ` Greg KH
0 siblings, 0 replies; 10+ messages in thread
From: Greg KH @ 2024-07-15 8:29 UTC (permalink / raw)
To: Salvatore Bonaccorso
Cc: Michał Pecio, elatllat, linux-kernel, linux-usb,
mathias.nyman, niklas.neronin, stable, regressions
On Mon, Jul 15, 2024 at 07:45:07AM +0200, Greg KH wrote:
> On Sun, Jul 14, 2024 at 06:05:25PM +0200, Salvatore Bonaccorso wrote:
> > Hi,
> >
> > On Sun, Jul 14, 2024 at 05:32:39PM +0200, Michał Pecio wrote:
> > > This looks like bug 219039, please see if my suggested solution works.
> > >
> > > The upstream commit is correct, because the call to inc_deq() has been
> > > moved outside handle_tx_event() so there is no longer this critical
> > > difference between doing 'goto cleanup' and 'return 0'. The intended
> > > change of this commit also makes sense to me.
> > >
> > > This refactor is already present in v6.9 so I don't think the commit
> > > will have any effect besides fixing the isochronous bug which it is
> > > meant to fix.
> > >
> > > But it is not present in v6.6 and v6.1, so they break/crash/hang/etc.
> > > Symptoms may vary, but I believe the root cause is the same because the
> > > code is visibly wrong.
> > >
> > >
> > > I would like to use this opportunity to point out that the xhci driver
> > > is currenty undergoing (much needed IMO) cleanups and refactors and
> > > this is not the first time when a naive, verbatim backport is attempted
> > > of a patch which works fine on upstream, but causes problems on earlier
> > > kernels. These things need special scrutiny, beyond just "CC:stable".
> >
> > For tracking I guess this should go as well to the regressions list?
> >
> > #regzbot introduced: 948554f1bb16e15b90006c109c3a558c66d4c4ac
> > #regzbot title: freezes on plugging USB connector due to 948554f1bb16 ("usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB")
> > #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=219039
> >
> > Thorsten I hope I got the most bits correctly, how would one inform
> > regzbot about the regresssion for 6.1.98 and 6.6.39 but not happening
> > in the upper versions?
>
> I'll handle this and go release new kernels with just this reverted in
> it. Let my morning coffee kick in first...
Should all now be fixed in the 6.6.40 and 6.1.99 releases. If not,
please let me know.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-07-15 8:29 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-13 17:52 [PATCH 6.1 000/102] 6.1.98-rc1 review Tim Lewis
2024-07-14 6:30 ` Greg KH
2024-07-14 12:51 ` Tim Lewis
2024-07-14 14:45 ` Tim Lewis
2024-07-14 15:32 ` Michał Pecio
2024-07-14 16:05 ` Salvatore Bonaccorso
2024-07-14 20:41 ` Tim Lewis
2024-07-14 20:54 ` Michał Pecio
2024-07-15 5:45 ` Greg KH
2024-07-15 8:29 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox