* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage [not found] <2b6dc4b0-71c8-28cb-383d-c01a9e896c7a@internode.on.net> @ 2017-07-12 16:10 ` Alan Stern 2017-07-12 16:52 ` Christoph Hellwig 0 siblings, 1 reply; 7+ messages in thread From: Alan Stern @ 2017-07-12 16:10 UTC (permalink / raw) To: Arthur Marsh Cc: Christoph Hellwig, Matthew Dharm, USB list, SCSI development list On Thu, 13 Jul 2017, Arthur Marsh wrote: > Thanks for the advice, I've enabled CONFIG_FRAME_POINTER and managed to > catch 1 more log triggered by issuing a > > blkid > > command after inserting a USB stick. > > The problem is that I can't be certain of triggering the problem when > testing kernels using git-bisect, so my git-bisect efforts are not > guaranteed to get a result. > Jul 12 18:51:00 localhost kernel: [ 1206.802306] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 > Jul 12 18:51:00 localhost kernel: [ 1206.802308] Modules linked in: cmac arc4 ecb md4 nls_utf8 cifs ccm dns_resolver fscache bnep bluetooth hmac drbg ansi_cprng ecdh_generic nfc rfkill cpufreq_userspace cpufreq_conservative snd_hrtimer cpufreq_powersave binfmt_misc fuse snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi_event snd_seq max6650 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi parport_pc ppdev lp parport ir_lirc_codec lirc_dev rtl2832_sdr videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media fc0012 rtl2832 i2c_mux dvb_usb_rtl28xxu dvb_usb_v2 amdkfd dvb_core rc_core edac_mce_amd radeon kvm_amd kvm irqbypass wmi_bmof snd_hda_codec_hdmi snd_emu10k1 snd_hda_intel ttm sg drm_km s_helper snd_hda_codec snd_util_mem snd_hda_core > Jul 12 18:51:00 localhost kernel: [ 1206.802336] snd_ac97_codec snd_rawmidi snd_seq_device snd_hwdep snd_pcm_oss drm snd_mixer_oss ac97_bus snd_pcm pcspkr evdev k10temp i2c_algo_bit serio_raw snd_timer emu10k1_gp snd gameport sp5100_tco soundcore button acpi_cpufreq asus_atk0110 wmi ext4 mbcache crc16 jbd2 crc32c_generic btrfs raid6_pq xor uas usb_storage sr_mod cdrom sd_mod hid_generic usbhid hid ata_generic ohci_pci ahci firewire_ohci libahci pata_atiixp firewire_core crc_itu_t i2c_piix4 ehci_pci ohci_hcd libata ehci_hcd usbcore scsi_mod r8169 mii > Jul 12 18:51:00 localhost kernel: [ 1206.802359] CPU: 1 PID: 142 Comm: usb-storage Not tainted 4.12.0+ #2751 > Jul 12 18:51:00 localhost kernel: [ 1206.802359] Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701 01/27/2011 > Jul 12 18:51:00 localhost kernel: [ 1206.802361] task: ffff927c610eef80 task.stack: ffffba9501b78000 > Jul 12 18:51:00 localhost kernel: [ 1206.802366] RIP: 0010:native_queued_spin_lock_slowpath+0x159/0x1c0 > Jul 12 18:51:00 localhost kernel: [ 1206.802366] RSP: 0018:ffffba9501b7bdc8 EFLAGS: 00000002 > Jul 12 18:51:00 localhost kernel: [ 1206.802368] RAX: 0000000000000101 RBX: ffff927c662f3030 RCX: 0000000000000001 > Jul 12 18:51:00 localhost kernel: [ 1206.802368] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff927c662f3030 > Jul 12 18:51:00 localhost kernel: [ 1206.802369] RBP: ffffba9501b7bde0 R08: 0000000000000101 R09: 0000000000000100 > Jul 12 18:51:00 localhost kernel: [ 1206.802370] R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000082 > Jul 12 18:51:00 localhost kernel: [ 1206.802370] R13: 0000000000002003 R14: 0000000000001d4c R15: ffff927c610eef80 > Jul 12 18:51:00 localhost kernel: [ 1206.802371] FS: 0000000000000000(0000) GS:ffff927c6fc40000(0000) knlGS:0000000000000000 > Jul 12 18:51:00 localhost kernel: [ 1206.802372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Jul 12 18:51:00 localhost kernel: [ 1206.802373] CR2: 00000030007995d8 CR3: 000000017b144000 CR4: 00000000000006e0 > Jul 12 18:51:00 localhost kernel: [ 1206.802374] Call Trace: > Jul 12 18:51:00 localhost kernel: [ 1206.802378] ? _raw_spin_lock_irqsave+0x59/0x68 > Jul 12 18:51:00 localhost kernel: [ 1206.802389] scsi_eh_scmd_add+0x3f/0x150 [scsi_mod] > Jul 12 18:51:00 localhost kernel: [ 1206.802393] scsi_softirq_done+0xb5/0x150 [scsi_mod] > Jul 12 18:51:00 localhost kernel: [ 1206.802396] __blk_mq_complete_request+0xd4/0x160 > Jul 12 18:51:00 localhost kernel: [ 1206.802397] blk_mq_complete_request+0x18/0x20 > Jul 12 18:51:00 localhost kernel: [ 1206.802402] scsi_mq_done+0x21/0x80 [scsi_mod] > Jul 12 18:51:00 localhost kernel: [ 1206.802405] usb_stor_control_thread+0xf4/0x250 [usb_storage] > Jul 12 18:51:00 localhost kernel: [ 1206.802407] kthread+0x125/0x140 > Jul 12 18:51:00 localhost kernel: [ 1206.802409] ? fill_inquiry_response+0x20/0x20 [usb_storage] > Jul 12 18:51:00 localhost kernel: [ 1206.802410] ? kthread_create_on_node+0x70/0x70 > Jul 12 18:51:00 localhost kernel: [ 1206.802412] ret_from_fork+0x25/0x30 > Jul 12 18:51:00 localhost kernel: [ 1206.802413] Code: c2 74 a2 89 c2 89 d0 66 31 c0 39 c6 74 ea 4d 85 c9 c6 07 01 74 21 41 c7 41 08 01 00 00 00 eb 85 41 ff c9 75 04 f3 c3 f3 90 8b 07 <84> c0 75 f8 66 c7 07 01 00 c3 f3 90 4c 8b 09 4d 85 c9 74 f6 eb This is pretty conclusive. The problem comes about because usb_stor_control_thread() calls scsi_mq_done() while holding shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that same lock. I don't know why this didn't show up in earlier kernels. I guess some element of the call chain listed above must be new in 4.12. Christoph, what's the best way to fix this? Should usb-storage release the host lock before issuing the ->scsi_done callback? If so, does that change need to be applied to any kernels before 4.12? Alan Stern ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage 2017-07-12 16:10 ` CPU lock-ups with 4.12.0+ kernels related to usb_storage Alan Stern @ 2017-07-12 16:52 ` Christoph Hellwig 2017-07-13 17:00 ` Alan Stern 0 siblings, 1 reply; 7+ messages in thread From: Christoph Hellwig @ 2017-07-12 16:52 UTC (permalink / raw) To: Alan Stern Cc: Arthur Marsh, Christoph Hellwig, Matthew Dharm, USB list, SCSI development list, Jens Axboe On Wed, Jul 12, 2017 at 12:10:02PM -0400, Alan Stern wrote: > This is pretty conclusive. The problem comes about because > usb_stor_control_thread() calls scsi_mq_done() while holding > shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that > same lock. > > I don't know why this didn't show up in earlier kernels. I guess some > element of the call chain listed above must be new in 4.12. > > Christoph, what's the best way to fix this? Should usb-storage release > the host lock before issuing the ->scsi_done callback? If so, does > that change need to be applied to any kernels before 4.12? 4.12 switched to blk-mq by default, and while the old code used a softirq for completions, which is always a difference context the blk-mq code might execute in the same context it's called in. So yes, for that we'd need to drop host_lock. But I wonder how many more of these are lingering somewhere and if we can find another workaround. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage 2017-07-12 16:52 ` Christoph Hellwig @ 2017-07-13 17:00 ` Alan Stern [not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Alan Stern @ 2017-07-13 17:00 UTC (permalink / raw) To: Christoph Hellwig Cc: Arthur Marsh, Matthew Dharm, USB list, SCSI development list, Jens Axboe On Wed, 12 Jul 2017, Christoph Hellwig wrote: > On Wed, Jul 12, 2017 at 12:10:02PM -0400, Alan Stern wrote: > > This is pretty conclusive. The problem comes about because > > usb_stor_control_thread() calls scsi_mq_done() while holding > > shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that > > same lock. > > > > I don't know why this didn't show up in earlier kernels. I guess some > > element of the call chain listed above must be new in 4.12. > > > > Christoph, what's the best way to fix this? Should usb-storage release > > the host lock before issuing the ->scsi_done callback? If so, does > > that change need to be applied to any kernels before 4.12? > > 4.12 switched to blk-mq by default, and while the old code used > a softirq for completions, which is always a difference context > the blk-mq code might execute in the same context it's called in. > > So yes, for that we'd need to drop host_lock. But I wonder how > many more of these are lingering somewhere and if we can find > another workaround. All right. In the meantime, changing usb-storage won't hurt. Arthur, can you test the patch below? Alan Stern Index: usb-4.x/drivers/usb/storage/usb.c =================================================================== --- usb-4.x.orig/drivers/usb/storage/usb.c +++ usb-4.x/drivers/usb/storage/usb.c @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void { struct us_data *us = (struct us_data *)__us; struct Scsi_Host *host = us_to_host(us); + struct scsi_cmnd *srb; for (;;) { usb_stor_dbg(us, "*** thread sleeping\n"); @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void scsi_lock(host); /* When we are called with no command pending, we're done */ + srb = us->srb; if (us->srb == NULL) { scsi_unlock(host); mutex_unlock(&us->dev_mutex); @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void /* lock access to the state */ scsi_lock(host); - /* indicate that the command is done */ - if (us->srb->result != DID_ABORT << 16) { - usb_stor_dbg(us, "scsi cmd done, result=0x%x\n", - us->srb->result); - us->srb->scsi_done(us->srb); - } else { + /* was the command aborted? */ + if (us->srb->result == DID_ABORT << 16) { SkipForAbort: usb_stor_dbg(us, "scsi command aborted\n"); + srb = NULL; /* Don't call srb->scsi_done() */ } /* @@ -429,6 +428,13 @@ SkipForAbort: /* unlock the device pointers */ mutex_unlock(&us->dev_mutex); + + /* now that the locks are released, notify the SCSI core */ + if (srb) { + usb_stor_dbg(us, "scsi cmd done, result=0x%x\n", + srb->result); + srb->scsi_done(srb); + } } /* for (;;) */ /* Wait until we are told to stop */ ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>]
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage [not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org> @ 2017-07-13 18:48 ` Arthur Marsh [not found] ` <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org> 2017-07-26 15:47 ` Christoph Hellwig 1 sibling, 1 reply; 7+ messages in thread From: Arthur Marsh @ 2017-07-13 18:48 UTC (permalink / raw) To: Alan Stern, Christoph Hellwig Cc: Matthew Dharm, USB list, SCSI development list, Jens Axboe Alan Stern wrote on 14/07/17 02:30: > All right. In the meantime, changing usb-storage won't hurt. > Arthur, can you test the patch below? > > Alan Stern > > > > Index: usb-4.x/drivers/usb/storage/usb.c > =================================================================== > --- usb-4.x.orig/drivers/usb/storage/usb.c > +++ usb-4.x/drivers/usb/storage/usb.c > @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void > { > struct us_data *us = (struct us_data *)__us; > struct Scsi_Host *host = us_to_host(us); > + struct scsi_cmnd *srb; > > for (;;) { > usb_stor_dbg(us, "*** thread sleeping\n"); > @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void > scsi_lock(host); > > /* When we are called with no command pending, we're done */ > + srb = us->srb; > if (us->srb == NULL) { > scsi_unlock(host); > mutex_unlock(&us->dev_mutex); > @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void > /* lock access to the state */ > scsi_lock(host); > > - /* indicate that the command is done */ > - if (us->srb->result != DID_ABORT << 16) { > - usb_stor_dbg(us, "scsi cmd done, result=0x%x\n", > - us->srb->result); > - us->srb->scsi_done(us->srb); > - } else { > + /* was the command aborted? */ > + if (us->srb->result == DID_ABORT << 16) { > SkipForAbort: > usb_stor_dbg(us, "scsi command aborted\n"); > + srb = NULL; /* Don't call srb->scsi_done() */ > } > > /* > @@ -429,6 +428,13 @@ SkipForAbort: > > /* unlock the device pointers */ > mutex_unlock(&us->dev_mutex); > + > + /* now that the locks are released, notify the SCSI core */ > + if (srb) { > + usb_stor_dbg(us, "scsi cmd done, result=0x%x\n", > + srb->result); > + srb->scsi_done(srb); > + } > } /* for (;;) */ > > /* Wait until we are told to stop */ > > Thanks for the patch! I have applied it and am running the resulting kernel. As I didn't have a reproducible way to trigger the problem on demand, I'll just have to see that there isn't a lock-up that looks related over the next several days. Arthur. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org>]
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage [not found] ` <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org> @ 2017-07-17 20:15 ` Arthur Marsh 0 siblings, 0 replies; 7+ messages in thread From: Arthur Marsh @ 2017-07-17 20:15 UTC (permalink / raw) To: Alan Stern, Christoph Hellwig Cc: Matthew Dharm, USB list, SCSI development list, Jens Axboe Arthur Marsh wrote on 14/07/17 04:18: > > > Alan Stern wrote on 14/07/17 02:30: > >> All right. In the meantime, changing usb-storage won't hurt. >> Arthur, can you test the patch below? >> >> Alan Stern >> >> >> >> Index: usb-4.x/drivers/usb/storage/usb.c >> =================================================================== >> --- usb-4.x.orig/drivers/usb/storage/usb.c >> +++ usb-4.x/drivers/usb/storage/usb.c >> @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void >> { >> struct us_data *us = (struct us_data *)__us; >> struct Scsi_Host *host = us_to_host(us); >> + struct scsi_cmnd *srb; >> for (;;) { >> usb_stor_dbg(us, "*** thread sleeping\n"); >> @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void >> scsi_lock(host); >> /* When we are called with no command pending, we're done */ >> + srb = us->srb; >> if (us->srb == NULL) { >> scsi_unlock(host); >> mutex_unlock(&us->dev_mutex); >> @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void >> /* lock access to the state */ >> scsi_lock(host); >> - /* indicate that the command is done */ >> - if (us->srb->result != DID_ABORT << 16) { >> - usb_stor_dbg(us, "scsi cmd done, result=0x%x\n", >> - us->srb->result); >> - us->srb->scsi_done(us->srb); >> - } else { >> + /* was the command aborted? */ >> + if (us->srb->result == DID_ABORT << 16) { >> SkipForAbort: >> usb_stor_dbg(us, "scsi command aborted\n"); >> + srb = NULL; /* Don't call srb->scsi_done() */ >> } >> /* >> @@ -429,6 +428,13 @@ SkipForAbort: >> /* unlock the device pointers */ >> mutex_unlock(&us->dev_mutex); >> + >> + /* now that the locks are released, notify the SCSI core */ >> + if (srb) { >> + usb_stor_dbg(us, "scsi cmd done, result=0x%x\n", >> + srb->result); >> + srb->scsi_done(srb); >> + } >> } /* for (;;) */ >> /* Wait until we are told to stop */ Hi, just to confirm no further lock-ups occurred in the last 4 days with this patch applied. Arthur. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage [not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org> 2017-07-13 18:48 ` Arthur Marsh @ 2017-07-26 15:47 ` Christoph Hellwig 2017-07-26 15:50 ` Alan Stern 1 sibling, 1 reply; 7+ messages in thread From: Christoph Hellwig @ 2017-07-26 15:47 UTC (permalink / raw) To: Alan Stern Cc: Christoph Hellwig, Arthur Marsh, Matthew Dharm, USB list, SCSI development list, Jens Axboe On Thu, Jul 13, 2017 at 01:00:29PM -0400, Alan Stern wrote: > All right. In the meantime, changing usb-storage won't hurt. > Arthur, can you test the patch below? Do you plan to send this fix to Greg for 4.13-rc? -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage 2017-07-26 15:47 ` Christoph Hellwig @ 2017-07-26 15:50 ` Alan Stern 0 siblings, 0 replies; 7+ messages in thread From: Alan Stern @ 2017-07-26 15:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Arthur Marsh, Matthew Dharm, USB list, SCSI development list, Jens Axboe On Wed, 26 Jul 2017, Christoph Hellwig wrote: > On Thu, Jul 13, 2017 at 01:00:29PM -0400, Alan Stern wrote: > > All right. In the meantime, changing usb-storage won't hurt. > > Arthur, can you test the patch below? > > Do you plan to send this fix to Greg for 4.13-rc? I just submitted it. Alan Stern ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-07-26 15:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <2b6dc4b0-71c8-28cb-383d-c01a9e896c7a@internode.on.net>
2017-07-12 16:10 ` CPU lock-ups with 4.12.0+ kernels related to usb_storage Alan Stern
2017-07-12 16:52 ` Christoph Hellwig
2017-07-13 17:00 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2017-07-13 18:48 ` Arthur Marsh
[not found] ` <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org>
2017-07-17 20:15 ` Arthur Marsh
2017-07-26 15:47 ` Christoph Hellwig
2017-07-26 15:50 ` Alan Stern
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox