* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
[not found] <2b6dc4b0-71c8-28cb-383d-c01a9e896c7a@internode.on.net>
@ 2017-07-12 16:10 ` Alan Stern
2017-07-12 16:52 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Alan Stern @ 2017-07-12 16:10 UTC (permalink / raw)
To: Arthur Marsh
Cc: Christoph Hellwig, Matthew Dharm, USB list, SCSI development list
On Thu, 13 Jul 2017, Arthur Marsh wrote:
> Thanks for the advice, I've enabled CONFIG_FRAME_POINTER and managed to
> catch 1 more log triggered by issuing a
>
> blkid
>
> command after inserting a USB stick.
>
> The problem is that I can't be certain of triggering the problem when
> testing kernels using git-bisect, so my git-bisect efforts are not
> guaranteed to get a result.
> Jul 12 18:51:00 localhost kernel: [ 1206.802306] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> Jul 12 18:51:00 localhost kernel: [ 1206.802308] Modules linked in: cmac arc4 ecb md4 nls_utf8 cifs ccm dns_resolver fscache bnep bluetooth hmac drbg ansi_cprng ecdh_generic nfc rfkill cpufreq_userspace cpufreq_conservative snd_hrtimer cpufreq_powersave binfmt_misc fuse snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi_event snd_seq max6650 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi parport_pc ppdev lp parport ir_lirc_codec lirc_dev rtl2832_sdr videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media fc0012 rtl2832 i2c_mux dvb_usb_rtl28xxu dvb_usb_v2 amdkfd dvb_core rc_core edac_mce_amd radeon kvm_amd kvm irqbypass wmi_bmof snd_hda_codec_hdmi snd_emu10k1 snd_hda_intel ttm sg drm_km
s_helper snd_hda_codec snd_util_mem snd_hda_core
> Jul 12 18:51:00 localhost kernel: [ 1206.802336] snd_ac97_codec snd_rawmidi snd_seq_device snd_hwdep snd_pcm_oss drm snd_mixer_oss ac97_bus snd_pcm pcspkr evdev k10temp i2c_algo_bit serio_raw snd_timer emu10k1_gp snd gameport sp5100_tco soundcore button acpi_cpufreq asus_atk0110 wmi ext4 mbcache crc16 jbd2 crc32c_generic btrfs raid6_pq xor uas usb_storage sr_mod cdrom sd_mod hid_generic usbhid hid ata_generic ohci_pci ahci firewire_ohci libahci pata_atiixp firewire_core crc_itu_t i2c_piix4 ehci_pci ohci_hcd libata ehci_hcd usbcore scsi_mod r8169 mii
> Jul 12 18:51:00 localhost kernel: [ 1206.802359] CPU: 1 PID: 142 Comm: usb-storage Not tainted 4.12.0+ #2751
> Jul 12 18:51:00 localhost kernel: [ 1206.802359] Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701 01/27/2011
> Jul 12 18:51:00 localhost kernel: [ 1206.802361] task: ffff927c610eef80 task.stack: ffffba9501b78000
> Jul 12 18:51:00 localhost kernel: [ 1206.802366] RIP: 0010:native_queued_spin_lock_slowpath+0x159/0x1c0
> Jul 12 18:51:00 localhost kernel: [ 1206.802366] RSP: 0018:ffffba9501b7bdc8 EFLAGS: 00000002
> Jul 12 18:51:00 localhost kernel: [ 1206.802368] RAX: 0000000000000101 RBX: ffff927c662f3030 RCX: 0000000000000001
> Jul 12 18:51:00 localhost kernel: [ 1206.802368] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff927c662f3030
> Jul 12 18:51:00 localhost kernel: [ 1206.802369] RBP: ffffba9501b7bde0 R08: 0000000000000101 R09: 0000000000000100
> Jul 12 18:51:00 localhost kernel: [ 1206.802370] R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000082
> Jul 12 18:51:00 localhost kernel: [ 1206.802370] R13: 0000000000002003 R14: 0000000000001d4c R15: ffff927c610eef80
> Jul 12 18:51:00 localhost kernel: [ 1206.802371] FS: 0000000000000000(0000) GS:ffff927c6fc40000(0000) knlGS:0000000000000000
> Jul 12 18:51:00 localhost kernel: [ 1206.802372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 12 18:51:00 localhost kernel: [ 1206.802373] CR2: 00000030007995d8 CR3: 000000017b144000 CR4: 00000000000006e0
> Jul 12 18:51:00 localhost kernel: [ 1206.802374] Call Trace:
> Jul 12 18:51:00 localhost kernel: [ 1206.802378] ? _raw_spin_lock_irqsave+0x59/0x68
> Jul 12 18:51:00 localhost kernel: [ 1206.802389] scsi_eh_scmd_add+0x3f/0x150 [scsi_mod]
> Jul 12 18:51:00 localhost kernel: [ 1206.802393] scsi_softirq_done+0xb5/0x150 [scsi_mod]
> Jul 12 18:51:00 localhost kernel: [ 1206.802396] __blk_mq_complete_request+0xd4/0x160
> Jul 12 18:51:00 localhost kernel: [ 1206.802397] blk_mq_complete_request+0x18/0x20
> Jul 12 18:51:00 localhost kernel: [ 1206.802402] scsi_mq_done+0x21/0x80 [scsi_mod]
> Jul 12 18:51:00 localhost kernel: [ 1206.802405] usb_stor_control_thread+0xf4/0x250 [usb_storage]
> Jul 12 18:51:00 localhost kernel: [ 1206.802407] kthread+0x125/0x140
> Jul 12 18:51:00 localhost kernel: [ 1206.802409] ? fill_inquiry_response+0x20/0x20 [usb_storage]
> Jul 12 18:51:00 localhost kernel: [ 1206.802410] ? kthread_create_on_node+0x70/0x70
> Jul 12 18:51:00 localhost kernel: [ 1206.802412] ret_from_fork+0x25/0x30
> Jul 12 18:51:00 localhost kernel: [ 1206.802413] Code: c2 74 a2 89 c2 89 d0 66 31 c0 39 c6 74 ea 4d 85 c9 c6 07 01 74 21 41 c7 41 08 01 00 00 00 eb 85 41 ff c9 75 04 f3 c3 f3 90 8b 07 <84> c0 75 f8 66 c7 07 01 00 c3 f3 90 4c 8b 09 4d 85 c9 74 f6 eb
This is pretty conclusive. The problem comes about because
usb_stor_control_thread() calls scsi_mq_done() while holding
shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that
same lock.
I don't know why this didn't show up in earlier kernels. I guess some
element of the call chain listed above must be new in 4.12.
Christoph, what's the best way to fix this? Should usb-storage release
the host lock before issuing the ->scsi_done callback? If so, does
that change need to be applied to any kernels before 4.12?
Alan Stern
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
2017-07-12 16:10 ` CPU lock-ups with 4.12.0+ kernels related to usb_storage Alan Stern
@ 2017-07-12 16:52 ` Christoph Hellwig
2017-07-13 17:00 ` Alan Stern
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2017-07-12 16:52 UTC (permalink / raw)
To: Alan Stern
Cc: Arthur Marsh, Christoph Hellwig, Matthew Dharm, USB list,
SCSI development list, Jens Axboe
On Wed, Jul 12, 2017 at 12:10:02PM -0400, Alan Stern wrote:
> This is pretty conclusive. The problem comes about because
> usb_stor_control_thread() calls scsi_mq_done() while holding
> shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that
> same lock.
>
> I don't know why this didn't show up in earlier kernels. I guess some
> element of the call chain listed above must be new in 4.12.
>
> Christoph, what's the best way to fix this? Should usb-storage release
> the host lock before issuing the ->scsi_done callback? If so, does
> that change need to be applied to any kernels before 4.12?
4.12 switched to blk-mq by default, and while the old code used
a softirq for completions, which is always a difference context
the blk-mq code might execute in the same context it's called in.
So yes, for that we'd need to drop host_lock. But I wonder how
many more of these are lingering somewhere and if we can find
another workaround.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
2017-07-12 16:52 ` Christoph Hellwig
@ 2017-07-13 17:00 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Alan Stern @ 2017-07-13 17:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Arthur Marsh, Matthew Dharm, USB list, SCSI development list,
Jens Axboe
On Wed, 12 Jul 2017, Christoph Hellwig wrote:
> On Wed, Jul 12, 2017 at 12:10:02PM -0400, Alan Stern wrote:
> > This is pretty conclusive. The problem comes about because
> > usb_stor_control_thread() calls scsi_mq_done() while holding
> > shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that
> > same lock.
> >
> > I don't know why this didn't show up in earlier kernels. I guess some
> > element of the call chain listed above must be new in 4.12.
> >
> > Christoph, what's the best way to fix this? Should usb-storage release
> > the host lock before issuing the ->scsi_done callback? If so, does
> > that change need to be applied to any kernels before 4.12?
>
> 4.12 switched to blk-mq by default, and while the old code used
> a softirq for completions, which is always a difference context
> the blk-mq code might execute in the same context it's called in.
>
> So yes, for that we'd need to drop host_lock. But I wonder how
> many more of these are lingering somewhere and if we can find
> another workaround.
All right. In the meantime, changing usb-storage won't hurt.
Arthur, can you test the patch below?
Alan Stern
Index: usb-4.x/drivers/usb/storage/usb.c
===================================================================
--- usb-4.x.orig/drivers/usb/storage/usb.c
+++ usb-4.x/drivers/usb/storage/usb.c
@@ -315,6 +315,7 @@ static int usb_stor_control_thread(void
{
struct us_data *us = (struct us_data *)__us;
struct Scsi_Host *host = us_to_host(us);
+ struct scsi_cmnd *srb;
for (;;) {
usb_stor_dbg(us, "*** thread sleeping\n");
@@ -330,6 +331,7 @@ static int usb_stor_control_thread(void
scsi_lock(host);
/* When we are called with no command pending, we're done */
+ srb = us->srb;
if (us->srb == NULL) {
scsi_unlock(host);
mutex_unlock(&us->dev_mutex);
@@ -398,14 +400,11 @@ static int usb_stor_control_thread(void
/* lock access to the state */
scsi_lock(host);
- /* indicate that the command is done */
- if (us->srb->result != DID_ABORT << 16) {
- usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
- us->srb->result);
- us->srb->scsi_done(us->srb);
- } else {
+ /* was the command aborted? */
+ if (us->srb->result == DID_ABORT << 16) {
SkipForAbort:
usb_stor_dbg(us, "scsi command aborted\n");
+ srb = NULL; /* Don't call srb->scsi_done() */
}
/*
@@ -429,6 +428,13 @@ SkipForAbort:
/* unlock the device pointers */
mutex_unlock(&us->dev_mutex);
+
+ /* now that the locks are released, notify the SCSI core */
+ if (srb) {
+ usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
+ srb->result);
+ srb->scsi_done(srb);
+ }
} /* for (;;) */
/* Wait until we are told to stop */
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
[not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2017-07-13 18:48 ` Arthur Marsh
[not found] ` <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org>
2017-07-26 15:47 ` Christoph Hellwig
1 sibling, 1 reply; 7+ messages in thread
From: Arthur Marsh @ 2017-07-13 18:48 UTC (permalink / raw)
To: Alan Stern, Christoph Hellwig
Cc: Matthew Dharm, USB list, SCSI development list, Jens Axboe
Alan Stern wrote on 14/07/17 02:30:
> All right. In the meantime, changing usb-storage won't hurt.
> Arthur, can you test the patch below?
>
> Alan Stern
>
>
>
> Index: usb-4.x/drivers/usb/storage/usb.c
> ===================================================================
> --- usb-4.x.orig/drivers/usb/storage/usb.c
> +++ usb-4.x/drivers/usb/storage/usb.c
> @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void
> {
> struct us_data *us = (struct us_data *)__us;
> struct Scsi_Host *host = us_to_host(us);
> + struct scsi_cmnd *srb;
>
> for (;;) {
> usb_stor_dbg(us, "*** thread sleeping\n");
> @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void
> scsi_lock(host);
>
> /* When we are called with no command pending, we're done */
> + srb = us->srb;
> if (us->srb == NULL) {
> scsi_unlock(host);
> mutex_unlock(&us->dev_mutex);
> @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void
> /* lock access to the state */
> scsi_lock(host);
>
> - /* indicate that the command is done */
> - if (us->srb->result != DID_ABORT << 16) {
> - usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
> - us->srb->result);
> - us->srb->scsi_done(us->srb);
> - } else {
> + /* was the command aborted? */
> + if (us->srb->result == DID_ABORT << 16) {
> SkipForAbort:
> usb_stor_dbg(us, "scsi command aborted\n");
> + srb = NULL; /* Don't call srb->scsi_done() */
> }
>
> /*
> @@ -429,6 +428,13 @@ SkipForAbort:
>
> /* unlock the device pointers */
> mutex_unlock(&us->dev_mutex);
> +
> + /* now that the locks are released, notify the SCSI core */
> + if (srb) {
> + usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
> + srb->result);
> + srb->scsi_done(srb);
> + }
> } /* for (;;) */
>
> /* Wait until we are told to stop */
>
>
Thanks for the patch!
I have applied it and am running the resulting kernel.
As I didn't have a reproducible way to trigger the problem on demand,
I'll just have to see that there isn't a lock-up that looks related over
the next several days.
Arthur.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
[not found] ` <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org>
@ 2017-07-17 20:15 ` Arthur Marsh
0 siblings, 0 replies; 7+ messages in thread
From: Arthur Marsh @ 2017-07-17 20:15 UTC (permalink / raw)
To: Alan Stern, Christoph Hellwig
Cc: Matthew Dharm, USB list, SCSI development list, Jens Axboe
Arthur Marsh wrote on 14/07/17 04:18:
>
>
> Alan Stern wrote on 14/07/17 02:30:
>
>> All right. In the meantime, changing usb-storage won't hurt.
>> Arthur, can you test the patch below?
>>
>> Alan Stern
>>
>>
>>
>> Index: usb-4.x/drivers/usb/storage/usb.c
>> ===================================================================
>> --- usb-4.x.orig/drivers/usb/storage/usb.c
>> +++ usb-4.x/drivers/usb/storage/usb.c
>> @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void
>> {
>> struct us_data *us = (struct us_data *)__us;
>> struct Scsi_Host *host = us_to_host(us);
>> + struct scsi_cmnd *srb;
>> for (;;) {
>> usb_stor_dbg(us, "*** thread sleeping\n");
>> @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void
>> scsi_lock(host);
>> /* When we are called with no command pending, we're done */
>> + srb = us->srb;
>> if (us->srb == NULL) {
>> scsi_unlock(host);
>> mutex_unlock(&us->dev_mutex);
>> @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void
>> /* lock access to the state */
>> scsi_lock(host);
>> - /* indicate that the command is done */
>> - if (us->srb->result != DID_ABORT << 16) {
>> - usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
>> - us->srb->result);
>> - us->srb->scsi_done(us->srb);
>> - } else {
>> + /* was the command aborted? */
>> + if (us->srb->result == DID_ABORT << 16) {
>> SkipForAbort:
>> usb_stor_dbg(us, "scsi command aborted\n");
>> + srb = NULL; /* Don't call srb->scsi_done() */
>> }
>> /*
>> @@ -429,6 +428,13 @@ SkipForAbort:
>> /* unlock the device pointers */
>> mutex_unlock(&us->dev_mutex);
>> +
>> + /* now that the locks are released, notify the SCSI core */
>> + if (srb) {
>> + usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
>> + srb->result);
>> + srb->scsi_done(srb);
>> + }
>> } /* for (;;) */
>> /* Wait until we are told to stop */
Hi, just to confirm no further lock-ups occurred in the last 4 days with
this patch applied.
Arthur.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
[not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2017-07-13 18:48 ` Arthur Marsh
@ 2017-07-26 15:47 ` Christoph Hellwig
2017-07-26 15:50 ` Alan Stern
1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2017-07-26 15:47 UTC (permalink / raw)
To: Alan Stern
Cc: Christoph Hellwig, Arthur Marsh, Matthew Dharm, USB list,
SCSI development list, Jens Axboe
On Thu, Jul 13, 2017 at 01:00:29PM -0400, Alan Stern wrote:
> All right. In the meantime, changing usb-storage won't hurt.
> Arthur, can you test the patch below?
Do you plan to send this fix to Greg for 4.13-rc?
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPU lock-ups with 4.12.0+ kernels related to usb_storage
2017-07-26 15:47 ` Christoph Hellwig
@ 2017-07-26 15:50 ` Alan Stern
0 siblings, 0 replies; 7+ messages in thread
From: Alan Stern @ 2017-07-26 15:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Arthur Marsh, Matthew Dharm, USB list, SCSI development list,
Jens Axboe
On Wed, 26 Jul 2017, Christoph Hellwig wrote:
> On Thu, Jul 13, 2017 at 01:00:29PM -0400, Alan Stern wrote:
> > All right. In the meantime, changing usb-storage won't hurt.
> > Arthur, can you test the patch below?
>
> Do you plan to send this fix to Greg for 4.13-rc?
I just submitted it.
Alan Stern
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-07-26 15:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <2b6dc4b0-71c8-28cb-383d-c01a9e896c7a@internode.on.net>
2017-07-12 16:10 ` CPU lock-ups with 4.12.0+ kernels related to usb_storage Alan Stern
2017-07-12 16:52 ` Christoph Hellwig
2017-07-13 17:00 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1707131258190.1571-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2017-07-13 18:48 ` Arthur Marsh
[not found] ` <02c6c52e-d520-6d84-81e7-0b49dfae0ec9-CkBdp7X+a1oIQCUVoCVjmQ@public.gmane.org>
2017-07-17 20:15 ` Arthur Marsh
2017-07-26 15:47 ` Christoph Hellwig
2017-07-26 15:50 ` Alan Stern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox