* USB storage SCSI EH oops
@ 2012-04-14 22:18 Linus Torvalds
2012-04-14 22:29 ` Linus Torvalds
2012-04-15 3:01 ` Martin K. Petersen
0 siblings, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2012-04-14 22:18 UTC (permalink / raw)
To: Martin K. Petersen, James Bottomley, Jens Axboe,
Greg Kroah-Hartman
Cc: linux-kernel@vger.kernel.org
So I got the appended NULL pointer dereference with current -git (plus
the RCU patches I'm testing, but they seem unrelated)..
The stack still has signs of some USB urb dequeue, so I suspect that
is related, even if it isn't in the actual stack frame trace (and this
is an example of why stack traces using pure frame pointer information
may not always be a good idea - the stale stack things that show up
can be interesting).
It triggered when I inserted an SD card into my Dell monitor - as I
decided to finally try to use that reader. Obviously there's some
problem with it, but oopsing certainly isn't the solution.
The NULL pointer dereference is the "rq_disk->private_data" access -
it looks like rq_disk is NULL. This is all part of the whole
scsi_cmd_to_driver() thing.
I've added to the participants list the relevant parties, but I really
think the bug was introduced in commit 18a4d0a22ed6 ("[SCSI] Handle
disk devices which can not process medium access commands") which is
new since 3.3. That's the thing that added that whole
"scsi_cmd_to_driver()" call to the error path, and I suspect that the
problem is that it's simply not appropriate at that level. Presumably
the whole "rq_disk" thing is only set up much later, after the disk
has actually been recognized.
Please check this out asap.
Linus
---
usb 2-1.4.1.1: reset high-speed USB device number 7 using ehci_hcd
BUG: unable to handle kernel NULL pointer dereference at 0000000000000220
IP: [<ffffffff813e43c1>] scsi_send_eh_cmnd+0x41/0x2d0
PGD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 2
Pid: 18579, comm: scsi_eh_9 Not tainted 3.4.0-rc2-00348-g7bcaa30035d1 #14 System manufacturer System Prod
RIP: 0010:[<ffffffff813e43c1>] [<ffffffff813e43c1>] scsi_send_eh_cmnd+0x41/0x2d0
RSP: 0018:ffff8801fb1e1cf0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff880220914800 RCX: 0000000000000006
RDX: ffffffff81d95668 RSI: ffff8801fb1e1d00 RDI: ffff880220914800
RBP: ffff8801fb1e1dc0 R08: 0000000000000000 R09: dead000000100100
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8801fb1e1e90
R13: ffff8801fb1e1d70 R14: 0000000000000006 R15: ffffffff81d95668
FS: 0000000000000000(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000220 CR3: 0000000001c0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_9 (pid: 18579, threadinfo ffff8801fb1e0000, task ffff880231cc5e00)
Stack:
000009c4fb1e1d10 ffffffff81444f2e 0000000000000000 0000000000000082
ffff8801fb1e1d60 ffffffff814458cb ffff8801fb1e0000 ffff8801ffffff98
ffff880236875240 ffff88023263cdc8 7fffffffffffffff ffff88023263cdd0
Call Trace:
[<ffffffff81444f2e>] ? unlink_async+0x7e/0x80
[<ffffffff814458cb>] ? ehci_urb_dequeue+0x8b/0xf0
[<ffffffff816ae161>] ? wait_for_common+0x121/0x150
[<ffffffff813e46e5>] scsi_eh_tur+0x25/0x80
[<ffffffff813e47b0>] scsi_eh_test_devices+0x70/0x190
[<ffffffff813e5679>] scsi_error_handler+0x419/0x480
[<ffffffff813e5260>] ? scsi_eh_get_sense+0x100/0x100
[<ffffffff813e5260>] ? scsi_eh_get_sense+0x100/0x100
[<ffffffff8104ad96>] kthread+0x96/0xa0
[<ffffffff816b1854>] kernel_thread_helper+0x4/0x10
[<ffffffff8104ad00>] ? kthread_flush_work_fn+0x10/0x10
[<ffffffff816b1850>] ? gs_change+0xb/0xb
Code: d6 41 54 4c 8d 6d b0 53 48 89 fb 48 81 ec a8 00 00 00 89 8d 34 ff ff ff 48 8b 87 80 00 00 00 89 d1
RIP [<ffffffff813e43c1>] scsi_send_eh_cmnd+0x41/0x2d0
RSP <ffff8801fb1e1cf0>
CR2: 0000000000000220
---[ end trace e9fb437b88cc7b45 ]---
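A note on the fault address: the oops reports CR2 = 0x220 rather than 0, which is what a read of a struct member through a NULL pointer looks like, since the CPU touches base + member offset. The following is a minimal user-space sketch of that arithmetic. `struct fake_gendisk` and `private_data_addr()` are hypothetical names, and the 0x220 padding is an assumption chosen to match the address in the trace, not the real layout of struct gendisk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gendisk; NOT the real kernel layout.
 * The padding is chosen only so that private_data lands at offset 0x220,
 * matching the CR2 value in the oops. */
struct fake_gendisk {
	char other_fields[0x220];
	void *private_data;
};

/* Address the CPU would try to read for disk->private_data, computed
 * from the base address without dereferencing anything. */
static uintptr_t private_data_addr(uintptr_t disk_base)
{
	return disk_base + offsetof(struct fake_gendisk, private_data);
}
```

With a NULL base, private_data_addr(0) evaluates to 0x220 for this made-up layout, which is why a small non-zero fault address usually points at a NULL struct pointer rather than a NULL member.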
* Re: USB storage SCSI EH oops
2012-04-14 22:18 USB storage SCSI EH oops Linus Torvalds
@ 2012-04-14 22:29 ` Linus Torvalds
2012-04-14 22:49 ` Linus Torvalds
2012-04-15 3:01 ` Martin K. Petersen
1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2012-04-14 22:29 UTC (permalink / raw)
To: Martin K. Petersen, James Bottomley, Jens Axboe,
Greg Kroah-Hartman
Cc: linux-kernel@vger.kernel.org
On Sat, Apr 14, 2012 at 3:18 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I've added to the participants list the relevant parties, but I really
> think the bug was introduced in commit 18a4d0a22ed6 ("[SCSI] Handle
> disk devices which can not process medium access commands") which is
> new since 3.3. That's the thing that added that whole
> "scsi_cmd_to_driver()" call to the error path, and I suspect that the
> problem is that it's simply not appropriate at that level. Presumably
> the whole "rq_disk" thing is only set up much later, after the disk
> has actually been recognized.
Confirmed.
I tested twice: with that patch, the oops is repeatable, and happens
something like 30 seconds after plugging the USB thing into the
monitor.
With that patch reverted, the thing still doesn't *work*, but I don't
get the oops. Instead, I get the appended noise in my dmesg..
The oops seems to happen immediately after it does the first "reset
high-speed USB device".
I am currently planning on just reverting that commit, unless somebody
sends me a fix asap.
Linus
---
[ 37.801911] usb 2-1.4: new high-speed USB device number 5 using ehci_hcd
[ 37.894129] usb 2-1.4: New USB device found, idVendor=0424, idProduct=2502
[ 37.894135] usb 2-1.4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 37.894394] hub 2-1.4:1.0: USB hub found
[ 37.894592] hub 2-1.4:1.0: 2 ports detected
[ 38.165342] usb 2-1.4.1: new high-speed USB device number 6 using ehci_hcd
[ 38.257582] usb 2-1.4.1: New USB device found, idVendor=0424, idProduct=2602
[ 38.257588] usb 2-1.4.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 38.257834] hub 2-1.4.1:1.0: USB hub found
[ 38.258027] hub 2-1.4.1:1.0: 4 ports detected
[ 38.528797] usb 2-1.4.1.1: new high-speed USB device number 7 using ehci_hcd
[ 38.707359] usb 2-1.4.1.1: New USB device found, idVendor=0424, idProduct=2228
[ 38.707365] usb 2-1.4.1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 38.707370] usb 2-1.4.1.1: Product: Flash Card Reader
[ 38.707373] usb 2-1.4.1.1: Manufacturer: Generic
[ 38.707376] usb 2-1.4.1.1: SerialNumber: 26020128B005
[ 38.707684] scsi9 : usb-storage 2-1.4.1.1:1.0
[ 39.712828] scsi 9:0:0:0: Direct-Access Generic Flash HS-CF 4.44 PQ: 0 ANSI: 0
[ 39.712976] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ 39.719767] scsi 9:0:0:1: Direct-Access Generic Flash HS-COMBO 4.44 PQ: 0 ANSI: 0
[ 39.719855] sd 9:0:0:1: Attached scsi generic sg3 type 0
[ 69.989207] usb 2-1.4.1.1: reset high-speed USB device number 7 using ehci_hcd
[ 80.181679] usb 2-1.4.1.1: reset high-speed USB device number 7 using ehci_hcd
[ 96.369007] usb 2-1.4.1.1: reset high-speed USB device number 7 using ehci_hcd
[ 96.580731] usb 2-1.4.1.1: reset high-speed USB device number 7 using ehci_hcd
[ 106.773182] usb 2-1.4.1.1: reset high-speed USB device number 7 using ehci_hcd
[ 106.911994] sd 9:0:0:1: Device offlined - not ready after error recovery
[ 106.912020] sd 9:0:0:1: rejecting I/O to offline device
[ 106.912028] sd 9:0:0:1: rejecting I/O to offline device
[ 106.912031] sd 9:0:0:1: rejecting I/O to offline device
[ 106.912034] sd 9:0:0:1: [sdc] READ CAPACITY failed
[ 106.912035] sd 9:0:0:1: [sdc] Result: hostbyte=0x01 driverbyte=0x00
[ 106.912037] sd 9:0:0:1: [sdc] Sense not available.
[ 106.912041] sd 9:0:0:1: rejecting I/O to offline device
[ 106.912044] sd 9:0:0:1: [sdc] Write Protect is off
[ 106.912045] sd 9:0:0:1: [sdc] Mode Sense: 00 00 00 00
[ 106.912047] sd 9:0:0:1: rejecting I/O to offline device
[ 106.912050] sd 9:0:0:1: [sdc] Asking for cache data failed
[ 106.912051] sd 9:0:0:1: [sdc] Assuming drive cache: write through
[ 106.912800] sd 9:0:0:1: [sdc] Attached SCSI removable disk
[ 106.922403] sd 9:0:0:0: [sdb] Attached SCSI removable disk
* Re: USB storage SCSI EH oops
2012-04-14 22:29 ` Linus Torvalds
@ 2012-04-14 22:49 ` Linus Torvalds
2012-04-18 7:53 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2012-04-14 22:49 UTC (permalink / raw)
To: Martin K. Petersen, James Bottomley, Jens Axboe,
Greg Kroah-Hartman
Cc: linux-kernel@vger.kernel.org
On Sat, Apr 14, 2012 at 3:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Confirmed.
>
> I tested twice: with that patch, the oops is repeatable, and happens
> something like 30 seconds after plugging in the USB thing into the
> monitor.
>
> With that patch reverted, the thing still doesn't *work*, but I don't
> get the oops. Instead, I get the appended noise in my dmesg..
.. and the reason that card reader has trouble seems to be that it's
just too damn old, and doesn't understand SD-HC cards. It works fine
with old SD cards.
So the reader is fine (well, apart from being too old), USB-storage is
fine, but the SCSI error handler is broken.
Even with that commit reverted, once the SCSI layer has decided to
off-line the device, you can't get it back, even if you remove the
media and insert a non-HC SD card. You have to unplug and re-plug the
reader. That seems to be a slight misfeature of SCSI error handling,
but compared to oopsing, it's minor.
Linus
* Re: USB storage SCSI EH oops
2012-04-14 22:18 USB storage SCSI EH oops Linus Torvalds
2012-04-14 22:29 ` Linus Torvalds
@ 2012-04-15 3:01 ` Martin K. Petersen
1 sibling, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2012-04-15 3:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Martin K. Petersen, James Bottomley, Jens Axboe,
Greg Kroah-Hartman, linux-kernel@vger.kernel.org
>>>>> "Linus" == Linus Torvalds <torvalds@linux-foundation.org> writes:
Linus,
Linus> So I got the appended NULL pointer dereference with current -git
Linus> (plus the RCU patches I'm testing, but they seem unrelated)..
I sent out the following patch earlier in the week but James hasn't
picked it up yet...
SCSI: Fix error handling when no ULD is attached
Commit 18a4d0a2 introduced a bug in which we would attempt to
dereference the scsi driver even when the device had no ULD attached.
Ensure that a driver is registered and make the driver accessor function
more resilient to errors during device discovery.
Reported-by: Elric Fu <elricfu1@gmail.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2cfcbff..386f0c5 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -835,7 +835,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	scsi_eh_restore_cmnd(scmd, &ses);
-	if (sdrv->eh_action)
+	if (sdrv && sdrv->eh_action)
 		rtn = sdrv->eh_action(scmd, cmnd, cmnd_size, rtn);
 	return rtn;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 377df4a..1e11985 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -134,6 +134,9 @@ struct scsi_cmnd {
 static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
 {
+	if (!cmd->request->rq_disk)
+		return NULL;
+
 	return *(struct scsi_driver **)cmd->request->rq_disk->private_data;
 }
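The two guards in the patch above can be exercised outside the kernel. What follows is a simplified user-space sketch, not the kernel code: the structures are cut-down stand-ins that reuse the kernel names, `eh_action` takes a reduced argument list, and `finish_eh()` and `mark_handled()` are hypothetical helpers added for illustration.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumed shapes only). */
struct scsi_driver { int (*eh_action)(int rtn); };
struct gendisk { void *private_data; };
struct request { struct gendisk *rq_disk; };
struct scsi_cmnd { struct request *request; };

/* Mirrors the patched accessor: no disk means no upper-level driver.
 * private_data is assumed to point at an object whose first member is
 * a pointer to the driver, as the cast in the patch implies. */
static struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
{
	if (!cmd->request->rq_disk)
		return NULL;

	return *(struct scsi_driver **)cmd->request->rq_disk->private_data;
}

/* Mirrors the patched call site: call eh_action only if both the
 * driver and the hook exist, otherwise pass the result through. */
static int finish_eh(struct scsi_cmnd *cmd, int rtn)
{
	struct scsi_driver *sdrv = scsi_cmd_to_driver(cmd);

	if (sdrv && sdrv->eh_action)
		rtn = sdrv->eh_action(rtn);

	return rtn;
}

/* Example hook used only to make the driver path observable. */
static int mark_handled(int rtn)
{
	return rtn + 1;
}
```

A command whose request has no rq_disk passes its result through finish_eh() unchanged, which corresponds to the device-discovery case described in the commit message. Only a command with an attached driver reaches the hook.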
* Re: USB storage SCSI EH oops
2012-04-14 22:49 ` Linus Torvalds
@ 2012-04-18 7:53 ` James Bottomley
2012-04-18 7:56 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2012-04-18 7:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Martin K. Petersen, Jens Axboe, Greg Kroah-Hartman,
linux-kernel@vger.kernel.org
On Sat, 2012-04-14 at 15:49 -0700, Linus Torvalds wrote:
> On Sat, Apr 14, 2012 at 3:29 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Confirmed.
> >
> > I tested twice: with that patch, the oops is repeatable, and happens
> > something like 30 seconds after plugging in the USB thing into the
> > monitor.
> >
> > With that patch reverted, the thing still doesn't *work*, but I don't
> > get the oops. Instead, I get the appended noise in my dmesg..
>
> .. and the reason that card reader has trouble seems to be that it's
> just too damn old, and doesn't understand SD-HC cards. It works fine
> with old SD cards.
>
> So the reader is fine (well, apart from being too old), USB-storage is
> fine, but the SCSI error handler is broken.
>
> Even with that commit reverted, once the SCSI layer has decided to
> off-line the device, you can't get it back, even if you remove the
> media and insert a non-HC SD card. You have to unplug and re-plug the
> reader. That seems to be a slight misfeature of SCSI error handling,
> but compared to oopsing, it's minor.
OK, will either queue the update or a revert.
Just on the offline device problem; after it's offlined, can you get it
back with
echo running > /sys/block/sd<x>/device/state
?
That would show we're failing to recognise the device as removable media
which is gone.
James
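For scripted testing, the echo above could also be done from a small C helper. This is a sketch under assumptions: `write_state()` is a hypothetical name, the caller supplies the full sysfs path for the disk in question, and the trailing newline mimics what echo writes.

```c
#include <stdio.h>

/* Write a state string such as "running", followed by a newline, to a
 * sysfs-style attribute file. Returns 0 on success, -1 on failure. */
static int write_state(const char *path, const char *state)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%s\n", state) > 0;
	/* Buffered write errors only surface at close time. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A failure return (for example when the path does not exist or the write is rejected) would need to be checked by the caller, just as a failing echo would be.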
* Re: USB storage SCSI EH oops
2012-04-18 7:53 ` James Bottomley
@ 2012-04-18 7:56 ` Jens Axboe
2012-04-18 7:58 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2012-04-18 7:56 UTC (permalink / raw)
To: James Bottomley
Cc: Linus Torvalds, Martin K. Petersen, Greg Kroah-Hartman,
linux-kernel@vger.kernel.org
On 04/18/2012 09:53 AM, James Bottomley wrote:
> On Sat, 2012-04-14 at 15:49 -0700, Linus Torvalds wrote:
>> On Sat, Apr 14, 2012 at 3:29 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> Confirmed.
>>>
>>> I tested twice: with that patch, the oops is repeatable, and happens
>>> something like 30 seconds after plugging in the USB thing into the
>>> monitor.
>>>
>>> With that patch reverted, the thing still doesn't *work*, but I don't
>>> get the oops. Instead, I get the appended noise in my dmesg..
>>
>> .. and the reason that card reader has trouble seems to be that it's
>> just too damn old, and doesn't understand SD-HC cards. It works fine
>> with old SD cards.
>>
>> So the reader is fine (well, apart from being too old), USB-storage is
>> fine, but the SCSI error handler is broken.
>>
>> Even with that commit reverted, once the SCSI layer has decided to
>> off-line the device, you can't get it back, even if you remove the
>> media and insert a non-HC SD card. You have to unplug and re-plug the
>> reader. That seems to be a slight misfeature of SCSI error handling,
>> but compared to oopsing, it's minor.
>
> OK, will either queue the update or a revert.
Martin's patch went into -git 4 days ago. Since Linus didn't follow up in
rage, I was assuming the issue had been resolved with that.
--
Jens Axboe
* Re: USB storage SCSI EH oops
2012-04-18 7:56 ` Jens Axboe
@ 2012-04-18 7:58 ` James Bottomley
0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2012-04-18 7:58 UTC (permalink / raw)
To: Jens Axboe
Cc: Linus Torvalds, Martin K. Petersen, Greg Kroah-Hartman,
linux-kernel@vger.kernel.org
On Wed, 2012-04-18 at 09:56 +0200, Jens Axboe wrote:
> On 04/18/2012 09:53 AM, James Bottomley wrote:
> > On Sat, 2012-04-14 at 15:49 -0700, Linus Torvalds wrote:
> >> On Sat, Apr 14, 2012 at 3:29 PM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >>>
> >>> Confirmed.
> >>>
> >>> I tested twice: with that patch, the oops is repeatable, and happens
> >>> something like 30 seconds after plugging in the USB thing into the
> >>> monitor.
> >>>
> >>> With that patch reverted, the thing still doesn't *work*, but I don't
> >>> get the oops. Instead, I get the appended noise in my dmesg..
> >>
> >> .. and the reason that card reader has trouble seems to be that it's
> >> just too damn old, and doesn't understand SD-HC cards. It works fine
> >> with old SD cards.
> >>
> >> So the reader is fine (well, apart from being too old), USB-storage is
> >> fine, but the SCSI error handler is broken.
> >>
> >> Even with that commit reverted, once the SCSI layer has decided to
> >> off-line the device, you can't get it back, even if you remove the
> >> media and insert a non-HC SD card. You have to unplug and re-plug the
> >> reader. That seems to be a slight misfeature of SCSI error handling,
> >> but compared to oopsing, it's minor.
> >
> > OK, will either queue the update or a revert.
>
> Martin's patch went into -git 4 days ago. Since Linus didn't follow up in
> rage, I was assuming the issue had been resolved with that.
Hm, OK ... I'm currently not sure the original patch is correct, but we
can sort that out as part of the normal process.
James