* Re: [BUG] unable to handle kernel paging request in next-20080516 (scsi_bus_uevent)
From: Sitsofe Wheeler @ 2008-05-18 11:22 UTC
To: linux-kernel; +Cc: linux-scsi
On Sun, 18 May 2008 02:14:23 -0700, Andrew Morton wrote:
> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
>
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a
>> panic will occur. At first I thought it might be provoked by vga=0x164
>> but this does not appear to be the case and the issue is seemingly
>> random. I've hand transcribed the oops so there may be errors in it but
>> hopefully it will still help:
>>
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>>
> I thought we'd already fixed this?
Thanks to your tip-off I've found that this bug is already in Bugzilla
(complete with the commit that caused the regression):
http://bugzilla.kernel.org/show_bug.cgi?id=10711
There's nothing there that says it has been fixed, though. I'll look
harder before reporting problems next time.
--
Sitsofe | http://sucs.org/~sits/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] unable to handle kernel paging request in next-20080516
From: Sitsofe Wheeler @ 2008-05-18 16:00 UTC
To: linux-kernel; +Cc: linux-scsi
Andrew Morton wrote:
> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
> <sitsofe@yahoo.com> wrote:
>
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
>> will occur. At first I thought it might be provoked by vga=0x164 but this
>> does not appear to be the case and the issue is seemingly random. I've
>> hand transcribed the oops so there may be errors in it but hopefully it
>> will still help:
>>
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>>
>
> I thought we'd already fixed this?
If it hasn't yet been fixed, I think it can be narrowed down to
[dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource().
Be aware that the problem also seems to go away if an initrd file is present. I
struggled to revert this commit against the latest linux-next due to conflicts.
Here's the commit message:
Author: Bjorn Helgaas <bjorn.helgaas@hp.com> 2008-04-28 23:34:35
Committer: Len Brown <len.brown@intel.com> 2008-04-29 08:22:28
Child: cc8c2e308194f0997c718c7c735550ff06754d20 (PNP: make generic pnp_add_io_resource())
Branches: v2.6.26rc1, remotes/origin/master, remotes/linux-next/stable, remotes/linux-next/master, remotes/linux-next/history, master, linux-next, bisect
Follows: v2.6.25
Precedes: v2.6.26-rc1, next-20080502, next-20080501, next-20080430
PNP: make generic pnp_add_dma_resource()
Add a pnp_add_dma_resource() that can be used by all the PNP
backends. This consolidates a little more pnp_resource_table
knowledge into one place.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Here's the git-bisect log:
# bad: [2ddcca36c8bcfa251724fe342c8327451988be0d] Linux 2.6.26-rc1
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git-bisect start 'v2.6.26-rc1' 'v2.6.25'
# good: [7ae44cfa7ab29b277691327e8de790d7b880722f] [ALSA] snd-powermac: style awacs.s and awacs.h
git-bisect good 7ae44cfa7ab29b277691327e8de790d7b880722f
# good: [c60264c494a119cd3a716a22edc0137b11de6d1e] smack: fix integer as NULL pointer warning in smack_lsm.c
git-bisect good c60264c494a119cd3a716a22edc0137b11de6d1e
# good: [3977c965ec35ce1a7eac988ad313f0fc9aee9660] ext4: zero out small extents when writing to prealloc area.
git-bisect good 3977c965ec35ce1a7eac988ad313f0fc9aee9660
# good: [ccf2779544eecfcc5447e2028d1029b6d4ff7bb6] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git-bisect good ccf2779544eecfcc5447e2028d1029b6d4ff7bb6
# bad: [55e462b05b5df4fd113c4a304c4f487d44b0898e] memcg: simple stats for memory resource controller
git-bisect bad 55e462b05b5df4fd113c4a304c4f487d44b0898e
# good: [96916090f488986a4ebb8e9ffa6a3b50881d5ccd] Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release
git-bisect good 96916090f488986a4ebb8e9ffa6a3b50881d5ccd
# bad: [6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2
# good: [f5d94ff014cb7e6212f40fc6644f3fd68507df33] PNP: pass resources, not indexes, to pnp_check_port(), et al
git-bisect good f5d94ff014cb7e6212f40fc6644f3fd68507df33
# bad: [d152cf5d0c3325979e71ee53b425fdd51a1a285a] PNPACPI: move _CRS/_PRS warnings closer to the action
git-bisect bad d152cf5d0c3325979e71ee53b425fdd51a1a285a
# good: [784f01d5bdeae7d7005ede17305306b042ba2617] PNP: add struct pnp_resource
git-bisect good 784f01d5bdeae7d7005ede17305306b042ba2617
# good: [dbddd0383c59d588f8db5e773b062756e39117ec] PNP: make generic pnp_add_irq_resource()
git-bisect good dbddd0383c59d588f8db5e773b062756e39117ec
# bad: [cc8c2e308194f0997c718c7c735550ff06754d20] PNP: make generic pnp_add_io_resource()
git-bisect bad cc8c2e308194f0997c718c7c735550ff06754d20
# bad: [dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource()
git-bisect bad dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2
--
Sitsofe | http://sucs.org/~sits/
* Re: [BUG] unable to handle kernel paging request in next-20080516
From: Greg KH @ 2008-05-18 17:47 UTC
To: Andrew Morton; +Cc: Sitsofe Wheeler, linux-kernel, linux-scsi
On Sun, May 18, 2008 at 02:14:23AM -0700, Andrew Morton wrote:
>
> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
>
> > Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
> > will occur. At first I thought it might be provoked by vga=0x164 but this
> > does not appear to be the case and the issue is seemingly random. I've
> > hand transcribed the oops so there may be errors in it but hopefully it
> > will still help:
> >
> > BUG: unable to handle kernel paging request at e6f17fac
> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
> > *pde = 2714b163 *pte = 26f17160
> > Oops: 0000 [#1] DEBUG_PAGEALLOC
> > last sysfs file:
> >
> > Pid: 1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
> > EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
> > EIP is at scsi_bus_uevent+0x1/0x17
> > EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
> > ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
> > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
> > Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
> > e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
> > 00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
> > Call Trace:
> > [<c0237f3a>] ? dev_uevent+0x8e/0xca
> > [<c0237eac>] ? dev_uevent+0x0/0xca
> > [<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
> > [<c01da52a>] ? kobject_uevent_env+0xa/0xc
> > [<c023884b>] ? device_add+0x2bf/0x3f0
> > [<c0321905>] ? mutex_unlock+0x8/0xa
> > [<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
> > [<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
> > [<c025f9ef>] ? __scsi_add_device+0x85/0xab
> > [<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
> > [<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
> > [<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
> > [<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
> > [<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
> > [<c027219c>] ? via_init_one+0x1da/0x1e3
> > [<c01e5670>] ? pci_device_probe+0x39/0x59
> > [<c023a0a1>] ? driver_probe_device+0x9f/0x119
> > [<c023a158>] ? __driver_attach+0x3d/0x5f
> > [<c023990a>] ? bus_for_each_dev+0x3e/0x60
> > [<c0239f39>] ? driver_attach+0x14/0x16
> > [<c023a11b>] ? __driver_attach+0x0/0x5f
> > [<c0239c9d>] ? bus_add_driver+0x99/0x1a0
> > [<c023a2d6>] ? driver_register+0x71/0xcd
> > [<c01e5852>] ? __pci_register_driver+0x53/0x81
> > [<c04205b1>] ? kernel_init+0x0/0x1c4
> > [<c04378fc>] ? via_init+0x14/0x16
> > [<c0132800>] ? trace_softirqs_on+0x78/0x7e
> > [<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
> > [<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
> > [<c04205b1>] ? kernel_init+0x0/0x1c4
> > [<c04205b1>] ? kernel_init+0x0/0x1c4
> > [<c010373f>] ? kernel_thread_helper+0x7/0x10
> > =======================
> >
>
> I thought we'd already fixed this?
I have a patch for it, posted to lkml on Friday (or was it Thursday...)
Then on Friday I went and audited all users of device_create and found 5
other places where this same problem (or something very like it) will
occur, fixed them up, and Cc:ed the affected subsystem maintainers.
I wanted a round of tests in linux-next to happen before sending them
all to Linus. I'll do that on Monday as they missed the last linux-next
release.
If you want to test them out yourself, the patches are this one first:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
and then add any one of the rest of the patches in the directory at:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
depending on the subsystem you are having problems with. There are 12
different ones in there.
hope this helps,
greg k-h
* Re: [BUG] unable to handle kernel paging request in next-20080516
From: Sitsofe Wheeler @ 2008-05-18 20:22 UTC
To: linux-kernel; +Cc: linux-scsi
<posted & mailed>
(I've dropped akpm because the mail server doesn't like where I'm sending
from with this address)
Greg KH wrote:
> On Sun, May 18, 2008 at 02:14:23AM -0700, Andrew Morton wrote:
>>
>> (cc's added)
>>
>> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
>> <sitsofe@yahoo.com> wrote:
>>
>> > BUG: unable to handle kernel paging request at e6f17fac
>> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> > *pde = 2714b163 *pte = 26f17160
>> > Oops: 0000 [#1] DEBUG_PAGEALLOC
>> > last sysfs file:
>>
>> I thought we'd already fixed this?
>
> If you want to test them out yourself, the patches are this one first:
>
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
> and then add any one of the rest of the patches in the directory at:
>
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
> depending on the subsystem you are having problems with. There are 12
> different ones in there.
>
> hope this helps,
Bad news: all the patches applied to 2.6.26-rc2 / current HEAD, but the
problem remained.
The call trace at the end of the oops seems slightly different, though
(alas, I have to transcribe it by hand):
BUG: unable to handle kernel paging request at e725ffac
IP: [<c025fdb6>] scsi_bus_uevent+0x1/0x17
*pde = 27845163 *pte = 2725f160
Oops: 0000 [#1] DEBUG_PAGEALLOC
[...]
dev_uevent
dev_uevent
kobject_uevent_env
mutex_unlock
kobject_uevent
device_add
mutex_unlock
scsi_sysfs_add_sdev
scsi_probe_and_add_lun
mark_held_locks
__scsi_add_device
ata_scsi_scan_host
ata_host_register
ata_pci_sff_activate_host
ata_sff_interrupt
ata_pci_sff_init_one
pci_device_probe
driver_probe_device
__driver_attach
bus_for_each_dev
driver_attach
__driver_attach
bus_add_driver
driver_register
__pci_register_driver
kernel_init
via_init
kernel_init
kernel_init
kernel_init
kernel_thread_helper
--
Sitsofe | http://sucs.org/~sits/
* Re: [BUG] unable to handle kernel paging request in next-20080516
From: James Bottomley @ 2008-05-22 11:34 UTC
To: Andrew Morton; +Cc: Sitsofe Wheeler, linux-kernel, linux-scsi, Greg KH
On Sun, 2008-05-18 at 02:14 -0700, Andrew Morton wrote:
> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
>
> > Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
> > will occur. At first I thought it might be provoked by vga=0x164 but this
> > does not appear to be the case and the issue is seemingly random. I've
> > hand transcribed the oops so there may be errors in it but hopefully it
> > will still help:
> >
> > BUG: unable to handle kernel paging request at e6f17fac
> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
> > *pde = 2714b163 *pte = 26f17160
> > Oops: 0000 [#1] DEBUG_PAGEALLOC
> > last sysfs file:
> >
> > Pid: 1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
> > EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
> > EIP is at scsi_bus_uevent+0x1/0x17
> > EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
> > ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
> > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
> > Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
> > e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
> > 00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
> > Call Trace:
> > [<c0237f3a>] ? dev_uevent+0x8e/0xca
> > [<c0237eac>] ? dev_uevent+0x0/0xca
> > [<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
> > [<c01da52a>] ? kobject_uevent_env+0xa/0xc
> > [<c023884b>] ? device_add+0x2bf/0x3f0
> > [<c0321905>] ? mutex_unlock+0x8/0xa
> > [<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
> > [<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
> > [<c025f9ef>] ? __scsi_add_device+0x85/0xab
> > [<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
> > [<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
> > [<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
> > [<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
> > [<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
> > [<c027219c>] ? via_init_one+0x1da/0x1e3
> > [<c01e5670>] ? pci_device_probe+0x39/0x59
> > [<c023a0a1>] ? driver_probe_device+0x9f/0x119
> > [<c023a158>] ? __driver_attach+0x3d/0x5f
> > [<c023990a>] ? bus_for_each_dev+0x3e/0x60
> > [<c0239f39>] ? driver_attach+0x14/0x16
> > [<c023a11b>] ? __driver_attach+0x0/0x5f
> > [<c0239c9d>] ? bus_add_driver+0x99/0x1a0
> > [<c023a2d6>] ? driver_register+0x71/0xcd
> > [<c01e5852>] ? __pci_register_driver+0x53/0x81
> > [<c04205b1>] ? kernel_init+0x0/0x1c4
> > [<c04378fc>] ? via_init+0x14/0x16
> > [<c0132800>] ? trace_softirqs_on+0x78/0x7e
> > [<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
> > [<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
> > [<c04205b1>] ? kernel_init+0x0/0x1c4
> > [<c04205b1>] ? kernel_init+0x0/0x1c4
> > [<c010373f>] ? kernel_thread_helper+0x7/0x10
> > =======================
> >
>
> I thought we'd already fixed this?
Actually, I think this is a very subtle bug; what I think is happening
is that after Hannes' sysfs changes, we now add scsi_bus_type to the
target device. However, scsi_bus_uevent() unconditionally casts from
dev to a struct scsi_device and then looks at the type entry. My theory
is that in this particular config, going from struct scsi_target to
struct device and back to struct scsi_device actually tips us over into
unmapped space for the ->type deref.
Hopefully this should fix it by checking the device type before doing
the deref.
James
---
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 049103f..93d2b67 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -359,7 +359,12 @@ static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
 
 static int scsi_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-	struct scsi_device *sdev = to_scsi_device(dev);
+	struct scsi_device *sdev;
+
+	if (dev->type != &scsi_dev_type)
+		return 0;
+
+	sdev = to_scsi_device(dev);
 
 	add_uevent_var(env, "MODALIAS=" SCSI_DEVICE_MODALIAS_FMT, sdev->type);
 	return 0;
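A small userspace model may make the failure mode behind this patch easier to see. It is only a sketch under stated assumptions: the struct layouts are hypothetical stand-ins rather than the real kernel definitions, container_of is re-implemented locally in the spirit of the kernel macro, and model_bus_uevent() and miscast_type_addr() are illustrative names that do not exist in the kernel. What it shows is that when the struct device passed in is really embedded in a scsi_target (at a smaller offset than sdev_gendev has inside struct scsi_device), container_of steps back past the start of the real object, so ->type would be read from memory belonging to something else.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real kernel struct layouts. */
struct device_type {
	const char *name;
};

struct device {
	const struct device_type *type;	/* tells us what embeds this device */
};

static const struct device_type scsi_dev_type = { "scsi_device" };
static const struct device_type scsi_target_type = { "scsi_target" };

/* Local re-implementation: step back from an embedded member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct scsi_device {
	char other_fields[96];		/* hypothetical filler before ->type */
	int type;			/* peripheral type used for MODALIAS */
	struct device sdev_gendev;	/* embedded generic device */
};

struct scsi_target {
	struct device dev;		/* embedded at offset 0 in this model */
	unsigned int id;
};

#define to_scsi_device(d) container_of(d, struct scsi_device, sdev_gendev)

/*
 * Address an unchecked to_scsi_device(dev)->type would read. Computed as
 * an integer so that no invalid pointer is formed or dereferenced.
 */
static uintptr_t miscast_type_addr(const struct device *dev)
{
	return (uintptr_t)dev - offsetof(struct scsi_device, sdev_gendev)
	       + offsetof(struct scsi_device, type);
}

/*
 * Model of the patched callback: returns 1 and stores the peripheral type
 * when dev really is a scsi_device, otherwise returns 0 without casting.
 */
static int model_bus_uevent(struct device *dev, int *modalias_type)
{
	if (dev->type != &scsi_dev_type)
		return 0;	/* e.g. a scsi_target: skip before any cast */
	*modalias_type = to_scsi_device(dev)->type;
	return 1;
}
```

For a genuine scsi_device, miscast_type_addr() lands exactly on its type field; for a scsi_target it lands before the start of the target, which is why checking dev->type first, as the patch does, avoids the bad read.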
* Re: [BUG] unable to handle kernel paging request in next-20080516
From: Sitsofe Wheeler @ 2008-05-23 19:34 UTC
To: linux-kernel; +Cc: linux-scsi
<posted & mailed>
James Bottomley wrote:
> Actually, I think this is a very subtle bug; what I think is happening
> is that after Hannes' sysfs changes, we now add scsi_bus_type to the
> target device. However, scsi_bus_uevent() unconditionally casts from
> dev to a struct scsi_device and then looks at the type entry. My theory
> is that in this particular config, going from struct scsi_target to
> struct device and back to struct scsi_device actually tips us over into
> unmapped space for the ->type deref.
>
> Hopefully this should fix it by checking the device type before doing
> the deref.
This fixed the problem for me (it was horribly intermittent, but I've done
10+ consecutive reboots without seeing an oops). I changed the patch to
printk every time the condition was hit, and it seems to happen twice per
PATA device: once after each "scsi?: pata_via" message and then again after
each "scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5" message.
The thing I don't understand about your explanation is that it sounds like
the device struct is being round-tripped (but is just being cast to
different things along the way). If this is the case why would this problem
ever arise? Surely if it is really a struct scsi_device underneath there
should be no problem?
--
Sitsofe | http://sucs.org/~sits/
* Re: [BUG] unable to handle kernel paging request in next-20080516
From: James Bottomley @ 2008-05-23 20:26 UTC
To: Sitsofe Wheeler; +Cc: linux-scsi, linux-kernel
On Fri, 2008-05-23 at 20:34 +0100, Sitsofe Wheeler wrote:
> <posted & mailed>
>
> James Bottomley wrote:
>
> > Actually, I think this is a very subtle bug; what I think is happening
> > is that after Hannes' sysfs changes, we now add scsi_bus_type to the
> > target device. However, scsi_bus_uevent() unconditionally casts from
> > dev to a struct scsi_device and then looks at the type entry. My theory
> > is that in this particular config, going from struct scsi_target to
> > struct device and back to struct scsi_device actually tips us over into
> > unmapped space for the ->type deref.
> >
> > Hopefully this should fix it by checking the device type before doing
> > the deref.
>
> This fixed the problem for me (it was horribly intermittent, but I've done
> 10+ consecutive reboots without seeing an oops). I changed the patch to
> printk every time the condition was hit, and it seems to happen twice per
> PATA device: once after each "scsi?: pata_via" message and then again after
> each "scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5" message.
>
> The thing I don't understand about your explanation is that it sounds like
> the device struct is being round-tripped (but is just being cast to
> different things along the way). If this is the case why would this problem
> ever arise? Surely if it is really a struct scsi_device underneath there
> should be no problem?
The event is called for all generic device objects belonging to the
scsi_bus_type. That means both struct scsi_device and struct
scsi_target objects. When it's called for struct scsi_target objects,
casting out to struct scsi_device does the wrong thing.
James
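As a rough sanity check of this explanation against the first oops in the thread, the arithmetic below compares the faulting address with the device pointer. It rests on assumptions worth stating loudly: that EAX (e6f18014) holds dev, the first argument under the i386 register-passing convention the kernel uses; that the hand-transcribed values are accurate; that the kernel is non-PAE; and that PAGE_OFFSET is the default 0xc0000000. The macro and function names are illustrative only.

```c
#include <stdint.h>

/* Values hand-transcribed from the first oops quoted in this thread. */
#define OOPS_EAX	0xe6f18014u	/* assumed to hold dev (first argument) */
#define OOPS_FAULT	0xe6f17facu	/* "unable to handle kernel paging request at" */
#define OOPS_PTE	0x26f17160u	/* "*pte =" for the faulting address */

#define PAGE_SIZE_4K	0x1000u		/* i386 base page size */
#define PAGE_OFFSET_3G	0xc0000000u	/* assumed default 3G/1G split */
#define PTE_PRESENT	0x001u		/* x86 PTE bit 0: page is mapped */

/* Start of the 4 KiB page that contains addr. */
static uint32_t page_of(uint32_t addr)
{
	return addr & ~(PAGE_SIZE_4K - 1u);
}

/* Physical frame address held in a 32-bit (non-PAE) x86 PTE. */
static uint32_t pte_frame(uint32_t pte)
{
	return pte & ~(PAGE_SIZE_4K - 1u);
}

/* How many bytes below the device pointer the faulting read landed. */
static uint32_t fault_offset_below_dev(void)
{
	return OOPS_EAX - OOPS_FAULT;
}
```

If those assumptions hold, the read landed 0x68 bytes below dev, on the 4 KiB page just before the one holding dev, and the PTE for that page (whose frame matches the faulting virtual page minus PAGE_OFFSET) has its present bit clear, which is what DEBUG_PAGEALLOC does to freed pages. That is consistent with a backwards container_of step, although the 0x68 figure has not been checked against the real struct scsi_device layout. Whether the neighbouring page happens to be mapped could plausibly vary from boot to boot, which would fit the intermittency reported above.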