public inbox for linux-scsi@vger.kernel.org
* Re: [BUG] unable to handle kernel paging request in next-20080516
       [not found] <g0mkaf$i1r$1@ger.gmane.org>
@ 2008-05-18  9:14 ` Andrew Morton
  2008-05-18 11:22   ` [BUG] unable to handle kernel paging request in next-20080516 (scsi_bus_uevent) Sitsofe Wheeler
                     ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Andrew Morton @ 2008-05-18  9:14 UTC
  To: Sitsofe Wheeler; +Cc: linux-kernel, linux-scsi, Greg KH


(cc's added)

On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic 
> will occur. At first I thought it might be provoked by vga=0x164 but this 
> does not appear to be the case and the issue is seemingly random. I've 
> hand transcribed the oops so there may be errors in it but hopefully it 
> will still help:
> 
> BUG: unable to handle kernel paging request at e6f17fac
> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
> *pde = 2714b163 *pte = 26f17160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> last sysfs file:
> 
> Pid:  1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
> EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
> EIP is at scsi_bus_uevent+0x1/0x17
> EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
> ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
>  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
> Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
>        e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
>        00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
> Call Trace:
>  [<c0237f3a>] ? dev_uevent+0x8e/0xca
>  [<c0237eac>] ? dev_uevent+0x0/0xca
>  [<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
>  [<c01da52a>] ? kobject_uevent_env+0xa/0xc
>  [<c023884b>] ? device_add+0x2bf/0x3f0
>  [<c0321905>] ? mutex_unlock+0x8/0xa
>  [<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
>  [<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
>  [<c025f9ef>] ? __scsi_add_device+0x85/0xab
>  [<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
>  [<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
>  [<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
>  [<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
>  [<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
>  [<c027219c>] ? via_init_one+0x1da/0x1e3
>  [<c01e5670>] ? pci_device_probe+0x39/0x59
>  [<c023a0a1>] ? driver_probe_device+0x9f/0x119
>  [<c023a158>] ? __driver_attach+0x3d/0x5f
>  [<c023990a>] ? bus_for_each_dev+0x3e/0x60
>  [<c0239f39>] ? driver_attach+0x14/0x16
>  [<c023a11b>] ? __driver_attach+0x0/0x5f
>  [<c0239c9d>] ? bus_add_driver+0x99/0x1a0
>  [<c023a2d6>] ? driver_register+0x71/0xcd
>  [<c01e5852>] ? __pci_register_driver+0x53/0x81
>  [<c04205b1>] ? kernel_init+0x0/0xc4
>  [<c04378fc>] ? via_init+0x14/0x16
>  [<c0132800>] ? trace_softirqs_on+0x78/0x7e
>  [<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
>  [<c04205b1>] ? kernel_init+0x0/0x1c4
>  [<c04205b1>] ? kernel_init+0x0/0x1c4
>  [<c010373f>] ? kernel_thread_helper+0x7/0x10
>  =======================
> 

I thought we'd already fixed this?


* Re: [BUG] unable to handle kernel paging request in next-20080516 (scsi_bus_uevent)
  2008-05-18  9:14 ` [BUG] unable to handle kernel paging request in next-20080516 Andrew Morton
@ 2008-05-18 11:22   ` Sitsofe Wheeler
  2008-05-18 16:00   ` [BUG] unable to handle kernel paging request in next-20080516 Sitsofe Wheeler
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2008-05-18 11:22 UTC
  To: linux-kernel; +Cc: linux-scsi

On Sun, 18 May 2008 02:14:23 -0700, Andrew Morton wrote:

> (cc's added)
> 
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
> 
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a
>> panic will occur. At first I thought it might be provoked by vga=0x164
>> but this does not appear to be the case and the issue is seemingly
>> random. I've hand transcribed the oops so there may be errors in it but
>> hopefully it will still help:
>> 
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>> 
> I thought we'd already fixed this?

Thanks to your tip-off I've found that this bug is already in bugzilla
(complete with the commit that caused the regression) -
http://bugzilla.kernel.org/show_bug.cgi?id=10711 . There's nothing there
that says it has been fixed though. I'll look harder before reporting
problems next time.

-- 
Sitsofe | http://sucs.org/~sits/


* Re: [BUG] unable to handle kernel paging request in next-20080516
  2008-05-18  9:14 ` [BUG] unable to handle kernel paging request in next-20080516 Andrew Morton
  2008-05-18 11:22   ` [BUG] unable to handle kernel paging request in next-20080516 (scsi_bus_uevent) Sitsofe Wheeler
@ 2008-05-18 16:00   ` Sitsofe Wheeler
  2008-05-18 17:47   ` Greg KH
  2008-05-22 11:34   ` James Bottomley
  3 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2008-05-18 16:00 UTC
  To: linux-kernel; +Cc: linux-scsi

Andrew Morton wrote:

> (cc's added)
> 
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
> <sitsofe@yahoo.com> wrote:
> 
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
>> will occur. At first I thought it might be provoked by vga=0x164 but this
>> does not appear to be the case and the issue is seemingly random. I've
>> hand transcribed the oops so there may be errors in it but hopefully it
>> will still help:
>> 
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>> 
> 
> I thought we'd already fixed this?

If it hasn't yet been fixed I think it can be narrowed down to
[dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource().
Be aware that the problem also seems to go away if an initrd file is present. I
struggled to revert this commit against the latest linux-next due to conflicts.

Here's the commit message:

Author: Bjorn Helgaas <bjorn.helgaas@hp.com>  2008-04-28 23:34:35
Committer: Len Brown <len.brown@intel.com>  2008-04-29 08:22:28
Child:  cc8c2e308194f0997c718c7c735550ff06754d20 (PNP: make generic pnp_add_io_resource())
Branches: v2.6.26rc1, remotes/origin/master, remotes/linux-next/stable, remotes/linux-next/master, remotes/linux-next/history, master, linux-next, bisect
Follows: v2.6.25
Precedes: v2.6.26-rc1, next-20080502, next-20080501, next-20080430

    PNP: make generic pnp_add_dma_resource()
    
    Add a pnp_add_dma_resource() that can be used by all the PNP
    backends.  This consolidates a little more pnp_resource_table
    knowledge into one place.
    
    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

Here's the git-bisect log:

# bad: [2ddcca36c8bcfa251724fe342c8327451988be0d] Linux 2.6.26-rc1
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git-bisect start 'v2.6.26-rc1' 'v2.6.25'
# good: [7ae44cfa7ab29b277691327e8de790d7b880722f] [ALSA] snd-powermac: style awacs.s and awacs.h
git-bisect good 7ae44cfa7ab29b277691327e8de790d7b880722f
# good: [c60264c494a119cd3a716a22edc0137b11de6d1e] smack: fix integer as NULL pointer warning in smack_lsm.c
git-bisect good c60264c494a119cd3a716a22edc0137b11de6d1e
# good: [3977c965ec35ce1a7eac988ad313f0fc9aee9660] ext4: zero out small extents when writing to prealloc area.
git-bisect good 3977c965ec35ce1a7eac988ad313f0fc9aee9660
# good: [ccf2779544eecfcc5447e2028d1029b6d4ff7bb6] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git-bisect good ccf2779544eecfcc5447e2028d1029b6d4ff7bb6
# bad: [55e462b05b5df4fd113c4a304c4f487d44b0898e] memcg: simple stats for memory resource controller
git-bisect bad 55e462b05b5df4fd113c4a304c4f487d44b0898e
# good: [96916090f488986a4ebb8e9ffa6a3b50881d5ccd] Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release
git-bisect good 96916090f488986a4ebb8e9ffa6a3b50881d5ccd
# bad: [6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2
# good: [f5d94ff014cb7e6212f40fc6644f3fd68507df33] PNP: pass resources, not indexes, to pnp_check_port(), et al
git-bisect good f5d94ff014cb7e6212f40fc6644f3fd68507df33
# bad: [d152cf5d0c3325979e71ee53b425fdd51a1a285a] PNPACPI: move _CRS/_PRS warnings closer to the action
git-bisect bad d152cf5d0c3325979e71ee53b425fdd51a1a285a
# good: [784f01d5bdeae7d7005ede17305306b042ba2617] PNP: add struct pnp_resource
git-bisect good 784f01d5bdeae7d7005ede17305306b042ba2617
# good: [dbddd0383c59d588f8db5e773b062756e39117ec] PNP: make generic pnp_add_irq_resource()
git-bisect good dbddd0383c59d588f8db5e773b062756e39117ec
# bad: [cc8c2e308194f0997c718c7c735550ff06754d20] PNP: make generic pnp_add_io_resource()
git-bisect bad cc8c2e308194f0997c718c7c735550ff06754d20
# bad: [dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource()
git-bisect bad dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2

-- 
Sitsofe | http://sucs.org/~sits/


* Re: [BUG] unable to handle kernel paging request in next-20080516
  2008-05-18  9:14 ` [BUG] unable to handle kernel paging request in next-20080516 Andrew Morton
  2008-05-18 11:22   ` [BUG] unable to handle kernel paging request in next-20080516 (scsi_bus_uevent) Sitsofe Wheeler
  2008-05-18 16:00   ` [BUG] unable to handle kernel paging request in next-20080516 Sitsofe Wheeler
@ 2008-05-18 17:47   ` Greg KH
  2008-05-18 20:22     ` Sitsofe Wheeler
  2008-05-22 11:34   ` James Bottomley
  3 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2008-05-18 17:47 UTC
  To: Andrew Morton; +Cc: Sitsofe Wheeler, linux-kernel, linux-scsi

On Sun, May 18, 2008 at 02:14:23AM -0700, Andrew Morton wrote:
> 
> (cc's added)
> 
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
> > Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic 
> > will occur. At first I thought it might be provoked by vga=0x164 but this 
> > does not appear to be the case and the issue is seemingly random. I've 
> > hand transcribed the oops so there may be errors in it but hopefully it 
> > will still help:
> > 
> > BUG: unable to handle kernel paging request at e6f17fac
> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
> > *pde = 2714b163 *pte = 26f17160
> > Oops: 0000 [#1] DEBUG_PAGEALLOC
> > last sysfs file:
> > 
> > Pid:  1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
> > EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
> > EIP is at scsi_bus_uevent+0x1/0x17
> > EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
> > ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
> >  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
> > Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
> >        e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
> >        00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
> > Call Trace:
> >  [<c0237f3a>] ? dev_uevent+0x8e/0xca
> >  [<c0237eac>] ? dev_uevent+0x0/0xca
> >  [<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
> >  [<c01da52a>] ? kobject_uevent_env+0xa/0xc
> >  [<c023884b>] ? device_add+0x2bf/0x3f0
> >  [<c0321905>] ? mutex_unlock+0x8/0xa
> >  [<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
> >  [<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
> >  [<c025f9ef>] ? __scsi_add_device+0x85/0xab
> >  [<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
> >  [<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
> >  [<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
> >  [<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
> >  [<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
> >  [<c027219c>] ? via_init_one+0x1da/0x1e3
> >  [<c01e5670>] ? pci_device_probe+0x39/0x59
> >  [<c023a0a1>] ? driver_probe_device+0x9f/0x119
> >  [<c023a158>] ? __driver_attach+0x3d/0x5f
> >  [<c023990a>] ? bus_for_each_dev+0x3e/0x60
> >  [<c0239f39>] ? driver_attach+0x14/0x16
> >  [<c023a11b>] ? __driver_attach+0x0/0x5f
> >  [<c0239c9d>] ? bus_add_driver+0x99/0x1a0
> >  [<c023a2d6>] ? driver_register+0x71/0xcd
> >  [<c01e5852>] ? __pci_register_driver+0x53/0x81
> >  [<c04205b1>] ? kernel_init+0x0/0xc4
> >  [<c04378fc>] ? via_init+0x14/0x16
> >  [<c0132800>] ? trace_softirqs_on+0x78/0x7e
> >  [<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
> >  [<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
> >  [<c04205b1>] ? kernel_init+0x0/0x1c4
> >  [<c04205b1>] ? kernel_init+0x0/0x1c4
> >  [<c010373f>] ? kernel_thread_helper+0x7/0x10
> >  =======================
> > 
> 
> I thought we'd already fixed this?

I have a patch for it, posted to lkml on Friday (or was it Thursday...).
Then on Friday I went and audited all users of device_create and found 5
other places where this same problem will occur (or something almost
like it) and fixed them up and Cc:ed the subsystem maintainers that were
affected.

I wanted a round of tests in linux-next to happen before sending them
all to Linus.  I'll do that on Monday as they missed the last linux-next
release.

If you want to test them out yourself, the patches are this one first:
  http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
and then add any one of the rest of the patches in the directory at:
  http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
depending on the subsystem you are having problems with.  There are 12
different ones in there.

hope this helps,

greg k-h


* Re: [BUG] unable to handle kernel paging request in next-20080516
  2008-05-18 17:47   ` Greg KH
@ 2008-05-18 20:22     ` Sitsofe Wheeler
  0 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2008-05-18 20:22 UTC
  To: linux-kernel; +Cc: linux-scsi

<posted & mailed>

(I've dropped akpm because the mail server doesn't like where I'm sending
from with this address)

Greg KH wrote:

> On Sun, May 18, 2008 at 02:14:23AM -0700, Andrew Morton wrote:
>> 
>> (cc's added)
>> 
>> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
>> <sitsofe@yahoo.com> wrote:
>> 
>> > BUG: unable to handle kernel paging request at e6f17fac
>> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> > *pde = 2714b163 *pte = 26f17160
>> > Oops: 0000 [#1] DEBUG_PAGEALLOC
>> > last sysfs file:
>> 
>> I thought we'd already fixed this?
>
> If you want to test them out yourself, the patches are this one first:
>   http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
> and then add any one of the rest of the patches in the directory at:
>   http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
> depending on the subsystem you are having problems with.  There are 12
> different ones in there.
> 
> hope this helps,

Bad news - the patches all applied to 2.6.26-rc2 / current HEAD but the
problem remained.

The trace at the end seems slightly different though (alas I have to
transcribe):

BUG: unable to handle kernel paging request at e725ffac
IP: [<c025fdb6>] scsi_bus_uevent+0x1/0x17
*pde = 27845163 *pte = 2725f160
Oops: 0000 [#1] DEBUG_PAGEALLOC

[...]

dev_uevent
dev_uevent
kobject_uevent_env
mutex_unlock
kobject_uevent
device_add
mutex_unlock
scsi_sysfs_add_sdev
scsi_probe_and_add_lun
mark_held_locks
__scsi_add_device
ata_scsi_scan_host
ata_host_register
ata_pci_sff_activate_host
ata_sff_interrupt
ata_pci_sff_init_one
pci_device_probe
driver_probe_device
__driver_attach
bus_for_each_dev
driver_attach
__driver_attach
bus_add_driver
driver_register
__pci_register_driver
kernel_init
via_init
kernel_init
kernel_init
kernel_init
kernel_thread_helper

-- 
Sitsofe | http://sucs.org/~sits/


* Re: [BUG] unable to handle kernel paging request in next-20080516
  2008-05-18  9:14 ` [BUG] unable to handle kernel paging request in next-20080516 Andrew Morton
                     ` (2 preceding siblings ...)
  2008-05-18 17:47   ` Greg KH
@ 2008-05-22 11:34   ` James Bottomley
  2008-05-23 19:34     ` Sitsofe Wheeler
  3 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2008-05-22 11:34 UTC
  To: Andrew Morton; +Cc: Sitsofe Wheeler, linux-kernel, linux-scsi, Greg KH

On Sun, 2008-05-18 at 02:14 -0700, Andrew Morton wrote:
> (cc's added)
> 
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
> > Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic 
> > will occur. At first I thought it might be provoked by vga=0x164 but this 
> > does not appear to be the case and the issue is seemingly random. I've 
> > hand transcribed the oops so there may be errors in it but hopefully it 
> > will still help:
> > 
> > BUG: unable to handle kernel paging request at e6f17fac
> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
> > *pde = 2714b163 *pte = 26f17160
> > Oops: 0000 [#1] DEBUG_PAGEALLOC
> > last sysfs file:
> > 
> > Pid:  1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
> > EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
> > EIP is at scsi_bus_uevent+0x1/0x17
> > EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
> > ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
> >  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
> > Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
> >        e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
> >        00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
> > Call Trace:
> >  [<c0237f3a>] ? dev_uevent+0x8e/0xca
> >  [<c0237eac>] ? dev_uevent+0x0/0xca
> >  [<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
> >  [<c01da52a>] ? kobject_uevent_env+0xa/0xc
> >  [<c023884b>] ? device_add+0x2bf/0x3f0
> >  [<c0321905>] ? mutex_unlock+0x8/0xa
> >  [<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
> >  [<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
> >  [<c025f9ef>] ? __scsi_add_device+0x85/0xab
> >  [<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
> >  [<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
> >  [<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
> >  [<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
> >  [<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
> >  [<c027219c>] ? via_init_one+0x1da/0x1e3
> >  [<c01e5670>] ? pci_device_probe+0x39/0x59
> >  [<c023a0a1>] ? driver_probe_device+0x9f/0x119
> >  [<c023a158>] ? __driver_attach+0x3d/0x5f
> >  [<c023990a>] ? bus_for_each_dev+0x3e/0x60
> >  [<c0239f39>] ? driver_attach+0x14/0x16
> >  [<c023a11b>] ? __driver_attach+0x0/0x5f
> >  [<c0239c9d>] ? bus_add_driver+0x99/0x1a0
> >  [<c023a2d6>] ? driver_register+0x71/0xcd
> >  [<c01e5852>] ? __pci_register_driver+0x53/0x81
> >  [<c04205b1>] ? kernel_init+0x0/0xc4
> >  [<c04378fc>] ? via_init+0x14/0x16
> >  [<c0132800>] ? trace_softirqs_on+0x78/0x7e
> >  [<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
> >  [<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
> >  [<c04205b1>] ? kernel_init+0x0/0x1c4
> >  [<c04205b1>] ? kernel_init+0x0/0x1c4
> >  [<c010373f>] ? kernel_thread_helper+0x7/0x10
> >  =======================
> > 
> 
> I thought we'd already fixed this?

Actually, I think this is a very subtle bug; what I think is happening
is that after Hannes' sysfs changes, we now add scsi_bus_type to the
target device.  However, scsi_bus_uevent() unconditionally casts from
dev to a struct scsi_device and then looks at the type entry.  My theory
is that in this particular config going from struct scsi_target to
struct device and back to struct scsi_device actually tips us over into
unmapped space for the ->type deref.

Hopefully this should fix it by checking the device type before doing
the deref.

James

---

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 049103f..93d2b67 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -359,7 +359,12 @@ static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
 
 static int scsi_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-	struct scsi_device *sdev = to_scsi_device(dev);
+	struct scsi_device *sdev;
+
+	if (dev->type != &scsi_dev_type)
+		return 0;
+
+	sdev = to_scsi_device(dev);
 
 	add_uevent_var(env, "MODALIAS=" SCSI_DEVICE_MODALIAS_FMT, sdev->type);
 	return 0;
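
The addresses in the first oops report look consistent with this theory. The following is a minimal user-space sketch of the arithmetic, not kernel code. It assumes that EAX (e6f18014) held the dev pointer passed to scsi_bus_uevent() and that pages are 4 KiB; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on a typical i386 configuration. */
#define SKETCH_PAGE_SIZE 4096u

/* Start address of the page that contains addr. */
static uint32_t page_base(uint32_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1u);
}

/* How far below the device pointer the faulting read landed. */
static uint32_t bytes_below(uint32_t dev_ptr, uint32_t fault_addr)
{
	return dev_ptr - fault_addr;
}

/* Non-zero when the two addresses fall in different pages. */
static int crosses_page(uint32_t dev_ptr, uint32_t fault_addr)
{
	return page_base(dev_ptr) != page_base(fault_addr);
}
```

With the reported values, bytes_below(0xe6f18014, 0xe6f17fac) is 0x68, and the two addresses sit on different pages (0xe6f18000 and 0xe6f17000). A read at a small negative offset from dev would therefore land on the preceding page, which DEBUG_PAGEALLOC may have left unmapped if that page happened to be free at the time. That would also fit the intermittent nature of the report.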




* Re: [BUG] unable to handle kernel paging request in next-20080516
  2008-05-22 11:34   ` James Bottomley
@ 2008-05-23 19:34     ` Sitsofe Wheeler
  2008-05-23 20:26       ` James Bottomley
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2008-05-23 19:34 UTC
  To: linux-scsi; +Cc: linux-kernel

<posted & mailed>

James Bottomley wrote:

> Actually, I think this is a very subtle bug; what I think is happening
> is that after Hannes' sysfs changes, we now add scsi_bus_type to the
> target device.  However, scsi_bus_uevent() unconditionally casts from
> dev to a struct scsi_device and then looks at the type entry.  My theory
> is that in this particular config going from struct scsi_target to
> struct device and back to struct scsi_device actually tips us over into
> unmapped space for the ->type deref.
> 
> Hopefully this should fix it by checking the device type before doing
> the deref.

This fixed the problem for me (it was horribly intermittent but I've done
10+ consecutive reboots without seeing an oops). I changed the patch to
printk every time the condition was hit and it seems to happen twice per
PATA device - once after each scsi?: pata_via message and then again after
each scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5.

The thing I don't understand about your explanation is that it sounds like
the device struct is being round-tripped (but is just being cast to
different things along the way). If this is the case why would this problem
ever arise? Surely if it is really a struct scsi_device underneath there
should be no problem?

-- 
Sitsofe | http://sucs.org/~sits/



* Re: [BUG] unable to handle kernel paging request in next-20080516
  2008-05-23 19:34     ` Sitsofe Wheeler
@ 2008-05-23 20:26       ` James Bottomley
  0 siblings, 0 replies; 8+ messages in thread
From: James Bottomley @ 2008-05-23 20:26 UTC
  To: Sitsofe Wheeler; +Cc: linux-scsi, linux-kernel

On Fri, 2008-05-23 at 20:34 +0100, Sitsofe Wheeler wrote:
> <posted & mailed>
> 
> James Bottomley wrote:
> 
> > Actually, I think this is a very subtle bug; what I think is happening
> > is that after Hannes' sysfs changes, we now add scsi_bus_type to the
> > target device.  However, scsi_bus_uevent() unconditionally casts from
> > dev to a struct scsi_device and then looks at the type entry.  My theory
> > is that in this particular config going from struct scsi_target to
> > struct device and back to struct scsi_device actually tips us over into
> > unmapped space for the ->type deref.
> > 
> > Hopefully this should fix it by checking the device type before doing
> > the deref.
> 
> This fixed the problem for me (it was horribly intermittent but I've done
> 10+ consecutive reboots without seeing an oops). I changed the patch to
> printk every time the condition was hit and it seems to happen twice per
> PATA device - once after each scsi?: pata_via message and then again after
> each scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5.
> 
> The thing I don't understand about your explanation is that it sounds like
> the device struct is being round-tripped (but is just being cast to
> different things along the way). If this is the case why would this problem
> ever arise? Surely if it is really a struct scsi_device underneath there
> should be no problem?

The event is called for all generic device objects belonging to the
scsi_bus_type.  That means both struct scsi_device and struct
scsi_target objects.  When it's called for struct scsi_target objects,
casting out to struct scsi_device does the wrong thing.

James
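
To make the failure mode concrete, here is a minimal user-space sketch in C. It is not the kernel code: the struct layouts, the padding size, the modalias format string, and the name sketch_bus_uevent are all invented or assumed for illustration. Only the type check ahead of the container_of() step mirrors the patch earlier in this thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures; layouts are invented. */
struct device_type { const char *name; };

struct device {
	const struct device_type *type;
};

struct scsi_target {
	struct device dev;         /* embedded at the start in this sketch */
	unsigned int id;
};

struct scsi_device {
	char padding[104];         /* hypothetical members ahead of the device */
	char type;                 /* SCSI peripheral type, e.g. 0 for disk */
	struct device sdev_gendev; /* embedded well past the start */
};

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for the kernel's per-type markers. */
static const struct device_type scsi_dev_type = { "scsi_device" };
static const struct device_type scsi_target_type = { "scsi_target" };

/*
 * Mirrors the patched logic: the callback sees every device on the bus,
 * so it must skip anything that is not embedded in a struct scsi_device.
 * Without the check, container_of() on a target's device would step back
 * by offsetof(struct scsi_device, sdev_gendev) into unrelated memory.
 */
static int sketch_bus_uevent(struct device *dev, char *buf, size_t len)
{
	struct scsi_device *sdev;

	if (dev->type != &scsi_dev_type) {
		buf[0] = '\0';     /* nothing to report for targets */
		return 0;
	}

	sdev = container_of(dev, struct scsi_device, sdev_gendev);
	/* Format string is an assumption standing in for SCSI_DEVICE_MODALIAS_FMT. */
	snprintf(buf, len, "MODALIAS=scsi:t-0x%02x",
		 (unsigned int)(unsigned char)sdev->type);
	return 0;
}
```

Passing &sdev.sdev_gendev for a real struct scsi_device round-trips back to the same object, which is the case Sitsofe describes. Passing &starget.dev does not: the pointer arithmetic assumes an enclosing struct scsi_device that is not there, so the check on dev->type has to come first.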


