public inbox for linux-scsi@vger.kernel.org
From: Daniel Debonzi <debonzi@linux.vnet.ibm.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matthew Wilcox <matthew@wil.cx>,
	Brian King <brking@linux.vnet.ibm.com>,
	linux-scsi@vger.kernel.org
Subject: Re: scsi_host_alloc does not check for used shost->host_no
Date: Tue, 15 Jul 2008 18:39:48 -0300
Message-ID: <487D1924.7070502@linux.vnet.ibm.com>
In-Reply-To: <1216154055.3312.129.camel@localhost.localdomain>

James Bottomley wrote:
> On Tue, 2008-07-15 at 14:25 -0600, Matthew Wilcox wrote:
>>> Do we need to worry about a host in the SHOST_DEL state? In that case, it will still
>>> exist to some degree, but scsi_host_get will fail. For example, what happens if a
>>> shell is in /sys/class/scsi_host/host5/ and you delete host 5 and try to add another.
>>> Couldn't you run into the same problem? In that case the scsi_host_get will fail.
>>> I suppose you could check specifically for -ENXIO getting returned...
>> Or we could make the host_no a u64 and avoid the problem ever happening
>> in our lifetimes.  I'm amazed that anyone's had the time to do 4 billion
>> add/removes, to be honest.  Assuming it takes 1 second per add/remove
>> cycle, and there's not even time to scan a bus in that time, that's
>> still 136 years.
> 
> Actually, right at the moment, a lot of the udev stuff is conditioned on
> a non repeating host number (which is why we don't use idr like we do
> for the other things).  I'm really reluctant to go to a u64 host
> number ... what was the use case that produced this problem?
> 
> James

It all started with some functional tests against the pata_pdc2027x module,
which run rmmod/modprobe repeatedly (around 10000 times). Before I started
to work on it, this functional test had begun to fail, sometimes at
different points.

To make this clearer, I am including some kernel messages and xmon output
below, along with a few additional comments.

We can see that the first and third panics (both on rmmod) happened far
from the short int boundary (around host numbers 19741 and 9887). The
second one happens on modprobe, when the host number wraps from 65535 to 0.
This pointed me to the patch I submitted, which I used to check whether all
of this would be "fixed". With that patch applied (I know it is far from a
good patch), I had this rmmod/modprobe loop running for more than 4 days
with no kernel panic. This made me believe that it somehow also prevents
the first and third panics.

I am pretty new to this piece of code and probably don't have a full
overview of it. That is why I would like your input and opinions.

I would really appreciate it.
Daniel Debonzi

First occurrence:
**********************************************
    Vendor: IBM       Model: DROM00205         Rev: NR36
   Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
sr 19740:0:0:0: Attached scsi generic sg4 type 5
ata19739.00: disabled
   Vendor: IBM       Model: DROM00205         Rev: NR36
   Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
sr 19742:0:0:0: Attached scsi generic sg4 type 5
ata19741.00: disabled
Unable to handle kernel paging request for data at address 0xd0000000008c3e98
Faulting instruction address: 0xd0000000001db250
cpu 0x3: Vector: 300 (Data Access) at [c0000000391173c0]
     pc: d0000000001db250: .scsi_target_reap_usercontext+0x90/0x114 [scsi_mod]
     lr: d0000000001db244: .scsi_target_reap_usercontext+0x84/0x114 [scsi_mod]
     sp: c000000039117640
    msr: 8000000000001032
    dar: d0000000008c3e98
  dsisr: 40000000
   current = 0xc00000007182dc60
   paca    = 0xc000000000475400
     pid   = 535, comm = udevd


3:mon> t
[c0000000391176e0] c00000000007f35c .execute_in_process_context+0x54/0xa0
[c000000039117760] d0000000001da190 .scsi_target_reap+0xc8/0x100 [scsi_mod]
[c0000000391177f0] d0000000001db5c8 .scsi_device_dev_release_usercontext+0xc8/0x120 [scsi_mod]
[c0000000391178a0] c00000000007f35c .execute_in_process_context+0x54/0xa0
[c000000039117920] d0000000001db4e8 .scsi_device_dev_release+0x24/0x3c [scsi_mod]
[c0000000391179a0] c00000000022580c .device_release+0x4c/0x78
[c000000039117a20] c0000000001b2388 .kobject_cleanup+0x90/0xf0
[c000000039117ac0] c0000000001b3470 .kref_put+0x84/0xa0
[c000000039117b40] c0000000001b22e0 .kobject_put+0x28/0x40
[c000000039117bc0] c00000000014dbe8 .sysfs_release+0x48/0xe0
[c000000039117c50] c0000000000eca1c .__fput+0x108/0x25c
[c000000039117d00] c0000000000e8fa4 .filp_close+0xac/0xd4
[c000000039117d90] c0000000000eacf4 .sys_close+0xc4/0x110
[c000000039117e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007e7d7d0
SP (ff8dc360) is in userspace

**********************************************

Second occurrence:
**********************************************
   Vendor: IBM       Model: DROM00205         Rev: NR38
   Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65529:0:0:0: Attached scsi generic sg11 type 5
ata65529.00: disabled
   Vendor: IBM       Model: DROM00205         Rev: NR38
   Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65531:0:0:0: Attached scsi generic sg11 type 5
ata65531.00: disabled
   Vendor: IBM       Model: DROM00205         Rev: NR38
   Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65533:0:0:0: Attached scsi generic sg11 type 5
ata65533.00: disabled
kobject_add failed for host0 with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
[C0000000B565B1A0] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable)
[C0000000B565B240] [C0000000001B28E0] .kobject_add+0x1a4/0x1fc
[C0000000B565B2E0] [C000000000229EE0] .class_device_add+0xb4/0x4e4
[C0000000B565B3B0] [D0000000001D1F2C] .scsi_add_host+0xf8/0x208 [scsi_mod]
[C0000000B565B450] [D00000000052B52C] .ata_scsi_add_hosts+0xa4/0x160 [libata]
[C0000000B565B500] [D000000000527C0C] .ata_host_register+0xec/0x368 [libata]
[C0000000B565B5D0] [D000000000527F1C] .ata_host_activate+0x94/0xe0 [libata]
[C0000000B565B680] [D0000000007D11B0] .pdc2027x_init_one+0x36c/0x39c [pata_pdc2027x]
[C0000000B565B730] [C0000000001C3530] .pci_device_probe+0x13c/0x1dc
[C0000000B565B7F0] [C0000000002287F0] .driver_probe_device+0xa0/0x16c
[C0000000B565B890] [C000000000228A58] .__driver_attach+0xb4/0x138
[C0000000B565B920] [C000000000227F14] .bus_for_each_dev+0x7c/0xd4
[C0000000B565B9E0] [C000000000228694] .driver_attach+0x28/0x40
[C0000000B565BA60] [C00000000022797C] .bus_add_driver+0x98/0x18c
[C0000000B565BB00] [C000000000228E58] .driver_register+0xa8/0xc4
[C0000000B565BB80] [C0000000001C3838] .__pci_register_driver+0x5c/0xa4
[C0000000B565BC10] [D0000000007D14D4] .pdc2027x_init+0x20/0x45c [pata_pdc2027x]
[C0000000B565BC90] [C000000000090B50] .sys_init_module+0x1764/0x1998
[C0000000B565BE30] [C0000000000086A4] syscall_exit+0x0/0x40
slab error in kmem_cache_destroy(): cache `scsi_cmd_cache': Can't free all objects
Call Trace:
[C0000000B565B070] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable)
[C0000000B565B110] [C0000000000E4020] .kmem_cache_destroy+0x94/0x1b0
[C0000000B565B1A0] [D0000000001D12D8] .scsi_destroy_command_freelist+0xa0/0xcc [scsi_mod]
[C0000000B565B230] [D0000000001D1720] .scsi_host_dev_release+0x80/0xe0 [scsi_mod]
[C0000000B565B2C0] [C00000000022580C] .device_release+0x4c/0x78
[C0000000B565B340] [C0000000001B2388] .kobject_cleanup+0x90/0xf0
[C0000000B565B3E0] [C0000000001B3470] .kref_put+0x84/0xa0
[C0000000B565B460] [C0000000001B22E0] .kobject_put+0x28/0x40
[C0000000B565B4E0] [C000000000225968] .put_device+0x1c/0x30
[C0000000B565B560] [D0000000001D168C] .scsi_host_put+0x14/0x28 [scsi_mod]
[C0000000B565B5E0] [D000000000528058] .ata_host_release+0xf0/0x14c [libata]
[C0000000B565B680] [C00000000022C720] .release_nodes+0x1c8/0x22c
[C0000000B565B750] [C00000000022CB98] .devres_release_all+0x58/0xd4
[C0000000B565B7F0] [C000000000228860] .driver_probe_device+0x110/0x16c
[C0000000B565B890] [C000000000228A58] .__driver_attach+0xb4/0x138
[C0000000B565B920] [C000000000227F14] .bus_for_each_dev+0x7c/0xd4
[C0000000B565B9E0] [C000000000228694] .driver_attach+0x28/0x40
[C0000000B565BA60] [C00000000022797C] .bus_add_driver+0x98/0x18c
[C0000000B565BB00] [C000000000228E58] .driver_register+0xa8/0xc4
[C0000000B565BB80] [C0000000001C3838] .__pci_register_driver+0x5c/0xa4
[C0000000B565BC10] [D0000000007D14D4] .pdc2027x_init+0x20/0x45c [pata_pdc2027x]
[C0000000B565BC90] [C000000000090B50] .sys_init_module+0x1764/0x1998
[C0000000B565BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Unable to handle kernel paging request for data at address 0x3a30322e332f3040
Faulting instruction address: 0xc0000000000843e4
cpu 0x4: Vector: 300 (Data Access) at [c0000000b565af10]
     pc: c0000000000843e4: .kthread_stop+0x3c/0xfc
     lr: c0000000000843e0: .kthread_stop+0x38/0xfc
     sp: c0000000b565b190
    msr: 8000000000009032
    dar: 3a30322e332f3040
  dsisr: 40000000
   current = 0xc0000000ea1294d0
   paca    = 0xc000000000475600
     pid   = 15221, comm = modprobe
*********************************************



Third occurrence:
**********************************************
  <7>pata_pdc2027x 0001:cc:01.0: version 0.74-ac5
<6>pata_pdc2027x 0001:cc:01.0: PLL input clock 32758 kHz
<6>ata9887: PATA max UDMA/133 cmd 0xD000080084DC07C0 ctl 0xD000080084DC0FDA bmdma 0xD000080084DC0000 irq 166
<6>ata9888: PATA max UDMA/133 cmd 0xD000080084DC05C0 ctl 0xD000080084DC0DDA bmdma 0xD000080084DC0008 irq 166
<6>scsi9887 : pata_pdc2027x
<6>ata9887.00: ATAPI, max UDMA/33
<6>ata9887.00: configured for UDMA/33
<6>scsi9888 : pata_pdc2027x
<4>ATA: abnormal status 0x8 on port 0xD000080084DC05DF
<5>  Vendor: IBM       Model: DROM00205         Rev: NR36
<5>  Type:   CD-ROM                             ANSI SCSI revision: 02
<4>sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
<7>sr 9887:0:0:0: Attached scsi CD-ROM sr0
<5>sr 9887:0:0:0: Attached scsi generic sg0 type 5
<4>ata9887.00: disabled
<1>Unable to handle kernel paging request for data at address 0xd000000000047878
<1>Faulting instruction address: 0xd0000000000821f0
cpu 0x1: Vector: 300 (Data Access) at [c000000070f03580]
     pc: d0000000000821f0: .scsi_device_dev_release_usercontext+0x40/0x1ac [scsi_mod]
     lr: c000000000077394: .execute_in_process_context+0x54/0xa0
     sp: c000000070f03800
    msr: 8000000000009032
    dar: d000000000047878
  dsisr: 40000000
   current = 0xc000000002cacad0
   paca    = 0xc0000000004a3780
     pid   = 2086, comm = hald


Thread overview: 7+ messages
2008-07-11 13:19 scsi_host_alloc does not check for used shost->host_no Daniel Debonzi
2008-07-15 20:16 ` Brian King
2008-07-15 20:25   ` Matthew Wilcox
2008-07-15 20:31     ` Brian King
2008-07-15 20:54       ` [PATCH] Make host_no an unsigned int Matthew Wilcox
2008-07-15 20:34     ` scsi_host_alloc does not check for used shost->host_no James Bottomley
2008-07-15 21:39       ` Daniel Debonzi [this message]
