* 2.6.15-git12, slab corruption in ipr
@ 2006-01-17 0:05 Olaf Hering
2006-01-18 18:42 ` Brian King
2006-01-30 18:07 ` 2.6.15-git12, slab corruption in ipr Olaf Hering
0 siblings, 2 replies; 24+ messages in thread
From: Olaf Hering @ 2006-01-17 0:05 UTC
To: linux-scsi, Brian J King
I tested the current Linus tree + a few SuSE patches on a p710. There is
some slab corruption. Will check if plain Linus tree gives the same...
dmesg | grep -wiC9 slab
sda: Write Protect is off
sda: Mode Sense: cb 00 00 08
SCSI device sda: drive cache: write through
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 00 08
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3 sda4
sd 0:0:3:0: Attached scsi disk sda
Slab corruption: start=c000000000431000, len=4096
d80: c0 00 00 00 00 43 1d 80 c0 00 00 00 00 43 1d 80
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 00 08
SCSI device sdb: drive cache: write through
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
sdb: Write Protect is off
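
A note on reading the "Slab corruption" line in the excerpt above: the dump can be decoded mechanically. The sketch below is not from the thread. It assumes the 16 bytes are two big-endian 64-bit words (this is a ppc64 kernel) and that "d80" is a hex offset from "start". The function name decode_slab_dump is made up for illustration.

```python
def decode_slab_dump(start_hex, dump_line, word_size=8):
    """Return (address, value) pairs for each big-endian word in a dump line.

    start_hex: the 'start=' value from the 'Slab corruption' message.
    dump_line: one '<offset>: <hex bytes>' line printed below it.
    """
    offset_text, byte_text = dump_line.split(":", 1)
    offset = int(offset_text, 16)
    data = bytes.fromhex(byte_text.replace(" ", ""))
    start = int(start_hex, 16)
    words = []
    for i in range(0, len(data), word_size):
        addr = start + offset + i
        value = int.from_bytes(data[i:i + word_size], "big")
        words.append((addr, value))
    return words


# Values copied from the dmesg excerpt above.
words = decode_slab_dump(
    "c000000000431000",
    "d80: c0 00 00 00 00 43 1d 80 c0 00 00 00 00 43 1d 80",
)
for addr, value in words:
    print(f"{addr:016x}: {value:016x}")
```

Both words decode to c000000000431d80, which is the address of the first word itself. Two adjacent pointers that both point back at the first one is what an empty list_head (next == prev == &head) looks like, so this may hint that a list head inside an already freed object was written to. That is an interpretation of the dump, not something the log states.
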
Page orders: linear mapping = 24, others = 12
Found initrd at 0xc000000002400000:0xc0000000026e7000
Partition configured for 4 cpus.
Starting Linux PPC64 #1 SMP Mon Jan 16 21:48:18 UTC 2006
-----------------------------------------------------
ppc64_pft_size = 0x1b
ppc64_interrupt_controller = 0x2
platform = 0x101
physicalMemorySize = 0x1e8000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address = 0x0000000000000000
htab_hash_mask = 0xfffff
-----------------------------------------------------
[boot]0100 MM Init
[boot]0100 MM Init Done
Linux version 2.6.15-git12-20060116214818-ppc64 (geeko@buildhost) (gcc version 4.1.0 20060109 (prerelease) (SUSE Linux)) #1 SMP Mon Jan 16 21:48:18 UTC 2006
[boot]0012 Setup Arch
Node 0 Memory: 0x0-0x1e8000000
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Using dedicated idle loop
On node 0 totalpages: 1998848
DMA zone: 1998848 pages, LIFO batch:31
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 0 pages, LIFO batch:0
HighMem zone: 0 pages, LIFO batch:0
[boot]0015 Setup Done
Built 1 zonelists
Kernel command line: root=/dev/sdd3 quiet sysrq=1 xmon=on
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 131072 bytes)
time_init: decrementer frequency = 207.051000 MHz
time_init: processor frequency = 1656.408000 MHz
Console: colour dummy device 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
freeing bootmem node 0
Memory: 7838780k/7995392k available (4304k kernel code, 156612k reserved, 1936k data, 836k bss, 252k init)
Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
checking if image is initramfs... it is
Freeing initrd memory: 2972k freed
Processor 1 found.
Processor 2 found.
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
migration_cost=1,0
NET: Registered protocol family 16
Installing base platform functions...
All base functions installed
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging enabled
mapping IO 3fe00200000 -> d000080000000000, size: 100000
mapping IO 3fe00700000 -> d000080000100000, size: 100000
PCI: Probing PCI hardware done
Registering pmac pic with sysfs...
usbcore: registered new driver usbfs
usbcore: registered new driver hub
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)
IBM eBus Device Driver
probe_bus_pseries: processing c0000001e7ff9df0
RTAS daemon started
RTAS: event: 1, Type: Platform Error, Severity: 2
probe_bus_pseries: processing c0000001e7ff9fb8
probe_bus_pseries: processing c0000001e7ffa100
probe_bus_pseries: processing c0000001e7ffa268
audit: initializing netlink socket (disabled)
audit(1137455211.949:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Using unsupported 640x480 display at 401f0000000, depth=8, pitch=640
Console: switching to colour frame buffer device 80x30
fb0: Open Firmware frame buffer device on /pci@800000020000003/pci@2,2/pci@1/display@0
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
RAMDISK driver initialized: 16 RAM disks of 123456K size 1024 blocksize
loop: loaded (max 8 devices)
input: Macintosh mouse button emulation as /class/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PDC20275: IDE controller at PCI slot 0000:cc:01.0
PCI: Enabling device: (0000:cc:01.0), cmd 3
PDC20275: chipset revision 1
PDC20275: 100% native mode on irq 134
ide2: BM-DMA at 0xdec00-0xdec07, BIOS settings: hde:pio, hdf:pio
ide3: BM-DMA at 0xdec08-0xdec0f, BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
hde: IBM RMBO0020501, ATAPI CD/DVD-ROM drive
ide2 at 0xde400-0xde407,0xddc02 on irq 134
Probing IDE interface ide3...
Probing IDE interface ide3...
PCI: Enabling device: (0000:c8:01.2), cmd 142
ehci_hcd 0000:c8:01.2: EHCI Host Controller
ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:c8:01.2: irq 133, io mem 0x400b0002000
ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd 0000:c8:01.0: OHCI Host Controller
ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:c8:01.0: irq 133, io mem 0x400b0001000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
hub 2-0:1.0: over-current change on port 1
ohci_hcd 0000:c8:01.1: OHCI Host Controller
ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:c8:01.1: irq 133, io mem 0x400b0000000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
hub 3-0:1.0: over-current change on port 1
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
oprofile: using ppc64/power5 performance monitoring.
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 524288 (order: 12, 16777216 bytes)
TCP bind hash table entries: 65536 (order: 9, 2097152 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 15
Freeing unused kernel memory: 252k freed
SCSI subsystem initialized
ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
ipr 0000:d0:01.0: Found IOA with IRQ: 135
ipr 0000:d0:01.0: Starting IOA initialization sequence.
ipr 0000:d0:01.0: Adapter firmware version: 020A0052
ipr 0000:d0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 00 08
SCSI device sda: drive cache: write through
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 00 08
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3 sda4
sd 0:0:3:0: Attached scsi disk sda
Slab corruption: start=c000000000431000, len=4096
d80: c0 00 00 00 00 43 1d 80 c0 00 00 00 00 43 1d 80
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 00 08
SCSI device sdb: drive cache: write through
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 00 08
SCSI device sdb: drive cache: write through
sdb: sdb1
sd 0:0:4:0: Attached scsi disk sdb
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdc: 71096640 512-byte hdwr sectors (36401 MB)
sd 0:0:3:0: Attached scsi generic sg0 type 0
sd 0:0:4:0: Attached scsi generic sg1 type 0
sdc: Write Protect is off
sdc: Mode Sense: cb 00 00 08
SCSI device sdc: drive cache: write through
SCSI device sdc: 71096640 512-byte hdwr sectors (36401 MB)
sdc: Write Protect is off
sdc: Mode Sense: cb 00 00 08
SCSI device sdc: drive cache: write through
sdc: sdc1
sd 0:0:5:0: Attached scsi disk sdc
sd 0:0:5:0: Attached scsi generic sg2 type 0
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdd: 71096640 512-byte hdwr sectors (36401 MB)
sdd: Write Protect is off
sdd: Mode Sense: cb 00 00 08
SCSI device sdd: drive cache: write through
SCSI device sdd: 71096640 512-byte hdwr sectors (36401 MB)
sdd: Write Protect is off
sdd: Mode Sense: cb 00 00 08
SCSI device sdd: drive cache: write through
sdd: sdd1 sdd2 sdd3
sd 0:0:8:0: Attached scsi disk sdd
sd 0:0:8:0: Attached scsi generic sg3 type 0
Vendor: IBM Model: VSBPD4E2 U4SCSI Rev: 4142
Type: Enclosure ANSI SCSI revision: 02
0:0:15:0: Attached scsi generic sg4 type 13
scsi: unknown device type 31
Vendor: IBM Model: 570B001 Rev: 0150
Type: Unknown ANSI SCSI revision: 00
0:255:255:255: Attached scsi generic sg5 type 31
ReiserFS: sdd3: found reiserfs format "3.6" with standard journal
ReiserFS: sdd3: using ordered data mode
ReiserFS: sdd3: journal params: device sdd3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sdd3: checking transaction log (sdd3)
ReiserFS: sdd3: Using r5 hash to sort names
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdc1 ...
md: adding sdc1 ...
md: adding sdb1 ...
md: created md0
md: bind<sdb1>
md: bind<sdc1>
md: running: <sdc1><sdb1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
dm-netlink version 0.0.2 loaded
md: raid1 personality registered for level 1
raid1: raid set md0 active with 1 out of 2 mirrors
md: ... autorun DONE.
hde: ATAPI 31X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache
Uniform CD-ROM driver Revision: 3.20
ts: Compaq touchscreen protocol output
Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
PCI: Enabling device: (0000:c0:01.0), cmd 143
Intel(R) PRO/10GbE Network Driver - version 1.0.100-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0000:c0:01.1), cmd 143
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0001:d8:01.0), cmd 143
eth0: Intel(R) PRO/10GbE Network Connection
Adding 1050616k swap on /dev/sda2. Priority:-1 extents:1 across:1050616k
Adding 2096472k swap on /dev/sdd2. Priority:-2 extents:1 across:2096472k
ioctl32(hald-probe-inpu:3250): Unknown cmd fd(4) cmd(40084502){00} arg(ffbdb95a) on /dev/input/ts0
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
subfs 0.9
ADDRCONF(NETDEV_UP): eth1: link is not ready
ISO 9660 Extensions: RRIP_1991A
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
--
short story of a lazy sysadmin:
alias appserv=wotan

* Re: 2.6.15-git12, slab corruption in ipr
2006-01-17 0:05 2.6.15-git12, slab corruption in ipr Olaf Hering
@ 2006-01-18 18:42 ` Brian King
2006-01-19 21:05 ` Olaf Hering
2006-01-30 18:07 ` 2.6.15-git12, slab corruption in ipr Olaf Hering
1 sibling, 1 reply; 24+ messages in thread
From: Brian King @ 2006-01-18 18:42 UTC
To: Olaf Hering; +Cc: linux-scsi, Brian J King

Olaf Hering wrote:
> I tested the current Linus tree + a few SuSE patches on a p710. There is
> some slab corruption. Will check if plain Linus tree gives the same...

I can't recreate the problem on my p5... Can you recreate it with the
current git tree without any other patches?

Brian

>
> dmesg | grep -wiC9 slab
> sda: Write Protect is off
> sda: Mode Sense: cb 00 00 08
> SCSI device sda: drive cache: write through
> SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
> sda: Write Protect is off
> sda: Mode Sense: cb 00 00 08
> SCSI device sda: drive cache: write through
> sda: sda1 sda2 sda3 sda4
> sd 0:0:3:0: Attached scsi disk sda
> Slab corruption: start=c000000000431000, len=4096
> d80: c0 00 00 00 00 43 1d 80 c0 00 00 00 00 43 1d 80
> Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
> Type: Direct-Access ANSI SCSI revision: 03
> SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: cb 00 00 08
> SCSI device sdb: drive cache: write through
> SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
> sdb: Write Protect is off
>
>
> Page orders: linear mapping = 24, others = 12
> Found initrd at 0xc000000002400000:0xc0000000026e7000
> Partition configured for 4 cpus.
> Starting Linux PPC64 #1 SMP Mon Jan 16 21:48:18 UTC 2006
> -----------------------------------------------------
> ppc64_pft_size = 0x1b
> ppc64_interrupt_controller = 0x2
> platform = 0x101
> physicalMemorySize = 0x1e8000000
> ppc64_caches.dcache_line_size = 0x80
> ppc64_caches.icache_line_size = 0x80
> htab_address = 0x0000000000000000
> htab_hash_mask = 0xfffff
> -----------------------------------------------------
> [boot]0100 MM Init
> [boot]0100 MM Init Done
> Linux version 2.6.15-git12-20060116214818-ppc64 (geeko@buildhost) (gcc version 4.1.0 20060109 (prerelease) (SUSE Linux)) #1 SMP Mon Jan 16 21:48:18 UTC 2006
> [boot]0012 Setup Arch
> Node 0 Memory: 0x0-0x1e8000000
> EEH: PCI Enhanced I/O Error Handling Enabled
> PPC64 nvram contains 7168 bytes
> Using dedicated idle loop
> On node 0 totalpages: 1998848
> DMA zone: 1998848 pages, LIFO batch:31
> DMA32 zone: 0 pages, LIFO batch:0
> Normal zone: 0 pages, LIFO batch:0
> HighMem zone: 0 pages, LIFO batch:0
> [boot]0015 Setup Done
> Built 1 zonelists
> Kernel command line: root=/dev/sdd3 quiet sysrq=1 xmon=on
> [boot]0020 XICS Init
> xics: no ISA interrupt controller
> [boot]0021 XICS Done
> PID hash table entries: 4096 (order: 12, 131072 bytes)
> time_init: decrementer frequency = 207.051000 MHz
> time_init: processor frequency = 1656.408000 MHz
> Console: colour dummy device 80x25
> Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
> Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
> freeing bootmem node 0
> Memory: 7838780k/7995392k available (4304k kernel code, 156612k reserved, 1936k data, 836k bss, 252k init)
> Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
> Security Framework v1.0.0 initialized
> Mount-cache hash table entries: 256
> checking if image is initramfs... it is
> Freeing initrd memory: 2972k freed
> Processor 1 found.
> Processor 2 found.
> Processor 3 found.
> Brought up 4 CPUs
> Node 0 CPUs: 0-3
> migration_cost=1,0
> NET: Registered protocol family 16
> Installing base platform functions...
> All base functions installed
> PCI: Probing PCI hardware
> IOMMU table initialized, virtual merging enabled
> mapping IO 3fe00200000 -> d000080000000000, size: 100000
> mapping IO 3fe00700000 -> d000080000100000, size: 100000
> PCI: Probing PCI hardware done
> Registering pmac pic with sysfs...
> usbcore: registered new driver usbfs
> usbcore: registered new driver hub
> TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)
> IBM eBus Device Driver
> probe_bus_pseries: processing c0000001e7ff9df0
> RTAS daemon started
> RTAS: event: 1, Type: Platform Error, Severity: 2
> probe_bus_pseries: processing c0000001e7ff9fb8
> probe_bus_pseries: processing c0000001e7ffa100
> probe_bus_pseries: processing c0000001e7ffa268
> audit: initializing netlink socket (disabled)
> audit(1137455211.949:1): initialized
> Total HugeTLB memory allocated, 0
> VFS: Disk quotas dquot_6.5.1
> Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> Initializing Cryptographic API
> io scheduler noop registered
> io scheduler anticipatory registered
> io scheduler deadline registered
> io scheduler cfq registered
> Using unsupported 640x480 display at 401f0000000, depth=8, pitch=640
> Console: switching to colour frame buffer device 80x30
> fb0: Open Firmware frame buffer device on /pci@800000020000003/pci@2,2/pci@1/display@0
> vio_register_driver: driver hvc_console registering
> HVSI: registered 0 devices
> Generic RTC Driver v1.07
> Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
> pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
> RAMDISK driver initialized: 16 RAM disks of 123456K size 1024 blocksize
> loop: loaded (max 8 devices)
> input: Macintosh mouse button emulation as /class/input/input0
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> PDC20275: IDE controller at PCI slot 0000:cc:01.0
> PCI: Enabling device: (0000:cc:01.0), cmd 3
> PDC20275: chipset revision 1
> PDC20275: 100% native mode on irq 134
> ide2: BM-DMA at 0xdec00-0xdec07, BIOS settings: hde:pio, hdf:pio
> ide3: BM-DMA at 0xdec08-0xdec0f, BIOS settings: hdg:pio, hdh:pio
> Probing IDE interface ide2...
> hde: IBM RMBO0020501, ATAPI CD/DVD-ROM drive
> ide2 at 0xde400-0xde407,0xddc02 on irq 134
> Probing IDE interface ide3...
> Probing IDE interface ide3...
> PCI: Enabling device: (0000:c8:01.2), cmd 142
> ehci_hcd 0000:c8:01.2: EHCI Host Controller
> ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1
> ehci_hcd 0000:c8:01.2: irq 133, io mem 0x400b0002000
> ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
> usb usb1: configuration #1 chosen from 1 choice
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 5 ports detected
> ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
> ohci_hcd 0000:c8:01.0: OHCI Host Controller
> ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2
> ohci_hcd 0000:c8:01.0: irq 133, io mem 0x400b0001000
> usb usb2: configuration #1 chosen from 1 choice
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 3 ports detected
> hub 2-0:1.0: over-current change on port 1
> ohci_hcd 0000:c8:01.1: OHCI Host Controller
> ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3
> ohci_hcd 0000:c8:01.1: irq 133, io mem 0x400b0000000
> usb usb3: configuration #1 chosen from 1 choice
> hub 3-0:1.0: USB hub found
> hub 3-0:1.0: 2 ports detected
> hub 3-0:1.0: over-current change on port 1
> usbcore: registered new driver hiddev
> usbcore: registered new driver usbhid
> drivers/usb/input/hid-core.c: v2.6:USB HID core driver
> mice: PS/2 mouse device common for all mice
> md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
> md: bitmap version 4.39
> oprofile: using ppc64/power5 performance monitoring.
> NET: Registered protocol family 2
> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
> TCP established hash table entries: 524288 (order: 12, 16777216 bytes)
> TCP bind hash table entries: 65536 (order: 9, 2097152 bytes)
> TCP: Hash tables configured (established 524288 bind 65536)
> TCP reno registered
> NET: Registered protocol family 1
> NET: Registered protocol family 17
> NET: Registered protocol family 15
> Freeing unused kernel memory: 252k freed
> SCSI subsystem initialized
> ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
> ipr 0000:d0:01.0: Found IOA with IRQ: 135
> ipr 0000:d0:01.0: Starting IOA initialization sequence.
> ipr 0000:d0:01.0: Adapter firmware version: 020A0052
> ipr 0000:d0:01.0: IOA initialized.
> scsi0 : IBM 570B Storage Adapter
> Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
> Type: Direct-Access ANSI SCSI revision: 03
> SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
> sda: Write Protect is off
> sda: Mode Sense: cb 00 00 08
> SCSI device sda: drive cache: write through
> SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
> sda: Write Protect is off
> sda: Mode Sense: cb 00 00 08
> SCSI device sda: drive cache: write through
> sda: sda1 sda2 sda3 sda4
> sd 0:0:3:0: Attached scsi disk sda
> Slab corruption: start=c000000000431000, len=4096
> d80: c0 00 00 00 00 43 1d 80 c0 00 00 00 00 43 1d 80
> Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
> Type: Direct-Access ANSI SCSI revision: 03
> SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: cb 00 00 08
> SCSI device sdb: drive cache: write through
> SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: cb 00 00 08
> SCSI device sdb: drive cache: write through
> sdb: sdb1
> sd 0:0:4:0: Attached scsi disk sdb
> Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
> Type: Direct-Access ANSI SCSI revision: 03
> SCSI device sdc: 71096640 512-byte hdwr sectors (36401 MB)
> sd 0:0:3:0: Attached scsi generic sg0 type 0
> sd 0:0:4:0: Attached scsi generic sg1 type 0
> sdc: Write Protect is off
> sdc: Mode Sense: cb 00 00 08
> SCSI device sdc: drive cache: write through
> SCSI device sdc: 71096640 512-byte hdwr sectors (36401 MB)
> sdc: Write Protect is off
> sdc: Mode Sense: cb 00 00 08
> SCSI device sdc: drive cache: write through
> sdc: sdc1
> sd 0:0:5:0: Attached scsi disk sdc
> sd 0:0:5:0: Attached scsi generic sg2 type 0
> Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
> Type: Direct-Access ANSI SCSI revision: 03
> SCSI device sdd: 71096640 512-byte hdwr sectors (36401 MB)
> sdd: Write Protect is off
> sdd: Mode Sense: cb 00 00 08
> SCSI device sdd: drive cache: write through
> SCSI device sdd: 71096640 512-byte hdwr sectors (36401 MB)
> sdd: Write Protect is off
> sdd: Mode Sense: cb 00 00 08
> SCSI device sdd: drive cache: write through
> sdd: sdd1 sdd2 sdd3
> sd 0:0:8:0: Attached scsi disk sdd
> sd 0:0:8:0: Attached scsi generic sg3 type 0
> Vendor: IBM Model: VSBPD4E2 U4SCSI Rev: 4142
> Type: Enclosure ANSI SCSI revision: 02
> 0:0:15:0: Attached scsi generic sg4 type 13
> scsi: unknown device type 31
> Vendor: IBM Model: 570B001 Rev: 0150
> Type: Unknown ANSI SCSI revision: 00
> 0:255:255:255: Attached scsi generic sg5 type 31
> ReiserFS: sdd3: found reiserfs format "3.6" with standard journal
> ReiserFS: sdd3: using ordered data mode
> ReiserFS: sdd3: journal params: device sdd3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
> ReiserFS: sdd3: checking transaction log (sdd3)
> ReiserFS: sdd3: Using r5 hash to sort names
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: considering sdc1 ...
> md: adding sdc1 ...
> md: adding sdb1 ...
> md: created md0
> md: bind<sdb1>
> md: bind<sdc1>
> md: running: <sdc1><sdb1>
> md: kicking non-fresh sdc1 from array!
> md: unbind<sdc1>
> md: export_rdev(sdc1)
> device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
> dm-netlink version 0.0.2 loaded
> md: raid1 personality registered for level 1
> raid1: raid set md0 active with 1 out of 2 mirrors
> md: ... autorun DONE.
> hde: ATAPI 31X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache
> Uniform CD-ROM driver Revision: 3.20
> ts: Compaq touchscreen protocol output
> Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI
> Copyright (c) 1999-2005 Intel Corporation.
> PCI: Enabling device: (0000:c0:01.0), cmd 143
> Intel(R) PRO/10GbE Network Driver - version 1.0.100-k2-NAPI
> Copyright (c) 1999-2005 Intel Corporation.
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> PCI: Enabling device: (0000:c0:01.1), cmd 143
> e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
> PCI: Enabling device: (0001:d8:01.0), cmd 143
> eth0: Intel(R) PRO/10GbE Network Connection
> Adding 1050616k swap on /dev/sda2. Priority:-1 extents:1 across:1050616k
> Adding 2096472k swap on /dev/sdd2. Priority:-2 extents:1 across:2096472k
> ioctl32(hald-probe-inpu:3250): Unknown cmd fd(4) cmd(40084502){00} arg(ffbdb95a) on /dev/input/ts0
> NET: Registered protocol family 10
> lo: Disabled Privacy Extensions
> IPv6 over IPv4 tunneling driver
> subfs 0.9
> ADDRCONF(NETDEV_UP): eth1: link is not ready
> ISO 9660 Extensions: RRIP_1991A
> e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
> ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
>

--
Brian King
eServer Storage I/O
IBM Linux Technology Center

* Re: 2.6.15-git12, slab corruption in ipr
2006-01-18 18:42 ` Brian King
@ 2006-01-19 21:05 ` Olaf Hering
2006-01-30 10:46 ` Olaf Hering
0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-01-19 21:05 UTC
To: Brian King; +Cc: linux-scsi, Brian J King

On Wed, Jan 18, Brian King wrote:

> Olaf Hering wrote:
> > I tested the current Linus tree + a few SuSE patches on a p710. There is
> > some slab corruption. Will check if plain Linus tree gives the same...
>
> I can't recreate the problem on my p5... Can you recreate it with the
> current git tree without any other patches?

I tried rc1-git1 with ipr compiled in and did not see it.
The system was used for other testing and will be busy for a bit. I will
look at it later.

--
short story of a lazy sysadmin:
alias appserv=wotan

* Re: 2.6.15-git12, slab corruption in ipr
2006-01-19 21:05 ` Olaf Hering
@ 2006-01-30 10:46 ` Olaf Hering
2006-01-30 16:49 ` Olaf Hering
0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-01-30 10:46 UTC
To: Brian King; +Cc: linux-scsi, Brian J King

On Thu, Jan 19, Olaf Hering wrote:

> On Wed, Jan 18, Brian King wrote:
>
> > Olaf Hering wrote:
> > > I tested the current Linus tree + a few SuSE patches on a p710. There is
> > > some slab corruption. Will check if plain Linus tree gives the same...
> >
> > I can't recreate the problem on my p5... Can you recreate it with the
> > current git tree without any other patches?
>
> I tried rc1-git1 with ipr compiled in and did not see it.
> The system was used for other testing and will be busy for a bit. I will
> look at it later.

I see it not only on ipr systems, also on JS20 with the media tray
assigned. It doesn't reproduce all the time. I'm currently down to 'only
20 patches applied from our CVS'. The symptoms differ, I suspect the bug
is also present in mainline.

Just to let you know. Still looking.

...
Freeing unused kernel memory: 260k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
Slab corruption: start=c000000000455000, len=4096
d80: c0 00 00 00 00 45 5d 80 c0 00 00 00 00 45 5d 80
Loading loop
loop: loaded (max 255 devices)
Loading ipr
...

--
short story of a lazy sysadmin:
alias appserv=wotan

* Re: 2.6.15-git12, slab corruption in ipr
2006-01-30 10:46 ` Olaf Hering
@ 2006-01-30 16:49 ` Olaf Hering
2006-02-06 22:04 ` 2.6.16-rc1 crash in scsi_target_reap_work Olaf Hering
0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-01-30 16:49 UTC
To: Brian King; +Cc: linux-scsi, Brian J King

On Mon, Jan 30, Olaf Hering wrote:

> I see it not only on ipr systems, also on JS20 with the media tray
> assigned. It doesn't reproduce all the time. I'm currently down to 'only
> 20 patches applied from our CVS'. The symptoms differ, I suspect the bug
> is also present in mainline.
>
> Just to let you know. Still looking.

I guess you don't use the latest udev technology...
This is what I got with only 10 (unrelated) patches applied:

TCP reno registered
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 15
Freeing unused kernel memory: 260k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
Loading loop
loop: loaded (max 255 devices)
Loading ipr
ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
ipr 0000:c0:01.0: Found IOA with IRQ: 99
ipr 0000:c0:01.0: Starting IOA initialization sequence.
ipr 0000:c0:01.0: Adapter firmware version: 020A004E
ipr 0000:c0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
Vendor: IBM Model: ST373453LC Rev: C51A
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
sda: sda1 sda2 sda3 sda4
sd 0:0:3:0: Attached scsi disk sda
Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812
Type: Enclosure ANSI SCSI revision: 02
Unable to handle kernel paging request for data at address 0x00000004
Faulting instruction address: 0xc0000000001dcc98
cpu 0x0: Vector: 300 (Data Access) at [c0000000ebcd37e0]
pc: c0000000001dcc98: ._raw_spin_lock+0x28/0x17c
lr: c000000000388b40: ._spin_lock+0x10/0x24
sp: c0000000ebcd3a60
msr: 8000000000009032
dar: 4
dsisr: 40000000
current = 0xc0000000ebcc1000
paca = 0xc0000000004a6e00
pid = 26, comm = events/0
enter ? for help
0:mon> t
[c0000000ebcd3af0] c000000000388b40 ._spin_lock+0x10/0x24
[c0000000ebcd3b70] c000000000385380 .klist_del+0x28/0x58
[c0000000ebcd3c00] c000000000262bb0 .device_del+0x50/0x120
[c0000000ebcd3ca0] d00000000007ac18 .scsi_target_reap_work+0xe0/0x12c [scsi_mod]
[c0000000ebcd3d30] c000000000077bdc .run_workqueue+0x108/0x19c
[c0000000ebcd3dd0] c000000000077dc0 .worker_thread+0x150/0x1c0
[c0000000ebcd3ed0] c00000000007d72c .kthread+0x140/0x190
[c0000000ebcd3f90] c000000000025d1c .kernel_thread+0x4c/0x68


knode_parent is all zeros.
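
The fault address fits the remark that knode_parent is all zeros. What follows is a hedged sketch, not part of the original report. It assumes that klist_del() takes the lock of the klist that knode_parent.n_klist points to, and it assumes the debug spinlock layout that the gdb output in this message shows (raw_lock, magic, owner_cpu, owner, with k_lock as the first member of the klist). Under those assumptions, with n_klist == NULL, a debug check that reads the lock's magic field would touch offset 4. The ctypes classes are stand-ins written for this illustration, not kernel definitions.

```python
import ctypes


class RawSpinlock(ctypes.Structure):
    # Assumed: a single 32-bit lock word, as in "raw_lock = {slock = 0x0}".
    _fields_ = [("slock", ctypes.c_uint32)]


class DebugSpinlock(ctypes.Structure):
    # Field order copied from the gdb dump; field widths are assumptions.
    _fields_ = [
        ("raw_lock", RawSpinlock),
        ("magic", ctypes.c_uint32),
        ("owner_cpu", ctypes.c_uint32),
        ("owner", ctypes.c_uint64),
    ]


class ListHead(ctypes.Structure):
    # Pointers modelled as 64-bit integers so the layout is host independent.
    _fields_ = [("next", ctypes.c_uint64), ("prev", ctypes.c_uint64)]


class Klist(ctypes.Structure):
    # Member order as printed for klist_children: k_lock, k_list, get, put.
    _fields_ = [
        ("k_lock", DebugSpinlock),
        ("k_list", ListHead),
        ("get", ctypes.c_uint64),
        ("put", ctypes.c_uint64),
    ]


n_klist = 0  # knode_parent.n_klist as reported: NULL
fault_address = n_klist + Klist.k_lock.offset + DebugSpinlock.magic.offset
print(hex(fault_address))
```

This prints 0x4, which agrees with "data at address 0x00000004" and "dar: 4" in the xmon output, so a NULL n_klist is at least a plausible explanation for the oops.
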
device_del(): (gdb) p/x dev $1 = {klist_children = {k_lock = {raw_lock = {slock = 0x0}, magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff}, k_list = { next = 0xc00000006f033710, prev = 0xc00000006f033710}, get = 0xc000000000620a20, put = 0xc0000000006209f0}, knode_parent = {n_klist = 0x0, n_node = { next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, knode_driver = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = { refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = { next = 0x0, prev = 0x0}}}}, knode_bus = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, parent = 0xc00000000fc7e1a8, kobj = {k_name = 0xc00000006f033830, name = {0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x30, 0x3a, 0x32, 0x35, 0x35, 0x3a, 0x33, 0x38, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, kref = {refcount = {counter = 0x1}}, entry = {next = 0xc00000006f033848, prev = 0xc00000006f033848}, parent = 0xc00000000fc7e2d8, kset = 0xc000000000509508, ktype = 0x0, dentry = 0x0}, bus_id = {0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x30, 0x3a, 0x32, 0x35, 0x35, 0x3a, 0x33, 0x38, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, uevent_attr = {attr = {name = 0x0, owner = 0x0, mode = 0x0}, show = 0x0, store = 0x0}, sem = {count = {counter = 0x1}, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff}, task_list = {next = 0xc00000006f0338d8, prev = 0xc00000006f0338d8}}}, bus = 0x0, driver = 0x0, driver_data = 0x0, platform_data = 0x0, firmware_data = 0x0, power = {power_state = {event = 0x0}, can_wakeup = 0x0}, 
dma_mask = 0x0, coherent_dma_mask = 0x0, dma_pools = {next = 0xc00000006f033928, prev = 0xc00000006f033928}, dma_mem = 0x0, release = 0xd0000000000a4d38}

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
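A note on why the fault address is 0x4, as a minimal sketch. The trace shows klist_del() taking a spinlock, and the dump shows knode_parent.n_klist = 0x0, so the lock being taken presumably lives inside a klist at address NULL. Assuming the field order that gdb prints in the dump (k_lock first in the klist, and magic directly after the 4-byte raw_lock), a debug check that reads the lock's magic field would touch address 4, which matches dar. The model_* structs below are hypothetical userspace stand-ins for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins that mirror the field order gdb printed. */
struct model_raw_lock {
	uint32_t slock;
};

struct model_spinlock {
	struct model_raw_lock raw_lock;
	uint32_t magic;		/* 0xdead4ead in the initialised locks of the dump */
	uint32_t owner_cpu;
	void *owner;
};

struct model_list_head {
	struct model_list_head *next, *prev;
};

struct model_klist {
	struct model_spinlock k_lock;
	struct model_list_head k_list;
	void (*get)(void *);
	void (*put)(void *);
};

/*
 * Address that a check of k_lock.magic would read for a klist located
 * at klist_addr. With klist_addr == 0 (n_klist was NULL) this is the
 * offset of magic inside the klist.
 */
static uintptr_t magic_address(uintptr_t klist_addr)
{
	return klist_addr + offsetof(struct model_klist, k_lock)
			  + offsetof(struct model_spinlock, magic);
}
```

Calling magic_address(0) models the NULL n_klist case. Under this reading the crash is a symptom of the device never having been linked into its parent's klist (or having been unlinked already), rather than of a corrupted lock.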
* 2.6.16-rc1 crash in scsi_target_reap_work
2006-01-30 16:49 ` Olaf Hering
@ 2006-02-06 22:04 ` Olaf Hering
2006-02-06 22:26 ` Olaf Hering
2006-02-06 22:44 ` James Bottomley
0 siblings, 2 replies; 24+ messages in thread
From: Olaf Hering @ 2006-02-06 22:04 UTC (permalink / raw)
To: linux-scsi

On Mon, Jan 30, Olaf Hering wrote:

> Unable to handle kernel paging request for data at address 0x00000004
> Faulting instruction address: 0xc0000000001dcc98
> cpu 0x0: Vector: 300 (Data Access) at [c0000000ebcd37e0]
> pc: c0000000001dcc98: ._raw_spin_lock+0x28/0x17c
> lr: c000000000388b40: ._spin_lock+0x10/0x24
> sp: c0000000ebcd3a60
> msr: 8000000000009032
> dar: 4
> dsisr: 40000000
> current = 0xc0000000ebcc1000
> paca = 0xc0000000004a6e00
> pid = 26, comm = events/0
> enter ? for help
> 0:mon> t
> [c0000000ebcd3af0] c000000000388b40 ._spin_lock+0x10/0x24
> [c0000000ebcd3b70] c000000000385380 .klist_del+0x28/0x58
> [c0000000ebcd3c00] c000000000262bb0 .device_del+0x50/0x120
> [c0000000ebcd3ca0] d00000000007ac18 .scsi_target_reap_work+0xe0/0x12c [scsi_mod]
> [c0000000ebcd3d30] c000000000077bdc .run_workqueue+0x108/0x19c
> [c0000000ebcd3dd0] c000000000077dc0 .worker_thread+0x150/0x1c0
> [c0000000ebcd3ed0] c00000000007d72c .kthread+0x140/0x190
> [c0000000ebcd3f90] c000000000025d1c .kernel_thread+0x4c/0x68
>
>
> knode_parent is all zeros.
> > device_del(): > (gdb) p/x dev > $1 = {klist_children = {k_lock = {raw_lock = {slock = 0x0}, magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff}, k_list = { > next = 0xc00000006f033710, prev = 0xc00000006f033710}, get = 0xc000000000620a20, put = 0xc0000000006209f0}, knode_parent = {n_klist = 0x0, n_node = { > next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, > owner_cpu = 0x0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, knode_driver = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = { > refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = { > next = 0x0, prev = 0x0}}}}, knode_bus = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0x0}}, n_removed = {done = 0x0, > wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, parent = 0xc00000000fc7e1a8, > kobj = {k_name = 0xc00000006f033830, name = {0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x30, 0x3a, 0x32, 0x35, 0x35, 0x3a, 0x33, 0x38, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > kref = {refcount = {counter = 0x1}}, entry = {next = 0xc00000006f033848, prev = 0xc00000006f033848}, parent = 0xc00000000fc7e2d8, kset = 0xc000000000509508, > ktype = 0x0, dentry = 0x0}, bus_id = {0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x30, 0x3a, 0x32, 0x35, 0x35, 0x3a, 0x33, 0x38, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > uevent_attr = {attr = {name = 0x0, owner = 0x0, mode = 0x0}, show = 0x0, store = 0x0}, sem = {count = {counter = 0x1}, wait = {lock = {raw_lock = {slock = 0x0}, > magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff}, task_list = {next = 0xc00000006f0338d8, prev = 0xc00000006f0338d8}}}, bus = 0x0, > driver = 0x0, driver_data = 0x0, platform_data = 0x0, firmware_data = 0x0, power = {power_state = {event 
= 0x0}, can_wakeup = 0x0}, dma_mask = 0x0,
> coherent_dma_mask = 0x0, dma_pools = {next = 0xc00000006f033928, prev = 0xc00000006f033928}, dma_mem = 0x0, release = 0xd0000000000a4d38}

I need help on this one.
What I have so far (via git-bisect) is:
f62870db3c73683fe566a05efa2a05f3faeb44f5 is a bad one.
But the (or a) good one is 5e03e2c48fc2952f6a9e986cfa194fe905d0f569.
So the remaining commits are all documentation changes.
Maybe I'm just not using git-bisect correctly, or maybe the changes are not all that linear.

I started with:
git-bisect bad 7a0268fa1a3613f2c526a9b3058701b277f6abe1
git-bisect good a020ff412f0ecbb1e4aae1681b287e5785dd77b5

2006-02-05 14:07 .git/refs/bisect/good-a020ff412f0ecbb1e4aae1681b287e5785dd77b5
2006-02-05 21:25 .git/refs/bisect/good-f61ea1b0c825a20a1826bb43a226387091934586
2006-02-06 07:02 .git/refs/bisect/good-f2e46561cc1afa82b18b2fc6efc8510ec57c7d7d
2006-02-06 10:24 .git/refs/bisect/good-d779188d2baf436e67fe8816fca2ef53d246900f
2006-02-06 14:37 .git/refs/bisect/good-1cb9e8e01d2c73184e2074f37cd155b3c4fdaae6
2006-02-06 17:43 .git/refs/bisect/good-54e08a2392e99ba9e48ce1060e0b52a39118419c
2006-02-06 19:52 .git/refs/bisect/bad
2006-02-06 21:57 .git/refs/bisect/good-4a4efbdee278b2f4ed91aad2db5c006ff754276e

142 Starting Linux PPC64 #23 SMP Sun Feb 5 14:14:14 CET 2006 good
96 Starting Linux PPC64 #24 SMP Sun Feb 5 16:48:11 CET 2006 bad
234 Starting Linux PPC64 #25 SMP Sun Feb 5 18:21:23 CET 2006 bad
744 Starting Linux PPC64 #26 SMP Sun Feb 5 21:32:57 CET 2006 good
270 Starting Linux PPC64 #27 SMP Mon Feb 6 07:10:00 CET 2006 good
298 Starting Linux PPC64 #28 SMP Mon Feb 6 10:42:03 CET 2006 good
234 Starting Linux PPC64 #29 SMP Mon Feb 6 14:43:20 CET 2006 good
54 Starting Linux PPC64 #30 SMP Mon Feb 6 17:50:49 CET 2006 good ?
18 Starting Linux PPC64 #31 SMP Mon Feb 6 18:56:07 CET 2006
154 Starting Linux PPC64 #32 SMP Mon Feb 6 19:56:10 CET 2006 bad
50 Starting Linux PPC64 #33 SMP Mon Feb 6 22:03:54 CET 2006 still going
^^ divide the first column by 2 to get the number of boots

One of the good ones did not reproduce within 370 reboots.
Usually, it crashes in fewer than 80.

Looking back through the git-commit mails, I'm past a huge pile of unrelated network driver changes.
Will continue with git-bisect.

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
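A rough way to judge how many clean boots are needed before a commit can be marked good, given that the crash is intermittent. This is a sketch under a loud assumption that the thread does not establish: each boot of a bad kernel crashes independently with a fixed probability p. boots_needed() is a hypothetical helper for illustration and not part of git-bisect.

```c
/*
 * Smallest number of consecutive clean boots n such that a truly bad
 * kernel would survive all n boots with probability <= alpha, i.e. the
 * smallest n with (1 - p)^n <= alpha.
 *
 * ASSUMPTION: every boot crashes independently with probability p.
 * Returns 0 when the inputs give no meaningful finite answer
 * (p outside (0, 1], or alpha <= 0) or when alpha >= 1.
 */
static unsigned boots_needed(double p, double alpha)
{
	double survive = 1.0;	/* probability of no crash so far */
	unsigned n = 0;

	if (p <= 0.0 || p > 1.0 || alpha <= 0.0)
		return 0;

	while (survive > alpha) {
		survive *= 1.0 - p;
		n++;
	}
	return n;
}
```

With an assumed p of 1/80 and a 1% tolerance for a false "good", boots_needed(1.0 / 80, 0.01) comes out at a few hundred boots. That is in the same range as the 370-reboot run mentioned in the message, and far more than the shorter runs that were also marked good, which may explain a bisect ending up in documentation-only commits.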
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-06 22:04 ` 2.6.16-rc1 crash in scsi_target_reap_work Olaf Hering
@ 2006-02-06 22:26 ` Olaf Hering
2006-02-06 22:44 ` James Bottomley
1 sibling, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2006-02-06 22:26 UTC (permalink / raw)
To: linux-scsi

On Mon, Feb 06, Olaf Hering wrote:

> I started with:
> git-bisect bad 7a0268fa1a3613f2c526a9b3058701b277f6abe1
> git-bisect good a020ff412f0ecbb1e4aae1681b287e5785dd77b5

I just learned there is a log: .git/BISECT_LOG

git-bisect start
# bad: [7a0268fa1a3613f2c526a9b3058701b277f6abe1] powerpc/64: per cpu data optimisations
git-bisect bad 7a0268fa1a3613f2c526a9b3058701b277f6abe1
# good: [a020ff412f0ecbb1e4aae1681b287e5785dd77b5] Fix pragma packing in ip2 driver
git-bisect good a020ff412f0ecbb1e4aae1681b287e5785dd77b5
# bad: [864472e9b8fa76ffaad17dfcb84d79e16df6828c] NFSv4: Make open recovery track O_RDWR, O_RDONLY and O_WRONLY correctly
git-bisect bad 864472e9b8fa76ffaad17dfcb84d79e16df6828c
# bad: [25c862cc9ea9b312c25a9f577f91b973131f1261] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
git-bisect bad 25c862cc9ea9b312c25a9f577f91b973131f1261
# good: [f61ea1b0c825a20a1826bb43a226387091934586] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
git-bisect good f61ea1b0c825a20a1826bb43a226387091934586
# good: [f2e46561cc1afa82b18b2fc6efc8510ec57c7d7d] sky2: no irq disable needed during tx
git-bisect good f2e46561cc1afa82b18b2fc6efc8510ec57c7d7d
# good: [d779188d2baf436e67fe8816fca2ef53d246900f] Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
git-bisect good d779188d2baf436e67fe8816fca2ef53d246900f
# good: [1cb9e8e01d2c73184e2074f37cd155b3c4fdaae6] Merge branch 'upstream' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
git-bisect good 1cb9e8e01d2c73184e2074f37cd155b3c4fdaae6
# good: [54e08a2392e99ba9e48ce1060e0b52a39118419c] kbuild: tags file generation fixup
git-bisect good 54e08a2392e99ba9e48ce1060e0b52a39118419c
# bad: [52347f4e810ba323d02cd2c26b5d738f4a2c3d5e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
git-bisect bad 52347f4e810ba323d02cd2c26b5d738f4a2c3d5e
# bad: [f62870db3c73683fe566a05efa2a05f3faeb44f5] Documentation/SubmittingPatches: update Trivial Patch Monkey information
git-bisect bad f62870db3c73683fe566a05efa2a05f3faeb44f5
# good: [4a4efbdee278b2f4ed91aad2db5c006ff754276e] s/retreiv/retriev/g
git-bisect good 4a4efbdee278b2f4ed91aad2db5c006ff754276e

Guess I need to let the good ones run much longer now.

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-06 22:04 ` 2.6.16-rc1 crash in scsi_target_reap_work Olaf Hering
2006-02-06 22:26 ` Olaf Hering
@ 2006-02-06 22:44 ` James Bottomley
2006-02-09 20:05 ` Olaf Hering
1 sibling, 1 reply; 24+ messages in thread
From: James Bottomley @ 2006-02-06 22:44 UTC (permalink / raw)
To: Olaf Hering; +Cc: linux-scsi

On Mon, 2006-02-06 at 23:04 +0100, Olaf Hering wrote:
> I need help on this one.
> What I have so far (via git-bisect) is:

My guess would be this patch:

commit 863a930a40eb7f2d18534c2c166b22582f5c6cfd

[SCSI] fix scsi_reap_target() device_del from atomic context

scsi_reap_target() was desgined to be called from any context.
However it must do a device_del() of the target device, which may only
be called from user context. Thus we have to reimplement
scsi_reap_target() via a workqueue.

James

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-06 22:44 ` James Bottomley
@ 2006-02-09 20:05 ` Olaf Hering
2006-02-10 10:11 ` Olaf Hering
0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-02-09 20:05 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi

On Mon, Feb 06, James Bottomley wrote:

> On Mon, 2006-02-06 at 23:04 +0100, Olaf Hering wrote:
> > I need help on this one.
> > What I have so far (via git-bisect) is:
>
> My guess would be this patch:
>
>
> commit
> 863a930a40eb7f2d18534c2c166b22582f5c6cfd
> [SCSI] fix scsi_reap_target() device_del from atomic context

I'm testing this patch now.

+++ linux-2.6.16-rc2-olh/drivers/scsi/scsi_scan.c
@@ -428,18 +428,19 @@ static void scsi_target_reap_work(void *
  */
 void scsi_target_reap(struct scsi_target *starget)
 {
-	struct work_queue_wrapper *wqw =
-		kzalloc(sizeof(struct work_queue_wrapper), GFP_ATOMIC);
+	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	unsigned long flags;
+	spin_lock_irqsave(shost->host_lock, flags);
 
-	if (!wqw) {
-		starget_printk(KERN_ERR, starget,
-			"Failed to allocate memory in scsi_reap_target()\n");
+	if (--starget->reap_ref == 0 && list_empty(&starget->devices)) {
+		list_del_init(&starget->siblings);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		device_del(&starget->dev);
+		transport_unregister_device(&starget->dev);
+		put_device(&starget->dev);
 		return;
 	}
-
-	INIT_WORK(&wqw->work, scsi_target_reap_work, wqw);
-	wqw->starget = starget;
-	schedule_work(&wqw->work);
+	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
 /**

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-09 20:05 ` Olaf Hering
@ 2006-02-10 10:11 ` Olaf Hering
2006-02-10 14:04 ` James Bottomley
0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-02-10 10:11 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi

On Thu, Feb 09, Olaf Hering wrote:

> On Mon, Feb 06, James Bottomley wrote:
>
> > On Mon, 2006-02-06 at 23:04 +0100, Olaf Hering wrote:
> > > I need help on this one.
> > > What I have so far (via git-bisect) is:
> >
> > My guess would be this patch:
> >
> >
> > commit
> > 863a930a40eb7f2d18534c2c166b22582f5c6cfd
> > [SCSI] fix scsi_reap_target() device_del from atomic context
>
> I'm testing this patch now.

550 reboots without crash, with this patch reverted.
Will try the execute_in_process_context thing now.

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 10:11 ` Olaf Hering
@ 2006-02-10 14:04 ` James Bottomley
2006-02-10 14:10 ` Olaf Hering
2006-02-10 21:28 ` Brian King
0 siblings, 2 replies; 24+ messages in thread
From: James Bottomley @ 2006-02-10 14:04 UTC (permalink / raw)
To: Olaf Hering; +Cc: linux-scsi

On Fri, 2006-02-10 at 11:11 +0100, Olaf Hering wrote:
> 550 reboots without crash, with this patch reverted.
> Will try the execute_in_process_context thing now.

I wouldn't bother ... because of the structure, the
execute_in_process_context() patch must have the same bug, but the
context check will make it much more difficult to hit.

Go back to the original and see if you can diagnose what is NULL and
why. The target is supposed to have a reference on the parent, so what
was the parent in this case? If it's a host, there's nothing I can
think of that can produce the behaviour you see; if it's something else,
like an rport or phy then we may have a transport class issue.

James

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 14:04 ` James Bottomley
@ 2006-02-10 14:10 ` Olaf Hering
2006-02-10 23:01 ` Olaf Hering
2006-02-10 21:28 ` Brian King
1 sibling, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-02-10 14:10 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi

On Fri, Feb 10, James Bottomley wrote:

> On Fri, 2006-02-10 at 11:11 +0100, Olaf Hering wrote:
> > 550 reboots without crash, with this patch reverted.
> > Will try the execute_in_process_context thing now.
>
> I wouldn't bother ... because of the structure, the
> execute_in_process_context() patch must have the same bug, but the
> context check will make it much more difficult to hit.

OK, I will revert to the -git7 state and poke around once it crashes.
So far it has run OK for almost 150 reboots.

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work 2006-02-10 14:10 ` Olaf Hering @ 2006-02-10 23:01 ` Olaf Hering 2006-02-10 23:21 ` Brian King 0 siblings, 1 reply; 24+ messages in thread From: Olaf Hering @ 2006-02-10 23:01 UTC (permalink / raw) To: James Bottomley; +Cc: linux-scsi On Fri, Feb 10, Olaf Hering wrote: > On Fri, Feb 10, James Bottomley wrote: > > > On Fri, 2006-02-10 at 11:11 +0100, Olaf Hering wrote: > > > 550 reboots without crash, with this patch reverted. > > > Will try the execute_in_process_context thing now. > > > > I wouldn't bother ... because of the structure, the > > execute_in_process_context() patch must have the same bug, but the > > context check will make it much more difficult to hit. > > Ok, will revert to the -git7 status and poke around once it crashes. this is struct scsi_target in scsi_target_reap_work() (gdb) p st $1 = {starget_sdev_user = 0x0, siblings = {next = 0xc0000000024caca8, prev = 0xc0000000024caca8}, devices = {next = 0xc0000000024cacb8, prev = 0xc0000000024cacb8}, dev = {klist_children = {k_lock = {raw_lock = {slock = 0}, magic = 3735899821, owner_cpu = 4294967295, owner = 0xffffffffffffffff}, k_list = { next = 0xc0000000024cace0, prev = 0xc0000000024cace0}, get = 0xc000000000614f68, put = 0xc000000000614f38}, knode_parent = {n_klist = 0x0, n_node = { next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0}}, n_removed = {done = 0, wait = {lock = {raw_lock = {slock = 0}, magic = 0, owner_cpu = 0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, knode_driver = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = {refcount = { counter = 0}}, n_removed = {done = 0, wait = {lock = {raw_lock = {slock = 0}, magic = 0, owner_cpu = 0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, knode_bus = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0}}, n_removed = {done = 0, wait = { lock = {raw_lock = {slock = 0}, magic = 0, owner_cpu = 0, owner = 0x0}, task_list = {next = 0x0, 
prev = 0x0}}}}, parent = 0xc00000000303a1a8, kobj = { k_name = 0xc0000000024cae00 <Address 0xc0000000024cae00 out of bounds>, name = "target0:255:100\000\000\000\000", kref = {refcount = {counter = 6}}, entry = { next = 0xc0000000024cae18, prev = 0xc0000000024cae18}, parent = 0xc00000000303a2d8, kset = 0xc000000000500c88, ktype = 0x0, dentry = 0x0}, bus_id = "target0:255:100\000\000\000\000", uevent_attr = {attr = {name = 0x0, owner = 0x0, mode = 0}, show = 0, store = 0}, sem = {count = {counter = 1}, wait = {lock = {raw_lock = {slock = 0}, magic = 3735899821, owner_cpu = 4294967295, owner = 0xffffffffffffffff}, task_list = {next = 0xc0000000024caea8, prev = 0xc0000000024caea8}}}, bus = 0x0, driver = 0x0, driver_data = 0x0, platform_data = 0x0, firmware_data = 0x0, power = {power_state = {event = 0}, can_wakeup = 0}, dma_mask = 0x0, coherent_dma_mask = 0, dma_pools = {next = 0xc0000000024caef8, prev = 0xc0000000024caef8}, dma_mem = 0x0, release = 0xd0000000002190c0}, reap_ref = 0, channel = 255, id = 100, create = 0, scsi_level = 0 '\0', hostdata = 0x0, starget_data = 0x1ffffffec60} (gdb) p/x st $2 = {starget_sdev_user = 0x0, siblings = {next = 0xc0000000024caca8, prev = 0xc0000000024caca8}, devices = {next = 0xc0000000024cacb8, prev = 0xc0000000024cacb8}, dev = {klist_children = {k_lock = {raw_lock = {slock = 0x0}, magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff}, k_list = { next = 0xc0000000024cace0, prev = 0xc0000000024cace0}, get = 0xc000000000614f68, put = 0xc000000000614f38}, knode_parent = {n_klist = 0x0, n_node = { next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, knode_driver = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = { refcount = {counter = 0x0}}, n_removed = {done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 
0x0}, task_list = { next = 0x0, prev = 0x0}}}}, knode_bus = {n_klist = 0x0, n_node = {next = 0x0, prev = 0x0}, n_ref = {refcount = {counter = 0x0}}, n_removed = { done = 0x0, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0x0, owner_cpu = 0x0, owner = 0x0}, task_list = {next = 0x0, prev = 0x0}}}}, parent = 0xc00000000303a1a8, kobj = {k_name = 0xc0000000024cae00, name = {0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x30, 0x3a, 0x32, 0x35, 0x35, 0x3a, 0x31, 0x30, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0}, kref = {refcount = {counter = 0x6}}, entry = {next = 0xc0000000024cae18, prev = 0xc0000000024cae18}, parent = 0xc00000000303a2d8, kset = 0xc000000000500c88, ktype = 0x0, dentry = 0x0}, bus_id = {0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x30, 0x3a, 0x32, 0x35, 0x35, 0x3a, 0x31, 0x30, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0}, uevent_attr = {attr = {name = 0x0, owner = 0x0, mode = 0x0}, show = 0x0, store = 0x0}, sem = { count = {counter = 0x1}, wait = {lock = {raw_lock = {slock = 0x0}, magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff}, task_list = { next = 0xc0000000024caea8, prev = 0xc0000000024caea8}}}, bus = 0x0, driver = 0x0, driver_data = 0x0, platform_data = 0x0, firmware_data = 0x0, power = { power_state = {event = 0x0}, can_wakeup = 0x0}, dma_mask = 0x0, coherent_dma_mask = 0x0, dma_pools = {next = 0xc0000000024caef8, prev = 0xc0000000024caef8}, dma_mem = 0x0, release = 0xd0000000002190c0}, reap_ref = 0x0, channel = 0xff, id = 0x64, create = 0x0, scsi_level = 0x0, hostdata = 0x0, starget_data = 0x1ffffffec60} 1:mon> d c0000000024caca0 666 c0000000024caca0 0000000000000000 c0000000024caca8 |.............L..| c0000000024cacb0 c0000000024caca8 c0000000024cacb8 |.....L.......L..| c0000000024cacc0 c0000000024cacb8 00000000dead4ead |.....L........N.| c0000000024cacd0 ffffffff00000000 ffffffffffffffff |................| c0000000024cace0 c0000000024cace0 c0000000024cace0 |.....L.......L..| c0000000024cacf0 c000000000614f68 c000000000614f38 |.....aOh.....aO8| c0000000024cad00 
0000000000000000 0000000000000000 |................| c0000000024cad10 0000000000000000 0000000000000000 |................| c0000000024cad20 0000000000000000 0000000000000000 |................| c0000000024cad30 0000000000000000 0000000000000000 |................| c0000000024cad40 0000000000000000 0000000000000000 |................| c0000000024cad50 0000000000000000 0000000000000000 |................| c0000000024cad60 0000000000000000 0000000000000000 |................| c0000000024cad70 0000000000000000 0000000000000000 |................| c0000000024cad80 0000000000000000 0000000000000000 |................| c0000000024cad90 0000000000000000 0000000000000000 |................| c0000000024cada0 0000000000000000 0000000000000000 |................| c0000000024cadb0 0000000000000000 0000000000000000 |................| c0000000024cadc0 0000000000000000 0000000000000000 |................| c0000000024cadd0 0000000000000000 0000000000000000 |................| c0000000024cade0 0000000000000000 0000000000000000 |................| c0000000024cadf0 c00000000303a1a8 c0000000024cae00 |.............L..| c0000000024cae00 746172676574303a 3235353a31303000 |target0:255:100.| c0000000024cae10 0000000000000006 c0000000024cae18 |.............L..| c0000000024cae20 c0000000024cae18 c00000000303a2d8 |.....L..........| c0000000024cae30 c000000000500c88 0000000000000000 |.....P..........| c0000000024cae40 0000000000000000 746172676574303a |........target0:| c0000000024cae50 3235353a31303000 0000000000000000 |255:100.........| c0000000024cae60 0000000000000000 0000000000000000 |................| c0000000024cae70 0000000000000000 0000000000000000 |................| c0000000024cae80 0000000000000000 0000000100000000 |................| c0000000024cae90 00000000dead4ead ffffffff00000000 |......N.........| c0000000024caea0 ffffffffffffffff c0000000024caea8 |.............L..| c0000000024caeb0 c0000000024caea8 0000000000000000 |.....L..........| c0000000024caec0 0000000000000000 0000000000000000 
|................| c0000000024caed0 0000000000000000 0000000000000000 |................| c0000000024caee0 0000000000000000 0000000000000000 |................| c0000000024caef0 0000000000000000 c0000000024caef8 |.............L..| c0000000024caf00 c0000000024caef8 0000000000000000 |.....L..........| c0000000024caf10 d0000000002190c0 00000000000000ff |.....!..........| c0000000024caf20 0000006400000000 0000000000000000 |...d............| c0000000024caf30 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf40 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf50 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf60 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf70 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf80 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf90 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafa0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafb0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafc0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafd0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafe0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| .... -- short story of a lazy sysadmin: alias appserv=wotan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 23:01 ` Olaf Hering
@ 2006-02-10 23:21 ` Brian King
2006-02-10 23:29 ` Olaf Hering
0 siblings, 1 reply; 24+ messages in thread
From: Brian King @ 2006-02-10 23:21 UTC (permalink / raw)
To: Olaf Hering; +Cc: James Bottomley, linux-scsi

Olaf Hering wrote:
> On Fri, Feb 10, Olaf Hering wrote:
>
>> On Fri, Feb 10, James Bottomley wrote:
>>
>>>On Fri, 2006-02-10 at 11:11 +0100, Olaf Hering wrote:
>>>>550 reboots without crash, with this patch reverted.
>>>>Will try the execute_in_process_context thing now.
>>>I wouldn't bother ... because of the structure, the
>>>execute_in_process_context() patch must have the same bug, but the
>>>context check will make it much more difficult to hit.
>>Ok, will revert to the -git7 status and poke around once it crashes.
>
> this is struct scsi_target in scsi_target_reap_work()
>
> release = 0xd0000000002190c0}, reap_ref = 0, channel = 255, id = 100, create = 0, scsi_level = 0 '\0', hostdata = 0x0, starget_data = 0x1ffffffec60}
                                               ^^^^^^^^^^^^^

This is interesting.. This means that it is on ipr's logical scsi bus (255).
Assuming this is happening at boot time and not due to a user initiated scan
through sysfs, this means we are going through the scsi_add_device path,
rather than the scsi_scan_host path...

Brian > (gdb) p/x st > dma_mem = 0x0, release = 0xd0000000002190c0}, reap_ref = 0x0, channel = 0xff, id = 0x64, create = 0x0, scsi_level = 0x0, hostdata = 0x0, > starget_data = 0x1ffffffec60} > > 1:mon> d c0000000024caca0 666 > c0000000024caca0 0000000000000000 c0000000024caca8 |.............L..| > c0000000024cacb0 c0000000024caca8 c0000000024cacb8 |.....L.......L..| > c0000000024cacc0 c0000000024cacb8 00000000dead4ead |.....L........N.| > c0000000024cacd0 ffffffff00000000 ffffffffffffffff |................| > c0000000024cace0 c0000000024cace0 c0000000024cace0 |.....L.......L..| > c0000000024cacf0 c000000000614f68 c000000000614f38 |.....aOh.....aO8| > c0000000024cad00 0000000000000000 0000000000000000 |................| > c0000000024cad10 0000000000000000 0000000000000000 |................| > c0000000024cad20 0000000000000000 0000000000000000 |................| > c0000000024cad30 0000000000000000 0000000000000000 |................| > c0000000024cad40 0000000000000000 0000000000000000 |................| > c0000000024cad50 0000000000000000 0000000000000000 |................| > c0000000024cad60 0000000000000000 0000000000000000 |................| > c0000000024cad70 0000000000000000 0000000000000000 |................| > c0000000024cad80 0000000000000000 0000000000000000 |................| > c0000000024cad90 0000000000000000 0000000000000000 |................| > c0000000024cada0 0000000000000000 0000000000000000 |................| > c0000000024cadb0 0000000000000000 0000000000000000 |................| > c0000000024cadc0 0000000000000000 0000000000000000 |................| > c0000000024cadd0 0000000000000000 0000000000000000 |................| > c0000000024cade0 0000000000000000 0000000000000000 |................| > c0000000024cadf0 c00000000303a1a8 c0000000024cae00 |.............L..| > c0000000024cae00 746172676574303a 3235353a31303000 |target0:255:100.| > c0000000024cae10 0000000000000006 c0000000024cae18 |.............L..| > c0000000024cae20 c0000000024cae18 
c00000000303a2d8 |.....L..........| > c0000000024cae30 c000000000500c88 0000000000000000 |.....P..........| > c0000000024cae40 0000000000000000 746172676574303a |........target0:| > c0000000024cae50 3235353a31303000 0000000000000000 |255:100.........| > c0000000024cae60 0000000000000000 0000000000000000 |................| > c0000000024cae70 0000000000000000 0000000000000000 |................| > c0000000024cae80 0000000000000000 0000000100000000 |................| > c0000000024cae90 00000000dead4ead ffffffff00000000 |......N.........| > c0000000024caea0 ffffffffffffffff c0000000024caea8 |.............L..| > c0000000024caeb0 c0000000024caea8 0000000000000000 |.....L..........| > c0000000024caec0 0000000000000000 0000000000000000 |................| > c0000000024caed0 0000000000000000 0000000000000000 |................| > c0000000024caee0 0000000000000000 0000000000000000 |................| > c0000000024caef0 0000000000000000 c0000000024caef8 |.............L..| > c0000000024caf00 c0000000024caef8 0000000000000000 |.....L..........| > c0000000024caf10 d0000000002190c0 00000000000000ff |.....!..........| > c0000000024caf20 0000006400000000 0000000000000000 |...d............| > c0000000024caf30 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024caf40 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024caf50 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024caf60 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024caf70 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024caf80 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024caf90 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024cafa0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024cafb0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024cafc0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024cafd0 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > c0000000024cafe0 
5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| > .... > -- Brian King eServer Storage I/O IBM Linux Technology Center ^ permalink raw reply [flat|nested] 24+ messages in thread
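The channel and id values that Brian points at can also be read straight out of the xmon dump. A minimal sketch, assuming the machine is big-endian (ppc64) and that reap_ref, channel and id are consecutive 32-bit fields, as the order of the gdb output suggests. struct starget_tail and decode_tail() are hypothetical names used only for illustration. The two input words are the ones the dump shows at c0000000024caf18 and c0000000024caf20.

```c
#include <stdint.h>

/* Hypothetical view of the three 32-bit fields of interest. */
struct starget_tail {
	uint32_t reap_ref;
	uint32_t channel;
	uint32_t id;
};

/*
 * Split two 64-bit words, written as the big-endian dump prints them,
 * into 32-bit fields. On a big-endian machine the field at the lower
 * address is the upper half of the printed word.
 */
static struct starget_tail decode_tail(uint64_t word_at_18, uint64_t word_at_20)
{
	struct starget_tail t;

	t.reap_ref = (uint32_t)(word_at_18 >> 32);
	t.channel  = (uint32_t)(word_at_18 & 0xffffffffu);
	t.id       = (uint32_t)(word_at_20 >> 32);
	return t;
}
```

Feeding it 0x00000000000000ff and 0x0000006400000000 gives reap_ref 0, channel 255 and id 100, which agrees with the gdb print and with the name target0:255:100 visible in the ASCII column of the dump.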
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 23:21 ` Brian King
@ 2006-02-10 23:29 ` Olaf Hering
2006-02-11 10:34 ` Olaf Hering
2006-02-20 23:00 ` Brian King
0 siblings, 2 replies; 24+ messages in thread
From: Olaf Hering @ 2006-02-10 23:29 UTC (permalink / raw)
To: Brian King; +Cc: James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

On Fri, Feb 10, Brian King wrote:

> > release = 0xd0000000002190c0}, reap_ref = 0, channel = 255, id = 100,
> > create = 0, scsi_level = 0 '\0', hostdata = 0x0, starget_data =
> > 0x1ffffffec60}
>                                                 ^^^^^^^^^^^^^
>
> This is interesting.. This means that it is on ipr's logical scsi bus
> (255). Assuming this is happening at boot time and not due to a user
> initiated scan through sysfs, this means we are going through the
> scsi_add_device path, rather than the scsi_scan_host path...

I forgot to attach my notes from the crash; they contain parts of the
boot log.

-- 
short story of a lazy sysadmin:
alias appserv=wotan

[-- Attachment #2: 2.6.16-rc2-git7-notes.txt --]
[-- Type: text/plain, Size: 34257 bytes --]

1:mon> ?
Commands:
b show breakpoints
bd set data breakpoint
bi set instruction breakpoint
bc clear breakpoint
c print cpus stopped in xmon
c# try to switch to cpu number h (in hex)
C checksum
d dump bytes
di dump instructions
df dump float values
dd dump double values
e print exception information
f flush cache
la lookup symbol+offset of specified address
ls lookup address of specified symbol
m examine/change memory
mm move a block of memory
ms set a block of memory
md compare two blocks of memory
ml locate a block of memory
mz zero a block of memory
mi show information about memory allocation
p call a procedure
r print registers
s single step
S print special registers
t print backtrace
x exit monitor and recover
X exit monitor and dont recover
u dump segment table or SLB
? help
zr reboot
zh halt

Good boot:
Freeing unused kernel memory: 252k freed
Starting udevd
Creating devices
Loading sd_mod
SCSI subsystem initialized
Loading loop
loop: loaded (max 255 devices)
Loading ipr
ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005)
ipr 0000:c0:01.0: Found IOA with IRQ: 99
ipr 0000:c0:01.0: Starting IOA initialization sequence.
ipr 0000:c0:01.0: Adapter firmware version: 020A004E
ipr 0000:c0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
Vendor: IBM Model: ST373453LC Rev: C51A
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
sda: sda1 sda2 sda3 sda4
sd 0:0:3:0: Attached scsi disk sda
Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812
Type: Enclosure ANSI SCSI revision: 02
sd 0:0:3:0: Attached scsi generic sg0 type 0
0:0:15:0: Attached scsi generic sg1 type 13
scsi: unknown device type 31
Vendor: IBM Model: 570B001 Rev: 0150
Type: Unknown ANSI SCSI revision: 00
0:255:255:255: Attached scsi generic sg2 type 31
ipr 0002:c8:01.0: Found IOA with IRQ: 133
ipr 0002:c8:01.0: Initializing IOA.
ipr 0002:c8:01.0: Starting IOA initialization sequence.
ipr 0002:c8:01.0: Adapter firmware version: 020A004E
ipr 0002:c8:01.0: IOA initialized.
scsi1 : IBM 570B Storage Adapter Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812 Type: Enclosure ANSI SCSI revision: 02 1:0:15:0: Attached scsi generic sg3 type 13 scsi: unknown device type 31 Vendor: IBM Model: 570B001 Rev: 0150 Type: Unknown ANSI SCSI revision: 00 1:255:255:255: Attached scsi generic sg4 type 31 Bad boot: Freeing unused kernel memory: 252k freed Starting udevd Creating devices Loading sd_mod SCSI subsystem initialized Loading loop loop: loaded (max 255 devices) Loading ipr ipr: IBM Power RAID SCSI Device Driver version: 2.1.1 (November 15, 2005) ipr 0000:c0:01.0: Found IOA with IRQ: 99 ipr 0000:c0:01.0: Starting IOA initialization sequence. ipr 0000:c0:01.0: Adapter firmware version: 020A004E ipr 0000:c0:01.0: IOA initialized. scsi0 : IBM 570B Storage Adapter Vendor: IBM Model: ST373453LC Rev: C51A Type: Direct-Access ANSI SCSI revision: 03 SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off SCSI device sda: drive cache: write through w/ FUA SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off SCSI device sda: drive cache: write through w/ FUA sda: sda1 sda2 sda3 sda4 sd 0:0:3:0: Attached scsi disk sda Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812 Type: Enclosure ANSI SCSI revision: 02 sd 0:0:3:0: Attached scsi generic sg0 type 0 0:0:15:0: Attached scsi generic sg1 type 13 Unable to handle kernel paging request for data at address 0x00000004 Faulting instruction address: 0xc0000000001dd0dc cpu 0x1: Vector: 300 (Data Access) at [c0000000036937e0] pc: c0000000001dd0dc: ._raw_spin_lock+0x2c/0x18c lr: c0000000003821f0: ._spin_lock+0x10/0x24 sp: c000000003693a60 msr: 8000000000009032 dar: 4 dsisr: 40000000 current = 0xc0000000eb9777f0 paca = 0xc00000000049f000 pid = 27, comm = events/1 enter ? 
for help 1:mon> t [c000000003693af0] c0000000003821f0 ._spin_lock+0x10/0x24 [c000000003693b70] c00000000037ea44 .klist_del+0x28/0x58 [c000000003693c00] c000000000262d98 .device_del+0x50/0x120 [c000000003693ca0] d0000000001eec64 .scsi_target_reap_work+0xe0/0x12c [scsi_mod] [c000000003693d30] c000000000076e70 .run_workqueue+0x108/0x19c [c000000003693dd0] c000000000077054 .worker_thread+0x150/0x1c0 [c000000003693ed0] c00000000007c9f8 .kthread+0x140/0x190 [c000000003693f90] c000000000025764 .kernel_thread+0x4c/0x68 1:mon> r R00 = ffffffffdead0000 R16 = 0000000000000000 R01 = c000000003693a60 R17 = 0000000000000000 R02 = c000000000630a38 R18 = 0000000000000000 R03 = 0000000000000000 R19 = 0000000000000000 R04 = 000000000000003e R20 = 0000000000c00000 R05 = 000000000000003e R21 = 0000000000000008 R06 = c000000003d4b2e8 R22 = 0000000000000000 R07 = 0000000000000000 R23 = 4000000000400000 R08 = c0000000006ff018 R24 = c00000000049ee00 R09 = c0000000006e4338 R25 = c00000007257fc40 R10 = c0000000024f2d28 R26 = c000000002364b88 R11 = c000000000262d48 R27 = c000000003d4b268 R12 = 0000000024000084 R28 = 0000000000000000 R13 = c00000000049f000 R29 = c0000000024cad00 R14 = 0000000000000000 R30 = c0000000004fadc8 R15 = 0000000000000000 R31 = 0000000000000000 pc = c0000000001dd0dc ._raw_spin_lock+0x2c/0x18c lr = c0000000003821f0 ._spin_lock+0x10/0x24 msr = 8000000000009032 cr = 24008084 ctr = c000000000262d48 xer = 0000000000000000 trap = 300 dar = 0000000000000004 dsisr = 40000000 1:mon> 1:mon> echo $(( 0xc000000003693d30 - 0xc000000003693a60 )) 720 1:mon> d c000000003693a60 720 c000000003693a60 c000000003693af0 c000000003880ec8 |.....i:.........| c000000003693a70 c000000000500d60 0000000000000000 |.....P.`........| c000000003693a80 c000000003693b20 0000000000000000 |.....i; ........| c000000003693a90 c0000000004fabd0 c000000000630a38 |.....O.......c.8| c000000003693aa0 c000000003693b20 c000000000630a38 |.....i; .....c.8| c000000003693ab0 c000000003693b50 c000000003880ff8 
|.....i;P........| c000000003693ac0 c000000003693b50 c000000007520808 |.....i;P.....R..| c000000003693ad0 0000000000000000 c0000000024cad00 |.............L..| c000000003693ae0 c000000000518718 c0000000024cacc8 |.....Q.......L..| _spin_lock c000000003693af0 c000000003693b70 c000000003dc8430 |.....i;p.......0| c000000003693b00 c0000000003821f0 c000000003880ec8 |.....8!.........| c000000003693b10 c0000000004facb0 c000000003880ea0 |.....O..........| c000000003693b20 d0000000001eebec c000000003d4b000 |................| c000000003693b30 c00000006fffc600 0000000000000000 |....o...........| c000000003693b40 c0000000004dfe88 c000000003d4b260 |.....M.........`| c000000003693b50 c000000003693bf0 c0000000006160f0 |.....i;......a`.| c000000003693b60 c000000000518aa0 c000000000501548 |.....Q.......P.H| klist_del c000000003693b70 c000000003693c00 c000000003693d30 |.....i<......i=0| c000000003693b80 c00000000037ea44 c000000002316740 |.....7.D.....1g@| c000000003693b90 c000000003693c10 c0000000eb9777f0 |.....i<.......w.| c000000003693ba0 c000000003693c20 0000000000000008 |.....i< ........| c000000003693bb0 c0000000001d6908 4000000000400000 |......i.@....@..| c000000003693bc0 8000000000009032 c000000003d4b268 |.......2.......h| c000000003693bd0 d0000000001eebec c0000000ebf19ad0 |................| c000000003693be0 c00000000303a1a8 c0000000024cadf8 |.............L..| c000000003693bf0 c000000003d4b268 8000000000009032 |.......h.......2| device_del c000000003693c00 c000000003693ca0 c0000000024cacc8 |.....i<......L..| c000000003693c10 c000000000262d98 c0000000024caca0 |.....&-......L..| c000000003693c20 c000000003693ca0 c0000000eb608000 |.....i<......`..| c000000003693c30 c00000000026acdc 0000000000000000 |.....&..........| c000000003693c40 c00000000038209c 0000000000000000 |.....8 .........| c000000003693c50 8000000000001032 c000000002364bc0 |.......2.....6K.| c000000003693c60 c000000003d4b260 0000000000000003 |.......`........| c000000003693c70 c000000002364b88 c000000003d4b268 
|.....6K........h| c000000003693c80 c000000003d4b268 c0000000024cacc8 |.......h.....L..| c000000003693c90 c00000000303a000 c0000000024caca0 |.............L..| scsi_target_reap_work c000000003693ca0 c000000003693d30 2400008402364b60 |.....i=0$....6K`| c000000003693cb0 d0000000001eec64 4400002803b677f0 |.......dD..(..w.| c000000003693cc0 c0000000003820d4 d000000000221930 |.....8 ......".0| c000000003693cd0 0000000000000000 0000000000000000 |................| c000000003693ce0 0000000000000003 0000000000000000 |................| c000000003693cf0 0000000022000022 c00000007257fc40 |....".."....rW.@| c000000003693d00 c000000002364b88 c000000003d4b300 |.....6K.........| c000000003693d10 d000000000219228 c000000003d4b268 |.....!.(.......h| c000000003693d20 c0000000004dc890 c000000002364b60 |.....M.......6K`| run_workqueue c000000003693d30 c000000003693dd0 0000000000000000 |.....i=.........| c000000003693d40 c000000000076e70 000000000000021c |......np........| c000000003693d50 c00000000007ce98 c000000000630a38 |.............c.8| c000000003693d60 0000000000000000 0000000000000000 |................| c000000003693d70 0000000000c00000 0000000000000008 |................| c000000003693d80 0000000000000000 4000000000400000 |........@....@..| c000000003693d90 c00000000049ee00 c00000007257fc40 |.....I......rW.@| c000000003693da0 c000000002364b60 c00000007257fbd0 |.....6K`....rW..| c000000003693db0 c000000002364b88 c000000002364b98 |.....6K......6K.| c000000003693dc0 c0000000004dc890 c000000002364b60 |.....M.......6K`| worker_thread c000000003693dd0 c000000003693ed0 2400002200000000 |.....i>.$.."....| c000000003693de0 c000000000077054 0000000000000000 |......pT........| c000000003693df0 0000000000000000 c00000007257fc40 |............rW.@| c000000003693e00 c000000002364b60 8000000000009032 |.....6K`.......2| c000000003693e10 c00000007257fbd8 c00000007257fbd0 |....rW......rW..| c000000003693e20 00000000004dcba0 fffffffffffffffc |.....M..........| c000000003693e30 0000000000000001 
0000000000000000 |................| c000000003693e40 0000000000000000 c0000000eb9777f0 |..............w.| c000000003693e50 c0000000005f0608 0000000000100100 |....._..........| c000000003693e60 0000000000200200 0000000000000000 |..... ..........| c000000003693e70 0000000000000001 0000000000000000 |................| c000000003693e80 c0000000005f0608 0000000000010000 |....._..........| c000000003693e90 0000000000000000 c00000007257fc40 |............rW.@| c000000003693ea0 ffffffffffffffff c00000007257fbd0 |............rW..| c000000003693eb0 c0000000005f31a0 c000000002364b60 |....._1......6K`| c000000003693ec0 c0000000004dcba0 fffffffffffffffc |.....M..........| kthread c000000003693ed0 c000000003693f90 00000000ffffffff |.....i?.........| c000000003693ee0 c00000000007c9f8 0000000000000000 |................| c000000003693ef0 0000000000000000 c000000000630a38 |.............c.8| c000000003693f00 0000000000000000 ffffffffffffffff |................| c000000003693f10 ffffffffffffffff 0000000000000000 |................| c000000003693f20 0000000000000000 0000000000000000 |................| c000000003693f30 0000000000000000 0000000000000000 |................| c000000003693f40 ffffffffffffffff ffffffffffffffff |................| c000000003693f50 ffffffffffffffff 4000000000400000 |........@....@..| c000000003693f60 c00000000049ee00 c0000000005f31a0 |.....I......._1.| c000000003693f70 c0000000003a0630 c00000000007c8b8 |.....:.0........| c000000003693f80 c00000007257fbc0 c0000000003c3350 |....rW.......<3P| kernel_thread c000000003693f90 0000000000000000 c0000000003c3350 |.............<3P| c000000003693fa0 c000000000025764 8000000000009032 |......Wd.......2| c000000003693fb0 0000000000800711 c00000000007c824 |...............$| c000000003693fc0 c00000000007c854 0000000020000000 |.......T.... 
...| c000000003693fd0 0000000024000022 00000001ffffffff |....$.."........| c000000003693fe0 0000000000000c00 0000000000000000 |................| c000000003693ff0 0000000000000000 0000000000000000 |................| c0000000001dd0b0 <._raw_spin_lock>: c0000000001dd0b0: 7c 08 02 a6 mflr r0 c0000000001dd0b4: fb c1 ff f0 std r30,-16(r1) c0000000001dd0b8: fb e1 ff f8 std r31,-8(r1) (sp)c000000003693a60 + 144 - 8 == c000000003693ae8 c0000000001dd0bc: fb 81 ff e0 std r28,-32(r1) c0000000001dd0c0: fb a1 ff e8 std r29,-24(r1) c0000000001dd0c4: eb c2 c8 d8 ld r30,-14120(r2) c0000000001dd0c8: 7c 7f 1b 78 mr r31,r3 lock == NULL c0000000001dd0cc: f8 01 00 10 std r0,16(r1) c0000000001dd0d0: f8 21 ff 71 stdu r1,-144(r1) c0000000001dd0d4: 3c 00 de ad lis r0,-8531 c0000000001dd0d8: 60 00 00 00 nop c0000000001dd0dc: 81 23 00 04 lwz r9,4(r3) c0000000001dd0e0: 60 00 4e ad ori r0,r0,20141 c0000000001dd0e4: 7f 89 00 00 cmpw cr7,r9,r0 c0000000001dd0e8: 41 9e 00 0c beq- cr7,c0000000001dd0f4 <._raw_spin_lock+0x44> c0000000001dd0ec: e8 9e 80 18 ld r4,-32744(r30) c0000000001dd0f0: 4b ff fb 4d bl c0000000001dcc3c <.spin_bug> ... 
c0000000003821e0 <._spin_lock>: c0000000003821e0: 7c 08 02 a6 mflr r0 c0000000003821e4: f8 01 00 10 std r0,16(r1) c0000000003821e8: f8 21 ff 81 stdu r1,-128(r1) c0000000003821ec: 4b e5 ae c5 bl c0000000001dd0b0 <._raw_spin_lock> c0000000003821f0: 60 00 00 00 nop c0000000003821f4: 38 21 00 80 addi r1,r1,128 c0000000003821f8: e8 01 00 10 ld r0,16(r1) c0000000003821fc: 7c 08 03 a6 mtlr r0 c000000000382200: 4e 80 00 20 blr c00000000037ea1c <.klist_del>: c00000000037ea1c: 7c 08 02 a6 mflr r0 c00000000037ea20: fb a1 ff e8 std r29,-24(r1) c00000000037ea24: fb 81 ff e0 std r28,-32(r1) c00000000037ea28: 7c 7d 1b 78 mr r29,r3 n == c0000000024cad00 c00000000037ea2c: f8 01 00 10 std r0,16(r1) c00000000037ea30: f8 21 ff 71 stdu r1,-144(r1) c00000000037ea34: 60 00 00 00 nop c00000000037ea38: eb 83 00 00 ld r28,0(r3) c00000000037ea3c: 7f 83 e3 78 mr r3,r28 lock == NULL c00000000037ea40: 48 00 37 a1 bl c0000000003821e0 <._spin_lock> c00000000037ea44: 60 00 00 00 nop c00000000037ea48: 7f a3 eb 78 mr r3,r29 c00000000037ea4c: 4b ff fe f5 bl c00000000037e940 <.klist_dec_and_del> c00000000037ea50: 7f 83 e3 78 mr r3,r28 c00000000037ea54: 48 00 36 9d bl c0000000003820f0 <._spin_unlock> c00000000037ea58: 60 00 00 00 nop c00000000037ea5c: 38 21 00 90 addi r1,r1,144 c00000000037ea60: e8 01 00 10 ld r0,16(r1) c00000000037ea64: eb 81 ff e0 ld r28,-32(r1) c00000000037ea68: eb a1 ff e8 ld r29,-24(r1) c00000000037ea6c: 7c 08 03 a6 mtlr r0 c00000000037ea70: 4e 80 00 20 blr c000000000262d48 <.device_del>: c000000000262d48: 7d 80 00 26 mfcr r12 c000000000262d4c: 7c 08 02 a6 mflr r0 c000000000262d50: fb a1 ff e8 std r29,-24(r1) c000000000262d54: fb c1 ff f0 std r30,-16(r1) c000000000262d58: fb e1 ff f8 std r31,-8(r1) c000000000262d5c: fb 61 ff d8 std r27,-40(r1) c000000000262d60: 7c 7f 1b 78 mr r31,r3 dev == c0000000024cacc8 c000000000262d64: 3b a3 01 30 addi r29,r3,304 c000000000262d68: fb 81 ff e0 std r28,-32(r1) (sp)c000000003693ca0 + 160 - 32 == c000000003693c80 c000000000262d6c: eb c2 cb 90 ld 
r30,-13424(r2) c000000000262d70: f8 01 00 10 std r0,16(r1) c000000000262d74: 91 81 00 08 stw r12,8(r1) c000000000262d78: f8 21 ff 61 stdu r1,-160(r1) c000000000262d7c: 60 00 00 00 nop c000000000262d80: 60 00 00 00 nop c000000000262d84: eb 83 01 28 ld r28,296(r3) c000000000262d88: 38 63 00 38 addi r3,r3,56 c000000000262d8c: 2e 3c 00 00 cmpdi cr4,r28,0 c000000000262d90: 41 92 00 0c beq- cr4,c000000000262d9c <.device_del+0x54> c000000000262d94: 48 11 bc 89 bl c00000000037ea1c <.klist_del> c000000000262d98: 60 00 00 00 nop c000000000262d9c: 7f e3 fb 78 mr r3,r31 c000000000262da0: 38 9f 01 98 addi r4,r31,408 c000000000262da4: 4b ff ff 29 bl c000000000262ccc <.device_remove_file> c000000000262da8: e9 3e 80 40 ld r9,-32704(r30) c000000000262dac: 7f e3 fb 78 mr r3,r31 c000000000262db0: e9 29 00 00 ld r9,0(r9) c000000000262db4: 2f a9 00 00 cmpdi cr7,r9,0 c000000000262db8: 41 9e 00 28 beq- cr7,c000000000262de0 <.device_del+0x98> c000000000262dbc: e8 09 00 00 ld r0,0(r9) c000000000262dc0: f8 41 00 28 std r2,40(r1) c000000000262dc4: 60 00 00 00 nop c000000000262dc8: 60 00 00 00 nop c000000000262dcc: e9 69 00 10 ld r11,16(r9) c000000000262dd0: e8 49 00 08 ld r2,8(r9) c000000000262dd4: 7c 09 03 a6 mtctr r0 c000000000262dd8: 4e 80 04 21 bctrl c000000000262ddc: e8 41 00 28 ld r2,40(r1) c000000000262de0: 7f e3 fb 78 mr r3,r31 c000000000262de4: 48 00 19 bd bl c0000000002647a0 <.bus_remove_device> c000000000262de8: 60 00 00 00 nop c000000000262dec: 38 80 00 02 li r4,2 c000000000262df0: 7f a3 eb 78 mr r3,r29 c000000000262df4: 4b f7 47 e5 bl c0000000001d75d8 <.kobject_uevent> c000000000262df8: 60 00 00 00 nop c000000000262dfc: 7f a3 eb 78 mr r3,r29 c000000000262e00: 4b f7 3f 01 bl c0000000001d6d00 <.kobject_del> c000000000262e04: 60 00 00 00 nop c000000000262e08: 7f 83 e3 78 mr r3,r28 c000000000262e0c: 41 92 00 30 beq- cr4,c000000000262e3c <.device_del+0xf4> c000000000262e10: 38 21 00 a0 addi r1,r1,160 c000000000262e14: e8 01 00 10 ld r0,16(r1) c000000000262e18: 81 81 00 08 lwz 
r12,8(r1) c000000000262e1c: eb e1 ff f8 ld r31,-8(r1) c000000000262e20: eb 61 ff d8 ld r27,-40(r1) c000000000262e24: eb 81 ff e0 ld r28,-32(r1) c000000000262e28: eb a1 ff e8 ld r29,-24(r1) c000000000262e2c: eb c1 ff f0 ld r30,-16(r1) c000000000262e30: 7c 08 03 a6 mtlr r0 c000000000262e34: 7d 80 81 20 mtcrf 8,r12 c000000000262e38: 4b ff fd 50 b c000000000262b88 <.put_device> c000000000262e3c: 38 21 00 a0 addi r1,r1,160 c000000000262e40: e8 01 00 10 ld r0,16(r1) c000000000262e44: 81 81 00 08 lwz r12,8(r1) c000000000262e48: eb 61 ff d8 ld r27,-40(r1) c000000000262e4c: eb 81 ff e0 ld r28,-32(r1) c000000000262e50: eb a1 ff e8 ld r29,-24(r1) c000000000262e54: eb c1 ff f0 ld r30,-16(r1) c000000000262e58: eb e1 ff f8 ld r31,-8(r1) c000000000262e5c: 7c 08 03 a6 mtlr r0 c000000000262e60: 7d 80 81 20 mtcrf 8,r12 c000000000262e64: 4e 80 00 20 blr 1:mon> di d0000000001eeb84 35 d0000000001eeb84 7c0802a6 mflr r0 d0000000001eeb88 fb81ffe0 std r28,-32(r1) d0000000001eeb8c fba1ffe8 std r29,-24(r1) d0000000001eeb90 fbc1fff0 std r30,-16(r1) d0000000001eeb94 fbe1fff8 std r31,-8(r1) d0000000001eeb98 7c7c1b78 mr r28,r3 wq == c000000003d4b268 d0000000001eeb9c f8010010 std r0,16(r1) d0000000001eeba0 f821ff71 stdu r1,-144(r1) d0000000001eeba4 60000000 nop ... 
d0000000001eebac ebe30060 ld r31,96(r3) starget = c0000000024caca0 d0000000001eebb0 ebbf0150 ld r29,336(r31) d0000000001eebb4 48000014 b d0000000001eebc8 # .scsi_target_reap_work+0x44/0x12c [scsi_mod] d0000000001eebb8 e81d0128 ld r0,296(r29) d0000000001eebbc 2fa00000 cmpdi cr7,r0,0 d0000000001eebc0 7c1d0378 mr r29,r0 d0000000001eebc4 419e0020 beq cr7,d0000000001eebe4 # .scsi_target_reap_work+0x60/0x12c [scsi_mod] d0000000001eebc8 7fa3eb78 mr r3,r29 d0000000001eebcc 3bc00000 li r30,0 d0000000001eebd0 4bff5e35 bl d0000000001e4a04 # .scsi_is_host_device+0x0/0x2c [scsi_mod] d0000000001eebd4 60000000 nop d0000000001eebd8 2f830000 cmpwi cr7,r3,0 d0000000001eebdc 419effdc beq cr7,d0000000001eebb8 # .scsi_target_reap_work+0x34/0x12c [scsi_mod] d0000000001eebe0 3bddfe58 addi r30,r29,-424 d0000000001eebe4 7f83e378 mr r3,r28 d0000000001eebe8 48003871 bl d0000000001f2458 # .scsi_init_procfs+0x4d0/0x3140 [scsi_mod] kfree() d0000000001eebec e8410028 ld r2,40(r1) d0000000001eebf0 e87e0078 ld r3,120(r30) d0000000001eebf4 48003505 bl d0000000001f20f8 # .scsi_init_procfs+0x170/0x3140 [scsi_mod] spin_lock_irqsave() d0000000001eebf8 e8410028 ld r2,40(r1) d0000000001eebfc 813f0278 lwz r9,632(r31) d0000000001eec00 7c641b78 mr r4,r3 d0000000001eec04 3929ffff addi r9,r9,-1 d0000000001eec08 2f890000 cmpwi cr7,r9,0 d0000000001eec0c 913f0278 stw r9,632(r31) d0000000001eec10 409e0074 bne cr7,d0000000001eec84 # .scsi_target_reap_work+0x100/0x12c [scsi_mod] d0000000001eec14 e81f0018 ld r0,24(r31) d0000000001eec18 393f0018 addi r9,r31,24 d0000000001eec1c 7fa90000 cmpd cr7,r9,r0 d0000000001eec20 409e0064 bne cr7,d0000000001eec84 # .scsi_target_reap_work+0x100/0x12c [scsi_mod] d0000000001eec24 393f0008 addi r9,r31,8 d0000000001eec28 e95f0008 ld r10,8(r31) d0000000001eec2c 3bbf0028 addi r29,r31,40 d0000000001eec30 e9690008 ld r11,8(r9) d0000000001eec34 f94b0000 std r10,0(r11) d0000000001eec38 f96a0008 std r11,8(r10) d0000000001eec3c f93f0008 std r9,8(r31) d0000000001eec40 f9290008 std r9,8(r9) 
d0000000001eec44 e87e0078 ld r3,120(r30) d0000000001eec48 480034e1 bl d0000000001f2128 # .scsi_init_procfs+0x1a0/0x3140 [scsi_mod] spin_unlock_irqrestore() d0000000001eec4c e8410028 ld r2,40(r1) d0000000001eec50 7fa3eb78 mr r3,r29 d0000000001eec54 48003c55 bl d0000000001f28a8 # .scsi_init_procfs+0x920/0x3140 [scsi_mod] transport_remove_device() d0000000001eec58 e8410028 ld r2,40(r1) d0000000001eec5c 7fa3eb78 mr r3,r29 d0000000001eec60 48003c19 bl d0000000001f2878 # .scsi_init_procfs+0x8f0/0x3140 [scsi_mod] device_del() d0000000001eec64 e8410028 ld r2,40(r1) d0000000001eec68 7fa3eb78 mr r3,r29 d0000000001eec6c 48003c6d bl d0000000001f28d8 # .scsi_init_procfs+0x950/0x3140 [scsi_mod] transport_destroy_device() d0000000001eec70 e8410028 ld r2,40(r1) d0000000001eec74 7fa3eb78 mr r3,r29 d0000000001eec78 48003421 bl d0000000001f2098 # .scsi_init_procfs+0x110/0x3140 [scsi_mod] put_device() d0000000001eec7c e8410028 ld r2,40(r1) d0000000001eec80 48000010 b d0000000001eec90 # .scsi_target_reap_work+0x10c/0x12c [scsi_mod] d0000000001eec84 e87e0078 ld r3,120(r30) d0000000001eec88 480034a1 bl d0000000001f2128 # .scsi_init_procfs+0x1a0/0x3140 [scsi_mod] spin_unlock_irqrestore() d0000000001eec8c e8410028 ld r2,40(r1) d0000000001eec90 38210090 addi r1,r1,144 d0000000001eec94 e8010010 ld r0,16(r1) d0000000001eec98 eb81ffe0 ld r28,-32(r1) d0000000001eec9c eba1ffe8 ld r29,-24(r1) d0000000001eeca0 ebc1fff0 ld r30,-16(r1) d0000000001eeca4 ebe1fff8 ld r31,-8(r1) d0000000001eeca8 7c0803a6 mtlr r0 d0000000001eecac 4e800020 blr 1:mon> d c0000000024cacc8 c0000000024cacc8 00000000dead4ead ffffffff00000000 |......N.........| c0000000024cacd8 ffffffffffffffff c0000000024cace0 |.............L..| c0000000024cace8 c0000000024cace0 c000000000614f68 |.....L.......aOh| c0000000024cacf8 c000000000614f38 0000000000000000 |.....aO8........| c0000000024cad08 0000000000000000 0000000000000000 |................| c0000000024cad18 0000000000000000 0000000000000000 |................| c0000000024cad28 
0000000000000000 0000000000000000 |................| c0000000024cad38 0000000000000000 0000000000000000 |................| c0000000024cad48 0000000000000000 0000000000000000 |................| c0000000024cad58 0000000000000000 0000000000000000 |................| c0000000024cad68 0000000000000000 0000000000000000 |................| c0000000024cad78 0000000000000000 0000000000000000 |................| c0000000024cad88 0000000000000000 0000000000000000 |................| c0000000024cad98 0000000000000000 0000000000000000 |................| c0000000024cada8 0000000000000000 0000000000000000 |................| c0000000024cadb8 0000000000000000 0000000000000000 |................| c0000000024cadc8 0000000000000000 0000000000000000 |................| c0000000024cadd8 0000000000000000 0000000000000000 |................| c0000000024cade8 0000000000000000 c00000000303a1a8 |................| c0000000024cadf8 c0000000024cae00 746172676574303a |.....L..target0:| c0000000024cae08 3235353a31303000 0000000000000006 |255:100.........| c0000000024cae18 c0000000024cae18 c0000000024cae18 |.....L.......L..| c0000000024cae28 c00000000303a2d8 c000000000500c88 |.............P..| c0000000024cae38 0000000000000000 0000000000000000 |................| c0000000024cae48 746172676574303a 3235353a31303000 |target0:255:100.| c0000000024cae58 0000000000000000 0000000000000000 |................| c0000000024cae68 0000000000000000 0000000000000000 |................| c0000000024cae78 0000000000000000 0000000000000000 |................| c0000000024cae88 0000000100000000 00000000dead4ead |..............N.| c0000000024cae98 ffffffff00000000 ffffffffffffffff |................| c0000000024caea8 c0000000024caea8 c0000000024caea8 |.....L.......L..| c0000000024caeb8 0000000000000000 0000000000000000 |................| c0000000024caec8 0000000000000000 0000000000000000 |................| c0000000024caed8 0000000000000000 0000000000000000 |................| c0000000024caee8 0000000000000000 0000000000000000 
|................| c0000000024caef8 c0000000024caef8 c0000000024caef8 |.....L.......L..| c0000000024caf08 0000000000000000 d0000000002190c0 |.............!..| c0000000024caf18 00000000000000ff 0000006400000000 |...........d....| c0000000024caf28 0000000000000000 5a5a5a5a5a5a5a5a |........ZZZZZZZZ| c0000000024caf38 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf48 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf58 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf68 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf78 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf88 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caf98 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafa8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafb8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafc8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafd8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cafe8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024caff8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb008 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb018 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb028 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb038 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb048 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb058 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb068 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb078 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb088 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb098 5a5a5a5a5a5a5aa5 00000000170fc2a5 |ZZZZZZZ.........| c0000000024cb0a8 d0000000001eea84 00000000170fc2a5 |................| c0000000024cb0b8 
0000000100000000 c0000000024d0e98 |.............M..| c0000000024cb0c8 c0000000024cb0c8 c0000000024cb0c8 |.....L.......L..| c0000000024cb0d8 c0000000024cb0d8 c0000000024cb0d8 |.....L.......L..| c0000000024cb0e8 c0000000024cb0e8 c0000000024cb0e8 |.....L.......L..| c0000000024cb0f8 c0000000024cb0f8 c0000000024cb0f8 |.....L.......L..| c0000000024cb108 c0000000024cb108 c0000000024cb108 |.....L.......L..| c0000000024cb118 c0000000024cb118 c0000000024cb118 |.....L.......L..| c0000000024cb128 c0000000024cb128 c0000000024cb128 |.....L.(.....L.(| c0000000024cb138 c0000000024cb138 c0000000024cb138 |.....L.8.....L.8| c0000000024cb148 c0000000024cb148 c0000000024cb148 |.....L.H.....L.H| c0000000024cb158 c0000000024cb158 c0000000024cb158 |.....L.X.....L.X| c0000000024cb168 c0000000024cb168 c0000000024cb168 |.....L.h.....L.h| c0000000024cb178 0000000000000000 c0000000024cb180 |.............L..| c0000000024cb188 c0000000024cb180 c000000003dcbb50 |.....L.........P| c0000000024cb198 c000000003dcbd68 0000002000000000 |.......h... 
....| c0000000024cb1a8 c000000003dc8560 0000000000000000 |.......`........| c0000000024cb1b8 0000000000000000 0000000000000000 |................| c0000000024cb1c8 0000000000000000 c00000000060c1c0 |.............`..| c0000000024cb1d8 c0000000024cb0b8 c00000000231fc00 |.....L.......1..| c0000000024cb1e8 0000000000000000 c0000000024cb1f0 |.............L..| c0000000024cb1f8 c0000000024cb1f0 c00000000060c178 |.....L.......`.x| c0000000024cb208 c0000000024d0e98 0000000000000000 |.....M..........| c0000000024cb218 0000000000000000 0000000000000000 |................| c0000000024cb228 0000000000000000 0000000000000000 |................| c0000000024cb238 0000000000000000 c00000000231fc00 |.............1..| c0000000024cb248 0000000000000000 0000000000000000 |................| c0000000024cb258 0000000000000000 0000000000000000 |................| c0000000024cb268 0000000000000000 0000000000000000 |................| c0000000024cb278 0000000000000000 c00000000060c190 |.............`..| c0000000024cb288 c0000000024cb0b8 c00000000231fc00 |.....L.......1..| c0000000024cb298 0000000000000000 0000000000000000 |................| c0000000024cb2a8 0000000000000004 0000000800000019 |................| c0000000024cb2b8 0000000c00000002 0000400000000004 |..........@.....| c0000000024cb2c8 0000000a00000002 0000000100000002 |................| c0000000024cb2d8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb2e8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb2f8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb308 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb318 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb328 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb338 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb348 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb358 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb368 5a5a5a5a5a5a5a5a 
5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb378 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb388 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb398 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb3a8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb3b8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb3c8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| c0000000024cb3d8 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a |ZZZZZZZZZZZZZZZZ| wq: 1:mon> d c000000003d4b268 104 c000000003d4b268 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b278 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b288 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b298 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b2a8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b2b8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b2c8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b2d8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6ba5 |kkkkkkkkkkkkkkk.| c000000003d4b2e8 000000005a2cf071 d0000000001eebec |....Z,.q........| c000000003d4b2f8 000000005a2cf071 6b6b6b6b6b6b6b6b |....Z,.qkkkkkkkk| c000000003d4b308 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b318 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b328 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b338 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b348 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b358 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b |kkkkkkkkkkkkkkkk| c000000003d4b368 6b6b6b6b |kkkk | ^ permalink raw reply [flat|nested] 24+ messages in thread
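A note on reading the fill bytes in the dumps above. They appear to be the kernel's slab debug poison values: 0x6b (POISON_FREE) for memory that has been freed, 0x5a (POISON_INUSE) for memory that was allocated but never written, and 0xa5 (POISON_END) as the last byte of a poisoned object. Read that way, the region dumped at wq (c000000003d4b268) is all 0x6b with a trailing 0xa5, which suggests it had already been freed by the time of the dump. The following is only a sketch in plain userspace C of that reading; classify_poison() and the enum are hypothetical names made up for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slab debug poison byte values used by the Linux kernel. */
#define POISON_INUSE 0x5a /* allocated, but never initialised */
#define POISON_FREE  0x6b /* object has been freed */
#define POISON_END   0xa5 /* last byte of a poisoned object */

/* Hypothetical result type, for illustration only. */
enum poison_state { NOT_POISONED, UNINITIALISED, FREED };

/*
 * Classify a dumped region. It counts as poisoned only if every byte
 * carries the same poison value, with the exception that the final
 * byte may be POISON_END.
 */
enum poison_state classify_poison(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0)
        return NOT_POISONED;

    uint8_t fill = buf[0];
    if (fill != POISON_INUSE && fill != POISON_FREE)
        return NOT_POISONED;

    for (size_t i = 1; i < len; i++) {
        if (buf[i] == fill)
            continue;
        if (i == len - 1 && buf[i] == POISON_END)
            continue;
        return NOT_POISONED; /* some byte was overwritten */
    }
    return fill == POISON_FREE ? FREED : UNINITIALISED;
}
```

Feeding it the bytes 6b 6b 6b 6b 6b 6b 6b a5 (the last word of the wq object) yields FREED, while a word holding a live pointer such as c0000000024cacc8 yields NOT_POISONED.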
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 23:29 ` Olaf Hering
@ 2006-02-11 10:34 ` Olaf Hering
2006-02-20 23:00 ` Brian King
1 sibling, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2006-02-11 10:34 UTC (permalink / raw)
To: Brian King; +Cc: James Bottomley, linux-scsi

On Sat, Feb 11, Olaf Hering wrote:

> On Fri, Feb 10, Brian King wrote:
>
> > > release = 0xd0000000002190c0}, reap_ref = 0, channel = 255, id = 100,
> > > create = 0, scsi_level = 0 '\0', hostdata = 0x0, starget_data =
> > > 0x1ffffffec60}
> > ^^^^^^^^^^^^^
> >
> > This is interesting.. This means that it is on ipr's logical scsi bus
> > (255). Assuming this is happening at boot time and not due to a user
> > initiated scan through sysfs, this means we are going through the
> > scsi_add_device path, rather than the scsi_scan_host path...
>
> I forgot to attach my notes from the crash, it contains parts of the bootlog.

poking around a bit further (after reboot and another crash):

cpu0 idle
cpu1 idle
cpu2 udevd runs, starts to do lstat on
2:mon> d 00000000ffdd3560 123
00000000ffdd3560 2f7379732f646576 696365732f706369 |/sys/devices/pci|
00000000ffdd3570 303030303a303000 303030303a30303a |0000:00.0000:00:|
00000000ffdd3580 30322e3000303030 303a63303a30312e |02.0.0000:c0:01.|
00000000ffdd3590 3000686f73743000 746172676574303a |0.host0.target0:|
00000000ffdd35a0 303a3300303a303a 333a3000f800f018 |0:3.0:0:3:0.....|
cpu3 udevd runs, starts to open
3:mon> d 00000000ffdd3aa8 123
00000000ffdd3aa8 2f6465762f2e7564 65762f64622f626c |/dev/.udev/db/bl|
00000000ffdd3ab8 6f636b4073646100 ffdd3ad01000a608 |ock@sda...:.....|
cpu4 vol_id runs, about to compat_sys_execve /sbin/vol_id
cpu5 crashed in events/5
cpu6 idle
cpu7 vol_id runs, starts to open /lib/libc.so.6

-- 
short story of a lazy sysadmin:
alias appserv=wotan

^ permalink raw reply [flat|nested] 24+ messages in thread
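The path strings in the udevd dumps above can be recovered from the hex columns alone, since each 64-bit word holds eight characters in big-endian order. Below is a rough sketch in plain C of that decoding, mimicking the ASCII column xmon prints (non-printable bytes shown as '.'). The function name words_to_ascii() is made up for this note and is not part of xmon or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Render big-endian 64-bit dump words as text, most significant byte
 * first. Non-printable bytes become '.', like the right-hand column of
 * an xmon dump. Writes at most outlen - 1 characters plus a NUL and
 * returns the number of characters written.
 */
size_t words_to_ascii(const uint64_t *words, size_t nwords,
                      char *out, size_t outlen)
{
    assert(words != NULL && out != NULL);
    if (outlen == 0)
        return 0;

    size_t n = 0;
    for (size_t w = 0; w < nwords; w++) {
        for (int shift = 56; shift >= 0; shift -= 8) {
            if (n + 1 >= outlen) { /* keep room for the NUL */
                out[n] = '\0';
                return n;
            }
            unsigned char c = (unsigned char)(words[w] >> shift);
            out[n++] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
        }
    }
    out[n] = '\0';
    return n;
}
```

Passing the first two words of the cpu2 dump, 0x2f7379732f646576 and 0x696365732f706369, gives "/sys/devices/pci". The embedded NUL bytes in the later words (shown as '.') separate the path components udevd was working on.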
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 23:29 ` Olaf Hering
2006-02-11 10:34 ` Olaf Hering
@ 2006-02-20 23:00 ` Brian King
2006-02-22 8:36 ` Olaf Hering
1 sibling, 1 reply; 24+ messages in thread
From: Brian King @ 2006-02-20 23:00 UTC (permalink / raw)
To: Olaf Hering; +Cc: James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]

Olaf Hering wrote:
> 1:mon> d c0000000024cacc8
> c0000000024cacc8 00000000dead4ead ffffffff00000000 |......N.........|
> c0000000024cacd8 ffffffffffffffff c0000000024cace0 |.............L..|
> c0000000024cace8 c0000000024cace0 c000000000614f68 |.....L.......aOh|
> c0000000024cacf8 c000000000614f38 0000000000000000 |.....aO8........|
> c0000000024cad08 0000000000000000 0000000000000000 |................|
> c0000000024cad18 0000000000000000 0000000000000000 |................|
> c0000000024cad28 0000000000000000 0000000000000000 |................|
> c0000000024cad38 0000000000000000 0000000000000000 |................|
> c0000000024cad48 0000000000000000 0000000000000000 |................|
> c0000000024cad58 0000000000000000 0000000000000000 |................|
> c0000000024cad68 0000000000000000 0000000000000000 |................|
> c0000000024cad78 0000000000000000 0000000000000000 |................|
> c0000000024cad88 0000000000000000 0000000000000000 |................|
> c0000000024cad98 0000000000000000 0000000000000000 |................|
> c0000000024cada8 0000000000000000 0000000000000000 |................|
> c0000000024cadb8 0000000000000000 0000000000000000 |................|
> c0000000024cadc8 0000000000000000 0000000000000000 |................|
> c0000000024cadd8 0000000000000000 0000000000000000 |................|

I've now seen a couple recreates of this problem on various systems in
our labs, and there are always a bunch of zeroes in the struct device
in the same place as above. I wonder if perhaps the call to device_add
is failing in scsi_alloc_target. Failure of this call is not being handled
today. Can you give the attached patch a try?

> c0000000024cade8 0000000000000000 c00000000303a1a8 |................|
> c0000000024cadf8 c0000000024cae00 746172676574303a |.....L..target0:|
> c0000000024cae08 3235353a31303000 0000000000000006 |255:100.........|
> c0000000024cae18 c0000000024cae18 c0000000024cae18 |.....L.......L..|
> c0000000024cae28 c00000000303a2d8 c000000000500c88 |.............P..|
> c0000000024cae38 0000000000000000 0000000000000000 |................|
> c0000000024cae48 746172676574303a 3235353a31303000 |target0:255:100.|
> c0000000024cae58 0000000000000000 0000000000000000 |................|
> c0000000024cae68 0000000000000000 0000000000000000 |................|
> c0000000024cae78 0000000000000000 0000000000000000 |................|

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: scsi_alloc_target_device_add_failure.patch --]
[-- Type: text/x-patch; name="scsi_alloc_target_device_add_failure.patch", Size: 1229 bytes --]

Signed-off-by: Brian King <brking@us.ibm.com>
---

 linux-2.6-bjking1/drivers/scsi/scsi_scan.c | 11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff -puN drivers/scsi/scsi_scan.c~scsi_alloc_target_device_add_failure drivers/scsi/scsi_scan.c
--- linux-2.6/drivers/scsi/scsi_scan.c~scsi_alloc_target_device_add_failure 2006-02-20 14:55:13.000000000 -0600
+++ linux-2.6-bjking1/drivers/scsi/scsi_scan.c 2006-02-20 16:51:15.000000000 -0600
@@ -361,7 +361,15 @@ static struct scsi_target *scsi_alloc_ta
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/* allocate and add */
 	transport_setup_device(dev);
-	device_add(dev);
+	if (device_add(dev)) {
+		spin_lock_irqsave(shost->host_lock, flags);
+		list_del_init(&starget->siblings);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		transport_destroy_device(dev);
+		put_device(parent);
+		kfree(starget);
+		return NULL;
+	}
 	transport_add_device(dev);
 	if (shost->hostt->target_alloc) {
 		int error = shost->hostt->target_alloc(starget);
@@ -403,7 +411,6 @@ static void scsi_target_reap_usercontext
 		transport_destroy_device(&starget->dev);
 		put_device(&starget->dev);
 		return;
-
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
_
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-20 23:00 ` Brian King
@ 2006-02-22 8:36 ` Olaf Hering
2006-02-22 14:38 ` Brian King
0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2006-02-22 8:36 UTC (permalink / raw)
To: Brian King; +Cc: James Bottomley, linux-scsi

On Mon, Feb 20, Brian King wrote:

> Olaf Hering wrote:
> > 1:mon> d c0000000024cacc8
> > c0000000024cacc8 00000000dead4ead ffffffff00000000 |......N.........|
> > c0000000024cacd8 ffffffffffffffff c0000000024cace0 |.............L..|
> > c0000000024cace8 c0000000024cace0 c000000000614f68 |.....L.......aOh|
> > c0000000024cacf8 c000000000614f38 0000000000000000 |.....aO8........|
> > c0000000024cad08 0000000000000000 0000000000000000 |................|
> > c0000000024cad18 0000000000000000 0000000000000000 |................|
> > c0000000024cad28 0000000000000000 0000000000000000 |................|
> > c0000000024cad38 0000000000000000 0000000000000000 |................|
> > c0000000024cad48 0000000000000000 0000000000000000 |................|
> > c0000000024cad58 0000000000000000 0000000000000000 |................|
> > c0000000024cad68 0000000000000000 0000000000000000 |................|
> > c0000000024cad78 0000000000000000 0000000000000000 |................|
> > c0000000024cad88 0000000000000000 0000000000000000 |................|
> > c0000000024cad98 0000000000000000 0000000000000000 |................|
> > c0000000024cada8 0000000000000000 0000000000000000 |................|
> > c0000000024cadb8 0000000000000000 0000000000000000 |................|
> > c0000000024cadc8 0000000000000000 0000000000000000 |................|
> > c0000000024cadd8 0000000000000000 0000000000000000 |................|
>
> I've now seen a couple recreates of this problem on various systems in
> our labs, and there are always a bunch of zeroes in the struct device
> in the same place as above. I wonder if perhaps the call to device_add
> is failing in scsi_alloc_target. Failure of this call is not being handled
> today. Can you give the attached patch a try?

This fixes it, tested with plain rc3. Lots of -EEXIST, I wonder if the real bug is elsewhere.

cat /root/rocket/cranberry_full.20.log | strings | env -i grep -w device_add | sort | uniq -c
      2 scsi_alloc_target(367): device_add for 'target0:255:107' failed with -17
      3 scsi_alloc_target(367): device_add for 'target0:255:110' failed with -17
      3 scsi_alloc_target(367): device_add for 'target0:255:114' failed with -17
      2 scsi_alloc_target(367): device_add for 'target0:255:37' failed with -17
      1 scsi_alloc_target(367): device_add for 'target0:255:39' failed with -17

@@ -361,7 +362,17 @@ static struct scsi_target *scsi_alloc_ta
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/* allocate and add */
 	transport_setup_device(dev);
-	device_add(dev);
+	err = device_add(dev);
+	if (err) {
+		printk(KERN_EMERG "%s(%u): device_add for '%s' failed with %d\n",__FUNCTION__,__LINE__,dev->bus_id,err);
+		spin_lock_irqsave(shost->host_lock, flags);
+		list_del_init(&starget->siblings);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		transport_destroy_device(dev);
+		put_device(parent);
+		kfree(starget);
+		return NULL;
+	}
 	transport_add_device(dev);
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-22 8:36 ` Olaf Hering
@ 2006-02-22 14:38 ` Brian King
2006-02-22 15:53 ` Olaf Hering
2006-02-22 16:47 ` Mike Anderson
0 siblings, 2 replies; 24+ messages in thread
From: Brian King @ 2006-02-22 14:38 UTC (permalink / raw)
To: Olaf Hering; +Cc: James Bottomley, linux-scsi

Olaf Hering wrote:
> On Mon, Feb 20, Brian King wrote:
>
>> Olaf Hering wrote:
>>> 1:mon> d c0000000024cacc8
>>> c0000000024cacc8 00000000dead4ead ffffffff00000000 |......N.........|
>>> c0000000024cacd8 ffffffffffffffff c0000000024cace0 |.............L..|
>>> c0000000024cace8 c0000000024cace0 c000000000614f68 |.....L.......aOh|
>>> c0000000024cacf8 c000000000614f38 0000000000000000 |.....aO8........|
>>> c0000000024cad08 0000000000000000 0000000000000000 |................|
>>> c0000000024cad18 0000000000000000 0000000000000000 |................|
>>> c0000000024cad28 0000000000000000 0000000000000000 |................|
>>> c0000000024cad38 0000000000000000 0000000000000000 |................|
>>> c0000000024cad48 0000000000000000 0000000000000000 |................|
>>> c0000000024cad58 0000000000000000 0000000000000000 |................|
>>> c0000000024cad68 0000000000000000 0000000000000000 |................|
>>> c0000000024cad78 0000000000000000 0000000000000000 |................|
>>> c0000000024cad88 0000000000000000 0000000000000000 |................|
>>> c0000000024cad98 0000000000000000 0000000000000000 |................|
>>> c0000000024cada8 0000000000000000 0000000000000000 |................|
>>> c0000000024cadb8 0000000000000000 0000000000000000 |................|
>>> c0000000024cadc8 0000000000000000 0000000000000000 |................|
>>> c0000000024cadd8 0000000000000000 0000000000000000 |................|
>> I've now seen a couple recreates of this problem on various systems in
>> our labs, and there are always a bunch of zeroes in the struct device
>> in the same place as above. I wonder if perhaps the call to device_add
>> is failing in scsi_alloc_target. Failure of this call is not being handled
>> today. Can you give the attached patch a try?
>
> This fixes it, tested with plain rc3. Lots of -EEXIST, I wonder if the real bug is elsewhere.

I would guess that the -EEXIST is coming from:

create_dir
sysfs_create_dir
create_dir
kobject_add
device_add

Looking at the scsi_target reap code, it looks like there is a race condition. The
target is removed from the host's list of targets under the host lock, then the host
lock is released. If another thread tries to add the same target that is being
torn down at this point (before device_del), the device_add will fail with EEXIST
since the sysfs directory for the device still exists.

Any reason we can't protect the target reaping code from this by grabbing the
scan_mutex?

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
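
[Editorial illustration, not part of the original message.] The window described above can be modelled in a few lines of plain C. This is only a single-threaded sketch under stated assumptions: struct target_slot and every model_* function are hypothetical stand-ins, not kernel code. The "listed" flag plays the role of membership in the host's list of targets, and "dir_exists" plays the role of the sysfs directory that stays around until device_del runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one target; none of this is real kernel API. */
struct target_slot {
	char name[32];
	bool listed;     /* still on the host's list of targets */
	bool dir_exists; /* sysfs directory for this name still present */
};

/* Stand-in for device_add(): refuses a name whose directory still exists. */
static int model_device_add(struct target_slot *slot, const char *name)
{
	if (slot->dir_exists && strcmp(slot->name, name) == 0)
		return -EEXIST;
	strncpy(slot->name, name, sizeof(slot->name) - 1);
	slot->name[sizeof(slot->name) - 1] = '\0';
	slot->dir_exists = true;
	slot->listed = true;
	return 0;
}

/* Reap step 1: the target is unlinked from the list under the host lock. */
static void model_reap_unlink(struct target_slot *slot)
{
	slot->listed = false;
}

/* Reap step 2: the stand-in for device_del() removes the directory. */
static void model_reap_del(struct target_slot *slot)
{
	slot->dir_exists = false;
}

/*
 * Re-add the same target either inside the window (after the unlink,
 * before the delete) or after the reap has fully finished, and return
 * what the second add reports.
 */
static int model_readd(bool inside_window)
{
	struct target_slot slot = {0};
	int err = model_device_add(&slot, "target0:255:100");

	assert(err == 0);
	model_reap_unlink(&slot);
	if (!inside_window)
		model_reap_del(&slot);
	return model_device_add(&slot, "target0:255:100");
}
```

In this model, model_readd(true) reports -EEXIST (-17 on Linux, the same value seen in the log lines earlier in the thread), while model_readd(false) succeeds. Whether holding scan_mutex across the reap would close the real window is the open question in the message above; the sketch does not try to answer it.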
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-22 14:38 ` Brian King
@ 2006-02-22 15:53 ` Olaf Hering
2006-02-22 16:47 ` Mike Anderson
1 sibling, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2006-02-22 15:53 UTC (permalink / raw)
To: Brian King; +Cc: James Bottomley, linux-scsi

On Wed, Feb 22, Brian King wrote:

> I would guess that the -EEXIST is coming from:
>
> create_dir

You are right:

lib/kobject.c:create_dir(57) sysfs_create_dir failed to create 'target0:255:122' 'target0:255:122' with error -17

+++ linux-2.6.16-rc3-olh/lib/kobject.c
@@ -53,6 +53,8 @@ static int create_dir(struct kobject * k
 			if ((error = populate_dir(kobj)))
 				sysfs_remove_dir(kobj);
 		}
+		else
+			printk(KERN_EMERG "%s:%s(%u) sysfs_create_dir failed to create '%s' '%s' with error %d\n",__FILE__,__FUNCTION__,__LINE__,kobj->k_name,kobj->name,error);
 	}
 	return error;
 }
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-22 14:38 ` Brian King
2006-02-22 15:53 ` Olaf Hering
@ 2006-02-22 16:47 ` Mike Anderson
2006-02-22 17:05 ` James Bottomley
1 sibling, 1 reply; 24+ messages in thread
From: Mike Anderson @ 2006-02-22 16:47 UTC (permalink / raw)
To: Brian King; +Cc: Olaf Hering, James Bottomley, linux-scsi

Brian King <brking@us.ibm.com> wrote:
> I would guess that the -EEXIST is coming from:
>
> create_dir
> sysfs_create_dir
> create_dir
> kobject_add
> device_add
>
> Looking at the scsi_target reap code, it looks like there is a race condition. The
> target is removed from the host's list of targets under the host lock, then the host
> lock is released. If another thread tries to add the same target that is being
> torn down at this point (before device_del), the device_add will fail with EEXIST
> since the sysfs directory for the device still exists.
>
> Any reason we can't protect the target reaping code from this by grabbing the
> scan_mutex?

Another manifestation is a bug I was recently looking at where a BUG_ON is
triggered in the aic7xxx driver if someone is removing and adding devices
repeatedly.

"Feb 8 14:21:52 test klogd: kernel BUG at
drivers/scsi/aic7xxx/aic7xxx_osm.c:535!"

The scan mutex I believe would not help in the case I describe above as
the issue in this instance is the window between the call of
"list_del_init(&starget->siblings);" and the call to target_destroy.

-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-22 16:47 ` Mike Anderson
@ 2006-02-22 17:05 ` James Bottomley
0 siblings, 0 replies; 24+ messages in thread
From: James Bottomley @ 2006-02-22 17:05 UTC (permalink / raw)
To: Mike Anderson; +Cc: Brian King, Olaf Hering, linux-scsi

On Wed, 2006-02-22 at 08:47 -0800, Mike Anderson wrote:
> Another manifestation is a bug I was recently looking at where a BUG_ON is
> triggered in the aic7xxx driver if someone is removing and adding devices
> repeatedly.
>
> "Feb 8 14:21:52 test klogd: kernel BUG at
> drivers/scsi/aic7xxx/aic7xxx_osm.c:535!"
>
> The scan mutex I believe would not help in the case I describe above as
> the issue in this instance is the window between the call of
> "list_del_init(&starget->siblings);" and the call to target_destroy.

Right, I have a fix for this which also gets around the
execute_in_process_context() multiple times problem too ... I just need
to find time to clean it up and post it.

James
* Re: 2.6.16-rc1 crash in scsi_target_reap_work
2006-02-10 14:04 ` James Bottomley
2006-02-10 14:10 ` Olaf Hering
@ 2006-02-10 21:28 ` Brian King
1 sibling, 0 replies; 24+ messages in thread
From: Brian King @ 2006-02-10 21:28 UTC (permalink / raw)
To: James Bottomley; +Cc: Olaf Hering, linux-scsi

James Bottomley wrote:
> On Fri, 2006-02-10 at 11:11 +0100, Olaf Hering wrote:
>>550 reboots without crash, with this patch reverted.
>>Will try the execute_in_process_context thing now.
>
> I wouldn't bother ... because of the structure, the
> execute_in_process_context() patch must have the same bug, but the
> context check will make it much more difficult to hit.
>
> Go back to the original and see if you can diagnose what is NULL and
> why. The target is supposed to have a reference on the parent, so what
> was the parent in this case? If it's a host, there's nothing I can
> think of that can produce the behaviour you see; if it's something else,
> like an rport or phy then we may have a transport class issue.

Taking a quick look at the scsi_target_reap_work patch that went in, one of
the things it changed in this regard was the fact that the caller of
scsi_target_reap always has a ref to the starget->dev, which would have
protected scsi_target_reap from ever releasing the starget->dev. Since the
actual remove work was moved to a workqueue, the get and put that the
callers of scsi_target_reap are doing is no longer adding the same
protection it was before.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
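
[Editorial illustration, not part of the original message.] The reference-counting point above can be sketched with a toy counter. Everything here is a hypothetical model (struct model_obj and the model_* functions are invented for illustration and simplify the real lifetime rules, which this thread does not spell out in full). It only shows why a get/put pair held by the caller protects a synchronous reap but not one that runs later from a workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for starget->dev. */
struct model_obj {
	int refcount;
	bool released;
};

static void model_get(struct model_obj *o)
{
	o->refcount++;
}

static void model_put(struct model_obj *o)
{
	if (--o->refcount == 0)
		o->released = true; /* the release callback would run here */
}

/* The reap body drops the object's long-lived reference. */
static void model_reap(struct model_obj *o)
{
	model_put(o);
}

/*
 * Returns true if the final release happened inside the reap body.
 * deferred == false: reap runs while the caller still holds its reference.
 * deferred == true:  reap runs later, after the caller's put, as it
 *                    would when the work is queued instead of run inline.
 */
static bool model_released_inside_reap(bool deferred)
{
	struct model_obj o = { .refcount = 1, .released = false };
	bool inside;

	model_get(&o);                /* caller pins the object: count 2 */
	if (deferred) {
		model_put(&o);        /* caller is done first: count 1 */
		model_reap(&o);       /* queued work runs: count 0 */
		inside = o.released;
	} else {
		model_reap(&o);       /* count 1, object still alive */
		inside = o.released;
		model_put(&o);        /* caller drops the last reference */
	}
	assert(o.released);
	return inside;
}
```

Under these assumptions model_released_inside_reap(false) is false and model_released_inside_reap(true) is true: once the reap is deferred, the caller's temporary reference no longer covers the code that drops the final one.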
* Re: 2.6.15-git12, slab corruption in ipr
2006-01-17 0:05 2.6.15-git12, slab corruption in ipr Olaf Hering
2006-01-18 18:42 ` Brian King
@ 2006-01-30 18:07 ` Olaf Hering
1 sibling, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2006-01-30 18:07 UTC (permalink / raw)
To: linux-scsi, Brian J King

On Tue, Jan 17, Olaf Hering wrote:

> I tested the current Linus tree + a few SuSE patches on a p710. There is
> some slab corruption. Will check if plain Linus tree gives the same...
>
> dmesg | grep -wiC9 slab
> sda: Write Protect is off
> sda: Mode Sense: cb 00 00 08
> SCSI device sda: drive cache: write through
> SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
> sda: Write Protect is off
> sda: Mode Sense: cb 00 00 08
> SCSI device sda: drive cache: write through
> sda: sda1 sda2 sda3 sda4
> sd 0:0:3:0: Attached scsi disk sda
> Slab corruption: start=c000000000431000, len=4096
> d80: c0 00 00 00 00 43 1d 80 c0 00 00 00 00 43 1d 80
> Vendor: IBM Model: IC35L036UCDY10-0 Rev: S28G
> Type: Direct-Access ANSI SCSI revision: 03

I made https://bugzilla.novell.com/show_bug.cgi?id=145459 public.
No idea if this is one bug (mem corruption on ppc64), or if there are
plenty of them.

-- 
short story of a lazy sysadmin:
alias appserv=wotan