From: Olaf Hering <olh@suse.de>
To: James Smart <James.Smart@Emulex.Com>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH 0/22] lpfc 8.1.2 driver update, crash in lpfc_worker_0
Date: Sun, 12 Feb 2006 15:57:36 +0100
Message-ID: <20060212145736.GA23765@suse.de>
In-Reply-To: <43EA10FC.3030008@emulex.com>
On Wed, Feb 08, James Smart wrote:
> This patch set updates the lpfc driver to revision 8.1.2, which includes
James, we have this driver now.
Today I got this crash during bootup on a p620 (4 RS64 cpus). It ran
-git9 before, now -git11. A quick look at the changes did not show
anything related.
Both cpu1 and cpu3 had an invalid data access at the same time;
no idea which one came first.
The system came up after a second try.
Maybe there is an obvious error in the new code, or
maybe the 303 patches on top of Linus' tree are bad.
...
Linux version 2.6.16-rc2-git11-20060212084234-ppc64 (geeko@buildhost) (gcc version 4.1.0 20060210 (prerelease) (SUSE Linux)) #1 SMP Sun Feb 12 08:42:34 UTC 2006
...
Kernel command line: root=/dev/md0 xmon=on kdb=on sysrq=1 selinux=0 elevator=cfq splash=silent desktop
...
Loading scsi_mod
SCSI subsystem initialized
Loading sd_mod
Loading scsi_transport_spi
Loading sym53c8xx
sym0: <896> rev 0x7 at pci 0000:01:01.0 irq 35
sym0: No NVRAM, ID 7, Fast-40, SE, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.2
Vendor: IBM Model: CDRM00203 !K Rev: 1_05
Type: CD-ROM ANSI SCSI revision: 02
target0:0:1: Beginning Domain Validation
target0:0:1: asynchronous
target0:0:1: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 15)
target0:0:1: Domain Validation skipping write tests
target0:0:1: Ending Domain Validation
target0:0:2: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 31)
Vendor: IBM Model: ST318305LC Rev: C505
Type: Direct-Access ANSI SCSI revision: 03
target0:0:2: tagged command queuing enabled, command queue depth 16.
target0:0:2: Beginning Domain Validation
target0:0:2: asynchronous
target0:0:2: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 31)
target0:0:2: Domain Validation skipping write tests
target0:0:2: Ending Domain Validation
sr0: scsi-1 drive
SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
sda: sda1 sda2
sd 0:0:2:0: Attached scsi disk sda
Uniform CD-ROM driver Revision: 3.20
scsi_id[1068]: ssr 0:0:1:0: Attached scsi generic sg0 type 5
csi_id: unable tsd 0:0:2:0: Attached scsi generic sg1 type 0
o access parent device of '/block/sda'
sym1: <896> rev 0x7 at pci 0000:01:01.1 irq 34
sym1: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi1 : sym-2.2.2
Loading scsi_transport_fc
Loading lpfc
Emulex LightPulse Fibre Channel SCSI driver 8.1.2
Copyright(c) 2004-2006 Emulex. All rights reserved.
scsi2 : on PCI bus 21 device 08 irq 55
lpfc 0001:21:01.0: 0:1303 Link Up Event x1 received Data: x1 x1 x4 xa9
Vendor: IBM Model: 2105F20 Rev: 1.94
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdb: 12500032 512-byte hdwr sectors (6400 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
SCSI device sdb: 12500032 512-byte hdwr sectors (6400 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2 sdb3
md: raid0 personality registered for level 0
scsi_sid[1186]: d 2:0:0:0: Attached scsi disk sdb
csi_id: unable tsd 2:0:0:0: Attached scsi generic sg2 type 0
o access parent Vendor: device of '/blocIk/sdb'
WaitingBM for udev to set tle: scsi_id[112]: scsi_id: una ble to access pa rent device of Model: /block/sdb'
2105F20 Rev: 1.94
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdc: 12500032 512-byte hdwr sectors (6400 MB)
Decbpuug: 0x3 :s Velecteorpi: 30n0 (gD afutanc Atcicesos)n acat l[cl0e00d 00fr00o3fm bib2envba0]
id c o pnct: ecx0t0 a00t0 i00n00c052lu22dce/: l.inreusxc/hpaedg_etmaaps.k+0hx:314/06xc08 l
lr: c000000000053adc: .try_to_wake_up+0x4a8/0x51c
sp: c00000003fbb3130
msr: a000000000001032
dar: c00001800048acb0
dsisr: 40000000
current = 0xc0000000034d07f0
paca = 0xc00000000048b280
pid = 1167, comm = lpfc_worker_0
enter ? for help
[c00000003fbb31b0] c000000000053adc .try_to_wake_up+0x4a8/0x51c
[c00000003fbb3290] c000000000051c4c .__wake_up_common+0x68/0xe0
[c00000003fbb3340] c000000000055214 .__wake_up+0x54/0x88
[c00000003fbb33f0] c0000000002daafc .sock_def_readable+0x54/0xa8
[c00000003fbb3480] c00000000030115c .netlink_broadcast+0x344/0x50c
[c00000003fbb3560] c0000000001c6378 .kobject_uevent+0x41c/0x4dc
[c00000003fbb3650] c000000000251c94 .class_device_add+0x240/0x398
[c00000003fbb3700] c000000000254f74 .attribute_container_add_class_device+0x18/0x50
[c00000003fbb3780] c0000000002556c8 .transport_add_class_device+0x24/0x68
[c00000003fbb3810] c000000000254e3c .attribute_container_device_trigger+0x124/0x1c8
[c00000003fbb38d0] c0000000002555e0 .transport_add_device+0x1c/0x34
[c00000003fbb3950] d000000000052bb0 .fc_rport_create+0x270/0x36c [scsi_transport_fc]
[c00000003fbb3a10] d0000000001504c4 .lpfc_nlp_list+0x988/0xaf8 [lpfc]
[c00000003fbb3af0] d000000000158f68 .lpfc_cmpl_reglogin_reglogin_issue+0x150/0x174 [lpfc]
[c00000003fbb3b80] d000000000157a58 .lpfc_disc_state_machine+0xd4/0x1c8 [lpfc]
[c00000003fbb3c20] d000000000151510 .lpfc_mbx_cmpl_reg_login+0x44/0x94 [lpfc]
[c00000003fbb3cc0] d000000000144b40 .lpfc_sli_handle_mb_event+0x418/0x5c8 [lpfc]
[c00000003fbb3da0] d000000000152550 .lpfc_do_work+0x1c0/0xbf8 [lpfc]
[c00000003fbb3ee0] c000000000079250 .kthread+0x128/0x178
[c00000003fbb3f90] c000000000024adc .kernel_thread+0x4c/0x68
3:mon> cpu 0x1: Vector: 300 (Data Access) at [c00000002f806d30]
pc: c00000000000ecdc: .validate_sp+0x30/0x88
lr: c00000000000ee14: .show_stack+0xe0/0x1b0
sp: c00000002f806fb0
msr: a000000000001032
dar: c000000600627b20
dsisr: 40000000
current = 0xc0000000fe9587f0
paca = 0xc00000000048ae80
pid = 1205, comm = vol_id
...
3:mon> e
cpu 0x3: Vector: 300 (Data Access) at [c00000003fbb2eb0]
pc: c00000000005222c: .resched_task+0x34/0xc0
lr: c000000000053adc: .try_to_wake_up+0x4a8/0x51c
sp: c00000003fbb3130
msr: a000000000001032
dar: c00001800048acb0
dsisr: 40000000
current = 0xc0000000034d07f0
paca = 0xc00000000048b280
pid = 1167, comm = lpfc_worker_0
3:mon> t
[c00000003fbb31b0] c000000000053adc .try_to_wake_up+0x4a8/0x51c
[c00000003fbb3290] c000000000051c4c .__wake_up_common+0x68/0xe0
[c00000003fbb3340] c000000000055214 .__wake_up+0x54/0x88
[c00000003fbb33f0] c0000000002daafc .sock_def_readable+0x54/0xa8
[c00000003fbb3480] c00000000030115c .netlink_broadcast+0x344/0x50c
[c00000003fbb3560] c0000000001c6378 .kobject_uevent+0x41c/0x4dc
[c00000003fbb3650] c000000000251c94 .class_device_add+0x240/0x398
[c00000003fbb3700] c000000000254f74 .attribute_container_add_class_device+0x18/0x50
[c00000003fbb3780] c0000000002556c8 .transport_add_class_device+0x24/0x68
[c00000003fbb3810] c000000000254e3c .attribute_container_device_trigger+0x124/0x1c8
[c00000003fbb38d0] c0000000002555e0 .transport_add_device+0x1c/0x34
[c00000003fbb3950] d000000000052bb0 .fc_rport_create+0x270/0x36c [scsi_transport_fc]
[c00000003fbb3a10] d0000000001504c4 .lpfc_nlp_list+0x988/0xaf8 [lpfc]
[c00000003fbb3af0] d000000000158f68 .lpfc_cmpl_reglogin_reglogin_issue+0x150/0x174 [lpfc]
[c00000003fbb3b80] d000000000157a58 .lpfc_disc_state_machine+0xd4/0x1c8 [lpfc]
[c00000003fbb3c20] d000000000151510 .lpfc_mbx_cmpl_reg_login+0x44/0x94 [lpfc]
[c00000003fbb3cc0] d000000000144b40 .lpfc_sli_handle_mb_event+0x418/0x5c8 [lpfc]
[c00000003fbb3da0] d000000000152550 .lpfc_do_work+0x1c0/0xbf8 [lpfc]
[c00000003fbb3ee0] c000000000079250 .kthread+0x128/0x178
[c00000003fbb3f90] c000000000024adc .kernel_thread+0x4c/0x68
3:mon> r
R00 = c00000000048ac80 R16 = 0000000000000000
R01 = c00000003fbb3130 R17 = 0000000000000000
R02 = c000000000625be0 R18 = 0000000000000000
R03 = c0000000fe9587f0 R19 = c00000000fdbca08
R04 = c000000004a55140 R20 = 0000000000000000
R05 = 0000000000000000 R21 = 0000000000000001
R06 = c000000004a6cf70 R22 = 0000000000000001
R07 = c000000003393870 R23 = a000000000001032
R08 = c0000000fe9587f0 R24 = 0000000000000003
R09 = c00001800048ac80 R25 = c0000000035a87f0
R10 = c000000004a55850 R26 = 0000000000000001
R11 = c0000000004341c0 R27 = 0000000000000001
R12 = 0000000000000000 R28 = c000000004a54f70
R13 = c00000000048b280 R29 = 0000000000000001
R14 = 0000000000000000 R30 = c0000000004c7e78
R15 = 0000000000000000 R31 = c000000004a550c0
pc = c00000000005222c .resched_task+0x34/0xc0
lr = c000000000053adc .try_to_wake_up+0x4a8/0x51c
msr = a000000000001032 cr = 24000088
ctr = c000000000053b50 xer = 0000000020000000 trap = 300
dar = c00001800048acb0 dsisr = 40000000
cpu2 idle.
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c00000002f806d30]
pc: c00000000000ecdc: .validate_sp+0x30/0x88
lr: c00000000000ee14: .show_stack+0xe0/0x1b0
sp: c00000002f806fb0
msr: a000000000001032
dar: c000000600627b20
dsisr: 40000000
current = 0xc0000000fe9587f0
paca = 0xc00000000048ae80
pid = 1205, comm = vol_id
1:mon> t
[link register ] c00000000000ee14 .show_stack+0xe0/0x1b0
[c00000002f806fb0] c00000000000ee00 .show_stack+0xcc/0x1b0 (unreliable)
[c00000002f807050] c0000000001cb214 ._raw_spin_lock+0x120/0x164
[c00000002f8070e0] c00000000036bf44 ._spin_lock+0x10/0x24
[c00000002f807160] c00000000005440c .scheduler_tick+0xf4/0x3ec
[c00000002f807210] c00000000006aa44 .update_process_times+0x7c/0xa8
[c00000002f8072a0] c000000000021100 .timer_interrupt+0x94/0x404
[c00000002f807380] c0000000000034b4 decrementer_common+0xb4/0x100
--- Exception: 901 (Decrementer) at c00000000005cd64 .release_console_sem+0x1c4/0x284
[c00000002f807720] c00000000005d7dc .vprintk+0x330/0x388
[c00000002f807840] c00000000005d86c .printk+0x38/0x48
[c00000002f8078d0] c0000000000524e4 .__might_sleep+0x98/0xf4
[c00000002f807950] c000000000092468 .do_generic_mapping_read+0x1fc/0x4dc
[c00000002f807aa0] c00000000009307c .__generic_file_aio_read+0x184/0x22c
[c00000002f807b70] c00000000009487c .generic_file_read+0x94/0xcc
[c00000002f807cf0] c0000000000c42f0 .vfs_read+0x118/0x1fc
[c00000002f807d90] c0000000000c47d0 .sys_read+0x4c/0x8c
[c00000002f807e30] c0000000000086f8 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff5cc68
SP (ffd46a00) is in userspace
1:mon> r
R00 = 0000000600000000 R16 = c00000002f807e08
R01 = c00000002f806fb0 R17 = 00000000000200e3
R02 = c000000000625be0 R18 = c0000000005eaa90
R03 = c00000002f807d90 R19 = c0000000fed52c50
R04 = c0000000fe9587f0 R20 = 0000000000000000
R05 = 00000000000002f0 R21 = c00000002f807b10
R06 = c00000002f806fe8 R22 = 00000000000200e4
R07 = 0000000000080000 R23 = 0000000000000800
R08 = 0000000000002b02 R24 = c000000000629f50
R09 = c000000000627b20 R25 = 0000000000000001
R10 = 0000000000000000 R26 = c00000002f807e30
R11 = c00000002f804000 R27 = 000000000000000f
R12 = 0000000000000020 R28 = c00000000005cd60
R13 = c00000000048ae80 R29 = c00000002f807d90
R14 = 00000000000200e3 R30 = c0000000fe9587f0
R15 = c00000003fa1c508 R31 = c0000000000c47d0
pc = c00000000000ecdc .validate_sp+0x30/0x88
lr = c00000000000ee14 .show_stack+0xe0/0x1b0
msr = a000000000001032 cr = 88022444
ctr = 0000000000000000 xer = 0000000000000000 trap = 300
dar = c000000600627b20 dsisr = 40000000
cpu0 idle
Looking into dmesg, it's not clear anymore if lpfc is at fault:
<6>md: raid0 personality registered for level 0
<5>sd 2:0:0:0: Attached scsi disk sdb
<5>sd 2:0:0:0: Attached scsi generic sg2 type 0
<5> Vendor: IBM Model: 2105F20 Rev: 1.94
<5> Type: Direct-Access ANSI SCSI revision: 03
<5>SCSI device sdc: 12500032 512-byte hdwr sectors (6400 MB)
<3>Debug: sleeping function called from invalid context at include/linux/pagemap.h:168
<1>Unable to handle kernel paging request for data at address 0xc00001800048acb0
<1>Faulting instruction address: 0xc00000000005222c
<0>BUG: spinlock lockup on CPU#1, vol_id/1205, c000000004a550c0
<4>Call Trace:
<4>[C00000002F806FB0] [C00000000000ED9C] .show_stack+0x68/0x1b0 (unreliable)
<4>[C00000002F807050] [C0000000001CB214] ._raw_spin_lock+0x120/0x164
<4>[C00000002F8070E0] [C00000000036BF44] ._spin_lock+0x10/0x24
<4>[C00000002F807160] [C00000000005440C] .scheduler_tick+0xf4/0x3ec
<4>[C00000002F807210] [C00000000006AA44] .update_process_times+0x7c/0xa8
<4>[C00000002F8072A0] [C000000000021100] .timer_interrupt+0x94/0x404
<4>[C00000002F807380] [C0000000000034B4] decrementer_common+0xb4/0x100
<4>--- Exception: 901 at .release_console_sem+0x1c4/0x284
<4> LR = .release_console_sem+0x1c0/0x284
<4>[C00000002F807720] [C00000000005D7DC] .vprintk+0x330/0x388
<4>[C00000002F807840] [C00000000005D86C] .printk+0x38/0x48
<4>[C00000002F8078D0] [C0000000000524E4] .__might_sleep+0x98/0xf4
<4>[C00000002F807950] [C000000000092468] .do_generic_mapping_read+0x1fc/0x4dc
<4>[C00000002F807AA0] [C00000000009307C] .__generic_file_aio_read+0x184/0x22c
<4>[C00000002F807B70] [C00000000009487C] .generic_file_read+0x94/0xcc
<4>[C00000002F807CF0] [C0000000000C42F0] .vfs_read+0x118/0x1fc
<4>[C00000002F807D90] [C0000000000C47D0] .sys_read+0x4c/0x8c
<1>Unable to handle kernel paging request for data at address 0xc000000600627b20
<1>Faulting instruction address: 0xc00000000000ecdc
0:mon>
--
short story of a lazy sysadmin:
alias appserv=wotan