* sym2 oops in 2.6.9-rc2-BK
From: Anton Blanchard @ 2004-09-28 13:58 UTC (permalink / raw)
To: linux-scsi
Hi,
I've got a 2.6.9-rc2-bk tree from about September 16 which exploded in
sym_prepare_nego. It turns out sdev is NULL, and scsi_device_dt(sdev)
causes the trouble.
A few lines above there is a check for sdev != NULL, so, assuming it is
valid for sdev to be NULL, add a check before scsi_device_dt() too.
Anton
Signed-off-by: Anton Blanchard <anton@samba.org>
diff -puN drivers/scsi/sym53c8xx_2/sym_hipd.c~fix-sym2 drivers/scsi/sym53c8xx_2/sym_hipd.c
--- gr_work/drivers/scsi/sym53c8xx_2/sym_hipd.c~fix-sym2 2004-09-28 03:03:26.493627814 -0500
+++ gr_work-anton/drivers/scsi/sym53c8xx_2/sym_hipd.c 2004-09-28 03:03:50.247458823 -0500
@@ -1550,7 +1550,7 @@ static int sym_prepare_nego(hcb_p np, cc
/*
* negotiate using PPR ?
*/
- if (scsi_device_dt(sdev)) {
+ if (sdev && scsi_device_dt(sdev)) {
nego = NS_PPR;
} else {
/*
* Re: sym2 oops in 2.6.9-rc2-BK
From: Anton Blanchard @ 2004-09-28 14:21 UTC (permalink / raw)
To: linux-scsi; +Cc: willy
> I've got a 2.6.9-rc2-bk tree from about September 16 which exploded in
> sym_prepare_nego. It turns out sdev is NULL, and scsi_device_dt(sdev)
> causes the trouble.
>
> A few lines above there is a check for sdev != NULL, so, assuming it is
> valid for sdev to be NULL, add a check before scsi_device_dt() too.
With that patch I still managed to get an oops. There is a fair amount
of bad hardware in the box, but oopsing is pretty antisocial.
It looks like a refcount problem: we kref_get'ed something already freed,
then finally oopsed in scsi_device_get, trying to access address
0x100510.
Anton
sym.0014:03:01.0:11:0: ABORT operation started.
sym.0014:03:01.0:11:0: ABORT operation complete.
sym.0014:03:01.0:11:0: DEVICE RESET operation started.
sym.0014:03:01.0:11:0: DEVICE RESET operation complete.
sym.0014:03:01.0:11:control msgout: c.
sym.0014:03:01.0: TARGET 11 has been reset.
sym.1214:03:01.0:11:0: ABORT operation started.
sym.1214:03:01.0:11:0: ABORT operation complete.
sym.1214:03:01.0: SCSI parity error detected: SCR1=1 DBC=1500000e SBCL=ae
sym.1214:03:01.0:11:0: DEVICE RESET operation started.
sym.1214:03:01.0:11:0: DEVICE RESET operation complete.
sym.1214:03:01.0:11:control msgout: c.
sym.1214:03:01.0: TARGET 11 has been reset.
sym.0014:03:01.0:11:0: ABORT operation started.
sym.0014:03:01.0:11:0: ABORT operation complete.
sym.0014:03:01.0:11:0: BUS RESET operation started.
sym.0014:03:01.0:11:0: BUS RESET operation complete.
sym.0014:03:01.0: SCSI BUS reset detected.
sym.0014:03:01.0: SCSI BUS has been reset.
sym.1214:03:01.0:11:0: ABORT operation started.
sym.1214:03:01.0:11:0: ABORT operation complete.
sym.1214:03:01.0:11:0: BUS RESET operation started.
sym.1214:03:01.0:11:0: BUS RESET operation complete.
sym.1214:03:01.0: SCSI BUS reset detected.
sym.1214:03:01.0: SCSI BUS has been reset.
scsi: Device offlined - not ready after error recovery: host 2 channel 0 id 11 lun 0
Badness in kref_get at lib/kref.c:32
Call Trace:
[c0000032fcab3bd0] [c0000032fcab3c50] 0xc0000032fcab3c50 (unreliable)
[c0000032fcab3c50] [c00000000021f5b8] .get_device+0x20/0x3c
[c0000032fcab3cc0] [c000000000294c60] .scsi_device_get+0x38/0xe4
[c0000032fcab3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000032fcab3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000032fcab3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000032fcab3f90] [c000000000017aac] .kernel_thread+0x4c/0x68
sym.0014:03:01.0:11:control msgout: c.
NIP: C000000000294C48 XER: 0000000020000000 LR: C000000000294E30
REGS: c0000032fcab3a40 TRAP: 0300 Not tainted (2.6.9-rc2-bml)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000100510, DSISR: 0000000040000000
TASK: c000002bfd33d3c0[1494] 'scsi_eh_2' THREAD: c0000032fcab0000 CPU: 14
GPR00: FFFFFFFFFFFFFFFA C0000032FCAB3CC0 C0000000007297B8 00000000001000F0
GPR04: C00000000E112800 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000100100 C000001DFF875C28 9000000000009032
GPR12: 0000000024FFFF22 C000000000545880 0000000000000000 0000000000000000
GPR16: 0000000000000000 C00000000040D190 C000000000587058 C0000032FCAB3ED0
GPR20: 00000000000000FC C00000000040D190 C000000000587058 C0000032FCAB3F00
GPR24: C0000032FCAB3EF0 0000040180000000 C000000073847BB0 C00000000E112800
GPR28: 9000000000009032 C000000FFFFA8800 00000000001002D8 00000000001000F0
NIP [c000000000294c48] .scsi_device_get+0x20/0xe4
LR [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
Call Trace:
[c0000032fcab3cc0] [c000000000294da8] .scsi_device_put+0x9c/0xc4 (unreliable)
[c0000032fcab3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000032fcab3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000032fcab3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000032fcab3f90] [c000000000017aac] .kernel_thread+0x4c/0x68
* Re: sym2 oops in 2.6.9-rc2-BK
From: Matthew Wilcox @ 2004-09-28 15:17 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linux-scsi, willy
On Wed, Sep 29, 2004 at 12:21:04AM +1000, Anton Blanchard wrote:
> With that patch I still managed to get an oops. There is a fair amount
> of bad hardware in the box, but oopsing is pretty antisocial.
>
> Looks like a refcount problem. We kref_get'ed something already freed,
> then finally oopsed in scsi_device_get, trying to access address
> 0x100510.
__scsi_iterate_devices is part of a shost_for_each_device() loop. That
means we had a scsi_device sitting on the shost->__devices list with a
zero refcount. I'll see if I can spot the leak in my current sources,
but some of the behaviour has changed recently and it may be gone.
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* Re: sym2 oops in 2.6.9-rc2-BK
From: Matthew Wilcox @ 2004-09-28 14:56 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linux-scsi
On Tue, Sep 28, 2004 at 11:58:26PM +1000, Anton Blanchard wrote:
> I've got a 2.6.9-rc2-bk tree from about September 16 which exploded in
> sym_prepare_nego. It turns out sdev is NULL, and scsi_device_dt(sdev)
> causes the trouble.
>
> A few lines above there is a check for sdev != NULL, so, assuming it is
> valid for sdev to be NULL, add a check before scsi_device_dt() too.
Yes, this looks like the right solution to me.
Can you tell me what circumstances you see it under, and do you
successfully negotiate 160MB/s with this patch?
* Re: sym2 oops in 2.6.9-rc2-BK
From: Anton Blanchard @ 2004-09-28 15:25 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-scsi
> Yes, this looks like the right solution to me.
>
> Can you tell me what circumstances you see it under, and do you
> successfully negotiate 160MB/s with this patch?
There is a bunch of bad hardware in it, so I'm having a bit of trouble
working out exactly what is going on :) I'll look through the logs and
see if I can make sense of them.
Anton
* Re: sym2 oops in 2.6.9-rc2-BK
From: James Bottomley @ 2004-09-28 15:38 UTC (permalink / raw)
To: Anton Blanchard; +Cc: SCSI Mailing List
On Tue, 2004-09-28 at 09:58, Anton Blanchard wrote:
> I've got a 2.6.9-rc2-bk tree from about September 16 which exploded in
> sym_prepare_nego. It turns out sdev is NULL, and scsi_device_dt(sdev)
> causes the trouble.
>
> A few lines above there is a check for sdev != NULL, so, assuming it is
> valid for sdev to be NULL, add a check before scsi_device_dt() too.
>
> Anton
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
>
> diff -puN drivers/scsi/sym53c8xx_2/sym_hipd.c~fix-sym2 drivers/scsi/sym53c8xx_2/sym_hipd.c
> --- gr_work/drivers/scsi/sym53c8xx_2/sym_hipd.c~fix-sym2 2004-09-28 03:03:26.493627814 -0500
> +++ gr_work-anton/drivers/scsi/sym53c8xx_2/sym_hipd.c 2004-09-28 03:03:50.247458823 -0500
> @@ -1550,7 +1550,7 @@ static int sym_prepare_nego(hcb_p np, cc
> /*
> * negotiate using PPR ?
> */
> - if (scsi_device_dt(sdev)) {
> + if (sdev && scsi_device_dt(sdev)) {
> nego = NS_PPR;
> } else {
> /*
Actually, this patch can't be correct. We should never be negotiating
with a NULL sdev. Previously we negotiated after slave_alloc, but I've
tried to change the driver to defer negotiation until slave_configure.
What were the messages in the log prior to the NULL deref? What I'm
trying to understand is how we got to this point.
James
* Re: sym2 oops in 2.6.9-rc2-BK
From: Anton Blanchard @ 2004-09-30 13:05 UTC (permalink / raw)
To: James Bottomley; +Cc: SCSI Mailing List
> Actually, this patch can't be correct. We should never be negotiating
> with a NULL sdev. Previously we negotiated after slave_alloc, but I've
> tried to change the driver to defer negotiation until slave_configure.
>
> What were the messages in the log prior to the NULL deref? What I'm
> trying to understand is how we got to this point.
I'm confused: why are we checking sdev earlier on? Unfortunately I don't
have the machine at the moment; if I get it back I'll get a dmesg.
Anton
static int sym_prepare_nego(hcb_p np, ccb_p cp, int nego, u_char *msgptr)
{
	tcb_p tp = &np->target[cp->target];
	int msglen = 0;
	struct scsi_device *sdev = tp->sdev;

	if (likely(sdev))
		sym_check_goals(sdev);

	/*
	 * Early C1010 chips need a work-around for DT
	 * data transfer to work.
	 */
	if (!(np->features & FE_U3EN))
		tp->tinfo.goal.options = 0;
	/*
	 * negotiate using PPR ?
	 */
	if (scsi_device_dt(sdev)) {
		nego = NS_PPR;
	} else {
* Re: sym2 oops in 2.6.9-rc2-BK
From: James Bottomley @ 2004-09-30 13:52 UTC (permalink / raw)
To: Anton Blanchard; +Cc: SCSI Mailing List
On Thu, 2004-09-30 at 09:05, Anton Blanchard wrote:
> I'm confused: why are we checking sdev earlier on? Unfortunately I don't
> have the machine at the moment; if I get it back I'll get a dmesg.
Erm, because I didn't notice it and forgot to remove it when I revamped
the negotiation routines...
James
* Re: sym2 oops in 2.6.9-rc2-BK
From: Anton Blanchard @ 2004-09-30 14:05 UTC (permalink / raw)
To: James Bottomley; +Cc: SCSI Mailing List
> Erm, because I didn't notice it and forgot to remove it when I revamped
> the negotiation routines...
OK, I backed that last patch out and hit what looks like my second bug again.
Anton
sym0: <1010-66> rev 0x1 at pci 0004:03:01.0 irq 87
sym.0004:03:01.0: No NVRAM, ID 7, Fast-80, LVD, parity checking
xics_enable_irq 47 buid 4 gqirm 255
sym.0004:03:01.0: SCSI BUS has been reset.
scsi0 : sym-2.1.18j
Using anticipatory io scheduler
sym.0004:03:01.0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M
Type: Direct-Access ANSI SCSI revision: 03
sym.0004:03:01.0:10:0: tagged command queuing enabled, command queue depth 16.
scsi(0:0:10:0): Beginning Domain Validation
sym.0004:03:01.0:10: asynchronous.
sym.0004:03:01.0:10: wide asynchronous.
sym.0004:03:01.0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
scsi(0:0:10:0): Ending Domain Validation
sym.0004:03:01.0:11:0:phase change 2-7 6@01050368 resid=5.
sym.0004:03:01.0:11:0:phase change 2-3 6@01050368 resid=5.
sym.0004:03:01.0:11: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: DEVICE RESET operation started.
sym.0004:03:01.0:11:0: DEVICE RESET operation complete.
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: BUS RESET operation started.
sym.0004:03:01.0:11:0: BUS RESET operation complete.
sym.0004:03:01.0: SCSI BUS reset detected.
sym.0004:03:01.0: SCSI BUS has been reset.
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 11 lun 0
NIP: C000000000294C48 XER: 0000000020000000 LR: C000000000294E30
REGS: c000001dfd0e7a40 TRAP: 0300 Not tainted (2.6.9-rc2-bml)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000100510, DSISR: 0000000040000000
TASK: c000000ffe73b240[1463] 'scsi_eh_0' THREAD: c000001dfd0e4000 CPU: 3
GPR00: FFFFFFFFFFFFFFFA C000001DFD0E7CC0 C0000000007297B8 00000000001000F0
GPR04: C0000032FC834000 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000100100 C000000FFFFD7228 9000000000009032
GPR12: 0000000024FFFF22 C000000000542700 0000000000000000 0000000000000000
GPR16: 0000000000000000 C00000000040D190 C000000000587058 C000001DFD0E7ED0
GPR20: 00000000000000FC C00000000040D190 C000000000587058 C000001DFD0E7F00
GPR24: C000001DFD0E7EF0 0000040100000000 C000001DFD077D30 C0000032FC834000
GPR28: 9000000000009032 C000000FFFFC3800 00000000001002D8 00000000001000F0
NIP [c000000000294c48] .scsi_device_get+0x20/0xe4
LR [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
Call Trace:
[c000001dfd0e7cc0] [c000000000294da8] .scsi_device_put+0x9c/0xc4 (unreliable)
[c000001dfd0e7d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c000001dfd0e7de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c000001dfd0e7e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c000001dfd0e7f90] [c000000000017aac] .kernel_thread+0x4c/0x68