public inbox for linux-scsi@vger.kernel.org
* PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
@ 2004-03-04 20:13 Brian King
  2004-03-08 22:01 ` PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH] Brian King
  0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2004-03-04 20:13 UTC (permalink / raw)
  To: linux-scsi

I have been seeing occasional oopses during testing and have recently
found a way to reproduce them quickly. If I issue many overlapped SG_IO
ioctls while also doing heavy disk I/O, I can reproduce the oops within
a few minutes, although I have also seen the problem under very little
load. I have seen it with both the ipr and sym2 drivers.


ksymoops output:

Unable to handle kernel paging request at virtual address c6cf3044
c017e81c
*pde = 0001c067
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c017e81c>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c6cf3044   ebx: c9545df8   ecx: 00000002   edx: c6cf3044
esi: c6cf3048   edi: c6cf3000   ebp: cc7a7c84   esp: cc7a7c74
ds: 007b   es: 007b   ss: 0068
Stack: cc7a7c84 c9545df8 c6cf3048 c9545df8 cc7a7ca8 d086c547 c6cf3044 0000001d
        00020001 d0889000 cad8ebf8 cec4f000 c9170eb8 cc7a7cc8 c032e743 c9170eb8
        cafd13a8 000000ff c9170eb8 cc7a7cd8 0000000a cc7a7ce8 c032e66b c9170eb8
Call Trace:
  [<d086c547>] sg_cmd_done+0x167/0x280 [sg]
  [<c032e743>] scsi_finish_command+0x73/0xb0
  [<c032e66b>] scsi_softirq+0xab/0xd0
  [<c012a935>] do_softirq+0xe5/0xf0
  [<c010c6b8>] do_IRQ+0x198/0x240
  [<c010a50c>] common_interrupt+0x18/0x20
  [<c0253a60>] __copy_from_user_ll+0x50/0x80
  [<c0171ed7>] blkdev_prepare_write+0x27/0x30
  [<c0146aa3>] generic_file_aio_write_nolock+0x463/0xc20
  [<c01472d8>] generic_file_write_nolock+0x78/0x90
  [<c011fd52>] default_wake_function+0x22/0x30
  [<d086c547>] sg_cmd_done+0x167/0x280 [sg]
  [<c032e743>] scsi_finish_command+0x73/0xb0
  [<c01733b3>] blkdev_file_write+0x33/0x40
  [<c0168b6f>] vfs_write+0xaf/0x120
  [<c0168c7f>] sys_write+0x3f/0x60
  [<c0109b9f>] syscall_call+0x7/0xb
Code: 8b 0a 85 c9 74 4b bb 00 e0 ff ff 21 e3 ff 43 14 b8 84 d2 4f


 >>EIP; c017e81c <kill_fasync+c/75>   <=====

 >>eax; c6cf3044 <_end+66b5b80/3f9c0b3c>
 >>ebx; c9545df8 <_end+8f08934/3f9c0b3c>
 >>edx; c6cf3044 <_end+66b5b80/3f9c0b3c>
 >>esi; c6cf3048 <_end+66b5b84/3f9c0b3c>
 >>edi; c6cf3000 <_end+66b5b3c/3f9c0b3c>
 >>ebp; cc7a7c84 <_end+c16a7c0/3f9c0b3c>
 >>esp; cc7a7c74 <_end+c16a7b0/3f9c0b3c>

Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c032e66b <scsi_softirq+ab/d0>
Trace; c012a935 <do_softirq+e5/f0>
Trace; c010c6b8 <do_IRQ+198/240>
Trace; c010a50c <common_interrupt+18/20>
Trace; c0253a60 <__copy_from_user_ll+50/80>
Trace; c0171ed7 <blkdev_prepare_write+27/30>
Trace; c0146aa3 <generic_file_aio_write_nolock+463/c20>
Trace; c01472d8 <generic_file_write_nolock+78/90>
Trace; c011fd52 <default_wake_function+22/30>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c01733b3 <blkdev_file_write+33/40>
Trace; c0168b6f <vfs_write+af/120>
Trace; c0168c7f <sys_write+3f/60>
Trace; c0109b9f <syscall_call+7/b>

Code;  c017e81c <kill_fasync+c/75>
00000000 <_EIP>:
Code;  c017e81c <kill_fasync+c/75>   <=====
    0:   8b 0a                     mov    (%edx),%ecx   <=====
Code;  c017e81e <kill_fasync+e/75>
    2:   85 c9                     test   %ecx,%ecx
Code;  c017e820 <kill_fasync+10/75>
    4:   74 4b                     je     51 <_EIP+0x51>
Code;  c017e822 <kill_fasync+12/75>
    6:   bb 00 e0 ff ff            mov    $0xffffe000,%ebx
Code;  c017e827 <kill_fasync+17/75>
    b:   21 e3                     and    %esp,%ebx
Code;  c017e829 <kill_fasync+19/75>
    d:   ff 43 14                  incl   0x14(%ebx)
Code;  c017e82c <kill_fasync+1c/75>
   10:   b8 84 d2 4f 00            mov    $0x4fd284,%eax
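
The register dump can be cross-checked with a little arithmetic. The
sketch below only restates values from the oops above. That the first
load is a test of the pointer passed to kill_fasync(), and that edi
holds the base of the structure containing it, are assumptions and not
something the dump proves.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the oops output above. */
#define OOPS_FAULT_ADDR 0xc6cf3044u
#define OOPS_EDX        0xc6cf3044u
#define OOPS_EDI        0xc6cf3000u
#define OOPS_ESP        0xcc7a7c74u

/* The faulting instruction is "mov (%edx),%ecx", so the address that
 * could not be read is exactly the value held in edx. */
static_assert(OOPS_EDX == OOPS_FAULT_ADDR, "fault address should equal edx");

/* Distance of the faulting address from edi. If edi is the base of the
 * enclosing structure (an assumption), this is the member offset. */
uint32_t fault_offset_from_edi(void)
{
    return OOPS_FAULT_ADDR - OOPS_EDI;
}

/* "mov $0xffffe000,%ebx; and %esp,%ebx" rounds esp down to an 8 KB
 * boundary, which is the bottom of the current kernel stack. */
uint32_t stack_base(void)
{
    return OOPS_ESP & 0xffffe000u;
}
```

The faulting address sits 0x44 bytes past edi and the stack base comes
out as 0xcc7a6000. The failing load is therefore a plain pointer
dereference of memory that is no longer mapped, which is consistent
with an object being released while the completion path still uses it.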




Gnu C                  3.2
Gnu make               3.79.1
util-linux             2.11r
mount                  2.11r
module-init-tools      0.9.12
e2fsprogs              1.27
jfsutils               1.0.17
reiserfsprogs          3.6.2
pcmcia-cs              3.1.31
quota-tools            3.06.
PPP                    2.4.1
isdn4k-utils           3.1pre4
nfs-utils              1.0.1
Linux C Library        2.2.93
Dynamic linker (ldd)   2.2.93
Procps                 2.0.7
Net-tools              1.60
Kbd                    1.06
Sh-utils               2.0.12
Modules Loaded         ipr firmware_class sg


cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1266MHz
stepping        : 1
cpu MHz         : 1259.071
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2482.17

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1266MHz
stepping        : 1
cpu MHz         : 1259.071
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2506.75



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center



* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH]
  2004-03-04 20:13 PROBLEM: Oops in 2.6.3 with lots of SG_IO activity Brian King
@ 2004-03-08 22:01 ` Brian King
  2004-03-09 15:29   ` Brian King
  0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2004-03-08 22:01 UTC (permalink / raw)
  To: dougg; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 818 bytes --]

Attached is a patch that seems to fix the oops for me. Without the patch
I can consistently reproduce the oops in just a couple of minutes. With
the patch I have been running for close to an hour without problems so
far. Doug, does this look OK? I'm going to let my test case run overnight
as well and will post the results tomorrow.


> I have been experiencing occasional oopses in some testing I have been
> doing and have recently been able to aggravate the problem to recreate
> the oops quite quickly. If I do lots of overlapped SG_IO ioctls while
> also doing heavy disk I/O, I can recreate the oops within a few minutes,
> although I have also seen the problem under very little load. I have
> seen the problem using both the ipr and sym2 drivers.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: sg_cmd_done_oops.patch --]
[-- Type: text/plain, Size: 939 bytes --]


The patch fixes a race condition in sg_cmd_done that results in an oops.


---


diff -puN drivers/scsi/sg.c~sg_cmd_done_oops drivers/scsi/sg.c
--- linux-2.6.4-rc2/drivers/scsi/sg.c~sg_cmd_done_oops	2004-03-06 22:08:45.000000000 -0600
+++ linux-2.6.4-rc2-brking/drivers/scsi/sg.c	2004-03-06 22:55:12.000000000 -0600
@@ -1256,7 +1256,6 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	SRpnt->sr_request->rq_disk = NULL; /* "sg" _disowns_ request blk */
 
 	srp->my_cmdp = NULL;
-	srp->done = 1;
 
 	SCSI_LOG_TIMEOUT(4, printk("sg_cmd_done: %s, pack_id=%d, res=0x%x\n",
 		sdp->disk->disk_name, srp->header.pack_id, (int) SRpnt->sr_result));
@@ -1312,8 +1311,9 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	}
 	if (sfp && srp) {
 		/* Now wake up any sg_read() that is waiting for this packet. */
-		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		srp->done = 1;
+		wake_up_interruptible(&sfp->read_wait);
 	}
 }
 

_


* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH]
  2004-03-08 22:01 ` PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH] Brian King
@ 2004-03-09 15:29   ` Brian King
  2004-03-09 16:02     ` Tony Battersby
  0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2004-03-09 15:29 UTC (permalink / raw)
  To: Brian King; +Cc: dougg, linux-scsi

The test case ran overnight without any problems.

-Brian




-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center



* RE: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH]
  2004-03-09 15:29   ` Brian King
@ 2004-03-09 16:02     ` Tony Battersby
  2004-03-09 17:30       ` Brian King
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Battersby @ 2004-03-09 16:02 UTC (permalink / raw)
  To: 'Brian King'; +Cc: dougg, linux-scsi

I found a similar problem in 2.4 last year.  I posted a patch but it
wasn't merged due to a minor technicality with memory barriers, and I
forgot to follow up on the issue.  Sorry for dropping the ball on this
one.  See http://marc.theaimsgroup.com/?t=105664088400001&r=1&w=2.  I
believe the fix probably still needs to be merged in 2.4.

Anthony J. Battersby
Cybernetics



* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH]
  2004-03-09 16:02     ` Tony Battersby
@ 2004-03-09 17:30       ` Brian King
  0 siblings, 0 replies; 6+ messages in thread
From: Brian King @ 2004-03-09 17:30 UTC (permalink / raw)
  To: tonyb; +Cc: dougg, linux-scsi

I tried out your patch and it didn't fix my problem; it only narrowed
the race window. I believe the patch I submitted should fix the
problem you were seeing as well.

-Brian


Tony Battersby wrote:
> I found a similar problem in 2.4 last year.  I posted a patch but it
> wasn't merged due to a minor technicality with memory barriers, and I
> forgot to follow up on the issue.  Sorry for dropping the ball on this
> one.  See http://marc.theaimsgroup.com/?t=105664088400001&r=1&w=2.  I
> believe the fix probably still needs to be merged in 2.4.
> 
> Anthony J. Battersby
> Cybernetics
> 


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center



* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
@ 2004-03-18 21:48 Brian King
  0 siblings, 0 replies; 6+ messages in thread
From: Brian King @ 2004-03-18 21:48 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, akpm, dougg

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

James,

Attached is a patch to fix an oops in sg_cmd_done. Please apply.

Thanks

-Brian


-------- Original Message --------
Subject: Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
Date: Wed, 10 Mar 2004 10:24:39 -0600
From: Brian King <brking@us.ibm.com>
To: dougg@torque.net
CC: James.Bottomley@steeleye.com,  tonyb@cybernetics.com
References: <40478DD3.10807@us.ibm.com> <404B1C79.4060600@torque.net> <404CD74A.1090301@us.ibm.com> <404F3133.7060200@torque.net>

Douglas Gilbert wrote:

 > Brian,
 > Thanks for this test code. I don't follow the "run disk exercisers" bit.
 > BTW iprinit seg faulted when the sg module wasn't loaded
 >
 > Your patch widens the srp->done window and re-orders kill_fasync()
 > and wake_up_interruptible(). Are they both needed? If not which one
 > is critical?

I need both for my testcase to run clean. sg_cmd_done cannot touch sfp
once srp->done is set.
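
To illustrate the ordering rule, here is a small single-threaded model.
It is a sketch and not kernel code. The struct and function names are
hypothetical, and it assumes that a waiter which observes done may go
on to release sfp. The wake-up is folded into the step that sets done,
which is a simplification: the patch itself issues the wake-up right
after the store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-open (sfp) and per-request (srp) state. */
struct model_fd {
    bool freed;           /* set once the waiter has released the fd state */
    int  use_after_free;  /* completion-side accesses made after the release */
};

struct model_req {
    int done;
    struct model_fd *parentfp;
};

/* Waiter side: as soon as it observes done, it may finish and release sfp. */
void waiter_step(struct model_req *srp)
{
    if (srp->done)
        srp->parentfp->freed = true;
}

/* Completion side: any access to sfp, such as its wait queue or fasync list. */
void touch_fd(struct model_fd *sfp)
{
    assert(sfp != NULL);
    if (sfp->freed)
        sfp->use_after_free++;
}

/* Order before the patch: done is set first, sfp is touched afterwards.
 * Returns the number of accesses made after sfp was released. */
int complete_done_first(void)
{
    struct model_fd sfp = { false, 0 };
    struct model_req srp = { 0, &sfp };

    srp.done = 1;
    waiter_step(&srp);    /* waiter runs in the window and releases sfp */
    touch_fd(&sfp);       /* stands for wake_up_interruptible() */
    waiter_step(&srp);
    touch_fd(&sfp);       /* stands for kill_fasync() */
    return sfp.use_after_free;
}

/* Order after the patch: finish with sfp, then set done as the last step. */
int complete_done_last(void)
{
    struct model_fd sfp = { false, 0 };
    struct model_req srp = { 0, &sfp };

    touch_fd(&sfp);       /* stands for kill_fasync() */
    waiter_step(&srp);    /* done is still 0, so the waiter cannot release sfp */
    srp.done = 1;         /* publish completion; wake-up modeled as part of this */
    waiter_step(&srp);
    return sfp.use_after_free;
}
```

In this model the first ordering records two accesses after release and
the second records none, which is the behaviour the rule above is meant
to guarantee.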

 > Anyway I'm happy to go ahead with the patch (posted by you a little
 > while later on the lsml). Having just moved accommodation my
 > equipment still needs more setting up. I have a sym53c8xx HBA.

Thanks. Once again, here is the patch. James, please apply.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: sg_cmd_done_oops.patch --]
[-- Type: text/plain, Size: 940 bytes --]


The patch fixes a race condition in sg_cmd_done that results in an oops.


---


diff -puN drivers/scsi/sg.c~sg_cmd_done_oops drivers/scsi/sg.c
--- linux-2.6.4-rc2/drivers/scsi/sg.c~sg_cmd_done_oops	2004-03-06 22:08:45.000000000 -0600
+++ linux-2.6.4-rc2-brking/drivers/scsi/sg.c	2004-03-06 22:55:12.000000000 -0600
@@ -1256,7 +1256,6 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	SRpnt->sr_request->rq_disk = NULL; /* "sg" _disowns_ request blk */
 
 	srp->my_cmdp = NULL;
-	srp->done = 1;
 
 	SCSI_LOG_TIMEOUT(4, printk("sg_cmd_done: %s, pack_id=%d, res=0x%x\n",
 		sdp->disk->disk_name, srp->header.pack_id, (int) SRpnt->sr_result));
@@ -1312,8 +1311,9 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	}
 	if (sfp && srp) {
 		/* Now wake up any sg_read() that is waiting for this packet. */
-		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		srp->done = 1;
+		wake_up_interruptible(&sfp->read_wait);
 	}
 }
 

_

