public inbox for linux-scsi@vger.kernel.org
* PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
@ 2004-03-04 20:13 Brian King
From: Brian King @ 2004-03-04 20:13 UTC (permalink / raw)
  To: linux-scsi

I have been seeing occasional oopses during testing and have recently
found a way to reproduce them quickly. If I issue many overlapped SG_IO
ioctls while also running heavy disk I/O, the oops occurs within a few
minutes, although I have also seen it under very light load. It happens
with both the ipr and sym2 drivers.
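
For readers who want to approximate the SG_IO half of the workload
described above, here is a minimal sketch. It is not the test program
used for this report, which is not included in the message. The function
names build_tur and hammer_sg_io and the choice of TEST UNIT READY as the
command are assumptions made for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Fill in an SG_IO header for a 6-byte TEST UNIT READY (opcode 0x00).
 * The command transfers no data, so only the CDB and sense buffer are set.
 * (Hypothetical helper, not taken from the original test case.) */
void build_tur(sg_io_hdr_t *hdr, unsigned char cdb[6],
               unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, 6);
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';              /* required by the sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_NONE; /* no data phase */
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 5000;                  /* milliseconds */
}

/* Issue `count` TEST UNIT READY commands through the SG_IO ioctl.
 * Returns the number of ioctl calls that returned 0, or -1 if the
 * device node could not be opened. (Hypothetical helper.) */
int hammer_sg_io(const char *path, int count)
{
    unsigned char cdb[6];
    unsigned char sense[32];
    sg_io_hdr_t hdr;
    int ok = 0;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    for (int i = 0; i < count; i++) {
        build_tur(&hdr, cdb, sense, sizeof(sense));
        if (ioctl(fd, SG_IO, &hdr) == 0)
            ok++;
    }
    close(fd);
    return ok;
}
```

To get overlapped requests, several processes could each call
hammer_sg_io() on an sg node (for example a hypothetical /dev/sg0) at the
same time while a separate disk exerciser runs. Opening the node needs
suitable permissions.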


ksymoops output:

Unable to handle kernel paging request at virtual address c6cf3044
c017e81c
*pde = 0001c067
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c017e81c>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c6cf3044   ebx: c9545df8   ecx: 00000002   edx: c6cf3044
esi: c6cf3048   edi: c6cf3000   ebp: cc7a7c84   esp: cc7a7c74
ds: 007b   es: 007b   ss: 0068
Stack: cc7a7c84 c9545df8 c6cf3048 c9545df8 cc7a7ca8 d086c547 c6cf3044 0000001d
        00020001 d0889000 cad8ebf8 cec4f000 c9170eb8 cc7a7cc8 c032e743 c9170eb8
        cafd13a8 000000ff c9170eb8 cc7a7cd8 0000000a cc7a7ce8 c032e66b c9170eb8
Call Trace:
  [<d086c547>] sg_cmd_done+0x167/0x280 [sg]
  [<c032e743>] scsi_finish_command+0x73/0xb0
  [<c032e66b>] scsi_softirq+0xab/0xd0
  [<c012a935>] do_softirq+0xe5/0xf0
  [<c010c6b8>] do_IRQ+0x198/0x240
  [<c010a50c>] common_interrupt+0x18/0x20
  [<c0253a60>] __copy_from_user_ll+0x50/0x80
  [<c0171ed7>] blkdev_prepare_write+0x27/0x30
  [<c0146aa3>] generic_file_aio_write_nolock+0x463/0xc20
  [<c01472d8>] generic_file_write_nolock+0x78/0x90
  [<c011fd52>] default_wake_function+0x22/0x30
  [<d086c547>] sg_cmd_done+0x167/0x280 [sg]
  [<c032e743>] scsi_finish_command+0x73/0xb0
  [<c01733b3>] blkdev_file_write+0x33/0x40
  [<c0168b6f>] vfs_write+0xaf/0x120
  [<c0168c7f>] sys_write+0x3f/0x60
  [<c0109b9f>] syscall_call+0x7/0xb
Code: 8b 0a 85 c9 74 4b bb 00 e0 ff ff 21 e3 ff 43 14 b8 84 d2 4f


 >>EIP; c017e81c <kill_fasync+c/75>   <=====

 >>eax; c6cf3044 <_end+66b5b80/3f9c0b3c>
 >>ebx; c9545df8 <_end+8f08934/3f9c0b3c>
 >>edx; c6cf3044 <_end+66b5b80/3f9c0b3c>
 >>esi; c6cf3048 <_end+66b5b84/3f9c0b3c>
 >>edi; c6cf3000 <_end+66b5b3c/3f9c0b3c>
 >>ebp; cc7a7c84 <_end+c16a7c0/3f9c0b3c>
 >>esp; cc7a7c74 <_end+c16a7b0/3f9c0b3c>

Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c032e66b <scsi_softirq+ab/d0>
Trace; c012a935 <do_softirq+e5/f0>
Trace; c010c6b8 <do_IRQ+198/240>
Trace; c010a50c <common_interrupt+18/20>
Trace; c0253a60 <__copy_from_user_ll+50/80>
Trace; c0171ed7 <blkdev_prepare_write+27/30>
Trace; c0146aa3 <generic_file_aio_write_nolock+463/c20>
Trace; c01472d8 <generic_file_write_nolock+78/90>
Trace; c011fd52 <default_wake_function+22/30>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c01733b3 <blkdev_file_write+33/40>
Trace; c0168b6f <vfs_write+af/120>
Trace; c0168c7f <sys_write+3f/60>
Trace; c0109b9f <syscall_call+7/b>

Code;  c017e81c <kill_fasync+c/75>
00000000 <_EIP>:
Code;  c017e81c <kill_fasync+c/75>   <=====
    0:   8b 0a                     mov    (%edx),%ecx   <=====
Code;  c017e81e <kill_fasync+e/75>
    2:   85 c9                     test   %ecx,%ecx
Code;  c017e820 <kill_fasync+10/75>
    4:   74 4b                     je     51 <_EIP+0x51>
Code;  c017e822 <kill_fasync+12/75>
    6:   bb 00 e0 ff ff            mov    $0xffffe000,%ebx
Code;  c017e827 <kill_fasync+17/75>
    b:   21 e3                     and    %esp,%ebx
Code;  c017e829 <kill_fasync+19/75>
    d:   ff 43 14                  incl   0x14(%ebx)
Code;  c017e82c <kill_fasync+1c/75>
   10:   b8 84 d2 4f 00            mov    $0x4fd284,%eax
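
A short worked check of the numbers in the dump above, offered as a
reading aid rather than as part of the original report. The faulting
instruction is `mov (%edx),%ecx`, and edx holds the faulting address
c6cf3044. The same value appears on the stack right after the sg_cmd_done
return address, followed by 0000001d and 00020001, which look like the
SIGPOLL and POLL_IN arguments on i386. edi holds the page-aligned base
c6cf3000. That would be consistent with sfp starting at the beginning of
a page with async_qp at offset 0x44, but this is an inference and not
something the dump proves. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the 4 KiB page that contains addr. */
uint32_t oops_page_base(uint32_t addr)
{
    return addr & ~(uint32_t)0xfff;
}

/* Offset of addr within its 4 KiB page. */
uint32_t oops_page_offset(uint32_t addr)
{
    return addr & (uint32_t)0xfff;
}

/* What `mov $0xffffe000,%ebx; and %esp,%ebx` computes: the base of an
 * 8 KiB-aligned kernel stack. The following `incl 0x14(%ebx)` then
 * increments a counter stored at offset 0x14 from that base. */
uint32_t oops_stack_base_8k(uint32_t esp)
{
    return esp & (uint32_t)0xffffe000;
}
```

With the register values from the dump, oops_page_base(0xc6cf3044) gives
0xc6cf3000 (the value in edi), oops_page_offset(0xc6cf3044) gives 0x44,
and oops_stack_base_8k(0xcc7a7c74) gives 0xcc7a6000.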




Gnu C                  3.2
Gnu make               3.79.1
util-linux             2.11r
mount                  2.11r
module-init-tools      0.9.12
e2fsprogs              1.27
jfsutils               1.0.17
reiserfsprogs          3.6.2
pcmcia-cs              3.1.31
quota-tools            3.06.
PPP                    2.4.1
isdn4k-utils           3.1pre4
nfs-utils              1.0.1
Linux C Library        2.2.93
Dynamic linker (ldd)   2.2.93
Procps                 2.0.7
Net-tools              1.60
Kbd                    1.06
Sh-utils               2.0.12
Modules Loaded         ipr firmware_class sg


cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1266MHz
stepping        : 1
cpu MHz         : 1259.071
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2482.17

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1266MHz
stepping        : 1
cpu MHz         : 1259.071
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2506.75



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
@ 2004-03-18 21:48 Brian King
From: Brian King @ 2004-03-18 21:48 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, akpm, dougg

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

James,

Attached is a patch to fix an oops in sg_cmd_done. Please apply.

Thanks

-Brian


-------- Original Message --------
Subject: Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
Date: Wed, 10 Mar 2004 10:24:39 -0600
From: Brian King <brking@us.ibm.com>
To: dougg@torque.net
CC: James.Bottomley@steeleye.com,  tonyb@cybernetics.com
References: <40478DD3.10807@us.ibm.com> <404B1C79.4060600@torque.net> <404CD74A.1090301@us.ibm.com> 
<404F3133.7060200@torque.net>

Douglas Gilbert wrote:

 > Brian,
 > Thanks for this test code. I don't follow the "run disk exercisers" bit.
 > BTW iprinit seg faulted when the sg module wasn't loaded
 >
 > Your patch widens the srp->done window and re-orders kill_fasync()
 > and wake_up_interruptible(). Are they both needed? If not which one
 > is critical?

I need both changes for my testcase to run clean: sg_cmd_done must not
touch sfp once srp->done is set.

 > Anyway I'm happy to go ahead with the patch (posted by you a little
 > while later on the lsml). Having just moved accommodation my
 > equipment still needs more setting up. I have a sym53c8xx HBA.

Thanks. Once again, here is the patch. James, please apply.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: sg_cmd_done_oops.patch --]
[-- Type: text/plain, Size: 940 bytes --]


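This patch fixes a race condition in sg_cmd_done that results in an oops.
srp->done is now set only after kill_fasync() has run, immediately before
wake_up_interruptible(), so kill_fasync() no longer dereferences sfp after
the request has been marked done.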


---


diff -puN drivers/scsi/sg.c~sg_cmd_done_oops drivers/scsi/sg.c
--- linux-2.6.4-rc2/drivers/scsi/sg.c~sg_cmd_done_oops	2004-03-06 22:08:45.000000000 -0600
+++ linux-2.6.4-rc2-brking/drivers/scsi/sg.c	2004-03-06 22:55:12.000000000 -0600
@@ -1256,7 +1256,6 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	SRpnt->sr_request->rq_disk = NULL; /* "sg" _disowns_ request blk */
 
 	srp->my_cmdp = NULL;
-	srp->done = 1;
 
 	SCSI_LOG_TIMEOUT(4, printk("sg_cmd_done: %s, pack_id=%d, res=0x%x\n",
 		sdp->disk->disk_name, srp->header.pack_id, (int) SRpnt->sr_result));
@@ -1312,8 +1311,9 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	}
 	if (sfp && srp) {
 		/* Now wake up any sg_read() that is waiting for this packet. */
-		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		srp->done = 1;
+		wake_up_interruptible(&sfp->read_wait);
 	}
 }
 

_
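
The ordering rule behind the patch can be modelled in user space. The
sketch below is an illustration under stated assumptions, not kernel code
and not the sg driver's implementation. `struct owner` stands in for sfp,
`struct request` for srp, and a spinning reader stands in for sg_read().
All names are hypothetical. The completion side finishes its access to
the owner before it publishes `done`, so the reader is free to release
the owner as soon as it sees the flag. The model leaves out the wakeup
step. In the patch itself, wake_up_interruptible() still follows the
store to srp->done.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

struct owner {                 /* stands in for sfp */
    int notifications;
};

struct request {               /* stands in for srp */
    struct owner *owner;
    atomic_int done;
};

/* Completion side: touch the owner first, publish `done` last. */
void *complete_request(void *arg)
{
    struct request *req = arg;

    req->owner->notifications++;   /* models the kill_fasync() call */
    atomic_store_explicit(&req->done, 1, memory_order_release);
    /* req->owner must not be dereferenced past this point. */
    return NULL;
}

/* Reader side: wait for `done`, then release the owner.
 * Returns the notification count seen before the owner was freed. */
int wait_and_release(struct request *req)
{
    int seen;

    while (!atomic_load_explicit(&req->done, memory_order_acquire))
        sched_yield();
    seen = req->owner->notifications;
    free(req->owner);
    req->owner = NULL;
    return seen;
}

/* Run one completion against one reader; returns -1 on setup failure. */
int run_model(void)
{
    struct request req;
    pthread_t completer;
    int seen;

    req.owner = calloc(1, sizeof(*req.owner));
    if (req.owner == NULL)
        return -1;
    atomic_init(&req.done, 0);
    if (pthread_create(&completer, NULL, complete_request, &req) != 0) {
        free(req.owner);
        return -1;
    }
    seen = wait_and_release(&req);
    pthread_join(completer, NULL);
    return seen;
}
```

Build with -pthread if the toolchain requires it. If the store to `done`
were moved ahead of the notifications++ line, the reader could free the
owner while the completion side was still using it, which is the kind of
use-after-free the oops suggests.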

