* PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
From: Brian King @ 2004-03-04 20:13 UTC
  To: linux-scsi

I have been seeing occasional oopses during testing and have recently
found a way to reproduce them quickly. If I issue many overlapped SG_IO
ioctls while also running heavy disk I/O, the oops recurs within a few
minutes, although I have also seen it under very little load. It occurs
with both the ipr and sym2 drivers.
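
For readers unfamiliar with the interface, here is a minimal sketch of a single SG_IO request. It is not the test code used for this report (that code is not part of this message). The helper names fill_tur() and send_tur(), the 32-byte sense buffer, and the 5 second timeout are assumptions chosen for illustration. The command is TEST UNIT READY, a 6-byte all-zero CDB with no data transfer.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

enum { TUR_CDB_LEN = 6, SENSE_LEN = 32 };  /* illustrative sizes */

/* Hypothetical helper: prepare an sg_io_hdr_t for TEST UNIT READY,
 * a 6-byte all-zero CDB that transfers no data. */
void fill_tur(sg_io_hdr_t *hdr, unsigned char *cdb, unsigned char *sense)
{
    assert(hdr != NULL && cdb != NULL && sense != NULL);
    memset(cdb, 0, TUR_CDB_LEN);
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';              /* required by the sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_NONE; /* no data phase */
    hdr->cmd_len = TUR_CDB_LEN;
    hdr->cmdp = cdb;
    hdr->sbp = sense;
    hdr->mx_sb_len = SENSE_LEN;
    hdr->timeout = 5000;                  /* milliseconds; assumed value */
}

/* Hypothetical helper: issue one blocking SG_IO on an already opened
 * sg file descriptor. Returns the result of ioctl(). */
int send_tur(int fd)
{
    unsigned char cdb[TUR_CDB_LEN];
    unsigned char sense[SENSE_LEN];
    sg_io_hdr_t hdr;

    fill_tur(&hdr, cdb, sense);
    return ioctl(fd, SG_IO, &hdr);
}
```

"Overlapped" would then presumably mean several processes or threads calling something like send_tur() in a loop against the same device while other disk I/O is running.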


ksymoops output:

Unable to handle kernel paging request at virtual address c6cf3044
c017e81c
*pde = 0001c067
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c017e81c>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c6cf3044   ebx: c9545df8   ecx: 00000002   edx: c6cf3044
esi: c6cf3048   edi: c6cf3000   ebp: cc7a7c84   esp: cc7a7c74
ds: 007b   es: 007b   ss: 0068
Stack: cc7a7c84 c9545df8 c6cf3048 c9545df8 cc7a7ca8 d086c547 c6cf3044 0000001d
        00020001 d0889000 cad8ebf8 cec4f000 c9170eb8 cc7a7cc8 c032e743 c9170eb8
        cafd13a8 000000ff c9170eb8 cc7a7cd8 0000000a cc7a7ce8 c032e66b c9170eb8
Call Trace:
  [<d086c547>] sg_cmd_done+0x167/0x280 [sg]
  [<c032e743>] scsi_finish_command+0x73/0xb0
  [<c032e66b>] scsi_softirq+0xab/0xd0
  [<c012a935>] do_softirq+0xe5/0xf0
  [<c010c6b8>] do_IRQ+0x198/0x240
  [<c010a50c>] common_interrupt+0x18/0x20
  [<c0253a60>] __copy_from_user_ll+0x50/0x80
  [<c0171ed7>] blkdev_prepare_write+0x27/0x30
  [<c0146aa3>] generic_file_aio_write_nolock+0x463/0xc20
  [<c01472d8>] generic_file_write_nolock+0x78/0x90
  [<c011fd52>] default_wake_function+0x22/0x30
  [<d086c547>] sg_cmd_done+0x167/0x280 [sg]
  [<c032e743>] scsi_finish_command+0x73/0xb0
  [<c01733b3>] blkdev_file_write+0x33/0x40
  [<c0168b6f>] vfs_write+0xaf/0x120
  [<c0168c7f>] sys_write+0x3f/0x60
  [<c0109b9f>] syscall_call+0x7/0xb
Code: 8b 0a 85 c9 74 4b bb 00 e0 ff ff 21 e3 ff 43 14 b8 84 d2 4f


 >>EIP; c017e81c <kill_fasync+c/75>   <=====

 >>eax; c6cf3044 <_end+66b5b80/3f9c0b3c>
 >>ebx; c9545df8 <_end+8f08934/3f9c0b3c>
 >>edx; c6cf3044 <_end+66b5b80/3f9c0b3c>
 >>esi; c6cf3048 <_end+66b5b84/3f9c0b3c>
 >>edi; c6cf3000 <_end+66b5b3c/3f9c0b3c>
 >>ebp; cc7a7c84 <_end+c16a7c0/3f9c0b3c>
 >>esp; cc7a7c74 <_end+c16a7b0/3f9c0b3c>

Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c032e66b <scsi_softirq+ab/d0>
Trace; c012a935 <do_softirq+e5/f0>
Trace; c010c6b8 <do_IRQ+198/240>
Trace; c010a50c <common_interrupt+18/20>
Trace; c0253a60 <__copy_from_user_ll+50/80>
Trace; c0171ed7 <blkdev_prepare_write+27/30>
Trace; c0146aa3 <generic_file_aio_write_nolock+463/c20>
Trace; c01472d8 <generic_file_write_nolock+78/90>
Trace; c011fd52 <default_wake_function+22/30>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c01733b3 <blkdev_file_write+33/40>
Trace; c0168b6f <vfs_write+af/120>
Trace; c0168c7f <sys_write+3f/60>
Trace; c0109b9f <syscall_call+7/b>

Code;  c017e81c <kill_fasync+c/75>
00000000 <_EIP>:
Code;  c017e81c <kill_fasync+c/75>   <=====
    0:   8b 0a                     mov    (%edx),%ecx   <=====
Code;  c017e81e <kill_fasync+e/75>
    2:   85 c9                     test   %ecx,%ecx
Code;  c017e820 <kill_fasync+10/75>
    4:   74 4b                     je     51 <_EIP+0x51>
Code;  c017e822 <kill_fasync+12/75>
    6:   bb 00 e0 ff ff            mov    $0xffffe000,%ebx
Code;  c017e827 <kill_fasync+17/75>
    b:   21 e3                     and    %esp,%ebx
Code;  c017e829 <kill_fasync+19/75>
    d:   ff 43 14                  incl   0x14(%ebx)
Code;  c017e82c <kill_fasync+1c/75>
   10:   b8 84 d2 4f 00            mov    $0x4fd284,%eax
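
The register dump can be cross-checked with a little arithmetic. The sketch below only restates numbers from the report: the faulting instruction is mov (%edx),%ecx with edx equal to the reported fault address c6cf3044, and edi holds c6cf3000. Treating edi as the base of the object that contains the faulting field is an assumption, not something the trace proves. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the oops report. */
#define OOPS_EDX 0xc6cf3044u  /* faulting address, operand of mov (%edx),%ecx */
#define OOPS_EDI 0xc6cf3000u  /* assumed base of the enclosing object */
#define OOPS_ESP 0xcc7a7c74u

/* Byte offset of the faulting address from an assumed base pointer. */
uint32_t fault_offset(uint32_t addr, uint32_t base)
{
    assert(addr >= base);
    return addr - base;
}

/* Apply the mask seen in the disassembly: and %esp with 0xffffe000. */
uint32_t stack_area_base(uint32_t esp)
{
    return esp & 0xffffe000u;
}
```

With these inputs the offset is 0x44 and the masked stack pointer is 0xcc7a6000. The words that follow the return address d086c547 on the stack (c6cf3044, 0000001d, 00020001) appear to match the three arguments of the kill_fasync() call in sg_cmd_done, with 0x1d being signal 29 (SIGPOLL on i386).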




Gnu C                  3.2
Gnu make               3.79.1
util-linux             2.11r
mount                  2.11r
module-init-tools      0.9.12
e2fsprogs              1.27
jfsutils               1.0.17
reiserfsprogs          3.6.2
pcmcia-cs              3.1.31
quota-tools            3.06.
PPP                    2.4.1
isdn4k-utils           3.1pre4
nfs-utils              1.0.1
Linux C Library        2.2.93
Dynamic linker (ldd)   2.2.93
Procps                 2.0.7
Net-tools              1.60
Kbd                    1.06
Sh-utils               2.0.12
Modules Loaded         ipr firmware_class sg


cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1266MHz
stepping        : 1
cpu MHz         : 1259.071
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2482.17

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1266MHz
stepping        : 1
cpu MHz         : 1259.071
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2506.75



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
From: Brian King @ 2004-03-18 21:48 UTC
  To: James.Bottomley; +Cc: linux-scsi, akpm, dougg


James,

Attached is a patch to fix an oops in sg_cmd_done. Please apply.

Thanks

-Brian


-------- Original Message --------
Subject: Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
Date: Wed, 10 Mar 2004 10:24:39 -0600
From: Brian King <brking@us.ibm.com>
To: dougg@torque.net
CC: James.Bottomley@steeleye.com,  tonyb@cybernetics.com
References: <40478DD3.10807@us.ibm.com> <404B1C79.4060600@torque.net> <404CD74A.1090301@us.ibm.com> <404F3133.7060200@torque.net>

Douglas Gilbert wrote:

 > Brian,
 > Thanks for this test code. I don't follow the "run disk exercisers" bit.
 > BTW, iprinit segfaulted when the sg module wasn't loaded.
 >
 > Your patch widens the srp->done window and re-orders kill_fasync()
 > and wake_up_interruptible(). Are they both needed? If not which one
 > is critical?

I need both changes for my test case to run clean. sg_cmd_done must not
touch sfp once srp->done is set.

 > Anyway I'm happy to go ahead with the patch (posted by you a little
 > while later on the lsml). Having just moved accommodation my
 > equipment still needs more setting up. I have a sym53c8xx HBA.

Thanks. Once again, here is the patch. James, please apply.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: sg_cmd_done_oops.patch --]
[-- Type: text/plain, Size: 940 bytes --]


The patch fixes a race condition in sg_cmd_done that results in an oops.


---


diff -puN drivers/scsi/sg.c~sg_cmd_done_oops drivers/scsi/sg.c
--- linux-2.6.4-rc2/drivers/scsi/sg.c~sg_cmd_done_oops	2004-03-06 22:08:45.000000000 -0600
+++ linux-2.6.4-rc2-brking/drivers/scsi/sg.c	2004-03-06 22:55:12.000000000 -0600
@@ -1256,7 +1256,6 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	SRpnt->sr_request->rq_disk = NULL; /* "sg" _disowns_ request blk */
 
 	srp->my_cmdp = NULL;
-	srp->done = 1;
 
 	SCSI_LOG_TIMEOUT(4, printk("sg_cmd_done: %s, pack_id=%d, res=0x%x\n",
 		sdp->disk->disk_name, srp->header.pack_id, (int) SRpnt->sr_result));
@@ -1312,8 +1311,9 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	}
 	if (sfp && srp) {
 		/* Now wake up any sg_read() that is waiting for this packet. */
-		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		srp->done = 1;
+		wake_up_interruptible(&sfp->read_wait);
 	}
 }
 

_
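
The reordering in the diff can be summarized with a small user-space model. This is only a sketch of the call order visible in the two hunks, not kernel code. The enum, the arrays, and fasync_after_done() are hypothetical names. The model assumes that once srp->done is visible another context may release sfp, which is how the statement "sg_cmd_done cannot touch sfp once srp->done is set" is read here.

```c
#include <stddef.h>

/* The three operations whose order the patch changes. */
enum step { STEP_SET_DONE, STEP_WAKE_READER, STEP_KILL_FASYNC };

/* Order before the patch, as read from the removed lines. */
const enum step before_patch[] = {
    STEP_SET_DONE, STEP_WAKE_READER, STEP_KILL_FASYNC
};

/* Order after the patch, as read from the added lines. */
const enum step after_patch[] = {
    STEP_KILL_FASYNC, STEP_SET_DONE, STEP_WAKE_READER
};

/* Return 1 if kill_fasync() would run after done has been published,
 * 0 otherwise. */
int fasync_after_done(const enum step *seq, size_t n)
{
    int done_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (seq[i] == STEP_SET_DONE)
            done_seen = 1;
        else if (seq[i] == STEP_KILL_FASYNC && done_seen)
            return 1;
    }
    return 0;
}
```

In this model the old order calls kill_fasync() after done is set, which matches the EIP in kill_fasync reported in the oops. The new order calls it first.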


Thread overview: 6+ messages
2004-03-04 20:13 PROBLEM: Oops in 2.6.3 with lots of SG_IO activity Brian King
2004-03-08 22:01 ` PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH] Brian King
2004-03-09 15:29   ` Brian King
2004-03-09 16:02     ` Tony Battersby
2004-03-09 17:30       ` Brian King
  -- strict thread matches above, loose matches on Subject: below --
2004-03-18 21:48 PROBLEM: Oops in 2.6.3 with lots of SG_IO activity Brian King
