* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
@ 2004-03-18 21:48 Brian King
From: Brian King @ 2004-03-18 21:48 UTC
To: James.Bottomley; +Cc: linux-scsi, akpm, dougg
[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]
James,
Attached is a patch to fix an oops in sg_cmd_done. Please apply.
Thanks
-Brian
-------- Original Message --------
Subject: Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
Date: Wed, 10 Mar 2004 10:24:39 -0600
From: Brian King <brking@us.ibm.com>
To: dougg@torque.net
CC: James.Bottomley@steeleye.com, tonyb@cybernetics.com
References: <40478DD3.10807@us.ibm.com> <404B1C79.4060600@torque.net> <404CD74A.1090301@us.ibm.com>
<404F3133.7060200@torque.net>
Douglas Gilbert wrote:
> Brian,
> Thanks for this test code. I don't follow the "run disk exercisers" bit.
> BTW iprinit segfaulted when the sg module wasn't loaded.
>
> Your patch widens the srp->done window and re-orders kill_fasync()
> and wake_up_interruptible(). Are they both needed? If not which one
> is critical?
I need both for my test case to run clean. sg_cmd_done() must not touch
sfp once srp->done is set.
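The ordering rule can be illustrated with a small single-threaded model. This is only a sketch, not sg driver code: struct model_fd, struct model_req, reader_may_run() and the step names are hypothetical stand-ins, and the model assumes the worst case in which the reader tears down the per-file state as soon as it sees done set. It tracks only the kill_fasync() access, since that is where the oops was reported; the wake-up is deliberately left out of scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real Sg_fd/Sg_request. */
struct model_fd {
	bool released;       /* reader has torn this state down */
	int fasync_signals;  /* simulated kill_fasync() calls that were safe */
};

struct model_req {
	struct model_fd *fd;
	bool done;           /* models srp->done */
	int late_touches;    /* fd accesses made after the fd was released */
};

enum step { SET_DONE, SIGNAL_FASYNC, WAKE_READER };

/* Assumed worst case: the reader releases fd the moment it observes done. */
static void reader_may_run(struct model_req *req)
{
	if (req->done)
		req->fd->released = true;
}

/*
 * Run the completion steps in the given order, letting the simulated
 * reader run after every step. Returns the number of fd accesses that
 * happened after the reader had already released the fd.
 */
static int run_completion(const enum step *order, size_t n)
{
	struct model_fd fd = { false, 0 };
	struct model_req req = { &fd, false, 0 };
	size_t i;

	assert(order != NULL);
	for (i = 0; i < n; i++) {
		switch (order[i]) {
		case SET_DONE:
			req.done = true;
			break;
		case SIGNAL_FASYNC:
			if (fd.released)
				req.late_touches++;   /* would be a use-after-release */
			else
				fd.fasync_signals++;
			break;
		case WAKE_READER:
			/* The wake-up is not tracked by this model. */
			break;
		}
		reader_may_run(&req);
	}
	return req.late_touches;
}

/* Order before the patch: done, wake-up, kill_fasync(). */
int late_touches_old_order(void)
{
	static const enum step order[] = { SET_DONE, WAKE_READER, SIGNAL_FASYNC };
	return run_completion(order, sizeof(order) / sizeof(order[0]));
}

/* Order after the patch: kill_fasync(), done, wake-up. */
int late_touches_new_order(void)
{
	static const enum step order[] = { SIGNAL_FASYNC, SET_DONE, WAKE_READER };
	return run_completion(order, sizeof(order) / sizeof(order[0]));
}
```

In this model late_touches_old_order() returns 1 and late_touches_new_order() returns 0, which matches the idea of finishing the kill_fasync() call before done becomes visible.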
> Anyway I'm happy to go ahead with the patch (posted by you a little
> while later on the lsml). Having just moved accommodation my
> equipment still needs more setting up. I have a sym53c8xx HBA.
Thanks. Once again, here is the patch. James, please apply.
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
[-- Attachment #2: sg_cmd_done_oops.patch --]
[-- Type: text/plain, Size: 940 bytes --]
The patch fixes a race condition in sg_cmd_done that results in an oops.
---
diff -puN drivers/scsi/sg.c~sg_cmd_done_oops drivers/scsi/sg.c
--- linux-2.6.4-rc2/drivers/scsi/sg.c~sg_cmd_done_oops 2004-03-06 22:08:45.000000000 -0600
+++ linux-2.6.4-rc2-brking/drivers/scsi/sg.c 2004-03-06 22:55:12.000000000 -0600
@@ -1256,7 +1256,6 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	SRpnt->sr_request->rq_disk = NULL; /* "sg" _disowns_ request blk */
 	srp->my_cmdp = NULL;
-	srp->done = 1;
 	SCSI_LOG_TIMEOUT(4, printk("sg_cmd_done: %s, pack_id=%d, res=0x%x\n",
 		sdp->disk->disk_name, srp->header.pack_id, (int) SRpnt->sr_result));
@@ -1312,8 +1311,9 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
 	}
 	if (sfp && srp) {
 		/* Now wake up any sg_read() that is waiting for this packet. */
-		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		srp->done = 1;
+		wake_up_interruptible(&sfp->read_wait);
 	}
 }
_
* PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
@ 2004-03-04 20:13 Brian King
From: Brian King @ 2004-03-04 20:13 UTC
To: linux-scsi
I have been seeing occasional oopses during testing and have recently
found a way to reproduce them quickly. If I issue many overlapped SG_IO
ioctls while also doing heavy disk I/O, I can reproduce the oops within
a few minutes, although I have also seen it under very little load. I
have seen the problem with both the ipr and sym2 drivers.
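For readers unfamiliar with the interface, here is a rough sketch of how a single SG_IO request might be prepared. The choice of TEST UNIT READY, the helper name prepare_test_unit_ready(), and the 5 second timeout are assumptions made for illustration; the commands actually used in the test are not stated in this report. It relies on the Linux <scsi/sg.h> header.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

/*
 * Fill in an sg_io_hdr_t for a 6-byte TEST UNIT READY command.
 * Hypothetical helper for illustration; the caller owns all buffers.
 */
void prepare_test_unit_ready(sg_io_hdr_t *hdr, unsigned char cdb[6],
			     unsigned char *sense, unsigned char sense_len)
{
	assert(hdr != NULL && cdb != NULL);

	memset(cdb, 0, 6);                 /* opcode 0x00 = TEST UNIT READY */
	memset(hdr, 0, sizeof(*hdr));

	hdr->interface_id = 'S';           /* required by the sg v3 interface */
	hdr->dxfer_direction = SG_DXFER_NONE; /* no data phase for this command */
	hdr->cmd_len = 6;
	hdr->cmdp = cdb;
	hdr->mx_sb_len = sense_len;        /* room for returned sense data */
	hdr->sbp = sense;
	hdr->timeout = 5000;               /* milliseconds; assumed value */
}
```

A caller would then open an sg device node and pass the header to ioctl(fd, SG_IO, &hdr). Issuing several such requests concurrently while other disk I/O is running approximates the kind of load described above.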
ksymoops output:
Unable to handle kernel paging request at virtual address c6cf3044
c017e81c
*pde = 0001c067
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c017e81c>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c6cf3044 ebx: c9545df8 ecx: 00000002 edx: c6cf3044
esi: c6cf3048 edi: c6cf3000 ebp: cc7a7c84 esp: cc7a7c74
ds: 007b es: 007b ss: 0068
Stack: cc7a7c84 c9545df8 c6cf3048 c9545df8 cc7a7ca8 d086c547 c6cf3044 0000001d
00020001 d0889000 cad8ebf8 cec4f000 c9170eb8 cc7a7cc8 c032e743 c9170eb8
cafd13a8 000000ff c9170eb8 cc7a7cd8 0000000a cc7a7ce8 c032e66b c9170eb8
Call Trace:
[<d086c547>] sg_cmd_done+0x167/0x280 [sg]
[<c032e743>] scsi_finish_command+0x73/0xb0
[<c032e66b>] scsi_softirq+0xab/0xd0
[<c012a935>] do_softirq+0xe5/0xf0
[<c010c6b8>] do_IRQ+0x198/0x240
[<c010a50c>] common_interrupt+0x18/0x20
[<c0253a60>] __copy_from_user_ll+0x50/0x80
[<c0171ed7>] blkdev_prepare_write+0x27/0x30
[<c0146aa3>] generic_file_aio_write_nolock+0x463/0xc20
[<c01472d8>] generic_file_write_nolock+0x78/0x90
[<c011fd52>] default_wake_function+0x22/0x30
[<d086c547>] sg_cmd_done+0x167/0x280 [sg]
[<c032e743>] scsi_finish_command+0x73/0xb0
[<c01733b3>] blkdev_file_write+0x33/0x40
[<c0168b6f>] vfs_write+0xaf/0x120
[<c0168c7f>] sys_write+0x3f/0x60
[<c0109b9f>] syscall_call+0x7/0xb
Code: 8b 0a 85 c9 74 4b bb 00 e0 ff ff 21 e3 ff 43 14 b8 84 d2 4f
>>EIP; c017e81c <kill_fasync+c/75> <=====
>>eax; c6cf3044 <_end+66b5b80/3f9c0b3c>
>>ebx; c9545df8 <_end+8f08934/3f9c0b3c>
>>edx; c6cf3044 <_end+66b5b80/3f9c0b3c>
>>esi; c6cf3048 <_end+66b5b84/3f9c0b3c>
>>edi; c6cf3000 <_end+66b5b3c/3f9c0b3c>
>>ebp; cc7a7c84 <_end+c16a7c0/3f9c0b3c>
>>esp; cc7a7c74 <_end+c16a7b0/3f9c0b3c>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c032e66b <scsi_softirq+ab/d0>
Trace; c012a935 <do_softirq+e5/f0>
Trace; c010c6b8 <do_IRQ+198/240>
Trace; c010a50c <common_interrupt+18/20>
Trace; c0253a60 <__copy_from_user_ll+50/80>
Trace; c0171ed7 <blkdev_prepare_write+27/30>
Trace; c0146aa3 <generic_file_aio_write_nolock+463/c20>
Trace; c01472d8 <generic_file_write_nolock+78/90>
Trace; c011fd52 <default_wake_function+22/30>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c01733b3 <blkdev_file_write+33/40>
Trace; c0168b6f <vfs_write+af/120>
Trace; c0168c7f <sys_write+3f/60>
Trace; c0109b9f <syscall_call+7/b>
Code; c017e81c <kill_fasync+c/75>
00000000 <_EIP>:
Code; c017e81c <kill_fasync+c/75> <=====
0: 8b 0a mov (%edx),%ecx <=====
Code; c017e81e <kill_fasync+e/75>
2: 85 c9 test %ecx,%ecx
Code; c017e820 <kill_fasync+10/75>
4: 74 4b je 51 <_EIP+0x51>
Code; c017e822 <kill_fasync+12/75>
6: bb 00 e0 ff ff mov $0xffffe000,%ebx
Code; c017e827 <kill_fasync+17/75>
b: 21 e3 and %esp,%ebx
Code; c017e829 <kill_fasync+19/75>
d: ff 43 14 incl 0x14(%ebx)
Code; c017e82c <kill_fasync+1c/75>
10: b8 84 d2 4f 00 mov $0x4fd284,%eax
Gnu C 3.2
Gnu make 3.79.1
util-linux 2.11r
mount 2.11r
module-init-tools 0.9.12
e2fsprogs 1.27
jfsutils 1.0.17
reiserfsprogs 3.6.2
pcmcia-cs 3.1.31
quota-tools 3.06.
PPP 2.4.1
isdn4k-utils 3.1pre4
nfs-utils 1.0.1
Linux C Library 2.2.93
Dynamic linker (ldd) 2.2.93
Procps 2.0.7
Net-tools 1.60
Kbd 1.06
Sh-utils 2.0.12
Modules Loaded ipr firmware_class sg
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1266MHz
stepping : 1
cpu MHz : 1259.071
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 2482.17
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1266MHz
stepping : 1
cpu MHz : 1259.071
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 2506.75
--
Brian King
eServer Storage I/O
IBM Linux Technology Center