* PROBLEM: Oops in 2.6.3 with lots of SG_IO activity
From: Brian King @ 2004-03-04 20:13 UTC
To: linux-scsi
I have been experiencing occasional oopses in some testing I have been
doing, and have recently found a way to aggravate the problem so that the
oops recreates quite quickly. If I issue lots of overlapped SG_IO ioctls
while also doing heavy disk I/O, I can recreate the oops within a few
minutes, although I have also seen it under very little load. I have
seen the problem with both the ipr and sym2 drivers.
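A minimal sketch of the kind of request involved, assuming the sg v3 interface from <scsi/sg.h>: it fills a struct sg_io_hdr for a TEST UNIT READY command (no data transfer) and issues it with the SG_IO ioctl. The helper names fill_tur_hdr() and issue_tur() are hypothetical, the 20 second timeout is an arbitrary choice, and this is not the testcase used in the report, which is not shown in the thread.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>   /* struct sg_io_hdr, SG_IO, SG_DXFER_NONE */

/* Hypothetical helper: prepare a TEST UNIT READY request.
 * cdb must point to at least 6 bytes. */
void fill_tur_hdr(struct sg_io_hdr *hdr, unsigned char *cdb,
                  unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, 6);                 /* TEST UNIT READY: all-zero 6-byte CDB */
    memset(hdr, 0, sizeof(*hdr));      /* no data buffer, no flags */
    hdr->interface_id = 'S';           /* required for the sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_NONE;
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->mx_sb_len = sense_len;        /* room for sense data on error */
    hdr->sbp = sense;
    hdr->timeout = 20000;              /* milliseconds; arbitrary for this sketch */
}

/* Hypothetical helper: issue one synchronous SG_IO on an open /dev/sgN fd.
 * Returns -1 if the ioctl itself fails (errno says why), otherwise the
 * SCSI status byte (0 means GOOD). */
int issue_tur(int fd)
{
    unsigned char cdb[6];
    unsigned char sense[32];
    struct sg_io_hdr hdr;

    fill_tur_hdr(&hdr, cdb, sense, (unsigned char)sizeof(sense));
    if (ioctl(fd, SG_IO, &hdr) < 0)
        return -1;
    return hdr.status;
}
```

Calling issue_tur() in a loop from several threads on the same descriptor (typically opened O_RDWR), while other processes write heavily to a disk, roughly approximates the "overlapped SG_IO plus heavy disk I/O" load described above.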
ksymoops output:
Unable to handle kernel paging request at virtual address c6cf3044
c017e81c
*pde = 0001c067
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c017e81c>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c6cf3044 ebx: c9545df8 ecx: 00000002 edx: c6cf3044
esi: c6cf3048 edi: c6cf3000 ebp: cc7a7c84 esp: cc7a7c74
ds: 007b es: 007b ss: 0068
Stack: cc7a7c84 c9545df8 c6cf3048 c9545df8 cc7a7ca8 d086c547 c6cf3044 0000001d
00020001 d0889000 cad8ebf8 cec4f000 c9170eb8 cc7a7cc8 c032e743 c9170eb8
cafd13a8 000000ff c9170eb8 cc7a7cd8 0000000a cc7a7ce8 c032e66b c9170eb8
Call Trace:
[<d086c547>] sg_cmd_done+0x167/0x280 [sg]
[<c032e743>] scsi_finish_command+0x73/0xb0
[<c032e66b>] scsi_softirq+0xab/0xd0
[<c012a935>] do_softirq+0xe5/0xf0
[<c010c6b8>] do_IRQ+0x198/0x240
[<c010a50c>] common_interrupt+0x18/0x20
[<c0253a60>] __copy_from_user_ll+0x50/0x80
[<c0171ed7>] blkdev_prepare_write+0x27/0x30
[<c0146aa3>] generic_file_aio_write_nolock+0x463/0xc20
[<c01472d8>] generic_file_write_nolock+0x78/0x90
[<c011fd52>] default_wake_function+0x22/0x30
[<d086c547>] sg_cmd_done+0x167/0x280 [sg]
[<c032e743>] scsi_finish_command+0x73/0xb0
[<c01733b3>] blkdev_file_write+0x33/0x40
[<c0168b6f>] vfs_write+0xaf/0x120
[<c0168c7f>] sys_write+0x3f/0x60
[<c0109b9f>] syscall_call+0x7/0xb
Code: 8b 0a 85 c9 74 4b bb 00 e0 ff ff 21 e3 ff 43 14 b8 84 d2 4f
>>EIP; c017e81c <kill_fasync+c/75> <=====
>>eax; c6cf3044 <_end+66b5b80/3f9c0b3c>
>>ebx; c9545df8 <_end+8f08934/3f9c0b3c>
>>edx; c6cf3044 <_end+66b5b80/3f9c0b3c>
>>esi; c6cf3048 <_end+66b5b84/3f9c0b3c>
>>edi; c6cf3000 <_end+66b5b3c/3f9c0b3c>
>>ebp; cc7a7c84 <_end+c16a7c0/3f9c0b3c>
>>esp; cc7a7c74 <_end+c16a7b0/3f9c0b3c>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c032e66b <scsi_softirq+ab/d0>
Trace; c012a935 <do_softirq+e5/f0>
Trace; c010c6b8 <do_IRQ+198/240>
Trace; c010a50c <common_interrupt+18/20>
Trace; c0253a60 <__copy_from_user_ll+50/80>
Trace; c0171ed7 <blkdev_prepare_write+27/30>
Trace; c0146aa3 <generic_file_aio_write_nolock+463/c20>
Trace; c01472d8 <generic_file_write_nolock+78/90>
Trace; c011fd52 <default_wake_function+22/30>
Trace; d086c547 <_end+1022f083/3f9c0b3c>
Trace; c032e743 <scsi_finish_command+73/b0>
Trace; c01733b3 <blkdev_file_write+33/40>
Trace; c0168b6f <vfs_write+af/120>
Trace; c0168c7f <sys_write+3f/60>
Trace; c0109b9f <syscall_call+7/b>
Code; c017e81c <kill_fasync+c/75>
00000000 <_EIP>:
Code; c017e81c <kill_fasync+c/75> <=====
0: 8b 0a mov (%edx),%ecx <=====
Code; c017e81e <kill_fasync+e/75>
2: 85 c9 test %ecx,%ecx
Code; c017e820 <kill_fasync+10/75>
4: 74 4b je 51 <_EIP+0x51>
Code; c017e822 <kill_fasync+12/75>
6: bb 00 e0 ff ff mov $0xffffe000,%ebx
Code; c017e827 <kill_fasync+17/75>
b: 21 e3 and %esp,%ebx
Code; c017e829 <kill_fasync+19/75>
d: ff 43 14 incl 0x14(%ebx)
Code; c017e82c <kill_fasync+1c/75>
10: b8 84 d2 4f 00 mov $0x4fd284,%eax
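Reading the decoded code above: the faulting instruction is mov (%edx),%ecx at kill_fasync+c, and edx (like eax) holds c6cf3044, the same address the kernel reports it could not access. kill_fasync() starts with an unlocked "if (*fp)" test of the fasync list head, so this load is most likely that test, with fp being the &sfp->async_qp pointer passed in by sg_cmd_done(). edi (c6cf3000) is the start of the same 4 KiB page, and the fault address lies 0x44 bytes into it. The sketch below is a userspace model of that quick test, not kernel code; struct fasync_model, fasync_quick_test() and page_offset() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel fasync list node. */
struct fasync_model {
    struct fasync_model *fa_next;
};

/* Models the unlocked "if (*fp)" quick test at the top of kill_fasync().
 * In the decoded oops this corresponds to mov (%edx),%ecx ; test %ecx,%ecx,
 * so if fp points into freed, unmapped memory the load itself faults. */
int fasync_quick_test(struct fasync_model **fp)
{
    return *fp != NULL;
}

/* Offset of an address within its 4 KiB i386 page. */
uintptr_t page_offset(uintptr_t addr)
{
    return addr & (uintptr_t)0xfff;
}
```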
Gnu C                  3.2
Gnu make               3.79.1
util-linux             2.11r
mount                  2.11r
module-init-tools      0.9.12
e2fsprogs              1.27
jfsutils               1.0.17
reiserfsprogs          3.6.2
pcmcia-cs              3.1.31
quota-tools            3.06.
PPP                    2.4.1
isdn4k-utils           3.1pre4
nfs-utils              1.0.1
Linux C Library        2.2.93
Dynamic linker (ldd)   2.2.93
Procps                 2.0.7
Net-tools              1.60
Kbd                    1.06
Sh-utils               2.0.12
Modules Loaded         ipr firmware_class sg
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1266MHz
stepping : 1
cpu MHz : 1259.071
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 2482.17
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1266MHz
stepping : 1
cpu MHz : 1259.071
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 2506.75
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH]
From: Brian King @ 2004-03-08 22:01 UTC
To: dougg; +Cc: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 818 bytes --]
Attached is a patch which seems to fix the oops for me. Without the patch
I can consistently reproduce the oops in just a couple of minutes. With the
patch I have been running for close to an hour without problems so far.
Doug, does this look OK? I'm going to let my testcase run overnight as well
and will post the results tomorrow.
> I have been experiencing occasional oopses in some testing I have been
> doing and have recently been able to aggravate the problem to recreate
> the oops quite quickly. If I do lots of overlapped SG_IO ioctls while
> also doing heavy disk I/O, I can recreate the oops within a few minutes,
> although I have also seen the problem under very little load. I have
> seen the problem using both the ipr and sym2 drivers.
[-- Attachment #2: sg_cmd_done_oops.patch --]
[-- Type: text/plain, Size: 939 bytes --]
The patch fixes a race condition in sg_cmd_done that results in an oops.
---
diff -puN drivers/scsi/sg.c~sg_cmd_done_oops drivers/scsi/sg.c
--- linux-2.6.4-rc2/drivers/scsi/sg.c~sg_cmd_done_oops 2004-03-06 22:08:45.000000000 -0600
+++ linux-2.6.4-rc2-brking/drivers/scsi/sg.c 2004-03-06 22:55:12.000000000 -0600
@@ -1256,7 +1256,6 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
SRpnt->sr_request->rq_disk = NULL; /* "sg" _disowns_ request blk */

srp->my_cmdp = NULL;
- srp->done = 1;

SCSI_LOG_TIMEOUT(4, printk("sg_cmd_done: %s, pack_id=%d, res=0x%x\n",
sdp->disk->disk_name, srp->header.pack_id, (int) SRpnt->sr_result));
@@ -1312,8 +1311,9 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
}
if (sfp && srp) {
/* Now wake up any sg_read() that is waiting for this packet. */
- wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+ srp->done = 1;
+ wake_up_interruptible(&sfp->read_wait);
}
}
_
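A minimal model of the ordering the patch changes, assuming (as the oops suggests) that once a reader sees srp->done set, it can finish, close the descriptor and free sfp while sg_cmd_done() is still running on the other CPU. It is plain userspace C, not kernel code: the enum, both functions and the two arrays are hypothetical, the orderings are read off the two hunks above, and the code between the hunks is ignored. It counts the completion-side steps that still dereference sfp after the store to srp->done.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the completion-side steps of sg_cmd_done(). */
enum cmd_done_step {
    STEP_SET_DONE,     /* srp->done = 1: a reader may now finish and close */
    STEP_KILL_FASYNC,  /* kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN) */
    STEP_WAKE_UP       /* wake_up_interruptible(&sfp->read_wait) */
};

/* Count steps that dereference sfp after srp->done has been set, i.e. the
 * steps that could hit freed memory if the reader has already released sfp. */
int sfp_uses_after_done(const enum cmd_done_step *steps, size_t n)
{
    bool done_visible = false;
    int uses = 0;

    for (size_t i = 0; i < n; i++) {
        if (steps[i] == STEP_SET_DONE)
            done_visible = true;   /* publication point */
        else if (done_visible)
            uses++;                /* both other steps touch sfp */
    }
    return uses;
}

/* True if kill_fasync() is ordered before the store to srp->done. */
bool fasync_precedes_done(const enum cmd_done_step *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (steps[i] == STEP_KILL_FASYNC)
            return true;
        if (steps[i] == STEP_SET_DONE)
            return false;
    }
    return false;
}

/* Orderings read off the two hunks of the patch. */
const enum cmd_done_step before_patch[] = { STEP_SET_DONE, STEP_WAKE_UP, STEP_KILL_FASYNC };
const enum cmd_done_step after_patch[]  = { STEP_KILL_FASYNC, STEP_SET_DONE, STEP_WAKE_UP };
```

With the original ordering, both kill_fasync() and wake_up_interruptible() follow the store (2 uses). With the patch only the wake-up does (1 use), and kill_fasync(), where the oops occurred, runs first. The wake-up has to stay after the store, because a sleeping reader woken before done is set would recheck its condition and go back to sleep. Whether that remaining access is safe depends on wait-queue details this model does not capture.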
* Re: PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH]
From: Brian King @ 2004-03-09 15:29 UTC
To: Brian King; +Cc: dougg, linux-scsi
Testcase ran overnight without any problems.
-Brian
Brian King wrote:
> Attached is a patch which seems to fix the oops for me. Without the patch
> I can consistently reproduce the oops in just a couple minutes. With the
> patch I have been running for close to an hour without problems so far.
> Doug, does this look ok? I'm going to let my testcase run overnight as well
> and will post the results tomorrow.
Thread overview: 5+ messages
2004-03-04 20:13 PROBLEM: Oops in 2.6.3 with lots of SG_IO activity Brian King
2004-03-08 22:01 ` PROBLEM: Oops in 2.6.3 with lots of SG_IO activity - [PATCH] Brian King
2004-03-09 15:29 ` Brian King
2004-03-09 16:02 ` Tony Battersby
2004-03-09 17:30 ` Brian King