kernel crash when BSG request timesout

public inbox for linux-scsi@vger.kernel.org

* kernel crash when BSG request timesout
@ 2009-05-22 20:51 Giridhar Malavali
  2009-05-24 11:00 ` Boaz Harrosh
  0 siblings, 1 reply; 9+ messages in thread
From: Giridhar Malavali @ 2009-05-22 20:51 UTC (permalink / raw)
  To: linux-scsi

Hi,

	While testing the FC pass-through support, I consistently hit a
kernel crash when a BSG request times out.
I took the latest FC pass-through patches from James Smart from
http://marc.info/?l=linux-scsi&m=123436574018579&w=2 and, on top of them,
applied Boaz's patches from
http://markmail.org/search/?q=FC+pass-through+support+&x=0&y=0#query:FC%20passthrough%20support%20from%3A%22Boaz%20Harrosh%22+page:2+mid:ke4lj4cg5ftc6nsc+state:results

Are there any additional patches I am missing?

Thanks,
Giridhar.M.B

[ 1464.584437] ------------[ cut here ]------------
[ 1464.584437] kernel BUG at block/blk-softirq.c:110!
[ 1464.584437] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1464.584437] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 1464.584437] CPU 3
[ 1464.584437] Modules linked in: qla2xxx netconsole scsi_transport_fc [last unloaded: qla2xxx]
[ 1464.584437] Pid: 0, comm: swapper Not tainted 2.6.30-rc4 #3 X7DB8
[ 1464.584437] RIP: 0010:[<ffffffff80361112>]  [<ffffffff80361112>] __blk_complete_request+0xe8/0xec
[ 1464.584437] RSP: 0018:ffff880001063e10  EFLAGS: 00010046
[ 1464.584437] RAX: 0000000000000001 RBX: ffff88007ab93e80 RCX: ffffffff8070f680
[ 1464.584437] RDX: 0000000000008988 RSI: 0000000000000086 RDI: ffff88007ab93e80
[ 1464.584437] RBP: ffff880001063e30 R08: 00000000ffffffff R09: 0000000000000003
[ 1464.584437] R10: 000000000000000a R11: 0000000000000000 R12: ffff88007a8b26c8
[ 1464.584437] R13: ffff88007a8b2a70 R14: ffff88007a8b26c8 R15: 0000000000000286
[ 1464.584437] FS:  0000000000000000(0000) GS:ffff880001060000(0000) knlGS:0000000000000000
[ 1464.584437] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1464.584437] CR2: 00007f7943ffd4a8 CR3: 000000007fb1e000 CR4: 00000000000006e0
[ 1464.584437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1464.584437] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1464.584437] Process swapper (pid: 0, threadinfo ffff88007f9fe000, task ffff88007f9e1990)
[ 1464.584437] Stack:
[ 1464.584437]  ffff88007a8b2970 ffff88007ab93e80 0000000000000000 ffff88007a8b2a70
[ 1464.584437]  ffff880001063e50 ffffffff80361299 ffff88007a8b26c8 ffff88007a8b2930
[ 1464.584437]  ffff880001063e90 ffffffff803614e5 ffff88007f9b8000 ffff88007a8b26c8
[ 1464.584437] Call Trace:
[ 1464.584437]  <IRQ> <0> [<ffffffff80361299>] blk_rq_timed_out+0x48/0x67
[ 1464.584437]  [<ffffffff803614e5>] blk_rq_timed_out_timer+0xd6/0x121
[ 1464.584437]  [<ffffffff8036140f>] ? blk_rq_timed_out_timer+0x0/0x121
[ 1464.584437]  [<ffffffff80240857>] run_timer_softirq+0x147/0x215
[ 1464.584437]  [<ffffffff8023b67b>] ? raise_softirq+0x59/0x68
[ 1464.584437]  [<ffffffff8023bf67>] __do_softirq+0xba/0x1a3
[ 1464.584437]  [<ffffffff8020c36c>] call_softirq+0x1c/0x30
[ 1464.584437]  [<ffffffff8020de61>] do_softirq+0x61/0xa0
[ 1464.584437]  [<ffffffff8023b8b1>] irq_exit+0x51/0x59
[ 1464.584437]  [<ffffffff8021d888>] smp_apic_timer_interrupt+0x6d/0x96
[ 1464.584437]  [<ffffffff8020bd83>] apic_timer_interrupt+0x13/0x20
[ 1464.584437]  <EOI> <0> [<ffffffff80212ac8>] ? mwait_idle+0xfe/0x10f
[ 1464.584437]  [<ffffffff80212abf>] ? mwait_idle+0xf5/0x10f
[ 1464.584437]  [<ffffffff8020a4ce>] ? cpu_idle+0x63/0x97
[ 1464.584437]  [<ffffffff8050d3c4>] ? start_secondary+0x183/0x1df
[ 1464.584437] Code: b7 0f 36 80 48 89 5b 28 66 c7 43 30 00 00 48 8d 73 10 31 d2 e8 4c 8a ef ff eb b2 bf 04 00 00 00 e8 05 a3 ed ff 0f 1f 40 00 eb a2 <0f> 0b eb fe 55 48 89 e5 48 8d 47 50 f0 0f ba 28 00 19 d2 85 d2
[ 1464.584437] RIP  [<ffffffff80361112>] __blk_complete_request+0xe8/0xec
[ 1464.584437]  RSP <ffff880001063e10>
[ 1464.584437] ---[ end trace 7325773d478b6460 ]---
[ 1464.584437] Kernel panic - not syncing: Fatal exception in interrupt
[ 1464.584437] Pid: 0, comm: swapper Tainted: G      D    2.6.30-rc4 #3
[ 1464.584437] Call Trace:
[ 1464.584437]  <IRQ>  [<ffffffff8051098a>] panic+0x75/0x146
[ 1464.584437]  [<ffffffff8020f31b>] oops_end+0x8f/0x97
[ 1464.584437]  [<ffffffff8020f4ea>] die+0x46/0x60
[ 1464.584437]  [<ffffffff8020cb76>] do_trap+0x129/0x152
[ 1464.584437]  [<ffffffff8024f84d>] ? atomic_notifier_call_chain+0x15/0x17
[ 1464.584437]  [<ffffffff8020cf62>] do_invalid_op+0x90/0xa1
[ 1464.584437]  [<ffffffff80361112>] ? __blk_complete_request+0xe8/0xec
[ 1464.584437]  [<ffffffff80513acf>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 1464.584437]  [<ffffffff8020c005>] invalid_op+0x15/0x20
[ 1464.584437]  [<ffffffff80361112>] ? __blk_complete_request+0xe8/0xec
[ 1464.584437]  [<ffffffff80361299>] blk_rq_timed_out+0x48/0x67
[ 1464.584437]  [<ffffffff803614e5>] blk_rq_timed_out_timer+0xd6/0x121
[ 1464.584437]  [<ffffffff8036140f>] ? blk_rq_timed_out_timer+0x0/0x121
[ 1464.584437]  [<ffffffff80240857>] run_timer_softirq+0x147/0x215
[ 1464.584437]  [<ffffffff8023b67b>] ? raise_softirq+0x59/0x68
[ 1464.584437]  [<ffffffff8023bf67>] __do_softirq+0xba/0x1a3
[ 1464.584437]  [<ffffffff8020c36c>] call_softirq+0x1c/0x30
[ 1464.584437]  [<ffffffff8020de61>] do_softirq+0x61/0xa0
[ 1464.584437]  [<ffffffff8023b8b1>] irq_exit+0x51/0x59
[ 1464.584437]  [<ffffffff8021d888>] smp_apic_timer_interrupt+0x6d/0x96
[ 1464.584437]  [<ffffffff8020bd83>] apic_timer_interrupt+0x13/0x20
[ 1464.584437]  <EOI>  [<ffffffff80212ac8>] ? mwait_idle+0xfe/0x10f
[ 1464.584437]  [<ffffffff80212abf>] ? mwait_idle+0xf5/0x10f
[ 1464.584437]  [<ffffffff8020a4ce>] ? cpu_idle+0x63/0x97
[ 1464.584437]  [<ffffffff8050d3c4>] ? start_secondary+0x183/0x1df



* Re: kernel crash when BSG request timesout
  2009-05-22 20:51 kernel crash when BSG request timesout Giridhar Malavali
@ 2009-05-24 11:00 ` Boaz Harrosh
  2009-05-26 18:38   ` Giridhar Malavali
  0 siblings, 1 reply; 9+ messages in thread
From: Boaz Harrosh @ 2009-05-24 11:00 UTC (permalink / raw)
  To: Giridhar Malavali; +Cc: linux-scsi

On 05/22/2009 11:51 PM, Giridhar Malavali wrote:
> Hi,
> 
> 	While testing the FC pass thru support I am constantly hitting a  
> kernel crash when BSG request times out.
> I took the latest FC pass thru patches from James Smart from
> http://marc.info/?l=linux-scsi&m=123436574018579&w=2. and on top of it  
> applied Boaz patches from
> http://markmail.org/search/?q=FC+pass-through+support+&x=0&y=0#query:FC 
> %20passthrough%20support%20from%3A%22Boaz%20Harrosh%22+page:2+mid:ke4lj4cg5ftc6nsc+state:results
> 
> Is there any additional patches I am missing?
> 
> Thanks,
> Giridhar.M.B

I did not quite understand which tree you are using. There were
lots of related changes in these areas.

Please try James's post-merge tree for the FC pass-through support.
It has all you need:
 git clone git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-post-merge-2.6.git

Thanks
Boaz


* Re: kernel crash when BSG request timesout
  2009-05-24 11:00 ` Boaz Harrosh
@ 2009-05-26 18:38   ` Giridhar Malavali
  2009-05-28  6:01     ` FUJITA Tomonori
  0 siblings, 1 reply; 9+ messages in thread
From: Giridhar Malavali @ 2009-05-26 18:38 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: linux-scsi@vger.kernel.org

Thanks for the pointer. I will check with the post-merge tree.

	The crash I am seeing occurs because softirq_done_fn is not set in
the request queue for BSG requests. Even in the post-merge tree, I don't
see the FC transport setting this function when it allocates the
request queue. When a BSG request times out, the block layer executes
__blk_complete_request, which checks that the function exists.
I see it being set for SCSI requests during queue allocation in
scsi_lib.c. Is it also required for BSG requests?
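
The failure mode described above can be sketched in userspace C. This is only a model, not kernel code: every `model_*` name is hypothetical, and the check returns false at the point where, per the trace, the kernel hits the BUG at block/blk-softirq.c:110.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_request;

/* Hypothetical stand-in for the queue's softirq completion callback. */
typedef void (*model_softirq_done_fn)(struct model_request *rq);

struct model_queue {
    /* Stays NULL unless the driver registers a callback. */
    model_softirq_done_fn softirq_done_fn;
};

struct model_request {
    struct model_queue *q;
    bool cleaned_up;
};

/*
 * Models the existence check described above. Returns false at the
 * point where the kernel would BUG, instead of crashing.
 */
static bool model_complete_request(struct model_request *rq)
{
    assert(rq != NULL && rq->q != NULL);
    if (rq->q->softirq_done_fn == NULL)
        return false; /* the kernel would BUG here */
    rq->q->softirq_done_fn(rq);
    return true;
}

/* Example callback that a transport could register on its queue. */
static void model_done(struct model_request *rq)
{
    rq->cleaned_up = true;
}
```

A queue whose `softirq_done_fn` is left NULL makes `model_complete_request()` fail, which corresponds to the crash; a queue with `model_done` registered completes normally.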

Thanks,
Giridhar.M.B





* Re: kernel crash when BSG request timesout
  2009-05-26 18:38   ` Giridhar Malavali
@ 2009-05-28  6:01     ` FUJITA Tomonori
  2009-05-28  6:12       ` FUJITA Tomonori
  2009-05-28 13:54       ` Douglas Gilbert
  0 siblings, 2 replies; 9+ messages in thread
From: FUJITA Tomonori @ 2009-05-28  6:01 UTC (permalink / raw)
  To: giridhar.malavali; +Cc: bharrosh, linux-scsi, James.Smart

CC'ed James Smart,

On Tue, 26 May 2009 11:38:14 -0700
Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:

> Thanks for the pointer. I will check with the post-merge tree.
> 
> 	The crash I am seeing is because of softirq_done_fn not set in the  
> request queue for BSG request. Even in the post-merge tree I don't see  
> FC transport setting this function during the allocation of the  
> request queue.  When BSG request times out, I see that it executes  
> __blk_complete_request function where check is done for its existence.  
> I see this getting set for SCSI request during queue allocation in  
> scsi_lib.c. Is this required for BSG request?

Yeah, you need to set q->softirq_done_fn if you use the block timeout
infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
the timeout infrastructure so it doesn't set q->softirq_done_fn.

If q->softirq_done_fn returns BLK_EH_HANDLED, the block layer doesn't
expect that q->softirq_done_fn frees the request (currently,
fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn
for it.

Does the attached patch work? It just adds q->softirq_done_fn and moves
the fc_destroy_bsgjob call from fc_bsg_job_timeout into it.
fc_bsg_job_timeout returns BLK_EH_NOT_HANDLED when a job is already
done: the job will finish shortly, so we don't want the block layer to
do anything with it.

It might be better to use q->softirq_done_fn for all requests, not
only expired ones, as SCSI-ml does; that is, job->job_done would
call blk_complete_request().


diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 3f64d93..c58e33a 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3439,6 +3439,16 @@ fc_bsg_jobdone(struct fc_bsg_job *job)
 	fc_destroy_bsgjob(job);
 }
 
+static void fc_bsg_softirq_done(struct request *rq)
+{
+	struct fc_bsg_job *job = rq->special;
+	unsigned long flags;
+
+	spin_lock_irqsave(&job->job_lock, flags);
+	job->ref_cnt--;
+	spin_unlock_irqrestore(&job->job_lock, flags);
+	fc_destroy_bsgjob(job);
+}
 
 /**
  * fc_bsg_job_timeout - handler for when a bsg request timesout
@@ -3471,19 +3481,12 @@ fc_bsg_job_timeout(struct request *req)
 				"abort failed with status %d\n", err);
 	}
 
-	if (!done) {
-		spin_lock_irqsave(&job->job_lock, flags);
-		job->ref_cnt--;
-		spin_unlock_irqrestore(&job->job_lock, flags);
-		fc_destroy_bsgjob(job);
-	}
-
-	/* the blk_end_sync_io() doesn't check the error */
-	return BLK_EH_HANDLED;
+	if (done)
+		return BLK_EH_NOT_HANDLED;
+	else
+		return BLK_EH_HANDLED;
 }
 
-
-
 static int
 fc_bsg_map_buffer(struct fc_bsg_buffer *buf, struct request *req)
 {
@@ -3879,6 +3882,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct fc_host_attrs *fc_host)
 
 	q->queuedata = shost;
 	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
+	blk_queue_softirq_done(q, fc_bsg_softirq_done);
 	blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
 	blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
 


* Re: kernel crash when BSG request timesout
  2009-05-28  6:01     ` FUJITA Tomonori
@ 2009-05-28  6:12       ` FUJITA Tomonori
  2009-06-10  7:56         ` [Suspected SPAM] " Giridhar Malavali
  2009-05-28 13:54       ` Douglas Gilbert
  1 sibling, 1 reply; 9+ messages in thread
From: FUJITA Tomonori @ 2009-05-28  6:12 UTC (permalink / raw)
  To: giridhar.malavali; +Cc: bharrosh, linux-scsi, James.Smart

On Thu, 28 May 2009 15:01:21 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> CC'ed James Smart,
> 
> On Tue, 26 May 2009 11:38:14 -0700
> Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:
> 
> > Thanks for the pointer. I will check with the post-merge tree.
> > 
> > 	The crash I am seeing is because of softirq_done_fn not set in the  
> > request queue for BSG request. Even in the post-merge tree I don't see  
> > FC transport setting this function during the allocation of the  
> > request queue.  When BSG request times out, I see that it executes  
> > __blk_complete_request function where check is done for its existence.  
> > I see this getting set for SCSI request during queue allocation in  
> > scsi_lib.c. Is this required for BSG request?
> 
> Yeah, you need to set q->softirq_done_fn if you use the block timeout
> infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
> the timeout infrastructure so it doesn't set q->softirq_done_fn.
> 
> If q->softirq_done_fn returns BLK_EH_HANDLED, the block layer doesn't
> expect that q->softirq_done_fn frees the request (currently,
> fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn
> for it.

Oops, that should read:

If q->rq_timed_out_fn returns BLK_EH_HANDLED, the block layer doesn't
expect q->rq_timed_out_fn to free the request (currently,
fc_bsg_job_timeout does); the block layer calls q->softirq_done_fn to
clean up the request.
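
The division of labour described above can be sketched as a small userspace C model. It is a sketch only: the `eh_*` names and `EH_*` values are hypothetical stand-ins for the kernel's rq_timed_out_fn, softirq_done_fn and BLK_EH_* values, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BLK_EH_NOT_HANDLED and BLK_EH_HANDLED. */
enum eh_return { EH_NOT_HANDLED, EH_HANDLED };

struct eh_job {
    int ref_cnt;
    bool destroyed;
    bool already_done; /* the driver finished the job before the timeout ran */
};

struct eh_request {
    struct eh_job *job;
};

/* Timeout handler: reports the outcome and frees nothing itself. */
static enum eh_return eh_job_timeout(struct eh_request *rq)
{
    assert(rq != NULL && rq->job != NULL);
    return rq->job->already_done ? EH_NOT_HANDLED : EH_HANDLED;
}

/* Completion callback: the single place where the job is torn down. */
static void eh_softirq_done(struct eh_request *rq)
{
    rq->job->ref_cnt--;
    rq->job->destroyed = true;
}

/* Block-layer side: run the cleanup only when the handler says "handled". */
static void eh_rq_timed_out(struct eh_request *rq)
{
    if (eh_job_timeout(rq) == EH_HANDLED)
        eh_softirq_done(rq);
}
```

In this model a job that is still outstanding at timeout is destroyed exactly once, by the completion callback, while a job that is already finishing is left alone for its normal completion path.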


> The attached patch works? It just adds q->softirq_done_fn and moves
> fc_destroy_bsgjob from fc_bsg_job_timeout to it. fc_bsg_job_timeout
> returns BLK_EH_NOT_HANDLED when a job is done since the job will be
> finished shortly so we don't want the block layer to do anything for
> the job.
> 
> It might be better to use q->softirq_done_fn for all the requests not
> only for expired requests, as SCSI-ml does, that is, job->job_done
> calls blk_complete_request().


* Re: kernel crash when BSG request timesout
  2009-05-28  6:01     ` FUJITA Tomonori
  2009-05-28  6:12       ` FUJITA Tomonori
@ 2009-05-28 13:54       ` Douglas Gilbert
  2009-05-28 22:23         ` FUJITA Tomonori
  1 sibling, 1 reply; 9+ messages in thread
From: Douglas Gilbert @ 2009-05-28 13:54 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: giridhar.malavali, bharrosh, linux-scsi, James.Smart

FUJITA Tomonori wrote:
> CC'ed James Smart,
> 
> On Tue, 26 May 2009 11:38:14 -0700
> Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:
> 
>> Thanks for the pointer. I will check with the post-merge tree.
>>
>> 	The crash I am seeing is because of softirq_done_fn not set in the  
>> request queue for BSG request. Even in the post-merge tree I don't see  
>> FC transport setting this function during the allocation of the  
>> request queue.  When BSG request times out, I see that it executes  
>> __blk_complete_request function where check is done for its existence.  
>> I see this getting set for SCSI request during queue allocation in  
>> scsi_lib.c. Is this required for BSG request?
> 
> Yeah, you need to set q->softirq_done_fn if you use the block timeout
> infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
> the timeout infrastructure so it doesn't set q->softirq_done_fn.

sg3_utils version 1.27 (and later) is a user of bsg, sending
SCSI commands through it. Will timeouts work? [I didn't check.]

Doug Gilbert


* Re: kernel crash when BSG request timesout
  2009-05-28 13:54       ` Douglas Gilbert
@ 2009-05-28 22:23         ` FUJITA Tomonori
  0 siblings, 0 replies; 9+ messages in thread
From: FUJITA Tomonori @ 2009-05-28 22:23 UTC (permalink / raw)
  To: dgilbert
  Cc: fujita.tomonori, giridhar.malavali, bharrosh, linux-scsi,
	James.Smart

On Thu, 28 May 2009 09:54:28 -0400
Douglas Gilbert <dgilbert@interlog.com> wrote:

> FUJITA Tomonori wrote:
> > CC'ed James Smart,
> > 
> > On Tue, 26 May 2009 11:38:14 -0700
> > Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:
> > 
> >> Thanks for the pointer. I will check with the post-merge tree.
> >>
> >> 	The crash I am seeing is because of softirq_done_fn not set in the  
> >> request queue for BSG request. Even in the post-merge tree I don't see  
> >> FC transport setting this function during the allocation of the  
> >> request queue.  When BSG request times out, I see that it executes  
> >> __blk_complete_request function where check is done for its existence.  
> >> I see this getting set for SCSI request during queue allocation in  
> >> scsi_lib.c. Is this required for BSG request?
> > 
> > Yeah, you need to set q->softirq_done_fn if you use the block timeout
> > infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
> > the timeout infrastructure so it doesn't set q->softirq_done_fn.
> 
> sg3_utils version 1.27 (and later) is a user of bsg, sending
> SCSI commands through. Will timeouts works? [I didn't check.]

Yeah, it should work.


* Re: [Suspected SPAM] Re: kernel crash when BSG request timesout
  2009-05-28  6:12       ` FUJITA Tomonori
@ 2009-06-10  7:56         ` Giridhar Malavali
  2009-06-10  8:40           ` FUJITA Tomonori
  0 siblings, 1 reply; 9+ messages in thread
From: Giridhar Malavali @ 2009-06-10  7:56 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: bharrosh@panasas.com, linux-scsi@vger.kernel.org,
	James.Smart@Emulex.Com


	After applying the changes from Fujita, I see that the application
never completes when a BSG timeout happens. Once the BSG request times
out, I see the fc_bsg_softirq_done routine destroy the bsg_job, but it
does not send any response back to the application. The application
waits indefinitely for the response, with the following warning message:

Jun  9 17:22:09 elab60 kernel: [  480.666830] INFO: task sgv4_els:6058 blocked for more than 120 seconds.
Jun  9 17:22:09 elab60 kernel: [  480.666833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  9 17:22:09 elab60 kernel: [  480.666835] sgv4_els      D 0000000000000000     0  6058   5993
Jun  9 17:22:09 elab60 kernel: [  480.666838]  ffff88007f173b78 0000000000000082 0000000000000000 ffffffffa003f880
Jun  9 17:22:09 elab60 kernel: [  480.666842]  ffff880001030000 000000000000ff00 000000000000c8b8 ffff88007fbf6990
Jun  9 17:22:09 elab60 kernel: [  480.666845]  ffff88007fbf6c18 00000001a00382cf 00000000ffff3524 ffff88007f93c990
Jun  9 17:22:09 elab60 kernel: [  480.666848] Call Trace:
Jun  9 17:22:09 elab60 kernel: [  480.666858]  [<ffffffffa00015d2>] ? fc_bsg_map_buffer+0x2a/0x72 [scsi_transport_fc]
Jun  9 17:22:09 elab60 kernel: [  480.666864]  [<ffffffff8029a2ba>] ? cache_alloc_debugcheck_after+0x73/0x243
Jun  9 17:22:09 elab60 kernel: [  480.666868]  [<ffffffff80511ebe>] schedule+0x9/0x1d
Jun  9 17:22:09 elab60 kernel: [  480.666871]  [<ffffffff8051210f>] schedule_timeout+0x12f/0x164
Jun  9 17:22:09 elab60 kernel: [  480.666873]  [<ffffffff805113f7>] wait_for_common+0xb8/0x15e
Jun  9 17:22:09 elab60 kernel: [  480.666878]  [<ffffffff80230feb>] ? default_wake_function+0x0/0xf
Jun  9 17:22:09 elab60 kernel: [  480.666880]  [<ffffffff80511527>] wait_for_completion+0x18/0x1a
Jun  9 17:22:09 elab60 kernel: [  480.666884]  [<ffffffff80360da6>] blk_execute_rq+0x7f/0xc9
Jun  9 17:22:09 elab60 kernel: [  480.666887]  [<ffffffff80365c28>] bsg_ioctl+0x1c0/0x227
Jun  9 17:22:09 elab60 kernel: [  480.666890]  [<ffffffff80514362>] ? _spin_unlock_irqrestore+0x2b/0x32
Jun  9 17:22:09 elab60 kernel: [  480.666894]  [<ffffffff802adb36>] vfs_ioctl+0x2a/0x95
Jun  9 17:22:09 elab60 kernel: [  480.666896]  [<ffffffff802adc22>] do_vfs_ioctl+0x81/0x583
Jun  9 17:22:09 elab60 kernel: [  480.666898]  [<ffffffff80514372>] ? _spin_unlock+0x9/0xb
Jun  9 17:22:09 elab60 kernel: [  480.666901]  [<ffffffff802ae165>] sys_ioctl+0x41/0x65
Jun  9 17:22:09 elab60 kernel: [  480.666904]  [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b

	I see that blk_end_request_all calls the blk_finish_request
routine to complete the request and return the response to the
application. After adding this call to fc_bsg_softirq_done, the
application gets the response and completes.

Is this a proper fix? How does the block layer complete a request when
a timeout happens?

/**
  * fc_bsg_softirq_done - softirq done routine for destroying the bsg requests
  * @rq:        BSG request that holds the job to be destroyed
  */
static void fc_bsg_softirq_done(struct request *rq)
{
         struct fc_bsg_job *job = rq->special;
         unsigned long flags;

         spin_lock_irqsave(&job->job_lock, flags);
+      job->state_flags |= FC_RQST_STATE_DONE;
         job->ref_cnt--;
         spin_unlock_irqrestore(&job->job_lock, flags);
+      blk_end_request_all(rq, rq->errors);
         fc_destroy_bsgjob(job);
}

- Giridhar.M.B



On May 27, 2009, at 11:12 PM, FUJITA Tomonori wrote:

> On Thu, 28 May 2009 15:01:21 +0900
> FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
>
>> CC'ed James Smart,
>>
>> On Tue, 26 May 2009 11:38:14 -0700
>> Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:
>>
>>> Thanks for the pointer. I will check with the post-merge tree.
>>>
>>> 	The crash I am seeing is because of softirq_done_fn not set in the
>>> request queue for BSG request. Even in the post-merge tree I don't  
>>> see
>>> FC transport setting this function during the allocation of the
>>> request queue.  When BSG request times out, I see that it executes
>>> __blk_complete_request function where check is done for its  
>>> existence.
>>> I see this getting set for SCSI request during queue allocation in
>>> scsi_lib.c. Is this required for BSG request?
>>
>> Yeah, you need to set q->softirq_done_fn if you use the block timeout
>> infrastructure. The current bsg user, SMP, uses bsg but it doesn't  
>> use
>> the timeout infrastructure so it doesn't set q->softirq_done_fn.
>>
>> If q->softirq_done_fn returns BLK_EH_HANDLED, the block layer doesn't
>> expect that q->softirq_done_fn frees the request (currently,
>> fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn
>> for it.
>
> Oops,
>
> If q->rq_timed_out_fn returns BLK_EH_HANDLED, the block layer doesn't
> expect that q->rq_timed_out_fn frees the request (currently,
> fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn to
> clean up the request.
>
>
>> The attached patch works? It just adds q->softirq_done_fn and moves
>> fc_destroy_bsgjob from fc_bsg_job_timeout to it. fc_bsg_job_timeout
>> returns BLK_EH_NOT_HANDLED when a job is done since the job will be
>> finished shortly so we don't want the block layer to do anything for
>> the job.
>>
>> It might be better to use q->softirq_done_fn for all the requests not
>> only for expired requests, as SCSI-ml does, that is, job->job_done
>> calls blk_complete_request().
>>
>>
>> diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/ 
>> scsi_transport_fc.c
>> index 3f64d93..c58e33a 100644
>> --- a/drivers/scsi/scsi_transport_fc.c
>> +++ b/drivers/scsi/scsi_transport_fc.c
>> @@ -3439,6 +3439,16 @@ fc_bsg_jobdone(struct fc_bsg_job *job)
>> 	fc_destroy_bsgjob(job);
>> }
>>
>> +static void fc_bsg_softirq_done(struct request *rq)
>> +{
>> +	struct fc_bsg_job *job = rq->special;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&job->job_lock, flags);
>> +	job->ref_cnt--;
>> +	spin_unlock_irqrestore(&job->job_lock, flags);
>> +	fc_destroy_bsgjob(job);
>> +}
>>
>> /**
>>  * fc_bsg_job_timeout - handler for when a bsg request timesout
>> @@ -3471,19 +3481,12 @@ fc_bsg_job_timeout(struct request *req)
>> 				"abort failed with status %d\n", err);
>> 	}
>>
>> -	if (!done) {
>> -		spin_lock_irqsave(&job->job_lock, flags);
>> -		job->ref_cnt--;
>> -		spin_unlock_irqrestore(&job->job_lock, flags);
>> -		fc_destroy_bsgjob(job);
>> -	}
>> -
>> -	/* the blk_end_sync_io() doesn't check the error */
>> -	return BLK_EH_HANDLED;
>> +	if (done)
>> +		return BLK_EH_NOT_HANDLED;
>> +	else
>> +		return BLK_EH_HANDLED;
>> }
>>
>> -
>> -
>> static int
>> fc_bsg_map_buffer(struct fc_bsg_buffer *buf, struct request *req)
>> {
>> @@ -3879,6 +3882,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost,  
>> struct fc_host_attrs *fc_host)
>>
>> 	q->queuedata = shost;
>> 	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
>> +	blk_queue_softirq_done(q, fc_bsg_softirq_done);
>> 	blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
>> 	blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
>>



* Re: [Suspected SPAM] Re: kernel crash when BSG request timesout
  2009-06-10  7:56         ` [Suspected SPAM] " Giridhar Malavali
@ 2009-06-10  8:40           ` FUJITA Tomonori
  0 siblings, 0 replies; 9+ messages in thread
From: FUJITA Tomonori @ 2009-06-10  8:40 UTC (permalink / raw)
  To: giridhar.malavali; +Cc: fujita.tomonori, bharrosh, linux-scsi, James.Smart

On Wed, 10 Jun 2009 00:56:05 -0700
Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:

> 
> 	After applying the changes from Fujita, I see that the application
> never completes when a BSG timeout happens. Once the BSG request times
> out, the fc_bsg_softirq_done routine destroys the bsg_job but does not
> send any response back to the application. The application waits
> indefinitely for the response, with the following warning message:
> 
> Jun  9 17:22:09 elab60 kernel: [  480.666830] INFO: task sgv4_els:6058 blocked for more than 120 seconds.
> Jun  9 17:22:09 elab60 kernel: [  480.666833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun  9 17:22:09 elab60 kernel: [  480.666835] sgv4_els      D 0000000000000000     0  6058   5993
> Jun  9 17:22:09 elab60 kernel: [  480.666838]  ffff88007f173b78 0000000000000082 0000000000000000 ffffffffa003f880
> Jun  9 17:22:09 elab60 kernel: [  480.666842]  ffff880001030000 000000000000ff00 000000000000c8b8 ffff88007fbf6990
> Jun  9 17:22:09 elab60 kernel: [  480.666845]  ffff88007fbf6c18 00000001a00382cf 00000000ffff3524 ffff88007f93c990
> Jun  9 17:22:09 elab60 kernel: [  480.666848] Call Trace:
> Jun  9 17:22:09 elab60 kernel: [  480.666858]  [<ffffffffa00015d2>] ? fc_bsg_map_buffer+0x2a/0x72 [scsi_transport_fc]
> Jun  9 17:22:09 elab60 kernel: [  480.666864]  [<ffffffff8029a2ba>] ? cache_alloc_debugcheck_after+0x73/0x243
> Jun  9 17:22:09 elab60 kernel: [  480.666868]  [<ffffffff80511ebe>] schedule+0x9/0x1d
> Jun  9 17:22:09 elab60 kernel: [  480.666871]  [<ffffffff8051210f>] schedule_timeout+0x12f/0x164
> Jun  9 17:22:09 elab60 kernel: [  480.666873]  [<ffffffff805113f7>] wait_for_common+0xb8/0x15e
> Jun  9 17:22:09 elab60 kernel: [  480.666878]  [<ffffffff80230feb>] ? default_wake_function+0x0/0xf
> Jun  9 17:22:09 elab60 kernel: [  480.666880]  [<ffffffff80511527>] wait_for_completion+0x18/0x1a
> Jun  9 17:22:09 elab60 kernel: [  480.666884]  [<ffffffff80360da6>] blk_execute_rq+0x7f/0xc9
> Jun  9 17:22:09 elab60 kernel: [  480.666887]  [<ffffffff80365c28>] bsg_ioctl+0x1c0/0x227
> Jun  9 17:22:09 elab60 kernel: [  480.666890]  [<ffffffff80514362>] ? _spin_unlock_irqrestore+0x2b/0x32
> Jun  9 17:22:09 elab60 kernel: [  480.666894]  [<ffffffff802adb36>] vfs_ioctl+0x2a/0x95
> Jun  9 17:22:09 elab60 kernel: [  480.666896]  [<ffffffff802adc22>] do_vfs_ioctl+0x81/0x583
> Jun  9 17:22:09 elab60 kernel: [  480.666898]  [<ffffffff80514372>] ? _spin_unlock+0x9/0xb
> Jun  9 17:22:09 elab60 kernel: [  480.666901]  [<ffffffff802ae165>] sys_ioctl+0x41/0x65
> Jun  9 17:22:09 elab60 kernel: [  480.666904]  [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b

Oops, sorry about that.


> 	I see that function blk_end_request_all calls blk_finish_request  
> routine to complete the response to application. After adding this  
> call in fc_bsg_softirq_done function, the application gets the  
> response and completes.
> 
> Is this a proper fix? How does block layer request completes when  
> timeout happens?

Looks ok to me. You need to complete such requests (as your fix does
in fc_bsg_softirq_done), if I understand correctly.
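
To illustrate why the submitter hangs without that call, here is a small
userspace model of the timeout path. It is only a sketch: mock_request,
mock_end_request, mock_timeout and the enum are hypothetical stand-ins,
not kernel types or APIs, and the assert() only loosely mirrors the
existence check on q->softirq_done_fn discussed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
enum mock_eh_ret { MOCK_EH_HANDLED, MOCK_EH_NOT_HANDLED };

struct mock_request {
	bool waiter_woken;   /* models the completion blk_execute_rq() sleeps on */
	bool job_destroyed;  /* models fc_destroy_bsgjob() having run */
	int  errors;
};

/* Models ending the request: the step that wakes the submitter. */
static void mock_end_request(struct mock_request *rq, int error)
{
	rq->errors = error;
	rq->waiter_woken = true;
}

/* Done handler as in the first patch: it only cleans up the job. */
static void softirq_done_cleanup_only(struct mock_request *rq)
{
	rq->job_destroyed = true;
}

/* Done handler with the fix: end the request, then clean up the job. */
static void softirq_done_with_end(struct mock_request *rq)
{
	mock_end_request(rq, rq->errors);
	rq->job_destroyed = true;
}

/*
 * Models the timeout path: when the timeout handler reports "handled",
 * the done handler is invoked and is expected to finish the request.
 */
static void mock_timeout(struct mock_request *rq, enum mock_eh_ret ret,
			 void (*done_fn)(struct mock_request *))
{
	assert(done_fn != NULL); /* a missing handler is a bug in this model */
	if (ret == MOCK_EH_HANDLED)
		done_fn(rq);
}
```

With softirq_done_cleanup_only the job is destroyed but waiter_woken
stays false, which corresponds to the task stuck in wait_for_completion
in the trace above. With softirq_done_with_end both flags end up set.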


> /**
>   * fc_bsg_softirq_done - softirq done routine for destroying the bsg  
> requests
>   * @req:        BSG request that holds the job to be destroyed
>   */
> static void fc_bsg_softirq_done(struct request *rq)
> {
>          struct fc_bsg_job *job = rq->special;
>          unsigned long flags;
> 
>          spin_lock_irqsave(&job->job_lock, flags);
> +      job->state_flags |= FC_RQST_STATE_DONE;
>          job->ref_cnt--;
>          spin_unlock_irqrestore(&job->job_lock, flags);
> +      blk_end_request_all(rq, rq->errors);
>          fc_destroy_bsgjob(job);
> 

My previous patch with this fix is fine by me for now.

However, as I proposed in the previous mail, I think it would be
cleaner to use q->softirq_done_fn for all requests, not only for
expired ones, because fc_bsg_jobdone() does part of what
fc_bsg_softirq_done() does.
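
A rough userspace sketch of that idea follows. All names here
(mock_job, mock_complete, mock_job_done, mock_job_timeout) are
hypothetical and only model the control flow; this is not the transport
code. The "complete at most once" guard is a simplifying assumption of
the sketch, standing in for whatever arbitration decides between a
normal completion and a timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a job; not the kernel's struct fc_bsg_job. */
struct mock_job {
	bool completed;   /* set once the single done path has run */
	int  done_calls;  /* how many times the done handler ran */
	int  result;      /* 0 on success, negative errno otherwise */
};

/* Single cleanup path, standing in for a softirq-done handler. */
static void mock_softirq_done(struct mock_job *job)
{
	job->done_calls++;
}

/*
 * Common completion entry point: whoever gets here first (normal
 * completion or timeout) wins, and the done handler runs exactly once.
 */
static bool mock_complete(struct mock_job *job, int result)
{
	assert(job != NULL);
	if (job->completed)
		return false;
	job->completed = true;
	job->result = result;
	mock_softirq_done(job);
	return true;
}

/* Normal completion goes through the common path... */
static bool mock_job_done(struct mock_job *job)
{
	return mock_complete(job, 0);
}

/* ...and so does the timeout, differing only in the reported result. */
static bool mock_job_timeout(struct mock_job *job)
{
	return mock_complete(job, -ETIMEDOUT);
}
```

The point of the design is that the cleanup lives in one place, so the
normal and timeout paths cannot drift apart the way fc_bsg_jobdone()
and fc_bsg_softirq_done() can.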


end of thread, other threads:[~2009-06-10  8:40 UTC | newest]

Thread overview: 9+ messages
2009-05-22 20:51 kernel crash when BSG request timesout Giridhar Malavali
2009-05-24 11:00 ` Boaz Harrosh
2009-05-26 18:38   ` Giridhar Malavali
2009-05-28  6:01     ` FUJITA Tomonori
2009-05-28  6:12       ` FUJITA Tomonori
2009-06-10  7:56         ` [Suspected SPAM] " Giridhar Malavali
2009-06-10  8:40           ` FUJITA Tomonori
2009-05-28 13:54       ` Douglas Gilbert
2009-05-28 22:23         ` FUJITA Tomonori
