* kernel crash when BSG request timesout
@ 2009-05-22 20:51 Giridhar Malavali
  2009-05-24 11:00 ` Boaz Harrosh
  0 siblings, 1 reply; 9+ messages in thread
From: Giridhar Malavali @ 2009-05-22 20:51 UTC (permalink / raw)
  To: linux-scsi

Hi,

While testing the FC pass-through support, I am constantly hitting a
kernel crash when a BSG request times out.
I took the latest FC pass-through patches from James Smart from
http://marc.info/?l=linux-scsi&m=123436574018579&w=2
and, on top of them, applied Boaz's patches from
http://markmail.org/search/?q=FC+pass-through+support+&x=0&y=0#query:FC%20passthrough%20support%20from%3A%22Boaz%20Harrosh%22+page:2+mid:ke4lj4cg5ftc6nsc+state:results

Are there any additional patches I am missing?

Thanks,
Giridhar.M.B

[ 1464.584437] ------------[ cut here ]------------
[ 1464.584437] kernel BUG at block/blk-softirq.c:110!
[ 1464.584437] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1464.584437] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 1464.584437] CPU 3
[ 1464.584437] Modules linked in: qla2xxx netconsole scsi_transport_fc [last unloaded: qla2xxx]
[ 1464.584437] Pid: 0, comm: swapper Not tainted 2.6.30-rc4 #3 X7DB8
[ 1464.584437] RIP: 0010:[<ffffffff80361112>] [<ffffffff80361112>] __blk_complete_request+0xe8/0xec
[ 1464.584437] RSP: 0018:ffff880001063e10 EFLAGS: 00010046
[ 1464.584437] RAX: 0000000000000001 RBX: ffff88007ab93e80 RCX: ffffffff8070f680
[ 1464.584437] RDX: 0000000000008988 RSI: 0000000000000086 RDI: ffff88007ab93e80
[ 1464.584437] RBP: ffff880001063e30 R08: 00000000ffffffff R09: 0000000000000003
[ 1464.584437] R10: 000000000000000a R11: 0000000000000000 R12: ffff88007a8b26c8
[ 1464.584437] R13: ffff88007a8b2a70 R14: ffff88007a8b26c8 R15: 0000000000000286
[ 1464.584437] FS: 0000000000000000(0000) GS:ffff880001060000(0000) knlGS:0000000000000000
[ 1464.584437] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1464.584437] CR2: 00007f7943ffd4a8 CR3: 000000007fb1e000 CR4: 00000000000006e0
[ 1464.584437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1464.584437] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1464.584437] Process swapper (pid: 0, threadinfo ffff88007f9fe000, task ffff88007f9e1990)
[ 1464.584437] Stack:
[ 1464.584437] ffff88007a8b2970 ffff88007ab93e80 0000000000000000 ffff88007a8b2a70
[ 1464.584437] ffff880001063e50 ffffffff80361299 ffff88007a8b26c8 ffff88007a8b2930
[ 1464.584437] ffff880001063e90 ffffffff803614e5 ffff88007f9b8000 ffff88007a8b26c8
[ 1464.584437] Call Trace:
[ 1464.584437] <IRQ> <0> [<ffffffff80361299>] blk_rq_timed_out+0x48/0x67
[ 1464.584437] [<ffffffff803614e5>] blk_rq_timed_out_timer+0xd6/0x121
[ 1464.584437] [<ffffffff8036140f>] ? blk_rq_timed_out_timer+0x0/0x121
[ 1464.584437] [<ffffffff80240857>] run_timer_softirq+0x147/0x215
[ 1464.584437] [<ffffffff8023b67b>] ? raise_softirq+0x59/0x68
[ 1464.584437] [<ffffffff8023bf67>] __do_softirq+0xba/0x1a3
[ 1464.584437] [<ffffffff8020c36c>] call_softirq+0x1c/0x30
[ 1464.584437] [<ffffffff8020de61>] do_softirq+0x61/0xa0
[ 1464.584437] [<ffffffff8023b8b1>] irq_exit+0x51/0x59
[ 1464.584437] [<ffffffff8021d888>] smp_apic_timer_interrupt+0x6d/0x96
[ 1464.584437] [<ffffffff8020bd83>] apic_timer_interrupt+0x13/0x20
[ 1464.584437] <EOI> <0> [<ffffffff80212ac8>] ? mwait_idle+0xfe/0x10f
[ 1464.584437] [<ffffffff80212abf>] ? mwait_idle+0xf5/0x10f
[ 1464.584437] [<ffffffff8020a4ce>] ? cpu_idle+0x63/0x97
[ 1464.584437] [<ffffffff8050d3c4>] ? start_secondary+0x183/0x1df
[ 1464.584437] Code: b7 0f 36 80 48 89 5b 28 66 c7 43 30 00 00 48 8d 73 10 31 d2 e8 4c 8a ef ff eb b2 bf 04 00 00 00 e8 05 a3 ed ff 0f 1f 40 00 eb a2 <0f> 0b eb fe 55 48 89 e5 48 8d 47 50 f0 0f ba 28 00 19 d2 85 d2
[ 1464.584437] RIP [<ffffffff80361112>] __blk_complete_request+0xe8/0xec
[ 1464.584437] RSP <ffff880001063e10>
[ 1464.584437] ---[ end trace 7325773d478b6460 ]---
[ 1464.584437] Kernel panic - not syncing: Fatal exception in interrupt
[ 1464.584437] Pid: 0, comm: swapper Tainted: G D 2.6.30-rc4 #3
[ 1464.584437] Call Trace:
[ 1464.584437] <IRQ> [<ffffffff8051098a>] panic+0x75/0x146
[ 1464.584437] [<ffffffff8020f31b>] oops_end+0x8f/0x97
[ 1464.584437] [<ffffffff8020f4ea>] die+0x46/0x60
[ 1464.584437] [<ffffffff8020cb76>] do_trap+0x129/0x152
[ 1464.584437] [<ffffffff8024f84d>] ? atomic_notifier_call_chain+0x15/0x17
[ 1464.584437] [<ffffffff8020cf62>] do_invalid_op+0x90/0xa1
[ 1464.584437] [<ffffffff80361112>] ? __blk_complete_request+0xe8/0xec
[ 1464.584437] [<ffffffff80513acf>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 1464.584437] [<ffffffff8020c005>] invalid_op+0x15/0x20
[ 1464.584437] [<ffffffff80361112>] ? __blk_complete_request+0xe8/0xec
[ 1464.584437] [<ffffffff80361299>] blk_rq_timed_out+0x48/0x67
[ 1464.584437] [<ffffffff803614e5>] blk_rq_timed_out_timer+0xd6/0x121
[ 1464.584437] [<ffffffff8036140f>] ? blk_rq_timed_out_timer+0x0/0x121
[ 1464.584437] [<ffffffff80240857>] run_timer_softirq+0x147/0x215
[ 1464.584437] [<ffffffff8023b67b>] ? raise_softirq+0x59/0x68
[ 1464.584437] [<ffffffff8023bf67>] __do_softirq+0xba/0x1a3
[ 1464.584437] [<ffffffff8020c36c>] call_softirq+0x1c/0x30
[ 1464.584437] [<ffffffff8020de61>] do_softirq+0x61/0xa0
[ 1464.584437] [<ffffffff8023b8b1>] irq_exit+0x51/0x59
[ 1464.584437] [<ffffffff8021d888>] smp_apic_timer_interrupt+0x6d/0x96
[ 1464.584437] [<ffffffff8020bd83>] apic_timer_interrupt+0x13/0x20
[ 1464.584437] <EOI> [<ffffffff80212ac8>] ? mwait_idle+0xfe/0x10f
[ 1464.584437] [<ffffffff80212abf>] ? mwait_idle+0xf5/0x10f
[ 1464.584437] [<ffffffff8020a4ce>] ? cpu_idle+0x63/0x97
[ 1464.584437] [<ffffffff8050d3c4>] ? start_secondary+0x183/0x1df

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel crash when BSG request timesout
  2009-05-22 20:51 kernel crash when BSG request timesout Giridhar Malavali
@ 2009-05-24 11:00 ` Boaz Harrosh
  2009-05-26 18:38   ` Giridhar Malavali
  0 siblings, 1 reply; 9+ messages in thread
From: Boaz Harrosh @ 2009-05-24 11:00 UTC (permalink / raw)
  To: Giridhar Malavali; +Cc: linux-scsi

On 05/22/2009 11:51 PM, Giridhar Malavali wrote:
> Hi,
>
> While testing the FC pass thru support I am constantly hitting a
> kernel crash when BSG request times out.
> I took the latest FC pass thru patches from James Smart from
> http://marc.info/?l=linux-scsi&m=123436574018579&w=2. and on top of it
> applied Boaz patches from
> http://markmail.org/search/?q=FC+pass-through+support+&x=0&y=0#query:FC%20passthrough%20support%20from%3A%22Boaz%20Harrosh%22+page:2+mid:ke4lj4cg5ftc6nsc+state:results
>
> Is there any additional patches I am missing?
>
> Thanks,
> Giridhar.M.B
>
> [ 1464.584437] ------------[ cut here ]------------
> [ 1464.584437] kernel BUG at block/blk-softirq.c:110!

I did not understand exactly which tree you are using. There were
lots of related changes in these areas.

Please try James's post-merge tree for the FC pass-through support.
It has all you need:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-post-merge-2.6.git

Thanks
Boaz

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel crash when BSG request timesout
  2009-05-24 11:00 ` Boaz Harrosh
@ 2009-05-26 18:38   ` Giridhar Malavali
  2009-05-28 6:01     ` FUJITA Tomonori
  0 siblings, 1 reply; 9+ messages in thread
From: Giridhar Malavali @ 2009-05-26 18:38 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: linux-scsi@vger.kernel.org

Thanks for the pointer. I will check with the post-merge tree.

The crash I am seeing occurs because softirq_done_fn is not set in the
request queue for BSG requests. Even in the post-merge tree, I don't see
the FC transport setting this function when it allocates the request
queue. When a BSG request times out, it executes
__blk_complete_request(), which checks that the function exists.
I see it being set for SCSI requests during queue allocation in
scsi_lib.c. Is it required for BSG requests as well?

Thanks,
Giridhar.M.B

On May 24, 2009, at 4:00 AM, Boaz Harrosh wrote:

> I did not exactly understand which tree are you using. There where
> lots of related changes around these areas
>
> Please try James post merge tree for the FC pass through support.
> It has all you need:
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-post-merge-2.6.git
>
> Thanks
> Boaz

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel crash when BSG request timesout
  2009-05-26 18:38   ` Giridhar Malavali
@ 2009-05-28 6:01     ` FUJITA Tomonori
  2009-05-28 6:12       ` FUJITA Tomonori
  2009-05-28 13:54       ` Douglas Gilbert
  0 siblings, 2 replies; 9+ messages in thread
From: FUJITA Tomonori @ 2009-05-28 6:01 UTC (permalink / raw)
  To: giridhar.malavali; +Cc: bharrosh, linux-scsi, James.Smart

CC'ed James Smart,

On Tue, 26 May 2009 11:38:14 -0700
Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:

> Thanks for the pointer. I will check with the post-merge tree.
>
> The crash I am seeing is because of softirq_done_fn not set in the
> request queue for BSG request. Even in the post-merge tree I don't see
> FC transport setting this function during the allocation of the
> request queue. When BSG request times out, I see that it executes
> __blk_complete_request function where check is done for its existence.
> I see this getting set for SCSI request during queue allocation in
> scsi_lib.c. Is this required for BSG request?

Yeah, you need to set q->softirq_done_fn if you use the block timeout
infrastructure. The current bsg user, SMP, uses bsg but doesn't use
the timeout infrastructure, so it doesn't set q->softirq_done_fn.

If q->softirq_done_fn returns BLK_EH_HANDLED, the block layer doesn't
expect that q->softirq_done_fn frees the request (currently,
fc_bsg_job_timeout does); the block layer calls q->softirq_done_fn
for it.

Does the attached patch work? It just adds q->softirq_done_fn and moves
fc_destroy_bsgjob from fc_bsg_job_timeout to it. fc_bsg_job_timeout
returns BLK_EH_NOT_HANDLED when a job is done, since the job will be
finished shortly and we don't want the block layer to do anything for
the job.

It might be better to use q->softirq_done_fn for all requests, not
only for expired ones, as the SCSI midlayer does; that is, job->job_done
would call blk_complete_request().

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 3f64d93..c58e33a 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3439,6 +3439,16 @@ fc_bsg_jobdone(struct fc_bsg_job *job)
 	fc_destroy_bsgjob(job);
 }
 
+static void fc_bsg_softirq_done(struct request *rq)
+{
+	struct fc_bsg_job *job = rq->special;
+	unsigned long flags;
+
+	spin_lock_irqsave(&job->job_lock, flags);
+	job->ref_cnt--;
+	spin_unlock_irqrestore(&job->job_lock, flags);
+	fc_destroy_bsgjob(job);
+}
 
 /**
  * fc_bsg_job_timeout - handler for when a bsg request timesout
@@ -3471,19 +3481,12 @@ fc_bsg_job_timeout(struct request *req)
 				"abort failed with status %d\n", err);
 	}
 
-	if (!done) {
-		spin_lock_irqsave(&job->job_lock, flags);
-		job->ref_cnt--;
-		spin_unlock_irqrestore(&job->job_lock, flags);
-		fc_destroy_bsgjob(job);
-	}
-
-	/* the blk_end_sync_io() doesn't check the error */
-	return BLK_EH_HANDLED;
+	if (done)
+		return BLK_EH_NOT_HANDLED;
+	else
+		return BLK_EH_HANDLED;
 }
 
-
-
 static int
 fc_bsg_map_buffer(struct fc_bsg_buffer *buf, struct request *req)
 {
@@ -3879,6 +3882,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct fc_host_attrs *fc_host)
 
 	q->queuedata = shost;
 	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
+	blk_queue_softirq_done(q, fc_bsg_softirq_done);
 	blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
 	blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
 

^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: kernel crash when BSG request timesout
  2009-05-28 6:01     ` FUJITA Tomonori
@ 2009-05-28 6:12       ` FUJITA Tomonori
  2009-06-10 7:56         ` [Suspected SPAM] " Giridhar Malavali
  2009-05-28 13:54       ` Douglas Gilbert
  1 sibling, 1 reply; 9+ messages in thread
From: FUJITA Tomonori @ 2009-05-28 6:12 UTC (permalink / raw)
  To: giridhar.malavali; +Cc: bharrosh, linux-scsi, James.Smart

On Thu, 28 May 2009 15:01:21 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> Yeah, you need to set q->softirq_done_fn if you use the block timeout
> infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
> the timeout infrastructure so it doesn't set q->softirq_done_fn.
>
> If q->softirq_done_fn returns BLK_EH_HANDLED, the block layer doesn't
> expect that q->softirq_done_fn frees the request (currently,
> fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn
> for it.

Oops,

If q->rq_timed_out_fn returns BLK_EH_HANDLED, the block layer doesn't
expect that q->rq_timed_out_fn frees the request (currently,
fc_bsg_job_timeout does); the block layer calls q->softirq_done_fn to
clean up the request.

^ permalink raw reply [flat|nested] 9+ messages in thread
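The contract described in the correction above can be sketched in plain C. This is only a userspace model, not kernel code: every name here (`model_queue`, `model_rq_timed_out`, `EH_HANDLED`, and so on) is made up for illustration and merely mirrors the roles of q->rq_timed_out_fn, q->softirq_done_fn and the BLK_EH_* return values discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BLK_EH_HANDLED / BLK_EH_NOT_HANDLED. */
enum eh_return { EH_HANDLED, EH_NOT_HANDLED };

struct model_request;

/* Hypothetical stand-in for the two queue callbacks under discussion. */
struct model_queue {
	enum eh_return (*rq_timed_out_fn)(struct model_request *rq);
	void (*softirq_done_fn)(struct model_request *rq); /* may be NULL */
};

struct model_request {
	struct model_queue *q;
	int job_already_done; /* plays the role of "done" in fc_bsg_job_timeout */
	int cleaned_up;       /* set by the softirq-done callback */
};

enum dispatch_result {
	DISPATCH_LEFT_TO_DRIVER, /* timeout handler said "not handled" */
	DISPATCH_COMPLETED,      /* softirq-done callback ran */
	DISPATCH_WOULD_BUG       /* callback missing: the reported crash */
};

/* Model of the block layer's reaction when a request timer fires. */
static enum dispatch_result model_rq_timed_out(struct model_request *rq)
{
	assert(rq->q->rq_timed_out_fn != NULL);

	if (rq->q->rq_timed_out_fn(rq) == EH_NOT_HANDLED)
		return DISPATCH_LEFT_TO_DRIVER;

	/* "Handled": the block layer completes the request itself,
	 * and for that it needs the softirq-done callback. */
	if (rq->q->softirq_done_fn == NULL)
		return DISPATCH_WOULD_BUG;

	rq->q->softirq_done_fn(rq);
	return DISPATCH_COMPLETED;
}

/* Timeout handler shaped like the patched fc_bsg_job_timeout. */
static enum eh_return model_job_timeout(struct model_request *rq)
{
	return rq->job_already_done ? EH_NOT_HANDLED : EH_HANDLED;
}

/* Cleanup moved out of the timeout handler, as in the patch. */
static void model_softirq_done(struct model_request *rq)
{
	rq->cleaned_up = 1;
}
```

With `softirq_done_fn` left NULL, a timed-out request ends in `DISPATCH_WOULD_BUG`, which corresponds to the BUG at block/blk-softirq.c:110 in the original report; registering `model_softirq_done` turns the same timeout into a normal cleanup.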
* Re: [Suspected SPAM] Re: kernel crash when BSG request timesout
  2009-05-28 6:12       ` FUJITA Tomonori
@ 2009-06-10 7:56         ` Giridhar Malavali
  2009-06-10 8:40           ` FUJITA Tomonori
  0 siblings, 1 reply; 9+ messages in thread
From: Giridhar Malavali @ 2009-06-10 7:56 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: bharrosh@panasas.com, linux-scsi@vger.kernel.org,
	James.Smart@Emulex.Com

After applying the changes from Fujita, I see that the application never
completes when a BSG timeout happens. Once the BSG request times out,
I see the fc_bsg_softirq_done routine destroying the bsg_job, but it does
not send any response back to the application. The application waits
indefinitely for the response, with the following warning message:

Jun 9 17:22:09 elab60 kernel: [ 480.666830]INFO: task sgv4_els:6058 blocked for more than 120 seconds.
Jun 9 17:22:09 elab60 kernel: [ 480.666833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 9 17:22:09 elab60 kernel: [ 480.666835] sgv4_els D 0000000000000000 0 6058 5993
Jun 9 17:22:09 elab60 kernel: [ 480.666838] ffff88007f173b78 0000000000000082 0000000000000000 ffffffffa003f880
Jun 9 17:22:09 elab60 kernel: [ 480.666842] ffff880001030000 000000000000ff00 000000000000c8b8 ffff88007fbf6990
Jun 9 17:22:09 elab60 kernel: [ 480.666845] ffff88007fbf6c18 00000001a00382cf 00000000ffff3524 ffff88007f93c990
Jun 9 17:22:09 elab60 kernel: [ 480.666848] Call Trace:
Jun 9 17:22:09 elab60 kernel: [ 480.666858] [<ffffffffa00015d2>] ? fc_bsg_map_buffer+0x2a/0x72 [scsi_transport_fc]
Jun 9 17:22:09 elab60 kernel: [ 480.666864] [<ffffffff8029a2ba>] ? cache_alloc_debugcheck_after+0x73/0x243
Jun 9 17:22:09 elab60 kernel: [ 480.666868] [<ffffffff80511ebe>] schedule+0x9/0x1d
Jun 9 17:22:09 elab60 kernel: [ 480.666871] [<ffffffff8051210f>] schedule_timeout+0x12f/0x164
Jun 9 17:22:09 elab60 kernel: [ 480.666873] [<ffffffff805113f7>] wait_for_common+0xb8/0x15e
Jun 9 17:22:09 elab60 kernel: [ 480.666878] [<ffffffff80230feb>] ? default_wake_function+0x0/0xf
Jun 9 17:22:09 elab60 kernel: [ 480.666880] [<ffffffff80511527>] wait_for_completion+0x18/0x1a
Jun 9 17:22:09 elab60 kernel: [ 480.666884] [<ffffffff80360da6>] blk_execute_rq+0x7f/0xc9
Jun 9 17:22:09 elab60 kernel: [ 480.666887] [<ffffffff80365c28>] bsg_ioctl+0x1c0/0x227
Jun 9 17:22:09 elab60 kernel: [ 480.666890] [<ffffffff80514362>] ? _spin_unlock_irqrestore+0x2b/0x32
Jun 9 17:22:09 elab60 kernel: [ 480.666894] [<ffffffff802adb36>] vfs_ioctl+0x2a/0x95
Jun 9 17:22:09 elab60 kernel: [ 480.666896] [<ffffffff802adc22>] do_vfs_ioctl+0x81/0x583
Jun 9 17:22:09 elab60 kernel: [ 480.666898] [<ffffffff80514372>] ? _spin_unlock+0x9/0xb
Jun 9 17:22:09 elab60 kernel: [ 480.666901] [<ffffffff802ae165>] sys_ioctl+0x41/0x65
Jun 9 17:22:09 elab60 kernel: [ 480.666904] [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b

I see that the function blk_end_request_all calls the blk_finish_request
routine to complete the response to the application. After adding this
call to the fc_bsg_softirq_done function, the application gets the
response and completes. Is this a proper fix? How does a block layer
request complete when a timeout happens?

 /**
  * fc_bsg_softirq_done - softirq done routine for destroying the bsg requests
  * @rq: BSG request that holds the job to be destroyed
  */
 static void fc_bsg_softirq_done(struct request *rq)
 {
 	struct fc_bsg_job *job = rq->special;
 	unsigned long flags;

 	spin_lock_irqsave(&job->job_lock, flags);
+	job->state_flags |= FC_RQST_STATE_DONE;
 	job->ref_cnt--;
 	spin_unlock_irqrestore(&job->job_lock, flags);
+	blk_end_request_all(rq, rq->errors);
 	fc_destroy_bsgjob(job);
 }

- Giridhar.M.B

On May 27, 2009, at 11:12 PM, FUJITA Tomonori wrote:

> Oops,
>
> If q->rq_timed_out_fn returns BLK_EH_HANDLED, the block layer doesn't
> expect that q->rq_timed_out_fn frees the request (currently,
> fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn to
> clean up the request.

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [Suspected SPAM] Re: kernel crash when BSG request timesout
  2009-06-10  7:56 ` [Suspected SPAM] " Giridhar Malavali
@ 2009-06-10  8:40   ` FUJITA Tomonori
From: FUJITA Tomonori @ 2009-06-10 8:40 UTC
To: giridhar.malavali; +Cc: fujita.tomonori, bharrosh, linux-scsi, James.Smart

On Wed, 10 Jun 2009 00:56:05 -0700
Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:

>
> After applying the changes from Fujita, I see that application never
> completes when BSG time out happens. Once the BSG request times out,
> I see fc_bsg_softirq_done routine destroying the bsg_job but does not
> send any response back to the application. The application infinitely
> waits for the response with following warning message
>
> Jun 9 17:22:09 elab60 kernel: [ 480.666830] INFO: task sgv4_els:6058 blocked for more than 120 seconds.
> Jun 9 17:22:09 elab60 kernel: [ 480.666833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 9 17:22:09 elab60 kernel: [ 480.666835] sgv4_els D 0000000000000000 0 6058 5993
> Jun 9 17:22:09 elab60 kernel: [ 480.666838] ffff88007f173b78 0000000000000082 0000000000000000 ffffffffa003f880
> Jun 9 17:22:09 elab60 kernel: [ 480.666842] ffff880001030000 000000000000ff00 000000000000c8b8 ffff88007fbf6990
> Jun 9 17:22:09 elab60 kernel: [ 480.666845] ffff88007fbf6c18 00000001a00382cf 00000000ffff3524 ffff88007f93c990
> Jun 9 17:22:09 elab60 kernel: [ 480.666848] Call Trace:
> Jun 9 17:22:09 elab60 kernel: [ 480.666858] [<ffffffffa00015d2>] ? fc_bsg_map_buffer+0x2a/0x72 [scsi_transport_fc]
> Jun 9 17:22:09 elab60 kernel: [ 480.666864] [<ffffffff8029a2ba>] ? cache_alloc_debugcheck_after+0x73/0x243
> Jun 9 17:22:09 elab60 kernel: [ 480.666868] [<ffffffff80511ebe>] schedule+0x9/0x1d
> Jun 9 17:22:09 elab60 kernel: [ 480.666871] [<ffffffff8051210f>] schedule_timeout+0x12f/0x164
> Jun 9 17:22:09 elab60 kernel: [ 480.666873] [<ffffffff805113f7>] wait_for_common+0xb8/0x15e
> Jun 9 17:22:09 elab60 kernel: [ 480.666878] [<ffffffff80230feb>] ? default_wake_function+0x0/0xf
> Jun 9 17:22:09 elab60 kernel: [ 480.666880] [<ffffffff80511527>] wait_for_completion+0x18/0x1a
> Jun 9 17:22:09 elab60 kernel: [ 480.666884] [<ffffffff80360da6>] blk_execute_rq+0x7f/0xc9
> Jun 9 17:22:09 elab60 kernel: [ 480.666887] [<ffffffff80365c28>] bsg_ioctl+0x1c0/0x227
> Jun 9 17:22:09 elab60 kernel: [ 480.666890] [<ffffffff80514362>] ? _spin_unlock_irqrestore+0x2b/0x32
> Jun 9 17:22:09 elab60 kernel: [ 480.666894] [<ffffffff802adb36>] vfs_ioctl+0x2a/0x95
> Jun 9 17:22:09 elab60 kernel: [ 480.666896] [<ffffffff802adc22>] do_vfs_ioctl+0x81/0x583
> Jun 9 17:22:09 elab60 kernel: [ 480.666898] [<ffffffff80514372>] ? _spin_unlock+0x9/0xb
> Jun 9 17:22:09 elab60 kernel: [ 480.666901] [<ffffffff802ae165>] sys_ioctl+0x41/0x65
> Jun 9 17:22:09 elab60 kernel: [ 480.666904] [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b

Oops, sorry about that.

> I see that blk_end_request_all() calls blk_finish_request() to complete
> the request and return the response to the application. After adding this
> call to fc_bsg_softirq_done(), the application gets the response and
> completes.
>
> Is this a proper fix? How does the block layer complete a request when a
> timeout happens?

Looks OK to me. If I understand correctly, you need to complete such
requests yourself, as your fix does in fc_bsg_softirq_done().

>  /**
>   * fc_bsg_softirq_done - softirq done routine for destroying the bsg requests
>   * @rq: BSG request that holds the job to be destroyed
>   */
>  static void fc_bsg_softirq_done(struct request *rq)
>  {
>          struct fc_bsg_job *job = rq->special;
>          unsigned long flags;
>
>          spin_lock_irqsave(&job->job_lock, flags);
> +        job->state_flags |= FC_RQST_STATE_DONE;
>          job->ref_cnt--;
>          spin_unlock_irqrestore(&job->job_lock, flags);
> +        blk_end_request_all(rq, rq->errors);
>          fc_destroy_bsgjob(job);
>  }
>

My previous patch with this fix is fine by me for now.

However, as I proposed in the previous mail, I think it would be cleaner
to use q->softirq_done_fn for all requests, not only for expired ones,
because fc_bsg_jobdone() already does part of what fc_bsg_softirq_done()
does.

* Re: kernel crash when BSG request timesout
  2009-05-28  6:01 ` FUJITA Tomonori
  2009-05-28  6:12   ` FUJITA Tomonori
@ 2009-05-28 13:54   ` Douglas Gilbert
  2009-05-28 22:23     ` FUJITA Tomonori
From: Douglas Gilbert @ 2009-05-28 13:54 UTC
To: FUJITA Tomonori; +Cc: giridhar.malavali, bharrosh, linux-scsi, James.Smart

FUJITA Tomonori wrote:
> CC'ed James Smart,
>
> On Tue, 26 May 2009 11:38:14 -0700
> Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:
>
>> Thanks for the pointer. I will check with the post-merge tree.
>>
>> The crash I am seeing is because of softirq_done_fn not set in the
>> request queue for BSG request. Even in the post-merge tree I don't see
>> FC transport setting this function during the allocation of the
>> request queue. When BSG request times out, I see that it executes
>> __blk_complete_request function where check is done for its existence.
>> I see this getting set for SCSI request during queue allocation in
>> scsi_lib.c. Is this required for BSG request?
>
> Yeah, you need to set q->softirq_done_fn if you use the block timeout
> infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
> the timeout infrastructure so it doesn't set q->softirq_done_fn.

sg3_utils version 1.27 (and later) is a user of bsg, sending
SCSI commands through. Will timeouts work? [I didn't check.]

Doug Gilbert

* Re: kernel crash when BSG request timesout
  2009-05-28 13:54 ` Douglas Gilbert
@ 2009-05-28 22:23   ` FUJITA Tomonori
From: FUJITA Tomonori @ 2009-05-28 22:23 UTC
To: dgilbert; +Cc: fujita.tomonori, giridhar.malavali, bharrosh, linux-scsi, James.Smart

On Thu, 28 May 2009 09:54:28 -0400
Douglas Gilbert <dgilbert@interlog.com> wrote:

> FUJITA Tomonori wrote:
> > CC'ed James Smart,
> >
> > On Tue, 26 May 2009 11:38:14 -0700
> > Giridhar Malavali <giridhar.malavali@qlogic.com> wrote:
> >
> >> Thanks for the pointer. I will check with the post-merge tree.
> >>
> >> The crash I am seeing is because of softirq_done_fn not set in the
> >> request queue for BSG request. Even in the post-merge tree I don't see
> >> FC transport setting this function during the allocation of the
> >> request queue. When BSG request times out, I see that it executes
> >> __blk_complete_request function where check is done for its existence.
> >> I see this getting set for SCSI request during queue allocation in
> >> scsi_lib.c. Is this required for BSG request?
> >
> > Yeah, you need to set q->softirq_done_fn if you use the block timeout
> > infrastructure. The current bsg user, SMP, uses bsg but it doesn't use
> > the timeout infrastructure so it doesn't set q->softirq_done_fn.
>
> sg3_utils version 1.27 (and later) is a user of bsg, sending
> SCSI commands through. Will timeouts work? [I didn't check.]

Yeah, it should work.
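
The timeout path discussed in this thread can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions, not kernel
code: every model_* name is hypothetical and exists purely for illustration.
It assumes, as described in the messages above, that when the timeout
handler reports the request as handled the block layer invokes the queue's
softirq_done_fn (and hits a BUG if none is registered), and that the ioctl
caller sleeping in blk_execute_rq() is woken only once the request is
actually ended, which is what blk_end_request_all() does in the proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BLK_EH_NOT_HANDLED / BLK_EH_HANDLED. */
enum model_eh_ret { MODEL_EH_NOT_HANDLED, MODEL_EH_HANDLED };

struct model_request;

/* Hypothetical stand-in for the two request_queue callbacks in the thread. */
struct model_queue {
	enum model_eh_ret (*rq_timed_out_fn)(struct model_request *rq);
	void (*softirq_done_fn)(struct model_request *rq);
};

struct model_request {
	struct model_queue *q;
	bool job_destroyed;	/* stands in for fc_destroy_bsgjob() */
	bool waiter_woken;	/* stands in for the completion blk_execute_rq() waits on */
};

/* Stands in for blk_end_request_all(): ending the request wakes the waiter. */
static void model_end_request_all(struct model_request *rq)
{
	rq->waiter_woken = true;
}

/* Stands in for the block layer's handling of an expired request. */
static void model_rq_timed_out(struct model_request *rq)
{
	struct model_queue *q = rq->q;

	if (q->rq_timed_out_fn(rq) == MODEL_EH_HANDLED) {
		/* Models the existence check that fired as the original BUG. */
		assert(q->softirq_done_fn != NULL);
		q->softirq_done_fn(rq);
	}
}

/* Timeout handler for a job that was not already done. */
static enum model_eh_ret model_job_timeout(struct model_request *rq)
{
	(void)rq;
	return MODEL_EH_HANDLED;
}

/* First patch: the done routine only destroys the job. */
static void model_done_destroy_only(struct model_request *rq)
{
	rq->job_destroyed = true;
}

/* Patch plus fix: end the request before destroying the job. */
static void model_done_with_end(struct model_request *rq)
{
	model_end_request_all(rq);
	rq->job_destroyed = true;
}

/* Runs one timeout through the model; reports whether the waiter woke up. */
static bool model_run_timeout(void (*done_fn)(struct model_request *rq))
{
	struct model_queue q = {
		.rq_timed_out_fn = model_job_timeout,
		.softirq_done_fn = done_fn,
	};
	struct model_request rq = { .q = &q };

	model_rq_timed_out(&rq);
	return rq.waiter_woken;
}
```

Under these assumptions, model_run_timeout(model_done_destroy_only) returns
false, which corresponds to the hung sgv4_els task in the trace, while
model_run_timeout(model_done_with_end) returns true.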