* Re: question on block-layer timeout change [not found] <3568BBCB98C00041A9E622952FD5F24EA11C9F3A@cosmail03.lsi.com> @ 2008-11-12 7:29 ` Mike Anderson 2008-11-12 17:16 ` malahal 2008-11-14 8:51 ` Shi, Harris 0 siblings, 2 replies; 16+ messages in thread From: Mike Anderson @ 2008-11-12 7:29 UTC (permalink / raw) To: Shi, Harris; +Cc: Jens Axboe, Alan Stern, Tejun Heo, SCSI development list Shi, Harris <Harris.Shi@lsi.com> wrote: > Due to the current timeout management change, our RDAC (failover) driver > had some difficulties in handling SCSI I/O timeout. The RDAC driver is in > the similar layer as HBA driver in that it will register into scsi > mid-layer as scsi_host_template and stays below mid-layer. However, all > scsi I/Os coming to RDAC stack are routed by a path then dispatched to the > real HBA driver via mid-layer. We used to rely on the timer in > scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. > Basically when I/O is coming to RDAC stack, we need to delete the timer > for each I/O. Then after selecting a specific path for this I/O, we need > to send the I/O back to mid-layer with a larger timeout value just to > avoid the forced failover. When I/O completes successfully, we added the > original timer back to the I/O and pass it over to upper block layer for > further process. > > > > However, with the current timeout management functions moving to block > layer, it became difficult for us to explicitly control the timeout value > for specific I/O. > > Can you shed some lights on how to handle the I/O based timeout in this > case? > Since long term mpp capabilities should be handled by dm-mp and the SCSI RDAC handler exporting functions to allow direct adding and deleting of the timer may not be something that would be needed long term. It may not be really clean to add these interfaces in. Could similar prior functionality be created by the following? 1.) To the RDAC vhba add a hostt->eh_timed_out function. In this timeout function return BLK_EH_RESET_TIMER until it is done with the command. Since the vhba does not have a transport scsi_times_out should call this function on every timeout. There is some overhead here depending on the default timeout value set in timing out and then resetting the timer. 2.) For each sdev that is taken over store the previous rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp specific function for the requests sent to the real HBA drivers. 3.) Set the timeout in the real HBA driver requests prior to sending it to the mid layer. -andmike -- Michael Anderson andmike@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2008-11-12 7:29 ` question on block-layer timeout change Mike Anderson @ 2008-11-12 17:16 ` malahal 2008-11-14 8:51 ` Shi, Harris 1 sibling, 0 replies; 16+ messages in thread From: malahal @ 2008-11-12 17:16 UTC (permalink / raw) To: Mike Anderson Cc: Shi, Harris, Jens Axboe, Alan Stern, Tejun Heo, SCSI development list Mike Anderson [andmike@linux.vnet.ibm.com] wrote: > Shi, Harris <Harris.Shi@lsi.com> wrote: > > Due to the current timeout management change, our RDAC (failover) driver > > had some difficulties in handling SCSI I/O timeout. The RDAC driver is in > > the similar layer as HBA driver in that it will register into scsi > > mid-layer as scsi_host_template and stays below mid-layer. However, all > > scsi I/Os coming to RDAC stack are routed by a path then dispatched to the > > real HBA driver via mid-layer. We used to rely on the timer in > > scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. > > Basically when I/O is coming to RDAC stack, we need to delete the timer > > for each I/O. Then after selecting a specific path for this I/O, we need > > to send the I/O back to mid-layer with a larger timeout value just to > > avoid the forced failover. When I/O completes successfully, we added the > > original timer back to the I/O and pass it over to upper block layer for > > further process. > > > > > > > > However, with the current timeout management functions moving to block > > layer, it became difficult for us to explicitly control the timeout value > > for specific I/O. > > > > Can you shed some lights on how to handle the I/O based timeout in this > > case? > > > > Since long term mpp capabilities should be handled by dm-mp and the SCSI > RDAC handler exporting functions to allow direct adding and deleting of the > timer may not be something that would be needed long term. It may not be > really clean to add these interfaces in. > > Could similar prior functionality be created by the following? > 1.) To the RDAC vhba add a hostt->eh_timed_out function. In this > timeout function return BLK_EH_RESET_TIMER until it is done with > the command. Since the vhba does not have a transport > scsi_times_out should call this function on every timeout. There is > some overhead here depending on the default timeout value set in > timing out and then resetting the timer. I believe, GDTG driver does this sort of thing for a different purpose. > 2.) For each sdev that is taken over store the previous > rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp > specific function for the requests sent to the real HBA drivers. > > 3.) Set the timeout in the real HBA driver requests prior to > sending it to the mid layer. Should work, I guess... ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2008-11-12 7:29 ` question on block-layer timeout change Mike Anderson 2008-11-12 17:16 ` malahal @ 2008-11-14 8:51 ` Shi, Harris 2008-11-14 17:18 ` malahal 1 sibling, 1 reply; 16+ messages in thread From: Shi, Harris @ 2008-11-14 8:51 UTC (permalink / raw) To: Mike Anderson; +Cc: malahal@us.ibm.com, SCSI development list Mike, Thanks for your valuable input. For item 1, how can I make sure that the timed-out command will have the timer modified via blk_add_timer given that one of following conditions has to be met, * timer isn't already pending or * this timeout value is earlier than an existing one. Also where do I need to have a retry after reset the timer? For item 2, rq_timed_out_fn is tied with scsi_times_out at the very beginning. What's the purpose to tie a specific mpp method? How do we handle the case if timeout is triggered at this time? Harris -----Original Message----- From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com] Sent: Wednesday, November 12, 2008 1:29 AM To: Shi, Harris Cc: Jens Axboe; Alan Stern; Tejun Heo; SCSI development list Subject: Re: question on block-layer timeout change Shi, Harris <Harris.Shi@lsi.com> wrote: > Due to the current timeout management change, our RDAC (failover) driver > had some difficulties in handling SCSI I/O timeout. The RDAC driver is in > the similar layer as HBA driver in that it will register into scsi > mid-layer as scsi_host_template and stays below mid-layer. However, all > scsi I/Os coming to RDAC stack are routed by a path then dispatched to the > real HBA driver via mid-layer. We used to rely on the timer in > scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. > Basically when I/O is coming to RDAC stack, we need to delete the timer > for each I/O. Then after selecting a specific path for this I/O, we need > to send the I/O back to mid-layer with a larger timeout value just to > avoid the forced failover. When I/O completes successfully, we added the > original timer back to the I/O and pass it over to upper block layer for > further process. > > > > However, with the current timeout management functions moving to block > layer, it became difficult for us to explicitly control the timeout value > for specific I/O. > > Can you shed some lights on how to handle the I/O based timeout in this > case? > Since long term mpp capabilities should be handled by dm-mp and the SCSI RDAC handler exporting functions to allow direct adding and deleting of the timer may not be something that would be needed long term. It may not be really clean to add these interfaces in. Could similar prior functionality be created by the following? 1.) To the RDAC vhba add a hostt->eh_timed_out function. In this timeout function return BLK_EH_RESET_TIMER until it is done with the command. Since the vhba does not have a transport scsi_times_out should call this function on every timeout. There is some overhead here depending on the default timeout value set in timing out and then resetting the timer. 2.) For each sdev that is taken over store the previous rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp specific function for the requests sent to the real HBA drivers. 3.) Set the timeout in the real HBA driver requests prior to sending it to the mid layer. -andmike -- Michael Anderson andmike@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2008-11-14 8:51 ` Shi, Harris @ 2008-11-14 17:18 ` malahal 2008-12-10 23:11 ` Shi, Harris 0 siblings, 1 reply; 16+ messages in thread From: malahal @ 2008-11-14 17:18 UTC (permalink / raw) To: Shi, Harris; +Cc: Mike Anderson, SCSI development list Shi, Harris [Harris.Shi@lsi.com] wrote: > Mike, > > Thanks for your valuable input. > > For item 1, how can I make sure that the timed-out command will have > the timer modified via blk_add_timer given that one of following > conditions has to be met, > > * timer isn't already pending or I don't completely understand the RDAC architecture, but here are my comments bases on what I read from your earlier email. When your timed_out function is called, the timer is already fired In your case, you should always return 'BLK_EH_RESET_TIMER'. This is just to make sure that the command doesn't fail before you resubmit the request to the real HBA adapter, I think. > * this timeout value is earlier than an existing one. When you return BLK_EH_RESET_TIMER, the block layer puts the command again in its timeout queue and waits for another 'timeout' value before calling your timer again. You can send the request to real HBA divers after timeout expiry. If you really want to fail, you can return some other value... > Also where do I need to have a retry after reset the timer? There is no retry of the command. If you return 'BLK_EH_RESET_TIMER' the command is still with your driver and you can take another timeout value to finish the request. > For item 2, rq_timed_out_fn is tied with scsi_times_out at the very beginning. What's the purpose to tie a specific mpp method? How do we handle the case if timeout is triggered at this time? > When you send the request to the real HBA, the request timeout value doesn't change. So Mike's suggestion is to have your own timed_out_fun that returns BLK_EH_RESET_TIMER few times (effectively hijacking the timeout value). See gdth_timed_out() for this case. --Malahal. > Harris > > -----Original Message----- > From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com] > Sent: Wednesday, November 12, 2008 1:29 AM > To: Shi, Harris > Cc: Jens Axboe; Alan Stern; Tejun Heo; SCSI development list > Subject: Re: question on block-layer timeout change > > Shi, Harris <Harris.Shi@lsi.com> wrote: > > Due to the current timeout management change, our RDAC (failover) driver > > had some difficulties in handling SCSI I/O timeout. The RDAC driver is in > > the similar layer as HBA driver in that it will register into scsi > > mid-layer as scsi_host_template and stays below mid-layer. However, all > > scsi I/Os coming to RDAC stack are routed by a path then dispatched to the > > real HBA driver via mid-layer. We used to rely on the timer in > > scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. > > Basically when I/O is coming to RDAC stack, we need to delete the timer > > for each I/O. Then after selecting a specific path for this I/O, we need > > to send the I/O back to mid-layer with a larger timeout value just to > > avoid the forced failover. When I/O completes successfully, we added the > > original timer back to the I/O and pass it over to upper block layer for > > further process. > > > > > > > > However, with the current timeout management functions moving to block > > layer, it became difficult for us to explicitly control the timeout value > > for specific I/O. > > > > Can you shed some lights on how to handle the I/O based timeout in this > > case? > > > > Since long term mpp capabilities should be handled by dm-mp and the SCSI > RDAC handler exporting functions to allow direct adding and deleting of the > timer may not be something that would be needed long term. It may not be > really clean to add these interfaces in. > > Could similar prior functionality be created by the following? > 1.) To the RDAC vhba add a hostt->eh_timed_out function. In this > timeout function return BLK_EH_RESET_TIMER until it is done with > the command. Since the vhba does not have a transport > scsi_times_out should call this function on every timeout. There is > some overhead here depending on the default timeout value set in > timing out and then resetting the timer. > > 2.) For each sdev that is taken over store the previous > rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp > specific function for the requests sent to the real HBA drivers. > > 3.) Set the timeout in the real HBA driver requests prior to > sending it to the mid layer. > > -andmike > -- > Michael Anderson > andmike@linux.vnet.ibm.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2008-11-14 17:18 ` malahal @ 2008-12-10 23:11 ` Shi, Harris 2008-12-11 11:03 ` Hannes Reinecke 0 siblings, 1 reply; 16+ messages in thread From: Shi, Harris @ 2008-12-10 23:11 UTC (permalink / raw) To: malahal@us.ibm.com; +Cc: Mike Anderson, SCSI development list Mike, Your suggestion on MPP driver is working pretty good on FC config in term of failover and failback. However, recently when we switched over to iscsi config on SLES11beta6 (2.6.27.7-4-default, SLES11 kernel did not sync up with current upstream one, all of timeout management patch has been pulled in), we were consistently hit by the following panic when we tried to do failover via controller sysReboot or placed offline. Is it something related to the recent timeout management patch introduced into the kernel? BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba IP: [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b PGD 196cf6067 PUD 196c4f067 PMD 0 Oops: 0002 [1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map CPU 2 Modules linked in: radeon drm crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core d Supported: Yes, External Pid: 0, comm: swapper Tainted: G 2.6.27.7-4-default #1 RIP: 0010:[<ffffffff80222047>] [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b RSP: 0018:ffff88019f187e20 EFLAGS: 00010086 RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff88019d8c3218 RDX: ffff88019cd3d000 RSI: 0000000000002007 RDI: 00000000000000ba RBP: ffff880194940918 R08: ffff880194940c78 R09: 0000000000000000 R10: ffffffff80a65b80 R11: ffffffff8021c6ed R12: 0000000000000000 R13: ffff880194940b50 R14: ffff88019f187ed0 R15: ffff880194940c90 FS: 0000000000000000(0000) GS:ffff88019f157ec0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000000000ba CR3: 0000000196cc7000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff88019f180000, task ffff88019f17e280) Stack: ffffffff804aabe2 ffffffffa0321734 0000000000000000 ffff88018fc97c80 ffff880194940918 ffffffffa0004f35 ffff88019d8c30d8 ffffffff8034b954 ffff88019d8c30d8 ffffffff8034ba23 0000000000000282 0000000000000100 Call Trace: [<ffffffff804aabe2>] _spin_lock+0x13/0x15 [<ffffffffa0321734>] iscsi_eh_cmd_timed_out+0x27/0xc0 [libiscsi] [<ffffffffa0004f35>] scsi_times_out+0x46/0x72 [scsi_mod] [<ffffffff8034b954>] blk_rq_timed_out+0xe/0x4a [<ffffffff8034ba23>] blk_rq_timed_out_timer+0x93/0x116 [<ffffffff8024a5f5>] run_timer_softirq+0x19a/0x228 [<ffffffff8024696d>] __do_softirq+0x84/0x115 [<ffffffff8020ddac>] call_softirq+0x1c/0x28 [<ffffffff8020f177>] do_softirq+0x3c/0x81 [<ffffffff80246684>] irq_exit+0x3f/0x83 [<ffffffff8021cf73>] smp_apic_timer_interrupt+0x95/0xae [<ffffffff8020d523>] apic_timer_interrupt+0x83/0x90 [<ffffffff802134f4>] mwait_idle+0x3c/0x46 [<ffffffff8020b3b5>] cpu_idle+0xa9/0xf1 Code: ff 00 00 c1 ea 10 39 c2 0f 95 c0 0f b6 c0 c3 8b 17 89 d0 c1 f8 10 29 d0 25 ff ff 00 00 ff RIP [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b RSP <ffff88019f187e20> ---------------------------------------------------------------------------- Your comment is very much appreciated. Thanks. Harris -----Original Message----- From: malahal@us.ibm.com [mailto:malahal@us.ibm.com] Sent: Friday, November 14, 2008 11:18 AM To: Shi, Harris Cc: Mike Anderson; SCSI development list Subject: Re: question on block-layer timeout change Shi, Harris [Harris.Shi@lsi.com] wrote: > Mike, > > Thanks for your valuable input. > > For item 1, how can I make sure that the timed-out command will have > the timer modified via blk_add_timer given that one of following > conditions has to be met, > > * timer isn't already pending or I don't completely understand the RDAC architecture, but here are my comments bases on what I read from your earlier email. When your timed_out function is called, the timer is already fired In your case, you should always return 'BLK_EH_RESET_TIMER'. This is just to make sure that the command doesn't fail before you resubmit the request to the real HBA adapter, I think. > * this timeout value is earlier than an existing one. When you return BLK_EH_RESET_TIMER, the block layer puts the command again in its timeout queue and waits for another 'timeout' value before calling your timer again. You can send the request to real HBA divers after timeout expiry. If you really want to fail, you can return some other value... > Also where do I need to have a retry after reset the timer? There is no retry of the command. If you return 'BLK_EH_RESET_TIMER' the command is still with your driver and you can take another timeout value to finish the request. > For item 2, rq_timed_out_fn is tied with scsi_times_out at the very beginning. What's the purpose to tie a specific mpp method? How do we handle the case if timeout is triggered at this time? > When you send the request to the real HBA, the request timeout value doesn't change. So Mike's suggestion is to have your own timed_out_fun that returns BLK_EH_RESET_TIMER few times (effectively hijacking the timeout value). See gdth_timed_out() for this case. --Malahal. > Harris > > -----Original Message----- > From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com] > Sent: Wednesday, November 12, 2008 1:29 AM > To: Shi, Harris > Cc: Jens Axboe; Alan Stern; Tejun Heo; SCSI development list > Subject: Re: question on block-layer timeout change > > Shi, Harris <Harris.Shi@lsi.com> wrote: > > Due to the current timeout management change, our RDAC (failover) driver > > had some difficulties in handling SCSI I/O timeout. The RDAC driver is in > > the similar layer as HBA driver in that it will register into scsi > > mid-layer as scsi_host_template and stays below mid-layer. However, all > > scsi I/Os coming to RDAC stack are routed by a path then dispatched to the > > real HBA driver via mid-layer. We used to rely on the timer in > > scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. > > Basically when I/O is coming to RDAC stack, we need to delete the timer > > for each I/O. Then after selecting a specific path for this I/O, we need > > to send the I/O back to mid-layer with a larger timeout value just to > > avoid the forced failover. When I/O completes successfully, we added the > > original timer back to the I/O and pass it over to upper block layer for > > further process. > > > > > > > > However, with the current timeout management functions moving to block > > layer, it became difficult for us to explicitly control the timeout value > > for specific I/O. > > > > Can you shed some lights on how to handle the I/O based timeout in this > > case? > > > > Since long term mpp capabilities should be handled by dm-mp and the SCSI > RDAC handler exporting functions to allow direct adding and deleting of the > timer may not be something that would be needed long term. It may not be > really clean to add these interfaces in. > > Could similar prior functionality be created by the following? > 1.) To the RDAC vhba add a hostt->eh_timed_out function. In this > timeout function return BLK_EH_RESET_TIMER until it is done with > the command. Since the vhba does not have a transport > scsi_times_out should call this function on every timeout. There is > some overhead here depending on the default timeout value set in > timing out and then resetting the timer. > > 2.) For each sdev that is taken over store the previous > rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp > specific function for the requests sent to the real HBA drivers. > > 3.) Set the timeout in the real HBA driver requests prior to > sending it to the mid layer. > > -andmike > -- > Michael Anderson > andmike@linux.vnet.ibm.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2008-12-10 23:11 ` Shi, Harris @ 2008-12-11 11:03 ` Hannes Reinecke 2008-12-16 16:55 ` Shi, Harris 0 siblings, 1 reply; 16+ messages in thread From: Hannes Reinecke @ 2008-12-11 11:03 UTC (permalink / raw) To: Shi, Harris; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list Hi Harris, Shi, Harris wrote: > Mike, > > Your suggestion on MPP driver is working pretty good on FC config in term of failover and failback. However, recently when we switched over to iscsi config on SLES11beta6 (2.6.27.7-4-default, SLES11 kernel did not sync up with current upstream one, all of timeout management patch has been pulled in), we were consistently hit by the following panic when we tried to do failover via controller sysReboot or placed offline. Is it something related to the recent timeout management patch introduced into the kernel? > > BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba > IP: [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b > PGD 196cf6067 PUD 196c4f067 PMD 0 > Oops: 0002 [1] SMP > last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map > CPU 2 > Modules linked in: radeon drm crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core d > Supported: Yes, External > Pid: 0, comm: swapper Tainted: G 2.6.27.7-4-default #1 > RIP: 0010:[<ffffffff80222047>] [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b > RSP: 0018:ffff88019f187e20 EFLAGS: 00010086 > RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff88019d8c3218 > RDX: ffff88019cd3d000 RSI: 0000000000002007 RDI: 00000000000000ba > RBP: ffff880194940918 R08: ffff880194940c78 R09: 0000000000000000 > R10: ffffffff80a65b80 R11: ffffffff8021c6ed R12: 0000000000000000 > R13: ffff880194940b50 R14: ffff88019f187ed0 R15: ffff880194940c90 > FS: 0000000000000000(0000) GS:ffff88019f157ec0(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 00000000000000ba CR3: 0000000196cc7000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffff88019f180000, task ffff88019f17e280) > Stack: ffffffff804aabe2 ffffffffa0321734 0000000000000000 ffff88018fc97c80 > ffff880194940918 ffffffffa0004f35 ffff88019d8c30d8 ffffffff8034b954 > ffff88019d8c30d8 ffffffff8034ba23 0000000000000282 0000000000000100 > Call Trace: > [<ffffffff804aabe2>] _spin_lock+0x13/0x15 > [<ffffffffa0321734>] iscsi_eh_cmd_timed_out+0x27/0xc0 [libiscsi] > [<ffffffffa0004f35>] scsi_times_out+0x46/0x72 [scsi_mod] > [<ffffffff8034b954>] blk_rq_timed_out+0xe/0x4a > [<ffffffff8034ba23>] blk_rq_timed_out_timer+0x93/0x116 > [<ffffffff8024a5f5>] run_timer_softirq+0x19a/0x228 > [<ffffffff8024696d>] __do_softirq+0x84/0x115 > [<ffffffff8020ddac>] call_softirq+0x1c/0x28 > [<ffffffff8020f177>] do_softirq+0x3c/0x81 > [<ffffffff80246684>] irq_exit+0x3f/0x83 > [<ffffffff8021cf73>] smp_apic_timer_interrupt+0x95/0xae > [<ffffffff8020d523>] apic_timer_interrupt+0x83/0x90 > [<ffffffff802134f4>] mwait_idle+0x3c/0x46 > [<ffffffff8020b3b5>] cpu_idle+0xa9/0xf1 > > > Code: ff 00 00 c1 ea 10 39 c2 0f 95 c0 0f b6 c0 c3 8b 17 89 d0 c1 f8 10 29 d0 25 ff ff 00 00 ff > RIP [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b > RSP <ffff88019f187e20> > ---------------------------------------------------------------------------- > > Your comment is very much appreciated. > Please try the latest SLES11 Beta/RC kernel. This bug is due to an incomplete patchset from mainline and should be resolved now. If not please do open a Novell bugzilla for it. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2008-12-11 11:03 ` Hannes Reinecke @ 2008-12-16 16:55 ` Shi, Harris 2008-12-17 7:33 ` Hannes Reinecke 0 siblings, 1 reply; 16+ messages in thread From: Shi, Harris @ 2008-12-16 16:55 UTC (permalink / raw) To: Hannes Reinecke; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list Hannes, Just let you know that the same panic is still there in SLES11RC1. Philip should be able to help us to report it in Novell Bugzilla. Thanks. Harris -----Original Message----- From: Hannes Reinecke [mailto:hare@suse.de] Sent: Thursday, December 11, 2008 5:03 AM To: Shi, Harris Cc: malahal@us.ibm.com; Mike Anderson; SCSI development list Subject: Re: question on block-layer timeout change Hi Harris, Shi, Harris wrote: > Mike, > > Your suggestion on MPP driver is working pretty good on FC config in term of failover and failback. However, recently when we switched over to iscsi config on SLES11beta6 (2.6.27.7-4-default, SLES11 kernel did not sync up with current upstream one, all of timeout management patch has been pulled in), we were consistently hit by the following panic when we tried to do failover via controller sysReboot or placed offline. Is it something related to the recent timeout management patch introduced into the kernel? > > BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba > IP: [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b > PGD 196cf6067 PUD 196c4f067 PMD 0 > Oops: 0002 [1] SMP > last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map > CPU 2 > Modules linked in: radeon drm crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core d > Supported: Yes, External > Pid: 0, comm: swapper Tainted: G 2.6.27.7-4-default #1 > RIP: 0010:[<ffffffff80222047>] [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b > RSP: 0018:ffff88019f187e20 EFLAGS: 00010086 > RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff88019d8c3218 > RDX: ffff88019cd3d000 RSI: 0000000000002007 RDI: 00000000000000ba > RBP: ffff880194940918 R08: ffff880194940c78 R09: 0000000000000000 > R10: ffffffff80a65b80 R11: ffffffff8021c6ed R12: 0000000000000000 > R13: ffff880194940b50 R14: ffff88019f187ed0 R15: ffff880194940c90 > FS: 0000000000000000(0000) GS:ffff88019f157ec0(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 00000000000000ba CR3: 0000000196cc7000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffff88019f180000, task ffff88019f17e280) > Stack: ffffffff804aabe2 ffffffffa0321734 0000000000000000 ffff88018fc97c80 > ffff880194940918 ffffffffa0004f35 ffff88019d8c30d8 ffffffff8034b954 > ffff88019d8c30d8 ffffffff8034ba23 0000000000000282 0000000000000100 > Call Trace: > [<ffffffff804aabe2>] _spin_lock+0x13/0x15 > [<ffffffffa0321734>] iscsi_eh_cmd_timed_out+0x27/0xc0 [libiscsi] > [<ffffffffa0004f35>] scsi_times_out+0x46/0x72 [scsi_mod] > [<ffffffff8034b954>] blk_rq_timed_out+0xe/0x4a > [<ffffffff8034ba23>] blk_rq_timed_out_timer+0x93/0x116 > [<ffffffff8024a5f5>] run_timer_softirq+0x19a/0x228 > [<ffffffff8024696d>] __do_softirq+0x84/0x115 > [<ffffffff8020ddac>] call_softirq+0x1c/0x28 > [<ffffffff8020f177>] do_softirq+0x3c/0x81 > [<ffffffff80246684>] irq_exit+0x3f/0x83 > [<ffffffff8021cf73>] smp_apic_timer_interrupt+0x95/0xae > [<ffffffff8020d523>] apic_timer_interrupt+0x83/0x90 > [<ffffffff802134f4>] mwait_idle+0x3c/0x46 > [<ffffffff8020b3b5>] cpu_idle+0xa9/0xf1 > > > Code: ff 00 00 c1 ea 10 39 c2 0f 95 c0 0f b6 c0 c3 8b 17 89 d0 c1 f8 10 29 d0 25 ff ff 00 00 ff > RIP [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b > RSP <ffff88019f187e20> > ---------------------------------------------------------------------------- > > Your comment is very much appreciated. > Please try the latest SLES11 Beta/RC kernel. This bug is due to an incomplete patchset from mainline and should be resolved now. If not please do open a Novell bugzilla for it. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2008-12-16 16:55 ` Shi, Harris @ 2008-12-17 7:33 ` Hannes Reinecke 2008-12-17 22:38 ` Shi, Harris 0 siblings, 1 reply; 16+ messages in thread From: Hannes Reinecke @ 2008-12-17 7:33 UTC (permalink / raw) To: Shi, Harris; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list Hi Harris, Shi, Harris wrote: > Hannes, > > Just let you know that the same panic is still there in SLES11RC1. Philip should > be able to help us to report it in Novell Bugzilla. > Ah. It might be related to the wrong scsi_device_online() check (cf my patch 'Check for deleted device in scsi_device_online()' earlier on this list). Problem is that the error handler just checks with scsi_device_online() if a command can be sent down the wire. And so for a failed/deleted devices the error handler just barfs at random places. Can you check if this patch resolves the issue? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2008-12-17 7:33 ` Hannes Reinecke @ 2008-12-17 22:38 ` Shi, Harris 2008-12-18 9:23 ` Mike Anderson 0 siblings, 1 reply; 16+ messages in thread From: Shi, Harris @ 2008-12-17 22:38 UTC (permalink / raw) To: Hannes Reinecke; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list [-- Attachment #1: Type: text/plain, Size: 1227 bytes --] Hannes, I rebuilt a kernel with your patch on a 2.6.28-rc8 kernel and hit the same panic as before. The attached trace is for your reference. Thanks. Harris -----Original Message----- From: Hannes Reinecke [mailto:hare@suse.de] Sent: Wednesday, December 17, 2008 1:33 AM To: Shi, Harris Cc: malahal@us.ibm.com; Mike Anderson; SCSI development list Subject: Re: question on block-layer timeout change Hi Harris, Shi, Harris wrote: > Hannes, > > Just let you know that the same panic is still there in SLES11RC1. Philip should > be able to help us to report it in Novell Bugzilla. > Ah. It might be related to the wrong scsi_device_online() check (cf my patch 'Check for deleted device in scsi_device_online()' earlier on this list). Problem is that the error handler just checks with scsi_device_online() if a command can be sent down the wire. And so for a failed/deleted devices the error handler just barfs at random places. Can you check if this patch resolves the issue? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) [-- Attachment #2: panic_on_iscsi_array_with_failover.txt --] [-- Type: text/plain, Size: 5282 bytes --] Information from /var/log/messages: =================================== Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) Information from Serial output: =============================== Oops: 0002 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 *pdpt = 00000000319fe001 *pde = 0000000000000000 BUG: unable to handle kernel NULL pointer dereference at 00000086 IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 *pdpt = 0000000000546001 *pde = 0000000000000000 ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 EIP is at __ticket_spin_lock+0x8/0x19 EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) Stack: f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 Call Trace: [<c0328e0f>] ? _spin_lock+0x15/0x18 [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] [<c0212539>] ? blk_rq_timed_out+0xc/0x46 [<c02125f2>] ? blk_rq_timed_out_timer+0x7f/0xf6 [<c0212573>] ? blk_rq_timed_out_timer+0x0/0xf6 [<c013744c>] ? run_timer_softirq+0x154/0x1c7 [<c0133ab0>] ? __do_softirq+0x8d/0x133 [<c0133b9e>] ? do_softirq+0x48/0x57 [<c0133cad>] ? irq_exit+0x38/0x6d [<c0112e8d>] ? smp_apic_timer_interrupt+0x71/0x7f [<c0105f78>] ? apic_timer_interrupt+0x28/0x30 [<c010a75d>] ? mwait_idle+0x32/0x40 [<c0103c36>] ? cpu_idle+0x74/0x8e [<c0324407>] ? start_secondary+0x269/0x26e Code: fe ff ff 5b eb 13 56 0f b7 d2 ff 75 08 89 d9 0f b6 c0 e8 44 fe ff ff 5a 59 8d 65 f8 5b 5e 5d c3 90 90 90 55 ba 00 01 EIP: [<c011a274>] __ticket_spin_lock+0x8/0x19 SS:ESP 0068:f38a5e78 Oops: 0002 [#2] <0>Kernel panic - not syncing: Fatal exception in interrupt ------------[ cut here ]------------ WARNING: at kernel/smp.c:333 smp_call_function_mask+0x27/0x1b6() Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i] Pid: 0, comm: swapper Tainted: G D 2.6.28-rc8-test-1-pae #1 Call Trace: [<c0326a95>] ? printk+0xf/0x12 [<c012f173>] warn_on_slowpath+0x41/0x63 [<c02207fc>] ? __const_udelay+0x2c/0x2e [<c02207fc>] ? __const_udelay+0x2c/0x2e [<c028afc1>] ? wait_for_xmitr+0x37/0x7d [<c028afc1>] ? wait_for_xmitr+0x37/0x7d [<c014d326>] smp_call_function_mask+0x27/0x1b6 [<c011a346>] ? default_spin_lock_flags+0x15/0x1b [<c011a346>] ? default_spin_lock_flags+0x15/0x1b [<c0143b34>] ? down_trylock+0x20/0x29 [<c012f4ea>] ? try_acquire_console_sem+0xd/0x2e [<c0157040>] ? crash_kexec+0x9f/0xa7 [<c014d4d3>] smp_call_function+0x1e/0x25 [<c0111980>] ? stop_this_cpu+0x0/0x51 [<c0111956>] native_smp_send_stop+0x1b/0x45 [<c03269ef>] panic+0x48/0xdf [<c0329a1b>] oops_end+0x8f/0xa3 [<c01074c7>] die+0x5d/0x65 [<c032b422>] do_page_fault+0x9ef/0xaf3 [<c0124b8a>] ? enqueue_task_fair+0x1f/0x59 [<c012120e>] ? resched_task+0x30/0x74 [<c0128a9c>] ? try_to_wake_up+0x216/0x221 [<c0128ab2>] ? default_wake_function+0xb/0xd [<c01403f2>] ? autoremove_wake_function+0xf/0x33 [<c014044c>] ? wake_bit_function+0x36/0x43 [<c0120b54>] ? __wake_up_common+0x35/0x5b [<c016e4dc>] ? mempool_free_slab+0xe/0x10 [<c0124b63>] ? enqueue_entity+0x297/0x29f [<c0124b8a>] ? enqueue_task_fair+0x1f/0x59 [<c012057e>] ? enqueue_task+0x4c/0x58 [<c012120e>] ? resched_task+0x30/0x74 [<c032aa33>] ? do_page_fault+0x0/0xaf3 [<c0329132>] error_code+0x72/0x80 [<c014007b>] ? kthreadd+0xa8/0x156 [<c011a274>] ? __ticket_spin_lock+0x8/0x19 [<c0328e0f>] _spin_lock+0x15/0x18 [<f9298389>] iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] [<f7e13396>] scsi_times_out+0x35/0x61 [scsi_mod] [<c0212539>] blk_rq_timed_out+0xc/0x46 [<c02125f2>] blk_rq_timed_out_timer+0x7f/0xf6 [<c0212573>] ? blk_rq_timed_out_timer+0x0/0xf6 [<c013744c>] run_timer_softirq+0x154/0x1c7 [<c0133ab0>] __do_softirq+0x8d/0x133 [<c0133b9e>] do_softirq+0x48/0x57 [<c0133cad>] irq_exit+0x38/0x6d [<c0112e8d>] smp_apic_timer_interrupt+0x71/0x7f [<c0105f78>] apic_timer_interrupt+0x28/0x30 [<c010a75d>] ? mwait_idle+0x32/0x40 [<c0103c36>] cpu_idle+0x74/0x8e [<c0324407>] start_secondary+0x269/0x26e ---[ end trace d93b1e584c414659 ]--- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2008-12-17 22:38 ` Shi, Harris @ 2008-12-18 9:23 ` Mike Anderson 2008-12-18 22:37 ` Shi, Harris 2009-01-04 17:12 ` Mike Christie 0 siblings, 2 replies; 16+ messages in thread From: Mike Anderson @ 2008-12-18 9:23 UTC (permalink / raw) To: Shi, Harris Cc: Hannes Reinecke, malahal@us.ibm.com, SCSI development list, Mike Christie Shi, Harris <Harris.Shi@lsi.com> wrote: > Information from /var/log/messages: > =================================== > Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] > Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 > Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 > Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) > Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) > > > > Information from Serial output: > =============================== > Oops: 0002 [#1] SMP > last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map > Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 > IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 > *pdpt = 00000000319fe001 *pde = 0000000000000000 > BUG: unable to handle kernel NULL pointer dereference at 00000086 > IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 > *pdpt = 0000000000546001 *pde = 0000000000000000 > ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] > > Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 > EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 > EIP is at __ticket_spin_lock+0x8/0x19 > EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 > ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) > Stack: > f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 > f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 > f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 > Call Trace: > [<c0328e0f>] ? _spin_lock+0x15/0x18 > [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] > [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] > [<c0212539>] ? blk_rq_timed_out+0xc/0x46 I could not match my listing exactly with this output, but it appears that the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line would help verify the iscsi_eh_cmd_timed_out line. I added Mike C to the email cc for possible comments on the error messages displayed above and if that would lead to cleanup of structures referenced in iscsi_eh_cmd_timed_out. Question on the system setup. Are you using mpp in this kernel as I did not see it in the module list? -andmike -- Michael Anderson andmike@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2008-12-18 9:23 ` Mike Anderson @ 2008-12-18 22:37 ` Shi, Harris 2009-01-04 17:12 ` Mike Christie 1 sibling, 0 replies; 16+ messages in thread From: Shi, Harris @ 2008-12-18 22:37 UTC (permalink / raw) To: Mike Anderson Cc: Hannes Reinecke, malahal@us.ibm.com, SCSI development list, Mike Christie Mike, Yes. We do have MPP loaded in SLES11RC1. The mpp modules were cut off by the serial windows. It was actually there. The NULL session was causing the problem after controller was place offline. Actually it only took bit more than 15s after controller offline to hit the panic. Thanks. Harris -----Original Message----- From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com] Sent: Thursday, December 18, 2008 3:24 AM To: Shi, Harris Cc: Hannes Reinecke; malahal@us.ibm.com; SCSI development list; Mike Christie Subject: Re: question on block-layer timeout change Shi, Harris <Harris.Shi@lsi.com> wrote: > Information from /var/log/messages: > =================================== > Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] > Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 > Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 > Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) > Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) > > > > Information from Serial output: > =============================== > Oops: 0002 [#1] SMP > last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map > Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 > IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 > *pdpt = 00000000319fe001 *pde = 0000000000000000 > BUG: unable to handle kernel NULL pointer dereference at 00000086 > IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 > *pdpt = 0000000000546001 *pde = 0000000000000000 > ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] > > Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 > EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 > EIP is at __ticket_spin_lock+0x8/0x19 > EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 > ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) > Stack: > f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 > f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 > f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 > Call Trace: > [<c0328e0f>] ? _spin_lock+0x15/0x18 > [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] > [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] > [<c0212539>] ? blk_rq_timed_out+0xc/0x46 I could not match my listing exactly with this output, but it appears that the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line would help verify the iscsi_eh_cmd_timed_out line. I added Mike C to the email cc for possible comments on the error messages displayed above and if that would lead to cleanup of structures referenced in iscsi_eh_cmd_timed_out. Question on the system setup. Are you using mpp in this kernel as I did not see it in the module list? -andmike -- Michael Anderson andmike@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2008-12-18 9:23 ` Mike Anderson 2008-12-18 22:37 ` Shi, Harris @ 2009-01-04 17:12 ` Mike Christie 2009-01-07 6:37 ` Shi, Harris 1 sibling, 1 reply; 16+ messages in thread From: Mike Christie @ 2009-01-04 17:12 UTC (permalink / raw) To: Mike Anderson Cc: Shi, Harris, Hannes Reinecke, malahal@us.ibm.com, SCSI development list Mike Anderson wrote: > Shi, Harris <Harris.Shi@lsi.com> wrote: >> Information from /var/log/messages: >> =================================== >> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] >> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 >> Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 >> Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) >> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) >> >> >> >> Information from Serial output: >> =============================== >> Oops: 0002 [#1] SMP >> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map >> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 >> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >> *pdpt = 00000000319fe001 *pde = 0000000000000000 >> BUG: unable to handle kernel NULL pointer dereference at 00000086 >> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >> *pdpt = 0000000000546001 *pde = 0000000000000000 >> ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] >> >> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 >> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 >> EIP is at __ticket_spin_lock+0x8/0x19 >> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 >> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 >> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 >> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) >> Stack: >> f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 >> f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 >> f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 >> Call Trace: >> [<c0328e0f>] ? _spin_lock+0x15/0x18 >> [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] >> [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] >> [<c0212539>] ? blk_rq_timed_out+0xc/0x46 > > I could not match my listing exactly with this output, but it appears that > the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line > would help verify the iscsi_eh_cmd_timed_out line. > > I added Mike C to the email cc for possible comments on the error messages > displayed above and if that would lead to cleanup of structures referenced > in iscsi_eh_cmd_timed_out. > Sorry for the late reply. I have been on vacation. The iscsi error message just indicates that the initiator tried to send a iscsi ping and it did not get a response, so the initiator dropped the session. The error was reported as a generic connection error (1011) and when the error was fired the initiator was in the logged in / full feature phase (this basically means normal old use and nothing special). Due to the sles use and MPP driver, I am not sure what exactly is running, but for this code path the iscsi driver does this: cls_session = starget_to_session(scsi_target(scmd->device)); session = cls_session->dd_data; to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels do that has not changed). The session pointers are only changed when a session is destroyed, and that only happens if you do a logout of the session (iscsiadm -m .... -u), and at that time when the session is destroyed we should have flushed all IO. Are you guys doing a logout of the session with iscsiadm at this time? Does MPP clone commands and is it doing something with the command's pointers to the device? Another possibilty is that commands are not getting cleaned up correctly. When you see this "connection2:0: detected conn error (1011)", the driver is going to kill all outstadnding commands and and call scsi_done on them to requeue them with the scsi layer, so we should be getting any commands timed out after you see that message (maybe only in some race case where the session's commands are getting flushed at the exact same time the scsi eh was firing). There was no time stamp on the oops output but I doubt this happened. Did the oops happen after the conn error message though? ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2009-01-04 17:12 ` Mike Christie @ 2009-01-07 6:37 ` Shi, Harris 2009-01-07 20:46 ` Mike Christie 0 siblings, 1 reply; 16+ messages in thread From: Shi, Harris @ 2009-01-07 6:37 UTC (permalink / raw) To: Mike Christie, Mike Anderson Cc: Hannes Reinecke, malahal@us.ibm.com, SCSI development list What we did is to sysReboot one controller while I/O is up and running, we expected to have controller failover and did not explicitly run iscsiadm to logout the session in this case. Yes. We got the oop after conn error message. If controller reboot resulted in session gone, that might explain we had a kernel panic. However, MPP has been running on iscsi array for quite a while on SLES10/REHL distribution, it's the first time we got hit by this kernel panic. That's why we had a some concerns on iscsi stack. -----Original Message----- From: Mike Christie [mailto:michaelc@cs.wisc.edu] Sent: Sunday, January 04, 2009 11:12 AM To: Mike Anderson Cc: Shi, Harris; Hannes Reinecke; malahal@us.ibm.com; SCSI development list Subject: Re: question on block-layer timeout change Mike Anderson wrote: > Shi, Harris <Harris.Shi@lsi.com> wrote: >> Information from /var/log/messages: >> =================================== >> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] >> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 >> Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 >> Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) >> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) >> >> >> >> Information from Serial output: >> =============================== >> Oops: 0002 [#1] SMP >> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map >> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 >> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >> *pdpt = 00000000319fe001 *pde = 0000000000000000 >> BUG: unable to handle kernel NULL pointer dereference at 00000086 >> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >> *pdpt = 0000000000546001 *pde = 0000000000000000 >> ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] >> >> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 >> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 >> EIP is at __ticket_spin_lock+0x8/0x19 >> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 >> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 >> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 >> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) >> Stack: >> f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 >> f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 >> f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 >> Call Trace: >> [<c0328e0f>] ? _spin_lock+0x15/0x18 >> [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] >> [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] >> [<c0212539>] ? blk_rq_timed_out+0xc/0x46 > > I could not match my listing exactly with this output, but it appears that > the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line > would help verify the iscsi_eh_cmd_timed_out line. > > I added Mike C to the email cc for possible comments on the error messages > displayed above and if that would lead to cleanup of structures referenced > in iscsi_eh_cmd_timed_out. > Sorry for the late reply. I have been on vacation. The iscsi error message just indicates that the initiator tried to send a iscsi ping and it did not get a response, so the initiator dropped the session. The error was reported as a generic connection error (1011) and when the error was fired the initiator was in the logged in / full feature phase (this basically means normal old use and nothing special). Due to the sles use and MPP driver, I am not sure what exactly is running, but for this code path the iscsi driver does this: cls_session = starget_to_session(scsi_target(scmd->device)); session = cls_session->dd_data; to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels do that has not changed). The session pointers are only changed when a session is destroyed, and that only happens if you do a logout of the session (iscsiadm -m .... -u), and at that time when the session is destroyed we should have flushed all IO. Are you guys doing a logout of the session with iscsiadm at this time? Does MPP clone commands and is it doing something with the command's pointers to the device? Another possibilty is that commands are not getting cleaned up correctly. When you see this "connection2:0: detected conn error (1011)", the driver is going to kill all outstadnding commands and and call scsi_done on them to requeue them with the scsi layer, so we should be getting any commands timed out after you see that message (maybe only in some race case where the session's commands are getting flushed at the exact same time the scsi eh was firing). There was no time stamp on the oops output but I doubt this happened. Did the oops happen after the conn error message though? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: question on block-layer timeout change 2009-01-07 6:37 ` Shi, Harris @ 2009-01-07 20:46 ` Mike Christie 2009-01-24 16:34 ` Shi, Harris 0 siblings, 1 reply; 16+ messages in thread From: Mike Christie @ 2009-01-07 20:46 UTC (permalink / raw) To: Shi, Harris Cc: Mike Anderson, Hannes Reinecke, malahal@us.ibm.com, SCSI development list Shi, Harris wrote: > What we did is to sysReboot one controller while I/O is up and running, we expected to have controller failover and did not explicitly run iscsiadm to logout the session in this case. Yes. We got the oop after conn error message. If controller reboot resulted in session gone, that might explain we had a kernel panic. However, MPP has been running on iscsi array for quite a while on SLES10/REHL distribution, it's the first time we got hit by this kernel panic. That's why we had a some concerns on iscsi stack. > The failover should not have resulted in the session getting destroyed. It might depend on what is in SLES though. In older tools if the chap values changed (we logged in with one user/pass and the target no wants diff values) or for some errors the initiator thought was fatal. For this though you should have seen a error in logs indicating that we got a fatal login error. What does the portal return when you were rebooting it if the initiator tries to log in? Look at the open-iscsi userspace tools in SLES. Is your target returning a error values that initiator.c:__check_iscsi_status_class() would destroy the session for? Is it now changed upstream? > -----Original Message----- > From: Mike Christie [mailto:michaelc@cs.wisc.edu] > Sent: Sunday, January 04, 2009 11:12 AM > To: Mike Anderson > Cc: Shi, Harris; Hannes Reinecke; malahal@us.ibm.com; SCSI development list > Subject: Re: question on block-layer timeout change > > Mike Anderson wrote: >> Shi, Harris <Harris.Shi@lsi.com> wrote: >>> Information from /var/log/messages: >>> =================================== >>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] >>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 >>> Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 >>> Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) >>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) >>> >>> >>> >>> Information from Serial output: >>> =============================== >>> Oops: 0002 [#1] SMP >>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map >>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 >>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >>> *pdpt = 00000000319fe001 *pde = 0000000000000000 >>> BUG: unable to handle kernel NULL pointer dereference at 00000086 >>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >>> *pdpt = 0000000000546001 *pde = 0000000000000000 >>> ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] >>> >>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 >>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 >>> EIP is at __ticket_spin_lock+0x8/0x19 >>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 >>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 >>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 >>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) >>> Stack: >>> f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 >>> f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 >>> f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 >>> Call Trace: >>> [<c0328e0f>] ? _spin_lock+0x15/0x18 >>> [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] >>> [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] >>> [<c0212539>] ? blk_rq_timed_out+0xc/0x46 >> I could not match my listing exactly with this output, but it appears that >> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line >> would help verify the iscsi_eh_cmd_timed_out line. >> >> I added Mike C to the email cc for possible comments on the error messages >> displayed above and if that would lead to cleanup of structures referenced >> in iscsi_eh_cmd_timed_out. >> > > Sorry for the late reply. I have been on vacation. > > The iscsi error message just indicates that the initiator tried to send > a iscsi ping and it did not get a response, so the initiator dropped the > session. The error was reported as a generic connection error (1011) and > when the error was fired the initiator was in the logged in / full > feature phase (this basically means normal old use and nothing special). > > Due to the sles use and MPP driver, I am not sure what exactly is > running, but for this code path the iscsi driver does this: > > cls_session = starget_to_session(scsi_target(scmd->device)); > session = cls_session->dd_data; > > to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels > do that has not changed). > > The session pointers are only changed when a session is destroyed, and > that only happens if you do a logout of the session (iscsiadm -m .... > -u), and at that time when the session is destroyed we should have > flushed all IO. > > Are you guys doing a logout of the session with iscsiadm at this time? > > Does MPP clone commands and is it doing something with the command's > pointers to the device? > > Another possibilty is that commands are not getting cleaned up > correctly. When you see this "connection2:0: detected conn error > (1011)", the driver is going to kill all outstadnding commands and and > call scsi_done on them to requeue them with the scsi layer, so we should > be getting any commands timed out after you see that message (maybe only > in some race case where the session's commands are getting flushed at > the exact same time the scsi eh was firing). There was no time stamp on > the oops output but I doubt this happened. Did the oops happen after the > conn error message though? ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: question on block-layer timeout change 2009-01-07 20:46 ` Mike Christie @ 2009-01-24 16:34 ` Shi, Harris 0 siblings, 0 replies; 16+ messages in thread From: Shi, Harris @ 2009-01-24 16:34 UTC (permalink / raw) To: Mike Christie Cc: Mike Anderson, Hannes Reinecke, malahal@us.ibm.com, SCSI development list [-- Attachment #1: Type: text/plain, Size: 6822 bytes --] Christie, Sorry for the late response. I was off-track by other issues for a while. On controller sysReboot, it actually did not send any asynchronous event/message for iscsid to destroy the session. It simply went offline for a while and then back online. Now I was able to enable iscsid debug trace and capture a log for your reference. My serial console was disconnected during the weekend for some reasons, whenever it's back online, I might try some more cases. Thanks. Harris -----Original Message----- From: Mike Christie [mailto:michaelc@cs.wisc.edu] Sent: Wednesday, January 07, 2009 2:46 PM To: Shi, Harris Cc: Mike Anderson; Hannes Reinecke; malahal@us.ibm.com; SCSI development list Subject: Re: question on block-layer timeout change Shi, Harris wrote: > What we did is to sysReboot one controller while I/O is up and running, we expected to have controller failover and did not explicitly run iscsiadm to logout the session in this case. Yes. We got the oop after conn error message. If controller reboot resulted in session gone, that might explain we had a kernel panic. However, MPP has been running on iscsi array for quite a while on SLES10/REHL distribution, it's the first time we got hit by this kernel panic. That's why we had a some concerns on iscsi stack. > The failover should not have resulted in the session getting destroyed. It might depend on what is in SLES though. In older tools if the chap values changed (we logged in with one user/pass and the target no wants diff values) or for some errors the initiator thought was fatal. For this though you should have seen a error in logs indicating that we got a fatal login error. What does the portal return when you were rebooting it if the initiator tries to log in? Look at the open-iscsi userspace tools in SLES. Is your target returning a error values that initiator.c:__check_iscsi_status_class() would destroy the session for? Is it now changed upstream? > -----Original Message----- > From: Mike Christie [mailto:michaelc@cs.wisc.edu] > Sent: Sunday, January 04, 2009 11:12 AM > To: Mike Anderson > Cc: Shi, Harris; Hannes Reinecke; malahal@us.ibm.com; SCSI development list > Subject: Re: question on block-layer timeout change > > Mike Anderson wrote: >> Shi, Harris <Harris.Shi@lsi.com> wrote: >>> Information from /var/log/messages: >>> =================================== >>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current] >>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1 >>> Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237 >>> Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011) >>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) >>> >>> >>> >>> Information from Serial output: >>> =============================== >>> Oops: 0002 [#1] SMP >>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map >>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6 >>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >>> *pdpt = 00000000319fe001 *pde = 0000000000000000 >>> BUG: unable to handle kernel NULL pointer dereference at 00000086 >>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19 >>> *pdpt = 0000000000546001 *pde = 0000000000000000 >>> ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ] >>> >>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850 >>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3 >>> EIP is at __ticket_spin_lock+0x8/0x19 >>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100 >>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78 >>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 >>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000) >>> Stack: >>> f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000 >>> f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2 >>> f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08 >>> Call Trace: >>> [<c0328e0f>] ? _spin_lock+0x15/0x18 >>> [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi] >>> [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod] >>> [<c0212539>] ? blk_rq_timed_out+0xc/0x46 >> I could not match my listing exactly with this output, but it appears that >> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line >> would help verify the iscsi_eh_cmd_timed_out line. >> >> I added Mike C to the email cc for possible comments on the error messages >> displayed above and if that would lead to cleanup of structures referenced >> in iscsi_eh_cmd_timed_out. >> > > Sorry for the late reply. I have been on vacation. > > The iscsi error message just indicates that the initiator tried to send > a iscsi ping and it did not get a response, so the initiator dropped the > session. The error was reported as a generic connection error (1011) and > when the error was fired the initiator was in the logged in / full > feature phase (this basically means normal old use and nothing special). > > Due to the sles use and MPP driver, I am not sure what exactly is > running, but for this code path the iscsi driver does this: > > cls_session = starget_to_session(scsi_target(scmd->device)); > session = cls_session->dd_data; > > to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels > do that has not changed). > > The session pointers are only changed when a session is destroyed, and > that only happens if you do a logout of the session (iscsiadm -m .... > -u), and at that time when the session is destroyed we should have > flushed all IO. > > Are you guys doing a logout of the session with iscsiadm at this time? > > Does MPP clone commands and is it doing something with the command's > pointers to the device? > > Another possibilty is that commands are not getting cleaned up > correctly. When you see this "connection2:0: detected conn error > (1011)", the driver is going to kill all outstadnding commands and and > call scsi_done on them to requeue them with the scsi layer, so we should > be getting any commands timed out after you see that message (maybe only > in some race case where the session's commands are getting flushed at > the exact same time the scsi eh was firing). There was no time stamp on > the oops output but I doubt this happened. Did the oops happen after the > conn error message though? [-- Attachment #2: iscsid_debug_on.txt --] [-- Type: text/plain, Size: 6791 bytes --] =~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2009.01.24 09:37:15 =~=~=~=~=~=~=~=~=~=~=~= iscsid: poll result 1 iscsid: in ctldev_handle iscsid: in nl_read iscsid: ctldev_handle got event type 102 iscsid: get conn context 0x6d7b30 iscsid: message real length is 72 bytes, recv_handle 0x6d9430 iscsid: in nlpayload_read iscsid: sched conn context 0x6d7b30 event 3, tmo 0 iscsid: thread 0x6d7b30 schedule: delay 0 state 3 iscsid: exec thread 006d7b30 callback iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3) iscsid: put conn context 0x6d7b30 iscsid: re-opening session 1 (reopen_cnt 0) iscsid: thread 006d59d0 delete: state 3 iscsid: thread 006d5a08 delete: state 3 iscsid: disconnecting conn 0x6d3508, fd 5 iscsid: in kstop_conn iscsid: in __kipc_call iscsid: in kwritev BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba IP: [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b PGD 23dc067 PUD 238c067 PMD 0 Oops: 0002 [1] SMP last sysfs file: /sys/devices/mppvirtual/mppadapter0/host6/target6:0:0/6:0:0:8/block/sdj/dev CPU 0 Modules linked in: crc32c libcrc32c radeon drm iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc microcode ipv6 fuse loop dm_mod dcdbas(X) rtc_cmos rtc_core rtc_lib e1000 video mptctl output floppy iTCO_wdt iTCO_vendor_support e752x_edac button pcspkr edac_core shpchp sr_mod pci_hotplug cdrom uhci_hcd ehci_hcd usbcore mppVhba edd ext3 mbcache jbd fan ide_pci_generic piix ide_core ata_generic ata_piix libata dock mptspi mptscsih mptbase scsi_transport_spi thermal processor thermal_sys hwmon mppUpper sg sd_mod crc_t10dif scsi_mod Supported: Yes, External Pid: 8, comm: kblockd/0 Tainted: G 2.6.27.11-1-default #1 RIP: 0010:[<ffffffff8049bd5f>] [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b RSP: 0018:ffffffff80a36e38 EFLAGS: 00010082 RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff8800070b2a38 RDX: ffff880002c16000 RSI: 0000000000002007 RDI: 00000000000000ba RBP: ffff88000472a240 R08: ffff88000472a5a0 R09: 0000000000000000 R10: 0000000000000000 R11: ffff8800010dc228 R12: 0000000000000000 R13: ffff880005482dc8 R14: ffffffff80a36ed0 R15: ffff88000472a5b8 FS: 0000000000000000(0000) GS:ffffffff808cac80(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000000000ba CR3: 00000000023d4000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kblockd/0 (pid: 8, threadinfo ffff880008628000, task ffff880008626200) Stack: ffffffffa037172f ffff8800068e5380 ffffffffa0004f99 ffff8800070b28f8 ffffffff8033e773 ffff8800070b28f8 ffffffff8033e91a 0000000000000282 ffff88000472a240 0000000000000100 ffffffff80ab7680 ffffffff8033e887 Call Trace: [<ffffffffa037172f>] iscsi_eh_cmd_timed_out+0x22/0xa3 [libiscsi] [<ffffffffa0004f99>] scsi_times_out+0x46/0x72 [scsi_mod] [<ffffffff8033e773>] blk_rq_timed_out+0xe/0x4a [<ffffffff8033e91a>] blk_rq_timed_out_timer+0x93/0x119 [<ffffffff802465f0>] run_timer_softirq+0x18d/0x204 [<ffffffff80242b4d>] __do_softirq+0x7d/0x107 [<ffffffff8020d2dc>] call_softirq+0x1c/0x28 [<ffffffff8020e583>] do_softirq+0x2c/0x68 [<ffffffff8024287a>] irq_exit+0x3f/0x85 [<ffffffff8021b4f8>] smp_apic_timer_interrupt+0x95/0xae [<ffffffff8020ccb6>] apic_timer_interrupt+0x66/0x70 [<ffffffff8049bd99>] _spin_unlock_irqrestore+0x5/0x6 [<ffffffffa0006ed1>] scsi_queue_insert+0x94/0xda [scsi_mod] [<ffffffffa000090a>] scsi_dispatch_cmd+0x1ec/0x235 [scsi_mod] [<ffffffffa0005fda>] scsi_request_fn+0x391/0x4d3 [scsi_mod] [<ffffffff8033ad53>] generic_unplug_device+0x21/0x2e [<ffffffff80339b39>] blk_unplug_work+0x5e/0x65 [<ffffffff8024cdf9>] run_workqueue+0x7a/0x100 [<ffffffff8024cf57>] worker_thread+0xd8/0xe7 [<ffffffff8024ff67>] kthread+0x47/0x73 [<ffffffff8020cf79>] child_rip+0xa/0x11 Code: a2 ec ff c3 9c 58 fa 90 81 2f 00 00 00 01 74 05 e8 a7 a2 ec ff c3 fa 90 81 2f 00 00 00 01 74 05 e8 97 a2 ec ff c3 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5 RIP [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b RSP <ffffffff80a36e38> CR2: 00000000000000ba ---[ end trace 1330da6161ef128a ]--- Kernel panic - not syncing: Aiee, killing interrupt handler! ------------[ cut here ]------------ WARNING: at kernel/smp.c:331 smp_call_function_mask+0x32/0x1ec() Modules linked in: crc32c libcrc32c radeon drm iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc microcode ipv6 fuse loop dm_mod dcdbas(X) rtc_cmos rtc_core rtc_lib e1000 video mptctl output floppy iTCO_wdt iTCO_vendor_support e752x_edac button pcspkr edac_core shpchp sr_mod pci_hotplug cdrom uhci_hcd ehci_hcd usbcore mppVhba edd ext3 mbcache jbd fan ide_pci_generic piix ide_core ata_generic ata_piix libata dock mptspi mptscsih mptbase scsi_transport_spi thermal processor thermal_sys hwmon mppUpper sg sd_mod crc_t10dif scsi_mod Supported: Yes, External Pid: 8, comm: kblockd/0 Tainted: G D 2.6.27.11-1-default #1 Call Trace: [<ffffffff8020da29>] show_trace_log_lvl+0x41/0x58 [<ffffffff80499eaf>] dump_stack+0x69/0x6f [<ffffffff8023d9b2>] warn_on_slowpath+0x51/0x77 [<ffffffff8025d609>] smp_call_function_mask+0x32/0x1ec [<ffffffff8025d7ec>] smp_call_function+0x29/0x2e [<ffffffff8021a34a>] native_smp_send_stop+0x1a/0x26 [<ffffffff80499f71>] panic+0xbc/0x169 [<ffffffff80241256>] do_exit+0x67/0x325 [<ffffffff8049c6dc>] oops_begin+0x0/0x90 [<ffffffff8049e42e>] do_page_fault+0x5fa/0x6ca [<ffffffff8049bfb9>] error_exit+0x0/0x51 [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b [<ffffffffa037172f>] iscsi_eh_cmd_timed_out+0x22/0xa3 [libiscsi] [<ffffffffa0004f99>] scsi_times_out+0x46/0x72 [scsi_mod] [<ffffffff8033e773>] blk_rq_timed_out+0xe/0x4a [<ffffffff8033e91a>] blk_rq_timed_out_timer+0x93/0x119 [<ffffffff802465f0>] run_timer_softirq+0x18d/0x204 [<ffffffff80242b4d>] __do_softirq+0x7d/0x107 [<ffffffff8020d2dc>] call_softirq+0x1c/0x28 [<ffffffff8020e583>] do_softirq+0x2c/0x68 [<ffffffff8024287a>] irq_exit+0x3f/0x85 [<ffffffff8021b4f8>] smp_apic_timer_interrupt+0x95/0xae [<ffffffff8020ccb6>] apic_timer_interrupt+0x66/0x70 [<ffffffff8049bd99>] _spin_unlock_irqrestore+0x5/0x6 [<ffffffffa0006ed1>] scsi_queue_insert+0x94/0xda [scsi_mod] [<ffffffffa000090a>] scsi_dispatch_cmd+0x1ec/0x235 [scsi_mod] [<ffffffffa0005fda>] scsi_request_fn+0x391/0x4d3 [scsi_mod] [<ffffffff8033ad53>] generic_unplug_device+0x21/0x2e [<ffffffff80339b39>] blk_unplug_work+0x5e/0x65 [<ffffffff8024cdf9>] run_workqueue+0x7a/0x100 [<ffffffff8024cf57>] worker_thread+0xd8/0xe7 [<ffffffff8024ff67>] kthread+0x47/0x73 [<ffffffff8020cf79>] child_rip+0xa/0x11 ---[ end trace 1330da6161ef128a ]--- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Question on block-layer timeout change @ 2008-11-11 16:26 Shi, Harris 0 siblings, 0 replies; 16+ messages in thread From: Shi, Harris @ 2008-11-11 16:26 UTC (permalink / raw) To: linux-scsi@vger.kernel.org Cc: Jens Axboe, Alan Stern, Tejun Heo, FUJITA Tomonori Due to the current timeout management change, LSI RDAC (failover) driver had some difficulties in handling SCSI I/O timeout. The RDAC driver is in the similar layer as HBA driver in that it will register into scsi mid-layer as scsi_host_template and stays below mid-layer. All SCSI I/Os coming to RDAC stack are routed by an available path then dispatched to the real HBA driver via mid-layer. We used to rely on the timer in scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. Basically when I/O is coming to RDAC stack, we need to delete the timer for each individual I/O. Then after selecting a specific path for this I/O, we need to send the I/O back to mid-layer with a larger timeout value just to avoid the forced failover. When I/O completes successfully, we added the ori ginal timer back to the I/O and pass it over to upper block layer for further process. However, with the current timeout management functions moving to block layer, it became difficult for us to explicitly control the timeout value for specific SCSI I/O, which ended up causing many issues on failover. Can you shed some lights on how to handle the I/O based timeout in this case? Harris ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2009-01-24 16:34 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <3568BBCB98C00041A9E622952FD5F24EA11C9F3A@cosmail03.lsi.com>
2008-11-12 7:29 ` question on block-layer timeout change Mike Anderson
2008-11-12 17:16 ` malahal
2008-11-14 8:51 ` Shi, Harris
2008-11-14 17:18 ` malahal
2008-12-10 23:11 ` Shi, Harris
2008-12-11 11:03 ` Hannes Reinecke
2008-12-16 16:55 ` Shi, Harris
2008-12-17 7:33 ` Hannes Reinecke
2008-12-17 22:38 ` Shi, Harris
2008-12-18 9:23 ` Mike Anderson
2008-12-18 22:37 ` Shi, Harris
2009-01-04 17:12 ` Mike Christie
2009-01-07 6:37 ` Shi, Harris
2009-01-07 20:46 ` Mike Christie
2009-01-24 16:34 ` Shi, Harris
2008-11-11 16:26 Question " Shi, Harris
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).