Re: question on block-layer timeout change

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: question on block-layer timeout change
       [not found] <3568BBCB98C00041A9E622952FD5F24EA11C9F3A@cosmail03.lsi.com>
@ 2008-11-12  7:29 ` Mike Anderson
  2008-11-12 17:16   ` malahal
  2008-11-14  8:51   ` Shi, Harris
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Anderson @ 2008-11-12  7:29 UTC (permalink / raw)
  To: Shi, Harris; +Cc: Jens Axboe, Alan Stern, Tejun Heo, SCSI development list

Shi, Harris <Harris.Shi@lsi.com> wrote:
>    Due to the current timeout management change, our RDAC (failover) driver
>    had some difficulties in handling SCSI I/O timeout. The RDAC driver is in
>    the similar        layer as HBA driver in that it will register into scsi
>    mid-layer as scsi_host_template and stays below mid-layer. However, all
>    scsi I/Os coming to RDAC stack are routed by a path then dispatched to the
>    real HBA driver via mid-layer. We used to rely on the timer in
>    scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack.
>    Basically when I/O is coming to RDAC stack, we need to delete the timer
>    for each I/O. Then after selecting a specific path for this I/O, we need
>    to send the I/O back to mid-layer with a larger timeout value just to
>    avoid the forced failover. When I/O completes successfully, we added the
>    original timer back to the I/O and pass it over to upper block layer for
>    further process.
> 
> 
> 
>    However, with the current timeout management functions moving to block
>    layer, it became difficult for us to explicitly control the timeout value
>    for specific I/O.
> 
>    Can you shed some lights on how to handle the I/O based timeout in this
>    case?
> 

Since long term mpp capabilities should be handled by dm-mp and the SCSI
RDAC handler exporting functions to allow direct adding and deleting of the
timer may not be something that would be needed long term. It may not be
really clean to add these interfaces in.

Could similar prior functionality be created by the following?
	1.) To the RDAC vhba add a hostt->eh_timed_out function. In this
	timeout function return BLK_EH_RESET_TIMER until it is done with
	the command. Since the vhba does not have a transport
	scsi_times_out should call this function on every timeout. There is
	some overhead here depending on the default timeout value set in
	timing out and then resetting the timer.

	2.) For each sdev that is taken over store the previous
	rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp
	specific function for the requests sent to the real HBA drivers.

	3.) Set the timeout in the real HBA driver requests prior to
	sending it to the mid layer.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2008-11-12  7:29 ` question on block-layer timeout change Mike Anderson
@ 2008-11-12 17:16   ` malahal
  2008-11-14  8:51   ` Shi, Harris
  1 sibling, 0 replies; 16+ messages in thread
From: malahal @ 2008-11-12 17:16 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Shi, Harris, Jens Axboe, Alan Stern, Tejun Heo,
	SCSI development list

Mike Anderson [andmike@linux.vnet.ibm.com] wrote:
> Shi, Harris <Harris.Shi@lsi.com> wrote:
> >    Due to the current timeout management change, our RDAC (failover) driver
> >    had some difficulties in handling SCSI I/O timeout. The RDAC driver is in
> >    the similar        layer as HBA driver in that it will register into scsi
> >    mid-layer as scsi_host_template and stays below mid-layer. However, all
> >    scsi I/Os coming to RDAC stack are routed by a path then dispatched to the
> >    real HBA driver via mid-layer. We used to rely on the timer in
> >    scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack.
> >    Basically when I/O is coming to RDAC stack, we need to delete the timer
> >    for each I/O. Then after selecting a specific path for this I/O, we need
> >    to send the I/O back to mid-layer with a larger timeout value just to
> >    avoid the forced failover. When I/O completes successfully, we added the
> >    original timer back to the I/O and pass it over to upper block layer for
> >    further process.
> > 
> > 
> > 
> >    However, with the current timeout management functions moving to block
> >    layer, it became difficult for us to explicitly control the timeout value
> >    for specific I/O.
> > 
> >    Can you shed some lights on how to handle the I/O based timeout in this
> >    case?
> > 
> 
> Since long term mpp capabilities should be handled by dm-mp and the SCSI
> RDAC handler exporting functions to allow direct adding and deleting of the
> timer may not be something that would be needed long term. It may not be
> really clean to add these interfaces in.
> 
> Could similar prior functionality be created by the following?
> 	1.) To the RDAC vhba add a hostt->eh_timed_out function. In this
> 	timeout function return BLK_EH_RESET_TIMER until it is done with
> 	the command. Since the vhba does not have a transport
> 	scsi_times_out should call this function on every timeout. There is
> 	some overhead here depending on the default timeout value set in
> 	timing out and then resetting the timer.

I believe, GDTG driver does this sort of thing for a different purpose.

> 	2.) For each sdev that is taken over store the previous
> 	rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp
> 	specific function for the requests sent to the real HBA drivers.
> 
> 	3.) Set the timeout in the real HBA driver requests prior to
> 	sending it to the mid layer.

Should work, I guess...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2008-11-12  7:29 ` question on block-layer timeout change Mike Anderson
  2008-11-12 17:16   ` malahal
@ 2008-11-14  8:51   ` Shi, Harris
  2008-11-14 17:18     ` malahal
  1 sibling, 1 reply; 16+ messages in thread
From: Shi, Harris @ 2008-11-14  8:51 UTC (permalink / raw)
  To: Mike Anderson; +Cc: malahal@us.ibm.com, SCSI development list

Mike,

Thanks for your valuable input.

For item 1, how can I make sure that the timed-out command will have the timer modified via blk_add_timer given that one of following conditions has to be met,
* timer isn't already pending or
* this timeout value is earlier than an existing one.
Also where do I need to have a retry after reset the timer?

For item 2, rq_timed_out_fn is tied with scsi_times_out at the very beginning. What's the purpose to tie a specific mpp method? How do we handle the case if timeout is triggered at this time?

Harris

-----Original Message-----
From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com]
Sent: Wednesday, November 12, 2008 1:29 AM
To: Shi, Harris
Cc: Jens Axboe; Alan Stern; Tejun Heo; SCSI development list
Subject: Re: question on block-layer timeout change

Shi, Harris <Harris.Shi@lsi.com> wrote:
>    Due to the current timeout management change, our RDAC (failover) driver
>    had some difficulties in handling SCSI I/O timeout. The RDAC driver is in
>    the similar        layer as HBA driver in that it will register into scsi
>    mid-layer as scsi_host_template and stays below mid-layer. However, all
>    scsi I/Os coming to RDAC stack are routed by a path then dispatched to the
>    real HBA driver via mid-layer. We used to rely on the timer in
>    scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack.
>    Basically when I/O is coming to RDAC stack, we need to delete the timer
>    for each I/O. Then after selecting a specific path for this I/O, we need
>    to send the I/O back to mid-layer with a larger timeout value just to
>    avoid the forced failover. When I/O completes successfully, we added the
>    original timer back to the I/O and pass it over to upper block layer for
>    further process.
>
>
>
>    However, with the current timeout management functions moving to block
>    layer, it became difficult for us to explicitly control the timeout value
>    for specific I/O.
>
>    Can you shed some lights on how to handle the I/O based timeout in this
>    case?
>

Since long term mpp capabilities should be handled by dm-mp and the SCSI
RDAC handler exporting functions to allow direct adding and deleting of the
timer may not be something that would be needed long term. It may not be
really clean to add these interfaces in.

Could similar prior functionality be created by the following?
        1.) To the RDAC vhba add a hostt->eh_timed_out function. In this
        timeout function return BLK_EH_RESET_TIMER until it is done with
        the command. Since the vhba does not have a transport
        scsi_times_out should call this function on every timeout. There is
        some overhead here depending on the default timeout value set in
        timing out and then resetting the timer.

        2.) For each sdev that is taken over store the previous
        rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp
        specific function for the requests sent to the real HBA drivers.

        3.) Set the timeout in the real HBA driver requests prior to
        sending it to the mid layer.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2008-11-14  8:51   ` Shi, Harris
@ 2008-11-14 17:18     ` malahal
  2008-12-10 23:11       ` Shi, Harris
  0 siblings, 1 reply; 16+ messages in thread
From: malahal @ 2008-11-14 17:18 UTC (permalink / raw)
  To: Shi, Harris; +Cc: Mike Anderson, SCSI development list

Shi, Harris [Harris.Shi@lsi.com] wrote:
> Mike,
> 
> Thanks for your valuable input.
> 
> For item 1, how can I make sure that the timed-out command will have
> the timer modified via blk_add_timer given that one of following
> conditions has to be met,
>
> * timer isn't already pending or

I don't completely understand the RDAC architecture, but here are my
comments bases on what I read from your earlier email.

When your timed_out function is called, the timer is already fired

In your case, you should always return 'BLK_EH_RESET_TIMER'. This is
just to make sure that the command doesn't fail before you resubmit the
request to the real HBA adapter, I think.

> * this timeout value is earlier than an existing one.

When you return BLK_EH_RESET_TIMER, the block layer puts the command
again in its timeout queue and waits for another 'timeout' value before
calling your timer again.

You can send the request to real HBA divers after timeout expiry. If you
really want to fail, you can return some other value...

> Also where do I need to have a retry after reset the timer?
There is no retry of the command. If you return 'BLK_EH_RESET_TIMER' the
command is still with your driver and you can take another timeout value
to finish the request.

> For item 2, rq_timed_out_fn is tied with scsi_times_out at the very beginning. What's the purpose to tie a specific mpp method? How do we handle the case if timeout is triggered at this time?
> 

When you send the request to the real HBA, the request timeout value
doesn't change. So Mike's suggestion is to have your own timed_out_fun
that returns BLK_EH_RESET_TIMER few times (effectively hijacking the
timeout value). See gdth_timed_out() for this case.

--Malahal.

> Harris
> 
> -----Original Message-----
> From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com]
> Sent: Wednesday, November 12, 2008 1:29 AM
> To: Shi, Harris
> Cc: Jens Axboe; Alan Stern; Tejun Heo; SCSI development list
> Subject: Re: question on block-layer timeout change
> 
> Shi, Harris <Harris.Shi@lsi.com> wrote:
> >    Due to the current timeout management change, our RDAC (failover) driver
> >    had some difficulties in handling SCSI I/O timeout. The RDAC driver is in
> >    the similar        layer as HBA driver in that it will register into scsi
> >    mid-layer as scsi_host_template and stays below mid-layer. However, all
> >    scsi I/Os coming to RDAC stack are routed by a path then dispatched to the
> >    real HBA driver via mid-layer. We used to rely on the timer in
> >    scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack.
> >    Basically when I/O is coming to RDAC stack, we need to delete the timer
> >    for each I/O. Then after selecting a specific path for this I/O, we need
> >    to send the I/O back to mid-layer with a larger timeout value just to
> >    avoid the forced failover. When I/O completes successfully, we added the
> >    original timer back to the I/O and pass it over to upper block layer for
> >    further process.
> >
> >
> >
> >    However, with the current timeout management functions moving to block
> >    layer, it became difficult for us to explicitly control the timeout value
> >    for specific I/O.
> >
> >    Can you shed some lights on how to handle the I/O based timeout in this
> >    case?
> >
> 
> Since long term mpp capabilities should be handled by dm-mp and the SCSI
> RDAC handler exporting functions to allow direct adding and deleting of the
> timer may not be something that would be needed long term. It may not be
> really clean to add these interfaces in.
> 
> Could similar prior functionality be created by the following?
>         1.) To the RDAC vhba add a hostt->eh_timed_out function. In this
>         timeout function return BLK_EH_RESET_TIMER until it is done with
>         the command. Since the vhba does not have a transport
>         scsi_times_out should call this function on every timeout. There is
>         some overhead here depending on the default timeout value set in
>         timing out and then resetting the timer.
> 
>         2.) For each sdev that is taken over store the previous
>         rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp
>         specific function for the requests sent to the real HBA drivers.
> 
>         3.) Set the timeout in the real HBA driver requests prior to
>         sending it to the mid layer.
> 
> -andmike
> --
> Michael Anderson
> andmike@linux.vnet.ibm.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2008-11-14 17:18     ` malahal
@ 2008-12-10 23:11       ` Shi, Harris
  2008-12-11 11:03         ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Shi, Harris @ 2008-12-10 23:11 UTC (permalink / raw)
  To: malahal@us.ibm.com; +Cc: Mike Anderson, SCSI development list

Mike,

Your suggestion on MPP driver is working pretty good on FC config in term of failover and failback. However, recently when we switched over to iscsi config on SLES11beta6 (2.6.27.7-4-default, SLES11 kernel did not sync up with current upstream one, all of timeout management patch has been pulled in), we were consistently hit by the following panic when we tried to do failover via controller sysReboot or placed offline. Is it something related to the recent timeout management patch introduced into the kernel?

BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba
IP: [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
PGD 196cf6067 PUD 196c4f067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
CPU 2
Modules linked in: radeon drm crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core d
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G          2.6.27.7-4-default #1
RIP: 0010:[<ffffffff80222047>]  [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
RSP: 0018:ffff88019f187e20  EFLAGS: 00010086
RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff88019d8c3218
RDX: ffff88019cd3d000 RSI: 0000000000002007 RDI: 00000000000000ba
RBP: ffff880194940918 R08: ffff880194940c78 R09: 0000000000000000
R10: ffffffff80a65b80 R11: ffffffff8021c6ed R12: 0000000000000000
R13: ffff880194940b50 R14: ffff88019f187ed0 R15: ffff880194940c90
FS:  0000000000000000(0000) GS:ffff88019f157ec0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ba CR3: 0000000196cc7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88019f180000, task ffff88019f17e280)
Stack:  ffffffff804aabe2 ffffffffa0321734 0000000000000000 ffff88018fc97c80
 ffff880194940918 ffffffffa0004f35 ffff88019d8c30d8 ffffffff8034b954
 ffff88019d8c30d8 ffffffff8034ba23 0000000000000282 0000000000000100
Call Trace:
 [<ffffffff804aabe2>] _spin_lock+0x13/0x15
 [<ffffffffa0321734>] iscsi_eh_cmd_timed_out+0x27/0xc0 [libiscsi]
 [<ffffffffa0004f35>] scsi_times_out+0x46/0x72 [scsi_mod]
 [<ffffffff8034b954>] blk_rq_timed_out+0xe/0x4a
 [<ffffffff8034ba23>] blk_rq_timed_out_timer+0x93/0x116
 [<ffffffff8024a5f5>] run_timer_softirq+0x19a/0x228
 [<ffffffff8024696d>] __do_softirq+0x84/0x115
 [<ffffffff8020ddac>] call_softirq+0x1c/0x28
 [<ffffffff8020f177>] do_softirq+0x3c/0x81
 [<ffffffff80246684>] irq_exit+0x3f/0x83
 [<ffffffff8021cf73>] smp_apic_timer_interrupt+0x95/0xae
 [<ffffffff8020d523>] apic_timer_interrupt+0x83/0x90
 [<ffffffff802134f4>] mwait_idle+0x3c/0x46
 [<ffffffff8020b3b5>] cpu_idle+0xa9/0xf1


Code: ff 00 00 c1 ea 10 39 c2 0f 95 c0 0f b6 c0 c3 8b 17 89 d0 c1 f8 10 29 d0 25 ff ff 00 00 ff
RIP  [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
 RSP <ffff88019f187e20>
----------------------------------------------------------------------------

Your comment is very much appreciated.

Thanks.
Harris

-----Original Message-----
From: malahal@us.ibm.com [mailto:malahal@us.ibm.com]
Sent: Friday, November 14, 2008 11:18 AM
To: Shi, Harris
Cc: Mike Anderson; SCSI development list
Subject: Re: question on block-layer timeout change

Shi, Harris [Harris.Shi@lsi.com] wrote:
> Mike,
>
> Thanks for your valuable input.
>
> For item 1, how can I make sure that the timed-out command will have
> the timer modified via blk_add_timer given that one of following
> conditions has to be met,
>
> * timer isn't already pending or

I don't completely understand the RDAC architecture, but here are my
comments bases on what I read from your earlier email.

When your timed_out function is called, the timer is already fired

In your case, you should always return 'BLK_EH_RESET_TIMER'. This is
just to make sure that the command doesn't fail before you resubmit the
request to the real HBA adapter, I think.

> * this timeout value is earlier than an existing one.

When you return BLK_EH_RESET_TIMER, the block layer puts the command
again in its timeout queue and waits for another 'timeout' value before
calling your timer again.

You can send the request to real HBA divers after timeout expiry. If you
really want to fail, you can return some other value...

> Also where do I need to have a retry after reset the timer?
There is no retry of the command. If you return 'BLK_EH_RESET_TIMER' the
command is still with your driver and you can take another timeout value
to finish the request.

> For item 2, rq_timed_out_fn is tied with scsi_times_out at the very beginning. What's the purpose to tie a specific mpp method? How do we handle the case if timeout is triggered at this time?
>

When you send the request to the real HBA, the request timeout value
doesn't change. So Mike's suggestion is to have your own timed_out_fun
that returns BLK_EH_RESET_TIMER few times (effectively hijacking the
timeout value). See gdth_timed_out() for this case.

--Malahal.

> Harris
>
> -----Original Message-----
> From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com]
> Sent: Wednesday, November 12, 2008 1:29 AM
> To: Shi, Harris
> Cc: Jens Axboe; Alan Stern; Tejun Heo; SCSI development list
> Subject: Re: question on block-layer timeout change
>
> Shi, Harris <Harris.Shi@lsi.com> wrote:
> >    Due to the current timeout management change, our RDAC (failover) driver
> >    had some difficulties in handling SCSI I/O timeout. The RDAC driver is in
> >    the similar        layer as HBA driver in that it will register into scsi
> >    mid-layer as scsi_host_template and stays below mid-layer. However, all
> >    scsi I/Os coming to RDAC stack are routed by a path then dispatched to the
> >    real HBA driver via mid-layer. We used to rely on the timer in
> >    scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack.
> >    Basically when I/O is coming to RDAC stack, we need to delete the timer
> >    for each I/O. Then after selecting a specific path for this I/O, we need
> >    to send the I/O back to mid-layer with a larger timeout value just to
> >    avoid the forced failover. When I/O completes successfully, we added the
> >    original timer back to the I/O and pass it over to upper block layer for
> >    further process.
> >
> >
> >
> >    However, with the current timeout management functions moving to block
> >    layer, it became difficult for us to explicitly control the timeout value
> >    for specific I/O.
> >
> >    Can you shed some lights on how to handle the I/O based timeout in this
> >    case?
> >
>
> Since long term mpp capabilities should be handled by dm-mp and the SCSI
> RDAC handler exporting functions to allow direct adding and deleting of the
> timer may not be something that would be needed long term. It may not be
> really clean to add these interfaces in.
>
> Could similar prior functionality be created by the following?
>         1.) To the RDAC vhba add a hostt->eh_timed_out function. In this
>         timeout function return BLK_EH_RESET_TIMER until it is done with
>         the command. Since the vhba does not have a transport
>         scsi_times_out should call this function on every timeout. There is
>         some overhead here depending on the default timeout value set in
>         timing out and then resetting the timer.
>
>         2.) For each sdev that is taken over store the previous
>         rq_timed_out_fn and then use blk_queue_rq_timed_out to set a mpp
>         specific function for the requests sent to the real HBA drivers.
>
>         3.) Set the timeout in the real HBA driver requests prior to
>         sending it to the mid layer.
>
> -andmike
> --
> Michael Anderson
> andmike@linux.vnet.ibm.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2008-12-10 23:11       ` Shi, Harris
@ 2008-12-11 11:03         ` Hannes Reinecke
  2008-12-16 16:55           ` Shi, Harris
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2008-12-11 11:03 UTC (permalink / raw)
  To: Shi, Harris; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list

Hi Harris,

Shi, Harris wrote:
> Mike,
> 
> Your suggestion on MPP driver is working pretty good on FC config in term of failover and failback. However, recently when we switched over to iscsi config on SLES11beta6 (2.6.27.7-4-default, SLES11 kernel did not sync up with current upstream one, all of timeout management patch has been pulled in), we were consistently hit by the following panic when we tried to do failover via controller sysReboot or placed offline. Is it something related to the recent timeout management patch introduced into the kernel?
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba
> IP: [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
> PGD 196cf6067 PUD 196c4f067 PMD 0
> Oops: 0002 [1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> CPU 2
> Modules linked in: radeon drm crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core d
> Supported: Yes, External
> Pid: 0, comm: swapper Tainted: G          2.6.27.7-4-default #1
> RIP: 0010:[<ffffffff80222047>]  [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
> RSP: 0018:ffff88019f187e20  EFLAGS: 00010086
> RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff88019d8c3218
> RDX: ffff88019cd3d000 RSI: 0000000000002007 RDI: 00000000000000ba
> RBP: ffff880194940918 R08: ffff880194940c78 R09: 0000000000000000
> R10: ffffffff80a65b80 R11: ffffffff8021c6ed R12: 0000000000000000
> R13: ffff880194940b50 R14: ffff88019f187ed0 R15: ffff880194940c90
> FS:  0000000000000000(0000) GS:ffff88019f157ec0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000000000ba CR3: 0000000196cc7000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff88019f180000, task ffff88019f17e280)
> Stack:  ffffffff804aabe2 ffffffffa0321734 0000000000000000 ffff88018fc97c80
>  ffff880194940918 ffffffffa0004f35 ffff88019d8c30d8 ffffffff8034b954
>  ffff88019d8c30d8 ffffffff8034ba23 0000000000000282 0000000000000100
> Call Trace:
>  [<ffffffff804aabe2>] _spin_lock+0x13/0x15
>  [<ffffffffa0321734>] iscsi_eh_cmd_timed_out+0x27/0xc0 [libiscsi]
>  [<ffffffffa0004f35>] scsi_times_out+0x46/0x72 [scsi_mod]
>  [<ffffffff8034b954>] blk_rq_timed_out+0xe/0x4a
>  [<ffffffff8034ba23>] blk_rq_timed_out_timer+0x93/0x116
>  [<ffffffff8024a5f5>] run_timer_softirq+0x19a/0x228
>  [<ffffffff8024696d>] __do_softirq+0x84/0x115
>  [<ffffffff8020ddac>] call_softirq+0x1c/0x28
>  [<ffffffff8020f177>] do_softirq+0x3c/0x81
>  [<ffffffff80246684>] irq_exit+0x3f/0x83
>  [<ffffffff8021cf73>] smp_apic_timer_interrupt+0x95/0xae
>  [<ffffffff8020d523>] apic_timer_interrupt+0x83/0x90
>  [<ffffffff802134f4>] mwait_idle+0x3c/0x46
>  [<ffffffff8020b3b5>] cpu_idle+0xa9/0xf1
> 
> 
> Code: ff 00 00 c1 ea 10 39 c2 0f 95 c0 0f b6 c0 c3 8b 17 89 d0 c1 f8 10 29 d0 25 ff ff 00 00 ff
> RIP  [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
>  RSP <ffff88019f187e20>
> ----------------------------------------------------------------------------
> 
> Your comment is very much appreciated.
> 
Please try the latest SLES11 Beta/RC kernel. This bug is due to an incomplete patchset from
mainline and should be resolved now.

If not please do open a Novell bugzilla for it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2008-12-11 11:03         ` Hannes Reinecke
@ 2008-12-16 16:55           ` Shi, Harris
  2008-12-17  7:33             ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Shi, Harris @ 2008-12-16 16:55 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list

Hannes, 

Just let you know that the same panic is still there in SLES11RC1. Philip should be able to help us to report it in Novell Bugzilla. 

Thanks.
Harris

-----Original Message-----
From: Hannes Reinecke [mailto:hare@suse.de] 
Sent: Thursday, December 11, 2008 5:03 AM
To: Shi, Harris
Cc: malahal@us.ibm.com; Mike Anderson; SCSI development list
Subject: Re: question on block-layer timeout change

Hi Harris,

Shi, Harris wrote:
> Mike,
>
> Your suggestion on MPP driver is working pretty good on FC config in term of failover and failback. However, recently when we switched over to iscsi config on SLES11beta6 (2.6.27.7-4-default, SLES11 kernel did not sync up with current upstream one, all of timeout management patch has been pulled in), we were consistently hit by the following panic when we tried to do failover via controller sysReboot or placed offline. Is it something related to the recent timeout management patch introduced into the kernel?
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba
> IP: [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
> PGD 196cf6067 PUD 196c4f067 PMD 0
> Oops: 0002 [1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> CPU 2
> Modules linked in: radeon drm crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core d
> Supported: Yes, External
> Pid: 0, comm: swapper Tainted: G          2.6.27.7-4-default #1
> RIP: 0010:[<ffffffff80222047>]  [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
> RSP: 0018:ffff88019f187e20  EFLAGS: 00010086
> RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff88019d8c3218
> RDX: ffff88019cd3d000 RSI: 0000000000002007 RDI: 00000000000000ba
> RBP: ffff880194940918 R08: ffff880194940c78 R09: 0000000000000000
> R10: ffffffff80a65b80 R11: ffffffff8021c6ed R12: 0000000000000000
> R13: ffff880194940b50 R14: ffff88019f187ed0 R15: ffff880194940c90
> FS:  0000000000000000(0000) GS:ffff88019f157ec0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000000000ba CR3: 0000000196cc7000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff88019f180000, task ffff88019f17e280)
> Stack:  ffffffff804aabe2 ffffffffa0321734 0000000000000000 ffff88018fc97c80
>  ffff880194940918 ffffffffa0004f35 ffff88019d8c30d8 ffffffff8034b954
>  ffff88019d8c30d8 ffffffff8034ba23 0000000000000282 0000000000000100
> Call Trace:
>  [<ffffffff804aabe2>] _spin_lock+0x13/0x15
>  [<ffffffffa0321734>] iscsi_eh_cmd_timed_out+0x27/0xc0 [libiscsi]
>  [<ffffffffa0004f35>] scsi_times_out+0x46/0x72 [scsi_mod]
>  [<ffffffff8034b954>] blk_rq_timed_out+0xe/0x4a
>  [<ffffffff8034ba23>] blk_rq_timed_out_timer+0x93/0x116
>  [<ffffffff8024a5f5>] run_timer_softirq+0x19a/0x228
>  [<ffffffff8024696d>] __do_softirq+0x84/0x115
>  [<ffffffff8020ddac>] call_softirq+0x1c/0x28
>  [<ffffffff8020f177>] do_softirq+0x3c/0x81
>  [<ffffffff80246684>] irq_exit+0x3f/0x83
>  [<ffffffff8021cf73>] smp_apic_timer_interrupt+0x95/0xae
>  [<ffffffff8020d523>] apic_timer_interrupt+0x83/0x90
>  [<ffffffff802134f4>] mwait_idle+0x3c/0x46
>  [<ffffffff8020b3b5>] cpu_idle+0xa9/0xf1
>
>
> Code: ff 00 00 c1 ea 10 39 c2 0f 95 c0 0f b6 c0 c3 8b 17 89 d0 c1 f8 10 29 d0 25 ff ff 00 00 ff
> RIP  [<ffffffff80222047>] __ticket_spin_lock+0x5/0x1b
>  RSP <ffff88019f187e20>
> ----------------------------------------------------------------------------
>
> Your comment is very much appreciated.
>
Please try the latest SLES11 Beta/RC kernel. This bug is due to an incomplete patchset from
mainline and should be resolved now.

If not please do open a Novell bugzilla for it.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2008-12-16 16:55           ` Shi, Harris
@ 2008-12-17  7:33             ` Hannes Reinecke
  2008-12-17 22:38               ` Shi, Harris
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2008-12-17  7:33 UTC (permalink / raw)
  To: Shi, Harris; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list

Hi Harris,

Shi, Harris wrote:
> Hannes, 
> 
> Just let you know that the same panic is still there in SLES11RC1. Philip should
> be able to help us to report it in Novell Bugzilla. 
>
Ah. It might be related to the wrong scsi_device_online() check (cf my patch
'Check for deleted device in scsi_device_online()' earlier on this list).
Problem is that the error handler just checks with scsi_device_online() if
a command can be sent down the wire.
And so for a failed/deleted devices the error handler just barfs at random
places.
Can you check if this patch resolves the issue?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2008-12-17  7:33             ` Hannes Reinecke
@ 2008-12-17 22:38               ` Shi, Harris
  2008-12-18  9:23                 ` Mike Anderson
  0 siblings, 1 reply; 16+ messages in thread
From: Shi, Harris @ 2008-12-17 22:38 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: malahal@us.ibm.com, Mike Anderson, SCSI development list

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

Hannes, 

I rebuilt a kernel with your patch on a 2.6.28-rc8 kernel and hit the same panic as before. The attached trace is for your reference. 

Thanks.
Harris


-----Original Message-----
From: Hannes Reinecke [mailto:hare@suse.de] 
Sent: Wednesday, December 17, 2008 1:33 AM
To: Shi, Harris
Cc: malahal@us.ibm.com; Mike Anderson; SCSI development list
Subject: Re: question on block-layer timeout change

Hi Harris,

Shi, Harris wrote:
> Hannes, 
> 
> Just let you know that the same panic is still there in SLES11RC1. Philip should
> be able to help us to report it in Novell Bugzilla. 
>
Ah. It might be related to the wrong scsi_device_online() check (cf my patch
'Check for deleted device in scsi_device_online()' earlier on this list).
Problem is that the error handler just checks with scsi_device_online() if
a command can be sent down the wire.
And so for a failed/deleted devices the error handler just barfs at random
places.
Can you check if this patch resolves the issue?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: panic_on_iscsi_array_with_failover.txt --]
[-- Type: text/plain, Size: 5282 bytes --]

Information from /var/log/messages:
===================================
Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)



Information from Serial output:
===============================
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
*pdpt = 00000000319fe001 *pde = 0000000000000000
BUG: unable to handle kernel NULL pointer dereference at 00000086
IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
*pdpt = 0000000000546001 *pde = 0000000000000000
 ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]

Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
EIP is at __ticket_spin_lock+0x8/0x19
EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
Stack:
 f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
 f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
 f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
Call Trace:
 [<c0328e0f>] ? _spin_lock+0x15/0x18
 [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
 [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
 [<c0212539>] ? blk_rq_timed_out+0xc/0x46
 [<c02125f2>] ? blk_rq_timed_out_timer+0x7f/0xf6
 [<c0212573>] ? blk_rq_timed_out_timer+0x0/0xf6
 [<c013744c>] ? run_timer_softirq+0x154/0x1c7
 [<c0133ab0>] ? __do_softirq+0x8d/0x133
 [<c0133b9e>] ? do_softirq+0x48/0x57
 [<c0133cad>] ? irq_exit+0x38/0x6d
 [<c0112e8d>] ? smp_apic_timer_interrupt+0x71/0x7f
 [<c0105f78>] ? apic_timer_interrupt+0x28/0x30
 [<c010a75d>] ? mwait_idle+0x32/0x40
 [<c0103c36>] ? cpu_idle+0x74/0x8e
 [<c0324407>] ? start_secondary+0x269/0x26e
Code: fe ff ff 5b eb 13 56 0f b7 d2 ff 75 08 89 d9 0f b6 c0 e8 44 fe ff ff 5a 59 8d 65 f8 5b 5e 5d c3 90 90 90 55 ba 00 01
EIP: [<c011a274>] __ticket_spin_lock+0x8/0x19 SS:ESP 0068:f38a5e78
Oops: 0002 [#2] <0>Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
WARNING: at kernel/smp.c:333 smp_call_function_mask+0x27/0x1b6()
Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i]
Pid: 0, comm: swapper Tainted: G      D    2.6.28-rc8-test-1-pae #1
Call Trace:
 [<c0326a95>] ? printk+0xf/0x12
 [<c012f173>] warn_on_slowpath+0x41/0x63
 [<c02207fc>] ? __const_udelay+0x2c/0x2e
 [<c02207fc>] ? __const_udelay+0x2c/0x2e
 [<c028afc1>] ? wait_for_xmitr+0x37/0x7d
 [<c028afc1>] ? wait_for_xmitr+0x37/0x7d
 [<c014d326>] smp_call_function_mask+0x27/0x1b6
 [<c011a346>] ? default_spin_lock_flags+0x15/0x1b
 [<c011a346>] ? default_spin_lock_flags+0x15/0x1b
 [<c0143b34>] ? down_trylock+0x20/0x29
 [<c012f4ea>] ? try_acquire_console_sem+0xd/0x2e
 [<c0157040>] ? crash_kexec+0x9f/0xa7
 [<c014d4d3>] smp_call_function+0x1e/0x25
 [<c0111980>] ? stop_this_cpu+0x0/0x51
 [<c0111956>] native_smp_send_stop+0x1b/0x45
 [<c03269ef>] panic+0x48/0xdf
 [<c0329a1b>] oops_end+0x8f/0xa3
 [<c01074c7>] die+0x5d/0x65
 [<c032b422>] do_page_fault+0x9ef/0xaf3
 [<c0124b8a>] ? enqueue_task_fair+0x1f/0x59
 [<c012120e>] ? resched_task+0x30/0x74
 [<c0128a9c>] ? try_to_wake_up+0x216/0x221
 [<c0128ab2>] ? default_wake_function+0xb/0xd
 [<c01403f2>] ? autoremove_wake_function+0xf/0x33
 [<c014044c>] ? wake_bit_function+0x36/0x43
 [<c0120b54>] ? __wake_up_common+0x35/0x5b
 [<c016e4dc>] ? mempool_free_slab+0xe/0x10
 [<c0124b63>] ? enqueue_entity+0x297/0x29f
 [<c0124b8a>] ? enqueue_task_fair+0x1f/0x59
 [<c012057e>] ? enqueue_task+0x4c/0x58
 [<c012120e>] ? resched_task+0x30/0x74
 [<c032aa33>] ? do_page_fault+0x0/0xaf3
 [<c0329132>] error_code+0x72/0x80
 [<c014007b>] ? kthreadd+0xa8/0x156
 [<c011a274>] ? __ticket_spin_lock+0x8/0x19
 [<c0328e0f>] _spin_lock+0x15/0x18
 [<f9298389>] iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
 [<f7e13396>] scsi_times_out+0x35/0x61 [scsi_mod]
 [<c0212539>] blk_rq_timed_out+0xc/0x46
 [<c02125f2>] blk_rq_timed_out_timer+0x7f/0xf6
 [<c0212573>] ? blk_rq_timed_out_timer+0x0/0xf6
 [<c013744c>] run_timer_softirq+0x154/0x1c7
 [<c0133ab0>] __do_softirq+0x8d/0x133
 [<c0133b9e>] do_softirq+0x48/0x57
 [<c0133cad>] irq_exit+0x38/0x6d
 [<c0112e8d>] smp_apic_timer_interrupt+0x71/0x7f
 [<c0105f78>] apic_timer_interrupt+0x28/0x30
 [<c010a75d>] ? mwait_idle+0x32/0x40
 [<c0103c36>] cpu_idle+0x74/0x8e
 [<c0324407>] start_secondary+0x269/0x26e
---[ end trace d93b1e584c414659 ]---


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2008-12-17 22:38               ` Shi, Harris
@ 2008-12-18  9:23                 ` Mike Anderson
  2008-12-18 22:37                   ` Shi, Harris
  2009-01-04 17:12                   ` Mike Christie
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Anderson @ 2008-12-18  9:23 UTC (permalink / raw)
  To: Shi, Harris
  Cc: Hannes Reinecke, malahal@us.ibm.com, SCSI development list,
	Mike Christie

Shi, Harris <Harris.Shi@lsi.com> wrote:
> Information from /var/log/messages:
> ===================================
> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
> 
> 
> 
> Information from Serial output:
> ===============================
> Oops: 0002 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
> *pdpt = 00000000319fe001 *pde = 0000000000000000
> BUG: unable to handle kernel NULL pointer dereference at 00000086
> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
> *pdpt = 0000000000546001 *pde = 0000000000000000
>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
> 
> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
> EIP is at __ticket_spin_lock+0x8/0x19
> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
> Stack:
>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
> Call Trace:
>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46

I could not match my listing exactly with this output, but it appears that
the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
would help verify the iscsi_eh_cmd_timed_out line.

I added Mike C to the email cc for possible comments on the error messages
displayed above and if that would lead to cleanup of structures referenced
in iscsi_eh_cmd_timed_out.

Question on the system setup. Are you using mpp in this kernel as I did not
see it in the module list?

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2008-12-18  9:23                 ` Mike Anderson
@ 2008-12-18 22:37                   ` Shi, Harris
  2009-01-04 17:12                   ` Mike Christie
  1 sibling, 0 replies; 16+ messages in thread
From: Shi, Harris @ 2008-12-18 22:37 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Hannes Reinecke, malahal@us.ibm.com, SCSI development list,
	Mike Christie

Mike, 

Yes. We do have MPP loaded in SLES11RC1. The mpp modules were cut off by the serial windows. It was actually there. 

The NULL session was causing the problem after controller was place offline. Actually it only took bit more than 15s after controller offline to hit the panic.

Thanks.
Harris
-----Original Message-----
From: Mike Anderson [mailto:andmike@linux.vnet.ibm.com] 
Sent: Thursday, December 18, 2008 3:24 AM
To: Shi, Harris
Cc: Hannes Reinecke; malahal@us.ibm.com; SCSI development list; Mike Christie
Subject: Re: question on block-layer timeout change

Shi, Harris <Harris.Shi@lsi.com> wrote:
> Information from /var/log/messages:
> ===================================
> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
> 
> 
> 
> Information from Serial output:
> ===============================
> Oops: 0002 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
> *pdpt = 00000000319fe001 *pde = 0000000000000000
> BUG: unable to handle kernel NULL pointer dereference at 00000086
> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
> *pdpt = 0000000000546001 *pde = 0000000000000000
>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
> 
> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
> EIP is at __ticket_spin_lock+0x8/0x19
> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
> Stack:
>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
> Call Trace:
>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46

I could not match my listing exactly with this output, but it appears that
the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
would help verify the iscsi_eh_cmd_timed_out line.

I added Mike C to the email cc for possible comments on the error messages
displayed above and if that would lead to cleanup of structures referenced
in iscsi_eh_cmd_timed_out.

Question on the system setup. Are you using mpp in this kernel as I did not
see it in the module list?

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2008-12-18  9:23                 ` Mike Anderson
  2008-12-18 22:37                   ` Shi, Harris
@ 2009-01-04 17:12                   ` Mike Christie
  2009-01-07  6:37                     ` Shi, Harris
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Christie @ 2009-01-04 17:12 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Shi, Harris, Hannes Reinecke, malahal@us.ibm.com,
	SCSI development list

Mike Anderson wrote:
> Shi, Harris <Harris.Shi@lsi.com> wrote:
>> Information from /var/log/messages:
>> ===================================
>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
>> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
>> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>>
>>
>>
>> Information from Serial output:
>> ===============================
>> Oops: 0002 [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>> *pdpt = 00000000319fe001 *pde = 0000000000000000
>> BUG: unable to handle kernel NULL pointer dereference at 00000086
>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>> *pdpt = 0000000000546001 *pde = 0000000000000000
>>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
>>
>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
>> EIP is at __ticket_spin_lock+0x8/0x19
>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
>> Stack:
>>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
>> Call Trace:
>>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46
> 
> I could not match my listing exactly with this output, but it appears that
> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
> would help verify the iscsi_eh_cmd_timed_out line.
> 
> I added Mike C to the email cc for possible comments on the error messages
> displayed above and if that would lead to cleanup of structures referenced
> in iscsi_eh_cmd_timed_out.
> 

Sorry for the late reply. I have been on vacation.

The iscsi error message just indicates that the initiator tried to send 
a iscsi ping and it did not get a response, so the initiator dropped the 
session. The error was reported as a generic connection error (1011) and 
when the error was fired the initiator was in the logged in / full 
feature phase (this basically means normal old use and nothing special).

Due to the sles use and MPP driver, I am not sure what exactly is 
running, but for this code path the iscsi driver does this:

         cls_session = starget_to_session(scsi_target(scmd->device));
         session = cls_session->dd_data;

to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels 
do that has not changed).

The session pointers are only changed when a session is destroyed, and 
that only happens if you do a logout of the session (iscsiadm -m .... 
-u), and at that time when the session is destroyed we should have 
flushed all IO.

Are you guys doing a logout of the session with iscsiadm at this time?

Does MPP clone commands and is it doing something with the command's 
pointers to the device?

Another possibilty is that commands are not getting cleaned up 
correctly. When you see this "connection2:0: detected conn error 
(1011)", the driver is going to kill all outstadnding commands and and 
call scsi_done on them to requeue them with the scsi layer, so we should 
be getting any commands timed out after you see that message (maybe only 
in some race case where the session's commands are getting flushed at 
the exact same time the scsi eh was firing). There was no time stamp on 
the oops output but I doubt this happened. Did the oops happen after the 
conn error message though?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2009-01-04 17:12                   ` Mike Christie
@ 2009-01-07  6:37                     ` Shi, Harris
  2009-01-07 20:46                       ` Mike Christie
  0 siblings, 1 reply; 16+ messages in thread
From: Shi, Harris @ 2009-01-07  6:37 UTC (permalink / raw)
  To: Mike Christie, Mike Anderson
  Cc: Hannes Reinecke, malahal@us.ibm.com, SCSI development list

What we did is to sysReboot one controller while I/O is up and running, we expected to have controller failover and did not explicitly run iscsiadm to logout the session in this case. Yes. We got the oop after conn error message. If controller reboot resulted in session gone, that might explain we had a kernel panic. However, MPP has been running on iscsi array for quite a while on SLES10/REHL distribution, it's the first time we got hit by this kernel panic. That's why we had a some concerns on iscsi stack. 

-----Original Message-----
From: Mike Christie [mailto:michaelc@cs.wisc.edu] 
Sent: Sunday, January 04, 2009 11:12 AM
To: Mike Anderson
Cc: Shi, Harris; Hannes Reinecke; malahal@us.ibm.com; SCSI development list
Subject: Re: question on block-layer timeout change

Mike Anderson wrote:
> Shi, Harris <Harris.Shi@lsi.com> wrote:
>> Information from /var/log/messages:
>> ===================================
>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
>> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
>> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>>
>>
>>
>> Information from Serial output:
>> ===============================
>> Oops: 0002 [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>> *pdpt = 00000000319fe001 *pde = 0000000000000000
>> BUG: unable to handle kernel NULL pointer dereference at 00000086
>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>> *pdpt = 0000000000546001 *pde = 0000000000000000
>>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
>>
>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
>> EIP is at __ticket_spin_lock+0x8/0x19
>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
>> Stack:
>>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
>> Call Trace:
>>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46
> 
> I could not match my listing exactly with this output, but it appears that
> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
> would help verify the iscsi_eh_cmd_timed_out line.
> 
> I added Mike C to the email cc for possible comments on the error messages
> displayed above and if that would lead to cleanup of structures referenced
> in iscsi_eh_cmd_timed_out.
> 

Sorry for the late reply. I have been on vacation.

The iscsi error message just indicates that the initiator tried to send 
a iscsi ping and it did not get a response, so the initiator dropped the 
session. The error was reported as a generic connection error (1011) and 
when the error was fired the initiator was in the logged in / full 
feature phase (this basically means normal old use and nothing special).

Due to the sles use and MPP driver, I am not sure what exactly is 
running, but for this code path the iscsi driver does this:

         cls_session = starget_to_session(scsi_target(scmd->device));
         session = cls_session->dd_data;

to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels 
do that has not changed).

The session pointers are only changed when a session is destroyed, and 
that only happens if you do a logout of the session (iscsiadm -m .... 
-u), and at that time when the session is destroyed we should have 
flushed all IO.

Are you guys doing a logout of the session with iscsiadm at this time?

Does MPP clone commands and is it doing something with the command's 
pointers to the device?

Another possibilty is that commands are not getting cleaned up 
correctly. When you see this "connection2:0: detected conn error 
(1011)", the driver is going to kill all outstadnding commands and and 
call scsi_done on them to requeue them with the scsi layer, so we should 
be getting any commands timed out after you see that message (maybe only 
in some race case where the session's commands are getting flushed at 
the exact same time the scsi eh was firing). There was no time stamp on 
the oops output but I doubt this happened. Did the oops happen after the 
conn error message though?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: question on block-layer timeout change
  2009-01-07  6:37                     ` Shi, Harris
@ 2009-01-07 20:46                       ` Mike Christie
  2009-01-24 16:34                         ` Shi, Harris
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Christie @ 2009-01-07 20:46 UTC (permalink / raw)
  To: Shi, Harris
  Cc: Mike Anderson, Hannes Reinecke, malahal@us.ibm.com,
	SCSI development list

Shi, Harris wrote:
> What we did is to sysReboot one controller while I/O is up and running, we expected to have controller failover and did not explicitly run iscsiadm to logout the session in this case. Yes. We got the oop after conn error message. If controller reboot resulted in session gone, that might explain we had a kernel panic. However, MPP has been running on iscsi array for quite a while on SLES10/REHL distribution, it's the first time we got hit by this kernel panic. That's why we had a some concerns on iscsi stack. 
> 

The failover should not have resulted in the session getting destroyed. 
It might depend on what is in SLES though. In older tools if the chap 
values changed (we logged in with one user/pass and the target no wants 
diff values) or for some errors the initiator thought was fatal. For 
this though you should have seen a error in logs indicating that we got 
a fatal login error.

What does the portal return when you were rebooting it if the initiator 
tries to log in? Look at the open-iscsi userspace tools in SLES. Is your 
target returning a error values that 
initiator.c:__check_iscsi_status_class() would destroy the session for? 
Is it now changed upstream?

> -----Original Message-----
> From: Mike Christie [mailto:michaelc@cs.wisc.edu] 
> Sent: Sunday, January 04, 2009 11:12 AM
> To: Mike Anderson
> Cc: Shi, Harris; Hannes Reinecke; malahal@us.ibm.com; SCSI development list
> Subject: Re: question on block-layer timeout change
> 
> Mike Anderson wrote:
>> Shi, Harris <Harris.Shi@lsi.com> wrote:
>>> Information from /var/log/messages:
>>> ===================================
>>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
>>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
>>> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
>>> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
>>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>>>
>>>
>>>
>>> Information from Serial output:
>>> ===============================
>>> Oops: 0002 [#1] SMP
>>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
>>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
>>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>>> *pdpt = 00000000319fe001 *pde = 0000000000000000
>>> BUG: unable to handle kernel NULL pointer dereference at 00000086
>>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>>> *pdpt = 0000000000546001 *pde = 0000000000000000
>>>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
>>>
>>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
>>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
>>> EIP is at __ticket_spin_lock+0x8/0x19
>>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
>>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
>>> Stack:
>>>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>>>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>>>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
>>> Call Trace:
>>>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>>>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>>>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>>>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46
>> I could not match my listing exactly with this output, but it appears that
>> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
>> would help verify the iscsi_eh_cmd_timed_out line.
>>
>> I added Mike C to the email cc for possible comments on the error messages
>> displayed above and if that would lead to cleanup of structures referenced
>> in iscsi_eh_cmd_timed_out.
>>
> 
> Sorry for the late reply. I have been on vacation.
> 
> The iscsi error message just indicates that the initiator tried to send 
> a iscsi ping and it did not get a response, so the initiator dropped the 
> session. The error was reported as a generic connection error (1011) and 
> when the error was fired the initiator was in the logged in / full 
> feature phase (this basically means normal old use and nothing special).
> 
> Due to the sles use and MPP driver, I am not sure what exactly is 
> running, but for this code path the iscsi driver does this:
> 
>          cls_session = starget_to_session(scsi_target(scmd->device));
>          session = cls_session->dd_data;
> 
> to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels 
> do that has not changed).
> 
> The session pointers are only changed when a session is destroyed, and 
> that only happens if you do a logout of the session (iscsiadm -m .... 
> -u), and at that time when the session is destroyed we should have 
> flushed all IO.
> 
> Are you guys doing a logout of the session with iscsiadm at this time?
> 
> Does MPP clone commands and is it doing something with the command's 
> pointers to the device?
> 
> Another possibilty is that commands are not getting cleaned up 
> correctly. When you see this "connection2:0: detected conn error 
> (1011)", the driver is going to kill all outstadnding commands and and 
> call scsi_done on them to requeue them with the scsi layer, so we should 
> be getting any commands timed out after you see that message (maybe only 
> in some race case where the session's commands are getting flushed at 
> the exact same time the scsi eh was firing). There was no time stamp on 
> the oops output but I doubt this happened. Did the oops happen after the 
> conn error message though?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: question on block-layer timeout change
  2009-01-07 20:46                       ` Mike Christie
@ 2009-01-24 16:34                         ` Shi, Harris
  0 siblings, 0 replies; 16+ messages in thread
From: Shi, Harris @ 2009-01-24 16:34 UTC (permalink / raw)
  To: Mike Christie
  Cc: Mike Anderson, Hannes Reinecke, malahal@us.ibm.com,
	SCSI development list

[-- Attachment #1: Type: text/plain, Size: 6822 bytes --]

Christie, 

Sorry for the late response. I was off-track by other issues for a while. On controller sysReboot, it actually did not send any asynchronous event/message for iscsid to destroy the session. It simply went offline for a while and then back online. Now I was able to enable iscsid debug trace and capture a log for your reference. My serial console was disconnected during the weekend for some reasons, whenever it's back online, I might try some more cases. 

Thanks.
Harris

-----Original Message-----
From: Mike Christie [mailto:michaelc@cs.wisc.edu] 
Sent: Wednesday, January 07, 2009 2:46 PM
To: Shi, Harris
Cc: Mike Anderson; Hannes Reinecke; malahal@us.ibm.com; SCSI development list
Subject: Re: question on block-layer timeout change

Shi, Harris wrote:
> What we did is to sysReboot one controller while I/O is up and running, we expected to have controller failover and did not explicitly run iscsiadm to logout the session in this case. Yes. We got the oop after conn error message. If controller reboot resulted in session gone, that might explain we had a kernel panic. However, MPP has been running on iscsi array for quite a while on SLES10/REHL distribution, it's the first time we got hit by this kernel panic. That's why we had a some concerns on iscsi stack. 
> 

The failover should not have resulted in the session getting destroyed. 
It might depend on what is in SLES though. In older tools if the chap 
values changed (we logged in with one user/pass and the target no wants 
diff values) or for some errors the initiator thought was fatal. For 
this though you should have seen a error in logs indicating that we got 
a fatal login error.

What does the portal return when you were rebooting it if the initiator 
tries to log in? Look at the open-iscsi userspace tools in SLES. Is your 
target returning a error values that 
initiator.c:__check_iscsi_status_class() would destroy the session for? 
Is it now changed upstream?

> -----Original Message-----
> From: Mike Christie [mailto:michaelc@cs.wisc.edu] 
> Sent: Sunday, January 04, 2009 11:12 AM
> To: Mike Anderson
> Cc: Shi, Harris; Hannes Reinecke; malahal@us.ibm.com; SCSI development list
> Subject: Re: question on block-layer timeout change
> 
> Mike Anderson wrote:
>> Shi, Harris <Harris.Shi@lsi.com> wrote:
>>> Information from /var/log/messages:
>>> ===================================
>>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
>>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
>>> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
>>> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
>>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>>>
>>>
>>>
>>> Information from Serial output:
>>> ===============================
>>> Oops: 0002 [#1] SMP
>>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
>>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
>>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>>> *pdpt = 00000000319fe001 *pde = 0000000000000000
>>> BUG: unable to handle kernel NULL pointer dereference at 00000086
>>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>>> *pdpt = 0000000000546001 *pde = 0000000000000000
>>>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
>>>
>>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
>>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
>>> EIP is at __ticket_spin_lock+0x8/0x19
>>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
>>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
>>> Stack:
>>>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>>>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>>>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
>>> Call Trace:
>>>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>>>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>>>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>>>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46
>> I could not match my listing exactly with this output, but it appears that
>> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
>> would help verify the iscsi_eh_cmd_timed_out line.
>>
>> I added Mike C to the email cc for possible comments on the error messages
>> displayed above and if that would lead to cleanup of structures referenced
>> in iscsi_eh_cmd_timed_out.
>>
> 
> Sorry for the late reply. I have been on vacation.
> 
> The iscsi error message just indicates that the initiator tried to send 
> a iscsi ping and it did not get a response, so the initiator dropped the 
> session. The error was reported as a generic connection error (1011) and 
> when the error was fired the initiator was in the logged in / full 
> feature phase (this basically means normal old use and nothing special).
> 
> Due to the sles use and MPP driver, I am not sure what exactly is 
> running, but for this code path the iscsi driver does this:
> 
>          cls_session = starget_to_session(scsi_target(scmd->device));
>          session = cls_session->dd_data;
> 
> to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels 
> do that has not changed).
> 
> The session pointers are only changed when a session is destroyed, and 
> that only happens if you do a logout of the session (iscsiadm -m .... 
> -u), and at that time when the session is destroyed we should have 
> flushed all IO.
> 
> Are you guys doing a logout of the session with iscsiadm at this time?
> 
> Does MPP clone commands and is it doing something with the command's 
> pointers to the device?
> 
> Another possibilty is that commands are not getting cleaned up 
> correctly. When you see this "connection2:0: detected conn error 
> (1011)", the driver is going to kill all outstadnding commands and and 
> call scsi_done on them to requeue them with the scsi layer, so we should 
> be getting any commands timed out after you see that message (maybe only 
> in some race case where the session's commands are getting flushed at 
> the exact same time the scsi eh was firing). There was no time stamp on 
> the oops output but I doubt this happened. Did the oops happen after the 
> conn error message though?


[-- Attachment #2: iscsid_debug_on.txt --]
[-- Type: text/plain, Size: 6791 bytes --]

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2009.01.24 09:37:15 =~=~=~=~=~=~=~=~=~=~=~=

iscsid: poll result 1
iscsid: in ctldev_handle
iscsid: in nl_read
iscsid: ctldev_handle got event type 102

iscsid: get conn context 0x6d7b30
iscsid: message real length is 72 bytes, recv_handle 0x6d9430
iscsid: in nlpayload_read
iscsid: sched conn context 0x6d7b30 event 3, tmo 0
iscsid: thread 0x6d7b30 schedule: delay 0 state 3
iscsid: exec thread 006d7b30 callback
iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
iscsid: put conn context 0x6d7b30
iscsid: re-opening session 1 (reopen_cnt 0)
iscsid: thread 006d59d0 delete: state 3
iscsid: thread 006d5a08 delete: state 3
iscsid: disconnecting conn 0x6d3508, fd 5
iscsid: in kstop_conn
iscsid: in __kipc_call
iscsid: in kwritev
BUG: unable to handle kernel NULL pointer dereference at 00000000000000ba
IP: [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b
PGD 23dc067 PUD 238c067 PMD 0 
Oops: 0002 [1] SMP 
last sysfs file: /sys/devices/mppvirtual/mppadapter0/host6/target6:0:0/6:0:0:8/block/sdj/dev
CPU 0 
Modules linked in: crc32c libcrc32c radeon drm iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc microcode ipv6 fuse loop dm_mod dcdbas(X) rtc_cmos rtc_core rtc_lib e1000 video mptctl output floppy iTCO_wdt iTCO_vendor_support e752x_edac button pcspkr edac_core shpchp sr_mod pci_hotplug cdrom uhci_hcd ehci_hcd usbcore mppVhba edd ext3 mbcache jbd fan ide_pci_generic piix ide_core ata_generic ata_piix libata dock mptspi mptscsih mptbase scsi_transport_spi thermal processor thermal_sys hwmon mppUpper sg sd_mod crc_t10dif scsi_mod
Supported: Yes, External
Pid: 8, comm: kblockd/0 Tainted: G          2.6.27.11-1-default #1
RIP: 0010:[<ffffffff8049bd5f>]  [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b
RSP: 0018:ffffffff80a36e38  EFLAGS: 00010082
RAX: 0000000000010000 RBX: 0000000000000002 RCX: ffff8800070b2a38
RDX: ffff880002c16000 RSI: 0000000000002007 RDI: 00000000000000ba
RBP: ffff88000472a240 R08: ffff88000472a5a0 R09: 0000000000000000
R10: 0000000000000000 R11: ffff8800010dc228 R12: 0000000000000000
R13: ffff880005482dc8 R14: ffffffff80a36ed0 R15: ffff88000472a5b8
FS:  0000000000000000(0000) GS:ffffffff808cac80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ba CR3: 00000000023d4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kblockd/0 (pid: 8, threadinfo ffff880008628000, task ffff880008626200)
Stack:  ffffffffa037172f ffff8800068e5380 ffffffffa0004f99 ffff8800070b28f8
 ffffffff8033e773 ffff8800070b28f8 ffffffff8033e91a 0000000000000282
 ffff88000472a240 0000000000000100 ffffffff80ab7680 ffffffff8033e887
Call Trace:
 [<ffffffffa037172f>] iscsi_eh_cmd_timed_out+0x22/0xa3 [libiscsi]
 [<ffffffffa0004f99>] scsi_times_out+0x46/0x72 [scsi_mod]
 [<ffffffff8033e773>] blk_rq_timed_out+0xe/0x4a
 [<ffffffff8033e91a>] blk_rq_timed_out_timer+0x93/0x119
 [<ffffffff802465f0>] run_timer_softirq+0x18d/0x204
 [<ffffffff80242b4d>] __do_softirq+0x7d/0x107
 [<ffffffff8020d2dc>] call_softirq+0x1c/0x28
 [<ffffffff8020e583>] do_softirq+0x2c/0x68
 [<ffffffff8024287a>] irq_exit+0x3f/0x85
 [<ffffffff8021b4f8>] smp_apic_timer_interrupt+0x95/0xae
 [<ffffffff8020ccb6>] apic_timer_interrupt+0x66/0x70
 [<ffffffff8049bd99>] _spin_unlock_irqrestore+0x5/0x6
 [<ffffffffa0006ed1>] scsi_queue_insert+0x94/0xda [scsi_mod]
 [<ffffffffa000090a>] scsi_dispatch_cmd+0x1ec/0x235 [scsi_mod]
 [<ffffffffa0005fda>] scsi_request_fn+0x391/0x4d3 [scsi_mod]
 [<ffffffff8033ad53>] generic_unplug_device+0x21/0x2e
 [<ffffffff80339b39>] blk_unplug_work+0x5e/0x65
 [<ffffffff8024cdf9>] run_workqueue+0x7a/0x100
 [<ffffffff8024cf57>] worker_thread+0xd8/0xe7
 [<ffffffff8024ff67>] kthread+0x47/0x73
 [<ffffffff8020cf79>] child_rip+0xa/0x11


Code: a2 ec ff c3 9c 58 fa 90 81 2f 00 00 00 01 74 05 e8 a7 a2 ec ff c3 fa 90 81 2f 00 00 00 01 74 05 e8 97 a2 ec ff c3 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5 
RIP  [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b
 RSP <ffffffff80a36e38>
CR2: 00000000000000ba
---[ end trace 1330da6161ef128a ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!
------------[ cut here ]------------
WARNING: at kernel/smp.c:331 smp_call_function_mask+0x32/0x1ec()
Modules linked in: crc32c libcrc32c radeon drm iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc microcode ipv6 fuse loop dm_mod dcdbas(X) rtc_cmos rtc_core rtc_lib e1000 video mptctl output floppy iTCO_wdt iTCO_vendor_support e752x_edac button pcspkr edac_core shpchp sr_mod pci_hotplug cdrom uhci_hcd ehci_hcd usbcore mppVhba edd ext3 mbcache jbd fan ide_pci_generic piix ide_core ata_generic ata_piix libata dock mptspi mptscsih mptbase scsi_transport_spi thermal processor thermal_sys hwmon mppUpper sg sd_mod crc_t10dif scsi_mod
Supported: Yes, External
Pid: 8, comm: kblockd/0 Tainted: G      D   2.6.27.11-1-default #1

Call Trace:
 [<ffffffff8020da29>] show_trace_log_lvl+0x41/0x58
 [<ffffffff80499eaf>] dump_stack+0x69/0x6f
 [<ffffffff8023d9b2>] warn_on_slowpath+0x51/0x77
 [<ffffffff8025d609>] smp_call_function_mask+0x32/0x1ec
 [<ffffffff8025d7ec>] smp_call_function+0x29/0x2e
 [<ffffffff8021a34a>] native_smp_send_stop+0x1a/0x26
 [<ffffffff80499f71>] panic+0xbc/0x169
 [<ffffffff80241256>] do_exit+0x67/0x325
 [<ffffffff8049c6dc>] oops_begin+0x0/0x90
 [<ffffffff8049e42e>] do_page_fault+0x5fa/0x6ca
 [<ffffffff8049bfb9>] error_exit+0x0/0x51
 [<ffffffff8049bd5f>] _spin_lock+0x5/0x1b
 [<ffffffffa037172f>] iscsi_eh_cmd_timed_out+0x22/0xa3 [libiscsi]
 [<ffffffffa0004f99>] scsi_times_out+0x46/0x72 [scsi_mod]
 [<ffffffff8033e773>] blk_rq_timed_out+0xe/0x4a
 [<ffffffff8033e91a>] blk_rq_timed_out_timer+0x93/0x119
 [<ffffffff802465f0>] run_timer_softirq+0x18d/0x204
 [<ffffffff80242b4d>] __do_softirq+0x7d/0x107
 [<ffffffff8020d2dc>] call_softirq+0x1c/0x28
 [<ffffffff8020e583>] do_softirq+0x2c/0x68
 [<ffffffff8024287a>] irq_exit+0x3f/0x85
 [<ffffffff8021b4f8>] smp_apic_timer_interrupt+0x95/0xae
 [<ffffffff8020ccb6>] apic_timer_interrupt+0x66/0x70
 [<ffffffff8049bd99>] _spin_unlock_irqrestore+0x5/0x6
 [<ffffffffa0006ed1>] scsi_queue_insert+0x94/0xda [scsi_mod]
 [<ffffffffa000090a>] scsi_dispatch_cmd+0x1ec/0x235 [scsi_mod]
 [<ffffffffa0005fda>] scsi_request_fn+0x391/0x4d3 [scsi_mod]
 [<ffffffff8033ad53>] generic_unplug_device+0x21/0x2e
 [<ffffffff80339b39>] blk_unplug_work+0x5e/0x65
 [<ffffffff8024cdf9>] run_workqueue+0x7a/0x100
 [<ffffffff8024cf57>] worker_thread+0xd8/0xe7
 [<ffffffff8024ff67>] kthread+0x47/0x73
 [<ffffffff8020cf79>] child_rip+0xa/0x11

---[ end trace 1330da6161ef128a ]---

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Question on block-layer timeout change
@ 2008-11-11 16:26 Shi, Harris
  0 siblings, 0 replies; 16+ messages in thread
From: Shi, Harris @ 2008-11-11 16:26 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org
  Cc: Jens Axboe, Alan Stern, Tejun Heo, FUJITA Tomonori

Due to the current timeout management change, LSI RDAC (failover) driver had some difficulties in handling SCSI I/O timeout. The RDAC driver is in the similar layer as HBA driver in that it will register into scsi mid-layer as scsi_host_template and stays below mid-layer. All SCSI I/Os coming to RDAC stack are routed by an available path then dispatched to the real HBA driver via mid-layer. We used to rely on the timer in scsi_cmnd->eh_timeout to deal with scsi i/o coming into the RDAC stack. Basically when I/O is coming to RDAC stack, we need to delete the timer for each individual I/O. Then after selecting a specific path for this I/O, we need to send the I/O back to mid-layer with a larger timeout value just to avoid the forced failover. When I/O completes successfully, we added the ori
 ginal timer back to the I/O and pass it over to upper block layer for further process.

However, with the current timeout management functions moving to block layer, it became difficult for us to explicitly control the timeout value for specific SCSI I/O, which ended up causing many issues on failover. Can you shed some lights on how to handle the I/O based timeout in this case?

Harris

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-01-24 16:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <3568BBCB98C00041A9E622952FD5F24EA11C9F3A@cosmail03.lsi.com>
2008-11-12  7:29 ` question on block-layer timeout change Mike Anderson
2008-11-12 17:16   ` malahal
2008-11-14  8:51   ` Shi, Harris
2008-11-14 17:18     ` malahal
2008-12-10 23:11       ` Shi, Harris
2008-12-11 11:03         ` Hannes Reinecke
2008-12-16 16:55           ` Shi, Harris
2008-12-17  7:33             ` Hannes Reinecke
2008-12-17 22:38               ` Shi, Harris
2008-12-18  9:23                 ` Mike Anderson
2008-12-18 22:37                   ` Shi, Harris
2009-01-04 17:12                   ` Mike Christie
2009-01-07  6:37                     ` Shi, Harris
2009-01-07 20:46                       ` Mike Christie
2009-01-24 16:34                         ` Shi, Harris
2008-11-11 16:26 Question " Shi, Harris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).