public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
@ 2010-06-16 20:51 Vikas Chaudhary
  2010-06-18 22:58 ` Mike Christie
  0 siblings, 1 reply; 7+ messages in thread
From: Vikas Chaudhary @ 2010-06-16 20:51 UTC (permalink / raw)
  To: james.bottomley, michaelc, linux-scsi; +Cc: ravi.anand, vikas.chaudhary

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Ravi Anand <ravi.anand@qlogic.com>
---
 drivers/scsi/qla4xxx/ql4_def.h  |    3 +--
 drivers/scsi/qla4xxx/ql4_init.c |    6 +++---
 drivers/scsi/qla4xxx/ql4_mbx.c  |    1 -
 drivers/scsi/qla4xxx/ql4_os.c   |    4 ++--
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/qla4xxx/ql4_def.h b/drivers/scsi/qla4xxx/ql4_def.h
index 4288026..2b53c79 100644
--- a/drivers/scsi/qla4xxx/ql4_def.h
+++ b/drivers/scsi/qla4xxx/ql4_def.h
@@ -239,12 +239,12 @@ struct ddb_entry {
 	uint32_t default_time2wait; /* Default Min time between
 				     * relogins (+aens) */
 
-	atomic_t port_down_timer; /* Device connection timer */
 	atomic_t retry_relogin_timer; /* Min Time between relogins
 				       * (4000 only) */
 	atomic_t relogin_timer;	/* Max Time to wait for relogin to complete */
 	atomic_t relogin_retry_count; /* Num of times relogin has been
 				       * retried */
+	uint16_t ka_timeout;
 
 	uint16_t port;
 	uint32_t tpgt;
@@ -394,7 +394,6 @@ struct scsi_qla_host {
 	uint32_t timer_active;
 
 	/* Recovery Timers */
-	uint32_t port_down_retry_count;
 	uint32_t discovery_wait;
 	atomic_t check_relogin_timeouts;
 	uint32_t retry_reset_ha_cnt;
diff --git a/drivers/scsi/qla4xxx/ql4_init.c b/drivers/scsi/qla4xxx/ql4_init.c
index d525405..6cdfe9f 100644
--- a/drivers/scsi/qla4xxx/ql4_init.c
+++ b/drivers/scsi/qla4xxx/ql4_init.c
@@ -548,6 +548,9 @@ static int qla4xxx_update_ddb_entry(struct scsi_qla_host *ha,
 	ddb_entry->default_relogin_timeout =
 		le16_to_cpu(fw_ddb_entry->def_timeout);
 	ddb_entry->default_time2wait = le16_to_cpu(fw_ddb_entry->iscsi_def_time2wait);
+	ddb_entry->ka_timeout = le16_to_cpu(fw_ddb_entry->ka_timeout);
+	if (ddb_entry->sess)
+		ddb_entry->sess->recovery_tmo = ddb_entry->ka_timeout;
 
 	/* Update index in case it changed */
 	ddb_entry->fw_ddb_index = fw_ddb_index;
@@ -633,7 +636,6 @@ static struct ddb_entry * qla4xxx_alloc_ddb(struct scsi_qla_host *ha,
 	}
 
 	ddb_entry->fw_ddb_index = fw_ddb_index;
-	atomic_set(&ddb_entry->port_down_timer, ha->port_down_retry_count);
 	atomic_set(&ddb_entry->retry_relogin_timer, INVALID_ENTRY);
 	atomic_set(&ddb_entry->relogin_timer, 0);
 	atomic_set(&ddb_entry->relogin_retry_count, 0);
@@ -1500,8 +1502,6 @@ int qla4xxx_process_ddb_changed(struct scsi_qla_host *ha, uint32_t fw_ddb_index,
 	/* Device is back online. */
 	if (ddb_entry->fw_ddb_device_state == DDB_DS_SESSION_ACTIVE) {
 		atomic_set(&ddb_entry->state, DDB_STATE_ONLINE);
-		atomic_set(&ddb_entry->port_down_timer,
-			   ha->port_down_retry_count);
 		atomic_set(&ddb_entry->relogin_retry_count, 0);
 		atomic_set(&ddb_entry->relogin_timer, 0);
 		clear_bit(DF_RELOGIN, &ddb_entry->flags);
diff --git a/drivers/scsi/qla4xxx/ql4_mbx.c b/drivers/scsi/qla4xxx/ql4_mbx.c
index 75496fb..aa12ffd 100644
--- a/drivers/scsi/qla4xxx/ql4_mbx.c
+++ b/drivers/scsi/qla4xxx/ql4_mbx.c
@@ -287,7 +287,6 @@ qla4xxx_update_local_ifcb(struct scsi_qla_host *ha,
 	       min(sizeof(ha->alias), sizeof(init_fw_cb->Alias)));*/
 
 	/* Save Command Line Paramater info */
-	ha->port_down_retry_count = le16_to_cpu(init_fw_cb->conn_ka_timeout);
 	ha->discovery_wait = ql4xdiscoverywait;
 
 	if (ha->acb_version == ACB_SUPPORTED) {
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 38b1d38..11a4bd1 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -155,7 +155,7 @@ static void qla4xxx_recovery_timedout(struct iscsi_cls_session *session)
 		DEBUG2(printk("scsi%ld: %s: index [%d] port down retry count "
 			      "of (%d) secs exhausted, marking device DEAD.\n",
 			      ha->host_no, __func__, ddb_entry->fw_ddb_index,
-			      ha->port_down_retry_count));
+			      ddb_entry->ka_timeout));
 
 		DEBUG2(printk("scsi%ld: %s: scheduling dpc routine - dpc "
 			      "flags = 0x%lx\n",
@@ -286,7 +286,7 @@ int qla4xxx_add_sess(struct ddb_entry *ddb_entry)
 {
 	int err;
 
-	ddb_entry->sess->recovery_tmo = ddb_entry->ha->port_down_retry_count;
+	ddb_entry->sess->recovery_tmo = ddb_entry->ka_timeout;
 	err = iscsi_add_session(ddb_entry->sess, ddb_entry->fw_ddb_index);
 	if (err) {
 		DEBUG2(printk(KERN_ERR "Could not add session.\n"));
-- 
1.7.0.5


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
  2010-06-16 20:51 [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo Vikas Chaudhary
@ 2010-06-18 22:58 ` Mike Christie
  2010-06-21 23:35   ` Ravi Anand
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Christie @ 2010-06-18 22:58 UTC (permalink / raw)
  To: Vikas Chaudhary; +Cc: james.bottomley, linux-scsi, ravi.anand

Actually, I am going to take back my ACK/review-by on this.

After discussing with Qlogic what the ka_timeout really is, it seems it 
is the wrong value to use. When the session->recovery_tmo is running the 
session is down so iscsi nops cannot be sent.



On 06/16/2010 03:51 PM, Vikas Chaudhary wrote:
> Signed-off-by: Vikas Chaudhary<vikas.chaudhary@qlogic.com>
> Signed-off-by: Ravi Anand<ravi.anand@qlogic.com>
> ---
>   drivers/scsi/qla4xxx/ql4_def.h  |    3 +--
>   drivers/scsi/qla4xxx/ql4_init.c |    6 +++---
>   drivers/scsi/qla4xxx/ql4_mbx.c  |    1 -
>   drivers/scsi/qla4xxx/ql4_os.c   |    4 ++--
>   4 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/scsi/qla4xxx/ql4_def.h b/drivers/scsi/qla4xxx/ql4_def.h
> index 4288026..2b53c79 100644
> --- a/drivers/scsi/qla4xxx/ql4_def.h
> +++ b/drivers/scsi/qla4xxx/ql4_def.h
> @@ -239,12 +239,12 @@ struct ddb_entry {
>   	uint32_t default_time2wait; /* Default Min time between
>   				     * relogins (+aens) */
>
> -	atomic_t port_down_timer; /* Device connection timer */
>   	atomic_t retry_relogin_timer; /* Min Time between relogins
>   				       * (4000 only) */
>   	atomic_t relogin_timer;	/* Max Time to wait for relogin to complete */
>   	atomic_t relogin_retry_count; /* Num of times relogin has been
>   				       * retried */
> +	uint16_t ka_timeout;
>
>   	uint16_t port;
>   	uint32_t tpgt;
> @@ -394,7 +394,6 @@ struct scsi_qla_host {
>   	uint32_t timer_active;
>
>   	/* Recovery Timers */
> -	uint32_t port_down_retry_count;
>   	uint32_t discovery_wait;
>   	atomic_t check_relogin_timeouts;
>   	uint32_t retry_reset_ha_cnt;
> diff --git a/drivers/scsi/qla4xxx/ql4_init.c b/drivers/scsi/qla4xxx/ql4_init.c
> index d525405..6cdfe9f 100644
> --- a/drivers/scsi/qla4xxx/ql4_init.c
> +++ b/drivers/scsi/qla4xxx/ql4_init.c
> @@ -548,6 +548,9 @@ static int qla4xxx_update_ddb_entry(struct scsi_qla_host *ha,
>   	ddb_entry->default_relogin_timeout =
>   		le16_to_cpu(fw_ddb_entry->def_timeout);
>   	ddb_entry->default_time2wait = le16_to_cpu(fw_ddb_entry->iscsi_def_time2wait);
> +	ddb_entry->ka_timeout = le16_to_cpu(fw_ddb_entry->ka_timeout);
> +	if (ddb_entry->sess)
> +		ddb_entry->sess->recovery_tmo = ddb_entry->ka_timeout;
>
>   	/* Update index in case it changed */
>   	ddb_entry->fw_ddb_index = fw_ddb_index;
> @@ -633,7 +636,6 @@ static struct ddb_entry * qla4xxx_alloc_ddb(struct scsi_qla_host *ha,
>   	}
>
>   	ddb_entry->fw_ddb_index = fw_ddb_index;
> -	atomic_set(&ddb_entry->port_down_timer, ha->port_down_retry_count);
>   	atomic_set(&ddb_entry->retry_relogin_timer, INVALID_ENTRY);
>   	atomic_set(&ddb_entry->relogin_timer, 0);
>   	atomic_set(&ddb_entry->relogin_retry_count, 0);
> @@ -1500,8 +1502,6 @@ int qla4xxx_process_ddb_changed(struct scsi_qla_host *ha, uint32_t fw_ddb_index,
>   	/* Device is back online. */
>   	if (ddb_entry->fw_ddb_device_state == DDB_DS_SESSION_ACTIVE) {
>   		atomic_set(&ddb_entry->state, DDB_STATE_ONLINE);
> -		atomic_set(&ddb_entry->port_down_timer,
> -			   ha->port_down_retry_count);
>   		atomic_set(&ddb_entry->relogin_retry_count, 0);
>   		atomic_set(&ddb_entry->relogin_timer, 0);
>   		clear_bit(DF_RELOGIN,&ddb_entry->flags);
> diff --git a/drivers/scsi/qla4xxx/ql4_mbx.c b/drivers/scsi/qla4xxx/ql4_mbx.c
> index 75496fb..aa12ffd 100644
> --- a/drivers/scsi/qla4xxx/ql4_mbx.c
> +++ b/drivers/scsi/qla4xxx/ql4_mbx.c
> @@ -287,7 +287,6 @@ qla4xxx_update_local_ifcb(struct scsi_qla_host *ha,
>   	       min(sizeof(ha->alias), sizeof(init_fw_cb->Alias)));*/
>
>   	/* Save Command Line Paramater info */
> -	ha->port_down_retry_count = le16_to_cpu(init_fw_cb->conn_ka_timeout);
>   	ha->discovery_wait = ql4xdiscoverywait;
>
>   	if (ha->acb_version == ACB_SUPPORTED) {
> diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
> index 38b1d38..11a4bd1 100644
> --- a/drivers/scsi/qla4xxx/ql4_os.c
> +++ b/drivers/scsi/qla4xxx/ql4_os.c
> @@ -155,7 +155,7 @@ static void qla4xxx_recovery_timedout(struct iscsi_cls_session *session)
>   		DEBUG2(printk("scsi%ld: %s: index [%d] port down retry count "
>   			      "of (%d) secs exhausted, marking device DEAD.\n",
>   			      ha->host_no, __func__, ddb_entry->fw_ddb_index,
> -			      ha->port_down_retry_count));
> +			      ddb_entry->ka_timeout));
>
>   		DEBUG2(printk("scsi%ld: %s: scheduling dpc routine - dpc "
>   			      "flags = 0x%lx\n",
> @@ -286,7 +286,7 @@ int qla4xxx_add_sess(struct ddb_entry *ddb_entry)
>   {
>   	int err;
>
> -	ddb_entry->sess->recovery_tmo = ddb_entry->ha->port_down_retry_count;
> +	ddb_entry->sess->recovery_tmo = ddb_entry->ka_timeout;
>   	err = iscsi_add_session(ddb_entry->sess, ddb_entry->fw_ddb_index);
>   	if (err) {
>   		DEBUG2(printk(KERN_ERR "Could not add session.\n"));


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
  2010-06-18 22:58 ` Mike Christie
@ 2010-06-21 23:35   ` Ravi Anand
  2010-06-22 18:12     ` Mike Christie
  0 siblings, 1 reply; 7+ messages in thread
From: Ravi Anand @ 2010-06-21 23:35 UTC (permalink / raw)
  To: Mike Christie
  Cc: Vikas Chaudhary, james.bottomley@suse.de,
	linux-scsi@vger.kernel.org

Mike,

Actually, that's not true.

When we call iscsi_block_session(), session is already down and we are not
sending any NOP.

FW closes connection after ka_timeout expire at FW level.  Then FW
sends notification to driver for DDB state change which basically indicates
that session is in failed state . After that driver mark device *missing* and call iscsi_block_sesstion().
At this point iSCSI connection is already closed,  so FW is not sending
any iSCSI NOP to target.

We are just using value of ka_timeout in sess->recovery_tmo because
we want to wait for some time for connection to comeback else
mark device dead.

Let us know if you see any issue with this implementation.

Thanks
Ravi
On Jun 18, 2010, at 3:58 PM, Mike Christie wrote:

> Actually, I am going to take back my ACK/review-by on this.
> 
> After discussing with Qlogic what the ka_timeout really is, it seems it 
> is the wrong value to use. When the session->recovery_tmo is running the 
> session is down so iscsi nops cannot be sent.
> 
> 
> 
> On 06/16/2010 03:51 PM, Vikas Chaudhary wrote:
>> Signed-off-by: Vikas Chaudhary<vikas.chaudhary@qlogic.com>
>> Signed-off-by: Ravi Anand<ravi.anand@qlogic.com>
>> ---
>>  drivers/scsi/qla4xxx/ql4_def.h  |    3 +--
>>  drivers/scsi/qla4xxx/ql4_init.c |    6 +++---
>>  drivers/scsi/qla4xxx/ql4_mbx.c  |    1 -
>>  drivers/scsi/qla4xxx/ql4_os.c   |    4 ++--
>>  4 files changed, 6 insertions(+), 8 deletions(-)
>> 
>> diff --git a/drivers/scsi/qla4xxx/ql4_def.h b/drivers/scsi/qla4xxx/ql4_def.h
>> index 4288026..2b53c79 100644
>> --- a/drivers/scsi/qla4xxx/ql4_def.h
>> +++ b/drivers/scsi/qla4xxx/ql4_def.h
>> @@ -239,12 +239,12 @@ struct ddb_entry {
>>  	uint32_t default_time2wait; /* Default Min time between
>>  				     * relogins (+aens) */
>> 
>> -	atomic_t port_down_timer; /* Device connection timer */
>>  	atomic_t retry_relogin_timer; /* Min Time between relogins
>>  				       * (4000 only) */
>>  	atomic_t relogin_timer;	/* Max Time to wait for relogin to complete */
>>  	atomic_t relogin_retry_count; /* Num of times relogin has been
>>  				       * retried */
>> +	uint16_t ka_timeout;
>> 
>>  	uint16_t port;
>>  	uint32_t tpgt;
>> @@ -394,7 +394,6 @@ struct scsi_qla_host {
>>  	uint32_t timer_active;
>> 
>>  	/* Recovery Timers */
>> -	uint32_t port_down_retry_count;
>>  	uint32_t discovery_wait;
>>  	atomic_t check_relogin_timeouts;
>>  	uint32_t retry_reset_ha_cnt;
>> diff --git a/drivers/scsi/qla4xxx/ql4_init.c b/drivers/scsi/qla4xxx/ql4_init.c
>> index d525405..6cdfe9f 100644
>> --- a/drivers/scsi/qla4xxx/ql4_init.c
>> +++ b/drivers/scsi/qla4xxx/ql4_init.c
>> @@ -548,6 +548,9 @@ static int qla4xxx_update_ddb_entry(struct scsi_qla_host *ha,
>>  	ddb_entry->default_relogin_timeout =
>>  		le16_to_cpu(fw_ddb_entry->def_timeout);
>>  	ddb_entry->default_time2wait = le16_to_cpu(fw_ddb_entry->iscsi_def_time2wait);
>> +	ddb_entry->ka_timeout = le16_to_cpu(fw_ddb_entry->ka_timeout);
>> +	if (ddb_entry->sess)
>> +		ddb_entry->sess->recovery_tmo = ddb_entry->ka_timeout;
>> 
>>  	/* Update index in case it changed */
>>  	ddb_entry->fw_ddb_index = fw_ddb_index;
>> @@ -633,7 +636,6 @@ static struct ddb_entry * qla4xxx_alloc_ddb(struct scsi_qla_host *ha,
>>  	}
>> 
>>  	ddb_entry->fw_ddb_index = fw_ddb_index;
>> -	atomic_set(&ddb_entry->port_down_timer, ha->port_down_retry_count);
>>  	atomic_set(&ddb_entry->retry_relogin_timer, INVALID_ENTRY);
>>  	atomic_set(&ddb_entry->relogin_timer, 0);
>>  	atomic_set(&ddb_entry->relogin_retry_count, 0);
>> @@ -1500,8 +1502,6 @@ int qla4xxx_process_ddb_changed(struct scsi_qla_host *ha, uint32_t fw_ddb_index,
>>  	/* Device is back online. */
>>  	if (ddb_entry->fw_ddb_device_state == DDB_DS_SESSION_ACTIVE) {
>>  		atomic_set(&ddb_entry->state, DDB_STATE_ONLINE);
>> -		atomic_set(&ddb_entry->port_down_timer,
>> -			   ha->port_down_retry_count);
>>  		atomic_set(&ddb_entry->relogin_retry_count, 0);
>>  		atomic_set(&ddb_entry->relogin_timer, 0);
>>  		clear_bit(DF_RELOGIN,&ddb_entry->flags);
>> diff --git a/drivers/scsi/qla4xxx/ql4_mbx.c b/drivers/scsi/qla4xxx/ql4_mbx.c
>> index 75496fb..aa12ffd 100644
>> --- a/drivers/scsi/qla4xxx/ql4_mbx.c
>> +++ b/drivers/scsi/qla4xxx/ql4_mbx.c
>> @@ -287,7 +287,6 @@ qla4xxx_update_local_ifcb(struct scsi_qla_host *ha,
>>  	       min(sizeof(ha->alias), sizeof(init_fw_cb->Alias)));*/
>> 
>>  	/* Save Command Line Paramater info */
>> -	ha->port_down_retry_count = le16_to_cpu(init_fw_cb->conn_ka_timeout);
>>  	ha->discovery_wait = ql4xdiscoverywait;
>> 
>>  	if (ha->acb_version == ACB_SUPPORTED) {
>> diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
>> index 38b1d38..11a4bd1 100644
>> --- a/drivers/scsi/qla4xxx/ql4_os.c
>> +++ b/drivers/scsi/qla4xxx/ql4_os.c
>> @@ -155,7 +155,7 @@ static void qla4xxx_recovery_timedout(struct iscsi_cls_session *session)
>>  		DEBUG2(printk("scsi%ld: %s: index [%d] port down retry count "
>>  			      "of (%d) secs exhausted, marking device DEAD.\n",
>>  			      ha->host_no, __func__, ddb_entry->fw_ddb_index,
>> -			      ha->port_down_retry_count));
>> +			      ddb_entry->ka_timeout));
>> 
>>  		DEBUG2(printk("scsi%ld: %s: scheduling dpc routine - dpc "
>>  			      "flags = 0x%lx\n",
>> @@ -286,7 +286,7 @@ int qla4xxx_add_sess(struct ddb_entry *ddb_entry)
>>  {
>>  	int err;
>> 
>> -	ddb_entry->sess->recovery_tmo = ddb_entry->ha->port_down_retry_count;
>> +	ddb_entry->sess->recovery_tmo = ddb_entry->ka_timeout;
>>  	err = iscsi_add_session(ddb_entry->sess, ddb_entry->fw_ddb_index);
>>  	if (err) {
>>  		DEBUG2(printk(KERN_ERR "Could not add session.\n"));
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
  2010-06-21 23:35   ` Ravi Anand
@ 2010-06-22 18:12     ` Mike Christie
  2010-06-22 20:55       ` Ravi Anand
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Christie @ 2010-06-22 18:12 UTC (permalink / raw)
  To: Ravi Anand
  Cc: Vikas Chaudhary, james.bottomley@suse.de,
	linux-scsi@vger.kernel.org

On 06/21/2010 06:35 PM, Ravi Anand wrote:
> Mike,
>
> Actually, that's not true.
>

I think you are misunderstanding me.


> When we call iscsi_block_session(), session is already down and we are not
> sending any NOP.
>

I never said you were going to send a nop. I was saying that you guys 
said the ka_timeout is used to detemine when to send a nop.

You guys said:
<start>
It's time interval between connection keep-alive ping. When a connection
is idle for the connection keep-alive timeout interval, the ISP4xxx
sends an iSCSI NOP PDU to the other device that is part of the connection.
When the device responses to the NOP PDU, the connection remains open.
When the device fails to respond, the ISP4010 closes the connection
and informs the driver that the connection has gone down.
<end>

so I was saying if the session is down, nops cannot be sent so 
ka_timeout does not come into play at that time (it has already expired).

> FW closes connection after ka_timeout expire at FW level.  Then FW
> sends notification to driver for DDB state change which basically indicates
> that session is in failed state . After that driver mark device *missing* and call iscsi_block_sesstion().
> At this point iSCSI connection is already closed,  so FW is not sending
> any iSCSI NOP to target.

Yeah, that is what I was saying.

>
> We are just using value of ka_timeout in sess->recovery_tmo because
> we want to wait for some time for connection to comeback else
> mark device dead.

I agree 100% that you should have a timer to wait for the session to 
come back. libiscsi has the noop settings which is equivalent to your 
ka_timeout and then for the recovery_tmo we have the 
replacement/recovery_tmo setting. My point is that there is no 
requirement and no good reason to set sess->recovery_tmo based on the 
same setting.

I am saying ka_timeout is not used for the same operations as 
session->recovery_tmo. They handle two separate events, and should be 
two separate settings. Once ka_timeout has expired we have marked the 
session as down, and now we are on the recovery stage, so a new timer 
comes into play.

So ka_timeout is used to determine when we want to mark a session as bad.

session->recovery_tmo is used to determine when we want give up on the 
session.

They should be 2 different settings since testing a SAN/port can take 
different times than doing recovery of a session.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
  2010-06-22 18:12     ` Mike Christie
@ 2010-06-22 20:55       ` Ravi Anand
  2010-06-23 19:21         ` Mike Christie
  0 siblings, 1 reply; 7+ messages in thread
From: Ravi Anand @ 2010-06-22 20:55 UTC (permalink / raw)
  To: Mike Christie
  Cc: Vikas Chaudhary, james.bottomley@suse.de,
	linux-scsi@vger.kernel.org


On Jun 22, 2010, at 11:12 AM, Mike Christie wrote:

> On 06/21/2010 06:35 PM, Ravi Anand wrote:
>> Mike,
>> 
>> Actually, that's not true.
>> 
> 
> I think you are misunderstanding me.
> 
> 
>> When we call iscsi_block_session(), session is already down and we are not
>> sending any NOP.
>> 
> 
> I never said you were going to send a nop. I was saying that you guys 
> said the ka_timeout is used to detemine when to send a nop.
> 
> You guys said:
> <start>
> It's time interval between connection keep-alive ping. When a connection
> is idle for the connection keep-alive timeout interval, the ISP4xxx
> sends an iSCSI NOP PDU to the other device that is part of the connection.
> When the device responses to the NOP PDU, the connection remains open.
> When the device fails to respond, the ISP4010 closes the connection
> and informs the driver that the connection has gone down.
> <end>
> 
> so I was saying if the session is down, nops cannot be sent so 
> ka_timeout does not come into play at that time (it has already expired).
> 
>> FW closes connection after ka_timeout expire at FW level.  Then FW
>> sends notification to driver for DDB state change which basically indicates
>> that session is in failed state . After that driver mark device *missing* and call iscsi_block_sesstion().
>> At this point iSCSI connection is already closed,  so FW is not sending
>> any iSCSI NOP to target.
> 
> Yeah, that is what I was saying.
> 
>> 
>> We are just using value of ka_timeout in sess->recovery_tmo because
>> we want to wait for some time for connection to comeback else
>> mark device dead.
> 
> I agree 100% that you should have a timer to wait for the session to 
> come back. libiscsi has the noop settings which is equivalent to your 
> ka_timeout and then for the recovery_tmo we have the 
> replacement/recovery_tmo setting. My point is that there is no 
> requirement and no good reason to set sess->recovery_tmo based on the 
> same setting.
> 
> I am saying ka_timeout is not used for the same operations as 
> session->recovery_tmo. They handle two separate events, and should be 
> two separate settings. Once ka_timeout has expired we have marked the 
> session as down, and now we are on the recovery stage, so a new timer 
> comes into play.
> 
> So ka_timeout is used to determine when we want to mark a session as bad.
> 
> session->recovery_tmo is used to determine when we want give up on the 
> session.
> 
> They should be 2 different settings since testing a SAN/port can take 
> different times than doing recovery of a session.

Thanks for the clarification. What about if we provide a cmd line option
for user's to fine tune if needed, with default value set.
Sound's OK ? 

Anyway we will look into libiscsi code as well.

Thanks
ravi


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
  2010-06-22 20:55       ` Ravi Anand
@ 2010-06-23 19:21         ` Mike Christie
  2010-07-06  5:20           ` Vikas Chaudhary
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Christie @ 2010-06-23 19:21 UTC (permalink / raw)
  To: Ravi Anand
  Cc: Vikas Chaudhary, james.bottomley@suse.de,
	linux-scsi@vger.kernel.org

On 06/22/2010 03:55 PM, Ravi Anand wrote:
> Thanks for the clarification. What about if we provide a cmd line option
> for user's to fine tune if needed, with default value set.
> Sound's OK ?

iscsi recovery_tmo/replacement_tmo is the same as FC's fast_io_fail_tmo. 
So I think we want to make recovery_tmo/replacement_tmo's sysfs file 
writable, so tools like multipath/multipathd can set it like they do for 
fast_io_fail.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
  2010-06-23 19:21         ` Mike Christie
@ 2010-07-06  5:20           ` Vikas Chaudhary
  0 siblings, 0 replies; 7+ messages in thread
From: Vikas Chaudhary @ 2010-07-06  5:20 UTC (permalink / raw)
  To: Mike Christie
  Cc: james.bottomley@suse.de, linux-scsi@vger.kernel.org, Ravi Anand

>-----Original Message-----
>From: Mike Christie [mailto:michaelc@cs.wisc.edu]
>Sent: Thursday, June 24, 2010 12:52 AM
>To: Ravi Anand
>Cc: Vikas Chaudhary; james.bottomley@suse.de; linux-scsi@vger.kernel.org
>Subject: Re: [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo
>
>On 06/22/2010 03:55 PM, Ravi Anand wrote:
>> Thanks for the clarification. What about if we provide a cmd line option
>> for user's to fine tune if needed, with default value set.
>> Sound's OK ?
>
>iscsi recovery_tmo/replacement_tmo is the same as FC's fast_io_fail_tmo.
>So I think we want to make recovery_tmo/replacement_tmo's sysfs file
>writable, so tools like multipath/multipathd can set it like they do for
>fast_io_fail.

I added support in iscsi midlayer to modify recovery_tmo from sysfs.
I am sending rfc patches for the same in the next email.
Please review and let us know what you think.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-07-06  5:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-16 20:51 [PATCH 04/11] qla4xxx: set correct value in sess->recovery_tmo Vikas Chaudhary
2010-06-18 22:58 ` Mike Christie
2010-06-21 23:35   ` Ravi Anand
2010-06-22 18:12     ` Mike Christie
2010-06-22 20:55       ` Ravi Anand
2010-06-23 19:21         ` Mike Christie
2010-07-06  5:20           ` Vikas Chaudhary

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox