linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/5] zfcp fixes for 2.6.32-rc4
@ 2009-10-13  8:44 Christof Schmitt
  2009-10-13  8:44 ` [patch 1/5] zfcp: fix kfree handling in zfcp_init_device_setup Christof Schmitt
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-13  8:44 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens

James,

here is the (hopefully) last round of zfcp fixes for the 2.6.32 cycle.
The largest change is a fix for zfcp to handle storage devices that
return a wrong wwpn in the plogi payload. The patches apply on the
current linux-2.6 git tree (-rc4 + )

--
Christof Schmitt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 1/5] zfcp: fix kfree handling in zfcp_init_device_setup
  2009-10-13  8:44 [patch 0/5] zfcp fixes for 2.6.32-rc4 Christof Schmitt
@ 2009-10-13  8:44 ` Christof Schmitt
  2009-10-13  8:44 ` [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload Christof Schmitt
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-13  8:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Christof Schmitt

[-- Attachment #1: 705-zfcp-init-device-kfree.diff --]
[-- Type: text/plain, Size: 1524 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

The pointer that is allocated with kmalloc() is passed to strsep()
which modifies it. Later on the modified pointer value will be passed
to kfree. Save the original pointer and pass that one to kfree
instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_aux.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff -urpN linux-2.6/drivers/s390/scsi/zfcp_aux.c linux-2.6-patched/drivers/s390/scsi/zfcp_aux.c
--- linux-2.6/drivers/s390/scsi/zfcp_aux.c	2009-10-07 10:17:01.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_aux.c	2009-10-07 10:17:01.000000000 +0200
@@ -128,12 +128,13 @@ out_ccwdev:
 static void __init zfcp_init_device_setup(char *devstr)
 {
 	char *token;
-	char *str;
+	char *str, *str_saved;
 	char busid[ZFCP_BUS_ID_SIZE];
 	u64 wwpn, lun;
 
 	/* duplicate devstr and keep the original for sysfs presentation*/
-	str = kmalloc(strlen(devstr) + 1, GFP_KERNEL);
+	str_saved = kmalloc(strlen(devstr) + 1, GFP_KERNEL);
+	str = str_saved;
 	if (!str)
 		return;
 
@@ -152,12 +153,12 @@ static void __init zfcp_init_device_setu
 	if (!token || strict_strtoull(token, 0, (unsigned long long *) &lun))
 		goto err_out;
 
-	kfree(str);
+	kfree(str_saved);
 	zfcp_init_device_configure(busid, wwpn, lun);
 	return;
 
- err_out:
-	kfree(str);
+err_out:
+	kfree(str_saved);
 	pr_err("%s is not a valid SCSI device\n", devstr);
 }
 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload
  2009-10-13  8:44 [patch 0/5] zfcp fixes for 2.6.32-rc4 Christof Schmitt
  2009-10-13  8:44 ` [patch 1/5] zfcp: fix kfree handling in zfcp_init_device_setup Christof Schmitt
@ 2009-10-13  8:44 ` Christof Schmitt
  2009-10-13 16:51   ` Joe Eykholt
  2009-10-14  9:00   ` [updated patch (fix comment)][patch " Christof Schmitt
  2009-10-13  8:44 ` [patch 3/5] zfcp: Warn about storage devices with broken PLOGI data Christof Schmitt
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-13  8:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Christof Schmitt

[-- Attachment #1: 706-zfcp-wwpn-mismatch.diff --]
[-- Type: text/plain, Size: 4656 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

For ports, zfcp gets the DID from the FC nameserver and tries to open
the port. If the open succeeds, zfcp compares the WWPN from the
nameserver with the WWPN in the PLOGI payload. In case of a mismatch,
zfcp assumes that the DID of the port just changed and we opened the
wrong port. This means that zfcp has to forget the DID, lookup the DID
again and retry.

This error case had a problem that zfcp forgets the DID, but never
looks up a new one, stalling the ERP in this case. Fix this by
triggering the DID lookup and properly exit from the ERP. The DID
lookup will trigger a new ERP action.

Also ensure when trying to open the port again with the new DID, first
close the open port, even in the NOESC case.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_erp.c |   22 ++++++++++------------
 drivers/s390/scsi/zfcp_ext.h |    1 +
 drivers/s390/scsi/zfcp_fc.c  |   11 +++++++++++
 3 files changed, 22 insertions(+), 12 deletions(-)

diff -urpN linux-2.6/drivers/s390/scsi/zfcp_erp.c linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c
--- linux-2.6/drivers/s390/scsi/zfcp_erp.c	2009-10-07 10:16:33.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c	2009-10-07 10:17:01.000000000 +0200
@@ -858,10 +858,7 @@ static int zfcp_erp_port_strategy_open_c
 		if (fc_host_port_type(adapter->scsi_host) == FC_PORTTYPE_PTP)
 			return zfcp_erp_open_ptp_port(act);
 		if (!port->d_id) {
-			zfcp_port_get(port);
-			if (!queue_work(adapter->work_queue,
-					&port->gid_pn_work))
-				zfcp_port_put(port);
+			zfcp_fc_trigger_did_lookup(port);
 			return ZFCP_ERP_EXIT;
 		}
 		return zfcp_erp_port_strategy_open_port(act);
@@ -869,12 +866,11 @@ static int zfcp_erp_port_strategy_open_c
 	case ZFCP_ERP_STEP_PORT_OPENING:
 		/* D_ID might have changed during open */
 		if (p_status & ZFCP_STATUS_COMMON_OPEN) {
-			if (port->d_id)
-				return ZFCP_ERP_SUCCEEDED;
-			else {
-				act->step = ZFCP_ERP_STEP_PORT_CLOSING;
-				return ZFCP_ERP_CONTINUES;
+			if (!port->d_id) {
+				zfcp_fc_trigger_did_lookup(port);
+				return ZFCP_ERP_EXIT;
 			}
+			return ZFCP_ERP_SUCCEEDED;
 		}
 		if (port->d_id && !(p_status & ZFCP_STATUS_COMMON_NOESC)) {
 			port->d_id = 0;
@@ -889,19 +885,21 @@ static int zfcp_erp_port_strategy_open_c
 static int zfcp_erp_port_strategy(struct zfcp_erp_action *erp_action)
 {
 	struct zfcp_port *port = erp_action->port;
+	int p_status = atomic_read(&port->status);
 
-	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_NOESC)
+	if ((p_status & ZFCP_STATUS_COMMON_NOESC) &&
+	    !(p_status & ZFCP_STATUS_COMMON_OPEN))
 		goto close_init_done;
 
 	switch (erp_action->step) {
 	case ZFCP_ERP_STEP_UNINITIALIZED:
 		zfcp_erp_port_strategy_clearstati(port);
-		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)
+		if (p_status & ZFCP_STATUS_COMMON_OPEN)
 			return zfcp_erp_port_strategy_close(erp_action);
 		break;
 
 	case ZFCP_ERP_STEP_PORT_CLOSING:
-		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)
+		if (p_status & ZFCP_STATUS_COMMON_OPEN)
 			return ZFCP_ERP_FAILED;
 		break;
 	}
diff -urpN linux-2.6/drivers/s390/scsi/zfcp_ext.h linux-2.6-patched/drivers/s390/scsi/zfcp_ext.h
--- linux-2.6/drivers/s390/scsi/zfcp_ext.h	2009-10-07 10:16:59.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_ext.h	2009-10-07 10:17:01.000000000 +0200
@@ -96,6 +96,7 @@ extern int zfcp_fc_scan_ports(struct zfc
 extern void _zfcp_fc_scan_ports_later(struct work_struct *);
 extern void zfcp_fc_incoming_els(struct zfcp_fsf_req *);
 extern void zfcp_fc_port_did_lookup(struct work_struct *);
+extern void zfcp_fc_trigger_did_lookup(struct zfcp_port *);
 extern void zfcp_fc_plogi_evaluate(struct zfcp_port *, struct fsf_plogi *);
 extern void zfcp_fc_test_link(struct zfcp_port *);
 extern void zfcp_fc_link_test_work(struct work_struct *);
diff -urpN linux-2.6/drivers/s390/scsi/zfcp_fc.c linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c
--- linux-2.6/drivers/s390/scsi/zfcp_fc.c	2009-10-07 10:16:33.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c	2009-10-07 10:17:01.000000000 +0200
@@ -361,6 +361,17 @@ out:
 }
 
 /**
+ * zfcp_fc_trigger_did_ookup - trigger the d_id lookup using a GPN_FT request
+ * @port: The zfcp_port to lookup the d_id for.
+ */
+void zfcp_fc_trigger_did_lookup(struct zfcp_port *port)
+{
+	zfcp_port_get(port);
+	if (!queue_work(port->adapter->work_queue, &port->gid_pn_work))
+		zfcp_port_put(port);
+}
+
+/**
  * zfcp_fc_plogi_evaluate - evaluate PLOGI playload
  * @port: zfcp_port structure
  * @plogi: plogi payload


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 3/5] zfcp: Warn about storage devices with broken PLOGI data
  2009-10-13  8:44 [patch 0/5] zfcp fixes for 2.6.32-rc4 Christof Schmitt
  2009-10-13  8:44 ` [patch 1/5] zfcp: fix kfree handling in zfcp_init_device_setup Christof Schmitt
  2009-10-13  8:44 ` [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload Christof Schmitt
@ 2009-10-13  8:44 ` Christof Schmitt
  2009-10-13  8:44 ` [patch 4/5] zfcp: Fix timer initialization for ct and els requests Christof Schmitt
  2009-10-13  8:44 ` [patch 5/5] zfcp: Flush SCSI registration work when adding unit Christof Schmitt
  4 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-13  8:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Christof Schmitt

[-- Attachment #1: 713-zfcp-broken-plogi.diff --]
[-- Type: text/plain, Size: 1668 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

After opening a remote port zfcp checks if the WWPN returned in the
PLOGI maches the WWPN of the port that should have been opened. On a
mismatch zfcp assumes that the DID just changed, queries the FC
nameserver and tries again. If the situation persists the erp will
give up.

With this strategy, if the remote port always returns the wrong PLOGI
data, the remote port will not be opened. Introduce a warning, so that
the system administrator knows why the remote port is not being opened
and to have a pointer to investigate the problem on the storage
system.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_fsf.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/s390/scsi/zfcp_fsf.c	2009-10-12 09:56:22.000000000 +0200
+++ b/drivers/s390/scsi/zfcp_fsf.c	2009-10-12 09:56:27.000000000 +0200
@@ -1475,9 +1475,16 @@ static void zfcp_fsf_open_port_handler(s
 		plogi = (struct fsf_plogi *) req->qtcb->bottom.support.els;
 		if (req->qtcb->bottom.support.els1_length >=
 		    FSF_PLOGI_MIN_LEN) {
-			if (plogi->serv_param.wwpn != port->wwpn)
+			if (plogi->serv_param.wwpn != port->wwpn) {
 				port->d_id = 0;
-			else {
+				dev_warn(&port->adapter->ccw_device->dev,
+					 "A port opened with WWPN 0x%016Lx "
+					 "returned data that identifies it as "
+					 "WWPN 0x%016Lx\n",
+					 (unsigned long long) port->wwpn,
+					 (unsigned long long)
+					  plogi->serv_param.wwpn);
+			} else {
 				port->wwnn = plogi->serv_param.wwnn;
 				zfcp_fc_plogi_evaluate(port, plogi);
 			}


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 4/5] zfcp: Fix timer initialization for ct and els requests
  2009-10-13  8:44 [patch 0/5] zfcp fixes for 2.6.32-rc4 Christof Schmitt
                   ` (2 preceding siblings ...)
  2009-10-13  8:44 ` [patch 3/5] zfcp: Warn about storage devices with broken PLOGI data Christof Schmitt
@ 2009-10-13  8:44 ` Christof Schmitt
  2009-10-13  8:44 ` [patch 5/5] zfcp: Flush SCSI registration work when adding unit Christof Schmitt
  4 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-13  8:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Christof Schmitt

[-- Attachment #1: 715-zfcp-timer-init.diff --]
[-- Type: text/plain, Size: 808 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

Add HZ since the start_timer function expects jiffies, not seconds.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_fsf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_fsf.c	2009-10-12 09:56:27.000000000 +0200
+++ b/drivers/s390/scsi/zfcp_fsf.c	2009-10-12 09:56:31.000000000 +0200
@@ -1079,7 +1079,7 @@ static int zfcp_fsf_setup_ct_els(struct 
 	/* common settings for ct/gs and els requests */
 	req->qtcb->bottom.support.service_class = FSF_CLASS_3;
 	req->qtcb->bottom.support.timeout = 2 * R_A_TOV;
-	zfcp_fsf_start_timer(req, 2 * R_A_TOV + 10);
+	zfcp_fsf_start_timer(req, (2 * R_A_TOV + 10) * HZ);
 
 	return 0;
 }


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 5/5] zfcp: Flush SCSI registration work when adding unit
  2009-10-13  8:44 [patch 0/5] zfcp fixes for 2.6.32-rc4 Christof Schmitt
                   ` (3 preceding siblings ...)
  2009-10-13  8:44 ` [patch 4/5] zfcp: Fix timer initialization for ct and els requests Christof Schmitt
@ 2009-10-13  8:44 ` Christof Schmitt
  4 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-13  8:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Christof Schmitt

[-- Attachment #1: 718-zfcp-flush-work.diff --]
[-- Type: text/plain, Size: 1064 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

When configuring a LUN for use in zfcp, flush the SCSI work to ensure
the SCSI device has been created before returning. This means that a
configuration procedure can run these commands in a script and the
SCSI device is available immediately after the unit_add:

echo 1 > /sys/bus/ccw/drivers/zfcp/0.0.181d/online
echo 0x401040C300000000 > \
        /sys/bus/ccw/drivers/zfcp/0.0.181d/0x500507630313c562/unit_add
lsscsi

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_sysfs.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/s390/scsi/zfcp_sysfs.c	2009-10-07 15:54:50.000000000 +0200
+++ b/drivers/s390/scsi/zfcp_sysfs.c	2009-10-09 14:21:14.000000000 +0200
@@ -224,6 +224,7 @@ static ssize_t zfcp_sysfs_unit_add_store
 
 	zfcp_erp_unit_reopen(unit, 0, "syuas_1", NULL);
 	zfcp_erp_wait(unit->port->adapter);
+	flush_work(&unit->scsi_work);
 	zfcp_unit_put(unit);
 out:
 	mutex_unlock(&zfcp_data.config_mutex);


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload
  2009-10-13  8:44 ` [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload Christof Schmitt
@ 2009-10-13 16:51   ` Joe Eykholt
  2009-10-13 18:09     ` James Smart
  2009-10-14  8:42     ` Christof Schmitt
  2009-10-14  9:00   ` [updated patch (fix comment)][patch " Christof Schmitt
  1 sibling, 2 replies; 10+ messages in thread
From: Joe Eykholt @ 2009-10-13 16:51 UTC (permalink / raw)
  To: Christof Schmitt
  Cc: James Bottomley, linux-scsi, linux-s390, schwidefsky,
	heiko.carstens

Christof Schmitt wrote:
> From: Christof Schmitt <christof.schmitt@de.ibm.com>
> 
> For ports, zfcp gets the DID from the FC nameserver and tries to open
> the port. If the open succeeds, zfcp compares the WWPN from the
> nameserver with the WWPN in the PLOGI payload. In case of a mismatch,
> zfcp assumes that the DID of the port just changed and we opened the
> wrong port. This means that zfcp has to forget the DID, lookup the DID
> again and retry.

Does this happen very often?  Would you get an RSCN if a target's WWPN changed?
Just wondering how this happens and whether libfc needs similar defenses.
Is it due to a broken target?

See below for a minor typo.

> This error case had a problem that zfcp forgets the DID, but never
> looks up a new one, stalling the ERP in this case. Fix this by
> triggering the DID lookup and properly exit from the ERP. The DID
> lookup will trigger a new ERP action.
> 
> Also ensure when trying to open the port again with the new DID, first
> close the open port, even in the NOESC case.
> 
> Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
> ---
> 
>  drivers/s390/scsi/zfcp_erp.c |   22 ++++++++++------------
>  drivers/s390/scsi/zfcp_ext.h |    1 +
>  drivers/s390/scsi/zfcp_fc.c  |   11 +++++++++++
>  3 files changed, 22 insertions(+), 12 deletions(-)
> 
> diff -urpN linux-2.6/drivers/s390/scsi/zfcp_erp.c linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c
> --- linux-2.6/drivers/s390/scsi/zfcp_erp.c	2009-10-07 10:16:33.000000000 +0200
> +++ linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c	2009-10-07 10:17:01.000000000 +0200
> @@ -858,10 +858,7 @@ static int zfcp_erp_port_strategy_open_c
>  		if (fc_host_port_type(adapter->scsi_host) == FC_PORTTYPE_PTP)
>  			return zfcp_erp_open_ptp_port(act);
>  		if (!port->d_id) {
> -			zfcp_port_get(port);
> -			if (!queue_work(adapter->work_queue,
> -					&port->gid_pn_work))
> -				zfcp_port_put(port);
> +			zfcp_fc_trigger_did_lookup(port);
>  			return ZFCP_ERP_EXIT;
>  		}
>  		return zfcp_erp_port_strategy_open_port(act);
> @@ -869,12 +866,11 @@ static int zfcp_erp_port_strategy_open_c
>  	case ZFCP_ERP_STEP_PORT_OPENING:
>  		/* D_ID might have changed during open */
>  		if (p_status & ZFCP_STATUS_COMMON_OPEN) {
> -			if (port->d_id)
> -				return ZFCP_ERP_SUCCEEDED;
> -			else {
> -				act->step = ZFCP_ERP_STEP_PORT_CLOSING;
> -				return ZFCP_ERP_CONTINUES;
> +			if (!port->d_id) {
> +				zfcp_fc_trigger_did_lookup(port);
> +				return ZFCP_ERP_EXIT;
>  			}
> +			return ZFCP_ERP_SUCCEEDED;
>  		}
>  		if (port->d_id && !(p_status & ZFCP_STATUS_COMMON_NOESC)) {
>  			port->d_id = 0;
> @@ -889,19 +885,21 @@ static int zfcp_erp_port_strategy_open_c
>  static int zfcp_erp_port_strategy(struct zfcp_erp_action *erp_action)
>  {
>  	struct zfcp_port *port = erp_action->port;
> +	int p_status = atomic_read(&port->status);
>  
> -	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_NOESC)
> +	if ((p_status & ZFCP_STATUS_COMMON_NOESC) &&
> +	    !(p_status & ZFCP_STATUS_COMMON_OPEN))
>  		goto close_init_done;
>  
>  	switch (erp_action->step) {
>  	case ZFCP_ERP_STEP_UNINITIALIZED:
>  		zfcp_erp_port_strategy_clearstati(port);
> -		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)
> +		if (p_status & ZFCP_STATUS_COMMON_OPEN)
>  			return zfcp_erp_port_strategy_close(erp_action);
>  		break;
>  
>  	case ZFCP_ERP_STEP_PORT_CLOSING:
> -		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)
> +		if (p_status & ZFCP_STATUS_COMMON_OPEN)
>  			return ZFCP_ERP_FAILED;
>  		break;
>  	}
> diff -urpN linux-2.6/drivers/s390/scsi/zfcp_ext.h linux-2.6-patched/drivers/s390/scsi/zfcp_ext.h
> --- linux-2.6/drivers/s390/scsi/zfcp_ext.h	2009-10-07 10:16:59.000000000 +0200
> +++ linux-2.6-patched/drivers/s390/scsi/zfcp_ext.h	2009-10-07 10:17:01.000000000 +0200
> @@ -96,6 +96,7 @@ extern int zfcp_fc_scan_ports(struct zfc
>  extern void _zfcp_fc_scan_ports_later(struct work_struct *);
>  extern void zfcp_fc_incoming_els(struct zfcp_fsf_req *);
>  extern void zfcp_fc_port_did_lookup(struct work_struct *);
> +extern void zfcp_fc_trigger_did_lookup(struct zfcp_port *);
>  extern void zfcp_fc_plogi_evaluate(struct zfcp_port *, struct fsf_plogi *);
>  extern void zfcp_fc_test_link(struct zfcp_port *);
>  extern void zfcp_fc_link_test_work(struct work_struct *);
> diff -urpN linux-2.6/drivers/s390/scsi/zfcp_fc.c linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c
> --- linux-2.6/drivers/s390/scsi/zfcp_fc.c	2009-10-07 10:16:33.000000000 +0200
> +++ linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c	2009-10-07 10:17:01.000000000 +0200
> @@ -361,6 +361,17 @@ out:
>  }
>  
>  /**
> + * zfcp_fc_trigger_did_ookup - trigger the d_id lookup using a GPN_FT request

s/ook/look/

I just happened to notice that.  I didn't review the rest.

> + * @port: The zfcp_port to lookup the d_id for.
> + */
> +void zfcp_fc_trigger_did_lookup(struct zfcp_port *port)
> +{
> +	zfcp_port_get(port);
> +	if (!queue_work(port->adapter->work_queue, &port->gid_pn_work))
> +		zfcp_port_put(port);
> +}
> +
> +/**
>   * zfcp_fc_plogi_evaluate - evaluate PLOGI playload
>   * @port: zfcp_port structure
>   * @plogi: plogi payload
> 
> --

	Cheers,
	Joe

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload
  2009-10-13 16:51   ` Joe Eykholt
@ 2009-10-13 18:09     ` James Smart
  2009-10-14  8:42     ` Christof Schmitt
  1 sibling, 0 replies; 10+ messages in thread
From: James Smart @ 2009-10-13 18:09 UTC (permalink / raw)
  To: Joe Eykholt
  Cc: Christof Schmitt, James Bottomley, linux-scsi@vger.kernel.org,
	linux-s390@vger.kernel.org, schwidefsky@de.ibm.com,
	heiko.carstens@de.ibm.com



Joe Eykholt wrote:
> Christof Schmitt wrote:
>   
>> From: Christof Schmitt <christof.schmitt@de.ibm.com>
>>
>> For ports, zfcp gets the DID from the FC nameserver and tries to open
>> the port. If the open succeeds, zfcp compares the WWPN from the
>> nameserver with the WWPN in the PLOGI payload. In case of a mismatch,
>> zfcp assumes that the DID of the port just changed and we opened the
>> wrong port. This means that zfcp has to forget the DID, lookup the DID
>> again and retry.
>>     
>
> Does this happen very often?  Would you get an RSCN if a target's WWPN changed?
>   
You should - as it is supposed to re-login with the fabric, thus the 
fabric login will update the nameserver, and nameserver will generate 
the RSCN. But, there's no guarantee as to how quickly you will see the 
RSCN.   If you were on FC-AL, this happens all the time and never w/ an 
RSCN.

> Just wondering how this happens and whether libfc needs similar defenses.
> Is it due to a broken target?
>   

Libfc needs a similar defense.  You may not need to compare against the 
"nameserver" values, but you had better compare against same N_Port_ID, 
different WWPN. or vice-versa.   Its common to have RSCN events delayed, 
allowing discovery to occur in between events that may be outstanding 
(e.g. tgt plugs in - RSCN-1 event, tgt moves to another port (user error 
-kicks out and replugs) or an array failover such that WWPN moves to new 
switch port/N_Port_ID - RSCN-2 event;  and your ADISC/PLOGI based on 
RSCN-1 arrives and is responded to sooner than the nameserver propagates 
its updates and you receive RSCN-2)

-- james s


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload
  2009-10-13 16:51   ` Joe Eykholt
  2009-10-13 18:09     ` James Smart
@ 2009-10-14  8:42     ` Christof Schmitt
  1 sibling, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-14  8:42 UTC (permalink / raw)
  To: Joe Eykholt
  Cc: James Bottomley, linux-scsi, linux-s390, schwidefsky,
	heiko.carstens

On Tue, Oct 13, 2009 at 09:51:39AM -0700, Joe Eykholt wrote:
>> For ports, zfcp gets the DID from the FC nameserver and tries to open
>> the port. If the open succeeds, zfcp compares the WWPN from the
>> nameserver with the WWPN in the PLOGI payload. In case of a mismatch,
>> zfcp assumes that the DID of the port just changed and we opened the
>> wrong port. This means that zfcp has to forget the DID, lookup the DID
>> again and retry.
>
> Does this happen very often?  Would you get an RSCN if a target's WWPN changed?
> Just wondering how this happens and whether libfc needs similar defenses.
> Is it due to a broken target?

zfcp first gets all DIDs from the nameserver and then triggers to
login to all ports. A change could happen at any time. With the RSCNs
being asynchronous to the login, we have to check if the port is
really the one we wanted to login to. James S. pointed out some more
details.

In this particular case i am seeing in out test environment, the
storage target always returns a wrong WWPN. I would recommend that any
check being done also considers targets that return wrong data to
avoid endless loops and such.

> See below for a minor typo.

[...]

>> diff -urpN linux-2.6/drivers/s390/scsi/zfcp_fc.c linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c
>> --- linux-2.6/drivers/s390/scsi/zfcp_fc.c	2009-10-07 10:16:33.000000000 +0200
>> +++ linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c	2009-10-07 10:17:01.000000000 +0200
>> @@ -361,6 +361,17 @@ out:
>>  }
>>   /**
>> + * zfcp_fc_trigger_did_ookup - trigger the d_id lookup using a GPN_FT request
>
> s/ook/look/
>
> I just happened to notice that.  I didn't review the rest.

Thanks for catching this. And looking at the line, it is supposed to
read GID_PN and not GPN_FT.

I will resend the patch with the updated comment.

--
Christof

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [updated patch (fix comment)][patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload
  2009-10-13  8:44 ` [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload Christof Schmitt
  2009-10-13 16:51   ` Joe Eykholt
@ 2009-10-14  9:00   ` Christof Schmitt
  1 sibling, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2009-10-14  9:00 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens

From: Christof Schmitt <christof.schmitt@de.ibm.com>

For ports, zfcp gets the DID from the FC nameserver and tries to open
the port. If the open succeeds, zfcp compares the WWPN from the
nameserver with the WWPN in the PLOGI payload. In case of a mismatch,
zfcp assumes that the DID of the port just changed and we opened the
wrong port. This means that zfcp has to forget the DID, lookup the DID
again and retry.

This error case had a problem that zfcp forgets the DID, but never
looks up a new one, stalling the ERP in this case. Fix this by
triggering the DID lookup and properly exit from the ERP. The DID
lookup will trigger a new ERP action.

Also ensure when trying to open the port again with the new DID, first
close the open port, even in the NOESC case.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_erp.c |   22 ++++++++++------------
 drivers/s390/scsi/zfcp_ext.h |    1 +
 drivers/s390/scsi/zfcp_fc.c  |   11 +++++++++++
 3 files changed, 22 insertions(+), 12 deletions(-)

diff -urpN linux-2.6/drivers/s390/scsi/zfcp_erp.c linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c
--- linux-2.6/drivers/s390/scsi/zfcp_erp.c	2009-10-14 10:41:56.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c	2009-10-14 10:42:20.000000000 +0200
@@ -858,10 +858,7 @@ static int zfcp_erp_port_strategy_open_c
 		if (fc_host_port_type(adapter->scsi_host) == FC_PORTTYPE_PTP)
 			return zfcp_erp_open_ptp_port(act);
 		if (!port->d_id) {
-			zfcp_port_get(port);
-			if (!queue_work(adapter->work_queue,
-					&port->gid_pn_work))
-				zfcp_port_put(port);
+			zfcp_fc_trigger_did_lookup(port);
 			return ZFCP_ERP_EXIT;
 		}
 		return zfcp_erp_port_strategy_open_port(act);
@@ -869,12 +866,11 @@ static int zfcp_erp_port_strategy_open_c
 	case ZFCP_ERP_STEP_PORT_OPENING:
 		/* D_ID might have changed during open */
 		if (p_status & ZFCP_STATUS_COMMON_OPEN) {
-			if (port->d_id)
-				return ZFCP_ERP_SUCCEEDED;
-			else {
-				act->step = ZFCP_ERP_STEP_PORT_CLOSING;
-				return ZFCP_ERP_CONTINUES;
+			if (!port->d_id) {
+				zfcp_fc_trigger_did_lookup(port);
+				return ZFCP_ERP_EXIT;
 			}
+			return ZFCP_ERP_SUCCEEDED;
 		}
 		if (port->d_id && !(p_status & ZFCP_STATUS_COMMON_NOESC)) {
 			port->d_id = 0;
@@ -889,19 +885,21 @@ static int zfcp_erp_port_strategy_open_c
 static int zfcp_erp_port_strategy(struct zfcp_erp_action *erp_action)
 {
 	struct zfcp_port *port = erp_action->port;
+	int p_status = atomic_read(&port->status);
 
-	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_NOESC)
+	if ((p_status & ZFCP_STATUS_COMMON_NOESC) &&
+	    !(p_status & ZFCP_STATUS_COMMON_OPEN))
 		goto close_init_done;
 
 	switch (erp_action->step) {
 	case ZFCP_ERP_STEP_UNINITIALIZED:
 		zfcp_erp_port_strategy_clearstati(port);
-		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)
+		if (p_status & ZFCP_STATUS_COMMON_OPEN)
 			return zfcp_erp_port_strategy_close(erp_action);
 		break;
 
 	case ZFCP_ERP_STEP_PORT_CLOSING:
-		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)
+		if (p_status & ZFCP_STATUS_COMMON_OPEN)
 			return ZFCP_ERP_FAILED;
 		break;
 	}
diff -urpN linux-2.6/drivers/s390/scsi/zfcp_ext.h linux-2.6-patched/drivers/s390/scsi/zfcp_ext.h
--- linux-2.6/drivers/s390/scsi/zfcp_ext.h	2009-10-14 10:41:56.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_ext.h	2009-10-14 10:42:20.000000000 +0200
@@ -96,6 +96,7 @@ extern int zfcp_fc_scan_ports(struct zfc
 extern void _zfcp_fc_scan_ports_later(struct work_struct *);
 extern void zfcp_fc_incoming_els(struct zfcp_fsf_req *);
 extern void zfcp_fc_port_did_lookup(struct work_struct *);
+extern void zfcp_fc_trigger_did_lookup(struct zfcp_port *);
 extern void zfcp_fc_plogi_evaluate(struct zfcp_port *, struct fsf_plogi *);
 extern void zfcp_fc_test_link(struct zfcp_port *);
 extern void zfcp_fc_link_test_work(struct work_struct *);
diff -urpN linux-2.6/drivers/s390/scsi/zfcp_fc.c linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c
--- linux-2.6/drivers/s390/scsi/zfcp_fc.c	2009-10-14 10:41:56.000000000 +0200
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_fc.c	2009-10-14 10:42:20.000000000 +0200
@@ -361,6 +361,17 @@ out:
 }
 
 /**
+ * zfcp_fc_trigger_did_lookup - trigger the d_id lookup using a GID_PN request
+ * @port: The zfcp_port to lookup the d_id for.
+ */
+void zfcp_fc_trigger_did_lookup(struct zfcp_port *port)
+{
+	zfcp_port_get(port);
+	if (!queue_work(port->adapter->work_queue, &port->gid_pn_work))
+		zfcp_port_put(port);
+}
+
+/**
  * zfcp_fc_plogi_evaluate - evaluate PLOGI playload
  * @port: zfcp_port structure
  * @plogi: plogi payload

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-10-14  9:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-13  8:44 [patch 0/5] zfcp fixes for 2.6.32-rc4 Christof Schmitt
2009-10-13  8:44 ` [patch 1/5] zfcp: fix kfree handling in zfcp_init_device_setup Christof Schmitt
2009-10-13  8:44 ` [patch 2/5] zfcp: Handle WWPN mismatch in PLOGI payload Christof Schmitt
2009-10-13 16:51   ` Joe Eykholt
2009-10-13 18:09     ` James Smart
2009-10-14  8:42     ` Christof Schmitt
2009-10-14  9:00   ` [updated patch (fix comment)][patch " Christof Schmitt
2009-10-13  8:44 ` [patch 3/5] zfcp: Warn about storage devices with broken PLOGI data Christof Schmitt
2009-10-13  8:44 ` [patch 4/5] zfcp: Fix timer initialization for ct and els requests Christof Schmitt
2009-10-13  8:44 ` [patch 5/5] zfcp: Flush SCSI registration work when adding unit Christof Schmitt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).