public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: James Smart <James.Smart@Emulex.Com>
Cc: Swen Schillig <swen@vnet.ibm.com>,
	James Bottomley <James.Bottomley@steeleye.com>,
	linux-scsi@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH 2/5] zfcp: Fix deadlock between zfcp ERP and SCSI
Date: Thu, 10 May 2007 09:49:02 +0200	[thread overview]
Message-ID: <20070510074902.GA7893@osiris.boeblingen.de.ibm.com> (raw)
In-Reply-To: <4640907B.3080409@emulex.com>

On Tue, May 08, 2007 at 11:00:11AM -0400, James Smart wrote:
> Curious why you are calling scsi_scan_target() or
> scsi_target_block()/scsi_target_unblock() directly. I would have
> thought the add/remove rport code would have done this for you,
> and it deals with all the flush conditions, etc.
> 
> -- james
> 
> Swen Schillig wrote:
> >From: Christof Schmitt <christof.schmitt@de.ibm.com>
> >The SCSI stack requires low level drivers to register and
> >unregister devices. For zfcp this leads to the situation where
> >zfcp calls the SCSI stack, the SCSI tries to scan the new device
> >and the scan SCSI command fails. This would require the zfcp erp,
> >but the erp thread is already blocked in the register call.
> >The fix is to make sure that the calls from the ERP thread to
> >the SCSI stack do not block the ERP thread. In detail:
> >1) Use a workqueue to avoid blocking of the scsi_scan_target calls.
> >2) When removing a unit make sure that no scsi_scan_target call is
> >   pending.
> >3) Replace scsi_flush_work with scsi_target_unblock. This avoids
> >   blocking and has the same result.

Reading the patch again, I think there is still a race:

> >+static void zfcp_erp_scsi_scan(struct work_struct *work)
> >+{
> >+	struct zfcp_erp_add_work *p =
> >+		container_of(work, struct zfcp_erp_add_work, work);
> >+	struct zfcp_unit *unit = p->unit;
> >+	struct fc_rport *rport = unit->port->rport;
> >+	scsi_scan_target(&rport->dev, 0, rport->scsi_target_id,
> >+			 unit->scsi_lun, 0);
> >+	atomic_clear_mask(ZFCP_STATUS_UNIT_SCSI_WORK_PENDING, &unit->status);
> >+	wake_up(&unit->scsi_scan_wq);
> >+	zfcp_unit_put(unit);
> >+	kfree(p);

This function gets executed via schedule_work() and therefore there is
nothing that prevents the rport to go. E.g. the following might happen:
function gets enqueued for execution via schedule_work(), adapter gets
shut down, call to fc_remort_port_delete(), port->rport gets deleted,
then zfcp_erp_scsi_scan() gets called and tries to derefence port->rport
which is NULL. Addressing exception is the result.

The patch below should fix that, but it changes the call from
scsi_scan_target() to scsi_add_device(). I think that should be ok.

Not-yet-signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 drivers/s390/scsi/zfcp_def.h |    1 +
 drivers/s390/scsi/zfcp_erp.c |   37 +++++++++++++++++++------------------
 2 files changed, 20 insertions(+), 18 deletions(-)

Index: linux-2.6/drivers/s390/scsi/zfcp_def.h
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_def.h
+++ linux-2.6/drivers/s390/scsi/zfcp_def.h
@@ -961,6 +961,7 @@ struct zfcp_port {
         atomic_t               erp_counter;
 	u32                    maxframe_size;
 	u32                    supported_classes;
+	unsigned int	       scsi_target_id;
 };
 
 /* the struct device sysfs_device must be at the beginning of this structure.
Index: linux-2.6/drivers/s390/scsi/zfcp_erp.c
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_erp.c
+++ linux-2.6/drivers/s390/scsi/zfcp_erp.c
@@ -1591,7 +1591,7 @@ zfcp_erp_strategy_check_adapter(struct z
 }
 
 struct zfcp_erp_add_work {
-	struct zfcp_unit  *unit;
+	struct zfcp_unit *unit;
 	struct work_struct work;
 };
 
@@ -1603,15 +1603,14 @@ struct zfcp_erp_add_work {
  */
 static void zfcp_erp_scsi_scan(struct work_struct *work)
 {
-	struct zfcp_erp_add_work *p =
-		container_of(work, struct zfcp_erp_add_work, work);
-	struct zfcp_unit *unit = p->unit;
-	struct fc_rport *rport = unit->port->rport;
-	scsi_scan_target(&rport->dev, 0, rport->scsi_target_id,
-			 unit->scsi_lun, 0);
-	atomic_clear_mask(ZFCP_STATUS_UNIT_SCSI_WORK_PENDING, &unit->status);
-	wake_up(&unit->scsi_scan_wq);
-	zfcp_unit_put(unit);
+	struct zfcp_erp_add_work *p;
+
+	p = container_of(work, struct zfcp_erp_add_work, work);
+	scsi_add_device(p->unit->port->adapter->scsi_host, 0,
+			p->unit->port->scsi_target_id, p->unit->scsi_lun);
+	atomic_clear_mask(ZFCP_STATUS_UNIT_SCSI_WORK_PENDING, &p->unit->status);
+	wake_up(&p->unit->scsi_scan_wq);
+	zfcp_unit_put(p->unit);
 	kfree(p);
 }
 
@@ -3161,25 +3160,27 @@ zfcp_erp_action_cleanup(int action, stru
 			break;
 		}
 
-		if ((result == ZFCP_ERP_SUCCEEDED)
-		    && !port->rport) {
+		if ((result == ZFCP_ERP_SUCCEEDED) && !port->rport) {
 			struct fc_rport_identifiers ids;
+			struct fc_rport *rport;
+
 			ids.node_name = port->wwnn;
 			ids.port_name = port->wwpn;
 			ids.port_id = port->d_id;
 			ids.roles = FC_RPORT_ROLE_FCP_TARGET;
-			port->rport =
-				fc_remote_port_add(adapter->scsi_host, 0, &ids);
-			if (!port->rport)
+			rport = fc_remote_port_add(adapter->scsi_host, 0, &ids);
+			if (!rport)
 				ZFCP_LOG_NORMAL("failed registration of rport"
 						"(adapter %s, wwpn=0x%016Lx)\n",
 						zfcp_get_busid_by_port(port),
 						port->wwpn);
 			else {
-				scsi_target_unblock(&port->rport->dev);
-				port->rport->maxframe_size = port->maxframe_size;
-				port->rport->supported_classes =
+				port->rport = rport;
+				port->scsi_target_id = rport->scsi_target_id;
+				rport->maxframe_size = port->maxframe_size;
+				rport->supported_classes =
 					port->supported_classes;
+				scsi_target_unblock(&port->rport->dev);
 			}
 		}
 		if ((result != ZFCP_ERP_SUCCEEDED) && port->rport) {

      parent reply	other threads:[~2007-05-10  7:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-08  9:16 [PATCH 2/5] zfcp: Fix deadlock between zfcp ERP and SCSI Swen Schillig
2007-05-08 15:00 ` James Smart
2007-05-08 15:32   ` Christof Schmitt
2007-05-10  7:49   ` Heiko Carstens [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070510074902.GA7893@osiris.boeblingen.de.ibm.com \
    --to=heiko.carstens@de.ibm.com \
    --cc=James.Bottomley@steeleye.com \
    --cc=James.Smart@Emulex.Com \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=swen@vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox