From: Michael Reed <mdr@sgi.com>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: James.Smart@Emulex.Com, linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] fc transport: resolve scan vs delete deadlocks
Date: Mon, 12 Jun 2006 12:50:20 -0500
Message-ID: <448DA95C.70802@sgi.com>
In-Reply-To: <1147368429.18517.3.camel@localhost.localdomain>
Will this patch be integrated into scsi-misc or another tree at some point?
Thanks,
Mike
James Smart wrote:
> In a prior posting to linux-scsi on the fc transport and workq
> deadlocks, we noted a second error that did not have a patch:
> http://marc.theaimsgroup.com/?l=linux-scsi&m=114467847711383&w=2
> - There's a deadlock where scsi_remove_target() has to sit behind
> scsi_scan_target() due to contention over the scan_lock().
>
> Subsequently we posted a request for comments about the deadlock:
> http://marc.theaimsgroup.com/?l=linux-scsi&m=114469358829500&w=2
>
> This posting resolves the second error. Here's what we now understand,
> and are implementing:
>
> If the lldd deletes the rport while a scan is active, the sdev's queue
> is blocked, which stops the issuing of commands associated with the scan.
> The scan then stalls, and does so with the shost->scan_mutex held. If
> any further scan or delete request is made on the host at this point, it
> will stall waiting for the scan_mutex.
>
> For the FC transport, we queue all delete work to a single workq.
> So things worked fine when competing with the scan, as long as the
> target blocking the scan was also the target at the top of our delete
> workq, since the delete workq routine always unblocked the target just
> prior to requesting the delete. Unfortunately, if the top of our delete
> workq was for a different target, we deadlocked. Additionally, if the
> target blocking the scan returned, we were unblocking it in the scan
> workq routine, which won't execute until the existing stalled scan work
> item completes (i.e., we were re-scheduling it while it was in the midst
> of its execution).
>
> This patch moves the unblock out of the workq routines and into the
> context that schedules the work. This ensures that, at some point, we
> will unblock the target that is blocking the scan. Please note, however,
> that the stall may still occur while we wait for the transport to time
> out an unblock on a target. Worst case, this is bounded by the
> transport's dev_loss_tmo (default: 30 seconds).
>
>
> Finally, Michael Reed deserves the credit for the bulk of this patch,
> its analysis, and its testing. Thank you for your help.
>
> -- james s
>
> Note: this patch supersedes a previous preliminary post by Michael Reed
> http://marc.theaimsgroup.com/?l=linux-scsi&m=114675967824701&w=2
>
> Note: the request-for-comments statements about the grossness of the
> scan_mutex still stand.
>
>
>
> Signed-off-by: Michael Reed <mdr@sgi.com>
> Signed-off-by: James Smart <james.smart@emulex.com>
>
>
> diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> --- a/drivers/scsi/scsi_transport_fc.c 2006-05-11 10:43:31.000000000 -0400
> +++ b/drivers/scsi/scsi_transport_fc.c 2006-05-11 12:04:36.000000000 -0400
> @@ -1284,7 +1284,9 @@ EXPORT_SYMBOL(fc_release_transport);
> * @work: Work to queue for execution.
> *
> * Return value:
> - * 0 on success / != 0 for error
> + * 1 - work queued for execution
> + * 0 - work is already queued
> + * -EINVAL - work queue doesn't exist
> **/
> static int
> fc_queue_work(struct Scsi_Host *shost, struct work_struct *work)
> @@ -1434,8 +1436,6 @@ fc_starget_delete(void *data)
> struct Scsi_Host *shost = rport_to_shost(rport);
> unsigned long flags;
>
> - scsi_target_unblock(&rport->dev);
> -
> spin_lock_irqsave(shost->host_lock, flags);
> if (rport->flags & FC_RPORT_DEVLOSS_PENDING) {
> spin_unlock_irqrestore(shost->host_lock, flags);
> @@ -1707,6 +1707,8 @@ fc_remote_port_add(struct Scsi_Host *sho
>
> spin_unlock_irqrestore(shost->host_lock, flags);
>
> + scsi_target_unblock(&rport->dev);
> +
> return rport;
> }
> }
> @@ -1762,9 +1764,10 @@ fc_remote_port_add(struct Scsi_Host *sho
> /* initiate a scan of the target */
> rport->flags |= FC_RPORT_SCAN_PENDING;
> scsi_queue_work(shost, &rport->scan_work);
> - }
> -
> - spin_unlock_irqrestore(shost->host_lock, flags);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> + scsi_target_unblock(&rport->dev);
> + } else
> + spin_unlock_irqrestore(shost->host_lock, flags);
>
> return rport;
> }
> @@ -1938,6 +1941,7 @@ fc_remote_port_rolechg(struct fc_rport
> rport->flags |= FC_RPORT_SCAN_PENDING;
> scsi_queue_work(shost, &rport->scan_work);
> spin_unlock_irqrestore(shost->host_lock, flags);
> + scsi_target_unblock(&rport->dev);
> }
> }
> EXPORT_SYMBOL(fc_remote_port_rolechg);
> @@ -1970,8 +1974,9 @@ fc_timeout_deleted_rport(void *data)
> dev_printk(KERN_ERR, &rport->dev,
> "blocked FC remote port time out: no longer"
> " a FCP target, removing starget\n");
> - fc_queue_work(shost, &rport->stgt_delete_work);
> spin_unlock_irqrestore(shost->host_lock, flags);
> + scsi_target_unblock(&rport->dev);
> + fc_queue_work(shost, &rport->stgt_delete_work);
> return;
> }
>
> @@ -2035,17 +2040,15 @@ fc_timeout_deleted_rport(void *data)
> * went away and didn't come back - we'll remove
> * all attached scsi devices.
> */
> - fc_queue_work(shost, &rport->stgt_delete_work);
> -
> spin_unlock_irqrestore(shost->host_lock, flags);
> +
> + scsi_target_unblock(&rport->dev);
> + fc_queue_work(shost, &rport->stgt_delete_work);
> }
>
> /**
> * fc_scsi_scan_rport - called to perform a scsi scan on a remote port.
> *
> - * Will unblock the target (in case it went away and has now come back),
> - * then invoke a scan.
> - *
> * @data: remote port to be scanned.
> **/
> static void
> @@ -2057,7 +2060,6 @@ fc_scsi_scan_rport(void *data)
>
> if ((rport->port_state == FC_PORTSTATE_ONLINE) &&
> (rport->roles & FC_RPORT_ROLE_FCP_TARGET)) {
> - scsi_target_unblock(&rport->dev);
> scsi_scan_target(&rport->dev, rport->channel,
> rport->scsi_target_id, SCAN_WILD_CARD, 1);
> }
>
>
Thread overview: 2 messages
2006-05-11 17:27 [PATCH] fc transport: resolve scan vs delete deadlocks James Smart
2006-06-12 17:50 ` Michael Reed [this message]