All of lore.kernel.org
 help / color / mirror / Atom feed
From: hch@lst.de (Christoph Hellwig)
Subject: [PATCH WIP/RFC v3 6/6] nvme-rdma: use ib_client API to detect device removal
Date: Thu, 1 Sep 2016 10:36:55 +0200	[thread overview]
Message-ID: <20160901083655.GD3703@lst.de> (raw)
In-Reply-To: <71557e76dd8ad082badce2793aaa24105fc57406.1472574410.git.swise@opengridcomputing.com>

On Tue, Aug 30, 2016@09:25:46AM -0700, Steve Wise wrote:
> Change nvme-rdma to use the IB Client API to detect device removal.
> This has the wonderful benefit of being able to blow away all the
> ib/rdma_cm resources for the device being removed.  No craziness about
> not destroying the cm_id handling the event.  No deadlocks due to broken
> iw_cm/rdma_cm/iwarp dependencies.  And no need to have a bound cm_id
> around during controller recovery/reconnect to catch device removal
> events.
> 
> We don't use the device_add aspect of the ib_client service since we only
> want to create resources for an IB device if we have a target utilizing
> that device.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
>  drivers/nvme/host/rdma.c | 111 ++++++++++++++++++-----------------------------
>  1 file changed, 43 insertions(+), 68 deletions(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 8036c23..ecb4162 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1322,64 +1322,6 @@ out_destroy_queue_ib:
>  	return ret;
>  }
>  
> -/**
> - * nvme_rdma_device_unplug() - Handle RDMA device unplug
> - * @queue:      Queue that owns the cm_id that caught the event
> - *
> - * DEVICE_REMOVAL event notifies us that the RDMA device is about
> - * to unplug so we should take care of destroying our RDMA resources.
> - * This event will be generated for each allocated cm_id.
> - *
> - * In our case, the RDMA resources are managed per controller and not
> - * only per queue. So the way we handle this is we trigger an implicit
> - * controller deletion upon the first DEVICE_REMOVAL event we see, and
> - * hold the event inflight until the controller deletion is completed.
> - *
> - * One exception that we need to handle is the destruction of the cm_id
> - * that caught the event. Since we hold the callout until the controller
> - * deletion is completed, we'll deadlock if the controller deletion will
> - * call rdma_destroy_id on this queue's cm_id. Thus, we claim ownership
> - * of destroying this queue before-hand, destroy the queue resources,
> - * then queue the controller deletion which won't destroy this queue and
> - * we destroy the cm_id implicitely by returning a non-zero rc to the callout.
> - */
> -static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
> -{
> -	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
> -	int ret = 0;
> -
> -	/* Own the controller deletion */
> -	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
> -		return 0;
> -
> -	dev_warn(ctrl->ctrl.device,
> -		"Got rdma device removal event, deleting ctrl\n");
> -
> -	/* Get rid of reconnect work if its running */
> -	cancel_delayed_work_sync(&ctrl->reconnect_work);
> -
> -	/* Disable the queue so ctrl delete won't free it */
> -	if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
> -		/* Free this queue ourselves */
> -		nvme_rdma_stop_queue(queue);
> -		nvme_rdma_destroy_queue_ib(queue);
> -
> -		/* Return non-zero so the cm_id will destroy implicitly */
> -		ret = 1;
> -	}
> -
> -	/*
> -	 * Queue controller deletion. Keep a reference until all
> -	 * work is flushed since delete_work will free the ctrl mem
> -	 */
> -	kref_get(&ctrl->ctrl.kref);
> -	queue_work(nvme_rdma_wq, &ctrl->delete_work);
> -	flush_work(&ctrl->delete_work);
> -	nvme_put_ctrl(&ctrl->ctrl);
> -
> -	return ret;
> -}
> -
>  static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>  		struct rdma_cm_event *ev)
>  {
> @@ -1421,8 +1363,8 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>  		nvme_rdma_error_recovery(queue->ctrl);
>  		break;
>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
> -		/* return 1 means impliciy CM ID destroy */
> -		return nvme_rdma_device_unplug(queue);
> +		/* device removal is handled via the ib_client API */
> +		break;
>  	default:
>  		dev_err(queue->ctrl->ctrl.device,
>  			"Unexpected RDMA CM event (%d)\n", ev->event);
> @@ -2031,27 +1973,60 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
>  	.create_ctrl	= nvme_rdma_create_ctrl,
>  };
>  
> +static void nvme_rdma_add_one(struct ib_device *ib_device)
> +{
> +	pr_info("Device %s available for use\n", ib_device->name);
> +}
> +
> +static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
> +{
> +	struct nvme_rdma_ctrl *ctrl;
> +
> +	pr_info("Removing resources for device %s\n", ib_device->name);

Can we avoid these noisy printks?

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch at lst.de>

      reply	other threads:[~2016-09-01  8:36 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-30 16:26 [PATCH WIP/RFC v3 0/6] nvme-rdma device removal fixes Steve Wise
2016-08-29 21:26 ` [PATCH WIP/RFC v3 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
2016-09-01  8:34   ` Christoph Hellwig
2016-08-29 21:26 ` [PATCH WIP/RFC v3 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
2016-09-01  8:35   ` Christoph Hellwig
2016-08-29 21:27 ` [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
2016-08-30 17:24   ` Sagi Grimberg
2016-08-30 17:48     ` Steve Wise
2016-08-31  5:06       ` Sagi Grimberg
2016-08-31 14:29         ` Steve Wise
2016-09-01  6:40           ` Sagi Grimberg
2016-09-01  8:36     ` Christoph Hellwig
2016-09-01  9:31       ` Sagi Grimberg
2016-08-29 21:28 ` [PATCH WIP/RFC v3 5/6] nvme-rdma: add DELETING queue flag Sagi Grimberg
2016-09-01  8:41   ` Christoph Hellwig
2016-09-01 13:34     ` Steve Wise
2016-08-29 21:28 ` [PATCH WIP/RFC v3 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
2016-09-01 15:00   ` Steve Wise
2016-08-30 16:25 ` [PATCH WIP/RFC v3 6/6] nvme-rdma: use ib_client API to detect device removal Steve Wise
2016-09-01  8:36   ` Christoph Hellwig [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160901083655.GD3703@lst.de \
    --to=hch@lst.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.