From mboxrd@z Thu Jan 1 00:00:00 1970
From: hch@infradead.org (Christoph Hellwig)
Date: Tue, 26 Jan 2016 08:53:34 -0800
Subject: [PATCH 1/2] NVMe: Make surprise removal work again
In-Reply-To: <1453757017-13640-1-git-send-email-keith.busch@intel.com>
References: <1453757017-13640-1-git-send-email-keith.busch@intel.com>
Message-ID: <20160126165334.GB609@infradead.org>

> +static void __nvme_start_queue_locked(struct nvme_ns *ns)
> +{
> +	queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
> +	blk_mq_start_stopped_hw_queues(ns->queue, true);
> +	blk_mq_kick_requeue_list(ns->queue);
> +}

This almost screams for a block level helper.

> +
>  static void nvme_ns_remove(struct nvme_ns *ns)
>  {
>  	bool kill = nvme_io_incapable(ns->ctrl) &&
> @@ -1187,15 +1194,20 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>  
>  	if (kill) {
>  		blk_set_queue_dying(ns->queue);
> +		mb();

Please always comment memory barriers, I can't really make any sense of
this one.

>  		/*
>  		 * The controller was shutdown first if we got here through
>  		 * device removal. The shutdown may requeue outstanding
>  		 * requests. These need to be aborted immediately so
>  		 * del_gendisk doesn't block indefinitely for their completion.
> +		 * The queue needs to be restarted to let pending requests
> +		 * fail.
>  		 */
>  		blk_mq_abort_requeue_list(ns->queue);
> +		__nvme_start_queue_locked(ns);

Also why do we need a kick of the requeue list above if we just aborted
all requests on it?

> index 72ef832..bdf148e 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -640,6 +640,10 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	struct nvme_command cmnd;
>  	int ret = BLK_MQ_RQ_QUEUE_OK;
>  
> +	if (unlikely(blk_queue_dying(req->q))) {
> +		blk_mq_end_request(req, -EIO);
> +		return BLK_MQ_RQ_QUEUE_OK;
> +	}

Seems like this is something blk-mq should be doing.
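
For illustration, the block level helper suggested above could look
something like the sketch below. blk_mq_restart_queue is a hypothetical
name, not an existing block-layer export; the body just lifts the three
calls from __nvme_start_queue_locked behind a generic interface in
block/blk-mq.c:

```c
/*
 * Hypothetical helper, sketched from __nvme_start_queue_locked();
 * blk_mq_restart_queue is an assumed name, not an existing export.
 */
void blk_mq_restart_queue(struct request_queue *q)
{
	/* Clear the stopped flag so new requests can be dispatched again. */
	queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, q);

	/* Restart all previously stopped hardware queues. */
	blk_mq_start_stopped_hw_queues(q, true);

	/* Re-run whatever is sitting on the requeue list. */
	blk_mq_kick_requeue_list(q);
}
EXPORT_SYMBOL_GPL(blk_mq_restart_queue);
```

Drivers could then call blk_mq_restart_queue(ns->queue) instead of
open-coding the sequence per driver.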