* [PATCH v2 1/3] nvme-loop: flush off pending I/O while shutting down loop controller
2024-10-08 4:13 [PATCH v2 0/3] nvme: system fault while shutting down fabric controller Nilay Shroff
@ 2024-10-08 4:13 ` Nilay Shroff
2024-10-08 4:13 ` [PATCH v2 2/3] nvme: make keep-alive synchronous operation Nilay Shroff
2024-10-08 4:13 ` [PATCH v2 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function Nilay Shroff
2 siblings, 0 replies; 6+ messages in thread
From: Nilay Shroff @ 2024-10-08 4:13 UTC (permalink / raw)
To: linux-nvme; +Cc: kbusch, hch, sagi, axboe, chaitanyak, gjoyce, Nilay Shroff
While shutting down the loop controller, we first quiesce the admin/IO
queue, delete the admin/IO tag-set and only then destroy the admin/IO
queue. However, it's quite possible that during the window between
quiescing and destroying the admin/IO queue, some admin/IO request
sneaks in, and if that happens we could encounter a hung task because
the shutdown operation can't make forward progress until all pending
I/O is flushed.
This commit ensures that before destroying the admin/IO queue, we
unquiesce it so that any outstanding requests, which were added after
the queue was quiesced, are flushed to completion.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/target/loop.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index e32790d8fc26..a9d112d34d4f 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -265,6 +265,13 @@ static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
{
if (!test_and_clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[0].flags))
return;
+ /*
+ * It's possible that some requests might have been added
+ * after the admin queue was stopped/quiesced. So start the
+ * queue now to flush these requests to completion.
+ */
+ nvme_unquiesce_admin_queue(&ctrl->ctrl);
+
nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
nvme_remove_admin_tag_set(&ctrl->ctrl);
}
@@ -297,6 +304,12 @@ static void nvme_loop_destroy_io_queues(struct nvme_loop_ctrl *ctrl)
nvmet_sq_destroy(&ctrl->queues[i].nvme_sq);
}
ctrl->ctrl.queue_count = 1;
+ /*
+ * It's possible that some requests might have been added
+ * after the IO queues were stopped/quiesced. So start the
+ * queues now to flush these requests to completion.
+ */
+ nvme_unquiesce_io_queues(&ctrl->ctrl);
}
static int nvme_loop_init_io_queues(struct nvme_loop_ctrl *ctrl)
--
2.45.2
^ permalink raw reply related [flat|nested] 6+ messages in thread

* [PATCH v2 2/3] nvme: make keep-alive synchronous operation
2024-10-08 4:13 [PATCH v2 0/3] nvme: system fault while shutting down fabric controller Nilay Shroff
2024-10-08 4:13 ` [PATCH v2 1/3] nvme-loop: flush off pending I/O while shutting down loop controller Nilay Shroff
@ 2024-10-08 4:13 ` Nilay Shroff
2024-10-08 4:13 ` [PATCH v2 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function Nilay Shroff
2 siblings, 0 replies; 6+ messages in thread
From: Nilay Shroff @ 2024-10-08 4:13 UTC (permalink / raw)
To: linux-nvme; +Cc: kbusch, hch, sagi, axboe, chaitanyak, gjoyce, Nilay Shroff
The nvme keep-alive operation, which executes at a periodic interval,
could potentially sneak in while a fabric controller is being shut
down. This may lead to a race between the fabric controller's admin
queue destroy code path (run while shutting down the controller) and
the blk-mq hw/hctx queuing from the keep-alive thread.
This fix avoids the race by making keep-alive a synchronous operation,
so that the admin queue-usage reference counter is decremented only
after the keep-alive command finishes execution and returns its status.
This ensures that we don't inadvertently destroy the fabric admin queue
while a keep-alive request is still in flight, and hence it's safe to
delete the queue.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 02897f0564a3..736adbf65ef5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1292,10 +1292,10 @@ static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
}
-static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
- blk_status_t status)
+static void nvme_keep_alive_finish(struct request *rq,
+ blk_status_t status,
+ struct nvme_ctrl *ctrl)
{
- struct nvme_ctrl *ctrl = rq->end_io_data;
unsigned long flags;
bool startka = false;
unsigned long rtt = jiffies - (rq->deadline - rq->timeout);
@@ -1313,13 +1313,11 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
delay = 0;
}
- blk_mq_free_request(rq);
-
if (status) {
dev_err(ctrl->device,
"failed nvme_keep_alive_end_io error=%d\n",
status);
- return RQ_END_IO_NONE;
+ return;
}
ctrl->ka_last_check_time = jiffies;
@@ -1331,7 +1329,6 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
spin_unlock_irqrestore(&ctrl->lock, flags);
if (startka)
queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
- return RQ_END_IO_NONE;
}
static void nvme_keep_alive_work(struct work_struct *work)
@@ -1340,6 +1337,7 @@ static void nvme_keep_alive_work(struct work_struct *work)
struct nvme_ctrl, ka_work);
bool comp_seen = ctrl->comp_seen;
struct request *rq;
+ blk_status_t status;
ctrl->ka_last_check_time = jiffies;
@@ -1362,9 +1360,9 @@ static void nvme_keep_alive_work(struct work_struct *work)
nvme_init_request(rq, &ctrl->ka_cmd);
rq->timeout = ctrl->kato * HZ;
- rq->end_io = nvme_keep_alive_end_io;
- rq->end_io_data = ctrl;
- blk_execute_rq_nowait(rq, false);
+ status = blk_execute_rq(rq, false);
+ nvme_keep_alive_finish(rq, status, ctrl);
+ blk_mq_free_request(rq);
}
static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
--
2.45.2
^ permalink raw reply related [flat|nested] 6+ messages in thread

* [PATCH v2 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
2024-10-08 4:13 [PATCH v2 0/3] nvme: system fault while shutting down fabric controller Nilay Shroff
2024-10-08 4:13 ` [PATCH v2 1/3] nvme-loop: flush off pending I/O while shutting down loop controller Nilay Shroff
2024-10-08 4:13 ` [PATCH v2 2/3] nvme: make keep-alive synchronous operation Nilay Shroff
@ 2024-10-08 4:13 ` Nilay Shroff
2024-10-08 5:19 ` Damien Le Moal
2 siblings, 1 reply; 6+ messages in thread
From: Nilay Shroff @ 2024-10-08 4:13 UTC (permalink / raw)
To: linux-nvme; +Cc: kbusch, hch, sagi, axboe, chaitanyak, gjoyce, Nilay Shroff
We no longer need to acquire ctrl->lock to access the NVMe controller
state; instead we can use the helper nvme_ctrl_state. So replace the
locked ctrl->state accesses in the nvme_keep_alive_finish function with
an nvme_ctrl_state call.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 736adbf65ef5..5a690cf16e5e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1296,10 +1296,10 @@ static void nvme_keep_alive_finish(struct request *rq,
blk_status_t status,
struct nvme_ctrl *ctrl)
{
- unsigned long flags;
bool startka = false;
unsigned long rtt = jiffies - (rq->deadline - rq->timeout);
unsigned long delay = nvme_keep_alive_work_period(ctrl);
+ enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
/*
* Subtract off the keepalive RTT so nvme_keep_alive_work runs
@@ -1322,11 +1322,8 @@ static void nvme_keep_alive_finish(struct request *rq,
ctrl->ka_last_check_time = jiffies;
ctrl->comp_seen = false;
- spin_lock_irqsave(&ctrl->lock, flags);
- if (ctrl->state == NVME_CTRL_LIVE ||
- ctrl->state == NVME_CTRL_CONNECTING)
+ if (state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING)
startka = true;
- spin_unlock_irqrestore(&ctrl->lock, flags);
if (startka)
queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 6+ messages in thread

* Re: [PATCH v2 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
2024-10-08 4:13 ` [PATCH v2 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function Nilay Shroff
@ 2024-10-08 5:19 ` Damien Le Moal
2024-10-08 5:36 ` Nilay Shroff
0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2024-10-08 5:19 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme; +Cc: kbusch, hch, sagi, axboe, chaitanyak, gjoyce
On 10/8/24 13:13, Nilay Shroff wrote:
> We no longer need to acquire ctrl->lock to access the NVMe controller
> state; instead we can use the helper nvme_ctrl_state. So replace the
> locked ctrl->state accesses in the nvme_keep_alive_finish function with
> an nvme_ctrl_state call.
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/nvme/host/core.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 736adbf65ef5..5a690cf16e5e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1296,10 +1296,10 @@ static void nvme_keep_alive_finish(struct request *rq,
> blk_status_t status,
> struct nvme_ctrl *ctrl)
> {
> - unsigned long flags;
> bool startka = false;
> unsigned long rtt = jiffies - (rq->deadline - rq->timeout);
> unsigned long delay = nvme_keep_alive_work_period(ctrl);
> + enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
>
> /*
> * Subtract off the keepalive RTT so nvme_keep_alive_work runs
> @@ -1322,11 +1322,8 @@ static void nvme_keep_alive_finish(struct request *rq,
>
> ctrl->ka_last_check_time = jiffies;
> ctrl->comp_seen = false;
> - spin_lock_irqsave(&ctrl->lock, flags);
> - if (ctrl->state == NVME_CTRL_LIVE ||
> - ctrl->state == NVME_CTRL_CONNECTING)
> + if (state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING)
> startka = true;
startka = state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING;
But do you even need that variable now ? The below "if (startka)" could be
replaced with "if (state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING)", no ?
> - spin_unlock_irqrestore(&ctrl->lock, flags);
> if (startka)
> queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
> }
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: [PATCH v2 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
2024-10-08 5:19 ` Damien Le Moal
@ 2024-10-08 5:36 ` Nilay Shroff
0 siblings, 0 replies; 6+ messages in thread
From: Nilay Shroff @ 2024-10-08 5:36 UTC (permalink / raw)
To: Damien Le Moal, linux-nvme; +Cc: kbusch, hch, sagi, axboe, chaitanyak, gjoyce
On 10/8/24 10:49, Damien Le Moal wrote:
> On 10/8/24 13:13, Nilay Shroff wrote:
>> We no longer need to acquire ctrl->lock to access the NVMe controller
>> state; instead we can use the helper nvme_ctrl_state. So replace the
>> locked ctrl->state accesses in the nvme_keep_alive_finish function with
>> an nvme_ctrl_state call.
>>
>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>> ---
>> drivers/nvme/host/core.c | 7 ++-----
>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 736adbf65ef5..5a690cf16e5e 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1296,10 +1296,10 @@ static void nvme_keep_alive_finish(struct request *rq,
>> blk_status_t status,
>> struct nvme_ctrl *ctrl)
>> {
>> - unsigned long flags;
>> bool startka = false;
>> unsigned long rtt = jiffies - (rq->deadline - rq->timeout);
>> unsigned long delay = nvme_keep_alive_work_period(ctrl);
>> + enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
>>
>> /*
>> * Subtract off the keepalive RTT so nvme_keep_alive_work runs
>> @@ -1322,11 +1322,8 @@ static void nvme_keep_alive_finish(struct request *rq,
>>
>> ctrl->ka_last_check_time = jiffies;
>> ctrl->comp_seen = false;
>> - spin_lock_irqsave(&ctrl->lock, flags);
>> - if (ctrl->state == NVME_CTRL_LIVE ||
>> - ctrl->state == NVME_CTRL_CONNECTING)
>> + if (state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING)
>> startka = true;
>
> startka = state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING;
>
> But do you even need that variable now ? The below "if (startka)" could be
> replaced with "if (state == NVME_CTRL_LIVE || state == NVME_CTRL_CONNECTING)", no ?
>
>> - spin_unlock_irqrestore(&ctrl->lock, flags);
>> if (startka)
>> queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
>> }
>
>
Oops yeah you were correct, we don't need @startka. I will spin a new version.
Thanks for your comment!
--Nilay
^ permalink raw reply [flat|nested] 6+ messages in thread