[PATCH] nvme-tcp: Do not terminate commands when in RESETTING

public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed

* [PATCH] nvme-tcp: Do not terminate commands when in RESETTING
@ 2024-01-11  7:29 hare
  2024-01-11 12:30 ` Sagi Grimberg
  2024-01-11 16:15 ` Keith Busch
  0 siblings, 2 replies; 4+ messages in thread
From: hare @ 2024-01-11  7:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke

From: Hannes Reinecke <hare@suse.de>

Terminating commands from the timeout handler might lead
to a data corruption as the timeout might trigger before
KATO expired.
This is the case when several commands have been started
before the keep-alive command and the command timeouts
trigger just after the keep-alive command has been sent.
Then the first command will trigger an error recovery,
but all the other commands will be aborted directly
and immediately retried.
So return BLK_EH_RESET_TIMER for I/O commands when
error recovery has been started to ensure that the
commands will be retried only after the KATO interval.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index b234f0674aeb..b9ec121b3fc6 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2429,6 +2429,18 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
 		 rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
 		 nvme_opcode_str(qid, opc, fctype), qid);
 
+	/*
+	 * If the error recovery is started we should ignore all
+	 * I/O commands as they'll be aborted once error recovery starts.
+	 * Otherwise they'll be failed over immediately and might
+	 * cause data corruption.
+	 */
+	if (ctrl->state == NVME_CTRL_RESETTING && qid > 0) {
+		/* Avoid interfering with firmware download */
+		if (!WARN_ON(work_pending(&ctrl->fw_act_work)))
+			return BLK_EH_RESET_TIMER;
+	}
+
 	if (ctrl->state != NVME_CTRL_LIVE) {
 		/*
 		 * If we are resetting, connecting or deleting we should
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] nvme-tcp: Do not terminate commands when in RESETTING
  2024-01-11  7:29 [PATCH] nvme-tcp: Do not terminate commands when in RESETTING hare
@ 2024-01-11 12:30 ` Sagi Grimberg
  2024-01-11 14:10   ` Hannes Reinecke
  2024-01-11 16:15 ` Keith Busch
  1 sibling, 1 reply; 4+ messages in thread
From: Sagi Grimberg @ 2024-01-11 12:30 UTC (permalink / raw)
  To: hare, Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Hannes Reinecke


> From: Hannes Reinecke <hare@suse.de>
> 
> Terminating commands from the timeout handler might lead
> to a data corruption as the timeout might trigger before
> KATO expired.
> This is the case when several commands have been started
> before the keep-alive command and the command timeouts
> trigger just after the keep-alive command has been sent.
> Then the first command will trigger an error recovery,
> but all the other commands will be aborted directly
> and immediately retried.
> So return BLK_EH_RESET_TIMER for I/O commands when
> error recovery has been started to ensure that the
> commands will be retried only after the KATO interval.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/nvme/host/tcp.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index b234f0674aeb..b9ec121b3fc6 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2429,6 +2429,18 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
>   		 rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
>   		 nvme_opcode_str(qid, opc, fctype), qid);
>   
> +	/*
> +	 * If the error recovery is started we should ignore all
> +	 * I/O commands as they'll be aborted once error recovery starts.
> +	 * Otherwise they'll be failed over immediately and might
> +	 * cause data corruption.

I think that we can settle with a simpler comment:
	/*
	 * If the controller is in state RESETTING, the error
	 * recovery handler will cancel all inflight commands soon
	 * which in turn may failover to a different path.
	 */

The error handler will start promptly anyways.

> +	 */
> +	if (ctrl->state == NVME_CTRL_RESETTING && qid > 0) {

What would be the issue doing this for admin commands as well?

> +		/* Avoid interfering with firmware download */
> +		if (!WARN_ON(work_pending(&ctrl->fw_act_work)))

Let's ignore the fw_act_work, it's not a thing in fabrics.

> +			return BLK_EH_RESET_TIMER;
> +	}
> +
>   	if (ctrl->state != NVME_CTRL_LIVE) {
>   		/*
>   		 * If we are resetting, connecting or deleting we should

I think that the comment here needs an update as well.

BTW, what happens when deleting a controller? Won't the state be
DELETING and the situation you are trying to prevent happen as well?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nvme-tcp: Do not terminate commands when in RESETTING
  2024-01-11 12:30 ` Sagi Grimberg
@ 2024-01-11 14:10   ` Hannes Reinecke
  0 siblings, 0 replies; 4+ messages in thread
From: Hannes Reinecke @ 2024-01-11 14:10 UTC (permalink / raw)
  To: Sagi Grimberg, hare, Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On 1/11/24 13:30, Sagi Grimberg wrote:
> 
>> From: Hannes Reinecke <hare@suse.de>
>>
>> Terminating commands from the timeout handler might lead
>> to a data corruption as the timeout might trigger before
>> KATO expired.
>> This is the case when several commands have been started
>> before the keep-alive command and the command timeouts
>> trigger just after the keep-alive command has been sent.
>> Then the first command will trigger an error recovery,
>> but all the other commands will be aborted directly
>> and immediately retried.
>> So return BLK_EH_RESET_TIMER for I/O commands when
>> error recovery has been started to ensure that the
>> commands will be retried only after the KATO interval.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/tcp.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index b234f0674aeb..b9ec121b3fc6 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -2429,6 +2429,18 @@ static enum blk_eh_timer_return 
>> nvme_tcp_timeout(struct request *rq)
>>            rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
>>            nvme_opcode_str(qid, opc, fctype), qid);
>> +    /*
>> +     * If the error recovery is started we should ignore all
>> +     * I/O commands as they'll be aborted once error recovery starts.
>> +     * Otherwise they'll be failed over immediately and might
>> +     * cause data corruption.
> 
> I think that we can settle with a simpler comment:
>      /*
>       * If the controller is in state RESETTING, the error
>       * recovery handler will cancel all inflight commands soon
>       * which in turn may failover to a different path.
>       */
> 
> The error handler will start promptly anyways.
> 
Ok.

>> +     */
>> +    if (ctrl->state == NVME_CTRL_RESETTING && qid > 0) {
> 
> What would be the issue doing this for admin commands as well?
> 
There are several commands sent during the 'connect' workflow before
the queue is switched to 'LIVE', and we need to terminate them correctly
such that the 'connect' workflow can make progress (and we're not
deadlocking waiting for the commands to complete).
(Cf the comment after the 'LIVE' check later in the code)

Plus all commands on the admin queue do not change the state of the
data, so it doesn't matter if they got re-issued to a different
controller. Well, it _might_ matter, but then we'll be getting
an NVMe status back and we'll be notified in the message log.

Core issue with this patch is that we want to avoid a silent
failover of data I/O before KATO expires.

>> +        /* Avoid interfering with firmware download */
>> +        if (!WARN_ON(work_pending(&ctrl->fw_act_work)))
> 
> Let's ignore the fw_act_work, it's not a thing in fabrics.
> 
Wasn't sure, but if you say so, ok.

>> +            return BLK_EH_RESET_TIMER;
>> +    }
>> +
>>       if (ctrl->state != NVME_CTRL_LIVE) {
>>           /*
>>            * If we are resetting, connecting or deleting we should
> 
> I think that the comment here needs an update as well.
> 
Oh, I thought that I had modified that, too. But yes, of course.

> BTW, what happens when deleting a controller? Won't the state be
> DELETING and the situation you are trying to prevent happen as well?

Not sure. Haven't really tested that path, and it'll be hard to trigger
anyway (you would have to trigger controller deletion _just_ before the
I/O timeout hits).
But yeah, I could check for DELETING, too.

Cheers,

Hannes



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nvme-tcp: Do not terminate commands when in RESETTING
  2024-01-11  7:29 [PATCH] nvme-tcp: Do not terminate commands when in RESETTING hare
  2024-01-11 12:30 ` Sagi Grimberg
@ 2024-01-11 16:15 ` Keith Busch
  1 sibling, 0 replies; 4+ messages in thread
From: Keith Busch @ 2024-01-11 16:15 UTC (permalink / raw)
  To: hare; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, Hannes Reinecke

On Thu, Jan 11, 2024 at 08:29:29AM +0100, hare@kernel.org wrote:
> +	if (ctrl->state == NVME_CTRL_RESETTING && qid > 0) {
> +		/* Avoid interfering with firmware download */
> +		if (!WARN_ON(work_pending(&ctrl->fw_act_work)))
> +			return BLK_EH_RESET_TIMER;
> +	}
> +
>  	if (ctrl->state != NVME_CTRL_LIVE) {

Unfortunately I submitted a fix late in 6.7 that changes all the
ctrl->state checks to use the READ_ONCE helper. Let's wait for block
tree's 6.8 to sync back to mainline, then redo this patch with those
updates included to avoid the conflicts.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-01-11 16:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-11  7:29 [PATCH] nvme-tcp: Do not terminate commands when in RESETTING hare
2024-01-11 12:30 ` Sagi Grimberg
2024-01-11 14:10   ` Hannes Reinecke
2024-01-11 16:15 ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox