Linux-NVME Archive on lore.kernel.org
* [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
@ 2025-02-17 14:22 Meir Elisha
  2025-02-18 15:19 ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Meir Elisha @ 2025-02-17 14:22 UTC (permalink / raw)
  To: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg
  Cc: linux-nvme, Meir Elisha

The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered by the compiler, the worker might not
get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
correct ordering, set rcv_state using smp_store_release().

Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
---
v2: Change comments to c-style

 drivers/nvme/target/tcp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7c51c2a8c109..49ce2f9ac6c8 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
 	queue->offset = 0;
 	queue->left = sizeof(struct nvme_tcp_hdr);
 	queue->cmd = NULL;
-	queue->rcv_state = NVMET_TCP_RECV_PDU;
+	/* Ensure rcv_state is visible only after queue->cmd is set */
+	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
 }
 
 static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
@@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
 	cmd->pdu_recv = 0;
 	nvmet_tcp_build_pdu_iovec(cmd);
 	queue->cmd = cmd;
-	queue->rcv_state = NVMET_TCP_RECV_DATA;
+	/* Ensure rcv_state is visible only after queue->cmd is set */
+	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);
 
 	return 0;
 
-- 
2.34.1

This ordering is critical on weakly ordered architectures (such as ARM)
so that any observer which sees the new rcv_state is guaranteed to also
see the updated cmd. Without this guarantee (i.e., if the two stores were
reordered), a parallel context might see the new state while queue->cmd
still holds a stale value. This could cause the inline-data check to
return early and ultimately hang the IO.
Additionally, I reviewed the generated assembly for ARM and confirmed that
the instructions were reordered (unlike on x86), reinforcing the need for
this change.
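
The publish pattern described above can be sketched in user-space C11,
where a release store stands in for smp_store_release() and an acquire
load stands in for smp_load_acquire(). This is only an analogy under
stated assumptions: the demo_* types and functions are hypothetical and
are not the nvmet-tcp structures or the kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real nvmet-tcp structs. */
enum demo_rcv_state { DEMO_RECV_PDU, DEMO_RECV_DATA };

struct demo_cmd {
	int tag;
};

struct demo_queue {
	struct demo_cmd *_Atomic cmd;
	_Atomic int rcv_state;
};

/* Writer: store cmd first, then publish the new state with release semantics. */
static void demo_set_recv_data(struct demo_queue *q, struct demo_cmd *cmd)
{
	assert(q != NULL);
	atomic_store_explicit(&q->cmd, cmd, memory_order_relaxed);
	/*
	 * Release store: a reader that acquire-loads this state value is
	 * guaranteed to also observe the cmd store above.
	 */
	atomic_store_explicit(&q->rcv_state, DEMO_RECV_DATA,
			      memory_order_release);
}

/* Reader: acquire-load the state, and only then read cmd. */
static struct demo_cmd *demo_cmd_if_recv_data(struct demo_queue *q)
{
	assert(q != NULL);
	if (atomic_load_explicit(&q->rcv_state, memory_order_acquire) !=
	    DEMO_RECV_DATA)
		return NULL;
	/* The acquire load above orders this read after the state read. */
	return atomic_load_explicit(&q->cmd, memory_order_relaxed);
}
```

Note that the guarantee comes from the pair: the release store on the
writer side only helps a reader that loads the state with acquire
semantics (or an equivalent read barrier) before it reads cmd.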

This scenario was encountered during fio testing, which involved
running 2 min of 4K random writes using an ARM-based machine as the
target. We observed hanging I/O typically after 10-20 iterations.

fio config used:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/mnt/volumez/vol0
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite



* Re: [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
  2025-02-17 14:22 [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state Meir Elisha
@ 2025-02-18 15:19 ` Keith Busch
  2025-02-19 12:28   ` Meir Elisha
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2025-02-18 15:19 UTC (permalink / raw)
  To: Meir Elisha
  Cc: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg, linux-nvme

On Mon, Feb 17, 2025 at 04:22:10PM +0200, Meir Elisha wrote:
> The order in which queue->cmd and rcv_state are updated is crucial.
> If these assignments are reordered by the compiler, the worker might not
> get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
> correct ordering, set rcv_state using smp_store_release().
> 
> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
> ---
> v2: Change comments to c-style
> 
>  drivers/nvme/target/tcp.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 7c51c2a8c109..49ce2f9ac6c8 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
>  	queue->offset = 0;
>  	queue->left = sizeof(struct nvme_tcp_hdr);
>  	queue->cmd = NULL;
> -	queue->rcv_state = NVMET_TCP_RECV_PDU;
> +	/* Ensure rcv_state is visible only after queue->cmd is set */
> +	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
>  }
>  
>  static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
> @@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>  	cmd->pdu_recv = 0;
>  	nvmet_tcp_build_pdu_iovec(cmd);
>  	queue->cmd = cmd;
> -	queue->rcv_state = NVMET_TCP_RECV_DATA;
> +	/* Ensure rcv_state is visible only after queue->cmd is set */
> +	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);
>  
>  	return 0;
>  
> -- 
> 2.34.1
> 
> This ordering is critical on weakly ordered architectures (such as ARM)
> so that any observer which sees the new rcv_state is guaranteed to also
> see the updated cmd. 

Something seems off if smp_store_release() isn't paired with
smp_load_acquire(). Why does the reader side not need a barrier?



* Re: [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
  2025-02-18 15:19 ` Keith Busch
@ 2025-02-19 12:28   ` Meir Elisha
  0 siblings, 0 replies; 3+ messages in thread
From: Meir Elisha @ 2025-02-19 12:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg, linux-nvme



On 18/02/2025 17:19, Keith Busch wrote:
> On Mon, Feb 17, 2025 at 04:22:10PM +0200, Meir Elisha wrote:
>> The order in which queue->cmd and rcv_state are updated is crucial.
>> If these assignments are reordered by the compiler, the worker might not
>> get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
>> correct ordering, set rcv_state using smp_store_release().
>>
>> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
>> ---
>> v2: Change comments to c-style
>>
>>  drivers/nvme/target/tcp.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 7c51c2a8c109..49ce2f9ac6c8 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
>>  	queue->offset = 0;
>>  	queue->left = sizeof(struct nvme_tcp_hdr);
>>  	queue->cmd = NULL;
>> -	queue->rcv_state = NVMET_TCP_RECV_PDU;
>> +	/* Ensure rcv_state is visible only after queue->cmd is set */
>> +	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
>>  }
>>  
>>  static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
>> @@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>>  	cmd->pdu_recv = 0;
>>  	nvmet_tcp_build_pdu_iovec(cmd);
>>  	queue->cmd = cmd;
>> -	queue->rcv_state = NVMET_TCP_RECV_DATA;
>> +	/* Ensure rcv_state is visible only after queue->cmd is set */
>> +	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);
>>  
>>  	return 0;
>>  
>> -- 
>> 2.34.1
>>
>> This ordering is critical on weakly ordered architectures (such as ARM)
>> so that any observer which sees the new rcv_state is guaranteed to also
>> see the updated cmd. 
> 
> Something seems off if smp_store_release() isn't paired with
> smp_load_acquire(). Why does the reader side not need a barrier?

Hi Keith

Thanks for the reply. After reviewing the code again, I think there may
still be a race condition here.

Consider the following: the worker thread executes the request
(queue->cmd->req.execute), and before it regains execution,
nvmet_tcp_queue_response() gets called from another context. It passes
the first if statement (queue->cmd == cmd), and just before it evaluates
the second one (queue->rcv_state == NVMET_TCP_RECV_PDU), the worker runs
again and sets both queue->cmd and rcv_state. In that case, the second
thread will mistakenly return at the second if statement, causing a
hanging IO.

I will send another version that reads the cmd and the state into local
variables in nvmet_tcp_queue_response() (using a read barrier and
READ_ONCE()), in the opposite order to the stores, which should enforce
the correct ordering and fix the problem I've mentioned above.
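
One way to read this proposal, sketched in user-space C11 rather than
kernel code: the reader takes a single snapshot of the state (acquire
load) and then of cmd, and both checks run against that snapshot instead
of the live fields. The sketch_* names are hypothetical, and the
has_inline_write flag is an assumed simplification of the real
inline-data conditions; this is not the actual v3 patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real nvmet-tcp structs. */
enum sketch_rcv_state { SKETCH_RECV_PDU, SKETCH_RECV_DATA };

struct sketch_cmd {
	bool has_inline_write;	/* assumed stand-in for the inline-data checks */
};

struct sketch_queue {
	struct sketch_cmd *_Atomic cmd;
	_Atomic int rcv_state;
};

/*
 * Returns true when the response should be deferred because inline data
 * for this command has not been received yet.
 */
static bool sketch_defer_response(struct sketch_queue *q,
				  struct sketch_cmd *cmd)
{
	int state;
	struct sketch_cmd *cur;

	assert(q != NULL && cmd != NULL);

	/*
	 * Snapshot in the opposite order to the writer: state first with
	 * acquire semantics, then cmd. If the snapshot sees a state that
	 * was published with a release store, it also sees the cmd value
	 * stored before that state.
	 */
	state = atomic_load_explicit(&q->rcv_state, memory_order_acquire);
	cur = atomic_load_explicit(&q->cmd, memory_order_relaxed);

	/* Both checks use the locals, so the writer cannot change them midway. */
	return cur == cmd && state == SKETCH_RECV_PDU &&
	       cmd->has_inline_write;
}
```

Because both conditions are evaluated on locals, the window between the
first and the second if statement described above no longer exists in
this sketch.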

