* [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
From: Meir Elisha @ 2025-02-17 14:22 UTC (permalink / raw)
To: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg
Cc: linux-nvme, Meir Elisha
The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered by the compiler, the worker might not
get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
correct ordering, set rcv_state using smp_store_release().
Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
---
v2: Change comments to c-style
drivers/nvme/target/tcp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7c51c2a8c109..49ce2f9ac6c8 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
 	queue->offset = 0;
 	queue->left = sizeof(struct nvme_tcp_hdr);
 	queue->cmd = NULL;
-	queue->rcv_state = NVMET_TCP_RECV_PDU;
+	/* Ensure rcv_state is visible only after queue->cmd is set */
+	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
 }

 static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
@@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
 	cmd->pdu_recv = 0;
 	nvmet_tcp_build_pdu_iovec(cmd);
 	queue->cmd = cmd;
-	queue->rcv_state = NVMET_TCP_RECV_DATA;
+	/* Ensure rcv_state is visible only after queue->cmd is set */
+	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);

 	return 0;
--
2.34.1
This ordering is critical on weakly ordered architectures (such as ARM)
so that any observer which sees the new rcv_state is guaranteed to also
see the updated cmd. Without this guarantee (i.e., if the two stores were
reordered), a parallel context might see the new state while queue->cmd
still holds a stale value. This could cause the inline-data check to
return early and ultimately hang the IO.
Additionally, I reviewed the assembly code for ARM and confirmed that
the instructions were reordered (unlike on x86), reinforcing the need for
this change.
This scenario was encountered during fio testing, which involved
running 2 min of 4K random writes using an ARM-based machine as the
target. We observed hanging I/O typically after 10-20 iterations.
fio config used:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/mnt/volumez/vol0
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite
* Re: [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
From: Keith Busch @ 2025-02-18 15:19 UTC (permalink / raw)
To: Meir Elisha
Cc: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg, linux-nvme
On Mon, Feb 17, 2025 at 04:22:10PM +0200, Meir Elisha wrote:
> The order in which queue->cmd and rcv_state are updated is crucial.
> If these assignments are reordered by the compiler, the worker might not
> get queued in nvmet_tcp_queue_response(), hanging the IO. to enforce the
> the correct reordering, set rcv_state using smp_store_release().
>
> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
> ---
> v2: Change comments to c-style
>
> drivers/nvme/target/tcp.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 7c51c2a8c109..49ce2f9ac6c8 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
> queue->offset = 0;
> queue->left = sizeof(struct nvme_tcp_hdr);
> queue->cmd = NULL;
> - queue->rcv_state = NVMET_TCP_RECV_PDU;
> + /* Ensure rcv_state is visible only after queue->cmd is set */
> + smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
> }
>
> static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
> @@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
> cmd->pdu_recv = 0;
> nvmet_tcp_build_pdu_iovec(cmd);
> queue->cmd = cmd;
> - queue->rcv_state = NVMET_TCP_RECV_DATA;
> + /* Ensure rcv_state is visible only after queue->cmd is set */
> + smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);
>
> return 0;
>
> --
> 2.34.1
>
> This ordering is critical on weakly ordered architectures (such as ARM)
> so that any observer which sees the new rcv_state is guaranteed to also
> see the updated cmd.
Something seems off if smp_store_release() isn't paired with
smp_load_acquire(). Why does the reader side not need a barrier?
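For reference, the pairing in question can be sketched in userspace C11, where memory_order_release and memory_order_acquire play roughly the role of smp_store_release() and smp_load_acquire(). This is only an illustration under that assumption: struct mailbox, mailbox_post() and mailbox_poll() are hypothetical names and are not part of nvmet-tcp.

```c
#include <stdatomic.h>

/* Hypothetical example type, not taken from drivers/nvme/target/tcp.c */
struct mailbox {
	int payload;		/* plain data, written before the flag */
	_Atomic int ready;	/* flag published with release semantics */
};

/* Writer: store the payload, then publish the flag with a release store */
static void mailbox_post(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so a
 * reader that observes ready == 1 also observes the payload write.
 * Returns 1 and fills *out if a value was posted, 0 otherwise.
 */
static int mailbox_poll(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return 0;
	*out = m->payload;
	return 1;
}
```

A release store on its own only constrains the writer. Without the acquire (or an equivalent read barrier) on the reader side, the reader's two loads may still be satisfied out of order on a weakly ordered CPU.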
* Re: [PATCH v2] nvmet-tcp: Enforce update ordering between queue->cmd and rcv_state
From: Meir Elisha @ 2025-02-19 12:28 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg, linux-nvme
On 18/02/2025 17:19, Keith Busch wrote:
> On Mon, Feb 17, 2025 at 04:22:10PM +0200, Meir Elisha wrote:
>> The order in which queue->cmd and rcv_state are updated is crucial.
>> If these assignments are reordered by the compiler, the worker might not
>> get queued in nvmet_tcp_queue_response(), hanging the IO. to enforce the
>> the correct reordering, set rcv_state using smp_store_release().
>>
>> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
>> ---
>> v2: Change comments to c-style
>>
>> drivers/nvme/target/tcp.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 7c51c2a8c109..49ce2f9ac6c8 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -848,7 +848,8 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
>> queue->offset = 0;
>> queue->left = sizeof(struct nvme_tcp_hdr);
>> queue->cmd = NULL;
>> - queue->rcv_state = NVMET_TCP_RECV_PDU;
>> + /* Ensure rcv_state is visible only after queue->cmd is set */
>> + smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
>> }
>>
>> static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
>> @@ -1017,7 +1018,8 @@ static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
>> cmd->pdu_recv = 0;
>> nvmet_tcp_build_pdu_iovec(cmd);
>> queue->cmd = cmd;
>> - queue->rcv_state = NVMET_TCP_RECV_DATA;
>> + /* Ensure rcv_state is visible only after queue->cmd is set */
>> + smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_DATA);
>>
>> return 0;
>>
>> --
>> 2.34.1
>>
>> This ordering is critical on weakly ordered architectures (such as ARM)
>> so that any observer which sees the new rcv_state is guaranteed to also
>> see the updated cmd.
>
> Something seems off if smp_store_release() isn't paired with
> smp_load_acquire(). Why does the reader side not need a barrier?
Hi Keith,
Thanks for the reply. After reviewing the code again, I think there may
still be a race condition here.
Consider the following: the worker thread executes the request
(queue->cmd->req.execute), and before it regains execution,
nvmet_tcp_queue_response() gets called from another context. It passes the
first if statement (queue->cmd == cmd), and just before it evaluates the
second one (queue->rcv_state == NVMET_TCP_RECV_PDU), the worker runs again
and sets both queue->cmd and rcv_state. In that case, the second thread
mistakenly returns at the second if statement, causing a hanging IO.
I will create another version that reads cmd and the state into local
variables in nvmet_tcp_queue_response() (using a read barrier and
READ_ONCE()), in the opposite order from the writer, which should enforce
the correct ordering and fix the problem I've mentioned above.
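A rough userspace sketch of that idea, using C11 atomics in place of the kernel primitives. This is not the follow-up patch itself: struct rx_queue, struct rx_cmd and must_wait_for_inline_data() are hypothetical names, and the inline-data length and write checks done by the real function are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nvmet-tcp receive states */
enum rx_state { RX_RECV_PDU, RX_RECV_DATA };

struct rx_cmd {
	int id;
};

struct rx_queue {
	struct rx_cmd *_Atomic cmd;	/* written first by the worker */
	_Atomic int rcv_state;		/* written last, with release */
};

/*
 * Snapshot both fields exactly once, state first and cmd second (the
 * reverse of the writer's order), then decide using only the locals.
 * The acquire load guarantees that the cmd read is not older than the
 * cmd value that was published together with the observed state.
 */
static bool must_wait_for_inline_data(struct rx_queue *q,
				      const struct rx_cmd *cmd)
{
	int state = atomic_load_explicit(&q->rcv_state, memory_order_acquire);
	struct rx_cmd *cur = atomic_load_explicit(&q->cmd,
						  memory_order_relaxed);

	return cur == cmd && state == RX_RECV_PDU;
}
```

Because the two conditions are evaluated on the snapshot rather than on the live fields, a concurrent update by the worker between the two if statements can no longer change the outcome halfway through.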