* [PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
@ 2025-02-23  7:28 Meir Elisha
  2025-02-23 12:21 ` Sagi Grimberg
  2025-02-24 14:13 ` Christoph Hellwig
  0 siblings, 2 replies; 4+ messages in thread
From: Meir Elisha @ 2025-02-23  7:28 UTC (permalink / raw)
  To: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg
  Cc: linux-nvme, Meir Elisha

The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered by the compiler, the worker might not
get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
correct ordering, set rcv_state using smp_store_release().

Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")

Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
---
Changes from v1:
	- Fix barrier semantics
	- Use rcv_state instead of state variable

 drivers/nvme/target/tcp.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7c51c2a8c109..714d920d14e1 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -571,10 +571,13 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
 	struct nvmet_tcp_cmd *cmd =
 		container_of(req, struct nvmet_tcp_cmd, req);
 	struct nvmet_tcp_queue	*queue = cmd->queue;
+	/* Pairs with store_release in nvmet_prepare_receive_pdu() */
+	enum nvmet_tcp_recv_state queue_state = smp_load_acquire(&queue->rcv_state);
+	struct nvmet_tcp_cmd *queue_cmd = READ_ONCE(queue->cmd);
 	struct nvme_sgl_desc *sgl;
 	u32 len;
 
-	if (unlikely(cmd == queue->cmd)) {
+	if (unlikely(cmd == queue_cmd)) {
 		sgl = &cmd->req.cmd->common.dptr.sgl;
 		len = le32_to_cpu(sgl->length);
 
@@ -583,7 +586,7 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
 		 * Avoid using helpers, this might happen before
 		 * nvmet_req_init is completed.
 		 */
-		if (queue->rcv_state == NVMET_TCP_RECV_PDU &&
+		if (queue_state == NVMET_TCP_RECV_PDU &&
 		    len && len <= cmd->req.port->inline_data_size &&
 		    nvme_is_write(cmd->req.cmd))
 			return;
@@ -847,8 +850,9 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
 {
 	queue->offset = 0;
 	queue->left = sizeof(struct nvme_tcp_hdr);
-	queue->cmd = NULL;
-	queue->rcv_state = NVMET_TCP_RECV_PDU;
+	WRITE_ONCE(queue->cmd, NULL);
+	/* Ensure rcv_state is visible only after queue->cmd is set */
+	smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
 }
 
 static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
-- 
2.34.1

This ordering is critical on weakly ordered architectures (such as ARM)
so that any observer which sees the new rcv_state is guaranteed to also
see the updated cmd. Without this guarantee (i.e., if the two stores were
reordered), a parallel context might see the new state while queue->cmd
still holds a stale value. This could cause the inline-data check to
return early and ultimately hang the IO.
Additionally, I reviewed the assembly code for ARM and confirmed that
the instructions were reordered (unlike on x86), reinforcing the need for
this change.
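
The release/acquire pairing described above can be sketched in userspace
with C11 atomics. This is only an illustration under stated assumptions,
not the kernel code: struct fake_queue, fake_queue_init(),
prepare_receive_pdu() and saw_stale_cmd() are hypothetical names, and
memory_order_release / memory_order_acquire stand in for
smp_store_release() / smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nvmet-tcp receive states. */
enum recv_state { RECV_DATA, RECV_PDU };

/* Hypothetical, reduced stand-in for struct nvmet_tcp_queue. */
struct fake_queue {
	_Atomic(void *) cmd;	/* plays the role of queue->cmd */
	_Atomic int rcv_state;	/* plays the role of queue->rcv_state */
};

/* Start with a command in flight and the queue receiving data. */
static void fake_queue_init(struct fake_queue *q, void *cmd)
{
	assert(q != NULL);
	atomic_init(&q->cmd, cmd);
	atomic_init(&q->rcv_state, RECV_DATA);
}

/*
 * Writer side: clear cmd first, then publish the new state with release
 * semantics so the cmd store cannot become visible after the state store.
 */
static void prepare_receive_pdu(struct fake_queue *q)
{
	atomic_store_explicit(&q->cmd, NULL, memory_order_relaxed);
	atomic_store_explicit(&q->rcv_state, RECV_PDU, memory_order_release);
}

/*
 * Reader side: load the state with acquire semantics before reading cmd.
 * Returns 1 if RECV_PDU was observed together with a stale (non-NULL)
 * cmd, which the release/acquire pairing is meant to rule out.
 */
static int saw_stale_cmd(struct fake_queue *q)
{
	int state = atomic_load_explicit(&q->rcv_state, memory_order_acquire);
	void *cmd = atomic_load_explicit(&q->cmd, memory_order_relaxed);

	return state == RECV_PDU && cmd != NULL;
}
```

On a single thread the check passes trivially. The pairing matters when
prepare_receive_pdu() and saw_stale_cmd() run on different CPUs: if the
state store were relaxed instead of release, a weakly ordered machine
would be allowed to make RECV_PDU visible before cmd is cleared.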

This scenario was encountered during fio testing, which involved
running 2 min of 4K random writes using an ARM-based machine as the
target. We observed hanging I/O typically after 10-20 iterations.

fio config used:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/mnt/volumez/vol0
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite



* Re: [PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
  2025-02-23  7:28 [PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch Meir Elisha
@ 2025-02-23 12:21 ` Sagi Grimberg
  2025-02-24 14:13 ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Sagi Grimberg @ 2025-02-23 12:21 UTC (permalink / raw)
  To: Meir Elisha, Christoph Hellwig, Chaitanya Kulkarni; +Cc: linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>



* Re: [PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
  2025-02-23  7:28 [PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch Meir Elisha
  2025-02-23 12:21 ` Sagi Grimberg
@ 2025-02-24 14:13 ` Christoph Hellwig
  2025-02-25  7:40   ` Sagi Grimberg
  1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2025-02-24 14:13 UTC (permalink / raw)
  To: Meir Elisha
  Cc: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg, linux-nvme

On Sun, Feb 23, 2025 at 09:28:45AM +0200, Meir Elisha wrote:
> The order in which queue->cmd and rcv_state are updated is crucial.
> If these assignments are reordered by the compiler, the worker might not
> get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
> correct ordering, set rcv_state using smp_store_release().
> 
> Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
> 
> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
> ---
> Changes from v2:
> 	- Fix barrier semantics
> 	- Use rcv_state instead of state variable
> 
>  drivers/nvme/target/tcp.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 7c51c2a8c109..714d920d14e1 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -571,10 +571,13 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
>  	struct nvmet_tcp_cmd *cmd =
>  		container_of(req, struct nvmet_tcp_cmd, req);
>  	struct nvmet_tcp_queue	*queue = cmd->queue;
> +	/* Pairs with store_release in nvmet_prepare_receive_pdu() */
> +	enum nvmet_tcp_recv_state queue_state = smp_load_acquire(&queue->rcv_state);

Overly long line.

And another purely cosmetic thing: while I generally like initializing
variables at declaration time, doing that for something like
smp_load_acquire which should go with a comment looks kinda weird.

So maybe just split the assignment out from the declaration?
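
A minimal sketch of what that split could look like, written against
C11 atomics rather than the kernel helpers so that it compiles on its
own. struct fake_queue and state_is_recv_pdu() are hypothetical names,
and atomic_load_explicit(..., memory_order_acquire) is an assumed
stand-in for smp_load_acquire():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nvmet-tcp receive states. */
enum recv_state { RECV_DATA, RECV_PDU };

/* Hypothetical, reduced stand-in for struct nvmet_tcp_queue. */
struct fake_queue {
	_Atomic int rcv_state;
};

static int state_is_recv_pdu(struct fake_queue *queue)
{
	int queue_state;	/* declared without an initializer */

	assert(queue != NULL);

	/* Pairs with the release store on the writer side. */
	queue_state = atomic_load_explicit(&queue->rcv_state,
					   memory_order_acquire);

	return queue_state == RECV_PDU;
}
```

Keeping the load as a separate statement lets the pairing comment sit
directly above it and keeps the line within the usual length limit.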

> -- 
> 2.34.1
> 
> This ordering is critical on weakly ordered architectures (such as ARM)

Something weird is going on with this description below the actual
patch.  This should normally go above.




* Re: [PATCH v2] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
  2025-02-24 14:13 ` Christoph Hellwig
@ 2025-02-25  7:40   ` Sagi Grimberg
  0 siblings, 0 replies; 4+ messages in thread
From: Sagi Grimberg @ 2025-02-25  7:40 UTC (permalink / raw)
  To: Christoph Hellwig, Meir Elisha; +Cc: Chaitanya Kulkarni, linux-nvme



On 24/02/2025 16:13, Christoph Hellwig wrote:
> On Sun, Feb 23, 2025 at 09:28:45AM +0200, Meir Elisha wrote:
>> The order in which queue->cmd and rcv_state are updated is crucial.
>> If these assignments are reordered by the compiler, the worker might not
>> get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
>> correct ordering, set rcv_state using smp_store_release().
>>
>> Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
>>
>> Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
>> ---
>> Changes from v2:
>> 	- Fix barrier semantics
>> 	- Use rcv_state instead of state variable
>>
>>   drivers/nvme/target/tcp.c | 12 ++++++++----
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 7c51c2a8c109..714d920d14e1 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -571,10 +571,13 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
>>   	struct nvmet_tcp_cmd *cmd =
>>   		container_of(req, struct nvmet_tcp_cmd, req);
>>   	struct nvmet_tcp_queue	*queue = cmd->queue;
>> +	/* Pairs with store_release in nvmet_prepare_receive_pdu() */
>> +	enum nvmet_tcp_recv_state queue_state = smp_load_acquire(&queue->rcv_state);
> Overly long line.
>
> And another purely cosmetic thing: while I generally like initializing
> variables at declaration time, doing that for something like
> smp_load_acquire which should go with a comment looks kinda weird.
>
> So maybe just split the assignment out from the declaration?

Makes sense.

>
>> -- 
>> 2.34.1
>>
>> This ordering is critical on weakly ordered architectures (such as ARM)
> Something weird is going on with this description below the actual
> patch.  This should normally go above.

git am of the exported mbox seems to correctly drop it from the patch
either way.
But I agree it should go above the diff, below the --- separator.

