* [PATCH v3] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
From: Meir Elisha @ 2025-02-26 7:28 UTC
To: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg
Cc: linux-nvme, Meir Elisha
The order in which queue->cmd and rcv_state are updated is crucial.
If these assignments are reordered by the compiler, the worker might not
get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
correct ordering, set rcv_state using smp_store_release().
Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
Signed-off-by: Meir Elisha <meir.elisha@volumez.com>
---
Changes from v2:
- Cosmetic changes to queue_state declaration
Changes from v1:
- Fix barrier semantics
- Use rcv_state instead of state variable
This ordering is critical on weakly ordered architectures (such as ARM)
so that any observer which sees the new rcv_state is guaranteed to also
see the updated cmd. Without this guarantee (i.e., if the two stores
were reordered), a parallel context might see the new state while
queue->cmd still holds a stale value. This could cause the inline-data
check to return early and ultimately hang the IO.
Additionally, I reviewed the assembly code for ARM and confirmed that
the instructions were reordered (unlike x86), reinforcing the need for
this change.
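
For readers less familiar with release/acquire pairing, here is a
minimal userspace sketch of the same idea. It uses C11 atomics, which
only approximate the kernel primitives (a relaxed access stands in for
WRITE_ONCE()/READ_ONCE(), a release store for smp_store_release(), an
acquire load for smp_load_acquire()). The types, field names and helper
names below are hypothetical stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's state enum and command type. */
enum recv_state { RECV_PDU, RECV_DATA };

struct cmd { int id; };

struct queue {
	struct cmd *_Atomic cur_cmd;	/* plays the role of queue->cmd */
	_Atomic int rcv_state;		/* plays the role of queue->rcv_state */
};

/* Writer side, modeled on nvmet_prepare_receive_pdu(). */
static void prepare_receive_pdu(struct queue *q)
{
	assert(q != NULL);
	/* Relaxed store: approximates WRITE_ONCE(queue->cmd, NULL). */
	atomic_store_explicit(&q->cur_cmd, (struct cmd *)NULL,
			      memory_order_relaxed);
	/*
	 * Release store: the store to cur_cmd above cannot become
	 * visible after this store to rcv_state.
	 */
	atomic_store_explicit(&q->rcv_state, RECV_PDU, memory_order_release);
}

/*
 * Reader side, modeled on the check in nvmet_tcp_queue_response().
 * Returns true when the response would be deferred (the early return).
 * inline_write is a simplification of the length and opcode checks.
 */
static bool response_deferred(struct queue *q, struct cmd *c,
			      bool inline_write)
{
	/* Acquire load: pairs with the release store in the writer. */
	int state = atomic_load_explicit(&q->rcv_state, memory_order_acquire);
	/* Read after the acquire, so it is at least as new as the state. */
	struct cmd *cur = atomic_load_explicit(&q->cur_cmd,
					       memory_order_relaxed);

	return c == cur && state == RECV_PDU && inline_write;
}
```

In this sketch, a reader that observes the RECV_PDU value written by
prepare_receive_pdu() also observes cur_cmd == NULL, so it does not
take the early return for a command that is no longer current.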
This scenario was encountered during fio testing, which involved
running 2 min of 4K random writes using an ARM-based machine as the
target. We observed hanging I/O typically after 10-20 iterations.
fio config used:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/mnt/volumez/vol0
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite
drivers/nvme/target/tcp.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7c51c2a8c109..4f9cac8a5abe 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -571,10 +571,16 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
struct nvmet_tcp_cmd *cmd =
container_of(req, struct nvmet_tcp_cmd, req);
struct nvmet_tcp_queue *queue = cmd->queue;
+ enum nvmet_tcp_recv_state queue_state;
+ struct nvmet_tcp_cmd *queue_cmd;
struct nvme_sgl_desc *sgl;
u32 len;
- if (unlikely(cmd == queue->cmd)) {
+ /* Pairs with store_release in nvmet_prepare_receive_pdu() */
+ queue_state = smp_load_acquire(&queue->rcv_state);
+ queue_cmd = READ_ONCE(queue->cmd);
+
+ if (unlikely(cmd == queue_cmd)) {
sgl = &cmd->req.cmd->common.dptr.sgl;
len = le32_to_cpu(sgl->length);
@@ -583,7 +589,7 @@ static void nvmet_tcp_queue_response(struct nvmet_req *req)
* Avoid using helpers, this might happen before
* nvmet_req_init is completed.
*/
- if (queue->rcv_state == NVMET_TCP_RECV_PDU &&
+ if (queue_state == NVMET_TCP_RECV_PDU &&
len && len <= cmd->req.port->inline_data_size &&
nvme_is_write(cmd->req.cmd))
return;
@@ -847,8 +853,9 @@ static void nvmet_prepare_receive_pdu(struct nvmet_tcp_queue *queue)
{
queue->offset = 0;
queue->left = sizeof(struct nvme_tcp_hdr);
- queue->cmd = NULL;
- queue->rcv_state = NVMET_TCP_RECV_PDU;
+ WRITE_ONCE(queue->cmd, NULL);
+ /* Ensure rcv_state is visible only after queue->cmd is set */
+ smp_store_release(&queue->rcv_state, NVMET_TCP_RECV_PDU);
}
static void nvmet_tcp_free_crypto(struct nvmet_tcp_queue *queue)
--
2.34.1
* Re: [PATCH v3] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
From: Meir Elisha @ 2025-02-27 9:09 UTC
To: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg; +Cc: linux-nvme
Just making sure it wasn't forgotten since there was a lot of traffic. Thanks for your time!
* Re: [PATCH v3] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
From: Sagi Grimberg @ 2025-02-27 10:02 UTC
To: Meir Elisha, Christoph Hellwig, Chaitanya Kulkarni; +Cc: linux-nvme
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH v3] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
From: Keith Busch @ 2025-02-28 18:00 UTC
To: Meir Elisha
Cc: Christoph Hellwig, Chaitanya Kulkarni, Sagi Grimberg, linux-nvme
On Wed, Feb 26, 2025 at 09:28:12AM +0200, Meir Elisha wrote:
> The order in which queue->cmd and rcv_state are updated is crucial.
> If these assignments are reordered by the compiler, the worker might not
> get queued in nvmet_tcp_queue_response(), hanging the IO. To enforce the
> correct ordering, set rcv_state using smp_store_release().
>
> Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
Thanks, applied to nvme-6.14.