* [PATCH] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path
[not found] ` <2026031827-carving-overdrawn-5b1f@gregkh>
@ 2026-03-18 22:56 ` Shivam Kumar
2026-03-20 7:46 ` Christoph Hellwig
[not found] ` <CA+ysrSLcSJSgEC7i0GYXPiRY9sbbyEGT60Nh1SJ6PqXWENgSOw@mail.gmail.com>
1 sibling, 1 reply; 3+ messages in thread
From: Shivam Kumar @ 2026-03-18 22:56 UTC (permalink / raw)
To: gregkh; +Cc: security, hch, sagi, kch, linux-nvme, kumar.shivam43666
In nvmet_tcp_try_recv_ddgst(), when a data digest mismatch is detected,
nvmet_req_uninit() is called unconditionally. However, if the command
arrived via the nvmet_tcp_handle_req_failure() path, nvmet_req_init()
had returned false and percpu_ref_tryget_live() was never executed. The
unconditional percpu_ref_put() inside nvmet_req_uninit() then causes a
refcount underflow, leading to a WARNING in
percpu_ref_switch_to_atomic_rcu, a use-after-free diagnostic, and
eventually a permanent workqueue deadlock.
Check cmd->flags & NVMET_TCP_F_INIT_FAILED before calling
nvmet_req_uninit(), matching the existing pattern in
nvmet_tcp_execute_request().
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
drivers/nvme/target/tcp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index acc71a26733f..a456dd2fd8bd 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1310,7 +1310,8 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
queue->idx, cmd->req.cmd->common.command_id,
queue->pdu.cmd.hdr.type, le32_to_cpu(cmd->recv_ddgst),
le32_to_cpu(cmd->exp_ddgst));
- nvmet_req_uninit(&cmd->req);
+ if (!(cmd->flags & NVMET_TCP_F_INIT_FAILED))
+ nvmet_req_uninit(&cmd->req);
nvmet_tcp_free_cmd_buffers(cmd);
nvmet_tcp_fatal_error(queue);
ret = -EPROTO;
--
2.34.1
* Re: [PATCH] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path
2026-03-18 22:56 ` [PATCH] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Shivam Kumar
@ 2026-03-20 7:46 ` Christoph Hellwig
0 siblings, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2026-03-20 7:46 UTC (permalink / raw)
To: Shivam Kumar; +Cc: gregkh, security, sagi, kch, linux-nvme
Looks good to me:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Although I'd prefer if someone more familiar with the TCP code could
also look over it.
* Re: [Bug Report] nvmet-tcp: unbalanced percpu_ref_put on data digest error after nvmet_req_init failure causes refcount underflow, use-after-free, and permanent workqueue deadlock
[not found] ` <CA+ysrSJQ2tDiB=zL+GwF=ckhmO62v6GC2bWVUVVDVn7d8Km1mw@mail.gmail.com>
@ 2026-03-25 4:29 ` Chaitanya Kulkarni
0 siblings, 0 replies; 3+ messages in thread
From: Chaitanya Kulkarni @ 2026-03-25 4:29 UTC (permalink / raw)
To: Shivam Kumar, Greg KH
Cc: Christoph Hellwig, sagi@grimberg.me, Chaitanya Kulkarni,
security@kernel.org, linux-nvme@lists.infradead.org
+linux-nvme list as well.
On 3/24/26 21:22, Shivam Kumar wrote:
> Hi Greg, Christoph, Sagi, Chaitanya,
>
> Following up on my previous report (percpu_ref underflow in
> nvmet_tcp_try_recv_ddgst). During testing with the percpu_ref patch
> applied, I identified a separate vulnerability triggered by the same
> PoC. This one affects the queue kref (not the SQ percpu_ref) and has a
> different root cause.
>
> # Bug Overview
>
> nvmet_tcp_handle_icreq() sets queue->state = NVMET_TCP_Q_LIVE without
> holding state_lock and without checking whether teardown has already
> started. When a client closes the connection immediately after sending
> an ICReq and a malformed CapsuleCmd, the FIN can arrive and be
> processed before io_work reads the ICReq from the socket buffer. The
> FIN handler sets queue->state = NVMET_TCP_Q_DISCONNECTING and calls
> kref_put (1->0). Then io_work processes the ICReq and handle_icreq
> blindly overwrites the state back to NVMET_TCP_Q_LIVE. This defeats
> the guard in nvmet_tcp_schedule_release_queue(), which relies on state
> == NVMET_TCP_Q_DISCONNECTING to prevent a second kref_put.
>
> When the subsequent digest error triggers nvmet_tcp_fatal_error() ->
> kernel_sock_shutdown(), the socket teardown synchronously processes
> any pending TCP packets in the socket backlog via release_sock() ->
> tcp_done() -> nvmet_tcp_state_change() ->
> nvmet_tcp_schedule_release_queue(). Because the state was overwritten
> to LIVE, the guard passes and kref_put is called on an already-zero
> kref, triggering the underflow.
>
> # Root Cause
>
> The vulnerable code in nvmet_tcp_handle_icreq() (tcp.c):
>
> queue->state = NVMET_TCP_Q_LIVE;
> nvmet_prepare_receive_pdu(queue);
> return 0;
>
> This state transition is not protected by state_lock and does not
> check the current state. All other state transitions to
> NVMET_TCP_Q_DISCONNECTING in nvmet_tcp_schedule_release_queue() are
> protected by the spinlock, but handle_icreq bypasses this
> synchronization entirely.
>
> # Race Sequence
>
> The PoC sends ICReq + CapsuleCmd + close() in rapid succession. On the wire:
>
> [TCP handshake] -> [ICReq] -> [CapsuleCmd] -> [FIN]
>
> All data arrives in order in the socket buffer, but the FIN triggers
> a TCP state change callback (softirq context) that races with io_work
> (process context) reading the application data.
>
> # ftrace evidence (queue 29)
>
> Step 1: FIN arrives, teardown begins (softirq context):
>
> kworker/-74 1b..2. 125826488us : nvmet_tcp_state_change: DEBUG queue
> 29: In schedule_release before checking state variable state is 0,
> kref=1, caller=tcp_fin+0x278/0x4f0
> kworker/-74 1b..2. 125826492us : nvmet_tcp_state_change: DEBUG queue
> 29: kref_put in schedule_release state changed to 3, kref=1,
> caller=tcp_fin+0x278/0x4f0
>
> State transitions from CONNECTING(0) to DISCONNECTING(3), kref decremented 1->0.
>
> Step 2: io_work processes ICReq, overwrites state (process context):
>
> kworker/-74 1..... 125826529us : nvmet_tcp_try_recv_pdu: DEBUG queue
> 29: Inside nvmet_tcp_handle_icreq_pdu before changing state is 3,
> kref=0, caller=nvmet_tcp_io_work+0x12b/0x1fc0
> kworker/-74 1..... 125826532us : nvmet_tcp_try_recv_pdu: DEBUG queue
> 29: Inside nvmet_tcp_handle_icreq_pdu after changing state is 2,
> kref=0, caller=nvmet_tcp_io_work+0x12b/0x1fc0
>
> handle_icreq sees state=DISCONNECTING(3) but overwrites it to LIVE(2)
> without holding state_lock. kref is already 0.
>
> Step 3: Digest error triggers fatal_error -> kernel_sock_shutdown
> (process context):
>
> kworker/-74 1..... 125831291us : nvmet_tcp_fatal_error: DEBUG queue
> 29: fatal_error, state=2, ctrl=0000000000000000,
> caller=nvmet_tcp_try_recv_ddgst+0x74f/0x9a0
>
> Step 4: kernel_sock_shutdown -> release_sock processes backlog -> tcp_done
> re-enters state_change, guard bypassed because state is LIVE(2):
>
> kworker/-74 1b..2. 125831626us : nvmet_tcp_state_change: DEBUG queue
> 29: In schedule_release before checking state variable state is 2,
> kref=0, caller=tcp_done+0x1d7/0x350
> kworker/-74 1b..2. 125831629us : nvmet_tcp_state_change: DEBUG queue
> 29: kref_put in schedule_release state changed to 3, kref=0,
> caller=tcp_done+0x1d7/0x350
>
> schedule_release_queue sees state=LIVE(2), guard passes,
> kref_put called on kref=0 -> underflow -> CRASH.
>
> The critical evidence is Step 2: handle_icreq reads state=3(DISCONNECTING)
> and writes state=2(LIVE) — proving the unsynchronized overwrite.
>
> # Additional Observation: Double fatal_error Call
>
> The ftrace log also shows that fatal_error is called twice per error: once
> from nvmet_tcp_try_recv_ddgst (digest mismatch) and again from
> nvmet_tcp_socket_error in nvmet_tcp_io_work (propagated error return).
> Both calls invoke kernel_sock_shutdown, doubling the window for the
> reentrant state_change.
>
> # Crash Trace (with percpu_ref patch applied)
>
> refcount_t: underflow; use-after-free.
> WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xc5/0x110
> Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> Call Trace:
> <TASK>
> nvmet_tcp_state_change+0x3e6/0x470
> tcp_done+0x1d7/0x350
> tcp_rcv_state_process+0x392c/0x6d80
> tcp_v4_do_rcv+0x1b1/0xa80
> __release_sock+0x2ab/0x340
> release_sock+0x58/0x1c0
> inet_shutdown+0x1f6/0x3c0
> nvmet_tcp_fatal_error+0x13c/0x1c0
> nvmet_tcp_try_recv_ddgst+0x74f/0x9a0
> nvmet_tcp_io_work+0xd43/0x1fc0
> process_one_work+0x8cf/0x1a40
>
> # Affected Code
>
> - Subsystem: drivers/nvme/target/tcp.c
> - Function: nvmet_tcp_handle_icreq()
> - Tested on: Linux 7.0-rc3
>
> # Suggested Fix Direction
>
> The state transition in nvmet_tcp_handle_icreq() needs to be
> synchronized under state_lock with a check for
> NVMET_TCP_Q_DISCONNECTING. If the queue is already tearing down,
> handle_icreq should not set the state to LIVE and should stop
> processing. I'll defer to you on the exact error handling semantics
> (return value and cleanup path).
>
> This bug is independent of the percpu_ref underflow in the previous
> report. The percpu_ref patch fixes the SQ refcount underflow; this bug
> affects the queue kref through an unsynchronized state transition.
>
> I am available for further discussion and testing.
>
> Thanks,
> Shivam Kumar
> PhD Student, Computer Science
> SYNE Lab, Syracuse University
> skumar47@syr.edu
I'll try to have a look at this over the weekend.
-ck