* [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll
@ 2025-02-20 11:18 Sagi Grimberg
2025-02-24 22:23 ` Chaitanya Kulkarni
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Sagi Grimberg @ 2025-02-20 11:18 UTC
To: linux-nvme; +Cc: Keith Busch, Christoph Hellwig, Hannes Reinecke
nvme_tcp_poll() may race with the send path error handler: the error
handler may complete a request while that request is still actively
being polled for completion, resulting in a UAF panic [1].
We should stop polling as soon as we see an error when trying to read
from the socket. Hence propagate the error from nvme_tcp_try_recv()
so that the block layer breaks the polling cycle.
[1]:
--
[35665.692310] nvme nvme2: failed to send request -13
[35665.702265] nvme nvme2: unsupported pdu type (3)
[35665.702272] BUG: kernel NULL pointer dereference, address: 0000000000000000
[35665.702542] nvme nvme2: queue 1 receive failed: -22
[35665.703209] #PF: supervisor write access in kernel mode
[35665.703213] #PF: error_code(0x0002) - not-present page
[35665.703214] PGD 8000003801cce067 P4D 8000003801cce067 PUD 37e6f79067 PMD 0
[35665.703220] Oops: 0002 [#1] SMP PTI
[35665.703658] nvme nvme2: starting error recovery
[35665.705809] Hardware name: Inspur aaabbb/YZMB-00882-104, BIOS 4.1.26 09/22/2022
[35665.705812] Workqueue: kblockd blk_mq_requeue_work
[35665.709172] RIP: 0010:_raw_spin_lock+0xc/0x30
[35665.715788] Call Trace:
[35665.716201] <TASK>
[35665.716613] ? show_trace_log_lvl+0x1c1/0x2d9
[35665.717049] ? show_trace_log_lvl+0x1c1/0x2d9
[35665.717457] ? blk_mq_request_bypass_insert+0x2c/0xb0
[35665.717950] ? __die_body.cold+0x8/0xd
[35665.718361] ? page_fault_oops+0xac/0x140
[35665.718749] ? blk_mq_start_request+0x30/0xf0
[35665.719144] ? nvme_tcp_queue_rq+0xc7/0x170 [nvme_tcp]
[35665.719547] ? exc_page_fault+0x62/0x130
[35665.719938] ? asm_exc_page_fault+0x22/0x30
[35665.720333] ? _raw_spin_lock+0xc/0x30
[35665.720723] blk_mq_request_bypass_insert+0x2c/0xb0
[35665.721101] blk_mq_requeue_work+0xa5/0x180
[35665.721451] process_one_work+0x1e8/0x390
[35665.721809] worker_thread+0x53/0x3d0
[35665.722159] ? process_one_work+0x390/0x390
[35665.722501] kthread+0x124/0x150
[35665.722849] ? set_kthread_struct+0x50/0x50
[35665.723182] ret_from_fork+0x1f/0x30
Reported-by: Zhang Guanghui <zhang.guanghui@cestc.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/tcp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c637ff04a052..0e390e98aaf9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
     struct nvme_tcp_queue *queue = hctx->driver_data;
     struct sock *sk = queue->sock->sk;
+    int ret;
 
     if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
         return 0;
@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
     set_bit(NVME_TCP_Q_POLLING, &queue->flags);
     if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
         sk_busy_loop(sk, true);
-    nvme_tcp_try_recv(queue);
+    ret = nvme_tcp_try_recv(queue);
     clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
-    return queue->nr_cqe;
+    return ret < 0 ? ret : queue->nr_cqe;
 }
 
 static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
--
2.43.0
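For illustration only, and not part of the patch: the effect of the
changed return value can be sketched in a small userspace model. Every
name below (fake_queue, fake_try_recv, poll_old, poll_new, poll_loop)
is hypothetical, and poll_loop() only loosely imitates a caller that
keeps polling while the callback returns 0 and gives up on a negative
return. It is an assumption about the general shape of such a loop,
not the block layer's actual code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the driver's per-queue state. */
struct fake_queue {
    int nr_cqe;      /* completions reaped by the last receive */
    int recv_result; /* value the simulated receive will return */
};

/* Simulated receive: negative errno on failure, 0 or more on success. */
static int fake_try_recv(struct fake_queue *q)
{
    return q->recv_result;
}

/* Pre-patch shape: the receive result is ignored. */
static int poll_old(struct fake_queue *q)
{
    fake_try_recv(q);
    return q->nr_cqe;
}

/* Post-patch shape: a receive error is propagated to the caller. */
static int poll_new(struct fake_queue *q)
{
    int ret = fake_try_recv(q);

    return ret < 0 ? ret : q->nr_cqe;
}

/*
 * Hypothetical caller: polls until the callback reports completions
 * (> 0) or an error (< 0), bounded by max_spins so the model always
 * terminates. Returns the number of poll calls made.
 */
static int poll_loop(int (*poll)(struct fake_queue *),
                     struct fake_queue *q, int max_spins)
{
    int spins = 0;

    while (spins < max_spins) {
        int ret = poll(q);

        spins++;
        if (ret != 0)
            break;
    }
    return spins;
}
```

With recv_result set to -EINVAL (the -22 seen in the log above) and no
completions pending, poll_old() keeps returning 0, so this model loop
spins until its bound, while poll_new() returns the error and the loop
stops after a single call.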
* Re: [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll
2025-02-20 11:18 [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll Sagi Grimberg
@ 2025-02-24 22:23 ` Chaitanya Kulkarni
2025-02-25 0:12 ` Keith Busch
2025-02-26 9:46 ` Hannes Reinecke
2 siblings, 0 replies; 4+ messages in thread
From: Chaitanya Kulkarni @ 2025-02-24 22:23 UTC
To: sagi@grimberg.me, linux-nvme@lists.infradead.org
Cc: Keith Busch, Christoph Hellwig, Hannes Reinecke
On 2/20/25 03:18, Sagi Grimberg wrote:
> nvme_tcp_poll() may race with the send path error handler because
> it may complete the request while it is actively being polled for
> completion, resulting in a UAF panic [1]:
>
> We should make sure to stop polling when we see an error when
> trying to read from the socket. Hence make sure to propagate the
> error so that the block layer breaks the polling cycle.
>
> [1]: [...]
>
> Reported-by: Zhang Guanghui <zhang.guanghui@cestc.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
* Re: [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll
2025-02-20 11:18 [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll Sagi Grimberg
2025-02-24 22:23 ` Chaitanya Kulkarni
@ 2025-02-25 0:12 ` Keith Busch
2025-02-26 9:46 ` Hannes Reinecke
2 siblings, 0 replies; 4+ messages in thread
From: Keith Busch @ 2025-02-25 0:12 UTC
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Hannes Reinecke
On Thu, Feb 20, 2025 at 01:18:30PM +0200, Sagi Grimberg wrote:
> nvme_tcp_poll() may race with the send path error handler because
> it may complete the request while it is actively being polled for
> completion, resulting in a UAF panic [1]:
Thanks, applied to nvme-6.14.
* Re: [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll
2025-02-20 11:18 [PATCH] nvme-tcp: fix possible UAF in nvme_tcp_poll Sagi Grimberg
2025-02-24 22:23 ` Chaitanya Kulkarni
2025-02-25 0:12 ` Keith Busch
@ 2025-02-26 9:46 ` Hannes Reinecke
2 siblings, 0 replies; 4+ messages in thread
From: Hannes Reinecke @ 2025-02-26 9:46 UTC
To: sagi, linux-nvme; +Cc: Keith Busch, Christoph Hellwig
On 2/20/25 12:18, Sagi Grimberg wrote:
> nvme_tcp_poll() may race with the send path error handler because
> it may complete the request while it is actively being polled for
> completion, resulting in a UAF panic [1]:
>
> We should make sure to stop polling when we see an error when
> trying to read from the socket. Hence make sure to propagate the
> error so that the block layer breaks the polling cycle.
>
> [1]: [...]
>
> Reported-by: Zhang Guanghui <zhang.guanghui@cestc.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> drivers/nvme/host/tcp.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index c637ff04a052..0e390e98aaf9 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
> {
> struct nvme_tcp_queue *queue = hctx->driver_data;
> struct sock *sk = queue->sock->sk;
> + int ret;
>
> if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
> return 0;
> @@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
> set_bit(NVME_TCP_Q_POLLING, &queue->flags);
> if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
> sk_busy_loop(sk, true);
> - nvme_tcp_try_recv(queue);
> + ret = nvme_tcp_try_recv(queue);
> clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
> - return queue->nr_cqe;
> + return ret < 0 ? ret : queue->nr_cqe;
> }
>
> static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich