* nvme-tcp: fix a possible UAF when failing to send request
@ 2025-02-10 7:41 zhang.guanghui
2025-02-10 10:01 ` Maurizio Lombardi
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-02-10 7:41 UTC (permalink / raw)
To: sagi, mgurtovoy, kbusch, sashal, chunguang.xu, zhang.guanghui
Cc: linux-kernel, linux-nvme, linux-block
Hello
When using the nvme-tcp driver in a storage cluster, the driver may trigger a NULL pointer dereference, crashing the host several times.
By analyzing the vmcore, we know the direct cause is that request->mq_hctx was used after free.
CPU1 CPU2
nvme_tcp_poll nvme_tcp_try_send -- failed to send request (error -13)
nvme_tcp_try_recv nvme_tcp_fail_request
nvme_tcp_recv_skb nvme_tcp_end_request
nvme_tcp_recv_pdu nvme_complete_rq
nvme_tcp_handle_comp nvme_retry_req -- request->mq_hctx has been freed and is NULL.
nvme_tcp_process_nvme_cqe
nvme_complete_rq
nvme_end_req
blk_mq_end_request
When nvme_tcp_try_send() fails to send a request (error -13, possibly caused by SELinux or other reasons), nvme_tcp_fail_request() executes.
But nvme_tcp_recv_pdu() may already have received the corresponding response PDU, and nvme_tcp_process_nvme_cqe() may already have completed the request, so request->mq_hctx is used after free.
The following patch tries to solve it.
Can you give some suggestions? Thanks!
diff --git a/linux/drivers/nvme/host/core.c b/linux/drivers/nvme/host/core.c
index a65b1dce8..417466674 100644
--- a/linux/drivers/nvme/host/core.c
+++ b/linux/drivers/nvme/host/core.c
@@ -288,6 +288,9 @@ static void nvme_retry_req(struct request *req)
unsigned long delay = 0;
u16 crd;
+ if (!req->mq_hctx && req->state == MQ_RQ_IDLE)
+ return;
+
/* The mask and shift result must be <= 3 */
crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
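For clarity, the guard proposed above can be modeled in plain userspace C. This is a sketch with simplified, hypothetical stand-ins for the kernel types (not the real struct request), only to show what condition the check detects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types involved (hypothetical). */
enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

struct request {
	void *mq_hctx;          /* hardware queue context; NULL after completion */
	enum mq_rq_state state;
};

/* Mirrors the proposed guard in nvme_retry_req(): if the request was
 * already completed on another path, mq_hctx is NULL and the state has
 * returned to MQ_RQ_IDLE, so retrying would touch freed state. */
static bool retry_is_safe(const struct request *req)
{
	if (!req->mq_hctx && req->state == MQ_RQ_IDLE)
		return false;
	return true;
}
```

The downside, discussed in the replies below, is that this papers over the double completion in the core instead of fixing the race in the transport driver.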
The details are as follows
[35665.692310] nvme nvme2: failed to send request -13
[35665.692683] nvme nvme2: queue 1 failed to send request 00000000b42f4e2b state 2 pdu 00000000d7fb8da3 type 4 rq_state 1 nrq_status 0
[35665.693323] nvme nvme2: failed to send rq 00000000f86a68c3 state 2 nrq_status 370
[35665.702265] nvme nvme2: unsupported pdu type (3)
[35665.702272] BUG: kernel NULL pointer dereference, address: 0000000000000000
[35665.702542] nvme nvme2: queue 1 receive failed: -22
[35665.703209] #PF: supervisor write access in kernel mode
[35665.703213] #PF: error_code(0x0002) - not-present page
[35665.703214] PGD 8000003801cce067 P4D 8000003801cce067 PUD 37e6f79067 PMD 0
[35665.703220] Oops: 0002 [#1] SMP PTI
[35665.703658] nvme nvme2: starting error recovery
[35665.704442] CPU: 20 PID: 815 Comm: kworker/20:1H Kdump: loaded Not tainted 5.15.131-17.cl9.x86_64 #1
[35665.705168] nvme nvme2: queue 1 receive again after receive failed
[35665.705809] Hardware name: Inspur aaabbb/YZMB-00882-104, BIOS 4.1.26 09/22/2022
[35665.705812] Workqueue: kblockd blk_mq_requeue_work
[35665.709172] RIP: 0010:_raw_spin_lock+0xc/0x30
[35665.709606] Code: 05 c3 cc cc cc cc 89 c6 e8 31 05 68 ff 66 90 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e8 02 05 68 ff 66 90 c3 cc
[35665.710470] RSP: 0000:ffffa67bcd797e08 EFLAGS: 00010246
[35665.710925] RAX: 0000000000000000 RBX: ffff92f6bbcc9840 RCX: ffff92f6bbcc9888
[35665.711393] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[35665.711849] RBP: 0000000000000000 R08: ffffa67bcd797e48 R09: ffff932346d576f4
[35665.712275] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[35665.712725] R13: ffff92f6bbcc9888 R14: 0000000000000008 R15: 0000000000000000
[35665.713158] FS: 0000000000000000(0000) GS:ffff93527d400000(0000) knlGS:0000000000000000
[35665.713603] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[35665.714063] CR2: 0000000000000000 CR3: 000000371aa02006 CR4: 00000000007706e0
[35665.714534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[35665.714961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[35665.715359] PKRU: 55555554
[35665.715788] Call Trace:
[35665.716201] <TASK>
[35665.716613] ? show_trace_log_lvl+0x1c1/0x2d9
[35665.717049] ? show_trace_log_lvl+0x1c1/0x2d9
[35665.717457] ? blk_mq_request_bypass_insert+0x2c/0xb0
[35665.717950] ? __die_body.cold+0x8/0xd
[35665.718361] ? page_fault_oops+0xac/0x140
[35665.718749] ? blk_mq_start_request+0x30/0xf0
[35665.719144] ? nvme_tcp_queue_rq+0xc7/0x170 [nvme_tcp]
[35665.719547] ? exc_page_fault+0x62/0x130
[35665.719938] ? asm_exc_page_fault+0x22/0x30
[35665.720333] ? _raw_spin_lock+0xc/0x30
[35665.720723] blk_mq_request_bypass_insert+0x2c/0xb0
[35665.721101] blk_mq_requeue_work+0xa5/0x180
[35665.721451] process_one_work+0x1e8/0x390
[35665.721809] worker_thread+0x53/0x3d0
[35665.722159] ? process_one_work+0x390/0x390
[35665.722501] kthread+0x124/0x150
[35665.722849] ? set_kthread_struct+0x50/0x50
[35665.723182] ret_from_fork+0x1f/0x30
[35665.723508] </TASK>
crash> struct nvme_tcp_request ffff92f6bbcc9950
struct nvme_tcp_request {
req = {
cmd = 0xffff92f5b83f6748,
result = {
u16 = 0,
u32 = 0,
u64 = 0
},
genctr = 169 '\251',
retries = 1 '\001',
flags = 0 '\000',
status = 6,
ctrl = 0xffff92f5e5df7348
},
pdu = 0xffff92f5b83f6740,
queue = 0xffff92f407cc9128,
data_len = 4096,
pdu_len = 4096,
pdu_sent = 0,
h2cdata_left = 0,
h2cdata_offset = 0,
ttag = 62,
status = 12,
entry = {
next = 0xdead000000000100,
prev = 0xdead000000000122
},
lentry = {
next = 0x0
},
ddgst = 0,
curr_bio = 0xffff9324d639e240,
iter = {
iter_type = 2 '\002',
nofault = false,
data_source = true,
iov_offset = 0,
count = 4096,
{
iov = 0xffff92f6bbcc98a8,
kvec = 0xffff92f6bbcc98a8,
bvec = 0xffff92f6bbcc98a8,
xarray = 0xffff92f6bbcc98a8,
pipe = 0xffff92f6bbcc98a8
},
{
nr_segs = 1,
{
head = 1,
start_head = 0
},
xarray_start = 1
}
},
offset = 0,
data_sent = 0,
state = NVME_TCP_SEND_DATA
}
crash> nvme_tcp_hdr.type 0xffff92f5b83f6740
type = 4 '\004',
crash>
crash> struct request ffff92f6bbcc9840
struct request {
q = 0xffff92f59d55c240,
mq_ctx = 0xffffc67bb9a1f040,
mq_hctx = 0x0,
cmd_flags = 33556483,
rq_flags = 139456,
tag = 87,
internal_tag = -1,
__data_len = 0,
__sector = 66846720,
bio = 0x0,
biotail = 0x0,
queuelist = {
next = 0xffff92f6bbcc9888,
prev = 0xffff92f6bbcc9888
},
{
hash = {
next = 0x0,
pprev = 0x0
},
ipi_list = {
next = 0x0
}
},
{
rb_node = {
__rb_parent_color = 18446685131795018112,
rb_right = 0x1000,
rb_left = 0x0
},
special_vec = {
bv_page = 0xffffca64841f3180,
bv_len = 4096,
bv_offset = 0
},
completion_data = 0xffffca64841f3180,
error_count = -2078330496
},
{
elv = {
icq = 0x0,
priv = {0xffff92f6bbcc98c8, 0xffff92f6bbcc98c8}
},
flush = {
seq = 0,
list = {
next = 0xffff92f6bbcc98c8,
prev = 0xffff92f6bbcc98c8
},
saved_end_io = 0x0
}
},
rq_disk = 0xffff92f6bdbff600,
part = 0xffff92f59f557800,
start_time_ns = 35665692557229,
io_start_time_ns = 35665692566268,
wbt_flags = 0,
stats_sectors = 0,
nr_phys_segments = 1,
nr_integrity_segments = 0,
write_hint = 0,
ioprio = 0,
state = MQ_RQ_IDLE,
ref = {
refs = {
counter = 0
}
},
timeout = 180000,
deadline = 4330512774,
{
csd = {
node = {
llist = {
next = 0x0
},
{
u_flags = 0,
a_flags = {
counter = 0
}
},
src = 0,
dst = 0
},
func = 0xffffffff8eed62d0 <__blk_mq_complete_request_remote>,
info = 0xffff92f6bbcc9840
},
fifo_time = 0
},
end_io = 0x0,
end_io_data = 0x0
}
Best regards
zhang.guanghui@cestc.cn
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
@ 2025-02-10 10:01 ` Maurizio Lombardi
2025-02-10 10:24 ` Max Gurtovoy
2025-02-12 15:33 ` Maurizio Lombardi
2025-02-17 7:46 ` Sagi Grimberg
2 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-10 10:01 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
> Hello
>
>
I guess you have to fix your mail client.
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>
>
> CPU1 CPU2
>
> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send().
Personally, I would try to fix it inside the nvme-tcp driver without
touching the core functions.
Maybe nvme_tcp_poll should just ensure that io_work completes before
calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
from getting rescheduled by the nvme_tcp_data_ready() callback.
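A sketch of that suggestion (illustrative and untested; it assumes the io_work member of struct nvme_tcp_queue, and leaves open where exactly the flush belongs relative to the busy-loop):

```diff
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
+	/* wait for a concurrent io_work run to finish before receiving on
+	 * the socket directly; the POLLING flag keeps data_ready from
+	 * rescheduling it meanwhile */
+	flush_work(&queue->io_work);
 	nvme_tcp_try_recv(queue);
```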
Maurizio
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 10:01 ` Maurizio Lombardi
@ 2025-02-10 10:24 ` Max Gurtovoy
2025-02-10 11:16 ` zhang.guanghui
2025-02-10 16:40 ` Maurizio Lombardi
0 siblings, 2 replies; 28+ messages in thread
From: Max Gurtovoy @ 2025-02-10 10:24 UTC (permalink / raw)
To: Maurizio Lombardi, zhang.guanghui@cestc.cn, sagi, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On 10/02/2025 12:01, Maurizio Lombardi wrote:
> On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
>> Hello
>>
>>
> I guess you have to fix your mail client.
>
>> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>>
>>
>> CPU1 CPU2
>>
>> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
> This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()
> Personally, I would try to fix it inside the nvme-tcp driver without
> touching the core functions.
>
> Maybe nvme_tcp_poll should just ensure that io_work completes before
> calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
> from getting rescheduled by the nvme_tcp_data_ready() callback.
>
>
> Maurizio
It seems to me that the HOST_PATH_ERROR handling can be improved in
nvme-tcp.
In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in
case we fail to submit a command..
can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),
NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in
nvme-rdma for host path error handling ?
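The pattern referred to here can be modeled in plain userspace C (a simplified stand-in for the kernel types and for nvme_host_path_error(); the status value mirrors NVME_SC_HOST_PATH_ERROR, everything else is hypothetical):

```c
#include <errno.h>

/* Simplified stand-ins for kernel types (hypothetical, not kernel code). */
typedef int blk_status_t;
#define BLK_STS_OK    0
#define BLK_STS_IOERR 10

#define NVME_SC_HOST_PATH_ERROR 0x370

struct request { int nvme_status; };

/* Models nvme_host_path_error(): mark the request as failed on the host
 * path so upper layers (multipath) can retry on another path, instead of
 * reporting a device-level I/O error. */
static blk_status_t host_path_error(struct request *rq)
{
	rq->nvme_status = NVME_SC_HOST_PATH_ERROR;
	return BLK_STS_OK;
}

/* nvme-rdma-style submission-failure mapping: -EIO means the transport
 * path failed; anything else is reported as a plain I/O error. */
static blk_status_t submit_error(struct request *rq, int err)
{
	return (err == -EIO) ? host_path_error(rq) : BLK_STS_IOERR;
}
```

Incidentally, the `nrq_status 370` in the log at the top of the thread appears to be this NVME_SC_HOST_PATH_ERROR value (0x370).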
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 10:24 ` Max Gurtovoy
@ 2025-02-10 11:16 ` zhang.guanghui
2025-02-10 11:34 ` Max Gurtovoy
2025-02-10 16:40 ` Maurizio Lombardi
1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-02-10 11:16 UTC (permalink / raw)
To: mgurtovoy, Maurizio Lombardi, sagi, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi,
Thank you for your reply.
In nvme-rdma we use nvme_host_path_error(rq), but the prerequisite is err == -EIO; should we ignore this condition?
In nvme-tcp this judgment condition is different. Even using nvme_host_path_error(),
it could still reach nvme_complete_rq -> nvme_retry_req, where request->mq_hctx has been freed and is NULL.
zhang.guanghui@cestc.cn
From: Max Gurtovoy
Date: 2025-02-10 18:24
To: Maurizio Lombardi; zhang.guanghui@cestc.cn; sagi; kbusch; sashal; chunguang.xu
CC: linux-kernel; linux-nvme; linux-block
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request
On 10/02/2025 12:01, Maurizio Lombardi wrote:
> On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
>> Hello
>>
>>
> I guess you have to fix your mail client.
>
>> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>>
>>
>> CPU1 CPU2
>>
>> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
> This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()
> Personally, I would try to fix it inside the nvme-tcp driver without
> touching the core functions.
>
> Maybe nvme_tcp_poll should just ensure that io_work completes before
> calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
> from getting rescheduled by the nvme_tcp_data_ready() callback.
>
>
> Maurizio
It seems to me that the HOST_PATH_ERROR handling can be improved in
nvme-tcp.
In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in
case we fail to submit a command..
can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),
NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in
nvme-rdma for host path error handling ?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 11:16 ` zhang.guanghui
@ 2025-02-10 11:34 ` Max Gurtovoy
0 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2025-02-10 11:34 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, Maurizio Lombardi, sagi, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
hi
On 10/02/2025 13:16, zhang.guanghui@cestc.cn wrote:
> Hi,
>
>
>
> Thank you for your reply.
>
>
> In nvme-rdma we use nvme_host_path_error(rq) , the prerequisites are -EIO, ignore this condition?
> in nvme-tcp this judgment condition is different, use the function nvme_host_path_error ,
> it should be possible to go to nvme_complete_rq -> nvme_retry_req -- request->mq_hctx have been freed, is NULL.
did you try the proposal or are you assuming it will be NULL ?
>
> zhang.guanghui@cestc.cn
>
> From: Max Gurtovoy
> Date: 2025-02-10 18:24
> To: Maurizio Lombardi; zhang.guanghui@cestc.cn; sagi; kbusch; sashal; chunguang.xu
> CC: linux-kernel; linux-nvme; linux-block
> Subject: Re: nvme-tcp: fix a possible UAF when failing to send request
>
> On 10/02/2025 12:01, Maurizio Lombardi wrote:
>> On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
>>> Hello
>> I guess you have to fix your mail client.
>>> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>>> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>>> CPU1 CPU2
>>> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
>> This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()
>> Personally, I would try to fix it inside the nvme-tcp driver without
>> touching the core functions.
>>
>> Maybe nvme_tcp_poll should just ensure that io_work completes before
>> calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
>> from getting rescheduled by the nvme_tcp_data_ready() callback.
>>
>> Maurizio
>
> It seems to me that the HOST_PATH_ERROR handling can be improved in
> nvme-tcp.
>
> In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in
> case we fail to submit a command..
>
> can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),
> NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in
> nvme-rdma for host path error handling ?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 10:24 ` Max Gurtovoy
2025-02-10 11:16 ` zhang.guanghui
@ 2025-02-10 16:40 ` Maurizio Lombardi
[not found] ` <CAAO4dAWdsMjYMp9jdWXd_48aG0mTtVpRONqjJJ1scnc773tHzg@mail.gmail.com>
1 sibling, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-10 16:40 UTC (permalink / raw)
To: Max Gurtovoy, zhang.guanghui@cestc.cn, sagi, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On Mon Feb 10, 2025 at 11:24 AM CET, Max Gurtovoy wrote:
>
> It seems to me that the HOST_PATH_ERROR handling can be improved in
> nvme-tcp.
>
> In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in
> case we fail to submit a command..
>
> can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),
> NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in
> nvme-rdma for host path error handling ?
Yes, I could try to prepare a patch.
In any case, I think the main issue here is that nvme_tcp_poll()
should be prevented from racing against io_work... and I also think
there is a possible race condition if nvme_tcp_poll() races against
the controller resetting code.
Maurizio
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
2025-02-10 10:01 ` Maurizio Lombardi
@ 2025-02-12 15:33 ` Maurizio Lombardi
2025-02-12 16:07 ` Maurizio Lombardi
2025-02-17 7:46 ` Sagi Grimberg
2 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 15:33 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
> Hello
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>
> CPU1 CPU2
>
> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
>
> nvme_tcp_try_recv nvme_tcp_fail_request
>
> nvme_tcp_recv_skb nvme_tcp_end_request
>
> nvme_tcp_recv_pdu nvme_complete_rq
>
> nvme_tcp_handle_comp nvme_retry_req -- request->mq_hctx have been freed, is NULL.
> nvme_tcp_process_nvme_cqe
>
> nvme_complete_rq
>
> nvme_end_req
>
> blk_mq_end_request
Taking a step back. Let's take a different approach and try to avoid the
double completion.
The problem here is that apparently we received a nvme_tcp_rsp capsule
from the target, meaning that the command has been processed (I guess
the capsule has an error status?)
So maybe only part of the command has been sent?
Why do we receive the rsp capsule at all? Shouldn't this be treated as a fatal
error by the controller?
Maurizio
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-12 15:33 ` Maurizio Lombardi
@ 2025-02-12 16:07 ` Maurizio Lombardi
2025-02-13 2:04 ` zhang.guanghui
0 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 16:07 UTC (permalink / raw)
To: Maurizio Lombardi, zhang.guanghui@cestc.cn, sagi, mgurtovoy,
kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On Wed Feb 12, 2025 at 4:33 PM CET, Maurizio Lombardi wrote:
>
> Taking a step back. Let's take a different approach and try to avoid the
> double completion.
>
> The problem here is that apparently we received a nvme_tcp_rsp capsule
> from the target, meaning that the command has been processed (I guess
> the capsule has an error status?)
>
> So maybe only part of the command has been sent?
> Why we receive the rsp capsule at all? Shouldn't this be treated as a fatal
> error by the controller?
The NVMe/TCP specification says
******
When a controller detects a fatal error, that controller shall:
1. stop processing any PDUs that arrive on the connection; and
2. send a C2HTermReq PDU
******
And indeed I see in the dmesg this:
nvme nvme2: unsupported pdu type (3)
This means the controller detected the problem and sent to the host the
C2HTermReq command. Upon receiving this command, the host is supposed to
close the connection.
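As a rough illustration, host-side handling of a C2HTermReq PDU could look something like this (an untested sketch; the PDU type constant, the surrounding switch, and the driver's internal error-recovery helper are assumed, not quoted from the actual fix):

```diff
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, ...)
 	switch (hdr->type) {
+	case nvme_tcp_c2h_term:
+		/* the controller detected a fatal error: stop processing
+		 * PDUs and tear the connection down via error recovery */
+		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
+		return -ENOTCONN;
 	case nvme_tcp_c2h_data:
 		return nvme_tcp_handle_c2h_data(queue, (void *)queue->pdu);
```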
Now I get it.
Zhang, do you have commit aeacfcefa218f4ed11da478e9b7915a37d1afaff in
your kernel? I guess you are missing it. Please check.
Maurizio
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-12 16:07 ` Maurizio Lombardi
@ 2025-02-13 2:04 ` zhang.guanghui
0 siblings, 0 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-02-13 2:04 UTC (permalink / raw)
To: Maurizio Lombardi, sagi, mgurtovoy, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi
In fact, commit aeacfcefa218f4ed11da478e9b7915a37d1afaff has already been applied, but the issue still occurs.
I think that when the host fails to send a request with -13, maybe only part of the command was sent, or the controller received a complete command;
the controller may then send an nvme_tcp_rsp and a C2HTermReq consecutively. I am now analyzing the controller-side SPDK processing logic.
When the host uses nvme_tcp_poll, it receives the nvme_tcp_rsp and handles it first;
the host then hits the UAF and crashes.
zhang.guanghui@cestc.cn
From: Maurizio Lombardi
Date: 2025-02-13 00:07
To: Maurizio Lombardi; zhang.guanghui@cestc.cn; sagi; mgurtovoy; kbusch; sashal; chunguang.xu
CC: linux-kernel; linux-nvme; linux-block
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request
On Wed Feb 12, 2025 at 4:33 PM CET, Maurizio Lombardi wrote:
>
> Taking a step back. Let's take a different approach and try to avoid the
> double completion.
>
> The problem here is that apparently we received a nvme_tcp_rsp capsule
> from the target, meaning that the command has been processed (I guess
> the capsule has an error status?)
>
> So maybe only part of the command has been sent?
> Why we receive the rsp capsule at all? Shouldn't this be treated as a fatal
> error by the controller?
The NVMe/TCP specification says
******
When a controller detects a fatal error, that controller shall:
1. stop processing any PDUs that arrive on the connection; and
2. send a C2HTermReq PDU
******
And indeed I see in the dmesg this:
nvme nvme2: unsupported pdu type (3)
This means the controller detected the problem and sent to the host the
C2HTermReq command. Upon receiving this command, the host is supposed to
close the connection.
Now I get it.
Zhang, do you have commit aeacfcefa218f4ed11da478e9b7915a37d1afaff in
your kernel, I guess you are missing it. Check it please.
Maurizio
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-10 7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
2025-02-10 10:01 ` Maurizio Lombardi
2025-02-12 15:33 ` Maurizio Lombardi
@ 2025-02-17 7:46 ` Sagi Grimberg
2025-02-20 8:20 ` zhang.guanghui
2025-03-07 10:10 ` Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com] zhang.guanghui
2 siblings, 2 replies; 28+ messages in thread
From: Sagi Grimberg @ 2025-02-17 7:46 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, mgurtovoy, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On 10/02/2025 9:41, zhang.guanghui@cestc.cn wrote:
> Hello
>
>
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>
>
>
> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>
>
>
>
>
> CPU1 CPU2
>
>
>
> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
>
>
>
> nvme_tcp_try_recv nvme_tcp_fail_request
>
>
>
> nvme_tcp_recv_skb nvme_tcp_end_request
>
>
>
> nvme_tcp_recv_pdu nvme_complete_rq
>
>
>
> nvme_tcp_handle_comp nvme_retry_req -- request->mq_hctx have been freed, is NULL.
>
>
>
> nvme_tcp_process_nvme_cqe
>
>
>
> nvme_complete_rq
>
>
>
> nvme_end_req
>
>
>
> blk_mq_end_request
>
>
>
>
>
>
>
> when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then the nvme_tcp_fail_request would execute。
>
>
>
> but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request. request->mq_hctx was used after free.
>
>
>
> the follow patch is to solve it.
Zhang, your email client needs fixing - it is impossible to follow your
emails.
>
>
>
> can you give some suggestions? thanks!
The problem is the C2HTerm that the host is unable to handle correctly.
And it also appears that nvme_tcp_poll() does not signal correctly to
blk-mq to stop
calling poll.
One thing to do is to handle C2HTerm PDU correctly, and, here is a
possible fix to try for the UAF issue:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c637ff04a052..0e390e98aaf9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct nvme_tcp_queue *queue = hctx->driver_data;
 	struct sock *sk = queue->sock->sk;
+	int ret;
 
 	if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
 		return 0;
@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) &&
 	    skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
-	nvme_tcp_try_recv(queue);
+	ret = nvme_tcp_try_recv(queue);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
-	return queue->nr_cqe;
+	return ret < 0 ? ret : queue->nr_cqe;
 }
 
 static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
--
Does this help?
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
2025-02-17 7:46 ` Sagi Grimberg
@ 2025-02-20 8:20 ` zhang.guanghui
2025-03-07 10:10 ` Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com] zhang.guanghui
1 sibling, 0 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-02-20 8:20 UTC (permalink / raw)
To: sagi, mgurtovoy, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi
After testing this patch, no request-send failure occurred and the issue has not been reproduced yet.
It may take a long time to test.
best wishes
zhang.guanghui@cestc.cn
From: Sagi Grimberg
Sent: 2025-02-17 15:46
To: zhang.guanghui@cestc.cn; mgurtovoy; kbusch; sashal; chunguang.xu
CC: linux-kernel; linux-nvme; linux-block
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
On 10/02/2025 9:41, zhang.guanghui@cestc.cn wrote:
> Hello
>
>
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>
>
>
> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>
>
>
>
>
> CPU1 CPU2
>
>
>
> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
>
>
>
> nvme_tcp_try_recv nvme_tcp_fail_request
>
>
>
> nvme_tcp_recv_skb nvme_tcp_end_request
>
>
>
> nvme_tcp_recv_pdu nvme_complete_rq
>
>
>
> nvme_tcp_handle_comp nvme_retry_req -- request->mq_hctx have been freed, is NULL.
>
>
>
> nvme_tcp_process_nvme_cqe
>
>
>
> nvme_complete_rq
>
>
>
> nvme_end_req
>
>
>
> blk_mq_end_request
>
>
>
>
>
>
>
> when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then the nvme_tcp_fail_request would execute。
>
>
>
> but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request. request->mq_hctx was used after free.
>
>
>
> the follow patch is to solve it.
Zhang, your email client needs fixing - it is impossible to follow your
emails.
>
>
>
> can you give some suggestions? thanks!
The problem is the C2HTerm that the host is unable to handle correctly.
And it also appears that nvme_tcp_poll() does not signal correctly to
blk-mq to stop
calling poll.
One thing to do is to handle C2HTerm PDU correctly, and, here is a
possible fix to try for the UAF issue:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c637ff04a052..0e390e98aaf9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx
*hctx, struct io_comp_batch *iob)
{
struct nvme_tcp_queue *queue = hctx->driver_data;
struct sock *sk = queue->sock->sk;
+ int ret;
if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
return 0;
@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx
*hctx, struct io_comp_batch *iob)
set_bit(NVME_TCP_Q_POLLING, &queue->flags);
if (sk_can_busy_loop(sk) &&
skb_queue_empty_lockless(&sk->sk_receive_queue))
sk_busy_loop(sk, true);
- nvme_tcp_try_recv(queue);
+ ret = nvme_tcp_try_recv(queue);
clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
- return queue->nr_cqe;
+ return ret < 0 ? ret : queue->nr_cqe;
}
static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int
size)
--
Does this help?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
2025-02-17 7:46 ` Sagi Grimberg
2025-02-20 8:20 ` zhang.guanghui
@ 2025-03-07 10:10 ` zhang.guanghui
2025-03-11 10:52 ` Maurizio Lombardi
1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-07 10:10 UTC (permalink / raw)
To: sagi, mgurtovoy, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi
After testing this patch, a request-send failure did occur; unfortunately, the issue still persists.
best wishes
zhang.guanghui@cestc.cn
From: Sagi Grimberg
Sent: 2025-02-17 15:46
To: zhang.guanghui@cestc.cn; mgurtovoy; kbusch; sashal; chunguang.xu
CC: linux-kernel; linux-nvme; linux-block
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
On 10/02/2025 9:41, zhang.guanghui@cestc.cn wrote:
> Hello
>
>
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>
>
>
> By analyzing the vmcore, we know the direct cause is that the request->mq_hctx was used after free.
>
>
>
>
>
> CPU1 CPU2
>
>
>
> nvme_tcp_poll nvme_tcp_try_send --failed to send reqrest 13
>
>
>
> nvme_tcp_try_recv nvme_tcp_fail_request
>
>
>
> nvme_tcp_recv_skb nvme_tcp_end_request
>
>
>
> nvme_tcp_recv_pdu nvme_complete_rq
>
>
>
> nvme_tcp_handle_comp nvme_retry_req -- request->mq_hctx have been freed, is NULL.
>
>
>
> nvme_tcp_process_nvme_cqe
>
>
>
> nvme_complete_rq
>
>
>
> nvme_end_req
>
>
>
> blk_mq_end_request
>
>
>
>
>
>
>
> when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then the nvme_tcp_fail_request would execute。
>
>
>
> but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request. request->mq_hctx was used after free.
>
>
>
> the follow patch is to solve it.
Zhang, your email client needs fixing - it is impossible to follow your
emails.
>
>
>
> can you give some suggestions? thanks!
The problem is the C2HTerm that the host is unable to handle correctly.
And it also appears that nvme_tcp_poll() does not signal correctly to
blk-mq to stop
calling poll.
One thing to do is to handle C2HTerm PDU correctly, and, here is a
possible fix to try for the UAF issue:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c637ff04a052..0e390e98aaf9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct nvme_tcp_queue *queue = hctx->driver_data;
 	struct sock *sk = queue->sock->sk;
+	int ret;
 
 	if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
 		return 0;
@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) &&
 	    skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
-	nvme_tcp_try_recv(queue);
+	ret = nvme_tcp_try_recv(queue);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
-	return queue->nr_cqe;
+	return ret < 0 ? ret : queue->nr_cqe;
 }
 
 static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
--
Does this help?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
2025-03-07 10:10 ` Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com] zhang.guanghui
@ 2025-03-11 10:52 ` Maurizio Lombardi
2025-03-13 1:48 ` zhang.guanghui
2025-03-28 9:24 ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
0 siblings, 2 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-03-11 10:52 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On Fri Mar 7, 2025 at 11:10 AM CET, zhang.guanghui@cestc.cn wrote:
>
> Hi
>
> After testing this patch, a send-request failure occurred; unfortunately, the issue still persists.
Maybe I am completely wrong but I am still quite convinced that the problem here
is that, despite the nvme_tcp_try_send() failure, the target sends a response capsule
for the command, leading to a double-completion in the host.
Sagi, what about taking this patch: https://lore.kernel.org/linux-nvme/20250306160322.1370300-2-mlombard@redhat.com/T/#u
and do a step further by not completing the request, leaving the error
recovery handler the task of cleaning everything up?
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 327f3f2f5399..72c1d7948386 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1328,8 +1328,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	} else if (ret < 0) {
 		dev_err(queue->ctrl->ctrl.device,
 			"failed to send request %d\n", ret);
-		nvme_tcp_fail_request(queue->request);
 		nvme_tcp_done_send_req(queue);
+		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
 	}
 out:
 	memalloc_noreclaim_restore(noreclaim_flag);
Maurizio
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
2025-03-11 10:52 ` Maurizio Lombardi
@ 2025-03-13 1:48 ` zhang.guanghui
2025-03-13 7:51 ` Hannes Reinecke
2025-03-28 9:24 ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-13 1:48 UTC (permalink / raw)
To: Maurizio Lombardi, sagi, mgurtovoy, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Yes, the problem here is that, despite the nvme_tcp_try_send() failure, the target sends a response capsule for the command, leading to a UAF in the host.
Would it be more reasonable to disable queue->rd_enabled to prevent receiving? Thanks
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index be04c5f3856d..17407eb12ad9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1203,8 +1203,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	} else if (ret < 0) {
 		dev_err(queue->ctrl->ctrl.device,
 			"failed to send request %d\n", ret);
-		nvme_tcp_fail_request(queue->request);
 		nvme_tcp_done_send_req(queue);
+		queue->rd_enabled = false;
+		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
 	}
 out:
 	memalloc_noreclaim_restore(noreclaim_flag);
zhang.guanghui@cestc.cn
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
2025-03-13 1:48 ` zhang.guanghui
@ 2025-03-13 7:51 ` Hannes Reinecke
2025-03-13 8:18 ` zhang.guanghui
2025-03-13 8:38 ` zhang.guanghui
0 siblings, 2 replies; 28+ messages in thread
From: Hannes Reinecke @ 2025-03-13 7:51 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, Maurizio Lombardi, sagi, mgurtovoy,
kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On 3/13/25 02:48, zhang.guanghui@cestc.cn wrote:
> Yes, the problem here is that, despite the nvme_tcp_try_send() failure, the target sends a response capsule for the command, leading to a UAF in the host.
>
> Is it more reasonable to disable queue->rd_enabled to prevent receiving. Thanks
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index be04c5f3856d..17407eb12ad9 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1203,8 +1203,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
> } else if (ret < 0) {
> dev_err(queue->ctrl->ctrl.device,
> "failed to send request %d\n", ret);
> - nvme_tcp_fail_request(queue->request);
> nvme_tcp_done_send_req(queue);
> + queue->rd_enabled = false;
> + nvme_tcp_error_recovery(&queue->ctrl->ctrl);
> }
> out:
> memalloc_noreclaim_restore(noreclaim_flag);
>
>
>
Hmm. In principle, yes. Problem is that network is a bi-directional
communication, and a failure on one side doesn't necessarily imply
a failure on the other.
In particular when the send side fails we should _continue_ to read
as we should be flushing the read side buffer before closing.
So I agree with starting error recovery, but not with disabling the
reading side (as we haven't encountered a read error).
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
2025-03-13 7:51 ` Hannes Reinecke
@ 2025-03-13 8:18 ` zhang.guanghui
[not found] ` <2025031316313196627826@cestc.cn>
2025-03-13 8:38 ` zhang.guanghui
1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-13 8:18 UTC (permalink / raw)
To: Hannes Reinecke, Maurizio Lombardi, sagi, mgurtovoy, kbusch,
sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi, in fact, the target may send C2HTermReq.
zhang.guanghui@cestc.cn
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com]
2025-03-13 7:51 ` Hannes Reinecke
2025-03-13 8:18 ` zhang.guanghui
@ 2025-03-13 8:38 ` zhang.guanghui
1 sibling, 0 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-03-13 8:38 UTC (permalink / raw)
To: Hannes Reinecke, Maurizio Lombardi, sagi, mgurtovoy, kbusch,
sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi,
in fact, on an nvme_tcp_try_send() failure the target may send a C2HTermReq immediately, and the host would receive the C2HTermReq and start error recovery again.
So when queue->rd_enabled is false, we can avoid starting error recovery a second time. Thanks
zhang.guanghui@cestc.cn
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
2025-03-11 10:52 ` Maurizio Lombardi
2025-03-13 1:48 ` zhang.guanghui
@ 2025-03-28 9:24 ` zhang.guanghui
2025-04-01 12:11 ` Maurizio Lombardi
1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-28 9:24 UTC (permalink / raw)
To: Maurizio Lombardi, sagi, mgurtovoy, kbusch, sashal, chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
Hi,
So far, the UAF issue has not occurred while testing this patch in the same situation.
Additionally, the patch triggers a compilation warning (nvme_tcp_fail_request is declared but not defined); please also review.
Thanks
zhang.guanghui@cestc.cn
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: nvme-tcp: fix a possible UAF when failing to send request
2025-03-28 9:24 ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
@ 2025-04-01 12:11 ` Maurizio Lombardi
0 siblings, 0 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-04-01 12:11 UTC (permalink / raw)
To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
chunguang.xu
Cc: linux-kernel, linux-nvme, linux-block
On Fri Mar 28, 2025 at 10:24 AM CET, zhang.guanghui@cestc.cn wrote:
> Hi,
>
> So far, the UAF issue have not been occcurred during testing this patch in the same situation.
> Additionally, the patch triggers a compilation warning, nvme_tcp_fail_request is declared but not defined, please also review.
>
Ok, I am going to submit a formal patch.
Thanks for testing it,
Maurizio
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2025-04-01 12:18 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-10 7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
2025-02-10 10:01 ` Maurizio Lombardi
2025-02-10 10:24 ` Max Gurtovoy
2025-02-10 11:16 ` zhang.guanghui
2025-02-10 11:34 ` Max Gurtovoy
2025-02-10 16:40 ` Maurizio Lombardi
[not found] ` <CAAO4dAWdsMjYMp9jdWXd_48aG0mTtVpRONqjJJ1scnc773tHzg@mail.gmail.com>
2025-02-11 8:04 ` zhang.guanghui
2025-02-12 8:11 ` Maurizio Lombardi
2025-02-12 8:23 ` Maurizio Lombardi
2025-02-12 8:52 ` Maurizio Lombardi
2025-02-12 9:47 ` zhang.guanghui
2025-02-12 10:28 ` Maurizio Lombardi
2025-02-12 11:14 ` Maurizio Lombardi
2025-02-12 11:47 ` Maurizio Lombardi
2025-02-12 15:33 ` Maurizio Lombardi
2025-02-12 16:07 ` Maurizio Lombardi
2025-02-13 2:04 ` zhang.guanghui
2025-02-17 7:46 ` Sagi Grimberg
2025-02-20 8:20 ` zhang.guanghui
2025-03-07 10:10 ` Re: nvme-tcp: fix a possible UAF when failing to send request [Note: mail sent on behalf of sagigrim@gmail.com] zhang.guanghui
2025-03-11 10:52 ` Maurizio Lombardi
2025-03-13 1:48 ` zhang.guanghui
2025-03-13 7:51 ` Hannes Reinecke
2025-03-13 8:18 ` zhang.guanghui
[not found] ` <2025031316313196627826@cestc.cn>
2025-03-13 9:01 ` Maurizio Lombardi
2025-03-13 8:38 ` zhang.guanghui
2025-03-28 9:24 ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
2025-04-01 12:11 ` Maurizio Lombardi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).