nvme-tcp: fix a possible UAF when failing to send request

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* nvme-tcp: fix a possible UAF when failing to send request
@ 2025-02-10  7:41 zhang.guanghui
  2025-02-10 10:01 ` Maurizio Lombardi
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-02-10  7:41 UTC (permalink / raw)
  To: sagi, mgurtovoy, kbusch, sashal, chunguang.xu, zhang.guanghui
  Cc: linux-kernel, linux-nvme, linux-block

Hello 



    When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.



By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free. 





CPU1                                                                   CPU2



nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13 



    nvme_tcp_try_recv                                                      nvme_tcp_fail_request



        nvme_tcp_recv_skb                                                      nvme_tcp_end_request



            nvme_tcp_recv_pdu                                                      nvme_complete_rq 



                nvme_tcp_handle_comp                                                   nvme_retry_req -- request->mq_hctx have been freed, is NULL.               



                    nvme_tcp_process_nvme_cqe                                                                                    



                        nvme_complete_rq



                            nvme_end_req



                                  blk_mq_end_request 







when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then  the nvme_tcp_fail_request would execute。



but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request.  request->mq_hctx was used after free. 



the follow patch is to solve it. 



can you give  some suggestions?  thanks!



diff --git a/linux/drivers/nvme/host/core.c b/linux/drivers/nvme/host/core.c



index a65b1dce8..417466674 100644



--- a/linux/drivers/nvme/host/core.c



+++ b/linux/drivers/nvme/host/core.c



@@ -288,6 +288,9 @@ static void nvme_retry_req(struct request *req)



        unsigned long delay = 0;



        u16 crd;



+       if(!req->mq_hctx && req->state == MQ_RQ_IDLE)
+               return;
+


        /* The mask and shift result must be <= 3 */



        crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;




The details are as follows



[35665.692310] nvme nvme2: failed to send request -13



[35665.692683] nvme nvme2: queue 1 failed to send request 00000000b42f4e2b state 2 pdu 00000000d7fb8da3 type 4 rq_state 1 nrq_status 0



[35665.693323] nvme nvme2: failed to send rq 00000000f86a68c3 state 2 nrq_status 370



[35665.702265] nvme nvme2: unsupported pdu type (3)



[35665.702272] BUG: kernel NULL pointer dereference, address: 0000000000000000



[35665.702542] nvme nvme2: queue 1 receive failed:  -22



[35665.703209] #PF: supervisor write access in kernel mode



[35665.703213] #PF: error_code(0x0002) - not-present page



[35665.703214] PGD 8000003801cce067 P4D 8000003801cce067 PUD 37e6f79067 PMD 0



[35665.703220] Oops: 0002 [#1] SMP PTI



[35665.703658] nvme nvme2: starting error recovery



[35665.704442] CPU: 20 PID: 815 Comm: kworker/20:1H Kdump: loaded Not tainted 5.15.131-17.cl9.x86_64 #1



[35665.705168] nvme nvme2: queue 1 receive again after receive failed



[35665.705809] Hardware name: Inspur aaabbb/YZMB-00882-104, BIOS 4.1.26 09/22/2022



[35665.705812] Workqueue: kblockd blk_mq_requeue_work



[35665.709172] RIP: 0010:_raw_spin_lock+0xc/0x30



[35665.709606] Code: 05 c3 cc cc cc cc 89 c6 e8 31 05 68 ff 66 90 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e8 02 05 68 ff 66 90 c3 cc



[35665.710470] RSP: 0000:ffffa67bcd797e08 EFLAGS: 00010246



[35665.710925] RAX: 0000000000000000 RBX: ffff92f6bbcc9840 RCX: ffff92f6bbcc9888



[35665.711393] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000



[35665.711849] RBP: 0000000000000000 R08: ffffa67bcd797e48 R09: ffff932346d576f4



[35665.712275] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000



[35665.712725] R13: ffff92f6bbcc9888 R14: 0000000000000008 R15: 0000000000000000



[35665.713158] FS:  0000000000000000(0000) GS:ffff93527d400000(0000) knlGS:0000000000000000



[35665.713603] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033



[35665.714063] CR2: 0000000000000000 CR3: 000000371aa02006 CR4: 00000000007706e0



[35665.714534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000



[35665.714961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400



[35665.715359] PKRU: 55555554



[35665.715788] Call Trace:



[35665.716201]  <TASK>



[35665.716613]  ? show_trace_log_lvl+0x1c1/0x2d9



[35665.717049]  ? show_trace_log_lvl+0x1c1/0x2d9



[35665.717457]  ? blk_mq_request_bypass_insert+0x2c/0xb0



[35665.717950]  ? __die_body.cold+0x8/0xd



[35665.718361]  ? page_fault_oops+0xac/0x140



[35665.718749]  ? blk_mq_start_request+0x30/0xf0



[35665.719144]  ? nvme_tcp_queue_rq+0xc7/0x170 [nvme_tcp]



[35665.719547]  ? exc_page_fault+0x62/0x130



[35665.719938]  ? asm_exc_page_fault+0x22/0x30



[35665.720333]  ? _raw_spin_lock+0xc/0x30



[35665.720723]  blk_mq_request_bypass_insert+0x2c/0xb0



[35665.721101]  blk_mq_requeue_work+0xa5/0x180



[35665.721451]  process_one_work+0x1e8/0x390



[35665.721809]  worker_thread+0x53/0x3d0



[35665.722159]  ? process_one_work+0x390/0x390



[35665.722501]  kthread+0x124/0x150



[35665.722849]  ? set_kthread_struct+0x50/0x50



[35665.723182]  ret_from_fork+0x1f/0x30



[35665.723508]  </TASK>




crash> struct nvme_tcp_request ffff92f6bbcc9950



struct nvme_tcp_request {



  req = {



    cmd = 0xffff92f5b83f6748,



    result = {



      u16 = 0,



      u32 = 0,



      u64 = 0



    },



    genctr = 169 '\251',



    retries = 1 '\001',



    flags = 0 '\000',



    status = 6,



    ctrl = 0xffff92f5e5df7348



  },



  pdu = 0xffff92f5b83f6740,



  queue = 0xffff92f407cc9128,



  data_len = 4096,



  pdu_len = 4096,



  pdu_sent = 0,



  h2cdata_left = 0,



  h2cdata_offset = 0,



  ttag = 62,



  status = 12,



  entry = {



    next = 0xdead000000000100,



    prev = 0xdead000000000122



  },



  lentry = {



    next = 0x0



  },



  ddgst = 0,



  curr_bio = 0xffff9324d639e240,



  iter = {



    iter_type = 2 '\002',



    nofault = false,



    data_source = true,



    iov_offset = 0,



    count = 4096,



    {



      iov = 0xffff92f6bbcc98a8,



      kvec = 0xffff92f6bbcc98a8,



      bvec = 0xffff92f6bbcc98a8,



      xarray = 0xffff92f6bbcc98a8,



      pipe = 0xffff92f6bbcc98a8



    },



    {



      nr_segs = 1,



      {



        head = 1,



        start_head = 0



      },



      xarray_start = 1



    }



  },



  offset = 0,



  data_sent = 0,



  state = NVME_TCP_SEND_DATA



}





crash> nvme_tcp_hdr.type 0xffff92f5b83f6740



  type = 4 '\004',



crash>



crash> struct request  ffff92f6bbcc9840



struct request {



  q = 0xffff92f59d55c240,



  mq_ctx = 0xffffc67bb9a1f040,



  mq_hctx = 0x0,



  cmd_flags = 33556483,



  rq_flags = 139456,



  tag = 87,



  internal_tag = -1,



  __data_len = 0,



  __sector = 66846720,



  bio = 0x0,



  biotail = 0x0,



  queuelist = {



    next = 0xffff92f6bbcc9888,



    prev = 0xffff92f6bbcc9888



  },



  {



    hash = {



      next = 0x0,



      pprev = 0x0



    },



    ipi_list = {



      next = 0x0



    }



  },



  {



    rb_node = {



      __rb_parent_color = 18446685131795018112,



      rb_right = 0x1000,



      rb_left = 0x0



    },



    special_vec = {



      bv_page = 0xffffca64841f3180,



      bv_len = 4096,



      bv_offset = 0



    },



    completion_data = 0xffffca64841f3180,



    error_count = -2078330496



  },



  {



    elv = {



      icq = 0x0,



      priv = {0xffff92f6bbcc98c8, 0xffff92f6bbcc98c8}



    },



    flush = {



      seq = 0,



      list = {



        next = 0xffff92f6bbcc98c8,



        prev = 0xffff92f6bbcc98c8



      },



      saved_end_io = 0x0



    }



  },



  rq_disk = 0xffff92f6bdbff600,



  part = 0xffff92f59f557800,



  start_time_ns = 35665692557229,



  io_start_time_ns = 35665692566268,



  wbt_flags = 0,



  stats_sectors = 0,



  nr_phys_segments = 1,



  nr_integrity_segments = 0,



  write_hint = 0,



  ioprio = 0,



  state = MQ_RQ_IDLE,



  ref = {



    refs = {



      counter = 0



    }



  },



  timeout = 180000,



  deadline = 4330512774,



  {



    csd = {



      node = {



        llist = {



          next = 0x0



        },



        {



          u_flags = 0,



          a_flags = {



            counter = 0



          }



        },



        src = 0,



        dst = 0



      },



      func = 0xffffffff8eed62d0 <__blk_mq_complete_request_remote>,



      info = 0xffff92f6bbcc9840



    },



    fifo_time = 0



  },



  end_io = 0x0,



  end_io_data = 0x0



}



Best regards



zhang.guanghui@cestc.cn



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10  7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
@ 2025-02-10 10:01 ` Maurizio Lombardi
  2025-02-10 10:24   ` Max Gurtovoy
  2025-02-12 15:33 ` Maurizio Lombardi
  2025-02-17  7:46 ` Sagi Grimberg
  2 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-10 10:01 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
> Hello 
>
>

I guess you have to fix your mail client.

>
>     When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free. 
>
>
> CPU1                                                                   CPU2
>
> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13

This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()
Personally, I would try to fix it inside the nvme-tcp driver without
touching the core functions.

Maybe nvme_tcp_poll should just ensure that io_work completes before
calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
from getting rescheduled by the nvme_tcp_data_ready() callback.


Maurizio


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10 10:01 ` Maurizio Lombardi
@ 2025-02-10 10:24   ` Max Gurtovoy
  2025-02-10 11:16     ` zhang.guanghui
  2025-02-10 16:40     ` Maurizio Lombardi
  0 siblings, 2 replies; 28+ messages in thread
From: Max Gurtovoy @ 2025-02-10 10:24 UTC (permalink / raw)
  To: Maurizio Lombardi, zhang.guanghui@cestc.cn, sagi, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block


On 10/02/2025 12:01, Maurizio Lombardi wrote:
> On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
>> Hello
>>
>>
> I guess you have to fix your mail client.
>
>>      When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free.
>>
>>
>> CPU1                                                                   CPU2
>>
>> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13
> This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()
> Personally, I would try to fix it inside the nvme-tcp driver without
> touching the core functions.
>
> Maybe nvme_tcp_poll should just ensure that io_work completes before
> calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
> from getting rescheduled by the nvme_tcp_data_ready() callback.
>
>
> Maurizio

It seems to me that the HOST_PATH_ERROR handling can be improved in 
nvme-tcp.

In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in 
case we fail to submit a command..

can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req), 
NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in 
nvme-rdma for host path error handling ?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10 10:24   ` Max Gurtovoy
@ 2025-02-10 11:16     ` zhang.guanghui
  2025-02-10 11:34       ` Max Gurtovoy
  2025-02-10 16:40     ` Maurizio Lombardi
  1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-02-10 11:16 UTC (permalink / raw)
  To: mgurtovoy, Maurizio Lombardi, sagi, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

Hi， 



     Thank you for your reply.


     In nvme-rdma we use nvme_host_path_error(rq) ,   the prerequisites are -EIO,  ignore this condition? 
 in nvme-tcp this judgment condition is different,  use the function nvme_host_path_error , 
it should be possible to go to nvme_complete_rq ->  nvme_retry_req -- request->mq_hctx have been freed, is NULL.            




zhang.guanghui@cestc.cn



 



From: Max Gurtovoy



Date: 2025-02-10 18:24



To: Maurizio Lombardi; zhang.guanghui@cestc.cn; sagi; kbusch; sashal; chunguang.xu



CC: linux-kernel; linux-nvme; linux-block



Subject: Re: nvme-tcp: fix a possible UAF when failing to send request



 



On 10/02/2025 12:01, Maurizio Lombardi wrote:



> On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:



>> Hello



>>



>>



> I guess you have to fix your mail client.



>



>>      When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.



>> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free.



>>



>>



>> CPU1                                                                   CPU2



>>



>> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13



> This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()



> Personally, I would try to fix it inside the nvme-tcp driver without



> touching the core functions.



>



> Maybe nvme_tcp_poll should just ensure that io_work completes before



> calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work



> from getting rescheduled by the nvme_tcp_data_ready() callback.



>



>



> Maurizio



 



It seems to me that the HOST_PATH_ERROR handling can be improved in



nvme-tcp.



 



In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in



case we fail to submit a command..



 



can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),



NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in



nvme-rdma for host path error handling ?



 



 



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10 11:16     ` zhang.guanghui
@ 2025-02-10 11:34       ` Max Gurtovoy
  0 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2025-02-10 11:34 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, Maurizio Lombardi, sagi, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

hi

On 10/02/2025 13:16, zhang.guanghui@cestc.cn wrote:
> Hi，
>
>
>
>       Thank you for your reply.
>
>
>       In nvme-rdma we use nvme_host_path_error(rq) ,   the prerequisites are -EIO,  ignore this condition?
>   in nvme-tcp this judgment condition is different,  use the function nvme_host_path_error ,
> it should be possible to go to nvme_complete_rq ->  nvme_retry_req -- request->mq_hctx have been freed, is NULL.

did you try the proposal or are you assuming it will be NULL ?


>
>
>
>
> zhang.guanghui@cestc.cn
>
>
>
>   
>
>
>
> From: Max Gurtovoy
>
>
>
> Date: 2025-02-10 18:24
>
>
>
> To: Maurizio Lombardi; zhang.guanghui@cestc.cn; sagi; kbusch; sashal; chunguang.xu
>
>
>
> CC: linux-kernel; linux-nvme; linux-block
>
>
>
> Subject: Re: nvme-tcp: fix a possible UAF when failing to send request
>
>
>
>   
>
>
>
> On 10/02/2025 12:01, Maurizio Lombardi wrote:
>
>
>
>> On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
>
>
>>> Hello
>
>
>
>
>
>
>> I guess you have to fix your mail client.
>
>
>
>
>>>        When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>
>
>>> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free.
>
>
>
>
>
>
>>> CPU1                                                                   CPU2
>
>
>
>
>>> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13
>
>
>> This simply looks like a race condition between nvme_tcp_poll() and nvme_tcp_try_send()
>
>
>> Personally, I would try to fix it inside the nvme-tcp driver without
>
>
>> touching the core functions.
>
>
>
>
>> Maybe nvme_tcp_poll should just ensure that io_work completes before
>
>
>> calling nvme_tcp_try_recv(), the POLLING flag should then prevent io_work
>
>
>> from getting rescheduled by the nvme_tcp_data_ready() callback.
>
>
>
>
>
>
>> Maurizio
>
>
>   
>
>
>
> It seems to me that the HOST_PATH_ERROR handling can be improved in
>
>
>
> nvme-tcp.
>
>
>
>   
>
>
>
> In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in
>
>
>
> case we fail to submit a command..
>
>
>
>   
>
>
>
> can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),
>
>
>
> NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in
>
>
>
> nvme-rdma for host path error handling ?
>
>
>
>   
>
>
>
>   
>
>
>
>   
>
>
>
>   
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10 10:24   ` Max Gurtovoy
  2025-02-10 11:16     ` zhang.guanghui
@ 2025-02-10 16:40     ` Maurizio Lombardi
       [not found]       ` <CAAO4dAWdsMjYMp9jdWXd_48aG0mTtVpRONqjJJ1scnc773tHzg@mail.gmail.com>
  1 sibling, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-10 16:40 UTC (permalink / raw)
  To: Max Gurtovoy, zhang.guanghui@cestc.cn, sagi, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Mon Feb 10, 2025 at 11:24 AM CET, Max Gurtovoy wrote:
>
> It seems to me that the HOST_PATH_ERROR handling can be improved in 
> nvme-tcp.
>
> In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in 
> case we fail to submit a command..
>
> can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req), 
> NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in 
> nvme-rdma for host path error handling ?

Yes, I could try to prepare a patch.

In any case, I think the main issue here is that nvme_tcp_poll()
should be prevented from racing against io_work... and I also think
there is a possible race condition if nvme_tcp_poll() races against
the controller resetting code.

Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

[parent not found: <CAAO4dAWdsMjYMp9jdWXd_48aG0mTtVpRONqjJJ1scnc773tHzg@mail.gmail.com>]

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
       [not found]       ` <CAAO4dAWdsMjYMp9jdWXd_48aG0mTtVpRONqjJJ1scnc773tHzg@mail.gmail.com>
@ 2025-02-11  8:04         ` zhang.guanghui
  2025-02-12  8:11           ` Maurizio Lombardi
  0 siblings, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-02-11  8:04 UTC (permalink / raw)
  To: chunguang.xu, Maurizio Lombardi
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

Hi 



    This is a  race issue,  I can't reproduce it stably yet. I have not tested the latest kernel.  but in fact,  I've synced some nvme-tcp patches from  lastest upstream, 



But in fact, the issue still exists. I review nvme-tcp code and found that this issues may exist in nvme_tcp_poll processing. 







zhang.guanghui@cestc.cn



 



From: Xu Chunguang



Date: 2025-02-11 15:20



To: Maurizio Lombardi



CC: Max Gurtovoy; zhang.guanghui; sagi; kbusch; sashal; linux-kernel; linux-nvme; linux-block



Subject: Re: nvme-tcp: fix a possible UAF when failing to send request







Hi: 







Does  you have tested the latest kernel, can it  reproduce the same issue？







Maurizio Lombardi <mlombard@bsdbackstore.eu> 于 2025年2月11日周二 00:40写道：





On Mon Feb 10, 2025 at 11:24 AM CET, Max Gurtovoy wrote:



>



> It seems to me that the HOST_PATH_ERROR handling can be improved in



> nvme-tcp.



>



> In nvme-rdma we use nvme_host_path_error(rq) and nvme_cleanup_cmd(rq) in



> case we fail to submit a command..



>



> can you try to replacing nvme_tcp_end_request(blk_mq_rq_from_pdu(req),



> NVME_SC_HOST_PATH_ERROR); call with the similar logic we use in



> nvme-rdma for host path error handling ?





Yes, I could try to prepare a patch.





In any case, I think the main issue here is that nvme_tcp_poll()



should be prevented from racing against io_work... and I also think



there is a possible race condition if nvme_tcp_poll() races against



the controller resetting code.





Maurizio



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-11  8:04         ` zhang.guanghui
@ 2025-02-12  8:11           ` Maurizio Lombardi
  2025-02-12  8:23             ` Maurizio Lombardi
  2025-02-12  8:52             ` Maurizio Lombardi
  0 siblings, 2 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12  8:11 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

On Tue Feb 11, 2025 at 9:04 AM CET, zhang.guanghui@cestc.cn wrote:
> Hi 
>
>     This is a  race issue,  I can't reproduce it stably yet. I have not tested the latest kernel.  but in fact,  I've synced some nvme-tcp patches from  lastest upstream,

Hello, could you try this patch?

queue_lock should protect against concurrent "error recovery",
while send_mutex should serialize try_recv() and try_send(), emulating
the way io_work works.
Concurrent calls to try_recv() should already be protected by
sock_lock.

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 841238f38fdd..f464de04ff4d 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2653,16 +2653,24 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct nvme_tcp_queue *queue = hctx->driver_data;
 	struct sock *sk = queue->sock->sk;
+	int r = 0;
 
+	mutex_lock(&queue->queue_lock);
 	if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
-		return 0;
+		goto out;
 
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
+
+	mutex_lock(&queue->send_mutex);
 	nvme_tcp_try_recv(queue);
+	r = queue->nr_cqe;
+	mutex_unlock(&queue->send_mutex);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
-	return queue->nr_cqe;
+out:
+	mutex_unlock(&queue->queue_lock);
+	return r;
 }
 
 static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)


Thanks,
Maurizio

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12  8:11           ` Maurizio Lombardi
@ 2025-02-12  8:23             ` Maurizio Lombardi
  2025-02-12  8:52             ` Maurizio Lombardi
  1 sibling, 0 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12  8:23 UTC (permalink / raw)
  To: Maurizio Lombardi, zhang.guanghui@cestc.cn, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

On Wed Feb 12, 2025 at 9:11 AM CET, Maurizio Lombardi wrote:
> Hello, could you try this patch?
>
> Concurrent calls to try_recv() should already be protected by
> sock_lock.
>
> +	mutex_lock(&queue->send_mutex);
>  	nvme_tcp_try_recv(queue);
> +	r = queue->nr_cqe;
> +	mutex_unlock(&queue->send_mutex);

Well, reading nr_cqe like this is still racy, but should be a minor
issue and not hard to fix.

Maurizio


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12  8:11           ` Maurizio Lombardi
  2025-02-12  8:23             ` Maurizio Lombardi
@ 2025-02-12  8:52             ` Maurizio Lombardi
  2025-02-12  9:47               ` zhang.guanghui
  1 sibling, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12  8:52 UTC (permalink / raw)
  To: Maurizio Lombardi, zhang.guanghui@cestc.cn, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

On Wed Feb 12, 2025 at 9:11 AM CET, Maurizio Lombardi wrote:
> On Tue Feb 11, 2025 at 9:04 AM CET, zhang.guanghui@cestc.cn wrote:
>> Hi 
>>
>>     This is a  race issue,  I can't reproduce it stably yet. I have not tested the latest kernel.  but in fact,  I've synced some nvme-tcp patches from  lastest upstream,
>
> Hello, could you try this patch?
>
> queue_lock should protect against concurrent "error recovery",
> +	mutex_lock(&queue->queue_lock);

Unfortunately I've just realized that queue_lock won't save us
from the race against the controller reset, it's still possible
we lock a destroyed mutex. So just try this
simplified patch, I will try to figure out something else:

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 841238f38fdd..b714e1691c30 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2660,7 +2660,10 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
+
+	mutex_lock(&queue->send_mutex);
 	nvme_tcp_try_recv(queue);
+	mutex_unlock(&queue->send_mutex);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	return queue->nr_cqe;
 }

Maurizio

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12  8:52             ` Maurizio Lombardi
@ 2025-02-12  9:47               ` zhang.guanghui
  2025-02-12 10:28                 ` Maurizio Lombardi
  0 siblings, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-02-12  9:47 UTC (permalink / raw)
  To: Maurizio Lombardi, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

    Hi, Thanks.
    I will test this patch, but I am worried whether it will affect the performance.
Should we also consider null pointer protection?


zhang.guanghui@cestc.cn



From: Maurizio Lombardi



Date: 2025-02-12 16:52



To: Maurizio Lombardi; zhang.guanghui@cestc.cn; chunguang.xu



CC: mgurtovoy; sagi; kbusch; sashal; linux-kernel; linux-nvme; linux-block



Subject: Re: nvme-tcp: fix a possible UAF when failing to send request



On Wed Feb 12, 2025 at 9:11 AM CET, Maurizio Lombardi wrote:



> On Tue Feb 11, 2025 at 9:04 AM CET, zhang.guanghui@cestc.cn wrote:



>> Hi 



>>



>>     This is a  race issue,  I can't reproduce it stably yet. I have not tested the latest kernel.  but in fact,  I've synced some nvme-tcp patches from  lastest upstream,



>



> Hello, could you try this patch?



>



> queue_lock should protect against concurrent "error recovery",



> + mutex_lock(&queue->queue_lock);



 



Unfortunately I've just realized that queue_lock won't save us



from the race against the controller reset, it's still possible



we lock a destroyed mutex. So just try this



simplified patch, I will try to figure out something else:



 



diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



index 841238f38fdd..b714e1691c30 100644



--- a/drivers/nvme/host/tcp.c



+++ b/drivers/nvme/host/tcp.c



@@ -2660,7 +2660,10 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)



set_bit(NVME_TCP_Q_POLLING, &queue->flags);



if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))



sk_busy_loop(sk, true);



+



+ mutex_lock(&queue->send_mutex);



nvme_tcp_try_recv(queue);



+ mutex_unlock(&queue->send_mutex);



clear_bit(NVME_TCP_Q_POLLING, &queue->flags);



return queue->nr_cqe;



}



 



Maurizio



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12  9:47               ` zhang.guanghui
@ 2025-02-12 10:28                 ` Maurizio Lombardi
  2025-02-12 11:14                   ` Maurizio Lombardi
  0 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 10:28 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

On Wed Feb 12, 2025 at 10:47 AM CET, zhang.guanghui@cestc.cn wrote:
>     Hi, Thanks.
>     I will test this patch, but I am worried whether it will affect the performance.
> Should we also consider null pointer protection?

Yes, it will likely affect the performance, just check if it works.

Probably it could be optimized by just protecting
nvme_tcp_fail_request(), which AFAICT is the only function in the
nvme_tcp_try_send() code that calls nvme_complete_rq().

Maurizio


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12 10:28                 ` Maurizio Lombardi
@ 2025-02-12 11:14                   ` Maurizio Lombardi
  2025-02-12 11:47                     ` Maurizio Lombardi
  0 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 11:14 UTC (permalink / raw)
  To: Maurizio Lombardi, zhang.guanghui@cestc.cn, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

On Wed Feb 12, 2025 at 11:28 AM CET, Maurizio Lombardi wrote:
> On Wed Feb 12, 2025 at 10:47 AM CET, zhang.guanghui@cestc.cn wrote:
>>     Hi, Thanks.
>>     I will test this patch, but I am worried whether it will affect the performance.
>> Should we also consider null pointer protection?
>
> Yes, it will likely affect the performance, just check if it works.
>
> Probably it could be optimized by just protecting
> nvme_tcp_fail_request(), which AFAICT is the only function in the
> nvme_tcp_try_send() code that calls nvme_complete_rq().

Something like that, maybe, not tested:

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 841238f38fdd..488edec35a65 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -146,6 +146,7 @@ struct nvme_tcp_queue {
 
 	struct mutex		queue_lock;
 	struct mutex		send_mutex;
+	struct mutex		poll_mutex;
 	struct llist_head	req_list;
 	struct list_head	send_list;
 
@@ -1259,7 +1260,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	} else if (ret < 0) {
 		dev_err(queue->ctrl->ctrl.device,
 			"failed to send request %d\n", ret);
+		mutex_lock(&queue->poll_mutex);
 		nvme_tcp_fail_request(queue->request);
+		mutex_unlock(&queue->poll_mutex);
 		nvme_tcp_done_send_req(queue);
 	}
 out:
@@ -1397,6 +1400,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	kfree(queue->pdu);
 	mutex_destroy(&queue->send_mutex);
 	mutex_destroy(&queue->queue_lock);
+	mutex_destroy(&queue->poll_mutex);
 }
 
 static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
@@ -1710,6 +1714,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 	init_llist_head(&queue->req_list);
 	INIT_LIST_HEAD(&queue->send_list);
 	mutex_init(&queue->send_mutex);
+	mutex_init(&queue->poll_mutex);
 	INIT_WORK(&queue->io_work, nvme_tcp_io_work);
 
 	if (qid > 0)
@@ -2660,7 +2665,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
+	mutex_lock(&queue->poll_mutex);
 	nvme_tcp_try_recv(queue);
+	mutex_unlock(&queue->poll_mutex);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	return queue->nr_cqe;
 }

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12 11:14                   ` Maurizio Lombardi
@ 2025-02-12 11:47                     ` Maurizio Lombardi
  0 siblings, 0 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 11:47 UTC (permalink / raw)
  To: Maurizio Lombardi, zhang.guanghui@cestc.cn, chunguang.xu
  Cc: mgurtovoy, sagi, kbusch, sashal, linux-kernel, linux-nvme,
	linux-block

On Wed Feb 12, 2025 at 12:14 PM CET, Maurizio Lombardi wrote:
> On Wed Feb 12, 2025 at 11:28 AM CET, Maurizio Lombardi wrote:
>> On Wed Feb 12, 2025 at 10:47 AM CET, zhang.guanghui@cestc.cn wrote:
>>>     Hi, Thanks.
>>>     I will test this patch, but I am worried whether it will affect the performance.
>>> Should we also consider null pointer protection?
>>
>> Yes, it will likely affect the performance, just check if it works.
>>
>> Probably it could be optimized by just protecting
>> nvme_tcp_fail_request(), which AFAICT is the only function in the
>> nvme_tcp_try_send() code that calls nvme_complete_rq().
>
> Something like that, maybe, not tested:

Ah wait, this won't fix anything because it will end up with a double
completion.
Ok I am not sure how to fix this, someone else maybe has better ideas.

Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10  7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
  2025-02-10 10:01 ` Maurizio Lombardi
@ 2025-02-12 15:33 ` Maurizio Lombardi
  2025-02-12 16:07   ` Maurizio Lombardi
  2025-02-17  7:46 ` Sagi Grimberg
  2 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 15:33 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@cestc.cn wrote:
> Hello 
>
>     When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free. 
>
> CPU1                                                                   CPU2
>
> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13 
>
>     nvme_tcp_try_recv                                                      nvme_tcp_fail_request
>
>         nvme_tcp_recv_skb                                                      nvme_tcp_end_request
>
>             nvme_tcp_recv_pdu                                                      nvme_complete_rq 
>
>                 nvme_tcp_handle_comp                                                   nvme_retry_req -- request->mq_hctx have been freed, is NULL.               
>                     nvme_tcp_process_nvme_cqe                                                                                    
>
>                         nvme_complete_rq
>
>                             nvme_end_req
>
>                                   blk_mq_end_request

Taking a step back. Let's take a different approach and try to avoid the
double completion.

The problem here is that apparently we received a nvme_tcp_rsp capsule
from the target, meaning that the command has been processed (I guess
the capsule has an error status?)

So maybe only part of the command has been sent?
Why we receive the rsp capsule at all? Shouldn't this be treated as a fatal
error by the controller?

Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12 15:33 ` Maurizio Lombardi
@ 2025-02-12 16:07   ` Maurizio Lombardi
  2025-02-13  2:04     ` zhang.guanghui
  0 siblings, 1 reply; 28+ messages in thread
From: Maurizio Lombardi @ 2025-02-12 16:07 UTC (permalink / raw)
  To: Maurizio Lombardi, zhang.guanghui@cestc.cn, sagi, mgurtovoy,
	kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Wed Feb 12, 2025 at 4:33 PM CET, Maurizio Lombardi wrote:
>
> Taking a step back. Let's take a different approach and try to avoid the
> double completion.
>
> The problem here is that apparently we received a nvme_tcp_rsp capsule
> from the target, meaning that the command has been processed (I guess
> the capsule has an error status?)
>
> So maybe only part of the command has been sent?
> Why we receive the rsp capsule at all? Shouldn't this be treated as a fatal
> error by the controller?

The NVMe/TCP specification says

******
When a controller detects a fatal error, that controller shall:
  1. stop processing any PDUs that arrive on the connection; and
  2. send a C2HTermReq PDU
******

And indeed I see in the dmesg this:

nvme nvme2: unsupported pdu type (3)

This means the controller detected the problem and sent to the host the
C2HTermReq command. Upon receiving this command, the host is supposed to
close the connection.

Now I get it.

Zhang, do you have commit aeacfcefa218f4ed11da478e9b7915a37d1afaff in
your kernel, I guess you are missing it. Check it please.

Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-12 16:07   ` Maurizio Lombardi
@ 2025-02-13  2:04     ` zhang.guanghui
  0 siblings, 0 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-02-13  2:04 UTC (permalink / raw)
  To: Maurizio Lombardi, sagi, mgurtovoy, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

Hi 
     int fact,  the commit aeacfcefa218f4ed11da478e9b7915a37d1afaff  has been synchronized, but the issue still occurs.

    I think so, when the host failing to send request -13,   maybe only part of the command has been sent or  the controller has receved  a complete command.
but the controller may send nvme_tcp_rsp  and  C2HTermReq consecutively.  Now I am analyzing the controller spdk processing logic. 
when the host uses nvme_tcp_poll,  it receives nvme_tcp_rsp and handles it firstly,    
then the host  have a UAF and crashed.   

zhang.guanghui@cestc.cn

From: Maurizio Lombardi

Date: 2025-02-13 00:07

To: Maurizio Lombardi; zhang.guanghui@cestc.cn; sagi; mgurtovoy; kbusch; sashal; chunguang.xu

CC: linux-kernel; linux-nvme; linux-block

Subject: Re: nvme-tcp: fix a possible UAF when failing to send request

On Wed Feb 12, 2025 at 4:33 PM CET, Maurizio Lombardi wrote:

>

> Taking a step back. Let's take a different approach and try to avoid the

> double completion.

>

> The problem here is that apparently we received a nvme_tcp_rsp capsule

> from the target, meaning that the command has been processed (I guess

> the capsule has an error status?)

>

> So maybe only part of the command has been sent?

> Why we receive the rsp capsule at all? Shouldn't this be treated as a fatal

> error by the controller?  

--

> The NVMe/TCP specification says

******

When a controller detects a fatal error, that controller shall:

  1. stop processing any PDUs that arrive on the connection; and

  2. send a C2HTermReq PDU

******

And indeed I see in the dmesg this:

nvme nvme2: unsupported pdu type (3)

This means the controller detected the problem and sent to the host the

C2HTermReq command. Upon receiving this command, the host is supposed to

close the connection.

Now I get it.

Zhang, do you have commit aeacfcefa218f4ed11da478e9b7915a37d1afaff in

your kernel, I guess you are missing it. Check it please.

Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-10  7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
  2025-02-10 10:01 ` Maurizio Lombardi
  2025-02-12 15:33 ` Maurizio Lombardi
@ 2025-02-17  7:46 ` Sagi Grimberg
  2025-02-20  8:20   ` zhang.guanghui
  2025-03-07 10:10   ` Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】 zhang.guanghui
  2 siblings, 2 replies; 28+ messages in thread
From: Sagi Grimberg @ 2025-02-17  7:46 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, mgurtovoy, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block




On 10/02/2025 9:41, zhang.guanghui@cestc.cn wrote:
> Hello
>
>
>
>      When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.
>
>
>
> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free.
>
>
>
>
>
> CPU1                                                                   CPU2
>
>
>
> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13
>
>
>
>      nvme_tcp_try_recv                                                      nvme_tcp_fail_request
>
>
>
>          nvme_tcp_recv_skb                                                      nvme_tcp_end_request
>
>
>
>              nvme_tcp_recv_pdu                                                      nvme_complete_rq
>
>
>
>                  nvme_tcp_handle_comp                                                   nvme_retry_req -- request->mq_hctx have been freed, is NULL.
>
>
>
>                      nvme_tcp_process_nvme_cqe
>
>
>
>                          nvme_complete_rq
>
>
>
>                              nvme_end_req
>
>
>
>                                    blk_mq_end_request
>
>
>
>
>
>
>
> when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then  the nvme_tcp_fail_request would execute。
>
>
>
> but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request.  request->mq_hctx was used after free.
>
>
>
> the follow patch is to solve it.

Zhang, your email client needs fixing - it is impossible to follow your 
emails.

>
>
>
> can you give  some suggestions?  thanks!

The problem is the C2HTerm that the host is unable to handle correctly.
And it also appears that nvme_tcp_poll() does not signal correctly to 
blk-mq to stop
calling poll.

One thing to do is to handle C2HTerm PDU correctly, and, here is a 
possible fix to try for the UAF issue:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c637ff04a052..0e390e98aaf9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx 
*hctx, struct io_comp_batch *iob)
  {
         struct nvme_tcp_queue *queue = hctx->driver_data;
         struct sock *sk = queue->sock->sk;
+       int ret;

         if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
                 return 0;
@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx 
*hctx, struct io_comp_batch *iob)
         set_bit(NVME_TCP_Q_POLLING, &queue->flags);
         if (sk_can_busy_loop(sk) && 
skb_queue_empty_lockless(&sk->sk_receive_queue))
                 sk_busy_loop(sk, true);
-       nvme_tcp_try_recv(queue);
+       ret = nvme_tcp_try_recv(queue);
         clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
-       return queue->nr_cqe;
+       return ret < 0 ? ret : queue->nr_cqe;
  }

  static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int 
size)
--

Does this help?

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-02-17  7:46 ` Sagi Grimberg
@ 2025-02-20  8:20   ` zhang.guanghui
  2025-03-07 10:10   ` Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】 zhang.guanghui
  1 sibling, 0 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-02-20  8:20 UTC (permalink / raw)
  To: sagi, mgurtovoy, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block


Hi 
        After testing this patch,  No sending request failure occurred, the issue has not been reproduced yet.  
It may take a long time to test.

best wishes



zhang.guanghui@cestc.cn



 



发件人： Sagi Grimberg



发送时间： 2025-02-17 15:46



收件人： zhang.guanghui@cestc.cn; mgurtovoy; kbusch; sashal; chunguang.xu



抄送： linux-kernel; linux-nvme; linux-block



主题： Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】



 



 



 



On 10/02/2025 9:41, zhang.guanghui@cestc.cn wrote:



> Hello



>



>



>



>      When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.



>



>



>



> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free.



>



>



>



>



>



> CPU1                                                                   CPU2



>



>



>



> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13



>



>



>



>      nvme_tcp_try_recv                                                      nvme_tcp_fail_request



>



>



>



>          nvme_tcp_recv_skb                                                      nvme_tcp_end_request



>



>



>



>              nvme_tcp_recv_pdu                                                      nvme_complete_rq



>



>



>



>                  nvme_tcp_handle_comp                                                   nvme_retry_req -- request->mq_hctx have been freed, is NULL.



>



>



>



>                      nvme_tcp_process_nvme_cqe



>



>



>



>                          nvme_complete_rq



>



>



>



>                              nvme_end_req



>



>



>



>                                    blk_mq_end_request



>



>



>



>



>



>



>



> when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then  the nvme_tcp_fail_request would execute。



>



>



>



> but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request.  request->mq_hctx was used after free.



>



>



>



> the follow patch is to solve it.



 



Zhang, your email client needs fixing - it is impossible to follow your



emails.



 



>



>



>



> can you give  some suggestions?  thanks!



 



The problem is the C2HTerm that the host is unable to handle correctly.



And it also appears that nvme_tcp_poll() does not signal correctly to



blk-mq to stop



calling poll.



 



One thing to do is to handle C2HTerm PDU correctly, and, here is a



possible fix to try for the UAF issue:



--



diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



index c637ff04a052..0e390e98aaf9 100644



--- a/drivers/nvme/host/tcp.c



+++ b/drivers/nvme/host/tcp.c



@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx



*hctx, struct io_comp_batch *iob)



 {



        struct nvme_tcp_queue *queue = hctx->driver_data;



        struct sock *sk = queue->sock->sk;



+       int ret;



 



        if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))



                return 0;



@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx



*hctx, struct io_comp_batch *iob)



        set_bit(NVME_TCP_Q_POLLING, &queue->flags);



        if (sk_can_busy_loop(sk) &&



skb_queue_empty_lockless(&sk->sk_receive_queue))



                sk_busy_loop(sk, true);



-       nvme_tcp_try_recv(queue);



+       ret = nvme_tcp_try_recv(queue);



        clear_bit(NVME_TCP_Q_POLLING, &queue->flags);



-       return queue->nr_cqe;



+       return ret < 0 ? ret : queue->nr_cqe;



 }



 



 static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int



size)



--



 



Does this help?



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
  2025-02-17  7:46 ` Sagi Grimberg
  2025-02-20  8:20   ` zhang.guanghui
@ 2025-03-07 10:10   ` zhang.guanghui
  2025-03-11 10:52     ` Maurizio Lombardi
  1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-07 10:10 UTC (permalink / raw)
  To: sagi, mgurtovoy, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block


Hi 

        After testing this patch,   sending request failure occurred,   unfortunately, the issue still persists.  


best wishes




zhang.guanghui@cestc.cn



 



发件人： Sagi Grimberg



发送时间： 2025-02-17 15:46



收件人： zhang.guanghui@cestc.cn; mgurtovoy; kbusch; sashal; chunguang.xu



抄送： linux-kernel; linux-nvme; linux-block



主题： Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】



 



 



 



On 10/02/2025 9:41, zhang.guanghui@cestc.cn wrote:



> Hello



>



>



>



>      When using the nvme-tcp driver in a storage cluster, the driver may trigger a null pointer causing the host to crash several times.



>



>



>



> By analyzing the vmcore, we know the direct cause is that  the request->mq_hctx was used after free.



>



>



>



>



>



> CPU1                                                                   CPU2



>



>



>



> nvme_tcp_poll                                                          nvme_tcp_try_send  --failed to send reqrest 13



>



>



>



>      nvme_tcp_try_recv                                                      nvme_tcp_fail_request



>



>



>



>          nvme_tcp_recv_skb                                                      nvme_tcp_end_request



>



>



>



>              nvme_tcp_recv_pdu                                                      nvme_complete_rq



>



>



>



>                  nvme_tcp_handle_comp                                                   nvme_retry_req -- request->mq_hctx have been freed, is NULL.



>



>



>



>                      nvme_tcp_process_nvme_cqe



>



>



>



>                          nvme_complete_rq



>



>



>



>                              nvme_end_req



>



>



>



>                                    blk_mq_end_request



>



>



>



>



>



>



>



> when nvme_tcp_try_send failed to send reqrest 13, it maybe be resulted by selinux or other reasons, this is a problem. then  the nvme_tcp_fail_request would execute。



>



>



>



> but the nvme_tcp_recv_pdu may have received the responding pdu and the nvme_tcp_process_nvme_cqe would have completed the request.  request->mq_hctx was used after free.



>



>



>



> the follow patch is to solve it.



 



Zhang, your email client needs fixing - it is impossible to follow your



emails.



 



>



>



>



> can you give  some suggestions?  thanks!



 



The problem is the C2HTerm that the host is unable to handle correctly.



And it also appears that nvme_tcp_poll() does not signal correctly to



blk-mq to stop



calling poll.



 



One thing to do is to handle C2HTerm PDU correctly, and, here is a



possible fix to try for the UAF issue:



--



diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



index c637ff04a052..0e390e98aaf9 100644



--- a/drivers/nvme/host/tcp.c



+++ b/drivers/nvme/host/tcp.c



@@ -2673,6 +2673,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx



*hctx, struct io_comp_batch *iob)



 {



        struct nvme_tcp_queue *queue = hctx->driver_data;



        struct sock *sk = queue->sock->sk;



+       int ret;



 



        if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))



                return 0;



@@ -2680,9 +2681,9 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx



*hctx, struct io_comp_batch *iob)



        set_bit(NVME_TCP_Q_POLLING, &queue->flags);



        if (sk_can_busy_loop(sk) &&



skb_queue_empty_lockless(&sk->sk_receive_queue))



                sk_busy_loop(sk, true);



-       nvme_tcp_try_recv(queue);



+       ret = nvme_tcp_try_recv(queue);



        clear_bit(NVME_TCP_Q_POLLING, &queue->flags);



-       return queue->nr_cqe;



+       return ret < 0 ? ret : queue->nr_cqe;



 }



 



 static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int



size)



--



 



Does this help?



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
  2025-03-07 10:10   ` Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】 zhang.guanghui
@ 2025-03-11 10:52     ` Maurizio Lombardi
  2025-03-13  1:48       ` zhang.guanghui
  2025-03-28  9:24       ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
  0 siblings, 2 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-03-11 10:52 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Fri Mar 7, 2025 at 11:10 AM CET, zhang.guanghui@cestc.cn wrote:
>
> Hi 
>
>         After testing this patch,   sending request failure occurred,   unfortunately, the issue still persists. 

Maybe I am completely wrong but I am still quite convinced that the problem here
is that, despite the nvme_tcp_try_send() failure, the target sends a response capsule
for the command, leading to a double-completion in the host.

Sagi, what about taking this patch: https://lore.kernel.org/linux-nvme/20250306160322.1370300-2-mlombard@redhat.com/T/#u
and do a step further by not completing the request, leaving the error
recovery handler the task of cleaning everything up?


diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 327f3f2f5399..72c1d7948386 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1328,8 +1328,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	} else if (ret < 0) {
 		dev_err(queue->ctrl->ctrl.device,
 			"failed to send request %d\n", ret);
-		nvme_tcp_fail_request(queue->request);
 		nvme_tcp_done_send_req(queue);
+		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
 	}
 out:
 	memalloc_noreclaim_restore(noreclaim_flag);


Maurizio

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
  2025-03-11 10:52     ` Maurizio Lombardi
@ 2025-03-13  1:48       ` zhang.guanghui
  2025-03-13  7:51         ` Hannes Reinecke
  2025-03-28  9:24       ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
  1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-13  1:48 UTC (permalink / raw)
  To: Maurizio Lombardi, sagi, mgurtovoy, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

Yes, the problem here is that,  despite the nvme_tcp_try_send() failure, the target sends a response capsule for the command, leading to a UAF in the host. 

Is it more reasonable to disable queue->rd_enabled to prevent receiving. Thanks 
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index be04c5f3856d..17407eb12ad9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1203,8 +1203,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
        } else if (ret < 0) {
                dev_err(queue->ctrl->ctrl.device,
                        "failed to send request %d\n", ret);
-               nvme_tcp_fail_request(queue->request);
                nvme_tcp_done_send_req(queue);
+              queue->rd_enabled = false;
+              nvme_tcp_error_recovery(&queue->ctrl->ctrl);
        }
 out:
        memalloc_noreclaim_restore(noreclaim_flag);




zhang.guanghui@cestc.cn



发件人： Maurizio Lombardi



发送时间： 2025-03-11 18:52



收件人： zhang.guanghui@cestc.cn; sagi; mgurtovoy; kbusch; sashal; chunguang.xu



抄送： linux-kernel; linux-nvme; linux-block



主题： Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】



On Fri Mar 7, 2025 at 11:10 AM CET, zhang.guanghui@cestc.cn wrote:



>



> Hi



>



>         After testing this patch,   sending request failure occurred,   unfortunately, the issue still persists.



 



Maybe I am completely wrong but I am still quite convinced that the problem here



is that, despite the nvme_tcp_try_send() failure, the target sends a response capsule



for the command, leading to a double-completion in the host.



 



Sagi, what about taking this patch: https://lore.kernel.org/linux-nvme/20250306160322.1370300-2-mlombard@redhat.com/T/#u



and do a step further by not completing the request, leaving the error



recovery handler the task of cleaning everything up?



 



 



diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



index 327f3f2f5399..72c1d7948386 100644



--- a/drivers/nvme/host/tcp.c



+++ b/drivers/nvme/host/tcp.c



@@ -1328,8 +1328,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)



} else if (ret < 0) {



dev_err(queue->ctrl->ctrl.device,



"failed to send request %d\n", ret);



- nvme_tcp_fail_request(queue->request);



nvme_tcp_done_send_req(queue);



+ nvme_tcp_error_recovery(&queue->ctrl->ctrl);



}



out:



memalloc_noreclaim_restore(noreclaim_flag);



 



 



Maurizio



 



 



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
  2025-03-13  1:48       ` zhang.guanghui
@ 2025-03-13  7:51         ` Hannes Reinecke
  2025-03-13  8:18           ` zhang.guanghui
  2025-03-13  8:38           ` zhang.guanghui
  0 siblings, 2 replies; 28+ messages in thread
From: Hannes Reinecke @ 2025-03-13  7:51 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, Maurizio Lombardi, sagi, mgurtovoy,
	kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On 3/13/25 02:48, zhang.guanghui@cestc.cn wrote:
> Yes, the problem here is that,  despite the nvme_tcp_try_send() failure, the target sends a response capsule for the command, leading to a UAF in the host.
> 
> Is it more reasonable to disable queue->rd_enabled to prevent receiving. Thanks
>   
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index be04c5f3856d..17407eb12ad9 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1203,8 +1203,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
>          } else if (ret < 0) {
>                  dev_err(queue->ctrl->ctrl.device,
>                          "failed to send request %d\n", ret);
> -               nvme_tcp_fail_request(queue->request);
>                  nvme_tcp_done_send_req(queue);
> +              queue->rd_enabled = false;
> +              nvme_tcp_error_recovery(&queue->ctrl->ctrl);
>          }
>   out:
>          memalloc_noreclaim_restore(noreclaim_flag);
> 
> 
> 
Hmm. In principle, yes. Problem is that network is a bi-directional 
communication, and a failure on one side doesn't necessarily imply
a failure on the other.
In particular when the send side fails we should _continue_ to read
as we should be flushing the read side buffer before closing.

So I agree with starting error recovery, but not with disabling the 
reading side (as we haven't encountered a read error).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
  2025-03-13  7:51         ` Hannes Reinecke
@ 2025-03-13  8:18           ` zhang.guanghui
       [not found]             ` <2025031316313196627826@cestc.cn>
  2025-03-13  8:38           ` zhang.guanghui
  1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-13  8:18 UTC (permalink / raw)
  To: Hannes Reinecke, Maurizio Lombardi, sagi, mgurtovoy, kbusch,
	sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

Hi,  in fact, the target may send C2HTermReq.




zhang.guanghui@cestc.cn



 



发件人： Hannes Reinecke



发送时间： 2025-03-13 15:51



收件人： zhang.guanghui@cestc.cn; Maurizio Lombardi; sagi; mgurtovoy; kbusch; sashal; chunguang.xu



抄送： linux-kernel; linux-nvme; linux-block



主题： Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】



On 3/13/25 02:48, zhang.guanghui@cestc.cn wrote:



> Yes, the problem here is that,  despite the nvme_tcp_try_send() failure, the target sends a response capsule for the command, leading to a UAF in the host.



>



> Is it more reasonable to disable queue->rd_enabled to prevent receiving. Thanks



>  



> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



> index be04c5f3856d..17407eb12ad9 100644



> --- a/drivers/nvme/host/tcp.c



> +++ b/drivers/nvme/host/tcp.c



> @@ -1203,8 +1203,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)



>          } else if (ret < 0) {



>                  dev_err(queue->ctrl->ctrl.device,



>                          "failed to send request %d\n", ret);



> -               nvme_tcp_fail_request(queue->request);



>                  nvme_tcp_done_send_req(queue);



> +              queue->rd_enabled = false;



> +              nvme_tcp_error_recovery(&queue->ctrl->ctrl);



>          }



>   out:



>          memalloc_noreclaim_restore(noreclaim_flag);



>



>



>



Hmm. In principle, yes. Problem is that network is a bi-directional



communication, and a failure on one side doesn't necessarily imply



a failure on the other.



In particular when the send side fails we should _continue_ to read



as we should be flushing the read side buffer before closing.



 



So I agree with starting error recovery, but not with disabling the



reading side (as we haven't encountered a read error).



 



Cheers,



 



Hannes



--



Dr. Hannes Reinecke                  Kernel Storage Architect



hare@suse.de                                +49 911 74053 688



SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg



HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

[parent not found: <2025031316313196627826@cestc.cn>]

* Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
       [not found]             ` <2025031316313196627826@cestc.cn>
@ 2025-03-13  9:01               ` Maurizio Lombardi
  0 siblings, 0 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-03-13  9:01 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, Hannes Reinecke, sagi, mgurtovoy, kbusch,
	sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Thu Mar 13, 2025 at 9:31 AM CET, zhang.guanghui@cestc.cn wrote:
> Hi,
>   in fact, the nvme_tcp_try_send() failure, the target may send C2HTermReq immediately. while the host receives the C2HTermReq and still starting error recovery.  
>  so when queue->rd_enabled is false,  can avoid starting error recovery agagin.


Not all targets send C2HTermReq (for example, the Linux target doesn't
at the moment) so you can't rely on that.
In any case, calling nvme_tcp_error_recovery() twice is harmless;
the first call moves the controller to the resetting state, the second
call is ignored.

Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】
  2025-03-13  7:51         ` Hannes Reinecke
  2025-03-13  8:18           ` zhang.guanghui
@ 2025-03-13  8:38           ` zhang.guanghui
  1 sibling, 0 replies; 28+ messages in thread
From: zhang.guanghui @ 2025-03-13  8:38 UTC (permalink / raw)
  To: Hannes Reinecke, Maurizio Lombardi, sagi, mgurtovoy, kbusch,
	sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

Hi,



  in fact, the nvme_tcp_try_send() failure, the target may send C2HTermReq immediately. while the host receives the C2HTermReq and still starting error recovery. 



so when queue->rd_enabled is false,  can avoid starting error recovery agagin. Thanks





zhang.guanghui@cestc.cn



 



发件人： Hannes Reinecke



发送时间： 2025-03-13 15:51



收件人： zhang.guanghui@cestc.cn; Maurizio Lombardi; sagi; mgurtovoy; kbusch; sashal; chunguang.xu



抄送： linux-kernel; linux-nvme; linux-block



主题： Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】



On 3/13/25 02:48, zhang.guanghui@cestc.cn wrote:



> Yes, the problem here is that,  despite the nvme_tcp_try_send() failure, the target sends a response capsule for the command, leading to a UAF in the host.



>



> Is it more reasonable to disable queue->rd_enabled to prevent receiving. Thanks



>  



> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



> index be04c5f3856d..17407eb12ad9 100644



> --- a/drivers/nvme/host/tcp.c



> +++ b/drivers/nvme/host/tcp.c



> @@ -1203,8 +1203,9 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)



>          } else if (ret < 0) {



>                  dev_err(queue->ctrl->ctrl.device,



>                          "failed to send request %d\n", ret);



> -               nvme_tcp_fail_request(queue->request);



>                  nvme_tcp_done_send_req(queue);



> +              queue->rd_enabled = false;



> +              nvme_tcp_error_recovery(&queue->ctrl->ctrl);



>          }



>   out:



>          memalloc_noreclaim_restore(noreclaim_flag);



>



>



>



Hmm. In principle, yes. Problem is that network is a bi-directional



communication, and a failure on one side doesn't necessarily imply



a failure on the other.



In particular when the send side fails we should _continue_ to read



as we should be flushing the read side buffer before closing.



 



So I agree with starting error recovery, but not with disabling the



reading side (as we haven't encountered a read error).



 



Cheers,



 



Hannes



--



Dr. Hannes Reinecke                  Kernel Storage Architect



hare@suse.de                                +49 911 74053 688



SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg



HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-03-11 10:52     ` Maurizio Lombardi
  2025-03-13  1:48       ` zhang.guanghui
@ 2025-03-28  9:24       ` zhang.guanghui
  2025-04-01 12:11         ` Maurizio Lombardi
  1 sibling, 1 reply; 28+ messages in thread
From: zhang.guanghui @ 2025-03-28  9:24 UTC (permalink / raw)
  To: Maurizio Lombardi, sagi, mgurtovoy, kbusch, sashal, chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

Hi, 

      So far,  the UAF issue have not been occcurred during testing this patch in the same situation. 
Additionally, the patch triggers a compilation warning,  nvme_tcp_fail_request is declared but not defined, please also review.  

Thanks 




zhang.guanghui@cestc.cn



 



发件人： Maurizio Lombardi



发送时间： 2025-03-11 18:52



收件人： zhang.guanghui@cestc.cn; sagi; mgurtovoy; kbusch; sashal; chunguang.xu



抄送： linux-kernel; linux-nvme; linux-block



主题： Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】



On Fri Mar 7, 2025 at 11:10 AM CET, zhang.guanghui@cestc.cn wrote:



>



> Hi



>



>         After testing this patch,   sending request failure occurred,   unfortunately, the issue still persists.



 



Maybe I am completely wrong but I am still quite convinced that the problem here



is that, despite the nvme_tcp_try_send() failure, the target sends a response capsule



for the command, leading to a double-completion in the host.



 



Sagi, what about taking this patch: https://lore.kernel.org/linux-nvme/20250306160322.1370300-2-mlombard@redhat.com/T/#u



and do a step further by not completing the request, leaving the error



recovery handler the task of cleaning everything up?



 



 



diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c



index 327f3f2f5399..72c1d7948386 100644



--- a/drivers/nvme/host/tcp.c



+++ b/drivers/nvme/host/tcp.c



@@ -1328,8 +1328,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)



} else if (ret < 0) {



dev_err(queue->ctrl->ctrl.device,



"failed to send request %d\n", ret);



- nvme_tcp_fail_request(queue->request);



nvme_tcp_done_send_req(queue);



+ nvme_tcp_error_recovery(&queue->ctrl->ctrl);



}



out:



memalloc_noreclaim_restore(noreclaim_flag);



 



 



Maurizio



 



 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: nvme-tcp: fix a possible UAF when failing to send request
  2025-03-28  9:24       ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
@ 2025-04-01 12:11         ` Maurizio Lombardi
  0 siblings, 0 replies; 28+ messages in thread
From: Maurizio Lombardi @ 2025-04-01 12:11 UTC (permalink / raw)
  To: zhang.guanghui@cestc.cn, sagi, mgurtovoy, kbusch, sashal,
	chunguang.xu
  Cc: linux-kernel, linux-nvme, linux-block

On Fri Mar 28, 2025 at 10:24 AM CET, zhang.guanghui@cestc.cn wrote:
> Hi, 
>
>       So far,  the UAF issue have not been occcurred during testing this patch in the same situation. 
> Additionally, the patch triggers a compilation warning,  nvme_tcp_fail_request is declared but not defined, please also review.  
>

Ok, I am going to submit a formal patch.

Thanks for testing it,
Maurizio

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-04-01 12:18 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-10  7:41 nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
2025-02-10 10:01 ` Maurizio Lombardi
2025-02-10 10:24   ` Max Gurtovoy
2025-02-10 11:16     ` zhang.guanghui
2025-02-10 11:34       ` Max Gurtovoy
2025-02-10 16:40     ` Maurizio Lombardi
     [not found]       ` <CAAO4dAWdsMjYMp9jdWXd_48aG0mTtVpRONqjJJ1scnc773tHzg@mail.gmail.com>
2025-02-11  8:04         ` zhang.guanghui
2025-02-12  8:11           ` Maurizio Lombardi
2025-02-12  8:23             ` Maurizio Lombardi
2025-02-12  8:52             ` Maurizio Lombardi
2025-02-12  9:47               ` zhang.guanghui
2025-02-12 10:28                 ` Maurizio Lombardi
2025-02-12 11:14                   ` Maurizio Lombardi
2025-02-12 11:47                     ` Maurizio Lombardi
2025-02-12 15:33 ` Maurizio Lombardi
2025-02-12 16:07   ` Maurizio Lombardi
2025-02-13  2:04     ` zhang.guanghui
2025-02-17  7:46 ` Sagi Grimberg
2025-02-20  8:20   ` zhang.guanghui
2025-03-07 10:10   ` Re: nvme-tcp: fix a possible UAF when failing to send request【请注意，邮件由sagigrim@gmail.com代发】 zhang.guanghui
2025-03-11 10:52     ` Maurizio Lombardi
2025-03-13  1:48       ` zhang.guanghui
2025-03-13  7:51         ` Hannes Reinecke
2025-03-13  8:18           ` zhang.guanghui
     [not found]             ` <2025031316313196627826@cestc.cn>
2025-03-13  9:01               ` Maurizio Lombardi
2025-03-13  8:38           ` zhang.guanghui
2025-03-28  9:24       ` Re: nvme-tcp: fix a possible UAF when failing to send request zhang.guanghui
2025-04-01 12:11         ` Maurizio Lombardi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).