From: James Smart <jsmart2021@gmail.com>
To: Sagi Grimberg <sagi@grimberg.me>, Ruozhu Li <liruozhu@huawei.com>,
linux-nvme@lists.infradead.org
Subject: Re: [PATCH 1/1] nvme: fix use after free when disconnect a reconnecting ctrl
Date: Thu, 4 Nov 2021 16:23:09 -0700
Message-ID: <f6f05614-1070-1dc5-6adc-bd642f21d5e5@gmail.com>
In-Reply-To: <8165ac91-3ed1-f0c4-16d3-7e6741a610fb@grimberg.me>
On 11/4/2021 5:26 AM, Sagi Grimberg wrote:
>
>> A crash happens when I try to disconnect a reconnecting ctrl:
>>
>> 1) The network was cut off just after the connection was established;
>> scan work hung there waiting for some I/Os to complete. Those I/Os
>> were being retried because we return BLK_STS_RESOURCE to blk while
>> reconnecting.
>>
>> 2) After a while, I tried to disconnect this connection. This
>> procedure also hung because it tried to obtain ctrl->scan_lock. Note
>> that by this point the controller state had already switched to
>> NVME_CTRL_DELETING.
>>
>> 3) In nvme_check_ready(), we always return true when ctrl->state is
>> NVME_CTRL_DELETING, so those retried I/Os were issued to the bottom
>> device, which had already been freed.
>>
>> To fix this, when ctrl->state is NVME_CTRL_DELETING, issue the command
>> to the bottom device only when the queue state is live. If not, return
>> a host path error to blk.
>>
>> Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
>> ---
>> drivers/nvme/host/core.c | 1 +
>> drivers/nvme/host/nvme.h | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 838b5e2058be..752203ad7639 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -666,6 +666,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
>> struct request *rq)
>> {
>> if (ctrl->state != NVME_CTRL_DELETING_NOIO &&
>> + ctrl->state != NVME_CTRL_DELETING &&
>
> Please explain why you need this change? As the name suggests, only
> DELETING_NOIO does not accept I/O, and if we return BLK_STS_RESOURCE
> we can get into an endless loop of resubmission.
Before the change below (if fabrics and DELETING, return queue_live),
fabrics would always have returned true when DELETING and never called
the nvme_fail_nonready_command() routine.
But with the change, we now have DELETING cases where qlive is false
calling this routine. It's possible some of those would have returned
BLK_STS_RESOURCE and gotten into the endless loop. The !DELETING check
keeps the same behavior as before while forcing the new DELETING
requests to return a host path error.
I think the change is ok.
>
>> ctrl->state != NVME_CTRL_DEAD &&
>> !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
>> !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index b334af8aa264..9b095ee01364 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -709,7 +709,7 @@ static inline bool nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
>> return true;
>> if (ctrl->ops->flags & NVME_F_FABRICS &&
>> ctrl->state == NVME_CTRL_DELETING)
>> - return true;
>> + return queue_live;
>
> I agree with this change. I thought I've already seen this change from
> James in the past.
>
This new test was added when nvmf_check_ready() was moved into
nvme_check_ready(), as fabrics needs to do GET/SET_PROPERTIES for
register access on shutdown (CC, CSTS) whereas PCI doesn't. So it kept
the fabrics unconditional "return true" to let those commands through.
It's fine to qualify it on whether the transport has the queue live.
-- james
Thread overview: 12+ messages
2021-11-04 7:13 [PATCH 0/1] fix UAF when disconnect a reconnecting state ctrl Ruozhu Li
2021-11-04 7:13 ` [PATCH 1/1] nvme: fix use after free when disconnect a reconnecting ctrl Ruozhu Li
2021-11-04 12:26 ` Sagi Grimberg
2021-11-04 23:23 ` James Smart [this message]
2021-11-05 1:55 ` liruozhu
2021-11-05 1:34 ` liruozhu
2021-11-13 10:04 ` liruozhu
2021-11-14 10:20 ` Sagi Grimberg
2021-11-11 4:09 ` liruozhu
2021-11-25 3:20 ` liruozhu
2021-12-07 12:45 ` liruozhu
2021-12-07 17:23 ` Christoph Hellwig