Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: James Smart <james.smart@broadcom.com>
To: Sagi Grimberg <sagi@grimberg.me>, linux-nvme@lists.infradead.org
Cc: Israel Rukshin <israelr@mellanox.com>,
	stable@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
	Max Gurtovoy <maxg@mellanox.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] nvme: Revert: Fix controller creation races with teardown flow
Date: Tue, 1 Sep 2020 08:39:01 -0700	[thread overview]
Message-ID: <f7d0c08f-2a34-fffa-7962-d0641bc579fd@broadcom.com> (raw)
In-Reply-To: <4aaad97e-03b5-5c22-af8e-ae7624e78991@grimberg.me>



On 8/31/2020 4:15 PM, Sagi Grimberg wrote:
>
>>>>> This is indeed a regression.
>>>>>
>>>>> Perhaps we should also revert:
>>>>> 12a0b6622107 ("nvme: don't hold nvmf_transports_rwsem for more 
>>>>> than transport lookups")
>>>>>
>>>>> Which inherently caused this by removing the serialization of
>>>>> .create_ctrl()...
>>>>
>>>> no, I believe the patch on the semaphore is correct. Otherwise - 
>>>> things can be blocked a long time.. a minute (1 cmd timeout) or 
>>>> even multiple minutes in the case where a command failure in core 
>>>> layers effectively gets ignored and thus doesn't cause the error 
>>>> path in the transport. There can be multiple /dev/nvme-fabrics 
>>>> commands stacked up that can make the delays look much longer to 
>>>> the last guy.
>>>>
>>>> as far as creation vs teardown... yeah, not fun, but there are 
>>>> other ways to deal with it. FC: I got rid of the separate 
>>>> create/reconnect threads a while ago thus the 
>>>> return-control-while-reconnecting behavior, so I've had to deal 
>>>> with it.  It's one area it'd be nice to see some convergence in 
>>>> implementation again between transports.
>>>
>>> Doesn't fc have a bug there? in create_ctrl after flushing the
>>> connect_work, what is telling it if delete is running in with it
>>> (or that it already ran...)
>>
>> I guess I don't understand what the bug is you are thinking about. 
>> Maybe there's a short period that the ctrl ptr is perhaps freed, thus 
>> the pointer shouldn't be used - but I don't see it as almost 
>> everything is simply looking at  the value of the pointer, not 
>> dereferencing it.
>
> I'm referring to nvme_fc_init_ctrl, if delete happens while it
> is waiting in flush_delayed_work(&ctrl->connect_work); won't you
> dereference and return a controller that is possibly already
> deleted/freed?

ok - that matches my "short period" and it is possible as there's one 
immediate printf that may dereference the ptr. After that, it's 
comparisons of the pointer value.  I can move the printf to avoid the 
issue.  That window's rather small.


>
>> I do have a bug or two  with delete association fighting with 
>> create_association - but it's mainly due to nvme_fc_error_recovery 
>> not the delete routine. I've reworked this area after seeing your 
>> other patches and will be posting after some more testing.  But no 
>> reason for synchronizing all ctrl creates.
>
> Is it that big of an issue? it should fail rather quickly shouldn't it?

not sure what you are asking.   if it's how long to fail the creation of 
a new association - it's at least 60s (an admin command timeout).

-- james


  reply	other threads:[~2020-09-01 16:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28 19:01 [PATCH] nvme: Revert: Fix controller creation races with teardown flow James Smart
2020-08-28 19:08 ` Sagi Grimberg
2020-08-28 20:24   ` James Smart
2020-08-28 23:59     ` Sagi Grimberg
2020-08-31 22:26       ` James Smart
2020-08-31 23:15         ` Sagi Grimberg
2020-09-01 15:39           ` James Smart [this message]
2020-09-01 22:01             ` Sagi Grimberg
2020-09-02  0:01               ` James Smart
2020-09-08 17:47 ` Christoph Hellwig
2020-09-08 17:47   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2020-08-28 19:01 James Smart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f7d0c08f-2a34-fffa-7962-d0641bc579fd@broadcom.com \
    --to=james.smart@broadcom.com \
    --cc=hch@lst.de \
    --cc=israelr@mellanox.com \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox