From: James Smart <james.smart@broadcom.com>
To: Sagi Grimberg <sagi@grimberg.me>, linux-nvme@lists.infradead.org
Cc: Israel Rukshin <israelr@mellanox.com>,
stable@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
Max Gurtovoy <maxg@mellanox.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] nvme: Revert: Fix controller creation races with teardown flow
Date: Fri, 28 Aug 2020 13:24:10 -0700 [thread overview]
Message-ID: <a73cd06b-b319-d55f-1465-4263e08900ae@broadcom.com> (raw)
In-Reply-To: <0867c437-1521-c0c9-d7fa-6a615d88105a@grimberg.me>
On 8/28/2020 12:08 PM, Sagi Grimberg wrote:
>
>> The indicated patch introduced a barrier in the sysfs_delete attribute
>> for the controller that rejects the request if the controller isn't
>> created. "Created" is defined as at least 1 call to nvme_start_ctrl().
>>
>> This is problematic in error-injection testing. If an error occurs on
>> the initial attempt to create an association and the controller enters
>> reconnect(s) attempts, the admin cannot delete the controller until
>> either there is a successful association created or ctrl_loss_tmo
>> times out.
>>
>> Where this issue is particularly hurtful is when the "admin" is the
>> nvme-cli, it is performing a connection to a discovery controller, and
>> it is initiated via auto-connect scripts. With the FC transport, if the
>> first connection attempt fails, the controller enters a normal reconnect
>> state but returns control to the cli thread that created the controller.
>> In this scenario, the cli attempts to read the discovery log via ioctl,
>> which fails, causing the cli to see it as an empty log and then proceeds
>> to delete the discovery controller. The delete is rejected and the
>> controller is left live. If the discovery controller reconnect then
>> succeeds, there is no action to delete it, and it sits live doing
>> nothing.
>
> This is indeed a regression.
>
> Perhaps we should also revert:
> 12a0b6622107 ("nvme: don't hold nvmf_transports_rwsem for more than
> transport lookups")
>
> Which inherently caused this by removing the serialization of
> .create_ctrl()...
no, I believe the patch on the semaphore is correct. Otherwise - things
can be blocked a long time.. a minute (1 cmd timeout) or even multiple
minutes in the case where a command failure in core layers effectively
gets ignored and thus doesn't cause the error path in the transport.
There can be multiple /dev/nvme-fabrics commands stacked up that can
make the delays look much longer to the last guy.
as far as creation vs teardown... yeah, not fun, but there are other
ways to deal with it. FC: I got rid of the separate create/reconnect
threads a while ago thus the return-control-while-reconnecting behavior,
so I've had to deal with it. It's one area it'd be nice to see some
convergence in implementation again between transports.
-- james
next prev parent reply other threads:[~2020-08-28 20:24 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-28 19:01 [PATCH] nvme: Revert: Fix controller creation races with teardown flow James Smart
2020-08-28 19:08 ` Sagi Grimberg
2020-08-28 20:24 ` James Smart [this message]
2020-08-28 23:59 ` Sagi Grimberg
2020-08-31 22:26 ` James Smart
2020-08-31 23:15 ` Sagi Grimberg
2020-09-01 15:39 ` James Smart
2020-09-01 22:01 ` Sagi Grimberg
2020-09-02 0:01 ` James Smart
2020-09-08 17:47 ` Christoph Hellwig
2020-09-08 17:47 ` Christoph Hellwig
-- strict thread matches above, loose matches on Subject: below --
2020-08-28 19:01 James Smart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a73cd06b-b319-d55f-1465-4263e08900ae@broadcom.com \
--to=james.smart@broadcom.com \
--cc=hch@lst.de \
--cc=israelr@mellanox.com \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=maxg@mellanox.com \
--cc=sagi@grimberg.me \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox