From: Sagi Grimberg <sagi@grimberg.me>
To: James Smart <james.smart@broadcom.com>, linux-nvme@lists.infradead.org
Cc: Keith Busch <kbusch@kernel.org>,
Israel Rukshin <israelr@mellanox.com>,
Christoph Hellwig <hch@lst.de>,
stable@vger.kernel.org, Max Gurtovoy <maxg@mellanox.com>
Subject: Re: [PATCH] nvme: Revert: Fix controller creation races with teardown flow
Date: Mon, 31 Aug 2020 16:15:22 -0700 [thread overview]
Message-ID: <4aaad97e-03b5-5c22-af8e-ae7624e78991@grimberg.me> (raw)
In-Reply-To: <7f43e9db-974a-5e98-76a6-ed2f3bd0dc92@broadcom.com>
>>>> This is indeed a regression.
>>>>
>>>> Perhaps we should also revert:
>>>> 12a0b6622107 ("nvme: don't hold nvmf_transports_rwsem for more than
>>>> transport lookups")
>>>>
>>>> Which inherently caused this by removing the serialization of
>>>> .create_ctrl()...
>>>
>>> no, I believe the patch on the semaphore is correct. Otherwise -
>>> things can be blocked a long time.. a minute (1 cmd timeout) or even
>>> multiple minutes in the case where a command failure in core layers
>>> effectively gets ignored and thus doesn't cause the error path in the
>>> transport. There can be multiple /dev/nvme-fabrics commands stacked
>>> up that can make the delays look much longer to the last guy.
>>>
>>> as far as creation vs teardown... yeah, not fun, but there are other
>>> ways to deal with it. FC: I got rid of the separate create/reconnect
>>> threads a while ago thus the return-control-while-reconnecting
>>> behavior, so I've had to deal with it. It's one area it'd be nice to
>>> see some convergence in implementation again between transports.
>>
>> Doesn't fc have a bug there? in create_ctrl after flushing the
>> connect_work, what is telling it if delete is running in with it
>> (or that it already ran...)
>
> I guess I don't understand what the bug is you are thinking about. Maybe
> there's a short period that the ctrl ptr is perhaps freed, thus the
> pointer shouldn't be used - but I don't see it as almost everything is
> simply looking at the value of the pointer, not dereferencing it.
I'm referring to nvme_fc_init_ctrl, if delete happens while it
is waiting in flush_delayed_work(&ctrl->connect_work); won't you
dereference and return a controller that is possibly already
deleted/freed?
> I do have a bug or two with delete association fighting with
> create_association - but it's mainly due to nvme_fc_error_recovery not
> the delete routine. I've reworked this area after seeing your other
> patches and will be posting after some more testing. But no reason for
> synchronizing all ctrl creates.
Is it that big of an issue? it should fail rather quickly shouldn't it?
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2020-08-31 23:15 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-28 19:01 [PATCH] nvme: Revert: Fix controller creation races with teardown flow James Smart
2020-08-28 19:08 ` Sagi Grimberg
2020-08-28 20:24 ` James Smart
2020-08-28 23:59 ` Sagi Grimberg
2020-08-31 22:26 ` James Smart
2020-08-31 23:15 ` Sagi Grimberg [this message]
2020-09-01 15:39 ` James Smart
2020-09-01 22:01 ` Sagi Grimberg
2020-09-02 0:01 ` James Smart
2020-09-08 17:47 ` Christoph Hellwig
2020-09-08 17:47 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4aaad97e-03b5-5c22-af8e-ae7624e78991@grimberg.me \
--to=sagi@grimberg.me \
--cc=hch@lst.de \
--cc=israelr@mellanox.com \
--cc=james.smart@broadcom.com \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=maxg@mellanox.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).