From: Sagi Grimberg <sagi@grimberg.me>
To: Hannes Reinecke <hare@suse.de>, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>,
linux-nvme@lists.infradead.org,
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Subject: Re: [PATCH 1/2] nvmet-rdma: avoid circular locking dependency on install_queue()
Date: Mon, 4 Dec 2023 13:57:32 +0200 [thread overview]
Message-ID: <9ca33803-e2ca-4550-8d6e-1e64219fb3d9@grimberg.me> (raw)
In-Reply-To: <c261fe62-e8e6-4470-b9e2-3047fd4c3d99@suse.de>
On 12/4/23 13:49, Hannes Reinecke wrote:
> On 12/4/23 11:19, Sagi Grimberg wrote:
>>
>>
>> On 11/20/23 15:48, Sagi Grimberg wrote:
>>>
>>>>> According to 777dc82395de ("nvmet-rdma: occasionally flush ongoing
>>>>> controller teardown") this is just for reducing the memory footprint.
>>>>> Wonder if we need to bother, and whether it won't be better to remove
>>>>> the whole thing entirely.
>>>>
>>>> Well, Sagi added it, so I'll let him chime in on the usefulness.
>>>
>>> While I don't like having nvmet arbitrarily replying busy and instead
>>> have lockdep simply just accept that its not a deadlock here, but we can
>>> simply just sidetrack it as proposed I guess.
>>>
>>> But Hannes, this is on the other extreme.. Now every connect that nvmet
>>> gets, if there is even a single queue that is disconnecting (global
>>> scope), then the host is denied. Lets give it a sane backlog.
>>> We use rdma_listen backlog of 128, so maybe stick with this magic
>>> number... This way we are busy only if more than 128 queues are tearing
>>> down to prevent the memory footprint from exploding, and hopefully it is
>>> rare enough that the normal host does not see an arbitrary busy
>>> rejection.
>>>
>>> Same comment for nvmet-tcp.
>>
>> Hey Hannes, anything happened with this one?
>>
>> Overall I think that the approach is fine, but I do think
>> that we need a backlog for it.
>
> Hmm. The main issue here is the call to 'flush_workqueue()', which
> triggers the circular lock warning. So a ratelimit would only help
> us so much; the real issue is to get rid of the flush_workqueue()
> thingie.
>
> What I can to is to add this:
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 4cc27856aa8f..72bcc54701a0 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -2119,8 +2119,20 @@ static u16 nvmet_tcp_install_queue(struct
> nvmet_sq *sq)
> container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>
> if (sq->qid == 0) {
> + struct nvmet_tcp_queue *q;
> + int pending = 0;
> +
> /* Let inflight controller teardown complete */
> - flush_workqueue(nvmet_wq);
> + mutex_lock(&nvmet_tcp_queue_mutex);
> + list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
> + if (q->nvme_sq.ctrl == sq->ctrl &&
> + q->state == NVMET_TCP_Q_DISCONNECTING)
> + pending++;
> + }
> + mutex_unlock(&nvmet_tcp_queue_mutex);
> + /* Retry for pending controller teardown */
> + if (pending)
> + return NVME_SC_CONNECT_CTRL_BUSY;
> }
>
> which then would only affect the controller we're connecting to.
> Hmm?
Still I think we should give a reasonable backlog, no reason to limit
this as we may hit this more often than we'd like and the sole purpose
here is to avoid memory overrun.
next prev parent reply other threads:[~2023-12-04 11:57 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-02 14:19 [PATCHv2 0/2] nvmet: avoid circular locking warning Hannes Reinecke
2023-11-02 14:19 ` [PATCH 1/2] nvmet-rdma: avoid circular locking dependency on install_queue() Hannes Reinecke
2023-11-03 8:23 ` Christoph Hellwig
2023-11-03 8:53 ` Hannes Reinecke
2023-11-03 9:19 ` Christoph Hellwig
2023-11-03 11:58 ` Hannes Reinecke
2023-11-03 14:05 ` Christoph Hellwig
2023-11-20 13:48 ` Sagi Grimberg
2023-12-04 10:19 ` Sagi Grimberg
2023-12-04 11:49 ` Hannes Reinecke
2023-12-04 11:57 ` Sagi Grimberg [this message]
2023-12-04 12:31 ` Hannes Reinecke
2023-12-04 12:46 ` Sagi Grimberg
2023-12-07 5:54 ` Shinichiro Kawasaki
2023-12-07 12:17 ` Sagi Grimberg
2023-11-02 14:19 ` [PATCH 2/2] nvmet-tcp: " Hannes Reinecke
-- strict thread matches above, loose matches on Subject: below --
2023-11-01 10:32 [PATCH 0/2] nvmet: avoid circular locking warning Hannes Reinecke
2023-11-01 10:32 ` [PATCH 1/2] nvmet-rdma: avoid circular locking dependency on install_queue() Hannes Reinecke
2023-11-01 16:21 ` Jens Axboe
2023-11-01 17:28 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9ca33803-e2ca-4550-8d6e-1e64219fb3d9@grimberg.me \
--to=sagi@grimberg.me \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=shinichiro.kawasaki@wdc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox