From: Ming Lei <ming.lei@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Chao Leng <lengchao@huawei.com>
Subject: Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
Date: Mon, 27 Jul 2020 10:08:03 +0800
Message-ID: <20200727020803.GC1129253@T590>
In-Reply-To: <9ac5f658-31b3-bb19-e5fe-385a629a7d67@grimberg.me>

On Sun, Jul 26, 2020 at 09:27:56AM -0700, Sagi Grimberg wrote:
> 
> > > +void blk_mq_quiesce_queue_async(struct request_queue *q)
> > > +{
> > > +	struct blk_mq_hw_ctx *hctx;
> > > +	unsigned int i;
> > > +
> > > +	blk_mq_quiesce_queue_nowait(q);
> > > +
> > > +	queue_for_each_hw_ctx(q, hctx, i) {
> > > +		init_completion(&hctx->rcu_sync.completion);
> > > +		init_rcu_head(&hctx->rcu_sync.head);
> > > +		if (hctx->flags & BLK_MQ_F_BLOCKING)
> > > +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
> > > +				wakeme_after_rcu);
> > > +		else
> > > +			call_rcu(&hctx->rcu_sync.head,
> > > +				wakeme_after_rcu);
> > > +	}
> > 
> > Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
> > synchronize_rcu() is OK for all hctx during waiting.
> 
> That's true, but I want a single interface for both. v2 had exactly
> that, but I decided that this approach is better.

I am not sure a new interface is needed; one simple way is to:

1) call blk_mq_quiesce_queue_nowait() for each request queue

2) wait in a driver-specific way
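A minimal userspace model of the two steps, assuming every hctx is non-blocking (no BLK_MQ_F_BLOCKING): once all queues have been marked quiesced, a single RCU grace period covers every dispatch that started before the marking, whichever queue it was on. `struct model_queue`, `model_quiesce_nowait()`, `model_synchronize_rcu()` and the `grace_periods` counter are hypothetical stand-ins, not blk-mq code; the sketch only counts how many grace periods each pattern waits for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue; not the kernel struct. */
struct model_queue {
	bool quiesced;
};

/* How many grace periods the caller has waited for so far. */
static unsigned int grace_periods;

/* Models synchronize_rcu(): the real call blocks for one grace period. */
static void model_synchronize_rcu(void)
{
	grace_periods++;
}

/* Models blk_mq_quiesce_queue_nowait(): mark the queue, do not wait. */
static void model_quiesce_nowait(struct model_queue *q)
{
	q->quiesced = true;
}

/* Serial pattern: quiesce-and-wait per queue, so n grace periods. */
static void quiesce_serial(struct model_queue *qs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		model_quiesce_nowait(&qs[i]);
		model_synchronize_rcu();
	}
}

/* Two-step pattern: 1) mark every queue, 2) wait once for all of them. */
static void quiesce_two_step(struct model_queue *qs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_quiesce_nowait(&qs[i]);
	if (n > 0)
		model_synchronize_rcu();
}
```

With 64 queues, the serial pattern waits for 64 grace periods and the two-step pattern for one.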

Or, I am just wondering why nvme doesn't use set->tag_list to retrieve the
namespaces; then you could add per-tagset APIs for the waiting.
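A sketch of such a per-tagset helper, again as a userspace model with hypothetical names (`struct ts_queue`, `struct ts_tag_set`, `ts_quiesce_tagset()`); the singly linked `next` pointer stands in for the `set->tag_list` list of request queues. Because BLK_MQ_F_BLOCKING is a tag-set flag that is copied into every hctx, the helper can choose the wait strategy once per tag set: one synchronize_srcu() per hctx for blocking sets, a single synchronize_rcu() otherwise. The counters only record which waits would be issued.

```c
#include <stdbool.h>
#include <stddef.h>

#define TS_F_BLOCKING (1u << 0)	/* models BLK_MQ_F_BLOCKING */

/* Hypothetical stand-in for a request queue sharing a tag set. */
struct ts_queue {
	bool quiesced;
	unsigned int nr_hw_queues;
	struct ts_queue *next;		/* models the set->tag_list linkage */
};

/* Hypothetical stand-in for struct blk_mq_tag_set. */
struct ts_tag_set {
	unsigned int flags;
	struct ts_queue *tag_list;	/* first queue using this tag set */
};

struct ts_wait_stats {
	unsigned int rcu_waits;		/* synchronize_rcu() calls */
	unsigned int srcu_waits;	/* synchronize_srcu() calls */
};

/* Hypothetical per-tagset helper: quiesce every queue, then wait. */
static void ts_quiesce_tagset(struct ts_tag_set *set, struct ts_wait_stats *st)
{
	struct ts_queue *q;

	/* Step 1: mark every queue sharing the tag set, without waiting. */
	for (q = set->tag_list; q; q = q->next)
		q->quiesced = true;

	/* Step 2: the blocking flag is per tag set, so decide only once. */
	if (set->flags & TS_F_BLOCKING) {
		/* Each blocking hctx has its own SRCU struct to wait on. */
		for (q = set->tag_list; q; q = q->next)
			st->srcu_waits += q->nr_hw_queues;
	} else if (set->tag_list) {
		/* One global grace period covers every non-blocking hctx. */
		st->rcu_waits++;
	}
}
```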

> 
> Also, having the driver call a single synchronize_rcu isn't great

Many drivers already call synchronize_rcu() directly (186 matching lines under drivers/):

	$ git grep -n synchronize_rcu ./drivers/ | wc
	    186     524   11384

> layering (as quiesce can possibly use a different mechanism in the future).

What would that different mechanism be?

> So drivers assumptions like:
> 
>         /*
>          * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
>          * calling synchronize_rcu() once is enough.
>          */
>         WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
> 
>         if (!ret)
>                 synchronize_rcu();
> 
> Are not great...

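Both rcu_read_lock()/rcu_read_unlock() and synchronize_rcu() are global
interfaces: one grace period waits for all pre-existing RCU readers, whichever
queue they are dispatching on. So it is reasonable to avoid unnecessary
synchronize_rcu() calls.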

> 
> > > +}
> > > +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
> > > +
> > > +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
> > > +{
> > > +	struct blk_mq_hw_ctx *hctx;
> > > +	unsigned int i;
> > > +
> > > +	queue_for_each_hw_ctx(q, hctx, i) {
> > > +		wait_for_completion(&hctx->rcu_sync.completion);
> > > +		destroy_rcu_head(&hctx->rcu_sync.head);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
> > > +
> > >   /**
> > >    * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
> > >    * @q: request queue.
> > > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > > index 23230c1d031e..5536e434311a 100644
> > > --- a/include/linux/blk-mq.h
> > > +++ b/include/linux/blk-mq.h
> > > @@ -5,6 +5,7 @@
> > >   #include <linux/blkdev.h>
> > >   #include <linux/sbitmap.h>
> > >   #include <linux/srcu.h>
> > > +#include <linux/rcupdate_wait.h>
> > >   struct blk_mq_tags;
> > >   struct blk_flush_queue;
> > > @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
> > >   	 */
> > >   	struct list_head	hctx_list;
> > > +	struct rcu_synchronize	rcu_sync;
> > The above struct takes at least 5 words, and I'd suggest to avoid it,
> > and the hctx->srcu should be re-used for waiting BLK_MQ_F_BLOCKING.
> > Meantime !BLK_MQ_F_BLOCKING doesn't need it.
> 
> It is at the end and contains exactly what is needed to synchronize. Not

The sync is simply a single global synchronize_rcu(), so why bother adding an
extra >= 40 bytes to each hctx?

> sure what you mean by reuse hctx->srcu?

You already reuse hctx->srcu, but I don't see a reason to add an extra
rcu_synchronize to each hctx just to simulate a single synchronize_rcu().
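To put a number on the per-hctx cost, here are hand-written userspace mirrors of the structures involved. They assume a 64-bit build with spinlock debugging and lockdep disabled, so raw_spinlock_t is a 4-byte word; the `mirror_*` names and `rcu_sync_overhead()` are hypothetical, and pahole on a real vmlinux is the authoritative check. By this arithmetic struct rcu_synchronize is 48 bytes on x86-64 once alignment padding is included, i.e. six words, consistent with the "at least 5 words" estimate quoted above.

```c
#include <stddef.h>

/* Mirrors struct list_head. */
struct mirror_list_head {
	struct mirror_list_head *next, *prev;
};

/* struct rcu_head is struct callback_head: a next pointer and a callback. */
struct mirror_rcu_head {
	struct mirror_rcu_head *next;
	void (*func)(struct mirror_rcu_head *head);
};

/* Mirrors struct swait_queue_head; lock assumed to be a 4-byte word. */
struct mirror_swait_queue_head {
	unsigned int lock;			/* raw_spinlock_t */
	struct mirror_list_head task_list;
};

/* Mirrors struct completion (swait-based). */
struct mirror_completion {
	unsigned int done;
	struct mirror_swait_queue_head wait;
};

/* Mirrors struct rcu_synchronize from <linux/rcupdate_wait.h>. */
struct mirror_rcu_synchronize {
	struct mirror_rcu_head head;
	struct mirror_completion completion;
};

/* Extra bytes if every hctx of every queue embeds one rcu_synchronize. */
static size_t rcu_sync_overhead(size_t nr_queues, size_t nr_hctx_per_queue)
{
	return nr_queues * nr_hctx_per_queue *
	       sizeof(struct mirror_rcu_synchronize);
}
```

For a hypothetical controller with 1000 namespaces and 32 hctxs each, rcu_sync_overhead(1000, 32) gives 1,536,000 bytes (about 1.5 MB) on x86-64.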


Thanks,
Ming


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
