From: Ming Lei <ming.lei@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	linux-block@vger.kernel.org, Chao Leng <lengchao@huawei.com>
Subject: Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
Date: Tue, 28 Jul 2020 09:09:28 +0800
Message-ID: <20200728010928.GA1303645@T590>
In-Reply-To: <bcb8f89b-8477-c48b-1e0f-947cbe741818@grimberg.me>

On Mon, Jul 27, 2020 at 02:00:15PM -0700, Sagi Grimberg wrote:
> 
> > > > > > > +void blk_mq_quiesce_queue_async(struct request_queue *q)
> > > > > > > +{
> > > > > > > +	struct blk_mq_hw_ctx *hctx;
> > > > > > > +	unsigned int i;
> > > > > > > +
> > > > > > > +	blk_mq_quiesce_queue_nowait(q);
> > > > > > > +
> > > > > > > +	queue_for_each_hw_ctx(q, hctx, i) {
> > > > > > > +		init_completion(&hctx->rcu_sync.completion);
> > > > > > > +		init_rcu_head(&hctx->rcu_sync.head);
> > > > > > > +		if (hctx->flags & BLK_MQ_F_BLOCKING)
> > > > > > > +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
> > > > > > > +				wakeme_after_rcu);
> > > > > > > +		else
> > > > > > > +			call_rcu(&hctx->rcu_sync.head,
> > > > > > > +				wakeme_after_rcu);
> > > > > > > +	}
> > > > > > 
> > > > > > > It doesn't look necessary to do anything in the !BLK_MQ_F_BLOCKING
> > > > > > > case; a single synchronize_rcu() is OK for all hctxs during waiting.
> > > > > 
> > > > > That's true, but I want a single interface for both. v2 had exactly
> > > > > that, but I decided that this approach is better.
> > > > 
> > > > Not sure one new interface is needed, and one simple way is to:
> > > > 
> > > > 1) call blk_mq_quiesce_queue_nowait() for each request queue
> > > > 
> > > > 2) wait in driver specific way
> > > > 
> > > > Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
> > > > then you may add per-tagset APIs for the waiting.
> > > 
> > > > Because it puts assumptions on how quiesce works, which is something
> > > > I'd like to avoid because I think it's cleaner. What do others think?
> > > > Jens? Christoph?
> > 
> > I'd prefer to have it in a helper, and just have blk_mq_quiesce_queue()
> > call that.
> 
> I agree with this approach as well.
> 
> Jens, this means that we use the call_rcu mechanism for non-blocking
> hctxs as well, because the caller will call it for multiple request queues
> (see patch 2) and we don't want to call synchronize_rcu for every request
> queue serially; we want it to happen in parallel.
> 
> Which leaves us with the patchset as it is, just to convert the
> rcu_synchronize structure to be dynamically allocated on the heap
> rather than keeping it statically allocated in the hctx.
> 
> This is how it looks:
> --
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index abcf590f6238..d913924117d2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -209,6 +209,52 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
> 
> +void blk_mq_quiesce_queue_async(struct request_queue *q)
> +{
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int i;
> +       int rcu = false;
> +
> +       blk_mq_quiesce_queue_nowait(q);
> +
> +       queue_for_each_hw_ctx(q, hctx, i) {
> +               hctx->rcu_sync = kmalloc(sizeof(*hctx->rcu_sync), GFP_KERNEL);
> +               if (!hctx->rcu_sync) {
> +                       /* fallback to serial rcu sync */
> +                       if (hctx->flags & BLK_MQ_F_BLOCKING)
> +                               synchronize_srcu(hctx->srcu);
> +                       else
> +                               rcu = true;
> +               } else {
> +                       init_completion(&hctx->rcu_sync->completion);
> +                       init_rcu_head(&hctx->rcu_sync->head);
> +                       if (hctx->flags & BLK_MQ_F_BLOCKING)
> +                               call_srcu(hctx->srcu, &hctx->rcu_sync->head,
> +                                       wakeme_after_rcu);
> +                       else
> +                               call_rcu(&hctx->rcu_sync->head,
> +                                       wakeme_after_rcu);
> +               }
> +       }
> +       if (rcu)
> +               synchronize_rcu();
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
> +
> +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
> +{
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int i;
> +
> +       queue_for_each_hw_ctx(q, hctx, i) {
> +               if (!hctx->rcu_sync)
> +                       continue;
> +               wait_for_completion(&hctx->rcu_sync->completion);
> +               destroy_rcu_head(&hctx->rcu_sync->head);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
> +
>  /**
>   * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
>   * @q: request queue.
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 23230c1d031e..7213ce56bb31 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -5,6 +5,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/sbitmap.h>
>  #include <linux/srcu.h>
> +#include <linux/rcupdate_wait.h>
> 
>  struct blk_mq_tags;
>  struct blk_flush_queue;
> @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
>          */
>         struct list_head        hctx_list;
> 
> +       struct rcu_synchronize  *rcu_sync;

The above pointer needn't be added to blk_mq_hw_ctx; the rcu_synchronize
structure can be allocated on the heap and passed to the waiting helper.
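
To illustrate the shape of that suggestion, here is a minimal userspace
sketch, not kernel code. All names in it (quiesce_waiter, hw_queue,
quiesce_async_start, simulate_grace_period, quiesce_async_wait) are
hypothetical, and the grace period is simulated by an explicit call rather
than by call_rcu()/call_srcu(). The only point it tries to show is that the
wait state is owned by the caller and handed to the wait helper, so the
per-queue structure needs no extra field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_synchronize. */
struct quiesce_waiter {
	bool done;	/* set once the simulated grace period has elapsed */
};

/* Hypothetical stand-in for struct blk_mq_hw_ctx: no waiter pointer here. */
struct hw_queue {
	bool quiesced;
};

/* Plays the role of wakeme_after_rcu(): mark one waiter as complete. */
static void wake_waiter(struct quiesce_waiter *w)
{
	w->done = true;
}

/*
 * Mark every queue quiesced and allocate one waiter per queue on the heap.
 * The array is returned to the caller instead of being stored in the queues.
 * Returns NULL on allocation failure; a caller could then fall back to a
 * synchronous wait.
 */
static struct quiesce_waiter *quiesce_async_start(struct hw_queue *queues,
						  size_t nr)
{
	struct quiesce_waiter *waiters = calloc(nr, sizeof(*waiters));
	size_t i;

	if (!waiters)
		return NULL;
	for (i = 0; i < nr; i++)
		queues[i].quiesced = true;
	return waiters;
}

/* Simulation only: pretend the grace period ended and run the callbacks. */
static void simulate_grace_period(struct quiesce_waiter *waiters, size_t nr)
{
	size_t i;

	assert(waiters != NULL);
	for (i = 0; i < nr; i++)
		wake_waiter(&waiters[i]);
}

/*
 * Wait helper: takes the caller-owned waiters, reports how many have
 * completed, and frees them. Kernel code would block in
 * wait_for_completion() here; this sketch only inspects the flags.
 */
static size_t quiesce_async_wait(struct quiesce_waiter *waiters, size_t nr)
{
	size_t i, completed = 0;

	for (i = 0; i < nr; i++)
		if (waiters[i].done)
			completed++;
	free(waiters);
	return completed;
}
```

A caller would keep the pointer returned by quiesce_async_start() and later
pass it to quiesce_async_wait(), which also takes care of freeing it.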


Thanks,
Ming



