From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
Date: Tue, 5 Oct 2021 10:31:19 +0800
Message-ID: <YVu49xcM1N//fvKR@T590>
In-Reply-To: <e3d6c61c-f7cf-dcb0-df2e-a8e9acf5aaaa@acm.org>

On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
> On 9/30/21 5:56 AM, Ming Lei wrote:
> > It turns out that blk_mq_freeze_queue() isn't stronger[1] than
> > blk_mq_quiesce_queue(), because dispatch may still be in progress after
> > the queue is frozen. In several cases, such as switching the I/O
> > scheduler or updating nr_requests and the wbt latency, we still need to
> > quiesce the queue as a supplement to freezing it.
> 
> Is there agreement about this? If not, how about leaving out the above from the
> patch description?

Yes, the code that does this has already been merged. Please see the
related functions: elevator_switch(), queue_wb_lat_store() and
blk_mq_update_nr_requests(). Each of them quiesces the queue in addition
to freezing it.

> 
> > As we need to extend the uses of blk_mq_quiesce_queue(), we inevitably
> > need to support nested quiesce. In particular, we can't let an
> > unquiesce take effect while a quiesce originating from another context
> > is still active.
> > 
> > This patch introduces q->quiesce_depth to deal with concurrent quiesce,
> > and we only unquiesce the queue when it is the last/outer-most one of
> > all contexts.
> > 
> > One kernel panic has been reported[2] when running a stress test that
> > updates nr_requests on dm-mpath while suspending the queue, and a
> > similar issue should exist in almost all drivers which use
> > quiesce/unquiesce.
> > 
> > [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
> > [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
> 
> Please share the call stack of the kernel oops fixed by [2] since that
> call stack is not in the patch description.

OK, it is something like the following:

[  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
[  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
[  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
[  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
[  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
[  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
[  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
[  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
[  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
[  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
[  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  145.460250] Call Trace:
[  145.460779]  <IRQ>
[  145.461453]  blk_done_softirq+0xa1/0xd0
[  145.462138]  __do_softirq+0xd7/0x2d6
[  145.462814]  irq_exit+0xf7/0x100
[  145.463480]  do_IRQ+0x7f/0xd0
[  145.464131]  common_interrupt+0xf/0xf
[  145.464797]  </IRQ>

> 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 21bf4c3f0825..10f8a3d4e3a1 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -209,7 +209,12 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
> >    */
> >   void blk_mq_quiesce_queue_nowait(struct request_queue *q)
> >   {
> > -	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&q->queue_lock, flags);
> > +	if (!q->quiesce_depth++)
> > +		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > +	spin_unlock_irqrestore(&q->queue_lock, flags);
> >   }
> >   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
> 
> Consider using == 0 instead of ! to check whether or not quiesce_depth is
> zero to improve code readability.

Fine.

> 
> > @@ -250,10 +255,19 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> >    */
> >   void blk_mq_unquiesce_queue(struct request_queue *q)
> >   {
> > -	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > +	unsigned long flags;
> > +	bool run_queue = false;
> > +
> > +	spin_lock_irqsave(&q->queue_lock, flags);
> > +	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
> > +		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > +		run_queue = true;
> > +	}
> > +	spin_unlock_irqrestore(&q->queue_lock, flags);
> >   	/* dispatch requests which are inserted during quiescing */
> > -	blk_mq_run_hw_queues(q, true);
> > +	if (run_queue)
> > +		blk_mq_run_hw_queues(q, true);
> >   }
> 
> So calling blk_mq_unquiesce_queue() with q->quiesce_depth <= 0 is ignored
> quietly? How about triggering a kernel warning for that condition?

OK.
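
To make the two comments above concrete, here is a rough userspace model
of the counting logic with both changes applied: the explicit == 0 test
and a warning on an unbalanced unquiesce. It is only a sketch and not the
follow-up patch. struct model_queue and its fields are made up for
illustration, the model is single-threaded so the q->queue_lock
protection from the patch is left out, and a counter plus fprintf()
stands in for a kernel warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct request_queue. */
struct model_queue {
	int quiesce_depth;	/* outstanding quiesce calls */
	bool quiesced;		/* models QUEUE_FLAG_QUIESCED */
	int warnings;		/* models kernel warnings that would fire */
	int runs;		/* models calls to blk_mq_run_hw_queues() */
};

/* Models blk_mq_quiesce_queue_nowait(): only the first caller sets the flag. */
static void model_quiesce(struct model_queue *q)
{
	assert(q);
	/* The patch takes q->queue_lock here; omitted in this model. */
	if (q->quiesce_depth++ == 0)
		q->quiesced = true;
}

/* Models blk_mq_unquiesce_queue(): only the last caller clears the flag. */
static void model_unquiesce(struct model_queue *q)
{
	assert(q);
	if (q->quiesce_depth <= 0) {
		/* Unbalanced call: warn instead of ignoring it quietly. */
		q->warnings++;
		fprintf(stderr, "unbalanced unquiesce\n");
		return;
	}
	if (--q->quiesce_depth == 0) {
		q->quiesced = false;
		q->runs++;	/* queues are re-run only by the last unquiesce */
	}
}
```

With two nested model_quiesce() calls, the first model_unquiesce() leaves
the queue quiesced and does not re-run it; only the second one clears the
flag. A third, unpaired model_unquiesce() leaves the depth at zero and
bumps the warning counter.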


Thanks, 
Ming



