linux-block.vger.kernel.org archive mirror
From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, John Garry <john.garry@huawei.com>,
	Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive
Date: Mon, 11 May 2020 11:48:27 +0800	[thread overview]
Message-ID: <20200511034827.GD1418834@T590> (raw)
In-Reply-To: <8ef5352b-a1bb-a3c1-3ad2-696df6e86f1f@acm.org>

On Sun, May 10, 2020 at 08:20:24PM -0700, Bart Van Assche wrote:
> On 2020-05-10 18:45, Ming Lei wrote:
> > On Sat, May 09, 2020 at 07:18:46AM -0700, Bart Van Assche wrote:
> >> On 2020-05-08 21:10, Ming Lei wrote:
> >>> Queue freezing can only be applied at the request queue level, not at the
> >>> hctx level. When requests can't be completed, waiting for the freeze just
> >>> hangs forever.
> >>
> >> That's indeed what I meant: freeze the entire queue instead of
> >> introducing a new mechanism that freezes only one hardware queue at a time.
> > 
> > No, the issue is exactly that one single hctx becomes inactive, and
> > other hctx are still active and workable.
> > 
> > If the entire queue is frozen because some CPUs are offline, how
> > can userspace submit IO to this disk? Your suggestion just makes the
> > disk unusable; that won't be accepted.
> 
> What I meant is to freeze the request queue temporarily (until hot
> unplugging of a CPU has finished). I would never suggest freezing a
> request queue forever, and I think that you already knew that.

But what is your motivation for freezing the queue temporarily?

I don't see how freezing the queue helps with this issue. Also, even though
it is temporary, the IO stall can still be observed on other online CPUs.

If you want to block new allocation from the inactive hctx, that isn't
necessary, because essentially no new allocation is possible once all
CPUs of this hctx are offline.

If you want to wait for completion of in-flight requests, that isn't doable,
because requests may never complete once the hctx becomes inactive and its
managed interrupt is shut down.

> 
> >> Please clarify what "when requests can't be completed" means. Are you
> >> referring to requests that take longer than expected due to e.g. a
> >> controller lockup or to requests that take a long time intentionally?
> > 
> > If all CPUs in one hctx->cpumask are offline, the managed irq of this hw
> > queue will be shut down by the genirq code, so any in-flight IO won't be
> > completed or timed out after the managed irq is shut down due to the CPUs
> > going offline.
> > 
> > Some drivers implement a timeout handler, so these in-flight requests
> > will eventually be timed out, but that is still not friendly behaviour
> > given that the default timeout is quite long.
> > 
> > Some drivers don't implement a timeout handler at all, so this IO will
> > never be completed.
> 
> I think that the block layer needs to be notified after the decision has

I have added a new cpuhp state, CPUHP_AP_BLK_MQ_ONLINE, for getting this
notification; blk_mq_hctx_notify_online() will be called before the CPU
is put offline.

> been taken to offline a CPU and before the interrupts associated with
> that CPU are disabled. That would allow the block layer to freeze a
> request queue without triggering any timeouts (ignoring block driver and
> hardware bugs). I'm not familiar with CPU hotplugging so I don't know
> whether or not such a mechanism already exists.

How can freezing the queue avoid triggering timeouts?

Freezing a queue basically blocks new request allocation and then waits for
completion of all in-flight requests. As I explained, without this patch's
solution either no new allocation happens on the inactive hctx anyway, or
the in-flight requests won't be completed at all.

> 
> >> The former case is handled by the block layer timeout handler. I propose
> >> to handle the latter case by introducing a new callback function pointer
> >> in struct blk_mq_ops that aborts all outstanding requests.
> > 
> > As I mentioned, timing out isn't friendly behavior. Also, not every
> > driver implements a timeout handler, or implements one well enough.
> 
> What I propose is to fix those block drivers instead of complicating the
> block layer core further and instead of introducing potential deadlocks
> in the block layer core.

The deadlock you mentioned can be fixed with the help of BLK_MQ_REQ_PREEMPT.

> 
> >> Request queue
> >> freezing is such an important block layer mechanism that I think we
> >> should require that all block drivers support freezing a request queue
> >> in a short time.
> > 
> > Firstly, we just need to drain in-flight requests and re-submit queued
> > requests from one single hctx; queue-wide freezing blocks all
> > userspace IO unnecessarily.
> 
> Freezing a request queue for a short time is acceptable. As you know we
> already do that when the queue depth is modified, when the write-back
> throttling latency is modified and also when the I/O scheduler is changed.

Again, how can freezing the queue help with the issue addressed by this patchset?

> 
> > Secondly, some requests may not be completed at all, so freezing can't
> > work because freeze_wait may hang forever.
> 
> If a request neither can be aborted nor completes then that's a severe
> bug in the block driver that submitted the request to the block device.

It is hard to implement a timeout handler for every driver, or to remove
every BLK_EH_RESET_TIMER return from the drivers.

Even for drivers which implement a timeout handler properly, it isn't
friendly to wait dozens of seconds, or more than a hundred seconds, for IO
completion during CPU hotplug. Why should IO timeouts have to be triggered
during CPU hotplug at all? At least there is no such issue with non-managed
interrupts.



Thanks, 
Ming



Thread overview: 41+ messages
2020-05-05  2:09 [PATCH V10 00/11] blk-mq: improvement CPU hotplug Ming Lei
2020-05-05  2:09 ` [PATCH V10 01/11] block: clone nr_integrity_segments and write_hint in blk_rq_prep_clone Ming Lei
2020-05-05  2:09 ` [PATCH V10 02/11] block: add helper for copying request Ming Lei
2020-05-05  2:09 ` [PATCH V10 03/11] blk-mq: mark blk_mq_get_driver_tag as static Ming Lei
2020-05-05  2:09 ` [PATCH V10 04/11] blk-mq: assign rq->tag in blk_mq_get_driver_tag Ming Lei
2020-05-05  2:09 ` [PATCH V10 05/11] blk-mq: support rq filter callback when iterating rqs Ming Lei
2020-05-08 23:32   ` Bart Van Assche
2020-05-09  0:18     ` Bart Van Assche
2020-05-09  2:05       ` Ming Lei
2020-05-09  3:08         ` Bart Van Assche
2020-05-09  3:52           ` Ming Lei
2020-05-05  2:09 ` [PATCH V10 06/11] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2020-05-05  6:14   ` Hannes Reinecke
2020-05-08 23:26   ` Bart Van Assche
2020-05-09  2:09     ` Ming Lei
2020-05-09  3:11       ` Bart Van Assche
2020-05-09  3:56         ` Ming Lei
2020-05-05  2:09 ` [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive Ming Lei
2020-05-08 23:39   ` Bart Van Assche
2020-05-09  2:20     ` Ming Lei
2020-05-09  3:24       ` Bart Van Assche
2020-05-09  4:10         ` Ming Lei
2020-05-09 14:18           ` Bart Van Assche
2020-05-11  1:45             ` Ming Lei
2020-05-11  3:20               ` Bart Van Assche
2020-05-11  3:48                 ` Ming Lei [this message]
2020-05-11 20:56                   ` Bart Van Assche
2020-05-12  1:25                     ` Ming Lei
2020-05-05  2:09 ` [PATCH V10 08/11] block: add blk_end_flush_machinery Ming Lei
2020-05-05  2:09 ` [PATCH V10 09/11] blk-mq: add blk_mq_hctx_handle_dead_cpu for handling cpu dead Ming Lei
2020-05-05  2:09 ` [PATCH V10 10/11] blk-mq: re-submit IO in case that hctx is inactive Ming Lei
2020-05-05  2:09 ` [PATCH V10 11/11] block: deactivate hctx when the hctx is actually inactive Ming Lei
2020-05-09 14:07   ` Bart Van Assche
2020-05-11  2:11     ` Ming Lei
2020-05-11  3:30       ` Bart Van Assche
2020-05-11  4:08         ` Ming Lei
2020-05-11 20:52           ` Bart Van Assche
2020-05-12  1:43             ` Ming Lei
2020-05-12  2:08             ` Ming Lei
2020-05-08 21:49 ` [PATCH V10 00/11] blk-mq: improvement CPU hotplug Ming Lei
2020-05-09  3:17   ` Jens Axboe
