Re: [PATCH 4/6] blk-mq: use EWMA to estimate congestion threshold


From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Bart Van Assche <bart.vanassche@sandisk.com>,
	Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH 4/6] blk-mq: use EWMA to estimate congestion threshold
Date: Wed, 12 Jul 2017 10:30:29 +0800
Message-ID: <20170712023028.GB13036@ming.t460p>
In-Reply-To: <db7526a0-d534-8bc2-93de-546f4dabccfb@kernel.dk>

On Tue, Jul 11, 2017 at 12:25:16PM -0600, Jens Axboe wrote:
> On 07/11/2017 12:21 PM, Ming Lei wrote:
> > When .queue_rq() returns BLK_STS_RESOURCE(BUSY), we can
> > consider that there is congestion in either low level
> > driver or hardware.
> > 
> > This patch uses EWMA to estimate this congestion threshold,
> > then this threshold can be used to detect/avoid congestion.
> 
> This whole patch set lacks some sort of reasoning why this
> makes sense. I'm assuming you want to reduce unnecessary
> restarts of the queue?

When a low level driver returns BLK_STS_RESOURCE, it means that
either the driver or the hardware can't handle the incoming requests.
IMO this is really a congestion control problem.

Drivers currently deal with this in one of two ways:

	1) virtio-blk and xen-blkfront simply stop the queue in this
	case, then restart it when one request is completed.

	2) virtio-scsi chooses not to stop the queue

To reach good IO performance, we often do the following two
things:
	- try to keep the queue pipeline as full as possible by increasing queue depth
	- run several tasks to do IO at the same time

So solution 1) above can't be efficient: when one request completes
and the queue is restarted, there are lots of requests in the queue,
and dispatching them causes lots of BLK_STS_RESOURCE returns
too. The dispatch isn't cheap at all: we need to acquire the sw queue
lock, scheduler lock, hw queue lock and low level queue lock along the
whole path, so we should try to avoid dispatching to a busy queue.
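
To illustrate what "avoid dispatching to a busy queue" could look like,
here is a minimal userspace sketch. It is not the code from this
patchset: struct hwq_state, its fields and hwq_may_dispatch() are
hypothetical names, and a threshold of 0 is assumed to mean "no estimate
yet". The check runs before taking any of the locks on the dispatch path.

```c
#include <stdbool.h>

/* Hypothetical per-hw-queue state, for illustration only. */
struct hwq_state {
	unsigned int in_flight;	/* requests already handed to the driver */
	unsigned int threshold;	/* estimated depth at which the driver
				 * reports busy; 0 means "unknown" */
};

/*
 * Return true if a dispatch attempt is worthwhile. When the number of
 * in-flight requests has reached the estimated threshold, the driver
 * is likely to return BLK_STS_RESOURCE, so the dispatch is skipped.
 */
static bool hwq_may_dispatch(const struct hwq_state *q)
{
	if (q->threshold == 0)
		return true;	/* no estimate yet, always try */

	return q->in_flight < q->threshold;
}
```

A skipped dispatch still has to be retried when a request completes,
otherwise the queue would stall; that part is left out of the sketch.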

Solution 2) simply doesn't stop the queue, so unnecessary
dispatching to a busy queue just wastes CPU, and it has the same issue
as 1) too. As you can see in the commit log of patch 5, a big
improvement is observed with this simple implementation:

	sequential read test(libaio, bs:4k, direct io, queue depth:64, 8 jobs)
	on virtio-scsi shows that:
	        - CPU utilization decreases ~20%
	        - IOPS increases by ~10%

EWMA is used to estimate the average queue depth at which
the queue tends to become busy. In this patchset, this average queue
depth when the queue becomes busy is called the congestion threshold.
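
As a rough illustration of the estimation, here is a small fixed-point
EWMA in plain C. It is only a sketch under assumptions: struct
congest_est, the helper names, the 1/8 weight for a new sample and the
10-bit scaling are all made up for the example and are not taken from
the patch. The idea is that each time .queue_rq() returns busy, the
queue depth observed at that moment is fed in as one sample.

```c
/* Hypothetical estimator state, for illustration only. */
struct congest_est {
	unsigned int avg_scaled;	/* average, scaled by 2^SCALE_SHIFT */
	int initialized;		/* 0 until the first sample arrives */
};

#define SCALE_SHIFT	10	/* fixed-point fraction bits (assumed) */
#define WEIGHT_SHIFT	3	/* new sample gets weight 1/8 (assumed) */

/* Feed one sample: the queue depth seen when the driver returned busy. */
static void congest_est_add(struct congest_est *e, unsigned int depth)
{
	unsigned int sample = depth << SCALE_SHIFT;

	if (!e->initialized) {
		/* the first sample seeds the average directly */
		e->avg_scaled = sample;
		e->initialized = 1;
		return;
	}

	/* avg = avg * 7/8 + sample * 1/8, done with shifts */
	e->avg_scaled = e->avg_scaled - (e->avg_scaled >> WEIGHT_SHIFT) +
			(sample >> WEIGHT_SHIFT);
}

/* Read back the current estimate as a plain queue depth. */
static unsigned int congest_est_read(const struct congest_est *e)
{
	return e->avg_scaled >> SCALE_SHIFT;
}
```

With these constants, a first sample of 64 followed by a sample of 32
gives an estimate of 60 (64 * 7/8 + 32 * 1/8), so one outlier only moves
the threshold a little.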

I chose EWMA because it is a common way to compute a
weighted average, and it is already used in several areas of the
kernel (wifi rate control, sequential I/O estimation in bcache,
networking, ...). Or is there a better way to do that? I am happy
to take it, and in the future we may even support more than one approach
to address the issue.

Another motivation is to unexport the start/stop queue APIs.

> I would much rather ensure that we only
> start when we absolutely have to to begin with, I'm pretty sure
> we have a number of cases where that is not so.

Could you share these cases with us? I believe we can integrate different
approaches to congestion control to address different requirements.

> 
> What happens with fluid congestion boundaries, with shared tags?

The approach in this patch should still work, but the threshold may not
be accurate in that case. One simple method is to feed the average
tag weight into the EWMA, like this:

	sbitmap_weight() / hctx->tags->active_queues
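
Spelled out as a standalone helper, the per-queue share could look like
the sketch below. This is only an illustration in userspace C: the
function name avg_tag_weight() is hypothetical, busy_tags stands in for
the value sbitmap_weight() would return, active_queues stands in for
hctx->tags->active_queues, and treating zero active queues as one is my
assumption to avoid dividing by zero.

```c
/*
 * Average number of busy tags per active queue sharing one tag set.
 * The result would be the sample fed into the EWMA instead of the raw
 * queue depth.
 */
static unsigned int avg_tag_weight(unsigned int busy_tags,
				   unsigned int active_queues)
{
	if (active_queues == 0)
		active_queues = 1;	/* assumed guard, not from the patch */

	return busy_tags / active_queues;
}
```

For example, 64 busy tags shared by 4 active queues give a sample of 16
per queue.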

-- 
Ming


