Re: [RFC 0/3] block: proportional based blk-throttling


From: Tejun Heo <tj@kernel.org>
To: Shaohua Li <shli@fb.com>
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, vgoyal@redhat.com,
	jmoyer@redhat.com, Kernel-team@fb.com
Subject: Re: [RFC 0/3] block: proportional based blk-throttling
Date: Thu, 21 Jan 2016 16:10:02 -0500
Message-ID: <20160121211002.GH5157@mtj.duckdns.org>
In-Reply-To: <cover.1453308862.git.shli@fb.com>

Hello, Shaohua.

On Wed, Jan 20, 2016 at 09:49:16AM -0800, Shaohua Li wrote:
> Currently we have 2 iocontrollers. blk-throttling is bandwidth based. CFQ is

Just a nit.  blk-throttle is both bw and iops based.

> weight based. It would be great if there were a unified iocontroller for the
> two. And blk-mq doesn't support ioscheduler, leaving blk-throttling the only
> option for blk-mq. It's time to have a scalable iocontroller supporting both
> bandwidth/weight based control and working with blk-mq.
> 
> blk-throttling is a good candidate; it works for both blk-mq and legacy queue.
> It has a global lock which is scary for scalability, but it's not terrible in
> practice. In my test, the NVMe IOPS can reach 1M/s with all CPUs running IO.
> Enabling blk-throttle has around 2~3% IOPS and 10% cpu utilization impact. I'd
> expect this isn't a big problem for today's workloads. This patchset then tries
> to make a unified iocontroller. I'm leveraging blk-throttling.

Have you tried with some level, say 5, of nesting?  IIRC, how it
implements hierarchical control is rather braindead (and yeah I'm
responsible for the damage).

> The idea is pretty simple. If we know disk total bandwidth, we can calculate
> cgroup bandwidth according to its weight. blk-throttling can use the calculated
> bandwidth to throttle cgroup. Disk total bandwidth changes dramatically per IO
> pattern. Long history is meaningless. The simple algorithm in patch 1 works
> pretty well when IO pattern changes.

So, that part is fine but I don't think it makes sense to make weight
based control either bandwidth or iops based.  The fundamental problem
is that it's a false choice.  It's like asking someone who wants a car
to choose between accelerator and brake.  It's a choice without a good
answer.  Both are wrong.  Also note that there's an inherent
difference from the currently implemented absolute limits.  Absolute
limits can be combined.  Weights based on different metrics can't be.
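
To illustrate what "can be combined" means for absolute limits, here is
a minimal sketch.  The struct and helper names are hypothetical and are
not blk-throttle's actual data structures; the point is only that an IO
is allowed when it fits under every configured limit at once, which has
no obvious counterpart for two weights measured in different units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cgroup budget for the current time slice. */
struct abs_budget {
	uint64_t bytes_left;	/* remaining bandwidth budget, in bytes */
	uint64_t ios_left;	/* remaining iops budget, in requests */
};

/* An IO may be dispatched only if it fits within both limits. */
static bool abs_may_dispatch(const struct abs_budget *b, uint64_t io_bytes)
{
	return b->ios_left >= 1 && b->bytes_left >= io_bytes;
}

/* Charge a dispatched IO against both budgets. */
static void abs_charge(struct abs_budget *b, uint64_t io_bytes)
{
	b->ios_left -= 1;
	b->bytes_left -= io_bytes;
}
```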

Even with modern SSDs, both iops and bandwidth play major roles in
deciding how costly each IO is and I'm fairly confident that this is
fundamental enough to be the case for quite a while.  I *think* the
cost model can be approximated from measurements.  Devices are
becoming more and more predictable in their behaviors after all.  For
weight based distribution, the unit of distribution should be IO time,
not bandwidth or iops.
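
As a rough sketch of the idea, assuming a simple linear cost model (a
fixed per-IO cost plus a per-KiB transfer cost), the IO time of a
request and a cgroup's share of device time could be computed as below.
The struct, the function names, and the linear form are all assumptions
made for illustration; real coefficients would have to come from
measuring the device.

```c
#include <stdint.h>

/* Hypothetical linear cost model; coefficients would be measured. */
struct io_cost_model {
	uint64_t per_io_ns;	/* fixed cost of issuing one IO */
	uint64_t per_kib_ns;	/* transfer cost per KiB of payload */
};

/* Estimated device time consumed by one IO of the given size. */
static uint64_t io_cost_ns(const struct io_cost_model *m, uint64_t bytes)
{
	uint64_t kib = (bytes + 1023) / 1024;	/* round up to whole KiB */

	return m->per_io_ns + kib * m->per_kib_ns;
}

/* Device time a cgroup gets in one period, proportional to its weight. */
static uint64_t time_share_ns(uint64_t period_ns, uint32_t weight,
			      uint32_t total_weight)
{
	return period_ns * weight / total_weight;
}
```

With such a model, a cgroup issuing many small IOs and one issuing a
few large IOs are charged in the same unit, so a single weight can
cover both.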

Thanks.

-- 
tejun

