Re: [RFC 0/3] block: proportional based blk-throttling


From: Shaohua Li <shli@fb.com>
To: Tejun Heo <tj@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <axboe@kernel.dk>,
	<vgoyal@redhat.com>, <jmoyer@redhat.com>, <Kernel-team@fb.com>
Subject: Re: [RFC 0/3] block: proportional based blk-throttling
Date: Fri, 22 Jan 2016 11:11:45 -0800
Message-ID: <20160122191144.GA2151859@devbig084.prn1.facebook.com>
In-Reply-To: <20160122180844.GM5157@mtj.duckdns.org>

On Fri, Jan 22, 2016 at 01:08:44PM -0500, Tejun Heo wrote:
> Hello, Shaohua.
> 
> On Fri, Jan 22, 2016 at 09:57:10AM -0800, Shaohua Li wrote:
> > > Let's say per-cgroup buffer budget B is calculated as, say, 100ms
> > > worth of IO cost (or bandwidth or iops) available to the cgroup.  In
> > > practice, this may have to be adjusted down depending on the number of
> > > cgroups performing active IOs.  For a given cgroup, B can be
> > > distributed among the CPUs that are actively issuing IOs in that
> > > cgroup.  It will degenerate to round robin of small budget if there
> > > are too many active for the budget available but for most cases this
> > > will cut down most of cross-CPU traffic.
> > 
> > The cgroup could be a single thread. It uses cpu0's per-cpu budget B-1,
> > move to cpu1 and use another B - 1, and so on
> 
> Sure, just ensure that the total cached is bound by B and expire if
> not used over a certain amount of time.  The thing is as long as we
> can go through percpu cache most of the time, it's all fine.  We can
> spend a lot of processing budget for corner cases.
> 
> > >  cost = F + R * size
> > 
> > F could be IOPS. and the real cost becomes R. How do you get R? We can't
> > simply use R(4k) = 1, R(8k) = 2 .... I tried the idea several years ago:
> > https://lwn.net/Articles/474164/
> > The idea is the same. But the reality is we can't get R. I don't want to
> > have a random math working for one SSD but not for another.
> 
> Yeah, it'll have to be adaptive.  We can't use fixed values; however,
> note that using bandwidth means that we assume F == 0 and R == 1,
> which wouldn't be appropriate for most devices.

It's true that bandwidth means R == 1, but it is still adaptive in a way:
cgroup bandwidth == share * disk_bandwidth, and disk_bandwidth is itself
an adaptive estimate. It might not work well if cgroups have completely
different IO patterns, though.
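
To make the arithmetic concrete, here is a rough sketch of the two
formulas discussed above: cgroup bandwidth as share * disk_bandwidth, and
cost = F + R * size with the pure-bandwidth case F == 0, R == 1. The
function names, the weights, and the 400 MB/s figure are assumptions made
up for illustration; none of them come from the patches.

```python
# Illustrative sketch only: names and numbers are hypothetical.

def cgroup_bandwidth(weight, active_weights, disk_bandwidth):
    """Bandwidth for one cgroup: its share of the estimated disk bandwidth."""
    share = weight / sum(active_weights)
    return share * disk_bandwidth


def io_cost(size, fixed=0.0, rate=1.0):
    """cost = F + R * size; pure bandwidth accounting is F == 0, R == 1."""
    return fixed + rate * size


# Two active cgroups with equal weight on a disk estimated at 400 MB/s.
disk_bw = 400 * 1024 * 1024
weights = [100, 100]
bw = cgroup_bandwidth(100, weights, disk_bw)  # half of the disk estimate

# With F == 0 and R == 1 the cost of an IO is just its size, so the same
# budget allows very different IO counts depending on the IO pattern.
small_ios = bw / io_cost(4 * 1024)     # IOs per second at 4k
large_ios = bw / io_cost(1024 * 1024)  # IOs per second at 1M
```

With equal shares, the 4k cgroup is allowed 256 times as many IOs per
second as the 1M cgroup, which is the case where a bandwidth-only model
may treat very different IO patterns unevenly.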

Thanks,
Shaohua

Thread overview: 30+ messages
2016-01-20 17:49 [RFC 0/3] block: proportional based blk-throttling Shaohua Li
2016-01-20 17:49 ` [RFC 1/3] block: estimate disk bandwidth Shaohua Li
2016-01-20 17:49 ` [RFC 2/3] blk-throttling: weight based throttling Shaohua Li
2016-01-21 20:33   ` Vivek Goyal
2016-01-21 21:00     ` Shaohua Li
2016-01-20 17:49 ` [RFC 3/3] blk-throttling: detect inactive cgroup Shaohua Li
2016-01-21 20:44   ` Vivek Goyal
2016-01-21 21:05     ` Shaohua Li
2016-01-21 21:09       ` Vivek Goyal
2016-01-20 19:05 ` [RFC 0/3] block: proportional based blk-throttling Vivek Goyal
2016-01-20 19:34   ` Shaohua Li
2016-01-20 19:40     ` Vivek Goyal
2016-01-20 19:43       ` Shaohua Li
2016-01-20 19:54         ` Vivek Goyal
2016-01-20 21:11         ` Vivek Goyal
2016-01-20 21:34           ` Shaohua Li
2016-01-21 21:10 ` Tejun Heo
2016-01-21 22:24   ` Shaohua Li
2016-01-21 22:41     ` Tejun Heo
2016-01-22  0:00       ` Shaohua Li
2016-01-22 14:48         ` Tejun Heo
2016-01-22 15:52           ` Vivek Goyal
2016-01-22 18:00             ` Shaohua Li
2016-01-22 19:09               ` Vivek Goyal
2016-01-22 19:45                 ` Shaohua Li
2016-01-22 20:04                   ` Vivek Goyal
2016-01-22 17:57           ` Shaohua Li
2016-01-22 18:08             ` Tejun Heo
2016-01-22 19:11               ` Shaohua Li [this message]
2016-01-22 14:43       ` Vivek Goyal
