From: Vivek Goyal <vgoyal@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Robin Dong <sanbai@taobao.com>,
linux-kernel@vger.kernel.org, Zhu Yanhai <gaoyang.zyh@taobao.com>,
Jens Axboe <axboe@kernel.dk>, Tao Ma <taoma.tm@gmail.com>,
kent.overstreet@gmail.com
Subject: Re: [RFC v1] add new io-scheduler to use cgroup on high-speed device
Date: Wed, 5 Jun 2013 09:55:12 -0400
Message-ID: <20130605135512.GB16339@redhat.com>
In-Reply-To: <20130605030337.GO14916@htj.dyndns.org>
On Tue, Jun 04, 2013 at 08:03:37PM -0700, Tejun Heo wrote:
> (cc'ing Kent. Original posting at
> http://thread.gmane.org/gmane.linux.kernel/1502484 )
>
> Hello,
>
> On Wed, Jun 05, 2013 at 10:09:31AM +0800, Robin Dong wrote:
> > We want to use blkio.cgroup on high-speed device (like fusionio) for our mysql clusters.
> > After testing different io-scheduler, we found that cfq is too slow and deadline can't run on cgroup.
> > So we developed a new io-scheduler: tpps (Tiny Parallel Proportion Scheduler).It dispatch requests
> > only by using their individual weight and total weight (proportion) therefore it's simply and efficient.
> >
> > Test case: fusionio card, 4 cgroups, iodepth-512
>
> So, while I understand the intention behind it, I'm not sure a
> separate io-sched for this is what we want. Kent and Jens have been
> thinking about this lately so they'll probably chime in. From my POV,
> I see a few largish issues.
>
> * It has to be scalable with relatively large scale SMP / NUMA
> configurations. It better integrate with blk-mq support currently
> being brewed.
Agreed that any new algorithm for proportional IO should integrate
well with blk-mq support. I have yet to look at that implementation, but
my understanding is that the current algorithm is per queue and one
queue would not know about the other queues.
As you suggested in the past, maybe some kind of token-based scheme
will work better than trying to provide service differentiation based
on time slices.
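
To make the token idea concrete, here is a rough sketch of what such a
scheme could look like. It is purely illustrative and not taken from any
posted patch: the Group class, refill(), dispatch(), the budget of 100
tokens per round and the cost of one token per request are all
assumptions made for the example.

```python
from collections import deque


class Group:
    """Hypothetical per-cgroup state: weight, token balance, pending requests."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self.tokens = 0
        self.pending = deque()


def refill(groups, budget):
    """Split `budget` tokens among backlogged groups in proportion to weight."""
    active = [g for g in groups if g.pending]
    total = sum(g.weight for g in active)
    for g in active:
        # Integer share of the budget; any rounding remainder is simply dropped.
        g.tokens += budget * g.weight // total


def dispatch(groups, budget=100, cost=1):
    """Run one refill round and return group names in dispatch order."""
    refill(groups, budget)
    order = []
    progress = True
    while progress:
        progress = False
        # Round-robin over groups; a group dispatches only while it has tokens.
        for g in groups:
            if g.pending and g.tokens >= cost:
                g.pending.popleft()
                g.tokens -= cost
                order.append(g.name)
                progress = True
    return order
```

With two backlogged groups of weight 100 and 300, one round hands out 25
and 75 tokens, so the dispatch counts follow the weights without any
time slices or idling. A real implementation would still have to decide
how to charge requests of different sizes and how to share state across
queues.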
>
> * It definitely has to support hierarchy. Nothing which doesn't
> support full hierarchy can be added to cgroup at this point.
>
> * We already have separate implementations in blk-throtl and
> cfq-iosched. Maybe it's too late and too different for cfq-iosched
> given that it's primarily targeted at disks, but I wonder whether we
> can make blk-throtl generic and scalable enough to cover all other
> use cases.
I think it will be hard to cover all the use cases. There is a reason
CFQ got so complicated and bulky: it tried to cover all the
use cases and provide service differentiation among workloads. blk-cgroup
will try to do the same thing at the group level. All these questions will
arise: when to idle, how much to idle, how much device queue depth we
should drive to keep service differentiation good, and how much outstanding
IO from each group we should allow in the queue.
And all of this affects what kind of service differentiation you see
on different devices for different workloads.
I think a generic implementation can be written with the goal of
making it work with faster flash devices (which will typically use blk-mq).
And for slower disks, one can continue to use CFQ's cgroup implementation.
On a side note, it would be nice if we handled the problem of managing
buffered writes using cgroups first. Otherwise there are very few practical
scenarios where proportional IO can be used.
Robin, what's the workload/setup which will benefit from this even without
buffered write support?
Thanks
Vivek