Re: [RFC v1] add new io-scheduler to use cgroup on high-speed device

public inbox for linux-kernel@vger.kernel.org

From: sanbai <sanbai@taobao.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-kernel@vger.kernel.org, Zhu Yanhai <gaoyang.zyh@taobao.com>,
	Tejun Heo <tj@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Tao Ma <taoma.tm@gmail.com>
Subject: Re: [RFC v1] add new io-scheduler to use cgroup on high-speed device
Date: Sat, 08 Jun 2013 11:50:06 +0800
Message-ID: <51B2A9EE.50908@taobao.com>
In-Reply-To: <20130607195351.GD14015@redhat.com>

On 2013-06-08 03:53, Vivek Goyal wrote:
> On Fri, Jun 07, 2013 at 11:09:54AM +0800, sanbai wrote:
>> On 2013-06-05 21:30, Vivek Goyal wrote:
>>> On Wed, Jun 05, 2013 at 10:09:31AM +0800, Robin Dong wrote:
>>>> We want to use blkio.cgroup on high-speed devices (like fusionio) for our mysql clusters.
>>>> After testing different io-schedulers, we found that cfq is too slow and deadline can't run with cgroups.
>>> So why not enhance deadline to be able to be used with cgroups instead of
>>> coming up with a new scheduler?
>> I think that if we add cgroup support to deadline, "deadline" will
>> no longer be a suitable name, so a new io-scheduler with a new
>> name is less likely to confuse users.
> Nobody got confused when we added cgroup support to CFQ. Not that
> I am saying go add support to deadline. I am just saying that the need
> for cgroup support does not sound like it justifies the need for a new
> IO scheduler.
>
> [..]
>>> Can you give more details? Do you idle? Idling kills performance. If not,
>>> then how do you achieve performance differentiation without idling?
>> We don't idle. When it comes to .elevator_dispatch_fn, we just compute
>> a quota for every group:
>>
>> quota = nr_requests - rq_in_driver;
>> group_quota = quota * group_weight / total_weight;
>>
>> and dispatch 'group_quota' requests for the corresponding group.
>> Therefore a high-weight group will dispatch more requests than a
>> low-weight group.
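The quota arithmetic quoted above can be sketched in userspace Python. This is only an illustration of the two formulas, not the tpps kernel code: the function name group_quotas, the clamp to zero, and the example weights are assumptions made for the sketch, and integer division is used to mimic C integer arithmetic.

```python
def group_quotas(nr_requests, rq_in_driver, weights):
    """Split the free dispatch slots among groups in proportion to weight.

    Mirrors the quoted formulas:
        quota       = nr_requests - rq_in_driver
        group_quota = quota * group_weight / total_weight
    """
    # Assumption: never hand out a negative quota if the driver is over-full.
    quota = max(nr_requests - rq_in_driver, 0)
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {name: 0 for name in weights}
    # Integer division, as C integer arithmetic would do.
    return {name: quota * w // total_weight for name, w in weights.items()}


# Hypothetical weights; the actual group weights are not given in this message.
example = group_quotas(
    nr_requests=128,
    rq_in_driver=28,
    weights={"test1": 400, "test2": 300, "test3": 200, "test4": 100},
)
print(example)
```

With 100 free slots and these hypothetical weights, each group's quota is proportional to its weight, which is why a high-weight group dispatches more requests per round.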
> Ok, this works only if all the groups are full all the time; otherwise
> groups will lose their fair share. This simplifies things a lot.
> That is, fairness is provided only if a group is always backlogged. In
> practice, this happens only if a group is doing IO at a very high rate
> (like your fio scripts). Have you tried running any real-life workload
> in these cgroups (apache, databases, etc.) to see how good the service
> differentiation is?
>
> Anyway, it sounds like this can be done at the generic block layer, like
> blk-throttle, and it can sit on top so that it can work with all
> schedulers and can also work with bio-based block drivers.
That's a new idea. I will give it a try later.
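Vivek's remark above about backlogged groups can be illustrated with a small Python sketch. It assumes a group can dispatch at most the number of requests it currently has queued and that unused quota is not handed to other groups; this message does not say how the patch treats unused quota, so that part is an assumption. The names dispatch_round, "heavy" and "light", and all numbers are hypothetical.

```python
def dispatch_round(quota, weights, queued):
    """Return how many requests each group dispatches in one round.

    Each group gets quota * weight / total_weight slots, but it can only
    use as many as it has requests queued (assumption: no redistribution
    of the unused slots).
    """
    total_weight = sum(weights.values())
    dispatched = {}
    for name, w in weights.items():
        group_quota = quota * w // total_weight
        dispatched[name] = min(group_quota, queued.get(name, 0))
    return dispatched


weights = {"heavy": 300, "light": 100}

# Both groups always have plenty queued: the split follows the 3:1 weights.
backlogged = dispatch_round(100, weights, {"heavy": 500, "light": 500})

# The heavy group has only 10 requests queued: it dispatches fewer
# requests than the light group despite its larger weight.
bursty = dispatch_round(100, weights, {"heavy": 10, "light": 500})

print(backlogged, bursty)
```

In the second case the high-weight group ends up with less service than the low-weight one, which is the loss of fair share for groups that are not continuously backlogged.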
>    
>
> [..]
>> I ran the test again for cfq (slice_idle=0, quantum=128) and tpps
>>
>> cfq (slice_idle=0, quantum=128)
>> groupname  iops   avg-rt(ms)  max-rt(ms)
>> test1      16148  15          188
>> test2      12756  20          117
>> test3      9778   26          268
>> test4      6198   41          209
>>
>> tpps
>> groupname  iops   avg-rt(ms)  max-rt(ms)
>> test1      17292  14          65
>> test2      15221  16          80
>> test3      12080  21          66
>> test4      7995   32          90
>>
>> Looks like cfq with these settings is much better than before.
> Yep, I am sure there are more simple opportunities for optimization
> where it can help. Can you try a couple more things?
>
> - Drive even deeper queue depth. Set quantum=512.
>
> - set group_idle=0.
I changed the iodepth to 512 in the fio script, and the new results are:

cfq (group_idle=0, quantum=512)
groupname  iops   avg-rt(ms)  max-rt(ms)
test1      15259  33          305
test2      11858  42          345
test3      8885   57          335
test4      5738   89          355

cfq (group_idle=0, quantum=512, slice_sync=10)
groupname  iops   avg-rt(ms)  max-rt(ms)
test1      16507  31          177
test2      12896  39          366
test3      9301   55          188
test4      6023   84          545

tpps
groupname  iops   avg-rt(ms)  max-rt(ms)
test1      16316  31          99
test2      15066  33          106
test3      12182  42          101
test4      8350   61          180

Looks like cfq works much better now.
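To compare the three runs above more directly, the iops columns can be reduced to a total, a per-group share, and a highest-to-lowest ratio. The Python sketch below only does arithmetic on the numbers reported above; the helper name summarize is hypothetical, and since the group weights are not stated in this message, it does not judge whether the shares match the configured weights.

```python
# iops per group, copied from the three result tables above.
RESULTS = {
    "cfq (group_idle=0, quantum=512)": {
        "test1": 15259, "test2": 11858, "test3": 8885, "test4": 5738,
    },
    "cfq (group_idle=0, quantum=512, slice_sync=10)": {
        "test1": 16507, "test2": 12896, "test3": 9301, "test4": 6023,
    },
    "tpps": {
        "test1": 16316, "test2": 15066, "test3": 12182, "test4": 8350,
    },
}


def summarize(iops_by_group):
    """Return (total iops, share of total per group, max/min iops ratio)."""
    total = sum(iops_by_group.values())
    shares = {group: iops / total for group, iops in iops_by_group.items()}
    spread = max(iops_by_group.values()) / min(iops_by_group.values())
    return total, shares, spread


SUMMARY = {name: summarize(groups) for name, groups in RESULTS.items()}

for name, (total, shares, spread) in SUMMARY.items():
    pretty = ", ".join(f"{g}={s:.1%}" for g, s in shares.items())
    print(f"{name}: total={total} iops, spread={spread:.2f}, shares: {pretty}")
```

The total shows aggregate throughput for each configuration, while the spread shows how strongly the highest-iops group is separated from the lowest one.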
>
>    Ideally this should effectively emulate what you are doing. That is,
>    try to provide fairness without idling on the group.
>
>    In practice I could not keep the group queue full, and before the
>    group exhausted its slice, it became empty, got deleted from the
>    service tree, and lost its fair share. So if group_idle=0 leads to
>    no service differentiation, try slice_sync=10 and see what happens.
>
> Thanks
> Vivek


-- 

Robin Dong
Dong Hao (nickname: Sanbai)
Alibaba Group, Core Systems Department, Kernel Team
Ext.: 72370
Mobile: 13520865473
email: sanbai@taobao.com
