Message-ID: <51B2B55A.1070607@taobao.com>
Date: Sat, 08 Jun 2013 12:38:50 +0800
From: sanbai
To: Vivek Goyal
CC: linux-kernel@vger.kernel.org, Zhu Yanhai, Tejun Heo, Jens Axboe, Tao Ma
Subject: Re: [RFC v1] add new io-scheduler to use cgroup on high-speed device
References: <1370398171-25173-1-git-send-email-sanbai@taobao.com>
 <20130605133059.GA16339@redhat.com> <51B14F02.9090409@taobao.com>
 <20130607195351.GD14015@redhat.com> <51B2A9EE.50908@taobao.com>
In-Reply-To: <51B2A9EE.50908@taobao.com>

On 2013-06-08 11:50, sanbai wrote:
> On 2013-06-08 03:53, Vivek Goyal wrote:
>> On Fri, Jun 07, 2013 at 11:09:54AM +0800, sanbai wrote:
>>> On 2013-06-05 21:30, Vivek Goyal wrote:
>>>> On Wed, Jun 05, 2013 at 10:09:31AM +0800, Robin Dong wrote:
>>>>> We want to use blkio.cgroup on high-speed devices (like fusionio)
>>>>> for our MySQL clusters. After testing different I/O schedulers, we
>>>>> found that cfq is too slow and deadline can't run on cgroups.
>>>> So why not enhance deadline to be usable with cgroups instead of
>>>> coming up with a new scheduler?
>>> I think if we add cgroup support to deadline, it will no longer be
>>> suitable to call it "deadline"...so a new I/O scheduler with a new
>>> name may avoid confusing users.
>> Nobody got confused when we added cgroup support to CFQ. Not that
>> I am saying go add support to deadline. I am just saying that the
>> need for cgroup support does not sound like it justifies a new
>> I/O scheduler.
>>
>> [..]
>>>> Can you give more details? Do you idle? Idling kills performance.
>>>> If not, then without idling how do you achieve performance
>>>> differentiation?
>>> We don't idle. When we come to .elevator_dispatch_fn, we just
>>> compute a quota for every group:
>>>
>>>     quota = nr_requests - rq_in_driver;
>>>     group_quota = quota * group_weight / total_weight;
>>>
>>> and dispatch 'group_quota' requests for the corresponding group
>>> (see the sketch below). Therefore a high-weight group will dispatch
>>> more requests than a low-weight group.
>> Ok, this works only if all the groups are full all the time;
>> otherwise groups will lose their fair share. This simplifies things
>> a lot. That is, fairness is provided only if the group is always
>> backlogged. In practice, this happens only if a group is doing IO at
>> a very high rate (like your fio scripts). Have you tried running any
>> real-life workload in these cgroups (apache, databases, etc.) and
>> seen how good the service differentiation is?
>>
>> Anyway, it sounds like this can be done at the generic block layer
>> like blk-throtl, and it can sit on top so that it works with all
>> schedulers and also with bio-based block drivers.
> That's a new idea, I will give it a try later.
>>
>> [..]
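The quota computation quoted above works roughly like this minimal
user-space sketch. The group names, weights, and queue-depth numbers
are made up for illustration; this is not the actual tpps kernel code:

#include <stdio.h>

/* Hypothetical groups with blkio-style weights; names and values
 * are illustrative only, not taken from the patch. */
struct group {
        const char *name;
        int weight;
};

int main(void)
{
        struct group groups[] = {
                { "test1", 800 }, { "test2", 600 },
                { "test3", 400 }, { "test4", 200 },
        };
        int nr_groups = sizeof(groups) / sizeof(groups[0]);
        int nr_requests = 128;  /* assumed queue depth limit */
        int rq_in_driver = 32;  /* requests already in the driver */
        int quota = nr_requests - rq_in_driver;
        int total_weight = 0;
        int i;

        for (i = 0; i < nr_groups; i++)
                total_weight += groups[i].weight;

        /* Each group may dispatch a share of the remaining quota
         * proportional to its weight, so a high-weight group gets
         * to dispatch more requests than a low-weight one. */
        for (i = 0; i < nr_groups; i++) {
                int group_quota = quota * groups[i].weight / total_weight;
                printf("%s: dispatch up to %d requests\n",
                       groups[i].name, group_quota);
        }
        return 0;
}

With these made-up weights, the remaining quota of 96 requests splits
38/28/19/9 across test1..test4, matching the intent that higher-weight
groups dispatch more.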
>>> I did the test again for cfq (slice_idle=0, quantum=128) and tpps:
>>>
>>> cfq (slice_idle=0, quantum=128)
>>> groupname    iops     avg-rt(ms)   max-rt(ms)
>>> test1        16148    15           188
>>> test2        12756    20           117
>>> test3         9778    26           268
>>> test4         6198    41           209
>>>
>>> tpps
>>> groupname    iops     avg-rt(ms)   max-rt(ms)
>>> test1        17292    14            65
>>> test2        15221    16            80
>>> test3        12080    21            66
>>> test4         7995    32            90
>>>
>>> Looks like cfq is much better than before.
>> Yep, I am sure there are more simple opportunities for optimization
>> that can help. Can you try a couple more things:
>>
>> - Drive an even deeper queue depth. Set quantum=512.
>>
>> - Set group_idle=0.
> I changed the iodepth to 512 in the fio script and the new results are:
>
> cfq (group_idle=0, quantum=512)
> groupname    iops     avg-rt(ms)   max-rt(ms)
> test1        15259    33           305
> test2        11858    42           345
> test3         8885    57           335
> test4         5738    89           355
>
> cfq (group_idle=0, quantum=512, slice_sync=10)
> groupname    iops     avg-rt(ms)   max-rt(ms)
> test1        16507    31           177
> test2        12896    39           366
> test3         9301    55           188
> test4         6023    84           545
>
> tpps
> groupname    iops     avg-rt(ms)   max-rt(ms)
> test1        16316    31            99
> test2        15066    33           106
> test3        12182    42           101
> test4         8350    61           180
>
> Looks like cfq works much better now.

But after I changed to 'randrw', the results are a little different
(a sysfs sketch for these tunables follows at the end of this mail):

cfq (group_idle=0, quantum=512, slice_sync=10, slice_async=10)
groupname    iops(r/w)    avg-rt(ms)   max-rt(ms)
test1        8717/8726    26/31         553/576
test2        6944/6943    34/39         507/514
test3        4974/4961    49/53         725/658
test4        3117/3109    79/84        1107/1094

tpps
groupname    iops(r/w)    avg-rt(ms)   max-rt(ms)
test1        9130/9147    25/30         85/98
test2        7644/7652    30/36        103/118
test3        5727/5733    41/47        132/146
test4        3889/3891    62/68        193/214

>> Ideally this should effectively emulate what you are doing. That is,
>> try to provide fairness without idling on the group.
>>
>> In practice I could not keep the group queue full, and before the
>> group exhausted its slice it became empty, got deleted from the
>> service tree and lost its fair share. So if group_idle=0 leads to no
>> service differentiation, try slice_sync=10 and see what happens.
>>
>> Thanks
>> Vivek

--
Robin Dong
email: sanbai@taobao.com
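For anyone reproducing these runs, here is a minimal sketch of setting
the cfq tunables discussed in this thread through sysfs. The device
name "sda" is an assumption; the tunable files live under
/sys/block/<dev>/queue/iosched/ when cfq is the active scheduler:

#include <stdio.h>

/* Write one cfq tunable under /sys/block/<dev>/queue/iosched/.
 * Needs root; prints an error and returns -1 if the file is
 * missing (e.g. cfq is not the active scheduler). */
static int set_cfq_tunable(const char *dev, const char *name, int value)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/iosched/%s", dev, name);
        f = fopen(path, "w");
        if (!f) {
                perror(path);
                return -1;
        }
        fprintf(f, "%d\n", value);
        fclose(f);
        return 0;
}

int main(void)
{
        /* The combination used for the randrw results above;
         * "sda" is an assumed device name. */
        set_cfq_tunable("sda", "quantum", 512);
        set_cfq_tunable("sda", "group_idle", 0);
        set_cfq_tunable("sda", "slice_sync", 10);
        set_cfq_tunable("sda", "slice_async", 10);
        return 0;
}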