From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752721Ab3FGDKP (ORCPT ); Thu, 6 Jun 2013 23:10:15 -0400
Received: from [205.204.113.249] ([205.204.113.249]:45009 "HELO
	us-alimail-mta2.hst.scl.en.alidc.net." rhost-flags-FAIL-FAIL-FAIL-FAIL)
	by vger.kernel.org with SMTP id S1751367Ab3FGDKN (ORCPT );
	Thu, 6 Jun 2013 23:10:13 -0400
Message-ID: <51B14F02.9090409@taobao.com>
Date: Fri, 07 Jun 2013 11:09:54 +0800
From: sanbai
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version: 1.0
To: Vivek Goyal
CC: linux-kernel@vger.kernel.org, Zhu Yanhai, Tejun Heo, Jens Axboe, Tao Ma
Subject: Re: [RFC v1] add new io-scheduler to use cgroup on high-speed device
References: <1370398171-25173-1-git-send-email-sanbai@taobao.com> <20130605133059.GA16339@redhat.com>
In-Reply-To: <20130605133059.GA16339@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/05/2013 21:30, Vivek Goyal wrote:
> On Wed, Jun 05, 2013 at 10:09:31AM +0800, Robin Dong wrote:
>> We want to use blkio.cgroup on high-speed devices (like Fusion-io) for our MySQL clusters.
>> After testing different io-schedulers, we found that cfq is too slow and deadline can't run on cgroup.
> So why not enhance deadline to be able to be used with cgroups instead of
> coming up with a new scheduler?

I think if we add cgroup support into deadline, it will no longer be
suitable to call it "deadline"...so a new io-scheduler with a new name
should not confuse users.

>
>> So we developed a new io-scheduler: tpps (Tiny Parallel Proportion Scheduler). It dispatches requests
>> only by using their individual weight and total weight (proportion), therefore it's simple and efficient.
> Can you give more details. Do you idle? Idling kills performance. If not,
> then without idling how do you achieve performance differentiation.

We don't idle. When .elevator_dispatch_fn is called, we just compute a
quota for every group:

	quota = nr_requests - rq_in_driver;
	group_quota = quota * group_weight / total_weight;

and dispatch 'group_quota' requests for the corresponding group.
Therefore a high-weight group will dispatch more requests than a
low-weight group.

>
>> Test case: fusionio card, 4 cgroups, iodepth=512
>>
>> groupname	weight
>> test1	1000
>> test2	800
>> test3	600
>> test4	400
>>
> What's the workload used for this?
>
>> Use tpps, the result is:
>>
>> groupname	iops	avg-rt(ms)	max-rt(ms)
>> test1	30220	16	54
>> test2	28261	18	56
>> test3	26333	19	69
>> test4	20152	25	87
>>
>> Use cfq, the result is:
>>
>> groupname	iops	avg-rt(ms)	max-rt(ms)
>> test1	16478	30	242
>> test2	13015	39	347
>> test3	9300	54	371
>> test4	5806	87	393
> How do results look like with cfq if this is run with slice_idle=0 and
> quantum=128 or higher.
>
> cfq idles on 3 things: queue (cfqq), service tree and cfq group.
> slice_idle will disable idling on cfqq but not on service tree. If
> we provide a knob for that, then idling on service tree can be disabled
> too and then we will be left with group idling only and then it should
> be much better.

I did the test again for cfq (slice_idle=0, quantum=128) and tpps:

cfq (slice_idle=0, quantum=128)

groupname	iops	avg-rt(ms)	max-rt(ms)
test1	16148	15	188
test2	12756	20	117
test3	9778	26	268
test4	6198	41	209

tpps

groupname	iops	avg-rt(ms)	max-rt(ms)
test1	17292	14	65
test2	15221	16	80
test3	12080	21	66
test4	7995	32	90

Looks like cfq is much better than before.
My fio script is:

[global]
direct=1
ioengine=libaio
#ioengine=psync
runtime=30
bs=4k
rw=randread
iodepth=256
filename=/dev/fioa
numjobs=2
#group_reporting

[read1]
cgroup=test1
cgroup_weight=1000

[read2]
cgroup=test2
cgroup_weight=800

[read3]
cgroup=test3
cgroup_weight=600

[read4]
cgroup=test4
cgroup_weight=400

>
> Thanks
> Vivek

-- 
Robin Dong
Dong Hao (alias: Sanbai)
Alibaba Group, Core System Division, Kernel Team
Ext: 72370  Mobile: 13520865473
email: sanbai@taobao.com