public inbox for linux-kernel@vger.kernel.org
From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, nauman@google.com,
	dpshah@google.com, jmoyer@redhat.com, czoccolo@gmail.com
Subject: Re: [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle tunable V3
Date: Fri, 23 Jul 2010 07:53:58 +0800	[thread overview]
Message-ID: <4C48DA16.4010403@cn.fujitsu.com> (raw)
In-Reply-To: <20100722144931.GD28684@redhat.com>

Vivek Goyal wrote:
> On Thu, Jul 22, 2010 at 03:08:00PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi,
>>>
>>> This is V3 of the group_idle and CFQ IOPS mode implementation patchset. Since V2,
>>> I have cleaned up the code a bit to clarify in which cases we charge time
>>> slice and in which cases we charge number of requests.
>>>
>>> What's the problem
>>> ------------------
>>> On high-end storage (I tested on an HP EVA storage array with 12 SATA disks in
>>> RAID 5), CFQ's model of dispatching requests from a single queue at a
>>> time (sequential readers/writers, sync writers, etc.) becomes a bottleneck.
>>> Often we don't drive enough request queue depth to keep all the disks busy,
>>> and overall throughput suffers a lot.
>>>
>>> All these problems primarily originate from two things: idling on each
>>> cfq queue, and the quantum (dispatching only a limited number of requests
>>> from a single queue while not allowing dispatch from other queues). Once
>>> you set slice_idle=0 and raise quantum to a higher value, most of CFQ's
>>> problems on higher-end storage disappear.
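>>>
>>> (For illustration, both knobs are standard CFQ sysfs tunables; "sdb" below
>>> is a placeholder device name, and 16 is just an example quantum value:)
>>>
>>> # echo 0 > /sys/block/sdb/queue/iosched/slice_idle
>>> # echo 16 > /sys/block/sdb/queue/iosched/quantum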
>>>
>>> This problem also becomes visible with the IO controller, where one creates
>>> multiple groups and gets fairness, but overall throughput drops. In the
>>> following table, I am running an increasing number of sequential readers
>>> (1, 2, 4, 8) in 8 groups of weights 100 to 800.
>>>
>>> Kernel=2.6.35-rc5-iops+
>>> GROUPMODE=1          NRGRP=8
>>> DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4
>>> Workload=bsr      iosched=cfq     Filesz=512M bs=4K
>>> group_isolation=1 slice_idle=8    group_idle=8    quantum=8
>>> =========================================================================
>>> AVERAGE[bsr]    [bw in KB/s]
>>> -------
>>> job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total
>>> ---     --- --  ---------------------------------------------------------------
>>> bsr     3   1   6186   12752  16568  23068  28608  35785  42322  48409  213701
>>> bsr     3   2   5396   10902  16959  23471  25099  30643  37168  42820  192461
>>> bsr     3   4   4655   9463   14042  20537  24074  28499  34679  37895  173847
>>> bsr     3   8   4418   8783   12625  19015  21933  26354  29830  36290  159249
>>>
>>> Notice that overall throughput is just around 160MB/s with 8 sequential readers
>>> in each group.
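>>>
>>> (For reference, a weighted group setup of this shape can be created with
>>> the cgroup v1 blkio controller; the mount point and group names below are
>>> illustrative:)
>>>
>>> # mount -t cgroup -o blkio none /cgroup
>>> # for i in `seq 1 8`; do mkdir /cgroup/grp$i; echo $((i * 100)) > /cgroup/grp$i/blkio.weight; done
>>> # echo 1 > /sys/block/sdb/queue/iosched/group_isolation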
>>>
>>> With this patch set, I set slice_idle=0 and re-ran the same test.
>>>
>>> Kernel=2.6.35-rc5-iops+
>>> GROUPMODE=1          NRGRP=8
>>> DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4
>>> Workload=bsr      iosched=cfq     Filesz=512M bs=4K
>>> group_isolation=1 slice_idle=0    group_idle=8    quantum=8
>>> =========================================================================
>>> AVERAGE[bsr]    [bw in KB/s]
>>> -------
>>> job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total
>>> ---     --- --  ---------------------------------------------------------------
>>> bsr     3   1   6523   12399  18116  24752  30481  36144  42185  48894  219496
>>> bsr     3   2   10072  20078  29614  38378  46354  52513  58315  64833  320159
>>> bsr     3   4   11045  22340  33013  44330  52663  58254  63883  70990  356520
>>> bsr     3   8   12362  25860  37920  47486  61415  47292  45581  70828  348747
>>>
>>> Notice how overall throughput has shot up to 348MB/s while retaining the
>>> ability to do IO control.
>>>
>>> So this is not the default mode. The new tunable, group_idle, allows one to
>>> set slice_idle=0 to disable some of CFQ's features and primarily use the
>>> group service differentiation feature.
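>>>
>>> (Concretely, the mode described above would be enabled with something like
>>> the following; the device name is again a placeholder, and 8 is the
>>> group_idle value used in the runs above:)
>>>
>>> # echo 0 > /sys/block/sdb/queue/iosched/slice_idle
>>> # echo 8 > /sys/block/sdb/queue/iosched/group_idle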
>>>
>>> If you have thoughts on other ways of solving the problem, I am all ears.
>> Hi Vivek,
>>
>> Could you attach your fio job config file?
>>
> 
> Hi Gui,
> 
> I have written a fio-based test script, "iostest", to do cgroup and other
> IO scheduler testing more smoothly, and I am using that. I am attaching
> the compressed script to this mail. Try it, and if it works for you and
> you find it useful, I can think about hosting a git tree somewhere.
> 
> I used the following command lines for the tests above.
> 
> # iostest <block-device> -G -w bsr -m 8 -c --nrgrp 8 --total
> 
> With slice idle disabled.
> 
> # iostest <block-device> -G -w bsr -m 8 -c --nrgrp 8 --total -I 0
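>
> (If you want a quick standalone approximation of the bsr job without the
> script, a buffered sequential read fio invocation along these lines should
> be close; the directory and numjobs values are illustrative:)
>
> # fio --name=bsr --directory=/mnt/iostestmnt/fio --rw=read --bs=4K \
>       --size=512M --numjobs=8 --group_reporting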

That's cool! Very helpful, I'll try it.

Thanks,
Gui

> 
> Thanks
> Vivek

-- 
Regards
Gui Jianfeng

Thread overview: 25+ messages
2010-07-21 19:06 [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle tunable V3 Vivek Goyal
2010-07-21 19:06 ` [PATCH 1/3] cfq-iosched: Implment IOPS mode Vivek Goyal
2010-07-21 20:33   ` Jeff Moyer
2010-07-21 20:57     ` Vivek Goyal
2010-07-21 19:06 ` [PATCH 2/3] cfq-iosched: Implement a tunable group_idle Vivek Goyal
2010-07-21 19:40   ` Jeff Moyer
2010-07-21 20:13     ` Vivek Goyal
2010-07-21 20:54       ` Jeff Moyer
2010-07-21 19:06 ` [PATCH 3/3] cfq-iosched: Print number of sectors dispatched per cfqq slice Vivek Goyal
2010-07-22  5:56 ` [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle tunable V3 Christoph Hellwig
2010-07-22 14:00   ` Vivek Goyal
2010-07-24  8:51     ` Christoph Hellwig
2010-07-24  9:07       ` Corrado Zoccolo
2010-07-26 14:30         ` Vivek Goyal
2010-07-26 21:21           ` Tuning IO scheduler (Was: Re: [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle tunable V3) Vivek Goyal
2010-07-26 14:33         ` [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle tunable V3 Vivek Goyal
2010-07-29 19:57           ` Corrado Zoccolo
2010-07-26 13:51       ` Vivek Goyal
2010-07-22 20:54   ` Vivek Goyal
2010-07-22  7:08 ` Gui Jianfeng
2010-07-22 14:49   ` Vivek Goyal
2010-07-22 23:53     ` Gui Jianfeng [this message]
2010-07-26  6:58 ` Gui Jianfeng
2010-07-26 14:10   ` Vivek Goyal
2010-07-27  8:33     ` Gui Jianfeng
