From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
s-uchida@ap.jp.nec.com, taka@valinux.co.jp,
guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
balbir@linux.vnet.ibm.com, righi.andrea@gmail.com,
m-ikeda@ds.jp.nec.com, akpm@linux-foundation.org,
riel@redhat.com, kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [PATCH 14/16] blkio: Idle on a group for some time on rotational media
Date: Fri, 13 Nov 2009 10:37:01 -0500
Message-ID: <20091113153701.GE17076@redhat.com>
In-Reply-To: <4e5e476b0911130258v7b81902dlfcc298c72f2de63a@mail.gmail.com>
On Fri, Nov 13, 2009 at 11:58:53AM +0100, Corrado Zoccolo wrote:
> Hi Vivek,
> On Fri, Nov 13, 2009 at 12:32 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > o If a group is not continuously backlogged, then it will be deleted from
> > the service tree and lose its share. For example, if a single random seeky
> > reader or a single sequential reader is running in the group.
> >
> Without groups, a single sequential reader would already have its 10ms
> idle slice, and a single random reader on the noidle service tree
> would have its 2ms idle, before switching to a new workload. Were
> those removed in the previous patches (and this patch re-enables
> them), or does this introduce an additional idle between groups?
>
The previous patches were based on 2.6.32-rc5, which did not have the
concept of idling on a no-idle group.
Group idling can be thought of as an additional idle that applies when the
group is empty and, for some reason, CFQ did not decide to idle on the
queue (for example, because its slice has expired). Even if the cfqq slice
has expired, we want to wait a bit to give the group a chance to get
backlogged again, so that it is not deleted from the service tree and
continues to get its fair share.
It helps in the following circumstances. I have an NCQ-enabled rotational
disk which does roughly 90MB/s for buffered reads. If I launch two
sequential readers in two groups of weight 400 and 200, the first group
should get double the disk time of the second group. But after every slice
expiry the group gets deleted and loses its share. Hence this extra idle on
the group helps (if we did not already decide to idle on the queue).
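
To make the expected split concrete, here is a small sketch of the
proportional-share arithmetic for the two weights mentioned above. It is
only an illustration: the function name and the group labels are made up
for this example, and it assumes both groups stay continuously backlogged
so that disk time is divided strictly in proportion to weight.

```python
def expected_disk_share(weights):
    """Return each group's ideal fraction of disk time.

    Assumes every group is continuously backlogged, so time is split
    purely in proportion to the configured weights.
    """
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical group names; the weights are the ones from the example.
shares = expected_disk_share({"group1": 400, "group2": 200})
for name, share in shares.items():
    print(f"{name}: {share:.1%} of disk time")
```

With weights 400 and 200 the ideal split is two thirds to one third. When a
group drops off the service tree after each slice expiry, it no longer
competes on these terms, which is the loss of share described above.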
Having said that, I understand that idling can hurt total throughput on
fast NCQ-enabled storage arrays. But it does not necessarily hurt on a
single NCQ-enabled rotational disk. So as we move along, we need to figure
out when to idle to achieve fairness and when to let fairness go because
it hurts too much.
That's the reason I introduced the "group_idle" tunable, so that one can
disable group idling manually. It will also help us analyze when exactly
it makes sense to wait for slow groups to catch up and when it does not.
So in summary, group idling is an extra idle period during which we wait on
a group if it is empty and we did not decide to idle on the cfqq. It is an
effort to get the group backlogged again so that it is not deleted from the
service tree and does not lose its share. This idling is disabled on NCQ
SSDs. The only case left now is fast storage arrays with rotational disks,
where we need to figure out when to idle and when not to. Currently CFQ
seems to idle even on fast storage arrays for sequential tasks, and this
hurts if the sequential task is doing direct IO and not utilizing the full
capacity of the array.
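
The conditions in the summary above can be written down as a small decision
function. This is a rough model of the rules as stated in this mail, not
the code from the patch: the class, the function and all field names are
hypothetical and chosen only for readability.

```python
from dataclasses import dataclass


@dataclass
class Device:
    rotational: bool  # True for spinning disks, False for SSDs
    ncq: bool         # True if the device does command queuing


def should_idle_on_group(group_idle_enabled, group_is_empty,
                         idling_on_queue, dev):
    """Model of the group idling rules described in the summary."""
    if not group_idle_enabled:
        return False  # "group_idle" tunable switched off manually
    if dev.ncq and not dev.rotational:
        return False  # group idling is disabled on NCQ SSDs
    if not group_is_empty:
        return False  # group still has requests, nothing to wait for
    if idling_on_queue:
        return False  # CFQ already decided to idle on the cfqq
    return True       # wait a bit so the group can get backlogged again


ncq_disk = Device(rotational=True, ncq=True)
ncq_ssd = Device(rotational=False, ncq=True)
print(should_idle_on_group(True, True, False, ncq_disk))
print(should_idle_on_group(True, True, False, ncq_ssd))
```

The open question about fast storage arrays is deliberately not modelled
here: such an array looks like a rotational NCQ device to this function, so
it would still idle.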
Thanks
Vivek
> > o One solution is to let the group lose its share if it is not backlogged;
> > the other is to wait a bit for the slow group so that it can get its
> > time slice. This patch implements waiting a bit for a group.
> >
> > o This waiting is disabled for NCQ SSDs.
> >
> > o This patch also introduces the tunable "group_idle" which can enable/disable
> > group idling manually.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>
> Thanks,
> Corrado