public inbox for linux-kernel@vger.kernel.org
From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Nauman Rafique <nauman@google.com>
Cc: Vivek Goyal <vgoyal@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Jeff Moyer <jmoyer@redhat.com>, Divyesh Shah <dpshah@google.com>,
	Corrado Zoccolo <czoccolo@gmail.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC] [PATCH] cfq-iosched: add cfq group hierarchical scheduling support
Date: Thu, 02 Sep 2010 08:30:37 +0800	[thread overview]
Message-ID: <4C7EF02D.9040202@cn.fujitsu.com> (raw)
In-Reply-To: <AANLkTin8zv9GdBnSsU-a2XjeqXrvr0fTNY2ZMTbGxiVd@mail.gmail.com>

Nauman Rafique wrote:
> On Wed, Sep 1, 2010 at 10:10 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Wed, Sep 01, 2010 at 08:49:26AM -0700, Nauman Rafique wrote:
>>> On Wed, Sep 1, 2010 at 1:50 AM, Gui Jianfeng <guijianfeng@cn.fujitsu.com> wrote:
>>>> Vivek Goyal wrote:
>>>>> On Tue, Aug 31, 2010 at 08:40:19AM -0700, Nauman Rafique wrote:
>>>>>> On Tue, Aug 31, 2010 at 5:57 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>>>> On Tue, Aug 31, 2010 at 08:29:20AM +0800, Gui Jianfeng wrote:
>>>>>>>> Vivek Goyal wrote:
>>>>>>>>> On Mon, Aug 30, 2010 at 02:50:40PM +0800, Gui Jianfeng wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> This patch enables cfq group hierarchical scheduling.
>>>>>>>>>>
>>>>>>>>>> With this patch, you can create cgroup directories deeper than level 1.
>>>>>>>>>> Now, I/O bandwidth is distributed in a hierarchical way. For example,
>>>>>>>>>> we create cgroup directories as follows (the numbers represent weights):
>>>>>>>>>>
>>>>>>>>>>             Root grp
>>>>>>>>>>            /       \
>>>>>>>>>>        grp_1(100) grp_2(400)
>>>>>>>>>>        /    \
>>>>>>>>>>   grp_3(200) grp_4(300)
>>>>>>>>>>
>>>>>>>>>> If grp_2, grp_3 and grp_4 are contending for I/O bandwidth,
>>>>>>>>>> grp_2 will get 80% of the total bandwidth.
>>>>>>>>>> Of the subgroups, grp_3 gets 8% (20% * 40%) and grp_4 gets 12% (20% * 60%).
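
[Editor's note: the arithmetic above can be sketched in a few lines. This is an illustration only, not kernel code; it assumes grp_1 has no queues of its own contending, so its whole 20% is split between grp_3 and grp_4.]

```python
# Each level splits its parent's share in proportion to the children's
# weights.  The tree is {name: (weight, subtree)}.
def shares(tree, parent_share=1.0):
    if not tree:
        return {}
    total = sum(w for w, _ in tree.values())
    out = {}
    for name, (weight, sub) in tree.items():
        share = parent_share * weight / total
        out[name] = share
        out.update(shares(sub, share))
    return out

root = {
    "grp_1": (100, {"grp_3": (200, {}), "grp_4": (300, {})}),
    "grp_2": (400, {}),
}
result = shares(root)
# grp_2 -> 0.80, grp_1 -> 0.20, grp_3 -> 0.08, grp_4 -> 0.12
```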
>>>>>>>>>>
>>>>>>>>>> Design:
>>>>>>>>>>   o Each cfq group has its own group service tree.
>>>>>>>>>>   o Each cfq group contains a "group schedule entity" (gse) that
>>>>>>>>>>     schedules on the parent cfq group's service tree.
>>>>>>>>>>   o Each cfq group contains a "queue schedule entity" (qse) that
>>>>>>>>>>     represents all cfqqs located in this cfq group. It schedules
>>>>>>>>>>     on this group's service tree. For the time being, the root group
>>>>>>>>>>     qse's weight is 1000, and a subgroup qse's weight is 500.
>>>>>>>>>>   o All gses and the qse that belong to the same cfq group are
>>>>>>>>>>     scheduled on the same group service tree.
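
[Editor's note: a toy model of the entities described above; names and structure are invented for illustration and are not taken from the actual patch. Each group owns a service tree; a child group contributes a gse to its parent's tree, and every group's own qse sits on its own tree.]

```python
class CfqGroup:
    """Toy model: a group owns a service tree holding its own qse plus
    the gse of each child group."""
    def __init__(self, name, weight=None, parent=None):
        self.name = name
        self.service_tree = []
        # the qse stands in for all cfqqs of this group
        # (weight 1000 for the root group, 500 for subgroups)
        self.qse = ("qse", name, 1000 if parent is None else 500)
        self.service_tree.append(self.qse)
        if parent is not None:
            # the gse schedules on the parent's tree with the cgroup weight
            parent.service_tree.append(("gse", name, weight))

root = CfqGroup("root")
g1 = CfqGroup("G1", weight=100, parent=root)
entities = [(kind, who) for kind, who, _ in root.service_tree]
# root's tree now holds [("qse", "root"), ("gse", "G1")]
```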
>>>>>>>>> Hi Gui,
>>>>>>>>>
>>>>>>>>> Thanks for the patch. I have few questions.
>>>>>>>>>
>>>>>>>>> - So what does the hierarchy look like w.r.t. the root group? Something as
>>>>>>>>>   follows?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                     root
>>>>>>>>>                    / | \
>>>>>>>>>                  q1  q2 G1
>>>>>>>>>
>>>>>>>>> Assume there are two processes doing IO in the root group, q1 and q2 are
>>>>>>>>> the cfqq queues for those processes, and G1 is the cgroup created by the user.
>>>>>>>>>
>>>>>>>>> If yes, then what algorithm do you use to do scheduling between q1, q2
>>>>>>>>> and G1? IOW, currently we have two algorithms operating in CFQ: one for
>>>>>>>>> cfqqs and another for groups. The group algorithm does not use the logic
>>>>>>>>> of cfq_slice_offset().
>>>>>>>> Hi Vivek,
>>>>>>>>
>>>>>>>> This patch doesn't break the original scheduling logic, that is, cfqg => st => cfqq.
>>>>>>>> If q1 and q2 are in the root group, I treat the q1 and q2 bundle as a queue
>>>>>>>> schedule entity, and it schedules on the root group's service tree alongside G1,
>>>>>>>> as follows:
>>>>>>>>
>>>>>>>>                          root group
>>>>>>>>                         /         \
>>>>>>>>                     qse(q1,q2)    gse(G1)
>>>>>>>>
>>>>>>> Ok. That's interesting. That raises another question of how the hierarchy
>>>>>>> should look. IOW, how queues and groups should be treated in the
>>>>>>> hierarchy.
>>>>>>>
>>>>>>> The CFS cpu scheduler treats queues and groups at the same level, that
>>>>>>> is, as follows.
>>>>>>>
>>>>>>>                        root
>>>>>>>                        / | \
>>>>>>>                       q1 q2 G1
>>>>>>>
>>>>>>> In the past I had raised this question, and Jens and Corrado liked treating
>>>>>>> queues and groups at the same level.
>>>>>>>
>>>>>>> Logically, q1, q2 and G1 are all children of root, so it makes sense to
>>>>>>> treat them at the same level rather than grouping q1 and q2 into a single
>>>>>>> entity.
>>>>>>>
>>>>>>> One of the possible way forward could be this.
>>>>>>>
>>>>>>> - Treat queues and groups at the same level (like CFS)
>>>>>>>
>>>>>>> - Get rid of the cfq_slice_offset() logic. That means that, with idling
>>>>>>>  disabled, there will be no ioprio difference between cfq queues. I think
>>>>>>>  that logic helps in so few situations today that I would not mind
>>>>>>>  getting rid of it. Just that Jens should agree to it.
>>>>>>>
>>>>>>> - With this new scheme, it will break the existing semantics of the root
>>>>>>>  group being at the same level as child groups. To avoid that, we can
>>>>>>>  probably implement two modes (flat and hierarchical), similar to what
>>>>>>>  the memory cgroup controller has done. Maybe one tunable in the root
>>>>>>>  cgroup of blkio, "use_hierarchy". By default everything will be in flat
>>>>>>>  mode, and if the user wants hierarchical control, he needs to set
>>>>>>>  use_hierarchy in the root group.
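
[Editor's note: the difference between the two proposed modes can be sketched numerically. Weights here are made up for illustration; `flat_shares` treats every group as a direct sibling, while the hierarchical path multiplies per-level fractions.]

```python
# Tree for illustration: root -> A(100), B(200); A -> C(300).
def flat_shares(weights):
    # flat mode: all groups compete at one level, nesting ignored
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def hier_share(path_weights, sibling_totals):
    # hierarchical mode: multiply this group's fraction at each level
    share = 1.0
    for w, total in zip(path_weights, sibling_totals):
        share *= w / total
    return share

flat = flat_shares({"A": 100, "B": 200, "C": 300})
# flat: C competes directly with A and B -> 300/600 = 0.5
hier_c = hier_share([100, 300], [100 + 200, 300])
# hierarchical: C gets all of A's 1/3 -> about 0.333
```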
>>>>>> Vivek, maybe I am reading you wrong here. But you are first
>>>>>> suggesting adding more complexity to treat queues and groups at the
>>>>>> same level. Then you are suggesting adding even more complexity to fix
>>>>>> the problems caused by that approach.
>>>>>>
>>>>>> Why do we need to treat queues and groups at the same level? "CFS does
>>>>>> it" is not a good argument.
>>>>> Sure, it is not a very good argument, but at the same time one would need
>>>>> a very good argument for why we should do things differently.
>>>>>
>>>>> - If a user has mounted the cpu and blkio controllers together and the two
>>>>>   controllers view the same hierarchy differently, then it is odd. We need
>>>>>   a good reason why a different arrangement makes sense.
>>>> Hi Vivek,
>>>>
>>>> Even if we mount cpu and blkio together, to me, it's ok for cpu and blkio
>>>> to have their own logic, since they are totally different cgroup subsystems.
>>>>
>>>>> - To me, both groups and cfq queues are children of the root group, and it
>>>>>   makes sense to treat them as independent children instead of putting
>>>>>   all the queues in one logical group which inherits the weight of the
>>>>>   parent.
>>>>>
>>>>> - With this new scheme, I am finding it hard to visualize the hierarchy.
>>>>>   How do you assign weights to the queue entities of a group? It is more
>>>>>   like an invisible group within a group. We would have to create a new
>>>>>   tunable to specify the weight of this hidden group.
>>>> For the time being, the root "qse" weight is 1000 and the others are 500;
>>>> they don't inherit the weight of the parent. I was thinking that maybe we
>>>> can determine the qse weight in terms of the number of queues and their
>>>> weights in this group and its subgroups.
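
[Editor's note: one way such a derived qse weight could look. This is purely hypothetical; the patch uses the fixed values 1000/500, and `per_queue`, `floor`, and the cap below are invented for illustration.]

```python
def qse_weight(nr_queues, child_weights, per_queue=100, floor=500):
    # hypothetical heuristic: scale with the number of queues, keep the
    # fixed floor, and cap at the combined weight of the subgroups so
    # the qse cannot drown out every child group
    scaled = nr_queues * per_queue
    cap = max(sum(child_weights), floor)
    return min(max(scaled, floor), cap)

w_few = qse_weight(2, [400, 300])    # few queues -> the 500 floor
w_many = qse_weight(12, [400, 300])  # many queues -> capped at 700
```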
>>>>
>>>> Thanks,
>>>> Gui
>>>>
>>>>>
>>>>> So in summary, I like the "queues at the same level as groups" scheme for
>>>>> the following reasons.
>>>>>
>>>>> - It is more intuitive to visualize and implement. It follows the true
>>>>>   hierarchy as seen by the cgroup file system.
>>>>>
>>>>> - CFS has already implemented this scheme. So we need a strong argument
>>>>>   to justify why we should not follow the same thing, especially for
>>>>>   the case where the user has co-mounted the cpu and blkio controllers.
>>>>>
>>>>> - It can achieve the same goal as the "hidden group" proposal just by
>>>>>   creating a cgroup explicitly and moving all threads into that group.
>>>>>
>>>>> Why do you think that the "hidden group" proposal is better than treating
>>>>> queues at the same level as groups?
>>> There are multiple reasons for "hidden group" proposal being a better approach.
>>>
>>> - The "hidden group" would allow us to keep scheduling queues using the
>>> CFQ queue scheduling logic, and it does not require any major changes
>>> to CFQ. Aren't we already using that approach to deal with queues in the
>>> root group?
>> Currently we are operating in flat mode, where all the groups are at the
>> same level (irrespective of their position in the cgroup hierarchy).
>>
>>> - If queues and groups are treated at the same level, queues can end
>>> up in the root cgroup, and we cannot put an upper bound on the number of
>>> those queues. Those queues can consume system resources in proportion
>>> to their number, causing the performance of groups to suffer. If we
>>> have a "hidden group", we can configure it with a small weight, and that
>>> would limit the impact these queues in the root group can have.
>> To limit the impact of the other queues in the root cgroup, one can use
>> libcgroup to automatically place new threads or tasks into a subgroup.
>>
>> I understand that kernel doing it by default should help though. It is
>> less work in terms of configuration. But I am not sure that's a good
>> argument to design kernel functionality. Kernel functionality should be
>> pretty generic.
>>
>> Anyway, how would you assign the weight of the hidden group? What's the
>> interface for that? A new cgroup file inside each cgroup? Personally
>> I think that's a slightly odd interface. Every group has one hidden group
>> where all the queues in that group go, and the weight of that group can be
>> specified by a cgroup file.
> 
> I think picking a reasonable default weight at compile time is not
> that bad an option, given that threads showing up in the "hidden
> group" is an uncommon case.

Hi Nauman,

Later, I think we might adjust the weight of the "hidden group" automatically
according to the number of queues, the number of subgroups, and their weights.
But for the time being, I'd choose a fixed value for the sake of simplicity.

Gui

> 
>> But anyway, I am not tied to any of the approach. I am just trying to
>> make sure that we have put enough thought into it as changing it later
>> will be hard.
>>
>> Vivek
>>
> 
> 

Thread overview: 19+ messages
2010-08-30  6:50 [RFC] [PATCH] cfq-iosched: add cfq group hierarchical scheduling support Gui Jianfeng
2010-08-30 18:20 ` Chad Talbott
2010-08-31  0:35   ` Gui Jianfeng
2010-08-30 20:36 ` Vivek Goyal
2010-08-31  0:29   ` Gui Jianfeng
2010-08-31 12:57     ` Vivek Goyal
2010-08-31 15:40       ` Nauman Rafique
2010-08-31 19:25         ` Vivek Goyal
2010-09-01  8:50           ` Gui Jianfeng
2010-09-01 15:49             ` Nauman Rafique
2010-09-01 17:10               ` Vivek Goyal
2010-09-01 17:15                 ` Nauman Rafique
2010-09-01 17:21                   ` Vivek Goyal
2010-09-02  0:30                   ` Gui Jianfeng [this message]
2010-09-02  2:20             ` Vivek Goyal
2010-09-01  8:48       ` Gui Jianfeng
2010-09-01  9:02         ` KAMEZAWA Hiroyuki
2010-09-02  2:29           ` Vivek Goyal
2010-09-02  2:42             ` KAMEZAWA Hiroyuki
