Re: [RFC] CFQ group scheduling structure organization

From: Munehiro Ikeda <m-ikeda@ds.jp.nec.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
	nauman@google.com, lizf@cn.fujitsu.com, ryov@valinux.co.jp,
	fernando@oss.ntt.co.jp, taka@valinux.co.jp,
	guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
	Alan.Brunelle@hp.com
Subject: Re: [RFC] CFQ group scheduling structure organization
Date: Thu, 17 Dec 2009 18:58:21 -0500
Message-ID: <4B2AC59D.2010004@ds.jp.nec.com>
In-Reply-To: <4e5e476b0912170341h7ba632akddb921c996a36f73@mail.gmail.com>

Hello,

Corrado Zoccolo wrote, on 12/17/2009 06:41 AM:
> Hi,
> On Wed, Dec 16, 2009 at 11:52 PM, Vivek Goyal<vgoyal@redhat.com>  wrote:
>> Hi All,
>>
>> With some basic group scheduling support in CFQ, there are few questions
>> regarding how group structure should look like in CFQ.
>>
>> Currently, grouping looks as follows. A, and B are two cgroups created by
>> user.
>>
>> [snip]
>>
>> Proposal 4:
>> ==========
>> Treat task and group at same level. Currently groups are at top level and
>> at second level are tasks. View the whole hierarchy as follows.
>>
>>
>>                         service-tree
>>                         /   |  \  \
>>                        T1   T2  G1 G2
>>
>> Here T1 and T2 are two tasks in root group and G1 and G2 are two cgroups
>> created under root.
>>
>> In this kind of scheme, any RT task in root group will still be system
>> wide RT even if we create groups G1 and G2.
>>
>> So what are the issues?
>>
>> - I talked to a few folks and everybody found this scheme not so intuitive.
>>   Their argument was that once I create a cgroup, say A,  under root, then
>>   bandwidth should be divided between "root" and "A" proportionate to
>>   the weight.
>>
>>   It is not very intuitive that group is competing with all the tasks
>>   running in root group. And disk share of newly created group will change
>>   if more tasks fork in root group. So it is highly dynamic and not
>>   static hence un-intuitive.

I agree it is dynamic, but I don't think it's un-intuitive.
I think it's reasonable for the disk share of a group to be
influenced by the number of tasks running in the root group,
because, from the viewpoint of the cgroup I/F, the root group is
shared by its tasks and its child groups, and they really do share
the disk bandwidth.
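
As a rough illustration of that arithmetic, here is a minimal sketch,
assuming every entity on the service tree is served in proportion to
its weight. The function name share_of, the extra task T3, and the
weight value 100 are my own illustrative assumptions; this is not CFQ
code and these are not actual CFQ weights.

```python
from fractions import Fraction

def share_of(entity, weights):
    """Fraction of disk bandwidth `entity` gets when every entry in
    `weights` sits on one service tree and is served in proportion
    to its weight (hypothetical model, not CFQ code)."""
    total = sum(weights.values())
    return Fraction(weights[entity], total)

# Two root tasks and two groups at the same level, equal assumed weights.
flat = {"T1": 100, "T2": 100, "G1": 100, "G2": 100}
g1_before = share_of("G1", flat)

# One more task (hypothetical T3) forks in the root group.
flat["T3"] = 100
g1_after = share_of("G1", flat)

# G1's share shrinks because it competes with every root task.
print(g1_before, g1_after)
```

Under these assumptions G1's share moves from a quarter to a fifth
when the third task appears, which is the dynamic behavior being
discussed.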


>>   To emulate the behavior of previous proposals, root shall have to create
>>   a new group and move all root tasks there. But admin shall have to still
>>   keep RT tasks in root group so that they still remain system-wide.
>>
>>                         service-tree
>>                         /   |    \  \
>>                        T1  root  G1 G2
>>                             |
>>                             T2
>>
>>   Now admin has specifically created a group "root" along side G1 and G2
>>   and moved T2 under root. T1 is still left in top level group as it might
>>   be an RT task and we want it to remain RT task systemwide.
>>
>>   So to some people this scheme is un-intuitive and requires more work in
>>   user space to achieve desired behavior. I am kind of 50:50 between two
>>   kind of arrangements.
>>
> This is the one I prefer: it is the most natural one if you see that
> groups are scheduling entities like any other task.
> I think it becomes intuitive with an analogy with a qemu (e.g. kvm)
> virtual machine model. If you think of a group as a virtual machine, it
> is clear that for the normal system, the whole virtual machine is a
> single scheduling entity, and that it has to compete with other
> virtual machines (as other single entities) and every process in the
> real system (those are inherently more important, since without the
> real system, the VMs simply cannot exist).
> Having a designated root group, instead, resembles the xen VM model,
> where you have a separate domain for each VM and for the real system.
>
> I think the implementation of this approach can make the code simpler
> and modular (CFQ could be abstracted to deal with scheduling entities,
> and each scheduling entity could be defined in a separate file).
> Within each group, you will now have the choice of how to schedule its
> queues. This means that you could possibly have different I/O
> schedulers within each group, and even have sub-groups within groups.

Corrado says exactly what I prefer.

I understand that the current implementation, like proposal 1, was
chosen to keep the code simple, and I believe it succeeded.
However, I find it rather un-intuitive because it is
inconsistent with the cgroup I/F.  Behavior that is inconsistent
with the I/F can lead sys-admins to misconfigure their systems.
This might be problematic, IMHO.
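
To show the kind of mismatch I mean, here is a small sketch comparing
the two models under simplifying assumptions: all weights are equal,
there is at least one task in root, and in the second model the root
tasks are collapsed into a single implicit group that competes with
the created groups, as the quoted text describes. The function names
and the weight value are hypothetical; this is not CFQ code.

```python
from fractions import Fraction

def group_share_flat(n_root_tasks, n_groups, weight=100):
    """Share of one group when root tasks and groups sit at the
    same level (the arrangement of proposal 4)."""
    total = (n_root_tasks + n_groups) * weight
    return Fraction(weight, total)

def group_share_implicit_root(n_root_tasks, n_groups, weight=100):
    """Share of one group when all root tasks are treated as a single
    implicit "root" group next to the created groups. The number of
    root tasks does not matter here, as long as there is at least one."""
    total = (1 + n_groups) * weight
    return Fraction(weight, total)

# Two created groups, with a growing number of tasks in root.
for n in (1, 2, 4):
    print(n, group_share_flat(n, 2), group_share_implicit_root(n, 2))
```

An admin who reads the cgroup tree as "tasks and groups side by side"
would expect the first column of shares, while a scheduler with an
implicit root group would deliver the second.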



Thanks,
Muuhh

-- 
IKEDA, Munehiro
   NEC Corporation of America
     m-ikeda@ds.jp.nec.com
