From: Tejun Heo <tj@kernel.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: lizefan@huawei.com, axboe@kernel.dk,
containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, ctalbott@google.com,
rni@google.com, Peter Zijlstra <pzijlstr@redhat.com>,
peterz@infradead.org
Subject: Re: [PATCHSET] block: implement blkcg hierarchy support in cfq
Date: Mon, 17 Dec 2012 09:38:00 -0800 [thread overview]
Message-ID: <20121217173800.GB2592@htj.dyndns.org> (raw)
In-Reply-To: <20121217165228.GB7235@redhat.com>
Hello, Vivek.
On Mon, Dec 17, 2012 at 11:52:28AM -0500, Vivek Goyal wrote:
> I am wondering if blkio.task_group_weight[_device] will more sense. It
> is easier to think in terms of hidden task group of a cfqg instead of
> whether it is a leaf node or not.
As long as we stick to something consistent and short (one word
please), I don't think it'll matter all that much and the
"leaf_weight" stands for weight for the hidden leaf node.
> > these tasks. Another way to look at it is that each cfqg has a
> > hidden leaf child node attached to it which hosts all tasks and
> > leaf_weight controls the weight of that hidden node.
> >
> > Treating cfqqs and cfqgs as equals doesn't make much sense to me and
> > is hairy - we need to establish ioprio to weight mapping and the
> > weights fluctuate as processes fork and exit.
>
> So weights of task (io_context) or blkcg weights don't fluctuate with
> task fork/exit. It is just the weight on service tree, which fluctuates.
Why would the weight on service tree fluctuate?
> > This becomes hairier
> > when considering multiple controllers, Such mappings can't be
> > established consistently across different controllers and the
> > weights are given out differently - ie. blkcg give weights out to
> > io_contexts while cpu to tasks, which may share io_contexts. It's
> > difficult to make sense of what's going on.
>
> We already have that issue, isn't it. Cpu does task scheduling and
> CFQ does io_context scheduling. (Nobody seems to be complaining though).
We don't have that issue yet because cfq doesn't do hierarchical
weighting yet and people are not co-mounting cpu and blkio.
> I think we first need to have some kind of buy-in from cpu controller
> guys that yes in long term they will change it. Otherwise we don't want
> to be stuck in a situation where cpu and blkio behave entirely
> differently.
Sure, I was planning to work on that once blkio is in place but it's
not like we can be consistent in any other way and I don't think
making cpu support this behavior would be too difficult. It's just
dealing with an extra leaf node after all. Peter?
> In fact we need to revisit this idea that what makes more sense. To
> me treating task and group at same level is not necessarily bad as
> it gives more flexibility. And leave it to user to create another
> subgroup and launch all the tasks there if they want to emulate the
> behavior of a hidden sub-group.
We'll have to disagree here.
> If you look at it systemd already puts services in separate group. They
> always wanted to put user sessions also in a separate group. So
> effectively hierarhcy looks as follows (for cpu controller).
>
> root
> / | \ \
> T1 T2 usr system
>
> So T1 and T2 here basically a kernel threds (All users sessions and
> services have been moved out to respective cgroups).
>
> I think I am fine with not limiting kernel threads into a subgroup of
> their own. In fact I think there was a patch where we could not move
> kernel threads out of root cgroup. If that makes sense, then it does
> not make sense to limit kernel threads to a subgroup of their own
> by default (it is equivalent to moving these threads to a cgroup of
> their own).
How is that relevant to the discussion at hand? That's about having
no limit for root cgroup. For anything weight-based, you can't have
"no" weight.
> So though I don't mind the notion of this hidden cgroups but given
> the fact that we have implemented things other way and left it to
> user space to manage it based on their needs, I am not sure what's
> that fundamental reason that we should change that assumption now.
Hmmm? blkio doesn't work like that *at all*. Currently, it basically
treats the root cgroup as a leaf group, so I'm kinda lost why you're
talking about "changing the assumption" because the proposed patchset
maintains the existing behavior (at least for 1-level hierarchy) while
what you're suggesting would change the behavior fundamentally.
So, in terms of compatibility, I don't think there's a clear better
way here. cpu and blkio are already doing things differently and we
need to pick one to unify the behavior and I think having separate
weight for tasks in internal node is a better one because
* Configuration lives in cgroup proper. No need to somehow map
per-schedule-entity attribute to cgroup weight, which is hairy and
non-obvious.
* Different controllers deal with different scheduling-entities and it
becomes very difficult to tell how the weight is actually being
distributed. It's just nasty.
> And even if we decide to do it, we need to have other controllers
> on board (especially cpu).
Only cpu actually. That's the only other controller which implements
weight based control and I'm fairly sure we can implement such
behavior in cpu in backward compatible way.
> I think we will have similar issues with others components too. In blkio
> throttling support, we will have to put some kind of throttling limits
> on internal group too. I guess one can raise similar concerns for memory
> controller too where there are no internal limits on child task of a
> cgroup but there are limits on child group.
I don't think so. We need some way of assigning weights between tasks
of an internal cgroup and children. No such issue exists for
non-weight based controllers. I don't see any reason to change that.
Thanks.
--
tejun
next prev parent reply other threads:[~2012-12-17 17:38 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-14 22:41 [PATCHSET] block: implement blkcg hierarchy support in cfq Tejun Heo
2012-12-14 22:41 ` [PATCH 01/12] blkcg: fix minor bug in blkg_alloc() Tejun Heo
2012-12-17 19:10 ` Vivek Goyal
2012-12-14 22:41 ` [PATCH 02/12] blkcg: reorganize blkg_lookup_create() and friends Tejun Heo
2012-12-17 19:28 ` Vivek Goyal
2012-12-14 22:41 ` [PATCH 03/12] blkcg: cosmetic updates to blkg_create() Tejun Heo
2012-12-17 19:37 ` Vivek Goyal
2012-12-14 22:41 ` [PATCH 04/12] blkcg: make blkcg_gq's hierarchical Tejun Heo
2012-12-17 20:04 ` Vivek Goyal
2012-12-14 22:41 ` [PATCH 05/12] cfq-iosched: add leaf_weight Tejun Heo
2012-12-14 22:41 ` [PATCH 06/12] cfq-iosched: implement cfq_group->nr_active and ->level_weight Tejun Heo
2012-12-17 20:46 ` Vivek Goyal
2012-12-17 21:15 ` Tejun Heo
2012-12-17 21:18 ` Vivek Goyal
2012-12-17 21:20 ` Tejun Heo
2012-12-14 22:41 ` [PATCH 07/12] cfq-iosched: implement hierarchy-ready cfq_group charge scaling Tejun Heo
2012-12-17 20:53 ` Vivek Goyal
2012-12-17 21:17 ` Tejun Heo
2012-12-17 21:27 ` Vivek Goyal
2012-12-17 21:33 ` Tejun Heo
2012-12-17 21:49 ` Vivek Goyal
2012-12-17 22:12 ` Tejun Heo
2012-12-14 22:41 ` [PATCH 08/12] cfq-iosched: convert cfq_group_slice() to use cfqg->vfraction Tejun Heo
2012-12-14 22:41 ` [PATCH 09/12] cfq-iosched: enable full blkcg hierarchy support Tejun Heo
2012-12-18 18:40 ` Vivek Goyal
2012-12-18 19:10 ` Tejun Heo
2012-12-18 19:16 ` Vivek Goyal
2012-12-18 19:17 ` Tejun Heo
2012-12-14 22:41 ` [PATCH 10/12] blkcg: add blkg_policy_data->plid Tejun Heo
2012-12-14 22:41 ` [PATCH 11/12] blkcg: implement blkg_prfill_[rw]stat_recursive() Tejun Heo
2012-12-14 22:41 ` [PATCH 12/12] cfq-iosched: add hierarchical cfq_group statistics Tejun Heo
2012-12-18 19:11 ` Vivek Goyal
2012-12-18 19:14 ` Tejun Heo
2012-12-18 19:18 ` Vivek Goyal
2012-12-18 19:21 ` Tejun Heo
2012-12-18 19:26 ` Vivek Goyal
2012-12-17 16:52 ` [PATCHSET] block: implement blkcg hierarchy support in cfq Vivek Goyal
2012-12-17 17:38 ` Tejun Heo [this message]
2012-12-17 18:50 ` Vivek Goyal
2012-12-17 18:59 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121217173800.GB2592@htj.dyndns.org \
--to=tj@kernel.org \
--cc=axboe@kernel.dk \
--cc=cgroups@vger.kernel.org \
--cc=containers@lists.linux-foundation.org \
--cc=ctalbott@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=peterz@infradead.org \
--cc=pzijlstr@redhat.com \
--cc=rni@google.com \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).