From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com, axboe@kernel.dk, vgoyal@redhat.com
Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, ctalbott@google.com,
rni@google.com
Subject: [PATCHSET] block: implement blkcg hierarchy support in cfq
Date: Fri, 14 Dec 2012 14:41:13 -0800
Message-ID: <1355524885-22719-1-git-send-email-tj@kernel.org>
Hello,
cfq-iosched is currently utterly broken in how it handles cgroup
hierarchy. It ignores the hierarchy structure and just treats all
blkcgs equally. This makes blkcg behave very differently from other
properly-hierarchical controllers and makes it impossible to give any
uniform interpretation to the hierarchy, which in turn makes it
impossible to implement a unified hierarchy.
Given the relative simplicity of cfqg scheduling, implementing proper
hierarchy support isn't that difficult. All that's necessary is
determining what fraction each cfqg on the service tree has claim to
once the hierarchy is taken into account. The calculation can be done
by maintaining the sum of active weights at each level and compounding
the ratios from the cfqg in question up to the root. The overhead
isn't significant. Tree traversals happen only when cfqgs are added to
or removed from the service tree, and they only walk from the cfqg
being modified to the root.
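
As a rough illustration of the calculation described above, here is a
minimal user-space sketch in Python. It is not the kernel code: the
Group class and its fields (on_tree, nr_active, child_weight_sum) are
names made up for this example, and it assumes only leaf groups sit on
the service tree directly.

```python
from fractions import Fraction


class Group:
    """Toy stand-in for a cfqg; not the kernel's struct cfq_group."""

    def __init__(self, weight, parent=None):
        self.weight = weight
        self.parent = parent
        self.on_tree = False       # this leaf is on the service tree
        self.nr_active = 0         # number of active children
        self.child_weight_sum = 0  # sum of the active children's weights

    def activate(self):
        """Add this leaf to the tree, updating sums toward the root."""
        if self.on_tree:
            return
        self.on_tree = True
        node = self
        while node.parent is not None:
            parent = node.parent
            parent.child_weight_sum += node.weight
            parent.nr_active += 1
            if parent.nr_active > 1:
                break  # parent was already active; ancestors are current
            node = parent

    def deactivate(self):
        """Remove this leaf from the tree, updating sums toward the root."""
        if not self.on_tree:
            return
        self.on_tree = False
        node = self
        while node.parent is not None:
            parent = node.parent
            parent.child_weight_sum -= node.weight
            parent.nr_active -= 1
            if parent.nr_active > 0:
                break  # parent still has other active children
            node = parent

    def fraction(self):
        """Compound weight ratios from this active group up to the root."""
        frac = Fraction(1)
        node = self
        while node.parent is not None:
            frac *= Fraction(node.weight, node.parent.child_weight_sum)
            node = node.parent
        return frac


# Example tree: root -> {a (100) -> {a1 (100), a2 (100)}, b (300)}
root = Group(1000)
a, b = Group(100, root), Group(300, root)
a1, a2 = Group(100, a), Group(100, a)
for g in (a1, a2, b):
    g.activate()
print(a1.fraction(), a2.fraction(), b.fraction())
```

Both update paths stop as soon as they reach an ancestor whose active
state does not change, so the walk never goes further than from the
modified group to the root.
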
There are some design choices which are worth mentioning.
* Internal (non-leaf) cfqgs w/ tasks treat the tasks as a single unit
competing against the child cfqgs. New config knobs -
blkio.leaf_weight[_device] - are added to configure the weight of
these tasks. Another way to look at it is that each cfqg has a
hidden leaf child node attached to it which hosts all tasks and
leaf_weight controls the weight of that hidden node.
Treating cfqqs and cfqgs as equals doesn't make much sense to me and
is hairy - we need to establish ioprio to weight mapping and the
weights fluctuate as processes fork and exit. This becomes hairier
when considering multiple controllers. Such mappings can't be
established consistently across different controllers and the
weights are given out differently - i.e. blkcg gives weights out to
io_contexts while cpu gives them to tasks, which may share
io_contexts. It's difficult to make sense of what's going on.
The goal is to bring cpu, currently the only other controller which
implements weight based resource allocation, to similar behavior.
* The existing stats aren't converted to hierarchical but new
hierarchical ones are added. There isn't a way to do that w/o
introducing nasty silent surprises to the existing flat hierarchy
users, so while being a bit clumsy, I can't see a better way.
* I based it on top of Vivek's cleanup patchset[1] but not the cfqq,
cfqg scheduling unification patchset. I don't think it's necessary
or beneficial to mix the two and would really like to avoid messing
with !blkcg scheduling logic.
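
To make the first point above concrete, the hidden-leaf view of
leaf_weight can be sketched as follows. This is an illustration only:
split_at_internal_node() is a hypothetical helper, and it assumes the
group's own tasks and all of its children are active at the same time.

```python
from fractions import Fraction


def split_at_internal_node(leaf_weight, child_weights):
    """Split an internal group's share between its own tasks and children.

    The group's tasks are modeled as one hidden leaf child whose weight
    is leaf_weight, competing with the real children by weight.
    """
    total = leaf_weight + sum(child_weights.values())
    own_tasks = Fraction(leaf_weight, total)
    children = {name: Fraction(w, total) for name, w in child_weights.items()}
    return own_tasks, children


# A group with leaf_weight 500 and two children of weight 250 each.
own, kids = split_at_internal_node(500, {"c1": 250, "c2": 250})
print(own, kids)
```

Here the tasks in the internal group get half of whatever the group is
entitled to no matter how many of them there are, and each child gets
a quarter.
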
The hierarchical scheduling itself is fairly simple. The cfq part is
only ~260 lines with ~60 lines being comments, and the hierarchical
weight scaling is really straightforward.
This patchset contains the following 12 patches.
0001-blkcg-fix-minor-bug-in-blkg_alloc.patch
0002-blkcg-reorganize-blkg_lookup_create-and-friends.patch
0003-blkcg-cosmetic-updates-to-blkg_create.patch
0004-blkcg-make-blkcg_gq-s-hierarchical.patch
0005-cfq-iosched-add-leaf_weight.patch
0006-cfq-iosched-implement-cfq_group-nr_active-and-level_.patch
0007-cfq-iosched-implement-hierarchy-ready-cfq_group-char.patch
0008-cfq-iosched-convert-cfq_group_slice-to-use-cfqg-vfra.patch
0009-cfq-iosched-enable-full-blkcg-hierarchy-support.patch
0010-blkcg-add-blkg_policy_data-plid.patch
0011-blkcg-implement-blkg_prfill_-rw-stat_recursive.patch
0012-cfq-iosched-add-hierarchical-cfq_group-statistics.patch
0001-0003 are prep patches.
0004 makes blkcg core always allocate non-leaf blkgs so that any given
blkg is guaranteed to have all its ancestor blkgs to the root.
0005-0006 prepare for hierarchical scheduling.
0007-0008 implement hierarchy-ready cfqg scheduling.
0009 enables hierarchical scheduling.
0010-0012 implement hierarchical stats.
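
For readers unfamiliar with the flat vs. hierarchical distinction in
the stats, here is a small sketch. It assumes a hierarchical stat is
the group's own counter plus the counters of all its descendants; the
Blkg class and its field names are invented for the example and do not
mirror the helpers added in 0011.

```python
class Blkg:
    """Toy group holding one flat counter; not the kernel's blkcg_gq."""

    def __init__(self, name, own_bytes, children=()):
        self.name = name
        self.own_bytes = own_bytes      # flat stat: this group only
        self.children = list(children)


def recursive_bytes(blkg):
    """Hierarchical stat: own counter plus every descendant's counter."""
    return blkg.own_bytes + sum(recursive_bytes(c) for c in blkg.children)


leaf1 = Blkg("leaf1", 10)
leaf2 = Blkg("leaf2", 20)
mid = Blkg("mid", 5, [leaf1, leaf2])
top = Blkg("top", 1, [mid])
print(mid.own_bytes, recursive_bytes(mid))
```

The flat value for a group stays what existing users already see,
while the hierarchical value also accounts for its descendants.
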
This patchset is on top of
linus#master (d42b3a2906a10b732ea7d7f849d49be79d242ef0)
+ [1] "cfq-iosched: Some minor cleanups" patchset
and available in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git blkcg-cfq-hierarchy
Thanks.
 block/blk-cgroup.c  |  263 ++++++++++++++++++++++++++++++++++++-----
 block/blk-cgroup.h  |   26 +++-
 block/cfq-iosched.c |  329 ++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 560 insertions(+), 58 deletions(-)
--
tejun
[1] https://lkml.org/lkml/2012/10/3/502