From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com, axboe@kernel.dk, vgoyal@redhat.com
Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, ctalbott@google.com,
rni@google.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 15/24] cfq-iosched: enable full blkcg hierarchy support
Date: Fri, 28 Dec 2012 12:35:37 -0800
Message-ID: <1356726946-26037-16-git-send-email-tj@kernel.org>
In-Reply-To: <1356726946-26037-1-git-send-email-tj@kernel.org>

With the previous two patches, all cfqg scheduling decisions are based
on vfraction and ready for hierarchy support. The only thing which
keeps the behavior flat is cfqg_flat_parent(), which makes the vfraction
calculation treat all non-root cfqgs as children of the root cfqg.

Replace it with cfqg_parent(), which returns the real parent. This
enables full blkcg hierarchy support for cfq-iosched. For example,
consider the following hierarchy.

        root
      /      \
   A:500    B:250
  /     \
 AA:500  AB:1000

For simplicity, let's say all the leaf nodes have active tasks and are
on the service tree. For each leaf node, vfraction would be

 AA: ( 500 / 1500) * (500 / 750) =~ 0.2222
 AB: (1000 / 1500) * (500 / 750) =~ 0.4444
  B:                 (250 / 750) =~ 0.3333

and vdisktime will be distributed accordingly. For more detail,
please refer to Documentation/block/cfq-iosched.txt.

v2: cfq-iosched.txt updated to describe group scheduling as suggested
    by Vivek.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
Documentation/block/cfq-iosched.txt | 58 +++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 21 ++++----------
2 files changed, 64 insertions(+), 15 deletions(-)
diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
index d89b4fe..a5eb7d1 100644
--- a/Documentation/block/cfq-iosched.txt
+++ b/Documentation/block/cfq-iosched.txt
@@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can imporve the
performace although this can cause the latency of some I/O to increase due
to more number of requests.
+CFQ Group scheduling
+====================
+
+CFQ supports blkio cgroup and has "blkio." prefixed files in each
+blkio cgroup directory. It is weight-based and there are four knobs
+for configuration - weight[_device] and leaf_weight[_device].
+Internal cgroup nodes (the ones with children) can also have tasks in
+them, so the former two configure how much proportion the cgroup as a
+whole is entitled to at its parent's level while the latter two
+configure how much proportion the tasks in the cgroup have compared to
+its direct children.
+
+Another way to think about it is assuming that each internal node has
+an implicit leaf child node which hosts all the tasks whose weight is
+configured by leaf_weight[_device]. Let's assume a blkio hierarchy
+composed of five cgroups - root, A, B, AA and AB - with the following
+weights where the names represent the hierarchy.
+
+		weight leaf_weight
+ root :  125    125
+ A    :  500    750
+ B    :  250    500
+ AA   :  500    500
+ AB   : 1000    500
+
+root never has a parent, which makes its weight meaningless. For
+backward compatibility, weight is always kept in sync with leaf_weight.
+B, AA and AB have no children, so their tasks have no child cgroups to
+compete with. They always get 100% of what the cgroup won at the
+parent level. Considering only the weights which matter, the hierarchy
+looks like the following.
+
+          root
+       /    |   \
+      A     B    leaf
+     500   250   125
+   /  |  \
+   AA  AB  leaf
+  500 1000 750
+
+If all cgroups have active IOs and are competing with each other, disk
+time will be distributed like the following.
+
+Distribution below root. The total active weight at this level is
+A:500 + B:250 + root-leaf:125 = 875.
+
+ root-leaf : 125 / 875 =~ 14%
+ A : 500 / 875 =~ 57%
+ B(-leaf) : 250 / 875 =~ 28%
+
+A has children and further distributes its 57% among the children and
+the implicit leaf node. The total active weight at this level is
+AA:500 + AB:1000 + A-leaf:750 = 2250.
+
+ A-leaf : ( 750 / 2250) * A =~ 19%
+ AA(-leaf) : ( 500 / 2250) * A =~ 12%
+ AB(-leaf) : (1000 / 2250) * A =~ 25%
+
CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ee34282..e8f3106 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -606,20 +606,11 @@ static inline struct cfq_group *blkg_to_cfqg(struct blkcg_gq *blkg)
 	return pd_to_cfqg(blkg_to_pd(blkg, &blkcg_policy_cfq));
 }

-/*
- * Determine the parent cfqg for weight calculation. Currently, cfqg
- * scheduling is flat and the root is the parent of everyone else.
- */
-static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg)
+static inline struct cfq_group *cfqg_parent(struct cfq_group *cfqg)
 {
-	struct blkcg_gq *blkg = cfqg_to_blkg(cfqg);
-	struct cfq_group *root;
-
-	while (blkg->parent)
-		blkg = blkg->parent;
-	root = blkg_to_cfqg(blkg);
+	struct blkcg_gq *pblkg = cfqg_to_blkg(cfqg)->parent;

-	return root != cfqg ? root : NULL;
+	return pblkg ? blkg_to_cfqg(pblkg) : NULL;
 }

 static inline void cfqg_get(struct cfq_group *cfqg)
@@ -722,7 +713,7 @@ static void cfq_pd_reset_stats(struct blkcg_gq *blkg)

 #else /* CONFIG_CFQ_GROUP_IOSCHED */

-static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg) { return NULL; }
+static inline struct cfq_group *cfqg_parent(struct cfq_group *cfqg) { return NULL; }
 static inline void cfqg_get(struct cfq_group *cfqg) { }
 static inline void cfqg_put(struct cfq_group *cfqg) { }

@@ -1290,7 +1281,7 @@ cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 	 * stops once an already activated node is met. vfraction
 	 * calculation should always continue to the root.
 	 */
-	while ((parent = cfqg_flat_parent(pos))) {
+	while ((parent = cfqg_parent(pos))) {
 		if (propagate) {
 			propagate = !parent->nr_active++;
 			parent->children_weight += pos->weight;
@@ -1341,7 +1332,7 @@ cfq_group_service_tree_del(struct cfq_rb_root *st, struct cfq_group *cfqg)
 	pos->children_weight -= pos->leaf_weight;

 	while (propagate) {
-		struct cfq_group *parent = cfqg_flat_parent(pos);
+		struct cfq_group *parent = cfqg_parent(pos);

 		/* @pos has 0 nr_active at this point */
 		WARN_ON_ONCE(pos->children_weight);
--
1.8.0.2