From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com, axboe@kernel.dk, vgoyal@redhat.com
Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, ctalbott@google.com,
rni@google.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 15/24] cfq-iosched: enable full blkcg hierarchy support
Date: Fri, 28 Dec 2012 12:35:37 -0800
Message-ID: <1356726946-26037-16-git-send-email-tj@kernel.org>
In-Reply-To: <1356726946-26037-1-git-send-email-tj@kernel.org>
With the previous two patches, all cfqg scheduling decisions are based
on vfraction and ready for hierarchy support. The only thing which
keeps the behavior flat is cfqg_flat_parent(), which makes the vfraction
calculation treat all non-root cfqgs as children of the root cfqg.

Replace it with cfqg_parent(), which returns the real parent. This
enables full blkcg hierarchy support for cfq-iosched. For example,
consider the following hierarchy.

          root
        /      \
     A:500      B:250
    /     \
 AA:500  AB:1000

For simplicity, let's say all the leaf nodes have active tasks and are
on the service tree. For each leaf node, vfraction would be

 AA: (500  / 1500) * (500 / 750) =~ 0.2222
 AB: (1000 / 1500) * (500 / 750) =~ 0.4444
 B:  (250  / 750)                =~ 0.3333

and vdisktime will be distributed accordingly. For more detail,
please refer to Documentation/block/cfq-iosched.txt.
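
The vfraction arithmetic above can be reproduced with a small
user-space sketch. This is only an illustration under stated
assumptions: struct node and vfraction() are hypothetical names that
model the parent walk, not the kernel's struct cfq_group or its
activation path, and children_weight is filled in by hand instead of
being maintained on activation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a cgroup node. It keeps only the
 * fields needed for the arithmetic and is not the kernel's cfq_group.
 */
struct node {
	const struct node *parent;	/* NULL for the root */
	unsigned int weight;		/* share claimed at the parent's level */
	unsigned int leaf_weight;	/* share of the node's own tasks */
	unsigned int children_weight;	/* sum of active weights inside the node */
};

/*
 * Fraction of disk time for the tasks hosted directly in @n: the leaf
 * share inside @n, multiplied by the share each node on the path owns
 * at its real parent's level, all the way up to the root.
 */
static double vfraction(const struct node *n)
{
	const struct node *pos = n;
	double frac;

	assert(n->children_weight > 0);
	frac = (double)n->leaf_weight / n->children_weight;

	while (pos->parent) {
		assert(pos->parent->children_weight > 0);
		frac *= (double)pos->weight / pos->parent->children_weight;
		pos = pos->parent;
	}
	return frac;
}
```

For the hierarchy above, root's children_weight would be 750 and A's
1500, while each active leaf group has children_weight equal to its own
leaf_weight. With those inputs the walk yields the three fractions
listed. Pointing every node's parent directly at root instead models
the old flat behavior.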
v2: cfq-iosched.txt updated to describe group scheduling as suggested
by Vivek.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
Documentation/block/cfq-iosched.txt | 58 +++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 21 ++++----------
2 files changed, 64 insertions(+), 15 deletions(-)
diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
index d89b4fe..a5eb7d1 100644
--- a/Documentation/block/cfq-iosched.txt
+++ b/Documentation/block/cfq-iosched.txt
@@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can imporve the
performace although this can cause the latency of some I/O to increase due
to more number of requests.
+CFQ Group scheduling
+====================
+
+CFQ supports blkio cgroup and has "blkio." prefixed files in each
+blkio cgroup directory. It is weight-based and there are four knobs
+for configuration - weight[_device] and leaf_weight[_device].
+Internal cgroup nodes (the ones with children) can also have tasks in
+them, so the former two configure how much proportion the cgroup as a
+whole is entitled to at its parent's level while the latter two
+configure how much proportion the tasks in the cgroup have compared to
+its direct children.
+
+Another way to think about it is to assume that each internal node has
+an implicit leaf child node which hosts all of its tasks and whose
+weight is configured by leaf_weight[_device]. Let's assume a blkio
+hierarchy composed of five cgroups - root, A, B, AA and AB - with the
+following weights, where the names represent the hierarchy.
+
+        weight leaf_weight
+ root :  125    125
+ A    :  500    750
+ B    :  250    500
+ AA   :  500    500
+ AB   : 1000    500
+
+root never has a parent, making its weight meaningless. For backward
+compatibility, weight is always kept in sync with leaf_weight. B, AA
+and AB have no children, so their tasks have no child cgroups to
+compete with. They always get 100% of what the cgroup won at the
+parent level. Considering only the weights which matter, the hierarchy
+looks like the following.
+
+          root
+       /    |   \
+      A     B    leaf
+     500   250   125
+   /  |  \
+  AA   AB  leaf
+ 500  1000 750
+
+If all cgroups have active IOs and are competing with each other, disk
+time will be distributed like the following.
+
+Distribution below root. The total active weight at this level is
+A:500 + B:250 + root-leaf:125 = 875.
+
+ root-leaf :   125 /  875      =~ 14%
+ A         :   500 /  875      =~ 57%
+ B(-leaf)  :   250 /  875      =~ 29%
+
+A has children and further distributes its 57% among the children and
+the implicit leaf node. The total active weight at this level is
+AA:500 + AB:1000 + A-leaf:750 = 2250.
+
+ A-leaf    : ( 750 / 2250) * A =~ 19%
+ AA(-leaf) : ( 500 / 2250) * A =~ 13%
+ AB(-leaf) : (1000 / 2250) * A =~ 25%
+
CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ee34282..e8f3106 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -606,20 +606,11 @@ static inline struct cfq_group *blkg_to_cfqg(struct blkcg_gq *blkg)
 	return pd_to_cfqg(blkg_to_pd(blkg, &blkcg_policy_cfq));
 }

-/*
- * Determine the parent cfqg for weight calculation. Currently, cfqg
- * scheduling is flat and the root is the parent of everyone else.
- */
-static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg)
+static inline struct cfq_group *cfqg_parent(struct cfq_group *cfqg)
 {
-	struct blkcg_gq *blkg = cfqg_to_blkg(cfqg);
-	struct cfq_group *root;
-
-	while (blkg->parent)
-		blkg = blkg->parent;
-	root = blkg_to_cfqg(blkg);
+	struct blkcg_gq *pblkg = cfqg_to_blkg(cfqg)->parent;

-	return root != cfqg ? root : NULL;
+	return pblkg ? blkg_to_cfqg(pblkg) : NULL;
 }

 static inline void cfqg_get(struct cfq_group *cfqg)
@@ -722,7 +713,7 @@ static void cfq_pd_reset_stats(struct blkcg_gq *blkg)

 #else	/* CONFIG_CFQ_GROUP_IOSCHED */

-static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg) { return NULL; }
+static inline struct cfq_group *cfqg_parent(struct cfq_group *cfqg) { return NULL; }
 static inline void cfqg_get(struct cfq_group *cfqg) { }
 static inline void cfqg_put(struct cfq_group *cfqg) { }

@@ -1290,7 +1281,7 @@ cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 	 * stops once an already activated node is met. vfraction
 	 * calculation should always continue to the root.
 	 */
-	while ((parent = cfqg_flat_parent(pos))) {
+	while ((parent = cfqg_parent(pos))) {
 		if (propagate) {
 			propagate = !parent->nr_active++;
 			parent->children_weight += pos->weight;
@@ -1341,7 +1332,7 @@ cfq_group_service_tree_del(struct cfq_rb_root *st, struct cfq_group *cfqg)
 	pos->children_weight -= pos->leaf_weight;

 	while (propagate) {
-		struct cfq_group *parent = cfqg_flat_parent(pos);
+		struct cfq_group *parent = cfqg_parent(pos);

 		/* @pos has 0 nr_active at this point */
 		WARN_ON_ONCE(pos->children_weight);
--
1.8.0.2