* [PATCH 01/33] blkcg: fix error return path in blkg_create()
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 02/33] blkcg: move blkg_for_each_descendant_pre() to block/blk-cgroup.h Tejun Heo
` (20 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
In blkg_create(), after lookup of parent fails, the control jumps to
error path with the error code encoded into @blkg. The error path
doesn't use @blkg for the return value. It returns ERR_PTR(ret).
Make lookup fail path set @ret instead of @blkg.
Note that the parent lookup is guaranteed to succeed at that point and
the condition check is purely for sanity and triggers WARN when fails.
As such, I don't think it's necessary to mark it for -stable.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-cgroup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b2b9837..0ab211a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -238,7 +238,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
if (blkcg_parent(blkcg)) {
blkg->parent = __blkg_lookup(blkcg_parent(blkcg), q, false);
if (WARN_ON_ONCE(!blkg->parent)) {
- blkg = ERR_PTR(-EINVAL);
+ ret = -EINVAL;
goto err_put_css;
}
blkg_get(blkg->parent);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 02/33] blkcg: move blkg_for_each_descendant_pre() to block/blk-cgroup.h
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-05-06 22:45 ` [PATCH 01/33] blkcg: fix error return path in blkg_create() Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 03/33] blkcg: implement blkg_for_each_descendant_post() Tejun Heo
` (19 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
blk-throttle hierarchy support will make use of it. Move
blkg_for_each_descendant_pre() from block/blk-cgroup.c to
block/blk-cgroup.h.
signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-cgroup.c | 24 ++----------------------
block/blk-cgroup.h | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0ab211a..6b10d5c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -32,26 +32,6 @@ EXPORT_SYMBOL_GPL(blkcg_root);
static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
-static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
- struct request_queue *q, bool update_hint);
-
-/**
- * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants
- * @d_blkg: loop cursor pointing to the current descendant
- * @pos_cgrp: used for iteration
- * @p_blkg: target blkg to walk descendants of
- *
- * Walk @c_blkg through the descendants of @p_blkg. Must be used with RCU
- * read locked. If called under either blkcg or queue lock, the iteration
- * is guaranteed to include all and only online blkgs. The caller may
- * update @pos_cgrp by calling cgroup_rightmost_descendant() to skip
- * subtree.
- */
-#define blkg_for_each_descendant_pre(d_blkg, pos_cgrp, p_blkg) \
- cgroup_for_each_descendant_pre((pos_cgrp), (p_blkg)->blkcg->css.cgroup) \
- if (((d_blkg) = __blkg_lookup(cgroup_to_blkcg(pos_cgrp), \
- (p_blkg)->q, false)))
-
static bool blkcg_policy_enabled(struct request_queue *q,
const struct blkcg_policy *pol)
{
@@ -158,8 +138,8 @@ err_free:
* @q's bypass state. If @update_hint is %true, the caller should be
* holding @q->queue_lock and lookup hint is updated on success.
*/
-static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
- struct request_queue *q, bool update_hint)
+struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, struct request_queue *q,
+ bool update_hint)
{
struct blkcg_gq *blkg;
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 4e595ee..11f5b92 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -282,6 +282,26 @@ static inline void blkg_put(struct blkcg_gq *blkg)
__blkg_release(blkg);
}
+struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, struct request_queue *q,
+ bool update_hint);
+
+/**
+ * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants
+ * @d_blkg: loop cursor pointing to the current descendant
+ * @pos_cgrp: used for iteration
+ * @p_blkg: target blkg to walk descendants of
+ *
+ * Walk @c_blkg through the descendants of @p_blkg. Must be used with RCU
+ * read locked. If called under either blkcg or queue lock, the iteration
+ * is guaranteed to include all and only online blkgs. The caller may
+ * update @pos_cgrp by calling cgroup_rightmost_descendant() to skip
+ * subtree.
+ */
+#define blkg_for_each_descendant_pre(d_blkg, pos_cgrp, p_blkg) \
+ cgroup_for_each_descendant_pre((pos_cgrp), (p_blkg)->blkcg->css.cgroup) \
+ if (((d_blkg) = __blkg_lookup(cgroup_to_blkcg(pos_cgrp), \
+ (p_blkg)->q, false)))
+
/**
* blk_get_rl - get request_list to use
* @q: request_queue of interest
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 03/33] blkcg: implement blkg_for_each_descendant_post()
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-05-06 22:45 ` [PATCH 01/33] blkcg: fix error return path in blkg_create() Tejun Heo
2013-05-06 22:45 ` [PATCH 02/33] blkcg: move blkg_for_each_descendant_pre() to block/blk-cgroup.h Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 04/33] blkcg: invoke blkcg_policy->pd_init() after parent is linked Tejun Heo
` (18 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
This will be used by blk-throttle hierarchy support.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-cgroup.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 11f5b92..e15f731 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -303,6 +303,20 @@ struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, struct request_queue *q,
(p_blkg)->q, false)))
/**
+ * blkg_for_each_descendant_post - post-order walk of a blkg's descendants
+ * @d_blkg: loop cursor pointing to the current descendant
+ * @pos_cgrp: used for iteration
+ * @p_blkg: target blkg to walk descendants of
+ *
+ * Similar to blkg_for_each_descendant_pre() but performs post-order
+ * traversal instead. Synchronization rules are the same.
+ */
+#define blkg_for_each_descendant_post(d_blkg, pos_cgrp, p_blkg) \
+ cgroup_for_each_descendant_post((pos_cgrp), (p_blkg)->blkcg->css.cgroup) \
+ if (((d_blkg) = __blkg_lookup(cgroup_to_blkcg(pos_cgrp), \
+ (p_blkg)->q, false)))
+
+/**
* blk_get_rl - get request_list to use
* @q: request_queue of interest
* @bio: bio which will be attached to the allocated request (may be %NULL)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 04/33] blkcg: invoke blkcg_policy->pd_init() after parent is linked
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (2 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 03/33] blkcg: implement blkg_for_each_descendant_post() Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 05/33] blkcg: move bulk of blkcg_gq release operations to the RCU callback Tejun Heo
` (17 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
Currently, when creating a new blkcg_gq, each policy's pd_init_fn() is
invoked in blkg_alloc() before the parent is linked. This makes it
difficult for policies to perform initializations which are dependent
on the parent.
This patch moves pd_init_fn() invocations to blkg_create() after the
parent blkg is linked where the new blkg is fully initialized. As
this means that blkg_free() can't assume that pd's are initialized,
pd_exit_fn() invocations are moved to __blkg_release(). This
guarantees that pd_exit_fn() is also invoked with fully initialized
blkgs with valid parent pointers.
This will help implementing hierarchy support in blk-throttle.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-cgroup.c | 39 ++++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 17 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6b10d5c..f13cf95 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -51,18 +51,8 @@ static void blkg_free(struct blkcg_gq *blkg)
if (!blkg)
return;
- for (i = 0; i < BLKCG_MAX_POLS; i++) {
- struct blkcg_policy *pol = blkcg_policy[i];
- struct blkg_policy_data *pd = blkg->pd[i];
-
- if (!pd)
- continue;
-
- if (pol && pol->pd_exit_fn)
- pol->pd_exit_fn(blkg);
-
- kfree(pd);
- }
+ for (i = 0; i < BLKCG_MAX_POLS; i++)
+ kfree(blkg->pd[i]);
blk_exit_rl(&blkg->rl);
kfree(blkg);
@@ -114,10 +104,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
blkg->pd[i] = pd;
pd->blkg = blkg;
pd->plid = i;
-
- /* invoke per-policy init */
- if (pol->pd_init_fn)
- pol->pd_init_fn(blkg);
}
return blkg;
@@ -214,7 +200,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
}
blkg = new_blkg;
- /* link parent and insert */
+ /* link parent */
if (blkcg_parent(blkcg)) {
blkg->parent = __blkg_lookup(blkcg_parent(blkcg), q, false);
if (WARN_ON_ONCE(!blkg->parent)) {
@@ -224,6 +210,15 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_get(blkg->parent);
}
+ /* invoke per-policy init */
+ for (i = 0; i < BLKCG_MAX_POLS; i++) {
+ struct blkcg_policy *pol = blkcg_policy[i];
+
+ if (blkg->pd[i] && pol->pd_init_fn)
+ pol->pd_init_fn(blkg);
+ }
+
+ /* insert */
spin_lock(&blkcg->lock);
ret = radix_tree_insert(&blkcg->blkg_tree, q->id, blkg);
if (likely(!ret)) {
@@ -381,6 +376,16 @@ static void blkg_rcu_free(struct rcu_head *rcu_head)
void __blkg_release(struct blkcg_gq *blkg)
{
+ int i;
+
+ /* tell policies that this one is being freed */
+ for (i = 0; i < BLKCG_MAX_POLS; i++) {
+ struct blkcg_policy *pol = blkcg_policy[i];
+
+ if (blkg->pd[i] && pol->pd_exit_fn)
+ pol->pd_exit_fn(blkg);
+ }
+
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
if (blkg->parent)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 05/33] blkcg: move bulk of blkcg_gq release operations to the RCU callback
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (3 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 04/33] blkcg: invoke blkcg_policy->pd_init() after parent is linked Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 06/33] blk-throttle: remove spurious throtl_enqueue_tg() call from throtl_select_dispatch() Tejun Heo
` (16 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
Currently, when the last reference of a blkcg_gq is put, all then
release operations sans the actual freeing happen directly in
blkg_put(). As blkg_put() may be called under queue_lock, all
pd_exit_fn()s may be too. This makes it impossible for pd_exit_fn()s
to use del_timer_sync() on timers which grab the queue_lock which is
an irq-safe lock due to the deadlock possibility described in the
comment on top of del_timer_sync().
This can be easily avoided by perfoming the release operations in the
RCU callback instead of directly from blkg_put(). This patch moves
the blkcg_gq release operations to the RCU callback.
As this leaves __blkg_release() with only call_rcu() invocation,
blkg_rcu_free() is renamed to __blkg_release_rcu(), exported and
call_rcu() invocation is now done directly from blkg_put() instead of
going through __blkg_release() which is removed.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-cgroup.c | 34 ++++++++++++++++------------------
block/blk-cgroup.h | 4 ++--
2 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f13cf95..af2ca27 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -369,13 +369,17 @@ static void blkg_destroy_all(struct request_queue *q)
q->root_rl.blkg = NULL;
}
-static void blkg_rcu_free(struct rcu_head *rcu_head)
-{
- blkg_free(container_of(rcu_head, struct blkcg_gq, rcu_head));
-}
-
-void __blkg_release(struct blkcg_gq *blkg)
+/*
+ * A group is RCU protected, but having an rcu lock does not mean that one
+ * can access all the fields of blkg and assume these are valid. For
+ * example, don't try to follow throtl_data and request queue links.
+ *
+ * Having a reference to blkg under an rcu allows accesses to only values
+ * local to groups like group stats and group rate limits.
+ */
+void __blkg_release_rcu(struct rcu_head *rcu_head)
{
+ struct blkcg_gq *blkg = container_of(rcu_head, struct blkcg_gq, rcu_head);
int i;
/* tell policies that this one is being freed */
@@ -388,21 +392,15 @@ void __blkg_release(struct blkcg_gq *blkg)
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
- if (blkg->parent)
+ if (blkg->parent) {
+ spin_lock_irq(blkg->q->queue_lock);
blkg_put(blkg->parent);
+ spin_unlock_irq(blkg->q->queue_lock);
+ }
- /*
- * A group is freed in rcu manner. But having an rcu lock does not
- * mean that one can access all the fields of blkg and assume these
- * are valid. For example, don't try to follow throtl_data and
- * request queue links.
- *
- * Having a reference to blkg under an rcu allows acess to only
- * values local to groups like group stats and group rate limits
- */
- call_rcu(&blkg->rcu_head, blkg_rcu_free);
+ blkg_free(blkg);
}
-EXPORT_SYMBOL_GPL(__blkg_release);
+EXPORT_SYMBOL_GPL(__blkg_release_rcu);
/*
* The next function used by blk_queue_for_each_rl(). It's a bit tricky
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index e15f731..8056c03 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -266,7 +266,7 @@ static inline void blkg_get(struct blkcg_gq *blkg)
blkg->refcnt++;
}
-void __blkg_release(struct blkcg_gq *blkg);
+void __blkg_release_rcu(struct rcu_head *rcu);
/**
* blkg_put - put a blkg reference
@@ -279,7 +279,7 @@ static inline void blkg_put(struct blkcg_gq *blkg)
lockdep_assert_held(blkg->q->queue_lock);
WARN_ON_ONCE(blkg->refcnt <= 0);
if (!--blkg->refcnt)
- __blkg_release(blkg);
+ call_rcu(&blkg->rcu_head, __blkg_release_rcu);
}
struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, struct request_queue *q,
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 06/33] blk-throttle: remove spurious throtl_enqueue_tg() call from throtl_select_dispatch()
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (4 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 05/33] blkcg: move bulk of blkcg_gq release operations to the RCU callback Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 08/33] blk-throttle: collapse throtl_dispatch() into the work function Tejun Heo
` (15 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
throtl_select_dispatch() calls throtl_enqueue_tg() right after
tg_update_disptime(), which always calls the function anyway. The
call is, while harmless, unnecessary. Remove it.
This patch doesn't introduce any behavior difference.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 3114622..3960787 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -816,10 +816,8 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl)
nr_disp += throtl_dispatch_tg(td, tg, bl);
- if (tg->nr_queued[0] || tg->nr_queued[1]) {
+ if (tg->nr_queued[0] || tg->nr_queued[1])
tg_update_disptime(td, tg);
- throtl_enqueue_tg(td, tg);
- }
if (nr_disp >= throtl_quantum)
break;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 08/33] blk-throttle: collapse throtl_dispatch() into the work function
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (5 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 06/33] blk-throttle: remove spurious throtl_enqueue_tg() call from throtl_select_dispatch() Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 09/33] blk-throttle: relocate throtl_schedule_delayed_work() Tejun Heo
` (14 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
blk-throttle is about to go through major restructuring to support
hierarchy. Do cosmetic updates in preparation.
* s/throtl_data->throtl_work/throtl_data->dispatch_work/
* s/blk_throtl_work()/blk_throtl_dispatch_work_fn()/
* Collapse throtl_dispatch() into blk_throtl_dispatch_work_fn()
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 26 +++++++++-----------------
1 file changed, 9 insertions(+), 17 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7dbd0e69..0a0bc00 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -108,7 +108,7 @@ struct throtl_data
unsigned int nr_undestroyed_grps;
/* Work for dispatching throttled bios */
- struct delayed_work throtl_work;
+ struct delayed_work dispatch_work;
};
/* list and work item to allocate percpu group stats */
@@ -820,10 +820,12 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl)
return nr_disp;
}
-/* Dispatch throttled bios. Should be called without queue lock held. */
-static int throtl_dispatch(struct request_queue *q)
+/* work function to dispatch throttled bios */
+void blk_throtl_dispatch_work_fn(struct work_struct *work)
{
- struct throtl_data *td = q->td;
+ struct throtl_data *td = container_of(to_delayed_work(work),
+ struct throtl_data, dispatch_work);
+ struct request_queue *q = td->queue;
unsigned int nr_disp = 0;
struct bio_list bio_list_on_stack;
struct bio *bio;
@@ -859,16 +861,6 @@ out:
generic_make_request(bio);
blk_finish_plug(&plug);
}
- return nr_disp;
-}
-
-void blk_throtl_work(struct work_struct *work)
-{
- struct throtl_data *td = container_of(work, struct throtl_data,
- throtl_work.work);
- struct request_queue *q = td->queue;
-
- throtl_dispatch(q);
}
/* Call with queue lock held */
@@ -876,7 +868,7 @@ static void
throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay)
{
- struct delayed_work *dwork = &td->throtl_work;
+ struct delayed_work *dwork = &td->dispatch_work;
if (total_nr_queued(td)) {
mod_delayed_work(kthrotld_workqueue, dwork, delay);
@@ -1057,7 +1049,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
{
struct throtl_data *td = q->td;
- cancel_delayed_work_sync(&td->throtl_work);
+ cancel_delayed_work_sync(&td->dispatch_work);
}
static struct blkcg_policy blkcg_policy_throtl = {
@@ -1206,7 +1198,7 @@ int blk_throtl_init(struct request_queue *q)
return -ENOMEM;
td->tg_service_tree = THROTL_RB_ROOT;
- INIT_DELAYED_WORK(&td->throtl_work, blk_throtl_work);
+ INIT_DELAYED_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
q->td = td;
td->queue = q;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 09/33] blk-throttle: relocate throtl_schedule_delayed_work()
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (6 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 08/33] blk-throttle: collapse throtl_dispatch() into the work function Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 11/33] blk-throttle: rename throtl_rb_root to throtl_service_queue Tejun Heo
` (13 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
Move throtl_schedule_delayed_work() above its first user so that the
forward declaration can be removed.
This patch is pure relocaiton.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0a0bc00..507b1c6 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -25,8 +25,6 @@ static struct blkcg_policy blkcg_policy_throtl;
/* A workqueue to queue throttle related work */
static struct workqueue_struct *kthrotld_workqueue;
-static void throtl_schedule_delayed_work(struct throtl_data *td,
- unsigned long delay);
struct throtl_rb_root {
struct rb_root rb;
@@ -398,6 +396,19 @@ static void throtl_dequeue_tg(struct throtl_data *td, struct throtl_grp *tg)
__throtl_dequeue_tg(td, tg);
}
+/* Call with queue lock held */
+static void throtl_schedule_delayed_work(struct throtl_data *td,
+ unsigned long delay)
+{
+ struct delayed_work *dwork = &td->dispatch_work;
+
+ if (total_nr_queued(td)) {
+ mod_delayed_work(kthrotld_workqueue, dwork, delay);
+ throtl_log(td, "schedule work. delay=%lu jiffies=%lu",
+ delay, jiffies);
+ }
+}
+
static void throtl_schedule_next_dispatch(struct throtl_data *td)
{
struct throtl_rb_root *st = &td->tg_service_tree;
@@ -863,20 +874,6 @@ out:
}
}
-/* Call with queue lock held */
-static void
-throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay)
-{
-
- struct delayed_work *dwork = &td->dispatch_work;
-
- if (total_nr_queued(td)) {
- mod_delayed_work(kthrotld_workqueue, dwork, delay);
- throtl_log(td, "schedule work. delay=%lu jiffies=%lu",
- delay, jiffies);
- }
-}
-
static u64 tg_prfill_cpu_rwstat(struct seq_file *sf,
struct blkg_policy_data *pd, int off)
{
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 11/33] blk-throttle: rename throtl_rb_root to throtl_service_queue
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (7 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 09/33] blk-throttle: relocate throtl_schedule_delayed_work() Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 13/33] blk-throttle: add backlink pointer from throtl_grp to throtl_data Tejun Heo
` (12 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
throtl_rb_root will be expanded to cover more roles for hierarchy
support. Rename it to throtl_service_queue and make its fields more
descriptive.
* rb -> pending_tree
* left -> first_pending
* count -> nr_pending
* min_disptime -> first_pending_disptime
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
---
block/blk-throttle.c | 84 ++++++++++++++++++++++++++--------------------------
1 file changed, 42 insertions(+), 42 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index dbeef30..b279110 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -26,15 +26,15 @@ static struct blkcg_policy blkcg_policy_throtl;
/* A workqueue to queue throttle related work */
static struct workqueue_struct *kthrotld_workqueue;
-struct throtl_rb_root {
- struct rb_root rb;
- struct rb_node *left;
- unsigned int count;
- unsigned long min_disptime;
+struct throtl_service_queue {
+ struct rb_root pending_tree; /* RB tree of active tgs */
+ struct rb_node *first_pending; /* first node in the tree */
+ unsigned int nr_pending; /* # queued in the tree */
+ unsigned long first_pending_disptime; /* disptime of the first tg */
};
-#define THROTL_RB_ROOT (struct throtl_rb_root) { .rb = RB_ROOT, .left = NULL, \
- .count = 0, .min_disptime = 0}
+#define THROTL_SERVICE_QUEUE_INITIALIZER \
+ (struct throtl_service_queue){ .pending_tree = RB_ROOT }
#define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node)
@@ -50,7 +50,7 @@ struct throtl_grp {
/* must be the first member */
struct blkg_policy_data pd;
- /* active throtl group service_tree member */
+ /* active throtl group service_queue member */
struct rb_node rb_node;
/*
@@ -93,7 +93,7 @@ struct throtl_grp {
struct throtl_data
{
/* service tree for active throtl groups */
- struct throtl_rb_root tg_service_tree;
+ struct throtl_service_queue service_queue;
struct request_queue *queue;
@@ -296,17 +296,17 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
return tg;
}
-static struct throtl_grp *throtl_rb_first(struct throtl_rb_root *root)
+static struct throtl_grp *throtl_rb_first(struct throtl_service_queue *sq)
{
/* Service tree is empty */
- if (!root->count)
+ if (!sq->nr_pending)
return NULL;
- if (!root->left)
- root->left = rb_first(&root->rb);
+ if (!sq->first_pending)
+ sq->first_pending = rb_first(&sq->pending_tree);
- if (root->left)
- return rb_entry_tg(root->left);
+ if (sq->first_pending)
+ return rb_entry_tg(sq->first_pending);
return NULL;
}
@@ -317,29 +317,29 @@ static void rb_erase_init(struct rb_node *n, struct rb_root *root)
RB_CLEAR_NODE(n);
}
-static void throtl_rb_erase(struct rb_node *n, struct throtl_rb_root *root)
+static void throtl_rb_erase(struct rb_node *n, struct throtl_service_queue *sq)
{
- if (root->left == n)
- root->left = NULL;
- rb_erase_init(n, &root->rb);
- --root->count;
+ if (sq->first_pending == n)
+ sq->first_pending = NULL;
+ rb_erase_init(n, &sq->pending_tree);
+ --sq->nr_pending;
}
-static void update_min_dispatch_time(struct throtl_rb_root *st)
+static void update_min_dispatch_time(struct throtl_service_queue *sq)
{
struct throtl_grp *tg;
- tg = throtl_rb_first(st);
+ tg = throtl_rb_first(sq);
if (!tg)
return;
- st->min_disptime = tg->disptime;
+ sq->first_pending_disptime = tg->disptime;
}
-static void
-tg_service_tree_add(struct throtl_rb_root *st, struct throtl_grp *tg)
+static void tg_service_queue_add(struct throtl_service_queue *sq,
+ struct throtl_grp *tg)
{
- struct rb_node **node = &st->rb.rb_node;
+ struct rb_node **node = &sq->pending_tree.rb_node;
struct rb_node *parent = NULL;
struct throtl_grp *__tg;
unsigned long key = tg->disptime;
@@ -358,19 +358,19 @@ tg_service_tree_add(struct throtl_rb_root *st, struct throtl_grp *tg)
}
if (left)
- st->left = &tg->rb_node;
+ sq->first_pending = &tg->rb_node;
rb_link_node(&tg->rb_node, parent, node);
- rb_insert_color(&tg->rb_node, &st->rb);
+ rb_insert_color(&tg->rb_node, &sq->pending_tree);
}
static void __throtl_enqueue_tg(struct throtl_data *td, struct throtl_grp *tg)
{
- struct throtl_rb_root *st = &td->tg_service_tree;
+ struct throtl_service_queue *sq = &td->service_queue;
- tg_service_tree_add(st, tg);
+ tg_service_queue_add(sq, tg);
throtl_mark_tg_on_rr(tg);
- st->count++;
+ sq->nr_pending++;
}
static void throtl_enqueue_tg(struct throtl_data *td, struct throtl_grp *tg)
@@ -381,7 +381,7 @@ static void throtl_enqueue_tg(struct throtl_data *td, struct throtl_grp *tg)
static void __throtl_dequeue_tg(struct throtl_data *td, struct throtl_grp *tg)
{
- throtl_rb_erase(&tg->rb_node, &td->tg_service_tree);
+ throtl_rb_erase(&tg->rb_node, &td->service_queue);
throtl_clear_tg_on_rr(tg);
}
@@ -403,18 +403,18 @@ static void throtl_schedule_delayed_work(struct throtl_data *td,
static void throtl_schedule_next_dispatch(struct throtl_data *td)
{
- struct throtl_rb_root *st = &td->tg_service_tree;
+ struct throtl_service_queue *sq = &td->service_queue;
/* any pending children left? */
- if (!st->count)
+ if (!sq->nr_pending)
return;
- update_min_dispatch_time(st);
+ update_min_dispatch_time(sq);
- if (time_before_eq(st->min_disptime, jiffies))
+ if (time_before_eq(sq->first_pending_disptime, jiffies))
throtl_schedule_delayed_work(td, 0);
else
- throtl_schedule_delayed_work(td, (st->min_disptime - jiffies));
+ throtl_schedule_delayed_work(td, sq->first_pending_disptime - jiffies);
}
static inline void
@@ -794,10 +794,10 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl)
{
unsigned int nr_disp = 0;
struct throtl_grp *tg;
- struct throtl_rb_root *st = &td->tg_service_tree;
+ struct throtl_service_queue *sq = &td->service_queue;
while (1) {
- tg = throtl_rb_first(st);
+ tg = throtl_rb_first(sq);
if (!tg)
break;
@@ -1145,7 +1145,7 @@ void blk_throtl_drain(struct request_queue *q)
__releases(q->queue_lock) __acquires(q->queue_lock)
{
struct throtl_data *td = q->td;
- struct throtl_rb_root *st = &td->tg_service_tree;
+ struct throtl_service_queue *sq = &td->service_queue;
struct throtl_grp *tg;
struct bio_list bl;
struct bio *bio;
@@ -1154,7 +1154,7 @@ void blk_throtl_drain(struct request_queue *q)
bio_list_init(&bl);
- while ((tg = throtl_rb_first(st))) {
+ while ((tg = throtl_rb_first(sq))) {
throtl_dequeue_tg(td, tg);
while ((bio = bio_list_peek(&tg->bio_lists[READ])))
@@ -1179,7 +1179,7 @@ int blk_throtl_init(struct request_queue *q)
if (!td)
return -ENOMEM;
- td->tg_service_tree = THROTL_RB_ROOT;
+ td->service_queue = THROTL_SERVICE_QUEUE_INITIALIZER;
INIT_DELAYED_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
q->td = td;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 13/33] blk-throttle: add backlink pointer from throtl_grp to throtl_data
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (8 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 11/33] blk-throttle: rename throtl_rb_root to throtl_service_queue Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 14/33] blk-throttle: pass around throtl_service_queue instead of throtl_data Tejun Heo
` (11 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
Add throtl_grp->td so that the td (throtl_data) a given tg
(throtl_grp) belongs to can be determined, and remove @td argument
from functions which take both @td and @tg as the former now can be
determined from the latter.
This generally simplifies the code and removes a number of cases where
@td is passed as an argument without being actually used. This will
also help hierarchy support implementation.
While at it, in multi-line conditions, move the logical operators
leading broken lines to the end of the previous line.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 106 +++++++++++++++++++++++++--------------------------
1 file changed, 53 insertions(+), 53 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index e8ef43d..a489391 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -57,6 +57,9 @@ struct throtl_grp {
/* active throtl group service_queue member */
struct rb_node rb_node;
+ /* throtl_data this group belongs to */
+ struct throtl_data *td;
+
/*
* Dispatch time in jiffies. This is the estimated time when group
* will unthrottle and is ready to dispatch more bio. It is used as
@@ -140,11 +143,11 @@ static inline struct throtl_grp *td_root_tg(struct throtl_data *td)
return blkg_to_tg(td->queue->root_blkg);
}
-#define throtl_log_tg(td, tg, fmt, args...) do { \
+#define throtl_log_tg(tg, fmt, args...) do { \
char __pbuf[128]; \
\
blkg_path(tg_to_blkg(tg), __pbuf, sizeof(__pbuf)); \
- blk_add_trace_msg((td)->queue, "throtl %s " fmt, __pbuf, ##args); \
+ blk_add_trace_msg((tg)->td->queue, "throtl %s " fmt, __pbuf, ##args); \
} while (0)
#define throtl_log(td, fmt, args...) \
@@ -193,6 +196,7 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
unsigned long flags;
RB_CLEAR_NODE(&tg->rb_node);
+ tg->td = blkg->q->td;
bio_list_init(&tg->bio_lists[0]);
bio_list_init(&tg->bio_lists[1]);
@@ -401,36 +405,34 @@ static void throtl_schedule_next_dispatch(struct throtl_data *td)
throtl_schedule_delayed_work(td, sq->first_pending_disptime - jiffies);
}
-static inline void
-throtl_start_new_slice(struct throtl_data *td, struct throtl_grp *tg, bool rw)
+static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
{
tg->bytes_disp[rw] = 0;
tg->io_disp[rw] = 0;
tg->slice_start[rw] = jiffies;
tg->slice_end[rw] = jiffies + throtl_slice;
- throtl_log_tg(td, tg, "[%c] new slice start=%lu end=%lu jiffies=%lu",
+ throtl_log_tg(tg, "[%c] new slice start=%lu end=%lu jiffies=%lu",
rw == READ ? 'R' : 'W', tg->slice_start[rw],
tg->slice_end[rw], jiffies);
}
-static inline void throtl_set_slice_end(struct throtl_data *td,
- struct throtl_grp *tg, bool rw, unsigned long jiffy_end)
+static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
+ unsigned long jiffy_end)
{
tg->slice_end[rw] = roundup(jiffy_end, throtl_slice);
}
-static inline void throtl_extend_slice(struct throtl_data *td,
- struct throtl_grp *tg, bool rw, unsigned long jiffy_end)
+static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
+ unsigned long jiffy_end)
{
tg->slice_end[rw] = roundup(jiffy_end, throtl_slice);
- throtl_log_tg(td, tg, "[%c] extend slice start=%lu end=%lu jiffies=%lu",
+ throtl_log_tg(tg, "[%c] extend slice start=%lu end=%lu jiffies=%lu",
rw == READ ? 'R' : 'W', tg->slice_start[rw],
tg->slice_end[rw], jiffies);
}
/* Determine if previously allocated or extended slice is complete or not */
-static bool
-throtl_slice_used(struct throtl_data *td, struct throtl_grp *tg, bool rw)
+static bool throtl_slice_used(struct throtl_grp *tg, bool rw)
{
if (time_in_range(jiffies, tg->slice_start[rw], tg->slice_end[rw]))
return 0;
@@ -439,8 +441,7 @@ throtl_slice_used(struct throtl_data *td, struct throtl_grp *tg, bool rw)
}
/* Trim the used slices and adjust slice start accordingly */
-static inline void
-throtl_trim_slice(struct throtl_data *td, struct throtl_grp *tg, bool rw)
+static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
{
unsigned long nr_slices, time_elapsed, io_trim;
u64 bytes_trim, tmp;
@@ -452,7 +453,7 @@ throtl_trim_slice(struct throtl_data *td, struct throtl_grp *tg, bool rw)
* renewed. Don't try to trim the slice if slice is used. A new
* slice will start when appropriate.
*/
- if (throtl_slice_used(td, tg, rw))
+ if (throtl_slice_used(tg, rw))
return;
/*
@@ -463,7 +464,7 @@ throtl_trim_slice(struct throtl_data *td, struct throtl_grp *tg, bool rw)
* is bad because it does not allow new slice to start.
*/
- throtl_set_slice_end(td, tg, rw, jiffies + throtl_slice);
+ throtl_set_slice_end(tg, rw, jiffies + throtl_slice);
time_elapsed = jiffies - tg->slice_start[rw];
@@ -492,14 +493,14 @@ throtl_trim_slice(struct throtl_data *td, struct throtl_grp *tg, bool rw)
tg->slice_start[rw] += nr_slices * throtl_slice;
- throtl_log_tg(td, tg, "[%c] trim slice nr=%lu bytes=%llu io=%lu"
+ throtl_log_tg(tg, "[%c] trim slice nr=%lu bytes=%llu io=%lu"
" start=%lu end=%lu jiffies=%lu",
rw == READ ? 'R' : 'W', nr_slices, bytes_trim, io_trim,
tg->slice_start[rw], tg->slice_end[rw], jiffies);
}
-static bool tg_with_in_iops_limit(struct throtl_data *td, struct throtl_grp *tg,
- struct bio *bio, unsigned long *wait)
+static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
+ unsigned long *wait)
{
bool rw = bio_data_dir(bio);
unsigned int io_allowed;
@@ -548,8 +549,8 @@ static bool tg_with_in_iops_limit(struct throtl_data *td, struct throtl_grp *tg,
return 0;
}
-static bool tg_with_in_bps_limit(struct throtl_data *td, struct throtl_grp *tg,
- struct bio *bio, unsigned long *wait)
+static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
+ unsigned long *wait)
{
bool rw = bio_data_dir(bio);
u64 bytes_allowed, extra_bytes, tmp;
@@ -600,8 +601,8 @@ static bool tg_no_rule_group(struct throtl_grp *tg, bool rw) {
* Returns whether one can dispatch a bio or not. Also returns approx number
* of jiffies to wait before this bio is with-in IO rate and can be dispatched
*/
-static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
- struct bio *bio, unsigned long *wait)
+static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
+ unsigned long *wait)
{
bool rw = bio_data_dir(bio);
unsigned long bps_wait = 0, iops_wait = 0, max_wait = 0;
@@ -626,15 +627,15 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
* existing slice to make sure it is at least throtl_slice interval
* long since now.
*/
- if (throtl_slice_used(td, tg, rw))
- throtl_start_new_slice(td, tg, rw);
+ if (throtl_slice_used(tg, rw))
+ throtl_start_new_slice(tg, rw);
else {
if (time_before(tg->slice_end[rw], jiffies + throtl_slice))
- throtl_extend_slice(td, tg, rw, jiffies + throtl_slice);
+ throtl_extend_slice(tg, rw, jiffies + throtl_slice);
}
- if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
- && tg_with_in_iops_limit(td, tg, bio, &iops_wait)) {
+ if (tg_with_in_bps_limit(tg, bio, &bps_wait) &&
+ tg_with_in_iops_limit(tg, bio, &iops_wait)) {
if (wait)
*wait = 0;
return 1;
@@ -646,7 +647,7 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
*wait = max_wait;
if (time_before(tg->slice_end[rw], jiffies + max_wait))
- throtl_extend_slice(td, tg, rw, jiffies + max_wait);
+ throtl_extend_slice(tg, rw, jiffies + max_wait);
return 0;
}
@@ -707,10 +708,10 @@ static void tg_update_disptime(struct throtl_data *td, struct throtl_grp *tg)
struct bio *bio;
if ((bio = bio_list_peek(&tg->bio_lists[READ])))
- tg_may_dispatch(td, tg, bio, &read_wait);
+ tg_may_dispatch(tg, bio, &read_wait);
if ((bio = bio_list_peek(&tg->bio_lists[WRITE])))
- tg_may_dispatch(td, tg, bio, &write_wait);
+ tg_may_dispatch(tg, bio, &write_wait);
min_wait = min(read_wait, write_wait);
disptime = jiffies + min_wait;
@@ -721,8 +722,8 @@ static void tg_update_disptime(struct throtl_data *td, struct throtl_grp *tg)
throtl_enqueue_tg(td, tg);
}
-static void tg_dispatch_one_bio(struct throtl_data *td, struct throtl_grp *tg,
- bool rw, struct bio_list *bl)
+static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
+ struct bio_list *bl)
{
struct bio *bio;
@@ -731,18 +732,17 @@ static void tg_dispatch_one_bio(struct throtl_data *td, struct throtl_grp *tg,
/* Drop bio reference on blkg */
blkg_put(tg_to_blkg(tg));
- BUG_ON(td->nr_queued[rw] <= 0);
- td->nr_queued[rw]--;
+ BUG_ON(tg->td->nr_queued[rw] <= 0);
+ tg->td->nr_queued[rw]--;
throtl_charge_bio(tg, bio);
bio_list_add(bl, bio);
bio->bi_rw |= REQ_THROTTLED;
- throtl_trim_slice(td, tg, rw);
+ throtl_trim_slice(tg, rw);
}
-static int throtl_dispatch_tg(struct throtl_data *td, struct throtl_grp *tg,
- struct bio_list *bl)
+static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
{
unsigned int nr_reads = 0, nr_writes = 0;
unsigned int max_nr_reads = throtl_grp_quantum*3/4;
@@ -751,20 +751,20 @@ static int throtl_dispatch_tg(struct throtl_data *td, struct throtl_grp *tg,
/* Try to dispatch 75% READS and 25% WRITES */
- while ((bio = bio_list_peek(&tg->bio_lists[READ]))
- && tg_may_dispatch(td, tg, bio, NULL)) {
+ while ((bio = bio_list_peek(&tg->bio_lists[READ])) &&
+ tg_may_dispatch(tg, bio, NULL)) {
- tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), bl);
nr_reads++;
if (nr_reads >= max_nr_reads)
break;
}
- while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
- && tg_may_dispatch(td, tg, bio, NULL)) {
+ while ((bio = bio_list_peek(&tg->bio_lists[WRITE])) &&
+ tg_may_dispatch(tg, bio, NULL)) {
- tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), bl);
nr_writes++;
if (nr_writes >= max_nr_writes)
@@ -791,7 +791,7 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl)
throtl_dequeue_tg(td, tg);
- nr_disp += throtl_dispatch_tg(td, tg, bl);
+ nr_disp += throtl_dispatch_tg(tg, bl);
if (tg->nr_queued[0] || tg->nr_queued[1])
tg_update_disptime(td, tg);
@@ -933,7 +933,7 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
else
*(unsigned int *)((void *)tg + cft->private) = ctx.v;
- throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu riops=%u wiops=%u",
+ throtl_log_tg(tg, "limit change rbps=%llu wbps=%llu riops=%u wiops=%u",
tg->bps[READ], tg->bps[WRITE],
tg->iops[READ], tg->iops[WRITE]);
@@ -945,8 +945,8 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
* that a group's limit are dropped suddenly and we don't want to
* account recently dispatched IO with new low rate.
*/
- throtl_start_new_slice(td, tg, 0);
- throtl_start_new_slice(td, tg, 1);
+ throtl_start_new_slice(tg, 0);
+ throtl_start_new_slice(tg, 1);
if (tg->flags & THROTL_TG_PENDING) {
tg_update_disptime(td, tg);
@@ -1076,7 +1076,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
}
/* Bio is with-in rate limit of group */
- if (tg_may_dispatch(td, tg, bio, NULL)) {
+ if (tg_may_dispatch(tg, bio, NULL)) {
throtl_charge_bio(tg, bio);
/*
@@ -1090,12 +1090,12 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
*
* So keep on trimming slice even if bio is not queued.
*/
- throtl_trim_slice(td, tg, rw);
+ throtl_trim_slice(tg, rw);
goto out_unlock;
}
queue_bio:
- throtl_log_tg(td, tg, "[%c] bio. bdisp=%llu sz=%u bps=%llu"
+ throtl_log_tg(tg, "[%c] bio. bdisp=%llu sz=%u bps=%llu"
" iodisp=%u iops=%u queued=%d/%d",
rw == READ ? 'R' : 'W',
tg->bytes_disp[rw], bio->bi_size, tg->bps[rw],
@@ -1142,9 +1142,9 @@ void blk_throtl_drain(struct request_queue *q)
throtl_dequeue_tg(td, tg);
while ((bio = bio_list_peek(&tg->bio_lists[READ])))
- tg_dispatch_one_bio(td, tg, bio_data_dir(bio), &bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), &bl);
while ((bio = bio_list_peek(&tg->bio_lists[WRITE])))
- tg_dispatch_one_bio(td, tg, bio_data_dir(bio), &bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), &bl);
}
spin_unlock_irq(q->queue_lock);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 14/33] blk-throttle: pass around throtl_service_queue instead of throtl_data
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (9 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 13/33] blk-throttle: add backlink pointer from throtl_grp to throtl_data Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 15/33] blk-throttle: reorganize throtl_service_queue passed around as argument Tejun Heo
` (10 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
throtl_service_queue will be used as the basic block to implement
hierarchy support. Pass around throtl_service_queue *sq instead of
throtl_data *td in the following functions which will be used across
multiple levels of hierarchy.
* [__]throtl_enqueue/dequeue_tg()
* throtl_add_bio_tg()
* tg_update_disptime()
* throtl_select_dispatch()
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 53 +++++++++++++++++++++++++++-------------------------
1 file changed, 28 insertions(+), 25 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a489391..9660ec8 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -352,31 +352,33 @@ static void tg_service_queue_add(struct throtl_service_queue *sq,
rb_insert_color(&tg->rb_node, &sq->pending_tree);
}
-static void __throtl_enqueue_tg(struct throtl_data *td, struct throtl_grp *tg)
+static void __throtl_enqueue_tg(struct throtl_service_queue *sq,
+ struct throtl_grp *tg)
{
- struct throtl_service_queue *sq = &td->service_queue;
-
tg_service_queue_add(sq, tg);
tg->flags |= THROTL_TG_PENDING;
sq->nr_pending++;
}
-static void throtl_enqueue_tg(struct throtl_data *td, struct throtl_grp *tg)
+static void throtl_enqueue_tg(struct throtl_service_queue *sq,
+ struct throtl_grp *tg)
{
if (!(tg->flags & THROTL_TG_PENDING))
- __throtl_enqueue_tg(td, tg);
+ __throtl_enqueue_tg(sq, tg);
}
-static void __throtl_dequeue_tg(struct throtl_data *td, struct throtl_grp *tg)
+static void __throtl_dequeue_tg(struct throtl_service_queue *sq,
+ struct throtl_grp *tg)
{
- throtl_rb_erase(&tg->rb_node, &td->service_queue);
+ throtl_rb_erase(&tg->rb_node, sq);
tg->flags &= ~THROTL_TG_PENDING;
}
-static void throtl_dequeue_tg(struct throtl_data *td, struct throtl_grp *tg)
+static void throtl_dequeue_tg(struct throtl_service_queue *sq,
+ struct throtl_grp *tg)
{
if (tg->flags & THROTL_TG_PENDING)
- __throtl_dequeue_tg(td, tg);
+ __throtl_dequeue_tg(sq, tg);
}
/* Call with queue lock held */
@@ -689,8 +691,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw);
}
-static void throtl_add_bio_tg(struct throtl_data *td, struct throtl_grp *tg,
- struct bio *bio)
+static void throtl_add_bio_tg(struct throtl_service_queue *sq,
+ struct throtl_grp *tg, struct bio *bio)
{
bool rw = bio_data_dir(bio);
@@ -698,11 +700,12 @@ static void throtl_add_bio_tg(struct throtl_data *td, struct throtl_grp *tg,
/* Take a bio reference on tg */
blkg_get(tg_to_blkg(tg));
tg->nr_queued[rw]++;
- td->nr_queued[rw]++;
- throtl_enqueue_tg(td, tg);
+ tg->td->nr_queued[rw]++;
+ throtl_enqueue_tg(sq, tg);
}
-static void tg_update_disptime(struct throtl_data *td, struct throtl_grp *tg)
+static void tg_update_disptime(struct throtl_service_queue *sq,
+ struct throtl_grp *tg)
{
unsigned long read_wait = -1, write_wait = -1, min_wait = -1, disptime;
struct bio *bio;
@@ -717,9 +720,9 @@ static void tg_update_disptime(struct throtl_data *td, struct throtl_grp *tg)
disptime = jiffies + min_wait;
/* Update dispatch time */
- throtl_dequeue_tg(td, tg);
+ throtl_dequeue_tg(sq, tg);
tg->disptime = disptime;
- throtl_enqueue_tg(td, tg);
+ throtl_enqueue_tg(sq, tg);
}
static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
@@ -774,11 +777,11 @@ static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
return nr_reads + nr_writes;
}
-static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl)
+static int throtl_select_dispatch(struct throtl_service_queue *sq,
+ struct bio_list *bl)
{
unsigned int nr_disp = 0;
struct throtl_grp *tg;
- struct throtl_service_queue *sq = &td->service_queue;
while (1) {
tg = throtl_rb_first(sq);
@@ -789,12 +792,12 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl)
if (time_before(jiffies, tg->disptime))
break;
- throtl_dequeue_tg(td, tg);
+ throtl_dequeue_tg(sq, tg);
nr_disp += throtl_dispatch_tg(tg, bl);
if (tg->nr_queued[0] || tg->nr_queued[1])
- tg_update_disptime(td, tg);
+ tg_update_disptime(sq, tg);
if (nr_disp >= throtl_quantum)
break;
@@ -822,7 +825,7 @@ void blk_throtl_dispatch_work_fn(struct work_struct *work)
td->nr_queued[READ] + td->nr_queued[WRITE],
td->nr_queued[READ], td->nr_queued[WRITE]);
- nr_disp = throtl_select_dispatch(td, &bio_list_on_stack);
+ nr_disp = throtl_select_dispatch(&td->service_queue, &bio_list_on_stack);
if (nr_disp)
throtl_log(td, "bios disp=%u", nr_disp);
@@ -949,7 +952,7 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
throtl_start_new_slice(tg, 1);
if (tg->flags & THROTL_TG_PENDING) {
- tg_update_disptime(td, tg);
+ tg_update_disptime(&td->service_queue, tg);
throtl_schedule_next_dispatch(td);
}
@@ -1103,11 +1106,11 @@ queue_bio:
tg->nr_queued[READ], tg->nr_queued[WRITE]);
bio_associate_current(bio);
- throtl_add_bio_tg(q->td, tg, bio);
+ throtl_add_bio_tg(&q->td->service_queue, tg, bio);
throttled = true;
if (update_disptime) {
- tg_update_disptime(td, tg);
+ tg_update_disptime(&td->service_queue, tg);
throtl_schedule_next_dispatch(td);
}
@@ -1139,7 +1142,7 @@ void blk_throtl_drain(struct request_queue *q)
bio_list_init(&bl);
while ((tg = throtl_rb_first(sq))) {
- throtl_dequeue_tg(td, tg);
+ throtl_dequeue_tg(sq, tg);
while ((bio = bio_list_peek(&tg->bio_lists[READ])))
tg_dispatch_one_bio(tg, bio_data_dir(bio), &bl);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 15/33] blk-throttle: reorganize throtl_service_queue passed around as argument
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (10 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 14/33] blk-throttle: pass around throtl_service_queue instead of throtl_data Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 16/33] blk-throttle: add throtl_grp->service_queue Tejun Heo
` (9 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
throtl_service_queue will be the building block of hierarchy support
and will form a tree. This patch updates its usages as arguments to
reduce confusion.
* When a service queue is used as the parent role - the host of the
rbtree - use @parent_sq instead of @sq.
* For functions taking both @tg and @parent_sq, reorder them so that
the order is (@tg, @parent_sq) not the other way around. This makes
the code follow the usual convention of specifying the primary
target of the operation as the first argument.
This patch doesn't make any functional differences.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 100 ++++++++++++++++++++++++++-------------------------
1 file changed, 51 insertions(+), 49 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 9660ec8..ebaaaa9 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -284,17 +284,18 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
return tg;
}
-static struct throtl_grp *throtl_rb_first(struct throtl_service_queue *sq)
+static struct throtl_grp *
+throtl_rb_first(struct throtl_service_queue *parent_sq)
{
/* Service tree is empty */
- if (!sq->nr_pending)
+ if (!parent_sq->nr_pending)
return NULL;
- if (!sq->first_pending)
- sq->first_pending = rb_first(&sq->pending_tree);
+ if (!parent_sq->first_pending)
+ parent_sq->first_pending = rb_first(&parent_sq->pending_tree);
- if (sq->first_pending)
- return rb_entry_tg(sq->first_pending);
+ if (parent_sq->first_pending)
+ return rb_entry_tg(parent_sq->first_pending);
return NULL;
}
@@ -305,29 +306,30 @@ static void rb_erase_init(struct rb_node *n, struct rb_root *root)
RB_CLEAR_NODE(n);
}
-static void throtl_rb_erase(struct rb_node *n, struct throtl_service_queue *sq)
+static void throtl_rb_erase(struct rb_node *n,
+ struct throtl_service_queue *parent_sq)
{
- if (sq->first_pending == n)
- sq->first_pending = NULL;
- rb_erase_init(n, &sq->pending_tree);
- --sq->nr_pending;
+ if (parent_sq->first_pending == n)
+ parent_sq->first_pending = NULL;
+ rb_erase_init(n, &parent_sq->pending_tree);
+ --parent_sq->nr_pending;
}
-static void update_min_dispatch_time(struct throtl_service_queue *sq)
+static void update_min_dispatch_time(struct throtl_service_queue *parent_sq)
{
struct throtl_grp *tg;
- tg = throtl_rb_first(sq);
+ tg = throtl_rb_first(parent_sq);
if (!tg)
return;
- sq->first_pending_disptime = tg->disptime;
+ parent_sq->first_pending_disptime = tg->disptime;
}
-static void tg_service_queue_add(struct throtl_service_queue *sq,
- struct throtl_grp *tg)
+static void tg_service_queue_add(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
- struct rb_node **node = &sq->pending_tree.rb_node;
+ struct rb_node **node = &parent_sq->pending_tree.rb_node;
struct rb_node *parent = NULL;
struct throtl_grp *__tg;
unsigned long key = tg->disptime;
@@ -346,39 +348,39 @@ static void tg_service_queue_add(struct throtl_service_queue *sq,
}
if (left)
- sq->first_pending = &tg->rb_node;
+ parent_sq->first_pending = &tg->rb_node;
rb_link_node(&tg->rb_node, parent, node);
- rb_insert_color(&tg->rb_node, &sq->pending_tree);
+ rb_insert_color(&tg->rb_node, &parent_sq->pending_tree);
}
-static void __throtl_enqueue_tg(struct throtl_service_queue *sq,
- struct throtl_grp *tg)
+static void __throtl_enqueue_tg(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
- tg_service_queue_add(sq, tg);
+ tg_service_queue_add(tg, parent_sq);
tg->flags |= THROTL_TG_PENDING;
- sq->nr_pending++;
+ parent_sq->nr_pending++;
}
-static void throtl_enqueue_tg(struct throtl_service_queue *sq,
- struct throtl_grp *tg)
+static void throtl_enqueue_tg(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
if (!(tg->flags & THROTL_TG_PENDING))
- __throtl_enqueue_tg(sq, tg);
+ __throtl_enqueue_tg(tg, parent_sq);
}
-static void __throtl_dequeue_tg(struct throtl_service_queue *sq,
- struct throtl_grp *tg)
+static void __throtl_dequeue_tg(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
- throtl_rb_erase(&tg->rb_node, sq);
+ throtl_rb_erase(&tg->rb_node, parent_sq);
tg->flags &= ~THROTL_TG_PENDING;
}
-static void throtl_dequeue_tg(struct throtl_service_queue *sq,
- struct throtl_grp *tg)
+static void throtl_dequeue_tg(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
if (tg->flags & THROTL_TG_PENDING)
- __throtl_dequeue_tg(sq, tg);
+ __throtl_dequeue_tg(tg, parent_sq);
}
/* Call with queue lock held */
@@ -691,8 +693,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw);
}
-static void throtl_add_bio_tg(struct throtl_service_queue *sq,
- struct throtl_grp *tg, struct bio *bio)
+static void throtl_add_bio_tg(struct bio *bio, struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
bool rw = bio_data_dir(bio);
@@ -701,11 +703,11 @@ static void throtl_add_bio_tg(struct throtl_service_queue *sq,
blkg_get(tg_to_blkg(tg));
tg->nr_queued[rw]++;
tg->td->nr_queued[rw]++;
- throtl_enqueue_tg(sq, tg);
+ throtl_enqueue_tg(tg, parent_sq);
}
-static void tg_update_disptime(struct throtl_service_queue *sq,
- struct throtl_grp *tg)
+static void tg_update_disptime(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
unsigned long read_wait = -1, write_wait = -1, min_wait = -1, disptime;
struct bio *bio;
@@ -720,9 +722,9 @@ static void tg_update_disptime(struct throtl_service_queue *sq,
disptime = jiffies + min_wait;
/* Update dispatch time */
- throtl_dequeue_tg(sq, tg);
+ throtl_dequeue_tg(tg, parent_sq);
tg->disptime = disptime;
- throtl_enqueue_tg(sq, tg);
+ throtl_enqueue_tg(tg, parent_sq);
}
static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
@@ -777,14 +779,14 @@ static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
return nr_reads + nr_writes;
}
-static int throtl_select_dispatch(struct throtl_service_queue *sq,
+static int throtl_select_dispatch(struct throtl_service_queue *parent_sq,
struct bio_list *bl)
{
unsigned int nr_disp = 0;
struct throtl_grp *tg;
while (1) {
- tg = throtl_rb_first(sq);
+ tg = throtl_rb_first(parent_sq);
if (!tg)
break;
@@ -792,12 +794,12 @@ static int throtl_select_dispatch(struct throtl_service_queue *sq,
if (time_before(jiffies, tg->disptime))
break;
- throtl_dequeue_tg(sq, tg);
+ throtl_dequeue_tg(tg, parent_sq);
nr_disp += throtl_dispatch_tg(tg, bl);
if (tg->nr_queued[0] || tg->nr_queued[1])
- tg_update_disptime(sq, tg);
+ tg_update_disptime(tg, parent_sq);
if (nr_disp >= throtl_quantum)
break;
@@ -952,7 +954,7 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
throtl_start_new_slice(tg, 1);
if (tg->flags & THROTL_TG_PENDING) {
- tg_update_disptime(&td->service_queue, tg);
+ tg_update_disptime(tg, &td->service_queue);
throtl_schedule_next_dispatch(td);
}
@@ -1106,11 +1108,11 @@ queue_bio:
tg->nr_queued[READ], tg->nr_queued[WRITE]);
bio_associate_current(bio);
- throtl_add_bio_tg(&q->td->service_queue, tg, bio);
+ throtl_add_bio_tg(bio, tg, &q->td->service_queue);
throttled = true;
if (update_disptime) {
- tg_update_disptime(&td->service_queue, tg);
+ tg_update_disptime(tg, &td->service_queue);
throtl_schedule_next_dispatch(td);
}
@@ -1132,7 +1134,7 @@ void blk_throtl_drain(struct request_queue *q)
__releases(q->queue_lock) __acquires(q->queue_lock)
{
struct throtl_data *td = q->td;
- struct throtl_service_queue *sq = &td->service_queue;
+ struct throtl_service_queue *parent_sq = &td->service_queue;
struct throtl_grp *tg;
struct bio_list bl;
struct bio *bio;
@@ -1141,8 +1143,8 @@ void blk_throtl_drain(struct request_queue *q)
bio_list_init(&bl);
- while ((tg = throtl_rb_first(sq))) {
- throtl_dequeue_tg(sq, tg);
+ while ((tg = throtl_rb_first(parent_sq))) {
+ throtl_dequeue_tg(tg, parent_sq);
while ((bio = bio_list_peek(&tg->bio_lists[READ])))
tg_dispatch_one_bio(tg, bio_data_dir(bio), &bl);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 16/33] blk-throttle: add throtl_grp->service_queue
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (11 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 15/33] blk-throttle: reorganize throtl_service_queue passed around as argument Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 18/33] blk-throttle: dispatch to throtl_data->service_queue.bio_lists[] Tejun Heo
` (8 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lizefan-hv44wF8Li93QT0dZR+AlfA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
Tejun Heo
Currently, there's single service_queue per queue -
throtl_data->service_queue. All active throtl_grp's are queued on the
queue and dispatched according to their limits. To support hierarchy,
this will be expanded such that active throtl_grp's form a tree
anchored at throtl_data->service_queue and chained through each
intermediate throtl_grp's service_queue.
This patch adds throtl_grp->service_queue to prepare for hierarchy
support. The initialization function - throtl_service_queue_init() -
is added and replaces the macro initializer. The newly added
tg->service_queue isn't used yet. Following patches will do.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ebaaaa9..7340440 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -33,9 +33,6 @@ struct throtl_service_queue {
unsigned long first_pending_disptime; /* disptime of the first tg */
};
-#define THROTL_SERVICE_QUEUE_INITIALIZER \
- (struct throtl_service_queue){ .pending_tree = RB_ROOT }
-
enum tg_state_flags {
THROTL_TG_PENDING = 1 << 0, /* on parent's pending tree */
};
@@ -60,6 +57,9 @@ struct throtl_grp {
/* throtl_data this group belongs to */
struct throtl_data *td;
+ /* this group's service queue */
+ struct throtl_service_queue service_queue;
+
/*
* Dispatch time in jiffies. This is the estimated time when group
* will unthrottle and is ready to dispatch more bio. It is used as
@@ -190,11 +190,18 @@ alloc_stats:
goto alloc_stats;
}
+/* init a service_queue, assumes the caller zeroed it */
+static void throtl_service_queue_init(struct throtl_service_queue *sq)
+{
+ sq->pending_tree = RB_ROOT;
+}
+
static void throtl_pd_init(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
unsigned long flags;
+ throtl_service_queue_init(&tg->service_queue);
RB_CLEAR_NODE(&tg->rb_node);
tg->td = blkg->q->td;
bio_list_init(&tg->bio_lists[0]);
@@ -1168,8 +1175,8 @@ int blk_throtl_init(struct request_queue *q)
if (!td)
return -ENOMEM;
- td->service_queue = THROTL_SERVICE_QUEUE_INITIALIZER;
INIT_DELAYED_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
+ throtl_service_queue_init(&td->service_queue);
q->td = td;
td->queue = q;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 18/33] blk-throttle: dispatch to throtl_data->service_queue.bio_lists[]
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (12 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 16/33] blk-throttle: add throtl_grp->service_queue Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 19/33] blk-throttle: generalize update_disptime optimization in blk_throtl_bio() Tejun Heo
` (7 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lizefan-hv44wF8Li93QT0dZR+AlfA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
Tejun Heo
throtl_service_queues will eventually form a tree which is anchored at
throtl_data->service_queue and queue bios will climb the tree to the
top service_queue to be executed.
This patch makes the dispatch paths in blk_throtl_dispatch_work_fn()
and blk_throtl_drain() to dispatch bios to
throtl_data->service_queue.bio_lists[] instead of the on-stack
bio_lists. This will keep the final dispatch to the top level
service_queue share the same mechanism as dispatches through the rest
of the hierarchy.
As bio's should be issued in a sleepable context,
blk_throtl_dispatch_work_fn() transfers all dispatched bio's from the
service_queue bio_lists[] into an onstack one before dropping
queue_lock and issuing the bio's.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 40 +++++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 17 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6f57f94..154bd63 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -743,7 +743,7 @@ static void tg_update_disptime(struct throtl_grp *tg,
}
static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
- struct bio_list *bl)
+ struct throtl_service_queue *parent_sq)
{
struct throtl_service_queue *sq = &tg->service_queue;
struct bio *bio;
@@ -757,13 +757,14 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
tg->td->nr_queued[rw]--;
throtl_charge_bio(tg, bio);
- bio_list_add(bl, bio);
+ bio_list_add(&parent_sq->bio_lists[rw], bio);
bio->bi_rw |= REQ_THROTTLED;
throtl_trim_slice(tg, rw);
}
-static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
+static int throtl_dispatch_tg(struct throtl_grp *tg,
+ struct throtl_service_queue *parent_sq)
{
struct throtl_service_queue *sq = &tg->service_queue;
unsigned int nr_reads = 0, nr_writes = 0;
@@ -776,7 +777,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
while ((bio = bio_list_peek(&sq->bio_lists[READ])) &&
tg_may_dispatch(tg, bio, NULL)) {
- tg_dispatch_one_bio(tg, bio_data_dir(bio), bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
nr_reads++;
if (nr_reads >= max_nr_reads)
@@ -786,7 +787,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
while ((bio = bio_list_peek(&sq->bio_lists[WRITE])) &&
tg_may_dispatch(tg, bio, NULL)) {
- tg_dispatch_one_bio(tg, bio_data_dir(bio), bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
nr_writes++;
if (nr_writes >= max_nr_writes)
@@ -796,8 +797,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg, struct bio_list *bl)
return nr_reads + nr_writes;
}
-static int throtl_select_dispatch(struct throtl_service_queue *parent_sq,
- struct bio_list *bl)
+static int throtl_select_dispatch(struct throtl_service_queue *parent_sq)
{
unsigned int nr_disp = 0;
@@ -813,7 +813,7 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq,
throtl_dequeue_tg(tg, parent_sq);
- nr_disp += throtl_dispatch_tg(tg, bl);
+ nr_disp += throtl_dispatch_tg(tg, parent_sq);
if (sq->nr_queued[0] || sq->nr_queued[1])
tg_update_disptime(tg, parent_sq);
@@ -830,11 +830,13 @@ void blk_throtl_dispatch_work_fn(struct work_struct *work)
{
struct throtl_data *td = container_of(to_delayed_work(work),
struct throtl_data, dispatch_work);
+ struct throtl_service_queue *sq = &td->service_queue;
struct request_queue *q = td->queue;
unsigned int nr_disp = 0;
struct bio_list bio_list_on_stack;
struct bio *bio;
struct blk_plug plug;
+ int rw;
spin_lock_irq(q->queue_lock);
@@ -844,10 +846,15 @@ void blk_throtl_dispatch_work_fn(struct work_struct *work)
td->nr_queued[READ] + td->nr_queued[WRITE],
td->nr_queued[READ], td->nr_queued[WRITE]);
- nr_disp = throtl_select_dispatch(&td->service_queue, &bio_list_on_stack);
+ nr_disp = throtl_select_dispatch(sq);
- if (nr_disp)
+ if (nr_disp) {
+ for (rw = READ; rw <= WRITE; rw++) {
+ bio_list_merge(&bio_list_on_stack, &sq->bio_lists[rw]);
+ bio_list_init(&sq->bio_lists[rw]);
+ }
throtl_log(td, "bios disp=%u", nr_disp);
+ }
throtl_schedule_next_dispatch(td);
@@ -1156,27 +1163,26 @@ void blk_throtl_drain(struct request_queue *q)
struct throtl_data *td = q->td;
struct throtl_service_queue *parent_sq = &td->service_queue;
struct throtl_grp *tg;
- struct bio_list bl;
struct bio *bio;
+ int rw;
queue_lockdep_assert_held(q);
- bio_list_init(&bl);
-
while ((tg = throtl_rb_first(parent_sq))) {
struct throtl_service_queue *sq = &tg->service_queue;
throtl_dequeue_tg(tg, parent_sq);
while ((bio = bio_list_peek(&sq->bio_lists[READ])))
- tg_dispatch_one_bio(tg, bio_data_dir(bio), &bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
while ((bio = bio_list_peek(&sq->bio_lists[WRITE])))
- tg_dispatch_one_bio(tg, bio_data_dir(bio), &bl);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
}
spin_unlock_irq(q->queue_lock);
- while ((bio = bio_list_pop(&bl)))
- generic_make_request(bio);
+ for (rw = READ; rw <= WRITE; rw++)
+ while ((bio = bio_list_pop(&parent_sq->bio_lists[rw])))
+ generic_make_request(bio);
spin_lock_irq(q->queue_lock);
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 19/33] blk-throttle: generalize update_disptime optimization in blk_throtl_bio()
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (13 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 18/33] blk-throttle: dispatch to throtl_data->service_queue.bio_lists[] Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:45 ` [PATCH 20/33] blk-throttle: add throtl_service_queue->parent_sq Tejun Heo
` (6 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
When blk_throtl_bio() wants to queue a bio to a tg (throtl_grp), it
avoids invoking tg_update_disptime() and
throtl_schedule_next_dispatch() if the tg already has bios queued in
that direction. As a new bio is appeneded after the existing ones, it
can't change the tg's next dispatch time or the parent's dispatch
schedule.
This optimization is currently open coded in blk_throtl_bio().
Whether the target biolist was occupied was recorded in a local
variable and later used to skip disptime update. This patch moves
generalizes it so that throtl_add_bio_tg() sets a new flag
THROTL_TG_WAS_EMPTY if the biolist was empty before the new bio was
added. tg_update_disptime() clears the flag automatically.
blk_throtl_bio() is updated to simply test the flag before updating
disptime.
This patch doesn't make any functional differences now but will enable
using the same optimization for recursive dispatch.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 154bd63..ec9397f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -46,6 +46,7 @@ struct throtl_service_queue {
enum tg_state_flags {
THROTL_TG_PENDING = 1 << 0, /* on parent's pending tree */
+ THROTL_TG_WAS_EMPTY = 1 << 1, /* bio_lists[] became non-empty */
};
#define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node)
@@ -712,6 +713,15 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_grp *tg,
struct throtl_service_queue *sq = &tg->service_queue;
bool rw = bio_data_dir(bio);
+ /*
+ * If @tg doesn't currently have any bios queued in the same
+ * direction, queueing @bio can change when @tg should be
+ * dispatched. Mark that @tg was empty. This is automatically
+ * cleaered on the next tg_update_disptime().
+ */
+ if (!sq->nr_queued[rw])
+ tg->flags |= THROTL_TG_WAS_EMPTY;
+
bio_list_add(&sq->bio_lists[rw], bio);
/* Take a bio reference on tg */
blkg_get(tg_to_blkg(tg));
@@ -740,6 +750,9 @@ static void tg_update_disptime(struct throtl_grp *tg,
throtl_dequeue_tg(tg, parent_sq);
tg->disptime = disptime;
throtl_enqueue_tg(tg, parent_sq);
+
+ /* see throtl_add_bio_tg() */
+ tg->flags &= ~THROTL_TG_WAS_EMPTY;
}
static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
@@ -1061,7 +1074,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
struct throtl_data *td = q->td;
struct throtl_grp *tg;
struct throtl_service_queue *sq;
- bool rw = bio_data_dir(bio), update_disptime = true;
+ bool rw = bio_data_dir(bio);
struct blkcg *blkcg;
bool throttled = false;
@@ -1097,16 +1110,10 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
sq = &tg->service_queue;
- if (sq->nr_queued[rw]) {
- /*
- * There is already another bio queued in same dir. No
- * need to update dispatch time.
- */
- update_disptime = false;
+ /* throtl is FIFO - if other bios are already queued, should queue */
+ if (sq->nr_queued[rw])
goto queue_bio;
- }
-
/* Bio is with-in rate limit of group */
if (tg_may_dispatch(tg, bio, NULL)) {
throtl_charge_bio(tg, bio);
@@ -1138,7 +1145,8 @@ queue_bio:
throtl_add_bio_tg(bio, tg, &q->td->service_queue);
throttled = true;
- if (update_disptime) {
+ /* update @tg's dispatch time if @tg was empty before @bio */
+ if (tg->flags & THROTL_TG_WAS_EMPTY) {
tg_update_disptime(tg, &td->service_queue);
throtl_schedule_next_dispatch(td);
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 20/33] blk-throttle: add throtl_service_queue->parent_sq
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (14 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 19/33] blk-throttle: generalize update_disptime optimization in blk_throtl_bio() Tejun Heo
@ 2013-05-06 22:45 ` Tejun Heo
2013-05-06 22:46 ` [PATCH 21/33] blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log() Tejun Heo
` (5 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:45 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
To prepare for hierarchy support, this patch adds
throtl_service_queue->service_sq which points to the arent
service_queue. Currently, for all service_queues embedded in
throtl_grps, it points to throtl_data->service_queue. As
throtl_data->service_queue doesn't have a parent its parent_sq is set
to NULL.
There are a number of functions which take both throtl_grp *tg and
throtl_service_queue *parent_sq. With this patch, the parent
service_queue can be determined from @tg and the @parent_sq arguments
are removed.
This patch doesn't make any behavior differences.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 81 +++++++++++++++++++++++++---------------------------
1 file changed, 39 insertions(+), 42 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ec9397f..00cfdd0 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -27,6 +27,8 @@ static struct blkcg_policy blkcg_policy_throtl;
static struct workqueue_struct *kthrotld_workqueue;
struct throtl_service_queue {
+ struct throtl_service_queue *parent_sq; /* the parent service_queue */
+
/*
* Bios queued directly to this service_queue or dispatched from
* children throtl_grp's.
@@ -197,21 +199,24 @@ alloc_stats:
}
/* init a service_queue, assumes the caller zeroed it */
-static void throtl_service_queue_init(struct throtl_service_queue *sq)
+static void throtl_service_queue_init(struct throtl_service_queue *sq,
+ struct throtl_service_queue *parent_sq)
{
bio_list_init(&sq->bio_lists[0]);
bio_list_init(&sq->bio_lists[1]);
sq->pending_tree = RB_ROOT;
+ sq->parent_sq = parent_sq;
}
static void throtl_pd_init(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
+ struct throtl_data *td = blkg->q->td;
unsigned long flags;
- throtl_service_queue_init(&tg->service_queue);
+ throtl_service_queue_init(&tg->service_queue, &td->service_queue);
RB_CLEAR_NODE(&tg->rb_node);
- tg->td = blkg->q->td;
+ tg->td = td;
tg->bps[READ] = -1;
tg->bps[WRITE] = -1;
@@ -339,9 +344,9 @@ static void update_min_dispatch_time(struct throtl_service_queue *parent_sq)
parent_sq->first_pending_disptime = tg->disptime;
}
-static void tg_service_queue_add(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void tg_service_queue_add(struct throtl_grp *tg)
{
+ struct throtl_service_queue *parent_sq = tg->service_queue.parent_sq;
struct rb_node **node = &parent_sq->pending_tree.rb_node;
struct rb_node *parent = NULL;
struct throtl_grp *__tg;
@@ -367,33 +372,29 @@ static void tg_service_queue_add(struct throtl_grp *tg,
rb_insert_color(&tg->rb_node, &parent_sq->pending_tree);
}
-static void __throtl_enqueue_tg(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void __throtl_enqueue_tg(struct throtl_grp *tg)
{
- tg_service_queue_add(tg, parent_sq);
+ tg_service_queue_add(tg);
tg->flags |= THROTL_TG_PENDING;
- parent_sq->nr_pending++;
+ tg->service_queue.parent_sq->nr_pending++;
}
-static void throtl_enqueue_tg(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void throtl_enqueue_tg(struct throtl_grp *tg)
{
if (!(tg->flags & THROTL_TG_PENDING))
- __throtl_enqueue_tg(tg, parent_sq);
+ __throtl_enqueue_tg(tg);
}
-static void __throtl_dequeue_tg(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void __throtl_dequeue_tg(struct throtl_grp *tg)
{
- throtl_rb_erase(&tg->rb_node, parent_sq);
+ throtl_rb_erase(&tg->rb_node, tg->service_queue.parent_sq);
tg->flags &= ~THROTL_TG_PENDING;
}
-static void throtl_dequeue_tg(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void throtl_dequeue_tg(struct throtl_grp *tg)
{
if (tg->flags & THROTL_TG_PENDING)
- __throtl_dequeue_tg(tg, parent_sq);
+ __throtl_dequeue_tg(tg);
}
/* Call with queue lock held */
@@ -707,8 +708,7 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw);
}
-static void throtl_add_bio_tg(struct bio *bio, struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void throtl_add_bio_tg(struct bio *bio, struct throtl_grp *tg)
{
struct throtl_service_queue *sq = &tg->service_queue;
bool rw = bio_data_dir(bio);
@@ -727,11 +727,10 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_grp *tg,
blkg_get(tg_to_blkg(tg));
sq->nr_queued[rw]++;
tg->td->nr_queued[rw]++;
- throtl_enqueue_tg(tg, parent_sq);
+ throtl_enqueue_tg(tg);
}
-static void tg_update_disptime(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static void tg_update_disptime(struct throtl_grp *tg)
{
struct throtl_service_queue *sq = &tg->service_queue;
unsigned long read_wait = -1, write_wait = -1, min_wait = -1, disptime;
@@ -747,16 +746,15 @@ static void tg_update_disptime(struct throtl_grp *tg,
disptime = jiffies + min_wait;
/* Update dispatch time */
- throtl_dequeue_tg(tg, parent_sq);
+ throtl_dequeue_tg(tg);
tg->disptime = disptime;
- throtl_enqueue_tg(tg, parent_sq);
+ throtl_enqueue_tg(tg);
/* see throtl_add_bio_tg() */
tg->flags &= ~THROTL_TG_WAS_EMPTY;
}
-static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
- struct throtl_service_queue *parent_sq)
+static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
{
struct throtl_service_queue *sq = &tg->service_queue;
struct bio *bio;
@@ -770,14 +768,13 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw,
tg->td->nr_queued[rw]--;
throtl_charge_bio(tg, bio);
- bio_list_add(&parent_sq->bio_lists[rw], bio);
+ bio_list_add(&sq->parent_sq->bio_lists[rw], bio);
bio->bi_rw |= REQ_THROTTLED;
throtl_trim_slice(tg, rw);
}
-static int throtl_dispatch_tg(struct throtl_grp *tg,
- struct throtl_service_queue *parent_sq)
+static int throtl_dispatch_tg(struct throtl_grp *tg)
{
struct throtl_service_queue *sq = &tg->service_queue;
unsigned int nr_reads = 0, nr_writes = 0;
@@ -790,7 +787,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg,
while ((bio = bio_list_peek(&sq->bio_lists[READ])) &&
tg_may_dispatch(tg, bio, NULL)) {
- tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio));
nr_reads++;
if (nr_reads >= max_nr_reads)
@@ -800,7 +797,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg,
while ((bio = bio_list_peek(&sq->bio_lists[WRITE])) &&
tg_may_dispatch(tg, bio, NULL)) {
- tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio));
nr_writes++;
if (nr_writes >= max_nr_writes)
@@ -824,12 +821,12 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq)
if (time_before(jiffies, tg->disptime))
break;
- throtl_dequeue_tg(tg, parent_sq);
+ throtl_dequeue_tg(tg);
- nr_disp += throtl_dispatch_tg(tg, parent_sq);
+ nr_disp += throtl_dispatch_tg(tg);
if (sq->nr_queued[0] || sq->nr_queued[1])
- tg_update_disptime(tg, parent_sq);
+ tg_update_disptime(tg);
if (nr_disp >= throtl_quantum)
break;
@@ -991,7 +988,7 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
throtl_start_new_slice(tg, 1);
if (tg->flags & THROTL_TG_PENDING) {
- tg_update_disptime(tg, &td->service_queue);
+ tg_update_disptime(tg);
throtl_schedule_next_dispatch(td);
}
@@ -1142,12 +1139,12 @@ queue_bio:
sq->nr_queued[READ], sq->nr_queued[WRITE]);
bio_associate_current(bio);
- throtl_add_bio_tg(bio, tg, &q->td->service_queue);
+ throtl_add_bio_tg(bio, tg);
throttled = true;
/* update @tg's dispatch time if @tg was empty before @bio */
if (tg->flags & THROTL_TG_WAS_EMPTY) {
- tg_update_disptime(tg, &td->service_queue);
+ tg_update_disptime(tg);
throtl_schedule_next_dispatch(td);
}
@@ -1179,12 +1176,12 @@ void blk_throtl_drain(struct request_queue *q)
while ((tg = throtl_rb_first(parent_sq))) {
struct throtl_service_queue *sq = &tg->service_queue;
- throtl_dequeue_tg(tg, parent_sq);
+ throtl_dequeue_tg(tg);
while ((bio = bio_list_peek(&sq->bio_lists[READ])))
- tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio));
while ((bio = bio_list_peek(&sq->bio_lists[WRITE])))
- tg_dispatch_one_bio(tg, bio_data_dir(bio), parent_sq);
+ tg_dispatch_one_bio(tg, bio_data_dir(bio));
}
spin_unlock_irq(q->queue_lock);
@@ -1205,7 +1202,7 @@ int blk_throtl_init(struct request_queue *q)
return -ENOMEM;
INIT_DELAYED_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
- throtl_service_queue_init(&td->service_queue);
+ throtl_service_queue_init(&td->service_queue, NULL);
q->td = td;
td->queue = q;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 21/33] blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log()
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (15 preceding siblings ...)
2013-05-06 22:45 ` [PATCH 20/33] blk-throttle: add throtl_service_queue->parent_sq Tejun Heo
@ 2013-05-06 22:46 ` Tejun Heo
2013-05-06 22:46 ` [PATCH 28/33] blk-throttle: make tg_dispatch_one_bio() ready for hierarchy Tejun Heo
` (4 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:46 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
Now that both throtl_data and throtl_grp embed throtl_service_queue,
we can unify throtl_log() and throtl_log_tg().
* sq_to_tg() is added. This returns the throtl_grp a service_queue is
embedded in. If the service_queue is the top-level one embedded in
throtl_data, NULL is returned.
* sq_to_td() is added. A service_queue is always associated with a
throtl_data. This function finds the associated td and returns it.
* throtl_log() is updated to take throtl_service_queue instead of
throtl_data. If the service_queue is one embedded in throtl_grp, it
prints the same header as throtl_log_tg() did. If it's one embedded
in throtl_data, it behaves the same as before. This renders
throtl_log_tg() unnecessary. Removed.
This change is necessary for hierarchy support as we're gonna be using
the same code paths to dispatch bios to intermediate service_queues
embedded in throtl_grps and the top-level service_queue embedded in
throtl_data.
This patch doesn't make any behavior changes.
v2: throtl_log() didn't print a space after blkg path. Updated so
that it prints a space after throtl_grp path. Spotted by Vivek.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
block/blk-throttle.c | 110 +++++++++++++++++++++++++++++++++++++--------------
1 file changed, 81 insertions(+), 29 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 00cfdd0..2875ff6 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -151,16 +151,65 @@ static inline struct throtl_grp *td_root_tg(struct throtl_data *td)
return blkg_to_tg(td->queue->root_blkg);
}
-#define throtl_log_tg(tg, fmt, args...) do { \
- char __pbuf[128]; \
+/**
+ * sq_to_tg - return the throl_grp the specified service queue belongs to
+ * @sq: the throtl_service_queue of interest
+ *
+ * Return the throtl_grp @sq belongs to. If @sq is the top-level one
+ * embedded in throtl_data, %NULL is returned.
+ */
+static struct throtl_grp *sq_to_tg(struct throtl_service_queue *sq)
+{
+ if (sq && sq->parent_sq)
+ return container_of(sq, struct throtl_grp, service_queue);
+ else
+ return NULL;
+}
+
+/**
+ * sq_to_td - return throtl_data the specified service queue belongs to
+ * @sq: the throtl_service_queue of interest
+ *
+ * A service_queue can be embeded in either a throtl_grp or throtl_data.
+ * Determine the associated throtl_data accordingly and return it.
+ */
+static struct throtl_data *sq_to_td(struct throtl_service_queue *sq)
+{
+ struct throtl_grp *tg = sq_to_tg(sq);
+
+ if (tg)
+ return tg->td;
+ else
+ return container_of(sq, struct throtl_data, service_queue);
+}
+
+/**
+ * throtl_log - log debug message via blktrace
+ * @sq: the service_queue being reported
+ * @fmt: printf format string
+ * @args: printf args
+ *
+ * The messages are prefixed with "throtl BLKG_NAME" if @sq belongs to a
+ * throtl_grp; otherwise, just "throtl".
+ *
+ * TODO: this should be made a function and name formatting should happen
+ * after testing whether blktrace is enabled.
+ */
+#define throtl_log(sq, fmt, args...) do { \
+ struct throtl_grp *__tg = sq_to_tg((sq)); \
+ struct throtl_data *__td = sq_to_td((sq)); \
+ \
+ (void)__td; \
+ if ((__tg)) { \
+ char __pbuf[128]; \
\
- blkg_path(tg_to_blkg(tg), __pbuf, sizeof(__pbuf)); \
- blk_add_trace_msg((tg)->td->queue, "throtl %s " fmt, __pbuf, ##args); \
+ blkg_path(tg_to_blkg(__tg), __pbuf, sizeof(__pbuf)); \
+ blk_add_trace_msg(__td->queue, "throtl %s " fmt, __pbuf, ##args); \
+ } else { \
+ blk_add_trace_msg(__td->queue, "throtl " fmt, ##args); \
+ } \
} while (0)
-#define throtl_log(td, fmt, args...) \
- blk_add_trace_msg((td)->queue, "throtl " fmt, ##args)
-
/*
* Worker for allocating per cpu stat for tgs. This is scheduled on the
* system_wq once there are some groups on the alloc_list waiting for
@@ -402,9 +451,10 @@ static void throtl_schedule_delayed_work(struct throtl_data *td,
unsigned long delay)
{
struct delayed_work *dwork = &td->dispatch_work;
+ struct throtl_service_queue *sq = &td->service_queue;
mod_delayed_work(kthrotld_workqueue, dwork, delay);
- throtl_log(td, "schedule work. delay=%lu jiffies=%lu", delay, jiffies);
+ throtl_log(sq, "schedule work. delay=%lu jiffies=%lu", delay, jiffies);
}
static void throtl_schedule_next_dispatch(struct throtl_data *td)
@@ -429,9 +479,10 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
tg->io_disp[rw] = 0;
tg->slice_start[rw] = jiffies;
tg->slice_end[rw] = jiffies + throtl_slice;
- throtl_log_tg(tg, "[%c] new slice start=%lu end=%lu jiffies=%lu",
- rw == READ ? 'R' : 'W', tg->slice_start[rw],
- tg->slice_end[rw], jiffies);
+ throtl_log(&tg->service_queue,
+ "[%c] new slice start=%lu end=%lu jiffies=%lu",
+ rw == READ ? 'R' : 'W', tg->slice_start[rw],
+ tg->slice_end[rw], jiffies);
}
static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
@@ -444,9 +495,10 @@ static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
unsigned long jiffy_end)
{
tg->slice_end[rw] = roundup(jiffy_end, throtl_slice);
- throtl_log_tg(tg, "[%c] extend slice start=%lu end=%lu jiffies=%lu",
- rw == READ ? 'R' : 'W', tg->slice_start[rw],
- tg->slice_end[rw], jiffies);
+ throtl_log(&tg->service_queue,
+ "[%c] extend slice start=%lu end=%lu jiffies=%lu",
+ rw == READ ? 'R' : 'W', tg->slice_start[rw],
+ tg->slice_end[rw], jiffies);
}
/* Determine if previously allocated or extended slice is complete or not */
@@ -511,10 +563,10 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
tg->slice_start[rw] += nr_slices * throtl_slice;
- throtl_log_tg(tg, "[%c] trim slice nr=%lu bytes=%llu io=%lu"
- " start=%lu end=%lu jiffies=%lu",
- rw == READ ? 'R' : 'W', nr_slices, bytes_trim, io_trim,
- tg->slice_start[rw], tg->slice_end[rw], jiffies);
+ throtl_log(&tg->service_queue,
+ "[%c] trim slice nr=%lu bytes=%llu io=%lu start=%lu end=%lu jiffies=%lu",
+ rw == READ ? 'R' : 'W', nr_slices, bytes_trim, io_trim,
+ tg->slice_start[rw], tg->slice_end[rw], jiffies);
}
static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
@@ -852,7 +904,7 @@ void blk_throtl_dispatch_work_fn(struct work_struct *work)
bio_list_init(&bio_list_on_stack);
- throtl_log(td, "dispatch nr_queued=%u read=%u write=%u",
+ throtl_log(sq, "dispatch nr_queued=%u read=%u write=%u",
td->nr_queued[READ] + td->nr_queued[WRITE],
td->nr_queued[READ], td->nr_queued[WRITE]);
@@ -863,7 +915,7 @@ void blk_throtl_dispatch_work_fn(struct work_struct *work)
bio_list_merge(&bio_list_on_stack, &sq->bio_lists[rw]);
bio_list_init(&sq->bio_lists[rw]);
}
- throtl_log(td, "bios disp=%u", nr_disp);
+ throtl_log(sq, "bios disp=%u", nr_disp);
}
throtl_schedule_next_dispatch(td);
@@ -972,9 +1024,10 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
else
*(unsigned int *)((void *)tg + cft->private) = ctx.v;
- throtl_log_tg(tg, "limit change rbps=%llu wbps=%llu riops=%u wiops=%u",
- tg->bps[READ], tg->bps[WRITE],
- tg->iops[READ], tg->iops[WRITE]);
+ throtl_log(&tg->service_queue,
+ "limit change rbps=%llu wbps=%llu riops=%u wiops=%u",
+ tg->bps[READ], tg->bps[WRITE],
+ tg->iops[READ], tg->iops[WRITE]);
/*
* We're already holding queue_lock and know @tg is valid. Let's
@@ -1131,12 +1184,11 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
}
queue_bio:
- throtl_log_tg(tg, "[%c] bio. bdisp=%llu sz=%u bps=%llu"
- " iodisp=%u iops=%u queued=%d/%d",
- rw == READ ? 'R' : 'W',
- tg->bytes_disp[rw], bio->bi_size, tg->bps[rw],
- tg->io_disp[rw], tg->iops[rw],
- sq->nr_queued[READ], sq->nr_queued[WRITE]);
+ throtl_log(sq, "[%c] bio. bdisp=%llu sz=%u bps=%llu iodisp=%u iops=%u queued=%d/%d",
+ rw == READ ? 'R' : 'W',
+ tg->bytes_disp[rw], bio->bi_size, tg->bps[rw],
+ tg->io_disp[rw], tg->iops[rw],
+ sq->nr_queued[READ], sq->nr_queued[WRITE]);
bio_associate_current(bio);
throtl_add_bio_tg(bio, tg);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 28/33] blk-throttle: make tg_dispatch_one_bio() ready for hierarchy
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (16 preceding siblings ...)
2013-05-06 22:46 ` [PATCH 21/33] blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log() Tejun Heo
@ 2013-05-06 22:46 ` Tejun Heo
2013-05-06 22:46 ` [PATCH 31/33] blk-throttle: Account for child group's start time in parent while bio climbs up Tejun Heo
` (3 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:46 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lizefan-hv44wF8Li93QT0dZR+AlfA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
Tejun Heo
tg_dispatch_one_bio() currently assumes that the parent_sq is the top
level one and the bio being dispatched is ready to be issued; however,
this assumption will be wrong with proper hierarchy support. This
patch makes the following changes to make tg_dispatch_on_bio() ready
for hiearchy.
* throtl_data->nr_queued[] is incremented in blk_throtl_bio() instead
of throtl_add_bio_tg() so that throtl_add_bio_tg() can be used to
transfer a bio from a child tg to its parent.
* tg_dispatch_one_bio() is updated to distinguish whether its parent
is another throtl_grp or the throtl_data. If former, the bio is
transferred to the parent throtl_grp using throtl_add_bio_tg(). If
latter, the bio is ready to be issued and put on the top-level
service_queue's bio_lists[] and throtl_data->nr_queued is
decremented.
As all throtl_grps currently have the top level service_queue as their
->parent_sq, this patch in itself doesn't make any behavior
difference.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 52321a4..0420261 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -824,7 +824,6 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_grp *tg)
/* Take a bio reference on tg */
blkg_get(tg_to_blkg(tg));
sq->nr_queued[rw]++;
- tg->td->nr_queued[rw]++;
throtl_enqueue_tg(tg);
}
@@ -855,20 +854,34 @@ static void tg_update_disptime(struct throtl_grp *tg)
static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
{
struct throtl_service_queue *sq = &tg->service_queue;
+ struct throtl_service_queue *parent_sq = sq->parent_sq;
+ struct throtl_grp *parent_tg = sq_to_tg(parent_sq);
struct bio *bio;
bio = bio_list_pop(&sq->bio_lists[rw]);
sq->nr_queued[rw]--;
- /* Drop bio reference on blkg */
- blkg_put(tg_to_blkg(tg));
-
- BUG_ON(tg->td->nr_queued[rw] <= 0);
- tg->td->nr_queued[rw]--;
throtl_charge_bio(tg, bio);
- bio_list_add(&sq->parent_sq->bio_lists[rw], bio);
+
+ /*
+ * If our parent is another tg, we just need to transfer @bio to
+ * the parent using throtl_add_bio_tg(). If our parent is
+ * @td->service_queue, @bio is ready to be issued. Put it on its
+ * bio_lists[] and decrease total number queued. The caller is
+ * responsible for issuing these bios.
+ */
+ if (parent_tg) {
+ throtl_add_bio_tg(bio, parent_tg);
+ } else {
+ bio_list_add(&parent_sq->bio_lists[rw], bio);
+ BUG_ON(tg->td->nr_queued[rw] <= 0);
+ tg->td->nr_queued[rw]--;
+ }
throtl_trim_slice(tg, rw);
+
+ /* @bio is transferred to parent, drop its blkg reference */
+ blkg_put(tg_to_blkg(tg));
}
static int throtl_dispatch_tg(struct throtl_grp *tg)
@@ -1283,6 +1296,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
sq->nr_queued[READ], sq->nr_queued[WRITE]);
bio_associate_current(bio);
+ tg->td->nr_queued[rw]++;
throtl_add_bio_tg(bio, tg);
throttled = true;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 31/33] blk-throttle: Account for child group's start time in parent while bio climbs up
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (17 preceding siblings ...)
2013-05-06 22:46 ` [PATCH 28/33] blk-throttle: make tg_dispatch_one_bio() ready for hierarchy Tejun Heo
@ 2013-05-06 22:46 ` Tejun Heo
2013-05-06 22:46 ` [PATCH 33/33] blk-throttle: implement proper hierarchy support Tejun Heo
` (2 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:46 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lizefan-hv44wF8Li93QT0dZR+AlfA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
Tejun Heo
From: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
With the planned proper hierarchy support, a bio will climb up the
tree before actually being dispatched. This makes sure bio is also
subjected to parent's throttling limits, if any.
It might happen that parent is idle and when bio is transferred to
parent, a new slice starts fresh. But that is incorrect as parents
wait time should have started when bio was queued in child group and
causes IOs to be throttled more than configured as they climb the
hierarchy.
Given the fact that we have not written hierarchical algorithm in a
way where child's and parents time slices are synchronized, we
transfer the child's start time to parent if parent was idling. If
parent was busy doing dispatch of other bios all this while, this is
not an issue.
Child's slice start time is passed to parent. Parent looks at its
last expired slice start time. If child's start time is after parents
old start time, that means parent had been idle and after parent
went idle, child had an IO queued. So use child's start time as
parent start time.
If parent's start time is after child's start time, that means,
when IO got queued in child group, parent was not idle. But later
it dispatched some IO, its slice got trimmed and then it went idle.
After a while child's request got shifted in parent group. In this
case use parent's old start time as new start time as that's the
duration of slice we did not use.
This logic is far from perfect as if there are multiple childs
then first child transferring the bio decides the start time while
a bio might have queued up even earlier in other child, which is
yet to be transferred up to parent. In that case we will lose
time and bandwidth in parent. This patch is just an approximation
to make situation somewhat better.
Signed-off-by: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-throttle.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 541bd0d..7477f33 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -633,6 +633,28 @@ static bool throtl_schedule_next_dispatch(struct throtl_service_queue *sq,
return false;
}
+static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
+ bool rw, unsigned long start)
+{
+ tg->bytes_disp[rw] = 0;
+ tg->io_disp[rw] = 0;
+
+ /*
+ * Previous slice has expired. We must have trimmed it after last
+ * bio dispatch. That means since start of last slice, we never used
+ * that bandwidth. Do try to make use of that bandwidth while giving
+ * credit.
+ */
+ if (time_after_eq(start, tg->slice_start[rw]))
+ tg->slice_start[rw] = start;
+
+ tg->slice_end[rw] = jiffies + throtl_slice;
+ throtl_log(&tg->service_queue,
+ "[%c] new slice with credit start=%lu end=%lu jiffies=%lu",
+ rw == READ ? 'R' : 'W', tg->slice_start[rw],
+ tg->slice_end[rw], jiffies);
+}
+
static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
{
tg->bytes_disp[rw] = 0;
@@ -992,6 +1014,16 @@ static void tg_update_disptime(struct throtl_grp *tg)
tg->flags &= ~THROTL_TG_WAS_EMPTY;
}
+static void start_parent_slice_with_credit(struct throtl_grp *child_tg,
+ struct throtl_grp *parent_tg, bool rw)
+{
+ if (throtl_slice_used(parent_tg, rw)) {
+ throtl_start_new_slice_with_credit(parent_tg, rw,
+ child_tg->slice_start[rw]);
+ }
+
+}
+
static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
{
struct throtl_service_queue *sq = &tg->service_queue;
@@ -1020,6 +1052,7 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
*/
if (parent_tg) {
throtl_add_bio_tg(bio, &tg->qnode_on_parent[rw], parent_tg);
+ start_parent_slice_with_credit(tg, parent_tg, rw);
} else {
throtl_qnode_add_bio(bio, &tg->qnode_on_parent[rw],
&parent_sq->queued[rw]);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 33/33] blk-throttle: implement proper hierarchy support
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (18 preceding siblings ...)
2013-05-06 22:46 ` [PATCH 31/33] blk-throttle: Account for child group's start time in parent while bio climbs up Tejun Heo
@ 2013-05-06 22:46 ` Tejun Heo
[not found] ` <1367880372-28312-34-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-05-07 14:02 ` [PATCHSET v2] " Vivek Goyal
2013-05-07 14:16 ` Vivek Goyal
21 siblings, 1 reply; 39+ messages in thread
From: Tejun Heo @ 2013-05-06 22:46 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lizefan-hv44wF8Li93QT0dZR+AlfA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
Tejun Heo
With the recent updates, blk-throttle is finally ready for proper
hierarchy support. Dispatching now honors service_queue->parent_sq
and propagates correctly. The only thing missing is setting
->parent_sq correctly so that throtl_grp hierarchy matches the cgroup
hierarchy.
This patch updates throtl_pd_init() such that service_queues form the
same hierarchy as the cgroup hierarchy if sane_behavior is enabled.
As this concludes proper hierarchy support for blkcg, the shameful
.broken_hierarchy tag is removed from blkio_subsys.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
block/blk-cgroup.c | 8 --------
block/blk-throttle.c | 22 +++++++++++++++++++++-
include/linux/cgroup.h | 2 ++
3 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index af2ca27..8d9edc8 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -911,14 +911,6 @@ struct cgroup_subsys blkio_subsys = {
.subsys_id = blkio_subsys_id,
.base_cftypes = blkcg_files,
.module = THIS_MODULE,
-
- /*
- * blkio subsystem is utterly broken in terms of hierarchy support.
- * It treats all cgroups equally regardless of where they're
- * located in the hierarchy - all cgroups are treated as if they're
- * right below the root. Fix it and remove the following.
- */
- .broken_hierarchy = true,
};
EXPORT_SYMBOL_GPL(blkio_subsys);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 27f006b..08a32df 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -397,10 +397,30 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
struct throtl_data *td = blkg->q->td;
+ struct throtl_service_queue *parent_sq;
unsigned long flags;
int rw;
- throtl_service_queue_init(&tg->service_queue, &td->service_queue);
+ /*
+ * If sane_hierarchy is enabled, we switch to properly hierarchical
+ * behavior where limits on a given throtl_grp are applied to the
+ * whole subtree rather than just the group itself. e.g. If 16M
+ * read_bps limit is set on the root group, the whole system can't
+ * exceed 16M for the device.
+ *
+ * If sane_hierarchy is not enabled, the broken flat hierarchy
+ * behavior is retained where all throtl_grps are treated as if
+ * they're all separate root groups right below throtl_data.
+ * Limits of a group don't interact with limits of other groups
+ * regardless of the position of the group in the hierarchy.
+ */
+ parent_sq = &td->service_queue;
+
+ if (cgroup_sane_behavior(blkg->blkcg->css.cgroup) && blkg->parent)
+ parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
+
+ throtl_service_queue_init(&tg->service_queue, parent_sq);
+
for (rw = READ; rw <= WRITE; rw++) {
throtl_qnode_init(&tg->qnode_on_self[rw], tg);
throtl_qnode_init(&tg->qnode_on_parent[rw], tg);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c371888..3c5f780 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -271,6 +271,8 @@ enum {
* - memcg: use_hierarchy is on by default and the cgroup file for
* the flag is not created.
*
+ * - blkcg: blk-throttle becomes properly hierarchical.
+ *
* The followings are planned changes.
*
* - release_agent will be disallowed once replacement notification
--
1.8.1.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCHSET v2] blk-throttle: implement proper hierarchy support
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (19 preceding siblings ...)
2013-05-06 22:46 ` [PATCH 33/33] blk-throttle: implement proper hierarchy support Tejun Heo
@ 2013-05-07 14:02 ` Vivek Goyal
2013-05-07 14:16 ` Vivek Goyal
21 siblings, 0 replies; 39+ messages in thread
From: Vivek Goyal @ 2013-05-07 14:02 UTC (permalink / raw)
To: Tejun Heo
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Mon, May 06, 2013 at 03:45:39PM -0700, Tejun Heo wrote:
> Changes since the last take[L] are
>
> * Unnecessary throtl_schedule_delayed_work() call dropped from 0007.
>
> * throtl_log() implement in 0021 forgot to print space after blkg
> path. Fixed.
>
> * 0030-blk-throttle-add-throtl_qnode-for-dispatch-fairness.patch added
> to address dispatch fairness.
>
> * 0031-blk-throttle-Account-for-child-group-s-start-time-in.patch
> added to address unwarranted penalty of nested limit enforcement due
> to staggered delays of slice start times at multiple levels.
>
> The original patchset description follows.
>
This patch series looks good to me.
Acked-by: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Vivek
> blk-throttle is the last controller with broken hierarchy support
> making blkcg the last one tagged with .broken_hierarchy. This
> patchset implements hierarchy support for blk-throttle. The semantics
> is pretty simple - limits on an intermediate node applies to the whole
> subtree and the statistics remain local.
>
> As this changes the meaning of the knobs in an incompatible manner -
> e.g. configuring limits on root cgroup now means setting the limit for
> the whole system - the hierarchy mode is enabled by "sane_behavior"
> cgroup mount flag. If the flag is not specified, the original broken
> flat hierarchy behavior is retained.
>
> While this patchset contains many patches, the implementation is
> pretty straight-forward. throtl_grp's form a tree anchored at
> throtl_data and bios climb the tree as they get dispatched at each
> level. The bios which reach the top of the tree - throl_data - are
> issued. The scheduling algorithm remains unchanged at each level and
> blk-throttle should behave the same for flat hierarchy after the
> changes. The same algorithm is repeated until bios clear all limits
> to the top of the tree.
>
> This patchset contains the following 33 patches.
>
> 0001-blkcg-fix-error-return-path-in-blkg_create.patch
> 0002-blkcg-move-blkg_for_each_descendant_pre-to-block-blk.patch
> 0003-blkcg-implement-blkg_for_each_descendant_post.patch
> 0004-blkcg-invoke-blkcg_policy-pd_init-after-parent-is-li.patch
> 0005-blkcg-move-bulk-of-blkcg_gq-release-operations-to-th.patch
> 0006-blk-throttle-remove-spurious-throtl_enqueue_tg-call-.patch
> 0007-blk-throttle-removed-deferred-config-application-mec.patch
> 0008-blk-throttle-collapse-throtl_dispatch-into-the-work-.patch
> 0009-blk-throttle-relocate-throtl_schedule_delayed_work.patch
> 0010-blk-throttle-remove-pointless-throtl_nr_queued-optim.patch
> 0011-blk-throttle-rename-throtl_rb_root-to-throtl_service.patch
> 0012-blk-throttle-simplify-throtl_grp-flag-handling.patch
> 0013-blk-throttle-add-backlink-pointer-from-throtl_grp-to.patch
> 0014-blk-throttle-pass-around-throtl_service_queue-instea.patch
> 0015-blk-throttle-reorganize-throtl_service_queue-passed-.patch
> 0016-blk-throttle-add-throtl_grp-service_queue.patch
> 0017-blk-throttle-move-bio_lists-and-friends-to-throtl_se.patch
> 0018-blk-throttle-dispatch-to-throtl_data-service_queue.b.patch
> 0019-blk-throttle-generalize-update_disptime-optimization.patch
> 0020-blk-throttle-add-throtl_service_queue-parent_sq.patch
> 0021-blk-throttle-implement-sq_to_tg-sq_to_td-and-throtl_.patch
> 0022-blk-throttle-set-REQ_THROTTLED-from-throtl_charge_bi.patch
> 0023-blk-throttle-separate-out-throtl_service_queue-pendi.patch
> 0024-blk-throttle-implement-dispatch-looping.patch
> 0025-blk-throttle-dispatch-from-throtl_pending_timer_fn.patch
> 0026-blk-throttle-make-blk_throtl_drain-ready-for-hierarc.patch
> 0027-blk-throttle-make-blk_throtl_bio-ready-for-hierarchy.patch
> 0028-blk-throttle-make-tg_dispatch_one_bio-ready-for-hier.patch
> 0029-blk-throttle-make-throtl_pending_timer_fn-ready-for-.patch
> 0030-blk-throttle-add-throtl_qnode-for-dispatch-fairness.patch
> 0031-blk-throttle-Account-for-child-group-s-start-time-in.patch
> 0032-blk-throttle-implement-throtl_grp-has_rules.patch
> 0033-blk-throttle-implement-proper-hierarchy-support.patch
>
> 0001-0005 prepare blkcg so that hierarchy operations are easier.
>
> 0006-0016 reorganize code piece-by-piece so that hierarchy support can
> be added. These don't change behaviors.
>
> 0017-0025 prepare for hierarchy support. Moves fields which are used
> in hierarchy to throtl_service_queue and define parent-child
> relationship.
>
> 0026-0032 make queueing, dispatching and configuration changes
> propagate through the hierarchy.
>
> 0033 implemenats hierarchy support.
>
> As we're in the middle of a merge window, this patchset is currently
> based on cgroup/for-3.10. Once 3.10-rc1 drops, I'll rebase the tree
> and send pull request to Jens so that it can be routed with other
> block changes. The patches are also available on the following git
> branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-throtl-hierarchy
>
> diffstat follows. Thanks.
>
> block/blk-cgroup.c | 105 +---
> block/blk-cgroup.h | 38 +
> block/blk-throttle.c | 1064 ++++++++++++++++++++++++++++++++++---------------
> include/linux/cgroup.h | 2
> 4 files changed, 822 insertions(+), 387 deletions(-)
>
> --
> tejun
>
> [L] http://thread.gmane.org/gmane.linux.kernel.containers/25845
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCHSET v2] blk-throttle: implement proper hierarchy support
[not found] ` <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (20 preceding siblings ...)
2013-05-07 14:02 ` [PATCHSET v2] " Vivek Goyal
@ 2013-05-07 14:16 ` Vivek Goyal
21 siblings, 0 replies; 39+ messages in thread
From: Vivek Goyal @ 2013-05-07 14:16 UTC (permalink / raw)
To: Tejun Heo
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Mon, May 06, 2013 at 03:45:39PM -0700, Tejun Heo wrote:
> Changes since the last take[L] are
>
> * Unnecessary throtl_schedule_delayed_work() call dropped from 0007.
>
> * throtl_log() implement in 0021 forgot to print space after blkg
> path. Fixed.
>
> * 0030-blk-throttle-add-throtl_qnode-for-dispatch-fairness.patch added
> to address dispatch fairness.
>
> * 0031-blk-throttle-Account-for-child-group-s-start-time-in.patch
> added to address unwarranted penalty of nested limit enforcement due
> to staggered delays of slice start times at multiple levels.
>
> The original patchset description follows.
Hi Tejun,
Can you please also update blkio-controller.txt file to reflect the
fact that now throttling also supports hierarchy. One of the places
which needs updation is following.
Hierarchical Cgroups
====================
- Currently only CFQ supports hierarchical groups. For throttling,
cgroup interface does allow creation of hierarchical cgroups and
internally it treats them as flat hierarchy.
Thanks
Vivek
^ permalink raw reply [flat|nested] 39+ messages in thread