From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Subject: [PATCH 31/33] blk-throttle: Account for child group's start time in parent while bio climbs up
Date: Mon,  6 May 2013 15:46:10 -0700
Message-ID: <1367880372-28312-32-git-send-email-tj@kernel.org>
In-Reply-To: <1367880372-28312-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

With the planned proper hierarchy support, a bio will climb up the
tree before actually being dispatched. This makes sure the bio is
also subject to the parent's throttling limits, if any.

The parent may be idle when a bio is transferred to it, in which
case a new slice starts fresh. That is incorrect: the parent's wait
time should have started when the bio was queued in the child group.
Starting fresh causes IOs to be throttled more than configured as
they climb the hierarchy.

Since the hierarchical algorithm is not written in a way that keeps
the child's and parent's time slices synchronized, we transfer the
child's start time to the parent if the parent was idling. If the
parent was busy dispatching other bios all this while, this is not
an issue.

The child's slice start time is passed to the parent. The parent
looks at the start time of its last expired slice. If the child's
start time is at or after the parent's old start time, the parent
had been idle and the child had an IO queued after the parent went
idle. In that case, use the child's start time as the parent's start
time.

If the parent's start time is after the child's start time, the
parent was not idle when the IO got queued in the child group. It
later dispatched some IO, its slice got trimmed, and then it went
idle. After a while the child's request got shifted into the parent
group. In this case, use the parent's old start time as the new
start time, as that is the part of the slice we did not use.
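
The two cases above reduce to picking the later of the two start
times, compared in a wrap-safe way. A minimal user-space sketch of
that choice follows. The sketch_* names are hypothetical and are not
part of the patch, and the comparison only imitates what the kernel's
time_after_eq() does for jiffies values:

```c
#include <stdbool.h>

/*
 * Wrap-safe "a is at or after b" for free-running tick counters,
 * written in the spirit of the kernel's time_after_eq().
 * Hypothetical helper, for illustration only.
 */
static bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Hypothetical helper: the slice start an idle parent ends up with
 * when a child's bio is transferred to it.
 *  - child start at or after parent's old start: take the child's
 *  - otherwise: keep the parent's old start
 */
static unsigned long sketch_parent_slice_start(unsigned long parent_old_start,
					       unsigned long child_start)
{
	if (sketch_time_after_eq(child_start, parent_old_start))
		return child_start;
	return parent_old_start;
}
```

For example, with a parent whose old slice started at tick 100, a
child start of 150 yields 150. With a parent old start of 200 and the
same child, the result stays at 200.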

This logic is far from perfect. If there are multiple children, the
first child to transfer a bio decides the start time, while a bio
might have been queued even earlier in another child and not yet
transferred up to the parent. In that case we lose time and
bandwidth in the parent. This patch is just an approximation to make
the situation somewhat better.
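
The multi-child limitation can be seen in a small user-space model.
This is only a sketch under stated assumptions: struct sketch_grp and
the sketch_* functions are hypothetical, the tick values are made up,
and the "slice used" test assumes a slice counts as used once the
current time falls outside [slice_start, slice_end], which this patch
does not show:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-direction slice state of a group. */
struct sketch_grp {
	unsigned long slice_start;
	unsigned long slice_end;
};

/* Assumption: a slice is "used" once now lies outside [start, end]. */
static bool sketch_slice_used(const struct sketch_grp *g, unsigned long now)
{
	return (long)(now - g->slice_start) < 0 ||
	       (long)(g->slice_end - now) < 0;
}

/*
 * Model of a child handing a bio to its parent at time now.
 * Only a parent whose slice has expired gets a new slice with credit.
 */
static void sketch_transfer(struct sketch_grp *parent,
			    unsigned long child_start,
			    unsigned long now, unsigned long slice_len)
{
	if (!sketch_slice_used(parent, now))
		return;		/* parent is mid-slice, nothing changes */

	/* Take the child's start only if it is not before the old one. */
	if ((long)(child_start - parent->slice_start) >= 0)
		parent->slice_start = child_start;
	parent->slice_end = now + slice_len;
}
```

Take a parent whose last slice was [50, 60]. A child whose slice
started at 120 transfers a bio at tick 130 with a slice length of 100,
so the parent's slice becomes [120, 230]. A second child whose bio has
been queued since tick 100 transfers at tick 135: the parent is now
mid-slice, its start stays at 120, and the 20 ticks between 100 and
120 are lost to the parent.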

Signed-off-by: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 block/blk-throttle.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 541bd0d..7477f33 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -633,6 +633,28 @@ static bool throtl_schedule_next_dispatch(struct throtl_service_queue *sq,
 	return false;
 }
 
+static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
+		bool rw, unsigned long start)
+{
+	tg->bytes_disp[rw] = 0;
+	tg->io_disp[rw] = 0;
+
+	/*
+	 * Previous slice has expired. We must have trimmed it after last
+	 * bio dispatch. That means since start of last slice, we never used
+	 * that bandwidth. Do try to make use of that bandwidth while giving
+	 * credit.
+	 */
+	if (time_after_eq(start, tg->slice_start[rw]))
+		tg->slice_start[rw] = start;
+
+	tg->slice_end[rw] = jiffies + throtl_slice;
+	throtl_log(&tg->service_queue,
+		   "[%c] new slice with credit start=%lu end=%lu jiffies=%lu",
+		   rw == READ ? 'R' : 'W', tg->slice_start[rw],
+		   tg->slice_end[rw], jiffies);
+}
+
 static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
 {
 	tg->bytes_disp[rw] = 0;
@@ -992,6 +1014,16 @@ static void tg_update_disptime(struct throtl_grp *tg)
 	tg->flags &= ~THROTL_TG_WAS_EMPTY;
 }
 
+static void start_parent_slice_with_credit(struct throtl_grp *child_tg,
+					struct throtl_grp *parent_tg, bool rw)
+{
+	if (throtl_slice_used(parent_tg, rw)) {
+		throtl_start_new_slice_with_credit(parent_tg, rw,
+				child_tg->slice_start[rw]);
+	}
+
+}
+
 static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
@@ -1020,6 +1052,7 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	 */
 	if (parent_tg) {
 		throtl_add_bio_tg(bio, &tg->qnode_on_parent[rw], parent_tg);
+		start_parent_slice_with_credit(tg, parent_tg, rw);
 	} else {
 		throtl_qnode_add_bio(bio, &tg->qnode_on_parent[rw],
 				     &parent_sq->queued[rw]);
-- 
1.8.1.4
