From: Vivek Goyal <vgoyal@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org,
lizefan@huawei.com, containers@lists.linux-foundation.org,
cgroups@vger.kernel.org
Subject: Re: [PATCHSET] blk-throttle: implement proper hierarchy support
Date: Thu, 2 May 2013 14:08:15 -0400 [thread overview]
Message-ID: <20130502180815.GG30020@redhat.com> (raw)
In-Reply-To: <20130502173428.GA4771@redhat.com>
On Thu, May 02, 2013 at 01:34:28PM -0400, Vivek Goyal wrote:
> On Wed, May 01, 2013 at 05:39:18PM -0700, Tejun Heo wrote:
>
> [..]
> > While this patchset contains many patches, the implementation is
> > pretty straight-forward. throtl_grp's form a tree anchored at
> > throtl_data and bios climb the tree as they get dispatched at each
> > level. The bios which reach the top of the tree - throl_data - are
> > issued.
>
> Have a question here. Looks like when bio climbs from child group
> to parent group, then parent group slice starts fresh if parent
> was empty. So if we have a parent with 1MB/s limit and a child with
> 1MB/s limit and a bio gets queued in child, then looks like effective
> IO rate would be .5MB/s and not 1MB/s?
>
> IOW, when child gets queued, we should start time accounting for
> all parents in the hiearchy too.
Hi Tejun,
Also, IIUC, there might be bandwidth sharing problems. Once a bio
climbs up the ladder, it gets queued at the end of the parent queue. And
that can lead to unfair distribution of available bandwidth. For
example,
G1
/ \
T1 G2
|
T2
G1 and G2 are 2 groups and T1 and T2 are tasks in groups respectively.
Assume both G1 and G2 are having 1MB/s IO rate limit. Assume T1 and
T2 are doing enough IO to keep respective queues backlogged.
Now While T2 is backlogged in in G2, T1 can queue up multiple 1MB
size bio and all these bio's will be served first and then one bio
from T2. And this can repeat for long time and problem only worsens
with hierarchy depth.
Ideally both T1 and G2 should share 1MB/s link equally (That is .5MB/sec)
each but in this case, T1 can run away with lot more than fair share.
Can't think how to get fair distribution of available bandwidth with
bio climbing the tree model.
I was thinking that we should implement it something along the lines
of what cpu scheduler has done. All parent groups get enqueued on
service tree when IO gets queued in any of child groups. Time slice
accounting starts at each level. And at each level we do round robin
for dispatch of bio from each eligible child group/queue.
Thanks
Vivek
next prev parent reply other threads:[~2013-05-02 18:08 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-02 0:39 [PATCHSET] blk-throttle: implement proper hierarchy support Tejun Heo
2013-05-02 0:39 ` [PATCH 01/31] blkcg: fix error return path in blkg_create() Tejun Heo
2013-05-02 0:39 ` [PATCH 02/31] blkcg: move blkg_for_each_descendant_pre() to block/blk-cgroup.h Tejun Heo
2013-05-02 0:39 ` [PATCH 03/31] blkcg: implement blkg_for_each_descendant_post() Tejun Heo
2013-05-02 0:39 ` [PATCH 04/31] blkcg: invoke blkcg_policy->pd_init() after parent is linked Tejun Heo
2013-05-02 0:39 ` [PATCH 05/31] blkcg: move bulk of blkcg_gq release operations to the RCU callback Tejun Heo
2013-05-02 0:39 ` [PATCH 06/31] blk-throttle: remove spurious throtl_enqueue_tg() call from throtl_select_dispatch() Tejun Heo
2013-05-02 0:39 ` [PATCH 07/31] blk-throttle: removed deferred config application mechanism Tejun Heo
2013-05-02 14:49 ` Vivek Goyal
2013-05-02 17:27 ` Tejun Heo
2013-05-02 0:39 ` [PATCH 08/31] blk-throttle: collapse throtl_dispatch() into the work function Tejun Heo
2013-05-02 0:39 ` [PATCH 09/31] blk-throttle: relocate throtl_schedule_delayed_work() Tejun Heo
2013-05-02 0:39 ` [PATCH 10/31] blk-throttle: remove pointless throtl_nr_queued() optimizations Tejun Heo
2013-05-02 0:39 ` [PATCH 11/31] blk-throttle: rename throtl_rb_root to throtl_service_queue Tejun Heo
2013-05-02 0:39 ` [PATCH 12/31] blk-throttle: simplify throtl_grp flag handling Tejun Heo
2013-05-02 0:39 ` [PATCH 13/31] blk-throttle: add backlink pointer from throtl_grp to throtl_data Tejun Heo
2013-05-02 0:39 ` [PATCH 14/31] blk-throttle: pass around throtl_service_queue instead of throtl_data Tejun Heo
2013-05-02 0:39 ` [PATCH 15/31] blk-throttle: reorganize throtl_service_queue passed around as argument Tejun Heo
2013-05-02 15:21 ` Vivek Goyal
2013-05-02 17:29 ` Tejun Heo
2013-05-02 0:39 ` [PATCH 16/31] blk-throttle: add throtl_grp->service_queue Tejun Heo
2013-05-02 0:39 ` [PATCH 17/31] blk-throttle: move bio_lists[] and friends to throtl_service_queue Tejun Heo
2013-05-02 0:39 ` [PATCH 18/31] blk-throttle: dispatch to throtl_data->service_queue.bio_lists[] Tejun Heo
2013-05-02 0:39 ` [PATCH 19/31] blk-throttle: generalize update_disptime optimization in blk_throtl_bio() Tejun Heo
2013-05-02 0:39 ` [PATCH 20/31] blk-throttle: add throtl_service_queue->parent_sq Tejun Heo
2013-05-02 0:39 ` [PATCH 21/31] blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log() Tejun Heo
2013-05-06 17:36 ` Vivek Goyal
2013-05-06 18:38 ` Tejun Heo
2013-05-06 20:38 ` Tejun Heo
2013-05-06 20:39 ` Tejun Heo
2013-05-06 20:41 ` Vivek Goyal
2013-05-06 20:43 ` Tejun Heo
2013-05-02 0:39 ` [PATCH 22/31] blk-throttle: set REQ_THROTTLED from throtl_charge_bio() and gate stats update with it Tejun Heo
2013-05-02 0:39 ` [PATCH 23/31] blk-throttle: separate out throtl_service_queue->pending_timer from throtl_data->dispatch_work Tejun Heo
2013-05-02 0:39 ` [PATCH 24/31] blk-throttle: implement dispatch looping Tejun Heo
2013-05-02 0:39 ` [PATCH 25/31] blk-throttle: dispatch from throtl_pending_timer_fn() Tejun Heo
2013-05-02 0:39 ` [PATCH 26/31] blk-throttle: make blk_throtl_drain() ready for hierarchy Tejun Heo
2013-05-02 0:39 ` [PATCH 27/31] blk-throttle: make blk_throtl_bio() " Tejun Heo
2013-05-02 0:39 ` [PATCH 28/31] blk-throttle: make tg_dispatch_one_bio() " Tejun Heo
2013-05-02 0:39 ` [PATCH 29/31] blk-throttle: make throtl_pending_timer_fn() " Tejun Heo
2013-05-02 0:39 ` [PATCH 30/31] blk-throttle: implement throtl_grp->has_rules[] Tejun Heo
2013-05-02 0:39 ` [PATCH 31/31] blk-throttle: implement proper hierarchy support Tejun Heo
2013-05-02 17:34 ` [PATCHSET] " Vivek Goyal
2013-05-02 17:57 ` Tejun Heo
2013-05-02 18:17 ` Vivek Goyal
2013-05-02 18:29 ` Tejun Heo
2013-05-02 18:45 ` Vivek Goyal
2013-05-02 18:49 ` Tejun Heo
2013-05-02 19:07 ` Vivek Goyal
2013-05-02 19:11 ` Tejun Heo
2013-05-02 19:31 ` Vivek Goyal
2013-05-02 23:13 ` Tejun Heo
2013-05-03 17:56 ` Vivek Goyal
2013-05-03 18:57 ` Tejun Heo
2013-05-03 18:58 ` Tejun Heo
2013-05-03 19:08 ` Vivek Goyal
2013-05-03 19:14 ` Tejun Heo
2013-05-03 19:26 ` Vivek Goyal
2013-05-03 21:05 ` Vivek Goyal
2013-05-03 23:54 ` Tejun Heo
2013-05-06 17:33 ` Vivek Goyal
2013-05-02 18:08 ` Vivek Goyal [this message]
2013-05-02 18:44 ` Tejun Heo
2013-05-02 18:59 ` Vivek Goyal
2013-05-04 0:50 ` [PATCH 29.5/32] blk-throttle: add throtl_qnode for dispatch fairness Tejun Heo
2013-05-04 0:53 ` Tejun Heo
2013-05-06 16:00 ` Vivek Goyal
2013-05-06 18:35 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130502180815.GG30020@redhat.com \
--to=vgoyal@redhat.com \
--cc=axboe@kernel.dk \
--cc=cgroups@vger.kernel.org \
--cc=containers@lists.linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).