From: Shaohua Li <shli@fb.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <axboe@kernel.dk>,
<tj@kernel.org>, <jmoyer@redhat.com>, <Kernel-team@fb.com>
Subject: Re: [RFC 2/3] blk-throttling: weight based throttling
Date: Thu, 21 Jan 2016 13:00:44 -0800 [thread overview]
Message-ID: <20160121210043.GA3696942@devbig084.prn1.facebook.com> (raw)
In-Reply-To: <20160121203332.GD8379@redhat.com>
On Thu, Jan 21, 2016 at 03:33:32PM -0500, Vivek Goyal wrote:
> On Wed, Jan 20, 2016 at 09:49:18AM -0800, Shaohua Li wrote:
> > We know total bandwidth of a disk and can calculate cgroup's bandwidth
> > percentage against disk bandwidth according to its weight. We can easily
> > calculate cgroup bandwidth.
> >
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > ---
> > block/blk-throttle.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 134 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > index 2149a1d..b3f847d 100644
> > --- a/block/blk-throttle.c
> > +++ b/block/blk-throttle.c
> > @@ -12,6 +12,9 @@
> > #include <linux/blk-cgroup.h>
> > #include "blk.h"
> >
> > +#define MAX_WEIGHT (1000)
> > +#define WEIGHT_RATIO_SHIFT (12)
> > +#define WEIGHT_RATIO (1 << WEIGHT_RATIO_SHIFT)
> > /* Max dispatch from a group in 1 round */
> > static int throtl_grp_quantum = 8;
> >
> > @@ -74,6 +77,10 @@ struct throtl_service_queue {
> > unsigned int nr_pending; /* # queued in the tree */
> > unsigned long first_pending_disptime; /* disptime of the first tg */
> > struct timer_list pending_timer; /* fires on first_pending_disptime */
> > +
> > + unsigned int weight;
> > + unsigned int children_weight;
> > + unsigned int ratio;
>
> Will it be better to call it "share" instead of "ratio". It is basically
> a measure of % disk share of the group and share seems more intuitive.
Ok
>
> [..]
> > +static void tg_update_bps(struct throtl_grp *tg)
> > +{
> > + struct throtl_service_queue *sq, *parent_sq;
> > +
> > + sq = &tg->service_queue;
> > + parent_sq = sq->parent_sq;
> > +
> > + if (!tg->td->weight_based || !parent_sq)
> > + return;
> > + sq->ratio = max_t(unsigned int,
> > + parent_sq->ratio * sq->weight / parent_sq->children_weight,
> > + 1);
> > +
>
> It might be good to decouple updation of "share/ratio" and updation of
> bps. Change of share can happen any time either weight is changed or
> an active group is queue/dequeued and we don't have to do it every time
> a bio is submitted.
Ok
> > + tg->bps[READ] = max_t(uint64_t,
> > + (queue_bandwidth(tg->td, READ) * sq->ratio) >>
> > + WEIGHT_RATIO_SHIFT,
> > + 1024);
> > + tg->bps[WRITE] = max_t(uint64_t,
> > + (queue_bandwidth(tg->td, WRITE) * sq->ratio) >>
> > + WEIGHT_RATIO_SHIFT,
> > + 1024);
> > +}
> > +
> > +static void tg_update_ratio(struct throtl_grp *tg)
> > +{
> > + struct throtl_data *td = tg->td;
> > + struct cgroup_subsys_state *pos_css;
> > + struct blkcg_gq *blkg;
> > +
> > + blkg_for_each_descendant_pre(blkg, pos_css, td->queue->root_blkg) {
>
> Is it possible to traverse only the affected subtree instead of whole
> tree of groups. Because if weight is updated on a group, then we just
> need to traverse the subtree under that group's parent.
makes sense
> [..]
> > @@ -1415,6 +1546,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
> >
> > sq = &tg->service_queue;
> >
> > + tg_update_bps(tg);
>
> Updating bps for every bio submitted sounds like a lot. We probably could
> do it when first bio gets queued in the group and then refresh it at
> some regular interval. Say when next set of dispatch happens from group
> we could update bandwidth of group after dispatch.
That calculation isn't very heavy. I'll revisit this if it's a problem.
Thanks,
Shaohua
next prev parent reply other threads:[~2016-01-21 21:01 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-20 17:49 [RFC 0/3] block: proportional based blk-throttling Shaohua Li
2016-01-20 17:49 ` [RFC 1/3] block: estimate disk bandwidth Shaohua Li
2016-01-20 17:49 ` [RFC 2/3] blk-throttling: weight based throttling Shaohua Li
2016-01-21 20:33 ` Vivek Goyal
2016-01-21 21:00 ` Shaohua Li [this message]
2016-01-20 17:49 ` [RFC 3/3] blk-throttling: detect inactive cgroup Shaohua Li
2016-01-21 20:44 ` Vivek Goyal
2016-01-21 21:05 ` Shaohua Li
2016-01-21 21:09 ` Vivek Goyal
2016-01-20 19:05 ` [RFC 0/3] block: proportional based blk-throttling Vivek Goyal
2016-01-20 19:34 ` Shaohua Li
2016-01-20 19:40 ` Vivek Goyal
2016-01-20 19:43 ` Shaohua Li
2016-01-20 19:54 ` Vivek Goyal
2016-01-20 21:11 ` Vivek Goyal
2016-01-20 21:34 ` Shaohua Li
2016-01-21 21:10 ` Tejun Heo
2016-01-21 22:24 ` Shaohua Li
2016-01-21 22:41 ` Tejun Heo
2016-01-22 0:00 ` Shaohua Li
2016-01-22 14:48 ` Tejun Heo
2016-01-22 15:52 ` Vivek Goyal
2016-01-22 18:00 ` Shaohua Li
2016-01-22 19:09 ` Vivek Goyal
2016-01-22 19:45 ` Shaohua Li
2016-01-22 20:04 ` Vivek Goyal
2016-01-22 17:57 ` Shaohua Li
2016-01-22 18:08 ` Tejun Heo
2016-01-22 19:11 ` Shaohua Li
2016-01-22 14:43 ` Vivek Goyal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160121210043.GA3696942@devbig084.prn1.facebook.com \
--to=shli@fb.com \
--cc=Kernel-team@fb.com \
--cc=axboe@kernel.dk \
--cc=jmoyer@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.