Re: [PATCH 4/8] bfq: Limit number of requests consumed by each cgroup

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Jan Kara <jack@suse.cz>, Paolo Valente <paolo.valente@linaro.org>,
	linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 4/8] bfq: Limit number of requests consumed by each cgroup
Date: Wed, 3 Nov 2021 14:03:14 +0100	[thread overview]
Message-ID: <20211103130314.GC20482@quack2.suse.cz> (raw)
In-Reply-To: <20211102181658.GA63407@blackbody.suse.cz>

Hello,

On Tue 02-11-21 19:16:58, Michal Koutný wrote:
> On Wed, Oct 06, 2021 at 07:31:43PM +0200, Jan Kara <jack@suse.cz> wrote:
> > +	for (level--; level >= 0; level--) {
> > +		entity = entities[level];
> > +		if (level > 0) {
> > +			wsum = bfq_entity_service_tree(entity)->wsum;
> > +		} else {
> > +			int i;
> > +			/*
> > +			 * For bfqq itself we take into account service trees
> > +			 * of all higher priority classes and multiply their
> > +			 * weights so that low prio queue from higher class
> > +			 * gets more requests than high prio queue from lower
> > +			 * class.
> > +			 */
> > +			wsum = 0;
> > +			for (i = 0; i <= class_idx; i++) {
> > +				wsum = wsum * IOPRIO_BE_NR +
> > +					sched_data->service_tree[i].wsum;
> > +			}
> > +		}
> > +		limit = DIV_ROUND_CLOSEST(limit * entity->weight, wsum);
> 
> This scheme caught my eye. You mutliply (tree) weights by a factor
> depending on the class when counting the wsum but then you don't apply
> the same scaling for the evaluated entity in the numerator.

Since we stop the loop at bfq_class_idx(entity) I don't think scaling of
the numerator makes sense - effectively when all the processes having IO to
submit (these are accounted in wsum) are in the same IO priority class, the
above code collapses to just:

  limit = limit * entity->weight / sched_data->service_tree[bfq_class_idx(entity)].wsum

I.e., we scale available tags proportionally to bfq_queue weight (which
scales linearly with IO priority).

When there are processes say both in RT and BE IO priority classes, then a
process in RT class still uses the above formula - i.e., as if all tags
available for a cgroup are split proportionally only among active tasks in
RT IO priority class. So in principle it can happen that there would be no
tag left for a process in lower IO priority class - and that is fine, we
don't care, because we don't want to submit IO from lower IO priority class
while there is still IO in higher IO priority class.

Now consider a situation for a process in BE IO priority class in this
setting. All processes in BE class can together occupy at most BE_wsum /
(RT_wsum * IOPRIO_BE_NR + BE_wsum) fraction of tags. This is admittedly
somewhat arbitrary fraction but it makes sure for each process in RT class
there are at least as many tags left as for the highest priority process in
BE class.

> IOW, I think there should be something like
> 	scale = (level > 0) ? 1 : int_pow(IOPRIO_BE_NR, BFQ_IOPRIO_CLASSES - bfq_class_idx(entity));
> 	limit = DIV_ROUND_CLOSEST(limit * entity->weight * scale, wsum);
> 
> For instance, if there are two cgroups (level=1) with weights 100 and
> 200, and each cgroup has a single IOPRIO_CLASS_BE entity (level=0) in
> it, the `limit` distribution would honor the ratio of weights from
> level=1 (100:200) but it would artificially lower the absolute amount of
> allowed tags. If I am not mistaken, that would be reduced by factor
> 1/BFQ_IOPRIO_CLASSES.

I don't see where my code would lead to available number of tags being
artifically lower as you write - in your example wsum for RT class would be
0 so effectively all terms of the formula for that class will cancel out.
As I wrote above, the highest active IO priority class effectively allows
processes in this class to consume all tags available for a cgroup. If
there are lower IO priority classes active as well, we allow them to
consume some tags but never allow them to consume all of them...

> Also if I consider it more broadly, is this supposed to match/extend
> bfq_io_prio_to_weight() calculation?

Yes, this is kind of an extension of bfq_io_prio_to_weight() that allows
some comparison of queues from different IO priority classes.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

next prev parent reply	other threads:[~2021-11-03 13:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-06 17:31 [PATCH 0/8 v3] bfq: Limit number of allocated scheduler tags per cgroup Jan Kara
2021-10-06 17:31 ` [PATCH 1/8] block: Provide icq in request allocation data Jan Kara
2021-10-06 17:31 ` [PATCH 2/8] bfq: Track number of allocated requests in bfq_entity Jan Kara
2021-10-06 17:31 ` [PATCH 3/8] bfq: Store full bitmap depth in bfq_data Jan Kara
2021-10-06 17:31 ` [PATCH 4/8] bfq: Limit number of requests consumed by each cgroup Jan Kara
2021-11-02 18:16   ` Michal Koutný
2021-11-03 13:03     ` Jan Kara [this message]
2021-11-03 18:12       ` Michal Koutný
2021-11-04 11:20         ` Jan Kara
2021-10-06 17:31 ` [PATCH 5/8] bfq: Limit waker detection in time Jan Kara
2021-10-06 17:31 ` [PATCH 6/8] bfq: Provide helper to generate bfqq name Jan Kara
2021-10-06 17:31 ` [PATCH 7/8] bfq: Log waker detections Jan Kara
2021-10-06 17:31 ` [PATCH 8/8] bfq: Do not let waker requests skip proper accounting Jan Kara
2021-10-07 16:33 ` [PATCH 0/8 v3] bfq: Limit number of allocated scheduler tags per cgroup Paolo Valente
2021-10-25  7:58   ` Paolo Valente
2021-10-25 11:14     ` Jan Kara
2021-11-10 10:24       ` Paolo Valente
  -- strict thread matches above, loose matches on Subject: below --
2021-11-23 10:29 [PATCH 0/8 v4] " Jan Kara
2021-11-23 10:29 ` [PATCH 4/8] bfq: Limit number of requests consumed by each cgroup Jan Kara
2021-11-25 13:36 [PATCH 0/8 v5] bfq: Limit number of allocated scheduler tags per cgroup Jan Kara
2021-11-25 13:36 ` [PATCH 4/8] bfq: Limit number of requests consumed by each cgroup Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211103130314.GC20482@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=mkoutny@suse.com \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox