From: Paul Turner <pjt@google.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-kernel@vger.kernel.org,
Bharata B Rao <bharata@linux.vnet.ibm.com>,
Dhaval Giani <dhaval.giani@gmail.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
Ingo Molnar <mingo@elte.hu>, Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [patch 13/15] sched: expire slack quota using generation counters
Date: Wed, 6 Apr 2011 00:22:29 -0700
Message-ID: <BANLkTimOP_EtyxEZ9pOEpgJPqEEic21eCA@mail.gmail.com>
In-Reply-To: <1302010089.2225.1313.camel@twins>
On Tue, Apr 5, 2011 at 6:28 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2011-03-22 at 20:03 -0700, Paul Turner wrote:
>
> Argh, this patch is terrible for the reason that it changes the whole
> accounting just introduced and me having to re-open all the previous
> patches to look up how the stuff worked before.
I didn't think it was too drastic -- the introduction of the generation
is more of an incremental change. However, I agree it does cause
unnecessary churn within the accounting functions across the series.
Given that expiring quota is a requirement, this can be streamlined by
introducing some of these notions earlier in the series, as opposed to
bootstrapping them at the end here -- will clean it up.
>
>> @@ -436,8 +438,10 @@ void init_cfs_bandwidth(struct cfs_bandw
>> raw_spin_lock_init(&cfs_b->lock);
>> cfs_b->quota = cfs_b->runtime = quota;
>> cfs_b->period = ns_to_ktime(period);
>> + cfs_b->quota_generation = 0;
>> INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>>
>> +
>> hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> cfs_b->period_timer.function = sched_cfs_period_timer;
>
> We're in desperate need of more whitespace there? :-)
8< 8<
>
>> @@ -9333,6 +9337,8 @@ static int tg_set_cfs_bandwidth(struct t
>> raw_spin_lock_irq(&cfs_b->lock);
>> cfs_b->period = ns_to_ktime(period);
>> cfs_b->runtime = cfs_b->quota = quota;
>> +
>> + cfs_bump_quota_generation(cfs_b);
>> raw_spin_unlock_irq(&cfs_b->lock);
>>
>> for_each_possible_cpu(i) {
>> Index: tip/kernel/sched_fair.c
>> ===================================================================
>> --- tip.orig/kernel/sched_fair.c
>> +++ tip/kernel/sched_fair.c
>> @@ -1331,11 +1331,25 @@ static void check_cfs_rq_quota(struct cf
>> resched_task(rq_of(cfs_rq)->curr);
>> }
>>
>> +static void cfs_bump_quota_generation(struct cfs_bandwidth *cfs_b)
>> +{
>> + cfs_b->quota_generation++;
>> + smp_mb();
>> +}
>
> Memory barriers come in pairs and with a comment, you fail on both
> counts.
>
*scratches head*... erm, right. I meant to pair this with a read barrier
on querying the generation, but balked when I realized that still yields
an lfence on x86, and I didn't want to introduce that into the
update_curr() path.
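i.e. the read side I had in mind was something like (sketch only):

	static inline int cfs_rq_quota_current(struct cfs_rq *cfs_rq)
	{
		struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);

		/* pairs with the smp_mb() in cfs_bump_quota_generation() */
		smp_rmb();
		return cfs_rq->quota_generation == cfs_b->quota_generation;
	}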
While we can probably do away with the barrier completely (it's not
critical that we line up perfectly with the new generation), I've been
thinking about this one and I think I have something a little nicer
that also reduces the hits on the shared cacheline.
We can take advantage of the fact that sched_clocks are already
synchronized to within 2 jiffies and, when we refresh, store the quota's
expiration time instead of a generation.
This effectively yields a fairly simple control flow (we can use
rq->clock since these paths are always paired with update_rq_clock()
operations):
a) our rq->clock < expiration always implies quota is valid
Obviously, if our cpu's clock is ahead of the one that issued the quota,
our quota is still valid since the real deadline is even further away.
Even if our cpu's clock is behind by the maximum of 1.99 jiffies, the
amount of time the stale quota can remain valid is basically already
within our potential margin of error, since for a long-running process
we check on each tick edge anyway.
b) our rq->clock > expiration
Again there are two cases. If our cpu's clock is behind (or equal), then
the deadline has indeed passed and the quota is expired. This can be
confirmed by comparing the global deadline with our local one (the
global expiration will have advanced with the quota refresh for this to
be true).
We can also catch the case where our cpu is potentially ahead -- our
rq->clock > expiration but the global expiration has not yet advanced.
In this case we recognize that our quota is still valid and extend our
local expiration time by either the maximum margin of error or some
fraction thereof (say 1 jiffy), which is guaranteed to push us back into
case a) above. Again, this is within our existing margin of error due to
entity_tick() alignment.
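In rough pseudo-C, something like the below (the quota_expires fields
and helper name are hypothetical, just to illustrate the flow; it
assumes the bandwidth refresh sets cfs_b->quota_expires to clock +
period, and that we copy it into the cfs_rq whenever we acquire quota):

	static int cfs_rq_quota_expired(struct cfs_rq *cfs_rq)
	{
		struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
		struct rq *rq = rq_of(cfs_rq);

		/* case a: our clock hasn't reached the expiration, quota valid */
		if (rq->clock < cfs_rq->quota_expires)
			return 0;

		/* case b: the global deadline has advanced, our quota is stale */
		if (cfs_rq->quota_expires != cfs_b->quota_expires)
			return 1;

		/*
		 * Otherwise our clock is merely ahead of the issuing cpu's;
		 * extend the local expiration by ~1 jiffy so that subsequent
		 * checks fall back into case a until the next refresh.
		 */
		cfs_rq->quota_expires += TICK_NSEC;
		return 0;
	}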
This ends up looking a lot simpler and avoids much of the pressure on
the global variable, since we only need to compare against it when our
clock passes the expiration, at most once per quota period (the
extension then puts us back into case a, where we know we don't need to
consider it).
This ends up simpler than the generation muck and can be introduced
cleanly earlier in the series, avoiding the churn mentioned above.
Make sense?
>> +
>> +static inline int cfs_rq_quota_current(struct cfs_rq *cfs_rq)
>> +{
>> + struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
>> +
>> + return cfs_rq->quota_generation == cfs_b->quota_generation;
>> +}
>> +
>> static void request_cfs_rq_quota(struct cfs_rq *cfs_rq)
>> {
>> struct task_group *tg = cfs_rq->tg;
>> struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
>> u64 amount = 0, min_amount;
>> + int generation;
>
> Not initialized,
>
>> min_amount = sched_cfs_bandwidth_slice() + (-cfs_rq->quota_remaining);
>>
>> @@ -1347,10 +1361,18 @@ static void request_cfs_rq_quota(struct
>> } else {
>> amount = min_amount;
>> }
>> + generation = cfs_b->quota_generation;
>> raw_spin_unlock(&cfs_b->lock);
>> }
>
> and since there's an if there, one can fail it, leaving generation
> uninitialized,
>
>>
>> + /* a deficit should be carried forwards, surplus should be dropped */
>> +
>> + if (generation != cfs_rq->quota_generation &&
>> + cfs_rq->quota_remaining > 0)
>> + cfs_rq->quota_remaining = 0;
>> +
>> cfs_rq->quota_remaining += amount;
>> + cfs_rq->quota_generation = generation;
>> }
>
> Resulting in uninitialized usage right there.
>
Truth
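(One possible fix, sketch only: initialize it from the cfs_rq so that a
skipped locked section leaves the accounting untouched, e.g.

	int generation = cfs_rq->quota_generation;

though this may just disappear with the expiration-time approach above.)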
Thread overview: 63+ messages
2011-03-23 3:03 [patch 00/15] CFS Bandwidth Control V5 Paul Turner
2011-03-23 3:03 ` [patch 01/15] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
2011-03-24 12:38 ` Kamalesh Babulal
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 02/15] sched: validate CFS quota hierarchies Paul Turner
2011-03-23 10:39 ` torbenh
2011-03-23 20:49 ` Paul Turner
2011-03-24 6:31 ` Bharata B Rao
2011-04-08 17:01 ` Peter Zijlstra
2011-03-29 6:57 ` Hidetoshi Seto
2011-04-04 23:10 ` Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 03/15] sched: accumulate per-cfs_rq cpu usage Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-06 20:44 ` Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-06 20:47 ` Paul Turner
2011-03-23 3:03 ` [patch 04/15] sched: throttle cfs_rq entities which exceed their local quota Paul Turner
2011-03-23 5:09 ` Mike Galbraith
2011-03-23 20:53 ` Paul Turner
2011-03-24 6:36 ` Bharata B Rao
2011-03-24 7:40 ` Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-05 23:15 ` Paul Turner
2011-03-23 3:03 ` [patch 05/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-05 13:33 ` Peter Zijlstra
2011-04-05 13:28 ` Peter Zijlstra
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 06/15] sched: allow for positional tg_tree walks Paul Turner
2011-03-23 3:03 ` [patch 07/15] sched: prevent interactions between throttled entities and load-balance Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 08/15] sched: migrate throttled tasks on HOTPLUG Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-06 2:31 ` Paul Turner
2011-03-23 3:03 ` [patch 09/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 10/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 11/15] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 12/15] sched: maintain throttled rqs as a list Paul Turner
2011-04-22 2:50 ` Hidetoshi Seto
2011-04-24 21:23 ` Paul Turner
2011-03-23 3:03 ` [patch 13/15] sched: expire slack quota using generation counters Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-06 7:22 ` Paul Turner [this message]
2011-04-06 8:15 ` Peter Zijlstra
2011-04-06 11:26 ` Peter Zijlstra
2011-03-23 3:03 ` [patch 14/15] sched: return unused quota on voluntary sleep Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-04-06 2:25 ` Paul Turner
2011-03-23 3:03 ` [patch 15/15] sched: add documentation for bandwidth control Paul Turner
2011-03-24 6:38 ` Bharata B Rao
2011-03-24 16:12 ` [patch 00/15] CFS Bandwidth Control V5 Bharata B Rao
2011-03-31 7:57 ` Xiao Guangrong
2011-04-04 23:10 ` Paul Turner
2011-04-05 13:28 ` Peter Zijlstra
2011-05-20 2:12 ` Test for CFS Bandwidth Control V6 Xiao Guangrong
2011-05-24 0:53 ` Hidetoshi Seto
2011-05-24 7:56 ` Xiao Guangrong
2011-06-08 2:54 ` Paul Turner
2011-06-08 5:55 ` Hidetoshi Seto