linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paul Turner <pjt@google.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Bharata B Rao <bharata@linux.vnet.ibm.com>,
	Dhaval Giani <dhaval.giani@gmail.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>, Pavel Emelyanov <xemul@openvz.org>,
	Nikhil Rao <ncrao@google.com>
Subject: Re: [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh
Date: Wed, 11 May 2011 02:24:14 -0700	[thread overview]
Message-ID: <BANLkTi=NmCxKX6EbDQcJYDJ5kKyG2N1ssw@mail.gmail.com> (raw)
In-Reply-To: <4DC8E810.1020208@jp.fujitsu.com>

On Tue, May 10, 2011 at 12:24 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> Some comments...
>
> (2011/05/03 18:28), Paul Turner wrote:
>> At the start of a new period there are several actions we must refresh the
>> global bandwidth pool as well as unthrottle any cfs_rq entities who previously
>> ran out of bandwidth (as quota permits).
>>
>> Unthrottled entities have the cfs_rq->throttled flag cleared and are re-enqueued
>> into the cfs entity hierarchy.
>>
>> Signed-off-by: Paul Turner <pjt@google.com>
>> Signed-off-by: Nikhil Rao <ncrao@google.com>
>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>> ---
>>  kernel/sched.c      |    3 +
>>  kernel/sched_fair.c |  105 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 107 insertions(+), 1 deletion(-)
>>
>> Index: tip/kernel/sched.c
>> ===================================================================
>> --- tip.orig/kernel/sched.c
>> +++ tip/kernel/sched.c
>> @@ -9294,6 +9294,9 @@ static int tg_set_cfs_bandwidth(struct t
>>               cfs_rq->runtime_enabled = quota != RUNTIME_INF;
>>               cfs_rq->runtime_remaining = 0;
>>               cfs_rq->runtime_expires = runtime_expires;
>> +
>> +             if (cfs_rq_throttled(cfs_rq))
>> +                     unthrottle_cfs_rq(cfs_rq);
>>               raw_spin_unlock_irq(&rq->lock);
>>       }
>>  out_unlock:
>> Index: tip/kernel/sched_fair.c
>> ===================================================================
>> --- tip.orig/kernel/sched_fair.c
>> +++ tip/kernel/sched_fair.c
>> @@ -1456,10 +1456,88 @@ static void check_enqueue_throttle(struc
>>               throttle_cfs_rq(cfs_rq);
>>  }
>>
>> +static void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>> +{
>> +     struct rq *rq = rq_of(cfs_rq);
>> +     struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
>> +     struct sched_entity *se;
>> +     int enqueue = 1;
>> +     long task_delta;
>> +
>> +     se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
>> +
>> +     cfs_rq->throttled = 0;
>> +     raw_spin_lock(&cfs_b->lock);
>> +     list_del_rcu(&cfs_rq->throttled_list);
>> +     raw_spin_unlock(&cfs_b->lock);
>> +
>> +     if (!cfs_rq->load.weight)
>> +             return;
>> +
>> +     task_delta = cfs_rq->h_nr_running;
>> +     for_each_sched_entity(se) {
>> +             if (se->on_rq)
>> +                     enqueue = 0;
>> +
>> +             cfs_rq = cfs_rq_of(se);
>> +             if (enqueue)
>> +                     enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>> +             cfs_rq->h_nr_running += task_delta;
>> +
>> +             if (cfs_rq_throttled(cfs_rq))
>> +                     break;
>> +     }
>> +
>> +     if (!se)
>> +             rq->nr_running += task_delta;
>> +
>> +     /* determine whether we need to wake up potentially idle cpu */
>> +     if (rq->curr == rq->idle && rq->cfs.nr_running)
>> +             resched_task(rq->curr);
>> +}
>> +
>> +static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
>> +             u64 remaining, u64 expires)
>> +{
>> +     struct cfs_rq *cfs_rq;
>> +     u64 runtime = remaining;
>> +
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
>> +                             throttled_list) {
>> +             struct rq *rq = rq_of(cfs_rq);
>> +
>> +             raw_spin_lock(&rq->lock);
>> +             if (!cfs_rq_throttled(cfs_rq))
>> +                     goto next;
>> +
>> +             runtime = -cfs_rq->runtime_remaining + 1;
>
> It will helpful if a comment can explain why negative and 1.

Remaining runtime of <= 0 implies that there was no bandwidth
available.  See checks below et al. in check_... functions.

We choose the minimum amount here to return to a positive quota state.

Originally I had elected to take a full slice here.  The limitation
became that this then effectively duplicated the assign_cfs_rq_runtime
path, and would require the quota handed out in each to be in
lockstep.  Another trade-off is be that when we're in a large state of
arrears, handing out this extra bandwidth (in excess of the minimum
+1) up-front may prevent us from unthrottling another cfs_rq.

Will add a comment explaining that the minimum amount to leave arrears
is chosen above.

>
>> +             if (runtime > remaining)
>> +                     runtime = remaining;
>> +             remaining -= runtime;
>> +
>> +             cfs_rq->runtime_remaining += runtime;
>> +             cfs_rq->runtime_expires = expires;
>> +
>> +             /* we check whether we're throttled above */
>> +             if (cfs_rq->runtime_remaining > 0)
>> +                     unthrottle_cfs_rq(cfs_rq);
>> +
>> +next:
>> +             raw_spin_unlock(&rq->lock);
>> +
>> +             if (!remaining)
>> +                     break;
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     return remaining;
>> +}
>> +
>>  static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>>  {
>>       u64 quota, runtime = 0, runtime_expires;
>> -     int idle = 0;
>> +     int idle = 0, throttled = 0;
>>
>>       runtime_expires = sched_clock_cpu(smp_processor_id());
>>
>> @@ -1469,6 +1547,7 @@ static int do_sched_cfs_period_timer(str
>>       if (quota != RUNTIME_INF) {
>>               runtime = quota;
>>               runtime_expires += ktime_to_ns(cfs_b->period);
>> +             throttled = !list_empty(&cfs_b->throttled_cfs_rq);
>>
>>               cfs_b->runtime = runtime;
>>               cfs_b->runtime_expires = runtime_expires;
>> @@ -1477,6 +1556,30 @@ static int do_sched_cfs_period_timer(str
>>       }
>>       raw_spin_unlock(&cfs_b->lock);
>>
>> +     if (!throttled || quota == RUNTIME_INF)
>> +             goto out;
>> +     idle = 0;
>> +
>> +retry:
>> +     runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
>> +
>> +     raw_spin_lock(&cfs_b->lock);
>> +     /* new new bandwidth may have been set */
>
> Typo? new, newer, newest...?
>

s/new new/new/ :)

>> +     if (unlikely(runtime_expires != cfs_b->runtime_expires))
>> +             goto out_unlock;
>> +     /*
>> +      * make sure no-one was throttled while we were handing out the new
>> +      * runtime.
>> +      */
>> +     if (runtime > 0 && !list_empty(&cfs_b->throttled_cfs_rq)) {
>> +             raw_spin_unlock(&cfs_b->lock);
>> +             goto retry;
>> +     }
>> +     cfs_b->runtime = runtime;
>> +     cfs_b->idle = idle;
>> +out_unlock:
>> +     raw_spin_unlock(&cfs_b->lock);
>> +out:
>>       return idle;
>>  }
>>  #else
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> It would be better if this unthrottle patch (09/15) comes before
> throttle patch (08/15) in this series, not to make a small window
> in the history that throttled entity never back to the run queue.
> But I'm just paranoid...
>

The feature is inert unless bandwidth is set so this should be safe.

The trade-off with reversing the order is that a patch undoing state
that doesn't yet exist looks very strange :).  If the above is a
concern I'd probably prefer to separate it into 3 parts:
1. add throttle
2. add unthrottle
3. enable throttle

Where (3) would consist only of the enqueue/put checks to trigger throttling.


>
> Thanks,
> H.Seto
>
>

  reply	other threads:[~2011-05-11 16:30 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-03  9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
2011-05-03  9:28 ` [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
2011-05-10  7:14   ` Hidetoshi Seto
2011-05-10  8:32     ` Mike Galbraith
2011-05-11  7:55       ` Hidetoshi Seto
2011-05-11  8:13         ` Paul Turner
2011-05-11  8:45           ` Mike Galbraith
2011-05-11  8:59             ` Hidetoshi Seto
2011-05-03  9:28 ` [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
2011-05-10  7:17   ` Hidetoshi Seto
2011-05-03  9:28 ` [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
2011-05-10  7:18   ` Hidetoshi Seto
2011-05-03  9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
2011-05-10  7:20   ` Hidetoshi Seto
2011-05-11  9:37     ` Paul Turner
2011-05-16  9:30   ` Peter Zijlstra
2011-05-16  9:43   ` Peter Zijlstra
2011-05-16 12:32     ` Paul Turner
2011-05-17 15:26       ` Peter Zijlstra
2011-05-18  7:16         ` Paul Turner
2011-05-18 11:57           ` Peter Zijlstra
2011-05-03  9:28 ` [patch 05/15] sched: add a timer to handle CFS bandwidth refresh Paul Turner
2011-05-10  7:21   ` Hidetoshi Seto
2011-05-11  9:27     ` Paul Turner
2011-05-16 10:18   ` Peter Zijlstra
2011-05-16 12:56     ` Paul Turner
2011-05-03  9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
2011-05-10  7:22   ` Hidetoshi Seto
2011-05-11  9:25     ` Paul Turner
2011-05-16 10:27   ` Peter Zijlstra
2011-05-16 12:59     ` Paul Turner
2011-05-17 15:28       ` Peter Zijlstra
2011-05-18  7:02         ` Paul Turner
2011-05-16 10:32   ` Peter Zijlstra
2011-05-03  9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
2011-05-10  7:22   ` Hidetoshi Seto
2011-05-16 11:05   ` Peter Zijlstra
2011-05-16 11:07   ` Peter Zijlstra
2011-05-03  9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
2011-05-10  7:23   ` Hidetoshi Seto
2011-05-16 15:58   ` Peter Zijlstra
2011-05-16 16:05   ` Peter Zijlstra
2011-05-03  9:28 ` [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
2011-05-10  7:24   ` Hidetoshi Seto
2011-05-11  9:24     ` Paul Turner [this message]
2011-05-03  9:28 ` [patch 10/15] sched: allow for positional tg_tree walks Paul Turner
2011-05-10  7:24   ` Hidetoshi Seto
2011-05-17 13:31   ` Peter Zijlstra
2011-05-18  7:18     ` Paul Turner
2011-05-03  9:28 ` [patch 11/15] sched: prevent interactions between throttled entities and load-balance Paul Turner
2011-05-10  7:26   ` Hidetoshi Seto
2011-05-11  9:11     ` Paul Turner
2011-05-03  9:28 ` [patch 12/15] sched: migrate throttled tasks on HOTPLUG Paul Turner
2011-05-10  7:27   ` Hidetoshi Seto
2011-05-11  9:10     ` Paul Turner
2011-05-03  9:28 ` [patch 13/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
2011-05-10  7:27   ` Hidetoshi Seto
2011-05-11  7:56   ` Hidetoshi Seto
2011-05-11  9:09     ` Paul Turner
2011-05-03  9:29 ` [patch 14/15] sched: return unused runtime on voluntary sleep Paul Turner
2011-05-10  7:28   ` Hidetoshi Seto
2011-05-03  9:29 ` [patch 15/15] sched: add documentation for bandwidth control Paul Turner
2011-05-10  7:29   ` Hidetoshi Seto
2011-05-11  9:09     ` Paul Turner
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
2011-06-08  3:09   ` Paul Turner
2011-06-08 10:46   ` Vladimir Davydov
2011-06-08 16:32     ` Kamalesh Babulal
2011-06-09  3:25       ` Paul Turner
2011-06-10 18:17         ` Kamalesh Babulal
2011-06-14  0:00           ` Paul Turner
2011-06-15  5:37             ` Kamalesh Babulal
2011-06-21 19:48               ` Paul Turner
2011-06-24 15:05                 ` Kamalesh Babulal
2011-09-07 11:00                 ` Srivatsa Vaddagiri
2011-09-07 14:54                 ` Srivatsa Vaddagiri
2011-09-07 15:20                 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinnede Srivatsa Vaddagiri
2011-09-07 19:22                   ` Peter Zijlstra
2011-09-08 15:15                     ` Srivatsa Vaddagiri
2011-09-09 12:31                       ` Peter Zijlstra
2011-09-09 13:26                         ` Srivatsa Vaddagiri
2011-09-12 10:17                         ` Srivatsa Vaddagiri
2011-09-12 12:35                           ` Peter Zijlstra
2011-09-13  4:15                             ` Srivatsa Vaddagiri
2011-09-13  5:03                               ` Srivatsa Vaddagiri
2011-09-13  5:05                                 ` Srivatsa Vaddagiri
2011-09-13  9:39                                 ` Peter Zijlstra
2011-09-13 11:28                                   ` Srivatsa Vaddagiri
2011-09-13 14:07                                     ` Peter Zijlstra
2011-09-13 16:21                                       ` Srivatsa Vaddagiri
2011-09-13 16:33                                         ` Peter Zijlstra
2011-09-13 17:41                                           ` Srivatsa Vaddagiri
2011-09-13 16:36                                         ` Peter Zijlstra
2011-09-13 17:54                                           ` Srivatsa Vaddagiri
2011-09-13 18:03                                             ` Peter Zijlstra
2011-09-13 18:12                                               ` Srivatsa Vaddagiri
2011-09-13 18:07                                             ` Peter Zijlstra
2011-09-13 18:19                                             ` Peter Zijlstra
2011-09-13 18:28                                               ` Srivatsa Vaddagiri
2011-09-13 18:30                                                 ` Peter Zijlstra
2011-09-13 18:35                                                   ` Srivatsa Vaddagiri
2011-09-15 17:55                                                     ` Kamalesh Babulal
2011-09-15 21:48                                                       ` Peter Zijlstra
2011-09-19 17:51                                                         ` Kamalesh Babulal
2011-09-20  0:38                                                           ` Venki Pallipadi
2011-09-20 11:09                                                             ` Kamalesh Babulal
2011-09-20 13:56                                                           ` Peter Zijlstra
2011-09-20 14:04                                                           ` Peter Zijlstra
2011-09-20 12:55                                                       ` Peter Zijlstra
2011-09-21 17:34                                                         ` Kamalesh Babulal
2011-09-13 14:19                               ` Peter Zijlstra
2011-09-13 18:01                                 ` Srivatsa Vaddagiri
2011-09-13 18:23                                   ` Peter Zijlstra
2011-09-16  8:14                                     ` Paul Turner
2011-09-16  8:28                                       ` Peter Zijlstra
2011-09-19 16:35                                         ` Srivatsa Vaddagiri
2011-09-16  8:22                   ` Paul Turner
2011-06-14 10:16   ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Hidetoshi Seto
2011-06-14  6:58 ` [patch 00/15] CFS Bandwidth Control V6 Hu Tao
2011-06-14  7:29   ` Hidetoshi Seto
2011-06-14  7:44     ` Hu Tao
2011-06-15  8:37     ` Hu Tao
2011-06-16  0:57       ` Hidetoshi Seto
2011-06-16  9:45         ` Hu Tao
2011-06-17  1:22           ` Hidetoshi Seto
2011-06-17  6:05             ` Hu Tao
2011-06-17  6:25             ` Paul Turner
2011-06-17  9:13               ` Hidetoshi Seto
2011-06-18  0:28                 ` Paul Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='BANLkTi=NmCxKX6EbDQcJYDJ5kKyG2N1ssw@mail.gmail.com' \
    --to=pjt@google.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=bharata@linux.vnet.ibm.com \
    --cc=dhaval.giani@gmail.com \
    --cc=kamalesh@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=ncrao@google.com \
    --cc=seto.hidetoshi@jp.fujitsu.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=vatsa@in.ibm.com \
    --cc=xemul@openvz.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).