From: bsegall@google.com
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>, pjt@google.com
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily
Date: Thu, 16 Jun 2016 10:33:19 -0700 [thread overview]
Message-ID: <xm26fusdvy1s.fsf@bsegall-linux.mtv.corp.google.com> (raw)
In-Reply-To: <5762E090.4000706@yandex-team.ru> (Konstantin Khlebnikov's message of "Thu, 16 Jun 2016 20:23:28 +0300")
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> On 16.06.2016 20:03, bsegall@google.com wrote:
>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>
>>> Cgroup created inside throttled group must inherit current throttle_count.
>>> Broken throttle_count allows to nominate throttled entries as a next buddy,
>>> later this leads to null pointer dereference in pick_next_task_fair().
>>>
>>> This patch initialize cfs_rq->throttle_count at first enqueue: laziness
>>> allows to skip locking all rq at group creation. Lazy approach also allows
>>> to skip full sub-tree scan at throttling hierarchy (not in this patch).
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> Cc: Stable <stable@vger.kernel.org> # v3.2+
>>> ---
>>> kernel/sched/fair.c | 19 +++++++++++++++++++
>>> kernel/sched/sched.h | 2 +-
>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 218f8e83db73..fe809fe169d2 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
>>> if (!cfs_bandwidth_used())
>>> return;
>>>
>>> + /* synchronize hierarchical throttle counter */
>>> + if (unlikely(!cfs_rq->throttle_uptodate)) {
>>> + struct rq *rq = rq_of(cfs_rq);
>>> + struct cfs_rq *pcfs_rq;
>>> + struct task_group *tg;
>>> +
>>> + cfs_rq->throttle_uptodate = 1;
>>> + /* get closest uptodate node because leaves goes first */
>>> + for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
>>> + pcfs_rq = tg->cfs_rq[cpu_of(rq)];
>>> + if (pcfs_rq->throttle_uptodate)
>>> + break;
>>> + }
>>> + if (tg) {
>>> + cfs_rq->throttle_count = pcfs_rq->throttle_count;
>>> + cfs_rq->throttled_clock_task = rq_clock_task(rq);
>>> + }
>>> + }
>>> +
>>
>> Checking just in enqueue is not sufficient - throttled_lb_pair can check
>> against a cfs_rq that has never been enqueued (and possibly other
>> paths).
>
> Looks like this is minor problem: in worst case load-balancer will migrate
> task into throttled hierarchy. And this could happens only once for each
> newly created group.
>
>>
>> It might also make sense to go ahead and initialize all the cfs_rqs we
>> skipped over to avoid some n^2 pathological behavior. You could also use
>> throttle_count == -1 or < 0. (We had our own version of this patch that
>> I guess we forgot to push?)
>
> n^2 shouldn't be a problem while this happens only once for each
> group.
Yeah it's shouldn't be a problem, it's more a why not.
>
> throttle_count == -1 could be overwritten when parent throttles/unthrottles
> before initialization. We could set it to INT_MIN/2 and check <0 but this
> will hide possible bugs here. One more int in the same cacheline shouldn't
> add noticeable overhead.
Yeah, you would have to check in tg_throttle_up/tg_unthrottle_up as well
to do this.
>
> I've also added this into our kernel to catch such problems without crash.
> Probably it's worth to add into upstream because stale buddy is a real pain)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5506,8 +5506,11 @@ static void set_last_buddy(struct sched_entity *se)
> if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
> return;
>
> - for_each_sched_entity(se)
> + for_each_sched_entity(se) {
> + if (WARN_ON_ONCE(!se->on_rq))
> + return;
> cfs_rq_of(se)->last = se;
> + }
> }
>
> static void set_next_buddy(struct sched_entity *se)
> @@ -5515,8 +5518,11 @@ static void set_next_buddy(struct sched_entity *se)
> if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
> return;
>
> - for_each_sched_entity(se)
> + for_each_sched_entity(se) {
> + if (WARN_ON_ONCE(!se->on_rq))
> + return;
> cfs_rq_of(se)->next = se;
> + }
> }
>
> static void set_skip_buddy(struct sched_entity *se)
>
>
>>
>>
>>> /* an active group must be handled by the update_curr()->put() path */
>>> if (!cfs_rq->runtime_enabled || cfs_rq->curr)
>>> return;
>>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>>> index 72f1f3087b04..7cbeb92a1cb9 100644
>>> --- a/kernel/sched/sched.h
>>> +++ b/kernel/sched/sched.h
>>> @@ -437,7 +437,7 @@ struct cfs_rq {
>>>
>>> u64 throttled_clock, throttled_clock_task;
>>> u64 throttled_clock_task_time;
>>> - int throttled, throttle_count;
>>> + int throttled, throttle_count, throttle_uptodate;
>>> struct list_head throttled_list;
>>> #endif /* CONFIG_CFS_BANDWIDTH */
>>> #endif /* CONFIG_FAIR_GROUP_SCHED */
next prev parent reply other threads:[~2016-06-16 17:33 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-16 12:57 [PATCH] sched/fair: initialize throttle_count for new task-groups lazily Konstantin Khlebnikov
2016-06-16 17:03 ` bsegall
2016-06-16 17:23 ` Konstantin Khlebnikov
2016-06-16 17:33 ` bsegall [this message]
2016-06-21 13:41 ` Konstantin Khlebnikov
2016-06-21 21:10 ` Peter Zijlstra
2016-06-22 8:10 ` Konstantin Khlebnikov
2016-06-22 8:23 ` Peter Zijlstra
2016-06-24 8:59 ` [tip:sched/urgent] sched/fair: Initialize " tip-bot for Konstantin Khlebnikov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xm26fusdvy1s.fsf@bsegall-linux.mtv.corp.google.com \
--to=bsegall@google.com \
--cc=khlebnikov@yandex-team.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.