From: Kirill Tkhai <tkhai@yandex.ru>
To: Juri Lelli <juri.lelli@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC] sched/deadline: Prevent rt_time growth to infinity
Date: Sat, 22 Feb 2014 03:50:02 +0400 [thread overview]
Message-ID: <57671393026602@web10j.yandex.ru> (raw)
In-Reply-To: <20140221175305.1e170b45be08fe05c93a33b4@gmail.com>
21.02.2014, 20:52, "Juri Lelli" <juri.lelli@gmail.com>:
> On Fri, 21 Feb 2014 17:36:41 +0100
> Juri Lelli <juri.lelli@gmail.com> wrote:
>
>> On Fri, 21 Feb 2014 11:37:15 +0100
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>> On Thu, Feb 20, 2014 at 02:16:00AM +0400, Kirill Tkhai wrote:
>>>> Since deadline tasks share rt bandwidth, we must care about
>>>> bandwidth timer set. Otherwise rt_time may grow up to infinity
>>>> in update_curr_dl(), if there are no other available RT tasks
>>>> on top level bandwidth.
>>>>
>>>> I'm going to decide the problem the way below. Almost untested
>>>> because of I skipped almost all of recent patches which haveto be applied from lkml.
>>>>
>>>> Please say, if I skipped anything in idea. Maybe better put
>>>> start_top_rt_bandwidth() into set_curr_task_dl()?
>>> How about we only increment rt_time when there's an RT bandwidth timer
>>> active?
>>>
>>> ---
>>> --- a/kernel/sched/rt.c
>>> +++ b/kernel/sched/rt.c
>>> @@ -568,6 +568,12 @@ static inline struct rt_bandwidth *sched
>>>
>>> #endif /* CONFIG_RT_GROUP_SCHED */
>>>
>>> +bool sched_rt_bandwidth_active(struct rt_rq *rt_rq)
>>> +{
>>> + struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>>> + return hrtimer_active(&rt_b->rt_period_timer);
>>> +}
>>> +
>>> #ifdef CONFIG_SMP
>>> /*
>>> * We ran out of runtime, see if we can borrow some from our neighbours.
>>> --- a/kernel/sched/deadline.c
>>> +++ b/kernel/sched/deadline.c
>>> @@ -587,6 +587,8 @@ int dl_runtime_exceeded(struct rq *rq, s
>>> return 1;
>>> }
>>>
>>> +extern bool sched_rt_bandwidth_active(struct rt_rq *rt_rq);
>>> +
>>> /*
>>> * Update the current task's runtime statistics (provided it is still
>>> * a -deadline task and has not been removed from the dl_rq).
>>> @@ -650,11 +652,13 @@ static void update_curr_dl(struct rq *rq
>>> struct rt_rq *rt_rq = &rq->rt;
>>>
>>> raw_spin_lock(&rt_rq->rt_runtime_lock);
>>> - rt_rq->rt_time += delta_exec;
>>> /*
>>> * We'll let actual RT tasks worry about the overflow here, we
>>> - * have our own CBS to keep us inline -- see above.
>>> + * have our own CBS to keep us inline; only account when RT
>>> + * bandwidth is relevant.
>>> */
>>> + if (sched_rt_bandwidth_active(rt_rq))
>>> + rt_rq->rt_time += delta_exec;
>>> raw_spin_unlock(&rt_rq->rt_runtime_lock);
>>> }
>>> }
>> So, I ran some tests with the above and I'd like to share with you what
>> I've found. You can find here a trace-cmd trace that should be feeded
>> to kernelshark to be able to understand what follows (or feel free to
>> reproduce same scenario :)):
>> http://retis.sssup.it/~jlelli/traces/trace_rt_time.dat
>>
>> Here you have a DL task (4/10) and a while(1) RT task, both running
>> inside a rt_bw of 0.5. RT tasks is activated 500ms after DL. As I
>> filtered in sched_rt_period_timer(), you can search for time instants
>> when the rt_bw is replenished. It is evident that the first time after
>> rt timer is activated back (search for start_bandwidth_timer), we can
>> eat some bw to FAIR tasks (if any). This is due to the fact that we
>> reset rt_bw budget at this time, start decrementing rt_time for both DL
>
> The reset happens when rt_bw replenishment timer fires, after a bit:
>
> sched_rt_period_timer <-- __run_hrtimer
Juri, sorry, I forgot to wrote I mean the situation when only one task is on_rq
at every moment.
DL, RT, DL, RT, ...
rt_runtime = n;
rt_period = 2n;
| DL's working, RT's sleeping | RT's working, DL's sleeping | all sleep |
------------------------------------------------------------------------------------------|
| (1) duration = n | (2) duration = n | (3) duration = n | (repeat)
|------------------------------|------------------------------|---------------------------|
| (rt_bw timer is not running) | (rt_bw timer is running) |
According to the patch, rt_bw timer is working only if we have queued RT task.
In the case above part (1) has no queued RT tasks, so timer is not working.
rt_time is not being increased too.
We have ratio 2/3.
Thanks,
Kirill
>
> Apologies,
>
> - Juri
>
>> and RT tasks, throttle RT tasks when rt_time > runtime, but, since DL
>> tasks acually executes inside their own server, they don't care about
>> rt_bw. Good news is that steady state is ok: keeping track of overruns
>> we are able to stop eating bw to other guys.
>>
>> My thougths:
>>
>> - Peter's patch is an easy fix to Kirill's problem (RT tasks were
>> throttled too early);
>> - something to add to this solution could be to pre-calculate bw of
>> ready DL tasks and subtract it to rt_bw at replenishment time, but
>> it sounds quite awkward, pessimistic, and I'm not sure it is gonna
>> work;
>> - we are stealing bw to best-effort tasks, and just at the beginning
>> of the transistion, is it really a problem?
>> - I mean, if you want guarantees make your tasks DL! :);
>> - in the long run we are gonna have RT tasks scheduled inside CBS
>> servers, and all this will be properly fixed up.
>>
>> Comments?
>>
>> BTW, rt timer activation/deactivation should probably be fixed for
>> !RT_GROUP_SCHED with something like this:
>>
>> ---
>> kernel/sched/rt.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 6161de8..274f992 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -86,12 +86,12 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
>> raw_spin_lock_init(&rt_rq->rt_runtime_lock);
>> }
>>
>> -#ifdef CONFIG_RT_GROUP_SCHED
>> static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
>> {
>> hrtimer_cancel(&rt_b->rt_period_timer);
>> }
>>
>> +#ifdef CONFIG_RT_GROUP_SCHED
>> #define rt_entity_is_task(rt_se) (!(rt_se)->my_q)
>>
>> static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
>> @@ -1017,8 +1017,12 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>> start_rt_bandwidth(&def_rt_bandwidth);
>> }
>>
>> -static inline
>> -void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
>> +static void
>> +dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>> +{
>> + if (!rt_rq->rt_nr_running)
>> + destroy_rt_bandwidth(&def_rt_bandwidth);
>> +}
>>
>> #endif /* CONFIG_RT_GROUP_SCHED */
next prev parent reply other threads:[~2014-02-21 23:50 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-19 22:16 [RFC] sched/deadline: Prevent rt_time growth to infinity Kirill Tkhai
2014-02-20 21:22 ` Steven Rostedt
2014-02-21 10:37 ` Peter Zijlstra
2014-02-21 11:33 ` Kirill Tkhai
2014-02-21 12:09 ` Kirill Tkhai
2014-02-21 12:44 ` Juri Lelli
2014-02-21 14:25 ` Kirill Tkhai
2014-02-21 16:36 ` Juri Lelli
2014-02-21 16:53 ` Juri Lelli
2014-02-21 23:50 ` Kirill Tkhai [this message]
2014-02-22 0:56 ` Kirill Tkhai
2014-02-25 14:15 ` Juri Lelli
2014-02-25 14:58 ` Kirill Tkhai
2014-02-27 13:32 ` [tip:sched/urgent] " tip-bot for Juri Lelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57671393026602@web10j.yandex.ru \
--to=tkhai@yandex.ru \
--cc=juri.lelli@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.