Re: [RFC] sched/deadline: Prevent rt_time growth to infinity

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Kirill Tkhai <tkhai@yandex.ru>
To: Juri Lelli <juri.lelli@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC] sched/deadline: Prevent rt_time growth to infinity
Date: Sat, 22 Feb 2014 03:50:02 +0400	[thread overview]
Message-ID: <57671393026602@web10j.yandex.ru> (raw)
In-Reply-To: <20140221175305.1e170b45be08fe05c93a33b4@gmail.com>



21.02.2014, 20:52, "Juri Lelli" <juri.lelli@gmail.com>:
> On Fri, 21 Feb 2014 17:36:41 +0100
> Juri Lelli <juri.lelli@gmail.com> wrote:
>
>>  On Fri, 21 Feb 2014 11:37:15 +0100
>>  Peter Zijlstra <peterz@infradead.org> wrote:
>>>  On Thu, Feb 20, 2014 at 02:16:00AM +0400, Kirill Tkhai wrote:
>>>>  Since deadline tasks share rt bandwidth, we must care about
>>>>  bandwidth timer set. Otherwise rt_time may grow up to infinity
>>>>  in update_curr_dl(), if there are no other available RT tasks
>>>>  on top level bandwidth.
>>>>
>>>>  I'm going to decide the problem the way below. Almost untested
>>>>  because of I skipped almost all of recent patches which haveto be applied from lkml.
>>>>
>>>>  Please say, if I skipped anything in idea. Maybe better put
>>>>  start_top_rt_bandwidth() into set_curr_task_dl()?
>>>  How about we only increment rt_time when there's an RT bandwidth timer
>>>  active?
>>>
>>>  ---
>>>  --- a/kernel/sched/rt.c
>>>  +++ b/kernel/sched/rt.c
>>>  @@ -568,6 +568,12 @@ static inline struct rt_bandwidth *sched
>>>
>>>   #endif /* CONFIG_RT_GROUP_SCHED */
>>>
>>>  +bool sched_rt_bandwidth_active(struct rt_rq *rt_rq)
>>>  +{
>>>  + struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>>>  + return hrtimer_active(&rt_b->rt_period_timer);
>>>  +}
>>>  +
>>>   #ifdef CONFIG_SMP
>>>   /*
>>>    * We ran out of runtime, see if we can borrow some from our neighbours.
>>>  --- a/kernel/sched/deadline.c
>>>  +++ b/kernel/sched/deadline.c
>>>  @@ -587,6 +587,8 @@ int dl_runtime_exceeded(struct rq *rq, s
>>>           return 1;
>>>   }
>>>
>>>  +extern bool sched_rt_bandwidth_active(struct rt_rq *rt_rq);
>>>  +
>>>   /*
>>>    * Update the current task's runtime statistics (provided it is still
>>>    * a -deadline task and has not been removed from the dl_rq).
>>>  @@ -650,11 +652,13 @@ static void update_curr_dl(struct rq *rq
>>>                   struct rt_rq *rt_rq = &rq->rt;
>>>
>>>                   raw_spin_lock(&rt_rq->rt_runtime_lock);
>>>  - rt_rq->rt_time += delta_exec;
>>>                   /*
>>>                    * We'll let actual RT tasks worry about the overflow here, we
>>>  - * have our own CBS to keep us inline -- see above.
>>>  + * have our own CBS to keep us inline; only account when RT
>>>  + * bandwidth is relevant.
>>>                    */
>>>  + if (sched_rt_bandwidth_active(rt_rq))
>>>  + rt_rq->rt_time += delta_exec;
>>>                   raw_spin_unlock(&rt_rq->rt_runtime_lock);
>>>           }
>>>   }
>>  So, I ran some tests with the above and I'd like to share with you what
>>  I've found. You can find here a trace-cmd trace that should be feeded
>>  to kernelshark to be able to understand what follows (or feel free to
>>  reproduce same scenario :)):
>>  http://retis.sssup.it/~jlelli/traces/trace_rt_time.dat
>>
>>  Here you have a DL task (4/10) and a while(1) RT task, both running
>>  inside a rt_bw of 0.5. RT tasks is activated 500ms after DL. As I
>>  filtered in sched_rt_period_timer(), you can search for time instants
>>  when the rt_bw is replenished. It is evident that the first time after
>>  rt timer is activated back (search for start_bandwidth_timer), we can
>>  eat some bw to FAIR tasks (if any). This is due to the fact that we
>>  reset rt_bw budget at this time, start decrementing rt_time for both DL
>
> The reset happens when rt_bw replenishment timer fires, after a bit:
>
>  sched_rt_period_timer <-- __run_hrtimer

Juri, sorry, I forgot to wrote I mean the situation when only one task is on_rq
at every moment.

DL, RT, DL, RT, ...

rt_runtime = n;
rt_period  = 2n;

| DL's working, RT's sleeping  | RT's working, DL's sleeping  |   all sleep               |
------------------------------------------------------------------------------------------|
| (1)     duration = n         | (2)     duration = n         | (3)     duration = n      |  (repeat)
|------------------------------|------------------------------|---------------------------|
| (rt_bw timer is not running) |                   (rt_bw timer is running)               |


According to the patch, rt_bw timer is working only if we have queued RT task.

In the case above part (1) has no queued RT tasks, so timer is not working.
rt_time is not being increased too.

We have ratio 2/3.

Thanks,
Kirill

>
> Apologies,
>
> - Juri
>
>>  and RT tasks, throttle RT tasks when rt_time > runtime, but, since DL
>>  tasks acually executes inside their own server, they don't care about
>>  rt_bw. Good news is that steady state is ok: keeping track of overruns
>>  we are able to stop eating bw to other guys.
>>
>>  My thougths:
>>
>>   - Peter's patch is an easy fix to Kirill's problem (RT tasks were
>>     throttled too early);
>>   - something to add to this solution could be to pre-calculate bw of
>>     ready DL tasks and subtract it to rt_bw at replenishment time, but
>>     it sounds quite awkward, pessimistic, and I'm not sure it is gonna
>>     work;
>>   - we are stealing bw to best-effort tasks, and just at the beginning
>>     of the transistion, is it really a problem?
>>   - I mean, if you want guarantees make your tasks DL! :);
>>   - in the long run we are gonna have RT tasks scheduled inside CBS
>>     servers, and all this will be properly fixed up.
>>
>>  Comments?
>>
>>  BTW, rt timer activation/deactivation should probably be fixed for
>>  !RT_GROUP_SCHED with something like this:
>>
>>  ---
>>   kernel/sched/rt.c |   10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>>  diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>>  index 6161de8..274f992 100644
>>  --- a/kernel/sched/rt.c
>>  +++ b/kernel/sched/rt.c
>>  @@ -86,12 +86,12 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
>>           raw_spin_lock_init(&rt_rq->rt_runtime_lock);
>>   }
>>
>>  -#ifdef CONFIG_RT_GROUP_SCHED
>>   static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
>>   {
>>           hrtimer_cancel(&rt_b->rt_period_timer);
>>   }
>>
>>  +#ifdef CONFIG_RT_GROUP_SCHED
>>   #define rt_entity_is_task(rt_se) (!(rt_se)->my_q)
>>
>>   static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
>>  @@ -1017,8 +1017,12 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>>           start_rt_bandwidth(&def_rt_bandwidth);
>>   }
>>
>>  -static inline
>>  -void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
>>  +static void
>>  +dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>>  +{
>>  + if (!rt_rq->rt_nr_running)
>>  + destroy_rt_bandwidth(&def_rt_bandwidth);
>>  +}
>>
>>   #endif /* CONFIG_RT_GROUP_SCHED */

next prev parent reply	other threads:[~2014-02-21 23:50 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-19 22:16 [RFC] sched/deadline: Prevent rt_time growth to infinity Kirill Tkhai
2014-02-20 21:22 ` Steven Rostedt
2014-02-21 10:37 ` Peter Zijlstra
2014-02-21 11:33   ` Kirill Tkhai
2014-02-21 12:09     ` Kirill Tkhai
2014-02-21 12:44       ` Juri Lelli
2014-02-21 14:25         ` Kirill Tkhai
2014-02-21 16:36   ` Juri Lelli
2014-02-21 16:53     ` Juri Lelli
2014-02-21 23:50       ` Kirill Tkhai [this message]
2014-02-22  0:56     ` Kirill Tkhai
2014-02-25 14:15       ` Juri Lelli
2014-02-25 14:58         ` Kirill Tkhai
2014-02-27 13:32         ` [tip:sched/urgent] " tip-bot for Juri Lelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57671393026602@web10j.yandex.ru \
    --to=tkhai@yandex.ru \
    --cc=juri.lelli@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox