All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirill Tkhai <ktkhai@parallels.com>
To: Ben Segall <bsegall@google.com>
Cc: <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	<khorenko@parallels.com>, Paul Turner <pjt@google.com>,
	<tkhai@yandex.ru>
Subject: Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
Date: Wed, 25 Jun 2014 12:03:38 +0400	[thread overview]
Message-ID: <1403683418.3462.35.camel@tkhai> (raw)
In-Reply-To: <1403682814.3462.33.camel@tkhai>

В Ср, 25/06/2014 в 11:53 +0400, Kirill Tkhai пишет:
> В Вт, 24/06/2014 в 23:26 +0400, Kirill Tkhai пишет:
> > On 24.06.2014 23:13, bsegall@google.com wrote:
> > > Kirill Tkhai <tkhai@yandex.ru> writes:
> > > 
> > >> On 24.06.2014 21:03, bsegall@google.com wrote:
> > >>> Kirill Tkhai <ktkhai@parallels.com> writes:
> > >>>
> > >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
> > >>>>
> > >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> > >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> > >>>>
> > >>>> This unthrottles all throttled cfs_rqs.
> > >>>>
> > >>>> But the cpu is still able to call schedule() till
> > >>>>
> > >>>> 	take_cpu_down->__cpu_disable()
> > >>>>
> > >>>> is called from stop_machine.
> > >>>>
> > >>>> This case the tasks from just unthrottled cfs_rqs are pickable
> > >>>> in a standard scheduler way, and they are picked by dying cpu.
> > >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
> > >>>> in migration_call skips their tasks (one more unthrottle
> > >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> > >>>> is already NULL).
> > >>>>
> > >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
> > >>>> is not accounted, and the cfs_rqs won't exceed given
> > >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> > >>>> in migrate_tasks(). runtime_enabled is recalculated again
> > >>>> when rq becomes online again.
> > >>>>
> > >>>> Ben Segall also noticed, we always enable runtime in
> > >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
> > >>>> cpus only. To fix that, we check if a cpu is online when
> > >>>> its rq is locked. This guarantees we do not have races with
> > >>>> set_rq_offline(), which also requires rq->lock.
> > >>>>
> > >>>> v2: Fix race with tg_set_cfs_bandwidth().
> > >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> > >>>>
> > >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> > >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
> > >>>> CC: Ben Segall <bsegall@google.com>
> > >>>> CC: Paul Turner <pjt@google.com>
> > >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> > >>>> CC: Peter Zijlstra <peterz@infradead.org>
> > >>>> CC: Ingo Molnar <mingo@kernel.org>
> > >>>> ---
> > >>>>  kernel/sched/core.c |   15 +++++++++++----
> > >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
> > >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
> > >>>>
> > >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > >>>> index 7f3063c..707a3c5 100644
> > >>>> --- a/kernel/sched/core.c
> > >>>> +++ b/kernel/sched/core.c
> > >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
> > >>>>  		struct rq *rq = cfs_rq->rq;
> > >>>>  
> > >>>>  		raw_spin_lock_irq(&rq->lock);
> > >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
> > >>>> -		cfs_rq->runtime_remaining = 0;
> > >>>> +		/*
> > >>>> +		 * Do not enable runtime on offline runqueues. We specially
> > >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
> > >>>> +		 */
> > >>>> +		if (cpu_online(i)) {
> > >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
> > >>>> +			cfs_rq->runtime_remaining = 0;
> > >>>> +
> > >>>> +			if (cfs_rq->throttled)
> > >>>> +				unthrottle_cfs_rq(cfs_rq);
> > >>>> +		}
> > >>>
> > >>> We can just do for_each_online_cpu, yes? Also we probably need
> > >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
> > >>> right?
> > >>>
> > >>
> > >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
> > >> get_online_cpus() taken. But it adds one more lock dependence.
> > >> This looks worse for me.
> > > 
> > > I mean, you need get_online_cpus anyway - cpu_online is just a test
> > > against the same mask that for_each_online_cpu uses, and without taking
> > > the lock you can still race with offlining and reset runtime_enabled.
> > > 
> > 
> > Oh, I see. Thanks.
> 
> But we can check for rq->online, don't we? How about this?

Oh, rq->online presents only in SMP case... Ok, lets do with
get_online_cpus()...

> 
>     sched/fair: Disable runtime_enabled on dying rq
>     
>     We kill rq->rd on the CPU_DOWN_PREPARE stage:
>     
>     	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>     	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>     
>     This unthrottles all throttled cfs_rqs.
>     
>     But the cpu is still able to call schedule() till
>     
>     	take_cpu_down->__cpu_disable()
>     
>     is called from stop_machine.
>     
>     This case the tasks from just unthrottled cfs_rqs are pickable
>     in a standard scheduler way, and they are picked by dying cpu.
>     The cfs_rqs becomes throttled again, and migrate_tasks()
>     in migration_call skips their tasks (one more unthrottle
>     in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>     is already NULL).
>     
>     Patch sets runtime_enabled to zero. This guarantees, the runtime
>     is not accounted, and the cfs_rqs won't exceed given
>     cfs_rq->runtime_remaining = 1, and tasks will be pickable
>     in migrate_tasks(). runtime_enabled is recalculated again
>     when rq becomes online again.
>     
>     Ben Segall also noticed, we always enable runtime in
>     tg_set_cfs_bandwidth(). Actually, we should do that for online
>     cpus only. To fix that, we check if a cpu is online when
>     its rq is locked. This guarantees we do not have races with
>     set_rq_offline(), which also requires rq->lock.
>     
>     v2: Fix race with tg_set_cfs_bandwidth().
>         Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>     v3: Check for rq->online instead of cpu_active.
>     
>     Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ceea8d0..4c163c9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7837,11 +7837,17 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>  		struct rq *rq = cfs_rq->rq;
>  
>  		raw_spin_lock_irq(&rq->lock);
> -		cfs_rq->runtime_enabled = runtime_enabled;
> -		cfs_rq->runtime_remaining = 0;
> +		/*
> +		 * Do not enable runtime on offline runqueues. We specially
> +		 * disable it in unthrottle_offline_cfs_rqs().
> +		 */
> +		if (rq->online) {
> +			cfs_rq->runtime_enabled = runtime_enabled;
> +			cfs_rq->runtime_remaining = 0;
>  
> -		if (cfs_rq->throttled)
> -			unthrottle_cfs_rq(cfs_rq);
> +			if (cfs_rq->throttled)
> +				unthrottle_cfs_rq(cfs_rq);
> +		}
>  		raw_spin_unlock_irq(&rq->lock);
>  	}
>  	if (runtime_was_enabled && !runtime_enabled)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1f9c457..5616d23 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3776,6 +3776,19 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	hrtimer_cancel(&cfs_b->slack_timer);
>  }
>  
> +static void __maybe_unused update_runtime_enabled(struct rq *rq)
> +{
> +	struct cfs_rq *cfs_rq;
> +
> +	for_each_leaf_cfs_rq(rq, cfs_rq) {
> +		struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
> +
> +		raw_spin_lock(&cfs_b->lock);
> +		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
> +		raw_spin_unlock(&cfs_b->lock);
> +	}
> +}
> +
>  static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>  {
>  	struct cfs_rq *cfs_rq;
> @@ -3789,6 +3802,12 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>  		 * there's some valid quota amount
>  		 */
>  		cfs_rq->runtime_remaining = 1;
> +		/*
> +		 * Offline rq is schedulable till cpu is completely disabled
> +		 * in take_cpu_down(), so we prevent new cfs throttling here.
> +		 */
> +		cfs_rq->runtime_enabled = 0;
> +
>  		if (cfs_rq_throttled(cfs_rq))
>  			unthrottle_cfs_rq(cfs_rq);
>  	}
> @@ -3832,6 +3851,7 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>  	return NULL;
>  }
>  static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
> +static inline void update_runtime_enabled(struct rq *rq) {}
>  static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
>  
>  #endif /* CONFIG_CFS_BANDWIDTH */
> @@ -7325,6 +7345,8 @@ void trigger_load_balance(struct rq *rq)
>  static void rq_online_fair(struct rq *rq)
>  {
>  	update_sysctl();
> +
> +	update_runtime_enabled(rq);
>  }
>  
>  static void rq_offline_fair(struct rq *rq)
> 



  reply	other threads:[~2014-06-25  8:03 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20140624074148.8738.57690.stgit@tkhai>
2014-06-24  7:53 ` [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq Kirill Tkhai
2014-06-24 17:03   ` bsegall
2014-06-24 19:01     ` Kirill Tkhai
2014-06-24 19:13       ` bsegall
2014-06-24 19:26         ` Kirill Tkhai
2014-06-25  7:53           ` Kirill Tkhai
2014-06-25  8:03             ` Kirill Tkhai [this message]
2014-06-25 16:52             ` bsegall
2014-06-25 17:31               ` Kirill Tkhai
2014-06-25 17:40                 ` bsegall
2014-06-25 17:44                   ` Kirill Tkhai
2014-06-26 11:08   ` Srikar Dronamraju
2014-06-24  7:53 ` [PATCH v2 2/3] sched/rt: __disable_runtime: Enqueue just unthrottled rt_rq back on the stack Kirill Tkhai
2014-06-24  7:54 ` [PATCH v2 3/3] sched: Rework check_for_tasks() Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1403683418.3462.35.camel@tkhai \
    --to=ktkhai@parallels.com \
    --cc=bsegall@google.com \
    --cc=khorenko@parallels.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=tkhai@yandex.ru \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.