From: Mike Galbraith <efault@gmx.de>
To: Dima Zavin <dima@android.com>
Cc: linux-kernel@vger.kernel.org,
	"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
	"Ingo Molnar" <mingo@elte.hu>,
	"Arve Hjønnevåg" <arve@android.com>
Subject: Re: [PATCH 1/2] sched: normalize sleeper's vruntime during group change
Date: Thu, 07 Oct 2010 04:24:46 +0200
Message-ID: <1286418286.7422.18.camel@marge.simson.net>
In-Reply-To: <1286405790-6987-1-git-send-email-dima@android.com>

On Wed, 2010-10-06 at 15:56 -0700, Dima Zavin wrote:
> If you switch the cgroup of a sleeping thread, its vruntime does
> not get adjusted correctly for the difference between the
> min_vruntime values of the two groups.
> 
> The problem becomes most apparent when one has cgroups whose
> cpu shares differ greatly, say group A.shares=1024 and group B.shares=52.
> After some time, the vruntime of the group with the larger share (A)
> will be way ahead of the group with the small share (B). Currently,
> when a sleeping task is moved from group A to group B, it will retain its
> larger vruntime value and thus will be way ahead of all the other tasks
> in its new group. This will prevent this task from executing for an
> extended period of time.

Yeah, seems clear that normalization is a must.
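
To put rough numbers on the divergence described in the quoted text, here is a
toy model, not kernel code. It assumes both groups are continuously runnable on
one CPU, that CPU time is split strictly in proportion to shares, and that all
tasks are nice-0, so a group's min_vruntime advances by roughly the CPU time the
group receives. The function name toy_group_min_vruntime is made up for this
illustration.

```c
#include <stdint.h>

/*
 * Toy model (assumption, not the kernel's accounting): a group's
 * min_vruntime after wall_ns nanoseconds is its proportional slice
 * of the CPU, i.e. wall_ns * shares / total_shares.
 */
static uint64_t toy_group_min_vruntime(uint64_t wall_ns, uint64_t shares,
				       uint64_t total_shares)
{
	return wall_ns * shares / total_shares;
}
```

Under those assumptions, with A.shares=1024 and B.shares=52, group A's
min_vruntime grows about 1024/52 (roughly 19.7) times as fast as group B's.
After a minute of wall time the gap is tens of seconds of vruntime, which is
how far ahead an unadjusted sleeper would land in group B.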

What's questionable is whether we should apply START_DEBIT.  That's part
of a different problem though (SCHED_PROCESS).

> This patch adds a new callback, prep_move_group, to struct sched_class
> to give sched_fair the opportunity to adjust the task's vruntime
> just before setting its new group. This allows us to properly normalize
> a sleeping task's vruntime when moving it between different cgroups.
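
The two-step normalization described in the quoted text can be sketched outside
the kernel as follows. The toy_* structs and functions are hypothetical
stand-ins for struct cfs_rq, struct sched_entity and the two callbacks. They
only mirror the arithmetic in the patch for a sleeping task; locking,
update_curr() and the on_rq check are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's cfs_rq and sched_entity. */
struct toy_cfs_rq {
	uint64_t min_vruntime;
};

struct toy_entity {
	uint64_t vruntime;
};

/*
 * Before the group change: make the sleeper's vruntime relative to
 * the queue it is leaving. The subtraction is unsigned and may wrap
 * if the sleeper is behind min_vruntime; the later addition undoes
 * the wrap, so the relative offset is preserved.
 */
static void toy_prep_move(struct toy_entity *se,
			  const struct toy_cfs_rq *old_rq)
{
	se->vruntime -= old_rq->min_vruntime;
}

/* After the group change: rebase the relative value onto the new queue. */
static void toy_moved(struct toy_entity *se,
		      const struct toy_cfs_rq *new_rq)
{
	se->vruntime += new_rq->min_vruntime;
}
```

Calling toy_prep_move() against the old queue and then toy_moved() against the
new one leaves the task at the same distance from min_vruntime in the new group
as it had in the old one, instead of carrying over the old group's absolute
value.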

Acked-by: Mike Galbraith <efault@gmx.de>

> Cc: Arve Hjønnevåg <arve@android.com>
> Signed-off-by: Dima Zavin <dima@android.com>
> ---
>  include/linux/sched.h |    1 +
>  kernel/sched.c        |    5 +++++
>  kernel/sched_fair.c   |   14 +++++++++++++-
>  3 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1e2a6db..ba3494e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1073,6 +1073,7 @@ struct sched_class {
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  	void (*moved_group) (struct task_struct *p, int on_rq);
> +	void (*prep_move_group) (struct task_struct *p, int on_rq);
>  #endif
>  };
>  
> diff --git a/kernel/sched.c b/kernel/sched.c
> index dc85ceb..fe4bb20 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -8297,6 +8297,11 @@ void sched_move_task(struct task_struct *tsk)
>  	if (unlikely(running))
>  		tsk->sched_class->put_prev_task(rq, tsk);
>  
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> +	if (tsk->sched_class->prep_move_group)
> +		tsk->sched_class->prep_move_group(tsk, on_rq);
> +#endif
> +
>  	set_task_rq(tsk, task_cpu(tsk));
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index db3f674..6ded59f 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3827,10 +3827,21 @@ static void set_curr_task_fair(struct rq *rq)
>  static void moved_group_fair(struct task_struct *p, int on_rq)
>  {
>  	struct cfs_rq *cfs_rq = task_cfs_rq(p);
> +	struct sched_entity *se = &p->se;
>  
>  	update_curr(cfs_rq);
>  	if (!on_rq)
> -		place_entity(cfs_rq, &p->se, 1);
> +		se->vruntime += cfs_rq->min_vruntime;
> +}
> +
> +static void prep_move_group_fair(struct task_struct *p, int on_rq)
> +{
> +	struct cfs_rq *cfs_rq = task_cfs_rq(p);
> +	struct sched_entity *se = &p->se;
> +
> +	/* normalize the runtime of a sleeping task before moving it */
> +	if (!on_rq)
> +		se->vruntime -= cfs_rq->min_vruntime;
>  }
>  #endif
>  
> @@ -3883,6 +3894,7 @@ static const struct sched_class fair_sched_class = {
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  	.moved_group		= moved_group_fair,
> +	.prep_move_group	= prep_move_group_fair,
>  #endif
>  };
>  



Thread overview: 17+ messages
2010-09-29  6:46 [PATCH 1/2] sched: normalize sleeper's vruntime during group change Dima Zavin
2010-09-29  6:46 ` [PATCH 2/2] sched: use the old min_vruntime when normalizing on dequeue Dima Zavin
2010-10-07 21:00   ` Dima Zavin
2010-10-08  6:57     ` Mike Galbraith
2010-09-29  6:54 ` [PATCH 1/2] sched: normalize sleeper's vruntime during group change Pekka Enberg
2010-09-29  7:17   ` Dima Zavin
2010-09-29  8:13 ` Mike Galbraith
2010-09-29 19:02   ` Dima Zavin
2010-09-29 21:44   ` Dima Zavin
2010-09-30 10:47 ` Peter Zijlstra
2010-09-30 19:14   ` Dima Zavin
2010-10-01 11:59     ` Peter Zijlstra
2010-10-04 19:18       ` Dima Zavin
2010-10-06 22:56         ` Dima Zavin
2010-10-07  2:24           ` Mike Galbraith [this message]
2010-10-15 13:50         ` Peter Zijlstra
2010-10-22 13:02           ` [tip:sched/urgent] sched, cgroup: Fixup broken cgroup movement tip-bot for Peter Zijlstra
