Re: [RFC PATCH 2/3] sched: add yield_to function

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Rik van Riel <riel@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Avi Kivity <avi@redhat.com>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>,
	Anthony Liguori <aliguori@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 2/3] sched: add yield_to function
Date: Fri, 03 Dec 2010 14:23:39 +0100
Message-ID: <1291382619.32004.2124.camel@laptop>
In-Reply-To: <20101202144423.3ad1908d@annuminas.surriel.com>

On Thu, 2010-12-02 at 14:44 -0500, Rik van Riel wrote:
>  				unsigned long clone_flags);
> +
> +#ifdef CONFIG_SCHED_HRTICK
> +extern u64 slice_remain(struct task_struct *);
> +extern void yield_to(struct task_struct *);
> +#else
> +static inline void yield_to(struct task_struct *p) yield()
> +#endif

What does SCHED_HRTICK have to do with any of this?

>  #ifdef CONFIG_SMP
>   extern void kick_process(struct task_struct *tsk);
>  #else
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f8e5a25..ef088cd 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1909,6 +1909,26 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
>  	p->se.on_rq = 0;
>  }
>  
> +/**
> + * requeue_task - requeue a task which priority got changed by yield_to

Priority doesn't seem like the right word; you're not actually changing
anything related to p->*prio.

> + * @rq: the tasks's runqueue
> + * @p: the task in question
> + * Must be called with the runqueue lock held. Will cause the CPU to
> + * reschedule if p is now at the head of the runqueue.
> + */
> +void requeue_task(struct rq *rq, struct task_struct *p)
> +{
> +	assert_spin_locked(&rq->lock);
> +
> +	if (!p->se.on_rq || task_running(rq, p) || task_has_rt_policy(p))
> +		return;
> +
> +	dequeue_task(rq, p, 0);
> +	enqueue_task(rq, p, 0);
> +
> +	resched_task(p);

I guess that wants to be something like check_preempt_curr()

> +}
> +
>  /*
>   * __normal_prio - return the priority that is based on the static prio
>   */
> @@ -6797,6 +6817,36 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
>  	return ret;
>  }
>  
> +#ifdef CONFIG_SCHED_HRTICK

Still wondering what all this has to do with SCHED_HRTICK..

> +/*
> + * Yield the CPU, giving the remainder of our time slice to task p.
> + * Typically used to hand CPU time to another thread inside the same
> + * process, eg. when p holds a resource other threads are waiting for.
> + * Giving priority to p may help get that resource released sooner.
> + */
> +void yield_to(struct task_struct *p)
> +{
> +	unsigned long flags;
> +	struct sched_entity *se = &p->se;
> +	struct rq *rq;
> +	struct cfs_rq *cfs_rq;
> +	u64 remain = slice_remain(current);
> +
> +	rq = task_rq_lock(p, &flags);
> +	if (task_running(rq, p) || task_has_rt_policy(p))
> +		goto out;

See, none of this is nice: it makes no sense to call slice_remain()
for !fair tasks.

Why not write:

  if (curr->sched_class == p->sched_class &&
      curr->sched_class->yield_to)
	curr->sched_class->yield_to(curr, p);

or something, and then implement sched_class_fair::yield_to only,
leaving it a NOP for all other classes.
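
A minimal userspace sketch of that dispatch, for illustration only. The
struct layouts, the `boosted` field, and the return value of yield_to()
are assumptions made so the example is self-contained; they are not the
kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct task;

/* Hypothetical stand-in for the kernel's struct sched_class. */
struct sched_class {
	/* Optional hook; NULL means the class has no directed yield. */
	void (*yield_to)(struct task *curr, struct task *p);
};

/* Hypothetical stand-in for struct task_struct. */
struct task {
	const struct sched_class *sched_class;
	int boosted;	/* illustration only: set when a yield targets us */
};

/* The only class that implements the hook. */
static void fair_yield_to(struct task *curr, struct task *p)
{
	(void)curr;
	p->boosted = 1;
}

static const struct sched_class fair_class = { .yield_to = fair_yield_to };
static const struct sched_class rt_class = { .yield_to = NULL };

/* Returns 1 if the class hook ran, 0 if the request was a no-op. */
static int yield_to(struct task *curr, struct task *p)
{
	assert(curr && p);

	if (curr->sched_class == p->sched_class &&
	    curr->sched_class->yield_to) {
		curr->sched_class->yield_to(curr, p);
		return 1;
	}
	return 0;
}
```

With this shape the generic entry point never touches fair-only state,
so nothing like slice_remain() can be reached for a task of another class.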


Also, I think you can side-step that whole curr vs p rq->lock thing
you're doing here. By holding p's rq->lock you've disabled IRQs in
current's task context, and since ->sum_exec_runtime and friends are
only changed during scheduling and in scheduler_tick(), disabling IRQs
in the task's own context pins them.

> +	cfs_rq = cfs_rq_of(se);
> +	se->vruntime -= remain;
> +	if (se->vruntime < cfs_rq->min_vruntime)
> +		se->vruntime = cfs_rq->min_vruntime;

Now here we have another problem: remain was measured in wall-time, and
then you use it to change a virtual-time measure. These things are
related like:

 vt = t/weight

So you're missing a weight factor somewhere.
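
As a rough sketch of where the weight factor would go, the conversion
below scales a wall-time delta by a reference weight over the entity's
weight. The helper name wall_to_virtual() and the constant NICE_0_WEIGHT
(assumed to be 1024, the load weight of a nice-0 task) are illustrative
assumptions, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference weight of a nice-0 task. */
#define NICE_0_WEIGHT 1024ULL

/*
 * Convert a wall-time delta (ns) into virtual time for an entity with
 * the given load weight: vt = t * NICE_0_WEIGHT / weight.
 * Heavier entities accumulate virtual time more slowly.
 * No overflow handling; fine for slice-sized deltas.
 */
static uint64_t wall_to_virtual(uint64_t delta_ns, uint64_t weight)
{
	assert(weight != 0);
	return delta_ns * NICE_0_WEIGHT / weight;
}
```

Under these assumptions, 3 ms of remaining slice is worth 3 ms of
vruntime for a nice-0 target but only 1.5 ms for a target with twice
that weight, so subtracting the raw wall-time value over-credits heavy
tasks and under-credits light ones.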

Also, that check against min_vruntime doesn't really make much sense.


> +	requeue_task(rq, p);

Just makes me wonder why you added requeue_task() to begin with. Why not
simply dequeue at the top of this function and enqueue at the tail,
like all the rest do: see rt_mutex_setprio(), set_user_nice(),
sched_move_task().
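
The dequeue / modify / enqueue pattern can be sketched in userspace
roughly as follows. The array-backed run queue, the two-argument
enqueue_task()/dequeue_task(), and set_vruntime() are all simplified,
hypothetical stand-ins chosen for the example; the real functions take
different arguments and operate on an rbtree under rq->lock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

struct task {
	unsigned long long vruntime;
	int on_rq;
};

/* Toy run queue: tasks kept in ascending vruntime order. */
struct rq {
	struct task *queue[MAX_TASKS];
	size_t nr;
};

static void enqueue_task(struct rq *rq, struct task *p)
{
	size_t i = rq->nr;

	assert(rq->nr < MAX_TASKS && !p->on_rq);
	/* Shift entries with a larger vruntime one slot to the right. */
	while (i > 0 && rq->queue[i - 1]->vruntime > p->vruntime) {
		rq->queue[i] = rq->queue[i - 1];
		i--;
	}
	rq->queue[i] = p;
	rq->nr++;
	p->on_rq = 1;
}

static void dequeue_task(struct rq *rq, struct task *p)
{
	size_t i, j;

	for (i = 0; i < rq->nr; i++)
		if (rq->queue[i] == p)
			break;
	assert(i < rq->nr);	/* p must be queued */
	for (j = i; j + 1 < rq->nr; j++)
		rq->queue[j] = rq->queue[j + 1];
	rq->nr--;
	p->on_rq = 0;
}

/* Change the sort key only while the task is off the queue. */
static void set_vruntime(struct rq *rq, struct task *p,
			 unsigned long long vruntime)
{
	int on_rq = p->on_rq;

	if (on_rq)
		dequeue_task(rq, p);
	p->vruntime = vruntime;
	if (on_rq)
		enqueue_task(rq, p);
}
```

The point of the ordering is that the key is never changed while the
task sits in the sorted structure, so no separate requeue helper is
needed.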

> + out:
> +	task_rq_unlock(rq, &flags);
> +	yield();
> +}
> +EXPORT_SYMBOL(yield_to);

EXPORT_SYMBOL_GPL() pretty please, I really hate how kvm is a module and
needs to export hooks all over the core kernel :/

> +#endif
> +
>  /**
>   * sys_sched_yield - yield the current processor to other threads.
>   *
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 5119b08..2a0a595 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -974,6 +974,25 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>   */
>  
>  #ifdef CONFIG_SCHED_HRTICK
> +u64 slice_remain(struct task_struct *p)
> +{
> +	unsigned long flags;
> +	struct sched_entity *se = &p->se;
> +	struct cfs_rq *cfs_rq;
> +	struct rq *rq;
> +	u64 slice, ran;
> +	s64 delta;
> +
> +	rq = task_rq_lock(p, &flags);
> +	cfs_rq = cfs_rq_of(se);
> +	slice = sched_slice(cfs_rq, se);
> +	ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> +	delta = slice - ran;
> +	task_rq_unlock(rq, &flags);
> +
> +	return max(delta, 0LL);
> +}


Right, so another approach might be to simply swap the vruntime between
curr and p.
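
A minimal sketch of the swap idea, assuming both entities sit on the
same cfs_rq so their vruntime values are directly comparable. The struct
is a stripped-down stand-in, and the guard that skips the swap when p is
already ahead is an added assumption, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the one sched_entity field this sketch needs. */
struct sched_entity {
	uint64_t vruntime;
};

/*
 * Hand curr's place in line to p by exchanging vruntime values.
 * Returns 1 if the values were swapped, 0 if p was already at or
 * ahead of curr (smaller or equal vruntime), in which case a swap
 * would only push p back.
 */
static int swap_vruntime(struct sched_entity *curr, struct sched_entity *p)
{
	uint64_t tmp;

	assert(curr && p);

	/* Signed difference keeps the comparison safe across wraparound. */
	if ((int64_t)(p->vruntime - curr->vruntime) <= 0)
		return 0;

	tmp = curr->vruntime;
	curr->vruntime = p->vruntime;
	p->vruntime = tmp;
	return 1;
}
```

Because both values are already in virtual time, this avoids the
wall-time to vruntime conversion entirely; in a real implementation p
would still have to be dequeued before the change and enqueued after it.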
