* Re: [RFC PATCH 2/3] sched: add yield_to function
@ 2011-01-02 11:43 Hillf Danton
2011-01-02 17:19 ` Rik van Riel
0 siblings, 1 reply; 37+ messages in thread
From: Hillf Danton @ 2011-01-02 11:43 UTC (permalink / raw)
To: Rik van Riel, Marcelo Tosatti
Cc: Peter Zijlstra, Srivatsa Vaddagiri, linux-kernel
On Thu, 2 Dec 2010 14:44:23 -0500, Rik van Riel wrote:
> Add a yield_to function to the scheduler code, allowing us to
> give the remainder of our timeslice to another thread.
>
> We may want to use this to provide a sys_yield_to system call
> one day.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Hey all
The following work is based on what Rik posted, with a few changes.
[1] the added requeue_task() is replaced with resched_task().
[2] there is no longer change by slice_remain() in schedule class.
[3] the schedule_hrtimeout() in KVM still plays its role, and it looks
nicer to move the searching of task out of kvm_vcpu_on_spin()
to be a function.
The compensation of yielded nanoseconds is not considered
or corrected in this work for both lender and borrower, but it looks
not a vulnerability since the lock contention is detected by CPU,
as Rik mentioned, and since both lender and borrower are marked
with PF_VCPU.
What scheduler should consider further for the PF_VCPU?
Cheers
Hillf
---
--- a/include/linux/sched.h 2010-11-01 19:54:12.000000000 +0800
+++ b/include/linux/sched.h 2011-01-02 18:09:38.000000000 +0800
@@ -1945,6 +1945,7 @@ static inline int rt_mutex_getprio(struc
extern void set_user_nice(struct task_struct *p, long nice);
extern int task_prio(const struct task_struct *p);
extern int task_nice(const struct task_struct *p);
+extern void yield_to(struct task_struct *, u64 *);
extern int can_nice(const struct task_struct *p, const int nice);
extern int task_curr(const struct task_struct *p);
extern int idle_cpu(int cpu);
--- a/kernel/sched.c 2010-11-01 19:54:12.000000000 +0800
+++ b/kernel/sched.c 2011-01-02 18:14:20.000000000 +0800
@@ -5151,6 +5151,42 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t
return ret;
}
+/*
+ * Yield the CPU, giving the remainder of our time slice to task p.
+ * Typically used to hand CPU time to another thread inside the same
+ * process, eg. when p holds a resource other threads are waiting for.
+ * Giving priority to p may help get that resource released sooner.
+ *
+ * @nsecs: feedback to caller the nanoseconds yielded
+ */
+void yield_to(struct task_struct *p, u64 *nsecs)
+{
+ unsigned long flags;
+ struct sched_entity *se = &p->se;
+ struct rq *rq;
+ struct cfs_rq *cfs_rq;
+ u64 vruntime;
+
+ rq = task_rq_lock(p, &flags);
+ if (task_running(rq, p) || task_has_rt_policy(p))
+ goto out;
+ cfs_rq = cfs_rq_of(se);
+ vruntime = se->vruntime;
+ se->vruntime = cfs_rq->min_vruntime;
+ if (nsecs) {
+ if (vruntime > se->vruntime)
+ vruntime -= se->vruntime;
+ else
+ vruntime = 0;
+ *nsecs = vruntime;
+ }
+ /* kick p onto its CPU */
+ resched_task(rq->curr);
+ out:
+ task_rq_unlock(rq, &flags);
+}
+EXPORT_SYMBOL_GPL(yield_to);
+
/**
* sys_sched_yield - yield the current processor to other threads.
*
--- a/include/linux/kvm_host.h 2010-11-01 19:54:12.000000000 +0800
+++ b/include/linux/kvm_host.h 2011-01-02 17:43:26.000000000 +0800
@@ -91,6 +91,7 @@ struct kvm_vcpu {
int fpu_active;
int guest_fpu_loaded, guest_xcr0_loaded;
wait_queue_head_t wq;
+ int spinning;
int sigset_active;
sigset_t sigset;
struct kvm_vcpu_stat stat;
@@ -186,6 +187,7 @@ struct kvm {
#endif
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
atomic_t online_vcpus;
+ int last_boosted_vcpu;
struct list_head vm_list;
struct mutex lock;
struct kvm_io_bus *buses[KVM_NR_BUSES];
--- a/virt/kvm/kvm_main.c 2010-11-01 19:54:12.000000000 +0800
+++ b/virt/kvm/kvm_main.c 2011-01-02 18:03:42.000000000 +0800
@@ -1289,18 +1289,65 @@ void kvm_resched(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_resched);
-void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
ktime_t expires;
DEFINE_WAIT(wait);
+ u64 nsecs;
+ struct kvm *kvm = me->kvm;
+ struct kvm_vcpu *vcpu;
+ int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
+ int first_round = 1;
+ int i;
- prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+ me->spinning = 1;
+ /*
+ * We boost the priority of a VCPU that is runnable but not
+ * currently running, because it got preempted by something
+ * else and called schedule in __vcpu_run. Hopefully that
+ * VCPU is holding the lock that we need and will release it.
+ * We approximate round-robin by starting at the last boosted VCPU.
+ */
+ again:
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct task_struct *task = vcpu->task;
+ if (first_round && i < last_boosted_vcpu) {
+ i = last_boosted_vcpu;
+ continue;
+ } else if (!first_round && i > last_boosted_vcpu)
+ break;
+ if (vcpu == me)
+ continue;
+ if (vcpu->spinning)
+ continue;
+ if (!task)
+ continue;
+ if (waitqueue_active(&vcpu->wq))
+ continue;
+ if (task->flags & PF_VCPU)
+ continue;
+ kvm->last_boosted_vcpu = i;
+ goto yield;
+ }
+ if (first_round && last_boosted_vcpu == kvm->last_boosted_vcpu) {
+ /* We have not found anyone yet. */
+ first_round = 0;
+ goto again;
+ }
+ me->spinning = 0;
+ return;
+ yield:
+ yield_to(task, &(nsecs=0));
/* Sleep for 100 us, and hope lock-holder got scheduled */
- expires = ktime_add_ns(ktime_get(), 100000UL);
+ if (nsecs < 100000)
+ nsecs = 100000;
+ prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+ expires = ktime_add_ns(ktime_get(), nsecs);
schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
-
finish_wait(&vcpu->wq, &wait);
+
+ me->spinning = 0;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [RFC PATCH 2/3] sched: add yield_to function 2011-01-02 11:43 [RFC PATCH 2/3] sched: add yield_to function Hillf Danton @ 2011-01-02 17:19 ` Rik van Riel 2011-01-03 4:18 ` Hillf Danton 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2011-01-02 17:19 UTC (permalink / raw) To: Hillf Danton Cc: Marcelo Tosatti, Peter Zijlstra, Srivatsa Vaddagiri, linux-kernel, Avi Kivity, Edwin Zhai On 01/02/2011 06:43 AM, Hillf Danton wrote: > The following work is based on what Rik posted, with a few changes. I threw out my patch before Christmas, replacing it with Mike Galbraith's work. Today I finally discovered why I had not been getting very useful results testing the PLE code on the test system. On this particular CPU model (Intel(R) Xeon(R) CPU X5670), the value of ple_gap needs to be at least 48 ! A ple_gap value of 41 is simply too low to trigger reliably with Avi's kvm-unit-tests PLE test and also only triggers a few times a second at most (not too useful) with the other tests I had been running... -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2011-01-02 17:19 ` Rik van Riel @ 2011-01-03 4:18 ` Hillf Danton 2011-01-03 5:04 ` Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Hillf Danton @ 2011-01-03 4:18 UTC (permalink / raw) To: Rik van Riel Cc: Marcelo Tosatti, Peter Zijlstra, Srivatsa Vaddagiri, linux-kernel, Avi Kivity, Edwin Zhai On Mon, Jan 3, 2011 at 1:19 AM, Rik van Riel <riel@redhat.com> wrote: > On 01/02/2011 06:43 AM, Hillf Danton wrote: > >> The following work is based on what Rik posted, with a few changes. > > I threw out my patch before Christmas, replacing it with > Mike Galbraith's work. > Would you please share the work by Mike, or info where to get it? Hillf > Today I finally discovered why I had not been getting > very useful results testing the PLE code on the test > system. > > On this particular CPU model (Intel(R) Xeon(R) CPU X5670), > the value of ple_gap needs to be at least 48 ! > > A ple_gap value of 41 is simply too low to trigger reliably > with Avi's kvm-unit-tests PLE test and also only triggers > a few times a second at most (not too useful) with the other > tests I had been running... > > -- > All rights reversed > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2011-01-03 4:18 ` Hillf Danton @ 2011-01-03 5:04 ` Rik van Riel 2011-01-04 12:44 ` Hillf Danton 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2011-01-03 5:04 UTC (permalink / raw) To: Hillf Danton Cc: Marcelo Tosatti, Peter Zijlstra, Srivatsa Vaddagiri, linux-kernel, Avi Kivity, Edwin Zhai On 01/02/2011 11:18 PM, Hillf Danton wrote: > On Mon, Jan 3, 2011 at 1:19 AM, Rik van Riel<riel@redhat.com> wrote: >> On 01/02/2011 06:43 AM, Hillf Danton wrote: >> >>> The following work is based on what Rik posted, with a few changes. >> >> I threw out my patch before Christmas, replacing it with >> Mike Galbraith's work. >> > > Would you please share the work by Mike, or info where to get it? Now that I finally got useful test results (after increasing ple_gap to be more useful on the X5670 CPU), I will post the patch series. Christmas holidays are a good thing, but expect patches tomorrow :) -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2011-01-03 5:04 ` Rik van Riel @ 2011-01-04 12:44 ` Hillf Danton 0 siblings, 0 replies; 37+ messages in thread From: Hillf Danton @ 2011-01-04 12:44 UTC (permalink / raw) To: Rik van Riel Cc: Marcelo Tosatti, Peter Zijlstra, Srivatsa Vaddagiri, linux-kernel, Avi Kivity, Edwin Zhai On Mon, Jan 3, 2011 at 1:04 PM, Rik van Riel <riel@redhat.com> wrote: > On 01/02/2011 11:18 PM, Hillf Danton wrote: >> >> On Mon, Jan 3, 2011 at 1:19 AM, Rik van Riel<riel@redhat.com> wrote: >>> >>> On 01/02/2011 06:43 AM, Hillf Danton wrote: >>> >>>> The following work is based on what Rik posted, with a few changes. >>> >>> I threw out my patch before Christmas, replacing it with >>> Mike Galbraith's work. >>> >> >> Would you please share the work by Mike, or info where to get it? > > Now that I finally got useful test results (after > increasing ple_gap to be more useful on the X5670 > CPU), I will post the patch series. > Good news after holiday! > Christmas holidays are a good thing, but expect > patches tomorrow :) > Please then get me Cced when posting :) Hillf ^ permalink raw reply [flat|nested] 37+ messages in thread
* [RFC PATCH 0/3] directed yield for Pause Loop Exiting @ 2010-12-02 19:41 Rik van Riel 2010-12-02 19:44 ` [RFC PATCH 2/3] sched: add yield_to function Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2010-12-02 19:41 UTC (permalink / raw) To: kvm Cc: linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Peter Zijlstra, Ingo Molnar, Anthony Liguori When running SMP virtual machines, it is possible for one VCPU to be spinning on a spinlock, while the VCPU that holds the spinlock is not currently running, because the host scheduler preempted it to run something else. Both Intel and AMD CPUs have a feature that detects when a virtual CPU is spinning on a lock and will trap to the host. The current KVM code sleeps for a bit whenever that happens, which results in eg. a 64 VCPU Windows guest taking forever and a bit to boot up. This is because the VCPU holding the lock is actually running and not sleeping, so the pause is counter-productive. In other workloads a pause can also be counter-productive, with spinlock detection resulting in one guest giving up its CPU time to the others. Instead of spinning, it ends up simply not running much at all. This patch series aims to fix that, by having a VCPU that spins give the remainder of its timeslice to another VCPU in the same guest before yielding the CPU - one that is runnable but got preempted, hopefully the lock holder. Scheduler people, please flame me with anything I may have done wrong, so I can do it right for a next version :) -- All rights reversed. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [RFC PATCH 2/3] sched: add yield_to function 2010-12-02 19:41 [RFC PATCH 0/3] directed yield for Pause Loop Exiting Rik van Riel @ 2010-12-02 19:44 ` Rik van Riel 2010-12-03 0:50 ` Chris Wright ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Rik van Riel @ 2010-12-02 19:44 UTC (permalink / raw) To: kvm Cc: linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Peter Zijlstra, Ingo Molnar, Anthony Liguori Add a yield_to function to the scheduler code, allowing us to give the remainder of our timeslice to another thread. We may want to use this to provide a sys_yield_to system call one day. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> diff --git a/include/linux/sched.h b/include/linux/sched.h index c5f926c..4f3cce9 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1985,6 +1985,7 @@ extern void set_user_nice(struct task_struct *p, long nice); extern int task_prio(const struct task_struct *p); extern int task_nice(const struct task_struct *p); extern int can_nice(const struct task_struct *p, const int nice); +extern void requeue_task(struct rq *rq, struct task_struct *p); extern int task_curr(const struct task_struct *p); extern int idle_cpu(int cpu); extern int sched_setscheduler(struct task_struct *, int, struct sched_param *); @@ -2058,6 +2059,14 @@ extern int wake_up_state(struct task_struct *tsk, unsigned int state); extern int wake_up_process(struct task_struct *tsk); extern void wake_up_new_task(struct task_struct *tsk, unsigned long clone_flags); + +#ifdef CONFIG_SCHED_HRTICK +extern u64 slice_remain(struct task_struct *); +extern void yield_to(struct task_struct *); +#else +static inline void yield_to(struct task_struct *p) yield() +#endif + #ifdef CONFIG_SMP extern void kick_process(struct task_struct *tsk); #else diff --git a/kernel/sched.c b/kernel/sched.c index f8e5a25..ef088cd 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -1909,6 +1909,26 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) p->se.on_rq = 0; } +/** + * requeue_task - requeue a task which priority got changed by yield_to + * @rq: the tasks's runqueue + * @p: the task in question + * Must be called with the runqueue lock held. Will cause the CPU to + * reschedule if p is now at the head of the runqueue. + */ +void requeue_task(struct rq *rq, struct task_struct *p) +{ + assert_spin_locked(&rq->lock); + + if (!p->se.on_rq || task_running(rq, p) || task_has_rt_policy(p)) + return; + + dequeue_task(rq, p, 0); + enqueue_task(rq, p, 0); + + resched_task(p); +} + /* * __normal_prio - return the priority that is based on the static prio */ @@ -6797,6 +6817,36 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len, return ret; } +#ifdef CONFIG_SCHED_HRTICK +/* + * Yield the CPU, giving the remainder of our time slice to task p. + * Typically used to hand CPU time to another thread inside the same + * process, eg. when p holds a resource other threads are waiting for. + * Giving priority to p may help get that resource released sooner. + */ +void yield_to(struct task_struct *p) +{ + unsigned long flags; + struct sched_entity *se = &p->se; + struct rq *rq; + struct cfs_rq *cfs_rq; + u64 remain = slice_remain(current); + + rq = task_rq_lock(p, &flags); + if (task_running(rq, p) || task_has_rt_policy(p)) + goto out; + cfs_rq = cfs_rq_of(se); + se->vruntime -= remain; + if (se->vruntime < cfs_rq->min_vruntime) + se->vruntime = cfs_rq->min_vruntime; + requeue_task(rq, p); + out: + task_rq_unlock(rq, &flags); + yield(); +} +EXPORT_SYMBOL(yield_to); +#endif + /** * sys_sched_yield - yield the current processor to other threads. * diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index 5119b08..2a0a595 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -974,6 +974,25 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued) */ #ifdef CONFIG_SCHED_HRTICK +u64 slice_remain(struct task_struct *p) +{ + unsigned long flags; + struct sched_entity *se = &p->se; + struct cfs_rq *cfs_rq; + struct rq *rq; + u64 slice, ran; + s64 delta; + + rq = task_rq_lock(p, &flags); + cfs_rq = cfs_rq_of(se); + slice = sched_slice(cfs_rq, se); + ran = se->sum_exec_runtime - se->prev_sum_exec_runtime; + delta = slice - ran; + task_rq_unlock(rq, &flags); + + return max(delta, 0LL); +} + static void hrtick_start_fair(struct rq *rq, struct task_struct *p) { struct sched_entity *se = &p->se; -- All rights reversed. ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-02 19:44 ` [RFC PATCH 2/3] sched: add yield_to function Rik van Riel @ 2010-12-03 0:50 ` Chris Wright 2010-12-03 18:27 ` Rik van Riel 2010-12-03 5:54 ` Mike Galbraith 2010-12-03 13:23 ` Peter Zijlstra 2 siblings, 1 reply; 37+ messages in thread From: Chris Wright @ 2010-12-03 0:50 UTC (permalink / raw) To: Rik van Riel Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Peter Zijlstra, Ingo Molnar, Anthony Liguori * Rik van Riel (riel@redhat.com) wrote: > Add a yield_to function to the scheduler code, allowing us to > give the remainder of our timeslice to another thread. > > We may want to use this to provide a sys_yield_to system call > one day. > > Signed-off-by: Rik van Riel <riel@redhat.com> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index c5f926c..4f3cce9 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1985,6 +1985,7 @@ extern void set_user_nice(struct task_struct *p, long nice); > extern int task_prio(const struct task_struct *p); > extern int task_nice(const struct task_struct *p); > extern int can_nice(const struct task_struct *p, const int nice); > +extern void requeue_task(struct rq *rq, struct task_struct *p); > extern int task_curr(const struct task_struct *p); > extern int idle_cpu(int cpu); > extern int sched_setscheduler(struct task_struct *, int, struct sched_param *); > @@ -2058,6 +2059,14 @@ extern int wake_up_state(struct task_struct *tsk, unsigned int state); > extern int wake_up_process(struct task_struct *tsk); > extern void wake_up_new_task(struct task_struct *tsk, > unsigned long clone_flags); > + > +#ifdef CONFIG_SCHED_HRTICK > +extern u64 slice_remain(struct task_struct *); > +extern void yield_to(struct task_struct *); > +#else > +static inline void yield_to(struct task_struct *p) yield() Missing {}'s ? > +#endif > + > #ifdef CONFIG_SMP > extern void kick_process(struct task_struct *tsk); > #else > diff --git a/kernel/sched.c b/kernel/sched.c > index f8e5a25..ef088cd 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -1909,6 +1909,26 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) > p->se.on_rq = 0; > } > > +/** > + * requeue_task - requeue a task which priority got changed by yield_to > + * @rq: the tasks's runqueue > + * @p: the task in question > + * Must be called with the runqueue lock held. Will cause the CPU to > + * reschedule if p is now at the head of the runqueue. > + */ > +void requeue_task(struct rq *rq, struct task_struct *p) > +{ > + assert_spin_locked(&rq->lock); > + > + if (!p->se.on_rq || task_running(rq, p) || task_has_rt_policy(p)) > + return; already checked task_running(rq, p) || task_has_rt_policy(p) w/ rq lock held. > + > + dequeue_task(rq, p, 0); > + enqueue_task(rq, p, 0); seems like you could condense to save an update_rq_clock() call at least, don't know if the info_queued, info_dequeued need to be updated > + resched_task(p); > +} > + > /* > * __normal_prio - return the priority that is based on the static prio > */ > @@ -6797,6 +6817,36 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len, > return ret; > } > > +#ifdef CONFIG_SCHED_HRTICK > +/* > + * Yield the CPU, giving the remainder of our time slice to task p. > + * Typically used to hand CPU time to another thread inside the same > + * process, eg. when p holds a resource other threads are waiting for. > + * Giving priority to p may help get that resource released sooner. > + */ > +void yield_to(struct task_struct *p) > +{ > + unsigned long flags; > + struct sched_entity *se = &p->se; > + struct rq *rq; > + struct cfs_rq *cfs_rq; > + u64 remain = slice_remain(current); > + > + rq = task_rq_lock(p, &flags); > + if (task_running(rq, p) || task_has_rt_policy(p)) > + goto out; > + cfs_rq = cfs_rq_of(se); > + se->vruntime -= remain; > + if (se->vruntime < cfs_rq->min_vruntime) > + se->vruntime = cfs_rq->min_vruntime; Should these details all be in sched_fair? Seems like the wrong layer here. And would that condition go the other way? If new vruntime is smaller than min, then it becomes new cfs_rq->min_vruntime? > + requeue_task(rq, p); > + out: > + task_rq_unlock(rq, &flags); > + yield(); > +} > +EXPORT_SYMBOL(yield_to); > +#endif > + > /** > * sys_sched_yield - yield the current processor to other threads. > * > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c > index 5119b08..2a0a595 100644 > --- a/kernel/sched_fair.c > +++ b/kernel/sched_fair.c > @@ -974,6 +974,25 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued) > */ > > #ifdef CONFIG_SCHED_HRTICK > +u64 slice_remain(struct task_struct *p) > +{ > + unsigned long flags; > + struct sched_entity *se = &p->se; > + struct cfs_rq *cfs_rq; > + struct rq *rq; > + u64 slice, ran; > + s64 delta; > + > + rq = task_rq_lock(p, &flags); > + cfs_rq = cfs_rq_of(se); > + slice = sched_slice(cfs_rq, se); > + ran = se->sum_exec_runtime - se->prev_sum_exec_runtime; > + delta = slice - ran; > + task_rq_unlock(rq, &flags); > + > + return max(delta, 0LL); Can delta go negative? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 0:50 ` Chris Wright @ 2010-12-03 18:27 ` Rik van Riel 2010-12-03 19:30 ` Chris Wright 2010-12-03 21:30 ` Peter Zijlstra 0 siblings, 2 replies; 37+ messages in thread From: Rik van Riel @ 2010-12-03 18:27 UTC (permalink / raw) To: Chris Wright Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Peter Zijlstra, Ingo Molnar, Anthony Liguori On 12/02/2010 07:50 PM, Chris Wright wrote: >> +void requeue_task(struct rq *rq, struct task_struct *p) >> +{ >> + assert_spin_locked(&rq->lock); >> + >> + if (!p->se.on_rq || task_running(rq, p) || task_has_rt_policy(p)) >> + return; > > already checked task_running(rq, p) || task_has_rt_policy(p) w/ rq lock > held. OK, I removed the duplicate checks. >> + >> + dequeue_task(rq, p, 0); >> + enqueue_task(rq, p, 0); > > seems like you could condense to save an update_rq_clock() call at least, > don't know if the info_queued, info_dequeued need to be updated Or I can do the whole operation with the task not queued. Not sure yet what approach I'll take... >> +#ifdef CONFIG_SCHED_HRTICK >> +/* >> + * Yield the CPU, giving the remainder of our time slice to task p. >> + * Typically used to hand CPU time to another thread inside the same >> + * process, eg. when p holds a resource other threads are waiting for. >> + * Giving priority to p may help get that resource released sooner. >> + */ >> +void yield_to(struct task_struct *p) >> +{ >> + unsigned long flags; >> + struct sched_entity *se =&p->se; >> + struct rq *rq; >> + struct cfs_rq *cfs_rq; >> + u64 remain = slice_remain(current); >> + >> + rq = task_rq_lock(p,&flags); >> + if (task_running(rq, p) || task_has_rt_policy(p)) >> + goto out; >> + cfs_rq = cfs_rq_of(se); >> + se->vruntime -= remain; >> + if (se->vruntime< cfs_rq->min_vruntime) >> + se->vruntime = cfs_rq->min_vruntime; > > Should these details all be in sched_fair? Seems like the wrong layer > here. And would that condition go the other way? If new vruntime is > smaller than min, then it becomes new cfs_rq->min_vruntime? That would be nice. Unfortunately, EXPORT_SYMBOL() does not seem to work right from sched_fair.c, which is included from sched.c instead of being built from the makefile! >> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c >> index 5119b08..2a0a595 100644 >> --- a/kernel/sched_fair.c >> +++ b/kernel/sched_fair.c >> @@ -974,6 +974,25 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued) >> */ >> >> #ifdef CONFIG_SCHED_HRTICK >> +u64 slice_remain(struct task_struct *p) >> +{ >> + unsigned long flags; >> + struct sched_entity *se =&p->se; >> + struct cfs_rq *cfs_rq; >> + struct rq *rq; >> + u64 slice, ran; >> + s64 delta; >> + >> + rq = task_rq_lock(p,&flags); >> + cfs_rq = cfs_rq_of(se); >> + slice = sched_slice(cfs_rq, se); >> + ran = se->sum_exec_runtime - se->prev_sum_exec_runtime; >> + delta = slice - ran; >> + task_rq_unlock(rq,&flags); >> + >> + return max(delta, 0LL); > > Can delta go negative? Good question. I don't know. -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 18:27 ` Rik van Riel @ 2010-12-03 19:30 ` Chris Wright 2010-12-03 21:30 ` Peter Zijlstra 1 sibling, 0 replies; 37+ messages in thread From: Chris Wright @ 2010-12-03 19:30 UTC (permalink / raw) To: Rik van Riel Cc: Chris Wright, kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Peter Zijlstra, Ingo Molnar, Anthony Liguori * Rik van Riel (riel@redhat.com) wrote: > On 12/02/2010 07:50 PM, Chris Wright wrote: > >>+/* > >>+ * Yield the CPU, giving the remainder of our time slice to task p. > >>+ * Typically used to hand CPU time to another thread inside the same > >>+ * process, eg. when p holds a resource other threads are waiting for. > >>+ * Giving priority to p may help get that resource released sooner. > >>+ */ > >>+void yield_to(struct task_struct *p) > >>+{ > >>+ unsigned long flags; > >>+ struct sched_entity *se =&p->se; > >>+ struct rq *rq; > >>+ struct cfs_rq *cfs_rq; > >>+ u64 remain = slice_remain(current); > >>+ > >>+ rq = task_rq_lock(p,&flags); > >>+ if (task_running(rq, p) || task_has_rt_policy(p)) > >>+ goto out; > >>+ cfs_rq = cfs_rq_of(se); > >>+ se->vruntime -= remain; > >>+ if (se->vruntime< cfs_rq->min_vruntime) > >>+ se->vruntime = cfs_rq->min_vruntime; > > > >Should these details all be in sched_fair? Seems like the wrong layer > >here. And would that condition go the other way? If new vruntime is > >smaller than min, then it becomes new cfs_rq->min_vruntime? > > That would be nice. Unfortunately, EXPORT_SYMBOL() does > not seem to work right from sched_fair.c, which is included > from sched.c instead of being built from the makefile! add a ->yield_to() to properly isolate (only relevant then in sched_fair)? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 18:27 ` Rik van Riel 2010-12-03 19:30 ` Chris Wright @ 2010-12-03 21:30 ` Peter Zijlstra 1 sibling, 0 replies; 37+ messages in thread From: Peter Zijlstra @ 2010-12-03 21:30 UTC (permalink / raw) To: Rik van Riel Cc: Chris Wright, kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 13:27 -0500, Rik van Riel wrote: > > Should these details all be in sched_fair? Seems like the wrong layer > > here. And would that condition go the other way? If new vruntime is > > smaller than min, then it becomes new cfs_rq->min_vruntime? > > That would be nice. Unfortunately, EXPORT_SYMBOL() does > not seem to work right from sched_fair.c, which is included > from sched.c instead of being built from the makefile! I'm not quite sure why that is, but I kinda like that, the policy implementation should never export stuff. Code outside the scheduler cannot ever know the policy of a task, hence policy specific exports are bad. A generic export with policy implementations (like the sched_class::yield_to() proposal) are the proper way. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-02 19:44 ` [RFC PATCH 2/3] sched: add yield_to function Rik van Riel 2010-12-03 0:50 ` Chris Wright @ 2010-12-03 5:54 ` Mike Galbraith 2010-12-03 13:46 ` Srivatsa Vaddagiri 2010-12-03 13:23 ` Peter Zijlstra 2 siblings, 1 reply; 37+ messages in thread From: Mike Galbraith @ 2010-12-03 5:54 UTC (permalink / raw) To: Rik van Riel Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Thu, 2010-12-02 at 14:44 -0500, Rik van Riel wrote: > +#ifdef CONFIG_SCHED_HRTICK > +/* > + * Yield the CPU, giving the remainder of our time slice to task p. > + * Typically used to hand CPU time to another thread inside the same > + * process, eg. when p holds a resource other threads are waiting for. > + * Giving priority to p may help get that resource released sooner. > + */ > +void yield_to(struct task_struct *p) > +{ > + unsigned long flags; > + struct sched_entity *se = &p->se; > + struct rq *rq; > + struct cfs_rq *cfs_rq; > + u64 remain = slice_remain(current); That "slice remaining" only shows the distance to last preempt, however brief. It shows nothing wrt tree position, the yielding task may well already be right of the task it wants to yield to, having been a buddy. > cfs_rq = cfs_rq_of(se); > + se->vruntime -= remain; > + if (se->vruntime < cfs_rq->min_vruntime) > + se->vruntime = cfs_rq->min_vruntime; (This is usually done using max_vruntime()) If the recipient was already left of the fair stick (min_vruntime), clipping advances it's vruntime, vaporizing entitlement from both donor and recipient. What if a task tries to yield to another not on the same cpu, and/or in the same task group? This would munge min_vruntime of other queues. I think you'd have to restrict this to same cpu, same group. If tasks can donate cross cfs_rq, (say) pinned task A cpu A running solo could donate vruntime to selected tasks pinned to cpu B, for as long as minuscule preemptions can resupply ammo. Would suck to not be the favored child. Maybe you could exchange vruntimes cooperatively (iff same cfs_rq) between threads, but I don't think donations with clipping works. -Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 5:54 ` Mike Galbraith @ 2010-12-03 13:46 ` Srivatsa Vaddagiri 2010-12-03 14:45 ` Mike Galbraith 0 siblings, 1 reply; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 13:46 UTC (permalink / raw) To: Mike Galbraith Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 06:54:16AM +0100, Mike Galbraith wrote: > > +void yield_to(struct task_struct *p) > > +{ > > + unsigned long flags; > > + struct sched_entity *se = &p->se; > > + struct rq *rq; > > + struct cfs_rq *cfs_rq; > > + u64 remain = slice_remain(current); > > That "slice remaining" only shows the distance to last preempt, however > brief. It shows nothing wrt tree position, the yielding task may well > already be right of the task it wants to yield to, having been a buddy. Good point. > > cfs_rq = cfs_rq_of(se); > > + se->vruntime -= remain; > > + if (se->vruntime < cfs_rq->min_vruntime) > > + se->vruntime = cfs_rq->min_vruntime; > > (This is usually done using max_vruntime()) > > If the recipient was already left of the fair stick (min_vruntime), > clipping advances it's vruntime, vaporizing entitlement from both donor > and recipient. > > What if a task tries to yield to another not on the same cpu, and/or in > the same task group? In this case, target of yield_to is a vcpu belonging to the same VM and hence is expected to be in same task group, but I agree its good to put a check. > This would munge min_vruntime of other queues. I > think you'd have to restrict this to same cpu, same group. If tasks can > donate cross cfs_rq, (say) pinned task A cpu A running solo could donate > vruntime to selected tasks pinned to cpu B, for as long as minuscule > preemptions can resupply ammo. Would suck to not be the favored child. IOW starving "non-favored" childs? > Maybe you could exchange vruntimes cooperatively (iff same cfs_rq) > between threads, but I don't think donations with clipping works. Can't that lead to starvation again (as I pointed in a mail to Peterz): p0 -> A0 B0 A1 A0/A1 enter a yield_to(other) deadlock, which means we keep swapping their vruntimes, starving B0? - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 13:46 ` Srivatsa Vaddagiri @ 2010-12-03 14:45 ` Mike Galbraith 2010-12-03 14:48 ` Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Mike Galbraith @ 2010-12-03 14:45 UTC (permalink / raw) To: vatsa Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 19:16 +0530, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 06:54:16AM +0100, Mike Galbraith wrote: > > > +void yield_to(struct task_struct *p) > > > +{ > > > + unsigned long flags; > > > + struct sched_entity *se = &p->se; > > > + struct rq *rq; > > > + struct cfs_rq *cfs_rq; > > > + u64 remain = slice_remain(current); > > > > That "slice remaining" only shows the distance to last preempt, however > > brief. It shows nothing wrt tree position, the yielding task may well > > already be right of the task it wants to yield to, having been a buddy. > > Good point. > > > > cfs_rq = cfs_rq_of(se); > > > + se->vruntime -= remain; > > > + if (se->vruntime < cfs_rq->min_vruntime) > > > + se->vruntime = cfs_rq->min_vruntime; > > > > (This is usually done using max_vruntime()) > > > > If the recipient was already left of the fair stick (min_vruntime), > > clipping advances it's vruntime, vaporizing entitlement from both donor > > and recipient. > > > > What if a task tries to yield to another not on the same cpu, and/or in > > the same task group? > > In this case, target of yield_to is a vcpu belonging to the same VM and hence is > expected to be in same task group, but I agree its good to put a check. > > > This would munge min_vruntime of other queues. I > > think you'd have to restrict this to same cpu, same group. If tasks can > > donate cross cfs_rq, (say) pinned task A cpu A running solo could donate > > vruntime to selected tasks pinned to cpu B, for as long as minuscule > > preemptions can resupply ammo. Would suck to not be the favored child. > > IOW starving "non-favored" childs? Yes, as in fairness ceases to exists. > > Maybe you could exchange vruntimes cooperatively (iff same cfs_rq) > > between threads, but I don't think donations with clipping works. > > Can't that lead to starvation again (as I pointed in a mail to Peterz): > > p0 -> A0 B0 A1 > > A0/A1 enter a yield_to(other) deadlock, which means we keep swapping their > vruntimes, starving B0? I'll have to go back and re-read that. Off the top of my head, I see no way it could matter which container the numbers live in as long as they keep advancing, and stay in the same runqueue. (hm, task weights would have to be the same too or scaled. dangerous business, tinkering with vruntimes) -Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 14:45 ` Mike Galbraith @ 2010-12-03 14:48 ` Rik van Riel 2010-12-03 15:09 ` Mike Galbraith 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2010-12-03 14:48 UTC (permalink / raw) To: Mike Galbraith Cc: vatsa, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On 12/03/2010 09:45 AM, Mike Galbraith wrote: > I'll have to go back and re-read that. Off the top of my head, I see no > way it could matter which container the numbers live in as long as they > keep advancing, and stay in the same runqueue. (hm, task weights would > have to be the same too or scaled. dangerous business, tinkering with > vruntimes) They're not necessarily in the same runqueue, the VCPU that is given time might be on another CPU than the one that was spinning on a lock. -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 14:48 ` Rik van Riel @ 2010-12-03 15:09 ` Mike Galbraith 2010-12-03 15:35 ` Rik van Riel 2010-12-03 21:26 ` Peter Zijlstra 0 siblings, 2 replies; 37+ messages in thread From: Mike Galbraith @ 2010-12-03 15:09 UTC (permalink / raw) To: Rik van Riel Cc: vatsa, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 09:48 -0500, Rik van Riel wrote: > On 12/03/2010 09:45 AM, Mike Galbraith wrote: > > > I'll have to go back and re-read that. Off the top of my head, I see no > > way it could matter which container the numbers live in as long as they > > keep advancing, and stay in the same runqueue. (hm, task weights would > > have to be the same too or scaled. dangerous business, tinkering with > > vruntimes) > > They're not necessarily in the same runqueue, the > VCPU that is given time might be on another CPU > than the one that was spinning on a lock. I don't think pumping vruntime cross cfs_rq would be safe, for the reason noted (et al). No competition means vruntime is meaningless. Donating just advances a clock that nobody's looking at. -Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 15:09 ` Mike Galbraith @ 2010-12-03 15:35 ` Rik van Riel 2010-12-03 16:20 ` Srivatsa Vaddagiri 2010-12-03 20:05 ` Mike Galbraith 2010-12-03 21:26 ` Peter Zijlstra 1 sibling, 2 replies; 37+ messages in thread From: Rik van Riel @ 2010-12-03 15:35 UTC (permalink / raw) To: Mike Galbraith Cc: vatsa, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On 12/03/2010 10:09 AM, Mike Galbraith wrote: > On Fri, 2010-12-03 at 09:48 -0500, Rik van Riel wrote: >> On 12/03/2010 09:45 AM, Mike Galbraith wrote: >> >>> I'll have to go back and re-read that. Off the top of my head, I see no >>> way it could matter which container the numbers live in as long as they >>> keep advancing, and stay in the same runqueue. (hm, task weights would >>> have to be the same too or scaled. dangerous business, tinkering with >>> vruntimes) >> >> They're not necessarily in the same runqueue, the >> VCPU that is given time might be on another CPU >> than the one that was spinning on a lock. > > I don't think pumping vruntime cross cfs_rq would be safe, for the > reason noted (et al). No competition means vruntime is meaningless. > Donating just advances a clock that nobody's looking at. Do you have suggestions on what I should do to make this yield_to functionality work? I'm willing to implement pretty much anything the scheduler people will be happy with :) -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 15:35 ` Rik van Riel @ 2010-12-03 16:20 ` Srivatsa Vaddagiri 2010-12-03 17:09 ` Rik van Riel 2010-12-03 20:05 ` Mike Galbraith 1 sibling, 1 reply; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 16:20 UTC (permalink / raw) To: Rik van Riel Cc: Mike Galbraith, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 10:35:25AM -0500, Rik van Riel wrote: > Do you have suggestions on what I should do to make > this yield_to functionality work? Keeping in mind the complications of yield_to, I had suggested we do something suggested below: http://marc.info/?l=kvm&m=129122645006996&w=2 Essentially yield to other tasks on your own runqueue and when you get to run again, try reclaiming what you gave up earlier (with a cap on how much one can feedback this relinquished time). It can be accomplished via a special form of yield(), available only to in-kernel users, kvm in this case. - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 16:20 ` Srivatsa Vaddagiri @ 2010-12-03 17:09 ` Rik van Riel 2010-12-03 17:29 ` Srivatsa Vaddagiri 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2010-12-03 17:09 UTC (permalink / raw) To: vatsa Cc: Mike Galbraith, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On 12/03/2010 11:20 AM, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 10:35:25AM -0500, Rik van Riel wrote: >> Do you have suggestions on what I should do to make >> this yield_to functionality work? > > Keeping in mind the complications of yield_to, I had suggested we do something > suggested below: > > http://marc.info/?l=kvm&m=129122645006996&w=2 > > Essentially yield to other tasks on your own runqueue and when you get to run > again, try reclaiming what you gave up earlier (with a cap on how much one can > feedback this relinquished time). It can be accomplished via a special form of > yield(), available only to in-kernel users, kvm in this case. I don't see how that is going to help get the lock released, when the VCPU holding the lock is on another CPU. -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 17:09 ` Rik van Riel @ 2010-12-03 17:29 ` Srivatsa Vaddagiri 2010-12-03 17:33 ` Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 17:29 UTC (permalink / raw) To: Rik van Riel Cc: Mike Galbraith, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 12:09:01PM -0500, Rik van Riel wrote: > I don't see how that is going to help get the lock > released, when the VCPU holding the lock is on another > CPU. Even the directed yield() is not guaranteed to get the lock released, given its shooting in the dark? Anyway, the intention of yield() proposed was not to get lock released immediately (which will happen eventually), but rather to avoid inefficiency associated with (long) spinning and at the same time make sure we are not leaking our bandwidth to other guests because of a naive yield .. - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 17:29 ` Srivatsa Vaddagiri @ 2010-12-03 17:33 ` Rik van Riel 2010-12-03 17:45 ` Srivatsa Vaddagiri 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2010-12-03 17:33 UTC (permalink / raw) To: vatsa Cc: Mike Galbraith, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On 12/03/2010 12:29 PM, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 12:09:01PM -0500, Rik van Riel wrote: >> I don't see how that is going to help get the lock >> released, when the VCPU holding the lock is on another >> CPU. > > Even the directed yield() is not guaranteed to get the lock released, given its > shooting in the dark? True, that's a fair point. > Anyway, the intention of yield() proposed was not to get lock released > immediately (which will happen eventually), but rather to avoid inefficiency > associated with (long) spinning and at the same time make sure we are not > leaking our bandwidth to other guests because of a naive yield .. A KVM guest can run on the host alongside short-lived processes, though. How can we ensure that a VCPU that donates time gets it back again later, when the task time was donated to may no longer exist? -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 17:33 ` Rik van Riel @ 2010-12-03 17:45 ` Srivatsa Vaddagiri 0 siblings, 0 replies; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 17:45 UTC (permalink / raw) To: Rik van Riel Cc: Mike Galbraith, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 12:33:29PM -0500, Rik van Riel wrote: > >Anyway, the intention of yield() proposed was not to get lock released > >immediately (which will happen eventually), but rather to avoid inefficiency > >associated with (long) spinning and at the same time make sure we are not > >leaking our bandwidth to other guests because of a naive yield .. > > A KVM guest can run on the host alongside short-lived > processes, though. How can we ensure that a VCPU that > donates time gets it back again later, when the task > time was donated to may no longer exist? I think that does not matter. What matters for fairness in this case is how much cpu time yielding thread gets over some (larger) time window. By ensuring that relinquished time is fedback, we should maintian fairness for that particular vcpu thread ..This also avoids nasty interactions associated with donation .. - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 15:35 ` Rik van Riel 2010-12-03 16:20 ` Srivatsa Vaddagiri @ 2010-12-03 20:05 ` Mike Galbraith 1 sibling, 0 replies; 37+ messages in thread From: Mike Galbraith @ 2010-12-03 20:05 UTC (permalink / raw) To: Rik van Riel Cc: vatsa, kvm, linux-kernel, Avi Kiviti, Peter Zijlstra, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 10:35 -0500, Rik van Riel wrote: > On 12/03/2010 10:09 AM, Mike Galbraith wrote: > > On Fri, 2010-12-03 at 09:48 -0500, Rik van Riel wrote: > >> On 12/03/2010 09:45 AM, Mike Galbraith wrote: > >> > >>> I'll have to go back and re-read that. Off the top of my head, I see no > >>> way it could matter which container the numbers live in as long as they > >>> keep advancing, and stay in the same runqueue. (hm, task weights would > >>> have to be the same too or scaled. dangerous business, tinkering with > >>> vruntimes) > >> > >> They're not necessarily in the same runqueue, the > >> VCPU that is given time might be on another CPU > >> than the one that was spinning on a lock. > > > > I don't think pumping vruntime cross cfs_rq would be safe, for the > > reason noted (et al). No competition means vruntime is meaningless. > > Donating just advances a clock that nobody's looking at. > > Do you have suggestions on what I should do to make > this yield_to functionality work? Hm. The problem with donating vruntime across queues is that there is no global clock. You have to be in the same frame of reference for vruntime donation to make any sense. Same with cross cpu yield_to() hw wise though. It makes no sense from another frame of reference. Pull. -Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 15:09 ` Mike Galbraith 2010-12-03 15:35 ` Rik van Riel @ 2010-12-03 21:26 ` Peter Zijlstra 1 sibling, 0 replies; 37+ messages in thread From: Peter Zijlstra @ 2010-12-03 21:26 UTC (permalink / raw) To: Mike Galbraith Cc: Rik van Riel, vatsa, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 16:09 +0100, Mike Galbraith wrote: > On Fri, 2010-12-03 at 09:48 -0500, Rik van Riel wrote: > > On 12/03/2010 09:45 AM, Mike Galbraith wrote: > > > > > I'll have to go back and re-read that. Off the top of my head, I see no > > > way it could matter which container the numbers live in as long as they > > > keep advancing, and stay in the same runqueue. (hm, task weights would > > > have to be the same too or scaled. dangerous business, tinkering with > > > vruntimes) > > > > They're not necessarily in the same runqueue, the > > VCPU that is given time might be on another CPU > > than the one that was spinning on a lock. > > I don't think pumping vruntime cross cfs_rq would be safe, for the > reason noted (et al). No competition means vruntime is meaningless. > Donating just advances a clock that nobody's looking at. Yeah, cross-cpu you have to model it like exchanging lag. That's a slightly more complicated trick (esp. since we still don't have a proper measure for lag) but it should be doable. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-02 19:44 ` [RFC PATCH 2/3] sched: add yield_to function Rik van Riel 2010-12-03 0:50 ` Chris Wright 2010-12-03 5:54 ` Mike Galbraith @ 2010-12-03 13:23 ` Peter Zijlstra 2010-12-03 13:30 ` Srivatsa Vaddagiri 2010-12-08 17:55 ` Rik van Riel 2 siblings, 2 replies; 37+ messages in thread From: Peter Zijlstra @ 2010-12-03 13:23 UTC (permalink / raw) To: Rik van Riel Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Ingo Molnar, Anthony Liguori On Thu, 2010-12-02 at 14:44 -0500, Rik van Riel wrote: unsigned long clone_flags); > + > +#ifdef CONFIG_SCHED_HRTICK > +extern u64 slice_remain(struct task_struct *); > +extern void yield_to(struct task_struct *); > +#else > +static inline void yield_to(struct task_struct *p) yield() > +#endif That does SCHED_HRTICK have to do with any of this? > #ifdef CONFIG_SMP > extern void kick_process(struct task_struct *tsk); > #else > diff --git a/kernel/sched.c b/kernel/sched.c > index f8e5a25..ef088cd 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -1909,6 +1909,26 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) > p->se.on_rq = 0; > } > > +/** > + * requeue_task - requeue a task which priority got changed by yield_to priority doesn't seem the right word, you're not actually changing anything related to p->*prio > + * @rq: the tasks's runqueue > + * @p: the task in question > + * Must be called with the runqueue lock held. Will cause the CPU to > + * reschedule if p is now at the head of the runqueue. > + */ > +void requeue_task(struct rq *rq, struct task_struct *p) > +{ > + assert_spin_locked(&rq->lock); > + > + if (!p->se.on_rq || task_running(rq, p) || task_has_rt_policy(p)) > + return; > + > + dequeue_task(rq, p, 0); > + enqueue_task(rq, p, 0); > + > + resched_task(p); I guess that wants to be something like check_preempt_curr() > +} > + > /* > * __normal_prio - return the priority that is based on the static prio > */ > @@ -6797,6 +6817,36 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len, > return ret; > } > > +#ifdef CONFIG_SCHED_HRTICK Still wondering what all this has to do with SCHED_HRTICK.. > +/* > + * Yield the CPU, giving the remainder of our time slice to task p. > + * Typically used to hand CPU time to another thread inside the same > + * process, eg. when p holds a resource other threads are waiting for. > + * Giving priority to p may help get that resource released sooner. > + */ > +void yield_to(struct task_struct *p) > +{ > + unsigned long flags; > + struct sched_entity *se = &p->se; > + struct rq *rq; > + struct cfs_rq *cfs_rq; > + u64 remain = slice_remain(current); > + > + rq = task_rq_lock(p, &flags); > + if (task_running(rq, p) || task_has_rt_policy(p)) > + goto out; See, this all ain't nice, slice_remain() don't make no sense to be called for !fair tasks. Why not write: if (curr->sched_class == p->sched_class && curr->sched_class->yield_to) curr->sched_class->yield_to(curr, p); or something, and then implement sched_class_fair::yield_to only, leaving it a NOP for all other classes. Also, I think you can side-step that whole curr vs p rq->lock thing you're doing here, by holding p's rq->lock, you've disabled IRQs in current's task context, since ->sum_exec_runtime and all are only changed during scheduling and the scheduler_tick, disabling IRQs in its task context pins them. > + cfs_rq = cfs_rq_of(se); > + se->vruntime -= remain; > + if (se->vruntime < cfs_rq->min_vruntime) > + se->vruntime = cfs_rq->min_vruntime; Now here we have another problem, remain was measured in wall-time, and then you go change a virtual time measure using that. These things are related like: vt = t/weight So you're missing a weight factor somewhere. Also, that check against min_vruntime doesn't really make much sense. > + requeue_task(rq, p); Just makes me wonder why you added requeue task to begin with.. why not simply dequeue at the top of this function, and enqueue at the tail, like all the rest does: see rt_mutex_setprio(), set_user_nice(), sched_move_task(). > + out: > + task_rq_unlock(rq, &flags); > + yield(); > +} > +EXPORT_SYMBOL(yield_to); EXPORT_SYMBOL_GPL() pretty please, I really hate how kvm is a module and needs to export hooks all over the core kernel :/ > +#endif > + > /** > * sys_sched_yield - yield the current processor to other threads. > * > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c > index 5119b08..2a0a595 100644 > --- a/kernel/sched_fair.c > +++ b/kernel/sched_fair.c > @@ -974,6 +974,25 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued) > */ > > #ifdef CONFIG_SCHED_HRTICK > +u64 slice_remain(struct task_struct *p) > +{ > + unsigned long flags; > + struct sched_entity *se = &p->se; > + struct cfs_rq *cfs_rq; > + struct rq *rq; > + u64 slice, ran; > + s64 delta; > + > + rq = task_rq_lock(p, &flags); > + cfs_rq = cfs_rq_of(se); > + slice = sched_slice(cfs_rq, se); > + ran = se->sum_exec_runtime - se->prev_sum_exec_runtime; > + delta = slice - ran; > + task_rq_unlock(rq, &flags); > + > + return max(delta, 0LL); > +} Right, so another approach might be to simply swap the vruntime between curr and p. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 13:23 ` Peter Zijlstra @ 2010-12-03 13:30 ` Srivatsa Vaddagiri 2010-12-03 14:03 ` Peter Zijlstra 2010-12-08 17:55 ` Rik van Riel 1 sibling, 1 reply; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 13:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 02:23:39PM +0100, Peter Zijlstra wrote: > Right, so another approach might be to simply swap the vruntime between > curr and p. Can't that cause others to stave? For ex: consider a cpu p0 having these tasks: p0 -> A0 B0 A1 A0/A1 have entered some sort of AB<->BA spin-deadlock, as a result A0 wants to direct yield to A1 which wants to direct yield to A0. If we keep swapping their runtimes, can't it starve B0? - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 13:30 ` Srivatsa Vaddagiri @ 2010-12-03 14:03 ` Peter Zijlstra 2010-12-03 14:06 ` Srivatsa Vaddagiri 0 siblings, 1 reply; 37+ messages in thread From: Peter Zijlstra @ 2010-12-03 14:03 UTC (permalink / raw) To: vatsa Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 19:00 +0530, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 02:23:39PM +0100, Peter Zijlstra wrote: > > Right, so another approach might be to simply swap the vruntime between > > curr and p. > > Can't that cause others to stave? For ex: consider a cpu p0 having these tasks: > > p0 -> A0 B0 A1 > > A0/A1 have entered some sort of AB<->BA spin-deadlock, as a result A0 wants to > direct yield to A1 which wants to direct yield to A0. If we keep swapping their > runtimes, can't it starve B0? No, because they do receive service (they spend some time spinning before being interrupted), so the respective vruntimes will increase, at some point they'll pass B0 and it'll get scheduled. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 14:03 ` Peter Zijlstra @ 2010-12-03 14:06 ` Srivatsa Vaddagiri 2010-12-03 14:10 ` Srivatsa Vaddagiri 2010-12-10 4:34 ` Rik van Riel 0 siblings, 2 replies; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 14:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: > No, because they do receive service (they spend some time spinning > before being interrupted), so the respective vruntimes will increase, at > some point they'll pass B0 and it'll get scheduled. Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this case)? - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 14:06 ` Srivatsa Vaddagiri @ 2010-12-03 14:10 ` Srivatsa Vaddagiri 2010-12-03 21:23 ` Peter Zijlstra 2010-12-10 4:34 ` Rik van Riel 1 sibling, 1 reply; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-03 14:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Fri, Dec 03, 2010 at 07:36:07PM +0530, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: > > No, because they do receive service (they spend some time spinning > > before being interrupted), so the respective vruntimes will increase, at > > some point they'll pass B0 and it'll get scheduled. > > Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this > case)? Hmm perhaps yes, althought at cost of tons of context switches, which would be nice to minimize on? - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 14:10 ` Srivatsa Vaddagiri @ 2010-12-03 21:23 ` Peter Zijlstra 2010-12-04 13:02 ` Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Peter Zijlstra @ 2010-12-03 21:23 UTC (permalink / raw) To: vatsa Cc: Rik van Riel, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Fri, 2010-12-03 at 19:40 +0530, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 07:36:07PM +0530, Srivatsa Vaddagiri wrote: > > On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: > > > No, because they do receive service (they spend some time spinning > > > before being interrupted), so the respective vruntimes will increase, at > > > some point they'll pass B0 and it'll get scheduled. > > > > Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this > > case)? > > Hmm perhaps yes, althought at cost of tons of context switches, which would be > nice to minimize on? Don't care, as long as the guys calling yield_to() pay for that time its their problem. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 21:23 ` Peter Zijlstra @ 2010-12-04 13:02 ` Rik van Riel 0 siblings, 0 replies; 37+ messages in thread From: Rik van Riel @ 2010-12-04 13:02 UTC (permalink / raw) To: Peter Zijlstra Cc: vatsa, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On 12/03/2010 04:23 PM, Peter Zijlstra wrote: > On Fri, 2010-12-03 at 19:40 +0530, Srivatsa Vaddagiri wrote: >> On Fri, Dec 03, 2010 at 07:36:07PM +0530, Srivatsa Vaddagiri wrote: >>> On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: >>>> No, because they do receive service (they spend some time spinning >>>> before being interrupted), so the respective vruntimes will increase, at >>>> some point they'll pass B0 and it'll get scheduled. >>> >>> Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this >>> case)? >> >> Hmm perhaps yes, althought at cost of tons of context switches, which would be >> nice to minimize on? > > Don't care, as long as the guys calling yield_to() pay for that time its > their problem. Also, the context switches are cheaper than spinning entire time slices on spinlocks we're not going to get (because the VCPU holding the lock is not running). -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 14:06 ` Srivatsa Vaddagiri 2010-12-03 14:10 ` Srivatsa Vaddagiri @ 2010-12-10 4:34 ` Rik van Riel 2010-12-10 8:39 ` Srivatsa Vaddagiri 1 sibling, 1 reply; 37+ messages in thread From: Rik van Riel @ 2010-12-10 4:34 UTC (permalink / raw) To: vatsa Cc: Peter Zijlstra, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On 12/03/2010 09:06 AM, Srivatsa Vaddagiri wrote: > On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: >> No, because they do receive service (they spend some time spinning >> before being interrupted), so the respective vruntimes will increase, at >> some point they'll pass B0 and it'll get scheduled. > > Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this > case)? I have a rough idea for a simpler way to ensure fairness. At yield_to time, we could track in the runqueue structure that a task received CPU time (and on the other runqueue that a task donated CPU time). The balancer can count time-given-to CPUs as busier, and donated-time CPUs as less busy, moving tasks away in the unlikely event that the same task gets keeping CPU time given to it. Conversely, it can move other tasks onto CPUs that have tasks on them that cannot make progress right now and are just donating their CPU time. Most of the time the time-given and time-received should balance out and there should be little to no influence on the load balancer. This code would just be there to deal with exceptional circumstances, to avoid the theoretical worst case people have described. -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-10 4:34 ` Rik van Riel @ 2010-12-10 8:39 ` Srivatsa Vaddagiri 2010-12-10 14:55 ` Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Srivatsa Vaddagiri @ 2010-12-10 8:39 UTC (permalink / raw) To: Rik van Riel Cc: Peter Zijlstra, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On Thu, Dec 09, 2010 at 11:34:46PM -0500, Rik van Riel wrote: > On 12/03/2010 09:06 AM, Srivatsa Vaddagiri wrote: > >On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: > >>No, because they do receive service (they spend some time spinning > >>before being interrupted), so the respective vruntimes will increase, at > >>some point they'll pass B0 and it'll get scheduled. > > > >Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this > >case)? > > I have a rough idea for a simpler way to ensure > fairness. > > At yield_to time, we could track in the runqueue > structure that a task received CPU time (and on > the other runqueue that a task donated CPU time). > > The balancer can count time-given-to CPUs as > busier, and donated-time CPUs as less busy, > moving tasks away in the unlikely event that > the same task gets keeping CPU time given to > it. I think just capping donation (either on send side or receive side) may be more simpler here than to mess with load balancer logic. > Conversely, it can move other tasks onto CPUs > that have tasks on them that cannot make progress > right now and are just donating their CPU time. > > Most of the time the time-given and time-received > should balance out and there should be little to > no influence on the load balancer. This code would > just be there to deal with exceptional circumstances, > to avoid the theoretical worst case people have > described. - vatsa ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-10 8:39 ` Srivatsa Vaddagiri @ 2010-12-10 14:55 ` Rik van Riel 0 siblings, 0 replies; 37+ messages in thread From: Rik van Riel @ 2010-12-10 14:55 UTC (permalink / raw) To: vatsa Cc: Peter Zijlstra, kvm, linux-kernel, Avi Kiviti, Ingo Molnar, Anthony Liguori On 12/10/2010 03:39 AM, Srivatsa Vaddagiri wrote: > On Thu, Dec 09, 2010 at 11:34:46PM -0500, Rik van Riel wrote: >> On 12/03/2010 09:06 AM, Srivatsa Vaddagiri wrote: >>> On Fri, Dec 03, 2010 at 03:03:30PM +0100, Peter Zijlstra wrote: >>>> No, because they do receive service (they spend some time spinning >>>> before being interrupted), so the respective vruntimes will increase, at >>>> some point they'll pass B0 and it'll get scheduled. >>> >>> Is that sufficient to ensure that B0 receives its fair share (1/3 cpu in this >>> case)? >> >> I have a rough idea for a simpler way to ensure >> fairness. >> >> At yield_to time, we could track in the runqueue >> structure that a task received CPU time (and on >> the other runqueue that a task donated CPU time). >> >> The balancer can count time-given-to CPUs as >> busier, and donated-time CPUs as less busy, >> moving tasks away in the unlikely event that >> the same task gets keeping CPU time given to >> it. > > I think just capping donation (either on send side or receive side) may be more > simpler here than to mess with load balancer logic. Do you have any ideas on how to implement this in a simple enough way that it may be acceptable upstream? :) -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-03 13:23 ` Peter Zijlstra 2010-12-03 13:30 ` Srivatsa Vaddagiri @ 2010-12-08 17:55 ` Rik van Riel 2010-12-08 20:00 ` Peter Zijlstra 1 sibling, 1 reply; 37+ messages in thread From: Rik van Riel @ 2010-12-08 17:55 UTC (permalink / raw) To: Peter Zijlstra Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Ingo Molnar, Anthony Liguori On 12/03/2010 08:23 AM, Peter Zijlstra wrote: > On Thu, 2010-12-02 at 14:44 -0500, Rik van Riel wrote: > unsigned long clone_flags); >> + >> +#ifdef CONFIG_SCHED_HRTICK >> +extern u64 slice_remain(struct task_struct *); >> +extern void yield_to(struct task_struct *); >> +#else >> +static inline void yield_to(struct task_struct *p) yield() >> +#endif > > That does SCHED_HRTICK have to do with any of this? Legacy from an old prototype this patch is based on. I'll get rid of that. >> +/** >> + * requeue_task - requeue a task which priority got changed by yield_to > > priority doesn't seem the right word, you're not actually changing > anything related to p->*prio True, I'll change the comment. >> + * @rq: the tasks's runqueue >> + * @p: the task in question >> + * Must be called with the runqueue lock held. Will cause the CPU to >> + * reschedule if p is now at the head of the runqueue. >> + */ >> +void requeue_task(struct rq *rq, struct task_struct *p) >> +{ >> + assert_spin_locked(&rq->lock); >> + >> + if (!p->se.on_rq || task_running(rq, p) || task_has_rt_policy(p)) >> + return; >> + >> + dequeue_task(rq, p, 0); >> + enqueue_task(rq, p, 0); >> + >> + resched_task(p); > > I guess that wants to be something like check_preempt_curr() Done. Thanks for pointing that out. >> @@ -6797,6 +6817,36 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len, >> return ret; >> } >> >> +#ifdef CONFIG_SCHED_HRTICK > > Still wondering what all this has to do with SCHED_HRTICK.. > >> +/* >> + * Yield the CPU, giving the remainder of our time slice to task p. >> + * Typically used to hand CPU time to another thread inside the same >> + * process, eg. when p holds a resource other threads are waiting for. >> + * Giving priority to p may help get that resource released sooner. >> + */ >> +void yield_to(struct task_struct *p) >> +{ >> + unsigned long flags; >> + struct sched_entity *se =&p->se; >> + struct rq *rq; >> + struct cfs_rq *cfs_rq; >> + u64 remain = slice_remain(current); >> + >> + rq = task_rq_lock(p,&flags); >> + if (task_running(rq, p) || task_has_rt_policy(p)) >> + goto out; > > See, this all ain't nice, slice_remain() don't make no sense to be > called for !fair tasks. > > Why not write: > > if (curr->sched_class == p->sched_class&& > curr->sched_class->yield_to) > curr->sched_class->yield_to(curr, p); > > or something, and then implement sched_class_fair::yield_to only, > leaving it a NOP for all other classes. Done. >> + cfs_rq = cfs_rq_of(se); >> + se->vruntime -= remain; >> + if (se->vruntime< cfs_rq->min_vruntime) >> + se->vruntime = cfs_rq->min_vruntime; > > Now here we have another problem, remain was measured in wall-time, and > then you go change a virtual time measure using that. These things are > related like: > > vt = t/weight > > So you're missing a weight factor somewhere. > > Also, that check against min_vruntime doesn't really make much sense. OK, how do I do this? >> + requeue_task(rq, p); > > Just makes me wonder why you added requeue task to begin with.. why not > simply dequeue at the top of this function, and enqueue at the tail, > like all the rest does: see rt_mutex_setprio(), set_user_nice(), > sched_move_task(). Done. >> + out: >> + task_rq_unlock(rq,&flags); >> + yield(); >> +} >> +EXPORT_SYMBOL(yield_to); > > EXPORT_SYMBOL_GPL() pretty please, I really hate how kvm is a module and > needs to export hooks all over the core kernel :/ Done. > Right, so another approach might be to simply swap the vruntime between > curr and p. Doesn't that run into the same scale issue you described above? -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-08 17:55 ` Rik van Riel @ 2010-12-08 20:00 ` Peter Zijlstra 2010-12-08 20:04 ` Peter Zijlstra 2010-12-08 22:59 ` Rik van Riel 0 siblings, 2 replies; 37+ messages in thread From: Peter Zijlstra @ 2010-12-08 20:00 UTC (permalink / raw) To: Rik van Riel Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Ingo Molnar, Anthony Liguori, Paul Turner On Wed, 2010-12-08 at 12:55 -0500, Rik van Riel wrote: > > > > Right, so another approach might be to simply swap the vruntime between > > curr and p. > > Doesn't that run into the same scale issue you described > above? Not really, but its tricky on SMP because vruntime only has meaning within a cfs_rq. The below is quickly cobbled together from a few patches I had laying about dealing with similar issues. The avg_vruntime() stuff takes the weighted average of the vruntimes on a cfs runqueue, this weighted average corresponds to the 0-lag point, that is the point where an ideal scheduler would have all tasks. Using the 0-lag point you can compute the lag of a task, the lag is a measure of service for a task, negative lag means the task is owed services, positive lag means its got too much service (at least, thats the sign convention used here, I always forget what the standard is). What we do is, when @p the target task, is owed less service than current, we flip lags such that p will become more eligible. The trouble with all this is that computing the weighted average is terribly expensive (it increases cost of all scheduler hot paths). The dyn_vruntime stuff mixed in there is an attempt at computing something similar, although its not used and I haven't tested the quality of the approximation in a while. Anyway, complete untested and such.. --- include/linux/sched.h | 2 + kernel/sched.c | 39 ++++++++++ kernel/sched_debug.c | 31 ++++----- kernel/sched_fair.c | 179 ++++++++++++++++++++++++++++++++++++++--------- kernel/sched_features.h | 8 +-- 5 files changed, 203 insertions(+), 56 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index b0fc8ee..538559e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1095,6 +1095,8 @@ struct sched_class { #ifdef CONFIG_FAIR_GROUP_SCHED void (*task_move_group) (struct task_struct *p, int on_rq); #endif + + void (*yield_to) (struct task_struct *p); }; struct load_weight { diff --git a/kernel/sched.c b/kernel/sched.c index 0debad9..fe0adb0 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -313,6 +313,9 @@ struct cfs_rq { struct load_weight load; unsigned long nr_running; + s64 avg_vruntime; + u64 avg_load; + u64 exec_clock; u64 min_vruntime; @@ -5062,6 +5065,42 @@ SYSCALL_DEFINE0(sched_yield) return 0; } +void yield_to(struct task_struct *p) +{ + struct task_struct *curr = current; + struct rq *p_rq, *rq; + unsigned long flags; + int on_rq; + + local_irq_save(flags); + rq = this_rq(); +again: + p_rq = task_rq(p); + double_rq_lock(rq, p_rq); + if (p_rq != task_rq(p)) { + double_rq_unlock(rq, p_rq); + goto again; + } + + update_rq_clock(rq); + update_rq_clock(p_rq); + + on_rq = p->se.on_rq; + if (on_rq) + dequeue_task(p_rq, p, 0); + + ret = 0; + if (p->sched_class == curr->sched_class && curr->sched_class->yield_to) + curr->sched_class->yield_to(p); + + if (on_rq) + enqueue_task(p_rq, p, 0); + + double_rq_unlock(rq, p_rq); + local_irq_restore(flags); +} +EXPORT_SYMBOL_GPL(yield_to); + static inline int should_resched(void) { return need_resched() && !(preempt_count() & PREEMPT_ACTIVE); diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c index 1dfae3d..b5cc773 100644 --- a/kernel/sched_debug.c +++ b/kernel/sched_debug.c @@ -138,10 +138,9 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu) void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) { - s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1, - spread, rq0_min_vruntime, spread0; + s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, spread; struct rq *rq = cpu_rq(cpu); - struct sched_entity *last; + struct sched_entity *last, *first; unsigned long flags; SEQ_printf(m, "\ncfs_rq[%d]:\n", cpu); @@ -149,28 +148,26 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) SPLIT_NS(cfs_rq->exec_clock)); raw_spin_lock_irqsave(&rq->lock, flags); - if (cfs_rq->rb_leftmost) - MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime; + first = __pick_first_entity(cfs_rq); + if (first) + left_vruntime = first->vruntime; last = __pick_last_entity(cfs_rq); if (last) - max_vruntime = last->vruntime; + right_vruntime = last->vruntime; min_vruntime = cfs_rq->min_vruntime; - rq0_min_vruntime = cpu_rq(0)->cfs.min_vruntime; raw_spin_unlock_irqrestore(&rq->lock, flags); - SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "MIN_vruntime", - SPLIT_NS(MIN_vruntime)); + SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_vruntime", + SPLIT_NS(left_vruntime)); SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "min_vruntime", SPLIT_NS(min_vruntime)); - SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "max_vruntime", - SPLIT_NS(max_vruntime)); - spread = max_vruntime - MIN_vruntime; - SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread", - SPLIT_NS(spread)); - spread0 = min_vruntime - rq0_min_vruntime; - SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread0", - SPLIT_NS(spread0)); SEQ_printf(m, " .%-30s: %d\n", "nr_spread_over", cfs_rq->nr_spread_over); + SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime", + SPLIT_NS(avg_vruntime(cfs_rq))); + SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime", + SPLIT_NS(right_vruntime)); + spread = right_vruntime - left_vruntime; + SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread", SPLIT_NS(spread)); SEQ_printf(m, " .%-30s: %ld\n", "nr_running", cfs_rq->nr_running); SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight); #ifdef CONFIG_FAIR_GROUP_SCHED diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index c886717..8689bcd 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -346,25 +346,90 @@ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se) return se->vruntime - cfs_rq->min_vruntime; } -static void update_min_vruntime(struct cfs_rq *cfs_rq) +/* + * Compute virtual time from the per-task service numbers: + * + * Fair schedulers conserve lag: \Sum lag_i = 0 + * + * lag_i = S_i - s_i = w_i * (V - v_i) + * + * \Sum lag_i = 0 -> \Sum w_i * (V - v_i) = V * \Sum w_i - \Sum w_i * v_i = 0 + * + * From which we solve V: + * + * \Sum v_i * w_i + * V = -------------- + * \Sum w_i + * + * However, since v_i is u64, and the multiplcation could easily overflow + * transform it into a relative form that uses smaller quantities: + * + * Substitute: v_i == (v_i - v) + v + * + * \Sum ((v_i - v) + v) * w_i \Sum (v_i - v) * w_i + * V = -------------------------- = -------------------- + v + * \Sum w_i \Sum w_i + * + * min_vruntime = v + * avg_vruntime = \Sum (v_i - v) * w_i + * cfs_rq->load = \Sum w_i + * + * Since min_vruntime is a monotonic increasing variable that closely tracks + * the per-task service, these deltas: (v_i - v), will be in the order of the + * maximal (virtual) lag induced in the system due to quantisation. + */ +static void +avg_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se) { - u64 vruntime = cfs_rq->min_vruntime; + s64 key = entity_key(cfs_rq, se); + cfs_rq->avg_vruntime += key * se->load.weight; + cfs_rq->avg_load += se->load.weight; +} - if (cfs_rq->curr) - vruntime = cfs_rq->curr->vruntime; +static void +avg_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + s64 key = entity_key(cfs_rq, se); + cfs_rq->avg_vruntime -= key * se->load.weight; + cfs_rq->avg_load -= se->load.weight; +} + +static inline +void avg_vruntime_update(struct cfs_rq *cfs_rq, s64 delta) +{ + /* + * v' = v + d ==> avg_vruntime' = avg_runtime - d*avg_load + */ + cfs_rq->avg_vruntime -= cfs_rq->avg_load * delta; +} - if (cfs_rq->rb_leftmost) { - struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost, - struct sched_entity, - run_node); +static u64 avg_vruntime(struct cfs_rq *cfs_rq) +{ + struct sched_entity *se = cfs_rq->curr; + s64 lag = cfs_rq->avg_vruntime; + long load = cfs_rq->avg_load; - if (!cfs_rq->curr) - vruntime = se->vruntime; - else - vruntime = min_vruntime(vruntime, se->vruntime); + if (se) { + lag += entity_key(cfs_rq, se) * se->load.weight; + load += se->load.weight; } - cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime); + if (load) + lag = div_s64(lag, load); + + return cfs_rq->min_vruntime + lag; +} + +static void __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime) +{ + /* + * open coded max_vruntime() to allow updating avg_vruntime + */ + s64 delta = (s64)(vruntime - cfs_rq->min_vruntime); + if (delta > 0) { + avg_vruntime_update(cfs_rq, delta); + cfs_rq->min_vruntime = vruntime; + } } /* @@ -378,6 +443,8 @@ static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) s64 key = entity_key(cfs_rq, se); int leftmost = 1; + avg_vruntime_add(cfs_rq, se); + /* * Find the right place in the rbtree: */ @@ -417,9 +484,10 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) } rb_erase(&se->run_node, &cfs_rq->tasks_timeline); + avg_vruntime_sub(cfs_rq, se); } -static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq) +static struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq) { struct rb_node *left = cfs_rq->rb_leftmost; @@ -542,6 +610,33 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se) static void update_cfs_load(struct cfs_rq *cfs_rq, int global_update); static void update_cfs_shares(struct cfs_rq *cfs_rq, long weight_delta); +static void update_min_vruntime(struct cfs_rq *cfs_rq, unsigned long delta_exec) +{ + struct sched_entity *left = __pick_first_entity(cfs_rq); + struct sched_entity *curr = cfs_rq->curr; + u64 vruntime; + + if (left && curr) + vruntime = min_vruntime(left->vruntime, curr->vruntime); + else if (left) + vruntime = left->vruntime; + else if (curr) + vruntime = curr->vruntime; + else + return; + + if (sched_feat(DYN_MIN_VRUNTIME)) { + u64 new_vruntime = cfs_rq->min_vruntime; + if (delta_exec) { + new_vruntime += calc_delta_mine(delta_exec, + NICE_0_LOAD, &cfs_rq->load); + } + vruntime = max_vruntime(new_vruntime, vruntime); + } + + __update_min_vruntime(cfs_rq, vruntime); +} + /* * Update the current task's runtime statistics. Skip current tasks that * are not in our scheduling class. @@ -560,7 +655,7 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, delta_exec_weighted = calc_delta_fair(delta_exec, curr); curr->vruntime += delta_exec_weighted; - update_min_vruntime(cfs_rq); + update_min_vruntime(cfs_rq, delta_exec); #if defined CONFIG_SMP && defined CONFIG_FAIR_GROUP_SCHED cfs_rq->load_unacc_exec_time += delta_exec; @@ -895,16 +990,7 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se) static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial) { - u64 vruntime = cfs_rq->min_vruntime; - - /* - * The 'current' period is already promised to the current tasks, - * however the extra weight of the new task will slow them down a - * little, place the new task so that it fits in the slot that - * stays open at the end. - */ - if (initial && sched_feat(START_DEBIT)) - vruntime += sched_vslice(cfs_rq, se); + u64 vruntime = avg_vruntime(cfs_rq); /* sleeps up to a single latency don't count. */ if (!initial) { @@ -934,7 +1020,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) * through callig update_curr(). */ if (!(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_WAKING)) - se->vruntime += cfs_rq->min_vruntime; + se->vruntime += avg_vruntime(cfs_rq); /* * Update run-time statistics of the 'current'. @@ -1003,7 +1089,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) se->on_rq = 0; update_cfs_load(cfs_rq, 0); account_entity_dequeue(cfs_rq, se); - update_min_vruntime(cfs_rq); + update_min_vruntime(cfs_rq, 0); update_cfs_shares(cfs_rq, 0); /* @@ -1012,7 +1098,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) * movement in our normalized position. */ if (!(flags & DEQUEUE_SLEEP)) - se->vruntime -= cfs_rq->min_vruntime; + se->vruntime -= avg_vruntime(cfs_rq); } /* @@ -1090,7 +1176,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se); static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq) { - struct sched_entity *se = __pick_next_entity(cfs_rq); + struct sched_entity *se = __pick_first_entity(cfs_rq); struct sched_entity *left = se; if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1) @@ -1326,7 +1412,7 @@ static void task_waking_fair(struct rq *rq, struct task_struct *p) struct sched_entity *se = &p->se; struct cfs_rq *cfs_rq = cfs_rq_of(se); - se->vruntime -= cfs_rq->min_vruntime; + se->vruntime -= avg_vruntime(cfs_rq); } #ifdef CONFIG_FAIR_GROUP_SCHED @@ -4025,7 +4111,7 @@ static void task_fork_fair(struct task_struct *p) resched_task(rq->curr); } - se->vruntime -= cfs_rq->min_vruntime; + se->vruntime -= avg_vruntime(cfs_rq); raw_spin_unlock_irqrestore(&rq->lock, flags); } @@ -4096,10 +4182,10 @@ static void task_move_group_fair(struct task_struct *p, int on_rq) * fair sleeper stuff for the first placement, but who cares. */ if (!on_rq) - p->se.vruntime -= cfs_rq_of(&p->se)->min_vruntime; + p->se.vruntime -= avg_vruntime(cfs_rq_of(&p->se)); set_task_rq(p, task_cpu(p)); if (!on_rq) - p->se.vruntime += cfs_rq_of(&p->se)->min_vruntime; + p->se.vruntime += avg_vruntime(cfs_rq_of(&p->se)); } #endif @@ -4118,6 +4204,31 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task return rr_interval; } +static void yield_to_fair(struct task_stuct *p) +{ + struct sched_entity *se = ¤t->se; + struct sched_entity *p_se = &p->se; + u64 lag0, p_lag0; + s64 lag, p_lag; + + lag0 = avg_vruntime(cfs_rq_of(se)); + p_lag0 = avg_vruntime(cfs_rq_of(p_se)); + + lag = se->vruntime - avg_vruntime(cfs_rq); + p_lag = p_se->vruntime - avg_vruntime(p_cfs_rq); + + if (p_lag > lag) { /* if P is owed less service */ + se->vruntime = lag0 + p_lag; + p_se->vruntime = p_lag + lag; + } + + /* + * XXX try something smarter here + */ + resched_task(p); + resched_task(current); +} + /* * All the scheduling class methods: */ @@ -4150,6 +4261,8 @@ static const struct sched_class fair_sched_class = { .get_rr_interval = get_rr_interval_fair, + .yield_to = yield_to_fair, + #ifdef CONFIG_FAIR_GROUP_SCHED .task_move_group = task_move_group_fair, #endif diff --git a/kernel/sched_features.h b/kernel/sched_features.h index 68e69ac..feda9b8 100644 --- a/kernel/sched_features.h +++ b/kernel/sched_features.h @@ -6,12 +6,6 @@ SCHED_FEAT(GENTLE_FAIR_SLEEPERS, 1) /* - * Place new tasks ahead so that they do not starve already running - * tasks - */ -SCHED_FEAT(START_DEBIT, 1) - -/* * Should wakeups try to preempt running tasks. */ SCHED_FEAT(WAKEUP_PREEMPT, 1) @@ -53,6 +47,8 @@ SCHED_FEAT(HRTICK, 0) SCHED_FEAT(DOUBLE_TICK, 0) SCHED_FEAT(LB_BIAS, 1) +SCHED_FEAT(DYN_MIN_VRUNTIME, 0) + /* * Spin-wait on mutex acquisition when the mutex owner is running on * another cpu -- assumes that when the owner is running, it will soon ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-08 20:00 ` Peter Zijlstra @ 2010-12-08 20:04 ` Peter Zijlstra 2010-12-08 22:59 ` Rik van Riel 1 sibling, 0 replies; 37+ messages in thread From: Peter Zijlstra @ 2010-12-08 20:04 UTC (permalink / raw) To: Rik van Riel Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Ingo Molnar, Anthony Liguori, Paul Turner On Wed, 2010-12-08 at 21:00 +0100, Peter Zijlstra wrote: > + lag0 = avg_vruntime(cfs_rq_of(se)); > + p_lag0 = avg_vruntime(cfs_rq_of(p_se)); > + > + lag = se->vruntime - avg_vruntime(cfs_rq); > + p_lag = p_se->vruntime - avg_vruntime(p_cfs_rq); > + > + if (p_lag > lag) { /* if P is owed less service */ > + se->vruntime = lag0 + p_lag; > + p_se->vruntime = p_lag + lag; > + } clearly that should read: p_se->vruntime = p_lag0 + lag; ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/3] sched: add yield_to function 2010-12-08 20:00 ` Peter Zijlstra 2010-12-08 20:04 ` Peter Zijlstra @ 2010-12-08 22:59 ` Rik van Riel 1 sibling, 0 replies; 37+ messages in thread From: Rik van Riel @ 2010-12-08 22:59 UTC (permalink / raw) To: Peter Zijlstra Cc: kvm, linux-kernel, Avi Kiviti, Srivatsa Vaddagiri, Ingo Molnar, Anthony Liguori, Paul Turner On 12/08/2010 03:00 PM, Peter Zijlstra wrote: > Anyway, complete untested and such.. Looks very promising. I've been making a few changes in the same direction (except for the fancy CFS bits) and have one way to solve the one problem you pointed out in your patch. > +void yield_to(struct task_struct *p) > +{ ... > + on_rq = p->se.on_rq; > + if (on_rq) > + dequeue_task(p_rq, p, 0); > + > + ret = 0; > + if (p->sched_class == curr->sched_class&& curr->sched_class->yield_to) > + curr->sched_class->yield_to(p); > + > + if (on_rq) > + enqueue_task(p_rq, p, 0); > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c > index c886717..8689bcd 100644 > --- a/kernel/sched_fair.c > +++ b/kernel/sched_fair.c > +static void yield_to_fair(struct task_stuct *p) > +{ > + struct sched_entity *se =¤t->se; > + struct sched_entity *p_se =&p->se; > + u64 lag0, p_lag0; > + s64 lag, p_lag; > + > + lag0 = avg_vruntime(cfs_rq_of(se)); > + p_lag0 = avg_vruntime(cfs_rq_of(p_se)); > + > + lag = se->vruntime - avg_vruntime(cfs_rq); > + p_lag = p_se->vruntime - avg_vruntime(p_cfs_rq); > + > + if (p_lag> lag) { /* if P is owed less service */ > + se->vruntime = lag0 + p_lag; > + p_se->vruntime = p_lag + lag; > + } > + > + /* > + * XXX try something smarter here > + */ > + resched_task(p); > + resched_task(current); > +} If we do the dequeue_task and enqueue_task here, we can use check_preempt_curr in yield_to_fair. Alternatively, we can do the rescheduling from the main yield_to function, not from yield_to_fair, by calling check_preempt_curr on p and current after p has been enqueued. -- All rights reversed ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2011-01-04 12:44 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-01-02 11:43 [RFC PATCH 2/3] sched: add yield_to function Hillf Danton 2011-01-02 17:19 ` Rik van Riel 2011-01-03 4:18 ` Hillf Danton 2011-01-03 5:04 ` Rik van Riel 2011-01-04 12:44 ` Hillf Danton -- strict thread matches above, loose matches on Subject: below -- 2010-12-02 19:41 [RFC PATCH 0/3] directed yield for Pause Loop Exiting Rik van Riel 2010-12-02 19:44 ` [RFC PATCH 2/3] sched: add yield_to function Rik van Riel 2010-12-03 0:50 ` Chris Wright 2010-12-03 18:27 ` Rik van Riel 2010-12-03 19:30 ` Chris Wright 2010-12-03 21:30 ` Peter Zijlstra 2010-12-03 5:54 ` Mike Galbraith 2010-12-03 13:46 ` Srivatsa Vaddagiri 2010-12-03 14:45 ` Mike Galbraith 2010-12-03 14:48 ` Rik van Riel 2010-12-03 15:09 ` Mike Galbraith 2010-12-03 15:35 ` Rik van Riel 2010-12-03 16:20 ` Srivatsa Vaddagiri 2010-12-03 17:09 ` Rik van Riel 2010-12-03 17:29 ` Srivatsa Vaddagiri 2010-12-03 17:33 ` Rik van Riel 2010-12-03 17:45 ` Srivatsa Vaddagiri 2010-12-03 20:05 ` Mike Galbraith 2010-12-03 21:26 ` Peter Zijlstra 2010-12-03 13:23 ` Peter Zijlstra 2010-12-03 13:30 ` Srivatsa Vaddagiri 2010-12-03 14:03 ` Peter Zijlstra 2010-12-03 14:06 ` Srivatsa Vaddagiri 2010-12-03 14:10 ` Srivatsa Vaddagiri 2010-12-03 21:23 ` Peter Zijlstra 2010-12-04 13:02 ` Rik van Riel 2010-12-10 4:34 ` Rik van Riel 2010-12-10 8:39 ` Srivatsa Vaddagiri 2010-12-10 14:55 ` Rik van Riel 2010-12-08 17:55 ` Rik van Riel 2010-12-08 20:00 ` Peter Zijlstra 2010-12-08 20:04 ` Peter Zijlstra 2010-12-08 22:59 ` Rik van Riel
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.