* scheduler independent forced vcpu selection
@ 2005-05-17 20:48 Ryan Harper
  2005-05-18 12:10 ` Stephan Diestelhorst
  0 siblings, 1 reply; 9+ messages in thread
From: Ryan Harper @ 2005-05-17 20:48 UTC (permalink / raw)
  To: xen-devel

I'm working on a new hypercall, do_confer, which allows the directed
yielding of a vcpu to another vcpu.  It is mainly used when a vcpu fails
to acquire a spinlock, yielding to the lock holder instead of spinning. I
ported the ppc64 spinlock implementation for the i386 linux portion.  In
implementing the hypercall, I've been trying to figure out how to get
the scheduler (I've only played with bvt) to run the vcpu passed in the
hypercall (after some validation) but I've run into various bad state
situations (do_softirq pending != 0 assert, '!active_ac_timer(timer)'
failed, and __task_on_runqueue(prev) failed), which tells me I
don't fully understand all of the book-keeping that is needed.  Has
anyone thought about how to do this with either BVT or the new EDF
scheduler?
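
To make the directed-yield idea concrete, here is a minimal
single-threaded sketch of a lock path that confers to the holder instead
of spinning.  It is not the ppc64 or i386 code: confer_lock, confer_stub
and the yield_count array are hypothetical stand-ins, and the even/odd
meaning of the yield count is assumed to follow the convention used in
the patch posted later in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define LOCK_FREE (-1)
#define NR_VCPUS  4             /* hypothetical number of vcpus in the guest */

/* Hypothetical lock layout: records which vcpu currently holds the lock. */
struct confer_lock {
    int holder;                 /* vcpu id of the holder, or LOCK_FREE */
};

/* Hypothetical per-vcpu yield counters: even = running, odd = preempted. */
static unsigned int yield_count[NR_VCPUS];

/* Stand-in for the confer hypercall: records the request, schedules nothing. */
static int confer_calls;
static int last_confer_target = LOCK_FREE;

static void confer_stub(int target_vcpu, unsigned int observed_count)
{
    (void)observed_count;
    confer_calls++;
    last_confer_target = target_vcpu;
}

/* Single attempt to take the lock; a real lock would use an atomic cmpxchg. */
static bool confer_trylock(struct confer_lock *l, int self)
{
    if (l->holder != LOCK_FREE)
        return false;
    l->holder = self;
    return true;
}

static void confer_unlock(struct confer_lock *l, int self)
{
    assert(l->holder == self);
    l->holder = LOCK_FREE;
}

/*
 * Try up to max_tries times.  After each failed attempt, confer to the
 * holder if its yield count is odd (holder preempted) instead of spinning.
 */
static bool confer_lock_acquire(struct confer_lock *l, int self, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (confer_trylock(l, self))
            return true;

        int holder = l->holder;
        if (holder < 0 || holder >= NR_VCPUS)
            continue;           /* lock was released meanwhile; retry */

        unsigned int seen = yield_count[holder];
        if (seen & 1)
            confer_stub(holder, seen);
    }
    return false;
}
```

Passing the observed count along with the target lets the hypervisor
reject a confer that raced with the holder being scheduled again.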

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

* Re: scheduler independent forced vcpu selection
  2005-05-17 20:48 scheduler independent forced vcpu selection Ryan Harper
@ 2005-05-18 12:10 ` Stephan Diestelhorst
  2005-05-18 14:55   ` Ryan Harper
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Stephan Diestelhorst @ 2005-05-18 12:10 UTC (permalink / raw)
  To: Ryan Harper; +Cc: xen-devel

That is a good idea; there are quite a number of other spinlock
optimisations on the way...

> I'm working on a new hypercall, do_confer, which allows the directed
> yielding of a vcpu to another vcpu.  It is mainly used when a vcpu fails
> to acquire a spinlock, yielding to the lock holder instead of spinning. I
> ported the ppc64 spinlock implementation for the i386 linux portion.  In
> implementing the hypercall, I've been trying to figure out how to get
> the scheduler (I've only played with bvt) to run the vcpu passed in the
> hypercall (after some validation) but I've run into various bad state
> situations (do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> failed , and __task_on_runqueue(prev) failed) which tells me I
> don't fully understand all of the book-keeping that is needed.  Has
> anyone thought about how to do this with either BVT or the new EDF
> scheduler?

Building code similar to do_block and __enter_scheduler in
xen/common/schedule.c should work fine, except that instead of running
the original scheduler you switch directly to the hinted domain.

Are you calling do_softirq directly? If not, then it is quite strange
that this assertion fails.
The timer assertion might come from the old scheduling timer, which
probably gets reset but not deleted beforehand... And the on-runqueue
assertion suggests that you are 'stealing' the domain from the
scheduler's queues without giving it a chance to notice.

I'd guess cloning do_block and appending code from __enter_scheduler
with some checks (is the 'receiver' domain runnable? if not, run the
proper sched.do_schedule) should give you a solid base to start from.
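
A rough sketch of that selection step, under heavy simplification: the
types model_vcpu and model_slice and the function pick_next are
hypothetical stand-ins, not Xen's exec_domain or task_slice, and the
sketch deliberately leaves out the runqueue and timer book-keeping that
the real scheduler needs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a schedulable entity; not Xen's exec_domain. */
struct model_vcpu {
    int id;
    int runnable;
};

/* Simplified stand-in for a scheduling decision: task plus slice length. */
struct model_slice {
    struct model_vcpu *task;
    long time;
};

typedef struct model_slice (*do_schedule_fn)(long now);

/*
 * Pick the next vcpu.  The hint is honoured only when the hinted vcpu
 * exists and is runnable; otherwise the scheduler's own policy decides.
 */
static struct model_slice pick_next(struct model_vcpu *hinted, long remaining,
                                    do_schedule_fn policy, long now)
{
    assert(policy != NULL);

    if (hinted != NULL && hinted->runnable) {
        struct model_slice s = { hinted, remaining };
        return s;
    }
    return policy(now);
}

/* Toy policy used only for this example: always vcpu 0 for 10 time units. */
static struct model_vcpu vcpus[2] = { { 0, 1 }, { 1, 0 } };

static struct model_slice toy_policy(long now)
{
    (void)now;
    struct model_slice s = { &vcpus[0], 10 };
    return s;
}
```

Falling back to the policy whenever the hint cannot be honoured keeps
the normal scheduling path as the default.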

Cheers,
  Stephan

* Re: scheduler independent forced vcpu selection
  2005-05-18 12:10 ` Stephan Diestelhorst
@ 2005-05-18 14:55   ` Ryan Harper
  2005-05-18 18:03   ` Ryan Harper
  2005-05-18 22:37   ` Ryan Harper
  2 siblings, 0 replies; 9+ messages in thread
From: Ryan Harper @ 2005-05-18 14:55 UTC (permalink / raw)
  To: Stephan Diestelhorst; +Cc: Ryan Harper, xen-devel

* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> Are you calling do_softirq directly? If not then it is quite strange,

No, I just call raise_softirq(SCHEDULE_SOFTIRQ); without a subsequent
do_softirq().

> that this assertion fails.
> The timer assertion might be the old scheduling timer, which gets
> probably reset, but not deleted beforehand... And the on runqueue
> assertion suggests that you are 'stealing' the domain from the
> schedulers queues without giving it a chance to notice.

Could you explain what 'giving it a chance to notice' means?

> I'd guess cloning do_block and appending code from __enter_scheduler
> with some checks (is the 'receiver' domain runnable? if not run proper
> sched.do_schedule) should give you a solid base to start from.

Let me add in a check for domain_runnable and see if that helps.

Thanks for the feedback.  Let me know if you want me to post the patch
of where I'm at right now.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

* Re: scheduler independent forced vcpu selection
  2005-05-18 12:10 ` Stephan Diestelhorst
  2005-05-18 14:55   ` Ryan Harper
@ 2005-05-18 18:03   ` Ryan Harper
  2005-05-19 13:22     ` Stephan Diestelhorst
  2005-05-18 22:37   ` Ryan Harper
  2 siblings, 1 reply; 9+ messages in thread
From: Ryan Harper @ 2005-05-18 18:03 UTC (permalink / raw)
  To: Stephan Diestelhorst; +Cc: xen-devel

* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> The timer assertion might be the old scheduling timer, which gets
> probably reset, but not deleted beforehand... And the on runqueue
> assertion suggests that you are 'stealing' the domain from the
> schedulers queues without giving it a chance to notice.

Looking at both bvt and sedf, the runqueue is ordered by some metric or
another (evt, deadline respectively).  What I think we need is a way to
swap positions in the runqueues.  That is, if the lock holder is
runnable, I want the holder to run instead of current.  Is there some
way to do this in a scheduler independent manner with the current set of
scheduler ops defined in sched-if.h ?

I noticed that neither bvt nor sedf implements the rem_task function, which
I thought could be used to help out with the 'stealing' by notifying the
schedulers that prev was going away (removing it from the runqueue), but
just removing the exec_domain from the runqueue didn't help.

I'm including a patch that I'm currently using so you can get a better
idea of the modifications to schedule.c I'm making.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com


---
--- b/xen/common/schedule.c	2005-05-17 22:16:55.000000000 -0500
+++ c/xen/common/schedule.c	2005-05-18 12:42:44.765691872 -0500
@@ -273,6 +273,49 @@
     return 0;
 }
 
+/* Confer control to another vcpu */
+long do_confer(unsigned int vcpu, unsigned int yield_count)
+{
+    struct domain *d = current->domain;
+   
+    /* count hcalls */
+    current->confercnt++;
+
+    /* Validate CONFER prereqs:
+    * - vcpu is within bounds
+    * - vcpu is valid in this domain
+    * - current has not already conferred its slice to vcpu
+    * - vcpu is not already running
+    * - designated vcpu's yield_count matches value from call
+    *
+    * if all of the above are ok, set conferred value and enter scheduler
+    */
+
+    if (vcpu >= MAX_VIRT_CPUS)
+        return 0;
+
+    if (d->exec_domain[vcpu] == NULL)
+        return 0;
+
+    if (current->conferred != VCPU_CANCONFER)
+        return 0;
+
+    /* even counts indicate a running vcpu, odd is preempted/conferred */
+    if ((d->exec_domain[vcpu]->vcpu_info->yield_count & 1) == 0)
+        return 0;
+
+    if (d->exec_domain[vcpu]->vcpu_info->yield_count != yield_count)
+        return 0;
+
+    /*
+     * set which vcpu should run in conferred state, request scheduling
+     */
+    current->conferred = (VCPU_CONFERRING|vcpu);
+    raise_softirq(SCHEDULE_SOFTIRQ);
+
+    return 0;
+}
+
 /*
  * Demultiplex scheduler-related hypercalls.
  */
@@ -412,8 +455,9 @@
  */
 static void __enter_scheduler(void)
 {
-    struct exec_domain *prev = current, *next = NULL;
+    struct exec_domain *prev = current, *next = NULL, *holder = NULL;
     int                 cpu = prev->processor;
+    unsigned int        holder_vcpu;
     s_time_t            now;
     struct task_slice   next_slice;
     s32                 r_time;     /* time for new dom to run */
@@ -436,12 +480,39 @@
 
     prev->cpu_time += now - prev->lastschd;
 
-    /* get policy-specific decision on scheduling... */
-    next_slice = ops.do_schedule(now);
+    /* get ed pointer to holder vcpu */
+    holder_vcpu = 0xffff & prev->conferred;
+    holder = prev->domain->exec_domain[holder_vcpu];
+
+    if (unlikely(prev->conferred & VCPU_CONFERRING) &&
+        domain_runnable(holder)) 
+    {
+        /* run holder next */
+        next = holder;
+
+        /* run for the remainder of prev's slice */
+        r_time = schedule_data[cpu].s_timer.expires - now;
+
+        /* increment confer counters */
+        prev->confer_out++;
+        next->confer_in++;
+
+        /* change prev's confer state to prevent re-entrance */
+        prev->conferred = VCPU_CONFERRED;
+
+    } else {      
+        /* get policy-specific decision on scheduling... */
+        next_slice = ops.do_schedule(now);
+
+        r_time = next_slice.time;
+        next = next_slice.task;
+    }
+
+    /* 
+     * always clear conferred state so this vcpu can confer during its slice
+     */
+    next->conferred = 0;
 
-    r_time = next_slice.time;
-    next = next_slice.task;
-    
     schedule_data[cpu].curr = next;
     
     next->lastschd = now;
@@ -455,6 +526,12 @@
 
     spin_unlock_irq(&schedule_data[cpu].schedule_lock);
 
+    /* bump vcpu yield_count when controlling domain is not-idle */
+    if ( !is_idle_task(prev->domain) )
+        prev->vcpu_info->yield_count++;
+    if ( !is_idle_task(next->domain) )
+        next->vcpu_info->yield_count++;
+
     if ( unlikely(prev == next) ) {
 #ifdef ADV_SCHED_HISTO
         adv_sched_hist_to_stop(cpu);
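
The target checks in do_confer above can be exercised in isolation.  The
following is a minimal sketch, assuming the same even/odd yield_count
convention as the patch; MODEL_MAX_VCPUS, model_vcpu_info and
confer_target_ok are hypothetical names, and the "current has not
already conferred" check is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VCPUS 4   /* hypothetical bound, stands in for MAX_VIRT_CPUS */

/* Hypothetical per-vcpu state; not Xen's vcpu_info layout. */
struct model_vcpu_info {
    bool present;               /* vcpu exists in this domain */
    unsigned int yield_count;   /* even = running, odd = preempted/conferred */
};

/*
 * Return true when conferring to `vcpu` is allowed:
 * - the index is within the array (note >=, the array has
 *   MODEL_MAX_VCPUS entries),
 * - the vcpu exists,
 * - it is not currently running (odd yield_count),
 * - its yield_count still matches the value the caller observed.
 */
static bool confer_target_ok(const struct model_vcpu_info *v,
                             unsigned int vcpu, unsigned int seen_count)
{
    assert(v != NULL);

    if (vcpu >= MODEL_MAX_VCPUS)
        return false;
    if (!v[vcpu].present)
        return false;
    if ((v[vcpu].yield_count & 1) == 0)
        return false;
    return v[vcpu].yield_count == seen_count;
}
```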

* Re: scheduler independent forced vcpu selection
  2005-05-18 12:10 ` Stephan Diestelhorst
  2005-05-18 14:55   ` Ryan Harper
  2005-05-18 18:03   ` Ryan Harper
@ 2005-05-18 22:37   ` Ryan Harper
  2005-05-19 13:25     ` Stephan Diestelhorst
  2 siblings, 1 reply; 9+ messages in thread
From: Ryan Harper @ 2005-05-18 22:37 UTC (permalink / raw)
  To: Stephan Diestelhorst; +Cc: xen-devel

* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> > I'm working on a new hypercall, do_confer, which allows the directed
> > yielding of a vcpu to another vcpu.  It is mainly used when a vcpu fails
> > to acquire a spinlock, yielding to the lock holder instead of spinning. I
> > ported the ppc64 spinlock implementation for the i386 linux portion.  In
> > implementing the hypercall, I've been trying to figure out how to get
> > the scheduler (I've only played with bvt) to run the vcpu passed in the
> > hypercall (after some validation) but I've run into various bad state
> > situations (do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> > failed , and __task_on_runqueue(prev) failed) which tells me I
> > don't fully understand all of the book-keeping that is needed.  Has
> > anyone thought about how to do this with either BVT or the new EDF
> > scheduler?

After some thought, domain_wake(), followed by
raise_softirq(SCHEDULE_SOFTIRQ) does what I want and removes the huge
mess I was making in __enter_scheduler().  
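
In outline, the simplified path looks like the sketch below.  This is a
toy model only: model_ed, model_wake, model_raise_softirq and
MODEL_SCHEDULE_SOFTIRQ are hypothetical stand-ins for the Xen objects
named above, and validation of the target is assumed to have happened
already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SCHEDULE_SOFTIRQ 0   /* hypothetical softirq bit number */

/* Toy execution context; not Xen's exec_domain. */
struct model_ed {
    bool runnable;
};

/* Toy pending-softirq mask for the current cpu. */
static unsigned long softirq_pending;

/* Stand-in for waking a vcpu: mark it runnable. */
static void model_wake(struct model_ed *ed)
{
    ed->runnable = true;
}

/* Stand-in for raising a softirq: set its pending bit. */
static void model_raise_softirq(unsigned int nr)
{
    softirq_pending |= 1UL << nr;
}

/* Confer: wake the lock holder, then ask for a pass through the scheduler. */
static void model_confer(struct model_ed *holder)
{
    assert(holder != NULL);
    model_wake(holder);
    model_raise_softirq(MODEL_SCHEDULE_SOFTIRQ);
}
```

Nothing in this sketch forces the scheduler to pick the woken vcpu next;
that choice stays with the scheduling policy.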

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

* Re: scheduler independent forced vcpu selection
  2005-05-18 18:03   ` Ryan Harper
@ 2005-05-19 13:22     ` Stephan Diestelhorst
  0 siblings, 0 replies; 9+ messages in thread
From: Stephan Diestelhorst @ 2005-05-19 13:22 UTC (permalink / raw)
  To: Ryan Harper, xen-devel

Ryan Harper wrote:
> * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> 
>>The timer assertion might be the old scheduling timer, which gets
>>probably reset, but not deleted beforehand... And the on runqueue
>>assertion suggests that you are 'stealing' the domain from the
>>schedulers queues without giving it a chance to notice.
> 
> 
> Looking at both bvt and sedf, the runqueue is ordered by some metric or
> another (evt, deadline respectively).  What I think we need is a way to
> swap positions in the runqueues.  That is, if the lock holder is
> runnable, I want the holder to run instead of current.  Is there some
> way to do this in a scheduler independent manner with the current set of
> scheduler ops defined in sched-if.h ?

How about blocking/pausing the currently running domain? I can't think
of another way of doing this in a scheduler-independent fashion...

> I noticed that neither bvt or sedf implement the rem_task function which
> I thought could be used to help out with the 'stealing' by notifying the
> schedulers that prev was going away (removing it from the runqueue) but
> just removing the exec_domain from the runqueue didn't help.

That is really nasty, and it is exactly what I meant by "stealing" a
domain from the scheduler! :-)

> I'm including a patch that I'm currently using so you can get a better
> idea of the modifications to schedule.c I'm making.

Thanks,
  Stephan

* Re: scheduler independent forced vcpu selection
  2005-05-18 22:37   ` Ryan Harper
@ 2005-05-19 13:25     ` Stephan Diestelhorst
  2005-05-19 14:55       ` Ryan Harper
  2005-05-19 15:05       ` Ryan Harper
  0 siblings, 2 replies; 9+ messages in thread
From: Stephan Diestelhorst @ 2005-05-19 13:25 UTC (permalink / raw)
  To: Ryan Harper; +Cc: Stephan Diestelhorst, xen-devel

Ryan Harper wrote:
> * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> 
>>>I'm working on a new hypercall, do_confer, which allows the directed
>>>yielding of a vcpu to another vcpu.  It is mainly used when a vcpu fails
>>>to acquire a spinlock, yielding to the lock holder instead of spinning. I
>>>ported the ppc64 spinlock implementation for the i386 linux portion.  In
>>>implementing the hypercall, I've been trying to figure out how to get
>>>the scheduler (I've only played with bvt) to run the vcpu passed in the
>>>hypercall (after some validation) but I've run into various bad state
>>>situations (do_softirq pending != 0 assert, '!active_ac_timer(timer)'
>>>failed , and __task_on_runqueue(prev) failed) which tells me I
>>>don't fully understand all of the book-keeping that is needed.  Has
>>>anyone thought about how to do this with either BVT or the new EDF
>>>scheduler?
> 
> 
> After some thought, domain_wake(), followed by
> raise_softirq(SCHEDULE_SOFTIRQ) does what I want and removes the huge
> mess I was making in __enter_scheduler().  

Are you waking up the domain that holds the lock? Then you would rely on
the scheduler to give the woken domain a high "priority" (whatever this
means for the current scheduler) and to start that domain immediately,
right?

Best,
  Stephan

* Re: scheduler independent forced vcpu selection
  2005-05-19 13:25     ` Stephan Diestelhorst
@ 2005-05-19 14:55       ` Ryan Harper
  2005-05-19 15:05       ` Ryan Harper
  1 sibling, 0 replies; 9+ messages in thread
From: Ryan Harper @ 2005-05-19 14:55 UTC (permalink / raw)
  To: Stephan Diestelhorst; +Cc: xen-devel

* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-19 09:04]:
> Ryan Harper wrote:
> > * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> > 
> >>>I'm working on a new hypercall, do_confer, which allows the directed
> >>>yielding of a vcpu to another vcpu.  It is mainly used when a vcpu fails
> >>>to acquire a spinlock, yielding to the lock holder instead of spinning. I
> >>>ported the ppc64 spinlock implementation for the i386 linux portion.  In
> >>>implementing the hypercall, I've been trying to figure out how to get
> >>>the scheduler (I've only played with bvt) to run the vcpu passed in the
> >>>hypercall (after some validation) but I've run into various bad state
> >>>situations (do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> >>>failed , and __task_on_runqueue(prev) failed) which tells me I
> >>>don't fully understand all of the book-keeping that is needed.  Has
> >>>anyone thought about how to do this with either BVT or the new EDF
> >>>scheduler?
> > 
> > 
> > After some thought, domain_wake(), followed by
> > raise_softirq(SCHEDULE_SOFTIRQ) does what I want and removes the huge
> > mess I was making in __enter_scheduler().  
> 
> Are you waking up the domain that holds the lock? Then you would rely on

Yes, that is the idea.

> the scheduler to give the woken domain a high "priority" (whatever this
> means for the current scheduler) and should start that domain
> immediately, right?

Yes, that is part of what is required.  I need to do two things after
validation of do_confer:

1) Wake the lock-holder vcpu
2) Schedule the lock-holder to only run for the remaining time-slice of
the current running vcpu.

Using domain_wake() and softirq, I'm only getting (1), but I have no
guarantee when the lock-holder is actually woken up.  

Any thoughts on how to get (2)?
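
For reference, the slice length in (2) is the same quantity the earlier
patch computed from the per-cpu scheduling timer.  A minimal sketch of
that arithmetic, with model_time_t and remaining_slice as hypothetical
names and a clamp to zero added as an assumption for the case where the
timer has already expired:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t model_time_t;   /* stand-in for Xen's s_time_t */

/*
 * Time left in the conferring vcpu's slice: the expiry time of the
 * scheduling timer minus the current time, never negative.
 */
static model_time_t remaining_slice(model_time_t timer_expires,
                                    model_time_t now)
{
    assert(timer_expires >= 0 && now >= 0);

    model_time_t left = timer_expires - now;
    return left > 0 ? left : 0;
}
```

Knowing this value is not enough on its own; the scheduler still has to
be told to run the lock holder for that long, which is the open question
here.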

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

* Re: scheduler independent forced vcpu selection
  2005-05-19 13:25     ` Stephan Diestelhorst
  2005-05-19 14:55       ` Ryan Harper
@ 2005-05-19 15:05       ` Ryan Harper
  1 sibling, 0 replies; 9+ messages in thread
From: Ryan Harper @ 2005-05-19 15:05 UTC (permalink / raw)
  To: Stephan Diestelhorst; +Cc: Stephan Diestelhorst, Ryan Harper, xen-devel

* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-19 09:04]:
> Ryan Harper wrote:
> > * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> > 
> >>>I'm working on a new hypercall, do_confer, which allows the directed
> >>>yielding of a vcpu to another vcpu.  It is mainly used when a vcpu fails
> >>>to acquire a spinlock, yielding to the lock holder instead of spinning. I
> >>>ported the ppc64 spinlock implementation for the i386 linux portion.  In
> >>>implementing the hypercall, I've been trying to figure out how to get
> >>>the scheduler (I've only played with bvt) to run the vcpu passed in the
> >>>hypercall (after some validation) but I've run into various bad state
> >>>situations (do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> >>>failed , and __task_on_runqueue(prev) failed) which tells me I
> >>>don't fully understand all of the book-keeping that is needed.  Has
> >>>anyone thought about how to do this with either BVT or the new EDF
> >>>scheduler?
> > 
> > 
> > After some thought, domain_wake(), followed by
> > raise_softirq(SCHEDULE_SOFTIRQ) does what I want and removes the huge
> > mess I was making in __enter_scheduler().  
> 
> Are you waking up the domain that holds the lock? Then you would rely on
> the scheduler to give the woken domain a high "priority" (whatever this
> means for the current scheduler) and should start that domain
> immediately, right?

I noticed your comments in sched_sedf.c about domain waking.

* 3. Unconservative (i.e. incorrect)
*     -to boost the performance of I/O dependent domains it would be possible
*      to put the domain into the runnable queue immediately, and let it run
*      for the remainder of the slice of the current period
*      (or even worse: allocate a new full slice for the domain)
*     -either behaviour can lead to missed deadlines in other domains as
*      opposed to approaches 1,2a,2b

Giving the remainder of the current slice to the domain we are waking
*sounds* like what I wanted, but you are concerned that it causes missed
deadlines.  Could you elaborate on when we would have such a case?  If we are
only running in the remaining timeslice (which would expire before the
next deadline) then why would such behaviour lead to missing deadlines?


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
