* [PROPOSAL] Doing work in idle-vcpu context
@ 2010-04-16 18:05 Keir Fraser
  2010-04-19  5:00 ` Juergen Gross
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Keir Fraser @ 2010-04-16 18:05 UTC (permalink / raw)
  To: Jiang, Yunhong, George Dunlap, xen-devel@lists.xensource.com

George, Yunhong, and others,

So, it seems that running stop_machine_run(), and now
continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
Because the softirq can stop the currently-running vcpu from being
descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
we try to rendezvous all cpus in softirq context -- we can have CPU A enter
the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
never leaves it, and we have deadlock.
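
To make the cycle concrete, here is a minimal sketch of the two spinning
paths in illustrative pseudo-C (the helper names are made up for the
example, not the actual Xen code):

static atomic_t entered_cpus;   /* CPUs that have entered the rendezvous */

/* CPU A: rendezvous softirq, interrupting VCPU X.  X cannot be
 * descheduled until this handler returns. */
static void rendezvous_softirq_sketch(void)
{
    atomic_inc(&entered_cpus);
    while ( atomic_read(&entered_cpus) != num_online_cpus() )
        cpu_relax();                /* waits on CPU B... forever */
    /* ... perform the stop-machine action ... */
}

/* CPU B: VCPU Y, inside a hypercall, trying to pause VCPU X. */
static void pause_vcpu_sketch(struct vcpu *x)
{
    atomic_inc(&x->pause_count);
    while ( x->is_running )         /* X never stops running, since  */
        cpu_relax();                /* CPU A is stuck in its softirq */
}
/* CPU B spins and never reaches do_softirq(), so entered_cpus never
 * reaches num_online_cpus(): a two-CPU wait cycle, i.e. deadlock. */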

There are various possible solutions to this, but one of the architecturally
neatest would be to run the s_m_r() and c_h_o_c() work in a
'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
context. Rather than introducing the whole kthread concept into Xen, one
possibility would be to schedule this work on the idle vcpus -- effectively
promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
can include running the idle loop.
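
In outline, the idle loop would grow just one extra step, something like
this (a sketch only; do_queued_work() is a placeholder name):

void idle_loop(void)
{
    for ( ; ; )
    {
        if ( cpu_is_offline(smp_processor_id()) )
            play_dead();
        do_queued_work();       /* run pending s_m_r()/c_h_o_c() items */
        (*pm_idle)();           /* otherwise halt as usual */
        do_softirq();
    }
}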

One bit of mechanism this would require is the ability to bump the idle vcpu
priority up - preferably to 'max' priority forcing it to run next until we
return it to idle/lowest priority. George: how hard would such a mechanism
be to implement, do you think?

More generally: what do people think of this idea?

 Thanks,
 Keir


* Re: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-16 18:05 [PROPOSAL] Doing work in idle-vcpu context Keir Fraser
@ 2010-04-19  5:00 ` Juergen Gross
  2010-04-19  5:55 ` Jiang, Yunhong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Juergen Gross @ 2010-04-19  5:00 UTC (permalink / raw)
  To: Keir Fraser; +Cc: George Dunlap, Jiang, Yunhong, xen-devel@lists.xensource.com

Keir Fraser wrote:
> George, Yunhong, and others,
> 
> So, it seems that running stop_machine_run(), and now
> continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
> Because the softirq can stop the currently-running vcpu from being
> descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
> we try to rendezvous all cpus in softirq context -- we can have CPU A enter
> the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
> trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
> never leaves it, and we have deadlock.
> 
> There are various possible solutions to this, but one of the architecturally
> neatest would be to run the s_m_r() and c_h_o_c() work in a
> 'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
> context. Rather than introducing the whole kthread concept into Xen, one
> possibility would be to schedule this work on the idle vcpus -- effectively
> promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
> can include running the idle loop.
> 
> One bit of mechanism this would require is the ability to bump the idle vcpu
> priority up - preferably to 'max' priority forcing it to run next until we
> return it to idle/lowest priority. George: how hard would such a mechanism
> be to implement, do you think?
> 
> More generally: what do people think of this idea?

Sounds a little bit like my original proposal for the change of c_h_o_c():
Introduce a "hypervisor domain" for stuff like this.
I still think this would be cleaner. The hypervisor vcpus would run with
high priority and could block without raising major problems.


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html


* RE: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-16 18:05 [PROPOSAL] Doing work in idle-vcpu context Keir Fraser
  2010-04-19  5:00 ` Juergen Gross
@ 2010-04-19  5:55 ` Jiang, Yunhong
  2010-04-19  6:08   ` Dulloor
  2010-04-19  6:45   ` Keir Fraser
  2010-04-19  9:43 ` George Dunlap
  2010-04-19 10:52 ` Keir Fraser
  3 siblings, 2 replies; 10+ messages in thread
From: Jiang, Yunhong @ 2010-04-19  5:55 UTC (permalink / raw)
  To: Keir Fraser, George Dunlap, xen-devel@lists.xensource.com



>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Saturday, April 17, 2010 2:06 AM
>To: Jiang, Yunhong; George Dunlap; xen-devel@lists.xensource.com
>Subject: [PROPOSAL] Doing work in idle-vcpu context
>
>George, Yunhong, and others,
>
>So, it seems that running stop_machine_run(), and now
>continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
>Because the softirq can stop the currently-running vcpu from being
>descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
>we try to rendezvous all cpus in softirq context -- we can have CPU A enter
>the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
>trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
>never leaves it, and we have deadlock.
>
>There are various possible solutions to this, but one of the architecturally
>neatest would be to run the s_m_r() and c_h_o_c() work in a
>'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
>context. Rather than introducing the whole kthread concept into Xen, one
>possibility would be to schedule this work on the idle vcpus -- effectively
>promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
>can include running the idle loop.
>
>One bit of mechanism this would require is the ability to bump the idle vcpu
>priority up - preferably to 'max' priority forcing it to run next until we
>return it to idle/lowest priority. George: how hard would such a mechanism
>be to implement, do you think?
>
>More generally: what do people think of this idea?

The only concern from me is: are there any assumptions in other components that the idle vcpu is always for idle, and always the lowest priority?

--jyh

>
> Thanks,
> Keir
>


* Re: RE: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-19  5:55 ` Jiang, Yunhong
@ 2010-04-19  6:08   ` Dulloor
  2010-04-19  6:53     ` Keir Fraser
  2010-04-19  6:45   ` Keir Fraser
  1 sibling, 1 reply; 10+ messages in thread
From: Dulloor @ 2010-04-19  6:08 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: George Dunlap, xen-devel@lists.xensource.com, Keir Fraser


On Mon, Apr 19, 2010 at 1:55 AM, Jiang, Yunhong <yunhong.jiang@intel.com> wrote:
>
>
>>-----Original Message-----
>>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>Sent: Saturday, April 17, 2010 2:06 AM
>>To: Jiang, Yunhong; George Dunlap; xen-devel@lists.xensource.com
>>Subject: [PROPOSAL] Doing work in idle-vcpu context
>>
>>George, Yunhong, and others,
>>
>>So, it seems that running stop_machine_run(), and now
>>continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
>>Because the softirq can stop the currently-running vcpu from being
>>descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
>>we try to rendezvous all cpus in softirq context -- we can have CPU A enter
>>the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
>>trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
>>never leaves it, and we have deadlock.
>>
>>There are various possible solutions to this, but one of the architecturally
>>neatest would be to run the s_m_r() and c_h_o_c() work in a
>>'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
>>context. Rather than introducing the whole kthread concept into Xen, one
>>possibility would be to schedule this work on the idle vcpus -- effectively
>>promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
>>can include running the idle loop.
>>
>>One bit of mechanism this would require is the ability to bump the idle vcpu
>>priority up - preferably to 'max' priority forcing it to run next until we
>>return it to idle/lowest priority. George: how hard would such a mechanism
>>be to implement, do you think?
>>
>>More generally: what do people think of this idea?
>
> The only concern from me is: are there any assumptions in other components that the
> idle vcpu is always for idle, and always the lowest priority?

Using the idle_domain as a worker_domain sounds like a good idea. And
bumping the credit up doesn't seem too difficult. I have attached a
quickly whipped-up working patch (with a test driver) for this. Not many
scheduler changes. I have looked at all the other places that use
idle_vcpu and PRI_IDLE too, and they look fine to me.

Keir, is this similar to what you are looking for?

>
> --jyh
>
>>
>> Thanks,
>> Keir
>>

[-- Attachment #2: workqueue.patch --]
[-- Type: text/x-patch, Size: 6469 bytes --]

diff -r 7ee8bb40200a xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/arch/x86/domain.c	Mon Apr 19 02:01:06 2010 -0400
@@ -51,6 +51,7 @@
 #include <asm/nmi.h>
 #include <xen/numa.h>
 #include <xen/iommu.h>
+#include <xen/workqueue.h>
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
 #endif
@@ -116,12 +117,58 @@
     (*dead_idle)();
 }
 
+/* xen_work implementation */
+DEFINE_PER_CPU(struct xen_workqueue_struct, xen_workqueue);
+
+void init_xen_workqueues(void)
+{
+	int i;
+	for_each_possible_cpu(i)
+	{
+		struct xen_workqueue_struct *wq = &per_cpu(xen_workqueue, i);
+		INIT_LIST_HEAD(&wq->list);
+		spin_lock_init(&wq->lock);
+	}
+}
+
+int schedule_xen_work_on(int cpu, struct xen_work_struct *work)
+{
+	struct xen_workqueue_struct *wq = &per_cpu(xen_workqueue, cpu);
+
+	spin_lock(&wq->lock);
+	list_add_tail(&work->entry, &wq->list);
+	spin_unlock(&wq->lock);
+
+	return 0;
+}
+
+/* execute xen_work queued on xen_workqueue[cpu] 
+ * XXX: called only from idle_loop */
+static void process_xen_workqueue(void)
+{
+	struct xen_workqueue_struct *wq = &this_cpu(xen_workqueue);
+
+	spin_lock(&wq->lock);
+	while (!list_empty(&wq->list))
+	{
+		struct xen_work_struct *work = list_entry(wq->list.next,
+										struct xen_work_struct, entry);
+		xen_work_func_t func = work->func;
+		list_del_init(wq->list.next);
+		spin_unlock(&wq->lock);
+		func(work);
+		spin_lock(&wq->lock);
+	}
+	spin_unlock(&wq->lock);
+}
+
 void idle_loop(void)
 {
     for ( ; ; )
     {
         if ( cpu_is_offline(smp_processor_id()) )
             play_dead();
+		process_xen_workqueue();
         (*pm_idle)();
         do_softirq();
     }
diff -r 7ee8bb40200a xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/arch/x86/setup.c	Mon Apr 19 02:01:06 2010 -0400
@@ -21,6 +21,7 @@
 #include <xen/vga.h>
 #include <xen/dmi.h>
 #include <xen/nodemask.h>
+#include <xen/workqueue.h>
 #include <public/version.h>
 #ifdef CONFIG_COMPAT
 #include <compat/platform.h>
@@ -242,6 +243,7 @@
 {
     struct domain *idle_domain;
 
+	init_xen_workqueues();
     /* Domain creation requires that scheduler structures are initialised. */
     scheduler_init();
 
diff -r 7ee8bb40200a xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/common/sched_credit.c	Mon Apr 19 02:01:06 2010 -0400
@@ -22,6 +22,7 @@
 #include <asm/atomic.h>
 #include <xen/errno.h>
 #include <xen/keyhandler.h>
+#include <xen/workqueue.h>
 
 /*
  * CSCHED_STATS
@@ -1173,6 +1174,11 @@
         burn_credits(scurr, now);
         scurr->start_time -= now;
     }
+	else
+	{ /* idle vcpu - revert the priority, if needed */
+		if (scurr->pri != CSCHED_PRI_IDLE)
+			scurr->pri = CSCHED_PRI_IDLE;
+	}
 
     /*
      * Select next runnable local VCPU (ie top of local runq)
@@ -1182,7 +1188,15 @@
     else
         BUG_ON( is_idle_vcpu(current) || list_empty(runq) );
 
-    snext = __runq_elem(runq->next);
+	/* There is work to be done (in idle_vcpu context). Temporarily boost
+	 * the priority of idle_vcpu[cpu] and schedule it */
+	if (!xen_workqueue_empty(cpu))
+	{
+		snext = CSCHED_VCPU(idle_vcpu[cpu]);
+		snext->pri = CSCHED_PRI_TS_BOOST;
+	}
+	else
+    	snext = __runq_elem(runq->next);
 
     /*
      * SMP Load balance:
@@ -1217,7 +1231,7 @@
     /*
      * Return task to run next...
      */
-    ret.time = (is_idle_vcpu(snext->vcpu) ?
+    ret.time = ((is_idle_vcpu(snext->vcpu) && (snext->pri==CSCHED_PRI_IDLE)) ?
                 -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
     ret.task = snext->vcpu;
 
diff -r 7ee8bb40200a xen/common/schedule.c
--- a/xen/common/schedule.c	Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/common/schedule.c	Mon Apr 19 02:01:06 2010 -0400
@@ -31,6 +31,7 @@
 #include <xen/errno.h>
 #include <xen/guest_access.h>
 #include <xen/multicall.h>
+#include <xen/workqueue.h>
 #include <public/sched.h>
 #include <xsm/xsm.h>
 
@@ -955,12 +956,35 @@
     SCHED_OP(init);
 }
 
+static struct xen_work_struct dummy_work[NR_CPUS];
+static void dummy_xen_work_func(struct xen_work_struct *work)
+{
+    unsigned int cpu = smp_processor_id();
+	s_time_t then = (s_time_t)work->data;
+	s_time_t now = NOW();
+	printk("XEN_WORK_CPU[%02d] : %lu -> %lu (%lu)\n", cpu, 
+			(unsigned long)then, (unsigned long)now, (unsigned long)(now-then));
+	return;
+}
+
+static void schedule_xen_work_on_all(void)
+{
+	int i;
+	for_each_online_cpu(i)
+	{
+		INIT_XEN_WORK(&dummy_work[i], dummy_xen_work_func, NOW());
+		schedule_xen_work_on(i, &dummy_work[i]);
+	}
+}
+
 void dump_runq(unsigned char key)
 {
     s_time_t      now = NOW();
     int           i;
     unsigned long flags;
 
+	schedule_xen_work_on_all();
+
     local_irq_save(flags);
 
     printk("Scheduler: %s (%s)\n", ops.name, ops.opt_name);
diff -r 7ee8bb40200a xen/include/xen/workqueue.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/xen/workqueue.h	Mon Apr 19 02:01:06 2010 -0400
@@ -0,0 +1,56 @@
+#ifndef __XEN_WORKQUEUE_H
+#define __XEN_WORKQUEUE_H
+
+#include <xen/list.h>
+#include <xen/spinlock.h>
+#include <xen/config.h>
+
+/* xen_workqueue is a per-cpu workqueue through which we can assign work
+ * to a cpu using xen_work_struct. The work actually gets done in the context
+ * of idle_domain's vcpu (in idle_loop). */
+struct xen_work_struct;
+typedef void(*xen_work_func_t)(struct xen_work_struct *work);
+struct xen_work_struct
+{
+	struct list_head entry;
+	uint64_t data;
+	xen_work_func_t func;
+};
+
+
+#define DECLARE_XEN_WORK(_n) 								\
+	struct xen_work_struct _n = {							\
+					.entry = {&(_n).entry, &(_n).entry},	\
+					.data = 0, 								\
+					.func = 0,								\
+					}
+
+#define INIT_XEN_WORK(_w, _f, _d)		\
+	do{									\
+		INIT_LIST_HEAD(&(_w)->entry);	\
+		(_w)->data = (_d);				\
+		(_w)->func = (_f);				\
+	}while(0)
+
+extern int schedule_xen_work_on(int cpu, struct xen_work_struct *work);
+
+struct xen_workqueue_struct {
+	struct list_head list;
+	spinlock_t lock;
+};
+DECLARE_PER_CPU(struct xen_workqueue_struct, xen_workqueue);
+extern void init_xen_workqueues(void);
+
+/* scheduler must first check for any pending work in xen_workqueue */
+static inline int xen_workqueue_empty(int cpu)
+{
+	int ret;
+	//unsigned long flags;
+	struct xen_workqueue_struct *wq = &per_cpu(xen_workqueue, cpu);
+	//spin_lock_irqsave(&wq->lock, flags);
+	ret = list_empty(&wq->list);
+	//spin_unlock_irqrestore(&wq->lock, flags);
+	return ret;
+}
+
+#endif


* Re: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-19  5:55 ` Jiang, Yunhong
  2010-04-19  6:08   ` Dulloor
@ 2010-04-19  6:45   ` Keir Fraser
  1 sibling, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2010-04-19  6:45 UTC (permalink / raw)
  To: Jiang, Yunhong, George Dunlap, xen-devel@lists.xensource.com

On 19/04/2010 06:55, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> One bit of mechanism this would require is the ability to bump the idle vcpu
>> priority up - preferably to 'max' priority forcing it to run next until we
>> return it to idle/lowest priority. George: how hard would such a mechanism
>> be to implement, do you think?
>> 
>> More generally: what do people think of this idea?
> 
> The only concern from me is: are there any assumptions in other components that
> the idle vcpu is always for idle, and always the lowest priority?

I suppose we would find out. I don't think so, except of course it is built
into the scheduler that it is lowest priority.

 -- Keir


* Re: RE: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-19  6:08   ` Dulloor
@ 2010-04-19  6:53     ` Keir Fraser
  0 siblings, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2010-04-19  6:53 UTC (permalink / raw)
  To: Dulloor, Jiang, Yunhong; +Cc: George Dunlap, xen-devel@lists.xensource.com

On 19/04/2010 07:08, "Dulloor" <dulloor@gmail.com> wrote:

> Using the idle_domain as a worker_domain sounds like a good idea. And
> bumping the credit up doesn't seem too difficult. I have attached a
> quickly whipped-up working patch (with a test driver) for this. Not many
> scheduler changes. I have looked at all the other places that use
> idle_vcpu and PRI_IDLE too, and they look fine to me.
> 
> Keir, is this similar to what you are looking for?

Yes indeed, something similar to that. It's the scheduler changes I can't be
sure about, and of course we really need SEDF and credit2 changes as well,
hence I'd also like some feedback from George. Letting the scheduler peek
straight at per-cpu workqueues to check for work to do is a good idea,
though. Easier than adding more scheduler hooks, I think.
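
That is, roughly the small check from Dulloor's patch above, at the top of
the scheduler's pick-next path (a sketch of the idea, not committed code):

/* If this CPU has pending per-cpu work, hand the CPU to the (boosted)
 * idle vcpu instead of consulting the runqueue. */
if ( !xen_workqueue_empty(cpu) )
{
    snext = CSCHED_VCPU(idle_vcpu[cpu]);
    snext->pri = CSCHED_PRI_TS_BOOST;
}
else
    snext = __runq_elem(runq->next);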

 Thanks,
 Keir


* Re: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-16 18:05 [PROPOSAL] Doing work in idle-vcpu context Keir Fraser
  2010-04-19  5:00 ` Juergen Gross
  2010-04-19  5:55 ` Jiang, Yunhong
@ 2010-04-19  9:43 ` George Dunlap
  2010-04-19 10:52 ` Keir Fraser
  3 siblings, 0 replies; 10+ messages in thread
From: George Dunlap @ 2010-04-19  9:43 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Jiang, Yunhong, xen-devel@lists.xensource.com

Keir Fraser wrote:
> One bit of mechanism this would require is the ability to bump the idle vcpu
> priority up - preferably to 'max' priority forcing it to run next until we
> return it to idle/lowest priority. George: how hard would such a mechanism
> be to implement, do you think?
>
> More generally: what do people think of this idea?
>   
I think it's a pretty good idea.  Obviously having the idle thread be able
to switch to "schedule me NOW" priority would involve some special-casing
(esp. with shared runqueues), but it shouldn't be too bad.

 -George


* Re: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-16 18:05 [PROPOSAL] Doing work in idle-vcpu context Keir Fraser
                   ` (2 preceding siblings ...)
  2010-04-19  9:43 ` George Dunlap
@ 2010-04-19 10:52 ` Keir Fraser
  2010-04-20  6:50   ` Dulloor
  3 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2010-04-19 10:52 UTC (permalink / raw)
  To: Jiang, Yunhong, George Dunlap, xen-devel@lists.xensource.com; +Cc: Dulloor

So I've now implemented this at the tip of the xen-unstable staging tree. Except
that I retasked the concept of 'tasklets' to implement this, rather than
introducing a whole new abstraction like Linux workqueues.

Thanks to Dulloor for initial changes to the credit scheduler. I should have
acknowledged you in the changeset comment too: sorry about that. :-(

George: let me know if the scheduler changes in c/s 21197 look okay.

 Thanks,
 Keir

On 16/04/2010 19:05, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> George, Yunhong, and others,
> 
> So, it seems that running stop_machine_run(), and now
> continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
> Because the softirq can stop the currently-running vcpu from being
> descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
> we try to rendezvous all cpus in softirq context -- we can have CPU A enter
> the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
> trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
> never leaves it, and we have deadlock.
> 
> There are various possible solutions to this, but one of the architecturally
> neatest would be to run the s_m_r() and c_h_o_c() work in a
> 'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
> context. Rather than introducing the whole kthread concept into Xen, one
> possibility would be to schedule this work on the idle vcpus -- effectively
> promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
> can include running the idle loop.
> 
> One bit of mechanism this would require is the ability to bump the idle vcpu
> priority up - preferably to 'max' priority forcing it to run next until we
> return it to idle/lowest priority. George: how hard would such a mechanism
> be to implement, do you think?
> 
> More generally: what do people think of this idea?
> 
>  Thanks,
>  Keir
> 


* Re: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-19 10:52 ` Keir Fraser
@ 2010-04-20  6:50   ` Dulloor
  2010-04-20 12:47     ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Dulloor @ 2010-04-20  6:50 UTC (permalink / raw)
  To: Keir Fraser; +Cc: George Dunlap, Jiang, Yunhong, xen-devel@lists.xensource.com

On Mon, Apr 19, 2010 at 6:52 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> So I've now implemented this at the tip of the xen-unstable staging tree. Except
> that I retasked the concept of 'tasklets' to implement this, rather than
> introducing a whole new abstraction like Linux workqueues.
Yeah, this looks better.
>
> Thanks to Dulloor for initial changes to the credit scheduler. I should have
> acknowledged you in the changeset comment too: sorry about that. :-(
No problem :)

>
> George: let me know if the scheduler changes in c/s 21197 look okay.

George might be able to comment better, but two things:
1. Should we not set ret.time to some timeslice (rather than -1)
when we BOOST the idle_vcpu (for csched and csched2)?
2. Is it fine to use a simple list_empty to check whether the
tasklet_queue is empty for a cpu, with other cpus possibly accessing
the list too?

>
>  Thanks,
>  Keir
>
> On 16/04/2010 19:05, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>> George, Yunhong, and others,
>>
>> So, it seems that running stop_machine_run(), and now
>> continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
>> Because the softirq can stop the currently-running vcpu from being
>> descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
>> we try to rendezvous all cpus in softirq context -- we can have CPU A enter
>> the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
>> trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
>> never leaves it, and we have deadlock.
>>
>> There are various possible solutions to this, but one of the architecturally
>> neatest would be to run the s_m_r() and c_h_o_c() work in a
>> 'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
>> context. Rather than introducing the whole kthread concept into Xen, one
>> possibility would be to schedule this work on the idle vcpus -- effectively
>> promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
>> can include running the idle loop.
>>
>> One bit of mechanism this would require is the ability to bump the idle vcpu
>> priority up - preferably to 'max' priority forcing it to run next until we
>> return it to idle/lowest priority. George: how hard would such a mechanism
>> be to implement, do you think?
>>
>> More generally: what do people think of this idea?
>>
>>  Thanks,
>>  Keir
>>
>
>
>


* Re: [PROPOSAL] Doing work in idle-vcpu context
  2010-04-20  6:50   ` Dulloor
@ 2010-04-20 12:47     ` Keir Fraser
  0 siblings, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2010-04-20 12:47 UTC (permalink / raw)
  To: Dulloor; +Cc: George Dunlap, Jiang, Yunhong, xen-devel@lists.xensource.com

On 20/04/2010 07:50, "Dulloor" <dulloor@gmail.com> wrote:

>> George: let me know if the scheduler changes in c/s 21197 look okay.
> 
> George might be able to comment better, but two things:
> 1. Should we not set ret.time to some timeslice (rather than -1)
> when we BOOST the idle_vcpu (for csched and csched2)?

I purposely skipped that bit of your patch because the idle vcpu will not be
descheduled until it voluntarily re-enters the scheduler with no tasklet work
left to do, at which point it becomes unboosted. The time-slice mechanism is
completely redundant in this scenario, so we may as well leave it turned off.
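
In other words, the relevant line can stay as simple as it is in the
unmodified credit scheduler (sketch):

/* No timeslice for the idle vcpu, boosted or not: it re-enters the
 * scheduler by itself once the queued tasklet work is done. */
ret.time = (is_idle_vcpu(snext->vcpu) ?
            -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));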

I would have done that in the other two schedulers too, but they still
appear to like to set a real timeout for the idle vcpu. I don't know why,
but George will be able to answer for credit2 at least!

> 2. Is it fine to use a simple list_empty to check whether the
> tasklet_queue is empty for a cpu, with other cpus possibly accessing
> the list too?

The scheduler will always be run through in its entirety at some time after
any change is made to the tasklet_list (because we always raise
SCHEDULE_SOFTIRQ immediately after any such change). As long as the tasklet
is not removed meanwhile, it is *guaranteed* that list_empty() will return
false on that subsequent scheduler run-through.
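
A sketch of why that holds (illustrative only; the real tasklet.c differs
in detail):

void tasklet_schedule_on_cpu(struct tasklet *t, unsigned int cpu)
{
    unsigned long flags;

    spin_lock_irqsave(&tasklet_lock, flags);
    list_add_tail(&t->list, &per_cpu(tasklet_list, cpu));
    spin_unlock_irqrestore(&tasklet_lock, flags);

    /* Force a full scheduler run-through after the list change; that
     * run-through is then guaranteed to see a non-empty list. */
    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
}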

Also note that no matter how clever we make the tasklet_queue_empty()
function, the caller itself is not synchronised to the tasklet subsystem, so
its return value can be stale before the caller even looks at it, no matter
what syncronisation we do before return. Therefore we have to accept that
the scheduler can act on stale/bad information, and simply ensure that in
such cases the scheduler will get run through again very soon after. Hence
the proliferation of [cpu_]raise_softirq() calls in tasklet.c!

 -- Keir


