From: Dulloor <dulloor@gmail.com>
To: "Jiang, Yunhong" <yunhong.jiang@intel.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Keir Fraser <keir.fraser@eu.citrix.com>
Subject: Re: RE: [PROPOSAL] Doing work in idle-vcpu context
Date: Mon, 19 Apr 2010 02:08:10 -0400 [thread overview]
Message-ID: <z2t940bcfd21004182308ra988bdjf4776b0a32fc5dcc@mail.gmail.com> (raw)
In-Reply-To: <789F9655DD1B8F43B48D77C5D30659731D797DC3@shsmsx501.ccr.corp.intel.com>
[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]
On Mon, Apr 19, 2010 at 1:55 AM, Jiang, Yunhong <yunhong.jiang@intel.com> wrote:
>
>
>>-----Original Message-----
>>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>Sent: Saturday, April 17, 2010 2:06 AM
>>To: Jiang, Yunhong; George Dunlap; xen-devel@lists.xensource.com
>>Subject: [PROPOSAL] Doing work in idle-vcpu context
>>
>>George, Yunhong, and others,
>>
>>So, it seems that running stop_machine_run(), and now
>>continue_hypercall_on_cpu(), in softirq context is a bit of a problem.
>>Because the softirq can stop the currently-running vcpu from being
>>descheduled, we can end up with subtle deadlocks. For example, with s_m_r()
>>we try to rendezvous all cpus in softirq context -- we can have CPU A enter
>>the softirq interrupting VCPU X, meanwhile VCPU Y on CPU B is spinning
>>trying to pause VCPU X. Hence CPU B doesn't get into softirq, and so CPU A
>>never leaves it, and we have deadlock.
>>
>>There are various possible solutions to this, but one of the architecturally
>>neatest would be to run the s_m_r() and c_h_o_c() work in a
>>'Linux-workqueue' type of environment -- i.e., in a proper non-guest vcpu
>>context. Rather than introducing the whole kthread concept into Xen, one
>>possibility would be to schedule this work on the idle vcpus -- effectively
>>promoting idle vcpus to a more general kind of 'Xen worker vcpu' whose job
>>can include running the idle loop.
>>
>>One bit of mechanism this would require is the ability to bump the idle vcpu
>>priority up - preferably to 'max' priority forcing it to run next until we
>>return it to idle/lowest priority. George: how hard would such a mechanism
>>be to implement, do you think?
>>
>>More generally: what do people think of this idea?
>
> The only concern from me is: are there any assumptions in other components that the
> idle vcpu is always used only for idling, and is always the lowest priority?
Using the idle_domain as a worker_domain sounds like a good idea, and bumping
the priority up in the credit scheduler doesn't seem too difficult. I have
attached a quickly whipped-up working patch (with a test driver) for this.
It needs only a few scheduler changes, and I have looked at all the other
places that use idle_vcpu and PRI_IDLE too; they look fine to me.
Keir, is this similar to what you are looking for?
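For reference, queueing work with this interface would look roughly like the
following (a sketch only; do_deferred_work, queue_deferred_work and the payload
value are made-up names for illustration, not part of the patch):

/* Rough usage sketch against the attached patch's interface. */
#include <xen/workqueue.h>

static struct xen_work_struct deferred_work;

static void do_deferred_work(struct xen_work_struct *work)
{
    /* Runs on the target cpu in idle-vcpu context (called from idle_loop()
     * via process_xen_workqueue()), not in softirq context on top of a
     * guest vcpu. */
    uint64_t payload = work->data;
    (void)payload; /* ... perform the deferred operation here ... */
}

void queue_deferred_work(int cpu)
{
    INIT_XEN_WORK(&deferred_work, do_deferred_work, 0 /* payload */);
    schedule_xen_work_on(cpu, &deferred_work);
    /* sched_credit sees the non-empty per-cpu workqueue, boosts
     * idle_vcpu[cpu] to CSCHED_PRI_TS_BOOST, and the work runs the next
     * time that cpu schedules. */
}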
>
> --jyh
>
>>
>> Thanks,
>> Keir
>>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
[-- Attachment #2: workqueue.patch --]
[-- Type: text/x-patch, Size: 6469 bytes --]
diff -r 7ee8bb40200a xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/arch/x86/domain.c Mon Apr 19 02:01:06 2010 -0400
@@ -51,6 +51,7 @@
#include <asm/nmi.h>
#include <xen/numa.h>
#include <xen/iommu.h>
+#include <xen/workqueue.h>
#ifdef CONFIG_COMPAT
#include <compat/vcpu.h>
#endif
@@ -116,12 +117,58 @@
(*dead_idle)();
}
+/* xen_work implementation */
+DEFINE_PER_CPU(struct xen_workqueue_struct, xen_workqueue);
+
+void init_xen_workqueues(void)
+{
+ int i;
+ for_each_possible_cpu(i)
+ {
+ struct xen_workqueue_struct *wq = &per_cpu(xen_workqueue, i);
+ INIT_LIST_HEAD(&wq->list);
+ spin_lock_init(&wq->lock);
+ }
+}
+
+int schedule_xen_work_on(int cpu, struct xen_work_struct *work)
+{
+ struct xen_workqueue_struct *wq = &per_cpu(xen_workqueue, cpu);
+
+ spin_lock(&wq->lock);
+ list_add_tail(&work->entry, &wq->list);
+ spin_unlock(&wq->lock);
+
+ return 0;
+}
+
+/* execute xen_work queued on xen_workqueue[cpu]
+ * XXX: called only from idle_loop */
+static void process_xen_workqueue(void)
+{
+ struct xen_workqueue_struct *wq = &this_cpu(xen_workqueue);
+
+ spin_lock(&wq->lock);
+ while (!list_empty(&wq->list))
+ {
+ struct xen_work_struct *work = list_entry(wq->list.next,
+ struct xen_work_struct, entry);
+ xen_work_func_t func = work->func;
+ list_del_init(wq->list.next);
+ spin_unlock(&wq->lock);
+ func(work);
+ spin_lock(&wq->lock);
+ }
+ spin_unlock(&wq->lock);
+}
+
void idle_loop(void)
{
for ( ; ; )
{
if ( cpu_is_offline(smp_processor_id()) )
play_dead();
+ process_xen_workqueue();
(*pm_idle)();
do_softirq();
}
diff -r 7ee8bb40200a xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/arch/x86/setup.c Mon Apr 19 02:01:06 2010 -0400
@@ -21,6 +21,7 @@
#include <xen/vga.h>
#include <xen/dmi.h>
#include <xen/nodemask.h>
+#include <xen/workqueue.h>
#include <public/version.h>
#ifdef CONFIG_COMPAT
#include <compat/platform.h>
@@ -242,6 +243,7 @@
{
struct domain *idle_domain;
+ init_xen_workqueues();
/* Domain creation requires that scheduler structures are initialised. */
scheduler_init();
diff -r 7ee8bb40200a xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/common/sched_credit.c Mon Apr 19 02:01:06 2010 -0400
@@ -22,6 +22,7 @@
#include <asm/atomic.h>
#include <xen/errno.h>
#include <xen/keyhandler.h>
+#include <xen/workqueue.h>
/*
* CSCHED_STATS
@@ -1173,6 +1174,11 @@
burn_credits(scurr, now);
scurr->start_time -= now;
}
+ else
+ { /* idle vcpu - revert the priority, if needed */
+ if (scurr->pri != CSCHED_PRI_IDLE)
+ scurr->pri = CSCHED_PRI_IDLE;
+ }
/*
* Select next runnable local VCPU (ie top of local runq)
@@ -1182,7 +1188,15 @@
else
BUG_ON( is_idle_vcpu(current) || list_empty(runq) );
- snext = __runq_elem(runq->next);
+ /* There is work to be done (in idle_vcpu context). Temporarily boost
+ * the priority of idle_vcpu[cpu] and schedule it */
+ if (!xen_workqueue_empty(cpu))
+ {
+ snext = CSCHED_VCPU(idle_vcpu[cpu]);
+ snext->pri = CSCHED_PRI_TS_BOOST;
+ }
+ else
+ snext = __runq_elem(runq->next);
/*
* SMP Load balance:
@@ -1217,7 +1231,7 @@
/*
* Return task to run next...
*/
- ret.time = (is_idle_vcpu(snext->vcpu) ?
+ ret.time = ((is_idle_vcpu(snext->vcpu) && (snext->pri==CSCHED_PRI_IDLE)) ?
-1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
ret.task = snext->vcpu;
diff -r 7ee8bb40200a xen/common/schedule.c
--- a/xen/common/schedule.c Thu Apr 15 19:11:16 2010 +0100
+++ b/xen/common/schedule.c Mon Apr 19 02:01:06 2010 -0400
@@ -31,6 +31,7 @@
#include <xen/errno.h>
#include <xen/guest_access.h>
#include <xen/multicall.h>
+#include <xen/workqueue.h>
#include <public/sched.h>
#include <xsm/xsm.h>
@@ -955,12 +956,35 @@
SCHED_OP(init);
}
+static struct xen_work_struct dummy_work[NR_CPUS];
+static void dummy_xen_work_func(struct xen_work_struct *work)
+{
+ unsigned int cpu = smp_processor_id();
+ s_time_t then = (s_time_t)work->data;
+ s_time_t now = NOW();
+ printk("XEN_WORK_CPU[%02d] : %lu -> %lu (%lu)\n", cpu,
+ (unsigned long)then, (unsigned long)now, (unsigned long)(now-then));
+ return;
+}
+
+static void schedule_xen_work_on_all(void)
+{
+ int i;
+ for_each_online_cpu(i)
+ {
+ INIT_XEN_WORK(&dummy_work[i], dummy_xen_work_func, NOW());
+ schedule_xen_work_on(i, &dummy_work[i]);
+ }
+}
+
void dump_runq(unsigned char key)
{
s_time_t now = NOW();
int i;
unsigned long flags;
+ schedule_xen_work_on_all();
+
local_irq_save(flags);
printk("Scheduler: %s (%s)\n", ops.name, ops.opt_name);
diff -r 7ee8bb40200a xen/include/xen/workqueue.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/xen/workqueue.h Mon Apr 19 02:01:06 2010 -0400
@@ -0,0 +1,56 @@
+#ifndef __XEN_WORKQUEUE_H
+#define __XEN_WORKQUEUE_H
+
+#include <xen/list.h>
+#include <xen/spinlock.h>
+#include <xen/config.h>
+
+/* xen_workqueue is a per-cpu workqueue through which we can assign work
+ * to a cpu using xen_work_struct. The work actually gets done in the context
+ * of idle_domain's vcpu (in idle_loop). */
+struct xen_work_struct;
+typedef void(*xen_work_func_t)(struct xen_work_struct *work);
+struct xen_work_struct
+{
+ struct list_head entry;
+ uint64_t data;
+ xen_work_func_t func;
+};
+
+
+#define DECLARE_XEN_WORK(_n) \
+ struct xen_work_struct _n = { \
+ .entry = {&(_n).entry, &(_n).entry}, \
+ .data = 0, \
+ .func = 0, \
+ }
+
+#define INIT_XEN_WORK(_w, _f, _d) \
+ do{ \
+ INIT_LIST_HEAD(&(_w)->entry); \
+ (_w)->data = (_d); \
+ (_w)->func = (_f); \
+ }while(0)
+
+extern int schedule_xen_work_on(int cpu, struct xen_work_struct *work);
+
+struct xen_workqueue_struct {
+ struct list_head list;
+ spinlock_t lock;
+};
+DECLARE_PER_CPU(struct xen_workqueue_struct, xen_workqueue);
+extern void init_xen_workqueues(void);
+
+/* scheduler must first check for any pending work in xen_workqueue */
+static inline int xen_workqueue_empty(int cpu)
+{
+ int ret;
+ //unsigned long flags;
+ struct xen_workqueue_struct *wq = &per_cpu(xen_workqueue, cpu);
+ //spin_lock_irqsave(&wq->lock, flags);
+ ret = list_empty(&wq->list);
+ //spin_unlock_irqrestore(&wq->lock, flags);
+ return ret;
+}
+
+#endif
Thread overview: 10+ messages
2010-04-16 18:05 [PROPOSAL] Doing work in idle-vcpu context Keir Fraser
2010-04-19 5:00 ` Juergen Gross
2010-04-19 5:55 ` Jiang, Yunhong
2010-04-19 6:08 ` Dulloor [this message]
2010-04-19 6:53 ` Keir Fraser
2010-04-19 6:45 ` Keir Fraser
2010-04-19 9:43 ` George Dunlap
2010-04-19 10:52 ` Keir Fraser
2010-04-20 6:50 ` Dulloor
2010-04-20 12:47 ` Keir Fraser