From: George Dunlap <george.dunlap@citrix.com>
To: John Weekes <lists.xen@nuclearfallout.net>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Keir Fraser <keir@xen.org>
Subject: Re: Re: credit2 BUG_ON triggered
Date: Tue, 26 Apr 2011 18:02:46 +0100
Message-ID: <1303837366.1955.21010.camel@elijah>
In-Reply-To: <4DB6F6AF.7060909@nuclearfallout.net>
[-- Attachment #1: Type: text/plain, Size: 6372 bytes --]
OK -- it turns out on my box, I hit the bug as the machine boots, so
repro was very easy. :-) (Can you tell I haven't been working on
credit2 in a little while?)
Can you test the attached patch and see if it works for you?
-George
On Tue, 2011-04-26 at 17:45 +0100, John Weekes wrote:
> On 4/26/2011 4:01 AM, George Dunlap wrote:
> > John,
> >
> > Thanks for your help looking into this. Just to clarify, you're running
> > squeeze in HVM mode (not PV mode)?
>
> Yes, that's correct.
>
> -John
>
> > -George
> >
> > On Tue, 2011-04-19 at 21:58 +0100, John Weekes wrote:
> >> I am testing credit2 on a dual Xeon L5640 machine. I have an HVM Debian
> >> Squeeze domU that reliably leads to a panic when it's run with the
> >> credit2 scheduler, but not with credit.
> >>
> >> The reproduction steps on this machine are simple:
> >>
> >> 1. Fully boot up the machine.
> >> 2. Enter commands that cause dom0 to use 100% CPU. For instance:
> >>
> >> screen -AmdS burn1 perl -e 'while(1) {}'
> >> screen -AmdS burn2 perl -e 'while(1) {}'
> >> screen -AmdS burn3 perl -e 'while(1) {}'
> >> screen -AmdS burn4 perl -e 'while(1) {}'
> >>
> >> 3. Start up the prepared Squeeze domU (which is a stock install), with
> >> "xm" ("xl" doesn't work with debug=y because of a spurious assert, but
> >> has the same problem with debug=n):
> >>
> >> cd /servers/customers
> >> xm create testvds4.cfg
> >>
> >> The serial console then shows this:
> >>
> >> (XEN) irq.c:324: Dom1 callback via changed to Direct Vector 0xe9
> >> (XEN) Xen BUG at sched_credit2.c:1606
> >> (XEN) ----[ Xen-4.1.1-rc1-pre x86_64 debug=y Not tainted ]----
> >> (XEN) CPU: 12
> >> (XEN) RIP: e008:[<ffff82c48011a383>] csched_schedule+0xdb/0xab1
> >> (XEN) RFLAGS: 0000000000010082 CONTEXT: hypervisor
> >> (XEN) rax: ffff830c2246c000 rbx: ffff830c2246bd10 rcx: 0000000000000000
> >> (XEN) rdx: 0000000000000001 rsi: ffff82c480241680 rdi: ffff8300bf74c000
> >> (XEN) rbp: ffff83043b28fe38 rsp: ffff83043b28fd58 r8: 0000000000000002
> >> (XEN) r9: 000000000000003e r10: 0000000000000018 r11: 00000000000186a0
> >> (XEN) r12: 0000000000000000 r13: ffff83043ffe02d0 r14: 000000000000000c
> >> (XEN) r15: ffff83043ffe0010 cr0: 000000008005003b cr4: 00000000000026f0
> >> (XEN) cr3: 0000000c22436000 cr2: 0000000000000000
> >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> >> (XEN) Xen stack trace from rsp=ffff83043b28fd58:
> >> (XEN) ffff82c4801bc822 ffff83043b298040 0000000000000282 ffff83043b28fd88
> >> (XEN) ffff82c48012248f ffff8300bf74c000 ffff83043b28fdb8 ffff82c4801b59bd
> >> (XEN) 00000014c77c5137 ffff83043b28fe68 ffff82c480241680 0000000000000001
> >> (XEN) 00007cfbc4d70217 ffff82c48014b2c0 ffff83043b298060 ffff83043b28fde8
> >> (XEN) ffff82c480124345 ffff83043b298060 ffff83043b28fe38 0000000000000082
> >> (XEN) 00000000000186a0 0000000000000082 000000000000000c ffff8300bf74c000
> >> (XEN) 0000000000000000 ffff82c480241680 ffff83043b298040 ffff83043b298060
> >> (XEN) ffff83043b28feb8 ffff82c48012061c ffff83043b28feb8 00000014c77c5137
> >> (XEN) 0000000000000293 ffff8300bf74d868 ffff82c48012248f ffff8300bf74c000
> >> (XEN) ffff83043b28fe98 ffff82c4801b19d0 ffff8300bf74c000 ffff82c4802a8e80
> >> (XEN) 00000000ffffffff ffff82c4802a8880 ffff83043b28ff18 ffffffffffffffff
> >> (XEN) ffff83043b28fef8 ffff82c480121caf 0440080000000001 ffff8300bf74c000
> >> (XEN) 0000000000000046 ffff8800018501a0 ffffffff81311470 0000000000000092
> >> (XEN) ffff83043b28ff08 ffff82c480121d0c ffff88000184c600 ffff82c4801bb3f1
> >> (XEN) 0000000000000092 ffffffff81311470 ffff8800018501a0 0000000000000046
> >> (XEN) ffff88000184c600 0000000000000001 00000000000186a0 0000000000000008
> >> (XEN) 0000000000000200 0000000000000008 0000000000000000 0000000000000002
> >> (XEN) 0000000000000000 0000000000000002 0000000000000007 0000beef0000beef
> >> (XEN) ffffffff81009308 0000beef0000beef 0000000000000046 ffff880031e63e58
> >> (XEN) 000000000000beef 000000000000beef 000000000000beef 000000000000beef
> >> (XEN) Xen call trace:
> >> (XEN) [<ffff82c48011a383>] csched_schedule+0xdb/0xab1
> >> (XEN) [<ffff82c48012061c>] schedule+0x122/0x60c
> >> (XEN) [<ffff82c480121caf>] __do_softirq+0x8d/0x9e
> >> (XEN) [<ffff82c480121d0c>] do_softirq+0x4c/0x4e
> >> (XEN)
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 12:
> >> (XEN) Xen BUG at sched_credit2.c:1606
> >> (XEN) ****************************************
> >> (XEN)
> >> (XEN) Reboot in five seconds...
> >>
> >> Where do we go from here?
> >>
> >> -John
> >>
> >> For reference, xl info output:
> >>
> >> dallas-dodec226-5 ~ # xl info
> >> host : dallas-dodec226-5
> >> release : 2.6.32.37-gbe57219
> >> version : #1 SMP Tue Apr 19 00:14:46 CDT 2011
> >> machine : x86_64
> >> nr_cpus : 24
> >> nr_nodes : 2
> >> cores_per_socket : 6
> >> threads_per_core : 2
> >> cpu_mhz : 2266
> >> hw_caps : bfebfbff:2c100800:00000000:00003f40:009ee3fd:00000000:00000001:00000000
> >> virt_caps : hvm hvm_directio
> >> total_memory : 49143
> >> free_memory : 47106
> >> free_cpus : 0
> >> xen_major : 4
> >> xen_minor : 1
> >> xen_extra : .1-rc1-pre
> >> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
> >> xen_scheduler : credit2
> >> xen_pagesize : 4096
> >> platform_params : virt_start=0xffff800000000000
> >> xen_changeset : Thu Apr 07 15:26:58 2011 +0100 23025:dbf2ddf652dc
> >> xen_commandline : dom0_mem=1500M dom0_max_vcpus=4 iommu=dom0-passthrough sched=credit2 loglvl=all guest_loglvl=all com2=115200,8n1 console=com2
> >> cc_compiler : gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5)
> >> cc_compile_by : root
> >> cc_compile_domain : nuclearfallout.net
> >> cc_compile_date : Tue Apr 19 14:26:02 CDT 2011
> >> xend_config_format : 4
> >>
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
>
[-- Attachment #2: credit2-migrate-callback.diff --]
[-- Type: text/x-patch, Size: 3769 bytes --]
credit2: add a callback to migrate to a new cpu
In credit2, v->processor must always be a cpu belonging to
the runqueue to which the vcpu is assigned; much of the code
relies on this invariant. Allow credit2 to manage the actual
migration itself.
This fixes the most recent credit2 bug reported on the list
(Xen BUG at sched_credit2.c:1606) in Xen 4.1, as well as
the BUG at line 811 in -unstable.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
diff -r 80401982465d -r 475c6c9e6217 xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c Mon Apr 25 13:17:05 2011 +0100
+++ b/xen/common/sched_credit2.c Tue Apr 26 17:54:39 2011 +0100
@@ -1352,32 +1351,28 @@
static int
csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
{
- struct csched_vcpu * const svc = CSCHED_VCPU(vc);
int new_cpu;
- /* The scheduler interface doesn't have an explicit mechanism to
- * involve the choosable scheduler in the migrate process, so we
- * infer that a change may happen by the call to cpu_pick, and
- * remove it from the old runqueue while the lock for the old
- * runqueue is held. It can't be actively waiting to run. It
- * will be added to the new runqueue when it next wakes.
- *
- * If we want to be able to call pick() separately, we need to add
- * a mechansim to remove a vcpu from an old processor / runqueue
- * before releasing the lock. */
- BUG_ON(__vcpu_on_runq(svc));
-
new_cpu = choose_cpu(ops, vc);
- /* If we're suggesting moving to a different runqueue, remove it
- * from the old runqueue while we have the lock. It will be added
- * to the new one when it wakes. */
- if ( svc->rqd != NULL
- && RQD(ops, new_cpu) != svc->rqd )
- runq_deassign(ops, vc);
return new_cpu;
}
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc, int new_cpu)
+{
+ struct csched_vcpu * const svc = CSCHED_VCPU(vc);
+ struct csched_runqueue_data *trqd;
+
+ /* Check if new_cpu is valid */
+ BUG_ON(!cpu_isset(new_cpu, CSCHED_PRIV(ops)->initialized));
+
+ trqd = RQD(ops, new_cpu);
+
+ if ( trqd != svc->rqd )
+ migrate(ops, svc, trqd, NOW());
+}
+
static int
csched_dom_cntl(
const struct scheduler *ops,
@@ -2121,6 +2116,7 @@
.adjust = csched_dom_cntl,
.pick_cpu = csched_cpu_pick,
+ .migrate = csched_vcpu_migrate,
.do_schedule = csched_schedule,
.context_saved = csched_context_saved,
diff -r 80401982465d -r 475c6c9e6217 xen/common/schedule.c
--- a/xen/common/schedule.c Mon Apr 25 13:17:05 2011 +0100
+++ b/xen/common/schedule.c Tue Apr 26 17:54:39 2011 +0100
@@ -489,7 +489,11 @@
* Switch to new CPU, then unlock new and old CPU. This is safe because
* the lock pointer cant' change while the current lock is held.
*/
- v->processor = new_cpu;
+ if ( VCPU2OP(v)->migrate )
+ SCHED_OP(VCPU2OP(v), migrate, v, new_cpu);
+ else
+ v->processor = new_cpu;
+
if ( old_lock != new_lock )
spin_unlock(new_lock);
diff -r 80401982465d -r 475c6c9e6217 xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h Mon Apr 25 13:17:05 2011 +0100
+++ b/xen/include/xen/sched-if.h Tue Apr 26 17:54:39 2011 +0100
@@ -170,6 +170,7 @@
bool_t tasklet_work_scheduled);
int (*pick_cpu) (const struct scheduler *, struct vcpu *);
+ void (*migrate) (const struct scheduler *, struct vcpu *, int);
int (*adjust) (const struct scheduler *, struct domain *,
struct xen_domctl_scheduler_op *);
int (*adjust_global) (const struct scheduler *,
Thread overview: 17+ messages
2011-04-19 20:58 credit2 BUG_ON triggered John Weekes
2011-04-20 9:36 ` George Dunlap
2011-04-20 16:12 ` John Weekes
2011-04-20 16:51 ` George Dunlap
2011-04-20 17:33 ` John Weekes
2011-04-20 19:10 ` John Weekes
2011-04-20 19:29 ` John Weekes
2011-04-26 11:01 ` George Dunlap
2011-04-26 16:45 ` John Weekes
2011-04-26 17:02 ` George Dunlap [this message]
2011-04-26 20:15 ` John Weekes
2011-04-27 9:10 ` George Dunlap
2011-04-27 17:24 ` credit2 domU freeze at "Writing SMBIOS tables ..." John Weekes
2011-04-28 13:10 ` George Dunlap
2011-04-28 16:25 ` John Weekes
2011-04-28 16:35 ` George Dunlap
2011-04-28 19:56 ` John Weekes