* external module sched_in event
@ 2007-12-21 17:40 Andrea Arcangeli
[not found] ` <20071221174048.GB1292-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2007-12-21 17:40 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Hello,
[ I already sent it once as andrea-l3A5Bk7waGM@public.gmane.org but it didn't go through
for whatever reason, trying again from private email, hope there
won't be dups ]
My worst longstanding problem with KVM is that as the uptime of my
host system increased, my opensuse guest images started to destabilize
and lockup at boot. The weird thing was that fresh after boot
everything was always perfectly ok, so I thought it was rmmod/insmod
or some other sticky effect on the CPU after restarting the guest a
few times that triggered the crash. Furthermore if I loaded the cpu a
lot (like with a while :; do true;done), the crash would magically
disappear. Decreasing cpu frequency and timings didn't help. Debugging
wasn't trivial because it required a certain uptime and it didn't
always crash.
Once I debugged this more aggressively, I figured out that KVM was
ok; it was the guest that crashed in the tsc clocksource, because the
tsc wasn't monotone. The guest was looping in an infinite loop with
irqs disabled. So I tried to pass "notsc" and that fixed the crash
just fine.
Initially I thought it was the tsc_offset logic being wrong, but then
I figured out that vcpu_put/load wasn't always executed. This bugcheck
triggers with current git, so I recommend applying it to kvm.git to
avoid similar nasty hard-to-detect bugs in the future (Avi says vmx
would crash hard in such a condition; svm is much simpler and somewhat
survives the lack of sched_in, only crashing the guest due to the
non-monotone tsc):
Signed-off-by: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac876ec..26372fa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -742,6 +742,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ WARN_ON(vcpu->cpu != smp_processor_id());
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
}
So, trying to understand why the ->cpu was wrong, I looked into the
preempt notifiers emulation, and it looked quite fragile without a
real sched_in hook. I figured out I could provide a real sched_in hook
by loading the proper values into tsk->thread.debugreg[0/7]. Initially
I got the hooking points out of objdump -d vmlinux, but Avi preferred
no dependency on the vmlinux and suggested trying to find the sched_in
hooking point in the stack. So that's what I implemented now, and this
should provide real robustness to the out-of-tree module compiled
against binary kernel images with CONFIG_KVM=n. I tried to be
compatible with all kernels down to 2.6.5, but only a 2.6.2x host is
tested, and only on 64bit and only on SVM (no vmx system around here
at all).
This fixes my longstanding KVM instability and "-smp 2" now works
flawlessly with svm too! -smp 2 -snapshot crashes in qemu userland,
but that's not kernel related; it must be some thread mutex lock
recursion or lock inversion in the qcow cow code. Removing -snapshot
makes -smp 2 stable. Multiple guests, UP and SMP, seem stable too.
To reproduce my crash easily, without waiting ages for the two tscs to
deviate by an error larger than the number of cycles a CPU migration
takes, run write_tsc(0,0) in kernel mode (for example in the svm.c
init function: insmod kvm-amd; rmmod kvm-amd; then remove the
write_tsc and recompile kvm-amd).
#include <stdio.h>

int main(void)
{
	unsigned long x1, x2;
	unsigned int a, d;

	asm volatile("rdtsc" : "=a" (a), "=d" (d));
	x1 = ((unsigned long)d << 32) | a;
	for (;;) {
		asm volatile("rdtsc" : "=a" (a), "=d" (d));
		x2 = ((unsigned long)d << 32) | a;
		if (x2 < x1)
			printf("error %lu\n", x1 - x2);
		else
			printf("good %lu\n", x2 - x1);
		x1 = x2;
	}
}
(the "good.." printf can be commented out if you run this on the host,
but it better stay if you run this in the guest because it helps
rescheduling X with SDL that increases the frequency of the
CPU-switches of the kvm task)
In short, with the below fix applied, after a write_tsc(0,0) the UP
guest never returns any error anymore. Previously it would return
frequent errors, because sched_in wasn't properly invoked by svm.c,
and it would crash at boot every single time after a write_tsc(0,0).
The SMP guest of course still returns TSC errors, but that's ok: the
SMP host also returns TSC errors. It's only the UP guest that is
forbidden to have a non-monotone TSC, or it crashes like it happened
to me.
I'm unsure if special_reload_dr7 is needed at all, but it certainly
can't hurt, so it's the only hack I left.
Finally I can enjoy KVM stability too ;). If you always compiled your
host kernel with CONFIG_KVM=y on recent kernels that include the
preempt notifiers, you could never run into this. If you compile your
host kernel with CONFIG_KVM=n, please try to test this.
Signed-off-by: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>
diff --git a/kernel/hack-module.awk b/kernel/hack-module.awk
index 7993aa2..5187c96 100644
--- a/kernel/hack-module.awk
+++ b/kernel/hack-module.awk
@@ -24,32 +24,6 @@
printf("MODULE_INFO(version, \"%s\");\n", version)
}
-/^static unsigned long vmcs_readl/ {
- in_vmcs_read = 1
-}
-
-/ASM_VMX_VMREAD_RDX_RAX/ && in_vmcs_read {
- printf("\tstart_special_insn();\n")
-}
-
-/return/ && in_vmcs_read {
- printf("\tend_special_insn();\n");
- in_vmcs_read = 0
-}
-
-/^static void vmcs_writel/ {
- in_vmcs_write = 1
-}
-
-/ASM_VMX_VMWRITE_RAX_RDX/ && in_vmcs_write {
- printf("\tstart_special_insn();\n")
-}
-
-/if/ && in_vmcs_write {
- printf("\tend_special_insn();\n");
- in_vmcs_write = 0
-}
-
/^static void vmx_load_host_state/ {
vmx_load_host_state = 1
}
@@ -74,15 +48,6 @@
print "\tspecial_reload_dr7();"
}
-/static void vcpu_put|static int __vcpu_run|static struct kvm_vcpu \*vmx_create_vcpu/ {
- in_tricky_func = 1
-}
-
-/preempt_disable|get_cpu/ && in_tricky_func {
- printf("\tin_special_section();\n");
- in_tricky_func = 0
-}
-
/unsigned long flags;/ && vmx_load_host_state {
print "\tunsigned long gsbase;"
}
@@ -90,4 +55,3 @@
/local_irq_save/ && vmx_load_host_state {
print "\t\tgsbase = vmcs_readl(HOST_GS_BASE);"
}
-
diff --git a/kernel/preempt.c b/kernel/preempt.c
index 8bb0405..6e57277 100644
--- a/kernel/preempt.c
+++ b/kernel/preempt.c
@@ -6,8 +6,6 @@
static DEFINE_SPINLOCK(pn_lock);
static LIST_HEAD(pn_list);
-static DEFINE_PER_CPU(int, notifier_enabled);
-static DEFINE_PER_CPU(struct task_struct *, last_tsk);
#define dprintk(fmt) do { \
if (0) \
@@ -15,59 +13,88 @@ static DEFINE_PER_CPU(struct task_struct *, last_tsk);
current->pid, raw_smp_processor_id()); \
} while (0)
-static void preempt_enable_notifiers(void)
+static void preempt_enable_sched_out_notifiers(void)
{
- int cpu = raw_smp_processor_id();
-
- if (per_cpu(notifier_enabled, cpu))
- return;
-
- dprintk("\n");
- per_cpu(notifier_enabled, cpu) = 1;
asm volatile ("mov %0, %%db0" : : "r"(schedule));
- asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+ asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+ current->thread.debugreg7 = 0ul;
+#else
+ current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+ clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
+}
+
+static void preempt_enable_sched_in_notifiers(void * addr)
+{
+ asm volatile ("mov %0, %%db0" : : "r"(addr));
+ asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+ current->thread.debugreg0 = (unsigned long) addr;
+ current->thread.debugreg7 = 0x701ul;
+#else
+ current->thread.debugreg[0] = (unsigned long) addr;
+ current->thread.debugreg[7] = 0x701ul;
+#endif
+#ifdef TIF_DEBUG
+ set_tsk_thread_flag(current, TIF_DEBUG);
+#endif
}
void special_reload_dr7(void)
{
- asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+ asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
}
EXPORT_SYMBOL_GPL(special_reload_dr7);
-static void preempt_disable_notifiers(void)
+static void __preempt_disable_notifiers(void)
{
- int cpu = raw_smp_processor_id();
-
- if (!per_cpu(notifier_enabled, cpu))
- return;
+ asm volatile ("mov %0, %%db7" : : "r"(0ul));
+}
- dprintk("\n");
- per_cpu(notifier_enabled, cpu) = 0;
- asm volatile ("mov %0, %%db7" : : "r"(0x400ul));
+static void preempt_disable_notifiers(void)
+{
+ __preempt_disable_notifiers();
+#ifdef CONFIG_X86_64
+ current->thread.debugreg7 = 0ul;
+#else
+ current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+ clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
}
-static void __attribute__((used)) preempt_notifier_trigger(void)
+static void fastcall __attribute__((used)) preempt_notifier_trigger(void *** ip)
{
struct preempt_notifier *pn;
int cpu = raw_smp_processor_id();
int found = 0;
- unsigned long flags;
dprintk(" - in\n");
//dump_stack();
- spin_lock_irqsave(&pn_lock, flags);
+ spin_lock(&pn_lock);
list_for_each_entry(pn, &pn_list, link)
if (pn->tsk == current) {
found = 1;
break;
}
- spin_unlock_irqrestore(&pn_lock, flags);
- preempt_disable_notifiers();
+ spin_unlock(&pn_lock);
+
if (found) {
- dprintk("sched_out\n");
- pn->ops->sched_out(pn, NULL);
- per_cpu(last_tsk, cpu) = NULL;
- }
+ if ((void *) *ip != schedule) {
+ dprintk("sched_in\n");
+ preempt_enable_sched_out_notifiers();
+ pn->ops->sched_in(pn, cpu);
+ } else {
+ dprintk("sched_out\n");
+ preempt_enable_sched_in_notifiers(**(ip+3));
+ pn->ops->sched_out(pn, NULL);
+ }
+ } else
+ __preempt_disable_notifiers();
dprintk(" - out\n");
}
@@ -104,6 +131,11 @@ asm ("pn_int1_handler: \n\t"
"pop " TMP " \n\t"
"jz .Lnotme \n\t"
SAVE_REGS "\n\t"
+#ifdef CONFIG_X86_64
+ "leaq 120(%rsp),%rdi\n\t"
+#else
+ "leal 32(%esp),%eax\n\t"
+#endif
"call preempt_notifier_trigger \n\t"
RESTORE_REGS "\n\t"
#ifdef CONFIG_X86_64
@@ -121,75 +153,28 @@ asm ("pn_int1_handler: \n\t"
#endif
);
-void in_special_section(void)
-{
- struct preempt_notifier *pn;
- int cpu = raw_smp_processor_id();
- int found = 0;
- unsigned long flags;
-
- if (per_cpu(last_tsk, cpu) == current)
- return;
-
- dprintk(" - in\n");
- spin_lock_irqsave(&pn_lock, flags);
- list_for_each_entry(pn, &pn_list, link)
- if (pn->tsk == current) {
- found = 1;
- break;
- }
- spin_unlock_irqrestore(&pn_lock, flags);
- if (found) {
- dprintk("\n");
- per_cpu(last_tsk, cpu) = current;
- pn->ops->sched_in(pn, cpu);
- preempt_enable_notifiers();
- }
- dprintk(" - out\n");
-}
-EXPORT_SYMBOL_GPL(in_special_section);
-
-void start_special_insn(void)
-{
- preempt_disable();
- in_special_section();
-}
-EXPORT_SYMBOL_GPL(start_special_insn);
-
-void end_special_insn(void)
-{
- preempt_enable();
-}
-EXPORT_SYMBOL_GPL(end_special_insn);
-
void preempt_notifier_register(struct preempt_notifier *notifier)
{
- int cpu = get_cpu();
unsigned long flags;
dprintk(" - in\n");
spin_lock_irqsave(&pn_lock, flags);
- preempt_enable_notifiers();
+ preempt_enable_sched_out_notifiers();
notifier->tsk = current;
list_add(¬ifier->link, &pn_list);
spin_unlock_irqrestore(&pn_lock, flags);
- per_cpu(last_tsk, cpu) = current;
- put_cpu();
dprintk(" - out\n");
}
void preempt_notifier_unregister(struct preempt_notifier *notifier)
{
- int cpu = get_cpu();
unsigned long flags;
dprintk(" - in\n");
spin_lock_irqsave(&pn_lock, flags);
list_del(¬ifier->link);
spin_unlock_irqrestore(&pn_lock, flags);
- per_cpu(last_tsk, cpu) = NULL;
preempt_disable_notifiers();
- put_cpu();
dprintk(" - out\n");
}
@@ -238,7 +223,16 @@ void preempt_notifier_sys_init(void)
static void do_disable(void *blah)
{
- preempt_disable_notifiers();
+#ifdef TIF_DEBUG
+ if (!test_tsk_thread_flag(current, TIF_DEBUG))
+#else
+#ifdef CONFIG_X86_64
+ if (!current->thread.debugreg7)
+#else
+ if (!current->thread.debugreg[7])
+#endif
+#endif
+ __preempt_disable_notifiers();
}
void preempt_notifier_sys_exit(void)
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
* Re: external module sched_in event
  2007-12-21 17:52 ` Izik Eidus
From: Izik Eidus @ 2007-12-21 17:52 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Andrea Arcangeli wrote:
> [ I already sent it once as andrea-l3A5Bk7waGM@public.gmane.org but
> it didn't go through for whatever reason, trying again from private
> email, hope there won't be dups ]
oh, it was sent to the list, dont trust (in case you did) the source
forge site for the mails inside this list, gmane is much better...
> Removing -snapshot make -smp 2 stable. Multiple guests UP and SMP
> seems stable too.
you mean that without -snapshot, the userspace not hang at the
sigwait() in the qcow code?
* Re: external module sched_in event
  2007-12-21 18:22 ` Andrea Arcangeli
From: Andrea Arcangeli @ 2007-12-21 18:22 UTC (permalink / raw)
To: Izik Eidus; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
On Fri, Dec 21, 2007 at 07:52:52PM +0200, Izik Eidus wrote:
> oh, it was sent to the list, dont trust (in case you did) the source
> forge site for the mails
But this time I received it in my kvm-devel folder... previously I
didn't, so it had to be blocked by some spamfilter in the other
account. Sorry for the dup, but there was definitely a glitch
somewhere, and there was no glitch this time ;). Thanks for the gmane
tip!
> you mean that without -snapshot, the userspace not hang at the
> sigwait() in the qcow code?
Yes.
* mailman setup for kvm-devel (was Re: external module sched_in event)
  2007-12-21 18:50 ` Carlo Marcelo Arenas Belon
From: Carlo Marcelo Arenas Belon @ 2007-12-21 18:50 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
On Fri, Dec 21, 2007 at 07:22:57PM +0100, Andrea Arcangeli wrote:
> But this time I received it in my kvm-devel folder...
for some reason the list is configured with the mailman option:
  Receive your own posts to the list? no
so you will only see your own post if you send it from another
account than the one you subscribed with, as you found out, or if you
wait long enough for the mail archive to catch up.
> didn't, so it had to be blocked by some spamfilter
and it doesn't have a spam filter.
Carlo
PS. I agree that this setup is confusing and would rather see it
changed if that is an open possibility.
* Re: mailman setup for kvm-devel (was Re: external module sched_in event)
  2007-12-22 20:21 ` Avi Kivity
From: Avi Kivity @ 2007-12-22 20:21 UTC (permalink / raw)
To: Carlo Marcelo Arenas Belon
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrea Arcangeli
Carlo Marcelo Arenas Belon wrote:
> for some reason the list is configured with the mailman option:
>
>   Receive your own posts to the list? no
It's just a default; users can change it if they wish. I changed the
default to send a copy of own posts, but that only applied to new
subscribers.
-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
* Re: external module sched_in event
  2007-12-22 20:24 ` Avi Kivity
From: Avi Kivity @ 2007-12-22 20:24 UTC (permalink / raw)
To: Izik Eidus; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrea Arcangeli
Izik Eidus wrote:
> you mean that without -snapshot, the userspace not hang at the
> sigwait() in the qcow code?
The problem is probably exaggerated by -snapshot, which needs much
more metadata activity than without -snapshot.
-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
* external module sched_in event
@ 2007-12-20 16:23 Andrea Arcangeli
[not found] ` <20071220162353.GA3802-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2007-12-20 16:23 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Hello,
My worst longstanding problem with KVM is that as the uptime of my
host system increased, my opensuse guest images started to destabilize
and lockup at boot. The weird thing was that fresh after boot
everything was always perfectly ok, so I thought it was rmmod/insmod
or some other sticky effect on the CPU after restarting the guest a
few times that triggered the crash. Furthermore if I loaded the cpu a
lot (like with a while :; do true;done), the crash would magically
disappear. Decreasing cpu frequency and timings didn't help. Debugging
wasn't trivial because it required a certain uptime and it didn't
always crash.
So I once debugged this more aggressively I figured out KVM was ok, it
was the guest that crashed in the tsc clocksource because tsc wasn't
monotone. guest was looping in an infinite loop with irq disabled. So
I tried to pass "notsc" and that fixed the crash just fine.
Initially I thought it was the tsc_offset logic being wrong but then I
figured out that the vcpu_put/load wasn't always executed, this
bugcheck triggers with current git and so I recommend to apply this to
kvm.git to avoid similar nasty hard-to-detect bugs in the future (Avi
says vmx would crash hard in such a condition, svm is much simpler and
it somewhat survives the lack of sched_in and only crashes the guest
due to not monotone tsc):
Signed-off-by: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac876ec..26372fa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -742,6 +742,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ WARN_ON(vcpu->cpu != smp_processor_id());
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
}
So trying to understand why the ->cpu was wrong, I looked into the
preempt notifiers emulation, and it looked quite fragile without a
real sched_in hook. I figured out I could provide a real sched_in hook
by loading the proper values in the
tsk->thread.debugreg[0/7]. Initially I got the hooking points out of
objdump -d vmlinux, but Avi preferred no dependency on the vmlinux and
he suggested to try to find the sched_in hook in the stack. So that's
what I implemented now and this should provide real robustness to the
out of tree module compiled against binary kernel images with
CONFIG_KVM=n. I tried to be compatible with all kernels down to 2.6.5
but only 2.6.2x host is tested and only on 64bit and only on SVM (no
vmx system around here at all).
This fixes my longstanding KVM instability and "-smp 2" now works
flawlessy with svm too! -smp 2 -snapshot crashes in qemu userland but
that's not kernel related, must be some thread mutex lock recursion or
lock inversion in the qcow cow code. Removing -snapshot make -smp 2
stable. Multiple guests UP and SMP seems stable too.
To reproduce my crash easily without waiting ages for the two tsc to
deviate with an error larger than the number of cycles it takes for a
CPU migration, run write_tsc(0,0) in kernel mode (like in the svm.c
init function and then insmod kvm-amd; rmmod kvm-amd and then remove
write_tsc and recompile kvm-amd).
#include <stdio.h>
main()
{
unsigned long x1, x2;
unsigned int a, d;
asm volatile("rdtsc" : "=a" (a), "=d" (d));
x1 = ((unsigned long)d << 32) | a;
for (;;) {
asm volatile("rdtsc" : "=a" (a), "=d" (d));
x2 = ((unsigned long)d << 32) | a;
if (x2 < x1)
printf("error %Ld\n", x1-x2);
else
printf("good %Ld\n", x1-x2);
x1 = x2;
}
}
(the "good.." printf can be commented out if you run this on the host,
but it better stay if you run this in the guest because it helps
rescheduling X with SDL that increases the frequency of the
CPU-switches of the kvm task)
So in short with the below fix applied, after a write_tsc(0,0), the
UP guest never return any error anymore. Previously it would return
frequent errors because sched_in wasn't properly invoked by svm.c and
it would crash at boot every single time after a write_tsc(0,0).
The SMP guest of course still returns TSC errors but that's ok, the
smp host also return TSC errors, that's ok, it's only the UP guest
that is forbidden to have a not monotone TSC or the guest would crash
like it happened to me.
I'm unsure if special_reload_db7 is needed at all, but it certainly
can't hurt so it's the only hack I left.
Finally I can enjoy KVM stability too ;). If you always compiled your
host kernel with CONFIG_KVM=y on a recent kernels including the
preempt-notifiers, you could never run into this. If you compile your
host kernel with CONFIG_KVM=n please try to test this.
Signed-off-by: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>
diff --git a/kernel/hack-module.awk b/kernel/hack-module.awk
index 7993aa2..5187c96 100644
--- a/kernel/hack-module.awk
+++ b/kernel/hack-module.awk
@@ -24,32 +24,6 @@
printf("MODULE_INFO(version, \"%s\");\n", version)
}
-/^static unsigned long vmcs_readl/ {
- in_vmcs_read = 1
-}
-
-/ASM_VMX_VMREAD_RDX_RAX/ && in_vmcs_read {
- printf("\tstart_special_insn();\n")
-}
-
-/return/ && in_vmcs_read {
- printf("\tend_special_insn();\n");
- in_vmcs_read = 0
-}
-
-/^static void vmcs_writel/ {
- in_vmcs_write = 1
-}
-
-/ASM_VMX_VMWRITE_RAX_RDX/ && in_vmcs_write {
- printf("\tstart_special_insn();\n")
-}
-
-/if/ && in_vmcs_write {
- printf("\tend_special_insn();\n");
- in_vmcs_write = 0
-}
-
/^static void vmx_load_host_state/ {
vmx_load_host_state = 1
}
@@ -74,15 +48,6 @@
print "\tspecial_reload_dr7();"
}
-/static void vcpu_put|static int __vcpu_run|static struct kvm_vcpu \*vmx_create_vcpu/ {
- in_tricky_func = 1
-}
-
-/preempt_disable|get_cpu/ && in_tricky_func {
- printf("\tin_special_section();\n");
- in_tricky_func = 0
-}
-
/unsigned long flags;/ && vmx_load_host_state {
print "\tunsigned long gsbase;"
}
@@ -90,4 +55,3 @@
/local_irq_save/ && vmx_load_host_state {
print "\t\tgsbase = vmcs_readl(HOST_GS_BASE);"
}
-
diff --git a/kernel/preempt.c b/kernel/preempt.c
index 8bb0405..6e57277 100644
--- a/kernel/preempt.c
+++ b/kernel/preempt.c
@@ -6,8 +6,6 @@
static DEFINE_SPINLOCK(pn_lock);
static LIST_HEAD(pn_list);
-static DEFINE_PER_CPU(int, notifier_enabled);
-static DEFINE_PER_CPU(struct task_struct *, last_tsk);
#define dprintk(fmt) do { \
if (0) \
@@ -15,59 +13,88 @@ static DEFINE_PER_CPU(struct task_struct *, last_tsk);
current->pid, raw_smp_processor_id()); \
} while (0)
-static void preempt_enable_notifiers(void)
+static void preempt_enable_sched_out_notifiers(void)
{
- int cpu = raw_smp_processor_id();
-
- if (per_cpu(notifier_enabled, cpu))
- return;
-
- dprintk("\n");
- per_cpu(notifier_enabled, cpu) = 1;
asm volatile ("mov %0, %%db0" : : "r"(schedule));
- asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+ asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+ current->thread.debugreg7 = 0ul;
+#else
+ current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+ clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
+}
+
+static void preempt_enable_sched_in_notifiers(void * addr)
+{
+ asm volatile ("mov %0, %%db0" : : "r"(addr));
+ asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+ current->thread.debugreg0 = (unsigned long) addr;
+ current->thread.debugreg7 = 0x701ul;
+#else
+ current->thread.debugreg[0] = (unsigned long) addr;
+ current->thread.debugreg[7] = 0x701ul;
+#endif
+#ifdef TIF_DEBUG
+ set_tsk_thread_flag(current, TIF_DEBUG);
+#endif
}
void special_reload_dr7(void)
{
- asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+ asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
}
EXPORT_SYMBOL_GPL(special_reload_dr7);
-static void preempt_disable_notifiers(void)
+static void __preempt_disable_notifiers(void)
{
- int cpu = raw_smp_processor_id();
-
- if (!per_cpu(notifier_enabled, cpu))
- return;
+ asm volatile ("mov %0, %%db7" : : "r"(0ul));
+}
- dprintk("\n");
- per_cpu(notifier_enabled, cpu) = 0;
- asm volatile ("mov %0, %%db7" : : "r"(0x400ul));
+static void preempt_disable_notifiers(void)
+{
+ __preempt_disable_notifiers();
+#ifdef CONFIG_X86_64
+ current->thread.debugreg7 = 0ul;
+#else
+ current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+ clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
}
-static void __attribute__((used)) preempt_notifier_trigger(void)
+static void fastcall __attribute__((used)) preempt_notifier_trigger(void *** ip)
{
struct preempt_notifier *pn;
int cpu = raw_smp_processor_id();
int found = 0;
- unsigned long flags;
dprintk(" - in\n");
//dump_stack();
- spin_lock_irqsave(&pn_lock, flags);
+ spin_lock(&pn_lock);
list_for_each_entry(pn, &pn_list, link)
if (pn->tsk == current) {
found = 1;
break;
}
- spin_unlock_irqrestore(&pn_lock, flags);
- preempt_disable_notifiers();
+ spin_unlock(&pn_lock);
+
if (found) {
- dprintk("sched_out\n");
- pn->ops->sched_out(pn, NULL);
- per_cpu(last_tsk, cpu) = NULL;
- }
+ if ((void *) *ip != schedule) {
+ dprintk("sched_in\n");
+ preempt_enable_sched_out_notifiers();
+ pn->ops->sched_in(pn, cpu);
+ } else {
+ dprintk("sched_out\n");
+ preempt_enable_sched_in_notifiers(**(ip+3));
+ pn->ops->sched_out(pn, NULL);
+ }
+ } else
+ __preempt_disable_notifiers();
dprintk(" - out\n");
}
@@ -104,6 +131,11 @@ asm ("pn_int1_handler: \n\t"
"pop " TMP " \n\t"
"jz .Lnotme \n\t"
SAVE_REGS "\n\t"
+#ifdef CONFIG_X86_64
+ "leaq 120(%rsp),%rdi\n\t"
+#else
+ "leal 32(%esp),%eax\n\t"
+#endif
"call preempt_notifier_trigger \n\t"
RESTORE_REGS "\n\t"
#ifdef CONFIG_X86_64
@@ -121,75 +153,28 @@ asm ("pn_int1_handler: \n\t"
#endif
);
-void in_special_section(void)
-{
- struct preempt_notifier *pn;
- int cpu = raw_smp_processor_id();
- int found = 0;
- unsigned long flags;
-
- if (per_cpu(last_tsk, cpu) == current)
- return;
-
- dprintk(" - in\n");
- spin_lock_irqsave(&pn_lock, flags);
- list_for_each_entry(pn, &pn_list, link)
- if (pn->tsk == current) {
- found = 1;
- break;
- }
- spin_unlock_irqrestore(&pn_lock, flags);
- if (found) {
- dprintk("\n");
- per_cpu(last_tsk, cpu) = current;
- pn->ops->sched_in(pn, cpu);
- preempt_enable_notifiers();
- }
- dprintk(" - out\n");
-}
-EXPORT_SYMBOL_GPL(in_special_section);
-
-void start_special_insn(void)
-{
- preempt_disable();
- in_special_section();
-}
-EXPORT_SYMBOL_GPL(start_special_insn);
-
-void end_special_insn(void)
-{
- preempt_enable();
-}
-EXPORT_SYMBOL_GPL(end_special_insn);
-
void preempt_notifier_register(struct preempt_notifier *notifier)
{
- int cpu = get_cpu();
unsigned long flags;
dprintk(" - in\n");
spin_lock_irqsave(&pn_lock, flags);
- preempt_enable_notifiers();
+ preempt_enable_sched_out_notifiers();
notifier->tsk = current;
list_add(&notifier->link, &pn_list);
spin_unlock_irqrestore(&pn_lock, flags);
- per_cpu(last_tsk, cpu) = current;
- put_cpu();
dprintk(" - out\n");
}
void preempt_notifier_unregister(struct preempt_notifier *notifier)
{
- int cpu = get_cpu();
unsigned long flags;
dprintk(" - in\n");
spin_lock_irqsave(&pn_lock, flags);
list_del(&notifier->link);
spin_unlock_irqrestore(&pn_lock, flags);
- per_cpu(last_tsk, cpu) = NULL;
preempt_disable_notifiers();
- put_cpu();
dprintk(" - out\n");
}
@@ -238,7 +223,16 @@ void preempt_notifier_sys_init(void)
static void do_disable(void *blah)
{
- preempt_disable_notifiers();
+#ifdef TIF_DEBUG
+ if (!test_tsk_thread_flag(current, TIF_DEBUG))
+#else
+#ifdef CONFIG_X86_64
+ if (!current->thread.debugreg7)
+#else
+ if (!current->thread.debugreg[7])
+#endif
+#endif
+ __preempt_disable_notifiers();
}
void preempt_notifier_sys_exit(void)
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
[parent not found: <20071220162353.GA3802-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>]
* Re: external module sched_in event
  [not found] ` <20071220162353.GA3802-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
@ 2007-12-22 19:13 ` Avi Kivity
  [not found] ` <476D61E8.5000102-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2007-12-22 19:13 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Andrea Arcangeli wrote:
[snip]
> So in short with the below fix applied, after a write_tsc(0,0), the
> UP guest never return any error anymore. Previously it would return
> frequent errors because sched_in wasn't properly invoked by svm.c and
> it would crash at boot every single time after a write_tsc(0,0).
>
> The SMP guest of course still returns TSC errors but that's ok, the
> smp host also return TSC errors, that's ok, it's only the UP guest
> that is forbidden to have a not monotone TSC or the guest would crash
> like it happened to me.
>
> I'm unsure if special_reload_db7 is needed at all, but it certainly
> can't hurt so it's the only hack I left.
>

It's needed, vmx (and IIRC svm) will clear out db7 so we must reload
it.  In fairness we need also reload it if the host had it set; it
shouldn't be a hack but part of mainline.

> Finally I can enjoy KVM stability too ;). If you always compiled your
> host kernel with CONFIG_KVM=y on a recent kernels including the
> preempt-notifiers, you could never run into this. If you compile your
> host kernel with CONFIG_KVM=n please try to test this.

> Unfortunately, this fails badly on Intel i386:
> kvm: emulating preempt notifiers; do not benchmark on this machine
> loaded kvm module (kvm-56-127-g433be51)
> vmwrite error: reg c08 value d8 (err 3080)
>  [<f8baf9e2>] vmx_save_host_state+0x4f/0x162 [kvm_intel]
>  [<c0425803>] __cond_resched+0x25/0x3c
>  [<f91a22a4>] kvm_arch_vcpu_ioctl_run+0x16f/0x3a7 [kvm]
>  [<f919f244>] kvm_vcpu_ioctl+0xcb/0x28f [kvm]
>  [<c0421987>] enqueue_entity+0x2c0/0x2ea
>  [<c05a8340>] skb_dequeue+0x39/0x3f
>  [<c0604b6d>] unix_stream_recvmsg+0x3a2/0x4c3
>  [<c0425c82>] scheduler_tick+0x1a1/0x274
>  [<c0487329>] core_sys_select+0x21f/0x2fa
>  [<c043e9e6>] clockevents_program_event+0xb5/0xbc
>  [<c04c6853>] avc_has_perm+0x4e/0x58
>  [<c04c7174>] inode_has_perm+0x66/0x6e
>  [<c0430bed>] recalc_sigpending+0xb/0x1d
>  [<c043231d>] dequeue_signal+0xa9/0x12a
>  [<c043cb95>] getnstimeofday+0x30/0xbf
>  [<c04c7205>] file_has_perm+0x89/0x91
>  [<f919f179>] kvm_vcpu_ioctl+0x0/0x28f [kvm]
>  [<c04861b9>] do_ioctl+0x21/0xa0
>  [<c048646f>] vfs_ioctl+0x237/0x249
>  [<c04864cd>] sys_ioctl+0x4c/0x67
>  [<c0404f26>] sysenter_past_esp+0x5f/0x85
>  =======================

vmwrite error means the vmcs pointer was not loaded, probably because
the sched_in event did not fire after a vcpu migration.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[parent not found: <476D61E8.5000102-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: external module sched_in event
  [not found] ` <476D61E8.5000102-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-12-23 16:49 ` Andrea Arcangeli
  [not found] ` <20071223164932.GA8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2007-12-23 16:49 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, Dec 22, 2007 at 09:13:44PM +0200, Avi Kivity wrote:
> Unfortunately, this fails badly on Intel i386:

Hmm ok there's a definitive bug that I forgot a int1 kernel->kernel
switch on x86 has no special debug stack like on x86-64. This will
have a better chance to work, hope I got all offsets right by
memory.... At least the offset "32" in the leal and eax + fastcall
should all be right or I doubt it could survive the double
dereferencing. Likely the one-more-derefence didn't oops there because
you likely have >=1g of ram and there was a 25% chance of crashing due
the lack of sched-in and 75% chance of crashing in the
one-more-dereference in a more meaningful way.
Signed-off-by: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>

diff --git a/kernel/hack-module.awk b/kernel/hack-module.awk
index 7993aa2..5187c96 100644
--- a/kernel/hack-module.awk
+++ b/kernel/hack-module.awk
@@ -24,32 +24,6 @@
     printf("MODULE_INFO(version, \"%s\");\n", version)
 }
 
-/^static unsigned long vmcs_readl/ {
-    in_vmcs_read = 1
-}
-
-/ASM_VMX_VMREAD_RDX_RAX/ && in_vmcs_read {
-    printf("\tstart_special_insn();\n")
-}
-
-/return/ && in_vmcs_read {
-    printf("\tend_special_insn();\n");
-    in_vmcs_read = 0
-}
-
-/^static void vmcs_writel/ {
-    in_vmcs_write = 1
-}
-
-/ASM_VMX_VMWRITE_RAX_RDX/ && in_vmcs_write {
-    printf("\tstart_special_insn();\n")
-}
-
-/if/ && in_vmcs_write {
-    printf("\tend_special_insn();\n");
-    in_vmcs_write = 0
-}
-
 /^static void vmx_load_host_state/ {
     vmx_load_host_state = 1
 }
@@ -74,15 +48,6 @@
     print "\tspecial_reload_dr7();"
 }
 
-/static void vcpu_put|static int __vcpu_run|static struct kvm_vcpu \*vmx_create_vcpu/ {
-    in_tricky_func = 1
-}
-
-/preempt_disable|get_cpu/ && in_tricky_func {
-    printf("\tin_special_section();\n");
-    in_tricky_func = 0
-}
-
 /unsigned long flags;/ && vmx_load_host_state {
     print "\tunsigned long gsbase;"
 }
@@ -90,4 +55,3 @@
 /local_irq_save/ && vmx_load_host_state {
     print "\t\tgsbase = vmcs_readl(HOST_GS_BASE);"
 }
-
diff --git a/kernel/preempt.c b/kernel/preempt.c
index 8bb0405..fd6f8dc 100644
--- a/kernel/preempt.c
+++ b/kernel/preempt.c
@@ -6,8 +6,6 @@
 static DEFINE_SPINLOCK(pn_lock);
 static LIST_HEAD(pn_list);
-static DEFINE_PER_CPU(int, notifier_enabled);
-static DEFINE_PER_CPU(struct task_struct *, last_tsk);
 
 #define dprintk(fmt) do {					\
	if (0)							\
@@ -15,59 +13,95 @@ static DEFINE_PER_CPU(struct task_struct *, last_tsk);
		       current->pid, raw_smp_processor_id());	\
	} while (0)
 
-static void preempt_enable_notifiers(void)
+static void preempt_enable_sched_out_notifiers(void)
 {
-	int cpu = raw_smp_processor_id();
-
-	if (per_cpu(notifier_enabled, cpu))
-		return;
-
-	dprintk("\n");
-	per_cpu(notifier_enabled, cpu) = 1;
 	asm volatile ("mov %0, %%db0" : : "r"(schedule));
-	asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+	asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+	current->thread.debugreg7 = 0ul;
+#else
+	current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+	clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
+}
+
+static void preempt_enable_sched_in_notifiers(void * addr)
+{
+	asm volatile ("mov %0, %%db0" : : "r"(addr));
+	asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+	current->thread.debugreg0 = (unsigned long) addr;
+	current->thread.debugreg7 = 0x701ul;
+#else
+	current->thread.debugreg[0] = (unsigned long) addr;
+	current->thread.debugreg[7] = 0x701ul;
+#endif
+#ifdef TIF_DEBUG
+	set_tsk_thread_flag(current, TIF_DEBUG);
+#endif
 }
 
 void special_reload_dr7(void)
 {
-	asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+	asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
 }
 EXPORT_SYMBOL_GPL(special_reload_dr7);
 
-static void preempt_disable_notifiers(void)
+static void __preempt_disable_notifiers(void)
 {
-	int cpu = raw_smp_processor_id();
-
-	if (!per_cpu(notifier_enabled, cpu))
-		return;
+	asm volatile ("mov %0, %%db7" : : "r"(0ul));
+}
 
-	dprintk("\n");
-	per_cpu(notifier_enabled, cpu) = 0;
-	asm volatile ("mov %0, %%db7" : : "r"(0x400ul));
+static void preempt_disable_notifiers(void)
+{
+	__preempt_disable_notifiers();
+#ifdef CONFIG_X86_64
+	current->thread.debugreg7 = 0ul;
+#else
+	current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+	clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
 }
 
-static void __attribute__((used)) preempt_notifier_trigger(void)
+static void fastcall __attribute__((used)) preempt_notifier_trigger(void *** ip)
 {
 	struct preempt_notifier *pn;
 	int cpu = raw_smp_processor_id();
 	int found = 0;
-	unsigned long flags;
 
 	dprintk(" - in\n");
 	//dump_stack();
-	spin_lock_irqsave(&pn_lock, flags);
+	spin_lock(&pn_lock);
 	list_for_each_entry(pn, &pn_list, link)
		if (pn->tsk == current) {
			found = 1;
			break;
		}
-	spin_unlock_irqrestore(&pn_lock, flags);
-	preempt_disable_notifiers();
+	spin_unlock(&pn_lock);
+
 	if (found) {
-		dprintk("sched_out\n");
-		pn->ops->sched_out(pn, NULL);
-		per_cpu(last_tsk, cpu) = NULL;
-	}
+		if ((void *) *ip != schedule) {
+			dprintk("sched_in\n");
+			preempt_enable_sched_out_notifiers();
+			pn->ops->sched_in(pn, cpu);
+		} else {
+			void * sched_in_addr;
+			dprintk("sched_out\n");
+#ifdef CONFIG_X86_64
+			sched_in_addr = **(ip+3);
+#else
+			/* no special debug stack switch on x86 */
+			sched_in_addr = (void *) *(ip+3);
+#endif
+			preempt_enable_sched_in_notifiers(sched_in_addr);
+			pn->ops->sched_out(pn, NULL);
+		}
+	} else
+		__preempt_disable_notifiers();
 	dprintk(" - out\n");
 }
 
@@ -104,6 +138,11 @@ asm ("pn_int1_handler: \n\t"
     "pop " TMP " \n\t"
     "jz .Lnotme \n\t"
     SAVE_REGS "\n\t"
+#ifdef CONFIG_X86_64
+    "leaq 120(%rsp),%rdi\n\t"
+#else
+    "leal 32(%esp),%eax\n\t"
+#endif
     "call preempt_notifier_trigger \n\t"
     RESTORE_REGS "\n\t"
 #ifdef CONFIG_X86_64
@@ -121,75 +160,28 @@ asm ("pn_int1_handler: \n\t"
 #endif
     );
 
-void in_special_section(void)
-{
-	struct preempt_notifier *pn;
-	int cpu = raw_smp_processor_id();
-	int found = 0;
-	unsigned long flags;
-
-	if (per_cpu(last_tsk, cpu) == current)
-		return;
-
-	dprintk(" - in\n");
-	spin_lock_irqsave(&pn_lock, flags);
-	list_for_each_entry(pn, &pn_list, link)
-		if (pn->tsk == current) {
-			found = 1;
-			break;
-		}
-	spin_unlock_irqrestore(&pn_lock, flags);
-	if (found) {
-		dprintk("\n");
-		per_cpu(last_tsk, cpu) = current;
-		pn->ops->sched_in(pn, cpu);
-		preempt_enable_notifiers();
-	}
-	dprintk(" - out\n");
-}
-EXPORT_SYMBOL_GPL(in_special_section);
-
-void start_special_insn(void)
-{
-	preempt_disable();
-	in_special_section();
-}
-EXPORT_SYMBOL_GPL(start_special_insn);
-
-void end_special_insn(void)
-{
-	preempt_enable();
-}
-EXPORT_SYMBOL_GPL(end_special_insn);
-
 void preempt_notifier_register(struct preempt_notifier *notifier)
 {
-	int cpu = get_cpu();
 	unsigned long flags;
 
 	dprintk(" - in\n");
 	spin_lock_irqsave(&pn_lock, flags);
-	preempt_enable_notifiers();
+	preempt_enable_sched_out_notifiers();
 	notifier->tsk = current;
 	list_add(&notifier->link, &pn_list);
 	spin_unlock_irqrestore(&pn_lock, flags);
-	per_cpu(last_tsk, cpu) = current;
-	put_cpu();
 	dprintk(" - out\n");
 }
 
 void preempt_notifier_unregister(struct preempt_notifier *notifier)
 {
-	int cpu = get_cpu();
 	unsigned long flags;
 
 	dprintk(" - in\n");
 	spin_lock_irqsave(&pn_lock, flags);
 	list_del(&notifier->link);
 	spin_unlock_irqrestore(&pn_lock, flags);
-	per_cpu(last_tsk, cpu) = NULL;
 	preempt_disable_notifiers();
-	put_cpu();
 	dprintk(" - out\n");
 }
 
@@ -238,7 +223,16 @@ void preempt_notifier_sys_init(void)
 
 static void do_disable(void *blah)
 {
-	preempt_disable_notifiers();
+#ifdef TIF_DEBUG
+	if (!test_tsk_thread_flag(current, TIF_DEBUG))
+#else
+#ifdef CONFIG_X86_64
+	if (!current->thread.debugreg7)
+#else
+	if (!current->thread.debugreg[7])
+#endif
+#endif
+		__preempt_disable_notifiers();
 }
 
 void preempt_notifier_sys_exit(void)

> > kvm: emulating preempt notifiers; do not benchmark on this machine
> > loaded kvm module (kvm-56-127-g433be51)
> > vmwrite error: reg c08 value d8 (err 3080)
> >  [<f8baf9e2>] vmx_save_host_state+0x4f/0x162 [kvm_intel]
> >  [<c0425803>] __cond_resched+0x25/0x3c
> >  [<f91a22a4>] kvm_arch_vcpu_ioctl_run+0x16f/0x3a7 [kvm]
> >  [<f919f244>] kvm_vcpu_ioctl+0xcb/0x28f [kvm]
> >  [<c0421987>] enqueue_entity+0x2c0/0x2ea
> >  [<c05a8340>] skb_dequeue+0x39/0x3f
> >  [<c0604b6d>] unix_stream_recvmsg+0x3a2/0x4c3
> >  [<c0425c82>] scheduler_tick+0x1a1/0x274
> >  [<c0487329>] core_sys_select+0x21f/0x2fa
> >  [<c043e9e6>] clockevents_program_event+0xb5/0xbc
> >  [<c04c6853>] avc_has_perm+0x4e/0x58
> >  [<c04c7174>] inode_has_perm+0x66/0x6e
> >  [<c0430bed>] recalc_sigpending+0xb/0x1d
> >  [<c043231d>] dequeue_signal+0xa9/0x12a
> >  [<c043cb95>] getnstimeofday+0x30/0xbf
> >  [<c04c7205>] file_has_perm+0x89/0x91
> >  [<f919f179>] kvm_vcpu_ioctl+0x0/0x28f [kvm]
> >  [<c04861b9>] do_ioctl+0x21/0xa0
> >  [<c048646f>] vfs_ioctl+0x237/0x249
> >  [<c04864cd>] sys_ioctl+0x4c/0x67
> >  [<c0404f26>] sysenter_past_esp+0x5f/0x85
> >  =======================
>
> vmwrite error means the vmcs pointer was not loaded, probably because
> the sched_in event did not fire after a vcpu migration.
>
> --
> Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[parent not found: <20071223164932.GA8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>]
* Re: external module sched_in event
  [not found] ` <20071223164932.GA8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
@ 2007-12-23 17:37 ` Avi Kivity
  [not found] ` <476E9CE4.2060705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2007-12-23 17:37 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Andrea Arcangeli wrote:
> On Sat, Dec 22, 2007 at 09:13:44PM +0200, Avi Kivity wrote:
>
>> Unfortunately, this fails badly on Intel i386:
>
> Hmm ok there's a definitive bug that I forgot a int1 kernel->kernel
> switch on x86 has no special debug stack like on x86-64. This will
> have a better chance to work, hope I got all offsets right by
> memory.... At least the offset "32" in the leal and eax + fastcall
> should all be right or I doubt it could survive the double
> dereferencing. Likely the one-more-derefence didn't oops there because
> you likely have >=1g of ram and there was a 25% chance of crashing due
> the lack of sched-in and 75% chance of crashing in the
> one-more-dereference in a more meaningful way.
>

> Now I see lots of
> BUG: warning at arch/i386/kernel/smp.c:701/smp_call_function_single()
> (Not tainted)
> [<f8c053bb>] __vcpu_clear+0x0/0x4a [kvm_intel]
> [<c0417ab9>] smp_call_function_single+0x90/0x10c
> [<c0403126>] __switch_to+0x174/0x18e
> [<f8c05614>] vcpu_clear+0x41/0x50 [kvm_intel]
> [<f8c058a5>] vmx_vcpu_load+0x2e/0x103 [kvm_intel]
> [<f8c0516d>] vmx_vcpu_put+0xc0/0xf3 [kvm_intel]
> [<f8c5f744>] kvm_arch_vcpu_load+0x9/0xa [kvm]
> [<f8c6b961>] preempt_notifier_trigger+0x5b/0xe1 [kvm]
> [<f8c6b79a>] pn_int1_handler+0x16/0x26 [kvm]
> [<c061fa14>] __mutex_lock_slowpath+0x45/0x77
> [<c061f8ff>] mutex_lock+0x26/0x29
> [<f8c6a465>] apic_update_ppr+0x17/0x3e [kvm]
> [<f8c650ed>] kvm_mmu_page_fault+0x14/0x9b [kvm]
> [<f8c6a55a>] kvm_get_apic_interrupt+0x3a/0x4f [kvm]
> [<f8c06cdb>] kvm_handle_exit+0x6a/0x86 [kvm_intel]
> [<f8c623cb>] kvm_arch_vcpu_ioctl_run+0x2a4/0x3aa [kvm]
> [<f8c5f246>] kvm_vcpu_ioctl+0xce/0x298 [kvm]
> [<c0420e83>] __activate_task+0x1c/0x29
> [<c0422645>] try_to_wake_up+0x3aa/0x3b4
> [<c06205b5>] _spin_unlock_irq+0x5/0x7
> [<c041fb40>] __wake_up_common+0x32/0x55
> [<c0420a39>] __wake_up+0x32/0x43
> [<c043b367>] wake_futex+0x42/0x4c
> [<c043b61a>] futex_wake+0xa6/0xb0
> [<c043c233>] do_futex+0x217/0xb7d
> [<f88626e5>] journal_stop+0x1cb/0x1d7 [jbd]
> [<c045addb>] mapping_tagged+0x2b/0x32
> [<f8c5ee89>] kvm_vm_ioctl+0x172/0x183 [kvm]
> [<c06205b5>] _spin_unlock_irq+0x5/0x7
> [<c061ef69>] __sched_text_start+0x999/0xa21
> [<c0419d4e>] smp_apic_timer_interrupt+0x76/0x80
> [<f8c5f178>] kvm_vcpu_ioctl+0x0/0x298 [kvm]
> [<c047c4a7>] do_ioctl+0x1f/0x62
> [<c047c72e>] vfs_ioctl+0x244/0x256
> [<c047c78c>] sys_ioctl+0x4c/0x64
> [<c0403f64>] syscall_call+0x7/0xb
> =======================

The sched_in notifier needs to enable interrupts (but it must disable
preemption to avoid recursion).

Eventually I got this:

BUG: spinlock lockup on CPU#3, qemu-system-x86/4425, c07001cc (Not tainted)
 [<f8c053bb>] __vcpu_clear+0x0/0x4a [kvm_intel]
 [<c04edec8>] _raw_spin_lock+0xb8/0xd9
 [<c0417ac3>] smp_call_function_single+0x9a/0x10c
 [<c0403126>] __switch_to+0x174/0x18e
 [<f8c05614>] vcpu_clear+0x41/0x50 [kvm_intel]
 [<f8c058a5>] vmx_vcpu_load+0x2e/0x103 [kvm_intel]
 [<f8c0516d>] vmx_vcpu_put+0xc0/0xf3 [kvm_intel]
 [<f8c5f744>] kvm_arch_vcpu_load+0x9/0xa [kvm]
 [<f8c6b961>] preempt_notifier_trigger+0x5b/0xe1 [kvm]
 [<f8c6b79a>] pn_int1_handler+0x16/0x26 [kvm]
 [<c061fa14>] __mutex_lock_slowpath+0x45/0x77
 [<c061f8ff>] mutex_lock+0x26/0x29
 [<f8c6a465>] apic_update_ppr+0x17/0x3e [kvm]
 [<f8c650ed>] kvm_mmu_page_fault+0x14/0x9b [kvm]
 [<f8c6a55a>] kvm_get_apic_interrupt+0x3a/0x4f [kvm]
 [<f8c06cdb>] kvm_handle_exit+0x6a/0x86 [kvm_intel]
 [<f8c623cb>] kvm_arch_vcpu_ioctl_run+0x2a4/0x3aa [kvm]
 [<c05e6d63>] tcp_sendmsg+0x913/0xa04
 [<f8c5f246>] kvm_vcpu_ioctl+0xce/0x298 [kvm]
 [<c0420e83>] __activate_task+0x1c/0x29
 [<c0422645>] try_to_wake_up+0x3aa/0x3b4
 [<c06205b5>] _spin_unlock_irq+0x5/0x7
 [<c061ef69>] __sched_text_start+0x999/0xa21
 [<c047d5e5>] core_sys_select+0x218/0x2f3
 [<c043b61a>] futex_wake+0xa6/0xb0
 [<c043c233>] do_futex+0x217/0xb7d
 [<c042faa1>] __dequeue_signal+0xff/0x14e
 [<c0430ca0>] dequeue_signal+0x36/0xae
 [<f8c5ee89>] kvm_vm_ioctl+0x172/0x183 [kvm]
 [<c043a03a>] ktime_get_ts+0x16/0x44
 [<c043a07a>] ktime_get+0x12/0x34
 [<c043698b>] common_timer_get+0xf4/0x130
 [<f8c5f178>] kvm_vcpu_ioctl+0x0/0x298 [kvm]
 [<c047c4a7>] do_ioctl+0x1f/0x62
 [<c047c72e>] vfs_ioctl+0x244/0x256
 [<c04ed2c0>] copy_to_user+0x3c/0x50
 [<c047c78c>] sys_ioctl+0x4c/0x64
 [<c0403f64>] syscall_call+0x7/0xb
 =======================

followed by lockup of the qemu process, but it may be due to interrupts
being disabled.

-- 
error compiling committee.c: too many arguments to function
[parent not found: <476E9CE4.2060705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: external module sched_in event
  [not found] ` <476E9CE4.2060705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-12-24 16:26 ` Andrea Arcangeli
  [not found] ` <20071224162639.GH8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2007-12-24 16:26 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Dec 23, 2007 at 07:37:40PM +0200, Avi Kivity wrote:
> The sched_in notifier needs to enable interrupts (but it must disable
> preemption to avoid recursion).

Ok this update fixes the smp_call_function deadlock.

Signed-off-by: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>

diff --git a/kernel/hack-module.awk b/kernel/hack-module.awk
index 7993aa2..5187c96 100644
--- a/kernel/hack-module.awk
+++ b/kernel/hack-module.awk
@@ -24,32 +24,6 @@
     printf("MODULE_INFO(version, \"%s\");\n", version)
 }
 
-/^static unsigned long vmcs_readl/ {
-    in_vmcs_read = 1
-}
-
-/ASM_VMX_VMREAD_RDX_RAX/ && in_vmcs_read {
-    printf("\tstart_special_insn();\n")
-}
-
-/return/ && in_vmcs_read {
-    printf("\tend_special_insn();\n");
-    in_vmcs_read = 0
-}
-
-/^static void vmcs_writel/ {
-    in_vmcs_write = 1
-}
-
-/ASM_VMX_VMWRITE_RAX_RDX/ && in_vmcs_write {
-    printf("\tstart_special_insn();\n")
-}
-
-/if/ && in_vmcs_write {
-    printf("\tend_special_insn();\n");
-    in_vmcs_write = 0
-}
-
 /^static void vmx_load_host_state/ {
     vmx_load_host_state = 1
 }
@@ -74,15 +48,6 @@
     print "\tspecial_reload_dr7();"
 }
 
-/static void vcpu_put|static int __vcpu_run|static struct kvm_vcpu \*vmx_create_vcpu/ {
-    in_tricky_func = 1
-}
-
-/preempt_disable|get_cpu/ && in_tricky_func {
-    printf("\tin_special_section();\n");
-    in_tricky_func = 0
-}
-
 /unsigned long flags;/ && vmx_load_host_state {
     print "\tunsigned long gsbase;"
 }
@@ -90,4 +55,3 @@
 /local_irq_save/ && vmx_load_host_state {
     print "\t\tgsbase = vmcs_readl(HOST_GS_BASE);"
 }
-
diff --git a/kernel/preempt.c b/kernel/preempt.c
index 8bb0405..2582efa 100644
--- a/kernel/preempt.c
+++ b/kernel/preempt.c
@@ -6,8 +6,6 @@
 static DEFINE_SPINLOCK(pn_lock);
 static LIST_HEAD(pn_list);
-static DEFINE_PER_CPU(int, notifier_enabled);
-static DEFINE_PER_CPU(struct task_struct *, last_tsk);
 
 #define dprintk(fmt) do {					\
	if (0)							\
@@ -15,59 +13,105 @@ static DEFINE_PER_CPU(struct task_struct *, last_tsk);
		       current->pid, raw_smp_processor_id());	\
	} while (0)
 
-static void preempt_enable_notifiers(void)
+static void preempt_enable_sched_out_notifiers(void)
 {
-	int cpu = raw_smp_processor_id();
-
-	if (per_cpu(notifier_enabled, cpu))
-		return;
-
-	dprintk("\n");
-	per_cpu(notifier_enabled, cpu) = 1;
 	asm volatile ("mov %0, %%db0" : : "r"(schedule));
-	asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+	asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+	current->thread.debugreg7 = 0ul;
+#else
+	current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+	clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
+}
+
+static void preempt_enable_sched_in_notifiers(void * addr)
+{
+	asm volatile ("mov %0, %%db0" : : "r"(addr));
+	asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
+#ifdef CONFIG_X86_64
+	current->thread.debugreg0 = (unsigned long) addr;
+	current->thread.debugreg7 = 0x701ul;
+#else
+	current->thread.debugreg[0] = (unsigned long) addr;
+	current->thread.debugreg[7] = 0x701ul;
+#endif
+#ifdef TIF_DEBUG
+	set_tsk_thread_flag(current, TIF_DEBUG);
+#endif
 }
 
 void special_reload_dr7(void)
 {
-	asm volatile ("mov %0, %%db7" : : "r"(0x702ul));
+	asm volatile ("mov %0, %%db7" : : "r"(0x701ul));
 }
 EXPORT_SYMBOL_GPL(special_reload_dr7);
 
-static void preempt_disable_notifiers(void)
+static void __preempt_disable_notifiers(void)
 {
-	int cpu = raw_smp_processor_id();
-
-	if (!per_cpu(notifier_enabled, cpu))
-		return;
+	asm volatile ("mov %0, %%db7" : : "r"(0ul));
+}
 
-	dprintk("\n");
-	per_cpu(notifier_enabled, cpu) = 0;
-	asm volatile ("mov %0, %%db7" : : "r"(0x400ul));
+static void preempt_disable_notifiers(void)
+{
+	__preempt_disable_notifiers();
+#ifdef CONFIG_X86_64
+	current->thread.debugreg7 = 0ul;
+#else
+	current->thread.debugreg[7] = 0ul;
+#endif
+#ifdef TIF_DEBUG
+	clear_tsk_thread_flag(current, TIF_DEBUG);
+#endif
 }
 
-static void __attribute__((used)) preempt_notifier_trigger(void)
+static void fastcall __attribute__((used)) preempt_notifier_trigger(void *** ip)
 {
 	struct preempt_notifier *pn;
 	int cpu = raw_smp_processor_id();
 	int found = 0;
-	unsigned long flags;
 
 	dprintk(" - in\n");
 	//dump_stack();
-	spin_lock_irqsave(&pn_lock, flags);
+	spin_lock(&pn_lock);
 	list_for_each_entry(pn, &pn_list, link)
		if (pn->tsk == current) {
			found = 1;
			break;
		}
-	spin_unlock_irqrestore(&pn_lock, flags);
-	preempt_disable_notifiers();
+	spin_unlock(&pn_lock);
+
 	if (found) {
-		dprintk("sched_out\n");
-		pn->ops->sched_out(pn, NULL);
-		per_cpu(last_tsk, cpu) = NULL;
-	}
+		if ((void *) *ip != schedule) {
+			dprintk("sched_in\n");
+			preempt_enable_sched_out_notifiers();
+
+			preempt_disable();
+			local_irq_enable();
+			pn->ops->sched_in(pn, cpu);
+			local_irq_disable();
+			preempt_enable_no_resched();
+		} else {
+			void * sched_in_addr;
+			dprintk("sched_out\n");
+#ifdef CONFIG_X86_64
+			sched_in_addr = **(ip+3);
+#else
+			/* no special debug stack switch on x86 */
+			sched_in_addr = (void *) *(ip+3);
+#endif
+			preempt_enable_sched_in_notifiers(sched_in_addr);
+
+			preempt_disable();
+			local_irq_enable();
+			pn->ops->sched_out(pn, NULL);
+			local_irq_disable();
+			preempt_enable_no_resched();
+		}
+	} else
+		__preempt_disable_notifiers();
	dprintk(" - out\n");
 }
 
@@ -104,6 +148,11 @@ asm ("pn_int1_handler: \n\t"
     "pop " TMP " \n\t"
     "jz .Lnotme \n\t"
     SAVE_REGS "\n\t"
+#ifdef CONFIG_X86_64
+    "leaq 120(%rsp),%rdi\n\t"
+#else
+    "leal 32(%esp),%eax\n\t"
+#endif
     "call preempt_notifier_trigger \n\t"
     RESTORE_REGS "\n\t"
 #ifdef CONFIG_X86_64
@@ -121,75 +170,28 @@ asm ("pn_int1_handler: \n\t"
 #endif
     );
 
-void in_special_section(void)
-{
-	struct preempt_notifier *pn;
-	int cpu = raw_smp_processor_id();
-	int found = 0;
-	unsigned long flags;
-
-	if (per_cpu(last_tsk, cpu) == current)
-		return;
-
-	dprintk(" - in\n");
-	spin_lock_irqsave(&pn_lock, flags);
-	list_for_each_entry(pn, &pn_list, link)
-		if (pn->tsk == current) {
-			found = 1;
-			break;
-		}
-	spin_unlock_irqrestore(&pn_lock, flags);
-	if (found) {
-		dprintk("\n");
-		per_cpu(last_tsk, cpu) = current;
-		pn->ops->sched_in(pn, cpu);
-		preempt_enable_notifiers();
-	}
-	dprintk(" - out\n");
-}
-EXPORT_SYMBOL_GPL(in_special_section);
-
-void start_special_insn(void)
-{
-	preempt_disable();
-	in_special_section();
-}
-EXPORT_SYMBOL_GPL(start_special_insn);
-
-void end_special_insn(void)
-{
-	preempt_enable();
-}
-EXPORT_SYMBOL_GPL(end_special_insn);
-
 void preempt_notifier_register(struct preempt_notifier *notifier)
 {
-	int cpu = get_cpu();
 	unsigned long flags;
 
 	dprintk(" - in\n");
 	spin_lock_irqsave(&pn_lock, flags);
-	preempt_enable_notifiers();
+	preempt_enable_sched_out_notifiers();
 	notifier->tsk = current;
 	list_add(&notifier->link, &pn_list);
 	spin_unlock_irqrestore(&pn_lock, flags);
-	per_cpu(last_tsk, cpu) = current;
-	put_cpu();
 	dprintk(" - out\n");
 }
 
 void preempt_notifier_unregister(struct preempt_notifier *notifier)
 {
-	int cpu = get_cpu();
 	unsigned long flags;
 
 	dprintk(" - in\n");
 	spin_lock_irqsave(&pn_lock, flags);
 	list_del(&notifier->link);
 	spin_unlock_irqrestore(&pn_lock, flags);
-	per_cpu(last_tsk, cpu) = NULL;
 	preempt_disable_notifiers();
-	put_cpu();
 	dprintk(" - out\n");
 }
 
@@ -238,7 +240,16 @@ void preempt_notifier_sys_init(void)
 
 static void do_disable(void *blah)
 {
-	preempt_disable_notifiers();
+#ifdef TIF_DEBUG
+	if (!test_tsk_thread_flag(current, TIF_DEBUG))
+#else
+#ifdef CONFIG_X86_64
+	if (!current->thread.debugreg7)
+#else
+	if (!current->thread.debugreg[7])
+#endif
+#endif
+		__preempt_disable_notifiers();
 }
 
 void preempt_notifier_sys_exit(void)
[parent not found: <20071224162639.GH8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>]
* Re: external module sched_in event
  [not found] ` <20071224162639.GH8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
@ 2007-12-25  9:00 ` Avi Kivity
  0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2007-12-25 9:00 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Andrea Arcangeli wrote:
> On Sun, Dec 23, 2007 at 07:37:40PM +0200, Avi Kivity wrote:
>
>> The sched_in notifier needs to enable interrupts (but it must disable
>> preemption to avoid recursion).
>
> Ok this update fixes the smp_call_function deadlock.
>

I was able to boot a 4-way guest on a 2-way i386 host, so looks like
this works.  Thanks, it will reduce the maintenance burden needed for
the external module.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
end of thread, other threads: [~2007-12-25  9:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-21 17:40 external module sched_in event Andrea Arcangeli
[not found] ` <20071221174048.GB1292-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2007-12-21 17:52 ` Izik Eidus
[not found] ` <476BFD74.2040509-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-21 18:22 ` Andrea Arcangeli
[not found] ` <20071221182257.GG1292-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2007-12-21 18:50 ` mailman setup for kvm-devel (was Re: external module sched_in event) Carlo Marcelo Arenas Belon
2007-12-22 20:21 ` Avi Kivity
2007-12-22 20:24 ` external module sched_in event Avi Kivity
-- strict thread matches above, loose matches on Subject: below --
2007-12-20 16:23 Andrea Arcangeli
[not found] ` <20071220162353.GA3802-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2007-12-22 19:13 ` Avi Kivity
[not found] ` <476D61E8.5000102-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 16:49 ` Andrea Arcangeli
[not found] ` <20071223164932.GA8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2007-12-23 17:37 ` Avi Kivity
[not found] ` <476E9CE4.2060705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24 16:26 ` Andrea Arcangeli
[not found] ` <20071224162639.GH8483-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2007-12-25 9:00 ` Avi Kivity