* Another preempt folding issue?
@ 2014-02-11 18:34 Stefan Bader
2014-02-11 19:45 ` Peter Zijlstra
0 siblings, 1 reply; 34+ messages in thread
From: Stefan Bader @ 2014-02-11 18:34 UTC (permalink / raw)
To: Peter Zijlstra, Paolo Bonzini; +Cc: Linux Kernel Mailing List, kvm
[-- Attachment #1: Type: text/plain, Size: 2624 bytes --]
Hi Peter,
I am currently looking at a weird issue that manifest itself when trying to run
kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important the
cpu is 64bit capable, so qemu-system-x86_64 is called). Sooner or later this
causes softlockup messages on the host. I tracked this down to __vcpu_run in
arch/x86/kvm/x86.c which does a loop which in that case never seems to make
progress or exit.
What I found is that vcpu_enter_guest will exit quickly without causing the loop
to exit when need_resched() is true. Looking at a crash dump I took, this was
the case (thread_info->flags had TIF_NEED_RESCHED set). So after immediately
returning __vcpu_run has the following code:
if (need_resched()) {
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
kvm_resched(vcpu); // now cond_resched();
vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
}
The kvm_resched basically would end up doing a cond_resched() which now checks
preempt_count() to be 0. If that is zero it will do the reschedule, otherwise it
just does nothing. Looking at the percpu variables in the dump, I saw that
the preempt_count was 0x8000000 (actually it was 0x80110000 but that was me
triggering the kexec crashdump with sysrq-c).
I saw that there have been some changes in the upstream kernel and have picked
the following:
1) x86, acpi, idle: Restructure the mwait idle routines
2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
4) sched/preempt/x86: Fix voluntary preempt for x86
Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and to
the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do for
3.13.y stable].
Still, with all those I got the softlockup. Since I knew from the dump info that
something is wrong with the folding, I made the pragmatic approach and added the
following:
if (need_resched()) {
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+ preempt_fold_need_resched();
kvm_resched(vcpu); // now cond_resched();
vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
}
And this lets the kvm guest run without the softlockups! However I am less than
convinced that this is the right thing to do. Somehow something done when
converting the preempt_count into percpu has caused at least the i386 side to
get into this mess (as there has not been any whining about 64bit). Just fail to
see what.
-Stefan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: Another preempt folding issue? 2014-02-11 18:34 Another preempt folding issue? Stefan Bader @ 2014-02-11 19:45 ` Peter Zijlstra 2014-02-12 8:20 ` Stefan Bader 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-11 19:45 UTC (permalink / raw) To: Stefan Bader Cc: Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel, Borislav Petkov On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote: > Hi Peter, > > I am currently looking at a weird issue that manifest itself when trying to run > kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important the > cpu is 64bit capable, so qemu-system-x86_64 is called). AMD or Intel machine? > Sooner or later this > causes softlockup messages on the host. I tracked this down to __vcpu_run in > arch/x86/kvm/x86.c which does a loop which in that case never seems to make > progress or exit. > > What I found is that vcpu_enter_guest will exit quickly without causing the loop > to exit when need_resched() is true. Looking at a crash dump I took, this was > the case (thread_info->flags had TIF_NEED_RESCHED set). So after immediately > returning __vcpu_run has the following code: > > if (need_resched()) { > srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > kvm_resched(vcpu); // now cond_resched(); > vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > } > > The kvm_resched basically would end up doing a cond_resched() which now checks > preempt_count() to be 0. If that is zero it will do the reschedule, otherwise it > just does nothing. Looking at the percpu variables in the dump, I saw that > the preempt_count was 0x8000000 (actually it was 0x80110000 but that was me > triggering the kexec crashdump with sysrq-c). > > I saw that there have been some changes in the upstream kernel and have picked > the following: > 1) x86, acpi, idle: Restructure the mwait idle routines > 2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers > 3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding > 4) sched/preempt/x86: Fix voluntary preempt for x86 > > Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and to > the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do for > 3.13.y stable]. > > Still, with all those I got the softlockup. Since I knew from the dump info that > something is wrong with the folding, I made the pragmatic approach and added the > following: > > if (need_resched()) { > srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > + preempt_fold_need_resched(); > kvm_resched(vcpu); // now cond_resched(); > vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > } > > And this lets the kvm guest run without the softlockups! However I am less than > convinced that this is the right thing to do. Somehow something done when > converting the preempt_count into percpu has caused at least the i386 side to > get into this mess (as there has not been any whining about 64bit). Just fail to > see what. I've been looking at the same thing too; I've got a trace from someone who can reproduce and its just not making sense. Looks like the hardware is loosing an interrupt (or worse). With the below patch on top of current -git (might be whitespace damaged due to copy-paste). --- arch/x86/kvm/x86.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 39c28f09dfd5..f8b2fc7ce491 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6115,7 +6115,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) } if (need_resched()) { srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); - cond_resched(); + if (need_resched() != should_resched()) { + tracing_stop(); + printk(KERN_ERR "Stopped tracing, due to inconsistent state.\n"); + } + schedule(); vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); } } --- I got the following from an Athlon X2 (fam 0xf): On Mon, Feb 10, 2014 at 05:23:15PM -0200, Marcelo Tosatti wrote: > What is the problem exactly? >From the trace: qemu-system-x86-2455 [001] d.s3 393.630586: smp_reschedule_interrupt <-reschedule_interrupt qemu-system-x86-2455 [001] d.s3 393.630586: scheduler_ipi <-smp_reschedule_interrupt qemu-system-x86-2455 [001] ..s3 393.630586: preempt_count_sub <-_raw_spin_unlock_irqrestore qemu-system-x86-2455 [001] ..s2 393.630586: _raw_spin_lock_irq <-run_timer_softirq qemu-system-x86-2455 [001] d.s2 393.630586: preempt_count_add <-_raw_spin_lock_irq qemu-system-x86-2455 [001] d.s3 393.630586: _raw_spin_unlock_irq <-run_timer_softirq qemu-system-x86-2455 [001] ..s3 393.630586: preempt_count_sub <-_raw_spin_unlock_irq qemu-system-x86-2455 [001] ..s2 393.630586: rcu_bh_qs <-__do_softirq qemu-system-x86-2455 [001] d.s2 393.630586: irqtime_account_irq <-__do_softirq qemu-system-x86-2455 [001] d.s2 393.630586: __local_bh_enable <-__do_softirq qemu-system-x86-2455 [001] d.s2 393.630586: preempt_count_sub <-__local_bh_enable dmesg-2458 [000] d.s5 393.630614: resched_task <-check_preempt_wakeup dmesg-2458 [000] d.s5 393.630614: native_smp_send_reschedule <-resched_task dmesg-2458 [000] d.s5 393.630614: default_send_IPI_mask_logical <-native_smp_send_reschedule qemu-system-x86-2455 [001] .n.. 393.630636: preempt_count_add <-kmap_atomic_prot qemu-system-x86-2455 [001] .n.1 393.630639: __kunmap_atomic <-clear_huge_page qemu-system-x86-2455 [001] .n.1 393.630639: preempt_count_sub <-__kunmap_atomic qemu-system-x86-2455 [001] .n.. 393.630639: _cond_resched <-clear_huge_page qemu-system-x86-2455 [001] .n.. 393.630640: kmap_atomic <-clear_huge_page qemu-system-x86-2455 [001] .n.. 393.630640: kmap_atomic_prot <-kmap_atomic qemu-system-x86-2455 [001] .n.. 393.630640: preempt_count_add <-kmap_atomic_prot The resched_task() in the middle does: set_tsk_need_resched(); smp_mb(); smp_send_reschedule(); This means that the remote cpu must observe TIF_NEED_RESCHED (.n.. in the traces) when the IPI (smp_reschedule_interrupt) hits. Now given the machine has unsynchronized TSC the order in the above is not necessarily the true order, but there's only two possible scenarios: 1) its the right order; the smp_reschedule_interrupt() is from a previous resched_task()/rcu kick or any of the other users that sends it. In which case we should get another smp_reschedule_interrupt() _after_ the resched_task() which should observe the 'n' and promote it to an 'N'. This doesn't happen and we're stuck with the 'n' state.. 2) its the wrong order and the smp_reschedule_interrupt() actually happened after the resched_task(), but in that case we should have observed the 'n', we did not. Now given that 2) would mean a bug in the coherency fabric and 1) would mean a lost interrupt, 1 seems like the more likely suspect. Joerg wanted the extra kvm tracepoints to see what the vm-exit reason is. ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-11 19:45 ` Peter Zijlstra @ 2014-02-12 8:20 ` Stefan Bader 2014-02-12 10:37 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-12 8:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel, Borislav Petkov [-- Attachment #1: Type: text/plain, Size: 8150 bytes --] On 11.02.2014 20:45, Peter Zijlstra wrote: > On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote: >> Hi Peter, >> >> I am currently looking at a weird issue that manifest itself when trying to run >> kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important the >> cpu is 64bit capable, so qemu-system-x86_64 is called). > > AMD or Intel machine? Personally I am reproducing this on an AMD: cpu family : 15 model : 72 model name : AMD Turion(tm) 64 X2 Mobile Technology TL-64 but the reports I got are showing the same issue on an Intel i7 cpu family : 6 model : 37 model name : Intel(R) Core(TM) i7 CPU M 620 @ 2.67GH Another reporter also saw this on an AMD and said it could not be reproduced on the same hardware and the same software versions when using 64bit instead of 32. In my case on a 32bit installation I will see this on every invocation of qemu-system-x86_64 -cdrom <32bit live iso> -boot d -m 512 > >> Sooner or later this >> causes softlockup messages on the host. I tracked this down to __vcpu_run in >> arch/x86/kvm/x86.c which does a loop which in that case never seems to make >> progress or exit. >> >> What I found is that vcpu_enter_guest will exit quickly without causing the loop >> to exit when need_resched() is true. Looking at a crash dump I took, this was >> the case (thread_info->flags had TIF_NEED_RESCHED set). So after immediately >> returning __vcpu_run has the following code: >> >> if (need_resched()) { >> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); >> kvm_resched(vcpu); // now cond_resched(); >> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); >> } >> >> The kvm_resched basically would end up doing a cond_resched() which now checks >> preempt_count() to be 0. If that is zero it will do the reschedule, otherwise it >> just does nothing. Looking at the percpu variables in the dump, I saw that >> the preempt_count was 0x8000000 (actually it was 0x80110000 but that was me >> triggering the kexec crashdump with sysrq-c). >> >> I saw that there have been some changes in the upstream kernel and have picked >> the following: >> 1) x86, acpi, idle: Restructure the mwait idle routines >> 2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers >> 3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding >> 4) sched/preempt/x86: Fix voluntary preempt for x86 >> >> Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and to >> the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do for >> 3.13.y stable]. >> >> Still, with all those I got the softlockup. Since I knew from the dump info that >> something is wrong with the folding, I made the pragmatic approach and added the >> following: >> >> if (need_resched()) { >> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); >> + preempt_fold_need_resched(); >> kvm_resched(vcpu); // now cond_resched(); >> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); >> } >> >> And this lets the kvm guest run without the softlockups! However I am less than >> convinced that this is the right thing to do. Somehow something done when >> converting the preempt_count into percpu has caused at least the i386 side to >> get into this mess (as there has not been any whining about 64bit). Just fail to >> see what. > > I've been looking at the same thing too; I've got a trace from someone > who can reproduce and its just not making sense. > > Looks like the hardware is loosing an interrupt (or worse). > > With the below patch on top of current -git (might be whitespace damaged > due to copy-paste). > > --- > arch/x86/kvm/x86.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 39c28f09dfd5..f8b2fc7ce491 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -6115,7 +6115,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) > } > if (need_resched()) { > srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > - cond_resched(); > + if (need_resched() != should_resched()) { > + tracing_stop(); > + printk(KERN_ERR "Stopped tracing, due to inconsistent state.\n"); > + } > + schedule(); > vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > } > } > > --- > > I got the following from an Athlon X2 (fam 0xf): > > On Mon, Feb 10, 2014 at 05:23:15PM -0200, Marcelo Tosatti wrote: >> What is the problem exactly? > > From the trace: > > qemu-system-x86-2455 [001] d.s3 393.630586: smp_reschedule_interrupt <-reschedule_interrupt > qemu-system-x86-2455 [001] d.s3 393.630586: scheduler_ipi <-smp_reschedule_interrupt > qemu-system-x86-2455 [001] ..s3 393.630586: preempt_count_sub <-_raw_spin_unlock_irqrestore > qemu-system-x86-2455 [001] ..s2 393.630586: _raw_spin_lock_irq <-run_timer_softirq > qemu-system-x86-2455 [001] d.s2 393.630586: preempt_count_add <-_raw_spin_lock_irq > qemu-system-x86-2455 [001] d.s3 393.630586: _raw_spin_unlock_irq <-run_timer_softirq > qemu-system-x86-2455 [001] ..s3 393.630586: preempt_count_sub <-_raw_spin_unlock_irq > qemu-system-x86-2455 [001] ..s2 393.630586: rcu_bh_qs <-__do_softirq > qemu-system-x86-2455 [001] d.s2 393.630586: irqtime_account_irq <-__do_softirq > qemu-system-x86-2455 [001] d.s2 393.630586: __local_bh_enable <-__do_softirq > qemu-system-x86-2455 [001] d.s2 393.630586: preempt_count_sub <-__local_bh_enable > > > dmesg-2458 [000] d.s5 393.630614: resched_task <-check_preempt_wakeup > dmesg-2458 [000] d.s5 393.630614: native_smp_send_reschedule <-resched_task > dmesg-2458 [000] d.s5 393.630614: default_send_IPI_mask_logical <-native_smp_send_reschedule > > > qemu-system-x86-2455 [001] .n.. 393.630636: preempt_count_add <-kmap_atomic_prot > qemu-system-x86-2455 [001] .n.1 393.630639: __kunmap_atomic <-clear_huge_page > qemu-system-x86-2455 [001] .n.1 393.630639: preempt_count_sub <-__kunmap_atomic > qemu-system-x86-2455 [001] .n.. 393.630639: _cond_resched <-clear_huge_page > qemu-system-x86-2455 [001] .n.. 393.630640: kmap_atomic <-clear_huge_page > qemu-system-x86-2455 [001] .n.. 393.630640: kmap_atomic_prot <-kmap_atomic > qemu-system-x86-2455 [001] .n.. 393.630640: preempt_count_add <-kmap_atomic_prot > > > The resched_task() in the middle does: > > set_tsk_need_resched(); > smp_mb(); > smp_send_reschedule(); > > This means that the remote cpu must observe TIF_NEED_RESCHED (.n.. in > the traces) when the IPI (smp_reschedule_interrupt) hits. > > Now given the machine has unsynchronized TSC the order in the above is > not necessarily the true order, but there's only two possible scenarios: > > 1) its the right order; the smp_reschedule_interrupt() is from a > previous resched_task()/rcu kick or any of the other users that sends > it. > > In which case we should get another smp_reschedule_interrupt() _after_ > the resched_task() which should observe the 'n' and promote it to an > 'N'. > > This doesn't happen and we're stuck with the 'n' state.. > > 2) its the wrong order and the smp_reschedule_interrupt() actually > happened after the resched_task(), but in that case we should have > observed the 'n', we did not. > > Now given that 2) would mean a bug in the coherency fabric and 1) would > mean a lost interrupt, 1 seems like the more likely suspect. > > Joerg wanted the extra kvm tracepoints to see what the vm-exit reason > is. > Let me think about this after I had another cup of coffee and my brain decides to follow my body in the process of waking up... -Stefan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-12 8:20 ` Stefan Bader @ 2014-02-12 10:37 ` Peter Zijlstra 2014-02-12 10:40 ` Borislav Petkov 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-12 10:37 UTC (permalink / raw) To: Stefan Bader Cc: Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel, Borislav Petkov On Wed, Feb 12, 2014 at 09:20:24AM +0100, Stefan Bader wrote: > On 11.02.2014 20:45, Peter Zijlstra wrote: > > On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote: > >> Hi Peter, > >> > >> I am currently looking at a weird issue that manifest itself when trying to run > >> kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important the > >> cpu is 64bit capable, so qemu-system-x86_64 is called). > > > > AMD or Intel machine? > > Personally I am reproducing this on an AMD: > > cpu family : 15 > model : 72 > model name : AMD Turion(tm) 64 X2 Mobile Technology TL-64 > > but the reports I got are showing the same issue on an Intel i7 > > cpu family : 6 > model : 37 > model name : Intel(R) Core(TM) i7 CPU M 620 @ 2.67GH > > Another reporter also saw this on an AMD and said it could not be reproduced on > the same hardware and the same software versions when using 64bit instead of 32. > > In my case on a 32bit installation I will see this on every invocation of Curious.. so its both AMD and Intel. That makes funny hardware less likely... but how can the vmexit loose an interrupt like this. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-12 10:37 ` Peter Zijlstra @ 2014-02-12 10:40 ` Borislav Petkov 2014-02-12 11:09 ` Stefan Bader 2014-02-12 11:12 ` Joerg Roedel 0 siblings, 2 replies; 34+ messages in thread From: Borislav Petkov @ 2014-02-12 10:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Stefan Bader, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Wed, Feb 12, 2014 at 11:37:13AM +0100, Peter Zijlstra wrote: > > Another reporter also saw this on an AMD and said it could not be reproduced on > > the same hardware and the same software versions when using 64bit instead of 32. > > > > In my case on a 32bit installation I will see this on every invocation of > > Curious.. so its both AMD and Intel. > > That makes funny hardware less likely... but how can the vmexit loose an > interrupt like this. Also what I'm wondering about and what's not clear from Stefan's reply is whether this is purely a 32-bit issue, i.e. a 32-bit host running a 64-bit qemu running a 32-bit iso or what is it? Or do we have reports for both 32-bit and 64-bit hosts? Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-12 10:40 ` Borislav Petkov @ 2014-02-12 11:09 ` Stefan Bader 2014-02-12 11:54 ` Peter Zijlstra 2014-02-12 11:12 ` Joerg Roedel 1 sibling, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-12 11:09 UTC (permalink / raw) To: Borislav Petkov, Peter Zijlstra Cc: Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 1481 bytes --] On 12.02.2014 11:40, Borislav Petkov wrote: > On Wed, Feb 12, 2014 at 11:37:13AM +0100, Peter Zijlstra wrote: >>> Another reporter also saw this on an AMD and said it could not be reproduced on >>> the same hardware and the same software versions when using 64bit instead of 32. >>> >>> In my case on a 32bit installation I will see this on every invocation of >> >> Curious.. so its both AMD and Intel. >> >> That makes funny hardware less likely... but how can the vmexit loose an >> interrupt like this. > > Also what I'm wondering about and what's not clear from Stefan's reply > is whether this is purely a 32-bit issue, i.e. a 32-bit host running a > 64-bit qemu running a 32-bit iso or what is it? > > Or do we have reports for both 32-bit and 64-bit hosts? Only for 32bit hosts. I have not tested a 64bit kernel plus 32bit userspace, though. But 64bit kernel and 64bit user-space seemed ok. And one detail I should add is that while the reproduction is possible in all attempts, the timing can vary. In some of the cases this was even before running a Linux kernel in the guest. Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS limited to 8 (for the 32bit kernel). So the default apic driver is used. Since default_send_IPI_mask_logical is only used from there, I assume the trace you got does the same. Maybe something there is wrong which would explain why we only see it on 32bit hosts. > > Thanks. > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-12 11:09 ` Stefan Bader @ 2014-02-12 11:54 ` Peter Zijlstra 2014-02-13 17:00 ` Stefan Bader 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-12 11:54 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: > Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS > limited to 8 (for the 32bit kernel). So the default apic driver is used. Since > default_send_IPI_mask_logical is only used from there, I assume the trace you > got does the same. Maybe something there is wrong which would explain why we > only see it on 32bit hosts. Can you try with a different APIC driver to test this? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-12 11:54 ` Peter Zijlstra @ 2014-02-13 17:00 ` Stefan Bader 2014-02-13 17:38 ` Peter Zijlstra 2014-02-13 18:25 ` Another preempt folding issue? Peter Zijlstra 0 siblings, 2 replies; 34+ messages in thread From: Stefan Bader @ 2014-02-13 17:00 UTC (permalink / raw) To: Peter Zijlstra Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 3277 bytes --] On 12.02.2014 12:54, Peter Zijlstra wrote: > On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: >> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS >> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since >> default_send_IPI_mask_logical is only used from there, I assume the trace you >> got does the same. Maybe something there is wrong which would explain why we >> only see it on 32bit hosts. > > Can you try with a different APIC driver to test this? > I don't think I can. And I think the statement about this only be used for 32bit could be wrong. I got mislead to think so because those are only defined in probe_32 but the 64bit counterpart isn't containing much aside that. Anyway, I played around with tracing a bit more. So with this change: if (need_resched()) { srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); if (need_resched() != should_resched()) { + trace_printk("need(%i) != should(%i)\n", + need_resched(), should_resched()); + trace_printk("exit_reason=%u\n", + vcpu->run->exit_reason); + trace_printk("preempt_count=%lx\n", + __this_cpu_read_4(__preempt_count)); + tracing_stop(); + printk(KERN_ERR "Stopped tracing, due to inconsistent state.\n"); } + schedule(); - cond_reschedule(); vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); } I get the following (weird) output: Xorg-1078 [001] d... 71.270251: native_smp_send_reschedule <-resched_task Xorg-1078 [001] d... 71.270251: default_send_IPI_mask_logical <-native_smp_send_reschedule bamfdaemon-2318 [001] d... 71.270465: resched_task <-check_preempt_wakeup bamfdaemon-2318 [001] d... 71.270539: resched_task <-check_preempt_wakeup compiz-2365 [001] d... 71.270689: resched_task <-check_preempt_wakeup compiz-2365 [001] d... 71.270827: resched_task <-check_preempt_wakeup compiz-2365 [001] d... 71.270940: resched_task <-check_preempt_wakeup qemu-system-x86-2679 [000] dn.. 71.270999: smp_reschedule_interrupt <-reschedule_interrupt qemu-system-x86-2679 [000] dn.. 71.270999: scheduler_ipi <-smp_reschedule_interrupt qemu-system-x86-2679 [000] .N.. 71.271001: kvm_arch_vcpu_ioctl_run: need(1) != should(0) qemu-system-x86-2679 [000] .N.. 71.271002: kvm_arch_vcpu_ioctl_run: exit_reason=2 qemu-system-x86-2679 [000] .N.. 71.271003: kvm_arch_vcpu_ioctl_run: preempt_count=0 So am I reading this right, that the interrupt did get delivered to cpu#0 while the thread info already had the resched flag set. So this really should have cleared the bit in preempt_count. But while the trace info shows 'N' for some reason should_reschedule returns false but at the same time reading the preempt count manually shows it 0? [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 17:00 ` Stefan Bader @ 2014-02-13 17:38 ` Peter Zijlstra 2014-02-13 18:03 ` Stefan Bader 2014-02-13 18:25 ` Another preempt folding issue? Peter Zijlstra 1 sibling, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-13 17:38 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote: > On 12.02.2014 12:54, Peter Zijlstra wrote: > > On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: > >> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS > >> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since > >> default_send_IPI_mask_logical is only used from there, I assume the trace you > >> got does the same. Maybe something there is wrong which would explain why we > >> only see it on 32bit hosts. > > > > Can you try with a different APIC driver to test this? > > > I don't think I can. And I think the statement about this only be used for 32bit > could be wrong. I got mislead to think so because those are only defined in > probe_32 but the 64bit counterpart isn't containing much aside that. > > Anyway, I played around with tracing a bit more. So with this change: > > if (need_resched()) { > srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > if (need_resched() != should_resched()) { > + trace_printk("need(%i) != should(%i)\n", > + need_resched(), should_resched()); > + trace_printk("exit_reason=%u\n", > + vcpu->run->exit_reason); > + trace_printk("preempt_count=%lx\n", > + __this_cpu_read_4(__preempt_count)); > + tracing_stop(); > + printk(KERN_ERR "Stopped tracing, due to > inconsistent state.\n"); > } > + schedule(); > - cond_reschedule(); > vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > } > > I get the following (weird) output: > > Xorg-1078 [001] d... 71.270251: native_smp_send_reschedule > <-resched_task > Xorg-1078 [001] d... 71.270251: default_send_IPI_mask_logical > <-native_smp_send_reschedule > bamfdaemon-2318 [001] d... 71.270465: resched_task <-check_preempt_wakeup > bamfdaemon-2318 [001] d... 71.270539: resched_task <-check_preempt_wakeup > compiz-2365 [001] d... 71.270689: resched_task <-check_preempt_wakeup > compiz-2365 [001] d... 71.270827: resched_task <-check_preempt_wakeup > compiz-2365 [001] d... 71.270940: resched_task <-check_preempt_wakeup > qemu-system-x86-2679 [000] dn.. 71.270999: smp_reschedule_interrupt > <-reschedule_interrupt > qemu-system-x86-2679 [000] dn.. 71.270999: scheduler_ipi > <-smp_reschedule_interrupt > qemu-system-x86-2679 [000] .N.. 71.271001: kvm_arch_vcpu_ioctl_run: need(1) > != should(0) > qemu-system-x86-2679 [000] .N.. 71.271002: kvm_arch_vcpu_ioctl_run: > exit_reason=2 > qemu-system-x86-2679 [000] .N.. 71.271003: kvm_arch_vcpu_ioctl_run: > preempt_count=0 > > So am I reading this right, that the interrupt did get delivered to cpu#0 while > the thread info already had the resched flag set. So this really should have > cleared the bit in preempt_count. But while the trace info shows 'N' for some > reason should_reschedule returns false but at the same time reading the preempt > count manually shows it 0? *blink*... That's weird indeed... do you have the asm that goes along with that? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 17:38 ` Peter Zijlstra @ 2014-02-13 18:03 ` Stefan Bader 2014-02-13 18:26 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-13 18:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1.1: Type: text/plain, Size: 3706 bytes --] On 13.02.2014 18:38, Peter Zijlstra wrote: > On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote: >> On 12.02.2014 12:54, Peter Zijlstra wrote: >>> On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: >>>> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS >>>> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since >>>> default_send_IPI_mask_logical is only used from there, I assume the trace you >>>> got does the same. Maybe something there is wrong which would explain why we >>>> only see it on 32bit hosts. >>> >>> Can you try with a different APIC driver to test this? >>> >> I don't think I can. And I think the statement about this only be used for 32bit >> could be wrong. I got mislead to think so because those are only defined in >> probe_32 but the 64bit counterpart isn't containing much aside that. >> >> Anyway, I played around with tracing a bit more. So with this change: >> >> if (need_resched()) { >> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); >> if (need_resched() != should_resched()) { >> + trace_printk("need(%i) != should(%i)\n", >> + need_resched(), should_resched()); >> + trace_printk("exit_reason=%u\n", >> + vcpu->run->exit_reason); >> + trace_printk("preempt_count=%lx\n", >> + __this_cpu_read_4(__preempt_count)); >> + tracing_stop(); >> + printk(KERN_ERR "Stopped tracing, due to >> inconsistent state.\n"); >> } >> + schedule(); >> - cond_reschedule(); >> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); >> } >> >> I get the following (weird) output: >> >> Xorg-1078 [001] d... 71.270251: native_smp_send_reschedule >> <-resched_task >> Xorg-1078 [001] d... 71.270251: default_send_IPI_mask_logical >> <-native_smp_send_reschedule >> bamfdaemon-2318 [001] d... 71.270465: resched_task <-check_preempt_wakeup >> bamfdaemon-2318 [001] d... 71.270539: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270689: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270827: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270940: resched_task <-check_preempt_wakeup >> qemu-system-x86-2679 [000] dn.. 71.270999: smp_reschedule_interrupt >> <-reschedule_interrupt >> qemu-system-x86-2679 [000] dn.. 71.270999: scheduler_ipi >> <-smp_reschedule_interrupt >> qemu-system-x86-2679 [000] .N.. 71.271001: kvm_arch_vcpu_ioctl_run: need(1) >> != should(0) >> qemu-system-x86-2679 [000] .N.. 71.271002: kvm_arch_vcpu_ioctl_run: >> exit_reason=2 >> qemu-system-x86-2679 [000] .N.. 71.271003: kvm_arch_vcpu_ioctl_run: >> preempt_count=0 >> >> So am I reading this right, that the interrupt did get delivered to cpu#0 while >> the thread info already had the resched flag set. So this really should have >> cleared the bit in preempt_count. But while the trace info shows 'N' for some >> reason should_reschedule returns false but at the same time reading the preempt >> count manually shows it 0? > > *blink*... That's weird indeed... do you have the asm that goes along > with that? > Yeah... not sure the interleaved source helps or not ... [-- Attachment #1.2: kvm-x86.disass.gz --] [-- Type: application/x-gzip, Size: 259965 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 18:03 ` Stefan Bader @ 2014-02-13 18:26 ` Peter Zijlstra 2014-02-14 13:34 ` Borislav Petkov 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-13 18:26 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote: > Yeah... not sure the interleaved source helps or not ... It did, thanks! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 18:26 ` Peter Zijlstra @ 2014-02-14 13:34 ` Borislav Petkov 2014-02-14 13:40 ` Stefan Bader 2014-02-14 14:24 ` Stefan Bader 0 siblings, 2 replies; 34+ messages in thread From: Borislav Petkov @ 2014-02-14 13:34 UTC (permalink / raw) To: Stefan Bader Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Thu, Feb 13, 2014 at 07:26:05PM +0100, Peter Zijlstra wrote: > On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote: > > Yeah... not sure the interleaved source helps or not ... > > It did, thanks! Stefan, can you also send sched/core.s? I'm particularly interested in what resched_task() has been turned into by your build env. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 13:34 ` Borislav Petkov @ 2014-02-14 13:40 ` Stefan Bader 2014-02-14 14:24 ` Stefan Bader 1 sibling, 0 replies; 34+ messages in thread From: Stefan Bader @ 2014-02-14 13:40 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1.1: Type: text/plain, Size: 443 bytes --] On 14.02.2014 14:34, Borislav Petkov wrote: > On Thu, Feb 13, 2014 at 07:26:05PM +0100, Peter Zijlstra wrote: >> On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote: >>> Yeah... not sure the interleaved source helps or not ... >> >> It did, thanks! > > Stefan, can you also send sched/core.s? I'm particularly interested in > what resched_task() has been turned into by your build env. > > Thanks. > Sure. Here it is. [-- Attachment #1.2: sched-core.disass.gz --] [-- Type: application/x-gzip, Size: 198537 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 13:34 ` Borislav Petkov 2014-02-14 13:40 ` Stefan Bader @ 2014-02-14 14:24 ` Stefan Bader 2014-02-14 14:47 ` Borislav Petkov 2014-02-14 15:21 ` Another preempt folding issue? (maybe bisect) Borislav Petkov 1 sibling, 2 replies; 34+ messages in thread From: Stefan Bader @ 2014-02-14 14:24 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1.1: Type: text/plain, Size: 519 bytes --] On 14.02.2014 14:34, Borislav Petkov wrote: > On Thu, Feb 13, 2014 at 07:26:05PM +0100, Peter Zijlstra wrote: >> On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote: >>> Yeah... not sure the interleaved source helps or not ... >> >> It did, thanks! > > Stefan, can you also send sched/core.s? I'm particularly interested in > what resched_task() has been turned into by your build env. > > Thanks. > Actually, this code just makes so much more sense if I let objdump do relocation info... [-- Attachment #1.2: sched-core.disass.gz --] [-- Type: application/x-gzip, Size: 210509 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 14:24 ` Stefan Bader @ 2014-02-14 14:47 ` Borislav Petkov 2014-02-14 17:02 ` Stefan Bader 2014-02-14 15:21 ` Another preempt folding issue? (maybe bisect) Borislav Petkov 1 sibling, 1 reply; 34+ messages in thread From: Borislav Petkov @ 2014-02-14 14:47 UTC (permalink / raw) To: Stefan Bader Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 03:24:09PM +0100, Stefan Bader wrote: > Actually, this code just makes so much more sense if I let objdump do > relocation info... Ok, we're pretty sure you have an MFENCE there in resched_task but can you confirm it please. First, does /proc/cpuinfo have the "sse2" string? It should but nowadays I don't trust anything. Then, can you boot that kernel in qemu with the -gdb flag so that it starts the gdb stub, here's the manpage about it: -gdb dev Wait for gdb connection on device dev. Typical connections will likely be TCP-based, but also UDP, pseudo TTY, or even stdio are reasonable use case. The latter is allowing to start QEMU from within gdb and establish the connection via a pipe: (gdb) target remote | exec qemu-system-i386 -gdb stdio ... -s Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 1234. then boot the guest and when it is up, do $ gdb ./vmlinux $ target remote localhost:1234 and type in the prompt $ (gdb) x/50i resched_task It gives here: (gdb) x/50i resched_task 0xffffffff810836f0 <resched_task>: data32 data32 data32 xchg %ax,%ax 0xffffffff810836f5 <resched_task+5>: push %rbp 0xffffffff810836f6 <resched_task+6>: mov 0x85e123(%rip),%r10d # 0xffffffff818e1820 <debug_locks> 0xffffffff810836fd <resched_task+13>: mov %rsp,%rbp 0xffffffff81083700 <resched_task+16>: push %r12 0xffffffff81083702 <resched_task+18>: test %r10d,%r10d 0xffffffff81083705 <resched_task+21>: push %rbx 0xffffffff81083706 <resched_task+22>: mov %rdi,%rbx 0xffffffff81083709 <resched_task+25>: jne 0xffffffff81083760 <resched_task+112> 0xffffffff8108370b <resched_task+27>: mov 0x8(%rbx),%rax 0xffffffff8108370f <resched_task+31>: mov 0x10(%rax),%rdx 0xffffffff81083713 <resched_task+35>: and $0x8,%edx 0xffffffff81083716 <resched_task+38>: jne 0xffffffff8108373c <resched_task+76> 0xffffffff81083718 <resched_task+40>: lock orb $0x8,0x10(%rax) 0xffffffff8108371d <resched_task+45>: mov 0x8(%rbx),%rax 0xffffffff81083721 <resched_task+49>: mov 0x18(%rax),%r12d 0xffffffff81083725 <resched_task+53>: callq 0xffffffff812d8fc0 <debug_smp_processor_id> 0xffffffff8108372a <resched_task+58>: cmp %r12d,%eax 0xffffffff8108372d <resched_task+61>: je 0xffffffff810837a0 <resched_task+176> 0xffffffff8108372f <resched_task+63>: mfence ^^^^^^ I want to make sure the smp_mb() is really replaced with an MFENCE there. Thanks! -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 14:47 ` Borislav Petkov @ 2014-02-14 17:02 ` Stefan Bader 2014-02-14 17:21 ` Peter Zijlstra 2014-02-14 17:33 ` Borislav Petkov 0 siblings, 2 replies; 34+ messages in thread From: Stefan Bader @ 2014-02-14 17:02 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 4815 bytes --] On 14.02.2014 15:47, Borislav Petkov wrote: > On Fri, Feb 14, 2014 at 03:24:09PM +0100, Stefan Bader wrote: >> Actually, this code just makes so much more sense if I let objdump do >> relocation info... > > Ok, we're pretty sure you have an MFENCE there in resched_task but can > you confirm it please. > > First, does /proc/cpuinfo have the "sse2" string? It should but nowadays > I don't trust anything. > > Then, can you boot that kernel in qemu with the -gdb flag so that it > starts the gdb stub, here's the manpage about it: > > -gdb dev > Wait for gdb connection on device dev. Typical connections will likely be > TCP-based, but also UDP, pseudo TTY, or even stdio are reasonable use case. > The latter is allowing to start QEMU from within gdb and establish the > connection via a pipe: > > (gdb) target remote | exec qemu-system-i386 -gdb stdio ... > > -s Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 1234. > > then boot the guest and when it is up, do > > $ gdb ./vmlinux > $ target remote localhost:1234 > > and type in the prompt > > $ (gdb) x/50i resched_task > > It gives here: > > (gdb) x/50i resched_task > 0xffffffff810836f0 <resched_task>: data32 data32 data32 xchg %ax,%ax > 0xffffffff810836f5 <resched_task+5>: push %rbp > 0xffffffff810836f6 <resched_task+6>: mov 0x85e123(%rip),%r10d # 0xffffffff818e1820 <debug_locks> > 0xffffffff810836fd <resched_task+13>: mov %rsp,%rbp > 0xffffffff81083700 <resched_task+16>: push %r12 > 0xffffffff81083702 <resched_task+18>: test %r10d,%r10d > 0xffffffff81083705 <resched_task+21>: push %rbx > 0xffffffff81083706 <resched_task+22>: mov %rdi,%rbx > 0xffffffff81083709 <resched_task+25>: jne 0xffffffff81083760 <resched_task+112> > 0xffffffff8108370b <resched_task+27>: mov 0x8(%rbx),%rax > 0xffffffff8108370f <resched_task+31>: mov 0x10(%rax),%rdx > 0xffffffff81083713 <resched_task+35>: and $0x8,%edx > 0xffffffff81083716 <resched_task+38>: jne 0xffffffff8108373c <resched_task+76> > 0xffffffff81083718 <resched_task+40>: lock orb $0x8,0x10(%rax) > 0xffffffff8108371d <resched_task+45>: mov 0x8(%rbx),%rax > 0xffffffff81083721 <resched_task+49>: mov 0x18(%rax),%r12d > 0xffffffff81083725 <resched_task+53>: callq 0xffffffff812d8fc0 <debug_smp_processor_id> > 0xffffffff8108372a <resched_task+58>: cmp %r12d,%eax > 0xffffffff8108372d <resched_task+61>: je 0xffffffff810837a0 <resched_task+176> > 0xffffffff8108372f <resched_task+63>: mfence > ^^^^^^ > I want to make sure the smp_mb() is really replaced with an MFENCE there. > > Thanks! > Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. And there is a mfence in the disassembly: 0xc107dcb0 <resched_task>: push %ebp 0xc107dcb1 <resched_task+1>: mov %esp,%ebp 0xc107dcb3 <resched_task+3>: lea %ds:0x0(%esi,%eiz,1),%esi 0xc107dcb8 <resched_task+8>: mov 0x4(%eax),%edx 0xc107dcbb <resched_task+11>: mov 0x8(%edx),%ecx 0xc107dcbe <resched_task+14>: and $0x8,%ecx 0xc107dcc1 <resched_task+17>: jne 0xc107dce7 <resched_task+55> 0xc107dcc3 <resched_task+19>: orb $0x8,%ds:0x8(%edx) 0xc107dcc8 <resched_task+24>: mov 0x4(%eax),%edx 0xc107dccb <resched_task+27>: mov %fs:0xc1a71010,%ecx 0xc107dcd2 <resched_task+34>: mov 0x10(%edx),%edx 0xc107dcd5 <resched_task+37>: cmp %ecx,%edx 0xc107dcd7 <resched_task+39>: je 0xc107dd00 <resched_task+80> 0xc107dcd9 <resched_task+41>: mfence 0xc107dcdc <resched_task+44>: mov %esi,%esi 0xc107dcde <resched_task+46>: mov 0x4(%eax),%eax 0xc107dce1 <resched_task+49>: testb $0x4,0xc(%eax) 0xc107dce5 <resched_task+53>: je 0xc107dcf0 <resched_task+64> 0xc107dce7 <resched_task+55>: pop %ebp 0xc107dce8 <resched_task+56>: ret 0xc107dce9 <resched_task+57>: lea 0x0(%esi,%eiz,1),%esi 0xc107dcf0 <resched_task+64>: mov %edx,%eax 0xc107dcf2 <resched_task+66>: call *0xc1917950 0xc107dcf8 <resched_task+72>: pop %ebp Thinking about it, I guess Peter is quite right saying that I likely will end on the patch that converted preempt_count to percpu. One thing I likely should do is to reinstall the exact same laptop with 64bit kernel and userspace... maybe only 64bit kernel first... and make sure on my side that this does not show up on 64bit, too. I took the word of reporters for that (and the impression that otherwise many more people would have complained). [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 17:02 ` Stefan Bader @ 2014-02-14 17:21 ` Peter Zijlstra 2014-02-20 15:38 ` Stefan Bader 2014-02-14 17:33 ` Borislav Petkov 1 sibling, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-14 17:21 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote: > One thing I likely should do is to reinstall the exact same laptop with 64bit > kernel and userspace... maybe only 64bit kernel first... and make sure on my > side that this does not show up on 64bit, too. I took the word of reporters for > that (and the impression that otherwise many more people would have complained). Yeha, I'm going to try and install some 32bit userspace on a usb harddisk I've got and see if I can boot my Core2 laptop from that to try and reproduce. But all that is probably going to be Monday :/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 17:21 ` Peter Zijlstra @ 2014-02-20 15:38 ` Stefan Bader 2014-02-20 15:50 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-20 15:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 1836 bytes --] On 14.02.2014 18:21, Peter Zijlstra wrote: > On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote: >> One thing I likely should do is to reinstall the exact same laptop with 64bit >> kernel and userspace... maybe only 64bit kernel first... and make sure on my >> side that this does not show up on 64bit, too. I took the word of reporters for >> that (and the impression that otherwise many more people would have complained). > > Yeha, I'm going to try and install some 32bit userspace on a usb > harddisk I've got and see if I can boot my Core2 laptop from that to try > and reproduce. > > But all that is probably going to be Monday :/ > *sigh* Already Thursday... Peter, did you get to reproduce this locally? Unfortunately I had some interruption and have not more Information than on last Friday (which is that the same hw but 64bit kernel does not show it). Meanwhile I wonder whether it would make sense to push the following (or more?) to stable for 3.13.y: 1) 16824255394f55adf31b9a96a9965d8c15bdac4c x86, acpi, idle: Restructure the mwait idle routines 2) 7e98b71920464b8d15fa95c74366416cd3c88861 x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers 3) 8cb75e0c4ec9786b81439761eac1d18d4a931af3 sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding 4) 215393bc1fab3d61a5a296838bdffce22f27ffda sched/preempt/x86: Fix voluntary preempt for x86 1+2 would be to avoid touching 3 too much and looked to be improvements on their own. 3+4 would be cherry-picks if not for some fuzz 2. I saw a few more things labelled preempt between 3.13 and current HEAD but am not sure whether or which of those are strictly required. Namely some fixing to preempt_enable_no_resched() mis-usage and maybe one fixing an issue of ftrace locking up. -Stefan 3 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-20 15:38 ` Stefan Bader @ 2014-02-20 15:50 ` Peter Zijlstra 2014-03-24 17:39 ` Paolo Bonzini 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2014-02-20 15:50 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Thu, Feb 20, 2014 at 04:38:13PM +0100, Stefan Bader wrote: > On 14.02.2014 18:21, Peter Zijlstra wrote: > > On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote: > >> One thing I likely should do is to reinstall the exact same laptop with 64bit > >> kernel and userspace... maybe only 64bit kernel first... and make sure on my > >> side that this does not show up on 64bit, too. I took the word of reporters for > >> that (and the impression that otherwise many more people would have complained). > > > > Yeha, I'm going to try and install some 32bit userspace on a usb > > harddisk I've got and see if I can boot my Core2 laptop from that to try > > and reproduce. > > > > But all that is probably going to be Monday :/ > > > *sigh* Already Thursday... > > Peter, did you get to reproduce this locally? Unfortunately I had some > interruption and have not more Information than on last Friday (which is that > the same hw but 64bit kernel does not show it). I got side-tracked as well, someone reported crashes, which come above weird behaviour :/ > Meanwhile I wonder whether it would make sense to push the following (or more?) > to stable for 3.13.y: > > 1) 16824255394f55adf31b9a96a9965d8c15bdac4c > x86, acpi, idle: Restructure the mwait idle routines > 2) 7e98b71920464b8d15fa95c74366416cd3c88861 > x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers > 3) 8cb75e0c4ec9786b81439761eac1d18d4a931af3 > sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding > 4) 215393bc1fab3d61a5a296838bdffce22f27ffda > sched/preempt/x86: Fix voluntary preempt for x86 > > 1+2 would be to avoid touching 3 too much and looked to be improvements on their > own. 3+4 would be cherry-picks if not for some fuzz 2. I'll not object; but if someone wants a smaller set you could do with just the idle.c hunk from 3. > I saw a few more things labelled preempt between 3.13 and current HEAD but am > not sure whether or which of those are strictly required. Namely some fixing to > preempt_enable_no_resched() mis-usage and maybe one fixing an issue of ftrace > locking up. Yeah, that's all fresh breakage and doesn't go back to 3.13 I think. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-20 15:50 ` Peter Zijlstra @ 2014-03-24 17:39 ` Paolo Bonzini 2014-03-25 8:23 ` Stefan Bader 0 siblings, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2014-03-24 17:39 UTC (permalink / raw) To: Peter Zijlstra, Stefan Bader Cc: Borislav Petkov, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel Il 20/02/2014 16:50, Peter Zijlstra ha scritto: >>>> > >> One thing I likely should do is to reinstall the exact same laptop with 64bit >>>> > >> kernel and userspace... maybe only 64bit kernel first... and make sure on my >>>> > >> side that this does not show up on 64bit, too. I took the word of reporters for >>>> > >> that (and the impression that otherwise many more people would have complained). >>> > > >>> > > Yeha, I'm going to try and install some 32bit userspace on a usb >>> > > harddisk I've got and see if I can boot my Core2 laptop from that to try >>> > > and reproduce. >>> > > >>> > > But all that is probably going to be Monday :/ >>> > > >> > *sigh* Already Thursday... >> > >> > Peter, did you get to reproduce this locally? Unfortunately I had some >> > interruption and have not more Information than on last Friday (which is that >> > the same hw but 64bit kernel does not show it). > I got side-tracked as well, someone reported crashes, which come above > weird behaviour :/ > Stefan, Peter, any news here? Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-03-24 17:39 ` Paolo Bonzini @ 2014-03-25 8:23 ` Stefan Bader 0 siblings, 0 replies; 34+ messages in thread From: Stefan Bader @ 2014-03-25 8:23 UTC (permalink / raw) To: Paolo Bonzini, Peter Zijlstra Cc: Borislav Petkov, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 1671 bytes --] On 24.03.2014 18:39, Paolo Bonzini wrote: > Il 20/02/2014 16:50, Peter Zijlstra ha scritto: >>>>> > >> One thing I likely should do is to reinstall the exact same laptop >>>>> with 64bit >>>>> > >> kernel and userspace... maybe only 64bit kernel first... and make sure >>>>> on my >>>>> > >> side that this does not show up on 64bit, too. I took the word of >>>>> reporters for >>>>> > >> that (and the impression that otherwise many more people would have >>>>> complained). >>>> > > >>>> > > Yeha, I'm going to try and install some 32bit userspace on a usb >>>> > > harddisk I've got and see if I can boot my Core2 laptop from that to try >>>> > > and reproduce. >>>> > > >>>> > > But all that is probably going to be Monday :/ >>>> > > >>> > *sigh* Already Thursday... >>> > >>> > Peter, did you get to reproduce this locally? Unfortunately I had some >>> > interruption and have not more Information than on last Friday (which is that >>> > the same hw but 64bit kernel does not show it). >> I got side-tracked as well, someone reported crashes, which come above >> weird behaviour :/ >> > > Stefan, Peter, any news here? > > Paolo No, unfortunately not. :( I thought I had some idea as i386 seems to have soft and hard irqs using a seperate stack and thought they maybe loose the process flag there (x86_64 seems to always do that for soft irqs and if it does for hard irqs it was hidden well enough for me not to see). But trial and error showed showed no improvement when I made a copy of the original TIF_NEED_RSCHED before executing on stack. So not really any further. And then the usual distractions hit... -Stefan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 17:02 ` Stefan Bader 2014-02-14 17:21 ` Peter Zijlstra @ 2014-02-14 17:33 ` Borislav Petkov 2014-02-14 18:23 ` Stefan Bader 1 sibling, 1 reply; 34+ messages in thread From: Borislav Petkov @ 2014-02-14 17:33 UTC (permalink / raw) To: Stefan Bader Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote: > Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. And > there is a mfence in the disassembly: Btw, I just realized booting the kernel in the guest was a dumb idea, because, doh, the guest is not baremetal. The only reliable thing we can say is that sse2 is present and that MFENCE alternative replacement works :) But for simplicity's sake let's just assume the machine can do MFENCE just fine and it gets replaced by the alternatives code. Besides, if that weren't true, we'd have a whole lot of other problems on those boxes. > Thinking about it, I guess Peter is quite right saying that I likely > will end on the patch that converted preempt_count to percpu. Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al. > One thing I likely should do is to reinstall the exact same laptop > with 64bit kernel and userspace... maybe only 64bit kernel first... > and make sure on my side that this does not show up on 64bit, too. I > took the word of reporters for that (and the impression that otherwise > many more people would have complained). Yeah, that should be a prudent thing to do. Also, Paolo and I were wondering whether you can trigger this thing without kvm, i.e. virtualization involved... do you have any data on that? Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 17:33 ` Borislav Petkov @ 2014-02-14 18:23 ` Stefan Bader 2014-02-14 19:03 ` Stefan Bader 0 siblings, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-14 18:23 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 2349 bytes --] On 14.02.2014 18:33, Borislav Petkov wrote: > On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote: >> Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. And >> there is a mfence in the disassembly: > > Btw, I just realized booting the kernel in the guest was a dumb idea, > because, doh, the guest is not baremetal. The only reliable thing we > can say is that sse2 is present and that MFENCE alternative replacement > works :) > > But for simplicity's sake let's just assume the machine can do MFENCE > just fine and it gets replaced by the alternatives code. > > Besides, if that weren't true, we'd have a whole lot of other problems > on those boxes. > >> Thinking about it, I guess Peter is quite right saying that I likely >> will end on the patch that converted preempt_count to percpu. > > Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al. > >> One thing I likely should do is to reinstall the exact same laptop >> with 64bit kernel and userspace... maybe only 64bit kernel first... >> and make sure on my side that this does not show up on 64bit, too. I >> took the word of reporters for that (and the impression that otherwise >> many more people would have complained). > > Yeah, that should be a prudent thing to do. > > Also, Paolo and I were wondering whether you can trigger this thing > without kvm, i.e. virtualization involved... do you have any data on > that? Unfortunately no hard evidence. Kvm just happens to be such a good way to notice this as it is using the reschedule interrupt itself and has this exit before running the guest vcpu to hadnle it in the outer loop by calling cond_resched() and repeat. I find running kvm seems to make that laptop quite sluggish in responding to other tasks (in that install) and I got some oddness going on when lightdm quite often refuses to take keyboard input without opening some menu with the mouse first... But I could not be sure whether that is the kernel or some new user-space ... errr "feature". At least Marcello (iirc that other report came from him directly or indirectly) has seen it, too. And he likely has complete different user-space. So I will go and do that different (64bit) kernel and kernel + user-space test. But like fo Peter, it likely is a Monday thing... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 18:23 ` Stefan Bader @ 2014-02-14 19:03 ` Stefan Bader 0 siblings, 0 replies; 34+ messages in thread From: Stefan Bader @ 2014-02-14 19:03 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 3046 bytes --] On 14.02.2014 19:23, Stefan Bader wrote: > On 14.02.2014 18:33, Borislav Petkov wrote: >> On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote: >>> Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. And >>> there is a mfence in the disassembly: >> >> Btw, I just realized booting the kernel in the guest was a dumb idea, >> because, doh, the guest is not baremetal. The only reliable thing we >> can say is that sse2 is present and that MFENCE alternative replacement >> works :) >> >> But for simplicity's sake let's just assume the machine can do MFENCE >> just fine and it gets replaced by the alternatives code. >> >> Besides, if that weren't true, we'd have a whole lot of other problems >> on those boxes. >> >>> Thinking about it, I guess Peter is quite right saying that I likely >>> will end on the patch that converted preempt_count to percpu. >> >> Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al. >> >>> One thing I likely should do is to reinstall the exact same laptop >>> with 64bit kernel and userspace... maybe only 64bit kernel first... >>> and make sure on my side that this does not show up on 64bit, too. I >>> took the word of reporters for that (and the impression that otherwise >>> many more people would have complained). >> >> Yeah, that should be a prudent thing to do. >> >> Also, Paolo and I were wondering whether you can trigger this thing >> without kvm, i.e. virtualization involved... do you have any data on >> that? > > Unfortunately no hard evidence. Kvm just happens to be such a good way to notice > this as it is using the reschedule interrupt itself and has this exit before > running the guest vcpu to hadnle it in the outer loop by calling cond_resched() > and repeat. > I find running kvm seems to make that laptop quite sluggish in responding to > other tasks (in that install) and I got some oddness going on when lightdm quite > often refuses to take keyboard input without opening some menu with the mouse > first... But I could not be sure whether that is the kernel or some new > user-space ... errr "feature". > At least Marcello (iirc that other report came from him directly or indirectly) > has seen it, too. And he likely has complete different user-space. > > So I will go and do that different (64bit) kernel and kernel + user-space test. > But like fo Peter, it likely is a Monday thing... > Ok, it is still Friday... So a quick test (2 boots of a 32bit guest, same as before) on the 32bit user-space, with the same kernel source, but compiled as 64bit (obviously not 100% same config but close). While I see the false inconsistency messages (I modified the in kernel-test to trigger the trace stop only if the __vcpu_run loop encounters an inconsistent state three times in a row), I do not see the final stop message. Also (but that is rather feeling) the system seems to remain more responsive (switching to other windows, opening terminal windows,...) compared to 32bit kernel. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? (maybe bisect) 2014-02-14 14:24 ` Stefan Bader 2014-02-14 14:47 ` Borislav Petkov @ 2014-02-14 15:21 ` Borislav Petkov 2014-02-14 15:28 ` Stefan Bader 2014-02-14 16:21 ` Peter Zijlstra 1 sibling, 2 replies; 34+ messages in thread From: Borislav Petkov @ 2014-02-14 15:21 UTC (permalink / raw) To: Stefan Bader Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel Oh, and just in case this is relatively easy to reproduce and in case we don't have any other idea, bisection might be another option. I'm not saying you should do it right away - I'm just putting it on the table... :-) :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? (maybe bisect) 2014-02-14 15:21 ` Another preempt folding issue? (maybe bisect) Borislav Petkov @ 2014-02-14 15:28 ` Stefan Bader 2014-02-14 15:44 ` Borislav Petkov 2014-02-14 16:21 ` Peter Zijlstra 1 sibling, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-14 15:28 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 446 bytes --] On 14.02.2014 16:21, Borislav Petkov wrote: > Oh, and just in case this is relatively easy to reproduce and in case we > don't have any other idea, bisection might be another option. I'm not > saying you should do it right away - I'm just putting it on the table... > > :-) > > :-) > Oh yeah, bisection is nearly as entertaining as doing my tax records. Hm, on the other hand those will have to be done at some point too... :-P [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? (maybe bisect) 2014-02-14 15:28 ` Stefan Bader @ 2014-02-14 15:44 ` Borislav Petkov 0 siblings, 0 replies; 34+ messages in thread From: Borislav Petkov @ 2014-02-14 15:44 UTC (permalink / raw) To: Stefan Bader Cc: Peter Zijlstra, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 04:28:16PM +0100, Stefan Bader wrote: > Oh yeah, bisection is nearly as entertaining as doing my tax records. > Hm, on the other hand those will have to be done at some point too... > :-P Ah, tax records, crap, I have those lying around in the corner since forever... I need to do those too :-( -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? (maybe bisect) 2014-02-14 15:21 ` Another preempt folding issue? (maybe bisect) Borislav Petkov 2014-02-14 15:28 ` Stefan Bader @ 2014-02-14 16:21 ` Peter Zijlstra 1 sibling, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2014-02-14 16:21 UTC (permalink / raw) To: Borislav Petkov Cc: Stefan Bader, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 04:21:32PM +0100, Borislav Petkov wrote: > Oh, and just in case this is relatively easy to reproduce and in case we > don't have any other idea, bisection might be another option. I'm not > saying you should do it right away - I'm just putting it on the table... I'm fairly sure you're going to hit the patch that introduced the whole per-cpu preemption thing. Before that not receiving the IPI wasn't as bad as it is now. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 17:00 ` Stefan Bader 2014-02-13 17:38 ` Peter Zijlstra @ 2014-02-13 18:25 ` Peter Zijlstra 2014-02-14 10:55 ` Stefan Bader 2014-02-14 11:24 ` Stefan Bader 1 sibling, 2 replies; 34+ messages in thread From: Peter Zijlstra @ 2014-02-13 18:25 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote: > On 12.02.2014 12:54, Peter Zijlstra wrote: > > On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: > >> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS > >> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since > >> default_send_IPI_mask_logical is only used from there, I assume the trace you > >> got does the same. Maybe something there is wrong which would explain why we > >> only see it on 32bit hosts. > > > > Can you try with a different APIC driver to test this? > > > I don't think I can. And I think the statement about this only be used for 32bit > could be wrong. I got mislead to think so because those are only defined in > probe_32 but the 64bit counterpart isn't containing much aside that. > > Anyway, I played around with tracing a bit more. So with this change: > > if (need_resched()) { > srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); > if (need_resched() != should_resched()) { > + trace_printk("need(%i) != should(%i)\n", > + need_resched(), should_resched()); > + trace_printk("exit_reason=%u\n", > + vcpu->run->exit_reason); > + trace_printk("preempt_count=%lx\n", > + __this_cpu_read_4(__preempt_count)); > + tracing_stop(); > + printk(KERN_ERR "Stopped tracing, due to > inconsistent state.\n"); > } > + schedule(); > - cond_reschedule(); > vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); > } > > I get the following (weird) output: > > Xorg-1078 [001] d... 71.270251: native_smp_send_reschedule > <-resched_task > Xorg-1078 [001] d... 71.270251: default_send_IPI_mask_logical > <-native_smp_send_reschedule > bamfdaemon-2318 [001] d... 71.270465: resched_task <-check_preempt_wakeup > bamfdaemon-2318 [001] d... 71.270539: resched_task <-check_preempt_wakeup > compiz-2365 [001] d... 71.270689: resched_task <-check_preempt_wakeup > compiz-2365 [001] d... 71.270827: resched_task <-check_preempt_wakeup > compiz-2365 [001] d... 71.270940: resched_task <-check_preempt_wakeup > qemu-system-x86-2679 [000] dn.. 71.270999: smp_reschedule_interrupt > <-reschedule_interrupt > qemu-system-x86-2679 [000] dn.. 71.270999: scheduler_ipi > <-smp_reschedule_interrupt > qemu-system-x86-2679 [000] .N.. 71.271001: kvm_arch_vcpu_ioctl_run: need(1) > != should(0) > qemu-system-x86-2679 [000] .N.. 71.271002: kvm_arch_vcpu_ioctl_run: > exit_reason=2 > qemu-system-x86-2679 [000] .N.. 71.271003: kvm_arch_vcpu_ioctl_run: > preempt_count=0 > > So am I reading this right, that the interrupt did get delivered to cpu#0 while > the thread info already had the resched flag set. So this really should have > cleared the bit in preempt_count. But while the trace info shows 'N' for some > reason should_reschedule returns false but at the same time reading the preempt > count manually shows it 0? So the assembly merges the first and second should_resched(), so its possible that load got before the interrupt(). The 3rd preempt_count load gets re-issued and so that would show the 'true' value again. If you want to force a reload after the condition; put in a barrier(). In any case; this looks like a false-positive. Please try again until you get one where the interrupt doesn't happen and we stay in 'n' state. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 18:25 ` Another preempt folding issue? Peter Zijlstra @ 2014-02-14 10:55 ` Stefan Bader 2014-02-14 13:17 ` Peter Zijlstra 2014-02-14 11:24 ` Stefan Bader 1 sibling, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-14 10:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 6588 bytes --] On 13.02.2014 19:25, Peter Zijlstra wrote: > On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote: >> On 12.02.2014 12:54, Peter Zijlstra wrote: >>> On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: >>>> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS >>>> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since >>>> default_send_IPI_mask_logical is only used from there, I assume the trace you >>>> got does the same. Maybe something there is wrong which would explain why we >>>> only see it on 32bit hosts. >>> >>> Can you try with a different APIC driver to test this? >>> >> I don't think I can. And I think the statement about this only be used for 32bit >> could be wrong. I got mislead to think so because those are only defined in >> probe_32 but the 64bit counterpart isn't containing much aside that. >> >> Anyway, I played around with tracing a bit more. So with this change: >> >> if (need_resched()) { >> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); >> if (need_resched() != should_resched()) { >> + trace_printk("need(%i) != should(%i)\n", >> + need_resched(), should_resched()); >> + trace_printk("exit_reason=%u\n", >> + vcpu->run->exit_reason); >> + trace_printk("preempt_count=%lx\n", >> + __this_cpu_read_4(__preempt_count)); >> + tracing_stop(); >> + printk(KERN_ERR "Stopped tracing, due to >> inconsistent state.\n"); >> } >> + schedule(); >> - cond_reschedule(); >> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); >> } >> >> I get the following (weird) output: >> >> Xorg-1078 [001] d... 71.270251: native_smp_send_reschedule >> <-resched_task >> Xorg-1078 [001] d... 71.270251: default_send_IPI_mask_logical >> <-native_smp_send_reschedule >> bamfdaemon-2318 [001] d... 71.270465: resched_task <-check_preempt_wakeup >> bamfdaemon-2318 [001] d... 71.270539: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270689: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270827: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270940: resched_task <-check_preempt_wakeup >> qemu-system-x86-2679 [000] dn.. 71.270999: smp_reschedule_interrupt >> <-reschedule_interrupt >> qemu-system-x86-2679 [000] dn.. 71.270999: scheduler_ipi >> <-smp_reschedule_interrupt >> qemu-system-x86-2679 [000] .N.. 71.271001: kvm_arch_vcpu_ioctl_run: need(1) >> != should(0) >> qemu-system-x86-2679 [000] .N.. 71.271002: kvm_arch_vcpu_ioctl_run: >> exit_reason=2 >> qemu-system-x86-2679 [000] .N.. 71.271003: kvm_arch_vcpu_ioctl_run: >> preempt_count=0 >> >> So am I reading this right, that the interrupt did get delivered to cpu#0 while >> the thread info already had the resched flag set. So this really should have >> cleared the bit in preempt_count. But while the trace info shows 'N' for some >> reason should_reschedule returns false but at the same time reading the preempt >> count manually shows it 0? > > So the assembly merges the first and second should_resched(), so its > possible that load got before the interrupt(). > > The 3rd preempt_count load gets re-issued and so that would show the > 'true' value again. > > If you want to force a reload after the condition; put in a barrier(). > > In any case; this looks like a false-positive. Please try again until > you get one where the interrupt doesn't happen and we stay in 'n' state. > Ok, I think I now got a log of the actual issue. It seems cpu#1 missed out on handling a reschedule interrupt but did send one to cpu#0 and on cpu#0 while handling the interrupt the tif flag was not set (yet?) but then when it is, there is no more interrupt to do the folding... (sorry for the wrapping)... qemu-system-x86-2503 [000] d.h. 1145.887872: default_send_IPI_mask_logical <-native_smp_send_reschedule qemu-system-x86-2503 [000] d... 1145.887888: resched_task <-check_preempt_curr <idle>-0 [000] d.h. 1145.888001: resched_task <-check_preempt_curr <idle>-0 [001] dn.. 1145.888788: smp_reschedule_interrupt <-reschedule_interrupt <idle>-0 [001] dn.. 1145.888789: scheduler_ipi <-smp_reschedule_interrupt gnome-terminal-2440 [000] d... 1145.888957: resched_task <-check_preempt_curr gnome-terminal-2440 [000] d... 1145.888959: native_smp_send_reschedule <-resched_task gnome-terminal-2440 [000] d... 1145.888959: default_send_IPI_mask_logical <-native_smp_send_reschedule <idle>-0 [001] dn.. 1145.889296: smp_reschedule_interrupt <-reschedule_interrupt <idle>-0 [001] dn.. 1145.889297: scheduler_ipi <-smp_reschedule_interrupt gnome-terminal-2440 [000] d... 1145.889465: resched_task <-check_preempt_curr gnome-terminal-2440 [000] d... 1145.889467: native_smp_send_reschedule <-resched_task gnome-terminal-2440 [000] d... 1145.889467: default_send_IPI_mask_logical <-native_smp_send_reschedule Xorg-1054 [001] d.h. 1145.889469: resched_task <-check_preempt_wakeup Xorg-1054 [001] d.h. 1145.889469: native_smp_send_reschedule <-resched_task Xorg-1054 [001] d.h. 1145.889469: default_send_IPI_mask_logical <-native_smp_send_reschedule qemu-system-x86-2503 [000] d.s. 1145.889643: smp_reschedule_interrupt <-reschedule_interrupt qemu-system-x86-2503 [000] d.s. 1145.889643: scheduler_ipi <-smp_reschedule_interrupt qemu-system-x86-2503 [000] .n.. 1145.889647: kvm_arch_vcpu_ioctl_run: sched inconsistency qemu-system-x86-2503 [000] .n.. 1145.889662: kvm_arch_vcpu_ioctl_run: sched inconsistency qemu-system-x86-2503 [000] .n.. 1145.889664: kvm_arch_vcpu_ioctl_run: sched inconsistency qemu-system-x86-2503 [000] .n.. 1145.889668: kvm_arch_vcpu_ioctl_run: need(1) != should(0) qemu-system-x86-2503 [000] .n.. 1145.889669: kvm_arch_vcpu_ioctl_run: exit_reason=2 qemu-system-x86-2503 [000] .n.. 1145.889669: kvm_arch_vcpu_ioctl_run: preempt_count=80000000 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 10:55 ` Stefan Bader @ 2014-02-14 13:17 ` Peter Zijlstra 0 siblings, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2014-02-14 13:17 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 11:55:25AM +0100, Stefan Bader wrote: > Ok, I think I now got a log of the actual issue. It seems cpu#1 missed out on > handling a reschedule interrupt but did send one to cpu#0 and on cpu#0 while > handling the interrupt the tif flag was not set (yet?) but then when it is, > there is no more interrupt to do the folding... (sorry for the wrapping)... > > qemu-system-x86-2503 [000] d.h. 1145.887872: default_send_IPI_mask_logical <-native_smp_send_reschedule > qemu-system-x86-2503 [000] d... 1145.887888: resched_task <-check_preempt_curr > <idle>-0 [000] d.h. 1145.888001: resched_task <-check_preempt_curr > <idle>-0 [001] dn.. 1145.888788: smp_reschedule_interrupt <-reschedule_interrupt > <idle>-0 [001] dn.. 1145.888789: scheduler_ipi <-smp_reschedule_interrupt > gnome-terminal-2440 [000] d... 1145.888957: resched_task <-check_preempt_curr > gnome-terminal-2440 [000] d... 1145.888959: native_smp_send_reschedule <-resched_task > gnome-terminal-2440 [000] d... 1145.888959: default_send_IPI_mask_logical <-native_smp_send_reschedule > <idle>-0 [001] dn.. 1145.889296: smp_reschedule_interrupt <-reschedule_interrupt > <idle>-0 [001] dn.. 1145.889297: scheduler_ipi <-smp_reschedule_interrupt > gnome-terminal-2440 [000] d... 1145.889465: resched_task <-check_preempt_curr > gnome-terminal-2440 [000] d... 1145.889467: native_smp_send_reschedule <-resched_task > gnome-terminal-2440 [000] d... 1145.889467: default_send_IPI_mask_logical <-native_smp_send_reschedule > Xorg-1054 [001] d.h. 1145.889469: resched_task <-check_preempt_wakeup > Xorg-1054 [001] d.h. 1145.889469: native_smp_send_reschedule <-resched_task > Xorg-1054 [001] d.h. 1145.889469: default_send_IPI_mask_logical <-native_smp_send_reschedule > qemu-system-x86-2503 [000] d.s. 1145.889643: smp_reschedule_interrupt <-reschedule_interrupt > qemu-system-x86-2503 [000] d.s. 1145.889643: scheduler_ipi <-smp_reschedule_interrupt > qemu-system-x86-2503 [000] .n.. 1145.889647: kvm_arch_vcpu_ioctl_run: sched inconsistency > qemu-system-x86-2503 [000] .n.. 1145.889662: kvm_arch_vcpu_ioctl_run: sched inconsistency > qemu-system-x86-2503 [000] .n.. 1145.889664: kvm_arch_vcpu_ioctl_run: sched inconsistency > qemu-system-x86-2503 [000] .n.. 1145.889668: kvm_arch_vcpu_ioctl_run: need(1) != should(0) > qemu-system-x86-2503 [000] .n.. 1145.889669: kvm_arch_vcpu_ioctl_run: exit_reason=2 > qemu-system-x86-2503 [000] .n.. 1145.889669: kvm_arch_vcpu_ioctl_run: preempt_count=80000000 > So afaiu you're running this on an AMD fam 0xf which doesn't have synchronized TSC; so again the timing might be off.. however: The first block from cpu 0, sends an IPI to cpu 1, cpu 1 sees 'n'. Then again, cpu 0 sends to cpu 1, we observe 'n'. So far so good and timing seems to reflect causality. Then however, cpu 0 sends, but nothing in this trace answers on cpu 1.. odd. Then cpu 1 sends to cpu 0, cpu 0 does receive the IPI but does _not_ observe 'n', which is mighty odd. However after the IPI is handled cpu 0 magically gets the 'n'. So one thing I did wonder about; if cpu 0 sends an IPI while cpu 1 is already processing the same interrupt; cpu 1 should latch the interrupt and it would re-trigger with when we re-enable interrupts after we're done with the first. The only thing that would ruin that is late-ack, eg. ack (clear the latch) when exiting the interrupt; but I checked and we ack before we call scheduler_ipi(). Now; if this were to happen (but I don't exactly see how on a 2 CPU system, nor do I see this in the trace), there might be a hole... CPU 0 | CPU 1 --------------------------------+-------------------------------- | /* ! resched_task */ | smp_send_reschedule(1) |L |L smp_reschedule_interrupt() resched_task() |L set_tsk_need_resched() |L smp_mb() |L smp_send_reschedule(1) |L |L ack_APIC_irq() | scheduler_ipi() | need_resched(); Now strictly speaking that need_resched() load could creep up before the store and we'd fail to observe 'n'. However, x86 forbids reads from passing an earlier write to the same location (it allows passing earlier writes in general). At which point the only other option is GCC emitting that load far earlier, in fact before ack_APIC_irq(), but that would be a violation of the C std because scheduler_ipi() is a (real) function call and acts as a compiler barrier AFAIK (as opposed to inlining which doesn't act as a compiler barrier). So colour me confused.. You could try putting an smp_mb() at the very start of scheduler_ipi() just to make absolutely sure, but it _should_ not matter. Of course, I could be completely off base with how interrupts work; and there's a wide gaping hole in there, in which case, someone please educate me ;-) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-13 18:25 ` Another preempt folding issue? Peter Zijlstra 2014-02-14 10:55 ` Stefan Bader @ 2014-02-14 11:24 ` Stefan Bader 2014-02-14 11:49 ` Peter Zijlstra 1 sibling, 1 reply; 34+ messages in thread From: Stefan Bader @ 2014-02-14 11:24 UTC (permalink / raw) To: Peter Zijlstra Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel [-- Attachment #1: Type: text/plain, Size: 5120 bytes --] On 13.02.2014 19:25, Peter Zijlstra wrote: > On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote: >> On 12.02.2014 12:54, Peter Zijlstra wrote: >>> On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote: >>>> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS >>>> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since >>>> default_send_IPI_mask_logical is only used from there, I assume the trace you >>>> got does the same. Maybe something there is wrong which would explain why we >>>> only see it on 32bit hosts. >>> >>> Can you try with a different APIC driver to test this? >>> >> I don't think I can. And I think the statement about this only be used for 32bit >> could be wrong. I got mislead to think so because those are only defined in >> probe_32 but the 64bit counterpart isn't containing much aside that. >> >> Anyway, I played around with tracing a bit more. So with this change: >> >> if (need_resched()) { >> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); >> if (need_resched() != should_resched()) { >> + trace_printk("need(%i) != should(%i)\n", >> + need_resched(), should_resched()); >> + trace_printk("exit_reason=%u\n", >> + vcpu->run->exit_reason); >> + trace_printk("preempt_count=%lx\n", >> + __this_cpu_read_4(__preempt_count)); >> + tracing_stop(); >> + printk(KERN_ERR "Stopped tracing, due to >> inconsistent state.\n"); >> } >> + schedule(); >> - cond_reschedule(); >> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); >> } >> >> I get the following (weird) output: >> >> Xorg-1078 [001] d... 71.270251: native_smp_send_reschedule >> <-resched_task >> Xorg-1078 [001] d... 71.270251: default_send_IPI_mask_logical >> <-native_smp_send_reschedule >> bamfdaemon-2318 [001] d... 71.270465: resched_task <-check_preempt_wakeup >> bamfdaemon-2318 [001] d... 71.270539: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270689: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270827: resched_task <-check_preempt_wakeup >> compiz-2365 [001] d... 71.270940: resched_task <-check_preempt_wakeup >> qemu-system-x86-2679 [000] dn.. 71.270999: smp_reschedule_interrupt >> <-reschedule_interrupt >> qemu-system-x86-2679 [000] dn.. 71.270999: scheduler_ipi >> <-smp_reschedule_interrupt >> qemu-system-x86-2679 [000] .N.. 71.271001: kvm_arch_vcpu_ioctl_run: need(1) >> != should(0) >> qemu-system-x86-2679 [000] .N.. 71.271002: kvm_arch_vcpu_ioctl_run: >> exit_reason=2 >> qemu-system-x86-2679 [000] .N.. 71.271003: kvm_arch_vcpu_ioctl_run: >> preempt_count=0 >> >> So am I reading this right, that the interrupt did get delivered to cpu#0 while >> the thread info already had the resched flag set. So this really should have >> cleared the bit in preempt_count. But while the trace info shows 'N' for some >> reason should_reschedule returns false but at the same time reading the preempt >> count manually shows it 0? > > So the assembly merges the first and second should_resched(), so its > possible that load got before the interrupt(). > > The 3rd preempt_count load gets re-issued and so that would show the > 'true' value again. > > If you want to force a reload after the condition; put in a barrier(). > > In any case; this looks like a false-positive. Please try again until > you get one where the interrupt doesn't happen and we stay in 'n' state. > Oh and one thing I was wondering. Not sure I do understand it right... When initially converting to percpu counts, you changed the 32bit assembly like that: --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -362,12 +362,9 @@ END(ret_from_exception) #ifdef CONFIG_PREEMPT ENTRY(resume_kernel) DISABLE_INTERRUPTS(CLBR_ANY) - cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? - jnz restore_all need_resched: - movl TI_flags(%ebp), %ecx # need_resched set ? - testb $_TIF_NEED_RESCHED, %cl - jz restore_all + cmpl $0,PER_CPU_VAR(__preempt_count) + jnz restore_all testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path jz restore_all call preempt_schedule_irq This seems to say if preempt_count was 0 then then if the thread flag was set and interrupts were not off(?) it would do a preempt ipi and then come back to re-check the thread flag. This would now be if preempt_count is 0 only... and I wonder whether that would change from doing that loop... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-14 11:24 ` Stefan Bader @ 2014-02-14 11:49 ` Peter Zijlstra 0 siblings, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2014-02-14 11:49 UTC (permalink / raw) To: Stefan Bader Cc: Borislav Petkov, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI, Joerg Roedel On Fri, Feb 14, 2014 at 12:24:42PM +0100, Stefan Bader wrote: > Oh and one thing I was wondering. Not sure I do understand it right... When > initially converting to percpu counts, you changed the 32bit assembly like that: > > --- a/arch/x86/kernel/entry_32.S > +++ b/arch/x86/kernel/entry_32.S > @@ -362,12 +362,9 @@ END(ret_from_exception) > #ifdef CONFIG_PREEMPT > ENTRY(resume_kernel) > DISABLE_INTERRUPTS(CLBR_ANY) > - cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? > - jnz restore_all > need_resched: > - movl TI_flags(%ebp), %ecx # need_resched set ? > - testb $_TIF_NEED_RESCHED, %cl > - jz restore_all > + cmpl $0,PER_CPU_VAR(__preempt_count) > + jnz restore_all > testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path > jz restore_all > call preempt_schedule_irq > > This seems to say if preempt_count was 0 then then if the thread flag was set > and interrupts were not off(?) it would do a preempt ipi and then come back to > re-check the thread flag. No not an IPI; it would reschedule. So the old code: if preempt_count != 0; continue out if !TIF_NEED_RESCHED; continue out if IRQs-off in calling context; continue out preempt_schedule_irq The new code: if preempt_count != 0; continue out if IRQs-off in calling context; continue out preempt_schedule_irq > This would now be if preempt_count is 0 only... and I wonder whether that would > change from doing that loop... We can do away with the TIF_NEED_RESCHED test because that state is folded into the preempt_count by means of PREEMPT_NEED_RESCHED. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Another preempt folding issue? 2014-02-12 10:40 ` Borislav Petkov 2014-02-12 11:09 ` Stefan Bader @ 2014-02-12 11:12 ` Joerg Roedel 1 sibling, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2014-02-12 11:12 UTC (permalink / raw) To: Borislav Petkov Cc: Peter Zijlstra, Stefan Bader, Paolo Bonzini, Linux Kernel Mailing List, kvm, Marcelo Tosatti, MASAO TAKAHASHI On Wed, Feb 12, 2014 at 11:40:17AM +0100, Borislav Petkov wrote: > Also what I'm wondering about and what's not clear from Stefan's reply > is whether this is purely a 32-bit issue, i.e. a 32-bit host running a > 64-bit qemu running a 32-bit iso or what is it? > > Or do we have reports for both 32-bit and 64-bit hosts? As I understand it, the problem reproduces only on a 32bit host, which also limits the guests to 32bit, at least when ran with KVM. This is true even if a x86-64 qemu is used. This is also independent on whether the host-cpu supports 64bit or not. Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2014-03-25 8:24 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-11 18:34 Another preempt folding issue? Stefan Bader 2014-02-11 19:45 ` Peter Zijlstra 2014-02-12 8:20 ` Stefan Bader 2014-02-12 10:37 ` Peter Zijlstra 2014-02-12 10:40 ` Borislav Petkov 2014-02-12 11:09 ` Stefan Bader 2014-02-12 11:54 ` Peter Zijlstra 2014-02-13 17:00 ` Stefan Bader 2014-02-13 17:38 ` Peter Zijlstra 2014-02-13 18:03 ` Stefan Bader 2014-02-13 18:26 ` Peter Zijlstra 2014-02-14 13:34 ` Borislav Petkov 2014-02-14 13:40 ` Stefan Bader 2014-02-14 14:24 ` Stefan Bader 2014-02-14 14:47 ` Borislav Petkov 2014-02-14 17:02 ` Stefan Bader 2014-02-14 17:21 ` Peter Zijlstra 2014-02-20 15:38 ` Stefan Bader 2014-02-20 15:50 ` Peter Zijlstra 2014-03-24 17:39 ` Paolo Bonzini 2014-03-25 8:23 ` Stefan Bader 2014-02-14 17:33 ` Borislav Petkov 2014-02-14 18:23 ` Stefan Bader 2014-02-14 19:03 ` Stefan Bader 2014-02-14 15:21 ` Another preempt folding issue? (maybe bisect) Borislav Petkov 2014-02-14 15:28 ` Stefan Bader 2014-02-14 15:44 ` Borislav Petkov 2014-02-14 16:21 ` Peter Zijlstra 2014-02-13 18:25 ` Another preempt folding issue? Peter Zijlstra 2014-02-14 10:55 ` Stefan Bader 2014-02-14 13:17 ` Peter Zijlstra 2014-02-14 11:24 ` Stefan Bader 2014-02-14 11:49 ` Peter Zijlstra 2014-02-12 11:12 ` Joerg Roedel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).