public inbox for kvm@vger.kernel.org
* iwlagn and kvm related "BUG: scheduling while atomic" after resuming
@ 2011-07-18 19:14 Stefan Hajnoczi
  2011-07-18 20:18 ` [Ilw] " Berg, Johannes
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-07-18 19:14 UTC (permalink / raw)
  To: kvm, Intel Linux Wireless; +Cc: Avi Kivity, Wey-Yi Guy

Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
resuming the host from suspend-to-RAM.  I captured as much of the oops
as was displayed on screen:

http://vmsplice.net/~stefan/panic1.jpg
http://vmsplice.net/~stefan/panic2.jpg

It looks like the iwlagn driver may have crashed in an interrupt
handler and the kvm-related panic was triggered in the aftermath.  Any
ideas?

This is Debian unstable linux-image-2.6.39-2-amd64, not linux-2.6.git.
Perhaps this is a known issue that has been fixed?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-18 19:14 iwlagn and kvm related "BUG: scheduling while atomic" after resuming Stefan Hajnoczi
@ 2011-07-18 20:18 ` Berg, Johannes
  2011-07-18 21:51   ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Berg, Johannes @ 2011-07-18 20:18 UTC (permalink / raw)
  To: Stefan Hajnoczi, kvm, Intel Linux Wireless; +Cc: Avi Kivity

> Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
> resuming the host from suspend-to-RAM.  I captured as much of the oops as
> was displayed on screen:
> 
> http://vmsplice.net/~stefan/panic1.jpg
> http://vmsplice.net/~stefan/panic2.jpg
> 
> It looks like the iwlagn driver may have crashed in an interrupt handler and the
> kvm-related panic was triggered in the aftermath.  Any ideas?

This doesn't look like iwlagn is involved at all -- the fact that it comes up in the backtrace seems to be an artifact of backtracing not being perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no reason to believe that iwlagn should crash kvm.

johannes



* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-18 20:18 ` [Ilw] " Berg, Johannes
@ 2011-07-18 21:51   ` Stefan Hajnoczi
  2011-07-19  9:04     ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-07-18 21:51 UTC (permalink / raw)
  To: Berg, Johannes; +Cc: kvm, Intel Linux Wireless, Avi Kivity

On Mon, Jul 18, 2011 at 9:18 PM, Berg, Johannes <johannes.berg@intel.com> wrote:
>> Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
>> resuming the host from suspend-to-RAM.  I captured as much of the oops as
>> was displayed on screen:
>>
>> http://vmsplice.net/~stefan/panic1.jpg
>> http://vmsplice.net/~stefan/panic2.jpg
>>
>> It looks like the iwlagn driver may have crashed in an interrupt handler and the
>> kvm-related panic was triggered in the aftermath.  Any ideas?
>
> This doesn't look like iwlagn is involved at all -- the fact that it comes up in the backtrace seems to be an artifact of backtracing not being perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no reason to believe that iwlagn should crash kvm.

RIP seems to be arch/x86/kvm/x86.c:vcpu_enter_guest():

	preempt_disable();

	kvm_x86_ops->prepare_guest_switch(vcpu);
	if (vcpu->fpu_active)
		kvm_load_guest_fpu(vcpu);
	kvm_load_guest_xcr0(vcpu);

	vcpu->mode = IN_GUEST_MODE;

	/* We should set ->mode before check ->requests,
	 * see the comment in make_all_cpus_request.
	 */
	smp_mb();

	local_irq_disable();

	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
	    || need_resched() || signal_pending(current)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		smp_wmb();
		local_irq_enable();
		preempt_enable();
		kvm_x86_ops->cancel_injection(vcpu);
		r = 1;
		goto out;
	}

	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);

	kvm_guest_enter();

	if (unlikely(vcpu->arch.switch_db_regs)) {
		set_debugreg(0, 7);
		set_debugreg(vcpu->arch.eff_db[0], 0);
		set_debugreg(vcpu->arch.eff_db[1], 1);
		set_debugreg(vcpu->arch.eff_db[2], 2);
		set_debugreg(vcpu->arch.eff_db[3], 3);
	}

	trace_kvm_entry(vcpu->vcpu_id);
	kvm_x86_ops->run(vcpu);

	/*
	 * If the guest has used debug registers, at least dr7
	 * will be disabled while returning to the host.
	 * If we don't have active breakpoints in the host, we don't
	 * care about the messed up debug address registers. But if
	 * we have some of them active, restore the old state.
	 */
	if (hw_breakpoint_active())
		hw_breakpoint_restore();

	kvm_get_msr(vcpu, MSR_IA32_TSC, &vcpu->arch.last_guest_tsc);

	vcpu->mode = OUTSIDE_GUEST_MODE;
	smp_wmb();
	local_irq_enable();  /* <--- boom! */

Stefan


* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-18 21:51   ` Stefan Hajnoczi
@ 2011-07-19  9:04     ` Avi Kivity
  2011-07-19  9:13       ` Gleb Natapov
  2011-07-19 14:23       ` Stefan Hajnoczi
  0 siblings, 2 replies; 11+ messages in thread
From: Avi Kivity @ 2011-07-19  9:04 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Berg, Johannes, kvm, Intel Linux Wireless

On 07/19/2011 12:51 AM, Stefan Hajnoczi wrote:
> On Mon, Jul 18, 2011 at 9:18 PM, Berg, Johannes<johannes.berg@intel.com>  wrote:
> >>  Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
> >>  resuming the host from suspend-to-RAM.  I captured as much of the oops as
> >>  was displayed on screen:
> >>
> >>  http://vmsplice.net/~stefan/panic1.jpg
> >>  http://vmsplice.net/~stefan/panic2.jpg
> >>
> >>  It looks like the iwlagn driver may have crashed in an interrupt handler and the
> >>  kvm-related panic was triggered in the aftermath.  Any ideas?
> >
> >  This doesn't look like iwlagn is involved at all -- the fact that it comes up in the backtrace seems to be an artifact of backtracing not being perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no reason to believe that iwlagn should crash kvm.
>
> RIP seems to be arch/x86/kvm/x86.c:vcpu_enter_guest():
>
> 	preempt_disable();
>
> 	kvm_x86_ops->prepare_guest_switch(vcpu);
> 	if (vcpu->fpu_active)
> 		kvm_load_guest_fpu(vcpu);
> 	kvm_load_guest_xcr0(vcpu);
>
> 	vcpu->mode = IN_GUEST_MODE;
>
> 	/* We should set ->mode before check ->requests,
> 	 * see the comment in make_all_cpus_request.
> 	 */
> 	smp_mb();
>
> 	local_irq_disable();
>
> 	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> 	    || need_resched() || signal_pending(current)) {
> 		vcpu->mode = OUTSIDE_GUEST_MODE;
> 		smp_wmb();
> 		local_irq_enable();
> 		preempt_enable();
> 		kvm_x86_ops->cancel_injection(vcpu);
> 		r = 1;
> 		goto out;
> 	}
>
> 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>
> 	kvm_guest_enter();
>
> 	if (unlikely(vcpu->arch.switch_db_regs)) {
> 		set_debugreg(0, 7);
> 		set_debugreg(vcpu->arch.eff_db[0], 0);
> 		set_debugreg(vcpu->arch.eff_db[1], 1);
> 		set_debugreg(vcpu->arch.eff_db[2], 2);
> 		set_debugreg(vcpu->arch.eff_db[3], 3);
> 	}
>
> 	trace_kvm_entry(vcpu->vcpu_id);
> 	kvm_x86_ops->run(vcpu);
>
> 	/*
> 	 * If the guest has used debug registers, at least dr7
> 	 * will be disabled while returning to the host.
> 	 * If we don't have active breakpoints in the host, we don't
> 	 * care about the messed up debug address registers. But if
> 	 * we have some of them active, restore the old state.
> 	 */
> 	if (hw_breakpoint_active())
> 		hw_breakpoint_restore();
>
> 	kvm_get_msr(vcpu, MSR_IA32_TSC,&vcpu->arch.last_guest_tsc);
>
> 	vcpu->mode = OUTSIDE_GUEST_MODE;
> 	smp_wmb();
> 	local_irq_enable();  /*<--- boom! */

Preemption is still disabled at this point.  Where does the "scheduling 
while atomic" come from?  Nothing in this area attempts to schedule.

The preemption counter is 0x10000100, indicating zero preempt depth 
(wrong for this point, should be 1), and 1 softirq depth (doesn't make 
much sense).  Looks very wrong, like the preempt mixup that occurred 
recently on some non-x86_64 archs.
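
For reference, the preempt_count word can be decoded by hand; a quick
sketch assuming the 2.6.39-era x86 layout (bits 0-7 preempt depth, bits
8-15 softirq depth, bits 16-25 hardirq depth, bit 28 PREEMPT_ACTIVE):

```shell
# Decode the preempt_count value 0x10000100 from the oops, using the
# 2.6.39-era x86 bit layout (see include/linux/hardirq.h of that era).
count=$(( 0x10000100 ))
echo "preempt depth:  $(( count & 0xff ))"
echo "softirq depth:  $(( (count >> 8) & 0xff ))"
echo "hardirq depth:  $(( (count >> 16) & 0x3ff ))"
echo "PREEMPT_ACTIVE: $(( (count >> 28) & 1 ))"
```

which yields preempt depth 0, softirq depth 1, hardirq depth 0, and the
PREEMPT_ACTIVE bit set.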

Can you post some disassembly around %rip?

-- 
error compiling committee.c: too many arguments to function



* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19  9:04     ` Avi Kivity
@ 2011-07-19  9:13       ` Gleb Natapov
  2011-07-19  9:16         ` Avi Kivity
  2011-07-19 14:23       ` Stefan Hajnoczi
  1 sibling, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2011-07-19  9:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Stefan Hajnoczi, Berg, Johannes, kvm, Intel Linux Wireless

On Tue, Jul 19, 2011 at 12:04:21PM +0300, Avi Kivity wrote:
> On 07/19/2011 12:51 AM, Stefan Hajnoczi wrote:
> >On Mon, Jul 18, 2011 at 9:18 PM, Berg, Johannes<johannes.berg@intel.com>  wrote:
> >>>  Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
> >>>  resuming the host from suspend-to-RAM.  I captured as much of the oops as
> >>>  was displayed on screen:
> >>>
> >>>  http://vmsplice.net/~stefan/panic1.jpg
> >>>  http://vmsplice.net/~stefan/panic2.jpg
> >>>
> >>>  It looks like the iwlagn driver may have crashed in an interrupt handler and the
> >>>  kvm-related panic was triggered in the aftermath.  Any ideas?
> >>
> >>  This doesn't look like iwlagn is involved at all -- the fact that it comes up in the backtrace seems to be an artifact of backtracing not being perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no reason to believe that iwlagn should crash kvm.
> >
> >RIP seems to be arch/x86/kvm/x86.c:vcpu_enter_guest():
> >
> >	preempt_disable();
> >
> >	kvm_x86_ops->prepare_guest_switch(vcpu);
> >	if (vcpu->fpu_active)
> >		kvm_load_guest_fpu(vcpu);
> >	kvm_load_guest_xcr0(vcpu);
> >
> >	vcpu->mode = IN_GUEST_MODE;
> >
> >	/* We should set ->mode before check ->requests,
> >	 * see the comment in make_all_cpus_request.
> >	 */
> >	smp_mb();
> >
> >	local_irq_disable();
> >
> >	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> >	    || need_resched() || signal_pending(current)) {
> >		vcpu->mode = OUTSIDE_GUEST_MODE;
> >		smp_wmb();
> >		local_irq_enable();
> >		preempt_enable();
> >		kvm_x86_ops->cancel_injection(vcpu);
> >		r = 1;
> >		goto out;
> >	}
> >
> >	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> >
> >	kvm_guest_enter();
> >
> >	if (unlikely(vcpu->arch.switch_db_regs)) {
> >		set_debugreg(0, 7);
> >		set_debugreg(vcpu->arch.eff_db[0], 0);
> >		set_debugreg(vcpu->arch.eff_db[1], 1);
> >		set_debugreg(vcpu->arch.eff_db[2], 2);
> >		set_debugreg(vcpu->arch.eff_db[3], 3);
> >	}
> >
> >	trace_kvm_entry(vcpu->vcpu_id);
> >	kvm_x86_ops->run(vcpu);
> >
> >	/*
> >	 * If the guest has used debug registers, at least dr7
> >	 * will be disabled while returning to the host.
> >	 * If we don't have active breakpoints in the host, we don't
> >	 * care about the messed up debug address registers. But if
> >	 * we have some of them active, restore the old state.
> >	 */
> >	if (hw_breakpoint_active())
> >		hw_breakpoint_restore();
> >
> >	kvm_get_msr(vcpu, MSR_IA32_TSC,&vcpu->arch.last_guest_tsc);
> >
> >	vcpu->mode = OUTSIDE_GUEST_MODE;
> >	smp_wmb();
> >	local_irq_enable();  /*<--- boom! */
> 
> Preemption is still disabled at this point.  Where does the
> "scheduling while atomic" come from?  Nothing in this area attempts
> to schedule.
> 
0x10000000 in preemption counter is PREEMPT_ACTIVE, so this looks like
preemptable kernel tries to preempt itself.

> The preemption counter is 0x10000100, indicating zero preempt depth
> (wrong for this point, should be 1), and 1 softirq depth (doesn't
> make much sense).  Looks very wrong, like the preempt mixup that
> occurred recently on some non-x86_64 archs.
> 
> Can you post some disassembly around %rip?
> 
> -- 
> error compiling committee.c: too many arguments to function
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.


* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19  9:13       ` Gleb Natapov
@ 2011-07-19  9:16         ` Avi Kivity
  2011-07-19  9:18           ` Gleb Natapov
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2011-07-19  9:16 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Stefan Hajnoczi, Berg, Johannes, kvm, Intel Linux Wireless

On 07/19/2011 12:13 PM, Gleb Natapov wrote:
> >  Preemption is still disabled at this point.  Where does the
> >  "scheduling while atomic" come from?  Nothing in this area attempts
> >  to schedule.
> >
> 0x10000000 in preemption counter is PREEMPT_ACTIVE, so this looks like
> preemptable kernel tries to preempt itself.
>

Yes, it's the wlan interrupt coming in after local_irq_enable().  But it 
still doesn't make sense.  We're in a preempt_disable() section, so the 
scheduler shouldn't have been called in the first place.


-- 
error compiling committee.c: too many arguments to function



* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19  9:16         ` Avi Kivity
@ 2011-07-19  9:18           ` Gleb Natapov
  0 siblings, 0 replies; 11+ messages in thread
From: Gleb Natapov @ 2011-07-19  9:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Stefan Hajnoczi, Berg, Johannes, kvm, Intel Linux Wireless

On Tue, Jul 19, 2011 at 12:16:54PM +0300, Avi Kivity wrote:
> On 07/19/2011 12:13 PM, Gleb Natapov wrote:
> >>  Preemption is still disabled at this point.  Where does the
> >>  "scheduling while atomic" come from?  Nothing in this area attempts
> >>  to schedule.
> >>
> >0x10000000 in preemption counter is PREEMPT_ACTIVE, so this looks like
> >preemptable kernel tries to preempt itself.
> >
> 
> Yes, it's the wlan interrupt coming in after local_irq_enable().  But
> it still doesn't make sense.  We're in a preempt_disable() section,
> so the scheduler shouldn't have been called in the first place.
> 
> 
Yes, and the softirq depth count should prevent that too.  Very strange.

--
			Gleb.


* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19  9:04     ` Avi Kivity
  2011-07-19  9:13       ` Gleb Natapov
@ 2011-07-19 14:23       ` Stefan Hajnoczi
  2011-07-19 16:51         ` Avi Kivity
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-07-19 14:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Berg, Johannes, kvm, Intel Linux Wireless

On Tue, Jul 19, 2011 at 10:04 AM, Avi Kivity <avi@redhat.com> wrote:
> On 07/19/2011 12:51 AM, Stefan Hajnoczi wrote:
>>
>> On Mon, Jul 18, 2011 at 9:18 PM, Berg, Johannes<johannes.berg@intel.com>
>>  wrote:
>> >>  Today I encountered a "BUG: scheduling while atomic" from kvm.ko when
>> >>  resuming the host from suspend-to-RAM.  I captured as much of the oops
>> >> as
>> >>  was displayed on screen:
>> >>
>> >>  http://vmsplice.net/~stefan/panic1.jpg
>> >>  http://vmsplice.net/~stefan/panic2.jpg
>> >>
>> >>  It looks like the iwlagn driver may have crashed in an interrupt
>> >> handler and the
>> >>  kvm-related panic was triggered in the aftermath.  Any ideas?
>> >
>> >  This doesn't look like iwlagn is involved at all -- the fact that it
>> > comes up in the backtrace seems to be an artifact of backtracing not being
>> > perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no
>> > reason to believe that iwlagn should crash kvm.
>>
>> RIP seems to be arch/x86/kvm/x86.c:vcpu_enter_guest():
>>
>>        preempt_disable();
>>
>>        kvm_x86_ops->prepare_guest_switch(vcpu);
>>        if (vcpu->fpu_active)
>>                kvm_load_guest_fpu(vcpu);
>>        kvm_load_guest_xcr0(vcpu);
>>
>>        vcpu->mode = IN_GUEST_MODE;
>>
>>        /* We should set ->mode before check ->requests,
>>         * see the comment in make_all_cpus_request.
>>         */
>>        smp_mb();
>>
>>        local_irq_disable();
>>
>>        if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>            || need_resched() || signal_pending(current)) {
>>                vcpu->mode = OUTSIDE_GUEST_MODE;
>>                smp_wmb();
>>                local_irq_enable();
>>                preempt_enable();
>>                kvm_x86_ops->cancel_injection(vcpu);
>>                r = 1;
>>                goto out;
>>        }
>>
>>        srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>>
>>        kvm_guest_enter();
>>
>>        if (unlikely(vcpu->arch.switch_db_regs)) {
>>                set_debugreg(0, 7);
>>                set_debugreg(vcpu->arch.eff_db[0], 0);
>>                set_debugreg(vcpu->arch.eff_db[1], 1);
>>                set_debugreg(vcpu->arch.eff_db[2], 2);
>>                set_debugreg(vcpu->arch.eff_db[3], 3);
>>        }
>>
>>        trace_kvm_entry(vcpu->vcpu_id);
>>        kvm_x86_ops->run(vcpu);
>>
>>        /*
>>         * If the guest has used debug registers, at least dr7
>>         * will be disabled while returning to the host.
>>         * If we don't have active breakpoints in the host, we don't
>>         * care about the messed up debug address registers. But if
>>         * we have some of them active, restore the old state.
>>         */
>>        if (hw_breakpoint_active())
>>                hw_breakpoint_restore();
>>
>>        kvm_get_msr(vcpu, MSR_IA32_TSC,&vcpu->arch.last_guest_tsc);
>>
>>        vcpu->mode = OUTSIDE_GUEST_MODE;
>>        smp_wmb();
>>        local_irq_enable();  /*<--- boom! */
>
> Preemption is still disabled at this point.  Where does the "scheduling
> while atomic" come from?  Nothing in this area attempts to schedule.
>
> The preemption counter is 0x10000100, indicating zero preempt depth (wrong
> for this point, should be 1), and 1 softirq depth (doesn't make much sense).
>  Looks very wrong, like the preempt mixup that occured on some archs that
> are not x86_64 recently.
>
> Can you post some disassembly around %rip?

Here is the kvm.ko disassembly for %rip = kvm_arch_vcpu_ioctl_run+0x927.

I think at runtime the pvops bit should be patched and we should just
have a sti instruction?

I don't understand the backtrace.  It prints out the interrupt handler
stack frames but %rip appears to be down in kvm.ko.  If there is an
interrupt handler running shouldn't %rip be in there?

Unfortunately the full oops scrolled off the screen so I only have
panic1.jpg and panic2.jpg.  We're missing earlier information.  Is it
possible that there was another panic first and that kvm.ko is just
dying in the aftermath?

   11436:       44 8b 6b 24             mov    0x24(%rbx),%r13d
   1143a:       74 23                   je     1145f <kvm_arch_vcpu_ioctl_run+0x8dd>
   1143c:       4c 8b 25 00 00 00 00    mov    0x0(%rip),%r12        # 11443 <kvm_arch_vcpu_ioctl_run+0x8c1>
                        1143f: R_X86_64_PC32    __tracepoint_kvm_entry+0x1c
   11443:       4d 85 e4                test   %r12,%r12
   11446:       74 17                   je     1145f <kvm_arch_vcpu_ioctl_run+0x8dd>
   11448:       49 8b 7c 24 08          mov    0x8(%r12),%rdi
   1144d:       44 89 ee                mov    %r13d,%esi
   11450:       41 ff 14 24             callq  *(%r12)
   11454:       49 83 c4 10             add    $0x10,%r12
   11458:       49 83 3c 24 00          cmpq   $0x0,(%r12)
   1145d:       eb e7                   jmp    11446 <kvm_arch_vcpu_ioctl_run+0x8c4>
   1145f:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 11466 <kvm_arch_vcpu_ioctl_run+0x8e4>
                        11462: R_X86_64_PC32    kvm_x86_ops+0xfffffffffffffffc
   11466:       48 89 df                mov    %rbx,%rdi
   11469:       ff 90 48 01 00 00       callq  *0x148(%rax)
   1146f:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
   11476:       00 00
                        11474: R_X86_64_32S     cpu_dr7
   11478:       a8 aa                   test   $0xaa,%al
   1147a:       74 05                   je     11481 <kvm_arch_vcpu_ioctl_run+0x8ff>
   1147c:       e8 00 00 00 00          callq  11481 <kvm_arch_vcpu_ioctl_run+0x8ff>
                        1147d: R_X86_64_PC32    hw_breakpoint_restore+0xfffffffffffffffc
   11481:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 11488 <kvm_arch_vcpu_ioctl_run+0x906>
                        11484: R_X86_64_PC32    kvm_x86_ops+0xfffffffffffffffc
   11488:       48 8b 54 24 50          mov    0x50(%rsp),%rdx
   1148d:       be 10 00 00 00          mov    $0x10,%esi
   11492:       48 89 df                mov    %rbx,%rdi
   11495:       ff 90 80 00 00 00       callq  *0x80(%rax)
   1149b:       c7 43 2c 00 00 00 00    movl   $0x0,0x2c(%rbx)
   114a2:       ff 14 25 00 00 00 00    callq  *0x0
                        114a5: R_X86_64_32S     pv_irq_ops+0x18

   114a9:       ff 83 b8 00 00 00       incl   0xb8(%rbx)   <--- boom!

   114af:       48 8b 44 24 40          mov    0x40(%rsp),%rax
   114b4:       83 60 14 ef             andl   $0xffffffef,0x14(%rax)
   114b8:       48 8b 3b                mov    (%rbx),%rdi
   114bb:       48 83 c7 38             add    $0x38,%rdi
   114bf:       e8 00 00 00 00          callq  114c4 <kvm_arch_vcpu_ioctl_run+0x942>
                        114c0: R_X86_64_PC32    __srcu_read_lock+0xfffffffffffffffc
   114c4:       89 43 28                mov    %eax,0x28(%rbx)
   114c7:       83 3d 00 00 00 00 04    cmpl   $0x4,0x0(%rip)        # 114ce <kvm_arch_vcpu_ioctl_run+0x94c>
                        114c9: R_X86_64_PC32    prof_on+0xfffffffffffffffb
   114ce:       75 28                   jne    114f8 <kvm_arch_vcpu_ioctl_run+0x976>
   114d0:       be 10 00 00 00          mov    $0x10,%esi
   114d5:       48 89 df                mov    %rbx,%rdi
   114d8:       e8 46 bb ff ff          callq  d023 <kvm_register_read>
   114dd:       83 3d 00 00 00 00 04    cmpl   $0x4,0x0(%rip)        # 114e4 <kvm_arch_vcpu_ioctl_run+0x962>
                        114df: R_X86_64_PC32    prof_on+0xfffffffffffffffb
   114e4:       75 12                   jne    114f8 <kvm_arch_vcpu_ioctl_run+0x976>

For comparison here are the disassembled "Code" bytes from the panic
message.  I think the photo truncates the full output so the last
instruction is incomplete, but it matches up with 0x11488 above:

   0:   99                      cltd
   1:   b2 e0                   mov    $0xe0,%dl
   3:   48 8b 05 f8 62 02 00    mov    0x262f8(%rip),%rax        # 0x26302
   a:   48 8b 54 24 50          mov    0x50(%rsp),%rdx
   f:   be 10 00 00 00          mov    $0x10,%esi
  14:   48 89 df                mov    %rbx,%rdi
  17:   ff                      .byte 0xff
  18:   90                      nop
  19:   80                      .byte 0x80

Stefan


* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19 14:23       ` Stefan Hajnoczi
@ 2011-07-19 16:51         ` Avi Kivity
  2011-07-19 20:20           ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2011-07-19 16:51 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Berg, Johannes, kvm, Intel Linux Wireless

On 07/19/2011 05:23 PM, Stefan Hajnoczi wrote:
> >
> >  Can you post some disassembly around %rip?
>
> Here is the kvm.ko disassembly for %rip = kvm_arch_vcpu_ioctl_run+0x927.
>
> I think at runtime the pvops bit should be patched and we should just
> have a sti instruction?

Yes, it's a local_irq_enable() (followed by a 4-byte nop; %rip points 
past the nop due to interrupt shadow).

> I don't understand the backtrace.  It prints out the interrupt handler
> stack frames but %rip appears to be down in kvm.ko.  If there is an
> interrupt handler running shouldn't %rip be in there?

The interrupt handler finished running; just before returning it called 
the scheduler (which it shouldn't have, since we're in a 
preempt_disabled() section).

> Unfortunately the full oops scrolled off the screen so I only have
> panic1.jpg and panic2.jpg.  We're missing earlier information.  Is it
> possible that there was another panic first and that kvm.ko is just
> dying in the aftermath?

It's possible.  Is this reproducible?  If so please set up netconsole or 
kdump.
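
A minimal netconsole setup, for capturing such an oops over the
network, looks roughly like this (the interface name, addresses, and
MAC below are placeholders; see Documentation/networking/netconsole.txt
for the parameter format):

```shell
# Sender: forward kernel messages over UDP from eth0 to a collector at
# 10.0.0.2:6666 (interface, IPs, and MAC are placeholders).
modprobe netconsole netconsole=@/eth0,6666@10.0.0.2/00:11:22:33:44:55

# Collector: listen for the messages (flag spelling varies by netcat
# flavor; some builds want "nc -u -l -p 6666").
nc -u -l 6666
```
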

-- 
error compiling committee.c: too many arguments to function



* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19 16:51         ` Avi Kivity
@ 2011-07-19 20:20           ` Stefan Hajnoczi
  2011-07-19 20:27             ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-07-19 20:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Berg, Johannes, kvm, Intel Linux Wireless

On Tue, Jul 19, 2011 at 5:51 PM, Avi Kivity <avi@redhat.com> wrote:
> On 07/19/2011 05:23 PM, Stefan Hajnoczi wrote:
>> Unfortunately the full oops scrolled off the screen so I only have
>> panic1.jpg and panic2.jpg.  We're missing earlier information.  Is it
>> possible that there was another panic first and that kvm.ko is just
>> dying in the aftermath?
>
> It's possible.  Is this reproducible?  If so please set up netconsole or
> kdump.

I only had this oops once and suspect it will be hard to reproduce but
will run a few suspend/resume cycles to try...

Stefan


* Re: [Ilw] iwlagn and kvm related "BUG: scheduling while atomic" after resuming
  2011-07-19 20:20           ` Stefan Hajnoczi
@ 2011-07-19 20:27             ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-07-19 20:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Berg, Johannes, kvm, Intel Linux Wireless

On Tue, Jul 19, 2011 at 9:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Jul 19, 2011 at 5:51 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 07/19/2011 05:23 PM, Stefan Hajnoczi wrote:
>>> Unfortunately the full oops scrolled off the screen so I only have
>>> panic1.jpg and panic2.jpg.  We're missing earlier information.  Is it
>>> possible that there was another panic first and that kvm.ko is just
>>> dying in the aftermath?
>>
>> It's possible.  Is this reproducible?  If so please set up netconsole or
>> kdump.
>
> I only had this oops once and suspect it will be hard to reproduce but
> will run a few suspend/resume cycles to try...

No oops after 10 suspend/resume cycles while running ping -f 10.0.0.1
to keep the wireless card busy.

I guess I should run with kdump in case this happens again in the future.

Stefan



Thread overview: 11 messages
2011-07-18 19:14 iwlagn and kvm related "BUG: scheduling while atomic" after resuming Stefan Hajnoczi
2011-07-18 20:18 ` [Ilw] " Berg, Johannes
2011-07-18 21:51   ` Stefan Hajnoczi
2011-07-19  9:04     ` Avi Kivity
2011-07-19  9:13       ` Gleb Natapov
2011-07-19  9:16         ` Avi Kivity
2011-07-19  9:18           ` Gleb Natapov
2011-07-19 14:23       ` Stefan Hajnoczi
2011-07-19 16:51         ` Avi Kivity
2011-07-19 20:20           ` Stefan Hajnoczi
2011-07-19 20:27             ` Stefan Hajnoczi
