xen-devel.lists.xenproject.org archive mirror
* Strange kernel BUG() on PV DomU boot
@ 2012-06-22 12:21 Joanna Rutkowska
From: Joanna Rutkowska @ 2012-06-22 12:21 UTC
  To: xen-devel@lists.xensource.com; +Cc: Marek Marczykowski


Hello,

Every several weeks, sometimes more often, I run into a strange DomU
kernel BUG() that shows up with the message quoted at the end of this
mail. The Dom0 and VM kernels are 3.2.7 pvops and the Xen hypervisor is
4.1.2, both with only minor and (I think) irrelevant modifications for
Qubes.

The bug is very hard to reproduce, but once this BUG() starts being
signaled, it consistently prevents me from starting any new VMs in the
system (e.g. I have now tried over a dozen times, and the VM boot fails
every time).

The following lines in the VM kernel are responsible for signaling the
BUG():

  if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
        BUG();

...yet there is nothing in xl dmesg that would explain why this
hypercall fails. That is because there are no printk()s in the
hypercall code:

    case VCPUOP_initialise:
        if ( v->vcpu_info == &dummy_vcpu_info )
            return -EINVAL;

        if ( (ctxt = xmalloc(struct vcpu_guest_context)) == NULL )
            return -ENOMEM;

        if ( copy_from_guest(ctxt, arg, 1) )
        {
            xfree(ctxt);
            return -EFAULT;
        }

        domain_lock(d);
        rc = -EEXIST;
        if ( !v->is_initialised )
            rc = boot_vcpu(d, vcpuid, ctxt);
        domain_unlock(d);

        xfree(ctxt);
        break;
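
As a rough aid for reading the handler above, here is a small userspace
sketch that maps a return code to the exit that would produce it. The
names classify_vcpuop_initialise, failure_site and the RC_* constants
are made up for this illustration and are not part of Xen or Linux. It
also assumes that boot_vcpu() never returns one of the four codes
itself, which the excerpt does not show, so a match is a hint and not
proof.

```c
/* Numeric errno values as used by Linux and Xen on x86, defined locally
 * so the sketch does not depend on the host's <errno.h>. */
enum { RC_ENOMEM = 12, RC_EFAULT = 14, RC_EEXIST = 17, RC_EINVAL = 22 };

/* Hypothetical labels for the exits of the VCPUOP_initialise case. */
enum failure_site {
    SITE_NONE,                /* rc == 0, no failure */
    SITE_DUMMY_VCPU_INFO,     /* v->vcpu_info == &dummy_vcpu_info */
    SITE_XMALLOC,             /* xmalloc() returned NULL */
    SITE_COPY_FROM_GUEST,     /* copy_from_guest() failed */
    SITE_ALREADY_INITIALISED, /* v->is_initialised was already set */
    SITE_BOOT_VCPU            /* any other code: propagated from boot_vcpu() */
};

/* Map a VCPUOP_initialise return code to the exit that would produce it,
 * ASSUMING boot_vcpu() does not return one of the four codes itself. */
enum failure_site classify_vcpuop_initialise(long rc)
{
    switch (rc) {
    case 0:          return SITE_NONE;
    case -RC_EINVAL: return SITE_DUMMY_VCPU_INFO;
    case -RC_ENOMEM: return SITE_XMALLOC;
    case -RC_EFAULT: return SITE_COPY_FROM_GUEST;
    case -RC_EEXIST: return SITE_ALREADY_INITIALISED;
    default:         return SITE_BOOT_VCPU;
    }
}

/* Human-readable name for a failure site, for use in log messages. */
const char *failure_site_name(enum failure_site site)
{
    switch (site) {
    case SITE_NONE:                return "no failure";
    case SITE_DUMMY_VCPU_INFO:     return "dummy vcpu_info check";
    case SITE_XMALLOC:             return "xmalloc() of vcpu_guest_context";
    case SITE_COPY_FROM_GUEST:     return "copy_from_guest()";
    case SITE_ALREADY_INITIALISED: return "vCPU already initialised";
    case SITE_BOOT_VCPU:           return "boot_vcpu()";
    }
    return "unknown";
}
```

Since the hypervisor prints nothing on these paths, the return code seen
by the guest is the only clue to which branch was taken.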

So, looking at the above, it seems it might be failing because xmalloc()
fails. However, Xen seems to have enough memory, as reported by
xl info:

total_memory           : 8074
free_memory            : 66
free_cpus              : 0
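
One more data point can be read from the oops at the end of this
message. The Code: bytes just before the trapping instruction are a
call, test %eax,%eax, je and then ud2, so RAX should still hold the
hypercall's return value when BUG() fires. The sketch below decodes
such a register value into a signed error code. reg_to_errno is a
made-up helper name, and the 4095 bound mirrors the range Linux treats
as error values.

```c
#include <stdint.h>

#define MAX_ERRNO 4095  /* upper bound Linux uses for encoded error values */

/* Interpret a raw 64-bit register value as a negative error code.
 * Returns 0 if the value is not in the range [-MAX_ERRNO, -1]. */
int reg_to_errno(uint64_t reg)
{
    /* Two's-complement negation done in unsigned arithmetic, which is
     * well defined in C even when it wraps. */
    uint64_t magnitude = ~reg + 1;

    if (magnitude == 0 || magnitude > MAX_ERRNO)
        return 0;
    return -(int)magnitude;
}
```

For the RAX value in the oops, reg_to_errno(0xfffffffffffffff4ULL)
gives -12, which is -ENOMEM on both Linux and Xen. That is consistent
with the xmalloc() path, although boot_vcpu() could in principle return
the same code.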

Any ideas what might be the cause?

FWIW, the actual oops message is below.

Thanks,
joanna.




[    0.004356] ------------[ cut here ]------------
[    0.004361] kernel BUG at /home/user/qubes-src/kernel/kernel-3.2.7/linux-3.2.7/arch/x86/xen/smp.c:322!
[    0.004366] invalid opcode: 0000 [#1] SMP
[    0.004370] CPU 0
[    0.004372] Modules linked in:
[    0.004376]
[    0.004379] Pid: 1, comm: swapper/0 Not tainted 3.2.7-5.pvops.qubes.x86_64 #1
[    0.004385] RIP: e030:[<ffffffff8143a229>]  [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[    0.004396] RSP: e02b:ffff880018063e10  EFLAGS: 00010282
[    0.004399] RAX: fffffffffffffff4 RBX: ffff8800180c0000 RCX: 0000000000000000
[    0.004404] RDX: ffff8800180c0000 RSI: 0000000000000001 RDI: 0000000000000000
[    0.004408] RBP: ffff880018063e50 R08: 00003ffffffff000 R09: ffff880000000000
[    0.004412] R10: ffff8800180c0000 R11: 0000000000002000 R12: 0000000000000001
[    0.004417] R13: ffff880018f82d30 R14: ffff88001806e0c0 R15: 00000000000a98ed
[    0.004429] FS:  0000000000000000(0000) GS:ffff880018f5c000(0000) knlGS:0000000000000000
[    0.004436] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.004441] CR2: 0000000000000000 CR3: 0000000001805000 CR4: 0000000000002660
[    0.004447] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.004452] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.004459] Process swapper/0 (pid: 1, threadinfo ffff880018062000, task ffff880018060040)
[    0.004465] Stack:
[    0.004469]  ffff88001806e0c0 0000000000018f7b ffffffff81866c80 0000000000000001
[    0.004479]  ffff88001806e0c0 0000000000000001 ffffffff81866c80 0000000000000001
[    0.004490]  ffff880018063e80 ffffffff8143a2e1 ffff880018063e70 0000000000000000
[    0.004500] Call Trace:
[    0.004507]  [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[    0.004513]  [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[    0.004520]  [<ffffffff81440bbf>] cpu_up+0x75/0x85
[    0.004527]  [<ffffffff818998f1>] smp_init+0x46/0x9e
[    0.004533]  [<ffffffff8188263c>] kernel_init+0x89/0x142
[    0.004541]  [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[    0.004549]  [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[    0.004558]  [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[    0.004565]  [<ffffffff814518b0>] ? gs_change+0x13/0x13
[    0.004570] Code: 74 0d 48 ba ff ff ff ff ff ff ff 3f 48 21 d0 48 c1 e0 0c 31 ff 49 63 f4 48 89 83 90 13 00 00 48 89 da e8 db 70 bc ff 85 c0 74 04 <0f> 0b eb fe 48 89 df e8 db f6 ce ff 31 c0 48 83 c4 18 5b 41 5c
[    0.004653] RIP  [<ffffffff8143a229>] cpu_initialize_context+0x263/0x280
[    0.004661]  RSP <ffff880018063e10>
[    0.004672] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.004686] Kernel panic - not syncing: Attempted to kill init!
[    0.004692] Pid: 1, comm: swapper/0 Tainted: G      D 3.2.7-5.pvops.qubes.x86_64 #1
[    0.004698] Call Trace:
[    0.004704]  [<ffffffff81444c4a>] panic+0x8c/0x1a2
[    0.004712]  [<ffffffff81059814>] ? enqueue_entity+0x74/0x2f0
[    0.004719]  [<ffffffff8106113d>] forget_original_parent+0x34d/0x360
[    0.004728]  [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[    0.004735]  [<ffffffff814478b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[    0.004743]  [<ffffffff8104acb3>] ? sched_move_task+0x93/0x150
[    0.004750]  [<ffffffff81061162>] exit_notify+0x12/0x190
[    0.004756]  [<ffffffff81062a3d>] do_exit+0x1ed/0x3e0
[    0.004763]  [<ffffffff814489e6>] oops_end+0xa6/0xf0
[    0.004770]  [<ffffffff81016476>] die+0x56/0x90
[    0.004776]  [<ffffffff81448584>] do_trap+0xc4/0x170
[    0.004783]  [<ffffffff81014440>] do_invalid_op+0x90/0xb0
[    0.004790]  [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[    0.004799]  [<ffffffff81128ce4>] ? cache_grow.clone.0+0x2b4/0x3b0
[    0.004805]  [<ffffffff8100a05f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[    0.004812]  [<ffffffff810052f1>] ? pte_mfn_to_pfn+0x71/0xf0
[    0.004820]  [<ffffffff8145172b>] invalid_op+0x1b/0x20
[    0.004827]  [<ffffffff8143a229>] ? cpu_initialize_context+0x263/0x280
[    0.004834]  [<ffffffff8143a2e1>] xen_cpu_up+0x9b/0x115
[    0.004840]  [<ffffffff81440ad8>] _cpu_up+0x9c/0x10e
[    0.004846]  [<ffffffff81440bbf>] cpu_up+0x75/0x85
[    0.004852]  [<ffffffff818998f1>] smp_init+0x46/0x9e
[    0.004858]  [<ffffffff8188263c>] kernel_init+0x89/0x142
[    0.004864]  [<ffffffff814518b4>] kernel_thread_helper+0x4/0x10
[    0.004871]  [<ffffffff8144f973>] ? int_ret_from_sys_call+0x7/0x1b
[    0.004878]  [<ffffffff81447d7c>] ? retint_restore_args+0x5/0x6
[    0.004885]  [<ffffffff814518b0>] ? gs_change+0x13/0x13




Thread overview: 11+ messages
2012-06-22 12:21 Strange kernel BUG() on PV DomU boot Joanna Rutkowska
2012-06-22 12:26 ` Joanna Rutkowska
2012-06-22 12:38   ` Jan Beulich
2012-06-22 12:53     ` Handling of out of memory conditions (was: Re: Strange kernel BUG() on PV DomU boot) Joanna Rutkowska
2012-06-22 13:02       ` Jan Beulich
2012-06-22 13:11         ` Handling of out of memory conditions Joanna Rutkowska
2012-06-22 13:21           ` Jan Beulich
2012-06-22 13:24             ` Joanna Rutkowska
2012-06-22 14:46       ` Handling of out of memory conditions (was: Re: Strange kernel BUG() on PV DomU boot) George Dunlap
2012-06-22 15:22         ` George Dunlap
2012-06-25 15:39       ` Konrad Rzeszutek Wilk
