From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Wei Liu <Wei.Liu2@citrix.com>
Cc: xen-devel@lists.xen.org
Subject: Re: HVM bug: system crashes after offline online a vcpu
Date: Wed, 19 Dec 2012 11:04:55 -0500
Message-ID: <20121219160455.GA12077@phenom.dumpdata.com>
In-Reply-To: <1355411537.8376.52.camel@iceland>

On Thu, Dec 13, 2012 at 03:12:17PM +0000, Wei Liu wrote:
> Hi Konrad
> 
> I encountered a bug when trying to take a cpu offline and then bring it
> online again in HVM. As I'm not very familiar with HVM, I cannot come up
> with a quick fix.

I took the two patches you posted, and they are in v3.8 now.

It seems that there are bugs in the offline/online code, though.

I did this in a PV guest:
# echo 0 > /sys/devices/system/cpu/cpu3/online
# echo 1 > /sys/devices/system/cpu/cpu3/online

and it blows up (with or without your patches).
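
For anyone who wants to repeat the offline/online cycle, here is a rough
sketch of the same two writes wrapped in a function. The names cycle_cpu,
SYSFS_CPU_ROOT and CYCLE_DELAY are made up for illustration and are not part
of any tool; the override of the sysfs root is only there so the function can
be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch only: offline a CPU, wait, then online it again through sysfs.
# SYSFS_CPU_ROOT, CYCLE_DELAY and cycle_cpu are hypothetical names.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

cycle_cpu() {
    cpu="$1"
    ctl="$SYSFS_CPU_ROOT/cpu$cpu/online"

    # Bail out if this CPU has no writable "online" control file.
    if [ ! -w "$ctl" ]; then
        echo "cpu$cpu: no writable online control at $ctl" >&2
        return 1
    fi

    echo 0 > "$ctl" || return 1     # take the CPU offline
    sleep "${CYCLE_DELAY:-1}"       # give the kernel a moment to settle
    echo 1 > "$ctl" || return 1     # bring it back online

    # Report the state the control file shows afterwards.
    echo "cpu$cpu online=$(cat "$ctl")"
}
```

Run as root inside the guest, "cycle_cpu 3" should be equivalent to the two
echo commands, after which dmesg can be checked for a trace.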

Have you seen something similar to this:

[  106.166795] BUG: scheduling while atomic: swapper/2/0/0x00000000
[  106.167168] microcode: CPU2 sig=0x206a7, pf=0x2, revision=0x17
[  106.167566] Modules linked in: sg sd_mod dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod libcrc32c crc32c radeon fbcon tileblit font bitblit softcursor ttm drm_kms_helper crc32c_intel xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs xen_privcmd [last unloaded: dump_dma]
[  106.169286] Pid: 0, comm: swapper/2 Tainted: G           O 3.5.0-rc3upstream-00139-gb1849b3-dirty #1
[  106.170152] Call Trace:
[  106.170598]  [<ffffffff8109bcbd>] __schedule_bug+0x4d/0x60
[  106.171042]  [<ffffffff815be0fc>] __schedule+0x69c/0x760
[  106.171469]  [<ffffffff815be284>] schedule+0x24/0x70
[  106.171890]  [<ffffffff8103fbe9>] cpu_idle+0xc9/0xe0
[  106.172309]  [<ffffffff81033e79>] ? xen_irq_enable_direct_reloc+0x4/0x4
[  106.172726]  [<ffffffff815b1c5d>] cpu_bringup_and_idle+0xe/0x10
[  106.174533] BUG: scheduling while atomic: swapper/2/0/0x00000000
?
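
As a reading aid for the first line of that trace: as far as I know, the
fields after "scheduling while atomic:" are the task's comm, its pid and the
preempt count in hex, separated by slashes. A small sketch that splits them,
assuming that layout (decode_sched_bug is a made-up helper name):

```shell
#!/bin/sh
# Sketch only: split a "BUG: scheduling while atomic" line into its fields.
# Assumes the layout comm/pid/0xPREEMPT_COUNT; decode_sched_bug is a
# hypothetical helper name.
decode_sched_bug() {
    # Drop everything up to and including the message prefix.
    fields="${1#*scheduling while atomic: }"

    # Parse from the right, because comm itself may contain a slash
    # (for example "swapper/2").
    preempt="${fields##*/}"
    rest="${fields%/*}"
    pid="${rest##*/}"
    comm="${rest%/*}"

    echo "comm=$comm pid=$pid preempt_count=$preempt"
}
```

Fed the line above, this gives comm "swapper/2", pid 0 and a preempt count
field of 0x00000000, which matches the "Pid: 0, comm: swapper/2" line further
down.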

> 
> The HVM DomU is configured with 4 vcpus. After booting to a command
> prompt, I do the following operations.
> 
> 
> With Debian's default 2.6.32-5-amd64 kernel, the last log is:
> 
>     Booting processor 3 APIC 0x6 ip 0x6000
> 
> With my own kernel which is of version 3.5, I'm able to get more logs:
> 
> [   44.047358] Booting Node 0 Processor 3 APIC 0x6
> [   44.061201] ------------[ cut here ]------------
> [   44.065186] kernel BUG at kernel/hrtimer.c:1259!
> [   44.065186] invalid opcode: 0000 [#1] SMP
> [   44.065186] CPU 3
> [   44.065186] Modules linked in:
> [   44.065186]
> [   44.065186] Pid: 0, comm: swapper/3 Not tainted 3.5.0-xen-evtchn+ #50 Xen HVM domU
> [   44.065186] RIP: 0010:[<ffffffff8105682e>]  [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
> [   44.065186] RSP: 0000:ffff88000f463de8  EFLAGS: 00010046
> [   44.065186] RAX: ffffffff8105680a RBX: ffff88000f46e640 RCX: 00000000fffffffa
> [   44.065186] RDX: 00000000fffffffa RSI: 0000000000000000 RDI: ffff88000f46bd80
> [   44.065186] RBP: 0000000000000057 R08: ffff88000e000b40 R09: 0000000000000019
> [   44.065186] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88000e6e8e00
> [   44.065186] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [   44.065186] FS:  0000000000000000(0000) GS:ffff88000f460000(0000) knlGS:0000000000000000
> [   44.065186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   44.065186] CR2: 0000000000000000 CR3: 000000000181b000 CR4: 00000000000007e0
> [   44.065186] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   44.065186] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   44.065186] Process swapper/3 (pid: 0, threadinfo ffff88000e62e000, task ffff88000e62aea0)
> [   44.065186] Stack:
> [   44.065186]  0000000000000001 ffff88000f46e680 ffffffff81013711 00000008cfba9b27
> [   44.065186]  00000000fffffffa ffff88000e6e97c0 0000000000000057 ffff88000e6e8e00
> [   44.065186]  0000000000000000 0000000000000001 0000000000000000 ffffffff81006954
> [   44.065186] Call Trace:
> [   44.065186]  <IRQ>
> [   44.065186]  [<ffffffff81013711>] ? paravirt_sched_clock+0x5/0x8
> [   44.065186]  [<ffffffff81006954>] ? xen_timer_interrupt+0x26/0x162
> [   44.065186]  [<ffffffff8109a220>] ? check_for_new_grace_period.isra.32+0x90/0x9a
> [   44.065186]  [<ffffffff810956df>] ? handle_irq_event_percpu+0x32/0x1b0
> [   44.065186]  [<ffffffff8128f88b>] ? irq_get_handler_data+0x7/0x16
> [   44.065186]  [<ffffffff81097e39>] ? handle_percpu_irq+0x3a/0x4f
> [   44.065186]  [<ffffffff8128f9ec>] ? __xen_evtchn_do_upcall_l2+0x131/0x1c0
> [   44.065186]  [<ffffffff812913d3>] ? xen_evtchn_do_upcall+0x27/0x37
> [   44.065186]  [<ffffffff8140081a>] ? xen_hvm_callback_vector+0x6a/0x70
> [   44.065186]  <EOI>
> [   44.065186]  [<ffffffff81094b8f>] ? cpumask_next+0x17/0x19
> [   44.065186]  [<ffffffff813eb75b>] ? start_secondary+0x184/0x1e2
> [   44.065186]  [<ffffffff813eb757>] ? start_secondary+0x180/0x1e2
> [   44.065186]  [<ffffffff813eb5d7>] ? set_cpu_sibling_map+0x40e/0x40e
> [   44.065186] Code: 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 c7 c3 40 e6 00 00 48 83 ec 28 65 48 03 1c 25 e8 db 00 00 83 7b 18 00 75 02 <0f> 0b 48 ff 43 20 48 bd ff ff ff ff ff ff ff 7f 41 be 03 00 00
> [   44.065186] RIP  [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
> [   44.065186]  RSP <ffff88000f463de8>
> [   44.065186] ---[ end trace 9366352b116a03db ]---
> [   44.065186] Kernel panic - not syncing: Fatal exception in interrupt
> 
> And if I offline and then online cpu 2 in 2.6.32-5-amd64:
> 
> [   27.933928] Booting processor 2 APIC 0x4 ip 0x6000
> [   25.708098] Initializing CPU#2
> [   25.708098] CPU: L1 I cache: 32K, L1 D cache: 32K
> [   25.708098] CPU: L2 cache: 6144K
> [   25.708098] CPU 2/0x4 -> Node 0
> [   25.708098] CPU: Physical Processor ID: 0
> [   25.708098] CPU: Processor Core ID: 4
> [   28.028234] CPU2: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz stepping 07
> [   28.069320] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
> [   25.708098] installing Xen timer for CPU 2
> [   28.098101] CPU0 attaching NULL sched-domain.
> [   28.098106] CPU1 attaching NULL sched-domain.
> [   28.098110] CPU3 attaching NULL sched-domain.
> [   28.098092] ------------[ cut here ]------------
> [   28.098092] WARNING: at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_none/kernel/irq/chip.c:88 unbind_from_irq+0x147/0x159()
> [   28.098092] Hardware name: HVM domU
> [   28.144127] CPU0 attaching sched-domain:
> [   28.144131]  domain 0: span 0-3 level CPU
> [   28.144133]   groups: 0 1 2 3
> [   28.144139] CPU1 attaching sched-domain:
> [   28.144142]  domain 0: span 0-3 level CPU
> [   28.144145]   groups: 1 2 3 0
> [   28.144150] CPU2 attaching sched-domain:
> [   28.144152]  domain 0: span 0-3 level CPU
> [   28.144155]   groups: 2 3 0 1
> [   28.144160] CPU3 attaching sched-domain:
> [   28.144162]  domain 0: span 0-3 level CPU
> [   28.144165]   groups: 3 0 1 2
> [   28.209159] Destroying IRQ18 without calling free_irq
> [   28.215985] Modules linked in: loop parport_pc parport psmouse evdev serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr i2c_piix4 i2c_core button processor ext3 jbd mbcache ata_generic ata_piix libata floppy thermal thermal_sys xen_blkfront scsi_mod [last unloaded: scsi_wait_scan]
> [   28.224050] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
> [   28.224050] Call Trace:
> [   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [   28.224050]  [<ffffffff8104dd7c>] ? warn_slowpath_common+0x77/0xa3
> [   28.224050]  [<ffffffff8104de04>] ? warn_slowpath_fmt+0x51/0x59
> [   28.224050]  [<ffffffff810e4493>] ? get_partial_node+0x15/0x85
> [   28.224050]  [<ffffffff811966fd>] ? kvasprintf+0x41/0x68
> [   28.224050]  [<ffffffff8109639e>] ? dynamic_irq_cleanup_x+0x4b/0xc2
> [   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
> [   28.224050]  [<ffffffff811ef5b7>] ? bind_virq_to_irqhandler+0x14c/0x15d
> [   28.224050]  [<ffffffff8100df77>] ? xen_timer_interrupt+0x0/0x18d
> [   28.224050]  [<ffffffff812f5121>] ? set_cpu_sibling_map+0x2f4/0x311
> [   28.224050]  [<ffffffff8100df0d>] ? xen_setup_timer+0x55/0xa2
> [   28.224050]  [<ffffffff8100df71>] ? xen_hvm_setup_cpu_clockevents+0x17/0x1d
> [   28.224050]  [<ffffffff812f52fc>] ? start_secondary+0x17c/0x185
> [   28.224050] ---[ end trace db1493923b5e103d ]---
> 
> The logs for cpu 2 in my 3.5 kernel are identical to those for cpu 3.
> 
> 
> Wei.
> 
