3.4.70+ kernel WARNING spew dysfunction on failed migration


* 3.4.70+ kernel WARNING spew dysfunction on failed migration
@ 2014-01-07 18:55 Ian Jackson
  2014-01-07 19:11 ` Konrad Rzeszutek Wilk
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Ian Jackson @ 2014-01-07 18:55 UTC (permalink / raw)
  To: konrad.wilk; +Cc: xen-devel

I did the following test:

   mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
   xl migrate debian.guest.osstest localhost

xl did what appears to be the right thing: it did most of the
migration, failed to run the block scripts at the end of the
migration, and destroyed the destination domain and instead resumed
the source guest.

However, the source guest immediately went mad spewing WARNINGs and
was after that no longer contactable via the network and not
apparently responsive on the console.  See below.

This is with:

  [    0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc
  version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013

For reasons I don't understand it doesn't seem to print the actual
kernel git hash in dmesg, but I think it was that from flight 22264,
i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
64-bit Xen.

Thanks,
Ian.

debian login: [  124.595658] PM: freeze of devices complete after 2.980 msecs
[  124.595991] PM: late freeze of devices complete after 0.013 msecs
[  124.600919] PM: noirq freeze of devices complete after 4.884 msecs
[  124.601105] Grant tables using version 2 layout.
[  124.601105] ------------[ cut here ]------------
[  124.601105] kernel BUG at drivers/xen/events.c:1582!
[  124.601105] invalid opcode: 0000 [#1] SMP 
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] 
[  124.601105] Pid: 6, comm: migration/0 Not tainted 3.4.70+ #1  
[  124.601105] EIP: 0061:[<c12f5d25>] EFLAGS: 00010082 CPU: 0
[  124.601105] EIP is at xen_irq_resume+0x215/0x370
[  124.601105] EAX: ffffffef EBX: deadbeef ECX: deadbeef EDX: 00000000
[  124.601105] ESI: c190b020 EDI: df461f24 EBP: df451eb8 ESP: df451e10
[  124.601105]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[  124.601105] CR0: 8005003b CR2: 08b7c8a8 CR3: 038f0000 CR4: 00002660
[  124.601105] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  124.601105] DR6: ffff0ff0 DR7: 00000400
[  124.601105] Process migration/0 (pid: 6, ti=df450000 task=df43d860 task.ti=df450000)
[  124.601105] Stack:
[  124.601105]  c104ea40 df451e18 c398b80c deadbeef df461f10 df451e58 c12f350a c19b165c
[  124.601105]  df451e94 00000003 df451e78 c190b080 c190b020 00000000 00000010 00000000
[  124.601105]  00000000 00000000 9420f17e 0008a6c2 fc798ba3 0008a6df 00000004 00000413
[  124.601105] Call Trace:
[  124.601105]  [<c104ea40>] ? xen_iret_crit_fixup+0x3c/0x3c
[  124.601105]  [<c12f350a>] ? gnttab_map_frames_v2+0xda/0x120
[  124.601105]  [<c1055b90>] ? xen_spin_lock+0xa0/0x100
[  124.601105]  [<c104d155>] ? xen_mm_unpin_all+0x65/0x80
[  124.601105]  [<c12f6cad>] xen_suspend+0x8d/0xc0
[  124.601105]  [<c10e750b>] stop_machine_cpu_stop+0x9b/0x110
[  124.601105]  [<c10e71f7>] cpu_stopper_thread+0xc7/0x1a0
[  124.601105]  [<c10b3f6f>] ? finish_task_switch+0x5f/0xe0
[  124.601105]  [<c10e7470>] ? stop_one_cpu_nowait+0x40/0x40
[  124.601105]  [<c10b682b>] ? default_wake_function+0xb/0x10
[  124.601105]  [<c10af990>] ? __wake_up_common+0x40/0x70
[  124.601105]  [<c16441ad>] ? _raw_spin_unlock_irqrestore+0x2d/0x50
[  124.601105]  [<c10b2479>] ? complete+0x49/0x60
[  124.601105]  [<c10e7130>] ? res_counter_charge+0x180/0x180
[  124.601105]  [<c10a7474>] kthread+0x74/0x80
[  124.601105]  [<c10a7400>] ? kthread_freezable_should_stop+0x60/0x60
[  124.601105]  [<c164b276>] kernel_thread_helper+0x6/0x10
[  124.601105] Code: 22 e8 ff ff 8b 55 8c 89 d8 e8 88 e6 ff ff 83 45 94 01 83 7d 94 04 0f 84 80 fe ff ff 8b 55 8c 8b 04 95 e0 11 88 c1 e9 64 ff ff ff <0f> 0b eb fe 0f 0b eb fe 8b 1d 00 60 85 c1 81 fb 00 60 85 c1 74 
[  124.601105] EIP: [<c12f5d25>] xen_irq_resume+0x215/0x370 SS:ESP 0069:df451e10
[  124.601105] ---[ end trace 69a5c8cd56e77bce ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/tick-sched.c:464 tick_nohz_idle_enter+0x7a/0x90()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D      3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10d1b1a>] ? tick_nohz_idle_enter+0x7a/0x90
[  124.601105]  [<c10d1b1a>] ? tick_nohz_idle_enter+0x7a/0x90
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10d1b1a>] tick_nohz_idle_enter+0x7a/0x90
[  124.601105]  [<c105e22a>] cpu_idle+0x1a/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bcf ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D W    3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10c9fc9>] ktime_get+0xe9/0x100
[  124.601105]  [<c10d1664>] tick_nohz_stop_sched_tick+0x34/0x3f0
[  124.601105]  [<c10886ea>] ? print_oops_end_marker+0x2a/0x30
[  124.601105]  [<c10888fd>] ? warn_slowpath_common+0x7d/0xa0
[  124.601105]  [<c10d1b1a>] ? tick_nohz_idle_enter+0x7a/0x90
[  124.601105]  [<c10d1ae5>] tick_nohz_idle_enter+0x45/0x90
[  124.601105]  [<c105e22a>] cpu_idle+0x1a/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bd0 ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D W    3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10c9fc9>] ktime_get+0xe9/0x100
[  124.601105]  [<c10d1379>] tick_check_idle+0x39/0xf0
[  124.601105]  [<c108f06c>] irq_enter+0x4c/0x70
[  124.601105]  [<c12f4e36>] xen_evtchn_do_upcall+0x16/0x30
[  124.601105]  [<c164b2c7>] xen_do_upcall+0x7/0xc
[  124.601105]  [<c1002227>] ? hypercall_page+0x227/0x1000
[  124.601105]  [<c104e16a>] ? xen_force_evtchn_callback+0x1a/0x30
[  124.601105]  [<c104e994>] check_events+0x8/0xc
[  124.601105]  [<c104e93c>] ? xen_clocksource_get_cycles+0xc/0xc
[  124.601105]  [<c104e953>] ? xen_irq_enable_direct_reloc+0x4/0x4
[  124.601105]  [<c10d1af4>] ? tick_nohz_idle_enter+0x54/0x90
[  124.601105]  [<c105e22a>] cpu_idle+0x1a/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bd1 ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D W    3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10c9fc9>] ktime_get+0xe9/0x100
[  124.601105]  [<c10d1664>] tick_nohz_stop_sched_tick+0x34/0x3f0
[  124.601105]  [<c12f4288>] ? info_for_irq+0x8/0x20
[  124.601105]  [<c12f47c3>] ? evtchn_from_irq+0x13/0x40
[  124.601105]  [<c104e789>] ? xen_clocksource_read+0x19/0x20
[  124.601105]  [<c12f4b68>] ? __xen_evtchn_do_upcall+0x258/0x2b0
[  124.601105]  [<c10d1a5f>] tick_nohz_irq_exit+0x3f/0x80
[  124.601105]  [<c108ef1f>] irq_exit+0x4f/0xb0
[  124.601105]  [<c12f4e40>] xen_evtchn_do_upcall+0x20/0x30
[  124.601105]  [<c164b2c7>] xen_do_upcall+0x7/0xc
[  124.601105]  [<c1002227>] ? hypercall_page+0x227/0x1000
[  124.601105]  [<c104e16a>] ? xen_force_evtchn_callback+0x1a/0x30
[  124.601105]  [<c104e994>] check_events+0x8/0xc
[  124.601105]  [<c104e93c>] ? xen_clocksource_get_cycles+0xc/0xc
[  124.601105]  [<c104e953>] ? xen_irq_enable_direct_reloc+0x4/0x4
[  124.601105]  [<c10d1af4>] ? tick_nohz_idle_enter+0x54/0x90
[  124.601105]  [<c105e22a>] cpu_idle+0x1a/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bd2 ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D W    3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10c9fc9>] ktime_get+0xe9/0x100
[  124.601105]  [<c10d1379>] tick_check_idle+0x39/0xf0
[  124.601105]  [<c108f06c>] irq_enter+0x4c/0x70
[  124.601105]  [<c12f4e36>] xen_evtchn_do_upcall+0x16/0x30
[  124.601105]  [<c164b2c7>] xen_do_upcall+0x7/0xc
[  124.601105]  [<c10023a7>] ? hypercall_page+0x3a7/0x1000
[  124.601105]  [<c104e1c2>] ? xen_safe_halt+0x12/0x20
[  124.601105]  [<c104e1b0>] ? xen_irq_disable+0x10/0x10
[  124.601105]  [<c105ed2b>] default_idle+0x5b/0x190
[  124.601105]  [<c1040054>] ? svm_set_tsc_khz+0x74/0x140
[  124.601105]  [<c105e27f>] cpu_idle+0x6f/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bd3 ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D W    3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10c9fc9>] ktime_get+0xe9/0x100
[  124.601105]  [<c10d1664>] tick_nohz_stop_sched_tick+0x34/0x3f0
[  124.601105]  [<c12f4288>] ? info_for_irq+0x8/0x20
[  124.601105]  [<c12f47c3>] ? evtchn_from_irq+0x13/0x40
[  124.601105]  [<c104e789>] ? xen_clocksource_read+0x19/0x20
[  124.601105]  [<c12f4b68>] ? __xen_evtchn_do_upcall+0x258/0x2b0
[  124.601105]  [<c10d1a5f>] tick_nohz_irq_exit+0x3f/0x80
[  124.601105]  [<c108ef1f>] irq_exit+0x4f/0xb0
[  124.601105]  [<c12f4e40>] xen_evtchn_do_upcall+0x20/0x30
[  124.601105]  [<c164b2c7>] xen_do_upcall+0x7/0xc
[  124.601105]  [<c10023a7>] ? hypercall_page+0x3a7/0x1000
[  124.601105]  [<c104e1c2>] ? xen_safe_halt+0x12/0x20
[  124.601105]  [<c104e1b0>] ? xen_irq_disable+0x10/0x10
[  124.601105]  [<c105ed2b>] default_idle+0x5b/0x190
[  124.601105]  [<c1040054>] ? svm_set_tsc_khz+0x74/0x140
[  124.601105]  [<c105e27f>] cpu_idle+0x6f/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bd4 ]---
[  124.601105] ------------[ cut here ]------------
[  124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100()
[  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
[  124.601105] Pid: 0, comm: swapper/0 Tainted: G      D W    3.4.70+ #1
[  124.601105] Call Trace:
[  124.601105]  [<c10888ed>] warn_slowpath_common+0x6d/0xa0
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c10c9fc9>] ? ktime_get+0xe9/0x100
[  124.601105]  [<c108893d>] warn_slowpath_null+0x1d/0x20
[  124.601105]  [<c10c9fc9>] ktime_get+0xe9/0x100
[  124.601105]  [<c10d1379>] tick_check_idle+0x39/0xf0
[  124.601105]  [<c108f06c>] irq_enter+0x4c/0x70
[  124.601105]  [<c12f4e36>] xen_evtchn_do_upcall+0x16/0x30
[  124.601105]  [<c164b2c7>] xen_do_upcall+0x7/0xc
[  124.601105]  [<c10023a7>] ? hypercall_page+0x3a7/0x1000
[  124.601105]  [<c104e1c2>] ? xen_safe_halt+0x12/0x20
[  124.601105]  [<c104e1b0>] ? xen_irq_disable+0x10/0x10
[  124.601105]  [<c105ed2b>] default_idle+0x5b/0x190
[  124.601105]  [<c1040054>] ? svm_set_tsc_khz+0x74/0x140
[  124.601105]  [<c105e27f>] cpu_idle+0x6f/0xa0
[  124.601105]  [<c16242f8>] rest_init+0x58/0x60
[  124.601105]  [<c1887919>] start_kernel+0x355/0x35b
[  124.601105]  [<c1887435>] ? kernel_init+0x1cf/0x1cf
[  124.601105]  [<c18870ba>] i386_start_kernel+0xa9/0xb0
[  124.601105]  [<c188b733>] xen_start_kernel+0x5c4/0x5cc
[  124.601105] ---[ end trace 69a5c8cd56e77bd5 ]---
[  124.601105] ------------[ cut here ]------------


* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 18:55 3.4.70+ kernel WARNING spew dysfunction on failed migration Ian Jackson
@ 2014-01-07 19:11 ` Konrad Rzeszutek Wilk
  2014-01-07 19:23   ` Ian Jackson
  2014-01-07 22:43 ` Ian Campbell
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-07 19:11 UTC (permalink / raw)
  To: Ian Jackson, boris.ostrovsky, david.vrabel; +Cc: xen-devel

On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
> I did the following test:
> 
>    mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
>    xl migrate debian.guest.osstest localhost
> 
> xl did what appears to be the right thing: it did most of the
> migration, failed to run the block scripts at the end of the
> migration, and destroyed the destination domain and instead resumed
> the source guest.
> 
> However, the source guest immediately went mad spewing WARNINGs and
> was after that no longer contactable via the network and not
> apparently responsive on the console.  See below.
> 
> This is with:
> 
>   [    0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc
>   version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013
> 
> For reasons I don't understand it doesn't seem to print the actual
> kernel git hash in dmesg, but I think it was that from flight 22264,
> i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
> 64-bit Xen.

This is a bit of an ancient kernel. Does it show up with 3.12?

CC-ing the other maintainers.


* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 19:11 ` Konrad Rzeszutek Wilk
@ 2014-01-07 19:23   ` Ian Jackson
  2014-01-07 19:36     ` Boris Ostrovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Jackson @ 2014-01-07 19:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: boris.ostrovsky, david.vrabel, xen-devel

Konrad Rzeszutek Wilk writes ("Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
> On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
> > For reasons I don't understand it doesn't seem to print the actual
> > kernel git hash in dmesg, but I think it was that from flight 22264,
> > i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
> > 64-bit Xen.
> 
> This is a bit of an ancient kernel. Does it show up with 3.12?

3.4.70 is what the osstest push gate is using.  (ISTR trying to switch
to 3.11 but encountering some problem.)

I haven't tried 3.12 but can do so.

Ian.


* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 19:23   ` Ian Jackson
@ 2014-01-07 19:36     ` Boris Ostrovsky
  2014-01-07 20:05       ` Boris Ostrovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Boris Ostrovsky @ 2014-01-07 19:36 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, david.vrabel

On 01/07/2014 02:23 PM, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
>> On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
>>> For reasons I don't understand it doesn't seem to print the actual
>>> kernel git hash in dmesg, but I think it was that from flight 22264,
>>> i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
>>> 64-bit Xen.
>> This is a bit of an ancient kernel. Does it show up with 3.12?
> 3.4.70 is what the osstest push gate is using.  (ISTR trying to switch
> to 3.11 but encountering some problem.)
>
> I haven't tried 3.12 but can do so.
>
> Ian.

This is a hypercall failing, btw:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/xen/events.c?id=refs/tags/v3.4.75#n1582

-boris


* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 19:36     ` Boris Ostrovsky
@ 2014-01-07 20:05       ` Boris Ostrovsky
  0 siblings, 0 replies; 12+ messages in thread
From: Boris Ostrovsky @ 2014-01-07 20:05 UTC (permalink / raw)
  To: Ian Jackson; +Cc: david.vrabel, xen-devel

On 01/07/2014 02:36 PM, Boris Ostrovsky wrote:
> On 01/07/2014 02:23 PM, Ian Jackson wrote:
>> Konrad Rzeszutek Wilk writes ("Re: 3.4.70+ kernel WARNING spew 
>> dysfunction on failed migration"):
>>> On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
>>>> For reasons I don't understand it doesn't seem to print the actual
>>>> kernel git hash in dmesg, but I think it was that from flight 22264,
>>>> i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
>>>> 64-bit Xen.
>>> This a bit of ancient kernel. Does it show up with 3.12?
>> 3.4.70 is what the osstest push gate is using.  (ISTR trying to switch
>> to 3.11 but encountering some problem.)
>>
>> I haven't tried 3.12 but can do so.
>>
>> Ian.
>
> This is hypercall failing, btw:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/xen/events.c?id=refs/tags/v3.4.75#n1582 
>

More specifically, it fails

     if ( v->virq_to_evtchn[virq] != 0 )
         ERROR_EXIT(-EEXIST);

in Xen's evtchn_bind_virq().
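A toy model of the check Boris quotes may make the failure mode easier to see. This is not Xen code. `struct vcpu_model`, `NR_VIRQS_MODEL`, `vcpu_model_init()` and `model_bind_virq()` are hypothetical names. The table size is arbitrary and port allocation is simplified. The model only illustrates one point: a second bind of a VIRQ that was never unbound returns `-EEXIST`, and the guest then turns that failure into the BUG at events.c:1582.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical model, not Xen source: one vCPU's VIRQ -> event-channel map. */
#define NR_VIRQS_MODEL 24

struct vcpu_model {
    int virq_to_evtchn[NR_VIRQS_MODEL]; /* 0 means "not bound" */
    int next_port;                      /* naive port allocator */
};

static void vcpu_model_init(struct vcpu_model *v)
{
    memset(v->virq_to_evtchn, 0, sizeof(v->virq_to_evtchn));
    v->next_port = 1; /* start at 1 so that 0 can mean "unbound" */
}

/* Returns the new port on success or a negative errno, mirroring the
 * "if ( v->virq_to_evtchn[virq] != 0 ) ERROR_EXIT(-EEXIST);" check. */
static int model_bind_virq(struct vcpu_model *v, int virq)
{
    if (virq < 0 || virq >= NR_VIRQS_MODEL)
        return -EINVAL;
    if (v->virq_to_evtchn[virq] != 0)
        return -EEXIST;                 /* still bound: rebinding fails */
    v->virq_to_evtchn[virq] = v->next_port++;
    return v->virq_to_evtchn[virq];
}
```

Binding VIRQ 0 once succeeds. Binding it again without an intervening unbind fails, which is the situation a resumed-but-never-torn-down domain is in.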

Would be interesting to see if this is still a problem in new kernels.

-boris

* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 18:55 3.4.70+ kernel WARNING spew dysfunction on failed migration Ian Jackson
  2014-01-07 19:11 ` Konrad Rzeszutek Wilk
@ 2014-01-07 22:43 ` Ian Campbell
  2014-01-08 13:02 ` Ian Campbell
  2014-01-08 14:19 ` David Vrabel
  3 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2014-01-07 22:43 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On Tue, 2014-01-07 at 18:55 +0000, Ian Jackson wrote:
> I did the following test:
> 
>    mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
>    xl migrate debian.guest.osstest localhost
> 
> xl did what appears to be the right thing: it did most of the
> migration, failed to run the block scripts at the end of the
> migration, and destroyed the destination domain and instead resumed
> the source guest.
> 
> However, the source guest immediately went mad spewing WARNINGs and
> was after that no longer contactable via the network and not
> apparently responsive on the console.  See below.

Might this be the libxl resume thing described at the end of:
http://lists.xen.org/archives/html/xen-devel/2013-02/msg00130.html ?

I thought we'd switched to using fast resume by default to work around
this, but looking at the code it seems not.

It'd be lovely if the slow path finally got implemented instead of
falling through the cracks again.

Ian.

* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 18:55 3.4.70+ kernel WARNING spew dysfunction on failed migration Ian Jackson
  2014-01-07 19:11 ` Konrad Rzeszutek Wilk
  2014-01-07 22:43 ` Ian Campbell
@ 2014-01-08 13:02 ` Ian Campbell
  2014-01-08 13:30   ` Processed: " xen
  2014-01-09 19:08   ` Ian Jackson
  2014-01-08 14:19 ` David Vrabel
  3 siblings, 2 replies; 12+ messages in thread
From: Ian Campbell @ 2014-01-08 13:02 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

create ^
title it libxl should implement non-suspend-cancel based resume path
owner Ian Jackson <Ian.Jackson@eu.citrix.com>
thanks

To summarise what I just said to Ian J in the corridor (and let's have a
bug to record it):

There are two mechanisms by which a suspend can be aborted and the
original domain resumed.

The older method is that the toolstack resets a bunch of state (see
tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
the domain. The domain will see HYPERVISOR_suspend return 0 and will
continue without any realisation that it is actually running in the
original domain and not in a new one. This method is supposed to be
implemented by libxl_domain_resume(suspend_cancel=0) but it is not.

The other method is newer. In this case the toolstack arranges that
HYPERVISOR_suspend returns 1 and restarts the domain (I believe). The
domain will observe this, realise that it has been restarted in the same
domain, and behave accordingly. This method is implemented, correctly
AFAIK, by libxl_domain_resume(suspend_cancel=1).
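On the guest side, the two protocols can be sketched as a decision on the return value of the suspend hypercall. The struct and function names below are hypothetical. My understanding is that the pvops suspend path in Linux makes a similar choice, skipping `xen_irq_resume()` and related calls when the suspend was cancelled. This is an illustration under that assumption, not that code.

```c
#include <stdbool.h>

/* Hypothetical sketch: what a guest does once its suspend hypercall
 * returns.
 *   ret == 0: old protocol, "I am now in a new domain", so every event
 *             channel, VIRQ, console and timer must be re-established.
 *   ret != 0: suspend was cancelled and we are still in the original
 *             domain, so existing bindings stay valid and must NOT be
 *             redone.  (Treating any non-zero value as "cancelled" is an
 *             assumption of this sketch.) */
struct resume_actions {
    bool rebind_event_channels; /* the job xen_irq_resume() does */
    bool reconnect_console;
    bool reinit_timers;
};

static struct resume_actions actions_after_suspend(int suspend_ret)
{
    bool cancelled = (suspend_ret != 0);
    struct resume_actions a = {
        .rebind_event_channels = !cancelled,
        .reconnect_console     = !cancelled,
        .reinit_timers         = !cancelled,
    };
    return a;
}
```

Under the old protocol the guest takes the `suspend_ret == 0` branch and redoes all its bindings. That is only safe if the toolstack really did reset the domain's state first.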

However, the newer method is not available in all kernels. It does date
from the Linux 2.6.18 days and is implemented in all Linux pvops kernels,
but I can't speak for others (e.g. BSD). The toolstack is supposed to
check for the XEN_ELFNOTE_SUSPEND_CANCEL ELF note when building the
domain. The presence or absence of this note needs to be remembered so
that it can be consulted on resume (which also implies preserving that
knowledge over migration).

xl currently uses libxl_domain_resume(suspend_cancel=0) on migration
failure, which as it stands won't work for *any* domain. Arguably,
switching to suspend_cancel=1 for now would mean that some subset of
kernels will work, and those which don't will not have regressed. That
would hold until we can correctly implement suspend_cancel=0 and the
necessary tracking of XEN_ELFNOTE_SUSPEND_CANCEL.
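The toolstack-side bookkeeping proposed here might look roughly like the following. Everything in it is hypothetical: `struct dom_resume_info`, `enum resume_mode` and both `choose_*` functions are illustrative names, not libxl API. Only the `suspend_cancel` parameter of `libxl_domain_resume()` and the `XEN_ELFNOTE_SUSPEND_CANCEL` note come from the discussion.

```c
#include <stdbool.h>

/* Hypothetical per-domain record.  The flag would be filled in at domain
 * build time from the presence of XEN_ELFNOTE_SUSPEND_CANCEL, and carried
 * in the migration stream so that the receiving side knows it too. */
struct dom_resume_info {
    bool has_suspend_cancel_note;
};

enum resume_mode {
    RESUME_SUSPEND_CANCEL, /* akin to libxl_domain_resume(suspend_cancel=1) */
    RESUME_OLD_PROTOCOL,   /* akin to libxl_domain_resume(suspend_cancel=0) */
};

/* Eventual policy on a failed migration: use the fast path when the
 * guest advertised support for it, otherwise the old protocol. */
static enum resume_mode choose_resume_mode(const struct dom_resume_info *d)
{
    return d->has_suspend_cancel_note ? RESUME_SUSPEND_CANCEL
                                      : RESUME_OLD_PROTOCOL;
}

/* Interim policy suggested above: until the old protocol and the note
 * tracking exist, always try suspend-cancel.  Guests that support it
 * recover, and the rest fail just as they do today. */
static enum resume_mode choose_resume_mode_interim(void)
{
    return RESUME_SUSPEND_CANCEL;
}
```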

I've also just noticed that on failure to save (as opposed to migrate)
xl does use suspend_cancel=1.

Ian.

* Processed: Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-08 13:02 ` Ian Campbell
@ 2014-01-08 13:30   ` xen
  2014-01-09 19:08   ` Ian Jackson
  1 sibling, 0 replies; 12+ messages in thread
From: xen @ 2014-01-08 13:30 UTC (permalink / raw)
  To: Ian Campbell, xen-devel

Processing commands for xen@bugs.xenproject.org:

> create ^
Created new bug #30 rooted at `<21196.19900.136146.867552@mariner.uk.xensource.com>'
Title: `Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration'
> title it libxl should implement non-suspend-cancel based resume path
Set title for #30 to `libxl should implement non-suspend-cancel based resume path'
> owner Ian Jackson <Ian.Jackson@eu.citrix.com>
Command failed: Cannot parse arguments at /srv/xen-devel-bugs/lib/emesinae/control.pl line 301, <M> line 36.
Stop processing here.

Modified/created Bugs:
 - 30: http://bugs.xenproject.org/xen/bug/30 (new)

---
Xen Hypervisor Bug Tracker
See http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for information on reporting bugs
Contact xen-bugs-owner@bugs.xenproject.org with any infrastructure issues

* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-08 13:02 ` Ian Campbell
  2014-01-08 13:30   ` Processed: " xen
@ 2014-01-09 19:08   ` Ian Jackson
  2014-01-10 10:26     ` Ian Campbell
  1 sibling, 1 reply; 12+ messages in thread
From: Ian Jackson @ 2014-01-09 19:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Ian Campbell writes ("Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
> The older method is that the toolstack resets a bunch of state (see
> tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
> the domain. The domain will see HYPERVISOR_suspend return 0 and will
> continue without any realisation that it is actually running in the
> original domain and not in a new one. This method is supposed to be
> implemented by libxl_domain_resume(suspend_cancel=0) but it is not.

I have looked into this and I think I can fairly simply implement the
old protocol in libxl.  This is necessary, I think, to preserve our
back-to-3.0 ABI compatibility guarantee.

Looking at a modern pvops Linux kernel, it does seem to try to cope with
older hypervisors which don't do the "new" protocol.  So that's a
reasonable thing to start with, but looking at the code in Linux I
suspect it may not actually work very well.  So if anyone has an
ancient test case of some kind that would be helpful...

Ian.

* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-09 19:08   ` Ian Jackson
@ 2014-01-10 10:26     ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2014-01-10 10:26 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On Thu, 2014-01-09 at 19:08 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
> > The older method is that the toolstack resets a bunch of state (see
> > tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
> > the domain. The domain will see HYPERVISOR_suspend return 0 and will
> > continue without any realisation that it is actually running in the
> > original domain and not in a new one. This method is supposed to be
> > implemented by libxl_domain_resume(suspend_cancel=0) but it is not.
> 
> I have looked into this and I think I can fairly simply implement the
> old protocol in libxl.  This is necessary, I think, to preserve our
> back-to-3.0 ABI compatibility guarantee.
> 
> Looking at a modern pvops Linux kernel, does seem to try to cope with
> older hypervisors which don't do the "new" protocol.  So that's a
> reasonable thing to start with, but looking at the code in Linux I
> suspect it may not actually work very well.  So if anyone has an
> ancient test case of some kind that would be helpful...

The linux-2.6.18-xen.hg kernel ought to work in the old mode I think. Or
any of the SLES fwd ports?

Looks like RHEL4 (linux-2.6.9-89.0.16.EL kernel) doesn't have the
support for the new mode at all.

It would probably be wise to validate this under xend before chasing
red herrings with xl.

Ian.

* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-07 18:55 3.4.70+ kernel WARNING spew dysfunction on failed migration Ian Jackson
                   ` (2 preceding siblings ...)
  2014-01-08 13:02 ` Ian Campbell
@ 2014-01-08 14:19 ` David Vrabel
  2014-01-08 14:24   ` Ian Campbell
  3 siblings, 1 reply; 12+ messages in thread
From: David Vrabel @ 2014-01-08 14:19 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Boris Ostrovsky, xen-devel, Ian Campbell

On 07/01/14 18:55, Ian Jackson wrote:
> I did the following test:
> 
>    mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
>    xl migrate debian.guest.osstest localhost
> 
> xl did what appears to be the right thing: it did most of the
> migration, failed to run the block scripts at the end of the
> migration, and destroyed the destination domain and instead resumed
> the source guest.
> 
> However, the source guest immediately went mad spewing WARNINGs and
> was after that no longer contactable via the network and not
> apparently responsive on the console.  See below.
> 
> This is with:
> 
>   [    0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc
>   version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013
> 
> For reasons I don't understand it doesn't seem to print the actual
> kernel git hash in dmesg, but I think it was that from flight 22264,
> i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
> 64-bit Xen.
> 
> Thanks,
> Ian.
> 
> debian login: [  124.595658] PM: freeze of devices complete after 2.980 msecs
> [  124.595991] PM: late freeze of devices complete after 0.013 msecs
> [  124.600919] PM: noirq freeze of devices complete after 4.884 msecs
> [  124.601105] Grant tables using version 2 layout.
> [  124.601105] ------------[ cut here ]------------
> [  124.601105] kernel BUG at drivers/xen/events.c:1582!
> [  124.601105] invalid opcode: 0000 [#1] SMP 
> [  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
> [  124.601105] 
> [  124.601105] Pid: 6, comm: migration/0 Not tainted 3.4.70+ #1  
> [  124.601105] EIP: 0061:[<c12f5d25>] EFLAGS: 00010082 CPU: 0
> [  124.601105] EIP is at xen_irq_resume+0x215/0x370

We shouldn't be calling xen_irq_resume() when resuming the source VM.
The EVTCHNOP_bind_virq is failing because the VIRQ is still bound.

This would suggest that the suspend hypercall has not correctly returned
the cancelled state.
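The chain of events David describes can be reduced to a small model. It is not Xen or Linux code. `struct model_domain`, `model_rebind_virq()` and `model_resume_source()` are hypothetical, and all VIRQs are collapsed into a single flag. The model also assumes that a proper old-protocol reset by the toolstack would leave the VIRQ unbound; that is an assumption of the sketch, not something the thread establishes. Under those assumptions, the guest hits `-EEXIST` only when two things are both true: its suspend call reports "not cancelled" (0), and the toolstack has skipped that reset.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical end-to-end model of the failure in this thread.
 * hv_virq_bound is hypervisor-side state.  It stays set because the
 * source domain was never torn down. */
struct model_domain {
    bool hv_virq_bound;
};

static int model_rebind_virq(struct model_domain *d)
{
    if (d->hv_virq_bound)
        return -EEXIST;
    d->hv_virq_bound = true;
    return 0;
}

/* Resume the source domain.  suspend_ret is what the guest's suspend
 * hypercall returns.  toolstack_reset_state says whether the toolstack
 * actually did the old-protocol reset before reporting 0.  Returns 0 on
 * success, or the error that would trip the guest's BUG(). */
static int model_resume_source(struct model_domain *d, int suspend_ret,
                               bool toolstack_reset_state)
{
    if (suspend_ret != 0)
        return 0;                  /* cancelled: keep existing bindings */
    if (toolstack_reset_state)
        d->hv_virq_bound = false;  /* assumption: reset unbinds the VIRQ */
    return model_rebind_virq(d);   /* guest rebinds, like xen_irq_resume() */
}
```

Only the third combination, "not cancelled" with no reset, reproduces the failure reported at the top of the thread.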

Could this be because of the tools issue mentioned by Ian C?

David

* Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration
  2014-01-08 14:19 ` David Vrabel
@ 2014-01-08 14:24   ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2014-01-08 14:24 UTC (permalink / raw)
  To: David Vrabel; +Cc: Boris Ostrovsky, xen-devel, Ian Jackson

On Wed, 2014-01-08 at 14:19 +0000, David Vrabel wrote:
> On 07/01/14 18:55, Ian Jackson wrote:
> > I did the following test:
> > 
> >    mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
> >    xl migrate debian.guest.osstest localhost
> > 
> > xl did what appears to be the right thing: it did most of the
> > migration, failed to run the block scripts at the end of the
> > migration, and destroyed the destination domain and instead resumed
> > the source guest.
> > 
> > However, the source guest immediately went mad spewing WARNINGs and
> > was after that no longer contactable via the network and not
> > apparently responsive on the console.  See below.
> > 
> > This is with:
> > 
> >   [    0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc
> >   version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013
> > 
> > For reasons I don't understand it doesn't seem to print the actual
> > kernel git hash in dmesg, but I think it was that from flight 22264,
> > i.e.  234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
> > 64-bit Xen.
> > 
> > Thanks,
> > Ian.
> > 
> > debian login: [  124.595658] PM: freeze of devices complete after 2.980 msecs
> > [  124.595991] PM: late freeze of devices complete after 0.013 msecs
> > [  124.600919] PM: noirq freeze of devices complete after 4.884 msecs
> > [  124.601105] Grant tables using version 2 layout.
> > [  124.601105] ------------[ cut here ]------------
> > [  124.601105] kernel BUG at drivers/xen/events.c:1582!
> > [  124.601105] invalid opcode: 0000 [#1] SMP 
> > [  124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
> > [  124.601105] 
> > [  124.601105] Pid: 6, comm: migration/0 Not tainted 3.4.70+ #1  
> > [  124.601105] EIP: 0061:[<c12f5d25>] EFLAGS: 00010082 CPU: 0
> > [  124.601105] EIP is at xen_irq_resume+0x215/0x370
> 
> We shouldn't be calling xen_irq_resume() when resuming the source VM.
> The EVTCHNOP_bind_irq is failing because the VIRQ is still bound.
> 
> This would suggest that the suspend hypercall has not correctly returned
> the cancelled state.
> 
> Could this be because of the tools issue mentioned by Ian C?

I'm fairly confident that it is, yes.

(Well, "this" is actually: the toolstack failed to implement the old-style
resume but told the guest it had, by not returning cancel...)
Ian.

Thread overview: 12+ messages
2014-01-07 18:55 3.4.70+ kernel WARNING spew dysfunction on failed migration Ian Jackson
2014-01-07 19:11 ` Konrad Rzeszutek Wilk
2014-01-07 19:23   ` Ian Jackson
2014-01-07 19:36     ` Boris Ostrovsky
2014-01-07 20:05       ` Boris Ostrovsky
2014-01-07 22:43 ` Ian Campbell
2014-01-08 13:02 ` Ian Campbell
2014-01-08 13:30   ` Processed: " xen
2014-01-09 19:08   ` Ian Jackson
2014-01-10 10:26     ` Ian Campbell
2014-01-08 14:19 ` David Vrabel
2014-01-08 14:24   ` Ian Campbell
