* 4.11.0 RC1 panic
@ 2018-04-24 16:06 Manuel Bouyer
2018-04-25 6:58 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-04-24 16:06 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall
[-- Attachment #1: Type: text/plain, Size: 594 bytes --]
Hello,
I tested xen 4.11.0 rc1 with NetBSD as dom0.
I could boot a NetBSD PV domU without problem, but at shutdown time (poweroff
in the domU), I got a Xen panic:
(XEN) Assertion 'cpu < nr_cpu_ids' failed at ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97
A xl destroy instead of poweroff gives the same result.
This happens with both 32-bit PAE and 64-bit domUs. It doesn't seem to
happen with HVM domUs.
Attached are a cut-and-paste of the panic, and the output of xl dmesg.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
[-- Attachment #2: xen-panic --]
[-- Type: text/plain, Size: 4673 bytes --]
(XEN) Assertion 'cpu < nr_cpu_ids' failed at ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97
(XEN) ----[ Xen-4.11-rcnb0 x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d080289333>] put_page_from_l1e+0x1a3/0x1f0
(XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor (d0v0)
(XEN) rax: 00000000ffffffff rbx: ffff8300beaad000 rcx: 0000000000000004
(XEN) rdx: ffff8300bf077fff rsi: 007fffffffffffff rdi: ffff8300beaad5d0
(XEN) rbp: ffff82d080382380 rsp: ffff8300bf0779b8 r8: 0000000000000000
(XEN) r9: 0000000000000200 r10: 4000000000000000 r11: ffff82e004207040
(XEN) r12: ffff830213404000 r13: ffff82e004207040 r14: ffff830213404000
(XEN) r15: 1000000000000000 cr0: 000000008005003b cr4: 00000000000026e0
(XEN) cr3: 000000022f0f6000 cr2: 00007f7ff60ce7a0
(XEN) fsb: 00007f7ff7ff36c0 gsb: ffffffff80ca42c0 gss: 0000000000000000
(XEN) ds: 003f es: 003f fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d080289333> (put_page_from_l1e+0x1a3/0x1f0):
(XEN) 3b 05 cf 9c 1c 00 72 02 <0f> 0b 89 c2 83 e2 3f 48 8d 7a 01 48 c1 e7 05 48
(XEN) Xen stack trace from rsp=ffff8300bf0779b8:
(XEN) 0000000000000050 ffff820040016000 ffff830213404000 0000000000000000
(XEN) 10ffffffffffffff ffff82d080288780 0000001600000000 0000000100000000
(XEN) 0000000001000000 2400000000000001 ffff82e0042070a0 ffff82e004207080
(XEN) 00ffffffffffffff 10ffffffffffffff 1000000000000000 ffff82d080288e07
(XEN) 00000000002070c0 ffff8300bf077fff ffff830213404000 ffff82e0042070a0
(XEN) ffff82e004207080 ffff830213404000 0000000000000001 ffff82004001e000
(XEN) 0200000000000000 ffff82d08028945f 0000000000000000 ffff82e004207080
(XEN) ffff82d080288869 0000000000210384 0000000000000206 ffff8300bf077fff
(XEN) 4100000000000001 ffff82e004207080 ffff82e004207060 00ffffffffffffff
(XEN) 10ffffffffffffff 1000000000000000 ffff82d080288e07 000000010100ff22
(XEN) ffff8300bf077fff ffff8300bedfc000 ffff82e004207080 ffff82e004207060
(XEN) ffff830213404000 ffff820040011000 00000000ffffffff ffff820040011000
(XEN) ffff82d080288540 0000000000000000 ffff82e004207060 ffff830213404000
(XEN) ffff820040011000 ffff82d0802889eb 0000000000210383 ffff830213404000
(XEN) ffff82d080280c49 6100000000000001 ffff82e004207060 ffff82e00411a0c0
(XEN) 00ffffffffffffff 10ffffffffffffff 1000000000000000 ffff82d080288e07
(XEN) 000000010136d82b ffff8300bf077fff 0000000000000000 ffff82e004207060
(XEN) ffff82e00411a0c0 0000000000208d06 0000000000000000 00000000ffffffff
(XEN) ffff830213404000 ffff82d080289146 0000000000000140 ffff82e00411a0c0
(XEN) ffff82d080288b95 ffff820040001000 ffffffffffffffc0 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82d080289333>] put_page_from_l1e+0x1a3/0x1f0
(XEN) [<ffff82d080288780>] free_page_type+0x210/0x790
(XEN) [<ffff82d080288e07>] mm.c#_put_page_type+0x107/0x340
(XEN) [<ffff82d08028945f>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288869>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d080288e07>] mm.c#_put_page_type+0x107/0x340
(XEN) [<ffff82d080288540>] mm.c#put_page_from_l3e+0x1a0/0x1d0
(XEN) [<ffff82d0802889eb>] free_page_type+0x47b/0x790
(XEN) [<ffff82d080280c49>] do_IRQ+0x5e9/0x630
(XEN) [<ffff82d080288e07>] mm.c#_put_page_type+0x107/0x340
(XEN) [<ffff82d080289146>] mm.c#put_page_from_l4e+0x106/0x130
(XEN) [<ffff82d080288b95>] free_page_type+0x625/0x790
(XEN) [<ffff82d080288e07>] mm.c#_put_page_type+0x107/0x340
(XEN) [<ffff82d08028949f>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d080272b3b>] domain.c#relinquish_memory+0xab/0x460
(XEN) [<ffff82d0802810d7>] pirq_guest_eoi+0x27/0x30
(XEN) [<ffff82d080276b43>] domain_relinquish_resources+0x203/0x290
(XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
(XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
(XEN) [<ffff82d08026ee50>] do_physdev_op_compat+0/0x70
(XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
(XEN) [<ffff82d080367145>] pv_hypercall+0x1f5/0x430
(XEN) [<ffff82d08036d432>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036d43e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036d432>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036d43e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036d432>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036d43e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036d49f>] lstar_enter+0x10f/0x120
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'cpu < nr_cpu_ids' failed at ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
[-- Attachment #3: xen-dmesg --]
[-- Type: text/plain, Size: 10676 bytes --]
(XEN) parameter "gnttab_max_nr_frames" unknown!
Xen 4.11-rcnb0
(XEN) Xen version 4.11-rcnb0 (bouyer@) (gcc (nb2 20150115) 4.8.5) debug=y Tue Apr 24 16:10:25 MEST 2018
(XEN) Latest ChangeSet:
(XEN) Console output is synchronous.
(XEN) Bootloader: unknown
(XEN) Command line: dom0_mem=512M console=com1 com1=9600,8n1 loglvl=all guest_loglvl=all gnttab_max_nr_frames=64 sync_console=1
(XEN) Xen image load base address: 0
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN) EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN) Found 1 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009ec00 (usable)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000bf3ff800 (usable)
(XEN) 00000000bf3ff800 - 00000000bf453c00 (ACPI NVS)
(XEN) 00000000bf453c00 - 00000000bf455c00 (ACPI data)
(XEN) 00000000bf455c00 - 00000000c0000000 (reserved)
(XEN) 00000000e0000000 - 00000000fed00400 (reserved)
(XEN) 00000000fed20000 - 00000000feda0000 (reserved)
(XEN) 00000000fee00000 - 00000000fef00000 (reserved)
(XEN) 00000000ffb00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000238000000 (usable)
(XEN) New Xen image base address: 0xbec00000
(XEN) ACPI: RSDP 000FEC00, 0024 (r2 DELL )
(XEN) ACPI: XSDT 000FC5BB, 0074 (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: FACP 000FC6EB, 00F4 (r3 DELL B9K 15 ASL 61)
(XEN) ACPI: DSDT FFF789A5, 45A0 (r1 DELL dt_ex 1000 INTL 20050624)
(XEN) ACPI: FACS BF3FF800, 0040
(XEN) ACPI: SSDT FFF7D064, 00AA (r1 DELL st_ex 1000 INTL 20050624)
(XEN) ACPI: APIC 000FC7DF, 0092 (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: BOOT 000FC871, 0028 (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: ASF! 000FC899, 0096 (r32 DELL B9K 15 ASL 61)
(XEN) ACPI: MCFG 000FC92F, 003E (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: HPET 000FC96D, 0038 (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: TCPA 000FCBC9, 0032 (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: DMAR 000FCBFB, 0118 (r1 DELL B9K 15 ASL 61)
(XEN) ACPI: SLIC 000FC9A5, 00C0 (r1 DELL B9K 15 ASL 61)
(XEN) System RAM: 8051MB (8244852kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000238000000
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 15 (0xf), Stepping 11 (raw 000006fb)
(XEN) found SMP MP-table at 000fe710
(XEN) DMI 2.5 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x808 (32 bits)
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:804,1:0], pm1x_evt[1:800,1:0]
(XEN) ACPI: wakeup_vec[bf3ff80c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x00] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x01] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x02] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x03] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a301 base: 0xfed00000
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 8 CPUs (6 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 376 MSI/MSI-X
(XEN) mce_intel.c:782: MCA Capability: firstbank 1, extended MCE MSR 0, BCAST
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Speculative mitigation facilities:
(XEN) Hardware features:
(XEN) BTI mitigations: Thunk N/A, Others: RSB_NATIVE RSB_VMEXIT
(XEN) XPTI: enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2327.503 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0804505b8 -> ffff82d080452252
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
(XEN) Intel VT-d iommu 2 supported page sizes: 4kB.
(XEN) traps.c:1569: GPF (0000): ffff82d0803f78fb [intel_vtd_setup+0x13b/0x4d0] -> ffff82d08036f2c0
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB.
(XEN) traps.c:1569: GPF (0000): ffff82d0803f78fb [intel_vtd_setup+0x13b/0x4d0] -> ffff82d08036f2c0
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
(XEN) traps.c:1569: GPF (0000): ffff82d0803f78fb [intel_vtd_setup+0x13b/0x4d0] -> ffff82d08036f2c0
(XEN) Intel VT-d iommu 3 supported page sizes: 4kB.
(XEN) traps.c:1569: GPF (0000): ffff82d0803f78fb [intel_vtd_setup+0x13b/0x4d0] -> ffff82d08036f2c0
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation not enabled.
(XEN) Intel VT-d Interrupt Remapping not enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN) - Dom0 mode: Relaxed
(XEN) Interrupt remapping disabled
(XEN) nr_sockets: 4
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Allocated console ring of 16 KiB.
(XEN) mwait-idle: does not run on family 6 model 15
(XEN) VMX: Supported advanced features:
(XEN) - APIC MMIO access virtualisation
(XEN) - APIC TPR shadow
(XEN) - Virtual NMI
(XEN) - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) not detected
(XEN) Brought up 2 CPUs
(XEN) build-id: 58c1c38138780bc01086b1f3d58e825ea87f2dc1
(XEN) Running stub recovery selftests...
(XEN) traps.c:1569: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08036f3f2
(XEN) traps.c:754: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d08036f3f2
(XEN) traps.c:1096: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08036f3f2
(XEN) HPET: 0 timers usable for broadcast (4 total)
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 400 PIRQs
(XEN) grant_table.c:1769:IDLEv0 Expanding d0 grant table from 0 to 1 frames
(XEN) NX (Execute Disable) protection active
(XEN) *** Building a PV Dom0 ***
(XEN) ELF: phdr: paddr=0xffffffff80000000 memsz=0xb828b8
(XEN) ELF: phdr: paddr=0xffffffff80c828c0 memsz=0x24e740
(XEN) ELF: memory: 0xffffffff80000000 -> 0xffffffff80ed1000
(XEN) ELF: __xen_guest: "GUEST_OS=NetBSD,GUEST_VER=4.99,XEN_VER=xen-3.0,LOADER=generic,VIRT_BASE=0xffffffff80000000,ELF_PADDR_OFFSET=0xffffffff80000000,VIRT_ENTRY=0xffffffff80100000,HYPERCALL_PAGE=0x00000101,BSD_SYMTAB=yes"
(XEN) ELF: GUEST_OS="NetBSD"
(XEN) ELF: GUEST_VER="4.99"
(XEN) ELF: XEN_VER="xen-3.0"
(XEN) ELF: LOADER="generic"
(XEN) ELF: VIRT_BASE="0xffffffff80000000"
(XEN) ELF: ELF_PADDR_OFFSET="0xffffffff80000000"
(XEN) ELF: VIRT_ENTRY="0xffffffff80100000"
(XEN) ELF: HYPERCALL_PAGE="0x00000101"
(XEN) ELF: BSD_SYMTAB="yes"
(XEN) ELF: addresses:
(XEN) virt_base = 0xffffffff80000000
(XEN) elf_paddr_offset = 0xffffffff80000000
(XEN) virt_offset = 0x0
(XEN) virt_kstart = 0xffffffff80000000
(XEN) virt_kend = 0xffffffff80ff4510
(XEN) virt_entry = 0xffffffff80100000
(XEN) p2m_base = 0xffffffffffffffff
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, lsb, paddr 0xffffffff80000000 -> 0xffffffff80ed1000
(XEN) Dom0 symbol map 0xffffffff80ed1000 -> 0xffffffff80ff4510
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 000000022e000000->0000000230000000 (122880 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff80000000->ffffffff80ff4510
(XEN) Init. ramdisk: ffffffff80ff5000->ffffffff80ff5000
(XEN) Phys-Mach map: ffffffff80ff5000->ffffffff810f5000
(XEN) Start info: ffffffff810f5000->ffffffff810f54b4
(XEN) Xenstore ring: 0000000000000000->0000000000000000
(XEN) Console ring: 0000000000000000->0000000000000000
(XEN) Page tables: ffffffff810f6000->ffffffff81103000
(XEN) Boot stack: ffffffff81103000->ffffffff81104000
(XEN) TOTAL: ffffffff80000000->ffffffff81400000
(XEN) ENTRY ADDRESS: ffffffff80100000
(XEN) Dom0 has maximum 2 VCPUs
(XEN) ELF: phdr 0 at 0xffffffff80000000 -> 0xffffffff80b828b8
(XEN) ELF: phdr 1 at 0xffffffff80c828c0 -> 0xffffffff80d1a578
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs
(XEN) ....................................done.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) ***************************************************
(XEN) WARNING: CONSOLE OUTPUT IS SYNCHRONOUS
(XEN) This option is intended to aid debugging of Xen by ensuring
(XEN) that all output is synchronously delivered on the serial line.
(XEN) However it can introduce SIGNIFICANT latencies and affect
(XEN) timekeeping. It is NOT recommended for production use!
(XEN) ***************************************************
(XEN) 3... 2... 1...
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 444kB init memory
(XEN) io_apic.c:2384: IO-APIC: apic=0, pin=0, irq=0
(XEN) IO-APIC: new_entry=000109f0
(XEN) IO-APIC: old_entry=00010000 pirq=-1
(XEN) IO-APIC: Attempt to modify IO-APIC pin for in-use IRQ!
(XEN) io_apic.c:2384: IO-APIC: apic=0, pin=2, irq=0
(XEN) IO-APIC: new_entry=000109f0
(XEN) IO-APIC: old_entry=000009f0 pirq=-1
(XEN) IO-APIC: Attempt to modify IO-APIC pin for in-use IRQ!
(XEN) io_apic.c:2384: IO-APIC: apic=0, pin=4, irq=4
(XEN) IO-APIC: new_entry=000109f1
(XEN) IO-APIC: old_entry=000009f1 pirq=-1
(XEN) IO-APIC: Attempt to modify IO-APIC pin for in-use IRQ!
(XEN) allocated vector b8 for irq 16
(XEN) allocated vector c0 for irq 17
(XEN) allocated vector c8 for irq 18
(XEN) allocated vector d0 for irq 19
(XEN) allocated vector d8 for irq 20
(XEN) allocated vector 21 for irq 21
(XEN) allocated vector 29 for irq 22
(XEN) allocated vector 31 for irq 23
* Re: 4.11.0 RC1 panic
2018-04-24 16:06 4.11.0 RC1 panic Manuel Bouyer
@ 2018-04-25 6:58 ` Jan Beulich
2018-04-25 8:16 ` Andrew Cooper
2018-04-25 10:42 ` Manuel Bouyer
0 siblings, 2 replies; 43+ messages in thread
From: Jan Beulich @ 2018-04-25 6:58 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel, julien.grall
>>> On 24.04.18 at 18:06, <bouyer@antioche.eu.org> wrote:
> Hello,
> I tested xen 4.11.0 rc1 with NetBSD as dom0.
> I could boot a NetBSD PV domU without problem, but at shutdown time
> (poweroff
> in the domU), I got a Xen panic:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at
> ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97
>
> A xl destroy instead of poweroff gives the same result.
>
> This happens with both 32bitsPAE and 64bits domU. This doens't seem to
> happen with HVM domUs.
>
> Attached are a cut-n-paste of the panic, and the output of xl demsg.
Without line numbers associated with at least the top stack trace entry
I can only guess what it might be - could you give the patch below a try?
(This may not be the final patch, as I'm afraid there may be some race
here, but I'd have to work this out later.)
Jan
--- unstable.orig/xen/arch/x86/mm.c
+++ unstable/xen/arch/x86/mm.c
@@ -1255,7 +1255,7 @@ void put_page_from_l1e(l1_pgentry_t l1e,
{
for_each_vcpu ( pg_owner, v )
{
- if ( pv_destroy_ldt(v) )
+ if ( pv_destroy_ldt(v) && v->dirty_cpu != VCPU_CPU_CLEAN )
flush_tlb_mask(cpumask_of(v->dirty_cpu));
}
}
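For context: cpumask_of() only accepts a real CPU number, while v->dirty_cpu
may also hold a "state not dirty on any pCPU" sentinel (VCPU_CPU_CLEAN in the
hunk above). A minimal, self-contained sketch of that pattern and of the guard
the patch adds follows; the sentinel value, structure layout and helper bodies
are illustrative stand-ins, not code from the Xen tree.

#include <assert.h>
#include <stdio.h>

#define NR_CPU_IDS      8          /* stand-in for Xen's nr_cpu_ids */
#define VCPU_CPU_CLEAN  (~0u)      /* assumed sentinel: "not dirty anywhere" */

struct vcpu {
    unsigned int dirty_cpu;        /* pCPU the vCPU state is dirty on, or sentinel */
};

/* Stand-in for cpumask_of(): only valid for a real CPU number. */
static unsigned int cpumask_of(unsigned int cpu)
{
    assert(cpu < NR_CPU_IDS);      /* the check that fired as 'cpu < nr_cpu_ids' */
    return 1u << cpu;
}

static void flush_tlb_mask(unsigned int mask)
{
    printf("flushing TLBs for mask %#x\n", mask);
}

int main(void)
{
    struct vcpu v = { .dirty_cpu = VCPU_CPU_CLEAN };

    /* Unguarded, as before the patch: would trip the assertion.
     *     flush_tlb_mask(cpumask_of(v.dirty_cpu));
     */

    /* Guarded, as in the patch: the flush is simply skipped. */
    if (v.dirty_cpu != VCPU_CPU_CLEAN)
        flush_tlb_mask(cpumask_of(v.dirty_cpu));

    return 0;
}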
* Re: 4.11.0 RC1 panic
2018-04-25 6:58 ` Jan Beulich
@ 2018-04-25 8:16 ` Andrew Cooper
2018-04-25 10:58 ` Manuel Bouyer
2018-04-25 10:42 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Cooper @ 2018-04-25 8:16 UTC (permalink / raw)
To: Jan Beulich, Manuel Bouyer; +Cc: xen-devel, julien.grall
On 25/04/2018 07:58, Jan Beulich wrote:
>>>> On 24.04.18 at 18:06, <bouyer@antioche.eu.org> wrote:
>> Hello,
>> I tested xen 4.11.0 rc1 with NetBSD as dom0.
>> I could boot a NetBSD PV domU without problem, but at shutdown time
>> (poweroff
>> in the domU), I got a Xen panic:
>> (XEN) Assertion 'cpu < nr_cpu_ids' failed at
>> ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97
>>
>> A xl destroy instead of poweroff gives the same result.
>>
>> This happens with both 32bitsPAE and 64bits domU. This doens't seem to
>> happen with HVM domUs.
>>
>> Attached are a cut-n-paste of the panic, and the output of xl demsg.
> Without line numbers associated with at least the top stack trace entry
> I can only guess what it might be - could you give the patch below a try?
> (This may not be the final patch, as I'm afraid there may be some race
> here, but I'd have to work this out later.)
>
> Jan
>
> --- unstable.orig/xen/arch/x86/mm.c
> +++ unstable/xen/arch/x86/mm.c
> @@ -1255,7 +1255,7 @@ void put_page_from_l1e(l1_pgentry_t l1e,
> {
> for_each_vcpu ( pg_owner, v )
> {
> - if ( pv_destroy_ldt(v) )
> + if ( pv_destroy_ldt(v) && v->dirty_cpu != VCPU_CPU_CLEAN )
> flush_tlb_mask(cpumask_of(v->dirty_cpu));
> }
> }
>
Manuel: As a tangentially related question, does NetBSD ever try to page
out its LDT?
I'm fairly sure this particular bit of code exists solely for the
Windows XP port to PV guests. The code itself is broken as far as the
"feature" goes (as it only works on present => not present PTE changes,
and not for other PTE permissions changes which would also drop the
segdesc typeref), and dropping it would remove one vcpu scalability
limitation for PV guests.
~Andrew
* Re: 4.11.0 RC1 panic
2018-04-25 6:58 ` Jan Beulich
2018-04-25 8:16 ` Andrew Cooper
@ 2018-04-25 10:42 ` Manuel Bouyer
2018-04-25 14:42 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-04-25 10:42 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, julien.grall
On Wed, Apr 25, 2018 at 12:58:47AM -0600, Jan Beulich wrote:
> >>> On 24.04.18 at 18:06, <bouyer@antioche.eu.org> wrote:
> > Hello,
> > I tested xen 4.11.0 rc1 with NetBSD as dom0.
> > I could boot a NetBSD PV domU without problem, but at shutdown time
> > (poweroff
> > in the domU), I got a Xen panic:
> > (XEN) Assertion 'cpu < nr_cpu_ids' failed at
> > ...1/work/xen-4.11.0-rc1/xen/include/xen/cpumask.h:97
> >
> > A xl destroy instead of poweroff gives the same result.
> >
> > This happens with both 32bitsPAE and 64bits domU. This doens't seem to
> > happen with HVM domUs.
> >
> > Attached are a cut-n-paste of the panic, and the output of xl demsg.
>
> Without line numbers associated with at least the top stack trace entry
> I can only guess what it might be - could you give the patch below a try?
> (This may not be the final patch, as I'm afraid there may be some race
> here, but I'd have to work this out later.)
Yes, this works, thanks!
I'll now put this version on the NetBSD testbed I'm running.
This should put some pressure on it.
Thanks!
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
* Re: 4.11.0 RC1 panic
2018-04-25 8:16 ` Andrew Cooper
@ 2018-04-25 10:58 ` Manuel Bouyer
0 siblings, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-04-25 10:58 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel, julien.grall, Jan Beulich
On Wed, Apr 25, 2018 at 09:16:59AM +0100, Andrew Cooper wrote:
> Manuel: As a tangentially related question, does NetBSD ever try to page
> out its LDT?
AFAIK no, LDTs are allocated as kernel wired memory
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
* Re: 4.11.0 RC1 panic
2018-04-25 10:42 ` Manuel Bouyer
@ 2018-04-25 14:42 ` Manuel Bouyer
2018-04-25 15:28 ` Jan Beulich
2018-04-30 13:31 ` Jan Beulich
0 siblings, 2 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-04-25 14:42 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, julien.grall
[-- Attachment #1: Type: text/plain, Size: 943 bytes --]
On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote:
> > Without line numbers associated with at least the top stack trace entry
> > I can only guess what it might be - could you give the patch below a try?
> > (This may not be the final patch, as I'm afraid there may be some race
> > here, but I'd have to work this out later.)
>
> Yes, this works. thanks !
> I'll now put this version on the NetBSD testbed I'm running.
> This should put some pressure on it.
Running NetBSD tests in several guests I got:
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'oc > 0' failed at mm.c:628
(XEN) ****************************************
(see attached file for complete report).
I got similar panics on Xen 4.8 after patching for Meltdown (XSA-254).
I'll try the patch from XSA-259.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
[-- Attachment #2: xen-panic2 --]
[-- Type: text/plain, Size: 3698 bytes --]
(XEN) Assertion 'oc > 0' failed at mm.c:628
(XEN) ----[ Xen-4.11-rcnb0 x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82d080284a22>] mm.c#dec_linear_entries+0x12/0x20
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d14v3)
(XEN) rax: ffffffffffff0000 rbx: 4400000000000001 rcx: 0000000000189b0d
(XEN) rdx: 0400000000000000 rsi: 0000000000000008 rdi: ffff82e0031349c0
(XEN) rbp: ffff82e0031361a0 rsp: ffff8301bf15fc08 r8: 0000000000000000
(XEN) r9: 0000000000000200 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff82e0031349c0 r13: 0000000000000000 r14: 10ffffffffffffff
(XEN) r15: 1000000000000000 cr0: 0000000080050033 cr4: 00000000000026e4
(XEN) cr3: 00000001b98de000 cr2: 00000000cd9ffe80
(XEN) fsb: 00000000c0e02000 gsb: 0000000000000000 gss: 0000000000000000
(XEN) ds: 0011 es: 0011 fs: 0031 gs: 0011 ss: 0000 cs: e008
(XEN) Xen code around <ffff82d080284a22> (mm.c#dec_linear_entries+0x12/0x20):
(XEN) c1 47 1e 66 85 c0 7f 02 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 41 54
(XEN) Xen stack trace from rsp=ffff8301bf15fc08:
(XEN) ffff82d080288e3e 0000000000800063 ffff8301bf15ffff 4c00000000000002
(XEN) ffff82e0031361a0 ffff82e0031349c0 ffff8301b970e000 0000000000000001
(XEN) ffff82004000b000 0200000000000000 ffff82d08028945f 00000000000001fd
(XEN) ffff82e0031349c0 ffff82d080288869 0000000000189a4e 0000000000000000
(XEN) ffff8301bf15ffff 4400000000000001 ffff82e0031349c0 0000000000000000
(XEN) 00ffffffffffffff 10ffffffffffffff 1000000000000000 ffff82d080288e07
(XEN) 0000000101000206 ffff8301bf15ffff 4400000000000002 0000000000189a4e
(XEN) ffff82e0031349c0 0000000000000000 ffff8301b970e000 ffff82008000c000
(XEN) 0000000000000000 ffff82d08028949f ffff82d0802906cd ffff8300bf9be000
(XEN) 00000001802a7eb2 ffff8301b970e000 0000000000000000 ffff8301b970e000
(XEN) 0000000000000007 ffff8300bf9be000 00007ff000000000 0000000000000000
(XEN) ffff8301b970e000 ffff82e0031ab060 ffff82d0804b0058 ffff82d0804b0060
(XEN) 000000000018d583 000000000018d583 0000000000000004 0000000000189a4e
(XEN) 00000000cd9c9ce4 ffff82008000c018 0000000000000001 00000000cd9c9af4
(XEN) ffff82d080386b30 0000000000000001 ffff8301bf15ffff ffff82d080295190
(XEN) ffff8301bf15fe14 00000001ffffffff ffff82008000c000 0000000000000000
(XEN) 00007ff000000000 000000048036b1d8 cd7cc00000189a4e ffff8301bf15fef8
(XEN) ffff8300bf9be000 00000000000001a0 00000000deadf00d 0000000000000004
(XEN) 00000000deadf00d ffff82d0803672fa ffff82d000007ff0 ffff82d000000000
(XEN) ffff82d000000001 ffff82d0cd9c9ae8 ffff82d08036b1e4 ffff82d08036b1d8
(XEN) Xen call trace:
(XEN) [<ffff82d080284a22>] mm.c#dec_linear_entries+0x12/0x20
(XEN) [<ffff82d080288e3e>] mm.c#_put_page_type+0x13e/0x340
(XEN) [<ffff82d08028945f>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288869>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d080288e07>] mm.c#_put_page_type+0x107/0x340
(XEN) [<ffff82d08028949f>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d0802906cd>] do_mmuext_op+0x73d/0x1810
(XEN) [<ffff82d080295190>] compat_mmuext_op+0x430/0x450
(XEN) [<ffff82d0803672fa>] pv_hypercall+0x3aa/0x430
(XEN) [<ffff82d08036b1e4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036b1d8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036b1e4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036b1d8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036b1e4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036b1d8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036b1e4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d080368b6e>] do_entry_int82+0x1e/0x20
(XEN) [<ffff82d08036b221>] entry_int82+0xb1/0xc0
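For context on what this assertion means: dec_linear_entries() is at the top
of the trace, and 'oc > 0' checks the old value of a per-page count of linear
(recursive) page-table references just before it is decremented, so a failure
indicates an unbalanced decrement. A minimal toy model of that pattern
follows; the names, counter type and single-threaded decrement are assumptions
for illustration (the real code presumably uses an atomic fetch-and-add), not
code taken from mm.c.

#include <assert.h>

/* Toy stand-in for struct page_info: only the counter we care about. */
struct page_info {
    short linear_pt_count;     /* linear page-table references (assumed type) */
};

static void inc_linear_entries(struct page_info *pg)
{
    pg->linear_pt_count++;
}

static void dec_linear_entries(struct page_info *pg)
{
    short oc = pg->linear_pt_count--;   /* read the old count, then decrement */
    assert(oc > 0);                     /* firing means more puts than gets */
}

int main(void)
{
    struct page_info pg = { 0 };

    inc_linear_entries(&pg);
    dec_linear_entries(&pg);   /* balanced: fine */
    dec_linear_entries(&pg);   /* unbalanced: aborts here, the toy analogue
                                  of the 'oc > 0' panic above */
    return 0;
}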
* Re: 4.11.0 RC1 panic
2018-04-25 14:42 ` Manuel Bouyer
@ 2018-04-25 15:28 ` Jan Beulich
2018-04-25 15:57 ` Manuel Bouyer
2018-04-30 13:31 ` Jan Beulich
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-04-25 15:28 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel, julien.grall
>>> On 25.04.18 at 16:42, <bouyer@antioche.eu.org> wrote:
> On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote:
>> > Without line numbers associated with at least the top stack trace entry
>> > I can only guess what it might be - could you give the patch below a try?
>> > (This may not be the final patch, as I'm afraid there may be some race
>> > here, but I'd have to work this out later.)
>>
>> Yes, this works. thanks !
>> I'll now put this version on the NetBSD testbed I'm running.
>> This should put some pressure on it.
>
> Running NetBSD tests in several guests I got:
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'oc > 0' failed at mm.c:628
> (XEN) ****************************************
> (see attached file for complete report).
Do you know what exactly the guest was doing at that time? IOW do
you have any information on how to repro (preferably without having
to run NetBSD)? Did these failures start occurring recently (your
mention of 4.8 seems to suggest otherwise)?
Jan
* Re: 4.11.0 RC1 panic
2018-04-25 15:28 ` Jan Beulich
@ 2018-04-25 15:57 ` Manuel Bouyer
0 siblings, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-04-25 15:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, julien.grall
On Wed, Apr 25, 2018 at 09:28:03AM -0600, Jan Beulich wrote:
> >>> On 25.04.18 at 16:42, <bouyer@antioche.eu.org> wrote:
> > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote:
> >> > Without line numbers associated with at least the top stack trace entry
> >> > I can only guess what it might be - could you give the patch below a try?
> >> > (This may not be the final patch, as I'm afraid there may be some race
> >> > here, but I'd have to work this out later.)
> >>
> >> Yes, this works. thanks !
> >> I'll now put this version on the NetBSD testbed I'm running.
> >> This should put some pressure on it.
> >
> > Running NetBSD tests in several guests I got:
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 1:
> > (XEN) Assertion 'oc > 0' failed at mm.c:628
> > (XEN) ****************************************
> > (see attached file for complete report).
>
> Do you know what exactly the guest was doing at that time?
Unfortunately no. It was running the NetBSD test benches, but as this
is automated I don't even know which version of NetBSD was running
in the guests.
BTW there doesn't seem to be a domain number in the panic message ...
> IOW do
> you have any information on how to repro (preferably without having
> to run NetBSD)?
Unfortunately no, and it's not reliably reproducible either.
A cron job starts running the tests for available builds daily,
and the panic occurs once in a while.
You may be able to reproduce it with a Linux dom0:
install anita from http://www.gson.org/netbsd/anita/download/
It is a set of Python scripts, so you should be able to
extract the tar.gz and run the anita script in there.
Then run:
./anita --test-timeout 14400 --vmm xl --vmm-args vcpus=4 --disk-size 2G --memory-size 256M test http://ftp.fr.netbsd.org/pub/NetBSD-daily/HEAD/201804210730Z/amd64/
You will have to adjust the URLs: these are daily builds, and older versions
are deleted when newer ones are built. You can also use other branches
instead of HEAD.
Eventually Xen will panic (but only once in a while).
> Did these failures start occurring recently (your
> mention of 4.8 seems to suggest otherwise)?
Looking at the server's logs, the first time I saw them was with
Xen 4.6.6, with patches up to XSA-244. Before that it was running 4.6.5
with the patch for XSA-212. It looks like the ASSERT() was added as part of
XSA-240.
Then I upgraded to Xen 4.8.x (also with the security patches) but this
didn't fix the problem. I still had it with 4.8.3, and now with 4.11 too
(I didn't try anything else between 4.8 and 4.11).
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
* Re: 4.11.0 RC1 panic
2018-04-25 14:42 ` Manuel Bouyer
2018-04-25 15:28 ` Jan Beulich
@ 2018-04-30 13:31 ` Jan Beulich
2018-05-01 20:22 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-04-30 13:31 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel, julien.grall
>>> On 25.04.18 at 16:42, <bouyer@antioche.eu.org> wrote:
> On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote:
>> > Without line numbers associated with at least the top stack trace entry
>> > I can only guess what it might be - could you give the patch below a try?
>> > (This may not be the final patch, as I'm afraid there may be some race
>> > here, but I'd have to work this out later.)
>>
>> Yes, this works. thanks !
>> I'll now put this version on the NetBSD testbed I'm running.
>> This should put some pressure on it.
>
> Running NetBSD tests in several guests I got:
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'oc > 0' failed at mm.c:628
> (XEN) ****************************************
> (see attached file for complete report).
So in combination with your later reply I'm confused: Are you observing
this with 64-bit guests as well (your later reply appears to hint towards
64-bit-ness), or (as the stack trace suggests) only 32-bit ones? Knowing
this may already narrow areas where to look.
Jan
* Re: 4.11.0 RC1 panic
2018-04-30 13:31 ` Jan Beulich
@ 2018-05-01 20:22 ` Manuel Bouyer
2018-05-15 9:30 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-05-01 20:22 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, julien.grall
On Mon, Apr 30, 2018 at 07:31:28AM -0600, Jan Beulich wrote:
> >>> On 25.04.18 at 16:42, <bouyer@antioche.eu.org> wrote:
> > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote:
> >> > Without line numbers associated with at least the top stack trace entry
> >> > I can only guess what it might be - could you give the patch below a try?
> >> > (This may not be the final patch, as I'm afraid there may be some race
> >> > here, but I'd have to work this out later.)
> >>
> >> Yes, this works. thanks !
> >> I'll now put this version on the NetBSD testbed I'm running.
> >> This should put some pressure on it.
> >
> > Running NetBSD tests in several guests I got:
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 1:
> > (XEN) Assertion 'oc > 0' failed at mm.c:628
> > (XEN) ****************************************
> > (see attached file for complete report).
>
> So in combination with your later reply I'm confused: Are you observing
> this with 64-bit guests as well (your later reply appears to hint towards
> 64-bit-ness), or (as the stack trace suggests) only 32-bit ones? Knowing
> this may already narrow areas where to look.
I've seen it on a server where, I think, only 32-bit domUs are running.
But the dom0 is a 64-bit NetBSD anyway.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
* Re: 4.11.0 RC1 panic
2018-05-01 20:22 ` Manuel Bouyer
@ 2018-05-15 9:30 ` Jan Beulich
2018-05-22 11:01 ` Manuel Bouyer
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-05-15 9:30 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 01.05.18 at 22:22, <bouyer@antioche.eu.org> wrote:
> On Mon, Apr 30, 2018 at 07:31:28AM -0600, Jan Beulich wrote:
>> >>> On 25.04.18 at 16:42, <bouyer@antioche.eu.org> wrote:
>> > On Wed, Apr 25, 2018 at 12:42:42PM +0200, Manuel Bouyer wrote:
>> >> > Without line numbers associated with at least the top stack trace entry
>> >> > I can only guess what it might be - could you give the patch below a try?
>> >> > (This may not be the final patch, as I'm afraid there may be some race
>> >> > here, but I'd have to work this out later.)
>> >>
>> >> Yes, this works. thanks !
>> >> I'll now put this version on the NetBSD testbed I'm running.
>> >> This should put some pressure on it.
>> >
>> > Running NetBSD tests in several guests I got:
>> > (XEN)
>> > (XEN) ****************************************
>> > (XEN) Panic on CPU 1:
>> > (XEN) Assertion 'oc > 0' failed at mm.c:628
>> > (XEN) ****************************************
>> > (see attached file for complete report).
>>
>> So in combination with your later reply I'm confused: Are you observing
>> this with 64-bit guests as well (your later reply appears to hint towards
>> 64-bit-ness), or (as the stack trace suggests) only 32-bit ones? Knowing
>> this may already narrow areas where to look.
>
> I've seen it a server where, I think, only 32bits domUs are running.
> But the dom0 is a 64bits NetBSD anyway.
Right; Dom0 bitness is of no interest. I've been going through numerous
possibly racing combinations of code paths, without being able to spot
anything yet. I'm afraid I'm not in the position to try to set up the full
environment you're observing the problem in. It would therefore really
help if you could
- debug this yourself, or
- reduce the test environment (ideally to a simple [XTF?] test), or
- at least narrow the conditions, or
- at the very least summarize the relevant actions NetBSD takes in
terms of page table management, to hopefully reduce the sets of
code paths potentially involved (for example, across a larger set of
crashes knowing whether UNPIN is always involved would be
helpful; I've been blindly assuming it would be short of having
further data)
(besides a more reliable confirmation - or otherwise - that this indeed
is an issue with 32-bit guests only).
While I think I have ruled out the TLB flush time stamp setting still
happening too early / wrongly in certain cases, there's a small
debugging patch that I would hope could help prove this one or the
other way (see below).
Btw: You've said earlier that there wouldn't be a domain number in
the panic message. However,
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d14v3)
has it (at the end: domain 14, vCPU 3). Just in case this helps identify
further useful pieces of information.
Jan
--- unstable.orig/xen/arch/x86/mm.c
+++ unstable/xen/arch/x86/mm.c
@@ -578,7 +578,11 @@ static inline void set_tlbflush_timestam
*/
if ( !(page->count_info & PGC_page_table) ||
!shadow_mode_enabled(page_get_owner(page)) )
+ {
+ /* NB: This depends on WRAP_MASK in flushtlb.c to be <= 0xffff. */
+ ASSERT(!page->linear_pt_count);
page_set_tlbflush_timestamp(page);
+ }
}
const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
* Re: 4.11.0 RC1 panic
2018-05-15 9:30 ` Jan Beulich
@ 2018-05-22 11:01 ` Manuel Bouyer
2018-05-22 14:46 ` Jan Beulich
2018-06-10 9:54 ` Jan Beulich
0 siblings, 2 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-05-22 11:01 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Tue, May 15, 2018 at 03:30:17AM -0600, Jan Beulich wrote:
> >> So in combination with your later reply I'm confused: Are you observing
> >> this with 64-bit guests as well (your later reply appears to hint towards
> >> 64-bit-ness), or (as the stack trace suggests) only 32-bit ones? Knowing
> >> this may already narrow areas where to look.
> >
> > I've seen it a server where, I think, only 32bits domUs are running.
> > But the dom0 is a 64bits NetBSD anyway.
>
> Right; Dom0 bitness is of no interest. I've been going through numerous
> possibly racing combinations of code paths, without being able to spot
> anything yet. I'm afraid I'm not in the position to try to set up the full
> environment you're observing the problem in. It would therefore really
> help if you could
> - debug this yourself, or
In my experience this kind of bug can only be found by code inspection,
or by adding asserts to try to detect the problem earlier. Both need
good knowledge of the affected code, and I don't have that knowledge.
> - reduce the test environment (ideally to a simple [XTF?] test), or
> - at least narrow the conditions, or
Now that I know where to find the domU number in the panic message,
I can say that, so far, only 32-bit domUs have caused this assert failure.
> - at the very least summarize the relevant actions NetBSD takes in
> terms of page table management, to hopefully reduce the sets of
> code paths potentially involved (for example, across a larger set of
> crashes knowing whether UNPIN is always involved would be
> helpful; I've been blindly assuming it would be short of having
> further data)
So far I've seen 2 stack traces with 4.11:
(XEN) Xen call trace:
(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
(XEN) [<ffff82d08023a00d>] _spin_lock+0xd/0x50
(XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
(XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
(XEN) [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
(XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
(XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
(XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
(XEN) [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036e48c>] lstar_enter+0x10c/0x120
and
(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
(XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
(XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d080290b6d>] do_mmuext_op+0x73d/0x1810
(XEN) [<ffff82d080295630>] compat_mmuext_op+0x430/0x450
(XEN) [<ffff82d080367d4a>] pv_hypercall+0x3aa/0x430
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036957e>] do_entry_int82+0x1e/0x20
(XEN) [<ffff82d08036bc31>] entry_int82+0xb1/0xc0
Both are from 4.11 rc4.
> (besides a more reliable confirmation - or otherwise - that this indeed
> is an issue with 32-bit guests only).
>
> While I think I have ruled out the TLB flush time stamp setting still
> happening too early / wrongly in certain cases, there's a small
> debugging patch that I would hope could help prove this one or the
> other way (see below).
I applied this patch to 4.11 rc4 a week ago, but the assert hasn't fired so far.
It still panics with:
(XEN) Assertion 'oc > 0' failed at mm.c:681
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
* Re: 4.11.0 RC1 panic
2018-05-22 11:01 ` Manuel Bouyer
@ 2018-05-22 14:46 ` Jan Beulich
2018-06-10 9:54 ` Jan Beulich
1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2018-05-22 14:46 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 22.05.18 at 13:01, <bouyer@antioche.eu.org> wrote:
> On Tue, May 15, 2018 at 03:30:17AM -0600, Jan Beulich wrote:
>> - reduce the test environment (ideally to a simple [XTF?] test), or
>> - at least narrow the conditions, or
>
> Now that I know where to find the domU number in the panic message,
> I can say that, so far, only 32bit domUs have caused this assert failure.
>
>> - at the very least summarize the relevant actions NetBSD takes in
>> terms of page table management, to hopefully reduce the sets of
>> code paths potentially involved (for example, across a larger set of
>> crashes knowing whether UNPIN is always involved would be
>> helpful; I've been blindly assuming it would be short of having
>> further data)
>
> So far I've seen 2 stack traces with 4.11:
> (XEN) Xen call trace:
> (XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
> (XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
> (XEN) [<ffff82d08023a00d>] _spin_lock+0xd/0x50
> (XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
> (XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
> (XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
> (XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
> (XEN) [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
> (XEN) [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
> (XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
> (XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
> (XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
> (XEN) [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e48c>] lstar_enter+0x10c/0x120
That's interesting: so far I've been working on the assumption that
there would be a race between put_page_from_l2e() and some other
piece of code. The issue happening out of domain_relinquish_resources()
pretty much rules this out, and instead suggests that such a race (if
there is one in the first place, though the fact that you see this only
sporadically strongly suggests there is) would sit somewhere earlier,
perhaps when the page gets established as a recursive L2 one. Unless
someone else gets to this earlier than me, I'll have to go through the
related code another time with this property in mind.
Jan
* Re: 4.11.0 RC1 panic
2018-05-22 11:01 ` Manuel Bouyer
2018-05-22 14:46 ` Jan Beulich
@ 2018-06-10 9:54 ` Jan Beulich
2018-06-10 10:57 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-06-10 9:54 UTC (permalink / raw)
To: bouyer; +Cc: xen-devel
>>> Manuel Bouyer <bouyer@antioche.eu.org> 05/22/18 1:01 PM >>>
>So far I've seen 2 stack traces with 4.11:
>(XEN) Xen call trace:
>(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
>(XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
>(XEN) [<ffff82d08023a00d>] _spin_lock+0xd/0x50
>(XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
>(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
>(XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
>(XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
>(XEN) [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
>(XEN) [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
>(XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
>(XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
>(XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
>(XEN) [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
>(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
>(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
>(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
>(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
>(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
>(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
>(XEN) [<ffff82d08036e48c>] lstar_enter+0x10c/0x120
>
>and
>(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
>(XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
>(XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
>(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
>(XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
>(XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
>(XEN) [<ffff82d080290b6d>] do_mmuext_op+0x73d/0x1810
>(XEN) [<ffff82d080295630>] compat_mmuext_op+0x430/0x450
>(XEN) [<ffff82d080367d4a>] pv_hypercall+0x3aa/0x430
>(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
>(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
>(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
>(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
>(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
>(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
>(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
>(XEN) [<ffff82d08036957e>] do_entry_int82+0x1e/0x20
>(XEN) [<ffff82d08036bc31>] entry_int82+0xb1/0xc0
>
>both are from 4.11rc4
So I've been trying to look into this some more, and I've noticed an oddity in
the raw stack dump you had provided with the first report. Unfortunately you
didn't include that part for either of the above (the first one as being on a
different path would be of particular interest). Additionally, to be able to check
whether the (type_info) values on the stack really point at some anomaly, I'd
need the xen-syms (or xen.efi) from that same build of Xen. That'll allow me
to determine whether the values are simply leftovers from prior function
invocations. Otherwise, if they're live values, it'd be suspicious for
free_page_type() to be called on a page the type refcount of which is still 2.
As I assume that you haven't recorded/stored the additional bits I'm asking
for, I realize that this unfortunately means you'll have to obtain the data
another time. I'm sorry for that.
Two other questions on the internals of NetBSD page table management: Are all
updates to a given (set of) page table(s) fully serialized, e.g. via a respective
spin lock? Could you additionally point me at that code, or give an outline of
how the (un)pinning of L2 tables in the 32-bit case works?
Jan
* Re: 4.11.0 RC1 panic
2018-06-10 9:54 ` Jan Beulich
@ 2018-06-10 10:57 ` Manuel Bouyer
2018-06-10 15:38 ` Jan Beulich
2018-06-12 7:57 ` Jan Beulich
0 siblings, 2 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-10 10:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Sun, Jun 10, 2018 at 03:54:45AM -0600, Jan Beulich wrote:
> [...]
>
> So I've been trying to look into this some more, and I've noticed an oddity in
> the raw stack dump you had provided with the first report. Unfortunately you
> didn't include that part for either of the above (the first one as being on a
> different path would be of particular interest).
> Additionally, to be able to check
> whether the (type_info) values on the stack really point at some anomaly, I'd
> need the xen-syms (or xen.efi) from that same build of Xen. That'll allow me
> to determine whether the values are simply leftovers from prior function
> invocations. Otherwise, if they're live values, it'd be suspicious for
> free_page_type() to be called on a page the type refcount of which is still 2.
>
> As I assume that you don't have recorded/stored the additional bits I'm asking
> for, I do realize that this means that unfortunately you'll have to obtain the data
> another time. I'm sorry for that.
>
Actually I have them: I have complete logs of the serial console,
and I still have the build directory.
This one is from a newer build (4.11.0 rc4).
I've put the binary files at ftp://asim.lip6.fr/outgoing/bouyer/xen-debug/
(XEN) Xen version 4.11-rcnb0 (bouyer@) (gcc (nb2 20150115) 4.8.5) debug=y Tue May 15 17:21:40 MEST 2018
[...]
(XEN) Assertion 'oc > 0' failed at mm.c:681
(XEN) ----[ Xen-4.11-rcnb0 x86_64 debug=y Not tainted ]----
(XEN) CPU: 4
(XEN) RIP: e008:[<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d0v0)
(XEN) rax: ffffffffffff0000 rbx: 4400000000000001 rcx: 00000000001a3200
(XEN) rdx: 0400000000000000 rsi: 000000000000002e rdi: ffff82e003465040
(XEN) rbp: ffff82e003464000 rsp: ffff8301bf127bb0 r8: 0000000000000000
(XEN) r9: 0000000000000200 r10: 4000000000000000 r11: ffff82e003465040
(XEN) r12: ffff82e003465040 r13: 0000000000000000 r14: 10ffffffffffffff
(XEN) r15: 1000000000000000 cr0: 000000008005003b cr4: 0000000000002660
(XEN) cr3: 00000001b9096000 cr2: 00007f7ff60ce790
(XEN) fsb: 00007f7ff7ff36c0 gsb: ffffffff80cc8500 gss: 0000000000000000
(XEN) ds: 003f es: 003f fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d080284bd2> (mm.c#dec_linear_entries+0x12/0x20):
(XEN) c1 47 1e 66 85 c0 7f 02 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 41 54
(XEN) Xen stack trace from rsp=ffff8301bf127bb0:
(XEN) ffff82d08028922e 000000000023a00d ffff8301bf127fff ffff82d08023a00d
(XEN) ffff82e003464000 ffff82e003465040 ffff8301b98b6000 0000000000000001
(XEN) ffff820040030000 0200000000000000 ffff82d0802898af 00000000000001fc
(XEN) ffff82e003465040 ffff82d080288c59 00000000001a3282 0000001500000000
(XEN) ffff8301bf127fff 4400000000000001 ffff82e003465040 0000000000000000
(XEN) 00ffffffffffffff 10ffffffffffffff 1000000000000000 ffff82d0802891f7
(XEN) 000000010122471f ffff8301bf127fff ffff8301bf127d10 ffff82e003465040
(XEN) ffff8301bf127d10 ffff8301b98b6028 ffff8301b98b6000 ffff8301b98b6020
(XEN) ffff82e003465050 ffff82d0802898ef ffff82d080272adb 0000000000000068
(XEN) e400000000000000 ffff8301bf127fff 8000000000000000 ffff8301b98b6000
(XEN) 0000000000000000 ffff8301b98b6018 deadbeefdeadf00d 0000000000000001
(XEN) 00007f7ff7b1b004 ffff82d080276ae3 ffff8301b98b6000 00007f7ff7b1b004
(XEN) ffff82d0802068bd ffff8301b98b6000 00007f7ff7b1b004 0000000000000000
(XEN) ffff82d0802039e3 ffff82d080457640 ffff8301bf0e3fa0 0000000000000002
(XEN) 00000000000000f0 0000000000000000 0000000000000000 0000000000000000
(XEN) 000000000000000f 0000000000000000 0000001000000002 00007f7ff7c00016
(XEN) 00007f7ff7ffe400 0000000000000016 00007f7fffffd510 00007f7ff74c97d1
(XEN) 00007f7ff7c02d7d 0000000000000246 0000000000000000 0000000000000000
(XEN) 0000000000000000 00007f7ff7b10800 0000000000000016 0000000000000016
(XEN) 0000000000000000 00007f7ff7b10800 0000000000000202 00007f7ff7ffa800
(XEN) Xen call trace:
(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
(XEN) [<ffff82d08023a00d>] _spin_lock+0xd/0x50
(XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
(XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
(XEN) [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
(XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
(XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
(XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
(XEN) [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
(XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
(XEN) [<ffff82d08036e48c>] lstar_enter+0x10c/0x120
The second one is from the same build:
(XEN) Xen version 4.11-rcnb0 (bouyer@) (gcc (nb2 20150115) 4.8.5) debug=y Tue May 15 17:21:40 MEST 2018
[...]
(XEN) Assertion 'oc > 0' failed at mm.c:681
(XEN) ----[ Xen-4.11-rcnb0 x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d2v0)
(XEN) rax: ffffffffffff0000 rbx: 4400000000000001 rcx: 00000000001a7d29
(XEN) rdx: 0400000000000000 rsi: 0000000000000011 rdi: ffff82e0034f5540
(XEN) rbp: ffff82e0034fa520 rsp: ffff8301bf13fc08 r8: 0000000000000000
(XEN) r9: 0000000000000200 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff82e0034f5540 r13: 0000000000000000 r14: 10ffffffffffffff
(XEN) r15: 1000000000000000 cr0: 000000008005003b cr4: 00000000000026e4
(XEN) cr3: 00000001b98b6000 cr2: 00000000bb418010
(XEN) fsb: 00000000c0636dc0 gsb: 0000000000000000 gss: 0000000000000000
(XEN) ds: 0011 es: 0011 fs: 0031 gs: 0011 ss: 0000 cs: e008
(XEN) Xen code around <ffff82d080284bd2> (mm.c#dec_linear_entries+0x12/0x20):
(XEN) c1 47 1e 66 85 c0 7f 02 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 41 54
(XEN) Xen stack trace from rsp=ffff8301bf13fc08:
(XEN) ffff82d08028922e 0000000000800063 ffff8301bf13ffff 4c00000000000002
(XEN) ffff82e0034fa520 ffff82e0034f5540 ffff8301b98f5000 0000000000000001
(XEN) ffff820040018000 0200000000000000 ffff82d0802898af 00000000000001fd
(XEN) ffff82e0034f5540 ffff82d080288c59 00000000001a7aaa 0000000000000000
(XEN) ffff8301bf13ffff 4400000000000001 ffff82e0034f5540 0000000000000000
(XEN) 00ffffffffffffff 10ffffffffffffff 1000000000000000 ffff82d0802891f7
(XEN) 0000000101000206 ffff8301bf13ffff 4400000000000002 00000000001a7aaa
(XEN) ffff82e0034f5540 0000000000000000 ffff8301b98f5000 ffff820080000000
(XEN) 0000000000000000 ffff82d0802898ef ffff82d080290b6d ffff8300bff40000
(XEN) 00000001802a8302 ffff8301b98f5000 0000000000000000 ffff8301b98f5000
(XEN) 0000000000000007 ffff8300bff40000 00007ff000000000 0000000000000000
(XEN) ffff8301b98f5000 ffff82e00350bf80 ffff82d0804b0058 ffff82d0804b0060
(XEN) 00000000001a85fc 00000000001a85fc 0000000100000004 00000000001a7aaa
(XEN) 00000000001aaaa7 ffff820080000018 0000000000000001 00000000cdb0db7c
(XEN) ffff82d080387b90 0000000000000001 ffff8301bf13ffff ffff82d080295630
(XEN) ffff8301bf13fe14 00000001bf13ffff ffff820080000000 0000000000000000
(XEN) 00007ff000000000 000000048036bbe8 cdb0dbb0001a7aaa ffff8301bf13fef8
(XEN) ffff8300bff40000 00000000000001a0 00000000deadf00d 0000000000000004
(XEN) 00000000deadf00d ffff82d080367d4a ffff82d000007ff0 ffff82d000000000
(XEN) ffff82d000000001 ffff82d0cdb0db70 ffff82d08036bbf4 ffff82d08036bbe8
(XEN) Xen call trace:
(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
(XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
(XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d080290b6d>] do_mmuext_op+0x73d/0x1810
(XEN) [<ffff82d080295630>] compat_mmuext_op+0x430/0x450
(XEN) [<ffff82d080367d4a>] pv_hypercall+0x3aa/0x430
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbe8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbf4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036957e>] do_entry_int82+0x1e/0x20
(XEN) [<ffff82d08036bc31>] entry_int82+0xb1/0xc0
>
> Two other questions on the internals of NetBSD page table management: Are all
> updates to a given (set of) page table(s) fully serialized, e.g. via a respective
> spin lock?
In the general case they're done using atomic operations (pmap_pte_cas()).
For Xen PV, the update is protected by a global lock, so page table updates
are serialized (globally).
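For illustration, a pmap_pte_cas()-based update in the general (non-Xen-PV)
case looks roughly like this (a sketch only; "new_bits" is made up, and the
return convention assumed here is that the previous PTE value is returned):

    /* Set bits in a PTE, retrying if another CPU raced with us. */
    pt_entry_t opte, npte;

    do {
        opte = *ptep;
        npte = opte | new_bits;
    } while (pmap_pte_cas(ptep, opte, npte) != opte);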
> Could you additionally point me at that code, or give an outline of
> how the (un)pinning of L2 tables in the 32-bit case works?
When a new set of page tables is needed (this is pmap_create()), a pdp is
requested from a cache. If the cache is empty, pages are allocated in
pmap_pdp_ctor(), which is also going to pin the L2 pages.
When the page table is not needed any more (this is pmap_destroy()),
the pdp is returned to the cache. The L2 pages remain pinned, with pointers to
the kernel L1 pages. If memory needs to be reclaimed from the cache,
or an explicit call to pool_cache_destruct_object() is done,
the L2 pages are unpinned, but they are not explicitly zeroed out beforehand
(can this be a problem?).
The code for this is in
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/pmap.c
Some helper functions are in
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/xen/x86/xen_pmap.c
Some #defines and inline are in
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/xen/include/xenpmap.h
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/i386/include/pmap.h
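For readers without the sources at hand, the lifecycle described above boils
down to roughly the following (a sketch only: xen_pin_l2(), xen_unpin() and
pmap_pa_of() are made-up stand-ins for the pin/unpin hypercalls and address
translation, and the real pmap_pdp_ctor() does quite a bit more work):

    /* pool_cache constructor: only runs when the cache has no spare pdp. */
    static int
    pmap_pdp_ctor(void *arg, void *obj, int flags)
    {
        pd_entry_t *pdp = obj;
        int i;

        /* ... fill in kernel entries, recursive slot, etc. ... */
        /* then hand the L2 pages to Xen as pinned page tables: */
        for (i = 0; i < PDP_SIZE; i++)
            xen_pin_l2(pmap_pa_of(pdp) + i * PAGE_SIZE);
        return 0;
    }

    /*
     * pool_cache destructor: only runs when the cache itself gives the
     * object back (memory pressure, or pool_cache_destruct_object());
     * pmap_destroy() merely returns the pdp to the cache, leaving the
     * L2 pages pinned.
     */
    static void
    pmap_pdp_dtor(void *arg, void *obj)
    {
        pd_entry_t *pdp = obj;
        int i;

        for (i = 0; i < PDP_SIZE; i++)
            xen_unpin(pmap_pa_of(pdp) + i * PAGE_SIZE);
    }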
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
* Re: 4.11.0 RC1 panic
2018-06-10 10:57 ` Manuel Bouyer
@ 2018-06-10 15:38 ` Jan Beulich
2018-06-10 16:32 ` Manuel Bouyer
2018-06-12 7:57 ` Jan Beulich
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-06-10 15:38 UTC (permalink / raw)
To: bouyer; +Cc: xen-devel
>>> Manuel Bouyer <bouyer@antioche.eu.org> 06/10/18 1:30 PM >>>
>When a new set of page tables is needed (this is pmap_create()), a pdp is
>requested from a cache. If the cache is empty, pages are allocated in
>pmap_pdp_ctor(), which is going to also pin the L2 pages.
>When the page table is not needed any more (this is pmap_destroy()),
>the pdp is returned to the cache. L2 pages remains pinned, with pointers to
>the kernel L1 pages. If memory needs to be reclaimed from the cache,
>or is an explicit call to pool_cache_destruct_object() is done,
>the L2 pages are unpinned, but they are not explicitely zeroed out before
>(can this be a problem ?).
I don't think so, no. Whatever is still in there is going to have respective
refcounts dropped while unvalidating the L2.
But I conclude that if there is an L2 unpin, that L2 is not expected to be in use
(as an L2) anywhere anymore, i.e. the only type reference is supposed to be
the one associated with the pinned status. Nor is it expected for L2s to ever
get freed by other means than unpinning (i.e. by them being taken off an L3).
If that's the case, maybe there is a way to place some more (NetBSD-specific)
assertions in a few places ...
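For illustration, such a check could take roughly this shape on the Xen side
(a sketch only, not a proposed patch; it merely encodes the expectation just
described, using the type_info fields already visible in the traces above):

    /*
     * Expectation: when a 32-bit PV guest's L2 reaches the unpin path, the
     * pin is supposed to hold the only type reference, i.e. the type count
     * is still exactly 1 at that point.
     */
    if ( is_pv_32bit_domain(page_get_owner(page)) &&
         (page->u.inuse.type_info & PGT_type_mask) == PGT_l2_page_table )
        ASSERT((page->u.inuse.type_info & PGT_count_mask) == 1);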
What about L2 tables to be used in slot 3 of an L3 table? Aiui Xen won't allow
them to be pinned, hence I'd expect there to be some special casing in your
code. Considering no similar issues have been observed with 64-bit guests,
this one special case looks to me to be the prime suspect for something going
wrong (in Xen).
Jan
* Re: 4.11.0 RC1 panic
2018-06-10 15:38 ` Jan Beulich
@ 2018-06-10 16:32 ` Manuel Bouyer
[not found] ` <5B1D528F020000C104A2FF49@prv1-mh.provo.novell.com>
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-10 16:32 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Sun, Jun 10, 2018 at 09:38:17AM -0600, Jan Beulich wrote:
> >>> Manuel Bouyer <bouyer@antioche.eu.org> 06/10/18 1:30 PM >>>
> >When a new set of page tables is needed (this is pmap_create()), a pdp is
> >requested from a cache. If the cache is empty, pages are allocated in
> >pmap_pdp_ctor(), which is going to also pin the L2 pages.
> >When the page table is not needed any more (this is pmap_destroy()),
> >the pdp is returned to the cache. L2 pages remains pinned, with pointers to
> >the kernel L1 pages. If memory needs to be reclaimed from the cache,
> >or is an explicit call to pool_cache_destruct_object() is done,
> >the L2 pages are unpinned, but they are not explicitely zeroed out before
> >(can this be a problem ?).
>
> I don't think so, no. Whatever is still in there is going to have respective
> refcounts dropped while unvalidating the L2.
>
> But I conclude that if there is an L2 unpin, that L2 is not expected to be in use
> (as an L2) anywhere anymore, i.e. the only type reference is supposed to be
> the one associated with the pinned status.
Yes, I think so.
> Nor is it expected for L2s to ever
> get freed by other means than unpinning (i.e. by them being taken off an L3).
Yes, that should be true.
One thing I forgot to mention: NetBSD allocates one L3 per CPU, for the life
of the system (L3s are never freed). Context switching is done by updating the
entries in these L3s.
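Roughly speaking, activating a pmap on a CPU then amounts to the following
(a sketch with placeholder names -- l3_slot_ma(), l2_mfn[] and "ci" are made
up, the real code goes through the xpq queue rather than issuing the
hypercall directly, and I'm assuming that the slot holding the shared kernel
L2 mentioned further down is set up once and left alone):

    /* Point the non-kernel slots of this CPU's permanent L3 at the new
     * pmap's pinned L2 pages. */
    mmu_update_t req[3];
    int i;

    for (i = 0; i < 3; i++) {
        req[i].ptr = l3_slot_ma(ci, i) | MMU_NORMAL_PT_UPDATE;
        req[i].val = (l2_mfn[i] << PAGE_SHIFT) | PTE_P;
    }
    if (HYPERVISOR_mmu_update(req, 3, NULL, DOMID_SELF) < 0)
        panic("L3 update failed");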
> If that's the case, maybe there is a way to place some more (NetBSD-specific)
> assertions in a few places ...
>
>
> What about L2 tables to be used in slot 3 of an L3 table? Aiui Xen won't allow
> them to be pinned, hence I'd expect there to be some special casing in your
> code. Considering no similar issues have been observed with 64-bit guests,
> this one special case looks to me to be the prime suspect for something going
> wrong (in Xen).
AFAIK this L2 is allocated at boot, and should never be freed. It's
shared by all CPUs.
There is one special L2 case: it's pinned as L2 but used as L1 in the "kernel"
L2 (the one in slot 3 of the L3 tables). This is for recursive mappings
of the kernel map. This one will be allocated/freed (and so pinned/unpinned)
for each context.
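Expressed as a (hedged) sketch -- xen_pin_l2(), special_l2_ma and
kernel_l2_slot_ma are made up, and the real code queues these operations
through the xpq machinery:

    xen_pin_l2(special_l2_ma);                  /* pinned as an L2 ...       */
    xpq_queue_pte_update(kernel_l2_slot_ma,     /* ... but then installed in */
        special_l2_ma | PTE_P);                 /* an L2 entry like an L1    */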
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
* Re: 4.11.0 RC1 panic
[not found] ` <5B1D528F020000C104A2FF49@prv1-mh.provo.novell.com>
@ 2018-06-11 9:58 ` Jan Beulich
2018-06-11 10:13 ` Manuel Bouyer
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-06-11 9:58 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 10.06.18 at 18:32, <bouyer@antioche.eu.org> wrote:
> On Sun, Jun 10, 2018 at 09:38:17AM -0600, Jan Beulich wrote:
>> What about L2 tables to be used in slot 3 of an L3 table? Aiui Xen won't allow
>> them to be pinned, hence I'd expect there to be some special casing in your
>> code. Considering no similar issues have been observed with 64-bit guests,
>> this one special case looks to me to be the prime suspect for something going
>> wrong (in Xen).
>
> AFAIK this L2 is allocated at boot, and should never be freed. It's
> shared by all CPUs.
I guess that's what goes into L3 slot 3 and ...
> There is one special L2 case: it's pinned as L2 but used as L1 in the "kernel"
> L2 (the one in slot 3 of the L3 tables). This is for recursive mappings
> of the kernel map. This one will be allocated/freed (and so pinned/unpinned)
> for each context.
... here you mean L3 slot 2 (all assuming 0-based slot numbering)?
Otherwise I'm afraid I'm confused now, as the sharing by all CPUs of
the former seems to contradict the per-context nature of the latter.
Jan
* Re: 4.11.0 RC1 panic
2018-06-11 9:58 ` Jan Beulich
@ 2018-06-11 10:13 ` Manuel Bouyer
0 siblings, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-11 10:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Mon, Jun 11, 2018 at 03:58:01AM -0600, Jan Beulich wrote:
> >>> On 10.06.18 at 18:32, <bouyer@antioche.eu.org> wrote:
> > On Sun, Jun 10, 2018 at 09:38:17AM -0600, Jan Beulich wrote:
> >> What about L2 tables to be used in slot 3 of an L3 table? Aiui Xen won't allow
> >> them to be pinned, hence I'd expect there to be some special casing in your
> >> code. Considering no similar issues have been observed with 64-bit guests,
> >> this one special case looks to me to be the prime suspect for something going
> >> wrong (in Xen).
> >
> > AFAIK this L2 is allocated at boot, and should never be freed. It's
> > shared by all CPUs.
>
> I guess that's what goes into L3 slot 3 and ...
Yes
>
> > There is one special L2 case: it's pinned as L2 but used as L1 in the "kernel"
> > L2 (the one in slot 3 of the L3 tables). This is for recursive mappings
> > of the kernel map. This one will be allocated/freed (and so pinned/unpinned)
> > for each context.
>
> ... here you mean L3 slot 2 (all assuming 0-based slot numbering)?
> Otherwise I'm afraid I'm confused now, as the sharing by all CPUs of
> the former seems to contradict the per-context nature of the latter.
yes, sorry, it's referenced in the last entry of the "L3 slot 2" L2 page.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
* Re: 4.11.0 RC1 panic
2018-06-10 10:57 ` Manuel Bouyer
2018-06-10 15:38 ` Jan Beulich
@ 2018-06-12 7:57 ` Jan Beulich
2018-06-12 11:39 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-06-12 7:57 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 10.06.18 at 12:57, <bouyer@antioche.eu.org> wrote:
> (XEN) Xen call trace:
> (XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
> (XEN) [<ffff82d08028922e>] mm.c#_put_page_type+0x13e/0x350
> (XEN) [<ffff82d08023a00d>] _spin_lock+0xd/0x50
> (XEN) [<ffff82d0802898af>] mm.c#put_page_from_l2e+0xdf/0x110
> (XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
> (XEN) [<ffff82d0802891f7>] mm.c#_put_page_type+0x107/0x350
> (XEN) [<ffff82d0802898ef>] put_page_type_preemptible+0xf/0x10
> (XEN) [<ffff82d080272adb>] domain.c#relinquish_memory+0xab/0x460
> (XEN) [<ffff82d080276ae3>] domain_relinquish_resources+0x203/0x290
> (XEN) [<ffff82d0802068bd>] domain_kill+0xbd/0x150
> (XEN) [<ffff82d0802039e3>] do_domctl+0x7d3/0x1a90
> (XEN) [<ffff82d080203210>] do_domctl+0/0x1a90
> (XEN) [<ffff82d080367b95>] pv_hypercall+0x1f5/0x430
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e422>] lstar_enter+0xa2/0x120
> (XEN) [<ffff82d08036e42e>] lstar_enter+0xae/0x120
> (XEN) [<ffff82d08036e48c>] lstar_enter+0x10c/0x120
Let's focus on this scenario for now, as it is under better (timing) control
on the Xen side. Below is a first debugging patch which
- avoids the ASSERT() in question, instead triggering a printk(), in the hope
that the data logged and/or other ASSERT()s shed some additional light
on the situation
- logs cleanup activity (this is likely to be quite chatty, so be sure you set
up large enough internal buffers)
Ideally, if no other ASSERT() triggers as a result of the bypassed one,
you'd try to catch more than a single instance of the problem, so we can
see a possible pattern (if there is one). A simplistic first XTF test I've
created based on your description of the L2 handling model in NetBSD
did not trigger the interesting printk(), but at least that way I've been
able to see that the domain cleanup logging produces useful data.
At the very least I hope that with this we can derive whether the
root of the problem is at page table teardown / cleanup time, or with
management of live ones.
Jan
--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -1872,6 +1872,7 @@ static int relinquish_memory(
while ( (page = page_list_remove_head(list)) )
{
+bool log = false;//temp
/* Grab a reference to the page so it won't disappear from under us. */
if ( unlikely(!get_page(page, d)) )
{
@@ -1880,6 +1881,10 @@ static int relinquish_memory(
continue;
}
+if(is_pv_32bit_domain(d) && PGT_type_equal(page->u.inuse.type_info, PGT_l2_page_table)) {//temp
+ printk("d%d:%"PRI_mfn": %lx:%d\n", d->domain_id, mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count);
+ log = true;
+}
if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
ret = put_page_and_type_preemptible(page);
switch ( ret )
@@ -1921,7 +1926,13 @@ static int relinquish_memory(
if ( likely(y == x) )
{
/* No need for atomic update of type_info here: noone else updates it. */
- switch ( ret = free_page_type(page, x, 1) )
+//temp switch ( ret = free_page_type(page, x, 1) )
+ret = free_page_type(page, x, 1);//temp
+if(log) {//temp
+ printk("%"PRI_mfn" -> %lx:%d (%d,%d,%d)\n", mfn_x(page_to_mfn(page)), page->u.inuse.type_info,
+ page->linear_pt_count, ret, page->nr_validated_ptes, page->partial_pte);
+}
+switch(ret)//temp
{
case 0:
break;
--- unstable.orig/xen/arch/x86/mm.c
+++ unstable/xen/arch/x86/mm.c
@@ -705,12 +705,19 @@ static bool inc_linear_entries(struct pa
return true;
}
-static void dec_linear_entries(struct page_info *pg)
+//temp static void dec_linear_entries(struct page_info *pg)
+static const struct domain*dec_linear_entries(struct page_info*pg)//temp
{
typeof(pg->linear_pt_count) oc;
oc = arch_fetch_and_add(&pg->linear_pt_count, -1);
+{//temp
+ const struct domain*owner = page_get_owner(pg);
+ if(oc <= 0 && is_pv_32bit_domain(owner))
+ return owner;
+}
ASSERT(oc > 0);
+return NULL;//temp
}
static bool inc_linear_uses(struct page_info *pg)
@@ -2617,8 +2624,15 @@ static int _put_final_page_type(struct p
{
if ( ptpg && PGT_type_equal(type, ptpg->u.inuse.type_info) )
{
+const struct domain*d;//temp
dec_linear_uses(page);
+if((d = ({//temp
dec_linear_entries(ptpg);
+})) != NULL) {//temp
+ printk("d%d: %"PRI_mfn":%lx:%d -> %"PRI_mfn":%lx:%d\n", d->domain_id,
+ mfn_x(page_to_mfn(ptpg)), ptpg->u.inuse.type_info, ptpg->linear_pt_count,
+ mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count);
+}
}
ASSERT(!page->linear_pt_count || page_get_owner(page)->is_dying);
set_tlbflush_timestamp(page);
@@ -2704,8 +2718,15 @@ static int _put_page_type(struct page_in
if ( ptpg && PGT_type_equal(x, ptpg->u.inuse.type_info) )
{
+const struct domain*d;//temp
dec_linear_uses(page);
+if((d = ({//temp
dec_linear_entries(ptpg);
+})) != NULL) {//temp
+ printk("d%d: %"PRI_mfn":%lx:%d => %"PRI_mfn":%lx:%d\n", d->domain_id,
+ mfn_x(page_to_mfn(ptpg)), ptpg->u.inuse.type_info, ptpg->linear_pt_count,
+ mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count);
+}
}
return 0;
* Re: 4.11.0 RC1 panic
2018-06-12 7:57 ` Jan Beulich
@ 2018-06-12 11:39 ` Manuel Bouyer
2018-06-12 15:38 ` Manuel Bouyer
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-12 11:39 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Tue, Jun 12, 2018 at 01:57:35AM -0600, Jan Beulich wrote:
> Let's focus on this scenario for now, as it is under better (timing) control
> on the Xen side. Below is a first debugging patch which
> - avoids the ASSERT() in question, instead triggering a printk(), in the hope
> that the data logged and/or other ASSERT()s shed some additional light
> on the situation
> - logs cleanup activity (this is likely to be quite chatty, so be sure you set
> up large enough internal buffers)
>
> Ideally, if no other ASSERT() triggers as a result of the bypassed one,
> you'd try to catch more than a single instance of the problem, so we can
> see a possible pattern (if there is one). A simplistic first XTF test I've
> created based on your description of the L2 handling model in NetBSD
> did not trigger the interesting printk(), but at least that way I've been
> able to see that the domain cleanup logging produces useful data.
>
> At the very least I hope that with this we can derive whether the
> root of the problem is at page table teardown / cleanup time, or with
> management of live ones.
I applied this patch to 4.11rc4 (let's not change too many things at the
same time) and rebooted my test host. Hopefully I'll have some data to report
soon.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
* Re: 4.11.0 RC1 panic
2018-06-12 11:39 ` Manuel Bouyer
@ 2018-06-12 15:38 ` Manuel Bouyer
2018-06-12 15:54 ` Andrew Cooper
2018-06-12 20:55 ` Manuel Bouyer
0 siblings, 2 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-12 15:38 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
> I applied this patch to 4.11rc4 (let's not change too much things at the
> same time) and rebooted my test host. Hopefully I'll have some data to report
> soon
Got the first panic (still from an i386 domU):
login: (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
(XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
(XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
(XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
(XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
(XEN) d4v2 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf077f78, dr6 ffff0ff0
(XEN) d4: 1a3942:4000000000000001:-1 -> 1a0f40:4000000000000001:0
(XEN) Assertion '!page->linear_pt_count || page_get_owner(page)->is_dying' failed at mm.c:2518
(XEN) ----[ Xen-4.11-rcnb1 x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d080289408>] mm.c#_put_page_type+0x208/0x480
(XEN) RFLAGS: 0000000000210246 CONTEXT: hypervisor (d4v0)
(XEN) rax: 000000000000ffff rbx: 4100000000000001 rcx: 00000001b98f1000
(XEN) rdx: ffff8301b98f1000 rsi: 000000000000000b rdi: 000000000000000b
(XEN) rbp: ffff82e003472840 rsp: ffff8300bfc77ca8 r8: ffff814100200058
(XEN) r9: 0000000000000200 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 10ffffffffffffff
(XEN) r15: 1000000000000000 cr0: 000000008005003b cr4: 00000000000026e4
(XEN) cr3: 00000001b98be000 cr2: 00000000ccbec000
(XEN) fsb: 00000000c0636dc0 gsb: 0000000000000000 gss: 0000000000000000
(XEN) ds: 0011 es: 0011 fs: 0031 gs: 0011 ss: 0000 cs: e008
(XEN) Xen code around <ffff82d080289408> (mm.c#_put_page_type+0x208/0x480):
(XEN) ba 20 01 00 00 00 75 02 <0f> 0b f6 45 0f 20 74 28 8b 4d 18 ba 00 00 00 00
(XEN) Xen stack trace from rsp=ffff8300bfc77ca8:
(XEN) 1000000000000000 ffff82d080289494 000000010128d429 ffff8300bfc77fff
(XEN) ffff8300bfc77d20 ffff8300bff40000 ffff8300bff40000 0000000000000000
(XEN) ffff8301b98f1000 0000000000000000 00000000deadf00d ffff82d080289b52
(XEN) ffff8300bfc77ef8 ffff8300bff40000 ffff82d0802906b6 ffff8300bff40000
(XEN) 80000000802a8542 ffff8301b98f1000 0000000000000000 ffff8301b98f1000
(XEN) 0000000000000007 ffff8300bff40000 0000000000000000 0000000000000000
(XEN) ffff8301b98f1000 ffff82e003502a80 ffff82d0804b0058 ffff82d000000018
(XEN) ffff8300bfc77de8 00000000bfc77d98 0000000100000004 00000000001a3942
(XEN) 00000000ce32f7ac ffff8300bfc77ef8 ffff8300bff40000 0000000000000000
(XEN) 00000000deadf00d 0000000000000004 00000000deadf00d ffff82d080295470
(XEN) ffff8300bfc77e14 00000001cc7d0f2c ffff82d08036be34 0000000000000000
(XEN) 000000008036be34 ffff82d08036be28 ffff82d08036be34 ffff8300bfc77ef8
(XEN) ffff8300bff40000 00000000000001a0 00000000deadf00d 0000000000000004
(XEN) 00000000deadf00d ffff82d080367f8a ffff82d000000000 ffff82d000000000
(XEN) ffff82d080000000 ffff82d000000000 ffff82d08036be34 ffff82d08036be28
(XEN) ffff82d08036be34 ffff82d08036be28 ffff82d08036be34 ffff82d08036be28
(XEN) ffff82d08036be34 ffff8300bfc77ef8 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82d0803697be
(XEN) ffff8300bff40000 ffff82d08036be71 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 00000000ce32f5a4 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82d080289408>] mm.c#_put_page_type+0x208/0x480
(XEN) [<ffff82d080289494>] mm.c#_put_page_type+0x294/0x480
(XEN) [<ffff82d080289b52>] put_old_guest_table+0x22/0x60
(XEN) [<ffff82d0802906b6>] do_mmuext_op+0x46/0x1810
(XEN) [<ffff82d080295470>] compat_mmuext_op+0x30/0x450
(XEN) [<ffff82d08036be34>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036be28>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036be34>] entry_int82+0x74/0xc0
(XEN) [<ffff82d080367f8a>] pv_hypercall+0x3aa/0x430
(XEN) [<ffff82d08036be34>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036be28>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036be34>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036be28>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036be34>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036be28>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036be34>] entry_int82+0x74/0xc0
(XEN) [<ffff82d0803697be>] do_entry_int82+0x1e/0x20
(XEN) [<ffff82d08036be71>] entry_int82+0xb1/0xc0
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!page->linear_pt_count || page_get_owner(page)->is_dying' failed at mm.c:2518
(XEN) ****************************************
(XEN)
I put the build files at
ftp://asim.lip6.fr/outgoing/bouyer/xen-debug/0612/
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
* Re: 4.11.0 RC1 panic
2018-06-12 15:38 ` Manuel Bouyer
@ 2018-06-12 15:54 ` Andrew Cooper
2018-06-12 16:00 ` Manuel Bouyer
2018-06-12 20:55 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Cooper @ 2018-06-12 15:54 UTC (permalink / raw)
To: Manuel Bouyer, Jan Beulich; +Cc: xen-devel
On 12/06/18 16:38, Manuel Bouyer wrote:
> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
>> I applied this patch to 4.11rc4 (let's not change too much things at the
>> same time) and rebooted my test host. Hopefully I'll have some data to report
>> soon
> Got the first panic (still from a i386 domU):
> login: (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> (XEN) d4v2 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf077f78, dr6 ffff0ff0
I presume you're running a XSA-263 (MovSS) exploit in testing?
~Andrew
* Re: 4.11.0 RC1 panic
2018-06-12 15:54 ` Andrew Cooper
@ 2018-06-12 16:00 ` Manuel Bouyer
2018-06-12 16:29 ` Andrew Cooper
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-12 16:00 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel, Jan Beulich
On Tue, Jun 12, 2018 at 04:54:30PM +0100, Andrew Cooper wrote:
> On 12/06/18 16:38, Manuel Bouyer wrote:
> > On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
> >> I applied this patch to 4.11rc4 (let's not change too much things at the
> >> same time) and rebooted my test host. Hopefully I'll have some data to report
> >> soon
> > Got the first panic (still from a i386 domU):
> > login: (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> > (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> > (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> > (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> > (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
> > (XEN) d4v2 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf077f78, dr6 ffff0ff0
>
> I presume you're running a XSA-263 (MovSS) exploit in testing?
Not intentionally; these are the NetBSD test suite runs, and I don't think any
test specifically targets this (there are 759 tests at this time).
But these include network tests, so there are probably tests exercising the
in-kernel bpf code.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
* Re: 4.11.0 RC1 panic
2018-06-12 16:00 ` Manuel Bouyer
@ 2018-06-12 16:29 ` Andrew Cooper
0 siblings, 0 replies; 43+ messages in thread
From: Andrew Cooper @ 2018-06-12 16:29 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel, Jan Beulich
On 12/06/18 17:00, Manuel Bouyer wrote:
> On Tue, Jun 12, 2018 at 04:54:30PM +0100, Andrew Cooper wrote:
>> On 12/06/18 16:38, Manuel Bouyer wrote:
>>> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
>>>> I applied this patch to 4.11rc4 (let's not change too much things at the
>>>> same time) and rebooted my test host. Hopefully I'll have some data to report
>>>> soon
>>> Got the first panic (still from a i386 domU):
>>> login: (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>> (XEN) d4v2 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf077f78, dr6 ffff0ff0
>> I presume you're running a XSA-263 (MovSS) exploit in testing?
> Not intentionally, these are the NetBSD test suite and I don't think any
> specifically targets this (there are 759 tests at this time).
> But these includes network tests, so there is probably in kernel bpf code tests.
This specific message can only be triggered (so far as we know) by a
MovSS-deferred #DB, in this case over an `into` instruction.
If this isn't a dedicated test, then whatever you've got in your test
suite came dangerously close to discovering the MovSS issue.
Anyway - it was more of an observation than anything else, to point out
that it isn't liable to be related to the assertion failure.
~Andrew
* Re: 4.11.0 RC1 panic
2018-06-12 15:38 ` Manuel Bouyer
2018-06-12 15:54 ` Andrew Cooper
@ 2018-06-12 20:55 ` Manuel Bouyer
[not found] ` <5B1FB0E5020000F903B0F0C2@prv1-mh.provo.novell.com>
1 sibling, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-12 20:55 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 796 bytes --]
On Tue, Jun 12, 2018 at 05:38:45PM +0200, Manuel Bouyer wrote:
> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
> > I applied this patch to 4.11rc4 (let's not change too much things at the
> > same time) and rebooted my test host. Hopefully I'll have some data to report
> > soon
>
> Got the first panic (still from a i386 domU):
I got another panic, possibly at domain shutdown. But it seems that not
all messages from the console made it to the serial port (some Xen internal
buffer overflow?). Attached is the trace from the serial console; the first
2 lines are from the NetBSD dom0 kernel, indicating vbds for domain 6 being
detached (probably at shutdown of dom6).
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
[-- Attachment #2: xen_console.txt --]
[-- Type: text/plain, Size: 16279 bytes --]
(XEN) xbd backend: detach device vnd1d for domain 6
xbd backend: detach device vnd0d for domain 6
(XEN) d6:1ab56f: 4000000000000000:0
(XEN) d6:1ab570: 4000000000000000:0
(XEN) d6:1ab571: 4000000000000000:0
(XEN) d6:1ab572: 4c00000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab572: 4900000000000001:0
(XEN) d6:1ab573: 4000000000000000:0
(XEN) d6:1ab57e: 4000000000000000:0
(XEN) d6:1ab1fd: 4000000000000000:0
(XEN) d6:1aae37: 4000000000000000:0
(XEN) d6:1aae59: 4000000000000000:0
(XEN) d6:1aaf06: 4c00000000000001:3
(XEN) d6:1aaf06: 4900000000000001:2
(XEN) d6:1aaf06: 4900000000000001:1
(XEN) d6:1aaf06: 4900000000000001:0
(XEN) d6:1aaf06: 4900000000000001:0
(XEN) d6:1aaf06: 4900000000000001:0
(XEN) d6:1aaf06: 4900000000000001:0
(XEN) d6:1aaf07: 4c00000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf07: 4900000000000001:0
(XEN) d6:1aaf44: 4c00000000000001:0
(XEN) d6:1aaf44: 4900000000000001:0
(XEN) d6:1aaf45: 4c00000000000001:0
(XEN) d6:1aaf8f: 4c00000000000002:-1
(XEN) d6:1aafcd: 4c00000000000002:-1
(XEN) d6:1aafce: 4c00000000000001:3
(XEN) d6:1aafce: 4900000000000001:2
(XEN) d6:1aafce: 4900000000000001:1
(XEN) d6:1aafce: 4900000000000001:0
(XEN) d6:1aac0c: 4c00000000000001:0
(XEN) d6:1aac32: 4c00000000000001:3
(XEN) d6:1aac32: 4900000000000001:2
(XEN) d6:1aac32: 4900000000000001:1
(XEN) d6:1aac32: 4900000000000001:0
(XEN) d6:1aac5f: 4c00000000000002:-1
(XEN) d6:1aac8d: 4c00000000000002:-1
(XEN) d6:1aac9a: 4c00000000000001:3
(XEN) d6:1aac9a: 4900000000000001:2
(XEN) d6:1aac9a: 4900000000000001:1
(XEN) d6:1aac9a: 4900000000000001:0
(XEN) d6:1aac9b: 4c00000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9b: 4900000000000001:0
(XEN) d6:1aac9c: 4c00000000000002:-1
(XEN) d6:1aac9e: 4c00000000000001:3
(XEN) d6:1aac9e: 4900000000000001:2
(XEN) d6:1aac9e: 4900000000000001:1
(XEN) d6:1aac9e: 4900000000000001:0
(XEN) d6:1aaccc: 4c00000000000002:-1
(XEN) d6:1aaccf: 4c00000000000002:-1
(XEN) d6:1aacd8: 4c00000000000001:0
(XEN) d6:1aacd9: 4c00000000000001:0
(XEN) d6:1aacdd: 4c00000000000001:0
(XEN) d6:1aace4: 4c00000000000002:-1
(XEN) d6:1aad0e: 4c00000000000001:3
(XEN) d6:1aad0e: 4900000000000001:2
(XEN) d6:1aad0e: 4900000000000001:1
(XEN) d6:1aad0e: 4900000000000001:0
(XEN) d6:1aad27: 4c00000000000002:-1
(XEN) d6:1aad6b: 4c00000000000002:-1
(XEN) d6:1aadaa: 4c00000000000001:3
(XEN) d6:1aadaa: 4900000000000001:2
(XEN) d6:1aadaa: 4900000000000001:1
(XEN) d6:1aadaa: 4900000000000001:0
(XEN) d6:1aaa25: 4c00000000000002:-1
(XEN) d6:1aaa28: 4c00000000000001:0
(XEN) d6:1aaa66: 4c00000000000001:3
(XEN) d6:1aaa66: 4900000000000001:2
(XEN) d6:1aaa66: 4900000000000001:1
(XEN) d6:1aaa66: 4900000000000001:0
(XEN) d6:1aaa9a: 4c00000000000001:3
(XEN) d6:1aaa9a: 4900000000000001:2
(XEN) d6:1aaa9a: 4900000000000001:1
(XEN) d6:1aaa9a: 4900000000000001:0
(XEN) d6:1aaaa9: 4c00000000000001:0
(XEN) d6:1aab47: 4c00000000000002:-1
(XEN) d6:1aa851: 4c00000000000002:-1
(XEN) d6:1aa886: 4c00000000000001:3
(XEN) d6:1aa886: 4900000000000001:2
(XEN) d6:1aa886: 4900000000000001:1
(XEN) d6:1aa886: 4900000000000001:0
(XEN) d6:1aa8c5: 4c00000000000001:0
(XEN) d6:1aa903: 4c00000000000002:-1
(XEN) d6:1aa904: 4c00000000000001:0
(XEN) d6:1aa9dc: 4c00000000000002:-1
(XEN) d6:1aa601: 4c00000000000002:-1
(XEN) d6:1aa602: 4c00000000000001:3
(XEN) d6:1aa602: 4900000000000001:2
(XEN) d6:1aa602: 4900000000000001:1
(XEN) d6:1aa602: 4900000000000001:0
(XEN) d6:1aa622: 4c00000000000001:3
(XEN) d6:1aa622: 4900000000000001:2
(XEN) d6:1aa622: 4900000000000001:1
(XEN) d6:1aa622: 4900000000000001:0
(XEN) d6:1aa693: 4c00000000000002:-1
(XEN) d6:1aa6e3: 4c00000000000002:-1
(XEN) d6:1aa6f7: 4c00000000000002:-1
(XEN) d6:1aa717: 4c00000000000002:-1
(XEN) d6:1aa740: 4c00000000000001:0
(XEN) d6:1aa756: 4c00000000000001:3
(XEN) d6:1aa756: 4900000000000001:2
(XEN) d6:1aa756: 4900000000000001:1
(XEN) d6:1aa756: 4900000000000001:0
(XEN) d6:1aa773: 4c00000000000002:-1
(XEN) d6:1aa776: 4c00000000000001:3
(XEN) d6:1aa776: 4900000000000001:2
(XEN) d6:1aa776: 4900000000000001:1
(XEN) d6:1aa776: 4900000000000001:0
(XEN) d6:1aa795: 4c00000000000001:0
(XEN) d6:1aa7d4: 4c00000000000001:0
(XEN) d6:1aa7d9: 4c00000000000001:0
(XEN) d6:1aa7e4: 4c00000000000002:-1
(XEN) d6:1aa420: 4c00000000000001:0
(XEN) d6:1aa425: 4c00000000000002:-1
(XEN) d6:1aa432: 4c00000(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 4:
(XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
* Re: 4.11.0 RC1 panic
[not found] ` <5B20335B0200008603B30F33@prv1-mh.provo.novell.com>
@ 2018-06-13 6:23 ` Jan Beulich
2018-06-13 8:07 ` Jan Beulich
1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2018-06-13 6:23 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 12.06.18 at 22:55, <bouyer@antioche.eu.org> wrote:
> On Tue, Jun 12, 2018 at 05:38:45PM +0200, Manuel Bouyer wrote:
>> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
>> > I applied this patch to 4.11rc4 (let's not change too much things at the
>> > same time) and rebooted my test host. Hopefully I'll have some data to report
>> > soon
>>
>> Got the first panic (still from a i386 domU):
>
> I got another panic, possibly at domain shutdown. But it seems that not
> all messages from the console made it to the serial port (some Xen internal
> buffer overflow ?).
That's what my remark regarding buffer sizes was about: You need to
make sure they're large enough (via command line options). Or you
may want to run with "sync_console", assuming that at least the
shutdown variant of the issue does not get affected timing wise to a
degree where it's not reproducible anymore.
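For reference, that means appending something like the following to the Xen
(not dom0) command line in the boot configuration -- conring_size sizes the
internal console ring, and 2M here is just an example value:

    conring_size=2M sync_console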
> Attached is the trace from serial console, the first
> 2 lines are from the NetBSD dom0 kernel, indicating vbds for domain 6 being
> detached (probably at shutdown of dom6).
Yeah, sadly this is of little use with some messages dropped in the middle
(and especially very close to where the assertion actually hit).
But from both this and the earlier (non-shutdown) instance you've sent
I see that I should replace/suppress further assertions, which now
trigger and signal just the same problem that the already replaced one
did.
Jan
* Re: 4.11.0 RC1 panic
[not found] ` <5B1FF4E60200007D03B2D421@prv1-mh.provo.novell.com>
@ 2018-06-13 6:54 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2018-06-13 6:54 UTC (permalink / raw)
To: Manuel Bouyer, Andrew Cooper; +Cc: xen-devel
>>> On 12.06.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
> On 12/06/18 17:00, Manuel Bouyer wrote:
>> On Tue, Jun 12, 2018 at 04:54:30PM +0100, Andrew Cooper wrote:
>>> On 12/06/18 16:38, Manuel Bouyer wrote:
>>>> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
>>>>> I applied this patch to 4.11rc4 (let's not change too much things at the
>>>>> same time) and rebooted my test host. Hopefully I'll have some data to report
>>>>> soon
>>>> Got the first panic (still from a i386 domU):
>>>> login: (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>>> (XEN) d4v0 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf117f78, dr6 ffff0ff0
>>>> (XEN) d4v2 Hit #DB in Xen context: e008:ffff82d08036eb00 [overflow], stk 0000:ffff8301bf077f78, dr6 ffff0ff0
>>> I presume you're running a XSA-263 (MovSS) exploit in testing?
>> Not intentionally, these are the NetBSD test suite and I don't think any
>> specifically targets this (there are 759 tests at this time).
>> But these includes network tests, so there is probably in kernel bpf code tests.
>
> This specific message can only be triggered (so far as we know) by a
> MovSS-deferred #DB, in this case over an `into` instruction.
>
> If this isn't a dedicated test, then whatever you've got in your test
> suite came dangerously close to discovering the MovSS issue.
So maybe, instead of logging CS and SS in the above message, it would
be more helpful to log the guest mode RIP here, to help understand the
origin?
Jan
* Re: 4.11.0 RC1 panic
[not found] ` <5B20335B0200008603B30F33@prv1-mh.provo.novell.com>
2018-06-13 6:23 ` Jan Beulich
@ 2018-06-13 8:07 ` Jan Beulich
2018-06-13 8:57 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-06-13 8:07 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 12.06.18 at 22:55, <bouyer@antioche.eu.org> wrote:
> On Tue, Jun 12, 2018 at 05:38:45PM +0200, Manuel Bouyer wrote:
>> On Tue, Jun 12, 2018 at 01:39:05PM +0200, Manuel Bouyer wrote:
>> > I applied this patch to 4.11rc4 (let's not change too much things at the
>> > same time) and rebooted my test host. Hopefully I'll have some data to report
>> > soon
>>
>> Got the first panic (still from a i386 domU):
>
> I got another panic, possibly at domain shutdown.
Hmm, I can't identify the source of
(XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
In fact, there's no assertion with that expression anywhere I could
see. Do you have any local patches in place? In any event, to take
care of the other assertion you've hit, below is an updated debugging
patch. I hope I didn't overlook any further (cascade) ones.
Jan
--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -1872,6 +1872,7 @@ static int relinquish_memory(
while ( (page = page_list_remove_head(list)) )
{
+bool log = false;//temp
/* Grab a reference to the page so it won't disappear from under us. */
if ( unlikely(!get_page(page, d)) )
{
@@ -1880,6 +1881,10 @@ static int relinquish_memory(
continue;
}
+if(is_pv_32bit_domain(d) && PGT_type_equal(page->u.inuse.type_info, PGT_l2_page_table)) {//temp
+ printk("d%d:%"PRI_mfn": %lx:%d\n", d->domain_id, mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count);
+ log = true;
+}
if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
ret = put_page_and_type_preemptible(page);
switch ( ret )
@@ -1921,7 +1926,13 @@ static int relinquish_memory(
if ( likely(y == x) )
{
/* No need for atomic update of type_info here: noone else updates it. */
- switch ( ret = free_page_type(page, x, 1) )
+//temp switch ( ret = free_page_type(page, x, 1) )
+ret = free_page_type(page, x, 1);//temp
+if(log) {//temp
+ printk("%"PRI_mfn" -> %lx:%d (%d,%d,%d)\n", mfn_x(page_to_mfn(page)), page->u.inuse.type_info,
+ page->linear_pt_count, ret, page->nr_validated_ptes, page->partial_pte);
+}
+switch(ret)//temp
{
case 0:
break;
--- unstable.orig/xen/arch/x86/mm.c
+++ unstable/xen/arch/x86/mm.c
@@ -705,12 +705,19 @@ static bool inc_linear_entries(struct pa
return true;
}
-static void dec_linear_entries(struct page_info *pg)
+//temp static void dec_linear_entries(struct page_info *pg)
+static struct domain*dec_linear_entries(struct page_info*pg)//temp
{
typeof(pg->linear_pt_count) oc;
oc = arch_fetch_and_add(&pg->linear_pt_count, -1);
+{//temp
+ struct domain*owner = page_get_owner(pg);
+ if(oc <= 0 && is_pv_32bit_domain(owner))
+ return owner;
+}
ASSERT(oc > 0);
+return NULL;//temp
}
static bool inc_linear_uses(struct page_info *pg)
@@ -2615,11 +2622,25 @@ static int _put_final_page_type(struct p
/* No need for atomic update of type_info here: noone else updates it. */
if ( rc == 0 )
{
+struct domain*d;//temp
if ( ptpg && PGT_type_equal(type, ptpg->u.inuse.type_info) )
{
dec_linear_uses(page);
+if((d = ({//temp
dec_linear_entries(ptpg);
+})) != NULL) {//temp
+ printk("d%d: %"PRI_mfn":%lx:%d -> %"PRI_mfn":%lx:%d\n", d->domain_id,
+ mfn_x(page_to_mfn(ptpg)), ptpg->u.inuse.type_info, ptpg->linear_pt_count,
+ mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count);
+ domain_crash(d);
+}
}
+if(is_pv_32bit_domain(d = page_get_owner(page))) {//temp
+ if(page->linear_pt_count && !d->is_dying) {
+ printk("d%d:%"PRI_mfn": %lx:%d (%p)\n", d->domain_id, mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count, ptpg);
+ domain_crash(d);
+ }
+} else
ASSERT(!page->linear_pt_count || page_get_owner(page)->is_dying);
set_tlbflush_timestamp(page);
smp_wmb();
@@ -2704,8 +2725,16 @@ static int _put_page_type(struct page_in
if ( ptpg && PGT_type_equal(x, ptpg->u.inuse.type_info) )
{
+struct domain*d;//temp
dec_linear_uses(page);
+if((d = ({//temp
dec_linear_entries(ptpg);
+})) != NULL) {//temp
+ printk("d%d: %"PRI_mfn":%lx:%d => %"PRI_mfn":%lx:%d\n", d->domain_id,
+ mfn_x(page_to_mfn(ptpg)), ptpg->u.inuse.type_info, ptpg->linear_pt_count,
+ mfn_x(page_to_mfn(page)), page->u.inuse.type_info, page->linear_pt_count);
+ domain_crash(d);
+}
}
return 0;
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-06-13 8:07 ` Jan Beulich
@ 2018-06-13 8:57 ` Manuel Bouyer
[not found] ` <5B20DC910200000204A693B0@prv1-mh.provo.novell.com>
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-13 8:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 637 bytes --]
On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote:
>
> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
>
> In fact, there's no assertion with that expression anywhere I could
> see. Do you have any local patches in place?
Yes, 2 of them from you (the first one is where the assert is). See attached.
> In any event, to take
> care of the other assertion you've hit below an updated debugging
> patch. I hope I didn't overlook any further (cascade) ones.
Will rebuild with it, I'll keep you informed.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
[-- Attachment #2: patch-tmp --]
[-- Type: text/plain, Size: 1091 bytes --]
(besides a more reliable confirmation - or otherwise - that this indeed
is an issue with 32-bit guests only).
While I think I have ruled out the TLB flush time stamp still being set
too early / wrongly in certain cases, there's a small debugging patch
below that I hope can help prove this one way or the other.
Btw: You've said earlier that there wouldn't be a domain number in
the panic message. However,
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d14v3)
has it (at the end: domain 14, vCPU 3). Just in case this helps in
identifying further useful pieces of information.
Jan
--- xen/arch/x86/mm.c.orig
+++ xen/arch/x86/mm.c
@@ -578,7 +578,11 @@ static inline void set_tlbflush_timestam
*/
if ( !(page->count_info & PGC_page_table) ||
!shadow_mode_enabled(page_get_owner(page)) )
+ {
+ /* NB: This depends on WRAP_MASK in flushtlb.c to be <= 0xffff. */
+ ASSERT(!page->linear_pt_count);
page_set_tlbflush_timestamp(page);
+ }
}
const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
[-- Attachment #3: patch-tmp2 --]
[-- Type: text/plain, Size: 3483 bytes --]
$NetBSD: $
Commit df8234fd2c ("replace vCPU's dirty CPU mask by numeric ID") was
too lax in two respects: First of all it didn't consider the case of a
vCPU not having a valid dirty CPU in the descriptor table TLB flush
case. This is the issue Manuel has run into with NetBSD.
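For illustration, the descriptor table flush path in put_page_from_l1e()
effectively did (simplified, see the first mm.c hunk below):

    for_each_vcpu ( pg_owner, v )
    {
        if ( pv_destroy_ldt(v) )
            /* v->dirty_cpu may hold VCPU_CPU_CLEAN here, i.e. a value
             * which is not a valid CPU number and hence must not be
             * handed to cpumask_of(). */
            flush_tlb_mask(cpumask_of(v->dirty_cpu));
    }

The change below instead collects only valid dirty CPUs into a scratch
mask and issues at most one flush_tlb_mask() for them.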
Additionally reads of ->dirty_cpu for other than the current vCPU are at
risk of racing with scheduler actions, i.e. single atomic reads need to
be used there. Obviously the non-init write sites then better also use
atomic writes.
Having to touch the descriptor table TLB flush code here anyway, take
the opportunity and switch it to be at most one flush_tlb_mask()
invocation.
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- xen/arch/x86/domain.c.orig
+++ xen/arch/x86/domain.c
@@ -1631,7 +1631,7 @@ static void __context_switch(void)
*/
if ( pd != nd )
cpumask_set_cpu(cpu, nd->dirty_cpumask);
- n->dirty_cpu = cpu;
+ write_atomic(&n->dirty_cpu, cpu);
if ( !is_idle_domain(nd) )
{
@@ -1687,7 +1687,7 @@ static void __context_switch(void)
if ( pd != nd )
cpumask_clear_cpu(cpu, pd->dirty_cpumask);
- p->dirty_cpu = VCPU_CPU_CLEAN;
+ write_atomic(&p->dirty_cpu, VCPU_CPU_CLEAN);
per_cpu(curr_vcpu, cpu) = n;
}
--- xen/arch/x86/mm.c.orig
+++ xen/arch/x86/mm.c
@@ -1202,11 +1202,23 @@ void put_page_from_l1e(l1_pgentry_t l1e,
unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) &&
(l1e_owner == pg_owner) )
{
+ cpumask_t *mask = this_cpu(scratch_cpumask);
+
+ cpumask_clear(mask);
+
for_each_vcpu ( pg_owner, v )
{
- if ( pv_destroy_ldt(v) )
- flush_tlb_mask(cpumask_of(v->dirty_cpu));
+ unsigned int cpu;
+
+ if ( !pv_destroy_ldt(v) )
+ continue;
+ cpu = read_atomic(&v->dirty_cpu);
+ if ( is_vcpu_dirty_cpu(cpu) )
+ __cpumask_set_cpu(cpu, mask);
}
+
+ if ( !cpumask_empty(mask) )
+ flush_tlb_mask(mask);
}
put_page(page);
}
@@ -2979,13 +2991,18 @@ static inline int vcpumask_to_pcpumask(
while ( vmask )
{
+ unsigned int cpu;
+
vcpu_id = find_first_set_bit(vmask);
vmask &= ~(1UL << vcpu_id);
vcpu_id += vcpu_bias;
if ( (vcpu_id >= d->max_vcpus) )
return 0;
- if ( ((v = d->vcpu[vcpu_id]) != NULL) && vcpu_cpu_dirty(v) )
- __cpumask_set_cpu(v->dirty_cpu, pmask);
+ if ( (v = d->vcpu[vcpu_id]) == NULL )
+ continue;
+ cpu = read_atomic(&v->dirty_cpu);
+ if ( is_vcpu_dirty_cpu(cpu) )
+ __cpumask_set_cpu(cpu, pmask);
}
}
}
--- xen/include/xen/sched.h.orig
+++ xen/include/xen/sched.h
@@ -795,10 +795,15 @@ static inline int vcpu_runnable(struct v
atomic_read(&v->domain->pause_count));
}
-static inline bool vcpu_cpu_dirty(const struct vcpu *v)
+static inline bool is_vcpu_dirty_cpu(unsigned int cpu)
{
BUILD_BUG_ON(NR_CPUS >= VCPU_CPU_CLEAN);
- return v->dirty_cpu != VCPU_CPU_CLEAN;
+ return cpu != VCPU_CPU_CLEAN;
+}
+
+static inline bool vcpu_cpu_dirty(const struct vcpu *v)
+{
+ return is_vcpu_dirty_cpu(v->dirty_cpu);
}
void vcpu_block(void);
[-- Attachment #4: Type: text/plain, Size: 157 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
[not found] ` <5B20DC910200000204A693B0@prv1-mh.provo.novell.com>
@ 2018-06-13 9:59 ` Jan Beulich
2018-06-13 10:07 ` Manuel Bouyer
2018-06-13 22:16 ` Manuel Bouyer
0 siblings, 2 replies; 43+ messages in thread
From: Jan Beulich @ 2018-06-13 9:59 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 13.06.18 at 10:57, <bouyer@antioche.eu.org> wrote:
> On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote:
>>
>> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
>>
>> In fact, there's no assertion with that expression anywhere I could
>> see. Do you have any local patches in place?
>
> Yes, 2 of them from you (the first one is where the assert is). See
> attached.
Oh, I had long dropped that first one, after you had said that it didn't
trigger in a long time. It triggering with the other debugging patch is
not unexpected. So please drop that patch at least for the time being.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-06-13 9:59 ` Jan Beulich
@ 2018-06-13 10:07 ` Manuel Bouyer
2018-06-13 22:16 ` Manuel Bouyer
1 sibling, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-13 10:07 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Wed, Jun 13, 2018 at 03:59:19AM -0600, Jan Beulich wrote:
> >>> On 13.06.18 at 10:57, <bouyer@antioche.eu.org> wrote:
> > On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote:
> >>
> >> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
> >>
> >> In fact, there's no assertion with that expression anywhere I could
> >> see. Do you have any local patches in place?
> >
> > Yes, 2 of them from you (the first one is where the assert is). See
> > attached.
>
> Oh, I had long dropped that first one, after you had said that it didn't
> trigger in a long time. It triggering with the other debugging patch is
> not unexpected. So please drop that patch at least for the time being.
OK, rebuilding Xen right now
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-06-13 9:59 ` Jan Beulich
2018-06-13 10:07 ` Manuel Bouyer
@ 2018-06-13 22:16 ` Manuel Bouyer
[not found] ` <5B2197D3020000F004A75E9B@prv1-mh.provo.novell.com>
1 sibling, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-13 22:16 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]
On Wed, Jun 13, 2018 at 03:59:19AM -0600, Jan Beulich wrote:
> >>> On 13.06.18 at 10:57, <bouyer@antioche.eu.org> wrote:
> > On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote:
> >>
> >> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
> >>
> >> In fact, there's no assertion with that expression anywhere I could
> >> see. Do you have any local patches in place?
> >
> > Yes, 2 of them from you (the first one is where the assert is). See
> > attached.
>
> Oh, I had long dropped that first one, after you had said that it didn't
> trigger in a long time. It triggering with the other debugging patch is
> not unexpected. So please drop that patch at least for the time being.
So far I've not been able to make Xen panic with the new xen kernel.
Attached is a log of the serial console, in case you notice something.
I'll keep anita tests running in a loop overnight, in case it ends up
hitting an assert.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
[-- Attachment #2: xen_console.txt.gz --]
[-- Type: application/x-gunzip, Size: 23134 bytes --]
[-- Attachment #3: Type: text/plain, Size: 157 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
[not found] ` <5B2197D3020000F004A75E9B@prv1-mh.provo.novell.com>
@ 2018-06-14 14:33 ` Jan Beulich
2018-06-15 15:01 ` Manuel Bouyer
2018-06-25 8:33 ` Manuel Bouyer
0 siblings, 2 replies; 43+ messages in thread
From: Jan Beulich @ 2018-06-14 14:33 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 14.06.18 at 00:16, <bouyer@antioche.eu.org> wrote:
> On Wed, Jun 13, 2018 at 03:59:19AM -0600, Jan Beulich wrote:
>> >>> On 13.06.18 at 10:57, <bouyer@antioche.eu.org> wrote:
>> > On Wed, Jun 13, 2018 at 02:07:29AM -0600, Jan Beulich wrote:
>> >>
>> >> (XEN) Assertion '!page->linear_pt_count' failed at mm.c:596
>> >>
>> >> In fact, there's no assertion with that expression anywhere I could
>> >> see. Do you have any local patches in place?
>> >
>> > Yes, 2 of them from you (the first one is where the assert is). See
>> > attached.
>>
>> Oh, I had long dropped that first one, after you had said that it didn't
>> trigger in a long time. It triggering with the other debugging patch is
>> not unexpected. So please drop that patch at least for the time being.
>
> So far I've not been able to make Xen panic with the new xen kernel.
> Attached is a log of the serial console, in case you notice something.
None of the printk()s replacing ASSERT()s have triggered, so nothing
interesting to learn from the log, unfortunately.
> I'll keep anita tests running in a loop overnight, in case it ends up
> hitting an assert.
Thanks.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-06-14 14:33 ` Jan Beulich
@ 2018-06-15 15:01 ` Manuel Bouyer
2018-06-25 8:33 ` Manuel Bouyer
1 sibling, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-15 15:01 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Thu, Jun 14, 2018 at 08:33:17AM -0600, Jan Beulich wrote:
> > So far I've not been able to make Xen panic with the new xen kernel.
> > Attached is a log of the serial console, in case you notice something.
>
> None of the printk()s replacing ASSERT()s have triggered, so nothing
> interesting to learn from the log, unfortunately.
>
> > I'll keep anita tests running in a loop overnight, in case it ends up
> > hitting an assert.
>
> Thanks.
Still nothing in the console logs. Maybe that's because the sync_console
option prevents the race condition from happening. I'm trying again with
a large console ring instead.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-06-14 14:33 ` Jan Beulich
2018-06-15 15:01 ` Manuel Bouyer
@ 2018-06-25 8:33 ` Manuel Bouyer
[not found] ` <5B30A8DD0200007003BD270D@prv1-mh.provo.novell.com>
1 sibling, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-06-25 8:33 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 994 bytes --]
On Thu, Jun 14, 2018 at 08:33:17AM -0600, Jan Beulich wrote:
> > So far I've not been able to make Xen panic with the new xen kernel.
> > Attached is a log of the serial console, in case you notice something.
>
> None of the printk()s replacing ASSERT()s have triggered, so nothing
> interesting to learn from the log, unfortunately.
>
> > I'll keep anita tests running in a loop overnight, in case it ends up
> > hitting an assert.
Hello,
the dom0 has been running for a week now, running the daily NetBSD tests.
Attached is the console log.
I didn't notice anything suspect, except a few domU crashes (crashing in
Xen, the problem is not reported back to the domU). But as this is
running NetBSD-HEAD tests it can also be a bug in the domU, that has
been fixed since then.
It's possible that the printk changed timings in a way that prevents the
race condition from happening ...
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
[-- Attachment #2: xen1.gz --]
[-- Type: application/x-gunzip, Size: 17816 bytes --]
[-- Attachment #3: Type: text/plain, Size: 157 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
[not found] ` <5B30A8DD0200007003BD270D@prv1-mh.provo.novell.com>
@ 2018-06-26 15:11 ` Jan Beulich
2018-07-03 15:14 ` Jan Beulich
1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2018-06-26 15:11 UTC (permalink / raw)
To: Manuel Bouyer, Andrew Cooper, George Dunlap; +Cc: xen-devel
>>> On 25.06.18 at 10:33, <bouyer@antioche.eu.org> wrote:
> On Thu, Jun 14, 2018 at 08:33:17AM -0600, Jan Beulich wrote:
>> > So far I've not been able to make Xen panic with the new xen kernel.
>> > Attached is a log of the serial console, in case you notice something.
>>
>> None of the printk()s replacing ASSERT()s have triggered, so nothing
>> interesting to learn from the log, unfortunately.
>>
>> > I'll keep anita tests running in a loop overnight, in case it ends up
>> > hitting an assert.
>
> Hello,
> the dom0 has been running for a week now, running the daily NetBSD tests.
> Attached is the console log.
> I didn't notice anything suspect, except a few domU crashes (crashing in
> Xen, the problem is not reported back to the domU). But as this is
> running NetBSD-HEAD tests it can also be a bug in the domU, that has
> been fixed since then.
>
> It's possible that the printk changed timings in a way that prevents the
> race condition from happening ...
It may have made it less likely, but there is at least one instance in the
log (around line 6830). Sadly, this follows a set of dropped messages
(which may have been sufficient to make the race trigger again). That
is - we know nothing about d32 ahead of the crash, which is not helpful
at all. The only interesting aspect is that this appears to trigger for two
slots in a row. To me this makes it less likely again for there to be a
race in updating the counter, and more likely for the counter (living in a
union, as you may recall) to be overwritten by other code.
There's another similar instance around line 14480. The 3rd instance
(around line 13580) is a little different, in that there's no direct sign of
dropped messages, but then again there are also no useful messages
for d63 immediately ahead of the crash.
What is clear is that the referenced page always has a correct count
associated (it's always printed as zero, meaning it was incremented
from -1 just before the crash).
I now wonder whether the set_tlbflush_timestamp() invocation from
_put_page_type() is still too aggressive. In commit 2c458dfcb5 we've
reduced the invocations just as much as was deemed necessary then,
and the description explicitly says "for now". I see two options for
refining the conditional: One would be "if ( !ptpg )" (i.e. just drop the
other half of the || ), another would be to fully match the comment
and invoke it only for non-page-table pages (sort of the inverse of
the earlier if(), i.e. (x & PGT_type_mask) > PGT_l4_page_table).
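As a sketch (just the two candidate conditions, not a proposed patch, at
the point where the time stamp currently gets set):

    /* Option 1: only stamp when there's no containing page table. */
    if ( !ptpg )
        set_tlbflush_timestamp(page);

    /* Option 2: only stamp non-page-table pages, matching the comment. */
    if ( (x & PGT_type_mask) > PGT_l4_page_table )
        set_tlbflush_timestamp(page);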
It was done that minimal way because we were afraid of losing a
flush that indeed is necessary.
But if that were the case, and if the linear page table use in NetBSD
is not too different between 32- and 64-bit, I'd expect the same
issue to be observable with 64-bit guests. Or wait - in the 32-bit
case we can come here with ptpg being either an L2 or an L3, while in
the 64-bit case it would only ever be an L4 (unless someone artificially
set up linear tables at the L3 level). So this might explain the
difference in behavior. The only remaining issue then is that I can't
construct a scenario where we would reach that second if() for a page
table in the first place: there would need to be one with (initially)
a single type ref but both PGT_validated and PGT_partial clear.
Andrew, George, do you have any helpful thoughts here?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
[not found] ` <5B30A8DD0200007003BD270D@prv1-mh.provo.novell.com>
2018-06-26 15:11 ` Jan Beulich
@ 2018-07-03 15:14 ` Jan Beulich
2018-07-03 16:17 ` Manuel Bouyer
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-07-03 15:14 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 25.06.18 at 10:33, <bouyer@antioche.eu.org> wrote:
> the dom0 has been running for a week now, running the daily NetBSD tests.
> Attached is the console log.
> I didn't notice anything suspect, except a few domU crashes (crashing in
> Xen, the problem is not reported back to the domU). But as this is
> running NetBSD-HEAD tests it can also be a bug in the domU, that has
> been fixed since then.
>
> It's possible that the printk changed timings in a way that prevents the
> race condition from happening ...
So instead of the debugging patch, could you give the one below
a try?
Jan
x86: further avoid setting TLB flush time stamp
NetBSD's use of linear page tables in 32-bit mode exposes an issue with
us still storing TLB flush time stamps too early, corrupting the
linear_pt_count field living in the same union. Since we go that path
(for page tables) only when neither PGT_validated nor PGT_partial are
set on a page, we don't really require a flush to happen (see also the
code comment), yet we're also not concerned if one happens which isn't
needed (which might occur when we never write the time stamp).
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- unstable.orig/xen/arch/x86/mm.c
+++ unstable/xen/arch/x86/mm.c
@@ -2541,8 +2541,17 @@ static int _put_page_type(struct page_in
switch ( nx & (PGT_locked | PGT_count_mask) )
{
case 0:
- if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) &&
- likely(nx & (PGT_validated|PGT_partial)) )
+ /*
+ * set_tlbflush_timestamp() accesses the same union linear_pt_count
+ * lives in. Pages (including page table ones), however, don't need
+ * their flush time stamp set except when the last reference is
+ * dropped. For PT pages this happens in _put_final_page_type(). PT
+ * pages which don't have PGT_validated set do not require flushing,
+ * as they would never have been installed into a PT hierarchy.
+ */
+ if ( likely((nx & PGT_type_mask) > PGT_l4_page_table) )
+ set_tlbflush_timestamp(page);
+ else if ( likely(nx & (PGT_validated|PGT_partial)) )
{
int rc;
@@ -2563,19 +2572,8 @@ static int _put_page_type(struct page_in
return rc;
}
- if ( !ptpg || !PGT_type_equal(x, ptpg->u.inuse.type_info) )
- {
- /*
- * set_tlbflush_timestamp() accesses the same union
- * linear_pt_count lives in. Pages (including page table ones),
- * however, don't need their flush time stamp set except when
- * the last reference is being dropped. For page table pages
- * this happens in _put_final_page_type().
- */
- set_tlbflush_timestamp(page);
- }
- else
- BUG_ON(!IS_ENABLED(CONFIG_PV_LINEAR_PT));
+ BUG_ON(!IS_ENABLED(CONFIG_PV_LINEAR_PT) && ptpg &&
+ PGT_type_equal(x, ptpg->u.inuse.type_info));
/* fall through */
default:
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-07-03 15:14 ` Jan Beulich
@ 2018-07-03 16:17 ` Manuel Bouyer
2018-07-06 14:26 ` Manuel Bouyer
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-07-03 16:17 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Tue, Jul 03, 2018 at 09:14:30AM -0600, Jan Beulich wrote:
> >>> On 25.06.18 at 10:33, <bouyer@antioche.eu.org> wrote:
> > the dom0 has been running for a week now, running the daily NetBSD tests.
> > Attached is the console log.
> > I didn't notice anything suspect, except a few domU crashes (crashing in
> > Xen, the problem is not reported back to the domU). But as this is
> > running NetBSD-HEAD tests it can also be a bug in the domU, that has
> > been fixed since then.
> >
> > It's possible that the printk changed timings in a way that prevents the
> > race condition from happening ...
>
> So instead of the debugging patch, could you give the one below
> a try?
Sure, the test server is now running with it.
As I'm still using 4.11rc4 sources I had to adjust it a bit (the second chunk
didn't apply cleanly) but it didn't look difficult to fix it.
Now let's wait for some automated test runs to complete ...
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-07-03 16:17 ` Manuel Bouyer
@ 2018-07-06 14:26 ` Manuel Bouyer
2018-07-16 10:30 ` Manuel Bouyer
0 siblings, 1 reply; 43+ messages in thread
From: Manuel Bouyer @ 2018-07-06 14:26 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Tue, Jul 03, 2018 at 06:17:28PM +0200, Manuel Bouyer wrote:
> > So instead of the debugging patch, could you give the one below
> > a try?
>
> Sure, the test server is now running with it.
> As I'm still using 4.11rc4 sources I had to adjust it a bit (the second chunk
> didn't apply cleanly) but it didn't look difficult to fix it.
>
> Now let's wait for some automated test runs to complete ...
So far no crash, so this looks good. But there have been only 14 runs,
and a few of them did not complete due to unrelated issues, so it will need
a bit more time to be sure.
I'm about to leave for a one-week vacation; I may have network access
and keep an eye on it, but no promises. More news on Monday the 16th.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-07-06 14:26 ` Manuel Bouyer
@ 2018-07-16 10:30 ` Manuel Bouyer
0 siblings, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-07-16 10:30 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Fri, Jul 06, 2018 at 04:26:38PM +0200, Manuel Bouyer wrote:
> On Tue, Jul 03, 2018 at 06:17:28PM +0200, Manuel Bouyer wrote:
> > > So instead of the debugging patch, could you give the one below
> > > a try?
> >
> > Sure, the test server is now running with it.
> > As I'm still using 4.11rc4 sources I had to adjust it a bit (the second chunk
> > didn't apply cleanly) but it didn't look difficult to fix it.
> >
> > Now let's wait for some automated test runs to complete ...
>
> So far no crash, so this looks good. But there have been only 14 runs,
> and a few of them did not complete due to unrelated issues, so it will need
> a bit more time to be sure.
>
> I'm about to leave for a one-week vacation; I may have network access
> and keep an eye on it, but no promises. More news on Monday the 16th.
Unfortunately, there was a crash last week:
(XEN) Assertion 'oc > 0' failed at mm.c:677
(XEN) ----[ Xen-4.11-rcnb4 x86_64 debug=y Not tainted ]----
(XEN) CPU: 4
(XEN) RIP: e008:[<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d41v0)
(XEN) rax: ffffffffffff0000 rbx: 4400000000000001 rcx: 00000000001aa738
(XEN) rdx: 0400000000000000 rsi: 000000000000001c rdi: ffff82e00354ff40
(XEN) rbp: ffff82e00354e700 rsp: ffff8301bf137c08 r8: 0000000000000000
(XEN) r9: 0000000000000200 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff82e00354ff40 r13: 0000000000000000 r14: 10ffffffffffffff
(XEN) r15: 1000000000000000 cr0: 000000008005003b cr4: 00000000000026e4
(XEN) cr3: 00000001b9935000 cr2: 00000000ccf85000
(XEN) fsb: 00000000c057fe00 gsb: 0000000000000000 gss: 0000000000000000
(XEN) ds: 0011 es: 0011 fs: 0031 gs: 0011 ss: 0000 cs: e008
(XEN) Xen code around <ffff82d080284bd2> (mm.c#dec_linear_entries+0x12/0x20):
(XEN) c1 47 1e 66 85 c0 7f 02 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 41 54
(XEN) Xen stack trace from rsp=ffff8301bf137c08:
(XEN) ffff82d080289277 0000000000800063 ffff8301bf137fff 4c00000000000002
(XEN) ffff82e00354e700 ffff82e00354ff40 ffff8301b98f5000 0000000000000001
(XEN) ffff82004001f000 0200000000000000 ffff82d08028987f 00000000000001fc
(XEN) ffff82e00354ff40 ffff82d080288c59 00000000001aa7fa 0000000000000000
(XEN) ffff8301bf137fff 4400000000000001 ffff82e00354ff40 0000000000000000
(XEN) 00ffffffffffffff 10ffffffffffffff 1000000000000000 ffff82d080289240
(XEN) 0000000101000206 ffff8301bf137fff 4400000000000002 00000000001aa7fa
(XEN) ffff82e00354ff40 0000000000000000 ffff8301b98f5000 ffff820080000000
(XEN) 0000000000000000 ffff82d0802898bf ffff82d080290b3d ffff8300bff40000
(XEN) 00000001802a82d2 ffff8301b98f5000 0000000000000000 ffff8301b98f5000
(XEN) 0000000000000007 ffff8300bff40000 00007ff000000000 0000000000000000
(XEN) ffff8301b98f5000 ffff82e003519fc0 ffff82d0804b0058 ffff82d0804b0060
(XEN) 00000000001a8cfe 00000000001a8cfe 0000000100000004 00000000001aa7fa
(XEN) 00000000cd473d14 ffff820080000018 0000000000000001 00000000cd709808
(XEN) ffff82d080387b70 0000000000000001 ffff8301bf137fff ffff82d080295600
(XEN) ffff8301bf137e14 00000001ffffffff ffff820080000000 0000000000000000
(XEN) 00007ff000000000 000000048036bbb8 cd2f8000001aa7fa ffff8301bf137ef8
(XEN) ffff8300bff40000 00000000000001a0 00000000deadf00d 0000000000000004
(XEN) 00000000deadf00d ffff82d080367d1a ffff82d000007ff0 ffff82d000000000
(XEN) ffff82d000000001 ffff82d0cd7097fc ffff82d08036bbc4 ffff82d08036bbb8
(XEN) Xen call trace:
(XEN) [<ffff82d080284bd2>] mm.c#dec_linear_entries+0x12/0x20
(XEN) [<ffff82d080289277>] mm.c#_put_page_type+0x187/0x320
(XEN) [<ffff82d08028987f>] mm.c#put_page_from_l2e+0xdf/0x110
(XEN) [<ffff82d080288c59>] free_page_type+0x2f9/0x790
(XEN) [<ffff82d080289240>] mm.c#_put_page_type+0x150/0x320
(XEN) [<ffff82d0802898bf>] put_page_type_preemptible+0xf/0x10
(XEN) [<ffff82d080290b3d>] do_mmuext_op+0x73d/0x1810
(XEN) [<ffff82d080295600>] compat_mmuext_op+0x430/0x450
(XEN) [<ffff82d080367d1a>] pv_hypercall+0x3aa/0x430
(XEN) [<ffff82d08036bbc4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbb8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbc4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbb8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbc4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036bbb8>] entry_int82+0x68/0xc0
(XEN) [<ffff82d08036bbc4>] entry_int82+0x74/0xc0
(XEN) [<ffff82d08036954e>] do_entry_int82+0x1e/0x20
(XEN) [<ffff82d08036bc01>] entry_int82+0xb1/0xc0
still with an i386 guest running.
I've put the build files at ftp://asim.lip6.fr/outgoing/bouyer/xen-debug/0703/
I'm now going to rebuild with the 4.11 release.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
[not found] ` <5B4C73C30200007F03CE8FF3@prv1-mh.provo.novell.com>
@ 2018-07-16 11:02 ` Jan Beulich
2018-07-24 15:03 ` Manuel Bouyer
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2018-07-16 11:02 UTC (permalink / raw)
To: Manuel Bouyer; +Cc: xen-devel
>>> On 16.07.18 at 12:30, <bouyer@antioche.eu.org> wrote:
> On Fri, Jul 06, 2018 at 04:26:38PM +0200, Manuel Bouyer wrote:
>> On Tue, Jul 03, 2018 at 06:17:28PM +0200, Manuel Bouyer wrote:
>> > > So instead of the debugging patch, could you give the one below
>> > > a try?
>> >
>> > Sure, the test server is now running with it.
>> > As I'm still using 4.11rc4 sources I had to adjust it a bit (the second chunk
>> > didn't apply cleanly) but it didn't look difficult to fix it.
>> >
>> > Now let's wait for some automated test runs to complete ...
>>
>> So far no crash, so this looks good. But there have been only 14 runs,
>> and a few of them did not complete due to unrelated issues, so it will need
>> a bit more time to be sure.
>>
>> I'm about to leave for a one-week vacation; I may have network access
>> and keep an eye on it, but no promises. More news on Monday the 16th.
>
> Unfortunately, there was a crash last week:
Hmm, this still looks the same as before (except for the line
number). I'm afraid I'm out of ideas, at least for the moment.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: 4.11.0 RC1 panic
2018-07-16 11:02 ` Jan Beulich
@ 2018-07-24 15:03 ` Manuel Bouyer
0 siblings, 0 replies; 43+ messages in thread
From: Manuel Bouyer @ 2018-07-24 15:03 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 477 bytes --]
On Mon, Jul 16, 2018 at 05:02:01AM -0600, Jan Beulich wrote:
> > Unfortunably there has been a crash last week:
>
> Hmm, looks to be still all the same as before (except for the line
> number). I'm afraid I'm out of ideas, at least for the moment.
OK, FYI I committed Xen 4.11 packages for NetBSD, with the attached
patch. With this the hypervisor doesn't panic ...
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
[-- Attachment #2: patch-zz-bouyer --]
[-- Type: text/plain, Size: 843 bytes --]
$NetBSD: patch-zz-bouyer,v 1.1 2018/07/24 13:40:11 bouyer Exp $
Dirty hack to avoid an assert failure. This has been discussed on xen-devel
but no solution has been found so far.
The box producing http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/
is running with this patch; the printk has fired but the
hypervisor keeps running.
--- xen/arch/x86/mm.c.orig 2018-07-19 10:32:07.000000000 +0200
+++ xen/arch/x86/mm.c 2018-07-21 20:47:47.000000000 +0200
@@ -674,7 +674,12 @@
typeof(pg->linear_pt_count) oc;
oc = arch_fetch_and_add(&pg->linear_pt_count, -1);
- ASSERT(oc > 0);
+ if (oc <= 0) {
+ gdprintk(XENLOG_WARNING,
+ "mm.c:dec_linear_entries(): oc %d would fail assert\n", oc);
+ pg->linear_pt_count = 0;
+ }
+ /* ASSERT(oc > 0); */
}
static bool inc_linear_uses(struct page_info *pg)
[-- Attachment #3: Type: text/plain, Size: 157 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
end of thread, newest: ~2018-07-24 15:03 UTC
Thread overview: 43+ messages
2018-04-24 16:06 4.11.0 RC1 panic Manuel Bouyer
2018-04-25 6:58 ` Jan Beulich
2018-04-25 8:16 ` Andrew Cooper
2018-04-25 10:58 ` Manuel Bouyer
2018-04-25 10:42 ` Manuel Bouyer
2018-04-25 14:42 ` Manuel Bouyer
2018-04-25 15:28 ` Jan Beulich
2018-04-25 15:57 ` Manuel Bouyer
2018-04-30 13:31 ` Jan Beulich
2018-05-01 20:22 ` Manuel Bouyer
2018-05-15 9:30 ` Jan Beulich
2018-05-22 11:01 ` Manuel Bouyer
2018-05-22 14:46 ` Jan Beulich
2018-06-10 9:54 ` Jan Beulich
2018-06-10 10:57 ` Manuel Bouyer
2018-06-10 15:38 ` Jan Beulich
2018-06-10 16:32 ` Manuel Bouyer
[not found] ` <5B1D528F020000C104A2FF49@prv1-mh.provo.novell.com>
2018-06-11 9:58 ` Jan Beulich
2018-06-11 10:13 ` Manuel Bouyer
2018-06-12 7:57 ` Jan Beulich
2018-06-12 11:39 ` Manuel Bouyer
2018-06-12 15:38 ` Manuel Bouyer
2018-06-12 15:54 ` Andrew Cooper
2018-06-12 16:00 ` Manuel Bouyer
2018-06-12 16:29 ` Andrew Cooper
2018-06-12 20:55 ` Manuel Bouyer
[not found] ` <5B1FB0E5020000F903B0F0C2@prv1-mh.provo.novell.com>
[not found] ` <5B1FE90D020000F804A5F2B5@prv1-mh.provo.novell.com>
[not found] ` <5B20335B0200008603B30F33@prv1-mh.provo.novell.com>
2018-06-13 6:23 ` Jan Beulich
2018-06-13 8:07 ` Jan Beulich
2018-06-13 8:57 ` Manuel Bouyer
[not found] ` <5B20DC910200000204A693B0@prv1-mh.provo.novell.com>
2018-06-13 9:59 ` Jan Beulich
2018-06-13 10:07 ` Manuel Bouyer
2018-06-13 22:16 ` Manuel Bouyer
[not found] ` <5B2197D3020000F004A75E9B@prv1-mh.provo.novell.com>
2018-06-14 14:33 ` Jan Beulich
2018-06-15 15:01 ` Manuel Bouyer
2018-06-25 8:33 ` Manuel Bouyer
[not found] ` <5B30A8DD0200007003BD270D@prv1-mh.provo.novell.com>
2018-06-26 15:11 ` Jan Beulich
2018-07-03 15:14 ` Jan Beulich
2018-07-03 16:17 ` Manuel Bouyer
2018-07-06 14:26 ` Manuel Bouyer
2018-07-16 10:30 ` Manuel Bouyer
[not found] ` <5B1FECBE0200005C04A5F773@prv1-mh.provo.novell.com>
[not found] ` <5B1FEE270200008403B2CB9D@prv1-mh.provo.novell.com>
[not found] ` <5B1FF4E60200007D03B2D421@prv1-mh.provo.novell.com>
2018-06-13 6:54 ` Jan Beulich
[not found] ` <5B3BA1A6020000F004B7A140@prv1-mh.provo.novell.com>
[not found] ` <5B3F7C2F020000D004BA0458@prv1-mh.provo.novell.com>
[not found] ` <5B4C73C30200007F03CE8FF3@prv1-mh.provo.novell.com>
2018-07-16 11:02 ` Jan Beulich
2018-07-24 15:03 ` Manuel Bouyer