* kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-07  7:22 UTC
To: xen-devel, Konrad Rzeszutek Wilk, Feng Jin

Hi maintainers,

We are seeing a slow UEK2 boot issue on our OVM product (OVM 3.0.3 and
OVM 3.1.1).

The system is an Exalogic node with 24 cores + 100G memory (2 sockets,
6 cores per socket, 2 HT threads per core). After booting this node
with all cores enabled, we boot a PVHVM guest with 12 (or 24) vCPUs +
90GB + a PCI passthrough device, and it takes 30+ minutes to boot.

If we remove the passthrough device from vm.cfg, bootup takes about
2 minutes. If we use a small memory size (e.g. 10G + 24 vCPUs), bootup
takes about 3 minutes. So large memory plus a passthrough device is the
worst case.

If we boot this node with HT disabled in the BIOS, only 12 cores are
available. OVM on the same node, with the same config of 12 vCPUs +
90GB, boots in 1.5 minutes!

After some debugging, we found it is the kernel's MTRR init that causes
this delay:

mtrr_aps_init()
 \-> set_mtrr()
      \-> mtrr_work_handler()

The kernel spins in mtrr_work_handler().

But we don't know what is going on inside the hypervisor, or why large
memory plus a passthrough device is the worst case. Is this already
fixed in Xen upstream? Any comments are welcome; I'll upload whatever
data you need.

thanks
zduan
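[For context: in kernels of this era (up to 3.0), set_mtrr() rendezvoused
all CPUs and had each of them busy-wait on atomic gates while the MTRRs
were reprogrammed. A simplified sketch of the shape of that handler,
loosely modelled on arch/x86/kernel/cpu/mtrr/main.c of that period;
field names and details here are approximate, not the exact upstream
source:]

struct set_mtrr_data {
        atomic_t        count;     /* CPUs still to check in */
        atomic_t        gate;      /* master opens this to release everyone */
        unsigned long   smp_base;
        unsigned long   smp_size;
        unsigned int    smp_reg;   /* ~0U means "reprogram all registers" */
        mtrr_type       smp_type;
};

/* Runs on every CPU: busy-wait until the initiating CPU opens the gate,
 * apply the MTRR update (with caches disabled), then busy-wait again
 * until the gate is closed.  These cpu_relax() loops are where the boot
 * appears to hang. */
static int mtrr_work_handler(void *info)
{
        struct set_mtrr_data *data = info;

        atomic_dec(&data->count);
        while (!atomic_read(&data->gate))
                cpu_relax();

        if (data->smp_reg != ~0U)
                mtrr_if->set(data->smp_reg, data->smp_base,
                             data->smp_size, data->smp_type);
        else if (mtrr_aps_delayed_init)
                mtrr_if->set_all();     /* -> generic_set_all() */

        atomic_dec(&data->count);
        while (atomic_read(&data->gate))
                cpu_relax();

        return 0;
}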
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-07  8:37 UTC
To: zhenzhong.duan; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

>>> On 07.08.12 at 09:22, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> After some debugging, we found it is the kernel's MTRR init that causes
> this delay:
>
> mtrr_aps_init()
>  \-> set_mtrr()
>       \-> mtrr_work_handler()
>
> The kernel spins in mtrr_work_handler().
>
> But we don't know what is going on inside the hypervisor, or why large
> memory plus a passthrough device is the worst case. Is this already
> fixed in Xen upstream?

First of all it would have been useful to indicate the kernel version,
since mtrr_work_handler() disappeared after 3.0. Obviously worth
checking whether that change by itself already addresses your problem.

Next, if you already spotted where the spinning occurs, you should also
be able to tell what's going on at the other side, i.e. why the event
that is being waited for isn't occurring for this long a time. Since
there's a number of open coded spin loops here, knowing exactly which
one each CPU is sitting in (and which ones might not be in any) is the
fundamental information needed.

From what you're telling us so far, I'd rather suspect a kernel
problem, not a hypervisor one, here.

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-08  9:48 UTC
To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

On 2012-08-07 16:37, Jan Beulich wrote:
> First of all it would have been useful to indicate the kernel version,
> since mtrr_work_handler() disappeared after 3.0. Obviously worth
> checking whether that change by itself already addresses your problem.
No luck. I tried upstream kernel 3.6.0-rc1, and it seems worse: it took
2 hours to boot up.

> Next, if you already spotted where the spinning occurs, you should also
> be able to tell what's going on at the other side, i.e. why the event
> that is being waited for isn't occurring for this long a time. Since
> there's a number of open coded spin loops here, knowing exactly which
> one each CPU is sitting in (and which ones might not be in any) is the
> fundamental information needed.
>
> From what you're telling us so far, I'd rather suspect a kernel
> problem, not a hypervisor one, here.
From what I found, most vCPUs spin on set_atomicity_lock. Some spin in
stop_machine after finishing their job. Only one vCPU is calling
generic_set_all().
I wonder whether the vCPU calling generic_set_all() lacks priority and
gets preempted frequently by the other vCPUs and dom0. That would waste
a lot of time.
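[In 3.x kernels the rendezvous indeed moved into stop_machine(): every
CPU runs a handler of roughly the shape below, and the full
reprogramming path, generic_set_all(), serializes on set_atomicity_lock,
so the CPUs go through the cache-off/cache-on sequence strictly one at
a time. A sketch modelled on arch/x86/kernel/cpu/mtrr/main.c;
simplified, and details vary by version:]

/* stop_machine() runs this on every CPU taking part in the rendezvous. */
static int mtrr_rendezvous_handler(void *info)
{
        struct set_mtrr_data *data = info;

        if (data->smp_reg != ~0U) {
                /* Update of a single MTRR register. */
                mtrr_if->set(data->smp_reg, data->smp_base,
                             data->smp_size, data->smp_type);
        } else if (mtrr_aps_delayed_init ||
                   !cpu_online(smp_processor_id())) {
                /* Full reprogramming during AP bring-up: this is
                 * generic_set_all(), which takes set_atomicity_lock,
                 * so each CPU's cache-disable/re-enable bracket runs
                 * serialized against all the others. */
                mtrr_if->set_all();
        }
        return 0;
}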
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-08 14:47 UTC
To: zhenzhong.duan; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

>>> On 08.08.12 at 11:48, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
>> First of all it would have been useful to indicate the kernel version,
>> since mtrr_work_handler() disappeared after 3.0. Obviously worth
>> checking whether that change by itself already addresses your problem.
> No luck. I tried upstream kernel 3.6.0-rc1, and it seems worse: it took
> 2 hours to boot up.

That's quite a big step from 3.0.x. And in another response you point
out that 3.6 is way worse than 3.5 was. So maybe going back to 3.1 or
3.2 might be a better idea if debugging the issue doesn't get you
anywhere.

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-08 15:01 UTC
To: zhenzhong.duan; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

>>> On 08.08.12 at 11:48, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> From what I found, most vCPUs spin on set_atomicity_lock.

Then you need to determine what the current owner of the lock is doing.

> Some spin in stop_machine after finishing their job.

And here you'd need to find out what they're waiting for, and what
those CPUs are doing.

> Only one vCPU is calling generic_set_all().
> I wonder whether the vCPU calling generic_set_all() lacks priority and
> gets preempted frequently by the other vCPUs and dom0. That would waste
> a lot of time.

There's not that much being done in generic_set_all(), so the code
should finish reasonably quickly. Are you perhaps having more vCPU-s in
the guest than pCPU-s they can run on? Does your hardware support
Pause-Loop-Exiting (or the AMD equivalent, I don't recall their term
right now)?

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-09  9:42 UTC
To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

On 2012-08-08 23:01, Jan Beulich wrote:
>> From what I found, most vCPUs spin on set_atomicity_lock.
> Then you need to determine what the current owner of the lock is doing.
I added printk.time=1 to the kernel cmdline, but dmesg doesn't show
much that helps:

[    1.978706] smpboot: Total of 24 processors activated (140449.34 BogoMIPS)
(blocks ~30 mins here)
[    1.988859] devtmpfs: initialized

>> Some spin in stop_machine after finishing their job.
> And here you'd need to find out what they're waiting for, and what
> those CPUs are doing.
They are waiting for the vCPU that is calling generic_set_all() and for
those spinning on set_atomicity_lock. In fact, everything is waiting on
generic_set_all().

>> Only one vCPU is calling generic_set_all().
>> I wonder whether the vCPU calling generic_set_all() lacks priority and
>> gets preempted frequently by the other vCPUs and dom0. That would waste
>> a lot of time.
> There's not that much being done in generic_set_all(), so the code
> should finish reasonably quickly. Are you perhaps having more vCPU-s in
> the guest than pCPU-s they can run on?
The system is an Exalogic node with 24 cores + 100G memory (2 sockets,
6 cores per socket, 2 HT threads per core). We boot a PVHVM guest with
12 (or 24) vCPUs + 90GB + a PCI passthrough device.

> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent,
> I don't recall their term right now)?
I have no access to the serial line; can I get that info with a
command? /proc/cpuinfo shows:

cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-09 10:35 UTC
To: zhenzhong.duan; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

>>> On 09.08.12 at 11:42, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
>> And here you'd need to find out what they're waiting for, and what
>> those CPUs are doing.
> They are waiting for the vCPU that is calling generic_set_all() and for
> those spinning on set_atomicity_lock. In fact, everything is waiting on
> generic_set_all().

I think we're moving in circles - what is the vCPU currently in
generic_set_all() then doing?

>> There's not that much being done in generic_set_all(), so the code
>> should finish reasonably quickly. Are you perhaps having more vCPU-s in
>> the guest than pCPU-s they can run on?
> The system is an Exalogic node with 24 cores + 100G memory (2 sockets,
> 6 cores per socket, 2 HT threads per core). We boot a PVHVM guest with
> 12 (or 24) vCPUs + 90GB + a PCI passthrough device.

So you're indeed over-committing the system. How many vCPU-s does your
Dom0 have? Are there any other VMs? Is there any vCPU pinning in
effect?

>> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent,
>> I don't recall their term right now)?
> I have no access to the serial line; can I get that info with a
> command?

"xl dmesg", run early enough (i.e. before the log buffer wraps).

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-10  4:40 UTC
To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

On 2012-08-09 18:35, Jan Beulich wrote:
>> They are waiting for the vCPU that is calling generic_set_all() and for
>> those spinning on set_atomicity_lock. In fact, everything is waiting on
>> generic_set_all().
> I think we're moving in circles - what is the vCPU currently in
> generic_set_all() then doing?
I added some debug prints: generic_set_all() -> prepare_set() ->
write_cr0() takes most of the time; everything else is quick.
set_atomicity_lock serializes this process between CPUs, making it
worse.

One iteration:

MTRR: CPU 2
prepare_set: before read_cr0
prepare_set: before write_cr0    ------ *blocks here*
prepare_set: before wbinvd
prepare_set: before read_cr4
prepare_set: before write_cr4
prepare_set: before __flush_tlb
prepare_set: before rdmsr
prepare_set: before wrmsr
generic_set_all: before set_mtrr_state
generic_set_all: before pat_init
post_set: before wbinvd
post_set: before wrmsr
post_set: before write_cr0
post_set: before write_cr4

>> There's not that much being done in generic_set_all(), so the code
>> should finish reasonably quickly. Are you perhaps having more vCPU-s in
>> the guest than pCPU-s they can run on?
> So you're indeed over-committing the system. How many vCPU-s does your
> Dom0 have? Are there any other VMs? Is there any vCPU pinning in
> effect?
dom0 boots with 24 vCPUs (same result with dom0_max_vcpus=4). There are
no other VMs except dom0. Per the xentop output, all 24 guest vCPUs are
spinning. Below is a xentop clip.

      NAME  STATE  CPU(sec) CPU(%)    MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
  Domain-0 -----r     43072  158.8   2050560    2.0  no limit       n/a    24    0        0        0    0      0      0      0         0         0    0
VCPUs(sec):  0: 13649s  1:  6197s  2:  4254s  3:  2006s  4:  1409s  5:   930s
             6:   698s  7:   630s  8:   612s  9:  2038s 10:   544s 11:   940s
            12:   556s 13:   510s 14:   456s 15:   591s 16:   438s 17:   508s
            18:  3350s 19:   512s 20:   544s 21:   529s 22:   547s 23:   610s
zduan_test -----r     13140 2234.4  92327920   91.7  92327936      91.7    24    1        0        0    1      0      0      0         0         0    0
VCPUs(sec):  0:   556s  1:   551s  2:   549s  3:   544s  4:   549s  5:   545s
             6:   545s  7:   547s  8:   545s  9:   548s 10:   545s 11:   546s
            12:   545s 13:   548s 14:   543s 15:   544s 16:   551s 17:   545s
            18:   547s 19:   551s 20:   544s 21:   549s 22:   546s 23:   545s

>>> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent,
>>> I don't recall their term right now)?
>> I have no access to the serial line; can I get that info with a
>> command?
> "xl dmesg", run early enough (i.e. before the log buffer wraps).
Below is the xl dmesg output for your reference.
thanks

[root@scae02cn01 zduan]# xl dmesg
(XEN) Xen version 4.0.2-OVM (mockbuild@(none)) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) Fri Dec 23 17:00:16 EST 2011
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=2G
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 1 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000099400 (usable)
(XEN)  0000000000099400 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007f780000 (usable)
(XEN)  000000007f78e000 - 000000007f790000 type 9
(XEN)  000000007f790000 - 000000007f79e000 (ACPI data)
(XEN)  000000007f79e000 - 000000007f7d0000 (ACPI NVS)
(XEN)  000000007f7d0000 - 000000007f7e0000 (reserved)
(XEN)  000000007f7ec000 - 0000000080000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ffc00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000001880000000 (usable)
(XEN) ACPI: RSDP 000FAA40, 0024 (r2 SUN)
(XEN) ACPI: XSDT 7F790100, 0094 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: FACP 7F790290, 00F4 (r4 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: DSDT 7F7905C0, 5ECF (r2 SUN Xxx70 1 INTL 20051117)
(XEN) ACPI: FACS 7F79E000, 0040
(XEN) ACPI: APIC 7F790390, 011E (r2 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: MCFG 7F790500, 003C (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: SLIT 7F790540, 0030 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: SPMI 7F790570, 0041 (r5 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: OEMB 7F79E040, 00BE (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: HPET 7F79A5C0, 0038 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: DMAR 7F79E100, 0130 (r1 SUN Xxx70 1 MSFT 97)
(XEN) ACPI: SRAT 7F79A600, 0250 (r1 SUN Xxx70 1 INTC 1)
(XEN) ACPI: SSDT 7F79EF60, 0363 (r1 SUN Xxx70 12 INTL 20051117)
(XEN) ACPI: EINJ 7F79A850, 0130 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: BERT 7F79A9E0, 0030 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: ERST 7F79AA10, 01B0 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) ACPI: HEST 7F79ABC0, 00A8 (r1 SUN Xxx70 20111011 MSFT 97)
(XEN) System RAM: 98295MB (100654180kB)
(XEN) Domain heap initialised DMA width 32 bits
(XEN) Processor #0 6:12 APIC version 21
(XEN) Processor #2 6:12 APIC version 21
(XEN) Processor #4 6:12 APIC version 21
(XEN) Processor #16 6:12 APIC version 21
(XEN) Processor #18 6:12 APIC version 21
(XEN) Processor #20 6:12 APIC version 21
(XEN) Processor #32 6:12 APIC version 21
(XEN) Processor #34 6:12 APIC version 21
(XEN) Processor #36 6:12 APIC version 21
(XEN) Processor #48 6:12 APIC version 21
(XEN) Processor #50 6:12 APIC version 21
(XEN) Processor #52 6:12 APIC version 21
(XEN) Processor #1 6:12 APIC version 21
(XEN) Processor #3 6:12 APIC version 21
(XEN) Processor #5 6:12 APIC version 21
(XEN) Processor #17 6:12 APIC version 21
(XEN) Processor #19 6:12 APIC version 21
(XEN) Processor #21 6:12 APIC version 21
(XEN) Processor #33 6:12 APIC version 21
(XEN) Processor #35 6:12 APIC version 21
(XEN) Processor #37 6:12 APIC version 21
(XEN) Processor #49 6:12 APIC version 21
(XEN) Processor #51 6:12 APIC version 21
(XEN) Processor #53 6:12 APIC version 21
(XEN) IOAPIC[0]: apic_id 6, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 7, version 32, address 0xfec8a000, GSI 24-47
(XEN) Enabling APIC mode: Phys. Using 2 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2926.029 MHz processor.
(XEN) Initing memory sharing.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) EPT supports 2MB super page.
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging detected.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Total of 24 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) TSC is reliable, synchronization unnecessary
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 64 KiB.
(XEN) Brought up 24 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, lsb, paddr 0x2000 -> 0x6d5000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000835000000->0000000836000000 (520192 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff80002000->ffffffff806d5000
(XEN)  Init. ramdisk: ffffffff806d5000->ffffffff80ed7400
(XEN)  Phys-Mach map: ffffea0000000000->ffffea0000400000
(XEN)  Start info:    ffffffff80ed8000->ffffffff80ed84b4
(XEN)  Page tables:   ffffffff80ed9000->ffffffff80ee4000
(XEN)  Boot stack:    ffffffff80ee4000->ffffffff80ee5000
(XEN)  TOTAL:         ffffffff80000000->ffffffff81000000
(XEN)  ENTRY ADDRESS: ffffffff80002000
(XEN) Dom0 has maximum 24 VCPUs
(XEN) Scrubbing Free RAM: ....................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 168kB init memory.
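[For reference, the debug trace above corresponds to the cache-off /
cache-on bracket in the kernel's MTRR code, arch/x86/kernel/cpu/mtrr/
generic.c. A simplified sketch of its shape follows; the variable
handling is abbreviated and this is not the exact source of any one
kernel version:]

static unsigned long cr4;
static u32 deftype_lo, deftype_hi;
static DEFINE_RAW_SPINLOCK(set_atomicity_lock);

static void prepare_set(void)
{
        unsigned long cr0;

        /* The lock the other vCPUs were seen spinning on: the whole
         * sequence runs on one CPU at a time. */
        raw_spin_lock(&set_atomicity_lock);

        /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches.
         * Under EPT this CR0 write traps to Xen, which is where the
         * time goes. */
        cr0 = read_cr0() | X86_CR0_CD;
        write_cr0(cr0);
        wbinvd();

        if (cpu_has_pge) {                      /* temporarily clear PGE */
                cr4 = read_cr4();
                write_cr4(cr4 & ~X86_CR4_PGE);
        }
        __flush_tlb();                          /* flush TLBs */

        /* Save, then disable, the MTRRs via MSR_MTRRdefType. */
        rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
        mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
}

static void post_set(void)
{
        wbinvd();                               /* flush again */

        /* Re-enable the MTRRs, re-enable caches (the second expensive
         * CR0 trap), restore CR4, and let the next CPU in. */
        mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
        write_cr0(read_cr0() & ~X86_CR0_CD);
        if (cpu_has_pge)
                write_cr4(cr4);
        raw_spin_unlock(&set_atomicity_lock);
}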
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-10 14:22 UTC
To: zhenzhong.duan; +Cc: Konrad Rzeszutek Wilk, Feng Jin, xen-devel

>>> On 10.08.12 at 06:40, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> I added some debug prints: generic_set_all() -> prepare_set() ->
> write_cr0() takes most of the time; everything else is quick.
> set_atomicity_lock serializes this process between CPUs, making it
> worse.
>
> One iteration:
>
> MTRR: CPU 2
> prepare_set: before read_cr0
> prepare_set: before write_cr0    ------ *blocks here*

Yeah, that CR0 write disables the caches, and that's pretty expensive
on EPT (not sure why NPT doesn't use/need the same hook) when the guest
has any active MMIO regions: vmx_set_uc_mode(), when HAP is enabled,
calls ept_change_entry_emt_with_range(), which is a walk through the
entire guest page tables (i.e. it scales with guest size, or, to be
precise, with the highest populated GFN).

Going back to your original mail, I wonder however why this gets done
at all. You said it got there via

    mtrr_aps_init()
     \-> set_mtrr()
          \-> mtrr_work_handler()

yet this isn't done unconditionally - see the comment before the check
of mtrr_aps_delayed_init. Can you find out where the obviously
necessary call(s) to set_mtrr_aps_delayed_init() come(s) from?

> prepare_set: before wbinvd
> prepare_set: before read_cr4
> prepare_set: before write_cr4
> prepare_set: before __flush_tlb
> prepare_set: before rdmsr
> prepare_set: before wrmsr
> generic_set_all: before set_mtrr_state
> generic_set_all: before pat_init
> post_set: before wbinvd
> post_set: before wrmsr
> post_set: before write_cr0
> post_set: before write_cr4
>
>> So you're indeed over-committing the system. How many vCPU-s does your
>> Dom0 have? Are there any other VMs? Is there any vCPU pinning in
>> effect?
> dom0 boots with 24 vCPUs (same result with dom0_max_vcpus=4). There are
> no other VMs except dom0. Per the xentop output, all 24 guest vCPUs are
> spinning. Below is a xentop clip.

Yes, this way you do overcommit - 24 guest vCPU-s spinning, plus
anything Dom0 may need to do.

>>>> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent,
>>>> I don't recall their term right now)?
>>> I have no access to the serial line; can I get that info with a
>>> command?
>> "xl dmesg", run early enough (i.e. before the log buffer wraps).
> Below is the xl dmesg output for your reference. thanks
> ...
> (XEN) VMX: Supported advanced features:
> (XEN)  - APIC MMIO access virtualisation
> (XEN)  - APIC TPR shadow
> (XEN)  - Extended Page Tables (EPT)
> (XEN)  - Virtual-Processor Identifiers (VPID)
> (XEN)  - Virtual NMI
> (XEN)  - MSR direct-access bitmap
> (XEN)  - Unrestricted Guest

I'm sorry, I had expected this to be printed here, but it isn't. Hence
I can't tell for sure whether PLE is implemented there, but given how
long it has been available it ought to be when "Unrestricted Guest" is
there (which iirc got introduced much later).

Jan
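[To make the cost concrete: on every CR0.CD toggle with UC mode in
effect, the hypervisor re-derives the effective memory type (EMT) for
every populated GFN. A deliberately simplified sketch of that cost
model follows. This is illustrative only, not Xen source, and the
helpers p2m_lookup_entry(), recompute_emt() and p2m_write_entry() are
hypothetical names:]

/* Illustrative sketch (NOT actual Xen code): why the walk scales with
 * the highest populated GFN.  For a 90GB guest that is ~23.5 million
 * 4k GFNs per CR0 toggle, and the guest's MTRR init performs two
 * toggles (cache off, cache on) per vCPU, serialized behind the
 * guest's set_atomicity_lock. */
static void change_entry_emt_with_range(struct domain *d,
                                        unsigned long start_gfn,
                                        unsigned long end_gfn)
{
    unsigned long gfn;

    for ( gfn = start_gfn; gfn <= end_gfn; gfn++ )
    {
        ept_entry_t e;

        if ( !p2m_lookup_entry(d, gfn, &e) )    /* hypothetical helper */
            continue;
        e.emt = recompute_emt(d, gfn, e);       /* hypothetical helper */
        p2m_write_entry(d, gfn, e);             /* hypothetical helper */
    }
}

[With 24 vCPUs each doing this twice, strictly one after another, a
~30 minute boot is plausible even if a single walk takes well under a
minute.]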
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-13  7:58 UTC
To: Jan Beulich; +Cc: Satish Kantheti, Konrad Rzeszutek Wilk, Feng Jin, xen-devel

Cc'ing Satish, who first found this issue.

On 2012-08-10 22:22, Jan Beulich wrote:
> Yeah, that CR0 write disables the caches, and that's pretty expensive
> on EPT (not sure why NPT doesn't use/need the same hook) when the guest
> has any active MMIO regions: vmx_set_uc_mode(), when HAP is enabled,
> calls ept_change_entry_emt_with_range(), which is a walk through the
> entire guest page tables (i.e. it scales with guest size, or, to be
> precise, with the highest populated GFN).
>
> Going back to your original mail, I wonder however why this gets done
> at all. You said it got there via
>
>     mtrr_aps_init()
>      \-> set_mtrr()
>           \-> mtrr_work_handler()
>
> yet this isn't done unconditionally - see the comment before the check
> of mtrr_aps_delayed_init. Can you find out where the obviously
> necessary call(s) to set_mtrr_aps_delayed_init() come(s) from?
At the bootup stage, set_mtrr_aps_delayed_init() is called by
native_smp_prepare_cpus(). mtrr_aps_delayed_init is always set to true
for Intel processors in the upstream code.

> I'm sorry, I had expected this to be printed here, but it isn't. Hence
> I can't tell for sure whether PLE is implemented there, but given how
> long it has been available it ought to be when "Unrestricted Guest" is
> there (which iirc got introduced much later).
From the VMCS dump, it looks like PAUSE exiting is 0 and PLE is 1.

(XEN)  *** Control State ***
(XEN)  PinBased=0000003f CPUBased=b6a065fe SecondaryExec=000004eb

zduan
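[Decoding those control words against the Intel SDM's VM-execution
control layouts bears that reading out: bit 30 of the primary
processor-based controls is "PAUSE exiting", and bit 10 of the
secondary controls is "PAUSE-loop exiting". A small standalone check:]

#include <stdio.h>

int main(void)
{
        unsigned int cpu_based = 0xb6a065fe; /* CPUBased from the dump      */
        unsigned int secondary = 0x000004eb; /* SecondaryExec from the dump */

        /* Primary processor-based controls, bit 30: PAUSE exiting.    */
        printf("PAUSE exiting:      %u\n", (cpu_based >> 30) & 1); /* 0 */
        /* Secondary controls, bit 10: PAUSE-loop exiting (PLE).       */
        printf("PAUSE-loop exiting: %u\n", (secondary >> 10) & 1); /* 1 */
        return 0;
}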
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-13  9:29 UTC
To: zhenzhong.duan; +Cc: Satish Kantheti, xen-devel, Feng Jin, Konrad Rzeszutek Wilk, Stefano Stabellini

>>> On 13.08.12 at 09:58, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> At the bootup stage, set_mtrr_aps_delayed_init() is called by
> native_smp_prepare_cpus(). mtrr_aps_delayed_init is always set to true
> for Intel processors in the upstream code.

Indeed, and that (in one form or another) has been done virtually
forever in Linux. I wonder why the problem wasn't noticed (or looked
into, if it was noticed) so far.

As it's going to be rather difficult to convince the Linux folks to
change their code (plus this wouldn't help with existing kernels
anyway), we'll need to find a way to improve this in the hypervisor.

One seemingly orthogonal thing would presumably help quite a bit on the
guest side nevertheless - para-virtualized spin locks. I have no idea
why this is only being done when running pv, but not for pvhvm. Konrad,
Stefano?

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: Stefano Stabellini @ 2012-08-13 11:08 UTC
To: Jan Beulich; +Cc: Satish Kantheti, Konrad Rzeszutek Wilk, Feng Jin, zhenzhong.duan@oracle.com, xen-devel

On Mon, 13 Aug 2012, Jan Beulich wrote:
> One seemingly orthogonal thing would presumably help quite a bit on the
> guest side nevertheless - para-virtualized spin locks. I have no idea
> why this is only being done when running pv, but not for pvhvm. Konrad,
> Stefano?

I tried to use PV spinlocks on PV on HVM guests but I found that:

commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Tue Sep 6 17:41:47 2011 +0100

    xen: disable PV spinlocks on HVM

    PV spinlocks cannot possibly work with the current code because they are
    enabled after pvops patching has already been done, and because PV
    spinlocks use a different data structure than native spinlocks so we
    cannot switch between them dynamically. A spinlock that has been taken
    once by the native code (__ticket_spin_lock) cannot be taken by
    __xen_spin_lock even after it has been released.

    Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com>
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

At that time Jeremy was finishing off his PV ticket locks series, which
has the nice side effect of making it much easier to implement PV on
HVM spinlocks, so I just decided to wait and append the following patch
to his series:

http://marc.info/?l=xen-devel&m=131846828430409&w=2

That clearly never went upstream.
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-29  5:19 UTC
To: Stefano Stabellini; +Cc: Satish Kantheti, Konrad Rzeszutek Wilk, Feng Jin, Jan Beulich, xen-devel

On 2012-08-13 19:08, Stefano Stabellini wrote:
> At that time Jeremy was finishing off his PV ticket locks series, which
> has the nice side effect of making it much easier to implement PV on
> HVM spinlocks, so I just decided to wait and append the following patch
> to his series:
>
> http://marc.info/?l=xen-devel&m=131846828430409&w=2
>
> That clearly never went upstream.
Hi Stefano,
Is there a schedule for merging those patches upstream?

zduan
* Re: kernel bootup slow issue on ovm3.1.1
From: Stefano Stabellini @ 2012-08-29 18:28 UTC
To: zhenzhong.duan; +Cc: Satish Kantheti, Konrad Rzeszutek Wilk, Feng Jin, xen-devel, Jan Beulich

On Wed, 29 Aug 2012, zhenzhong.duan wrote:
> Hi Stefano,
> Is there a schedule for merging those patches upstream?

They are currently being handled by the KVM guys:

https://lkml.org/lkml/2012/5/2/119
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-29  5:36 UTC
To: Jan Beulich; +Cc: Satish Kantheti, xen-devel, Feng Jin, Konrad Rzeszutek Wilk, Stefano Stabellini

On 2012-08-13 17:29, Jan Beulich wrote:
> Indeed, and that (in one form or another) has been done virtually
> forever in Linux. I wonder why the problem wasn't noticed (or looked
> into, if it was noticed) so far.
>
> As it's going to be rather difficult to convince the Linux folks to
> change their code (plus this wouldn't help with existing kernels
> anyway), we'll need to find a way to improve this in the hypervisor.
Hi Jan, Tim,
Can this issue be improved on the Xen side?

thanks
zduan
* Re: kernel bootup slow issue on ovm3.1.1
From: Tim Deegan @ 2012-08-30  9:03 UTC
To: zhenzhong.duan; +Cc: Satish Kantheti, Stefano Stabellini, Konrad Rzeszutek Wilk, Feng Jin, xen-devel, Jan Beulich

At 13:36 +0800 on 29 Aug (1346247391), zhenzhong.duan wrote:
> On 2012-08-13 17:29, Jan Beulich wrote:
> > As it's going to be rather difficult to convince the Linux folks to
> > change their code (plus this wouldn't help with existing kernels
> > anyway), we'll need to find a way to improve this in the hypervisor.
> Hi Jan, Tim,
> Can this issue be improved on the Xen side?

Probably; we're looking into the best way to address it.

Tim.
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-09-19  2:39 UTC
To: Tim Deegan; +Cc: Satish Kantheti, Stefano Stabellini, Konrad Rzeszutek Wilk, Feng Jin, xen-devel, Jan Beulich

On 2012-08-30 17:03, Tim Deegan wrote:
> At 13:36 +0800 on 29 Aug (1346247391), zhenzhong.duan wrote:
>> Hi Jan, Tim,
>> Can this issue be improved on the Xen side?
> Probably; we're looking into the best way to address it.
Hi Jan, Tim,
Is there any patch for us to test? We are looking forward to your fix.
Our customer is unhappy with the 30+ minutes of boot time and the long
wait.

Regards
zduan
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-09-19 10:29 UTC
To: zhenzhong.duan; +Cc: Satish Kantheti, Konrad Rzeszutek Wilk, Feng Jin, Stefano Stabellini, Tim Deegan, xen-devel

>>> On 19.09.12 at 04:39, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> Is there any patch for us to test?

No, sorry.

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: Konrad Rzeszutek Wilk @ 2013-04-29 17:55 UTC
To: Tim Deegan; +Cc: Satish Kantheti, Stefano Stabellini, Feng Jin, zhenzhong.duan, xen-devel, Jan Beulich

On Thu, Aug 30, 2012 at 10:03:12AM +0100, Tim Deegan wrote:
> > Hi Jan, Tim,
> > Can this issue be improved on the Xen side?
>
> Probably; we're looking into the best way to address it.
>
> Tim.

Ping? Was there any progress on this? Thanks
* Re: kernel bootup slow issue on ovm3.1.1
From: George Dunlap @ 2013-04-30 10:37 UTC
To: Konrad Rzeszutek Wilk; +Cc: Satish Kantheti, Stefano Stabellini, Feng Jin, Tim Deegan, zhenzhong.duan, xen-devel, Jan Beulich

On Mon, Apr 29, 2013 at 6:55 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> > Can this issue be improved on the Xen side?
>>
>> Probably; we're looking into the best way to address it.
>
> Ping? Was there any progress on this? Thanks

Does this need to be added to our tracking list?

 -George
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-31  9:07 UTC
To: zhenzhong.duan; +Cc: Satish Kantheti, Konrad Rzeszutek Wilk, Feng Jin, Stefano Stabellini, Tim Deegan, xen-devel

>>> On 29.08.12 at 07:36, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> Can this issue be improved on the Xen side?

Yes, we're investigating options.

Jan
* Re: kernel bootup slow issue on ovm3.1.1
From: Tim Deegan @ 2012-08-13  9:07 UTC
To: Jan Beulich; +Cc: xen-devel, Feng Jin, zhenzhong.duan, Konrad Rzeszutek Wilk

At 15:22 +0100 on 10 Aug (1344612120), Jan Beulich wrote:
> Yeah, that CR0 write disables the caches, and that's pretty expensive
> on EPT (not sure why NPT doesn't use/need the same hook) when the guest
> has any active MMIO regions: vmx_set_uc_mode(), when HAP is enabled,
> calls ept_change_entry_emt_with_range(), which is a walk through the
> entire guest page tables (i.e. it scales with guest size, or, to be
> precise, with the highest populated GFN).

:( That's not so great. It can definitely be done more efficiently than
with that for() loop, and I wonder whether there isn't some better way
involving flipping a global flag somewhere. If no EPT maintainers have
commented on this by Thursday, I'll look into it then.

Tim.
* Re: kernel bootup slow issue on ovm3.1.1
From: Konrad Rzeszutek Wilk @ 2012-08-07 16:26 UTC
To: zhenzhong.duan; +Cc: xen-devel, Feng Jin

On Tue, Aug 07, 2012 at 03:22:50PM +0800, zhenzhong.duan wrote:
> After some debugging, we found it is the kernel's MTRR init that causes
> this delay:
>
> mtrr_aps_init()
>  \-> set_mtrr()
>       \-> mtrr_work_handler()
>
> The kernel spins in mtrr_work_handler().
>
> But we don't know what is going on inside the hypervisor, or why large
> memory plus a passthrough device is the worst case. Is this already
> fixed in Xen upstream?
> Any comments are welcome; I'll upload whatever data you need.

What happens if you run with an upstream version of the kernel, say
v3.4.7? Do you see the same issue?
* Re: kernel bootup slow issue on ovm3.1.1
From: zhenzhong.duan @ 2012-08-08  9:23 UTC
To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Feng Jin

On 2012-08-08 00:26, Konrad Rzeszutek Wilk wrote:
> What happens if you run with an upstream version of the kernel, say
> v3.4.7? Do you see the same issue?
Hi Konrad, Jan,

I tried 3.5.0-2.fc17.x86_64 and 3.6.0-rc1.

*3.5.0-2.fc17.x86_64 took ~30 mins.* Below is a piece of the fc17 dmesg:

#22[    0.002999] installing Xen timer for CPU 22
#23[    0.002999] installing Xen timer for CPU 23
[    1.844896] Brought up 24 CPUs
[    1.844898] Total of 24 processors activated (140449.34 BogoMIPS).
*blocks for 30 mins here*
[    1.899794] devtmpfs: initialized
[    1.905956] atomic64 test passed for x86-64 platform with CX8 and with SSE

*3.6.0-rc1 took more than 2 hours.* A piece of its dmesg:

cpu 22 spinlock event irq 218
[    1.884775] #22[    0.001999] installing Xen timer for CPU 22
cpu 23 spinlock event irq 225
[    1.932764] #23[    0.001999] installing Xen timer for CPU 23
[    1.977734] Brought up 24 CPUs
[    1.978706] smpboot: Total of 24 processors activated (140449.34 BogoMIPS)
*blocks for more than 2 hours here*
[    1.988859] devtmpfs: initialized
[    2.021785] dummy:
[    2.023706] NET: Registered protocol family 16
[    2.026735] ACPI: bus type pci registered
[    2.028002] PCI: Using configuration type 1 for base access

I also sent a patch to LKML that can work around this issue, but I
don't know the reason for the blocking on the Xen side. Link:

https://lkml.org/lkml/2012/8/7/50

regards
zduan
* Re: kernel bootup slow issue on ovm3.1.1
From: Jan Beulich @ 2012-08-08 14:43 UTC
To: zhenzhong.duan; +Cc: xen-devel, Feng Jin, Konrad Rzeszutek Wilk

>>> On 08.08.12 at 11:23, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> I also sent a patch to LKML that can work around this issue, but I
> don't know the reason for the blocking on the Xen side. Link:
> https://lkml.org/lkml/2012/8/7/50

Without understanding the reason for this, I agree with hpa that
blindly changing the kernel to address this is not really a good idea.

Jan