From mboxrd@z Thu Jan 1 00:00:00 1970
From: "zhenzhong.duan"
Subject: Re: kernel bootup slow issue on ovm3.1.1
Date: Wed, 08 Aug 2012 17:23:51 +0800
Message-ID: <50223027.6080502@oracle.com>
References: <5020C24A.3060604@oracle.com> <20120807162637.GB15053@phenom.dumpdata.com>
In-Reply-To: <20120807162637.GB15053@phenom.dumpdata.com>
Reply-To: zhenzhong.duan@oracle.com
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4844915715082705569=="
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, Feng Jin
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--===============4844915715082705569==
Content-Type: multipart/alternative; boundary="------------060509090501000408040803"

--------------060509090501000408040803
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 2012-08-08 00:26, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 07, 2012 at 03:22:50PM +0800, zhenzhong.duan wrote:
>> Hi maintainers,
>>
>> We meet a uek2 bootup slow issue on our ovm product(ovm3.0.3 and ovm3.1.1).
>>
>> The system env is an exalogic node with 24 cores + 100G mem (2 socket ,
>> 6 cores per socket, 2 HT threads per core).
>> After boot up this node with all cores enabled,
>> We boot a pvhvm with 12vpcus (or 24) + 90 GB + pci passthroughed device,
>> it takes 30+ mins to boot.
>> If we remove passthrough device from vm.cfg, bootup takes about 2 mins.
>> If we use a small mem(eg. 10G + 24 vcpus), bootup takes about 3 mins.
>> So a big mem + passthrough device made the worst case.
>>
>> If we boot this node with HT disabled from BIOS. Now only 12 cores are
>> available.
>> OVM on same node, same config with 12vpcus+90GB boots in 1.5 mins!
>>
>> After some debug, we found it's in kernel mtrr init that make this delay.
>>
>> mtrr_aps_init()
>>  \-> set_mtrr()
>>      \-> mtrr_work_handler()
>>
>> kernel spin in mtrr_work_handler.
>>
>> But we don't know the scene hide in the hypervisor. Why big mem +
>> passthrough made the worst case.
>> Is this already fixed in xen upstream?
>> Any comments are welcome, I'll upload all data depend on your need.
> What happens if you run with a upstream version of kernel? Say v3.4.7 ?

Hi Konrad, Jan,

I tried 3.5.0-2.fc17.x86_64 and 3.6.0-rc1.

3.5.0-2.fc17.x86_64 took ~30 mins.

Below is a piece of the fc17 dmesg:
 #22[    0.002999] installing Xen timer for CPU 22
 #23[    0.002999] installing Xen timer for CPU 23
[    1.844896] Brought up 24 CPUs
[    1.844898] Total of 24 processors activated (140449.34 BogoMIPS).
*blocks for 30 mins here.*
[    1.899794] devtmpfs: initialized
[    1.905956] atomic64 test passed for x86-64 platform with CX8 and with SSE

3.6.0-rc1 took more than 2 hours.

Below is a piece of the 3.6.0-rc1 dmesg:
cpu 22 spinlock event irq 218
[    1.884775]  #22[    0.001999] installing Xen timer for CPU 22
cpu 23 spinlock event irq 225
[    1.932764]  #23[    0.001999] installing Xen timer for CPU 23
[    1.977734] Brought up 24 CPUs
[    1.978706] smpboot: Total of 24 processors activated (140449.34 BogoMIPS)
*blocks for more than 2 hours here.*
[    1.988859] devtmpfs: initialized
[    2.021785] dummy:
[    2.023706] NET: Registered protocol family 16
[    2.026735] ACPI: bus type pci registered
[    2.028002] PCI: Using configuration type 1 for base access

I also sent a patch to LKML that works around this issue, but I don't know the reason for the block on the Xen side.
link: https://lkml.org/lkml/2012/8/7/50

regards
zduan

    
--------------060509090501000408040803--

--===============4844915715082705569==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============4844915715082705569==--