From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: Poor SMP performance pv_ops domU
Date: Wed, 19 May 2010 10:44:27 -0700
Message-ID: <4BF4237B.4080209@goop.org>
References: <4BF2DEBD.7040108@goop.org> <54D71582-B33E-4808-A134-639BD898A011@clustered.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <54D71582-B33E-4808-A134-639BD898A011@clustered.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: John Morrison
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 05/19/2010 09:24 AM, John Morrison wrote:
> I've tried with various kernel's today - pv_ops seems to only use 1 core out of 8.
>
> PV spinlocks makes no difference.
>
> The thing that sticks out most is I cannot get the dom0 (xen-3.4.2) to show more that about 99.7% cpu usage for any pv_ops kernel.
>
> #!/usr/bin/perl
>
> while () {}
>
> running 8 of these loads 2.6.18.8-xenU with nearly 800% cpu as shown in dom0
> running the same 8 in any pv_ops kernel's only gets as high as about 99.7%
>

What tool are you using to show CPU use?

> Inside the pv and xenU kernels top -s show all 8 cores being used.
>

I tried to reproduce this:

   1. I created a 4 vcpu pvops PV domain (4 pcpu host)
   2. Confirmed that all 4 vcpus are present with "cat /proc/cpuinfo"
      in the domain
   3. Ran 4 instances of ``perl -e "while(){}"&'' in the domain
   4. "top" within the domain shows 99% overall user time, no stolen
      time, with the perl processes each using 99% cpu time
   5. in dom0 "watch -n 1 xl vcpu-list " shows all 4 vcpus are
      consuming 1 vcpu second per second
   6. running a spin loop in dom0 makes top within the domain show
      16-25% stolen time

Aside from top showing "99%" rather than ~400% as one might expect, it
all seems OK, and it looks like the vcpus are actually getting all the
CPU they're asking for.
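
One way to cross-check what top reports inside the domain is to read /proc/stat directly. This is only a minimal Python sketch, assuming the usual /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration and are not part of any tool mentioned above. It shows how the same load can read as ~99% on the aggregate "cpu" line and ~400% when the per-vcpu lines are added up.

```python
import time


def parse_proc_stat(text):
    """Return {name: [counters]} for the 'cpu' and 'cpuN' lines of /proc/stat."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            stats[fields[0]] = [int(v) for v in fields[1:]]
    return stats


def usage_between(before, after):
    """Return {name: (busy_pct, steal_pct)} for the interval between two snapshots."""
    result = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        # Columns past steal are ignored in this sketch.
        delta = [n - o for n, o in zip(new, old)][:8]
        total = sum(delta)
        if total <= 0:
            continue
        idle = delta[3] + (delta[4] if len(delta) > 4 else 0)  # idle + iowait
        steal = delta[7] if len(delta) > 7 else 0
        busy = total - idle - steal
        result[name] = (100.0 * busy / total, 100.0 * steal / total)
    return result


def summarize(usage):
    """Return (aggregate busy % on a 0-100 scale, sum of per-vcpu busy %)."""
    aggregate = usage["cpu"][0]
    summed = sum(busy for name, (busy, _) in usage.items() if name != "cpu")
    return aggregate, summed


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the usage dict."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return usage_between(first, second)
```

Calling summarize(sample()) in the domain while the perl loops are running gives both figures at once, and the second element of each per-vcpu tuple is the stolen time for that vcpu.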
I think the 99 vs 400 difference is just a change in how the kernel
shows its accounting (since there's been a lot of change in that area
between .18 and .32, including a whole new scheduler).

If you're seeing a real performance regression between .18 and .32,
that's interesting, but it would be useful to make sure you're comparing
apples to apples; in particular, separating any effect of Linux's own
changes from .18 -> .32 from the effect of pvops vs xenU.

So, things to try:

    * make sure all the vcpus are actually enabled within your domain;
      if you're adding them after the domain has booted, you need to
      make sure they get hot-plugged properly
    * make sure you don't have any expensive debug options enabled in
      your kernel config
    * run your benchmark on the 2.6.32 kernel booted native and compare
      it to pvops running under xen
    * compare it with the Novell 2.6.32 non-pvops kernel
    * try pinning the vcpus to physical cpus to eliminate any Xen
      scheduler effects

Thanks,
    J
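
For the first item in the list above, here is a minimal Python sketch for checking from inside the domain which vcpus are present but not online. It assumes the kernel exposes the sysfs files /sys/devices/system/cpu/present and /sys/devices/system/cpu/online in the usual cpu-list format (e.g. "0-3,5"); the helper names are made up for illustration.

```python
def parse_cpu_list(text):
    """Expand a sysfs cpu list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means an empty list
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def offline_cpus(present_text, online_text):
    """Return the cpus that are present but not currently online."""
    online = set(parse_cpu_list(online_text))
    return [cpu for cpu in parse_cpu_list(present_text) if cpu not in online]


def read_offline_cpus(base="/sys/devices/system/cpu"):
    """Read the two sysfs files (assumed paths) and report offline cpus."""
    with open(base + "/present") as f:
        present = f.read()
    with open(base + "/online") as f:
        online = f.read()
    return offline_cpus(present, online)
```

If read_offline_cpus() returns a non-empty list, those vcpus were added but not hot-plugged; writing 1 to /sys/devices/system/cpu/cpuN/online as root usually brings one up.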