From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: Poor SMP performance pv_ops domU
Date: Wed, 19 May 2010 10:44:27 -0700
Message-ID: <4BF4237B.4080209@goop.org>
References: <E2279633-C226-4C37-9313-49CE6A53B628@clustered.net>
	<4BF2DEBD.7040108@goop.org>
	<54D71582-B33E-4808-A134-639BD898A011@clustered.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <54D71582-B33E-4808-A134-639BD898A011@clustered.net>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: John Morrison <john@clustered.net>
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 05/19/2010 09:24 AM, John Morrison wrote:
> I've tried with various kernel's today - pv_ops seems to only use 1 core out of 8.
>
> PV spinlocks makes no difference.
>
> The thing that sticks out most is I cannot get the dom0 (xen-3.4.2) to show more that about 99.7% cpu usage for any pv_ops kernel.
>
> #!/usr/bin/perl
>
> while () {}
>
> running 8 of these loads 2.6.18.8-xenU with nearly 800% cpu as shown in dom0
> running the same 8 in any pv_ops kernel's only gets as high as about 99.7%
>   

What tool are you using to show CPU use?

> Inside the pv and xenU kernels top -s show all 8 cores being used.
>   

I tried to reproduce this:

   1. I created a 4 vcpu pvops PV domain (4 pcpu host)
   2. Confirmed that all 4 vcpus are present with "cat /proc/cpuinfo" in
      the domain
   3. Ran 4 instances of ``perl -e "while(){}"&'' in the domain
   4. "top" within the domain shows 99% overall user time, no stolen
      time, with the perl processes each using 99% cpu time
   5. in dom0 "watch -n 1 xl vcpu-list <domain>" shows all 4 vcpus are
      consuming 1 vcpu second per second
   6. running a spin loop in dom0 makes top within the domain show
      16-25% stolen time
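
For step 2, something along these lines would compare the number of CPUs
the guest kernel actually brought up against the number you expect. This
is only a sketch: the function names and the optional file argument are
made up for illustration, and it assumes the x86 /proc/cpuinfo layout
with one "processor" line per CPU.

```shell
#!/bin/sh
# Hypothetical helper: count CPUs listed in /proc/cpuinfo (x86 layout).
# The optional argument lets you point it at a saved copy of the file.
online_vcpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# check_vcpus EXPECTED [FILE]: complain if fewer CPUs are up than expected.
check_vcpus() {
    have=$(online_vcpus "$2") || true
    if [ "$have" -lt "$1" ]; then
        echo "only $have of $1 vcpus online"
        return 1
    fi
    echo "all $1 vcpus online"
}
```

Running "check_vcpus 8" inside the domain would then flag the case where
the config asks for 8 vcpus but the kernel only sees some of them.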

Aside from top showing "99%" rather than ~400% as one might expect, it
all seems OK, and it looks like the vcpus are actually getting all the
CPU they're asking for.  I think the 99 vs 400 difference is just a
change in how the kernel shows its accounting (since there's been a lot
of change in that area between .18 and .32, including a whole new
scheduler).
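
To make the 99 vs 400 point concrete: the same load reads as roughly
100% if a tool averages over the vcpus and roughly 400% if it sums them.
A rough sketch of that arithmetic, working from two saved copies of
/proc/stat taken inside the domain, is below. The function name is made
up, and it assumes the usual /proc/stat field order (user, nice, system,
idle, iowait, irq, softirq, steal); treat it as an illustration rather
than a replacement for top.

```shell
#!/bin/sh
# cpu_usage BEFORE AFTER
# BEFORE and AFTER are two saved copies of /proc/stat taken a few seconds apart.
# Prints busy and steal percentages per CPU, then the sum and the average.
cpu_usage() {
    awk '
        # Per-CPU lines only (cpu0, cpu1, ...); skip the aggregate "cpu" line.
        $1 ~ /^cpu[0-9]+$/ {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            for (f = 2; f <= NF && f <= 9; f++) total += $f
            idle  = $5 + $6
            steal = (NF >= 9) ? $9 : 0
            if (NR == FNR) {
                # First file: remember the baseline counters.
                t0[$1] = total; i0[$1] = idle; s0[$1] = steal
                next
            }
            dt = total - t0[$1]
            if (dt <= 0) next
            dsteal = steal - s0[$1]
            busy = 100 * (dt - (idle - i0[$1]) - dsteal) / dt
            printf "%s busy=%.1f%% steal=%.1f%%\n", $1, busy, 100 * dsteal / dt
            sum += busy; n++
        }
        END { if (n) printf "sum=%.1f%% avg=%.1f%%\n", sum, sum / n }
    ' "$1" "$2"
}

# Usage inside the domain while the spin loops are running:
#   cat /proc/stat > /tmp/stat.0; sleep 5; cat /proc/stat > /tmp/stat.1
#   cpu_usage /tmp/stat.0 /tmp/stat.1
```

With 4 busy vcpus and nothing competing for the physical cpus, the sum
should come out near 400% and the average near 100%; a non-zero steal
column means the vcpus are not getting all the time they ask for.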

If you're seeing a real performance regression between .18 and .32,
that's interesting, but it would be useful to make sure you're comparing
apples to apples; in particular, separate any performance change
inherent in Linux itself between .18 and .32 from any difference
between pvops and xenU.

So, things to try:

    * make sure all the vcpus are actually enabled within your domain;
      if you're adding them after the domain has booted, you need to make
      sure they get hot-plugged properly
    * make sure you don't have any expensive debug options enabled in
      your kernel config
    * run your benchmark on the 2.6.32 kernel booted native and compare
      it to pvops running under xen
    * compare it with the Novell 2.6.32 non-pvops kernel
    * try pinning the vcpus to physical cpus to eliminate any Xen
      scheduler effects
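
A few of the checks above can be scripted. The helpers below are a
sketch under assumptions, not a tested procedure: the function names are
hypothetical, the list of debug options is only a sample of the more
expensive ones, and the pinning helper just prints "xl vcpu-pin"
commands (one vcpu per physical cpu, same index) instead of running
them.

```shell
#!/bin/sh
# Hypothetical helpers for the checklist above; names and defaults are illustrative.

# offline_cpus [SYSFS_CPU_DIR]: list CPUs that are present but not online.
# A CPU with no "online" file cannot be hot-unplugged and is treated as online.
offline_cpus() {
    dir=${1:-/sys/devices/system/cpu}
    for f in "$dir"/cpu[0-9]*/online; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            basename "$(dirname "$f")"
        fi
    done
}
# To bring one up by hand (as root): echo 1 > /sys/devices/system/cpu/cpuN/online

# debug_options [CONFIG_FILE]: show a few costly debug options that are built in.
# The list is a sample, not exhaustive.
debug_options() {
    grep -E '^CONFIG_(DEBUG_PAGEALLOC|DEBUG_SPINLOCK|DEBUG_LOCK_ALLOC|PROVE_LOCKING|LOCK_STAT|DEBUG_SLAB|DEBUG_OBJECTS)=y' \
        "${1:-/boot/config-$(uname -r)}" || true
}

# pin_commands DOMAIN NVCPUS: print (not run) one pinning command per vcpu,
# mapping vcpu N to physical cpu N. Use "xm" instead of "xl" on older toolstacks.
pin_commands() {
    v=0
    while [ "$v" -lt "$2" ]; do
        echo "xl vcpu-pin $1 $v $v"
        v=$((v + 1))
    done
}
```

Run offline_cpus and debug_options inside the domain, and review the
output of pin_commands in dom0 before executing any of it.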

Thanks,
    J