* [Qemu-devel] Performance Profiling 2 VMs
From: kalyan tata @ 2016-03-02  0:06 UTC (permalink / raw)
  To: qemu-devel

Hi All,

I am new to qemu development.
Sorry if this is not the correct forum for this question; it would be great
if you could direct me to the correct one.

I am seeing very low virtio network throughput on an older (2.6.18) Linux
guest compared with a newer (3.10) guest, both running on the same host with
the same config (2 vCPUs, no multiqueue, etc.). The 2.6.18 guest shows very
high CPU usage at very low network throughput, and I want to profile it to
find the bottleneck.

I tried "perf kvm", but the analysis shows a maximum overhead of 0.25% for
any single entry, whereas top in the VM shows 100% CPU. (I used the following
as a guide:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools-perf_kvm
)

     0.25%  :5235    [uhci_hcd]        [g] 0xffffffff80182236
     0.24%  :5235    [uhci_hcd]        [g] 0xffffffff8018226a
     0.23%  :5235    [virtio_ring]     [g] vring_new_virtqueue
     0.20%  :5236    [uhci_hcd]        [g] 0xffffffff80182236
     0.18%  :5236    [uhci_hcd]        [g] 0xffffffff8018226a
     0.18%  :5235    [uhci_hcd]        [g] 0xffffffff8016f385
     0.14%  :5236    [uhci_hcd]        [g] 0xffffffff802fbe0f
     0.14%  :5235    [uhci_hcd]        [g] 0xffffffff8001161a
     0.14%  :5235    [virtio_ring]     [g] virtqueue_is_broken
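
The entries listed above can be totalled per module to see how much of the
guest's time they actually account for. The following is a minimal sketch,
assuming the report lines keep the layout shown above; the names REPORT,
LINE_RE and overhead_by_module are illustrative and not part of perf.

```python
import re
from collections import defaultdict

# Report lines copied from the "perf kvm" output above.
REPORT = """\
     0.25%  :5235    [uhci_hcd]        [g] 0xffffffff80182236
     0.24%  :5235    [uhci_hcd]        [g] 0xffffffff8018226a
     0.23%  :5235    [virtio_ring]     [g] vring_new_virtqueue
     0.20%  :5236    [uhci_hcd]        [g] 0xffffffff80182236
     0.18%  :5236    [uhci_hcd]        [g] 0xffffffff8018226a
     0.18%  :5235    [uhci_hcd]        [g] 0xffffffff8016f385
     0.14%  :5236    [uhci_hcd]        [g] 0xffffffff802fbe0f
     0.14%  :5235    [uhci_hcd]        [g] 0xffffffff8001161a
     0.14%  :5235    [virtio_ring]     [g] virtqueue_is_broken
"""

# Assumed layout: overhead%, :pid, [module], [origin flag], symbol-or-address
LINE_RE = re.compile(r"^\s*([\d.]+)%\s+:(\d+)\s+\[([^\]]+)\]\s+\[(\w)\]\s+(\S+)")


def overhead_by_module(report):
    """Sum the overhead percentages of all matching lines, keyed by module."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group(3)] += float(match.group(1))
    return dict(totals)


totals = overhead_by_module(REPORT)
for module, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{module:12s} {pct:5.2f}%")
print(f"{'total':12s} {sum(totals.values()):5.2f}%")
```

For the nine lines shown, the listed entries total well under 2%, and most of
them are raw addresses rather than resolved symbols.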


My basic question is: is there a way to profile the older Linux guest so
that I can see the bottleneck (where the guest is spending its CPU cycles)?
My aim is to see whether I can patch the older version's critical path with
improvements made in the newer version.

Thanks
Kal


* Re: [Qemu-devel] Performance Profiling 2 VMs
From: Stefan Hajnoczi @ 2016-03-02 11:28 UTC (permalink / raw)
  To: kalyan tata; +Cc: qemu-devel

What is the output of "mpstat 5" in the guest and on the host?  mpstat
is part of the "sysstat" package.

mpstat is similar to vmstat but also shows "guest time" and "steal
time".  Both are relevant to virtualization and will help show which
component is using so much CPU time.
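
mpstat derives these columns from the per-CPU counters in /proc/stat. As a
rough sketch of that calculation (the counter values below are made up for
illustration, and parse_cpu_line and percentages are hypothetical helper
names, not part of sysstat):

```python
# Column order of a "cpuN" line in /proc/stat on Linux.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Split one 'cpuN ...' line into its name and a dict of tick counters."""
    parts = line.split()
    name, values = parts[0], [int(v) for v in parts[1:]]
    # Older kernels expose fewer columns; treat the missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return name, dict(zip(FIELDS, values))


def percentages(before, after):
    """Share of elapsed ticks spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    # Guest time is already counted inside user (and guest_nice inside nice),
    # so it is left out of the total to avoid counting it twice.
    total = sum(v for k, v in delta.items() if k not in ("guest", "guest_nice"))
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


# Two hypothetical samples of the same CPU, taken some seconds apart.
before = parse_cpu_line("cpu1 100 0 50 800 10 5 15 20 0 0")[1]
after = parse_cpu_line("cpu1 150 0 70 1200 10 15 55 100 30 0")[1]

pct = percentages(before, after)
print(f"steal {pct['steal']:.2f}%  guest {pct['guest']:.2f}%  "
      f"softirq {pct['softirq']:.2f}%")
```

High steal inside the guest means the vCPU was waiting for the host to run
it; high guest time on the host means the time was spent running guest code.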

Stefan


* Re: [Qemu-devel] Performance Profiling 2 VMs
From: kalyan tata @ 2016-03-03  5:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

Thanks a lot for the quick reply, Stefan.

The following is the mpstat output from the problem VM:
18:56:29     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
18:56:44       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.20      0.00
18:56:49       1    0.00    0.00    0.00    0.00    3.21   10.22    0.00   79.56    908.22
18:56:54       1    0.00    0.00    0.00    0.00   11.47   54.93    0.00    2.82   5527.77
18:56:59       1    0.00    0.00    0.00    0.00   10.04   66.06    0.00    2.21   7160.64
18:57:04       1    0.00    0.00    0.00    0.00   10.42   65.13    0.00    2.00   7295.99
18:57:09       1    0.00    0.00    0.00    0.00   12.53   50.51    0.00    4.04   5700.20
18:57:14       1    0.00    0.00    0.00    0.00   16.43   65.53    0.00    8.62   9572.34
18:57:19       1    0.00    0.00    0.00    0.00   11.45   60.64    0.00    4.02   5798.19
18:57:24       1    0.00    0.00    0.00    0.00   11.45   81.33    0.00    0.80   6064.26
18:57:29       1    0.00    0.00    0.00    0.00    7.65   85.11    0.00    0.80   7578.27
18:57:34       1    0.00    0.00    0.00    0.00    9.42   84.17    0.00    1.40   9083.97
18:57:39       1    0.00    0.00    0.00    0.00    7.78   82.83    0.00    1.60   7264.87
18:57:44       1    0.00    0.00    0.00    0.00    8.62   87.78    0.00    0.60   8597.80
18:57:49       1    0.00    0.00    0.00    0.00   10.02   82.16    0.00    2.40   7750.90
18:57:54       1    0.00    0.00    0.00    0.00    8.42   81.76    0.00    1.00   6303.41
18:57:59       1    0.00    0.00    0.00    0.00    7.63   87.35    0.00    1.20   9422.49
18:58:04       1    0.00    0.00    0.00    0.00   10.44   80.32    0.00    2.21   7496.79
18:58:09       1    0.00    0.00    0.00    0.00    6.43   59.84    0.00   26.91   5019.28
18:58:14       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      1.00
18:58:19       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00

I set the affinity of both the tx and rx interfaces to CPU 1, so I am
showing only CPU 1.
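
The interrupt-related columns can be summarized with a short script. This is
a minimal sketch, assuming each sample sits on a single line in the column
order of the header above; only the first three loaded samples are embedded
here, and COLUMNS and parse_rows are illustrative names.

```python
# Three loaded samples from the mpstat output above, one sample per line.
SAMPLE = """\
18:56:54       1    0.00    0.00    0.00    0.00   11.47   54.93    0.00    2.82   5527.77
18:56:59       1    0.00    0.00    0.00    0.00   10.04   66.06    0.00    2.21   7160.64
18:57:04       1    0.00    0.00    0.00    0.00   10.42   65.13    0.00    2.00   7295.99
"""

# Column names after the timestamp and CPU number, in header order.
COLUMNS = ("user", "nice", "sys", "iowait", "irq", "soft",
           "steal", "idle", "intr_s")


def parse_rows(text):
    """Return one dict of floats per data row; skip headers and blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 + len(COLUMNS):
            continue
        if not parts[1].isdigit():  # header row has "CPU" in this position
            continue
        rows.append(dict(zip(COLUMNS, map(float, parts[2:]))))
    return rows


rows = parse_rows(SAMPLE)
irq_plus_soft = [r["irq"] + r["soft"] for r in rows]
mean_irq_soft = sum(irq_plus_soft) / len(irq_plus_soft)
mean_intr_rate = sum(r["intr_s"] for r in rows) / len(rows)

print(f"mean %irq + %soft: {mean_irq_soft:.2f}")
print(f"mean intr/s:       {mean_intr_rate:.2f}")
```

In these samples nearly all of the non-idle time is in %irq and %soft, with
%user and %sys at zero.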

The NAPI weight is 128 in this version; I changed it to 64 just to see what
happens. This version of the code seems to change quota and budget (which I
did not see in newer versions), and I am thinking of experimenting with that.
I also see that this version kicks for every packet on the tx side.
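
To put a number on the per-packet kick, here is a back-of-the-envelope
sketch. It assumes a hypothetical fixed batch size purely for comparison;
it is not how any particular virtio_net version decides when to notify the
host, and the function names are made up for illustration.

```python
import math


def kicks_per_packet(packets):
    """One host notification (kick) for every transmitted packet."""
    return packets


def kicks_batched(packets, batch_size):
    """One kick per batch of queued packets (batch_size is hypothetical)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(packets / batch_size)


packets = 10_000
for batch in (1, 4, 16, 64):
    print(f"batch={batch:3d}  kicks={kicks_batched(packets, batch)}")
```

With a batch size of 1 the two functions agree; any larger batch divides the
number of notifications accordingly.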

Any other pointers would be really helpful.

Thanks
Kal





* Re: [Qemu-devel] Performance Profiling 2 VMs
From: Stefan Hajnoczi @ 2016-03-09 16:15 UTC (permalink / raw)
  To: kalyan tata; +Cc: qemu-devel

There is nothing surprising in the output.  It makes sense for a
CPU-bound virtual networking workload.

Perhaps someone else has ideas.

Stefan
