public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Benchmarking for vhost polling patch
       [not found] <1414586281-razya@il.ibm.com>
@ 2014-10-29 12:38 ` Razya Ladelsky
  2014-10-30 11:30   ` Zhang Haoyu
  2014-11-09 12:19   ` Razya Ladelsky
  0 siblings, 2 replies; 8+ messages in thread
From: Razya Ladelsky @ 2014-10-29 12:38 UTC (permalink / raw)
  To: mst; +Cc: razya, GLIKSON, ERANRA, YOSSIKU, JOELN, abel.gordon, kvm

Hi Michael,

Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
I changed poll_stop_idle to be counted in microseconds, and carried out 
experiments using varying values of this parameter. The setup for netperf consisted of 
1 vm and 1 vhost, each running on its own dedicated core.
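
A rough user-space sketch of what the microsecond cutoff amounts to (names and
structure are my own illustration, not the actual patch code): the vhost worker
keeps busy-polling after the virtqueue goes idle, and only falls back to
blocking on the guest's kick once the idle period exceeds poll_stop_idle.

```c
/* Illustrative sketch only, NOT the vhost patch itself: poll_stop_idle,
 * now expressed in microseconds, bounds how long the worker keeps
 * spinning on an idle virtqueue before going back to sleep. */
#include <stdbool.h>
#include <stdint.h>

static uint64_t poll_stop_idle_us = 10;  /* tunable cutoff; 1..50 benchmarked below */

/* True while the worker should keep spinning instead of blocking. */
static bool keep_polling(uint64_t idle_start_us, uint64_t now_us)
{
    return (now_us - idle_start_us) < poll_stop_idle_us;
}
```

With poll_stop_idle_us = 10, an idle period of 5 us keeps the worker spinning,
while 10 us or more sends it back to blocking on the eventfd.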
  
Here are the numbers for netperf (micro benchmark):

polling|Send |Throughput|Utilization |S. Demand   |vhost|exits|throughput|throughput
mode   |Msg  |          |Send  Recv  |Send  Recv  |util |/sec | /cpu     |   /cpu
       |Size |          |local remote|local remote|     |     |          |% change
       |bytes|10^6bits/s|  %    %    |us/KB us/KB |  %  |     |          |    
-----------------------------------------------------------------------------
NoPolling  64   1054.11   99.97 3.01  7.78  3.74   38.80  92K    7.60
Polling=1  64   1036.67   99.97 2.93  7.90  3.70   53.00  92K    6.78     -10.78
Polling=5  64   1079.27   99.97 3.07  7.59  3.73   83.00  90K    5.90     -22.35
Polling=7  64   1444.90   99.97 3.98  5.67  3.61   95.00  19.5K  7.41      -2.44
Polling=10 64   1521.70   99.97 4.21  5.38  3.63   98.00  8.5K   7.69       1.19
Polling=25 64   1534.24   99.97 4.18  5.34  3.57   99.00  8.5K   7.71       1.51
Polling=50 64   1534.24   99.97 4.18  5.34  3.57   99.00  8.5K   7.71       1.51
                              
NoPolling  128  1577.39   99.97 4.09  5.19  3.40   54.00  113K   10.24 
Polling=1  128  1596.08   99.97 4.22  5.13  3.47   71.00  120K   9.34      -8.88
Polling=5  128  2238.49   99.97 5.45  3.66  3.19   92.00  24K    11.66     13.82
Polling=7  128  2330.97   99.97 5.59  3.51  3.14   95.00  19.5K  11.96     16.70
Polling=10 128  2375.78   99.97 5.69  3.45  3.14   98.00  10K    12.00     17.14
Polling=25 128  2655.01   99.97 2.45  3.09  1.21   99.00  8.5K   13.34     30.25
Polling=50 128  2655.01   99.97 2.45  3.09  1.21   99.00  8.5K   13.34     30.25
                              
NoPolling  256  2558.10   99.97 2.33  3.20  1.20   67.00  120K   15.32 
Polling=1  256  2508.93   99.97 3.13  3.27  1.67   75.00  125K   14.34     -6.41
Polling=5  256  3740.34   99.97 2.70  2.19  0.95   94.00  17K    19.28     25.86
Polling=7  256  3692.69   99.97 2.80  2.22  0.99   97.00  15.5K  18.75     22.37
Polling=10 256  4036.60   99.97 2.69  2.03  0.87   99.00  8.5K   20.29     32.42
Polling=25 256  3998.89   99.97 2.64  2.05  0.87   99.00  8.5K   20.10     31.18
Polling=50 256  3998.89   99.97 2.64  2.05  0.87   99.00  8.5K   20.10     31.18
                              
NoPolling  512  4531.50   99.90 2.75  1.81  0.79   78.00  55K    25.47 
Polling=1  512  4684.19   99.95 2.69  1.75  0.75   83.00  35K    25.60      0.52
Polling=5  512  4932.65   99.75 2.75  1.68  0.74   91.00  12K    25.86      1.52
Polling=7  512  5226.14   99.86 2.80  1.57  0.70   95.00  7.5K   26.82      5.30
Polling=10 512  5464.90   99.60 2.90  1.49  0.70   96.00  8.2K   27.94      9.69
Polling=25 512  5550.44   99.58 2.84  1.47  0.67   99.00  7.5K   27.95      9.73
Polling=50 512  5550.44   99.58 2.84  1.47  0.67   99.00  7.5K   27.95      9.73
                                    
                                    
As you can see from the last column, polling improves performance in most cases.
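
For reference, the per-cpu columns appear to be throughput divided by the total
cpu consumed, i.e. the guest's send-side utilization plus the vhost thread's
utilization. This is my reading of the table, not something stated explicitly
in the thread; a small check against the 64-byte rows:

```c
/* Assumed derivation of the table's throughput/cpu and % change columns
 * (my reading of the numbers, not documented in the thread itself). */
#include <assert.h>
#include <math.h>

/* Mbit/s divided by total cpu consumed (send util + vhost util, in %). */
static double tput_per_cpu(double mbps, double send_util, double vhost_util)
{
    return mbps / (send_util + vhost_util);
}

/* Relative change vs. the NoPolling baseline, as in the last column. */
static double pct_change(double val, double base)
{
    return 100.0 * (val - base) / base;
}

/* 64-byte rows:
 * NoPolling: 1054.11 / (99.97 + 38.80) ~ 7.60
 * Polling=1: 1036.67 / (99.97 + 53.00) ~ 6.78, i.e. ~ -10.78% change */
```

The same formula reproduces the other rows, e.g. the 512-byte NoPolling entry:
4531.50 / (99.90 + 78.00) ~ 25.47.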

I ran memcached (macro benchmark), where (as in the previous benchmark) the vm and 
vhost each get their own dedicated core. I configured memslap with C=128, T=8, as 
this configuration was required to produce enough load to saturate the vm.
I tried several other configurations, but this one produced the maximal 
throughput (for the baseline). 
  
The numbers for memcached (macro benchmark):

polling     time   TPS     Net    vhost vm   exits  TPS/cpu  TPS/cpu
mode                       rate   util  util /sec             % change
                              %                                   
Disabled    15.9s  125819  91.5   45    99   87K    873.74   
polling=1   15.8s  126820  92.3   60    99   87K    797.61   -8.71
polling=5   12.82s 155799  113.4  79    99   25.5K  875.28    0.18
polling=10  11.7s  160639  116.9  83    99   16.3K  882.63    1.02
polling=15  12.4s  160897  117.2  87    99   15K    865.04   -1.00
polling=100 11.7s  170971  124.4  99    99   30     863.49   -1.17


For memcached, TPS/cpu does not show a significant difference in any of the cases. 
However, TPS itself improved by up to 35%, which can be useful for under-utilized 
systems that have cpu time to spare for extra throughput. 

If it makes sense to you, I will continue with the other changes requested for 
the patch.

Thank you,
Razya





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Benchmarking for vhost polling patch
  2014-10-29 12:38 ` Benchmarking for vhost polling patch Razya Ladelsky
@ 2014-10-30 11:30   ` Zhang Haoyu
  2014-10-30 12:11     ` Razya Ladelsky
  2014-11-09 12:19   ` Razya Ladelsky
  1 sibling, 1 reply; 8+ messages in thread
From: Zhang Haoyu @ 2014-10-30 11:30 UTC (permalink / raw)
  To: Razya Ladelsky, mst; +Cc: razya, kvm

> Hi Michael,
> 
> Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
> I changed poll_stop_idle to be counted in micro seconds, and carried out 
> experiments using varying sizes of this value. The setup for netperf consisted of 
> 1 vm and 1 vhost , each running on their own dedicated core.
> 
Could you provide your changing code?

Thanks,
Zhang Haoyu


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Benchmarking for vhost polling patch
  2014-10-30 11:30   ` Zhang Haoyu
@ 2014-10-30 12:11     ` Razya Ladelsky
  2014-10-31  2:21       ` Zhang Haoyu
  0 siblings, 1 reply; 8+ messages in thread
From: Razya Ladelsky @ 2014-10-30 12:11 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: kvm, mst

"Zhang Haoyu" <zhanghy@sangfor.com> wrote on 30/10/2014 01:30:08 PM:

> From: "Zhang Haoyu" <zhanghy@sangfor.com>
> To: Razya Ladelsky/Haifa/IBM@IBMIL, "mst" <mst@redhat.com>
> Cc: Razya Ladelsky/Haifa/IBM@IBMIL, "kvm" <kvm@vger.kernel.org>
> Date: 30/10/2014 01:30 PM
> Subject: Re: Benchmarking for vhost polling patch
> 
> > Hi Michael,
> > 
> > Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
> > I changed poll_stop_idle to be counted in micro seconds, and carried out 
> > experiments using varying sizes of this value. The setup for netperf consisted of 
> > 1 vm and 1 vhost , each running on their own dedicated core.
> > 
> Could you provide your changing code?
> 
> Thanks,
> Zhang Haoyu
> 
Hi Zhang,
Do you mean the change in code for poll_stop_idle?
Thanks,
Razya


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Benchmarking for vhost polling patch
  2014-10-30 12:11     ` Razya Ladelsky
@ 2014-10-31  2:21       ` Zhang Haoyu
  0 siblings, 0 replies; 8+ messages in thread
From: Zhang Haoyu @ 2014-10-31  2:21 UTC (permalink / raw)
  To: Razya Ladelsky; +Cc: kvm, mst

>> > Hi Michael,
>> > 
>> > Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
>> > I changed poll_stop_idle to be counted in micro seconds, and carried out 
>> > experiments using varying sizes of this value. The setup for netperf consisted of 
>> > 1 vm and 1 vhost , each running on their own dedicated core.
>> > 
>> Could you provide your changing code?
>> 
>> Thanks,
>> Zhang Haoyu
>> 
>Hi Zhang,
>Do you mean the change in code for poll_stop_idle?
Yes, it's better to provide the complete code, including the polling patch.

Thanks,
Zhang Haoyu
>Thanks,
>Razya


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Benchmarking for vhost polling patch
  2014-10-29 12:38 ` Benchmarking for vhost polling patch Razya Ladelsky
  2014-10-30 11:30   ` Zhang Haoyu
@ 2014-11-09 12:19   ` Razya Ladelsky
  1 sibling, 0 replies; 8+ messages in thread
From: Razya Ladelsky @ 2014-11-09 12:19 UTC (permalink / raw)
  To: mst; +Cc: kvm, Joel Nider, Yossi Kuperman1, Alex Glikson, Eyal Moscovici

Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:

> From: Razya Ladelsky/Haifa/IBM@IBMIL
> To: mst@redhat.com
> Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
> Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
> Joel Nider/Haifa/IBM@IBMIL, abel.gordon@gmail.com, kvm@vger.kernel.org
> Date: 29/10/2014 02:38 PM
> Subject: Benchmarking for vhost polling patch
> 
> Hi Michael,
> 
> Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
> I changed poll_stop_idle to be counted in micro seconds, and carried out 
> experiments using varying sizes of this value. 
> 
> If it makes sense to you, I will continue with the other changes 
> requested for 
> the patch.
> 
> Thank you,
> Razya
> 
> 

Hi Michael,
Have you had the chance to look into these numbers?
Thank you,
Razya 


> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Benchmarking for vhost polling patch
@ 2014-11-16 12:08 Razya Ladelsky
  2014-11-16 14:56 ` Michael S. Tsirkin
  0 siblings, 1 reply; 8+ messages in thread
From: Razya Ladelsky @ 2014-11-16 12:08 UTC (permalink / raw)
  To: mst; +Cc: kvm, Joel Nider, Yossi Kuperman1, Alex Glikson, Eyal Moscovici

Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:

> From: Razya Ladelsky/Haifa/IBM@IBMIL
> To: mst@redhat.com
> Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
> Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
> Joel Nider/Haifa/IBM@IBMIL, abel.gordon@gmail.com, kvm@vger.kernel.org
> Date: 29/10/2014 02:38 PM
> Subject: Benchmarking for vhost polling patch
> 
> Hi Michael,
> 
> Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
> I changed poll_stop_idle to be counted in micro seconds, and carried out 
> experiments using varying sizes of this value. 
> 
> If it makes sense to you, I will continue with the other changes 
> requested for 
> the patch.
> 
> Thank you,
> Razya
> 
> 

Dear Michael,
I'm still interested in hearing your opinion about these numbers 
http://marc.info/?l=kvm&m=141458631532669&w=2, 
and whether it is worthwhile to continue with the polling patch.
Thank you,
Razya 


> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Benchmarking for vhost polling patch
  2014-11-16 12:08 Razya Ladelsky
@ 2014-11-16 14:56 ` Michael S. Tsirkin
  0 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2014-11-16 14:56 UTC (permalink / raw)
  To: Razya Ladelsky
  Cc: kvm, Joel Nider, Yossi Kuperman1, Alex Glikson, Eyal Moscovici

On Sun, Nov 16, 2014 at 02:08:49PM +0200, Razya Ladelsky wrote:
> Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:
> 
> > From: Razya Ladelsky/Haifa/IBM@IBMIL
> > To: mst@redhat.com
> > Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
> > Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
> > Joel Nider/Haifa/IBM@IBMIL, abel.gordon@gmail.com, kvm@vger.kernel.org
> > Date: 29/10/2014 02:38 PM
> > Subject: Benchmarking for vhost polling patch
> > 
> > Hi Michael,
> > 
> > Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2, 
> > I changed poll_stop_idle to be counted in micro seconds, and carried out 
> > experiments using varying sizes of this value. 
> > 
> > If it makes sense to you, I will continue with the other changes 
> > requested for 
> > the patch.
> > 
> > Thank you,
> > Razya
> > 
> > 
> 
> Dear Michael,
> I'm still interested in hearing your opinion about these numbers 
> http://marc.info/?l=kvm&m=141458631532669&w=2, 
> and whether it is worthwhile to continue with the polling patch.
> Thank you,
> Razya 
> 
> 
> > 
> > 

Hi Razya,
On the netperf benchmark, it looks like polling=10 gives a modest but
measurable gain.  So from that perspective it might be worth it if it's
not too much code, though we'll need to spend more time checking the
macro effect - we barely moved the needle on the macro benchmark and
that is suspicious.
Is there a chance you are actually trading latency for throughput?
do you observe any effect on latency?
How about trying some other benchmark, e.g. NFS?


Also, I am wondering:

since vhost thread is polling in kernel anyway, shouldn't
we try and poll the host NIC?
that would likely reduce at least the latency significantly,
won't it?


-- 
MST

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Benchmarking for vhost polling patch
       [not found] <1416919320-razya@il.ibm.com>
@ 2014-11-25 12:42 ` Razya Ladelsky
  0 siblings, 0 replies; 8+ messages in thread
From: Razya Ladelsky @ 2014-11-25 12:42 UTC (permalink / raw)
  To: mst; +Cc: razya, GLIKSON, ERANRA, YOSSIKU, JOELN, abel.gordon, kvm

Hi Michael,

> Hi Razya,
> On the netperf benchmark, it looks like polling=10 gives a modest but
> measurable gain.  So from that perspective it might be worth it if it's
> not too much code, though we'll need to spend more time checking the
> macro effect - we barely moved the needle on the macro benchmark and
> that is suspicious.

I ran memcached with various values for the key & value arguments, and 
managed to see a bigger impact of polling than when I used the default values.
Here are the numbers:

key=250     TPS      net    vhost vm   TPS/cpu  TPS/CPU
value=2048           rate   util  util          change

polling=0   101540   103.0  46   100   695.47
polling=5   136747   123.0  83   100   747.25   0.074440609
polling=7   140722   125.7  84   100   764.79   0.099663658
polling=10  141719   126.3  87   100   757.85   0.089688003
polling=15  142430   127.1  90   100   749.63   0.077863015
polling=25  146347   128.7  95   100   750.49   0.079107993
polling=50  150882   131.1  100  100   754.41   0.084733701

Macro benchmarks are less I/O intensive than the micro benchmark, which is why 
we can expect a smaller impact from polling compared to netperf. 
However, as shown above, we managed to get a 10% TPS/cpu improvement with the 
polling patch.
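
The last column of the table reads as the fractional TPS/cpu change versus the
polling=0 baseline (my interpretation of the units, since the table does not
label them); the polling=7 row is where the roughly 10% figure comes from:

```c
/* Assumed reading of the table's change column: a plain fraction
 * relative to the polling=0 baseline, e.g. for polling=7:
 * (764.79 - 695.47) / 695.47 ~ 0.0997, i.e. ~10%. */
#include <assert.h>
#include <math.h>

static double frac_change(double tps_per_cpu, double baseline)
{
    return (tps_per_cpu - baseline) / baseline;
}
```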

> Is there a chance you are actually trading latency for throughput?
> do you observe any effect on latency?

No.

> How about trying some other benchmark, e.g. NFS?
> 

Tried, but it didn't produce enough I/O (vhost was at most at 15% util).

> 
> Also, I am wondering:
> 
> since vhost thread is polling in kernel anyway, shouldn't
> we try and poll the host NIC?
> that would likely reduce at least the latency significantly,
> won't it?
> 

Yes, it could be a great addition at some point, but needs a thorough 
investigation. In any case, not a part of this patch...

Thanks,
Razya


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-11-25 12:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1414586281-razya@il.ibm.com>
2014-10-29 12:38 ` Benchmarking for vhost polling patch Razya Ladelsky
2014-10-30 11:30   ` Zhang Haoyu
2014-10-30 12:11     ` Razya Ladelsky
2014-10-31  2:21       ` Zhang Haoyu
2014-11-09 12:19   ` Razya Ladelsky
2014-11-16 12:08 Razya Ladelsky
2014-11-16 14:56 ` Michael S. Tsirkin
     [not found] <1416919320-razya@il.ibm.com>
2014-11-25 12:42 ` Razya Ladelsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox