* 8% performance improved by change tap interact with kernel stack
@ 2014-01-28 8:14 Qin Chuanyu
2014-01-28 8:34 ` Michael S. Tsirkin
2014-01-28 14:49 ` Eric Dumazet
0 siblings, 2 replies; 14+ messages in thread
From: Qin Chuanyu @ 2014-01-28 8:14 UTC (permalink / raw)
To: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev
According to perf results, 5%-8% of CPU time is spent in softirq
processing caused by the netif_rx_ni call in tun_get_user.
So I changed the call path so that the skb is transmitted more quickly:
from
tun_get_user ->
netif_rx_ni(skb);
to
tun_get_user ->
rcu_read_lock_bh();
netif_receive_skb(skb);
rcu_read_unlock_bh();
The test result is as below:
CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
NIC: intel 82599
Host OS/Guest OS:suse11sp3
Qemu-1.6
netperf udp 512(VM tx)
test model: VM->host->host
modified before : 2.00Gbps 461146pps
modified after : 2.16Gbps 498782pps
This change yields an 8% performance gain.
Is there any problem with this patch?
^ permalink raw reply [flat|nested]
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 8:14 8% performance improved by change tap interact with kernel stack Qin Chuanyu
@ 2014-01-28 8:34 ` Michael S. Tsirkin
2014-01-28 9:14 ` Qin Chuanyu
2014-01-28 14:49 ` Eric Dumazet
1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-01-28 8:34 UTC (permalink / raw)
To: Qin Chuanyu; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On Tue, Jan 28, 2014 at 04:14:12PM +0800, Qin Chuanyu wrote:
> According to perf results, 5%-8% of CPU time is spent in softirq
> processing caused by the netif_rx_ni call in tun_get_user.
>
> So I changed the call path so that the skb is transmitted more quickly:
> from
> tun_get_user ->
> netif_rx_ni(skb);
> to
> tun_get_user ->
> rcu_read_lock_bh();
> netif_receive_skb(skb);
> rcu_read_unlock_bh();
>
> The test result is as below:
> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> NIC: intel 82599
> Host OS/Guest OS:suse11sp3
> Qemu-1.6
> netperf udp 512(VM tx)
> test model: VM->host->host
>
> modified before : 2.00Gbps 461146pps
> modified after : 2.16Gbps 498782pps
>
> This change yields an 8% performance gain.
> Is there any problem with this patch?
I think it's okay - IIUC this way we are processing xmit directly
instead of going through softirq.
Was meaning to try this - I'm glad you are looking into this.
Could you please check latency results?
--
MST
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 8:34 ` Michael S. Tsirkin
@ 2014-01-28 9:14 ` Qin Chuanyu
2014-01-28 9:41 ` Michael S. Tsirkin
2014-01-28 16:56 ` Rick Jones
0 siblings, 2 replies; 14+ messages in thread
From: Qin Chuanyu @ 2014-01-28 9:14 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On 2014/1/28 16:34, Michael S. Tsirkin wrote:
> On Tue, Jan 28, 2014 at 04:14:12PM +0800, Qin Chuanyu wrote:
>> According to perf results, 5%-8% of CPU time is spent in softirq
>> processing caused by the netif_rx_ni call in tun_get_user.
>>
>> So I changed the call path so that the skb is transmitted more quickly:
>> from
>> tun_get_user ->
>> netif_rx_ni(skb);
>> to
>> tun_get_user ->
>> rcu_read_lock_bh();
>> netif_receive_skb(skb);
>> rcu_read_unlock_bh();
>>
>> The test result is as below:
>> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
>> NIC: intel 82599
>> Host OS/Guest OS:suse11sp3
>> Qemu-1.6
>> netperf udp 512(VM tx)
>> test model: VM->host->host
>>
>> modified before : 2.00Gbps 461146pps
>> modified after : 2.16Gbps 498782pps
>>
>> This change yields an 8% performance gain.
>> Is there any problem with this patch?
>
> I think it's okay - IIUC this way we are processing xmit directly
> instead of going through softirq.
> Was meaning to try this - I'm glad you are looking into this.
>
> Could you please check latency results?
>
netperf UDP_RR 512
test model: VM->host->host
modified before : 11108
modified after : 11480
3% gained by this patch
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 9:14 ` Qin Chuanyu
@ 2014-01-28 9:41 ` Michael S. Tsirkin
2014-01-28 10:19 ` Qin Chuanyu
2014-01-28 16:56 ` Rick Jones
1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-01-28 9:41 UTC (permalink / raw)
To: Qin Chuanyu; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On Tue, Jan 28, 2014 at 05:14:46PM +0800, Qin Chuanyu wrote:
> On 2014/1/28 16:34, Michael S. Tsirkin wrote:
> >On Tue, Jan 28, 2014 at 04:14:12PM +0800, Qin Chuanyu wrote:
> >>According to perf results, 5%-8% of CPU time is spent in softirq
> >>processing caused by the netif_rx_ni call in tun_get_user.
> >>
> >>So I changed the call path so that the skb is transmitted more quickly:
> >>from
> >> tun_get_user ->
> >> netif_rx_ni(skb);
> >>to
> >> tun_get_user ->
> >> rcu_read_lock_bh();
> >> netif_receive_skb(skb);
> >> rcu_read_unlock_bh();
> >>
> >>The test result is as below:
> >> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> >> NIC: intel 82599
> >> Host OS/Guest OS:suse11sp3
> >> Qemu-1.6
> >> netperf udp 512(VM tx)
> >> test model: VM->host->host
> >>
> >> modified before : 2.00Gbps 461146pps
> >> modified after : 2.16Gbps 498782pps
> >>
> >>This change yields an 8% performance gain.
> >>Is there any problem with this patch?
> >
> >I think it's okay - IIUC this way we are processing xmit directly
> >instead of going through softirq.
> >Was meaning to try this - I'm glad you are looking into this.
> >
> >Could you please check latency results?
> >
> netperf UDP_RR 512
> test model: VM->host->host
>
> modified before : 11108
> modified after : 11480
>
> 3% gained by this patch
>
>
Nice.
What about CPU utilization?
It's trivially easy to speed up networking by
burning up a lot of CPU so we must make sure it's
not doing that.
And I think we should see some tests with TCP as well, and
try several message sizes.
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 9:41 ` Michael S. Tsirkin
@ 2014-01-28 10:19 ` Qin Chuanyu
2014-01-28 10:33 ` Michael S. Tsirkin
0 siblings, 1 reply; 14+ messages in thread
From: Qin Chuanyu @ 2014-01-28 10:19 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On 2014/1/28 17:41, Michael S. Tsirkin wrote:
>>> I think it's okay - IIUC this way we are processing xmit directly
>>> instead of going through softirq.
>>> Was meaning to try this - I'm glad you are looking into this.
>>>
>>> Could you please check latency results?
>>>
>> netperf UDP_RR 512
>> test model: VM->host->host
>>
>> modified before : 11108
>> modified after : 11480
>>
>> 3% gained by this patch
>>
>>
> Nice.
> What about CPU utilization?
> It's trivially easy to speed up networking by
> burning up a lot of CPU so we must make sure it's
> not doing that.
> And I think we should see some tests with TCP as well, and
> try several message sizes.
>
>
Yes, burning more CPU is an easy way to get better performance,
so I bound the vhost thread and the NIC interrupt to CPU1 while testing.
Before the modification, CPU1 idle was 0%-1% during the test;
after the modification, it was 2%-3%.
TCP should also gain from this, but its pps is lower than UDP's, so I
expect the improvement to be less obvious.
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 10:19 ` Qin Chuanyu
@ 2014-01-28 10:33 ` Michael S. Tsirkin
2014-01-28 16:58 ` Stephen Hemminger
2014-01-29 7:41 ` Qin Chuanyu
0 siblings, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-01-28 10:33 UTC (permalink / raw)
To: Qin Chuanyu; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On Tue, Jan 28, 2014 at 06:19:02PM +0800, Qin Chuanyu wrote:
> On 2014/1/28 17:41, Michael S. Tsirkin wrote:
> >>>I think it's okay - IIUC this way we are processing xmit directly
> >>>instead of going through softirq.
> >>>Was meaning to try this - I'm glad you are looking into this.
> >>>
> >>>Could you please check latency results?
> >>>
> >>netperf UDP_RR 512
> >>test model: VM->host->host
> >>
> >>modified before : 11108
> >>modified after : 11480
> >>
> >>3% gained by this patch
> >>
> >>
> >Nice.
> >What about CPU utilization?
> >It's trivially easy to speed up networking by
> >burning up a lot of CPU so we must make sure it's
> >not doing that.
> >And I think we should see some tests with TCP as well, and
> >try several message sizes.
> >
> >
> Yes, burning more CPU is an easy way to get better performance,
> so I bound the vhost thread and the NIC interrupt to CPU1 while testing.
>
> Before the modification, CPU1 idle was 0%-1% during the test;
> after the modification, it was 2%-3%.
>
> TCP should also gain from this, but its pps is lower than UDP's, so I
> expect the improvement to be less obvious.
Still need to test that this doesn't regress, but overall it looks
convincing to me.
Could you send a patch, accompanied by test results for throughput,
latency and CPU utilization for TCP and UDP with various message sizes?
Thanks!
--
MST
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 8:14 8% performance improved by change tap interact with kernel stack Qin Chuanyu
2014-01-28 8:34 ` Michael S. Tsirkin
@ 2014-01-28 14:49 ` Eric Dumazet
2014-01-29 7:12 ` Qin Chuanyu
2014-02-11 13:21 ` Qin Chuanyu
1 sibling, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2014-01-28 14:49 UTC (permalink / raw)
To: Qin Chuanyu
Cc: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
Peter Klausler
On Tue, 2014-01-28 at 16:14 +0800, Qin Chuanyu wrote:
> According to perf results, 5%-8% of CPU time is spent in softirq
> processing caused by the netif_rx_ni call in tun_get_user.
>
> So I changed the call path so that the skb is transmitted more quickly:
> from
> tun_get_user ->
> netif_rx_ni(skb);
> to
> tun_get_user ->
> rcu_read_lock_bh();
> netif_receive_skb(skb);
> rcu_read_unlock_bh();
No idea why you use RCU here?
>
> The test result is as below:
> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> NIC: intel 82599
> Host OS/Guest OS:suse11sp3
> Qemu-1.6
> netperf udp 512(VM tx)
> test model: VM->host->host
>
> modified before : 2.00Gbps 461146pps
> modified after : 2.16Gbps 498782pps
>
> This change yields an 8% performance gain.
> Is there any problem with this patch?
http://patchwork.ozlabs.org/patch/52963/
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 9:14 ` Qin Chuanyu
2014-01-28 9:41 ` Michael S. Tsirkin
@ 2014-01-28 16:56 ` Rick Jones
1 sibling, 0 replies; 14+ messages in thread
From: Rick Jones @ 2014-01-28 16:56 UTC (permalink / raw)
To: Qin Chuanyu, Michael S. Tsirkin
Cc: jasowang, Anthony Liguori, KVM list, netdev
On 01/28/2014 01:14 AM, Qin Chuanyu wrote:
> On 2014/1/28 16:34, Michael S. Tsirkin wrote:
>> Could you please check latency results?
>>
> netperf UDP_RR 512
> test model: VM->host->host
>
> modified before : 11108
> modified after : 11480
>
> 3% gained by this patch
Netperf UDP_RR can be very sensitive to packet losses. Not that there
were necessarily any in your tests, but to further confirm the 3%
improvement in latency, I would suggest using the confidence intervals
functionality in your before/after netperf testing. And to get at what
Michael asks about CPU utilization I would suggest:
netperf -H <otherguy> -c -C -l 30 -i 30,3 -t UDP_RR -- -r 512
(I was guessing as to what netperf options you may have been using already)
happy benchmarking,
rick jones
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 10:33 ` Michael S. Tsirkin
@ 2014-01-28 16:58 ` Stephen Hemminger
2014-01-28 17:18 ` Michael S. Tsirkin
2014-01-29 7:41 ` Qin Chuanyu
1 sibling, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2014-01-28 16:58 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Qin Chuanyu, jasowang, Anthony Liguori, KVM list, netdev
On Tue, 28 Jan 2014 12:33:25 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jan 28, 2014 at 06:19:02PM +0800, Qin Chuanyu wrote:
> > On 2014/1/28 17:41, Michael S. Tsirkin wrote:
> > >>>I think it's okay - IIUC this way we are processing xmit directly
> > >>>instead of going through softirq.
> > >>>Was meaning to try this - I'm glad you are looking into this.
> > >>>
> > >>>Could you please check latency results?
> > >>>
> > >>netperf UDP_RR 512
> > >>test model: VM->host->host
> > >>
> > >>modified before : 11108
> > >>modified after : 11480
> > >>
> > >>3% gained by this patch
> > >>
> > >>
> > >Nice.
> > >What about CPU utilization?
> > >It's trivially easy to speed up networking by
> > >burning up a lot of CPU so we must make sure it's
> > >not doing that.
> > >And I think we should see some tests with TCP as well, and
> > >try several message sizes.
> > >
> > >
> > Yes, burning more CPU is an easy way to get better performance,
> > so I bound the vhost thread and the NIC interrupt to CPU1 while testing.
> >
> > Before the modification, CPU1 idle was 0%-1% during the test;
> > after the modification, it was 2%-3%.
> >
> > TCP should also gain from this, but its pps is lower than UDP's, so I
> > expect the improvement to be less obvious.
>
> Still need to test this doesn't regress but overall looks convincing to me.
> Could you send a patch, accompanied by testing results for
> throughput latency and cpu utilization for tcp and udp
> with various message sizes?
>
> Thanks!
>
There are a couple of potential problems with this. The primary one is
that you are now violating the explicit assumptions about when
netif_receive_skb() can be called, and because of that it may break
things all over the place.
*
* netif_receive_skb() is the main receive data processing function.
* It always succeeds. The buffer may be dropped during processing
* for congestion control or by the protocol layers.
*
* This function may only be called from softirq context and interrupts
* should be enabled.
At a minimum, softirq (BH) and preempt must be disabled.
Another potential problem is that since a softirq is not used, kernel
stack usage may be much larger.
Maybe a better way would be implementing some form of NAPI in the TUN device?
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 16:58 ` Stephen Hemminger
@ 2014-01-28 17:18 ` Michael S. Tsirkin
0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-01-28 17:18 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Qin Chuanyu, jasowang, Anthony Liguori, KVM list, netdev
On Tue, Jan 28, 2014 at 08:58:34AM -0800, Stephen Hemminger wrote:
> On Tue, 28 Jan 2014 12:33:25 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Tue, Jan 28, 2014 at 06:19:02PM +0800, Qin Chuanyu wrote:
> > > On 2014/1/28 17:41, Michael S. Tsirkin wrote:
> > > >>>I think it's okay - IIUC this way we are processing xmit directly
> > > >>>instead of going through softirq.
> > > >>>Was meaning to try this - I'm glad you are looking into this.
> > > >>>
> > > >>>Could you please check latency results?
> > > >>>
> > > >>netperf UDP_RR 512
> > > >>test model: VM->host->host
> > > >>
> > > >>modified before : 11108
> > > >>modified after : 11480
> > > >>
> > > >>3% gained by this patch
> > > >>
> > > >>
> > > >Nice.
> > > >What about CPU utilization?
> > > >It's trivially easy to speed up networking by
> > > >burning up a lot of CPU so we must make sure it's
> > > >not doing that.
> > > >And I think we should see some tests with TCP as well, and
> > > >try several message sizes.
> > > >
> > > >
> > > Yes, burning more CPU is an easy way to get better performance,
> > > so I bound the vhost thread and the NIC interrupt to CPU1 while testing.
> > >
> > > Before the modification, CPU1 idle was 0%-1% during the test;
> > > after the modification, it was 2%-3%.
> > >
> > > TCP should also gain from this, but its pps is lower than UDP's, so I
> > > expect the improvement to be less obvious.
> >
> > Still need to test this doesn't regress but overall looks convincing to me.
> > Could you send a patch, accompanied by testing results for
> > throughput latency and cpu utilization for tcp and udp
> > with various message sizes?
> >
> > Thanks!
> >
>
> There are a couple of potential problems with this. The primary one is
> that you are now violating the explicit assumptions about when
> netif_receive_skb() can be called, and because of that it may break
> things all over the place.
Specifically http://patchwork.ozlabs.org/patch/52963/
mentions cls_cgroup_classify which has this code:
if (in_serving_softirq()) {
/* If there is an sk_classid we'll use that. */
if (!skb->sk)
return -1;
classid = skb->sk->sk_classid;
}
in_serving_softirq() now checks a flag, so conceivably we could set it
just like softirq does.
> *
> * netif_receive_skb() is the main receive data processing function.
> * It always succeeds. The buffer may be dropped during processing
> * for congestion control or by the protocol layers.
> *
> * This function may only be called from softirq context and interrupts
> * should be enabled.
>
> At a minimum, softirq (BH) and preempt must be disabled.
Yes.
> Another potential problem is that since a softirq is not used, kernel
> stack usage may be much larger.
tun itself is pretty modest in its stack use -
as the thread linked above says, it might not be a big issue.
> Maybe a better way would be implementing some form of NAPI in the TUN device?
>
We can't always do this.
Regular devices get skbs from the card or from RAM, so they can do this
in softirq context.
tun gets skbs from userspace memory, so it needs to run in process
context, at least sometimes.
--
MST
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 14:49 ` Eric Dumazet
@ 2014-01-29 7:12 ` Qin Chuanyu
2014-02-11 13:21 ` Qin Chuanyu
1 sibling, 0 replies; 14+ messages in thread
From: Qin Chuanyu @ 2014-01-29 7:12 UTC (permalink / raw)
To: Eric Dumazet
Cc: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
Peter Klausler
On 2014/1/28 22:49, Eric Dumazet wrote:
> On Tue, 2014-01-28 at 16:14 +0800, Qin Chuanyu wrote:
>> According to perf results, 5%-8% of CPU time is spent in softirq
>> processing caused by the netif_rx_ni call in tun_get_user.
>>
>> So I changed the call path so that the skb is transmitted more quickly:
>> from
>> tun_get_user ->
>> netif_rx_ni(skb);
>> to
>> tun_get_user ->
>> rcu_read_lock_bh();
>> netif_receive_skb(skb);
>> rcu_read_unlock_bh();
>
> No idea why you use RCU here?
In my first version I forgot to take a lock when calling
netif_receive_skb, and I hit a dead spinlock when using tcpdump:
tcpdump receives skbs in netif_receive_skb but also in dev_queue_xmit.
Then I noticed that dev_queue_xmit takes rcu_read_lock_bh before
transmitting the skb, and this lock avoids the race between softirq and
the transmit thread:
	/* Disable soft irqs for various locks below. Also
	 * stops preemption for RCU.
	 */
	rcu_read_lock_bh();
Now I transmit the skb in the vhost thread, so I did the same thing.
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 10:33 ` Michael S. Tsirkin
2014-01-28 16:58 ` Stephen Hemminger
@ 2014-01-29 7:41 ` Qin Chuanyu
2014-01-29 7:56 ` Michael S. Tsirkin
1 sibling, 1 reply; 14+ messages in thread
From: Qin Chuanyu @ 2014-01-29 7:41 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On 2014/1/28 18:33, Michael S. Tsirkin wrote:
>>> Nice.
>>> What about CPU utilization?
>>> It's trivially easy to speed up networking by
>>> burning up a lot of CPU so we must make sure it's
>>> not doing that.
>>> And I think we should see some tests with TCP as well, and
>>> try several message sizes.
>>>
>>>
>> Yes, burning more CPU is an easy way to get better performance,
>> so I bound the vhost thread and the NIC interrupt to CPU1 while testing.
>>
>> Before the modification, CPU1 idle was 0%-1% during the test;
>> after the modification, it was 2%-3%.
>>
>> TCP should also gain from this, but its pps is lower than UDP's, so I
>> expect the improvement to be less obvious.
>
> Still need to test this doesn't regress but overall looks convincing to me.
> Could you send a patch, accompanied by testing results for
> throughput latency and cpu utilization for tcp and udp
> with various message sizes?
>
> Thanks!
>
Because of the Chinese Spring Festival, the test results will be
available two weeks later.
Throughput will be tested with netperf, and latency will be tested with
qperf. Is that OK?
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-29 7:41 ` Qin Chuanyu
@ 2014-01-29 7:56 ` Michael S. Tsirkin
0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-01-29 7:56 UTC (permalink / raw)
To: Qin Chuanyu; +Cc: jasowang, Anthony Liguori, KVM list, netdev
On Wed, Jan 29, 2014 at 03:41:24PM +0800, Qin Chuanyu wrote:
> On 2014/1/28 18:33, Michael S. Tsirkin wrote:
>
> >>>Nice.
> >>>What about CPU utilization?
> >>>It's trivially easy to speed up networking by
> >>>burning up a lot of CPU so we must make sure it's
> >>>not doing that.
> >>>And I think we should see some tests with TCP as well, and
> >>>try several message sizes.
> >>>
> >>>
> >>Yes, burning more CPU is an easy way to get better performance,
> >>so I bound the vhost thread and the NIC interrupt to CPU1 while testing.
> >>
> >>Before the modification, CPU1 idle was 0%-1% during the test;
> >>after the modification, it was 2%-3%.
> >>
> >>TCP should also gain from this, but its pps is lower than UDP's, so I
> >>expect the improvement to be less obvious.
> >
> >Still need to test this doesn't regress but overall looks convincing to me.
> >Could you send a patch, accompanied by testing results for
> >throughput latency and cpu utilization for tcp and udp
> >with various message sizes?
> >
> >Thanks!
> >
> Because of the Chinese Spring Festival, the test results will be
> available two weeks later.
> Throughput will be tested with netperf, and latency will be tested with
> qperf. Is that OK?
For testing - sounds good. Run vmstat in the host to check host CPU
utilization.
Please don't forget to address all the issues raised in this thread and
in the old one Eric mentioned:
http://patchwork.ozlabs.org/patch/52963/
Either address them in code, or explain in the commit log why they no
longer apply.
--
MST
* Re: 8% performance improved by change tap interact with kernel stack
2014-01-28 14:49 ` Eric Dumazet
2014-01-29 7:12 ` Qin Chuanyu
@ 2014-02-11 13:21 ` Qin Chuanyu
1 sibling, 0 replies; 14+ messages in thread
From: Qin Chuanyu @ 2014-02-11 13:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
Peter Klausler, davem
On 2014/1/28 22:49, Eric Dumazet wrote:
> On Tue, 2014-01-28 at 16:14 +0800, Qin Chuanyu wrote:
>> According to perf results, 5%-8% of CPU time is spent in softirq
>> processing caused by the netif_rx_ni call in tun_get_user.
>>
>> So I changed the call path so that the skb is transmitted more quickly:
>> from
>> tun_get_user ->
>> netif_rx_ni(skb);
>> to
>> tun_get_user ->
>> rcu_read_lock_bh();
>> netif_receive_skb(skb);
>> rcu_read_unlock_bh();
>
> No idea why you use RCU here?
>
>>
>> The test result is as below:
>> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
>> NIC: intel 82599
>> Host OS/Guest OS:suse11sp3
>> Qemu-1.6
>> netperf udp 512(VM tx)
>> test model: VM->host->host
>>
>> modified before : 2.00Gbps 461146pps
>> modified after : 2.16Gbps 498782pps
>>
>> This change yields an 8% performance gain.
>> Is there any problem with this patch?
>
> http://patchwork.ozlabs.org/patch/52963/
The thread linked above says that, since the cgroup classifier had this
check:
	if (softirq_count() != SOFTIRQ_OFFSET)
		return -1;
we would still fail to classify the frame.
But the source in recent kernel versions has changed to:
	if (in_serving_softirq()) {
		/* If there is an sk_classid we'll use that. */
		if (!skb->sk)
			return -1;
		classid = skb->sk->sk_classid;
	}
The skb is allocated by tun_alloc_skb, so skb->sk is not NULL.
I think the problem no longer exists.
end of thread
Thread overview: 14+ messages
2014-01-28 8:14 8% performance improved by change tap interact with kernel stack Qin Chuanyu
2014-01-28 8:34 ` Michael S. Tsirkin
2014-01-28 9:14 ` Qin Chuanyu
2014-01-28 9:41 ` Michael S. Tsirkin
2014-01-28 10:19 ` Qin Chuanyu
2014-01-28 10:33 ` Michael S. Tsirkin
2014-01-28 16:58 ` Stephen Hemminger
2014-01-28 17:18 ` Michael S. Tsirkin
2014-01-29 7:41 ` Qin Chuanyu
2014-01-29 7:56 ` Michael S. Tsirkin
2014-01-28 16:56 ` Rick Jones
2014-01-28 14:49 ` Eric Dumazet
2014-01-29 7:12 ` Qin Chuanyu
2014-02-11 13:21 ` Qin Chuanyu