From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Wang
Subject: Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
Date: Fri, 22 Aug 2014 10:53:31 +0800
Message-ID: <53F6B0AB.2060700@redhat.com>
References: <1408608310-13579-1-git-send-email-jasowang@redhat.com> <1408608310-13579-2-git-send-email-jasowang@redhat.com> <20140821081140.GA29116@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: davem@davemloft.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
To: "Michael S. Tsirkin"
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:15862 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753562AbaHVCxf (ORCPT ); Thu, 21 Aug 2014 22:53:35 -0400
In-Reply-To: <20140821081140.GA29116@redhat.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 08/21/2014 04:11 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 21, 2014 at 04:05:10PM +0800, Jason Wang wrote:
>> > Rx busy loop does not scale well in the case when several parallel
>> > sessions is active. This is because we keep looping even if there's
>> > another process is runnable. For example, if that process is about to
>> > send packet, keep busy polling in current process will brings extra
>> > delay and damage the performance.
>> >
>> > This patch solves this issue by exiting the busy loop when there's
>> > another process is runnable in current cpu. Simple test that pin two
>> > netperf sessions in the same cpu in receiving side shows obvious
>> > improvement:
>> >
>> > Before:
>> > netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
>> > netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
>> > 16384 87380 1 1 10.00 15513.74
>> > 16384 87380
>> > 16384 87380 1 1 10.00 15092.78
>> > 16384 87380
>> >
>> > After:
>> > netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
>> > netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
>> > 16384 87380 1 1 10.00 23334.53
>> > 16384 87380
>> > 16384 87380 1 1 10.00 23327.58
>> > 16384 87380
>> >
>> > Benchmark was done through two 8 cores Xeon machine back to back connected
>> > with mlx4 through netperf TCP_RR test (busy_read were set to 50):
>> >
>> > sessions/bytes/before/after/+improvement%/busy_read=0/
>> > 1/1/30062.10/30034.72/+0%/20228.96/
>> > 16/1/214719.83/307669.01/+43%/268997.71/
>> > 32/1/231252.81/345845.16/+49%/336157.442/
>> > 64/512/212467.39/373464.93/+75%/397449.375/
>> >
>> > Signed-off-by: Jason Wang
>> > ---
>> >  include/net/busy_poll.h | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
>> > index 1d67fb6..8a33fb2 100644
>> > --- a/include/net/busy_poll.h
>> > +++ b/include/net/busy_poll.h
>> > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>> >  		cpu_relax();
>> >
>> >  	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
>> > -		 !need_resched() && !busy_loop_timeout(end_time));
>> > +		 !need_resched() && !busy_loop_timeout(end_time) &&
>> > +		 nr_running_this_cpu() < 2);
> <= 1 would be a bit clearer? We want at most one process here.
>
Ok, will change it in next version.
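
For reference, the exit condition with the suggested "<= 1" comparison could look roughly like the userspace sketch below. This is only an illustration, not the next version of the patch: struct poll_state and keep_busy_polling() are hypothetical names, and each field merely stands in for the kernel call named in its comment.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state sk_busy_loop() consults. */
struct poll_state {
	bool nonblock;           /* caller asked for a non-blocking read */
	bool queue_empty;        /* stands in for skb_queue_empty(&sk->sk_receive_queue) */
	bool need_resched;       /* stands in for need_resched() */
	bool timed_out;          /* stands in for busy_loop_timeout(end_time) */
	unsigned int nr_running; /* stands in for nr_running_this_cpu() */
};

/*
 * Returns true while the busy loop should keep spinning.
 * The last term is the reviewed change: keep polling only while
 * at most one task (the poller itself) is runnable on this cpu.
 */
static bool keep_busy_polling(const struct poll_state *s)
{
	return !s->nonblock && s->queue_empty &&
	       !s->need_resched && !s->timed_out &&
	       s->nr_running <= 1;
}
```

With nr_running == 1 and an empty queue the function returns true, so the poller keeps spinning. As soon as a second task becomes runnable on the same cpu it returns false and the loop exits, which is the behaviour the commit message describes.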