From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1161729AbcEaDXR (ORCPT );
	Mon, 30 May 2016 23:23:17 -0400
Received: from mx1.redhat.com ([209.132.183.28]:59408 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932469AbcEaDXP (ORCPT );
	Mon, 30 May 2016 23:23:15 -0400
Subject: Re: [PATCH V2 2/2] vhost_net: conditionally enable tx polling
To: "Michael S. Tsirkin"
References: <1464590874-39539-1-git-send-email-jasowang@redhat.com>
 <1464590874-39539-3-git-send-email-jasowang@redhat.com>
 <20160530155521.GA5427@redhat.com>
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
From: Jason Wang
Message-ID: <574D039E.5020101@redhat.com>
Date: Tue, 31 May 2016 11:23:10 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.8.0
MIME-Version: 1.0
In-Reply-To: <20160530155521.GA5427@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.27]); Tue, 31 May 2016 03:23:14 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/05/30 23:55, Michael S. Tsirkin wrote:
> On Mon, May 30, 2016 at 02:47:54AM -0400, Jason Wang wrote:
>> We always poll tx for socket, this is sub optimal since:
>>
>> - it will be only used when we exceed the sndbuf of the socket.
>> - since we use two independent polls for tx and vq, this will slightly
>>   increase the waitqueue traversing time and more important, vhost
>>   could not benefit from commit
>>   9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 ("net-tun: restructure
>>   tun_do_read for better sleep/wakeup efficiency") even if we've
>>   stopped rx polling during handle_rx since tx poll were still left in
>>   the waitqueue.
> Why is this an issue?
> sock_def_write_space only wakes up when queue is half empty,
> not on each packet.
> 	if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
>
> I suspect the issue is with your previous patch,
> it now pokes at the spinlock on data path
> where it used not to.
>
> Is that right?

The problem is not tx wakeup but rx wakeup. Patch 1 removes rx polling, 
but tx polling is still left in the waitqueue. So in 
sock_def_readable(), skwq_has_sleeper() returns true and we still need 
to traverse the waitqueue and touch its spinlock. With this patch, tx 
polling stays disabled unless the tx load is heavy, so 
sock_def_readable() can return very quickly.

>
>
>> Fix this by conditionally enable tx polling only when -EAGAIN were
>> met.
>>
>> Test shows about 8% improvement on guest rx pps.
>>
>> Before: ~1350000
>> After: ~1460000
>>
>> Signed-off-by: Jason Wang
>> ---
>>   drivers/vhost/net.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index e91603b..5a05fa0 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -378,6 +378,7 @@ static void handle_tx(struct vhost_net *net)
>>   		goto out;
>>   
>>   	vhost_disable_notify(&net->dev, vq);
>> +	vhost_net_disable_vq(net, vq);
>>   
>>   	hdr_size = nvq->vhost_hlen;
>>   	zcopy = nvq->ubufs;
>> @@ -459,6 +460,8 @@ static void handle_tx(struct vhost_net *net)
>>   				% UIO_MAXIOV;
>>   			}
>>   			vhost_discard_vq_desc(vq, 1);
>> +			if (err == -EAGAIN)
>> +				vhost_net_enable_vq(net, vq);
>>   			break;
>>   		}
>>   		if (err != len)
>> -- 
>> 1.8.3.1
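[Editor's note: a toy model of the wakeup condition Michael quotes above, not kernel code. `sock_def_write_space` only wakes sleeping writers once at least half of the send buffer has drained, rather than on every transmitted packet; the helper name below is illustrative.]

```c
#include <stdbool.h>

/* Models the condition quoted from sock_def_write_space():
 *     if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
 * i.e. writers are woken only when outstanding write memory has
 * dropped to half the send buffer or less. */
static bool write_space_wakes(unsigned int wmem_alloc, unsigned int sndbuf)
{
	return (wmem_alloc << 1) <= sndbuf;
}
```

This is why MST argues tx completions alone should not cause frequent wakeups: with the buffer mostly full, the condition stays false and no waitqueue traversal happens on that path.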
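[Editor's note: a simplified sketch of the control flow the patch introduces, under the assumption that `vhost_net_disable_vq()`/`vhost_net_enable_vq()` remove and re-add the tx poll entry on the socket's waitqueue; all names below are illustrative, not the actual vhost implementation.]

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the vq's tx polling state. */
struct model_vq {
	bool tx_poll_enabled;
};

/* Models handle_tx() after the patch: polling is switched off on entry
 * (vhost_net_disable_vq) and re-armed only when the sendmsg path hits
 * -EAGAIN, i.e. the socket sndbuf is exhausted (vhost_net_enable_vq). */
static void model_handle_tx(struct model_vq *vq, int sendmsg_err)
{
	vq->tx_poll_enabled = false;		/* vhost_net_disable_vq() */
	if (sendmsg_err == -EAGAIN)
		vq->tx_poll_enabled = true;	/* vhost_net_enable_vq() */
}
```

In the common case (no -EAGAIN), the tx entry never sits on the waitqueue, so `sock_def_readable()` on the rx side sees no sleeper and skips the spinlock, which is where the ~8% guest rx pps gain comes from.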