Subject: Re: [PATCH 1/3] tuntap: rx batching
From: John Fastabend
To: "Michael S. Tsirkin", Jason Wang
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Thu, 10 Nov 2016 20:17:24 -0800

On 16-11-10 07:31 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2016 at 10:07:44AM +0800, Jason Wang wrote:
>>
>> On 2016-11-10 00:38, Michael S. Tsirkin wrote:
>>> On Wed, Nov 09, 2016 at 03:38:31PM +0800, Jason Wang wrote:
>>>> The backlog was used for tuntap rx, but it can only process one
>>>> packet at a time since it is scheduled synchronously from sendmsg()
>>>> in process context. This leads to poor cache utilization, so this
>>>> patch does some batching before calling into rx NAPI. This is done
>>>> through:
>>>>
>>>> - accept MSG_MORE as a hint from the sendmsg() caller; if it is
>>>>   set, batch the packet temporarily on a linked list and submit
>>>>   them all once MSG_MORE is cleared.
>>>> - implement a tuntap-specific NAPI handler for processing such
>>>>   batches. (This could be done by extending the backlog to support
>>>>   skb lists, but a tun-specific handler looks cleaner and is easier
>>>>   to extend in the future.)
>>>>
>>>> Signed-off-by: Jason Wang
>>> So why do we need an extra queue?
>>
>> The idea was borrowed from the backlog, to allow some kind of bulking
>> and to avoid taking the spinlock on each dequeue.
>>
>>> This is not what hardware devices do.
>>> How about adding the packet to the queue unconditionally, deferring
>>> the signalling until we get a sendmsg() without MSG_MORE?
>>
>> Then you need to touch the spinlock when dequeuing each packet.
>

Random thought: I have a cmpxchg ring I am using for the qdisc work
that could possibly replace the spinlock implementation. I haven't
figured out the resizing API yet because I did not need it, but I
assume it could help here and let you dequeue multiple skbs in one
operation. I can post the latest version if useful; an older version
is on patchwork as well.

.John

> It runs on the same CPU, right? Otherwise we should use skb_array...
>
>>>
>>>> ---
>>>>  drivers/net/tun.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 65 insertions(+), 6 deletions(-)
>>>>
>>
>> [...]
>>
>>>>  	rxhash = skb_get_hash(skb);
>>>> -	netif_rx_ni(skb);
>>>> +	skb_queue_tail(&tfile->socket.sk->sk_write_queue, skb);
>>>> +
>>>> +	if (!more) {
>>>> +		local_bh_disable();
>>>> +		napi_schedule(&tfile->napi);
>>>> +		local_bh_enable();
>>> Why do we need to disable bh here? I thought napi_schedule can
>>> be called from any context.
>>
>> Yes, it's unnecessary. Will remove.
>>
>> Thanks
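
For anyone following along without the full patch in front of them,
here is a minimal sketch of the batching scheme being discussed. The
struct is reduced and tun_rx_batched()/tun_napi_poll() are stand-in
names for illustration, not the patch's actual code; per the
conclusion above, the local_bh_disable()/enable() pair around
napi_schedule() is dropped.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/sock.h>

/* Reduced stand-in for the real struct tun_file in drivers/net/tun.c. */
struct tun_file {
	struct socket socket;
	struct napi_struct napi;
};

/* Producer side, called from the sendmsg() path in process context. */
static void tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb,
			   bool more)
{
	/* skb_queue_tail() takes the queue spinlock internally, so this
	 * is safe from process context.
	 */
	skb_queue_tail(&tfile->socket.sk->sk_write_queue, skb);

	/* MSG_MORE cleared: no more packets pending from the caller, so
	 * kick NAPI to flush the whole batch in one softirq pass.
	 */
	if (!more)
		napi_schedule(&tfile->napi);
}

/* Consumer side: the tun-specific NAPI handler. */
static int tun_napi_poll(struct napi_struct *napi, int budget)
{
	struct tun_file *tfile = container_of(napi, struct tun_file, napi);
	struct sk_buff_head *q = &tfile->socket.sk->sk_write_queue;
	struct sk_buff *skb;
	int work = 0;

	/* skb_dequeue() grabs the queue spinlock per packet -- the exact
	 * per-dequeue cost being debated above.
	 */
	while (work < budget && (skb = skb_dequeue(q)) != NULL) {
		netif_receive_skb(skb);
		work++;
	}

	if (work < budget)
		napi_complete(napi);

	return work;
}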
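
And to make the cmpxchg ring idea concrete, here is a rough sketch of
one plausible shape for it. This is illustrative only, not the actual
qdisc ring: a single producer publishes with a release store, and
consumers race on a cmpxchg of the consumer index, so a batch of skbs
is claimed with one atomic operation instead of a per-packet
lock/unlock.

#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/skbuff.h>

struct skb_cring {
	unsigned long	head;	/* consumer index */
	unsigned long	tail;	/* producer index */
	unsigned int	size;	/* number of slots, power of two */
	struct sk_buff	**ring;
};

/* Single producer: fill the slot, then publish it with a release store. */
static bool skb_cring_produce(struct skb_cring *r, struct sk_buff *skb)
{
	unsigned long head = READ_ONCE(r->head);
	unsigned long tail = r->tail;	/* only the producer writes tail */

	if (tail - head >= r->size)
		return false;		/* full */

	r->ring[tail & (r->size - 1)] = skb;
	smp_store_release(&r->tail, tail + 1);
	return true;
}

/* Claim up to n entries with a single cmpxchg. Slots are copied out
 * before head is advanced, so the producer cannot reuse them while we
 * read; if another consumer wins the cmpxchg race, the copies are
 * simply discarded and we retry.
 */
static int skb_cring_consume_batch(struct skb_cring *r,
				   struct sk_buff **skbs, int n)
{
	unsigned long head, tail, avail, i;

	do {
		head = READ_ONCE(r->head);
		tail = smp_load_acquire(&r->tail);
		avail = tail - head;
		if (!avail)
			return 0;
		if ((unsigned long)n > avail)
			n = avail;
		for (i = 0; i < (unsigned long)n; i++)
			skbs[i] = r->ring[(head + i) & (r->size - 1)];
	} while (cmpxchg(&r->head, head, head + n) != head);

	return n;
}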
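
Finally, for readers who haven't seen it: the skb_array Michael
mentions is the ptr_ring-based FIFO already in the tree at this point
(include/linux/skb_array.h). It doesn't remove locking entirely, but
the producer and consumer take separate locks on separate cache
lines, so cross-CPU use doesn't contend the way a shared sk_buff_head
spinlock does. A minimal usage sketch follows; the batch_* wrappers
and the tun hookup are hypothetical.

#include <linux/netdevice.h>
#include <linux/skb_array.h>
#include <linux/skbuff.h>

static struct skb_array batch;

static int batch_setup(void)
{
	/* 256-entry ring; returns an error on allocation failure. */
	return skb_array_init(&batch, 256, GFP_KERNEL);
}

/* sendmsg() path: produce under the producer lock only. */
static void batch_produce(struct sk_buff *skb)
{
	if (skb_array_produce(&batch, skb))
		kfree_skb(skb);	/* ring full */
}

/* NAPI poll path: consume under the consumer lock only. */
static int batch_consume(int budget)
{
	struct sk_buff *skb;
	int work = 0;

	while (work < budget && (skb = skb_array_consume(&batch)) != NULL) {
		netif_receive_skb(skb);
		work++;
	}
	return work;
}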