From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753034AbcERJVh (ORCPT );
	Wed, 18 May 2016 05:21:37 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56034 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750870AbcERJVe (ORCPT );
	Wed, 18 May 2016 05:21:34 -0400
Date: Wed, 18 May 2016 11:21:29 +0200
From: Jesper Dangaard Brouer
To: "Michael S. Tsirkin"
Cc: Jason Wang , davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, brouer@redhat.com
Subject: Re: [PATCH net-next] tuntap: introduce tx skb ring
Message-ID: <20160518112129.0472b5dc@redhat.com>
In-Reply-To: <20160518112045-mutt-send-email-mst@redhat.com>
References: <1463361421-4397-1-git-send-email-jasowang@redhat.com>
	<20160516070012-mutt-send-email-mst@redhat.com>
	<57397C2B.7000603@redhat.com>
	<20160516105434-mutt-send-email-mst@redhat.com>
	<573A761D.8080909@redhat.com>
	<20160518101631.368e3447@redhat.com>
	<20160518112045-mutt-send-email-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.31]);
	Wed, 18 May 2016 09:21:33 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 18 May 2016 11:21:59 +0300
"Michael S. Tsirkin" wrote:

> On Wed, May 18, 2016 at 10:16:31AM +0200, Jesper Dangaard Brouer wrote:
> > 
> > On Tue, 17 May 2016 09:38:37 +0800 Jason Wang wrote:
> > 
> > > >> And if tx_queue_length is not power of 2,
> > > >> we probably need modulus to calculate the capacity.
> > > > Is that really that important for speed?
> > > 
> > > Not sure, I can test.
> > 
> > In my experience, yes, adding a modulus does affect performance.
> 
> How about simple
> 	if (unlikely(++idx > size))
> 		idx = 0;

So, you are exchanging an AND-operation with a mask for a
branch-operation.
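For concreteness, the two wrap strategies side by side (just a sketch;
the names `wrap_mask`/`wrap_branch` and the `>=` comparison are mine --
the `>=` keeps the index strictly inside [0, size)):

```c
#include <assert.h>

/* Power-of-two ring: wrap by AND-ing with mask = size - 1. Branch-free. */
static unsigned int wrap_mask(unsigned int idx, unsigned int mask)
{
	return (idx + 1) & mask;
}

/* Arbitrary-size ring: wrap with a (hopefully well-predicted) branch. */
static unsigned int wrap_branch(unsigned int idx, unsigned int size)
{
	if (++idx >= size)
		idx = 0;
	return idx;
}
```

Both walk 0..size-1 and wrap back to 0; the question is only whether
the compare-and-branch ends up costing more than the AND once the
branch predictor has warmed up.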
If the CPU's branch predictor is good enough for the given code-"size"
use-case, then it could be just as fast.

I've actually played with a lot of different approaches:
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/alf_queue_helpers.h

I cannot remember the exact results.  I do remember micro benchmarking
showed good results with the advanced "unroll" approach, but IPv4
forwarding, where I know the I-cache is getting evicted, showed best
results with the simpler implementations.

> > > 
> > > Right, this sounds a good solution.
> > 
> > Good idea.
> 
> I'm not that sure - it's clearly wasting memory.

Rounding up to a power of two.  In this case I don't think the memory
waste is too high, as we are talking about max 16-byte elements.

I am concerned about memory in another way.  We need to keep these
arrays/rings small, due to data cache usage.  A 4096-slot ring queue is
bad, because e.g. 16*4096 = 65536 bytes, while a typical L1 cache is
32K-64K.  As this is a circular buffer, we walk over this memory all
the time, thus evicting the L1 cache.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer