From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rick Jones
Subject: Re: [RFC] New driver API to speed up small packets xmits
Date: Thu, 10 May 2007 15:09:56 -0700
Message-ID: <46439834.4090406@hp.com>
References: <46439189.5090907@cosmosbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: David Stevens , Evgeniy Polyakov , Krishna Kumar2 , netdev@vger.kernel.org, netdev-owner@vger.kernel.org
To: Eric Dumazet
Return-path:
Received: from palrel10.hp.com ([156.153.255.245]:34246 "EHLO palrel10.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752004AbXEJWKB (ORCPT ); Thu, 10 May 2007 18:10:01 -0400
In-Reply-To: <46439189.5090907@cosmosbay.com>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Eric Dumazet wrote:
> David Stevens wrote:
>
>> The word "small" is coming up a lot in this discussion, and
>> I think packet size really has nothing to do with it. Multiple
>> streams generating packets of any size would benefit; the
>> key ingredient is a queue length greater than 1.
>>
>> I think the intent is to remove queue lock cycles by taking
>> the whole list (at least up to the count of free ring buffers)
>> when the queue is greater than one packet, thus effectively
>> removing the lock expense for n-1 packets.
>>
>
> Yes, but on modern cpus, locked operations are basically free once the
> CPU already has the cache line in exclusive access in its L1 cache.

But will it here? Any of the CPUs may be trying to add things to the qdisc,
but only one CPU is pulling from it, right? Even if the "pulling from it" is
happening in a loop, there can be scores or more other cores trying to add
things to the queue, which would cause that cache line to migrate.

> I am not sure adding yet another driver API will help very much.
> It will for sure adds some bugs and pain.

That could very well be.

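For what it is worth, here is a rough userspace sketch of the "take the whole list under one lock cycle" idea David describes. It is not kernel code and not the proposed driver API: struct pkt, struct queue, and the function names are made up for illustration, a C11 atomic_flag spinlock stands in for the qdisc lock, and lock_cycles exists only to make the lock cost visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: "pkt" for an skb, "queue" for a qdisc. */
struct pkt {
    struct pkt *next;
    int id;
};

struct queue {
    atomic_flag lock;            /* stands in for the queue lock */
    struct pkt *head, *tail;
    unsigned long lock_cycles;   /* number of times the lock was taken */
};

static void queue_init(struct queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
    q->lock_cycles = 0;
}

static void queue_lock(struct queue *q)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
    q->lock_cycles++;
}

static void queue_unlock(struct queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer side: one lock cycle per packet added. */
static void enqueue(struct queue *q, struct pkt *p)
{
    p->next = NULL;
    queue_lock(q);
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    queue_unlock(q);
}

/*
 * Consumer side: detach up to max packets (think "free ring buffers")
 * in a single lock cycle. Returns the head of the detached list and
 * stores its length in *n.
 */
static struct pkt *dequeue_batch(struct queue *q, size_t max, size_t *n)
{
    struct pkt *first = NULL, *last;
    size_t count = 0;

    assert(n != NULL);
    queue_lock(q);
    if (q->head && max > 0) {
        first = last = q->head;
        count = 1;
        while (count < max && last->next) {
            last = last->next;
            count++;
        }
        q->head = last->next;
        if (!q->head)
            q->tail = NULL;
        last->next = NULL;       /* terminate the detached list */
    }
    queue_unlock(q);
    *n = count;
    return first;
}
```

Pulling n packets this way costs one lock cycle on the consumer side instead of n. It says nothing about whether the lock's cache line stays put while other CPUs keep calling enqueue(), which is the question I am asking above.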
> A less expensive (and less prone to bugs) optimization would be to
> prefetch one cache line for next qdisc skb, as a cache line miss is far
> more expensive than a locked operation (if lock already in L1 cache of
> course)

Might they not build one on top of the other?

rick jones
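
P.S. A small sketch of what I mean by building one on the other: once a batch of packets has been detached from the queue, the loop that walks it can prefetch the next packet while working on the current one. This is a userspace illustration only. struct pkt, xmit_one() and xmit_batch() are hypothetical names, and __builtin_prefetch is the GCC/Clang builtin, used here in place of whatever prefetch helper a driver would really call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb on an already-detached list. */
struct pkt {
    struct pkt *next;
    unsigned int len;
};

/* Hypothetical per-packet work, standing in for the driver's xmit path. */
static unsigned long xmit_one(const struct pkt *p)
{
    assert(p != NULL);
    return p->len;
}

/*
 * Walk a detached batch, asking the CPU to start fetching the next
 * packet's cache line before the current packet is processed.
 * Returns the total number of bytes "sent".
 */
static unsigned long xmit_batch(struct pkt *p)
{
    unsigned long bytes = 0;

    while (p) {
        struct pkt *next = p->next;

        if (next)
            __builtin_prefetch(next, 0, 3);  /* read access, keep in cache */
        bytes += xmit_one(p);
        p = next;
    }
    return bytes;
}
```

The prefetch is only a hint, so the result is the same with or without it; whether it actually hides the miss depends on how much work xmit_one() does per packet.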