From: Jeff Garzik
Subject: Re: [PATCH] NET: Multiqueue network device support.
Date: Wed, 6 Jun 2007 20:47:12 -0400
To: David Miller
Cc: hadi@cyberus.ca, kaber@trash.net, peter.p.waskiewicz.jr@intel.com, netdev@vger.kernel.org, auke-jan.h.kok@intel.com
Message-ID: <20070607004712.GE3304@havoc.gtf.org>
In-Reply-To: <20070606.165215.38711917.davem@davemloft.net>

On Wed, Jun 06, 2007 at 04:52:15PM -0700, David Miller wrote:
> For the locking it makes a ton of sense.
>
> If you have sendmsg() calls going on N cpus, would you rather
> they:
>
> 1) All queue up to the single netdev->tx_lock
>
> or
>
> 2) All take local per-hw-queue locks
>
> to transmit the data they are sending?
>
> I thought this was obvious... guess not :-)

Agreed ++

For my part, I definitely want to see parallel Tx as well as parallel
Rx. It's the only thing that makes sense for modern multi-core CPUs.

Two warning flags are raised in my brain, though:

1) You need (a) well-designed hardware _and_ (b) a smart driver writer
to avoid bottlenecking on internal driver locks. As you can see, we
have both (a) and (b) for tg3 ;-) But it's up in the air whether a
multi-TX-queue scheme can be sanely locked internally on other
hardware. At the moment we have to hope Intel gets it right in their
driver...

2) I fear that the getting-it-into-the-Tx-queue part will take some
thought to make this happen, too.
Just like the SMP/SMT/multi-core scheduler juggles various resources,
surely we will want some smarts so that sockets are not bouncing
wildly across CPUs, absent other factors outside our control.
Otherwise you will negate a lot of the value of the nifty
multi-TX-lock driver API by bouncing data across CPUs on each
transmit anyway. IOW, you will have to sanely fill each of the TX
queues.

	Jeff