From mboxrd@z Thu Jan 1 00:00:00 1970
From: Greg Lindahl
Subject: Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue
Date: Wed, 1 Sep 2010 23:41:36 -0700
Message-ID: <20100902064136.GA8633@bx9.net>
References: <1283356463.2556.351.camel@edumazet-laptop>
 <20100901.183251.106803238.davem@davemloft.net>
 <20100901185627.239ad165@nehalam>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: David Miller, therbert@google.com, eric.dumazet@gmail.com,
 netdev@vger.kernel.org
To: Stephen Hemminger
Return-path:
Received: from rc.bx9.net ([64.13.160.15]:36774 "EHLO rc.bx9.net"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1753160Ab0IBHOG (ORCPT ); Thu, 2 Sep 2010 03:14:06 -0400
Content-Disposition: inline
In-Reply-To: <20100901185627.239ad165@nehalam>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, Sep 01, 2010 at 06:56:27PM -0700, Stephen Hemminger wrote:
> Just to be contrarian :-) This same idea had started before when IBM
> proposed a user-space NUMA API. It never got any traction, the concept
> of "lets make the applications NUMA aware" never got accepted because
> it is so hard to do right and fragile that it was the wrong idea
> to start with. The only people that can manage it are the engineers
> tweeking a one off database benchmark.

As a non-database user-space example, there are many applications
which know about the typical 'first touch' locality policy for pages
and use it to be NUMA-aware. Just about every OpenMP program ever
written does that; it's even fairly portable among OSes.

A second user-level example is MPI implementations such as OpenMPI.
Those guys run 1 process per core and they don't need to move around,
so getting each process locked to a core and all its pages in the
right place is a nice win without the MPI programmer doing anything.

For kernel (but non-Ethernet) networking examples, HPC interconnects
typically go out of their way to ensure locality of kernel pages
related to a given core's workload. Examples include Myrinet's
OpenMX+MPI and the InfiniPath InfiniBand adapter, whatever QLogic
renamed it to this week (TrueScale, I suppose). How can you get
~1 microsecond messages if you've got a buffer in the wrong place?
Or achieve extremely high messaging rates when you're waiting for
remote memory all the time?

> I would rather see a "good enough" policy in the kernel that works
> for everything from a single-core embedded system to a 100 core
> server environment.

I'd like a pony. Yes, it's challenging to directly apply the above
examples to Ethernet networking, but there's a pony in there
somewhere.

-- greg
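
For concreteness, a minimal sketch of the first-touch pattern
mentioned above. This is an illustration, not code from this thread
or from any particular OpenMP application; the array, its size, and
the loop bodies are made up. The point is only that each thread
initializes the same pages it later computes on, so the kernel's
default first-touch policy leaves those pages on that thread's local
NUMA node.

/* First-touch sketch (illustrative only): each thread writes the
 * pages it will later read, using the same static schedule for both
 * phases, so the default first-touch policy allocates those pages on
 * the touching thread's local node.
 * Build with: cc -O2 -fopenmp first_touch.c */
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)            /* made-up problem size */

int main(void)
{
    double *a = malloc(N * sizeof(double));

    if (!a)
        return 1;

    /* First touch: pages land on the node of the thread that writes them. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;

    /* Compute phase with the same schedule: accesses stay node-local. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = a[i] * 2.0 + 1.0;

    printf("a[0] = %f\n", a[0]);
    free(a);
    return 0;
}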
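
And a sketch of the process-pinning half of the MPI example: again my
own illustration of the basic mechanism, not OpenMPI's actual code
(real launchers use more elaborate placement logic). The core number
taken from argv is a stand-in for whatever the launcher decides.

/* Pinning sketch (illustrative only): lock the calling process to one
 * core with sched_setaffinity(), the basic mechanism behind "one rank
 * per core, and it never moves". Pages the process touches afterwards
 * come from that core's node under the first-touch policy. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int core = (argc > 1) ? atoi(argv[1]) : 0;  /* chosen by a launcher */
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pinned to core %d\n", core);
    /* ... the rank's real work, allocating and touching memory, goes here ... */
    return 0;
}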