From: Peter P Waskiewicz Jr
Subject: Re: [PATCH] irq: Add node_affinity CPU masks for smarter irqbalance hints
Date: Tue, 24 Nov 2009 10:33:21 -0800
To: Eric Dumazet
Cc: David Miller, peterz@infradead.org, arjan@linux.intel.com, yong.zhang0@gmail.com, linux-kernel@vger.kernel.org, arjan@linux.jf.intel.com, netdev@vger.kernel.org

On Tue, 2009-11-24 at 10:26 -0800, Eric Dumazet wrote:
> Peter P Waskiewicz Jr a écrit :
> > > That's the kind of thing PJ is trying to make available.
> >
> > Yes, that's exactly what I'm trying to do. Even further, we want to
> > allocate the ring SW struct itself and descriptor structures on other
> > NUMA nodes, and make sure the interrupt lines up with those
> > allocations.
>
> Say you allocate ring buffers on the NUMA node of the CPU handling the
> interrupt on a particular queue.
>
> If irqbalance or an admin changes /proc/irq/{number}/smp_affinity, do
> you want to realloc the ring buffer to another NUMA node?

That's why I'm trying to add the node_affinity mechanism, which
irqbalance can use to prevent the interrupt from being moved to another
node.

> It seems complex to me; maybe the optimal thing would be to use a NUMA
> policy to spread vmalloc() allocations across all nodes to get good
> bandwidth...

That's exactly what we're doing in our 10GbE driver right now (it isn't
pushed upstream yet; we're still finalizing our testing). We spread
across all NUMA nodes in a semi-intelligent fashion when allocating our
rings and buffers. The last piece is ensuring that the interrupts tied
to the various queues route to the NUMA nodes those CPUs belong to.
irqbalance needs some kind of hint to make sure it does the right
thing, which today it does not.

I don't see how this is complex, though. The driver loads, allocates
across the NUMA nodes for optimal throughput, then writes CPU masks for
the NUMA nodes each interrupt belongs to. irqbalance comes along, reads
the new mask "hint," and then balances that interrupt within that
hinted mask (rough sketch below my sig).

Cheers,
-PJ
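
P.S. To make the flow concrete, here's a rough driver-side sketch.
This is illustrative only, not the actual patch:
irq_set_node_affinity() below is just my shorthand for the proposed
interface (final name/signature TBD), while kzalloc_node(),
vmalloc_node() and cpumask_of_node() are the stock NUMA-aware kernel
APIs.

    /* Illustrative sketch only -- not the actual patch. */
    #include <linux/errno.h>
    #include <linux/interrupt.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/topology.h>
    #include <linux/vmalloc.h>

    /* Hypothetical helper standing in for the proposed node_affinity
     * interface; not an existing kernel export. */
    extern int irq_set_node_affinity(unsigned int irq,
                                     const struct cpumask *mask);

    struct my_ring {
            void *desc;             /* descriptor ring memory */
            unsigned int irq;       /* MSI-X vector for this queue */
            int node;               /* NUMA node backing this ring */
    };

    static int my_setup_queue(struct my_ring **ringp, unsigned int irq,
                              int node, size_t desc_len)
    {
            struct my_ring *ring;

            /* Put the SW ring struct itself on the chosen node... */
            ring = kzalloc_node(sizeof(*ring), GFP_KERNEL, node);
            if (!ring)
                    return -ENOMEM;

            /* ...and the descriptor memory as well. */
            ring->desc = vmalloc_node(desc_len, node);
            if (!ring->desc) {
                    kfree(ring);
                    return -ENOMEM;
            }
            memset(ring->desc, 0, desc_len);

            ring->irq = irq;
            ring->node = node;

            /*
             * The hint: publish the CPU mask of the node that owns
             * this ring, so irqbalance keeps the interrupt within it.
             */
            irq_set_node_affinity(irq, cpumask_of_node(node));

            *ringp = ring;
            return 0;
    }

The point is simply that the allocation node and the hint mask come
from the same place, so the memory and the interrupt can't drift apart.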