From: Eric Dumazet
Subject: Re: [PATCH 2/2 v5] xps: Transmit Packet Steering
Date: Sun, 07 Nov 2010 21:40:55 +0100
Message-ID: <1289162455.2478.295.camel@edumazet-laptop>
To: Tom Herbert
Cc: davem@davemloft.net, netdev@vger.kernel.org

On Sunday, 7 November 2010 at 11:52 -0800, Tom Herbert wrote:
> This patch implements transmit packet steering (XPS) for multiqueue
> devices.  XPS selects a transmit queue during packet transmission based
> on configuration.  This is done by mapping the CPU transmitting the
> packet to a queue.  This is the transmit side analogue to RPS-- where
> RPS is selecting a CPU based on receive queue, XPS selects a queue
> based on the CPU (previously there was an XPS patch from Eric
> Dumazet, but that might more appropriately be called transmit completion
> steering).
>
> Each transmit queue can be associated with a number of CPUs which will
> use the queue to send packets.  This is configured as a CPU mask on a
> per queue basis in:
>
> /sys/class/net/eth/queues/tx-/xps_cpus
>
> The mappings are stored per device in an inverted data structure that
> maps CPUs to queues.  In the netdevice structure this is an array of
> num_possible_cpu structures where each structure holds an array of
> queue_indexes for queues which that CPU can use.
>
> The benefits of XPS are improved locality in the per queue data
> structures.  Also, transmit completions are more likely to be done
> nearer to the sending thread, so this should promote locality back
> to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
> cache hierarchy, application load, and other factors.  XPS would
> nominally be configured so that a queue would only be shared by CPUs
> which are sharing a cache, the degenerative configuration would be that
> each CPU has its own queue.
>
> Below are some benchmark results which show the potential benefit of
> this patch.  The netperf test has 500 instances of netperf TCP_RR test
> with 1 byte req. and resp.
>
> bnx2x on 16 core AMD
>    XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
>    No XPS (16 queues)                   996K at 100% CPU
>
> Signed-off-by: Tom Herbert
> ---
>  include/linux/netdevice.h |   32 ++++
>  net/core/dev.c            |   54 +++++++-
>  net/core/net-sysfs.c      |  367 ++++++++++++++++++++++++++++++++++++++++++++-
>  net/core/net-sysfs.h      |    3 +
>  4 files changed, 450 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 072652d..b2ea7c0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -503,6 +503,13 @@ struct netdev_queue {
>       struct Qdisc            *qdisc;
>       unsigned long           state;
>       struct Qdisc            *qdisc_sleeping;
> +#ifdef CONFIG_RPS
> +     struct netdev_queue     *first;
> +     atomic_t                count;
> +     struct xps_dev_maps     *xps_maps;

Tom, I still don't understand why *xps_maps is here and not in
net_device?

I am asking because netdev_get_xps_maps(dev) might be slowed down,
since the state of queue 0 might change often (__QUEUE_STATE_XOFF).

This means _tx[0] becomes a very hot cache line, one that
get_xps_queue() needs in order to access any queue.

Other than that, your patch seems fine (not tested yet).

Thanks
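
To make the cache line concern concrete, here is a minimal userspace sketch. It is not the kernel layout: the struct names with a _sketch suffix, the 64-byte CACHE_LINE value, and both helper functions are assumptions made up for illustration, and it assumes the queue structure starts on a cache line boundary. It only shows that a pointer placed right after `state` lands in the same line as `state`, so every write to the queue state invalidates the line that readers of the maps pointer need.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size, not taken from the patch */

/* Hypothetical stand-in for the per-device CPU-to-queue maps. */
struct xps_dev_maps_sketch { int unused; };

/* Simplified queue layout following the quoted hunk: the maps pointer
 * sits in the queue structure and is read through queue 0. */
struct queue_sketch {
	void *qdisc;
	unsigned long state;                  /* written often (XOFF toggling) */
	void *qdisc_sleeping;
	struct queue_sketch *first;
	int count;
	struct xps_dev_maps_sketch *xps_maps; /* read on every transmit */
};

/* Alternative placement: keep the pointer in the device structure,
 * away from the frequently written queue state. */
struct dev_sketch {
	struct queue_sketch *tx;
	struct xps_dev_maps_sketch *xps_maps;
};

/* Returns 1 if two byte offsets fall in the same assumed cache line. */
static int same_cache_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}

/* Returns 1 if xps_maps shares a line with state in queue_sketch. */
static int maps_shares_line_with_state(void)
{
	return same_cache_line(offsetof(struct queue_sketch, state),
			       offsetof(struct queue_sketch, xps_maps));
}
```

With the pointer in dev_sketch instead, a lookup would not have to touch the line holding the state of queue 0 just to find the maps.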