From: Eric Dumazet <eric.dumazet@gmail.com>
To: Tom Herbert <therbert@google.com>
Cc: David Miller <davem@davemloft.net>,
Linux Netdev List <netdev@vger.kernel.org>
Subject: Re: [PATCH v4 1/1] rps: core implementation
Date: Wed, 06 Jan 2010 06:54:20 +0100 [thread overview]
Message-ID: <4B44258C.2050302@gmail.com> (raw)
In-Reply-To: <65634d661001051732qd64e79dt37e6247f8b0dc863@mail.gmail.com>
Le 06/01/2010 02:32, Tom Herbert a écrit :
> Here's an RPS updated patch with some minor fixes, sorry for the long
> turnaround. This addresses most of the comments for last patch:
>
> - Moved shared fields in softnet_data into a separate cacheline
> - Make hashrnd __read_mostly
> - Removed extra "hash" variable in get_rps_cpu
> - Allow use of RPS from netif_rx (we have a use case where this is needed)
> - In net_rps_action clear each cpu in the mask before calling the
> function, I believe this prevents race condition
Hmm, I cant see a race condition here, could you elaborate on this ?
mask is local to this cpu, and we cannot re-enter a function that could
change some bits under us (we are called from net_rx_action())
If you believe there is a race condition, I suspect race is still there.
>
> I still don't have a better way to do a per-napi RPS mask than using a
> single variable in sysfs under the device. It still seems like we'd
> want a file or even directory for each napi instance, but that looks
> like some major changes.
>
> Also, we found that a few drivers are calling napi_gro_receive in lieu
> of netif_receive_skb (tg3, e1000e for example). The patch does not
> support that, so there is no benefit for them with RPS :-(. The GRO
> path looks pretty intertwined with the receive although way through
> TCP so I'm not sure it will be easy to retrofit. We changed e1000e to
> call netif_receive_skb and top netperf RR throughput went for 85K tps
> to 241K tps, and for our workloads at least this is may be the bigger
> win.
Did you tested with VLANS too ? (with/without hardware support)
>
> Tom
Excellent, but I suspect big win comes from using few NICS.
(number_of(NICS) < num_online_cpus)
(in the reverse case, possible contention on queue->csd)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 97873e3..7107b13 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -676,6 +676,29 @@ struct net_device_ops {
> };
>
> /*
> + * Structure for Receive Packet Steering. Length of map and array of CPU ID's.
> + */
> +struct rps_map {
> + int len;
> + u16 map[0];
> +};
> +
> +/*
> + * Structure that contains the rps maps for various NAPI instances of a device.
> + */
> +struct dev_rps_maps {
> + int num_maps;
> + struct rcu_head rcu;
> + struct rps_map maps[0];
> +};
I feel uneasy about this, because of kmalloc() max size and rounding to power of two effects.
It also uses a single node in NUMA setups.
> +
> +/* Bound number of CPUs that can be in an rps map */
> +#define MAX_RPS_CPUS (num_possible_cpus() < 256 ? num_possible_cpus() : 256)
> +
> +/* Maximum size of RPS map (for allocation) */
> +#define RPS_MAP_SIZE (sizeof(struct rps_map) + (MAX_RPS_CPUS * sizeof(u16)))
> +
> +/*
> * The DEVICE structure.
> * Actually, this whole structure is a big mistake. It mixes I/O
> * data with strictly "high-level" data, and it has to know about
> @@ -861,6 +884,9 @@ struct net_device {
>
> struct netdev_queue rx_queue;
>
> + struct dev_rps_maps *dev_rps_maps; /* Per-NAPI maps for
> + receive packet steeing */
> +
If you store rps_map pointer into napi itself, you could avoid this MAX_RPS_CPUS thing
and really dynamically allocate the structure with the number of online cpus mentioned
in the map.
But yes, it makes store_rps_cpus() more complex :(
This probably can be done later, this Version 4 of RPS looks very good, thanks !
I am going to test it today on my dev machine before giving an Acked-by :)
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
next prev parent reply other threads:[~2010-01-06 5:54 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-20 23:28 [PATCH v4 1/1] rps: core implementation Tom Herbert
2009-11-20 23:39 ` David Miller
2009-11-20 23:50 ` Tom Herbert
2009-11-21 0:05 ` David Miller
2009-11-21 0:12 ` Tom Herbert
2009-11-21 0:40 ` Jarek Poplawski
2009-11-20 23:40 ` Stephen Hemminger
2009-11-20 23:53 ` Tom Herbert
2009-11-20 23:56 ` David Miller
2009-12-17 21:04 ` Tom Herbert
2010-01-06 1:32 ` Tom Herbert
2010-01-06 5:54 ` Eric Dumazet [this message]
2010-01-06 7:56 ` Tom Herbert
2010-01-06 18:38 ` Eric Dumazet
2010-01-06 21:10 ` [BUG net-next-2.6] Had to revert bonding: allow arp_ip_targets on separate vlans to use arp validation Eric Dumazet
2010-01-06 21:28 ` Jay Vosburgh
2010-01-06 21:34 ` Eric Dumazet
2010-01-06 21:38 ` David Miller
2010-01-06 21:45 ` Andy Gospodarek
2010-01-06 22:56 ` [PATCH net-next-2.6] fix " Andy Gospodarek
2010-01-06 23:53 ` Jay Vosburgh
2010-01-07 8:37 ` Eric Dumazet
2010-01-07 8:41 ` David Miller
2010-01-06 22:54 ` [PATCH v4 1/1] rps: core implementation Tom Herbert
2010-01-07 9:15 ` Eric Dumazet
2010-01-07 17:42 ` rps: some comments Eric Dumazet
2010-01-08 0:07 ` Tom Herbert
2010-01-08 6:27 ` Eric Dumazet
2010-01-11 6:25 ` [PATCH v4 1/1] rps: core implementation Tom Herbert
2010-01-11 9:00 ` Eric Dumazet
2010-01-14 4:40 ` David Miller
2009-11-20 23:42 ` Stephen Hemminger
2009-11-21 0:04 ` Tom Herbert
2009-11-21 8:07 ` Eric Dumazet
2009-11-21 9:03 ` Tom Herbert
2009-11-21 9:31 ` Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B44258C.2050302@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=therbert@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.