* [net-next PATCH 0/2] Prevent packet reordering on tap/tun interfaces
@ 2016-06-02 1:17 Alexander Duyck
2016-06-02 1:17 ` [net-next PATCH 1/2] net: Add function to allow configuration of RPS Alexander Duyck
2016-06-02 1:17 ` [net-next PATCH 2/2] tun: Configure Rx queues to default to RPS enabled Alexander Duyck
0 siblings, 2 replies; 6+ messages in thread
From: Alexander Duyck @ 2016-06-02 1:17 UTC
To: netdev, davem, alexander.duyck
This patch set is meant to address a recent issue I found with VMs sending
traffic across a network. Specifically what I found was that the tap
interfaces were spreading single flows across all CPUs on the system due
to the fact that the sending VM was being load balanced across them. Under
light load this doesn't have much of an impact. However, under heavier
loads, where the CPUs are already busy with Rx cleanup from other flows,
the resulting packet reordering has a fairly significant performance
cost. By keeping each flow on a consistent CPU this series avoids the
reordering, and the results I have seen range anywhere from a 10 to 30%
improvement.
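For reference, here is a simplified sketch of the RPS CPU selection this
series relies on (loosely modeled on get_rps_cpu() in net/core/dev.c; an
illustration only, not the exact kernel code). The skb flow hash indexes
the per-queue rps_map, so every packet of a given flow lands on the same
CPU no matter which CPU the tap writer happened to run on:

	static int rps_pick_cpu(const struct rps_map *map, u32 flow_hash)
	{
		/* no map installed: RPS disabled, skb stays on the local CPU */
		if (!map || !map->len)
			return -1;

		/* same hash -> same CPU, which is what preserves ordering */
		return map->cpus[reciprocal_scale(flow_hash, map->len)];
	}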
---
Alexander Duyck (2):
net: Add function to allow configuration of RPS
tun: Configure Rx queues to default to RPS enabled
drivers/net/tun.c | 4 +++-
include/linux/netdevice.h | 9 ++++++++
net/core/dev.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-
net/core/net-sysfs.c | 45 +++++++---------------------------------
4 files changed, 70 insertions(+), 39 deletions(-)
* [net-next PATCH 1/2] net: Add function to allow configuration of RPS
2016-06-02 1:17 [net-next PATCH 0/2] Prevent packet reordering on tap/tun interfaces Alexander Duyck
@ 2016-06-02 1:17 ` Alexander Duyck
2016-06-02 1:23 ` Alexander Duyck
2016-06-02 1:17 ` [net-next PATCH 2/2] tun: Configure Rx queues to default to RPS enabled Alexander Duyck
1 sibling, 1 reply; 6+ messages in thread
From: Alexander Duyck @ 2016-06-02 1:17 UTC
To: netdev, davem, alexander.duyck
This patch gives drivers the ability to set their own default RPS
configuration. The general idea is to allow drivers that might benefit
from enabling RPS an opportunity to configure it for themselves.
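As a rough usage sketch, a driver wanting RPS on by default would call
the new helper for each of its Rx queues, along these lines (this
mirrors what patch 2 does for tun; queue_index and the warning are
illustrative, not part of this patch):

	/* enable RPS across all online CPUs for one Rx queue */
	err = netif_set_rps_cpus(dev, cpu_online_mask, queue_index);
	if (err)
		netdev_warn(dev, "default RPS map not set: %d\n", err);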
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
include/linux/netdevice.h | 9 ++++++++
net/core/dev.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-
net/core/net-sysfs.c | 45 +++++++---------------------------------
3 files changed, 67 insertions(+), 38 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929ce8157..329d554c9219 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -684,6 +684,15 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id,
u16 filter_id);
#endif
+int netif_set_rps_cpus(struct net_device *dev, const struct cpumask *mask,
+ u16 index);
+#else
+static inline int netif_set_rps_cpus(struct net_device *dev,
+ const struct cpumask *mask,
+ u16 index)
+{
+ return 0;
+}
#endif /* CONFIG_RPS */
/* This structure contains an instance of an RX queue. */
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff431d570..efdcc917cf02 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1934,7 +1934,7 @@ static void netif_setup_tc(struct net_device *dev, unsigned int txq)
return;
}
- /* Invalidated prio to tc mappings set to TC0 */
+ /* Invaldated prio to tc mappings set to TC0 */
for (i = 1; i < TC_BITMASK + 1; i++) {
int q = netdev_get_prio_tc_map(dev, i);
@@ -1947,6 +1947,55 @@ static void netif_setup_tc(struct net_device *dev, unsigned int txq)
}
}
+#ifdef CONFIG_RPS
+int netif_set_rps_cpus(struct net_device *dev, const struct cpumask *mask,
+ u16 index)
+{
+ static DEFINE_MUTEX(rps_map_mutex);
+ struct netdev_rx_queue *queue;
+ struct rps_map *old_map, *map;
+ int cpu, map_sz;
+
+ if (index >= dev->real_num_rx_queues)
+ return -EINVAL;
+
+ map_sz = max_t(unsigned int,
+ L1_CACHE_BYTES,
+ RPS_MAP_SIZE(cpumask_weight(mask)));
+ map = kzalloc(map_sz, GFP_KERNEL);
+
+ if (!map)
+ return -ENOMEM;
+
+ for_each_cpu_and(cpu, mask, cpu_online_mask)
+ map->cpus[map->len++] = cpu;
+
+ if (!map->len) {
+ kfree(map);
+ map = NULL;
+ }
+
+ queue = dev->_rx + index;
+ mutex_lock(&rps_map_mutex);
+
+ old_map = rcu_dereference_protected(queue->rps_map,
+ mutex_is_locked(&rps_map_mutex));
+ rcu_assign_pointer(queue->rps_map, map);
+
+ if (map)
+ static_key_slow_inc(&rps_needed);
+ if (old_map)
+ static_key_slow_dec(&rps_needed);
+
+ mutex_unlock(&rps_map_mutex);
+
+ if (old_map)
+ kfree_rcu(old_map, rcu);
+
+ return 0;
+}
+EXPORT_SYMBOL(netif_set_rps_cpus);
+#endif /* CONFIG_RPS */
#ifdef CONFIG_XPS
static DEFINE_MUTEX(xps_map_mutex);
#define xmap_dereference(P) \
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 2b3f76fe65f4..1d270744b296 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -690,10 +690,10 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
struct rx_queue_attribute *attribute,
const char *buf, size_t len)
{
- struct rps_map *old_map, *map;
+ struct net_device *dev = queue->dev;
+ unsigned long index;
cpumask_var_t mask;
- int err, cpu, i;
- static DEFINE_MUTEX(rps_map_mutex);
+ int err;
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -701,48 +701,19 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
if (!alloc_cpumask_var(&mask, GFP_KERNEL))
return -ENOMEM;
+ index = get_netdev_rx_queue_index(queue);
+
err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits);
if (err) {
free_cpumask_var(mask);
return err;
}
- map = kzalloc(max_t(unsigned int,
- RPS_MAP_SIZE(cpumask_weight(mask)), L1_CACHE_BYTES),
- GFP_KERNEL);
- if (!map) {
- free_cpumask_var(mask);
- return -ENOMEM;
- }
-
- i = 0;
- for_each_cpu_and(cpu, mask, cpu_online_mask)
- map->cpus[i++] = cpu;
-
- if (i)
- map->len = i;
- else {
- kfree(map);
- map = NULL;
- }
-
- mutex_lock(&rps_map_mutex);
- old_map = rcu_dereference_protected(queue->rps_map,
- mutex_is_locked(&rps_map_mutex));
- rcu_assign_pointer(queue->rps_map, map);
-
- if (map)
- static_key_slow_inc(&rps_needed);
- if (old_map)
- static_key_slow_dec(&rps_needed);
-
- mutex_unlock(&rps_map_mutex);
-
- if (old_map)
- kfree_rcu(old_map, rcu);
+ err = netif_set_rps_cpus(dev, mask, index);
free_cpumask_var(mask);
- return len;
+
+ return err ? : len;
}
static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
* [net-next PATCH 2/2] tun: Configure Rx queues to default to RPS enabled
2016-06-02 1:17 [net-next PATCH 0/2] Prevent packet reordering on tap/tun interfaces Alexander Duyck
2016-06-02 1:17 ` [net-next PATCH 1/2] net: Add function to allow configuration of RPS Alexander Duyck
@ 2016-06-02 1:17 ` Alexander Duyck
2016-06-02 13:12 ` Eric Dumazet
1 sibling, 1 reply; 6+ messages in thread
From: Alexander Duyck @ 2016-06-02 1:17 UTC
To: netdev, davem, alexander.duyck
This patch enables tun/tap interfaces to use RPS by default. The
motivation behind this is to address the fact that the interfaces are
currently using netif_rx_ni which in turn will queue packets on whatever
CPU the function is called on, and when combined with load balancing this
can result in packets being received out of order.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
drivers/net/tun.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e16487cc6a9a..51555daaa6e4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -646,7 +646,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
rcu_assign_pointer(tfile->tun, tun);
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
- tun->numqueues++;
if (tfile->detached)
tun_enable_queue(tfile);
@@ -655,6 +654,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
tun_set_real_num_queues(tun);
+ netif_set_rps_cpus(tun->dev, cpu_online_mask, tun->numqueues);
+ tun->numqueues++;
+
/* device is allowed to go away first, so no need to hold extra
* refcnt.
*/
* Re: [net-next PATCH 1/2] net: Add function to allow configuration of RPS
2016-06-02 1:17 ` [net-next PATCH 1/2] net: Add function to allow configuration of RPS Alexander Duyck
@ 2016-06-02 1:23 ` Alexander Duyck
0 siblings, 0 replies; 6+ messages in thread
From: Alexander Duyck @ 2016-06-02 1:23 UTC
To: Alexander Duyck; +Cc: Netdev, David Miller
On Wed, Jun 1, 2016 at 6:17 PM, Alexander Duyck <aduyck@mirantis.com> wrote:
> This patch gives drivers the ability to set their own default RPS
> configuration. The general idea is to allow drivers that might benefit
> from enabling RPS an opportunity to configure it for themselves.
>
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
> ---
> include/linux/netdevice.h | 9 ++++++++
> net/core/dev.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-
> net/core/net-sysfs.c | 45 +++++++---------------------------------
> 3 files changed, 67 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f45929ce8157..329d554c9219 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -684,6 +684,15 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
> bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id,
> u16 filter_id);
> #endif
> +int netif_set_rps_cpus(struct net_device *dev, const struct cpumask *mask,
> + u16 index);
> +#else
> +static inline int netif_set_rps_cpus(struct net_device *dev,
> + const struct cpumask *mask,
> + u16 index)
> +{
> + return 0;
> +}
> #endif /* CONFIG_RPS */
>
> /* This structure contains an instance of an RX queue. */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 904ff431d570..efdcc917cf02 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1934,7 +1934,7 @@ static void netif_setup_tc(struct net_device *dev, unsigned int txq)
> return;
> }
>
> - /* Invalidated prio to tc mappings set to TC0 */
> + /* Invaldated prio to tc mappings set to TC0 */
> for (i = 1; i < TC_BITMASK + 1; i++) {
> int q = netdev_get_prio_tc_map(dev, i);
>
I just noticed that I fat fingered this when doing some last edits on
the patch. So I will be submitting a v2.
For now I will leave this out here for review, and I will wait to see
if there are any other changes needed after others have had a chance
to look over the rest of this patch set.
- Alex
* Re: [net-next PATCH 2/2] tun: Configure Rx queues to default to RPS enabled
2016-06-02 1:17 ` [net-next PATCH 2/2] tun: Configure Rx queues to default to RPS enabled Alexander Duyck
@ 2016-06-02 13:12 ` Eric Dumazet
2016-06-02 15:24 ` Alexander Duyck
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2016-06-02 13:12 UTC
To: Alexander Duyck; +Cc: netdev, davem, alexander.duyck
On Wed, 2016-06-01 at 18:17 -0700, Alexander Duyck wrote:
> This patch enables tun/tap interfaces to use RPS by default. The
> motivation behind this is to address the fact that the interfaces are
> currently using netif_rx_ni which in turn will queue packets on whatever
> CPU the function is called on, and when combined with load balancing this
> can result in packets being received out of order.
Hmm...
I do not believe this can be made the default; it would be a major
regression in some cases.
Some users want cpu isolation. This is their number one priority.
If they use one cpu to feed packets through tun device, they do not want
to spread a DDOS to all online cpus. Traffic from one VM would hurt all
others.
We have ways to avoid reorders already in TX path (skb->ooo_okay) and
receive path in RFS layer.
tun could probably avoid reorders using a similar technique.
netif_rx_ni() could be extended to give a hint on the cpu that processed
prior packets.
(This would be a new function)
If the prior cpu is different than current cpu, we have to look at the
backlog of prior cpu.
If not empty, and prior cpu online, we need to queue the packet to prior
cpu queue.
If empty, we can 'switch' to the new cpu queue.
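Rough sketch of the idea (hypothetical code, not an existing kernel
API -- netif_rx_ni_hint(), backlog_len_cpu() and enqueue_to_backlog_cpu()
are made-up names for illustration):

	/*
	 * Keep feeding the CPU that handled the flow's prior packets while
	 * its backlog still holds them; only switch to the local CPU once
	 * that backlog has drained, so nothing can be overtaken.
	 */
	static int netif_rx_ni_hint(struct sk_buff *skb, int *prior_cpu)
	{
		int cpu, ret;

		preempt_disable();
		cpu = smp_processor_id();

		if (*prior_cpu >= 0 && *prior_cpu != cpu &&
		    cpu_online(*prior_cpu) && backlog_len_cpu(*prior_cpu))
			cpu = *prior_cpu;	/* chase the earlier packets */
		else
			*prior_cpu = cpu;	/* backlog empty: switch CPUs */

		ret = enqueue_to_backlog_cpu(skb, cpu);
		preempt_enable();
		return ret;
	}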
* Re: [net-next PATCH 2/2] tun: Configure Rx queues to default to RPS enabled
2016-06-02 13:12 ` Eric Dumazet
@ 2016-06-02 15:24 ` Alexander Duyck
0 siblings, 0 replies; 6+ messages in thread
From: Alexander Duyck @ 2016-06-02 15:24 UTC
To: Eric Dumazet; +Cc: Alexander Duyck, Netdev, David Miller
On Thu, Jun 2, 2016 at 6:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2016-06-01 at 18:17 -0700, Alexander Duyck wrote:
>> This patch enables tun/tap interfaces to use RPS by default. The
>> motivation behind this is to address the fact that the interfaces are
>> currently using netif_rx_ni which in turn will queue packets on whatever
>> CPU the function is called on, and when combined with load balancing this
>> can result in packets being received out of order.
>
> Hmm...
>
> I do not believe this can be made the default, this would be a major
> regression in some cases.
Yeah, while thinking about it this morning I kind of realized the
solution is very x86-centric as well. It assumes all CPUs are equal
and I know there are a few architectures where that is not the case.
> Some users want cpu isolation. This is their number one priority.
>
> If they use one cpu to feed packets through tun device, they do not want
> to spread a DDOS to all online cpus. Traffic from one VM would hurt all
> others.
>
> We have ways to avoid reorders already in TX path (skb->ooo_okay) and
> receive path in RFS layer.
>
> tun could probably avoid reorders using a similar technique.
>
> netif_rx_ni() could be extended to give a hint on the cpu that processed
> prior packets.
>
> (This would be a new function)
>
> If the prior cpu is different than current cpu, we have to look at the
> backlog of prior cpu.
> If not empty, and prior cpu online, we need to queue the packet to prior
> cpu queue.
> If empty, we can 'switch' to the new cpu queue.
What I can probably do is look into borrowing some of the code from
the receive flow steering path. I believe there is already logic there
that checks for packets still sitting in the backlog, and I should be
able to make use of that to avoid the reordering; a rough sketch
follows below.
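This is modeled on the reorder-avoidance test in get_rps_cpu() (a
sketch only; softnet_data and its input_queue_head field are real, the
helper name flow_may_move() is mine):

	/*
	 * A flow may move off old_cpu once old_cpu's backlog head has
	 * advanced past the flow's last queued packet, or the CPU has
	 * gone offline -- the same rule RFS uses to avoid reordering.
	 */
	static bool flow_may_move(int old_cpu, unsigned int last_qtail)
	{
		struct softnet_data *sd = &per_cpu(softnet_data, old_cpu);

		return !cpu_online(old_cpu) ||
		       (int)(sd->input_queue_head - last_qtail) >= 0;
	}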
- Alex