* [PATCH net-next-2.6] ip_gre: lockless xmit
@ 2010-09-28 9:05 Eric Dumazet
2010-09-29 8:10 ` Nicolas Dichtel
2010-09-29 17:33 ` Jesse Gross
0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-09-28 9:05 UTC (permalink / raw)
To: David Miller; +Cc: netdev
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :
Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s
After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s
Last problem to solve is the contention on dst :
38660.00 21.4% __ip_route_output_key vmlinux
20786.00 11.5% dst_release vmlinux
14191.00 7.8% __xfrm_lookup vmlinux
12410.00 6.9% ip_finish_output vmlinux
4540.00 2.5% ip_push_pending_frames vmlinux
4427.00 2.4% ip_append_data vmlinux
4265.00 2.4% __alloc_skb vmlinux
4140.00 2.3% __ip_local_out vmlinux
3991.00 2.2% dev_queue_xmit vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/ip_gre.c | 4 ++++
1 files changed, 4 insertions(+)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a1b5d5e..035db63 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla
if (!tb[IFLA_MTU])
dev->mtu = mtu;
+ /* Can use a lockless transmit, unless we generate output sequences */
+ if (!(nt->parms.o_flags & GRE_SEQ))
+ dev->features |= NETIF_F_LLTX;
+
err = register_netdevice(dev);
if (err)
goto out;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
2010-09-28 9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet
@ 2010-09-29 8:10 ` Nicolas Dichtel
2010-09-29 8:18 ` Eric Dumazet
2010-09-29 17:33 ` Jesse Gross
1 sibling, 1 reply; 9+ messages in thread
From: Nicolas Dichtel @ 2010-09-29 8:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
NETIF_F_LLTX is marked as deprecated:
include/linux/netdevice.h:
#define NETIF_F_LLTX 4096 /* LockLess TX - deprecated.
Please */
/* do not use LLTX in new
drivers */
Is it right to use it?
Regards,
Nicolas
Eric Dumazet wrote:
> GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
>
> Note: If tunnels are created with the "oseq" option, LLTX is not
> enabled :
>
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
>
> Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
> 10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
>
> Before patch :
> real 3m0.094s
> user 0m9.365s
> sys 47m50.103s
>
> After patch:
> real 0m29.756s
> user 0m11.097s
> sys 7m33.012s
>
> Last problem to solve is the contention on dst :
>
>
> 38660.00 21.4% __ip_route_output_key vmlinux
> 20786.00 11.5% dst_release vmlinux
> 14191.00 7.8% __xfrm_lookup vmlinux
> 12410.00 6.9% ip_finish_output vmlinux
> 4540.00 2.5% ip_push_pending_frames vmlinux
> 4427.00 2.4% ip_append_data vmlinux
> 4265.00 2.4% __alloc_skb vmlinux
> 4140.00 2.3% __ip_local_out vmlinux
> 3991.00 2.2% dev_queue_xmit vmlinux
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/ipv4/ip_gre.c | 4 ++++
> 1 files changed, 4 insertions(+)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index a1b5d5e..035db63 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla
> if (!tb[IFLA_MTU])
> dev->mtu = mtu;
>
> + /* Can use a lockless transmit, unless we generate output sequences */
> + if (!(nt->parms.o_flags & GRE_SEQ))
> + dev->features |= NETIF_F_LLTX;
> +
> err = register_netdevice(dev);
> if (err)
> goto out;
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
2010-09-29 8:10 ` Nicolas Dichtel
@ 2010-09-29 8:18 ` Eric Dumazet
2010-09-29 8:21 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-09-29 8:18 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: David Miller, netdev
Le mercredi 29 septembre 2010 à 10:10 +0200, Nicolas Dichtel a écrit :
> NETIF_F_LLTX is marked as deprecated:
>
> include/linux/netdevice.h:
> #define NETIF_F_LLTX 4096 /* LockLess TX - deprecated.
> Please */
> /* do not use LLTX in new
> drivers */
>
> Is it right to use it?
>
In this particular case (and drivers/net/loopback.c), yes.
This is the only way to avoid the locking in core network
(net/core/dev.c)
What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the
ndo_xmit() driver method.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
2010-09-29 8:18 ` Eric Dumazet
@ 2010-09-29 8:21 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-09-29 8:21 UTC (permalink / raw)
To: eric.dumazet; +Cc: nicolas.dichtel, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Sep 2010 10:18:08 +0200
> What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the
> ndo_xmit() driver method.
Also we've discussed recently to do away with the NETIF_F_LLTX
flag entirely for queue-less devices such as loopback and
sw tunnels.
Then all will be left are the truly "deprecated" cases Eric
mentions.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
2010-09-28 9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet
2010-09-29 8:10 ` Nicolas Dichtel
@ 2010-09-29 17:33 ` Jesse Gross
2010-09-29 18:43 ` Eric Dumazet
1 sibling, 1 reply; 9+ messages in thread
From: Jesse Gross @ 2010-09-29 17:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
On Tue, Sep 28, 2010 at 2:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
>
> Note: If tunnels are created with the "oseq" option, LLTX is not
> enabled :
>
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
>
> Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
> 10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
>
> Before patch :
> real 3m0.094s
> user 0m9.365s
> sys 47m50.103s
>
> After patch:
> real 0m29.756s
> user 0m11.097s
> sys 7m33.012s
>
> Last problem to solve is the contention on dst :
>
>
> 38660.00 21.4% __ip_route_output_key vmlinux
> 20786.00 11.5% dst_release vmlinux
> 14191.00 7.8% __xfrm_lookup vmlinux
> 12410.00 6.9% ip_finish_output vmlinux
> 4540.00 2.5% ip_push_pending_frames vmlinux
> 4427.00 2.4% ip_append_data vmlinux
> 4265.00 2.4% __alloc_skb vmlinux
> 4140.00 2.3% __ip_local_out vmlinux
> 3991.00 2.2% dev_queue_xmit vmlinux
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
The tx lock has another use here: to break local loops. With this
change, a misconfigured tunnel can bring down the machine with a stack
overflow. There are clearly other ways to fix this that don't require
a lock that restricts parallelism, such as a loop counter, but that's
the way it is now.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
2010-09-29 17:33 ` Jesse Gross
@ 2010-09-29 18:43 ` Eric Dumazet
2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
2010-09-29 19:24 ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross
0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-09-29 18:43 UTC (permalink / raw)
To: Jesse Gross; +Cc: David Miller, netdev
Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
> The tx lock has another use here: to break local loops. With this
> change, a misconfigured tunnel can bring down the machine with a stack
> overflow. There are clearly other ways to fix this that don't require
> a lock that restricts parallelism, such as a loop counter, but that's
> the way it is now.
Thats a very good point !
We could use a loop counter in the skb, but this use a bit of ram,
or percpu counters in tunnel drivers, to avoid a given level of
recursion.
/* this should be shared by all tunnels */
DEFINE_PER_CPU(tunnel_xmit_count);
tunnel_xmit()
{
if (__this_cpu_read(tunnel_xmit_count) >= LIMIT)
goto tx_error;
__this_cpu_inc(tunnel_xmit_count);
....
__IPTUNNEL_XMIT(tstats, &dev->stats);
__this_cpu_dec(tunnel_xmit_count),
return NETDEV_TX_OK;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next-2.6] net: add a recursion limit in xmit path
2010-09-29 18:43 ` Eric Dumazet
@ 2010-09-29 19:04 ` Eric Dumazet
2010-09-29 20:23 ` David Miller
2010-09-29 19:24 ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross
1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-09-29 19:04 UTC (permalink / raw)
To: Jesse Gross; +Cc: David Miller, netdev
Le mercredi 29 septembre 2010 à 20:43 +0200, Eric Dumazet a écrit :
> Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
>
> > The tx lock has another use here: to break local loops. With this
> > change, a misconfigured tunnel can bring down the machine with a stack
> > overflow. There are clearly other ways to fix this that don't require
> > a lock that restricts parallelism, such as a loop counter, but that's
> > the way it is now.
>
> Thats a very good point !
>
> We could use a loop counter in the skb, but this use a bit of ram,
> or percpu counters in tunnel drivers, to avoid a given level of
> recursion.
>
> /* this should be shared by all tunnels */
> DEFINE_PER_CPU(tunnel_xmit_count);
>
>
>
> tunnel_xmit()
> {
> if (__this_cpu_read(tunnel_xmit_count) >= LIMIT)
> goto tx_error;
> __this_cpu_inc(tunnel_xmit_count);
>
> ....
>
> __IPTUNNEL_XMIT(tstats, &dev->stats);
>
> __this_cpu_dec(tunnel_xmit_count),
> return NETDEV_TX_OK;
>
> }
>
>
This can be handled generically in net/core/dev.c
David, if you accept the lockless patches, please apply this one
before ;)
Thanks !
[PATCH net-next-2.6] net: add a recursion limit in xmit path
As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.
Add a percpu variable, and limit to three the number of stacked xmits.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/core/dev.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 48ad47f..50dacca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2177,6 +2177,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
return rc;
}
+static DEFINE_PER_CPU(int, xmit_recursion);
+#define RECURSION_LIMIT 3
+
/**
* dev_queue_xmit - transmit a buffer
* @skb: buffer to transmit
@@ -2242,10 +2245,15 @@ int dev_queue_xmit(struct sk_buff *skb)
if (txq->xmit_lock_owner != cpu) {
+ if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
+ goto recursion_alert;
+
HARD_TX_LOCK(dev, txq, cpu);
if (!netif_tx_queue_stopped(txq)) {
+ __this_cpu_inc(xmit_recursion);
rc = dev_hard_start_xmit(skb, dev, txq);
+ __this_cpu_dec(xmit_recursion);
if (dev_xmit_complete(rc)) {
HARD_TX_UNLOCK(dev, txq);
goto out;
@@ -2257,7 +2265,9 @@ int dev_queue_xmit(struct sk_buff *skb)
"queue packet!\n", dev->name);
} else {
/* Recursion is detected! It is possible,
- * unfortunately */
+ * unfortunately
+ */
+recursion_alert:
if (net_ratelimit())
printk(KERN_CRIT "Dead loop on virtual device "
"%s, fix it urgently!\n", dev->name);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
2010-09-29 18:43 ` Eric Dumazet
2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
@ 2010-09-29 19:24 ` Jesse Gross
1 sibling, 0 replies; 9+ messages in thread
From: Jesse Gross @ 2010-09-29 19:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
On Wed, Sep 29, 2010 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
>
>> The tx lock has another use here: to break local loops. With this
>> change, a misconfigured tunnel can bring down the machine with a stack
>> overflow. There are clearly other ways to fix this that don't require
>> a lock that restricts parallelism, such as a loop counter, but that's
>> the way it is now.
>
> Thats a very good point !
>
> We could use a loop counter in the skb, but this use a bit of ram,
> or percpu counters in tunnel drivers, to avoid a given level of
> recursion.
I agree, a percpu loop counter is the way to go. I implemented
something nearly identical to deal with this problem in Open vSwitch.
There are actually a number of optimizations in the Open vSwitch
tunneling stack that you may be interested in taking a look at as well
(for example, it has had this sort of lockless transmit for a while):
http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/tunnel.c;hb=HEAD
The plan is to upstream all of the kernel code (or at least offer it),
just trying to get the userspace interfaces settled down first.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] net: add a recursion limit in xmit path
2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
@ 2010-09-29 20:23 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-09-29 20:23 UTC (permalink / raw)
To: eric.dumazet; +Cc: jesse, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Sep 2010 21:04:32 +0200
> [PATCH net-next-2.6] net: add a recursion limit in xmit path
>
> As tunnel devices are going to be lockless, we need to make sure a
> misconfigured machine wont enter an infinite loop.
>
> Add a percpu variable, and limit to three the number of stacked xmits.
>
> Reported-by: Jesse Gross <jesse@nicira.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-09-29 20:23 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-28 9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet
2010-09-29 8:10 ` Nicolas Dichtel
2010-09-29 8:18 ` Eric Dumazet
2010-09-29 8:21 ` David Miller
2010-09-29 17:33 ` Jesse Gross
2010-09-29 18:43 ` Eric Dumazet
2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
2010-09-29 20:23 ` David Miller
2010-09-29 19:24 ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).