netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next-2.6] ip_gre: lockless xmit
@ 2010-09-28  9:05 Eric Dumazet
  2010-09-29  8:10 ` Nicolas Dichtel
  2010-09-29 17:33 ` Jesse Gross
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-09-28  9:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :

Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)

Before patch : 
real	3m0.094s
user	0m9.365s
sys	47m50.103s

After patch:
real	0m29.756s
user	0m11.097s
sys	7m33.012s

Last problem to solve is the contention on dst :


38660.00 21.4% __ip_route_output_key          vmlinux             
20786.00 11.5% dst_release                    vmlinux             
14191.00  7.8% __xfrm_lookup                  vmlinux             
12410.00  6.9% ip_finish_output               vmlinux             
 4540.00  2.5% ip_push_pending_frames         vmlinux             
 4427.00  2.4% ip_append_data                 vmlinux             
 4265.00  2.4% __alloc_skb                    vmlinux             
 4140.00  2.3% __ip_local_out                 vmlinux             
 3991.00  2.2% dev_queue_xmit                 vmlinux     

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_gre.c |    4 ++++
 1 files changed, 4 insertions(+)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a1b5d5e..035db63 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla
 	if (!tb[IFLA_MTU])
 		dev->mtu = mtu;
 
+	/* Can use a lockless transmit, unless we generate output sequences */
+	if (!(nt->parms.o_flags & GRE_SEQ))
+		dev->features |= NETIF_F_LLTX;
+
 	err = register_netdevice(dev);
 	if (err)
 		goto out;



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
  2010-09-28  9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet
@ 2010-09-29  8:10 ` Nicolas Dichtel
  2010-09-29  8:18   ` Eric Dumazet
  2010-09-29 17:33 ` Jesse Gross
  1 sibling, 1 reply; 9+ messages in thread
From: Nicolas Dichtel @ 2010-09-29  8:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

NETIF_F_LLTX is marked as deprecated:

include/linux/netdevice.h:
#define NETIF_F_LLTX            4096    /* LockLess TX - deprecated. 
Please */
                                         /* do not use LLTX in new 
drivers */

Is it right to use it?

Regards,
Nicolas

Eric Dumazet wrote:
> GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
> 
> Note: If tunnels are created with the "oseq" option, LLTX is not
> enabled :
> 
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
> 
> Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
> 10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
> 
> Before patch : 
> real	3m0.094s
> user	0m9.365s
> sys	47m50.103s
> 
> After patch:
> real	0m29.756s
> user	0m11.097s
> sys	7m33.012s
> 
> Last problem to solve is the contention on dst :
> 
> 
> 38660.00 21.4% __ip_route_output_key          vmlinux             
> 20786.00 11.5% dst_release                    vmlinux             
> 14191.00  7.8% __xfrm_lookup                  vmlinux             
> 12410.00  6.9% ip_finish_output               vmlinux             
>  4540.00  2.5% ip_push_pending_frames         vmlinux             
>  4427.00  2.4% ip_append_data                 vmlinux             
>  4265.00  2.4% __alloc_skb                    vmlinux             
>  4140.00  2.3% __ip_local_out                 vmlinux             
>  3991.00  2.2% dev_queue_xmit                 vmlinux     
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/ip_gre.c |    4 ++++
>  1 files changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index a1b5d5e..035db63 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla
>  	if (!tb[IFLA_MTU])
>  		dev->mtu = mtu;
>  
> +	/* Can use a lockless transmit, unless we generate output sequences */
> +	if (!(nt->parms.o_flags & GRE_SEQ))
> +		dev->features |= NETIF_F_LLTX;
> +
>  	err = register_netdevice(dev);
>  	if (err)
>  		goto out;
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
  2010-09-29  8:10 ` Nicolas Dichtel
@ 2010-09-29  8:18   ` Eric Dumazet
  2010-09-29  8:21     ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-09-29  8:18 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: David Miller, netdev

Le mercredi 29 septembre 2010 à 10:10 +0200, Nicolas Dichtel a écrit :
> NETIF_F_LLTX is marked as deprecated:
> 
> include/linux/netdevice.h:
> #define NETIF_F_LLTX            4096    /* LockLess TX - deprecated. 
> Please */
>                                          /* do not use LLTX in new 
> drivers */
> 
> Is it right to use it?
> 

In this particular case (and drivers/net/loopback.c), yes.

This is the only way to avoid the locking in core network
(net/core/dev.c)

What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the
ndo_xmit() driver method.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
  2010-09-29  8:18   ` Eric Dumazet
@ 2010-09-29  8:21     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-09-29  8:21 UTC (permalink / raw)
  To: eric.dumazet; +Cc: nicolas.dichtel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Sep 2010 10:18:08 +0200

> What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the
> ndo_xmit() driver method.

Also we've discussed recently to do away with the NETIF_F_LLTX
flag entirely for queue-less devices such as loopback and
sw tunnels.

Then all will be left are the truly "deprecated" cases Eric
mentions.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
  2010-09-28  9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet
  2010-09-29  8:10 ` Nicolas Dichtel
@ 2010-09-29 17:33 ` Jesse Gross
  2010-09-29 18:43   ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: Jesse Gross @ 2010-09-29 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, Sep 28, 2010 at 2:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
>
> Note: If tunnels are created with the "oseq" option, LLTX is not
> enabled :
>
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
>
> Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
> 10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
>
> Before patch :
> real    3m0.094s
> user    0m9.365s
> sys     47m50.103s
>
> After patch:
> real    0m29.756s
> user    0m11.097s
> sys     7m33.012s
>
> Last problem to solve is the contention on dst :
>
>
> 38660.00 21.4% __ip_route_output_key          vmlinux
> 20786.00 11.5% dst_release                    vmlinux
> 14191.00  7.8% __xfrm_lookup                  vmlinux
> 12410.00  6.9% ip_finish_output               vmlinux
>  4540.00  2.5% ip_push_pending_frames         vmlinux
>  4427.00  2.4% ip_append_data                 vmlinux
>  4265.00  2.4% __alloc_skb                    vmlinux
>  4140.00  2.3% __ip_local_out                 vmlinux
>  3991.00  2.2% dev_queue_xmit                 vmlinux
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

The tx lock has another use here: to break local loops.  With this
change, a misconfigured tunnel can bring down the machine with a stack
overflow.  There are clearly other ways to fix this that don't require
a lock that restricts parallelism, such as a loop counter, but that's
the way it is now.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
  2010-09-29 17:33 ` Jesse Gross
@ 2010-09-29 18:43   ` Eric Dumazet
  2010-09-29 19:04     ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
  2010-09-29 19:24     ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-09-29 18:43 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev

Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :

> The tx lock has another use here: to break local loops.  With this
> change, a misconfigured tunnel can bring down the machine with a stack
> overflow.  There are clearly other ways to fix this that don't require
> a lock that restricts parallelism, such as a loop counter, but that's
> the way it is now.

Thats a very good point !

We could use a loop counter in the skb, but this use a bit of ram,
or percpu counters in tunnel drivers, to avoid a given level of
recursion.

/* this should be shared by all tunnels */
DEFINE_PER_CPU(tunnel_xmit_count);



tunnel_xmit()
{
if (__this_cpu_read(tunnel_xmit_count) >= LIMIT)
	goto tx_error;
__this_cpu_inc(tunnel_xmit_count);

....

__IPTUNNEL_XMIT(tstats, &dev->stats);

__this_cpu_dec(tunnel_xmit_count),
return NETDEV_TX_OK;

}




^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next-2.6] net: add a recursion limit in xmit path
  2010-09-29 18:43   ` Eric Dumazet
@ 2010-09-29 19:04     ` Eric Dumazet
  2010-09-29 20:23       ` David Miller
  2010-09-29 19:24     ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-09-29 19:04 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev

Le mercredi 29 septembre 2010 à 20:43 +0200, Eric Dumazet a écrit :
> Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
> 
> > The tx lock has another use here: to break local loops.  With this
> > change, a misconfigured tunnel can bring down the machine with a stack
> > overflow.  There are clearly other ways to fix this that don't require
> > a lock that restricts parallelism, such as a loop counter, but that's
> > the way it is now.
> 
> Thats a very good point !
> 
> We could use a loop counter in the skb, but this use a bit of ram,
> or percpu counters in tunnel drivers, to avoid a given level of
> recursion.
> 
> /* this should be shared by all tunnels */
> DEFINE_PER_CPU(tunnel_xmit_count);
> 
> 
> 
> tunnel_xmit()
> {
> if (__this_cpu_read(tunnel_xmit_count) >= LIMIT)
> 	goto tx_error;
> __this_cpu_inc(tunnel_xmit_count);
> 
> ....
> 
> __IPTUNNEL_XMIT(tstats, &dev->stats);
> 
> __this_cpu_dec(tunnel_xmit_count),
> return NETDEV_TX_OK;
> 
> }
> 
> 

This can be handled generically in net/core/dev.c

David, if you accept the lockless patches, please apply this one
before ;)

Thanks !

[PATCH net-next-2.6] net: add a recursion limit in xmit path

As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.

Add a percpu variable, and limit to three the number of stacked xmits.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 48ad47f..50dacca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2177,6 +2177,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
+static DEFINE_PER_CPU(int, xmit_recursion);
+#define RECURSION_LIMIT 3
+
 /**
  *	dev_queue_xmit - transmit a buffer
  *	@skb: buffer to transmit
@@ -2242,10 +2245,15 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 		if (txq->xmit_lock_owner != cpu) {
 
+			if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
+				goto recursion_alert;
+
 			HARD_TX_LOCK(dev, txq, cpu);
 
 			if (!netif_tx_queue_stopped(txq)) {
+				__this_cpu_inc(xmit_recursion);
 				rc = dev_hard_start_xmit(skb, dev, txq);
+				__this_cpu_dec(xmit_recursion);
 				if (dev_xmit_complete(rc)) {
 					HARD_TX_UNLOCK(dev, txq);
 					goto out;
@@ -2257,7 +2265,9 @@ int dev_queue_xmit(struct sk_buff *skb)
 				       "queue packet!\n", dev->name);
 		} else {
 			/* Recursion is detected! It is possible,
-			 * unfortunately */
+			 * unfortunately
+			 */
+recursion_alert:
 			if (net_ratelimit())
 				printk(KERN_CRIT "Dead loop on virtual device "
 				       "%s, fix it urgently!\n", dev->name);



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
  2010-09-29 18:43   ` Eric Dumazet
  2010-09-29 19:04     ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
@ 2010-09-29 19:24     ` Jesse Gross
  1 sibling, 0 replies; 9+ messages in thread
From: Jesse Gross @ 2010-09-29 19:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Wed, Sep 29, 2010 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
>
>> The tx lock has another use here: to break local loops.  With this
>> change, a misconfigured tunnel can bring down the machine with a stack
>> overflow.  There are clearly other ways to fix this that don't require
>> a lock that restricts parallelism, such as a loop counter, but that's
>> the way it is now.
>
> Thats a very good point !
>
> We could use a loop counter in the skb, but this use a bit of ram,
> or percpu counters in tunnel drivers, to avoid a given level of
> recursion.

I agree, a percpu loop counter is the way to go.  I implemented
something nearly identical to deal with this problem in Open vSwitch.

There are actually a number of optimizations in the Open vSwitch
tunneling stack that you may be interested in taking a look at as well
(for example, it has had this sort of lockless transmit for a while):

http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/tunnel.c;hb=HEAD

The plan is to upstream all of the kernel code (or at least offer it),
just trying to get the userspace interfaces settled down first.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next-2.6] net: add a recursion limit in xmit path
  2010-09-29 19:04     ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
@ 2010-09-29 20:23       ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-09-29 20:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jesse, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Sep 2010 21:04:32 +0200

> [PATCH net-next-2.6] net: add a recursion limit in xmit path
> 
> As tunnel devices are going to be lockless, we need to make sure a
> misconfigured machine wont enter an infinite loop.
> 
> Add a percpu variable, and limit to three the number of stacked xmits.
> 
> Reported-by: Jesse Gross <jesse@nicira.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-09-29 20:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-28  9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet
2010-09-29  8:10 ` Nicolas Dichtel
2010-09-29  8:18   ` Eric Dumazet
2010-09-29  8:21     ` David Miller
2010-09-29 17:33 ` Jesse Gross
2010-09-29 18:43   ` Eric Dumazet
2010-09-29 19:04     ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet
2010-09-29 20:23       ` David Miller
2010-09-29 19:24     ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).