* [PATCH net-next-2.6] ip_gre: lockless xmit @ 2010-09-28 9:05 Eric Dumazet 2010-09-29 8:10 ` Nicolas Dichtel 2010-09-29 17:33 ` Jesse Gross 0 siblings, 2 replies; 9+ messages in thread From: Eric Dumazet @ 2010-09-28 9:05 UTC (permalink / raw) To: David Miller; +Cc: netdev GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX Note: If tunnels are created with the "oseq" option, LLTX is not enabled : Even using an atomic_t o_seq, we would increase chance for packets being out of order at receiver. Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) Before patch : real 3m0.094s user 0m9.365s sys 47m50.103s After patch: real 0m29.756s user 0m11.097s sys 7m33.012s Last problem to solve is the contention on dst : 38660.00 21.4% __ip_route_output_key vmlinux 20786.00 11.5% dst_release vmlinux 14191.00 7.8% __xfrm_lookup vmlinux 12410.00 6.9% ip_finish_output vmlinux 4540.00 2.5% ip_push_pending_frames vmlinux 4427.00 2.4% ip_append_data vmlinux 4265.00 2.4% __alloc_skb vmlinux 4140.00 2.3% __ip_local_out vmlinux 3991.00 2.2% dev_queue_xmit vmlinux Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/ipv4/ip_gre.c | 4 ++++ 1 files changed, 4 insertions(+) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index a1b5d5e..035db63 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla if (!tb[IFLA_MTU]) dev->mtu = mtu; + /* Can use a lockless transmit, unless we generate output sequences */ + if (!(nt->parms.o_flags & GRE_SEQ)) + dev->features |= NETIF_F_LLTX; + err = register_netdevice(dev); if (err) goto out; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit 2010-09-28 9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet @ 2010-09-29 8:10 ` Nicolas Dichtel 2010-09-29 8:18 ` Eric Dumazet 2010-09-29 17:33 ` Jesse Gross 1 sibling, 1 reply; 9+ messages in thread From: Nicolas Dichtel @ 2010-09-29 8:10 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev NETIF_F_LLTX is marked as deprecated: include/linux/netdevice.h: #define NETIF_F_LLTX 4096 /* LockLess TX - deprecated. Please */ /* do not use LLTX in new drivers */ Is it right to use it? Regards, Nicolas Eric Dumazet wrote: > GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX > > Note: If tunnels are created with the "oseq" option, LLTX is not > enabled : > > Even using an atomic_t o_seq, we would increase chance for packets being > out of order at receiver. > > Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending > 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) > > Before patch : > real 3m0.094s > user 0m9.365s > sys 47m50.103s > > After patch: > real 0m29.756s > user 0m11.097s > sys 7m33.012s > > Last problem to solve is the contention on dst : > > > 38660.00 21.4% __ip_route_output_key vmlinux > 20786.00 11.5% dst_release vmlinux > 14191.00 7.8% __xfrm_lookup vmlinux > 12410.00 6.9% ip_finish_output vmlinux > 4540.00 2.5% ip_push_pending_frames vmlinux > 4427.00 2.4% ip_append_data vmlinux > 4265.00 2.4% __alloc_skb vmlinux > 4140.00 2.3% __ip_local_out vmlinux > 3991.00 2.2% dev_queue_xmit vmlinux > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > net/ipv4/ip_gre.c | 4 ++++ > 1 files changed, 4 insertions(+) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index a1b5d5e..035db63 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -1557,6 +1557,10 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, struct nla > if (!tb[IFLA_MTU]) > dev->mtu = mtu; > > + /* Can use a lockless transmit, unless we generate output sequences */ > + if (!(nt->parms.o_flags & GRE_SEQ)) > + dev->features |= NETIF_F_LLTX; > + > err = register_netdevice(dev); > if (err) > goto out; > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit 2010-09-29 8:10 ` Nicolas Dichtel @ 2010-09-29 8:18 ` Eric Dumazet 2010-09-29 8:21 ` David Miller 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2010-09-29 8:18 UTC (permalink / raw) To: nicolas.dichtel; +Cc: David Miller, netdev Le mercredi 29 septembre 2010 à 10:10 +0200, Nicolas Dichtel a écrit : > NETIF_F_LLTX is marked as deprecated: > > include/linux/netdevice.h: > #define NETIF_F_LLTX 4096 /* LockLess TX - deprecated. > Please */ > /* do not use LLTX in new > drivers */ > > Is it right to use it? > In this particular case (and drivers/net/loopback.c), yes. This is the only way to avoid the locking in core network (net/core/dev.c) What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the ndo_xmit() driver method. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit 2010-09-29 8:18 ` Eric Dumazet @ 2010-09-29 8:21 ` David Miller 0 siblings, 0 replies; 9+ messages in thread From: David Miller @ 2010-09-29 8:21 UTC (permalink / raw) To: eric.dumazet; +Cc: nicolas.dichtel, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 29 Sep 2010 10:18:08 +0200 > What is deprecated is to assert NETIF_F_LLTX and yet, use a lock in the > ndo_xmit() driver method. Also we've discussed recently to do away with the NETIF_F_LLTX flag entirely for queue-less devices such as loopback and sw tunnels. Then all will be left are the truly "deprecated" cases Eric mentions. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit 2010-09-28 9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet 2010-09-29 8:10 ` Nicolas Dichtel @ 2010-09-29 17:33 ` Jesse Gross 2010-09-29 18:43 ` Eric Dumazet 1 sibling, 1 reply; 9+ messages in thread From: Jesse Gross @ 2010-09-29 17:33 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev On Tue, Sep 28, 2010 at 2:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX > > Note: If tunnels are created with the "oseq" option, LLTX is not > enabled : > > Even using an atomic_t o_seq, we would increase chance for packets being > out of order at receiver. > > Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending > 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) > > Before patch : > real 3m0.094s > user 0m9.365s > sys 47m50.103s > > After patch: > real 0m29.756s > user 0m11.097s > sys 7m33.012s > > Last problem to solve is the contention on dst : > > > 38660.00 21.4% __ip_route_output_key vmlinux > 20786.00 11.5% dst_release vmlinux > 14191.00 7.8% __xfrm_lookup vmlinux > 12410.00 6.9% ip_finish_output vmlinux > 4540.00 2.5% ip_push_pending_frames vmlinux > 4427.00 2.4% ip_append_data vmlinux > 4265.00 2.4% __alloc_skb vmlinux > 4140.00 2.3% __ip_local_out vmlinux > 3991.00 2.2% dev_queue_xmit vmlinux > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> The tx lock has another use here: to break local loops. With this change, a misconfigured tunnel can bring down the machine with a stack overflow. There are clearly other ways to fix this that don't require a lock that restricts parallelism, such as a loop counter, but that's the way it is now. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit 2010-09-29 17:33 ` Jesse Gross @ 2010-09-29 18:43 ` Eric Dumazet 2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet 2010-09-29 19:24 ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross 0 siblings, 2 replies; 9+ messages in thread From: Eric Dumazet @ 2010-09-29 18:43 UTC (permalink / raw) To: Jesse Gross; +Cc: David Miller, netdev Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit : > The tx lock has another use here: to break local loops. With this > change, a misconfigured tunnel can bring down the machine with a stack > overflow. There are clearly other ways to fix this that don't require > a lock that restricts parallelism, such as a loop counter, but that's > the way it is now. Thats a very good point ! We could use a loop counter in the skb, but this use a bit of ram, or percpu counters in tunnel drivers, to avoid a given level of recursion. /* this should be shared by all tunnels */ DEFINE_PER_CPU(tunnel_xmit_count); tunnel_xmit() { if (__this_cpu_read(tunnel_xmit_count) >= LIMIT) goto tx_error; __this_cpu_inc(tunnel_xmit_count); .... __IPTUNNEL_XMIT(tstats, &dev->stats); __this_cpu_dec(tunnel_xmit_count), return NETDEV_TX_OK; } ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next-2.6] net: add a recursion limit in xmit path 2010-09-29 18:43 ` Eric Dumazet @ 2010-09-29 19:04 ` Eric Dumazet 2010-09-29 20:23 ` David Miller 2010-09-29 19:24 ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross 1 sibling, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2010-09-29 19:04 UTC (permalink / raw) To: Jesse Gross; +Cc: David Miller, netdev Le mercredi 29 septembre 2010 à 20:43 +0200, Eric Dumazet a écrit : > Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit : > > > The tx lock has another use here: to break local loops. With this > > change, a misconfigured tunnel can bring down the machine with a stack > > overflow. There are clearly other ways to fix this that don't require > > a lock that restricts parallelism, such as a loop counter, but that's > > the way it is now. > > Thats a very good point ! > > We could use a loop counter in the skb, but this use a bit of ram, > or percpu counters in tunnel drivers, to avoid a given level of > recursion. > > /* this should be shared by all tunnels */ > DEFINE_PER_CPU(tunnel_xmit_count); > > > > tunnel_xmit() > { > if (__this_cpu_read(tunnel_xmit_count) >= LIMIT) > goto tx_error; > __this_cpu_inc(tunnel_xmit_count); > > .... > > __IPTUNNEL_XMIT(tstats, &dev->stats); > > __this_cpu_dec(tunnel_xmit_count), > return NETDEV_TX_OK; > > } > > This can be handled generically in net/core/dev.c David, if you accept the lockless patches, please apply this one before ;) Thanks ! [PATCH net-next-2.6] net: add a recursion limit in xmit path As tunnel devices are going to be lockless, we need to make sure a misconfigured machine wont enter an infinite loop. Add a percpu variable, and limit to three the number of stacked xmits. Reported-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/core/dev.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 48ad47f..50dacca 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2177,6 +2177,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, return rc; } +static DEFINE_PER_CPU(int, xmit_recursion); +#define RECURSION_LIMIT 3 + /** * dev_queue_xmit - transmit a buffer * @skb: buffer to transmit @@ -2242,10 +2245,15 @@ int dev_queue_xmit(struct sk_buff *skb) if (txq->xmit_lock_owner != cpu) { + if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT) + goto recursion_alert; + HARD_TX_LOCK(dev, txq, cpu); if (!netif_tx_queue_stopped(txq)) { + __this_cpu_inc(xmit_recursion); rc = dev_hard_start_xmit(skb, dev, txq); + __this_cpu_dec(xmit_recursion); if (dev_xmit_complete(rc)) { HARD_TX_UNLOCK(dev, txq); goto out; @@ -2257,7 +2265,9 @@ int dev_queue_xmit(struct sk_buff *skb) "queue packet!\n", dev->name); } else { /* Recursion is detected! It is possible, - * unfortunately */ + * unfortunately + */ +recursion_alert: if (net_ratelimit()) printk(KERN_CRIT "Dead loop on virtual device " "%s, fix it urgently!\n", dev->name); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] net: add a recursion limit in xmit path 2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet @ 2010-09-29 20:23 ` David Miller 0 siblings, 0 replies; 9+ messages in thread From: David Miller @ 2010-09-29 20:23 UTC (permalink / raw) To: eric.dumazet; +Cc: jesse, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 29 Sep 2010 21:04:32 +0200 > [PATCH net-next-2.6] net: add a recursion limit in xmit path > > As tunnel devices are going to be lockless, we need to make sure a > misconfigured machine wont enter an infinite loop. > > Add a percpu variable, and limit to three the number of stacked xmits. > > Reported-by: Jesse Gross <jesse@nicira.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next-2.6] ip_gre: lockless xmit 2010-09-29 18:43 ` Eric Dumazet 2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet @ 2010-09-29 19:24 ` Jesse Gross 1 sibling, 0 replies; 9+ messages in thread From: Jesse Gross @ 2010-09-29 19:24 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev On Wed, Sep 29, 2010 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit : > >> The tx lock has another use here: to break local loops. With this >> change, a misconfigured tunnel can bring down the machine with a stack >> overflow. There are clearly other ways to fix this that don't require >> a lock that restricts parallelism, such as a loop counter, but that's >> the way it is now. > > Thats a very good point ! > > We could use a loop counter in the skb, but this use a bit of ram, > or percpu counters in tunnel drivers, to avoid a given level of > recursion. I agree, a percpu loop counter is the way to go. I implemented something nearly identical to deal with this problem in Open vSwitch. There are actually a number of optimizations in the Open vSwitch tunneling stack that you may be interested in taking a look at as well (for example, it has had this sort of lockless transmit for a while): http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/tunnel.c;hb=HEAD The plan is to upstream all of the kernel code (or at least offer it), just trying to get the userspace interfaces settled down first. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-09-29 20:23 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-09-28 9:05 [PATCH net-next-2.6] ip_gre: lockless xmit Eric Dumazet 2010-09-29 8:10 ` Nicolas Dichtel 2010-09-29 8:18 ` Eric Dumazet 2010-09-29 8:21 ` David Miller 2010-09-29 17:33 ` Jesse Gross 2010-09-29 18:43 ` Eric Dumazet 2010-09-29 19:04 ` [PATCH net-next-2.6] net: add a recursion limit in xmit path Eric Dumazet 2010-09-29 20:23 ` David Miller 2010-09-29 19:24 ` [PATCH net-next-2.6] ip_gre: lockless xmit Jesse Gross
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).