* [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs @ 2013-12-23 12:56 Sathya Perla 2013-12-23 19:28 ` Stephen Hemminger 2014-01-06 17:54 ` [PATCH net-next] vxlan: keep original skb ownership Eric Dumazet 0 siblings, 2 replies; 10+ messages in thread From: Sathya Perla @ 2013-12-23 12:56 UTC (permalink / raw) To: netdev; +Cc: edumazet, stephen The vxlan driver sets itself as the socket owner for all the TX flows it encapsulates (using vxlan_set_owner()) and assigns it's own skb destructor. This causes all tunneled traffic to land up on only one TXQ as all encapsulated skbs refer to the vxlan socket and not the original socket. Also, the vxlan skb destructor breaks some functionality for tunneled traffic like wmem accounting and as Eric D. mentioned, TCP small queues and FQ/pacing packet scheduler too. This patch removes vxlan ownership on tunneled skbs. This causes tunneled traffic to be hashed into multiple TXQs based on the original socket hash. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> --- drivers/net/vxlan.c | 22 +++------------------- 1 files changed, 3 insertions(+), 19 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index aef44aa..eb55c08 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1381,20 +1381,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; } -static void vxlan_sock_put(struct sk_buff *skb) -{ - sock_put(skb->sk); -} - -/* On transmit, associate with the tunnel socket */ -static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb) -{ - skb_orphan(skb); - sock_hold(sk); - skb->sk = sk; - skb->destructor = vxlan_sock_put; -} - /* Compute source port for outgoing packet * first choice to use L4 flow hash since it will spread * better and maybe available from hardware @@ -1514,8 +1500,6 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, ip6h->daddr = *daddr; ip6h->saddr = *saddr; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1572,8 +1556,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, uh->len = htons(skb->len); uh->check = 0; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1836,8 +1818,10 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) struct sk_buff *skb1; skb1 = skb_clone(skb, GFP_ATOMIC); - if (skb1) + if (skb1) { + skb1->sk = skb->sk; vxlan_xmit_one(skb1, dev, rdst, did_rsc); + } } dev_kfree_skb(skb); -- 1.7.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2013-12-23 12:56 [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs Sathya Perla @ 2013-12-23 19:28 ` Stephen Hemminger 2013-12-24 12:27 ` Sathya Perla 2013-12-24 18:39 ` Eric Dumazet 2014-01-06 17:54 ` [PATCH net-next] vxlan: keep original skb ownership Eric Dumazet 1 sibling, 2 replies; 10+ messages in thread From: Stephen Hemminger @ 2013-12-23 19:28 UTC (permalink / raw) To: Sathya Perla; +Cc: netdev, edumazet On Mon, 23 Dec 2013 18:26:53 +0530 Sathya Perla <sathya.perla@emulex.com> wrote: > The vxlan driver sets itself as the socket owner for all the TX flows > it encapsulates (using vxlan_set_owner()) and assigns it's own skb > destructor. This causes all tunneled traffic to land up on only one TXQ > as all encapsulated skbs refer to the vxlan socket and not the original > socket. Also, the vxlan skb destructor breaks some functionality for > tunneled traffic like wmem accounting and as Eric D. mentioned, TCP > small queues and FQ/pacing packet scheduler too. > > This patch removes vxlan ownership on tunneled skbs. This causes > tunneled traffic to be hashed into multiple TXQs based on the original > socket hash. > > Signed-off-by: Sathya Perla <sathya.perla@emulex.com> > --- > drivers/net/vxlan.c | 22 +++------------------- > 1 files changed, 3 insertions(+), 19 deletions(-) > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index aef44aa..eb55c08 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -1381,20 +1381,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) > return false; > } > > -static void vxlan_sock_put(struct sk_buff *skb) > -{ > - sock_put(skb->sk); > -} > - > -/* On transmit, associate with the tunnel socket */ > -static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb) > -{ > - skb_orphan(skb); > - sock_hold(sk); > - skb->sk = sk; > - skb->destructor = vxlan_sock_put; > -} > - > /* Compute source port for outgoing packet > * first choice to use L4 flow hash since it will spread > * better and maybe available from hardware > @@ -1514,8 +1500,6 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, > ip6h->daddr = *daddr; > ip6h->saddr = *saddr; > > - vxlan_set_owner(vs->sock->sk, skb); > - > err = handle_offloads(skb); > if (err) > return err; > @@ -1572,8 +1556,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, > uh->len = htons(skb->len); > uh->check = 0; > > - vxlan_set_owner(vs->sock->sk, skb); > - > err = handle_offloads(skb); > if (err) > return err; > @@ -1836,8 +1818,10 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) > struct sk_buff *skb1; > > skb1 = skb_clone(skb, GFP_ATOMIC); > - if (skb1) > + if (skb1) { > + skb1->sk = skb->sk; > vxlan_xmit_one(skb1, dev, rdst, did_rsc); > + } > } > > dev_kfree_skb(skb); The idea is good, but without the destructor there is nothing to keep the UDP socket from being destroyed while packet is being sent on another CPU. ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2013-12-23 19:28 ` Stephen Hemminger @ 2013-12-24 12:27 ` Sathya Perla 2013-12-24 18:39 ` Eric Dumazet 1 sibling, 0 replies; 10+ messages in thread From: Sathya Perla @ 2013-12-24 12:27 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev@vger.kernel.org, edumazet@google.com > -----Original Message----- > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > > On Mon, 23 Dec 2013 18:26:53 +0530 > Sathya Perla <sathya.perla@emulex.com> wrote: > > > The vxlan driver sets itself as the socket owner for all the TX flows > > it encapsulates (using vxlan_set_owner()) and assigns it's own skb > > destructor. This causes all tunneled traffic to land up on only one TXQ > > as all encapsulated skbs refer to the vxlan socket and not the original > > socket. Also, the vxlan skb destructor breaks some functionality for > > tunneled traffic like wmem accounting and as Eric D. mentioned, TCP > > small queues and FQ/pacing packet scheduler too. > > > > This patch removes vxlan ownership on tunneled skbs. This causes > > tunneled traffic to be hashed into multiple TXQs based on the original > > socket hash. > > ... > > - > > /* Compute source port for outgoing packet > > * first choice to use L4 flow hash since it will spread > > * better and maybe available from hardware > > @@ -1514,8 +1500,6 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, > > ip6h->daddr = *daddr; > > ip6h->saddr = *saddr; > > > > - vxlan_set_owner(vs->sock->sk, skb); > > - > > err = handle_offloads(skb); > > if (err) > > return err; > > @@ -1572,8 +1556,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, > > uh->len = htons(skb->len); > > uh->check = 0; > > > > - vxlan_set_owner(vs->sock->sk, skb); > > - > > err = handle_offloads(skb); > > if (err) > > return err; > > @@ -1836,8 +1818,10 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct > net_device *dev) > > struct sk_buff *skb1; > > > > skb1 = skb_clone(skb, GFP_ATOMIC); > > - if (skb1) > > + if (skb1) { > > + skb1->sk = skb->sk; > > vxlan_xmit_one(skb1, dev, rdst, did_rsc); > > + } > > } > > > > dev_kfree_skb(skb); > > The idea is good, but without the destructor there is nothing to keep > the UDP socket from being destroyed while packet is being sent on another > CPU. Stephen, what is the consequence of freeing the UDP socket while a tunneled skb is still in flight on another CPU? Wouldn't it be safe if the skb did not refer to the vxlan socket at all? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2013-12-23 19:28 ` Stephen Hemminger 2013-12-24 12:27 ` Sathya Perla @ 2013-12-24 18:39 ` Eric Dumazet 2013-12-24 23:05 ` Stephen Hemminger 2013-12-31 18:56 ` David Miller 1 sibling, 2 replies; 10+ messages in thread From: Eric Dumazet @ 2013-12-24 18:39 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Sathya Perla, netdev, edumazet On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > The idea is good, but without the destructor there is nothing to keep > the UDP socket from being destroyed while packet is being sent on another > CPU. I see no requirement of holding a reference on the vxlan UDP socket in transmit path. At the time vxlan_set_owner() is called, nothing requires access to the socket anymore. If you believe its needed, then its already too late. Sathya, your patch is a step in the right direction, but the skb_clone() thing should be done a bit differently. The trick would be to avoid the skb_clone() for the last vxlan_xmit_one() call. diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 249e01c5600c..3a1e2cee7c6e 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1366,20 +1366,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; } -static void vxlan_sock_put(struct sk_buff *skb) -{ - sock_put(skb->sk); -} - -/* On transmit, associate with the tunnel socket */ -static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb) -{ - skb_orphan(skb); - sock_hold(sk); - skb->sk = sk; - skb->destructor = vxlan_sock_put; -} - /* Compute source port for outgoing packet * first choice to use L4 flow hash since it will spread * better and maybe available from hardware @@ -1499,8 +1485,6 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, ip6h->daddr = *daddr; ip6h->saddr = *saddr; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1557,8 +1541,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, uh->len = htons(skb->len); uh->check = 0; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1770,7 +1752,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); struct ethhdr *eth; bool did_rsc = false; - struct vxlan_rdst *rdst; + struct vxlan_rdst *rdst, *fdst = NULL; struct vxlan_fdb *f; skb_reset_mac_header(skb); @@ -1812,7 +1794,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) vxlan_fdb_miss(vxlan, eth->h_dest); dev->stats.tx_dropped++; - dev_kfree_skb(skb); + kfree_skb(skb); return NETDEV_TX_OK; } } @@ -1820,12 +1802,19 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) list_for_each_entry_rcu(rdst, &f->remotes, list) { struct sk_buff *skb1; + if (!fdst) { + fdst = rdst; + continue; + } skb1 = skb_clone(skb, GFP_ATOMIC); if (skb1) vxlan_xmit_one(skb1, dev, rdst, did_rsc); } - dev_kfree_skb(skb); + if (fdst) + vxlan_xmit_one(skb, dev, fdst, did_rsc); + else + kfree_skb(skb); return NETDEV_TX_OK; } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2013-12-24 18:39 ` Eric Dumazet @ 2013-12-24 23:05 ` Stephen Hemminger 2013-12-31 18:56 ` David Miller 1 sibling, 0 replies; 10+ messages in thread From: Stephen Hemminger @ 2013-12-24 23:05 UTC (permalink / raw) To: Eric Dumazet; +Cc: Sathya Perla, netdev, edumazet On Tue, 24 Dec 2013 10:39:06 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > > > The idea is good, but without the destructor there is nothing to keep > > the UDP socket from being destroyed while packet is being sent on another > > CPU. > > I see no requirement of holding a reference on the vxlan UDP socket in > transmit path. > > At the time vxlan_set_owner() is called, nothing requires access to the > socket anymore. If you believe its needed, then its already too late. > > Sathya, your patch is a step in the right direction, but the skb_clone() > thing should be done a bit differently. > > The trick would be to avoid the skb_clone() for the last > vxlan_xmit_one() call. > That code was cloned from L2TP, I assumed that that it was required to hold reference when calling UDP, since that is what socket layer does. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2013-12-24 18:39 ` Eric Dumazet 2013-12-24 23:05 ` Stephen Hemminger @ 2013-12-31 18:56 ` David Miller 2014-01-02 5:56 ` Eric Dumazet 1 sibling, 1 reply; 10+ messages in thread From: David Miller @ 2013-12-31 18:56 UTC (permalink / raw) To: eric.dumazet; +Cc: stephen, sathya.perla, netdev, edumazet From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 24 Dec 2013 10:39:06 -0800 > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > >> The idea is good, but without the destructor there is nothing to keep >> the UDP socket from being destroyed while packet is being sent on another >> CPU. > > I see no requirement of holding a reference on the vxlan UDP socket in > transmit path. I'm trying to figure out how leaving a dangling socket attached to skb->sk, as in this original patch, can be OK. If skb->sk is there, anyone can reference it, and meanwhile anyone can destroy and free it. That's Stephens' objection. Are you saying that we have something that allows this to be valid? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2013-12-31 18:56 ` David Miller @ 2014-01-02 5:56 ` Eric Dumazet 2014-01-02 6:36 ` David Miller 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2014-01-02 5:56 UTC (permalink / raw) To: David Miller; +Cc: stephen, sathya.perla, netdev, edumazet On Tue, 2013-12-31 at 13:56 -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 24 Dec 2013 10:39:06 -0800 > > > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: > > > >> The idea is good, but without the destructor there is nothing to keep > >> the UDP socket from being destroyed while packet is being sent on another > >> CPU. > > > > I see no requirement of holding a reference on the vxlan UDP socket in > > transmit path. > > I'm trying to figure out how leaving a dangling socket attached to > skb->sk, as in this original patch, can be OK. Sorry, I lost track here (vacation time...), what patch are you referring to ? > > If skb->sk is there, anyone can reference it, and meanwhile anyone can > destroy and free it. > > That's Stephens' objection. Not really. Stephen told us he copied code from L2TP. (But IPIP, GRE tunnels do not do that..._ He thought it was needed to hold a reference on vxlan socket, while it is not needed for any valid reason. RCU locking is more than enough to be able to build the encapsulation. > > Are you saying that we have something that allows this to be valid? Once we pass the tunnel, we can either : 1) Leave skb->sk set to the original socket sk1 (Say TCP or UDP producer) 2) Assign skb->sk to the 'socket' used in vxlan (after orphaning and releasing reference on socket sk1) Current vxlan code chose 2), but it makes no sense because : We use skb->sk to A) control amount of bytes/packets queued on behalf a socket, but current vxlan code does the skb->sk transfert without any limit/control on vxlan socket sk_sndbuf. B) security puposes (as selinux) or netfilter uses, and I do not think anything is prepared to handle vxlan stacked case in this area. If we chose 1), it makes more sense, because each producer will be effectively limited by the proper sk->sk_sndbuf limit. And a socket cannot be destroyed anyway as long as at least one skb is in flight (with skb->sk set to this socket) Really, I do not think vxlan should behave in a different way than other tunnels (GRE, IPIP, SIT, ...), which chose 1) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs 2014-01-02 5:56 ` Eric Dumazet @ 2014-01-02 6:36 ` David Miller 0 siblings, 0 replies; 10+ messages in thread From: David Miller @ 2014-01-02 6:36 UTC (permalink / raw) To: eric.dumazet; +Cc: stephen, sathya.perla, netdev, edumazet From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 01 Jan 2014 21:56:46 -0800 > On Tue, 2013-12-31 at 13:56 -0500, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Tue, 24 Dec 2013 10:39:06 -0800 >> >> > On Mon, 2013-12-23 at 11:28 -0800, Stephen Hemminger wrote: >> > >> >> The idea is good, but without the destructor there is nothing to keep >> >> the UDP socket from being destroyed while packet is being sent on another >> >> CPU. >> > >> > I see no requirement of holding a reference on the vxlan UDP socket in >> > transmit path. >> >> I'm trying to figure out how leaving a dangling socket attached to >> skb->sk, as in this original patch, can be OK. > > Sorry, I lost track here (vacation time...), what patch are you > referring to ? http://patchwork.ozlabs.org/patch/304767/ > Once we pass the tunnel, we can either : > > 1) Leave skb->sk set to the original socket sk1 (Say TCP or UDP > producer) > > 2) Assign skb->sk to the 'socket' used in vxlan (after orphaning and > releasing reference on socket sk1) ... > If we chose 1), it makes more sense, because each producer will be > effectively limited by the proper sk->sk_sndbuf limit. > > And a socket cannot be destroyed anyway as long as at least one skb is > in flight (with skb->sk set to this socket) > > Really, I do not think vxlan should behave in a different way than other > tunnels (GRE, IPIP, SIT, ...), which chose 1) Ok, agreed. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net-next] vxlan: keep original skb ownership 2013-12-23 12:56 [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs Sathya Perla 2013-12-23 19:28 ` Stephen Hemminger @ 2014-01-06 17:54 ` Eric Dumazet 2014-01-06 21:41 ` David Miller 1 sibling, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2014-01-06 17:54 UTC (permalink / raw) To: Sathya Perla; +Cc: netdev, edumazet, stephen From: Eric Dumazet <edumazet@google.com> Sathya Perla posted a patch trying to address following problem : <quote> The vxlan driver sets itself as the socket owner for all the TX flows it encapsulates (using vxlan_set_owner()) and assigns it's own skb destructor. This causes all tunneled traffic to land up on only one TXQ as all encapsulated skbs refer to the vxlan socket and not the original socket. Also, the vxlan skb destructor breaks some functionality for tunneled traffic like wmem accounting and as TCP small queues and FQ/pacing packet scheduler. </quote> I reworked Sathya patch and added some explanations. vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair and gain better drop monitor accuracy, by calling kfree_skb() when appropriate. The UDP socket used by vxlan to perform encapsulation of xmit packets do not need to be alive while packets leave vxlan code. Its better to keep original socket ownership to get proper feedback from qdisc and NIC layers. We use skb->sk to A) control amount of bytes/packets queued on behalf of a socket, but prior vxlan code did the skb->sk transfert without any limit/control on vxlan socket sk_sndbuf. B) security purposes (as selinux) or netfilter uses, and I do not think anything is prepared to handle vxlan stacked case in this area. By not changing ownership, vxlan tunnels behave like other tunnels. As Stephen mentioned, we might do the same change in L2TP. Reported-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> --- drivers/net/vxlan.c | 31 ++++++++++--------------------- 1 file changed, 10 insertions(+), 21 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 474a99ed0222..ab2e92eec949 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1381,20 +1381,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; } -static void vxlan_sock_put(struct sk_buff *skb) -{ - sock_put(skb->sk); -} - -/* On transmit, associate with the tunnel socket */ -static void vxlan_set_owner(struct sock *sk, struct sk_buff *skb) -{ - skb_orphan(skb); - sock_hold(sk); - skb->sk = sk; - skb->destructor = vxlan_sock_put; -} - /* Compute source port for outgoing packet * first choice to use L4 flow hash since it will spread * better and maybe available from hardware @@ -1514,8 +1500,6 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, ip6h->daddr = *daddr; ip6h->saddr = *saddr; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1572,8 +1556,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, uh->len = htons(skb->len); uh->check = 0; - vxlan_set_owner(vs->sock->sk, skb); - err = handle_offloads(skb); if (err) return err; @@ -1786,7 +1768,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); struct ethhdr *eth; bool did_rsc = false; - struct vxlan_rdst *rdst; + struct vxlan_rdst *rdst, *fdst = NULL; struct vxlan_fdb *f; skb_reset_mac_header(skb); @@ -1828,7 +1810,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) vxlan_fdb_miss(vxlan, eth->h_dest); dev->stats.tx_dropped++; - dev_kfree_skb(skb); + kfree_skb(skb); return NETDEV_TX_OK; } } @@ -1836,12 +1818,19 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) list_for_each_entry_rcu(rdst, &f->remotes, list) { struct sk_buff *skb1; + if (!fdst) { + fdst = rdst; + continue; + } skb1 = skb_clone(skb, GFP_ATOMIC); if (skb1) vxlan_xmit_one(skb1, dev, rdst, did_rsc); } - dev_kfree_skb(skb); + if (fdst) + vxlan_xmit_one(skb, dev, fdst, did_rsc); + else + kfree_skb(skb); return NETDEV_TX_OK; } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next] vxlan: keep original skb ownership 2014-01-06 17:54 ` [PATCH net-next] vxlan: keep original skb ownership Eric Dumazet @ 2014-01-06 21:41 ` David Miller 0 siblings, 0 replies; 10+ messages in thread From: David Miller @ 2014-01-06 21:41 UTC (permalink / raw) To: eric.dumazet; +Cc: sathya.perla, netdev, edumazet, stephen From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 06 Jan 2014 09:54:31 -0800 > From: Eric Dumazet <edumazet@google.com> > > Sathya Perla posted a patch trying to address following problem : > > <quote> > The vxlan driver sets itself as the socket owner for all the TX flows > it encapsulates (using vxlan_set_owner()) and assigns it's own skb > destructor. This causes all tunneled traffic to land up on only one TXQ > as all encapsulated skbs refer to the vxlan socket and not the original > socket. Also, the vxlan skb destructor breaks some functionality for > tunneled traffic like wmem accounting and as TCP small queues and > FQ/pacing packet scheduler. > </quote> > > I reworked Sathya patch and added some explanations. > > vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair > and gain better drop monitor accuracy, by calling kfree_skb() when > appropriate. > > The UDP socket used by vxlan to perform encapsulation of xmit packets > do not need to be alive while packets leave vxlan code. Its better > to keep original socket ownership to get proper feedback from qdisc and > NIC layers. > > We use skb->sk to > > A) control amount of bytes/packets queued on behalf of a socket, but > prior vxlan code did the skb->sk transfert without any limit/control > on vxlan socket sk_sndbuf. > > B) security purposes (as selinux) or netfilter uses, and I do not think > anything is prepared to handle vxlan stacked case in this area. > > By not changing ownership, vxlan tunnels behave like other tunnels. > As Stephen mentioned, we might do the same change in L2TP. > > Reported-by: Sathya Perla <sathya.perla@emulex.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied, thanks a lot Eric. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-01-06 21:41 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-23 12:56 [PATCH net-next] vxlan: distribute vxlan tunneled traffic across multiple TXQs Sathya Perla 2013-12-23 19:28 ` Stephen Hemminger 2013-12-24 12:27 ` Sathya Perla 2013-12-24 18:39 ` Eric Dumazet 2013-12-24 23:05 ` Stephen Hemminger 2013-12-31 18:56 ` David Miller 2014-01-02 5:56 ` Eric Dumazet 2014-01-02 6:36 ` David Miller 2014-01-06 17:54 ` [PATCH net-next] vxlan: keep original skb ownership Eric Dumazet 2014-01-06 21:41 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).