netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING
@ 2014-04-01  9:08 Xin Long
  2014-04-01 20:10 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Xin Long @ 2014-04-01  9:08 UTC (permalink / raw)
  To: network dev; +Cc: Xin Long

When we use AF_PACKET socket in normal network interface, it bypass the
protocol stack and send packets directly with dev_queue_xmit(), so it will
never trigger this problem. when we use it in vxlan interface. the path will
like this:
dev_queue_xmit()->dev_hard_start_xmit()->vxlan_xmit()->ip_tunnel_xmit()->ip_mc_output(),
after reaching ip_mc_output(), it will check the mc_loop by sk_mc_loop(), but
it have no case for AF_PACKET, then cause WARN_ON(1).

Perhaps others type interfaces like vxlan also has the same issue. so fix it
by adding the case for AF_PACKET in sk_mc_loop().

you can use 'dhclient' on a vxlan interface to reproduce it easily.

[ 2133.224368] ------------[ cut here ]------------
[ 2133.228971] WARNING: CPU: 0 PID: 10226 at include/net/ip.h:433
ip_mc_output+0x189/0x250()
[ 2133.235054] Modules linked in: vxlan(F) ip_tunnel nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw kvm_amd kvm ppdev parport_pc pcspkr serio_raw i2c_piix4 parport
pvpanic xfs libcrc32c cirrus drm_kms_helper ttm drm virtio_net virtio_blk
ata_generic virtio_pci pata_acpi virtio_ring virtio i2c_core floppy
[ 2133.278480] CPU: 0 PID: 10226 Comm: dhclient Tainted: GF       W
3.14.0-rc3+ #20
[ 2133.284651] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[ 2133.289983]  0000000000000009 ffff8800bb3ed970 ffffffff816a2b65
0000000000000000
[ 2133.296498]  ffff8800bb3ed9a8 ffffffff8108196d ffff880138694200
ffff8800b8295840
[ 2133.302891]  ffff8800b8295840 0000000000000000 000000000a0000ef
ffff8800bb3ed9b8
[ 2133.309170] Call Trace:
[ 2133.313236]  [<ffffffff816a2b65>] dump_stack+0x45/0x56
[ 2133.318500]  [<ffffffff8108196d>] warn_slowpath_common+0x7d/0xa0
[ 2133.324052]  [<ffffffff81081a4a>] warn_slowpath_null+0x1a/0x20
[ 2133.329153]  [<ffffffff815e1fa9>] ip_mc_output+0x189/0x250
[ 2133.333910]  [<ffffffff815e1825>] ip_local_out+0x25/0x30
[ 2133.338845]  [<ffffffff81624225>] iptunnel_xmit+0xf5/0x110
[ 2133.343609]  [<ffffffffa033b36d>] vxlan_xmit_skb+0x1bd/0x340 [vxlan]
[ 2133.349063]  [<ffffffffa033c235>] vxlan_xmit_one+0x5e5/0xa60 [vxlan]
[ 2133.354062]  [<ffffffffa033dcdc>] vxlan_xmit+0x13c/0x8f0 [vxlan]
[ 2133.359125]  [<ffffffff8159261e>] ? __alloc_skb+0x7e/0x2b0
[ 2133.363963]  [<ffffffff81591c91>] ? __kmalloc_reserve.isra.25+0x31/0x90
[ 2133.369216]  [<ffffffff815925ee>] ? __alloc_skb+0x4e/0x2b0
[ 2133.374104]  [<ffffffff815a4146>] dev_hard_start_xmit+0x326/0x5d0
[ 2133.379153]  [<ffffffff815a4706>] __dev_queue_xmit+0x316/0x480
[ 2133.384179]  [<ffffffff815a4880>] dev_queue_xmit+0x10/0x20
[ 2133.389063]  [<ffffffff8168ca4b>] packet_sendmsg+0xfab/0x1050
[ 2133.393805]  [<ffffffff815886de>] sock_aio_write+0xfe/0x130
[ 2133.398450]  [<ffffffff811cfb2a>] do_sync_write+0x5a/0x90
[ 2133.403282]  [<ffffffff811d0395>] vfs_write+0x1c5/0x1e0
[ 2133.407887]  [<ffffffff811d0c69>] SyS_write+0x49/0xa0
[ 2133.412713]  [<ffffffff81110d46>] ? __audit_syscall_exit+0x1f6/0x2a0
[ 2133.417899]  [<ffffffff816b2369>] system_call_fastpath

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/ip.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 25064c2..d9f1fbd 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -428,6 +428,8 @@ static inline int sk_mc_loop(struct sock *sk)
 	case AF_INET6:
 		return inet6_sk(sk)->mc_loop;
 #endif
+	case AF_PACKET:
+		return 1;
 	}
 	WARN_ON(1);
 	return 1;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING
  2014-04-01  9:08 [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING Xin Long
@ 2014-04-01 20:10 ` David Miller
  2014-04-02  5:31   ` lucien xin
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2014-04-01 20:10 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev

From: Xin Long <lucien.xin@gmail.com>
Date: Tue,  1 Apr 2014 17:08:50 +0800

> dev_queue_xmit()->dev_hard_start_xmit()->vxlan_xmit()->ip_tunnel_xmit()->ip_mc_output(),
> after reaching ip_mc_output(), it will check the mc_loop by sk_mc_loop(), but
> it have no case for AF_PACKET, then cause WARN_ON(1).
 ...
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -428,6 +428,8 @@ static inline int sk_mc_loop(struct sock *sk)
>  	case AF_INET6:
>  		return inet6_sk(sk)->mc_loop;
>  #endif
> +	case AF_PACKET:
> +		return 1;
>  	}
>  	WARN_ON(1);
>  	return 1;

This is not the correct fix.

If vxlan is going to reinject dev_hard_start_xmit() bound packets into
the ipv4 stack, it must make sure that they are well formed.

The SKB traversing the ipv4 stack must either be bound to an ipv4/ipv6
socket or be orphaned.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING
  2014-04-01 20:10 ` David Miller
@ 2014-04-02  5:31   ` lucien xin
  2014-04-02 14:51     ` Stephen Hemminger
  0 siblings, 1 reply; 6+ messages in thread
From: lucien xin @ 2014-04-02  5:31 UTC (permalink / raw)
  To: David Miller; +Cc: network dev

On Wed, Apr 2, 2014 at 4:10 AM, David Miller <davem@davemloft.net> wrote:
> This is not the correct fix.
>
> If vxlan is going to reinject dev_hard_start_xmit() bound packets into
> the ipv4 stack, it must make sure that they are well formed.
>
> The SKB traversing the ipv4 stack must either be bound to an ipv4/ipv6
> socket or be orphaned.

yeah,  it's right. then I want to bound skb to vxlan's sock in
vxlan_xmit_skb(), like this:

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 1236812..8fd7e93 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1755,6 +1755,8 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
        if (err)
                return err;

+       skb_set_owner_w(skb, vs->sock->sk);
+
        return iptunnel_xmit(rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
                             false);
 }

which can not only solve the issue, but also make the logic more correct.
David, how do you think?

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING
  2014-04-02  5:31   ` lucien xin
@ 2014-04-02 14:51     ` Stephen Hemminger
  2014-04-03  5:28       ` lucien xin
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2014-04-02 14:51 UTC (permalink / raw)
  To: lucien xin; +Cc: David Miller, network dev

On Wed, 2 Apr 2014 13:31:59 +0800
lucien xin <lucien.xin@gmail.com> wrote:

> On Wed, Apr 2, 2014 at 4:10 AM, David Miller <davem@davemloft.net> wrote:
> > This is not the correct fix.
> >
> > If vxlan is going to reinject dev_hard_start_xmit() bound packets into
> > the ipv4 stack, it must make sure that they are well formed.
> >
> > The SKB traversing the ipv4 stack must either be bound to an ipv4/ipv6
> > socket or be orphaned.
> 
> yeah,  it's right. then I want to bound skb to vxlan's sock in
> vxlan_xmit_skb(), like this:
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 1236812..8fd7e93 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1755,6 +1755,8 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>         if (err)
>                 return err;
> 
> +       skb_set_owner_w(skb, vs->sock->sk);
> +
>         return iptunnel_xmit(rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
>                              false);
>  }
> 
> which can not only solve the issue, but also make the logic more correct.
> David, how do you think?

That breaks local socket accounting for tunneled packets.
The idea is that a TCP socket sending over vxlan will keep it's
memory accounting all the way until the device sends it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING
  2014-04-02 14:51     ` Stephen Hemminger
@ 2014-04-03  5:28       ` lucien xin
  2014-04-03 16:48         ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: lucien xin @ 2014-04-03  5:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, network dev

On Wed, Apr 2, 2014 at 10:51 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> That breaks local socket accounting for tunneled packets.
It do break, I think it should break, becaue vlxan send UDP packets,
it should be bound to UDP socket.
after  encapsulating the packet with local socket.  it reach the
vxlan's xmit.  and It should be take into
vxlan's UDP socket.and local socket should not take care of it.

now the problem is that the local socket(AF_PACKET) cannot take it to
travel the ipv4/6 protocol stack,
unlike TCP/UDP socket.   perhaps, in later, there will be other type
protocol family that has the same issue.
so I think vxlan should support it.

> The idea is that a TCP socket sending over vxlan will keep it's
> memory accounting all the way until the device sends it.
yes, the idea is there.  But if vxlan is treated as net_device,  the
packet enter it's xmit(), which seems
 to be send by the device.  so when the packet goes there, the TCP
socket also should let it go.

of course, if you insist the idea you said,  to support only some type
protocol family, do you have some
better suggestion?

I cannot be sure I understand you well, those are just my own
thoughts. pls help to check and correct it .  :)

many thanks to your reply.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING
  2014-04-03  5:28       ` lucien xin
@ 2014-04-03 16:48         ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2014-04-03 16:48 UTC (permalink / raw)
  To: lucien.xin; +Cc: stephen, netdev

From: lucien xin <lucien.xin@gmail.com>
Date: Thu, 3 Apr 2014 13:28:33 +0800

> On Wed, Apr 2, 2014 at 10:51 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> That breaks local socket accounting for tunneled packets.
> It do break, I think it should break, becaue vlxan send UDP packets,
> it should be bound to UDP socket.

The issue is that we use sockets in two different contexts here:

1) For memory accounting, which is why we should hold onto the
   original socket.

2) For IP transmit state, which is why we need to reference the
   vxlan layer's ipv4/ipv6 socket.

Also, we've never marked the vxlan socket as being charged to
the SKB, there may be unwanted side effects of this.

Furthmore this will hurt performance, it will add two new
atomic operations per packet send over VXLAN devices.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-04-03 16:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-01  9:08 [PATCH] vxlan: using AF_PACKET socket to send multicast packet cause WARNING Xin Long
2014-04-01 20:10 ` David Miller
2014-04-02  5:31   ` lucien xin
2014-04-02 14:51     ` Stephen Hemminger
2014-04-03  5:28       ` lucien xin
2014-04-03 16:48         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).