From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [net-next PATCH] net: allow vlan traffic to be received under bond Date: Fri, 28 Oct 2011 13:06:26 +0200 Message-ID: <1319799986.23112.101.camel@edumazet-laptop> References: <4E972301.4030004@intel.com> <20111018.234723.1235424375699917420.davem@davemloft.net> <1319796053.23112.92.camel@edumazet-laptop> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: jesse@nicira.com, john.r.fastabend@intel.com, hans.schillstrom@ericsson.com, jpirko@redhat.com, mbizon@freebox.fr, netdev@vger.kernel.org, fubar@us.ibm.com To: David Miller Return-path: Received: from mail-wy0-f174.google.com ([74.125.82.174]:43002 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755391Ab1J1LTf (ORCPT ); Fri, 28 Oct 2011 07:19:35 -0400 Received: by wyg36 with SMTP id 36so3768010wyg.19 for ; Fri, 28 Oct 2011 04:19:34 -0700 (PDT) In-Reply-To: <1319796053.23112.92.camel@edumazet-laptop> Sender: netdev-owner@vger.kernel.org List-ID: Le vendredi 28 octobre 2011 =C3=A0 12:00 +0200, Eric Dumazet a =C3=A9cr= it : > Oh well, this broke my setup, a very basic one. >=20 > eth1 and eth2 on a bonding device, bond0, active-backup >=20 > some vlans on top of bond0, say vlan.103 >=20 > $ ip link show dev vlan.103 > 8: vlan.103@bond0: mtu 1500 qdisc > pfifo_fast state UP qlen 100 > link/ether 00:1e:0b:ec:d3:d2 brd ff:ff:ff:ff:ff:ff >=20 >=20 > arp_rcv() now gets packets with skb->type PACKET_OTHERHOST and drops > such packets. >=20 > [000] 52870.115435: skb_gro_reset_offset <-napi_gro_receive > [000] 52870.115435: dev_gro_receive <-napi_gro_receive > [000] 52870.115435: napi_skb_finish <-napi_gro_receive > [000] 52870.115435: netif_receive_skb <-napi_skb_finish > [000] 52870.115435: get_rps_cpu <-netif_receive_skb > [000] 52870.115435: __netif_receive_skb <-netif_receive_skb > [000] 52870.115436: vlan_do_receive <-__netif_receive_skb > [000] 52870.115436: bond_handle_frame <-__netif_receive_skb > [000] 52870.115436: vlan_do_receive <-__netif_receive_skb > [000] 52870.115436: arp_rcv <-__netif_receive_skb > [000] 52870.115436: kfree_skb <-arp_rcv > [000] 52870.115437: __kfree_skb <-kfree_skb > [000] 52870.115437: skb_release_head_state <-__kfree_skb > [000] 52870.115437: skb_release_data <-__kfree_skb > [000] 52870.115437: kfree <-skb_release_data > [000] 52870.115437: kmem_cache_free <-__kfree_skb >=20 >=20 > By the way, we have no SNMP counter here so I spent some time to trac= k > this. I'll send a patch for this. >=20 > If this host initiates the trafic, all is well. >=20 > Please guys, can we get back ARP or revert this patch ? =46ollowing patch cures the problem, I am not sure its the right fix. Problem is we dont know how many times vlan_do_receive() can be called for a packet. Only last call should set/mess pkt_type to PACKET_OTHERHOST. So the caller should be responsible for this, not vlan_do_receive() Alternative would be to check skb->dev->rx_handler being NULL, but its not clean. =46ollowing patch is a hack because it handles multicast/broadcast traf= ic only. Unicast is already handled in lines 26-33, this is why we didnt catch the problem. diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c index f1f2f7b..6861899 100644 --- a/net/8021q/vlan_core.c +++ b/net/8021q/vlan_core.c @@ -13,7 +13,7 @@ bool vlan_do_receive(struct sk_buff **skbp) =20 vlan_dev =3D vlan_find_dev(skb->dev, vlan_id); if (!vlan_dev) { - if (vlan_id) + if (vlan_id && skb->pkt_type =3D=3D PACKET_HOST) skb->pkt_type =3D PACKET_OTHERHOST; return false; }