* Reproducible VLAN/e1000e crash in 2.6.36 vanilla. @ 2010-10-25 17:57 Ben Greear
From: Ben Greear @ 2010-10-25 17:57 UTC
To: NetDev

To re-create: set up two 802.1q VLANs on different physical interfaces on the same system, set up routing rules so that send-to-self works, and pass traffic (UDP/IPv4 in this case, but the protocol doesn't seem to matter). Stop the traffic, then try to create additional 802.1q VLANs on the same physical interfaces.

The crash only appears to happen after traffic has been sent on the interface.

It will likely also crash if one system is sending to another, but so far we've only tested sending-to-self.

This appears to be very reproducible for us, and appears to be the same problem that I reported against our hacked kernel here:

http://www.spinics.net/lists/netdev/msg144748.html

[root@ct503-60 ~]# general protection fault: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/virtual/net/eth2.103/type
CPU 2
Modules linked in: 8021q garp bridge stp llc veth arc4 michael_mic macvlan pktgen fuse nfs lockd fscach]
Pid: 0, comm: kworker/0:1 Not tainted 2.6.36 #32 X8DTU/X8DTU
RIP: 0010:[<ffffffff813cada1>] [<ffffffff813cada1>] vlan_hwaccel_do_receive+0x64/0xca
RSP: 0018:ffff880001a43c10 EFLAGS: 00010287
RAX: 0000000000000002 RBX: ffff88031d1b0200 RCX: ffff88032d600000
RDX: ffff880001a43c00 RSI: ffff88031d1b0200 RDI: 0000000000000001
RBP: ffff880001a43c30 R08: 0000000000000067 R09: ffff8803217268c0
R10: ffff88031d1b0228 R11: 00000000000005f2 R12: ffff88032d600000
R13: ffff10032f040890 R14: 0000000000000000 R15: ffff880330b6ae00
FS: 0000000000000000(0000) GS:ffff880001a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001d07be8 CR3: 0000000001642000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff8803321f0000, task ffff88033209f700)
Stack:
 ffff880001a43c30 ffff88031d1b0200 ffff88032d600908 ffff88031d1b0208
<0> ffff880001a43c90 ffffffff81344313 ffff88031d1b0200 ffff88032d600908
<0> ffff880001a43c70 ffffffff81061d07 000000004cc5bf73 ffff88031d1b0200
Call Trace:
 <IRQ>
 [<ffffffff81344313>] __netif_receive_skb+0x36/0x3b3
 [<ffffffff81061d07>] ? ktime_get_real+0x11/0x3e
 [<ffffffff813454a5>] netif_receive_skb+0x67/0x6e
 [<ffffffff81345b8c>] napi_skb_finish+0x24/0x3b
 [<ffffffff813cb07f>] vlan_gro_receive+0x7b/0x80
 [<ffffffffa016d5b9>] e1000_receive_skb+0x51/0x6d [e1000e]
 [<ffffffffa016eeb0>] e1000_clean_rx_irq+0x1ed/0x292 [e1000e]
 [<ffffffffa016f287>] e1000_clean+0x75/0x221 [e1000e]
 [<ffffffff81345690>] net_rx_action+0xad/0x19c
 [<ffffffff81048926>] __do_softirq+0xa8/0x135
 [<ffffffff8100a99c>] call_softirq+0x1c/0x30
 [<ffffffff8100c085>] do_softirq+0x41/0x7e
 [<ffffffff81048ab8>] irq_exit+0x36/0x85
 [<ffffffff8100b7bf>] do_IRQ+0xad/0xc4
 [<ffffffff813ed4d3>] ret_from_intr+0x0/0x11
 <EOI>
 [<ffffffff8120dab2>] ? intel_idle+0xe6/0x112
 [<ffffffff8120da95>] ? intel_idle+0xc9/0x112
 [<ffffffff8131d121>] cpuidle_idle_call+0xab/0xe6
 [<ffffffff81008dd5>] cpu_idle+0x59/0xb5
 [<ffffffff813e6da8>] start_secondary+0x1a9/0x1ae
Code: 0d 0f b7 c0 41 8b 44 85 04 66 c7 83 bc 00 00 00 00 00 89 43 78 4d 8b ad d8 00 00 00 e8 c1 95 e0 f

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
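Reading the oops: it is a general protection fault rather than a page fault. On x86-64, an access through a non-canonical address is one common way to get #GP, and R13 in the register dump (ffff10032f040890) is non-canonical: bits 63..48 are all 1 but bit 47 is 0. That is consistent with the code following a garbage pointer, though the dump alone does not show which instruction faulted. A minimal sketch of the check, assuming 48-bit virtual addresses (4-level paging); `is_canonical_48` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, assuming 48-bit virtual addresses (4-level paging):
 * an address is canonical when bits 63..47 are all copies of bit 47,
 * i.e. the 17 top bits are either all 0 or all 1.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t upper = addr >> 47;	/* the 17 bits 63..47 */

	return upper == 0 || upper == 0x1ffff;
}
```

For example, `is_canonical_48(0xffff10032f040890)` is false, while the kernel stack address in RSP (ffff880001a43c10) and the user address in CR2 (0000000001d07be8) both pass.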
* Re: Reproducible VLAN/e1000e crash in 2.6.36 vanilla. @ 2010-10-25 21:18 Ben Greear
From: Ben Greear @ 2010-10-25 21:18 UTC
To: NetDev

On 10/25/2010 10:57 AM, Ben Greear wrote:
> To re-create, setup 2 802.1q vlans on different physical interfaces on
> the same system,
> [...]
> http://www.spinics.net/lists/netdev/msg144748.html

Bleh, I think I see the problem.

If a NIC is in promiscuous mode, it can receive VLAN-tagged packets for which no VLAN device exists.

static gro_result_t
vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
		unsigned int vlan_tci, struct sk_buff *skb)
{
	struct sk_buff *p;
	struct net_device *vlan_dev;
	u16 vlan_id;

	if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
		skb->deliver_no_wcard = 1;

	skb->skb_iif = skb->dev->ifindex;
	__vlan_hwaccel_put_tag(skb, vlan_tci);
	vlan_id = vlan_tci & VLAN_VID_MASK;
	vlan_dev = vlan_group_get_device(grp, vlan_id);

	if (vlan_dev)
		skb->dev = vlan_dev;
	else if (vlan_id) {
		if (!(skb->dev->flags & IFF_PROMISC))
			goto drop;
		skb->pkt_type = PACKET_OTHERHOST;
	}

You hit that else branch, and skb->dev remains the physical device. Later, the skb is passed to:

int vlan_hwaccel_do_receive(struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct vlan_rx_stats *rx_stats;

	skb->dev = vlan_dev_info(dev)->real_dev;
	netif_nit_deliver(skb);

which does no checking before assuming that skb->dev is a VLAN device. vlan_dev_info() just returns netdev_priv(dev), so for the physical e1000e device the "real_dev" it reads actually comes from the driver's own private data. Things go downhill rapidly after that.

Maybe this code in dev.c should check that skb->dev is a VLAN device before passing the skb to the hwaccel code?

static int __netif_receive_skb(struct sk_buff *skb)
{
	struct packet_type *ptype, *pt_prev;
	rx_handler_func_t *rx_handler;
	struct net_device *orig_dev;
	struct net_device *master;
	struct net_device *null_or_orig;
	struct net_device *orig_or_bond;
	int ret = NET_RX_DROP;
	__be16 type;

	if (!netdev_tstamp_prequeue)
		net_timestamp_check(skb);

	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
		return NET_RX_SUCCESS;

Thanks,
Ben
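To make the failure path concrete, here is a hypothetical userspace model (not kernel code; every `model_*` name, flag value, and struct layout is invented for illustration) of the two steps described above: the device selection in vlan_gro_common(), which leaves skb->dev on the physical NIC for an unknown VLAN ID in promiscuous mode, and the hwaccel step guarded by the "is this really a VLAN device?" check Ben suggests. In the kernel, that test would presumably look at `dev->priv_flags & IFF_802_1Q_VLAN`; that is an assumption, not the patch that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: names, flag values and layouts are invented. */
#define MODEL_VLAN_VID_MASK	0x0fffu	/* same value as the kernel's VLAN_VID_MASK */
#define MODEL_IS_VLAN_DEV	0x1u	/* stands in for priv_flags & IFF_802_1Q_VLAN */
#define MODEL_PROMISC		0x2u	/* stands in for flags & IFF_PROMISC */
#define MODEL_NVIDS		4096

struct model_dev {
	const char *name;
	unsigned int flags;
	struct model_dev *real_dev;	/* meaningful only on VLAN devices */
};

struct model_vlan_group {
	struct model_dev *devs[MODEL_NVIDS];	/* indexed by VLAN ID */
};

struct model_skb {
	struct model_dev *dev;
	uint16_t vlan_tci;
	bool otherhost;			/* models pkt_type = PACKET_OTHERHOST */
};

enum model_verdict { MODEL_DROP, MODEL_TO_PHYS, MODEL_TO_VLAN };

/* Device selection as in the vlan_gro_common() excerpt; false means drop. */
static bool model_gro_select(struct model_vlan_group *grp, uint16_t tci,
			     struct model_skb *skb)
{
	uint16_t vid = tci & MODEL_VLAN_VID_MASK;
	struct model_dev *vlan_dev = grp->devs[vid];

	skb->vlan_tci = tci;
	if (vlan_dev)
		skb->dev = vlan_dev;
	else if (vid) {
		if (!(skb->dev->flags & MODEL_PROMISC))
			return false;		/* "goto drop" */
		skb->otherhost = true;		/* skb->dev stays the physical NIC */
	}
	return true;
}

/* hwaccel step with the proposed guard: only a VLAN device has a real_dev. */
static bool model_hwaccel_receive(const struct model_skb *skb)
{
	if (!(skb->dev->flags & MODEL_IS_VLAN_DEV))
		return false;	/* unguarded 2.6.36 code would misread the NIC here */
	assert(skb->dev->real_dev != NULL);
	return true;
}

static enum model_verdict model_receive(struct model_vlan_group *grp,
					uint16_t tci, struct model_skb *skb)
{
	if (!model_gro_select(grp, tci, skb))
		return MODEL_DROP;
	return model_hwaccel_receive(skb) ? MODEL_TO_VLAN : MODEL_TO_PHYS;
}
```

With a promiscuous eth2 that only has eth2.103 configured, a frame tagged 103 ends up on the VLAN device, while a frame tagged 104 stays on eth2 marked as "other host" instead of being handed to code that treats eth2 as a VLAN device.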
* Re: Reproducible VLAN/e1000e crash in 2.6.36 vanilla. @ 2010-10-25 21:34 John Fastabend
From: John Fastabend @ 2010-10-25 21:34 UTC
To: Ben Greear; +Cc: NetDev

On 10/25/2010 2:18 PM, Ben Greear wrote:
> On 10/25/2010 10:57 AM, Ben Greear wrote:
>> [...]
>
> Bleh, I think I see the problem.
>
> If a NIC is in promis mode, it can receive VLAN packets for which there
> are no VLAN devices.
>
> [...]
>
> You hit that else branch, and then skb->dev remains the physical
> device.
>
> Later, it's passed to:
>
> int vlan_hwaccel_do_receive(struct sk_buff *skb)
> {
> 	struct net_device *dev = skb->dev;
> 	struct vlan_rx_stats *rx_stats;
>
> 	skb->dev = vlan_dev_info(dev)->real_dev;
> 	netif_nit_deliver(skb);
>

Looks like this should be fixed in net-next:

bool vlan_hwaccel_do_receive(struct sk_buff **skbp)
{
	struct sk_buff *skb = *skbp;
	u16 vlan_id = skb->vlan_tci & VLAN_VID_MASK;
	struct net_device *vlan_dev;
	struct vlan_rx_stats *rx_stats;

	vlan_dev = vlan_find_dev(skb->dev, vlan_id);
	if (!vlan_dev) {
		if (vlan_id)
			skb->pkt_type = PACKET_OTHERHOST;
		return false;
	}

If vlan_dev is not found, skb->dev is left unchanged and the function returns false, so in __netif_receive_skb the frame simply continues through normal processing on the physical device:

	if (vlan_tx_tag_present(skb)) {
		if (pt_prev) {
			ret = deliver_skb(skb, pt_prev, orig_dev);
			pt_prev = NULL;
		}
		if (vlan_hwaccel_do_receive(&skb)) {
			ret = __netif_receive_skb(skb);
			goto out;
		} else if (unlikely(!skb))
			goto out;
	}
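The net-next shape can be modeled the same way. This is a hypothetical userspace sketch (every `nn_*` name is invented): the VLAN device is looked up by (real device, VLAN ID), as vlan_find_dev() does, instead of trusting skb->dev to already be a VLAN device, and a hit re-runs receive processing on the VLAN device. The quoted function is cut off after the miss path, so the hit path here (retarget skb->dev and consume the tag so the second pass stops) is an assumption, and the `unlikely(!skb)` branch is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the net-next flow; names are invented. */
#define NN_VLAN_VID_MASK 0x0fffu

struct nn_dev {
	const char *name;
};

struct nn_vlan_entry {		/* one configured VLAN device */
	const struct nn_dev *real_dev;
	uint16_t vid;
	struct nn_dev *vlan_dev;
};

struct nn_skb {
	struct nn_dev *dev;
	uint16_t vlan_tci;	/* the model treats 0 as "untagged" */
	bool otherhost;		/* models pkt_type = PACKET_OTHERHOST */
	int passes;		/* how many times nn_netif_receive() ran */
};

/* Stand-in for vlan_find_dev(): look up by (real device, VLAN ID). */
static struct nn_dev *nn_find_dev(const struct nn_vlan_entry *tbl, size_t n,
				  const struct nn_dev *real_dev, uint16_t vid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].real_dev == real_dev && tbl[i].vid == vid)
			return tbl[i].vlan_dev;
	return NULL;
}

static bool nn_hwaccel_do_receive(const struct nn_vlan_entry *tbl, size_t n,
				  struct nn_skb *skb)
{
	uint16_t vid = skb->vlan_tci & NN_VLAN_VID_MASK;
	struct nn_dev *vlan_dev = nn_find_dev(tbl, n, skb->dev, vid);

	if (!vlan_dev) {
		if (vid)
			skb->otherhost = true;
		return false;		/* skb->dev is left untouched */
	}
	/* Assumption about the elided part: retarget and consume the tag. */
	skb->dev = vlan_dev;
	skb->vlan_tci = 0;
	return true;
}

/* Mirrors the __netif_receive_skb() hunk; returns the device that ends up
 * handling the frame. */
static struct nn_dev *nn_netif_receive(const struct nn_vlan_entry *tbl,
				       size_t n, struct nn_skb *skb)
{
	skb->passes++;
	if (skb->vlan_tci && nn_hwaccel_do_receive(tbl, n, skb))
		return nn_netif_receive(tbl, n, skb);	/* second pass on the VLAN device */
	return skb->dev;
}
```

The design point the model captures is that a lookup miss never leads to reinterpreting the physical device as a VLAN device: an unknown-VID frame makes exactly one pass and stays on the NIC, while a known VID makes two passes and lands on the VLAN device.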
* Re: Reproducible VLAN/e1000e crash in 2.6.36 vanilla. @ 2010-10-25 21:38 Eric Dumazet
From: Eric Dumazet @ 2010-10-25 21:38 UTC
To: John Fastabend; +Cc: Ben Greear, NetDev

On Monday, October 25, 2010 at 14:34 -0700, John Fastabend wrote:
> On 10/25/2010 2:18 PM, Ben Greear wrote:
> > [...]
>
> Looks like this should be fixed on net-next,
>
> [...]
>
> If the vlan_dev is not found do not set skb->dev and return false then
> in __netif_receive_skb,
>
> [...]

Yes, but net-next is a totally different beast for VLANs ;)

We should make a patch for 2.6.36 rather than pull in the huge VLAN rework added for 2.6.37.