* [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. @ 2012-04-17 6:22 Peter Huang (Peng) 2012-04-17 15:52 ` Stephen Hemminger 2012-04-18 19:04 ` Jonathan Nieder 0 siblings, 2 replies; 9+ messages in thread From: Peter Huang (Peng) @ 2012-04-17 6:22 UTC (permalink / raw) To: shemminger, 'David S. Miller', netdev Cc: eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, peter.huangpeng, harry.majun When bridge is deleted before tap/vif device's delete, kernel may encounter an oops because of NULL reference to fake_rtable's dst. Set fake_rtable's dst to NULL before sending packets out can solve this problem. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> --- include/linux/netfilter_bridge.h | 8 ++++++++ net/bridge/br_forward.c | 1 + net/bridge/br_netfilter.c | 6 +----- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h index 0ddd161..70744fe 100644 --- a/include/linux/netfilter_bridge.h +++ b/include/linux/netfilter_bridge.h @@ -104,9 +104,17 @@ struct bridge_skb_cb { } daddr; }; +static inline void br_drop_fake_rtable(struct sk_buff *skb) { + struct dst_entry *dst = skb_dst(skb); + /* abuse fact that only fake_rtable has DST_NOPEER set */ + if (dst && (dst->flags & DST_NOPEER)) + skb_dst_drop(skb); +} + #else #define nf_bridge_maybe_copy_header(skb) (0) #define nf_bridge_pad(skb) (0) +#define br_drop_fake_rtable(skb) (0) #endif /* CONFIG_BRIDGE_NETFILTER */ #endif /* __KERNEL__ */ diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 61f6534..a2098e3 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -47,6 +47,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb) kfree_skb(skb); } else { skb_push(skb, ETH_HLEN); + br_drop_fake_rtable(skb); dev_queue_xmit(skb); } diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index dec4f38..946dcb0 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -694,11 +694,7 @@ static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff *skb, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - struct rtable *rt = skb_rtable(skb); - - if (rt && rt == bridge_parent_rtable(in)) - skb_dst_drop(skb); - + br_drop_fake_rtable(skb); return NF_ACCEPT; } -------------------------------- Peter Huang(peng) ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 6:22 [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops Peter Huang (Peng) @ 2012-04-17 15:52 ` Stephen Hemminger 2012-04-17 16:48 ` Eric Dumazet 2012-04-18 19:04 ` Jonathan Nieder 1 sibling, 1 reply; 9+ messages in thread From: Stephen Hemminger @ 2012-04-17 15:52 UTC (permalink / raw) To: Peter Huang (Peng) Cc: 'David S. Miller', netdev, eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On Tue, 17 Apr 2012 14:22:26 +0800 "Peter Huang (Peng)" <peter.huangpeng@huawei.com> wrote: > When bridge is deleted before tap/vif device's delete, kernel may encounter an oops because of NULL reference to fake_rtable's dst. > Set fake_rtable's dst to NULL before sending packets out can solve this problem. > > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> > --- > include/linux/netfilter_bridge.h | 8 ++++++++ > net/bridge/br_forward.c | 1 + > net/bridge/br_netfilter.c | 6 +----- > 3 files changed, 10 insertions(+), 5 deletions(-) > > diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h > index 0ddd161..70744fe 100644 > --- a/include/linux/netfilter_bridge.h > +++ b/include/linux/netfilter_bridge.h > @@ -104,9 +104,17 @@ struct bridge_skb_cb { > } daddr; > }; > > +static inline void br_drop_fake_rtable(struct sk_buff *skb) { > + struct dst_entry *dst = skb_dst(skb); > + /* abuse fact that only fake_rtable has DST_NOPEER set */ > + if (dst && (dst->flags & DST_NOPEER)) > + skb_dst_drop(skb); > +} This check seems like a disaster waiting to happen when the next change to DST table happens. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 15:52 ` Stephen Hemminger @ 2012-04-17 16:48 ` Eric Dumazet 2012-04-18 7:57 ` Peter Huang(Peng) 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2012-04-17 16:48 UTC (permalink / raw) To: Stephen Hemminger Cc: Peter Huang (Peng), 'David S. Miller', netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: > This check seems like a disaster waiting to happen when the next > change to DST table happens. Please Peter Document this, adding a new DST_FAKE_RTABLE flag #define DST_FAKE_RTABLE DST_NOPEER or just use a bit, we have plenty of them available. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 16:48 ` Eric Dumazet @ 2012-04-18 7:57 ` Peter Huang(Peng) 2012-04-18 8:23 ` Eric Dumazet 2012-04-18 8:41 ` David Miller 0 siblings, 2 replies; 9+ messages in thread From: Peter Huang(Peng) @ 2012-04-18 7:57 UTC (permalink / raw) To: Eric Dumazet, Stephen Hemminger Cc: 'David S. Miller', netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On 2012/4/18 0:48, Eric Dumazet wrote: > On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: > >> This check seems like a disaster waiting to happen when the next >> change to DST table happens. > > Please Peter Document this, adding a new DST_FAKE_RTABLE flag > > #define DST_FAKE_RTABLE DST_NOPEER > > or just use a bit, we have plenty of them available. > Add DST_FAKE_RTABLE to dst_entry, this is the new patch. Is this ok? Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> --- include/linux/netfilter_bridge.h | 8 ++++++++ include/net/dst.h | 1 + net/bridge/br_forward.c | 1 + net/bridge/br_netfilter.c | 8 ++------ 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h index 0ddd161..eb09e3b 100644 --- a/include/linux/netfilter_bridge.h +++ b/include/linux/netfilter_bridge.h @@ -104,9 +104,17 @@ struct bridge_skb_cb { } daddr; }; +static inline void br_drop_fake_rtable(struct sk_buff *skb) { + struct dst_entry *dst = skb_dst(skb); + /* abuse fact that only fake_rtable has DST_FAKE_RTABLE set */ + if (dst && (dst->flags & DST_FAKE_RTABLE)) + skb_dst_drop(skb); +} + #else #define nf_bridge_maybe_copy_header(skb) (0) #define nf_bridge_pad(skb) (0) +#define br_drop_fake_rtable(skb) (0) #endif /* CONFIG_BRIDGE_NETFILTER */ #endif /* __KERNEL__ */ diff --git a/include/net/dst.h b/include/net/dst.h index 59c5d18..b094030 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -55,6 +55,7 @@ struct dst_entry { #define DST_NOCACHE 0x0010 #define DST_NOCOUNT 0x0020 #define DST_NOPEER 0x0040 +#define DST_FAKE_RTABLE 0x0080 short error; short obsolete; diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 61f6534..a2098e3 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -47,6 +47,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb) kfree_skb(skb); } else { skb_push(skb, ETH_HLEN); + br_drop_fake_rtable(skb); dev_queue_xmit(skb); } diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index dec4f38..d7f49b6 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -156,7 +156,7 @@ void br_netfilter_rtable_init(struct net_bridge *br) rt->dst.dev = br->dev; rt->dst.path = &rt->dst; dst_init_metrics(&rt->dst, br_dst_default_metrics, true); - rt->dst.flags = DST_NOXFRM | DST_NOPEER; + rt->dst.flags = DST_NOXFRM | DST_NOPEER | DST_FAKE_RTABLE; rt->dst.ops = &fake_dst_ops; } @@ -694,11 +694,7 @@ static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff *skb, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - struct rtable *rt = skb_rtable(skb); - - if (rt && rt == bridge_parent_rtable(in)) - skb_dst_drop(skb); - + br_drop_fake_rtable(skb); return NF_ACCEPT; } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-18 7:57 ` Peter Huang(Peng) @ 2012-04-18 8:23 ` Eric Dumazet 2012-04-18 8:41 ` David Miller 1 sibling, 0 replies; 9+ messages in thread From: Eric Dumazet @ 2012-04-18 8:23 UTC (permalink / raw) To: Peter Huang(Peng) Cc: Stephen Hemminger, 'David S. Miller', netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On Wed, 2012-04-18 at 15:57 +0800, Peter Huang(Peng) wrote: > On 2012/4/18 0:48, Eric Dumazet wrote: > > On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: > > > >> This check seems like a disaster waiting to happen when the next > >> change to DST table happens. > > > > Please Peter Document this, adding a new DST_FAKE_RTABLE flag > > > > #define DST_FAKE_RTABLE DST_NOPEER > > > > or just use a bit, we have plenty of them available. > > > > > Add DST_FAKE_RTABLE to dst_entry, this is the new patch. > Is this ok? > A full new patch is needed, with nice changelog, and proper formatting (your mail was mangled) > }; > > +static inline void br_drop_fake_rtable(struct sk_buff *skb) { > + struct dst_entry *dst = skb_dst(skb); > + /* abuse fact that only fake_rtable has DST_FAKE_RTABLE set */ Remove the comment, since we dont abuse NOPEER anymore, we use a dedicated flag. (keep an empty line) > + if (dst && (dst->flags & DST_FAKE_RTABLE)) > + skb_dst_drop(skb); > +} > + > #else > #define nf_bridge_maybe_copy_header(skb) (0) > #define nf_bridge_pad(skb) (0) > +#define br_drop_fake_rtable(skb) (0) > #endif /* CONFIG_BRIDGE_NETFILTER */ > > #endif /* __KERNEL__ */ > diff --git a/include/net/dst.h b/include/net/dst.h > index 59c5d18..b094030 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -55,6 +55,7 @@ struct dst_entry { > #define DST_NOCACHE 0x0010 > #define DST_NOCOUNT 0x0020 > #define DST_NOPEER 0x0040 > +#define DST_FAKE_RTABLE 0x0080 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-18 7:57 ` Peter Huang(Peng) 2012-04-18 8:23 ` Eric Dumazet @ 2012-04-18 8:41 ` David Miller 1 sibling, 0 replies; 9+ messages in thread From: David Miller @ 2012-04-18 8:41 UTC (permalink / raw) To: peter.huangpeng Cc: eric.dumazet, shemminger, netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun From: "Peter Huang(Peng)" <peter.huangpeng@huawei.com> Date: Wed, 18 Apr 2012 15:57:23 +0800 > On 2012/4/18 0:48, Eric Dumazet wrote: >> On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: >> >>> This check seems like a disaster waiting to happen when the next >>> change to DST table happens. >> >> Please Peter Document this, adding a new DST_FAKE_RTABLE flag >> >> #define DST_FAKE_RTABLE DST_NOPEER >> >> or just use a bit, we have plenty of them available. >> > > > Add DST_FAKE_RTABLE to dst_entry, this is the new patch. > Is this ok? > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> Please post new patches as completely new emails, not as replies to other emails. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 6:22 [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops Peter Huang (Peng) 2012-04-17 15:52 ` Stephen Hemminger @ 2012-04-18 19:04 ` Jonathan Nieder 2012-04-19 7:58 ` Massimo Cetra 1 sibling, 1 reply; 9+ messages in thread From: Jonathan Nieder @ 2012-04-18 19:04 UTC (permalink / raw) To: Peter Huang (Peng) Cc: shemminger, 'David S. Miller', netdev, eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun Hi, Peter Huang (Peng) wrote: > When bridge is deleted before tap/vif device's delete, kernel may > encounter an oops because of NULL reference to fake_rtable's dst. > > Set fake_rtable's dst to NULL before sending packets out can solve > this problem. > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> > --- > include/linux/netfilter_bridge.h | 8 ++++++++ > net/bridge/br_forward.c | 1 + > net/bridge/br_netfilter.c | 6 +----- > 3 files changed, 10 insertions(+), 5 deletions(-) Massimo Cetra (cc-ed) tested the patch against a 3.2.y kernel and wrote[1]: > The patch i applied yesterday to the debian kernel has been installed > and the kernel is not panic-ing anymore. > > I'll try to keep this bug up to date. So it seems to work. Dave, please consider queuing this for stable@ when the final patch is ready. Thanks, Jonathan [1] http://bugs.debian.org/668511#37 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-18 19:04 ` Jonathan Nieder @ 2012-04-19 7:58 ` Massimo Cetra 0 siblings, 0 replies; 9+ messages in thread From: Massimo Cetra @ 2012-04-19 7:58 UTC (permalink / raw) To: Jonathan Nieder Cc: Peter Huang (Peng), shemminger, 'David S. Miller', netdev, eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On 18/04/2012 21:04, Jonathan Nieder wrote: > > Massimo Cetra (cc-ed) tested the patch against a 3.2.y kernel and wrote[1]: > >> The patch i applied yesterday to the debian kernel has been installed >> and the kernel is not panic-ing anymore. >> >> I'll try to keep this bug up to date. > > So it seems to work. Dave, please consider queuing this for stable@ > when the final patch is ready. Also please consider that the bridge problems are not over and the kernel keeps panicing in a different way... - http://marc.info/?t=133474395500004&r=1&w=2 Thanks, Massimo ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <002501cd0d74$317fd100$947f7300$%huangpeng@huawei.com>]
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. [not found] <002501cd0d74$317fd100$947f7300$%huangpeng@huawei.com> @ 2012-03-29 6:36 ` Eric Dumazet 0 siblings, 0 replies; 9+ messages in thread From: Eric Dumazet @ 2012-03-29 6:36 UTC (permalink / raw) To: Peter Huang (Peng); +Cc: linux-kernel, harry.majun, zhoukang7, netdev On Thu, 2012-03-29 at 14:21 +0800, Peter Huang (Peng) wrote: > In our environment, we encountered a kernel Oops problem, and caused a > restart. > CC netdev, since its more appropriate > Below are what happened: > kernel: 2.6.32.36-0.5-xen OS:xen + dom-0 + guest(rhel5.5) > 1.destroy one VM. > 2.ipsan path have some problem and make destroy process delayed about 10s. > 3.customer defined script find that VM no longer exsit through libvirt API. > 4.br0(related to the VM we are destoryed before) was deleted by the script. > 5.delayed VM destroy process come to tap device releasing, this will > decrement > skb->_skb_dst's reference count(skb->_skb_dst points to fake_rtable), but > br0 > deleting already released this struct, and unfortunately OS reused this > memory > and marked it read-only. > 6.Oops happened, and caused restart. > > After analyzing the stack dump info, we find out that during our VM destroy, > lots of ipv6 multicast pkts > exsited, and skb->_skb_dst pointed to (stuct)fake_rtable. > through kernel source greping, will only find one reference to fake_rtable's > MTU setting. > > So I'm wondering that what fake_rtable stands for, and where we are using > it. > If fake_rtable's dst is not used, we can make dst as NULL to avoid our > problem,. > I also added the patch which modified the skb->_skb_dst to NULL when > "skb->_skb_dst == (unsigned long)&to->br->fake_rtable". > > BTW, we also verified a similar senario on kernel-3.3, that br0 has attached > eth0 and eth1, eth1 was > connected to our guest which will multicast ipv6 packets, and you can get an > "WARNING: at net/core/dst.c:274 dst_release+0x6d/0x70()" > by using the fake_rtable_verify.c attached, > #gcc fake_rtable_verify.c > #./a.out & > #sleep 30 //make sure ipv6 pkts was in tap00's receiving queue. > #ifconfig br0 down > #brctl delbr br0 //delete br0, will also delete net_device's fake_rtable. > #sleep 50 > #kill -9 `pidof a.out` //tap00's delete will do dst_release, and this will > write to the memory already freed. > > Below is the Oops stack dump info: > //////////////////////////////////////////////////////////////////////////// > /// > RIP: e030:[<ffffffff802ddbd1>] > <ffffffff802ddbd1>{dst_release+0x11} > RSP: e02b:ffff88008b185b70 EFLAGS: 00010286 > RAX: 00000000ffffffff RBX: ffff880033d184c0 RCX: 0000000000000000 > RDX: ffff88008b54f080 RSI: 0000000012df12df RDI: ffff88008b54efc0 > RBP: ffff8800f4a3f500 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000002 R11: ffffffff8018c1e0 R12: ffff8800f4a3f400 > R13: 0000000000000001 R14: ffff8800f4a3f4e0 R15: ffff8800351030c0 > FS: 00007f4cbd080700(0000) GS:ffff880002008000(0000) knlGS:0000000000000000 > CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffff88008b54f080 CR3: 000000008a27c000 CR4: 0000000000002620 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > <ffffffff80009b05>{dump_trace+0x65} > <ffffffff8037d897>{notifier_call_chain+0x37} > <ffffffff8005a1ed>{notify_die+0x2d} > <ffffffff8037bd0b>{__die+0x8b} > <ffffffff8001bed1>{no_context+0xd1} > <ffffffff8001c1f5>{__bad_area_nosemaphore+0x175} > <ffffffff8037b298>{page_fault+0x28} > <ffffffff802ddbd1>{dst_release+0x11} > <ffffffff802cd69d>{skb_release_head_state+0xbd} > <ffffffff802cd369>{__kfree_skb+0x9} > <ffffffff802edaab>{pfifo_fast_reset+0x5b} > <ffffffff802edbd3>{qdisc_reset+0x13} > <ffffffff802edcc7>{dev_deactivate_queue+0x57} > <ffffffff802ee4bf>{dev_deactivate+0x3f} > <ffffffff802d9575>{dev_close+0x65} > <ffffffff802d960e>{rollback_registered+0x3e} > <ffffffff802d9715>{unregister_netdevice+0x15} > <ffffffffa0807655>{tun:tun_chr_close+0xe5} > <ffffffff800d9edd>{__fput+0xcd} > <ffffffff800d6076>{filp_close+0x56} > <ffffffff8003fd9a>{put_files_struct+0x7a} > <ffffffff80040fb2>{do_exit+0x752} > <ffffffff800410ef>{do_group_exit+0x3f} > <ffffffff8004d9d9>{get_signal_to_deliver+0x229} > <ffffffff80006acd>{do_notify_resume+0x11d} > <ffffffff8000763c>{int_signal+0x12} > [<00007f4cbc7fd57d>] > //////////////////////////////////////////////////////////////////////////// > /// > > Signed-off-by: Peter Huang(Peng) <peter.huangpeng@huawei.com> > --- > diff -Nur a/net/bridge/br_forward.c b/net/bridge/br_forward.c > @@ -91,6 +91,9 @@ > skb->dev = to->dev; > skb_forward_csum(skb); > > + if (skb->_skb_dst == (unsigned long)&to->br->fake_rtable) > + skb_dst_set(skb, NULL); > + > NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev, > br_forward_finish); > } Did you check current kernel has this bug ? I remember we already fix this, maybe you need a backport. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-04-19 8:04 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-04-17 6:22 [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops Peter Huang (Peng) 2012-04-17 15:52 ` Stephen Hemminger 2012-04-17 16:48 ` Eric Dumazet 2012-04-18 7:57 ` Peter Huang(Peng) 2012-04-18 8:23 ` Eric Dumazet 2012-04-18 8:41 ` David Miller 2012-04-18 19:04 ` Jonathan Nieder 2012-04-19 7:58 ` Massimo Cetra [not found] <002501cd0d74$317fd100$947f7300$%huangpeng@huawei.com> 2012-03-29 6:36 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).