* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. [not found] <002501cd0d74$317fd100$947f7300$%huangpeng@huawei.com> @ 2012-03-29 6:36 ` Eric Dumazet 2012-03-29 6:40 ` 答复: " Peter Huang (Peng) 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-03-29 6:36 UTC (permalink / raw) To: Peter Huang (Peng); +Cc: linux-kernel, harry.majun, zhoukang7, netdev On Thu, 2012-03-29 at 14:21 +0800, Peter Huang (Peng) wrote: > In our environment, we encountered a kernel Oops problem, and caused a > restart. > CC netdev, since its more appropriate > Below are what happened: > kernel: 2.6.32.36-0.5-xen OS:xen + dom-0 + guest(rhel5.5) > 1.destroy one VM. > 2.ipsan path have some problem and make destroy process delayed about 10s. > 3.customer defined script find that VM no longer exsit through libvirt API. > 4.br0(related to the VM we are destoryed before) was deleted by the script. > 5.delayed VM destroy process come to tap device releasing, this will > decrement > skb->_skb_dst's reference count(skb->_skb_dst points to fake_rtable), but > br0 > deleting already released this struct, and unfortunately OS reused this > memory > and marked it read-only. > 6.Oops happened, and caused restart. > > After analyzing the stack dump info, we find out that during our VM destroy, > lots of ipv6 multicast pkts > exsited, and skb->_skb_dst pointed to (stuct)fake_rtable. > through kernel source greping, will only find one reference to fake_rtable's > MTU setting. > > So I'm wondering that what fake_rtable stands for, and where we are using > it. > If fake_rtable's dst is not used, we can make dst as NULL to avoid our > problem,. > I also added the patch which modified the skb->_skb_dst to NULL when > "skb->_skb_dst == (unsigned long)&to->br->fake_rtable". > > BTW, we also verified a similar senario on kernel-3.3, that br0 has attached > eth0 and eth1, eth1 was > connected to our guest which will multicast ipv6 packets, and you can get an > "WARNING: at net/core/dst.c:274 dst_release+0x6d/0x70()" > by using the fake_rtable_verify.c attached, > #gcc fake_rtable_verify.c > #./a.out & > #sleep 30 //make sure ipv6 pkts was in tap00's receiving queue. > #ifconfig br0 down > #brctl delbr br0 //delete br0, will also delete net_device's fake_rtable. > #sleep 50 > #kill -9 `pidof a.out` //tap00's delete will do dst_release, and this will > write to the memory already freed. > > Below is the Oops stack dump info: > //////////////////////////////////////////////////////////////////////////// > /// > RIP: e030:[<ffffffff802ddbd1>] > <ffffffff802ddbd1>{dst_release+0x11} > RSP: e02b:ffff88008b185b70 EFLAGS: 00010286 > RAX: 00000000ffffffff RBX: ffff880033d184c0 RCX: 0000000000000000 > RDX: ffff88008b54f080 RSI: 0000000012df12df RDI: ffff88008b54efc0 > RBP: ffff8800f4a3f500 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000002 R11: ffffffff8018c1e0 R12: ffff8800f4a3f400 > R13: 0000000000000001 R14: ffff8800f4a3f4e0 R15: ffff8800351030c0 > FS: 00007f4cbd080700(0000) GS:ffff880002008000(0000) knlGS:0000000000000000 > CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffff88008b54f080 CR3: 000000008a27c000 CR4: 0000000000002620 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > <ffffffff80009b05>{dump_trace+0x65} > <ffffffff8037d897>{notifier_call_chain+0x37} > <ffffffff8005a1ed>{notify_die+0x2d} > <ffffffff8037bd0b>{__die+0x8b} > <ffffffff8001bed1>{no_context+0xd1} > <ffffffff8001c1f5>{__bad_area_nosemaphore+0x175} > <ffffffff8037b298>{page_fault+0x28} > <ffffffff802ddbd1>{dst_release+0x11} > <ffffffff802cd69d>{skb_release_head_state+0xbd} > <ffffffff802cd369>{__kfree_skb+0x9} > <ffffffff802edaab>{pfifo_fast_reset+0x5b} > <ffffffff802edbd3>{qdisc_reset+0x13} > <ffffffff802edcc7>{dev_deactivate_queue+0x57} > <ffffffff802ee4bf>{dev_deactivate+0x3f} > <ffffffff802d9575>{dev_close+0x65} > <ffffffff802d960e>{rollback_registered+0x3e} > <ffffffff802d9715>{unregister_netdevice+0x15} > <ffffffffa0807655>{tun:tun_chr_close+0xe5} > <ffffffff800d9edd>{__fput+0xcd} > <ffffffff800d6076>{filp_close+0x56} > <ffffffff8003fd9a>{put_files_struct+0x7a} > <ffffffff80040fb2>{do_exit+0x752} > <ffffffff800410ef>{do_group_exit+0x3f} > <ffffffff8004d9d9>{get_signal_to_deliver+0x229} > <ffffffff80006acd>{do_notify_resume+0x11d} > <ffffffff8000763c>{int_signal+0x12} > [<00007f4cbc7fd57d>] > //////////////////////////////////////////////////////////////////////////// > /// > > Signed-off-by: Peter Huang(Peng) <peter.huangpeng@huawei.com> > --- > diff -Nur a/net/bridge/br_forward.c b/net/bridge/br_forward.c > @@ -91,6 +91,9 @@ > skb->dev = to->dev; > skb_forward_csum(skb); > > + if (skb->_skb_dst == (unsigned long)&to->br->fake_rtable) > + skb_dst_set(skb, NULL); > + > NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev, > br_forward_finish); > } Did you check current kernel has this bug ? I remember we already fix this, maybe you need a backport. ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-29 6:36 ` [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops Eric Dumazet @ 2012-03-29 6:40 ` Peter Huang (Peng) 2012-03-29 8:52 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Peter Huang (Peng) @ 2012-03-29 6:40 UTC (permalink / raw) To: 'Eric Dumazet' Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' We already check current kernel-3.3, it has the same problem. I am not very sure that if this modify could cause other problems or not, Because I don't know where fake_rtable was used. -----邮件原件----- 发件人: Eric Dumazet [mailto:eric.dumazet@gmail.com] 发送时间: 2012年3月29日 14:36 收件人: Peter Huang (Peng) 抄送: linux-kernel@vger.kernel.org; harry.majun@huawei.com; zhoukang7@huawei.com; netdev 主题: Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. On Thu, 2012-03-29 at 14:21 +0800, Peter Huang (Peng) wrote: > In our environment, we encountered a kernel Oops problem, and caused a > restart. > CC netdev, since its more appropriate > Below are what happened: > kernel: 2.6.32.36-0.5-xen OS:xen + dom-0 + guest(rhel5.5) > 1.destroy one VM. > 2.ipsan path have some problem and make destroy process delayed about 10s. > 3.customer defined script find that VM no longer exsit through libvirt API. > 4.br0(related to the VM we are destoryed before) was deleted by the script. > 5.delayed VM destroy process come to tap device releasing, this will > decrement > skb->_skb_dst's reference count(skb->_skb_dst points to fake_rtable), but > br0 > deleting already released this struct, and unfortunately OS reused this > memory > and marked it read-only. > 6.Oops happened, and caused restart. > > After analyzing the stack dump info, we find out that during our VM destroy, > lots of ipv6 multicast pkts > exsited, and skb->_skb_dst pointed to (stuct)fake_rtable. > through kernel source greping, will only find one reference to fake_rtable's > MTU setting. > > So I'm wondering that what fake_rtable stands for, and where we are using > it. > If fake_rtable's dst is not used, we can make dst as NULL to avoid our > problem,. > I also added the patch which modified the skb->_skb_dst to NULL when > "skb->_skb_dst == (unsigned long)&to->br->fake_rtable". > > BTW, we also verified a similar senario on kernel-3.3, that br0 has attached > eth0 and eth1, eth1 was > connected to our guest which will multicast ipv6 packets, and you can get an > "WARNING: at net/core/dst.c:274 dst_release+0x6d/0x70()" > by using the fake_rtable_verify.c attached, > #gcc fake_rtable_verify.c > #./a.out & > #sleep 30 //make sure ipv6 pkts was in tap00's receiving queue. > #ifconfig br0 down > #brctl delbr br0 //delete br0, will also delete net_device's fake_rtable. > #sleep 50 > #kill -9 `pidof a.out` //tap00's delete will do dst_release, and this will > write to the memory already freed. > > Below is the Oops stack dump info: > //////////////////////////////////////////////////////////////////////////// > /// > RIP: e030:[<ffffffff802ddbd1>] > <ffffffff802ddbd1>{dst_release+0x11} > RSP: e02b:ffff88008b185b70 EFLAGS: 00010286 > RAX: 00000000ffffffff RBX: ffff880033d184c0 RCX: 0000000000000000 > RDX: ffff88008b54f080 RSI: 0000000012df12df RDI: ffff88008b54efc0 > RBP: ffff8800f4a3f500 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000002 R11: ffffffff8018c1e0 R12: ffff8800f4a3f400 > R13: 0000000000000001 R14: ffff8800f4a3f4e0 R15: ffff8800351030c0 > FS: 00007f4cbd080700(0000) GS:ffff880002008000(0000) knlGS:0000000000000000 > CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffff88008b54f080 CR3: 000000008a27c000 CR4: 0000000000002620 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > <ffffffff80009b05>{dump_trace+0x65} > <ffffffff8037d897>{notifier_call_chain+0x37} > <ffffffff8005a1ed>{notify_die+0x2d} > <ffffffff8037bd0b>{__die+0x8b} > <ffffffff8001bed1>{no_context+0xd1} > <ffffffff8001c1f5>{__bad_area_nosemaphore+0x175} > <ffffffff8037b298>{page_fault+0x28} > <ffffffff802ddbd1>{dst_release+0x11} > <ffffffff802cd69d>{skb_release_head_state+0xbd} > <ffffffff802cd369>{__kfree_skb+0x9} > <ffffffff802edaab>{pfifo_fast_reset+0x5b} > <ffffffff802edbd3>{qdisc_reset+0x13} > <ffffffff802edcc7>{dev_deactivate_queue+0x57} > <ffffffff802ee4bf>{dev_deactivate+0x3f} > <ffffffff802d9575>{dev_close+0x65} > <ffffffff802d960e>{rollback_registered+0x3e} > <ffffffff802d9715>{unregister_netdevice+0x15} > <ffffffffa0807655>{tun:tun_chr_close+0xe5} > <ffffffff800d9edd>{__fput+0xcd} > <ffffffff800d6076>{filp_close+0x56} > <ffffffff8003fd9a>{put_files_struct+0x7a} > <ffffffff80040fb2>{do_exit+0x752} > <ffffffff800410ef>{do_group_exit+0x3f} > <ffffffff8004d9d9>{get_signal_to_deliver+0x229} > <ffffffff80006acd>{do_notify_resume+0x11d} > <ffffffff8000763c>{int_signal+0x12} > [<00007f4cbc7fd57d>] > //////////////////////////////////////////////////////////////////////////// > /// > > Signed-off-by: Peter Huang(Peng) <peter.huangpeng@huawei.com> > --- > diff -Nur a/net/bridge/br_forward.c b/net/bridge/br_forward.c > @@ -91,6 +91,9 @@ > skb->dev = to->dev; > skb_forward_csum(skb); > > + if (skb->_skb_dst == (unsigned long)&to->br->fake_rtable) > + skb_dst_set(skb, NULL); > + > NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev, > br_forward_finish); > } Did you check current kernel has this bug ? I remember we already fix this, maybe you need a backport. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-29 6:40 ` 答复: " Peter Huang (Peng) @ 2012-03-29 8:52 ` Eric Dumazet 2012-03-29 9:38 ` 答复: " Peter Huang (Peng) 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-03-29 8:52 UTC (permalink / raw) To: Peter Huang (Peng); +Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' On Thu, 2012-03-29 at 14:40 +0800, Peter Huang (Peng) wrote: > We already check current kernel-3.3, it has the same problem. > > I am not very sure that if this modify could cause other problems or not, > Because I don't know where fake_rtable was used. Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value in PMTU (revised)) Apparently bug is because struct net_bridge is freed while its embedded fake_rtable is still used by some packets. I am not sure we are allowed to NULLify skb->dst, it might break netfilter. Maybe real fix would be to use a non embedded dst. ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-29 8:52 ` Eric Dumazet @ 2012-03-29 9:38 ` Peter Huang (Peng) 2012-03-29 11:31 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Peter Huang (Peng) @ 2012-03-29 9:38 UTC (permalink / raw) To: 'Eric Dumazet' Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' Thks for your mail. >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value >in PMTU (revised)) This patch already included in kernel-3.3, but for our case, virtual tap device's delayed Deletion will also cause kernel oops even in kernel3.3. >Apparently bug is because struct net_bridge is freed while its embedded >fake_rtable is still used by some packets. >I am not sure we are allowed to NULLify skb->dst, it might break >netfilter. Are you familiar with fake_rtable? I search the source, but only find one op that operate on the MTU. >Maybe real fix would be to use a non embedded dst. I agreed with you to, so at least until now this problem still exsits in the latest kernel, I will also look into it, And try to modify it from the root. Thanks again. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-29 9:38 ` 答复: " Peter Huang (Peng) @ 2012-03-29 11:31 ` Eric Dumazet 2012-03-29 11:41 ` 答复: " Peter Huang (Peng) 2012-03-31 1:26 ` Peter Huang (Peng) 0 siblings, 2 replies; 17+ messages in thread From: Eric Dumazet @ 2012-03-29 11:31 UTC (permalink / raw) To: Peter Huang (Peng); +Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' On Thu, 2012-03-29 at 17:38 +0800, Peter Huang (Peng) wrote: > Thks for your mail. > > >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce > >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value > >in PMTU (revised)) > > This patch already included in kernel-3.3, but for our case, virtual tap device's delayed > Deletion will also cause kernel oops even in kernel3.3. I was suggesting you take a look at the commit content ;) Then you can see the code in br_nf_local_in(), a bit cleaner than yours. ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-29 11:31 ` Eric Dumazet @ 2012-03-29 11:41 ` Peter Huang (Peng) 2012-03-31 1:26 ` Peter Huang (Peng) 1 sibling, 0 replies; 17+ messages in thread From: Peter Huang (Peng) @ 2012-03-29 11:41 UTC (permalink / raw) To: 'Eric Dumazet' Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' > >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce > >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value > >in PMTU (revised)) > > This patch already included in kernel-3.3, but for our case, virtual tap device's delayed > Deletion will also cause kernel oops even in kernel3.3. >I was suggesting you take a look at the commit content ;) >Then you can see the code in br_nf_local_in(), a bit cleaner than yours. Yes, I will look into it, and trying to find out why this patch didn't work for our case. Thanks for your advice:) ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-29 11:31 ` Eric Dumazet 2012-03-29 11:41 ` 答复: " Peter Huang (Peng) @ 2012-03-31 1:26 ` Peter Huang (Peng) 2012-03-31 5:41 ` Eric Dumazet 1 sibling, 1 reply; 17+ messages in thread From: Peter Huang (Peng) @ 2012-03-31 1:26 UTC (permalink / raw) To: 'Eric Dumazet' Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' > > >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce > > >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value > > >in PMTU (revised)) Hi, Eric I confirmed the patch contents again. For our case, NF_INET_PRE_ROUTING is involved, not NF_BR_LOCAL_IN. It seems pre-routing is not included in the patch you mentioned. BTW, our pkts are all ipv6 DHCP pkts(with MAC 33 33 00 01 00 02). We verified the patch bellow, and this works fine, no WARN_ON happened. -------------------------------- Peter Huang(peng) > On Thu, 2012-03-29 at 17:38 +0800, Peter Huang (Peng) wrote: > > Thks for your mail. > > > > >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce > > >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value > > >in PMTU (revised)) > > > > This patch already included in kernel-3.3, but for our case, virtual tap device's delayed > > Deletion will also cause kernel oops even in kernel3.3. > > I was suggesting you take a look at the commit content ;) > > Then you can see the code in br_nf_local_in(), a bit cleaner than yours. > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 答复: 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-31 1:26 ` Peter Huang (Peng) @ 2012-03-31 5:41 ` Eric Dumazet 2012-03-31 7:29 ` 答复: " Peter Huang (Peng) 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-03-31 5:41 UTC (permalink / raw) To: Peter Huang (Peng); +Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' On Sat, 2012-03-31 at 09:26 +0800, Peter Huang (Peng) wrote: > > > >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce > > > >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value > > > >in PMTU (revised)) > > Hi, Eric > > I confirmed the patch contents again. > For our case, NF_INET_PRE_ROUTING is involved, not NF_BR_LOCAL_IN. > It seems pre-routing is not included in the patch you mentioned. > BTW, our pkts are all ipv6 DHCP pkts(with MAC 33 33 00 01 00 02). > > We verified the patch bellow, and this works fine, no WARN_ON happened. Hi Peter I claim that your patch is not the good one and you need to refine it. First, code is not needed if CONFIG_BRIDGE_NETFILTER is not set. In fact, if CONFIG_BRIDGE_NETFILTER is not set, compilation will fail since fake_rtable doesnt exist in "struct net_bridge", so you fix a bug and introduce a new one. CC [M] net/bridge/br_forward.o net/bridge/br_forward.c: In function ‘__br_forward’: net/bridge/br_forward.c:94: error: ‘struct sk_buff’ has no member named ‘_skb_dst’ net/bridge/br_forward.c:94: error: ‘struct net_bridge’ has no member named ‘fake_rtable’ make[1]: *** [net/bridge/br_forward.o] Error 1 make: *** [_module_net/bridge] Error 2 Then, the test is using obsolete dst internals and cast that should not be in a C file. : if (skb->_skb_dst == (unsigned long)&to->br->fake_rtable) So I suggested you take a look at net/bridge/br_netfilter.c code to see how this can be done properly. Maybe you need to add a helper in an include file to make this proper. I hope this is now clear to you, because your initial patch cannot be applied as is. If you want full credit for this work, you must go a step forward, or else another guy will finish the job. Thanks ^ permalink raw reply [flat|nested] 17+ messages in thread
* 答复: 答复: 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-03-31 5:41 ` Eric Dumazet @ 2012-03-31 7:29 ` Peter Huang (Peng) 0 siblings, 0 replies; 17+ messages in thread From: Peter Huang (Peng) @ 2012-03-31 7:29 UTC (permalink / raw) To: 'Eric Dumazet' Cc: linux-kernel, harry.majun, zhoukang7, 'netdev' Hi, Eric Thanks very much for your reply. I will refine this patch and send it out later. -------------------------------- Peter Huang(peng) > -----邮件原件----- > 发件人: Eric Dumazet [mailto:eric.dumazet@gmail.com] > 发送时间: 2012年3月31日 13:41 > 收件人: Peter Huang (Peng) > 抄送: linux-kernel@vger.kernel.org; harry.majun@huawei.com; zhoukang7@huawei.com; 'netdev' > 主题: Re: 答复: 答复: 答复: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. > > On Sat, 2012-03-31 at 09:26 +0800, Peter Huang (Peng) wrote: > > > > >Check net/bridge/br_netfilter.c and commits e688a6048076 (net: introduce > > > > >DST_NOPEER dst flag ) 4adf0af6818f3ea5 (bridge: send correct MTU value > > > > >in PMTU (revised)) > > > > Hi, Eric > > > > I confirmed the patch contents again. > > For our case, NF_INET_PRE_ROUTING is involved, not NF_BR_LOCAL_IN. > > It seems pre-routing is not included in the patch you mentioned. > > BTW, our pkts are all ipv6 DHCP pkts(with MAC 33 33 00 01 00 02). > > > > We verified the patch bellow, and this works fine, no WARN_ON happened. > > Hi Peter > > I claim that your patch is not the good one and you need to refine it. > > First, code is not needed if CONFIG_BRIDGE_NETFILTER is not set. > > In fact, if CONFIG_BRIDGE_NETFILTER is not set, compilation will fail > since fake_rtable doesnt exist in "struct net_bridge", so you fix a bug > and introduce a new one. > > CC [M] net/bridge/br_forward.o > net/bridge/br_forward.c: In function ‘__br_forward’: > net/bridge/br_forward.c:94: error: ‘struct sk_buff’ has no member named ‘_skb_dst’ > net/bridge/br_forward.c:94: error: ‘struct net_bridge’ has no member named ‘fake_rtable’ > make[1]: *** [net/bridge/br_forward.o] Error 1 > make: *** [_module_net/bridge] Error 2 > > > Then, the test is using obsolete dst internals and cast that should not > be in a C file. : > > if (skb->_skb_dst == (unsigned long)&to->br->fake_rtable) > > So I suggested you take a look at net/bridge/br_netfilter.c code to see > how this can be done properly. Maybe you need to add a helper in an > include file to make this proper. > > I hope this is now clear to you, because your initial patch cannot be > applied as is. If you want full credit for this work, you must go a step > forward, or else another guy will finish the job. > > Thanks ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. @ 2012-04-17 6:22 Peter Huang (Peng) 2012-04-17 15:52 ` Stephen Hemminger 2012-04-18 19:04 ` Jonathan Nieder 0 siblings, 2 replies; 17+ messages in thread From: Peter Huang (Peng) @ 2012-04-17 6:22 UTC (permalink / raw) To: shemminger, 'David S. Miller', netdev Cc: eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, peter.huangpeng, harry.majun When bridge is deleted before tap/vif device's delete, kernel may encounter an oops because of NULL reference to fake_rtable's dst. Set fake_rtable's dst to NULL before sending packets out can solve this problem. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> --- include/linux/netfilter_bridge.h | 8 ++++++++ net/bridge/br_forward.c | 1 + net/bridge/br_netfilter.c | 6 +----- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h index 0ddd161..70744fe 100644 --- a/include/linux/netfilter_bridge.h +++ b/include/linux/netfilter_bridge.h @@ -104,9 +104,17 @@ struct bridge_skb_cb { } daddr; }; +static inline void br_drop_fake_rtable(struct sk_buff *skb) { + struct dst_entry *dst = skb_dst(skb); + /* abuse fact that only fake_rtable has DST_NOPEER set */ + if (dst && (dst->flags & DST_NOPEER)) + skb_dst_drop(skb); +} + #else #define nf_bridge_maybe_copy_header(skb) (0) #define nf_bridge_pad(skb) (0) +#define br_drop_fake_rtable(skb) (0) #endif /* CONFIG_BRIDGE_NETFILTER */ #endif /* __KERNEL__ */ diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 61f6534..a2098e3 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -47,6 +47,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb) kfree_skb(skb); } else { skb_push(skb, ETH_HLEN); + br_drop_fake_rtable(skb); dev_queue_xmit(skb); } diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index dec4f38..946dcb0 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -694,11 +694,7 @@ static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff *skb, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - struct rtable *rt = skb_rtable(skb); - - if (rt && rt == bridge_parent_rtable(in)) - skb_dst_drop(skb); - + br_drop_fake_rtable(skb); return NF_ACCEPT; } -------------------------------- Peter Huang(peng) ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 6:22 Peter Huang (Peng) @ 2012-04-17 15:52 ` Stephen Hemminger 2012-04-17 16:48 ` Eric Dumazet 2012-04-18 19:04 ` Jonathan Nieder 1 sibling, 1 reply; 17+ messages in thread From: Stephen Hemminger @ 2012-04-17 15:52 UTC (permalink / raw) To: Peter Huang (Peng) Cc: 'David S. Miller', netdev, eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On Tue, 17 Apr 2012 14:22:26 +0800 "Peter Huang (Peng)" <peter.huangpeng@huawei.com> wrote: > When bridge is deleted before tap/vif device's delete, kernel may encounter an oops because of NULL reference to fake_rtable's dst. > Set fake_rtable's dst to NULL before sending packets out can solve this problem. > > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> > --- > include/linux/netfilter_bridge.h | 8 ++++++++ > net/bridge/br_forward.c | 1 + > net/bridge/br_netfilter.c | 6 +----- > 3 files changed, 10 insertions(+), 5 deletions(-) > > diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h > index 0ddd161..70744fe 100644 > --- a/include/linux/netfilter_bridge.h > +++ b/include/linux/netfilter_bridge.h > @@ -104,9 +104,17 @@ struct bridge_skb_cb { > } daddr; > }; > > +static inline void br_drop_fake_rtable(struct sk_buff *skb) { > + struct dst_entry *dst = skb_dst(skb); > + /* abuse fact that only fake_rtable has DST_NOPEER set */ > + if (dst && (dst->flags & DST_NOPEER)) > + skb_dst_drop(skb); > +} This check seems like a disaster waiting to happen when the next change to DST table happens. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 15:52 ` Stephen Hemminger @ 2012-04-17 16:48 ` Eric Dumazet 2012-04-18 7:57 ` Peter Huang(Peng) 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2012-04-17 16:48 UTC (permalink / raw) To: Stephen Hemminger Cc: Peter Huang (Peng), 'David S. Miller', netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: > This check seems like a disaster waiting to happen when the next > change to DST table happens. Please Peter Document this, adding a new DST_FAKE_RTABLE flag #define DST_FAKE_RTABLE DST_NOPEER or just use a bit, we have plenty of them available. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 16:48 ` Eric Dumazet @ 2012-04-18 7:57 ` Peter Huang(Peng) 2012-04-18 8:23 ` Eric Dumazet 2012-04-18 8:41 ` David Miller 0 siblings, 2 replies; 17+ messages in thread From: Peter Huang(Peng) @ 2012-04-18 7:57 UTC (permalink / raw) To: Eric Dumazet, Stephen Hemminger Cc: 'David S. Miller', netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On 2012/4/18 0:48, Eric Dumazet wrote: > On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: > >> This check seems like a disaster waiting to happen when the next >> change to DST table happens. > > Please Peter Document this, adding a new DST_FAKE_RTABLE flag > > #define DST_FAKE_RTABLE DST_NOPEER > > or just use a bit, we have plenty of them available. > Add DST_FAKE_RTABLE to dst_entry, this is the new patch. Is this ok? Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> --- include/linux/netfilter_bridge.h | 8 ++++++++ include/net/dst.h | 1 + net/bridge/br_forward.c | 1 + net/bridge/br_netfilter.c | 8 ++------ 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h index 0ddd161..eb09e3b 100644 --- a/include/linux/netfilter_bridge.h +++ b/include/linux/netfilter_bridge.h @@ -104,9 +104,17 @@ struct bridge_skb_cb { } daddr; }; +static inline void br_drop_fake_rtable(struct sk_buff *skb) { + struct dst_entry *dst = skb_dst(skb); + /* abuse fact that only fake_rtable has DST_FAKE_RTABLE set */ + if (dst && (dst->flags & DST_FAKE_RTABLE)) + skb_dst_drop(skb); +} + #else #define nf_bridge_maybe_copy_header(skb) (0) #define nf_bridge_pad(skb) (0) +#define br_drop_fake_rtable(skb) (0) #endif /* CONFIG_BRIDGE_NETFILTER */ #endif /* __KERNEL__ */ diff --git a/include/net/dst.h b/include/net/dst.h index 59c5d18..b094030 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -55,6 +55,7 @@ struct dst_entry { #define DST_NOCACHE 0x0010 #define DST_NOCOUNT 0x0020 #define DST_NOPEER 0x0040 +#define DST_FAKE_RTABLE 0x0080 short error; short obsolete; diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 61f6534..a2098e3 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -47,6 +47,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb) kfree_skb(skb); } else { skb_push(skb, ETH_HLEN); + br_drop_fake_rtable(skb); dev_queue_xmit(skb); } diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index dec4f38..d7f49b6 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -156,7 +156,7 @@ void br_netfilter_rtable_init(struct net_bridge *br) rt->dst.dev = br->dev; rt->dst.path = &rt->dst; dst_init_metrics(&rt->dst, br_dst_default_metrics, true); - rt->dst.flags = DST_NOXFRM | DST_NOPEER; + rt->dst.flags = DST_NOXFRM | DST_NOPEER | DST_FAKE_RTABLE; rt->dst.ops = &fake_dst_ops; } @@ -694,11 +694,7 @@ static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff *skb, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - struct rtable *rt = skb_rtable(skb); - - if (rt && rt == bridge_parent_rtable(in)) - skb_dst_drop(skb); - + br_drop_fake_rtable(skb); return NF_ACCEPT; } ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-18 7:57 ` Peter Huang(Peng) @ 2012-04-18 8:23 ` Eric Dumazet 2012-04-18 8:41 ` David Miller 1 sibling, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2012-04-18 8:23 UTC (permalink / raw) To: Peter Huang(Peng) Cc: Stephen Hemminger, 'David S. Miller', netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On Wed, 2012-04-18 at 15:57 +0800, Peter Huang(Peng) wrote: > On 2012/4/18 0:48, Eric Dumazet wrote: > > On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: > > > >> This check seems like a disaster waiting to happen when the next > >> change to DST table happens. > > > > Please Peter Document this, adding a new DST_FAKE_RTABLE flag > > > > #define DST_FAKE_RTABLE DST_NOPEER > > > > or just use a bit, we have plenty of them available. > > > > > Add DST_FAKE_RTABLE to dst_entry, this is the new patch. > Is this ok? > A full new patch is needed, with nice changelog, and proper formatting (your mail was mangled) > }; > > +static inline void br_drop_fake_rtable(struct sk_buff *skb) { > + struct dst_entry *dst = skb_dst(skb); > + /* abuse fact that only fake_rtable has DST_FAKE_RTABLE set */ Remove the comment, since we dont abuse NOPEER anymore, we use a dedicated flag. (keep an empty line) > + if (dst && (dst->flags & DST_FAKE_RTABLE)) > + skb_dst_drop(skb); > +} > + > #else > #define nf_bridge_maybe_copy_header(skb) (0) > #define nf_bridge_pad(skb) (0) > +#define br_drop_fake_rtable(skb) (0) > #endif /* CONFIG_BRIDGE_NETFILTER */ > > #endif /* __KERNEL__ */ > diff --git a/include/net/dst.h b/include/net/dst.h > index 59c5d18..b094030 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -55,6 +55,7 @@ struct dst_entry { > #define DST_NOCACHE 0x0010 > #define DST_NOCOUNT 0x0020 > #define DST_NOPEER 0x0040 > +#define DST_FAKE_RTABLE 0x0080 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-18 7:57 ` Peter Huang(Peng) 2012-04-18 8:23 ` Eric Dumazet @ 2012-04-18 8:41 ` David Miller 1 sibling, 0 replies; 17+ messages in thread From: David Miller @ 2012-04-18 8:41 UTC (permalink / raw) To: peter.huangpeng Cc: eric.dumazet, shemminger, netdev, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun From: "Peter Huang(Peng)" <peter.huangpeng@huawei.com> Date: Wed, 18 Apr 2012 15:57:23 +0800 > On 2012/4/18 0:48, Eric Dumazet wrote: >> On Tue, 2012-04-17 at 08:52 -0700, Stephen Hemminger wrote: >> >>> This check seems like a disaster waiting to happen when the next >>> change to DST table happens. >> >> Please Peter Document this, adding a new DST_FAKE_RTABLE flag >> >> #define DST_FAKE_RTABLE DST_NOPEER >> >> or just use a bit, we have plenty of them available. >> > > > Add DST_FAKE_RTABLE to dst_entry, this is the new patch. > Is this ok? > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> Please post new patches as completely new emails, not as replies to other emails. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-17 6:22 Peter Huang (Peng) 2012-04-17 15:52 ` Stephen Hemminger @ 2012-04-18 19:04 ` Jonathan Nieder 2012-04-19 7:58 ` Massimo Cetra 1 sibling, 1 reply; 17+ messages in thread From: Jonathan Nieder @ 2012-04-18 19:04 UTC (permalink / raw) To: Peter Huang (Peng) Cc: shemminger, 'David S. Miller', netdev, eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun Hi, Peter Huang (Peng) wrote: > When bridge is deleted before tap/vif device's delete, kernel may > encounter an oops because of NULL reference to fake_rtable's dst. > > Set fake_rtable's dst to NULL before sending packets out can solve > this problem. > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> > --- > include/linux/netfilter_bridge.h | 8 ++++++++ > net/bridge/br_forward.c | 1 + > net/bridge/br_netfilter.c | 6 +----- > 3 files changed, 10 insertions(+), 5 deletions(-) Massimo Cetra (cc-ed) tested the patch against a 3.2.y kernel and wrote[1]: > The patch i applied yesterday to the debian kernel has been installed > and the kernel is not panic-ing anymore. > > I'll try to keep this bug up to date. So it seems to work. Dave, please consider queuing this for stable@ when the final patch is ready. Thanks, Jonathan [1] http://bugs.debian.org/668511#37 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops. 2012-04-18 19:04 ` Jonathan Nieder @ 2012-04-19 7:58 ` Massimo Cetra 0 siblings, 0 replies; 17+ messages in thread From: Massimo Cetra @ 2012-04-19 7:58 UTC (permalink / raw) To: Jonathan Nieder Cc: Peter Huang (Peng), shemminger, 'David S. Miller', netdev, eric.dumazet, linux-kernel, ctrix+debianbugs, peter.huangpeng, harry.majun On 18/04/2012 21:04, Jonathan Nieder wrote: > > Massimo Cetra (cc-ed) tested the patch against a 3.2.y kernel and wrote[1]: > >> The patch i applied yesterday to the debian kernel has been installed >> and the kernel is not panic-ing anymore. >> >> I'll try to keep this bug up to date. > > So it seems to work. Dave, please consider queuing this for stable@ > when the final patch is ready. Also please consider that the bridge problems are not over and the kernel keeps panicing in a different way... - http://marc.info/?t=133474395500004&r=1&w=2 Thanks, Massimo ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2012-04-19 8:04 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <002501cd0d74$317fd100$947f7300$%huangpeng@huawei.com>
2012-03-29 6:36 ` [PATCH] set fake_rtable's dst to NULL to avoid kernel Oops Eric Dumazet
2012-03-29 6:40 ` 答复: " Peter Huang (Peng)
2012-03-29 8:52 ` Eric Dumazet
2012-03-29 9:38 ` 答复: " Peter Huang (Peng)
2012-03-29 11:31 ` Eric Dumazet
2012-03-29 11:41 ` 答复: " Peter Huang (Peng)
2012-03-31 1:26 ` Peter Huang (Peng)
2012-03-31 5:41 ` Eric Dumazet
2012-03-31 7:29 ` 答复: " Peter Huang (Peng)
2012-04-17 6:22 Peter Huang (Peng)
2012-04-17 15:52 ` Stephen Hemminger
2012-04-17 16:48 ` Eric Dumazet
2012-04-18 7:57 ` Peter Huang(Peng)
2012-04-18 8:23 ` Eric Dumazet
2012-04-18 8:41 ` David Miller
2012-04-18 19:04 ` Jonathan Nieder
2012-04-19 7:58 ` Massimo Cetra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox