* [PATCH] net bridge: add null pointer check, fix panic [not found] <51C2710D.2060405@gmail.com> @ 2013-06-20 3:08 ` xiaoming gao 2013-06-20 4:55 ` Eric Dumazet 0 siblings, 1 reply; 7+ messages in thread From: xiaoming gao @ 2013-06-20 3:08 UTC (permalink / raw) To: stephen, davem, bridge, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 4353 bytes --] From: newtongao <newtongao@tencent.com> Date: Wed, 19 Jun 2013 14:58:33 +0800 Subject: [PATCH] net bridge: add null pointer check,fix panic in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge, but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using, so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic. kernel 3.4 also has this bug,i have verified. mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet. method to reproduce null pointer panic: 1. in function del_nbp,amplify the gap between "dev->priv_flags &= ~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new line "while(1);". 2. create net bridge testbr and bind some network interface(e.g. testif) on it. 3. send packets to testif frequetly,such as ping testif. 4. brctl delif testbr testif. backtrace: [1002130.426411] FS: 00007f2235153700(0000) GS:ffff88007b020000(0000) knlGS:0000000000000000 [1002130.524656] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [1002130.594931] CR2: 0000000000000021 CR3: 0000000001dc7000 CR4: 0000000000002660 [1002130.681798] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1002130.768654] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [1002130.855525] Process ksoftirqd/1 (pid: 9, threadinfo ffff880062642000, task ffff880062640800) [1002130.957868] Stack: [1002130.983758] 0000000080000000 ffff88007bc06a40 ffff88005e3c6c80 ffffffff81901d10 [1002131.073887] ffff88005efc2000 0000000000000001 ffff880062643bf0 ffffffff818497d9 [1002131.163945] ffff88006147e090 0000000000000020 ffff88007bc06a40 ffff88005e3c6c80 [1002131.254023] Call Trace: [1002131.285086] [<ffffffff81901d10>] ? br_handle_frame_finish+0x2b0/0x2b0 [1002131.364677] [<ffffffff818497d9>] __netif_receive_skb+0x109/0x4a0 [1002131.439081] [<ffffffff81849b12>] __netif_receive_skb+0x442/0x4a0 [1002131.513493] [<ffffffff810e3a6e>] ? __kmalloc_node+0x3e/0x50 [1002131.582744] [<ffffffff8184b7d8>] netif_receive_skb+0x78/0x80 [1002131.653053] [<ffffffff8184b928>] napi_skb_finish+0x48/0x60 [1002131.721269] [<ffffffff8184bf1d>] napi_gro_receive+0xfd/0x130 [1002131.791551] [<ffffffff8192b2f6>] vlan_gro_receive+0x16/0x20 [1002131.860796] [<ffffffff816a41d1>] igb_poll+0x6f1/0xdc0 [1002131.923871] [<ffffffff8196c0d5>] ? __schedule+0x2e5/0x810 [1002131.991071] [<ffffffff8196e38b>] ? _raw_spin_lock_irq+0xb/0x30 [1002132.063421] [<ffffffff8184c0c1>] net_rx_action+0xa1/0x1d0 [1002132.130619] [<ffffffff81057449>] __do_softirq+0x99/0x130 [1002132.196803] [<ffffffff8105759a>] run_ksoftirqd+0xba/0x170 [1002132.264000] [<ffffffff810574e0>] ? __do_softirq+0x130/0x130 [1002132.333258] [<ffffffff8106d306>] kthread+0x96/0xa0 [1002132.393224] [<ffffffff8196ffe4>] kernel_thread_helper+0x4/0x10 [1002132.465580] [<ffffffff8196f0f6>] ? int_ret_from_sys_call+0x7/0x1b [1002132.541016] [<ffffffff8196e8a1>] ? retint_restore_args+0x5/0x6 [1002132.613368] [<ffffffff8196ffe0>] ? gs_change+0x13/0x13 [1002132.677466] Code: 02 f6 81 a5 01 00 00 40 48 8b 81 e0 02 00 00 89 d6 4c 0f 45 f0 48 8b 05 50 76 20 00 66 33 70 04 66 f7 c6 ff f0 0f 84 d0 00 00 00 [1002132.835500] 0f b6 46 21 3c 02 74 50 3c 03 0f 85 60 ff ff ff 48 8b 05 e1 [1002132.920864] RIP [<ffffffff81901e17>] br_handle_frame+0x107/0x260 [1002132.995296] RSP <ffff880062643b50> [1002133.038735] CR2: 0000000000000021 Signed-off-by: Xiaoming Gao <newtongao@tencent.com> --- net/bridge/br_input.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index f06ee39..c8b365f 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -122,7 +122,8 @@ static int br_handle_local_finish(struct sk_buff *skb) { struct net_bridge_port *p = br_port_get_rcu(skb->dev); - br_fdb_update(p->br, p, eth_hdr(skb)->h_source); + if (p) + br_fdb_update(p->br, p, eth_hdr(skb)->h_source); return 0; /* process further */ } @@ -160,6 +161,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) return RX_HANDLER_CONSUMED; p = br_port_get_rcu(skb->dev); + if (!p) + goto drop; if (unlikely(is_link_local(dest))) { /* Pause frames shouldn't be passed up by driver anyway */ -- 1.7.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-net-bridge-add-null-pointer-check.patch --] [-- Type: text/plain; charset=gb18030; name="0001-net-bridge-add-null-pointer-check.patch", Size: 4422 bytes --] From 2b39c4ccd812723891117184b297f6886dcfd976 Mon Sep 17 00:00:00 2001 From: newtongao <newtongao@tencent.com> Date: Wed, 19 Jun 2013 14:58:33 +0800 Subject: [PATCH] net bridge: add null pointer check,fix panic in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge, but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using, so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic. kernel 3.4 also has this bug,i have verified. mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet. method to reproduce null pointer panic: 1. in function del_nbp,amplify the gap between "dev->priv_flags &= ~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new line "while(1);". 2. create net bridge testbr and bind some network interface(e.g. testif) on it. 3. send packets to testif frequetly,such as ping testif. 4. brctl delif testbr testif. backtrace: [1002130.426411] FS: 00007f2235153700(0000) GS:ffff88007b020000(0000) knlGS:0000000000000000 [1002130.524656] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [1002130.594931] CR2: 0000000000000021 CR3: 0000000001dc7000 CR4: 0000000000002660 [1002130.681798] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1002130.768654] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [1002130.855525] Process ksoftirqd/1 (pid: 9, threadinfo ffff880062642000, task ffff880062640800) [1002130.957868] Stack: [1002130.983758] 0000000080000000 ffff88007bc06a40 ffff88005e3c6c80 ffffffff81901d10 [1002131.073887] ffff88005efc2000 0000000000000001 ffff880062643bf0 ffffffff818497d9 [1002131.163945] ffff88006147e090 0000000000000020 ffff88007bc06a40 ffff88005e3c6c80 [1002131.254023] Call Trace: [1002131.285086] [<ffffffff81901d10>] ? br_handle_frame_finish+0x2b0/0x2b0 [1002131.364677] [<ffffffff818497d9>] __netif_receive_skb+0x109/0x4a0 [1002131.439081] [<ffffffff81849b12>] __netif_receive_skb+0x442/0x4a0 [1002131.513493] [<ffffffff810e3a6e>] ? __kmalloc_node+0x3e/0x50 [1002131.582744] [<ffffffff8184b7d8>] netif_receive_skb+0x78/0x80 [1002131.653053] [<ffffffff8184b928>] napi_skb_finish+0x48/0x60 [1002131.721269] [<ffffffff8184bf1d>] napi_gro_receive+0xfd/0x130 [1002131.791551] [<ffffffff8192b2f6>] vlan_gro_receive+0x16/0x20 [1002131.860796] [<ffffffff816a41d1>] igb_poll+0x6f1/0xdc0 [1002131.923871] [<ffffffff8196c0d5>] ? __schedule+0x2e5/0x810 [1002131.991071] [<ffffffff8196e38b>] ? _raw_spin_lock_irq+0xb/0x30 [1002132.063421] [<ffffffff8184c0c1>] net_rx_action+0xa1/0x1d0 [1002132.130619] [<ffffffff81057449>] __do_softirq+0x99/0x130 [1002132.196803] [<ffffffff8105759a>] run_ksoftirqd+0xba/0x170 [1002132.264000] [<ffffffff810574e0>] ? __do_softirq+0x130/0x130 [1002132.333258] [<ffffffff8106d306>] kthread+0x96/0xa0 [1002132.393224] [<ffffffff8196ffe4>] kernel_thread_helper+0x4/0x10 [1002132.465580] [<ffffffff8196f0f6>] ? int_ret_from_sys_call+0x7/0x1b [1002132.541016] [<ffffffff8196e8a1>] ? retint_restore_args+0x5/0x6 [1002132.613368] [<ffffffff8196ffe0>] ? gs_change+0x13/0x13 [1002132.677466] Code: 02 f6 81 a5 01 00 00 40 48 8b 81 e0 02 00 00 89 d6 4c 0f 45 f0 48 8b 05 50 76 20 00 66 33 70 04 66 f7 c6 ff f0 0f 84 d0 00 00 00 [1002132.835500] 0f b6 46 21 3c 02 74 50 3c 03 0f 85 60 ff ff ff 48 8b 05 e1 [1002132.920864] RIP [<ffffffff81901e17>] br_handle_frame+0x107/0x260 [1002132.995296] RSP <ffff880062643b50> [1002133.038735] CR2: 0000000000000021 Signed-off-by: Xiaoming Gao <newtongao@tencent.com> --- net/bridge/br_input.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index f06ee39..c8b365f 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -122,7 +122,8 @@ static int br_handle_local_finish(struct sk_buff *skb) { struct net_bridge_port *p = br_port_get_rcu(skb->dev); - br_fdb_update(p->br, p, eth_hdr(skb)->h_source); + if (p) + br_fdb_update(p->br, p, eth_hdr(skb)->h_source); return 0; /* process further */ } @@ -160,6 +161,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) return RX_HANDLER_CONSUMED; p = br_port_get_rcu(skb->dev); + if (!p) + goto drop; if (unlikely(is_link_local(dest))) { /* Pause frames shouldn't be passed up by driver anyway */ -- 1.7.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic 2013-06-20 3:08 ` [PATCH] net bridge: add null pointer check, fix panic xiaoming gao @ 2013-06-20 4:55 ` Eric Dumazet 2013-06-20 7:00 ` xiaoming gao 0 siblings, 1 reply; 7+ messages in thread From: Eric Dumazet @ 2013-06-20 4:55 UTC (permalink / raw) To: xiaoming gao; +Cc: stephen, davem, bridge, netdev, linux-kernel On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote: > From: newtongao <newtongao@tencent.com> > Date: Wed, 19 Jun 2013 14:58:33 +0800 > Subject: [PATCH] net bridge: add null pointer check,fix panic > > in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge, > but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using, > so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic. > > kernel 3.4 also has this bug,i have verified. > mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet. Please check current version before sending a patch. This was most probably fixed in commit 00cfec37484761a44 ("net: add a synchronize_net() in netdev_rx_handler_unregister()") Thanks ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic 2013-06-20 4:55 ` Eric Dumazet @ 2013-06-20 7:00 ` xiaoming gao 2013-06-20 7:29 ` Eric Dumazet 0 siblings, 1 reply; 7+ messages in thread From: xiaoming gao @ 2013-06-20 7:00 UTC (permalink / raw) To: Eric Dumazet; +Cc: stephen, davem, bridge, netdev, linux-kernel Eric Dumazet said, at 2013-6-20 12:55: > On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote: >> From: newtongao <newtongao@tencent.com> >> Date: Wed, 19 Jun 2013 14:58:33 +0800 >> Subject: [PATCH] net bridge: add null pointer check,fix panic >> >> in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge, >> but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using, >> so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic. >> >> kernel 3.4 also has this bug,i have verified. >> mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet. > > Please check current version before sending a patch. > > This was most probably fixed in commit 00cfec37484761a44 > ("net: add a synchronize_net() in netdev_rx_handler_unregister()") > > Thanks > > HI Eric the problem is as follow: br_del_if()-->del_nbp(): list_del_rcu(&p->list); dev->priv_flags &= ~IFF_BRIDGE_PORT; ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame() ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL ------>so in br_handle_frame , there will be a null panic. netdev_rx_handler_unregister(dev); synchronize_net(); i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic 2013-06-20 7:00 ` xiaoming gao @ 2013-06-20 7:29 ` Eric Dumazet 2013-06-20 7:47 ` xiaoming gao 2013-11-11 10:27 ` Alexander Y. Fomichev 0 siblings, 2 replies; 7+ messages in thread From: Eric Dumazet @ 2013-06-20 7:29 UTC (permalink / raw) To: xiaoming gao; +Cc: stephen, davem, bridge, netdev, linux-kernel On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote: > HI Eric > the problem is as follow: > br_del_if()-->del_nbp(): > > list_del_rcu(&p->list); > dev->priv_flags &= ~IFF_BRIDGE_PORT; > > ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame() > ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL > ------>so in br_handle_frame , there will be a null panic. > > netdev_rx_handler_unregister(dev); > synchronize_net(); This code is no longer like that in current tree. Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd ("bridge: remove a redundant synchronize_net()") > > > i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. I claim adding NULL tests is not needed in the fast path, it was clearly stated in the changelog. I would change the dismantle path to make sure br_get_port_rcu() does not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT Try something like that : diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 1b8b8b8..2edfb80 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb) int br_handle_frame_finish(struct sk_buff *skb) { const unsigned char *dest = eth_hdr(skb)->h_dest; - struct net_bridge_port *p = br_port_get_rcu(skb->dev); + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); struct net_bridge *br; struct net_bridge_fdb_entry *dst; struct net_bridge_mdb_entry *mdst; @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb) bool unicast = true; u16 vid = 0; - if (!p || p->state == BR_STATE_DISABLED) + if (p->state == BR_STATE_DISABLED) goto drop; if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) @@ -142,7 +142,7 @@ drop: /* note: already called with rcu_read_lock */ static int br_handle_local_finish(struct sk_buff *skb) { - struct net_bridge_port *p = br_port_get_rcu(skb->dev); + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); u16 vid = 0; br_vlan_get_tag(skb, &vid); @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) if (!skb) return RX_HANDLER_CONSUMED; - p = br_port_get_rcu(skb->dev); + p = __br_port_get_rcu(skb->dev); if (unlikely(is_link_local_ether_addr(dest))) { /* diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 3be89b3..9fdd467 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -184,6 +184,11 @@ struct net_bridge_port #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT) +static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev) +{ + return rcu_dereference(dev->rx_handler_data); +} + static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) { struct net_bridge_port *port = ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic 2013-06-20 7:29 ` Eric Dumazet @ 2013-06-20 7:47 ` xiaoming gao 2013-06-20 8:14 ` Eric Dumazet 2013-11-11 10:27 ` Alexander Y. Fomichev 1 sibling, 1 reply; 7+ messages in thread From: xiaoming gao @ 2013-06-20 7:47 UTC (permalink / raw) To: Eric Dumazet; +Cc: stephen, davem, bridge, netdev, linux-kernel Eric Dumazet said, at 2013-6-20 15:29: > On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote: > >> HI Eric >> the problem is as follow: >> br_del_if()-->del_nbp(): >> >> list_del_rcu(&p->list); >> dev->priv_flags &= ~IFF_BRIDGE_PORT; >> >> ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame() >> ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL >> ------>so in br_handle_frame , there will be a null panic. >> >> netdev_rx_handler_unregister(dev); >> synchronize_net(); > > This code is no longer like that in current tree. > Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd > ("bridge: remove a redundant synchronize_net()") > >> >> >> i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. > > I claim adding NULL tests is not needed in the fast path, it was clearly > stated in the changelog. > > I would change the dismantle path to make sure br_get_port_rcu() does > not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT > > Try something like that : > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > index 1b8b8b8..2edfb80 100644 > --- a/net/bridge/br_input.c > +++ b/net/bridge/br_input.c > @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb) > int br_handle_frame_finish(struct sk_buff *skb) > { > const unsigned char *dest = eth_hdr(skb)->h_dest; > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > struct net_bridge *br; > struct net_bridge_fdb_entry *dst; > struct net_bridge_mdb_entry *mdst; > @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb) > bool unicast = true; > u16 vid = 0; > > - if (!p || p->state == BR_STATE_DISABLED) > + if (p->state == BR_STATE_DISABLED) > goto drop; > > if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) > @@ -142,7 +142,7 @@ drop: > /* note: already called with rcu_read_lock */ > static int br_handle_local_finish(struct sk_buff *skb) > { > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > u16 vid = 0; > > br_vlan_get_tag(skb, &vid); > @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) > if (!skb) > return RX_HANDLER_CONSUMED; > > - p = br_port_get_rcu(skb->dev); > + p = __br_port_get_rcu(skb->dev); > > if (unlikely(is_link_local_ether_addr(dest))) { > /* > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index 3be89b3..9fdd467 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -184,6 +184,11 @@ struct net_bridge_port > > #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT) > > +static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev) > +{ > + return rcu_dereference(dev->rx_handler_data); > +} > + > static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) > { > struct net_bridge_port *port = > > if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns NULL, the problem is fixed. but the codes in mainline is still bugged, am i right?? by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very easily to reproduce. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic 2013-06-20 7:47 ` xiaoming gao @ 2013-06-20 8:14 ` Eric Dumazet 0 siblings, 0 replies; 7+ messages in thread From: Eric Dumazet @ 2013-06-20 8:14 UTC (permalink / raw) To: xiaoming gao; +Cc: stephen, davem, bridge, netdev, linux-kernel On Thu, 2013-06-20 at 15:47 +0800, xiaoming gao wrote: > > if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns NULL, the problem is fixed. > but the codes in mainline is still bugged, am i right?? > by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very easily to reproduce. Commit 00cfec37484761a4 has to be backported anyway, if not already backported. I would rather fix things properly, instead of adding defensive code. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic 2013-06-20 7:29 ` Eric Dumazet 2013-06-20 7:47 ` xiaoming gao @ 2013-11-11 10:27 ` Alexander Y. Fomichev 1 sibling, 0 replies; 7+ messages in thread From: Alexander Y. Fomichev @ 2013-11-11 10:27 UTC (permalink / raw) To: Eric Dumazet; +Cc: xiaoming gao, stephen, davem, bridge, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 4775 bytes --] On Thu, Jun 20, 2013 at 11:29 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote: > >> HI Eric >> the problem is as follow: >> br_del_if()-->del_nbp(): >> >> list_del_rcu(&p->list); >> dev->priv_flags &= ~IFF_BRIDGE_PORT; >> >> ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame() >> ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL >> ------>so in br_handle_frame , there will be a null panic. >> >> netdev_rx_handler_unregister(dev); >> synchronize_net(); > > This code is no longer like that in current tree. > Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd > ("bridge: remove a redundant synchronize_net()") > >> >> >> i have checked commit 00cfec37484761a44, i think it didn't fix this bug.. > > I claim adding NULL tests is not needed in the fast path, it was clearly > stated in the changelog. > > I would change the dismantle path to make sure br_get_port_rcu() does > not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT > Try something like that : > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > index 1b8b8b8..2edfb80 100644 > --- a/net/bridge/br_input.c > +++ b/net/bridge/br_input.c > @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb) > int br_handle_frame_finish(struct sk_buff *skb) > { > const unsigned char *dest = eth_hdr(skb)->h_dest; > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > struct net_bridge *br; > struct net_bridge_fdb_entry *dst; > struct net_bridge_mdb_entry *mdst; > @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb) > bool unicast = true; > u16 vid = 0; > > - if (!p || p->state == BR_STATE_DISABLED) > + if (p->state == BR_STATE_DISABLED) > goto drop; > > if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) > @@ -142,7 +142,7 @@ drop: > /* note: already called with rcu_read_lock */ > static int br_handle_local_finish(struct sk_buff *skb) > { > - struct net_bridge_port *p = br_port_get_rcu(skb->dev); > + struct net_bridge_port *p = __br_port_get_rcu(skb->dev); > u16 vid = 0; > > br_vlan_get_tag(skb, &vid); > @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) > if (!skb) > return RX_HANDLER_CONSUMED; > > - p = br_port_get_rcu(skb->dev); > + p = __br_port_get_rcu(skb->dev); > > if (unlikely(is_link_local_ether_addr(dest))) { > /* > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index 3be89b3..9fdd467 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -184,6 +184,11 @@ struct net_bridge_port > > #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT) > > +static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev) > +{ > + return rcu_dereference(dev->rx_handler_data); > +} > + > static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) > { > struct net_bridge_port *port = > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > I claim adding NULL tests is not needed in the fast path, it was clearly > stated in the changelog. Hello, This commit makes trouble for current STP. Two days ago i tried to switch to 3.10.18 and i've caught "bad magic" on uninitialized br->lock: ./net/bridge/br_stp_bpdu.c +158 in br_stp_rcv (trace attached): p = br_port_get_rcu(dev); br = p->br; spin_lock(&br->lock); <- here ----------------------------------------------- Bisect pointed to this commit (linux-stable: 960b8e5018a552f62cfbc0dfe94be7b6ba178f13) (mainline 716ec052d2280d511e10e90ad54a86f5b5d4dcc2) As far as i can see this happens when: a) bridge module had been loaded but there was no bridge interface, br->lock had not been initialized. b) interface had been in promiscuous mod (tcpdump) c) stp broadcasts 01:80:c2:00:00:00 coming to this iface (llc_rcv drops PACKET_OTHERHOST to protect us in promiscuous mode but seems like not a broadcasts) d) and finally rx_handler_data had been initialised for this interface ( by macvlan in my case) It seems like STP needs its own IFF_BRIDGE_PORT check. probably an easiest option to check it in br_stp_rcv as before (or probably in llc_rcv)... -- Best regards. Alexander Y. Fomichev <git.user@gmail.com> [-- Attachment #2: br_stp_rcv-spin_lock.trace --] [-- Type: application/octet-stream, Size: 5983 bytes --] [ 1761.045223] BUG: soft lockup - CPU#6 stuck for 23s! [md1_resync:1597] [ 1761.051701] Modules linked in: bridge stp llc macvlan ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter ip_tables [last unloaded: xt_owner] [ 1761.068794] irq event stamp: 229014772 [ 1761.072564] hardirqs last enabled at (229014772): [<ffffffff8103e492>] vprintk_emit+0x1b2/0x460 [ 1761.081420] hardirqs last disabled at (229014771): [<ffffffff8103e324>] vprintk_emit+0x44/0x460 [ 1761.090191] softirqs last enabled at (229014700): [<ffffffff810444d8>] __do_softirq+0x178/0x2a0 [ 1761.099042] softirqs last disabled at (229014753): [<ffffffff816aa44c>] call_softirq+0x1c/0x30 [ 1761.107729] CPU: 6 PID: 1597 Comm: md1_resync Not tainted 3.10.17 #13 [ 1761.114199] Hardware name: Supermicro X9DRG-HF/X9DRG-HF, BIOS 1.0c 08/22/2012 [ 1761.121368] task: ffff88084f080000 ti: ffff88084f07a000 task.ti: ffff88084f07a000 [ 1761.128908] RIP: 0010:[<ffffffff81322c57>] [<ffffffff81322c57>] delay_tsc+0x37/0x60 [ 1761.136727] RSP: 0018:ffff88107fc03ad0 EFLAGS: 00000206 [ 1761.142073] RAX: 0000000000dc56cb RBX: ffffffff816a065c RCX: 0000000000000001 [ 1761.149250] RDX: 00000000001daa0a RSI: 0000000000dc569f RDI: 0000000000000001 [ 1761.156418] RBP: ffff88107fc03ad0 R08: 0000000000000006 R09: 0000000000000000 [ 1761.163577] R10: 0000000000000002 R11: 0000000000000002 R12: ffff88107fc03a48 [ 1761.170746] R13: ffffffff816a9d2f R14: ffff88107fc03ad0 R15: 000000002a552e2e [ 1761.177903] FS: 0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000 [ 1761.186050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1761.191826] CR2: 0000000001e18000 CR3: 0000000001974000 CR4: 00000000000407e0 [ 1761.199000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1761.206158] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1761.213316] Stack: [ 1761.216583] ffff88107fc03ae0 ffffffff81322b5f ffff88107fc03b10 ffffffff816950a4 [ 1761.224124] 0000000000007465 ffff8810527a8000 ffff8810527a8000 ffff88084e96e000 [ 1761.231666] ffff88107fc03b30 ffffffff8132b344 ffff88104b5c1180 ffff8810527a8000 [ 1761.239214] Call Trace: [ 1761.241686] <IRQ> [ 1761.243619] [<ffffffff81322b5f>] __delay+0xf/0x20 [ 1761.248693] [<ffffffff816950a4>] __spin_lock_debug+0x43/0x7b [ 1761.254474] [<ffffffff8132b344>] do_raw_spin_lock+0x44/0xc0 [ 1761.260158] [<ffffffff8169f929>] _raw_spin_lock+0x39/0x40 [ 1761.265681] [<ffffffffa005b3da>] ? br_stp_rcv+0x8a/0x3f0 [bridge] [ 1761.271902] [<ffffffffa005b3da>] br_stp_rcv+0x8a/0x3f0 [bridge] [ 1761.277941] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp] [ 1761.284495] [<ffffffff816888e3>] ? printk+0x4d/0x4f [ 1761.289494] [<ffffffff81044219>] ? local_bh_enable+0x89/0xf0 [ 1761.295275] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp] [ 1761.301834] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp] [ 1761.308416] [<ffffffffa000e248>] stp_pdu_rcv+0xe8/0xea0 [stp] [ 1761.314299] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp] [ 1761.320868] [<ffffffffa003e661>] llc_rcv+0x371/0x4e0 [llc] [ 1761.326486] [<ffffffff815ee646>] __netif_receive_skb_core+0x5c6/0x700 [ 1761.333046] [<ffffffff815ee17a>] ? __netif_receive_skb_core+0xfa/0x700 [ 1761.339694] [<ffffffff810a098f>] ? __lock_release+0x6f/0x100 [ 1761.345469] [<ffffffff815f23e1>] __netif_receive_skb+0x21/0x70 [ 1761.351421] [<ffffffff815f25f3>] netif_receive_skb+0x23/0xf0 [ 1761.357203] [<ffffffff8139f220>] ? dma_issue_pending_all+0xb0/0xe0 [ 1761.363495] [<ffffffff815f36e8>] napi_gro_receive+0xe8/0x140 [ 1761.369283] [<ffffffff814ee0f8>] igb_clean_rx_irq+0x148/0x1c0 [ 1761.375143] [<ffffffff814ee1ce>] igb_poll+0x5e/0xa0 [ 1761.380162] [<ffffffff815f2a29>] net_rx_action+0x139/0x250 [ 1761.385768] [<ffffffff81044409>] ? __do_softirq+0xa9/0x2a0 [ 1761.391375] [<ffffffff81044450>] __do_softirq+0xf0/0x2a0 [ 1761.396828] [<ffffffff816aa44c>] call_softirq+0x1c/0x30 [ 1761.402183] [<ffffffff81003e7d>] do_softirq+0x8d/0xc0 [ 1761.407349] [<ffffffff8104479e>] irq_exit+0xae/0xd0 [ 1761.412352] [<ffffffff816aa993>] do_IRQ+0x63/0xe0 [ 1761.417179] [<ffffffff816a05af>] common_interrupt+0x6f/0x6f [ 1761.422887] <EOI> [ 1761.424821] [<ffffffff811072e1>] ? get_page_from_freelist+0x241/0x5d0 [ 1761.431610] [<ffffffff811081c2>] __alloc_pages_nodemask+0x182/0xa50 [ 1761.437989] [<ffffffff8109fe61>] ? __lock_acquire+0x3f1/0xa80 [ 1761.443848] [<ffffffff8106fbe3>] ? finish_task_switch+0x83/0xf0 [ 1761.449905] [<ffffffff8109fe61>] ? __lock_acquire+0x3f1/0xa80 [ 1761.455799] [<ffffffff812ffae6>] ? ioc_lookup_icq+0x86/0xf0 [ 1761.461486] [<ffffffff811998c1>] ? bio_alloc_bioset+0x161/0x1c0 [ 1761.467536] [<ffffffff8114239a>] alloc_pages_current+0xba/0x170 [ 1761.473568] [<ffffffff815663a4>] r10buf_pool_alloc+0x174/0x2c0 [ 1761.479531] [<ffffffff8110241e>] mempool_alloc+0x5e/0x160 [ 1761.485044] [<ffffffff815676c6>] sync_request+0x916/0x1180 [ 1761.490650] [<ffffffff81570af1>] ? is_mddev_idle+0x141/0x170 [ 1761.496439] [<ffffffff815709b5>] ? is_mddev_idle+0x5/0x170 [ 1761.502039] [<ffffffff81573c98>] md_do_sync+0x548/0xd10 [ 1761.507385] [<ffffffff8106fbe3>] ? finish_task_switch+0x83/0xf0 [ 1761.513418] [<ffffffff81063b60>] ? __init_waitqueue_head+0x60/0x60 [ 1761.519720] [<ffffffff81570d8d>] md_thread+0x11d/0x170 [ 1761.524972] [<ffffffff8109e91d>] ? trace_hardirqs_on+0xd/0x10 [ 1761.530840] [<ffffffff81570c70>] ? md_register_thread+0xf0/0xf0 [ 1761.536879] [<ffffffff8106304b>] kthread+0xdb/0xe0 [ 1761.541795] [<ffffffff81062f70>] ? __init_kthread_worker+0x70/0x70 [ 1761.548094] [<ffffffff816a8f5c>] ret_from_fork+0x7c/0xb0 [ 1761.553530] [<ffffffff81062f70>] ? __init_kthread_worker+0x70/0x70 [ 1761.559829] Code: 04 25 1c b0 00 00 0f 1f 00 0f ae e8 0f 31 89 c6 eb 11 66 90 f3 90 65 8b 0c 25 1c b0 00 00 41 39 c8 75 12 0f 1f 00 0f ae e8 0f 31 <89> c2 29 f2 39 fa 72 e1 5d c3 29 c6 01 f7 0f 1f 00 0f ae e8 0f [-- Attachment #3: fix_stp_bridge_uninitialized.patch --] [-- Type: application/x-download, Size: 477 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-11-11 10:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <51C2710D.2060405@gmail.com>
2013-06-20 3:08 ` [PATCH] net bridge: add null pointer check, fix panic xiaoming gao
2013-06-20 4:55 ` Eric Dumazet
2013-06-20 7:00 ` xiaoming gao
2013-06-20 7:29 ` Eric Dumazet
2013-06-20 7:47 ` xiaoming gao
2013-06-20 8:14 ` Eric Dumazet
2013-11-11 10:27 ` Alexander Y. Fomichev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox