* [PATCH] net bridge: add null pointer check, fix panic
[not found] <51C2710D.2060405@gmail.com>
@ 2013-06-20 3:08 ` xiaoming gao
2013-06-20 4:55 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: xiaoming gao @ 2013-06-20 3:08 UTC (permalink / raw)
To: stephen, davem, bridge, netdev, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4353 bytes --]
From: newtongao <newtongao@tencent.com>
Date: Wed, 19 Jun 2013 14:58:33 +0800
Subject: [PATCH] net bridge: add null pointer check,fix panic
in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge,
but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using,
so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic.
kernel 3.4 also has this bug,i have verified.
mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet.
method to reproduce null pointer panic:
1. in function del_nbp,amplify the gap between "dev->priv_flags &= ~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new line "while(1);".
2. create net bridge testbr and bind some network interface(e.g. testif) on it.
3. send packets to testif frequetly,such as ping testif.
4. brctl delif testbr testif.
backtrace:
[1002130.426411] FS: 00007f2235153700(0000) GS:ffff88007b020000(0000) knlGS:0000000000000000
[1002130.524656] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[1002130.594931] CR2: 0000000000000021 CR3: 0000000001dc7000 CR4: 0000000000002660
[1002130.681798] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1002130.768654] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1002130.855525] Process ksoftirqd/1 (pid: 9, threadinfo ffff880062642000, task ffff880062640800)
[1002130.957868] Stack:
[1002130.983758] 0000000080000000 ffff88007bc06a40 ffff88005e3c6c80 ffffffff81901d10
[1002131.073887] ffff88005efc2000 0000000000000001 ffff880062643bf0 ffffffff818497d9
[1002131.163945] ffff88006147e090 0000000000000020 ffff88007bc06a40 ffff88005e3c6c80
[1002131.254023] Call Trace:
[1002131.285086] [<ffffffff81901d10>] ? br_handle_frame_finish+0x2b0/0x2b0
[1002131.364677] [<ffffffff818497d9>] __netif_receive_skb+0x109/0x4a0
[1002131.439081] [<ffffffff81849b12>] __netif_receive_skb+0x442/0x4a0
[1002131.513493] [<ffffffff810e3a6e>] ? __kmalloc_node+0x3e/0x50
[1002131.582744] [<ffffffff8184b7d8>] netif_receive_skb+0x78/0x80
[1002131.653053] [<ffffffff8184b928>] napi_skb_finish+0x48/0x60
[1002131.721269] [<ffffffff8184bf1d>] napi_gro_receive+0xfd/0x130
[1002131.791551] [<ffffffff8192b2f6>] vlan_gro_receive+0x16/0x20
[1002131.860796] [<ffffffff816a41d1>] igb_poll+0x6f1/0xdc0
[1002131.923871] [<ffffffff8196c0d5>] ? __schedule+0x2e5/0x810
[1002131.991071] [<ffffffff8196e38b>] ? _raw_spin_lock_irq+0xb/0x30
[1002132.063421] [<ffffffff8184c0c1>] net_rx_action+0xa1/0x1d0
[1002132.130619] [<ffffffff81057449>] __do_softirq+0x99/0x130
[1002132.196803] [<ffffffff8105759a>] run_ksoftirqd+0xba/0x170
[1002132.264000] [<ffffffff810574e0>] ? __do_softirq+0x130/0x130
[1002132.333258] [<ffffffff8106d306>] kthread+0x96/0xa0
[1002132.393224] [<ffffffff8196ffe4>] kernel_thread_helper+0x4/0x10
[1002132.465580] [<ffffffff8196f0f6>] ? int_ret_from_sys_call+0x7/0x1b
[1002132.541016] [<ffffffff8196e8a1>] ? retint_restore_args+0x5/0x6
[1002132.613368] [<ffffffff8196ffe0>] ? gs_change+0x13/0x13
[1002132.677466] Code: 02 f6 81 a5 01 00 00 40 48 8b 81 e0 02 00 00 89 d6 4c 0f 45 f0 48 8b 05 50 76 20 00 66 33 70 04 66 f7 c6 ff f0 0f 84 d0 00 00 00
[1002132.835500] 0f b6 46 21 3c 02 74 50 3c 03 0f 85 60 ff ff ff 48 8b 05 e1
[1002132.920864] RIP [<ffffffff81901e17>] br_handle_frame+0x107/0x260
[1002132.995296] RSP <ffff880062643b50>
[1002133.038735] CR2: 0000000000000021
Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
---
net/bridge/br_input.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index f06ee39..c8b365f 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -122,7 +122,8 @@ static int br_handle_local_finish(struct sk_buff *skb)
{
struct net_bridge_port *p = br_port_get_rcu(skb->dev);
- br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
+ if (p)
+ br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
return 0; /* process further */
}
@@ -160,6 +161,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED;
p = br_port_get_rcu(skb->dev);
+ if (!p)
+ goto drop;
if (unlikely(is_link_local(dest))) {
/* Pause frames shouldn't be passed up by driver anyway */
--
1.7.1
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-net-bridge-add-null-pointer-check.patch --]
[-- Type: text/plain; charset=gb18030; name="0001-net-bridge-add-null-pointer-check.patch", Size: 4422 bytes --]
From 2b39c4ccd812723891117184b297f6886dcfd976 Mon Sep 17 00:00:00 2001
From: newtongao <newtongao@tencent.com>
Date: Wed, 19 Jun 2013 14:58:33 +0800
Subject: [PATCH] net bridge: add null pointer check,fix panic
in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge,
but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using,
so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic.
kernel 3.4 also has this bug,i have verified.
mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet.
method to reproduce null pointer panic:
1. in function del_nbp,amplify the gap between "dev->priv_flags &= ~IFF_BRIDGE_PORT;" and "netdev_rx_handler_unregister(dev);" , such as add new line "while(1);".
2. create net bridge testbr and bind some network interface(e.g. testif) on it.
3. send packets to testif frequetly,such as ping testif.
4. brctl delif testbr testif.
backtrace:
[1002130.426411] FS: 00007f2235153700(0000) GS:ffff88007b020000(0000) knlGS:0000000000000000
[1002130.524656] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[1002130.594931] CR2: 0000000000000021 CR3: 0000000001dc7000 CR4: 0000000000002660
[1002130.681798] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1002130.768654] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1002130.855525] Process ksoftirqd/1 (pid: 9, threadinfo ffff880062642000, task ffff880062640800)
[1002130.957868] Stack:
[1002130.983758] 0000000080000000 ffff88007bc06a40 ffff88005e3c6c80 ffffffff81901d10
[1002131.073887] ffff88005efc2000 0000000000000001 ffff880062643bf0 ffffffff818497d9
[1002131.163945] ffff88006147e090 0000000000000020 ffff88007bc06a40 ffff88005e3c6c80
[1002131.254023] Call Trace:
[1002131.285086] [<ffffffff81901d10>] ? br_handle_frame_finish+0x2b0/0x2b0
[1002131.364677] [<ffffffff818497d9>] __netif_receive_skb+0x109/0x4a0
[1002131.439081] [<ffffffff81849b12>] __netif_receive_skb+0x442/0x4a0
[1002131.513493] [<ffffffff810e3a6e>] ? __kmalloc_node+0x3e/0x50
[1002131.582744] [<ffffffff8184b7d8>] netif_receive_skb+0x78/0x80
[1002131.653053] [<ffffffff8184b928>] napi_skb_finish+0x48/0x60
[1002131.721269] [<ffffffff8184bf1d>] napi_gro_receive+0xfd/0x130
[1002131.791551] [<ffffffff8192b2f6>] vlan_gro_receive+0x16/0x20
[1002131.860796] [<ffffffff816a41d1>] igb_poll+0x6f1/0xdc0
[1002131.923871] [<ffffffff8196c0d5>] ? __schedule+0x2e5/0x810
[1002131.991071] [<ffffffff8196e38b>] ? _raw_spin_lock_irq+0xb/0x30
[1002132.063421] [<ffffffff8184c0c1>] net_rx_action+0xa1/0x1d0
[1002132.130619] [<ffffffff81057449>] __do_softirq+0x99/0x130
[1002132.196803] [<ffffffff8105759a>] run_ksoftirqd+0xba/0x170
[1002132.264000] [<ffffffff810574e0>] ? __do_softirq+0x130/0x130
[1002132.333258] [<ffffffff8106d306>] kthread+0x96/0xa0
[1002132.393224] [<ffffffff8196ffe4>] kernel_thread_helper+0x4/0x10
[1002132.465580] [<ffffffff8196f0f6>] ? int_ret_from_sys_call+0x7/0x1b
[1002132.541016] [<ffffffff8196e8a1>] ? retint_restore_args+0x5/0x6
[1002132.613368] [<ffffffff8196ffe0>] ? gs_change+0x13/0x13
[1002132.677466] Code: 02 f6 81 a5 01 00 00 40 48 8b 81 e0 02 00 00 89 d6 4c 0f 45 f0 48 8b 05 50 76 20 00 66 33 70 04 66 f7 c6 ff f0 0f 84 d0 00 00 00
[1002132.835500] 0f b6 46 21 3c 02 74 50 3c 03 0f 85 60 ff ff ff 48 8b 05 e1
[1002132.920864] RIP [<ffffffff81901e17>] br_handle_frame+0x107/0x260
[1002132.995296] RSP <ffff880062643b50>
[1002133.038735] CR2: 0000000000000021
Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
---
net/bridge/br_input.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index f06ee39..c8b365f 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -122,7 +122,8 @@ static int br_handle_local_finish(struct sk_buff *skb)
{
struct net_bridge_port *p = br_port_get_rcu(skb->dev);
- br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
+ if (p)
+ br_fdb_update(p->br, p, eth_hdr(skb)->h_source);
return 0; /* process further */
}
@@ -160,6 +161,8 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED;
p = br_port_get_rcu(skb->dev);
+ if (!p)
+ goto drop;
if (unlikely(is_link_local(dest))) {
/* Pause frames shouldn't be passed up by driver anyway */
--
1.7.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic
2013-06-20 3:08 ` [PATCH] net bridge: add null pointer check, fix panic xiaoming gao
@ 2013-06-20 4:55 ` Eric Dumazet
[not found] ` <CAHBR9PJQRw2vuSaG6gpUAGVYsrotNVTjYU_YY6jsA6o0mq4-Jw@mail.gmail.com>
2013-06-20 7:00 ` xiaoming gao
0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-06-20 4:55 UTC (permalink / raw)
To: xiaoming gao; +Cc: stephen, netdev, bridge, davem, linux-kernel
On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote:
> From: newtongao <newtongao@tencent.com>
> Date: Wed, 19 Jun 2013 14:58:33 +0800
> Subject: [PATCH] net bridge: add null pointer check,fix panic
>
> in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge,
> but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using,
> so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic.
>
> kernel 3.4 also has this bug,i have verified.
> mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet.
Please check current version before sending a patch.
This was most probably fixed in commit 00cfec37484761a44
("net: add a synchronize_net() in netdev_rx_handler_unregister()")
Thanks
^ permalink raw reply [flat|nested] 8+ messages in thread
* Fwd: [PATCH] net bridge: add null pointer check, fix panic
[not found] ` <CAHBR9PJQRw2vuSaG6gpUAGVYsrotNVTjYU_YY6jsA6o0mq4-Jw@mail.gmail.com>
@ 2013-06-20 6:53 ` xiaoming gao
0 siblings, 0 replies; 8+ messages in thread
From: xiaoming gao @ 2013-06-20 6:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: stephen, netdev, bridge, davem, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]
HI Eric
the problem is as follow:
br_del_if()-->del_nbp():
list_del_rcu(&p->list);
dev->priv_flags &= ~IFF_BRIDGE_PORT;
------>at this point, the nic be deleting still have rx_handler , so , may
in br_handle_frame()
------>br_port_exists() will return false,so br_get_port_rcu() will return
NULL
------>so in br_handle_frame , there will be a null panic.
netdev_rx_handler_unregister(dev);
synchronize_net();
i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
thanks.
On Thu, Jun 20, 2013 at 12:55 PM, Eric Dumazet <eric.dumazet@gmail.com>wrote:
> On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote:
> > From: newtongao <newtongao@tencent.com>
> > Date: Wed, 19 Jun 2013 14:58:33 +0800
> > Subject: [PATCH] net bridge: add null pointer check,fix panic
> >
> > in kernel 3.0, br_port_get_rcu() may return NULL when network interface
> be deleting from bridge,
> > but in function br_handle_frame and br_handle_local_finish, the pointer
> didn't be checked before using,
> > so all br_port_get_rcu callers must do null check,or there occurs the
> null pointer panic.
> >
> > kernel 3.4 also has this bug,i have verified.
> > mainline kernel still did not check br_port_get_rcu()'s NULL pointer,
> but i have not tested it yet.
>
> Please check current version before sending a patch.
>
> This was most probably fixed in commit 00cfec37484761a44
> ("net: add a synchronize_net() in netdev_rx_handler_unregister()")
>
> Thanks
>
>
>
[-- Attachment #2: Type: text/html, Size: 3753 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic
2013-06-20 4:55 ` Eric Dumazet
[not found] ` <CAHBR9PJQRw2vuSaG6gpUAGVYsrotNVTjYU_YY6jsA6o0mq4-Jw@mail.gmail.com>
@ 2013-06-20 7:00 ` xiaoming gao
2013-06-20 7:29 ` Eric Dumazet
1 sibling, 1 reply; 8+ messages in thread
From: xiaoming gao @ 2013-06-20 7:00 UTC (permalink / raw)
To: Eric Dumazet; +Cc: stephen, davem, bridge, netdev, linux-kernel
Eric Dumazet said, at 2013-6-20 12:55:
> On Thu, 2013-06-20 at 11:08 +0800, xiaoming gao wrote:
>> From: newtongao <newtongao@tencent.com>
>> Date: Wed, 19 Jun 2013 14:58:33 +0800
>> Subject: [PATCH] net bridge: add null pointer check,fix panic
>>
>> in kernel 3.0, br_port_get_rcu() may return NULL when network interface be deleting from bridge,
>> but in function br_handle_frame and br_handle_local_finish, the pointer didn't be checked before using,
>> so all br_port_get_rcu callers must do null check,or there occurs the null pointer panic.
>>
>> kernel 3.4 also has this bug,i have verified.
>> mainline kernel still did not check br_port_get_rcu()'s NULL pointer, but i have not tested it yet.
>
> Please check current version before sending a patch.
>
> This was most probably fixed in commit 00cfec37484761a44
> ("net: add a synchronize_net() in netdev_rx_handler_unregister()")
>
> Thanks
>
>
HI Eric
the problem is as follow:
br_del_if()-->del_nbp():
list_del_rcu(&p->list);
dev->priv_flags &= ~IFF_BRIDGE_PORT;
------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame()
------>br_port_exists() will return false,so br_get_port_rcu() will return NULL
------>so in br_handle_frame , there will be a null panic.
netdev_rx_handler_unregister(dev);
synchronize_net();
i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic
2013-06-20 7:00 ` xiaoming gao
@ 2013-06-20 7:29 ` Eric Dumazet
2013-06-20 7:47 ` xiaoming gao
2013-11-11 10:27 ` Alexander Y. Fomichev
0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-06-20 7:29 UTC (permalink / raw)
To: xiaoming gao; +Cc: stephen, netdev, bridge, davem, linux-kernel
On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote:
> HI Eric
> the problem is as follow:
> br_del_if()-->del_nbp():
>
> list_del_rcu(&p->list);
> dev->priv_flags &= ~IFF_BRIDGE_PORT;
>
> ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame()
> ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL
> ------>so in br_handle_frame , there will be a null panic.
>
> netdev_rx_handler_unregister(dev);
> synchronize_net();
This code is no longer like that in current tree.
Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd
("bridge: remove a redundant synchronize_net()")
>
>
> i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
I claim adding NULL tests is not needed in the fast path, it was clearly
stated in the changelog.
I would change the dismantle path to make sure br_get_port_rcu() does
not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT
Try something like that :
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 1b8b8b8..2edfb80 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
int br_handle_frame_finish(struct sk_buff *skb)
{
const unsigned char *dest = eth_hdr(skb)->h_dest;
- struct net_bridge_port *p = br_port_get_rcu(skb->dev);
+ struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
struct net_bridge *br;
struct net_bridge_fdb_entry *dst;
struct net_bridge_mdb_entry *mdst;
@@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
bool unicast = true;
u16 vid = 0;
- if (!p || p->state == BR_STATE_DISABLED)
+ if (p->state == BR_STATE_DISABLED)
goto drop;
if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid))
@@ -142,7 +142,7 @@ drop:
/* note: already called with rcu_read_lock */
static int br_handle_local_finish(struct sk_buff *skb)
{
- struct net_bridge_port *p = br_port_get_rcu(skb->dev);
+ struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
u16 vid = 0;
br_vlan_get_tag(skb, &vid);
@@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
if (!skb)
return RX_HANDLER_CONSUMED;
- p = br_port_get_rcu(skb->dev);
+ p = __br_port_get_rcu(skb->dev);
if (unlikely(is_link_local_ether_addr(dest))) {
/*
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3be89b3..9fdd467 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -184,6 +184,11 @@ struct net_bridge_port
#define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
+static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev)
+{
+ return rcu_dereference(dev->rx_handler_data);
+}
+
static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
{
struct net_bridge_port *port =
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic
2013-06-20 7:29 ` Eric Dumazet
@ 2013-06-20 7:47 ` xiaoming gao
2013-06-20 8:14 ` Eric Dumazet
2013-11-11 10:27 ` Alexander Y. Fomichev
1 sibling, 1 reply; 8+ messages in thread
From: xiaoming gao @ 2013-06-20 7:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: stephen, davem, bridge, netdev, linux-kernel
Eric Dumazet said, at 2013-6-20 15:29:
> On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote:
>
>> HI Eric
>> the problem is as follow:
>> br_del_if()-->del_nbp():
>>
>> list_del_rcu(&p->list);
>> dev->priv_flags &= ~IFF_BRIDGE_PORT;
>>
>> ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame()
>> ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL
>> ------>so in br_handle_frame , there will be a null panic.
>>
>> netdev_rx_handler_unregister(dev);
>> synchronize_net();
>
> This code is no longer like that in current tree.
> Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd
> ("bridge: remove a redundant synchronize_net()")
>
>>
>>
>> i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
>
> I claim adding NULL tests is not needed in the fast path, it was clearly
> stated in the changelog.
>
> I would change the dismantle path to make sure br_get_port_rcu() does
> not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT
>
> Try something like that :
>
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 1b8b8b8..2edfb80 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
> int br_handle_frame_finish(struct sk_buff *skb)
> {
> const unsigned char *dest = eth_hdr(skb)->h_dest;
> - struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> + struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
> struct net_bridge *br;
> struct net_bridge_fdb_entry *dst;
> struct net_bridge_mdb_entry *mdst;
> @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
> bool unicast = true;
> u16 vid = 0;
>
> - if (!p || p->state == BR_STATE_DISABLED)
> + if (p->state == BR_STATE_DISABLED)
> goto drop;
>
> if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid))
> @@ -142,7 +142,7 @@ drop:
> /* note: already called with rcu_read_lock */
> static int br_handle_local_finish(struct sk_buff *skb)
> {
> - struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> + struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
> u16 vid = 0;
>
> br_vlan_get_tag(skb, &vid);
> @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> if (!skb)
> return RX_HANDLER_CONSUMED;
>
> - p = br_port_get_rcu(skb->dev);
> + p = __br_port_get_rcu(skb->dev);
>
> if (unlikely(is_link_local_ether_addr(dest))) {
> /*
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 3be89b3..9fdd467 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -184,6 +184,11 @@ struct net_bridge_port
>
> #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
>
> +static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev)
> +{
> + return rcu_dereference(dev->rx_handler_data);
> +}
> +
> static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
> {
> struct net_bridge_port *port =
>
>
if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns NULL, the problem is fixed.
but the codes in mainline is still bugged, am i right??
by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very easily to reproduce.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic
2013-06-20 7:47 ` xiaoming gao
@ 2013-06-20 8:14 ` Eric Dumazet
0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-06-20 8:14 UTC (permalink / raw)
To: xiaoming gao; +Cc: stephen, netdev, bridge, davem, linux-kernel
On Thu, 2013-06-20 at 15:47 +0800, xiaoming gao wrote:
>
> if you remove the test on FF_BRIDGE_PORT, and br_port_get_rcu never returns NULL, the problem is fixed.
> but the codes in mainline is still bugged, am i right??
> by the way, kernel-stable 3.0 and 3.4 tree also have this bug, and is very easily to reproduce.
Commit 00cfec37484761a4 has to be backported anyway, if not already
backported.
I would rather fix things properly, instead of adding defensive code.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] net bridge: add null pointer check, fix panic
2013-06-20 7:29 ` Eric Dumazet
2013-06-20 7:47 ` xiaoming gao
@ 2013-11-11 10:27 ` Alexander Y. Fomichev
1 sibling, 0 replies; 8+ messages in thread
From: Alexander Y. Fomichev @ 2013-11-11 10:27 UTC (permalink / raw)
To: Eric Dumazet; +Cc: xiaoming gao, stephen, davem, bridge, netdev, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4775 bytes --]
On Thu, Jun 20, 2013 at 11:29 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-06-20 at 15:00 +0800, xiaoming gao wrote:
>
>> HI Eric
>> the problem is as follow:
>> br_del_if()-->del_nbp():
>>
>> list_del_rcu(&p->list);
>> dev->priv_flags &= ~IFF_BRIDGE_PORT;
>>
>> ------>at this point, the nic be deleting still have rx_handler , so , may in br_handle_frame()
>> ------>br_port_exists() will return false,so br_get_port_rcu() will return NULL
>> ------>so in br_handle_frame , there will be a null panic.
>>
>> netdev_rx_handler_unregister(dev);
>> synchronize_net();
>
> This code is no longer like that in current tree.
> Check commit 4a0b5ec12f0ffc3024616e6dc62cf8a04c54edcd
> ("bridge: remove a redundant synchronize_net()")
>
>>
>>
>> i have checked commit 00cfec37484761a44, i think it didn't fix this bug..
>
> I claim adding NULL tests is not needed in the fast path, it was clearly
> stated in the changelog.
>
> I would change the dismantle path to make sure br_get_port_rcu() does
> not return NULL in the fast path, and remove the test on FF_BRIDGE_PORT
> Try something like that :
>
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 1b8b8b8..2edfb80 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
> int br_handle_frame_finish(struct sk_buff *skb)
> {
> const unsigned char *dest = eth_hdr(skb)->h_dest;
> - struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> + struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
> struct net_bridge *br;
> struct net_bridge_fdb_entry *dst;
> struct net_bridge_mdb_entry *mdst;
> @@ -68,7 +68,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
> bool unicast = true;
> u16 vid = 0;
>
> - if (!p || p->state == BR_STATE_DISABLED)
> + if (p->state == BR_STATE_DISABLED)
> goto drop;
>
> if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid))
> @@ -142,7 +142,7 @@ drop:
> /* note: already called with rcu_read_lock */
> static int br_handle_local_finish(struct sk_buff *skb)
> {
> - struct net_bridge_port *p = br_port_get_rcu(skb->dev);
> + struct net_bridge_port *p = __br_port_get_rcu(skb->dev);
> u16 vid = 0;
>
> br_vlan_get_tag(skb, &vid);
> @@ -172,7 +172,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> if (!skb)
> return RX_HANDLER_CONSUMED;
>
> - p = br_port_get_rcu(skb->dev);
> + p = __br_port_get_rcu(skb->dev);
>
> if (unlikely(is_link_local_ether_addr(dest))) {
> /*
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 3be89b3..9fdd467 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -184,6 +184,11 @@ struct net_bridge_port
>
> #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
>
> +static inline struct net_bridge_port *__br_port_get_rcu(const struct net_device *dev)
> +{
> + return rcu_dereference(dev->rx_handler_data);
> +}
> +
> static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
> {
> struct net_bridge_port *port =
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> I claim adding NULL tests is not needed in the fast path, it was clearly
> stated in the changelog.
Hello,
This commit makes trouble for current STP.
Two days ago i tried to switch to 3.10.18 and i've caught "bad magic"
on uninitialized br->lock:
./net/bridge/br_stp_bpdu.c +158 in br_stp_rcv (trace attached):
p = br_port_get_rcu(dev);
br = p->br;
spin_lock(&br->lock); <- here
-----------------------------------------------
Bisect pointed to this commit
(linux-stable: 960b8e5018a552f62cfbc0dfe94be7b6ba178f13)
(mainline 716ec052d2280d511e10e90ad54a86f5b5d4dcc2)
As far as i can see this happens when:
a) bridge module had been loaded but there was no bridge interface,
br->lock had not been initialized.
b) interface had been in promiscuous mod (tcpdump)
c) stp broadcasts 01:80:c2:00:00:00 coming to this iface
(llc_rcv drops PACKET_OTHERHOST to protect us in promiscuous mode
but seems like not a broadcasts)
d) and finally rx_handler_data had been initialised for this interface
( by macvlan in my case)
It seems like STP needs its own IFF_BRIDGE_PORT check.
probably an easiest option to check it in br_stp_rcv as before (or
probably in llc_rcv)...
--
Best regards.
Alexander Y. Fomichev <git.user@gmail.com>
[-- Attachment #2: br_stp_rcv-spin_lock.trace --]
[-- Type: application/octet-stream, Size: 5983 bytes --]
[ 1761.045223] BUG: soft lockup - CPU#6 stuck for 23s! [md1_resync:1597]
[ 1761.051701] Modules linked in: bridge stp llc macvlan ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter ip_tables [last unloaded: xt_owner]
[ 1761.068794] irq event stamp: 229014772
[ 1761.072564] hardirqs last enabled at (229014772): [<ffffffff8103e492>] vprintk_emit+0x1b2/0x460
[ 1761.081420] hardirqs last disabled at (229014771): [<ffffffff8103e324>] vprintk_emit+0x44/0x460
[ 1761.090191] softirqs last enabled at (229014700): [<ffffffff810444d8>] __do_softirq+0x178/0x2a0
[ 1761.099042] softirqs last disabled at (229014753): [<ffffffff816aa44c>] call_softirq+0x1c/0x30
[ 1761.107729] CPU: 6 PID: 1597 Comm: md1_resync Not tainted 3.10.17 #13
[ 1761.114199] Hardware name: Supermicro X9DRG-HF/X9DRG-HF, BIOS 1.0c 08/22/2012
[ 1761.121368] task: ffff88084f080000 ti: ffff88084f07a000 task.ti: ffff88084f07a000
[ 1761.128908] RIP: 0010:[<ffffffff81322c57>] [<ffffffff81322c57>] delay_tsc+0x37/0x60
[ 1761.136727] RSP: 0018:ffff88107fc03ad0 EFLAGS: 00000206
[ 1761.142073] RAX: 0000000000dc56cb RBX: ffffffff816a065c RCX: 0000000000000001
[ 1761.149250] RDX: 00000000001daa0a RSI: 0000000000dc569f RDI: 0000000000000001
[ 1761.156418] RBP: ffff88107fc03ad0 R08: 0000000000000006 R09: 0000000000000000
[ 1761.163577] R10: 0000000000000002 R11: 0000000000000002 R12: ffff88107fc03a48
[ 1761.170746] R13: ffffffff816a9d2f R14: ffff88107fc03ad0 R15: 000000002a552e2e
[ 1761.177903] FS: 0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
[ 1761.186050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1761.191826] CR2: 0000000001e18000 CR3: 0000000001974000 CR4: 00000000000407e0
[ 1761.199000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1761.206158] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1761.213316] Stack:
[ 1761.216583] ffff88107fc03ae0 ffffffff81322b5f ffff88107fc03b10 ffffffff816950a4
[ 1761.224124] 0000000000007465 ffff8810527a8000 ffff8810527a8000 ffff88084e96e000
[ 1761.231666] ffff88107fc03b30 ffffffff8132b344 ffff88104b5c1180 ffff8810527a8000
[ 1761.239214] Call Trace:
[ 1761.241686] <IRQ>
[ 1761.243619] [<ffffffff81322b5f>] __delay+0xf/0x20
[ 1761.248693] [<ffffffff816950a4>] __spin_lock_debug+0x43/0x7b
[ 1761.254474] [<ffffffff8132b344>] do_raw_spin_lock+0x44/0xc0
[ 1761.260158] [<ffffffff8169f929>] _raw_spin_lock+0x39/0x40
[ 1761.265681] [<ffffffffa005b3da>] ? br_stp_rcv+0x8a/0x3f0 [bridge]
[ 1761.271902] [<ffffffffa005b3da>] br_stp_rcv+0x8a/0x3f0 [bridge]
[ 1761.277941] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp]
[ 1761.284495] [<ffffffff816888e3>] ? printk+0x4d/0x4f
[ 1761.289494] [<ffffffff81044219>] ? local_bh_enable+0x89/0xf0
[ 1761.295275] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp]
[ 1761.301834] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp]
[ 1761.308416] [<ffffffffa000e248>] stp_pdu_rcv+0xe8/0xea0 [stp]
[ 1761.314299] [<ffffffffa000e160>] ? stp_proto_register+0xb0/0xb0 [stp]
[ 1761.320868] [<ffffffffa003e661>] llc_rcv+0x371/0x4e0 [llc]
[ 1761.326486] [<ffffffff815ee646>] __netif_receive_skb_core+0x5c6/0x700
[ 1761.333046] [<ffffffff815ee17a>] ? __netif_receive_skb_core+0xfa/0x700
[ 1761.339694] [<ffffffff810a098f>] ? __lock_release+0x6f/0x100
[ 1761.345469] [<ffffffff815f23e1>] __netif_receive_skb+0x21/0x70
[ 1761.351421] [<ffffffff815f25f3>] netif_receive_skb+0x23/0xf0
[ 1761.357203] [<ffffffff8139f220>] ? dma_issue_pending_all+0xb0/0xe0
[ 1761.363495] [<ffffffff815f36e8>] napi_gro_receive+0xe8/0x140
[ 1761.369283] [<ffffffff814ee0f8>] igb_clean_rx_irq+0x148/0x1c0
[ 1761.375143] [<ffffffff814ee1ce>] igb_poll+0x5e/0xa0
[ 1761.380162] [<ffffffff815f2a29>] net_rx_action+0x139/0x250
[ 1761.385768] [<ffffffff81044409>] ? __do_softirq+0xa9/0x2a0
[ 1761.391375] [<ffffffff81044450>] __do_softirq+0xf0/0x2a0
[ 1761.396828] [<ffffffff816aa44c>] call_softirq+0x1c/0x30
[ 1761.402183] [<ffffffff81003e7d>] do_softirq+0x8d/0xc0
[ 1761.407349] [<ffffffff8104479e>] irq_exit+0xae/0xd0
[ 1761.412352] [<ffffffff816aa993>] do_IRQ+0x63/0xe0
[ 1761.417179] [<ffffffff816a05af>] common_interrupt+0x6f/0x6f
[ 1761.422887] <EOI>
[ 1761.424821] [<ffffffff811072e1>] ? get_page_from_freelist+0x241/0x5d0
[ 1761.431610] [<ffffffff811081c2>] __alloc_pages_nodemask+0x182/0xa50
[ 1761.437989] [<ffffffff8109fe61>] ? __lock_acquire+0x3f1/0xa80
[ 1761.443848] [<ffffffff8106fbe3>] ? finish_task_switch+0x83/0xf0
[ 1761.449905] [<ffffffff8109fe61>] ? __lock_acquire+0x3f1/0xa80
[ 1761.455799] [<ffffffff812ffae6>] ? ioc_lookup_icq+0x86/0xf0
[ 1761.461486] [<ffffffff811998c1>] ? bio_alloc_bioset+0x161/0x1c0
[ 1761.467536] [<ffffffff8114239a>] alloc_pages_current+0xba/0x170
[ 1761.473568] [<ffffffff815663a4>] r10buf_pool_alloc+0x174/0x2c0
[ 1761.479531] [<ffffffff8110241e>] mempool_alloc+0x5e/0x160
[ 1761.485044] [<ffffffff815676c6>] sync_request+0x916/0x1180
[ 1761.490650] [<ffffffff81570af1>] ? is_mddev_idle+0x141/0x170
[ 1761.496439] [<ffffffff815709b5>] ? is_mddev_idle+0x5/0x170
[ 1761.502039] [<ffffffff81573c98>] md_do_sync+0x548/0xd10
[ 1761.507385] [<ffffffff8106fbe3>] ? finish_task_switch+0x83/0xf0
[ 1761.513418] [<ffffffff81063b60>] ? __init_waitqueue_head+0x60/0x60
[ 1761.519720] [<ffffffff81570d8d>] md_thread+0x11d/0x170
[ 1761.524972] [<ffffffff8109e91d>] ? trace_hardirqs_on+0xd/0x10
[ 1761.530840] [<ffffffff81570c70>] ? md_register_thread+0xf0/0xf0
[ 1761.536879] [<ffffffff8106304b>] kthread+0xdb/0xe0
[ 1761.541795] [<ffffffff81062f70>] ? __init_kthread_worker+0x70/0x70
[ 1761.548094] [<ffffffff816a8f5c>] ret_from_fork+0x7c/0xb0
[ 1761.553530] [<ffffffff81062f70>] ? __init_kthread_worker+0x70/0x70
[ 1761.559829] Code: 04 25 1c b0 00 00 0f 1f 00 0f ae e8 0f 31 89 c6 eb 11 66 90 f3 90 65 8b 0c 25 1c b0 00 00 41 39 c8 75 12 0f 1f 00 0f ae e8 0f 31 <89> c2 29 f2 39 fa 72 e1 5d c3 29 c6 01 f7 0f 1f 00 0f ae e8 0f
[-- Attachment #3: fix_stp_bridge_uninitialized.patch --]
[-- Type: application/x-download, Size: 477 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-11-11 10:27 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <51C2710D.2060405@gmail.com>
2013-06-20 3:08 ` [PATCH] net bridge: add null pointer check, fix panic xiaoming gao
2013-06-20 4:55 ` Eric Dumazet
[not found] ` <CAHBR9PJQRw2vuSaG6gpUAGVYsrotNVTjYU_YY6jsA6o0mq4-Jw@mail.gmail.com>
2013-06-20 6:53 ` Fwd: " xiaoming gao
2013-06-20 7:00 ` xiaoming gao
2013-06-20 7:29 ` Eric Dumazet
2013-06-20 7:47 ` xiaoming gao
2013-06-20 8:14 ` Eric Dumazet
2013-11-11 10:27 ` Alexander Y. Fomichev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).