* [PATCH AUTOSEL 4.14 070/123] netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
[not found] <20190327181628.15899-1-sashal@kernel.org>
@ 2019-03-27 18:15 ` Sasha Levin
2019-03-27 18:16 ` [PATCH AUTOSEL 4.14 110/123] netfilter: physdev: relax br_netfilter dependency Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2019-03-27 18:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Chieh-Min Wang, Pablo Neira Ayuso, Sasha Levin, netfilter-devel,
coreteam, netdev
From: Chieh-Min Wang <chiehminw@synology.com>
[ Upstream commit 13f5251fd17088170c18844534682d9cab5ff5aa ]
For bridge(br_flood) or broadcast/multicast packets, they could clone
skb with unconfirmed conntrack which break the rule that unconfirmed
skb->_nfct is never shared. With nfqueue running on my system, the race
can be easily reproduced with following warning calltrace:
[13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P W 4.4.60 #7744
[13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
[13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14)
[13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8)
[13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0)
[13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24)
[13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618)
[13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc)
[13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
[13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
[13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c)
[13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
[13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
[13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4)
[13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac)
[13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170)
[13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420)
[13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248)
[13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0)
[13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c)
[13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368)
[13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44)
[13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200)
[13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64)
[13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34)
The original code just triggered the warning but do nothing. It will
caused the shared conntrack moves to the dying list and the packet be
droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack).
- Reproduce steps:
+----------------------------+
| br0(bridge) |
| |
+-+---------+---------+------+
| eth0| | eth1| | eth2|
| | | | | |
+--+--+ +--+--+ +---+-+
| | |
| | |
+--+-+ +-+--+ +--+-+
| PC1| | PC2| | PC3|
+----+ +----+ +----+
iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass
ps: Our nfq userspace program will set mark on packets whose connection
has already been processed.
PC1 sends broadcast packets simulated by hping3:
hping3 --rand-source --udp 192.168.1.255 -i u100
- Broadcast racing flow chart is as follow:
br_handle_frame
BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
// skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
br_handle_frame_finish
// check if this packet is broadcast
br_flood_forward
br_flood
list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
maybe_deliver
deliver_clone
skb = skb_clone(skb)
__br_forward
BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
// queue in our nfq and received by our userspace program
// goto __nf_conntrack_confirm with process context on CPU 1
br_pass_frame_up
BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
// goto __nf_conntrack_confirm with softirq context on CPU 0
Because conntrack confirm can happen at both INPUT and POSTROUTING
stage. So with NFQUEUE running, skb->_nfct with the same unconfirmed
conntrack could race on different core.
This patch fixes a repeating kernel splat, now it is only displayed
once.
Signed-off-by: Chieh-Min Wang <chiehminw@synology.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_core.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f07357ba9629..06520bf30f29 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -763,10 +763,18 @@ __nf_conntrack_confirm(struct sk_buff *skb)
* REJECT will give spurious warnings here.
*/
- /* No external references means no one else could have
- * confirmed us.
+ /* Another skb with the same unconfirmed conntrack may
+ * win the race. This may happen for bridge(br_flood)
+ * or broadcast/multicast packets do skb_clone with
+ * unconfirmed conntrack.
*/
- WARN_ON(nf_ct_is_confirmed(ct));
+ if (unlikely(nf_ct_is_confirmed(ct))) {
+ WARN_ON_ONCE(1);
+ nf_conntrack_double_unlock(hash, reply_hash);
+ local_bh_enable();
+ return NF_DROP;
+ }
+
pr_debug("Confirming conntrack %p\n", ct);
/* We have to check the DYING flag after unlink to prevent
* a race against nf_ct_get_next_corpse() possibly called from
--
2.19.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH AUTOSEL 4.14 110/123] netfilter: physdev: relax br_netfilter dependency
[not found] <20190327181628.15899-1-sashal@kernel.org>
2019-03-27 18:15 ` [PATCH AUTOSEL 4.14 070/123] netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm Sasha Levin
@ 2019-03-27 18:16 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2019-03-27 18:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Florian Westphal, Pablo Neira Ayuso, Sasha Levin, netfilter-devel,
coreteam, netdev
From: Florian Westphal <fw@strlen.de>
[ Upstream commit 8e2f311a68494a6677c1724bdcb10bada21af37c ]
Following command:
iptables -D FORWARD -m physdev ...
causes connectivity loss in some setups.
Reason is that iptables userspace will probe kernel for the module revision
of the physdev patch, and physdev has an artificial dependency on
br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
is loaded).
This causes the "phydev" module to be loaded, which in turn enables the
"call-iptables" infrastructure.
bridged packets might then get dropped by the iptables ruleset.
The better fix would be to change the "call-iptables" defaults to 0 and
enforce explicit setting to 1, but that breaks backwards compatibility.
This does the next best thing: add a request_module call to checkentry.
This was a stray '-D ... -m physdev' won't activate br_netfilter
anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/net/netfilter/br_netfilter.h | 1 -
net/bridge/br_netfilter_hooks.c | 5 -----
net/netfilter/xt_physdev.c | 9 +++++++--
3 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
index 74af19c3a8f7..a4ba601b5d04 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -49,7 +49,6 @@ static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
}
struct net_device *setup_pre_routing(struct sk_buff *skb);
-void br_netfilter_enable(void);
#if IS_ENABLED(CONFIG_IPV6)
int br_validate_ipv6(struct net *net, struct sk_buff *skb);
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 3f3859b8d49f..5fd283d9929e 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -881,11 +881,6 @@ static const struct nf_br_ops br_ops = {
.br_dev_xmit_hook = br_nf_dev_xmit,
};
-void br_netfilter_enable(void)
-{
-}
-EXPORT_SYMBOL_GPL(br_netfilter_enable);
-
/* For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
* br_dev_queue_push_xmit is called afterwards */
static const struct nf_hook_ops br_nf_ops[] = {
diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index bb33598e4530..ec247d8370e8 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -96,8 +96,7 @@ physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
static int physdev_mt_check(const struct xt_mtchk_param *par)
{
const struct xt_physdev_info *info = par->matchinfo;
-
- br_netfilter_enable();
+ static bool brnf_probed __read_mostly;
if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
info->bitmask & ~XT_PHYSDEV_OP_MASK)
@@ -113,6 +112,12 @@ static int physdev_mt_check(const struct xt_mtchk_param *par)
if (par->hook_mask & (1 << NF_INET_LOCAL_OUT))
return -EINVAL;
}
+
+ if (!brnf_probed) {
+ brnf_probed = true;
+ request_module("br_netfilter");
+ }
+
return 0;
}
--
2.19.1
^ permalink raw reply related [flat|nested] 2+ messages in thread