From: xiaolinkui <xiaolinkui@gmail.com>
To: pablo@netfilter.org, kadlec@netfilter.org, fw@strlen.de,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, kuniyu@amazon.com, justinstitt@google.com
Cc: netfilter-devel@vger.kernel.org,
Linkui Xiao <xiaolinkui@kylinos.cn>,
stable@vger.kernel.org
Subject: [PATCH] netfilter: ipset: fix race condition in ipset swap, destroy and test/add/del
Date: Tue, 17 Oct 2023 20:52:56 +0800 [thread overview]
Message-ID: <20231017125256.23056-1-xiaolinkui@gmail.com> (raw)
From: Linkui Xiao <xiaolinkui@kylinos.cn>
There is a race condition which can be demonstrated by the following
script:
ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576
ipset add hash_ip1 172.20.0.0/16
ipset add hash_ip1 192.168.0.0/16
iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
while [ 1 ]
do
ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
ipset add hash_ip2 172.20.0.0/16
ipset swap hash_ip1 hash_ip2
ipset destroy hash_ip2
sleep 0.05
done
Swap will exchange the values of ref so destroy will see ref = 0 instead of
ref = 1. So after running this script for a period of time, the following
race situations may occur:
CPU0: CPU1:
ipt_do_table
->set_match_v4
->ip_set_test
ipset swap hash_ip1 hash_ip2
ipset destroy hash_ip2
->hash_net4_kadt
CPU0 found ipset through the index, and at this time, hash_ip2 has been
destroyed by CPU1 through name search. So CPU0 will crash when accessing
set->data in the function hash_net4_kadt.
With this fix in place swap will wait for the ongoing operations to be
finished.
V1->V2 changes:
- replace ref_netlink with rcu synchonize_rcu()
Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/
Cc: stable@vger.kernel.org
Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn>
---
net/netfilter/ipset/ip_set_core.c | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 35d2f9c9ada0..62ee4de6ffee 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -712,13 +712,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index)
struct ip_set_net *inst = ip_set_pernet(net);
rcu_read_lock();
- /* ip_set_list itself needs to be protected */
+ /* ip_set_list and the set pointer need to be protected */
set = rcu_dereference(inst->ip_set_list)[index];
- rcu_read_unlock();
return set;
}
+static inline void
+ip_set_rcu_put(struct ip_set *set __always_unused)
+{
+ rcu_read_unlock();
+}
+
static inline void
ip_set_lock(struct ip_set *set)
{
@@ -744,8 +749,10 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb,
pr_debug("set %s, index %u\n", set->name, index);
if (opt->dim < set->type->dimension ||
- !(opt->family == set->family || set->family == NFPROTO_UNSPEC))
+ !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) {
+ ip_set_rcu_put(set);
return 0;
+ }
ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt);
@@ -764,6 +771,7 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb,
ret = -ret;
}
+ ip_set_rcu_put(set);
/* Convert error codes to nomatch */
return (ret < 0 ? 0 : ret);
}
@@ -780,12 +788,15 @@ ip_set_add(ip_set_id_t index, const struct sk_buff *skb,
pr_debug("set %s, index %u\n", set->name, index);
if (opt->dim < set->type->dimension ||
- !(opt->family == set->family || set->family == NFPROTO_UNSPEC))
+ !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) {
+ ip_set_rcu_put(set);
return -IPSET_ERR_TYPE_MISMATCH;
+ }
ip_set_lock(set);
ret = set->variant->kadt(set, skb, par, IPSET_ADD, opt);
ip_set_unlock(set);
+ ip_set_rcu_put(set);
return ret;
}
@@ -802,12 +813,15 @@ ip_set_del(ip_set_id_t index, const struct sk_buff *skb,
pr_debug("set %s, index %u\n", set->name, index);
if (opt->dim < set->type->dimension ||
- !(opt->family == set->family || set->family == NFPROTO_UNSPEC))
+ !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) {
+ ip_set_rcu_put(set);
return -IPSET_ERR_TYPE_MISMATCH;
+ }
ip_set_lock(set);
ret = set->variant->kadt(set, skb, par, IPSET_DEL, opt);
ip_set_unlock(set);
+ ip_set_rcu_put(set);
return ret;
}
@@ -882,6 +896,7 @@ ip_set_name_byindex(struct net *net, ip_set_id_t index, char *name)
read_lock_bh(&ip_set_ref_lock);
strscpy_pad(name, set->name, IPSET_MAXNAMELEN);
read_unlock_bh(&ip_set_ref_lock);
+ ip_set_rcu_put(set);
}
EXPORT_SYMBOL_GPL(ip_set_name_byindex);
@@ -1348,6 +1363,9 @@ static int ip_set_rename(struct sk_buff *skb, const struct nfnl_info *info,
* protected by the ip_set_ref_lock. The kernel interfaces
* do not hold the mutex but the pointer settings are atomic
* so the ip_set_list always contains valid pointers to the sets.
+ * However after swapping, a userspace set destroy command could
+ * remove a set still processed by kernel side add/del/test.
+ * Therefore we need to wait for the ongoing operations to be finished.
*/
static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
@@ -1397,6 +1415,9 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
ip_set(inst, to_id) = from;
write_unlock_bh(&ip_set_ref_lock);
+ /* Make sure all readers of the old set pointers are completed. */
+ synchronize_rcu();
+
return 0;
}
--
2.17.1
next reply other threads:[~2023-10-17 12:53 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-17 12:52 xiaolinkui [this message]
2023-10-17 15:05 ` [PATCH] netfilter: ipset: fix race condition in ipset swap, destroy and test/add/del Pablo Neira Ayuso
2023-10-17 19:26 ` Jozsef Kadlecsik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231017125256.23056-1-xiaolinkui@gmail.com \
--to=xiaolinkui@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fw@strlen.de \
--cc=justinstitt@google.com \
--cc=kadlec@netfilter.org \
--cc=kuba@kernel.org \
--cc=kuniyu@amazon.com \
--cc=netfilter-devel@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pablo@netfilter.org \
--cc=stable@vger.kernel.org \
--cc=xiaolinkui@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox