* [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2
@ 2023-11-04 10:03 Jozsef Kadlecsik
2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik
0 siblings, 1 reply; 5+ messages in thread
From: Jozsef Kadlecsik @ 2023-11-04 10:03 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao
Hi Pablo,
Please apply the next patch to your nf tree, which fixes a race condition:
* Due to the insufficiently protected set pointers, there's a race between a fast
swap/destroy and a slow kernel side add/del/test element operation in ipset.
The attached patch fixes it by extending the the rcu_read_lock() protected areas and
forcing ip_set_swap() to wait for the rcu_read_unlock() markers.
v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
ip_set_destroy() unnecessarily when all sets are destroyed.
Thanks!
Jozsef
The following changes since commit 7153a404fb70d21097af3169354e1e5fda3fbb02:
Merge tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf (2023-09-07 11:47:15 +0200)
are available in the Git repository at:
git://blackhole.kfki.hu/nf 682a101165d8b640577ed
for you to fetch changes up to 682a101165d8b640577ede10ca2a803250e48ba8:
netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 (2023-11-04 10:58:49 +0100)
----------------------------------------------------------------
Jozsef Kadlecsik (1):
netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
net/netfilter/ipset/ip_set_core.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 2023-11-04 10:03 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2 Jozsef Kadlecsik @ 2023-11-04 10:03 ` Jozsef Kadlecsik 2023-11-04 10:35 ` Florian Westphal 0 siblings, 1 reply; 5+ messages in thread From: Jozsef Kadlecsik @ 2023-11-04 10:03 UTC (permalink / raw) To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao Linkui Xiao reported that there's a race condition when ipset swap and destroy is called, which can lead to crash in add/del/test element operations. Swap then destroy are usual operations to replace a set with another one in a production system. The issue can in some cases be reproduced with the script: ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576 ipset add hash_ip1 172.20.0.0/16 ipset add hash_ip1 192.168.0.0/16 iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT while [ 1 ] do # ... Ongoing traffic... ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576 ipset add hash_ip2 172.20.0.0/16 ipset swap hash_ip1 hash_ip2 ipset destroy hash_ip2 sleep 0.05 done In the race case the possible order of the operations are CPU0 CPU1 ip_set_test ipset swap hash_ip1 hash_ip2 ipset destroy hash_ip2 hash_net_kadt Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy removed it, hash_net_kadt crashes. The fix is to protect both the list of the sets and the set pointers in an extended RCU region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock(). The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>. v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden ip_set_destroy() unnecessarily when all sets are destroyed. Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/ Reported by: Linkui Xiao <xiaolinkui@kylinos.cn> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> --- net/netfilter/ipset/ip_set_core.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index e564b5174261..a50ded1ee66d 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index) struct ip_set_net *inst = ip_set_pernet(net); rcu_read_lock(); - /* ip_set_list itself needs to be protected */ + /* ip_set_list and the set pointer need to be protected */ set = rcu_dereference(inst->ip_set_list)[index]; - rcu_read_unlock(); return set; } +static inline void +ip_set_rcu_put(struct ip_set *set __always_unused) +{ + rcu_read_unlock(); +} + static inline void ip_set_lock(struct ip_set *set) { @@ -736,8 +741,10 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb, pr_debug("set %s, index %u\n", set->name, index); if (opt->dim < set->type->dimension || - !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) + !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) { + ip_set_rcu_put(set); return 0; + } ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt); @@ -756,6 +763,7 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb, ret = -ret; } + ip_set_rcu_put(set); /* Convert error codes to nomatch */ return (ret < 0 ? 0 : ret); } @@ -772,12 +780,15 @@ ip_set_add(ip_set_id_t index, const struct sk_buff *skb, pr_debug("set %s, index %u\n", set->name, index); if (opt->dim < set->type->dimension || - !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) + !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) { + ip_set_rcu_put(set); return -IPSET_ERR_TYPE_MISMATCH; + } ip_set_lock(set); ret = set->variant->kadt(set, skb, par, IPSET_ADD, opt); ip_set_unlock(set); + ip_set_rcu_put(set); return ret; } @@ -794,12 +805,15 @@ ip_set_del(ip_set_id_t index, const struct sk_buff *skb, pr_debug("set %s, index %u\n", set->name, index); if (opt->dim < set->type->dimension || - !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) + !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) { + ip_set_rcu_put(set); return -IPSET_ERR_TYPE_MISMATCH; + } ip_set_lock(set); ret = set->variant->kadt(set, skb, par, IPSET_DEL, opt); ip_set_unlock(set); + ip_set_rcu_put(set); return ret; } @@ -874,6 +888,7 @@ ip_set_name_byindex(struct net *net, ip_set_id_t index, char *name) read_lock_bh(&ip_set_ref_lock); strscpy_pad(name, set->name, IPSET_MAXNAMELEN); read_unlock_bh(&ip_set_ref_lock); + ip_set_rcu_put(set); } EXPORT_SYMBOL_GPL(ip_set_name_byindex); @@ -1389,6 +1404,9 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info, ip_set(inst, to_id) = from; write_unlock_bh(&ip_set_ref_lock); + /* Make sure all readers of the old set pointers are completed. */ + synchronize_rcu(); + return 0; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik @ 2023-11-04 10:35 ` Florian Westphal 2023-11-08 10:46 ` Jozsef Kadlecsik 0 siblings, 1 reply; 5+ messages in thread From: Florian Westphal @ 2023-11-04 10:35 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: netfilter-devel, Pablo Neira Ayuso, Linkui Xiao Jozsef Kadlecsik <kadlec@netfilter.org> wrote: > Linkui Xiao reported that there's a race condition when ipset swap and destroy is > called, which can lead to crash in add/del/test element operations. Swap then > destroy are usual operations to replace a set with another one in a production > system. The issue can in some cases be reproduced with the script: Hi Jozsef, > iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT > while [ 1 ] > do > # ... Ongoing traffic... > ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576 > ipset add hash_ip2 172.20.0.0/16 > ipset swap hash_ip1 hash_ip2 > ipset destroy hash_ip2 > sleep 0.05 > done > > In the race case the possible order of the operations are > > CPU0 CPU1 > ip_set_test > ipset swap hash_ip1 hash_ip2 > ipset destroy hash_ip2 > hash_net_kadt > > Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which > is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy > removed it, hash_net_kadt crashes. > > The fix is to protect both the list of the sets and the set pointers in an extended RCU > region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock(). > The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>. > > v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden > ip_set_destroy() unnecessarily when all sets are destroyed. Hmm. Isn't it enough to only call synchronize_rcu() in ip_set_swap? All netfilter hooks run with rcu_read_lock() held, em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair. > @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index) > struct ip_set_net *inst = ip_set_pernet(net); > > rcu_read_lock(); > - /* ip_set_list itself needs to be protected */ > + /* ip_set_list and the set pointer need to be protected */ > set = rcu_dereference(inst->ip_set_list)[index]; > - rcu_read_unlock(); > > return set; > } ... so I don't understand why ip_set_rcu_get() has to extend the locked section. AFAICS there are only two type of callers: 1. rcu read lock is already held (datapath) 2. ipset nfnl subsys mutex is held *probably* This could be changed in a separate patch to: - rcu_read_lock(); - /* ip_set_list itself needs to be protected */ - set = rcu_dereference(inst->ip_set_list)[index]; - rcu_read_unlock(); + /* ip_set_list and the set pointer need to be protected */ + return rcu_dereference_check(inst->ip_set_list, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)[index]; This is an example, probably better to add a small "ip_set_dereference_nfnl" helper to hide the lockdep construct... Not saying the patch is wrong; rcu read locks nest and ipset locking is not simple so I might be missing something. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 2023-11-04 10:35 ` Florian Westphal @ 2023-11-08 10:46 ` Jozsef Kadlecsik 0 siblings, 0 replies; 5+ messages in thread From: Jozsef Kadlecsik @ 2023-11-08 10:46 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, Pablo Neira Ayuso, Linkui Xiao Hi Florian, On Sat, 4 Nov 2023, Florian Westphal wrote: > Jozsef Kadlecsik <kadlec@netfilter.org> wrote: > > Linkui Xiao reported that there's a race condition when ipset swap and destroy is > > called, which can lead to crash in add/del/test element operations. Swap then > > destroy are usual operations to replace a set with another one in a production > > system. The issue can in some cases be reproduced with the script: > > > iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT > > while [ 1 ] > > do > > # ... Ongoing traffic... > > ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576 > > ipset add hash_ip2 172.20.0.0/16 > > ipset swap hash_ip1 hash_ip2 > > ipset destroy hash_ip2 > > sleep 0.05 > > done > > > > In the race case the possible order of the operations are > > > > CPU0 CPU1 > > ip_set_test > > ipset swap hash_ip1 hash_ip2 > > ipset destroy hash_ip2 > > hash_net_kadt > > > > Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which > > is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy > > removed it, hash_net_kadt crashes. > > > > The fix is to protect both the list of the sets and the set pointers in an extended RCU > > region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock(). > > The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>. > > > > v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden > > ip_set_destroy() unnecessarily when all sets are destroyed. > > Hmm. Isn't it enough to only call synchronize_rcu() in ip_set_swap? > > All netfilter hooks run with rcu_read_lock() held, em_ipset.c wraps the > entire ip_set_test() in rcu read lock/unlock pair. Hm, actually, you are right. Technically there's no need to extend the rcu_read_lock() protected area inside ipset itself. > > @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index) > > struct ip_set_net *inst = ip_set_pernet(net); > > > > rcu_read_lock(); > > - /* ip_set_list itself needs to be protected */ > > + /* ip_set_list and the set pointer need to be protected */ > > set = rcu_dereference(inst->ip_set_list)[index]; > > - rcu_read_unlock(); > > > > return set; > > } > > ... so I don't understand why ip_set_rcu_get() has to extend > the locked section. > > AFAICS there are only two type of callers: > 1. rcu read lock is already held (datapath) > 2. ipset nfnl subsys mutex is held > > *probably* This could be changed in a separate patch to: > > - rcu_read_lock(); > - /* ip_set_list itself needs to be protected */ > - set = rcu_dereference(inst->ip_set_list)[index]; > - rcu_read_unlock(); > + /* ip_set_list and the set pointer need to be protected */ > + return rcu_dereference_check(inst->ip_set_list, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)[index]; > > This is an example, probably better to add a small > "ip_set_dereference_nfnl" helper to hide the lockdep construct... > > Not saying the patch is wrong; rcu read locks nest and > ipset locking is not simple so I might be missing something. I'm going to rework the patch, thanks for your input! Best regards, Jozsef -- E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v3
@ 2023-11-13 20:09 Jozsef Kadlecsik
2023-11-13 20:09 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 Jozsef Kadlecsik
0 siblings, 1 reply; 5+ messages in thread
From: Jozsef Kadlecsik @ 2023-11-13 20:09 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao, Florian Westphal
Hi Pablo,
Please apply the next patch to your nf tree, which fixes a race condition:
* There's a race between a fast swap/destroy and a slow kernel side add/del/test element
operation in ipset. The attached patch fixes it by forcing ip_set_swap() to wait for
all readers to finish accessing the old set pointers.
v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
ip_set_destroy() unnecessarily when all sets are destroyed.
v3: Florian Westphal pointed out that all netfilter hooks run with rcu_read_lock() held
and em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair
So there's no need to extend the rcu read locked area in ipset itse
Thanks!
Jozsef
The following changes since commit 7153a404fb70d21097af3169354e1e5fda3fbb02:
Merge tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf (2023-09-07 11:47:15 +0200)
are available in the Git repository at:
git://blackhole.kfki.hu/nf eca49fc2a1c2d
for you to fetch changes up to eca49fc2a1c2deafb870d88bef20690ecd5aeefd:
netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 (2023-11-13 21:04:43 +0100)
----------------------------------------------------------------
Jozsef Kadlecsik (1):
netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
net/netfilter/ipset/ip_set_core.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 2023-11-13 20:09 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v3 Jozsef Kadlecsik @ 2023-11-13 20:09 ` Jozsef Kadlecsik 0 siblings, 0 replies; 5+ messages in thread From: Jozsef Kadlecsik @ 2023-11-13 20:09 UTC (permalink / raw) To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao, Florian Westphal Linkui Xiao reported that there's a race condition when ipset swap and destroy is called, which can lead to crash in add/del/test element operations. Swap then destroy are usual operations to replace a set with another one in a production system. The issue can in some cases be reproduced with the script: ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576 ipset add hash_ip1 172.20.0.0/16 ipset add hash_ip1 192.168.0.0/16 iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT while [ 1 ] do # ... Ongoing traffic... ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576 ipset add hash_ip2 172.20.0.0/16 ipset swap hash_ip1 hash_ip2 ipset destroy hash_ip2 sleep 0.05 done In the race case the possible order of the operations are CPU0 CPU1 ip_set_test ipset swap hash_ip1 hash_ip2 ipset destroy hash_ip2 hash_net_kadt Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy removed it, hash_net_kadt crashes. The fix is to force ip_set_swap() to wait for all readers to finish accessing the old set pointers by calling synchronize_rcu(). The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>. v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden ip_set_destroy() unnecessarily when all sets are destroyed. v3: Florian Westphal pointed out that all netfilter hooks run with rcu_read_lock() held and em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair. So there's no need to extend the rcu read locked area in ipset itself. Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/ Reported by: Linkui Xiao <xiaolinkui@kylinos.cn> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> --- net/netfilter/ipset/ip_set_core.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index e564b5174261..43cd64d6dc26 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -61,6 +61,8 @@ MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET); ip_set_dereference((inst)->ip_set_list)[id] #define ip_set_ref_netlink(inst,id) \ rcu_dereference_raw((inst)->ip_set_list)[id] +#define ip_set_dereference_nfnl(p) \ + rcu_dereference_check(p, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)) /* The set types are implemented in modules and registered set types * can be found in ip_set_type_list. Adding/deleting types is @@ -700,15 +702,10 @@ __ip_set_put_netlink(struct ip_set *set) static struct ip_set * ip_set_rcu_get(struct net *net, ip_set_id_t index) { - struct ip_set *set; struct ip_set_net *inst = ip_set_pernet(net); - rcu_read_lock(); - /* ip_set_list itself needs to be protected */ - set = rcu_dereference(inst->ip_set_list)[index]; - rcu_read_unlock(); - - return set; + /* ip_set_list and the set pointer need to be protected */ + return ip_set_dereference_nfnl(inst->ip_set_list)[index]; } static inline void @@ -1389,6 +1386,9 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info, ip_set(inst, to_id) = from; write_unlock_bh(&ip_set_ref_lock); + /* Make sure all readers of the old set pointers are completed. */ + synchronize_rcu(); + return 0; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-11-13 20:09 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-04 10:03 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2 Jozsef Kadlecsik 2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik 2023-11-04 10:35 ` Florian Westphal 2023-11-08 10:46 ` Jozsef Kadlecsik -- strict thread matches above, loose matches on Subject: below -- 2023-11-13 20:09 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v3 Jozsef Kadlecsik 2023-11-13 20:09 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 Jozsef Kadlecsik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).