netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2
@ 2023-11-04 10:03 Jozsef Kadlecsik
  2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik
  0 siblings, 1 reply; 5+ messages in thread
From: Jozsef Kadlecsik @ 2023-11-04 10:03 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao

Hi Pablo,

Please apply the next patch to your nf tree, which fixes a race condition:

* Due to the insufficiently protected set pointers, there's a race between a fast
  swap/destroy and a slow kernel side add/del/test element operation in ipset.
  The attached patch fixes it by extending the the rcu_read_lock() protected areas and
  forcing ip_set_swap() to wait for the rcu_read_unlock() markers.
  v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
  ip_set_destroy() unnecessarily when all sets are destroyed.

Thanks!
Jozsef

The following changes since commit 7153a404fb70d21097af3169354e1e5fda3fbb02:

  Merge tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf (2023-09-07 11:47:15 +0200)

are available in the Git repository at:

  git://blackhole.kfki.hu/nf 682a101165d8b640577ed

for you to fetch changes up to 682a101165d8b640577ede10ca2a803250e48ba8:

  netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 (2023-11-04 10:58:49 +0100)

----------------------------------------------------------------
Jozsef Kadlecsik (1):
      netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2

 net/netfilter/ipset/ip_set_core.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
  2023-11-04 10:03 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2 Jozsef Kadlecsik
@ 2023-11-04 10:03 ` Jozsef Kadlecsik
  2023-11-04 10:35   ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Jozsef Kadlecsik @ 2023-11-04 10:03 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao

Linkui Xiao reported that there's a race condition when ipset swap and destroy is
called, which can lead to crash in add/del/test element operations. Swap then
destroy are usual operations to replace a set with another one in a production
system. The issue can in some cases be reproduced with the script:

ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576
ipset add hash_ip1 172.20.0.0/16
ipset add hash_ip1 192.168.0.0/16
iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
while [ 1 ]
do
	# ... Ongoing traffic...
        ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
        ipset add hash_ip2 172.20.0.0/16
        ipset swap hash_ip1 hash_ip2
        ipset destroy hash_ip2
        sleep 0.05
done

In the race case the possible order of the operations are

	CPU0			CPU1
	ip_set_test
				ipset swap hash_ip1 hash_ip2
				ipset destroy hash_ip2
	hash_net_kadt

Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
removed it, hash_net_kadt crashes.

The fix is to protect both the list of the sets and the set pointers in an extended RCU
region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock().

The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.

v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
ip_set_destroy() unnecessarily when all sets are destroyed.

Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/
Reported by: Linkui Xiao <xiaolinkui@kylinos.cn>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index e564b5174261..a50ded1ee66d 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index)
 	struct ip_set_net *inst = ip_set_pernet(net);
 
 	rcu_read_lock();
-	/* ip_set_list itself needs to be protected */
+	/* ip_set_list and the set pointer need to be protected */
 	set = rcu_dereference(inst->ip_set_list)[index];
-	rcu_read_unlock();
 
 	return set;
 }
 
+static inline void
+ip_set_rcu_put(struct ip_set *set __always_unused)
+{
+	rcu_read_unlock();
+}
+
 static inline void
 ip_set_lock(struct ip_set *set)
 {
@@ -736,8 +741,10 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb,
 	pr_debug("set %s, index %u\n", set->name, index);
 
 	if (opt->dim < set->type->dimension ||
-	    !(opt->family == set->family || set->family == NFPROTO_UNSPEC))
+	    !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) {
+		ip_set_rcu_put(set);
 		return 0;
+	}
 
 	ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt);
 
@@ -756,6 +763,7 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb,
 			ret = -ret;
 	}
 
+	ip_set_rcu_put(set);
 	/* Convert error codes to nomatch */
 	return (ret < 0 ? 0 : ret);
 }
@@ -772,12 +780,15 @@ ip_set_add(ip_set_id_t index, const struct sk_buff *skb,
 	pr_debug("set %s, index %u\n", set->name, index);
 
 	if (opt->dim < set->type->dimension ||
-	    !(opt->family == set->family || set->family == NFPROTO_UNSPEC))
+	    !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) {
+		ip_set_rcu_put(set);
 		return -IPSET_ERR_TYPE_MISMATCH;
+	}
 
 	ip_set_lock(set);
 	ret = set->variant->kadt(set, skb, par, IPSET_ADD, opt);
 	ip_set_unlock(set);
+	ip_set_rcu_put(set);
 
 	return ret;
 }
@@ -794,12 +805,15 @@ ip_set_del(ip_set_id_t index, const struct sk_buff *skb,
 	pr_debug("set %s, index %u\n", set->name, index);
 
 	if (opt->dim < set->type->dimension ||
-	    !(opt->family == set->family || set->family == NFPROTO_UNSPEC))
+	    !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) {
+		ip_set_rcu_put(set);
 		return -IPSET_ERR_TYPE_MISMATCH;
+	}
 
 	ip_set_lock(set);
 	ret = set->variant->kadt(set, skb, par, IPSET_DEL, opt);
 	ip_set_unlock(set);
+	ip_set_rcu_put(set);
 
 	return ret;
 }
@@ -874,6 +888,7 @@ ip_set_name_byindex(struct net *net, ip_set_id_t index, char *name)
 	read_lock_bh(&ip_set_ref_lock);
 	strscpy_pad(name, set->name, IPSET_MAXNAMELEN);
 	read_unlock_bh(&ip_set_ref_lock);
+	ip_set_rcu_put(set);
 }
 EXPORT_SYMBOL_GPL(ip_set_name_byindex);
 
@@ -1389,6 +1404,9 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
 	ip_set(inst, to_id) = from;
 	write_unlock_bh(&ip_set_ref_lock);
 
+	/* Make sure all readers of the old set pointers are completed. */
+	synchronize_rcu();
+
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
  2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik
@ 2023-11-04 10:35   ` Florian Westphal
  2023-11-08 10:46     ` Jozsef Kadlecsik
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2023-11-04 10:35 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, Pablo Neira Ayuso, Linkui Xiao

Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> Linkui Xiao reported that there's a race condition when ipset swap and destroy is
> called, which can lead to crash in add/del/test element operations. Swap then
> destroy are usual operations to replace a set with another one in a production
> system. The issue can in some cases be reproduced with the script:

Hi Jozsef,

> iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
> while [ 1 ]
> do
> 	# ... Ongoing traffic...
>         ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
>         ipset add hash_ip2 172.20.0.0/16
>         ipset swap hash_ip1 hash_ip2
>         ipset destroy hash_ip2
>         sleep 0.05
> done
> 
> In the race case the possible order of the operations are
> 
> 	CPU0			CPU1
> 	ip_set_test
> 				ipset swap hash_ip1 hash_ip2
> 				ipset destroy hash_ip2
> 	hash_net_kadt
> 
> Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
> is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
> removed it, hash_net_kadt crashes.
> 
> The fix is to protect both the list of the sets and the set pointers in an extended RCU
> region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock().
> The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.
> 
> v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
> ip_set_destroy() unnecessarily when all sets are destroyed.

Hmm.  Isn't it enough to only call synchronize_rcu() in ip_set_swap?

All netfilter hooks run with rcu_read_lock() held,
em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair.

> @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index)
>  	struct ip_set_net *inst = ip_set_pernet(net);
>  
>  	rcu_read_lock();
> -	/* ip_set_list itself needs to be protected */
> +	/* ip_set_list and the set pointer need to be protected */
>  	set = rcu_dereference(inst->ip_set_list)[index];
> -	rcu_read_unlock();
>  
>  	return set;
>  }

... so I don't understand why ip_set_rcu_get() has to extend
the locked section.

AFAICS there are only two type of callers:
1. rcu read lock is already held (datapath)
2. ipset nfnl subsys mutex is held

*probably* This could be changed in a separate patch to:

  -	rcu_read_lock();
  -	/* ip_set_list itself needs to be protected */
  -	set = rcu_dereference(inst->ip_set_list)[index];
  -	rcu_read_unlock();
  +	/* ip_set_list and the set pointer need to be protected */
  +	return rcu_dereference_check(inst->ip_set_list, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)[index];

This is an example, probably better to add a small
"ip_set_dereference_nfnl" helper to hide the lockdep construct...

Not saying the patch is wrong; rcu read locks nest and
ipset locking is not simple so I might be missing something.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
  2023-11-04 10:35   ` Florian Westphal
@ 2023-11-08 10:46     ` Jozsef Kadlecsik
  0 siblings, 0 replies; 5+ messages in thread
From: Jozsef Kadlecsik @ 2023-11-08 10:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Pablo Neira Ayuso, Linkui Xiao

Hi Florian,

On Sat, 4 Nov 2023, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > Linkui Xiao reported that there's a race condition when ipset swap and destroy is
> > called, which can lead to crash in add/del/test element operations. Swap then
> > destroy are usual operations to replace a set with another one in a production
> > system. The issue can in some cases be reproduced with the script:
> 
> > iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
> > while [ 1 ]
> > do
> > 	# ... Ongoing traffic...
> >         ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
> >         ipset add hash_ip2 172.20.0.0/16
> >         ipset swap hash_ip1 hash_ip2
> >         ipset destroy hash_ip2
> >         sleep 0.05
> > done
> > 
> > In the race case the possible order of the operations are
> > 
> > 	CPU0			CPU1
> > 	ip_set_test
> > 				ipset swap hash_ip1 hash_ip2
> > 				ipset destroy hash_ip2
> > 	hash_net_kadt
> > 
> > Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
> > is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
> > removed it, hash_net_kadt crashes.
> > 
> > The fix is to protect both the list of the sets and the set pointers in an extended RCU
> > region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock().
> > The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.
> > 
> > v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
> > ip_set_destroy() unnecessarily when all sets are destroyed.
> 
> Hmm.  Isn't it enough to only call synchronize_rcu() in ip_set_swap?
> 
> All netfilter hooks run with rcu_read_lock() held, em_ipset.c wraps the 
> entire ip_set_test() in rcu read lock/unlock pair.

Hm, actually, you are right. Technically there's no need to extend the 
rcu_read_lock() protected area inside ipset itself.
 
> > @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index)
> >  	struct ip_set_net *inst = ip_set_pernet(net);
> >  
> >  	rcu_read_lock();
> > -	/* ip_set_list itself needs to be protected */
> > +	/* ip_set_list and the set pointer need to be protected */
> >  	set = rcu_dereference(inst->ip_set_list)[index];
> > -	rcu_read_unlock();
> >  
> >  	return set;
> >  }
> 
> ... so I don't understand why ip_set_rcu_get() has to extend
> the locked section.
> 
> AFAICS there are only two type of callers:
> 1. rcu read lock is already held (datapath)
> 2. ipset nfnl subsys mutex is held
> 
> *probably* This could be changed in a separate patch to:
> 
>   -	rcu_read_lock();
>   -	/* ip_set_list itself needs to be protected */
>   -	set = rcu_dereference(inst->ip_set_list)[index];
>   -	rcu_read_unlock();
>   +	/* ip_set_list and the set pointer need to be protected */
>   +	return rcu_dereference_check(inst->ip_set_list, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)[index];
> 
> This is an example, probably better to add a small
> "ip_set_dereference_nfnl" helper to hide the lockdep construct...
> 
> Not saying the patch is wrong; rcu read locks nest and
> ipset locking is not simple so I might be missing something.

I'm going to rework the patch, thanks for your input!

Best regards,
Jozsef
-- 
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
  2023-11-13 20:09 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v3 Jozsef Kadlecsik
@ 2023-11-13 20:09 ` Jozsef Kadlecsik
  0 siblings, 0 replies; 5+ messages in thread
From: Jozsef Kadlecsik @ 2023-11-13 20:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Linkui Xiao, Florian Westphal

Linkui Xiao reported that there's a race condition when ipset swap and destroy is
called, which can lead to crash in add/del/test element operations. Swap then
destroy are usual operations to replace a set with another one in a production
system. The issue can in some cases be reproduced with the script:

ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576
ipset add hash_ip1 172.20.0.0/16
ipset add hash_ip1 192.168.0.0/16
iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
while [ 1 ]
do
	# ... Ongoing traffic...
        ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
        ipset add hash_ip2 172.20.0.0/16
        ipset swap hash_ip1 hash_ip2
        ipset destroy hash_ip2
        sleep 0.05
done

In the race case the possible order of the operations are

	CPU0			CPU1
	ip_set_test
				ipset swap hash_ip1 hash_ip2
				ipset destroy hash_ip2
	hash_net_kadt

Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
removed it, hash_net_kadt crashes.

The fix is to force ip_set_swap() to wait for all readers to finish accessing the
old set pointers by calling synchronize_rcu().

The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.

v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
    ip_set_destroy() unnecessarily when all sets are destroyed.
v3: Florian Westphal pointed out that all netfilter hooks run with rcu_read_lock() held
    and em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair.
    So there's no need to extend the rcu read locked area in ipset itself.

Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/
Reported by: Linkui Xiao <xiaolinkui@kylinos.cn>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index e564b5174261..43cd64d6dc26 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -61,6 +61,8 @@ MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET);
 	ip_set_dereference((inst)->ip_set_list)[id]
 #define ip_set_ref_netlink(inst,id)	\
 	rcu_dereference_raw((inst)->ip_set_list)[id]
+#define ip_set_dereference_nfnl(p)	\
+	rcu_dereference_check(p, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET))
 
 /* The set types are implemented in modules and registered set types
  * can be found in ip_set_type_list. Adding/deleting types is
@@ -700,15 +702,10 @@ __ip_set_put_netlink(struct ip_set *set)
 static struct ip_set *
 ip_set_rcu_get(struct net *net, ip_set_id_t index)
 {
-	struct ip_set *set;
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	rcu_read_lock();
-	/* ip_set_list itself needs to be protected */
-	set = rcu_dereference(inst->ip_set_list)[index];
-	rcu_read_unlock();
-
-	return set;
+	/* ip_set_list and the set pointer need to be protected */
+	return ip_set_dereference_nfnl(inst->ip_set_list)[index];
 }
 
 static inline void
@@ -1389,6 +1386,9 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
 	ip_set(inst, to_id) = from;
 	write_unlock_bh(&ip_set_ref_lock);
 
+	/* Make sure all readers of the old set pointers are completed. */
+	synchronize_rcu();
+
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-11-13 20:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-04 10:03 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2 Jozsef Kadlecsik
2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik
2023-11-04 10:35   ` Florian Westphal
2023-11-08 10:46     ` Jozsef Kadlecsik
  -- strict thread matches above, loose matches on Subject: below --
2023-11-13 20:09 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v3 Jozsef Kadlecsik
2023-11-13 20:09 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 Jozsef Kadlecsik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).