All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com, fw@strlen.de
Subject: [PATCH net 14/14] netfilter: ipset: fix performance regression in swap operation
Date: Wed, 17 Jan 2024 17:00:30 +0100	[thread overview]
Message-ID: <20240117160030.140264-15-pablo@netfilter.org> (raw)
In-Reply-To: <20240117160030.140264-1-pablo@netfilter.org>

From: Jozsef Kadlecsik <kadlec@netfilter.org>

The patch "netfilter: ipset: fix race condition between swap/destroy
and kernel side add/del/test", commit 28628fa9 fixes a race condition.
But the synchronize_rcu() added to the swap function unnecessarily slows
it down: it can safely be moved to destroy and use call_rcu() instead.
Thus we can get back the same performance and preventing the race condition
at the same time.

Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test")
Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/
Reported-by: Ale Crismani <ale.crismani@automattic.com>
Reported-by: David Wang <00107082@163.com
Tested-by: Ale Crismani <ale.crismani@automattic.com>
Tested-by: David Wang <00107082@163.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/ipset/ip_set.h |  2 ++
 net/netfilter/ipset/ip_set_core.c      | 31 +++++++++++++++++++-------
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index e8c350a3ade1..912f750d0bea 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -242,6 +242,8 @@ extern void ip_set_type_unregister(struct ip_set_type *set_type);
 
 /* A generic IP set */
 struct ip_set {
+	/* For call_cru in destroy */
+	struct rcu_head rcu;
 	/* The name of the set */
 	char name[IPSET_MAXNAMELEN];
 	/* Lock protecting the set data */
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 4c133e06be1d..3bf9bb345809 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1182,6 +1182,14 @@ ip_set_destroy_set(struct ip_set *set)
 	kfree(set);
 }
 
+static void
+ip_set_destroy_set_rcu(struct rcu_head *head)
+{
+	struct ip_set *set = container_of(head, struct ip_set, rcu);
+
+	ip_set_destroy_set(set);
+}
+
 static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 			  const struct nlattr * const attr[])
 {
@@ -1193,8 +1201,6 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 	if (unlikely(protocol_min_failed(attr)))
 		return -IPSET_ERR_PROTOCOL;
 
-	/* Must wait for flush to be really finished in list:set */
-	rcu_barrier();
 
 	/* Commands are serialized and references are
 	 * protected by the ip_set_ref_lock.
@@ -1206,8 +1212,10 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 	 * counter, so if it's already zero, we can proceed
 	 * without holding the lock.
 	 */
-	read_lock_bh(&ip_set_ref_lock);
 	if (!attr[IPSET_ATTR_SETNAME]) {
+		/* Must wait for flush to be really finished in list:set */
+		rcu_barrier();
+		read_lock_bh(&ip_set_ref_lock);
 		for (i = 0; i < inst->ip_set_max; i++) {
 			s = ip_set(inst, i);
 			if (s && (s->ref || s->ref_netlink)) {
@@ -1228,6 +1236,9 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 		inst->is_destroyed = false;
 	} else {
 		u32 flags = flag_exist(info->nlh);
+		u16 features = 0;
+
+		read_lock_bh(&ip_set_ref_lock);
 		s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
 				    &i);
 		if (!s) {
@@ -1238,10 +1249,14 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 			ret = -IPSET_ERR_BUSY;
 			goto out;
 		}
+		features = s->type->features;
 		ip_set(inst, i) = NULL;
 		read_unlock_bh(&ip_set_ref_lock);
-
-		ip_set_destroy_set(s);
+		if (features & IPSET_TYPE_NAME) {
+			/* Must wait for flush to be really finished  */
+			rcu_barrier();
+		}
+		call_rcu(&s->rcu, ip_set_destroy_set_rcu);
 	}
 	return 0;
 out:
@@ -1394,9 +1409,6 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
 	ip_set(inst, to_id) = from;
 	write_unlock_bh(&ip_set_ref_lock);
 
-	/* Make sure all readers of the old set pointers are completed. */
-	synchronize_rcu();
-
 	return 0;
 }
 
@@ -2357,6 +2369,9 @@ ip_set_net_exit(struct net *net)
 
 	inst->is_deleted = true; /* flag for ip_set_nfnl_put */
 
+	/* Wait for call_rcu() in destroy */
+	rcu_barrier();
+
 	nfnl_lock(NFNL_SUBSYS_IPSET);
 	for (i = 0; i < inst->ip_set_max; i++) {
 		set = ip_set(inst, i);
-- 
2.30.2


  parent reply	other threads:[~2024-01-17 16:00 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 16:00 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 01/14] netfilter: nf_tables: reject invalid set policy Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 02/14] netfilter: nf_tables: validate .maxattr at expression registration Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 03/14] netfilter: nf_tables: bail out if stateful expression provides no .clone Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 04/14] netfilter: nft_limit: do not ignore unsupported flags Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 05/14] netfilter: nfnetlink_log: use proper helper for fetching physinif Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 06/14] netfilter: nf_queue: remove excess nf_bridge variable Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 07/14] netfilter: propagate net to nf_bridge_get_physindev Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 08/14] netfilter: bridge: replace physindev with physinif in nf_bridge_info Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 09/14] netfilter: nf_tables: check if catch-all set element is active in next generation Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 10/14] netfilter: nf_tables: do not allow mismatch field size and set key length Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 11/14] netfilter: nf_tables: skip dead set elements in netlink dump Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 12/14] netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description Pablo Neira Ayuso
2024-01-17 16:00 ` [PATCH net 13/14] ipvs: avoid stat macros calls from preemptible context Pablo Neira Ayuso
2024-01-17 16:00 ` Pablo Neira Ayuso [this message]
2024-01-17 16:08   ` [PATCH net 14/14] netfilter: ipset: fix performance regression in swap operation Eric Dumazet
2024-01-17 16:23     ` Jozsef Kadlecsik
2024-01-17 16:28       ` Eric Dumazet
2024-01-18 11:08         ` Paolo Abeni
2024-01-18 12:14           ` Eric Dumazet
2024-01-18 10:08   ` Eric Dumazet
2024-01-18 14:24     ` Jozsef Kadlecsik
2024-01-23  9:03       ` Jozsef Kadlecsik
2024-01-23 17:20         ` David Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240117160030.140264-15-pablo@netfilter.org \
    --to=pablo@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.