* [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types
  2026-04-08  7:02 [PATCH 0/2] netfilter: ipset: concurrent add and dump fixes Jozsef Kadlecsik
@ 2026-04-08  7:02 ` Jozsef Kadlecsik
  0 siblings, 0 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-08  7:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

The "ipset list -terse" command is actually a dump operation which
may run in parallel with "ipset add" commands, and those can trigger an
internal resize of the hash type sets just being dumped. However,
dumping only the header part of the set was not protected against the
underlying resize. Fix it by protecting the header dumping part
as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d0c9fe59c67d..e6a8b3acc556 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1648,13 +1648,13 @@ ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 			if (cb->args[IPSET_CB_PROTO] > IPSET_PROTOCOL_MIN &&
 			    nla_put_net16(skb, IPSET_ATTR_INDEX, htons(index)))
 				goto nla_put_failure;
+			if (set->variant->uref)
+				set->variant->uref(set, cb, true);
 			ret = set->variant->head(set, skb);
 			if (ret < 0)
 				goto release_refcount;
 			if (dump_flags & IPSET_FLAG_LIST_HEADER)
 				goto next_set;
-			if (set->variant->uref)
-				set->variant->uref(set, cb, true);
 			fallthrough;
 		default:
 			ret = set->variant->list(set, skb, cb);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 0/2 v3] netfilter: ipset: concurrent add and dump fixes
@ 2026-04-15  8:20 Jozsef Kadlecsik
  2026-04-15  8:20 ` [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types Jozsef Kadlecsik
  2026-04-15  8:20 ` [PATCH 2/2] netfilter: ipset: Fix data race between add and dump " Jozsef Kadlecsik
  0 siblings, 2 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15  8:20 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

Hi Pablo, Florian,

Please consider applying the following patches:

* Fix a data race between add and list in all hash types due to setting
  the position index too early.
* Fix a data race between add and list header commands in all hash types 
  by protecting the list header dumping part as well.

Best regards,
Jozsef

The following changes since commit a9d4f4f6e65e0bf9bbddedecc84d67249991979c:

  net/mlx5: Update the list of the PCI supported devices (2026-04-06 19:17:42 -0700)

are available in the Git repository at:

  git://blackhole.kfki.hu/nf 90262ae5b482a4bfb9282c

for you to fetch changes up to 90262ae5b482a4bfb9282c19f035aafcc7ac9af2:

  netfilter: ipset: Fix data race between add and dump in all hash types (2026-04-15 10:16:33 +0200)

----------------------------------------------------------------
Jozsef Kadlecsik (2):
      netfilter: ipset: Fix data race between add and list header in all hash types
      netfilter: ipset: Fix data race between add and dump in all hash types

 net/netfilter/ipset/ip_set_core.c     | 4 ++--
 net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
 2 files changed, 6 insertions(+), 4 deletions(-)


* [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types
  2026-04-15  8:20 [PATCH 0/2 v3] netfilter: ipset: concurrent add and dump fixes Jozsef Kadlecsik
@ 2026-04-15  8:20 ` Jozsef Kadlecsik
  2026-04-15  8:20 ` [PATCH 2/2] netfilter: ipset: Fix data race between add and dump " Jozsef Kadlecsik
  1 sibling, 0 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15  8:20 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

The "ipset list -terse" command is actually a dump operation which
may run in parallel with "ipset add" commands, and those can trigger an
internal resize of the hash type sets just being dumped. However,
dumping only the header part of the set was not protected against the
underlying resize. Fix it by protecting the header dumping part
as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d0c9fe59c67d..e6a8b3acc556 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1648,13 +1648,13 @@ ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 			if (cb->args[IPSET_CB_PROTO] > IPSET_PROTOCOL_MIN &&
 			    nla_put_net16(skb, IPSET_ATTR_INDEX, htons(index)))
 				goto nla_put_failure;
+			if (set->variant->uref)
+				set->variant->uref(set, cb, true);
 			ret = set->variant->head(set, skb);
 			if (ret < 0)
 				goto release_refcount;
 			if (dump_flags & IPSET_FLAG_LIST_HEADER)
 				goto next_set;
-			if (set->variant->uref)
-				set->variant->uref(set, cb, true);
 			fallthrough;
 		default:
 			ret = set->variant->list(set, skb, cb);
-- 
2.39.5



* [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15  8:20 [PATCH 0/2 v3] netfilter: ipset: concurrent add and dump fixes Jozsef Kadlecsik
  2026-04-15  8:20 ` [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types Jozsef Kadlecsik
@ 2026-04-15  8:20 ` Jozsef Kadlecsik
  2026-04-15 10:22   ` Florian Westphal
  1 sibling, 1 reply; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15  8:20 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

When adding a new entry at the next free position in an existing hash
bucket, the position index was incremented too early, so a parallel dump
could read it before the entry was populated with its value. Move the
setting of the position index to after the entry is populated.

Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com
Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com
Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index b79e5dd2af03..0da02a8dfbae 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -844,7 +844,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	const struct mtype_elem *d = value;
 	struct mtype_elem *data;
 	struct hbucket *n, *old = ERR_PTR(-ENOENT);
-	int i, j = -1, ret;
+	int i, j = -1, npos = 0, ret;
 	bool flag_exist = flags & IPSET_FLAG_EXIST;
 	bool deleted = false, forceadd = false, reuse = false;
 	u32 r, key, multi = 0, elements, maxelem;
@@ -889,6 +889,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 			ext_size(AHASH_INIT_SIZE, set->dsize);
 		goto copy_elem;
 	}
+	npos = n->pos;
 	for (i = 0; i < n->pos; i++) {
 		if (!test_bit(i, n->used)) {
 			/* Reuse first deleted entry */
@@ -962,7 +963,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	}
 
 copy_elem:
-	j = n->pos++;
+	j = npos = n->pos + 1;
 	data = ahash_data(n, j, set->dsize);
 copy_data:
 	t->hregion[r].elements++;
@@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	if (SET_WITH_TIMEOUT(set))
 		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
 	smp_mb__before_atomic();
+	n->pos = npos;
 	set_bit(j, n->used);
 	if (old != ERR_PTR(-ENOENT)) {
 		rcu_assign_pointer(hbucket(t, key), n);
-- 
2.39.5



* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15  8:20 ` [PATCH 2/2] netfilter: ipset: Fix data race between add and dump " Jozsef Kadlecsik
@ 2026-04-15 10:22   ` Florian Westphal
  2026-04-15 10:33     ` Jozsef Kadlecsik
  2026-04-20 13:11     ` Jozsef Kadlecsik
  0 siblings, 2 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-15 10:22 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel, Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> When adding a new entry to the next position in the existing hash bucket,
> the position index was incremented too early and parallel dump could
> read it before the entry was populated with the value. Move the setting
> of the position index after populating the entry.
> 
> Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com
> Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com
> Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com
> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
> ---
>  net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index b79e5dd2af03..0da02a8dfbae 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -844,7 +844,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	const struct mtype_elem *d = value;
>  	struct mtype_elem *data;
>  	struct hbucket *n, *old = ERR_PTR(-ENOENT);
> -	int i, j = -1, ret;
> +	int i, j = -1, npos = 0, ret;
>  	bool flag_exist = flags & IPSET_FLAG_EXIST;
>  	bool deleted = false, forceadd = false, reuse = false;
>  	u32 r, key, multi = 0, elements, maxelem;
> @@ -889,6 +889,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  			ext_size(AHASH_INIT_SIZE, set->dsize);
>  		goto copy_elem;
>  	}
> +	npos = n->pos;
>  	for (i = 0; i < n->pos; i++) {
>  		if (!test_bit(i, n->used)) {
>  			/* Reuse first deleted entry */
> @@ -962,7 +963,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	}
>  
>  copy_elem:
> -	j = n->pos++;
> +	j = npos = n->pos + 1;

Hmm. Should that be:
+	j = npos;
+	npos = n->pos + 1;

As-is j is advanced by 1 compared to old code.

>  	data = ahash_data(n, j, set->dsize);
>  copy_data:
>  	t->hregion[r].elements++;
> @@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	if (SET_WITH_TIMEOUT(set))
>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
>  	smp_mb__before_atomic();
> +	n->pos = npos;
>  	set_bit(j, n->used);

I think this needs a followup-patch to switch this to smp_store_release
and readers to smp_load_acquire helpers.

Here is a diff for this (generated by LLM and only compile tested).
I think this can be a separate patch to not make this change too big and
to not mix different fixes.

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 0da02a8dfbae..2ca674e51699 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -682,10 +682,13 @@ mtype_resize(struct ip_set *set, bool retried)
 		rcu_read_lock_bh();
 		for (i = ahash_bucket_start(r, orig->htable_bits);
 		     i < ahash_bucket_end(r, orig->htable_bits); i++) {
+			u8 pos;
+
 			n = __ipset_dereference(hbucket(orig, i));
 			if (!n)
 				continue;
-			for (j = 0; j < n->pos; j++) {
+			pos = smp_load_acquire(&n->pos);
+			for (j = 0; j < pos; j++) {
 				if (!test_bit(j, n->used))
 					continue;
 				data = ahash_data(n, j, dsize);
@@ -817,10 +820,13 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
 	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
 		for (i = ahash_bucket_start(r, t->htable_bits);
 		     i < ahash_bucket_end(r, t->htable_bits); i++) {
+			u8 pos;
+
 			n = rcu_dereference_bh(hbucket(t, i));
 			if (!n)
 				continue;
-			for (j = 0; j < n->pos; j++) {
+			pos = smp_load_acquire(&n->pos);
+			for (j = 0; j < pos; j++) {
 				if (!test_bit(j, n->used))
 					continue;
 				data = ahash_data(n, j, set->dsize);
@@ -963,7 +969,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	}
 
 copy_elem:
-	j = npos = n->pos + 1;
+	j = npos;
+	npos = n->pos + 1;
 	data = ahash_data(n, j, set->dsize);
 copy_data:
 	t->hregion[r].elements++;
@@ -985,8 +992,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	/* Must come last for the case when timed out entry is reused */
 	if (SET_WITH_TIMEOUT(set))
 		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
-	smp_mb__before_atomic();
-	n->pos = npos;
+	/* Ensure all data writes are visible before updating position */
+	smp_store_release(&n->pos, npos);
 	set_bit(j, n->used);
 	if (old != ERR_PTR(-ENOENT)) {
 		rcu_assign_pointer(hbucket(t, key), n);
@@ -1172,6 +1179,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
 	int ret, i, j = 0;
 #endif
 	u32 key, multi = 0;
+	u8 pos;
 
 	pr_debug("test by nets\n");
 	for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
@@ -1189,7 +1197,8 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
 		n = rcu_dereference_bh(hbucket(t, key));
 		if (!n)
 			continue;
-		for (i = 0; i < n->pos; i++) {
+		pos = smp_load_acquire(&n->pos);
+		for (i = 0; i < pos; i++) {
 			if (!test_bit(i, n->used))
 				continue;
 			data = ahash_data(n, i, set->dsize);
@@ -1223,6 +1232,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	struct mtype_elem *data;
 	int i, ret = 0;
 	u32 key, multi = 0;
+	u8 pos;
 
 	rcu_read_lock_bh();
 	t = rcu_dereference_bh(h->table);
@@ -1245,7 +1255,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 		ret = 0;
 		goto out;
 	}
-	for (i = 0; i < n->pos; i++) {
+	pos = smp_load_acquire(&n->pos);
+	for (i = 0; i < pos; i++) {
 		if (!test_bit(i, n->used))
 			continue;
 		data = ahash_data(n, i, set->dsize);
@@ -1373,6 +1384,8 @@ mtype_list(const struct ip_set *set,
 	rcu_read_lock();
 	for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
 	     cb->args[IPSET_CB_ARG0]++) {
+		u8 pos;
+
 		cond_resched_rcu();
 		incomplete = skb_tail_pointer(skb);
 		n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
@@ -1380,7 +1393,9 @@ mtype_list(const struct ip_set *set,
 			 cb->args[IPSET_CB_ARG0], t, n);
 		if (!n)
 			continue;
-		for (i = 0; i < n->pos; i++) {
+		/* Acquire ordering with smp_store_release in mtype_add */
+		pos = smp_load_acquire(&n->pos);
+		for (i = 0; i < pos; i++) {
 			if (!test_bit(i, n->used))
 				continue;
 			e = ahash_data(n, i, set->dsize);




* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15 10:22   ` Florian Westphal
@ 2026-04-15 10:33     ` Jozsef Kadlecsik
  2026-04-20 13:11     ` Jozsef Kadlecsik
  1 sibling, 0 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15 10:33 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Jozsef Kadlecsik, netfilter-devel, Pablo Neira Ayuso

Hi Florian,

On Wed, 15 Apr 2026, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
>> When adding a new entry to the next position in the existing hash bucket,
>> the position index was incremented too early and parallel dump could
>> read it before the entry was populated with the value. Move the setting
>> of the position index after populating the entry.
>>
>> Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com
>> Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com
>> Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com
>> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
>> ---
>>  net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
>> index b79e5dd2af03..0da02a8dfbae 100644
>> --- a/net/netfilter/ipset/ip_set_hash_gen.h
>> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
>> @@ -844,7 +844,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	const struct mtype_elem *d = value;
>>  	struct mtype_elem *data;
>>  	struct hbucket *n, *old = ERR_PTR(-ENOENT);
>> -	int i, j = -1, ret;
>> +	int i, j = -1, npos = 0, ret;
>>  	bool flag_exist = flags & IPSET_FLAG_EXIST;
>>  	bool deleted = false, forceadd = false, reuse = false;
>>  	u32 r, key, multi = 0, elements, maxelem;
>> @@ -889,6 +889,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  			ext_size(AHASH_INIT_SIZE, set->dsize);
>>  		goto copy_elem;
>>  	}
>> +	npos = n->pos;
>>  	for (i = 0; i < n->pos; i++) {
>>  		if (!test_bit(i, n->used)) {
>>  			/* Reuse first deleted entry */
>> @@ -962,7 +963,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	}
>>
>>  copy_elem:
>> -	j = n->pos++;
>> +	j = npos = n->pos + 1;
>
> Hmm. Should that be:
> +	j = npos;
> +	npos = n->pos + 1;
>
> As-is j is advanced by 1 compared to old code.
>
>>  	data = ahash_data(n, j, set->dsize);
>>  copy_data:
>>  	t->hregion[r].elements++;
>> @@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	if (SET_WITH_TIMEOUT(set))
>>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
>>  	smp_mb__before_atomic();
>> +	n->pos = npos;
>>  	set_bit(j, n->used);
>
> I think this needs a followup-patch to switch this to smp_store_release
> and readers to smp_load_acquire helpers.

Your comments are sharp, thanks! I'd better resend the batch in a new 
version.

Best regards,
Jozsef

> Here is a diff for this (generated by LLM and only compile tested).
> I think this can be a separate patch to not make this change too big and
> to not mix different fixes.
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 0da02a8dfbae..2ca674e51699 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -682,10 +682,13 @@ mtype_resize(struct ip_set *set, bool retried)
> 		rcu_read_lock_bh();
> 		for (i = ahash_bucket_start(r, orig->htable_bits);
> 		     i < ahash_bucket_end(r, orig->htable_bits); i++) {
> +			u8 pos;
> +
> 			n = __ipset_dereference(hbucket(orig, i));
> 			if (!n)
> 				continue;
> -			for (j = 0; j < n->pos; j++) {
> +			pos = smp_load_acquire(&n->pos);
> +			for (j = 0; j < pos; j++) {
> 				if (!test_bit(j, n->used))
> 					continue;
> 				data = ahash_data(n, j, dsize);
> @@ -817,10 +820,13 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
> 	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
> 		for (i = ahash_bucket_start(r, t->htable_bits);
> 		     i < ahash_bucket_end(r, t->htable_bits); i++) {
> +			u8 pos;
> +
> 			n = rcu_dereference_bh(hbucket(t, i));
> 			if (!n)
> 				continue;
> -			for (j = 0; j < n->pos; j++) {
> +			pos = smp_load_acquire(&n->pos);
> +			for (j = 0; j < pos; j++) {
> 				if (!test_bit(j, n->used))
> 					continue;
> 				data = ahash_data(n, j, set->dsize);
> @@ -963,7 +969,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 	}
>
> copy_elem:
> -	j = npos = n->pos + 1;
> +	j = npos;
> +	npos = n->pos + 1;
> 	data = ahash_data(n, j, set->dsize);
> copy_data:
> 	t->hregion[r].elements++;
> @@ -985,8 +992,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 	/* Must come last for the case when timed out entry is reused */
> 	if (SET_WITH_TIMEOUT(set))
> 		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
> -	smp_mb__before_atomic();
> -	n->pos = npos;
> +	/* Ensure all data writes are visible before updating position */
> +	smp_store_release(&n->pos, npos);
> 	set_bit(j, n->used);
> 	if (old != ERR_PTR(-ENOENT)) {
> 		rcu_assign_pointer(hbucket(t, key), n);
> @@ -1172,6 +1179,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
> 	int ret, i, j = 0;
> #endif
> 	u32 key, multi = 0;
> +	u8 pos;
>
> 	pr_debug("test by nets\n");
> 	for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
> @@ -1189,7 +1197,8 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
> 		n = rcu_dereference_bh(hbucket(t, key));
> 		if (!n)
> 			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
> 			if (!test_bit(i, n->used))
> 				continue;
> 			data = ahash_data(n, i, set->dsize);
> @@ -1223,6 +1232,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 	struct mtype_elem *data;
> 	int i, ret = 0;
> 	u32 key, multi = 0;
> +	u8 pos;
>
> 	rcu_read_lock_bh();
> 	t = rcu_dereference_bh(h->table);
> @@ -1245,7 +1255,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 		ret = 0;
> 		goto out;
> 	}
> -	for (i = 0; i < n->pos; i++) {
> +	pos = smp_load_acquire(&n->pos);
> +	for (i = 0; i < pos; i++) {
> 		if (!test_bit(i, n->used))
> 			continue;
> 		data = ahash_data(n, i, set->dsize);
> @@ -1373,6 +1384,8 @@ mtype_list(const struct ip_set *set,
> 	rcu_read_lock();
> 	for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
> 	     cb->args[IPSET_CB_ARG0]++) {
> +		u8 pos;
> +
> 		cond_resched_rcu();
> 		incomplete = skb_tail_pointer(skb);
> 		n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
> @@ -1380,7 +1393,9 @@ mtype_list(const struct ip_set *set,
> 			 cb->args[IPSET_CB_ARG0], t, n);
> 		if (!n)
> 			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		/* Acquire ordering with smp_store_release in mtype_add */
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
> 			if (!test_bit(i, n->used))
> 				continue;
> 			e = ahash_data(n, i, set->dsize);
>
>
>


* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15 10:22   ` Florian Westphal
  2026-04-15 10:33     ` Jozsef Kadlecsik
@ 2026-04-20 13:11     ` Jozsef Kadlecsik
  2026-04-20 14:03       ` Florian Westphal
  1 sibling, 1 reply; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-20 13:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Jozsef Kadlecsik, netfilter-devel, Pablo Neira Ayuso

Hi Florian,

On Wed, 15 Apr 2026, Florian Westphal wrote:

>>  	data = ahash_data(n, j, set->dsize);
>>  copy_data:
>>  	t->hregion[r].elements++;
>> @@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	if (SET_WITH_TIMEOUT(set))
>>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
>>  	smp_mb__before_atomic();
>> +	n->pos = npos;
>>  	set_bit(j, n->used);
>
> I think this needs a followup-patch to switch this to smp_store_release
> and readers to smp_load_acquire helpers.

I'm pondering over the helpers and unsure. The hash bucket structure is as 
follows:

/* A hash bucket */
struct hbucket {
         struct rcu_head rcu;    /* for call_rcu */
         /* Which positions are used in the array */
         DECLARE_BITMAP(used, AHASH_MAX_TUNED);
         u8 size;                /* size of the array */
         u8 pos;                 /* position of the first free entry */
         unsigned char value[]   /* the array of the values */
                 __aligned(__alignof__(u64));
};

The elements in "value" are filled up successively; "pos" is there to
limit the search range for existing elements (which might be timed out
as well). So as long as "size" is unchanged (no growing/shrinking), the
worst that can happen if "pos" is not "correct" is:

- new element added but dump/test don't "find" it
- element deleted but dump/test find it

The critical part is when "size" changes. But "size" is never updated
directly: a completely new bucket is created when growing/shrinking, and
the new bucket is assigned via the RCU mechanism.

So why do you think the helpers are required to read/update the "pos" 
member of the hash bucket? I might not see the wood for the trees.

Best regards,
Jozsef

> Here is a diff for this (generated by LLM and only compile tested).
> I think this can be a separate patch to not make this change too big and
> to not mix different fixes.
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 0da02a8dfbae..2ca674e51699 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -682,10 +682,13 @@ mtype_resize(struct ip_set *set, bool retried)
> 		rcu_read_lock_bh();
> 		for (i = ahash_bucket_start(r, orig->htable_bits);
> 		     i < ahash_bucket_end(r, orig->htable_bits); i++) {
> +			u8 pos;
> +
> 			n = __ipset_dereference(hbucket(orig, i));
> 			if (!n)
> 				continue;
> -			for (j = 0; j < n->pos; j++) {
> +			pos = smp_load_acquire(&n->pos);
> +			for (j = 0; j < pos; j++) {
> 				if (!test_bit(j, n->used))
> 					continue;
> 				data = ahash_data(n, j, dsize);
> @@ -817,10 +820,13 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
> 	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
> 		for (i = ahash_bucket_start(r, t->htable_bits);
> 		     i < ahash_bucket_end(r, t->htable_bits); i++) {
> +			u8 pos;
> +
> 			n = rcu_dereference_bh(hbucket(t, i));
> 			if (!n)
> 				continue;
> -			for (j = 0; j < n->pos; j++) {
> +			pos = smp_load_acquire(&n->pos);
> +			for (j = 0; j < pos; j++) {
> 				if (!test_bit(j, n->used))
> 					continue;
> 				data = ahash_data(n, j, set->dsize);
> @@ -963,7 +969,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 	}
>
> copy_elem:
> -	j = npos = n->pos + 1;
> +	j = npos;
> +	npos = n->pos + 1;
> 	data = ahash_data(n, j, set->dsize);
> copy_data:
> 	t->hregion[r].elements++;
> @@ -985,8 +992,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 	/* Must come last for the case when timed out entry is reused */
> 	if (SET_WITH_TIMEOUT(set))
> 		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
> -	smp_mb__before_atomic();
> -	n->pos = npos;
> +	/* Ensure all data writes are visible before updating position */
> +	smp_store_release(&n->pos, npos);
> 	set_bit(j, n->used);
> 	if (old != ERR_PTR(-ENOENT)) {
> 		rcu_assign_pointer(hbucket(t, key), n);
> @@ -1172,6 +1179,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
> 	int ret, i, j = 0;
> #endif
> 	u32 key, multi = 0;
> +	u8 pos;
>
> 	pr_debug("test by nets\n");
> 	for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
> @@ -1189,7 +1197,8 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
> 		n = rcu_dereference_bh(hbucket(t, key));
> 		if (!n)
> 			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
> 			if (!test_bit(i, n->used))
> 				continue;
> 			data = ahash_data(n, i, set->dsize);
> @@ -1223,6 +1232,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 	struct mtype_elem *data;
> 	int i, ret = 0;
> 	u32 key, multi = 0;
> +	u8 pos;
>
> 	rcu_read_lock_bh();
> 	t = rcu_dereference_bh(h->table);
> @@ -1245,7 +1255,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
> 		ret = 0;
> 		goto out;
> 	}
> -	for (i = 0; i < n->pos; i++) {
> +	pos = smp_load_acquire(&n->pos);
> +	for (i = 0; i < pos; i++) {
> 		if (!test_bit(i, n->used))
> 			continue;
> 		data = ahash_data(n, i, set->dsize);
> @@ -1373,6 +1384,8 @@ mtype_list(const struct ip_set *set,
> 	rcu_read_lock();
> 	for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
> 	     cb->args[IPSET_CB_ARG0]++) {
> +		u8 pos;
> +
> 		cond_resched_rcu();
> 		incomplete = skb_tail_pointer(skb);
> 		n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
> @@ -1380,7 +1393,9 @@ mtype_list(const struct ip_set *set,
> 			 cb->args[IPSET_CB_ARG0], t, n);
> 		if (!n)
> 			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		/* Acquire ordering with smp_store_release in mtype_add */
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
> 			if (!test_bit(i, n->used))
> 				continue;
> 			e = ahash_data(n, i, set->dsize);
>
>
>


* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-20 13:11     ` Jozsef Kadlecsik
@ 2026-04-20 14:03       ` Florian Westphal
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-20 14:03 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Jozsef Kadlecsik, netfilter-devel, Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> limit the searching range for existing elements (which might be timed out 
> as well). So until the "size" is unchanged (no growing/shrinking), the 
> worst things which can happen if it's not "correct":
> 
> - new element added but dump/test don't "find" it
> - element deleted but dump/test find it
> 
> The critical part is when "size" changes. But "size" never updated 
> directly: a completely new bucket is created when growing/shrinking and 
> the new bucket is assigned via RCU mechanism.
> 
> So why do you thing the helpers are required to read/update the "pos" 
> element of the hash bucket? I might not see the wood for the trees.

As-is it is confusing: "pos" is updated concurrently with dumpers,
with no ordering and no annotations (READ_ONCE(), WRITE_ONCE()).

Maybe the helpers are not required, but then there should be WRITE_ONCE()
and READ_ONCE() and a comment explaining that this is deliberately racy.

Basically what you wrote above.  It would help to understand what
guarantees there are and that this is good enough.

As-is KCSAN will surely splat without data race annotations.

