* [PATCH 0/2 v3] netfilter: ipset: concurrent add and dump fixes
@ 2026-04-15 8:20 Jozsef Kadlecsik
2026-04-15 8:20 ` [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types Jozsef Kadlecsik
2026-04-15 8:20 ` [PATCH 2/2] netfilter: ipset: Fix data race between add and dump " Jozsef Kadlecsik
0 siblings, 2 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15 8:20 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal
Hi Pablo, Florian,
Please consider applying the following patches:
* Fix a data race between add and list in all hash types due to setting
the position index too early.
* Fix a data race between add and list header commands in all hash types
by protecting the list header dumping part as well.
Best regards,
Jozsef
The following changes since commit a9d4f4f6e65e0bf9bbddedecc84d67249991979c:
net/mlx5: Update the list of the PCI supported devices (2026-04-06 19:17:42 -0700)
are available in the Git repository at:
git://blackhole.kfki.hu/nf 90262ae5b482a4bfb9282c
for you to fetch changes up to 90262ae5b482a4bfb9282c19f035aafcc7ac9af2:
netfilter: ipset: Fix data race between add and dump in all hash types (2026-04-15 10:16:33 +0200)
----------------------------------------------------------------
Jozsef Kadlecsik (2):
netfilter: ipset: Fix data race between add and list header in all hash types
netfilter: ipset: Fix data race between add and dump in all hash types
net/netfilter/ipset/ip_set_core.c | 4 ++--
net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
2 files changed, 6 insertions(+), 4 deletions(-)
* [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types
  2026-04-15  8:20 [PATCH 0/2 v3] netfilter: ipset: concurrent add and dump fixes Jozsef Kadlecsik
@ 2026-04-15  8:20 ` Jozsef Kadlecsik
  2026-04-15  8:20 ` [PATCH 2/2] netfilter: ipset: Fix data race between add and dump " Jozsef Kadlecsik
  1 sibling, 0 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15 8:20 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

The "ipset list -terse" command is actually a dump operation which may
run parallel with "ipset add" commands, which can trigger an internal
resizing of the hash type of sets just being dumped. However, dumping
just the header part of the set was not protected against underlying
resizing. Fix it by protecting the header dumping part as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d0c9fe59c67d..e6a8b3acc556 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1648,13 +1648,13 @@ ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 		if (cb->args[IPSET_CB_PROTO] > IPSET_PROTOCOL_MIN &&
 		    nla_put_net16(skb, IPSET_ATTR_INDEX, htons(index)))
 			goto nla_put_failure;
+		if (set->variant->uref)
+			set->variant->uref(set, cb, true);
 		ret = set->variant->head(set, skb);
 		if (ret < 0)
 			goto release_refcount;
 		if (dump_flags & IPSET_FLAG_LIST_HEADER)
 			goto next_set;
-		if (set->variant->uref)
-			set->variant->uref(set, cb, true);
 		fallthrough;
 	default:
 		ret = set->variant->list(set, skb, cb);
-- 
2.39.5
* [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15  8:20 [PATCH 0/2 v3] netfilter: ipset: concurrent add and dump fixes Jozsef Kadlecsik
  2026-04-15  8:20 ` [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types Jozsef Kadlecsik
@ 2026-04-15  8:20 ` Jozsef Kadlecsik
  2026-04-15 10:22   ` Florian Westphal
  1 sibling, 1 reply; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15 8:20 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

When adding a new entry to the next position in the existing hash bucket,
the position index was incremented too early and parallel dump could
read it before the entry was populated with the value. Move the setting
of the position index after populating the entry.

Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com
Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com
Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index b79e5dd2af03..0da02a8dfbae 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -844,7 +844,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	const struct mtype_elem *d = value;
 	struct mtype_elem *data;
 	struct hbucket *n, *old = ERR_PTR(-ENOENT);
-	int i, j = -1, ret;
+	int i, j = -1, npos = 0, ret;
 	bool flag_exist = flags & IPSET_FLAG_EXIST;
 	bool deleted = false, forceadd = false, reuse = false;
 	u32 r, key, multi = 0, elements, maxelem;
@@ -889,6 +889,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 				   ext_size(AHASH_INIT_SIZE, set->dsize);
 		goto copy_elem;
 	}
+	npos = n->pos;
 	for (i = 0; i < n->pos; i++) {
 		if (!test_bit(i, n->used)) {
 			/* Reuse first deleted entry */
@@ -962,7 +963,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	}

 copy_elem:
-	j = n->pos++;
+	j = npos = n->pos + 1;
 	data = ahash_data(n, j, set->dsize);
 copy_data:
 	t->hregion[r].elements++;
@@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	if (SET_WITH_TIMEOUT(set))
 		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
 	smp_mb__before_atomic();
+	n->pos = npos;
 	set_bit(j, n->used);
 	if (old != ERR_PTR(-ENOENT)) {
 		rcu_assign_pointer(hbucket(t, key), n);
-- 
2.39.5
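[Editor's note: the publication-order bug fixed above can be illustrated with a small userspace sketch. This is not the kernel code: C11 atomics stand in for the kernel's barriers, and the struct and function names are illustrative only.]

```c
#include <stdatomic.h>

/* Userspace sketch of the bucket-publication pattern the patch fixes.
 * "slots" stands in for the bucket's value[] array and "pos" for the
 * index of the first free entry; a concurrent dumper scans slots[0..pos).
 * All names here are illustrative, not taken from the kernel sources. */
struct bucket {
	int slots[8];
	atomic_int pos;
};

/* Buggy order (the old "j = n->pos++"): the index is published before
 * the slot is filled, so a dumper running in parallel can read slots[j]
 * before it holds the new value. */
void add_buggy(struct bucket *b, int value)
{
	int j = atomic_fetch_add_explicit(&b->pos, 1, memory_order_relaxed);
	b->slots[j] = value;	/* too late: the new pos is already visible */
}

/* Fixed order (what the patch arranges): fill the slot first, then
 * publish the new index; the release store orders the data write before
 * the index becomes visible (the kernel patch uses a barrier instead). */
void add_fixed(struct bucket *b, int value)
{
	int j = atomic_load_explicit(&b->pos, memory_order_relaxed);
	b->slots[j] = value;	/* populate the entry first */
	atomic_store_explicit(&b->pos, j + 1, memory_order_release);
}

/* Dumper side: never scans past the published index. */
int dump_count(struct bucket *b)
{
	return atomic_load_explicit(&b->pos, memory_order_acquire);
}
```

With the fixed ordering, any index value a dumper observes implies the corresponding slot is fully populated; with the buggy ordering there is a window where it is not.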
* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15  8:20 ` [PATCH 2/2] netfilter: ipset: Fix data race between add and dump " Jozsef Kadlecsik
@ 2026-04-15 10:22   ` Florian Westphal
  2026-04-15 10:33     ` Jozsef Kadlecsik
  2026-04-20 13:11     ` Jozsef Kadlecsik
  0 siblings, 2 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-15 10:22 UTC (permalink / raw)
To: Jozsef Kadlecsik; +Cc: netfilter-devel, Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> When adding a new entry to the next position in the existing hash bucket,
> the position index was incremented too early and parallel dump could
> read it before the entry was populated with the value. Move the setting
> of the position index after populating the entry.
>
> Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com
> Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com
> Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com
> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
> ---
>  net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index b79e5dd2af03..0da02a8dfbae 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -844,7 +844,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	const struct mtype_elem *d = value;
>  	struct mtype_elem *data;
>  	struct hbucket *n, *old = ERR_PTR(-ENOENT);
> -	int i, j = -1, ret;
> +	int i, j = -1, npos = 0, ret;
>  	bool flag_exist = flags & IPSET_FLAG_EXIST;
>  	bool deleted = false, forceadd = false, reuse = false;
>  	u32 r, key, multi = 0, elements, maxelem;
> @@ -889,6 +889,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  				   ext_size(AHASH_INIT_SIZE, set->dsize);
>  		goto copy_elem;
>  	}
> +	npos = n->pos;
>  	for (i = 0; i < n->pos; i++) {
>  		if (!test_bit(i, n->used)) {
>  			/* Reuse first deleted entry */
> @@ -962,7 +963,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	}
>
>  copy_elem:
> -	j = n->pos++;
> +	j = npos = n->pos + 1;

Hmm. Should that be:
+	j = npos;
+	npos = n->pos + 1;

As-is j is advanced by 1 compared to old code.

>  	data = ahash_data(n, j, set->dsize);
>  copy_data:
>  	t->hregion[r].elements++;
> @@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	if (SET_WITH_TIMEOUT(set))
>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
>  	smp_mb__before_atomic();
> +	n->pos = npos;
>  	set_bit(j, n->used);

I think this needs a followup-patch to switch this to smp_store_release
and readers to smp_load_acquire helpers.

Here is a diff for this (generated by LLM and only compile tested).
I think this can be a separate patch to not make this change too big and
to not mix different fixes.

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 0da02a8dfbae..2ca674e51699 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -682,10 +682,13 @@ mtype_resize(struct ip_set *set, bool retried)
 	rcu_read_lock_bh();
 	for (i = ahash_bucket_start(r, orig->htable_bits);
 	     i < ahash_bucket_end(r, orig->htable_bits); i++) {
+		u8 pos;
+
 		n = __ipset_dereference(hbucket(orig, i));
 		if (!n)
 			continue;
-		for (j = 0; j < n->pos; j++) {
+		pos = smp_load_acquire(&n->pos);
+		for (j = 0; j < pos; j++) {
 			if (!test_bit(j, n->used))
 				continue;
 			data = ahash_data(n, j, dsize);
@@ -817,10 +820,13 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
 	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
 		for (i = ahash_bucket_start(r, t->htable_bits);
 		     i < ahash_bucket_end(r, t->htable_bits); i++) {
+			u8 pos;
+
 			n = rcu_dereference_bh(hbucket(t, i));
 			if (!n)
 				continue;
-			for (j = 0; j < n->pos; j++) {
+			pos = smp_load_acquire(&n->pos);
+			for (j = 0; j < pos; j++) {
 				if (!test_bit(j, n->used))
 					continue;
 				data = ahash_data(n, j, set->dsize);
@@ -963,7 +969,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	}
 
 copy_elem:
-	j = npos = n->pos + 1;
+	j = npos;
+	npos = n->pos + 1;
 	data = ahash_data(n, j, set->dsize);
 copy_data:
 	t->hregion[r].elements++;
@@ -985,8 +992,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	/* Must come last for the case when timed out entry is reused */
 	if (SET_WITH_TIMEOUT(set))
 		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
-	smp_mb__before_atomic();
-	n->pos = npos;
+	/* Ensure all data writes are visible before updating position */
+	smp_store_release(&n->pos, npos);
 	set_bit(j, n->used);
 	if (old != ERR_PTR(-ENOENT)) {
 		rcu_assign_pointer(hbucket(t, key), n);
@@ -1172,6 +1179,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
 	int ret, i, j = 0;
 #endif
 	u32 key, multi = 0;
+	u8 pos;
 
 	pr_debug("test by nets\n");
 	for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
@@ -1189,7 +1197,8 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
 		n = rcu_dereference_bh(hbucket(t, key));
 		if (!n)
 			continue;
-		for (i = 0; i < n->pos; i++) {
+		pos = smp_load_acquire(&n->pos);
+		for (i = 0; i < pos; i++) {
 			if (!test_bit(i, n->used))
 				continue;
 			data = ahash_data(n, i, set->dsize);
@@ -1223,6 +1232,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	struct mtype_elem *data;
 	int i, ret = 0;
 	u32 key, multi = 0;
+	u8 pos;
 
 	rcu_read_lock_bh();
 	t = rcu_dereference_bh(h->table);
@@ -1245,7 +1255,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 		ret = 0;
 		goto out;
 	}
-	for (i = 0; i < n->pos; i++) {
+	pos = smp_load_acquire(&n->pos);
+	for (i = 0; i < pos; i++) {
 		if (!test_bit(i, n->used))
 			continue;
 		data = ahash_data(n, i, set->dsize);
@@ -1373,6 +1384,8 @@ mtype_list(const struct ip_set *set,
 	rcu_read_lock();
 	for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
 	     cb->args[IPSET_CB_ARG0]++) {
+		u8 pos;
+
 		cond_resched_rcu();
 		incomplete = skb_tail_pointer(skb);
 		n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
@@ -1380,7 +1393,9 @@ mtype_list(const struct ip_set *set,
 			 cb->args[IPSET_CB_ARG0], t, n);
 		if (!n)
 			continue;
-		for (i = 0; i < n->pos; i++) {
+		/* Acquire ordering with smp_store_release in mtype_add */
+		pos = smp_load_acquire(&n->pos);
+		for (i = 0; i < pos; i++) {
 			if (!test_bit(i, n->used))
 				continue;
 			e = ahash_data(n, i, set->dsize);
* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15 10:22   ` Florian Westphal
@ 2026-04-15 10:33     ` Jozsef Kadlecsik
  2026-04-20 13:11     ` Jozsef Kadlecsik
  1 sibling, 0 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-15 10:33 UTC (permalink / raw)
To: Florian Westphal; +Cc: Jozsef Kadlecsik, netfilter-devel, Pablo Neira Ayuso

Hi Florian,

On Wed, 15 Apr 2026, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
>> When adding a new entry to the next position in the existing hash bucket,
>> the position index was incremented too early and parallel dump could
>> read it before the entry was populated with the value. Move the setting
>> of the position index after populating the entry.
>>
>> Reported-by: syzbot+786c889f046e8b003ca6@syzkaller.appspotmail.com
>> Reported-by: syzbot+1da17e4b41d795df059e@syzkaller.appspotmail.com
>> Reported-by: syzbot+421c5f3ff8e9493084d9@syzkaller.appspotmail.com
>> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
>> ---
>>  net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
>> index b79e5dd2af03..0da02a8dfbae 100644
>> --- a/net/netfilter/ipset/ip_set_hash_gen.h
>> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
>> @@ -844,7 +844,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	const struct mtype_elem *d = value;
>>  	struct mtype_elem *data;
>>  	struct hbucket *n, *old = ERR_PTR(-ENOENT);
>> -	int i, j = -1, ret;
>> +	int i, j = -1, npos = 0, ret;
>>  	bool flag_exist = flags & IPSET_FLAG_EXIST;
>>  	bool deleted = false, forceadd = false, reuse = false;
>>  	u32 r, key, multi = 0, elements, maxelem;
>> @@ -889,6 +889,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  				   ext_size(AHASH_INIT_SIZE, set->dsize);
>>  		goto copy_elem;
>>  	}
>> +	npos = n->pos;
>>  	for (i = 0; i < n->pos; i++) {
>>  		if (!test_bit(i, n->used)) {
>>  			/* Reuse first deleted entry */
>> @@ -962,7 +963,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	}
>>
>>  copy_elem:
>> -	j = n->pos++;
>> +	j = npos = n->pos + 1;
>
> Hmm. Should that be:
> +	j = npos;
> +	npos = n->pos + 1;
>
> As-is j is advanced by 1 compared to old code.
>
>>  	data = ahash_data(n, j, set->dsize);
>>  copy_data:
>>  	t->hregion[r].elements++;
>> @@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	if (SET_WITH_TIMEOUT(set))
>>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
>>  	smp_mb__before_atomic();
>> +	n->pos = npos;
>>  	set_bit(j, n->used);
>
> I think this needs a followup-patch to switch this to smp_store_release
> and readers to smp_load_acquire helpers.

Your comments are sharp, thanks! I'd better resend the batch in a new
version.

Best regards,
Jozsef

> Here is a diff for this (generated by LLM and only compile tested).
> I think this can be a separate patch to not make this change too big and
> to not mix different fixes.
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 0da02a8dfbae..2ca674e51699 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -682,10 +682,13 @@ mtype_resize(struct ip_set *set, bool retried)
>  	rcu_read_lock_bh();
>  	for (i = ahash_bucket_start(r, orig->htable_bits);
>  	     i < ahash_bucket_end(r, orig->htable_bits); i++) {
> +		u8 pos;
> +
>  		n = __ipset_dereference(hbucket(orig, i));
>  		if (!n)
>  			continue;
> -		for (j = 0; j < n->pos; j++) {
> +		pos = smp_load_acquire(&n->pos);
> +		for (j = 0; j < pos; j++) {
>  			if (!test_bit(j, n->used))
>  				continue;
>  			data = ahash_data(n, j, dsize);
> @@ -817,10 +820,13 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
>  	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
>  		for (i = ahash_bucket_start(r, t->htable_bits);
>  		     i < ahash_bucket_end(r, t->htable_bits); i++) {
> +			u8 pos;
> +
>  			n = rcu_dereference_bh(hbucket(t, i));
>  			if (!n)
>  				continue;
> -			for (j = 0; j < n->pos; j++) {
> +			pos = smp_load_acquire(&n->pos);
> +			for (j = 0; j < pos; j++) {
>  				if (!test_bit(j, n->used))
>  					continue;
>  				data = ahash_data(n, j, set->dsize);
> @@ -963,7 +969,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	}
>
>  copy_elem:
> -	j = npos = n->pos + 1;
> +	j = npos;
> +	npos = n->pos + 1;
>  	data = ahash_data(n, j, set->dsize);
>  copy_data:
>  	t->hregion[r].elements++;
> @@ -985,8 +992,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	/* Must come last for the case when timed out entry is reused */
>  	if (SET_WITH_TIMEOUT(set))
>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
> -	smp_mb__before_atomic();
> -	n->pos = npos;
> +	/* Ensure all data writes are visible before updating position */
> +	smp_store_release(&n->pos, npos);
>  	set_bit(j, n->used);
>  	if (old != ERR_PTR(-ENOENT)) {
>  		rcu_assign_pointer(hbucket(t, key), n);
> @@ -1172,6 +1179,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
>  	int ret, i, j = 0;
>  #endif
>  	u32 key, multi = 0;
> +	u8 pos;
>
>  	pr_debug("test by nets\n");
>  	for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
> @@ -1189,7 +1197,8 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
>  		n = rcu_dereference_bh(hbucket(t, key));
>  		if (!n)
>  			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
>  			if (!test_bit(i, n->used))
>  				continue;
>  			data = ahash_data(n, i, set->dsize);
> @@ -1223,6 +1232,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	struct mtype_elem *data;
>  	int i, ret = 0;
>  	u32 key, multi = 0;
> +	u8 pos;
>
>  	rcu_read_lock_bh();
>  	t = rcu_dereference_bh(h->table);
> @@ -1245,7 +1255,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  		ret = 0;
>  		goto out;
>  	}
> -	for (i = 0; i < n->pos; i++) {
> +	pos = smp_load_acquire(&n->pos);
> +	for (i = 0; i < pos; i++) {
>  		if (!test_bit(i, n->used))
>  			continue;
>  		data = ahash_data(n, i, set->dsize);
> @@ -1373,6 +1384,8 @@ mtype_list(const struct ip_set *set,
>  	rcu_read_lock();
>  	for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
>  	     cb->args[IPSET_CB_ARG0]++) {
> +		u8 pos;
> +
>  		cond_resched_rcu();
>  		incomplete = skb_tail_pointer(skb);
>  		n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
> @@ -1380,7 +1393,9 @@ mtype_list(const struct ip_set *set,
>  			 cb->args[IPSET_CB_ARG0], t, n);
>  		if (!n)
>  			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		/* Acquire ordering with smp_store_release in mtype_add */
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
>  			if (!test_bit(i, n->used))
>  				continue;
>  			e = ahash_data(n, i, set->dsize);
* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-15 10:22   ` Florian Westphal
  2026-04-15 10:33     ` Jozsef Kadlecsik
@ 2026-04-20 13:11     ` Jozsef Kadlecsik
  2026-04-20 14:03       ` Florian Westphal
  1 sibling, 1 reply; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-20 13:11 UTC (permalink / raw)
To: Florian Westphal; +Cc: Jozsef Kadlecsik, netfilter-devel, Pablo Neira Ayuso

Hi Florian,

On Wed, 15 Apr 2026, Florian Westphal wrote:

>>  	data = ahash_data(n, j, set->dsize);
>>  copy_data:
>>  	t->hregion[r].elements++;
>> @@ -985,6 +986,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  	if (SET_WITH_TIMEOUT(set))
>>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
>>  	smp_mb__before_atomic();
>> +	n->pos = npos;
>>  	set_bit(j, n->used);
>
> I think this needs a followup-patch to switch this to smp_store_release
> and readers to smp_load_acquire helpers.

I'm pondering over the helpers and unsure. The hash bucket structure is
as follows:

/* A hash bucket */
struct hbucket {
	struct rcu_head rcu;	/* for call_rcu */
	/* Which positions are used in the array */
	DECLARE_BITMAP(used, AHASH_MAX_TUNED);
	u8 size;		/* size of the array */
	u8 pos;			/* position of the first free entry */
	unsigned char value[]	/* the array of the values */
		__aligned(__alignof__(u64));
};

The elements in "value" are filled up successively, "pos" is there to
limit the searching range for existing elements (which might be timed out
as well). So until the "size" is unchanged (no growing/shrinking), the
worst things which can happen if it's not "correct":

- new element added but dump/test don't "find" it
- element deleted but dump/test find it

The critical part is when "size" changes. But "size" is never updated
directly: a completely new bucket is created when growing/shrinking and
the new bucket is assigned via the RCU mechanism.

So why do you think the helpers are required to read/update the "pos"
element of the hash bucket? I might not see the wood for the trees.

Best regards,
Jozsef

> Here is a diff for this (generated by LLM and only compile tested).
> I think this can be a separate patch to not make this change too big and
> to not mix different fixes.
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 0da02a8dfbae..2ca674e51699 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -682,10 +682,13 @@ mtype_resize(struct ip_set *set, bool retried)
>  	rcu_read_lock_bh();
>  	for (i = ahash_bucket_start(r, orig->htable_bits);
>  	     i < ahash_bucket_end(r, orig->htable_bits); i++) {
> +		u8 pos;
> +
>  		n = __ipset_dereference(hbucket(orig, i));
>  		if (!n)
>  			continue;
> -		for (j = 0; j < n->pos; j++) {
> +		pos = smp_load_acquire(&n->pos);
> +		for (j = 0; j < pos; j++) {
>  			if (!test_bit(j, n->used))
>  				continue;
>  			data = ahash_data(n, j, dsize);
> @@ -817,10 +820,13 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
>  	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
>  		for (i = ahash_bucket_start(r, t->htable_bits);
>  		     i < ahash_bucket_end(r, t->htable_bits); i++) {
> +			u8 pos;
> +
>  			n = rcu_dereference_bh(hbucket(t, i));
>  			if (!n)
>  				continue;
> -			for (j = 0; j < n->pos; j++) {
> +			pos = smp_load_acquire(&n->pos);
> +			for (j = 0; j < pos; j++) {
>  				if (!test_bit(j, n->used))
>  					continue;
>  				data = ahash_data(n, j, set->dsize);
> @@ -963,7 +969,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	}
>
>  copy_elem:
> -	j = npos = n->pos + 1;
> +	j = npos;
> +	npos = n->pos + 1;
>  	data = ahash_data(n, j, set->dsize);
>  copy_data:
>  	t->hregion[r].elements++;
> @@ -985,8 +992,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	/* Must come last for the case when timed out entry is reused */
>  	if (SET_WITH_TIMEOUT(set))
>  		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
> -	smp_mb__before_atomic();
> -	n->pos = npos;
> +	/* Ensure all data writes are visible before updating position */
> +	smp_store_release(&n->pos, npos);
>  	set_bit(j, n->used);
>  	if (old != ERR_PTR(-ENOENT)) {
>  		rcu_assign_pointer(hbucket(t, key), n);
> @@ -1172,6 +1179,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
>  	int ret, i, j = 0;
>  #endif
>  	u32 key, multi = 0;
> +	u8 pos;
>
>  	pr_debug("test by nets\n");
>  	for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
> @@ -1189,7 +1197,8 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
>  		n = rcu_dereference_bh(hbucket(t, key));
>  		if (!n)
>  			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
>  			if (!test_bit(i, n->used))
>  				continue;
>  			data = ahash_data(n, i, set->dsize);
> @@ -1223,6 +1232,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  	struct mtype_elem *data;
>  	int i, ret = 0;
>  	u32 key, multi = 0;
> +	u8 pos;
>
>  	rcu_read_lock_bh();
>  	t = rcu_dereference_bh(h->table);
> @@ -1245,7 +1255,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  		ret = 0;
>  		goto out;
>  	}
> -	for (i = 0; i < n->pos; i++) {
> +	pos = smp_load_acquire(&n->pos);
> +	for (i = 0; i < pos; i++) {
>  		if (!test_bit(i, n->used))
>  			continue;
>  		data = ahash_data(n, i, set->dsize);
> @@ -1373,6 +1384,8 @@ mtype_list(const struct ip_set *set,
>  	rcu_read_lock();
>  	for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
>  	     cb->args[IPSET_CB_ARG0]++) {
> +		u8 pos;
> +
>  		cond_resched_rcu();
>  		incomplete = skb_tail_pointer(skb);
>  		n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
> @@ -1380,7 +1393,9 @@ mtype_list(const struct ip_set *set,
>  			 cb->args[IPSET_CB_ARG0], t, n);
>  		if (!n)
>  			continue;
> -		for (i = 0; i < n->pos; i++) {
> +		/* Acquire ordering with smp_store_release in mtype_add */
> +		pos = smp_load_acquire(&n->pos);
> +		for (i = 0; i < pos; i++) {
>  			if (!test_bit(i, n->used))
>  				continue;
>  			e = ahash_data(n, i, set->dsize);
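[Editor's note: the "size is never updated directly" argument above is the classic copy-then-publish pattern. The sketch below illustrates it with C11 userspace atomics; an acquire/release pointer swap stands in for rcu_assign_pointer()/rcu_dereference(), and all names are illustrative, not the kernel's.]

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Readers always see a bucket whose "size" matches its array, because
 * the size is never changed in place: a grown copy is built off to the
 * side and then the pointer is published atomically. */
struct htab_bucket {
	int size;
	int values[];	/* flexible array, "size" entries */
};

static struct htab_bucket *_Atomic current;	/* starts out NULL */

/* Build a bigger bucket, copy the old contents, publish the new one.
 * The release store orders the initialization before publication. */
int grow(int newsize)
{
	struct htab_bucket *old =
		atomic_load_explicit(&current, memory_order_acquire);
	struct htab_bucket *n =
		malloc(sizeof(*n) + (size_t)newsize * sizeof(int));

	if (!n)
		return -1;
	n->size = newsize;
	memset(n->values, 0, (size_t)newsize * sizeof(int));
	if (old)
		memcpy(n->values, old->values,
		       (size_t)old->size * sizeof(int));
	atomic_store_explicit(&current, n, memory_order_release);
	/* the kernel would free "old" only after an RCU grace period */
	return 0;
}
```

The sketch leaks the old bucket on purpose: freeing it safely is exactly what the RCU grace period provides in the kernel version.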
* Re: [PATCH 2/2] netfilter: ipset: Fix data race between add and dump in all hash types
  2026-04-20 13:11     ` Jozsef Kadlecsik
@ 2026-04-20 14:03       ` Florian Westphal
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-20 14:03 UTC (permalink / raw)
To: Jozsef Kadlecsik; +Cc: Jozsef Kadlecsik, netfilter-devel, Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> limit the searching range for existing elements (which might be timed out
> as well). So until the "size" is unchanged (no growing/shrinking), the
> worst things which can happen if it's not "correct":
>
> - new element added but dump/test don't "find" it
> - element deleted but dump/test find it
>
> The critical part is when "size" changes. But "size" is never updated
> directly: a completely new bucket is created when growing/shrinking and
> the new bucket is assigned via the RCU mechanism.
>
> So why do you think the helpers are required to read/update the "pos"
> element of the hash bucket? I might not see the wood for the trees.

As-is it is confusing. Pos is updated concurrently to dumpers with no
ordering and no annotations (READ_ONCE, WRITE_ONCE).

Maybe they are not required, but then there should be WRITE_ONCE() and
READ_ONCE() and a comment explaining that this is deliberately racy.
Basically what you wrote above. It would help to understand what
guarantees there are and that this is good enough.

As-is KCSAN will surely splat without data race annotations.
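[Editor's note: the distinction Florian draws — an intentional-but-unordered racy access versus an ordered publication — maps onto C11 atomics roughly as below. READ_ONCE()/WRITE_ONCE() in the kernel are volatile accesses; relaxed atomics are only a userspace approximation, and all names are illustrative.]

```c
#include <stdatomic.h>

/* Two levels of annotation for a shared index:
 * - relaxed: the race is marked intentional (READ_ONCE/WRITE_ONCE
 *   analogue); no tearing, but no ordering with the slot write, so a
 *   reader may see a stale index.  That matches Jozsef's "worst case":
 *   an added element is briefly not found.
 * - release/acquire: additionally orders the slot write before the
 *   index publication (smp_store_release/smp_load_acquire analogue),
 *   so an observed index implies the data behind it is valid. */
struct idx {
	int slot;
	atomic_int pos;
};

void publish_relaxed(struct idx *p, int v)
{
	p->slot = v;
	/* WRITE_ONCE()-like: race is intentional, ordering unspecified */
	atomic_store_explicit(&p->pos, 1, memory_order_relaxed);
}

void publish_release(struct idx *p, int v)
{
	p->slot = v;
	/* smp_store_release()-like: slot write ordered before pos */
	atomic_store_explicit(&p->pos, 1, memory_order_release);
}

int read_acquire(struct idx *p)
{
	/* smp_load_acquire()-like: pairs with publish_release() */
	return atomic_load_explicit(&p->pos, memory_order_acquire);
}
```

A plain, unannotated load or store of `pos` would be a data race under the memory model, which is exactly what KCSAN reports; the annotation documents whether mere atomicity or full publication ordering is intended.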
* [PATCH 0/2] netfilter: ipset: concurrent add and dump fixes
@ 2026-04-08 7:02 Jozsef Kadlecsik
2026-04-08 7:02 ` [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types Jozsef Kadlecsik
0 siblings, 1 reply; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-08 7:02 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal
Hi Pablo, Florian,
Please consider applying the following patches:
* Fix a data race between add and list in all hash types due to setting
the position index too early.
* Fix a data race between add and list header commands in all hash types
by protecting the list header dumping part as well.
Best regards,
Jozsef
The following changes since commit a9d4f4f6e65e0bf9bbddedecc84d67249991979c:
net/mlx5: Update the list of the PCI supported devices (2026-04-06 19:17:42 -0700)
are available in the Git repository at:
git://blackhole.kfki.hu/nf b93f39a52388ac
for you to fetch changes up to b93f39a52388ac170530633db53137ff7cc41cf3:
netfilter: ipset: Fix data race between add and dump in all hash types (2026-04-08 08:52:31 +0200)
----------------------------------------------------------------
Jozsef Kadlecsik (2):
netfilter: ipset: Fix data race between add and list header in all hash types
netfilter: ipset: Fix data race between add and dump in all hash types
net/netfilter/ipset/ip_set_core.c | 4 ++--
net/netfilter/ipset/ip_set_hash_gen.h | 6 ++++--
2 files changed, 6 insertions(+), 4 deletions(-)
* [PATCH 1/2] netfilter: ipset: Fix data race between add and list header in all hash types
  2026-04-08  7:02 [PATCH 0/2] netfilter: ipset: concurrent add and dump fixes Jozsef Kadlecsik
@ 2026-04-08  7:02 ` Jozsef Kadlecsik
  0 siblings, 0 replies; 8+ messages in thread
From: Jozsef Kadlecsik @ 2026-04-08 7:02 UTC (permalink / raw)
To: netfilter-devel; +Cc: Pablo Neira Ayuso, Florian Westphal

The "ipset list -terse" command is actually a dump operation which may
run parallel with "ipset add" commands, which can trigger an internal
resizing of the hash type of sets just being dumped. However, dumping
just the header part of the set was not protected against underlying
resizing. Fix it by protecting the header dumping part as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d0c9fe59c67d..e6a8b3acc556 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1648,13 +1648,13 @@ ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 		if (cb->args[IPSET_CB_PROTO] > IPSET_PROTOCOL_MIN &&
 		    nla_put_net16(skb, IPSET_ATTR_INDEX, htons(index)))
 			goto nla_put_failure;
+		if (set->variant->uref)
+			set->variant->uref(set, cb, true);
 		ret = set->variant->head(set, skb);
 		if (ret < 0)
 			goto release_refcount;
 		if (dump_flags & IPSET_FLAG_LIST_HEADER)
 			goto next_set;
-		if (set->variant->uref)
-			set->variant->uref(set, cb, true);
 		fallthrough;
 	default:
 		ret = set->variant->list(set, skb, cb);
-- 
2.39.5