From: Jozsef Kadlecsik <kadlec@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel@vger.kernel.org,
Pablo Neira Ayuso <pablo@netfilter.org>,
Linkui Xiao <xiaolinkui@kylinos.cn>
Subject: Re: [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2
Date: Wed, 8 Nov 2023 11:46:08 +0100 (CET) [thread overview]
Message-ID: <5610e82-e462-1ef-d06c-6d15853224b@netfilter.org> (raw)
In-Reply-To: <20231104103551.GA9892@breakpoint.cc>
Hi Florian,
On Sat, 4 Nov 2023, Florian Westphal wrote:
> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > Linkui Xiao reported that there's a race condition when ipset swap and destroy is
> > called, which can lead to crash in add/del/test element operations. Swap then
> > destroy are usual operations to replace a set with another one in a production
> > system. The issue can in some cases be reproduced with the script:
>
> > iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
> > while [ 1 ]
> > do
> > # ... Ongoing traffic...
> > ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
> > ipset add hash_ip2 172.20.0.0/16
> > ipset swap hash_ip1 hash_ip2
> > ipset destroy hash_ip2
> > sleep 0.05
> > done
> >
> > In the race case the possible order of the operations are
> >
> > CPU0 CPU1
> > ip_set_test
> > ipset swap hash_ip1 hash_ip2
> > ipset destroy hash_ip2
> > hash_net_kadt
> >
> > Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
> > is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
> > removed it, hash_net_kadt crashes.
> >
> > The fix is to protect both the list of the sets and the set pointers in an extended RCU
> > region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock().
> > The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.
> >
> > v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
> > ip_set_destroy() unnecessarily when all sets are destroyed.
>
> Hmm. Isn't it enough to only call synchronize_rcu() in ip_set_swap?
>
> All netfilter hooks run with rcu_read_lock() held, em_ipset.c wraps the
> entire ip_set_test() in rcu read lock/unlock pair.
Hm, actually, you are right. Technically there's no need to extend the
rcu_read_lock() protected area inside ipset itself.
> > @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index)
> > struct ip_set_net *inst = ip_set_pernet(net);
> >
> > rcu_read_lock();
> > - /* ip_set_list itself needs to be protected */
> > + /* ip_set_list and the set pointer need to be protected */
> > set = rcu_dereference(inst->ip_set_list)[index];
> > - rcu_read_unlock();
> >
> > return set;
> > }
>
> ... so I don't understand why ip_set_rcu_get() has to extend
> the locked section.
>
> AFAICS there are only two type of callers:
> 1. rcu read lock is already held (datapath)
> 2. ipset nfnl subsys mutex is held
>
> *probably* This could be changed in a separate patch to:
>
> - rcu_read_lock();
> - /* ip_set_list itself needs to be protected */
> - set = rcu_dereference(inst->ip_set_list)[index];
> - rcu_read_unlock();
> + /* ip_set_list and the set pointer need to be protected */
> + return rcu_dereference_check(inst->ip_set_list, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)[index];
>
> This is an example, probably better to add a small
> "ip_set_dereference_nfnl" helper to hide the lockdep construct...
>
> Not saying the patch is wrong; rcu read locks nest and
> ipset locking is not simple so I might be missing something.
I'm going to rework the patch, thanks for your input!
Best regards,
Jozsef
--
E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
H-1525 Budapest 114, POB. 49, Hungary
next prev parent reply other threads:[~2023-11-08 10:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-04 10:03 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v2 Jozsef Kadlecsik
2023-11-04 10:03 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side " Jozsef Kadlecsik
2023-11-04 10:35 ` Florian Westphal
2023-11-08 10:46 ` Jozsef Kadlecsik [this message]
-- strict thread matches above, loose matches on Subject: below --
2023-11-13 20:09 [PATCH 0/1] ipset patch to fix race condition between swap/destroy and add/del/test, v3 Jozsef Kadlecsik
2023-11-13 20:09 ` [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2 Jozsef Kadlecsik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5610e82-e462-1ef-d06c-6d15853224b@netfilter.org \
--to=kadlec@netfilter.org \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=xiaolinkui@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).