* Netfilter: suspicious RCU usage in __nft_rule_lookup
@ 2024-10-24 16:56 Matthieu Baerts
2024-10-24 17:35 ` Florian Westphal
2024-10-24 17:57 ` Pablo Neira Ayuso
0 siblings, 2 replies; 15+ messages in thread
From: Matthieu Baerts @ 2024-10-24 16:56 UTC (permalink / raw)
To: netfilter-devel
Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Phil Sutter, coreteam
Hello,
First, thank you for all the work you did and are still doing around
Netfilter!
I'm writing you this email, because when I run the MPTCP test suite with
a VM running a kernel built with a debug config including
CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y), I get the following
warning:
> =============================
> WARNING: suspicious RCU usage
> 6.12.0-rc3+ #7 Not tainted
> -----------------------------
> net/netfilter/nf_tables_api.c:3420 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by iptables/134:
> #0: ffff888008c4fcc8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid (include/linux/jiffies.h:101) nf_tables
>
> stack backtrace:
> CPU: 1 UID: 0 PID: 134 Comm: iptables Not tainted 6.12.0-rc3+ #7
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:123)
> lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
> __nft_rule_lookup (net/netfilter/nf_tables_api.c:3420 (discriminator 7)) nf_tables
> nf_tables_delrule (net/netfilter/nf_tables_api.c:4300 (discriminator 1)) nf_tables
> ? __pfx_nf_tables_delrule (net/netfilter/nf_tables_api.c:4262) nf_tables
> ? __mutex_unlock_slowpath (arch/x86/include/asm/atomic64_64.h:101 (discriminator 1))
> ? __nla_validate_parse (lib/nlattr.c:638)
> nfnetlink_rcv_batch (net/netfilter/nfnetlink.c:524)
> ? __pfx_nfnetlink_rcv_batch (net/netfilter/nfnetlink.c:373)
> ? rcu_read_lock_any_held (kernel/rcu/update.c:386 (discriminator 1))
> ? validate_chain (kernel/locking/lockdep.c:3797 (discriminator 1))
> ? rcu_read_lock_any_held (kernel/rcu/update.c:386 (discriminator 1))
> ? validate_chain (kernel/locking/lockdep.c:3797 (discriminator 1))
> ? __pfx_validate_chain (kernel/locking/lockdep.c:3860)
> ? __nla_validate_parse (lib/nlattr.c:638)
> nfnetlink_rcv (net/netfilter/nfnetlink.c:647)
> ? __pfx___netlink_lookup (net/netlink/af_netlink.c:512)
> ? __pfx_nfnetlink_rcv (net/netfilter/nfnetlink.c:651)
> netlink_unicast (net/netlink/af_netlink.c:1331)
> ? __pfx_netlink_unicast (net/netlink/af_netlink.c:1342)
> ? find_held_lock (kernel/locking/lockdep.c:5315 (discriminator 1))
> ? __might_fault (mm/memory.c:6700 (discriminator 5))
> netlink_sendmsg (net/netlink/af_netlink.c:1901)
> ? __pfx_netlink_sendmsg (net/netlink/af_netlink.c:1820)
> ? __import_iovec (lib/iov_iter.c:1433 (discriminator 1))
> ____sys_sendmsg (net/socket.c:729 (discriminator 1))
> ? __pfx_____sys_sendmsg (net/socket.c:2553)
> ? __pfx_copy_msghdr_from_user (net/socket.c:2533)
> ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347)
> ? sk_setsockopt (net/core/sock.c:1129)
> ? __local_bh_enable_ip (arch/x86/include/asm/irqflags.h:42)
> ___sys_sendmsg (net/socket.c:2663)
> ? __pfx_sk_setsockopt (net/core/sock.c:1163)
> ? __pfx____sys_sendmsg (net/socket.c:2650)
> ? mark_lock (kernel/locking/lockdep.c:4703 (discriminator 1))
> ? fdget (include/linux/atomic/atomic-arch-fallback.h:479 (discriminator 2))
> ? find_held_lock (kernel/locking/lockdep.c:5315 (discriminator 1))
> __sys_sendmsg (net/socket.c:2690 (discriminator 1))
> ? __pfx___sys_sendmsg (net/socket.c:2678)
> ? __sys_setsockopt (include/linux/file.h:35)
> do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1))
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7ff6a026e004
It is very easy for me to reproduce it: simply by adding and removing an
IPTables rule. Just in case, here are the steps that can be used to have
the same behaviour:
$ cd [kernel source code]
$ echo "iptables -A OUTPUT -j REJECT; iptables -D OUTPUT -j REJECT" \
> .virtme-exec-run
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-debug -e RCU_EXPERT -e PROVE_RCU_LIST
I looked a bit at the code in net/netfilter/nf_tables_api.c, and I can
see that rcu_read_(un)lock() are probably missing, but I'm a bit
confused by how the chain->rules list is protected, and modified using
or not the RCU helpers.
Do you mind looking at this issue please?
(No urgency on my side.)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 16:56 Netfilter: suspicious RCU usage in __nft_rule_lookup Matthieu Baerts
@ 2024-10-24 17:35 ` Florian Westphal
2024-10-24 18:00 ` Pablo Neira Ayuso
2024-10-24 17:57 ` Pablo Neira Ayuso
1 sibling, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2024-10-24 17:35 UTC (permalink / raw)
To: Matthieu Baerts
Cc: netfilter-devel, Pablo Neira Ayuso, Jozsef Kadlecsik,
Florian Westphal, Phil Sutter, coreteam
Matthieu Baerts <matttbe@kernel.org> wrote:
> Hello,
>
> First, thank you for all the work you did and are still doing around
> Netfilter!
>
> I'm writing you this email, because when I run the MPTCP test suite with
> a VM running a kernel built with a debug config including
> CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y), I get the following
> warning:
>
>
> > 6.12.0-rc3+ #7 Not tainted
> > -----------------------------
> > net/netfilter/nf_tables_api.c:3420 RCU-list traversed in non-reader section!!
> >
> > other info that might help us debug this:
> >
> > rcu_scheduler_active = 2, debug_locks = 1
> > 1 lock held by iptables/134:
> > #0: ffff888008c4fcc8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid (include/linux/jiffies.h:101) nf_tables
List is protected by transaction mutex, but we can't switch to plain
for_each_entry as this is also called from rcu-only context.
We either need two functions or pass nft_net + lockdep_is_held() check
as extra arg.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 16:56 Netfilter: suspicious RCU usage in __nft_rule_lookup Matthieu Baerts
2024-10-24 17:35 ` Florian Westphal
@ 2024-10-24 17:57 ` Pablo Neira Ayuso
2024-10-24 18:20 ` Pablo Neira Ayuso
1 sibling, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-24 17:57 UTC (permalink / raw)
To: Matthieu Baerts
Cc: netfilter-devel, Jozsef Kadlecsik, Florian Westphal, Phil Sutter,
coreteam
Hi,
On Thu, Oct 24, 2024 at 06:56:43PM +0200, Matthieu Baerts wrote:
> Hello,
>
> First, thank you for all the work you did and are still doing around
> Netfilter!
>
> I'm writing you this email, because when I run the MPTCP test suite with
> a VM running a kernel built with a debug config including
> CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y), I get the following
> warning:
>
>
> > =============================
> > WARNING: suspicious RCU usage
> > 6.12.0-rc3+ #7 Not tainted
> > -----------------------------
> > net/netfilter/nf_tables_api.c:3420 RCU-list traversed in non-reader section!!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 2, debug_locks = 1
> > 1 lock held by iptables/134:
> > #0: ffff888008c4fcc8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid (include/linux/jiffies.h:101) nf_tables
> >
> > stack backtrace:
> > CPU: 1 UID: 0 PID: 134 Comm: iptables Not tainted 6.12.0-rc3+ #7
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > Call Trace:
> > <TASK>
> > dump_stack_lvl (lib/dump_stack.c:123)
> > lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
> > __nft_rule_lookup (net/netfilter/nf_tables_api.c:3420 (discriminator 7)) nf_tables
This is a _rcu notation which is not correct, while mutex is held, it
was introduced here:
d9adf22a2918 ("netfilter: nf_tables: use call_rcu in netlink dumps")
I will post a patch, thanks for your report.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 17:35 ` Florian Westphal
@ 2024-10-24 18:00 ` Pablo Neira Ayuso
0 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-24 18:00 UTC (permalink / raw)
To: Florian Westphal
Cc: Matthieu Baerts, netfilter-devel, Jozsef Kadlecsik, Phil Sutter,
coreteam
On Thu, Oct 24, 2024 at 07:35:36PM +0200, Florian Westphal wrote:
> Matthieu Baerts <matttbe@kernel.org> wrote:
> > Hello,
> >
> > First, thank you for all the work you did and are still doing around
> > Netfilter!
> >
> > I'm writing you this email, because when I run the MPTCP test suite with
> > a VM running a kernel built with a debug config including
> > CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y), I get the following
> > warning:
> >
> >
> > > 6.12.0-rc3+ #7 Not tainted
> > > -----------------------------
> > > net/netfilter/nf_tables_api.c:3420 RCU-list traversed in non-reader section!!
> > >
> > > other info that might help us debug this:
> > >
> > > rcu_scheduler_active = 2, debug_locks = 1
> > > 1 lock held by iptables/134:
> > > #0: ffff888008c4fcc8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid (include/linux/jiffies.h:101) nf_tables
>
> List is protected by transaction mutex, but we can't switch to plain
> for_each_entry as this is also called from rcu-only context.
>
> We either need two functions or pass nft_net + lockdep_is_held() check
> as extra arg.
Right, I can see _rcu is still needed for the nf_tables_getrule_single() case.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 17:57 ` Pablo Neira Ayuso
@ 2024-10-24 18:20 ` Pablo Neira Ayuso
2024-10-24 23:22 ` Florian Westphal
0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-24 18:20 UTC (permalink / raw)
To: Matthieu Baerts
Cc: netfilter-devel, Jozsef Kadlecsik, Florian Westphal, Phil Sutter,
coreteam
On Thu, Oct 24, 2024 at 07:57:40PM +0200, Pablo Neira Ayuso wrote:
> Hi,
>
> On Thu, Oct 24, 2024 at 06:56:43PM +0200, Matthieu Baerts wrote:
> > Hello,
> >
> > First, thank you for all the work you did and are still doing around
> > Netfilter!
> >
> > I'm writing you this email, because when I run the MPTCP test suite with
> > a VM running a kernel built with a debug config including
> > CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y), I get the following
> > warning:
> >
> >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 6.12.0-rc3+ #7 Not tainted
> > > -----------------------------
> > > net/netfilter/nf_tables_api.c:3420 RCU-list traversed in non-reader section!!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 2, debug_locks = 1
> > > 1 lock held by iptables/134:
> > > #0: ffff888008c4fcc8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid (include/linux/jiffies.h:101) nf_tables
> > >
> > > stack backtrace:
> > > CPU: 1 UID: 0 PID: 134 Comm: iptables Not tainted 6.12.0-rc3+ #7
> > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl (lib/dump_stack.c:123)
> > > lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
> > > __nft_rule_lookup (net/netfilter/nf_tables_api.c:3420 (discriminator 7)) nf_tables
>
> This is a _rcu notation which is not correct, while mutex is held, it
> was introduced here:
>
> d9adf22a2918 ("netfilter: nf_tables: use call_rcu in netlink dumps")
Hm. There is also:
3cb03edb4de3 ("netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests")
this comment below is also not valid anymore:
/* called with rcu_read_lock held */
static struct sk_buff *
nf_tables_getrule_single(u32 portid, const struct nfnl_info *info,
const struct nlattr * const nla[], bool reset)
This is not the only spot that can trigger rcu splats.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 18:20 ` Pablo Neira Ayuso
@ 2024-10-24 23:22 ` Florian Westphal
2024-10-25 8:14 ` Matthieu Baerts
2024-10-25 10:35 ` Pablo Neira Ayuso
0 siblings, 2 replies; 15+ messages in thread
From: Florian Westphal @ 2024-10-24 23:22 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Matthieu Baerts, netfilter-devel, Jozsef Kadlecsik,
Florian Westphal, Phil Sutter, coreteam
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> this comment below is also not valid anymore:
>
> /* called with rcu_read_lock held */
> static struct sk_buff *
> nf_tables_getrule_single(u32 portid, const struct nfnl_info *info,
> const struct nlattr * const nla[], bool reset)
Yes, either called with rcu read lock or commit mutex held.
> This is not the only spot that can trigger rcu splats.
Agree. Will you make a patch or should I take a look?
I'm leaning towards a common helper that can pass the
right lockdep annotation, i.e. pass nft_net as arg to
document when RCU or transaction semantics apply.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 23:22 ` Florian Westphal
@ 2024-10-25 8:14 ` Matthieu Baerts
2024-10-25 9:23 ` Florian Westphal
2024-10-25 10:35 ` Pablo Neira Ayuso
1 sibling, 1 reply; 15+ messages in thread
From: Matthieu Baerts @ 2024-10-25 8:14 UTC (permalink / raw)
To: Florian Westphal, Pablo Neira Ayuso
Cc: netfilter-devel, Jozsef Kadlecsik, Phil Sutter, coreteam
Hi Florian, Pablo,
On 25/10/2024 01:22, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>>
>> this comment below is also not valid anymore:
>>
>> /* called with rcu_read_lock held */
>> static struct sk_buff *
>> nf_tables_getrule_single(u32 portid, const struct nfnl_info *info,
>> const struct nlattr * const nla[], bool reset)
>
> Yes, either called with rcu read lock or commit mutex held.
>
>> This is not the only spot that can trigger rcu splats.
>
> Agree. Will you make a patch or should I take a look?
> I'm leaning towards a common helper that can pass the
> right lockdep annotation, i.e. pass nft_net as arg to
> document when RCU or transaction semantics apply.
Thank you both for your quick replies, and for looking for a fix!
While at it, I had a question related to the rules' list: in
__nft_release_basechain() from the same nf_tables_api.c file, list's
entries are not removed with the _rcu variant → is it OK to do that
because this function is only called last at the cleanup time, when no
other readers can iterate over the list? So similar to what is done in
__nft_release_table()?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 8:14 ` Matthieu Baerts
@ 2024-10-25 9:23 ` Florian Westphal
2024-10-25 9:42 ` Pablo Neira Ayuso
2024-10-25 9:46 ` Phil Sutter
0 siblings, 2 replies; 15+ messages in thread
From: Florian Westphal @ 2024-10-25 9:23 UTC (permalink / raw)
To: Matthieu Baerts
Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel,
Jozsef Kadlecsik, Phil Sutter, coreteam
Matthieu Baerts <matttbe@kernel.org> wrote:
> While at it, I had a question related to the rules' list: in
> __nft_release_basechain() from the same nf_tables_api.c file, list's
> entries are not removed with the _rcu variant → is it OK to do that
> because this function is only called last at the cleanup time, when no
> other readers can iterate over the list? So similar to what is done in
> __nft_release_table()?
Looks like __nft_release_basechain() is broken from start, I don't see
how it can work, it doesn't call synchronize_rcu or anything like that
afaics.
No idea what to do here.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 9:23 ` Florian Westphal
@ 2024-10-25 9:42 ` Pablo Neira Ayuso
2024-10-25 9:46 ` Phil Sutter
1 sibling, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-25 9:42 UTC (permalink / raw)
To: Florian Westphal
Cc: Matthieu Baerts, netfilter-devel, Jozsef Kadlecsik, Phil Sutter,
coreteam
On Fri, Oct 25, 2024 at 11:23:56AM +0200, Florian Westphal wrote:
> Matthieu Baerts <matttbe@kernel.org> wrote:
> > While at it, I had a question related to the rules' list: in
> > __nft_release_basechain() from the same nf_tables_api.c file, list's
> > entries are not removed with the _rcu variant → is it OK to do that
> > because this function is only called last at the cleanup time, when no
> > other readers can iterate over the list? So similar to what is done in
> > __nft_release_table()?
>
> Looks like __nft_release_basechain() is broken from start, I don't see
> how it can work, it doesn't call synchronize_rcu or anything like that
> afaics.
Right, it should unregister hooks, then wait for rcu grace period, and
finally release objects.
> No idea what to do here.
I'm looking into this.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 9:23 ` Florian Westphal
2024-10-25 9:42 ` Pablo Neira Ayuso
@ 2024-10-25 9:46 ` Phil Sutter
2024-10-25 10:26 ` Pablo Neira Ayuso
1 sibling, 1 reply; 15+ messages in thread
From: Phil Sutter @ 2024-10-25 9:46 UTC (permalink / raw)
To: Florian Westphal
Cc: Matthieu Baerts, Pablo Neira Ayuso, netfilter-devel,
Jozsef Kadlecsik, coreteam
On Fri, Oct 25, 2024 at 11:23:56AM +0200, Florian Westphal wrote:
> Matthieu Baerts <matttbe@kernel.org> wrote:
> > While at it, I had a question related to the rules' list: in
> > __nft_release_basechain() from the same nf_tables_api.c file, list's
> > entries are not removed with the _rcu variant → is it OK to do that
> > because this function is only called last at the cleanup time, when no
> > other readers can iterate over the list? So similar to what is done in
> > __nft_release_table()?
>
> Looks like __nft_release_basechain() is broken from start, I don't see
> how it can work, it doesn't call synchronize_rcu or anything like that
> afaics.
>
> No idea what to do here.
It will vanish with my name-based netdev hooks series (the second part).
I could prepare a patch for nf/stable which merely kills that function -
dropping netdev-family chains upon removal of last interface is
inconsistent wrt. flowtables which remain in place.
Another alternative might be to call synchronize_rcu() in there, but it
slows down interface teardown AIUI.
Cheers, Phil
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 9:46 ` Phil Sutter
@ 2024-10-25 10:26 ` Pablo Neira Ayuso
2024-10-25 10:56 ` Phil Sutter
0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-25 10:26 UTC (permalink / raw)
To: Phil Sutter, Florian Westphal, Matthieu Baerts, netfilter-devel,
Jozsef Kadlecsik, coreteam
On Fri, Oct 25, 2024 at 11:46:33AM +0200, Phil Sutter wrote:
> On Fri, Oct 25, 2024 at 11:23:56AM +0200, Florian Westphal wrote:
> > Matthieu Baerts <matttbe@kernel.org> wrote:
> > > While at it, I had a question related to the rules' list: in
> > > __nft_release_basechain() from the same nf_tables_api.c file, list's
> > > entries are not removed with the _rcu variant → is it OK to do that
> > > because this function is only called last at the cleanup time, when no
> > > other readers can iterate over the list? So similar to what is done in
> > > __nft_release_table()?
> >
> > Looks like __nft_release_basechain() is broken from start, I don't see
> > how it can work, it doesn't call synchronize_rcu or anything like that
> > afaics.
> >
> > No idea what to do here.
>
> It will vanish with my name-based netdev hooks series (the second part).
> I could prepare a patch for nf/stable which merely kills that function -
> dropping netdev-family chains upon removal of last interface is
> inconsistent wrt. flowtables which remain in place.
I like the idea of keeping the basechain in place. With chain update
support, it makes sense to add a basechain then update it with the
devices to hook in.
But chain device updates are only recently supported:
b9703ed44ffbfba85c103b9de01886a225e14b38
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri Apr 21 00:34:31 2023 +0200
netfilter: nf_tables: support for adding new devices to an existing netdev chain
that is, in older kernels, this chain would remain unused because
updates are not be possible.
> Another alternative might be to call synchronize_rcu() in there, but it
> slows down interface teardown AIUI.
Else unregister objects from lists under mutex then call_rcu() to
release them.
Then, take you patch so new kernel don't remove the basechain.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-24 23:22 ` Florian Westphal
2024-10-25 8:14 ` Matthieu Baerts
@ 2024-10-25 10:35 ` Pablo Neira Ayuso
1 sibling, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-25 10:35 UTC (permalink / raw)
To: Florian Westphal
Cc: Matthieu Baerts, netfilter-devel, Jozsef Kadlecsik, Phil Sutter,
coreteam
On Fri, Oct 25, 2024 at 01:22:30AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > this comment below is also not valid anymore:
> >
> > /* called with rcu_read_lock held */
> > static struct sk_buff *
> > nf_tables_getrule_single(u32 portid, const struct nfnl_info *info,
> > const struct nlattr * const nla[], bool reset)
>
> Yes, either called with rcu read lock or commit mutex held.
>
> > This is not the only spot that can trigger rcu splats.
>
> Agree. Will you make a patch or should I take a look?
> I'm leaning towards a common helper that can pass the
> right lockdep annotation, i.e. pass nft_net as arg to
> document when RCU or transaction semantics apply.
Please, go ahead if you have the time. Thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 10:26 ` Pablo Neira Ayuso
@ 2024-10-25 10:56 ` Phil Sutter
2024-10-25 11:43 ` Pablo Neira Ayuso
0 siblings, 1 reply; 15+ messages in thread
From: Phil Sutter @ 2024-10-25 10:56 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Florian Westphal, Matthieu Baerts, netfilter-devel,
Jozsef Kadlecsik, coreteam
On Fri, Oct 25, 2024 at 12:26:47PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Oct 25, 2024 at 11:46:33AM +0200, Phil Sutter wrote:
> > On Fri, Oct 25, 2024 at 11:23:56AM +0200, Florian Westphal wrote:
> > > Matthieu Baerts <matttbe@kernel.org> wrote:
> > > > While at it, I had a question related to the rules' list: in
> > > > __nft_release_basechain() from the same nf_tables_api.c file, list's
> > > > entries are not removed with the _rcu variant → is it OK to do that
> > > > because this function is only called last at the cleanup time, when no
> > > > other readers can iterate over the list? So similar to what is done in
> > > > __nft_release_table()?
> > >
> > > Looks like __nft_release_basechain() is broken from start, I don't see
> > > how it can work, it doesn't call synchronize_rcu or anything like that
> > > afaics.
> > >
> > > No idea what to do here.
> >
> > It will vanish with my name-based netdev hooks series (the second part).
> > I could prepare a patch for nf/stable which merely kills that function -
> > dropping netdev-family chains upon removal of last interface is
> > inconsistent wrt. flowtables which remain in place.
>
> I like the idea of keeping the basechain in place. With chain update
> support, it makes sense to add a basechain then update it with the
> devices to hook in.
>
> But chain device updates are only recently supported:
>
> b9703ed44ffbfba85c103b9de01886a225e14b38
> Author: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Fri Apr 21 00:34:31 2023 +0200
>
> netfilter: nf_tables: support for adding new devices to an existing netdev chain
>
> that is, in older kernels, this chain would remain unused because
> updates are not be possible.
No idea how many nftables "service" scripts out there actually do it,
but I keep considering the hypothetical case of SAVE_RULESET_ON_STOP=1
which does 'nft list ruleset >/etc/nftables/nftables.conf'. If it runs
after the "network" service shuts down and removes all virtual
interfaces, it would drop any hooked chains from user's config.
> > Another alternative might be to call synchronize_rcu() in there, but it
> > slows down interface teardown AIUI.
>
> Else unregister objects from lists under mutex then call_rcu() to
> release them.
This requires to add an rcu_head to nft_chain or nft_base_chain, right?
Or is it possible to allocate a new object just for the purpose, like:
| struct nft_chain_rcu_call {
| struct rcu_head rcu;
| struct nft_chain *chain;
| };
And free both base chain and this temporary object in the callback?
> Then, take you patch so new kernel don't remove the basechain.
We would not change behaviour in stable this way, also not the worst
thing to do. Your call!
Thanks, Phil
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 10:56 ` Phil Sutter
@ 2024-10-25 11:43 ` Pablo Neira Ayuso
2024-10-25 13:16 ` Pablo Neira Ayuso
0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-25 11:43 UTC (permalink / raw)
To: Phil Sutter, Florian Westphal, Matthieu Baerts, netfilter-devel,
Jozsef Kadlecsik, coreteam
On Fri, Oct 25, 2024 at 12:56:03PM +0200, Phil Sutter wrote:
> On Fri, Oct 25, 2024 at 12:26:47PM +0200, Pablo Neira Ayuso wrote:
> > On Fri, Oct 25, 2024 at 11:46:33AM +0200, Phil Sutter wrote:
> > > On Fri, Oct 25, 2024 at 11:23:56AM +0200, Florian Westphal wrote:
> > > > Matthieu Baerts <matttbe@kernel.org> wrote:
> > > > > While at it, I had a question related to the rules' list: in
> > > > > __nft_release_basechain() from the same nf_tables_api.c file, list's
> > > > > entries are not removed with the _rcu variant → is it OK to do that
> > > > > because this function is only called last at the cleanup time, when no
> > > > > other readers can iterate over the list? So similar to what is done in
> > > > > __nft_release_table()?
> > > >
> > > > Looks like __nft_release_basechain() is broken from start, I don't see
> > > > how it can work, it doesn't call synchronize_rcu or anything like that
> > > > afaics.
> > > >
> > > > No idea what to do here.
> > >
> > > It will vanish with my name-based netdev hooks series (the second part).
> > > I could prepare a patch for nf/stable which merely kills that function -
> > > dropping netdev-family chains upon removal of last interface is
> > > inconsistent wrt. flowtables which remain in place.
> >
> > I like the idea of keeping the basechain in place. With chain update
> > support, it makes sense to add a basechain then update it with the
> > devices to hook in.
> >
> > But chain device updates are only recently supported:
> >
> > b9703ed44ffbfba85c103b9de01886a225e14b38
> > Author: Pablo Neira Ayuso <pablo@netfilter.org>
> > Date: Fri Apr 21 00:34:31 2023 +0200
> >
> > netfilter: nf_tables: support for adding new devices to an existing netdev chain
> >
> > that is, in older kernels, this chain would remain unused because
> > updates are not be possible.
>
> No idea how many nftables "service" scripts out there actually do it,
> but I keep considering the hypothetical case of SAVE_RULESET_ON_STOP=1
> which does 'nft list ruleset >/etc/nftables/nftables.conf'. If it runs
> after the "network" service shuts down and removes all virtual
> interfaces, it would drop any hooked chains from user's config.
I'm afraid if the chain remains in place with no device it won't be
any better once the system boots up again, because chains cannot
updated in old kernels.
I think chain updates go hand in hand with your proposed approach to
leave the basechain in place.
> > > Another alternative might be to call synchronize_rcu() in there, but it
> > > slows down interface teardown AIUI.
> >
> > Else unregister objects from lists under mutex then call_rcu() to
> > release them.
>
> This requires to add an rcu_head to nft_chain or nft_base_chain, right?
> Or is it possible to allocate a new object just for the purpose, like:
>
> | struct nft_chain_rcu_call {
> | struct rcu_head rcu;
> | struct nft_chain *chain;
> | };
>
> And free both base chain and this temporary object in the callback?
>
> > Then, take you patch so new kernel don't remove the basechain.
>
> We would not change behaviour in stable this way, also not the worst
> thing to do. Your call!
I am still exploring how ugly this fix looks, you will see get to know.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Netfilter: suspicious RCU usage in __nft_rule_lookup
2024-10-25 11:43 ` Pablo Neira Ayuso
@ 2024-10-25 13:16 ` Pablo Neira Ayuso
0 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-25 13:16 UTC (permalink / raw)
To: Phil Sutter, Florian Westphal, Matthieu Baerts, netfilter-devel,
Jozsef Kadlecsik, coreteam
[-- Attachment #1: Type: text/plain, Size: 375 bytes --]
On Fri, Oct 25, 2024 at 01:43:28PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Oct 25, 2024 at 12:56:03PM +0200, Phil Sutter wrote:
[...]
> > We would not change behaviour in stable this way, also not the worst
> > thing to do. Your call!
>
> I am still exploring how ugly this fix looks, you will see get to know.
This is a quick preview, compile-tested only at this stage.
[-- Attachment #2: candidate-fix.patch --]
[-- Type: text/x-diff, Size: 2441 bytes --]
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 91ae20cb7648..8dd8e278843d 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1120,6 +1120,7 @@ struct nft_chain {
char *name;
u16 udlen;
u8 *udata;
+ struct rcu_head rcu_head;
/* Only used during control plane commit phase: */
struct nft_rule_blob *blob_next;
@@ -1282,6 +1283,7 @@ struct nft_table {
struct list_head sets;
struct list_head objects;
struct list_head flowtables;
+ possible_net_t net;
u64 hgenerator;
u64 handle;
u32 use;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index a24fe62650a7..762879187edb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1495,6 +1495,7 @@ static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info,
INIT_LIST_HEAD(&table->sets);
INIT_LIST_HEAD(&table->objects);
INIT_LIST_HEAD(&table->flowtables);
+ write_pnet(&table->net, net);
table->family = family;
table->flags = flags;
table->handle = ++nft_net->table_handle;
@@ -11430,22 +11431,39 @@ int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
}
EXPORT_SYMBOL_GPL(nft_data_dump);
-int __nft_release_basechain(struct nft_ctx *ctx)
+static void __nft_release_basechain_rcu(struct rcu_head *head)
{
+ struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head);
struct nft_rule *rule, *nr;
+ struct nft_ctx ctx = {
+ .family = chain->table->family,
+ .net = read_pnet(&chain->table->net),
+ };
+
+ list_for_each_entry_safe(rule, nr, &chain->rules, list) {
+ list_del(&rule->list);
+ nf_tables_rule_release(&ctx, rule);
+ }
+ nf_tables_chain_destroy(chain);
+ put_net(ctx.net);
+}
+
+int __nft_release_basechain(struct nft_ctx *ctx)
+{
+ struct nft_rule *rule;
if (WARN_ON(!nft_is_base_chain(ctx->chain)))
return 0;
nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain);
- list_for_each_entry_safe(rule, nr, &ctx->chain->rules, list) {
- list_del(&rule->list);
+ list_for_each_entry(rule, &ctx->chain->rules, list)
nft_use_dec(&ctx->chain->use);
- nf_tables_rule_release(ctx, rule);
- }
+
nft_chain_del(ctx->chain);
nft_use_dec(&ctx->table->use);
- nf_tables_chain_destroy(ctx->chain);
+ get_net(ctx->net);
+
+ call_rcu(&ctx->chain->rcu_head, __nft_release_basechain_rcu);
return 0;
}
^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-10-25 13:16 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-24 16:56 Netfilter: suspicious RCU usage in __nft_rule_lookup Matthieu Baerts
2024-10-24 17:35 ` Florian Westphal
2024-10-24 18:00 ` Pablo Neira Ayuso
2024-10-24 17:57 ` Pablo Neira Ayuso
2024-10-24 18:20 ` Pablo Neira Ayuso
2024-10-24 23:22 ` Florian Westphal
2024-10-25 8:14 ` Matthieu Baerts
2024-10-25 9:23 ` Florian Westphal
2024-10-25 9:42 ` Pablo Neira Ayuso
2024-10-25 9:46 ` Phil Sutter
2024-10-25 10:26 ` Pablo Neira Ayuso
2024-10-25 10:56 ` Phil Sutter
2024-10-25 11:43 ` Pablo Neira Ayuso
2024-10-25 13:16 ` Pablo Neira Ayuso
2024-10-25 10:35 ` Pablo Neira Ayuso
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.