* [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
@ 2024-10-07 20:52 Florian Westphal
2024-10-08 1:15 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2024-10-07 20:52 UTC (permalink / raw)
To: linux-kernel
Cc: akpm, Florian Westphal, Uladzislau Rezki, Vlastimil Babka,
Suren Baghdasaryan, Kent Overstreet, Ben Greear
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.
Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
lib/codetag.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/codetag.c b/lib/codetag.c
index afa8a2d4f317..a3472d428821 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
if (!mod)
return true;
+ kvfree_rcu_barrier();
+
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
struct codetag_module *found = NULL;
--
2.45.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-07 20:52 [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls Florian Westphal @ 2024-10-08 1:15 ` Andrew Morton 2024-10-08 1:49 ` Suren Baghdasaryan 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2024-10-08 1:15 UTC (permalink / raw) To: Florian Westphal Cc: linux-kernel, Uladzislau Rezki, Vlastimil Babka, Suren Baghdasaryan, Kent Overstreet, Ben Greear On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > Ben Greear reports following splat: > ------------[ cut here ]------------ > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > ... > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > codetag_unload_module+0x19b/0x2a0 > ? codetag_load_module+0x80/0x80 > > nf_nat module exit calls kfree_rcu on those addresses, but the free > operation is likely still pending by the time alloc_tag checks for leaks. > > Wait for outstanding kfree_rcu operations to complete before checking > resolves this warning. > > Reproducer: > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > grep nf_nat /proc/allocinfo # will list 4 allocations > rmmod nft_chain_nat > rmmod nf_nat # will WARN. > > ... > > --- a/lib/codetag.c > +++ b/lib/codetag.c > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > if (!mod) > return true; > > + kvfree_rcu_barrier(); > + > mutex_lock(&codetag_lock); > list_for_each_entry(cttype, &codetag_types, link) { > struct codetag_module *found = NULL; It's always hard to determine why a thing like this is present, so a comment is helpful: --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix +++ a/lib/codetag.c @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module if (!mod) return true; + /* await any module's kfree_rcu() operations to complete */ kvfree_rcu_barrier(); mutex_lock(&codetag_lock); _ But I do wonder whether this is in the correct place. Waiting for a module's ->exit() function's kfree_rcu()s to complete should properly be done by the core module handling code? free_module() does a full-on synchronize_rcu() prior to freeing the module memory itself and I think codetag_unload_module() could be called after that? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 1:15 ` Andrew Morton @ 2024-10-08 1:49 ` Suren Baghdasaryan 2024-10-08 7:14 ` Uladzislau Rezki 2024-10-08 8:14 ` Vlastimil Babka 0 siblings, 2 replies; 12+ messages in thread From: Suren Baghdasaryan @ 2024-10-08 1:49 UTC (permalink / raw) To: Andrew Morton Cc: Florian Westphal, linux-kernel, Uladzislau Rezki, Vlastimil Babka, Kent Overstreet, Ben Greear On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > Ben Greear reports following splat: > > ------------[ cut here ]------------ > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > ... > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > codetag_unload_module+0x19b/0x2a0 > > ? codetag_load_module+0x80/0x80 > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > Wait for outstanding kfree_rcu operations to complete before checking > > resolves this warning. > > > > Reproducer: > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > grep nf_nat /proc/allocinfo # will list 4 allocations > > rmmod nft_chain_nat > > rmmod nf_nat # will WARN. > > > > ... > > > > --- a/lib/codetag.c > > +++ b/lib/codetag.c > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > if (!mod) > > return true; > > > > + kvfree_rcu_barrier(); > > + > > mutex_lock(&codetag_lock); > > list_for_each_entry(cttype, &codetag_types, link) { > > struct codetag_module *found = NULL; > > It's always hard to determine why a thing like this is present, so a > comment is helpful: > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > +++ a/lib/codetag.c > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > if (!mod) > return true; > > + /* await any module's kfree_rcu() operations to complete */ > kvfree_rcu_barrier(); > > mutex_lock(&codetag_lock); > _ > > But I do wonder whether this is in the correct place. > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > should properly be done by the core module handling code? I don't think core module code cares about kfree_rcu()s being complete before the module is unloaded. Allocation tagging OTOH cares because it is about to destroy tags which will be accessed when kfree() actually happens, therefore a strict ordering is important. > > free_module() does a full-on synchronize_rcu() prior to freeing the > module memory itself and I think codetag_unload_module() could be > called after that? I think we could move codetag_unload_module() after synchronize_rcu() inside free_module() but according to the reply in https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ synchronize_rcu() does not help. I'm not quite sure why... Note that once I'm done upstreaming https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, this change will not be needed and I'm planning to remove this call, however this change is useful for backporting. It should be sent to stable@vger.kernel.org # v6.10+ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 1:49 ` Suren Baghdasaryan @ 2024-10-08 7:14 ` Uladzislau Rezki 2024-10-08 11:16 ` Suren Baghdasaryan 2024-10-08 8:14 ` Vlastimil Babka 1 sibling, 1 reply; 12+ messages in thread From: Uladzislau Rezki @ 2024-10-08 7:14 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Andrew Morton, Florian Westphal, linux-kernel, Uladzislau Rezki, Vlastimil Babka, Kent Overstreet, Ben Greear On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > Ben Greear reports following splat: > > > ------------[ cut here ]------------ > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > ... > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > codetag_unload_module+0x19b/0x2a0 > > > ? codetag_load_module+0x80/0x80 > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > resolves this warning. > > > > > > Reproducer: > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > rmmod nft_chain_nat > > > rmmod nf_nat # will WARN. > > > > > > ... > > > > > > --- a/lib/codetag.c > > > +++ b/lib/codetag.c > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > if (!mod) > > > return true; > > > > > > + kvfree_rcu_barrier(); > > > + > > > mutex_lock(&codetag_lock); > > > list_for_each_entry(cttype, &codetag_types, link) { > > > struct codetag_module *found = NULL; > > > > It's always hard to determine why a thing like this is present, so a > > comment is helpful: > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > +++ a/lib/codetag.c > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > if (!mod) > > return true; > > > > + /* await any module's kfree_rcu() operations to complete */ > > kvfree_rcu_barrier(); > > > > mutex_lock(&codetag_lock); > > _ > > > > But I do wonder whether this is in the correct place. > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > should properly be done by the core module handling code? > > I don't think core module code cares about kfree_rcu()s being complete > before the module is unloaded. > Allocation tagging OTOH cares because it is about to destroy tags > which will be accessed when kfree() actually happens, therefore a > strict ordering is important. > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > module memory itself and I think codetag_unload_module() could be > > called after that? > > I think we could move codetag_unload_module() after synchronize_rcu() > inside free_module() but according to the reply in > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > synchronize_rcu() does not help. I'm not quite sure why... > It is because, synchronize_rcu() is used for a bit different things, i.e. it is about a GP completion. Offloading objects can span several GPs. > Note that once I'm done upstreaming > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > this change will not be needed and I'm planning to remove this call, > however this change is useful for backporting. It should be sent to > stable@vger.kernel.org # v6.10+ > The kvfree_rcu_barrier() has been added into v6.12: <snip> urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 next-20240912 next-20240919 next-20240920 next-20241002 v6.12-rc1 urezki@pc638:~/data/raid0/coding/linux.git$ <snip> For 6.10+, it implies that the mentioned commit should be backported also. -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 7:14 ` Uladzislau Rezki @ 2024-10-08 11:16 ` Suren Baghdasaryan 2024-10-08 13:07 ` Uladzislau Rezki 0 siblings, 1 reply; 12+ messages in thread From: Suren Baghdasaryan @ 2024-10-08 11:16 UTC (permalink / raw) To: Uladzislau Rezki Cc: Andrew Morton, Florian Westphal, linux-kernel, Vlastimil Babka, Kent Overstreet, Ben Greear On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > Ben Greear reports following splat: > > > > ------------[ cut here ]------------ > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > ... > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > codetag_unload_module+0x19b/0x2a0 > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > resolves this warning. > > > > > > > > Reproducer: > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > rmmod nft_chain_nat > > > > rmmod nf_nat # will WARN. > > > > > > > > ... > > > > > > > > --- a/lib/codetag.c > > > > +++ b/lib/codetag.c > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > if (!mod) > > > > return true; > > > > > > > > + kvfree_rcu_barrier(); > > > > + > > > > mutex_lock(&codetag_lock); > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > struct codetag_module *found = NULL; > > > > > > It's always hard to determine why a thing like this is present, so a > > > comment is helpful: > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > +++ a/lib/codetag.c > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > if (!mod) > > > return true; > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > kvfree_rcu_barrier(); > > > > > > mutex_lock(&codetag_lock); > > > _ > > > > > > But I do wonder whether this is in the correct place. > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > should properly be done by the core module handling code? > > > > I don't think core module code cares about kfree_rcu()s being complete > > before the module is unloaded. > > Allocation tagging OTOH cares because it is about to destroy tags > > which will be accessed when kfree() actually happens, therefore a > > strict ordering is important. > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > module memory itself and I think codetag_unload_module() could be > > > called after that? > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > inside free_module() but according to the reply in > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > synchronize_rcu() does not help. I'm not quite sure why... > > > It is because, synchronize_rcu() is used for a bit different things, > i.e. it is about a GP completion. Offloading objects can span several > GPs. Ah, thanks! Now that I looked into the patch that recently added kvfree_rcu_barrier() I understand that a bit better. > > > Note that once I'm done upstreaming > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > this change will not be needed and I'm planning to remove this call, > > however this change is useful for backporting. It should be sent to > > stable@vger.kernel.org # v6.10+ > > > The kvfree_rcu_barrier() has been added into v6.12: > > <snip> > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > next-20240912 > next-20240919 > next-20240920 > next-20241002 > v6.12-rc1 > urezki@pc638:~/data/raid0/coding/linux.git$ > <snip> > > For 6.10+, it implies that the mentioned commit should be backported also. I see. I guess for pre-6.12 we would use rcu_barrier() instead of kvfree_rcu_barrier()? > > -- > Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 11:16 ` Suren Baghdasaryan @ 2024-10-08 13:07 ` Uladzislau Rezki 2024-10-08 18:34 ` Suren Baghdasaryan 0 siblings, 1 reply; 12+ messages in thread From: Uladzislau Rezki @ 2024-10-08 13:07 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Uladzislau Rezki, Andrew Morton, Florian Westphal, linux-kernel, Vlastimil Babka, Kent Overstreet, Ben Greear On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > Ben Greear reports following splat: > > > > > ------------[ cut here ]------------ > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > ... > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > resolves this warning. > > > > > > > > > > Reproducer: > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > rmmod nft_chain_nat > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > ... > > > > > > > > > > --- a/lib/codetag.c > > > > > +++ b/lib/codetag.c > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > if (!mod) > > > > > return true; > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > + > > > > > mutex_lock(&codetag_lock); > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > struct codetag_module *found = NULL; > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > comment is helpful: > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > +++ a/lib/codetag.c > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > if (!mod) > > > > return true; > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > kvfree_rcu_barrier(); > > > > > > > > mutex_lock(&codetag_lock); > > > > _ > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > should properly be done by the core module handling code? > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > before the module is unloaded. > > > Allocation tagging OTOH cares because it is about to destroy tags > > > which will be accessed when kfree() actually happens, therefore a > > > strict ordering is important. > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > module memory itself and I think codetag_unload_module() could be > > > > called after that? > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > inside free_module() but according to the reply in > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > It is because, synchronize_rcu() is used for a bit different things, > > i.e. it is about a GP completion. Offloading objects can span several > > GPs. > > Ah, thanks! Now that I looked into the patch that recently added > kvfree_rcu_barrier() I understand that a bit better. > > > > > > Note that once I'm done upstreaming > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > this change will not be needed and I'm planning to remove this call, > > > however this change is useful for backporting. It should be sent to > > > stable@vger.kernel.org # v6.10+ > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > <snip> > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > next-20240912 > > next-20240919 > > next-20240920 > > next-20241002 > > v6.12-rc1 > > urezki@pc638:~/data/raid0/coding/linux.git$ > > <snip> > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > kvfree_rcu_barrier()? > For kernels < 6.12, unfortunately not :) If you have a possibility to switch to reclaim over call_rcu() API, then you can go with rcu_barrier() which waits for all callbacks to be completed. Or backport kvfree_rcu_barrier(). It __should__ be easy since there were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 13:07 ` Uladzislau Rezki @ 2024-10-08 18:34 ` Suren Baghdasaryan 2024-10-09 8:36 ` Uladzislau Rezki 0 siblings, 1 reply; 12+ messages in thread From: Suren Baghdasaryan @ 2024-10-08 18:34 UTC (permalink / raw) To: Uladzislau Rezki Cc: Andrew Morton, Florian Westphal, linux-kernel, Vlastimil Babka, Kent Overstreet, Ben Greear On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > Ben Greear reports following splat: > > > > > > ------------[ cut here ]------------ > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > ... > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > resolves this warning. > > > > > > > > > > > > Reproducer: > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > rmmod nft_chain_nat > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > ... > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > +++ b/lib/codetag.c > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > if (!mod) > > > > > > return true; > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > + > > > > > > mutex_lock(&codetag_lock); > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > comment is helpful: > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > +++ a/lib/codetag.c > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > if (!mod) > > > > > return true; > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > kvfree_rcu_barrier(); > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > _ > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > should properly be done by the core module handling code? > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > before the module is unloaded. > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > which will be accessed when kfree() actually happens, therefore a > > > > strict ordering is important. > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > module memory itself and I think codetag_unload_module() could be > > > > > called after that? > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > inside free_module() but according to the reply in > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > i.e. it is about a GP completion. Offloading objects can span several > > > GPs. > > > > Ah, thanks! Now that I looked into the patch that recently added > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > Note that once I'm done upstreaming > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > this change will not be needed and I'm planning to remove this call, > > > > however this change is useful for backporting. It should be sent to > > > > stable@vger.kernel.org # v6.10+ > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > <snip> > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > next-20240912 > > > next-20240919 > > > next-20240920 > > > next-20241002 > > > v6.12-rc1 > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > <snip> > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > kvfree_rcu_barrier()? > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > which waits for all callbacks to be completed. We do not control the call-site inside the module, so this would not be possible. > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. That sounds better. I'll take a stab at it. Thanks! > > -- > Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 18:34 ` Suren Baghdasaryan @ 2024-10-09 8:36 ` Uladzislau Rezki 2024-10-15 16:22 ` Suren Baghdasaryan 0 siblings, 1 reply; 12+ messages in thread From: Uladzislau Rezki @ 2024-10-09 8:36 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Uladzislau Rezki, Andrew Morton, Florian Westphal, linux-kernel, Vlastimil Babka, Kent Overstreet, Ben Greear On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > > > Ben Greear reports following splat: > > > > > > > ------------[ cut here ]------------ > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > > ... > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > > resolves this warning. > > > > > > > > > > > > > > Reproducer: > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > > rmmod nft_chain_nat > > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > > +++ b/lib/codetag.c > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > > if (!mod) > > > > > > > return true; > > > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > > + > > > > > > > mutex_lock(&codetag_lock); > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > > comment is helpful: > > > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > > +++ a/lib/codetag.c > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > > if (!mod) > > > > > > return true; > > > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > > kvfree_rcu_barrier(); > > > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > _ > > > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > > should properly be done by the core module handling code? > > > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > > before the module is unloaded. > > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > > which will be accessed when kfree() actually happens, therefore a > > > > > strict ordering is important. > > > > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > > module memory itself and I think codetag_unload_module() could be > > > > > > called after that? > > > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > > inside free_module() but according to the reply in > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > > i.e. it is about a GP completion. Offloading objects can span several > > > > GPs. > > > > > > Ah, thanks! Now that I looked into the patch that recently added > > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > > > > Note that once I'm done upstreaming > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > > this change will not be needed and I'm planning to remove this call, > > > > > however this change is useful for backporting. It should be sent to > > > > > stable@vger.kernel.org # v6.10+ > > > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > > > <snip> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > > next-20240912 > > > > next-20240919 > > > > next-20240920 > > > > next-20241002 > > > > v6.12-rc1 > > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > > <snip> > > > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > > kvfree_rcu_barrier()? > > > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > > which waits for all callbacks to be completed. > > We do not control the call-site inside the module, so this would not > be possible. > > > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. > > That sounds better. I'll take a stab at it. Thanks! > You are welcome! -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-09 8:36 ` Uladzislau Rezki @ 2024-10-15 16:22 ` Suren Baghdasaryan 2024-10-15 17:44 ` Uladzislau Rezki 2024-10-16 9:56 ` Vlastimil Babka 0 siblings, 2 replies; 12+ messages in thread From: Suren Baghdasaryan @ 2024-10-15 16:22 UTC (permalink / raw) To: Uladzislau Rezki Cc: Andrew Morton, Florian Westphal, linux-kernel, Vlastimil Babka, Kent Overstreet, Ben Greear On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: > > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > > > > > Ben Greear reports following splat: > > > > > > > > ------------[ cut here ]------------ > > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > > > ... > > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > > > resolves this warning. > > > > > > > > > > > > > > > > Reproducer: > > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > > > rmmod nft_chain_nat > > > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > > > +++ b/lib/codetag.c > > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > > > if (!mod) > > > > > > > > return true; > > > > > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > > > + > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > > > comment is helpful: > > > > > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > > > +++ a/lib/codetag.c > > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > > > if (!mod) > > > > > > > return true; > > > > > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > > > kvfree_rcu_barrier(); > > > > > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > _ > > > > > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > > > should properly be done by the core module handling code? > > > > > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > > > before the module is unloaded. > > > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > > > which will be accessed when kfree() actually happens, therefore a > > > > > > strict ordering is important. > > > > > > > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > > > module memory itself and I think codetag_unload_module() could be > > > > > > > called after that? > > > > > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > > > inside free_module() but according to the reply in > > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > > > i.e. it is about a GP completion. Offloading objects can span several > > > > > GPs. > > > > > > > > Ah, thanks! Now that I looked into the patch that recently added > > > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > > > > > > > Note that once I'm done upstreaming > > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > > > this change will not be needed and I'm planning to remove this call, > > > > > > however this change is useful for backporting. It should be sent to > > > > > > stable@vger.kernel.org # v6.10+ > > > > > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > > > > > <snip> > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > > > next-20240912 > > > > > next-20240919 > > > > > next-20240920 > > > > > next-20241002 > > > > > v6.12-rc1 > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > > > <snip> > > > > > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > > > kvfree_rcu_barrier()? > > > > > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > > > which waits for all callbacks to be completed. > > > > We do not control the call-site inside the module, so this would not > > be possible. > > > > > > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. > > > > That sounds better. I'll take a stab at it. Thanks! I prepared backports for stable 6.10.y and 6.11.y branches which contain this patch along with the prerequisite "rcu/kvfree: Add kvfree_rcu_barrier() API" patch. I think that's the only prerequisite required. However, since this patch is not yet in Linus' tree, I'll wait for it to get there before sending backports to stable (per Greg KH's recommendation). Please let me know if you think it's more critical and should be posted earlier. Thanks, Suren. > > > You are welcome! > > -- > Uladzislau Rezki > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-15 16:22 ` Suren Baghdasaryan @ 2024-10-15 17:44 ` Uladzislau Rezki 2024-10-16 9:56 ` Vlastimil Babka 1 sibling, 0 replies; 12+ messages in thread From: Uladzislau Rezki @ 2024-10-15 17:44 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Uladzislau Rezki, Andrew Morton, Florian Westphal, linux-kernel, Vlastimil Babka, Kent Overstreet, Ben Greear On Tue, Oct 15, 2024 at 09:22:55AM -0700, Suren Baghdasaryan wrote: > On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: > > > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > > > > > > > Ben Greear reports following splat: > > > > > > > > > ------------[ cut here ]------------ > > > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > > > > ... > > > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > > > > resolves this warning. > > > > > > > > > > > > > > > > > > Reproducer: > > > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > > > > rmmod nft_chain_nat > > > > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > > > > +++ b/lib/codetag.c > > > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > > > > if (!mod) > > > > > > > > > return true; > > > > > > > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > > > > + > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > > > > comment is helpful: > > > > > > > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > > > > +++ a/lib/codetag.c > > > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > > > > if (!mod) > > > > > > > > return true; > > > > > > > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > > > > kvfree_rcu_barrier(); > > > > > > > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > > _ > > > > > > > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > > > > should properly be done by the core module handling code? > > > > > > > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > > > > before the module is unloaded. > > > > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > > > > which will be accessed when kfree() actually happens, therefore a > > > > > > > strict ordering is important. > > > > > > > > > > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > > > > module memory itself and I think codetag_unload_module() could be > > > > > > > > called after that? > > > > > > > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > > > > inside free_module() but according to the reply in > > > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > > > > i.e. it is about a GP completion. Offloading objects can span several > > > > > > GPs. > > > > > > > > > > Ah, thanks! Now that I looked into the patch that recently added > > > > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > > > > > > > > > > Note that once I'm done upstreaming > > > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > > > > this change will not be needed and I'm planning to remove this call, > > > > > > > however this change is useful for backporting. It should be sent to > > > > > > > stable@vger.kernel.org # v6.10+ > > > > > > > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > > > > > > > <snip> > > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > > > > next-20240912 > > > > > > next-20240919 > > > > > > next-20240920 > > > > > > next-20241002 > > > > > > v6.12-rc1 > > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > > > > <snip> > > > > > > > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > > > > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > > > > kvfree_rcu_barrier()? > > > > > > > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > > > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > > > > which waits for all callbacks to be completed. > > > > > > We do not control the call-site inside the module, so this would not > > > be possible. > > > > > > > > > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > > > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. > > > > > > That sounds better. I'll take a stab at it. Thanks! > > I prepared backports for stable 6.10.y and 6.11.y branches which > contain this patch along with the prerequisite "rcu/kvfree: Add > kvfree_rcu_barrier() API" patch. I think that's the only prerequisite > required. However, since this patch is not yet in Linus' tree, I'll > wait for it to get there before sending backports to stable (per Greg > KH's recommendation). Please let me know if you think it's more > critical and should be posted earlier. > To me it sounds reasonable. -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-15 16:22 ` Suren Baghdasaryan 2024-10-15 17:44 ` Uladzislau Rezki @ 2024-10-16 9:56 ` Vlastimil Babka 1 sibling, 0 replies; 12+ messages in thread From: Vlastimil Babka @ 2024-10-16 9:56 UTC (permalink / raw) To: Suren Baghdasaryan, Uladzislau Rezki Cc: Andrew Morton, Florian Westphal, linux-kernel, Kent Overstreet, Ben Greear On 10/15/24 18:22, Suren Baghdasaryan wrote: > On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote: >> >> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: >> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: >> > > >> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: >> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: >> > > > > >> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: >> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> > > > > > > >> > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: >> > > > > > > >> > > > > > > > Ben Greear reports following splat: >> > > > > > > > ------------[ cut here ]------------ >> > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload >> > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 >> > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat >> > > > > > > > ... >> > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 >> > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 >> > > > > > > > codetag_unload_module+0x19b/0x2a0 >> > > > > > > > ? codetag_load_module+0x80/0x80 >> > > > > > > > >> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free >> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. >> > > > > > > > >> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking >> > > > > > > > resolves this warning. >> > > > > > > > >> > > > > > > > Reproducer: >> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp >> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations >> > > > > > > > rmmod nft_chain_nat >> > > > > > > > rmmod nf_nat # will WARN. >> > > > > > > > >> > > > > > > > ... >> > > > > > > > >> > > > > > > > --- a/lib/codetag.c >> > > > > > > > +++ b/lib/codetag.c >> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) >> > > > > > > > if (!mod) >> > > > > > > > return true; >> > > > > > > > >> > > > > > > > + kvfree_rcu_barrier(); >> > > > > > > > + >> > > > > > > > mutex_lock(&codetag_lock); >> > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { >> > > > > > > > struct codetag_module *found = NULL; >> > > > > > > >> > > > > > > It's always hard to determine why a thing like this is present, so a >> > > > > > > comment is helpful: >> > > > > > > >> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix >> > > > > > > +++ a/lib/codetag.c >> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module >> > > > > > > if (!mod) >> > > > > > > return true; >> > > > > > > >> > > > > > > + /* await any module's kfree_rcu() operations to complete */ >> > > > > > > kvfree_rcu_barrier(); >> > > > > > > >> > > > > > > mutex_lock(&codetag_lock); >> > > > > > > _ >> > > > > > > >> > > > > > > But I do wonder whether this is in the correct place. >> > > > > > > >> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete >> > > > > > > should properly be done by the core module handling code? >> > > > > > >> > > > > > I don't think core module code cares about kfree_rcu()s being complete >> > > > > > before the module is unloaded. >> > > > > > Allocation tagging OTOH cares because it is about to destroy tags >> > > > > > which will be accessed when kfree() actually happens, therefore a >> > > > > > strict ordering is important. >> > > > > > >> > > > > > > >> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the >> > > > > > > module memory itself and I think codetag_unload_module() could be >> > > > > > > called after that? >> > > > > > >> > > > > > I think we could move codetag_unload_module() after synchronize_rcu() >> > > > > > inside free_module() but according to the reply in >> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ >> > > > > > synchronize_rcu() does not help. I'm not quite sure why... >> > > > > > >> > > > > It is because, synchronize_rcu() is used for a bit different things, >> > > > > i.e. it is about a GP completion. Offloading objects can span several >> > > > > GPs. >> > > > >> > > > Ah, thanks! Now that I looked into the patch that recently added >> > > > kvfree_rcu_barrier() I understand that a bit better. >> > > > >> > > > > >> > > > > > Note that once I'm done upstreaming >> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, >> > > > > > this change will not be needed and I'm planning to remove this call, >> > > > > > however this change is useful for backporting. It should be sent to >> > > > > > stable@vger.kernel.org # v6.10+ >> > > > > > >> > > > > The kvfree_rcu_barrier() has been added into v6.12: >> > > > > >> > > > > <snip> >> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 >> > > > > next-20240912 >> > > > > next-20240919 >> > > > > next-20240920 >> > > > > next-20241002 >> > > > > v6.12-rc1 >> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ >> > > > > <snip> >> > > > > >> > > > > For 6.10+, it implies that the mentioned commit should be backported also. >> > > > >> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of >> > > > kvfree_rcu_barrier()? >> > > > >> > > For kernels < 6.12, unfortunately not :) If you have a possibility to >> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() >> > > which waits for all callbacks to be completed. >> > >> > We do not control the call-site inside the module, so this would not >> > be possible. >> > >> > > >> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there >> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. >> > >> > That sounds better. I'll take a stab at it. Thanks! > > I prepared backports for stable 6.10.y and 6.11.y branches which 6.10 is already EOL thus won't be necessary. > contain this patch along with the prerequisite "rcu/kvfree: Add > kvfree_rcu_barrier() API" patch. I think that's the only prerequisite > required. However, since this patch is not yet in Linus' tree, I'll > wait for it to get there before sending backports to stable (per Greg > KH's recommendation). Please let me know if you think it's more > critical and should be posted earlier. > Thanks, > Suren. > >> > >> You are welcome! >> >> -- >> Uladzislau Rezki >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls 2024-10-08 1:49 ` Suren Baghdasaryan 2024-10-08 7:14 ` Uladzislau Rezki @ 2024-10-08 8:14 ` Vlastimil Babka 1 sibling, 0 replies; 12+ messages in thread From: Vlastimil Babka @ 2024-10-08 8:14 UTC (permalink / raw) To: Suren Baghdasaryan, Andrew Morton Cc: Florian Westphal, linux-kernel, Uladzislau Rezki, Kent Overstreet, Ben Greear On 10/8/24 03:49, Suren Baghdasaryan wrote: > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> >> On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: >> >> > Ben Greear reports following splat: >> > ------------[ cut here ]------------ >> > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload >> > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 >> > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat >> > ... >> > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 >> > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 >> > codetag_unload_module+0x19b/0x2a0 >> > ? codetag_load_module+0x80/0x80 >> > >> > nf_nat module exit calls kfree_rcu on those addresses, but the free >> > operation is likely still pending by the time alloc_tag checks for leaks. >> > >> > Wait for outstanding kfree_rcu operations to complete before checking >> > resolves this warning. >> > >> > Reproducer: >> > unshare -n iptables-nft -t nat -A PREROUTING -p tcp >> > grep nf_nat /proc/allocinfo # will list 4 allocations >> > rmmod nft_chain_nat >> > rmmod nf_nat # will WARN. >> > >> > ... >> > >> > --- a/lib/codetag.c >> > +++ b/lib/codetag.c >> > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) >> > if (!mod) >> > return true; >> > >> > + kvfree_rcu_barrier(); >> > + >> > mutex_lock(&codetag_lock); >> > list_for_each_entry(cttype, &codetag_types, link) { >> > struct codetag_module *found = NULL; >> >> It's always hard to determine why a thing like this is present, so a >> comment is helpful: >> >> --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix >> +++ a/lib/codetag.c >> @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module >> if (!mod) >> return true; >> >> + /* await any module's kfree_rcu() operations to complete */ >> kvfree_rcu_barrier(); >> >> mutex_lock(&codetag_lock); >> _ >> >> But I do wonder whether this is in the correct place. >> >> Waiting for a module's ->exit() function's kfree_rcu()s to complete >> should properly be done by the core module handling code? > > I don't think core module code cares about kfree_rcu()s being complete > before the module is unloaded. Right, module unload should care about pending call_rcu() involving module code and/or module-created kmem_caches that are to be destroyed, but I think it's up to individual modules anyway to do a rcu_barrier() in those cases? > Allocation tagging OTOH cares because it is about to destroy tags > which will be accessed when kfree() actually happens, therefore a > strict ordering is important. > >> >> free_module() does a full-on synchronize_rcu() prior to freeing the >> module memory itself and I think codetag_unload_module() could be >> called after that? > I think we could move codetag_unload_module() after synchronize_rcu() > inside free_module() but according to the reply in > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > synchronize_rcu() does not help. I'm not quite sure why... synchronize_rcu() is only waiting for potential rcu read sections? You might have expected rcu_barrier() to help as that's waiting for pending call_rcu(). But as Ulad said the kfree_rcu() implementation does extra batching so there's now a new barrier for that, originally intended for kmem_cache_destroy(), but indeed useful here as well. > Note that once I'm done upstreaming > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > this change will not be needed and I'm planning to remove this call, > however this change is useful for backporting. It should be sent to > stable@vger.kernel.org # v6.10+ ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-10-16 9:56 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-10-07 20:52 [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls Florian Westphal 2024-10-08 1:15 ` Andrew Morton 2024-10-08 1:49 ` Suren Baghdasaryan 2024-10-08 7:14 ` Uladzislau Rezki 2024-10-08 11:16 ` Suren Baghdasaryan 2024-10-08 13:07 ` Uladzislau Rezki 2024-10-08 18:34 ` Suren Baghdasaryan 2024-10-09 8:36 ` Uladzislau Rezki 2024-10-15 16:22 ` Suren Baghdasaryan 2024-10-15 17:44 ` Uladzislau Rezki 2024-10-16 9:56 ` Vlastimil Babka 2024-10-08 8:14 ` Vlastimil Babka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox