* nf-nat-core: allocated memory at module unload.
From: Ben Greear @ 2024-10-01 13:06 UTC
To: netdev

Hello,

I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).

Is this a known issue? Is it a serious problem?

------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat jfs nls_ucs2_utils xfs nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv6 nf_defrag_ipv4 vrf 8021q garp mrp stp llc macvlan pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr iwlmvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt ee1004 snd_hda_scodec_component intel_pmc_bxt intel_rapl_msr iTCO_vendor_support snd_hda_intel snd_intel_dspcfg coretemp intel_rapl_common snd_hda_codec intel_uncore_frequency snd_hda_core intel_uncore_frequency_common mac80211 snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi intel_tcc_cooling x86_pkg_temp_thermal snd_timer intel_powerclamp intel_wmi_thunderbolt i2c_i801 snd i2c_smbus pcspkr soundcore i2c_mux bfq cfg80211 mei_hdcp mei_pxp intel_pch_thermal intel_pmc_core intel_vsec pmt_telemetry pmt_class acpi_pad sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram raid1 dm_raid raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq i915 cec rc_core drm_buddy intel_gtt drm_display_helper ixgbe drm_kms_helper igb mdio ttm dca i2c_algo_bit agpgart hwmon drm mei_wdt xhci_pci i2c_core xhci_pci_renesas video wmi fuse [last unloaded: nf_nat]
CPU: 1 UID: 0 PID: 10421 Comm: rmmod Tainted: G W 6.11.0+ #2
Tainted: [W]=WARN
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
Code: 00 00 00 00 00 fc ff df 49 89 fa 49 c1 ea 03 41 80 3c 02 00 0f 85 28 01 00 00 48 8b 76 18 48 c7 c7 a0 da 48 84 e8 15 7d c1 fe <0f> 0b 45 31 ed e9 6b fe ff ff 41 bd 01 00 00 00 48 b8 00 00 00 00
RSP: 0018:ffff88813b91fc50 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88812b30b600 RCX: 0000000000000027
RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff88841daab988
RBP: 1ffff11027723f8f R08: 0000000000000001 R09: ffffed1083b55731
R10: ffff88841daab98b R11: 0000000000000001 R12: fffffbfff099df23
R13: 0000000000000001 R14: 00000000000000ff R15: dffffc0000000000
FS: 00007f4efd62f740(0000) GS:ffff88841da80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561ab516c000 CR3: 0000000120f94003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __warn+0xc8/0x2d0
 ? alloc_tag_module_unload+0x22b/0x3f0
 ? report_bug+0x259/0x2c0
 ? handle_bug+0x54/0xa0
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? alloc_tag_module_unload+0x22b/0x3f0
 ? idr_get_next_ul+0x189/0x230
 ? allocinfo_show+0x6d0/0x6d0
 ? rwsem_down_read_slowpath+0xb10/0xb10
 codetag_unload_module+0x19b/0x2a0
 ? codetag_load_module+0x80/0x80
 ? up_write+0x4f0/0x4f0
 ? notifier_call_chain+0x95/0x2d0
 free_module+0x51/0x3e0
 __do_sys_delete_module.constprop.0+0x39c/0x530
 ? module_flags+0x300/0x300
 ? kmem_cache_alloc_bulk_noprof+0x680/0x6a0
 ? __virt_addr_valid+0x1cb/0x390
 ? lockdep_hardirqs_on_prepare+0x275/0x3e0
 do_syscall_64+0x69/0x160
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f4efcf128cb
Code: 73 01 c3 48 8b 0d 55 55 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 25 55 0e 00 f7 d8 64 89 01 48
RSP: 002b:00007ffee03b3338 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055b749cea7b0 RCX: 00007f4efcf128cb
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b749cea818
RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
R10: 00007f4efcf9dac0 R11: 0000000000000206 R12: 00007ffee03b3590
R13: 00007ffee03b4963 R14: 000055b749cea2a0 R15: 00007ffee03b3598
 </TASK>
irq event stamp: 7895
hardirqs last enabled at (7907): [<ffffffff8143a209>] __up_console_sem+0x59/0x60
hardirqs last disabled at (7918): [<ffffffff8143a1ee>] __up_console_sem+0x3e/0x60
softirqs last enabled at (7800): [<ffffffff8129bd51>] __irq_exit_rcu+0x91/0xc0
softirqs last disabled at (7795): [<ffffffff8129bd51>] __irq_exit_rcu+0x91/0xc0
---[ end trace 0000000000000000 ]---

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

* Re: nf-nat-core: allocated memory at module unload.
From: Florian Westphal @ 2024-10-01 19:36 UTC
To: Ben Greear; +Cc: netdev, kent.overstreet, surenb, pablo

Ben Greear <greearb@candelatech.com> wrote:

[ CCing codetag folks ]

> Hello,
>
> I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
>
> Is this a known issue? Is it a serious problem?

Not known to me. Looks like an mm (rcu)+codetag problem.

> ------------[ cut here ]------------
> net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> codetag_unload_module+0x19b/0x2a0
> ? codetag_load_module+0x80/0x80
> ? up_write+0x4f0/0x4f0

"Well, yes, but actually no."

At this time, kfree_rcu() has been called on all 4 objects.

Looks like kfree_rcu no longer cares even about rcu_barrier(), and
there is no kvfree_rcu_barrier() in 6.11.

The warning goes away when I replace kfree_rcu with call_rcu+kfree
plus rcu_barrier in module exit path.

But I don't think it's the right thing to do.

(referring to nf_nat_unregister_fn(), kfree_rcu(priv, rcu_head);).

Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.

Without rmmod, the 4 allocations go away after a few seconds,
grep will no longer list them and then rmmod won't splat.

* Re: nf-nat-core: allocated memory at module unload.
From: Suren Baghdasaryan @ 2024-10-04 23:13 UTC
To: Florian Westphal; +Cc: Ben Greear, netdev, kent.overstreet, pablo

On Tue, Oct 1, 2024 at 12:36 PM Florian Westphal <fw@strlen.de> wrote:
>
> Ben Greear <greearb@candelatech.com> wrote:
>
> [ CCing codetag folks ]

Thanks! I've been on vacation and just saw this report.

>
> > Hello,
> >
> > I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
> >
> > Is this a known issue? Is it a serious problem?
>
> Not known to me. Looks like an mm (rcu)+codetag problem.
>
> > ------------[ cut here ]------------
> > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> ...
> > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > codetag_unload_module+0x19b/0x2a0
> > ? codetag_load_module+0x80/0x80
> > ? up_write+0x4f0/0x4f0
>
> "Well, yes, but actually no."
>
> At this time, kfree_rcu() has been called on all 4 objects.
>
> Looks like kfree_rcu no longer cares even about rcu_barrier(), and
> there is no kvfree_rcu_barrier() in 6.11.
>
> The warning goes away when I replace kfree_rcu with call_rcu+kfree
> plus rcu_barrier in module exit path.
>
> But I don't think it's the right thing to do.
>
> (referring to nf_nat_unregister_fn(), kfree_rcu(priv, rcu_head);).
>
> Reproducer:
> unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> grep nf_nat /proc/allocinfo # will list 4 allocations
> rmmod nft_chain_nat
> rmmod nf_nat # will WARN.
>
> Without rmmod, the 4 allocations go away after a few seconds,
> grep will no longer list them and then rmmod won't splat.

I see. So, the kfree_rcu() was already called but freeing did not
happen yet, in the meantime we are unloading the module. We could add
a synchronize_rcu() at the beginning of codetag_unload_module() so
that all pending kfree_rcu()s complete before we check codetag
counters:

bool codetag_unload_module(struct module *mod)
{
        struct codetag_type *cttype;
        bool unload_ok = true;

        if (!mod)
                return true;

+       synchronize_rcu();
        mutex_lock(&codetag_lock);

Could you please try the above one-line change and see if that fixes the issue?
BTW, I'm working on some optimizations and once
https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com
gets accepted this issue will be eliminated altogether.
Thanks,
Suren.

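The window described in this message (kfree_rcu() already called, memory not yet freed) can be illustrated with a small userspace model. This is only a toy sketch, not kernel code: `tag_model`, `tag_alloc`, `deferred_free`, `drain_deferred` and `unload_check` are hypothetical names, and the 64-byte object size is merely inferred from the report (256 bytes over 4 allocations).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Toy stand-in for one allocation tag (one call site). */
struct tag_model {
	size_t bytes;                /* bytes currently charged to the tag */
	size_t pending[MAX_PENDING]; /* sizes of objects queued for deferred free */
	size_t npending;
};

/* Charge an allocation to the tag. */
static void tag_alloc(struct tag_model *t, size_t size)
{
	t->bytes += size;
}

/* Model of kfree_rcu(): the object is only queued, the tag stays charged. */
static void deferred_free(struct tag_model *t, size_t size)
{
	assert(t->npending < MAX_PENDING);
	t->pending[t->npending++] = size;
}

/* Model of the deferred work finally running: uncharge every queued object. */
static void drain_deferred(struct tag_model *t)
{
	while (t->npending > 0)
		t->bytes -= t->pending[--t->npending];
}

/* Model of the unload-time check: bytes that would be reported as leaked. */
static size_t unload_check(const struct tag_model *t)
{
	return t->bytes;
}
```

With four 64-byte objects allocated and then passed to `deferred_free()`, `unload_check()` still reports 256 until `drain_deferred()` has run, which matches the observation that the warning only appears when rmmod wins the race against the deferred free.
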
* Re: nf-nat-core: allocated memory at module unload.
From: Florian Westphal @ 2024-10-07 11:29 UTC
To: Suren Baghdasaryan
Cc: Florian Westphal, Ben Greear, netdev, kent.overstreet, pablo

Suren Baghdasaryan <surenb@google.com> wrote:
> On Tue, Oct 1, 2024 at 12:36 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > Ben Greear <greearb@candelatech.com> wrote:
> >
> > [ CCing codetag folks ]
>
> Thanks! I've been on vacation and just saw this report.
>
> >
> > > Hello,
> > >
> > > I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
> > >
> > > Is this a known issue? Is it a serious problem?
> >
> > Not known to me. Looks like an mm (rcu)+codetag problem.
> >
> > > ------------[ cut here ]------------
> > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > ...
> > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > codetag_unload_module+0x19b/0x2a0
> > > ? codetag_load_module+0x80/0x80
> > > ? up_write+0x4f0/0x4f0
> >
> > "Well, yes, but actually no."
> >
> > At this time, kfree_rcu() has been called on all 4 objects.
> >
> > Looks like kfree_rcu no longer cares even about rcu_barrier(), and
> > there is no kvfree_rcu_barrier() in 6.11.
> >
> > The warning goes away when I replace kfree_rcu with call_rcu+kfree
> > plus rcu_barrier in module exit path.
> >
> > But I don't think it's the right thing to do.
> >
> > (referring to nf_nat_unregister_fn(), kfree_rcu(priv, rcu_head);).
> >
> > Reproducer:
> > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > grep nf_nat /proc/allocinfo # will list 4 allocations
> > rmmod nft_chain_nat
> > rmmod nf_nat # will WARN.
> >
> > Without rmmod, the 4 allocations go away after a few seconds,
> > grep will no longer list them and then rmmod won't splat.
>
> I see. So, the kfree_rcu() was already called but freeing did not
> happen yet, in the meantime we are unloading the module.

Yes.

> We could add
> a synchronize_rcu() at the beginning of codetag_unload_module() so
> that all pending kfree_rcu()s complete before we check codetag
> counters:
>
> bool codetag_unload_module(struct module *mod)
> {
>         struct codetag_type *cttype;
>         bool unload_ok = true;
>
>         if (!mod)
>                 return true;
>
> +       synchronize_rcu();
>         mutex_lock(&codetag_lock);

This doesn't help as kfree_rcu doesn't wait for this.

Use of kvfree_rcu_barrier() instead does work though.

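Why waiting for RCU alone does not help, while a kvfree_rcu barrier does, can be sketched with a two-stage userspace model. It rests on an assumption: that kfree_rcu() first parks pointers in a batch and only hands them to RCU later, which would be consistent with the few-seconds delay reported earlier in the thread. All names below are hypothetical, and this is a toy, not the kernel implementation.

```c
#include <assert.h>

/* Two-stage toy model of batched deferred freeing. */
struct kvfree_model {
	int batched; /* queued by the caller, not yet handed to RCU */
	int waiting; /* handed to RCU, waiting for a grace period */
	int live;    /* objects still counted as allocated */
};

/* Model of kfree_rcu(): the object only enters the batch. */
static void model_kfree_rcu(struct kvfree_model *m)
{
	assert(m->live > m->batched + m->waiting);
	m->batched++;
}

/*
 * Model of synchronize_rcu()/rcu_barrier(): only objects already
 * handed to RCU are affected; the batch is left untouched.
 */
static void model_wait_rcu(struct kvfree_model *m)
{
	m->live -= m->waiting;
	m->waiting = 0;
}

/* Model of kvfree_rcu_barrier(): flush the batch first, then wait. */
static void model_kvfree_rcu_barrier(struct kvfree_model *m)
{
	m->waiting += m->batched;
	m->batched = 0;
	model_wait_rcu(m);
}
```

In this model, four objects passed to `model_kfree_rcu()` survive a call to `model_wait_rcu()` because they never left the batch, and only `model_kvfree_rcu_barrier()` brings the live count to zero.
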
* Re: nf-nat-core: allocated memory at module unload.
From: Suren Baghdasaryan @ 2024-10-07 15:10 UTC
To: Florian Westphal; +Cc: Ben Greear, netdev, kent.overstreet, pablo

On Mon, Oct 7, 2024 at 4:29 AM Florian Westphal <fw@strlen.de> wrote:
>
> Suren Baghdasaryan <surenb@google.com> wrote:
> > On Tue, Oct 1, 2024 at 12:36 PM Florian Westphal <fw@strlen.de> wrote:
> > >
> > > Ben Greear <greearb@candelatech.com> wrote:
> > >
> > > [ CCing codetag folks ]
> >
> > Thanks! I've been on vacation and just saw this report.
> >
> > >
> > > > Hello,
> > > >
> > > > I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
> > > >
> > > > Is this a known issue? Is it a serious problem?
> > >
> > > Not known to me. Looks like an mm (rcu)+codetag problem.
> > >
> > > > ------------[ cut here ]------------
> > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > ...
> > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > codetag_unload_module+0x19b/0x2a0
> > > > ? codetag_load_module+0x80/0x80
> > > > ? up_write+0x4f0/0x4f0
> > >
> > > "Well, yes, but actually no."
> > >
> > > At this time, kfree_rcu() has been called on all 4 objects.
> > >
> > > Looks like kfree_rcu no longer cares even about rcu_barrier(), and
> > > there is no kvfree_rcu_barrier() in 6.11.
> > >
> > > The warning goes away when I replace kfree_rcu with call_rcu+kfree
> > > plus rcu_barrier in module exit path.
> > >
> > > But I don't think it's the right thing to do.
> > >
> > > (referring to nf_nat_unregister_fn(), kfree_rcu(priv, rcu_head);).
> > >
> > > Reproducer:
> > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > rmmod nft_chain_nat
> > > rmmod nf_nat # will WARN.
> > >
> > > Without rmmod, the 4 allocations go away after a few seconds,
> > > grep will no longer list them and then rmmod won't splat.
> >
> > I see. So, the kfree_rcu() was already called but freeing did not
> > happen yet, in the meantime we are unloading the module.
>
> Yes.
>
> > We could add
> > a synchronize_rcu() at the beginning of codetag_unload_module() so
> > that all pending kfree_rcu()s complete before we check codetag
> > counters:
> >
> > bool codetag_unload_module(struct module *mod)
> > {
> >         struct codetag_type *cttype;
> >         bool unload_ok = true;
> >
> >         if (!mod)
> >                 return true;
> >
> > +       synchronize_rcu();
> >         mutex_lock(&codetag_lock);
>
> This doesn't help as kfree_rcu doesn't wait for this.
>
> Use of kvfree_rcu_barrier() instead does work though.

I see. That sounds like an acceptable fix. Please post it and I'll ack it.
Thanks!

* Re: nf-nat-core: allocated memory at module unload.
From: Ben Greear @ 2024-10-09 18:20 UTC
To: Suren Baghdasaryan, Florian Westphal; +Cc: netdev, kent.overstreet, pablo

On 10/7/24 08:10, Suren Baghdasaryan wrote:
> On Mon, Oct 7, 2024 at 4:29 AM Florian Westphal <fw@strlen.de> wrote:
>>
>> Suren Baghdasaryan <surenb@google.com> wrote:
>>> On Tue, Oct 1, 2024 at 12:36 PM Florian Westphal <fw@strlen.de> wrote:
>>>>
>>>> Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>> [ CCing codetag folks ]
>>>
>>> Thanks! I've been on vacation and just saw this report.
>>>
>>>>
>>>>> Hello,
>>>>>
>>>>> I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
>>>>>
>>>>> Is this a known issue? Is it a serious problem?
>>>>
>>>> Not known to me. Looks like an mm (rcu)+codetag problem.
>>>>
>>>>> ------------[ cut here ]------------
>>>>> net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
>>>>> WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
>>>>> Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
>>>> ...
>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>>>>> RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
>>>>> codetag_unload_module+0x19b/0x2a0
>>>>> ? codetag_load_module+0x80/0x80
>>>>> ? up_write+0x4f0/0x4f0
>>>>
>>>> "Well, yes, but actually no."
>>>>
>>>> At this time, kfree_rcu() has been called on all 4 objects.
>>>>
>>>> Looks like kfree_rcu no longer cares even about rcu_barrier(), and
>>>> there is no kvfree_rcu_barrier() in 6.11.
>>>>
>>>> The warning goes away when I replace kfree_rcu with call_rcu+kfree
>>>> plus rcu_barrier in module exit path.
>>>>
>>>> But I don't think it's the right thing to do.

Hello,

Is this approach just ugly, or plain wrong?

kvfree_rcu_barrier does not exist in the 6.10 kernel.

Thanks,
Ben

>>>>
>>>> (referring to nf_nat_unregister_fn(), kfree_rcu(priv, rcu_head);).
>>>>
>>>> Reproducer:
>>>> unshare -n iptables-nft -t nat -A PREROUTING -p tcp
>>>> grep nf_nat /proc/allocinfo # will list 4 allocations
>>>> rmmod nft_chain_nat
>>>> rmmod nf_nat # will WARN.
>>>>
>>>> Without rmmod, the 4 allocations go away after a few seconds,
>>>> grep will no longer list them and then rmmod won't splat.
>>>
>>> I see. So, the kfree_rcu() was already called but freeing did not
>>> happen yet, in the meantime we are unloading the module.
>>
>> Yes.
>>
>>> We could add
>>> a synchronize_rcu() at the beginning of codetag_unload_module() so
>>> that all pending kfree_rcu()s complete before we check codetag
>>> counters:
>>>
>>> bool codetag_unload_module(struct module *mod)
>>> {
>>>         struct codetag_type *cttype;
>>>         bool unload_ok = true;
>>>
>>>         if (!mod)
>>>                 return true;
>>>
>>> +       synchronize_rcu();
>>>         mutex_lock(&codetag_lock);
>>
>> This doesn't help as kfree_rcu doesn't wait for this.
>>
>> Use of kvfree_rcu_barrier() instead does work though.
>
> I see. That sounds like an acceptable fix. Please post it and I'll ack it.
> Thanks!
>

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

* Re: nf-nat-core: allocated memory at module unload.
From: Suren Baghdasaryan @ 2024-10-09 18:23 UTC
To: Ben Greear; +Cc: Florian Westphal, netdev, kent.overstreet, pablo

On Wed, Oct 9, 2024 at 11:20 AM Ben Greear <greearb@candelatech.com> wrote:
>
> On 10/7/24 08:10, Suren Baghdasaryan wrote:
> > On Mon, Oct 7, 2024 at 4:29 AM Florian Westphal <fw@strlen.de> wrote:
> >>
> >> Suren Baghdasaryan <surenb@google.com> wrote:
> >>> On Tue, Oct 1, 2024 at 12:36 PM Florian Westphal <fw@strlen.de> wrote:
> >>>>
> >>>> Ben Greear <greearb@candelatech.com> wrote:
> >>>>
> >>>> [ CCing codetag folks ]
> >>>
> >>> Thanks! I've been on vacation and just saw this report.
> >>>
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
> >>>>>
> >>>>> Is this a known issue? Is it a serious problem?
> >>>>
> >>>> Not known to me. Looks like an mm (rcu)+codetag problem.
> >>>>
> >>>>> ------------[ cut here ]------------
> >>>>> net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> >>>>> WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> >>>>> Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> >>>> ...
> >>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> >>>>> RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> >>>>> codetag_unload_module+0x19b/0x2a0
> >>>>> ? codetag_load_module+0x80/0x80
> >>>>> ? up_write+0x4f0/0x4f0
> >>>>
> >>>> "Well, yes, but actually no."
> >>>>
> >>>> At this time, kfree_rcu() has been called on all 4 objects.
> >>>>
> >>>> Looks like kfree_rcu no longer cares even about rcu_barrier(), and
> >>>> there is no kvfree_rcu_barrier() in 6.11.
> >>>>
> >>>> The warning goes away when I replace kfree_rcu with call_rcu+kfree
> >>>> plus rcu_barrier in module exit path.
> >>>>
> >>>> But I don't think it's the right thing to do.
>
> Hello,
>
> Is this approach just ugly, or plain wrong?

I think the approach is correct.

>
> kvfree_rcu_barrier does not exist in the 6.10 kernel.

Yeah, I'll try backporting kvfree_rcu_barrier() to 6.10 and 6.11 for
this change.

>
> Thanks,
> Ben
>
> >>>>
> >>>> (referring to nf_nat_unregister_fn(), kfree_rcu(priv, rcu_head);).
> >>>>
> >>>> Reproducer:
> >>>> unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> >>>> grep nf_nat /proc/allocinfo # will list 4 allocations
> >>>> rmmod nft_chain_nat
> >>>> rmmod nf_nat # will WARN.
> >>>>
> >>>> Without rmmod, the 4 allocations go away after a few seconds,
> >>>> grep will no longer list them and then rmmod won't splat.
> >>>
> >>> I see. So, the kfree_rcu() was already called but freeing did not
> >>> happen yet, in the meantime we are unloading the module.
> >>
> >> Yes.
> >>
> >>> We could add
> >>> a synchronize_rcu() at the beginning of codetag_unload_module() so
> >>> that all pending kfree_rcu()s complete before we check codetag
> >>> counters:
> >>>
> >>> bool codetag_unload_module(struct module *mod)
> >>> {
> >>>         struct codetag_type *cttype;
> >>>         bool unload_ok = true;
> >>>
> >>>         if (!mod)
> >>>                 return true;
> >>>
> >>> +       synchronize_rcu();
> >>>         mutex_lock(&codetag_lock);
> >>
> >> This doesn't help as kfree_rcu doesn't wait for this.
> >>
> >> Use of kvfree_rcu_barrier() instead does work though.
> >
> > I see. That sounds like an acceptable fix. Please post it and I'll ack it.
> > Thanks!
> >
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc http://www.candelatech.com
>

* Re: nf-nat-core: allocated memory at module unload.
From: Ben Greear @ 2024-10-09 18:28 UTC
To: Suren Baghdasaryan; +Cc: Florian Westphal, netdev, kent.overstreet, pablo

On 10/9/24 11:23, Suren Baghdasaryan wrote:
> On Wed, Oct 9, 2024 at 11:20 AM Ben Greear <greearb@candelatech.com> wrote:
>>
>> On 10/7/24 08:10, Suren Baghdasaryan wrote:
>>> On Mon, Oct 7, 2024 at 4:29 AM Florian Westphal <fw@strlen.de> wrote:
>>>>
>>>> Suren Baghdasaryan <surenb@google.com> wrote:
>>>>> On Tue, Oct 1, 2024 at 12:36 PM Florian Westphal <fw@strlen.de> wrote:
>>>>>>
>>>>>> Ben Greear <greearb@candelatech.com> wrote:
>>>>>>
>>>>>> [ CCing codetag folks ]
>>>>>
>>>>> Thanks! I've been on vacation and just saw this report.
>>>>>
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I see this splat in 6.11.0 (plus a single patch to fix vrf xmit deadlock).
>>>>>>>
>>>>>>> Is this a known issue? Is it a serious problem?
>>>>>>
>>>>>> Not known to me. Looks like an mm (rcu)+codetag problem.
>>>>>>
>>>>>>> ------------[ cut here ]------------
>>>>>>> net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
>>>>>>> WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
>>>>>>> Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
>>>>>> ...
>>>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>>>>>>> RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
>>>>>>> codetag_unload_module+0x19b/0x2a0
>>>>>>> ? codetag_load_module+0x80/0x80
>>>>>>> ? up_write+0x4f0/0x4f0
>>>>>>
>>>>>> "Well, yes, but actually no."
>>>>>>
>>>>>> At this time, kfree_rcu() has been called on all 4 objects.
>>>>>>
>>>>>> Looks like kfree_rcu no longer cares even about rcu_barrier(), and
>>>>>> there is no kvfree_rcu_barrier() in 6.11.
>>>>>>
>>>>>> The warning goes away when I replace kfree_rcu with call_rcu+kfree
>>>>>> plus rcu_barrier in module exit path.
>>>>>>
>>>>>> But I don't think it's the right thing to do.
>>
>> Hello,
>>
>> Is this approach just ugly, or plain wrong?
>
> I think the approach is correct.
>
>>
>> kvfree_rcu_barrier does not exist in the 6.10 kernel.
>
> Yeah, I'll try backporting kvfree_rcu_barrier() to 6.10 and 6.11 for
> this change.

Ok, I will be happy to help test. Please respond on this thread if you
post something, pointing to whatever patch(es) should be tested.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com