* [syzbot] [netfilter?] KASAN: slab-use-after-free Read in nf_tables_trans_destroy_work
@ 2024-07-01 20:19 syzbot
  2024-07-02 14:08 ` [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: syzbot @ 2024-07-01 20:19 UTC (permalink / raw)
  To: coreteam, davem, edumazet, kadlec, kuba, linux-kernel, netdev,
	netfilter-devel, pabeni, pablo, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    1c5fc27bc48a Merge tag 'nf-next-24-06-28' of git://git.ker..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=103b1da9980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5264b58fdff6e881
dashboard link: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=148791c6980000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12cecb1e980000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9672225af907/disk-1c5fc27b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0f14d163a914/vmlinux-1c5fc27b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ec6c331e6a6e/bzImage-1c5fc27b.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-use-after-free in nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
BUG: KASAN: slab-use-after-free in nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45

CPU: 1 PID: 45 Comm: kworker/1:1 Not tainted 6.10.0-rc5-syzkaller-01137-g1c5fc27bc48a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
 nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
 nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
 process_one_work kernel/workqueue.c:3248 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3329
 worker_thread+0x86d/0xd50 kernel/workqueue.c:3409
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Allocated by task 6682:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
 __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
 kasan_kmalloc include/linux/kasan.h:211 [inline]
 kmalloc_trace_noprof+0x19c/0x2c0 mm/slub.c:4154
 kmalloc_noprof include/linux/slab.h:660 [inline]
 kzalloc_noprof include/linux/slab.h:778 [inline]
 nf_tables_newtable+0x52e/0x1dc0 net/netfilter/nf_tables_api.c:1465
 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:522 [inline]
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:644 [inline]
 nfnetlink_rcv+0x1427/0x2a90 net/netfilter/nfnetlink.c:662
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x7f0/0x990 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
 ___sys_sendmsg net/socket.c:2639 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6682:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2196 [inline]
 slab_free mm/slub.c:4438 [inline]
 kfree+0x149/0x360 mm/slub.c:4559
 nf_tables_table_destroy net/netfilter/nf_tables_api.c:1672 [inline]
 __nft_release_table+0xe80/0xf40 net/netfilter/nf_tables_api.c:11517
 nft_rcv_nl_event+0x55f/0x6d0 net/netfilter/nf_tables_api.c:11576
 notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
 blocking_notifier_call_chain+0x69/0x90 kernel/notifier.c:388
 netlink_release+0x11a6/0x1b10 net/netlink/af_netlink.c:787
 __sock_release net/socket.c:659 [inline]
 sock_close+0xbc/0x240 net/socket.c:1421
 __fput+0x406/0x8b0 fs/file_table.c:422
 task_work_run+0x24f/0x310 kernel/task_work.c:180
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xa27/0x27e0 kernel/exit.c:874
 do_group_exit+0x207/0x2c0 kernel/exit.c:1023
 __do_sys_exit_group kernel/exit.c:1034 [inline]
 __se_sys_exit_group kernel/exit.c:1032 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1032
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Last potentially related work creation:
 kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
 __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
 insert_work+0x3e/0x330 kernel/workqueue.c:2208
 __queue_work+0xc16/0xee0 kernel/workqueue.c:2360
 queue_work_on+0x1c2/0x380 kernel/workqueue.c:2411
 queue_work include/linux/workqueue.h:621 [inline]
 schedule_work include/linux/workqueue.h:682 [inline]
 __rhashtable_remove_fast_one include/linux/rhashtable.h:1069 [inline]
 __rhashtable_remove_fast include/linux/rhashtable.h:1093 [inline]
 rhltable_remove+0x1097/0x1160 include/linux/rhashtable.h:1144
 nft_chain_del net/netfilter/nf_tables_api.c:9781 [inline]
 __nft_release_table+0xc57/0xf40 net/netfilter/nf_tables_api.c:11513
 nft_rcv_nl_event+0x55f/0x6d0 net/netfilter/nf_tables_api.c:11576
 notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
 blocking_notifier_call_chain+0x69/0x90 kernel/notifier.c:388
 netlink_release+0x11a6/0x1b10 net/netlink/af_netlink.c:787
 __sock_release net/socket.c:659 [inline]
 sock_close+0xbc/0x240 net/socket.c:1421
 __fput+0x406/0x8b0 fs/file_table.c:422
 task_work_run+0x24f/0x310 kernel/task_work.c:180
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xa27/0x27e0 kernel/exit.c:874
 do_group_exit+0x207/0x2c0 kernel/exit.c:1023
 __do_sys_exit_group kernel/exit.c:1034 [inline]
 __se_sys_exit_group kernel/exit.c:1032 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1032
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88802b005000
 which belongs to the cache kmalloc-cg-512 of size 512
The buggy address is located 452 bytes inside of
 freed 512-byte region [ffff88802b005000, ffff88802b005200)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2b004
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffefff(slab)
raw: 00fff00000000040 ffff88801504f140 ffffea0000b63700 0000000000000002
raw: 0000000000000000 0000000080100010 00000001ffffefff 0000000000000000
head: 00fff00000000040 ffff88801504f140 ffffea0000b63700 0000000000000002
head: 0000000000000000 0000000080100010 00000001ffffefff 0000000000000000
head: 00fff00000000002 ffffea0000ac0101 ffffffffffffffff 0000000000000000
head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4764, tgid 4764 (dhcpcd), ts 34584428687, free_ts 34557820760
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1473
 prep_new_page mm/page_alloc.c:1481 [inline]
 get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3425
 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4683
 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
 alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
 alloc_slab_page+0x5f/0x120 mm/slub.c:2265
 allocate_slab+0x5a/0x2f0 mm/slub.c:2428
 new_slab mm/slub.c:2481 [inline]
 ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3667
 __slab_alloc+0x58/0xa0 mm/slub.c:3757
 __slab_alloc_node mm/slub.c:3810 [inline]
 slab_alloc_node mm/slub.c:3990 [inline]
 __do_kmalloc_node mm/slub.c:4122 [inline]
 kmalloc_node_track_caller_noprof+0x281/0x440 mm/slub.c:4143
 kmalloc_reserve+0x111/0x2a0 net/core/skbuff.c:605
 __alloc_skb+0x1f3/0x440 net/core/skbuff.c:674
 alloc_skb include/linux/skbuff.h:1320 [inline]
 alloc_skb_with_frags+0xc3/0x770 net/core/skbuff.c:6524
 sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2815
 unix_dgram_sendmsg+0x6d3/0x1f80 net/unix/af_unix.c:2030
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
 __do_sys_sendto net/socket.c:2204 [inline]
 __se_sys_sendto net/socket.c:2200 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2200
page last free pid 4764 tgid 4764 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1093 [inline]
 free_unref_page+0xd22/0xea0 mm/page_alloc.c:2588
 stack_depot_save_flags+0x6f6/0x830 lib/stackdepot.c:666
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_save_track+0x51/0x80 mm/kasan/common.c:68
 unpoison_slab_object mm/kasan/common.c:312 [inline]
 __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slub.c:3940 [inline]
 slab_alloc_node mm/slub.c:4002 [inline]
 kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4045
 kmalloc_reserve+0xa8/0x2a0 net/core/skbuff.c:583
 __alloc_skb+0x1f3/0x440 net/core/skbuff.c:674
 alloc_skb include/linux/skbuff.h:1320 [inline]
 __ip6_append_data+0x2ba6/0x4070 net/ipv6/ip6_output.c:1648
 ip6_append_data+0x264/0x3a0 net/ipv6/ip6_output.c:1835
 rawv6_sendmsg+0x18f1/0x23c0 net/ipv6/raw.c:919
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x1a6/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
 ___sys_sendmsg net/socket.c:2639 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff88802b005080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88802b005100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88802b005180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff88802b005200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88802b005280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

* [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-01 20:19 [syzbot] [netfilter?] KASAN: slab-use-after-free Read in nf_tables_trans_destroy_work syzbot
@ 2024-07-02 14:08 ` Florian Westphal
  2024-07-03 10:35   ` Hillf Danton
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-02 14:08 UTC (permalink / raw)
  To: netfilter-devel
  Cc: linux-kernel, netdev, syzkaller-bugs, Florian Westphal,
	syzbot+4fd66a69358fc15ae2ad

syzbot reports:

KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf in nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
 nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
 nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
 nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597

The problem is that the notifier does a conditional flush, but it's possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.

In nf-next we could make the flush_work depend on whether we found a table
to delete, to avoid the flush in most cases.

AFAICS this problem is only exposed in nf-next, by
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"):
with that commit applied there is an unconditional fetch of
table->family, which is what triggers the above splat.

Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_tables_api.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 02d75aefaa8e..683f6a4518ee 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -11552,8 +11552,7 @@ static int nft_rcv_nl_event(struct notifier_block *this, unsigned long event,
 
 	gc_seq = nft_gc_seq_begin(nft_net);
 
-	if (!list_empty(&nf_tables_destroy_list))
-		nf_tables_trans_destroy_flush_work();
+	nf_tables_trans_destroy_flush_work();
 again:
 	list_for_each_entry(table, &nft_net->tables, list) {
 		if (nft_table_has_owner(table) &&
-- 
2.44.2


* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-02 14:08 ` [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier Florian Westphal
@ 2024-07-03 10:35   ` Hillf Danton
  2024-07-03 10:52     ` Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: Hillf Danton @ 2024-07-03 10:35 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Tue,  2 Jul 2024 16:08:14 +0200 Florian Westphal <fw@strlen.de>
> syzbot reports:
> 
> KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
> KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
> KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
> Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
> [..]
> Workqueue: events nf_tables_trans_destroy_work
> Call Trace:
>  nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
>  nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
>  nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
> 
> Problem is that the notifier does a conditional flush, but its possible
> that the table-to-be-removed is still referenced by transactions being
> processed by the worker, so we need to flush unconditionally.
> 
Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
if trans outlives table.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-03 10:35   ` Hillf Danton
@ 2024-07-03 10:52     ` Florian Westphal
  2024-07-03 12:09       ` Hillf Danton
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-03 10:52 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, netfilter-devel, linux-kernel, netdev,
	syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> if trans outlives table.

trans must never outlive table.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-03 10:52     ` Florian Westphal
@ 2024-07-03 12:09       ` Hillf Danton
  2024-07-03 13:01         ` Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: Hillf Danton @ 2024-07-03 12:09 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > if trans outlives table.
> 
> trans must never outlive table.
> 
What is preventing trans from being freed after closing sock, given
trans is freed in workqueue?

	close sock
	queue work

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-03 12:09       ` Hillf Danton
@ 2024-07-03 13:01         ` Florian Westphal
  2024-07-04 10:35           ` Hillf Danton
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-03 13:01 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, netfilter-devel, linux-kernel, netdev,
	syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > if trans outlives table.
> > 
> > trans must never outlive table.
> > 
> What is preventing trans from being freed after closing sock, given
> trans is freed in workqueue?
> 
> 	close sock
> 	queue work

The notifier acquires the transaction mutex, locking out all other
transactions, so no further transaction requests referencing
the table can be queued.

The work queue is flushed before potentially ripping the table
out.  After this, no transactions referencing the table can exist
anymore; the only transactions that can still be queued are those
coming from a different netns, and tables are scoped per netns.

Table is torn down.  Transaction mutex is released.

The next transaction from userspace can't find the table anymore (it's gone),
so no more transactions can be queued for this table.

As I wrote in the commit message, the flush is dumb; this should first
walk the tables to see if there is a matching table to be torn down, and
only then flush the work queue once before tearing the table down.

But it's better to clearly split the bug fix from such a change.
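
To make the ordering argument concrete, here is a minimal, hypothetical
sketch of the pattern being described (the demo_* names and types are
illustrative only, not the actual nf_tables code):

/* Hypothetical illustration, not nf_tables code: the commit side queues
 * the destroy work before dropping the mutex, the release side takes the
 * mutex and flushes unconditionally, so no queued transaction can still
 * reference an object freed under the mutex.
 */
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct demo_trans {
	struct list_head list;
	void *table;			/* object the transaction references */
};

static DEFINE_MUTEX(demo_commit_mutex);
static DEFINE_SPINLOCK(demo_destroy_list_lock);
static LIST_HEAD(demo_destroy_list);

static void demo_destroy_work_fn(struct work_struct *work)
{
	struct demo_trans *trans, *next;
	LIST_HEAD(head);

	spin_lock(&demo_destroy_list_lock);
	list_splice_init(&demo_destroy_list, &head);
	spin_unlock(&demo_destroy_list_lock);

	list_for_each_entry_safe(trans, next, &head, list) {
		list_del(&trans->list);
		kfree(trans);		/* may dereference trans->table here */
	}
}
static DECLARE_WORK(demo_destroy_work, demo_destroy_work_fn);

/* "commit" side: queue the transaction for destruction under the mutex */
static void demo_commit_release(struct demo_trans *trans)
{
	mutex_lock(&demo_commit_mutex);
	spin_lock(&demo_destroy_list_lock);
	list_add_tail(&trans->list, &demo_destroy_list);
	spin_unlock(&demo_destroy_list_lock);
	schedule_work(&demo_destroy_work);	/* before dropping the mutex */
	mutex_unlock(&demo_commit_mutex);
}

/* "release" side: the mutex orders us after any commit; the unconditional
 * flush waits for every transaction queued so far, so the object can be
 * freed safely afterwards. */
static void demo_release_table(void *table)
{
	mutex_lock(&demo_commit_mutex);
	flush_work(&demo_destroy_work);
	kfree(table);			/* no queued trans references it now */
	mutex_unlock(&demo_commit_mutex);
}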

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-03 13:01         ` Florian Westphal
@ 2024-07-04 10:35           ` Hillf Danton
  2024-07-04 10:54             ` Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: Hillf Danton @ 2024-07-04 10:35 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Wed, 3 Jul 2024 15:01:07 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > > Hillf Danton <hdanton@sina.com> wrote:
> > > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > > if trans outlives table.
> > > 
> > > trans must never outlive table.
> > > 
> > What is preventing trans from being freed after closing sock, given
> > trans is freed in workqueue?
> > 
> > 	close sock
> > 	queue work
> 
> The notifier acquires the transaction mutex, locking out all other
> transactions, so no further transactions requests referencing
> the table can be queued.
> 
As per the syzbot report, trans->table could be instantiated before the
notifier acquires the transaction mutex. And in fact the lock helps
trans outlive the table even with your patch.

	cpu1			cpu2
	---			---
	transB->table = A
				lock trans mutex
				flush work
				free A
				unlock trans mutex

	queue work to free transB

> The work queue is flushed before potentially ripping the table
> out.  After this, no transactions referencing the table can exist
> anymore; the only transactions than can still be queued are those
> coming from a different netns, and tables are scoped per netns.
> 
> Table is torn down.  Transaction mutex is released.
> 
> Next transaction from userspace can't find the table anymore (its gone),
> so no more transactions can be queued for this table.
> 
> As I wrote in the commit message, the flush is dumb, this should first
> walk to see if there is a matching table to be torn down, and then flush
> work queue once before tearing the table down.
> 
> But its better to clearly split bug fix and such a change.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-04 10:35           ` Hillf Danton
@ 2024-07-04 10:54             ` Florian Westphal
  2024-07-05 10:48               ` Hillf Danton
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-04 10:54 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, netfilter-devel, linux-kernel, netdev,
	syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> On Wed, 3 Jul 2024 15:01:07 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > > > if trans outlives table.
> > > > 
> > > > trans must never outlive table.
> > > > 
> > > What is preventing trans from being freed after closing sock, given
> > > trans is freed in workqueue?
> > > 
> > > 	close sock
> > > 	queue work
> > 
> > The notifier acquires the transaction mutex, locking out all other
> > transactions, so no further transactions requests referencing
> > the table can be queued.
> > 
> As per the syzbot report, trans->table could be instantiated before
> notifier acquires the transaction mutex. And in fact the lock helps
> trans outlive table even with your patch.
> 
> 	cpu1			cpu2
> 	---			---
> 	transB->table = A
> 				lock trans mutex
> 				flush work
> 				free A
> 				unlock trans mutex
> 
> 	queue work to free transB

Can you show a crash reproducer, or explain how this assignment
and queueing happen unordered wrt. cpu2?

This should look like this:

 	cpu1			cpu2
 	---			---
	lock trans mutex
  				lock trans mutex -> blocks
 	transB->table = A
  	queue work to free transB
	unlock trans mutex
				lock trans mutex returns
  				flush work
  				free A
  				unlock trans mutex

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-04 10:54             ` Florian Westphal
@ 2024-07-05 10:48               ` Hillf Danton
  2024-07-05 11:02                 ` Florian Westphal
  2024-07-05 11:18                 ` [syzbot] [netfilter?] KASAN: slab-use-after-free Read in nf_tables_trans_destroy_work syzbot
  0 siblings, 2 replies; 17+ messages in thread
From: Hillf Danton @ 2024-07-05 10:48 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Thu, 4 Jul 2024 12:54:18 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > On Wed, 3 Jul 2024 15:01:07 +0200 Florian Westphal <fw@strlen.de>
> > > Hillf Danton <hdanton@sina.com> wrote:
> > > > On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > > > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > > > > if trans outlives table.
> > > > > 
> > > > > trans must never outlive table.
> > > > > 
> > > > What is preventing trans from being freed after closing sock, given
> > > > trans is freed in workqueue?
> > > > 
> > > > 	close sock
> > > > 	queue work
> > > 
> > > The notifier acquires the transaction mutex, locking out all other
> > > transactions, so no further transactions requests referencing
> > > the table can be queued.
> > > 
> > As per the syzbot report, trans->table could be instantiated before
> > notifier acquires the transaction mutex. And in fact the lock helps
> > trans outlive table even with your patch.
> > 
> > 	cpu1			cpu2
> > 	---			---
> > 	transB->table = A
> > 				lock trans mutex
> > 				flush work
> > 				free A
> > 				unlock trans mutex
> > 
> > 	queue work to free transB
> 
> Can you show a crash reproducer or explain how this assign
> and queueing happens unordered wrt. cpu2?
> 
Not so difficult.

> This should look like this:
> 
>  	cpu1			cpu2
>  	---			---
> 	lock trans mutex
>   				lock trans mutex -> blocks
>  	transB->table = A
>   	queue work to free transB
> 	unlock trans mutex
> 				lock trans mutex returns
>   				flush work
>   				free A
>   				unlock trans mutex
> 
If your patch is correct, it should survive a warning.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git  1c5fc27bc48a

--- x/net/netfilter/nf_tables_api.c
+++ y/net/netfilter/nf_tables_api.c
@@ -11552,9 +11552,10 @@ static int nft_rcv_nl_event(struct notif
 
 	gc_seq = nft_gc_seq_begin(nft_net);
 
-	if (!list_empty(&nf_tables_destroy_list))
-		nf_tables_trans_destroy_flush_work();
+	nf_tables_trans_destroy_flush_work();
 again:
+	WARN_ON(!list_empty(&nft_net->commit_list));
+
 	list_for_each_entry(table, &nft_net->tables, list) {
 		if (nft_table_has_owner(table) &&
 		    n->portid == table->nlpid) {
--

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-05 10:48               ` Hillf Danton
@ 2024-07-05 11:02                 ` Florian Westphal
  2024-07-07  7:56                   ` Hillf Danton
  2024-07-05 11:18                 ` [syzbot] [netfilter?] KASAN: slab-use-after-free Read in nf_tables_trans_destroy_work syzbot
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-05 11:02 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, netfilter-devel, linux-kernel, netdev,
	syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> > 				lock trans mutex returns
> >   				flush work
> >   				free A
> >   				unlock trans mutex
> > 
> If your patch is correct, it should survive a warning.
> 
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git  1c5fc27bc48a
> 
> --- x/net/netfilter/nf_tables_api.c
> +++ y/net/netfilter/nf_tables_api.c
> @@ -11552,9 +11552,10 @@ static int nft_rcv_nl_event(struct notif
>  
>  	gc_seq = nft_gc_seq_begin(nft_net);
>  
> -	if (!list_empty(&nf_tables_destroy_list))
> -		nf_tables_trans_destroy_flush_work();
> +	nf_tables_trans_destroy_flush_work();
>  again:
> +	WARN_ON(!list_empty(&nft_net->commit_list));
> +

You could officially submit this patch to nf-next; this
is a slow path and the transaction list must be empty here.

I think this change might be useful as it also documents
this requirement.

* Re: [syzbot] [netfilter?] KASAN: slab-use-after-free Read in nf_tables_trans_destroy_work
  2024-07-05 10:48               ` Hillf Danton
  2024-07-05 11:02                 ` Florian Westphal
@ 2024-07-05 11:18                 ` syzbot
  1 sibling, 0 replies; 17+ messages in thread
From: syzbot @ 2024-07-05 11:18 UTC (permalink / raw)
  To: fw, hdanton, linux-kernel, netdev, netfilter-devel,
	syzkaller-bugs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com

Tested on:

commit:         1c5fc27b Merge tag 'nf-next-24-06-28' of git://git.ker..
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=152db3d1980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5264b58fdff6e881
dashboard link: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1395cd71980000

Note: testing is done by a robot and is best-effort only.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-05 11:02                 ` Florian Westphal
@ 2024-07-07  7:56                   ` Hillf Danton
  2024-07-07  8:08                     ` Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: Hillf Danton @ 2024-07-07  7:56 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Fri, 5 Jul 2024 13:02:18 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > > 				lock trans mutex returns
> > >   				flush work
> > >   				free A
> > >   				unlock trans mutex
> > > 
> > If your patch is correct, it should survive a warning.
> > 
> > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git  1c5fc27bc48a
> > 
> > --- x/net/netfilter/nf_tables_api.c
> > +++ y/net/netfilter/nf_tables_api.c
> > @@ -11552,9 +11552,10 @@ static int nft_rcv_nl_event(struct notif
> >  
> >  	gc_seq = nft_gc_seq_begin(nft_net);
> >  
> > -	if (!list_empty(&nf_tables_destroy_list))
> > -		nf_tables_trans_destroy_flush_work();
> > +	nf_tables_trans_destroy_flush_work();
> >  again:
> > +	WARN_ON(!list_empty(&nft_net->commit_list));
> > +
> 
> You could officially submit this patch to nf-next, this
> is a slow path and the transaction list must be empty here.
> 
> I think this change might be useful as it also documents
> this requirement.

Yes it is boy and the current reproducer triggered another warning [1,2].

[1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
[2] https://lore.kernel.org/lkml/000000000000a42289061c9d452e@google.com/

And feel free to take a look at fput() and sock_put() for instances of
freeing slab objects asynchronously.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-07  7:56                   ` Hillf Danton
@ 2024-07-07  8:08                     ` Florian Westphal
  2024-07-08 10:56                       ` Hillf Danton
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-07  8:08 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, netfilter-devel, linux-kernel, netdev,
	syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> > I think this change might be useful as it also documents
> > this requirement.
> 
> Yes it is boy and the current reproducer triggered another warning [1,2].
> 
> [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/

The WARN is incorrect.  The destroy list can be non-empty; I already
tried to explain why.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-07  8:08                     ` Florian Westphal
@ 2024-07-08 10:56                       ` Hillf Danton
  2024-07-08 11:58                         ` Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: Hillf Danton @ 2024-07-08 10:56 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Tejun Heo, netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > > I think this change might be useful as it also documents
> > > this requirement.
> > 
> > Yes it is boy and the current reproducer triggered another warning [1,2].
> > 
> > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> 
> The WARN is incorrect.  The destroy list can be non-empty; i already
> tried to explain why.
>
That warning as-is could be a false positive, but it could be triggered with a
single netns.

	cpu1		cpu2		cpu3
	---		---		---
					nf_tables_trans_destroy_work()
					spin_lock(&nf_tables_destroy_list_lock);

					// 1) clear the destroy list
					list_splice_init(&nf_tables_destroy_list, &head);
					spin_unlock(&nf_tables_destroy_list_lock);

			nf_tables_commit_release()
			spin_lock(&nf_tables_destroy_list_lock);

			// 2) refill the destroy list
			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
			spin_unlock(&nf_tables_destroy_list_lock);
			schedule_work(&trans_destroy_work);
			mutex_unlock(&nft_net->commit_mutex);

	nft_rcv_nl_event()
	mutex_lock(&nft_net->commit_mutex);
	flush_work(&trans_destroy_work);
	  start_flush_work()
	    insert_wq_barrier()
	    /*
	     * If @target is currently being executed, schedule the
	     * barrier to the worker; otherwise, put it after @target.
	     */

	// 3) flush work ends with the refilled destroy list left intact
	tear tables down


* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-08 10:56                       ` Hillf Danton
@ 2024-07-08 11:58                         ` Florian Westphal
  2024-07-08 12:17                           ` Hillf Danton
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Westphal @ 2024-07-08 11:58 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, Tejun Heo, netfilter-devel, linux-kernel,
	netdev, syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > > I think this change might be useful as it also documents
> > > > this requirement.
> > > 
> > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > 
> > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> > 
> > The WARN is incorrect.  The destroy list can be non-empty; i already
> > tried to explain why.
> >
> That warning as-is could be false positive but it could be triggered with a
> single netns.

How?

> 	cpu1		cpu2		cpu3
> 	---		---		---
> 					nf_tables_trans_destroy_work()
> 					spin_lock(&nf_tables_destroy_list_lock);
> 
> 					// 1) clear the destroy list
> 					list_splice_init(&nf_tables_destroy_list, &head);
> 					spin_unlock(&nf_tables_destroy_list_lock);
> 
> 			nf_tables_commit_release()
> 			spin_lock(&nf_tables_destroy_list_lock);
> 
> 			// 2) refill the destroy list
> 			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
> 			spin_unlock(&nf_tables_destroy_list_lock);
> 			schedule_work(&trans_destroy_work);
> 			mutex_unlock(&nft_net->commit_mutex);

So you're saying work can be IDLE after schedule_work()?

I'm not following at all.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-08 11:58                         ` Florian Westphal
@ 2024-07-08 12:17                           ` Hillf Danton
  2024-07-08 12:43                             ` Florian Westphal
  0 siblings, 1 reply; 17+ messages in thread
From: Hillf Danton @ 2024-07-08 12:17 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Tejun Heo, netfilter-devel, linux-kernel, netdev, syzkaller-bugs,
	syzbot+4fd66a69358fc15ae2ad

On Mon, 8 Jul 2024 13:58:31 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > I think this change might be useful as it also documents
> > > > > this requirement.
> > > > 
> > > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > > 
> > > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> > > 
> > > The WARN is incorrect.  The destroy list can be non-empty; i already
> > > tried to explain why.
> > >
> > That warning as-is could be false positive but it could be triggered with a
> > single netns.
> 
> How?
> 
You saw the below cpu diagram, no?

> > 	cpu1		cpu2		cpu3
> > 	---		---		---
> > 					nf_tables_trans_destroy_work()
> > 					spin_lock(&nf_tables_destroy_list_lock);
> > 
> > 					// 1) clear the destroy list
> > 					list_splice_init(&nf_tables_destroy_list, &head);
> > 					spin_unlock(&nf_tables_destroy_list_lock);
> > 
> > 			nf_tables_commit_release()
> > 			spin_lock(&nf_tables_destroy_list_lock);
> > 
> > 			// 2) refill the destroy list
> > 			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
> > 			spin_unlock(&nf_tables_destroy_list_lock);
> > 			schedule_work(&trans_destroy_work);
> > 			mutex_unlock(&nft_net->commit_mutex);
> 
> So you're saying work can be IDLE after schedule_work()?
> 
I got your point, but it is difficult to explain to you. In simple words,
like the runqueue, the workqueue has latency.

> I'm not following at all.

This does not matter, but it is why I added tj to the cc list.

* Re: [PATCH nf] netfilter: nf_tables: unconditionally flush pending work before notifier
  2024-07-08 12:17                           ` Hillf Danton
@ 2024-07-08 12:43                             ` Florian Westphal
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Westphal @ 2024-07-08 12:43 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Florian Westphal, Tejun Heo, netfilter-devel, linux-kernel,
	netdev, syzkaller-bugs, syzbot+4fd66a69358fc15ae2ad

Hillf Danton <hdanton@sina.com> wrote:
> On Mon, 8 Jul 2024 13:58:31 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> > > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > > I think this change might be useful as it also documents
> > > > > > this requirement.
> > > > > 
> > > > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > > > 
> > > > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> > > > 
> > > > The WARN is incorrect.  The destroy list can be non-empty; i already
> > > > tried to explain why.
> > > >
> > > That warning as-is could be false positive but it could be triggered with a
> > > single netns.
> > 
> > How?
> > 
> You saw the below cpu diagram, no?

It did not explain the problem in a way I understand.

>	cpu1		cpu2		cpu3
>	---		---		---
>					nf_tables_trans_destroy_work()
>					spin_lock(&nf_tables_destroy_list_lock);
>
>					// 1) clear the destroy list
>					list_splice_init(&nf_tables_destroy_list, &head);
>					spin_unlock(&nf_tables_destroy_list_lock);

This means @work is running on cpu3 and made a snapshot of the list.
I don't even understand how that's relevant, but OK.

>			nf_tables_commit_release()
>			spin_lock(&nf_tables_destroy_list_lock);
>			// 2) refill the destroy list
>			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
>			spin_unlock(&nf_tables_destroy_list_lock);
>			schedule_work(&trans_destroy_work);
>			mutex_unlock(&nft_net->commit_mutex);

Means CPU2 has added transaction structures that could
reference @table to the list.

It also called schedule_work BEFORE releasing the mutex and
after placing entries on the destroy list.

> nft_rcv_nl_event()
> mutex_lock(&nft_net->commit_mutex);
> flush_work(&trans_destroy_work);

Means cpu1 serializes vs. cpu2; @work
was scheduled.

flush_work() must only return if @work is idle, without
any other pending execution.

If it gets scheduled again right after flush_work
returns, that is NOT a problem, as I tried to explain several times.

We hold the transaction mutex; only a different netns can queue more
work, and such a foreign netns can only see struct nft_table structures
that are private to its own namespace.

> // 3) flush work ends with the refilled destroy list left intact
> tear tables down

Again, I do not understand how it's possible.

The postcondition after flush_work returns is:

1. nf_tables_destroy_list must be empty, UNLESS the entries are from
   unrelated net namespaces; those cannot see the tables we're tearing
   down in 3), so they cannot reference them.

2. nf_tables_trans_destroy_work() is NOT running, unless it's
   processing entries queued by another netns after flush_work
   returned.


cpu2 does:
   -> add trans->table to @nf_tables_destroy_list
   -> unlock list spinlock
   -> schedule_work
   -> unlock mutex

cpu1 does:
 -> lock mutex
 -> flush work

You say it's not enough and that trans->table queued by cpu2 can still
be on @nf_tables_destroy_list.

I say flush_work after taking the mutex guarantees trans->table has been
processed by @work in all cases.
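
For reference, a tiny hypothetical module-style sketch of the flush_work()
guarantee relied on here (demo_* names are illustrative, not nf_tables code):
flush_work() returns only after the execution queued before it was called has
completed; it does not prevent later re-queueing, which is the harmless case
described above.

/* Hypothetical, self-contained demo of the flush_work() ordering. */
#include <linux/atomic.h>
#include <linux/module.h>
#include <linux/workqueue.h>

static atomic_t demo_runs = ATOMIC_INIT(0);

static void demo_work_fn(struct work_struct *work)
{
	atomic_inc(&demo_runs);
}
static DECLARE_WORK(demo_work, demo_work_fn);

static int __init demo_init(void)
{
	schedule_work(&demo_work);

	/* flush_work() returns only after the execution queued above has
	 * completed, so demo_runs is guaranteed to be at least 1 here. */
	flush_work(&demo_work);
	WARN_ON(atomic_read(&demo_runs) < 1);

	/* Nothing stops the work from being queued again afterwards, but
	 * such a later execution can only see objects that still exist --
	 * the "queued by another netns" case in the text above. */
	schedule_work(&demo_work);
	return 0;
}

static void __exit demo_exit(void)
{
	flush_work(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("flush_work() ordering demo");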
