* net: use-after-free in tw_timer_handler
@ 2017-01-23 10:19 Dmitry Vyukov
  2017-01-23 10:23 ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread

From: Dmitry Vyukov @ 2017-01-23 10:19 UTC
To: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet
Cc: syzkaller

Hello,

While running the syzkaller fuzzer I started seeing use-after-frees in tw_timer_handler. It happens with very low frequency; so far I've seen 22 of them. But all the reports look consistent, so I would assume it is real and just requires a very tricky race to happen. I started seeing it around Jan 17; however, I had not updated kernels for some time before that, so the issue was potentially introduced somewhat earlier. Or maybe the fuzzer just figured out how to trigger it, and the bug is actually old. I am seeing it on all of torvalds/linux-next/mmotm; some commits, if it matters: 7a308bb3016f57e5be11a677d15b821536419d36, 5cf7a0f3442b2312326c39f571d637669a478235, c497f8d17246720afe680ea1a8fa6e48e75af852.

The majority of reports point to net_drop_ns as the offending free, but that may be a red herring: since the access happens in a timer, it can happen long after the free, and the memory could have been reused. I've also seen a few reports where the access in tw_timer_handler is flagged as out-of-bounds on task_struct and on struct filename.
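The class of bug described above — a still-armed timer whose callback reads through a pointer into memory whose owner has already been freed — can be sketched in a minimal user-space model. All names here are hypothetical illustrations of the pattern, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the hazard: an armed "timer" keeps a raw
 * pointer to state owned by another object. Once that object is
 * freed while the timer is still armed, the pointer dangles, and a
 * later callback invocation is exactly the kind of use-after-free
 * read KASAN flags in tw_timer_handler. */
struct owner_state {
    long counter;              /* what the callback dereferences */
};

struct demo_timer {
    int armed;
    struct owner_state *state; /* dangles if owner freed while armed */
};

/* Callback body: safe only while the owner is still alive. */
static long demo_timer_fire(struct demo_timer *t)
{
    return t->state->counter;
}

/* Correct teardown order: disarm first, then free the state. */
static void demo_teardown(struct demo_timer *t)
{
    t->armed = 0;              /* models del_timer_sync() */
    free(t->state);
    t->state = NULL;           /* make a late fire fail loudly */
}
```

The point of the sketch is the ordering constraint: the disarm must complete before the free, and nothing may re-arm in between; the reports below are consistent with that constraint being violated somewhere.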
BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cb58c398
Read of size 8 by task syz-executor0/24691
CPU: 0 PID: 24691 Comm: syz-executor0 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801dc007328 ffffffff8234530f ffffffff00000000 1ffff1003b800df8
 ffffed003b800df0 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
 ffff8801d8ad8f60 ffff8801d8ad8f68 ffff8801d8ad8740 000000000000002e
Call Trace:
 [<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 [<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
 [<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
 [<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
 [<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
 [<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
 [<ffffffff843723ee>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
 [<ffffffff843723ee>] smp_trace_apic_timer_interrupt+0x13e/0x6a8 arch/x86/kernel/apic/apic.c:981
 [<ffffffff843713dc>] trace_apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:709
 <EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? arch_local_irq_enable arch/x86/include/asm/paravirt.h:777 [inline]
 <EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:170 [inline]
 <EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? _raw_spin_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:199
 [<ffffffff814cbff2>] finish_lock_switch kernel/sched/sched.h:1157 [inline]
 [<ffffffff814cbff2>] finish_task_switch+0x1c2/0x710 kernel/sched/core.c:2769
 [<ffffffff84356654>] context_switch kernel/sched/core.c:2902 [inline]
 [<ffffffff84356654>] __schedule+0x724/0x1e90 kernel/sched/core.c:3402
 [<ffffffff84357ec8>] schedule+0x108/0x440 kernel/sched/core.c:3457
 [<ffffffff8100790f>] exit_to_usermode_loop+0x20f/0x2a0 arch/x86/entry/common.c:149
 [<ffffffff81009413>] prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 [<ffffffff81009413>] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 [<ffffffff8436fa22>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Object at ffff8801cb58c1c0, in cache net_namespace size: 6656
Allocated:
PID = 3183
[ 2916.342108] [<ffffffff819dcd92>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
[ 2916.349322] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 2916.356776] [<ffffffff83549a86>] kmem_cache_zalloc include/linux/slab.h:626 [inline]
[ 2916.356776] [<ffffffff83549a86>] net_alloc net/core/net_namespace.c:339 [inline]
[ 2916.356776] [<ffffffff83549a86>] copy_net_ns+0x196/0x480 net/core/net_namespace.c:379
[ 2916.363783] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
[ 2916.371605] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
[ 2916.379042] [<ffffffff814197f1>] copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 2916.387013] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 2916.387013] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 2916.393730] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 2916.393730] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 2916.400376] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[ 2916.407563] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 15107
[ 2916.441170] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 2916.441170] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 2916.448408] [<ffffffff83548e3e>] net_free net/core/net_namespace.c:355 [inline]
[ 2916.448408] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 net/core/net_namespace.c:362
[ 2916.455370] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:472
[ 2916.462331] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[ 2916.469877] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[ 2916.477155] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 2916.483831] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 ffff8801cb58c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cb58c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801cb58c380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8801cb58c400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cb58c480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cd4ec298
Read of size 8 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801dc107468 ffffffff8234530f ffffffff00000001 1ffff1003b820e20
 ffffed003b820e18 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
 1ffff1003b820e17 ffff8801daf0e2c0 0000000041b58ab3 ffffffff84af4170
Call Trace:
 [<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 [<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
 [<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
 [<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
 [<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
 [<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
 [<ffffffff8437228b>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
 [<ffffffff8437228b>] smp_apic_timer_interrupt+0x7b/0xa0 arch/x86/kernel/apic/apic.c:960
 [<ffffffff8437133c>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:709
 <EOI> [ 1412.821824] [<ffffffff8436dbb6>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
 [<ffffffff8436d08f>] arch_safe_halt arch/x86/include/asm/paravirt.h:103 [inline]
 [<ffffffff8436d08f>] default_idle+0xbf/0x440 arch/x86/kernel/process.c:308
 [<ffffffff8128a5ca>] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299
 [<ffffffff8436e0d6>] default_idle_call+0x36/0x90 kernel/sched/idle.c:96
 [<ffffffff815549a7>] cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 [<ffffffff815549a7>] cpu_idle_loop kernel/sched/idle.c:247 [inline]
 [<ffffffff815549a7>] cpu_startup_entry+0x327/0x4b0 kernel/sched/idle.c:302
 [<ffffffff812e47ac>] start_secondary+0x36c/0x460 arch/x86/kernel/smpboot.c:263
Object at ffff8801cd4ec0c0, in cache net_namespace size: 6656
Allocated:
PID = 3131
[ 1412.940699] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 1412.948084] [<ffffffff83549a86>] kmem_cache_zalloc include/linux/slab.h:626 [inline]
[ 1412.948084] [<ffffffff83549a86>] net_alloc net/core/net_namespace.c:339 [inline]
[ 1412.948084] [<ffffffff83549a86>] copy_net_ns+0x196/0x480 net/core/net_namespace.c:379
[ 1412.955019] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
[ 1412.962817] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
[ 1412.970094] [<ffffffff814197f1>] copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 1412.978004] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 1412.978004] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 1412.984677] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 1412.984677] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 1412.991276] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[ 1412.998394] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 9846
[ 1413.031603] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 1413.031603] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 1413.038796] [<ffffffff83548e3e>] net_free net/core/net_namespace.c:355 [inline]
[ 1413.038796] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 net/core/net_namespace.c:362
[ 1413.045734] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:472
[ 1413.052667] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[ 1413.060120] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[ 1413.067357] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 1413.073944] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 ffff8801cd4ec180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cd4ec200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801cd4ec280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8801cd4ec300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cd4ec380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801b7b50358
Read of size 8 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801dc007468 ffffffff8234530f ffffffff00000000 1ffff1003b800e20
 ffffed003b800e18 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
 ffffffff84e2bba0 ffffffff84e2bba8 ffffffff84e2b380 000000000000002e
Call Trace:
 [<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 [<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
 [<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
 [<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
 [<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
 [<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
 [<ffffffff8437228b>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
 [<ffffffff8437228b>] smp_apic_timer_interrupt+0x7b/0xa0 arch/x86/kernel/apic/apic.c:960
 [<ffffffff8437133c>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:709
 <EOI> [ 1965.936792] [<ffffffff8436dbb6>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
 [<ffffffff8436d08f>] arch_safe_halt arch/x86/include/asm/paravirt.h:103 [inline]
 [<ffffffff8436d08f>] default_idle+0xbf/0x440 arch/x86/kernel/process.c:308
 [<ffffffff8128a5ca>] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299
 [<ffffffff8436e0d6>] default_idle_call+0x36/0x90 kernel/sched/idle.c:96
 [<ffffffff815549a7>] cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 [<ffffffff815549a7>] cpu_idle_loop kernel/sched/idle.c:247 [inline]
 [<ffffffff815549a7>] cpu_startup_entry+0x327/0x4b0 kernel/sched/idle.c:302
 [<ffffffff8434f05d>] rest_init+0x18d/0x1a0 init/main.c:408
 [<ffffffff85481b16>] start_kernel+0x7a0/0x7d2 init/main.c:660
 [<ffffffff854802e6>] x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:195
 [<ffffffff85480424>] x86_64_start_kernel+0x13c/0x149 arch/x86/kernel/head64.c:176
Object at ffff8801b7b50180, in cache net_namespace size: 6656
Allocated:
PID = 3169
[ 1966.129951] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 1966.137357] [<ffffffff83549a86>] kmem_cache_zalloc include/linux/slab.h:626 [inline]
[ 1966.137357] [<ffffffff83549a86>] net_alloc net/core/net_namespace.c:339 [inline]
[ 1966.137357] [<ffffffff83549a86>] copy_net_ns+0x196/0x480 net/core/net_namespace.c:379
[ 1966.144350] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
[ 1966.152254] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
[ 1966.159567] [<ffffffff814197f1>] copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 1966.167484] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 1966.167484] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 1966.174207] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 1966.174207] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 1966.180832] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[ 1966.187973] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 8938
[ 1966.221347] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 1966.221347] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 1966.228568] [<ffffffff83548e3e>] net_free net/core/net_namespace.c:355 [inline]
[ 1966.228568] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 net/core/net_namespace.c:362
[ 1966.235564] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:472
[ 1966.242517] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[ 1966.249995] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[ 1966.257258] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 1966.263879] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 ffff8801b7b50200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b7b50280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801b7b50300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8801b7b50380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b7b50400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: slab-out-of-bounds in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801c98f43a0
Read of size 8 by task syz-executor8/3423
CPU: 0 PID: 3423 Comm: syz-executor8 Not tainted 4.10.0-rc5 #19
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 kasan_object_err+0x1c/0x70 mm/kasan/report.c:161
 print_address_description mm/kasan/report.c:199 [inline]
 kasan_report_error+0x1d1/0x4d0 mm/kasan/report.c:288
 kasan_report mm/kasan/report.c:308 [inline]
 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 call_timer_fn+0x241/0x820 kernel/time/timer.c:1308
 expire_timers kernel/time/timer.c:1348 [inline]
 __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642
 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655
 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:658 [inline]
 smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
RIP: 0010:arch_local_irq_save arch/x86/include/asm/paravirt.h:784 [inline]
RIP: 0010:lock_is_held_type+0x124/0x310 kernel/locking/lockdep.c:3787
RSP: 0018:ffff8801c946f558 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000286 RBX: 1ffff1003928deac RCX: 1ffff1003928deb0
RDX: 1ffffffff0a18984 RSI: 00000000ffffffff RDI: ffffffff850c4c20
RBP: ffff8801c946f6a8 R08: 0000000000000002 R09: 0000000000000001
R10: 000000000000000a R11: 0000000000000000 R12: ffff8801c946f680
R13: ffff8801c9492640 R14: ffffffff85130ec0 R15: 0000000000000bff
 </IRQ>
 lock_is_held include/linux/lockdep.h:348 [inline]
 ___might_sleep+0x5b3/0x650 kernel/sched/core.c:7748
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 cache_alloc_debugcheck_before mm/slab.c:3071 [inline]
 slab_alloc mm/slab.c:3386 [inline]
 kmem_cache_alloc+0x273/0x680 mm/slab.c:3558
 shmem_alloc_inode+0x1b/0x40 mm/shmem.c:3647
 alloc_inode+0x61/0x180 fs/inode.c:207
 new_inode_pseudo+0x69/0x170 fs/inode.c:889
 new_inode+0x1c/0x40 fs/inode.c:918
 shmem_get_inode+0xd1/0x8a0 mm/shmem.c:2120
 shmem_mknod+0x58/0x1b0 mm/shmem.c:2824
 shmem_mkdir+0x29/0x50 mm/shmem.c:2875
 vfs_mkdir+0x3be/0x600 fs/namei.c:3738
 SYSC_mkdirat fs/namei.c:3761 [inline]
 SyS_mkdirat fs/namei.c:3745 [inline]
 SYSC_mkdir fs/namei.c:3772 [inline]
 SyS_mkdir+0x16e/0x290 fs/namei.c:3770
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44ec87
RSP: 002b:0000000001a2fe40 EFLAGS: 00000212 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 000000000044ec87
RDX: 0000000001a2fe6e RSI: 00000000000001ff RDI: 0000000001a2fe68
RBP: 00000000000019ec R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000064 R11: 0000000000000212 R12: 0000000001ef390c
R13: 0000000000000000 R14: 00000000000a43b5 R15: 00000000000019ec
Object at ffff8801c98f44c0, in cache task_struct size: 5696
Allocated:
PID = 3452
 [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 [<ffffffff819f71da>] set_track mm/kasan/kasan.c:514 [inline]
 [<ffffffff819f71da>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605
 [<ffffffff819f77d2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
 [<ffffffff819f1652>] kmem_cache_alloc_node+0x122/0x690 mm/slab.c:3650
 [<ffffffff81421fe2>] alloc_task_struct_node kernel/fork.c:142 [inline]
 [<ffffffff81421fe2>] dup_task_struct kernel/fork.c:482 [inline]
 [<ffffffff81421fe2>] copy_process.part.42+0x1a32/0x5fd0 kernel/fork.c:1515
 [<ffffffff81426ac0>] copy_process kernel/fork.c:1486 [inline]
 [<ffffffff81426ac0>] _do_fork+0x200/0xff0 kernel/fork.c:1942
 [<ffffffff81427987>] SYSC_clone kernel/fork.c:2052 [inline]
 [<ffffffff81427987>] SyS_clone+0x37/0x50 kernel/fork.c:2046
 [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
 [<ffffffff8440fb09>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 29885
 [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 [<ffffffff819f784f>] set_track mm/kasan/kasan.c:514 [inline]
 [<ffffffff819f784f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578
 [<ffffffff819f4bf1>] __cache_free mm/slab.c:3502 [inline]
 [<ffffffff819f4bf1>] kmem_cache_free+0x71/0x240 mm/slab.c:3762
 [<ffffffff8141f041>] free_task_struct kernel/fork.c:147 [inline]
 [<ffffffff8141f041>] free_task+0x151/0x1d0 kernel/fork.c:359
 [<ffffffff8141f30b>] __put_task_struct+0x24b/0x5f0 kernel/fork.c:396
 [<ffffffff81435baa>] put_task_struct include/linux/sched.h:2257 [inline]
 [<ffffffff81435baa>] delayed_put_task_struct+0xca/0x3f0 kernel/exit.c:173
 [<ffffffff815ef250>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [<ffffffff815ef250>] rcu_do_batch.isra.70+0x9e0/0xdf0 kernel/rcu/tree.c:2780
 [<ffffffff815efad2>] invoke_rcu_callbacks kernel/rcu/tree.c:3043 [inline]
 [<ffffffff815efad2>] __rcu_process_callbacks kernel/rcu/tree.c:3010 [inline]
 [<ffffffff815efad2>] rcu_process_callbacks+0x472/0xc70 kernel/rcu/tree.c:3027
 [<ffffffff84412d7f>] __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
Memory state around the buggy address:
 ffff8801c98f4280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801c98f4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8801c98f4380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff8801c98f4400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801c98f4480: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
==================================================================
* Re: net: use-after-free in tw_timer_handler
  2017-01-23 10:19 net: use-after-free in tw_timer_handler Dmitry Vyukov
@ 2017-01-23 10:23 ` Dmitry Vyukov
  2017-01-24 14:28   ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread

From: Dmitry Vyukov @ 2017-01-23 10:23 UTC
To: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet
Cc: syzkaller

On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> While running the syzkaller fuzzer I started seeing use-after-frees in
> tw_timer_handler. It happens with very low frequency; so far I've seen
> 22 of them. But all the reports look consistent, so I would assume it
> is real and just requires a very tricky race to happen. I started
> seeing it around Jan 17; however, I had not updated kernels for some
> time before that, so the issue was potentially introduced somewhat
> earlier. Or maybe the fuzzer just figured out how to trigger it, and
> the bug is actually old. I am seeing it on all of
> torvalds/linux-next/mmotm; some commits, if it matters:
> 7a308bb3016f57e5be11a677d15b821536419d36,
> 5cf7a0f3442b2312326c39f571d637669a478235,
> c497f8d17246720afe680ea1a8fa6e48e75af852.
> The majority of reports point to net_drop_ns as the offending free,
> but that may be a red herring. Since the access happens in a timer, it
> can happen long after the free, and the memory could have been reused.
> I've also seen a few where the access in tw_timer_handler is reported
> as out-of-bounds on task_struct and on struct filename.

I've briefly skimmed through the code.
Assuming that it requires a very tricky race to be triggered, the most suspicious interaction looks to be inet_twsk_deschedule_put vs __inet_twsk_schedule:

void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
	if (del_timer_sync(&tw->tw_timer))
		inet_twsk_kill(tw);
	inet_twsk_put(tw);
}

void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
{
	tw->tw_kill = timeo <= 4*HZ;
	if (!rearm) {
		BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
		atomic_inc(&tw->tw_dr->tw_count);
	} else {
		mod_timer_pending(&tw->tw_timer, jiffies + timeo);
	}
}

Can't it somehow end up rearming an already deleted timer? Or maybe the first mod_timer happens after del_timer_sync?

> BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0
> net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cb58c398
[... full KASAN reports quoted above trimmed ...]
1966.144350] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 > kernel/nsproxy.c:106 > [ 1966.152254] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 > kernel/nsproxy.c:164 > [ 1966.159567] [<ffffffff814197f1>] > copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659 > [ 1966.167484] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline] > [ 1966.167484] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937 > [ 1966.174207] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline] > [ 1966.174207] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041 > [ 1966.180832] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 > arch/x86/entry/common.c:280 > [ 1966.187973] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a > Freed: > PID = 8938 > [ 1966.221347] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline] > [ 1966.221347] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767 > [ 1966.228568] [<ffffffff83548e3e>] net_free > net/core/net_namespace.c:355 [inline] > [ 1966.228568] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 > net/core/net_namespace.c:362 > [ 1966.235564] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 > net/core/net_namespace.c:472 > [ 1966.242517] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 > kernel/workqueue.c:2096 > [ 1966.249995] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 > kernel/workqueue.c:2230 > [ 1966.257258] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209 > [ 1966.263879] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 > arch/x86/entry/entry_64.S:433 > Memory state around the buggy address: > ffff8801b7b50200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff8801b7b50280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>ffff8801b7b50300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff8801b7b50380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff8801b7b50400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > 
================================================================== > > BUG: KASAN: slab-out-of-bounds in tw_timer_handler+0xc3/0xd0 > net/ipv4/inet_timewait_sock.c:149 at addr ffff8801c98f43a0 > Read of size 8 by task syz-executor8/3423 > CPU: 0 PID: 3423 Comm: syz-executor8 Not tainted 4.10.0-rc5 #19 > Hardware name: Google Google Compute Engine/Google Compute Engine, > BIOS Google 01/01/2011 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > kasan_object_err+0x1c/0x70 mm/kasan/report.c:161 > print_address_description mm/kasan/report.c:199 [inline] > kasan_report_error+0x1d1/0x4d0 mm/kasan/report.c:288 > kasan_report mm/kasan/report.c:308 [inline] > __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329 > tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 > call_timer_fn+0x241/0x820 kernel/time/timer.c:1308 > expire_timers kernel/time/timer.c:1348 [inline] > __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642 > run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655 > __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 > invoke_softirq kernel/softirq.c:364 [inline] > irq_exit+0x1cc/0x200 kernel/softirq.c:405 > exiting_irq arch/x86/include/asm/apic.h:658 [inline] > smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961 > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 > RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline] > RIP: 0010:arch_local_irq_save arch/x86/include/asm/paravirt.h:784 [inline] > RIP: 0010:lock_is_held_type+0x124/0x310 kernel/locking/lockdep.c:3787 > RSP: 0018:ffff8801c946f558 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10 > RAX: 0000000000000286 RBX: 1ffff1003928deac RCX: 1ffff1003928deb0 > RDX: 1ffffffff0a18984 RSI: 00000000ffffffff RDI: ffffffff850c4c20 > RBP: ffff8801c946f6a8 R08: 0000000000000002 R09: 0000000000000001 > R10: 000000000000000a R11: 0000000000000000 R12: ffff8801c946f680 > R13: ffff8801c9492640 R14: ffffffff85130ec0 
R15: 0000000000000bff > </IRQ> > lock_is_held include/linux/lockdep.h:348 [inline] > ___might_sleep+0x5b3/0x650 kernel/sched/core.c:7748 > __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 > cache_alloc_debugcheck_before mm/slab.c:3071 [inline] > slab_alloc mm/slab.c:3386 [inline] > kmem_cache_alloc+0x273/0x680 mm/slab.c:3558 > shmem_alloc_inode+0x1b/0x40 mm/shmem.c:3647 > alloc_inode+0x61/0x180 fs/inode.c:207 > new_inode_pseudo+0x69/0x170 fs/inode.c:889 > new_inode+0x1c/0x40 fs/inode.c:918 > shmem_get_inode+0xd1/0x8a0 mm/shmem.c:2120 > shmem_mknod+0x58/0x1b0 mm/shmem.c:2824 > shmem_mkdir+0x29/0x50 mm/shmem.c:2875 > vfs_mkdir+0x3be/0x600 fs/namei.c:3738 > SYSC_mkdirat fs/namei.c:3761 [inline] > SyS_mkdirat fs/namei.c:3745 [inline] > SYSC_mkdir fs/namei.c:3772 [inline] > SyS_mkdir+0x16e/0x290 fs/namei.c:3770 > entry_SYSCALL_64_fastpath+0x1f/0xc2 > RIP: 0033:0x44ec87 > RSP: 002b:0000000001a2fe40 EFLAGS: 00000212 ORIG_RAX: 0000000000000053 > RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 000000000044ec87 > RDX: 0000000001a2fe6e RSI: 00000000000001ff RDI: 0000000001a2fe68 > RBP: 00000000000019ec R08: 0000000000000000 R09: 0000000000000006 > R10: 0000000000000064 R11: 0000000000000212 R12: 0000000001ef390c > R13: 0000000000000000 R14: 00000000000a43b5 R15: 00000000000019ec > Object at ffff8801c98f44c0, in cache task_struct size: 5696 > Allocated: > PID = 3452 > [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 > [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502 > [<ffffffff819f71da>] set_track mm/kasan/kasan.c:514 [inline] > [<ffffffff819f71da>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605 > [<ffffffff819f77d2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544 > [<ffffffff819f1652>] kmem_cache_alloc_node+0x122/0x690 mm/slab.c:3650 > [<ffffffff81421fe2>] alloc_task_struct_node kernel/fork.c:142 [inline] > [<ffffffff81421fe2>] dup_task_struct kernel/fork.c:482 [inline] > [<ffffffff81421fe2>] copy_process.part.42+0x1a32/0x5fd0 
kernel/fork.c:1515 > [<ffffffff81426ac0>] copy_process kernel/fork.c:1486 [inline] > [<ffffffff81426ac0>] _do_fork+0x200/0xff0 kernel/fork.c:1942 > [<ffffffff81427987>] SYSC_clone kernel/fork.c:2052 [inline] > [<ffffffff81427987>] SyS_clone+0x37/0x50 kernel/fork.c:2046 > [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280 > [<ffffffff8440fb09>] return_from_SYSCALL_64+0x0/0x7a > Freed: > PID = 29885 > [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 > [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502 > [<ffffffff819f784f>] set_track mm/kasan/kasan.c:514 [inline] > [<ffffffff819f784f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578 > [<ffffffff819f4bf1>] __cache_free mm/slab.c:3502 [inline] > [<ffffffff819f4bf1>] kmem_cache_free+0x71/0x240 mm/slab.c:3762 > [<ffffffff8141f041>] free_task_struct kernel/fork.c:147 [inline] > [<ffffffff8141f041>] free_task+0x151/0x1d0 kernel/fork.c:359 > [<ffffffff8141f30b>] __put_task_struct+0x24b/0x5f0 kernel/fork.c:396 > [<ffffffff81435baa>] put_task_struct include/linux/sched.h:2257 [inline] > [<ffffffff81435baa>] delayed_put_task_struct+0xca/0x3f0 kernel/exit.c:173 > [<ffffffff815ef250>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline] > [<ffffffff815ef250>] rcu_do_batch.isra.70+0x9e0/0xdf0 kernel/rcu/tree.c:2780 > [<ffffffff815efad2>] invoke_rcu_callbacks kernel/rcu/tree.c:3043 [inline] > [<ffffffff815efad2>] __rcu_process_callbacks kernel/rcu/tree.c:3010 [inline] > [<ffffffff815efad2>] rcu_process_callbacks+0x472/0xc70 kernel/rcu/tree.c:3027 > [<ffffffff84412d7f>] __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 > Memory state around the buggy address: > ffff8801c98f4280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff8801c98f4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >>ffff8801c98f4380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ^ > ffff8801c98f4400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff8801c98f4480: fc fc fc fc fc fc fc fc fb fb 
fb fb fb fb fb fb > ================================================================== ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-23 10:23 ` Dmitry Vyukov @ 2017-01-24 14:28 ` Eric Dumazet 2017-01-24 15:06 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-01-24 14:28 UTC (permalink / raw) To: Dmitry Vyukov Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller On Mon, 2017-01-23 at 11:23 +0100, Dmitry Vyukov wrote: > On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > > Hello, > > > > While running syzkaller fuzzer I started seeing use-after-frees in > > tw_timer_handler. It happens with very low frequency, so far I've seen > > 22 of them. But all reports look consistent, so I would assume that it > > is real, just requires a very tricky race to happen. I've stared > > seeing it around Jan 17, however I did not update kernels for some > > time before that so potentially the issues was introduced somewhat > > earlier. Or maybe fuzzer just figured how to trigger it, and the bug > > is actually old. I am seeing it on all of torvalds/linux-next/mmotm, > > some commits if it matters: 7a308bb3016f57e5be11a677d15b821536419d36, > > 5cf7a0f3442b2312326c39f571d637669a478235, > > c497f8d17246720afe680ea1a8fa6e48e75af852. > > Majority of reports points to net_drop_ns as the offending free, but > > it may be red herring. Since the access happens in timer, it can > > happen long after free and the memory could have been reused. I've > > also seen few where the access in tw_timer_handler is reported as > > out-of-bounds on task_struct and on struct filename. > > > > I've briefly skimmed through the code. 
Assuming that it requires a
> very tricky race to be triggered, the most suspicious looks
> inet_twsk_deschedule_put vs __inet_twsk_schedule:
>
> void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
> {
> 	if (del_timer_sync(&tw->tw_timer))
> 		inet_twsk_kill(tw);
> 	inet_twsk_put(tw);
> }
>
> void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
> {
> 	tw->tw_kill = timeo <= 4*HZ;
> 	if (!rearm) {
> 		BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> 		atomic_inc(&tw->tw_dr->tw_count);
> 	} else {
> 		mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> 	}
> }
>
> Can't it somehow end up rearming already deleted timer? Or maybe the
> first mod_timer happens after del_timer_sync?

This code was changed a long time ago :

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054

So I suspect a recent patch broke the logic.

You might start a bisection :

I would check if 4.7 and 4.8 trigger the issue you noticed.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-24 14:28 ` Eric Dumazet @ 2017-01-24 15:06 ` Dmitry Vyukov 2017-01-24 15:52 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-01-24 15:06 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller On Tue, Jan 24, 2017 at 3:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2017-01-23 at 11:23 +0100, Dmitry Vyukov wrote: >> On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> > Hello, >> > >> > While running syzkaller fuzzer I started seeing use-after-frees in >> > tw_timer_handler. It happens with very low frequency, so far I've seen >> > 22 of them. But all reports look consistent, so I would assume that it >> > is real, just requires a very tricky race to happen. I've stared >> > seeing it around Jan 17, however I did not update kernels for some >> > time before that so potentially the issues was introduced somewhat >> > earlier. Or maybe fuzzer just figured how to trigger it, and the bug >> > is actually old. I am seeing it on all of torvalds/linux-next/mmotm, >> > some commits if it matters: 7a308bb3016f57e5be11a677d15b821536419d36, >> > 5cf7a0f3442b2312326c39f571d637669a478235, >> > c497f8d17246720afe680ea1a8fa6e48e75af852. >> > Majority of reports points to net_drop_ns as the offending free, but >> > it may be red herring. Since the access happens in timer, it can >> > happen long after free and the memory could have been reused. I've >> > also seen few where the access in tw_timer_handler is reported as >> > out-of-bounds on task_struct and on struct filename. >> >> >> >> I've briefly skimmed through the code. 
Assuming that it requires a >> very tricky race to be triggered, the most suspicious looks >> inet_twsk_deschedule_put vs __inet_twsk_schedule: >> >> void inet_twsk_deschedule_put(struct inet_timewait_sock *tw) >> { >> if (del_timer_sync(&tw->tw_timer)) >> inet_twsk_kill(tw); >> inet_twsk_put(tw); >> } >> >> void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm) >> { >> tw->tw_kill = timeo <= 4*HZ; >> if (!rearm) { >> BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo)); >> atomic_inc(&tw->tw_dr->tw_count); >> } else { >> mod_timer_pending(&tw->tw_timer, jiffies + timeo); >> } >> } >> >> Can't it somehow end up rearming already deleted timer? Or maybe the >> first mod_timer happens after del_timer_sync? > > This code was changed a long time ago : > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 > > So I suspect a recent patch broke the logic. > > You might start a bisection : > > I would check if 4.7 and 4.8 trigger the issue you noticed. It happens with too low rate for bisecting (few times per day). I could add some additional checks into code, but I don't know what checks could be useful. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-24 15:06 ` Dmitry Vyukov @ 2017-01-24 15:52 ` Eric Dumazet 2017-02-08 17:36 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-01-24 15:52 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >> This code was changed a long time ago : >> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >> >> So I suspect a recent patch broke the logic. >> >> You might start a bisection : >> >> I would check if 4.7 and 4.8 trigger the issue you noticed. > > > It happens with too low rate for bisecting (few times per day). I > could add some additional checks into code, but I don't know what > checks could be useful. If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure we are able to help. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-24 15:52 ` Eric Dumazet @ 2017-02-08 17:36 ` Dmitry Vyukov 2017-02-08 17:58 ` Eric Dumazet 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang 0 siblings, 2 replies; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-08 17:36 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >>> >>> This code was changed a long time ago : >>> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>> >>> So I suspect a recent patch broke the logic. >>> >>> You might start a bisection : >>> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >> >> >> It happens with too low rate for bisecting (few times per day). I >> could add some additional checks into code, but I don't know what >> checks could be useful. > > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure > we are able to help. There are also chances that the problem is older. Looking at the code, this part of inet_twsk_purge looks fishy: 285 if (unlikely((tw->tw_family != family) || 286 atomic_read(&twsk_net(tw)->count))) { It uses net->count == 0 check to find the right sockets. But what if there are several nets with count == 0 in flight, can't there be several inet_twsk_purge calls running concurrently freeing each other sockets? If so it looks like inet_twsk_purge can call inet_twsk_deschedule_put twice for a socket. Namely, two calls for different nets discover the socket, check that net->count==0 and both call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge net that it needs to purge? The second issue that I noticed is that tw_refcnt is set to 4 _after_ we schedule the timer. 
The timer probably won't fire before we set tw_refcnt, but if it somehow does, it will corrupt the refcount. I don't think that's what I am seeing, though; more likely it's the first issue (if it's real). Does it make any sense? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 17:36 ` Dmitry Vyukov @ 2017-02-08 17:58 ` Eric Dumazet 2017-02-08 18:55 ` Dmitry Vyukov 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang 1 sibling, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-02-08 17:58 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: > On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > >>> > >>> This code was changed a long time ago : > >>> > >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 > >>> > >>> So I suspect a recent patch broke the logic. > >>> > >>> You might start a bisection : > >>> > >>> I would check if 4.7 and 4.8 trigger the issue you noticed. > >> > >> > >> It happens with too low rate for bisecting (few times per day). I > >> could add some additional checks into code, but I don't know what > >> checks could be useful. > > > > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure > > we are able to help. > > > There are also chances that the problem is older. > > Looking at the code, this part of inet_twsk_purge looks fishy: > > 285 if (unlikely((tw->tw_family != family) || > 286 atomic_read(&twsk_net(tw)->count))) { > > It uses net->count == 0 check to find the right sockets. But what if > there are several nets with count == 0 in flight, can't there be > several inet_twsk_purge calls running concurrently freeing each other > sockets? If so it looks like inet_twsk_purge can call > inet_twsk_deschedule_put twice for a socket. Namely, two calls for > different nets discover the socket, check that net->count==0 and both > call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge > net that it needs to purge? Yes, atomic_read() is not a proper sync point. > The second issue that I noticed is that tw_refcnt is set to 4 _after_ > we schedule the timer. The timer will probably won't fire before we > set tw_refcnt, but if it somehow does it will corrupt the ref count. I > don't think that it's what I am seeing, though. More likely it's the > first issues (if it's real). > Timer is pinned, it cannot fire under us on this cpu. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 17:58 ` Eric Dumazet @ 2017-02-08 18:55 ` Dmitry Vyukov 2017-02-08 19:17 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-08 18:55 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, Feb 8, 2017 at 6:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: >> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: >> > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >>> >> >>> This code was changed a long time ago : >> >>> >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >> >>> >> >>> So I suspect a recent patch broke the logic. >> >>> >> >>> You might start a bisection : >> >>> >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >> >> >> >> >> >> It happens with too low rate for bisecting (few times per day). I >> >> could add some additional checks into code, but I don't know what >> >> checks could be useful. >> > >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >> > we are able to help. >> >> >> There are also chances that the problem is older. >> >> Looking at the code, this part of inet_twsk_purge looks fishy: >> >> 285 if (unlikely((tw->tw_family != family) || >> 286 atomic_read(&twsk_net(tw)->count))) { >> >> It uses net->count == 0 check to find the right sockets. But what if >> there are several nets with count == 0 in flight, can't there be >> several inet_twsk_purge calls running concurrently freeing each other >> sockets? If so it looks like inet_twsk_purge can call >> inet_twsk_deschedule_put twice for a socket. 
Namely, two calls for >> different nets discover the socket, check that net->count==0 and both >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >> net that it needs to purge? > > Yes, atomic_read() is not a proper sync point. Do you mean that it does not include read barrier? I more mean that we can call inet_twsk_deschedule_put twice for the same socket. >> The second issue that I noticed is that tw_refcnt is set to 4 _after_ >> we schedule the timer. The timer will probably won't fire before we >> set tw_refcnt, but if it somehow does it will corrupt the ref count. I >> don't think that it's what I am seeing, though. More likely it's the >> first issues (if it's real). >> > > Timer is pinned, it cannot fire under us on this cpu. Ack ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 18:55 ` Dmitry Vyukov @ 2017-02-08 19:17 ` Eric Dumazet 2017-02-08 19:32 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-02-08 19:17 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, 2017-02-08 at 19:55 +0100, Dmitry Vyukov wrote: > On Wed, Feb 8, 2017 at 6:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: > >> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: > >> > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > >> >>> > >> >>> This code was changed a long time ago : > >> >>> > >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 > >> >>> > >> >>> So I suspect a recent patch broke the logic. > >> >>> > >> >>> You might start a bisection : > >> >>> > >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. > >> >> > >> >> > >> >> It happens with too low rate for bisecting (few times per day). I > >> >> could add some additional checks into code, but I don't know what > >> >> checks could be useful. > >> > > >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure > >> > we are able to help. > >> > >> > >> There are also chances that the problem is older. > >> > >> Looking at the code, this part of inet_twsk_purge looks fishy: > >> > >> 285 if (unlikely((tw->tw_family != family) || > >> 286 atomic_read(&twsk_net(tw)->count))) { > >> > >> It uses net->count == 0 check to find the right sockets. But what if > >> there are several nets with count == 0 in flight, can't there be > >> several inet_twsk_purge calls running concurrently freeing each other > >> sockets? 
If so it looks like inet_twsk_purge can call > >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for > >> different nets discover the socket, check that net->count==0 and both > >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge > >> net that it needs to purge? > > > > Yes, atomic_read() is not a proper sync point. > > Do you mean that it does not include read barrier? > I more mean that we can call inet_twsk_deschedule_put twice for the same socket. I meant that this code assumed RTNL being held. This might not be the case now, after some old change. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 19:17 ` Eric Dumazet @ 2017-02-08 19:32 ` Dmitry Vyukov 2017-02-14 19:38 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-08 19:32 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, Feb 8, 2017 at 8:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-02-08 at 19:55 +0100, Dmitry Vyukov wrote: >> On Wed, Feb 8, 2017 at 6:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: >> >> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: >> >> > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >> >>> >> >> >>> This code was changed a long time ago : >> >> >>> >> >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >> >> >>> >> >> >>> So I suspect a recent patch broke the logic. >> >> >>> >> >> >>> You might start a bisection : >> >> >>> >> >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >> >> >> >> >> >> >> >> >> It happens with too low rate for bisecting (few times per day). I >> >> >> could add some additional checks into code, but I don't know what >> >> >> checks could be useful. >> >> > >> >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >> >> > we are able to help. >> >> >> >> >> >> There are also chances that the problem is older. >> >> >> >> Looking at the code, this part of inet_twsk_purge looks fishy: >> >> >> >> 285 if (unlikely((tw->tw_family != family) || >> >> 286 atomic_read(&twsk_net(tw)->count))) { >> >> >> >> It uses net->count == 0 check to find the right sockets. 
But what if >> >> there are several nets with count == 0 in flight, can't there be >> >> several inet_twsk_purge calls running concurrently freeing each other >> >> sockets? If so it looks like inet_twsk_purge can call >> >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >> >> different nets discover the socket, check that net->count==0 and both >> >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >> >> net that it needs to purge? >> > >> > Yes, atomic_read() is not a proper sync point. >> >> Do you mean that it does not include read barrier? >> I more mean that we can call inet_twsk_deschedule_put twice for the same socket. > > I meant that this code assumed RTNL being held. > > This might not be the case now, after some old change. cleanup_net releases rtnl lock right before calling these callbacks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 19:32 ` Dmitry Vyukov @ 2017-02-14 19:38 ` Dmitry Vyukov 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-14 19:38 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller, Andrey Ryabinin On Wed, Feb 8, 2017 at 8:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote: >>> >> >>> This code was changed a long time ago : >>> >> >>> >>> >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>> >> >>> >>> >> >>> So I suspect a recent patch broke the logic. >>> >> >>> >>> >> >>> You might start a bisection : >>> >> >>> >>> >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>> >> >> >>> >> >> >>> >> >> It happens with too low rate for bisecting (few times per day). I >>> >> >> could add some additional checks into code, but I don't know what >>> >> >> checks could be useful. >>> >> > >>> >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>> >> > we are able to help. >>> >> >>> >> >>> >> There are also chances that the problem is older. >>> >> >>> >> Looking at the code, this part of inet_twsk_purge looks fishy: >>> >> >>> >> 285 if (unlikely((tw->tw_family != family) || >>> >> 286 atomic_read(&twsk_net(tw)->count))) { >>> >> >>> >> It uses net->count == 0 check to find the right sockets. But what if >>> >> there are several nets with count == 0 in flight, can't there be >>> >> several inet_twsk_purge calls running concurrently freeing each other >>> >> sockets? If so it looks like inet_twsk_purge can call >>> >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>> >> different nets discover the socket, check that net->count==0 and both >>> >> call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge >>> >> net that it needs to purge? >>> > >>> > Yes, atomic_read() is not a proper sync point. >>> >>> Do you mean that it does not include read barrier? >>> I more mean that we can call inet_twsk_deschedule_put twice for the same socket. >> >> I meant that this code assumed RTNL being held. >> >> This might not be the case now, after some old change. > > > cleanup_net releases rtnl lock right before calling these callbacks. +Andrey, do you know somebody on your side interested in stability of network namespace? This use-after-free seems to be related to net namespace. For context, full thread is here: https://groups.google.com/forum/#!msg/syzkaller/p1tn-_Kc6l4/smuL_FMAAgAJ ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-14 19:38 ` Dmitry Vyukov @ 2017-02-21 11:27 ` Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov ` (3 more replies) 0 siblings, 4 replies; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-21 11:27 UTC (permalink / raw) To: Gerrit Renker, David S. Miller, dccp Cc: Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, linux-kernel, Andrey Ryabinin DCCP doesn't purge timewait sockets on network namespace shutdown. So, after net namespace destroyed we could still have an active timer which will trigger use after free in tw_timer_handler(): BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 Read of size 8 by task swapper/1/0 Call Trace: __asan_load8+0x54/0x90 tw_timer_handler+0x4a/0xa0 call_timer_fn+0x127/0x480 expire_timers+0x1db/0x2e0 run_timer_softirq+0x12f/0x2a0 __do_softirq+0x105/0x5b4 irq_exit+0xdd/0xf0 smp_apic_timer_interrupt+0x57/0x70 apic_timer_interrupt+0x90/0xa0 Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 Allocated: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xee/0x180 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x134/0x310 copy_net_ns+0x8d/0x280 create_new_namespaces+0x23f/0x340 unshare_nsproxy_namespaces+0x75/0xf0 SyS_unshare+0x299/0x4f0 entry_SYSCALL_64_fastpath+0x18/0xad Freed: save_stack_trace+0x1b/0x20 kasan_slab_free+0xae/0x180 kmem_cache_free+0xb4/0x350 net_drop_ns+0x3f/0x50 cleanup_net+0x3df/0x450 process_one_work+0x419/0xbb0 worker_thread+0x92/0x850 kthread+0x192/0x1e0 ret_from_fork+0x2e/0x40 Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge timewait sockets on net namespace destruction and prevent above issue. 
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> --- net/dccp/ipv4.c | 6 ++++++ net/dccp/ipv6.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index d859a5c..da7cb16 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); } +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); +} + static struct pernet_operations dccp_v4_ops = { .init = dccp_v4_init_net, .exit = dccp_v4_exit_net, + .exit_batch = dccp_v4_exit_batch, }; static int __init dccp_v4_init(void) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index c4e879c..f3d8f92 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); } +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); +} + static struct pernet_operations dccp_v6_ops = { .init = dccp_v6_init_net, .exit = dccp_v6_exit_net, + .exit_batch = dccp_v6_exit_batch, }; static int __init dccp_v6_init(void) -- 2.10.2 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin @ 2017-02-21 11:56 ` Dmitry Vyukov 2017-02-22 6:48 ` Dmitry Vyukov 2017-02-21 13:43 ` Arnaldo Carvalho de Melo ` (2 subsequent siblings) 3 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-21 11:56 UTC (permalink / raw) To: Andrey Ryabinin Cc: Gerrit Renker, David S. Miller, dccp, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML On Tue, Feb 21, 2017 at 2:27 PM, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote: > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): > > BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 > Read of size 8 by task swapper/1/0 > Call Trace: > __asan_load8+0x54/0x90 > tw_timer_handler+0x4a/0xa0 > call_timer_fn+0x127/0x480 > expire_timers+0x1db/0x2e0 > run_timer_softirq+0x12f/0x2a0 > __do_softirq+0x105/0x5b4 > irq_exit+0xdd/0xf0 > smp_apic_timer_interrupt+0x57/0x70 > apic_timer_interrupt+0x90/0xa0 > > Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 > Allocated: > save_stack_trace+0x1b/0x20 > kasan_kmalloc+0xee/0x180 > kasan_slab_alloc+0x12/0x20 > kmem_cache_alloc+0x134/0x310 > copy_net_ns+0x8d/0x280 > create_new_namespaces+0x23f/0x340 > unshare_nsproxy_namespaces+0x75/0xf0 > SyS_unshare+0x299/0x4f0 > entry_SYSCALL_64_fastpath+0x18/0xad > Freed: > save_stack_trace+0x1b/0x20 > kasan_slab_free+0xae/0x180 > kmem_cache_free+0xb4/0x350 > net_drop_ns+0x3f/0x50 > cleanup_net+0x3df/0x450 > process_one_work+0x419/0xbb0 > worker_thread+0x92/0x850 > kthread+0x192/0x1e0 > ret_from_fork+0x2e/0x40 > > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above 
issue. Aha! Thanks for tracking this down! Queued the patch on syzkaller bots, should have the verdict in several hours. > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > --- > net/dccp/ipv4.c | 6 ++++++ > net/dccp/ipv6.c | 6 ++++++ > 2 files changed, 12 insertions(+) > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > index d859a5c..da7cb16 100644 > --- a/net/dccp/ipv4.c > +++ b/net/dccp/ipv4.c > @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); > } > > +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); > +} > + > static struct pernet_operations dccp_v4_ops = { > .init = dccp_v4_init_net, > .exit = dccp_v4_exit_net, > + .exit_batch = dccp_v4_exit_batch, > }; > > static int __init dccp_v4_init(void) > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > index c4e879c..f3d8f92 100644 > --- a/net/dccp/ipv6.c > +++ b/net/dccp/ipv6.c > @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); > } > > +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); > +} > + > static struct pernet_operations dccp_v6_ops = { > .init = dccp_v6_init_net, > .exit = dccp_v6_exit_net, > + .exit_batch = dccp_v6_exit_batch, > }; > > static int __init dccp_v6_init(void) > -- > 2.10.2 > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:56 ` Dmitry Vyukov @ 2017-02-22 6:48 ` Dmitry Vyukov 0 siblings, 0 replies; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-22 6:48 UTC (permalink / raw) To: Andrey Ryabinin Cc: Gerrit Renker, David S. Miller, dccp, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML On Tue, Feb 21, 2017 at 2:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Tue, Feb 21, 2017 at 2:27 PM, Andrey Ryabinin > <aryabinin@virtuozzo.com> wrote: >> DCCP doesn't purge timewait sockets on network namespace shutdown. >> So, after net namespace destroyed we could still have an active timer >> which will trigger use after free in tw_timer_handler(): >> >> BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 >> Read of size 8 by task swapper/1/0 >> Call Trace: >> __asan_load8+0x54/0x90 >> tw_timer_handler+0x4a/0xa0 >> call_timer_fn+0x127/0x480 >> expire_timers+0x1db/0x2e0 >> run_timer_softirq+0x12f/0x2a0 >> __do_softirq+0x105/0x5b4 >> irq_exit+0xdd/0xf0 >> smp_apic_timer_interrupt+0x57/0x70 >> apic_timer_interrupt+0x90/0xa0 >> >> Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 >> Allocated: >> save_stack_trace+0x1b/0x20 >> kasan_kmalloc+0xee/0x180 >> kasan_slab_alloc+0x12/0x20 >> kmem_cache_alloc+0x134/0x310 >> copy_net_ns+0x8d/0x280 >> create_new_namespaces+0x23f/0x340 >> unshare_nsproxy_namespaces+0x75/0xf0 >> SyS_unshare+0x299/0x4f0 >> entry_SYSCALL_64_fastpath+0x18/0xad >> Freed: >> save_stack_trace+0x1b/0x20 >> kasan_slab_free+0xae/0x180 >> kmem_cache_free+0xb4/0x350 >> net_drop_ns+0x3f/0x50 >> cleanup_net+0x3df/0x450 >> process_one_work+0x419/0xbb0 >> worker_thread+0x92/0x850 >> kthread+0x192/0x1e0 >> ret_from_fork+0x2e/0x40 >> >> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> timewait sockets on net namespace destruction and prevent above issue. > > > Aha! 
Thanks for tracking this down! > > Queued the patch on syzkaller bots, should have the verdict in several hours. I do not see the crash happening after applying the patch. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov @ 2017-02-21 13:43 ` Arnaldo Carvalho de Melo 2017-02-21 13:53 ` Eric Dumazet 2017-02-22 8:59 ` Andrey Ryabinin 2017-02-21 18:23 ` David Miller 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin 3 siblings, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2017-02-21 13:43 UTC (permalink / raw) To: Andrey Ryabinin Cc: Eric W. Biederman, Gerrit Renker, David S. Miller, dccp, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, linux-kernel Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): > > > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above issue. Please add this, to help stable kernels to pick this up Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") Cc: Eric W. Biederman <ebiederm@xmission.com> [acme@jouet linux]$ git describe b099ce2602d8 v2.6.32-rc8-1977-gb099ce2602d8 This one added the pernet operations related to network namespaces, but then the one above got missed. commit 72a2d6138224298a576bcdc33d7d0004de604856 Author: Pavel Emelyanov <xemul@openvz.org> Date: Sun Apr 13 22:29:13 2008 -0700 [NETNS][DCCPV4]: Add dummy per-net operations. 
---------------------------------- It looks ok, so please consider adding my: Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> - Arnaldo > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > --- > net/dccp/ipv4.c | 6 ++++++ > net/dccp/ipv6.c | 6 ++++++ > 2 files changed, 12 insertions(+) > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > index d859a5c..da7cb16 100644 > --- a/net/dccp/ipv4.c > +++ b/net/dccp/ipv4.c > @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); > } > > +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); > +} > + > static struct pernet_operations dccp_v4_ops = { > .init = dccp_v4_init_net, > .exit = dccp_v4_exit_net, > + .exit_batch = dccp_v4_exit_batch, > }; > > static int __init dccp_v4_init(void) > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > index c4e879c..f3d8f92 100644 > --- a/net/dccp/ipv6.c > +++ b/net/dccp/ipv6.c > @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); > } > > +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); > +} > + > static struct pernet_operations dccp_v6_ops = { > .init = dccp_v6_init_net, > .exit = dccp_v6_exit_net, > + .exit_batch = dccp_v6_exit_batch, > }; > > static int __init dccp_v6_init(void) > -- > 2.10.2 > > -- > To unsubscribe from this list: send the line "unsubscribe dccp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 13:43 ` Arnaldo Carvalho de Melo @ 2017-02-21 13:53 ` Eric Dumazet 2017-02-21 18:23 ` David Miller 2017-02-22 8:59 ` Andrey Ryabinin 1 sibling, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-02-21 13:53 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Andrey Ryabinin, Eric W. Biederman, Gerrit Renker, David S. Miller, dccp, Dmitry Vyukov, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML On Tue, Feb 21, 2017 at 5:43 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: > > DCCP doesn't purge timewait sockets on network namespace shutdown. > > So, after net namespace destroyed we could still have an active timer > > which will trigger use after free in tw_timer_handler(): > > > > > > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > > timewait sockets on net namespace destruction and prevent above issue. > > Please add this, to help stable kernels to pick this up > > Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") > Cc: Eric W. Biederman <ebiederm@xmission.com> This patch has nothing to do with this bug really. Look at commit d315492b1a6ba29da0fa2860759505ae1b2db857 ("netns : fix kernel panic in timewait socket destruction") Back in 2008, nobody spotted that DCCP was using the same infra. When can we get rid of DCCP in Linux so that the syzkaller team no longer spends time on it? Thanks. > > [acme@jouet linux]$ git describe b099ce2602d8 > v2.6.32-rc8-1977-gb099ce2602d8 > > This one added the pernet operations related to network namespaces, but > then the one above got missed. > > commit 72a2d6138224298a576bcdc33d7d0004de604856 > Author: Pavel Emelyanov <xemul@openvz.org> > Date: Sun Apr 13 22:29:13 2008 -0700 > > [NETNS][DCCPV4]: Add dummy per-net operations. 
> > ---------------------------------- > > It looks ok, so please consider adding my: > > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > - Arnaldo > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > > --- > > net/dccp/ipv4.c | 6 ++++++ > > net/dccp/ipv6.c | 6 ++++++ > > 2 files changed, 12 insertions(+) > > > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > > index d859a5c..da7cb16 100644 > > --- a/net/dccp/ipv4.c > > +++ b/net/dccp/ipv4.c > > @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) > > inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); > > } > > > > +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) > > +{ > > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); > > +} > > + > > static struct pernet_operations dccp_v4_ops = { > > .init = dccp_v4_init_net, > > .exit = dccp_v4_exit_net, > > + .exit_batch = dccp_v4_exit_batch, > > }; > > > > static int __init dccp_v4_init(void) > > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > > index c4e879c..f3d8f92 100644 > > --- a/net/dccp/ipv6.c > > +++ b/net/dccp/ipv6.c > > @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) > > inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); > > } > > > > +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) > > +{ > > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); > > +} > > + > > static struct pernet_operations dccp_v6_ops = { > > .init = dccp_v6_init_net, > > .exit = dccp_v6_exit_net, > > + .exit_batch = dccp_v6_exit_batch, > > }; > > > > static int __init dccp_v6_init(void) > > -- > > 2.10.2 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe dccp" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 13:53 ` Eric Dumazet @ 2017-02-21 18:23 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2017-02-21 18:23 UTC (permalink / raw) To: edumazet Cc: acme, aryabinin, ebiederm, gerrit, dccp, dvyukov, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: Eric Dumazet <edumazet@google.com> Date: Tue, 21 Feb 2017 05:53:13 -0800 > On Tue, Feb 21, 2017 at 5:43 AM, Arnaldo Carvalho de Melo > <acme@kernel.org> wrote: >> >> Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: >> > DCCP doesn't purge timewait sockets on network namespace shutdown. >> > So, after net namespace destroyed we could still have an active timer >> > which will trigger use after free in tw_timer_handler(): >> > >> > >> > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> > timewait sockets on net namespace destruction and prevent above issue. >> >> Please add this, to help stable kernels to pick this up >> >> Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") >> Cc: Eric W. Biederman <ebiederm@xmission.com> > > > This patch has nothing to do with this bug really. > > Look at commit d315492b1a6ba29da0fa2860759505ae1b2db857 > ("netns : fix kernel panic in timewait socket destruction") > > Back in 2008, nobody spotted that DCCP was using the same infra. So, let me get this straight, dccp is buggy because it tried as hard as possible to share and use common pieces of infrastructure instead of duplicating all of said logic? Now I've heard everything. I know it has been a pain in the rear fixing all of these dccp bugs, but removing it from the tree or even pushing it into staging is simply not an option. So we better come up with a better plan based upon reality rather than fantasy. :-) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 13:43 ` Arnaldo Carvalho de Melo 2017-02-21 13:53 ` Eric Dumazet @ 2017-02-22 8:59 ` Andrey Ryabinin 1 sibling, 0 replies; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-22 8:59 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Eric W. Biederman, Gerrit Renker, David S. Miller, dccp, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, linux-kernel On 02/21/2017 04:43 PM, Arnaldo Carvalho de Melo wrote: > Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: >> DCCP doesn't purge timewait sockets on network namespace shutdown. >> So, after net namespace destroyed we could still have an active timer >> which will trigger use after free in tw_timer_handler(): >> >> >> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> timewait sockets on net namespace destruction and prevent above issue. > > Please add this, to help stable kernels to pick this up > > Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") > Cc: Eric W. Biederman <ebiederm@xmission.com> > Fixes tag should blame commit f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH"). It introduced use of net namespace in the timer callback. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov 2017-02-21 13:43 ` Arnaldo Carvalho de Melo @ 2017-02-21 18:23 ` David Miller 2017-02-21 18:24 ` David Miller 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin 3 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2017-02-21 18:23 UTC (permalink / raw) To: aryabinin Cc: gerrit, dccp, dvyukov, edumazet, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: Andrey Ryabinin <aryabinin@virtuozzo.com> Date: Tue, 21 Feb 2017 14:27:40 +0300 > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): ... > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above issue. > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 18:23 ` David Miller @ 2017-02-21 18:24 ` David Miller 2017-02-22 9:35 ` Andrey Ryabinin 0 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2017-02-21 18:24 UTC (permalink / raw) To: aryabinin Cc: gerrit, dccp, dvyukov, edumazet, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: David Miller <davem@davemloft.net> Date: Tue, 21 Feb 2017 13:23:51 -0500 (EST) > From: Andrey Ryabinin <aryabinin@virtuozzo.com> > Date: Tue, 21 Feb 2017 14:27:40 +0300 > >> DCCP doesn't purge timewait sockets on network namespace shutdown. >> So, after net namespace destroyed we could still have an active timer >> which will trigger use after free in tw_timer_handler(): > ... >> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> timewait sockets on net namespace destruction and prevent above issue. >> >> Reported-by: Dmitry Vyukov <dvyukov@google.com> >> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > > Applied and queued up for -stable, thanks. Actually, this doesn't even compile. Please fix this up and resubmit: net/dccp/ipv4.c: In function ‘dccp_v4_exit_batch’: net/dccp/ipv4.c:1022:34: warning: passing argument 2 of ‘inet_twsk_purge’ makes integer from pointer without a cast [-Wint-conversion] inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); ^ In file included from ./include/linux/dccp.h:14:0, from net/dccp/ipv4.c:13: ./include/net/inet_timewait_sock.h:118:6: note: expected ‘int’ but argument is of type ‘struct inet_timewait_death_row *’ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family); ^ net/dccp/ipv4.c:1022:2: error: too many arguments to function ‘inet_twsk_purge’ inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); ^ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 18:24 ` David Miller @ 2017-02-22 9:35 ` Andrey Ryabinin 0 siblings, 0 replies; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-22 9:35 UTC (permalink / raw) To: David Miller Cc: gerrit, dccp, dvyukov, edumazet, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel On 02/21/2017 09:24 PM, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 21 Feb 2017 13:23:51 -0500 (EST) > >> From: Andrey Ryabinin <aryabinin@virtuozzo.com> >> Date: Tue, 21 Feb 2017 14:27:40 +0300 >> >>> DCCP doesn't purge timewait sockets on network namespace shutdown. >>> So, after net namespace destroyed we could still have an active timer >>> which will trigger use after free in tw_timer_handler(): >> ... >>> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >>> timewait sockets on net namespace destruction and prevent above issue. >>> >>> Reported-by: Dmitry Vyukov <dvyukov@google.com> >>> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> >> >> Applied and queued up for -stable, thanks. > > Actually, this doesn't even compile. Please fix this up and resubmit: > Right, I tested this on top of the Linus' tree. Rebased on -next now. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin ` (2 preceding siblings ...) 2017-02-21 18:23 ` David Miller @ 2017-02-22 9:35 ` Andrey Ryabinin 2017-02-22 21:15 ` David Miller 3 siblings, 1 reply; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-22 9:35 UTC (permalink / raw) To: Gerrit Renker, David S. Miller Cc: Andrey Ryabinin, Eric Dumazet, Arnaldo Carvalho de Melo, dccp, Dmitry Vyukov, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML DCCP doesn't purge timewait sockets on network namespace shutdown. So, after net namespace destroyed we could still have an active timer which will trigger use after free in tw_timer_handler(): BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 Read of size 8 by task swapper/1/0 Call Trace: __asan_load8+0x54/0x90 tw_timer_handler+0x4a/0xa0 call_timer_fn+0x127/0x480 expire_timers+0x1db/0x2e0 run_timer_softirq+0x12f/0x2a0 __do_softirq+0x105/0x5b4 irq_exit+0xdd/0xf0 smp_apic_timer_interrupt+0x57/0x70 apic_timer_interrupt+0x90/0xa0 Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 Allocated: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xee/0x180 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x134/0x310 copy_net_ns+0x8d/0x280 create_new_namespaces+0x23f/0x340 unshare_nsproxy_namespaces+0x75/0xf0 SyS_unshare+0x299/0x4f0 entry_SYSCALL_64_fastpath+0x18/0xad Freed: save_stack_trace+0x1b/0x20 kasan_slab_free+0xae/0x180 kmem_cache_free+0xb4/0x350 net_drop_ns+0x3f/0x50 cleanup_net+0x3df/0x450 process_one_work+0x419/0xbb0 worker_thread+0x92/0x850 kthread+0x192/0x1e0 ret_from_fork+0x2e/0x40 Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge timewait sockets on net namespace destruction and prevent above issue. 
Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- Changes since v1: - Rebased on top the -next - Added Fixes/Acked tags. net/dccp/ipv4.c | 6 ++++++ net/dccp/ipv6.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index b043ec8..409d0cf 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1017,9 +1017,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); } +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, AF_INET); +} + static struct pernet_operations dccp_v4_ops = { .init = dccp_v4_init_net, .exit = dccp_v4_exit_net, + .exit_batch = dccp_v4_exit_batch, }; static int __init dccp_v4_init(void) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index cef60a4..233b573 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1075,9 +1075,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); } +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, AF_INET6); +} + static struct pernet_operations dccp_v6_ops = { .init = dccp_v6_init_net, .exit = dccp_v6_exit_net, + .exit_batch = dccp_v6_exit_batch, }; static int __init dccp_v6_init(void) -- 2.10.2 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2] net/dccp: fix use after free in tw_timer_handler() 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin @ 2017-02-22 21:15 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2017-02-22 21:15 UTC (permalink / raw) To: aryabinin Cc: gerrit, edumazet, acme, dccp, dvyukov, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: Andrey Ryabinin <aryabinin@virtuozzo.com> Date: Wed, 22 Feb 2017 12:35:27 +0300 > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): ... > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above issue. > > Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 17:36 ` Dmitry Vyukov 2017-02-08 17:58 ` Eric Dumazet @ 2017-02-17 18:51 ` Cong Wang 2017-02-17 20:36 ` Dmitry Vyukov 1 sibling, 1 reply; 28+ messages in thread From: Cong Wang @ 2017-02-17 18:51 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, Feb 8, 2017 at 9:36 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: >> On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >>>> >>>> This code was changed a long time ago : >>>> >>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>> >>>> So I suspect a recent patch broke the logic. >>>> >>>> You might start a bisection : >>>> >>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>> >>> >>> It happens with too low rate for bisecting (few times per day). I >>> could add some additional checks into code, but I don't know what >>> checks could be useful. >> >> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >> we are able to help. > > > There are also chances that the problem is older. > > Looking at the code, this part of inet_twsk_purge looks fishy: > > 285 if (unlikely((tw->tw_family != family) || > 286 atomic_read(&twsk_net(tw)->count))) { > > It uses net->count == 0 check to find the right sockets. But what if > there are several nets with count == 0 in flight, can't there be > several inet_twsk_purge calls running concurrently freeing each other > sockets? If so it looks like inet_twsk_purge can call > inet_twsk_deschedule_put twice for a socket. Namely, two calls for > different nets discover the socket, check that net->count==0 and both > call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge > net that it needs to purge? I don't think this could happen, because cleanup_net() is called in a work struct, and this work can't be scheduled twice, so there should not be any parallel cleanup_net(). Also, inet_twsk_deschedule_put() already waits for the flying timer, net->count==0 at this point and all sockets in this netns are already gone, so I don't know how a timer could be still fired after this. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang @ 2017-02-17 20:36 ` Dmitry Vyukov 2017-02-17 22:30 ` Cong Wang 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-17 20:36 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Fri, Feb 17, 2017 at 7:51 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>> >>>>> This code was changed a long time ago : >>>>> >>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>> >>>>> So I suspect a recent patch broke the logic. >>>>> >>>>> You might start a bisection : >>>>> >>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>> >>>> >>>> It happens with too low rate for bisecting (few times per day). I >>>> could add some additional checks into code, but I don't know what >>>> checks could be useful. >>> >>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>> we are able to help. >> >> >> There are also chances that the problem is older. >> >> Looking at the code, this part of inet_twsk_purge looks fishy: >> >> 285 if (unlikely((tw->tw_family != family) || >> 286 atomic_read(&twsk_net(tw)->count))) { >> >> It uses net->count == 0 check to find the right sockets. But what if >> there are several nets with count == 0 in flight, can't there be >> several inet_twsk_purge calls running concurrently freeing each other >> sockets? If so it looks like inet_twsk_purge can call >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >> different nets discover the socket, check that net->count==0 and both >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >> net that it needs to purge? 
> > I don't think this could happen, because cleanup_net() is called in a > work struct, and this work can't be scheduled twice, so there should > not be any parallel cleanup_net(). > > Also, inet_twsk_deschedule_put() already waits for the flying timer, > net->count==0 at this point and all sockets in this netns are already > gone, so I don't know how a timer could be still fired after this. One thing that I noticed is that use-after-free always happens in tw_timer_handler, but never in timer code. This suggests that: 1. Whoever frees the struct waits for the timer mutex unlock. 2. Free happens almost at the same time as timer fires (otherwise the timer would probably be cancelled). That could happen if there is a race in timer code during cancellation or rearming. I understand that it's heavily stressed code, but who knows... I still wonder if twsk_net(tw)->count==0 is the right condition. This net may not yet have been scheduled for destruction, but another task could pick it up and start destroying... Cong, do you have any ideas as to what debugging checks I could add to help track this down? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-17 20:36 ` Dmitry Vyukov @ 2017-02-17 22:30 ` Cong Wang 2017-02-21 9:46 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Cong Wang @ 2017-02-17 22:30 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Fri, Feb 17, 2017 at 12:36 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Fri, Feb 17, 2017 at 7:51 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>>> >>>>>> This code was changed a long time ago : >>>>>> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>>> >>>>>> So I suspect a recent patch broke the logic. >>>>>> >>>>>> You might start a bisection : >>>>>> >>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>>> >>>>> >>>>> It happens with too low rate for bisecting (few times per day). I >>>>> could add some additional checks into code, but I don't know what >>>>> checks could be useful. >>>> >>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>>> we are able to help. >>> >>> >>> There are also chances that the problem is older. >>> >>> Looking at the code, this part of inet_twsk_purge looks fishy: >>> >>> 285 if (unlikely((tw->tw_family != family) || >>> 286 atomic_read(&twsk_net(tw)->count))) { >>> >>> It uses net->count == 0 check to find the right sockets. But what if >>> there are several nets with count == 0 in flight, can't there be >>> several inet_twsk_purge calls running concurrently freeing each other >>> sockets? If so it looks like inet_twsk_purge can call >>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>> different nets discover the socket, check that net->count==0 and both >>> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >>> net that it needs to purge? 
>> >> I don't think this could happen, because cleanup_net() is called in a >> work struct, and this work can't be scheduled twice, so there should >> not be any parallel cleanup_net(). >> >> Also, inet_twsk_deschedule_put() already waits for the flying timer, >> net->count==0 at this point and all sockets in this netns are already >> gone, so I don't know how a timer could be still fired after this. > > > One thing that I noticed is that use-after-free always happens in > tw_timer_handler, but never in timer code. This suggests that: > 1. Whoever frees the struct waits for the timer mutex unlock. > 2. Free happens almost at the same time as timer fires (otherwise the > timer would probably be cancelled). > > That could happen if there is a race in timer code during cancellation > or rearming. I understand that it's heavily stressed code, but who > knows... Good point! I think this could happen since the timer is deactivated before unlinking the tw sock, so another CPU could find it and rearm the timer during such a window? > > I still wonder if twsk_net(tw)->count==0 is the right condition. This > net may not yet have been scheduled for destruction, but another task > could pick it up and start destroying... Good question. net_exit_list is just a snapshot of the cleanup_list, so it is true that inet_twsk_purge() could purge more netns'es than in net_exit_list, but I don't see any problem with this: the work later can't find the same twsk again since it is unlinked from the hashtable. Do you see otherwise? > > Cong, do you have any ideas as to what debugging checks I could add to > help track this down? I don't have any theory for the cause of this bug. Your point above actually makes sense to me. Maybe you can add some code in between the following code: if (del_timer_sync(&tw->tw_timer)) inet_twsk_kill(tw); to verify whether the timer is rearmed or not. ^ permalink raw reply [flat|nested] 28+ messages in thread
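[Editor's sketch of the debug check suggested above — a hypothetical, untested addition to inet_twsk_deschedule_put() as it looked in a v4.9-era tree; the WARN_ON_ONCE placement is an assumption about what would be informative, not an actual posted patch:]

```
/* Sketch only: inet_twsk_deschedule_put() with a debug check inserted
 * between del_timer_sync() and inet_twsk_kill(), per the suggestion. */
void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
	if (del_timer_sync(&tw->tw_timer)) {
		inet_twsk_kill(tw);
	} else {
		/* del_timer_sync() returned 0, i.e. it saw no pending
		 * timer.  If the timer shows up as pending again here,
		 * something rearmed it while the tw sock was being torn
		 * down -- exactly the window discussed above. */
		WARN_ON_ONCE(timer_pending(&tw->tw_timer));
	}
	inet_twsk_put(tw);
}
```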
* Re: net: use-after-free in tw_timer_handler 2017-02-17 22:30 ` Cong Wang @ 2017-02-21 9:46 ` Dmitry Vyukov 2017-02-21 10:40 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-21 9:46 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Sat, Feb 18, 2017 at 1:30 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>>>> >>>>>>> This code was changed a long time ago : >>>>>>> >>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>>>> >>>>>>> So I suspect a recent patch broke the logic. >>>>>>> >>>>>>> You might start a bisection : >>>>>>> >>>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>>>> >>>>>> >>>>>> It happens with too low rate for bisecting (few times per day). I >>>>>> could add some additional checks into code, but I don't know what >>>>>> checks could be useful. >>>>> >>>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>>>> we are able to help. >>>> >>>> >>>> There are also chances that the problem is older. >>>> >>>> Looking at the code, this part of inet_twsk_purge looks fishy: >>>> >>>> 285 if (unlikely((tw->tw_family != family) || >>>> 286 atomic_read(&twsk_net(tw)->count))) { >>>> >>>> It uses net->count == 0 check to find the right sockets. But what if >>>> there are several nets with count == 0 in flight, can't there be >>>> several inet_twsk_purge calls running concurrently freeing each other >>>> sockets? If so it looks like inet_twsk_purge can call >>>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>>> different nets discover the socket, check that net->count==0 and both >>>> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >>>> net that it needs to purge? 
>>> >>> I don't think this could happen, because cleanup_net() is called in a >>> work struct, and this work can't be scheduled twice, so there should >>> not be any parallel cleanup_net(). >>> >>> Also, inet_twsk_deschedule_put() already waits for the flying timer, >>> net->count==0 at this point and all sockets in this netns are already >>> gone, so I don't know how a timer could be still fired after this. >> >> >> One thing that I noticed is that use-after-free always happens in >> tw_timer_handler, but never in timer code. This suggests that: >> 1. Whoever frees the struct waits for the timer mutex unlock. >> 2. Free happens almost at the same time as timer fires (otherwise the >> timer would probably be cancelled). >> >> That could happen if there is a race in timer code during cancellation >> or rearming. I understand that it's heavily stressed code, but who >> knows... > > Good point! I think this could happen since timer is deactivated before > unlinking the tw sock, so another CPU could find it and rearm the timer > during such window? > >> >> I still wonder if twsk_net(tw)->count==0 is the right condition. This >> net may not yet have been scheduled for destruction, but another task >> could pick it up and start destroying... > > Good question. net_exit_list is just a snapshot of the cleanup_list, so > it is true that inet_twsk_purge() could purge more netns'es than in > net_exit_list, but I don't see any problem with this, the work later can't > find the same twsck again since it is unlinked from hashtable. > Do you see otherwise? No, I don't see otherwise. But it's just something suspicious in the context of an elusive race that happens under load. >> Cong, do you have any ideas as to what debugging checks I could add to >> help track this down? > > I don't have any theory for the cause of this bug. Your point above actually > makes sense for me. 
Maybe you can add some code in between the following > code: > > if (del_timer_sync(&tw->tw_timer)) > inet_twsk_kill(tw); > > to verify is the timer is rearmed or not. Now looking at the reports more closely, I think that debug info is off-by-one, and the use-after-free happens on the net object on this line: __NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED). And this report is actually all correct: BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff88018063a460 Read of size 8 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.0-rc8+ #65 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 call_timer_fn+0x241/0x820 kernel/time/timer.c:1308 expire_timers kernel/time/timer.c:1348 [inline] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:658 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 Object at ffff88018063a240, in cache net_namespace size: 6784 Allocated: PID = 3270 kmem_cache_zalloc include/linux/slab.h:626 [inline] net_alloc net/core/net_namespace.c:339 [inline] copy_net_ns+0x196/0x530 net/core/net_namespace.c:379 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164 copy_process.part.42+0x20c6/0x5fd0 kernel/fork.c:1664 copy_process kernel/fork.c:1486 [inline] _do_fork+0x200/0xff0 kernel/fork.c:1942 SYSC_clone kernel/fork.c:2052 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2046 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280 
return_from_SYSCALL_64+0x0/0x7a Freed: PID = 8056 kmem_cache_free+0x71/0x240 mm/slab.c:3762 net_free+0xd7/0x110 net/core/net_namespace.c:355 net_drop_ns+0x31/0x40 net/core/net_namespace.c:362 cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:479 process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098 worker_thread+0x223/0x1990 kernel/workqueue.c:2232 kthread+0x326/0x3f0 kernel/kthread.c:227 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 The read is of size 8, but read of tw->tw_kill should be 4. We could also double check offset within the object and check if it matches what __NET_INC_STATS(twsk_net(tw)) reads. So the net is somehow freed while the timer is active. ^ permalink raw reply [flat|nested] 28+ messages in thread
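[For reference, the handler named in the report, reproduced here from memory from a v4.9-era net/ipv4/inet_timewait_sock.c — verify against your exact sources; the annotation is editorial:]

```
static void tw_timer_handler(unsigned long data)
{
	struct inet_timewait_sock *tw = (struct inet_timewait_sock *)data;

	/* Line 149: both branches expand __NET_INC_STATS(), which loads
	 * the per-net SNMP statistics pointer out of *twsk_net(tw) -- an
	 * 8-byte read inside struct net, matching the KASAN report above,
	 * whereas a read of the tw->tw_kill flag itself would be 4 bytes. */
	if (tw->tw_kill)
		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED);
	else
		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITED);
	inet_twsk_kill(tw);
}
```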
* Re: net: use-after-free in tw_timer_handler 2017-02-21 9:46 ` Dmitry Vyukov @ 2017-02-21 10:40 ` Dmitry Vyukov 0 siblings, 0 replies; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-21 10:40 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Tue, Feb 21, 2017 at 12:46 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Sat, Feb 18, 2017 at 1:30 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>>>>> >>>>>>>> This code was changed a long time ago : >>>>>>>> >>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>>>>> >>>>>>>> So I suspect a recent patch broke the logic. >>>>>>>> >>>>>>>> You might start a bisection : >>>>>>>> >>>>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>>>>> >>>>>>> >>>>>>> It happens with too low rate for bisecting (few times per day). I >>>>>>> could add some additional checks into code, but I don't know what >>>>>>> checks could be useful. >>>>>> >>>>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>>>>> we are able to help. >>>>> >>>>> >>>>> There are also chances that the problem is older. >>>>> >>>>> Looking at the code, this part of inet_twsk_purge looks fishy: >>>>> >>>>> 285 if (unlikely((tw->tw_family != family) || >>>>> 286 atomic_read(&twsk_net(tw)->count))) { >>>>> >>>>> It uses net->count == 0 check to find the right sockets. But what if >>>>> there are several nets with count == 0 in flight, can't there be >>>>> several inet_twsk_purge calls running concurrently freeing each other >>>>> sockets? If so it looks like inet_twsk_purge can call >>>>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>>>> different nets discover the socket, check that net->count==0 and both >>>>> call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge >>>>> net that it needs to purge? >>>> >>>> I don't think this could happen, because cleanup_net() is called in a >>>> work struct, and this work can't be scheduled twice, so there should >>>> not be any parallel cleanup_net(). >>>> >>>> Also, inet_twsk_deschedule_put() already waits for the flying timer, >>>> net->count==0 at this point and all sockets in this netns are already >>>> gone, so I don't know how a timer could be still fired after this. >>> >>> >>> One thing that I noticed is that use-after-free always happens in >>> tw_timer_handler, but never in timer code. This suggests that: >>> 1. Whoever frees the struct waits for the timer mutex unlock. >>> 2. Free happens almost at the same time as timer fires (otherwise the >>> timer would probably be cancelled). >>> >>> That could happen if there is a race in timer code during cancellation >>> or rearming. I understand that it's heavily stressed code, but who >>> knows... >> >> Good point! I think this could happen since timer is deactivated before >> unlinking the tw sock, so another CPU could find it and rearm the timer >> during such window? >> >>> >>> I still wonder if twsk_net(tw)->count==0 is the right condition. This >>> net may not yet have been scheduled for destruction, but another task >>> could pick it up and start destroying... >> >> Good question. net_exit_list is just a snapshot of the cleanup_list, so >> it is true that inet_twsk_purge() could purge more netns'es than in >> net_exit_list, but I don't see any problem with this, the work later can't >> find the same twsck again since it is unlinked from hashtable. >> Do you see otherwise? > > No, I don't see otherwise. But it's just something suspicious in the > context of an elusive race that happens under load. > >>> Cong, do you have any ideas as to what debugging checks I could add to >>> help track this down? >> >> I don't have any theory for the cause of this bug. 
Your point above actually >> makes sense for me. Maybe you can add some code in between the following >> code: >> >> if (del_timer_sync(&tw->tw_timer)) >> inet_twsk_kill(tw); >> >> to verify is the timer is rearmed or not. > > Now looking at the reports more closely, I think that debug info is > off-by-one, and the use-after-free happens on the net object on this > line: __NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED). > > And this report is actually all correct: > > BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 > net/ipv4/inet_timewait_sock.c:149 at addr ffff88018063a460 > Read of size 8 by task swapper/1/0 > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.0-rc8+ #65 > Hardware name: Google Google Compute Engine/Google Compute Engine, > BIOS Google 01/01/2011 > Call Trace: > <IRQ> > __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 > tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 > call_timer_fn+0x241/0x820 kernel/time/timer.c:1308 > expire_timers kernel/time/timer.c:1348 [inline] > __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642 > run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655 > __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 > invoke_softirq kernel/softirq.c:364 [inline] > irq_exit+0x1cc/0x200 kernel/softirq.c:405 > exiting_irq arch/x86/include/asm/apic.h:658 [inline] > smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961 > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 > > Object at ffff88018063a240, in cache net_namespace size: 6784 > > Allocated: > PID = 3270 > kmem_cache_zalloc include/linux/slab.h:626 [inline] > net_alloc net/core/net_namespace.c:339 [inline] > copy_net_ns+0x196/0x530 net/core/net_namespace.c:379 > create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 > copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164 > copy_process.part.42+0x20c6/0x5fd0 kernel/fork.c:1664 > copy_process kernel/fork.c:1486 [inline] > _do_fork+0x200/0xff0 kernel/fork.c:1942 > SYSC_clone 
kernel/fork.c:2052 [inline] > SyS_clone+0x37/0x50 kernel/fork.c:2046 > do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280 > return_from_SYSCALL_64+0x0/0x7a > > Freed: > PID = 8056 > kmem_cache_free+0x71/0x240 mm/slab.c:3762 > net_free+0xd7/0x110 net/core/net_namespace.c:355 > net_drop_ns+0x31/0x40 net/core/net_namespace.c:362 > cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:479 > process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098 > worker_thread+0x223/0x1990 kernel/workqueue.c:2232 > kthread+0x326/0x3f0 kernel/kthread.c:227 > ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > > The read is of size 8, but read of tw->tw_kill should be 4. > We could also double check offset within the object and check if it > matches what __NET_INC_STATS(twsk_net(tw)) reads. > > So the net is somehow freed while the timer is active. I suspect a race between inet_twsk_deschedule_put and inet_twsk_reschedule, but I don't see any bugs. Can inet_twsk_deschedule_put run concurrently with inet_twsk_schedule (not the reschedule)? That might explain the bug, but that's probably not possible, right? Staring at the code, this part looks wrong: 960 static inline int 961 __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only) 962 { ... 975 if (timer_pending(timer)) { 976 if (timer->expires == expires) 977 return 1; ... 985 base = lock_timer_base(timer, &flags); 987 clk = base->clk; 988 idx = calc_wheel_index(expires, clk); 995 if (idx == timer_get_idx(timer)) { 996 timer->expires = expires; 997 ret = 1; 998 goto out_unlock; 999 } We check timer_pending before taking the lock, but do not re-check it after taking the lock. So we can merely update timer->expires for a non-pending timer and fail to requeue it. That's fine for pending_only (as the timer has already fired), but not for !pending_only. Am I missing something? But it still does not explain the original bug. ^ permalink raw reply [flat|nested] 28+ messages in thread
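[Editor's note: the suspected window can be modeled in userspace. A minimal sketch under stated assumptions — all names here (toy_timer, toy_mod_timer, toy_wheel_index) are hypothetical stand-ins, not kernel APIs:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the suspected __mod_timer() flaw: the pending check is
 * done before the base "lock" and never repeated afterwards, so a timer
 * that fires in between only gets a new expires value and is never
 * requeued onto the wheel. */
struct toy_timer {
	bool pending;          /* still linked into a wheel bucket? */
	unsigned long expires;
	int idx;               /* wheel bucket the timer was queued in */
};

/* Stand-in for calc_wheel_index(): 8 ticks per bucket. */
static int toy_wheel_index(unsigned long expires)
{
	return (int)(expires / 8);
}

/* Models the rearm path: returns 1 when the timer is treated as pending. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	if (t->pending && t->expires == expires)
		return 1;               /* unlocked fast path (line 975) */

	/* ...the real code takes the timer base lock here... */

	if (toy_wheel_index(expires) == t->idx) {
		/* Suspicious path (lines 995-998): expires is updated even
		 * if the timer stopped being pending after the check above,
		 * so it is never put back on the wheel. */
		t->expires = expires;
		return 1;
	}
	t->pending = true;              /* requeue path */
	t->idx = toy_wheel_index(expires);
	t->expires = expires;
	return 0;
}
```

If the timer fires on another CPU after the pending check but before the lock (modeled by clearing `pending` between the two steps), rearming to an expiry in the same bucket returns 1 yet leaves the timer queued nowhere — the "updated but not requeued" scenario described in the message.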
end of thread, other threads:[~2017-02-22 21:15 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-23 10:19 net: use-after-free in tw_timer_handler Dmitry Vyukov 2017-01-23 10:23 ` Dmitry Vyukov 2017-01-24 14:28 ` Eric Dumazet 2017-01-24 15:06 ` Dmitry Vyukov 2017-01-24 15:52 ` Eric Dumazet 2017-02-08 17:36 ` Dmitry Vyukov 2017-02-08 17:58 ` Eric Dumazet 2017-02-08 18:55 ` Dmitry Vyukov 2017-02-08 19:17 ` Eric Dumazet 2017-02-08 19:32 ` Dmitry Vyukov 2017-02-14 19:38 ` Dmitry Vyukov 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov 2017-02-22 6:48 ` Dmitry Vyukov 2017-02-21 13:43 ` Arnaldo Carvalho de Melo 2017-02-21 13:53 ` Eric Dumazet 2017-02-21 18:23 ` David Miller 2017-02-22 8:59 ` Andrey Ryabinin 2017-02-21 18:23 ` David Miller 2017-02-21 18:24 ` David Miller 2017-02-22 9:35 ` Andrey Ryabinin 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin 2017-02-22 21:15 ` David Miller 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang 2017-02-17 20:36 ` Dmitry Vyukov 2017-02-17 22:30 ` Cong Wang 2017-02-21 9:46 ` Dmitry Vyukov 2017-02-21 10:40 ` Dmitry Vyukov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).