* net: use-after-free in tw_timer_handler
@ 2017-01-23 10:19 Dmitry Vyukov
  2017-01-23 10:23 ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread

From: Dmitry Vyukov @ 2017-01-23 10:19 UTC
To: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet
Cc: syzkaller

Hello,

While running the syzkaller fuzzer I started seeing use-after-frees in tw_timer_handler. It happens with very low frequency; so far I've seen 22 of them. But all the reports look consistent, so I would assume it is real and just requires a very tricky race to happen. I started seeing it around Jan 17; however, I had not updated kernels for some time before that, so the issue was potentially introduced somewhat earlier. Or maybe the fuzzer just figured out how to trigger it, and the bug is actually old. I am seeing it on all of torvalds/linux-next/mmotm; some commits, if it matters: 7a308bb3016f57e5be11a677d15b821536419d36, 5cf7a0f3442b2312326c39f571d637669a478235, c497f8d17246720afe680ea1a8fa6e48e75af852.

The majority of reports point to net_drop_ns as the offending free, but that may be a red herring: since the access happens in a timer, it can happen long after the free, and the memory could have been reused. I've also seen a few reports where the access in tw_timer_handler is flagged as out-of-bounds on task_struct and on struct filename.
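The class of bug described above — a still-armed timer whose callback reads through a pointer into memory whose owner has already been freed — can be sketched in a minimal user-space model. All names here are hypothetical illustrations of the pattern, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the hazard: an armed "timer" keeps a raw
 * pointer to state owned by another object. Once that object is
 * freed while the timer is still armed, the pointer dangles, and a
 * later callback invocation is exactly the kind of use-after-free
 * read KASAN flags in tw_timer_handler. */
struct owner_state {
    long counter;              /* what the callback dereferences */
};

struct demo_timer {
    int armed;
    struct owner_state *state; /* dangles if owner freed while armed */
};

/* Callback body: safe only while the owner is still alive. */
static long demo_timer_fire(struct demo_timer *t)
{
    return t->state->counter;
}

/* Correct teardown order: disarm first, then free the state. */
static void demo_teardown(struct demo_timer *t)
{
    t->armed = 0;              /* models del_timer_sync() */
    free(t->state);
    t->state = NULL;           /* make a late fire fail loudly */
}
```

The point of the sketch is the ordering constraint: the disarm must complete before the free, and nothing may re-arm in between; the reports below are consistent with that constraint being violated somewhere.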
BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cb58c398
Read of size 8 by task syz-executor0/24691
CPU: 0 PID: 24691 Comm: syz-executor0 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801dc007328 ffffffff8234530f ffffffff00000000 1ffff1003b800df8
 ffffed003b800df0 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
 ffff8801d8ad8f60 ffff8801d8ad8f68 ffff8801d8ad8740 000000000000002e
Call Trace:
 [<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 [<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
 [<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
 [<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
 [<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
 [<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
 [<ffffffff843723ee>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
 [<ffffffff843723ee>] smp_trace_apic_timer_interrupt+0x13e/0x6a8 arch/x86/kernel/apic/apic.c:981
 [<ffffffff843713dc>] trace_apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:709
 <EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? arch_local_irq_enable arch/x86/include/asm/paravirt.h:777 [inline]
 <EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:170 [inline]
 <EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? _raw_spin_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:199
 [<ffffffff814cbff2>] finish_lock_switch kernel/sched/sched.h:1157 [inline]
 [<ffffffff814cbff2>] finish_task_switch+0x1c2/0x710 kernel/sched/core.c:2769
 [<ffffffff84356654>] context_switch kernel/sched/core.c:2902 [inline]
 [<ffffffff84356654>] __schedule+0x724/0x1e90 kernel/sched/core.c:3402
 [<ffffffff84357ec8>] schedule+0x108/0x440 kernel/sched/core.c:3457
 [<ffffffff8100790f>] exit_to_usermode_loop+0x20f/0x2a0 arch/x86/entry/common.c:149
 [<ffffffff81009413>] prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 [<ffffffff81009413>] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 [<ffffffff8436fa22>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Object at ffff8801cb58c1c0, in cache net_namespace size: 6656
Allocated:
PID = 3183
[ 2916.342108] [<ffffffff819dcd92>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
[ 2916.349322] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 2916.356776] [<ffffffff83549a86>] kmem_cache_zalloc include/linux/slab.h:626 [inline]
[ 2916.356776] [<ffffffff83549a86>] net_alloc net/core/net_namespace.c:339 [inline]
[ 2916.356776] [<ffffffff83549a86>] copy_net_ns+0x196/0x480 net/core/net_namespace.c:379
[ 2916.363783] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
[ 2916.371605] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
[ 2916.379042] [<ffffffff814197f1>] copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 2916.387013] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 2916.387013] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 2916.393730] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 2916.393730] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 2916.400376] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[ 2916.407563] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 15107
[ 2916.441170] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 2916.441170] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 2916.448408] [<ffffffff83548e3e>] net_free net/core/net_namespace.c:355 [inline]
[ 2916.448408] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 net/core/net_namespace.c:362
[ 2916.455370] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:472
[ 2916.462331] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[ 2916.469877] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[ 2916.477155] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 2916.483831] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 ffff8801cb58c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cb58c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801cb58c380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8801cb58c400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cb58c480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cd4ec298
Read of size 8 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801dc107468 ffffffff8234530f ffffffff00000001 1ffff1003b820e20
 ffffed003b820e18 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
 1ffff1003b820e17 ffff8801daf0e2c0 0000000041b58ab3 ffffffff84af4170
Call Trace:
 [<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 [<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
 [<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
 [<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
 [<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
 [<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
 [<ffffffff8437228b>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
 [<ffffffff8437228b>] smp_apic_timer_interrupt+0x7b/0xa0 arch/x86/kernel/apic/apic.c:960
 [<ffffffff8437133c>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:709
 <EOI> [ 1412.821824] [<ffffffff8436dbb6>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
 [<ffffffff8436d08f>] arch_safe_halt arch/x86/include/asm/paravirt.h:103 [inline]
 [<ffffffff8436d08f>] default_idle+0xbf/0x440 arch/x86/kernel/process.c:308
 [<ffffffff8128a5ca>] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299
 [<ffffffff8436e0d6>] default_idle_call+0x36/0x90 kernel/sched/idle.c:96
 [<ffffffff815549a7>] cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 [<ffffffff815549a7>] cpu_idle_loop kernel/sched/idle.c:247 [inline]
 [<ffffffff815549a7>] cpu_startup_entry+0x327/0x4b0 kernel/sched/idle.c:302
 [<ffffffff812e47ac>] start_secondary+0x36c/0x460 arch/x86/kernel/smpboot.c:263
Object at ffff8801cd4ec0c0, in cache net_namespace size: 6656
Allocated:
PID = 3131
[ 1412.940699] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 1412.948084] [<ffffffff83549a86>] kmem_cache_zalloc include/linux/slab.h:626 [inline]
[ 1412.948084] [<ffffffff83549a86>] net_alloc net/core/net_namespace.c:339 [inline]
[ 1412.948084] [<ffffffff83549a86>] copy_net_ns+0x196/0x480 net/core/net_namespace.c:379
[ 1412.955019] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
[ 1412.962817] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
[ 1412.970094] [<ffffffff814197f1>] copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 1412.978004] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 1412.978004] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 1412.984677] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 1412.984677] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 1412.991276] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[ 1412.998394] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 9846
[ 1413.031603] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 1413.031603] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 1413.038796] [<ffffffff83548e3e>] net_free net/core/net_namespace.c:355 [inline]
[ 1413.038796] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 net/core/net_namespace.c:362
[ 1413.045734] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:472
[ 1413.052667] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[ 1413.060120] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[ 1413.067357] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 1413.073944] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 ffff8801cd4ec180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cd4ec200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801cd4ec280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8801cd4ec300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801cd4ec380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801b7b50358
Read of size 8 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801dc007468 ffffffff8234530f ffffffff00000000 1ffff1003b800e20
 ffffed003b800e18 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
 ffffffff84e2bba0 ffffffff84e2bba8 ffffffff84e2b380 000000000000002e
Call Trace:
 [<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 [<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
 [<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
 [<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
 [<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
 [<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
 [<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
 [<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
 [<ffffffff8437228b>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
 [<ffffffff8437228b>] smp_apic_timer_interrupt+0x7b/0xa0 arch/x86/kernel/apic/apic.c:960
 [<ffffffff8437133c>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:709
 <EOI> [ 1965.936792] [<ffffffff8436dbb6>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
 [<ffffffff8436d08f>] arch_safe_halt arch/x86/include/asm/paravirt.h:103 [inline]
 [<ffffffff8436d08f>] default_idle+0xbf/0x440 arch/x86/kernel/process.c:308
 [<ffffffff8128a5ca>] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299
 [<ffffffff8436e0d6>] default_idle_call+0x36/0x90 kernel/sched/idle.c:96
 [<ffffffff815549a7>] cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 [<ffffffff815549a7>] cpu_idle_loop kernel/sched/idle.c:247 [inline]
 [<ffffffff815549a7>] cpu_startup_entry+0x327/0x4b0 kernel/sched/idle.c:302
 [<ffffffff8434f05d>] rest_init+0x18d/0x1a0 init/main.c:408
 [<ffffffff85481b16>] start_kernel+0x7a0/0x7d2 init/main.c:660
 [<ffffffff854802e6>] x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:195
 [<ffffffff85480424>] x86_64_start_kernel+0x13c/0x149 arch/x86/kernel/head64.c:176
Object at ffff8801b7b50180, in cache net_namespace size: 6656
Allocated:
PID = 3169
[ 1966.129951] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 1966.137357] [<ffffffff83549a86>] kmem_cache_zalloc include/linux/slab.h:626 [inline]
[ 1966.137357] [<ffffffff83549a86>] net_alloc net/core/net_namespace.c:339 [inline]
[ 1966.137357] [<ffffffff83549a86>] copy_net_ns+0x196/0x480 net/core/net_namespace.c:379
[ 1966.144350] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
[ 1966.152254] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
[ 1966.159567] [<ffffffff814197f1>] copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 1966.167484] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 1966.167484] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 1966.174207] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 1966.174207] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 1966.180832] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[ 1966.187973] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 8938
[ 1966.221347] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 1966.221347] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 1966.228568] [<ffffffff83548e3e>] net_free net/core/net_namespace.c:355 [inline]
[ 1966.228568] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 net/core/net_namespace.c:362
[ 1966.235564] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:472
[ 1966.242517] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[ 1966.249995] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[ 1966.257258] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 1966.263879] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 ffff8801b7b50200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b7b50280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801b7b50300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8801b7b50380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b7b50400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: slab-out-of-bounds in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff8801c98f43a0
Read of size 8 by task syz-executor8/3423
CPU: 0 PID: 3423 Comm: syz-executor8 Not tainted 4.10.0-rc5 #19
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 kasan_object_err+0x1c/0x70 mm/kasan/report.c:161
 print_address_description mm/kasan/report.c:199 [inline]
 kasan_report_error+0x1d1/0x4d0 mm/kasan/report.c:288
 kasan_report mm/kasan/report.c:308 [inline]
 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
 tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
 call_timer_fn+0x241/0x820 kernel/time/timer.c:1308
 expire_timers kernel/time/timer.c:1348 [inline]
 __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642
 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655
 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:658 [inline]
 smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
RIP: 0010:arch_local_irq_save arch/x86/include/asm/paravirt.h:784 [inline]
RIP: 0010:lock_is_held_type+0x124/0x310 kernel/locking/lockdep.c:3787
RSP: 0018:ffff8801c946f558 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000286 RBX: 1ffff1003928deac RCX: 1ffff1003928deb0
RDX: 1ffffffff0a18984 RSI: 00000000ffffffff RDI: ffffffff850c4c20
RBP: ffff8801c946f6a8 R08: 0000000000000002 R09: 0000000000000001
R10: 000000000000000a R11: 0000000000000000 R12: ffff8801c946f680
R13: ffff8801c9492640 R14: ffffffff85130ec0 R15: 0000000000000bff
 </IRQ>
 lock_is_held include/linux/lockdep.h:348 [inline]
 ___might_sleep+0x5b3/0x650 kernel/sched/core.c:7748
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 cache_alloc_debugcheck_before mm/slab.c:3071 [inline]
 slab_alloc mm/slab.c:3386 [inline]
 kmem_cache_alloc+0x273/0x680 mm/slab.c:3558
 shmem_alloc_inode+0x1b/0x40 mm/shmem.c:3647
 alloc_inode+0x61/0x180 fs/inode.c:207
 new_inode_pseudo+0x69/0x170 fs/inode.c:889
 new_inode+0x1c/0x40 fs/inode.c:918
 shmem_get_inode+0xd1/0x8a0 mm/shmem.c:2120
 shmem_mknod+0x58/0x1b0 mm/shmem.c:2824
 shmem_mkdir+0x29/0x50 mm/shmem.c:2875
 vfs_mkdir+0x3be/0x600 fs/namei.c:3738
 SYSC_mkdirat fs/namei.c:3761 [inline]
 SyS_mkdirat fs/namei.c:3745 [inline]
 SYSC_mkdir fs/namei.c:3772 [inline]
 SyS_mkdir+0x16e/0x290 fs/namei.c:3770
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44ec87
RSP: 002b:0000000001a2fe40 EFLAGS: 00000212 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 000000000044ec87
RDX: 0000000001a2fe6e RSI: 00000000000001ff RDI: 0000000001a2fe68
RBP: 00000000000019ec R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000064 R11: 0000000000000212 R12: 0000000001ef390c
R13: 0000000000000000 R14: 00000000000a43b5 R15: 00000000000019ec
Object at ffff8801c98f44c0, in cache task_struct size: 5696
Allocated:
PID = 3452
 [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 [<ffffffff819f71da>] set_track mm/kasan/kasan.c:514 [inline]
 [<ffffffff819f71da>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605
 [<ffffffff819f77d2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
 [<ffffffff819f1652>] kmem_cache_alloc_node+0x122/0x690 mm/slab.c:3650
 [<ffffffff81421fe2>] alloc_task_struct_node kernel/fork.c:142 [inline]
 [<ffffffff81421fe2>] dup_task_struct kernel/fork.c:482 [inline]
 [<ffffffff81421fe2>] copy_process.part.42+0x1a32/0x5fd0 kernel/fork.c:1515
 [<ffffffff81426ac0>] copy_process kernel/fork.c:1486 [inline]
 [<ffffffff81426ac0>] _do_fork+0x200/0xff0 kernel/fork.c:1942
 [<ffffffff81427987>] SYSC_clone kernel/fork.c:2052 [inline]
 [<ffffffff81427987>] SyS_clone+0x37/0x50 kernel/fork.c:2046
 [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
 [<ffffffff8440fb09>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 29885
 [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 [<ffffffff819f784f>] set_track mm/kasan/kasan.c:514 [inline]
 [<ffffffff819f784f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578
 [<ffffffff819f4bf1>] __cache_free mm/slab.c:3502 [inline]
 [<ffffffff819f4bf1>] kmem_cache_free+0x71/0x240 mm/slab.c:3762
 [<ffffffff8141f041>] free_task_struct kernel/fork.c:147 [inline]
 [<ffffffff8141f041>] free_task+0x151/0x1d0 kernel/fork.c:359
 [<ffffffff8141f30b>] __put_task_struct+0x24b/0x5f0 kernel/fork.c:396
 [<ffffffff81435baa>] put_task_struct include/linux/sched.h:2257 [inline]
 [<ffffffff81435baa>] delayed_put_task_struct+0xca/0x3f0 kernel/exit.c:173
 [<ffffffff815ef250>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [<ffffffff815ef250>] rcu_do_batch.isra.70+0x9e0/0xdf0 kernel/rcu/tree.c:2780
 [<ffffffff815efad2>] invoke_rcu_callbacks kernel/rcu/tree.c:3043 [inline]
 [<ffffffff815efad2>] __rcu_process_callbacks kernel/rcu/tree.c:3010 [inline]
 [<ffffffff815efad2>] rcu_process_callbacks+0x472/0xc70 kernel/rcu/tree.c:3027
 [<ffffffff84412d7f>] __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
Memory state around the buggy address:
 ffff8801c98f4280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801c98f4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8801c98f4380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff8801c98f4400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801c98f4480: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
==================================================================
* Re: net: use-after-free in tw_timer_handler
  2017-01-23 10:19 net: use-after-free in tw_timer_handler Dmitry Vyukov
@ 2017-01-23 10:23 ` Dmitry Vyukov
  2017-01-24 14:28   ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread

From: Dmitry Vyukov @ 2017-01-23 10:23 UTC
To: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet
Cc: syzkaller

On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> While running the syzkaller fuzzer I started seeing use-after-frees in
> tw_timer_handler. It happens with very low frequency; so far I've seen
> 22 of them. But all the reports look consistent, so I would assume it
> is real and just requires a very tricky race to happen. I started
> seeing it around Jan 17; however, I had not updated kernels for some
> time before that, so the issue was potentially introduced somewhat
> earlier. Or maybe the fuzzer just figured out how to trigger it, and
> the bug is actually old. I am seeing it on all of
> torvalds/linux-next/mmotm; some commits, if it matters:
> 7a308bb3016f57e5be11a677d15b821536419d36,
> 5cf7a0f3442b2312326c39f571d637669a478235,
> c497f8d17246720afe680ea1a8fa6e48e75af852.
> The majority of reports point to net_drop_ns as the offending free,
> but that may be a red herring. Since the access happens in a timer, it
> can happen long after the free, and the memory could have been reused.
> I've also seen a few where the access in tw_timer_handler is reported
> as out-of-bounds on task_struct and on struct filename.

I've briefly skimmed through the code.
Assuming that it requires a very tricky race to be triggered, the most suspicious interaction looks to be inet_twsk_deschedule_put vs __inet_twsk_schedule:

void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
	if (del_timer_sync(&tw->tw_timer))
		inet_twsk_kill(tw);
	inet_twsk_put(tw);
}

void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
{
	tw->tw_kill = timeo <= 4*HZ;
	if (!rearm) {
		BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
		atomic_inc(&tw->tw_dr->tw_count);
	} else {
		mod_timer_pending(&tw->tw_timer, jiffies + timeo);
	}
}

Can't it somehow end up rearming an already deleted timer? Or maybe the first mod_timer happens after del_timer_sync?

> BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0
> net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cb58c398
[... full KASAN reports quoted above trimmed ...]
1966.144350] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860 > kernel/nsproxy.c:106 > [ 1966.152254] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420 > kernel/nsproxy.c:164 > [ 1966.159567] [<ffffffff814197f1>] > copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659 > [ 1966.167484] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline] > [ 1966.167484] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937 > [ 1966.174207] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline] > [ 1966.174207] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041 > [ 1966.180832] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 > arch/x86/entry/common.c:280 > [ 1966.187973] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a > Freed: > PID = 8938 > [ 1966.221347] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline] > [ 1966.221347] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767 > [ 1966.228568] [<ffffffff83548e3e>] net_free > net/core/net_namespace.c:355 [inline] > [ 1966.228568] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140 > net/core/net_namespace.c:362 > [ 1966.235564] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90 > net/core/net_namespace.c:472 > [ 1966.242517] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 > kernel/workqueue.c:2096 > [ 1966.249995] [<ffffffff81493bc3>] worker_thread+0x223/0x1990 > kernel/workqueue.c:2230 > [ 1966.257258] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209 > [ 1966.263879] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40 > arch/x86/entry/entry_64.S:433 > Memory state around the buggy address: > ffff8801b7b50200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff8801b7b50280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>ffff8801b7b50300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff8801b7b50380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff8801b7b50400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > 
================================================================== > > BUG: KASAN: slab-out-of-bounds in tw_timer_handler+0xc3/0xd0 > net/ipv4/inet_timewait_sock.c:149 at addr ffff8801c98f43a0 > Read of size 8 by task syz-executor8/3423 > CPU: 0 PID: 3423 Comm: syz-executor8 Not tainted 4.10.0-rc5 #19 > Hardware name: Google Google Compute Engine/Google Compute Engine, > BIOS Google 01/01/2011 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > kasan_object_err+0x1c/0x70 mm/kasan/report.c:161 > print_address_description mm/kasan/report.c:199 [inline] > kasan_report_error+0x1d1/0x4d0 mm/kasan/report.c:288 > kasan_report mm/kasan/report.c:308 [inline] > __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329 > tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 > call_timer_fn+0x241/0x820 kernel/time/timer.c:1308 > expire_timers kernel/time/timer.c:1348 [inline] > __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642 > run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655 > __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 > invoke_softirq kernel/softirq.c:364 [inline] > irq_exit+0x1cc/0x200 kernel/softirq.c:405 > exiting_irq arch/x86/include/asm/apic.h:658 [inline] > smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961 > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 > RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline] > RIP: 0010:arch_local_irq_save arch/x86/include/asm/paravirt.h:784 [inline] > RIP: 0010:lock_is_held_type+0x124/0x310 kernel/locking/lockdep.c:3787 > RSP: 0018:ffff8801c946f558 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10 > RAX: 0000000000000286 RBX: 1ffff1003928deac RCX: 1ffff1003928deb0 > RDX: 1ffffffff0a18984 RSI: 00000000ffffffff RDI: ffffffff850c4c20 > RBP: ffff8801c946f6a8 R08: 0000000000000002 R09: 0000000000000001 > R10: 000000000000000a R11: 0000000000000000 R12: ffff8801c946f680 > R13: ffff8801c9492640 R14: ffffffff85130ec0 
R15: 0000000000000bff > </IRQ> > lock_is_held include/linux/lockdep.h:348 [inline] > ___might_sleep+0x5b3/0x650 kernel/sched/core.c:7748 > __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 > cache_alloc_debugcheck_before mm/slab.c:3071 [inline] > slab_alloc mm/slab.c:3386 [inline] > kmem_cache_alloc+0x273/0x680 mm/slab.c:3558 > shmem_alloc_inode+0x1b/0x40 mm/shmem.c:3647 > alloc_inode+0x61/0x180 fs/inode.c:207 > new_inode_pseudo+0x69/0x170 fs/inode.c:889 > new_inode+0x1c/0x40 fs/inode.c:918 > shmem_get_inode+0xd1/0x8a0 mm/shmem.c:2120 > shmem_mknod+0x58/0x1b0 mm/shmem.c:2824 > shmem_mkdir+0x29/0x50 mm/shmem.c:2875 > vfs_mkdir+0x3be/0x600 fs/namei.c:3738 > SYSC_mkdirat fs/namei.c:3761 [inline] > SyS_mkdirat fs/namei.c:3745 [inline] > SYSC_mkdir fs/namei.c:3772 [inline] > SyS_mkdir+0x16e/0x290 fs/namei.c:3770 > entry_SYSCALL_64_fastpath+0x1f/0xc2 > RIP: 0033:0x44ec87 > RSP: 002b:0000000001a2fe40 EFLAGS: 00000212 ORIG_RAX: 0000000000000053 > RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 000000000044ec87 > RDX: 0000000001a2fe6e RSI: 00000000000001ff RDI: 0000000001a2fe68 > RBP: 00000000000019ec R08: 0000000000000000 R09: 0000000000000006 > R10: 0000000000000064 R11: 0000000000000212 R12: 0000000001ef390c > R13: 0000000000000000 R14: 00000000000a43b5 R15: 00000000000019ec > Object at ffff8801c98f44c0, in cache task_struct size: 5696 > Allocated: > PID = 3452 > [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 > [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502 > [<ffffffff819f71da>] set_track mm/kasan/kasan.c:514 [inline] > [<ffffffff819f71da>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605 > [<ffffffff819f77d2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544 > [<ffffffff819f1652>] kmem_cache_alloc_node+0x122/0x690 mm/slab.c:3650 > [<ffffffff81421fe2>] alloc_task_struct_node kernel/fork.c:142 [inline] > [<ffffffff81421fe2>] dup_task_struct kernel/fork.c:482 [inline] > [<ffffffff81421fe2>] copy_process.part.42+0x1a32/0x5fd0 
kernel/fork.c:1515 > [<ffffffff81426ac0>] copy_process kernel/fork.c:1486 [inline] > [<ffffffff81426ac0>] _do_fork+0x200/0xff0 kernel/fork.c:1942 > [<ffffffff81427987>] SYSC_clone kernel/fork.c:2052 [inline] > [<ffffffff81427987>] SyS_clone+0x37/0x50 kernel/fork.c:2046 > [<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280 > [<ffffffff8440fb09>] return_from_SYSCALL_64+0x0/0x7a > Freed: > PID = 29885 > [<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 > [<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502 > [<ffffffff819f784f>] set_track mm/kasan/kasan.c:514 [inline] > [<ffffffff819f784f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578 > [<ffffffff819f4bf1>] __cache_free mm/slab.c:3502 [inline] > [<ffffffff819f4bf1>] kmem_cache_free+0x71/0x240 mm/slab.c:3762 > [<ffffffff8141f041>] free_task_struct kernel/fork.c:147 [inline] > [<ffffffff8141f041>] free_task+0x151/0x1d0 kernel/fork.c:359 > [<ffffffff8141f30b>] __put_task_struct+0x24b/0x5f0 kernel/fork.c:396 > [<ffffffff81435baa>] put_task_struct include/linux/sched.h:2257 [inline] > [<ffffffff81435baa>] delayed_put_task_struct+0xca/0x3f0 kernel/exit.c:173 > [<ffffffff815ef250>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline] > [<ffffffff815ef250>] rcu_do_batch.isra.70+0x9e0/0xdf0 kernel/rcu/tree.c:2780 > [<ffffffff815efad2>] invoke_rcu_callbacks kernel/rcu/tree.c:3043 [inline] > [<ffffffff815efad2>] __rcu_process_callbacks kernel/rcu/tree.c:3010 [inline] > [<ffffffff815efad2>] rcu_process_callbacks+0x472/0xc70 kernel/rcu/tree.c:3027 > [<ffffffff84412d7f>] __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 > Memory state around the buggy address: > ffff8801c98f4280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff8801c98f4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >>ffff8801c98f4380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ^ > ffff8801c98f4400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff8801c98f4480: fc fc fc fc fc fc fc fc fb fb 
fb fb fb fb fb fb > ================================================================== ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-23 10:23 ` Dmitry Vyukov @ 2017-01-24 14:28 ` Eric Dumazet 2017-01-24 15:06 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-01-24 14:28 UTC (permalink / raw) To: Dmitry Vyukov Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller On Mon, 2017-01-23 at 11:23 +0100, Dmitry Vyukov wrote: > On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > > Hello, > > > > While running syzkaller fuzzer I started seeing use-after-frees in > > tw_timer_handler. It happens with very low frequency, so far I've seen > > 22 of them. But all reports look consistent, so I would assume that it > > is real, just requires a very tricky race to happen. I've stared > > seeing it around Jan 17, however I did not update kernels for some > > time before that so potentially the issues was introduced somewhat > > earlier. Or maybe fuzzer just figured how to trigger it, and the bug > > is actually old. I am seeing it on all of torvalds/linux-next/mmotm, > > some commits if it matters: 7a308bb3016f57e5be11a677d15b821536419d36, > > 5cf7a0f3442b2312326c39f571d637669a478235, > > c497f8d17246720afe680ea1a8fa6e48e75af852. > > Majority of reports points to net_drop_ns as the offending free, but > > it may be red herring. Since the access happens in timer, it can > > happen long after free and the memory could have been reused. I've > > also seen few where the access in tw_timer_handler is reported as > > out-of-bounds on task_struct and on struct filename. > > > > I've briefly skimmed through the code. 
Assuming that it requires a
> very tricky race to be triggered, the most suspicious looks
> inet_twsk_deschedule_put vs __inet_twsk_schedule:
>
> void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
> {
> 	if (del_timer_sync(&tw->tw_timer))
> 		inet_twsk_kill(tw);
> 	inet_twsk_put(tw);
> }
>
> void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
> {
> 	tw->tw_kill = timeo <= 4*HZ;
> 	if (!rearm) {
> 		BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> 		atomic_inc(&tw->tw_dr->tw_count);
> 	} else {
> 		mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> 	}
> }
>
> Can't it somehow end up rearming already deleted timer? Or maybe the
> first mod_timer happens after del_timer_sync?

This code was changed a long time ago :

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054

So I suspect a recent patch broke the logic.

You might start a bisection :

I would check if 4.7 and 4.8 trigger the issue you noticed.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-24 14:28 ` Eric Dumazet @ 2017-01-24 15:06 ` Dmitry Vyukov 2017-01-24 15:52 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-01-24 15:06 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller On Tue, Jan 24, 2017 at 3:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2017-01-23 at 11:23 +0100, Dmitry Vyukov wrote: >> On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> > Hello, >> > >> > While running syzkaller fuzzer I started seeing use-after-frees in >> > tw_timer_handler. It happens with very low frequency, so far I've seen >> > 22 of them. But all reports look consistent, so I would assume that it >> > is real, just requires a very tricky race to happen. I've stared >> > seeing it around Jan 17, however I did not update kernels for some >> > time before that so potentially the issues was introduced somewhat >> > earlier. Or maybe fuzzer just figured how to trigger it, and the bug >> > is actually old. I am seeing it on all of torvalds/linux-next/mmotm, >> > some commits if it matters: 7a308bb3016f57e5be11a677d15b821536419d36, >> > 5cf7a0f3442b2312326c39f571d637669a478235, >> > c497f8d17246720afe680ea1a8fa6e48e75af852. >> > Majority of reports points to net_drop_ns as the offending free, but >> > it may be red herring. Since the access happens in timer, it can >> > happen long after free and the memory could have been reused. I've >> > also seen few where the access in tw_timer_handler is reported as >> > out-of-bounds on task_struct and on struct filename. >> >> >> >> I've briefly skimmed through the code. 
Assuming that it requires a >> very tricky race to be triggered, the most suspicious looks >> inet_twsk_deschedule_put vs __inet_twsk_schedule: >> >> void inet_twsk_deschedule_put(struct inet_timewait_sock *tw) >> { >> if (del_timer_sync(&tw->tw_timer)) >> inet_twsk_kill(tw); >> inet_twsk_put(tw); >> } >> >> void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm) >> { >> tw->tw_kill = timeo <= 4*HZ; >> if (!rearm) { >> BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo)); >> atomic_inc(&tw->tw_dr->tw_count); >> } else { >> mod_timer_pending(&tw->tw_timer, jiffies + timeo); >> } >> } >> >> Can't it somehow end up rearming already deleted timer? Or maybe the >> first mod_timer happens after del_timer_sync? > > This code was changed a long time ago : > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 > > So I suspect a recent patch broke the logic. > > You might start a bisection : > > I would check if 4.7 and 4.8 trigger the issue you noticed. It happens with too low rate for bisecting (few times per day). I could add some additional checks into code, but I don't know what checks could be useful. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-24 15:06 ` Dmitry Vyukov @ 2017-01-24 15:52 ` Eric Dumazet 2017-02-08 17:36 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-01-24 15:52 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >> This code was changed a long time ago : >> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >> >> So I suspect a recent patch broke the logic. >> >> You might start a bisection : >> >> I would check if 4.7 and 4.8 trigger the issue you noticed. > > > It happens with too low rate for bisecting (few times per day). I > could add some additional checks into code, but I don't know what > checks could be useful. If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure we are able to help. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-01-24 15:52 ` Eric Dumazet @ 2017-02-08 17:36 ` Dmitry Vyukov 2017-02-08 17:58 ` Eric Dumazet 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang 0 siblings, 2 replies; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-08 17:36 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >>> >>> This code was changed a long time ago : >>> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>> >>> So I suspect a recent patch broke the logic. >>> >>> You might start a bisection : >>> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >> >> >> It happens with too low rate for bisecting (few times per day). I >> could add some additional checks into code, but I don't know what >> checks could be useful. > > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure > we are able to help. There are also chances that the problem is older. Looking at the code, this part of inet_twsk_purge looks fishy: 285 if (unlikely((tw->tw_family != family) || 286 atomic_read(&twsk_net(tw)->count))) { It uses net->count == 0 check to find the right sockets. But what if there are several nets with count == 0 in flight, can't there be several inet_twsk_purge calls running concurrently freeing each other sockets? If so it looks like inet_twsk_purge can call inet_twsk_deschedule_put twice for a socket. Namely, two calls for different nets discover the socket, check that net->count==0 and both call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge net that it needs to purge? The second issue that I noticed is that tw_refcnt is set to 4 _after_ we schedule the timer. 
The timer probably won't fire before we set tw_refcnt, but if it somehow does, it will corrupt the refcount. I don't think that's what I am seeing, though; more likely it's the first issue (if it's real). Does it make any sense? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 17:36 ` Dmitry Vyukov @ 2017-02-08 17:58 ` Eric Dumazet 2017-02-08 18:55 ` Dmitry Vyukov 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang 1 sibling, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-02-08 17:58 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: > On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > >>> > >>> This code was changed a long time ago : > >>> > >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 > >>> > >>> So I suspect a recent patch broke the logic. > >>> > >>> You might start a bisection : > >>> > >>> I would check if 4.7 and 4.8 trigger the issue you noticed. > >> > >> > >> It happens with too low rate for bisecting (few times per day). I > >> could add some additional checks into code, but I don't know what > >> checks could be useful. > > > > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure > > we are able to help. > > > There are also chances that the problem is older. > > Looking at the code, this part of inet_twsk_purge looks fishy: > > 285 if (unlikely((tw->tw_family != family) || > 286 atomic_read(&twsk_net(tw)->count))) { > > It uses net->count == 0 check to find the right sockets. But what if > there are several nets with count == 0 in flight, can't there be > several inet_twsk_purge calls running concurrently freeing each other > sockets? If so it looks like inet_twsk_purge can call > inet_twsk_deschedule_put twice for a socket. Namely, two calls for > different nets discover the socket, check that net->count==0 and both > call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge > net that it needs to purge? Yes, atomic_read() is not a proper sync point. > The second issue that I noticed is that tw_refcnt is set to 4 _after_ > we schedule the timer. The timer will probably won't fire before we > set tw_refcnt, but if it somehow does it will corrupt the ref count. I > don't think that it's what I am seeing, though. More likely it's the > first issues (if it's real). > Timer is pinned, it cannot fire under us on this cpu. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 17:58 ` Eric Dumazet @ 2017-02-08 18:55 ` Dmitry Vyukov 2017-02-08 19:17 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-08 18:55 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, Feb 8, 2017 at 6:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: >> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: >> > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >>> >> >>> This code was changed a long time ago : >> >>> >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >> >>> >> >>> So I suspect a recent patch broke the logic. >> >>> >> >>> You might start a bisection : >> >>> >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >> >> >> >> >> >> It happens with too low rate for bisecting (few times per day). I >> >> could add some additional checks into code, but I don't know what >> >> checks could be useful. >> > >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >> > we are able to help. >> >> >> There are also chances that the problem is older. >> >> Looking at the code, this part of inet_twsk_purge looks fishy: >> >> 285 if (unlikely((tw->tw_family != family) || >> 286 atomic_read(&twsk_net(tw)->count))) { >> >> It uses net->count == 0 check to find the right sockets. But what if >> there are several nets with count == 0 in flight, can't there be >> several inet_twsk_purge calls running concurrently freeing each other >> sockets? If so it looks like inet_twsk_purge can call >> inet_twsk_deschedule_put twice for a socket. 
Namely, two calls for >> different nets discover the socket, check that net->count==0 and both >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >> net that it needs to purge? > > Yes, atomic_read() is not a proper sync point. Do you mean that it does not include read barrier? I more mean that we can call inet_twsk_deschedule_put twice for the same socket. >> The second issue that I noticed is that tw_refcnt is set to 4 _after_ >> we schedule the timer. The timer will probably won't fire before we >> set tw_refcnt, but if it somehow does it will corrupt the ref count. I >> don't think that it's what I am seeing, though. More likely it's the >> first issues (if it's real). >> > > Timer is pinned, it cannot fire under us on this cpu. Ack ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 18:55 ` Dmitry Vyukov @ 2017-02-08 19:17 ` Eric Dumazet 2017-02-08 19:32 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-02-08 19:17 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, 2017-02-08 at 19:55 +0100, Dmitry Vyukov wrote: > On Wed, Feb 8, 2017 at 6:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: > >> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: > >> > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > >> >>> > >> >>> This code was changed a long time ago : > >> >>> > >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 > >> >>> > >> >>> So I suspect a recent patch broke the logic. > >> >>> > >> >>> You might start a bisection : > >> >>> > >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. > >> >> > >> >> > >> >> It happens with too low rate for bisecting (few times per day). I > >> >> could add some additional checks into code, but I don't know what > >> >> checks could be useful. > >> > > >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure > >> > we are able to help. > >> > >> > >> There are also chances that the problem is older. > >> > >> Looking at the code, this part of inet_twsk_purge looks fishy: > >> > >> 285 if (unlikely((tw->tw_family != family) || > >> 286 atomic_read(&twsk_net(tw)->count))) { > >> > >> It uses net->count == 0 check to find the right sockets. But what if > >> there are several nets with count == 0 in flight, can't there be > >> several inet_twsk_purge calls running concurrently freeing each other > >> sockets? 
If so it looks like inet_twsk_purge can call > >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for > >> different nets discover the socket, check that net->count==0 and both > >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge > >> net that it needs to purge? > > > > Yes, atomic_read() is not a proper sync point. > > Do you mean that it does not include read barrier? > I more mean that we can call inet_twsk_deschedule_put twice for the same socket. I meant that this code assumed RTNL being held. This might not be the case now, after some old change. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 19:17 ` Eric Dumazet @ 2017-02-08 19:32 ` Dmitry Vyukov 2017-02-14 19:38 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-08 19:32 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, Feb 8, 2017 at 8:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-02-08 at 19:55 +0100, Dmitry Vyukov wrote: >> On Wed, Feb 8, 2017 at 6:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > On Wed, 2017-02-08 at 18:36 +0100, Dmitry Vyukov wrote: >> >> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: >> >> > On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >> >> >>> >> >> >>> This code was changed a long time ago : >> >> >>> >> >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >> >> >>> >> >> >>> So I suspect a recent patch broke the logic. >> >> >>> >> >> >>> You might start a bisection : >> >> >>> >> >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >> >> >> >> >> >> >> >> >> It happens with too low rate for bisecting (few times per day). I >> >> >> could add some additional checks into code, but I don't know what >> >> >> checks could be useful. >> >> > >> >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >> >> > we are able to help. >> >> >> >> >> >> There are also chances that the problem is older. >> >> >> >> Looking at the code, this part of inet_twsk_purge looks fishy: >> >> >> >> 285 if (unlikely((tw->tw_family != family) || >> >> 286 atomic_read(&twsk_net(tw)->count))) { >> >> >> >> It uses net->count == 0 check to find the right sockets. 
But what if >> >> there are several nets with count == 0 in flight, can't there be >> >> several inet_twsk_purge calls running concurrently freeing each other >> >> sockets? If so it looks like inet_twsk_purge can call >> >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >> >> different nets discover the socket, check that net->count==0 and both >> >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >> >> net that it needs to purge? >> > >> > Yes, atomic_read() is not a proper sync point. >> >> Do you mean that it does not include read barrier? >> I more mean that we can call inet_twsk_deschedule_put twice for the same socket. > > I meant that this code assumed RTNL being held. > > This might not be the case now, after some old change. cleanup_net releases rtnl lock right before calling these callbacks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 19:32 ` Dmitry Vyukov @ 2017-02-14 19:38 ` Dmitry Vyukov 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-14 19:38 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller, Andrey Ryabinin On Wed, Feb 8, 2017 at 8:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote: >>> >> >>> This code was changed a long time ago : >>> >> >>> >>> >> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>> >> >>> >>> >> >>> So I suspect a recent patch broke the logic. >>> >> >>> >>> >> >>> You might start a bisection : >>> >> >>> >>> >> >>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>> >> >> >>> >> >> >>> >> >> It happens with too low rate for bisecting (few times per day). I >>> >> >> could add some additional checks into code, but I don't know what >>> >> >> checks could be useful. >>> >> > >>> >> > If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>> >> > we are able to help. >>> >> >>> >> >>> >> There are also chances that the problem is older. >>> >> >>> >> Looking at the code, this part of inet_twsk_purge looks fishy: >>> >> >>> >> 285 if (unlikely((tw->tw_family != family) || >>> >> 286 atomic_read(&twsk_net(tw)->count))) { >>> >> >>> >> It uses net->count == 0 check to find the right sockets. But what if >>> >> there are several nets with count == 0 in flight, can't there be >>> >> several inet_twsk_purge calls running concurrently freeing each other >>> >> sockets? If so it looks like inet_twsk_purge can call >>> >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>> >> different nets discover the socket, check that net->count==0 and both >>> >> call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge >>> >> net that it needs to purge? >>> > >>> > Yes, atomic_read() is not a proper sync point. >>> >>> Do you mean that it does not include read barrier? >>> I more mean that we can call inet_twsk_deschedule_put twice for the same socket. >> >> I meant that this code assumed RTNL being held. >> >> This might not be the case now, after some old change. > > > cleanup_net releases rtnl lock right before calling these callbacks. +Andrey, do you know somebody on your side interested in stability of network namespace? This use-after-free seems to be related to net namespace. For context, full thread is here: https://groups.google.com/forum/#!msg/syzkaller/p1tn-_Kc6l4/smuL_FMAAgAJ ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-14 19:38 ` Dmitry Vyukov @ 2017-02-21 11:27 ` Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov ` (3 more replies) 0 siblings, 4 replies; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-21 11:27 UTC (permalink / raw) To: Gerrit Renker, David S. Miller, dccp Cc: Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, linux-kernel, Andrey Ryabinin DCCP doesn't purge timewait sockets on network namespace shutdown. So, after net namespace destroyed we could still have an active timer which will trigger use after free in tw_timer_handler(): BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 Read of size 8 by task swapper/1/0 Call Trace: __asan_load8+0x54/0x90 tw_timer_handler+0x4a/0xa0 call_timer_fn+0x127/0x480 expire_timers+0x1db/0x2e0 run_timer_softirq+0x12f/0x2a0 __do_softirq+0x105/0x5b4 irq_exit+0xdd/0xf0 smp_apic_timer_interrupt+0x57/0x70 apic_timer_interrupt+0x90/0xa0 Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 Allocated: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xee/0x180 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x134/0x310 copy_net_ns+0x8d/0x280 create_new_namespaces+0x23f/0x340 unshare_nsproxy_namespaces+0x75/0xf0 SyS_unshare+0x299/0x4f0 entry_SYSCALL_64_fastpath+0x18/0xad Freed: save_stack_trace+0x1b/0x20 kasan_slab_free+0xae/0x180 kmem_cache_free+0xb4/0x350 net_drop_ns+0x3f/0x50 cleanup_net+0x3df/0x450 process_one_work+0x419/0xbb0 worker_thread+0x92/0x850 kthread+0x192/0x1e0 ret_from_fork+0x2e/0x40 Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge timewait sockets on net namespace destruction and prevent above issue. 
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> --- net/dccp/ipv4.c | 6 ++++++ net/dccp/ipv6.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index d859a5c..da7cb16 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); } +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); +} + static struct pernet_operations dccp_v4_ops = { .init = dccp_v4_init_net, .exit = dccp_v4_exit_net, + .exit_batch = dccp_v4_exit_batch, }; static int __init dccp_v4_init(void) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index c4e879c..f3d8f92 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); } +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); +} + static struct pernet_operations dccp_v6_ops = { .init = dccp_v6_init_net, .exit = dccp_v6_exit_net, + .exit_batch = dccp_v6_exit_batch, }; static int __init dccp_v6_init(void) -- 2.10.2 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin @ 2017-02-21 11:56 ` Dmitry Vyukov 2017-02-22 6:48 ` Dmitry Vyukov 2017-02-21 13:43 ` Arnaldo Carvalho de Melo ` (2 subsequent siblings) 3 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-21 11:56 UTC (permalink / raw) To: Andrey Ryabinin Cc: Gerrit Renker, David S. Miller, dccp, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML On Tue, Feb 21, 2017 at 2:27 PM, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote: > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): > > BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 > Read of size 8 by task swapper/1/0 > Call Trace: > __asan_load8+0x54/0x90 > tw_timer_handler+0x4a/0xa0 > call_timer_fn+0x127/0x480 > expire_timers+0x1db/0x2e0 > run_timer_softirq+0x12f/0x2a0 > __do_softirq+0x105/0x5b4 > irq_exit+0xdd/0xf0 > smp_apic_timer_interrupt+0x57/0x70 > apic_timer_interrupt+0x90/0xa0 > > Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 > Allocated: > save_stack_trace+0x1b/0x20 > kasan_kmalloc+0xee/0x180 > kasan_slab_alloc+0x12/0x20 > kmem_cache_alloc+0x134/0x310 > copy_net_ns+0x8d/0x280 > create_new_namespaces+0x23f/0x340 > unshare_nsproxy_namespaces+0x75/0xf0 > SyS_unshare+0x299/0x4f0 > entry_SYSCALL_64_fastpath+0x18/0xad > Freed: > save_stack_trace+0x1b/0x20 > kasan_slab_free+0xae/0x180 > kmem_cache_free+0xb4/0x350 > net_drop_ns+0x3f/0x50 > cleanup_net+0x3df/0x450 > process_one_work+0x419/0xbb0 > worker_thread+0x92/0x850 > kthread+0x192/0x1e0 > ret_from_fork+0x2e/0x40 > > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above 
issue. Aha! Thanks for tracking this down! Queued the patch on syzkaller bots, should have the verdict in several hours. > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > --- > net/dccp/ipv4.c | 6 ++++++ > net/dccp/ipv6.c | 6 ++++++ > 2 files changed, 12 insertions(+) > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > index d859a5c..da7cb16 100644 > --- a/net/dccp/ipv4.c > +++ b/net/dccp/ipv4.c > @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); > } > > +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); > +} > + > static struct pernet_operations dccp_v4_ops = { > .init = dccp_v4_init_net, > .exit = dccp_v4_exit_net, > + .exit_batch = dccp_v4_exit_batch, > }; > > static int __init dccp_v4_init(void) > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > index c4e879c..f3d8f92 100644 > --- a/net/dccp/ipv6.c > +++ b/net/dccp/ipv6.c > @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); > } > > +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); > +} > + > static struct pernet_operations dccp_v6_ops = { > .init = dccp_v6_init_net, > .exit = dccp_v6_exit_net, > + .exit_batch = dccp_v6_exit_batch, > }; > > static int __init dccp_v6_init(void) > -- > 2.10.2 > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:56 ` Dmitry Vyukov @ 2017-02-22 6:48 ` Dmitry Vyukov 0 siblings, 0 replies; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-22 6:48 UTC (permalink / raw) To: Andrey Ryabinin Cc: Gerrit Renker, David S. Miller, dccp, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML On Tue, Feb 21, 2017 at 2:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Tue, Feb 21, 2017 at 2:27 PM, Andrey Ryabinin > <aryabinin@virtuozzo.com> wrote: >> DCCP doesn't purge timewait sockets on network namespace shutdown. >> So, after net namespace destroyed we could still have an active timer >> which will trigger use after free in tw_timer_handler(): >> >> BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 >> Read of size 8 by task swapper/1/0 >> Call Trace: >> __asan_load8+0x54/0x90 >> tw_timer_handler+0x4a/0xa0 >> call_timer_fn+0x127/0x480 >> expire_timers+0x1db/0x2e0 >> run_timer_softirq+0x12f/0x2a0 >> __do_softirq+0x105/0x5b4 >> irq_exit+0xdd/0xf0 >> smp_apic_timer_interrupt+0x57/0x70 >> apic_timer_interrupt+0x90/0xa0 >> >> Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 >> Allocated: >> save_stack_trace+0x1b/0x20 >> kasan_kmalloc+0xee/0x180 >> kasan_slab_alloc+0x12/0x20 >> kmem_cache_alloc+0x134/0x310 >> copy_net_ns+0x8d/0x280 >> create_new_namespaces+0x23f/0x340 >> unshare_nsproxy_namespaces+0x75/0xf0 >> SyS_unshare+0x299/0x4f0 >> entry_SYSCALL_64_fastpath+0x18/0xad >> Freed: >> save_stack_trace+0x1b/0x20 >> kasan_slab_free+0xae/0x180 >> kmem_cache_free+0xb4/0x350 >> net_drop_ns+0x3f/0x50 >> cleanup_net+0x3df/0x450 >> process_one_work+0x419/0xbb0 >> worker_thread+0x92/0x850 >> kthread+0x192/0x1e0 >> ret_from_fork+0x2e/0x40 >> >> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> timewait sockets on net namespace destruction and prevent above issue. > > > Aha! 
Thanks for tracking this down! > > Queued the patch on syzkaller bots, should have the verdict in several hours. I do not see the crash happening after applying the patch. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov @ 2017-02-21 13:43 ` Arnaldo Carvalho de Melo 2017-02-21 13:53 ` Eric Dumazet 2017-02-22 8:59 ` Andrey Ryabinin 2017-02-21 18:23 ` David Miller 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin 3 siblings, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2017-02-21 13:43 UTC (permalink / raw) To: Andrey Ryabinin Cc: Eric W. Biederman, Gerrit Renker, David S. Miller, dccp, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, linux-kernel Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): > > > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above issue. Please add this, to help stable kernels to pick this up Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") Cc: Eric W. Biederman <ebiederm@xmission.com> [acme@jouet linux]$ git describe b099ce2602d8 v2.6.32-rc8-1977-gb099ce2602d8 This one added the pernet operations related to network namespaces, but then the one above got missed. commit 72a2d6138224298a576bcdc33d7d0004de604856 Author: Pavel Emelyanov <xemul@openvz.org> Date: Sun Apr 13 22:29:13 2008 -0700 [NETNS][DCCPV4]: Add dummy per-net operations. 
---------------------------------- It looks ok, so please consider adding my: Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> - Arnaldo > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > --- > net/dccp/ipv4.c | 6 ++++++ > net/dccp/ipv6.c | 6 ++++++ > 2 files changed, 12 insertions(+) > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > index d859a5c..da7cb16 100644 > --- a/net/dccp/ipv4.c > +++ b/net/dccp/ipv4.c > @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); > } > > +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); > +} > + > static struct pernet_operations dccp_v4_ops = { > .init = dccp_v4_init_net, > .exit = dccp_v4_exit_net, > + .exit_batch = dccp_v4_exit_batch, > }; > > static int __init dccp_v4_init(void) > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > index c4e879c..f3d8f92 100644 > --- a/net/dccp/ipv6.c > +++ b/net/dccp/ipv6.c > @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) > inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); > } > > +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) > +{ > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); > +} > + > static struct pernet_operations dccp_v6_ops = { > .init = dccp_v6_init_net, > .exit = dccp_v6_exit_net, > + .exit_batch = dccp_v6_exit_batch, > }; > > static int __init dccp_v6_init(void) > -- > 2.10.2 > > -- > To unsubscribe from this list: send the line "unsubscribe dccp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 13:43 ` Arnaldo Carvalho de Melo @ 2017-02-21 13:53 ` Eric Dumazet 2017-02-21 18:23 ` David Miller 2017-02-22 8:59 ` Andrey Ryabinin 1 sibling, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2017-02-21 13:53 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Andrey Ryabinin, Eric W. Biederman, Gerrit Renker, David S. Miller, dccp, Dmitry Vyukov, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML On Tue, Feb 21, 2017 at 5:43 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: > > DCCP doesn't purge timewait sockets on network namespace shutdown. > > So, after net namespace destroyed we could still have an active timer > > which will trigger use after free in tw_timer_handler(): > > > > > > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > > timewait sockets on net namespace destruction and prevent above issue. > > Please add this, to help stable kernels to pick this up > > Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") > Cc: Eric W. Biederman <ebiederm@xmission.com> This patch has nothing to do with this bug really. Look at commit d315492b1a6ba29da0fa2860759505ae1b2db857 ("netns : fix kernel panic in timewait socket destruction") Back in 2008, nobody spotted that DCCP was using the same infra. When can we get rid of DCCP in Linux so that the syzkaller team no longer spends time on it? Thanks. > > [acme@jouet linux]$ git describe b099ce2602d8 > v2.6.32-rc8-1977-gb099ce2602d8 > > This one added the pernet operations related to network namespaces, but > then the one above got missed. > > commit 72a2d6138224298a576bcdc33d7d0004de604856 > Author: Pavel Emelyanov <xemul@openvz.org> > Date: Sun Apr 13 22:29:13 2008 -0700 > > [NETNS][DCCPV4]: Add dummy per-net operations. 
> > ---------------------------------- > > It looks ok, so please consider adding my: > > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > - Arnaldo > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > > --- > > net/dccp/ipv4.c | 6 ++++++ > > net/dccp/ipv6.c | 6 ++++++ > > 2 files changed, 12 insertions(+) > > > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > > index d859a5c..da7cb16 100644 > > --- a/net/dccp/ipv4.c > > +++ b/net/dccp/ipv4.c > > @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) > > inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); > > } > > > > +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) > > +{ > > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); > > +} > > + > > static struct pernet_operations dccp_v4_ops = { > > .init = dccp_v4_init_net, > > .exit = dccp_v4_exit_net, > > + .exit_batch = dccp_v4_exit_batch, > > }; > > > > static int __init dccp_v4_init(void) > > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > > index c4e879c..f3d8f92 100644 > > --- a/net/dccp/ipv6.c > > +++ b/net/dccp/ipv6.c > > @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) > > inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); > > } > > > > +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) > > +{ > > + inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6); > > +} > > + > > static struct pernet_operations dccp_v6_ops = { > > .init = dccp_v6_init_net, > > .exit = dccp_v6_exit_net, > > + .exit_batch = dccp_v6_exit_batch, > > }; > > > > static int __init dccp_v6_init(void) > > -- > > 2.10.2 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe dccp" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 13:53 ` Eric Dumazet @ 2017-02-21 18:23 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2017-02-21 18:23 UTC (permalink / raw) To: edumazet Cc: acme, aryabinin, ebiederm, gerrit, dccp, dvyukov, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: Eric Dumazet <edumazet@google.com> Date: Tue, 21 Feb 2017 05:53:13 -0800 > On Tue, Feb 21, 2017 at 5:43 AM, Arnaldo Carvalho de Melo > <acme@kernel.org> wrote: >> >> Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: >> > DCCP doesn't purge timewait sockets on network namespace shutdown. >> > So, after net namespace destroyed we could still have an active timer >> > which will trigger use after free in tw_timer_handler(): >> > >> > >> > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> > timewait sockets on net namespace destruction and prevent above issue. >> >> Please add this, to help stable kernels to pick this up >> >> Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") >> Cc: Eric W. Biederman <ebiederm@xmission.com> > > > This patch has nothing to do with this bug really. > > Look at commit d315492b1a6ba29da0fa2860759505ae1b2db857 > ("netns : fix kernel panic in timewait socket destruction") > > Back in 2008, nobody spotted that DCCP was using the same infra. So, let me get this straight, dccp is buggy because it tried as hard as possible to share and use common pieces of infrastructure instead of duplicating all of said logic? Now I've heard everything. I know it has been a pain in the rear fixing all of these dccp bugs, but removing it from the tree or even pushing it into staging is simply not an option. So we better come up with a better plan based upon reality rather than fantasy. :-) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 13:43 ` Arnaldo Carvalho de Melo 2017-02-21 13:53 ` Eric Dumazet @ 2017-02-22 8:59 ` Andrey Ryabinin 1 sibling, 0 replies; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-22 8:59 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Eric W. Biederman, Gerrit Renker, David S. Miller, dccp, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, linux-kernel On 02/21/2017 04:43 PM, Arnaldo Carvalho de Melo wrote: > Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu: >> DCCP doesn't purge timewait sockets on network namespace shutdown. >> So, after net namespace destroyed we could still have an active timer >> which will trigger use after free in tw_timer_handler(): >> >> >> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> timewait sockets on net namespace destruction and prevent above issue. > > Please add this, to help stable kernels to pick this up > > Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge") > Cc: Eric W. Biederman <ebiederm@xmission.com> > Fixes tag should blame commit f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH"). It introduced use of net namespace in the timer callback. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov 2017-02-21 13:43 ` Arnaldo Carvalho de Melo @ 2017-02-21 18:23 ` David Miller 2017-02-21 18:24 ` David Miller 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin 3 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2017-02-21 18:23 UTC (permalink / raw) To: aryabinin Cc: gerrit, dccp, dvyukov, edumazet, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: Andrey Ryabinin <aryabinin@virtuozzo.com> Date: Tue, 21 Feb 2017 14:27:40 +0300 > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): ... > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above issue. > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 18:23 ` David Miller @ 2017-02-21 18:24 ` David Miller 2017-02-22 9:35 ` Andrey Ryabinin 0 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2017-02-21 18:24 UTC (permalink / raw) To: aryabinin Cc: gerrit, dccp, dvyukov, edumazet, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: David Miller <davem@davemloft.net> Date: Tue, 21 Feb 2017 13:23:51 -0500 (EST) > From: Andrey Ryabinin <aryabinin@virtuozzo.com> > Date: Tue, 21 Feb 2017 14:27:40 +0300 > >> DCCP doesn't purge timewait sockets on network namespace shutdown. >> So, after net namespace destroyed we could still have an active timer >> which will trigger use after free in tw_timer_handler(): > ... >> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >> timewait sockets on net namespace destruction and prevent above issue. >> >> Reported-by: Dmitry Vyukov <dvyukov@google.com> >> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > > Applied and queued up for -stable, thanks. Actually, this doesn't even compile. Please fix this up and resubmit: net/dccp/ipv4.c: In function ‘dccp_v4_exit_batch’: net/dccp/ipv4.c:1022:34: warning: passing argument 2 of ‘inet_twsk_purge’ makes integer from pointer without a cast [-Wint-conversion] inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); ^ In file included from ./include/linux/dccp.h:14:0, from net/dccp/ipv4.c:13: ./include/net/inet_timewait_sock.h:118:6: note: expected ‘int’ but argument is of type ‘struct inet_timewait_death_row *’ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family); ^ net/dccp/ipv4.c:1022:2: error: too many arguments to function ‘inet_twsk_purge’ inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET); ^ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 18:24 ` David Miller @ 2017-02-22 9:35 ` Andrey Ryabinin 0 siblings, 0 replies; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-22 9:35 UTC (permalink / raw) To: David Miller Cc: gerrit, dccp, dvyukov, edumazet, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel On 02/21/2017 09:24 PM, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 21 Feb 2017 13:23:51 -0500 (EST) > >> From: Andrey Ryabinin <aryabinin@virtuozzo.com> >> Date: Tue, 21 Feb 2017 14:27:40 +0300 >> >>> DCCP doesn't purge timewait sockets on network namespace shutdown. >>> So, after net namespace destroyed we could still have an active timer >>> which will trigger use after free in tw_timer_handler(): >> ... >>> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge >>> timewait sockets on net namespace destruction and prevent above issue. >>> >>> Reported-by: Dmitry Vyukov <dvyukov@google.com> >>> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> >> >> Applied and queued up for -stable, thanks. > > Actually, this doesn't even compile. Please fix this up and resubmit: > Right, I tested this on top of the Linus' tree. Rebased on -next now. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2] net/dccp: fix use after free in tw_timer_handler() 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin ` (2 preceding siblings ...) 2017-02-21 18:23 ` David Miller @ 2017-02-22 9:35 ` Andrey Ryabinin 2017-02-22 21:15 ` David Miller 3 siblings, 1 reply; 28+ messages in thread From: Andrey Ryabinin @ 2017-02-22 9:35 UTC (permalink / raw) To: Gerrit Renker, David S. Miller Cc: Andrey Ryabinin, Eric Dumazet, Arnaldo Carvalho de Melo, dccp, Dmitry Vyukov, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML DCCP doesn't purge timewait sockets on network namespace shutdown. So, after net namespace destroyed we could still have an active timer which will trigger use after free in tw_timer_handler(): BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 Read of size 8 by task swapper/1/0 Call Trace: __asan_load8+0x54/0x90 tw_timer_handler+0x4a/0xa0 call_timer_fn+0x127/0x480 expire_timers+0x1db/0x2e0 run_timer_softirq+0x12f/0x2a0 __do_softirq+0x105/0x5b4 irq_exit+0xdd/0xf0 smp_apic_timer_interrupt+0x57/0x70 apic_timer_interrupt+0x90/0xa0 Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 Allocated: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xee/0x180 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x134/0x310 copy_net_ns+0x8d/0x280 create_new_namespaces+0x23f/0x340 unshare_nsproxy_namespaces+0x75/0xf0 SyS_unshare+0x299/0x4f0 entry_SYSCALL_64_fastpath+0x18/0xad Freed: save_stack_trace+0x1b/0x20 kasan_slab_free+0xae/0x180 kmem_cache_free+0xb4/0x350 net_drop_ns+0x3f/0x50 cleanup_net+0x3df/0x450 process_one_work+0x419/0xbb0 worker_thread+0x92/0x850 kthread+0x192/0x1e0 ret_from_fork+0x2e/0x40 Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge timewait sockets on net namespace destruction and prevent above issue. 
Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- Changes since v1: - Rebased on top the -next - Added Fixes/Acked tags. net/dccp/ipv4.c | 6 ++++++ net/dccp/ipv6.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index b043ec8..409d0cf 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1017,9 +1017,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v4_ctl_sk); } +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, AF_INET); +} + static struct pernet_operations dccp_v4_ops = { .init = dccp_v4_init_net, .exit = dccp_v4_exit_net, + .exit_batch = dccp_v4_exit_batch, }; static int __init dccp_v4_init(void) diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index cef60a4..233b573 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1075,9 +1075,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net) inet_ctl_sock_destroy(net->dccp.v6_ctl_sk); } +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list) +{ + inet_twsk_purge(&dccp_hashinfo, AF_INET6); +} + static struct pernet_operations dccp_v6_ops = { .init = dccp_v6_init_net, .exit = dccp_v6_exit_net, + .exit_batch = dccp_v6_exit_batch, }; static int __init dccp_v6_init(void) -- 2.10.2 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2] net/dccp: fix use after free in tw_timer_handler() 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin @ 2017-02-22 21:15 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2017-02-22 21:15 UTC (permalink / raw) To: aryabinin Cc: gerrit, edumazet, acme, dccp, dvyukov, xiyou.wangcong, kuznet, yoshfuji, kaber, syzkaller, netdev, linux-kernel From: Andrey Ryabinin <aryabinin@virtuozzo.com> Date: Wed, 22 Feb 2017 12:35:27 +0300 > DCCP doesn't purge timewait sockets on network namespace shutdown. > So, after net namespace destroyed we could still have an active timer > which will trigger use after free in tw_timer_handler(): ... > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge > timewait sockets on net namespace destruction and prevent above issue. > > Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-08 17:36 ` Dmitry Vyukov 2017-02-08 17:58 ` Eric Dumazet @ 2017-02-17 18:51 ` Cong Wang 2017-02-17 20:36 ` Dmitry Vyukov 1 sibling, 1 reply; 28+ messages in thread From: Cong Wang @ 2017-02-17 18:51 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Wed, Feb 8, 2017 at 9:36 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet <edumazet@google.com> wrote: >> On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote: >>>> >>>> This code was changed a long time ago : >>>> >>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>> >>>> So I suspect a recent patch broke the logic. >>>> >>>> You might start a bisection : >>>> >>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>> >>> >>> It happens with too low rate for bisecting (few times per day). I >>> could add some additional checks into code, but I don't know what >>> checks could be useful. >> >> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >> we are able to help. > > > There are also chances that the problem is older. > > Looking at the code, this part of inet_twsk_purge looks fishy: > > 285 if (unlikely((tw->tw_family != family) || > 286 atomic_read(&twsk_net(tw)->count))) { > > It uses net->count == 0 check to find the right sockets. But what if > there are several nets with count == 0 in flight, can't there be > several inet_twsk_purge calls running concurrently freeing each other > sockets? If so it looks like inet_twsk_purge can call > inet_twsk_deschedule_put twice for a socket. Namely, two calls for > different nets discover the socket, check that net->count==0 and both > call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge > net that it needs to purge? I don't think this could happen, because cleanup_net() is called in a work struct, and this work can't be scheduled twice, so there should not be any parallel cleanup_net(). Also, inet_twsk_deschedule_put() already waits for the flying timer, net->count==0 at this point and all sockets in this netns are already gone, so I don't know how a timer could be still fired after this. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang @ 2017-02-17 20:36 ` Dmitry Vyukov 2017-02-17 22:30 ` Cong Wang 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-17 20:36 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Fri, Feb 17, 2017 at 7:51 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>> >>>>> This code was changed a long time ago : >>>>> >>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>> >>>>> So I suspect a recent patch broke the logic. >>>>> >>>>> You might start a bisection : >>>>> >>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>> >>>> >>>> It happens with too low rate for bisecting (few times per day). I >>>> could add some additional checks into code, but I don't know what >>>> checks could be useful. >>> >>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>> we are able to help. >> >> >> There are also chances that the problem is older. >> >> Looking at the code, this part of inet_twsk_purge looks fishy: >> >> 285 if (unlikely((tw->tw_family != family) || >> 286 atomic_read(&twsk_net(tw)->count))) { >> >> It uses net->count == 0 check to find the right sockets. But what if >> there are several nets with count == 0 in flight, can't there be >> several inet_twsk_purge calls running concurrently freeing each other >> sockets? If so it looks like inet_twsk_purge can call >> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >> different nets discover the socket, check that net->count==0 and both >> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >> net that it needs to purge? 
> > I don't think this could happen, because cleanup_net() is called in a > work struct, and this work can't be scheduled twice, so there should > not be any parallel cleanup_net(). > > Also, inet_twsk_deschedule_put() already waits for the flying timer, > net->count==0 at this point and all sockets in this netns are already > gone, so I don't know how a timer could be still fired after this. One thing that I noticed is that use-after-free always happens in tw_timer_handler, but never in timer code. This suggests that: 1. Whoever frees the struct waits for the timer mutex unlock. 2. Free happens almost at the same time as timer fires (otherwise the timer would probably be cancelled). That could happen if there is a race in timer code during cancellation or rearming. I understand that it's heavily stressed code, but who knows... I still wonder if twsk_net(tw)->count==0 is the right condition. This net may not yet have been scheduled for destruction, but another task could pick it up and start destroying... Cong, do you have any ideas as to what debugging checks I could add to help track this down? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: net: use-after-free in tw_timer_handler 2017-02-17 20:36 ` Dmitry Vyukov @ 2017-02-17 22:30 ` Cong Wang 2017-02-21 9:46 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Cong Wang @ 2017-02-17 22:30 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Fri, Feb 17, 2017 at 12:36 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Fri, Feb 17, 2017 at 7:51 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>>> >>>>>> This code was changed a long time ago : >>>>>> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>>> >>>>>> So I suspect a recent patch broke the logic. >>>>>> >>>>>> You might start a bisection : >>>>>> >>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>>> >>>>> >>>>> It happens with too low rate for bisecting (few times per day). I >>>>> could add some additional checks into code, but I don't know what >>>>> checks could be useful. >>>> >>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>>> we are able to help. >>> >>> >>> There are also chances that the problem is older. >>> >>> Looking at the code, this part of inet_twsk_purge looks fishy: >>> >>> 285 if (unlikely((tw->tw_family != family) || >>> 286 atomic_read(&twsk_net(tw)->count))) { >>> >>> It uses net->count == 0 check to find the right sockets. But what if >>> there are several nets with count == 0 in flight, can't there be >>> several inet_twsk_purge calls running concurrently freeing each other >>> sockets? If so it looks like inet_twsk_purge can call >>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>> different nets discover the socket, check that net->count==0 and both >>> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >>> net that it needs to purge? 
>> >> I don't think this could happen, because cleanup_net() is called in a >> work struct, and this work can't be scheduled twice, so there should >> not be any parallel cleanup_net(). >> >> Also, inet_twsk_deschedule_put() already waits for the flying timer, >> net->count==0 at this point and all sockets in this netns are already >> gone, so I don't know how a timer could be still fired after this. > > > One thing that I noticed is that use-after-free always happens in > tw_timer_handler, but never in timer code. This suggests that: > 1. Whoever frees the struct waits for the timer mutex unlock. > 2. Free happens almost at the same time as timer fires (otherwise the > timer would probably be cancelled). > > That could happen if there is a race in timer code during cancellation > or rearming. I understand that it's heavily stressed code, but who > knows... Good point! I think this could happen since the timer is deactivated before unlinking the tw sock, so another CPU could find it and rearm the timer during such a window? > > I still wonder if twsk_net(tw)->count==0 is the right condition. This > net may not yet have been scheduled for destruction, but another task > could pick it up and start destroying... Good question. net_exit_list is just a snapshot of the cleanup_list, so it is true that inet_twsk_purge() could purge more netns'es than in net_exit_list, but I don't see any problem with this: the work later can't find the same twsk again since it is unlinked from the hashtable. Do you see otherwise? > > Cong, do you have any ideas as to what debugging checks I could add to > help track this down? I don't have any theory for the cause of this bug. Your point above actually makes sense to me. Maybe you can add some code in between the following code: if (del_timer_sync(&tw->tw_timer)) inet_twsk_kill(tw); to verify whether the timer is rearmed or not. ^ permalink raw reply [flat|nested] 28+ messages in thread
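[Editor's sketch of the debug check suggested above — a hypothetical, untested addition to inet_twsk_deschedule_put() as it looked in a v4.9-era tree; the WARN_ON_ONCE placement is an assumption about what would be informative, not an actual posted patch:]

```
/* Sketch only: inet_twsk_deschedule_put() with a debug check inserted
 * between del_timer_sync() and inet_twsk_kill(), per the suggestion. */
void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
	if (del_timer_sync(&tw->tw_timer)) {
		inet_twsk_kill(tw);
	} else {
		/* del_timer_sync() returned 0, i.e. it saw no pending
		 * timer.  If the timer shows up as pending again here,
		 * something rearmed it while the tw sock was being torn
		 * down -- exactly the window discussed above. */
		WARN_ON_ONCE(timer_pending(&tw->tw_timer));
	}
	inet_twsk_put(tw);
}
```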
* Re: net: use-after-free in tw_timer_handler 2017-02-17 22:30 ` Cong Wang @ 2017-02-21 9:46 ` Dmitry Vyukov 2017-02-21 10:40 ` Dmitry Vyukov 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-21 9:46 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Sat, Feb 18, 2017 at 1:30 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>>>> >>>>>>> This code was changed a long time ago : >>>>>>> >>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>>>> >>>>>>> So I suspect a recent patch broke the logic. >>>>>>> >>>>>>> You might start a bisection : >>>>>>> >>>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>>>> >>>>>> >>>>>> It happens with too low rate for bisecting (few times per day). I >>>>>> could add some additional checks into code, but I don't know what >>>>>> checks could be useful. >>>>> >>>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>>>> we are able to help. >>>> >>>> >>>> There are also chances that the problem is older. >>>> >>>> Looking at the code, this part of inet_twsk_purge looks fishy: >>>> >>>> 285 if (unlikely((tw->tw_family != family) || >>>> 286 atomic_read(&twsk_net(tw)->count))) { >>>> >>>> It uses net->count == 0 check to find the right sockets. But what if >>>> there are several nets with count == 0 in flight, can't there be >>>> several inet_twsk_purge calls running concurrently freeing each other >>>> sockets? If so it looks like inet_twsk_purge can call >>>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>>> different nets discover the socket, check that net->count==0 and both >>>> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge >>>> net that it needs to purge? 
>>> >>> I don't think this could happen, because cleanup_net() is called in a >>> work struct, and this work can't be scheduled twice, so there should >>> not be any parallel cleanup_net(). >>> >>> Also, inet_twsk_deschedule_put() already waits for the flying timer, >>> net->count==0 at this point and all sockets in this netns are already >>> gone, so I don't know how a timer could be still fired after this. >> >> >> One thing that I noticed is that use-after-free always happens in >> tw_timer_handler, but never in timer code. This suggests that: >> 1. Whoever frees the struct waits for the timer mutex unlock. >> 2. Free happens almost at the same time as timer fires (otherwise the >> timer would probably be cancelled). >> >> That could happen if there is a race in timer code during cancellation >> or rearming. I understand that it's heavily stressed code, but who >> knows... > > Good point! I think this could happen since timer is deactivated before > unlinking the tw sock, so another CPU could find it and rearm the timer > during such window? > >> >> I still wonder if twsk_net(tw)->count==0 is the right condition. This >> net may not yet have been scheduled for destruction, but another task >> could pick it up and start destroying... > > Good question. net_exit_list is just a snapshot of the cleanup_list, so > it is true that inet_twsk_purge() could purge more netns'es than in > net_exit_list, but I don't see any problem with this, the work later can't > find the same twsck again since it is unlinked from hashtable. > Do you see otherwise? No, I don't see otherwise. But it's just something suspicious in the context of an elusive race that happens under load. >> Cong, do you have any ideas as to what debugging checks I could add to >> help track this down? > > I don't have any theory for the cause of this bug. Your point above actually > makes sense for me. 
Maybe you can add some code in between the following > code: > > if (del_timer_sync(&tw->tw_timer)) > inet_twsk_kill(tw); > > to verify is the timer is rearmed or not. Now looking at the reports more closely, I think that debug info is off-by-one, and the use-after-free happens on the net object on this line: __NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED). And this report is actually all correct: BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 at addr ffff88018063a460 Read of size 8 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.0-rc8+ #65 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 call_timer_fn+0x241/0x820 kernel/time/timer.c:1308 expire_timers kernel/time/timer.c:1348 [inline] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:658 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 Object at ffff88018063a240, in cache net_namespace size: 6784 Allocated: PID = 3270 kmem_cache_zalloc include/linux/slab.h:626 [inline] net_alloc net/core/net_namespace.c:339 [inline] copy_net_ns+0x196/0x530 net/core/net_namespace.c:379 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164 copy_process.part.42+0x20c6/0x5fd0 kernel/fork.c:1664 copy_process kernel/fork.c:1486 [inline] _do_fork+0x200/0xff0 kernel/fork.c:1942 SYSC_clone kernel/fork.c:2052 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2046 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280 
return_from_SYSCALL_64+0x0/0x7a Freed: PID = 8056 kmem_cache_free+0x71/0x240 mm/slab.c:3762 net_free+0xd7/0x110 net/core/net_namespace.c:355 net_drop_ns+0x31/0x40 net/core/net_namespace.c:362 cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:479 process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098 worker_thread+0x223/0x1990 kernel/workqueue.c:2232 kthread+0x326/0x3f0 kernel/kthread.c:227 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 The read is of size 8, but read of tw->tw_kill should be 4. We could also double check offset within the object and check if it matches what __NET_INC_STATS(twsk_net(tw)) reads. So the net is somehow freed while the timer is active. ^ permalink raw reply [flat|nested] 28+ messages in thread
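[For reference, the handler named in the report, reproduced here from memory from a v4.9-era net/ipv4/inet_timewait_sock.c — verify against your exact sources; the annotation is editorial:]

```
static void tw_timer_handler(unsigned long data)
{
	struct inet_timewait_sock *tw = (struct inet_timewait_sock *)data;

	/* Line 149: both branches expand __NET_INC_STATS(), which loads
	 * the per-net SNMP statistics pointer out of *twsk_net(tw) -- an
	 * 8-byte read inside struct net, matching the KASAN report above,
	 * whereas a read of the tw->tw_kill flag itself would be 4 bytes. */
	if (tw->tw_kill)
		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED);
	else
		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITED);
	inet_twsk_kill(tw);
}
```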
* Re: net: use-after-free in tw_timer_handler 2017-02-21 9:46 ` Dmitry Vyukov @ 2017-02-21 10:40 ` Dmitry Vyukov 0 siblings, 0 replies; 28+ messages in thread From: Dmitry Vyukov @ 2017-02-21 10:40 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller On Tue, Feb 21, 2017 at 12:46 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Sat, Feb 18, 2017 at 1:30 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>>>>> >>>>>>>> This code was changed a long time ago : >>>>>>>> >>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054 >>>>>>>> >>>>>>>> So I suspect a recent patch broke the logic. >>>>>>>> >>>>>>>> You might start a bisection : >>>>>>>> >>>>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed. >>>>>>> >>>>>>> >>>>>>> It happens with too low rate for bisecting (few times per day). I >>>>>>> could add some additional checks into code, but I don't know what >>>>>>> checks could be useful. >>>>>> >>>>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure >>>>>> we are able to help. >>>>> >>>>> >>>>> There are also chances that the problem is older. >>>>> >>>>> Looking at the code, this part of inet_twsk_purge looks fishy: >>>>> >>>>> 285 if (unlikely((tw->tw_family != family) || >>>>> 286 atomic_read(&twsk_net(tw)->count))) { >>>>> >>>>> It uses net->count == 0 check to find the right sockets. But what if >>>>> there are several nets with count == 0 in flight, can't there be >>>>> several inet_twsk_purge calls running concurrently freeing each other >>>>> sockets? If so it looks like inet_twsk_purge can call >>>>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for >>>>> different nets discover the socket, check that net->count==0 and both >>>>> call inet_twsk_deschedule_put. 
Shouldn't we just give inet_twsk_purge >>>>> net that it needs to purge? >>>> >>>> I don't think this could happen, because cleanup_net() is called in a >>>> work struct, and this work can't be scheduled twice, so there should >>>> not be any parallel cleanup_net(). >>>> >>>> Also, inet_twsk_deschedule_put() already waits for the flying timer, >>>> net->count==0 at this point and all sockets in this netns are already >>>> gone, so I don't know how a timer could be still fired after this. >>> >>> >>> One thing that I noticed is that use-after-free always happens in >>> tw_timer_handler, but never in timer code. This suggests that: >>> 1. Whoever frees the struct waits for the timer mutex unlock. >>> 2. Free happens almost at the same time as timer fires (otherwise the >>> timer would probably be cancelled). >>> >>> That could happen if there is a race in timer code during cancellation >>> or rearming. I understand that it's heavily stressed code, but who >>> knows... >> >> Good point! I think this could happen since timer is deactivated before >> unlinking the tw sock, so another CPU could find it and rearm the timer >> during such window? >> >>> >>> I still wonder if twsk_net(tw)->count==0 is the right condition. This >>> net may not yet have been scheduled for destruction, but another task >>> could pick it up and start destroying... >> >> Good question. net_exit_list is just a snapshot of the cleanup_list, so >> it is true that inet_twsk_purge() could purge more netns'es than in >> net_exit_list, but I don't see any problem with this, the work later can't >> find the same twsck again since it is unlinked from hashtable. >> Do you see otherwise? > > No, I don't see otherwise. But it's just something suspicious in the > context of an elusive race that happens under load. > >>> Cong, do you have any ideas as to what debugging checks I could add to >>> help track this down? >> >> I don't have any theory for the cause of this bug. 
Your point above actually >> makes sense for me. Maybe you can add some code in between the following >> code: >> >> if (del_timer_sync(&tw->tw_timer)) >> inet_twsk_kill(tw); >> >> to verify is the timer is rearmed or not. > > Now looking at the reports more closely, I think that debug info is > off-by-one, and the use-after-free happens on the net object on this > line: __NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED). > > And this report is actually all correct: > > BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0 > net/ipv4/inet_timewait_sock.c:149 at addr ffff88018063a460 > Read of size 8 by task swapper/1/0 > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.0-rc8+ #65 > Hardware name: Google Google Compute Engine/Google Compute Engine, > BIOS Google 01/01/2011 > Call Trace: > <IRQ> > __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 > tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149 > call_timer_fn+0x241/0x820 kernel/time/timer.c:1308 > expire_timers kernel/time/timer.c:1348 [inline] > __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642 > run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655 > __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 > invoke_softirq kernel/softirq.c:364 [inline] > irq_exit+0x1cc/0x200 kernel/softirq.c:405 > exiting_irq arch/x86/include/asm/apic.h:658 [inline] > smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961 > apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 > > Object at ffff88018063a240, in cache net_namespace size: 6784 > > Allocated: > PID = 3270 > kmem_cache_zalloc include/linux/slab.h:626 [inline] > net_alloc net/core/net_namespace.c:339 [inline] > copy_net_ns+0x196/0x530 net/core/net_namespace.c:379 > create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 > copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164 > copy_process.part.42+0x20c6/0x5fd0 kernel/fork.c:1664 > copy_process kernel/fork.c:1486 [inline] > _do_fork+0x200/0xff0 kernel/fork.c:1942 > SYSC_clone 
kernel/fork.c:2052 [inline] > SyS_clone+0x37/0x50 kernel/fork.c:2046 > do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280 > return_from_SYSCALL_64+0x0/0x7a > > Freed: > PID = 8056 > kmem_cache_free+0x71/0x240 mm/slab.c:3762 > net_free+0xd7/0x110 net/core/net_namespace.c:355 > net_drop_ns+0x31/0x40 net/core/net_namespace.c:362 > cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:479 > process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098 > worker_thread+0x223/0x1990 kernel/workqueue.c:2232 > kthread+0x326/0x3f0 kernel/kthread.c:227 > ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > > The read is of size 8, but read of tw->tw_kill should be 4. > We could also double check offset within the object and check if it > matches what __NET_INC_STATS(twsk_net(tw)) reads. > > So the net is somehow freed while the timer is active. I suspect a race between inet_twsk_deschedule_put and inet_twsk_reschedule, but I don't see any bugs. Can inet_twsk_deschedule_put run concurrently with inet_twsk_schedule (not the reschedule)? That might explain the bug, but that's probably not possible, right? Staring at the code, this part looks wrong: 960 static inline int 961 __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only) 962 { ... 975 if (timer_pending(timer)) { 976 if (timer->expires == expires) 977 return 1; ... 985 base = lock_timer_base(timer, &flags); 987 clk = base->clk; 988 idx = calc_wheel_index(expires, clk); 995 if (idx == timer_get_idx(timer)) { 996 timer->expires = expires; 997 ret = 1; 998 goto out_unlock; 999 } We check timer_pending before taking the lock, but do not re-check it after taking the lock. So we can merely update timer->expires for a non-pending timer and fail to requeue it. That's fine for pending_only (as the timer has already fired), but not for !pending_only. Am I missing something? But it still does not explain the original bug. ^ permalink raw reply [flat|nested] 28+ messages in thread
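[Editor's note: the suspected window can be modeled in userspace. A minimal sketch under stated assumptions — all names here (toy_timer, toy_mod_timer, toy_wheel_index) are hypothetical stand-ins, not kernel APIs:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the suspected __mod_timer() flaw: the pending check is
 * done before the base "lock" and never repeated afterwards, so a timer
 * that fires in between only gets a new expires value and is never
 * requeued onto the wheel. */
struct toy_timer {
	bool pending;          /* still linked into a wheel bucket? */
	unsigned long expires;
	int idx;               /* wheel bucket the timer was queued in */
};

/* Stand-in for calc_wheel_index(): 8 ticks per bucket. */
static int toy_wheel_index(unsigned long expires)
{
	return (int)(expires / 8);
}

/* Models the rearm path: returns 1 when the timer is treated as pending. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	if (t->pending && t->expires == expires)
		return 1;               /* unlocked fast path (line 975) */

	/* ...the real code takes the timer base lock here... */

	if (toy_wheel_index(expires) == t->idx) {
		/* Suspicious path (lines 995-998): expires is updated even
		 * if the timer stopped being pending after the check above,
		 * so it is never put back on the wheel. */
		t->expires = expires;
		return 1;
	}
	t->pending = true;              /* requeue path */
	t->idx = toy_wheel_index(expires);
	t->expires = expires;
	return 0;
}
```

If the timer fires on another CPU after the pending check but before the lock (modeled by clearing `pending` between the two steps), rearming to an expiry in the same bucket returns 1 yet leaves the timer queued nowhere — the "updated but not requeued" scenario described in the message.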
end of thread, other threads:[~2017-02-22 21:15 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-23 10:19 net: use-after-free in tw_timer_handler Dmitry Vyukov 2017-01-23 10:23 ` Dmitry Vyukov 2017-01-24 14:28 ` Eric Dumazet 2017-01-24 15:06 ` Dmitry Vyukov 2017-01-24 15:52 ` Eric Dumazet 2017-02-08 17:36 ` Dmitry Vyukov 2017-02-08 17:58 ` Eric Dumazet 2017-02-08 18:55 ` Dmitry Vyukov 2017-02-08 19:17 ` Eric Dumazet 2017-02-08 19:32 ` Dmitry Vyukov 2017-02-14 19:38 ` Dmitry Vyukov 2017-02-21 11:27 ` [PATCH] net/dccp: fix use after free in tw_timer_handler() Andrey Ryabinin 2017-02-21 11:56 ` Dmitry Vyukov 2017-02-22 6:48 ` Dmitry Vyukov 2017-02-21 13:43 ` Arnaldo Carvalho de Melo 2017-02-21 13:53 ` Eric Dumazet 2017-02-21 18:23 ` David Miller 2017-02-22 8:59 ` Andrey Ryabinin 2017-02-21 18:23 ` David Miller 2017-02-21 18:24 ` David Miller 2017-02-22 9:35 ` Andrey Ryabinin 2017-02-22 9:35 ` [PATCH v2] " Andrey Ryabinin 2017-02-22 21:15 ` David Miller 2017-02-17 18:51 ` net: use-after-free in tw_timer_handler Cong Wang 2017-02-17 20:36 ` Dmitry Vyukov 2017-02-17 22:30 ` Cong Wang 2017-02-21 9:46 ` Dmitry Vyukov 2017-02-21 10:40 ` Dmitry Vyukov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).