From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-vk0-f69.google.com (mail-vk0-f69.google.com [209.85.213.69])
        by kanga.kvack.org (Postfix) with ESMTP id 3F7A16B0033
        for ; Mon, 6 Feb 2017 14:13:59 -0500 (EST)
Received: by mail-vk0-f69.google.com with SMTP id k127so41599559vke.7
        for ; Mon, 06 Feb 2017 11:13:59 -0800 (PST)
Received: from mail-ua0-x229.google.com (mail-ua0-x229.google.com. [2607:f8b0:400c:c08::229])
        by mx.google.com with ESMTPS id b186si430601vka.65.2017.02.06.11.13.57
        for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 06 Feb 2017 11:13:57 -0800 (PST)
Received: by mail-ua0-x229.google.com with SMTP id y9so68301644uae.2
        for ; Mon, 06 Feb 2017 11:13:57 -0800 (PST)
MIME-Version: 1.0
In-Reply-To:
References:
From: Dmitry Vyukov
Date: Mon, 6 Feb 2017 20:13:35 +0100
Message-ID:
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc
Content-Type: text/plain; charset=UTF-8
Sender: owner-linux-mm@kvack.org
List-ID:
To: Vlastimil Babka
Cc: Tejun Heo, Christoph Lameter, "linux-mm@kvack.org", LKML,
        Thomas Gleixner, Ingo Molnar, Peter Zijlstra, syzkaller,
        Mel Gorman, Michal Hocko, Andrew Morton

On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
>> On 29.1.2017 13:44, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I've got the following deadlock report while running syzkaller fuzzer
>>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>>>
>>> [ INFO: possible circular locking dependency detected ]
>>> 4.10.0-rc5-next-20170125 #1 Not tainted
>>> -------------------------------------------------------
>>> syz-executor3/14255 is trying to acquire lock:
>>>  (cpu_hotplug.dep_map){++++++}, at: []
>>> get_online_cpus+0x37/0x90 kernel/cpu.c:239
>>>
>>> but task is already holding lock:
>>>  (pcpu_alloc_mutex){+.+.+.}, at: []
>>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
>>>
>>> which lock already depends on the new lock.
>>
>> I suspect the dependency comes from recent changes in drain_all_pages(). They
>> were later redone (for other reasons, but nice to have another validation) in
>> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could
>> you try if it helps?
>
> It happened only once on linux-next, so I can't verify the fix. But I
> will watch out for other occurrences.

Unfortunately it does not seem to help.
Fuzzer now runs on 510948533b059f4f5033464f9f4a0c32d4ab0c08 of
mmotm/auto-latest
(git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git):

commit 510948533b059f4f5033464f9f4a0c32d4ab0c08
Date:   Thu Feb 2 10:08:47 2017 +0100

    mmotm: userfaultfd-non-cooperative-add-event-for-memory-unmaps-fix

The commit you referenced is already there:

commit 806b158031ca0b4714e775898396529a758ebc2c
Date:   Thu Feb 2 08:53:16 2017 +0100

    mm, page_alloc: use static global work_struct for draining per-cpu pages

But I still got:

[ INFO: possible circular locking dependency detected ]
4.9.0 #6 Not tainted
-------------------------------------------------------
syz-executor1/8199 is trying to acquire lock:
 (cpu_hotplug.dep_map){++++++}, at: [] get_online_cpus+0x37/0x90 kernel/cpu.c:246

but task is already holding lock:
 (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

[ 403.953319] [] validate_chain kernel/locking/lockdep.c:2265 [inline]
[ 403.953319] [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[ 403.961232] [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[ 403.968788] [] __mutex_lock_common kernel/locking/mutex.c:521 [inline]
[ 403.968788] [] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
[ 403.976782] [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896
[ 403.984266] [] __alloc_percpu+0x24/0x30 mm/percpu.c:1075
[ 403.991873] [] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44
[ 403.999799] [] cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136
[ 404.008253] [] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493
[ 404.016365] [] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057
[ 404.023507] [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[ 404.030647] [] cpu_up+0x18/0x20 kernel/cpu.c:1095
[ 404.037523] [] smp_init+0xe9/0xee kernel/smp.c:564
[ 404.044559] [] kernel_init_freeable+0x439/0x690 init/main.c:1010
[ 404.052811] [] kernel_init+0x13/0x180 init/main.c:941
[ 404.060198] [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

[ 404.072827] [] validate_chain kernel/locking/lockdep.c:2265 [inline]
[ 404.072827] [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[ 404.080733] [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[ 404.088311] [] __mutex_lock_common kernel/locking/mutex.c:521 [inline]
[ 404.088311] [] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
[ 404.096318] [] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304
[ 404.104321] [] _cpu_up+0xca/0x2a0 kernel/cpu.c:1011
[ 404.111357] [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[ 404.118480] [] cpu_up+0x18/0x20 kernel/cpu.c:1095
[ 404.125360] [] smp_init+0xe9/0xee kernel/smp.c:564
[ 404.132393] [] kernel_init_freeable+0x439/0x690 init/main.c:1010
[ 404.140668] [] kernel_init+0x13/0x180 init/main.c:941
[ 404.148079] [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

[ 404.160977] [] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[ 404.160977] [] check_prevs_add+0xa8d/0x1c00 kernel/locking/lockdep.c:1938
[ 404.168898] [] validate_chain kernel/locking/lockdep.c:2265 [inline]
[ 404.168898] [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[ 404.176844] [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[ 404.184416] [] get_online_cpus+0x62/0x90 kernel/cpu.c:248
[ 404.192103] [] drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385
[ 404.199880] [] __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline]
[ 404.199880] [] __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778
[ 404.208406] [] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980
[ 404.216851] [] __alloc_pages include/linux/gfp.h:426 [inline]
[ 404.216851] [] __alloc_pages_node include/linux/gfp.h:439 [inline]
[ 404.216851] [] alloc_pages_node include/linux/gfp.h:453 [inline]
[ 404.216851] [] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
[ 404.216851] [] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
[ 404.225015] [] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998
[ 404.232482] [] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062
[ 404.240389] [] bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline]
[ 404.240389] [] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99
[ 404.248224] [] find_and_alloc_map kernel/bpf/syscall.c:34 [inline]
[ 404.248224] [] map_create kernel/bpf/syscall.c:188 [inline]
[ 404.248224] [] SYSC_bpf kernel/bpf/syscall.c:870 [inline]
[ 404.248224] [] SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827
[ 404.255434] [] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

Chain exists of:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcpu_alloc_mutex);
                               lock(cpu_hotplug.lock);
                               lock(pcpu_alloc_mutex);
  lock(cpu_hotplug.dep_map);

 *** DEADLOCK ***

2 locks held by syz-executor1/8199:
 #0:  (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896
 #1:  (pcpu_drain_mutex){+.+...}, at: [] drain_all_pages+0xd7/0x710 mm/page_alloc.c:2375

stack backtrace:
CPU: 0 PID: 8199 Comm: syz-executor1 Not tainted 4.9.0 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff88017ea4e118 ffffffff8234d0df ffffffff00000000 1ffff1002fd49bb6
 ffffed002fd49bae 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
 ffffffff84b00510 ffffffff81560170 ffff88018ab02200 0000000041b58ab3
Call Trace:
 [] __dump_stack lib/dump_stack.c:15 [inline]
 [] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 [] print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
 [] check_prev_add kernel/locking/lockdep.c:1828 [inline]
 [] check_prevs_add+0xa8d/0x1c00 kernel/locking/lockdep.c:1938
 [] validate_chain kernel/locking/lockdep.c:2265 [inline]
 [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
 [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
 [] get_online_cpus+0x62/0x90 kernel/cpu.c:248
 [] drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385
 [] __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline]
 [] __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778
 [] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980
 [] __alloc_pages include/linux/gfp.h:426 [inline]
 [] __alloc_pages_node include/linux/gfp.h:439 [inline]
 [] alloc_pages_node include/linux/gfp.h:453 [inline]
 [] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
 [] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
 [] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998
 [] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062
 [] bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline]
 [] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99
 [] find_and_alloc_map kernel/bpf/syscall.c:34 [inline]
 [] map_create kernel/bpf/syscall.c:188 [inline]
 [] SYSC_bpf kernel/bpf/syscall.c:870 [inline]
 [] SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
syz-executor1: page allocation failure: order:0, mode:0x14001c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_COLD), nodemask=(null)
syz-executor1 cpuset=/ mems_allowed=0
CPU: 0 PID: 8199 Comm: syz-executor1 Not tainted 4.9.0 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff88017ea4eb80 ffffffff8234d0df ffffffff00000000 1ffff1002fd49d03
 ffffed002fd49cfb 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
 0000000000000282 ffffffff84fd53c0 ffff8801dae65b38 ffff88017ea4e7b8
Call Trace:
 [] __dump_stack lib/dump_stack.c:15 [inline]
 [] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 [] warn_alloc+0x21f/0x360 mm/page_alloc.c:3126
 [] __alloc_pages_slowpath+0x1c98/0x2370 mm/page_alloc.c:3890
 [] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980
 [] __alloc_pages include/linux/gfp.h:426 [inline]
 [] __alloc_pages_node include/linux/gfp.h:439 [inline]
 [] alloc_pages_node include/linux/gfp.h:453 [inline]
 [] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
 [] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
 [] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998
 [] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062
 [] bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline]
 [] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99
 [] find_and_alloc_map kernel/bpf/syscall.c:34 [inline]
 [] map_create kernel/bpf/syscall.c:188 [inline]
 [] SYSC_bpf kernel/bpf/syscall.c:870 [inline]
 [] SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
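
As a rough illustration of the ordering problem in the report above, the lock-order edges can be read off the three stack traces and the "locks held" list, then checked for a cycle. This is only a toy model in Python, not how lockdep is implemented. The edge list is an interpretation of the traces; in particular, the cpu_hotplug.dep_map -> cpu_hotplug.lock edge assumes that cpu_hotplug_begin() takes the dep_map annotation before the mutex. EDGES and find_cycle() are hypothetical names used only for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Each edge (A, B) means "B was acquired while A was held".
# ASSUMPTION: these edges are an interpretation of the stack traces in the
# report, not output produced by lockdep itself.
EDGES: List[Tuple[str, str]] = [
    # cpu_hotplug_begin() during _cpu_up() (assumed acquisition order)
    ("cpu_hotplug.dep_map", "cpu_hotplug.lock"),
    # smpcfd_prepare_cpu() -> __alloc_percpu() runs from the hotplug path
    ("cpu_hotplug.lock", "pcpu_alloc_mutex"),
    # pcpu_populate_chunk() -> ... -> drain_all_pages() under pcpu_alloc_mutex
    ("pcpu_alloc_mutex", "pcpu_drain_mutex"),
    # drain_all_pages() -> get_online_cpus() under pcpu_drain_mutex
    ("pcpu_drain_mutex", "cpu_hotplug.dep_map"),
]


def find_cycle(edges: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Return one cycle in the lock-order graph as a list of lock names
    (first name repeated at the end), or None if the graph is acyclic."""
    graph: Dict[str, List[str]] = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    ON_STACK, DONE = 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = ON_STACK
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == ON_STACK:
                # Back edge: the current path from nxt onwards closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if node not in state:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    cycle = find_cycle(EDGES)
    print(" -> ".join(cycle) if cycle else "no cycle")
```

Dropping the last edge (the get_online_cpus() call made from drain_all_pages() while pcpu_alloc_mutex is held) leaves the graph acyclic in this model, which is consistent with the report pointing at that call as the new dependency.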