From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wj0-f199.google.com (mail-wj0-f199.google.com [209.85.210.199])
	by kanga.kvack.org (Postfix) with ESMTP id 654426B0033
	for ; Tue, 7 Feb 2017 15:19:59 -0500 (EST)
Received: by mail-wj0-f199.google.com with SMTP id kq3so28137939wjc.1
	for ; Tue, 07 Feb 2017 12:19:59 -0800 (PST)
Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com. [74.125.82.68])
	by mx.google.com with ESMTPS id z11si318730wmh.2.2017.02.07.12.19.58
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Tue, 07 Feb 2017 12:19:58 -0800 (PST)
Received: by mail-wm0-f68.google.com with SMTP id v77so30317268wmv.0
	for ; Tue, 07 Feb 2017 12:19:58 -0800 (PST)
From: Michal Hocko
Subject: [PATCH] mm, page_alloc: do not depend on cpu hotplug locks inside the allocator
Date: Tue, 7 Feb 2017 21:19:50 +0100
Message-Id: <20170207201950.20482-1-mhocko@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Jesper Dangaard Brouer , Vlastimil Babka , Hillf Danton ,
	linux-mm@kvack.org, LKML , Michal Hocko , Dmitry Vyukov ,
	Mel Gorman , Tejun Heo

From: Michal Hocko

Dmitry has reported the following lockdep splat

[] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[] __mutex_lock_common kernel/locking/mutex.c:521 [inline]
[] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
[] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896
[] __alloc_percpu+0x24/0x30 mm/percpu.c:1075
[] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44
[] cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136
[] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493
[] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057
[] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[] cpu_up+0x18/0x20 kernel/cpu.c:1095
[] smp_init+0xe9/0xee kernel/smp.c:564
[] kernel_init_freeable+0x439/0x690 init/main.c:1010
[] kernel_init+0x13/0x180 init/main.c:941
[] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

cpu_hotplug_begin
	cpu_hotplug.lock
pcpu_alloc
	pcpu_alloc_mutex

[] get_online_cpus+0x62/0x90 kernel/cpu.c:248
[] drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385
[] __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline]
[] __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778
[] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980
[] __alloc_pages include/linux/gfp.h:426 [inline]
[] __alloc_pages_node include/linux/gfp.h:439 [inline]
[] alloc_pages_node include/linux/gfp.h:453 [inline]
[] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
[] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
[] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998
[] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062
[] bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline]
[] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99
[] find_and_alloc_map kernel/bpf/syscall.c:34 [inline]
[] map_create kernel/bpf/syscall.c:188 [inline]
[] SYSC_bpf kernel/bpf/syscall.c:870 [inline]
[] SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827
[] entry_SYSCALL_64_fastpath+0x1f/0xc2

pcpu_alloc
	pcpu_alloc_mutex
drain_all_pages
	get_online_cpus
		cpu_hotplug.lock

[] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304
[] _cpu_up+0xca/0x2a0 kernel/cpu.c:1011
[] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[] cpu_up+0x18/0x20 kernel/cpu.c:1095
[] smp_init+0xe9/0xee kernel/smp.c:564
[] kernel_init_freeable+0x439/0x690 init/main.c:1010
[] kernel_init+0x13/0x180 init/main.c:941
[] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

cpu_hotplug_begin
	cpu_hotplug.lock

Pulling cpu hotplug locks inside the page allocator is just too
dangerous. Let's remove the dependency by dropping get_online_cpus()
from drain_all_pages.
This is not so simple, though, because we no longer have any protection
against cpu hotplug, which means two things:
	- the work item might be executed on a different cpu by a worker
	  from an unbound pool, so it does not run pinned to the cpu
	- we have to make sure that we do not race with page_alloc_cpu_dead
	  calling drain_pages_zone

Disabling preemption in drain_local_pages_wq solves the first problem:
drain_local_pages will determine its local CPU from the WQ context,
which will be stable after that point. page_alloc_cpu_dead is pinned to
the CPU already. The latter condition is achieved by disabling IRQs in
drain_pages_zone.

Fixes: mm, page_alloc: drain per-cpu pages from workqueue context
Reported-by: Dmitry Vyukov
Acked-by: Tejun Heo
Acked-by: Mel Gorman
Signed-off-by: Michal Hocko
---
 mm/page_alloc.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c3358d4f7932..b6411816787a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone)
 
 static void drain_local_pages_wq(struct work_struct *work)
 {
+	/*
+	 * drain_all_pages doesn't use proper cpu hotplug protection so
+	 * we can race with cpu offline when the WQ can move this from
+	 * a cpu pinned worker to an unbound one. We can operate on a different
+	 * cpu which is all right but we also have to make sure to not move to
+	 * a different one.
+	 */
+	preempt_disable();
 	drain_local_pages(NULL);
+	preempt_enable();
 }
 
 /*
@@ -2379,12 +2388,6 @@ void drain_all_pages(struct zone *zone)
 	}
 
 	/*
-	 * As this can be called from reclaim context, do not reenter reclaim.
-	 * An allocation failure can be handled, it's simply slower
-	 */
-	get_online_cpus();
-
-	/*
 	 * We don't care about racing with CPU hotplug event
 	 * as offline notification will cause the notified
 	 * cpu to drain that CPU pcps and on_each_cpu_mask
@@ -2423,7 +2426,6 @@ void drain_all_pages(struct zone *zone)
 	for_each_cpu(cpu, &cpus_with_pcps)
 		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
 
-	put_online_cpus();
 	mutex_unlock(&pcpu_drain_mutex);
 }
 
-- 
2.11.0
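
For readers who want to see why lockdep complains, the lock chains from the splat can be modelled in userspace as a tiny "held before" graph. This is only an illustrative sketch, not kernel code and not how lockdep is implemented: the enum, struct lock_graph and every function name below are made up for this example, and the two lock classes simply mirror cpu_hotplug.lock and pcpu_alloc_mutex as named in the report. The drain_takes_hotplug_lock flag stands in for whether drain_all_pages still calls get_online_cpus().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the splat. */
enum lock_class {
	CPU_HOTPLUG_LOCK,	/* cpu_hotplug.lock */
	PCPU_ALLOC_MUTEX,	/* pcpu_alloc_mutex */
	NR_LOCK_CLASSES
};

/* held_before[a][b] is true when b was acquired while a was held. */
struct lock_graph {
	bool held_before[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void record_dependency(struct lock_graph *g,
			      enum lock_class held, enum lock_class acquired)
{
	g->held_before[held][acquired] = true;
}

/* Depth-first search: is 'target' reachable from 'from' along recorded edges? */
static bool reaches(const struct lock_graph *g, int from, int target,
		    bool *visited)
{
	if (from == target)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++)
		if (g->held_before[from][next] &&
		    reaches(g, next, target, visited))
			return true;
	return false;
}

/* A cycle exists if some edge a -> b can be followed back from b to a. */
static bool graph_has_cycle(const struct lock_graph *g)
{
	for (int a = 0; a < NR_LOCK_CLASSES; a++) {
		for (int b = 0; b < NR_LOCK_CLASSES; b++) {
			bool visited[NR_LOCK_CLASSES] = { false };

			if (g->held_before[a][b] && reaches(g, b, a, visited))
				return true;
		}
	}
	return false;
}

/*
 * Record the chains from the report. The flag is an assumption of this
 * sketch: it models get_online_cpus() being present in drain_all_pages().
 */
static void record_reported_chains(struct lock_graph *g,
				   bool drain_takes_hotplug_lock)
{
	/* _cpu_up: cpu_hotplug_begin holds cpu_hotplug.lock, pcpu_alloc takes pcpu_alloc_mutex */
	record_dependency(g, CPU_HOTPLUG_LOCK, PCPU_ALLOC_MUTEX);

	/* bpf path: pcpu_alloc holds pcpu_alloc_mutex, reclaim reaches get_online_cpus */
	if (drain_takes_hotplug_lock)
		record_dependency(g, PCPU_ALLOC_MUTEX, CPU_HOTPLUG_LOCK);
}
```

Calling record_reported_chains() with the flag set and then graph_has_cycle() reports the inversion; with the flag cleared, which corresponds to dropping get_online_cpus() from drain_all_pages, only the cpu_hotplug.lock -> pcpu_alloc_mutex edge remains and no cycle is found.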