From: Michal Hocko <mhocko@kernel.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Dmitry Vyukov <dvyukov@google.com>,
Vlastimil Babka <vbabka@suse.cz>, Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@linux.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
syzkaller <syzkaller@googlegroups.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc
Date: Tue, 7 Feb 2017 09:48:56 +0100
Message-ID: <20170207084855.GC5065@dhcp22.suse.cz>
In-Reply-To: <20170206220530.apvuknbagaf2rdlw@techsingularity.net>
On Mon 06-02-17 22:05:30, Mel Gorman wrote:
> On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
> > On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> > >> On 29.1.2017 13:44, Dmitry Vyukov wrote:
> > >>> Hello,
> > >>>
> > >>> I've got the following deadlock report while running syzkaller fuzzer
> > >>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
> > >>>
> > >>> [ INFO: possible circular locking dependency detected ]
> > >>> 4.10.0-rc5-next-20170125 #1 Not tainted
> > >>> -------------------------------------------------------
> > >>> syz-executor3/14255 is trying to acquire lock:
> > >>> (cpu_hotplug.dep_map){++++++}, at: [<ffffffff814271c7>]
> > >>> get_online_cpus+0x37/0x90 kernel/cpu.c:239
> > >>>
> > >>> but task is already holding lock:
> > >>> (pcpu_alloc_mutex){+.+.+.}, at: [<ffffffff81937fee>]
> > >>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
> > >>>
> > >>> which lock already depends on the new lock.
> > >>
> > >> I suspect the dependency comes from recent changes in drain_all_pages(). They
> > >> were later redone (for other reasons, but nice to have another validation) in
> > >> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could
> > >> you try if it helps?
> > >
> > > It happened only once on linux-next, so I can't verify the fix. But I
> > > will watch out for other occurrences.
> >
> > Unfortunately it does not seem to help.
>
> I'm a little stuck on how to best handle this. get_online_cpus() can
> halt forever if the hotplug operation is holding the mutex when calling
> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
> trylocks the mutex. However, given that drain is so unlikely to actually
> make that make a difference when racing against parallel allocations,
> I think this should be acceptable.
>
> Any objections?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3b93879990fd..a3192447e906 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> */
> if (!page && !drained) {
> unreserve_highatomic_pageblock(ac, false);
> - drain_all_pages(NULL);
> +
> + /*
> + * Only drain from contexts allocating for user allocations.
> + * Kernel allocations could be holding a CPU hotplug-related
> + * mutex, particularly hot-add allocating per-cpu structures
> + * while hotplug-related mutex's are held which would prevent
> + * get_online_cpus ever returning.
> + */
> + if (gfp_mask & __GFP_HARDWALL)
> + drain_all_pages(NULL);
> +
This wouldn't work AFAICS. If you look at the lockdep splat, the path
which reverses the locking order (takes pcpu_alloc_mutex prior to
cpu_hotplug.lock) is bpf_array_alloc_percpu, which is GFP_USER and thus
includes __GFP_HARDWALL.
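
To spell out the flag argument, here is a minimal user-space sketch, not
kernel code. The SK_* names and bit values are made up for illustration;
only the composition (GFP_USER being GFP_KERNEL plus __GFP_HARDWALL)
follows include/linux/gfp.h. sketch_would_drain() is a hypothetical
stand-in for the check the hunk adds in front of drain_all_pages().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in include/linux/gfp.h. */
typedef unsigned int gfp_sketch_t;

#define SK_GFP_IO        0x01u
#define SK_GFP_FS        0x02u
#define SK_GFP_RECLAIM   0x04u
#define SK_GFP_HARDWALL  0x08u

/* Same composition as the kernel's GFP_KERNEL and GFP_USER masks. */
#define SK_GFP_KERNEL (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_USER   (SK_GFP_KERNEL | SK_GFP_HARDWALL)

/* Hypothetical stand-in for the proposed "if (gfp_mask & __GFP_HARDWALL)". */
bool sketch_would_drain(gfp_sketch_t gfp_mask)
{
	return (gfp_mask & SK_GFP_HARDWALL) != 0;
}

/* Plain kernel allocations skip the drain, GFP_USER ones still reach it. */
void sketch_selfcheck(void)
{
	assert(!sketch_would_drain(SK_GFP_KERNEL));
	assert(sketch_would_drain(SK_GFP_USER));
}
```

So a GFP_USER allocation made with pcpu_alloc_mutex held would still
reach drain_all_pages(), and from there get_online_cpus(), even with
the check in place.
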
I believe we shouldn't pull any dependency on the hotplug locks inside
the allocator. This is just too fragile! Can we simply drop the
get_online_cpus()? Why do we need it, anyway? Say we are racing with
CPU offlining. I have to check the code, but my impression was that the WQ
code will ignore the CPU requested by the work item when that CPU is
going offline. If the offline happens while the worker function is already
executing, then it has to wait, since we run with preemption disabled, so we
should be safe here. Or am I missing something obvious?
--
Michal Hocko
SUSE Labs