From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71])
	by kanga.kvack.org (Postfix) with ESMTP id 807546B0009
	for ; Fri, 9 Mar 2018 23:15:15 -0500 (EST)
Received: by mail-pg0-f71.google.com with SMTP id v8so4735715pgs.9
	for ; Fri, 09 Mar 2018 20:15:15 -0800 (PST)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
	by mx.google.com with SMTPS id 2-v6sor1080010plc.26.2018.03.09.20.15.13
	for (Google Transport Security);
	Fri, 09 Mar 2018 20:15:14 -0800 (PST)
Date: Fri, 9 Mar 2018 20:15:05 -0800
From: Eric Biggers
Subject: Re: possible deadlock in get_user_pages_unlocked
Message-ID: <20180310041505.GA598@zzz.localdomain>
References: <001a113f6344393d89056430347d@google.com>
	<20180202045020.GF30522@ZenIV.linux.org.uk>
	<20180202053502.GB949@zzz.localdomain>
	<20180202054626.GG30522@ZenIV.linux.org.uk>
	<20180202062037.GH30522@ZenIV.linux.org.uk>
	<20180210013640.GN30522@ZenIV.linux.org.uk>
	<20180210031925.GA1041@zzz.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180210031925.GA1041@zzz.localdomain>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Al Viro
Cc: Dmitry Vyukov, syzbot, Andrew Morton, "Aneesh Kumar K.V",
	Dan Williams, James Morse, "Kirill A. Shutemov", Andrea Arcangeli,
	LKML, Linux-MM, Ingo Molnar, syzkaller-bugs@googlegroups.com

On Fri, Feb 09, 2018 at 07:19:25PM -0800, Eric Biggers wrote:
> Hi Al,
>
> On Sat, Feb 10, 2018 at 01:36:40AM +0000, Al Viro wrote:
> > On Fri, Feb 02, 2018 at 09:57:27AM +0100, Dmitry Vyukov wrote:
> >
> > > syzbot tests for up to 5 minutes. However, if there is a race involved
> > > then you may need more time because the crash is probabilistic.
> > > But from what I see most of the time, if one can't reproduce it
> > > easily, it's usually due to some differences in setup that just don't
> > > allow the crash to happen at all.
> > > FWIW syzbot re-runs each reproducer on a freshly booted dedicated VM
> > > and what it provided is the kernel output it got during run of the
> > > provided program. So we have reasonably high assurance that this
> > > reproducer worked in at least one setup.
> >
> > Could you guys check if the following fixes the reproducer?
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 61015793f952..058a9a8e4e2e 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -861,6 +861,9 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> >  		BUG_ON(*locked != 1);
> >  	}
> >
> > +	if (flags & FOLL_NOWAIT)
> > +		locked = NULL;
> > +
> >  	if (pages)
> >  		flags |= FOLL_GET;
> >
>
> Yes that fixes the reproducer for me.
>

Just to follow up on this: it seems that Al's suggested fix didn't go anywhere,
but someone else eventually ran into this bug (which was a real deadlock) and a
slightly different fix was merged, commit 96312e61282ae. It fixes the
reproducer for me too. Telling syzbot so that it can close the bug:

#syz fix: mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT

- Eric