From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751296AbeBBFqn (ORCPT );
	Fri, 2 Feb 2018 00:46:43 -0500
Received: from zeniv.linux.org.uk ([195.92.253.2]:47960 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750704AbeBBFqh (ORCPT );
	Fri, 2 Feb 2018 00:46:37 -0500
Date: Fri, 2 Feb 2018 05:46:26 +0000
From: Al Viro
To: Eric Biggers
Cc: syzbot , akpm@linux-foundation.org, aneesh.kumar@linux.vnet.ibm.com,
	dan.j.williams@intel.com, james.morse@arm.com,
	kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, mingo@kernel.org, syzkaller-bugs@googlegroups.com
Subject: Re: possible deadlock in get_user_pages_unlocked
Message-ID: <20180202054626.GG30522@ZenIV.linux.org.uk>
References: <001a113f6344393d89056430347d@google.com>
	<20180202045020.GF30522@ZenIV.linux.org.uk>
	<20180202053502.GB949@zzz.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180202053502.GB949@zzz.localdomain>
User-Agent: Mutt/1.9.1 (2017-09-22)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:

> Try starting up multiple instances of the program; that sometimes helps with
> these races that are hard to hit (since you may e.g. have a different number of
> CPUs than syzbot used). If I start up 4 instances I see the lockdep splat after
> around 2-5 seconds.

5 instances in parallel, 10 minutes into the run...

> This is on latest Linus tree (4bf772b1467). Also note the
> reproducer uses KVM, so if you're running it in a VM it will only work if you've
> enabled nested virtualization on the host (kvm_intel.nested=1).

cat /sys/module/kvm_amd/parameters/nested
1

on host

> Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
> get_user_page_nowait() to get_user_pages_unlocked()").
That simply prevents this reproducer from hitting get_user_pages_unlocked();
it goes through grab mmap_sem/get_user_pages()/drop mmap_sem instead.  I.e.
the revert does not allow __get_user_pages_locked() to drop/regain ->mmap_sem.
The bug may be in the way we call get_user_pages_unlocked() in that commit,
but it might just as easily be a bug in __get_user_pages_locked() exposed by
that reproducer somehow.
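
To make the difference between the two call patterns concrete, here is a
toy userspace model, not kernel code. Everything prefixed with toy_ is
hypothetical: the "semaphore" is just a pair of counters, and the "core"
only imitates the one property discussed above, namely that it may
drop/regain the lock when it is handed a non-NULL locked pointer and never
does so when handed NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for ->mmap_sem: tracks read-side holders only. */
struct toy_sem {
	int readers;       /* current number of read holders */
	int acquisitions;  /* total number of toy_down_read() calls */
};

static void toy_down_read(struct toy_sem *s)
{
	s->readers++;
	s->acquisitions++;
}

static void toy_up_read(struct toy_sem *s)
{
	s->readers--;
}

/*
 * Hypothetical model of the fault-in core.  With a non-NULL 'locked'
 * it is allowed to drop the lock across the blocking part and regain
 * it before retrying; with NULL it never releases the lock.
 */
static int toy_gup_core(struct toy_sem *s, bool need_io, bool *locked)
{
	if (need_io && locked != NULL) {
		toy_up_read(s);    /* drop while "waiting for I/O" */
		*locked = false;
		/* a writer could get in here in the real thing */
		toy_down_read(s);  /* regain before the retry */
		*locked = true;
	}
	return 1;                  /* pretend one page got pinned */
}

/* Pattern with the revert: caller holds the lock across the call. */
static int toy_pattern_caller_locked(struct toy_sem *s, bool need_io)
{
	int ret;

	toy_down_read(s);
	ret = toy_gup_core(s, need_io, NULL);
	toy_up_read(s);
	return ret;
}

/* Pattern without the revert: helper locks and permits drop/regain. */
static int toy_pattern_unlocked(struct toy_sem *s, bool need_io)
{
	bool locked = true;
	int ret;

	toy_down_read(s);
	ret = toy_gup_core(s, need_io, &locked);
	if (locked)
		toy_up_read(s);
	return ret;
}
```

Running both patterns with need_io set on a zeroed toy_sem, the first
takes the lock once and the second takes it twice; both leave it released.
That second acquisition is the window the revert closes, which is why the
splat going away does not by itself say where the bug is.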