Date: Mon, 2 May 2016 15:14:02 +0300
From: "Kirill A. Shutemov"
To: Jerome Glisse, Oleg Nesterov, Hugh Dickins, Linus Torvalds, Andrew Morton
Cc: Andrea Arcangeli, Alex Williamson, kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org, "linux-mm@kvack.org"
Subject: GUP guarantees wrt to userspace mappings redesign
Message-ID: <20160502121402.GB23305@node.shutemov.name>
In-Reply-To: <20160502111513.GA4079@gmail.com>

On Mon, May 02, 2016 at 01:15:13PM +0200, Jerome Glisse wrote:
> On Mon, May 02, 2016 at 01:41:19PM +0300, Kirill A. Shutemov wrote:
> > Another thing I would like to discuss is whether there's a problem on
> > the vfio side. To me it looks like vfio expects a guarantee from
> > get_user_pages() that it doesn't provide: obtaining a pin on the page
> > doesn't guarantee that the page is going to remain mapped into
> > userspace until the pin is gone.
> >
> > Even with the THP COW regression fixed, vfio would stay fragile: any
> > MADV_DONTNEED/fork()/mremap()/whatever would break vfio's
> > expectation.
>
> Well, I don't think that is a fair/accurate assessment of
> get_user_pages(): the page must remain mapped to the same virtual
> address until the pin is gone. I am ignoring mremap() as it is a
> conscious decision from userspace, and while the virtual address changes
> in that case, the pinned page behind it should move with the mapping.
> Same for MADV_DONTNEED. I agree that get_user_pages() is broken after
> fork(), but this has been the case since the dawn of time, so it is
> something expected.
>
> Even leaving vfio aside, direct-io has been expecting this kind of
> behavior for a long time, so I see this as part of the get_user_pages()
> guarantee.
>
> Concerning vfio, not providing this guarantee will break countless
> workloads. Things like qemu/kvm allocate anonymous memory and hand it
> over to the guest kernel, which presents it as memory. Now a device
> driver inside the guest kernel needs to get a bus mapping for a given
> (guest) page, which from the host point of view means a mapping from an
> anonymous page to a bus mapping. But for the guest to keep accessing the
> same page, the anonymous mapping (i.e. a specific virtual address on the
> host side) must keep pointing to the same page. This has been the case
> with get_user_pages() until now, so whether we like it or not we must
> keep that guarantee.
>
> This kind of workload knows that it can't do mremap()/fork()/... and
> keep that guarantee, but it does expect the existing guarantee, and I
> don't think we can break that.

Quick look around:

 - I don't see any page_count() check around __replace_page() in
   uprobes, so it can easily replace a pinned page.

 - KSM has the page_count() check, but there's still a race wrt GUP_fast:
   GUP_fast can take the pin between the check and the establishment of
   the new pte entry.

 - khugepaged: the same story as with KSM.
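To make the KSM/khugepaged window concrete, here is a hypothetical, single-threaded C replay of one interleaving. It is a simplified toy model, not kernel code: fake_page, fake_mm, fake_gup_fast(), fake_check_unpinned(), fake_replace() and race_leaves_pin_unmapped() are invented names, the refcount only stands in for page_count(), and the model assumes the mapping itself holds exactly one reference. It only illustrates that a check-then-replace sequence cannot see a lockless pin taken after the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model, not kernel code: fake_page stands in
 * for struct page and its refcount for page_count(). The mapping itself is
 * assumed to hold exactly one reference.
 */
struct fake_page {
    int refcount;                /* mapping's reference plus any pins */
};

/* A single "pte": the page one user virtual address currently maps. */
struct fake_mm {
    struct fake_page *pte;
};

/* Lockless GUP_fast-like pin: read the pte, then take a reference. */
static struct fake_page *fake_gup_fast(struct fake_mm *mm)
{
    struct fake_page *p = mm->pte;
    p->refcount++;
    return p;
}

/* KSM/khugepaged-like check: true if nobody holds an extra pin. */
static bool fake_check_unpinned(const struct fake_page *p)
{
    return p->refcount == 1;
}

/* Point the pte at the replacement page, moving the mapping's reference. */
static void fake_replace(struct fake_mm *mm, struct fake_page *newp)
{
    mm->pte->refcount--;
    mm->pte = newp;
    newp->refcount++;
}

/* When the lockless pin happens relative to check-then-replace. */
enum pin_timing { PIN_NONE, PIN_BEFORE_CHECK, PIN_AFTER_CHECK };

/*
 * Replay one interleaving step by step. Returns true if a page is still
 * pinned but the user address no longer maps it.
 */
bool race_leaves_pin_unmapped(enum pin_timing when)
{
    struct fake_page old_page = { .refcount = 1 };   /* mapped once */
    struct fake_page new_page = { .refcount = 0 };
    struct fake_mm mm = { .pte = &old_page };
    struct fake_page *pinned = NULL;

    if (when == PIN_BEFORE_CHECK)
        pinned = fake_gup_fast(&mm);
    if (!fake_check_unpinned(mm.pte))
        return false;                /* extra pin seen: replacement aborted */
    if (when == PIN_AFTER_CHECK)
        pinned = fake_gup_fast(&mm); /* window between check and pte update */

    fake_replace(&mm, &new_page);

    /* Only the pin, if any, still references old_page. */
    assert(old_page.refcount == (pinned != NULL ? 1 : 0));
    return pinned != NULL && pinned != mm.pte;
}
```

In this model, race_leaves_pin_unmapped(PIN_AFTER_CHECK) returns true (the page is still pinned but the user address now maps new_page), while PIN_BEFORE_CHECK and PIN_NONE return false: a check made before the pte update only catches pins that already exist.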
I don't see how we can deliver on the guarantee, especially with lockless
GUP_fast.

Or am I missing something important?

-- 
 Kirill A. Shutemov