From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD384C5DF60 for ; Wed, 6 Nov 2019 00:17:01 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4305B21D6C for ; Wed, 6 Nov 2019 00:17:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4305B21D6C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 4776Xq3nmNzF5HT for ; Wed, 6 Nov 2019 11:16:59 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=intel.com (client-ip=192.55.52.120; helo=mga04.intel.com; envelope-from=sean.j.christopherson@intel.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=intel.com Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4776FB6qJ5zDsTS for ; Wed, 6 Nov 2019 11:03:21 +1100 (AEDT) X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by 
fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Nov 2019 16:03:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.68,271,1569308400"; d="scan'208";a="200547335" Received: from sjchrist-coffee.jf.intel.com (HELO linux.intel.com) ([10.54.74.41]) by fmsmga008.fm.intel.com with ESMTP; 05 Nov 2019 16:03:15 -0800 Date: Tue, 5 Nov 2019 16:03:15 -0800 From: Sean Christopherson To: Dan Williams Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes Message-ID: <20191106000315.GI23297@linux.intel.com> References: <01adb4cb-6092-638c-0bab-e61322be7cf5@redhat.com> <613f3606-748b-0e56-a3ad-1efaffa1a67b@redhat.com> <20191105160000.GC8128@linux.intel.com> <20191105231316.GE23297@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Mailman-Approved-At: Wed, 06 Nov 2019 11:05:37 +1100 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-hyperv@vger.kernel.org, Michal Hocko , Radim =?utf-8?B?S3LEjW3DocWZ?= , KVM list , David Hildenbrand , KarimAllah Ahmed , Dave Hansen , Alexander Duyck , Michal Hocko , Linux MM , Pavel Tatashin , Paul Mackerras , "H. Peter Anvin" , Wanpeng Li , Alexander Duyck , "K. Y. Srinivasan" , Thomas Gleixner , Kees Cook , devel@driverdev.osuosl.org, Stefano Stabellini , Stephen Hemminger , "Aneesh Kumar K.V" , Joerg Roedel , X86 ML , YueHaibing , "Matthew Wilcox \(Oracle\)" , Mike Rapoport , Peter Zijlstra , Ingo Molnar , Vlastimil Babka , Anthony Yznaga , Oscar Salvador , "Isaac J. 
Manjarres" , Matt Sickler , Juergen Gross , Anshuman Khandual , Haiyang Zhang , Sasha Levin , kvm-ppc@vger.kernel.org, Qian Cai , Alex Williamson , Mike Rapoport , Borislav Petkov , Nicholas Piggin , Andy Lutomirski , xen-devel , Boris Ostrovsky , Vitaly Kuznetsov , Allison Randal , Jim Mattson , Mel Gorman , Adam Borowski , Cornelia Huck , Pavel Tatashin , Linux Kernel Mailing List , Johannes Weiner , Paolo Bonzini , Andrew Morton , linuxppc-dev Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Tue, Nov 05, 2019 at 03:43:29PM -0800, Dan Williams wrote: > On Tue, Nov 5, 2019 at 3:30 PM Dan Williams wrote: > > > > On Tue, Nov 5, 2019 at 3:13 PM Sean Christopherson > > wrote: > > > > > > On Tue, Nov 05, 2019 at 03:02:40PM -0800, Dan Williams wrote: > > > > On Tue, Nov 5, 2019 at 12:31 PM David Hildenbrand wrote: > > > > > > The scarier code (for me) is transparent_hugepage_adjust() and > > > > > > kvm_mmu_zap_collapsible_spte(), as I don't at all understand the > > > > > > interaction between THP and _PAGE_DEVMAP. > > > > > > > > > > The x86 KVM MMU code is one of the ugliest code I know (sorry, but it > > > > > had to be said :/ ). Luckily, this should be independent of the > > > > > PG_reserved thingy AFAIKs. > > > > > > > > Both transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte() > > > > are honoring kvm_is_reserved_pfn(), so again I'm missing where the > > > > page count gets mismanaged and leads to the reported hang. > > > > > > When mapping pages into the guest, KVM gets the page via gup(), which > > > increments the page count for ZONE_DEVICE pages. But KVM puts the page > > > using kvm_release_pfn_clean(), which skips put_page() if PageReserved() > > > and so never puts its reference to ZONE_DEVICE pages. > > > > Oh, yeah, that's busted. 
>
> Ugh, it's extra busted because every other gup user in the kernel
> tracks the pages resulting from gup and puts them (put_page()) when
> they are done. KVM wants to forget about whether it did a gup to get
> the page and optionally trigger put_page() based purely on the pfn.
> Outside of VFIO device assignment that needs pages pinned for DMA, why
> does KVM itself need to pin pages? If pages are pinned over a return
> to userspace that needs to be a FOLL_LONGTERM gup.

Short answer: KVM pins the page to ensure correctness with respect to the
primary MMU invalidating the associated host virtual address, e.g. when
the page is being migrated or unmapped from host userspace.

The main use of gup() is to handle guest page faults and map pages into
the guest, i.e. into KVM's secondary MMU.  KVM uses gup() both to get the
PFN and to temporarily pin the page.  The pin is held just long enough to
guarantee that any invalidation via the mmu_notifier will be stalled
until after KVM finishes installing the page into the secondary MMU, i.e.
the pin is short-term and not held across a return to userspace or entry
into the guest.  When a subsequent mmu_notifier invalidation occurs, KVM
pulls the PFN from the secondary MMU and uses that to update accessed
and dirty bits in the host.

There are a few other KVM flows that eventually call into gup(), but those
are "traditional" short-term pins and use put_page() directly.
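To make the refcount imbalance discussed above concrete, here is a toy
userspace model of the fault path.  It is only a sketch, not kernel code:
struct toy_page and every toy_* helper are hypothetical names made up for
illustration, and the "reserved" flag merely stands in for the
PageReserved() check that kvm_release_pfn_clean() is described as making.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct toy_page {
	int refcount;
	bool reserved;	/* models PageReserved() being true for the page */
};

/* Models gup(): takes a reference regardless of the reserved flag. */
static void toy_gup(struct toy_page *page)
{
	page->refcount++;
}

/* Models put_page(): drops one reference. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Models the release path described in the thread: the put is skipped
 * when the page looks reserved, so the reference taken by gup() stays.
 */
static void toy_release_pfn_clean(struct toy_page *page)
{
	if (!page->reserved)
		toy_put_page(page);
}

/*
 * One modelled guest page fault: pin, install, release.
 * Returns how many references this fault left behind.
 */
static int toy_fault_leak(struct toy_page *page)
{
	int before = page->refcount;

	toy_gup(page);
	/* ... the PFN would be installed into the secondary MMU here ... */
	toy_release_pfn_clean(page);

	return page->refcount - before;
}
```

Calling toy_fault_leak() on a page with reserved == false returns 0, while
each call on a page with reserved == true returns 1, so the count only
ever grows.  That is the shape of the imbalance reported for ZONE_DEVICE
pages; the model says nothing about how the real fix should look.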