Date: Tue, 8 Aug 2023 07:26:07 -0700
References: <20230808071329.19995-1-yan.y.zhao@intel.com> <20230808071702.20269-1-yan.y.zhao@intel.com>
Subject: Re: [RFC PATCH 3/3] KVM: x86/mmu: skip zap maybe-dma-pinned pages for NUMA migration
From: Sean Christopherson
To: Jason Gunthorpe
Cc: Yan Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com, mike.kravetz@oracle.com, apopple@nvidia.com, rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com

On Tue, Aug 08, 2023, Jason Gunthorpe wrote:
> On Tue, Aug 08, 2023 at 03:17:02PM +0800, Yan Zhao wrote:
> > @@ -859,6 +860,21 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> >  		    !is_last_spte(iter.old_spte, iter.level))
> >  			continue;
> >  
> > +		if (skip_pinned) {
> > +			kvm_pfn_t pfn = spte_to_pfn(iter.old_spte);
> > +			struct page *page = kvm_pfn_to_refcounted_page(pfn);
> > +			struct folio *folio;
> > +
> > +			if (!page)
> > +				continue;
> > +
> > +			folio = page_folio(page);
> > +
> > +			if (folio_test_anon(folio) && PageAnonExclusive(&folio->page) &&
> > +			    folio_maybe_dma_pinned(folio))
> > +				continue;
> > +		}
> > +
>
> I don't get it..
>
> The last patch made it so that the NUMA balancing code doesn't change
> page_maybe_dma_pinned() pages to PROT_NONE
>
> So why doesn't KVM just check if the current and new SPTE are the same
> and refrain from invalidating if nothing changed?

Because KVM doesn't have visibility into the current and new PTEs when the
zapping occurs.  The contract for invalidate_range_start() requires that KVM
drop all references before returning, and so the zapping occurs before
change_pte_range() or change_huge_pmd() have done anything.

> Duplicating the checks here seems very frail to me.

Yes, this approach gets a hard NAK from me.  IIUC, folio_maybe_dma_pinned()
can yield different results purely based on refcounts, i.e. KVM could skip
pages that the primary MMU does not, and thus violate the mmu_notifier
contract.  And in general, I am steadfastly against adding any kind of
heuristic to KVM's zapping logic.

This really needs to be fixed in the primary MMU and not require any direct
involvement from secondary MMUs, e.g. the mmu_notifier invalidation itself
needs to be skipped.
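
To make the ordering problem concrete, here is a toy userspace model of the
sequence described above.  It is only a sketch, not kernel code: every name
in it (toy_page, toy_invalidate_range_start(), toy_change_protection()) is
hypothetical, and the maybe_pinned flag merely stands in for a
folio_maybe_dma_pinned()-style check.  It shows that the secondary MMU has
already dropped its mapping by the time the primary MMU decides to leave a
maybe-pinned page alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Toy model only; none of these names are real kernel APIs. */
struct toy_page {
	bool maybe_pinned;     /* stand-in for a folio_maybe_dma_pinned() result */
	bool primary_protnone; /* primary MMU changed this PTE to PROT_NONE */
	bool secondary_mapped; /* secondary MMU (e.g. a KVM SPTE) still maps it */
};

/*
 * Secondary MMU side: must drop every mapping in [start, end) before
 * returning.  It has no view of what the primary MMU will do afterwards.
 * Returns the number of mappings that were zapped.
 */
static size_t toy_invalidate_range_start(struct toy_page *pages,
					 size_t start, size_t end)
{
	size_t zapped = 0;

	for (size_t i = start; i < end; i++) {
		if (pages[i].secondary_mapped) {
			pages[i].secondary_mapped = false;
			zapped++;
		}
	}
	return zapped;
}

/*
 * Primary MMU side: notify first, then walk the range and decide per page.
 * Returns how many secondary mappings the notification zapped.
 */
static size_t toy_change_protection(struct toy_page *pages,
				    size_t start, size_t end)
{
	size_t zapped = toy_invalidate_range_start(pages, start, end);

	for (size_t i = start; i < end; i++) {
		/* Contract: nothing is mapped by the secondary MMU here. */
		assert(!pages[i].secondary_mapped);

		/* The primary MMU skips the page, but the zap already happened. */
		if (pages[i].maybe_pinned)
			continue;

		pages[i].primary_protnone = true;
	}
	return zapped;
}
```

In this model the only way to avoid zapping the pinned page is for the
primary MMU to not issue the invalidation for it in the first place, which is
the direction suggested above.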