From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Christopherson
Date: Mon, 5 Aug 2024 16:22:54 -0700
Subject: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
In-Reply-To: <345d89c1-4f31-6b49-2cd4-a0696210fa7c@loongson.cn>
References: <20240726235234.228822-1-seanjc@google.com> <20240726235234.228822-65-seanjc@google.com> <345d89c1-4f31-6b49-2cd4-a0696210fa7c@loongson.cn>
Message-ID:
List-Id:
To: kvm-riscv@lists.infradead.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Sat, Aug 03, 2024, maobibo wrote:
> On 2024/8/3 3:32 AM, Sean Christopherson wrote:
> > On Fri, Aug 02, 2024, maobibo wrote:
> > > On 2024/7/27 7:52 AM, Sean Christopherson wrote:
> > > > Mark pages/folios dirty only in the slow page fault path, i.e. only when
> > > > mmu_lock is held and the operation is mmu_notifier-protected, as marking a
> > > > page/folio dirty after it has been written back can make some filesystems
> > > > unhappy (backing KVM guests with such filesystem files is uncommon, and
> > > > the race is minuscule, hence the lack of complaints).
> > > >
> > > > See the link below for details.
> > > >
> > > > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > > > Signed-off-by: Sean Christopherson
> > > > ---
> > > >  arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
> > > >  1 file changed, 10 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> > > > index 2634a9e8d82c..364dd35e0557 100644
> > > > --- a/arch/loongarch/kvm/mmu.c
> > > > +++ b/arch/loongarch/kvm/mmu.c
> > > > @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
> > > >  		if (kvm_pte_young(changed))
> > > >  			kvm_set_pfn_accessed(pfn);
> > > > -		if (kvm_pte_dirty(changed)) {
> > > > -			mark_page_dirty(kvm, gfn);
> > > > -			kvm_set_pfn_dirty(pfn);
> > > > -		}
> > > >  		if (page)
> > > >  			put_page(page);
> > > >  	}
> > > > +
> > > > +	if (kvm_pte_dirty(changed))
> > > > +		mark_page_dirty(kvm, gfn);
> > > > +
> > > >  	return ret;
> > > >  out:
> > > >  	spin_unlock(&kvm->mmu_lock);
> > > > @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
> > > >  	else
> > > >  		++kvm->stat.pages;
> > > >  	kvm_set_pte(ptep, new_pte);
> > > > -	spin_unlock(&kvm->mmu_lock);
> > > > -	if (prot_bits & _PAGE_DIRTY) {
> > > > -		mark_page_dirty_in_slot(kvm, memslot, gfn);
> > > > +	if (writeable)
> > > Is it better to use write or (prot_bits & _PAGE_DIRTY) here? writable is
> > > pte permission from function hva_to_pfn_slow(), write is fault action.
> >
> > Marking folios dirty in the slow/full path basically necessitates marking the
> > folio dirty if KVM creates a writable SPTE, as KVM won't mark the folio dirty
> > if/when _PAGE_DIRTY is set.
> >
> > Practically speaking, I'm 99.9% certain it doesn't matter. The folio is marked
> > dirty by core MM when the folio is made writable, and cleaning the folio triggers
> > an mmu_notifier invalidation. I.e. if the page is mapped writable in KVM's
> yes, it is. Thanks for the explanation. kvm_set_pfn_dirty() can be put only
> in the slow page fault path. My only concern is the fault type: a read fault can
> set the pte entry writable but not _PAGE_DIRTY in the stage-2 mmu table.
>
> > stage-2 PTEs, then its folio has already been marked dirty.
> Consider one scenario, although I do not know whether it actually exists.
> The user mode VMM first writes the folio through its hva, then the VCPU thread
> *reads* the folio. In the primary mmu table, the pte entry is writable and
> _PAGE_DIRTY is set; in the secondary mmu table (stage-2 PTE table), it is
> pte_none since the folio is being accessed for the first time, so the slow
> page fault path will fill the stage-2 mmu page table.
>
> Since it is a read fault, the stage-2 PTE will be created with _PAGE_WRITE (coming
> from function hva_to_pfn_slow()), but _PAGE_DIRTY is not set. Do we need to
> call kvm_set_pfn_dirty() in this situation?

If KVM doesn't mark the folio dirty when the stage-2 _PAGE_DIRTY flag is set,
i.e. as proposed in this series, then yes, KVM needs to call kvm_set_pfn_dirty()
even though the VM hasn't (yet) written to the memory. In practice, KVM calling
kvm_set_pfn_dirty() is redundant the majority of the time, as the stage-1 PTE
will have _PAGE_DIRTY set, and that will get propagated to the folio when the
primary MMU does anything relevant with the PTE. And for file systems that care
about writeback, odds are very good that the folio was marked dirty even earlier,
when MM invoked vm_operations_struct.page_mkwrite().

The reason I am pushing to have all architectures mark pages/folios dirty in the
slow page fault path is that a false positive (marking a folio dirty without the
folio ever being written in _any_ context since the last pte_mkclean()) is rare,
and at worst results in an unnecessary writeback. On the other hand, marking folios
dirty in fast page fault handlers (or anywhere else that isn't protected by
mmu_notifiers) is technically unsafe.

In other words, the intent is to sacrifice accuracy to improve stability/robustness,
because the vast majority of time the loss in accuracy has no effect, and the worst
case scenario is that the kernel does I/O that wasn't necessary.
In practice, KVM ca= lling kvm_set_pfn_dirty() is redundant the majority of the time, as the stage-1 P= TE will have _PAGE_DIRTY set, and that will get propagated to the folio when t= he primary MMU does anything relevant with the PTE. And for file systems that= care about writeback, odds are very good that the folio was marked dirty even ea= rlier, when MM invoked vm_operations_struct.page_mkwrite(). The reason I am pushing to have all architectures mark pages/folios dirty i= n the slow page fault path is that a false positive (marking a folio dirty withou= t the folio ever being written in _any_ context since the last pte_mkclean()) is = rare, and at worst results an unnecessary writeback. On the other hand, marking = folios dirty in fast page fault handlers (or anywhere else that isn't protected by mmu_notifiers) is technically unsafe. In other words, the intent is to sacrifice accuracy to improve stability/ro= bustness, because the vast majority of time the loss in accuracy has no effect, and t= he worst case scenario is that the kernel does I/O that wasn't necessary.