From: Jiangyifei <jiangyifei@huawei.com>
To: Anup Patel <anup@brainfault.org>
Cc: Anup Patel <anup.patel@wdc.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Palmer Dabbelt <palmerdabbelt@google.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Paolo Bonzini <pbonzini@redhat.com>,
Alexander Graf <graf@amazon.com>,
Atish Patra <atish.patra@wdc.com>,
Alistair Francis <Alistair.Francis@wdc.com>,
"Damien Le Moal" <damien.lemoal@wdc.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"kvm-riscv@lists.infradead.org" <kvm-riscv@lists.infradead.org>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Zhangxiaofeng (F)" <victor.zhangxiaofeng@huawei.com>,
"Wubin (H)" <wu.wubin@huawei.com>,
"dengkai (A)" <dengkai1@huawei.com>,
yinyipeng <yinyipeng1@huawei.com>
Subject: RE: [PATCH v15 10/17] RISC-V: KVM: Implement stage2 page table programming
Date: Tue, 1 Dec 2020 02:02:11 +0000
Message-ID: <87f95ad08ead4755a2f2cf120be1ff01@huawei.com>
In-Reply-To: <CAAhSdy13nQqzfK-Voz-aBtGzEfjw_0V1fOrydijc3FLWy_W4nw@mail.gmail.com>
> -----Original Message-----
> From: Anup Patel [mailto:anup@brainfault.org]
> Sent: Monday, November 30, 2020 6:22 PM
> To: Jiangyifei <jiangyifei@huawei.com>
> Cc: Anup Patel <anup.patel@wdc.com>; Palmer Dabbelt
> <palmer@dabbelt.com>; Palmer Dabbelt <palmerdabbelt@google.com>; Paul
> Walmsley <paul.walmsley@sifive.com>; Albert Ou <aou@eecs.berkeley.edu>;
> Paolo Bonzini <pbonzini@redhat.com>; Alexander Graf <graf@amazon.com>;
> Atish Patra <atish.patra@wdc.com>; Alistair Francis
> <Alistair.Francis@wdc.com>; Damien Le Moal <damien.lemoal@wdc.com>;
> kvm@vger.kernel.org; kvm-riscv@lists.infradead.org;
> linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Zhangxiaofeng (F)
> <victor.zhangxiaofeng@huawei.com>; Wubin (H) <wu.wubin@huawei.com>;
> dengkai (A) <dengkai1@huawei.com>; yinyipeng <yinyipeng1@huawei.com>
> Subject: Re: [PATCH v15 10/17] RISC-V: KVM: Implement stage2 page table
> programming
>
> On Tue, Nov 24, 2020 at 2:56 PM Anup Patel <anup@brainfault.org> wrote:
> >
> > On Mon, Nov 16, 2020 at 2:59 PM Jiangyifei <jiangyifei@huawei.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Anup Patel [mailto:anup.patel@wdc.com]
> > > > Sent: Monday, November 9, 2020 7:33 PM
> > > > To: Palmer Dabbelt <palmer@dabbelt.com>; Palmer Dabbelt
> > > > <palmerdabbelt@google.com>; Paul Walmsley
> > > > <paul.walmsley@sifive.com>; Albert Ou <aou@eecs.berkeley.edu>;
> > > > Paolo Bonzini <pbonzini@redhat.com>
> > > > Cc: Alexander Graf <graf@amazon.com>; Atish Patra
> > > > <atish.patra@wdc.com>; Alistair Francis
> > > > <Alistair.Francis@wdc.com>; Damien Le Moal
> > > > <damien.lemoal@wdc.com>; Anup Patel <anup@brainfault.org>;
> > > > kvm@vger.kernel.org; kvm-riscv@lists.infradead.org;
> > > > linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org;
> > > > Anup Patel <anup.patel@wdc.com>; Jiangyifei
> > > > <jiangyifei@huawei.com>
> > > > Subject: [PATCH v15 10/17] RISC-V: KVM: Implement stage2 page
> > > > table programming
> > > >
> > > > This patch implements all required functions for programming the
> > > > stage2 page table for each Guest/VM.
> > > >
> > > > > At a high level, the flow of stage2-related functions is similar to
> > > > > the KVM ARM/ARM64 implementation, but the stage2 page table format
> > > > > is quite different for KVM RISC-V.
> > > >
> > > > [jiangyifei: stage2 dirty log support]
> > > > Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
> > > > Signed-off-by: Anup Patel <anup.patel@wdc.com>
> > > > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > ---
> > > > arch/riscv/include/asm/kvm_host.h | 12 +
> > > > arch/riscv/include/asm/pgtable-bits.h | 1 +
> > > > arch/riscv/kvm/Kconfig | 1 +
> > > > arch/riscv/kvm/main.c | 19 +
> > > > > arch/riscv/kvm/mmu.c | 649 +++++++++++++++++++++++++-
> > > > arch/riscv/kvm/vm.c | 6 -
> > > > 6 files changed, 672 insertions(+), 16 deletions(-)
> > > >
> > >
> > > ......
> > >
> > > >
> > > > > int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
> > > > > @@ -69,27 +562,163 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
> > > > > gpa_t gpa, unsigned long hva,
> > > > > bool writeable, bool is_write)
> > > > > {
> > > > - /* TODO: */
> > > > - return 0;
> > > > + int ret;
> > > > + kvm_pfn_t hfn;
> > > > + short vma_pageshift;
> > > > + gfn_t gfn = gpa >> PAGE_SHIFT;
> > > > + struct vm_area_struct *vma;
> > > > + struct kvm *kvm = vcpu->kvm;
> > > > > + struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
> > > > > + bool logging = (memslot->dirty_bitmap &&
> > > > > + !(memslot->flags & KVM_MEM_READONLY)) ? true : false;
> > > > + unsigned long vma_pagesize;
> > > > +
> > > > + mmap_read_lock(current->mm);
> > > > +
> > > > + vma = find_vma_intersection(current->mm, hva, hva + 1);
> > > > + if (unlikely(!vma)) {
> > > > + kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > > > + mmap_read_unlock(current->mm);
> > > > + return -EFAULT;
> > > > + }
> > > > +
> > > > + if (is_vm_hugetlb_page(vma))
> > > > + vma_pageshift = huge_page_shift(hstate_vma(vma));
> > > > + else
> > > > + vma_pageshift = PAGE_SHIFT;
> > > > + vma_pagesize = 1ULL << vma_pageshift;
> > > > + if (logging || (vma->vm_flags & VM_PFNMAP))
> > > > + vma_pagesize = PAGE_SIZE;
> > > > +
> > > > + if (vma_pagesize == PMD_SIZE || vma_pagesize == PGDIR_SIZE)
> > > > + gfn = (gpa & huge_page_mask(hstate_vma(vma))) >>
> > > > + PAGE_SHIFT;
> > > > +
> > > > + mmap_read_unlock(current->mm);
> > > > +
> > > > + if (vma_pagesize != PGDIR_SIZE &&
> > > > + vma_pagesize != PMD_SIZE &&
> > > > + vma_pagesize != PAGE_SIZE) {
> > > > > + kvm_err("Invalid VMA page size 0x%lx\n", vma_pagesize);
> > > > + return -EFAULT;
> > > > + }
> > > > +
> > > > + /* We need minimum second+third level pages */
> > > > > + ret = stage2_cache_topup(pcache, stage2_pgd_levels,
> > > > > + KVM_MMU_PAGE_CACHE_NR_OBJS);
> > > > + if (ret) {
> > > > + kvm_err("Failed to topup stage2 cache\n");
> > > > + return ret;
> > > > + }
> > > > +
> > > > + hfn = gfn_to_pfn_prot(kvm, gfn, is_write, NULL);
> > > > + if (hfn == KVM_PFN_ERR_HWPOISON) {
> > > > + send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
> > > > + vma_pageshift, current);
> > > > + return 0;
> > > > + }
> > > > + if (is_error_noslot_pfn(hfn))
> > > > + return -EFAULT;
> > > > +
> > > > + /*
> > > > + * If logging is active then we allow writable pages only
> > > > + * for write faults.
> > > > + */
> > > > + if (logging && !is_write)
> > > > + writeable = false;
> > > > +
> > > > + spin_lock(&kvm->mmu_lock);
> > > > +
> > > > + if (writeable) {
> > >
> > > Hi Anup,
> > >
> > > > What is the purpose of "writable = !memslot_is_readonly(slot)" in this
> > > > series?
> >
> > Where ? I don't see this line in any of the patches.
> >
> > >
> > > > When the HVA is mapped to an HPA above, the code does not know that the
> > > > stage2 PTE writability is "!memslot_is_readonly(slot)".
> > > > This may cause the writability of HVA->HPA and GPA->HPA to differ.
> > > > For example, GPA->HPA is writeable, but HVA->HPA is not writeable.
> >
> > Yes, this is possible particularly when Host kernel is updating
> > writability of HVA->HPA mappings for swapping in/out pages.
> >
> > >
> > > > Would it be better for the writability of HVA->HPA to also be determined
> > > > by whether the memslot is readonly in this change?
> > > Like this:
> > > - hfn = gfn_to_pfn_prot(kvm, gfn, is_write, NULL);
> > > + hfn = gfn_to_pfn_prot(kvm, gfn, writeable, NULL);
> >
> > > The gfn_to_pfn_prot() needs to know what type of fault we got (i.e.,
> > > read/write fault). The rest of the information (such as whether the slot
> > > is writable or not) is already available to gfn_to_pfn_prot().
> > >
> > > The question here is whether we should pass "&writeable" or NULL as the
> > > last parameter to gfn_to_pfn_prot(). The recent JUMP label support in Linux
> > > RISC-V causes problems on HW where the PTE 'A' and 'D' bits are not updated
> > > by HW, so I had to change the last parameter of gfn_to_pfn_prot() from
> > > "&writeable" to NULL.
> >
> > I am still investigating this.
>
> This turned out to be a bug in Spike which is not fixed.
>
> I will include the following change in the v16 patch series:
>
>
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 241030956d47..dc2666b4180b 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -232,8 +232,7 @@ void __kvm_riscv_hfence_gvma_all(void);
>
> int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
> struct kvm_memory_slot *memslot,
> - gpa_t gpa, unsigned long hva,
> - bool writeable, bool is_write);
> + gpa_t gpa, unsigned long hva, bool is_write);
> void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
> int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
> void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index fcaeadc9b34d..56fda9ef70fd 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -689,11 +689,11 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
>
> int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
> struct kvm_memory_slot *memslot,
> - gpa_t gpa, unsigned long hva,
> - bool writeable, bool is_write)
> + gpa_t gpa, unsigned long hva, bool is_write)
> {
> int ret;
> kvm_pfn_t hfn;
> + bool writeable;
> short vma_pageshift;
> gfn_t gfn = gpa >> PAGE_SHIFT;
> struct vm_area_struct *vma;
> @@ -742,7 +742,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
>
> mmu_seq = kvm->mmu_notifier_seq;
>
> - hfn = gfn_to_pfn_prot(kvm, gfn, is_write, NULL);
> + hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
> if (hfn == KVM_PFN_ERR_HWPOISON) {
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
> vma_pageshift, current);
> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> index f054406792a6..058cfa168abe 100644
> --- a/arch/riscv/kvm/vcpu_exit.c
> +++ b/arch/riscv/kvm/vcpu_exit.c
> @@ -445,7 +445,7 @@ static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
> };
> }
>
> - ret = kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva, writeable,
> + ret = kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva,
> (trap->scause == EXC_STORE_GUEST_PAGE_FAULT) ? true : false);
> if (ret < 0)
> return ret;
>
> Regards,
> Anup
This change looks good.
Yifei
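
For readers following the thread, here is a minimal userspace sketch of the
writability decision discussed above, once gfn_to_pfn_prot() is given
"&writeable" again. It is a simplified model and not the kernel code:
model_pfn_lookup(), model_stage2_map(), and enum map_result are hypothetical
names, and the lookup helper only approximates gfn_to_pfn_prot(). The
assumptions are that a write fault needs a host mapping that can be written,
that a read fault reports whether the host mapping could also be written, and
that a read-only memslot never reports writable.

```c
#include <stdbool.h>

/* Outcome of the modelled stage2 mapping attempt (hypothetical names). */
enum map_result { MAP_FAIL, MAP_READONLY, MAP_WRITABLE };

/*
 * Rough stand-in for gfn_to_pfn_prot(kvm, gfn, is_write, &writeable).
 * "host_writable" is an assumption of this model: it means the HVA->HPA
 * mapping can be made writable. Returns false when no usable host page
 * can be found for this fault.
 */
static bool model_pfn_lookup(bool is_write, bool slot_readonly,
                             bool host_writable, bool *writeable)
{
    /* A write fault needs a writable slot and a writable host mapping. */
    if (is_write && (slot_readonly || !host_writable))
        return false;

    /* Read faults may still report that the page could be written. */
    *writeable = !slot_readonly && host_writable;
    return true;
}

/* Models the decision between a read-only and a writable stage2 mapping. */
static enum map_result model_stage2_map(bool is_write, bool slot_readonly,
                                        bool host_writable, bool logging)
{
    bool writeable = false;

    if (!model_pfn_lookup(is_write, slot_readonly, host_writable, &writeable))
        return MAP_FAIL;

    /* With dirty logging, only write faults get a writable mapping. */
    if (logging && !is_write)
        writeable = false;

    return writeable ? MAP_WRITABLE : MAP_READONLY;
}
```

In this model, model_stage2_map(false, false, true, true) yields MAP_READONLY:
a read fault under dirty logging is mapped read-only, so a later store faults
again and can be logged. The point of passing "&writeable" is that GPA->HPA
writability is then derived from what the HVA->HPA lookup reports instead of
being decided separately by the caller.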
Thread overview (24+ messages):
2020-11-09 11:32 [PATCH v15 00/17] KVM RISC-V Support Anup Patel
2020-11-09 11:32 ` [PATCH v15 01/17] RISC-V: Add hypervisor extension related CSR defines Anup Patel
2020-11-09 11:32 ` [PATCH v15 02/17] RISC-V: Add initial skeletal KVM support Anup Patel
2020-11-09 11:32 ` [PATCH v15 03/17] RISC-V: KVM: Implement VCPU create, init and destroy functions Anup Patel
2020-11-09 11:32 ` [PATCH v15 04/17] RISC-V: KVM: Implement VCPU interrupts and requests handling Anup Patel
2020-11-09 11:32 ` [PATCH v15 05/17] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls Anup Patel
2020-11-09 11:32 ` [PATCH v15 06/17] RISC-V: KVM: Implement VCPU world-switch Anup Patel
2020-11-09 11:32 ` [PATCH v15 07/17] RISC-V: KVM: Handle MMIO exits for VCPU Anup Patel
2020-11-09 11:32 ` [PATCH v15 08/17] RISC-V: KVM: Handle WFI " Anup Patel
2020-11-09 11:32 ` [PATCH v15 09/17] RISC-V: KVM: Implement VMID allocator Anup Patel
2020-11-09 11:32 ` [PATCH v15 10/17] RISC-V: KVM: Implement stage2 page table programming Anup Patel
2020-11-16 9:29 ` Jiangyifei
2020-11-24 9:26 ` Anup Patel
2020-11-30 10:21 ` Anup Patel
2020-12-01 2:02 ` Jiangyifei [this message]
2020-11-09 11:32 ` [PATCH v15 11/17] RISC-V: KVM: Implement MMU notifiers Anup Patel
2020-11-09 11:32 ` [PATCH v15 12/17] RISC-V: KVM: Add timer functionality Anup Patel
2020-12-23 3:30 ` Jiangyifei
2020-12-26 10:26 ` Anup Patel
2020-11-09 11:32 ` [PATCH v15 13/17] RISC-V: KVM: FP lazy save/restore Anup Patel
2020-11-09 11:32 ` [PATCH v15 14/17] RISC-V: KVM: Implement ONE REG interface for FP registers Anup Patel
2020-11-09 11:32 ` [PATCH v15 15/17] RISC-V: KVM: Add SBI v0.1 support Anup Patel
2020-11-09 11:32 ` [PATCH v15 16/17] RISC-V: KVM: Document RISC-V specific parts of KVM API Anup Patel
2020-11-09 11:32 ` [PATCH v15 17/17] RISC-V: KVM: Add MAINTAINERS entry Anup Patel