qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Chao Peng <chao.p.peng@linux.intel.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Wanpeng Li <wanpengli@tencent.com>,
	jun.nakajima@intel.com, kvm@vger.kernel.org, david@redhat.com,
	qemu-devel@nongnu.org, "J . Bruce Fields" <bfields@fieldses.org>,
	linux-mm@kvack.org, "H . Peter Anvin" <hpa@zytor.com>,
	ak@linux.intel.com, Jonathan Corbet <corbet@lwn.net>,
	Joerg Roedel <joro@8bytes.org>,
	x86@kernel.org, Hugh Dickins <hughd@google.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	luto@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	dave.hansen@intel.com, Sean Christopherson <seanjc@google.com>,
	susie.li@intel.com, Jeff Layton <jlayton@kernel.org>,
	linux-kernel@vger.kernel.org, john.ji@intel.com,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	linux-fsdevel@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v3 kvm/queue 14/16] KVM: Handle page fault for private memory
Date: Tue, 4 Jan 2022 17:10:08 +0800	[thread overview]
Message-ID: <20220104091008.GA21806@chaop.bj.intel.com> (raw)
In-Reply-To: <20220104014629.GA2330@yzhao56-desk.sh.intel.com>

On Tue, Jan 04, 2022 at 09:46:35AM +0800, Yan Zhao wrote:
> On Thu, Dec 23, 2021 at 08:30:09PM +0800, Chao Peng wrote:
> > When a page fault from the secondary page table while the guest is
> > running happens in a memslot with KVM_MEM_PRIVATE, we need go
> > different paths for private access and shared access.
> > 
> >   - For private access, KVM checks if the page is already allocated in
> >     the memory backend, if yes KVM establishes the mapping, otherwise
> >     exits to userspace to convert a shared page to private one.
> >
> will this conversion be atomical or not?
> For example, after punching a hole in a private memory slot, will KVM
> see two notifications: one for invalidation of the whole private memory
> slot, and one for fallocate of the rest ranges besides the hole?
> Or, KVM only sees one invalidation notification for the hole?

Punching hole doesn't need to invalidate the whole memory slot. It only
send one invalidation notification to KVM for the 'hole' part.

Taking shared-to-private conversion as example it only invalidates the
'hole' part (that usually only the portion of the whole memory) on the
shared fd,, and then fallocate the private memory in the private fd at
the 'hole'. The KVM invalidation notification happens when the shared
hole gets invalidated. The establishment of the private mapping happens
at subsequent KVM page fault handlers.

> Could you please show QEMU code about this conversion?

See below for the QEMU side conversion code. The above described
invalidation and fallocation will be two steps in this conversion. If
error happens in the middle then this error will be propagated to
kvm_run to do the proper action (e.g. may kill the guest?).

int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
                            bool shared_to_private)
{
    int ret; 
    int fd_from, fd_to;

    if (!rb || rb->private_fd <= 0) { 
        return -1;
    }    

    if (!QEMU_PTR_IS_ALIGNED(start, rb->page_size) ||
        !QEMU_PTR_IS_ALIGNED(length, rb->page_size)) {
        return -1;
    }    

    if (length > rb->max_length) {
        return -1;
    }    

    if (shared_to_private) {
        fd_from = rb->fd;
        fd_to = rb->private_fd;
    } else {
        fd_from = rb->private_fd;
        fd_to = rb->fd;
    }    

    ret = ram_block_discard_range_fd(rb, start, length, fd_from);
    if (ret) {
        return ret; 
    }    

    if (fd_to > 0) { 
        return fallocate(fd_to, 0, start, length);
    }    

    return 0;
}

> 
> 
> >   - For shared access, KVM also checks if the page is already allocated
> >     in the memory backend, if yes then exit to userspace to convert a
> >     private page to shared one, otherwise it's treated as a traditional
> >     hva-based shared memory, KVM lets existing code to obtain a pfn with
> >     get_user_pages() and establish the mapping.
> > 
> > The above code assume private memory is persistent and pre-allocated in
> > the memory backend so KVM can use this information as an indicator for
> > a page is private or shared. The above check is then performed by
> > calling kvm_memfd_get_pfn() which currently is implemented as a
> > pagecache search but in theory that can be implemented differently
> > (i.e. when the page is even not mapped into host pagecache there should
> > be some different implementation).
> > 
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c         | 73 ++++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/mmu/paging_tmpl.h | 11 +++--
> >  2 files changed, 77 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 2856eb662a21..fbcdf62f8281 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2920,6 +2920,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> >  	if (max_level == PG_LEVEL_4K)
> >  		return PG_LEVEL_4K;
> >  
> > +	if (kvm_slot_is_private(slot))
> > +		return max_level;
> > +
> >  	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
> >  	return min(host_level, max_level);
> >  }
> > @@ -3950,7 +3953,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
> >  }
> >  
> > -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
> > +static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> > +{
> > +	/*
> > +	 * At this time private gfn has not been supported yet. Other patch
> > +	 * that enables it should change this.
> > +	 */
> > +	return false;
> > +}
> > +
> > +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > +				    struct kvm_page_fault *fault,
> > +				    bool *is_private_pfn, int *r)
> > +{
> > +	int order;
> > +	int mem_convert_type;
> > +	struct kvm_memory_slot *slot = fault->slot;
> > +	long pfn = kvm_memfd_get_pfn(slot, fault->gfn, &order);
> For private memory slots, it's possible to have pfns backed by
> backends other than memfd, e.g. devicefd.

Surely yes, although this patch only supports memfd, but it's designed
to be extensible to support other memory backing stores than memfd. There
is one assumption in this design however: one private memslot can be
backed by only one type of such memory backing store, e.g. if the
devicefd you mentioned can independently provide memory for a memslot
then that's no issue.

>So is it possible to let those
> private memslots keep private and use traditional hva-based way?

Typically this fd-based private memory uses the 'offset' as the
userspace address to get a pfn from the backing store fd. But I believe
the current code does not prevent you from using the hva as the
userspace address, as long as your memory backing store understand that
address and can provide the pfn basing on it. But since you already have
the hva, you probably already mmap-ed the fd to userspace, that seems
not this private memory patch can protect you. Probably I didn't quite
understand 'keep private' you mentioned here.

Thanks,
Chao
> Reasons below:
> 1. only memfd is supported in this patch set.
> 2. qemu/host read/write to those private memslots backing up by devicefd may
> not cause machine check.
> 
> Thanks
> Yan
> 


  reply	other threads:[~2022-01-04  9:12 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-23 12:29 [PATCH v3 kvm/queue 00/16] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2021-12-23 12:29 ` [PATCH v3 kvm/queue 01/16] mm/shmem: Introduce F_SEAL_INACCESSIBLE Chao Peng
2022-01-04 14:22   ` David Hildenbrand
2022-01-06 13:06     ` Chao Peng
2022-01-13 15:56       ` David Hildenbrand
2021-12-23 12:29 ` [PATCH v3 kvm/queue 02/16] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2021-12-23 12:29 ` [PATCH v3 kvm/queue 03/16] mm/memfd: Introduce MEMFD_OPS Chao Peng
2021-12-24  3:53   ` Robert Hoo
2021-12-31  2:38     ` Chao Peng
2022-01-04 17:38       ` Sean Christopherson
2022-01-05  6:07         ` Chao Peng
2021-12-23 12:29 ` [PATCH v3 kvm/queue 04/16] KVM: Extend the memslot to support fd-based private memory Chao Peng
2021-12-23 17:35   ` Sean Christopherson
2021-12-31  2:53     ` Chao Peng
2022-01-04 17:34       ` Sean Christopherson
2021-12-23 12:30 ` [PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast memslot lookup by file offset Chao Peng
2021-12-23 18:02   ` Sean Christopherson
2021-12-24  3:54     ` Chao Peng
2021-12-27 23:50       ` Yao Yuan
2021-12-28 21:48       ` Sean Christopherson
2021-12-31  2:26         ` Chao Peng
2022-01-04 17:43           ` Sean Christopherson
2022-01-05  6:09             ` Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 06/16] KVM: Implement fd-based memory using MEMFD_OPS interfaces Chao Peng
2021-12-23 18:34   ` Sean Christopherson
2021-12-23 23:09     ` Paolo Bonzini
2021-12-24  4:25       ` Chao Peng
2021-12-28 22:14         ` Sean Christopherson
2021-12-24  4:12     ` Chao Peng
2021-12-24  4:22     ` Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 07/16] KVM: Refactor hva based memory invalidation code Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 08/16] KVM: Special handling for fd-based memory invalidation Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 09/16] KVM: Split out common memory invalidation code Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 10/16] KVM: Implement fd-based memory invalidation Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 11/16] KVM: Add kvm_map_gfn_range Chao Peng
2021-12-23 18:06   ` Sean Christopherson
2021-12-24  4:13     ` Chao Peng
2021-12-31  2:33       ` Chao Peng
2022-01-04 17:31         ` Sean Christopherson
2022-01-05  6:14           ` Chao Peng
2022-01-05 17:03             ` Sean Christopherson
2022-01-06 12:35               ` Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 12/16] KVM: Implement fd-based memory fallocation Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 13/16] KVM: Add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2021-12-23 18:28   ` Sean Christopherson
2021-12-23 12:30 ` [PATCH v3 kvm/queue 14/16] KVM: Handle page fault for private memory Chao Peng
2022-01-04  1:46   ` Yan Zhao
2022-01-04  9:10     ` Chao Peng [this message]
2022-01-04 10:06       ` Yan Zhao
2022-01-05  6:28         ` Chao Peng
2022-01-05  7:53           ` Yan Zhao
2022-01-05 20:52             ` Sean Christopherson
2022-01-14  5:53               ` Yan Zhao
2021-12-23 12:30 ` [PATCH v3 kvm/queue 15/16] KVM: Use kvm_userspace_memory_region_ext Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 16/16] KVM: Register/unregister private memory slot to memfd Chao Peng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220104091008.GA21806@chaop.bj.intel.com \
    --to=chao.p.peng@linux.intel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=john.ji@intel.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=seanjc@google.com \
    --cc=susie.li@intel.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).