From: "Michael S. Tsirkin" <mst@redhat.com>
To: Nitesh Narayan Lal <nilal@redhat.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, pagupta@redhat.com,
wei.w.wang@intel.com, yang.zhang.wz@gmail.com, riel@redhat.com,
david@redhat.com, dodgen@google.com, konrad.wilk@oracle.com
Subject: Re: [Patch v4 1/6] KVM: Support for guest page hinting
Date: Wed, 15 Nov 2017 22:06:10 +0200 [thread overview]
Message-ID: <20171115220456-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <3ba70546-37f8-cf06-a882-a3abab19d46d@redhat.com>
On Wed, Nov 15, 2017 at 02:38:20PM -0500, Nitesh Narayan Lal wrote:
> Hi Michael,
>
>
> On 11/13/2017 12:59 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 03, 2017 at 04:30:08PM -0400, nilal@redhat.com wrote:
> >> From: Nitesh Narayan Lal <nilal@redhat.com>
> >>
> >> This patch includes the following:
> >> 1. Basic skeleton for the support
> >> 2. Enablement of x86 platform to use the same
> >>
> >> Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
> >> ---
> >> arch/x86/Kbuild | 2 +-
> >> arch/x86/kvm/Makefile | 2 ++
> >> include/linux/gfp.h | 7 +++++++
> >> virt/kvm/Kconfig | 4 ++++
> >> virt/kvm/page_hinting.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> >> 5 files changed, 60 insertions(+), 1 deletion(-)
> >> create mode 100644 virt/kvm/page_hinting.c
> >>
> >> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> >> index 0038a2d..7d39d7d 100644
> >> --- a/arch/x86/Kbuild
> >> +++ b/arch/x86/Kbuild
> >> @@ -2,7 +2,7 @@ obj-y += entry/
> >>
> >> obj-$(CONFIG_PERF_EVENTS) += events/
> >>
> >> -obj-$(CONFIG_KVM) += kvm/
> >> +obj-$(subst m,y,$(CONFIG_KVM)) += kvm/
> >>
> >> # Xen paravirtualization support
> >> obj-$(CONFIG_XEN) += xen/
> >> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> >> index 09d4b17..d8a3800 100644
> >> --- a/arch/x86/kvm/Makefile
> >> +++ b/arch/x86/kvm/Makefile
> >> @@ -15,6 +15,8 @@ kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> >> i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> >> hyperv.o page_track.o debugfs.o
> >>
> >> +obj-$(CONFIG_KVM_FREE_PAGE_HINTING) += $(KVM)/page_hinting.o
> >> +
> >> kvm-intel-y += vmx.o pmu_intel.o
> >> kvm-amd-y += svm.o pmu_amd.o
> >>
> >> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >> index f780718..a74371f 100644
> >> --- a/include/linux/gfp.h
> >> +++ b/include/linux/gfp.h
> >> @@ -452,6 +452,13 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> >> return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
> >> }
> >>
> >> +#ifdef CONFIG_KVM_FREE_PAGE_HINTING
> >> +#define HAVE_ARCH_ALLOC_PAGE
> >> +#define HAVE_ARCH_FREE_PAGE
> >> +void arch_free_page(struct page *page, int order);
> >> +void arch_alloc_page(struct page *page, int order);
> >> +#endif
> >> +
> >> #ifndef HAVE_ARCH_FREE_PAGE
> >> static inline void arch_free_page(struct page *page, int order) { }
> >> #endif
> >
> > So the idea is to send pages to host in arch_free_page?
> >
> > However, I notice:
> >
> > arch_free_page(page, order);
> > kernel_poison_pages(page, 1 << order, 0);
> >
> > Thus pages are immediately written to after they are freed.
> >
> >
> >
> >
> >
> >> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> >> index b0cc1a3..936c71d 100644
> >> --- a/virt/kvm/Kconfig
> >> +++ b/virt/kvm/Kconfig
> >> @@ -50,3 +50,7 @@ config KVM_COMPAT
> >>
> >> config HAVE_KVM_IRQ_BYPASS
> >> bool
> >> +
> >> +config KVM_FREE_PAGE_HINTING
> >> + def_bool y
> >> + depends on KVM
> >> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
> >> new file mode 100644
> >> index 0000000..39d2b1d
> >> --- /dev/null
> >> +++ b/virt/kvm/page_hinting.c
> >> @@ -0,0 +1,46 @@
> >> +#include <linux/gfp.h>
> >> +#include <linux/mm.h>
> >> +#include <linux/page_ref.h>
> >> +#include <linux/kvm_host.h>
> >> +#include <linux/sort.h>
> >> +
> >> +#include <trace/events/kvm.h>
> >> +
> >> +#define MAX_FGPT_ENTRIES 1000
> >> +#define HYPERLIST_THRESHOLD 500
> >> +/*
> >> + * struct kvm_free_pages - Tracks the pages which are freed by the guest.
> >> + * @pfn - page frame number for the page which is to be freed
> >> + * @pages - number of pages which are supposed to be freed.
> >> + * A global array object is used to hold the list of pfn and number of pages
> >> + * which are freed by the guest. This list may also have fragmentated pages so
> >> + * defragmentation is a must prior to the hypercall.
> >> + */
> >> +struct kvm_free_pages {
> >> + unsigned long pfn;
> >> + unsigned int pages;
> >> +};
> >> +
> >> +/*
> >> + * hypervisor_pages - It is a dummy structure passed with the hypercall.
> >> + * @pfn - page frame number for the page which is to be freed.
> >> + * @pages - number of pages which are supposed to be freed.
> >> + * A global array object is used to to hold the list of pfn and pages and is
> >> + * passed as part of the hypercall.
> >> + */
> >> +struct hypervisor_pages {
> >> + unsigned long pfn;
> >> + unsigned int pages;
> >> +};
> >> +
> >> +DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
> >> +DEFINE_PER_CPU(int, kvm_pt_idx);
> >> +struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
> >> +
> >> +void arch_alloc_page(struct page *page, int order)
> >> +{
> >> +}
> >> +
> >> +void arch_free_page(struct page *page, int order)
> >> +{
> >> +}
> >> --
> >> 2.9.4
> Thanks Michael for pointing out this bug.
> The issue here is when both 'Guest Page Hinting' and 'Page Poisoning'
> are enabled as it may result in guest generating memory corruption
> errors when the hypervisor maps a different physical memory to the guest
> after freeing the earlier mapped memory.
>
> Page Poisoning is a feature in which the page is filled with a specific
> pattern of (0x00 or 0xaa) after arch_free_page and the same is verified
> before arch_alloc_page to prevent following issues:
>
> *information leak from the freed data
> *use after free bugs
> *memory corruption
>
> -Selection of the pattern depends on the CONFIG_PAGE_POISONING_ZERO
>
> Guest page hinting is a support for handing freed memory between the
> guest and the host. It enables the guests with DAX (no page cache) to
> rapidly free and reclaims memory to and from the host respectively. To
> achieve this task it uses two lists one CPU-local and other CPU-global.
> Whenever a page is freed it is added to the respective CPU-local list
> through arch_free_page() until it is full. Once the list is full a
> seqlock is taken to prevent any further page allocations and the per
> CPU-local list is traversed and only those pages which are not
> reallocated by the guest are added to the CPU-global list. This global
> list is then sent to the hypervisor/host.
>
> After freeing the pages in the global list following things may happen:
>
> *Hypervisor reallocates the freed memory back to the guest if required
> *Hypervisor frees the memory and maps a different physical memory at
> its place
>
> In order to prevent any information leak hypervisor before allocating
> memory to the guest fills it with zeroes.
> The issue arises when the pattern used for Page Poisoning is 0xaa while
> the newly allocated page received from the hypervisor by the guest is
> filled with the pattern 0x00. This will result in memory corruption errors.
> As per the current implementation in order to enable page poisoning a
> boot time parameter (page_poison) is set. Hence there could be two
> following ways by which we could resolve the above-stated issue:
>
> 1. We can disable page poisoning if guest page hinting is enabled. The
> only code change required in this case will be to check if
> CONFIG_PAGE_POISONING is enabled, if so disable it so that poisoning is
> not checked once we receive newly allocated memory from the hypervisor.
>
> 2. We can have a flag in the page structure using which we can control
> poisoning on a per page basis. Using this we can disable the poisoning
> for only those pages which are sent to the hypervisor so that posioning
> is not checked on the newly allocated memory which is received by the
> guest form the hypervisor. Though in order to incorporate this I may
> have to make changes in page_poison.c, mm_types.h and other related files.
>
> I am looking forward to the comments and suggestions.
>
> [1] http://www.spinics.net/lists/kvm/msg153666.html
> [2] https://www.spinics.net/lists/kvm/msg158054.html
Let's not discuss this off-list. Please repost with Cc a
mailing list so the discussion is archived.
Thanks!
> --
> Regards
> Nitesh
>
next prev parent reply other threads:[~2017-11-15 20:06 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-03 20:30 [Patch v4 0/6] KVM: Guest page hinting nilal
2017-11-03 20:30 ` [Patch v4 1/6] KVM: Support for guest " nilal
2017-11-13 17:59 ` Michael S. Tsirkin
2017-11-15 19:38 ` Nitesh Narayan Lal
2017-11-15 20:06 ` Michael S. Tsirkin [this message]
2017-11-03 20:30 ` [Patch v4 2/6] KVM: Guest page hinting functionality nilal
2017-11-13 18:03 ` Michael S. Tsirkin
2017-11-03 20:30 ` [Patch v4 3/6] KVM: Adding tracepoints for guest page hinting nilal
2017-11-03 20:30 ` [Patch v4 4/6] virtio: Exposes added descriptor to the other side synchronously nilal
2017-11-03 20:30 ` [Patch v4 5/6] KVM: Sending hyperlist to the host via hinting_vq nilal
2017-11-03 20:30 ` [Patch v4 6/6] KVM: Enabling guest page hinting via static key nilal
2017-11-03 20:37 ` [QEMU PATCH] kvm: Support for guest page hinting nilal
2017-11-03 20:37 ` nilal
2017-11-06 11:21 ` Pankaj Gupta
2017-11-06 14:21 ` Nitesh Narayan Lal
2017-11-12 21:23 ` [Patch v4 0/6] KVM: Guest " Rik van Riel
2017-11-13 15:14 ` Nitesh Narayan Lal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171115220456-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=david@redhat.com \
--cc=dodgen@google.com \
--cc=konrad.wilk@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=nilal@redhat.com \
--cc=pagupta@redhat.com \
--cc=pbonzini@redhat.com \
--cc=riel@redhat.com \
--cc=wei.w.wang@intel.com \
--cc=yang.zhang.wz@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox