* [Patch v3 1/5] KVM: Support for guest page hinting
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
@ 2017-10-10 18:39 ` nilal
2017-10-13 11:47 ` kbuild test robot
2017-10-13 12:44 ` kbuild test robot
2017-10-10 18:39 ` [Patch v3 2/5] KVM: Guest page hinting functionality nilal
` (5 subsequent siblings)
6 siblings, 2 replies; 12+ messages in thread
From: nilal @ 2017-10-10 18:39 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
From: Nitesh Narayan Lal <nilal@redhat.com>
This patch includes the following:
1. A basic skeleton for guest page hinting support
2. Enablement of the x86 platform to use it
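The gfp.h hook pattern this skeleton relies on can be sketched in plain userspace C. This is a hedged model only (the `model_*` helpers and call counters are invented for illustration): core code always calls arch_free_page()/arch_alloc_page(), and they compile to empty inline stubs unless HAVE_ARCH_FREE_PAGE/HAVE_ARCH_ALLOC_PAGE are defined, in which case a real implementation takes over.

```c
#include <assert.h>

/* Defining these replaces the empty inline stubs below with real hooks,
 * mirroring the #ifdef structure added to include/linux/gfp.h. */
#define HAVE_ARCH_FREE_PAGE
#define HAVE_ARCH_ALLOC_PAGE

static int free_hook_calls;
static int alloc_hook_calls;

#ifdef HAVE_ARCH_FREE_PAGE
static void arch_free_page(void *page, int order)
{
	(void)page; (void)order;
	free_hook_calls++;	/* stand-in for real hinting work */
}
#else
static inline void arch_free_page(void *page, int order) { }
#endif

#ifdef HAVE_ARCH_ALLOC_PAGE
static void arch_alloc_page(void *page, int order)
{
	(void)page; (void)order;
	alloc_hook_calls++;
}
#else
static inline void arch_alloc_page(void *page, int order) { }
#endif

/* Invented stand-ins for the page allocator's free/alloc paths, which
 * call the arch hooks unconditionally. */
static void model_free_pages(void *page, int order)
{
	arch_free_page(page, order);
}

static void model_alloc_pages(void *page, int order)
{
	arch_alloc_page(page, order);
}
```

With the macros defined, every allocator-path call reaches the hook; without them, the calls compile away to nothing.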
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
arch/x86/kvm/Makefile | 1 +
include/linux/gfp.h | 7 +++++++
virt/kvm/Kconfig | 4 ++++
virt/kvm/page_hinting.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 58 insertions(+)
create mode 100644 virt/kvm/page_hinting.c
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 09d4b17..6f7c382 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,6 +10,7 @@ KVM := ../../../virt/kvm
kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
+kvm-$(CONFIG_KVM_FREE_PAGE_HINTING) += $(KVM)/page_hinting.o
kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f780718..a74371f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -452,6 +452,13 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
}
+#ifdef CONFIG_KVM_FREE_PAGE_HINTING
+#define HAVE_ARCH_ALLOC_PAGE
+#define HAVE_ARCH_FREE_PAGE
+void arch_free_page(struct page *page, int order);
+void arch_alloc_page(struct page *page, int order);
+#endif
+
#ifndef HAVE_ARCH_FREE_PAGE
static inline void arch_free_page(struct page *page, int order) { }
#endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index b0cc1a3..57f1c7b 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -50,3 +50,7 @@ config KVM_COMPAT
config HAVE_KVM_IRQ_BYPASS
bool
+
+config KVM_FREE_PAGE_HINTING
+ def_bool m
+ depends on KVM
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
new file mode 100644
index 0000000..39d2b1d
--- /dev/null
+++ b/virt/kvm/page_hinting.c
@@ -0,0 +1,46 @@
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/page_ref.h>
+#include <linux/kvm_host.h>
+#include <linux/sort.h>
+
+#include <trace/events/kvm.h>
+
+#define MAX_FGPT_ENTRIES 1000
+#define HYPERLIST_THRESHOLD 500
+/*
+ * struct kvm_free_pages - Tracks the pages which are freed by the guest.
+ * @pfn - page frame number for the page which is to be freed
+ * @pages - number of pages which are supposed to be freed.
+ * A global array object is used to hold the list of pfn and number of pages
+ * which are freed by the guest. This list may also have fragmented pages, so
+ * defragmentation is a must prior to the hypercall.
+ */
+struct kvm_free_pages {
+ unsigned long pfn;
+ unsigned int pages;
+};
+
+/*
+ * hypervisor_pages - It is a dummy structure passed with the hypercall.
+ * @pfn - page frame number for the page which is to be freed.
+ * @pages - number of pages which are supposed to be freed.
+ * A global array object is used to hold the list of pfn and pages and is
+ * passed as part of the hypercall.
+ */
+struct hypervisor_pages {
+ unsigned long pfn;
+ unsigned int pages;
+};
+
+DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
+DEFINE_PER_CPU(int, kvm_pt_idx);
+struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+
+void arch_alloc_page(struct page *page, int order)
+{
+}
+
+void arch_free_page(struct page *page, int order)
+{
+}
--
2.9.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Patch v3 1/5] KVM: Support for guest page hinting
2017-10-10 18:39 ` [Patch v3 1/5] KVM: Support for guest " nilal
@ 2017-10-13 11:47 ` kbuild test robot
2017-10-13 12:44 ` kbuild test robot
1 sibling, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2017-10-13 11:47 UTC (permalink / raw)
To: nilal
Cc: kbuild-all, kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz,
riel, david, mst
[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]
Hi Nitesh,
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.14-rc4 next-20171012]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/nilal-redhat-com/KVM-Guest-page-hinting/20171013-181927
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc
All errors (new ones prefixed by >>):
mm/page_alloc.o: In function `.post_alloc_hook':
(.text+0x3090): undefined reference to `.arch_alloc_page'
mm/page_alloc.o: In function `.get_page_from_freelist':
>> page_alloc.c:(.text+0x5404): undefined reference to `.arch_alloc_page'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23873 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch v3 1/5] KVM: Support for guest page hinting
2017-10-10 18:39 ` [Patch v3 1/5] KVM: Support for guest " nilal
2017-10-13 11:47 ` kbuild test robot
@ 2017-10-13 12:44 ` kbuild test robot
1 sibling, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2017-10-13 12:44 UTC (permalink / raw)
To: nilal
Cc: kbuild-all, kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz,
riel, david, mst
[-- Attachment #1: Type: text/plain, Size: 1265 bytes --]
Hi Nitesh,
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.14-rc4 next-20171012]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/nilal-redhat-com/KVM-Guest-page-hinting/20171013-181927
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
mm/page_alloc.o: In function `__free_pages_ok':
>> page_alloc.c:(.text+0xfbb): undefined reference to `arch_free_page'
mm/page_alloc.o: In function `post_alloc_hook':
page_alloc.c:(.text+0x1d19): undefined reference to `arch_alloc_page'
mm/page_alloc.o: In function `free_hot_cold_page':
page_alloc.c:(.text+0x2b36): undefined reference to `arch_free_page'
mm/page_alloc.o: In function `get_page_from_freelist':
page_alloc.c:(.text+0x3aa1): undefined reference to `arch_alloc_page'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26128 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Patch v3 2/5] KVM: Guest page hinting functionality
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
2017-10-10 18:39 ` [Patch v3 1/5] KVM: Support for guest " nilal
@ 2017-10-10 18:39 ` nilal
2017-10-10 18:39 ` [Patch v3 3/5] KVM: Adding tracepoints for guest page hinting nilal
` (4 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: nilal @ 2017-10-10 18:39 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
From: Nitesh Narayan Lal <niteshnarayanlalleo@gmail.com>
This patch adds the guest implementation that maintains the list of
pages which are freed by the guest and are not reused. To prevent any
reallocation, a seqlock is taken once the list is completely filled.
The hypercall-related changes are not part of this patch.
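The defragmentation step the patch performs before the hypercall can be modeled in plain userspace C. This is a sketch of the idea only — `compress_ranges` is an invented name, and it uses qsort(3), whereas the in-kernel compress_hyperlist()/pack_hyperlist() below merge in place over a fixed-size array with the kernel's sort() and keep zeroed tail entries:

```c
#include <stdlib.h>
#include <assert.h>

struct range {
	unsigned long pfn;	/* first page frame number of the range */
	unsigned int pages;	/* number of contiguous free pages */
};

static int cmp_pfn(const void *a, const void *b)
{
	const struct range *ra = a, *rb = b;

	return (ra->pfn > rb->pfn) - (ra->pfn < rb->pfn);
}

/*
 * Sort the ranges by pfn, then merge entries that overlap or are
 * directly adjacent, packing survivors to the front of the array.
 * Returns the number of surviving entries.
 */
static int compress_ranges(struct range *r, int n)
{
	int i, j;

	if (!n)
		return 0;
	qsort(r, n, sizeof(*r), cmp_pfn);
	for (i = 0, j = 1; j < n; j++) {
		if (r[j].pfn <= r[i].pfn + r[i].pages) {
			/* overlap or adjacency: extend entry i if needed */
			unsigned long end_i = r[i].pfn + r[i].pages;
			unsigned long end_j = r[j].pfn + r[j].pages;

			if (end_j > end_i)
				r[i].pages = end_j - r[i].pfn;
		} else {
			/* disjoint: start a new surviving entry */
			r[++i] = r[j];
		}
	}
	return i + 1;
}
```

For example, the ranges (10,2), (1,3), (4,2), (100,1) compress to three entries, because (1,3) and (4,2) are adjacent and merge into (1,5).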
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
virt/kvm/page_hinting.c | 245 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 243 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index 39d2b1d..658856d 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -3,8 +3,7 @@
#include <linux/page_ref.h>
#include <linux/kvm_host.h>
#include <linux/sort.h>
-
-#include <trace/events/kvm.h>
+#include <linux/kernel.h>
#define MAX_FGPT_ENTRIES 1000
#define HYPERLIST_THRESHOLD 500
@@ -33,14 +32,256 @@ struct hypervisor_pages {
unsigned int pages;
};
+static __cacheline_aligned_in_smp DEFINE_SEQLOCK(guest_page_lock);
DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
DEFINE_PER_CPU(int, kvm_pt_idx);
struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+static void empty_hyperlist(void)
+{
+ int i = 0;
+
+ while (i < MAX_FGPT_ENTRIES) {
+ hypervisor_pagelist[i].pfn = 0;
+ hypervisor_pagelist[i].pages = 0;
+ i++;
+ }
+}
+
+static void make_hypercall(void)
+{
+ /*
+ * Dummy function: To be filled later.
+ */
+ empty_hyperlist();
+}
+
+static int sort_pfn(const void *a1, const void *b1)
+{
+ const struct hypervisor_pages *a = a1;
+ const struct hypervisor_pages *b = b1;
+
+ if (a->pfn > b->pfn)
+ return 1;
+
+ if (a->pfn < b->pfn)
+ return -1;
+
+ return 0;
+}
+
+static int pack_hyperlist(void)
+{
+ int i = 0, j = 0;
+
+ while (i < MAX_FGPT_ENTRIES) {
+ if (hypervisor_pagelist[i].pfn != 0) {
+ if (i != j) {
+ hypervisor_pagelist[j].pfn =
+ hypervisor_pagelist[i].pfn;
+ hypervisor_pagelist[j].pages =
+ hypervisor_pagelist[i].pages;
+ }
+ j++;
+ }
+ i++;
+ }
+ i = j;
+ while (j < MAX_FGPT_ENTRIES) {
+ hypervisor_pagelist[j].pfn = 0;
+ hypervisor_pagelist[j].pages = 0;
+ j++;
+ }
+ return i;
+}
+
+int compress_hyperlist(void)
+{
+ int i = 0, j = 1, merge_counter = 0, ret = 0;
+
+ sort(hypervisor_pagelist, MAX_FGPT_ENTRIES,
+ sizeof(struct hypervisor_pages), sort_pfn, NULL);
+ while (i < MAX_FGPT_ENTRIES && j < MAX_FGPT_ENTRIES) {
+ unsigned long pfni = hypervisor_pagelist[i].pfn;
+ unsigned int pagesi = hypervisor_pagelist[i].pages;
+ unsigned long pfnj = hypervisor_pagelist[j].pfn;
+ unsigned int pagesj = hypervisor_pagelist[j].pages;
+
+ if (pfnj <= pfni) {
+ if (((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) &&
+ ((pfnj + pagesj - 1) >= (pfni - 1))) {
+ hypervisor_pagelist[i].pfn = pfnj;
+ hypervisor_pagelist[i].pages += pfni - pfnj;
+ hypervisor_pagelist[j].pfn = 0;
+ hypervisor_pagelist[j].pages = 0;
+ j++;
+ merge_counter++;
+ continue;
+ } else if ((pfnj + pagesj - 1) > (pfni + pagesi - 1)) {
+ hypervisor_pagelist[i].pfn = pfnj;
+ hypervisor_pagelist[i].pages = pagesj;
+ hypervisor_pagelist[j].pfn = 0;
+ hypervisor_pagelist[j].pages = 0;
+ j++;
+ merge_counter++;
+ continue;
+ }
+ } else if (pfnj > pfni) {
+ if ((pfnj + pagesj - 1) > (pfni + pagesi - 1) &&
+ (pfnj <= pfni + pagesi)) {
+ hypervisor_pagelist[i].pages +=
+ (pfnj + pagesj - 1) -
+ (pfni + pagesi - 1);
+ hypervisor_pagelist[j].pfn = 0;
+ hypervisor_pagelist[j].pages = 0;
+ j++;
+ merge_counter++;
+ continue;
+ } else if ((pfnj + pagesj - 1) <= (pfni + pagesi - 1)) {
+ hypervisor_pagelist[j].pfn = 0;
+ hypervisor_pagelist[j].pages = 0;
+ j++;
+ merge_counter++;
+ continue;
+ }
+ }
+ i = j;
+ j++;
+ }
+ if (merge_counter != 0)
+ ret = pack_hyperlist() - 1;
+ else
+ ret = MAX_FGPT_ENTRIES - 1;
+ return ret;
+}
+
+void copy_hyperlist(int hyper_idx)
+{
+ int *idx = &get_cpu_var(kvm_pt_idx);
+ struct kvm_free_pages *free_page_obj;
+ int i = 0;
+
+ free_page_obj = &get_cpu_var(kvm_pt)[0];
+ while (i < hyper_idx) {
+ free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
+ free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
+ *idx += 1;
+ i++;
+ }
+ empty_hyperlist();
+ put_cpu_var(kvm_pt);
+ put_cpu_var(kvm_pt_idx);
+}
+
+/*
+ * arch_free_page_slowpath() - This function adds the guest free page entries
+ * to the hypervisor_pages list and also ensures defragmentation prior to
+ * addition if an entry overlaps with any entry of the kvm_free_pages list.
+ */
+void arch_free_page_slowpath(void)
+{
+ int idx = 0;
+ int hyper_idx = -1;
+ int *kvm_idx = &get_cpu_var(kvm_pt_idx);
+ struct kvm_free_pages *free_page_obj = &get_cpu_var(kvm_pt)[0];
+
+ write_seqlock(&guest_page_lock);
+ while (idx < MAX_FGPT_ENTRIES) {
+ unsigned long pfn = free_page_obj[idx].pfn;
+ unsigned long pfn_end = free_page_obj[idx].pfn +
+ free_page_obj[idx].pages - 1;
+ bool prev_free = false;
+
+ while (pfn <= pfn_end) {
+ struct page *p = pfn_to_page(pfn);
+
+ if (PageCompound(p)) {
+ struct page *head_page = compound_head(p);
+ unsigned long head_pfn = page_to_pfn(head_page);
+ unsigned int alloc_pages =
+ 1 << compound_order(head_page);
+
+ pfn = head_pfn + alloc_pages;
+ prev_free = false;
+ continue;
+ }
+ if (page_ref_count(p)) {
+ pfn++;
+ prev_free = false;
+ continue;
+ }
+ /*
+ * The page is free so add it to the list and free the
+ * hypervisor_pagelist if required.
+ */
+ if (!prev_free) {
+ hyper_idx++;
+ hypervisor_pagelist[hyper_idx].pfn = pfn;
+ hypervisor_pagelist[hyper_idx].pages = 1;
+ if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
+ hyper_idx = compress_hyperlist();
+ if (hyper_idx >=
+ HYPERLIST_THRESHOLD) {
+ make_hypercall();
+ hyper_idx = 0;
+ }
+ }
+ /*
+ * If the next contiguous page is free, it can
+ * be added to this same entry.
+ */
+ prev_free = true;
+ } else {
+ /*
+ * Multiple adjacent free pages
+ */
+ hypervisor_pagelist[hyper_idx].pages++;
+ }
+ pfn++;
+ }
+ free_page_obj[idx].pfn = 0;
+ free_page_obj[idx].pages = 0;
+ idx++;
+ }
+ *kvm_idx = 0;
+ put_cpu_var(kvm_pt);
+ put_cpu_var(kvm_pt_idx);
+ write_sequnlock(&guest_page_lock);
+}
+
void arch_alloc_page(struct page *page, int order)
{
+ unsigned int seq;
+
+ /*
+ * arch_free_page will acquire the lock once the list carrying guest
+ * free pages is full and a hypercall is made. Until the complete free
+ * page list has been traversed, no further allocation is allowed.
+ */
+ do {
+ seq = read_seqbegin(&guest_page_lock);
+ } while (read_seqretry(&guest_page_lock, seq));
}
void arch_free_page(struct page *page, int order)
{
+ int *free_page_idx = &get_cpu_var(kvm_pt_idx);
+ struct kvm_free_pages *free_page_obj;
+ unsigned long flags;
+
+ /*
+ * Use of global variables may trigger a race condition between IRQ and
+ * process context, causing unwanted overwrites. This will be replaced
+ * with a better solution to prevent such race conditions.
+ */
+ local_irq_save(flags);
+ free_page_obj = &get_cpu_var(kvm_pt)[0];
+ free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
+ free_page_obj[*free_page_idx].pages = 1 << order;
+ *free_page_idx += 1;
+ if (*free_page_idx == MAX_FGPT_ENTRIES)
+ arch_free_page_slowpath();
+ put_cpu_var(kvm_pt);
+ put_cpu_var(kvm_pt_idx);
+ local_irq_restore(flags);
}
--
2.9.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Patch v3 3/5] KVM: Adding tracepoints for guest page hinting
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
2017-10-10 18:39 ` [Patch v3 1/5] KVM: Support for guest " nilal
2017-10-10 18:39 ` [Patch v3 2/5] KVM: Guest page hinting functionality nilal
@ 2017-10-10 18:39 ` nilal
2017-10-10 18:39 ` [Patch v3 4/5] virtio: Exposes added descriptor to the other side synchronously nilal
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: nilal @ 2017-10-10 18:39 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
From: Nitesh Narayan Lal <nilal@redhat.com>
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
include/trace/events/kvm.h | 101 +++++++++++++++++++++++++++++++++++++++++++++
virt/kvm/page_hinting.c | 20 ++++++++-
2 files changed, 119 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index dcffedf..0f6989a 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -395,6 +395,107 @@ TRACE_EVENT(kvm_halt_poll_ns,
#define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \
trace_kvm_halt_poll_ns(false, vcpu_id, new, old)
+TRACE_EVENT(guest_free_page,
+ TP_PROTO(struct page *page, unsigned int order),
+
+ TP_ARGS(page, order),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, pfn)
+ __field(unsigned int, order)
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = page_to_pfn(page);
+ __entry->order = order;
+ ),
+
+ TP_printk("page=%p pfn=%lu number of pages=%d",
+ pfn_to_page(__entry->pfn),
+ __entry->pfn,
+ (1 << __entry->order))
+);
+
+TRACE_EVENT(guest_alloc_page,
+ TP_PROTO(struct page *page, unsigned int order),
+
+ TP_ARGS(page, order),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, pfn)
+ __field(unsigned int, order)
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = page_to_pfn(page);
+ __entry->order = order;
+ ),
+
+ TP_printk("page=%p pfn=%lu number of pages=%d",
+ pfn_to_page(__entry->pfn),
+ __entry->pfn,
+ (1 << __entry->order))
+);
+
+TRACE_EVENT(guest_free_page_slowpath,
+ TP_PROTO(unsigned long pfn, unsigned int pages),
+
+ TP_ARGS(pfn, pages),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, pfn)
+ __field(unsigned int, pages)
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = pfn;
+ __entry->pages = pages;
+ ),
+
+ TP_printk("pfn=%lu number of pages=%u",
+ __entry->pfn,
+ __entry->pages)
+);
+
+TRACE_EVENT(guest_pfn_dump,
+ TP_PROTO(char *type, unsigned long pfn, unsigned int pages),
+
+ TP_ARGS(type, pfn, pages),
+
+ TP_STRUCT__entry(
+ __field(char *, type)
+ __field(unsigned long, pfn)
+ __field(unsigned int, pages)
+ ),
+
+ TP_fast_assign(
+ __entry->type = type;
+ __entry->pfn = pfn;
+ __entry->pages = pages;
+ ),
+
+ TP_printk("Type=%s pfn=%lu number of pages=%d",
+ __entry->type,
+ __entry->pfn,
+ __entry->pages)
+);
+
+TRACE_EVENT(guest_str_dump,
+ TP_PROTO(char *str),
+
+ TP_ARGS(str),
+
+ TP_STRUCT__entry(
+ __field(char *, str)
+ ),
+
+ TP_fast_assign(
+ __entry->str = str;
+ ),
+
+ TP_printk("Debug=%s",
+ __entry->str)
+);
#endif /* _TRACE_KVM_MAIN_H */
/* This part must be outside protection */
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index 658856d..ce87a74 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -4,6 +4,7 @@
#include <linux/kvm_host.h>
#include <linux/sort.h>
#include <linux/kernel.h>
+#include <trace/events/kvm.h>
#define MAX_FGPT_ENTRIES 1000
#define HYPERLIST_THRESHOLD 500
@@ -48,12 +49,13 @@ static void empty_hyperlist(void)
}
}
-static void make_hypercall(void)
+void make_hypercall(void)
{
/*
* Dummy function: To be filled later.
*/
empty_hyperlist();
+ trace_guest_str_dump("Hypercall to host...:");
}
static int sort_pfn(const void *a1, const void *b1)
@@ -70,13 +72,16 @@ static int sort_pfn(const void *a1, const void *b1)
return 0;
}
-static int pack_hyperlist(void)
+int pack_hyperlist(void)
{
int i = 0, j = 0;
while (i < MAX_FGPT_ENTRIES) {
if (hypervisor_pagelist[i].pfn != 0) {
if (i != j) {
+ trace_guest_pfn_dump("Packing Hyperlist",
+ hypervisor_pagelist[i].pfn,
+ hypervisor_pagelist[i].pages);
hypervisor_pagelist[j].pfn =
hypervisor_pagelist[i].pfn;
hypervisor_pagelist[j].pages =
@@ -163,6 +168,9 @@ void copy_hyperlist(int hyper_idx)
free_page_obj = &get_cpu_var(kvm_pt)[0];
while (i < hyper_idx) {
+ trace_guest_pfn_dump("HyperList entry copied",
+ hypervisor_pagelist[i].pfn,
+ hypervisor_pagelist[i].pages);
free_page_obj[*idx].pfn = hypervisor_pagelist[i].pfn;
free_page_obj[*idx].pages = hypervisor_pagelist[i].pages;
*idx += 1;
@@ -203,11 +211,14 @@ void arch_free_page_slowpath(void)
pfn = head_pfn + alloc_pages;
prev_free = false;
+ trace_guest_pfn_dump("Compound",
+ head_pfn, alloc_pages);
continue;
}
if (page_ref_count(p)) {
pfn++;
prev_free = false;
+ trace_guest_pfn_dump("Single", pfn, 1);
continue;
}
/*
@@ -218,6 +229,9 @@ void arch_free_page_slowpath(void)
hyper_idx++;
hypervisor_pagelist[hyper_idx].pfn = pfn;
hypervisor_pagelist[hyper_idx].pages = 1;
+ trace_guest_free_page_slowpath(
+ hypervisor_pagelist[hyper_idx].pfn,
+ hypervisor_pagelist[hyper_idx].pages);
if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
hyper_idx = compress_hyperlist();
if (hyper_idx >=
@@ -261,6 +275,7 @@ void arch_alloc_page(struct page *page, int order)
do {
seq = read_seqbegin(&guest_page_lock);
} while (read_seqretry(&guest_page_lock, seq));
+ trace_guest_alloc_page(page, order);
}
void arch_free_page(struct page *page, int order)
@@ -276,6 +291,7 @@ void arch_free_page(struct page *page, int order)
*/
local_irq_save(flags);
free_page_obj = &get_cpu_var(kvm_pt)[0];
+ trace_guest_free_page(page, order);
free_page_obj[*free_page_idx].pfn = page_to_pfn(page);
free_page_obj[*free_page_idx].pages = 1 << order;
*free_page_idx += 1;
--
2.9.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Patch v3 4/5] virtio: Exposes added descriptor to the other side synchronously
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
` (2 preceding siblings ...)
2017-10-10 18:39 ` [Patch v3 3/5] KVM: Adding tracepoints for guest page hinting nilal
@ 2017-10-10 18:39 ` nilal
2017-10-10 18:39 ` [Patch v3 5/5] KVM: Sending hyperlist to the host via hinting_vq nilal
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: nilal @ 2017-10-10 18:39 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
From: Nitesh Narayan Lal <nilal@redhat.com>
This patch enables the driver to expose a chain of buffers to the other end
using vring descriptors, followed by a kick. It then busy-waits until the
update is done.
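The chaining half of this can be sketched as a userspace model: descriptors live in a fixed array threaded through a free list, and adding to a chain means taking free_head, filling that descriptor, and pointing the previous descriptor's next field at it. The names here (`struct ring`, `add_chain_desc`) are invented for the sketch; it omits the kick/busy-wait and the virtio endianness conversions:

```c
#include <assert.h>

#define RING_SIZE 8
#define DESC_ID_INIT ((unsigned int)-1)	/* models VIRTQUEUE_DESC_ID_INIT */

struct desc {
	unsigned long addr;
	unsigned int len;
	unsigned int next;	/* free-list link or chain link */
	int has_next;		/* models VRING_DESC_F_NEXT */
};

struct ring {
	struct desc desc[RING_SIZE];
	unsigned int free_head;
	unsigned int num_free;
};

static void ring_init(struct ring *r)
{
	unsigned int i;

	/* Thread all descriptors onto the free list. */
	for (i = 0; i < RING_SIZE; i++)
		r->desc[i].next = i + 1;
	r->free_head = 0;
	r->num_free = RING_SIZE;
}

/*
 * Model of virtqueue_add_chain_desc(): take the next free descriptor,
 * fill it, and link it after *prev by rewriting the previous
 * descriptor's next pointer. Returns the descriptor id used, or -1
 * when the ring is full (the real code kicks and retries instead).
 */
static int add_chain_desc(struct ring *r, unsigned long addr,
			  unsigned int len, unsigned int *head,
			  unsigned int *prev)
{
	unsigned int i;

	if (!r->num_free)
		return -1;
	i = r->free_head;
	r->desc[i].addr = addr;
	r->desc[i].len = len;
	r->desc[i].has_next = 0;
	if (*prev != DESC_ID_INIT) {
		r->desc[*prev].next = i;
		r->desc[*prev].has_next = 1;
	}
	*prev = i;
	if (*head == DESC_ID_INIT)
		*head = i;
	r->free_head = r->desc[i].next;
	r->num_free--;
	return i;
}
```

After three additions the chain runs head → 0 → 1 → 2, with only the last descriptor lacking the next flag, which is the shape virtqueue_add_chain() then publishes to the device.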
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
drivers/virtio/virtio_ring.c | 157 ++++++++++++++++++++++++++++++++++++++++++-
include/linux/virtio.h | 19 ++++++
2 files changed, 175 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb30f3e..651ce8f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -438,6 +438,136 @@ static inline int virtqueue_add(struct virtqueue *_vq,
}
/**
+ * virtqueue_add_chain - expose a chain of buffers to the other end
+ * @_vq: the struct virtqueue we're talking about.
+ * @head: desc id of the chain head.
+ * @indirect: set if the chain of descs are indirect descs.
+ * @indir_desc: the first indirect desc.
+ * @data: the token identifying the chain.
+ * @ctx: extra context for the token.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_chain(struct virtqueue *_vq,
+ unsigned int head,
+ bool indirect,
+ struct vring_desc *indir_desc,
+ void *data,
+ void *ctx)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ /* The desc chain is empty. */
+ if (head == VIRTQUEUE_DESC_ID_INIT)
+ return 0;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return -EIO;
+ }
+
+ /* This is the data for the callback; it may not be required in our case. */
+ vq->desc_state[head].data = data;
+ if (indirect)
+ vq->desc_state[head].indir_desc = indir_desc;
+ if (ctx)
+ vq->desc_state[head].indir_desc = ctx;
+
+ vq->avail_idx_shadow = 1;
+ vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+ vq->num_added = 1;
+ END_USE(vq);
+ virtqueue_kick_sync(_vq);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_chain);
+
+/**
+ * virtqueue_add_chain_desc - add a buffer to a chain using a vring desc
+ * @vq: the struct virtqueue we're talking about.
+ * @addr: address of the buffer to add.
+ * @len: length of the buffer.
+ * @head_id: desc id of the chain head.
+ * @prev_id: desc id of the previous buffer.
+ * @in: set if the buffer is for the device to write.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_chain_desc(struct virtqueue *_vq,
+ u64 addr,
+ u32 len,
+ unsigned int *head_id,
+ unsigned int *prev_id,
+ bool in)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct vring_desc *desc = vq->vring.desc;
+ u16 flags = in ? VRING_DESC_F_WRITE : 0;
+ unsigned int i;
+
+ /* Sanity check */
+ if (!_vq || !head_id || !prev_id)
+ return -EINVAL;
+retry:
+ START_USE(vq);
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return -EIO;
+ }
+
+ if (vq->vq.num_free < 1) {
+ /*
+ * If there is no desc available in the vq, kick what is
+ * already added and restart building a new chain for
+ * the passed sg.
+ */
+ if (likely(*head_id != VIRTQUEUE_DESC_ID_INIT)) {
+ END_USE(vq);
+ virtqueue_add_chain(_vq, *head_id, 0, NULL, vq, NULL);
+ virtqueue_kick_sync(_vq);
+ *head_id = VIRTQUEUE_DESC_ID_INIT;
+ *prev_id = VIRTQUEUE_DESC_ID_INIT;
+ goto retry;
+ } else {
+ END_USE(vq);
+ return -ENOSPC;
+ }
+ }
+
+ i = vq->free_head;
+ flags &= ~VRING_DESC_F_NEXT;
+ desc[i].flags = cpu_to_virtio16(_vq->vdev, flags);
+ desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+ desc[i].len = cpu_to_virtio32(_vq->vdev, len);
+
+ /* Add the desc to the end of the chain */
+ if (*prev_id != VIRTQUEUE_DESC_ID_INIT) {
+ desc[*prev_id].next = cpu_to_virtio16(_vq->vdev, i);
+ desc[*prev_id].flags |= cpu_to_virtio16(_vq->vdev,
+ VRING_DESC_F_NEXT);
+ }
+ *prev_id = i;
+ if (*head_id == VIRTQUEUE_DESC_ID_INIT)
+ *head_id = *prev_id;
+
+ vq->vq.num_free--;
+ vq->free_head = virtio16_to_cpu(_vq->vdev, desc[i].next);
+ END_USE(vq);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_chain_desc);
+
+/**
* virtqueue_add_sgs - expose buffers to other end
* @vq: the struct virtqueue we're talking about.
* @sgs: array of terminated scatterlists.
@@ -559,7 +689,6 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
START_USE(vq);
/* We need to expose available array entries before checking avail
* event. */
- virtio_mb(vq->weak_barriers);
old = vq->avail_idx_shadow - vq->num_added;
new = vq->avail_idx_shadow;
@@ -609,6 +738,32 @@ bool virtqueue_notify(struct virtqueue *_vq)
EXPORT_SYMBOL_GPL(virtqueue_notify);
/**
+ * virtqueue_kick_sync - update after add_buf and busy-wait until the update is done
+ * @vq: the struct virtqueue
+ *
+ * After one or more virtqueue_add_* calls, invoke this to kick
+ * the other side. Busy-waits until the other side is done with the update.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns false if kick failed, otherwise true.
+ */
+bool virtqueue_kick_sync(struct virtqueue *vq)
+{
+ u32 len;
+
+ if (likely(virtqueue_kick(vq))) {
+ while (!virtqueue_get_buf(vq, &len) &&
+ !virtqueue_is_broken(vq))
+ cpu_relax();
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(virtqueue_kick_sync);
+
+/**
* virtqueue_kick - update after add_buf
* @vq: the struct virtqueue
*
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 28b0e96..bc95e84 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -56,6 +56,25 @@ int virtqueue_add_sgs(struct virtqueue *vq,
unsigned int in_sgs,
void *data,
gfp_t gfp);
+/* A desc with this init id is treated as an invalid desc */
+#define VIRTQUEUE_DESC_ID_INIT UINT_MAX
+int virtqueue_add_chain_desc(struct virtqueue *_vq,
+ u64 addr,
+ u32 len,
+ unsigned int *head_id,
+ unsigned int *prev_id,
+ bool in);
+
+int virtqueue_add_chain(struct virtqueue *_vq,
+ unsigned int head,
+ bool indirect,
+ struct vring_desc *indirect_desc,
+ void *data,
+ void *ctx);
+
+bool virtqueue_kick_sync(struct virtqueue *vq);
+
+bool virtqueue_kick_async(struct virtqueue *vq, wait_queue_head_t wq);
bool virtqueue_kick(struct virtqueue *vq);
--
2.9.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Patch v3 5/5] KVM: Sending hyperlist to the host via hinting_vq
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
` (3 preceding siblings ...)
2017-10-10 18:39 ` [Patch v3 4/5] virtio: Exposes added descriptor to the other side synchronously nilal
@ 2017-10-10 18:39 ` nilal
2017-10-13 11:44 ` kbuild test robot
2017-10-10 18:41 ` [QEMU PATCH] kvm: Support for guest page hinting nilal
2017-10-11 18:49 ` [Patch v3 0/5] KVM: Guest " Konrad Rzeszutek Wilk
6 siblings, 1 reply; 12+ messages in thread
From: nilal @ 2017-10-10 18:39 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
From: Nitesh Narayan Lal <nilal@redhat.com>
This patch creates a new vq (hinting_vq) to be used for page hinting
and adds support in the existing virtio balloon infrastructure so
that the hyperlist carrying the pages which are to be freed
can be sent to the host (QEMU) for processing via hinting_vq.
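The send loop can be modeled in userspace: each hyperlist entry is a (pfn, pages) range, and the driver below walks every range telling the host one page at a time, clearing each entry once consumed. `emit_hints` is an invented name for this sketch; the real code hands each pfn to tell_host_one_page() over hinting_vq instead of writing it to an array:

```c
#include <assert.h>

struct hint {
	unsigned long pfn;	/* first page frame number of the range */
	unsigned int pages;	/* number of pages in the range */
};

/*
 * Model of the virtballoon_page_hinting() loop: expand each range
 * into single-page hints (one "tell the host" per page), clearing
 * the entry afterwards as the driver does. Returns the number of
 * hints emitted; stops early if the output buffer fills up.
 */
static int emit_hints(struct hint *list, int n,
		      unsigned long *out, int out_max)
{
	int i, emitted = 0;

	for (i = 0; i < n; i++) {
		unsigned long pfn = list[i].pfn;
		unsigned long pfn_end = list[i].pfn + list[i].pages - 1;

		while (pfn <= pfn_end && emitted < out_max)
			out[emitted++] = pfn++;
		/* Entry consumed: zero it like the driver does. */
		list[i].pfn = 0;
		list[i].pages = 0;
	}
	return emitted;
}
```

A two-entry list of (100, 3) and (7, 1) yields four single-page hints, which also shows the cost of the per-page approach: a large range produces one virtqueue round trip per page.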
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
drivers/virtio/virtio_balloon.c | 58 +++++++++++++++++++++++++++++++++++++----
include/linux/page_hinting.h | 16 ++++++++++++
virt/kvm/page_hinting.c | 33 ++++++++---------------
3 files changed, 80 insertions(+), 27 deletions(-)
create mode 100644 include/linux/page_hinting.h
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..0152979 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -32,6 +32,7 @@
#include <linux/mm.h>
#include <linux/mount.h>
#include <linux/magic.h>
+#include <linux/page_hinting.h>
/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -53,7 +54,7 @@ static struct vfsmount *balloon_mnt;
struct virtio_balloon {
struct virtio_device *vdev;
- struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+ struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *hinting_vq;
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -95,6 +96,39 @@ static struct virtio_device_id id_table[] = {
{ 0 },
};
+#ifdef CONFIG_KVM_FREE_PAGE_HINTING
+static void tell_host_one_page(struct virtio_balloon *vb, struct virtqueue *vq,
+ unsigned long pfn)
+{
+ unsigned int id = VIRTQUEUE_DESC_ID_INIT;
+ u64 addr = pfn << VIRTIO_BALLOON_PFN_SHIFT;
+
+ virtqueue_add_chain_desc(vq, addr, PAGE_SIZE, &id, &id, 0);
+ virtqueue_add_chain(vq, id, 0, NULL, (void *)addr, NULL);
+}
+
+void virtballoon_page_hinting(struct virtio_balloon *vb, int hyper_entries)
+{
+ int i = 0;
+
+ for (i = 0; i < hyper_entries; i++) {
+ unsigned long pfn = hypervisor_pagelist[i].pfn;
+ unsigned long pfn_end = hypervisor_pagelist[i].pfn +
+ hypervisor_pagelist[i].pages - 1;
+
+ while (pfn <= pfn_end) {
+ vb->pfns[0] = cpu_to_virtio32(vb->vdev, pfn);
+ vb->num_pfns = 1;
+ tell_host_one_page(vb, vb->hinting_vq,
+ vb->pfns[0]);
+ pfn++;
+ }
+ hypervisor_pagelist[i].pfn = 0;
+ hypervisor_pagelist[i].pages = 0;
+ }
+}
+#endif
+
static u32 page_to_balloon_pfn(struct page *page)
{
unsigned long pfn = page_to_pfn(page);
@@ -111,6 +145,12 @@ static void balloon_ack(struct virtqueue *vq)
wake_up(&vb->acked);
}
+static void hinting_ack(struct virtqueue *vq)
+{
+ struct virtio_balloon *vb = vq->vdev->priv;
+
+ wake_up(&vb->acked);
+}
static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
{
struct scatterlist sg;
@@ -404,22 +444,25 @@ static void update_balloon_size_func(struct work_struct *work)
static int init_vqs(struct virtio_balloon *vb)
{
- struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
- static const char * const names[] = { "inflate", "deflate", "stats" };
+ struct virtqueue *vqs[4];
+ vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, hinting_ack,
+ stats_request };
+ static const char * const names[] = { "inflate", "deflate", "hinting",
+ "stats" };
int err, nvqs;
/*
* We expect two virtqueues: inflate and deflate, and
* optionally stat.
*/
- nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
+ nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 4 : 3;
err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
if (err)
return err;
vb->inflate_vq = vqs[0];
vb->deflate_vq = vqs[1];
+ vb->hinting_vq = vqs[2];
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
struct scatterlist sg;
unsigned int num_stats;
@@ -581,6 +624,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
virtio_device_ready(vdev);
+#ifdef CONFIG_KVM_FREE_PAGE_HINTING
+ request_hypercall = (void *)&virtballoon_page_hinting;
+ balloon_ptr = vb;
+#endif
+
if (towards_target(vb))
virtballoon_changed(vdev);
return 0;
diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
new file mode 100644
index 0000000..0bfb646
--- /dev/null
+++ b/include/linux/page_hinting.h
@@ -0,0 +1,16 @@
+#define MAX_FGPT_ENTRIES 1000
+/*
+ * hypervisor_pages - a dummy structure passed with the hypercall.
+ * @pfn - page frame number of the page which is to be freed.
+ * @pages - number of pages which are supposed to be freed.
+ * A global array object is used to hold the list of pfn and pages and is
+ * passed as part of the hypercall.
+ */
+struct hypervisor_pages {
+ unsigned long pfn;
+ unsigned int pages;
+};
+
+extern struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+extern void (*request_hypercall)(void *, int);
+extern void *balloon_ptr;
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index ce87a74..b472db5 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -5,8 +5,8 @@
#include <linux/sort.h>
#include <linux/kernel.h>
#include <trace/events/kvm.h>
+#include <linux/page_hinting.h>
-#define MAX_FGPT_ENTRIES 1000
#define HYPERLIST_THRESHOLD 500
/*
* struct kvm_free_pages - Tracks the pages which are freed by the guest.
@@ -21,22 +21,12 @@ struct kvm_free_pages {
unsigned int pages;
};
-/*
- * hypervisor_pages - It is a dummy structure passed with the hypercall.
- * @pfn - page frame number for the page which is to be freed.
- * @pages - number of pages which are supposed to be freed.
- * A global array object is used to to hold the list of pfn and pages and is
- * passed as part of the hypercall.
- */
-struct hypervisor_pages {
- unsigned long pfn;
- unsigned int pages;
-};
-
static __cacheline_aligned_in_smp DEFINE_SEQLOCK(guest_page_lock);
DEFINE_PER_CPU(struct kvm_free_pages [MAX_FGPT_ENTRIES], kvm_pt);
DEFINE_PER_CPU(int, kvm_pt_idx);
struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+void (*request_hypercall)(void *, int);
+void *balloon_ptr;
static void empty_hyperlist(void)
{
@@ -49,13 +39,11 @@ static void empty_hyperlist(void)
}
}
-void make_hypercall(void)
+void hyperlist_ready(int entries)
{
- /*
- * Dummy function: Tobe filled later.
- */
- empty_hyperlist();
trace_guest_str_dump("Hypercall to host...:");
+ request_hypercall(balloon_ptr, entries);
+ empty_hyperlist();
}
static int sort_pfn(const void *a1, const void *b1)
@@ -156,7 +144,7 @@ int compress_hyperlist(void)
if (merge_counter != 0)
ret = pack_hyperlist() - 1;
else
- ret = MAX_FGPT_ENTRIES - 1;
+ ret = MAX_FGPT_ENTRIES;
return ret;
}
@@ -227,16 +215,16 @@ void arch_free_page_slowpath(void)
*/
if (!prev_free) {
hyper_idx++;
- hypervisor_pagelist[hyper_idx].pfn = pfn;
- hypervisor_pagelist[hyper_idx].pages = 1;
trace_guest_free_page_slowpath(
hypervisor_pagelist[hyper_idx].pfn,
hypervisor_pagelist[hyper_idx].pages);
+ hypervisor_pagelist[hyper_idx].pfn = pfn;
+ hypervisor_pagelist[hyper_idx].pages = 1;
if (hyper_idx == MAX_FGPT_ENTRIES - 1) {
hyper_idx = compress_hyperlist();
if (hyper_idx >=
HYPERLIST_THRESHOLD) {
- make_hypercall();
+ hyperlist_ready(hyper_idx);
hyper_idx = 0;
}
}
@@ -272,6 +260,7 @@ void arch_alloc_page(struct page *page, int order)
* free pages is full and a hypercall will be made. Until complete free
* page list is traversed no further allocation will be allowed.
*/
+
do {
seq = read_seqbegin(&guest_page_lock);
} while (read_seqretry(&guest_page_lock, seq));
--
2.9.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* Re: [Patch v3 5/5] KVM: Sending hyperlist to the host via hinting_vq
2017-10-10 18:39 ` [Patch v3 5/5] KVM: Sending hyperlist to the host via hinting_vq nilal
@ 2017-10-13 11:44 ` kbuild test robot
0 siblings, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2017-10-13 11:44 UTC (permalink / raw)
To: nilal
Cc: kbuild-all, kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz,
riel, david, mst
[-- Attachment #1: Type: text/plain, Size: 1408 bytes --]
Hi Nitesh,
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.14-rc4 next-20171012]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/nilal-redhat-com/KVM-Guest-page-hinting/20171013-181927
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: s390-defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=s390
All errors (new ones prefixed by >>):
drivers/virtio/virtio_balloon.o: In function `virtballoon_probe':
drivers/virtio/virtio_balloon.c:628: undefined reference to `request_hypercall'
drivers/virtio/virtio_balloon.c:629: undefined reference to `balloon_ptr'
drivers/virtio/virtio_balloon.o: In function `virtballoon_page_hinting':
drivers/virtio/virtio_balloon.c:114: undefined reference to `hypervisor_pagelist'
>> drivers/virtio/virtio_balloon.o:(.rodata+0x30): undefined reference to `hypervisor_pagelist'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11064 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* [QEMU PATCH] kvm: Support for guest page hinting
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
` (4 preceding siblings ...)
2017-10-10 18:39 ` [Patch v3 5/5] KVM: Sending hyperlist to the host via hinting_vq nilal
@ 2017-10-10 18:41 ` nilal
2017-10-10 18:41 ` [PATCH] " nilal
2017-10-11 18:49 ` [Patch v3 0/5] KVM: Guest " Konrad Rzeszutek Wilk
6 siblings, 1 reply; 12+ messages in thread
From: nilal @ 2017-10-10 18:41 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
This patch adds a new vq (hvq) to the existing virtio-balloon code in QEMU.
This hvq is used to receive the guest address which is to be
freed.
^ permalink raw reply [flat|nested] 12+ messages in thread

* [PATCH] kvm: Support for guest page hinting
2017-10-10 18:41 ` [QEMU PATCH] kvm: Support for guest page hinting nilal
@ 2017-10-10 18:41 ` nilal
0 siblings, 0 replies; 12+ messages in thread
From: nilal @ 2017-10-10 18:41 UTC (permalink / raw)
To: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
From: Nitesh Narayan Lal <nilal@redhat.com>
This patch enables QEMU to handle page hinting requests
from the guest. Once the guest kicks QEMU to free a page,
QEMU retrieves the guest physical address, converts it to
a host virtual address, and then madvises that memory.
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
---
hw/virtio/virtio-balloon.c | 54 ++++++++++++++++++++++++++++++++++++++
hw/virtio/virtio.c | 24 +++++++++++++++++
include/hw/virtio/virtio-access.h | 1 +
include/hw/virtio/virtio-balloon.h | 2 +-
include/qemu/osdep.h | 7 +++++
5 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 37cde38..a0e3c41 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -33,6 +33,8 @@
#define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT)
+void page_hinting_request(uint64_t addr, uint32_t len);
+
static void balloon_page(void *addr, int deflate)
{
if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
@@ -205,6 +207,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
balloon_stats_change_timer(s, 0);
}
+static void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, Error **errp)
+{
+ MemoryRegionSection mrs = memory_region_find(get_system_memory(),
+ addr, 1);
+
+ if (!mrs.mr) {
+ error_setg(errp, "No memory is mapped at address 0x%" HWADDR_PRIx, addr);
+ return NULL;
+ }
+
+ if (!memory_region_is_ram(mrs.mr) && !memory_region_is_romd(mrs.mr)) {
+ error_setg(errp, "Memory at address 0x%" HWADDR_PRIx " is not RAM", addr);
+ memory_region_unref(mrs.mr);
+ return NULL;
+ }
+
+ *p_mr = mrs.mr;
+ return qemu_map_ram_ptr(mrs.mr->ram_block, mrs.offset_within_region);
+}
+
+void page_hinting_request(uint64_t addr, uint32_t len)
+{
+ Error *local_err = NULL;
+ MemoryRegion *mr = NULL;
+ void *ptr;
+ int ret = 0;
+
+ /* ptr is the host virtual address */
+ ptr = gpa2hva(&mr, addr, &local_err);
+ if (local_err) {
+ error_report_err(local_err);
+ return;
+ }
+ ret = qemu_madvise((void *)ptr, len, QEMU_MADV_FREE);
+ if (ret == -1)
+ printf("\n%d:%s Error: Madvise failed with error:%d\n", __LINE__, __func__, ret);
+}
+
+
+static void virtio_balloon_page_hinting(VirtIODevice *vdev, VirtQueue *vq)
+{
+ uint64_t addr;
+ uint32_t len;
+ VirtQueueElement elem = {};
+
+ pop_hinting_addr(vq, &addr, &len);
+ page_hinting_request(addr, len);
+ virtqueue_push(vq, &elem, 0);
+ virtio_notify(vdev, vq);
+}
+
static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -443,6 +496,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
+ s->hvq = virtio_add_queue(vdev, 128, virtio_balloon_page_hinting);
reset_stats(s);
}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 311929e..24e26ec 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -825,6 +825,30 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
return elem;
}
+void pop_hinting_addr(VirtQueue *vq, uint64_t *addr, uint32_t *len)
+{
+ VRingMemoryRegionCaches *caches;
+ VRingDesc desc;
+ MemoryRegionCache *desc_cache;
+ VirtIODevice *vdev = vq->vdev;
+ unsigned int head, i, max;
+
+ max = vq->vring.num;
+ if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
+ printf("\n%d:%sError: Unable to read head\n", __LINE__, __func__);
+ }
+ i = head;
+
+ caches = vring_get_region_caches(vq);
+ if (caches->desc.len < max * sizeof(VRingDesc)) {
+ virtio_error(vdev, "Cannot map descriptor ring");
+ }
+ desc_cache = &caches->desc;
+ vring_desc_read(vdev, &desc, desc_cache, i);
+ *addr = desc.addr;
+ *len = desc.len;
+}
+
void *virtqueue_pop(VirtQueue *vq, size_t sz)
{
unsigned int i, head, max;
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 2e92074..568d71f 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -24,6 +24,7 @@
#define LEGACY_VIRTIO_IS_BIENDIAN 1
#endif
+void pop_hinting_addr(VirtQueue *vq, uint64_t *addr, uint32_t *len);
static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
{
#if defined(LEGACY_VIRTIO_IS_BIENDIAN)
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 1ea13bd..dfb5782 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -33,7 +33,7 @@ typedef struct virtio_balloon_stat_modern {
typedef struct VirtIOBalloon {
VirtIODevice parent_obj;
- VirtQueue *ivq, *dvq, *svq;
+ VirtQueue *ivq, *dvq, *svq, *hvq;
uint32_t num_pages;
uint32_t actual;
uint64_t stats[VIRTIO_BALLOON_S_NR];
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9dd318a..033d64c 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -278,6 +278,11 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#else
#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
#endif
+#ifdef MADV_FREE
+#define QEMU_MADV_FREE MADV_FREE
+#else
+#define QEMU_MADV_FREE QEMU_MADV_INVALID
+#endif
#elif defined(CONFIG_POSIX_MADVISE)
@@ -291,6 +296,7 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
#define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID
#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#define QEMU_MADV_FREE QEMU_MADV_INVALID
#else /* no-op */
@@ -304,6 +310,7 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
#define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID
#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#define QEMU_MADV_FREE QEMU_MADV_INVALID
#endif
--
2.9.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Patch v3 0/5] KVM: Guest page hinting
2017-10-10 18:39 [Patch v3 0/5] KVM: Guest page hinting nilal
` (5 preceding siblings ...)
2017-10-10 18:41 ` [QEMU PATCH] kvm: Support for guest page hinting nilal
@ 2017-10-11 18:49 ` Konrad Rzeszutek Wilk
6 siblings, 0 replies; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-10-11 18:49 UTC (permalink / raw)
To: nilal; +Cc: kvm, pbonzini, pagupta, wei.w.wang, yang.zhang.wz, riel, david,
mst
On Tue, Oct 10, 2017 at 02:39:18PM -0400, nilal@redhat.com wrote:
> The following patch-set proposes an efficient mechanism for handing freed memory between the guest and the host. It enables the guests with DAX (no page cache) to rapidly free and reclaim memory to and from the host respectively.
>
> Changelog in V3:
> -A new hinting_vq is added to virtio_balloon for the usage of guest page hinting
> -Implementation to pass hyperlist entries to the host via hinting_vq
> -Basic functional testing for this support is done
>
> Virtio interface changes are picked up from Wei's patch-set for Virtio-balloon enhancement[3]. "Wei, how would you like me to credit you in the final patch?"
Since VirtIO is an official specification, is OASIS working through
the changes in the spec that the implementation has?
>
> [1] http://www.spinics.net/lists/kvm/msg153666.html
> [2] https://www.spinics.net/lists/kvm/msg155471.html
> [3] http://www.spinics.net/lists/kvm/msg152734.html
>
^ permalink raw reply [flat|nested] 12+ messages in thread