xen-devel.lists.xenproject.org archive mirror
* [PATCH RFC 17/39] x86/xen: export vcpu_info and shared_info
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
@ 2019-02-20 20:15 ` Joao Martins
  2019-02-20 20:15 ` [PATCH RFC 18/39] x86/xen: make hypercall_page generic Joao Martins
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Joao Martins, Stefano Stabellini,
	Radim Krčmář, x86, Ankur Arora, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky,
	Thomas Gleixner

From: Ankur Arora <ankur.a.arora@oracle.com>

Export the vcpu_info and shared_info structures (xen_vcpu,
xen_vcpu_info and HYPERVISOR_shared_info) so that they can be used
from modules. Also remove the __init annotations from
xen_evtchn_2l/fifo_init() so they can be called after boot.

This allows us to support the 2-level event channel ABI on
xen_shim_domain().
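
As a rough illustration of how a module can consume these exports (a
minimal sketch assuming only the symbols exported here; the real
consumer is the xen-shim driver added later in this series, and the
function name below is a placeholder):

  static void example_publish_shared_info(struct shared_info *s)
  {
          int cpu;

          /* Publish the shared_info page used for event channel state. */
          HYPERVISOR_shared_info = s;

          /* Point each CPU's vcpu_info pointer at its static slot. */
          for_each_possible_cpu(cpu)
                  per_cpu(xen_vcpu, cpu) = &per_cpu(xen_vcpu_info, cpu);
  }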

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 arch/x86/xen/enlighten.c         | 3 +++
 drivers/xen/events/events_2l.c   | 2 +-
 drivers/xen/events/events_fifo.c | 2 +-
 include/xen/xen-ops.h            | 2 +-
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 750f46ad018a..73b9736e89d2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -50,6 +50,8 @@ DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
 /* Linux <-> Xen vCPU id mapping */
 DEFINE_PER_CPU(uint32_t, xen_vcpu_id);
 EXPORT_PER_CPU_SYMBOL(xen_vcpu_id);
+EXPORT_PER_CPU_SYMBOL(xen_vcpu);
+EXPORT_PER_CPU_SYMBOL(xen_vcpu_info);
 
 enum xen_domain_type xen_domain_type = XEN_NATIVE;
 EXPORT_SYMBOL_GPL(xen_domain_type);
@@ -79,6 +81,7 @@ EXPORT_SYMBOL(xen_start_flags);
  * page as soon as fixmap is up and running.
  */
 struct shared_info *HYPERVISOR_shared_info = &xen_dummy_shared_info;
+EXPORT_SYMBOL_GPL(HYPERVISOR_shared_info);
 
 /*
  * Flag to determine whether vcpu info placement is available on all
diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index 8edef51c92e5..b5acf4b09971 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -369,7 +369,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
 	.resume	           = evtchn_2l_resume,
 };
 
-void __init xen_evtchn_2l_init(void)
+void xen_evtchn_2l_init(void)
 {
 	pr_info("Using 2-level ABI\n");
 	evtchn_ops = &evtchn_ops_2l;
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 76b318e88382..453b4b05f238 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -430,7 +430,7 @@ static int xen_evtchn_cpu_dead(unsigned int cpu)
 	return 0;
 }
 
-int __init xen_evtchn_fifo_init(void)
+int xen_evtchn_fifo_init(void)
 {
 	int cpu = smp_processor_id();
 	int ret;
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 4969817124a8..92fc45075500 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -10,7 +10,7 @@
 #include <xen/interface/vcpu.h>
 
 DECLARE_PER_CPU(struct vcpu_info *, xen_vcpu);
-
+DECLARE_PER_CPU(struct vcpu_info, xen_vcpu_info);
 DECLARE_PER_CPU(uint32_t, xen_vcpu_id);
 static inline uint32_t xen_vcpu_nr(int cpu)
 {
-- 
2.11.0


* [PATCH RFC 18/39] x86/xen: make hypercall_page generic
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
  2019-02-20 20:15 ` [PATCH RFC 17/39] x86/xen: export vcpu_info and shared_info Joao Martins
@ 2019-02-20 20:15 ` Joao Martins
  2019-02-20 20:15 ` [PATCH RFC 19/39] xen/xenbus: xenbus uninit support Joao Martins
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Joao Martins, Stefano Stabellini,
	Radim Krčmář, x86, Ankur Arora, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky,
	Thomas Gleixner

From: Ankur Arora <ankur.a.arora@oracle.com>

Export hypercall_page as a generic interface which can be implemented
by other hypervisors. With this change, hypercall_page becomes a
pointer: it refers either to the newly introduced xen_hypercall_page,
which is seeded by Xen, or to a page filled in by a different
hypervisor.
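
As a purely hypothetical illustration (not part of this patch), a
different hypervisor's guest support code could install its own page
of 32-byte stubs; "acme_hypercall_page" and the init function below
are made-up names:

  /* Assumed to be a page of hypercall stubs provided elsewhere. */
  extern struct hypercall_entry acme_hypercall_page[128];

  static int __init acme_guest_init(void)
  {
          /*
           * After filling acme_hypercall_page[] with VMCALL/VMMCALL
           * stubs, point the generic hypercall machinery at it: every
           * HYPERVISOR_*() wrapper then does an indirect call through
           * hypercall_page + __HYPERVISOR_<op>.
           */
          hypercall_page = acme_hypercall_page;
          return 0;
  }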

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 arch/x86/include/asm/xen/hypercall.h | 12 +++++++-----
 arch/x86/xen/enlighten.c             |  1 +
 arch/x86/xen/enlighten_hvm.c         |  3 ++-
 arch/x86/xen/enlighten_pv.c          |  1 +
 arch/x86/xen/enlighten_pvh.c         |  3 ++-
 arch/x86/xen/xen-asm_32.S            |  2 +-
 arch/x86/xen/xen-asm_64.S            |  2 +-
 arch/x86/xen/xen-head.S              |  8 ++++----
 8 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index ef05bea7010d..1a3cd6680e6f 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -86,11 +86,13 @@ struct xen_dm_op_buf;
  * there aren't more than 5 arguments...)
  */
 
-extern struct { char _entry[32]; } hypercall_page[];
+struct hypercall_entry { char _entry[32]; };
+extern struct hypercall_entry xen_hypercall_page[128];
+extern struct hypercall_entry *hypercall_page;
 
-#define __HYPERCALL		"call hypercall_page+%c[offset]"
+#define __HYPERCALL	CALL_NOSPEC
 #define __HYPERCALL_ENTRY(x)						\
-	[offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0]))
+	[thunk_target] "0" (hypercall_page + __HYPERVISOR_##x)
 
 #ifdef CONFIG_X86_32
 #define __HYPERCALL_RETREG	"eax"
@@ -116,7 +118,7 @@ extern struct { char _entry[32]; } hypercall_page[];
 	register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
 	register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;
 
-#define __HYPERCALL_0PARAM	"=r" (__res), ASM_CALL_CONSTRAINT
+#define __HYPERCALL_0PARAM	"=&r" (__res), ASM_CALL_CONSTRAINT
 #define __HYPERCALL_1PARAM	__HYPERCALL_0PARAM, "+r" (__arg1)
 #define __HYPERCALL_2PARAM	__HYPERCALL_1PARAM, "+r" (__arg2)
 #define __HYPERCALL_3PARAM	__HYPERCALL_2PARAM, "+r" (__arg3)
@@ -208,7 +210,7 @@ xen_single_call(unsigned int call,
 
 	asm volatile(CALL_NOSPEC
 		     : __HYPERCALL_5PARAM
-		     : [thunk_target] "a" (&hypercall_page[call])
+		     : [thunk_target] "0" (hypercall_page + call)
 		     : __HYPERCALL_CLOBBER5);
 
 	return (long)__res;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 73b9736e89d2..b36a10e6b5d7 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -20,6 +20,7 @@
 #include "smp.h"
 #include "pmu.h"
 
+struct hypercall_entry *hypercall_page;
 EXPORT_SYMBOL_GPL(hypercall_page);
 
 /*
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 0e75642d42a3..40845e3e9a96 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -105,8 +105,9 @@ static void __init init_hvm_pv_info(void)
 
 		pv_info.name = "Xen HVM";
 		msr = cpuid_ebx(base + 2);
-		pfn = __pa(hypercall_page);
+		pfn = __pa(xen_hypercall_page);
 		wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+		hypercall_page = xen_hypercall_page;
 	}
 
 	xen_setup_features();
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index c54a493e139a..e1537713b57d 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1197,6 +1197,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
 	if (!xen_start_info)
 		return;
+	hypercall_page = xen_hypercall_page;
 
 	xen_domain_type = XEN_PV_DOMAIN;
 	xen_start_flags = xen_start_info->flags;
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 35b7599d2d0b..d57a8ad1769e 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -30,8 +30,9 @@ void __init xen_pvh_init(void)
 	xen_start_flags = pvh_start_info.flags;
 
 	msr = cpuid_ebx(xen_cpuid_base() + 2);
-	pfn = __pa(hypercall_page);
+	pfn = __pa(xen_hypercall_page);
 	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+	hypercall_page = xen_hypercall_page;
 }
 
 void __init mem_map_via_hcall(struct boot_params *boot_params_p)
diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index c15db060a242..ee4998055ea9 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -121,7 +121,7 @@ xen_iret_end_crit:
 
 hyper_iret:
 	/* put this out of line since its very rarely used */
-	jmp hypercall_page + __HYPERVISOR_iret * 32
+	jmp xen_hypercall_page + __HYPERVISOR_iret * 32
 
 	.globl xen_iret_start_crit, xen_iret_end_crit
 
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 1e9ef0ba30a5..2172d6aec9a3 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -70,7 +70,7 @@ ENTRY(xen_early_idt_handler_array)
 END(xen_early_idt_handler_array)
 	__FINIT
 
-hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
+hypercall_iret = xen_hypercall_page + __HYPERVISOR_iret * 32
 /*
  * Xen64 iret frame:
  *
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..7ff5437bd83f 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -58,18 +58,18 @@ END(startup_xen)
 
 .pushsection .text
 	.balign PAGE_SIZE
-ENTRY(hypercall_page)
+ENTRY(xen_hypercall_page)
 	.rept (PAGE_SIZE / 32)
 		UNWIND_HINT_EMPTY
 		.skip 32
 	.endr
 
 #define HYPERCALL(n) \
-	.equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \
+	.equ xen_hypercall_##n, xen_hypercall_page + __HYPERVISOR_##n * 32; \
 	.type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32
 #include <asm/xen-hypercalls.h>
 #undef HYPERCALL
-END(hypercall_page)
+END(xen_hypercall_page)
 .popsection
 
 	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz "linux")
@@ -85,7 +85,7 @@ END(hypercall_page)
 #ifdef CONFIG_XEN_PV
 	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          _ASM_PTR startup_xen)
 #endif
-	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
+	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR xen_hypercall_page)
 	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,
 		.ascii "!writable_page_tables|pae_pgdir_above_4gb")
 	ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES,
-- 
2.11.0


* [PATCH RFC 19/39] xen/xenbus: xenbus uninit support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
  2019-02-20 20:15 ` [PATCH RFC 17/39] x86/xen: export vcpu_info and shared_info Joao Martins
  2019-02-20 20:15 ` [PATCH RFC 18/39] x86/xen: make hypercall_page generic Joao Martins
@ 2019-02-20 20:15 ` Joao Martins
  2019-02-20 20:15 ` [PATCH RFC 20/39] xen-blkback: module_exit support Joao Martins
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	Ankur Arora, Paolo Bonzini, Boris Ostrovsky, Joao Martins

This allows reinitialization of xenbus, which is useful for
xen_shim_domain() support. Cleaning up xenbus state means cancelling
pending watch events, deleting all watches, closing the xenstore event
channel, and finally stopping the xenbus/xenwatch kthreads, alongside
unregistering /proc/xen.
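
For illustration, the intended pairing looks roughly like the sketch
below (this mirrors what the xen-shim module later in the series does;
the function names are placeholders):

  static int __init consumer_init(void)
  {
          /* Bring xenbus up outside the usual initcall path. */
          return xenbus_init();
  }

  static void __exit consumer_exit(void)
  {
          /*
           * Cancels pending watch events, deletes all watches, closes
           * the xenstore event channel, stops the xenbus/xenwatch
           * kthreads and removes /proc/xen.
           */
          xenbus_deinit();
  }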

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/xen/xenbus/xenbus.h        |  2 ++
 drivers/xen/xenbus/xenbus_client.c |  5 ++++
 drivers/xen/xenbus/xenbus_probe.c  | 51 +++++++++++++++++++++++++++++++++++---
 drivers/xen/xenbus/xenbus_xs.c     | 38 ++++++++++++++++++++++++++++
 4 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
index 092981171df1..e0e586d81d48 100644
--- a/drivers/xen/xenbus/xenbus.h
+++ b/drivers/xen/xenbus/xenbus.h
@@ -96,6 +96,7 @@ extern wait_queue_head_t xb_waitq;
 extern struct mutex xb_write_mutex;
 
 int xs_init(void);
+void xs_deinit(void);
 int xb_init_comms(void);
 void xb_deinit_comms(void);
 int xs_watch_msg(struct xs_watch_event *event);
@@ -129,6 +130,7 @@ int xenbus_read_otherend_details(struct xenbus_device *xendev,
 				 char *id_node, char *path_node);
 
 void xenbus_ring_ops_init(void);
+void xenbus_ring_ops_deinit(void);
 
 int xenbus_dev_request_and_reply(struct xsd_sockmsg *msg, void *par);
 void xenbus_dev_queue_reply(struct xb_req_data *req);
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index e17ca8156171..ada1c9aa6525 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -935,3 +935,8 @@ void __init xenbus_ring_ops_init(void)
 #endif
 		ring_ops = &ring_ops_hvm;
 }
+
+void xenbus_ring_ops_deinit(void)
+{
+	ring_ops = NULL;
+}
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..2e0ed46b05e7 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -741,6 +741,21 @@ static int __init xenstored_local_init(void)
 	return err;
 }
 
+static void xenstored_local_deinit(void)
+{
+	struct evtchn_close close;
+	void *page = NULL;
+
+	page = gfn_to_virt(xen_store_gfn);
+	free_page((unsigned long)page);
+
+	close.port = xen_store_evtchn;
+
+	HYPERVISOR_event_channel_op(EVTCHNOP_close, &close);
+
+	xen_store_evtchn = 0;
+}
+
 static int xenbus_resume_cb(struct notifier_block *nb,
 			    unsigned long action, void *data)
 {
@@ -765,7 +780,11 @@ static struct notifier_block xenbus_resume_nb = {
 	.notifier_call = xenbus_resume_cb,
 };
 
-static int __init xenbus_init(void)
+#ifdef CONFIG_XEN_COMPAT_XENFS
+struct proc_dir_entry *xen_procfs;
+#endif
+
+int xenbus_init(void)
 {
 	int err = 0;
 	uint64_t v = 0;
@@ -833,13 +852,39 @@ static int __init xenbus_init(void)
 	 * Create xenfs mountpoint in /proc for compatibility with
 	 * utilities that expect to find "xenbus" under "/proc/xen".
 	 */
-	proc_create_mount_point("xen");
+	xen_procfs = proc_create_mount_point("xen");
 #endif
 
 out_error:
 	return err;
 }
-
+EXPORT_SYMBOL_GPL(xenbus_init);
 postcore_initcall(xenbus_init);
 
+void xenbus_deinit(void)
+{
+	if (!xen_domain())
+		return;
+
+#ifdef CONFIG_XEN_COMPAT_XENFS
+	proc_remove(xen_procfs);
+	xen_procfs = NULL;
+#endif
+
+	xs_deinit();
+	xenstored_ready = 0;
+
+	switch (xen_store_domain_type) {
+	case XS_LOCAL:
+		xenstored_local_deinit();
+		xen_store_interface = NULL;
+		break;
+	default:
+		pr_warn("Xenstore state unknown\n");
+		break;
+	}
+	xenbus_ring_ops_deinit();
+}
+EXPORT_SYMBOL_GPL(xenbus_deinit);
+
 MODULE_LICENSE("GPL");
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 49a3874ae6bb..bd6db3703972 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -866,6 +866,7 @@ static int xenwatch_thread(void *unused)
 
 	for (;;) {
 		wait_event_interruptible(watch_events_waitq,
+					 kthread_should_stop() ||
 					 !list_empty(&watch_events));
 
 		if (kthread_should_stop())
@@ -917,6 +918,8 @@ static struct notifier_block xs_reboot_nb = {
 	.notifier_call = xs_reboot_notify,
 };
 
+static struct task_struct *xenwatch_task;
+
 int xs_init(void)
 {
 	int err;
@@ -932,9 +935,44 @@ int xs_init(void)
 	task = kthread_run(xenwatch_thread, NULL, "xenwatch");
 	if (IS_ERR(task))
 		return PTR_ERR(task);
+	xenwatch_task = task;
 
 	/* shutdown watches for kexec boot */
 	xs_reset_watches();
 
 	return 0;
 }
+
+void cancel_watches(void)
+{
+	struct xs_watch_event *event, *tmp;
+
+	/* Cancel pending watch events. */
+	spin_lock(&watch_events_lock);
+	list_for_each_entry_safe(event, tmp, &watch_events, list) {
+		list_del(&event->list);
+		kfree(event);
+	}
+	spin_unlock(&watch_events_lock);
+}
+
+void delete_watches(void)
+{
+	struct xenbus_watch *watch, *tmp;
+
+	spin_lock(&watches_lock);
+	list_for_each_entry_safe(watch, tmp, &watches, list) {
+		list_del(&watch->list);
+	}
+	spin_unlock(&watches_lock);
+}
+
+void xs_deinit(void)
+{
+	kthread_stop(xenwatch_task);
+	xenwatch_task = NULL;
+	xb_deinit_comms();
+	unregister_reboot_notifier(&xs_reboot_nb);
+	cancel_watches();
+	delete_watches();
+}
-- 
2.11.0


* [PATCH RFC 20/39] xen-blkback: module_exit support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (2 preceding siblings ...)
  2019-02-20 20:15 ` [PATCH RFC 19/39] xen/xenbus: xenbus uninit support Joao Martins
@ 2019-02-20 20:15 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 31/39] xen-shim: introduce shim domain driver Joao Martins
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Radim Krčmář, Ankur Arora,
	Paolo Bonzini, Boris Ostrovsky, Joao Martins,
	Roger Pau Monné


Implement module_exit to allow blkback to be unloaded. Module unload
is prevented while there are still interfaces allocated; in other
words, we do __module_get() in xen_blkif_alloc() and module_put() in
xen_blkif_free().
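
Viewed in isolation, the pattern is simply a per-interface module
reference (a simplified sketch of the hunks below; allocation and
teardown details are elided):

  static struct xen_blkif *xen_blkif_alloc(domid_t domid)
  {
          struct xen_blkif *blkif;

          /* ... allocate and initialize blkif ... */
          __module_get(THIS_MODULE);      /* pinned while blkif exists */
          return blkif;
  }

  static void xen_blkif_free(struct xen_blkif *blkif)
  {
          /* ... drain and free blkif ... */
          module_put(THIS_MODULE);        /* unload only succeeds once
                                             the last blkif is gone */
  }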

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |  8 ++++++++
 drivers/block/xen-blkback/common.h  |  2 ++
 drivers/block/xen-blkback/xenbus.c  | 14 ++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..d51d88be88e1 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1504,5 +1504,13 @@ static int __init xen_blkif_init(void)
 
 module_init(xen_blkif_init);
 
+static void __exit xen_blkif_exit(void)
+{
+	xen_blkif_interface_exit();
+	xen_blkif_xenbus_exit();
+}
+
+module_exit(xen_blkif_exit);
+
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("xen-backend:vbd");
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..3415c558e115 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -376,8 +376,10 @@ struct phys_req {
 	blkif_sector_t		sector_number;
 };
 int xen_blkif_interface_init(void);
+void xen_blkif_interface_exit(void);
 
 int xen_blkif_xenbus_init(void);
+void xen_blkif_xenbus_exit(void);
 
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4bc74e72c39..424e2efebe85 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -181,6 +181,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	init_completion(&blkif->drain_complete);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
+	__module_get(THIS_MODULE);
+
 	return blkif;
 }
 
@@ -328,6 +330,8 @@ static void xen_blkif_free(struct xen_blkif *blkif)
 
 	/* Make sure everything is drained before shutting down */
 	kmem_cache_free(xen_blkif_cachep, blkif);
+
+	module_put(THIS_MODULE);
 }
 
 int __init xen_blkif_interface_init(void)
@@ -341,6 +345,11 @@ int __init xen_blkif_interface_init(void)
 	return 0;
 }
 
+void xen_blkif_interface_exit(void)
+{
+	kmem_cache_destroy(xen_blkif_cachep);
+}
+
 /*
  *  sysfs interface for VBD I/O requests
  */
@@ -1115,3 +1124,8 @@ int xen_blkif_xenbus_init(void)
 {
 	return xenbus_register_backend(&xen_blkbk_driver);
 }
+
+void xen_blkif_xenbus_exit(void)
+{
+	xenbus_unregister_driver(&xen_blkbk_driver);
+}
-- 
2.11.0


* [PATCH RFC 31/39] xen-shim: introduce shim domain driver
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (3 preceding siblings ...)
  2019-02-20 20:15 ` [PATCH RFC 20/39] xen-blkback: module_exit support Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 32/39] xen/balloon: xen_shim_domain() support Joao Martins
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Joao Martins, Stefano Stabellini,
	Radim Krčmář, x86, Ankur Arora, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky,
	Thomas Gleixner

From: Ankur Arora <ankur.a.arora@oracle.com>

xen-shim.ko sets up and tears down state needed to support Xen
backends. The underlying primitives that are exposed are interdomain
event-channels and grant-table map/unmap/copy.

We set up the following:

  * Initialize shared_info and vcpu_info pages, essentially setting
    up event-channel state.
  * Set up features (this allows xen_feature() to work)
  * Initialize event-channel subsystem (select event ops and related
    setup.)
  * Initialize xenbus and tear it down on module exit.

This functionality will be used by the backend drivers (e.g. netback,
scsiback, blkback, etc.) in order to drive guest I/O.
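
As an illustration of how a backend module is expected to consume this
(a sketch with placeholder names; later patches in the series do
exactly this in gntdev, evtchn and blkback):

  static int __init example_backend_init(void)
  {
          /* Pin the shim module while this backend is loaded. */
          if (!xen_domain() && !xen_shim_domain_get())
                  return -ENODEV;

          /* ... register the backend ... */
          return 0;
  }

  static void __exit example_backend_exit(void)
  {
          /* ... unregister the backend ... */
          xen_shim_domain_put();
  }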

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 arch/x86/include/asm/kvm_host.h          |   2 +
 arch/x86/kvm/Kconfig                     |  10 +++
 arch/x86/kvm/Makefile                    |   1 +
 arch/x86/kvm/xen-shim.c                  | 107 +++++++++++++++++++++++++++++++
 arch/x86/xen/enlighten.c                 |  45 +++++++++++++
 drivers/xen/events/events_2l.c           |   2 +-
 drivers/xen/events/events_base.c         |   6 +-
 drivers/xen/features.c                   |   1 +
 drivers/xen/xenbus/xenbus_dev_frontend.c |   4 +-
 include/xen/xen.h                        |   5 ++
 include/xen/xenbus.h                     |   3 +
 11 files changed, 182 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kvm/xen-shim.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55609e919e14..6bdae8649d56 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -896,6 +896,8 @@ struct kvm_grant_table {
 /* Xen emulation context */
 struct kvm_xen {
 	u64 xen_hypercall;
+
+#define XEN_SHIM_DOMID	0
 	domid_t domid;
 
 	gfn_t shinfo_addr;
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 72fa955f4a15..47347df282dc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -96,6 +96,16 @@ config KVM_MMU_AUDIT
 	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
 	 auditing of KVM MMU events at runtime.
 
+config XEN_SHIM
+       tristate "Xen hypercall emulation shim"
+       depends on KVM
+       depends on XEN
+       default m
+       help
+	Shim to support Xen hypercalls on non-Xen hosts. It intercepts grant
+	table and event channel hypercalls the same way as the Xen hypervisor.
+	This is useful for having Xen backend drivers work on KVM.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source "drivers/vhost/Kconfig"
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index c1eaabbd0a54..a96a96a002a7 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,3 +18,4 @@ kvm-amd-y		+= svm.o pmu_amd.o
 obj-$(CONFIG_KVM)	+= kvm.o
 obj-$(CONFIG_KVM_INTEL)	+= kvm-intel.o
 obj-$(CONFIG_KVM_AMD)	+= kvm-amd.o
+obj-$(CONFIG_XEN_SHIM)  += xen-shim.o
diff --git a/arch/x86/kvm/xen-shim.c b/arch/x86/kvm/xen-shim.c
new file mode 100644
index 000000000000..61fdceb63ec2
--- /dev/null
+++ b/arch/x86/kvm/xen-shim.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved.
+ *
+ * Xen hypercall emulation shim
+ */
+
+#define pr_fmt(fmt) "KVM:" KBUILD_MODNAME ": " fmt
+
+#include <asm/kvm_host.h>
+
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
+#include <xen/events.h>
+#include <xen/xenbus.h>
+
+#define BITS_PER_EVTCHN_WORD (sizeof(xen_ulong_t)*8)
+
+static struct kvm_xen shim = { .domid = XEN_SHIM_DOMID };
+
+static void shim_evtchn_setup(struct shared_info *s)
+{
+	int cpu;
+
+	/* Point Xen's shared_info to the domain's sinfo page */
+	HYPERVISOR_shared_info = s;
+
+	/* Evtchns will be marked pending on allocation */
+	memset(s->evtchn_pending, 0, sizeof(s->evtchn_pending));
+	/* ... but we do mask all of them -- dom0 expects it. */
+	memset(s->evtchn_mask, 1, sizeof(s->evtchn_mask));
+
+	for_each_possible_cpu(cpu) {
+		struct vcpu_info *vcpu_info;
+		int i;
+
+		/* Direct CPU mapping as far as dom0 is concerned */
+		per_cpu(xen_vcpu_id, cpu) = cpu;
+
+		vcpu_info = &per_cpu(xen_vcpu_info, cpu);
+		memset(vcpu_info, 0, sizeof(*vcpu_info));
+
+		vcpu_info->evtchn_upcall_mask = 0;
+
+		vcpu_info->evtchn_upcall_pending = 0;
+		for (i = 0; i < BITS_PER_EVTCHN_WORD; i++)
+			clear_bit(i, &vcpu_info->evtchn_pending_sel);
+
+		per_cpu(xen_vcpu, cpu) = vcpu_info;
+	}
+}
+
+static int __init shim_register(void)
+{
+	struct shared_info *shinfo;
+
+	shinfo = (struct shared_info *)get_zeroed_page(GFP_KERNEL);
+	if (!shinfo) {
+		pr_err("Failed to allocate shared_info page\n");
+		return -ENOMEM;
+	}
+	shim.shinfo = shinfo;
+
+	idr_init(&shim.port_to_evt);
+	mutex_init(&shim.xen_lock);
+
+	kvm_xen_register_lcall(&shim);
+
+	/* We can handle hypercalls after this point */
+	xen_shim_domain = 1;
+
+	shim_evtchn_setup(shim.shinfo);
+
+	xen_setup_features();
+
+	xen_init_IRQ();
+
+	xenbus_init();
+
+	return 0;
+}
+
+static int __init shim_init(void)
+{
+	if (xen_domain())
+		return -ENODEV;
+
+	return shim_register();
+}
+
+static void __exit shim_exit(void)
+{
+	xenbus_deinit();
+	xen_shim_domain = 0;
+
+	kvm_xen_unregister_lcall();
+	HYPERVISOR_shared_info = NULL;
+	free_page((unsigned long) shim.shinfo);
+	shim.shinfo = NULL;
+}
+
+module_init(shim_init);
+module_exit(shim_exit)
+
+MODULE_AUTHOR("Ankur Arora <ankur.a.arora@oracle.com>,"
+	      "Joao Martins <joao.m.martins@oracle.com>");
+MODULE_LICENSE("GPL");
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index b36a10e6b5d7..8d9e93b6eb09 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -57,6 +57,9 @@ EXPORT_PER_CPU_SYMBOL(xen_vcpu_info);
 enum xen_domain_type xen_domain_type = XEN_NATIVE;
 EXPORT_SYMBOL_GPL(xen_domain_type);
 
+int xen_shim_domain;
+EXPORT_SYMBOL_GPL(xen_shim_domain);
+
 unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
 EXPORT_SYMBOL(machine_to_phys_mapping);
 unsigned long  machine_to_phys_nr;
@@ -349,3 +352,45 @@ void xen_arch_unregister_cpu(int num)
 }
 EXPORT_SYMBOL(xen_arch_unregister_cpu);
 #endif
+
+static struct module *find_module_shim(void)
+{
+	static const char name[] = "xen_shim";
+	struct module *module;
+
+	mutex_lock(&module_mutex);
+	module = find_module(name);
+	mutex_unlock(&module_mutex);
+
+	return module;
+}
+
+bool xen_shim_domain_get(void)
+{
+	struct module *shim;
+
+	if (!xen_shim_domain())
+		return false;
+
+	shim = find_module_shim();
+	if (!shim)
+		return false;
+
+	return try_module_get(shim);
+}
+EXPORT_SYMBOL(xen_shim_domain_get);
+
+void xen_shim_domain_put(void)
+{
+	struct module *shim;
+
+	if (!xen_shim_domain())
+		return;
+
+	shim = find_module_shim();
+	if (!shim)
+		return;
+
+	module_put(shim);
+}
+EXPORT_SYMBOL(xen_shim_domain_put);
diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index b5acf4b09971..f08d13a033c1 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -89,7 +89,7 @@ static void evtchn_2l_unmask(unsigned port)
 	unsigned int cpu = get_cpu();
 	int do_hypercall = 0, evtchn_pending = 0;
 
-	BUG_ON(!irqs_disabled());
+	WARN_ON(!irqs_disabled());
 
 	if (unlikely((cpu != cpu_from_evtchn(port))))
 		do_hypercall = 1;
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 117e76b2f939..a2087287c3b6 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1665,7 +1665,7 @@ void xen_callback_vector(void) {}
 static bool fifo_events = true;
 module_param(fifo_events, bool, 0);
 
-void __init xen_init_IRQ(void)
+void xen_init_IRQ(void)
 {
 	int ret = -EINVAL;
 	unsigned int evtchn;
@@ -1683,6 +1683,9 @@ void __init xen_init_IRQ(void)
 	for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++)
 		mask_evtchn(evtchn);
 
+	if (xen_shim_domain())
+		return;
+
 	pirq_needs_eoi = pirq_needs_eoi_flag;
 
 #ifdef CONFIG_X86
@@ -1714,3 +1717,4 @@ void __init xen_init_IRQ(void)
 	}
 #endif
 }
+EXPORT_SYMBOL_GPL(xen_init_IRQ);
diff --git a/drivers/xen/features.c b/drivers/xen/features.c
index d7d34fdfc993..1518c3b6f004 100644
--- a/drivers/xen/features.c
+++ b/drivers/xen/features.c
@@ -31,3 +31,4 @@ void xen_setup_features(void)
 			xen_features[i * 32 + j] = !!(fi.submap & 1<<j);
 	}
 }
+EXPORT_SYMBOL_GPL(xen_setup_features);
diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
index c3e201025ef0..a4080d04a01c 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -680,7 +680,7 @@ static struct miscdevice xenbus_dev = {
 	.fops = &xen_xenbus_fops,
 };
 
-static int __init xenbus_init(void)
+static int __init xenbus_frontend_init(void)
 {
 	int err;
 
@@ -692,4 +692,4 @@ static int __init xenbus_init(void)
 		pr_err("Could not register xenbus frontend device\n");
 	return err;
 }
-device_initcall(xenbus_init);
+device_initcall(xenbus_frontend_init);
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 0e2156786ad2..04dfa99e67eb 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -10,8 +10,12 @@ enum xen_domain_type {
 
 #ifdef CONFIG_XEN
 extern enum xen_domain_type xen_domain_type;
+extern int xen_shim_domain;
+extern bool xen_shim_domain_get(void);
+extern void xen_shim_domain_put(void);
 #else
 #define xen_domain_type		XEN_NATIVE
+#define xen_shim_domain      0
 #endif
 
 #ifdef CONFIG_XEN_PVH
@@ -24,6 +28,7 @@ extern bool xen_pvh;
 #define xen_pv_domain()		(xen_domain_type == XEN_PV_DOMAIN)
 #define xen_hvm_domain()	(xen_domain_type == XEN_HVM_DOMAIN)
 #define xen_pvh_domain()	(xen_pvh)
+#define xen_shim_domain()	(!xen_domain() && xen_shim_domain)
 
 #include <linux/types.h>
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..d2789e7d2055 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -233,4 +233,7 @@ extern const struct file_operations xen_xenbus_fops;
 extern struct xenstore_domain_interface *xen_store_interface;
 extern int xen_store_evtchn;
 
+int xenbus_init(void);
+void xenbus_deinit(void);
+
 #endif /* _XEN_XENBUS_H */
-- 
2.11.0


* [PATCH RFC 32/39] xen/balloon: xen_shim_domain() support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (4 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 31/39] xen-shim: introduce shim domain driver Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 33/39] xen/grant-table: " Joao Martins
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Joao Martins, Stefano Stabellini,
	Radim Krčmář, x86, Ankur Arora, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky,
	Thomas Gleixner

Xen ballooning uses hollow struct pages (with the underlying PFNs being
populated/unpopulated via hypercalls) which are used by the grant logic
to map grants from other domains.

For the purposes of a KVM-based xen-shim, this model is not useful --
mapping is unnecessary since all guest memory is already mapped in the
KVM host. The simplest option is to just translate grant references to
GPAs (essentially a get_page() on the appropriate GPA).

This patch provides an alternate balloon allocation mechanism in which
the allocation path just hands out a constant struct page
(corresponding to page 0). This allows the calling code -- which does a
page_to_pfn() on the returned struct page -- to remain unchanged before
doing the grant operation (which in this case fills in the real
struct page).
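
A minimal sketch of the calling convention this preserves (illustrative
only; the function name is a placeholder and error handling is
abbreviated):

  static int example_placeholder_alloc(void)
  {
          struct page *pages[1];

          /* In shim mode this hands back pfn_to_page(0) rather than a
           * ballooned-out page. */
          if (alloc_xenballooned_pages(1, pages))
                  return -ENOMEM;

          /*
           * The caller's page_to_pfn(pages[0]) still works on the
           * placeholder (it yields 0); the real struct page is only
           * filled in once GNTTABOP_map_grant_ref has run (see the
           * grant-table patch later in the series).
           */
          return page_to_pfn(pages[0]);   /* 0 in shim mode */
  }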

Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 arch/x86/kvm/xen-shim.c | 31 +++++++++++++++++++++++++++++++
 drivers/xen/balloon.c   | 15 ++++++++++++++-
 include/xen/balloon.h   |  7 +++++++
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/xen-shim.c b/arch/x86/kvm/xen-shim.c
index 61fdceb63ec2..4086d92a4bfb 100644
--- a/arch/x86/kvm/xen-shim.c
+++ b/arch/x86/kvm/xen-shim.c
@@ -13,11 +13,40 @@
 #include <xen/xen-ops.h>
 #include <xen/events.h>
 #include <xen/xenbus.h>
+#include <xen/balloon.h>
 
 #define BITS_PER_EVTCHN_WORD (sizeof(xen_ulong_t)*8)
 
 static struct kvm_xen shim = { .domid = XEN_SHIM_DOMID };
 
+static int shim_alloc_pages(int nr_pages, struct page **pages)
+{
+	int i;
+
+	/*
+	 * We provide page 0 instead of NULL because we'll effectively
+	 * do the inverse operation while deriving the pfn to pass to
+	 * xen for mapping.
+	 */
+	for (i = 0; i < nr_pages; i++)
+		pages[i] = pfn_to_page(0);
+
+	return 0;
+}
+
+static void shim_free_pages(int nr_pages, struct page **pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; i++)
+		pages[i] = NULL;
+}
+
+static struct xen_balloon_ops shim_balloon_ops = {
+	.alloc_pages = shim_alloc_pages,
+	.free_pages = shim_free_pages,
+};
+
 static void shim_evtchn_setup(struct shared_info *s)
 {
 	int cpu;
@@ -65,6 +94,7 @@ static int __init shim_register(void)
 	mutex_init(&shim.xen_lock);
 
 	kvm_xen_register_lcall(&shim);
+	xen_balloon_ops = &shim_balloon_ops;
 
 	/* We can handle hypercalls after this point */
 	xen_shim_domain = 1;
@@ -94,6 +124,7 @@ static void __exit shim_exit(void)
 	xen_shim_domain = 0;
 
 	kvm_xen_unregister_lcall();
+	xen_balloon_ops = NULL;
 	HYPERVISOR_shared_info = NULL;
 	free_page((unsigned long) shim.shinfo);
 	shim.shinfo = NULL;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048de9a7..00375fa6c122 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -138,7 +138,7 @@ enum bp_state {
 
 static DEFINE_MUTEX(balloon_mutex);
 
-struct balloon_stats balloon_stats;
+struct balloon_stats balloon_stats __read_mostly;
 EXPORT_SYMBOL_GPL(balloon_stats);
 
 /* We increase/decrease in batches which fit in a page */
@@ -158,6 +158,9 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
 #define GFP_BALLOON \
 	(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
 
+struct xen_balloon_ops *xen_balloon_ops;
+EXPORT_SYMBOL(xen_balloon_ops);
+
 /* balloon_append: add the given page to the balloon. */
 static void __balloon_append(struct page *page)
 {
@@ -589,6 +592,11 @@ int alloc_xenballooned_pages(int nr_pages, struct page **pages)
 	struct page *page;
 	int ret;
 
+	if (xen_shim_domain() && xen_balloon_ops)
+		return xen_balloon_ops->alloc_pages(nr_pages, pages);
+
+	WARN_ON_ONCE(xen_shim_domain());
+
 	mutex_lock(&balloon_mutex);
 
 	balloon_stats.target_unpopulated += nr_pages;
@@ -634,6 +642,11 @@ void free_xenballooned_pages(int nr_pages, struct page **pages)
 {
 	int i;
 
+	if (xen_shim_domain() && xen_balloon_ops)
+		return xen_balloon_ops->free_pages(nr_pages, pages);
+
+	WARN_ON_ONCE(xen_shim_domain());
+
 	mutex_lock(&balloon_mutex);
 
 	for (i = 0; i < nr_pages; i++) {
diff --git a/include/xen/balloon.h b/include/xen/balloon.h
index 4914b93a23f2..9ba6a7e91d5e 100644
--- a/include/xen/balloon.h
+++ b/include/xen/balloon.h
@@ -22,6 +22,13 @@ struct balloon_stats {
 
 extern struct balloon_stats balloon_stats;
 
+struct xen_balloon_ops {
+	int (*alloc_pages)(int nr_pages, struct page **pages);
+	void (*free_pages)(int nr_pages, struct page **pages);
+};
+
+extern struct xen_balloon_ops *xen_balloon_ops;
+
 void balloon_set_new_target(unsigned long target);
 
 int alloc_xenballooned_pages(int nr_pages, struct page **pages);
-- 
2.11.0


* [PATCH RFC 33/39] xen/grant-table: xen_shim_domain() support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (5 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 32/39] xen/balloon: xen_shim_domain() support Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 34/39] xen/gntdev: " Joao Martins
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	Ankur Arora, Paolo Bonzini, Boris Ostrovsky, Joao Martins

With xen-shim, alloc_xenballooned_pages() only allocates a placeholder
page (pfn 0), expecting a subsequent GNTTABOP_map_grant_ref to fix it
up.

However, this means that, until the grant operation
(GNTTABOP_map_grant_ref) provides a valid page, we cannot set
PagePrivate or save any state.

This patch elides the setting of that state if xen_shim_domain(). In
addition, gnttab_map_refs() now fills in the appropriate page returned
from the grant operation.
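
Putting the pieces together, a shim-domain user of the API ends up with
roughly the following round trip (a sketch only: 'gref' and 'otherend'
are placeholders, host_addr handling follows the gntdev/xenbus patches
later in the series, and all error checking is omitted):

  static void example_grant_round_trip(grant_ref_t gref, domid_t otherend)
  {
          struct gnttab_map_grant_ref map;
          struct gnttab_unmap_grant_ref unmap;
          struct page *pages[1];

          gnttab_alloc_pages(1, pages);   /* placeholder page (pfn 0) */

          /* No GNTMAP_host_map in shim mode; host_addr is treated as
           * an OUT parameter by the shim's map hypercall. */
          gnttab_set_map_op(&map, 0, 0, gref, otherend);
          gnttab_map_refs(&map, NULL, pages, 1);

          /* pages[0] now refers to the real page backing the grant. */

          gnttab_set_unmap_op(&unmap, map.host_addr, 0, map.handle);
          gnttab_unmap_refs(&unmap, NULL, pages, 1);

          gnttab_free_pages(1, pages);
  }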

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/xen/grant-table.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7ea6fb6a2e5d..ab05b70d98bb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -804,7 +804,7 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
 	int ret;
 
 	ret = alloc_xenballooned_pages(nr_pages, pages);
-	if (ret < 0)
+	if (ret < 0 || xen_shim_domain())
 		return ret;
 
 	ret = gnttab_pages_set_private(nr_pages, pages);
@@ -1045,6 +1045,11 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		{
 			struct xen_page_foreign *foreign;
 
+			if (xen_shim_domain()) {
+				pages[i] = virt_to_page(map_ops[i].host_addr);
+				continue;
+			}
+
 			SetPageForeign(pages[i]);
 			foreign = xen_page_foreign(pages[i]);
 			foreign->domid = map_ops[i].dom;
@@ -1085,8 +1090,10 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 	if (ret)
 		return ret;
 
-	for (i = 0; i < count; i++)
-		ClearPageForeign(pages[i]);
+	for (i = 0; i < count; i++) {
+		if (!xen_shim_domain())
+			ClearPageForeign(pages[i]);
+	}
 
 	return clear_foreign_p2m_mapping(unmap_ops, kunmap_ops, pages, count);
 }
@@ -1113,7 +1120,7 @@ static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
 	int pc;
 
 	for (pc = 0; pc < item->count; pc++) {
-		if (page_count(item->pages[pc]) > 1) {
+		if (page_count(item->pages[pc]) > 1 && !xen_shim_domain()) {
 			unsigned long delay = GNTTAB_UNMAP_REFS_DELAY * (item->age + 1);
 			schedule_delayed_work(&item->gnttab_work,
 					      msecs_to_jiffies(delay));
-- 
2.11.0


* [PATCH RFC 34/39] xen/gntdev: xen_shim_domain() support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (6 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 33/39] xen/grant-table: " Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 35/39] xen/xenbus: " Joao Martins
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	Ankur Arora, Paolo Bonzini, Boris Ostrovsky, Joao Martins

From: Ankur Arora <ankur.a.arora@oracle.com>

GNTTABOP_map_grant_ref treats host_addr as an OUT parameter for
xen_shim_domain().

Accordingly, it is copied into struct gnttab_unmap_grant_ref before it
gets used via GNTTABOP_unmap_grant_ref.

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/xen/gntdev.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5efc5eee9544..8540a51f7597 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -351,6 +351,8 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
 		}
 
 		map->unmap_ops[i].handle = map->map_ops[i].handle;
+		if (xen_shim_domain())
+			map->unmap_ops[i].host_addr = map->map_ops[i].host_addr;
 		if (use_ptemod)
 			map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
 #ifdef CONFIG_XEN_GRANT_DMA_ALLOC
@@ -1122,7 +1124,9 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 				(map->flags & GNTMAP_readonly))
 			goto out_unlock_put;
 	} else {
-		map->flags = GNTMAP_host_map;
+		map->flags = 0;
+		if (!xen_shim_domain())
+			map->flags = GNTMAP_host_map;
 		if (!(vma->vm_flags & VM_WRITE))
 			map->flags |= GNTMAP_readonly;
 	}
@@ -1207,7 +1211,7 @@ static int __init gntdev_init(void)
 {
 	int err;
 
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain_get())
 		return -ENODEV;
 
 	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
@@ -1215,6 +1219,7 @@ static int __init gntdev_init(void)
 	err = misc_register(&gntdev_miscdev);
 	if (err != 0) {
 		pr_err("Could not register gntdev device\n");
+		xen_shim_domain_put();
 		return err;
 	}
 	return 0;
@@ -1223,6 +1228,7 @@ static int __init gntdev_init(void)
 static void __exit gntdev_exit(void)
 {
 	misc_deregister(&gntdev_miscdev);
+	xen_shim_domain_put();
 }
 
 module_init(gntdev_init);
-- 
2.11.0


* [PATCH RFC 35/39] xen/xenbus: xen_shim_domain() support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (7 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 34/39] xen/gntdev: " Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 36/39] drivers/xen: " Joao Martins
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	Ankur Arora, Paolo Bonzini, Boris Ostrovsky, Joao Martins

From: Ankur Arora <ankur.a.arora@oracle.com>

Fix up the gnttab unmap_ops (and other data structures) to handle
host_addr as an OUT parameter from the call to GNTTABOP_map_grant_ref.

Also, allow xenstored to be hosted in XS_LOCAL mode for
xen_shim_domain() -- this means that it does not need to acquire the
xenstore event channel and pfn externally.
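
In shim mode the unmap is therefore built from the address the map
operation returned, roughly as follows (a condensed view of the hunks
below):

  /* addrs[i] was overwritten with map[i].host_addr after mapping, so
   * the unmap targets the real page; GNTMAP_host_map is not used. */
  gnttab_set_unmap_op(&unmap[j], (phys_addr_t)addrs[i],
                      xen_shim_domain() ? 0 : GNTMAP_host_map,
                      handles[i]);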

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/xen/xenbus/xenbus_client.c       | 23 +++++++++++++++++------
 drivers/xen/xenbus/xenbus_dev_backend.c  |  4 ++--
 drivers/xen/xenbus/xenbus_dev_frontend.c |  4 ++--
 drivers/xen/xenbus/xenbus_probe.c        |  8 ++++----
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index ada1c9aa6525..b22149a789d4 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -487,8 +487,11 @@ static int __xenbus_map_ring(struct xenbus_device *dev,
 					 "mapping in shared page %d from domain %d",
 					 gnt_refs[i], dev->otherend_id);
 			goto fail;
-		} else
+		} else {
 			handles[i] = map[i].handle;
+			if (xen_shim_domain())
+				addrs[i] = map[i].host_addr;
+		}
 	}
 
 	return GNTST_okay;
@@ -498,7 +501,8 @@ static int __xenbus_map_ring(struct xenbus_device *dev,
 		if (handles[i] != INVALID_GRANT_HANDLE) {
 			memset(&unmap[j], 0, sizeof(unmap[j]));
 			gnttab_set_unmap_op(&unmap[j], (phys_addr_t)addrs[i],
-					    GNTMAP_host_map, handles[i]);
+					    !xen_shim_domain()?GNTMAP_host_map:0,
+					    handles[i]);
 			j++;
 		}
 	}
@@ -546,7 +550,7 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
 				      void **vaddr)
 {
 	struct xenbus_map_node *node;
-	int err;
+	int i, err;
 	void *addr;
 	bool leaked = false;
 	struct map_ring_valloc_hvm info = {
@@ -572,9 +576,16 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
 			     &info);
 
 	err = __xenbus_map_ring(dev, gnt_ref, nr_grefs, node->handles,
-				info.phys_addrs, GNTMAP_host_map, &leaked);
+				info.phys_addrs,
+				!xen_shim_domain() ? GNTMAP_host_map : 0,
+				&leaked);
 	node->nr_handles = nr_grefs;
 
+	if (xen_shim_domain()) {
+		for (i = 0; i < nr_grefs; i++)
+			node->hvm.pages[i] = virt_to_page(info.phys_addrs[i]);
+	}
+
 	if (err)
 		goto out_free_ballooned_pages;
 
@@ -882,7 +893,7 @@ int xenbus_unmap_ring(struct xenbus_device *dev,
 
 	for (i = 0; i < nr_handles; i++)
 		gnttab_set_unmap_op(&unmap[i], vaddrs[i],
-				    GNTMAP_host_map, handles[i]);
+				    !xen_shim_domain()?GNTMAP_host_map:0, handles[i]);
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap, i))
 		BUG();
@@ -926,7 +937,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
 	.unmap = xenbus_unmap_ring_vfree_hvm,
 };
 
-void __init xenbus_ring_ops_init(void)
+void xenbus_ring_ops_init(void)
 {
 #ifdef CONFIG_XEN_PV
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
diff --git a/drivers/xen/xenbus/xenbus_dev_backend.c b/drivers/xen/xenbus/xenbus_dev_backend.c
index edba5fecde4d..b605c87bff76 100644
--- a/drivers/xen/xenbus/xenbus_dev_backend.c
+++ b/drivers/xen/xenbus/xenbus_dev_backend.c
@@ -119,11 +119,11 @@ static struct miscdevice xenbus_backend_dev = {
 	.fops = &xenbus_backend_fops,
 };
 
-static int __init xenbus_backend_init(void)
+int xenbus_backend_init(void)
 {
 	int err;
 
-	if (!xen_initial_domain())
+	if (!xen_initial_domain() && !xen_shim_domain())
 		return -ENODEV;
 
 	err = misc_register(&xenbus_backend_dev);
diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
index a4080d04a01c..c6fca6cca6c8 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -680,11 +680,11 @@ static struct miscdevice xenbus_dev = {
 	.fops = &xen_xenbus_fops,
 };
 
-static int __init xenbus_frontend_init(void)
+static int xenbus_frontend_init(void)
 {
 	int err;
 
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain())
 		return -ENODEV;
 
 	err = misc_register(&xenbus_dev);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 2e0ed46b05e7..bbc405cd01ef 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -693,7 +693,7 @@ EXPORT_SYMBOL_GPL(xenbus_probe);
 
 static int __init xenbus_probe_initcall(void)
 {
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain())
 		return -ENODEV;
 
 	if (xen_initial_domain() || xen_hvm_domain())
@@ -790,7 +790,7 @@ int xenbus_init(void)
 	uint64_t v = 0;
 	xen_store_domain_type = XS_UNKNOWN;
 
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain())
 		return -ENODEV;
 
 	xenbus_ring_ops_init();
@@ -799,7 +799,7 @@ int xenbus_init(void)
 		xen_store_domain_type = XS_PV;
 	if (xen_hvm_domain())
 		xen_store_domain_type = XS_HVM;
-	if (xen_hvm_domain() && xen_initial_domain())
+	if ((xen_hvm_domain() && xen_initial_domain()) || xen_shim_domain())
 		xen_store_domain_type = XS_LOCAL;
 	if (xen_pv_domain() && !xen_start_info->store_evtchn)
 		xen_store_domain_type = XS_LOCAL;
@@ -863,7 +863,7 @@ postcore_initcall(xenbus_init);
 
 void xenbus_deinit(void)
 {
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain())
 		return;
 
 #ifdef CONFIG_XEN_COMPAT_XENFS
-- 
2.11.0


* [PATCH RFC 36/39] drivers/xen: xen_shim_domain() support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (8 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 35/39] xen/xenbus: " Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 20:16 ` [PATCH RFC 38/39] xen-blkback: " Joao Martins
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	Ankur Arora, Paolo Bonzini, Boris Ostrovsky, Joao Martins

Enable /dev/xen/{gntdev,evtchn} and /proc/xen/ for xen_shim_domain().

These interfaces will be used by xenstored to initialize its
event channel port and the kva used to communicate with the
xenbus driver.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/xen/evtchn.c           | 4 +++-
 drivers/xen/privcmd.c          | 5 ++++-
 drivers/xen/xenbus/xenbus_xs.c | 2 +-
 drivers/xen/xenfs/super.c      | 7 ++++---
 drivers/xen/xenfs/xensyms.c    | 4 ++++
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6d1a5e58968f..367390eb5883 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -708,13 +708,14 @@ static int __init evtchn_init(void)
 {
 	int err;
 
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain_get())
 		return -ENODEV;
 
 	/* Create '/dev/xen/evtchn'. */
 	err = misc_register(&evtchn_miscdev);
 	if (err != 0) {
 		pr_err("Could not register /dev/xen/evtchn\n");
+		xen_shim_domain_put();
 		return err;
 	}
 
@@ -726,6 +727,7 @@ static int __init evtchn_init(void)
 static void __exit evtchn_cleanup(void)
 {
 	misc_deregister(&evtchn_miscdev);
+	xen_shim_domain_put();
 }
 
 module_init(evtchn_init);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index b24ddac1604b..a063ad5c8260 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -999,12 +999,13 @@ static int __init privcmd_init(void)
 {
 	int err;
 
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain_get())
 		return -ENODEV;
 
 	err = misc_register(&privcmd_dev);
 	if (err != 0) {
 		pr_err("Could not register Xen privcmd device\n");
+		xen_shim_domain_put();
 		return err;
 	}
 
@@ -1012,6 +1013,7 @@ static int __init privcmd_init(void)
 	if (err != 0) {
 		pr_err("Could not register Xen hypercall-buf device\n");
 		misc_deregister(&privcmd_dev);
+		xen_shim_domain_put();
 		return err;
 	}
 
@@ -1022,6 +1024,7 @@ static void __exit privcmd_exit(void)
 {
 	misc_deregister(&privcmd_dev);
 	misc_deregister(&xen_privcmdbuf_dev);
+	xen_shim_domain_put();
 }
 
 module_init(privcmd_init);
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index bd6db3703972..062c0a5fa415 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -735,7 +735,7 @@ static void xs_reset_watches(void)
 {
 	int err;
 
-	if (!xen_hvm_domain() || xen_initial_domain())
+	if (!xen_hvm_domain() || xen_initial_domain() || xen_shim_domain())
 		return;
 
 	if (xen_strict_xenbus_quirk())
diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 71ddfb4cf61c..2ef326645880 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -64,7 +64,8 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 	};
 
 	return simple_fill_super(sb, XENFS_SUPER_MAGIC,
-			xen_initial_domain() ? xenfs_init_files : xenfs_files);
+			xen_initial_domain() || xen_shim_domain() ?
+				xenfs_init_files : xenfs_files);
 }
 
 static struct dentry *xenfs_mount(struct file_system_type *fs_type,
@@ -84,7 +85,7 @@ MODULE_ALIAS_FS("xenfs");
 
 static int __init xenfs_init(void)
 {
-	if (xen_domain())
+	if (xen_domain() || xen_shim_domain())
 		return register_filesystem(&xenfs_type);
 
 	return 0;
@@ -92,7 +93,7 @@ static int __init xenfs_init(void)
 
 static void __exit xenfs_exit(void)
 {
-	if (xen_domain())
+	if (xen_domain() || xen_shim_domain())
 		unregister_filesystem(&xenfs_type);
 }
 
diff --git a/drivers/xen/xenfs/xensyms.c b/drivers/xen/xenfs/xensyms.c
index c6c73a33c44d..79bccb53d344 100644
--- a/drivers/xen/xenfs/xensyms.c
+++ b/drivers/xen/xenfs/xensyms.c
@@ -8,6 +8,7 @@
 #include <xen/interface/platform.h>
 #include <asm/xen/hypercall.h>
 #include <xen/xen-ops.h>
+#include <xen/xen.h>
 #include "xenfs.h"
 
 
@@ -114,6 +115,9 @@ static int xensyms_open(struct inode *inode, struct file *file)
 	struct xensyms *xs;
 	int ret;
 
+	if (xen_shim_domain())
+		return -EINVAL;
+
 	ret = seq_open_private(file, &xensyms_seq_ops,
 			       sizeof(struct xensyms));
 	if (ret)
-- 
2.11.0


* [PATCH RFC 38/39] xen-blkback: xen_shim_domain() support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (9 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 36/39] drivers/xen: " Joao Martins
@ 2019-02-20 20:16 ` Joao Martins
  2019-02-20 21:09 ` [PATCH RFC 00/39] x86/KVM: Xen HVM guest support Paolo Bonzini
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-20 20:16 UTC (permalink / raw)
  To: kvm, linux-kernel, xen-devel
  Cc: Konrad Rzeszutek Wilk, Radim Krčmář, Ankur Arora,
	Paolo Bonzini, Boris Ostrovsky, Joao Martins,
	Roger Pau Monné

With xen_shim_domain() enabled, the struct pages used for grant mappings
are only valid in the interval between GNTTABOP_map_grant_ref and
GNTTABOP_unmap_grant_ref.

Ensure that we do not cache the struct page outside that duration.

Also, update the struct page for the segment after
GNTTABOP_map_grant_ref, for subsequent tracking and when doing I/O.

In addition, we disable persistent grants for this configuration, since
the map/unmap overhead is fairly small (in the fast path: a page
lookup, get_page() and put_page()). This reduces the memory usage in
blkback/blkfront.
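
Stated as code, the lifetime rule this enforces is roughly the
following (a sketch; the surrounding loops, indices and async unmap
details are elided):

  gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map);
  pages[seg_idx]->page = pages_to_gnt[new_map_idx];  /* refresh per map */

  /* ... submit the bio / do the I/O using pages[seg_idx]->page ... */

  gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
  /* pages[seg_idx]->page must not be used past this point; the backing
   * pages are simply freed and reallocated for the next request. */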

Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 drivers/block/xen-blkback/blkback.c | 19 ++++++++++++++++---
 drivers/block/xen-blkback/xenbus.c  |  5 +++--
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index d51d88be88e1..20c15e6787b1 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -167,6 +167,15 @@ static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **pag
 	unsigned long flags;
 	int i;
 
+	/*
+	 * We don't own the struct page after unmap has been called.
+	 * Reallocation is cheap anyway for the shim domain case, so free it.
+	 */
+	if (xen_shim_domain()) {
+		gnttab_free_pages(num, page);
+		return;
+	}
+
 	spin_lock_irqsave(&ring->free_pages_lock, flags);
 	for (i = 0; i < num; i++)
 		list_add(&page[i]->lru, &ring->free_pages);
@@ -825,7 +834,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 	 */
 again:
 	for (i = map_until; i < num; i++) {
-		uint32_t flags;
+		uint32_t flags = 0;
 
 		if (use_persistent_gnts) {
 			persistent_gnt = get_persistent_gnt(
@@ -846,7 +855,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 			addr = vaddr(pages[i]->page);
 			pages_to_gnt[segs_to_map] = pages[i]->page;
 			pages[i]->persistent_gnt = NULL;
-			flags = GNTMAP_host_map;
+			if (!xen_shim_domain())
+				flags = GNTMAP_host_map;
 			if (!use_persistent_gnts && ro)
 				flags |= GNTMAP_readonly;
 			gnttab_set_map_op(&map[segs_to_map++], addr,
@@ -880,6 +890,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 				goto next;
 			}
 			pages[seg_idx]->handle = map[new_map_idx].handle;
+			if (xen_shim_domain())
+				pages[seg_idx]->page = pages_to_gnt[new_map_idx];
 		} else {
 			continue;
 		}
@@ -1478,7 +1490,7 @@ static int __init xen_blkif_init(void)
 {
 	int rc = 0;
 
-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain_get())
 		return -ENODEV;
 
 	if (xen_blkif_max_ring_order > XENBUS_MAX_RING_GRANT_ORDER) {
@@ -1508,6 +1520,7 @@ static void __exit xen_blkif_exit(void)
 {
 	xen_blkif_interface_exit();
 	xen_blkif_xenbus_exit();
+	xen_shim_domain_put();
 }
 
 module_exit(xen_blkif_exit);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 424e2efebe85..11be0c052b77 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -871,7 +871,8 @@ static void connect(struct backend_info *be)
 
 	xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
-	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
+	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
+			    !xen_shim_domain());
 	if (err) {
 		xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
 				 dev->nodename);
@@ -1059,7 +1060,7 @@ static int connect_ring(struct backend_info *be)
 	}
 	pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
 					   0);
-	be->blkif->vbd.feature_gnt_persistent = pers_grants;
+	be->blkif->vbd.feature_gnt_persistent = pers_grants && !xen_shim_domain();
 	be->blkif->vbd.overflow_max_grants = 0;
 
 	/*
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (10 preceding siblings ...)
  2019-02-20 20:16 ` [PATCH RFC 38/39] xen-blkback: " Joao Martins
@ 2019-02-20 21:09 ` Paolo Bonzini
  2019-02-20 23:39 ` Marek Marczykowski-Górecki
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2019-02-20 21:09 UTC (permalink / raw)
  To: Joao Martins, kvm, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	x86, Ankur Arora, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	xen-devel, Boris Ostrovsky, Thomas Gleixner

On 20/02/19 21:15, Joao Martins wrote:
>  2. PV Driver support (patches 17 - 39)
> 
>  We start by redirecting hypercalls from the backend to routines
>  which emulate the behaviour that PV backends expect i.e. grant
>  table and interdomain events. Next, we add support for late
>  initialization of xenbus, followed by implementing
>  frontend/backend communication mechanisms (i.e. grant tables and
>  interdomain event channels). Finally, introduce xen-shim.ko,
>  which will setup a limited Xen environment. This uses the added
>  functionality of Xen specific shared memory (grant tables) and
>  notifications (event channels).

I am a bit worried by the last patches, they seem really brittle and
prone to breakage.  I don't know Xen well enough to understand if the
lack of support for GNTMAP_host_map is fixable, but if not, you have to
define a completely different hypercall.

Of course, tests are missing.  You should use the
tools/testing/selftests/kvm/ framework, and ideally each patch should
come with coverage for the newly-added code.

Thanks,

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
                   ` (11 preceding siblings ...)
  2019-02-20 21:09 ` [PATCH RFC 00/39] x86/KVM: Xen HVM guest support Paolo Bonzini
@ 2019-02-20 23:39 ` Marek Marczykowski-Górecki
       [not found] ` <35051310-c497-8ad5-4434-1b8426a317d2@redhat.com>
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-02-20 23:39 UTC (permalink / raw)
  To: Joao Martins
  Cc: Juergen Gross, Stefano Stabellini, kvm,
	Radim Krčmář, x86, linux-kernel, Ankur Arora,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	xen-devel, Boris Ostrovsky, Thomas Gleixner



On Wed, Feb 20, 2019 at 08:15:30PM +0000, Joao Martins wrote:
>  2. PV Driver support (patches 17 - 39)
> 
>  We start by redirecting hypercalls from the backend to routines
>  which emulate the behaviour that PV backends expect i.e. grant
>  table and interdomain events. Next, we add support for late
>  initialization of xenbus, followed by implementing
>  frontend/backend communication mechanisms (i.e. grant tables and
>  interdomain event channels). Finally, introduce xen-shim.ko,
>  which will setup a limited Xen environment. This uses the added
>  functionality of Xen specific shared memory (grant tables) and
>  notifications (event channels).

Does it mean backends could be run in another guest, similarly as on
real Xen? AFAIK virtio doesn't allow that as virtio backends need
arbitrary write access to guest memory. But grant tables provide enough
abstraction to do that safely.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] ` <35051310-c497-8ad5-4434-1b8426a317d2@redhat.com>
@ 2019-02-21  0:29   ` Ankur Arora
  2019-02-21 11:45   ` Joao Martins
       [not found]   ` <8b1f4912-4f92-69ae-ae01-d899d5640572@oracle.com>
  2 siblings, 0 replies; 43+ messages in thread
From: Ankur Arora @ 2019-02-21  0:29 UTC (permalink / raw)
  To: Paolo Bonzini, Joao Martins, kvm, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Radim Krčmář,
	x86, Ingo Molnar, Borislav Petkov, H. Peter Anvin, xen-devel,
	Boris Ostrovsky, Thomas Gleixner

On 2/20/19 1:09 PM, Paolo Bonzini wrote:
> On 20/02/19 21:15, Joao Martins wrote:
>>   2. PV Driver support (patches 17 - 39)
>>
>>   We start by redirecting hypercalls from the backend to routines
>>   which emulate the behaviour that PV backends expect i.e. grant
>>   table and interdomain events. Next, we add support for late
>>   initialization of xenbus, followed by implementing
>>   frontend/backend communication mechanisms (i.e. grant tables and
>>   interdomain event channels). Finally, introduce xen-shim.ko,
>>   which will setup a limited Xen environment. This uses the added
>>   functionality of Xen specific shared memory (grant tables) and
>>   notifications (event channels).
> 
> I am a bit worried by the last patches, they seem really brittle and
> prone to breakage.  I don't know Xen well enough to understand if the
> lack of support for GNTMAP_host_map is fixable, but if not, you have to
> define a completely different hypercall.
I assume you are aware of most of this, but just in case, here's the
flow when a backend driver wants to map a grant-reference in the
host: it allocates an empty struct page (via ballooning) and does a
map_grant_ref(GNTMAP_host_map) hypercall. In response, Xen validates the
grant-reference and maps it onto the address associated with the struct
page.
After this, from the POV of the underlying network/block drivers, these
struct pages can be used as just regular pages.
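
(For reference, a minimal sketch of that usual flow -- error handling
trimmed and the helper name invented purely for illustration:)

#include <linux/mm.h>
#include <xen/page.h>
#include <xen/grant_table.h>
#include <xen/interface/grant_table.h>

static struct page *example_map_one_gref(grant_ref_t gref, domid_t otherend,
					 grant_handle_t *handle)
{
	struct gnttab_map_grant_ref op;
	struct page *page;

	/* ballooned-out frame that will receive the mapping */
	if (gnttab_alloc_pages(1, &page))
		return NULL;

	gnttab_set_map_op(&op, (unsigned long)pfn_to_kaddr(page_to_pfn(page)),
			  GNTMAP_host_map, gref, otherend);

	if (gnttab_map_refs(&op, NULL, &page, 1) || op.status != GNTST_okay) {
		gnttab_free_pages(1, &page);
		return NULL;
	}

	*handle = op.handle;	/* kept around for gnttab_unmap_refs() later */
	return page;		/* usable like any other struct page from here on */
}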

To support this in a KVM environment, where AFAICS no remapping of pages
is possible, the idea was to make minimal changes to the backend drivers
such that map_grant_ref() could just return the PFN from which the
backend could derive the struct page.

To ensure that backends -- when running in this environment -- have been
modified to deal with these new semantics, our map_grant_ref()
implementation explicitly disallows the GNTMAP_host_map flag.

Now if I'm reading you right, you would prefer something more
straightforward -- perhaps similar semantics but a new flag that
makes this behaviour explicit?

> 
> Of course, tests are missing.  You should use the
> tools/testing/selftests/kvm/ framework, and ideally each patch should
> come with coverage for the newly-added code.
Agreed.

Thanks
Ankur

> 
> Thanks,
> 
> Paolo
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] ` <20190220233941.GA5279@mail-itl>
@ 2019-02-21  0:31   ` Ankur Arora
  2019-02-21  7:57   ` Juergen Gross
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Ankur Arora @ 2019-02-21  0:31 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Joao Martins
  Cc: Juergen Gross, Stefano Stabellini, kvm,
	Radim Krčmář, x86, linux-kernel, Paolo Bonzini,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, xen-devel,
	Boris Ostrovsky, Thomas Gleixner



On 2/20/19 3:39 PM, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 20, 2019 at 08:15:30PM +0000, Joao Martins wrote:
>>   2. PV Driver support (patches 17 - 39)
>>
>>   We start by redirecting hypercalls from the backend to routines
>>   which emulate the behaviour that PV backends expect i.e. grant
>>   table and interdomain events. Next, we add support for late
>>   initialization of xenbus, followed by implementing
>>   frontend/backend communication mechanisms (i.e. grant tables and
>>   interdomain event channels). Finally, introduce xen-shim.ko,
>>   which will setup a limited Xen environment. This uses the added
>>   functionality of Xen specific shared memory (grant tables) and
>>   notifications (event channels).
> 
> Does it mean backends could be run in another guest, similarly as on
> real Xen? AFAIK virtio doesn't allow that as virtio backends need
I'm afraid not. For now grant operations (map/unmap) can only be done
by backends to the local KVM instance.

Ankur

> arbitrary write access to guest memory. But grant tables provide enough
> abstraction to do that safely.

> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] ` <20190220233941.GA5279@mail-itl>
  2019-02-21  0:31   ` Ankur Arora
@ 2019-02-21  7:57   ` Juergen Gross
  2019-02-21 11:55   ` Joao Martins
       [not found]   ` <58ff93e1-6c91-c1a6-4273-531c28101569@suse.com>
  3 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2019-02-21  7:57 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Joao Martins
  Cc: Stefano Stabellini, kvm, Radim Krčmář, x86,
	linux-kernel, Ankur Arora, Paolo Bonzini, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, xen-devel, Boris Ostrovsky,
	Thomas Gleixner

On 21/02/2019 00:39, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 20, 2019 at 08:15:30PM +0000, Joao Martins wrote:
>>  2. PV Driver support (patches 17 - 39)
>>
>>  We start by redirecting hypercalls from the backend to routines
>>  which emulate the behaviour that PV backends expect i.e. grant
>>  table and interdomain events. Next, we add support for late
>>  initialization of xenbus, followed by implementing
>>  frontend/backend communication mechanisms (i.e. grant tables and
>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>  which will setup a limited Xen environment. This uses the added
>>  functionality of Xen specific shared memory (grant tables) and
>>  notifications (event channels).
> 
> Does it mean backends could be run in another guest, similarly as on
> real Xen? AFAIK virtio doesn't allow that as virtio backends need
> arbitrary write access to guest memory. But grant tables provide enough
> abstraction to do that safely.

As long as the grant table emulation in xen-shim isn't just a wrapper to
"normal" KVM guest memory access.

I guess the xen-shim implementation doesn't support the same kind of
guest memory isolation as Xen does?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] ` <35051310-c497-8ad5-4434-1b8426a317d2@redhat.com>
  2019-02-21  0:29   ` Ankur Arora
@ 2019-02-21 11:45   ` Joao Martins
       [not found]   ` <8b1f4912-4f92-69ae-ae01-d899d5640572@oracle.com>
  2 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-21 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Juergen Gross, Stefano Stabellini, kvm,
	Radim Krčmář, x86, linux-kernel, Ankur Arora,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, xen-devel,
	Boris Ostrovsky, Thomas Gleixner

On 2/20/19 9:09 PM, Paolo Bonzini wrote:
> On 20/02/19 21:15, Joao Martins wrote:
>>  2. PV Driver support (patches 17 - 39)
>>
>>  We start by redirecting hypercalls from the backend to routines
>>  which emulate the behaviour that PV backends expect i.e. grant
>>  table and interdomain events. Next, we add support for late
>>  initialization of xenbus, followed by implementing
>>  frontend/backend communication mechanisms (i.e. grant tables and
>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>  which will setup a limited Xen environment. This uses the added
>>  functionality of Xen specific shared memory (grant tables) and
>>  notifications (event channels).
> 
> I am a bit worried by the last patches, they seem really brittle and
> prone to breakage.  I don't know Xen well enough to understand if the
> lack of support for GNTMAP_host_map is fixable, but if not, you have to
> define a completely different hypercall.
> 
I guess Ankur already answered this; so just to stack this on top of his comment.

The xen_shim_domain() is only meant to handle the case where the backend
has/can-have full access to guest memory [i.e. netback and blkback would work
with similar assumptions as vhost?]. For the normal case, where a backend *in a
guest* maps and unmaps other guest memory, this is not applicable and these
changes don't affect that case.

IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
actual hypercalls but rather invoking shim_hypercall(). The call chain would go
more or less like:

<netback|blkback|scsiback>
 gnttab_map_refs(map_ops, pages)
   HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
     shim_hypercall()
       shim_hcall_gntmap()

Our reasoning was that given we are already in KVM, why mapping a page if the
user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
the shim determines its user doesn't want to map the page. Also, there's another
issue where PV backends always need a struct page to reference the device
inflight data as Ankur pointed out.
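
To make that part concrete, here is a purely illustrative sketch (not the
code in this series) of what the lookup can reduce to on the shim side,
once the gfn backing the gref has been read out of the guest's grant table:

/* hypothetical helper, for illustration only */
static int shim_gref_to_page(struct kvm *kvm, gfn_t gfn, struct page **page)
{
	*page = gfn_to_page(kvm, gfn);	/* takes a reference on the page */
	if (is_error_page(*page))
		return -EFAULT;

	/* the backend uses *page directly; put_page() on "unmap" */
	return 0;
}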

> Of course, tests are missing.

FWIW: this was deliberate, as we wanted to get folks' impressions before
proceeding further with the work.

> You should use the
> tools/testing/selftests/kvm/ framework, and ideally each patch should
> come with coverage for the newly-added code.
Got it.

Cheers,
	Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found] ` <20190220233941.GA5279@mail-itl>
  2019-02-21  0:31   ` Ankur Arora
  2019-02-21  7:57   ` Juergen Gross
@ 2019-02-21 11:55   ` Joao Martins
       [not found]   ` <58ff93e1-6c91-c1a6-4273-531c28101569@suse.com>
  3 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-21 11:55 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Juergen Gross, Stefano Stabellini, kvm,
	Radim Krčmář, x86, linux-kernel, Ankur Arora,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	xen-devel, Boris Ostrovsky, Thomas Gleixner

On 2/20/19 11:39 PM, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 20, 2019 at 08:15:30PM +0000, Joao Martins wrote:
>>  2. PV Driver support (patches 17 - 39)
>>
>>  We start by redirecting hypercalls from the backend to routines
>>  which emulate the behaviour that PV backends expect i.e. grant
>>  table and interdomain events. Next, we add support for late
>>  initialization of xenbus, followed by implementing
>>  frontend/backend communication mechanisms (i.e. grant tables and
>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>  which will setup a limited Xen environment. This uses the added
>>  functionality of Xen specific shared memory (grant tables) and
>>  notifications (event channels).
> 
> Does it mean backends could be run in another guest, similarly as on
> real Xen? AFAIK virtio doesn't allow that as virtio backends need
> arbitrary write access to guest memory. But grant tables provide enough
> abstraction to do that safely.
> 
In this series not yet. Here we are trying to resemble how KVM drives its kernel
PV backends (i.e. vhost virtio).

The domU grant {un,}mapping could be added as a complement. The main
difference between domU gntmap and shim gntmap is that the former maps/unmaps
the guest page on the backend. Most of it is common code, I think.

	Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found]   ` <58ff93e1-6c91-c1a6-4273-531c28101569@suse.com>
@ 2019-02-21 12:00     ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-21 12:00 UTC (permalink / raw)
  To: Juergen Gross, Marek Marczykowski-Górecki
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 2/21/19 7:57 AM, Juergen Gross wrote:
> On 21/02/2019 00:39, Marek Marczykowski-Górecki wrote:
>> On Wed, Feb 20, 2019 at 08:15:30PM +0000, Joao Martins wrote:
>>>  2. PV Driver support (patches 17 - 39)
>>>
>>>  We start by redirecting hypercalls from the backend to routines
>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>  table and interdomain events. Next, we add support for late
>>>  initialization of xenbus, followed by implementing
>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>  which will setup a limited Xen environment. This uses the added
>>>  functionality of Xen specific shared memory (grant tables) and
>>>  notifications (event channels).
>>
>> Does it mean backends could be run in another guest, similarly as on
>> real Xen? AFAIK virtio doesn't allow that as virtio backends need
>> arbitrary write access to guest memory. But grant tables provide enough
>> abstraction to do that safely.
> 
> As long as the grant table emulation in xen-shim isn't just a wrapper to
> "normal" KVM guest memory access.
> 
> I guess the xen-shim implementation doesn't support the same kind of
> guest memory isolation as Xen does?
> 
It doesn't, but these are also two different use cases.

The xen-shim is meant for when the PV backend lives in the hypervisor (a
similar model to KVM vhost), whereas the domU grant mapping that Marek is
asking about requires additional hypercalls handled by the guest (i.e. in
kvm_xen_hypercall). This would equate to how Xen currently performs grant
mapping/unmapping.

	Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found]   ` <8b1f4912-4f92-69ae-ae01-d899d5640572@oracle.com>
@ 2019-02-22 16:59     ` Paolo Bonzini
       [not found]     ` <3ee91f33-2973-c2db-386f-afbf138081b4@redhat.com>
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2019-02-22 16:59 UTC (permalink / raw)
  To: Joao Martins
  Cc: Juergen Gross, Stefano Stabellini, kvm,
	Radim Krčmář, x86, linux-kernel, Ankur Arora,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, xen-devel,
	Boris Ostrovsky, Thomas Gleixner

On 21/02/19 12:45, Joao Martins wrote:
> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>> On 20/02/19 21:15, Joao Martins wrote:
>>>  2. PV Driver support (patches 17 - 39)
>>>
>>>  We start by redirecting hypercalls from the backend to routines
>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>  table and interdomain events. Next, we add support for late
>>>  initialization of xenbus, followed by implementing
>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>  which will setup a limited Xen environment. This uses the added
>>>  functionality of Xen specific shared memory (grant tables) and
>>>  notifications (event channels).
>>
>> I am a bit worried by the last patches, they seem really brittle and
>> prone to breakage.  I don't know Xen well enough to understand if the
>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>> define a completely different hypercall.
>>
> I guess Ankur already answered this; so just to stack this on top of his comment.
> 
> The xen_shim_domain() is only meant to handle the case where the backend
> has/can-have full access to guest memory [i.e. netback and blkback would work
> with similar assumptions as vhost?]. For the normal case, where a backend *in a
> guest* maps and unmaps other guest memory, this is not applicable and these
> changes don't affect that case.
> 
> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
> more or less like:
> 
> <netback|blkback|scsiback>
>  gnttab_map_refs(map_ops, pages)
>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>      shim_hypercall()
>        shim_hcall_gntmap()
> 
> Our reasoning was that given we are already in KVM, why mapping a page if the
> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
> the shim determines its user doesn't want to map the page. Also, there's another
> issue where PV backends always need a struct page to reference the device
> inflight data as Ankur pointed out.

Ultimately it's up to the Xen people.  It does make their API uglier,
especially the in/out change for the parameter.  If you can at least
avoid that, it would alleviate my concerns quite a bit.

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 20/39] xen-blkback: module_exit support
       [not found] ` <20190220201609.28290-21-joao.m.martins@oracle.com>
@ 2019-02-25 18:57   ` Konrad Rzeszutek Wilk
       [not found]   ` <20190225185719.GA16013@char.us.oracle.com>
  1 sibling, 0 replies; 43+ messages in thread
From: Konrad Rzeszutek Wilk @ 2019-02-25 18:57 UTC (permalink / raw)
  To: Joao Martins
  Cc: kvm, Radim Krčmář, linux-kernel, Ankur Arora,
	Paolo Bonzini, xen-devel, Boris Ostrovsky, Roger Pau Monné

On Wed, Feb 20, 2019 at 08:15:50PM +0000, Joao Martins wrote:
> 
> Implement module_exit to allow users to do module unload of blkback.
> We prevent users from module unload whenever there are still interfaces
> allocated, in other words, do module_get on xen_blkif_alloc() and
> module_put on xen_blkif_free().

This patch looks like it can go in now, right?

> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  8 ++++++++
>  drivers/block/xen-blkback/common.h  |  2 ++
>  drivers/block/xen-blkback/xenbus.c  | 14 ++++++++++++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index fd1e19f1a49f..d51d88be88e1 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1504,5 +1504,13 @@ static int __init xen_blkif_init(void)
>  
>  module_init(xen_blkif_init);
>  
> +static void __exit xen_blkif_exit(void)
> +{
> +	xen_blkif_interface_exit();
> +	xen_blkif_xenbus_exit();
> +}
> +
> +module_exit(xen_blkif_exit);
> +
>  MODULE_LICENSE("Dual BSD/GPL");
>  MODULE_ALIAS("xen-backend:vbd");
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index 1d3002d773f7..3415c558e115 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -376,8 +376,10 @@ struct phys_req {
>  	blkif_sector_t		sector_number;
>  };
>  int xen_blkif_interface_init(void);
> +void xen_blkif_interface_exit(void);
>  
>  int xen_blkif_xenbus_init(void);
> +void xen_blkif_xenbus_exit(void);
>  
>  irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
>  int xen_blkif_schedule(void *arg);
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index a4bc74e72c39..424e2efebe85 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -181,6 +181,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>  	init_completion(&blkif->drain_complete);
>  	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
>  
> +	__module_get(THIS_MODULE);
> +
>  	return blkif;
>  }
>  
> @@ -328,6 +330,8 @@ static void xen_blkif_free(struct xen_blkif *blkif)
>  
>  	/* Make sure everything is drained before shutting down */
>  	kmem_cache_free(xen_blkif_cachep, blkif);
> +
> +	module_put(THIS_MODULE);
>  }
>  
>  int __init xen_blkif_interface_init(void)
> @@ -341,6 +345,11 @@ int __init xen_blkif_interface_init(void)
>  	return 0;
>  }
>  
> +void xen_blkif_interface_exit(void)
> +{
> +	kmem_cache_destroy(xen_blkif_cachep);
> +}
> +
>  /*
>   *  sysfs interface for VBD I/O requests
>   */
> @@ -1115,3 +1124,8 @@ int xen_blkif_xenbus_init(void)
>  {
>  	return xenbus_register_backend(&xen_blkbk_driver);
>  }
> +
> +void xen_blkif_xenbus_exit(void)
> +{
> +	xenbus_unregister_driver(&xen_blkbk_driver);
> +}
> -- 
> 2.11.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 20/39] xen-blkback: module_exit support
       [not found]   ` <20190225185719.GA16013@char.us.oracle.com>
@ 2019-02-26 11:20     ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-02-26 11:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kvm, Radim Krčmář, linux-kernel, Ankur Arora,
	Paolo Bonzini, xen-devel, Boris Ostrovsky, Roger Pau Monné

On 2/25/19 6:57 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Feb 20, 2019 at 08:15:50PM +0000, Joao Martins wrote:
>>
>> Implement module_exit to allow users to do module unload of blkback.
>> We prevent users from module unload whenever there are still interfaces
>> allocated, in other words, do module_get on xen_blkif_alloc() and
>> module_put on xen_blkif_free().
> 
> This patch looks like it can go in now, right?
> 
Yes.

	Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
       [not found]     ` <3ee91f33-2973-c2db-386f-afbf138081b4@redhat.com>
@ 2019-03-12 17:14       ` Joao Martins
  2019-04-08  6:44         ` Juergen Gross
  2019-04-08  6:44         ` [Xen-devel] " Juergen Gross
  0 siblings, 2 replies; 43+ messages in thread
From: Joao Martins @ 2019-03-12 17:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Juergen Gross, Stefano Stabellini, kvm,
	Radim Krčmář, x86, linux-kernel, Ankur Arora,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, xen-devel,
	Boris Ostrovsky, Thomas Gleixner

On 2/22/19 4:59 PM, Paolo Bonzini wrote:
> On 21/02/19 12:45, Joao Martins wrote:
>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>  2. PV Driver support (patches 17 - 39)
>>>>
>>>>  We start by redirecting hypercalls from the backend to routines
>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>  table and interdomain events. Next, we add support for late
>>>>  initialization of xenbus, followed by implementing
>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>  which will setup a limited Xen environment. This uses the added
>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>  notifications (event channels).
>>>
>>> I am a bit worried by the last patches, they seem really brittle and
>>> prone to breakage.  I don't know Xen well enough to understand if the
>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>> define a completely different hypercall.
>>>
>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>
>> The xen_shim_domain() is only meant to handle the case where the backend
>> has/can-have full access to guest memory [i.e. netback and blkback would work
>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>> guest* maps and unmaps other guest memory, this is not applicable and these
>> changes don't affect that case.
>>
>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>> more or less like:
>>
>> <netback|blkback|scsiback>
>>  gnttab_map_refs(map_ops, pages)
>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>      shim_hypercall()
>>        shim_hcall_gntmap()
>>
>> Our reasoning was that given we are already in KVM, why mapping a page if the
>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>> the shim determines its user doesn't want to map the page. Also, there's another
>> issue where PV backends always need a struct page to reference the device
>> inflight data as Ankur pointed out.
> 
> Ultimately it's up to the Xen people.  It does make their API uglier,
> especially the in/out change for the parameter.  If you can at least
> avoid that, it would alleviate my concerns quite a bit.

In my view, we have two options overall:

1) Make it explicit, the changes the PV drivers we have to make in
order to support xen_shim_domain(). This could mean e.g. a) add a callback
argument to gnttab_map_refs() that is invoked for every page that gets looked up
successfully, and inside this callback the PV driver may update its tracking
page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
shim_domain specific bits would be a little more abstracted from Xen PV
backends. See netback example below the scissors mark. Or b) have sort of a
translate_gref() and put_gref() API that Xen PV drivers use which make it even
more explicit that there's no grant ops involved. The latter is more invasive.

2) The second option is to support guest grant mapping/unmapping [*] to allow
hosting PV backends inside the guest. This would remove the Xen changes in this
series completely. But it would require another guest being used
as netback/blkback/xenstored, and less performance than 1) (though, in theory,
it would be equivalent to what does Xen with grants/events). The only changes in
Linux Xen code is adding xenstored domain support, but that is useful on its own
outside the scope of this work.

I think there's value on both; 1) is probably more familiar for KVM users
perhaps (as it is similar to what vhost does?) while  2) equates to implementing
Xen disagregation capabilities in KVM.

Thoughts? Xen maintainers what's your take on this?

	Joao

[*] Interdomain events would also have to change.

---------------- >8 ----------------

It isn't much cleaner, but PV drivers avoid/hide a bunch of xen_shim_domain()
conditionals in the data path. It is more explicit while avoiding the in/out
parameter change in gnttab_map_refs.

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 936c0b3e0ba2..c6e47dcb7e10 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -158,6 +158,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
 	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
 	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
+	struct gnttab_page_changed page_cb[MAX_PENDING_REQS];
 	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
 	struct page *pages_to_map[MAX_PENDING_REQS];
 	struct page *pages_to_unmap[MAX_PENDING_REQS];
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 80aae3a32c2a..56788d8cd813 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -324,15 +324,29 @@ struct xenvif_tx_cb {

 #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)

+static inline void xenvif_tx_page_changed(phys_addr_t addr, void *opaque)
+{
+	struct page **page = opaque;
+
+	*page = virt_to_page(addr);
+}
 static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue,
 					   u16 pending_idx,
 					   struct xen_netif_tx_request *txp,
 					   unsigned int extra_count,
 					   struct gnttab_map_grant_ref *mop)
 {
-	queue->pages_to_map[mop-queue->tx_map_ops] = queue->mmap_pages[pending_idx];
+	u32 map_idx = mop - queue->tx_map_ops;
+
+	queue->pages_to_map[map_idx] = queue->mmap_pages[pending_idx];
+	queue->page_cb[map_idx].ctx = &queue->mmap_pages[pending_idx];
+	queue->page_cb[map_idx].cb = xenvif_tx_page_changed;
+
 	gnttab_set_map_op(mop, idx_to_kaddr(queue, pending_idx),
-			  GNTMAP_host_map | GNTMAP_readonly,
+			  GNTTAB_host_map | GNTMAP_readonly,
 			  txp->gref, queue->vif->domid);

 	memcpy(&queue->pending_tx_info[pending_idx].req, txp,
@@ -1268,7 +1283,7 @@ static inline void xenvif_tx_dealloc_action(struct
xenvif_queue *queue)
 				queue->mmap_pages[pending_idx];
 			gnttab_set_unmap_op(gop,
 					    idx_to_kaddr(queue, pending_idx),
-					    GNTMAP_host_map,
+					    GNTTAB_host_map,
 					    queue->grant_tx_handle[pending_idx]);
 			xenvif_grant_handle_reset(queue, pending_idx);
 			++gop;
@@ -1322,7 +1337,7 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
 	gnttab_batch_copy(queue->tx_copy_ops, nr_cops);
 	if (nr_mops != 0) {
 		ret = gnttab_map_refs(queue->tx_map_ops,
-				      NULL,
+				      NULL, queue->page_cb,
 				      queue->pages_to_map,
 				      nr_mops);
 		BUG_ON(ret);
@@ -1394,7 +1409,7 @@ void xenvif_idx_unmap(struct xenvif_queue *queue, u16
pending_idx)

 	gnttab_set_unmap_op(&tx_unmap_op,
 			    idx_to_kaddr(queue, pending_idx),
-			    GNTMAP_host_map,
+			    GNTTAB_host_map,
 			    queue->grant_tx_handle[pending_idx]);
 	xenvif_grant_handle_reset(queue, pending_idx);

@@ -1622,7 +1637,7 @@ static int __init netback_init(void)
 {
 	int rc = 0;

-	if (!xen_domain())
+	if (!xen_domain() && !xen_shim_domain_get())
 		return -ENODEV;

 	/* Allow as many queues as there are CPUs but max. 8 if user has not
@@ -1663,6 +1678,7 @@ static void __exit netback_fini(void)
 	debugfs_remove_recursive(xen_netback_dbg_root);
 #endif /* CONFIG_DEBUG_FS */
 	xenvif_xenbus_fini();
+	xen_shim_domain_put();
 }
 module_exit(netback_fini);

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7ea6fb6a2e5d..b4c9d7ff531f 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1031,6 +1031,7 @@ void gnttab_foreach_grant(struct page **pages,

 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
+		    struct gnttab_page_changed *page_cb,
 		    struct page **pages, unsigned int count)
 {
 	int i, ret;
@@ -1045,6 +1046,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		{
 			struct xen_page_foreign *foreign;

+			if (xen_shim_domain() && page_cb) {
+				page_cb[i].cb(map_ops[i].host_addr,
+					      page_cb[i].ctx);
+				continue;
+			}
+
 			SetPageForeign(pages[i]);
 			foreign = xen_page_foreign(pages[i]);
 			foreign->domid = map_ops[i].dom;
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 9bc5bc07d4d3..5e17fa08e779 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -55,6 +55,9 @@
 /* NR_GRANT_FRAMES must be less than or equal to that configured in Xen */
 #define NR_GRANT_FRAMES 4

+/* Selects host map only if on native Xen */
+#define GNTTAB_host_map (xen_shim_domain() ? 0 : GNTMAP_host_map)
+
 struct gnttab_free_callback {
 	struct gnttab_free_callback *next;
 	void (*fn)(void *);
@@ -78,6 +81,12 @@ struct gntab_unmap_queue_data
 	unsigned int age;
 };

+struct gnttab_page_changed
+{
+	void (*cb)(phys_addr_t addr, void *opaque);
+	void *ctx;
+};
+
 int gnttab_init(void);
 int gnttab_suspend(void);
 int gnttab_resume(void);
@@ -221,6 +230,7 @@ void gnttab_pages_clear_private(int nr_pages, struct page
**pages);

 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
+		    struct gnttab_page_changed *cb,
 		    struct page **pages, unsigned int count);
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 		      struct gnttab_unmap_grant_ref *kunmap_ops,

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-03-12 17:14       ` Joao Martins
@ 2019-04-08  6:44         ` Juergen Gross
  2019-04-08  6:44         ` [Xen-devel] " Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2019-04-08  6:44 UTC (permalink / raw)
  To: Joao Martins, Paolo Bonzini
  Cc: Stefano Stabellini, kvm, Radim Krčmář, x86,
	linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, xen-devel, Boris Ostrovsky, Thomas Gleixner

On 12/03/2019 18:14, Joao Martins wrote:
> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>> On 21/02/19 12:45, Joao Martins wrote:
>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>  2. PV Driver support (patches 17 - 39)
>>>>>
>>>>>  We start by redirecting hypercalls from the backend to routines
>>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>>  table and interdomain events. Next, we add support for late
>>>>>  initialization of xenbus, followed by implementing
>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>  which will setup a limited Xen environment. This uses the added
>>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>>  notifications (event channels).
>>>>
>>>> I am a bit worried by the last patches, they seem really brittle and
>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>> define a completely different hypercall.
>>>>
>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>
>>> The xen_shim_domain() is only meant to handle the case where the backend
>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>> changes don't affect that case.
>>>
>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>> more or less like:
>>>
>>> <netback|blkback|scsiback>
>>>  gnttab_map_refs(map_ops, pages)
>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>      shim_hypercall()
>>>        shim_hcall_gntmap()
>>>
>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>> the shim determines its user doesn't want to map the page. Also, there's another
>>> issue where PV backends always need a struct page to reference the device
>>> inflight data as Ankur pointed out.
>>
>> Ultimately it's up to the Xen people.  It does make their API uglier,
>> especially the in/out change for the parameter.  If you can at least
>> avoid that, it would alleviate my concerns quite a bit.
> 
> In my view, we have two options overall:
> 
> 1) Make it explicit, the changes the PV drivers we have to make in
> order to support xen_shim_domain(). This could mean e.g. a) add a callback
> argument to gnttab_map_refs() that is invoked for every page that gets looked up
> successfully, and inside this callback the PV driver may update its tracking
> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
> shim_domain specific bits would be a little more abstracted from Xen PV
> backends. See netback example below the scissors mark. Or b) have sort of a
> translate_gref() and put_gref() API that Xen PV drivers use which make it even
> more explicit that there's no grant ops involved. The latter is more invasive.
> 
> 2) The second option is to support guest grant mapping/unmapping [*] to allow
> hosting PV backends inside the guest. This would remove the Xen changes in this
> series completely. But it would require another guest being used
> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
> it would be equivalent to what does Xen with grants/events). The only changes in
> Linux Xen code is adding xenstored domain support, but that is useful on its own
> outside the scope of this work.
> 
> I think there's value on both; 1) is probably more familiar for KVM users
> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
> Xen disagregation capabilities in KVM.
> 
> Thoughts? Xen maintainers what's your take on this?

What I'd like best would be a new handle (e.g. xenhost_t *) used as an
abstraction layer for this kind of stuff. It should be passed to the
backends and those would pass it on to low-level Xen drivers (xenbus,
event channels, grant table, ...).

I was planning to do that (the xenhost_t * stuff) soon in order to add
support for nested Xen using PV devices (you need two Xenstores for that
as the nested dom0 is acting as Xen backend server, while using PV
frontends for accessing the "real" world outside).

The xenhost_t should be used for:

- accessing Xenstore
- issuing and receiving events
- doing hypercalls
- grant table operations

So exactly the kind of stuff you want to do, too.
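
Roughly something like this (just a sketch to illustrate the idea; the type
and member names are made up, this isn't an existing interface):

struct xenhost_ops;	/* made-up, see below */

typedef struct xenhost {
	const struct xenhost_ops *ops;
	void *priv;	/* per-implementation state: local Xen, nested Xen, shim, ... */
} xenhost_t;

struct xenhost_ops {
	/* Xenstore access */
	int (*xs_read)(xenhost_t *xh, const char *path, void **val,
		       unsigned int *len);
	/* event channels */
	int (*evtchn_send)(xenhost_t *xh, evtchn_port_t port);
	/* hypercalls */
	long (*hypercall)(xenhost_t *xh, unsigned int op, void *arg);
	/* grant table operations */
	int (*grant_map)(xenhost_t *xh, struct gnttab_map_grant_ref *ops,
			 struct page **pages, unsigned int count);
};

Backends would be handed a xenhost_t * and pass it down, so the low-level
drivers can pick the right implementation behind it.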


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Xen-devel] [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-03-12 17:14       ` Joao Martins
  2019-04-08  6:44         ` Juergen Gross
@ 2019-04-08  6:44         ` Juergen Gross
  2019-04-08 10:36           ` Joao Martins
  2019-04-08 10:36           ` [Xen-devel] " Joao Martins
  1 sibling, 2 replies; 43+ messages in thread
From: Juergen Gross @ 2019-04-08  6:44 UTC (permalink / raw)
  To: Joao Martins, Paolo Bonzini
  Cc: Stefano Stabellini, kvm, Radim Krčmář, x86,
	linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, xen-devel, Boris Ostrovsky, Thomas Gleixner

On 12/03/2019 18:14, Joao Martins wrote:
> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>> On 21/02/19 12:45, Joao Martins wrote:
>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>  2. PV Driver support (patches 17 - 39)
>>>>>
>>>>>  We start by redirecting hypercalls from the backend to routines
>>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>>  table and interdomain events. Next, we add support for late
>>>>>  initialization of xenbus, followed by implementing
>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>  which will setup a limited Xen environment. This uses the added
>>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>>  notifications (event channels).
>>>>
>>>> I am a bit worried by the last patches, they seem really brittle and
>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>> define a completely different hypercall.
>>>>
>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>
>>> The xen_shim_domain() is only meant to handle the case where the backend
>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>> changes don't affect that case.
>>>
>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>> more or less like:
>>>
>>> <netback|blkback|scsiback>
>>>  gnttab_map_refs(map_ops, pages)
>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>      shim_hypercall()
>>>        shim_hcall_gntmap()
>>>
>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>> the shim determines its user doesn't want to map the page. Also, there's another
>>> issue where PV backends always need a struct page to reference the device
>>> inflight data as Ankur pointed out.
>>
>> Ultimately it's up to the Xen people.  It does make their API uglier,
>> especially the in/out change for the parameter.  If you can at least
>> avoid that, it would alleviate my concerns quite a bit.
> 
> In my view, we have two options overall:
> 
> 1) Make it explicit, the changes the PV drivers we have to make in
> order to support xen_shim_domain(). This could mean e.g. a) add a callback
> argument to gnttab_map_refs() that is invoked for every page that gets looked up
> successfully, and inside this callback the PV driver may update its tracking
> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
> shim_domain specific bits would be a little more abstracted from Xen PV
> backends. See netback example below the scissors mark. Or b) have sort of a
> translate_gref() and put_gref() API that Xen PV drivers use which make it even
> more explicit that there's no grant ops involved. The latter is more invasive.
> 
> 2) The second option is to support guest grant mapping/unmapping [*] to allow
> hosting PV backends inside the guest. This would remove the Xen changes in this
> series completely. But it would require another guest being used
> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
> it would be equivalent to what does Xen with grants/events). The only changes in
> Linux Xen code is adding xenstored domain support, but that is useful on its own
> outside the scope of this work.
> 
> I think there's value on both; 1) is probably more familiar for KVM users
> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
> Xen disagregation capabilities in KVM.
> 
> Thoughts? Xen maintainers what's your take on this?

What I'd like best would be a new handle (e.g. xenhost_t *) used as an
abstraction layer for this kind of stuff. It should be passed to the
backends and those would pass it on to low-level Xen drivers (xenbus,
event channels, grant table, ...).

I was planning to do that (the xenhost_t * stuff) soon in order to add
support for nested Xen using PV devices (you need two Xenstores for that
as the nested dom0 is acting as Xen backend server, while using PV
frontends for accessing the "real" world outside).

The xenhost_t should be used for:

- accessing Xenstore
- issuing and receiving events
- doing hypercalls
- grant table operations

So exactly the kind of stuff you want to do, too.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-08  6:44         ` [Xen-devel] " Juergen Gross
@ 2019-04-08 10:36           ` Joao Martins
  2019-04-08 10:36           ` [Xen-devel] " Joao Martins
  1 sibling, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-04-08 10:36 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 4/8/19 7:44 AM, Juergen Gross wrote:
> On 12/03/2019 18:14, Joao Martins wrote:
>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>> On 21/02/19 12:45, Joao Martins wrote:
>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>  2. PV Driver support (patches 17 - 39)
>>>>>>
>>>>>>  We start by redirecting hypercalls from the backend to routines
>>>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>>>  table and interdomain events. Next, we add support for late
>>>>>>  initialization of xenbus, followed by implementing
>>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>  which will setup a limited Xen environment. This uses the added
>>>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>>>  notifications (event channels).
>>>>>
>>>>> I am a bit worried by the last patches, they seem really brittle and
>>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>>> define a completely different hypercall.
>>>>>
>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>>
>>>> The xen_shim_domain() is only meant to handle the case where the backend
>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>>> changes don't affect that case.
>>>>
>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>>> more or less like:
>>>>
>>>> <netback|blkback|scsiback>
>>>>  gnttab_map_refs(map_ops, pages)
>>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>      shim_hypercall()
>>>>        shim_hcall_gntmap()
>>>>
>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>>> the shim determines its user doesn't want to map the page. Also, there's another
>>>> issue where PV backends always need a struct page to reference the device
>>>> inflight data as Ankur pointed out.
>>>
>>> Ultimately it's up to the Xen people.  It does make their API uglier,
>>> especially the in/out change for the parameter.  If you can at least
>>> avoid that, it would alleviate my concerns quite a bit.
>>
>> In my view, we have two options overall:
>>
>> 1) Make it explicit, the changes the PV drivers we have to make in
>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
>> successfully, and inside this callback the PV driver may update its tracking
>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
>> shim_domain specific bits would be a little more abstracted from Xen PV
>> backends. See netback example below the scissors mark. Or b) have sort of a
>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
>> more explicit that there's no grant ops involved. The latter is more invasive.
>>
>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
>> hosting PV backends inside the guest. This would remove the Xen changes in this
>> series completely. But it would require another guest being used
>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
>> it would be equivalent to what does Xen with grants/events). The only changes in
>> Linux Xen code is adding xenstored domain support, but that is useful on its own
>> outside the scope of this work.
>>
>> I think there's value on both; 1) is probably more familiar for KVM users
>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
>> Xen disagregation capabilities in KVM.
>>
>> Thoughts? Xen maintainers what's your take on this?
> 
> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
> abstraction layer for this kind of stuff. It should be passed to the
> backends and those would pass it on to low-level Xen drivers (xenbus,
> event channels, grant table, ...).
> 
So if IIRC backends would use the xenhost layer to access grants or frames
referenced by grants, and that would grok into some of this. IOW, you would have
two implementors of xenhost: one for nested remote/local events+grants and
another for this "shim domain" ?

> I was planning to do that (the xenhost_t * stuff) soon in order to add
> support for nested Xen using PV devices (you need two Xenstores for that
> as the nested dom0 is acting as Xen backend server, while using PV
> frontends for accessing the "real" world outside).
> 
> The xenhost_t should be used for:
> 
> - accessing Xenstore
> - issuing and receiving events
> - doing hypercalls
> - grant table operations
> 

In the text above, I sort of suggested a slice of this in 1.b) with a
translate_gref() and put_gref() API -- to get the page from a gref. This was
because of the flags|host_addr hurdle we described above with respect to
using grant maps/unmaps. Do you think some of the xenhost layer would be
amenable to supporting this case?
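
(Roughly the API shape I had in mind for 1.b) -- hypothetical, names purely
for discussion:)

struct page *translate_gref(xenhost_t *xh, domid_t otherend, grant_ref_t gref);
void put_gref(xenhost_t *xh, struct page *page);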

> So exactly the kind of stuff you want to do, too.
> 
Cool idea!

	Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-08 10:36           ` [Xen-devel] " Joao Martins
  2019-04-08 10:42             ` Juergen Gross
@ 2019-04-08 10:42             ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2019-04-08 10:42 UTC (permalink / raw)
  To: Joao Martins
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 08/04/2019 12:36, Joao Martins wrote:
> On 4/8/19 7:44 AM, Juergen Gross wrote:
>> On 12/03/2019 18:14, Joao Martins wrote:
>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>  2. PV Driver support (patches 17 - 39)
>>>>>>>
>>>>>>>  We start by redirecting hypercalls from the backend to routines
>>>>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>  table and interdomain events. Next, we add support for late
>>>>>>>  initialization of xenbus, followed by implementing
>>>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>  which will setup a limited Xen environment. This uses the added
>>>>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>>>>  notifications (event channels).
>>>>>>
>>>>>> I am a bit worried by the last patches, they seem really brittle and
>>>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>>>> define a completely different hypercall.
>>>>>>
>>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>>>
>>>>> The xen_shim_domain() is only meant to handle the case where the backend
>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>>>> changes don't affect that case.
>>>>>
>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>>>> more or less like:
>>>>>
>>>>> <netback|blkback|scsiback>
>>>>>  gnttab_map_refs(map_ops, pages)
>>>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>      shim_hypercall()
>>>>>        shim_hcall_gntmap()
>>>>>
>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>>>> the shim determines its user doesn't want to map the page. Also, there's another
>>>>> issue where PV backends always need a struct page to reference the device
>>>>> inflight data as Ankur pointed out.
>>>>
>>>> Ultimately it's up to the Xen people.  It does make their API uglier,
>>>> especially the in/out change for the parameter.  If you can at least
>>>> avoid that, it would alleviate my concerns quite a bit.
>>>
>>> In my view, we have two options overall:
>>>
>>> 1) Make it explicit, the changes the PV drivers we have to make in
>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
>>> successfully, and inside this callback the PV driver may update it's tracking
>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
>>> shim_domain specific bits would be a little more abstracted from Xen PV
>>> backends. See netback example below the scissors mark. Or b) have sort of a
>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
>>> more explicit that there's no grant ops involved. The latter is more invasive.
>>>
>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
>>> hosting PV backends inside the guest. This would remove the Xen changes in this
>>> series completely. But it would require another guest being used
>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
>>> it would be equivalent to what does Xen with grants/events). The only changes in
>>> Linux Xen code is adding xenstored domain support, but that is useful on its own
>>> outside the scope of this work.
>>>
>>> I think there's value on both; 1) is probably more familiar for KVM users
>>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
>>> Xen disagregation capabilities in KVM.
>>>
>>> Thoughts? Xen maintainers what's your take on this?
>>
>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
>> abstraction layer for this kind of stuff. It should be passed to the
>> backends and those would pass it on to low-level Xen drivers (xenbus,
>> event channels, grant table, ...).
>>
> So if IIRC backends would use the xenhost layer to access grants or frames
> referenced by grants, and that would grok into some of this. IOW, you would have
> two implementors of xenhost: one for nested remote/local events+grants and
> another for this "shim domain" ?

As I'd need that for nested Xen I guess that would make it 3 variants.
Probably the xen-shim variant would need more hooks, but that should be
no problem.
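
To illustrate the kind of difference I expect (purely hypothetical, reusing
the made-up xenhost_ops sketched earlier in the thread; shim_gref_to_page()
is invented too): the xen-shim variant's grant hook could resolve a gref
straight to the backing page instead of issuing a real
GNTTABOP_map_grant_ref:

	static int shim_gnttab_map_refs(xenhost_t *xh,
					struct gnttab_map_grant_ref *ops,
					struct page **pages, unsigned int count)
	{
		unsigned int i;

		for (i = 0; i < count; i++) {
			/* look the granted frame up in the guest's memory */
			pages[i] = shim_gref_to_page(xh, ops[i].dom, ops[i].ref);
			if (!pages[i])
				return -EFAULT;
			ops[i].status = GNTST_okay;
		}

		return 0;
	}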

>> I was planning to do that (the xenhost_t * stuff) soon in order to add
>> support for nested Xen using PV devices (you need two Xenstores for that
>> as the nested dom0 is acting as Xen backend server, while using PV
>> frontends for accessing the "real" world outside).
>>
>> The xenhost_t should be used for:
>>
>> - accessing Xenstore
>> - issuing and receiving events
>> - doing hypercalls
>> - grant table operations
>>
> 
> In the text above, I sort of suggested a slice of this in 1.b) with a
> translate_gref() and put_gref() API -- to get the page from a gref. This was
> because of the flags|host_addr hurdle we described above wrt using grant
> maps/unmaps. Do you think some of the xenhost layer would be amenable to
> supporting this case?

I think so, yes.

> 
>> So exactly the kind of stuff you want to do, too.
>>
> Cool idea!

In the end you might make my life easier for nested Xen. :-)

Do you want to have a try with that idea or should I do that? I might be
able to start working on that in about a month.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-08 10:42             ` Juergen Gross
@ 2019-04-08 17:31               ` Joao Martins
  2019-04-08 17:31               ` [Xen-devel] " Joao Martins
  1 sibling, 0 replies; 43+ messages in thread
From: Joao Martins @ 2019-04-08 17:31 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 4/8/19 11:42 AM, Juergen Gross wrote:
> On 08/04/2019 12:36, Joao Martins wrote:
>> On 4/8/19 7:44 AM, Juergen Gross wrote:
>>> On 12/03/2019 18:14, Joao Martins wrote:
>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>>  2. PV Driver support (patches 17 - 39)
>>>>>>>>
>>>>>>>>  We start by redirecting hypercalls from the backend to routines
>>>>>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>>  table and interdomain events. Next, we add support for late
>>>>>>>>  initialization of xenbus, followed by implementing
>>>>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>>  which will setup a limited Xen environment. This uses the added
>>>>>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>>>>>  notifications (event channels).
>>>>>>>
>>>>>>> I am a bit worried by the last patches, they seem really brittle and
>>>>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>>>>> define a completely different hypercall.
>>>>>>>
>>>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>>>>
>>>>>> The xen_shim_domain() is only meant to handle the case where the backend
>>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>>>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>>>>> changes don't affect that case.
>>>>>>
>>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>>>>> more or less like:
>>>>>>
>>>>>> <netback|blkback|scsiback>
>>>>>>  gnttab_map_refs(map_ops, pages)
>>>>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>>      shim_hypercall()
>>>>>>        shim_hcall_gntmap()
>>>>>>
>>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>>>>> the shim determines its user doesn't want to map the page. Also, there's another
>>>>>> issue where PV backends always need a struct page to reference the device
>>>>>> inflight data as Ankur pointed out.
>>>>>
>>>>> Ultimately it's up to the Xen people.  It does make their API uglier,
>>>>> especially the in/out change for the parameter.  If you can at least
>>>>> avoid that, it would alleviate my concerns quite a bit.
>>>>
>>>> In my view, we have two options overall:
>>>>
>>>> 1) Make it explicit, the changes the PV drivers we have to make in
>>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
>>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
>>>> successfully, and inside this callback the PV driver may update it's tracking
>>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
>>>> shim_domain specific bits would be a little more abstracted from Xen PV
>>>> backends. See netback example below the scissors mark. Or b) have sort of a
>>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
>>>> more explicit that there's no grant ops involved. The latter is more invasive.
>>>>
>>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
>>>> hosting PV backends inside the guest. This would remove the Xen changes in this
>>>> series completely. But it would require another guest being used
>>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
>>>> it would be equivalent to what does Xen with grants/events). The only changes in
>>>> Linux Xen code is adding xenstored domain support, but that is useful on its own
>>>> outside the scope of this work.
>>>>
>>>> I think there's value on both; 1) is probably more familiar for KVM users
>>>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
>>>> Xen disagregation capabilities in KVM.
>>>>
>>>> Thoughts? Xen maintainers what's your take on this?
>>>
>>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
>>> abstraction layer for this kind of stuff. It should be passed to the
>>> backends and those would pass it on to low-level Xen drivers (xenbus,
>>> event channels, grant table, ...).
>>>
>> So if IIRC backends would use the xenhost layer to access grants or frames
>> referenced by grants, and that would grok into some of this. IOW, you would have
>> two implementors of xenhost: one for nested remote/local events+grants and
>> another for this "shim domain" ?
> 
> As I'd need that for nested Xen I guess that would make it 3 variants.
> Probably the xen-shim variant would need more hooks, but that should be
> no problem.
> 
I probably messed up in the short description but "nested remote/local
events+grants" was referring to nested Xen (FWIW remote meant L0 and local L1).
So maybe only 2 variants are needed?

>>> I was planning to do that (the xenhost_t * stuff) soon in order to add
>>> support for nested Xen using PV devices (you need two Xenstores for that
>>> as the nested dom0 is acting as Xen backend server, while using PV
>>> frontends for accessing the "real" world outside).
>>>
>>> The xenhost_t should be used for:
>>>
>>> - accessing Xenstore
>>> - issuing and receiving events
>>> - doing hypercalls
>>> - grant table operations
>>>
>>
>> In the text above, I sort of suggested a slice of this in 1.b) with a
>> translate_gref() and put_gref() API -- to get the page from a gref. This was
>> because of the flags|host_addr hurdle we described above wrt using grant
>> maps/unmaps. Do you think some of the xenhost layer would be amenable to
>> supporting this case?
> 
> I think so, yes.
> 
>>
>>> So exactly the kind of stuff you want to do, too.
>>>
>> Cool idea!
> 
> In the end you might make my life easier for nested Xen. :-)
> 
Hehe :)

> Do you want to have a try with that idea or should I do that? I might be
> able to start working on that in about a month.
> 
Ankur (CC'ed) will give it a shot, and should start a new thread on this
xenhost abstraction layer.

	Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-08 17:31               ` [Xen-devel] " Joao Martins
  2019-04-09  0:35                 ` Stefano Stabellini
@ 2019-04-09  0:35                 ` Stefano Stabellini
  2019-04-09  5:04                 ` [Xen-devel] " Juergen Gross
  2019-04-09  5:04                 ` Juergen Gross
  3 siblings, 0 replies; 43+ messages in thread
From: Stefano Stabellini @ 2019-04-09  0:35 UTC (permalink / raw)
  To: Joao Martins
  Cc: Juergen Gross, Artem_Mygaiev, Stefano Stabellini, xen-devel, kvm,
	Radim Krčmář, x86, linux-kernel, Ankur Arora,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Paolo Bonzini,
	Boris Ostrovsky, Thomas Gleixner

On Mon, 8 Apr 2019, Joao Martins wrote:
> On 4/8/19 11:42 AM, Juergen Gross wrote:
> > On 08/04/2019 12:36, Joao Martins wrote:
> >> On 4/8/19 7:44 AM, Juergen Gross wrote:
> >>> On 12/03/2019 18:14, Joao Martins wrote:
> >>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
> >>>>> On 21/02/19 12:45, Joao Martins wrote:
> >>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
> >>>>>>> On 20/02/19 21:15, Joao Martins wrote:
> >>>>>>>>  2. PV Driver support (patches 17 - 39)
> >>>>>>>>
> >>>>>>>>  We start by redirecting hypercalls from the backend to routines
> >>>>>>>>  which emulate the behaviour that PV backends expect i.e. grant
> >>>>>>>>  table and interdomain events. Next, we add support for late
> >>>>>>>>  initialization of xenbus, followed by implementing
> >>>>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
> >>>>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
> >>>>>>>>  which will setup a limited Xen environment. This uses the added
> >>>>>>>>  functionality of Xen specific shared memory (grant tables) and
> >>>>>>>>  notifications (event channels).
> >>>>>>>
> >>>>>>> I am a bit worried by the last patches, they seem really brittle and
> >>>>>>> prone to breakage.  I don't know Xen well enough to understand if the
> >>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
> >>>>>>> define a completely different hypercall.
> >>>>>>>
> >>>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
> >>>>>>
> >>>>>> The xen_shim_domain() is only meant to handle the case where the backend
> >>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
> >>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
> >>>>>> guest* maps and unmaps other guest memory, this is not applicable and these
> >>>>>> changes don't affect that case.
> >>>>>>
> >>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
> >>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
> >>>>>> more or less like:
> >>>>>>
> >>>>>> <netback|blkback|scsiback>
> >>>>>>  gnttab_map_refs(map_ops, pages)
> >>>>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
> >>>>>>      shim_hypercall()
> >>>>>>        shim_hcall_gntmap()
> >>>>>>
> >>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
> >>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
> >>>>>> the shim determines its user doesn't want to map the page. Also, there's another
> >>>>>> issue where PV backends always need a struct page to reference the device
> >>>>>> inflight data as Ankur pointed out.
> >>>>>
> >>>>> Ultimately it's up to the Xen people.  It does make their API uglier,
> >>>>> especially the in/out change for the parameter.  If you can at least
> >>>>> avoid that, it would alleviate my concerns quite a bit.
> >>>>
> >>>> In my view, we have two options overall:
> >>>>
> >>>> 1) Make it explicit, the changes the PV drivers we have to make in
> >>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
> >>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
> >>>> successfully, and inside this callback the PV driver may update it's tracking
> >>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
> >>>> shim_domain specific bits would be a little more abstracted from Xen PV
> >>>> backends. See netback example below the scissors mark. Or b) have sort of a
> >>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
> >>>> more explicit that there's no grant ops involved. The latter is more invasive.
> >>>>
> >>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
> >>>> hosting PV backends inside the guest. This would remove the Xen changes in this
> >>>> series completely. But it would require another guest being used
> >>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
> >>>> it would be equivalent to what does Xen with grants/events). The only changes in
> >>>> Linux Xen code is adding xenstored domain support, but that is useful on its own
> >>>> outside the scope of this work.
> >>>>
> >>>> I think there's value on both; 1) is probably more familiar for KVM users
> >>>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
> >>>> Xen disagregation capabilities in KVM.
> >>>>
> >>>> Thoughts? Xen maintainers what's your take on this?
> >>>
> >>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
> >>> abstraction layer for this kind of stuff. It should be passed to the
> >>> backends and those would pass it on to low-level Xen drivers (xenbus,
> >>> event channels, grant table, ...).
> >>>
> >> So if IIRC backends would use the xenhost layer to access grants or frames
> >> referenced by grants, and that would grok into some of this. IOW, you would have
> >> two implementors of xenhost: one for nested remote/local events+grants and
> >> another for this "shim domain" ?
> > 
> > As I'd need that for nested Xen I guess that would make it 3 variants.
> > Probably the xen-shim variant would need more hooks, but that should be
> > no problem.
> > 
> I probably messed up in the short description but "nested remote/local
> events+grants" was referring to nested Xen (FWIW remote meant L0 and local L1).
> So maybe only 2 variants are needed?
> 
> >>> I was planning to do that (the xenhost_t * stuff) soon in order to add
> >>> support for nested Xen using PV devices (you need two Xenstores for that
> >>> as the nested dom0 is acting as Xen backend server, while using PV
> >>> frontends for accessing the "real" world outside).
> >>>
> >>> The xenhost_t should be used for:
> >>>
> >>> - accessing Xenstore
> >>> - issuing and receiving events
> >>> - doing hypercalls
> >>> - grant table operations
> >>>
> >>
> >> In the text above, I sort of suggested a slice of this in 1.b) with a
> >> translate_gref() and put_gref() API -- to get the page from a gref. This was
> >> because of the flags|host_addr hurdle we described above wrt using grant
> >> maps/unmaps. Do you think some of the xenhost layer would be amenable to
> >> supporting this case?
> > 
> > I think so, yes.
> > 
> >>
> >>> So exactly the kind of stuff you want to do, too.
> >>>
> >> Cool idea!
> > 
> > In the end you might make my life easier for nested Xen. :-)
> > 
> Hehe :)
> 
> > Do you want to have a try with that idea or should I do that? I might be
> > able to start working on that in about a month.
> > 
> Ankur (CC'ed) will give it a shot, and should start a new thread on this
> xenhost abstraction layer.

If you are up for it, it would be great to write some documentation too.
We are starting to have decent docs for some PV protocols, describing a
specific PV interface, but we are lacking docs about the basic building
blocks to bring up PV drivers in general. They would be extremely
useful. Given that you have just done the work, you are in a great
position to write those docs. Even bad English would be fine; I am sure
somebody else could volunteer to clean up the language. Anything would
help :-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-08 17:31               ` [Xen-devel] " Joao Martins
                                   ` (2 preceding siblings ...)
  2019-04-09  5:04                 ` [Xen-devel] " Juergen Gross
@ 2019-04-09  5:04                 ` Juergen Gross
  3 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2019-04-09  5:04 UTC (permalink / raw)
  To: Joao Martins
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ankur Arora, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 08/04/2019 19:31, Joao Martins wrote:
> On 4/8/19 11:42 AM, Juergen Gross wrote:
>> On 08/04/2019 12:36, Joao Martins wrote:
>>> On 4/8/19 7:44 AM, Juergen Gross wrote:
>>>> On 12/03/2019 18:14, Joao Martins wrote:
>>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>>>  2. PV Driver support (patches 17 - 39)
>>>>>>>>>
>>>>>>>>>  We start by redirecting hypercalls from the backend to routines
>>>>>>>>>  which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>>>  table and interdomain events. Next, we add support for late
>>>>>>>>>  initialization of xenbus, followed by implementing
>>>>>>>>>  frontend/backend communication mechanisms (i.e. grant tables and
>>>>>>>>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>>>  which will setup a limited Xen environment. This uses the added
>>>>>>>>>  functionality of Xen specific shared memory (grant tables) and
>>>>>>>>>  notifications (event channels).
>>>>>>>>
>>>>>>>> I am a bit worried by the last patches, they seem really brittle and
>>>>>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>>>>>> define a completely different hypercall.
>>>>>>>>
>>>>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>>>>>
>>>>>>> The xen_shim_domain() is only meant to handle the case where the backend
>>>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>>>>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>>>>>> changes don't affect that case.
>>>>>>>
>>>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>>>>>> more or less like:
>>>>>>>
>>>>>>> <netback|blkback|scsiback>
>>>>>>>  gnttab_map_refs(map_ops, pages)
>>>>>>>    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>>>      shim_hypercall()
>>>>>>>        shim_hcall_gntmap()
>>>>>>>
>>>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>>>>>> the shim determines its user doesn't want to map the page. Also, there's another
>>>>>>> issue where PV backends always need a struct page to reference the device
>>>>>>> inflight data as Ankur pointed out.
>>>>>>
>>>>>> Ultimately it's up to the Xen people.  It does make their API uglier,
>>>>>> especially the in/out change for the parameter.  If you can at least
>>>>>> avoid that, it would alleviate my concerns quite a bit.
>>>>>
>>>>> In my view, we have two options overall:
>>>>>
>>>>> 1) Make it explicit, the changes the PV drivers we have to make in
>>>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
>>>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
>>>>> successfully, and inside this callback the PV driver may update it's tracking
>>>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
>>>>> shim_domain specific bits would be a little more abstracted from Xen PV
>>>>> backends. See netback example below the scissors mark. Or b) have sort of a
>>>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
>>>>> more explicit that there's no grant ops involved. The latter is more invasive.
>>>>>
>>>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
>>>>> hosting PV backends inside the guest. This would remove the Xen changes in this
>>>>> series completely. But it would require another guest being used
>>>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
>>>>> it would be equivalent to what does Xen with grants/events). The only changes in
>>>>> Linux Xen code is adding xenstored domain support, but that is useful on its own
>>>>> outside the scope of this work.
>>>>>
>>>>> I think there's value on both; 1) is probably more familiar for KVM users
>>>>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
>>>>> Xen disagregation capabilities in KVM.
>>>>>
>>>>> Thoughts? Xen maintainers what's your take on this?
>>>>
>>>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
>>>> abstraction layer for this kind of stuff. It should be passed to the
>>>> backends and those would pass it on to low-level Xen drivers (xenbus,
>>>> event channels, grant table, ...).
>>>>
>>> So if IIRC backends would use the xenhost layer to access grants or frames
>>> referenced by grants, and that would grok into some of this. IOW, you would have
>>> two implementors of xenhost: one for nested remote/local events+grants and
>>> another for this "shim domain" ?
>>
>> As I'd need that for nested Xen I guess that would make it 3 variants.
>> Probably the xen-shim variant would need more hooks, but that should be
>> no problem.
>>
> I probably messed up in the short description but "nested remote/local
> events+grants" was referring to nested Xen (FWIW remote meant L0 and local L1).
> So maybe only 2 variants are needed?

I need one xenhost variant for the "normal" case as today: talking to
the single hypervisor (or, in the nested case, to the L1 hypervisor).

Then I need a variant for the nested case talking to the L0 hypervisor.

And you need a variant talking to xen-shim.

The first two variants can be active in the same system in the case of
nested Xen: the backends of L2 dom0 are talking to the L1 hypervisor,
while its frontends are talking with the L0 hypervisor.
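Illustratively (every name below is invented, since no xenhost_t exists
yet), that could end up as distinct instances of one interface, of which at
most two are live in a given kernel:

struct xenhost;				/* opaque: one handle per "Xen" we talk to */

extern struct xenhost *xh_default;	/* the hypervisor we run on (L1 when nested) */
extern struct xenhost *xh_nested;	/* nested dom0 only: the L0 hypervisor */
extern struct xenhost *xh_shim;		/* KVM case: backed by shim_hypercall() */

/*
 * In nested Xen, xh_default and xh_nested coexist: the backends are handed
 * xh_default, while the frontends reaching the outside world use xh_nested.
 * Under the KVM shim only xh_shim would exist.
 */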

> 
>>>> I was planning to do that (the xenhost_t * stuff) soon in order to add
>>>> support for nested Xen using PV devices (you need two Xenstores for that
>>>> as the nested dom0 is acting as Xen backend server, while using PV
>>>> frontends for accessing the "real" world outside).
>>>>
>>>> The xenhost_t should be used for:
>>>>
>>>> - accessing Xenstore
>>>> - issuing and receiving events
>>>> - doing hypercalls
>>>> - grant table operations
>>>>
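A speculative sketch of what such a handle could carry, just to make the
list above concrete; none of these types or members exist, and the eventual
interface may of course look quite different.

#include <linux/types.h>
#include <xen/interface/grant_table.h>

struct page;
struct xenhost_ops;

typedef struct xenhost {
	const struct xenhost_ops *ops;
	void *priv;		/* native, nested-L0 or shim specific data */
} xenhost_t;

struct xenhost_ops {
	/* accessing Xenstore */
	int (*xs_read)(xenhost_t *xh, const char *path,
		       void **val, unsigned int *len);
	int (*xs_write)(xenhost_t *xh, const char *path, const char *val);

	/* issuing and receiving events */
	int (*evtchn_bind_interdomain)(xenhost_t *xh, u16 remote_domid,
				       u32 remote_port, u32 *local_port);
	void (*evtchn_notify)(xenhost_t *xh, u32 port);

	/* doing hypercalls */
	long (*hypercall)(xenhost_t *xh, unsigned int op, void *arg);

	/* grant table operations */
	int (*grant_map)(xenhost_t *xh, struct gnttab_map_grant_ref *ops,
			 struct page **pages, unsigned int count);
	int (*grant_unmap)(xenhost_t *xh, struct gnttab_unmap_grant_ref *ops,
			   struct page **pages, unsigned int count);
};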
>>>
>>> In the text above, I sort of suggested a slice of this on 1.b) with a
>>> translate_gref() and put_gref() API -- to get the page from a gref. This was
>>> because of the flags|host_addr hurdle we depicted above wrt to using using grant
>>> maps/unmaps. You think some of the xenhost layer would be ammenable to support
>>> this case?
>>
>> I think so, yes.
>>
>>>
>>>> So exactly the kind of stuff you want to do, too.
>>>>
>>> Cool idea!
>>
>> In the end you might make my life easier for nested Xen. :-)
>>
> Hehe :)
> 
>> Do you want to have a try with that idea or should I do that? I might be
>> able to start working on that in about a month.
>>
> Ankur (CC'ed) will give a shot at it, and should start a new thread on this
> xenhost abstraction layer.

Great, looking forward to it!


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Xen-devel] [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-09  0:35                 ` Stefano Stabellini
@ 2019-04-10  5:50                   ` Ankur Arora
  2019-04-10 20:45                     ` Stefano Stabellini
  2019-04-10 20:45                     ` Stefano Stabellini
  2019-04-10  5:50                   ` Ankur Arora
  1 sibling, 2 replies; 43+ messages in thread
From: Ankur Arora @ 2019-04-10  5:50 UTC (permalink / raw)
  To: Stefano Stabellini, Joao Martins
  Cc: Juergen Gross, Artem_Mygaiev, kvm, Radim Krčmář,
	x86, linux-kernel, Paolo Bonzini, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, xen-devel, Boris Ostrovsky, Thomas Gleixner

On 2019-04-08 5:35 p.m., Stefano Stabellini wrote:
> On Mon, 8 Apr 2019, Joao Martins wrote:
>> On 4/8/19 11:42 AM, Juergen Gross wrote:
>>> On 08/04/2019 12:36, Joao Martins wrote:
>>>> On 4/8/19 7:44 AM, Juergen Gross wrote:
>>>>> On 12/03/2019 18:14, Joao Martins wrote:
>>>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>>>>   2. PV Driver support (patches 17 - 39)
>>>>>>>>>>
>>>>>>>>>>   We start by redirecting hypercalls from the backend to routines
>>>>>>>>>>   which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>>>>   table and interdomain events. Next, we add support for late
>>>>>>>>>>   initialization of xenbus, followed by implementing
>>>>>>>>>>   frontend/backend communication mechanisms (i.e. grant tables and
>>>>>>>>>>   interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>>>>   which will setup a limited Xen environment. This uses the added
>>>>>>>>>>   functionality of Xen specific shared memory (grant tables) and
>>>>>>>>>>   notifications (event channels).
>>>>>>>>>
>>>>>>>>> I am a bit worried by the last patches, they seem really brittle and
>>>>>>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>>>>>>> define a completely different hypercall.
>>>>>>>>>
>>>>>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>>>>>>
>>>>>>>> The xen_shim_domain() is only meant to handle the case where the backend
>>>>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>>>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>>>>>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>>>>>>> changes don't affect that case.
>>>>>>>>
>>>>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>>>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>>>>>>> more or less like:
>>>>>>>>
>>>>>>>> <netback|blkback|scsiback>
>>>>>>>>   gnttab_map_refs(map_ops, pages)
>>>>>>>>     HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>>>>       shim_hypercall()
>>>>>>>>         shim_hcall_gntmap()
>>>>>>>>
>>>>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>>>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>>>>>>> the shim determines its user doesn't want to map the page. Also, there's another
>>>>>>>> issue where PV backends always need a struct page to reference the device
>>>>>>>> inflight data as Ankur pointed out.
>>>>>>>
>>>>>>> Ultimately it's up to the Xen people.  It does make their API uglier,
>>>>>>> especially the in/out change for the parameter.  If you can at least
>>>>>>> avoid that, it would alleviate my concerns quite a bit.
>>>>>>
>>>>>> In my view, we have two options overall:
>>>>>>
>>>>>> 1) Make it explicit, the changes the PV drivers we have to make in
>>>>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
>>>>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
>>>>>> successfully, and inside this callback the PV driver may update it's tracking
>>>>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
>>>>>> shim_domain specific bits would be a little more abstracted from Xen PV
>>>>>> backends. See netback example below the scissors mark. Or b) have sort of a
>>>>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
>>>>>> more explicit that there's no grant ops involved. The latter is more invasive.
>>>>>>
>>>>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
>>>>>> hosting PV backends inside the guest. This would remove the Xen changes in this
>>>>>> series completely. But it would require another guest being used
>>>>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
>>>>>> it would be equivalent to what does Xen with grants/events). The only changes in
>>>>>> Linux Xen code is adding xenstored domain support, but that is useful on its own
>>>>>> outside the scope of this work.
>>>>>>
>>>>>> I think there's value on both; 1) is probably more familiar for KVM users
>>>>>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
>>>>>> Xen disagregation capabilities in KVM.
>>>>>>
>>>>>> Thoughts? Xen maintainers what's your take on this?
>>>>>
>>>>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
>>>>> abstraction layer for this kind of stuff. It should be passed to the
>>>>> backends and those would pass it on to low-level Xen drivers (xenbus,
>>>>> event channels, grant table, ...).
>>>>>
>>>> So if IIRC backends would use the xenhost layer to access grants or frames
>>>> referenced by grants, and that would grok into some of this. IOW, you would have
>>>> two implementors of xenhost: one for nested remote/local events+grants and
>>>> another for this "shim domain" ?
>>>
>>> As I'd need that for nested Xen I guess that would make it 3 variants.
>>> Probably the xen-shim variant would need more hooks, but that should be
>>> no problem.
>>>
>> I probably messed up in the short description but "nested remote/local
>> events+grants" was referring to nested Xen (FWIW remote meant L0 and local L1).
>> So maybe only 2 variants are needed?
>>
>>>>> I was planning to do that (the xenhost_t * stuff) soon in order to add
>>>>> support for nested Xen using PV devices (you need two Xenstores for that
>>>>> as the nested dom0 is acting as Xen backend server, while using PV
>>>>> frontends for accessing the "real" world outside).
>>>>>
>>>>> The xenhost_t should be used for:
>>>>>
>>>>> - accessing Xenstore
>>>>> - issuing and receiving events
>>>>> - doing hypercalls
>>>>> - grant table operations
>>>>>
>>>>
>>>> In the text above, I sort of suggested a slice of this on 1.b) with a
>>>> translate_gref() and put_gref() API -- to get the page from a gref. This was
>>>> because of the flags|host_addr hurdle we depicted above wrt to using using grant
>>>> maps/unmaps. You think some of the xenhost layer would be ammenable to support
>>>> this case?
>>>
>>> I think so, yes.
>>>
>>>>
>>>>> So exactly the kind of stuff you want to do, too.
>>>>>
>>>> Cool idea!
>>>
>>> In the end you might make my life easier for nested Xen. :-)
>>>
>> Hehe :)
>>
>>> Do you want to have a try with that idea or should I do that? I might be
>>> able to start working on that in about a month.
>>>
>> Ankur (CC'ed) will give a shot at it, and should start a new thread on this
>> xenhost abstraction layer.
> 
> If you are up for it, it would be great to write some documentation too.
> We are starting to have decent docs for some PV protocols, describing a
> specific PV interface, but we are lacking docs about the basic building
> blocks to bring up PV drivers in general. They would be extremely
Agreed. These would be useful.

> useful. Given that you have just done the work, you are in a great
> position to write those docs. Even bad English would be fine, I am sure
> somebody else could volunteer to clean-up the language. Anything would
> help :-)
Can't make any promises on this yet, but I will see if I can convert the
notes I made into something useful for the community.


Ankur

> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Xen-devel] [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-09  5:04                 ` [Xen-devel] " Juergen Gross
  2019-04-10  6:55                   ` Ankur Arora
@ 2019-04-10  6:55                   ` Ankur Arora
  2019-04-10  7:14                     ` Juergen Gross
  2019-04-10  7:14                     ` Juergen Gross
  1 sibling, 2 replies; 43+ messages in thread
From: Ankur Arora @ 2019-04-10  6:55 UTC (permalink / raw)
  To: Juergen Gross, Joao Martins
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 2019-04-08 10:04 p.m., Juergen Gross wrote:
> On 08/04/2019 19:31, Joao Martins wrote:
>> On 4/8/19 11:42 AM, Juergen Gross wrote:
>>> On 08/04/2019 12:36, Joao Martins wrote:
>>>> On 4/8/19 7:44 AM, Juergen Gross wrote:
>>>>> On 12/03/2019 18:14, Joao Martins wrote:
>>>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>>>>   2. PV Driver support (patches 17 - 39)
>>>>>>>>>>
>>>>>>>>>>   We start by redirecting hypercalls from the backend to routines
>>>>>>>>>>   which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>>>>   table and interdomain events. Next, we add support for late
>>>>>>>>>>   initialization of xenbus, followed by implementing
>>>>>>>>>>   frontend/backend communication mechanisms (i.e. grant tables and
>>>>>>>>>>   interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>>>>   which will setup a limited Xen environment. This uses the added
>>>>>>>>>>   functionality of Xen specific shared memory (grant tables) and
>>>>>>>>>>   notifications (event channels).
>>>>>>>>>
>>>>>>>>> I am a bit worried by the last patches, they seem really brittle and
>>>>>>>>> prone to breakage.  I don't know Xen well enough to understand if the
>>>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to
>>>>>>>>> define a completely different hypercall.
>>>>>>>>>
>>>>>>>> I guess Ankur already answered this; so just to stack this on top of his comment.
>>>>>>>>
>>>>>>>> The xen_shim_domain() is only meant to handle the case where the backend
>>>>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work
>>>>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a
>>>>>>>> guest* maps and unmaps other guest memory, this is not applicable and these
>>>>>>>> changes don't affect that case.
>>>>>>>>
>>>>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't
>>>>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go
>>>>>>>> more or less like:
>>>>>>>>
>>>>>>>> <netback|blkback|scsiback>
>>>>>>>>   gnttab_map_refs(map_ops, pages)
>>>>>>>>     HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>>>>       shim_hypercall()
>>>>>>>>         shim_hcall_gntmap()
>>>>>>>>
>>>>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the
>>>>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how
>>>>>>>> the shim determines its user doesn't want to map the page. Also, there's another
>>>>>>>> issue where PV backends always need a struct page to reference the device
>>>>>>>> inflight data as Ankur pointed out.
>>>>>>>
>>>>>>> Ultimately it's up to the Xen people.  It does make their API uglier,
>>>>>>> especially the in/out change for the parameter.  If you can at least
>>>>>>> avoid that, it would alleviate my concerns quite a bit.
>>>>>>
>>>>>> In my view, we have two options overall:
>>>>>>
>>>>>> 1) Make it explicit, the changes the PV drivers we have to make in
>>>>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback
>>>>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up
>>>>>> successfully, and inside this callback the PV driver may update it's tracking
>>>>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all
>>>>>> shim_domain specific bits would be a little more abstracted from Xen PV
>>>>>> backends. See netback example below the scissors mark. Or b) have sort of a
>>>>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even
>>>>>> more explicit that there's no grant ops involved. The latter is more invasive.
>>>>>>
>>>>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow
>>>>>> hosting PV backends inside the guest. This would remove the Xen changes in this
>>>>>> series completely. But it would require another guest being used
>>>>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory,
>>>>>> it would be equivalent to what does Xen with grants/events). The only changes in
>>>>>> Linux Xen code is adding xenstored domain support, but that is useful on its own
>>>>>> outside the scope of this work.
>>>>>>
>>>>>> I think there's value on both; 1) is probably more familiar for KVM users
>>>>>> perhaps (as it is similar to what vhost does?) while  2) equates to implementing
>>>>>> Xen disagregation capabilities in KVM.
>>>>>>
>>>>>> Thoughts? Xen maintainers what's your take on this?
>>>>>
>>>>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an
>>>>> abstraction layer for this kind of stuff. It should be passed to the
>>>>> backends and those would pass it on to low-level Xen drivers (xenbus,
>>>>> event channels, grant table, ...).
>>>>>
>>>> So if IIRC backends would use the xenhost layer to access grants or frames
>>>> referenced by grants, and that would grok into some of this. IOW, you would have
>>>> two implementors of xenhost: one for nested remote/local events+grants and
>>>> another for this "shim domain" ?
>>>
>>> As I'd need that for nested Xen I guess that would make it 3 variants.
>>> Probably the xen-shim variant would need more hooks, but that should be
>>> no problem.
>>>
>> I probably messed up in the short description but "nested remote/local
>> events+grants" was referring to nested Xen (FWIW remote meant L0 and local L1).
>> So maybe only 2 variants are needed?
> 
> I need one xenhost variant for the "normal" case as today: talking to
> the single hypervisor (or in nested case: to the L1 hypervisor).
> 
> Then I need a variant for the nested case talking to L0 hypervisor.
> 
> And you need a variant talking to xen-shim.
> 
> The first two variants can be active in the same system in case of
> nested Xen: the backends of L2 dom0 are talking to L1 hypervisor,
> while its frontends are talking with L0 hypervisor.
Thanks, this is clarifying.

So, essentially, backend drivers with a xenhost_t handle communicate with
the low-level Xen drivers etc. through that same handle; however, if they
communicate with frontend drivers for accessing the "real" world, they
exclusively use the standard mechanisms (Linux interfaces or hypercalls)?

In this scenario, L2 dom0 xen-netback and L2 dom0 xen-netfront should
just be able to use Linux interfaces. But if the L2 dom0 xenbus-backend
needs to talk to the L2 dom0 xenbus-frontend, do you see them layered,
or are they still talking exclusively via the standard mechanisms?

Ankur

> 
>>
>>>>> I was planning to do that (the xenhost_t * stuff) soon in order to add
>>>>> support for nested Xen using PV devices (you need two Xenstores for that
>>>>> as the nested dom0 is acting as Xen backend server, while using PV
>>>>> frontends for accessing the "real" world outside).
>>>>>
>>>>> The xenhost_t should be used for:
>>>>>
>>>>> - accessing Xenstore
>>>>> - issuing and receiving events
>>>>> - doing hypercalls
>>>>> - grant table operations
>>>>>
>>>>
>>>> In the text above, I sort of suggested a slice of this on 1.b) with a
>>>> translate_gref() and put_gref() API -- to get the page from a gref. This was
>>>> because of the flags|host_addr hurdle we depicted above wrt to using using grant
>>>> maps/unmaps. You think some of the xenhost layer would be ammenable to support
>>>> this case?
>>>
>>> I think so, yes.
>>>
>>>>
>>>>> So exactly the kind of stuff you want to do, too.
>>>>>
>>>> Cool idea!
>>>
>>> In the end you might make my life easier for nested Xen. :-)
>>>
>> Hehe :)
>>
>>> Do you want to have a try with that idea or should I do that? I might be
>>> able to start working on that in about a month.
>>>
>> Ankur (CC'ed) will give a shot at it, and should start a new thread on this
>> xenhost abstraction layer.
> 
> Great, looking forward to it!
> 
> 
> Juergen
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-10  6:55                   ` [Xen-devel] " Ankur Arora
  2019-04-10  7:14                     ` Juergen Gross
@ 2019-04-10  7:14                     ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2019-04-10  7:14 UTC (permalink / raw)
  To: Ankur Arora, Joao Martins
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 10/04/2019 08:55, Ankur Arora wrote:
> On 2019-04-08 10:04 p.m., Juergen Gross wrote:
>> On 08/04/2019 19:31, Joao Martins wrote:
>>> On 4/8/19 11:42 AM, Juergen Gross wrote:
>>>> On 08/04/2019 12:36, Joao Martins wrote:
>>>>> On 4/8/19 7:44 AM, Juergen Gross wrote:
>>>>>> On 12/03/2019 18:14, Joao Martins wrote:
>>>>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>>>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>>>>>   2. PV Driver support (patches 17 - 39)
>>>>>>>>>>>
>>>>>>>>>>>   We start by redirecting hypercalls from the backend to
>>>>>>>>>>> routines
>>>>>>>>>>>   which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>>>>>   table and interdomain events. Next, we add support for late
>>>>>>>>>>>   initialization of xenbus, followed by implementing
>>>>>>>>>>>   frontend/backend communication mechanisms (i.e. grant
>>>>>>>>>>> tables and
>>>>>>>>>>>   interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>>>>>   which will setup a limited Xen environment. This uses the
>>>>>>>>>>> added
>>>>>>>>>>>   functionality of Xen specific shared memory (grant tables) and
>>>>>>>>>>>   notifications (event channels).
>>>>>>>>>>
>>>>>>>>>> I am a bit worried by the last patches, they seem really
>>>>>>>>>> brittle and
>>>>>>>>>> prone to breakage.  I don't know Xen well enough to understand
>>>>>>>>>> if the
>>>>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not,
>>>>>>>>>> you have to
>>>>>>>>>> define a completely different hypercall.
>>>>>>>>>>
>>>>>>>>> I guess Ankur already answered this; so just to stack this on
>>>>>>>>> top of his comment.
>>>>>>>>>
>>>>>>>>> The xen_shim_domain() is only meant to handle the case where
>>>>>>>>> the backend
>>>>>>>>> has/can-have full access to guest memory [i.e. netback and
>>>>>>>>> blkback would work
>>>>>>>>> with similar assumptions as vhost?]. For the normal case, where
>>>>>>>>> a backend *in a
>>>>>>>>> guest* maps and unmaps other guest memory, this is not
>>>>>>>>> applicable and these
>>>>>>>>> changes don't affect that case.
>>>>>>>>>
>>>>>>>>> IOW, the PV backend here sits on the hypervisor, and the
>>>>>>>>> hypercalls aren't
>>>>>>>>> actual hypercalls but rather invoking shim_hypercall(). The
>>>>>>>>> call chain would go
>>>>>>>>> more or less like:
>>>>>>>>>
>>>>>>>>> <netback|blkback|scsiback>
>>>>>>>>>   gnttab_map_refs(map_ops, pages)
>>>>>>>>>     HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>>>>>       shim_hypercall()
>>>>>>>>>         shim_hcall_gntmap()
>>>>>>>>>
>>>>>>>>> Our reasoning was that given we are already in KVM, why mapping
>>>>>>>>> a page if the
>>>>>>>>> user (i.e. the kernel PV backend) is himself? The lack of
>>>>>>>>> GNTMAP_host_map is how
>>>>>>>>> the shim determines its user doesn't want to map the page.
>>>>>>>>> Also, there's another
>>>>>>>>> issue where PV backends always need a struct page to reference
>>>>>>>>> the device
>>>>>>>>> inflight data as Ankur pointed out.
>>>>>>>>
>>>>>>>> Ultimately it's up to the Xen people.  It does make their API
>>>>>>>> uglier,
>>>>>>>> especially the in/out change for the parameter.  If you can at
>>>>>>>> least
>>>>>>>> avoid that, it would alleviate my concerns quite a bit.
>>>>>>>
>>>>>>> In my view, we have two options overall:
>>>>>>>
>>>>>>> 1) Make it explicit, the changes the PV drivers we have to make in
>>>>>>> order to support xen_shim_domain(). This could mean e.g. a) add a
>>>>>>> callback
>>>>>>> argument to gnttab_map_refs() that is invoked for every page that
>>>>>>> gets looked up
>>>>>>> successfully, and inside this callback the PV driver may update
>>>>>>> it's tracking
>>>>>>> page. Here we no longer have this in/out parameter in
>>>>>>> gnttab_map_refs, and all
>>>>>>> shim_domain specific bits would be a little more abstracted from
>>>>>>> Xen PV
>>>>>>> backends. See netback example below the scissors mark. Or b) have
>>>>>>> sort of a
>>>>>>> translate_gref() and put_gref() API that Xen PV drivers use which
>>>>>>> make it even
>>>>>>> more explicit that there's no grant ops involved. The latter is
>>>>>>> more invasive.
>>>>>>>
>>>>>>> 2) The second option is to support guest grant mapping/unmapping
>>>>>>> [*] to allow
>>>>>>> hosting PV backends inside the guest. This would remove the Xen
>>>>>>> changes in this
>>>>>>> series completely. But it would require another guest being used
>>>>>>> as netback/blkback/xenstored, and less performance than 1)
>>>>>>> (though, in theory,
>>>>>>> it would be equivalent to what does Xen with grants/events). The
>>>>>>> only changes in
>>>>>>> Linux Xen code is adding xenstored domain support, but that is
>>>>>>> useful on its own
>>>>>>> outside the scope of this work.
>>>>>>>
>>>>>>> I think there's value on both; 1) is probably more familiar for
>>>>>>> KVM users
>>>>>>> perhaps (as it is similar to what vhost does?) while  2) equates
>>>>>>> to implementing
>>>>>>> Xen disagregation capabilities in KVM.
>>>>>>>
>>>>>>> Thoughts? Xen maintainers what's your take on this?
>>>>>>
>>>>>> What I'd like best would be a new handle (e.g. xenhost_t *) used
>>>>>> as an
>>>>>> abstraction layer for this kind of stuff. It should be passed to the
>>>>>> backends and those would pass it on to low-level Xen drivers (xenbus,
>>>>>> event channels, grant table, ...).
>>>>>>
>>>>> So if IIRC backends would use the xenhost layer to access grants or
>>>>> frames
>>>>> referenced by grants, and that would grok into some of this. IOW,
>>>>> you would have
>>>>> two implementors of xenhost: one for nested remote/local
>>>>> events+grants and
>>>>> another for this "shim domain" ?
>>>>
>>>> As I'd need that for nested Xen I guess that would make it 3 variants.
>>>> Probably the xen-shim variant would need more hooks, but that should be
>>>> no problem.
>>>>
>>> I probably messed up in the short description but "nested remote/local
>>> events+grants" was referring to nested Xen (FWIW remote meant L0 and
>>> local L1).
>>> So maybe only 2 variants are needed?
>>
>> I need one xenhost variant for the "normal" case as today: talking to
>> the single hypervisor (or in nested case: to the L1 hypervisor).
>>
>> Then I need a variant for the nested case talking to L0 hypervisor.
>>
>> And you need a variant talking to xen-shim.
>>
>> The first two variants can be active in the same system in case of
>> nested Xen: the backends of L2 dom0 are talking to L1 hypervisor,
>> while its frontends are talking with L0 hypervisor.
> Thanks this is clarifying.
> 
> So, essentially backend drivers with a xenhost_t handle, communicate
> with Xen low-level drivers etc using the same handle, however, if they
> communicate with frontend drivers for accessing the "real" world,
> they exclusively use standard mechanisms (Linux or hypercalls)?

This should be opaque to the backends. The xenhost_t handle should have
a pointer to a function vector for the relevant grant-, event- and
Xenstore-related functions. Calls to such functions should be done via an
inline function taking the xenhost_t handle as one parameter; that
function will then call the correct implementation.
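As a tiny illustration of that calling convention (names again invented,
since none of this exists yet): the backend only ever calls an inline
wrapper, which dispatches through the handle's function vector, so the same
backend code works against the native, nested or xen-shim implementation.

struct page;
struct xenhost_ops;

typedef struct xenhost {
	const struct xenhost_ops *ops;
} xenhost_t;

struct xenhost_ops {
	int (*grant_map)(xenhost_t *xh, void *map_ops,
			 struct page **pages, unsigned int count);
	/* ... event channel, Xenstore and hypercall ops ... */
};

/* Backends see only this wrapper, never the implementation behind it. */
static inline int xenhost_grant_map(xenhost_t *xh, void *map_ops,
				    struct page **pages, unsigned int count)
{
	return xh->ops->grant_map(xh, map_ops, pages, count);
}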

> In this scenario L2 dom0 xen-netback and L2 dom0 xen-netfront should
> just be able to use Linux interfaces. But if L2 dom0 xenbus-backend
> needs to talk to L2 dom0 xenbus-frontend then do you see them layered
> or are they still exclusively talking via the standard mechanisms?

The distinction is made via the function vector in xenhost_t. So the
only change needed in the backends is the introduction of xenhost_t.

Whether we want to introduce xenhost_t in the frontends, too, is TBD.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Xen-devel] [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-10  6:55                   ` [Xen-devel] " Ankur Arora
@ 2019-04-10  7:14                     ` Juergen Gross
  2019-04-10  7:14                     ` Juergen Gross
  1 sibling, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2019-04-10  7:14 UTC (permalink / raw)
  To: Ankur Arora, Joao Martins
  Cc: Stefano Stabellini, xen-devel, kvm, Radim Krčmář,
	x86, linux-kernel, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Paolo Bonzini, Boris Ostrovsky, Thomas Gleixner

On 10/04/2019 08:55, Ankur Arora wrote:
> On 2019-04-08 10:04 p.m., Juergen Gross wrote:
>> On 08/04/2019 19:31, Joao Martins wrote:
>>> On 4/8/19 11:42 AM, Juergen Gross wrote:
>>>> On 08/04/2019 12:36, Joao Martins wrote:
>>>>> On 4/8/19 7:44 AM, Juergen Gross wrote:
>>>>>> On 12/03/2019 18:14, Joao Martins wrote:
>>>>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote:
>>>>>>>> On 21/02/19 12:45, Joao Martins wrote:
>>>>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote:
>>>>>>>>>> On 20/02/19 21:15, Joao Martins wrote:
>>>>>>>>>>>   2. PV Driver support (patches 17 - 39)
>>>>>>>>>>>
>>>>>>>>>>>   We start by redirecting hypercalls from the backend to
>>>>>>>>>>> routines
>>>>>>>>>>>   which emulate the behaviour that PV backends expect i.e. grant
>>>>>>>>>>>   table and interdomain events. Next, we add support for late
>>>>>>>>>>>   initialization of xenbus, followed by implementing
>>>>>>>>>>>   frontend/backend communication mechanisms (i.e. grant
>>>>>>>>>>> tables and
>>>>>>>>>>>   interdomain event channels). Finally, introduce xen-shim.ko,
>>>>>>>>>>>   which will setup a limited Xen environment. This uses the
>>>>>>>>>>> added
>>>>>>>>>>>   functionality of Xen specific shared memory (grant tables) and
>>>>>>>>>>>   notifications (event channels).
>>>>>>>>>>
>>>>>>>>>> I am a bit worried by the last patches, they seem really
>>>>>>>>>> brittle and
>>>>>>>>>> prone to breakage.  I don't know Xen well enough to understand
>>>>>>>>>> if the
>>>>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not,
>>>>>>>>>> you have to
>>>>>>>>>> define a completely different hypercall.
>>>>>>>>>>
>>>>>>>>> I guess Ankur already answered this; so just to stack this on
>>>>>>>>> top of his comment.
>>>>>>>>>
>>>>>>>>> The xen_shim_domain() is only meant to handle the case where
>>>>>>>>> the backend
>>>>>>>>> has/can-have full access to guest memory [i.e. netback and
>>>>>>>>> blkback would work
>>>>>>>>> with similar assumptions as vhost?]. For the normal case, where
>>>>>>>>> a backend *in a
>>>>>>>>> guest* maps and unmaps other guest memory, this is not
>>>>>>>>> applicable and these
>>>>>>>>> changes don't affect that case.
>>>>>>>>>
>>>>>>>>> IOW, the PV backend here sits on the hypervisor, and the
>>>>>>>>> hypercalls aren't
>>>>>>>>> actual hypercalls but rather invoking shim_hypercall(). The
>>>>>>>>> call chain would go
>>>>>>>>> more or less like:
>>>>>>>>>
>>>>>>>>> <netback|blkback|scsiback>
>>>>>>>>>   gnttab_map_refs(map_ops, pages)
>>>>>>>>>     HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
>>>>>>>>>       shim_hypercall()
>>>>>>>>>         shim_hcall_gntmap()
>>>>>>>>>
>>>>>>>>> Our reasoning was that given we are already in KVM, why mapping
>>>>>>>>> a page if the
>>>>>>>>> user (i.e. the kernel PV backend) is himself? The lack of
>>>>>>>>> GNTMAP_host_map is how
>>>>>>>>> the shim determines its user doesn't want to map the page.
>>>>>>>>> Also, there's another
>>>>>>>>> issue where PV backends always need a struct page to reference
>>>>>>>>> the device
>>>>>>>>> inflight data as Ankur pointed out.
>>>>>>>>
>>>>>>>> Ultimately it's up to the Xen people.  It does make their API
>>>>>>>> uglier,
>>>>>>>> especially the in/out change for the parameter.  If you can at
>>>>>>>> least
>>>>>>>> avoid that, it would alleviate my concerns quite a bit.
>>>>>>>
>>>>>>> In my view, we have two options overall:
>>>>>>>
>>>>>>> 1) Make explicit the changes we have to make to the PV drivers in
>>>>>>> order to support xen_shim_domain(). This could mean e.g. a) add a
>>>>>>> callback
>>>>>>> argument to gnttab_map_refs() that is invoked for every page that
>>>>>>> gets looked up
>>>>>>> successfully, and inside this callback the PV driver may update
>>>>>>> its tracking
>>>>>>> page. Here we no longer have this in/out parameter in
>>>>>>> gnttab_map_refs, and all
>>>>>>> shim_domain specific bits would be a little more abstracted from
>>>>>>> Xen PV
>>>>>>> backends. See netback example below the scissors mark. Or b) have
>>>>>>> sort of a
>>>>>>> translate_gref() and put_gref() API that Xen PV drivers use which
>>>>>>> make it even
>>>>>>> more explicit that there's no grant ops involved. The latter is
>>>>>>> more invasive.
>>>>>>>
>>>>>>> 2) The second option is to support guest grant mapping/unmapping
>>>>>>> [*] to allow
>>>>>>> hosting PV backends inside the guest. This would remove the Xen
>>>>>>> changes in this
>>>>>>> series completely. But it would require another guest being used
>>>>>>> as netback/blkback/xenstored, and would offer less performance than
>>>>>>> 1) (though, in theory,
>>>>>>> it would be equivalent to what Xen does with grants/events). The
>>>>>>> only change in
>>>>>>> Linux Xen code is adding xenstored domain support, but that is
>>>>>>> useful on its own
>>>>>>> outside the scope of this work.
>>>>>>>
>>>>>>> I think there's value in both; 1) is probably more familiar for
>>>>>>> KVM users
>>>>>>> perhaps (as it is similar to what vhost does?) while 2) equates
>>>>>>> to implementing
>>>>>>> Xen disaggregation capabilities in KVM.
>>>>>>>
>>>>>>> Thoughts? Xen maintainers what's your take on this?
>>>>>>
>>>>>> What I'd like best would be a new handle (e.g. xenhost_t *) used
>>>>>> as an
>>>>>> abstraction layer for this kind of stuff. It should be passed to the
>>>>>> backends and those would pass it on to low-level Xen drivers (xenbus,
>>>>>> event channels, grant table, ...).
>>>>>>
>>>>> So if IIRC backends would use the xenhost layer to access grants or
>>>>> frames
>>>>> referenced by grants, and that would grok into some of this. IOW,
>>>>> you would have
>>>>> two implementors of xenhost: one for nested remote/local
>>>>> events+grants and
>>>>> another for this "shim domain" ?
>>>>
>>>> As I'd need that for nested Xen I guess that would make it 3 variants.
>>>> Probably the xen-shim variant would need more hooks, but that should be
>>>> no problem.
>>>>
>>> I probably messed up in the short description but "nested remote/local
>>> events+grants" was referring to nested Xen (FWIW remote meant L0 and
>>> local L1).
>>> So maybe only 2 variants are needed?
>>
>> I need one xenhost variant for the "normal" case as today: talking to
>> the single hypervisor (or in nested case: to the L1 hypervisor).
>>
>> Then I need a variant for the nested case talking to L0 hypervisor.
>>
>> And you need a variant talking to xen-shim.
>>
>> The first two variants can be active in the same system in case of
>> nested Xen: the backends of L2 dom0 are talking to L1 hypervisor,
>> while its frontends are talking with L0 hypervisor.
> Thanks, this is clarifying.
> 
> So, essentially, backend drivers with a xenhost_t handle communicate
> with the low-level Xen drivers etc. using the same handle; however, if they
> communicate with frontend drivers for accessing the "real" world,
> they exclusively use standard mechanisms (Linux or hypercalls)?

This should be opaque to the backends. The xenhost_t handle should have
a pointer to a function vector for relevant grant-, event- and Xenstore-
related functions. Calls to such functions should be done via an inline
function with the xenhost_t handle being one parameter; that function
will then call the correct implementation.
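
As an illustration only -- every name here (xenhost_t, xenhost_ops,
xenhost_grant_map, ...) is hypothetical and not existing kernel API --
a minimal sketch of such a handle with a per-variant function vector
and an inline dispatch helper could look like:

struct xenhost;

struct xenhost_ops {
	/* one implementation per variant: native/L1, nested-L0, xen-shim */
	int (*grant_map)(struct xenhost *xh, unsigned int gref,
			 unsigned int otherend_id, void **addr);
	int (*grant_unmap)(struct xenhost *xh, void *addr);
	int (*evtchn_send)(struct xenhost *xh, unsigned int port);
	int (*xs_read)(struct xenhost *xh, const char *path,
		       char *buf, unsigned int len);
};

typedef struct xenhost {
	const struct xenhost_ops *ops;	/* filled in at init time */
} xenhost_t;

/* Backends only ever see the handle; dispatch picks the implementation. */
static inline int xenhost_grant_map(xenhost_t *xh, unsigned int gref,
				    unsigned int otherend_id, void **addr)
{
	return xh->ops->grant_map(xh, gref, otherend_id, addr);
}

Each variant would install its own xenhost_ops when it initializes, so a
backend never has to branch on xen_shim_domain() or the nesting level
itself.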

> In this scenario L2 dom0 xen-netback and L2 dom0 xen-netfront should
> just be able to use Linux interfaces. But if L2 dom0 xenbus-backend
> needs to talk to L2 dom0 xenbus-frontend then do you see them layered
> or are they still exclusively talking via the standard mechanisms?

The distinction is made via the function vector in xenhost_t. So the
only change needed in the backends is the introduction of xenhost_t.
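
Building on the sketch above (again, hypothetical names only), a backend
call site would then look roughly like this, with the same code path no
matter whether the handle is backed by the native hypervisor, the nested
L0 variant or xen-shim:

/* Backend-side fragment; dev->xh would be set up during device probe. */
struct pv_backend_dev {
	xenhost_t *xh;
};

static int backend_map_ring(struct pv_backend_dev *dev,
			    unsigned int ring_gref,
			    unsigned int otherend_id, void **ring)
{
	/* The variant-specific details stay behind the xenhost handle. */
	return xenhost_grant_map(dev->xh, ring_gref, otherend_id, ring);
}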

Whether we want to introduce xenhost_t in frontends, too, is TBD.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH RFC 00/39] x86/KVM: Xen HVM guest support
  2019-04-10  5:50                   ` Ankur Arora
  2019-04-10 20:45                     ` Stefano Stabellini
@ 2019-04-10 20:45                     ` Stefano Stabellini
  1 sibling, 0 replies; 43+ messages in thread
From: Stefano Stabellini @ 2019-04-10 20:45 UTC (permalink / raw)
  To: Ankur Arora
  Cc: Juergen Gross, Artem_Mygaiev, Stefano Stabellini, Thomas Gleixner,
	kvm, Radim Krčmář, x86, linux-kernel,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	xen-devel, Joao Martins, Boris Ostrovsky

On Tue, 9 Apr 2019, Ankur Arora wrote:
> On 2019-04-08 5:35 p.m., Stefano Stabellini wrote:
> > On Mon, 8 Apr 2019, Joao Martins wrote:
> > > On 4/8/19 11:42 AM, Juergen Gross wrote:
> > > > On 08/04/2019 12:36, Joao Martins wrote:
> > > > > On 4/8/19 7:44 AM, Juergen Gross wrote:
> > > > > > On 12/03/2019 18:14, Joao Martins wrote:
> > > > > > > On 2/22/19 4:59 PM, Paolo Bonzini wrote:
> > > > > > > > On 21/02/19 12:45, Joao Martins wrote:
> > > > > > > > > On 2/20/19 9:09 PM, Paolo Bonzini wrote:
> > > > > > > > > > On 20/02/19 21:15, Joao Martins wrote:
> > > > > > > > > > >   2. PV Driver support (patches 17 - 39)
> > > > > > > > > > > 
> > > > > > > > > > >   We start by redirecting hypercalls from the backend to
> > > > > > > > > > > routines
> > > > > > > > > > >   which emulate the behaviour that PV backends expect i.e.
> > > > > > > > > > > grant
> > > > > > > > > > >   table and interdomain events. Next, we add support for
> > > > > > > > > > > late
> > > > > > > > > > >   initialization of xenbus, followed by implementing
> > > > > > > > > > >   frontend/backend communication mechanisms (i.e. grant
> > > > > > > > > > > tables and
> > > > > > > > > > >   interdomain event channels). Finally, introduce
> > > > > > > > > > > xen-shim.ko,
> > > > > > > > > > >   which will set up a limited Xen environment. This uses
> > > > > > > > > > > the added
> > > > > > > > > > >   functionality of Xen specific shared memory (grant
> > > > > > > > > > > tables) and
> > > > > > > > > > >   notifications (event channels).
> > > > > > > > > > 
> > > > > > > > > > I am a bit worried by the last patches, they seem really
> > > > > > > > > > brittle and
> > > > > > > > > > prone to breakage.  I don't know Xen well enough to
> > > > > > > > > > understand if the
> > > > > > > > > > lack of support for GNTMAP_host_map is fixable, but if not,
> > > > > > > > > > you have to
> > > > > > > > > > define a completely different hypercall.
> > > > > > > > > > 
> > > > > > > > > I guess Ankur already answered this; so just to stack this on
> > > > > > > > > top of his comment.
> > > > > > > > > 
> > > > > > > > > The xen_shim_domain() is only meant to handle the case where
> > > > > > > > > the backend
> > > > > > > > > has/can-have full access to guest memory [i.e. netback and
> > > > > > > > > blkback would work
> > > > > > > > > with similar assumptions as vhost?]. For the normal case,
> > > > > > > > > where a backend *in a
> > > > > > > > > guest* maps and unmaps other guest memory, this is not
> > > > > > > > > applicable and these
> > > > > > > > > changes don't affect that case.
> > > > > > > > > 
> > > > > > > > > IOW, the PV backend here sits on the hypervisor, and the
> > > > > > > > > hypercalls aren't
> > > > > > > > > actual hypercalls but rather invoking shim_hypercall(). The
> > > > > > > > > call chain would go
> > > > > > > > > more or less like:
> > > > > > > > > 
> > > > > > > > > <netback|blkback|scsiback>
> > > > > > > > >   gnttab_map_refs(map_ops, pages)
> > > > > > > > >     HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...)
> > > > > > > > >       shim_hypercall()
> > > > > > > > >         shim_hcall_gntmap()
> > > > > > > > > 
> > > > > > > > > Our reasoning was that given we are already in KVM, why
> > > > > > > > > map a page if the
> > > > > > > > > user (i.e. the kernel PV backend) is the kernel itself? The lack of
> > > > > > > > > GNTMAP_host_map is how
> > > > > > > > > the shim determines its user doesn't want to map the page.
> > > > > > > > > Also, there's another
> > > > > > > > > issue where PV backends always need a struct page to reference
> > > > > > > > > the device
> > > > > > > > > inflight data as Ankur pointed out.
> > > > > > > > 
> > > > > > > > Ultimately it's up to the Xen people.  It does make their API
> > > > > > > > uglier,
> > > > > > > > especially the in/out change for the parameter.  If you can at
> > > > > > > > least
> > > > > > > > avoid that, it would alleviate my concerns quite a bit.
> > > > > > > 
> > > > > > > In my view, we have two options overall:
> > > > > > > 
> > > > > > > 1) Make explicit the changes we have to make to the PV drivers in
> > > > > > > order to support xen_shim_domain(). This could mean e.g. a) add a
> > > > > > > callback
> > > > > > > argument to gnttab_map_refs() that is invoked for every page that
> > > > > > > gets looked up
> > > > > > > successfully, and inside this callback the PV driver may update
> > > > > > > its tracking
> > > > > > > page. Here we no longer have this in/out parameter in
> > > > > > > gnttab_map_refs, and all
> > > > > > > shim_domain specific bits would be a little more abstracted from
> > > > > > > Xen PV
> > > > > > > backends. See netback example below the scissors mark. Or b) have
> > > > > > > sort of a
> > > > > > > translate_gref() and put_gref() API that Xen PV drivers use which
> > > > > > > make it even
> > > > > > > more explicit that there's no grant ops involved. The latter is
> > > > > > > more invasive.
> > > > > > > 
> > > > > > > 2) The second option is to support guest grant mapping/unmapping
> > > > > > > [*] to allow
> > > > > > > hosting PV backends inside the guest. This would remove the Xen
> > > > > > > changes in this
> > > > > > > series completely. But it would require another guest being used
> > > > > > > as netback/blkback/xenstored, and would offer less performance
> > > > > > > than 1) (though, in theory,
> > > > > > > it would be equivalent to what Xen does with grants/events). The
> > > > > > > only change in
> > > > > > > Linux Xen code is adding xenstored domain support, but that is
> > > > > > > useful on its own
> > > > > > > outside the scope of this work.
> > > > > > > 
> > > > > > > I think there's value in both; 1) is probably more familiar for
> > > > > > > KVM users
> > > > > > > perhaps (as it is similar to what vhost does?) while 2) equates
> > > > > > > to implementing
> > > > > > > Xen disaggregation capabilities in KVM.
> > > > > > > 
> > > > > > > Thoughts? Xen maintainers what's your take on this?
> > > > > > 
> > > > > > What I'd like best would be a new handle (e.g. xenhost_t *) used as
> > > > > > an
> > > > > > abstraction layer for this kind of stuff. It should be passed to the
> > > > > > backends and those would pass it on to low-level Xen drivers
> > > > > > (xenbus,
> > > > > > event channels, grant table, ...).
> > > > > > 
> > > > > So if IIRC backends would use the xenhost layer to access grants or
> > > > > frames
> > > > > referenced by grants, and that would grok into some of this. IOW, you
> > > > > would have
> > > > > two implementors of xenhost: one for nested remote/local events+grants
> > > > > and
> > > > > another for this "shim domain" ?
> > > > 
> > > > As I'd need that for nested Xen I guess that would make it 3 variants.
> > > > Probably the xen-shim variant would need more hooks, but that should be
> > > > no problem.
> > > > 
> > > I probably messed up in the short description but "nested remote/local
> > > events+grants" was referring to nested Xen (FWIW remote meant L0 and local
> > > L1).
> > > So maybe only 2 variants are needed?
> > > 
> > > > > > I was planning to do that (the xenhost_t * stuff) soon in order to
> > > > > > add
> > > > > > support for nested Xen using PV devices (you need two Xenstores for
> > > > > > that
> > > > > > as the nested dom0 is acting as Xen backend server, while using PV
> > > > > > frontends for accessing the "real" world outside).
> > > > > > 
> > > > > > The xenhost_t should be used for:
> > > > > > 
> > > > > > - accessing Xenstore
> > > > > > - issuing and receiving events
> > > > > > - doing hypercalls
> > > > > > - grant table operations
> > > > > > 
> > > > > 
> > > > > In the text above, I sort of suggested a slice of this on 1.b) with a
> > > > > translate_gref() and put_gref() API -- to get the page from a gref.
> > > > > This was
> > > > > because of the flags|host_addr hurdle we depicted above wrt using
> > > > > grant
> > > > > maps/unmaps. Do you think some of the xenhost layer would be amenable
> > > > > to supporting
> > > > > this case?
> > > > 
> > > > I think so, yes.
> > > > 
> > > > > 
> > > > > > So exactly the kind of stuff you want to do, too.
> > > > > > 
> > > > > Cool idea!
> > > > 
> > > > In the end you might make my life easier for nested Xen. :-)
> > > > 
> > > Hehe :)
> > > 
> > > > Do you want to have a try with that idea or should I do that? I might be
> > > > able to start working on that in about a month.
> > > > 
> > > Ankur (CC'ed) will give a shot at it, and should start a new thread on
> > > this
> > > xenhost abstraction layer.
> > 
> > If you are up for it, it would be great to write some documentation too.
> > We are starting to have decent docs for some PV protocols, describing a
> > specific PV interface, but we are lacking docs about the basic building
> > blocks to bring up PV drivers in general. They would be extremely
> Agreed. These would be useful.
> 
> > useful. Given that you have just done the work, you are in a great
> > position to write those docs. Even bad English would be fine, I am sure
> > somebody else could volunteer to clean up the language. Anything would
> > help :-)
> Can't make any promises on this yet but I will see if I can convert
> notes I made into something useful for the community.

Thank you!

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-04-10 20:46 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190220201609.28290-1-joao.m.martins@oracle.com>
2019-02-20 20:15 ` [PATCH RFC 17/39] x86/xen: export vcpu_info and shared_info Joao Martins
2019-02-20 20:15 ` [PATCH RFC 18/39] x86/xen: make hypercall_page generic Joao Martins
2019-02-20 20:15 ` [PATCH RFC 19/39] xen/xenbus: xenbus uninit support Joao Martins
2019-02-20 20:15 ` [PATCH RFC 20/39] xen-blkback: module_exit support Joao Martins
2019-02-20 20:16 ` [PATCH RFC 31/39] xen-shim: introduce shim domain driver Joao Martins
2019-02-20 20:16 ` [PATCH RFC 32/39] xen/balloon: xen_shim_domain() support Joao Martins
2019-02-20 20:16 ` [PATCH RFC 33/39] xen/grant-table: " Joao Martins
2019-02-20 20:16 ` [PATCH RFC 34/39] xen/gntdev: " Joao Martins
2019-02-20 20:16 ` [PATCH RFC 35/39] xen/xenbus: " Joao Martins
2019-02-20 20:16 ` [PATCH RFC 36/39] drivers/xen: " Joao Martins
2019-02-20 20:16 ` [PATCH RFC 38/39] xen-blkback: " Joao Martins
2019-02-20 21:09 ` [PATCH RFC 00/39] x86/KVM: Xen HVM guest support Paolo Bonzini
2019-02-20 23:39 ` Marek Marczykowski-Górecki
     [not found] ` <35051310-c497-8ad5-4434-1b8426a317d2@redhat.com>
2019-02-21  0:29   ` Ankur Arora
2019-02-21 11:45   ` Joao Martins
     [not found]   ` <8b1f4912-4f92-69ae-ae01-d899d5640572@oracle.com>
2019-02-22 16:59     ` Paolo Bonzini
     [not found]     ` <3ee91f33-2973-c2db-386f-afbf138081b4@redhat.com>
2019-03-12 17:14       ` Joao Martins
2019-04-08  6:44         ` Juergen Gross
2019-04-08  6:44         ` [Xen-devel] " Juergen Gross
2019-04-08 10:36           ` Joao Martins
2019-04-08 10:36           ` [Xen-devel] " Joao Martins
2019-04-08 10:42             ` Juergen Gross
2019-04-08 17:31               ` Joao Martins
2019-04-08 17:31               ` [Xen-devel] " Joao Martins
2019-04-09  0:35                 ` Stefano Stabellini
2019-04-10  5:50                   ` Ankur Arora
2019-04-10 20:45                     ` Stefano Stabellini
2019-04-10 20:45                     ` Stefano Stabellini
2019-04-10  5:50                   ` Ankur Arora
2019-04-09  0:35                 ` Stefano Stabellini
2019-04-09  5:04                 ` [Xen-devel] " Juergen Gross
2019-04-10  6:55                   ` Ankur Arora
2019-04-10  6:55                   ` [Xen-devel] " Ankur Arora
2019-04-10  7:14                     ` Juergen Gross
2019-04-10  7:14                     ` Juergen Gross
2019-04-09  5:04                 ` Juergen Gross
2019-04-08 10:42             ` Juergen Gross
     [not found] ` <20190220233941.GA5279@mail-itl>
2019-02-21  0:31   ` Ankur Arora
2019-02-21  7:57   ` Juergen Gross
2019-02-21 11:55   ` Joao Martins
     [not found]   ` <58ff93e1-6c91-c1a6-4273-531c28101569@suse.com>
2019-02-21 12:00     ` Joao Martins
     [not found] ` <20190220201609.28290-21-joao.m.martins@oracle.com>
2019-02-25 18:57   ` [PATCH RFC 20/39] xen-blkback: module_exit support Konrad Rzeszutek Wilk
     [not found]   ` <20190225185719.GA16013@char.us.oracle.com>
2019-02-26 11:20     ` Joao Martins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).