[PATCH v12] Linux Xen PVH support.

* [PATCH v12] Linux Xen PVH support.
@ 2014-01-01  4:35 Konrad Rzeszutek Wilk
  2014-01-01  4:35 ` [PATCH v12 01/18] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
                   ` (19 more replies)
  0 siblings, 20 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

The patches, also available at
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v12

implement the necessary functionality to boot a PV guest in PVH mode.

This blog has a great description of what PVH is:
http://blog.xen.org/index.php/2012/10/31/the-paravirtualization-spectrum-part-2-from-poles-to-a-spectrum/

These patches are based on v3.13-rc6.

Changelog of v12: [http://mid.gmane.org/1387313503-31362-1-git-send-email-konrad.wilk@oracle.com]
 - Rework per Stefano's review.
 - Split some patches up for easier review.
 - Bugs fixed.

Changelog of v11 as compared to v10: [https://lkml.org/lkml/2013/12/12/625]:
 - Split patches in a more logical sense, squash some
 - Dropped Acked-by's from folks
 - Fleshed out descriptions

Regression-wise, there are no bugs with Xen 4.2 and Xen 4.3.

That is, whether you compile/boot it with CONFIG_XEN_PVH=y or
"# CONFIG_XEN_PVH is not set", there are no bugs as either dom0 or
domU. I also launched it as a 32/64-bit dom0 with a 32/64-bit domU as
PV or PVHVM, along with SLES11, SLES12, F15->F19 (32/64), OL5, OL6,
RHEL5 (32/64), FreeBSD HVM and NetBSD PV, without issues.

With Xen 4.1 there is a regression (see
http://mid.gmane.org/20131220175735.GA619@phenom.dumpdata.com)
and it is unclear at this time what the right way is to fix the PVH ABI
to work around it. When that has been cleared up, some of the patches:

 [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2).
 [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
 [PATCH v12 07/18] xen/pvh: Setup up shared_info.
 [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2)
 [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware

will have to be reworked.

The only things needed to make this work as PVH are:

 0) Get the latest version of Xen and compile/install it.
    See http://wiki.xen.org/wiki/Compiling_Xen_From_Source for details

 1) Clone above mentioned tree

    See http://wiki.xenproject.org/wiki/Mainline_Linux_Kernel_Configs#Configuring_the_Kernel
    for details. The steps are:

	cd $HOME
	git clone  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git linux
	cd linux
	git checkout origin/devel/pvh.v12

 2) Compile with CONFIG_XEN_PVH=y

    a) From scratch:

	make defconfig
	make menuconfig
	Processor type and features  --->  Linux guest support  --->
		 Paravirtualization layer for spinlocks
		 Xen guest support	(which will now show you:)
		 Support for running as a PVH guest (NEW)

	If you prefer to edit .config directly, the options are:

	CONFIG_HYPERVISOR_GUEST=y
	CONFIG_PARAVIRT=y
	CONFIG_PARAVIRT_GUEST=y
	CONFIG_PARAVIRT_SPINLOCKS=y
	CONFIG_XEN=y
	CONFIG_XEN_PVH=y

	You will also have to enable the block, network drivers, console, etc
	which are in different submenus.

    b) Based on your current distro:

	cp /boot/config-`uname -r` $HOME/linux/.config
	make menuconfig
	Processor type and features  --->  Linux guest support  --->
		 Support for running as a PVH guest (NEW)

 3) Launch it with 'pvh=1' in your guest config (for example):

	extra="console=hvc0 debug  kgdboc=hvc0 nokgdbroundup  initcall_debug debug"
	kernel="/mnt/lab/latest/vmlinuz"
	ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
	memory=1024
	vcpus=4
	name="pvh"
	vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
	vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
	disk=['phy:/dev/sdb1,xvda,w']
	pvh=1
	on_reboot="preserve"
	on_crash="preserve"
	on_poweroff="preserve"

    using 'xl'. Xend 'xm' does not have PVH support.

It will boot up as a normal PV guest, but 'xen-detect' will report it as an HVM
guest.

The functionality that is turned off is:
 - VCPU hotplug. You can try it but it should not allow you to do it.
   So 'echo 0 > /sys/bus/cpu/devices/cpu4/online' will error out.


Items that have not been tested extensively or at all:
  - Migration (xl save && xl restore for example).

  - 32-bit guests (won't even present you with a CONFIG_XEN_PVH option)

  - PCI passthrough

  - Running it in dom0 mode (as the patches for that are not yet in Xen upstream).
    If you want to try that, you can merge/pull Mukesh's branch:

	cd $HOME/xen
	git pull git://oss.oracle.com/git/mrathor/xen.git dom0pvh-v6

    .. and use this bootup parameter ("dom0pvh=1"). Remember to recompile
    and install the new version of Xen. This patchset
    does not contain the patches necessary to set up guests - but I can
    create one easily enough.

  - Memory ballooning

  - Multiple VBDs, NICs, etc.

If you encounter errors, please email with the following (please note that the
guest config above has on_reboot="preserve" and on_crash="preserve", which you
should have in your guest config to preserve the memory of the guest):

 a) xl dmesg
 b) xl list
 c) xenctx -s $HOME/linux/System.map -f -a -C <domain id>
    [xenctx is sometimes found in  /usr/lib/xen/bin/xenctx ]
 d) the console output from the guest
 e) Anything else you can think of.

Stash away your vmlinux file (it is too big to send via email) - as I might
need it later on.


That is it!

Thank you!

 arch/arm/xen/enlighten.c           |   9 +-
 arch/x86/include/asm/xen/page.h    |   7 +-
 arch/x86/xen/Kconfig               |   8 ++
 arch/x86/xen/enlighten.c           | 115 ++++++++++++++++++++------
 arch/x86/xen/grant-table.c         |  64 +++++++++++++++
 arch/x86/xen/irq.c                 |   5 +-
 arch/x86/xen/mmu.c                 | 164 ++++++++++++++++++++++---------------
 arch/x86/xen/p2m.c                 |  15 +++-
 arch/x86/xen/setup.c               |  41 ++++++++--
 arch/x86/xen/smp.c                 |  49 +++++++----
 arch/x86/xen/xen-head.S            |   8 +-
 arch/x86/xen/xen-ops.h             |   1 +
 drivers/xen/cpu_hotplug.c          |   4 +-
 drivers/xen/events.c               |  16 ++--
 drivers/xen/gntdev.c               |   2 +-
 drivers/xen/grant-table.c          |  76 ++++++++++++-----
 drivers/xen/platform-pci.c         |  10 ++-
 drivers/xen/xenbus/xenbus_client.c |   3 +-
 include/xen/grant_table.h          |   9 +-
 include/xen/xen.h                  |  14 ++++
 20 files changed, 462 insertions(+), 158 deletions(-)

Konrad Rzeszutek Wilk (6):
      xen/pvh: Don't setup P2M tree.
      xen/mmu/p2m: Refactor the xen_pagetable_init code.
      xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
      xen/grant-table: Refactor gnttab_init
      xen/grant: Implement an grant frame array struct.
      xen/pvh: Piggyback on PVHVM for grant driver (v2)

Mukesh Rathor (12):
      xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
      xen/pvh/x86: Define what an PVH guest is (v2).
      xen/pvh: Early bootup changes in PV code (v2).
      xen/pvh: MMU changes for PVH (v2)
      xen/pvh: Setup up shared_info.
      xen/pvh: Load GDT/GS in early PV bootup code for BSP.
      xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
      xen/pvh: Update E820 to work with PVH (v2)
      xen/pvh: Piggyback on PVHVM for event channels (v2)
      xen/pvh: Piggyback on PVHVM XenBus.
      xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2)
      xen/pvh: Support ParaVirtualized Hardware extensions (v2).



* [PATCH v12 01/18] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-01  4:35 ` [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2) Konrad Rzeszutek Wilk
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

Most of the functions in page.h are prefaced with
	if (xen_feature(XENFEAT_auto_translated_physmap))
		return mfn;

The exception is mfn_to_local_pfn. At first sight, the function
should work without this patch, as 'mfn_to_pfn' has
a similar check. But there is no such check in the
'get_phys_to_machine' function - so we would crash in there.

This fixes it by following the convention of having the
check for auto-xlat in these static functions.
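
For illustration only, here is a minimal userspace sketch of the
convention described above. Every demo_* name and the tiny lookup table
are hypothetical stand-ins, not the kernel's definitions; the flag
models xen_feature(XENFEAT_auto_translated_physmap).

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
#define DEMO_NR_PAGES	4
#define DEMO_INVALID	(~0UL)

/* Models xen_feature(XENFEAT_auto_translated_physmap). */
static bool demo_auto_translated;

/* Toy P2M: index is the pfn, value is the mfn. */
static unsigned long demo_p2m[DEMO_NR_PAGES] = { 7, 5, 6, 4 };

static unsigned long demo_get_phys_to_machine(unsigned long pfn)
{
	/* No auto-xlat check here, mirroring the gap the patch describes. */
	return pfn < DEMO_NR_PAGES ? demo_p2m[pfn] : DEMO_INVALID;
}

static unsigned long demo_mfn_to_pfn(unsigned long mfn)
{
	unsigned long pfn;

	if (demo_auto_translated)
		return mfn;

	for (pfn = 0; pfn < DEMO_NR_PAGES; pfn++)
		if (demo_p2m[pfn] == mfn)
			return pfn;
	return DEMO_INVALID;
}

static unsigned long demo_mfn_to_local_pfn(unsigned long mfn)
{
	unsigned long pfn;

	/* The early return keeps auto-xlat guests away from the P2M lookup. */
	if (demo_auto_translated)
		return mfn;

	pfn = demo_mfn_to_pfn(mfn);
	if (demo_get_phys_to_machine(pfn) != mfn)
		return DEMO_INVALID;
	return pfn;
}
```

In this toy model the P2M lookup is harmless; in the kernel an auto-xlat
guest has no P2M to consult, which is why the early return matters.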

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/include/asm/xen/page.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index b913915..4a092cc 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -167,7 +167,12 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
  */
 static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 {
-	unsigned long pfn = mfn_to_pfn(mfn);
+	unsigned long pfn;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return mfn;
+
+	pfn = mfn_to_pfn(mfn);
 	if (get_phys_to_machine(pfn) != mfn)
 		return -1; /* force !pfn_valid() */
 	return pfn;
-- 
1.8.3.1



* [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2).
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
  2014-01-01  4:35 ` [PATCH v12 01/18] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:13   ` David Vrabel
  2014-01-01  4:35 ` [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2) Konrad Rzeszutek Wilk
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

A PVH guest is a PV guest with auto page translation enabled
and with vector callback. It is a cross between PVHVM and PV.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

We do not yet have a Kconfig entry set up, as we do not
have all the parts ready for it - so we piggyback
on the PVHVM config option. This scaffolding will
be removed later.

Note that on ARM the concept of PVH is non-existent. As Ian
put it: "an ARM guest is neither PV nor HVM nor PVHVM.
It's a bit like PVH but is different also (it's further towards
the H end of the spectrum than even PVH).". As such these
options (PVHVM, PVH) are never enabled nor seen on ARM
compilations.
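
As a rough userspace sketch of the definition above (the struct and the
demo_* names are hypothetical, chosen only for this example), the PVH
test is a conjunction of three properties:

```c
#include <stdbool.h>

/* Hypothetical description of a guest, for illustration only. */
struct demo_guest {
	bool pv_domain;		/* models xen_pv_domain() */
	bool auto_translated;	/* models XENFEAT_auto_translated_physmap */
	bool vector_callback;	/* models xen_have_vector_callback */
};

/* A guest counts as PVH only when all three properties hold. */
static bool demo_is_pvh(const struct demo_guest *g)
{
	return g->pv_domain && g->auto_translated && g->vector_callback;
}
```

A classic PV guest fails the test because it is not auto-translated, and
a guest that is not a PV domain fails it regardless of the other two
properties.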

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 include/xen/xen.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/xen/xen.h b/include/xen/xen.h
index a74d436..c4ab644 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,4 +29,20 @@ extern enum xen_domain_type xen_domain_type;
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
+#ifdef CONFIG_XEN_PVHVM
+/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
+
+/* This functionality exists only for x86. The XEN_PVHVM support exists
+ * only in x86 world - hence on ARM it will be always disabled.
+ * N.B. ARM guests are neither PV nor HVM nor PVHVM.
+ * It's a bit like PVH but is different also (it's further towards the H
+ * end of the spectrum than even PVH).
+ */
+#include <xen/features.h>
+#define xen_pvh_domain() (xen_pv_domain() && \
+			  xen_feature(XENFEAT_auto_translated_physmap) && \
+			  xen_have_vector_callback)
+#else
+#define xen_pvh_domain()	(0)
+#endif
 #endif	/* _XEN_XEN_H */
-- 
1.8.3.1



* [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
  2014-01-01  4:35 ` [PATCH v12 01/18] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
  2014-01-01  4:35 ` [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 15:32   ` David Vrabel
  2014-01-01  4:35 ` [PATCH v12 04/18] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

In the bootup code for PVH we can trap cpuid via vmexit, so we don't
need to use the emulated prefix call. We also check for vector callback
early on, as it is a required feature. PVH also runs at the default kernel
IOPL.

Finally, pure PV settings are moved to a separate function that is
only called for pure PV, i.e., PV with pvmmu. They are also wrapped
in #ifdef CONFIG_XEN_PVMMU.
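
To illustrate the split of the arch setup path, here is a simplified
userspace sketch. The step flags and demo_* functions are hypothetical;
they only record which groups of work would run, and do not model the
hypercalls themselves.

```c
#include <stdbool.h>

/* Hypothetical markers for the groups of setup work. */
enum demo_step {
	DEMO_PANIC_HANDLER	= 1 << 0,	/* common to PV and PVH */
	DEMO_VM_ASSIST		= 1 << 1,	/* pure PV only */
	DEMO_CALLBACKS		= 1 << 2,	/* pure PV only */
};

/* Models the pure-PV half: vm_assist calls and callback registration. */
static unsigned int demo_pvmmu_arch_setup(void)
{
	return DEMO_VM_ASSIST | DEMO_CALLBACKS;
}

/* Models the common entry point, which skips the pure-PV half for auto-xlat. */
static unsigned int demo_arch_setup(bool auto_translated)
{
	unsigned int done = DEMO_PANIC_HANDLER;

	if (!auto_translated)
		done |= demo_pvmmu_arch_setup();
	return done;
}
```

With auto_translated set, only the common step is recorded; without it,
all three are.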

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 63 +++++++++++++++++++++++++++++++++---------------
 arch/x86/xen/setup.c     | 18 +++++++++-----
 2 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade7..755e5bb 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -46,6 +46,7 @@
 #include <xen/hvm.h>
 #include <xen/hvc-console.h>
 #include <xen/acpi.h>
+#include <xen/features.h>
 
 #include <asm/paravirt.h>
 #include <asm/apic.h>
@@ -262,8 +263,9 @@ static void __init xen_banner(void)
 	struct xen_extraversion extra;
 	HYPERVISOR_xen_version(XENVER_extraversion, &extra);
 
-	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
-	       pv_info.name);
+	pr_info("Booting paravirtualized kernel %son %s\n",
+		xen_feature(XENFEAT_auto_translated_physmap) ?
+			"with PVH extensions " : "", pv_info.name);
 	printk(KERN_INFO "Xen version: %d.%d%s%s\n",
 	       version >> 16, version & 0xffff, extra.extraversion,
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
@@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 		break;
 	}
 
-	asm(XEN_EMULATE_PREFIX "cpuid"
-		: "=a" (*ax),
-		  "=b" (*bx),
-		  "=c" (*cx),
-		  "=d" (*dx)
-		: "0" (*ax), "2" (*cx));
+	if (xen_pvh_domain())
+		native_cpuid(ax, bx, cx, dx);
+	else
+		asm(XEN_EMULATE_PREFIX "cpuid"
+			: "=a" (*ax),
+			"=b" (*bx),
+			"=c" (*cx),
+			"=d" (*dx)
+			: "0" (*ax), "2" (*cx));
 
 	*bx &= maskebx;
 	*cx &= maskecx;
@@ -1420,6 +1425,19 @@ static void __init xen_setup_stackprotector(void)
 	pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+static void __init xen_pvh_early_guest_init(void)
+{
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	if (xen_feature(XENFEAT_hvm_callback_vector))
+		xen_have_vector_callback = 1;
+
+#ifdef CONFIG_X86_32
+	BUG(); /* PVH: Implement proper support. */
+#endif
+}
+
 /* First C function to be called on Xen boot */
 asmlinkage void __init xen_start_kernel(void)
 {
@@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_domain_type = XEN_PV_DOMAIN;
 
+	xen_setup_features();
+	xen_pvh_early_guest_init();
 	xen_setup_machphys_mapping();
 
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_init_ops = xen_init_ops;
-	pv_cpu_ops = xen_cpu_ops;
 	pv_apic_ops = xen_apic_ops;
+	if (xen_pvh_domain())
+		pv_cpu_ops.cpuid = xen_cpuid;
+	else
+		pv_cpu_ops = xen_cpu_ops;
 
 	x86_init.resources.memory_setup = xen_memory_setup;
 	x86_init.oem.arch_setup = xen_arch_setup;
@@ -1469,8 +1492,6 @@ asmlinkage void __init xen_start_kernel(void)
 	/* Work out if we support NX */
 	x86_configure_nx();
 
-	xen_setup_features();
-
 	/* Get mfn list */
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
 		xen_build_dynamic_phys_to_machine();
@@ -1548,14 +1569,18 @@ asmlinkage void __init xen_start_kernel(void)
 	/* set the limit of our address space */
 	xen_reserve_top();
 
-	/* We used to do this in xen_arch_setup, but that is too late on AMD
-	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
-	 * which pokes 0xcf8 port.
-	 */
-	set_iopl.iopl = 1;
-	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
-	if (rc != 0)
-		xen_raw_printk("physdev_op failed %d\n", rc);
+	/* PVH: runs at default kernel iopl of 0 */
+	if (!xen_pvh_domain()) {
+		/*
+		 * We used to do this in xen_arch_setup, but that is too late
+		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
+		 * early_amd_init which pokes 0xcf8 port.
+		 */
+		set_iopl.iopl = 1;
+		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
+		if (rc != 0)
+			xen_raw_printk("physdev_op failed %d\n", rc);
+	}
 
 #ifdef CONFIG_X86_32
 	/* set up basic CPUID stuff */
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 68c054f..2137c51 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -563,16 +563,13 @@ void xen_enable_nmi(void)
 		BUG();
 #endif
 }
-void __init xen_arch_setup(void)
+void __init xen_pvmmu_arch_setup(void)
 {
-	xen_panic_handler_init();
-
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
 
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		HYPERVISOR_vm_assist(VMASST_CMD_enable,
-				     VMASST_TYPE_pae_extended_cr3);
+	HYPERVISOR_vm_assist(VMASST_CMD_enable,
+			     VMASST_TYPE_pae_extended_cr3);
 
 	if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
 	    register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
@@ -581,6 +578,15 @@ void __init xen_arch_setup(void)
 	xen_enable_sysenter();
 	xen_enable_syscall();
 	xen_enable_nmi();
+}
+
+/* This function is not called for HVM domains */
+void __init xen_arch_setup(void)
+{
+	xen_panic_handler_init();
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
+		xen_pvmmu_arch_setup();
+
 #ifdef CONFIG_ACPI
 	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
 		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
-- 
1.8.3.1



* [PATCH v12 04/18] xen/pvh: Don't setup P2M tree.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:17   ` David Vrabel
  2014-01-03 15:41   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code Konrad Rzeszutek Wilk
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

P2M is not available for PVH. Fortunately for us, the
P2M code already has most of the support for auto-xlat guests thanks to
commit 3d24bbd7dddbea54358a9795abaf051b0f18973c
"grant-table: call set_phys_to_machine after mapping grant refs"
which: "
introduces set_phys_to_machine calls for auto_translated guests
(even on x86) in gnttab_map_refs and gnttab_unmap_refs.
translated by swiotlb-xen... " so we don't need to muck much.

With the above-mentioned "commit you'll get set_phys_to_machine calls
from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
anything with them " (Stefano Stabellini) which is OK - we want
them to be NOPs.

This is because we assume that an "IOMMU is always present on the
platform and Xen is going to make the appropriate IOMMU pagetable
changes in the hypercall implementation of GNTTABOP_map_grant_ref
and GNTTABOP_unmap_grant_ref, then everything should be transparent
from PVH privileged point of view and DMA transfers involving
foreign pages keep working with no issues.

Otherwise we would need a P2M (and an M2P) for PVH privileged to
track these foreign pages .. (see arch/arm/xen/p2m.c)."
(Stefano Stabellini).

We still have to inhibit the building of the P2M tree.
That had been done in the past by not calling
xen_build_dynamic_phys_to_machine (which sets up the P2M tree
and gives us virtual addresses to access it). But we are missing
a check for xen_build_mfn_list_list - which was continuing to set up
the P2M tree and would blow up when trying to get the virtual
address of p2m_missing (which would have been set up by
xen_build_dynamic_phys_to_machine).

Hence a check is needed to not call xen_build_mfn_list_list when
running in auto-xlat mode.

Instead of replicating the check for auto-xlat in enlighten.c,
do it in the p2m.c code. The reason is that xen_build_mfn_list_list
is also called in xen_arch_post_suspend without any checks for
auto-xlat. So for PVH or PV with auto-xlat we would needlessly
allocate space for a P2M tree.
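
The design choice (guard in the callee rather than at each call site)
can be sketched in userspace as follows. All demo_* names are
hypothetical and the call graph is simplified; the counter merely stands
in for the P2M allocations.

```c
#include <stdbool.h>

/* Models xen_feature(XENFEAT_auto_translated_physmap). */
static bool demo_auto_translated;

/* Counts how often the (modelled) P2M tree would be allocated. */
static int demo_p2m_allocations;

/* The guard lives in the callee, so every caller is covered. */
static void demo_build_mfn_list_list(void)
{
	if (demo_auto_translated)
		return;
	demo_p2m_allocations++;
}

/* Two independent callers, neither of which checks for auto-xlat. */
static void demo_boot_path(void)
{
	demo_build_mfn_list_list();
}

static void demo_post_suspend_path(void)
{
	demo_build_mfn_list_list();
}
```

Had the check been placed only in the boot path, the post-suspend path
would still allocate; with the check in the callee neither path does.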

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c |  3 +--
 arch/x86/xen/p2m.c       | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 755e5bb..ab4dd70 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1493,8 +1493,7 @@ asmlinkage void __init xen_start_kernel(void)
 	x86_configure_nx();
 
 	/* Get mfn list */
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		xen_build_dynamic_phys_to_machine();
+	xen_build_dynamic_phys_to_machine();
 
 	/*
 	 * Set up kernel GDT and segment registers, mainly so that
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 2ae8699..fb7ee0a 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -280,6 +280,9 @@ void __ref xen_build_mfn_list_list(void)
 {
 	unsigned long pfn;
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	/* Pre-initialize p2m_top_mfn to be completely missing */
 	if (p2m_top_mfn == NULL) {
 		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
@@ -346,10 +349,15 @@ void xen_setup_mfn_list_list(void)
 /* Set up p2m_top to point to the domain-builder provided p2m pages */
 void __init xen_build_dynamic_phys_to_machine(void)
 {
-	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
-	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
+	unsigned long *mfn_list;
+	unsigned long max_pfn;
 	unsigned long pfn;
 
+	 if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	mfn_list = (unsigned long *)xen_start_info->mfn_list;
+	max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
 	xen_max_p2m_pfn = max_pfn;
 
 	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
-- 
1.8.3.1



* [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 04/18] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:21   ` David Vrabel
  2014-01-03 15:47   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

The revectoring and copying of the P2M only happens when
!auto-xlat and on 64-bit builds. It is not obvious from
the code, so let's have separate 32 and 64-bit functions.

We also invert the check for auto-xlat to make the code
flow simpler.
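
As a worked example of the sizes involved in the 64-bit path, the sketch
below reproduces the two roundings applied to the mfn_list: first to a
page boundary, then up to a PMD for the high-map cleanup. The DEMO_*
constants are assumptions for x86-64 (4 KiB pages, 2 MiB PMD, 8-byte
entries) and the function names are hypothetical.

```c
/* Assumed x86-64 values, hard-coded for this userspace sketch. */
#define DEMO_PAGE_SIZE	4096UL
#define DEMO_PMD_SIZE	(2UL << 20)
#define DEMO_ENTRY_SIZE	8UL	/* sizeof(unsigned long) on 64-bit */

/* Round x up to the next multiple of a. */
#define DEMO_ROUNDUP(x, a)	((((x) + (a) - 1) / (a)) * (a))

/* Bytes occupied by the mfn_list, page aligned. */
static unsigned long demo_mfn_list_bytes(unsigned long nr_pages)
{
	return DEMO_ROUNDUP(nr_pages * DEMO_ENTRY_SIZE, DEMO_PAGE_SIZE);
}

/* Span cleaned from the high mapping, rounded up to a whole PMD. */
static unsigned long demo_cleanhighmap_bytes(unsigned long nr_pages)
{
	return DEMO_ROUNDUP(demo_mfn_list_bytes(nr_pages), DEMO_PMD_SIZE);
}
```

For a 1024 MB guest (262144 pages) both values come to exactly 2 MiB;
one page more and the cleaned span doubles to 4 MiB, which is why the
comment in the code warns about anything else living in that PMD.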

Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c | 73 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..d792a69 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1198,44 +1198,40 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
 	 * instead of somewhere later and be confusing. */
 	xen_mc_flush();
 }
-#endif
-static void __init xen_pagetable_init(void)
+static void __init xen_pagetable_p2m_copy(void)
 {
-#ifdef CONFIG_X86_64
 	unsigned long size;
 	unsigned long addr;
-#endif
-	paging_init();
-	xen_setup_shared_info();
-#ifdef CONFIG_X86_64
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		unsigned long new_mfn_list;
+	unsigned long new_mfn_list;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+
+	/* On 32-bit, we get zero so this never gets executed. */
+	new_mfn_list = xen_revector_p2m_tree();
+	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
+		/* using __ka address and sticking INVALID_P2M_ENTRY! */
+		memset((void *)xen_start_info->mfn_list, 0xff, size);
+
+		/* We should be in __ka space. */
+		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
+		addr = xen_start_info->mfn_list;
+		/* We roundup to the PMD, which means that if anybody at this stage is
+		 * using the __ka address of xen_start_info or xen_start_info->shared_info
+		 * they are in going to crash. Fortunatly we have already revectored
+		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
+		size = roundup(size, PMD_SIZE);
+		xen_cleanhighmap(addr, addr + size);
 
 		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+		memblock_free(__pa(xen_start_info->mfn_list), size);
+		/* And revector! Bye bye old array */
+		xen_start_info->mfn_list = new_mfn_list;
+	} else
+		return;
 
-		/* On 32-bit, we get zero so this never gets executed. */
-		new_mfn_list = xen_revector_p2m_tree();
-		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
-			/* using __ka address and sticking INVALID_P2M_ENTRY! */
-			memset((void *)xen_start_info->mfn_list, 0xff, size);
-
-			/* We should be in __ka space. */
-			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
-			addr = xen_start_info->mfn_list;
-			/* We roundup to the PMD, which means that if anybody at this stage is
-			 * using the __ka address of xen_start_info or xen_start_info->shared_info
-			 * they are in going to crash. Fortunatly we have already revectored
-			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
-			size = roundup(size, PMD_SIZE);
-			xen_cleanhighmap(addr, addr + size);
-
-			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
-			memblock_free(__pa(xen_start_info->mfn_list), size);
-			/* And revector! Bye bye old array */
-			xen_start_info->mfn_list = new_mfn_list;
-		} else
-			goto skip;
-	}
 	/* At this stage, cleanup_highmap has already cleaned __ka space
 	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
 	 * the ramdisk). We continue on, erasing PMD entries that point to page
@@ -1255,8 +1251,19 @@ static void __init xen_pagetable_init(void)
 	 * anything at this stage. */
 	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
 #endif
-skip:
+}
+#else
+static void __init xen_pagetable_p2m_copy(void)
+{
+	/* Nada! */
+}
 #endif
+
+static void __init xen_pagetable_init(void)
+{
+	paging_init();
+	xen_setup_shared_info();
+	xen_pagetable_p2m_copy();
 	xen_post_allocator_init();
 }
 static void xen_write_cr2(unsigned long cr2)
-- 
1.8.3.1



* [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:24   ` David Vrabel
  2014-01-01  4:35 ` [PATCH v12 07/18] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

.. which are surprisingly small compared to the amount of PV code.

PVH uses mostly native mmu ops; we leave the generic (native_*) ones for
the majority and just overwrite the baremetal ones with the ones we need.

We also optimize one - the TLB flush. The native operation would
needlessly IPI offline VCPUs causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPU needs the TLB flush.

At startup, we are running with pre-allocated page-tables
courtesy of the tool-stack. But we still need to graft them
in the Linux initial pagetables. However there is no need to
unpin/pin and change them to R/O or R/W.

Note that xen_pagetable_init, due to commit
7836fec9d0994cc9c9150c5a33f0eb0eb08a335a
"xen/mmu/p2m: Refactor the xen_pagetable_init code.", does not
need any changes - we just need to make sure that xen_post_allocator_init
does not alter the pvops from the default native one.
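
The "keep native, override only what is needed" approach can be
sketched in userspace like this. The ops table, its two members and all
demo_* names are hypothetical; the real pv_mmu_ops has many more
entries, and this sketch makes no claim about which ones the patch
touches beyond the TLB flush mentioned above.

```c
/* Identifies which implementation a slot points at. */
enum demo_impl { DEMO_NATIVE, DEMO_XEN };

/* Hypothetical, heavily reduced stand-in for an MMU ops table. */
struct demo_mmu_ops {
	enum demo_impl (*flush_tlb_others)(void);
	enum demo_impl (*set_pte)(void);
};

static enum demo_impl demo_native_flush(void)	{ return DEMO_NATIVE; }
static enum demo_impl demo_native_set_pte(void)	{ return DEMO_NATIVE; }
static enum demo_impl demo_xen_flush(void)	{ return DEMO_XEN; }
static enum demo_impl demo_xen_set_pte(void)	{ return DEMO_XEN; }

static struct demo_mmu_ops demo_init_mmu_ops(int auto_translated)
{
	/* Start from the native defaults. */
	struct demo_mmu_ops ops = { demo_native_flush, demo_native_set_pte };

	/* The Xen TLB flush is used in both modes. */
	ops.flush_tlb_others = demo_xen_flush;

	/* Only pure PV replaces the pagetable-update op. */
	if (!auto_translated)
		ops.set_pte = demo_xen_set_pte;
	return ops;
}
```

An auto-translated guest ends up with the Xen flush and the native
set_pte, while a pure PV guest gets the Xen variant of both.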

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c | 90 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 55 insertions(+), 35 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d792a69..d9ac620 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1760,6 +1760,10 @@ static void set_page_prot_flags(void *addr, pgprot_t prot, unsigned long flags)
 	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
 	pte_t pte = pfn_pte(pfn, prot);
 
+	/* For PVH no need to set R/O or R/W to pin them or unpin them. */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
 		BUG();
 }
@@ -1870,6 +1874,7 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
  * but that's enough to get __va working.  We need to fill in the rest
  * of the physical mapping once some sort of allocator has been set
  * up.
+ * NOTE: for PVH, the page tables are native.
  */
 void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
@@ -1891,17 +1896,18 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 
-	/* Pre-constructed entries are in pfn, so convert to mfn */
-	/* L4[272] -> level3_ident_pgt
-	 * L4[511] -> level3_kernel_pgt */
-	convert_pfn_mfn(init_level4_pgt);
-
-	/* L3_i[0] -> level2_ident_pgt */
-	convert_pfn_mfn(level3_ident_pgt);
-	/* L3_k[510] -> level2_kernel_pgt
-	 * L3_i[511] -> level2_fixmap_pgt */
-	convert_pfn_mfn(level3_kernel_pgt);
-
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* Pre-constructed entries are in pfn, so convert to mfn */
+		/* L4[272] -> level3_ident_pgt
+		 * L4[511] -> level3_kernel_pgt */
+		convert_pfn_mfn(init_level4_pgt);
+
+		/* L3_i[0] -> level2_ident_pgt */
+		convert_pfn_mfn(level3_ident_pgt);
+		/* L3_k[510] -> level2_kernel_pgt
+		 * L3_i[511] -> level2_fixmap_pgt */
+		convert_pfn_mfn(level3_kernel_pgt);
+	}
 	/* We get [511][511] and have Xen's version of level2_kernel_pgt */
 	l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
 	l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
@@ -1925,31 +1931,33 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	copy_page(level2_fixmap_pgt, l2);
 	/* Note that we don't do anything with level1_fixmap_pgt which
 	 * we don't need. */
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* Make pagetable pieces RO */
+		set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
+		set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
+
+		/* Pin down new L4 */
+		pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
+				  PFN_DOWN(__pa_symbol(init_level4_pgt)));
+
+		/* Unpin Xen-provided one */
+		pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
 
-	/* Make pagetable pieces RO */
-	set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
-	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
-
-	/* Pin down new L4 */
-	pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
-			  PFN_DOWN(__pa_symbol(init_level4_pgt)));
-
-	/* Unpin Xen-provided one */
-	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
-
-	/*
-	 * At this stage there can be no user pgd, and no page
-	 * structure to attach it to, so make sure we just set kernel
-	 * pgd.
-	 */
-	xen_mc_batch();
-	__xen_write_cr3(true, __pa(init_level4_pgt));
-	xen_mc_issue(PARAVIRT_LAZY_CPU);
+		/*
+		 * At this stage there can be no user pgd, and no page
+		 * structure to attach it to, so make sure we just set kernel
+		 * pgd.
+		 */
+		xen_mc_batch();
+		__xen_write_cr3(true, __pa(init_level4_pgt));
+		xen_mc_issue(PARAVIRT_LAZY_CPU);
+	} else
+		native_write_cr3(__pa(init_level4_pgt));
 
 	/* We can't that easily rip out L3 and L2, as the Xen pagetables are
 	 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
@@ -2110,6 +2118,9 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 
 static void __init xen_post_allocator_init(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	pv_mmu_ops.set_pte = xen_set_pte;
 	pv_mmu_ops.set_pmd = xen_set_pmd;
 	pv_mmu_ops.set_pud = xen_set_pud;
@@ -2214,6 +2225,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 void __init xen_init_mmu_ops(void)
 {
 	x86_init.paging.pagetable_init = xen_pagetable_init;
+
+	/* Optimization - we can use the HVM one but it has no idea which
+	 * VCPUs are descheduled - which means that it will needlessly IPI
+	 * them. Xen knows so let it do the job.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+		return;
+	}
 	pv_mmu_ops = xen_mmu_ops;
 
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
-- 
1.8.3.1



* [PATCH v12 07/18] xen/pvh: Setup up shared_info.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:27   ` David Vrabel
  2014-01-01  4:35 ` [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

For PVHVM the shared_info structure is provided in the same way
as for normal PV guests (see include/xen/interface/xen.h).

That is, during bootup we get 'xen_start_info' via the %esi register
in startup_xen. Then later we extract the 'shared_info' from said
structure (in xen_setup_shared_info) and start using it.

The 'xen_setup_shared_info' is all set up to work with auto-xlat
guests, but there are two functions which it calls that are not:
xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
This patch modifies those to work in auto-xlat mode.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 5 +++--
 arch/x86/xen/p2m.c       | 3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ab4dd70..4cdc483 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1147,8 +1147,9 @@ void xen_setup_vcpu_info_placement(void)
 		xen_vcpu_setup(cpu);
 
 	/* xen_vcpu_setup managed to place the vcpu_info within the
-	   percpu area for all cpus, so make use of it */
-	if (have_vcpu_info_placement) {
+	 * percpu area for all cpus, so make use of it. Note that for
+	 * PVH we want to use native IRQ mechanism. */
+	if (have_vcpu_info_placement && !xen_pvh_domain()) {
 		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
 		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
 		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index fb7ee0a..696c694 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -339,6 +339,9 @@ void __ref xen_build_mfn_list_list(void)
 
 void xen_setup_mfn_list_list(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
 
 	HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
-- 
1.8.3.1



* [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 07/18] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:31   ` David Vrabel
  2014-01-01  4:35 ` [PATCH v12 09/18] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).

But for PVH we want to use the HVM-type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.

For HVM this is usually done in 'secondary_startup_64'
(in head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.

For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.
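
To make the selector values above easier to read, here is a small
sketch that splits an x86 segment selector into its architectural
fields (index in bits 15..3, table indicator in bit 2, RPL in bits
1..0). The struct and function names are made up for illustration and
are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x86 segment selector. */
struct selector_fields {
	unsigned int index; /* descriptor table index, bits 15..3 */
	unsigned int ti;    /* table indicator, bit 2: 0 = GDT, 1 = LDT */
	unsigned int rpl;   /* requested privilege level, bits 1..0 */
};

/* Split a 16-bit selector into index, table indicator and RPL. */
struct selector_fields decode_selector(uint16_t sel)
{
	struct selector_fields f;

	f.index = sel >> 3;
	f.ti = (sel >> 2) & 1;
	f.rpl = sel & 3;
	return f;
}
```

With this decoding, 0xe033 is a GDT selector with RPL 3 (index 7174
under the plain shift-by-three rule; how that relates to the index
quoted above is not something this sketch settles), whereas 0x10,
assumed here to be the usual x86_64 __KERNEL_CS value, is GDT index 2
with RPL 0.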

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4cdc483..7690484 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1414,8 +1414,43 @@ static void __init xen_boot_params_init_edd(void)
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
  */
-static void __init xen_setup_stackprotector(void)
+static void __init xen_setup_gdt(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+#ifdef CONFIG_X86_64
+		unsigned long dummy;
+
+		switch_to_new_gdt(0); /* GDT and GS set */
+
+		/* We are switching of the Xen provided GDT to our HVM mode
+		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
+		 * and we are jumping to reload it.
+		 */
+		asm volatile ("pushq %0\n"
+			      "leaq 1f(%%rip),%0\n"
+			      "pushq %0\n"
+			      "lretq\n"
+			      "1:\n"
+			      : "=&r" (dummy) : "0" (__KERNEL_CS));
+
+		/*
+		 * While not needed, we also set the %es, %ds, and %fs
+		 * to zero. We don't care about %ss as it is NULL.
+		 * Strictly speaking this is not needed as Xen zeros those
+		 * out (and also MSR_FS_BASE, MSR_GS_BASE, MSR_KERNEL_GS_BASE)
+		 *
+		 * Linux zeros them in cpu_init() and in secondary_startup_64
+		 * (for BSP).
+		 */
+		loadsegment(es, 0);
+		loadsegment(ds, 0);
+		loadsegment(fs, 0);
+#else
+		/* PVH: TODO Implement. */
+		BUG();
+#endif
+		return;
+	}
 	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1500,7 +1535,7 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
-	xen_setup_stackprotector();
+	xen_setup_gdt();
 
 	xen_init_irq_ops();
 	xen_init_cpuid_mask();
-- 
1.8.3.1



* [PATCH v12 09/18] xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 16:07   ` David Vrabel
  2014-01-01  4:35 ` [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

The VCPU bringup protocol follows the PV one with certain twists.
From xen/include/public/arch-x86/xen.h:

Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:

 - For HVM guests, the structures read include: fpu_ctxt (if
 VGCT_I387_VALID is set), flags, user_regs, debugreg[*]

 - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
 set cr3. All other fields not used should be set to 0.

This is what we do. We piggyback on 'xen_setup_gdt' but modify it
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on VCPU0.

We also piggyback on the %rdi register to pass in the CPU number - so
that when we boot up a new CPU, cpu_bringup_and_idle receives
the CPU number as its first parameter (via %rdi for 64-bit).
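
The %rdi trick relies on the x86-64 System V calling convention, where
the first integer argument of a function arrives in %rdi. Portable C
cannot set registers directly, so the sketch below only models the idea:
the mock_ types and functions are hypothetical, with 'entry' playing the
role of user_regs.eip and 'rdi' the role of user_regs.rdi.

```c
/* Hypothetical, heavily reduced stand-in for the VCPU register context. */
struct mock_user_regs {
	void (*entry)(int cpu); /* where the new VCPU starts executing */
	unsigned long rdi;      /* first integer argument on x86-64 SysV */
};

/* Records which CPU number the entry point was handed. */
int mock_last_cpu_brought_up = -1;

/* Stand-in for cpu_bringup_and_idle(int cpu). */
void mock_cpu_bringup_and_idle(int cpu)
{
	mock_last_cpu_brought_up = cpu;
}

/* Stand-in for the PVH branch of cpu_initialize_context(). */
void mock_initialize_context(struct mock_user_regs *regs, int cpu)
{
	regs->entry = mock_cpu_bringup_and_idle;
	regs->rdi = (unsigned long)cpu;
}

/* Models the hypervisor starting the VCPU with the saved registers:
 * whatever was stored in rdi shows up as the first parameter. */
void mock_start_vcpu(const struct mock_user_regs *regs)
{
	regs->entry((int)regs->rdi);
}
```

After mock_initialize_context(&regs, 3) and mock_start_vcpu(&regs), the
entry point has seen CPU number 3 without any other channel being used.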

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 11 ++++++++---
 arch/x86/xen/smp.c       | 49 ++++++++++++++++++++++++++++++++----------------
 arch/x86/xen/xen-ops.h   |  1 +
 3 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 7690484..8493653 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1413,14 +1413,19 @@ static void __init xen_boot_params_init_edd(void)
  * Set up the GDT and segment registers for -fstack-protector.  Until
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
+ *
+ * Note, that it is refok - b/c the only caller of this after init
+ * is PVH which is not going to use xen_load_gdt_boot or other
+ * __init functions.
  */
-static void __init xen_setup_gdt(void)
+void __init_refok xen_setup_gdt(int cpu)
 {
 	if (xen_feature(XENFEAT_auto_translated_physmap)) {
 #ifdef CONFIG_X86_64
 		unsigned long dummy;
 
-		switch_to_new_gdt(0); /* GDT and GS set */
+		load_percpu_segment(cpu); /* We need to access per-cpu area */
+		switch_to_new_gdt(cpu); /* GDT and GS set */
 
 		/* We are switching of the Xen provided GDT to our HVM mode
 		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
@@ -1535,7 +1540,7 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
-	xen_setup_gdt();
+	xen_setup_gdt(0);
 
 	xen_init_irq_ops();
 	xen_init_cpuid_mask();
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index c36b325..5e46190 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -73,9 +73,11 @@ static void cpu_bringup(void)
 	touch_softlockup_watchdog();
 	preempt_disable();
 
-	xen_enable_sysenter();
-	xen_enable_syscall();
-
+	/* PVH runs in ring 0 and allows us to do native syscalls. Yay! */
+	if (!xen_feature(XENFEAT_supervisor_mode_kernel)) {
+		xen_enable_sysenter();
+		xen_enable_syscall();
+	}
 	cpu = smp_processor_id();
 	smp_store_cpu_info(cpu);
 	cpu_data(cpu).x86_max_cores = 1;
@@ -97,8 +99,14 @@ static void cpu_bringup(void)
 	wmb();			/* make sure everything is out */
 }
 
-static void cpu_bringup_and_idle(void)
+/* Note: cpu parameter is only relevant for PVH */
+static void cpu_bringup_and_idle(int cpu)
 {
+#ifdef CONFIG_X86_64
+	if (xen_feature(XENFEAT_auto_translated_physmap) &&
+	    xen_feature(XENFEAT_supervisor_mode_kernel))
+		xen_setup_gdt(cpu);
+#endif
 	cpu_bringup();
 	cpu_startup_entry(CPUHP_ONLINE);
 }
@@ -274,9 +282,10 @@ static void __init xen_smp_prepare_boot_cpu(void)
 	native_smp_prepare_boot_cpu();
 
 	if (xen_pv_domain()) {
-		/* We've switched to the "real" per-cpu gdt, so make sure the
-		   old memory can be recycled */
-		make_lowmem_page_readwrite(xen_initial_gdt);
+		if (!xen_feature(XENFEAT_writable_page_tables))
+			/* We've switched to the "real" per-cpu gdt, so make
+			 * sure the old memory can be recycled. */
+			make_lowmem_page_readwrite(xen_initial_gdt);
 
 #ifdef CONFIG_X86_32
 		/*
@@ -360,22 +369,21 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 
 	gdt = get_cpu_gdt_table(cpu);
 
-	ctxt->flags = VGCF_IN_KERNEL;
-	ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
+	/* Note: PVH is not yet supported on x86_32. */
 	ctxt->user_regs.fs = __KERNEL_PERCPU;
 	ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
-#else
-	ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-	{
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		ctxt->flags = VGCF_IN_KERNEL;
 		ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 		ctxt->user_regs.ds = __USER_DS;
 		ctxt->user_regs.es = __USER_DS;
+		ctxt->user_regs.ss = __KERNEL_DS;
 
 		xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -396,18 +404,27 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
 		ctxt->event_callback_cs     = __KERNEL_CS;
 		ctxt->failsafe_callback_cs  = __KERNEL_CS;
+#else
+		ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 		ctxt->event_callback_eip    =
 					(unsigned long)xen_hypervisor_callback;
 		ctxt->failsafe_callback_eip =
 					(unsigned long)xen_failsafe_callback;
+		ctxt->user_regs.cs = __KERNEL_CS;
+		per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
+#ifdef CONFIG_X86_32
 	}
-	ctxt->user_regs.cs = __KERNEL_CS;
+#else
+	} else
+		/* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
+		 * %rdi having the cpu number - which means are passing in
+		 * as the first parameter the cpu. Subtle!
+		 */
+		ctxt->user_regs.rdi = cpu;
+#endif
 	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
-
-	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
-
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
 		BUG();
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 95f8c61..9059c24 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -123,4 +123,5 @@ __visible void xen_adjust_exception_frame(void);
 
 extern int xen_panic_handler_init(void);
 
+void xen_setup_gdt(int cpu);
 #endif /* XEN_OPS_H */
-- 
1.8.3.1



* [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 09/18] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 16:14   ` David Vrabel
  2014-01-03 16:30   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 2137c51..dd5f905 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -27,6 +27,7 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/physdev.h>
 #include <xen/features.h>
+#include "mmu.h"
 #include "xen-ops.h"
 #include "vdso.h"
 
@@ -81,6 +82,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 
 	memblock_reserve(start, size);
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	xen_max_p2m_pfn = PFN_DOWN(start + size);
 	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
 		unsigned long mfn = pfn_to_mfn(pfn);
@@ -103,6 +107,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 		.domid        = DOMID_SELF
 	};
 	unsigned long len = 0;
+	int xlated_phys = xen_feature(XENFEAT_auto_translated_physmap);
 	unsigned long pfn;
 	int ret;
 
@@ -116,7 +121,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 				continue;
 			frame = mfn;
 		} else {
-			if (mfn != INVALID_P2M_ENTRY)
+			if (!xlated_phys && mfn != INVALID_P2M_ENTRY)
 				continue;
 			frame = pfn;
 		}
@@ -154,6 +159,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 static unsigned long __init xen_release_chunk(unsigned long start,
 					      unsigned long end)
 {
+	/*
+	 * Xen already ballooned out the E820 non RAM regions for us
+	 * and set them up properly in EPT.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return end - start;
+
 	return xen_do_chunk(start, end, true);
 }
 
@@ -222,7 +234,13 @@ static void __init xen_set_identity_and_release_chunk(
 	 * (except for the ISA region which must be 1:1 mapped) to
 	 * release the refcounts (in Xen) on the original frames.
 	 */
-	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
+
+	/*
+	 * PVH E820 matches the hypervisor's P2M which means we need to
+	 * account for the proper values of *release and *identity.
+	 */
+	for (pfn = start_pfn; !xen_feature(XENFEAT_auto_translated_physmap) &&
+	     pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
 		pte_t pte = __pte_ma(0);
 
 		if (pfn < PFN_UP(ISA_END_ADDRESS))


* [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2)
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 15:43   ` David Vrabel
  2014-01-03 16:34   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.

The most notable PV interfaces are the XenBus and event channels.

We will piggyback on how the event channel mechanism is
used in PVHVM - that is we want the normal native IRQ mechanism
and we will install a vector (hvm callback) for which we
will call the event channel mechanism.

This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific ones. In the
future we could support pirq_eoi_map, but that is
a feature request that can be shared with PVHVM.
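
The resulting ordering in xen_init_IRQ can be summarised with a small
decision sketch. The mock_domain struct and the MOCK_STEP_ flags are
hypothetical; the sketch assumes, as the description says, that a PVH
guest counts as a PV domain, and it only records which steps would run
rather than performing them.

```c
#include <stdbool.h>

/* Hypothetical flags naming the steps xen_init_IRQ may take on x86. */
enum mock_irq_step {
	MOCK_STEP_IRQ_CTX_INIT    = 1 << 0, /* irq_ctx_init() */
	MOCK_STEP_CALLBACK_VECTOR = 1 << 1, /* xen_callback_vector() */
	MOCK_STEP_NATIVE_INIT_IRQ = 1 << 2, /* native_init_IRQ() + pci_xen_hvm_init() */
	MOCK_STEP_PIRQ_EOI_MAP    = 1 << 3, /* PHYSDEVOP_pirq_eoi_gmfn_v2 setup */
};

/* Hypothetical description of the guest type. */
struct mock_domain {
	bool pv;              /* xen_pv_domain(), assumed true for PVH too */
	bool hvm;             /* xen_hvm_domain() */
	bool pvh;             /* xen_pvh_domain() */
	bool callback_vector; /* XENFEAT_hvm_callback_vector */
};

/* Returns the set of steps taken, following the patched control flow. */
unsigned int mock_init_irq_steps(const struct mock_domain *d)
{
	unsigned int steps = 0;

	if (d->pv)
		steps |= MOCK_STEP_IRQ_CTX_INIT;
	if (d->callback_vector)
		steps |= MOCK_STEP_CALLBACK_VECTOR;
	if (d->hvm)
		steps |= MOCK_STEP_NATIVE_INIT_IRQ;
	else if (!d->pvh)
		steps |= MOCK_STEP_PIRQ_EOI_MAP;
	return steps;
}
```

A PVH guest thus ends up with the PV context setup plus the HVM callback
vector, and skips both native_init_IRQ and the PIRQ EOI map.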

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/irq.c   |  5 ++++-
 drivers/xen/events.c | 16 ++++++++++------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 0da7f86..76ca326 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/sched.h>
 #include <xen/interface/vcpu.h>
+#include <xen/features.h>
 #include <xen/events.h>
 
 #include <asm/xen/hypercall.h>
@@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-	pv_irq_ops = xen_irq_ops;
+	/* For PVH we use default pv_irq_ops settings. */
+	if (!xen_feature(XENFEAT_hvm_callback_vector))
+		pv_irq_ops = xen_irq_ops;
 	x86_init.irqs.intr_init = xen_init_IRQ;
 }
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 4035e83..bf8fb29 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1908,20 +1908,24 @@ void __init xen_init_IRQ(void)
 	pirq_needs_eoi = pirq_needs_eoi_flag;
 
 #ifdef CONFIG_X86
-	if (xen_hvm_domain()) {
+	if (xen_pv_domain()) {
+		irq_ctx_init(smp_processor_id());
+		if (xen_initial_domain())
+			pci_xen_initial_domain();
+	}
+	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_callback_vector();
+
+	if (xen_hvm_domain()) {
 		native_init_IRQ();
 		/* pci_xen_hvm_init must be called after native_init_IRQ so that
 		 * __acpi_register_gsi can point at the right function */
 		pci_xen_hvm_init();
-	} else {
+	} else if (!xen_pvh_domain()) {
+		/* TODO: No PVH support for PIRQ EOI */
 		int rc;
 		struct physdev_pirq_eoi_gmfn eoi_gmfn;
 
-		irq_ctx_init(smp_processor_id());
-		if (xen_initial_domain())
-			pci_xen_initial_domain();
-
 		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
 		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
 		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
-- 
1.8.3.1



* [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (10 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:38   ` David Vrabel
  2014-01-03 16:40   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

The function gnttab_max_grant_frames() returns the maximum number
of frames (pages) of grants we can have. Unfortunately it was
dependent on gnttab_init() having been run before to initialize
the boot max value (boot_max_nr_grant_frames).

This meant that users of gnttab_max_grant_frames would always
get a zero value if they called before gnttab_init() - such as
'platform_pci_init' (drivers/xen/platform-pci.c).
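
The ordering problem and the fix can be reproduced in a few lines of
user-space C. In this sketch mock_query_max_frames() is a hypothetical
stand-in for __max_nr_grant_frames() and simply returns a fixed value;
the "eager" variant models the old behaviour and the "lazy" variant the
function-local static used by this patch.

```c
/* Hypothetical stand-in for the hypervisor query; fixed value for the demo. */
unsigned int mock_query_max_frames(void)
{
	return 32;
}

/* Old scheme: the boot maximum is only valid after mock_init_eager(). */
unsigned int mock_boot_max_eager;

void mock_init_eager(void)
{
	mock_boot_max_eager = mock_query_max_frames();
}

unsigned int max_frames_eager(void)
{
	unsigned int xen_max = mock_query_max_frames();

	/* Before mock_init_eager() the boot maximum is 0, so 0 is returned. */
	if (xen_max > mock_boot_max_eager)
		return mock_boot_max_eager;
	return xen_max;
}

/* New scheme: the boot maximum is filled in on first use. */
unsigned int max_frames_lazy(void)
{
	static unsigned int boot_max;
	unsigned int xen_max = mock_query_max_frames();

	/* First time, initialize it properly. */
	if (!boot_max)
		boot_max = mock_query_max_frames();

	if (xen_max > boot_max)
		return boot_max;
	return xen_max;
}
```

Called before any init routine, max_frames_eager() yields 0 while
max_frames_lazy() already yields the queried maximum.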

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/grant-table.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index aa846a4..99399cb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -62,7 +62,6 @@
 
 static grant_ref_t **gnttab_list;
 static unsigned int nr_grant_frames;
-static unsigned int boot_max_nr_grant_frames;
 static int gnttab_free_count;
 static grant_ref_t gnttab_free_head;
 static DEFINE_SPINLOCK(gnttab_list_lock);
@@ -827,6 +826,11 @@ static unsigned int __max_nr_grant_frames(void)
 unsigned int gnttab_max_grant_frames(void)
 {
 	unsigned int xen_max = __max_nr_grant_frames();
+	static unsigned int boot_max_nr_grant_frames;
+
+	/* First time, initialize it properly. */
+	if (!boot_max_nr_grant_frames)
+		boot_max_nr_grant_frames = __max_nr_grant_frames();
 
 	if (xen_max > boot_max_nr_grant_frames)
 		return boot_max_nr_grant_frames;
@@ -1227,13 +1231,12 @@ int gnttab_init(void)
 
 	gnttab_request_version();
 	nr_grant_frames = 1;
-	boot_max_nr_grant_frames = __max_nr_grant_frames();
 
 	/* Determine the maximum number of frames required for the
 	 * grant reference free list on the current hypervisor.
 	 */
 	BUG_ON(grefs_per_grant_frame == 0);
-	max_nr_glist_frames = (boot_max_nr_grant_frames *
+	max_nr_glist_frames = (gnttab_max_grant_frames() *
 			       grefs_per_grant_frame / RPP);
 
 	gnttab_list = kmalloc(max_nr_glist_frames * sizeof(grant_ref_t *),
-- 
1.8.3.1



* [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (11 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:39   ` David Vrabel
  2014-01-03 16:43   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 14/18] xen/grant: Implement an grant frame array struct Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

We have this odd scenario where for PV paths we take a shortcut
but for the HVM paths we first ioremap xen_hvm_resume_frames, then
assign it to gnttab_shared.addr. This is needed because gnttab_map
uses gnttab_shared.addr.

Instead of having:
	if (pv)
		return gnttab_map
	if (hvm)
		...

	gnttab_map

Let's move the HVM part before the gnttab_map and remove the
first call to gnttab_map.
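
A minimal sketch of the resulting shape, with a single call site for the
map step, could look as follows. All mock_ names are hypothetical
stand-ins (mock_remap for xen_remap, mock_gnttab_map for gnttab_map);
the sketch only counts calls instead of mapping anything.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state standing in for gnttab_shared.addr. */
void *mock_shared_addr;

/* Counts how often the map step ran. */
int mock_map_calls;

/* Dummy backing storage so the remap stand-in can return non-NULL. */
char mock_frames[1];

/* Stand-in for xen_remap(); always succeeds in this sketch. */
void *mock_remap(void)
{
	return mock_frames;
}

/* Stand-in for gnttab_map(); always succeeds in this sketch. */
int mock_gnttab_map(void)
{
	mock_map_calls++;
	return 0;
}

/* Shape of gnttab_setup() after the refactor: optional remap first,
 * then one shared call to the map step whose result is returned. */
int mock_gnttab_setup(bool auto_xlat)
{
	if (auto_xlat && mock_shared_addr == NULL) {
		mock_shared_addr = mock_remap();
		if (mock_shared_addr == NULL)
			return -ENOMEM;
	}
	return mock_gnttab_map();
}
```

Both the PV and the auto-translated caller now go through the same
final call, and its return value is no longer dropped.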

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/grant-table.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 99399cb..cc1b4fa 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1173,22 +1173,17 @@ static int gnttab_setup(void)
 	if (max_nr_gframes < nr_grant_frames)
 		return -ENOSYS;
 
-	if (xen_pv_domain())
-		return gnttab_map(0, nr_grant_frames - 1);
-
-	if (gnttab_shared.addr == NULL) {
+	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
+	{
 		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
-						PAGE_SIZE * max_nr_gframes);
+					       PAGE_SIZE * max_nr_gframes);
 		if (gnttab_shared.addr == NULL) {
 			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
 					xen_hvm_resume_frames);
 			return -ENOMEM;
 		}
 	}
-
-	gnttab_map(0, nr_grant_frames - 1);
-
-	return 0;
+	return gnttab_map(0, nr_grant_frames - 1);
 }
 
 int gnttab_resume(void)
-- 
1.8.3.1



* [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (12 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 16:27   ` David Vrabel
  2014-01-03 16:53   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2) Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

The 'xen_hvm_resume_frames' used to be an 'unsigned long'
and contain the virtual address of the grants. That was OK
for most architectures (PVHVM, ARM) where the grants are contiguous
in memory. That however is not the case for PVH - in which case
we will have to do a lookup for each virtual address for the PFN.

Instead of doing that, let's make it a structure which will contain
the array of PFNs, the virtual address and the count of said PFNs.

Also provide generic functions: gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames to populate said structure with
appropriate values for PVHVM and ARM.

To round it off, change the name from 'xen_hvm_resume_frames' to
a more descriptive one - 'xen_auto_xlat_grant_frames'.

For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
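
For the contiguous case (PVHVM, ARM) the population step amounts to
filling the PFN array with consecutive values. The sketch below is a
user-space approximation under stated assumptions: the field names, the
MOCK_PAGE_SHIFT value and the error codes are guesses for illustration,
calloc/free replace the kernel allocators, and no mapping is performed,
so vaddr stays NULL.

```c
#include <stddef.h>
#include <stdlib.h>

#define MOCK_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical layout: array of PFNs, their count, and the virtual address. */
struct mock_grant_frames {
	unsigned long *pfn; /* one PFN per grant frame */
	unsigned int count; /* number of entries in pfn[] */
	void *vaddr;        /* mapping address; not set up in this sketch */
};

/* Populate the structure for frames that are contiguous from addr. */
int mock_setup_auto_xlat_frames(struct mock_grant_frames *gf,
				unsigned long addr, unsigned int max_frames)
{
	unsigned long *pfn;
	unsigned int i;

	if (gf->count || max_frames == 0)
		return -1; /* already populated, or nothing to do */

	pfn = calloc(max_frames, sizeof(pfn[0]));
	if (pfn == NULL)
		return -1;

	for (i = 0; i < max_frames; i++)
		pfn[i] = (addr >> MOCK_PAGE_SHIFT) + i;

	gf->pfn = pfn;
	gf->count = max_frames;
	gf->vaddr = NULL;
	return 0;
}

/* Release the PFN array and reset the structure. */
void mock_free_auto_xlat_frames(struct mock_grant_frames *gf)
{
	free(gf->pfn);
	gf->pfn = NULL;
	gf->count = 0;
	gf->vaddr = NULL;
}
```

For PVH the same structure would instead be filled with PFNs that need
not be consecutive, which is why an array is kept rather than a single
start address.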

Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/arm/xen/enlighten.c   |  9 +++++++--
 drivers/xen/grant-table.c  | 45 ++++++++++++++++++++++++++++++++++++++++-----
 drivers/xen/platform-pci.c | 10 +++++++---
 include/xen/grant_table.h  |  9 ++++++++-
 4 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 8550123..2162172 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -208,6 +208,7 @@ static int __init xen_guest_init(void)
 	const char *version = NULL;
 	const char *xen_prefix = "xen,xen-";
 	struct resource res;
+	unsigned long grant_frames;
 
 	node = of_find_compatible_node(NULL, NULL, "xen,xen");
 	if (!node) {
@@ -224,10 +225,10 @@ static int __init xen_guest_init(void)
 	}
 	if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res))
 		return 0;
-	xen_hvm_resume_frames = res.start;
+	grant_frames = res.start;
 	xen_events_irq = irq_of_parse_and_map(node, 0);
 	pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n",
-			version, xen_events_irq, (xen_hvm_resume_frames >> PAGE_SHIFT));
+			version, xen_events_irq, (grant_frames >> PAGE_SHIFT));
 	xen_domain_type = XEN_HVM_DOMAIN;
 
 	xen_setup_features();
@@ -265,6 +266,10 @@ static int __init xen_guest_init(void)
 	if (xen_vcpu_info == NULL)
 		return -ENOMEM;
 
+	if (gnttab_setup_auto_xlat_frames(grant_frames)) {
+		free_percpu(xen_vcpu_info);
+		return -ENOMEM;
+	}
 	gnttab_init();
 	if (!xen_initial_domain())
 		xenbus_probe(NULL);
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index cc1b4fa..b117fd6 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -65,8 +65,8 @@ static unsigned int nr_grant_frames;
 static int gnttab_free_count;
 static grant_ref_t gnttab_free_head;
 static DEFINE_SPINLOCK(gnttab_list_lock);
-unsigned long xen_hvm_resume_frames;
-EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
+struct grant_frames xen_auto_xlat_grant_frames;
+EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);
 
 static union {
 	struct grant_entry_v1 *v1;
@@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
 
+int gnttab_setup_auto_xlat_frames(unsigned long addr)
+{
+	xen_pfn_t *pfn;
+	unsigned int max_nr_gframes = __max_nr_grant_frames();
+	int i;
+
+	if (xen_auto_xlat_grant_frames.count)
+		return -EINVAL;
+
+	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
+	if (!pfn)
+		return -ENOMEM;
+	for (i = 0; i < max_nr_gframes; i++)
+		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));
+
+	xen_auto_xlat_grant_frames.vaddr = addr;
+	xen_auto_xlat_grant_frames.pfn = pfn;
+	xen_auto_xlat_grant_frames.count = max_nr_gframes;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gnttab_setup_auto_xlat_frames);
+
+void gnttab_free_auto_xlat_frames(void)
+{
+	if (!xen_auto_xlat_grant_frames.count)
+		return;
+	kfree(xen_auto_xlat_grant_frames.pfn);
+	xen_auto_xlat_grant_frames.pfn = NULL;
+	xen_auto_xlat_grant_frames.count = 0;
+	xen_auto_xlat_grant_frames.vaddr = 0;
+}
+EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
+
 /* Handling of paged out grant targets (GNTST_eagain) */
 #define MAX_DELAY 256
 static inline void
@@ -1068,6 +1102,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 		struct xen_add_to_physmap xatp;
 		unsigned int i = end_idx;
 		rc = 0;
+		BUG_ON(xen_auto_xlat_grant_frames.count < nr_gframes);
 		/*
 		 * Loop backwards, so that the first hypercall has the largest
 		 * index, ensuring that the table will grow only once.
@@ -1076,7 +1111,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 			xatp.domid = DOMID_SELF;
 			xatp.idx = i;
 			xatp.space = XENMAPSPACE_grant_table;
-			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
+			xatp.gpfn = xen_auto_xlat_grant_frames.pfn[i];
 			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
 			if (rc != 0) {
 				pr_warn("grant table add_to_physmap failed, err=%d\n",
@@ -1175,11 +1210,11 @@ static int gnttab_setup(void)
 
 	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
 	{
-		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
+		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames.vaddr,
 					       PAGE_SIZE * max_nr_gframes);
 		if (gnttab_shared.addr == NULL) {
 			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
-					xen_hvm_resume_frames);
+					xen_auto_xlat_grant_frames.vaddr);
 			return -ENOMEM;
 		}
 	}
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 2f3528e..f1947ac 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,7 @@ static int platform_pci_init(struct pci_dev *pdev,
 	long ioaddr;
 	long mmio_addr, mmio_len;
 	unsigned int max_nr_gframes;
+	unsigned long grant_frames;
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -154,13 +155,16 @@ static int platform_pci_init(struct pci_dev *pdev,
 	}
 
 	max_nr_gframes = gnttab_max_grant_frames();
-	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+	grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+	if (gnttab_setup_auto_xlat_frames(grant_frames))
+		goto out;
 	ret = gnttab_init();
 	if (ret)
-		goto out;
+		goto grant_out;
 	xenbus_probe(NULL);
 	return 0;
-
+grant_out:
+	gnttab_free_auto_xlat_frames();
 out:
 	pci_release_region(pdev, 0);
 mem_out:
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 694dcaf..a997406 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -178,8 +178,15 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
 			   grant_status_t **__shared);
 void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
 
-extern unsigned long xen_hvm_resume_frames;
+struct grant_frames {
+	xen_pfn_t *pfn;
+	int count;
+	unsigned long vaddr;
+};
+extern struct grant_frames xen_auto_xlat_grant_frames;
 unsigned int gnttab_max_grant_frames(void);
+int gnttab_setup_auto_xlat_frames(unsigned long addr);
+void gnttab_free_auto_xlat_frames(void);
 
 #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (13 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 14/18] xen/grant: Implement an grant frame array struct Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 16:32   ` David Vrabel
  2014-01-03 17:26   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

In PVH the shared grant frame is a PFN and not an MFN,
hence it's mapped via the same code path as HVM.

The allocation of the grant frame is done differently - we
do not use the early platform-pci driver and have an
ioremap area - instead we use balloon memory and stitch
all of the non-contiguous pages in a virtualized area.

That means when we call the hypervisor to replace the GMFN
with a XENMAPSPACE_grant_table type, we need to look up the
old PFN for every iteration instead of assuming a flat
contiguous PFN allocation.
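
The difference between the two approaches can be sketched in plain C.
This is only a model under stated assumptions: the sketch_* names are
hypothetical, no hypercall is made, and the caller is assumed to pass
start_idx <= end_idx. The backwards walk follows the comment in
gnttab_map about issuing the largest index first.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_pfn_t;	/* hypothetical stand-in for xen_pfn_t */

/* Flat assumption: frame i sits exactly i pages after the base frame. */
sketch_pfn_t sketch_gpfn_flat(sketch_pfn_t base_pfn, unsigned int i)
{
	return base_pfn + i;
}

/*
 * Per-frame lookup: walk from end_idx down to start_idx and record,
 * in visiting order, the PFN that would be handed to the hypervisor
 * for each index. Returns the number of entries written to out[].
 */
size_t sketch_collect_gpfns(const sketch_pfn_t *pfns,
			    unsigned int start_idx, unsigned int end_idx,
			    sketch_pfn_t *out)
{
	size_t n = 0;
	unsigned int i = end_idx;

	for (;;) {
		out[n++] = pfns[i];	/* lookup instead of base + i */
		if (i == start_idx)
			break;
		i--;
	}
	return n;
}
```

With ballooned pages the entries of pfns[] need not be consecutive, so
sketch_gpfn_flat() would give wrong answers where the lookup does not.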

Lastly, we only use v1 for grants. This is because PVHVM
is not able to use v2 due to no XENMEM_add_to_physmap
calls on the error status page (see commit
69e8f430e243d657c2053f097efebc2e2cd559f0
 xen/granttable: Disable grant v2 for HVM domains.)

Until that is implemented this workaround has to
be in place.

Also per suggestions by Stefano utilize the PVHVM paths
as they share common functionality.

v2 of this patch moves most of the PVH code out into the
arch/x86/xen/grant-table driver and touches the generic
driver only minimally.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/grant-table.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/gntdev.c       |  2 +-
 drivers/xen/grant-table.c  | 13 ++++++----
 3 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 3a5f55d..040e064 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -125,3 +125,67 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
 	apply_to_page_range(&init_mm, (unsigned long)shared,
 			    PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL);
 }
+#ifdef CONFIG_XEN_PVHVM
+#include <xen/balloon.h>
+#include <linux/slab.h>
+static int __init xlated_setup_gnttab_pages(void)
+{
+	struct page **pages;
+	xen_pfn_t *pfns;
+	int rc, i;
+	unsigned long nr_grant_frames = gnttab_max_grant_frames();
+
+	BUG_ON(nr_grant_frames == 0);
+	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
+	if (!pfns) {
+		kfree(pages);
+		return -ENOMEM;
+	}
+	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
+	if (rc) {
+		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+			nr_grant_frames, rc);
+		kfree(pages);
+		kfree(pfns);
+		return rc;
+	}
+	for (i = 0; i < nr_grant_frames; i++)
+		pfns[i] = page_to_pfn(pages[i]);
+
+	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, nr_grant_frames,
+				    (void *)&xen_auto_xlat_grant_frames.vaddr);
+	if (rc) {
+		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
+			nr_grant_frames, rc);
+		free_xenballooned_pages(nr_grant_frames, pages);
+		kfree(pages);
+		kfree(pfns);
+		return rc;
+	}
+	kfree(pages);
+
+	xen_auto_xlat_grant_frames.pfn = pfns;
+	xen_auto_xlat_grant_frames.count = nr_grant_frames;
+
+	return 0;
+}
+
+static int __init xen_pvh_gnttab_setup(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	if (!xen_pv_domain())
+		return -ENODEV;
+
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
+		return -ENODEV;
+
+	return xlated_setup_gnttab_pages();
+}
+core_initcall(xen_pvh_gnttab_setup); /* Call it _before_ __gnttab_init */
+#endif
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index e41c79c..073b4a1 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -846,7 +846,7 @@ static int __init gntdev_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
-	use_ptemod = xen_pv_domain();
+	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
 
 	err = misc_register(&gntdev_miscdev);
 	if (err != 0) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index b117fd6..2fa3a4c 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1098,7 +1098,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 	unsigned int nr_gframes = end_idx + 1;
 	int rc;
 
-	if (xen_hvm_domain()) {
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
 		struct xen_add_to_physmap xatp;
 		unsigned int i = end_idx;
 		rc = 0;
@@ -1174,7 +1174,7 @@ static void gnttab_request_version(void)
 	int rc;
 	struct gnttab_set_version gsv;
 
-	if (xen_hvm_domain())
+	if (xen_feature(XENFEAT_auto_translated_physmap))
 		gsv.version = 1;
 	else
 		gsv.version = 2;
@@ -1210,8 +1210,11 @@ static int gnttab_setup(void)
 
 	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
 	{
-		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames.vaddr,
-					       PAGE_SIZE * max_nr_gframes);
+		if (xen_hvm_domain()) {
+			gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames.vaddr,
+						       PAGE_SIZE * max_nr_gframes);
+		} else
+			gnttab_shared.addr = xen_auto_xlat_grant_frames.vaddr;
 		if (gnttab_shared.addr == NULL) {
 			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
 					xen_auto_xlat_grant_frames.vaddr);
@@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
 	return gnttab_init();
 }
 
-core_initcall(__gnttab_init);
+core_initcall_sync(__gnttab_init);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (14 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:43   ` David Vrabel
  2014-01-03 17:22   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2) Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. For the XenBus
mechanism we want to use the PVHVM mechanism.
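
The selection rule this patch introduces can be sketched as a small
stand-alone C function. The enum and function names below are
hypothetical and the two booleans stand in for xen_pv_domain() and
xen_feature(XENFEAT_auto_translated_physmap); this is an illustration
of the decision, not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical labels for the two sets of XenBus ring operations. */
enum sketch_ring_ops {
	SKETCH_RING_OPS_PV,
	SKETCH_RING_OPS_HVM,
};

/*
 * A classic PV guest (PV without auto-translation) uses the PV ring
 * ops. Everything else, including PVH (PV with auto-translation),
 * takes the HVM path.
 */
enum sketch_ring_ops sketch_pick_ring_ops(bool pv_domain, bool auto_xlat)
{
	if (pv_domain && !auto_xlat)
		return SKETCH_RING_OPS_PV;
	return SKETCH_RING_OPS_HVM;
}
```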

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/xenbus/xenbus_client.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index ec097d6..7f7c454 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -45,6 +45,7 @@
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 #include <xen/xen.h>
+#include <xen/features.h>
 
 #include "xenbus_probe.h"
 
@@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
 
 void __init xenbus_ring_ops_init(void)
 {
-	if (xen_pv_domain())
+	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))
 		ring_ops = &ring_ops_pv;
 	else
 		ring_ops = &ring_ops_hvm;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2)
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (15 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:44   ` David Vrabel
  2014-01-03 16:22   ` Stefano Stabellini
  2014-01-01  4:35 ` [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2) Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  19 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

As we do not yet have a mechanism for that.

This also impacts the ARM/ARM64 code (which does not have
hotplug support yet).

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/cpu_hotplug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index cc6513a..5f80802 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -4,6 +4,7 @@
 
 #include <xen/xen.h>
 #include <xen/xenbus.h>
+#include <xen/features.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/cpu.h>
@@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
 	static struct notifier_block xsn_cpu = {
 		.notifier_call = setup_cpu_watcher };
 
-	if (!xen_pv_domain())
+	/* PVH/ARM/ARM64 TBD/FIXME: future work */
+	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
 		return -ENODEV;
 
 	register_xenstore_notifier(&xsn_cpu);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2).
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (16 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-01  4:35 ` Konrad Rzeszutek Wilk
  2014-01-02 11:48   ` David Vrabel
  2014-01-02 16:50 ` [PATCH v12] Linux Xen PVH support David Vrabel
  2014-01-02 18:39 ` H. Peter Anvin
  19 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-01  4:35 UTC (permalink / raw)
  To: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH allows a PV Linux guest to utilize hardware extended capabilities,
such as running MMU updates in an HVM container.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Use .ascii and .asciz to define the Xen feature string. Note that the PVH
string must be on a single line (not split across multiple lines with \) to
keep the assembler from putting a null char after each string before the \.
This patch also allows PVH to be configured and enabled.
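
To see why the note is emitted as .ascii followed by .asciz, here is a
small C model of the two directives. It is a sketch only: the sketch_*
helpers are hypothetical and simply copy bytes the way the assembler
directives are documented to emit them (.ascii without a trailing NUL,
.asciz with one).

```c
#include <stddef.h>
#include <string.h>

/* Model of .ascii: emit the bytes of s with no trailing NUL. */
size_t sketch_emit_ascii(char *dst, const char *s)
{
	size_t n = strlen(s);

	memcpy(dst, s, n);
	return n;
}

/* Model of .asciz: emit the bytes of s followed by one NUL. */
size_t sketch_emit_asciz(char *dst, const char *s)
{
	size_t n = strlen(s) + 1;

	memcpy(dst, s, n);
	return n;
}

/*
 * Build one NUL-terminated feature string from a base part and an
 * optional extra part, as the ELF note in this patch does.
 * Returns the number of bytes written, including the final NUL.
 */
size_t sketch_build_features(char *dst, const char *base, const char *extra)
{
	size_t off = sketch_emit_ascii(dst, base);

	off += sketch_emit_asciz(dst + off, extra);
	return off;
}
```

Emitting both parts with the .asciz model instead would leave a NUL
after the first part, so a reader of the note would only see the base
features.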

Lastly remove some of the scaffolding.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/Kconfig       | 8 ++++++++
 arch/x86/xen/grant-table.c | 2 +-
 arch/x86/xen/xen-head.S    | 8 +++++++-
 include/xen/xen.h          | 4 +---
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1a3c765..161cc34 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -51,3 +51,11 @@ config XEN_DEBUG_FS
 	  Enable statistics output and various tuning options in debugfs.
 	  Enabling this option may incur a significant performance overhead.
 
+config XEN_PVH
+	bool "Support for running as a PVH guest"
+	depends on X86_64 && XEN && XEN_PVHVM
+	default n
+	help
+	   This option enables support for running as a PVH guest (PV guest
+	   using hardware extensions) under a suitably capable hypervisor.
+	   If unsure, say N.
diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 040e064..42635e9 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -125,7 +125,7 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
 	apply_to_page_range(&init_mm, (unsigned long)shared,
 			    PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL);
 }
-#ifdef CONFIG_XEN_PVHVM
+#ifdef CONFIG_XEN_PVH
 #include <xen/balloon.h>
 #include <linux/slab.h>
 static int __init xlated_setup_gnttab_pages(void)
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7faed58..56f42c0 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -13,6 +13,12 @@
 #include <xen/interface/elfnote.h>
 #include <asm/xen/interface.h>
 
+#ifdef CONFIG_XEN_PVH
+#define PVH_FEATURES_STR  "|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector"
+#else
+#define PVH_FEATURES_STR  ""
+#endif
+
 	__INIT
 ENTRY(startup_xen)
 	cld
@@ -95,7 +101,7 @@ NEXT_HYPERCALL(arch_6)
 #endif
 	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          _ASM_PTR startup_xen)
 	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz "!writable_page_tables|pae_pgdir_above_4gb")
+	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .ascii "!writable_page_tables|pae_pgdir_above_4gb"; .asciz PVH_FEATURES_STR)
 	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,       .asciz "yes")
 	ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz "generic")
 	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
diff --git a/include/xen/xen.h b/include/xen/xen.h
index c4ab644..0c0e3ef 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,9 +29,7 @@ extern enum xen_domain_type xen_domain_type;
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
-#ifdef CONFIG_XEN_PVHVM
-/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
-
+#ifdef CONFIG_XEN_PVH
 /* This functionality exists only for x86. The XEN_PVHVM support exists
  * only in x86 world - hence on ARM it will be always disabled.
  * N.B. ARM guests are neither PV nor HVM nor PVHVM.
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2).
  2014-01-01  4:35 ` [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 11:13   ` David Vrabel
  2014-01-03 15:33     ` Stefano Stabellini
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> Which is a PV guest with auto page translation enabled
> and with vector callback. It is a cross between PVHVM and PV.
> 
> The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> with modifications):
> 
> "* the guest uses auto translate:
>  - p2m is managed by Xen
>  - pagetables are owned by the guest
>  - mmu_update hypercall not available
> * it uses event callback and not vlapic emulation,
> * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> 
> For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
> PV guest with auto translate, although it does use hvm_op for setting
> callback vector."
> 
> We don't yet have a Kconfig entry set up as we do not
> have all the parts ready for it - so we piggyback
> on the PVHVM config option. This scaffolding will
> be removed later.
> 
> Note that on ARM the concept of PVH is non-existent. As Ian
> put it: "an ARM guest is neither PV nor HVM nor PVHVM.
> It's a bit like PVH but is different also (it's further towards
> the H end of the spectrum than even PVH).". As such these
> options (PVHVM, PVH) are never enabled nor seen on ARM
> compilations.
[...]
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -29,4 +29,20 @@ extern enum xen_domain_type xen_domain_type;
>  #define xen_initial_domain()	(0)
>  #endif	/* CONFIG_XEN_DOM0 */
>  
> +#ifdef CONFIG_XEN_PVHVM
> +/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */

This is a bit confusing.  I think it would be better to add the
CONFIG_XEN_PVH option with this patch but make it default n and not
possible to enable.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 04/18] xen/pvh: Don't setup P2M tree.
  2014-01-01  4:35 ` [PATCH v12 04/18] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
@ 2014-01-02 11:17   ` David Vrabel
  2014-01-03 15:41   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
[...]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code.
  2014-01-01  4:35 ` [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code Konrad Rzeszutek Wilk
@ 2014-01-02 11:21   ` David Vrabel
  2014-01-03 15:47   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> The revector and copying of the P2M only happens when
> !auto-xlat and on 64-bit builds. It is not obvious from
> the code, so let's have separate 32 and 64-bit functions.
> 
> We also invert the check for auto-xlat to make the code
> flow simpler.
[...]
> @@ -1255,8 +1251,19 @@ static void __init xen_pagetable_init(void)
>  	 * anything at this stage. */
>  	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
>  #endif
> -skip:
> +}
> +#else
> +static void __init xen_pagetable_p2m_copy(void)
> +{
> +	/* Nada! */
> +}
>  #endif
> +
> +static void __init xen_pagetable_init(void)
> +{
> +	paging_init();
> +	xen_setup_shared_info();

I would prefer

#ifdef CONFIG_X86_64

> +	xen_pagetable_p2m_copy();

#endif

rather than the empty stub function.  I think this makes it clearer what
is 64-bit specific.

>  	xen_post_allocator_init();
>  }
>  static void xen_write_cr2(unsigned long cr2)

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)
  2014-01-01  4:35 ` [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 11:24   ` David Vrabel
  2014-01-03  1:36     ` Mukesh Rathor
  2014-01-03 15:50     ` Stefano Stabellini
  0 siblings, 2 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> .. which are surprisingly small compared to the amount for PV code.
> 
> PVH uses mostly native mmu ops, we leave the generic (native_*) for
> the majority and just overwrite the baremetal with the ones we need.
> 
> We also optimize one - the TLB flush. The native operation would
> needlessly IPI offline VCPUs causing extra wakeups. Using the
> Xen one avoids that and lets the hypervisor determine which
> VCPU needs the TLB flush.

This TLB flush optimization should be a separate patch.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 07/18] xen/pvh: Setup up shared_info.
  2014-01-01  4:35 ` [PATCH v12 07/18] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
@ 2014-01-02 11:27   ` David Vrabel
  2014-01-02 18:23     ` Konrad Rzeszutek Wilk
  2014-01-03 14:39     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> For PVHVM the shared_info structure is provided via the same way
> as for normal PV guests (see include/xen/interface/xen.h).
> 
> That is during bootup we get 'xen_start_info' via the %esi register
> in startup_xen. Then later we extract the 'shared_info' from said
> structure (in xen_setup_shared_info) and start using it.
> 
> The 'xen_setup_shared_info' is all setup to work with auto-xlat
> guests, but there are two functions which it calls that are not:
> xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
> This patch modifies those to work in auto-xlat mode.
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1147,8 +1147,9 @@ void xen_setup_vcpu_info_placement(void)
>  		xen_vcpu_setup(cpu);
>  
>  	/* xen_vcpu_setup managed to place the vcpu_info within the
> -	   percpu area for all cpus, so make use of it */
> -	if (have_vcpu_info_placement) {
> +	 * percpu area for all cpus, so make use of it. Note that for
> +	 * PVH we want to use native IRQ mechanism. */
> +	if (have_vcpu_info_placement && !xen_pvh_domain()) {
>  		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
>  		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
>  		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);

Should this be in a separate patch: "xen/pvh: use native irq ops"?

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP.
  2014-01-01  4:35 ` [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
@ 2014-01-02 11:31   ` David Vrabel
  2014-01-02 18:24     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> During early bootup we start life using the Xen provided
> GDT, which means that we are running with %cs segment set
> to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).
> 
> But for PVH we want to use an HVM type mechanism for
> segment operations. As such we need to switch to the HVM
> one and also reload ourselves with the __KERNEL_CS:eip
> to run in the proper GDT and segment.
> 
> For HVM this is usually done in 'secondary_startup_64' in
> (head_64.S) but since we are not taking that bootup
> path (we start in PV - xen_start_kernel) we need to do
> that in the early PV bootup paths.
> 
> For good measure we also zero out the %fs, %ds, and %es
> (not strictly needed as Xen has already cleared them
> for us). The %gs is loaded by 'switch_to_new_gdt'.
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1414,8 +1414,43 @@ static void __init xen_boot_params_init_edd(void)
>   * we do this, we have to be careful not to call any stack-protected
>   * function, which is most of the kernel.
>   */
> -static void __init xen_setup_stackprotector(void)
> +static void __init xen_setup_gdt(void)
>  {
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> +#ifdef CONFIG_X86_64
> +		unsigned long dummy;
> +
> +		switch_to_new_gdt(0); /* GDT and GS set */
> +
> +		/* We are switching of the Xen provided GDT to our HVM mode
> +		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
> +		 * and we are jumping to reload it.
> +		 */
> +		asm volatile ("pushq %0\n"
> +			      "leaq 1f(%%rip),%0\n"
> +			      "pushq %0\n"
> +			      "lretq\n"
> +			      "1:\n"
> +			      : "=&r" (dummy) : "0" (__KERNEL_CS));
> +
> +		/*
> +		 * While not needed, we also set the %es, %ds, and %fs
> +		 * to zero. We don't care about %ss as it is NULL.
> +		 * Strictly speaking this is not needed as Xen zeros those
> +		 * out (and also MSR_FS_BASE, MSR_GS_BASE, MSR_KERNEL_GS_BASE)
> +		 *
> +		 * Linux zeros them in cpu_init() and in secondary_startup_64
> +		 * (for BSP).
> +		 */
> +		loadsegment(es, 0);
> +		loadsegment(ds, 0);
> +		loadsegment(fs, 0);
> +#else
> +		/* PVH: TODO Implement. */
> +		BUG();
> +#endif
> +		return;
> +	}
>  	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
>  	pv_cpu_ops.load_gdt = xen_load_gdt_boot;

If PVH uses native GDT why are these (and possibly other?) GDT ops needed?

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
  2014-01-01  4:35 ` [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-02 11:38   ` David Vrabel
  2014-01-03 16:40   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> The function gnttab_max_grant_frames() returns the maximum amount
> of frames (pages) of grants we can have. Unfortunately it was
> dependent on gnttab_init() having been run before to initialize
> the boot max value (boot_max_nr_grant_frames).
> 
> This meant that users of gnttab_max_grant_frames would always
> get a zero value if they called before gnttab_init() - such as
> 'platform_pci_init' (drivers/xen/platform-pci.c).
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

You can pull this out of the PVH series and merge early if you prefer.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init
  2014-01-01  4:35 ` [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-02 11:39   ` David Vrabel
  2014-01-03 16:43   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> We have this odd scenario of where for PV paths we take a shortcut
> but for the HVM paths we first ioremap xen_hvm_resume_frames, then
> assign it to gnttab_shared.addr. This is needed because gnttab_map
> uses gnttab_shared.addr.
> 
> Instead of having:
> 	if (pv)
> 		return gnttab_map
> 	if (hvm)
> 		...
> 
> 	gnttab_map
> 
> Let's move the HVM part before the gnttab_map and remove the
> first call to gnttab_map.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Again, feel free to apply early.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus.
  2014-01-01  4:35 ` [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
@ 2014-01-02 11:43   ` David Vrabel
  2014-01-03 17:22   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - there are certain things
> that work in it like HVM and some like PV. For the XenBus
> mechanism we want to use the PVHVM mechanism.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2)
  2014-01-01  4:35 ` [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 11:44   ` David Vrabel
  2014-01-03 16:22   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> As we do not have yet a mechanism for that.
> 
> This also impacts the ARM/ARM64 code (which does not have
> hotplug support yet).

Subject needs to be "xen/pvh: disable VCPU hotplug with PVH" or similar
since it's only disabling this one feature.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2).
  2014-01-01  4:35 ` [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 11:48   ` David Vrabel
  2014-01-02 18:27     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 11:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH allows PV linux guest to utilize hardware extended capabilities,
> such as running MMU updates in a HVM container.
> 
> The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> with modifications):
> 
> "* the guest uses auto translate:
>  - p2m is managed by Xen
>  - pagetables are owned by the guest
>  - mmu_update hypercall not available
> * it uses event callback and not vlapic emulation,
> * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> 
> For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
> PV guest with auto translate, although it does use hvm_op for setting
> callback vector."
> 
> Use .ascii and .asciz to define xen feature string. Note, the PVH
> string must be in a single line (not multiple lines with \) to keep the
> assembler from putting null char after each string before \.
> This patch allows it to be configured and enabled.
> 
> Lastly remove some of the scaffolding.
[...]
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -51,3 +51,11 @@ config XEN_DEBUG_FS
>  	  Enable statistics output and various tuning options in debugfs.
>  	  Enabling this option may incur a significant performance overhead.
>  
> +config XEN_PVH
> +	bool "Support for running as a PVH guest"
> +	depends on X86_64 && XEN && XEN_PVHVM

Would select XEN_PVHVM be more useful?  It may not be obvious to a user
that PV with hardware extension depends on HVM with PV extensions.

> +	default n
> +	help
> +	   This option enables support for running as a PVH guest (PV guest
> +	   using hardware extensions) under a suitably capable hypervisor.
> +	   If unsure, say N.

This help text needs to clearly state that PVH support is experimental
or a tech preview and the ABI is subject to change and PVH guests may
not run on newer hypervisors.  Unless the plan is to only merge the
Linux support once the hypervisor ABI is finalized.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-01  4:35 ` [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 15:32   ` David Vrabel
  2014-01-02 18:32     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 15:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> In the bootup code for PVH we can trap cpuid via vmexit, so don't
> need to use emulated prefix call. We also check for vector callback
> early on, as it is a required feature. PVH also runs at default kernel
> IOPL.
> 
> Finally, pure PV settings are moved to a separate function that are
> only called for pure PV, ie, pv with pvmmu. They are also #ifdef
> with CONFIG_XEN_PVMMU.
[...]
> @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>  		break;
>  	}
>  
> -	asm(XEN_EMULATE_PREFIX "cpuid"
> -		: "=a" (*ax),
> -		  "=b" (*bx),
> -		  "=c" (*cx),
> -		  "=d" (*dx)
> -		: "0" (*ax), "2" (*cx));
> +	if (xen_pvh_domain())
> +		native_cpuid(ax, bx, cx, dx);
> +	else
> +		asm(XEN_EMULATE_PREFIX "cpuid"
> +			: "=a" (*ax),
> +			"=b" (*bx),
> +			"=c" (*cx),
> +			"=d" (*dx)
> +			: "0" (*ax), "2" (*cx));

For this one-off cpuid call it seems preferable to me to use the
emulate prefix rather than diverge from PV.

> @@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
>  
>  	xen_domain_type = XEN_PV_DOMAIN;
>  
> +	xen_setup_features();
> +	xen_pvh_early_guest_init();
>  	xen_setup_machphys_mapping();
>  
>  	/* Install Xen paravirt ops */
>  	pv_info = xen_info;
>  	pv_init_ops = xen_init_ops;
> -	pv_cpu_ops = xen_cpu_ops;
>  	pv_apic_ops = xen_apic_ops;
> +	if (xen_pvh_domain())
> +		pv_cpu_ops.cpuid = xen_cpuid;
> +	else
> +		pv_cpu_ops = xen_cpu_ops;

If cpuid is trapped for PVH guests why does PVH need non-native cpuid op?

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2)
  2014-01-01  4:35 ` [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 15:43   ` David Vrabel
  2014-01-03 16:34   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 15:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - there are certain things
> that work in it like HVM and some like PV. There is
> a similar mode - PVHVM where we run in HVM mode with
> PV code enabled - and this patch explores that.
> 
> The most notable PV interfaces are the XenBus and event channels.
> 
> We will piggyback on how the event channel mechanism is
> used in PVHVM - that is we want the normal native IRQ mechanism
> and we will install a vector (hvm callback) for which we
> will call the event channel mechanism.
> 
> This means that from a pvops perspective, we can use
> native_irq_ops instead of the Xen PV specific. Albeit in the
> future we could support pirq_eoi_map. But that is
> a feature request that can be shared with PVHVM.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 09/18] xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  2014-01-01  4:35 ` [PATCH v12 09/18] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
@ 2014-01-02 16:07   ` David Vrabel
  0 siblings, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-02 16:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> The VCPU bringup protocol follows the PV with certain twists.
> From xen/include/public/arch-x86/xen.h:
> 
> Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
> for HVM and PVH guests, not all information in this structure is updated:
> 
>  - For HVM guests, the structures read include: fpu_ctxt (if
>  VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
> 
>  - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
>  set cr3. All other fields not used should be set to 0.
> 
> This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
> a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
> can load per-cpu data-structures. It has no effect on the VCPU0.
> 
> We also piggyback on the %rdi register to pass in the CPU number - so
> that when we bootup a new CPU, the cpu_bringup_and_idle will have
> passed as the first parameter the CPU number (via %rdi for 64-bit).
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1413,14 +1413,19 @@ static void __init xen_boot_params_init_edd(void)
>   * Set up the GDT and segment registers for -fstack-protector.  Until
>   * we do this, we have to be careful not to call any stack-protected
>   * function, which is most of the kernel.
> + *
> + * Note, that it is refok - b/c the only caller of this after init

Please spell out 'because' in full.  b/c is too hard to read.  Also list
the callers (cpu_bringup_and_idle() I guess).

> + * is PVH which is not going to use xen_load_gdt_boot or other
> + * __init functions.
>   */
> -static void __init xen_setup_gdt(void)
> +void __init_refok xen_setup_gdt(int cpu)

__ref seems to be the correct section marker for this.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-01  4:35 ` [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 16:14   ` David Vrabel
  2014-01-02 18:41     ` Konrad Rzeszutek Wilk
  2014-01-03 16:30   ` Stefano Stabellini
  1 sibling, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 16:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> In xen_add_extra_mem() we can skip updating P2M as it's managed
> by Xen. PVH maps the entire IO space, but only RAM pages need
> to be repopulated.

So this looks minimal but I can't work out what PVH actually needs to do
here.  This code really doesn't need to be made any more confusing.

I don't understand why the guest hasn't been supplied with a sensible
memory map that we can use as-is without playing all these games?

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-01  4:35 ` [PATCH v12 14/18] xen/grant: Implement an grant frame array struct Konrad Rzeszutek Wilk
@ 2014-01-02 16:27   ` David Vrabel
  2014-01-02 18:47     ` Konrad Rzeszutek Wilk
  2014-01-03 16:53   ` Stefano Stabellini
  1 sibling, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 16:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> The 'xen_hvm_resume_frames' used to be an 'unsigned long'
> and contain the virtual address of the grants. That was OK
> for most architectures (PVHVM, ARM) where the grants are contiguous
> in memory. That however is not the case for PVH - in which case
> we will have to do a lookup for each virtual address for the PFN.
> 
> Instead of doing that, lets make it a structure which will contain
> the array of PFNs, the virtual address and the count of said PFNs.
> 
> Also provide generic functions: gnttab_setup_auto_xlat_frames and
> gnttab_free_auto_xlat_frames to populate said structure with
> appropriate values for PVHVM and ARM.
> 
> To round it off, change the name from 'xen_hvm_resume_frames' to
> a more descriptive one - 'xen_auto_xlat_grant_frames'.
> 
> For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
> we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
[...]
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
[...]
> @@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
>  }
>  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
>  
> +int gnttab_setup_auto_xlat_frames(unsigned long addr)
> +{
> +	xen_pfn_t *pfn;
> +	unsigned int max_nr_gframes = __max_nr_grant_frames();
> +	int i;
> +
> +	if (xen_auto_xlat_grant_frames.count)
> +		return -EINVAL;
> +
> +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
> +	if (!pfn)
> +		return -ENOMEM;
> +	for (i = 0; i < max_nr_gframes; i++)
> +		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));

PFN_DOWN(addr) + i looks better to me.
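
To illustrate, here is a standalone userspace sketch (not the kernel
code). PAGE_SHIFT is assumed to be 12 and PFN_DOWN is redefined locally,
so it only shows that the two forms give the same frame number:

```c
#include <assert.h>

/* Local stand-ins for the kernel macros; 4 KiB pages are assumed. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

/* Form used in the patch: add the byte offset, then shift. */
static unsigned long pfn_patch(unsigned long addr, unsigned int i)
{
	return PFN_DOWN(addr + (i * PAGE_SIZE));
}

/* Suggested form: shift once, then index by frame. */
static unsigned long pfn_suggested(unsigned long addr, unsigned int i)
{
	return PFN_DOWN(addr) + i;
}

/* Assert that both forms agree for the first nr frames at addr. */
static void check_pfn_forms(unsigned long addr, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		assert(pfn_patch(addr, i) == pfn_suggested(addr, i));
}
```

Adding a multiple of PAGE_SIZE never touches the low PAGE_SHIFT bits,
so the results match even when addr is not page aligned.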

> +
> +	xen_auto_xlat_grant_frames.vaddr = addr;

Huh? addr is a physical address but you're assigning it to a field
called vaddr?  I think you mean to set this field to the result of the
xen_remap() call, yes?

> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -178,8 +178,15 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
>  			   grant_status_t **__shared);
>  void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
>  
> -extern unsigned long xen_hvm_resume_frames;
> +struct grant_frames {
> +	xen_pfn_t *pfn;
> +	int count;

unsigned int.

> +	unsigned long vaddr;

void * if this is a virtual address.
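
Roughly the shape I have in mind, as a userspace sketch only: xen_pfn_t
is a local stand-in typedef and grant_frames_set() is a made-up helper
that mirrors the "already set up" check in the patch:

```c
#include <errno.h>
#include <stddef.h>

typedef unsigned long xen_pfn_t;	/* stand-in for the kernel typedef */

struct grant_frames {
	xen_pfn_t *pfn;		/* one entry per grant frame */
	unsigned int count;	/* number of entries in pfn[] */
	void *vaddr;		/* virtual address of the mapped frames */
};

/* Hypothetical helper: record a mapping once, refuse a second setup. */
static int grant_frames_set(struct grant_frames *gf, xen_pfn_t *pfn,
			    unsigned int count, void *vaddr)
{
	if (gf->count)
		return -EINVAL;

	gf->pfn = pfn;
	gf->count = count;
	gf->vaddr = vaddr;
	return 0;
}
```

With count unsigned the loop index can be unsigned too, which avoids
the signed/unsigned comparison against max_nr_gframes.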

David


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-01  4:35 ` [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 16:32   ` David Vrabel
  2014-01-02 18:50     ` Konrad Rzeszutek Wilk
  2014-01-03 17:26   ` Stefano Stabellini
  1 sibling, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 16:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> In PVH the shared grant frame is the PFN and not MFN,
> hence it's mapped via the same code path as HVM.
> 
> The allocation of the grant frame is done differently - we
> do not use the early platform-pci driver and have an
> ioremap area - instead we use balloon memory and stitch
> all of the non-contiguous pages in a virtualized area.
> 
> That means when we call the hypervisor to replace the GMFN
> with a XENMAPSPACE_grant_table type, we need to lookup the
> old PFN for every iteration instead of assuming a flat
> contiguous PFN allocation.
> 
> Lastly, we only use v1 for grants. This is because PVHVM
> is not able to use v2 due to no XENMEM_add_to_physmap
> calls on the error status page (see commit
> 69e8f430e243d657c2053f097efebc2e2cd559f0
>  xen/granttable: Disable grant v2 for HVM domains.)
> 
> Until that is implemented this workaround has to
> be in place.
> 
> Also per suggestions by Stefano utilize the PVHVM paths
> as they share common functionality.
> 
> v2 of this patch moves most of the PVH code out in the
> arch/x86/xen/grant-table driver and touches only minimally
> the generic driver.
[...]
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
[...]
> +static int __init xen_pvh_gnttab_setup(void)
> +{
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	if (!xen_pv_domain())
> +		return -ENODEV;
> +
> +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> +		return -ENODEV;

Replace all these with if (!xen_pvh_domain()) ?
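
A rough userspace model of why the single check should be enough. It
assumes xen_pvh_domain() amounts to "PV domain with auto-translated
physmap"; if the real macro tests more than that, the two forms are
not strictly equivalent. The struct and enum here are made up for the
sketch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; only the helper names mirror the kernel. */
enum domain_type { NATIVE, PV_DOMAIN, HVM_DOMAIN };

struct guest {
	enum domain_type type;
	bool auto_xlat;		/* XENFEAT_auto_translated_physmap */
};

static bool xen_domain(const struct guest *g)
{
	return g->type != NATIVE;
}

static bool xen_pv_domain(const struct guest *g)
{
	return xen_domain(g) && g->type == PV_DOMAIN;
}

/* Assumption: PVH == PV guest with auto-translated physmap. */
static bool xen_pvh_domain(const struct guest *g)
{
	return xen_pv_domain(g) && g->auto_xlat;
}

/* The three separate checks from the patch; true means "carry on". */
static bool three_checks(const struct guest *g)
{
	if (!xen_domain(g))
		return false;
	if (!xen_pv_domain(g))
		return false;
	if (!g->auto_xlat)
		return false;
	return true;
}

/* Walk every state of the model and confirm both forms agree. */
static void check_all_states(void)
{
	int t, x;

	for (t = NATIVE; t <= HVM_DOMAIN; t++) {
		for (x = 0; x <= 1; x++) {
			struct guest g = { (enum domain_type)t, x != 0 };

			assert(three_checks(&g) == xen_pvh_domain(&g));
		}
	}
}
```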

> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
>  	return gnttab_init();
>  }
>  
> -core_initcall(__gnttab_init);
> +core_initcall_sync(__gnttab_init);

Why has this become _sync?

David


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12] Linux Xen PVH support.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (17 preceding siblings ...)
  2014-01-01  4:35 ` [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2) Konrad Rzeszutek Wilk
@ 2014-01-02 16:50 ` David Vrabel
  2014-01-02 19:02   ` Konrad Rzeszutek Wilk
  2014-01-02 18:39 ` H. Peter Anvin
  19 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-02 16:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> The patches, also available at
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v12
> 
> implements the neccessary functionality to boot a PV guest in PVH mode.

In general this looks in much better shape now.  Some of the refactoring
patches should be queued for 3.14.

I'm not sure when the rest should go in, given that the PVH
hypervisor ABI is not yet finalized and is missing support for a number
of things with no visible plan for how/when/if this missing
functionality will be implemented.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 07/18] xen/pvh: Setup up shared_info.
  2014-01-02 11:27   ` David Vrabel
@ 2014-01-02 18:23     ` Konrad Rzeszutek Wilk
  2014-01-03 14:39     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:23 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 11:27:56AM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > For PVHVM the shared_info structure is provided via the same way
> > as for normal PV guests (see include/xen/interface/xen.h).
> > 
> > That is during bootup we get 'xen_start_info' via the %esi register
> > in startup_xen. Then later we extract the 'shared_info' from said
> > structure (in xen_setup_shared_info) and start using it.
> > 
> > The 'xen_setup_shared_info' is all setup to work with auto-xlat
> > guests, but there are two functions which it calls that are not:
> > xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
> > This patch modifies those to work in auto-xlat mode.
> [...]
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1147,8 +1147,9 @@ void xen_setup_vcpu_info_placement(void)
> >  		xen_vcpu_setup(cpu);
> >  
> >  	/* xen_vcpu_setup managed to place the vcpu_info within the
> > -	   percpu area for all cpus, so make use of it */
> > -	if (have_vcpu_info_placement) {
> > +	 * percpu area for all cpus, so make use of it. Note that for
> > +	 * PVH we want to use native IRQ mechanism. */
> > +	if (have_vcpu_info_placement && !xen_pvh_domain()) {
> >  		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
> >  		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
> >  		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
> 
> Should this be in a separate patch: "xen/pvh: use native irq ops"?

Good idea. Initially it was part of the event channel one, but I split
it.
> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP.
  2014-01-02 11:31   ` David Vrabel
@ 2014-01-02 18:24     ` Konrad Rzeszutek Wilk
  2014-01-03 11:27       ` David Vrabel
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:24 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

> > +		loadsegment(es, 0);
> > +		loadsegment(ds, 0);
> > +		loadsegment(fs, 0);
> > +#else
> > +		/* PVH: TODO Implement. */
> > +		BUG();
> > +#endif
> > +		return;    <==============
> > +	}
> >  	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
> >  	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
> 
> If PVH uses native GDT why are these (and possibly other?) GDT ops needed?

They aren't. There is a 'return' there. I marked it for you with
'<======'.


> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2).
  2014-01-02 11:48   ` David Vrabel
@ 2014-01-02 18:27     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:27 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 11:48:50AM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > PVH allows PV linux guest to utilize hardware extended capabilities,
> > such as running MMU updates in a HVM container.
> > 
> > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > with modifications):
> > 
> > "* the guest uses auto translate:
> >  - p2m is managed by Xen
> >  - pagetables are owned by the guest
> >  - mmu_update hypercall not available
> > * it uses event callback and not vlapic emulation,
> > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > 
> > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
> > PV guest with auto translate, although it does use hvm_op for setting
> > callback vector."
> > 
> > Use .ascii and .asciz to define xen feature string. Note, the PVH
> > string must be in a single line (not multiple lines with \) to keep the
> > assembler from putting null char after each string before \.
> > This patch allows it to be configured and enabled.
> > 
> > Lastly remove some of the scaffolding.
> [...]
> > --- a/arch/x86/xen/Kconfig
> > +++ b/arch/x86/xen/Kconfig
> > @@ -51,3 +51,11 @@ config XEN_DEBUG_FS
> >  	  Enable statistics output and various tuning options in debugfs.
> >  	  Enabling this option may incur a significant performance overhead.
> >  
> > +config XEN_PVH
> > +	bool "Support for running as a PVH guest"
> > +	depends on X86_64 && XEN && XEN_PVHVM
> 
> Would select XEN_PVHVM be more useful?  It may not be obvious to a user

Sure.
> that PV with hardware extension depends on HVM with PV extensions.
> 
> > +	default n
> > +	help
> > +	   This option enables support for running as a PVH guest (PV guest
> > +	   using hardware extensions) under a suitably capable hypervisor.
> > +	   If unsure, say N.
> 
> This help text needs to clearly state that PVH support is experimental
> or a tech preview and the ABI is subject to change and PVH guests may
> not run on newer hypervisors.  Unless the plan is to only merge the
> Linux support once the hypervisor ABI is finalized.

I am very comfortable marking it as experimental and a tech preview,
with the caveat that it: 1) will (or probably will) change in future
Xen versions, and 2) won't cause regressions with older hypervisors.
In other words, enabling this option should not make the kernel stop
working with say Xen 4.1.

[Which we need to fix of course]


> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-02 15:32   ` David Vrabel
@ 2014-01-02 18:32     ` Konrad Rzeszutek Wilk
  2014-01-03  1:34       ` Mukesh Rathor
  2014-01-03 11:25       ` David Vrabel
  0 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:32 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > need to use emulated prefix call. We also check for vector callback
> > early on, as it is a required feature. PVH also runs at default kernel
> > IOPL.
> > 
> > Finally, pure PV settings are moved to a separate function that are
> > only called for pure PV, ie, pv with pvmmu. They are also #ifdef
> > with CONFIG_XEN_PVMMU.
> [...]
> > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
> >  		break;
> >  	}
> >  
> > -	asm(XEN_EMULATE_PREFIX "cpuid"
> > -		: "=a" (*ax),
> > -		  "=b" (*bx),
> > -		  "=c" (*cx),
> > -		  "=d" (*dx)
> > -		: "0" (*ax), "2" (*cx));
> > +	if (xen_pvh_domain())
> > +		native_cpuid(ax, bx, cx, dx);
> > +	else
> > +		asm(XEN_EMULATE_PREFIX "cpuid"
> > +			: "=a" (*ax),
> > +			"=b" (*bx),
> > +			"=c" (*cx),
> > +			"=d" (*dx)
> > +			: "0" (*ax), "2" (*cx));
> 
> For this one-off cpuid call it seems preferable to me to use the
> emulate prefix rather than diverge from PV.

This was before the PV cpuid was deemed OK to be used on PVH.
Will rip this out to use the same version.

> 
> > @@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
> >  
> >  	xen_domain_type = XEN_PV_DOMAIN;
> >  
> > +	xen_setup_features();
> > +	xen_pvh_early_guest_init();
> >  	xen_setup_machphys_mapping();
> >  
> >  	/* Install Xen paravirt ops */
> >  	pv_info = xen_info;
> >  	pv_init_ops = xen_init_ops;
> > -	pv_cpu_ops = xen_cpu_ops;
> >  	pv_apic_ops = xen_apic_ops;
> > +	if (xen_pvh_domain())
> > +		pv_cpu_ops.cpuid = xen_cpuid;
> > +	else
> > +		pv_cpu_ops = xen_cpu_ops;
> 
> If cpuid is trapped for PVH guests why does PVH need non-native cpuid op?

There are a couple of filters applied to the cpuid. But with HVM I am
not entirely sure whether it is worth preserving those.

My fear is that if we switch over to the native one without the
filtering that the kernel does we open up a can of worms that had been
closed in the past. The reason is that for dom0 - there is no cpuid
filtering being done. So it gets everything that the hypervisor sees.

Which we don't want to do for APERF (because the generic scheduler code
will try to use those MSRs), and then there are the ACPI extended C-states.

Perhaps a better thing is to keep the xen_cpuid but
put a big comment in saying: "/* We should use native, but we need to
filter some cpuid's out. TODO */"

?
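
Roughly what I mean by filtering, as a userspace sketch. The names are
made up and only the APERF/MPERF bit (CPUID leaf 6, ECX bit 0) is
shown; the real xen_cpuid masks more than this:

```c
#include <stdint.h>

#define CPUID_LEAF_THERM_POWER	0x06		/* thermal/power management leaf */
#define CPUID6_ECX_APERFMPERF	(1u << 0)	/* IA32_APERF/IA32_MPERF present */

/*
 * Hypothetical filter: takes the raw ECX value the hardware (or the
 * hypervisor) returned for 'leaf' and hides bits the guest kernel
 * should not act on.
 */
static void filter_cpuid_ecx(uint32_t leaf, uint32_t *ecx)
{
	switch (leaf) {
	case CPUID_LEAF_THERM_POWER:
		/* Keep the scheduler away from the APERF/MPERF MSRs. */
		*ecx &= ~CPUID6_ECX_APERFMPERF;
		break;
	default:
		/* Everything else is passed through untouched. */
		break;
	}
}
```

With native cpuid and no such filter, dom0 would see the bit set
whenever the hardware has it.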
> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12] Linux Xen PVH support.
  2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
                   ` (18 preceding siblings ...)
  2014-01-02 16:50 ` [PATCH v12] Linux Xen PVH support David Vrabel
@ 2014-01-02 18:39 ` H. Peter Anvin
  2014-01-02 19:12   ` Konrad Rzeszutek Wilk
  19 siblings, 1 reply; 90+ messages in thread
From: H. Peter Anvin @ 2014-01-02 18:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, linux-kernel, xen-devel, boris.ostrovsky,
	david.vrabel, stefano.stabellini, mukesh.rathor

On 12/31/2013 08:35 PM, Konrad Rzeszutek Wilk wrote:
> The patches, also available at
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v12
> 
> implements the neccessary functionality to boot a PV guest in PVH mode.
> 

As x86 maintainer I would like to see a list of what pvops are necessary
in PVH mode.  Obviously the hope is that the really invasive ones will
not be necessary (and there are good reasons to believe that is within
reach.)

	-hpa



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-02 16:14   ` David Vrabel
@ 2014-01-02 18:41     ` Konrad Rzeszutek Wilk
  2014-01-04  1:23       ` Mukesh Rathor
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:41 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 04:14:32PM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > by Xen. PVH maps the entire IO space, but only RAM pages need
> > to be repopulated.
> 
> So this looks minimal but I can't work out what PVH actually needs to do
> here.  This code really doesn't need to be made any more confusing.

I gather you prefer Mukesh's original version?

https://lkml.org/lkml/2013/12/18/710
> 
> I don't understand why the guest hasn't been supplied with a sensible
> memory map that we can use as-is without playing all these games?

Take dom0_mem=3G,max:7G. The E820 and the P2M setup in the hypervisor have
a sensible layout (aka, 1-1). But the shared_info.nr_pages doesn't tell
us that - it instead gives us just the number of pages.

Which is OK, but if it is different from what you would expect from
the E820 (as in, the number of E820_RAM pages is different from
nr_pages), then you need to set up some of the E820 regions as
balloon memory, but without real memory behind them.

Unless the hypervisor filters out the E820 that we get through
'XENMEM_machine_memory_map'?

This should not be (and it did not look to be) a problem with the
E820 that is set up by the toolstack.
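
To make the mismatch concrete, here is a minimal, self-contained sketch
of the accounting described above. It is not the kernel code: the
struct layout, the sketch_* names and the constants are assumptions
made purely for illustration. It counts the pages covered by RAM-typed
E820 entries and reports how many of them have no real memory behind
them when nr_pages is smaller.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_E820_RAM		1
#define SKETCH_E820_RESERVED	2

/* Hypothetical, simplified E820 entry (not the kernel's struct). */
struct sketch_e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Count the whole pages covered by RAM-typed entries. */
static uint64_t sketch_e820_ram_pages(const struct sketch_e820_entry *map,
				      unsigned int nr)
{
	uint64_t pages = 0;
	unsigned int i;

	for (i = 0; i < nr; i++)
		if (map[i].type == SKETCH_E820_RAM)
			pages += map[i].size >> SKETCH_PAGE_SHIFT;
	return pages;
}

/*
 * Pages that the E820 calls RAM but that nr_pages does not cover.
 * These are the ones that would have to be treated as balloon space.
 */
static uint64_t sketch_unpopulated_pages(uint64_t e820_ram_pages,
					 uint64_t nr_pages)
{
	return e820_ram_pages > nr_pages ? e820_ram_pages - nr_pages : 0;
}
```

For example, if the E820 described 7G of RAM while nr_pages
corresponded to 3G, sketch_unpopulated_pages() would return the 4G
worth of pages that need to be marked as balloon memory.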

> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-02 16:27   ` David Vrabel
@ 2014-01-02 18:47     ` Konrad Rzeszutek Wilk
  2014-01-03 12:11       ` [Xen-devel] " David Vrabel
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:47 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 04:27:19PM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > The 'xen_hvm_resume_frames' used to be an 'unsigned long'
> > and contain the virtual address of the grants. That was OK
> > for most architectures (PVHVM, ARM) were the grants are contingous
> > in memory. That however is not the case for PVH - in which case
> > we will have to do a lookup for each virtual address for the PFN.
> > 
> > Instead of doing that, lets make it a structure which will contain
> > the array of PFNs, the virtual address and the count of said PFNs.
> > 
> > Also provide a generic functions: gnttab_setup_auto_xlat_frames and
> > gnttab_free_auto_xlat_frames to populate said structure with
> > appropiate values for PVHVM and ARM.
> > 
> > To round it off, change the name from 'xen_hvm_resume_frames' to
> > a more descriptive one - 'xen_auto_xlat_grant_frames'.
> > 
> > For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
> > we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
> [...]
> > --- a/drivers/xen/grant-table.c
> > +++ b/drivers/xen/grant-table.c
> [...]
> > @@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
> >  }
> >  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
> >  
> > +int gnttab_setup_auto_xlat_frames(unsigned long addr)
> > +{
> > +	xen_pfn_t *pfn;
> > +	unsigned int max_nr_gframes = __max_nr_grant_frames();
> > +	int i;
> > +
> > +	if (xen_auto_xlat_grant_frames.count)
> > +		return -EINVAL;
> > +
> > +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
> > +	if (!pfn)
> > +		return -ENOMEM;
> > +	for (i = 0; i < max_nr_gframes; i++)
> > +		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));
> 
> PFN_DOWN(addr) + i looks better to me.
> 
> > +
> > +	xen_auto_xlat_grant_frames.vaddr = addr;
> 
> Huh? addr is a physical address but you're assigning it to a field
> called vaddr?  I think you mean to set this field to the result of the
> xen_remap() call, yes?

It ends up doing that in gnttab_init. Not to
xen_auto_xlat_grant_frames.vaddr but to gnttab_shared.addr.

But not for PVH, which already has done so (via vmap).

It is kind of silly - for PVHVM we use a physical address (the MMIO
of the platform-pci) and ioremap it. For PVH, we need to use balloon
memory and vmap it. We can't use ioremap on it because they are RAM
pages and ioremap will complain.

We end up with special casing - for PVHVM do ioremap, for PVH, just
assign it to gnttab_shared.addr as it already has a virtual address.

Perhaps I should just make this a union field? 
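
A rough sketch of what such a union could look like follows. All of
the sketch_* names, the 'kind' tag and the identity "remap" stand-in
are hypothetical and exist only so that the example compiles on its
own; this is not the code from the patch. It also fills the PFN array
with PFN_DOWN(addr) + i, as David suggested, for the physically
contiguous case.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PFN_DOWN(x)	((x) >> SKETCH_PAGE_SHIFT)

enum sketch_frames_kind {
	SKETCH_FRAMES_PHYS,	/* PVHVM-like: physical address, must be remapped */
	SKETCH_FRAMES_VIRT,	/* PVH-like: already vmap'ed, use as-is */
};

struct sketch_grant_frames {
	unsigned long *pfn;
	unsigned int count;
	enum sketch_frames_kind kind;
	union {
		unsigned long paddr;
		void *vaddr;
	} u;
};

/* Stand-in for ioremap(); a real kernel would create a mapping here. */
static void *sketch_identity_remap(unsigned long paddr, unsigned long size)
{
	(void)size;
	return (void *)(uintptr_t)paddr;
}

/* Describe a physically contiguous region: one PFN per page. */
static void sketch_fill_contiguous(struct sketch_grant_frames *f,
				   unsigned long *pfns, unsigned int count,
				   unsigned long paddr)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		pfns[i] = SKETCH_PFN_DOWN(paddr) + i;

	f->pfn = pfns;
	f->count = count;
	f->kind = SKETCH_FRAMES_PHYS;
	f->u.paddr = paddr;
}

/* Return a usable virtual address regardless of how the frames were set up. */
static void *sketch_shared_addr(const struct sketch_grant_frames *f,
				void *(*remap)(unsigned long, unsigned long))
{
	if (f->kind == SKETCH_FRAMES_VIRT)
		return f->u.vaddr;
	return remap(f->u.paddr, (unsigned long)f->count << SKETCH_PAGE_SHIFT);
}
```

The tag makes the special casing explicit in one place instead of
overloading a single 'vaddr' field with two meanings.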
> 
> > --- a/include/xen/grant_table.h
> > +++ b/include/xen/grant_table.h
> > @@ -178,8 +178,15 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
> >  			   grant_status_t **__shared);
> >  void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
> >  
> > -extern unsigned long xen_hvm_resume_frames;
> > +struct grant_frames {
> > +	xen_pfn_t *pfn;
> > +	int count;
> 
> unsigned int.
> 
> > +	unsigned long vaddr;
> 
> void * if this is a virtual address.

It is a physical address for PVHVM, and a virtual address for PVH
(see above rant). 
> 
> David
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-02 16:32   ` David Vrabel
@ 2014-01-02 18:50     ` Konrad Rzeszutek Wilk
  2014-01-03 11:54       ` David Vrabel
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 18:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > In PVH the shared grant frame is the PFN and not MFN,
> > hence its mapped via the same code path as HVM.
> > 
> > The allocation of the grant frame is done differently - we
> > do not use the early platform-pci driver and have an
> > ioremap area - instead we use balloon memory and stitch
> > all of the non-contingous pages in a virtualized area.
> > 
> > That means when we call the hypervisor to replace the GMFN
> > with a XENMAPSPACE_grant_table type, we need to lookup the
> > old PFN for every iteration instead of assuming a flat
> > contingous PFN allocation.
> > 
> > Lastly, we only use v1 for grants. This is because PVHVM
> > is not able to use v2 due to no XENMEM_add_to_physmap
> > calls on the error status page (see commit
> > 69e8f430e243d657c2053f097efebc2e2cd559f0
> >  xen/granttable: Disable grant v2 for HVM domains.)
> > 
> > Until that is implemented this workaround has to
> > be in place.
> > 
> > Also per suggestions by Stefano utilize the PVHVM paths
> > as they share common functionality.
> > 
> > v2 of this patch moves most of the PVH code out in the
> > arch/x86/xen/grant-table driver and touches only minimally
> > the generic driver.
> [...]
> > --- a/arch/x86/xen/grant-table.c
> > +++ b/arch/x86/xen/grant-table.c
> [...]
> > +static int __init xen_pvh_gnttab_setup(void)
> > +{
> > +	if (!xen_domain())
> > +		return -ENODEV;
> > +
> > +	if (!xen_pv_domain())
> > +		return -ENODEV;
> > +
> > +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> > +		return -ENODEV;
> 
> Replace all these with if (!xen_pvh_domain()) ?

Yes.
> 
> > @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> >  	return gnttab_init();
> >  }
> >  
> > -core_initcall(__gnttab_init);
> > +core_initcall_sync(__gnttab_init);
> 
> Why has this become _sync?

It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
at gnttab_init):

+core_initcall(xen_pvh_gnttab_setup); /* Call it _before_ __gnttab_init */

Otherwise __gnttab_init will try to use the xen_auto_xlat_grant_frames
that xen_pvh_gnttab_setup has not yet set up.
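
The ordering argument can be modelled in userspace. The sketch below is
a toy, not kernel code: it assumes (as include/linux/init.h defines the
levels) that core_initcall is level 1 and core_initcall_sync is level
1s, and that all level 1 calls run before any level 1s call. The
sketch_* names are made up for the example.

```c
#include <stddef.h>

/* A subset of the initcall levels, in the order they are run. */
enum sketch_level {
	SKETCH_PURE,		/* level 0  */
	SKETCH_CORE,		/* level 1  */
	SKETCH_CORE_SYNC,	/* level 1s */
	SKETCH_POSTCORE,	/* level 2  */
	SKETCH_NR_LEVELS,
};

struct sketch_initcall {
	enum sketch_level level;
	int (*fn)(void);
};

static int sketch_frames_ready;
static int sketch_gnttab_saw_frames;

/* Models xen_pvh_gnttab_setup(): populates the frames. */
static int sketch_pvh_gnttab_setup(void)
{
	sketch_frames_ready = 1;
	return 0;
}

/* Models __gnttab_init(): records whether the frames were there. */
static int sketch_gnttab_init(void)
{
	sketch_gnttab_saw_frames = sketch_frames_ready;
	return 0;
}

/* Run every registered call, one level at a time. */
static void sketch_do_initcalls(const struct sketch_initcall *calls, size_t nr)
{
	size_t i;
	int level;

	for (level = 0; level < SKETCH_NR_LEVELS; level++)
		for (i = 0; i < nr; i++)
			if ((int)calls[i].level == level)
				calls[i].fn();
}
```

Even when the consumer is registered first in the table, the level
loop runs the producer (level 1) before it (level 1s). Two calls at
the same level would, as far as I understand, only be ordered by link
order, which is why both being plain core_initcall would not be
enough.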

Do you think I should: a) expand the comment in 'xen_pvh_gnttab_setup'
to mention this, or b) put it in the commit description, or c) what is
there is OK?
> 
> David
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12] Linux Xen PVH support.
  2014-01-02 16:50 ` [PATCH v12] Linux Xen PVH support David Vrabel
@ 2014-01-02 19:02   ` Konrad Rzeszutek Wilk
  2014-01-03 13:37     ` David Vrabel
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 19:02 UTC (permalink / raw)
  To: David Vrabel, boris.ostrovsky
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 04:50:14PM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > The patches, also available at
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v12
> > 
> > implements the neccessary functionality to boot a PV guest in PVH mode.
> 
> In general this looks in much better shape now.  Some of the refactoring
> patches should be queued for 3.14.

<nods> Thank you for your review!
> 
> I'm not sure if when the rest wants to go in given that the PVH
> hypervisor ABI is not yet finalized and is missing support for a number
> of things with no visible plan for how/when/if this missing
> functionality will be implemented.

We could follow the same path that Xen ARM in Linux did.

They put a thick stick in the ground with the caveat that this is
experimental. And we can do the same thing and lift the stick when we
are sure (mostly?) that it all works.

In regards to 'missing how/when/if': as you can see from Xen's TODO
there are quite a few items left - AMD support, shadow, etc, so it
isn't just in the kernel.

And on the Linux side things need to be tested out and figured out.

I can't give you a 'when', but the 'how/if' will be addressed. We are
under-staffed so it will take time. I sincerely hope that anybody who is
interested in PVH will help as well.

I presume that the path is going to be similar to how dom0 was
added in Linux - it took a couple of releases before it was OK, and
things kept being added to cover that extra 'Uh-oh we forgot that'.
There are still some missing pieces.

Anyhow that said - I believe this decision should be yours and Boris's.

P.S.
Happy 2014!

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12] Linux Xen PVH support.
  2014-01-02 18:39 ` H. Peter Anvin
@ 2014-01-02 19:12   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-02 19:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Thu, Jan 02, 2014 at 10:39:34AM -0800, H. Peter Anvin wrote:
> On 12/31/2013 08:35 PM, Konrad Rzeszutek Wilk wrote:
> > The patches, also available at
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v12
> > 
> > implements the neccessary functionality to boot a PV guest in PVH mode.
> > 
> 
> As x86 maintainer I would like to see a list of what pvops are necessary
> in PVH mode.  Obviously the hope is that the really invasive ones will

This patchset uses these:
+               pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+               pv_cpu_ops.cpuid = xen_cpuid;

These are still in the code:

 	pv_info = xen_info;
        pv_init_ops = xen_init_ops;
	pv_apic_ops = xen_apic_ops;
	pv_time_ops = xen_time_ops;

And the x86_init, apic, and smp_ops ops are still in force.

This is just the first step so there might be some other ones
that are needed that I failed to enumerate.

We are at the infancy period.
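
As an illustration of the difference between overriding one op and
replacing the whole table, here is a small standalone model. The
struct, its three members and the sketch_* helpers are hypothetical
and much smaller than the real pv_cpu_ops; each op only returns a tag
saying which implementation it is.

```c
enum { SKETCH_NATIVE = 0, SKETCH_XEN = 1 };

static int sketch_native_op(void) { return SKETCH_NATIVE; }
static int sketch_xen_op(void)    { return SKETCH_XEN; }

/* Hypothetical, cut-down ops table. */
struct sketch_cpu_ops {
	int (*cpuid)(void);
	int (*load_gdt)(void);
	int (*iret)(void);
};

static const struct sketch_cpu_ops sketch_native_cpu_ops = {
	.cpuid    = sketch_native_op,
	.load_gdt = sketch_native_op,
	.iret     = sketch_native_op,
};

static const struct sketch_cpu_ops sketch_xen_cpu_ops = {
	.cpuid    = sketch_xen_op,
	.load_gdt = sketch_xen_op,
	.iret     = sketch_xen_op,
};

/*
 * PVH-like guest: start from the native table and override a single op.
 * PV-like guest: replace the whole table.
 */
static struct sketch_cpu_ops sketch_install(int is_pvh)
{
	struct sketch_cpu_ops ops = sketch_native_cpu_ops;

	if (is_pvh)
		ops.cpuid = sketch_xen_cpu_ops.cpuid;
	else
		ops = sketch_xen_cpu_ops;
	return ops;
}
```

With is_pvh set, only cpuid goes through the Xen variant and
everything else stays native, which is the shape of the list above.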

> not be necessary (and there are good reasons to believe that is within
> reach.)
> 
> 	-hpa
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-02 18:32     ` Konrad Rzeszutek Wilk
@ 2014-01-03  1:34       ` Mukesh Rathor
  2014-01-03 11:29         ` David Vrabel
  2014-01-03 17:35         ` Konrad Rzeszutek Wilk
  2014-01-03 11:25       ` David Vrabel
  1 sibling, 2 replies; 90+ messages in thread
From: Mukesh Rathor @ 2014-01-03  1:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On Thu, 2 Jan 2014 13:32:21 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > > need to use emulated prefix call. We also check for vector
> > > callback early on, as it is a required feature. PVH also runs at
> > > default kernel IOPL.
> > > 
> > > Finally, pure PV settings are moved to a separate function that
> > > are only called for pure PV, ie, pv with pvmmu. They are also
> > > #ifdef with CONFIG_XEN_PVMMU.
> > [...]
> > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > unsigned int *bx, break;
> > >  	}
> > >  
> > > -	asm(XEN_EMULATE_PREFIX "cpuid"
> > > -		: "=a" (*ax),
> > > -		  "=b" (*bx),
> > > -		  "=c" (*cx),
> > > -		  "=d" (*dx)
> > > -		: "0" (*ax), "2" (*cx));
> > > +	if (xen_pvh_domain())
> > > +		native_cpuid(ax, bx, cx, dx);
> > > +	else
> > > +		asm(XEN_EMULATE_PREFIX "cpuid"
> > > +			: "=a" (*ax),
> > > +			"=b" (*bx),
> > > +			"=c" (*cx),
> > > +			"=d" (*dx)
> > > +			: "0" (*ax), "2" (*cx));
> > 
> > For this one off cpuid call it seems preferrable to me to use the
> > emulate prefix rather than diverge from PV.
> 
> This was before the PV cpuid was deemed OK to be used on PVH.
> Will rip this out to use the same version.

What's wrong with using native cpuid? That is one of the benefits:
cpuid can be trapped via vmexit, and there is also talk of making the
PV cpuid trap obsolete in the future. I suggest leaving it native.

Mukesh


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)
  2014-01-02 11:24   ` David Vrabel
@ 2014-01-03  1:36     ` Mukesh Rathor
  2014-01-03 10:14       ` David Vrabel
  2014-01-03 15:50     ` Stefano Stabellini
  1 sibling, 1 reply; 90+ messages in thread
From: Mukesh Rathor @ 2014-01-03  1:36 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On Thu, 2 Jan 2014 11:24:50 +0000
David Vrabel <david.vrabel@citrix.com> wrote:

> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > .. which are surprinsingly small compared to the amount for PV code.
> > 
> > PVH uses mostly native mmu ops, we leave the generic (native_*) for
> > the majority and just overwrite the baremetal with the ones we need.
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> 
> This TLB flush optimization should be a separate patch.

It's not really an "optimization"; we are using the PV mechanism
instead of the native one because the PV one performs better. So I
think it's OK for it to belong here.

Mukesh


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)
  2014-01-03  1:36     ` Mukesh Rathor
@ 2014-01-03 10:14       ` David Vrabel
  0 siblings, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-03 10:14 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: Konrad Rzeszutek Wilk, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On 03/01/14 01:36, Mukesh Rathor wrote:
> On Thu, 2 Jan 2014 11:24:50 +0000
> David Vrabel <david.vrabel@citrix.com> wrote:
> 
>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>> From: Mukesh Rathor <mukesh.rathor@oracle.com>
>>>
>>> .. which are surprinsingly small compared to the amount for PV code.
>>>
>>> PVH uses mostly native mmu ops, we leave the generic (native_*) for
>>> the majority and just overwrite the baremetal with the ones we need.
>>>
>>> We also optimize one - the TLB flush. The native operation would
>>> needlessly IPI offline VCPUs causing extra wakeups. Using the
>>> Xen one avoids that and lets the hypervisor determine which
>>> VCPU needs the TLB flush.
>>
>> This TLB flush optimization should be a separate patch.
> 
> It's not really an "optimization", we are using PV mechanism instead
> of native because PV one performs better.

Um.  Isn't that the very definition of an optimization?

I do think it is better for the essential MMU changes to be clearly
separate from the optional ones.

David


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-02 18:32     ` Konrad Rzeszutek Wilk
  2014-01-03  1:34       ` Mukesh Rathor
@ 2014-01-03 11:25       ` David Vrabel
  1 sibling, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-03 11:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 02/01/14 18:32, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>> From: Mukesh Rathor <mukesh.rathor@oracle.com>
>>>
>>> In the bootup code for PVH we can trap cpuid via vmexit, so don't
>>> need to use emulated prefix call. We also check for vector callback
>>> early on, as it is a required feature. PVH also runs at default kernel
>>> IOPL.
>>>
>>> Finally, pure PV settings are moved to a separate function that are
>>> only called for pure PV, ie, pv with pvmmu. They are also #ifdef
>>> with CONFIG_XEN_PVMMU.
>> [...]
>>> @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>>>  		break;
>>>  	}
>>>  
>>> -	asm(XEN_EMULATE_PREFIX "cpuid"
>>> -		: "=a" (*ax),
>>> -		  "=b" (*bx),
>>> -		  "=c" (*cx),
>>> -		  "=d" (*dx)
>>> -		: "0" (*ax), "2" (*cx));
>>> +	if (xen_pvh_domain())
>>> +		native_cpuid(ax, bx, cx, dx);
>>> +	else
>>> +		asm(XEN_EMULATE_PREFIX "cpuid"
>>> +			: "=a" (*ax),
>>> +			"=b" (*bx),
>>> +			"=c" (*cx),
>>> +			"=d" (*dx)
>>> +			: "0" (*ax), "2" (*cx));
>>
>> For this one off cpuid call it seems preferrable to me to use the
>> emulate prefix rather than diverge from PV.
> 
> This was before the PV cpuid was deemed OK to be used on PVH.
> Will rip this out to use the same version.
> 
>>
>>> @@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
>>>  
>>>  	xen_domain_type = XEN_PV_DOMAIN;
>>>  
>>> +	xen_setup_features();
>>> +	xen_pvh_early_guest_init();
>>>  	xen_setup_machphys_mapping();
>>>  
>>>  	/* Install Xen paravirt ops */
>>>  	pv_info = xen_info;
>>>  	pv_init_ops = xen_init_ops;
>>> -	pv_cpu_ops = xen_cpu_ops;
>>>  	pv_apic_ops = xen_apic_ops;
>>> +	if (xen_pvh_domain())
>>> +		pv_cpu_ops.cpuid = xen_cpuid;
>>> +	else
>>> +		pv_cpu_ops = xen_cpu_ops;
>>
>> If cpuid is trapped for PVH guests why does PVH need non-native cpuid op?
> 
> There are a couple of filtering done on the cpuid. But with HVM I am
> not entirely sure if it is worth preserving those or not.

I think we should behave like HVM for cpuid and any cpuid
policy/filtering should be set up by the toolstack.

> My fear is that if we switch over to the native one without the
> filtering that the kernel does we open up a can of worms that had been
> closed in the past. The reason is that for dom0 - there is no cpuid
> filtering being done. So it gets everything that the hypervisor sees.

I think we should switch to using the native cpuid pv-op and fix up any
problems as we encounter them (by fixing the toolstack to set up the
cpuid stuff properly).

dom0 isn't supported yet so that's not an issue.  In the future dom0
could be handled by either: a) setting the cpuid policy in the
hypervisor during dom0 create; or b) the kernel can set this up during
early boot.  In both cases using native cpuid should do the right thing.
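
For reference, the kind of kernel-side filtering being discussed can be
sketched as masking the registers after the cpuid result comes back.
This is a toy model under stated assumptions: the sketch_* names are
invented, and the two leaf 1 EDX bits (MTRR, bit 12, and ACPI, bit 22)
are chosen only as examples, not as a statement of what the kernel or
the toolstack actually masks.

```c
#include <stdint.h>

struct sketch_cpuid_regs {
	uint32_t ax, bx, cx, dx;
};

/* CPUID leaf 1, EDX feature bits used as examples. */
#define SKETCH_LEAF1_EDX_MTRR	(1u << 12)
#define SKETCH_LEAF1_EDX_ACPI	(1u << 22)

/* Apply a per-leaf policy to an already obtained cpuid result. */
static void sketch_filter_cpuid(uint32_t leaf, struct sketch_cpuid_regs *r)
{
	uint32_t maskcx = ~0u;
	uint32_t maskdx = ~0u;

	switch (leaf) {
	case 1:
		/* Hide the example features from the guest. */
		maskdx = ~(SKETCH_LEAF1_EDX_MTRR | SKETCH_LEAF1_EDX_ACPI);
		break;
	default:
		/* Other leaves pass through untouched in this sketch. */
		break;
	}

	r->cx &= maskcx;
	r->dx &= maskdx;
}
```

If the policy lives in the toolstack or the hypervisor instead, the
guest sees the filtered values directly and a function like this is
not needed on the kernel side.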

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP.
  2014-01-02 18:24     ` Konrad Rzeszutek Wilk
@ 2014-01-03 11:27       ` David Vrabel
  0 siblings, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-03 11:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 02/01/14 18:24, Konrad Rzeszutek Wilk wrote:
>>> +		loadsegment(es, 0);
>>> +		loadsegment(ds, 0);
>>> +		loadsegment(fs, 0);
>>> +#else
>>> +		/* PVH: TODO Implement. */
>>> +		BUG();
>>> +#endif
>>> +		return;    <==============
>>> +	}
>>>  	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
>>>  	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
>>
>> If PVH uses native GDT why are these (and possibly other?) GDT ops needed?
> 
> They aren't. There is a 'return' there. I marked it for you with
> '<======'.

I missed that, in which case:

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-03  1:34       ` Mukesh Rathor
@ 2014-01-03 11:29         ` David Vrabel
  2014-01-03 15:37           ` Stefano Stabellini
  2014-01-03 17:35         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-03 11:29 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: Konrad Rzeszutek Wilk, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On 03/01/14 01:34, Mukesh Rathor wrote:
> On Thu, 2 Jan 2014 13:32:21 -0500
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
>> On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>>> From: Mukesh Rathor <mukesh.rathor@oracle.com>
>>>>
>>>> In the bootup code for PVH we can trap cpuid via vmexit, so don't
>>>> need to use emulated prefix call. We also check for vector
>>>> callback early on, as it is a required feature. PVH also runs at
>>>> default kernel IOPL.
>>>>
>>>> Finally, pure PV settings are moved to a separate function that
>>>> are only called for pure PV, ie, pv with pvmmu. They are also
>>>> #ifdef with CONFIG_XEN_PVMMU.
>>> [...]
>>>> @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
>>>> unsigned int *bx, break;
>>>>  	}
>>>>  
>>>> -	asm(XEN_EMULATE_PREFIX "cpuid"
>>>> -		: "=a" (*ax),
>>>> -		  "=b" (*bx),
>>>> -		  "=c" (*cx),
>>>> -		  "=d" (*dx)
>>>> -		: "0" (*ax), "2" (*cx));
>>>> +	if (xen_pvh_domain())
>>>> +		native_cpuid(ax, bx, cx, dx);
>>>> +	else
>>>> +		asm(XEN_EMULATE_PREFIX "cpuid"
>>>> +			: "=a" (*ax),
>>>> +			"=b" (*bx),
>>>> +			"=c" (*cx),
>>>> +			"=d" (*dx)
>>>> +			: "0" (*ax), "2" (*cx));
>>>
>>> For this one off cpuid call it seems preferrable to me to use the
>>> emulate prefix rather than diverge from PV.
>>
>> This was before the PV cpuid was deemed OK to be used on PVH.
>> Will rip this out to use the same version.
> 
> Whats wrong with using native cpuid? That is one of the benefits that
> cpuid can be trapped via vmexit, and also there is talk of making PV
> cpuid trap obsolete in the future. I suggest leaving it native.

It should either use the PV interface or the HVM one, not a hybrid of
the two.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-02 18:50     ` Konrad Rzeszutek Wilk
@ 2014-01-03 11:54       ` David Vrabel
  2014-01-03 14:44         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-03 11:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
>>>  	return gnttab_init();
>>>  }
>>>  
>>> -core_initcall(__gnttab_init);
>>> +core_initcall_sync(__gnttab_init);
>>
>> Why has this become _sync?
> 
> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> at gnttab_init):


The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
you call xen_pvh_gnttab_setup() from within __gnttab_init() ?

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-02 18:47     ` Konrad Rzeszutek Wilk
@ 2014-01-03 12:11       ` David Vrabel
  2014-01-03 15:09         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-03 12:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, boris.ostrovsky, linux-kernel,
	stefano.stabellini

On 02/01/14 18:47, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 02, 2014 at 04:27:19PM +0000, David Vrabel wrote:
>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>> The 'xen_hvm_resume_frames' used to be an 'unsigned long'
>>> and contain the virtual address of the grants. That was OK
>>> for most architectures (PVHVM, ARM) were the grants are contingous
>>> in memory. That however is not the case for PVH - in which case
>>> we will have to do a lookup for each virtual address for the PFN.
>>>
>>> Instead of doing that, lets make it a structure which will contain
>>> the array of PFNs, the virtual address and the count of said PFNs.
>>>
>>> Also provide a generic functions: gnttab_setup_auto_xlat_frames and
>>> gnttab_free_auto_xlat_frames to populate said structure with
>>> appropiate values for PVHVM and ARM.
>>>
>>> To round it off, change the name from 'xen_hvm_resume_frames' to
>>> a more descriptive one - 'xen_auto_xlat_grant_frames'.
>>>
>>> For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
>>> we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
>> [...]
>>> --- a/drivers/xen/grant-table.c
>>> +++ b/drivers/xen/grant-table.c
>> [...]
>>> @@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
>>>  }
>>>  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
>>>  
>>> +int gnttab_setup_auto_xlat_frames(unsigned long addr)
>>> +{
>>> +	xen_pfn_t *pfn;
>>> +	unsigned int max_nr_gframes = __max_nr_grant_frames();
>>> +	int i;
>>> +
>>> +	if (xen_auto_xlat_grant_frames.count)
>>> +		return -EINVAL;
>>> +
>>> +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
>>> +	if (!pfn)
>>> +		return -ENOMEM;
>>> +	for (i = 0; i < max_nr_gframes; i++)
>>> +		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));
>>
>> PFN_DOWN(addr) + i looks better to me.
>>
>>> +
>>> +	xen_auto_xlat_grant_frames.vaddr = addr;

I think you should move the xen_remap() call here.

>> Huh? addr is a physical address but you're assigning it to a field
>> called vaddr?  I think you mean to set this field to the result of the
>> xen_remap() call, yes?
> 
> It ends up doing that in gnttab_init. Not to
> xen_auto_xlat_grant_frames.vaddr but to gnttab_shared.addr.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12] Linux Xen PVH support.
  2014-01-02 19:02   ` Konrad Rzeszutek Wilk
@ 2014-01-03 13:37     ` David Vrabel
  0 siblings, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-03 13:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: boris.ostrovsky, linux-kernel, xen-devel, stefano.stabellini,
	mukesh.rathor

On 02/01/14 19:02, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 02, 2014 at 04:50:14PM +0000, David Vrabel wrote:
>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>> The patches, also available at
>>> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v12
>>>
>>> implements the neccessary functionality to boot a PV guest in PVH mode.
>>
>> In general this looks in much better shape now.  Some of the refactoring
>> patches should be queued for 3.14.
> 
> <nods> Thank you for your review!
>>
>> I'm not sure if when the rest wants to go in given that the PVH
>> hypervisor ABI is not yet finalized and is missing support for a number
>> of things with no visible plan for how/when/if this missing
>> functionality will be implemented.
> 
> We could follow the same path that Xen ARM in Linux did.

ARM was a whole new architecture with limited hardware availability
initially so I think what the ARM port did was the right approach.  It's
less clear to me if this is sensible for an existing, widely used
architecture.

If the PVH ABI was fixed and documented then it would be a no-brainer
to merge kernel support even if it was not fully feature complete.

What I don't want is guests or dom0s that used to boot in PVH mode that
would end up not booting at all if Xen is upgraded.  It's probably ok if
PV can be used as a fallback.  It's also probably ok if this fallback is a
manual process (e.g., user has to set pvh=0 to get a working guest again).

It would also be preferable for PVH guests to fail hard if run on newer,
incompatible hypervisors.  Whether this is feasible would depend on what
the ABI changes are.

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 07/18] xen/pvh: Setup up shared_info.
  2014-01-02 11:27   ` David Vrabel
  2014-01-02 18:23     ` Konrad Rzeszutek Wilk
@ 2014-01-03 14:39     ` Konrad Rzeszutek Wilk
  2014-01-03 15:18       ` David Vrabel
  1 sibling, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 14:39 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Thu, Jan 02, 2014 at 11:27:56AM +0000, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > For PVHVM the shared_info structure is provided via the same way
> > as for normal PV guests (see include/xen/interface/xen.h).
> > 
> > That is during bootup we get 'xen_start_info' via the %esi register
> > in startup_xen. Then later we extract the 'shared_info' from said
> > structure (in xen_setup_shared_info) and start using it.
> > 
> > The 'xen_setup_shared_info' is all setup to work with auto-xlat
> > guests, but there are two functions which it calls that are not:
> > xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
> > This patch modifies those to work in auto-xlat mode.
> [...]
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1147,8 +1147,9 @@ void xen_setup_vcpu_info_placement(void)
> >  		xen_vcpu_setup(cpu);
> >  
> >  	/* xen_vcpu_setup managed to place the vcpu_info within the
> > -	   percpu area for all cpus, so make use of it */
> > -	if (have_vcpu_info_placement) {
> > +	 * percpu area for all cpus, so make use of it. Note that for
> > +	 * PVH we want to use native IRQ mechanism. */
> > +	if (have_vcpu_info_placement && !xen_pvh_domain()) {
> >  		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
> >  		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
> >  		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
> 
> Should this be in a separate patch: "xen/pvh: use native irq ops"?

On second thought I think not. The reason is explained in the commit
description:

 The 'xen_setup_shared_info' is all setup to work with auto-xlat
 guests, but there are two functions which it calls that are not:
 xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
 This patch modifies those to work in auto-xlat mode.

If we move this to another patch, it is going to be mostly the same
comment and this patch will feel unfinished.


> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 11:54       ` David Vrabel
@ 2014-01-03 14:44         ` Konrad Rzeszutek Wilk
  2014-01-03 15:41           ` David Vrabel
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 14:44 UTC (permalink / raw)
  To: David Vrabel
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> >> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> >>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> >>>  	return gnttab_init();
> >>>  }
> >>>  
> >>> -core_initcall(__gnttab_init);
> >>> +core_initcall_sync(__gnttab_init);
> >>
> >> Why has this become _sync?
> > 
> > It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> > at gnttab_init):
> 
> 
> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't

It has a clear ordering property.

> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?

No. That is due to the fact that __gnttab_init() is in drivers/xen and is
also used by the ARM code.

Stefano in his previous review mentioned he would like PVH specific
code in arch/x86:

https://lkml.org/lkml/2013/12/18/507

> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-03 12:11       ` [Xen-devel] " David Vrabel
@ 2014-01-03 15:09         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 15:09 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, boris.ostrovsky, linux-kernel, stefano.stabellini

On Fri, Jan 03, 2014 at 12:11:42PM +0000, David Vrabel wrote:
> On 02/01/14 18:47, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jan 02, 2014 at 04:27:19PM +0000, David Vrabel wrote:
> >> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> >>> The 'xen_hvm_resume_frames' used to be an 'unsigned long'
> >>> and contain the virtual address of the grants. That was OK
> >>> for most architectures (PVHVM, ARM) where the grants are contiguous
> >>> in memory. That however is not the case for PVH - in which case
> >>> we will have to do a lookup for each virtual address for the PFN.
> >>>
> >>> Instead of doing that, let's make it a structure which will contain
> >>> the array of PFNs, the virtual address and the count of said PFNs.
> >>>
> >>> Also provide generic functions: gnttab_setup_auto_xlat_frames and
> >>> gnttab_free_auto_xlat_frames to populate said structure with
> >>> appropriate values for PVHVM and ARM.
> >>>
> >>> To round it off, change the name from 'xen_hvm_resume_frames' to
> >>> a more descriptive one - 'xen_auto_xlat_grant_frames'.
> >>>
> >>> For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
> >>> we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
> >> [...]
> >>> --- a/drivers/xen/grant-table.c
> >>> +++ b/drivers/xen/grant-table.c
> >> [...]
> >>> @@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
> >>>  
> >>> +int gnttab_setup_auto_xlat_frames(unsigned long addr)
> >>> +{
> >>> +	xen_pfn_t *pfn;
> >>> +	unsigned int max_nr_gframes = __max_nr_grant_frames();
> >>> +	int i;
> >>> +
> >>> +	if (xen_auto_xlat_grant_frames.count)
> >>> +		return -EINVAL;
> >>> +
> >>> +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
> >>> +	if (!pfn)
> >>> +		return -ENOMEM;
> >>> +	for (i = 0; i < max_nr_gframes; i++)
> >>> +		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));
> >>
> >> PFN_DOWN(addr) + i looks better to me.
> >>
> >>> +
> >>> +	xen_auto_xlat_grant_frames.vaddr = addr;
> 
> I think you should move the xen_remap() call here.

Excellent suggestion!
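
A rough userspace sketch of what the setup function could look like with
both review comments applied (PFN_DOWN(addr) + i, and vaddr set from the
remap rather than from the physical address). This is not the kernel
code: fake_remap() is a made-up stand-in for xen_remap(), plain
'unsigned long' stands in for xen_pfn_t, and the struct and function
names are shortened for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

struct auto_xlat_frames {
	unsigned long *pfn;	/* one entry per grant frame */
	unsigned int count;	/* number of entries in pfn[] */
	void *vaddr;		/* result of the remap, not the physical address */
};

/* Hypothetical stand-in for xen_remap(): pretend physical == virtual. */
static void *fake_remap(unsigned long addr, unsigned long size)
{
	(void)size;
	return (void *)addr;
}

static int setup_frames(struct auto_xlat_frames *f, unsigned long addr,
			unsigned int nr)
{
	unsigned int i;

	if (f->count)
		return -EINVAL;	/* already set up */

	f->pfn = calloc(nr, sizeof(f->pfn[0]));
	if (!f->pfn)
		return -ENOMEM;

	for (i = 0; i < nr; i++) {
		f->pfn[i] = PFN_DOWN(addr) + i;
		/* Same value as the original PFN_DOWN(addr + i * PAGE_SIZE). */
		assert(f->pfn[i] == PFN_DOWN(addr + i * PAGE_SIZE));
	}

	f->vaddr = fake_remap(addr, nr * PAGE_SIZE);
	f->count = nr;
	return 0;
}
```

With the remap done here, 'vaddr' holds what its name says and callers
no longer need to know that the value passed in was a physical address.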
> 
> >> Huh? addr is a physical address but you're assigning it to a field
> >> called vaddr?  I think you mean to set this field to the result of the
> >> xen_remap() call, yes?
> > 
> > It ends up doing that in gnttab_init. Not to
> > xen_auto_xlat_grant_frames.vaddr but to gnttab_shared.addr.
> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 07/18] xen/pvh: Setup up shared_info.
  2014-01-03 14:39     ` Konrad Rzeszutek Wilk
@ 2014-01-03 15:18       ` David Vrabel
  0 siblings, 0 replies; 90+ messages in thread
From: David Vrabel @ 2014-01-03 15:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 03/01/14 14:39, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 02, 2014 at 11:27:56AM +0000, David Vrabel wrote:
>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>> From: Mukesh Rathor <mukesh.rathor@oracle.com>
>>>
>>> For PVHVM the shared_info structure is provided via the same way
>>> as for normal PV guests (see include/xen/interface/xen.h).
>>>
>>> That is during bootup we get 'xen_start_info' via the %esi register
>>> in startup_xen. Then later we extract the 'shared_info' from said
>>> structure (in xen_setup_shared_info) and start using it.
>>>
>>> The 'xen_setup_shared_info' is all setup to work with auto-xlat
>>> guests, but there are two functions which it calls that are not:
>>> xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
>>> This patch modifies those to work in auto-xlat mode.
>> [...]
>>> --- a/arch/x86/xen/enlighten.c
>>> +++ b/arch/x86/xen/enlighten.c
>>> @@ -1147,8 +1147,9 @@ void xen_setup_vcpu_info_placement(void)
>>>  		xen_vcpu_setup(cpu);
>>>  
>>>  	/* xen_vcpu_setup managed to place the vcpu_info within the
>>> -	   percpu area for all cpus, so make use of it */
>>> -	if (have_vcpu_info_placement) {
>>> +	 * percpu area for all cpus, so make use of it. Note that for
>>> +	 * PVH we want to use native IRQ mechanism. */
>>> +	if (have_vcpu_info_placement && !xen_pvh_domain()) {
>>>  		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
>>>  		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
>>>  		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
>>
>> Should this be in a separate patch: "xen/pvh: use native irq ops"?
> 
> On a second thought I think not. The reason is explained in the commit
> description:
> 
>  The 'xen_setup_shared_info' is all setup to work with auto-xlat
>  guests, but there are two functions which it calls that are not:
>  xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
>  This patch modifies those to work in auto-xlat mode.
> 
> If we move this to another patch, it is going to be mostly the same
> comment and this patch will feel unfinished.

Looking at it again, this hunk should be in "xen/pvh: Piggyback on PVHVM for
event channel" where we have:

+	if (!xen_feature(XENFEAT_hvm_callback_vector))
+		pv_irq_ops = xen_irq_ops;

The tests in both places need to be the same as well.
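
One way to keep the two tests identical is to route both sites through a
single predicate. A minimal userspace sketch follows; every name in it
is hypothetical, and the choice of the callback-vector feature as the
shared test is only an assumption made for the example, not something
decided in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for xen_feature(XENFEAT_hvm_callback_vector). */
static bool hvm_callback_vector;

enum irq_ops { IRQ_OPS_NATIVE, IRQ_OPS_XEN, IRQ_OPS_XEN_DIRECT };
static enum irq_ops irq_ops = IRQ_OPS_NATIVE;

/* The one place that decides whether PV irq ops are wanted. */
static bool use_pv_irq_ops(void)
{
	return !hvm_callback_vector;
}

/* Analogue of the "pv_irq_ops = xen_irq_ops" site. */
static void init_irq_ops(void)
{
	if (use_pv_irq_ops())
		irq_ops = IRQ_OPS_XEN;
}

/* Analogue of the vcpu_info placement site switching to the direct ops. */
static void setup_vcpu_info_placement(bool have_placement)
{
	if (have_placement && use_pv_irq_ops())
		irq_ops = IRQ_OPS_XEN_DIRECT;
}

/* Usage: with a callback vector both sites leave the native ops alone. */
static void demo(void)
{
	hvm_callback_vector = true;
	init_irq_ops();
	setup_vcpu_info_placement(true);
	assert(irq_ops == IRQ_OPS_NATIVE);
}
```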

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2).
  2014-01-02 11:13   ` David Vrabel
@ 2014-01-03 15:33     ` Stefano Stabellini
  0 siblings, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:33 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini, mukesh.rathor

On Thu, 2 Jan 2014, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > Which is a PV guest with auto page translation enabled
> > and with vector callback. It is a cross between PVHVM and PV.
> > 
> > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > with modifications):
> > 
> > "* the guest uses auto translate:
> >  - p2m is managed by Xen
> >  - pagetables are owned by the guest
> >  - mmu_update hypercall not available
> > * it uses event callback and not vlapic emulation,
> > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > 
> > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> > PV guest with auto translate, although it does use hvm_op for setting
> > callback vector."
> > 
> > We don't have yet a Kconfig entry setup as we do not
> > have all the parts ready for it - so we piggyback
> > on the PVHVM config option. This scaffolding will
> > be removed later.
> > 
> > Note that on ARM the concept of PVH is non-existent. As Ian
> > put it: "an ARM guest is neither PV nor HVM nor PVHVM.
> > It's a bit like PVH but is different also (it's further towards
> > the H end of the spectrum than even PVH).". As such these
> > options (PVHVM, PVH) are never enabled nor seen on ARM
> > compilations.
> [...]
> > --- a/include/xen/xen.h
> > +++ b/include/xen/xen.h
> > @@ -29,4 +29,20 @@ extern enum xen_domain_type xen_domain_type;
> >  #define xen_initial_domain()	(0)
> >  #endif	/* CONFIG_XEN_DOM0 */
> >  
> > +#ifdef CONFIG_XEN_PVHVM
> > +/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
> 
> This is a bit confusing.  I think it would be better to add the
> CONFIG_XEN_PVH option with this patch but make it default n and not
> possible to enable.
 
I am OK with the patch as is, but your suggestion would probably make
things better.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-03 11:29         ` David Vrabel
@ 2014-01-03 15:37           ` Stefano Stabellini
  0 siblings, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:37 UTC (permalink / raw)
  To: David Vrabel
  Cc: Mukesh Rathor, Konrad Rzeszutek Wilk, linux-kernel, xen-devel,
	boris.ostrovsky, stefano.stabellini

On Fri, 3 Jan 2014, David Vrabel wrote:
> On 03/01/14 01:34, Mukesh Rathor wrote:
> > On Thu, 2 Jan 2014 13:32:21 -0500
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > 
> >> On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
> >>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> >>>> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> >>>>
> >>>> In the bootup code for PVH we can trap cpuid via vmexit, so don't
> >>>> need to use emulated prefix call. We also check for vector
> >>>> callback early on, as it is a required feature. PVH also runs at
> >>>> default kernel IOPL.
> >>>>
> >>>> Finally, pure PV settings are moved to a separate function that
> >>>> are only called for pure PV, ie, pv with pvmmu. They are also
> >>>> #ifdef with CONFIG_XEN_PVMMU.
> >>> [...]
> >>>> @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> >>>> unsigned int *bx, break;
> >>>>  	}
> >>>>  
> >>>> -	asm(XEN_EMULATE_PREFIX "cpuid"
> >>>> -		: "=a" (*ax),
> >>>> -		  "=b" (*bx),
> >>>> -		  "=c" (*cx),
> >>>> -		  "=d" (*dx)
> >>>> -		: "0" (*ax), "2" (*cx));
> >>>> +	if (xen_pvh_domain())
> >>>> +		native_cpuid(ax, bx, cx, dx);
> >>>> +	else
> >>>> +		asm(XEN_EMULATE_PREFIX "cpuid"
> >>>> +			: "=a" (*ax),
> >>>> +			"=b" (*bx),
> >>>> +			"=c" (*cx),
> >>>> +			"=d" (*dx)
> >>>> +			: "0" (*ax), "2" (*cx));
> >>>
> >>> For this one off cpuid call it seems preferrable to me to use the
> >>> emulate prefix rather than diverge from PV.
> >>
> >> This was before the PV cpuid was deemed OK to be used on PVH.
> >> Will rip this out to use the same version.
> > 
> > Whats wrong with using native cpuid? That is one of the benefits that
> > cpuid can be trapped via vmexit, and also there is talk of making PV
> > cpuid trap obsolete in the future. I suggest leaving it native.
> 
> It should either use the PV interface or the HVM one, not a hybrid of
> the two.

I agree

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 04/18] xen/pvh: Don't setup P2M tree.
  2014-01-01  4:35 ` [PATCH v12 04/18] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
  2014-01-02 11:17   ` David Vrabel
@ 2014-01-03 15:41   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> P2M is not available for PVH. Fortunately for us the
> P2M code already has mostly the support for auto-xlat guest thanks to
> commit 3d24bbd7dddbea54358a9795abaf051b0f18973c
> "grant-table: call set_phys_to_machine after mapping grant refs"
> which: "
> introduces set_phys_to_machine calls for auto_translated guests
> (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
> translated by swiotlb-xen... " so we don't need to muck much.
> 
> With above mentioned "commit you'll get set_phys_to_machine calls
> from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
> anything with them " (Stefano Stabellini) which is OK - we want
> them to be NOPs.
> 
> This is because we assume that an "IOMMU is always present on the
> platform and Xen is going to make the appropriate IOMMU pagetable
> changes in the hypercall implementation of GNTTABOP_map_grant_ref
> and GNTTABOP_unmap_grant_ref, then everything should be transparent
> from PVH privileged point of view and DMA transfers involving
> foreign pages keep working with no issues.
> 
> Otherwise we would need a P2M (and an M2P) for PVH privileged to
> track these foreign pages .. (see arch/arm/xen/p2m.c)."
> (Stefano Stabellini).
> 
> We still have to inhibit the building of the P2M tree.
> That had been done in the past by not calling
> xen_build_dynamic_phys_to_machine (which setups the P2M tree
> and gives us virtual address to access them). But we are missing
> a check for xen_build_mfn_list_list - which was continuing to setup
> the P2M tree and would blow up at trying to get the virtual
> address of p2m_missing (which would have been setup by
> xen_build_dynamic_phys_to_machine).
> 
> Hence a check is needed to not call xen_build_mfn_list_list when
> running in auto-xlat mode.
> 
> Instead of replicating the check for auto-xlat in enlighten.c
> do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
> is called also in xen_arch_post_suspend without any checks for
> auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
> allocate space for an P2M tree.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
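
For readers following along, the last point of the commit message (do
the auto-xlat check in p2m.c rather than at each call site) can be
sketched in userspace as below. All names are stand-ins: the flag
replaces xen_feature(XENFEAT_auto_translated_physmap) and the counter
replaces the actual allocation of the P2M tree.

```c
#include <assert.h>
#include <stdbool.h>

static bool auto_translated;	/* stand-in for the auto-xlat feature test */
static int p2m_builds;		/* how many times the tree would be built */

/* The guard lives in the callee, so no caller can forget it. */
static void build_mfn_list_list(void)
{
	if (auto_translated)
		return;
	p2m_builds++;
}

/* Two hypothetical callers, mirroring the boot and post-suspend paths. */
static void boot_path(void)
{
	build_mfn_list_list();
}

static void post_suspend_path(void)
{
	build_mfn_list_list();
}

/* Usage: an auto-xlat guest never builds the tree from either path. */
static void demo(void)
{
	auto_translated = true;
	boot_path();
	post_suspend_path();
	assert(p2m_builds == 0);
}
```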


>  arch/x86/xen/enlighten.c |  3 +--
>  arch/x86/xen/p2m.c       | 12 ++++++++++--
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 755e5bb..ab4dd70 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1493,8 +1493,7 @@ asmlinkage void __init xen_start_kernel(void)
>  	x86_configure_nx();
>  
>  	/* Get mfn list */
> -	if (!xen_feature(XENFEAT_auto_translated_physmap))
> -		xen_build_dynamic_phys_to_machine();
> +	xen_build_dynamic_phys_to_machine();
>  
>  	/*
>  	 * Set up kernel GDT and segment registers, mainly so that
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 2ae8699..fb7ee0a 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -280,6 +280,9 @@ void __ref xen_build_mfn_list_list(void)
>  {
>  	unsigned long pfn;
>  
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
>  	/* Pre-initialize p2m_top_mfn to be completely missing */
>  	if (p2m_top_mfn == NULL) {
>  		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
> @@ -346,10 +349,15 @@ void xen_setup_mfn_list_list(void)
>  /* Set up p2m_top to point to the domain-builder provided p2m pages */
>  void __init xen_build_dynamic_phys_to_machine(void)
>  {
> -	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
> -	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
> +	unsigned long *mfn_list;
> +	unsigned long max_pfn;
>  	unsigned long pfn;
>  
> +	 if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
> +	mfn_list = (unsigned long *)xen_start_info->mfn_list;
> +	max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
>  	xen_max_p2m_pfn = max_pfn;
>  
>  	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 14:44         ` Konrad Rzeszutek Wilk
@ 2014-01-03 15:41           ` David Vrabel
  2014-01-03 15:48             ` [Xen-devel] " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: David Vrabel @ 2014-01-03 15:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, stefano.stabellini,
	mukesh.rathor

On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
>> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
>>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
>>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
>>>>>  	return gnttab_init();
>>>>>  }
>>>>>  
>>>>> -core_initcall(__gnttab_init);
>>>>> +core_initcall_sync(__gnttab_init);
>>>>
>>>> Why has this become _sync?
>>>
>>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
>>> at gnttab_init):
>>
>>
>> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> 
> It has a clear ordering property.

This really isn't obvious to me.  Can you point to the docs/code that
guarantee this?  I couldn't find it.

>> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> 
> No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> also used by the ARM code.
> 
> Stefano in his previous review mentioned he would like PVH specific
> code in arch/x86:
> 
> https://lkml.org/lkml/2013/12/18/507

Call it xen_arch_gnttab_setup() and add a weak stub for other architectures?
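
Roughly like this, as a userspace sketch using the GCC/Clang weak
attribute (the kernel spells it __weak). Only the name
xen_arch_gnttab_setup() comes from the suggestion above;
common_gnttab_init() and gnttab_ready are made up for the example.

```c
#include <assert.h>

/*
 * Default used when no architecture provides its own version. An arch
 * that needs setup (x86 PVH here) would supply a non-weak definition
 * of the same symbol in its own file and the linker would pick that.
 */
int __attribute__((weak)) xen_arch_gnttab_setup(void)
{
	return 0;	/* nothing to do on this architecture */
}

static int gnttab_ready;

/* Stand-in for the common init path in drivers/xen. */
int common_gnttab_init(void)
{
	int rc;

	assert(!gnttab_ready);	/* init is expected to run once */

	rc = xen_arch_gnttab_setup();
	if (rc)
		return rc;

	gnttab_ready = 1;
	return 0;
}
```

The common code then calls the hook directly, so the ordering no longer
depends on initcall levels.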

David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code.
  2014-01-01  4:35 ` [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code Konrad Rzeszutek Wilk
  2014-01-02 11:21   ` David Vrabel
@ 2014-01-03 15:47   ` Stefano Stabellini
  2014-01-03 16:02     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> The revector and copying of the P2M only happens when
> !auto-xlat and on 64-bit builds. It is not obvious from
> the code, so let's have separate 32 and 64-bit functions.
> 
> We also invert the check for auto-xlat to make the code
> flow simpler.
> 
> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/mmu.c | 73 ++++++++++++++++++++++++++++++------------------------
>  1 file changed, 40 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..d792a69 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1198,44 +1198,40 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
>  	 * instead of somewhere later and be confusing. */
>  	xen_mc_flush();
>  }
> -#endif
> -static void __init xen_pagetable_init(void)
> +static void __init xen_pagetable_p2m_copy(void)
>  {
> -#ifdef CONFIG_X86_64
>  	unsigned long size;
>  	unsigned long addr;
> -#endif
> -	paging_init();
> -	xen_setup_shared_info();
> -#ifdef CONFIG_X86_64
> -	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -		unsigned long new_mfn_list;
> +	unsigned long new_mfn_list;
> +
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
> +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +
> +	/* On 32-bit, we get zero so this never gets executed. */

Given that this code is already ifdef'ed CONFIG_X86_64, this comment
should be removed.


> +	new_mfn_list = xen_revector_p2m_tree();

I take from the comment that new_mfn_list must not be zero. Maybe we
want a BUG_ON or a WARN_ON?


> +	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> +		/* using __ka address and sticking INVALID_P2M_ENTRY! */
> +		memset((void *)xen_start_info->mfn_list, 0xff, size);
> +
> +		/* We should be in __ka space. */
> +		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> +		addr = xen_start_info->mfn_list;
> +		/* We roundup to the PMD, which means that if anybody at this stage is
> +		 * using the __ka address of xen_start_info or xen_start_info->shared_info
> +		 * they are in going to crash. Fortunatly we have already revectored
> +		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> +		size = roundup(size, PMD_SIZE);
> +		xen_cleanhighmap(addr, addr + size);
>  
>  		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +		memblock_free(__pa(xen_start_info->mfn_list), size);
> +		/* And revector! Bye bye old array */
> +		xen_start_info->mfn_list = new_mfn_list;
> +	} else
> +		return;

This was a normal condition when the function was executed on both
x86_64 and x86_32. Now that it is only executed on x86_64, is it still
the case?


> -		/* On 32-bit, we get zero so this never gets executed. */
> -		new_mfn_list = xen_revector_p2m_tree();
> -		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> -			/* using __ka address and sticking INVALID_P2M_ENTRY! */
> -			memset((void *)xen_start_info->mfn_list, 0xff, size);
> -
> -			/* We should be in __ka space. */
> -			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> -			addr = xen_start_info->mfn_list;
> -			/* We roundup to the PMD, which means that if anybody at this stage is
> -			 * using the __ka address of xen_start_info or xen_start_info->shared_info
> -			 * they are in going to crash. Fortunatly we have already revectored
> -			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> -			size = roundup(size, PMD_SIZE);
> -			xen_cleanhighmap(addr, addr + size);
> -
> -			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> -			memblock_free(__pa(xen_start_info->mfn_list), size);
> -			/* And revector! Bye bye old array */
> -			xen_start_info->mfn_list = new_mfn_list;
> -		} else
> -			goto skip;
> -	}
>  	/* At this stage, cleanup_highmap has already cleaned __ka space
>  	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
>  	 * the ramdisk). We continue on, erasing PMD entries that point to page
> @@ -1255,8 +1251,19 @@ static void __init xen_pagetable_init(void)
>  	 * anything at this stage. */
>  	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
>  #endif
> -skip:
> +}
> +#else
> +static void __init xen_pagetable_p2m_copy(void)
> +{
> +	/* Nada! */
> +}
>  #endif
> +
> +static void __init xen_pagetable_init(void)
> +{
> +	paging_init();
> +	xen_setup_shared_info();
> +	xen_pagetable_p2m_copy();
>  	xen_post_allocator_init();
>  }
>  static void xen_write_cr2(unsigned long cr2)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 15:41           ` David Vrabel
@ 2014-01-03 15:48             ` Konrad Rzeszutek Wilk
  2014-01-03 17:20               ` Stefano Stabellini
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 15:48 UTC (permalink / raw)
  To: David Vrabel, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk, xen-devel, boris.ostrovsky, linux-kernel,
	stefano.stabellini

On Fri, Jan 03, 2014 at 03:41:51PM +0000, David Vrabel wrote:
> On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> >> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> >>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> >>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> >>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> >>>>>  	return gnttab_init();
> >>>>>  }
> >>>>>  
> >>>>> -core_initcall(__gnttab_init);
> >>>>> +core_initcall_sync(__gnttab_init);
> >>>>
> >>>> Why has this become _sync?
> >>>
> >>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> >>> at gnttab_init):
> >>
> >>
> >> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> > 
> > It has a clear ordering property.
> 
> This really isn't obvious to me.  Can you point to the docs/code the
> guarantee this?  I couldn't find it.

include/linux/init.h
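
To spell that out: in init.h, core_initcall() registers at level id "1"
and core_initcall_sync() at level id "1s", and the levels are walked in
sequence. A small userspace model of that walk is below. The level ids
are an excerpt from init.h; the table walk, run_initcalls(), and the
assumption that xen_pvh_gnttab_setup is registered with core_initcall()
are only for illustration. It says nothing about the order of two calls
within the same level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Excerpt of the level ids used in include/linux/init.h, in run order. */
static const char *const levels[] = { "0", "1", "1s", "2", "2s" };
#define NR_LEVELS (sizeof(levels) / sizeof(levels[0]))

struct initcall {
	const char *level;	/* "1" for core_initcall(), "1s" for the _sync one */
	const char *name;
};

/*
 * Hypothetical helper: record the names in the order a level-by-level
 * walk would call them. Calls sharing a level keep their array order.
 */
static size_t run_initcalls(const struct initcall *calls, size_t n,
			    const char **order)
{
	size_t out = 0, l, i;

	for (l = 0; l < NR_LEVELS; l++)
		for (i = 0; i < n; i++)
			if (strcmp(calls[i].level, levels[l]) == 0)
				order[out++] = calls[i].name;

	assert(out <= n);	/* each call matches at most one level */
	return out;
}
```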
> 
> >> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> > 
> > No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> > also used by the ARM code.
> > 
> > Stefano in his previous review mentioned he would like PVH specific
> > code in arch/x86:
> > 
> > https://lkml.org/lkml/2013/12/18/507
> 
> Call it xen_arch_gnttab_setup() and add weak stub for other architectures?

Stefano, thoughts?

> 
> David

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)
  2014-01-02 11:24   ` David Vrabel
  2014-01-03  1:36     ` Mukesh Rathor
@ 2014-01-03 15:50     ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini, mukesh.rathor

On Thu, 2 Jan 2014, David Vrabel wrote:
> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > .. which are surprisingly small compared to the amount for PV code.
> > 
> > PVH uses mostly native mmu ops, we leave the generic (native_*) for
> > the majority and just overwrite the baremetal with the ones we need.
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> 
> This TLB flush optimization should be a separate patch.

Right.
Aside from this:

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code.
  2014-01-03 15:47   ` Stefano Stabellini
@ 2014-01-03 16:02     ` Konrad Rzeszutek Wilk
  2014-01-03 16:23       ` Stefano Stabellini
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 16:02 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	mukesh.rathor

On Fri, Jan 03, 2014 at 03:47:15PM +0000, Stefano Stabellini wrote:
> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > The revector and copying of the P2M only happens when
> > !auto-xlat and on 64-bit builds. It is not obvious from
> > the code, so let's have separate 32 and 64-bit functions.
> > 
> > We also invert the check for auto-xlat to make the code
> > flow simpler.
> > 
> > Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/mmu.c | 73 ++++++++++++++++++++++++++++++------------------------
> >  1 file changed, 40 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index ce563be..d792a69 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -1198,44 +1198,40 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
> >  	 * instead of somewhere later and be confusing. */
> >  	xen_mc_flush();
> >  }
> > -#endif
> > -static void __init xen_pagetable_init(void)
> > +static void __init xen_pagetable_p2m_copy(void)
> >  {
> > -#ifdef CONFIG_X86_64
> >  	unsigned long size;
> >  	unsigned long addr;
> > -#endif
> > -	paging_init();
> > -	xen_setup_shared_info();
> > -#ifdef CONFIG_X86_64
> > -	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> > -		unsigned long new_mfn_list;
> > +	unsigned long new_mfn_list;
> > +
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> > +
> > +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> > +
> > +	/* On 32-bit, we get zero so this never gets executed. */
> 
> Given that this code is already ifdef'ed CONFIG_X86_64, this comment
> should be removed.

Sure.
> 
> 
> > +	new_mfn_list = xen_revector_p2m_tree();
> 
> I take from the comment that new_mfn_list must not be zero. Maybe we
> want a BUG_ON or a WARN_ON?

It can be zero, in which case we don't want to revector.
> 
> 
> > +	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> > +		/* using __ka address and sticking INVALID_P2M_ENTRY! */
> > +		memset((void *)xen_start_info->mfn_list, 0xff, size);
> > +
> > +		/* We should be in __ka space. */
> > +		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> > +		addr = xen_start_info->mfn_list;
> > +		/* We roundup to the PMD, which means that if anybody at this stage is
> > +		 * using the __ka address of xen_start_info or xen_start_info->shared_info
> > +		 * they are in going to crash. Fortunatly we have already revectored
> > +		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> > +		size = roundup(size, PMD_SIZE);
> > +		xen_cleanhighmap(addr, addr + size);
> >  
> >  		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> > +		memblock_free(__pa(xen_start_info->mfn_list), size);
> > +		/* And revector! Bye bye old array */
> > +		xen_start_info->mfn_list = new_mfn_list;
> > +	} else
> > +		return;
> 
> This was a normal condition when the function was executed on both
> x86_64 and x86_32. Now that it is only executed on x86_64, is it still
> the case?

It could be. Since this particular patch just moves code I would hesitate
to make changes here. Perhaps that can be done in a separate patch, once the
conditions under which xen_revector_p2m_tree() fails have been looked at?
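
For clarity, the condition as it stands today boils down to the
predicate below. should_revector() is a made-up name for the example;
the test itself is the one in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Replace the list only when the revector step returned a non-zero
 * address that differs from the current one. Zero is treated as
 * "do not revector" and falls into the else branch.
 */
static bool should_revector(unsigned long new_list, unsigned long cur_list)
{
	return new_list && new_list != cur_list;
}

/* Usage: the three outcomes the if/else distinguishes. */
static void demo(void)
{
	assert(!should_revector(0, 0x1000));		/* nothing returned */
	assert(!should_revector(0x1000, 0x1000));	/* same list, keep it */
	assert(should_revector(0x2000, 0x1000));	/* new list, switch */
}
```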

> 
> 
> > -		/* On 32-bit, we get zero so this never gets executed. */
> > -		new_mfn_list = xen_revector_p2m_tree();
> > -		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> > -			/* using __ka address and sticking INVALID_P2M_ENTRY! */
> > -			memset((void *)xen_start_info->mfn_list, 0xff, size);
> > -
> > -			/* We should be in __ka space. */
> > -			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> > -			addr = xen_start_info->mfn_list;
> > -			/* We roundup to the PMD, which means that if anybody at this stage is
> > -			 * using the __ka address of xen_start_info or xen_start_info->shared_info
> > -			 * they are in going to crash. Fortunatly we have already revectored
> > -			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> > -			size = roundup(size, PMD_SIZE);
> > -			xen_cleanhighmap(addr, addr + size);
> > -
> > -			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> > -			memblock_free(__pa(xen_start_info->mfn_list), size);
> > -			/* And revector! Bye bye old array */
> > -			xen_start_info->mfn_list = new_mfn_list;
> > -		} else
> > -			goto skip;
> > -	}
> >  	/* At this stage, cleanup_highmap has already cleaned __ka space
> >  	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
> >  	 * the ramdisk). We continue on, erasing PMD entries that point to page
> > @@ -1255,8 +1251,19 @@ static void __init xen_pagetable_init(void)
> >  	 * anything at this stage. */
> >  	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
> >  #endif
> > -skip:
> > +}
> > +#else
> > +static void __init xen_pagetable_p2m_copy(void)
> > +{
> > +	/* Nada! */
> > +}
> >  #endif
> > +
> > +static void __init xen_pagetable_init(void)
> > +{
> > +	paging_init();
> > +	xen_setup_shared_info();
> > +	xen_pagetable_p2m_copy();
> >  	xen_post_allocator_init();
> >  }
> >  static void xen_write_cr2(unsigned long cr2)
> > -- 
> > 1.8.3.1
> > 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2)
  2014-01-01  4:35 ` [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2) Konrad Rzeszutek Wilk
  2014-01-02 11:44   ` David Vrabel
@ 2014-01-03 16:22   ` Stefano Stabellini
  2014-01-03 17:59     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> As we do not yet have a mechanism for that.
> 
> This also impacts the ARM/ARM64 code (which does not have
> hotplug support yet).
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/xen/cpu_hotplug.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> index cc6513a..5f80802 100644
> --- a/drivers/xen/cpu_hotplug.c
> +++ b/drivers/xen/cpu_hotplug.c
> @@ -4,6 +4,7 @@
>  
>  #include <xen/xen.h>
>  #include <xen/xenbus.h>
> +#include <xen/features.h>
>  
>  #include <asm/xen/hypervisor.h>
>  #include <asm/cpu.h>
> @@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
>  	static struct notifier_block xsn_cpu = {
>  		.notifier_call = setup_cpu_watcher };
>  
> -	if (!xen_pv_domain())
> +	/* PVH/ARM/ARM64 TBD/FIXME: future work */
> +	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
>  		return -ENODEV;
>  
>  	register_xenstore_notifier(&xsn_cpu);

Sorry for being a bit obnoxious but I was thinking that using a
xen_feature(XENFEAT_auto_translated_physmap) check is conceptually
wrong, because cpu hotplug and nested paging are orthogonal.

Given that we most probably want to follow the PV path for cpu_hotplug
(that is using drivers/xen/cpu_hotplug.c), is there actually a problem
with building and initializing it on PVH guests?
If it works as it is, I would be tempted to leave it for now.

Otherwise the patch is OK and you can add my Acked-by.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code.
  2014-01-03 16:02     ` Konrad Rzeszutek Wilk
@ 2014-01-03 16:23       ` Stefano Stabellini
  0 siblings, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, linux-kernel, xen-devel, boris.ostrovsky,
	david.vrabel, mukesh.rathor

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 03, 2014 at 03:47:15PM +0000, Stefano Stabellini wrote:
> > On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > The revector and copying of the P2M only happens when
> > > !auto-xlat and on 64-bit builds. It is not obvious from
> > > the code, so let's have separate 32 and 64-bit functions.
> > > 
> > > We also invert the check for auto-xlat to make the code
> > > flow simpler.
> > > 
> > > Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  arch/x86/xen/mmu.c | 73 ++++++++++++++++++++++++++++++------------------------
> > >  1 file changed, 40 insertions(+), 33 deletions(-)
> > > 
> > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > > index ce563be..d792a69 100644
> > > --- a/arch/x86/xen/mmu.c
> > > +++ b/arch/x86/xen/mmu.c
> > > @@ -1198,44 +1198,40 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
> > >  	 * instead of somewhere later and be confusing. */
> > >  	xen_mc_flush();
> > >  }
> > > -#endif
> > > -static void __init xen_pagetable_init(void)
> > > +static void __init xen_pagetable_p2m_copy(void)
> > >  {
> > > -#ifdef CONFIG_X86_64
> > >  	unsigned long size;
> > >  	unsigned long addr;
> > > -#endif
> > > -	paging_init();
> > > -	xen_setup_shared_info();
> > > -#ifdef CONFIG_X86_64
> > > -	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> > > -		unsigned long new_mfn_list;
> > > +	unsigned long new_mfn_list;
> > > +
> > > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > > +		return;
> > > +
> > > +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> > > +
> > > +	/* On 32-bit, we get zero so this never gets executed. */
> > 
> > Given that this code is already ifdef'ed CONFIG_X86_64, this comment
> > should be removed.
> 
> Sure.
> > 
> > 
> > > +	new_mfn_list = xen_revector_p2m_tree();
> > 
> > I take from the comment that new_mfn_list must not be zero. Maybe we
> > want a BUG_ON or a WARN_ON?
> 
> It can be zero, in which case we don't want to revector.
> > 
> > 
> > > +	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> > > +		/* using __ka address and sticking INVALID_P2M_ENTRY! */
> > > +		memset((void *)xen_start_info->mfn_list, 0xff, size);
> > > +
> > > +		/* We should be in __ka space. */
> > > +		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> > > +		addr = xen_start_info->mfn_list;
> > > +		/* We roundup to the PMD, which means that if anybody at this stage is
> > > +		 * using the __ka address of xen_start_info or xen_start_info->shared_info
> > > +		 * they are in going to crash. Fortunatly we have already revectored
> > > +		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> > > +		size = roundup(size, PMD_SIZE);
> > > +		xen_cleanhighmap(addr, addr + size);
> > >  
> > >  		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> > > +		memblock_free(__pa(xen_start_info->mfn_list), size);
> > > +		/* And revector! Bye bye old array */
> > > +		xen_start_info->mfn_list = new_mfn_list;
> > > +	} else
> > > +		return;
> > 
> > This was a normal condition when the function was executed on both
> > x86_64 and x86_32. Now that it is only executed on x86_64, is it still
> > the case?
> 
> It could be. Since this particular patch just moves code I would hesitate
> to make changes here. Perhaps a separate patch, once the conditions
> under which xen_revector_p2m_tree() fails are clear, can be done?

No, I think that's OK as it is.


> > 
> > 
> > > -		/* On 32-bit, we get zero so this never gets executed. */
> > > -		new_mfn_list = xen_revector_p2m_tree();
> > > -		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> > > -			/* using __ka address and sticking INVALID_P2M_ENTRY! */
> > > -			memset((void *)xen_start_info->mfn_list, 0xff, size);
> > > -
> > > -			/* We should be in __ka space. */
> > > -			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> > > -			addr = xen_start_info->mfn_list;
> > > -			/* We roundup to the PMD, which means that if anybody at this stage is
> > > -			 * using the __ka address of xen_start_info or xen_start_info->shared_info
> > > -			 * they are in going to crash. Fortunatly we have already revectored
> > > -			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> > > -			size = roundup(size, PMD_SIZE);
> > > -			xen_cleanhighmap(addr, addr + size);
> > > -
> > > -			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> > > -			memblock_free(__pa(xen_start_info->mfn_list), size);
> > > -			/* And revector! Bye bye old array */
> > > -			xen_start_info->mfn_list = new_mfn_list;
> > > -		} else
> > > -			goto skip;
> > > -	}
> > >  	/* At this stage, cleanup_highmap has already cleaned __ka space
> > >  	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
> > >  	 * the ramdisk). We continue on, erasing PMD entries that point to page
> > > @@ -1255,8 +1251,19 @@ static void __init xen_pagetable_init(void)
> > >  	 * anything at this stage. */
> > >  	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
> > >  #endif
> > > -skip:
> > > +}
> > > +#else
> > > +static void __init xen_pagetable_p2m_copy(void)
> > > +{
> > > +	/* Nada! */
> > > +}
> > >  #endif
> > > +
> > > +static void __init xen_pagetable_init(void)
> > > +{
> > > +	paging_init();
> > > +	xen_setup_shared_info();
> > > +	xen_pagetable_p2m_copy();
> > >  	xen_post_allocator_init();
> > >  }
> > >  static void xen_write_cr2(unsigned long cr2)
> > > -- 
> > > 1.8.3.1
> > > 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-01  4:35 ` [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
  2014-01-02 16:14   ` David Vrabel
@ 2014-01-03 16:30   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> In xen_add_extra_mem() we can skip updating P2M as it's managed
> by Xen. PVH maps the entire IO space, but only RAM pages need
> to be repopulated.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/setup.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 2137c51..dd5f905 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -27,6 +27,7 @@
>  #include <xen/interface/memory.h>
>  #include <xen/interface/physdev.h>
>  #include <xen/features.h>
> +#include "mmu.h"
>  #include "xen-ops.h"
>  #include "vdso.h"
>  
> @@ -81,6 +82,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>  
>  	memblock_reserve(start, size);
>  
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
>  	xen_max_p2m_pfn = PFN_DOWN(start + size);
>  	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
>  		unsigned long mfn = pfn_to_mfn(pfn);
> @@ -103,6 +107,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  		.domid        = DOMID_SELF
>  	};
>  	unsigned long len = 0;
> +	int xlated_phys = xen_feature(XENFEAT_auto_translated_physmap);
>  	unsigned long pfn;
>  	int ret;
>  
> @@ -116,7 +121,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  				continue;
>  			frame = mfn;
>  		} else {
> -			if (mfn != INVALID_P2M_ENTRY)
> +			if (!xlated_phys && mfn != INVALID_P2M_ENTRY)
>  				continue;
>  			frame = pfn;
>  		}
> @@ -154,6 +159,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  static unsigned long __init xen_release_chunk(unsigned long start,
>  					      unsigned long end)
>  {
> +	/*
> +	 * Xen already ballooned out the E820 non RAM regions for us
> +	 * and set them up properly in EPT.
> +	 */
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return end - start;
> +
>  	return xen_do_chunk(start, end, true);
>  }
>  
> @@ -222,7 +234,13 @@ static void __init xen_set_identity_and_release_chunk(
>  	 * (except for the ISA region which must be 1:1 mapped) to
>  	 * release the refcounts (in Xen) on the original frames.
>  	 */
> -	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
> +
> +	/*
> +	 * PVH E820 matches the hypervisor's P2M which means we need to
> +	 * account for the proper values of *release and *identity.
> +	 */
> +	for (pfn = start_pfn; !xen_feature(XENFEAT_auto_translated_physmap) &&
> +	     pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
>  		pte_t pte = __pte_ma(0);
>  
>  		if (pfn < PFN_UP(ISA_END_ADDRESS))
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2)
  2014-01-01  4:35 ` [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
  2014-01-02 15:43   ` David Vrabel
@ 2014-01-03 16:34   ` Stefano Stabellini
  2014-01-03 18:10     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - there are certain things
> that work in it like HVM and some like PV. There is
> a similar mode - PVHVM where we run in HVM mode with
> PV code enabled - and this patch explores that.
> 
> The most notable PV interfaces are the XenBus and event channels.
> 
> We will piggyback on how the event channel mechanism is
> used in PVHVM - that is we want the normal native IRQ mechanism
> and we will install a vector (hvm callback) for which we
> will call the event channel mechanism.
> 
> This means that from a pvops perspective, we can use
> native_irq_ops instead of the Xen PV specific ones. In the
> future we could support pirq_eoi_map, but that is
> a feature request that can be shared with PVHVM.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/irq.c   |  5 ++++-
>  drivers/xen/events.c | 16 ++++++++++------
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 0da7f86..76ca326 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -5,6 +5,7 @@
>  #include <xen/interface/xen.h>
>  #include <xen/interface/sched.h>
>  #include <xen/interface/vcpu.h>
> +#include <xen/features.h>
>  #include <xen/events.h>
>  
>  #include <asm/xen/hypercall.h>
> @@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
>  
>  void __init xen_init_irq_ops(void)
>  {
> -	pv_irq_ops = xen_irq_ops;
> +	/* For PVH we use default pv_irq_ops settings. */
> +	if (!xen_feature(XENFEAT_hvm_callback_vector))
> +		pv_irq_ops = xen_irq_ops;
>  	x86_init.irqs.intr_init = xen_init_IRQ;
>  }
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index 4035e83..bf8fb29 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -1908,20 +1908,24 @@ void __init xen_init_IRQ(void)
>  	pirq_needs_eoi = pirq_needs_eoi_flag;
>  
>  #ifdef CONFIG_X86
> -	if (xen_hvm_domain()) {
> +	if (xen_pv_domain()) {
> +		irq_ctx_init(smp_processor_id());
> +		if (xen_initial_domain())
> +			pci_xen_initial_domain();
> +	}
> +	if (xen_feature(XENFEAT_hvm_callback_vector))
>  		xen_callback_vector();
> +
> +	if (xen_hvm_domain()) {
>  		native_init_IRQ();
>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>  		 * __acpi_register_gsi can point at the right function */
>  		pci_xen_hvm_init();
> -	} else {
> +	} else if (!xen_pvh_domain()) {
> +		/* TODO: No PVH support for PIRQ EOI */
>  		int rc;
>  		struct physdev_pirq_eoi_gmfn eoi_gmfn;
>  
> -		irq_ctx_init(smp_processor_id());
> -		if (xen_initial_domain())
> -			pci_xen_initial_domain();

We already have a mechanism to identify whether
PHYSDEVOP_pirq_eoi_gmfn_v2 is available or not. Can't we just rely on
that?


>  		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
>  		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
>  		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
  2014-01-01  4:35 ` [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
  2014-01-02 11:38   ` David Vrabel
@ 2014-01-03 16:40   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> The function gnttab_max_grant_frames() returns the maximum number
> of frames (pages) of grants we can have. Unfortunately it was
> dependent on gnttab_init() having been run before to initialize
> the boot max value (boot_max_nr_grant_frames).
> 
> This meant that users of gnttab_max_grant_frames would always
> get a zero value if they called before gnttab_init() - such as
> 'platform_pci_init' (drivers/xen/platform-pci.c).
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  drivers/xen/grant-table.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index aa846a4..99399cb 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -62,7 +62,6 @@
>  
>  static grant_ref_t **gnttab_list;
>  static unsigned int nr_grant_frames;
> -static unsigned int boot_max_nr_grant_frames;
>  static int gnttab_free_count;
>  static grant_ref_t gnttab_free_head;
>  static DEFINE_SPINLOCK(gnttab_list_lock);
> @@ -827,6 +826,11 @@ static unsigned int __max_nr_grant_frames(void)
>  unsigned int gnttab_max_grant_frames(void)
>  {
>  	unsigned int xen_max = __max_nr_grant_frames();
> +	static unsigned int boot_max_nr_grant_frames;
> +
> +	/* First time, initialize it properly. */
> +	if (!boot_max_nr_grant_frames)
> +		boot_max_nr_grant_frames = __max_nr_grant_frames();
>  
>  	if (xen_max > boot_max_nr_grant_frames)
>  		return boot_max_nr_grant_frames;
> @@ -1227,13 +1231,12 @@ int gnttab_init(void)
>  
>  	gnttab_request_version();
>  	nr_grant_frames = 1;
> -	boot_max_nr_grant_frames = __max_nr_grant_frames();
>  
>  	/* Determine the maximum number of frames required for the
>  	 * grant reference free list on the current hypervisor.
>  	 */
>  	BUG_ON(grefs_per_grant_frame == 0);
> -	max_nr_glist_frames = (boot_max_nr_grant_frames *
> +	max_nr_glist_frames = (gnttab_max_grant_frames() *
>  			       grefs_per_grant_frame / RPP);
>  
>  	gnttab_list = kmalloc(max_nr_glist_frames * sizeof(grant_ref_t *),
> -- 
> 1.8.3.1
> 
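
The lazy initialization introduced by the patch above can be modelled
in userspace roughly as follows. This is a minimal sketch, not the
kernel code: model_hypervisor_max and model_query_max_frames() are
hypothetical stand-ins for the hypervisor query behind
__max_nr_grant_frames(), and the value 32 is made up. It only shows
that the boot-time maximum is latched on the first call, so callers no
longer depend on gnttab_init() having run.

```c
#include <assert.h>

/* Hypothetical stand-in for the hypervisor query behind
 * __max_nr_grant_frames(); the value is made up for the model. */
static unsigned int model_hypervisor_max = 32;

static unsigned int model_query_max_frames(void)
{
	return model_hypervisor_max;
}

/* Same shape as the patched gnttab_max_grant_frames(): the boot-time
 * maximum is latched on first use instead of in an init function. */
static unsigned int model_gnttab_max_grant_frames(void)
{
	unsigned int xen_max = model_query_max_frames();
	static unsigned int boot_max;

	/* First call (or a previous query returned zero): latch it. */
	if (!boot_max)
		boot_max = model_query_max_frames();

	/* Never report more than what was available at boot. */
	if (xen_max > boot_max)
		return boot_max;
	return xen_max;
}
```

Note that in this shape a query result of zero is indistinguishable
from "not yet initialized", so the latch is simply retried on the next
call.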

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init
  2014-01-01  4:35 ` [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
  2014-01-02 11:39   ` David Vrabel
@ 2014-01-03 16:43   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> We have this odd scenario where for PV paths we take a shortcut
> but for the HVM paths we first ioremap xen_hvm_resume_frames, then
> assign it to gnttab_shared.addr. This is needed because gnttab_map
> uses gnttab_shared.addr.
> 
> Instead of having:
> 	if (pv)
> 		return gnttab_map
> 	if (hvm)
> 		...
> 
> 	gnttab_map
> 
> Let's move the HVM part before the gnttab_map and remove the
> first call to gnttab_map.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/xen/grant-table.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 99399cb..cc1b4fa 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1173,22 +1173,17 @@ static int gnttab_setup(void)
>  	if (max_nr_gframes < nr_grant_frames)
>  		return -ENOSYS;
>  
> -	if (xen_pv_domain())
> -		return gnttab_map(0, nr_grant_frames - 1);
> -
> -	if (gnttab_shared.addr == NULL) {
> +	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
> +	{
>  		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> -						PAGE_SIZE * max_nr_gframes);
> +					       PAGE_SIZE * max_nr_gframes);

                          ^ spurious change
Aside from that:

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  		if (gnttab_shared.addr == NULL) {
>  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
>  					xen_hvm_resume_frames);
>  			return -ENOMEM;
>  		}
>  	}
> -
> -	gnttab_map(0, nr_grant_frames - 1);
> -
> -	return 0;
> +	return gnttab_map(0, nr_grant_frames - 1);
>  }
>  
>  int gnttab_resume(void)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-01  4:35 ` [PATCH v12 14/18] xen/grant: Implement an grant frame array struct Konrad Rzeszutek Wilk
  2014-01-02 16:27   ` David Vrabel
@ 2014-01-03 16:53   ` Stefano Stabellini
  2014-01-03 19:18     ` [Xen-devel] " Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 16:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> The 'xen_hvm_resume_frames' used to be an 'unsigned long'
> and contain the virtual address of the grants. That was OK
> for most architectures (PVHVM, ARM) where the grants are contiguous
> in memory. That however is not the case for PVH - in which case
> we will have to do a lookup for each virtual address for the PFN.
> 
> Instead of doing that, let's make it a structure which will contain
> the array of PFNs, the virtual address and the count of said PFNs.
> 
> Also provide generic functions: gnttab_setup_auto_xlat_frames and
> gnttab_free_auto_xlat_frames to populate said structure with
> appropiate values for PVHVM and ARM.
     ^appropriate


> To round it off, change the name from 'xen_hvm_resume_frames' to
> a more descriptive one - 'xen_auto_xlat_grant_frames'.
> 
> For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
> we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
> 
> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/arm/xen/enlighten.c   |  9 +++++++--
>  drivers/xen/grant-table.c  | 45 ++++++++++++++++++++++++++++++++++++++++-----
>  drivers/xen/platform-pci.c | 10 +++++++---
>  include/xen/grant_table.h  |  9 ++++++++-
>  4 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 8550123..2162172 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -208,6 +208,7 @@ static int __init xen_guest_init(void)
>  	const char *version = NULL;
>  	const char *xen_prefix = "xen,xen-";
>  	struct resource res;
> +	unsigned long grant_frames;
>  
>  	node = of_find_compatible_node(NULL, NULL, "xen,xen");
>  	if (!node) {
> @@ -224,10 +225,10 @@ static int __init xen_guest_init(void)
>  	}
>  	if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res))
>  		return 0;
> -	xen_hvm_resume_frames = res.start;
> +	grant_frames = res.start;
>  	xen_events_irq = irq_of_parse_and_map(node, 0);
>  	pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n",
> -			version, xen_events_irq, (xen_hvm_resume_frames >> PAGE_SHIFT));
> +			version, xen_events_irq, (grant_frames >> PAGE_SHIFT));
>  	xen_domain_type = XEN_HVM_DOMAIN;
>  
>  	xen_setup_features();
> @@ -265,6 +266,10 @@ static int __init xen_guest_init(void)
>  	if (xen_vcpu_info == NULL)
>  		return -ENOMEM;
>  
> +	if (gnttab_setup_auto_xlat_frames(grant_frames)) {
> +		free_percpu(xen_vcpu_info);
> +		return -ENOMEM;
> +	}
>  	gnttab_init();
>  	if (!xen_initial_domain())
>  		xenbus_probe(NULL);
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index cc1b4fa..b117fd6 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -65,8 +65,8 @@ static unsigned int nr_grant_frames;
>  static int gnttab_free_count;
>  static grant_ref_t gnttab_free_head;
>  static DEFINE_SPINLOCK(gnttab_list_lock);
> -unsigned long xen_hvm_resume_frames;
> -EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
> +struct grant_frames xen_auto_xlat_grant_frames;
> +EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);

it should be static now


>  static union {
>  	struct grant_entry_v1 *v1;
> @@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
>  }
>  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
>  
> +int gnttab_setup_auto_xlat_frames(unsigned long addr)
> +{
> +	xen_pfn_t *pfn;
> +	unsigned int max_nr_gframes = __max_nr_grant_frames();
> +	int i;
> +
> +	if (xen_auto_xlat_grant_frames.count)
> +		return -EINVAL;
> +
> +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
> +	if (!pfn)
> +		return -ENOMEM;
> +	for (i = 0; i < max_nr_gframes; i++)
> +		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));
> +
> +	xen_auto_xlat_grant_frames.vaddr = addr;
> +	xen_auto_xlat_grant_frames.pfn = pfn;
> +	xen_auto_xlat_grant_frames.count = max_nr_gframes;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_setup_auto_xlat_frames);
> +
> +void gnttab_free_auto_xlat_frames(void)
> +{
> +	if (!xen_auto_xlat_grant_frames.count)
> +		return;
> +	kfree(xen_auto_xlat_grant_frames.pfn);
> +	xen_auto_xlat_grant_frames.pfn = NULL;
> +	xen_auto_xlat_grant_frames.count = 0;
> +	xen_auto_xlat_grant_frames.vaddr = 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);

I would leave vaddr alone in gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames


>  /* Handling of paged out grant targets (GNTST_eagain) */
>  #define MAX_DELAY 256
>  static inline void
> @@ -1068,6 +1102,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  		struct xen_add_to_physmap xatp;
>  		unsigned int i = end_idx;
>  		rc = 0;
> +		BUG_ON(xen_auto_xlat_grant_frames.count < nr_gframes);
>  		/*
>  		 * Loop backwards, so that the first hypercall has the largest
>  		 * index, ensuring that the table will grow only once.
> @@ -1076,7 +1111,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  			xatp.domid = DOMID_SELF;
>  			xatp.idx = i;
>  			xatp.space = XENMAPSPACE_grant_table;
> -			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
> +			xatp.gpfn = xen_auto_xlat_grant_frames.pfn[i];
>  			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
>  			if (rc != 0) {
>  				pr_warn("grant table add_to_physmap failed, err=%d\n",
> @@ -1175,11 +1210,11 @@ static int gnttab_setup(void)
>  
>  	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
>  	{
> -		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> +		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames.vaddr,
>  					       PAGE_SIZE * max_nr_gframes);

here you can xen_remap xen_auto_xlat_grant_frames.pfn[0] instead


>  		if (gnttab_shared.addr == NULL) {
>  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
> -					xen_hvm_resume_frames);
> +					xen_auto_xlat_grant_frames.vaddr);
>  			return -ENOMEM;
>  		}
>  	}
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 2f3528e..f1947ac 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,7 @@ static int platform_pci_init(struct pci_dev *pdev,
>  	long ioaddr;
>  	long mmio_addr, mmio_len;
>  	unsigned int max_nr_gframes;
> +	unsigned long grant_frames;
>  
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -154,13 +155,16 @@ static int platform_pci_init(struct pci_dev *pdev,
>  	}
>  
>  	max_nr_gframes = gnttab_max_grant_frames();
> -	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> +	grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> +	if (gnttab_setup_auto_xlat_frames(grant_frames))
> +		goto out;
>  	ret = gnttab_init();
>  	if (ret)
> -		goto out;
> +		goto grant_out;
>  	xenbus_probe(NULL);
>  	return 0;
> -
> +grant_out:
> +	gnttab_free_auto_xlat_frames();
>  out:
>  	pci_release_region(pdev, 0);
>  mem_out:
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 694dcaf..a997406 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -178,8 +178,15 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
>  			   grant_status_t **__shared);
>  void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
>  
> -extern unsigned long xen_hvm_resume_frames;
> +struct grant_frames {
> +	xen_pfn_t *pfn;
> +	int count;
> +	unsigned long vaddr;
> +};
> +extern struct grant_frames xen_auto_xlat_grant_frames;
>  unsigned int gnttab_max_grant_frames(void);
> +int gnttab_setup_auto_xlat_frames(unsigned long addr);
> +void gnttab_free_auto_xlat_frames(void);
>  
>  #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
>  
> -- 
> 1.8.3.1
> 
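
For readers who want to see the bookkeeping in isolation, here is a
minimal userspace sketch of the structure and the setup/free pair from
the patch above, for the contiguous (PVHVM/ARM) case. It is an
illustration under assumptions, not the kernel code: every model_*
name is hypothetical, calloc()/free() stand in for kcalloc()/kfree(),
a 4 KiB page size is assumed, and the count is unsigned here whereas
the patch uses int.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

typedef unsigned long model_pfn_t;

/* Mirrors the fields of struct grant_frames in the patch. */
struct model_grant_frames {
	model_pfn_t *pfn;
	unsigned int count;
	unsigned long vaddr;
};

static struct model_grant_frames model_frames;

/* Fill the PFN array for a contiguous region starting at addr. */
static int model_setup_frames(unsigned long addr, unsigned int nr)
{
	model_pfn_t *pfn;
	unsigned int i;

	/* Refuse to set up twice, as the patch does. */
	if (model_frames.count)
		return -EINVAL;

	pfn = calloc(nr, sizeof(pfn[0]));
	if (!pfn)
		return -ENOMEM;
	for (i = 0; i < nr; i++)
		pfn[i] = (addr + i * MODEL_PAGE_SIZE) >> MODEL_PAGE_SHIFT;

	model_frames.vaddr = addr;
	model_frames.pfn = pfn;
	model_frames.count = nr;
	return 0;
}

/* Release the PFN array and reset the structure. */
static void model_free_frames(void)
{
	if (!model_frames.count)
		return;
	free(model_frames.pfn);
	model_frames.pfn = NULL;
	model_frames.count = 0;
	model_frames.vaddr = 0;
}
```

With the per-frame array in place, the map loop can index pfn[i]
directly, which is what lets a non-contiguous PVH layout reuse the same
path by filling the array differently.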

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 15:48             ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2014-01-03 17:20               ` Stefano Stabellini
  2014-01-03 18:14                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 17:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, stefano.stabellini, Konrad Rzeszutek Wilk,
	xen-devel, boris.ostrovsky, linux-kernel

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 03, 2014 at 03:41:51PM +0000, David Vrabel wrote:
> > On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> > >> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> > >>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> > >>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > >>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> > >>>>>  	return gnttab_init();
> > >>>>>  }
> > >>>>>  
> > >>>>> -core_initcall(__gnttab_init);
> > >>>>> +core_initcall_sync(__gnttab_init);
> > >>>>
> > >>>> Why has this become _sync?
> > >>>
> > >>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> > >>> at gnttab_init):
> > >>
> > >>
> > >> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> > > 
> > > It has a clear ordering property.
> > 
> > > This really isn't obvious to me.  Can you point to the docs/code that
> > > guarantee this?  I couldn't find it.
> 
> include/linux/init.h
> > 
> > >> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> > > 
> > > No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> > > also used by the ARM code.
> > > 
> > > Stefano in his previous review mentioned he would like PVH specific
> > > code in arch/x86:
> > > 
> > > https://lkml.org/lkml/2013/12/18/507
> > 
> > > Call it xen_arch_gnttab_setup() and add a weak stub for other architectures?
> 
> Stefano, thoughts?

I think that you can safely move __gnttab_init to postcore_initcall if
it works correctly for the PV and PVH cases, because HVM and ARM are
unaffected by it.  In fact they don't initialize the grant table via
__gnttab_init at all. See:

/* Delay grant-table initialization in the PV on HVM case */
if (xen_hvm_domain())
	return 0;

at the beginning of __gnttab_init.
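
On the ordering question raised above: include/linux/init.h defines
core_initcall as level 1, core_initcall_sync as level 1s and
postcore_initcall as level 2, and the levels run in that order. The
following is a minimal userspace model of that ordering, not kernel
code. The model_* names and "later_user" are hypothetical, and it
assumes xen_pvh_gnttab_setup is registered at the core level, as the
quoted discussion implies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; names below are illustrative, not kernel API. */
enum model_level {
	LEVEL_CORE = 0,		/* core_initcall:      level "1"  */
	LEVEL_CORE_SYNC = 1,	/* core_initcall_sync: level "1s" */
	LEVEL_POSTCORE = 2	/* postcore_initcall:  level "2"  */
};

struct model_initcall {
	enum model_level level;
	const char *name;
};

/* Stable insertion sort by level, standing in for the per-level
 * initcall sections being laid out in ascending order. */
static void model_sort_initcalls(struct model_initcall *calls, size_t n)
{
	size_t i, j;

	for (i = 1; i < n; i++) {
		struct model_initcall key = calls[i];

		for (j = i; j > 0 && calls[j - 1].level > key.level; j--)
			calls[j] = calls[j - 1];
		calls[j] = key;
	}
}

/* Return 1 if 'first' appears before 'second' in the array, else 0. */
static int model_runs_before(const struct model_initcall *calls, size_t n,
			     const char *first, const char *second)
{
	size_t i;
	int seen_first = 0;

	for (i = 0; i < n; i++) {
		if (strcmp(calls[i].name, first) == 0)
			seen_first = 1;
		else if (strcmp(calls[i].name, second) == 0)
			return seen_first;
	}
	return 0;
}
```

Under these assumptions either core_initcall_sync or postcore_initcall
places __gnttab_init after a core-level setup function; the model says
nothing about ordering between two calls at the same level.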

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus.
  2014-01-01  4:35 ` [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
  2014-01-02 11:43   ` David Vrabel
@ 2014-01-03 17:22   ` Stefano Stabellini
  1 sibling, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 17:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - there are certain things
> that work in it like HVM and some like PV. For the XenBus
> mechanism we want to use the PVHVM mechanism.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/xen/xenbus/xenbus_client.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index ec097d6..7f7c454 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -45,6 +45,7 @@
>  #include <xen/grant_table.h>
>  #include <xen/xenbus.h>
>  #include <xen/xen.h>
> +#include <xen/features.h>
>  
>  #include "xenbus_probe.h"
>  
> @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
>  
>  void __init xenbus_ring_ops_init(void)
>  {
> -	if (xen_pv_domain())
> +	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))

As I wrote in the other email, this should be

if (!xen_feature(XENFEAT_auto_translated_physmap))
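
To illustrate why the shorter check should be enough, here is a small
userspace sketch (not kernel code). It assumes the three guest types
behave as discussed in this thread: PV has no auto-translated physmap,
PVH is PV with auto-translation, and HVM is not PV and is
auto-translated. The struct and helper names are made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a guest; not a kernel structure. */
struct guest {
	bool pv_domain;	/* stands in for xen_pv_domain() */
	bool auto_xlat;	/* stands in for XENFEAT_auto_translated_physmap */
};

/* Assumed feature combinations for the three guest types. */
static const struct guest GUEST_PV  = { .pv_domain = true,  .auto_xlat = false };
static const struct guest GUEST_PVH = { .pv_domain = true,  .auto_xlat = true  };
static const struct guest GUEST_HVM = { .pv_domain = false, .auto_xlat = true  };

/* Check as posted in the patch. */
static bool use_pv_ring_ops_patch(const struct guest *g)
{
	return g->pv_domain && !g->auto_xlat;
}

/* Check as suggested in this review. */
static bool use_pv_ring_ops_review(const struct guest *g)
{
	return !g->auto_xlat;
}
```

Under those assumptions both helpers pick ring_ops_pv only for the plain
PV guest, so the extra xen_pv_domain() test adds nothing.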


>  		ring_ops = &ring_ops_pv;
>  	else
>  		ring_ops = &ring_ops_hvm;
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-01  4:35 ` [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2) Konrad Rzeszutek Wilk
  2014-01-02 16:32   ` David Vrabel
@ 2014-01-03 17:26   ` Stefano Stabellini
  2014-01-03 18:20     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 17:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	stefano.stabellini, mukesh.rathor

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> In PVH the shared grant frame is the PFN and not MFN,
> hence its mapped via the same code path as HVM.
> 
> The allocation of the grant frame is done differently - we
> do not use the early platform-pci driver and have an
> ioremap area - instead we use balloon memory and stitch
> all of the non-contiguous pages in a virtualized area.
> 
> That means when we call the hypervisor to replace the GMFN
> with a XENMAPSPACE_grant_table type, we need to lookup the
> old PFN for every iteration instead of assuming a flat
> contiguous PFN allocation.
> 
> Lastly, we only use v1 for grants. This is because PVHVM
> is not able to use v2 due to no XENMEM_add_to_physmap
> calls on the error status page (see commit
> 69e8f430e243d657c2053f097efebc2e2cd559f0
>  xen/granttable: Disable grant v2 for HVM domains.)
> 
> Until that is implemented this workaround has to
> be in place.
> 
> Also per suggestions by Stefano utilize the PVHVM paths
> as they share common functionality.
> 
> v2 of this patch moves most of the PVH code out in the
> arch/x86/xen/grant-table driver and touches only minimally
> the generic driver.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/grant-table.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/gntdev.c       |  2 +-
>  drivers/xen/grant-table.c  | 13 ++++++----
>  3 files changed, 73 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index 3a5f55d..040e064 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -125,3 +125,67 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
>  	apply_to_page_range(&init_mm, (unsigned long)shared,
>  			    PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL);
>  }
> +#ifdef CONFIG_XEN_PVHVM
> +#include <xen/balloon.h>
> +#include <linux/slab.h>
> +static int __init xlated_setup_gnttab_pages(void)
> +{
> +	struct page **pages;
> +	xen_pfn_t *pfns;
> +	int rc, i;
> +	unsigned long nr_grant_frames = gnttab_max_grant_frames();
> +
> +	BUG_ON(nr_grant_frames == 0);
> +	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
> +	if (!pages)
> +		return -ENOMEM;
> +
> +	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
> +	if (!pfns) {
> +		kfree(pages);
> +		return -ENOMEM;
> +	}
> +	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
> +	if (rc) {
> +		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
> +			nr_grant_frames, rc);
> +		kfree(pages);
> +		kfree(pfns);
> +		return rc;
> +	}
> +	for (i = 0; i < nr_grant_frames; i++)
> +		pfns[i] = page_to_pfn(pages[i]);
> +
> +	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, nr_grant_frames,
> +				    (void *)&xen_auto_xlat_grant_frames.vaddr);
> +
> +	kfree(pages);
> +	if (rc) {
> +		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
> +			nr_grant_frames, rc);
> +		free_xenballooned_pages(nr_grant_frames, pages);
> +		kfree(pfns);
> +		return rc;
> +	}
> +
> +	xen_auto_xlat_grant_frames.pfn = pfns;
> +	xen_auto_xlat_grant_frames.count = nr_grant_frames;
> +
> +	return 0;
> +}

Unfortunately this way pfns is leaked. Can we safely free it or is it
reused at resume time?
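
Separately, on the mapping-failure path the code as posted calls
kfree(pages) before free_xenballooned_pages(nr_grant_frames, pages), so
the balloon driver would seem to be handed an already freed array. A
rough userspace sketch of goto-style unwinding that keeps the ordering
straight is below. malloc/free and the fake_balloon_* helpers are
stand-ins made up for the example, not kernel APIs, and whether pfns has
to outlive the function is still the open question above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the balloon driver; they only count pages. */
static int ballooned_pages;

static int fake_balloon_alloc(unsigned long nr, void **pages)
{
	for (unsigned long i = 0; i < nr; i++)
		pages[i] = &ballooned_pages;	/* any non-NULL token */
	ballooned_pages += (int)nr;
	return 0;
}

static void fake_balloon_free(unsigned long nr, void **pages)
{
	for (unsigned long i = 0; i < nr; i++)
		pages[i] = NULL;	/* touches the array: it must still be live */
	ballooned_pages -= (int)nr;
}

/*
 * Set up nr frames. map_rc simulates the result of the mapping step.
 * On success the caller owns *pfns_out; on failure everything is undone.
 */
static int setup_frames(unsigned long nr, int map_rc, unsigned long **pfns_out)
{
	void **pages;
	unsigned long *pfns;
	int rc;

	pages = calloc(nr, sizeof(pages[0]));
	if (!pages)
		return -ENOMEM;

	pfns = calloc(nr, sizeof(pfns[0]));
	if (!pfns) {
		rc = -ENOMEM;
		goto free_pages_array;
	}

	rc = fake_balloon_alloc(nr, pages);
	if (rc)
		goto free_pfns;

	for (unsigned long i = 0; i < nr; i++)
		pfns[i] = i;	/* placeholder for page_to_pfn(pages[i]) */

	rc = map_rc;		/* placeholder for arch_gnttab_map_shared() */
	if (rc)
		goto unballoon;

	*pfns_out = pfns;	/* ownership passes to the caller */
	free(pages);
	return 0;

unballoon:
	fake_balloon_free(nr, pages);	/* before the array itself is freed */
free_pfns:
	free(pfns);
free_pages_array:
	free(pages);
	return rc;
}
```

With that shape each failure point releases only what was acquired
before it, and the success path hands pfns to the caller so that there
is exactly one owner who can free it later if that turns out to be safe.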


> +static int __init xen_pvh_gnttab_setup(void)
> +{
> +	if (!xen_domain())
> +		return -ENODEV;
> +
> +	if (!xen_pv_domain())
> +		return -ENODEV;
> +
> +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> +		return -ENODEV;
> +
> +	return xlated_setup_gnttab_pages();
> +}
> +core_initcall(xen_pvh_gnttab_setup); /* Call it _before_ __gnttab_init */
> +#endif
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index e41c79c..073b4a1 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -846,7 +846,7 @@ static int __init gntdev_init(void)
>  	if (!xen_domain())
>  		return -ENODEV;
>  
> -	use_ptemod = xen_pv_domain();
> +	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
>  
>  	err = misc_register(&gntdev_miscdev);
>  	if (err != 0) {
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index b117fd6..2fa3a4c 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1098,7 +1098,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  	unsigned int nr_gframes = end_idx + 1;
>  	int rc;
>  
> -	if (xen_hvm_domain()) {
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
>  		struct xen_add_to_physmap xatp;
>  		unsigned int i = end_idx;
>  		rc = 0;
> @@ -1174,7 +1174,7 @@ static void gnttab_request_version(void)
>  	int rc;
>  	struct gnttab_set_version gsv;
>  
> -	if (xen_hvm_domain())
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
>  		gsv.version = 1;
>  	else
>  		gsv.version = 2;
> @@ -1210,8 +1210,11 @@ static int gnttab_setup(void)
>  
>  	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
>  	{
> -		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames.vaddr,
> -					       PAGE_SIZE * max_nr_gframes);
> +		if (xen_hvm_domain()) {
> +			gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames.vaddr,
> +						       PAGE_SIZE * max_nr_gframes);
> +		} else
> +			gnttab_shared.addr = xen_auto_xlat_grant_frames.vaddr;
>  		if (gnttab_shared.addr == NULL) {
>  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
>  					xen_auto_xlat_grant_frames.vaddr);
> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
>  	return gnttab_init();
>  }
>  
> -core_initcall(__gnttab_init);
> +core_initcall_sync(__gnttab_init);

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-03  1:34       ` Mukesh Rathor
  2014-01-03 11:29         ` David Vrabel
@ 2014-01-03 17:35         ` Konrad Rzeszutek Wilk
  2014-01-04  1:13           ` Mukesh Rathor
  1 sibling, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 17:35 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: David Vrabel, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On Thu, Jan 02, 2014 at 05:34:38PM -0800, Mukesh Rathor wrote:
> On Thu, 2 Jan 2014 13:32:21 -0500
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
> > > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > 
> > > > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > > > need to use emulated prefix call. We also check for vector
> > > > callback early on, as it is a required feature. PVH also runs at
> > > > default kernel IOPL.
> > > > 
> > > > Finally, pure PV settings are moved to a separate function that
> > > > are only called for pure PV, ie, pv with pvmmu. They are also
> > > > #ifdef with CONFIG_XEN_PVMMU.
> > > [...]
> > > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > > unsigned int *bx, break;
> > > >  	}
> > > >  
> > > > -	asm(XEN_EMULATE_PREFIX "cpuid"
> > > > -		: "=a" (*ax),
> > > > -		  "=b" (*bx),
> > > > -		  "=c" (*cx),
> > > > -		  "=d" (*dx)
> > > > -		: "0" (*ax), "2" (*cx));
> > > > +	if (xen_pvh_domain())
> > > > +		native_cpuid(ax, bx, cx, dx);
> > > > +	else
> > > > +		asm(XEN_EMULATE_PREFIX "cpuid"
> > > > +			: "=a" (*ax),
> > > > +			"=b" (*bx),
> > > > +			"=c" (*cx),
> > > > +			"=d" (*dx)
> > > > +			: "0" (*ax), "2" (*cx));
> > > 
> > > For this one off cpuid call it seems preferrable to me to use the
> > > emulate prefix rather than diverge from PV.
> > 
> > This was before the PV cpuid was deemed OK to be used on PVH.
> > Will rip this out to use the same version.
> 
> Whats wrong with using native cpuid? That is one of the benefits that
> cpuid can be trapped via vmexit, and also there is talk of making PV
> cpuid trap obsolete in the future. I suggest leaving it native.

I chatted with David, Andrew and Roger on IRC about this. I like the
idea of using xen_cpuid because:
 1) It filters some of the CPUID flags that guests should not use. There are
    'aperfmperf', 'x2apic', 'xsave', and whether the MWAIT_LEAF
    should be exposed (so that the ACPI AML code can call the right
    initialization code to use the extended C3 states instead of the
    legacy IOPORT ones). All of that is in xen_cpuid.
   
 2) It works, while we can concentrate on making 1) work in the
    hypervisor/toolstack.

Meaning that the future way would be to use the native cpuid and have
the hypervisor/toolstack setup the proper cpuid. In other words - use
the xen_cpuid as is until that code for filtering is in the hypervisor.
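
To make the filtering idea concrete, here is a minimal userspace sketch
of masking feature bits after the cpuid instruction has run. It is not
the actual xen_cpuid code: the struct and the filter_cpuid() helper are
made up for illustration, and the MWAIT_LEAF handling is left out. The
bit positions are the architectural ones (leaf 1 ECX bit 21 for x2APIC
and bit 26 for XSAVE, leaf 6 ECX bit 0 for APERF/MPERF).

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CPUID bit positions. */
#define CPUID1_ECX_X2APIC	(1u << 21)
#define CPUID1_ECX_XSAVE	(1u << 26)
#define CPUID6_ECX_APERFMPERF	(1u << 0)

/* Hypothetical container for the four output registers. */
struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical filter: hide features the guest should not use. */
static void filter_cpuid(uint32_t leaf, struct cpuid_regs *r)
{
	switch (leaf) {
	case 1:
		r->ecx &= ~(CPUID1_ECX_X2APIC | CPUID1_ECX_XSAVE);
		break;
	case 6:
		r->ecx &= ~CPUID6_ECX_APERFMPERF;
		break;
	default:
		/* Other leaves pass through untouched in this sketch. */
		break;
	}
}
```

The point of doing this in the guest for now is that the same masks
apply no matter how the raw values were obtained; once the
hypervisor/toolstack does the equivalent filtering, the guest-side step
can go away.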


Except that PVH does not work with the PV cpuid at all. I get a triple fault.
The instruction it fails at is the 'XEN_EMULATE_PREFIX'.

Mukesh, can you point me to the patch where the PV cpuid functionality
is enabled?

Anyhow, as it stands, I will just use the native cpuid.

> 
> Mukesh
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2)
  2014-01-03 16:22   ` Stefano Stabellini
@ 2014-01-03 17:59     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 17:59 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	mukesh.rathor

On Fri, Jan 03, 2014 at 04:22:01PM +0000, Stefano Stabellini wrote:
> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > As we do not have yet a mechanism for that.
> > 
> > This also impacts the ARM/ARM64 code (which does not have
> > hotplug support yet).
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  drivers/xen/cpu_hotplug.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> > index cc6513a..5f80802 100644
> > --- a/drivers/xen/cpu_hotplug.c
> > +++ b/drivers/xen/cpu_hotplug.c
> > @@ -4,6 +4,7 @@
> >  
> >  #include <xen/xen.h>
> >  #include <xen/xenbus.h>
> > +#include <xen/features.h>
> >  
> >  #include <asm/xen/hypervisor.h>
> >  #include <asm/cpu.h>
> > @@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
> >  	static struct notifier_block xsn_cpu = {
> >  		.notifier_call = setup_cpu_watcher };
> >  
> > -	if (!xen_pv_domain())
> > +	/* PVH/ARM/ARM64 TBD/FIXME: future work */
> > +	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
> >  		return -ENODEV;
> >  
> >  	register_xenstore_notifier(&xsn_cpu);
> 
> Sorry for being a bit obnoxious but I was thinking that using a
> xen_feature(XENFEAT_auto_translated_physmap) check is conceptually
> wrong, because cpu hotplug and nested paging are orthogonal.

Yeah, you should be sorry :-) (Just joking - appreciate your
input and review)
> 
> Given that we most probably want to follow the PV path for cpu_hotplug
> (that is using drivers/xen/cpu_hotplug.c), is there actually a problem
> with building and initializing it on PVH guests?

It hasn't been tested..

> If it works as it is, I would be tempted to leave it for now.

..until now and it actually appears to work.

> 
> Otherwise the patch is OK and you can add my Acked-by.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2)
  2014-01-03 16:34   ` Stefano Stabellini
@ 2014-01-03 18:10     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 18:10 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	mukesh.rathor

On Fri, Jan 03, 2014 at 04:34:18PM +0000, Stefano Stabellini wrote:
> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > PVH is a PV guest with a twist - there are certain things
> > that work in it like HVM and some like PV. There is
> > a similar mode - PVHVM where we run in HVM mode with
> > PV code enabled - and this patch explores that.
> > 
> > The most notable PV interfaces are the XenBus and event channels.
> > 
> > We will piggyback on how the event channel mechanism is
> > used in PVHVM - that is we want the normal native IRQ mechanism
> > and we will install a vector (hvm callback) for which we
> > will call the event channel mechanism.
> > 
> > This means that from a pvops perspective, we can use
> > native_irq_ops instead of the Xen PV specific. Albeit in the
> > future we could support pirq_eoi_map. But that is
> > a feature request that can be shared with PVHVM.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/irq.c   |  5 ++++-
> >  drivers/xen/events.c | 16 ++++++++++------
> >  2 files changed, 14 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> > index 0da7f86..76ca326 100644
> > --- a/arch/x86/xen/irq.c
> > +++ b/arch/x86/xen/irq.c
> > @@ -5,6 +5,7 @@
> >  #include <xen/interface/xen.h>
> >  #include <xen/interface/sched.h>
> >  #include <xen/interface/vcpu.h>
> > +#include <xen/features.h>
> >  #include <xen/events.h>
> >  
> >  #include <asm/xen/hypercall.h>
> > @@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
> >  
> >  void __init xen_init_irq_ops(void)
> >  {
> > -	pv_irq_ops = xen_irq_ops;
> > +	/* For PVH we use default pv_irq_ops settings. */
> > +	if (!xen_feature(XENFEAT_hvm_callback_vector))
> > +		pv_irq_ops = xen_irq_ops;
> >  	x86_init.irqs.intr_init = xen_init_IRQ;
> >  }
> > diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> > index 4035e83..bf8fb29 100644
> > --- a/drivers/xen/events.c
> > +++ b/drivers/xen/events.c
> > @@ -1908,20 +1908,24 @@ void __init xen_init_IRQ(void)
> >  	pirq_needs_eoi = pirq_needs_eoi_flag;
> >  
> >  #ifdef CONFIG_X86
> > -	if (xen_hvm_domain()) {
> > +	if (xen_pv_domain()) {
> > +		irq_ctx_init(smp_processor_id());
> > +		if (xen_initial_domain())
> > +			pci_xen_initial_domain();
> > +	}
> > +	if (xen_feature(XENFEAT_hvm_callback_vector))
> >  		xen_callback_vector();
> > +
> > +	if (xen_hvm_domain()) {
> >  		native_init_IRQ();
> >  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
> >  		 * __acpi_register_gsi can point at the right function */
> >  		pci_xen_hvm_init();
> > -	} else {
> > +	} else if (!xen_pvh_domain()) {
> > +		/* TODO: No PVH support for PIRQ EOI */
> >  		int rc;
> >  		struct physdev_pirq_eoi_gmfn eoi_gmfn;
> >  
> > -		irq_ctx_init(smp_processor_id());
> > -		if (xen_initial_domain())
> > -			pci_xen_initial_domain();
> 
> We already have a mechanism to identify whether
> PHYSDEVOP_pirq_eoi_gmfn_v2 is available or not. Can't we just rely on
> that?

Yes, and this code has the right recovery mechanism to deal with it.

Thank you!

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 17:20               ` Stefano Stabellini
@ 2014-01-03 18:14                 ` Konrad Rzeszutek Wilk
  2014-01-03 18:29                   ` Stefano Stabellini
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 18:14 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, David Vrabel, xen-devel, boris.ostrovsky,
	linux-kernel

On Fri, Jan 03, 2014 at 05:20:54PM +0000, Stefano Stabellini wrote:
> On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 03, 2014 at 03:41:51PM +0000, David Vrabel wrote:
> > > On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> > > >> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> > > >>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> > > >>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > >>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> > > >>>>>  	return gnttab_init();
> > > >>>>>  }
> > > >>>>>  
> > > >>>>> -core_initcall(__gnttab_init);
> > > >>>>> +core_initcall_sync(__gnttab_init);
> > > >>>>
> > > >>>> Why has this become _sync?
> > > >>>
> > > >>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> > > >>> at gnttab_init):
> > > >>
> > > >>
> > > >> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> > > > 
> > > > It has a clear ordering property.
> > > 
> > > This really isn't obvious to me.  Can you point to the docs/code the
> > > guarantee this?  I couldn't find it.
> > 
> > include/linux/init.h
> > > 
> > > >> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> > > > 
> > > > No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> > > > also used by the ARM code.
> > > > 
> > > > Stefano in his previous review mentioned he would like PVH specific
> > > > code in arch/x86:
> > > > 
> > > > https://lkml.org/lkml/2013/12/18/507
> > > 
> > > Call it xen_arch_gnttab_setup() and add weak stub for other architectures?
> > 
> > Stefano, thoughts?
> 
> I think that you can safely move __gnttab_init to postcore_initcall if
> it works correctly for the PV and PVH cases, because HVM and ARM are
> unaffected by it.  In fact they don't initialize the grant table via
> __gnttab_init at all. See:

The 'xenbus_init' is called in postcore_initcall. I don't actually
know if it is OK to call that _before_ gnttab_init is called.

> 
> /* Delay grant-table initialization in the PV on HVM case */
> if (xen_hvm_domain())
> 	return 0;
> 
> at the beginning of __gnttab_init.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 17:26   ` Stefano Stabellini
@ 2014-01-03 18:20     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 18:20 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-kernel, xen-devel, boris.ostrovsky, david.vrabel,
	mukesh.rathor

On Fri, Jan 03, 2014 at 05:26:39PM +0000, Stefano Stabellini wrote:
> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > In PVH the shared grant frame is the PFN and not MFN,
> > hence its mapped via the same code path as HVM.
> > 
> > The allocation of the grant frame is done differently - we
> > do not use the early platform-pci driver and have an
> > ioremap area - instead we use balloon memory and stitch
> > all of the non-contiguous pages in a virtualized area.
> > 
> > That means when we call the hypervisor to replace the GMFN
> > with a XENMAPSPACE_grant_table type, we need to lookup the
> > old PFN for every iteration instead of assuming a flat
> > contiguous PFN allocation.
> > 
> > Lastly, we only use v1 for grants. This is because PVHVM
> > is not able to use v2 due to no XENMEM_add_to_physmap
> > calls on the error status page (see commit
> > 69e8f430e243d657c2053f097efebc2e2cd559f0
> >  xen/granttable: Disable grant v2 for HVM domains.)
> > 
> > Until that is implemented this workaround has to
> > be in place.
> > 
> > Also per suggestions by Stefano utilize the PVHVM paths
> > as they share common functionality.
> > 
> > v2 of this patch moves most of the PVH code out in the
> > arch/x86/xen/grant-table driver and touches only minimally
> > the generic driver.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/grant-table.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/xen/gntdev.c       |  2 +-
> >  drivers/xen/grant-table.c  | 13 ++++++----
> >  3 files changed, 73 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> > index 3a5f55d..040e064 100644
> > --- a/arch/x86/xen/grant-table.c
> > +++ b/arch/x86/xen/grant-table.c
> > @@ -125,3 +125,67 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
> >  	apply_to_page_range(&init_mm, (unsigned long)shared,
> >  			    PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL);
> >  }
> > +#ifdef CONFIG_XEN_PVHVM
> > +#include <xen/balloon.h>
> > +#include <linux/slab.h>
> > +static int __init xlated_setup_gnttab_pages(void)
> > +{
> > +	struct page **pages;
> > +	xen_pfn_t *pfns;
> > +	int rc, i;
> > +	unsigned long nr_grant_frames = gnttab_max_grant_frames();
> > +
> > +	BUG_ON(nr_grant_frames == 0);
> > +	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
> > +	if (!pages)
> > +		return -ENOMEM;
> > +
> > +	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
> > +	if (!pfns) {
> > +		kfree(pages);
> > +		return -ENOMEM;
> > +	}
> > +	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
> > +	if (rc) {
> > +		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
> > +			nr_grant_frames, rc);
> > +		kfree(pages);
> > +		kfree(pfns);
> > +		return rc;
> > +	}
> > +	for (i = 0; i < nr_grant_frames; i++)
> > +		pfns[i] = page_to_pfn(pages[i]);
> > +
> > +	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, nr_grant_frames,
> > +				    (void *)&xen_auto_xlat_grant_frames.vaddr);
> > +
> > +	kfree(pages);
> > +	if (rc) {
> > +		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
> > +			nr_grant_frames, rc);
> > +		free_xenballooned_pages(nr_grant_frames, pages);
> > +		kfree(pfns);
> > +		return rc;
> > +	}
> > +
> > +	xen_auto_xlat_grant_frames.pfn = pfns;
> > +	xen_auto_xlat_grant_frames.count = nr_grant_frames;
> > +
> > +	return 0;
> > +}
> 
> Unfortunately this way pfns is leaked. Can we safely free it or is it
> reused at resume time?

You mean you want PVH suspend and resume to work out of the box?!

HA! I hadn't even tested that yet.

How about when we get to that point we will figure out the way to
do the right thing.

What actually happens during suspend/resume in an HVM guest? We just
need to call 'gnttab_setup' which calls 'gnttab_map' to do the
XENMAPSPACE_grant_table on the PFNs right? That should be OK
and the xen_auto_xlat_grant_frames.pfn  is used during that.

The suspend path would use unmap_frames -> arch_gnttab_unmap which
just clears the PTEs. There is no freeing of the memory
which is used as backing store.
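
Roughly what I have in mind for the remap step, as a userspace sketch
only: fake_add_to_physmap() stands in for the XENMEM_add_to_physmap
hypercall and just records the request, and all names and the MAX_FRAMES
limit are made up for illustration. The point is that the PFN is looked
up per index from the saved array instead of being computed from a flat
base, so the array has to stay around.

```c
#include <assert.h>

#define MAX_FRAMES 8

/* Hypothetical record of what the "hypervisor" was asked to map. */
static unsigned long mapped_gpfn[MAX_FRAMES];

/* Stand-in for the hypercall; only records the request. */
static int fake_add_to_physmap(unsigned int idx, unsigned long gpfn)
{
	if (idx >= MAX_FRAMES)
		return -1;
	mapped_gpfn[idx] = gpfn;
	return 0;
}

/* Map grant frames 0..end_idx, looking up the PFN for each index. */
static int map_grant_frames(const unsigned long *pfn, unsigned int end_idx)
{
	unsigned int i = end_idx;	/* start from the highest index */
	int rc;

	do {
		/* pfn[i] need not equal pfn[0] + i: the pages are scattered. */
		rc = fake_add_to_physmap(i, pfn[i]);
		if (rc)
			return rc;
	} while (i-- > 0);

	return 0;
}
```

Calling map_grant_frames() a second time with the same array gives the
same mappings, which is all that resume would need from it.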

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 18:14                 ` Konrad Rzeszutek Wilk
@ 2014-01-03 18:29                   ` Stefano Stabellini
  2014-01-03 18:39                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 18:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Konrad Rzeszutek Wilk, David Vrabel,
	xen-devel, boris.ostrovsky, linux-kernel

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 03, 2014 at 05:20:54PM +0000, Stefano Stabellini wrote:
> > On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 03, 2014 at 03:41:51PM +0000, David Vrabel wrote:
> > > > On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> > > > > On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> > > > >> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> > > > >>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> > > > >>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > >>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> > > > >>>>>  	return gnttab_init();
> > > > >>>>>  }
> > > > >>>>>  
> > > > >>>>> -core_initcall(__gnttab_init);
> > > > >>>>> +core_initcall_sync(__gnttab_init);
> > > > >>>>
> > > > >>>> Why has this become _sync?
> > > > >>>
> > > > >>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> > > > >>> at gnttab_init):
> > > > >>
> > > > >>
> > > > >> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> > > > > 
> > > > > It has a clear ordering property.
> > > > 
> > > > This really isn't obvious to me.  Can you point to the docs/code the
> > > > guarantee this?  I couldn't find it.
> > > 
> > > include/linux/init.h
> > > > 
> > > > >> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> > > > > 
> > > > > No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> > > > > also used by the ARM code.
> > > > > 
> > > > > Stefano in his previous review mentioned he would like PVH specific
> > > > > code in arch/x86:
> > > > > 
> > > > > https://lkml.org/lkml/2013/12/18/507
> > > > 
> > > > Call it xen_arch_gnttab_setup() and add weak stub for other architectures?
> > > 
> > > Stefano, thoughts?
> > 
> > I think that you can safely move __gnttab_init to postcore_initcall if
> > it works correctly for the PV and PVH cases, because HVM and ARM are
> > unaffected by it.  In fact they don't initialize the grant table via
> > __gnttab_init at all. See:
> 
> The 'xenbus_init' is called in postcore_initcall. I don't actually
> know if it is OK to call that _before_ gnttab_init is called.

No, xenbus_init needs to be called after gnttab_init, however the
alphabetical order would enforce it.
Not that I would want to rely on it :-)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 18:29                   ` Stefano Stabellini
@ 2014-01-03 18:39                     ` Konrad Rzeszutek Wilk
  2014-01-03 19:02                       ` Stefano Stabellini
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 18:39 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, David Vrabel, xen-devel, boris.ostrovsky,
	linux-kernel

On Fri, Jan 03, 2014 at 06:29:25PM +0000, Stefano Stabellini wrote:
> On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 03, 2014 at 05:20:54PM +0000, Stefano Stabellini wrote:
> > > On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Jan 03, 2014 at 03:41:51PM +0000, David Vrabel wrote:
> > > > > On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> > > > > > On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> > > > > >> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> > > > > >>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> > > > > >>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > > >>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> > > > > >>>>>  	return gnttab_init();
> > > > > >>>>>  }
> > > > > >>>>>  
> > > > > >>>>> -core_initcall(__gnttab_init);
> > > > > >>>>> +core_initcall_sync(__gnttab_init);
> > > > > >>>>
> > > > > >>>> Why has this become _sync?
> > > > > >>>
> > > > > >>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> > > > > >>> at gnttab_init):
> > > > > >>
> > > > > >>
> > > > > >> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> > > > > > 
> > > > > > It has a clear ordering property.
> > > > > 
> > > > > This really isn't obvious to me.  Can you point to the docs/code the
> > > > > guarantee this?  I couldn't find it.
> > > > 
> > > > include/linux/init.h
> > > > > 
> > > > > >> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> > > > > > 
> > > > > > No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> > > > > > also used by the ARM code.
> > > > > > 
> > > > > > Stefano in his previous review mentioned he would like PVH specific
> > > > > > code in arch/x86:
> > > > > > 
> > > > > > https://lkml.org/lkml/2013/12/18/507
> > > > > 
> > > > > Call it xen_arch_gnttab_setup() and add weak stub for other architectures?
> > > > 
> > > > Stefano, thoughts?
> > > 
> > > I think that you can safely move __gnttab_init to postcore_initcall if
> > > it works correctly for the PV and PVH cases, because HVM and ARM are
> > > unaffected by it.  In fact they don't initialize the grant table via
> > > __gnttab_init at all. See:
> > 
> > The 'xenbus_init' is called in postcore_initcall. I don't actually
> > know if it is OK to call that _before_ gnttab_init is called.
> 
> No, xenbus_init needs to be called after gnttab_init, however the
> alphabetical order would enforce it.
> Not that I would want to rely on it :-)

Exactly. Which is why I came back to the idea of just moving __gnttab_init 
one level down in the '1' runlevel. This way I can guarantee that this
order of operation will be done:

xen_pvh_gnttab_setup
__gnttab_init
xenbus_init

Without anybody coming up with a patch that would randomize the order
of functions called within the runlevels.
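
As a toy model of that ordering (userspace C, not how the kernel
implements it - the real thing uses linker sections): core_initcall is
level "1", core_initcall_sync is "1s" and postcore_initcall is "2" in
include/linux/init.h. The struct and helpers below are made up for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an initcall entry. */
struct initcall {
	const char *level;	/* "1", "1s", "2", ... */
	const char *name;
};

/* Rank: numeric level first, then the plain variant before the "s" one. */
static int level_rank(const char *level)
{
	int rank = (level[0] - '0') * 2;

	if (level[1] == 's')
		rank += 1;
	return rank;
}

static int cmp_initcall(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	return level_rank(x->level) - level_rank(y->level);
}

/* Order entries by level; entries within one level are left unspecified. */
static void sort_initcalls(struct initcall *calls, size_t n)
{
	qsort(calls, n, sizeof(calls[0]), cmp_initcall);
}
```

With the three functions on three distinct levels the result does not
depend on link order within a level, which is the property I am after.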

I gather you prefer this approach then?


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Xen-devel] [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2)
  2014-01-03 18:39                     ` Konrad Rzeszutek Wilk
@ 2014-01-03 19:02                       ` Stefano Stabellini
  0 siblings, 0 replies; 90+ messages in thread
From: Stefano Stabellini @ 2014-01-03 19:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Konrad Rzeszutek Wilk, David Vrabel,
	xen-devel, boris.ostrovsky, linux-kernel

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 03, 2014 at 06:29:25PM +0000, Stefano Stabellini wrote:
> > On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jan 03, 2014 at 05:20:54PM +0000, Stefano Stabellini wrote:
> > > > On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > > > > On Fri, Jan 03, 2014 at 03:41:51PM +0000, David Vrabel wrote:
> > > > > > On 03/01/14 14:44, Konrad Rzeszutek Wilk wrote:
> > > > > > > On Fri, Jan 03, 2014 at 11:54:13AM +0000, David Vrabel wrote:
> > > > > > >> On 02/01/14 18:50, Konrad Rzeszutek Wilk wrote:
> > > > > > >>> On Thu, Jan 02, 2014 at 04:32:03PM +0000, David Vrabel wrote:
> > > > > > >>>> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > > > >>>>> @@ -1320,4 +1323,4 @@ static int __gnttab_init(void)
> > > > > > >>>>>  	return gnttab_init();
> > > > > > >>>>>  }
> > > > > > >>>>>  
> > > > > > >>>>> -core_initcall(__gnttab_init);
> > > > > > >>>>> +core_initcall_sync(__gnttab_init);
> > > > > > >>>>
> > > > > > >>>> Why has this become _sync?
> > > > > > >>>
> > > > > > >>> It needs to run _after_ the xen_pvh_gnttab_setup has run (which is
> > > > > > >>> at gnttab_init):
> > > > > > >>
> > > > > > >>
> > > > > > >> The use of core_initcall_sync() doesn't imply any ordering to me.  Can't
> > > > > > > 
> > > > > > > It has a clear ordering property.
> > > > > > 
> > > > > > This really isn't obvious to me.  Can you point to the docs/code the
> > > > > > guarantee this?  I couldn't find it.
> > > > > 
> > > > > include/linux/init.h
> > > > > > 
> > > > > > >> you call xen_pvh_gnttab_setup() from within __gnttab_init() ?
> > > > > > > 
> > > > > > > No. That is due to the fact that __gnttab_init() is in drivers/xen and is
> > > > > > > also used by the ARM code.
> > > > > > > 
> > > > > > > Stefano in his previous review mentioned he would like PVH specific
> > > > > > > code in arch/x86:
> > > > > > > 
> > > > > > > https://lkml.org/lkml/2013/12/18/507
> > > > > > 
> > > > > > Call it xen_arch_gnttab_setup() and add weak stub for other architectures?
> > > > > 
> > > > > Stefano, thoughts?
> > > > 
> > > > I think that you can safely move __gnttab_init to postcore_initcall if
> > > > it works correctly for the PV and PVH cases, because HVM and ARM are
> > > > unaffected by it.  In fact they don't initialize the grant table via
> > > > __gnttab_init at all. See:
> > > 
> > > The 'xenbus_init' is called in postcore_initcall. I don't actually
> > > know if it is OK to call that _before_ gnttab_init is called.
> > 
> > No, xenbus_init needs to be called after gnttab_init, however the
> > alphabetical order would enforce it.
> > Not that I would want to rely on it :-)
> 
> Exactly. Which is why I came back to the idea of just moving __gnttab_init 
> one level down in the '1' runlevel. This way I can guarantee that this
> order of operation will be done:
> 
> xen_pvh_gnttab_setup
> __gnttab_init
> xenbus_init
> 
> Without anybody coming up with a patch that would randomize the order
> of functions called within the runlevels.
> 
> I gather you prefer this approach then?

Yeah, seems sensible.
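
For readers following the ordering argument, here is a minimal userspace sketch of the idea, not kernel code. It assumes what the thread describes: xen_pvh_gnttab_setup is a core_initcall, __gnttab_init moves to core_initcall_sync, and xenbus_init stays a postcore_initcall. The enum, struct, and helper names (initcall_level, order_initcalls, runs_before) are hypothetical and exist only for this illustration. The model sorts by level alone, because the order of calls within one level depends on link order and is exactly what the thread does not want to rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of three initcall levels. In include/linux/init.h,
 * core_initcall is level "1", core_initcall_sync is "1s" and
 * postcore_initcall is "2"; the sections are run in that order. */
enum initcall_level {
	LEVEL_CORE = 0,		/* core_initcall()      -> .initcall1.init  */
	LEVEL_CORE_SYNC,	/* core_initcall_sync() -> .initcall1s.init */
	LEVEL_POSTCORE		/* postcore_initcall()  -> .initcall2.init  */
};

struct initcall {
	enum initcall_level level;
	const char *name;
};

static int cmp_level(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	return (int)x->level - (int)y->level;
}

/* Order the calls by level only. Calls sharing a level end up in an
 * unspecified relative order, mirroring the lack of any guarantee. */
void order_initcalls(struct initcall *calls, size_t n)
{
	assert(calls != NULL || n == 0);
	qsort(calls, n, sizeof(calls[0]), cmp_level);
}

/* Return 1 if 'first' appears before 'second' in the ordered table. */
int runs_before(const struct initcall *calls, size_t n,
		const char *first, const char *second)
{
	size_t i, pos_first = n, pos_second = n;

	for (i = 0; i < n; i++) {
		if (strcmp(calls[i].name, first) == 0)
			pos_first = i;
		if (strcmp(calls[i].name, second) == 0)
			pos_second = i;
	}
	return pos_first < n && pos_second < n && pos_first < pos_second;
}
```

Registering the three functions at three distinct levels and calling order_initcalls() yields xen_pvh_gnttab_setup, then __gnttab_init, then xenbus_init, whatever order they were registered in.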


* Re: [Xen-devel] [PATCH v12 14/18] xen/grant: Implement an grant frame array struct.
  2014-01-03 16:53   ` Stefano Stabellini
@ 2014-01-03 19:18     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:18 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, linux-kernel, david.vrabel, xen-devel,
	boris.ostrovsky

On Fri, Jan 03, 2014 at 04:53:59PM +0000, Stefano Stabellini wrote:
> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > The 'xen_hvm_resume_frames' used to be an 'unsigned long'
> > and contain the virtual address of the grants. That was OK
> > for most architectures (PVHVM, ARM) where the grants are contiguous
> > in memory. That however is not the case for PVH - in which case
> > we will have to do a lookup for each virtual address for the PFN.
> > 
> > Instead of doing that, lets make it a structure which will contain
> > the array of PFNs, the virtual address and the count of said PFNs.
> > 
> > Also provide generic functions: gnttab_setup_auto_xlat_frames and
> > gnttab_free_auto_xlat_frames to populate said structure with
> > appropiate values for PVHVM and ARM.
>      ^appropriate
> 
> 
> > To round it off, change the name from 'xen_hvm_resume_frames' to
> > a more descriptive one - 'xen_auto_xlat_grant_frames'.
> > 
> > For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
> > we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
> > 
> > Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/arm/xen/enlighten.c   |  9 +++++++--
> >  drivers/xen/grant-table.c  | 45 ++++++++++++++++++++++++++++++++++++++++-----
> >  drivers/xen/platform-pci.c | 10 +++++++---
> >  include/xen/grant_table.h  |  9 ++++++++-
> >  4 files changed, 62 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index 8550123..2162172 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -208,6 +208,7 @@ static int __init xen_guest_init(void)
> >  	const char *version = NULL;
> >  	const char *xen_prefix = "xen,xen-";
> >  	struct resource res;
> > +	unsigned long grant_frames;
> >  
> >  	node = of_find_compatible_node(NULL, NULL, "xen,xen");
> >  	if (!node) {
> > @@ -224,10 +225,10 @@ static int __init xen_guest_init(void)
> >  	}
> >  	if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res))
> >  		return 0;
> > -	xen_hvm_resume_frames = res.start;
> > +	grant_frames = res.start;
> >  	xen_events_irq = irq_of_parse_and_map(node, 0);
> >  	pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n",
> > -			version, xen_events_irq, (xen_hvm_resume_frames >> PAGE_SHIFT));
> > +			version, xen_events_irq, (grant_frames >> PAGE_SHIFT));
> >  	xen_domain_type = XEN_HVM_DOMAIN;
> >  
> >  	xen_setup_features();
> > @@ -265,6 +266,10 @@ static int __init xen_guest_init(void)
> >  	if (xen_vcpu_info == NULL)
> >  		return -ENOMEM;
> >  
> > +	if (gnttab_setup_auto_xlat_frames(grant_frames)) {
> > +		free_percpu(xen_vcpu_info);
> > +		return -ENOMEM;
> > +	}
> >  	gnttab_init();
> >  	if (!xen_initial_domain())
> >  		xenbus_probe(NULL);
> > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > index cc1b4fa..b117fd6 100644
> > --- a/drivers/xen/grant-table.c
> > +++ b/drivers/xen/grant-table.c
> > @@ -65,8 +65,8 @@ static unsigned int nr_grant_frames;
> >  static int gnttab_free_count;
> >  static grant_ref_t gnttab_free_head;
> >  static DEFINE_SPINLOCK(gnttab_list_lock);
> > -unsigned long xen_hvm_resume_frames;
> > -EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
> > +struct grant_frames xen_auto_xlat_grant_frames;
> > +EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);
> 
> it should be static now

Can't be. The arch/x86/xen/grant-table.c has to use it.

I can drop the 'EXPORT_SYMBOL_GPL' though.

> 
> 
> >  static union {
> >  	struct grant_entry_v1 *v1;
> > @@ -838,6 +838,40 @@ unsigned int gnttab_max_grant_frames(void)
> >  }
> >  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
> >  
> > +int gnttab_setup_auto_xlat_frames(unsigned long addr)
> > +{
> > +	xen_pfn_t *pfn;
> > +	unsigned int max_nr_gframes = __max_nr_grant_frames();
> > +	int i;
> > +
> > +	if (xen_auto_xlat_grant_frames.count)
> > +		return -EINVAL;
> > +
> > +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
> > +	if (!pfn)
> > +		return -ENOMEM;
> > +	for (i = 0; i < max_nr_gframes; i++)
> > +		pfn[i] = PFN_DOWN(addr + (i * PAGE_SIZE));
> > +
> > +	xen_auto_xlat_grant_frames.vaddr = addr;
> > +	xen_auto_xlat_grant_frames.pfn = pfn;
> > +	xen_auto_xlat_grant_frames.count = max_nr_gframes;
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(gnttab_setup_auto_xlat_frames);
> > +
> > +void gnttab_free_auto_xlat_frames(void)
> > +{
> > +	if (!xen_auto_xlat_grant_frames.count)
> > +		return;
> > +	kfree(xen_auto_xlat_grant_frames.pfn);
> > +	xen_auto_xlat_grant_frames.pfn = NULL;
> > +	xen_auto_xlat_grant_frames.count = 0;
> > +	xen_auto_xlat_grant_frames.vaddr = 0;
> > +}
> > +EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
> 
> I would leave vaddr alone in gnttab_setup_auto_xlat_frames and
> gnttab_free_auto_xlat_frames

Actually, I like David's suggestion. Patch coming out soon.
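
As an illustration of the structure discussed above, here is a userspace sketch of the contiguous-frames case (PVHVM, ARM), modelled on the quoted patch. It is not the kernel implementation: the function names setup_auto_xlat_frames and free_auto_xlat_frames, the explicit struct pointer and frame-count parameters, the 4 KiB page size, and the xen_pfn_t typedef are all assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

typedef unsigned long xen_pfn_t;		/* stand-in for the kernel type */

struct grant_frames {
	xen_pfn_t *pfn;		/* one PFN per grant frame */
	unsigned int count;	/* number of entries in pfn[] */
	unsigned long vaddr;	/* address of the first frame */
};

/* Fill gf for frames that are contiguous starting at addr.
 * Returns 0 on success, -1 if gf is already populated or on
 * allocation failure. */
int setup_auto_xlat_frames(struct grant_frames *gf, unsigned long addr,
			   unsigned int max_nr_gframes)
{
	unsigned int i;

	assert(gf != NULL);
	if (gf->count)
		return -1;

	gf->pfn = calloc(max_nr_gframes, sizeof(gf->pfn[0]));
	if (!gf->pfn)
		return -1;

	/* Contiguous case: frame i sits i pages after the first one. */
	for (i = 0; i < max_nr_gframes; i++)
		gf->pfn[i] = PFN_DOWN(addr + i * PAGE_SIZE);

	gf->vaddr = addr;
	gf->count = max_nr_gframes;
	return 0;
}

/* Release the PFN array and reset gf so it can be set up again. */
void free_auto_xlat_frames(struct grant_frames *gf)
{
	assert(gf != NULL);
	if (!gf->count)
		return;
	free(gf->pfn);
	gf->pfn = NULL;
	gf->count = 0;
	gf->vaddr = 0;
}
```

For PVH the frames are not contiguous, so a caller would fill pfn[] entry by entry instead of deriving each entry from the base address.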


* Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).
  2014-01-03 17:35         ` Konrad Rzeszutek Wilk
@ 2014-01-04  1:13           ` Mukesh Rathor
  0 siblings, 0 replies; 90+ messages in thread
From: Mukesh Rathor @ 2014-01-04  1:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On Fri, 3 Jan 2014 12:35:55 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Thu, Jan 02, 2014 at 05:34:38PM -0800, Mukesh Rathor wrote:
> > On Thu, 2 Jan 2014 13:32:21 -0500
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > 
> > > On Thu, Jan 02, 2014 at 03:32:33PM +0000, David Vrabel wrote:
> > > > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > > 
> > > > > In the bootup code for PVH we can trap cpuid via vmexit, so
> > > > > don't need to use emulated prefix call. We also check for
> > > > > vector callback early on, as it is a required feature. PVH
> > > > > also runs at default kernel IOPL.
> > > > > 
> > > > > Finally, pure PV settings are moved to a separate function
> > > > > that are only called for pure PV, ie, pv with pvmmu. They are
> > > > > also #ifdef with CONFIG_XEN_PVMMU.
> > > > [...]
> > > > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > > > unsigned int *bx, break;
> > > > >  	}
> > > > >  
> > > > > -	asm(XEN_EMULATE_PREFIX "cpuid"
> > > > > -		: "=a" (*ax),
> > > > > -		  "=b" (*bx),
> > > > > -		  "=c" (*cx),
> > > > > -		  "=d" (*dx)
> > > > > -		: "0" (*ax), "2" (*cx));
> > > > > +	if (xen_pvh_domain())
> > > > > +		native_cpuid(ax, bx, cx, dx);
> > > > > +	else
> > > > > +		asm(XEN_EMULATE_PREFIX "cpuid"
> > > > > +			: "=a" (*ax),
> > > > > +			"=b" (*bx),
> > > > > +			"=c" (*cx),
> > > > > +			"=d" (*dx)
> > > > > +			: "0" (*ax), "2" (*cx));
> > > > 
> > > > For this one-off cpuid call it seems preferable to me to use
> > > > the emulate prefix rather than diverge from PV.
> > > 
> > > This was before the PV cpuid was deemed OK to be used on PVH.
> > > Will rip this out to use the same version.
> > 
> > What's wrong with using native cpuid? That is one of the benefits
> > that cpuid can be trapped via vmexit, and also there is talk of
> > making PV cpuid trap obsolete in the future. I suggest leaving it
> > native.
> 
> I chatted with David, Andrew and Roger on IRC about this. I like the
> idea of using xen_cpuid because:
>  1) It filters some of the CPUID flags that guests should not use.
> There is the 'aperfmperf', 'x2apic', 'xsave', and whether the MWAIT_LEAF
>     should be exposed (so that the ACPI AML code can call the right
>     initialization code to use the extended C3 states instead of the
>     legacy IOPORT ones). All of that is in xen_cpuid.
>    
>  2) It works, while we can concentrate on making 1) work in the
>     hypervisor/toolstack.
> 
> Meaning that the future way would be to use the native cpuid and have
> the hypervisor/toolstack setup the proper cpuid. In other words - use
> the xen_cpuid as is until that code for filtering is in the
> hypervisor.
> 
> 
> Except that PVH does not work with the PV cpuid at all. I get a triple
> fault. The instruction it fails at is at the 'XEN_EMULATE_PREFIX'.
> 
> Mukesh, can you point me to the patch where the PV cpuid functionality
> is enabled?
> 
> Anyhow, as it stands, I will just use the native cpuid.

I am referring to using the plain "cpuid" instruction instead of
XEN_EMULATE_PREFIX. cpuid is faster and better in the long term... there
is no benefit to using XEN_EMULATE_PREFIX IMO. We can look at removing
xen_cpuid() altogether for PVH when/after the PVH 32-bit work gets done.

The triple fault seems to be a new bug... I can file a bug report, but
for now, with the cpuid instruction in use, that won't be an issue.

thanks
mukesh



* Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-02 18:41     ` Konrad Rzeszutek Wilk
@ 2014-01-04  1:23       ` Mukesh Rathor
  2014-01-04  2:25         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Mukesh Rathor @ 2014-01-04  1:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On Thu, 2 Jan 2014 13:41:34 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Thu, Jan 02, 2014 at 04:14:32PM +0000, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > > by Xen. PVH maps the entire IO space, but only RAM pages need
> > > to be repopulated.
> > 
> > So this looks minimal but I can't work out what PVH actually needs
> > to do here.  This code really doesn't need to be made any more
> > confusing.
> 
> I gather you prefer Mukesh's original version?

Konrad, I think that's easier to follow, as one can quickly spot
the PVH difference... but your call.

thanks
mukesh



* Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-04  1:23       ` Mukesh Rathor
@ 2014-01-04  2:25         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-04  2:25 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: David Vrabel, linux-kernel, xen-devel, boris.ostrovsky,
	stefano.stabellini

On Fri, Jan 03, 2014 at 05:23:37PM -0800, Mukesh Rathor wrote:
> On Thu, 2 Jan 2014 13:41:34 -0500
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > On Thu, Jan 02, 2014 at 04:14:32PM +0000, David Vrabel wrote:
> > > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > 
> > > > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > > > by Xen. PVH maps the entire IO space, but only RAM pages need
> > > > to be repopulated.
> > > 
> > > So this looks minimal but I can't work out what PVH actually needs
> > > to do here.  This code really doesn't need to be made any more
> > > confusing.
> > 
> > I gather you prefer Mukesh's original version?
> 
> Konrad, I think that's easier to follow, as one can quickly spot
> the PVH difference... but your call.

I prefer the one that re-uses the existing logic. That has been the
approach for PVH in both the hypervisor and the Linux kernel: just do
nice little one-offs that do something simpler and easier than the
old PV path.

That way one can easily spot how PV vs PVH works for certain operations.

From a test-coverage perspective it also means we end up using the
same codepath for both PV and PVH, so we get more testing exposure
across the different modes.

> 
> thanks
> mukesh
> 


end of thread

Thread overview: 90+ messages
2014-01-01  4:35 [PATCH v12] Linux Xen PVH support Konrad Rzeszutek Wilk
2014-01-01  4:35 ` [PATCH v12 01/18] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
2014-01-01  4:35 ` [PATCH v12 02/18] xen/pvh/x86: Define what an PVH guest is (v2) Konrad Rzeszutek Wilk
2014-01-02 11:13   ` David Vrabel
2014-01-03 15:33     ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2) Konrad Rzeszutek Wilk
2014-01-02 15:32   ` David Vrabel
2014-01-02 18:32     ` Konrad Rzeszutek Wilk
2014-01-03  1:34       ` Mukesh Rathor
2014-01-03 11:29         ` David Vrabel
2014-01-03 15:37           ` Stefano Stabellini
2014-01-03 17:35         ` Konrad Rzeszutek Wilk
2014-01-04  1:13           ` Mukesh Rathor
2014-01-03 11:25       ` David Vrabel
2014-01-01  4:35 ` [PATCH v12 04/18] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
2014-01-02 11:17   ` David Vrabel
2014-01-03 15:41   ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 05/18] xen/mmu/p2m: Refactor the xen_pagetable_init code Konrad Rzeszutek Wilk
2014-01-02 11:21   ` David Vrabel
2014-01-03 15:47   ` Stefano Stabellini
2014-01-03 16:02     ` Konrad Rzeszutek Wilk
2014-01-03 16:23       ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
2014-01-02 11:24   ` David Vrabel
2014-01-03  1:36     ` Mukesh Rathor
2014-01-03 10:14       ` David Vrabel
2014-01-03 15:50     ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 07/18] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
2014-01-02 11:27   ` David Vrabel
2014-01-02 18:23     ` Konrad Rzeszutek Wilk
2014-01-03 14:39     ` Konrad Rzeszutek Wilk
2014-01-03 15:18       ` David Vrabel
2014-01-01  4:35 ` [PATCH v12 08/18] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
2014-01-02 11:31   ` David Vrabel
2014-01-02 18:24     ` Konrad Rzeszutek Wilk
2014-01-03 11:27       ` David Vrabel
2014-01-01  4:35 ` [PATCH v12 09/18] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
2014-01-02 16:07   ` David Vrabel
2014-01-01  4:35 ` [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
2014-01-02 16:14   ` David Vrabel
2014-01-02 18:41     ` Konrad Rzeszutek Wilk
2014-01-04  1:23       ` Mukesh Rathor
2014-01-04  2:25         ` Konrad Rzeszutek Wilk
2014-01-03 16:30   ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 11/18] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
2014-01-02 15:43   ` David Vrabel
2014-01-03 16:34   ` Stefano Stabellini
2014-01-03 18:10     ` Konrad Rzeszutek Wilk
2014-01-01  4:35 ` [PATCH v12 12/18] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
2014-01-02 11:38   ` David Vrabel
2014-01-03 16:40   ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 13/18] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
2014-01-02 11:39   ` David Vrabel
2014-01-03 16:43   ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 14/18] xen/grant: Implement an grant frame array struct Konrad Rzeszutek Wilk
2014-01-02 16:27   ` David Vrabel
2014-01-02 18:47     ` Konrad Rzeszutek Wilk
2014-01-03 12:11       ` [Xen-devel] " David Vrabel
2014-01-03 15:09         ` Konrad Rzeszutek Wilk
2014-01-03 16:53   ` Stefano Stabellini
2014-01-03 19:18     ` [Xen-devel] " Konrad Rzeszutek Wilk
2014-01-01  4:35 ` [PATCH v12 15/18] xen/pvh: Piggyback on PVHVM for grant driver (v2) Konrad Rzeszutek Wilk
2014-01-02 16:32   ` David Vrabel
2014-01-02 18:50     ` Konrad Rzeszutek Wilk
2014-01-03 11:54       ` David Vrabel
2014-01-03 14:44         ` Konrad Rzeszutek Wilk
2014-01-03 15:41           ` David Vrabel
2014-01-03 15:48             ` [Xen-devel] " Konrad Rzeszutek Wilk
2014-01-03 17:20               ` Stefano Stabellini
2014-01-03 18:14                 ` Konrad Rzeszutek Wilk
2014-01-03 18:29                   ` Stefano Stabellini
2014-01-03 18:39                     ` Konrad Rzeszutek Wilk
2014-01-03 19:02                       ` Stefano Stabellini
2014-01-03 17:26   ` Stefano Stabellini
2014-01-03 18:20     ` Konrad Rzeszutek Wilk
2014-01-01  4:35 ` [PATCH v12 16/18] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
2014-01-02 11:43   ` David Vrabel
2014-01-03 17:22   ` Stefano Stabellini
2014-01-01  4:35 ` [PATCH v12 17/18] xen/pvh/arm/arm64: Disable PV code that does not work with PVH (v2) Konrad Rzeszutek Wilk
2014-01-02 11:44   ` David Vrabel
2014-01-03 16:22   ` Stefano Stabellini
2014-01-03 17:59     ` Konrad Rzeszutek Wilk
2014-01-01  4:35 ` [PATCH v12 18/18] xen/pvh: Support ParaVirtualized Hardware extensions (v2) Konrad Rzeszutek Wilk
2014-01-02 11:48   ` David Vrabel
2014-01-02 18:27     ` Konrad Rzeszutek Wilk
2014-01-02 16:50 ` [PATCH v12] Linux Xen PVH support David Vrabel
2014-01-02 19:02   ` Konrad Rzeszutek Wilk
2014-01-03 13:37     ` David Vrabel
2014-01-02 18:39 ` H. Peter Anvin
2014-01-02 19:12   ` Konrad Rzeszutek Wilk
