xen-devel.lists.xenproject.org archive mirror
* [RFC][v2][PATCH 00/14] Fix RMRR
@ 2015-05-22  9:35 Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy Tiejun Chen
                   ` (14 more replies)
  0 siblings, 15 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

v2:

* Instead of the fixed, predefined RDM memory boundary, introduce a
  parameter, "rdm_mem_boundary", to set this threshold value.

* Remove the existing USB hack.

* Make sure the MMIO regions all fit in the available resource window.

* Rename our policy, "force/try" -> "strict/relaxed".

* Address many comments from Wei and Jan to refine the code:
  * Code style
  * Better and more reasonable code implementation
  * Corrected or improved code comments

* Minor changes to work well with ARM.

Open:

* We should fail assignment of a device that shares an RMRR with
another device. When an RMRR is shared among devices, we can only do
group assignment.

We need more time to work out a good policy here because some points
are still unclear to me.

All devices are owned by Dom0 before any DomU is created, right? Do we
allow Dom0 to keep owning one device of a group while another device in
the same group is assigned?

I would really appreciate any comments on the policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
set up those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance the existing Xen RMRR implementation to fix
various reported and theoretical problems. The most noteworthy changes are
setting up identity mappings in the p2m layer and handling possible conflicts
between reported regions and gfn space. The initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to the agreed proposal:

1. Use the name RDM (Reserved Device Memory) in user space as a general
description instead of using the ACPI RMRR name directly.

2. Introduce configuration parameters that let the user control both
per-device and global RDM resources, along with the desired policy when a
conflict is detected.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be the central place to manage RDM resources and handle
potential conflicts between reserved regions and gfn space. One simplification
goal is to keep the existing lowmem / mmio / highmem layout, which is
passed around various function blocks. So we make the reasonable assumption
that conflicts falling into the areas below are not re-arranged, since doing
so would result in a more scattered layout:
    a) in highmem region (>4G)
    b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. The VT-d spec allows this,
but it has not been observed in the real world, so we can keep this
assumption until there is a real use of it.

5. Extend XENMEM_set_memory_map to be usable for HVM guests, and then have
libxl use that hypercall to carry RDM information to hvmloader. There
is one difference from the original discussion. Previously we discussed
introducing a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. hvmloader does not
actually need to know whether the reserved entries come from RDM or
from other sources.

6. In hvmloader the change is then generic to the XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating any reserved entries for other usages (opregion, mmio, etc.).

7. Extend the existing device passthrough hypercall to carry the conflict
handling policy.

8. Set up an identity map in the p2m layer for the RMRRs reported for the
given device. Conflicts are handled according to the policy specified in the
hypercall.
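
As an illustration of item 4, here is a minimal, standalone sketch of how a
single RDM region might be classified against the guest layout. It is not
code from this series: `classify_rdm`, `struct guest_layout` and the enum
values are hypothetical names, and reading "re-arranged" as "adjust the
layout, e.g. lower the end of lowmem" for conflicts above the boundary is an
assumption on my side.

```c
#include <assert.h>
#include <stdint.h>

#define GB(x) ((uint64_t)(x) << 30)

/* Hypothetical outcome of checking one RDM region against guest RAM. */
enum rdm_action {
    RDM_NO_CONFLICT,   /* region does not overlap guest RAM */
    RDM_REARRANGE,     /* lowmem conflict at or above the boundary */
    RDM_APPLY_POLICY   /* not re-arranged: fall back to strict/relaxed */
};

/* Hypothetical, simplified guest layout: RAM in [0, lowmem_end) and,
 * if highmem_end > 4G, in [4G, highmem_end). */
struct guest_layout {
    uint64_t lowmem_end;
    uint64_t highmem_end;
};

/* Half-open interval overlap test: [s1, e1) vs [s2, e2). */
static int overlaps(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
    return s1 < e2 && s2 < e1;
}

static enum rdm_action classify_rdm(const struct guest_layout *g,
                                    uint64_t start, uint64_t size,
                                    uint64_t boundary)
{
    uint64_t end = start + size;

    assert(g && size > 0 && end > start);

    /* a) conflict in the highmem region (>4G): never re-arranged. */
    if (g->highmem_end > GB(4) &&
        overlaps(start, end, GB(4), g->highmem_end))
        return RDM_APPLY_POLICY;

    if (overlaps(start, end, 0, g->lowmem_end))
        /* b) lowmem conflict below the boundary: not re-arranged. */
        return start < boundary ? RDM_APPLY_POLICY : RDM_REARRANGE;

    return RDM_NO_CONFLICT;
}
```

With 3G of lowmem and the default 2G boundary, a region at 1G would be left
to the strict/relaxed policy, while a region at 2.75G would be a candidate
for re-arranging the layout.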

The current patch set contains the core enhancements and calls for comments.
Several tasks are not implemented yet. We'll include them in the final
version after the RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to resolve the policy
  conflict between the per-pci policy and the global policy, but first we'd
  like to collect ideas on how to proceed in this RFC.

So I am sending this as an RFC to collect any comments.

----------------------------------------------------------------
Jan Beulich (1):
      introduce XENMEM_reserved_device_memory_map
 
Tiejun Chen (13):
      tools: introduce some new parameters to set rdm policy
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools/libxl: detect and avoid conflicts with RDM
      xen/x86/p2m: introduce set_identity_p2m_entry
      xen:vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      tools: extend xc_assign_device() to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      tools: extend XENMEM_set_memory_map
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      xen/vtd: enable USB device assignment

 docs/man/xl.cfg.pod.5                       |  78 ++++++
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             |  62 +++--
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |  36 +++
 tools/firmware/hvmloader/pci.c              |  36 ++-
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  11 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/include/xenguest.h              |   1 +
 tools/libxc/xc_domain.c                     |  40 +++-
 tools/libxc/xc_hvm_build_x86.c              |  25 +-
 tools/libxl/libxl_create.c                  |  15 +-
 tools/libxl/libxl_dm.c                      | 253 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     | 114 ++++++++-
 tools/libxl/libxl_internal.h                |  13 +-
 tools/libxl/libxl_pci.c                     |  13 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxlu_pci.c                    |  92 +++++++
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  36 ++-
 tools/libxl/xl_cmdtable.c                   |   2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  18 +-
 tools/python/xen/lowlevel/xc/xc.c           |  29 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  41 ++++
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |   3 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  10 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  33 ++-
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |   4 +
 xen/include/public/domctl.h                 |   5 +
 xen/include/public/memory.h                 |  32 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 45 files changed, 1215 insertions(+), 96 deletions(-)

Thanks
Tiejun


* [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-06-02 15:57   ` Wei Liu
  2015-05-22  9:35 ` [RFC][v2][PATCH 02/14] introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

This patch introduces user configurable parameters to specify RDM
resources and the corresponding policies:

Global RDM parameter:
    rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

The global RDM parameter, "type", allows the user to specify reserved regions
explicitly, e.g. using 'host' to include all reserved regions reported
on this platform, which is useful for the hotplug scenario. In the future
this parameter may be further extended to allow specifying arbitrary regions,
e.g. even those belonging to another platform, as a preparation for live
migration with passthrough devices. By contrast, 'none' means we do nothing
with reserved regions and ignore all policies, so the guest works as before.
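
For readers who want to experiment with the global option string outside of
libxl, the following is a small standalone sketch of a parser for it. It is
not the xlu_rdm_parse() added by this patch: `parse_rdm`, `struct rdm_cfg`
and the enum names are hypothetical, and unlike the patch it accepts the keys
in any order, treats "reserve" as optional, and skips spaces after a comma.

```c
#include <assert.h>
#include <string.h>

enum rdm_type { RDM_TYPE_NONE, RDM_TYPE_HOST };
enum rdm_flag { RDM_FLAG_STRICT, RDM_FLAG_RELAXED };

struct rdm_cfg {
    enum rdm_type type;
    enum rdm_flag reserve;
};

/*
 * Parse "type=none|host[,reserve=strict|relaxed]".
 * Returns 0 on success, -1 on any malformed or unknown input.
 */
static int parse_rdm(const char *str, struct rdm_cfg *cfg)
{
    char buf[64];
    char *kv, *next;

    assert(str && cfg);
    if (strlen(str) >= sizeof(buf))
        return -1;
    strcpy(buf, str);

    /* Defaults described for the global policy: type none, relaxed. */
    cfg->type = RDM_TYPE_NONE;
    cfg->reserve = RDM_FLAG_RELAXED;

    for (kv = buf; kv; kv = next) {
        char *val;

        /* Cut the current KEY=VALUE pair off at the next comma. */
        next = strchr(kv, ',');
        if (next)
            *next++ = '\0';
        while (*kv == ' ')
            kv++;

        val = strchr(kv, '=');
        if (!val)
            return -1;
        *val++ = '\0';

        if (!strcmp(kv, "type")) {
            if (!strcmp(val, "host"))
                cfg->type = RDM_TYPE_HOST;
            else if (!strcmp(val, "none"))
                cfg->type = RDM_TYPE_NONE;
            else
                return -1;
        } else if (!strcmp(kv, "reserve")) {
            if (!strcmp(val, "strict"))
                cfg->reserve = RDM_FLAG_STRICT;
            else if (!strcmp(val, "relaxed"))
                cfg->reserve = RDM_FLAG_RELAXED;
            else
                return -1;
        } else {
            return -1;
        }
    }
    return 0;
}
```

Splitting on ',' first and '=' second keeps the parser independent of key
order, which may be convenient if further keys are added later.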

The 'strict/relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM will be killed, while 'relaxed' allows moving forward with a
warning message.

The default per-device RDM policy is 'strict', while the default global RDM
policy is 'relaxed'. When both policies are specified on a given region,
'strict' is always preferred.
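
The precedence rule can be written down as a tiny standalone sketch. The
names `effective_policy` and `on_conflict` are hypothetical and only mirror
the behaviour described above; the enum values follow the strict = 0,
relaxed = 1 numbering used in libxl_types.idl below.

```c
#include <assert.h>

enum rdm_policy { RDM_STRICT = 0, RDM_RELAXED = 1 };

/* 'strict' wins whenever either the global or the per-device policy asks
 * for it; only relaxed + relaxed yields relaxed. */
static enum rdm_policy effective_policy(enum rdm_policy global,
                                        enum rdm_policy per_device)
{
    assert(global == RDM_STRICT || global == RDM_RELAXED);
    assert(per_device == RDM_STRICT || per_device == RDM_RELAXED);

    return (global == RDM_STRICT || per_device == RDM_STRICT)
           ? RDM_STRICT : RDM_RELAXED;
}

/* Returns 0 if setup may continue (possibly after a warning), or -1 if
 * the conflict must be treated as a fatal error. */
static int on_conflict(enum rdm_policy policy, int conflict_detected)
{
    if (!conflict_detected)
        return 0;
    return policy == RDM_STRICT ? -1 : 0;
}
```

Note that with the defaults (global 'relaxed', per-device 'strict') the
effective policy for a passed-through device is 'strict' unless the device
entry explicitly says 'relaxed'.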

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 docs/man/xl.cfg.pod.5        | 57 +++++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 ++++++++++++
 tools/libxl/libxl_create.c   | 13 +++++++
 tools/libxl/libxl_internal.h |  2 +
 tools/libxl/libxl_pci.c      |  2 +
 tools/libxl/libxl_types.idl  | 18 +++++++++
 tools/libxl/libxlu_pci.c     | 92 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxlutil.h      |  4 ++
 tools/libxl/xl_cmdimpl.c     | 10 +++++
 9 files changed, 222 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8e4154f..12c34c4 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -645,6 +645,49 @@ assigned slave device.
 
 =back
 
+=item B<rdm= "RDM_RESERVE_STRING" >
+
+(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough usage. One example of
+RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
+structure on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<type="STRING">
+
+Currently we just have two types:
+
+"host" means all reserved device memory on this platform should be reserved
+in this VM's pfn space. This global RDM parameter allows user to specify
+reserved regions explicitly. And using "host" to include all reserved regions
+reported on this platform which is good to handle hotplug scenario. In the
+future this parameter may be further extended to allow specifying random
+regions, e.g. even those belonging to another platform as a preparation
+for live migration with passthrough devices.
+
+"none" means we have nothing to do all reserved regions and ignore all policies,
+so guest work as before.
+
+=over 4
+
+=item B<reserve="STRING">
+
+Conflict may be detected when reserving reserved device memory in gfn space.
+"strict" means an unsolved conflict leads to immediate VM crash, while
+"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
+is default.
+
+Note this may be overridden by another sub item, rdm_reserve, in pci device.
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -707,6 +750,20 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_reserve="STRING">
+
+(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough usage. One example of
+RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
+structure on x86 platform.
+
+Conflict may be detected when reserving reserved device memory in gfn space.
+"strict" means an unsolved conflict leads to immediate VM crash, while
+"relaxed" allows VM moving forward with a warning message thrown out. "strict"
+is default.
+
+Note this would override global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..7d63c47 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "type=host,reserve=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f0da7dc..d649ead 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -100,6 +100,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_NONE;
+    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -410,6 +416,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
                    libxl_domain_type_to_string(b_info->type));
         return ERROR_INVAL;
     }
+
+    libxl__rdm_setdefault(gc, b_info);
     return 0;
 }
 
@@ -1439,6 +1447,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
     }
 
     for (i = 0; i < d_config->num_pcidevs; i++) {
+        /*
+         * If the rdm global policy is 'strict' we should override each device.
+         */
+        if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_STRICT)
+            d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
         ret = libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
         if (ret < 0) {
             LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8aaa1ad..3a2f6ec 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1108,6 +1108,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..07e84f2 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1039,6 +1039,8 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    pci->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 65d479f..f1acd13 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -73,6 +73,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -366,6 +377,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("type",    libxl_rdm_reserve_type),
+    ("reserve",   libxl_rdm_reserve_flag),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -413,6 +429,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("kernel",           string),
     ("cmdline",          string),
     ("ramdisk",          string),
+    ("rdm",     libxl_rdm_reserve),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
                 [("hvm", Struct(None, [("firmware",         string),
                                        ("bios",             libxl_bios_type),
@@ -533,6 +550,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_reserve",   libxl_rdm_reserve_flag),
     ])
 
 libxl_device_vtpm = Struct("device_vtpm", [
diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..9255878 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_TYPE      10
+#define STATE_RESERVE_FLAG      11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
+                }else if ( !strcmp(optkey, "rdm_reserve") ) {
+                    if ( !strcmp(tok, "strict") ) {
+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+                    } else if ( !strcmp(tok, "relaxed") ) {
+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+                                          " flag: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
                 }else{
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
@@ -167,6 +181,84 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_TYPE;
+                *ptr = '\0';
+                if (strcmp(tok, "type")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_TYPE:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_FLAG;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_HOST;
+                } else if (!strcmp(tok, "none")) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_NONE;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM type option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_FLAG:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "reserve")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property flag value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    free(buf2);
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    return 0;
+
+parse_error:
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 373aa37..5b8cf2b 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1919,6 +1919,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->rdm.type = rdm.type;
+            b_info->rdm.reserve = rdm.reserve;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1933,6 +1941,8 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /* We'd like to force reserve rdm specific to a device by default.*/
+            pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1


* [RFC][v2][PATCH 02/14] introduce XENMEM_reserved_device_memory_map
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 03/14] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.
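
For callers, the handler below suggests a two-step usage: query with
nr_entries = 0 to learn the required count (the call fails with -ENOBUFS and
writes the count back), then retry with a buffer of that size. The sketch
here models this against a mock instead of the real hypercall;
`mock_rdm_query`, `query_all_rdm`, `struct rdm_entry` and the sample PFN
values are all made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/* Made-up host data standing in for what the IOMMU code would report. */
static const struct rdm_entry host_rdm[] = {
    { 0xab80a, 0x16 },
    { 0xad000, 0x2800 },
};
#define HOST_RDM_COUNT (sizeof(host_rdm) / sizeof(host_rdm[0]))

/* Mock of the query: fills at most *nr_entries entries, always writes
 * back the number of entries available, and returns -ENOBUFS when the
 * supplied buffer was too small. */
static int mock_rdm_query(struct rdm_entry *buf, unsigned int *nr_entries)
{
    unsigned int used = 0, i;
    int rc;

    for (i = 0; i < HOST_RDM_COUNT; i++) {
        if (used < *nr_entries)
            buf[used] = host_rdm[i];
        ++used;
    }

    rc = (*nr_entries < used) ? -ENOBUFS : 0;
    *nr_entries = used;
    return rc;
}

/* Two-call pattern: size the buffer first, then fetch the entries.
 * On success *out is a malloc'ed array the caller must free. */
static int query_all_rdm(struct rdm_entry **out, unsigned int *count)
{
    unsigned int nr = 0;
    struct rdm_entry *buf;
    int rc;

    assert(out && count);
    *out = NULL;
    *count = 0;

    rc = mock_rdm_query(NULL, &nr);
    if (rc != -ENOBUFS)
        return rc;          /* 0: nothing reported; otherwise an error */

    buf = calloc(nr, sizeof(*buf));
    if (!buf)
        return -ENOMEM;

    rc = mock_rdm_query(buf, &nr);
    if (rc) {
        free(buf);
        return rc;
    }

    *out = buf;
    *count = nr;
    return 0;
}
```

The first call passes a NULL buffer safely because no entry is copied while
the reported capacity is zero.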

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 32 ++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 063a1c5..c789f72 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 18d7903..518cae6 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 9053a1f..6a37624 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..7b25275 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,37 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checking all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /* IN/OUT */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 4377f3e..f891f85 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) ((((s) & 0xffff) << 16) | ((bdf) & 0xffff))
+#define PCI_SBDF2(s,b,df) ((((s) & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1


* [RFC][v2][PATCH 03/14] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 02/14] introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Add xc_reserved_device_memory_map() to libxc as a wrapper around the
XENMEM_reserved_device_memory_map hypercall. It returns RDM entries
according to its parameters: if flag == PCI_DEV_RDM_ALL, all entries
are exposed; otherwise only the RDM entry specific to the given SBDF
is exposed.
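
A rough sketch of how a caller might size its buffer, for illustration
only. It assumes the call fails with ENOBUFS and reports the required
count through max_entries when the buffer is too small, which is what
the later libxl code in this series relies on. fake_rdm_map(), struct
rdm_entry and the two regions it reports are hypothetical stand-ins so
the sketch builds without Xen headers; they are not part of libxc.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in that mirrors the fields of struct xen_reserved_device_memory. */
struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/*
 * Hypothetical replacement for xc_reserved_device_memory_map(). It reports
 * two made-up regions, writes the required count back through max_entries,
 * and fails with ENOBUFS when the caller's buffer is too small.
 */
static int fake_rdm_map(struct rdm_entry entries[], uint32_t *max_entries)
{
    static const struct rdm_entry host[] = {
        { 0xab800, 0x20 }, { 0xad000, 0x2000 },
    };
    uint32_t needed = sizeof(host) / sizeof(host[0]);
    uint32_t avail = *max_entries;

    *max_entries = needed;
    if (avail < needed) {
        errno = ENOBUFS;
        return -1;
    }
    memcpy(entries, host, sizeof(host));
    return 0;
}

/* Probe with an empty buffer, then retry with the size reported back. */
static struct rdm_entry *query_all_rdm(uint32_t *nr)
{
    struct rdm_entry *buf;

    *nr = 0;
    if (fake_rdm_map(NULL, nr) == 0)
        return NULL;                /* no reserved regions at all */
    if (errno != ENOBUFS) {
        *nr = 0;                    /* unexpected failure */
        return NULL;
    }

    buf = calloc(*nr, sizeof(*buf));
    if (buf == NULL || fake_rdm_map(buf, nr) != 0) {
        free(buf);
        *nr = 0;
        return NULL;
    }
    return buf;                     /* caller frees */
}
```

The first call is only a probe, so the buffer is never guessed at; the
caller owns the returned array and must free() it.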

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 09a7450..5f84a62 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index a7079a1..c17a5a8 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1


* [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 03/14] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-06-02 16:29   ` Wei Liu
  2015-05-22  9:35 ` [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. It currently includes two fields that hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we
should check them to resolve any conflict between RDM and RAM.

In theory an RMRR can reside in address space above 4G, but we have never
seen this in the real world. To avoid breaking the highmem layout we
do not resolve highmem conflicts. Note that a highmem RMRR is still
supported as long as there is no conflict.

In lowmem, however, RMRRs may be scattered across the whole RAM space.
Multiple RMRR entries make this worse and lead to a complicated
memory layout, which would make it hard to extend hvm_info_table{} so that
hvmloader can cope. So we are aiming for a simple solution that
avoids breaking the existing layout. When a conflict occurs:

    #1. Above a predefined boundary (default 2G)
        - move lowmem_end below the reserved region to resolve the conflict;

    #2. Below a predefined boundary (default 2G)
        - Check the strict/relaxed policy.
        The "strict" policy makes libxl fail. Note that when both policies
        are specified on a given region, "strict" is always preferred.
        The "relaxed" policy issues a warning message and also marks this
        entry INVALID to indicate that we shouldn't expose it to hvmloader.

Note that this predefined boundary can be changed with the parameter
"rdm_mem_boundary" in the .cfg file.
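
The two cases above can be sketched roughly as follows. This is a
simplified, single-region illustration and not the libxl code in this
patch: the enum names and resolve_lowmem_conflict() are made up for the
sketch, and the highmem check that the patch performs in a second pass
is left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum rdm_policy { RDM_STRICT, RDM_RELAXED };
enum rdm_result { RDM_OK, RDM_MOVED_LOWMEM, RDM_IGNORED, RDM_FAIL };

/* Overlap test between [start, start + size) and the reserved region. */
static bool overlaps(uint64_t start, uint64_t size,
                     uint64_t rdm_start, uint64_t rdm_size)
{
    return start + size > rdm_start && start < rdm_start + rdm_size;
}

/*
 * Resolve a conflict between lowmem [0, *lowmem_end) and one region.
 * Above the boundary, lowmem is shrunk and the displaced RAM is added to
 * highmem. Below the boundary, the policy decides: strict fails, relaxed
 * ignores the region (the caller would mark it INVALID).
 */
static enum rdm_result resolve_lowmem_conflict(uint64_t *lowmem_end,
                                               uint64_t *highmem_end,
                                               uint64_t boundary,
                                               uint64_t rdm_start,
                                               uint64_t rdm_size,
                                               enum rdm_policy policy)
{
    if (!overlaps(0, *lowmem_end, rdm_start, rdm_size))
        return RDM_OK;

    if (rdm_start > boundary) {
        *highmem_end += *lowmem_end - rdm_start;
        *lowmem_end = rdm_start;
        return RDM_MOVED_LOWMEM;
    }

    return policy == RDM_STRICT ? RDM_FAIL : RDM_IGNORED;
}
```

For example, with lowmem_end at 3G, a 2G boundary and a region starting
at 0xB0000000, lowmem_end drops to 0xB0000000 and highmem_end grows by
the 0x10000000 bytes that were displaced.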

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 docs/man/xl.cfg.pod.5          |  21 ++++
 tools/libxc/include/xenguest.h |   1 +
 tools/libxc/xc_hvm_build_x86.c |  25 ++--
 tools/libxl/libxl_create.c     |   2 +-
 tools/libxl/libxl_dm.c         | 253 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c        |  27 ++++-
 tools/libxl/libxl_internal.h   |  11 +-
 tools/libxl/libxl_types.idl    |   8 ++
 tools/libxl/xl_cmdimpl.c       |   3 +
 9 files changed, 337 insertions(+), 14 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 12c34c4..80e3930 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -764,6 +764,27 @@ is default.
 
 Note this would override global B<rdm> option.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set as the boundary for checking RDM conflicts.
+
+When RDM conflicts with RAM, RDM may be scattered across the whole RAM space.
+Multiple RMRR entries make this worse and lead to a complicated
+memory layout. So we are aiming for a simple solution that
+avoids breaking the existing layout. When a conflict occurs:
+
+    #1. Above a predefined boundary
+        - move lowmem_end below the reserved region to resolve the conflict;
+
+    #2. Below a predefined boundary
+        - Check the strict/relaxed policy.
+        The "strict" policy makes libxl fail. Note that when both policies
+        are specified on a given region, "strict" is always preferred.
+        The "relaxed" policy issues a warning message and also marks this
+        entry INVALID to indicate that we shouldn't expose it to hvmloader.
+
+The default is 2G.
+
 =back
 
 =back
diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 7581263..4cb7e9f 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -234,6 +234,7 @@ struct xc_hvm_firmware_module {
 };
 
 struct xc_hvm_build_args {
+    uint64_t lowmem_size;        /* All low memory size in bytes. */
     uint64_t mem_size;           /* Memory size in bytes. */
     uint64_t mem_target;         /* Memory target in bytes. */
     uint64_t mmio_size;          /* Size of the MMIO hole in bytes. */
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index e45ae4a..9a1567a 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -21,6 +21,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <zlib.h>
+#include <assert.h>
 
 #include "xg_private.h"
 #include "xc_private.h"
@@ -98,11 +99,8 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     uint8_t sum;
     int i;
 
-    if ( lowmem_end > mmio_start )
-    {
-        highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
-        lowmem_end = mmio_start;
-    }
+    if ( args->mem_size > lowmem_end )
+        highmem_end = (1ull<<32) + (args->mem_size - lowmem_end);
 
     memset(hvm_info_page, 0, PAGE_SIZE);
 
@@ -279,7 +277,7 @@ static int setup_guest(xc_interface *xch,
 
     elf_parse_binary(&elf);
     v_start = 0;
-    v_end = args->mem_size;
+    v_end = args->lowmem_size;
 
     if ( nr_pages > target_pages )
         memflags |= XENMEMF_populate_on_demand;
@@ -344,8 +342,14 @@ static int setup_guest(xc_interface *xch,
 
     for ( i = 0; i < nr_pages; i++ )
         page_array[i] = i;
-    for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
-        page_array[i] += mmio_size >> PAGE_SHIFT;
+    /*
+     * Actually v_end is args->lowmem_size, and we already adjusted
+     * this below mmio_start when we check rdm previously, so here
+     * this condition 'v_end <= mmio_start' is always true.
+     */
+    assert(v_end <= mmio_start);
+    for ( i = v_end >> PAGE_SHIFT; i < nr_pages; i++ )
+        page_array[i] += ((1ull << 32) - v_end) >> PAGE_SHIFT;
 
     /*
      * Try to claim pages for early warning of insufficient memory available.
@@ -664,9 +668,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
     if ( args.mem_target == 0 )
         args.mem_target = args.mem_size;
 
-    if ( args.mmio_size == 0 )
-        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
-
     /* An HVM guest must be initialised with at least 2MB memory. */
     if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
         return -1;
@@ -713,6 +714,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
     args.mem_size = (uint64_t)memsize << 20;
     args.mem_target = (uint64_t)target << 20;
     args.image_file_name = image_name;
+    if ( args.mmio_size == 0 )
+        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
 
     return xc_hvm_build(xch, domid, &args);
 }
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d649ead..a782860 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0c6408d..85e5317 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,259 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static struct xen_reserved_device_memory
+*xc_device_get_rdm(libxl__gc *gc,
+                   uint32_t flag,
+                   uint16_t seg,
+                   uint8_t bus,
+                   uint8_t devfn,
+                   unsigned int *nr_entries)
+{
+    struct xen_reserved_device_memory *xrdm = NULL;
+    int rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                           xrdm, nr_entries);
+
+    assert( rc <= 0 );
+    /* "0" means there are no rdm entries. */
+    if ( !rc )
+        goto out;
+
+    if ( errno == ENOBUFS )
+    {
+        if ( (xrdm = malloc(*nr_entries *
+                            sizeof(xen_reserved_device_memory_t))) == NULL )
+        {
+            LOG(ERROR, "Could not allocate RDM buffer!\n");
+            goto out;
+        }
+        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                           xrdm, nr_entries);
+        if ( rc )
+        {
+            LOG(ERROR, "Could not get reserved device memory maps.\n");
+            *nr_entries = 0;
+            free(xrdm);
+            xrdm = NULL;
+        }
+    }
+    else
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+
+ out:
+    return xrdm;
+}
+
+/*
+ * Check whether an rdm hole overlaps the specified memory range.
+ * Returns true if it does, false otherwise.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * In theory an RMRR can reside in address space above 4G, but we have never
+ * seen this in the real world. To avoid breaking the highmem layout we
+ * do not resolve highmem conflicts. Note that a highmem RMRR is still
+ * supported as long as there is no conflict.
+ *
+ * In lowmem, however, RMRRs may be scattered across the whole RAM space.
+ * Multiple RMRR entries make this worse and lead to a complicated
+ * memory layout, which would make it hard to extend hvm_info_table{} so that
+ * hvmloader can cope. So we are aiming for a simple solution that
+ * avoids breaking the existing layout. When a conflict occurs:
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below the reserved region to resolve the conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check the strict/relaxed policy.
+ * The "strict" policy makes libxl fail. Note that when both policies
+ * are specified on a given region, "strict" is always preferred.
+ * The "relaxed" policy issues a warning message and also marks this entry
+ * INVALID to indicate that we shouldn't expose it to hvmloader.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint64_t rdm_start, rdm_size, highmem_end = (1ULL << 32);
+    uint32_t type = d_config->b_info.rdm.type;
+    uint16_t seg;
+    uint8_t bus, devfn;
+
+    /* Fix highmem. */
+    highmem_end += (args->mem_size - args->lowmem_size);
+
+    /* Might not expose rdm. */
+    if (type == LIBXL_RDM_RESERVE_TYPE_NONE)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
+        unsigned int nr_entries = 0;
+
+        /* Collect all rdm info if exist. */
+        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                 0, 0, 0, &nr_entries);
+        if (!nr_entries)
+            return 0;
+
+        d_config->num_rdms = nr_entries;
+        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+        for (i = 0; i < d_config->num_rdms; i++) {
+            d_config->rdms[i].start =
+                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[i].size =
+                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
+        }
+
+        free(xrdm);
+    } else
+        d_config->num_rdms = 0;
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+
+        bool new = true;
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                 seg, bus, devfn, &nr_entries);
+        /* No RDM is associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RMRRs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RMRR entry
+         *
+         * Different policies may be configured on the same RMRR due to the
+         * two cases above. We keep it simple and always favor the stricter one.
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start ==
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
+             {
+                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
+                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+            /* This is a new entry. */
+            d_config->rdms[d_config->num_rdms - 1].start =
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].size =
+                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].flag = d_config->pcidevs[i].rdm_reserve;
+        }
+        free(xrdm);
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RMRR in highmem area (>4G)
+     *  - RMRR lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RMRR entry, an error will be
+     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
+     * specified, this conflict is treated just as a warning, but we mark this
+     * RMRR entry as INVALID to indicate that this entry shouldn't be exposed
+     * to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_size, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_size - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_size = rdm_start;
+        }
+    }
+
+    /*
+     * Finally we can take same policy to check lowmem(< 2G) and
+     * highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_size,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%"PRIx64".\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%"PRIx64".\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return -1;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a0c9850..84d5465 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,12 +914,14 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    uint64_t rdm_mem_boundary, mmio_start;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -928,6 +930,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
      * Do all this in one step here...
      */
     args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
+    args.lowmem_size = min((uint64_t)(1ULL << 32), args.mem_size);
     args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
     args.claim_enabled = libxl_defbool_val(info->claim_mode);
     if (info->u.hvm.mmio_hole_memkb) {
@@ -937,6 +940,28 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         if (max_ram_below_4g < HVM_BELOW_4G_MMIO_START)
             args.mmio_size = info->u.hvm.mmio_hole_memkb << 10;
     }
+
+    if (args.mmio_size == 0)
+        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
+    mmio_start = (1ull << 32) - args.mmio_size;
+
+    if (args.lowmem_size > mmio_start)
+        args.lowmem_size = mmio_start;
+
+    /*
+     * We'd like to set a memory boundary to determine if we need to check
+     * any overlap with reserved device memory.
+     */
+    rdm_mem_boundary = 0x80000000;
+    if (info->rdm_mem_boundary_memkb)
+        rdm_mem_boundary = (uint64_t)info->rdm_mem_boundary_memkb << 10;
+    ret = libxl__domain_device_construct_rdm(gc, d_config, rdm_mem_boundary,
+                                             &args);
+    if (ret) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (libxl__domain_firmware(gc, info, &args)) {
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3a2f6ec..d00e210 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1011,7 +1011,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1514,6 +1514,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function will fix reserved device memory conflict
+ * according to user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_boundary,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f1acd13..aff5f96 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
     ("shadow_memkb",    MemKB),
+    ("rdm_mem_boundary_memkb",    MemKB),
     ("rtc_timeoffset",  uint32),
     ("exec_ssidref",    uint32),
     ("exec_ssid_label", string),
@@ -553,6 +554,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_reserve",   libxl_rdm_reserve_flag),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("flag", libxl_rdm_reserve_flag),
+    ])
+
 libxl_device_vtpm = Struct("device_vtpm", [
     ("backend_domid",    libxl_domid),
     ("backend_domname",  string),
@@ -579,6 +586,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
     ("vtpms", Array(libxl_device_vtpm, "num_vtpms")),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5b8cf2b..2dfa106 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
     if (!xlu_cfg_get_long (config, "videoram", &l, 0))
         b_info->video_memkb = l * 1024;
 
+    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+        b_info->rdm_mem_boundary_memkb = l * 1024;
+
     if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
         b_info->event_channels = l;
 
-- 
1.9.1


* [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (3 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-28 12:27   ` Jan Beulich
  2015-05-22  9:35 ` [RFC][v2][PATCH 06/14] xen:vtd: create RMRR mapping Tiejun Chen
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

We create this sort of identity mapping as follows:

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by a 1:1 mapping, we do nothing. We fail in all
other cases.
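
The three cases can be sketched outside the hypervisor as a small
stand-alone model. The slot types, the table layout and the name
identity_map() are hypothetical stand-ins, not Xen's p2m definitions,
and the sketch leaves out the access-permission comparison that the
patch also performs:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for p2m entry types; not the real Xen types. */
enum slot_type { SLOT_INVALID, SLOT_MMIO_DIRECT, SLOT_RAM };

struct slot {
    enum slot_type type;  /* what currently backs this gfn */
    uint64_t mfn;         /* machine frame it points at, if any */
};

/*
 * Try to make table[gfn] an identity (gfn == mfn) mapping.
 * Returns 0 on success, -EBUSY if the gfn is used by something else.
 */
static int identity_map(struct slot *table, uint64_t nr, uint64_t gfn)
{
    assert(gfn < nr);

    if (table[gfn].type == SLOT_INVALID) {
        /* Case 1: unoccupied, so install the 1:1 mapping. */
        table[gfn].type = SLOT_MMIO_DIRECT;
        table[gfn].mfn = gfn;
        return 0;
    }

    if (table[gfn].type == SLOT_MMIO_DIRECT && table[gfn].mfn == gfn)
        return 0;   /* Case 2: already 1:1, nothing to do. */

    return -EBUSY;  /* Case 3: occupied by a different mapping. */
}
```

Defaulting to -EBUSY and only clearing it on the two accepted cases
mirrors how the function in this patch initialises ret.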

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c     | 30 ++++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  4 ++++
 2 files changed, 34 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1fd1194..c674201 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -898,6 +898,36 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret = -EBUSY;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        printk(XENLOG_G_WARNING
+               "Cannot identity map d%d:%lx, already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..95b6266 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -543,6 +543,10 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1


* [RFC][v2][PATCH 06/14] xen:vtd: create RMRR mapping
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (4 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code fails to set up
the correct mapping when VT-d shares the EPT page table, which causes
problems when assigning devices (e.g. a GPU) with an RMRR reported. So
instead, this patch sets up the identity mapping in the p2m layer,
regardless of whether EPT is shared or not. We still keep creating the
VT-d table.
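
For illustration, the per-page walk over one RMRR can be modelled as
below. The pfn arithmetic follows rmrr_identity_mapping(); the callback
type and the names map_rmrr_range() and count_page() are hypothetical,
and the end address is assumed to be inclusive:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12
#define PAGE_SIZE_4K  (UINT64_C(1) << PAGE_SHIFT_4K)
/* Round an address up to the next 4K boundary. */
#define PAGE_ALIGN_4K(a) (((a) + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1))

/* Hypothetical callback standing in for the per-page mapping call. */
typedef int (*map_page_fn)(uint64_t pfn, void *ctx);

/* Map every 4K page of [base, end] 1:1, stopping at the first error. */
static int map_rmrr_range(uint64_t base, uint64_t end,
                          map_page_fn map, void *ctx)
{
    uint64_t pfn = base >> PAGE_SHIFT_4K;
    uint64_t end_pfn = PAGE_ALIGN_4K(end) >> PAGE_SHIFT_4K;

    assert(base <= end);

    for ( ; pfn < end_pfn; pfn++) {
        int err = map(pfn, ctx);

        if (err)
            return err;
    }

    return 0;
}

/* Example callback: just count how many pages would be mapped. */
static int count_page(uint64_t pfn, void *ctx)
{
    (void)pfn;
    ++*(unsigned int *)ctx;
    return 0;
}
```

With a 64K region such as 0xad000000-0xad00ffff the callback runs once
for each of the 16 pages in the range.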

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c               | 5 +++++
 xen/drivers/passthrough/vtd/iommu.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c674201..3574521 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -925,6 +925,11 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
 
     gfn_unlock(p2m, gfn, 0);
 
+    if ( ret == 0 )
+    {
+        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
+    }
+
     return ret;
 }
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6a37624..31ce1af 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1856,8 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
-- 
1.9.1


* [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (5 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 06/14] xen:vtd: create RMRR mapping Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22 10:33   ` Julien Grall
  2015-05-22  9:35 ` [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() " Tiejun Chen
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

This patch extends the existing hypercall to support rdm reservation policy.
We return an error or just print a warning message depending on whether
the policy is 'strict' or 'relaxed'. In some cases, e.g. adding a device to
hwdomain or removing a device from a user domain, 'relaxed' is sufficient
since this is always safe for hwdomain.
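
A minimal sketch of how the policy decides the outcome once a conflict
has been detected. The three constants are the ones this patch adds to
domctl.h; the helper name rdm_conflict_result() and the warning text
are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Values as added to xen/include/public/domctl.h by this patch. */
#define XEN_DOMCTL_DEV_NO_RDM      0
#define XEN_DOMCTL_DEV_RDM_RELAXED 1
#define XEN_DOMCTL_DEV_RDM_STRICT  2

/* Hypothetical helper: turn a detected RDM conflict into a return code. */
static int rdm_conflict_result(uint32_t flag)
{
    assert(flag <= XEN_DOMCTL_DEV_RDM_STRICT);

    if (flag == XEN_DOMCTL_DEV_RDM_RELAXED) {
        /* Relaxed: warn, but let the assignment carry on. */
        fprintf(stderr, "warning: RDM conflict ignored\n");
        return 0;
    }

    /* Strict, or no policy given: the conflict is fatal. */
    return -EBUSY;
}
```

Only 'relaxed' downgrades the error, so any other flag value keeps the
-EBUSY result, as in set_identity_p2m_entry().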

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c                       |  8 +++++++-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       |  3 ++-
 xen/drivers/passthrough/pci.c               |  9 +++++----
 xen/drivers/passthrough/vtd/iommu.c         | 20 ++++++++++++--------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  5 +++++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3574521..89473b9 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, u32 flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -921,6 +921,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         printk(XENLOG_G_WARNING
                "Cannot identity map d%d:%lx, already mapped to %lx.\n",
                d->domain_id, gfn, mfn_x(mfn));
+
+        if ( flag == XEN_DOMCTL_DEV_RDM_RELAXED )
+        {
+            ret = 0;
+            printk(XENLOG_G_WARNING "Some devices may not work properly.\n");
+        }
     }
 
     gfn_unlock(p2m, gfn, 0);
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..d4ff7f0 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+                                         XEN_DOMCTL_DEV_NO_RDM);
 
     if ( rc )
         goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 862e20f..c06f038 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1338,7 +1338,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1374,7 +1374,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1382,7 +1382,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1499,6 +1499,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag = XEN_DOMCTL_DEV_RDM_RELAXED;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1582,7 +1583,7 @@ int iommu_do_pci_domctl(
         devfn = PCI_DEVFN2(machine_sbdf);
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 31ce1af..d7c9e1c 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1808,7 +1808,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1856,7 +1857,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1899,7 +1900,8 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
+                                        XEN_DOMCTL_DEV_RDM_RELAXED);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        rmrr_identity_mapping(pdev->domain, 0, rmrr,
+                              XEN_DOMCTL_DEV_RDM_RELAXED);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2101,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        ret = rmrr_identity_mapping(d, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2244,8 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                ret = rmrr_identity_mapping(source, 0, rmrr,
+                                            XEN_DOMCTL_DEV_RDM_RELAXED);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2269,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2298,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 95b6266..a80b4f8 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, u32 flag);
 
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 0c0ea4a..203c80e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_NO_RDM           0
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+#define XEN_DOMCTL_DEV_RDM_STRICT       2
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1


* [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() to support rdm reservation policy
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (6 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-06-02 16:36   ` Wei Liu
  2015-05-22  9:35 ` [RFC][v2][PATCH 09/14] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.
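
As a sketch only, a policy string given by the user could be turned
into the flag handed to xc_assign_device() as follows. The helper name
parse_rdm_policy() is hypothetical, and so is the choice to let the
caller supply the default that applies when no policy string is given:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as added to xen/include/public/domctl.h by this series. */
#define XEN_DOMCTL_DEV_RDM_RELAXED 1
#define XEN_DOMCTL_DEV_RDM_STRICT  2

/*
 * Hypothetical helper: translate "strict"/"relaxed" into a flag.
 * A NULL string selects dflt. Returns 0 on success, -1 on bad input.
 */
static int parse_rdm_policy(const char *s, uint32_t dflt, uint32_t *flag)
{
    assert(flag != NULL);

    if (s == NULL) {
        *flag = dflt;   /* no policy given on the command line */
        return 0;
    }
    if (!strcmp(s, "strict")) {
        *flag = XEN_DOMCTL_DEV_RDM_STRICT;
        return 0;
    }
    if (!strcmp(s, "relaxed")) {
        *flag = XEN_DOMCTL_DEV_RDM_RELAXED;
        return 0;
    }

    return -1;          /* unknown policy string */
}
```

Checking for NULL before calling strcmp() keeps the optional argument
optional when the policy is left out.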

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  4 +++-
 tools/libxl/libxl_pci.c             | 11 ++++++++++-
 tools/libxl/xl_cmdimpl.c            | 23 +++++++++++++++++++----
 tools/libxl/xl_cmdtable.c           |  2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
 xen/drivers/passthrough/pci.c       |  3 ++-
 8 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 5f84a62..2a447b9 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index c17a5a8..9761e5a 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1704,7 +1704,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1712,6 +1713,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 07e84f2..ac70edc 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,15 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
+            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            flag = XEN_DOMCTL_DEV_RDM_STRICT;
+        } else {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+            return ERROR_FAIL;
+        }
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 2dfa106..0816186 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3341,7 +3341,8 @@ int main_pcidetach(int argc, char **argv)
     pcidetach(domid, bdf, force);
     return 0;
 }
-static void pciattach(uint32_t domid, const char *bdf, const char *vs)
+static void pciattach(uint32_t domid, const char *bdf, const char *vs,
+                      uint32_t flag)
 {
     libxl_device_pci pcidev;
     XLU_Config *config;
@@ -3351,6 +3352,7 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
     config = xlu_cfg_init(stderr, "command line");
     if (!config) { perror("xlu_cfg_inig"); exit(-1); }
 
+    pcidev.rdm_reserve = flag;
     if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
         fprintf(stderr, "pci-attach: malformed BDF specification \"%s\"\n", bdf);
         exit(2);
@@ -3363,9 +3365,9 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
 
 int main_pciattach(int argc, char **argv)
 {
-    uint32_t domid;
+    uint32_t domid, flag;
     int opt;
-    const char *bdf = NULL, *vs = NULL;
+    const char *bdf = NULL, *vs = NULL, *rdm_policy = NULL;
 
     SWITCH_FOREACH_OPT(opt, "", NULL, "pci-attach", 2) {
         /* No options */
@@ -3377,7 +3379,20 @@ int main_pciattach(int argc, char **argv)
     if (optind + 1 < argc)
         vs = argv[optind + 2];
 
-    pciattach(domid, bdf, vs);
+    if (optind + 2 < argc) {
+        rdm_policy = argv[optind + 3];
+    }
+    if (!strcmp(rdm_policy, "strict")) {
+        flag = LIBXL_RDM_RESERVE_FLAG_STRICT;
+    } else if (!strcmp(rdm_policy, "relaxed")) {
+        flag = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+    } else {
+        fprintf(stderr, "%s is an invalid rdm policy: 'strict'|'relaxed'\n",
+                rdm_policy);
+        exit(2);
+    }
+
+    pciattach(domid, bdf, vs, flag);
     return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7f4759b..06cc452 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
     { "pci-attach",
       &main_pciattach, 0, 1,
       "Insert a new pass-through pci device",
-      "<Domain> <BDF> [Virtual Slot]",
+      "<Domain> <BDF> [Virtual Slot] <policy to reserve rdm['strict'|'relaxed']>",
     },
     { "pci-detach",
       &main_pcidetach, 0, 1,
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..317bf75 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_NO_RDM,
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+    XEN_DOMCTL_DEV_RDM_STRICT,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index fbd93db..86b8925 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,16 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',') + 1;
+    if ( token ) {
+        *flag = token_value(token);
+        *str = token + 1;
+    }
+    else
+    {
+        *flag = XEN_DOMCTL_DEV_RDM_STRICT;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +629,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +662,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +695,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c06f038..f3088a3 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1499,7 +1499,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
-    u32 flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1581,6 +1581,7 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
 
         ret = device_assigned(seg, bus, devfn) ?:
               assign_device(d, seg, bus, devfn, flag);
-- 
1.9.1


* [RFC][v2][PATCH 09/14] xen: enable XENMEM_memory_map in hvm
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (7 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() " Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map Tiejun Chen
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

This patch enables XENMEM_memory_map for HVM guests so that we can use
it to set up the e820 mappings.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 689e402..0dedd3b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4705,7 +4705,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4781,7 +4780,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 034de22..d6fa080 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4715,12 +4715,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (8 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 09/14] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22 10:25   ` Julien Grall
  2015-06-02 16:42   ` Wei Liu
  2015-05-22  9:35 ` [RFC][v2][PATCH 11/14] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Construct a basic guest e820 table via XENMEM_set_memory_map. The
table includes lowmem, highmem and any RDMs that exist. hvmloader
needs this information later.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxl/libxl_dom.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 84d5465..cc4b1a6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -913,6 +913,87 @@ out:
     return rc;
 }
 
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: The regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+static int libxl__domain_construct_memmap(libxl__gc *gc,
+                                          libxl_domain_config *d_config,
+                                          uint32_t domid,
+                                          struct xc_hvm_build_args *args)
+{
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    uint64_t highmem_end = 0, highmem_size = args->mem_size - args->lowmem_size;
+    struct e820entry *e820 = NULL;
+
+    /* Add all rdm entries. */
+    e820_entries += d_config->num_rdms;
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+    {
+        highmem_end = (1ull<<32) + highmem_size;
+        e820_entries++;
+    }
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        return -1;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_size - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        /*
+         * We should drop this kind of rdm entry.
+         */
+        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(ctx->xch, domid, e820, e820_entries) != 0)
+        return -1;
+
+    return 0;
+}
+
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
               libxl_domain_config *d_config,
               libxl__domain_build_state *state)
@@ -1016,6 +1097,12 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         ret = set_vnuma_info(gc, domid, info, state);
         if (ret) goto out;
     }
+
+    if (libxl__domain_construct_memmap(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain rdm memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
-- 
1.9.1
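
For readers following the table construction in this patch, here is a
minimal standalone sketch of the same layout: lowmem starting at 1M, then
the valid RDMs, then an optional highmem entry at 4G. It is not the libxl
code. All sketch_* names are made up for illustration, and the `valid`
field only stands in for the LIBXL_RDM_RESERVE_FLAG_INVALID test. One
difference worth noting: the hunk passes e820_entries to
xc_domain_set_memory_map() even when invalid RDMs were skipped, and does
not advance nr after the highmem entry, whereas the sketch returns the
number of entries it actually wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_E820_RAM       1u
#define SKETCH_E820_RESERVED  2u
#define SKETCH_LOW_MEM_START  0x100000ull   /* 1 MiB, as in the patch */
#define SKETCH_4G             (1ull << 32)

struct sketch_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical stand-in for one libxl RDM record. */
struct sketch_rdm {
    uint64_t start;
    uint64_t size;
    int valid;   /* 0 plays the role of LIBXL_RDM_RESERVE_FLAG_INVALID */
};

/*
 * Fill 'map' (capacity 'max') with lowmem, the valid RDMs and an optional
 * highmem entry. Returns the number of entries written, or -1 if they do
 * not fit.
 */
static int sketch_build_memmap(struct sketch_e820entry *map, size_t max,
                               uint64_t mem_size, uint64_t lowmem_size,
                               const struct sketch_rdm *rdms, size_t num_rdms)
{
    size_t nr = 0, i;
    uint64_t highmem_size;

    assert(lowmem_size > SKETCH_LOW_MEM_START && mem_size >= lowmem_size);
    highmem_size = mem_size - lowmem_size;

    /* Low memory: always present. */
    if (nr == max)
        return -1;
    map[nr].addr = SKETCH_LOW_MEM_START;
    map[nr].size = lowmem_size - SKETCH_LOW_MEM_START;
    map[nr].type = SKETCH_E820_RAM;
    nr++;

    /* RDMs: skip the ones marked invalid. */
    for (i = 0; i < num_rdms; i++) {
        if (!rdms[i].valid)
            continue;
        if (nr == max)
            return -1;
        map[nr].addr = rdms[i].start;
        map[nr].size = rdms[i].size;
        map[nr].type = SKETCH_E820_RESERVED;
        nr++;
    }

    /* High memory: only when some RAM lives above 4G. */
    if (highmem_size) {
        if (nr == max)
            return -1;
        map[nr].addr = SKETCH_4G;
        map[nr].size = highmem_size;
        map[nr].type = SKETCH_E820_RAM;
        nr++;
    }

    return (int)nr;
}
```

Returning the written count keeps the caller from handing uninitialised
trailing entries to whatever consumes the table.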

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC][v2][PATCH 11/14] hvmloader: get guest memory map into memory_map[]
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (9 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 12/14] hvmloader/pci: skip reserved ranges Tiejun Chen
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Get the guest memory map layout by calling XENMEM_memory_map and
save it in the global variable memory_map. It should include the
lowmem range, the rdm ranges and the highmem range. Note that the
rdm and highmem ranges may not exist in some cases.

We also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
hvmloader fails with BUG() on a conflict, since such a conflict
should be rare in the real world.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c | 36 ++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.c      | 26 ++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 11 +++++++++++
 4 files changed, 80 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..33da5b5 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -107,6 +107,8 @@ asm (
     "    .text                       \n"
     );
 
+struct e820map memory_map;
+
 unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS;
 
 static void init_hypercalls(void)
@@ -199,6 +201,38 @@ static void apic_setup(void)
     ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
 }
 
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc )
+    {
+        printf("Failed to get guest memory map.\n");
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED )
+        {
+            if ( check_overlap(alloc_addr, alloc_size,
+                               memory_map.map[i].addr,
+                               memory_map.map[i].size) )
+            {
+                printf("RDM conflicts Memory allocation.\n");
+                BUG();
+            }
+        }
+    }
+}
+
 struct bios_info {
     const char *key;
     const struct bios_config *bios;
@@ -262,6 +296,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether the specified memory range overlaps the reserved range.
+ * Returns true if they overlap, false otherwise.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index a70e4aa..70e19c4 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,12 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1
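
As an illustration of the check used by memory_map_setup(), the sketch
below repeats the check_overlap() formula, which treats both ranges as
half-open intervals [start, start + size), and wraps it in a small scan
over reserved entries. The sketch_* names and the struct are hypothetical
and only mirror the shapes in the patch. The sketch also assumes that
start + size does not wrap around, which it asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_E820_RESERVED 2u

struct sketch_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Same formula as check_overlap() in the patch: half-open intervals. */
static bool sketch_check_overlap(uint64_t start, uint64_t size,
                                 uint64_t rsv_start, uint64_t rsv_size)
{
    /* Assumption of this sketch: neither sum wraps around. */
    assert(start + size >= start && rsv_start + rsv_size >= rsv_start);

    return (start + size > rsv_start) &&
           (start < rsv_start + rsv_size);
}

/*
 * Return the index of the first reserved entry that overlaps
 * [start, start + size), or -1 if there is none.
 */
static int sketch_find_conflict(const struct sketch_e820entry *map,
                                unsigned int nr,
                                uint64_t start, uint64_t size)
{
    unsigned int i;

    for (i = 0; i < nr; i++) {
        if (map[i].type == SKETCH_E820_RESERVED &&
            sketch_check_overlap(start, size, map[i].addr, map[i].size))
            return (int)i;
    }

    return -1;
}
```

With half-open intervals, two ranges that merely touch (one ends exactly
where the other starts) are not reported as a conflict.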

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC][v2][PATCH 12/14] hvmloader/pci: skip reserved ranges
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (10 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 11/14] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 13/14] hvmloader/e820: construct guest e820 table Tiejun Chen
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

When allocating MMIO addresses for PCI BARs, we need to make
sure they don't overlap with reserved regions.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/pci.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..98af568 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -59,8 +59,8 @@ void pci_setup(void)
         uint32_t bar_reg;
         uint64_t bar_sz;
     } *bars = (struct bars *)scratch_start;
-    unsigned int i, nr_bars = 0;
-    uint64_t mmio_hole_size = 0;
+    unsigned int i, j, nr_bars = 0;
+    uint64_t mmio_hole_size = 0, reserved_end, max_bar_sz = 0;
 
     const char *s;
     /*
@@ -226,6 +226,8 @@ void pci_setup(void)
             bars[i].devfn   = devfn;
             bars[i].bar_reg = bar_reg;
             bars[i].bar_sz  = bar_sz;
+            if ( bar_sz > max_bar_sz )
+                max_bar_sz = bar_sz;
 
             if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
                   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
@@ -301,6 +303,21 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }
 
+    /* Relocate PCI memory that overlaps reserved space, like RDM. */
+    for ( j = 0; j < memory_map.nr_map ; j++ )
+    {
+        if ( memory_map.map[j].type != E820_RAM )
+        {
+            reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+            if ( check_overlap(pci_mem_start, pci_mem_end,
+                               memory_map.map[j].addr,
+                               memory_map.map[j].size) )
+                pci_mem_start -= memory_map.map[j].size >> PAGE_SHIFT;
+                pci_mem_start = (pci_mem_start + max_bar_sz - 1) &
+                                    ~(uint64_t)(max_bar_sz - 1);
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"
@@ -407,8 +424,23 @@ void pci_setup(void)
         }
 
         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_mmio:
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+                if ( check_overlap(base, bar_sz,
+                                   memory_map.map[j].addr,
+                                   memory_map.map[j].size) )
+                {
+                    base = (reserved_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+                    goto reallocate_mmio;
+                }
+            }
+        }
         base += bar_sz;
 
         if ( (base < resource->base) || (base > resource->max) )
-- 
1.9.1
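
The second hunk's idea (push a BAR past any reserved range it collides
with, keeping the BAR's natural alignment) can be sketched in isolation as
below. This is only an illustration under stated assumptions: the sketch_*
names are hypothetical, BAR sizes are assumed to be powers of two, and the
final base is computed before anything is written to the BAR. Two things
in the hunks look worth a second look when comparing: the `reallocate_mmio`
label sits before `bar_data |= (uint32_t)base`, so a retry appears to OR
the new base on top of the old one, and in the first hunk the `if` has no
braces while check_overlap() is given pci_mem_end where a size is expected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_range {
    uint64_t addr;
    uint64_t size;
};

/* Round v up to a multiple of align (align must be a power of two). */
static uint64_t sketch_align_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/*
 * Return the lowest bar_sz-aligned address >= base such that
 * [addr, addr + bar_sz) overlaps none of the reserved ranges.
 */
static uint64_t sketch_place_bar(uint64_t base, uint64_t bar_sz,
                                 const struct sketch_range *rsv,
                                 unsigned int nr)
{
    bool moved;
    unsigned int j;

    assert(bar_sz != 0 && (bar_sz & (bar_sz - 1)) == 0);
    base = sketch_align_up(base, bar_sz);

    /* Rescan until a full pass finds no overlap; base only ever grows. */
    do {
        moved = false;
        for (j = 0; j < nr; j++) {
            uint64_t rsv_end = rsv[j].addr + rsv[j].size;

            if (base + bar_sz > rsv[j].addr && base < rsv_end) {
                base = sketch_align_up(rsv_end, bar_sz);
                moved = true;
            }
        }
    } while (moved);

    return base;
}
```

The caller would still have to check the result against the end of the
resource window, as the existing code does after `base += bar_sz`.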

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC][v2][PATCH 13/14] hvmloader/e820: construct guest e820 table
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (11 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 12/14] hvmloader/pci: skip reserved ranges Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:35 ` [RFC][v2][PATCH 14/14] xen/vtd: enable USB device assignment Tiejun Chen
  2015-05-22  9:46 ` [RFC][v2][PATCH 00/14] Fix RMRR Jan Beulich
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Use that memory map to build the final e820 table. The
entries may then need to be reordered by address.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/e820.c | 62 +++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..c39b0aa 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -73,7 +73,8 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint64_t low_mem_pgend = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -117,13 +118,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -159,16 +153,56 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
-
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct the remaining entries according to memory_map.
+     *
+     * Note memory_map includes,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. RDM region if it exists
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
     {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        e820[nr] = memory_map.map[i];
         nr++;
     }
 
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_pgend < (2u << 20));
+    /*
+     * We may need to adjust real lowmem end since we may
+     * populate RAM to get enough MMIO previously.
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_pgend > e820[i].addr && low_mem_pgend < end )
+            e820[i].size = low_mem_pgend - e820[i].addr;
+    }
+
+    /* Finally we need to reorder all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1
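
The last two steps of build_e820_table() in this patch (trim the RAM entry
that contains the real lowmem end, then order all entries by address) can
be sketched on their own as follows. The sketch_* names are hypothetical
and the code is only an illustration, not the hvmloader implementation.
Note that the hunk's trim loop indexes e820[i] for i below
memory_map.nr_map, which appears to walk the first entries of the table
rather than the ones just copied from memory_map; the sketch simply walks
the whole table.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_E820_RAM 1u

struct sketch_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Shrink the RAM entry that strictly contains lowmem_end so it ends there. */
static void sketch_trim_lowmem(struct sketch_e820entry *e, unsigned int nr,
                               uint64_t lowmem_end)
{
    unsigned int i;

    for (i = 0; i < nr; i++) {
        uint64_t end = e[i].addr + e[i].size;

        if (e[i].type == SKETCH_E820_RAM &&
            lowmem_end > e[i].addr && lowmem_end < end)
            e[i].size = lowmem_end - e[i].addr;
    }
}

/* Order entries by start address with the same exchange sort as the patch. */
static void sketch_sort_e820(struct sketch_e820entry *e, unsigned int nr)
{
    unsigned int i, j;

    /* "j + 1 < nr" avoids the unsigned wrap of "nr - 1" when nr is 0. */
    for (j = 0; j + 1 < nr; j++) {
        for (i = j + 1; i < nr; i++) {
            if (e[j].addr > e[i].addr) {
                struct sketch_e820entry tmp = e[j];

                e[j] = e[i];
                e[i] = tmp;
            }
        }
    }
}
```

An exchange sort is quadratic, which is fine for a table bounded by
E820MAX entries.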

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC][v2][PATCH 14/14] xen/vtd: enable USB device assignment
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (12 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 13/14] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-05-22  9:35 ` Tiejun Chen
  2015-05-22  9:46 ` [RFC][v2][PATCH 00/14] Fix RMRR Jan Beulich
  14 siblings, 0 replies; 43+ messages in thread
From: Tiejun Chen @ 2015-05-22  9:35 UTC (permalink / raw)
  To: JBeulich, tim, konrad.wilk, andrew.cooper3, kevin.tian,
	yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Before the RMRR mechanism was refined, a USB RMRR could conflict with the
guest BIOS region, so USB RMRRs were always ignored. This workaround can now
go away since we enable pci_force to check/reserve RMRRs.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d7c9e1c..d3233b8 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2283,13 +2281,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 00/14] Fix RMRR
  2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
                   ` (13 preceding siblings ...)
  2015-05-22  9:35 ` [RFC][v2][PATCH 14/14] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-05-22  9:46 ` Jan Beulich
  2015-05-28  5:48   ` Chen, Tiejun
  14 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2015-05-22  9:46 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

>>> On 22.05.15 at 11:35, <tiejun.chen@intel.com> wrote:
> As you know all devices are owned by Dom0 firstly before we create any
> DomU, right? Do we allow Dom0 still own a group device while assign another
> device in the same group?

Clearly not, or - just like anything else putting the security of a system
at risk - only at explicit host admin request.

Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map
  2015-05-22  9:35 ` [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map Tiejun Chen
@ 2015-05-22 10:25   ` Julien Grall
  2015-05-25  2:00     ` Chen, Tiejun
  2015-06-02 16:42   ` Wei Liu
  1 sibling, 1 reply; 43+ messages in thread
From: Julien Grall @ 2015-05-22 10:25 UTC (permalink / raw)
  To: Tiejun Chen, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Hi,

On 22/05/2015 10:35, Tiejun Chen wrote:
> Construct a basic guest e820 table via XENMEM_set_memory_map. The
> table includes lowmem, highmem and any RDMs that exist. hvmloader
> needs this information later.
>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>   tools/libxl/libxl_dom.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 87 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 84d5465..cc4b1a6 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -913,6 +913,87 @@ out:
>       return rc;
>   }
>
> +/*
> + * Here we're just trying to set these kinds of e820 mappings:
> + *
> + * #1. Low memory region
> + *
> + * Low RAM starts at least from 1M to make sure all standard regions
> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> + * have enough space.
> + * Note: The regions below 1M are still constructed with multiple
> + * e820 entries by hvmloader. At this point we don't change anything.
> + *
> + * #2. RDM region if it exists
> + *
> + * #3. High memory region if it exists
> + *
> + * Note: these regions are not overlapping since we already check
> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
> + */
> +#define GUEST_LOW_MEM_START_DEFAULT 0x100000
> +static int libxl__domain_construct_memmap(libxl__gc *gc,
> +                                          libxl_domain_config *d_config,
> +                                          uint32_t domid,
> +                                          struct xc_hvm_build_args *args)

The code within this function is x86 specific. Shouldn't it be moved to
libxl_x86.c?

> +{
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    unsigned int nr = 0, i;
> +    /* We always own at least one lowmem entry. */
> +    unsigned int e820_entries = 1;
> +    uint64_t highmem_end = 0, highmem_size = args->mem_size - args->lowmem_size;
> +    struct e820entry *e820 = NULL;
> +
> +    /* Add all rdm entries. */
> +    e820_entries += d_config->num_rdms;
> +
> +    /* If we should have a highmem range. */
> +    if (highmem_size)
> +    {
> +        highmem_end = (1ull<<32) + highmem_size;
> +        e820_entries++;
> +    }
> +
> +    if (e820_entries >= E820MAX) {
> +        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
> +        return -1;
> +    }
> +
> +    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
> +
> +    /* Low memory */
> +    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
> +    e820[nr].size = args->lowmem_size - GUEST_LOW_MEM_START_DEFAULT;
> +    e820[nr].type = E820_RAM;
> +    nr++;
> +
> +    /* RDM mapping */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        /*
> +         * We should drop this kind of rdm entry.
> +         */
> +        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
> +            continue;
> +
> +        e820[nr].addr = d_config->rdms[i].start;
> +        e820[nr].size = d_config->rdms[i].size;
> +        e820[nr].type = E820_RESERVED;
> +        nr++;
> +    }
> +
> +    /* High memory */
> +    if (highmem_size) {
> +        e820[nr].addr = ((uint64_t)1 << 32);
> +        e820[nr].size = highmem_size;
> +        e820[nr].type = E820_RAM;
> +    }
> +
> +    if (xc_domain_set_memory_map(ctx->xch, domid, e820, e820_entries) != 0)
> +        return -1;
> +
> +    return 0;
> +}
> +
>   int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>                 libxl_domain_config *d_config,
>                 libxl__domain_build_state *state)
> @@ -1016,6 +1097,12 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>           ret = set_vnuma_info(gc, domid, info, state);
>           if (ret) goto out;
>       }
> +
> +    if (libxl__domain_construct_memmap(gc, d_config, domid, &args)) {
> +        LOG(ERROR, "setting domain rdm memory map failed");
> +        goto out;
> +    }
> +
>       ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
>                                  &state->store_mfn, state->console_port,
>                                  &state->console_mfn, state->store_domid,
>

regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-22  9:35 ` [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-05-22 10:33   ` Julien Grall
  2015-05-25  2:09     ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2015-05-22 10:33 UTC (permalink / raw)
  To: Tiejun Chen, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

Hi,

On 22/05/2015 10:35, Tiejun Chen wrote:
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 0c0ea4a..203c80e 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
>               XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>           } dt;
>       } u;
> +    /* IN */
> +#define XEN_DOMCTL_DEV_NO_RDM           0
> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
> +    uint32_t  flag;   /* flag of assigned device */

You don't plumb this value for DT, either in the toolstack (see
xc_assign_dt_device) or in Xen. Please add a comment saying it's only used
by PCI and/or that the value should always be XEN_DOMCTL_DEV_NO_RDM for DT.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map
  2015-05-22 10:25   ` Julien Grall
@ 2015-05-25  2:00     ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-25  2:00 UTC (permalink / raw)
  To: Julien Grall, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

On 2015/5/22 18:25, Julien Grall wrote:
> Hi,
>
> On 22/05/2015 10:35, Tiejun Chen wrote:
>> Construct a basic guest e820 table via XENMEM_set_memory_map. The
>> table includes lowmem, highmem and any RDMs that exist. hvmloader
>> needs this information later.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxl/libxl_dom.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 87 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 84d5465..cc4b1a6 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -913,6 +913,87 @@ out:
>>       return rc;
>>   }
>>
>> +/*
>> + * Here we're just trying to set these kinds of e820 mappings:
>> + *
>> + * #1. Low memory region
>> + *
>> + * Low RAM starts at least from 1M to make sure all standard regions
>> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
>> + * have enough space.
>> + * Note: The regions below 1M are still constructed with multiple
>> + * e820 entries by hvmloader. At this point we don't change anything.
>> + *
>> + * #2. RDM region if it exists
>> + *
>> + * #3. High memory region if it exists
>> + *
>> + * Note: these regions are not overlapping since we already check
>> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
>> + */
>> +#define GUEST_LOW_MEM_START_DEFAULT 0x100000
>> +static int libxl__domain_construct_memmap(libxl__gc *gc,
>> +                                          libxl_domain_config *d_config,
>> +                                          uint32_t domid,
>> +                                          struct xc_hvm_build_args *args)
>
> The code within this function is x86 specific. Shouldn't it be moved to
> libxl_x86.c?
>

Sounds reasonable.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-22 10:33   ` Julien Grall
@ 2015-05-25  2:09     ` Chen, Tiejun
  2015-05-25 10:02       ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-25  2:09 UTC (permalink / raw)
  To: Julien Grall, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

On 2015/5/22 18:33, Julien Grall wrote:
> Hi,
>
> On 22/05/2015 10:35, Tiejun Chen wrote:
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 0c0ea4a..203c80e 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
>>               XEN_GUEST_HANDLE_64(char) path; /* path to the device
>> tree node */
>>           } dt;
>>       } u;
>> +    /* IN */
>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>> +    uint32_t  flag;   /* flag of assigned device */
>
> You don't plumb this value for DT neither in the toolstack (see
> xc_assign_dt_device) and Xen. Please add a comment saying it's only used

I think we should do this,

@@ -1801,6 +1801,8 @@ int xc_assign_dt_device(

      domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
      domctl.u.assign_device.u.dt.size = size;
+    /* DT doesn't own any RDM. */
+    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
      set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);

      rc = do_domctl(xch, &domctl);

Thanks
Tiejun

> by PCI and/or the value should always be XEN_DOMCTL_DEV_NO_RDM for DT.
>
> Regards,
>


* Re: [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-25  2:09     ` Chen, Tiejun
@ 2015-05-25 10:02       ` Julien Grall
  2015-05-25 10:50         ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2015-05-25 10:02 UTC (permalink / raw)
  To: Chen, Tiejun, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel



On 25/05/2015 04:09, Chen, Tiejun wrote:
> On 2015/5/22 18:33, Julien Grall wrote:
>> Hi,
>>
>> On 22/05/2015 10:35, Tiejun Chen wrote:
>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>> index 0c0ea4a..203c80e 100644
>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
>>>               XEN_GUEST_HANDLE_64(char) path; /* path to the device
>>> tree node */
>>>           } dt;
>>>       } u;
>>> +    /* IN */
>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>> +    uint32_t  flag;   /* flag of assigned device */
>>
>> You don't plumb this value for DT neither in the toolstack (see
>> xc_assign_dt_device) and Xen. Please add a comment saying it's only used
>
> I think we should do this,
>
> @@ -1801,6 +1801,8 @@ int xc_assign_dt_device(
>
>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
>       domctl.u.assign_device.u.dt.size = size;
> +    /* DT doesn't own any RDM. */
> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
>       set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
>
>       rc = do_domctl(xch, &domctl);

I would be fine with plumbing in drivers/passthrough/device_tree.c and a 
check that the value is not different.

Regards,

-- 
Julien Grall


* Re: [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-25 10:02       ` Julien Grall
@ 2015-05-25 10:50         ` Chen, Tiejun
  2015-05-25 11:42           ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-25 10:50 UTC (permalink / raw)
  To: Julien Grall, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel


On 2015/5/25 18:02, Julien Grall wrote:
>
>
> On 25/05/2015 04:09, Chen, Tiejun wrote:
>> On 2015/5/22 18:33, Julien Grall wrote:
>>> Hi,
>>>
>>> On 22/05/2015 10:35, Tiejun Chen wrote:
>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>> index 0c0ea4a..203c80e 100644
>>>> --- a/xen/include/public/domctl.h
>>>> +++ b/xen/include/public/domctl.h
>>>> @@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
>>>>               XEN_GUEST_HANDLE_64(char) path; /* path to the device
>>>> tree node */
>>>>           } dt;
>>>>       } u;
>>>> +    /* IN */
>>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>>> +    uint32_t  flag;   /* flag of assigned device */
>>>
>>> You don't plumb this value for DT neither in the toolstack (see
>>> xc_assign_dt_device) and Xen. Please add a comment saying it's only used
>>
>> I think we should do this,
>>
>> @@ -1801,6 +1801,8 @@ int xc_assign_dt_device(
>>
>>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
>>       domctl.u.assign_device.u.dt.size = size;
>> +    /* DT doesn't own any RDM. */
>> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
>>       set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
>>
>>       rc = do_domctl(xch, &domctl);
>
> I would be fine with plumbing in drivers/passthrough/device_tree.c and a
> check that the value is not different.
>

Do you mean something like this?

@@ -149,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
          if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
              break;

+        if ( domctl->u.assign_device.dev == XEN_DOMCTL_DEV_NO_RDM )
+        {
+            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
+                   " to dom%u failed (%d) since we don't support RDM.\n",
+                   dt_node_full_name(dev), d->domain_id, ret);
+            break;
+        }
+
          if ( unlikely(d->is_dying) )
          {
              ret = -EINVAL;

Thanks
Tiejun


* Re: [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-25 10:50         ` Chen, Tiejun
@ 2015-05-25 11:42           ` Julien Grall
  2015-05-26  0:42             ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2015-05-25 11:42 UTC (permalink / raw)
  To: Chen, Tiejun, Julien Grall, JBeulich, tim, konrad.wilk,
	andrew.cooper3, kevin.tian, yang.z.zhang, ian.campbell, wei.liu2,
	Ian.Jackson, stefano.stabellini
  Cc: xen-devel

Hi,

On 25/05/2015 12:50, Chen, Tiejun wrote:
>
> On 2015/5/25 18:02, Julien Grall wrote:
>>
>>
>> On 25/05/2015 04:09, Chen, Tiejun wrote:
>>> On 2015/5/22 18:33, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 22/05/2015 10:35, Tiejun Chen wrote:
>>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>>> index 0c0ea4a..203c80e 100644
>>>>> --- a/xen/include/public/domctl.h
>>>>> +++ b/xen/include/public/domctl.h
>>>>> @@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
>>>>>               XEN_GUEST_HANDLE_64(char) path; /* path to the device
>>>>> tree node */
>>>>>           } dt;
>>>>>       } u;
>>>>> +    /* IN */
>>>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>>>> +    uint32_t  flag;   /* flag of assigned device */
>>>>
>>>> You don't plumb this value for DT neither in the toolstack (see
>>>> xc_assign_dt_device) and Xen. Please add a comment saying it's only
>>>> used
>>>
>>> I think we should do this,
>>>
>>> @@ -1801,6 +1801,8 @@ int xc_assign_dt_device(
>>>
>>>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
>>>       domctl.u.assign_device.u.dt.size = size;
>>> +    /* DT doesn't own any RDM. */
>>> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
>>>       set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
>>>
>>>       rc = do_domctl(xch, &domctl);
>>
>> I would be fine with plumbing in drivers/passthrough/device_tree.c and a
>> check that the value is not different.
>>
>
> Are you saying something like this?
>
> @@ -149,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
>           if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
>               break;
>
> +        if ( domctl->u.assign_device.dev == XEN_DOMCTL_DEV_NO_RDM )

Wrong field here. Other than that, it looks good to me.

> +        {
> +            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
> +                   " to dom%u failed (%d) since we don't support RDM.\n",
> +                   dt_node_full_name(dev), d->domain_id, ret);
> +            break;
> +        }
> +
>           if ( unlikely(d->is_dying) )
>           {
>               ret = -EINVAL;

Regards,

-- 
Julien Grall


* Re: [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-05-25 11:42           ` Julien Grall
@ 2015-05-26  0:42             ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-26  0:42 UTC (permalink / raw)
  To: Julien Grall, JBeulich, tim, konrad.wilk, andrew.cooper3,
	kevin.tian, yang.z.zhang, ian.campbell, wei.liu2, Ian.Jackson,
	stefano.stabellini
  Cc: xen-devel

On 2015/5/25 19:42, Julien Grall wrote:
> Hi,
>
> On 25/05/2015 12:50, Chen, Tiejun wrote:
>>
>> On 2015/5/25 18:02, Julien Grall wrote:
>>>
>>>
>>> On 25/05/2015 04:09, Chen, Tiejun wrote:
>>>> On 2015/5/22 18:33, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 22/05/2015 10:35, Tiejun Chen wrote:
>>>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>>>> index 0c0ea4a..203c80e 100644
>>>>>> --- a/xen/include/public/domctl.h
>>>>>> +++ b/xen/include/public/domctl.h
>>>>>> @@ -499,6 +499,11 @@ struct xen_domctl_assign_device {
>>>>>>               XEN_GUEST_HANDLE_64(char) path; /* path to the device
>>>>>> tree node */
>>>>>>           } dt;
>>>>>>       } u;
>>>>>> +    /* IN */
>>>>>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>>>>>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>>>>>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>>>>>> +    uint32_t  flag;   /* flag of assigned device */
>>>>>
>>>>> You don't plumb this value for DT neither in the toolstack (see
>>>>> xc_assign_dt_device) and Xen. Please add a comment saying it's only
>>>>> used
>>>>
>>>> I think we should do this,
>>>>
>>>> @@ -1801,6 +1801,8 @@ int xc_assign_dt_device(
>>>>
>>>>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
>>>>       domctl.u.assign_device.u.dt.size = size;
>>>> +    /* DT doesn't own any RDM. */
>>>> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
>>>>       set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
>>>>
>>>>       rc = do_domctl(xch, &domctl);
>>>
>>> I would be fine with plumbing in drivers/passthrough/device_tree.c and a
>>> check that the value is not different.
>>>
>>
>> Are you saying something like this?
>>
>> @@ -149,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
>>           if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
>>               break;
>>
>> +        if ( domctl->u.assign_device.dev == XEN_DOMCTL_DEV_NO_RDM )
>
> wrong field here. Other than that it looks good to me.

Sorry for the typo: s/.dev/.flag/.
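
With that substitution the check tests the RDM flag rather than the device
type. A minimal standalone sketch of the intent follows. It is not the
hypervisor code: struct assign_device_sketch and dt_check_rdm_flag() are
made-up names, and it assumes the goal is to reject any flag other than
XEN_DOMCTL_DEV_NO_RDM for DT devices (hence "!=" rather than "==").

```c
#include <errno.h>
#include <stdint.h>

/* Values as proposed in the domctl.h hunk quoted in this thread. */
#define XEN_DOMCTL_DEV_NO_RDM           0
#define XEN_DOMCTL_DEV_RDM_RELAXED      1
#define XEN_DOMCTL_DEV_RDM_STRICT       2

/* Hypothetical stand-in for the relevant part of xen_domctl_assign_device. */
struct assign_device_sketch {
    uint32_t dev;   /* device type selector (PCI or DT) */
    uint32_t flag;  /* RDM policy requested by the toolstack */
};

/*
 * Assumption: a DT device never owns RDM, so any policy other than
 * NO_RDM is treated as a caller error.
 * Returns 0 if the flag is acceptable, -EINVAL otherwise.
 */
static int dt_check_rdm_flag(const struct assign_device_sketch *ad)
{
    if ( ad->flag != XEN_DOMCTL_DEV_NO_RDM )
        return -EINVAL;
    return 0;
}
```

The choice of -EINVAL is also an assumption; the draft hunk does not set an
error code before the break.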

Thanks
Tiejun

>
>> +        {
>> +            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
>> +                   " to dom%u failed (%d) since we don't support RDM.\n",
>> +                   dt_node_full_name(dev), d->domain_id, ret);
>> +            break;
>> +        }
>> +
>>           if ( unlikely(d->is_dying) )
>>           {
>>               ret = -EINVAL;
>
> Regards,
>


* Re: [RFC][v2][PATCH 00/14] Fix RMRR
  2015-05-22  9:46 ` [RFC][v2][PATCH 00/14] Fix RMRR Jan Beulich
@ 2015-05-28  5:48   ` Chen, Tiejun
  2015-05-28  7:55     ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-28  5:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

On 2015/5/22 17:46, Jan Beulich wrote:
>>>> On 22.05.15 at 11:35, <tiejun.chen@intel.com> wrote:
>> As you know all devices are owned by Dom0 firstly before we create any
>> DomU, right? Do we allow Dom0 still own a group device while assign another
>> device in the same group?
>
> Clearly not, or - just like anything else putting the security of a system
> at risk - only at explicit host admin request.
>

You're right.

After discussing this internally, we intend to handle this case simply,
since shared RMRRs are rare in our experience. Furthermore, Xen has no
existing API to assign such a group of devices directly, and it does not
even identify these devices, so currently we always assign devices one by
one, right? This means we would need more effort to work out a good
implementation that addresses identification, atomicity, hotplug and so on.
This would obviously involve both the hypervisor and the tools, so it would
be difficult to get it done in time for 4.6.

So could we do this in two phases?

#1. Phase 1, for 4.6

#1.1. Do a simple implementation

We simply refuse the assignment if the device belongs to such a group,
like this:

@@ -2291,6 +2291,16 @@ static int intel_iommu_assign_device(
               PCI_BUS(bdf) == bus &&
               PCI_DEVFN2(bdf) == devfn )
          {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                reassign_device_ownership(d, hardware_domain, devfn, pdev);
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign any device with RMRR for Dom%d (%d)\n",
+                       rmrr->base_address, rmrr->end_address,
+                       d->domain_id, ret);
+                ret = -EPERM;
+                break;
+            }
              ret = rmrr_identity_mapping(d, 1, rmrr, flag);
              if ( ret )
              {

Note this is just draft code to illustrate the idea. I'm also wondering
whether we need a flag to bypass this check, so that the original behavior
remains available.

#1.2. Post a design

We'd like to post a preliminary design to the Xen community to arrive at a
better solution.

#2. Phase 2 after 4.6

Once the design is clear we will start writing patches to address this 
completely.

Any thoughts?

Thanks
Tiejun


* Re: [RFC][v2][PATCH 00/14] Fix RMRR
  2015-05-28  5:48   ` Chen, Tiejun
@ 2015-05-28  7:55     ` Jan Beulich
  2015-05-29  7:58       ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2015-05-28  7:55 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

>>> On 28.05.15 at 07:48, <tiejun.chen@intel.com> wrote:
> On 2015/5/22 17:46, Jan Beulich wrote:
>>>>> On 22.05.15 at 11:35, <tiejun.chen@intel.com> wrote:
>>> As you know all devices are owned by Dom0 firstly before we create any
>>> DomU, right? Do we allow Dom0 still own a group device while assign another
>>> device in the same group?
>>
>> Clearly not, or - just like anything else putting the security of a system
>> at risk - only at explicit host admin request.
>>
> 
> You're right.
> 
> After we discussed internally, we're intending to cover this simply 
> since the case of shared RMRR is a rare case according to our previous 
> experiences. Furthermore, Xen doesn't have a good existing API to 
> directly assign this sort of group devices and even Xen doesn't identify 
> these devices,  so currently we always assign devices one by one, right? 
> This means we have to put more efforts to concern a good implementation 
> to address something like, identification, atomic, hotplug and so on. 
> Obviously, this would involve hypervisor and tools at the same time so 
> this has a little bit of difficulty to work along with 4.6.
> 
> So could we do this separately?
> 
> #1. Phase 1 to 4.6
> 
> #1.1. Do a simple implementation
> 
> We just prevent from that device assignment if we're assigning this sort 
> of group devices like this,

Right.

> @@ -2291,6 +2291,16 @@ static int intel_iommu_assign_device(
>                PCI_BUS(bdf) == bus &&
>                PCI_DEVFN2(bdf) == devfn )
>           {
> +            if ( rmrr->scope.devices_cnt > 1 )
> +            {
> +                reassign_device_ownership(d, hardware_domain, devfn, pdev);

I think if this is really needed here, the check comes too late.

Jan


* Re: [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-05-22  9:35 ` [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2015-05-28 12:27   ` Jan Beulich
  2015-05-29  1:19     ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2015-05-28 12:27 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: tim, xen-devel

>>> On 22.05.15 at 11:35, <tiejun.chen@intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -898,6 +898,36 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>      return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>  }
>  
> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                           p2m_access_t p2ma)
> +{
> +    p2m_type_t p2mt;
> +    p2m_access_t a;
> +    mfn_t mfn;
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int ret = -EBUSY;
> +
> +    gfn_lock(p2m, gfn, 0);
> +
> +    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
> +
> +    if ( p2mt == p2m_invalid )
> +        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> +                            p2m_mmio_direct, p2ma);
> +    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> +        ret = 0;
> +    else
> +    {
> +        printk(XENLOG_G_WARNING
> +               "Cannot identity map d%d:%lx, already mapped to %lx.\n",
> +               d->domain_id, gfn, mfn_x(mfn));
> +    }

With the redundant braces here dropped or the ret = -EBUSY moved
into this block,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

I also reduced the Cc list quite significantly - I don't understand why
so many people were Cc-ed on this patch.

Jan


* Re: [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-05-28 12:27   ` Jan Beulich
@ 2015-05-29  1:19     ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-29  1:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: tim, xen-devel


On 2015/5/28 20:27, Jan Beulich wrote:
>>>> On 22.05.15 at 11:35, <tiejun.chen@intel.com> wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -898,6 +898,36 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>       return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>   }
>>
>> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>> +                           p2m_access_t p2ma)
>> +{
>> +    p2m_type_t p2mt;
>> +    p2m_access_t a;
>> +    mfn_t mfn;
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int ret = -EBUSY;
>> +
>> +    gfn_lock(p2m, gfn, 0);
>> +
>> +    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
>> +
>> +    if ( p2mt == p2m_invalid )
>> +        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
>> +                            p2m_mmio_direct, p2ma);
>> +    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
>> +        ret = 0;
>> +    else
>> +    {
>> +        printk(XENLOG_G_WARNING
>> +               "Cannot identity map d%d:%lx, already mapped to %lx.\n",
>> +               d->domain_id, gfn, mfn_x(mfn));
>> +    }
>
> With the redundant braces here dropped or the ret = -EBUSY moved
> into this block,

Okay, I will go with the latter.
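
That is, -EBUSY would be set only on the conflict path. A standalone sketch
of the resulting control flow is below, with the p2m lookup replaced by
plain parameters. identity_map_decision() and the sketch_type enum are
placeholders for illustration, not Xen code.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder for the p2m types that matter here. */
enum sketch_type { SKETCH_INVALID, SKETCH_MMIO_DIRECT, SKETCH_RAM };

/*
 * Models the three outcomes of the patch:
 *  - no existing entry: the identity mapping would be installed (this
 *    sketch returns 0 in place of the p2m_set_entry() result);
 *  - the entry already is the same identity mapping with the same
 *    access: nothing to do, return 0;
 *  - anything else: conflict, -EBUSY is set inside the else block.
 */
static int identity_map_decision(unsigned long gfn, unsigned long cur_mfn,
                                 enum sketch_type cur_type, bool same_access)
{
    int ret;

    if ( cur_type == SKETCH_INVALID )
        ret = 0;
    else if ( cur_mfn == gfn && cur_type == SKETCH_MMIO_DIRECT && same_access )
        ret = 0;
    else
    {
        /* The warning printk from the patch would go here. */
        ret = -EBUSY;
    }

    return ret;
}
```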

> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>

Thanks a lot for your review.

> I also reduced the Cc list quite significantly - I don't understand why
> so many people were Cc-ed on this patch.
>

I just Cc'ed everyone involved in the design we posted previously, to
make sure they are also aware of this series.

Thanks
Tiejun


* Re: [RFC][v2][PATCH 00/14] Fix RMRR
  2015-05-28  7:55     ` Jan Beulich
@ 2015-05-29  7:58       ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-05-29  7:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

On 2015/5/28 15:55, Jan Beulich wrote:
>>>> On 28.05.15 at 07:48, <tiejun.chen@intel.com> wrote:
>> On 2015/5/22 17:46, Jan Beulich wrote:
>>>>>> On 22.05.15 at 11:35, <tiejun.chen@intel.com> wrote:
>>>> As you know all devices are owned by Dom0 firstly before we create any
>>>> DomU, right? Do we allow Dom0 still own a group device while assign another
>>>> device in the same group?
>>>
>>> Clearly not, or - just like anything else putting the security of a system
>>> at risk - only at explicit host admin request.
>>>
>>
>> You're right.
>>
>> After we discussed internally, we're intending to cover this simply
>> since the case of shared RMRR is a rare case according to our previous
>> experiences. Furthermore, Xen doesn't have a good existing API to
>> directly assign this sort of group devices and even Xen doesn't identify
>> these devices,  so currently we always assign devices one by one, right?
>> This means we have to put more efforts to concern a good implementation
>> to address something like, identification, atomic, hotplug and so on.
>> Obviously, this would involve hypervisor and tools at the same time so
>> this has a little bit of difficulty to work along with 4.6.
>>
>> So could we do this separately?
>>
>> #1. Phase 1 to 4.6
>>
>> #1.1. Do a simple implementation
>>
>> We just prevent from that device assignment if we're assigning this sort
>> of group devices like this,
>
> Right.
>
>> @@ -2291,6 +2291,16 @@ static int intel_iommu_assign_device(
>>                 PCI_BUS(bdf) == bus &&
>>                 PCI_DEVFN2(bdf) == devfn )
>>            {
>> +            if ( rmrr->scope.devices_cnt > 1 )
>> +            {
>> +                reassign_device_ownership(d, hardware_domain, devfn, pdev);
>
> I think if this is really needed here, the check comes too late.
>

So we can do this at the beginning of this function:

@@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
      if ( list_empty(&acpi_drhd_units) )
          return -ENODEV;

+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent from this sort of device assignment.
+     *
+     * TODO: actually we can group these devices which shared rmrr, and
+     * then allow all devices within a group to be assigned to same domain.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn )
+        {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                ret = -EPERM;
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign this device with shared RMRR for Dom%d (%d)\n",
+                       d->domain_id, ret);
+                return ret;
+            }
+        }
+    }
+
      ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
      if ( ret )
          return ret;

-    seg = pdev->seg;
-    bus = pdev->bus;
-
      /* Setup rmrr identity mapping */
      for_each_rmrr_device( rmrr, bdf, i )
      {

Thanks
Tiejun


* Re: [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy
  2015-05-22  9:35 ` [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-06-02 15:57   ` Wei Liu
  2015-06-03  1:35     ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-02 15:57 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, JBeulich, yang.z.zhang,
	Ian.Jackson

On Fri, May 22, 2015 at 05:35:01PM +0800, Tiejun Chen wrote:
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
> 
> Global RDM parameter:
>     rdm = [ 'type=none/host, reserve=strict/relaxed' ]
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
> 
> Global RDM parameter, "type", allows user to specify reserved regions
> explicitly, e.g. using 'host' to include all reserved regions reported
> on this platform which is good to handle hotplug scenario. In the future
> this parameter may be further extended to allow specifying random regions,
> e.g. even those belonging to another platform as a preparation for live
> migration with passthrough devices. Instead, 'none' means we have nothing
> to do all reserved regions and ignore all policies, so guest work as before.
> 
> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
> regions in pfn space. If conflict exists, 'strict' means an immediate error
> so VM will be killed, while 'relaxed' allows moving forward with a warning
> message thrown out.
> 
> Default per-device RDM policy is 'strict', while default global RDM policy
> is 'relaxed'. When both policies are specified on a given region, 'strict' is
> always preferred.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  docs/man/xl.cfg.pod.5        | 57 +++++++++++++++++++++++++++
>  docs/misc/vtd.txt            | 24 ++++++++++++
>  tools/libxl/libxl_create.c   | 13 +++++++
>  tools/libxl/libxl_internal.h |  2 +
>  tools/libxl/libxl_pci.c      |  2 +
>  tools/libxl/libxl_types.idl  | 18 +++++++++
>  tools/libxl/libxlu_pci.c     | 92 ++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxlutil.h      |  4 ++
>  tools/libxl/xl_cmdimpl.c     | 10 +++++
>  9 files changed, 222 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 8e4154f..12c34c4 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -645,6 +645,49 @@ assigned slave device.
>  
>  =back
>  
> +=item B<rdm= "RDM_RESERVE_STRING" >

Stray space before and after "RDM_RESERVE_STRING".

> +
> +(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
> +which is necessary to enable robust device passthrough usage. One example of

Delete "usage".

> +RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
> +structure on x86 platform.
> +
> +B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
> +
> +=over 4
> +
> +=item B<KEY=VALUE>
> +
> +Possible B<KEY>s are:
> +
> +=over 4
> +
> +=item B<type="STRING">
> +
> +Currently we just have two types:
> +
> +"host" means all reserved device memory on this platform should be reserved
> +in this VM's pfn space. This global RDM parameter allows user to specify

PFN is Xen-internal terminology. Do you mean "guest address space"? Note
that the readers are system administrators who might not know (or want to
know) Xen internals.

> +reserved regions explicitly. And using "host" to include all reserved regions
> +reported on this platform which is good to handle hotplug scenario. In the
> +future this parameter may be further extended to allow specifying random
> +regions, e.g. even those belonging to another platform as a preparation

Extending how? What's your envisaged syntax for those random regions?
Should you want to reserve more, an array is more useful. Could you
provide some examples?

> +for live migration with passthrough devices.
> +
> +"none" means we have nothing to do all reserved regions and ignore all policies,
> +so guest work as before.
> +
> +=over 4
> +
> +=item B<reserve="STRING">
> +
> +Conflict may be detected when reserving reserved device memory in gfn space.

GFN is Xen-internal terminology. Maybe you should use "guest address
space"?

Nonetheless the terminology throughout this document should be
consistent.

> +"strict" means an unsolved conflict leads to immediate VM crash, while
> +"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
> +is default.
> +
> +Note this may be overrided by another sub item, rdm_reserve, in pci device.
> +

"overridden by rdm_reserve option in PCI device configuration".

>  =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
>  
>  Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
> @@ -707,6 +750,20 @@ dom0 without confirmation.  Please use with care.
>  D0-D3hot power management states for the PCI device. False (0) by
>  default.
>  
> +=item B<rdm_reserv="STRING">
> +
> +(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
> +which is necessary to enable robust device passthrough usage. One example of

Delete "usage".

> +RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
> +structure on x86 platform.
> +
> +Conflict may be detected when reserving reserved device memory in gfn space.
> +"strict" means an unsolved conflict leads to immediate VM crash, while
> +"relaxed" allows VM moving forward with a warning message thrown out. "strict"
> +is default.
> +

Actually these two paragraphs are the same as before. You can just point
readers to previous sections instead of copying them here.

> +Note this would override global B<rdm> option.
> +
>  =back
>  
>  =back
> diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> index 9af0e99..7d63c47 100644
> --- a/docs/misc/vtd.txt
> +++ b/docs/misc/vtd.txt
> @@ -111,6 +111,30 @@ in the config file:
>  To override for a specific device:
>  	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
>  
> +RDM, 'reserved device memory', for PCI Device Passthrough
> +---------------------------------------------------------
> +
> +There are some devices the BIOS controls, for e.g. USB devices to perform
> +PS2 emulation. The regions of memory used for these devices are marked
> +reserved in the e820 map. When we turn on DMA translation, DMA to those
> +regions will fail. Hence BIOS uses RMRR to specify these regions along with
> +devices that need to access these regions. OS is expected to setup
> +identity mappings for these regions for these devices to access these regions.
> +
> +While creating a VM we should reserve them in advance, and avoid any conflicts.
> +So we introduce user configurable parameters to specify RDM resource and
> +according policies,
> +
> +To enable this globally, add "rdm" in the config file:
> +
> +    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
> +
> +Or just for a specific device:
> +
> +    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
> +
> +For all the options available to RDM, see xl.cfg(5).
> +
>  
>  Caveat on Conventional PCI Device Passthrough
>  ---------------------------------------------
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index f0da7dc..d649ead 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -100,6 +100,12 @@ static int sched_params_valid(libxl__gc *gc,
>      return 1;
>  }
>  
> +void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
> +{
> +    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_NONE;
> +    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;

No, not like this. This resets everything to none and relaxed even if a
value was already set before this point.

It should be
    if (xxx == DEFAULT_SENTINEL_VALUE)
        xxx = THE_DEFAULT_YOU_WANT;

Have a look at libxl__device_nic_setdefault etc. to get an idea of
how it works. Don't hesitate to ask if I'm not clear enough.
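
A minimal sketch of that pattern, to make the shape concrete. The enum and
struct names below are hypothetical stand-ins rather than the real libxl
types, and the sketch assumes the value 0 is reserved as the "not set"
sentinel:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libxl enums; 0 doubles as "not set". */
enum rdm_type { RDM_TYPE_DEFAULT = 0, RDM_TYPE_NONE, RDM_TYPE_HOST };
enum rdm_reserve { RDM_RESERVE_DEFAULT = 0, RDM_RESERVE_STRICT,
                   RDM_RESERVE_RELAXED };

struct rdm_config {
    enum rdm_type type;
    enum rdm_reserve reserve;
};

/* Fill in defaults only for fields the caller left at the sentinel. */
static void rdm_setdefault(struct rdm_config *rdm)
{
    assert(rdm != NULL);

    if (rdm->type == RDM_TYPE_DEFAULT)
        rdm->type = RDM_TYPE_NONE;
    if (rdm->reserve == RDM_RESERVE_DEFAULT)
        rdm->reserve = RDM_RESERVE_RELAXED;
}
```

With this shape, calling the function twice or calling it after the user
picked "host"/"strict" leaves the user's choice alone.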

> +}
> +
>  int libxl__domain_build_info_setdefault(libxl__gc *gc,
>                                          libxl_domain_build_info *b_info)
>  {
> @@ -410,6 +416,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>                     libxl_domain_type_to_string(b_info->type));
>          return ERROR_INVAL;
>      }
> +
> +    libxl__rdm_setdefault(gc, b_info);
>      return 0;
>  }
>  
> @@ -1439,6 +1447,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
>      }
>  
>      for (i = 0; i < d_config->num_pcidevs; i++) {
> +        /*
> +         * If the rdm global policy is 'force' we should override each device.
> +         */

"strict" not "force"

Wei.


* Re: [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM
  2015-05-22  9:35 ` [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-06-02 16:29   ` Wei Liu
  2015-06-03  2:25     ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-02 16:29 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, JBeulich, yang.z.zhang,
	Ian.Jackson

On Fri, May 22, 2015 at 05:35:04PM +0800, Tiejun Chen wrote:
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RAM.
> 
> RMRR can reside in address space beyond 4G theoretically, but we never
> see this in real world. So in order to avoid breaking highmem layout
> we don't solve highmem conflict. Note this means highmem rmrr could still
> be supported if no conflict.
> 
> But in the case of lowmem, RMRR probably scatter the whole RAM space.
> Especially multiple RMRR entries would worsen this to lead a complicated
> memory layout. And then its hard to extend hvm_info_table{} to work
> hvmloader out. So here we're trying to figure out a simple solution to
> avoid breaking existing layout. So when a conflict occurs,
> 
>     #1. Above a predefined boundary (default 2G)
>         - move lowmem_end below reserved region to solve conflict;
> 
>     #2. Below a predefined boundary (default 2G)
>         - Check strict/relaxed policy.
>         "strict" policy leads to fail libxl. Note when both policies
>         are specified on a given region, 'strict' is always preferred.
>         "relaxed" policy issue a warning message and also mask this entry INVALID
>         to indicate we shouldn't expose this entry to hvmloader.
> 
> Note this predefined boundary can be changes with the parameter
> "rdm_mem_boundary" in .cfg file.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---

It would be better if you wrote down what you changed in this version after
the "---" marker.

What we normally do is:


libxl: implement FOO

FOO is needed because ...

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
changes in vN:
 * bar -> baz
 * more comments
---

The stuff between the two "---" markers will be automatically discarded when
committing.

>  docs/man/xl.cfg.pod.5          |  21 ++++
>  tools/libxc/include/xenguest.h |   1 +
>  tools/libxc/xc_hvm_build_x86.c |  25 ++--
>  tools/libxl/libxl_create.c     |   2 +-
>  tools/libxl/libxl_dm.c         | 253 +++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_dom.c        |  27 ++++-
>  tools/libxl/libxl_internal.h   |  11 +-
>  tools/libxl/libxl_types.idl    |   8 ++
>  tools/libxl/xl_cmdimpl.c       |   3 +
>  9 files changed, 337 insertions(+), 14 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 12c34c4..80e3930 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -764,6 +764,27 @@ is default.
>  
>  Note this would override global B<rdm> option.
>  
> +=item B<rdm_mem_boundary=MBYTES>
> +
> +Number of megabytes to set a boundary for checking rdm conflict.
> +
> +When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
> +Especially multiple RMRR entries would worsen this to lead a complicated
> +memory layout. So here we're trying to figure out a simple solution to
> +avoid breaking existing layout. So when a conflict occurs,
> +
> +    #1. Above a predefined boundary
> +        - move lowmem_end below reserved region to solve conflict;
> +
> +    #2. Below a predefined boundary
> +        - Check strict/relaxed policy.
> +        "strict" policy leads to fail libxl. Note when both policies
> +        are specified on a given region, 'strict' is always preferred.
> +        "relaxed" policy issue a warning message and also mask this entry INVALID
> +        to indicate we shouldn't expose this entry to hvmloader.
> +
> +Her the default is 2G.

Typo: "Her" should be "Here".

I get the idea. I will leave grammar / syntax check to native speakers.

> +
>  =back
>  
>  =back
> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
> index 7581263..4cb7e9f 100644
> --- a/tools/libxc/include/xenguest.h
> +++ b/tools/libxc/include/xenguest.h
> @@ -234,6 +234,7 @@ struct xc_hvm_firmware_module {
>  };
>  
>  struct xc_hvm_build_args {
> +    uint64_t lowmem_size;        /* All low memory size in bytes. */

You might find this value unnecessary with my patch to consolidate
memory layout generation in libxl?

>      uint64_t mem_size;           /* Memory size in bytes. */
>      uint64_t mem_target;         /* Memory target in bytes. */
>      uint64_t mmio_size;          /* Size of the MMIO hole in bytes. */
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index e45ae4a..9a1567a 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -21,6 +21,7 @@
>  #include <stdlib.h>
>  #include <unistd.h>
>  #include <zlib.h>
> +#include <assert.h>
>  
>  #include "xg_private.h"
>  #include "xc_private.h"
> @@ -98,11 +99,8 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      uint8_t sum;
>      int i;
>  
> -    if ( lowmem_end > mmio_start )
> -    {
> -        highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
> -        lowmem_end = mmio_start;
> -    }
> +    if ( args->mem_size > lowmem_end )
> +        highmem_end = (1ull<<32) + (args->mem_size - lowmem_end);
>  
>      memset(hvm_info_page, 0, PAGE_SIZE);
>  
> @@ -279,7 +277,7 @@ static int setup_guest(xc_interface *xch,
>  
>      elf_parse_binary(&elf);
>      v_start = 0;
> -    v_end = args->mem_size;
> +    v_end = args->lowmem_size;
>  
>      if ( nr_pages > target_pages )
>          memflags |= XENMEMF_populate_on_demand;
> @@ -344,8 +342,14 @@ static int setup_guest(xc_interface *xch,
>  
>      for ( i = 0; i < nr_pages; i++ )
>          page_array[i] = i;
> -    for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
> -        page_array[i] += mmio_size >> PAGE_SHIFT;
> +    /*
> +     * Actually v_end is args->lowmem_size, and we already adjusted
> +     * this below mmio_start when we check rdm previously, so here
> +     * this condition 'v_end <= mmio_start' is always true.
> +     */
> +    assert(v_end <= mmio_start);
> +    for ( i = v_end >> PAGE_SHIFT; i < nr_pages; i++ )
> +        page_array[i] += ((1ull << 32) - v_end) >> PAGE_SHIFT;
>  
>      /*
>       * Try to claim pages for early warning of insufficient memory available.
> @@ -664,9 +668,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
>      if ( args.mem_target == 0 )
>          args.mem_target = args.mem_size;
>  
> -    if ( args.mmio_size == 0 )
> -        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
> -
>      /* An HVM guest must be initialised with at least 2MB memory. */
>      if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
>          return -1;
> @@ -713,6 +714,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>      args.mem_size = (uint64_t)memsize << 20;
>      args.mem_target = (uint64_t)target << 20;
>      args.image_file_name = image_name;
> +    if ( args.mmio_size == 0 )
> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>  

The above hunk can be simplified with my patch mentioned above.

>      return xc_hvm_build(xch, domid, &args);
>  }
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index d649ead..a782860 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
>  
>      switch (info->type) {
>      case LIBXL_DOMAIN_TYPE_HVM:
> -        ret = libxl__build_hvm(gc, domid, info, state);
> +        ret = libxl__build_hvm(gc, domid, d_config, state);
>          if (ret)
>              goto out;
>  
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 0c6408d..85e5317 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -90,6 +90,259 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>      return dm;
>  }
>  
> +static struct xen_reserved_device_memory
> +*xc_device_get_rdm(libxl__gc *gc,
> +                   uint32_t flag,
> +                   uint16_t seg,
> +                   uint8_t bus,
> +                   uint8_t devfn,
> +                   unsigned int *nr_entries)
> +{
> +    struct xen_reserved_device_memory *xrdm = NULL;
> +    int rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                           xrdm, nr_entries);
> +

Please separate the declaration from the function call. Also pass NULL
instead of xrdm in that function call.

> +    assert( rc <= 0 );
> +    /* "0" means we have no any rdm entry. */
> +    if ( !rc )
> +        goto out;

Also set *nr_entries = 0; otherwise you can't distinguish error vs 0
entries.

> +
> +    if ( errno == ENOBUFS )
> +    {
> +        if ( (xrdm = malloc(*nr_entries *
> +                            sizeof(xen_reserved_device_memory_t))) == NULL )

Move xrdm = malloc out of "if".

> +        {
> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
> +            goto out;
> +        }
> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                           xrdm, nr_entries);
> +        if ( rc )
> +        {
> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
> +            *nr_entries = 0;
> +            free(xrdm);
> +            xrdm = NULL;
> +        }
> +    }
> +    else
> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
> +
> + out:
> +    return xrdm;
> +}
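
Pulling the comments on this function together, here is a rough sketch of the
"ask for the size, then fetch" pattern. It assumes a hypothetical
query_regions() in place of xc_reserved_device_memory_map(), with the same
convention of failing with ENOBUFS and reporting the needed count; the host
data is made up for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct region { unsigned long start_pfn, nr_pages; };

/* Made-up host data, for illustration only. */
static const struct region fake_host[] = {
    { 0xab000, 0x10 }, { 0xad800, 0x2800 },
};
#define FAKE_NR ((unsigned int)(sizeof(fake_host) / sizeof(fake_host[0])))

/* Hypothetical stand-in for the hypercall wrapper. */
static int query_regions(struct region *buf, unsigned int *nr)
{
    if (buf == NULL || *nr < FAKE_NR) {
        *nr = FAKE_NR;          /* tell the caller how much room it needs */
        errno = ENOBUFS;
        return -1;
    }
    memcpy(buf, fake_host, sizeof(fake_host));
    *nr = FAKE_NR;
    return 0;
}

/*
 * Returns 0 on success (possibly with zero entries) or -1 on error.
 * On success the caller owns *out and must free() it.
 */
static int get_regions(struct region **out, unsigned int *nr_entries)
{
    struct region *buf;
    int rc;

    assert(out != NULL && nr_entries != NULL);
    *out = NULL;
    *nr_entries = 0;

    /* First call passes NULL and only learns the entry count. */
    rc = query_regions(NULL, nr_entries);
    if (rc == 0) {
        *nr_entries = 0;        /* success with nothing to report */
        return 0;
    }
    if (errno != ENOBUFS) {
        *nr_entries = 0;
        return -1;
    }

    /* Allocation kept out of the "if" condition. */
    buf = malloc(*nr_entries * sizeof(*buf));
    if (buf == NULL) {
        *nr_entries = 0;
        return -1;
    }

    rc = query_regions(buf, nr_entries);
    if (rc != 0) {
        free(buf);
        *nr_entries = 0;
        return -1;
    }

    *out = buf;
    return 0;
}
```

Returning a status separately from the buffer lets the caller tell "error"
apart from "no entries", which a NULL return alone cannot express.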
> +
> +/*
> + * Check whether there exists rdm hole in the specified memory range.
> + * Returns true if exists, else returns false.
> + */
> +static bool overlaps_rdm(uint64_t start, uint64_t memsize,
> +                         uint64_t rdm_start, uint64_t rdm_size)
> +{
> +    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
> +}
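
For reference, the check above is the usual half-open interval test. A small
standalone sketch (the function name is hypothetical, and it assumes neither
range wraps past 2^64):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Half-open ranges [a, a + a_size) and [b, b + b_size) overlap when each
 * one starts before the other one ends.
 */
static bool ranges_overlap(uint64_t a, uint64_t a_size,
                           uint64_t b, uint64_t b_size)
{
    /* This sketch assumes the end addresses do not overflow. */
    assert(a + a_size >= a && b + b_size >= b);

    return (a + a_size > b) && (a < b + b_size);
}
```

Note that a region starting exactly at the end of RAM is adjacent, not
overlapping, so it is not reported as a conflict.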
> +
> +/*
> + * Check reported RDM regions and handle potential gfn conflicts according
> + * to user preferred policy.
> + *
> + * RMRR can reside in address space beyond 4G theoretically, but we never
> + * see this in real world. So in order to avoid breaking highmem layout
> + * we don't solve highmem conflict. Note this means highmem rmrr could still
> + * be supported if no conflict.
> + *
> + * But in the case of lowmem, RMRR probably scatter the whole RAM space.
> + * Especially multiple RMRR entries would worsen this to lead a complicated
> + * memory layout. And then its hard to extend hvm_info_table{} to work
> + * hvmloader out. So here we're trying to figure out a simple solution to
> + * avoid breaking existing layout. So when a conflict occurs,
> + *
> + * #1. Above a predefined boundary (default 2G)
> + * - Move lowmem_end below reserved region to solve conflict;
> + *
> + * #2. Below a predefined boundary (default 2G)
> + * - Check strict/relaxed policy.
> + * "strict" policy leads to fail libxl. Note when both policies
> + * are specified on a given region, 'strict' is always preferred.
> + * "relaxed" policy issue a warning message and also mask this entry
> + * INVALID to indicate we shouldn't expose this entry to hvmloader.
> + */
> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
> +                                       libxl_domain_config *d_config,
> +                                       uint64_t rdm_mem_boundary,
> +                                       struct xc_hvm_build_args *args)
> +{
> +    int i, j, conflict;
> +    struct xen_reserved_device_memory *xrdm = NULL;
> +    uint64_t rdm_start, rdm_size, highmem_end = (1ULL << 32);
> +    uint32_t type = d_config->b_info.rdm.type;
> +    uint16_t seg;
> +    uint8_t bus, devfn;
> +
> +    /* Fix highmem. */
> +    highmem_end += (args->mem_size - args->lowmem_size);
> +
> +    /* Might not expose rdm. */
> +    if (type == LIBXL_RDM_RESERVE_TYPE_NONE)
> +        return 0;
> +
> +    /* Query all RDM entries in this platform */
> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
> +        unsigned int nr_entries;
> +
> +        /* Collect all rdm info if exist. */
> +        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
> +                                 0, 0, 0, &nr_entries);
> +        if (!nr_entries)
> +            return 0;
> +
> +        d_config->num_rdms = nr_entries;
> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +        for (i = 0; i < d_config->num_rdms; i++) {
> +            d_config->rdms[i].start =
> +                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
> +            d_config->rdms[i].size =
> +                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
> +        }
> +
> +        free(xrdm);
> +    } else
> +        d_config->num_rdms = 0;
> +
> +    /* Query RDM entries per-device */
> +    for (i = 0; i < d_config->num_pcidevs; i++) {
> +        unsigned int nr_entries;
> +

Stray blank line.

> +        bool new = true;

Need blank line here.

> +        seg = d_config->pcidevs[i].domain;
> +        bus = d_config->pcidevs[i].bus;
> +        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
> +        nr_entries = 0;
> +        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
> +                                 seg, bus, devfn, &nr_entries);

You didn't check xrdm != NULL.

> +        /* No RDM to associated with this device. */
> +        if (!nr_entries)
> +            continue;
> +
> +        /*
> +         * Need to check whether this entry is already saved in the array.
> +         * This could come from two cases:
> +         *
> +         *   - user may configure to get all RMRRs in this platform, which
> +         *   is already queried before this point
> +         *   - or two assigned devices may share one RMRR entry
> +         *
> +         * different policies may be configured on the same RMRR due to above
> +         * two cases. We choose a simple policy to always favor stricter policy
> +         */
> +        for (j = 0; j < d_config->num_rdms; j++) {
> +            if (d_config->rdms[j].start ==
> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
> +             {
> +                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
> +                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
> +                new = false;
> +                break;
> +            }
> +        }
> +
> +        if (new) {
> +            d_config->num_rdms++;
> +            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +            /* This is a new entry. */

Delete this comment.

> +            d_config->rdms[d_config->num_rdms].start =
> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
> +            d_config->rdms[d_config->num_rdms].size =
> +                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
> +            d_config->rdms[d_config->num_rdms].flag = d_config->pcidevs[i].rdm_reserve;

Line too long.

> +        }
> +        free(xrdm);
> +    }
> +
> +    /*
> +     * Next step is to check and avoid potential conflict between RDM entries
> +     * and guest RAM. To avoid intrusive impact to existing memory layout
> +     * {lowmem, mmio, highmem} which is passed around various function blocks,
> +     * below conflicts are not handled which are rare and handling them would
> +     * lead to a more scattered layout:
> +     *  - RMRR in highmem area (>4G)
> +     *  - RMRR lower than a defined memory boundary (e.g. 2G)
> +     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
> +     * end below reserved region to solve conflict.
> +     *
> +     * If a conflict is detected on a given RMRR entry, an error will be
> +     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
> +     * specified, this conflict is treated just as a warning, but we mark this
> +     * RMRR entry as INVALID to indicate that this entry shouldn't be exposed
> +     * to hvmloader.
> +     *
> +     * Firstly we should check the case of rdm < 4G because we may need to
> +     * expand highmem_end.
> +     */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        rdm_start = d_config->rdms[i].start;
> +        rdm_size = d_config->rdms[i].size;
> +        conflict = overlaps_rdm(0, args->lowmem_size, rdm_start, rdm_size);
> +
> +        if (!conflict)
> +            continue;
> +
> +        /* Just check if RDM > our memory boundary. */
> +        if (rdm_start > rdm_mem_boundary) {
> +            /*
> +             * We will move downwards lowmem_end so we have to expand
> +             * highmem_end.
> +             */
> +            highmem_end += (args->lowmem_size - rdm_start);
> +            /* Now move downwards lowmem_end. */
> +            args->lowmem_size = rdm_start;
> +        }
> +    }
> +
> +    /*
> +     * Finally we can take same policy to check lowmem(< 2G) and
> +     * highmem adjusted above.
> +     */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        rdm_start = d_config->rdms[i].start;
> +        rdm_size = d_config->rdms[i].size;
> +        /* Does this entry conflict with lowmem? */
> +        conflict = overlaps_rdm(0, args->lowmem_size,
> +                                rdm_start, rdm_size);
> +        /* Does this entry conflict with highmem? */
> +        conflict |= overlaps_rdm((1ULL<<32),
> +                                 highmem_end - (1ULL<<32),
> +                                 rdm_start, rdm_size);
> +
> +        if (!conflict)
> +            continue;
> +
> +        if(d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
> +            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
> +            goto out;
> +        } else {
> +            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
> +                      d_config->rdms[i].start);
> +
> +            /*
> +             * Then mask this INVALID to indicate we shouldn't expose this
> +             * to hvmloader.
> +             */
> +            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
> +        }
> +    }
> +
> +    return 0;
> +
> + out:
> +    return -1;

Please return libxl error code.
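
To illustrate, a simplified sketch of the two passes with a libxl-style
error return. All names are hypothetical; SKETCH_ERROR_FAIL stands in for
libxl's ERROR_FAIL, only the lowmem side is modelled (the highmem adjustment
is left out), and every region is assumed to have a non-zero size:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define SKETCH_ERROR_FAIL (-3)  /* stand-in; real code would use ERROR_FAIL */

enum policy { POLICY_STRICT, POLICY_RELAXED, POLICY_INVALID };

struct rdm { uint64_t start, size; enum policy flag; };

static int resolve_conflicts(struct rdm *rdms, unsigned int nr,
                             uint64_t boundary, uint64_t *lowmem_end)
{
    unsigned int i;

    assert(rdms != NULL && lowmem_end != NULL);

    /* Pass 1: a conflict above the boundary shrinks lowmem below it. */
    for (i = 0; i < nr; i++) {
        if (rdms[i].start < *lowmem_end && rdms[i].start > boundary)
            *lowmem_end = rdms[i].start;
    }

    /* Pass 2: whatever still conflicts is handled by policy. */
    for (i = 0; i < nr; i++) {
        if (rdms[i].start >= *lowmem_end)
            continue;

        if (rdms[i].flag == POLICY_STRICT) {
            fprintf(stderr, "RDM conflict at 0x%" PRIx64 "\n", rdms[i].start);
            return SKETCH_ERROR_FAIL;
        }

        fprintf(stderr, "Ignoring RDM conflict at 0x%" PRIx64 "\n",
                rdms[i].start);
        rdms[i].flag = POLICY_INVALID;  /* not exposed to hvmloader */
    }

    return 0;
}
```

Using PRIx64 for the uint64_t addresses also avoids relying on "%lx", which
is only right where long is 64 bits.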


> +}
> +
>  const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
>  {
>      const libxl_vnc_info *vnc = NULL;
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index a0c9850..84d5465 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -914,12 +914,14 @@ out:
>  }
>  
>  int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> -              libxl_domain_build_info *info,
> +              libxl_domain_config *d_config,
>                libxl__domain_build_state *state)
>  {
>      libxl_ctx *ctx = libxl__gc_owner(gc);
>      struct xc_hvm_build_args args = {};
>      int ret, rc = ERROR_FAIL;
> +    libxl_domain_build_info *const info = &d_config->b_info;
> +    uint64_t rdm_mem_boundary, mmio_start;
>  
>      memset(&args, 0, sizeof(struct xc_hvm_build_args));
>      /* The params from the configuration file are in Mb, which are then
> @@ -928,6 +930,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>       * Do all this in one step here...
>       */
>      args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
> +    args.lowmem_size = min((uint64_t)(1ULL << 32), args.mem_size);
>      args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
>      args.claim_enabled = libxl_defbool_val(info->claim_mode);
>      if (info->u.hvm.mmio_hole_memkb) {
> @@ -937,6 +940,28 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>          if (max_ram_below_4g < HVM_BELOW_4G_MMIO_START)
>              args.mmio_size = info->u.hvm.mmio_hole_memkb << 10;
>      }
> +
> +    if (args.mmio_size == 0)
> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
> +    mmio_start = (1ull << 32) - args.mmio_size;
> +
> +    if (args.lowmem_size > mmio_start)
> +        args.lowmem_size = mmio_start;
> +
> +    /*
> +     * We'd like to set a memory boundary to determine if we need to check
> +     * any overlap with reserved device memory.
> +     */
> +    rdm_mem_boundary = 0x80000000;
> +    if (info->rdm_mem_boundary_memkb)

I think you mean info->rdm_mem_boundary_memkb != LIBXL_MEMKB_DEFAULT?
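
Something along these lines, as a sketch. SKETCH_MEMKB_DEFAULT is a
stand-in for LIBXL_MEMKB_DEFAULT (assumed here to be all ones), and the
helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MEMKB_DEFAULT (~0ULL)           /* stand-in sentinel value */
#define DEFAULT_RDM_BOUNDARY 0x80000000ULL     /* 2G, as in the patch */

/* Convert the configured boundary (in KiB) to bytes, applying the default. */
static uint64_t rdm_boundary_bytes(uint64_t boundary_memkb)
{
    if (boundary_memkb == SKETCH_MEMKB_DEFAULT)
        return DEFAULT_RDM_BOUNDARY;

    /* Guard against the shift overflowing 64 bits. */
    assert(boundary_memkb <= (UINT64_MAX >> 10));
    return boundary_memkb << 10;
}
```

A plain truthiness test would treat the all-ones sentinel as a real value
and treat an explicit 0 as "unset"; comparing against the sentinel avoids
both.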

Wei.


* Re: [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() to support rdm reservation policy
  2015-05-22  9:35 ` [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() " Tiejun Chen
@ 2015-06-02 16:36   ` Wei Liu
  2015-06-03  2:58     ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-02 16:36 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, JBeulich, yang.z.zhang,
	Ian.Jackson

On Fri, May 22, 2015 at 05:35:08PM +0800, Tiejun Chen wrote:
> This patch passes rdm reservation policy to xc_assign_device() so the policy
> is checked when assigning devices to a VM.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/libxc/include/xenctrl.h       |  3 ++-
>  tools/libxc/xc_domain.c             |  4 +++-
>  tools/libxl/libxl_pci.c             | 11 ++++++++++-
>  tools/libxl/xl_cmdimpl.c            | 23 +++++++++++++++++++----
>  tools/libxl/xl_cmdtable.c           |  2 +-

Where is the documentation for the new options you added to the xl pci
commands?

BTW you might want to consider rearranging the patches in this series so
that you keep the tree bisectable.

>  tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
>  tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
>  xen/drivers/passthrough/pci.c       |  3 ++-
>  8 files changed, 70 insertions(+), 23 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 5f84a62..2a447b9 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>  /* HVM guest pass-through */
>  int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
> -                     uint32_t machine_sbdf);
> +                     uint32_t machine_sbdf,
> +                     uint32_t flag);
>  
>  int xc_get_device_group(xc_interface *xch,
>                       uint32_t domid,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index c17a5a8..9761e5a 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1704,7 +1704,8 @@ int xc_domain_setdebugging(xc_interface *xch,
>  int xc_assign_device(
>      xc_interface *xch,
>      uint32_t domid,
> -    uint32_t machine_sbdf)
> +    uint32_t machine_sbdf,
> +    uint32_t flag)
>  {
>      DECLARE_DOMCTL;
>  
> @@ -1712,6 +1713,7 @@ int xc_assign_device(
>      domctl.domain = domid;
>      domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
>      domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
> +    domctl.u.assign_device.flag = flag;
>  
>      return do_domctl(xch, &domctl);
>  }
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 07e84f2..ac70edc 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>      FILE *f;
>      unsigned long long start, end, flags, size;
>      int irq, i, rc, hvm = 0;
> +    uint32_t flag;
>  
>      if (type == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
> @@ -987,7 +988,15 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>  
>  out:
>      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> -        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> +        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
> +        } else {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unkwon rdm check flag.");

unknown

Couldn't continue reviewing because I don't know the expected behaviour.
But the changes look mostly mechanical.

Wei.


* Re: [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map
  2015-05-22  9:35 ` [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map Tiejun Chen
  2015-05-22 10:25   ` Julien Grall
@ 2015-06-02 16:42   ` Wei Liu
  2015-06-03  3:06     ` Chen, Tiejun
  1 sibling, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-02 16:42 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, JBeulich, yang.z.zhang,
	Ian.Jackson

On Fri, May 22, 2015 at 05:35:10PM +0800, Tiejun Chen wrote:
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist. And hvmloader would need this info
> later.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/libxl/libxl_dom.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 84d5465..cc4b1a6 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -913,6 +913,87 @@ out:
>      return rc;
>  }
>  
> +/*
> + * Here we're just trying to set these kinds of e820 mappings:
> + *
> + * #1. Low memory region
> + *
> + * Low RAM starts at least from 1M to make sure all standard regions
> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> + * have enough space.
> + * Note: Those stuffs below 1M are still constructed with multiple
> + * e820 entries by hvmloader. At this point we don't change anything.
> + *
> + * #2. RDM region if it exists
> + *
> + * #3. High memory region if it exists
> + *
> + * Note: these regions are not overlapping since we already check
> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
> + */
> +#define GUEST_LOW_MEM_START_DEFAULT 0x100000
> +static int libxl__domain_construct_memmap(libxl__gc *gc,
> +                                          libxl_domain_config *d_config,
> +                                          uint32_t domid,
> +                                          struct xc_hvm_build_args *args)

This is x86 specific. I think libxl__domain_construct_e820 is a better
name.
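
For illustration, the layout described in the comment block above (low RAM
from 1M, reserved RDM regions, optional high RAM at 4G) could be built like
this. It is only a sketch: the struct and function names are hypothetical,
the type values follow the conventional e820 numbering, and the helper
returns the number of entries actually written so that dropped RDM entries
are not counted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_E820_RAM      1
#define SKETCH_E820_RESERVED 2
#define LOW_MEM_START        0x100000ULL
#define FOUR_GIB             (1ULL << 32)

struct e820_ent { uint64_t addr, size; uint32_t type; };
struct rdm_ent { uint64_t start, size; int valid; };

/* Returns the number of entries written, or 0 if they would not fit. */
static unsigned int build_e820(struct e820_ent *map, unsigned int max,
                               uint64_t lowmem_size, uint64_t highmem_size,
                               const struct rdm_ent *rdms,
                               unsigned int nr_rdms)
{
    unsigned int i, nr = 0, needed = 1;   /* always one lowmem entry */

    assert(map != NULL && lowmem_size > LOW_MEM_START);
    assert(rdms != NULL || nr_rdms == 0);

    for (i = 0; i < nr_rdms; i++)
        if (rdms[i].valid)
            needed++;
    if (highmem_size)
        needed++;
    if (needed > max)
        return 0;

    /* Low memory, starting at 1M. */
    map[nr].addr = LOW_MEM_START;
    map[nr].size = lowmem_size - LOW_MEM_START;
    map[nr].type = SKETCH_E820_RAM;
    nr++;

    /* RDM regions that survived the conflict check. */
    for (i = 0; i < nr_rdms; i++) {
        if (!rdms[i].valid)
            continue;
        map[nr].addr = rdms[i].start;
        map[nr].size = rdms[i].size;
        map[nr].type = SKETCH_E820_RESERVED;
        nr++;
    }

    /* High memory, if any, starts at 4G. */
    if (highmem_size) {
        map[nr].addr = FOUR_GIB;
        map[nr].size = highmem_size;
        map[nr].type = SKETCH_E820_RAM;
        nr++;
    }

    return nr;
}
```

The entries come out in the order low RAM, RDM, high RAM; the sketch does
not sort them by address.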

> +{
> +    libxl_ctx *ctx = libxl__gc_owner(gc);

Use CTX.

> +    unsigned int nr = 0, i;
> +    /* We always own at least one lowmem entry. */
> +    unsigned int e820_entries = 1;
> +    uint64_t highmem_end = 0, highmem_size = args->mem_size - args->lowmem_size;
> +    struct e820entry *e820 = NULL;
> +
> +    /* Add all rdm entries. */
> +    e820_entries += d_config->num_rdms;
> +
> +    /* If we should have a highmem range. */
> +    if (highmem_size)
> +    {
> +        highmem_end = (1ull<<32) + highmem_size;
> +        e820_entries++;
> +    }
> +
> +    if (e820_entries >= E820MAX) {
> +        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
> +        return -1;
> +    }
> +
> +    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
> +
> +    /* Low memory */
> +    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
> +    e820[nr].size = args->lowmem_size - GUEST_LOW_MEM_START_DEFAULT;
> +    e820[nr].type = E820_RAM;
> +    nr++;
> +
> +    /* RDM mapping */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        /*
> +         * We should drop this kind of rdm entry.
> +         */

This comment is not useful.

> +        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
> +            continue;
> +
> +        e820[nr].addr = d_config->rdms[i].start;
> +        e820[nr].size = d_config->rdms[i].size;
> +        e820[nr].type = E820_RESERVED;
> +        nr++;
> +    }
> +
> +    /* High memory */
> +    if (highmem_size) {
> +        e820[nr].addr = ((uint64_t)1 << 32);
> +        e820[nr].size = highmem_size;
> +        e820[nr].type = E820_RAM;
> +    }
> +
> +    if (xc_domain_set_memory_map(ctx->xch, domid, e820, e820_entries) != 0)
> +        return -1;
> +
> +    return 0;
> +}
> +
>  int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>                libxl_domain_config *d_config,
>                libxl__domain_build_state *state)
> @@ -1016,6 +1097,12 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>          ret = set_vnuma_info(gc, domid, info, state);
>          if (ret) goto out;
>      }
> +
> +    if (libxl__domain_construct_memmap(gc, d_config, domid, &args)) {
> +        LOG(ERROR, "setting domain rdm memory map failed");

The error message should not be RDM specific.

Wei.

> +        goto out;
> +    }
> +
>      ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
>                                 &state->store_mfn, state->console_port,
>                                 &state->console_mfn, state->store_domid,
> -- 
> 1.9.1


* Re: [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy
  2015-06-02 15:57   ` Wei Liu
@ 2015-06-03  1:35     ` Chen, Tiejun
  2015-06-07 11:06       ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-03  1:35 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

>> +=item B<rdm= "RDM_RESERVE_STRING" >
>
> Stray space before and after "RDM_RESERVE_STRING".

Sure,

=item B<rdm="RDM_RESERVE_STRING">

>
>> +
>> +(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
>> +which is necessary to enable robust device passthrough usage. One example of
>
> Delete "usage".

Okay.

>
>> +RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
>> +structure on x86 platform.
>> +
>> +B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
>> +
>> +=over 4
>> +
>> +=item B<KEY=VALUE>
>> +
>> +Possible B<KEY>s are:
>> +
>> +=over 4
>> +
>> +=item B<type="STRING">
>> +
>> +Currently we just have two types:
>> +
>> +"host" means all reserved device memory on this platform should be reserved
>> +in this VM's pfn space. This global RDM parameter allows user to specify
>
> PFN is Xen internal terminology. Do you mean "guest address space"? Note
> that the reader is system administrators who might not know / want to
> know Xen internals.

Sure.

>
>> +reserved regions explicitly. And using "host" to include all reserved regions
>> +reported on this platform which is good to handle hotplug scenario. In the
>> +future this parameter may be further extended to allow specifying random
>> +regions, e.g. even those belonging to another platform as a preparation
>
> Extending how? What's your envisaged syntax for those random regions?

We didn't go into details while discussing that design. Maybe we can do 
something like this,

rdm="type=host,reserve=strict,rdm_add=size[KMG][@offset[KMG]],size[KMG][@offset[KMG]],..."
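
To make the idea concrete, here is a minimal sketch of how one
"size[KMG][@offset[KMG]]" element could be parsed. Nothing like this
exists in the series yet: "rdm_add" is only a suggestion, and both helper
names below (parse_scaled, parse_rdm_region) are hypothetical. Overflow
from the suffix shift is not checked in this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a number with an optional K/M/G suffix. */
static int parse_scaled(const char **p, uint64_t *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(*p, &end, 0);
    if (end == *p || errno)
        return -1;                 /* no digits, or out of range */

    switch (*end) {
    case 'K': v <<= 10; end++; break;
    case 'M': v <<= 20; end++; break;
    case 'G': v <<= 30; end++; break;
    default: break;                /* no suffix: plain bytes */
    }

    *out = v;
    *p = end;
    return 0;
}

/* Hypothetical helper: parse "size[KMG][@offset[KMG]]"; offset defaults to 0. */
static int parse_rdm_region(const char *s, uint64_t *size, uint64_t *offset)
{
    *offset = 0;
    if (parse_scaled(&s, size))
        return -1;
    if (*s == '@') {
        s++;
        if (parse_scaled(&s, offset))
            return -1;
    }
    return *s == '\0' ? 0 : -1;    /* reject trailing garbage */
}
```

With this, "16M@2G" would give a 16 MiB region at offset 2 GiB. Note that
the syntax above reuses ',' both between KEY=VALUE pairs and between
regions, so a real parser would need a different separator or a list.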

> Should you want to reserve more, an array is more useful. Could you

Yeah.

> provide some examples?

But we may have an alternative approach here: I noticed some people are
trying to deliver patches that set RMRR regions from the Xen command
line. So I would also like to check that possibility before we step
forward.

>
>> +for live migration with passthrough devices.
>> +
>> +"none" means we have nothing to do all reserved regions and ignore all policies,
>> +so guest work as before.
>> +
>> +=over 4
>> +
>> +=item B<reserve="STRING">
>> +
>> +Conflict may be detected when reserving reserved device memory in gfn space.
>
> GFN is a Xen internal terminology. Maybe you should use "guest address
> space"?
>
> Nonetheless the terminology throughout this document should be
> consistent.

Sure, so I will do this,

s/pfn/guest address space/g

s/gfn/guest address space/g

>
>> +"strict" means an unsolved conflict leads to immediate VM crash, while
>> +"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
>> +is default.
>> +
>> +Note this may be overrided by another sub item, rdm_reserve, in pci device.
>> +
>
> "overridden by rdm_reserve option in PCI device configuration".

Okay.

>
>>   =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
>>
>>   Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
>> @@ -707,6 +750,20 @@ dom0 without confirmation.  Please use with care.
>>   D0-D3hot power management states for the PCI device. False (0) by
>>   default.
>>
>> +=item B<rdm_reserv="STRING">
>> +
>> +(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
>> +which is necessary to enable robust device passthrough usage. One example of
>
> Delete "usage".
>
>> +RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
>> +structure on x86 platform.
>> +
>> +Conflict may be detected when reserving reserved device memory in gfn space.
>> +"strict" means an unsolved conflict leads to immediate VM crash, while
>> +"relaxed" allows VM moving forward with a warning message thrown out. "strict"
>> +is default.
>> +
>
> Actually these two paragraphs are the same as before. You can just point
> readers to previous sections instead of copying them here.

So instead,

(HVM/x86 only) This is the same as the reserve option above but specific
to a given device, and "strict" is the default here.

>
>> +Note this would override global B<rdm> option.
>> +
>>   =back
>>
>>   =back
>> diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
>> index 9af0e99..7d63c47 100644
>> --- a/docs/misc/vtd.txt
>> +++ b/docs/misc/vtd.txt
>> @@ -111,6 +111,30 @@ in the config file:
>>   To override for a specific device:
>>   	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
>>
>> +RDM, 'reserved device memory', for PCI Device Passthrough
>> +---------------------------------------------------------
>> +
>> +There are some devices the BIOS controls, for e.g. USB devices to perform
>> +PS2 emulation. The regions of memory used for these devices are marked
>> +reserved in the e820 map. When we turn on DMA translation, DMA to those
>> +regions will fail. Hence BIOS uses RMRR to specify these regions along with
>> +devices that need to access these regions. OS is expected to setup
>> +identity mappings for these regions for these devices to access these regions.
>> +
>> +While creating a VM we should reserve them in advance, and avoid any conflicts.
>> +So we introduce user configurable parameters to specify RDM resource and
>> +according policies,
>> +
>> +To enable this globally, add "rdm" in the config file:
>> +
>> +    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
>> +
>> +Or just for a specific device:
>> +
>> +    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
>> +
>> +For all the options available to RDM, see xl.cfg(5).
>> +
>>
>>   Caveat on Conventional PCI Device Passthrough
>>   ---------------------------------------------
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index f0da7dc..d649ead 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -100,6 +100,12 @@ static int sched_params_valid(libxl__gc *gc,
>>       return 1;
>>   }
>>
>> +void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
>> +{
>> +    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_NONE;

Based on our previous discussion, I will initialize this first,

+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ], init_val = "LIBXL_RDM_RESERVE_TYPE_NONE")
+

and then I would remove this line, since right now we only have two
options, "none" or "host", and both are fine.

>> +    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
>
> No, not like this. You set everything back to none and relaxed even if
> it is set before this point.
>
> It should be
>      if (xxx == DEFAULT_SENTINEL_VALUE)
>          xxx = THE_DEFAULT_YOU_WANT;
>
> Have a look at libxl__device_nic_setdefault etc to get an idea
> how it works. Don't hesitate to ask if I'm not clear enough.

But indeed, here we should set rdm.reserve as you said,

+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+}
+
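
To spell out why the sentinel check matters, here is a small standalone
sketch. The enum and struct are stand-ins written for this mail, not the
IDL-generated libxl types, and the numeric values are assumptions.

```c
/* Stand-in for the IDL-generated enum; the values are assumptions. */
enum rdm_reserve_flag {
    RDM_RESERVE_FLAG_INVALID = -1,  /* sentinel: user did not set it */
    RDM_RESERVE_FLAG_STRICT  = 0,
    RDM_RESERVE_FLAG_RELAXED = 1,
};

/* Stand-in for the rdm part of libxl_domain_build_info. */
struct rdm_config {
    enum rdm_reserve_flag reserve;
};

/* Only fill in the default when the field still holds the sentinel. */
static void rdm_setdefault(struct rdm_config *rdm)
{
    if (rdm->reserve == RDM_RESERVE_FLAG_INVALID)
        rdm->reserve = RDM_RESERVE_FLAG_RELAXED;
}
```

A config that arrives with "strict" already set keeps it, and calling the
function twice is harmless, which is the property the unconditional
assignment in v2 did not have.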

>
>> +}
>> +
>>   int libxl__domain_build_info_setdefault(libxl__gc *gc,
>>                                           libxl_domain_build_info *b_info)
>>   {
>> @@ -410,6 +416,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>>                      libxl_domain_type_to_string(b_info->type));
>>           return ERROR_INVAL;
>>       }
>> +
>> +    libxl__rdm_setdefault(gc, b_info);
>>       return 0;
>>   }
>>
>> @@ -1439,6 +1447,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
>>       }
>>
>>       for (i = 0; i < d_config->num_pcidevs; i++) {
>> +        /*
>> +         * If the rdm global policy is 'force' we should override each device.
>> +         */
>
> "strict" not "force"

Right.

Thanks
Tiejun

>
> Wei.
>


* Re: [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM
  2015-06-02 16:29   ` Wei Liu
@ 2015-06-03  2:25     ` Chen, Tiejun
  2015-06-07 11:20       ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-03  2:25 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On 2015/6/3 0:29, Wei Liu wrote:
> On Fri, May 22, 2015 at 05:35:04PM +0800, Tiejun Chen wrote:
>> While building a VM, HVM domain builder provides struct hvm_info_table{}
>> to help hvmloader. Currently it includes two fields to construct guest
>> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
>> check them to fix any conflict with RAM.
>>
>> RMRR can reside in address space beyond 4G theoretically, but we never
>> see this in real world. So in order to avoid breaking highmem layout
>> we don't solve highmem conflict. Note this means highmem rmrr could still
>> be supported if no conflict.
>>
>> But in the case of lowmem, RMRR probably scatter the whole RAM space.
>> Especially multiple RMRR entries would worsen this to lead a complicated
>> memory layout. And then its hard to extend hvm_info_table{} to work
>> hvmloader out. So here we're trying to figure out a simple solution to
>> avoid breaking existing layout. So when a conflict occurs,
>>
>>      #1. Above a predefined boundary (default 2G)
>>          - move lowmem_end below reserved region to solve conflict;
>>
>>      #2. Below a predefined boundary (default 2G)
>>          - Check strict/relaxed policy.
>>          "strict" policy leads to fail libxl. Note when both policies
>>          are specified on a given region, 'strict' is always preferred.
>>          "relaxed" policy issue a warning message and also mask this entry INVALID
>>          to indicate we shouldn't expose this entry to hvmloader.
>>
>> Note this predefined boundary can be changes with the parameter
>> "rdm_mem_boundary" in .cfg file.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>
> It would be better you write down what you changed in this version after
> "---" marker.
>
> What we normally do is
>
>
> libxl: implement FOO
>
> FOO is needed because ...
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
> changes in vN:
>   * bar -> baz
>   * more comments
> ---
>
> The stuff between two "---" will be automatically discarded when
> committing.

I knew about this rule.

Actually I already mentioned this change in patch #00,

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
   introduce a parameter, "rdm_mem_boundary", to set this threshold value.
...

So I didn't explain this again in the individual patch; sorry for the inconvenience.

>
>>   docs/man/xl.cfg.pod.5          |  21 ++++
>>   tools/libxc/include/xenguest.h |   1 +
>>   tools/libxc/xc_hvm_build_x86.c |  25 ++--
>>   tools/libxl/libxl_create.c     |   2 +-
>>   tools/libxl/libxl_dm.c         | 253 +++++++++++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_dom.c        |  27 ++++-
>>   tools/libxl/libxl_internal.h   |  11 +-
>>   tools/libxl/libxl_types.idl    |   8 ++
>>   tools/libxl/xl_cmdimpl.c       |   3 +
>>   9 files changed, 337 insertions(+), 14 deletions(-)
>>
>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>> index 12c34c4..80e3930 100644
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -764,6 +764,27 @@ is default.
>>
>>   Note this would override global B<rdm> option.
>>
>> +=item B<rdm_mem_boundary=MBYTES>
>> +
>> +Number of megabytes to set a boundary for checking rdm conflict.
>> +
>> +When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
>> +Especially multiple RMRR entries would worsen this to lead a complicated
>> +memory layout. So here we're trying to figure out a simple solution to
>> +avoid breaking existing layout. So when a conflict occurs,
>> +
>> +    #1. Above a predefined boundary
>> +        - move lowmem_end below reserved region to solve conflict;
>> +
>> +    #2. Below a predefined boundary
>> +        - Check strict/relaxed policy.
>> +        "strict" policy leads to fail libxl. Note when both policies
>> +        are specified on a given region, 'strict' is always preferred.
>> +        "relaxed" policy issue a warning message and also mask this entry INVALID
>> +        to indicate we shouldn't expose this entry to hvmloader.
>> +
>> +Her the default is 2G.
>
> Typo "her".

s/Her/Here/

>
> I get the idea. I will leave grammar / syntax check to native speakers.

Sure :)

>
>> +
>>   =back
>>
>>   =back
>> diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
>> index 7581263..4cb7e9f 100644
>> --- a/tools/libxc/include/xenguest.h
>> +++ b/tools/libxc/include/xenguest.h
>> @@ -234,6 +234,7 @@ struct xc_hvm_firmware_module {
>>   };
>>
>>   struct xc_hvm_build_args {
>> +    uint64_t lowmem_size;        /* All low memory size in bytes. */
>
> You might find this value unnecessary with my patch to consolidate
> memory layout generation in libxl?

I also noticed this in your patch, and as I replied on the list, I will
rebase my patches once yours is acked. So yes, this field should go away
once you introduce "lowmem_end".

>
>>       uint64_t mem_size;           /* Memory size in bytes. */
>>       uint64_t mem_target;         /* Memory target in bytes. */
>>       uint64_t mmio_size;          /* Size of the MMIO hole in bytes. */
>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>> index e45ae4a..9a1567a 100644
>> --- a/tools/libxc/xc_hvm_build_x86.c
>> +++ b/tools/libxc/xc_hvm_build_x86.c
>> @@ -21,6 +21,7 @@
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>   #include <zlib.h>
>> +#include <assert.h>
>>
>>   #include "xg_private.h"
>>   #include "xc_private.h"
>> @@ -98,11 +99,8 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>>       uint8_t sum;
>>       int i;
>>
>> -    if ( lowmem_end > mmio_start )
>> -    {
>> -        highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
>> -        lowmem_end = mmio_start;
>> -    }
>> +    if ( args->mem_size > lowmem_end )
>> +        highmem_end = (1ull<<32) + (args->mem_size - lowmem_end);
>>
>>       memset(hvm_info_page, 0, PAGE_SIZE);
>>
>> @@ -279,7 +277,7 @@ static int setup_guest(xc_interface *xch,
>>
>>       elf_parse_binary(&elf);
>>       v_start = 0;
>> -    v_end = args->mem_size;
>> +    v_end = args->lowmem_size;
>>
>>       if ( nr_pages > target_pages )
>>           memflags |= XENMEMF_populate_on_demand;
>> @@ -344,8 +342,14 @@ static int setup_guest(xc_interface *xch,
>>
>>       for ( i = 0; i < nr_pages; i++ )
>>           page_array[i] = i;
>> -    for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
>> -        page_array[i] += mmio_size >> PAGE_SHIFT;
>> +    /*
>> +     * Actually v_end is args->lowmem_size, and we already adjusted
>> +     * this below mmio_start when we check rdm previously, so here
>> +     * this condition 'v_end <= mmio_start' is always true.
>> +     */
>> +    assert(v_end <= mmio_start);
>> +    for ( i = v_end >> PAGE_SHIFT; i < nr_pages; i++ )
>> +        page_array[i] += ((1ull << 32) - v_end) >> PAGE_SHIFT;
>>
>>       /*
>>        * Try to claim pages for early warning of insufficient memory available.
>> @@ -664,9 +668,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
>>       if ( args.mem_target == 0 )
>>           args.mem_target = args.mem_size;
>>
>> -    if ( args.mmio_size == 0 )
>> -        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>> -
>>       /* An HVM guest must be initialised with at least 2MB memory. */
>>       if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
>>           return -1;
>> @@ -713,6 +714,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>>       args.mem_size = (uint64_t)memsize << 20;
>>       args.mem_target = (uint64_t)target << 20;
>>       args.image_file_name = image_name;
>> +    if ( args.mmio_size == 0 )
>> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>>
>
> The above hunk can be simplified with my patch mentioned above.

Yeah.

Actually I already finalized one local revision in accordance with yours :)

>
>>       return xc_hvm_build(xch, domid, &args);
>>   }
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index d649ead..a782860 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
>>
>>       switch (info->type) {
>>       case LIBXL_DOMAIN_TYPE_HVM:
>> -        ret = libxl__build_hvm(gc, domid, info, state);
>> +        ret = libxl__build_hvm(gc, domid, d_config, state);
>>           if (ret)
>>               goto out;
>>
>> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>> index 0c6408d..85e5317 100644
>> --- a/tools/libxl/libxl_dm.c
>> +++ b/tools/libxl/libxl_dm.c
>> @@ -90,6 +90,259 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>>       return dm;
>>   }
>>
>> +static struct xen_reserved_device_memory
>> +*xc_device_get_rdm(libxl__gc *gc,
>> +                   uint32_t flag,
>> +                   uint16_t seg,
>> +                   uint8_t bus,
>> +                   uint8_t devfn,
>> +                   unsigned int *nr_entries)
>> +{
>> +    struct xen_reserved_device_memory *xrdm = NULL;
>> +    int rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>> +                                           xrdm, nr_entries);
>> +
>
> Please separate declaration and function call. Also change xrdm to NULL

Are you saying this?

     struct xen_reserved_device_memory *xrdm = NULL;
     int rc;

     rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
                                        xrdm, nr_entries);

> in that function call.

Sorry, what do you mean by this point? Or do you want me to change xrdm to
NULL inside xc_reserved_device_memory_map()?

>
>> +    assert( rc <= 0 );
>> +    /* "0" means we have no any rdm entry. */
>> +    if ( !rc )
>> +        goto out;
>
> Also set *nr_entries = 0; otherwise you can't distinguish error vs 0
> entries.

*nr_entries is always updated by xc_reserved_device_memory_map() above.

>
>> +
>> +    if ( errno == ENOBUFS )
>> +    {
>> +        if ( (xrdm = malloc(*nr_entries *
>> +                            sizeof(xen_reserved_device_memory_t))) == NULL )
>
> Move xrdm = malloc out of "if".

Okay.
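
Putting these comments together, the query could look roughly like the
sketch below. It is standalone: mock_rdm_map() is a stand-in written for
this mail that follows the convention the patch relies on (fail with
ENOBUFS and report the needed count when the buffer is too small), and
the table contents are made up. Passing a literal NULL on the first call
is one reading of the "change xrdm to NULL" comment.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct rdm_entry {
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/* Mock of the size-query convention; the entries are invented. */
static int mock_rdm_map(struct rdm_entry *buf, unsigned int *nr_entries)
{
    static const struct rdm_entry table[] = {
        { 0xab000, 0x20 },
        { 0xad800, 0x10 },
    };
    const unsigned int total = sizeof(table) / sizeof(table[0]);
    unsigned int i;

    if (*nr_entries < total) {
        *nr_entries = total;       /* tell the caller how many are needed */
        errno = ENOBUFS;
        return -1;
    }
    for (i = 0; i < total; i++)
        buf[i] = table[i];
    *nr_entries = total;
    return 0;
}

/* Returns a malloc'ed array, or NULL with *nr_entries == 0. */
static struct rdm_entry *get_rdm(unsigned int *nr_entries)
{
    struct rdm_entry *xrdm;
    int rc;

    *nr_entries = 0;
    rc = mock_rdm_map(NULL, nr_entries);   /* size query, no buffer */
    if (rc == 0)
        return NULL;                       /* success, zero entries */
    if (errno != ENOBUFS) {
        *nr_entries = 0;
        return NULL;
    }

    xrdm = malloc(*nr_entries * sizeof(*xrdm));
    if (xrdm == NULL) {
        *nr_entries = 0;
        return NULL;
    }

    rc = mock_rdm_map(xrdm, nr_entries);
    if (rc) {
        free(xrdm);
        *nr_entries = 0;
        return NULL;
    }
    return xrdm;
}
```

Here every failure path leaves *nr_entries at 0, so the caller does not
depend on what the underlying call wrote there.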

>
>> +        {
>> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
>> +            goto out;
>> +        }
>> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>> +                                           xrdm, nr_entries);
>> +        if ( rc )
>> +        {
>> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
>> +            *nr_entries = 0;
>> +            free(xrdm);
>> +            xrdm = NULL;
>> +        }
>> +    }
>> +    else
>> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
>> +
>> + out:
>> +    return xrdm;
>> +}
>> +
>> +/*
>> + * Check whether there exists rdm hole in the specified memory range.
>> + * Returns true if exists, else returns false.
>> + */
>> +static bool overlaps_rdm(uint64_t start, uint64_t memsize,
>> +                         uint64_t rdm_start, uint64_t rdm_size)
>> +{
>> +    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
>> +}
>> +
>> +/*
>> + * Check reported RDM regions and handle potential gfn conflicts according
>> + * to user preferred policy.
>> + *
>> + * RMRR can reside in address space beyond 4G theoretically, but we never
>> + * see this in real world. So in order to avoid breaking highmem layout
>> + * we don't solve highmem conflict. Note this means highmem rmrr could still
>> + * be supported if no conflict.
>> + *
>> + * But in the case of lowmem, RMRR probably scatter the whole RAM space.
>> + * Especially multiple RMRR entries would worsen this to lead a complicated
>> + * memory layout. And then its hard to extend hvm_info_table{} to work
>> + * hvmloader out. So here we're trying to figure out a simple solution to
>> + * avoid breaking existing layout. So when a conflict occurs,
>> + *
>> + * #1. Above a predefined boundary (default 2G)
>> + * - Move lowmem_end below reserved region to solve conflict;
>> + *
>> + * #2. Below a predefined boundary (default 2G)
>> + * - Check strict/relaxed policy.
>> + * "strict" policy leads to fail libxl. Note when both policies
>> + * are specified on a given region, 'strict' is always preferred.
>> + * "relaxed" policy issue a warning message and also mask this entry
>> + * INVALID to indicate we shouldn't expose this entry to hvmloader.
>> + */
>> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
>> +                                       libxl_domain_config *d_config,
>> +                                       uint64_t rdm_mem_boundary,
>> +                                       struct xc_hvm_build_args *args)
>> +{
>> +    int i, j, conflict;
>> +    struct xen_reserved_device_memory *xrdm = NULL;
>> +    uint64_t rdm_start, rdm_size, highmem_end = (1ULL << 32);
>> +    uint32_t type = d_config->b_info.rdm.type;
>> +    uint16_t seg;
>> +    uint8_t bus, devfn;
>> +
>> +    /* Fix highmem. */
>> +    highmem_end += (args->mem_size - args->lowmem_size);
>> +
>> +    /* Might not expose rdm. */
>> +    if (type == LIBXL_RDM_RESERVE_TYPE_NONE)
>> +        return 0;
>> +
>> +    /* Query all RDM entries in this platform */
>> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
>> +        unsigned int nr_entries;
>> +
>> +        /* Collect all rdm info if exist. */
>> +        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
>> +                                 0, 0, 0, &nr_entries);
>> +        if (!nr_entries)
>> +            return 0;
>> +
>> +        d_config->num_rdms = nr_entries;
>> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +        for (i = 0; i < d_config->num_rdms; i++) {
>> +            d_config->rdms[i].start =
>> +                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
>> +            d_config->rdms[i].size =
>> +                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
>> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
>> +        }
>> +
>> +        free(xrdm);
>> +    } else
>> +        d_config->num_rdms = 0;
>> +
>> +    /* Query RDM entries per-device */
>> +    for (i = 0; i < d_config->num_pcidevs; i++) {
>> +        unsigned int nr_entries;
>> +
>
> Stray blank line.

Okay.

>
>> +        bool new = true;
>
> Need blank line here.
>

Okay.

>> +        seg = d_config->pcidevs[i].domain;
>> +        bus = d_config->pcidevs[i].bus;
>> +        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
>> +        nr_entries = 0;
>> +        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
>> +                                 seg, bus, devfn, &nr_entries);
>
> You didn't check xrdm != NULL;
>
>> +        /* No RDM to associated with this device. */
>> +        if (!nr_entries)

This combination of (!xrdm) and (nr_entries) should never happen, but I
think you're right that we should still check it to avoid any potential
risk. So I guess we just need to introduce this line,

	assert(xrdm);

>> +            continue;
>> +
>> +        /*
>> +         * Need to check whether this entry is already saved in the array.
>> +         * This could come from two cases:
>> +         *
>> +         *   - user may configure to get all RMRRs in this platform, which
>> +         *   is already queried before this point
>> +         *   - or two assigned devices may share one RMRR entry
>> +         *
>> +         * different policies may be configured on the same RMRR due to above
>> +         * two cases. We choose a simple policy to always favor stricter policy
>> +         */
>> +        for (j = 0; j < d_config->num_rdms; j++) {
>> +            if (d_config->rdms[j].start ==
>> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
>> +             {
>> +                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
>> +                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
>> +                new = false;
>> +                break;
>> +            }
>> +        }
>> +
>> +        if (new) {
>> +            d_config->num_rdms++;
>> +            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +            /* This is a new entry. */
>
> Delete this comment.

Okay.

>
>> +            d_config->rdms[d_config->num_rdms].start =
>> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
>> +            d_config->rdms[d_config->num_rdms].size =
>> +                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
>> +            d_config->rdms[d_config->num_rdms].flag = d_config->pcidevs[i].rdm_reserve;
>
> Line too long.

Fixed.
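
One more note on this append path: if I read the quoted hunk correctly,
num_rdms is incremented before the new element is written at index
num_rdms, which would be one past the end of the reallocated array. A
minimal standalone sketch of the merge step with the write at the old
count follows. The types are stand-ins for this mail, and plain realloc()
is used instead of libxl__realloc().

```c
#include <stdint.h>
#include <stdlib.h>

enum rdm_flag { RDM_STRICT, RDM_RELAXED };

struct rdm {
    uint64_t start;
    uint64_t size;
    enum rdm_flag flag;
};

struct rdm_list {
    struct rdm *rdms;
    int num_rdms;
};

/* Returns 0 on success, -1 if the array could not be grown. */
static int rdm_merge(struct rdm_list *l, uint64_t start, uint64_t size,
                     enum rdm_flag flag)
{
    struct rdm *tmp;
    int j;

    for (j = 0; j < l->num_rdms; j++) {
        if (l->rdms[j].start == start) {
            /* Region already recorded: "strict" always wins. */
            if (l->rdms[j].flag != RDM_STRICT)
                l->rdms[j].flag = flag;
            return 0;
        }
    }

    tmp = realloc(l->rdms, (l->num_rdms + 1) * sizeof(*tmp));
    if (tmp == NULL)
        return -1;
    l->rdms = tmp;

    /* Write at the old count, then bump it. */
    l->rdms[l->num_rdms].start = start;
    l->rdms[l->num_rdms].size = size;
    l->rdms[l->num_rdms].flag = flag;
    l->num_rdms++;
    return 0;
}
```

Like the patch, this compares only the start address and looks at a
single entry per call, so a device reporting several regions would need
a loop around it.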

>
>> +        }
>> +        free(xrdm);
>> +    }
>> +
>> +    /*
>> +     * Next step is to check and avoid potential conflict between RDM entries
>> +     * and guest RAM. To avoid intrusive impact to existing memory layout
>> +     * {lowmem, mmio, highmem} which is passed around various function blocks,
>> +     * below conflicts are not handled which are rare and handling them would
>> +     * lead to a more scattered layout:
>> +     *  - RMRR in highmem area (>4G)
>> +     *  - RMRR lower than a defined memory boundary (e.g. 2G)
>> +     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
>> +     * end below reserved region to solve conflict.
>> +     *
>> +     * If a conflict is detected on a given RMRR entry, an error will be
>> +     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
>> +     * specified, this conflict is treated just as a warning, but we mark this
>> +     * RMRR entry as INVALID to indicate that this entry shouldn't be exposed
>> +     * to hvmloader.
>> +     *
>> +     * Firstly we should check the case of rdm < 4G because we may need to
>> +     * expand highmem_end.
>> +     */
>> +    for (i = 0; i < d_config->num_rdms; i++) {
>> +        rdm_start = d_config->rdms[i].start;
>> +        rdm_size = d_config->rdms[i].size;
>> +        conflict = overlaps_rdm(0, args->lowmem_size, rdm_start, rdm_size);
>> +
>> +        if (!conflict)
>> +            continue;
>> +
>> +        /* Just check if RDM > our memory boundary. */
>> +        if (rdm_start > rdm_mem_boundary) {
>> +            /*
>> +             * We will move downwards lowmem_end so we have to expand
>> +             * highmem_end.
>> +             */
>> +            highmem_end += (args->lowmem_size - rdm_start);
>> +            /* Now move downwards lowmem_end. */
>> +            args->lowmem_size = rdm_start;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Finally we can take same policy to check lowmem(< 2G) and
>> +     * highmem adjusted above.
>> +     */
>> +    for (i = 0; i < d_config->num_rdms; i++) {
>> +        rdm_start = d_config->rdms[i].start;
>> +        rdm_size = d_config->rdms[i].size;
>> +        /* Does this entry conflict with lowmem? */
>> +        conflict = overlaps_rdm(0, args->lowmem_size,
>> +                                rdm_start, rdm_size);
>> +        /* Does this entry conflict with highmem? */
>> +        conflict |= overlaps_rdm((1ULL<<32),
>> +                                 highmem_end - (1ULL<<32),
>> +                                 rdm_start, rdm_size);
>> +
>> +        if (!conflict)
>> +            continue;
>> +
>> +        if(d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
>> +            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
>> +            goto out;
>> +        } else {
>> +            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
>> +                      d_config->rdms[i].start);
>> +
>> +            /*
>> +             * Then mask this INVALID to indicate we shouldn't expose this
>> +             * to hvmloader.
>> +             */
>> +            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +
>> + out:
>> +    return -1;
>
> Please return libxl error code.

ERROR_FAIL?
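
For reference while reviewing, the two passes above boil down to the
standalone sketch below. The types and names are stand-ins written for
this mail, the enum values are assumptions, and it returns -1 where the
real function would return a libxl error code.

```c
#include <stdbool.h>
#include <stdint.h>

enum rdm_policy { RDM_POLICY_INVALID = -1, RDM_POLICY_STRICT, RDM_POLICY_RELAXED };

struct rdm_region {
    uint64_t start;
    uint64_t size;
    enum rdm_policy policy;
};

struct mem_layout {
    uint64_t lowmem_end;   /* end of RAM below 4G */
    uint64_t highmem_end;  /* end of RAM above 4G; 4G means none */
};

static bool overlaps(uint64_t start, uint64_t size,
                     uint64_t rdm_start, uint64_t rdm_size)
{
    return start + size > rdm_start && start < rdm_start + rdm_size;
}

/* Returns 0 on success, -1 if a "strict" region still conflicts. */
static int resolve_rdm_conflicts(struct rdm_region *r, int n,
                                 uint64_t boundary, struct mem_layout *m)
{
    const uint64_t four_gb = 1ULL << 32;
    int i;

    /* Pass 1: above the boundary, move lowmem_end below the region
     * and grow highmem by the amount taken away. */
    for (i = 0; i < n; i++) {
        if (!overlaps(0, m->lowmem_end, r[i].start, r[i].size))
            continue;
        if (r[i].start > boundary) {
            m->highmem_end += m->lowmem_end - r[i].start;
            m->lowmem_end = r[i].start;
        }
    }

    /* Pass 2: whatever still conflicts is handled by policy. */
    for (i = 0; i < n; i++) {
        bool conflict = overlaps(0, m->lowmem_end, r[i].start, r[i].size) ||
                        overlaps(four_gb, m->highmem_end - four_gb,
                                 r[i].start, r[i].size);

        if (!conflict)
            continue;
        if (r[i].policy == RDM_POLICY_STRICT)
            return -1;
        r[i].policy = RDM_POLICY_INVALID;  /* do not expose to hvmloader */
    }
    return 0;
}
```

For example, with 3G of lowmem and a 2G boundary, a relaxed region at
0xB0000000 moves lowmem_end down to 0xB0000000 and pushes the displaced
256M above 4G, while a region at 1G is left to the strict/relaxed check.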

>
>
>> +}
>> +
>>   const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
>>   {
>>       const libxl_vnc_info *vnc = NULL;
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index a0c9850..84d5465 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -914,12 +914,14 @@ out:
>>   }
>>
>>   int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>> -              libxl_domain_build_info *info,
>> +              libxl_domain_config *d_config,
>>                 libxl__domain_build_state *state)
>>   {
>>       libxl_ctx *ctx = libxl__gc_owner(gc);
>>       struct xc_hvm_build_args args = {};
>>       int ret, rc = ERROR_FAIL;
>> +    libxl_domain_build_info *const info = &d_config->b_info;
>> +    uint64_t rdm_mem_boundary, mmio_start;
>>
>>       memset(&args, 0, sizeof(struct xc_hvm_build_args));
>>       /* The params from the configuration file are in Mb, which are then
>> @@ -928,6 +930,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>        * Do all this in one step here...
>>        */
>>       args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
>> +    args.lowmem_size = min((uint64_t)(1ULL << 32), args.mem_size);
>>       args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
>>       args.claim_enabled = libxl_defbool_val(info->claim_mode);
>>       if (info->u.hvm.mmio_hole_memkb) {
>> @@ -937,6 +940,28 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>           if (max_ram_below_4g < HVM_BELOW_4G_MMIO_START)
>>               args.mmio_size = info->u.hvm.mmio_hole_memkb << 10;
>>       }
>> +
>> +    if (args.mmio_size == 0)
>> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>> +    mmio_start = (1ull << 32) - args.mmio_size;
>> +
>> +    if (args.lowmem_size > mmio_start)
>> +        args.lowmem_size = mmio_start;
>> +
>> +    /*
>> +     * We'd like to set a memory boundary to determine if we need to check
>> +     * any overlap with reserved device memory.
>> +     */
>> +    rdm_mem_boundary = 0x80000000;
>> +    if (info->rdm_mem_boundary_memkb)
>

I'm going to update this chunk of code as follows:

#1. @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
  #define LIBXL_TIMER_MODE_DEFAULT -1
  #define LIBXL_MEMKB_DEFAULT ~0ULL

+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
  #define LIBXL_MS_VM_GENID_LEN 16
  typedef struct {
      uint8_t bytes[LIBXL_MS_VM_GENID_LEN];

> I think you mean info->rdm_mem_boundary_memkb != LIBXL_MEMKB_DEFAULT?
>

#2.

@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
  {
      if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
          b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+
+    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
  }

  int libxl__domain_build_info_setdefault(libxl__gc *gc,
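
As a quick check that the new default matches the old hard-coded
0x80000000, here is a small standalone sketch. The macro names are
shortened stand-ins for the ones above, and rdm_boundary_bytes() is a
hypothetical helper used only to show the KiB to bytes conversion.

```c
#include <stdint.h>

#define MEMKB_DEFAULT                  (~0ULL)        /* "unset" sentinel */
#define RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)  /* 2 GiB in KiB */

/* Hypothetical helper: apply the default, then convert KiB to bytes. */
static uint64_t rdm_boundary_bytes(uint64_t boundary_memkb)
{
    if (boundary_memkb == MEMKB_DEFAULT)
        boundary_memkb = RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
    return boundary_memkb << 10;
}
```

An unset value yields 0x80000000 bytes, i.e. the same 2G boundary as
before, and an explicit 1048576 KiB yields 0x40000000.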

Thanks
Tiejun


* Re: [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-02 16:36   ` Wei Liu
@ 2015-06-03  2:58     ` Chen, Tiejun
  2015-06-07 11:27       ` Wei Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-03  2:58 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On 2015/6/3 0:36, Wei Liu wrote:
> On Fri, May 22, 2015 at 05:35:08PM +0800, Tiejun Chen wrote:
>> This patch passes rdm reservation policy to xc_assign_device() so the policy
>> is checked when assigning devices to a VM.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxc/include/xenctrl.h       |  3 ++-
>>   tools/libxc/xc_domain.c             |  4 +++-
>>   tools/libxl/libxl_pci.c             | 11 ++++++++++-
>>   tools/libxl/xl_cmdimpl.c            | 23 +++++++++++++++++++----
>>   tools/libxl/xl_cmdtable.c           |  2 +-
>
> Where is document for the new options you added to xl pci commands?

Looks like I'm missing a description of something specific to pci-attach?

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 4eb929d..2ebfd54 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
  usable by Domain 0 again.  If the device is not bound to pciback, it will
  return success.

-=item B<pci-attach> I<domain-id> I<BDF>
+=item B<pci-attach> I<domain-id> I<BDF> I<rdm policy>

  Hot-plug a new pass-through pci device to the specified domain.
  B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
+B<rdm policy> is about how to handle conflict between reserving reserved device
+memory and guest address space. "strict" means an unsolved conflict leads to
+immediate VM crash, while "relaxed" allows VM moving forward with a warning
+message thrown out. Here "strict" is default.
+

  =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>


>
> BTW you might want to consider rearrange patches in this series so that

Yes, this is really what I intend to do.

> you keep the tree bisectable.

Overall, I can split this series into several parts:

#1. Introduce our policy configuration on tools side
#2. Interact with Hypervisor to get rdm info
#3. Implement our policy with rdm info on tool side
#4. Make hvmloader to align our policy

If you already see something obviously wrong, let me know.

>
>>   tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
>>   tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
>>   xen/drivers/passthrough/pci.c       |  3 ++-
>>   8 files changed, 70 insertions(+), 23 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 5f84a62..2a447b9 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>>   /* HVM guest pass-through */
>>   int xc_assign_device(xc_interface *xch,
>>                        uint32_t domid,
>> -                     uint32_t machine_sbdf);
>> +                     uint32_t machine_sbdf,
>> +                     uint32_t flag);
>>
>>   int xc_get_device_group(xc_interface *xch,
>>                        uint32_t domid,
>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
>> index c17a5a8..9761e5a 100644
>> --- a/tools/libxc/xc_domain.c
>> +++ b/tools/libxc/xc_domain.c
>> @@ -1704,7 +1704,8 @@ int xc_domain_setdebugging(xc_interface *xch,
>>   int xc_assign_device(
>>       xc_interface *xch,
>>       uint32_t domid,
>> -    uint32_t machine_sbdf)
>> +    uint32_t machine_sbdf,
>> +    uint32_t flag)
>>   {
>>       DECLARE_DOMCTL;
>>
>> @@ -1712,6 +1713,7 @@ int xc_assign_device(
>>       domctl.domain = domid;
>>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
>>       domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
>> +    domctl.u.assign_device.flag = flag;
>>
>>       return do_domctl(xch, &domctl);
>>   }
>> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
>> index 07e84f2..ac70edc 100644
>> --- a/tools/libxl/libxl_pci.c
>> +++ b/tools/libxl/libxl_pci.c
>> @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>>       FILE *f;
>>       unsigned long long start, end, flags, size;
>>       int irq, i, rc, hvm = 0;
>> +    uint32_t flag;
>>
>>       if (type == LIBXL_DOMAIN_TYPE_INVALID)
>>           return ERROR_FAIL;
>> @@ -987,7 +988,15 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>>
>>   out:
>>       if (!libxl_is_stubdom(ctx, domid, NULL)) {
>> -        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
>> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
>> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>> +        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
>> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
>> +        } else {
>> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unkwon rdm check flag.");
>
> unknown
>
> Couldn't continue reviewing because I don't know the expected behaviour.
> But the changes look mostly mechanical.
>

I want the assignment to fail in this case, so I will return ERROR_FAIL.
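
To make the intended behaviour concrete, here is a minimal standalone sketch of the mapping, assuming the unknown-policy case simply fails. The enum, the flag values and SK_ERROR_FAIL are hypothetical stand-ins for LIBXL_RDM_RESERVE_FLAG_*, XEN_DOMCTL_DEV_RDM_* and ERROR_FAIL; the real values are whatever the headers in this series define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libxl policy values used in the patch. */
enum sk_rdm_reserve {
    SK_RDM_RESERVE_INVALID = -1,
    SK_RDM_RESERVE_STRICT  = 0,
    SK_RDM_RESERVE_RELAXED = 1,
};

/* Hypothetical stand-ins for the domctl flags; the values are assumed. */
#define SK_DEV_RDM_STRICT   0u
#define SK_DEV_RDM_RELAXED  1u

/* Hypothetical stand-in for the libxl error code. */
#define SK_ERROR_FAIL (-3)

/*
 * Translate a per-device policy into the flag handed to the assign call.
 * Returns 0 and writes *flag on success; returns SK_ERROR_FAIL and leaves
 * *flag untouched when the policy is not one we know about.
 */
static int sk_rdm_policy_to_flag(int policy, uint32_t *flag)
{
    assert(flag != NULL);

    switch (policy) {
    case SK_RDM_RESERVE_STRICT:
        *flag = SK_DEV_RDM_STRICT;
        return 0;
    case SK_RDM_RESERVE_RELAXED:
        *flag = SK_DEV_RDM_RELAXED;
        return 0;
    default:
        /* Unknown policy: refuse the assignment instead of guessing. */
        return SK_ERROR_FAIL;
    }
}
```

With this shape the caller logs the error and bails out before the assign call is ever made with an undefined flag.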

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map
  2015-06-02 16:42   ` Wei Liu
@ 2015-06-03  3:06     ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-03  3:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On 2015/6/3 0:42, Wei Liu wrote:
> On Fri, May 22, 2015 at 05:35:10PM +0800, Tiejun Chen wrote:
>> Here we'll construct a basic guest e820 table via
>> XENMEM_set_memory_map. This table includes lowmem, highmem
>> and RDMs if they exist. And hvmloader would need this info
>> later.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxl/libxl_dom.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 87 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 84d5465..cc4b1a6 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -913,6 +913,87 @@ out:
>>       return rc;
>>   }
>>
>> +/*
>> + * Here we're just trying to set these kinds of e820 mappings:
>> + *
>> + * #1. Low memory region
>> + *
>> + * Low RAM starts at least from 1M to make sure all standard regions
>> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
>> + * have enough space.
>> + * Note: Those stuffs below 1M are still constructed with multiple
>> + * e820 entries by hvmloader. At this point we don't change anything.
>> + *
>> + * #2. RDM region if it exists
>> + *
>> + * #3. High memory region if it exists
>> + *
>> + * Note: these regions are not overlapping since we already check
>> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
>> + */
>> +#define GUEST_LOW_MEM_START_DEFAULT 0x100000
>> +static int libxl__domain_construct_memmap(libxl__gc *gc,
>> +                                          libxl_domain_config *d_config,
>> +                                          uint32_t domid,
>> +                                          struct xc_hvm_build_args *args)
>
> This is x86 specific. I think libxl__domain_construct_e820 is better
> name.

Okay.

>
>> +{
>> +    libxl_ctx *ctx = libxl__gc_owner(gc);
>
> Use CTX.

Sure.

>
>> +    unsigned int nr = 0, i;
>> +    /* We always own at least one lowmem entry. */
>> +    unsigned int e820_entries = 1;
>> +    uint64_t highmem_end = 0, highmem_size = args->mem_size - args->lowmem_size;
>> +    struct e820entry *e820 = NULL;
>> +
>> +    /* Add all rdm entries. */
>> +    e820_entries += d_config->num_rdms;
>> +
>> +    /* If we should have a highmem range. */
>> +    if (highmem_size)
>> +    {
>> +        highmem_end = (1ull<<32) + highmem_size;
>> +        e820_entries++;
>> +    }
>> +
>> +    if (e820_entries >= E820MAX) {
>> +        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
>> +        return -1;
>> +    }
>> +
>> +    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
>> +
>> +    /* Low memory */
>> +    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
>> +    e820[nr].size = args->lowmem_size - GUEST_LOW_MEM_START_DEFAULT;
>> +    e820[nr].type = E820_RAM;
>> +    nr++;
>> +
>> +    /* RDM mapping */
>> +    for (i = 0; i < d_config->num_rdms; i++) {
>> +        /*
>> +         * We should drop this kind of rdm entry.
>> +         */
>
> This comment is not useful.

Okay.

>
>> +        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
>> +            continue;
>> +
>> +        e820[nr].addr = d_config->rdms[i].start;
>> +        e820[nr].size = d_config->rdms[i].size;
>> +        e820[nr].type = E820_RESERVED;
>> +        nr++;
>> +    }
>> +
>> +    /* High memory */
>> +    if (highmem_size) {
>> +        e820[nr].addr = ((uint64_t)1 << 32);
>> +        e820[nr].size = highmem_size;
>> +        e820[nr].type = E820_RAM;
>> +    }
>> +
>> +    if (xc_domain_set_memory_map(ctx->xch, domid, e820, e820_entries) != 0)
>> +        return -1;
>> +
>> +    return 0;
>> +}
>> +
>>   int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>                 libxl_domain_config *d_config,
>>                 libxl__domain_build_state *state)
>> @@ -1016,6 +1097,12 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>           ret = set_vnuma_info(gc, domid, info, state);
>>           if (ret) goto out;
>>       }
>> +
>> +    if (libxl__domain_construct_memmap(gc, d_config, domid, &args)) {
>> +        LOG(ERROR, "setting domain rdm memory map failed");
>
> The error message should not be RDM specific.

Maybe we can simply drop "rdm" from the message.

Thanks
Tiejun

>
> Wei.
>
>> +        goto out;
>> +    }
>> +
>>       ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
>>                                  &state->store_mfn, state->console_port,
>>                                  &state->console_mfn, state->store_domid,
>> --
>> 1.9.1
>
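
One more note on the quoted function: the loop skips RDM entries flagged invalid, and the highmem branch does not advance nr, yet the hypercall is given e820_entries rather than the number of entries actually filled. A rough standalone sketch of the construction that counts only what it writes is below. The sk_ types and constants are hypothetical stand-ins for struct e820entry, E820_RAM and so on, and the "valid" field stands in for the flag check; it is only meant to illustrate the counting, not to replace the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_E820_RAM       1
#define SK_E820_RESERVED  2
#define SK_LOW_MEM_START  0x100000ULL   /* low RAM starts at 1M */

struct sk_e820entry { uint64_t addr, size; uint32_t type; };
struct sk_rdm       { uint64_t start, size; int valid; };

/*
 * Fill map[] (capacity max) with lowmem, every valid RDM, and highmem if
 * present. Returns the number of entries written, or -1 if they do not fit
 * or lowmem does not reach past 1M.
 */
static int sk_build_e820(struct sk_e820entry *map, unsigned int max,
                         uint64_t lowmem_size, uint64_t highmem_size,
                         const struct sk_rdm *rdms, unsigned int num_rdms)
{
    unsigned int nr = 0, i;

    assert(map != NULL);
    if (max == 0 || lowmem_size <= SK_LOW_MEM_START)
        return -1;

    /* Low memory */
    map[nr].addr = SK_LOW_MEM_START;
    map[nr].size = lowmem_size - SK_LOW_MEM_START;
    map[nr].type = SK_E820_RAM;
    nr++;

    /* RDM mapping: dropped entries do not consume a slot. */
    for (i = 0; i < num_rdms; i++) {
        if (!rdms[i].valid)
            continue;
        if (nr == max)
            return -1;
        map[nr].addr = rdms[i].start;
        map[nr].size = rdms[i].size;
        map[nr].type = SK_E820_RESERVED;
        nr++;
    }

    /* High memory */
    if (highmem_size) {
        if (nr == max)
            return -1;
        map[nr].addr = 1ULL << 32;
        map[nr].size = highmem_size;
        map[nr].type = SK_E820_RAM;
        nr++;
    }

    return (int)nr;
}
```

The caller would then pass the returned count to the set-memory-map call, so the hypervisor never sees an uninitialised slot.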

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy
  2015-06-03  1:35     ` Chen, Tiejun
@ 2015-06-07 11:06       ` Wei Liu
  2015-06-08  1:42         ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-07 11:06 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, Wei Liu, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On Wed, Jun 03, 2015 at 09:35:16AM +0800, Chen, Tiejun wrote:
[...]
> >
> >>+reserved regions explicitly. And using "host" to include all reserved regions
> >>+reported on this platform which is good to handle hotplug scenario. In the
> >>+future this parameter may be further extended to allow specifying random
> >>+regions, e.g. even those belonging to another platform as a preparation
> >
> >Extending how? What's your envisaged syntax for those random regions?
> 
> We didn't go into details while discussing that design. Maybe we can do
> something like this,
> 
> rdm="type=host,reserve=strict,rdm_add=size[KMG][@offset[KMG]],size[KMG][@offset[KMG]],..."
> 

This limits the extra regions to type host and strict policy. If that's
what you want then it's fine.

> >Should you want to reserve more, an array is more useful. Could you
> 
> Yeah.
> 
> >provide some examples?
> 
> But we may have alternative approach to this when I noticed some guys are
> trying to delivery some patches about setting rmrr region by xen
> commandline. So I also would like to check this likelihood when we can step
> forward.
> 

Makes sense.

> >
> >>+for live migration with passthrough devices.
> >>+
> >>+"none" means we have nothing to do all reserved regions and ignore all policies,
> >>+so guest work as before.

[...]

> >>diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> >>index 9af0e99..7d63c47 100644
> >>--- a/docs/misc/vtd.txt
> >>+++ b/docs/misc/vtd.txt
> >>@@ -111,6 +111,30 @@ in the config file:
> >>  To override for a specific device:
> >>  	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
> >>
> >>+RDM, 'reserved device memory', for PCI Device Passthrough
> >>+---------------------------------------------------------
> >>+
> >>+There are some devices the BIOS controls, for e.g. USB devices to perform
> >>+PS2 emulation. The regions of memory used for these devices are marked
> >>+reserved in the e820 map. When we turn on DMA translation, DMA to those
> >>+regions will fail. Hence BIOS uses RMRR to specify these regions along with
> >>+devices that need to access these regions. OS is expected to setup
> >>+identity mappings for these regions for these devices to access these regions.
> >>+
> >>+While creating a VM we should reserve them in advance, and avoid any conflicts.
> >>+So we introduce user configurable parameters to specify RDM resource and
> >>+according policies,
> >>+
> >>+To enable this globally, add "rdm" in the config file:
> >>+
> >>+    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
> >>+
> >>+Or just for a specific device:
> >>+
> >>+    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
> >>+
> >>+For all the options available to RDM, see xl.cfg(5).
> >>+
> >>
> >>  Caveat on Conventional PCI Device Passthrough
> >>  ---------------------------------------------
> >>diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> >>index f0da7dc..d649ead 100644
> >>--- a/tools/libxl/libxl_create.c
> >>+++ b/tools/libxl/libxl_create.c
> >>@@ -100,6 +100,12 @@ static int sched_params_valid(libxl__gc *gc,
> >>      return 1;
> >>  }
> >>
> >>+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
> >>+{
> >>+    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_NONE;
> 
> Based on our previous discussion, I will initial this firstly,
> 
> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
> +    (0, "none"),
> +    (1, "host"),
> +    ], init_val = "LIBXL_RDM_RESERVE_TYPE_NONE")
> +
> 
> and then, I would remove this line since right now we just own two options,
> "none" or "host". And both they're fine.
> 
> >>+    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
> >
> >No, not like this. You set everything back to none and relaxed even if
> >it is set before this point.
> >
> >It should be
> >     if (xxx == DEFAULT_SENTINEL_VALUE)
> >         xxx = THE_DEFAULT_YOU_WANT;
> >
> >Have a look at libxl__device_nic_setdefault etc to get an idea
> >how it works. Don't hesitate to ask if I'm not clear enough.
> 
> But indeed, here we should set rdm.reserve as you said,
> 
> +void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
> +{
> +    if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
> +        b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
> +}
> +
> 

Yes. This is fine.

Wei.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM
  2015-06-03  2:25     ` Chen, Tiejun
@ 2015-06-07 11:20       ` Wei Liu
  2015-06-08  2:16         ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-07 11:20 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, Wei Liu, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On Wed, Jun 03, 2015 at 10:25:47AM +0800, Chen, Tiejun wrote:
[...]
> >>+static struct xen_reserved_device_memory
> >>+*xc_device_get_rdm(libxl__gc *gc,
> >>+                   uint32_t flag,
> >>+                   uint16_t seg,
> >>+                   uint8_t bus,
> >>+                   uint8_t devfn,
> >>+                   unsigned int *nr_entries)
> >>+{
> >>+    struct xen_reserved_device_memory *xrdm = NULL;
> >>+    int rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> >>+                                           xrdm, nr_entries);
> >>+
> >
> >Please separate declaration and function call. Also change xrdm to NULL
> 
> Are you saying this?
> 
>     struct xen_reserved_device_memory *xrdm = NULL;
>     int rc;
> 
>     rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>                                        xrdm, nr_entries);

Yes, splitting "rc = " to a separate line. The other thing is: 

     rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
                                        NULL, nr_entries);
					^ here

It's mostly cosmetic. IMHO it is clearer than passing xrdm which is
always NULL in that function call.

> 
> >in that function call.
> 
> Sorry, what do you mean by this point? Or you let me to change xrdm to NULL
> inside xc_reserved_device_memory_map()?
> 
> >
> >>+    assert( rc <= 0 );
> >>+    /* "0" means we have no any rdm entry. */
> >>+    if ( !rc )
> >>+        goto out;
> >
> >Also set *nr_entries = 0; otherwise you can't distinguish error vs 0
> >entries.
> 
> *nr_entries is always updated by xc_reserved_device_memory_map() above.
> 

Actually no. If xc_hypercall_bounce_pre fails in the function,
nr_entries is untouched.
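
To spell out what I have in mind, here is a small standalone sketch of the "ask for the count, then fetch" pattern with *nr_entries zeroed up front. sk_query_rdm() is a made-up mock, not xc_reserved_device_memory_map(); its fail_early switch imitates a failure that happens before the count is written back. All sk_ names and SK_ERROR_FAIL are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_ERROR_FAIL (-3)   /* hypothetical stand-in for a libxl error code */

struct sk_rdm_entry { uint64_t start_pfn, nr_pages; };

/*
 * Mock query. With fail_early set it fails without touching *nr_entries.
 * Otherwise it reports `available` entries, failing with ENOBUFS when the
 * buffer is missing or too small.
 */
static int sk_query_rdm(struct sk_rdm_entry *buf, unsigned int *nr_entries,
                        unsigned int available, int fail_early)
{
    unsigned int i;

    if (fail_early) {
        errno = ENOMEM;
        return -1;
    }
    if (available == 0) {
        *nr_entries = 0;
        return 0;
    }
    if (buf == NULL || *nr_entries < available) {
        *nr_entries = available;
        errno = ENOBUFS;
        return -1;
    }
    for (i = 0; i < available; i++) {
        buf[i].start_pfn = 0x1000 + i;
        buf[i].nr_pages = 1;
    }
    *nr_entries = available;
    return 0;
}

/* Returns 0 on success (possibly with zero entries), SK_ERROR_FAIL otherwise. */
static int sk_get_rdm(struct sk_rdm_entry **out, unsigned int *nr_entries,
                      unsigned int available, int fail_early)
{
    struct sk_rdm_entry *buf;
    int rc;

    assert(out != NULL && nr_entries != NULL);
    *out = NULL;
    *nr_entries = 0;          /* well defined even if the query bails early */

    /* First call: no buffer, only learn how many entries exist. */
    rc = sk_query_rdm(NULL, nr_entries, available, fail_early);
    if (rc == 0)
        return 0;             /* no entries at all */
    if (errno != ENOBUFS) {
        *nr_entries = 0;
        return SK_ERROR_FAIL;
    }

    buf = calloc(*nr_entries, sizeof(*buf));
    if (buf == NULL) {
        *nr_entries = 0;
        return SK_ERROR_FAIL;
    }

    /* Second call: fetch the entries into the buffer we just sized. */
    rc = sk_query_rdm(buf, nr_entries, available, fail_early);
    if (rc != 0) {
        free(buf);
        *nr_entries = 0;
        return SK_ERROR_FAIL;
    }

    *out = buf;
    return 0;
}
```

With the count reset first, "0 entries" and "error" are told apart by the return value alone, and the caller never reads a stale count.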

> >
> >>+
> >>+    if ( errno == ENOBUFS )
> >>+    {
> >>+        if ( (xrdm = malloc(*nr_entries *
> >>+                            sizeof(xen_reserved_device_memory_t))) == NULL )
[...]
> >>+    return -1;
> >
> >Please return libxl error code.
> 
> ERROR_FAIL?
> 

Yes, that's fine.

[...]
> >>+        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
> >>+    mmio_start = (1ull << 32) - args.mmio_size;
> >>+
> >>+    if (args.lowmem_size > mmio_start)
> >>+        args.lowmem_size = mmio_start;
> >>+
> >>+    /*
> >>+     * We'd like to set a memory boundary to determine if we need to check
> >>+     * any overlap with reserved device memory.
> >>+     */
> >>+    rdm_mem_boundary = 0x80000000;
> >>+    if (info->rdm_mem_boundary_memkb)
> >
> 
> I'm going to update this chunk of codes as follows:
> 
> #1. @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool
> b);
>  #define LIBXL_TIMER_MODE_DEFAULT -1
>  #define LIBXL_MEMKB_DEFAULT ~0ULL
> 
> +/*
> + * We'd like to set a memory boundary to determine if we need to check
> + * any overlap with reserved device memory.
> + */
> +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
> +
>  #define LIBXL_MS_VM_GENID_LEN 16
>  typedef struct {
>      uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
> 
> >I think you mean info->rdm_mem_boundary_memkb != LIBXL_MEMKB_DEFAULT?
> >
> 
> #2.
> 
> @@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc,
> libxl_domain_build_info *b_info)
>  {
>      if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
>          b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
> +
> +    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
> +        b_info->rdm_mem_boundary_memkb =
> +                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;

This looks plausible.
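
For reference, the sentinel pattern applied to both fields can be sketched standalone as below. The struct and the sk_ constants are hypothetical stand-ins for libxl_domain_build_info, LIBXL_MEMKB_DEFAULT and friends; only the 2048 * 1024 KiB default and the "relaxed" default are taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libxl sentinels and defaults. */
#define SK_MEMKB_DEFAULT               (~0ULL)
#define SK_RDM_BOUNDARY_MEMKB_DEFAULT  (2048ULL * 1024)   /* 2G in KiB */

enum sk_rdm_reserve {
    SK_RESERVE_INVALID = -1,   /* "not set by the user" */
    SK_RESERVE_STRICT  = 0,
    SK_RESERVE_RELAXED = 1,
};

struct sk_build_info {
    int      rdm_reserve;
    uint64_t rdm_mem_boundary_memkb;
};

/* Fill in defaults only for fields still holding their sentinel value. */
static void sk_rdm_setdefault(struct sk_build_info *b)
{
    assert(b != NULL);

    if (b->rdm_reserve == SK_RESERVE_INVALID)
        b->rdm_reserve = SK_RESERVE_RELAXED;

    if (b->rdm_mem_boundary_memkb == SK_MEMKB_DEFAULT)
        b->rdm_mem_boundary_memkb = SK_RDM_BOUNDARY_MEMKB_DEFAULT;
}
```

The point is that the function can be called more than once without clobbering a value the user has already set, which is why the later check against the boundary no longer needs to special-case zero.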

Wei.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-03  2:58     ` Chen, Tiejun
@ 2015-06-07 11:27       ` Wei Liu
  2015-06-09  5:42         ` Chen, Tiejun
  0 siblings, 1 reply; 43+ messages in thread
From: Wei Liu @ 2015-06-07 11:27 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, Wei Liu, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On Wed, Jun 03, 2015 at 10:58:31AM +0800, Chen, Tiejun wrote:
> On 2015/6/3 0:36, Wei Liu wrote:
> >On Fri, May 22, 2015 at 05:35:08PM +0800, Tiejun Chen wrote:
> >>This patch passes rdm reservation policy to xc_assign_device() so the policy
> >>is checked when assigning devices to a VM.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> >>---
> >>  tools/libxc/include/xenctrl.h       |  3 ++-
> >>  tools/libxc/xc_domain.c             |  4 +++-
> >>  tools/libxl/libxl_pci.c             | 11 ++++++++++-
> >>  tools/libxl/xl_cmdimpl.c            | 23 +++++++++++++++++++----
> >>  tools/libxl/xl_cmdtable.c           |  2 +-
> >
> >Where is document for the new options you added to xl pci commands?
> 
> Looks I'm missing to describe something specific to pci-attach?
> 
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 4eb929d..2ebfd54 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its
> original driver, making it
>  usable by Domain 0 again.  If the device is not bound to pciback, it will
>  return success.
> 
> -=item B<pci-attach> I<domain-id> I<BDF>
> +=item B<pci-attach> I<domain-id> I<BDF> I<rdm policy>
> 

The way you put it here suggests that "rdm policy" is mandatory. I don't
think this is the case?

If it is not mandatory, write [I<rdm>].

>  Hot-plug a new pass-through pci device to the specified domain.
>  B<BDF> is the PCI Bus/Device/Function of the physical device to
> pass-through.
> +B<rdm policy> is about how to handle conflict between reserving reserved
> device
> +memory and guest address space. "strict" means an unsolved conflict leads
> to
> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
> +message thrown out. Here "strict" is default.
> +
> 
>  =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
> 
> 
> >
> >BTW you might want to consider rearrange patches in this series so that
> 
> Yes, this is really what I intend to do.
> 
> >you keep the tree bisectable.
> 
> Overall, I can separate this series as several parts,
> 
> #1. Introduce our policy configuration on tools side
> #2. Interact with Hypervisor to get rdm info
> #3. Implement our policy with rdm info on tool side
> #4. Make hvmloader to align our policy
> 
> If you already see something obviously wrong, let me know.
> 

I think all toolstack patches should come after hypervisor and hvmloader
patches. And then within toolstack patches, libxc patches should come
before libxl patches, libxl patches should come before xl patches.

The pattern is clear. Patches that are late in the series make use of
functionalities provided by early patches. Breaking this pattern is
definitely going to break bisection.

Wei.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy
  2015-06-07 11:06       ` Wei Liu
@ 2015-06-08  1:42         ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-08  1:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On 2015/6/7 19:06, Wei Liu wrote:
> On Wed, Jun 03, 2015 at 09:35:16AM +0800, Chen, Tiejun wrote:
> [...]
>>>
>>>> +reserved regions explicitly. And using "host" to include all reserved regions
>>>> +reported on this platform which is good to handle hotplug scenario. In the
>>>> +future this parameter may be further extended to allow specifying random
>>>> +regions, e.g. even those belonging to another platform as a preparation
>>>
>>> Extending how? What's your envisaged syntax for those random regions?
>>
>> We didn't go into details while discussing that design. Maybe we can do
>> something like this,
>>
>> rdm="type=host,reserve=strict,rdm_add=size[KMG][@offset[KMG]],size[KMG][@offset[KMG]],..."
>>
>
> This limits the extra regions to type host and strict policy. If that's
> what you want then it's fine.

The policy can still be changed with "reserve=". In any case, my point is
that the current format is easy to extend :) Hotplug is a really
complicated case, and we need to think more about it as a next step.

Thanks
Tiejun

>
>>> Should you want to reserve more, an array is more useful. Could you
>>
>> Yeah.
>>
>>> provide some examples?
>>
>> But we may have alternative approach to this when I noticed some guys are
>> trying to delivery some patches about setting rmrr region by xen
>> commandline. So I also would like to check this likelihood when we can step
>> forward.
>>
>
> Makes sense.
>
>>>
>>>> +for live migration with passthrough devices.
>>>> +
>>>> +"none" means we have nothing to do all reserved regions and ignore all policies,
>>>> +so guest work as before.
>
> [...]
>
>>>> diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
>>>> index 9af0e99..7d63c47 100644
>>>> --- a/docs/misc/vtd.txt
>>>> +++ b/docs/misc/vtd.txt
>>>> @@ -111,6 +111,30 @@ in the config file:
>>>>   To override for a specific device:
>>>>   	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
>>>>
>>>> +RDM, 'reserved device memory', for PCI Device Passthrough
>>>> +---------------------------------------------------------
>>>> +
>>>> +There are some devices the BIOS controls, for e.g. USB devices to perform
>>>> +PS2 emulation. The regions of memory used for these devices are marked
>>>> +reserved in the e820 map. When we turn on DMA translation, DMA to those
>>>> +regions will fail. Hence BIOS uses RMRR to specify these regions along with
>>>> +devices that need to access these regions. OS is expected to setup
>>>> +identity mappings for these regions for these devices to access these regions.
>>>> +
>>>> +While creating a VM we should reserve them in advance, and avoid any conflicts.
>>>> +So we introduce user configurable parameters to specify RDM resource and
>>>> +according policies,
>>>> +
>>>> +To enable this globally, add "rdm" in the config file:
>>>> +
>>>> +    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
>>>> +
>>>> +Or just for a specific device:
>>>> +
>>>> +    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
>>>> +
>>>> +For all the options available to RDM, see xl.cfg(5).
>>>> +
>>>>
>>>>   Caveat on Conventional PCI Device Passthrough
>>>>   ---------------------------------------------
>>>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>>>> index f0da7dc..d649ead 100644
>>>> --- a/tools/libxl/libxl_create.c
>>>> +++ b/tools/libxl/libxl_create.c
>>>> @@ -100,6 +100,12 @@ static int sched_params_valid(libxl__gc *gc,
>>>>       return 1;
>>>>   }
>>>>
>>>> +void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
>>>> +{
>>>> +    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_NONE;
>>
>> Based on our previous discussion, I will initial this firstly,
>>
>> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
>> +    (0, "none"),
>> +    (1, "host"),
>> +    ], init_val = "LIBXL_RDM_RESERVE_TYPE_NONE")
>> +
>>
>> and then, I would remove this line since right now we just own two options,
>> "none" or "host". And both they're fine.
>>
>>>> +    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
>>>
>>> No, not like this. You set everything back to none and relaxed even if
>>> it is set before this point.
>>>
>>> It should be
>>>      if (xxx == DEFAULT_SENTINEL_VALUE)
>>>          xxx = THE_DEFAULT_YOU_WANT;
>>>
>>> Have a look at libxl__device_nic_setdefault etc to get an idea
>>> how it works. Don't hesitate to ask if I'm not clear enough.
>>
>> But indeed, here we should set rdm.reserve as you said,
>>
>> +void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
>> +{
>> +    if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
>> +        b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
>> +}
>> +
>>
>
> Yes. This is fine.
>
> Wei.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM
  2015-06-07 11:20       ` Wei Liu
@ 2015-06-08  2:16         ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-08  2:16 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On 2015/6/7 19:20, Wei Liu wrote:
> On Wed, Jun 03, 2015 at 10:25:47AM +0800, Chen, Tiejun wrote:
> [...]
>>>> +static struct xen_reserved_device_memory
>>>> +*xc_device_get_rdm(libxl__gc *gc,
>>>> +                   uint32_t flag,
>>>> +                   uint16_t seg,
>>>> +                   uint8_t bus,
>>>> +                   uint8_t devfn,
>>>> +                   unsigned int *nr_entries)
>>>> +{
>>>> +    struct xen_reserved_device_memory *xrdm = NULL;
>>>> +    int rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>>>> +                                           xrdm, nr_entries);
>>>> +
>>>
>>> Please separate declaration and function call. Also change xrdm to NULL
>>
>> Are you saying this?
>>
>>      struct xen_reserved_device_memory *xrdm = NULL;
>>      int rc;
>>
>>      rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>>                                         xrdm, nr_entries);
>
> Yes, splitting "rc = " to a separate line. The other thing is:
>
>       rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>                                          NULL, nr_entries);
> 					^ here
>
> It's mostly cosmetic. IMHO it is clearer than passing xrdm which is
> always NULL in that function call.

Right.

>
>>
>>> in that function call.
>>
>> Sorry, what do you mean by this point? Or you let me to change xrdm to NULL
>> inside xc_reserved_device_memory_map()?
>>
>>>
>>>> +    assert( rc <= 0 );
>>>> +    /* "0" means we have no any rdm entry. */
>>>> +    if ( !rc )
>>>> +        goto out;
>>>
>>> Also set *nr_entries = 0; otherwise you can't distinguish error vs 0
>>> entries.
>>
>> *nr_entries is always updated by xc_reserved_device_memory_map() above.
>>
>
> Actually no. If xc_hypercall_bounce_pre fails in the function,
> nr_entries is untouched.

Sure.

>
>>>
>>>> +
>>>> +    if ( errno == ENOBUFS )
>>>> +    {
>>>> +        if ( (xrdm = malloc(*nr_entries *
>>>> +                            sizeof(xen_reserved_device_memory_t))) == NULL )
> [...]
>>>> +    return -1;
>>>
>>> Please return libxl error code.
>>
>> ERROR_FAIL?
>>
>
> Yes, that's fine.

Thanks
Tiejun

>
> [...]
>>>> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>>>> +    mmio_start = (1ull << 32) - args.mmio_size;
>>>> +
>>>> +    if (args.lowmem_size > mmio_start)
>>>> +        args.lowmem_size = mmio_start;
>>>> +
>>>> +    /*
>>>> +     * We'd like to set a memory boundary to determine if we need to check
>>>> +     * any overlap with reserved device memory.
>>>> +     */
>>>> +    rdm_mem_boundary = 0x80000000;
>>>> +    if (info->rdm_mem_boundary_memkb)
>>>
>>
>> I'm going to update this chunk of codes as follows:
>>
>> #1. @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool
>> b);
>>   #define LIBXL_TIMER_MODE_DEFAULT -1
>>   #define LIBXL_MEMKB_DEFAULT ~0ULL
>>
>> +/*
>> + * We'd like to set a memory boundary to determine if we need to check
>> + * any overlap with reserved device memory.
>> + */
>> +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
>> +
>>   #define LIBXL_MS_VM_GENID_LEN 16
>>   typedef struct {
>>       uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
>>
>>> I think you mean info->rdm_mem_boundary_memkb != LIBXL_MEMKB_DEFAULT?
>>>
>>
>> #2.
>>
>> @@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc,
>> libxl_domain_build_info *b_info)
>>   {
>>       if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
>>           b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
>> +
>> +    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
>> +        b_info->rdm_mem_boundary_memkb =
>> +                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
>
> This looks plausible.
>
> Wei.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-07 11:27       ` Wei Liu
@ 2015-06-09  5:42         ` Chen, Tiejun
  0 siblings, 0 replies; 43+ messages in thread
From: Chen, Tiejun @ 2015-06-09  5:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, JBeulich, yang.z.zhang, Ian.Jackson

On 2015/6/7 19:27, Wei Liu wrote:
> On Wed, Jun 03, 2015 at 10:58:31AM +0800, Chen, Tiejun wrote:
>> On 2015/6/3 0:36, Wei Liu wrote:
>>> On Fri, May 22, 2015 at 05:35:08PM +0800, Tiejun Chen wrote:
>>>> This patch passes rdm reservation policy to xc_assign_device() so the policy
>>>> is checked when assigning devices to a VM.
>>>>
>>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>>> ---
>>>>   tools/libxc/include/xenctrl.h       |  3 ++-
>>>>   tools/libxc/xc_domain.c             |  4 +++-
>>>>   tools/libxl/libxl_pci.c             | 11 ++++++++++-
>>>>   tools/libxl/xl_cmdimpl.c            | 23 +++++++++++++++++++----
>>>>   tools/libxl/xl_cmdtable.c           |  2 +-
>>>
>>> Where is document for the new options you added to xl pci commands?
>>
>> Looks I'm missing to describe something specific to pci-attach?
>>
>> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
>> index 4eb929d..2ebfd54 100644
>> --- a/docs/man/xl.pod.1
>> +++ b/docs/man/xl.pod.1
>> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>>   usable by Domain 0 again.  If the device is not bound to pciback, it will
>>   return success.
>>
>> -=item B<pci-attach> I<domain-id> I<BDF>
>> +=item B<pci-attach> I<domain-id> I<BDF> I<rdm policy>
>>
>
> The way you put it here suggests that "rdm policy" is mandatory. I don't
> think this is the case?
>
> If it is not mandatory, write [I<rdm>].

Yes, thanks for your correction.
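
To make the "optional" semantics concrete, here is a rough sketch of how an omitted policy argument could be handled, following the strict/relaxed wording in the man page text quoted further down in this message. It is not the xl implementation: the enum, the function name and the return convention are all hypothetical, and the choice of "strict" as the fallback simply mirrors the quoted documentation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values; the real libxl enum names differ. */
enum rdm_policy_sketch {
    RDM_POLICY_SKETCH_STRICT,
    RDM_POLICY_SKETCH_RELAXED
};

/* Parse the optional policy argument.
 * arg == NULL means the user omitted it, in which case we fall back to
 * "strict" as the quoted man page text describes.
 * Returns 0 on success, -1 if the string is not a known policy. */
static int parse_rdm_policy_sketch(const char *arg,
                                   enum rdm_policy_sketch *out)
{
    if (arg == NULL || strcmp(arg, "strict") == 0) {
        *out = RDM_POLICY_SKETCH_STRICT;
        return 0;
    }
    if (strcmp(arg, "relaxed") == 0) {
        *out = RDM_POLICY_SKETCH_RELAXED;
        return 0;
    }
    return -1; /* unknown policy string; *out is left untouched */
}
```

With this shape the command line stays backward compatible: existing "pci-attach domain-id BDF" invocations keep working and only pick up the default policy.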

>
>>   Hot-plug a new pass-through pci device to the specified domain.
>>   B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
>> +B<rdm policy> is about how to handle conflict between reserving reserved device
>> +memory and guest address space. "strict" means an unsolved conflict leads to
>> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
>> +message thrown out. Here "strict" is default.
>> +
>>
>>   =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
>>
>>
>>>
>>> BTW you might want to consider rearrange patches in this series so that
>>
>> Yes, this is really what I intend to do.
>>
>>> you keep the tree bisectable.
>>
>> Overall, I can separate this series as several parts,
>>
>> #1. Introduce our policy configuration on tools side
>> #2. Interact with Hypervisor to get rdm info
>> #3. Implement our policy with rdm info on tool side
>> #4. Make hvmloader to align our policy
>>
>> If you already see something obviously wrong, let me know.
>>
>
> I think all toolstack patches should come after hypervisor and hvmloader
> patches. And then within toolstack patches, libxc patches should come
> before libxl patches, libxl patches should come before xl patches.
>
> The pattern is clear. Patches that are late in the series make use of
> functionalities provided by early patches. Breaking this pattern is
> definitely going to break bisection.
>

I tried to rearrange these patches as follows:

#1. hypervisor
0001-xen-introduce-XENMEM_reserved_device_memory_map.patch
0002-xen-x86-p2m-introduce-set_identity_p2m_entry.patch
0003-xen-vtd-create-RMRR-mapping.patch
0004-xen-passthrough-extend-hypercall-to-support-rdm-rese.patch
0005-xen-enable-XENMEM_memory_map-in-hvm.patch
#2. hvmloader
0006-hvmloader-get-guest-memory-map-into-memory_map.patch
0007-hvmloader-pci-skip-reserved-ranges.patch
0008-hvmloader-e820-construct-guest-e820-table.patch
#3. tools/libxc
0009-tools-libxc-Expose-new-hypercall-xc_reserved_device_.patch
0010-tools-extend-xc_assign_device-to-support-rdm-reserva.patch
0011-tools-introduce-some-new-parameters-to-set-rdm-polic.patch
#4. tools/libxl
0012-tools-libxl-passes-rdm-reservation-policy.patch
0013-tools-libxl-detect-and-avoid-conflicts-with-RDM.patch
0014-tools-libxl-extend-XENMEM_set_memory_map.patch
#5. Misc
0015-xen-vtd-enable-USB-device-assignment.patch
0016-xen-vtd-prevent-from-assign-the-device-with-shared-r.patch

Thanks
Tiejun

Thread overview: 43+ messages
2015-05-22  9:35 [RFC][v2][PATCH 00/14] Fix RMRR Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 01/14] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-06-02 15:57   ` Wei Liu
2015-06-03  1:35     ` Chen, Tiejun
2015-06-07 11:06       ` Wei Liu
2015-06-08  1:42         ` Chen, Tiejun
2015-05-22  9:35 ` [RFC][v2][PATCH 02/14] introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 03/14] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 04/14] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-06-02 16:29   ` Wei Liu
2015-06-03  2:25     ` Chen, Tiejun
2015-06-07 11:20       ` Wei Liu
2015-06-08  2:16         ` Chen, Tiejun
2015-05-22  9:35 ` [RFC][v2][PATCH 05/14] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
2015-05-28 12:27   ` Jan Beulich
2015-05-29  1:19     ` Chen, Tiejun
2015-05-22  9:35 ` [RFC][v2][PATCH 06/14] xen:vtd: create RMRR mapping Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 07/14] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-05-22 10:33   ` Julien Grall
2015-05-25  2:09     ` Chen, Tiejun
2015-05-25 10:02       ` Julien Grall
2015-05-25 10:50         ` Chen, Tiejun
2015-05-25 11:42           ` Julien Grall
2015-05-26  0:42             ` Chen, Tiejun
2015-05-22  9:35 ` [RFC][v2][PATCH 08/14] tools: extend xc_assign_device() " Tiejun Chen
2015-06-02 16:36   ` Wei Liu
2015-06-03  2:58     ` Chen, Tiejun
2015-06-07 11:27       ` Wei Liu
2015-06-09  5:42         ` Chen, Tiejun
2015-05-22  9:35 ` [RFC][v2][PATCH 09/14] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 10/14] tools: extend XENMEM_set_memory_map Tiejun Chen
2015-05-22 10:25   ` Julien Grall
2015-05-25  2:00     ` Chen, Tiejun
2015-06-02 16:42   ` Wei Liu
2015-06-03  3:06     ` Chen, Tiejun
2015-05-22  9:35 ` [RFC][v2][PATCH 11/14] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 12/14] hvmloader/pci: skip reserved ranges Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 13/14] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-05-22  9:35 ` [RFC][v2][PATCH 14/14] xen/vtd: enable USB device assignment Tiejun Chen
2015-05-22  9:46 ` [RFC][v2][PATCH 00/14] Fix RMRR Jan Beulich
2015-05-28  5:48   ` Chen, Tiejun
2015-05-28  7:55     ` Jan Beulich
2015-05-29  7:58       ` Chen, Tiejun
