[Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends
@ 2017-11-27  4:35 Haozhong Zhang
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-27  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang,
	Eduardo Habkost, Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Michael S. Tsirkin, Xiao Guangrong

Previous versions can be found at
  v2: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01203.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05919.html

Changes in v3:
 * Add an option 'align' to 'memory-backend-file' to address the
   failure when mmap device dax (patch 1).
 * Remove device dax check, which needs to access sysfs and may not
   work with SELinux.
 * Add a boolean option 'unarmed' to '-device nvdimm', which allows
   users to control the unarmed flag in guest ACPI NFIT. I don't make
   it as OnOffAuto, because of the remove of device dax check.
 * Document new options added by this patch series.

Haozhong Zhang (3):
  hostmem-file: add "align" option
  nvdimm: add a macro for property "label-size"
  nvdimm: add 'unarmed' option

 backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 docs/nvdimm.txt         | 31 +++++++++++++++++++++++++++++++
 exec.c                  |  8 +++++++-
 hw/acpi/nvdimm.c        |  7 +++++++
 hw/mem/nvdimm.c         | 28 +++++++++++++++++++++++++++-
 include/exec/memory.h   |  3 +++
 include/hw/mem/nvdimm.h | 12 ++++++++++++
 memory.c                |  2 ++
 numa.c                  |  2 +-
 9 files changed, 130 insertions(+), 4 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option
  2017-11-27  4:35 [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Haozhong Zhang
@ 2017-11-27  4:35 ` Haozhong Zhang
  2017-11-28  1:07   ` Eduardo Habkost
                     ` (2 more replies)
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 2/3] nvdimm: add a macro for property "label-size" Haozhong Zhang
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-27  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang,
	Eduardo Habkost, Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson

When mmap(2) the backend files, QEMU uses the host page size
(getpagesize(2)) by default as the alignment of mapping address.
However, some backends may require alignments different than the page
size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
fails with a kernel message like

[617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)

Because there is no common approach to get such alignment requirement,
we add the 'align' option to 'memory-backend-file', so that users or
management utils, which have enough knowledge about the backend, can
specify a proper alignment via this option.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 docs/nvdimm.txt         | 16 ++++++++++++++++
 exec.c                  |  8 +++++++-
 include/exec/memory.h   |  3 +++
 memory.c                |  2 ++
 numa.c                  |  2 +-
 6 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e44c319915..e319ec1ad8 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
     bool share;
     bool discard_data;
     char *mem_path;
+    uint64_t align;
 };
 
 static void
@@ -58,7 +59,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         path = object_get_canonical_path(OBJECT(backend));
         memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  path,
-                                 backend->size, fb->share,
+                                 backend->size, fb->align, fb->share,
                                  fb->mem_path, errp);
         g_free(path);
     }
@@ -115,6 +116,40 @@ static void file_memory_backend_set_discard_data(Object *o, bool value,
     MEMORY_BACKEND_FILE(o)->discard_data = value;
 }
 
+static void file_memory_backend_get_align(Object *o, Visitor *v,
+                                          const char *name, void *opaque,
+                                          Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->align;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_align(Object *o, Visitor *v,
+                                          const char *name, void *opaque,
+                                          Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    Error *local_err = NULL;
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    visit_type_size(v, name, &val, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    fb->align = val;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
 static void file_backend_unparent(Object *obj)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -145,6 +180,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
     object_class_property_add_str(oc, "mem-path",
         get_mem_path, set_mem_path,
         &error_abort);
+    object_class_property_add(oc, "align", "int",
+        file_memory_backend_get_align,
+        file_memory_backend_set_align,
+        NULL, NULL, &error_abort);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 2d9f8c0e8c..21249dd062 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -122,3 +122,19 @@ Note:
      M >= size of RAM devices +
           size of statically plugged vNVDIMM devices +
           size of hotplugged vNVDIMM devices
+
+Alignment
+---------
+
+QEMU uses mmap(2) to maps vNVDIMM backends and aligns the mapping
+address to the page size (getpagesize(2)) by default. However, some
+types of backends may require an alignment different than the page
+size. In that case, QEMU v2.12.0 and later provide 'align' option to
+memory-backend-file to allow users to specify the proper alignment.
+
+For example, device dax require the 2 MB alignment, so we can use
+following QEMU command line options to use it (/dev/dax0.0) as the
+backend of vNVDIMM:
+
+ -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
+ -device nvdimm,id=nvdimm1,memdev=mem1
diff --git a/exec.c b/exec.c
index 03238a3449..90440efecd 100644
--- a/exec.c
+++ b/exec.c
@@ -1600,7 +1600,13 @@ static void *file_ram_alloc(RAMBlock *block,
     void *area;
 
     block->page_size = qemu_fd_getpagesize(fd);
-    block->mr->align = block->page_size;
+    if (block->mr->align % block->page_size) {
+        error_setg(errp, "aligment 0x%" PRIx64
+                   " must be multiples of page size 0x%" PRIx64,
+                   block->mr->align, block->page_size);
+        return NULL;
+    }
+    block->mr->align = MAX(block->page_size, block->mr->align);
 #if defined(__s390x__)
     if (kvm_enabled()) {
         block->mr->align = MAX(block->mr->align, QEMU_VMALLOC_ALIGN);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5ed4042f87..a1be8b06d3 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -465,6 +465,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @name: Region name, becomes part of RAMBlock name used in migration stream
  *        must be unique within any device
  * @size: size of the region.
+ * @align: alignment of the region base address; if 0, the default alignment
+ *         (getpagesize()) will be used.
  * @share: %true if memory must be mmaped with the MAP_SHARED flag
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
@@ -476,6 +478,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       struct Object *owner,
                                       const char *name,
                                       uint64_t size,
+                                      uint64_t align,
                                       bool share,
                                       const char *path,
                                       Error **errp);
diff --git a/memory.c b/memory.c
index e26e5a3b1d..2c80656e39 100644
--- a/memory.c
+++ b/memory.c
@@ -1570,6 +1570,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       struct Object *owner,
                                       const char *name,
                                       uint64_t size,
+                                      uint64_t align,
                                       bool share,
                                       const char *path,
                                       Error **errp)
@@ -1578,6 +1579,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
+    mr->align = align;
     mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
diff --git a/numa.c b/numa.c
index 7151b24d1c..b0fe22a60c 100644
--- a/numa.c
+++ b/numa.c
@@ -551,7 +551,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
     if (mem_path) {
 #ifdef __linux__
         Error *err = NULL;
-        memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
+        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
                                          mem_path, &err);
         if (err) {
             error_report_err(err);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 2/3] nvdimm: add a macro for property "label-size"
  2017-11-27  4:35 [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Haozhong Zhang
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
@ 2017-11-27  4:35 ` Haozhong Zhang
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option Haozhong Zhang
  2017-11-28 18:44 ` [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Michael S. Tsirkin
  3 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-27  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang,
	Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/mem/nvdimm.c         | 2 +-
 include/hw/mem/nvdimm.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..618c3d677b 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -66,7 +66,7 @@ out:
 
 static void nvdimm_init(Object *obj)
 {
-    object_property_add(obj, "label-size", "int",
+    object_property_add(obj, NVDIMM_LABLE_SIZE_PROP, "int",
                         nvdimm_get_label_size, nvdimm_set_label_size, NULL,
                         NULL, NULL);
 }
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 03e1ff9558..28e68ddf59 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -47,6 +47,9 @@
 #define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
 #define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
                                                TYPE_NVDIMM)
+
+#define NVDIMM_LABLE_SIZE_PROP "label-size"
+
 struct NVDIMMDevice {
     /* private */
     PCDIMMDevice parent_obj;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option
  2017-11-27  4:35 [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Haozhong Zhang
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 2/3] nvdimm: add a macro for property "label-size" Haozhong Zhang
@ 2017-11-27  4:35 ` Haozhong Zhang
  2017-12-01 10:44   ` Stefan Hajnoczi
  2017-11-28 18:44 ` [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Michael S. Tsirkin
  3 siblings, 1 reply; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-27  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Haozhong Zhang,
	Xiao Guangrong, Michael S. Tsirkin, Igor Mammedov

Currently the only vNVDIMM backend can guarantee the guest write
persistence is device DAX on Linux, because no host-side kernel cache
is involved in the guest access to it. The approach to detect whether
the backend is device DAX needs to access sysfs, which may not work
with SELinux.

Instead, we add the 'unarmed' option to device 'nvdimm', so that users
or management utils, which have enough knowledge about the backend,
can control the unarmed flag in guest ACPI NFIT via this option. The
guest Linux NVDIMM driver, for example, will mark the corresponding
vNVDIMM device read-only if the unarmed flag in guest NFIT is set.

The default value of 'unarmed' option is 'off' in order to keep the
backwards compatibility.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 docs/nvdimm.txt         | 15 +++++++++++++++
 hw/acpi/nvdimm.c        |  7 +++++++
 hw/mem/nvdimm.c         | 26 ++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h |  9 +++++++++
 4 files changed, 57 insertions(+)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 21249dd062..e903d8bb09 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -138,3 +138,18 @@ backend of vNVDIMM:
 
  -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
  -device nvdimm,id=nvdimm1,memdev=mem1
+
+Guest Data Persistence
+----------------------
+
+Though QEMU supports multiple types of vNVDIMM backends on Linux,
+currently the only one that can guarantee the guest write persistence
+is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
+which all guest access do not involve any host-side kernel cache.
+
+When using other types of backends, it's suggested to set 'unarmed'
+option of '-device nvdimm' to 'on', which sets the unarmed flag of the
+guest NVDIMM region mapping structure.  This unarmed flag indicates
+guest software that this vNVDIMM device contains a region that cannot
+accept persistent writes. In result, for example, the guest Linux
+NVDIMM driver, marks such vNVDIMM device as read-only.
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..e55ff2cd12 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -138,6 +138,8 @@ struct NvdimmNfitMemDev {
 } QEMU_PACKED;
 typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
 
+#define ACPI_NFIT_MEM_NOT_ARMED     (1 << 3)
+
 /*
  * NVDIMM Control Region Structure
  *
@@ -284,6 +286,7 @@ static void
 nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
 {
     NvdimmNfitMemDev *nfit_memdev;
+    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
     uint64_t size = object_property_get_uint(OBJECT(dev), PC_DIMM_SIZE_PROP,
                                              NULL);
     int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
@@ -312,6 +315,10 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
 
     /* Only one interleave for PMEM. */
     nfit_memdev->interleave_ways = cpu_to_le16(1);
+
+    if (nvdimm->unarmed) {
+        nfit_memdev->flags |= ACPI_NFIT_MEM_NOT_ARMED;
+    }
 }
 
 /*
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 618c3d677b..61e677f92f 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include "qapi-visit.h"
 #include "hw/mem/nvdimm.h"
 
 static void nvdimm_get_label_size(Object *obj, Visitor *v, const char *name,
@@ -64,11 +65,36 @@ out:
     error_propagate(errp, local_err);
 }
 
+static bool nvdimm_get_unarmed(Object *obj, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    return nvdimm->unarmed;
+}
+
+static void nvdimm_set_unarmed(Object *obj, bool value, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+    Error *local_err = NULL;
+
+    if (memory_region_size(&nvdimm->nvdimm_mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    nvdimm->unarmed = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
 static void nvdimm_init(Object *obj)
 {
     object_property_add(obj, NVDIMM_LABLE_SIZE_PROP, "int",
                         nvdimm_get_label_size, nvdimm_set_label_size, NULL,
                         NULL, NULL);
+    object_property_add_bool(obj, NVDIMM_UNARMED_PROP,
+                             nvdimm_get_unarmed, nvdimm_set_unarmed, NULL);
 }
 
 static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm, Error **errp)
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 28e68ddf59..7fd87c4e1c 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -49,6 +49,7 @@
                                                TYPE_NVDIMM)
 
 #define NVDIMM_LABLE_SIZE_PROP "label-size"
+#define NVDIMM_UNARMED_PROP    "unarmed"
 
 struct NVDIMMDevice {
     /* private */
@@ -74,6 +75,14 @@ struct NVDIMMDevice {
      * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
      */
     MemoryRegion nvdimm_mr;
+
+    /*
+     * The 'on' value results in the unarmed flag set in ACPI NFIT,
+     * which can be used to notify guest implicitly that the host
+     * backend (e.g., files on HDD, /dev/pmemX, etc.) cannot guarantee
+     * the guest write persistence.
+     */
+    bool unarmed;
 };
 typedef struct NVDIMMDevice NVDIMMDevice;
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
@ 2017-11-28  1:07   ` Eduardo Habkost
  2017-11-29  0:33     ` Haozhong Zhang
  2017-11-29  0:31   ` Haozhong Zhang
  2017-11-29  0:38   ` Haozhong Zhang
  2 siblings, 1 reply; 12+ messages in thread
From: Eduardo Habkost @ 2017-11-28  1:07 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Xiao Guangrong, Stefan Hajnoczi, Dan Williams,
	Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson

On Mon, Nov 27, 2017 at 12:35:15PM +0800, Haozhong Zhang wrote:
> When mmap(2) the backend files, QEMU uses the host page size
> (getpagesize(2)) by default as the alignment of mapping address.
> However, some backends may require alignments different than the page
> size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
> kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
> fails with a kernel message like
> 
> [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
> 
> Because there is no common approach to get such alignment requirement,
> we add the 'align' option to 'memory-backend-file', so that users or
> management utils, which have enough knowledge about the backend, can
> specify a proper alignment via this option.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

The new option needs to be documented on qemu-options.hx.

Except for that, the patch looks good to me.


> ---
>  backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  docs/nvdimm.txt         | 16 ++++++++++++++++
>  exec.c                  |  8 +++++++-
>  include/exec/memory.h   |  3 +++
>  memory.c                |  2 ++
>  numa.c                  |  2 +-
>  6 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index e44c319915..e319ec1ad8 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
>      bool share;
>      bool discard_data;
>      char *mem_path;
> +    uint64_t align;
>  };
>  
>  static void
> @@ -58,7 +59,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>          path = object_get_canonical_path(OBJECT(backend));
>          memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>                                   path,
> -                                 backend->size, fb->share,
> +                                 backend->size, fb->align, fb->share,
>                                   fb->mem_path, errp);
>          g_free(path);
>      }
> @@ -115,6 +116,40 @@ static void file_memory_backend_set_discard_data(Object *o, bool value,
>      MEMORY_BACKEND_FILE(o)->discard_data = value;
>  }
>  
> +static void file_memory_backend_get_align(Object *o, Visitor *v,
> +                                          const char *name, void *opaque,
> +                                          Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +    uint64_t val = fb->align;
> +
> +    visit_type_size(v, name, &val, errp);
> +}
> +
> +static void file_memory_backend_set_align(Object *o, Visitor *v,
> +                                          const char *name, void *opaque,
> +                                          Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +    Error *local_err = NULL;
> +    uint64_t val;
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(&local_err, "cannot change property value");
> +        goto out;
> +    }
> +
> +    visit_type_size(v, name, &val, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    fb->align = val;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -145,6 +180,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_str(oc, "mem-path",
>          get_mem_path, set_mem_path,
>          &error_abort);
> +    object_class_property_add(oc, "align", "int",
> +        file_memory_backend_get_align,
> +        file_memory_backend_set_align,
> +        NULL, NULL, &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 2d9f8c0e8c..21249dd062 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -122,3 +122,19 @@ Note:
>       M >= size of RAM devices +
>            size of statically plugged vNVDIMM devices +
>            size of hotplugged vNVDIMM devices
> +
> +Alignment
> +---------
> +
> +QEMU uses mmap(2) to maps vNVDIMM backends and aligns the mapping
> +address to the page size (getpagesize(2)) by default. However, some
> +types of backends may require an alignment different than the page
> +size. In that case, QEMU v2.12.0 and later provide 'align' option to
> +memory-backend-file to allow users to specify the proper alignment.
> +
> +For example, device dax require the 2 MB alignment, so we can use
> +following QEMU command line options to use it (/dev/dax0.0) as the
> +backend of vNVDIMM:
> +
> + -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
> + -device nvdimm,id=nvdimm1,memdev=mem1
> diff --git a/exec.c b/exec.c
> index 03238a3449..90440efecd 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1600,7 +1600,13 @@ static void *file_ram_alloc(RAMBlock *block,
>      void *area;
>  
>      block->page_size = qemu_fd_getpagesize(fd);
> -    block->mr->align = block->page_size;
> +    if (block->mr->align % block->page_size) {
> +        error_setg(errp, "aligment 0x%" PRIx64
> +                   " must be multiples of page size 0x%" PRIx64,
> +                   block->mr->align, block->page_size);
> +        return NULL;
> +    }
> +    block->mr->align = MAX(block->page_size, block->mr->align);
>  #if defined(__s390x__)
>      if (kvm_enabled()) {
>          block->mr->align = MAX(block->mr->align, QEMU_VMALLOC_ALIGN);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5ed4042f87..a1be8b06d3 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -465,6 +465,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
>   * @name: Region name, becomes part of RAMBlock name used in migration stream
>   *        must be unique within any device
>   * @size: size of the region.
> + * @align: alignment of the region base address; if 0, the default alignment
> + *         (getpagesize()) will be used.
>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
>   * @path: the path in which to allocate the RAM.
>   * @errp: pointer to Error*, to store an error if it happens.
> @@ -476,6 +478,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        struct Object *owner,
>                                        const char *name,
>                                        uint64_t size,
> +                                      uint64_t align,
>                                        bool share,
>                                        const char *path,
>                                        Error **errp);
> diff --git a/memory.c b/memory.c
> index e26e5a3b1d..2c80656e39 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1570,6 +1570,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        struct Object *owner,
>                                        const char *name,
>                                        uint64_t size,
> +                                      uint64_t align,
>                                        bool share,
>                                        const char *path,
>                                        Error **errp)
> @@ -1578,6 +1579,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>      mr->ram = true;
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
> +    mr->align = align;
>      mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
> diff --git a/numa.c b/numa.c
> index 7151b24d1c..b0fe22a60c 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -551,7 +551,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>      if (mem_path) {
>  #ifdef __linux__
>          Error *err = NULL;
> -        memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
> +        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
>                                           mem_path, &err);
>          if (err) {
>              error_report_err(err);
> -- 
> 2.14.1
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends
  2017-11-27  4:35 [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Haozhong Zhang
                   ` (2 preceding siblings ...)
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option Haozhong Zhang
@ 2017-11-28 18:44 ` Michael S. Tsirkin
  3 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2017-11-28 18:44 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Xiao Guangrong, Stefan Hajnoczi, Dan Williams,
	Eduardo Habkost, Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Xiao Guangrong

On Mon, Nov 27, 2017 at 12:35:14PM +0800, Haozhong Zhang wrote:
> Previous versions can be found at
>   v2: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg01203.html
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05919.html
> 
> Changes in v3:
>  * Add an option 'align' to 'memory-backend-file' to address the
>    failure when mmap device dax (patch 1).
>  * Remove device dax check, which needs to access sysfs and may not
>    work with SELinux.
>  * Add a boolean option 'unarmed' to '-device nvdimm', which allows
>    users to control the unarmed flag in guest ACPI NFIT. I don't make
>    it as OnOffAuto, because of the remove of device dax check.
>  * Document new options added by this patch series.

If you expect me to be the one to merge this, pls copy me
on all patches.


> Haozhong Zhang (3):
>   hostmem-file: add "align" option
>   nvdimm: add a macro for property "label-size"
>   nvdimm: add 'unarmed' option
> 
>  backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  docs/nvdimm.txt         | 31 +++++++++++++++++++++++++++++++
>  exec.c                  |  8 +++++++-
>  hw/acpi/nvdimm.c        |  7 +++++++
>  hw/mem/nvdimm.c         | 28 +++++++++++++++++++++++++++-
>  include/exec/memory.h   |  3 +++
>  include/hw/mem/nvdimm.h | 12 ++++++++++++
>  memory.c                |  2 ++
>  numa.c                  |  2 +-
>  9 files changed, 130 insertions(+), 4 deletions(-)
> 
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
  2017-11-28  1:07   ` Eduardo Habkost
@ 2017-11-29  0:31   ` Haozhong Zhang
  2017-11-29  0:38   ` Haozhong Zhang
  2 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-29  0:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Eduardo Habkost,
	Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Michael S. Tsirkin

copy mst

On 11/27/17 12:35 +0800, Haozhong Zhang wrote:
> When mmap(2) the backend files, QEMU uses the host page size
> (getpagesize(2)) by default as the alignment of mapping address.
> However, some backends may require alignments different than the page
> size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
> kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
> fails with a kernel message like
> 
> [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
> 
> Because there is no common approach to get such alignment requirement,
> we add the 'align' option to 'memory-backend-file', so that users or
> management utils, which have enough knowledge about the backend, can
> specify a proper alignment via this option.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  docs/nvdimm.txt         | 16 ++++++++++++++++
>  exec.c                  |  8 +++++++-
>  include/exec/memory.h   |  3 +++
>  memory.c                |  2 ++
>  numa.c                  |  2 +-
>  6 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index e44c319915..e319ec1ad8 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
>      bool share;
>      bool discard_data;
>      char *mem_path;
> +    uint64_t align;
>  };
>  
>  static void
> @@ -58,7 +59,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>          path = object_get_canonical_path(OBJECT(backend));
>          memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>                                   path,
> -                                 backend->size, fb->share,
> +                                 backend->size, fb->align, fb->share,
>                                   fb->mem_path, errp);
>          g_free(path);
>      }
> @@ -115,6 +116,40 @@ static void file_memory_backend_set_discard_data(Object *o, bool value,
>      MEMORY_BACKEND_FILE(o)->discard_data = value;
>  }
>  
> +static void file_memory_backend_get_align(Object *o, Visitor *v,
> +                                          const char *name, void *opaque,
> +                                          Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +    uint64_t val = fb->align;
> +
> +    visit_type_size(v, name, &val, errp);
> +}
> +
> +static void file_memory_backend_set_align(Object *o, Visitor *v,
> +                                          const char *name, void *opaque,
> +                                          Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +    Error *local_err = NULL;
> +    uint64_t val;
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(&local_err, "cannot change property value");
> +        goto out;
> +    }
> +
> +    visit_type_size(v, name, &val, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    fb->align = val;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -145,6 +180,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_str(oc, "mem-path",
>          get_mem_path, set_mem_path,
>          &error_abort);
> +    object_class_property_add(oc, "align", "int",
> +        file_memory_backend_get_align,
> +        file_memory_backend_set_align,
> +        NULL, NULL, &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 2d9f8c0e8c..21249dd062 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -122,3 +122,19 @@ Note:
>       M >= size of RAM devices +
>            size of statically plugged vNVDIMM devices +
>            size of hotplugged vNVDIMM devices
> +
> +Alignment
> +---------
> +
> +QEMU uses mmap(2) to maps vNVDIMM backends and aligns the mapping
> +address to the page size (getpagesize(2)) by default. However, some
> +types of backends may require an alignment different than the page
> +size. In that case, QEMU v2.12.0 and later provide 'align' option to
> +memory-backend-file to allow users to specify the proper alignment.
> +
> +For example, device dax require the 2 MB alignment, so we can use
> +following QEMU command line options to use it (/dev/dax0.0) as the
> +backend of vNVDIMM:
> +
> + -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
> + -device nvdimm,id=nvdimm1,memdev=mem1
> diff --git a/exec.c b/exec.c
> index 03238a3449..90440efecd 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1600,7 +1600,13 @@ static void *file_ram_alloc(RAMBlock *block,
>      void *area;
>  
>      block->page_size = qemu_fd_getpagesize(fd);
> -    block->mr->align = block->page_size;
> +    if (block->mr->align % block->page_size) {
> +        error_setg(errp, "aligment 0x%" PRIx64
> +                   " must be multiples of page size 0x%" PRIx64,
> +                   block->mr->align, block->page_size);
> +        return NULL;
> +    }
> +    block->mr->align = MAX(block->page_size, block->mr->align);
>  #if defined(__s390x__)
>      if (kvm_enabled()) {
>          block->mr->align = MAX(block->mr->align, QEMU_VMALLOC_ALIGN);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5ed4042f87..a1be8b06d3 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -465,6 +465,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
>   * @name: Region name, becomes part of RAMBlock name used in migration stream
>   *        must be unique within any device
>   * @size: size of the region.
> + * @align: alignment of the region base address; if 0, the default alignment
> + *         (getpagesize()) will be used.
>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
>   * @path: the path in which to allocate the RAM.
>   * @errp: pointer to Error*, to store an error if it happens.
> @@ -476,6 +478,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        struct Object *owner,
>                                        const char *name,
>                                        uint64_t size,
> +                                      uint64_t align,
>                                        bool share,
>                                        const char *path,
>                                        Error **errp);
> diff --git a/memory.c b/memory.c
> index e26e5a3b1d..2c80656e39 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1570,6 +1570,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        struct Object *owner,
>                                        const char *name,
>                                        uint64_t size,
> +                                      uint64_t align,
>                                        bool share,
>                                        const char *path,
>                                        Error **errp)
> @@ -1578,6 +1579,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>      mr->ram = true;
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
> +    mr->align = align;
>      mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
> diff --git a/numa.c b/numa.c
> index 7151b24d1c..b0fe22a60c 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -551,7 +551,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>      if (mem_path) {
>  #ifdef __linux__
>          Error *err = NULL;
> -        memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
> +        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
>                                           mem_path, &err);
>          if (err) {
>              error_report_err(err);
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option
  2017-11-28  1:07   ` Eduardo Habkost
@ 2017-11-29  0:33     ` Haozhong Zhang
  2017-11-29 10:05       ` Eduardo Habkost
  0 siblings, 1 reply; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-29  0:33 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, Xiao Guangrong, Stefan Hajnoczi, Dan Williams,
	Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Michael S. Tsirkin

On 11/27/17 23:07 -0200, Eduardo Habkost wrote:
> On Mon, Nov 27, 2017 at 12:35:15PM +0800, Haozhong Zhang wrote:
> > When mmap(2) the backend files, QEMU uses the host page size
> > (getpagesize(2)) by default as the alignment of mapping address.
> > However, some backends may require alignments different than the page
> > size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
> > kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
> > fails with a kernel message like
> > 
> > [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
> > 
> > Because there is no common approach to get such alignment requirement,
> > we add the 'align' option to 'memory-backend-file', so that users or
> > management utils, which have enough knowledge about the backend, can
> > specify a proper alignment via this option.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> The new option needs to be documented on qemu-options.hx.

will add in the next version.

Thanks,
Haozhong

> 
> Except for that, the patch looks good to me.
> 
> 
> > ---
> >  backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >  docs/nvdimm.txt         | 16 ++++++++++++++++
> >  exec.c                  |  8 +++++++-
> >  include/exec/memory.h   |  3 +++
> >  memory.c                |  2 ++
> >  numa.c                  |  2 +-
> >  6 files changed, 69 insertions(+), 3 deletions(-)
> > 
> > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > index e44c319915..e319ec1ad8 100644
> > --- a/backends/hostmem-file.c
> > +++ b/backends/hostmem-file.c
> > @@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
> >      bool share;
> >      bool discard_data;
> >      char *mem_path;
> > +    uint64_t align;
> >  };
> >  
> >  static void
> > @@ -58,7 +59,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> >          path = object_get_canonical_path(OBJECT(backend));
> >          memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
> >                                   path,
> > -                                 backend->size, fb->share,
> > +                                 backend->size, fb->align, fb->share,
> >                                   fb->mem_path, errp);
> >          g_free(path);
> >      }
> > @@ -115,6 +116,40 @@ static void file_memory_backend_set_discard_data(Object *o, bool value,
> >      MEMORY_BACKEND_FILE(o)->discard_data = value;
> >  }
> >  
> > +static void file_memory_backend_get_align(Object *o, Visitor *v,
> > +                                          const char *name, void *opaque,
> > +                                          Error **errp)
> > +{
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > +    uint64_t val = fb->align;
> > +
> > +    visit_type_size(v, name, &val, errp);
> > +}
> > +
> > +static void file_memory_backend_set_align(Object *o, Visitor *v,
> > +                                          const char *name, void *opaque,
> > +                                          Error **errp)
> > +{
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > +    Error *local_err = NULL;
> > +    uint64_t val;
> > +
> > +    if (host_memory_backend_mr_inited(backend)) {
> > +        error_setg(&local_err, "cannot change property value");
> > +        goto out;
> > +    }
> > +
> > +    visit_type_size(v, name, &val, &local_err);
> > +    if (local_err) {
> > +        goto out;
> > +    }
> > +    fb->align = val;
> > +
> > + out:
> > +    error_propagate(errp, local_err);
> > +}
> > +
> >  static void file_backend_unparent(Object *obj)
> >  {
> >      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> > @@ -145,6 +180,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
> >      object_class_property_add_str(oc, "mem-path",
> >          get_mem_path, set_mem_path,
> >          &error_abort);
> > +    object_class_property_add(oc, "align", "int",
> > +        file_memory_backend_get_align,
> > +        file_memory_backend_set_align,
> > +        NULL, NULL, &error_abort);
> >  }
> >  
> >  static void file_backend_instance_finalize(Object *o)
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 2d9f8c0e8c..21249dd062 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -122,3 +122,19 @@ Note:
> >       M >= size of RAM devices +
> >            size of statically plugged vNVDIMM devices +
> >            size of hotplugged vNVDIMM devices
> > +
> > +Alignment
> > +---------
> > +
> > +QEMU uses mmap(2) to maps vNVDIMM backends and aligns the mapping
> > +address to the page size (getpagesize(2)) by default. However, some
> > +types of backends may require an alignment different than the page
> > +size. In that case, QEMU v2.12.0 and later provide 'align' option to
> > +memory-backend-file to allow users to specify the proper alignment.
> > +
> > +For example, device dax require the 2 MB alignment, so we can use
> > +following QEMU command line options to use it (/dev/dax0.0) as the
> > +backend of vNVDIMM:
> > +
> > + -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
> > + -device nvdimm,id=nvdimm1,memdev=mem1
> > diff --git a/exec.c b/exec.c
> > index 03238a3449..90440efecd 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1600,7 +1600,13 @@ static void *file_ram_alloc(RAMBlock *block,
> >      void *area;
> >  
> >      block->page_size = qemu_fd_getpagesize(fd);
> > -    block->mr->align = block->page_size;
> > +    if (block->mr->align % block->page_size) {
> > +        error_setg(errp, "aligment 0x%" PRIx64
> > +                   " must be multiples of page size 0x%" PRIx64,
> > +                   block->mr->align, block->page_size);
> > +        return NULL;
> > +    }
> > +    block->mr->align = MAX(block->page_size, block->mr->align);
> >  #if defined(__s390x__)
> >      if (kvm_enabled()) {
> >          block->mr->align = MAX(block->mr->align, QEMU_VMALLOC_ALIGN);
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5ed4042f87..a1be8b06d3 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -465,6 +465,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
> >   * @name: Region name, becomes part of RAMBlock name used in migration stream
> >   *        must be unique within any device
> >   * @size: size of the region.
> > + * @align: alignment of the region base address; if 0, the default alignment
> > + *         (getpagesize()) will be used.
> >   * @share: %true if memory must be mmaped with the MAP_SHARED flag
> >   * @path: the path in which to allocate the RAM.
> >   * @errp: pointer to Error*, to store an error if it happens.
> > @@ -476,6 +478,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
> >                                        struct Object *owner,
> >                                        const char *name,
> >                                        uint64_t size,
> > +                                      uint64_t align,
> >                                        bool share,
> >                                        const char *path,
> >                                        Error **errp);
> > diff --git a/memory.c b/memory.c
> > index e26e5a3b1d..2c80656e39 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1570,6 +1570,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
> >                                        struct Object *owner,
> >                                        const char *name,
> >                                        uint64_t size,
> > +                                      uint64_t align,
> >                                        bool share,
> >                                        const char *path,
> >                                        Error **errp)
> > @@ -1578,6 +1579,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
> >      mr->ram = true;
> >      mr->terminates = true;
> >      mr->destructor = memory_region_destructor_ram;
> > +    mr->align = align;
> >      mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
> >      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
> >  }
> > diff --git a/numa.c b/numa.c
> > index 7151b24d1c..b0fe22a60c 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -551,7 +551,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> >      if (mem_path) {
> >  #ifdef __linux__
> >          Error *err = NULL;
> > -        memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
> > +        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
> >                                           mem_path, &err);
> >          if (err) {
> >              error_report_err(err);
> > -- 
> > 2.14.1
> > 
> 
> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
  2017-11-28  1:07   ` Eduardo Habkost
  2017-11-29  0:31   ` Haozhong Zhang
@ 2017-11-29  0:38   ` Haozhong Zhang
  2 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-11-29  0:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Xiao Guangrong, Stefan Hajnoczi, Dan Williams, Eduardo Habkost,
	Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, mst

[ it looks my email client had some glitch and failed to add mst in cc list ]

copy mst

On 11/27/17 12:35 +0800, Haozhong Zhang wrote:
> When mmap(2) the backend files, QEMU uses the host page size
> (getpagesize(2)) by default as the alignment of mapping address.
> However, some backends may require alignments different than the page
> size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
> kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
> fails with a kernel message like
> 
> [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
> 
> Because there is no common approach to get such alignment requirement,
> we add the 'align' option to 'memory-backend-file', so that users or
> management utils, which have enough knowledge about the backend, can
> specify a proper alignment via this option.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  backends/hostmem-file.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  docs/nvdimm.txt         | 16 ++++++++++++++++
>  exec.c                  |  8 +++++++-
>  include/exec/memory.h   |  3 +++
>  memory.c                |  2 ++
>  numa.c                  |  2 +-
>  6 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index e44c319915..e319ec1ad8 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -34,6 +34,7 @@ struct HostMemoryBackendFile {
>      bool share;
>      bool discard_data;
>      char *mem_path;
> +    uint64_t align;
>  };
>  
>  static void
> @@ -58,7 +59,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>          path = object_get_canonical_path(OBJECT(backend));
>          memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>                                   path,
> -                                 backend->size, fb->share,
> +                                 backend->size, fb->align, fb->share,
>                                   fb->mem_path, errp);
>          g_free(path);
>      }
> @@ -115,6 +116,40 @@ static void file_memory_backend_set_discard_data(Object *o, bool value,
>      MEMORY_BACKEND_FILE(o)->discard_data = value;
>  }
>  
> +static void file_memory_backend_get_align(Object *o, Visitor *v,
> +                                          const char *name, void *opaque,
> +                                          Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +    uint64_t val = fb->align;
> +
> +    visit_type_size(v, name, &val, errp);
> +}
> +
> +static void file_memory_backend_set_align(Object *o, Visitor *v,
> +                                          const char *name, void *opaque,
> +                                          Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +    Error *local_err = NULL;
> +    uint64_t val;
> +
> +    if (host_memory_backend_mr_inited(backend)) {
> +        error_setg(&local_err, "cannot change property value");
> +        goto out;
> +    }
> +
> +    visit_type_size(v, name, &val, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    fb->align = val;
> +
> + out:
> +    error_propagate(errp, local_err);
> +}
> +
>  static void file_backend_unparent(Object *obj)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -145,6 +180,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_str(oc, "mem-path",
>          get_mem_path, set_mem_path,
>          &error_abort);
> +    object_class_property_add(oc, "align", "int",
> +        file_memory_backend_get_align,
> +        file_memory_backend_set_align,
> +        NULL, NULL, &error_abort);
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 2d9f8c0e8c..21249dd062 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -122,3 +122,19 @@ Note:
>       M >= size of RAM devices +
>            size of statically plugged vNVDIMM devices +
>            size of hotplugged vNVDIMM devices
> +
> +Alignment
> +---------
> +
> +QEMU uses mmap(2) to maps vNVDIMM backends and aligns the mapping
> +address to the page size (getpagesize(2)) by default. However, some
> +types of backends may require an alignment different than the page
> +size. In that case, QEMU v2.12.0 and later provide 'align' option to
> +memory-backend-file to allow users to specify the proper alignment.
> +
> +For example, device dax require the 2 MB alignment, so we can use
> +following QEMU command line options to use it (/dev/dax0.0) as the
> +backend of vNVDIMM:
> +
> + -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
> + -device nvdimm,id=nvdimm1,memdev=mem1
> diff --git a/exec.c b/exec.c
> index 03238a3449..90440efecd 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1600,7 +1600,13 @@ static void *file_ram_alloc(RAMBlock *block,
>      void *area;
>  
>      block->page_size = qemu_fd_getpagesize(fd);
> -    block->mr->align = block->page_size;
> +    if (block->mr->align % block->page_size) {
> +        error_setg(errp, "aligment 0x%" PRIx64
> +                   " must be multiples of page size 0x%" PRIx64,
> +                   block->mr->align, block->page_size);
> +        return NULL;
> +    }
> +    block->mr->align = MAX(block->page_size, block->mr->align);
>  #if defined(__s390x__)
>      if (kvm_enabled()) {
>          block->mr->align = MAX(block->mr->align, QEMU_VMALLOC_ALIGN);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5ed4042f87..a1be8b06d3 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -465,6 +465,8 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
>   * @name: Region name, becomes part of RAMBlock name used in migration stream
>   *        must be unique within any device
>   * @size: size of the region.
> + * @align: alignment of the region base address; if 0, the default alignment
> + *         (getpagesize()) will be used.
>   * @share: %true if memory must be mmaped with the MAP_SHARED flag
>   * @path: the path in which to allocate the RAM.
>   * @errp: pointer to Error*, to store an error if it happens.
> @@ -476,6 +478,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        struct Object *owner,
>                                        const char *name,
>                                        uint64_t size,
> +                                      uint64_t align,
>                                        bool share,
>                                        const char *path,
>                                        Error **errp);
> diff --git a/memory.c b/memory.c
> index e26e5a3b1d..2c80656e39 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1570,6 +1570,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>                                        struct Object *owner,
>                                        const char *name,
>                                        uint64_t size,
> +                                      uint64_t align,
>                                        bool share,
>                                        const char *path,
>                                        Error **errp)
> @@ -1578,6 +1579,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
>      mr->ram = true;
>      mr->terminates = true;
>      mr->destructor = memory_region_destructor_ram;
> +    mr->align = align;
>      mr->ram_block = qemu_ram_alloc_from_file(size, mr, share, path, errp);
>      mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
>  }
> diff --git a/numa.c b/numa.c
> index 7151b24d1c..b0fe22a60c 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -551,7 +551,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>      if (mem_path) {
>  #ifdef __linux__
>          Error *err = NULL;
> -        memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
> +        memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, false,
>                                           mem_path, &err);
>          if (err) {
>              error_report_err(err);
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option
  2017-11-29  0:33     ` Haozhong Zhang
@ 2017-11-29 10:05       ` Eduardo Habkost
  0 siblings, 0 replies; 12+ messages in thread
From: Eduardo Habkost @ 2017-11-29 10:05 UTC (permalink / raw)
  To: qemu-devel, Xiao Guangrong, Stefan Hajnoczi, Dan Williams,
	Igor Mammedov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Michael S. Tsirkin

On Wed, Nov 29, 2017 at 08:33:29AM +0800, Haozhong Zhang wrote:
> On 11/27/17 23:07 -0200, Eduardo Habkost wrote:
> > On Mon, Nov 27, 2017 at 12:35:15PM +0800, Haozhong Zhang wrote:
> > > When mmap(2) the backend files, QEMU uses the host page size
> > > (getpagesize(2)) by default as the alignment of mapping address.
> > > However, some backends may require alignments different than the page
> > > size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
> > > kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
> > > fails with a kernel message like
> > > 
> > > [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
> > > 
> > > Because there is no common approach to get such alignment requirement,
> > > we add the 'align' option to 'memory-backend-file', so that users or
> > > management utils, which have enough knowledge about the backend, can
> > > specify a proper alignment via this option.
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > 
> > The new option needs to be documented on qemu-options.hx.
> 
> will add in the next version.

Note that there are patches on on machine-next that change the memory backend
documentation.

See:
  git://github.com/ehabkost/qemu.git machine-next
  https://github.com/ehabkost/qemu/commit/869f7f61c557a46b46e41b5e38a61551d45b6d0d

But the conflicts should be trivial to solve if you document the new option in
a separate paragraph.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option
  2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option Haozhong Zhang
@ 2017-12-01 10:44   ` Stefan Hajnoczi
  2017-12-04  5:47     ` Haozhong Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Hajnoczi @ 2017-12-01 10:44 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: qemu-devel, Xiao Guangrong, Dan Williams, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

On Mon, Nov 27, 2017 at 12:35:17PM +0800, Haozhong Zhang wrote:
> @@ -312,6 +315,10 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
>  
>      /* Only one interleave for PMEM. */
>      nfit_memdev->interleave_ways = cpu_to_le16(1);
> +
> +    if (nvdimm->unarmed) {
> +        nfit_memdev->flags |= ACPI_NFIT_MEM_NOT_ARMED;

Missing cpu_to_le16()?

> +static void nvdimm_set_unarmed(Object *obj, bool value, Error **errp)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> +    Error *local_err = NULL;
> +
> +    if (memory_region_size(&nvdimm->nvdimm_mr)) {
> +        error_setg(&local_err, "cannot change property value");

This error message does not explain why the value cannot be changed.  I
guess this is checking if the device has been realized.

Please call qdev_prop_set_after_realize() instead:

  if (memory_region_size(&nvdimm->nvdimm_mr)) {
      qdev_prop_set_after_realize(DEVICE(obj), "unarmed", &local_err);
      goto out;
  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option
  2017-12-01 10:44   ` Stefan Hajnoczi
@ 2017-12-04  5:47     ` Haozhong Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Haozhong Zhang @ 2017-12-04  5:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Xiao Guangrong, Dan Williams, Xiao Guangrong,
	Michael S. Tsirkin, Igor Mammedov

On 12/01/17 10:44 +0000, Stefan Hajnoczi wrote:
> On Mon, Nov 27, 2017 at 12:35:17PM +0800, Haozhong Zhang wrote:
> > @@ -312,6 +315,10 @@ nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
> >  
> >      /* Only one interleave for PMEM. */
> >      nfit_memdev->interleave_ways = cpu_to_le16(1);
> > +
> > +    if (nvdimm->unarmed) {
> > +        nfit_memdev->flags |= ACPI_NFIT_MEM_NOT_ARMED;
> 
> Missing cpu_to_le16()?
> 
> > +static void nvdimm_set_unarmed(Object *obj, bool value, Error **errp)
> > +{
> > +    NVDIMMDevice *nvdimm = NVDIMM(obj);
> > +    Error *local_err = NULL;
> > +
> > +    if (memory_region_size(&nvdimm->nvdimm_mr)) {
> > +        error_setg(&local_err, "cannot change property value");
> 
> This error message does not explain why the value cannot be changed.  I
> guess this is checking if the device has been realized.
> 
> Please call qdev_prop_set_after_realize() instead:
> 
>   if (memory_region_size(&nvdimm->nvdimm_mr)) {
>       qdev_prop_set_after_realize(DEVICE(obj), "unarmed", &local_err);
>       goto out;
>   }

I'll change in the next version.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-12-04  5:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-27  4:35 [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Haozhong Zhang
2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 1/3] hostmem-file: add "align" option Haozhong Zhang
2017-11-28  1:07   ` Eduardo Habkost
2017-11-29  0:33     ` Haozhong Zhang
2017-11-29 10:05       ` Eduardo Habkost
2017-11-29  0:31   ` Haozhong Zhang
2017-11-29  0:38   ` Haozhong Zhang
2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 2/3] nvdimm: add a macro for property "label-size" Haozhong Zhang
2017-11-27  4:35 ` [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option Haozhong Zhang
2017-12-01 10:44   ` Stefan Hajnoczi
2017-12-04  5:47     ` Haozhong Zhang
2017-11-28 18:44 ` [Qemu-devel] [PATCH-for-2.12 v3 0/3] nvdimm: fixes for (non-)dax backends Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).