* [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
@ 2015-08-14 14:51 Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset Xiao Guangrong
` (18 more replies)
0 siblings, 19 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Changelog:
- Use little endian for the DSM method, thanks to Stefan's suggestion
- Introduce a new parameter, @configdata; if it is false, Qemu will
build a static, read-only namespace in memory and use it to serve
DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
reserved region is needed at the end of @file, which is good for
users who want to pass the whole nvdimm device through and make its
data completely visible to the guest
- Divide the source code into separate files and add maintainer info
BTW, PCOMMIT virtualization on the KVM side is a work in progress and
will hopefully be posted next week
====== Background ======
NVDIMMs (Non-Volatile Dual In-line Memory Modules) are going to be supported
on Intel's platforms. They are discovered via ACPI and configured through the
_DSM method of the NVDIMM device in ACPI. Supporting documents can be
found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
Currently, the NVDIMM driver has been merged into the upstream Linux kernel,
and this patchset tries to enable it in the virtualization field
====== Design ======
NVDIMM supports two access modes: one is PMEM, which maps the NVDIMM into
the CPU's address space so the CPU can directly access it as normal memory;
the other is BLK, which is used as a block device to reduce the consumption
of CPU address space
BLK mode accesses the NVDIMM via a Command Register window and a Data
Register window. BLK virtualization has high overhead since each sector
access causes at least two VM-exits, so we currently only implement vPMEM
in this patchset
--- vPMEM design ---
We introduce a new device named "pc-nvdimm". It has a parameter, file, which
is the file-backed memory passed to the guest. The file can be a regular file
or a block device. We can use any file for test or emulation; however,
in the real world, the files passed to the guest are:
- a regular file, created on the NVDIMM device on the host, in a filesystem
with DAX enabled
- the raw PMEM device on the host, e.g. /dev/pmem0
Memory accesses on addresses created by mmap on these kinds of files
directly reach the NVDIMM device on the host.
--- vConfigure data area design ---
Each NVDIMM device has a configuration data area which is used to store
label namespace data. In order to emulate this area, we divide the file
into two parts:
- the first part is (0, size - 128K], which is used as PMEM
- the 128K at the end of the file, which is used as the Config Data Area
This way the label namespace data persists across power loss or system
failure
--- _DSM method design ---
_DSM in ACPI is used to configure the NVDIMM; currently we only allow access
to label namespace data, i.e. Get Namespace Label Size (Function Index 4),
Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
(Function Index 6)
_DSM uses two pages to transfer data between ACPI and Qemu: the first page
is RAM-based and is used to save the input of the _DSM method (Qemu reuses
it to store the output); the other page is MMIO-based, and ACPI writes data
to this page to transfer control to Qemu
We map these pages in the address region above 4G because there is huge
free space there and it avoids address overlap with PCI and other
reserved components (e.g. the HPET). This is also the reason we chose MMIO
notification instead of PIO
====== Test ======
In host
1) create a memory backing file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
2) append '-device pc-nvdimm,file=/tmp/nvdimm' to the Qemu command line
In the guest, download the latest upstream kernel (4.2 merge window) and
enable ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You will see the whole nvdimm device used as a single namespace and /dev/pmem0
appears. You can do whatever you want on /dev/pmem0, including DAX access.
Currently the Linux NVDIMM driver does not support namespace operations on
this kind of PMEM; apply the change below to support dynamic namespaces:
@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
continue;
}
- if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+ //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+ if (nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;
You can append another NVDIMM device in guest and do:
# cd /sys/bus/nd/devices/
# cd namespace1.0/
# echo `uuidgen` > uuid
# echo `expr 1024 \* 1024 \* 128` > size
then reload nd_pmem.ko
You will see /dev/pmem1 appear
====== TODO ======
1) NVDIMM NUMA support
2) NVDIMM hotplug support
Xiao Guangrong (18):
acpi: allow aml_operation_region() working on 64 bit offset
i386/acpi-build: allow SSDT to operate on 64 bit
acpi: add aml_derefof
acpi: add aml_sizeof
acpi: add aml_create_field
pc: implement NVDIMM device abstract
nvdimm: reserve address range for NVDIMM
nvdimm: init backend memory mapping and config data area
nvdimm: build ACPI NFIT table
nvdimm: init the address region used by DSM method
nvdimm: build ACPI nvdimm devices
nvdimm: save arg3 for NVDIMM device _DSM method
nvdimm: build namespace config data
nvdimm: support NFIT_CMD_IMPLEMENTED function
nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
nvdimm: support NFIT_CMD_GET_CONFIG_DATA
nvdimm: support NFIT_CMD_SET_CONFIG_DATA
nvdimm: add maintain info
MAINTAINERS | 6 +
default-configs/i386-softmmu.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
hw/Makefile.objs | 2 +-
hw/acpi/aml-build.c | 32 +-
hw/i386/acpi-build.c | 9 +-
hw/i386/acpi-dsdt.dsl | 2 +-
hw/i386/pc.c | 12 +-
hw/mem/Makefile.objs | 2 +
hw/mem/nvdimm/acpi.c | 864 +++++++++++++++++++++++++++++++++++++
hw/mem/nvdimm/internal.h | 42 ++
hw/mem/nvdimm/namespace.c | 307 +++++++++++++
hw/mem/nvdimm/pc-nvdimm.c | 244 +++++++++++
include/hw/acpi/aml-build.h | 5 +-
include/hw/mem/pc-nvdimm.h | 45 ++
15 files changed, 1566 insertions(+), 8 deletions(-)
create mode 100644 hw/mem/nvdimm/acpi.c
create mode 100644 hw/mem/nvdimm/internal.h
create mode 100644 hw/mem/nvdimm/namespace.c
create mode 100644 hw/mem/nvdimm/pc-nvdimm.c
create mode 100644 include/hw/mem/pc-nvdimm.h
--
2.4.3
^ permalink raw reply [flat|nested] 87+ messages in thread
* [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
@ 2015-08-14 14:51 ` Xiao Guangrong
2015-09-02 8:05 ` Igor Mammedov
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit Xiao Guangrong
` (17 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Currently, the offset in OperationRegion is limited to 32 bits; extend it
to 64 bits so that we can switch the SSDT to 64 bit in a later patch
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/acpi/aml-build.c | 2 +-
include/hw/acpi/aml-build.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 0d4b324..02f9e3d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -752,7 +752,7 @@ Aml *aml_package(uint8_t num_elements)
/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefOpRegion */
Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
- uint32_t offset, uint32_t len)
+ uint64_t offset, uint32_t len)
{
Aml *var = aml_alloc();
build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index e3afa13..996ac5b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -222,7 +222,7 @@ Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
uint8_t aln, uint8_t len);
Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
- uint32_t offset, uint32_t len);
+ uint64_t offset, uint32_t len);
Aml *aml_irq_no_flags(uint8_t irq);
Aml *aml_named_field(const char *name, unsigned length);
Aml *aml_reserved_field(unsigned length);
--
2.4.3
* [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset Xiao Guangrong
@ 2015-08-14 14:51 ` Xiao Guangrong
2015-09-02 10:06 ` Igor Mammedov
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof Xiao Guangrong
` (16 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Only 512MB is left for MMIO below 4G, and that is used by PCI, the BIOS, etc.
Other components also reserve regions for their internal usage, e.g.
[0xFED00000, 0xFED00000 + 0x400) is reserved for the HPET
Switch the SSDT to 64 bit to use the huge free room above 4G. In later
patches, we will dynamically allocate free space within this region for
use by the NVDIMM _DSM method
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/i386/acpi-build.c | 4 ++--
hw/i386/acpi-dsdt.dsl | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 46eddb8..8ead1c1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
build_header(linker, table_data,
(void *)(table_data->data + table_data->len - ssdt->buf->len),
- "SSDT", ssdt->buf->len, 1);
+ "SSDT", ssdt->buf->len, 2);
free_aml_allocator();
}
@@ -1586,7 +1586,7 @@ build_dsdt(GArray *table_data, GArray *linker, AcpiMiscInfo *misc)
memset(dsdt, 0, sizeof *dsdt);
build_header(linker, table_data, dsdt, "DSDT",
- misc->dsdt_size, 1);
+ misc->dsdt_size, 2);
}
static GArray *
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index a2d84ec..5cd3f0e 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
DefinitionBlock (
"acpi-dsdt.aml", // Output Filename
"DSDT", // Signature
- 0x01, // DSDT Compliance Revision
+ 0x02, // DSDT Compliance Revision
"BXPC", // OEMID
"BXDSDT", // TABLE ID
0x1 // OEM Revision
--
2.4.3
* [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit Xiao Guangrong
@ 2015-08-14 14:51 ` Xiao Guangrong
2015-09-02 10:16 ` Igor Mammedov
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof Xiao Guangrong
` (15 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Implement the DeRefOf term, which is used by the NVDIMM _DSM method in a later patch
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/acpi/aml-build.c | 8 ++++++++
include/hw/acpi/aml-build.h | 1 +
2 files changed, 9 insertions(+)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 02f9e3d..9e89efc 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
return var;
}
+/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
+Aml *aml_derefof(Aml *arg)
+{
+ Aml *var = aml_opcode(0x83 /* DerefOfOp */);
+ aml_append(var, arg);
+ return var;
+}
+
void
build_header(GArray *linker, GArray *table_data,
AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 996ac5b..21dc5e9 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -275,6 +275,7 @@ Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name);
Aml *aml_varpackage(uint32_t num_elements);
Aml *aml_touuid(const char *uuid);
Aml *aml_unicode(const char *str);
+Aml *aml_derefof(Aml *arg);
void
build_header(GArray *linker, GArray *table_data,
--
2.4.3
* [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (2 preceding siblings ...)
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof Xiao Guangrong
@ 2015-08-14 14:51 ` Xiao Guangrong
2015-09-02 10:18 ` Igor Mammedov
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field Xiao Guangrong
` (14 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Implement the SizeOf term, which is used by the NVDIMM _DSM method in a later patch
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/acpi/aml-build.c | 8 ++++++++
include/hw/acpi/aml-build.h | 1 +
2 files changed, 9 insertions(+)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9e89efc..a526eed 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
return var;
}
+/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
+Aml *aml_sizeof(Aml *arg)
+{
+ Aml *var = aml_opcode(0x87 /* SizeOfOp */);
+ aml_append(var, arg);
+ return var;
+}
+
void
build_header(GArray *linker, GArray *table_data,
AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 21dc5e9..6b591ab 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -276,6 +276,7 @@ Aml *aml_varpackage(uint32_t num_elements);
Aml *aml_touuid(const char *uuid);
Aml *aml_unicode(const char *str);
Aml *aml_derefof(Aml *arg);
+Aml *aml_sizeof(Aml *arg);
void
build_header(GArray *linker, GArray *table_data,
--
2.4.3
* [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (3 preceding siblings ...)
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof Xiao Guangrong
@ 2015-08-14 14:51 ` Xiao Guangrong
2015-09-02 11:10 ` Igor Mammedov
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract Xiao Guangrong
` (13 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Implement the CreateField term, which is used by the NVDIMM _DSM method in a later patch
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/acpi/aml-build.c | 14 ++++++++++++++
include/hw/acpi/aml-build.h | 1 +
2 files changed, 15 insertions(+)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a526eed..debdad2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1151,6 +1151,20 @@ Aml *aml_sizeof(Aml *arg)
return var;
}
+/* ACPI 6.0: 20.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
+{
+ Aml *var = aml_alloc();
+
+ build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+ build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+ aml_append(var, srcbuf);
+ aml_append(var, index);
+ aml_append(var, len);
+ build_append_namestring(var->buf, "%s", name);
+ return var;
+}
+
void
build_header(GArray *linker, GArray *table_data,
AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 6b591ab..d4dbd44 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -277,6 +277,7 @@ Aml *aml_touuid(const char *uuid);
Aml *aml_unicode(const char *str);
Aml *aml_derefof(Aml *arg);
Aml *aml_sizeof(Aml *arg);
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
void
build_header(GArray *linker, GArray *table_data,
--
2.4.3
* [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (4 preceding siblings ...)
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field Xiao Guangrong
@ 2015-08-14 14:51 ` Xiao Guangrong
2015-08-25 14:57 ` Stefan Hajnoczi
2015-09-02 9:58 ` Igor Mammedov
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
` (12 subsequent siblings)
18 siblings, 2 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:51 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Introduce the "pc-nvdimm" device; it has two parameters:
- @file, the backing memory file for the NVDIMM device
- @configdata, which specifies whether we need to reserve 128K at the
end of @file for the nvdimm device's config data. Default is false
If @configdata is false, Qemu will build a static, read-only
namespace in memory and use it to serve
DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
This is good for users who want to pass the whole nvdimm device
through and make its data completely visible to the guest
We can use "-device pc-nvdimm,file=/dev/pmem,configdata" on the
Qemu command line to create an NVDIMM device for the guest
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
default-configs/i386-softmmu.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
hw/Makefile.objs | 2 +-
hw/mem/Makefile.objs | 1 +
hw/mem/nvdimm/pc-nvdimm.c | 99 ++++++++++++++++++++++++++++++++++++++
include/hw/mem/pc-nvdimm.h | 31 ++++++++++++
6 files changed, 134 insertions(+), 1 deletion(-)
create mode 100644 hw/mem/nvdimm/pc-nvdimm.c
create mode 100644 include/hw/mem/pc-nvdimm.h
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 48b5762..67fc3a8 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -49,3 +49,4 @@ CONFIG_MEM_HOTPLUG=y
CONFIG_XIO3130=y
CONFIG_IOH3420=y
CONFIG_I82801B11=y
+CONFIG_NVDIMM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 4962ed7..dfcde36 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_MEM_HOTPLUG=y
CONFIG_XIO3130=y
CONFIG_IOH3420=y
CONFIG_I82801B11=y
+CONFIG_NVDIMM=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 73afa41..1e25d3f 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -30,7 +30,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
devices-dirs-$(CONFIG_VIRTIO) += virtio/
devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
+devices-dirs-y += mem/
devices-dirs-y += core/
common-obj-y += $(devices-dirs-y)
obj-y += $(devices-dirs-y)
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..4df7482 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@
common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
new file mode 100644
index 0000000..a53d235
--- /dev/null
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -0,0 +1,99 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem/pc-nvdimm.h"
+
+static char *get_file(Object *obj, Error **errp)
+{
+ PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+ return g_strdup(nvdimm->file);
+}
+
+static void set_file(Object *obj, const char *str, Error **errp)
+{
+ PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+ if (nvdimm->file) {
+ g_free(nvdimm->file);
+ }
+
+ nvdimm->file = g_strdup(str);
+}
+
+static bool has_configdata(Object *obj, Error **errp)
+{
+ PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+ return nvdimm->configdata;
+}
+
+static void set_configdata(Object *obj, bool value, Error **errp)
+{
+ PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+ nvdimm->configdata = value;
+}
+
+static void pc_nvdimm_init(Object *obj)
+{
+ object_property_add_str(obj, "file", get_file, set_file, NULL);
+ object_property_add_bool(obj, "configdata", has_configdata,
+ set_configdata, NULL);
+}
+
+static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
+{
+ PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+
+ if (!nvdimm->file) {
+ error_setg(errp, "file property is not set");
+ }
+}
+
+static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(oc);
+
+ /* nvdimm hotplug has not been supported yet. */
+ dc->hotpluggable = false;
+
+ dc->realize = pc_nvdimm_realize;
+ dc->desc = "NVDIMM memory module";
+}
+
+static TypeInfo pc_nvdimm_info = {
+ .name = TYPE_PC_NVDIMM,
+ .parent = TYPE_DEVICE,
+ .instance_size = sizeof(PCNVDIMMDevice),
+ .instance_init = pc_nvdimm_init,
+ .class_init = pc_nvdimm_class_init,
+};
+
+static void pc_nvdimm_register_types(void)
+{
+ type_register_static(&pc_nvdimm_info);
+}
+
+type_init(pc_nvdimm_register_types)
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
new file mode 100644
index 0000000..51152b8
--- /dev/null
+++ b/include/hw/mem/pc-nvdimm.h
@@ -0,0 +1,31 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef __PC_NVDIMM_H
+#define __PC_NVDIMM_H
+
+#include "hw/qdev.h"
+
+typedef struct PCNVDIMMDevice {
+ /* private */
+ DeviceState parent_obj;
+
+ char *file;
+ bool configdata;
+} PCNVDIMMDevice;
+
+#define TYPE_PC_NVDIMM "pc-nvdimm"
+
+#define PC_NVDIMM(obj) \
+ OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
+
+#endif
--
2.4.3
* [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (5 preceding siblings ...)
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-25 15:12 ` Stefan Hajnoczi
` (3 more replies)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area Xiao Guangrong
` (11 subsequent siblings)
18 siblings, 4 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
NVDIMM reserves all the free range above 4G to:
- map Persistent Memory (PMEM)
- implement the NVDIMM ACPI device _DSM method
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/i386/pc.c | 12 ++++++++++--
hw/mem/nvdimm/pc-nvdimm.c | 13 +++++++++++++
include/hw/mem/pc-nvdimm.h | 1 +
3 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7661ea9..41af6ea 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -64,6 +64,7 @@
#include "hw/pci/pci_host.h"
#include "acpi-build.h"
#include "hw/mem/pc-dimm.h"
+#include "hw/mem/pc-nvdimm.h"
#include "qapi/visitor.h"
#include "qapi-visit.h"
@@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
MemoryRegion *ram_below_4g, *ram_above_4g;
FWCfgState *fw_cfg;
PCMachineState *pcms = PC_MACHINE(machine);
+ ram_addr_t offset;
assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
@@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
exit(EXIT_FAILURE);
}
+ offset = 0x100000000ULL + above_4g_mem_size;
+
/* initialize hotplug memory address space */
if (guest_info->has_reserved_memory &&
(machine->ram_size < machine->maxram_size)) {
@@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
exit(EXIT_FAILURE);
}
- pcms->hotplug_memory.base =
- ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
+ pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
if (pcms->enforce_aligned_dimm) {
/* size hotplug region assuming 1G page max alignment per slot */
@@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
"hotplug-memory", hotplug_mem_size);
memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
&pcms->hotplug_memory.mr);
+
+ offset = pcms->hotplug_memory.base + hotplug_mem_size;
}
+ /* all the space left above 4G is reserved for NVDIMM. */
+ pc_nvdimm_reserve_range(offset);
+
/* Initialize PC system firmware */
pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index a53d235..7a270a8 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -24,6 +24,19 @@
#include "hw/mem/pc-nvdimm.h"
+#define PAGE_SIZE (1UL << 12)
+
+static struct nvdimms_info {
+ ram_addr_t current_addr;
+} nvdimms_info;
+
+/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
+void pc_nvdimm_reserve_range(ram_addr_t offset)
+{
+ offset = ROUND_UP(offset, PAGE_SIZE);
+ nvdimms_info.current_addr = offset;
+}
+
static char *get_file(Object *obj, Error **errp)
{
PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index 51152b8..8601e9b 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
#define PC_NVDIMM(obj) \
OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
+void pc_nvdimm_reserve_range(ram_addr_t offset);
#endif
--
2.4.3
* [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (6 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-25 16:03 ` Stefan Hajnoczi
2015-09-07 14:11 ` Igor Mammedov
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table Xiao Guangrong
` (10 subsequent siblings)
18 siblings, 2 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
The parameter @file is used as backing memory for the NVDIMM, which is
divided into two parts if @configdata is true:
- the first part is (0, size - 128K], which is used as PMEM (Persistent
Memory)
- the 128K at the end of the file, which is used as the Config Data Area;
it is used to store label namespace data
@file supports both regular files and block devices; of course, we can
assign either of these two kinds of files for test and emulation.
However, in the real world, for performance reasons, we usually use
these files as the NVDIMM backing file:
- a regular file, created on the NVDIMM device on the host, in a
filesystem with DAX enabled
- the raw PMEM device on the host, e.g. /dev/pmem0
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/pc-nvdimm.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
include/hw/mem/pc-nvdimm.h | 7 +++
2 files changed, 115 insertions(+), 1 deletion(-)
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index 7a270a8..97710d1 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -22,12 +22,20 @@
* License along with this library; if not, see <http://www.gnu.org/licenses/>
*/
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+
+#include "exec/address-spaces.h"
#include "hw/mem/pc-nvdimm.h"
-#define PAGE_SIZE (1UL << 12)
+#define PAGE_SIZE (1UL << 12)
+
+#define MIN_CONFIG_DATA_SIZE (128 << 10)
static struct nvdimms_info {
ram_addr_t current_addr;
+ int device_index;
} nvdimms_info;
/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
@@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
nvdimms_info.current_addr = offset;
}
+static ram_addr_t reserved_range_push(uint64_t size)
+{
+ uint64_t current;
+
+ current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
+
+ /* do not have enough space? */
+ if (current + size < current) {
+ return 0;
+ }
+
+ nvdimms_info.current_addr = current + size;
+ return current;
+}
+
+static uint32_t new_device_index(void)
+{
+ return nvdimms_info.device_index++;
+}
+
static char *get_file(Object *obj, Error **errp)
{
PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
@@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
{
PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+ if (memory_region_size(&nvdimm->mr)) {
+ error_setg(errp, "cannot change property value");
+ return;
+ }
+
if (nvdimm->file) {
g_free(nvdimm->file);
}
@@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
set_configdata, NULL);
}
+static uint64_t get_file_size(int fd)
+{
+ struct stat stat_buf;
+ uint64_t size;
+
+ if (fstat(fd, &stat_buf) < 0) {
+ return 0;
+ }
+
+ if (S_ISREG(stat_buf.st_mode)) {
+ return stat_buf.st_size;
+ }
+
+ if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
+ return size;
+ }
+
+ return 0;
+}
+
static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
{
PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+ char name[512];
+ void *buf;
+ ram_addr_t addr;
+ uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
+ int fd;
if (!nvdimm->file) {
error_setg(errp, "file property is not set");
}
+
+ fd = open(nvdimm->file, O_RDWR);
+ if (fd < 0) {
+ error_setg(errp, "can not open %s", nvdimm->file);
+ return;
+ }
+
+ size = get_file_size(fd);
+ buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (buf == MAP_FAILED) {
+ error_setg(errp, "can not do mmap on %s", nvdimm->file);
+ goto do_close;
+ }
+
+ nvdimm->config_data_size = config_size;
+ if (nvdimm->configdata) {
+ /* reserve MIN_CONFIG_DATA_SIZE for config data. */
+ nvdimm_size = size - config_size;
+ nvdimm->config_data_addr = buf + nvdimm_size;
+ } else {
+ nvdimm_size = size;
+ nvdimm->config_data_addr = NULL;
+ }
+
+ if ((int64_t)nvdimm_size <= 0) {
+ error_setg(errp, "file size is too small to store NVDIMM"
+ " configure data");
+ goto do_unmap;
+ }
+
+ addr = reserved_range_push(nvdimm_size);
+ if (!addr) {
+ error_setg(errp, "do not have enough space for size %#lx.\n", size);
+ goto do_unmap;
+ }
+
+ nvdimm->device_index = new_device_index();
+ sprintf(name, "NVDIMM-%d", nvdimm->device_index);
+ memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
+ buf);
+ vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
+ memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
+
+ return;
+
+do_unmap:
+ munmap(buf, size);
+do_close:
+ close(fd);
}
static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index 8601e9b..f617fd2 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -21,6 +21,13 @@ typedef struct PCNVDIMMDevice {
char *file;
bool configdata;
+
+ int device_index;
+
+ uint64_t config_data_size;
+ void *config_data_addr;
+
+ MemoryRegion mr;
} PCNVDIMMDevice;
#define TYPE_PC_NVDIMM "pc-nvdimm"
--
2.4.3
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
Currently, we only support PMEM mode. Each device has 3 tables:
- SPA table: defines the PMEM region info
- MEM DEV table: it has the @handle that is used to associate the
corresponding ACPI NVDIMM device introduced in a later patch.
We can also safely ignore the memory device's interleave, since the
real nvdimm hardware access is hidden behind the host
- DCR table: it defines the Vendor ID used to associate the vendor's
nvdimm driver. Since we only implement PMEM mode this time, the Command
window and Data window are not needed
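As a rough illustration of the layout described above, the sketch below (abbreviated field lists; struct and function names are illustrative, not the patch's exact definitions) shows how each device contributes one SPA, one MEMDEV and one DCR sub-table, linked by sequentially allocated indices:

```c
#include <assert.h>
#include <stdint.h>

/* Abbreviated stand-ins for the three per-device NFIT sub-tables;
 * only the index linkage matters for this sketch. */
struct spa    { uint16_t spa_index; };
struct memdev { uint16_t spa_index; uint16_t dcr_index; };
struct dcr    { uint16_t dcr_index; };

/* Mirrors the index assignment in build_nfit_table(): indices are
 * allocated sequentially, two per device (one SPA, one DCR). */
static void link_tables(struct spa *s, struct memdev *m, struct dcr *d,
                        int nr)
{
    int index = 0;

    for (int i = 0; i < nr; i++) {
        int spa_index = ++index;
        int dcr_index = ++index;

        s[i].spa_index = spa_index;  /* SPA describes the PMEM region */
        m[i].spa_index = spa_index;  /* MEMDEV points at the SPA ...  */
        m[i].dcr_index = dcr_index;  /* ... and at the DCR            */
        d[i].dcr_index = dcr_index;  /* DCR holds the vendor info     */
    }
}
```

The MEMDEV table is the glue: it ties a physical address range (SPA) and a control region (DCR) to one handle.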
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/i386/acpi-build.c | 3 +
hw/mem/Makefile.objs | 2 +-
hw/mem/nvdimm/acpi.c | 285 +++++++++++++++++++++++++++++++++++++++++++++
hw/mem/nvdimm/internal.h | 29 +++++
hw/mem/nvdimm/pc-nvdimm.c | 27 ++++-
include/hw/mem/pc-nvdimm.h | 2 +
6 files changed, 346 insertions(+), 2 deletions(-)
create mode 100644 hw/mem/nvdimm/acpi.c
create mode 100644 hw/mem/nvdimm/internal.h
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 8ead1c1..092ed2f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -39,6 +39,7 @@
#include "hw/loader.h"
#include "hw/isa/isa.h"
#include "hw/acpi/memory_hotplug.h"
+#include "hw/mem/pc-nvdimm.h"
#include "sysemu/tpm.h"
#include "hw/acpi/tpm.h"
#include "sysemu/tpm_backend.h"
@@ -1741,6 +1742,8 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
build_dmar_q35(tables_blob, tables->linker);
}
+ pc_nvdimm_build_nfit_table(table_offsets, tables_blob, tables->linker);
+
/* Add tables supplied by user (if any) */
for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
unsigned len = acpi_table_len(u);
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 4df7482..7a6948d 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,2 @@
common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
-common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
new file mode 100644
index 0000000..f28752f
--- /dev/null
+++ b/hw/mem/nvdimm/acpi.c
@@ -0,0 +1,285 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) NFIT Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ * and the DSM specification can be found at:
+ * http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu-common.h"
+
+#include "hw/acpi/aml-build.h"
+#include "hw/mem/pc-nvdimm.h"
+
+#include "internal.h"
+
+static void nfit_spa_uuid_pm(void *uuid)
+{
+ uuid_le uuid_pm = UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d,
+ 0x33, 0x18, 0xb7, 0x8c, 0xdb);
+ memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
+}
+
+enum {
+ NFIT_TABLE_SPA = 0,
+ NFIT_TABLE_MEM = 1,
+ NFIT_TABLE_IDT = 2,
+ NFIT_TABLE_SMBIOS = 3,
+ NFIT_TABLE_DCR = 4,
+ NFIT_TABLE_BDW = 5,
+ NFIT_TABLE_FLUSH = 6,
+};
+
+enum {
+ EFI_MEMORY_UC = 0x1ULL,
+ EFI_MEMORY_WC = 0x2ULL,
+ EFI_MEMORY_WT = 0x4ULL,
+ EFI_MEMORY_WB = 0x8ULL,
+ EFI_MEMORY_UCE = 0x10ULL,
+ EFI_MEMORY_WP = 0x1000ULL,
+ EFI_MEMORY_RP = 0x2000ULL,
+ EFI_MEMORY_XP = 0x4000ULL,
+ EFI_MEMORY_NV = 0x8000ULL,
+ EFI_MEMORY_MORE_RELIABLE = 0x10000ULL,
+};
+
+/*
+ * struct nfit - Nvdimm Firmware Interface Table
+ * @signature: "NFIT"
+ */
+struct nfit {
+ ACPI_TABLE_HEADER_DEF
+ uint32_t reserved;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_spa - System Physical Address Range Structure
+ */
+struct nfit_spa {
+ uint16_t type;
+ uint16_t length;
+ uint16_t spa_index;
+ uint16_t flags;
+ uint32_t reserved;
+ uint32_t proximity_domain;
+ uint8_t type_uuid[16];
+ uint64_t spa_base;
+ uint64_t spa_length;
+ uint64_t mem_attr;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_memdev - Memory Device to SPA Map Structure
+ */
+struct nfit_memdev {
+ uint16_t type;
+ uint16_t length;
+ uint32_t nfit_handle;
+ uint16_t phys_id;
+ uint16_t region_id;
+ uint16_t spa_index;
+ uint16_t dcr_index;
+ uint64_t region_len;
+ uint64_t region_spa_offset;
+ uint64_t region_dpa;
+ uint16_t idt_index;
+ uint16_t interleave_ways;
+ uint16_t flags;
+ uint16_t reserved;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_dcr - NVDIMM Control Region Structure
+ */
+struct nfit_dcr {
+ uint16_t type;
+ uint16_t length;
+ uint16_t dcr_index;
+ uint16_t vendor_id;
+ uint16_t device_id;
+ uint16_t revision_id;
+ uint16_t sub_vendor_id;
+ uint16_t sub_device_id;
+ uint16_t sub_revision_id;
+ uint8_t reserved[6];
+ uint32_t serial_number;
+ uint16_t fic;
+ uint16_t num_bcw;
+ uint64_t bcw_size;
+ uint64_t cmd_offset;
+ uint64_t cmd_size;
+ uint64_t status_offset;
+ uint64_t status_size;
+ uint16_t flags;
+ uint8_t reserved2[6];
+} QEMU_PACKED;
+
+#define REVSISON_ID 1
+#define NFIT_FIC1 0x201
+
+#define MAX_NVDIMM_NUMBER 10
+
+static int get_nvdimm_device_number(GSList *list)
+{
+ int nr = 0;
+
+ for (; list; list = list->next) {
+ nr++;
+ }
+
+ return nr;
+}
+
+static uint32_t nvdimm_index_to_sn(int index)
+{
+ return 0x123456 + index;
+}
+
+static uint32_t nvdimm_index_to_handle(int index)
+{
+ return index + 1;
+}
+
+static size_t get_nfit_total_size(int nr)
+{
+ /* each nvdimm has 3 tables. */
+ return sizeof(struct nfit) + nr * (sizeof(struct nfit_spa) +
+ sizeof(struct nfit_memdev) + sizeof(struct nfit_dcr));
+}
+
+static int build_spa_table(void *buf, PCNVDIMMDevice *nvdimm, int spa_index)
+{
+ struct nfit_spa *nfit_spa;
+ uint64_t addr = object_property_get_int(OBJECT(&nvdimm->mr), "addr", NULL);
+
+ nfit_spa = (struct nfit_spa *)buf;
+
+ /*
+ * nfit_spa->flags is set to zero so that proximity_domain
+ * info is ignored.
+ */
+ nfit_spa->type = cpu_to_le16(NFIT_TABLE_SPA);
+ nfit_spa->length = cpu_to_le16(sizeof(*nfit_spa));
+ nfit_spa_uuid_pm(&nfit_spa->type_uuid);
+ nfit_spa->spa_index = cpu_to_le16(spa_index);
+ nfit_spa->spa_base = cpu_to_le64(addr);
+ nfit_spa->spa_length = cpu_to_le64(memory_region_size(&nvdimm->mr));
+ nfit_spa->mem_attr = cpu_to_le64(EFI_MEMORY_WB | EFI_MEMORY_NV);
+
+ return sizeof(*nfit_spa);
+}
+
+static int build_memdev_table(void *buf, PCNVDIMMDevice *nvdimm,
+ int spa_index, int dcr_index)
+{
+ struct nfit_memdev *nfit_memdev;
+ uint64_t addr = object_property_get_int(OBJECT(&nvdimm->mr), "addr", NULL);
+ uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
+
+ nfit_memdev = (struct nfit_memdev *)buf;
+ nfit_memdev->type = cpu_to_le16(NFIT_TABLE_MEM);
+ nfit_memdev->length = cpu_to_le16(sizeof(*nfit_memdev));
+ nfit_memdev->nfit_handle = cpu_to_le32(handle);
+ /* point to nfit_spa. */
+ nfit_memdev->spa_index = cpu_to_le16(spa_index);
+ /* point to nfit_dcr. */
+ nfit_memdev->dcr_index = cpu_to_le16(dcr_index);
+ nfit_memdev->region_len = cpu_to_le64(memory_region_size(&nvdimm->mr));
+ nfit_memdev->region_dpa = cpu_to_le64(addr);
+ /* Only one interleave for pmem. */
+ nfit_memdev->interleave_ways = cpu_to_le16(1);
+
+ return sizeof(*nfit_memdev);
+}
+
+static int build_dcr_table(void *buf, PCNVDIMMDevice *nvdimm, int dcr_index)
+{
+ struct nfit_dcr *nfit_dcr;
+ uint32_t sn = nvdimm_index_to_sn(nvdimm->device_index);
+
+ nfit_dcr = (struct nfit_dcr *)buf;
+ nfit_dcr->type = cpu_to_le16(NFIT_TABLE_DCR);
+ nfit_dcr->length = cpu_to_le16(sizeof(*nfit_dcr));
+ nfit_dcr->dcr_index = cpu_to_le16(dcr_index);
+ nfit_dcr->vendor_id = cpu_to_le16(0x8086);
+ nfit_dcr->device_id = cpu_to_le16(1);
+ nfit_dcr->revision_id = cpu_to_le16(REVSISON_ID);
+ nfit_dcr->serial_number = cpu_to_le32(sn);
+ nfit_dcr->fic = cpu_to_le16(NFIT_FIC1);
+
+ return sizeof(*nfit_dcr);
+}
+
+static void build_nfit_table(GSList *device_list, char *buf)
+{
+ int index = 0;
+
+ buf += sizeof(struct nfit);
+
+ for (; device_list; device_list = device_list->next) {
+ PCNVDIMMDevice *nvdimm = device_list->data;
+ int spa_index, dcr_index;
+
+ spa_index = ++index;
+ dcr_index = ++index;
+
+ /* build System Physical Address Range Description Table. */
+ buf += build_spa_table(buf, nvdimm, spa_index);
+
+ /*
+ * build Memory Device to System Physical Address Range Mapping
+ * Table.
+ */
+ buf += build_memdev_table(buf, nvdimm, spa_index, dcr_index);
+
+ /* build Control Region Descriptor Table. */
+ buf += build_dcr_table(buf, nvdimm, dcr_index);
+ }
+}
+
+void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
+ GArray *linker)
+{
+ GSList *list = get_nvdimm_built_list();
+ size_t total;
+ char *buf;
+ int nfit_start, nr;
+
+ nr = get_nvdimm_device_number(list);
+ total = get_nfit_total_size(nr);
+
+ if (nr <= 0 || nr > MAX_NVDIMM_NUMBER) {
+ goto exit;
+ }
+
+ nfit_start = table_data->len;
+ acpi_add_table(table_offsets, table_data);
+
+ buf = acpi_data_push(table_data, total);
+ build_nfit_table(list, buf);
+
+ build_header(linker, table_data, (void *)(table_data->data + nfit_start),
+ "NFIT", table_data->len - nfit_start, 1);
+exit:
+ g_slist_free(list);
+}
diff --git a/hw/mem/nvdimm/internal.h b/hw/mem/nvdimm/internal.h
new file mode 100644
index 0000000..252a222
--- /dev/null
+++ b/hw/mem/nvdimm/internal.h
@@ -0,0 +1,29 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef __NVDIMM_INTERNAL_H
+#define __NVDIMM_INTERNAL_H
+
+#define PAGE_SIZE (1UL << 12)
+
+typedef struct {
+ uint8_t b[16];
+} uuid_le;
+
+#define UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7) \
+((uuid_le) \
+{ { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
+ (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff, \
+ (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } })
+
+GSList *get_nvdimm_built_list(void);
+#endif
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index 97710d1..2a6cfa2 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -29,7 +29,7 @@
#include "exec/address-spaces.h"
#include "hw/mem/pc-nvdimm.h"
-#define PAGE_SIZE (1UL << 12)
+#include "internal.h"
#define MIN_CONFIG_DATA_SIZE (128 << 10)
@@ -65,6 +65,31 @@ static uint32_t new_device_index(void)
return nvdimms_info.device_index++;
}
+static int pc_nvdimm_built_list(Object *obj, void *opaque)
+{
+ GSList **list = opaque;
+
+ if (object_dynamic_cast(obj, TYPE_PC_NVDIMM)) {
+ PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+ /* only realized NVDIMMs matter */
+ if (memory_region_size(&nvdimm->mr)) {
+ *list = g_slist_append(*list, nvdimm);
+ }
+ }
+
+ object_child_foreach(obj, pc_nvdimm_built_list, opaque);
+ return 0;
+}
+
+GSList *get_nvdimm_built_list(void)
+{
+ GSList *list = NULL;
+
+ object_child_foreach(qdev_get_machine(), pc_nvdimm_built_list, &list);
+ return list;
+}
+
static char *get_file(Object *obj, Error **errp)
{
PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index f617fd2..b2da8fa 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -36,4 +36,6 @@ typedef struct PCNVDIMMDevice {
OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
void pc_nvdimm_reserve_range(ram_addr_t offset);
+void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
+ GArray *linker);
#endif
--
2.4.3
* [Qemu-devel] [PATCH v2 10/18] nvdimm: init the address region used by DSM method
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
This memory range is used to transfer data between ACPI in the guest and
Qemu; it occupies two pages:
- one is RAM-based, used to save the input info of the _DSM method; Qemu
reuses it to store the output info
- the other is MMIO-based; ACPI writes data to this page to transfer
control to Qemu
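The two-page layout can be sketched as follows (a simplified stand-in for the struct dsm_buffer this patch adds; field sizes follow the description above, names are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Page 1: RAM-backed, holds the _DSM input (and later the output).
 * Page 2: MMIO-backed, a write to it traps to Qemu. */
struct dsm_buffer_sketch {
    union {                       /* RAM page */
        struct {
            uint32_t handle;      /* device handle, 0 for the root */
            uint8_t  arg0[16];    /* _DSM UUID */
            uint32_t arg1;        /* revision */
            uint32_t arg2;        /* function number */
        } in;
        char ram_page[PAGE_SIZE];
    };
    union {                       /* MMIO page */
        uint32_t notify;          /* guest writes here to trap out */
        char mmio_page[PAGE_SIZE];
    };
};
```

Padding each half to a full page keeps the MMIO trigger on its own page, so plain data writes never fault while the notify write always does.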
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/acpi.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++-
hw/mem/nvdimm/internal.h | 1 +
hw/mem/nvdimm/pc-nvdimm.c | 2 +-
3 files changed, 81 insertions(+), 2 deletions(-)
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index f28752f..e0f2ad3 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -28,6 +28,7 @@
#include "qemu-common.h"
+#include "exec/address-spaces.h"
#include "hw/acpi/aml-build.h"
#include "hw/mem/pc-nvdimm.h"
@@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char *buf)
}
}
+struct dsm_buffer {
+ /* RAM page. */
+ uint32_t handle;
+ uint8_t arg0[16];
+ uint32_t arg1;
+ uint32_t arg2;
+ union {
+ char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
+ };
+
+ /* MMIO page. */
+ union {
+ uint32_t notify;
+ char padding[PAGE_SIZE];
+ };
+};
+
+static ram_addr_t dsm_addr;
+static size_t dsm_size;
+
+static uint64_t dsm_read(void *opaque, hwaddr addr,
+ unsigned size)
+{
+ return 0;
+}
+
+static void dsm_write(void *opaque, hwaddr addr,
+ uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps dsm_ops = {
+ .read = dsm_read,
+ .write = dsm_write,
+ .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static int build_dsm_buffer(void)
+{
+ MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
+ ram_addr_t addr;
+
+ QEMU_BUILD_BUG_ON(PAGE_SIZE * 2 != sizeof(struct dsm_buffer));
+
+ /* DSM buffer has already been built. */
+ if (dsm_addr) {
+ return 0;
+ }
+
+ addr = reserved_range_push(2 * PAGE_SIZE);
+ if (!addr) {
+ return -1;
+ }
+
+ dsm_addr = addr;
+ dsm_size = PAGE_SIZE * 2;
+
+ dsm_ram_mr = g_new(MemoryRegion, 1);
+ memory_region_init_ram(dsm_ram_mr, NULL, "dsm_ram", PAGE_SIZE,
+ &error_abort);
+ vmstate_register_ram_global(dsm_ram_mr);
+ memory_region_add_subregion(get_system_memory(), addr, dsm_ram_mr);
+
+ dsm_mmio_mr = g_new(MemoryRegion, 1);
+ memory_region_init_io(dsm_mmio_mr, NULL, &dsm_ops, dsm_ram_mr,
+ "dsm_mmio", PAGE_SIZE);
+ memory_region_add_subregion(get_system_memory(), addr + PAGE_SIZE,
+ dsm_mmio_mr);
+ return 0;
+}
+
void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
GArray *linker)
{
- GSList *list = get_nvdimm_built_list();
+ GSList *list;
size_t total;
char *buf;
int nfit_start, nr;
+ if (build_dsm_buffer()) {
+ fprintf(stderr, "do not have enough space for DSM buffer.\n");
+ return;
+ }
+
+ list = get_nvdimm_built_list();
nr = get_nvdimm_device_number(list);
total = get_nfit_total_size(nr);
diff --git a/hw/mem/nvdimm/internal.h b/hw/mem/nvdimm/internal.h
index 252a222..90d54dc 100644
--- a/hw/mem/nvdimm/internal.h
+++ b/hw/mem/nvdimm/internal.h
@@ -26,4 +26,5 @@ typedef struct {
(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } })
GSList *get_nvdimm_built_list(void);
+ram_addr_t reserved_range_push(uint64_t size);
#endif
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
index 2a6cfa2..752842a 100644
--- a/hw/mem/nvdimm/pc-nvdimm.c
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -45,7 +45,7 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
nvdimms_info.current_addr = offset;
}
-static ram_addr_t reserved_range_push(uint64_t size)
+ram_addr_t reserved_range_push(uint64_t size)
{
uint64_t current;
--
2.4.3
* [Qemu-devel] [PATCH v2 11/18] nvdimm: build ACPI nvdimm devices
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
NVDIMM devices are defined in ACPI 6.0, 9.20 NVDIMM Devices.
There is a root device under \_SB, and the individual NVDIMM devices sit
under that root device. Each NVDIMM device has an _ADR that returns its
handle, which is used to associate it with the MEMDEV table in the NFIT.
We reserve handle 0 for the root device. In this patch, we save the handle,
arg0, arg1 and arg2; Arg3 is conditionally saved in a later patch
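A minimal sketch of the handle convention described above (mirroring nvdimm_index_to_handle() from the NFIT patch; the root keeps handle 0, so device indices map to handles starting at 1):

```c
#include <assert.h>
#include <stdint.h>

#define ROOT_DEVICE_HANDLE 0  /* reserved for the NVDR root device */

/* Same mapping as the NFIT code: device index i -> handle i + 1,
 * so no NVDIMM device ever collides with the root's handle. */
static uint32_t nvdimm_index_to_handle(int index)
{
    return index + 1;
}
```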
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/i386/acpi-build.c | 2 +
hw/mem/nvdimm/acpi.c | 130 ++++++++++++++++++++++++++++++++++++++++++++-
include/hw/mem/pc-nvdimm.h | 2 +
3 files changed, 132 insertions(+), 2 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 092ed2f..a792135 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1342,6 +1342,8 @@ build_ssdt(GArray *table_data, GArray *linker,
aml_append(sb_scope, scope);
}
}
+
+ pc_nvdimm_build_acpi_devices(sb_scope);
aml_append(ssdt, sb_scope);
}
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index e0f2ad3..909a8ef 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -135,10 +135,11 @@ struct nfit_dcr {
uint8_t reserved2[6];
} QEMU_PACKED;
-#define REVSISON_ID 1
-#define NFIT_FIC1 0x201
+#define REVSISON_ID 1
+#define NFIT_FIC1 0x201
#define MAX_NVDIMM_NUMBER 10
+#define NOTIFY_VALUE 0x99
static int get_nvdimm_device_number(GSList *list)
{
@@ -281,12 +282,15 @@ static size_t dsm_size;
static uint64_t dsm_read(void *opaque, hwaddr addr,
unsigned size)
{
+ fprintf(stderr, "BUG: we never read DSM notification MMIO.\n");
+ assert(0);
return 0;
}
static void dsm_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
+ assert(val == NOTIFY_VALUE);
}
static const MemoryRegionOps dsm_ops = {
@@ -361,3 +365,125 @@ void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
exit:
g_slist_free(list);
}
+
+#define BUILD_STA_METHOD(_dev_, _method_) \
+ do { \
+ _method_ = aml_method("_STA", 0); \
+ aml_append(_method_, aml_return(aml_int(0x0f))); \
+ aml_append(_dev_, _method_); \
+ } while (0)
+
+#define SAVE_ARG012_HANDLE(_method_, _handle_) \
+ do { \
+ aml_append(_method_, aml_store(_handle_, aml_name("HDLE"))); \
+ aml_append(_method_, aml_store(aml_arg(0), aml_name("ARG0"))); \
+ aml_append(_method_, aml_store(aml_arg(1), aml_name("ARG1"))); \
+ aml_append(_method_, aml_store(aml_arg(2), aml_name("ARG2"))); \
+ } while (0)
+
+#define NOTIFY_AND_RETURN(_method_) \
+ do { \
+ aml_append(_method_, aml_store(aml_int(NOTIFY_VALUE), \
+ aml_name("NOTI"))); \
+ aml_append(_method_, aml_return(aml_name("ODAT"))); \
+ } while (0)
+
+static void build_nvdimm_devices(Aml *root_dev, GSList *list)
+{
+ for (; list; list = list->next) {
+ PCNVDIMMDevice *nvdimm = list->data;
+ uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
+ Aml *dev, *method;
+
+ dev = aml_device("NVD%d", nvdimm->device_index);
+ aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
+
+ BUILD_STA_METHOD(dev, method);
+
+ method = aml_method("_DSM", 4);
+ {
+ SAVE_ARG012_HANDLE(method, aml_int(handle));
+ NOTIFY_AND_RETURN(method);
+ }
+ aml_append(dev, method);
+
+ aml_append(root_dev, dev);
+ }
+}
+
+void pc_nvdimm_build_acpi_devices(Aml *sb_scope)
+{
+ Aml *dev, *method, *field;
+ struct dsm_buffer *dsm_buf;
+ GSList *list = get_nvdimm_built_list();
+ int nr = get_nvdimm_device_number(list);
+
+ if (nr <= 0 || nr > MAX_NVDIMM_NUMBER) {
+ g_slist_free(list);
+ return;
+ }
+
+ dev = aml_device("NVDR");
+ aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
+
+ /* map DSM buffer into ACPI namespace. */
+ aml_append(dev, aml_operation_region("DSMR", AML_SYSTEM_MEMORY,
+ dsm_addr, dsm_size));
+
+ /*
+ * DSM input:
+ * @HDLE: stores the device's handle; it is zero if the _DSM call
+ * happens on the root device.
+ * @ARG0 ~ @ARG3: store the parameters of the _DSM call.
+ *
+ * They are RAM-mapped on the host so these accesses never cause a VM-exit.
+ */
+ field = aml_field("DSMR", AML_DWORD_ACC, AML_PRESERVE);
+ aml_append(field, aml_named_field("HDLE",
+ sizeof(dsm_buf->handle) * BITS_PER_BYTE));
+ aml_append(field, aml_named_field("ARG0",
+ sizeof(dsm_buf->arg0) * BITS_PER_BYTE));
+ aml_append(field, aml_named_field("ARG1",
+ sizeof(dsm_buf->arg1) * BITS_PER_BYTE));
+ aml_append(field, aml_named_field("ARG2",
+ sizeof(dsm_buf->arg2) * BITS_PER_BYTE));
+ aml_append(field, aml_named_field("ARG3",
+ sizeof(dsm_buf->arg3) * BITS_PER_BYTE));
+ /*
+ * DSM input:
+ * @NOTI: writing a value to it notifies QEMU that the _DSM method is
+ * being called and that the parameters can be found in dsm_buf.
+ *
+ * It is MMIO-mapped on the host so the write causes a VM-exit and QEMU
+ * gets control.
+ */
+ aml_append(field, aml_named_field("NOTI",
+ sizeof(dsm_buf->notify) * BITS_PER_BYTE));
+ aml_append(dev, field);
+
+ /*
+ * DSM output:
+ * @ODAT: it reuses the first page of the dsm buffer; QEMU uses it to
+ * store the result.
+ *
+ * Since the first page is shared by input and output, the input data
+ * is lost once a new result is stored into @ODAT.
+ */
+ field = aml_field("DSMR", AML_DWORD_ACC, AML_PRESERVE);
+ aml_append(field, aml_named_field("ODAT", PAGE_SIZE * BITS_PER_BYTE));
+ aml_append(dev, field);
+
+ BUILD_STA_METHOD(dev, method);
+
+ method = aml_method("_DSM", 4);
+ {
+ SAVE_ARG012_HANDLE(method, aml_int(0));
+ NOTIFY_AND_RETURN(method);
+ }
+ aml_append(dev, method);
+
+ build_nvdimm_devices(dev, list);
+
+ aml_append(sb_scope, dev);
+ g_slist_free(list);
+}
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index b2da8fa..b7faec3 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -14,6 +14,7 @@
#define __PC_NVDIMM_H
#include "hw/qdev.h"
+#include "hw/acpi/aml-build.h"
typedef struct PCNVDIMMDevice {
/* private */
@@ -38,4 +39,5 @@ typedef struct PCNVDIMMDevice {
void pc_nvdimm_reserve_range(ram_addr_t offset);
void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
GArray *linker);
+void pc_nvdimm_build_acpi_devices(Aml *sb_scope);
#endif
--
2.4.3
* [Qemu-devel] [PATCH v2 12/18] nvdimm: save arg3 for NVDIMM device _DSM method
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Check whether the function (Arg2) has additional input info (Arg3) and save
the info if needed.
We only do the save on the NVDIMM device, since we are not going to support
any function on the root device
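The per-function lookup this patch encodes in the CAG3 package can be sketched in C as a simple boolean table (function numbers follow this patch's per-DIMM enum; only GET/SET_CONFIG_DATA carry an Arg3 payload):

```c
#include <assert.h>
#include <stdbool.h>

/* Per-DIMM _DSM function numbers, as in this patch's enum. */
enum {
    NFIT_CMD_IMPLEMENTED     = 0,
    NFIT_CMD_SMART           = 1,
    NFIT_CMD_SMART_THRESHOLD = 2,
    NFIT_CMD_DIMM_FLAGS      = 3,
    NFIT_CMD_GET_CONFIG_SIZE = 4,
    NFIT_CMD_GET_CONFIG_DATA = 5,
    NFIT_CMD_SET_CONFIG_DATA = 6,
};

/* true where the function takes an Arg3 buffer that must be copied
 * into the DSM buffer before notifying Qemu; all others default to
 * false. */
static const bool cmd_has_arg3[] = {
    [NFIT_CMD_GET_CONFIG_DATA] = true,
    [NFIT_CMD_SET_CONFIG_DATA] = true,
};
```

The AML in the patch does the same lookup at runtime with DeRefOf(Index(CAG3, Arg2)).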
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/acpi.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 72 insertions(+), 1 deletion(-)
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 909a8ef..0b09efa 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -259,6 +259,26 @@ static void build_nfit_table(GSList *device_list, char *buf)
}
}
+enum {
+ NFIT_CMD_IMPLEMENTED = 0,
+
+ /* bus commands */
+ NFIT_CMD_ARS_CAP = 1,
+ NFIT_CMD_ARS_START = 2,
+ NFIT_CMD_ARS_QUERY = 3,
+
+ /* per-dimm commands */
+ NFIT_CMD_SMART = 1,
+ NFIT_CMD_SMART_THRESHOLD = 2,
+ NFIT_CMD_DIMM_FLAGS = 3,
+ NFIT_CMD_GET_CONFIG_SIZE = 4,
+ NFIT_CMD_GET_CONFIG_DATA = 5,
+ NFIT_CMD_SET_CONFIG_DATA = 6,
+ NFIT_CMD_VENDOR_EFFECT_LOG_SIZE = 7,
+ NFIT_CMD_VENDOR_EFFECT_LOG = 8,
+ NFIT_CMD_VENDOR = 9,
+};
+
struct dsm_buffer {
/* RAM page. */
uint32_t handle;
@@ -366,6 +386,19 @@ exit:
g_slist_free(list);
}
+static bool device_cmd_has_arg3[] = {
+ false, /* NFIT_CMD_IMPLEMENTED */
+ false, /* NFIT_CMD_SMART */
+ false, /* NFIT_CMD_SMART_THRESHOLD */
+ false, /* NFIT_CMD_DIMM_FLAGS */
+ false, /* NFIT_CMD_GET_CONFIG_SIZE */
+ true, /* NFIT_CMD_GET_CONFIG_DATA */
+ true, /* NFIT_CMD_SET_CONFIG_DATA */
+ false, /* NFIT_CMD_VENDOR_EFFECT_LOG_SIZE */
+ false, /* NFIT_CMD_VENDOR_EFFECT_LOG */
+ false, /* NFIT_CMD_VENDOR */
+};
+
#define BUILD_STA_METHOD(_dev_, _method_) \
do { \
_method_ = aml_method("_STA", 0); \
@@ -390,10 +423,20 @@ exit:
static void build_nvdimm_devices(Aml *root_dev, GSList *list)
{
+ Aml *has_arg3;
+ int i, cmd_nr;
+
+ cmd_nr = ARRAY_SIZE(device_cmd_has_arg3);
+ has_arg3 = aml_package(cmd_nr);
+ for (i = 0; i < cmd_nr; i++) {
+ aml_append(has_arg3, aml_int(device_cmd_has_arg3[i]));
+ }
+ aml_append(root_dev, aml_name_decl("CAG3", has_arg3));
+
for (; list; list = list->next) {
PCNVDIMMDevice *nvdimm = list->data;
uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
- Aml *dev, *method;
+ Aml *dev, *method, *ifctx;
dev = aml_device("NVD%d", nvdimm->device_index);
aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
@@ -403,6 +446,34 @@ static void build_nvdimm_devices(Aml *root_dev, GSList *list)
method = aml_method("_DSM", 4);
{
SAVE_ARG012_HANDLE(method, aml_int(handle));
+
+ /* Local5 = DeRefOf(Index(CAG3, Arg2)) */
+ aml_append(method,
+ aml_store(aml_derefof(aml_index(aml_name("CAG3"),
+ aml_arg(2))), aml_local(5)));
+ /* if 0 < local5 */
+ ifctx = aml_if(aml_lless(aml_int(0), aml_local(5)));
+ {
+ /* Local0 = Index(Arg3, 0) */
+ aml_append(ifctx, aml_store(aml_index(aml_arg(3), aml_int(0)),
+ aml_local(0)));
+ /* Local1 = sizeof(Local0) */
+ aml_append(ifctx, aml_store(aml_sizeof(aml_local(0)),
+ aml_local(1)));
+ /* Local2 = Local1 << 3 */
+ aml_append(ifctx, aml_store(aml_shiftleft(aml_local(1),
+ aml_int(3)), aml_local(2)));
+ /* Local3 = DeRefOf(Local0) */
+ aml_append(ifctx, aml_store(aml_derefof(aml_local(0)),
+ aml_local(3)));
+ /* CreateField(Local3, 0, local2, IBUF) */
+ aml_append(ifctx, aml_create_field(aml_local(3),
+ aml_int(0), aml_local(2), "IBUF"));
+ /* ARG3 = IBUF */
+ aml_append(ifctx, aml_store(aml_name("IBUF"),
+ aml_name("ARG3")));
+ }
+ aml_append(method, ifctx);
NOTIFY_AND_RETURN(method);
}
aml_append(dev, method);
--
2.4.3
* [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
If @configdata is false, Qemu will build a static, read-only namespace in
memory and use it to serve
DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests
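A hedged sketch of how such a static, read-only blob can back the two config-data functions (the 128K size is MIN_CONFIG_DATA_SIZE from this series; the helper names are illustrative, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CONFIG_SIZE (128 * 1024)   /* MIN_CONFIG_DATA_SIZE in the series */

static uint8_t config_blob[CONFIG_SIZE];  /* built once, then read-only */

static uint32_t dsm_get_config_size(void)
{
    return CONFIG_SIZE;
}

/* Copy @len bytes at @offset out of the blob; reject out-of-range
 * (and overflowing) requests, since the guest controls both values. */
static int dsm_get_config_data(uint32_t offset, uint32_t len, uint8_t *out)
{
    if (offset > CONFIG_SIZE || len > CONFIG_SIZE - offset) {
        return -1;
    }
    memcpy(out, config_blob + offset, len);
    return 0;
}
```

Because the namespace is read-only, SET_CONFIG_DATA can simply be rejected in this mode.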
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/Makefile.objs | 3 +-
hw/mem/nvdimm/acpi.c | 10 ++
hw/mem/nvdimm/internal.h | 12 ++
hw/mem/nvdimm/namespace.c | 307 +++++++++++++++++++++++++++++++++++++++++++++
include/hw/mem/pc-nvdimm.h | 2 +
5 files changed, 333 insertions(+), 1 deletion(-)
create mode 100644 hw/mem/nvdimm/namespace.c
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 7a6948d..7f3fab2 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,3 @@
common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
-common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o \
+ nvdimm/namespace.o
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 0b09efa..c773954 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -240,6 +240,8 @@ static void build_nfit_table(GSList *device_list, char *buf)
for (; device_list; device_list = device_list->next) {
PCNVDIMMDevice *nvdimm = device_list->data;
+ struct nfit_memdev *nfit_memdev;
+ struct nfit_dcr *nfit_dcr;
int spa_index, dcr_index;
spa_index = ++index;
@@ -252,10 +254,15 @@ static void build_nfit_table(GSList *device_list, char *buf)
* build Memory Device to System Physical Address Range Mapping
* Table.
*/
+ nfit_memdev = (struct nfit_memdev *)buf;
buf += build_memdev_table(buf, nvdimm, spa_index, dcr_index);
/* build Control Region Descriptor Table. */
+ nfit_dcr = (struct nfit_dcr *)buf;
buf += build_dcr_table(buf, nvdimm, dcr_index);
+
+ calculate_nvdimm_isetcookie(nvdimm, nfit_memdev->region_spa_offset,
+ nfit_dcr->serial_number);
}
}
@@ -382,6 +389,9 @@ void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
build_header(linker, table_data, (void *)(table_data->data + nfit_start),
"NFIT", table_data->len - nfit_start, 1);
+
+ build_nvdimm_configdata(list);
+
exit:
g_slist_free(list);
}
diff --git a/hw/mem/nvdimm/internal.h b/hw/mem/nvdimm/internal.h
index 90d54dc..b1f3f16 100644
--- a/hw/mem/nvdimm/internal.h
+++ b/hw/mem/nvdimm/internal.h
@@ -13,6 +13,14 @@
#ifndef __NVDIMM_INTERNAL_H
#define __NVDIMM_INTERNAL_H
+/* #define NVDIMM_DEBUG */
+
+#ifdef NVDIMM_DEBUG
+#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
+#else
+#define nvdebug(...)
+#endif
+
#define PAGE_SIZE (1UL << 12)
typedef struct {
@@ -27,4 +35,8 @@ typedef struct {
GSList *get_nvdimm_built_list(void);
ram_addr_t reserved_range_push(uint64_t size);
+
+void calculate_nvdimm_isetcookie(PCNVDIMMDevice *nvdimm, uint64_t spa,
+ uint32_t sn);
+void build_nvdimm_configdata(GSList *device_list);
#endif
diff --git a/hw/mem/nvdimm/namespace.c b/hw/mem/nvdimm/namespace.c
new file mode 100644
index 0000000..04626da
--- /dev/null
+++ b/hw/mem/nvdimm/namespace.c
@@ -0,0 +1,307 @@
+/*
+ * NVDIMM Namespace Support
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ * Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * NVDIMM namespace specification can be found at:
+ * http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem/pc-nvdimm.h"
+
+#include "internal.h"
+
+static uint64_t fletcher64(void *addr, size_t len)
+{
+ uint32_t *buf = addr;
+ uint32_t lo32 = 0;
+ uint64_t hi32 = 0;
+ int i;
+
+ for (i = 0; i < len / sizeof(uint32_t); i++) {
+ lo32 += cpu_to_le32(buf[i]);
+ hi32 += lo32;
+ }
+
+ return hi32 << 32 | lo32;
+}
+
+struct interleave_set_info {
+ struct interleave_set_info_map {
+ uint64_t region_spa_offset;
+ uint32_t serial_number;
+ uint32_t zero;
+ } mapping[1];
+};
+
+void calculate_nvdimm_isetcookie(PCNVDIMMDevice *nvdimm, uint64_t spa,
+ uint32_t sn)
+{
+ struct interleave_set_info info;
+
+ info.mapping[0].region_spa_offset = spa;
+ info.mapping[0].serial_number = sn;
+ info.mapping[0].zero = 0;
+
+ nvdimm->isetcookie = fletcher64(&info, sizeof(info));
+}
+
+#define NSINDEX_SIGNATURE "NAMESPACE_INDEX\0"
+
+enum {
+ NSINDEX_SIG_LEN = 16,
+ NSINDEX_ALIGN = 256,
+ NSINDEX_SEQ_MASK = 0x3,
+ NSINDEX_MAJOR = 0x1,
+ NSINDEX_MINOR = 0x1,
+
+ NSLABEL_UUID_LEN = 16,
+ NSLABEL_NAME_LEN = 64,
+ NSLABEL_FLAG_ROLABEL = 0x1, /* read-only label */
+ NSLABEL_FLAG_LOCAL = 0x2, /* DIMM-local namespace */
+ NSLABEL_FLAG_BTT = 0x4, /* namespace contains a BTT */
+ NSLABEL_FLAG_UPDATING = 0x8, /* label being updated */
+};
+
+/*
+ * struct nd_namespace_index - label set superblock
+ * @sig: NAMESPACE_INDEX\0
+ * @flags: placeholder
+ * @seq: sequence number for this index
+ * @myoff: offset of this index in label area
+ * @mysize: size of this index struct
+ * @otheroff: offset of other index
+ * @labeloff: offset of first label slot
+ * @nslot: total number of label slots
+ * @major: label area major version
+ * @minor: label area minor version
+ * @checksum: fletcher64 of all fields
+ * @free[0]: bitmap, nlabel bits
+ *
+ * The size of free[] is rounded up so the total struct size is a
+ * multiple of NSINDEX_ALIGN bytes. Any bits this allocates beyond
+ * nlabel bits must be zero.
+ */
+struct namespace_label_index_block {
+ uint8_t sig[NSINDEX_SIG_LEN];
+ uint32_t flags;
+ uint32_t seq;
+ uint64_t myoff;
+ uint64_t mysize;
+ uint64_t otheroff;
+ uint64_t labeloff;
+ uint32_t nlabel;
+ uint16_t major;
+ uint16_t minor;
+ uint64_t checksum;
+ uint8_t free[0];
+} QEMU_PACKED;
+
+/*
+ * struct nd_namespace_label - namespace superblock
+ * @uuid: UUID per RFC 4122
+ * @name: optional name (NULL-terminated)
+ * @flags: see NSLABEL_FLAG_*
+ * @nlabel: num labels to describe this ns
+ * @position: labels position in set
+ * @isetcookie: interleave set cookie
+ * @lbasize: LBA size in bytes or 0 for pmem
+ * @dpa: DPA of NVM range on this DIMM
+ * @rawsize: size of namespace
+ * @slot: slot of this label in label area
+ * @unused: must be zero
+ */
+struct namespace_label {
+ uint8_t uuid[NSLABEL_UUID_LEN];
+ uint8_t name[NSLABEL_NAME_LEN];
+ uint32_t flags;
+ uint16_t nlabel;
+ uint16_t position;
+ uint64_t isetcookie;
+ uint64_t lbasize;
+ uint64_t dpa;
+ uint64_t rawsize;
+ uint32_t slot;
+ uint32_t unused;
+} QEMU_PACKED;
+
+/* calculate the number of labels that can fit in the whole config space. */
+static int config_space_max_label_nr(PCNVDIMMDevice *nvdimm, size_t block_size)
+{
+ /* in total we have 2 namespace label index blocks. */
+ if (block_size * 2 >= nvdimm->config_data_size) {
+ return 0;
+ }
+
+ return (nvdimm->config_data_size - block_size * 2) /
+ sizeof(struct namespace_label);
+}
+
+/* calculate the number of labels that can fit in an index block. */
+static int label_index_block_max_label_nr(size_t block_size)
+{
+ int free_size;
+
+ free_size = block_size - sizeof(struct namespace_label_index_block);
+
+ return free_size * BITS_PER_BYTE;
+}
+
+static int calculate_max_label_nr(PCNVDIMMDevice *nvdimm, size_t block_size)
+{
+ return MIN(label_index_block_max_label_nr(block_size),
+ config_space_max_label_nr(nvdimm, block_size));
+}
+
+/*
+ * check if we can increase the size of namespace_label_index_block to
+ * contain more labels.
+ */
+static bool can_increase_index_block(PCNVDIMMDevice *nvdimm,
+ size_t block_size, int label_nr)
+{
+ size_t remaining;
+
+ remaining = nvdimm->config_data_size - block_size * 2 -
+ label_nr * sizeof(struct namespace_label);
+
+ assert((int64_t)remaining >= 0);
+
+ /* can contain at least 1 label. */
+ return remaining >= NSINDEX_ALIGN * 2 + sizeof(struct namespace_label);
+}
+
+static void count_label_nr(PCNVDIMMDevice *nvdimm, size_t *label_block_size,
+ int *label_nr)
+{
+ *label_block_size = 0;
+
+ do {
+ /*
+ * The minimum size of an index block is 256 bytes and the size must
+ * be a multiple of 256 bytes.
+ */
+ *label_block_size += NSINDEX_ALIGN;
+
+ *label_nr = calculate_max_label_nr(nvdimm, *label_block_size);
+ } while (can_increase_index_block(nvdimm, *label_block_size, *label_nr));
+}
+
+static void namespace_label_uuid(PCNVDIMMDevice *nvdimm, void *uuid)
+{
+ uuid_le label_uuid_init = UUID_LE(0x137e67a9, 0x7dcb, 0x4c66, 0xb2,
+ 0xe6, 0x05, 0x06, 0x5b, 0xeb,
+ 0x6a, 0x00);
+
+ assert(nvdimm->device_index <= 0xff);
+
+ label_uuid_init.b[0] += nvdimm->device_index;
+ memcpy(uuid, &label_uuid_init, sizeof(label_uuid_init));
+}
+
+static void init_namespace(PCNVDIMMDevice *nvdimm)
+{
+ struct namespace_label_index_block *index1, *index2;
+ struct namespace_label *label;
+ int i;
+
+ size_t label_block_size;
+ int label_nr;
+
+ assert(!nvdimm->configdata);
+
+ count_label_nr(nvdimm, &label_block_size, &label_nr);
+ nvdebug("nvdimm%d: label_block_size 0x%lx label_nr %d.\n",
+ nvdimm->device_index, label_block_size, label_nr);
+
+ index1 = nvdimm->config_data_addr;
+
+ /*
+ * init the first namespace label index block, except @otheroff
+ * and @checksum; they are filled in later.
+ */
+ memcpy(index1->sig, NSINDEX_SIGNATURE, sizeof(NSINDEX_SIGNATURE));
+ index1->flags = cpu_to_le32(0);
+ index1->seq = cpu_to_le32(0x1);
+ index1->myoff = cpu_to_le64(0);
+ index1->mysize = cpu_to_le64(label_block_size);
+ index1->labeloff = cpu_to_le64(label_block_size * 2);
+ index1->nlabel = cpu_to_le32(label_nr);
+ index1->major = cpu_to_le16(NSINDEX_MAJOR);
+ index1->minor = cpu_to_le16(NSINDEX_MINOR);
+ index1->checksum = cpu_to_le64(0);
+ memset(index1->free, 0,
+ label_block_size - sizeof(struct namespace_label_index_block));
+
+ /*
+ * the label slot with the lowest offset in the label storage area is
+ * tracked by the least significant bit of the first byte of the free
+ * array.
+ *
+ * the first label is used.
+ */
+ for (i = 1; i < index1->nlabel; i++) {
+ set_bit(i, (unsigned long *)index1->free);
+ }
+
+ /* init the second namespace label index block. */
+ index2 = (void *)index1 + label_block_size;
+ memcpy(index2, index1, label_block_size);
+ index2->seq = cpu_to_le32(0x2);
+ index2->myoff = cpu_to_le64(label_block_size);
+
+ /* init @otheroff and @checksum. */
+ index1->otheroff = cpu_to_le64(index2->myoff);
+ index2->otheroff = cpu_to_le64(index1->myoff);
+ index1->checksum = cpu_to_le64(fletcher64(index1, label_block_size));
+ index2->checksum = cpu_to_le64(fletcher64(index2, label_block_size));
+
+ /* only one label is used which is the first label and is readonly. */
+ label = nvdimm->config_data_addr + label_block_size * 2;
+ namespace_label_uuid(nvdimm, label->uuid);
+ sprintf((char *)label->name, "QEMU NS%d", nvdimm->device_index);
+ label->flags = cpu_to_le32(NSLABEL_FLAG_ROLABEL);
+ label->nlabel = cpu_to_le16(1);
+ label->position = cpu_to_le16(0);
+ label->isetcookie = cpu_to_le64(nvdimm->isetcookie);
+ label->lbasize = cpu_to_le64(0);
+ label->dpa = cpu_to_le64(object_property_get_int(OBJECT(&nvdimm->mr),
+ "addr", NULL));
+ label->rawsize = cpu_to_le64(memory_region_size(&nvdimm->mr));
+ label->slot = cpu_to_le32(0);
+ label->unused = cpu_to_le32(0);
+
+ nvdebug("nvdimm%d, checksum1 0x%lx checksum2 0x%lx isetcookie 0x%lx.\n",
+ nvdimm->device_index, index1->checksum, index2->checksum,
+ label->isetcookie);
+}
+
+void build_nvdimm_configdata(GSList *device_list)
+{
+ for (; device_list; device_list = device_list->next) {
+ PCNVDIMMDevice *nvdimm = device_list->data;
+
+ if (nvdimm->config_data_addr) {
+ return;
+ }
+
+ nvdimm->config_data_addr = g_malloc(nvdimm->config_data_size);
+ init_namespace(nvdimm);
+ }
+}
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index b7faec3..8aa7086 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -28,6 +28,8 @@ typedef struct PCNVDIMMDevice {
uint64_t config_data_size;
void *config_data_addr;
+ uint64_t isetcookie;
+
MemoryRegion mr;
} PCNVDIMMDevice;
--
2.4.3
* [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (12 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-25 16:23 ` Stefan Hajnoczi
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function Xiao Guangrong
` (4 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
_DSM is defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method)
Function 0 is a query function. We do not support any function on the root
device, and only 3 functions are supported for the NVDIMM device:
NFIT_CMD_GET_CONFIG_SIZE, NFIT_CMD_GET_CONFIG_DATA and
NFIT_CMD_SET_CONFIG_DATA; that is, we currently only allow access to the
device's Label Namespace
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/acpi.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 152 insertions(+)
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index c773954..20aefce 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -31,6 +31,7 @@
#include "exec/address-spaces.h"
#include "hw/acpi/aml-build.h"
#include "hw/mem/pc-nvdimm.h"
+#include "sysemu/sysemu.h"
#include "internal.h"
@@ -41,6 +42,22 @@ static void nfit_spa_uuid_pm(void *uuid)
memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
}
+static bool dsm_is_root_uuid(uint8_t *uuid)
+{
+ uuid_le uuid_root = UUID_LE(0x2f10e7a4, 0x9e91, 0x11e4, 0x89,
+ 0xd3, 0x12, 0x3b, 0x93, 0xf7, 0x5c, 0xba);
+
+ return !memcmp(uuid, &uuid_root, sizeof(uuid_root));
+}
+
+static bool dsm_is_dimm_uuid(uint8_t *uuid)
+{
+ uuid_le uuid_dimm = UUID_LE(0x4309ac30, 0x0d11, 0x11e4, 0x91,
+ 0x91, 0x08, 0x00, 0x20, 0x0c, 0x9a, 0x66);
+
+ return !memcmp(uuid, &uuid_dimm, sizeof(uuid_dimm));
+}
+
enum {
NFIT_TABLE_SPA = 0,
NFIT_TABLE_MEM = 1,
@@ -162,6 +179,20 @@ static uint32_t nvdimm_index_to_handle(int index)
return index + 1;
}
+static PCNVDIMMDevice
+*get_nvdimm_device_by_handle(GSList *list, uint32_t handle)
+{
+ for (; list; list = list->next) {
+ PCNVDIMMDevice *nvdimm = list->data;
+
+ if (nvdimm_index_to_handle(nvdimm->device_index) == handle) {
+ return nvdimm;
+ }
+ }
+
+ return NULL;
+}
+
static size_t get_nfit_total_size(int nr)
{
/* each nvdimm has 3 tables. */
@@ -286,6 +317,23 @@ enum {
NFIT_CMD_VENDOR = 9,
};
+enum {
+ NFIT_STATUS_SUCCESS = 0,
+ NFIT_STATUS_NOT_SUPPORTED = 1,
+ NFIT_STATUS_NON_EXISTING_MEM_DEV = 2,
+ NFIT_STATUS_INVALID_PARAS = 3,
+ NFIT_STATUS_VENDOR_SPECIFIC_ERROR = 4,
+};
+
+#define DSM_REVISION (1)
+
+/* do not support any command except NFIT_CMD_IMPLEMENTED on root. */
+#define ROOT_SUPPORT_CMD (1 << NFIT_CMD_IMPLEMENTED)
+/* support NFIT_CMD_SET_CONFIG_DATA iff nvdimm->configdata is true. */
+#define DIMM_SUPPORT_CMD ((1 << NFIT_CMD_IMPLEMENTED) \
+ | (1 << NFIT_CMD_GET_CONFIG_SIZE) \
+ | (1 << NFIT_CMD_GET_CONFIG_DATA))
+
struct dsm_buffer {
/* RAM page. */
uint32_t handle;
@@ -306,6 +354,18 @@ struct dsm_buffer {
static ram_addr_t dsm_addr;
static size_t dsm_size;
+struct cmd_out_implemented {
+ uint64_t cmd_list;
+};
+
+struct dsm_out {
+ union {
+ uint32_t status;
+ struct cmd_out_implemented cmd_implemented;
+ uint8_t data[PAGE_SIZE];
+ };
+};
+
static uint64_t dsm_read(void *opaque, hwaddr addr,
unsigned size)
{
@@ -314,10 +374,102 @@ static uint64_t dsm_read(void *opaque, hwaddr addr,
return 0;
}
+static void dsm_write_root(struct dsm_buffer *in, struct dsm_out *out)
+{
+ uint32_t function = in->arg2;
+
+ if (function == NFIT_CMD_IMPLEMENTED) {
+ out->cmd_implemented.cmd_list = cpu_to_le64(ROOT_SUPPORT_CMD);
+ return;
+ }
+
+ out->status = cpu_to_le32(NFIT_STATUS_NOT_SUPPORTED);
+ nvdebug("Return status %#x.\n", out->status);
+}
+
+static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
+{
+ GSList *list = get_nvdimm_built_list();
+ PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
+ uint32_t function = in->arg2;
+ uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
+ uint64_t cmd_list;
+
+ if (!nvdimm) {
+ goto set_status_free;
+ }
+
+ switch (function) {
+ case NFIT_CMD_IMPLEMENTED:
+ cmd_list = DIMM_SUPPORT_CMD;
+ if (nvdimm->configdata) {
+ cmd_list |= 1 << NFIT_CMD_SET_CONFIG_DATA;
+ }
+
+ out->cmd_implemented.cmd_list = cpu_to_le64(cmd_list);
+ goto free;
+ default:
+ status = NFIT_STATUS_NOT_SUPPORTED;
+ };
+
+ nvdebug("Return status %#x.\n", status);
+
+set_status_free:
+ out->status = cpu_to_le32(status);
+free:
+ g_slist_free(list);
+}
+
static void dsm_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
+ struct MemoryRegion *dsm_ram_mr = opaque;
+ struct dsm_buffer *dsm;
+ struct dsm_out *out;
+ void *buf;
+
assert(val == NOTIFY_VALUE);
+
+ buf = memory_region_get_ram_ptr(dsm_ram_mr);
+ dsm = buf;
+ out = buf;
+
+ le32_to_cpus(&dsm->handle);
+ le32_to_cpus(&dsm->arg1);
+ le32_to_cpus(&dsm->arg2);
+
+ nvdebug("Arg0 " UUID_FMT ".\n", dsm->arg0[0], dsm->arg0[1], dsm->arg0[2],
+ dsm->arg0[3], dsm->arg0[4], dsm->arg0[5], dsm->arg0[6],
+ dsm->arg0[7], dsm->arg0[8], dsm->arg0[9], dsm->arg0[10],
+ dsm->arg0[11], dsm->arg0[12], dsm->arg0[13], dsm->arg0[14],
+ dsm->arg0[15]);
+ nvdebug("Handler %#x, Arg1 %#x, Arg2 %#x.\n", dsm->handle, dsm->arg1,
+ dsm->arg2);
+
+ if (dsm->arg1 != DSM_REVISION) {
+ nvdebug("Revision %#x is not supported, expect %#x.\n",
+ dsm->arg1, DSM_REVISION);
+ goto exit;
+ }
+
+ if (!dsm->handle) {
+ if (!dsm_is_root_uuid(dsm->arg0)) {
+ nvdebug("Root UUID does not match.\n");
+ goto exit;
+ }
+
+ return dsm_write_root(dsm, out);
+ }
+
+ if (!dsm_is_dimm_uuid(dsm->arg0)) {
+ nvdebug("DIMM UUID does not match.\n");
+ goto exit;
+ }
+
+ return dsm_write_nvdimm(dsm, out);
+
+exit:
+ out->status = cpu_to_le32(NFIT_STATUS_NOT_SUPPORTED);
}
static const MemoryRegionOps dsm_ops = {
--
2.4.3
* [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (13 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-25 16:24 ` Stefan Hajnoczi
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 16/18] nvdimm: support NFIT_CMD_GET_CONFIG_DATA Xiao Guangrong
` (3 subsequent siblings)
18 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Function 4 is used to get the Namespace Label size
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/acpi.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 20aefce..0a5f2c2 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -334,6 +334,17 @@ enum {
| (1 << NFIT_CMD_GET_CONFIG_SIZE) \
| (1 << NFIT_CMD_GET_CONFIG_DATA))
+struct cmd_in_get_config_data {
+ uint32_t offset;
+ uint32_t length;
+} QEMU_PACKED;
+
+struct cmd_in_set_config_data {
+ uint32_t offset;
+ uint32_t length;
+ uint8_t in_buf[0];
+} QEMU_PACKED;
+
struct dsm_buffer {
/* RAM page. */
uint32_t handle;
@@ -341,6 +352,7 @@ struct dsm_buffer {
uint32_t arg1;
uint32_t arg2;
union {
+ struct cmd_in_set_config_data cmd_config_set;
char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
};
@@ -358,10 +370,23 @@ struct cmd_out_implemented {
uint64_t cmd_list;
};
+struct cmd_out_get_config_size {
+ uint32_t status;
+ uint32_t config_size;
+ uint32_t max_xfer;
+} QEMU_PACKED;
+
+struct cmd_out_get_config_data {
+ uint32_t status;
+ uint8_t out_buf[0];
+} QEMU_PACKED;
+
struct dsm_out {
union {
uint32_t status;
struct cmd_out_implemented cmd_implemented;
+ struct cmd_out_get_config_size cmd_config_size;
+ struct cmd_out_get_config_data cmd_config_get;
uint8_t data[PAGE_SIZE];
};
};
@@ -387,6 +412,48 @@ static void dsm_write_root(struct dsm_buffer *in, struct dsm_out *out)
nvdebug("Return status %#x.\n", out->status);
}
+/*
+ * the max transfer size is the max size transferred by both a
+ * NFIT_CMD_GET_CONFIG_DATA and a NFIT_CMD_SET_CONFIG_DATA
+ * command.
+ */
+static uint32_t max_xfer_config_size(void)
+{
+ struct dsm_buffer *in;
+ struct dsm_out *out;
+ uint32_t max_get_size, max_set_size;
+
+ /*
+ * the max data ACPI can read at one time, which is transferred by
+ * the response of NFIT_CMD_GET_CONFIG_DATA.
+ */
+ max_get_size = sizeof(out->data) - sizeof(out->cmd_config_get);
+
+ /*
+ * the max data ACPI can write at one time, which is transferred by
+ * NFIT_CMD_SET_CONFIG_DATA
+ */
+ max_set_size = sizeof(in->arg3) - sizeof(in->cmd_config_set);
+ return MIN(max_get_size, max_set_size);
+}
+
+static uint32_t
+dsm_cmd_config_size(PCNVDIMMDevice *nvdimm, struct dsm_buffer *in,
+ struct dsm_out *out)
+{
+ uint32_t config_size, mxfer;
+
+ config_size = nvdimm->config_data_size;
+ mxfer = max_xfer_config_size();
+
+ out->cmd_config_size.config_size = cpu_to_le32(config_size);
+ out->cmd_config_size.max_xfer = cpu_to_le32(mxfer);
+ nvdebug("%s config_size %#x, max_xfer %#x.\n", __func__, config_size,
+ mxfer);
+
+ return NFIT_STATUS_SUCCESS;
+}
+
static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
{
GSList *list = get_nvdimm_built_list();
@@ -408,6 +475,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
out->cmd_implemented.cmd_list = cpu_to_le64(cmd_list);
goto free;
+ case NFIT_CMD_GET_CONFIG_SIZE:
+ status = dsm_cmd_config_size(nvdimm, in, out);
+ break;
default:
status = NFIT_STATUS_NOT_SUPPORTED;
};
--
2.4.3
* [Qemu-devel] [PATCH v2 16/18] nvdimm: support NFIT_CMD_GET_CONFIG_DATA
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (14 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 17/18] nvdimm: support NFIT_CMD_SET_CONFIG_DATA Xiao Guangrong
` (2 subsequent siblings)
18 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Function 5 is used to get the Namespace Label Data
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/acpi.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 0a5f2c2..517d710 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -352,6 +352,7 @@ struct dsm_buffer {
uint32_t arg1;
uint32_t arg2;
union {
+ struct cmd_in_get_config_data cmd_config_get;
struct cmd_in_set_config_data cmd_config_set;
char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
};
@@ -454,6 +455,34 @@ dsm_cmd_config_size(PCNVDIMMDevice *nvdimm, struct dsm_buffer *in,
return NFIT_STATUS_SUCCESS;
}
+static uint32_t
+dsm_cmd_config_get(PCNVDIMMDevice *nvdimm, struct dsm_buffer *in,
+ struct dsm_out *out)
+{
+ struct cmd_in_get_config_data *cmd_in = &in->cmd_config_get;
+ uint32_t status;
+
+ le32_to_cpus(&cmd_in->length);
+ le32_to_cpus(&cmd_in->offset);
+
+ nvdebug("Read Config: offset %#x length %#x.\n", cmd_in->offset,
+ cmd_in->length);
+
+ if (nvdimm->config_data_size < cmd_in->length + cmd_in->offset) {
+ nvdebug("position %#x is beyond config data (len = %#lx).\n",
+ cmd_in->length + cmd_in->offset, nvdimm->config_data_size);
+ status = NFIT_STATUS_INVALID_PARAS;
+ goto exit;
+ }
+
+ status = NFIT_STATUS_SUCCESS;
+ memcpy(out->cmd_config_get.out_buf, nvdimm->config_data_addr +
+ cmd_in->offset, cmd_in->length);
+
+exit:
+ return status;
+}
+
static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
{
GSList *list = get_nvdimm_built_list();
@@ -478,6 +507,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
case NFIT_CMD_GET_CONFIG_SIZE:
status = dsm_cmd_config_size(nvdimm, in, out);
break;
+ case NFIT_CMD_GET_CONFIG_DATA:
+ status = dsm_cmd_config_get(nvdimm, in, out);
+ break;
default:
status = NFIT_STATUS_NOT_SUPPORTED;
};
--
2.4.3
* [Qemu-devel] [PATCH v2 17/18] nvdimm: support NFIT_CMD_SET_CONFIG_DATA
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (15 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 16/18] nvdimm: support NFIT_CMD_GET_CONFIG_DATA Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 18/18] nvdimm: add maintain info Xiao Guangrong
2015-08-25 16:26 ` [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Stefan Hajnoczi
18 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Function 6 is used to set the Namespace Label Data
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
hw/mem/nvdimm/acpi.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
index 517d710..283228d 100644
--- a/hw/mem/nvdimm/acpi.c
+++ b/hw/mem/nvdimm/acpi.c
@@ -382,12 +382,17 @@ struct cmd_out_get_config_data {
uint8_t out_buf[0];
} QEMU_PACKED;
+struct cmd_out_set_config_data {
+ uint32_t status;
+} QEMU_PACKED;
+
struct dsm_out {
union {
uint32_t status;
struct cmd_out_implemented cmd_implemented;
struct cmd_out_get_config_size cmd_config_size;
struct cmd_out_get_config_data cmd_config_get;
+ struct cmd_out_set_config_data cmd_config_set;
uint8_t data[PAGE_SIZE];
};
};
@@ -483,6 +488,38 @@ exit:
return status;
}
+static uint32_t
+dsm_cmd_config_set(PCNVDIMMDevice *nvdimm, struct dsm_buffer *in,
+ struct dsm_out *out)
+{
+ struct cmd_in_set_config_data *cmd_in = &in->cmd_config_set;
+ uint32_t status;
+
+ if (!nvdimm->configdata) {
+ status = NFIT_STATUS_NOT_SUPPORTED;
+ goto exit;
+ }
+
+ le32_to_cpus(&cmd_in->length);
+ le32_to_cpus(&cmd_in->offset);
+
+ nvdebug("Write Config: offset %#x length %#x.\n", cmd_in->offset,
+ cmd_in->length);
+ if (nvdimm->config_data_size < cmd_in->length + cmd_in->offset) {
+ nvdebug("position %#x is beyond config data (len = %#lx).\n",
+ cmd_in->length + cmd_in->offset, nvdimm->config_data_size);
+ status = NFIT_STATUS_INVALID_PARAS;
+ goto exit;
+ }
+
+ status = NFIT_STATUS_SUCCESS;
+ memcpy(nvdimm->config_data_addr + cmd_in->offset, cmd_in->in_buf,
+ cmd_in->length);
+
+exit:
+ return status;
+}
+
static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
{
GSList *list = get_nvdimm_built_list();
@@ -510,6 +547,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
case NFIT_CMD_GET_CONFIG_DATA:
status = dsm_cmd_config_get(nvdimm, in, out);
break;
+ case NFIT_CMD_SET_CONFIG_DATA:
+ status = dsm_cmd_config_set(nvdimm, in, out);
+ break;
default:
status = NFIT_STATUS_NOT_SUPPORTED;
};
--
2.4.3
* [Qemu-devel] [PATCH v2 18/18] nvdimm: add maintain info
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (16 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 17/18] nvdimm: support NFIT_CMD_SET_CONFIG_DATA Xiao Guangrong
@ 2015-08-14 14:52 ` Xiao Guangrong
2015-08-25 16:26 ` [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Stefan Hajnoczi
18 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-14 14:52 UTC (permalink / raw)
To: pbonzini, imammedo
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
Add NVDIMM maintainer
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
MAINTAINERS | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 978b717..86786e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -793,6 +793,12 @@ M: Jiri Pirko <jiri@resnulli.us>
S: Maintained
F: hw/net/rocker/
+NVDIMM
+M: Xiao Guangrong <guangrong.xiao@linux.intel.com>
+S: Maintained
+F: hw/mem/nvdimm/
+F: include/hw/mem/pc-nvdimm.h
+
Subsystems
----------
Audio
--
2.4.3
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract Xiao Guangrong
@ 2015-08-25 14:57 ` Stefan Hajnoczi
2015-08-26 9:37 ` Xiao Guangrong
2015-09-02 9:58 ` Igor Mammedov
1 sibling, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 14:57 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
> +static void set_file(Object *obj, const char *str, Error **errp)
> +{
> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> + if (nvdimm->file) {
> + g_free(nvdimm->file);
> + }
g_free(NULL) is a nop so it's safe to replace the if with just
g_free(nvdimm->file).
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
@ 2015-08-25 15:12 ` Stefan Hajnoczi
2015-08-26 9:39 ` Xiao Guangrong
2015-08-26 9:40 ` Xiao Guangrong
2015-08-25 15:39 ` Stefan Hajnoczi
` (2 subsequent siblings)
3 siblings, 2 replies; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 15:12 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index a53d235..7a270a8 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -24,6 +24,19 @@
>
> #include "hw/mem/pc-nvdimm.h"
>
> +#define PAGE_SIZE (1UL << 12)
This macro name is likely to collide with system headers or other code.
Could you use the existing TARGET_PAGE_SIZE constant instead?
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
2015-08-25 15:12 ` Stefan Hajnoczi
@ 2015-08-25 15:39 ` Stefan Hajnoczi
2015-08-28 17:25 ` Eduardo Habkost
2015-09-04 12:02 ` Igor Mammedov
3 siblings, 0 replies; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 15:39 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
> NVDIMM reserves all the free range above 4G to do:
> - Persistent Memory (PMEM) mapping
> - implement NVDIMM ACPI device _DSM method
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/i386/pc.c | 12 ++++++++++--
> hw/mem/nvdimm/pc-nvdimm.c | 13 +++++++++++++
> include/hw/mem/pc-nvdimm.h | 1 +
> 3 files changed, 24 insertions(+), 2 deletions(-)
CCing Igor for memory hotplug-related changes.
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 7661ea9..41af6ea 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -64,6 +64,7 @@
> #include "hw/pci/pci_host.h"
> #include "acpi-build.h"
> #include "hw/mem/pc-dimm.h"
> +#include "hw/mem/pc-nvdimm.h"
> #include "qapi/visitor.h"
> #include "qapi-visit.h"
>
> @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
> MemoryRegion *ram_below_4g, *ram_above_4g;
> FWCfgState *fw_cfg;
> PCMachineState *pcms = PC_MACHINE(machine);
> + ram_addr_t offset;
>
> assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
>
> @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
> exit(EXIT_FAILURE);
> }
>
> + offset = 0x100000000ULL + above_4g_mem_size;
> +
> /* initialize hotplug memory address space */
> if (guest_info->has_reserved_memory &&
> (machine->ram_size < machine->maxram_size)) {
> @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
> exit(EXIT_FAILURE);
> }
>
> - pcms->hotplug_memory.base =
> - ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
> + pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
>
> if (pcms->enforce_aligned_dimm) {
> /* size hotplug region assuming 1G page max alignment per slot */
> @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
> "hotplug-memory", hotplug_mem_size);
> memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
> &pcms->hotplug_memory.mr);
> +
> + offset = pcms->hotplug_memory.base + hotplug_mem_size;
> }
>
> + /* all the space left above 4G is reserved for NVDIMM. */
> + pc_nvdimm_reserve_range(offset);
> +
> /* Initialize PC system firmware */
> pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
>
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index a53d235..7a270a8 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -24,6 +24,19 @@
>
> #include "hw/mem/pc-nvdimm.h"
>
> +#define PAGE_SIZE (1UL << 12)
> +
> +static struct nvdimms_info {
> + ram_addr_t current_addr;
> +} nvdimms_info;
> +
> +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
> +void pc_nvdimm_reserve_range(ram_addr_t offset)
> +{
> + offset = ROUND_UP(offset, PAGE_SIZE);
> + nvdimms_info.current_addr = offset;
> +}
> +
> static char *get_file(Object *obj, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
> index 51152b8..8601e9b 100644
> --- a/include/hw/mem/pc-nvdimm.h
> +++ b/include/hw/mem/pc-nvdimm.h
> @@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
> #define PC_NVDIMM(obj) \
> OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
>
> +void pc_nvdimm_reserve_range(ram_addr_t offset);
> #endif
> --
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area Xiao Guangrong
@ 2015-08-25 16:03 ` Stefan Hajnoczi
2015-08-26 10:40 ` Xiao Guangrong
2015-09-15 16:06 ` Paolo Bonzini
2015-09-07 14:11 ` Igor Mammedov
1 sibling, 2 replies; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 16:03 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
> The parameter @file is used as backed memory for NVDIMM which is
> divided into two parts if @dataconfig is true:
s/dataconfig/configdata/
> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
> set_configdata, NULL);
> }
>
> +static uint64_t get_file_size(int fd)
> +{
> + struct stat stat_buf;
> + uint64_t size;
> +
> + if (fstat(fd, &stat_buf) < 0) {
> + return 0;
> + }
> +
> + if (S_ISREG(stat_buf.st_mode)) {
> + return stat_buf.st_size;
> + }
> +
> + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
> + return size;
> + }
#ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?
There is nothing Linux-specific about emulating NVDIMMs so this code
should compile on all platforms.
> +
> + return 0;
> +}
> +
> static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
> + char name[512];
> + void *buf;
> + ram_addr_t addr;
> + uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
> + int fd;
>
> if (!nvdimm->file) {
> error_setg(errp, "file property is not set");
> }
Missing return here.
> +
> + fd = open(nvdimm->file, O_RDWR);
Does it make sense to support read-only NVDIMMs?
It could be handy for sharing a read-only file between unprivileged
guests. The permissions on the file would only allow read, not write.
> + if (fd < 0) {
> + error_setg(errp, "can not open %s", nvdimm->file);
s/can not/cannot/
> + return;
> + }
> +
> + size = get_file_size(fd);
> + buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
This can be added in the future.
> + if (buf == MAP_FAILED) {
> + error_setg(errp, "can not do mmap on %s", nvdimm->file);
> + goto do_close;
> + }
> +
> + nvdimm->config_data_size = config_size;
> + if (nvdimm->configdata) {
> + /* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. */
> + nvdimm_size = size - config_size;
> + nvdimm->config_data_addr = buf + nvdimm_size;
> + } else {
> + nvdimm_size = size;
> + nvdimm->config_data_addr = NULL;
> + }
> +
> + if ((int64_t)nvdimm_size <= 0) {
The error cases can be detected before mmap(2). That avoids the int64_t
cast and also avoids nvdimm_size underflow and the bogus
nvdimm->config_data_addr calculation above.
size = get_file_size(fd);
if (size == 0) {
error_setg(errp, "empty file or unable to get file size");
goto do_close;
} else if (nvdimm->configdata && size < config_size) {
error_setg(errp, "file size is too small to store NVDIMM"
" configure data");
goto do_close;
}
> + error_setg(errp, "file size is too small to store NVDIMM"
> + " configure data");
> + goto do_unmap;
> + }
> +
> + addr = reserved_range_push(nvdimm_size);
> + if (!addr) {
> + error_setg(errp, "do not have enough space for size %#lx.\n", size);
error_setg() messages must not have a newline at the end.
Please use "%#" PRIx64 instead of "%#lx" so compilation works on 32-bit
hosts where sizeof(long) == 4.
> + goto do_unmap;
> + }
> +
> + nvdimm->device_index = new_device_index();
> + sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> + memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> + buf);
How is the autogenerated name used?
Why not just use "pc-nvdimm.memory"?
> + vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
> + memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
> +
> + return;
fd is leaked.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 10/18] nvdimm: init the address region used by DSM method
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 10/18] nvdimm: init the address region used by DSM method Xiao Guangrong
@ 2015-08-25 16:11 ` Stefan Hajnoczi
2015-08-26 10:41 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 16:11 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:03PM +0800, Xiao Guangrong wrote:
> @@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char *buf)
> }
> }
>
> +struct dsm_buffer {
> + /* RAM page. */
> + uint32_t handle;
> + uint8_t arg0[16];
> + uint32_t arg1;
> + uint32_t arg2;
> + union {
> + char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
> + };
> +
> + /* MMIO page. */
> + union {
> + uint32_t notify;
> + char pedding[PAGE_SIZE];
s/pedding/padding/
> + };
> +};
> +
> +static ram_addr_t dsm_addr;
> +static size_t dsm_size;
> +
> +static uint64_t dsm_read(void *opaque, hwaddr addr,
> + unsigned size)
> +{
> + return 0;
> +}
> +
> +static void dsm_write(void *opaque, hwaddr addr,
> + uint64_t val, unsigned size)
> +{
> +}
> +
> +static const MemoryRegionOps dsm_ops = {
> + .read = dsm_read,
> + .write = dsm_write,
> + .endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +static int build_dsm_buffer(void)
> +{
> + MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
> + ram_addr_t addr;;
s/;;/;/
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data Xiao Guangrong
@ 2015-08-25 16:16 ` Stefan Hajnoczi
2015-08-26 10:42 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 16:16 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
> +#ifdef NVDIMM_DEBUG
> +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
> +#else
> +#define nvdebug(...)
> +#endif
The following allows the compiler to check format strings and syntax
check the argument expressions:
#define NVDIMM_DEBUG 0 /* set to 1 for debug output */
#define nvdebug(fmt, ...) \
if (NVDIMM_DEBUG) { \
fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
}
This approach avoids bitrot (e.g. debug format string arguments have
become outdated).
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function Xiao Guangrong
@ 2015-08-25 16:23 ` Stefan Hajnoczi
2015-08-26 10:46 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 16:23 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> @@ -306,6 +354,18 @@ struct dsm_buffer {
> static ram_addr_t dsm_addr;
> static size_t dsm_size;
>
> +struct cmd_out_implemented {
QEMU coding style uses typedef struct {} CamelCase. Please follow this
convention in all user-defined structs (see ./CODING_STYLE).
> static void dsm_write(void *opaque, hwaddr addr,
> uint64_t val, unsigned size)
> {
> + struct MemoryRegion *dsm_ram_mr = opaque;
> + struct dsm_buffer *dsm;
> + struct dsm_out *out;
> + void *buf;
> +
> assert(val == NOTIFY_VALUE);
The guest should not be able to cause an abort(3). If val !=
NOTIFY_VALUE we can do nvdebug() and then return.
> +
> + buf = memory_region_get_ram_ptr(dsm_ram_mr);
> + dsm = buf;
> + out = buf;
> +
> + le32_to_cpus(&dsm->handle);
> + le32_to_cpus(&dsm->arg1);
> + le32_to_cpus(&dsm->arg2);
Can SMP guests modify DSM RAM while this thread is running?
We must avoid race conditions. It's probably better to copy in data
before byte-swapping or checking input values.
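The copy-in approach can be sketched as follows (the header layout is simplified from the patch's dsm_buffer, and le32_to_host stands in for QEMU's le32_to_cpus()):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified DSM header; the real struct also carries arg0[16]. */
typedef struct {
    uint32_t handle;
    uint32_t arg1;
    uint32_t arg2;
} DsmHeader;

/* Stand-in for le32_to_cpus(): interpret the stored bytes as
 * little-endian regardless of host endianness. */
static uint32_t le32_to_host(uint32_t v)
{
    const uint8_t *p = (const uint8_t *)&v;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Snapshot guest-shared RAM into private memory *before* swapping or
 * validating, so a vCPU writing concurrently cannot change the values
 * QEMU acts on (a classic time-of-check/time-of-use race). */
static DsmHeader snapshot_dsm_header(const void *guest_ram)
{
    DsmHeader h;

    memcpy(&h, guest_ram, sizeof(h));
    h.handle = le32_to_host(h.handle);
    h.arg1   = le32_to_host(h.arg1);
    h.arg2   = le32_to_host(h.arg2);
    return h;
}
```

After the snapshot, later guest writes to the shared page cannot retroactively change what QEMU validated.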
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function Xiao Guangrong
@ 2015-08-25 16:24 ` Stefan Hajnoczi
2015-08-26 10:47 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 16:24 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:08PM +0800, Xiao Guangrong wrote:
> Function 4 is used to get Namespace lable size
s/lable/label/
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
` (17 preceding siblings ...)
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 18/18] nvdimm: add maintain info Xiao Guangrong
@ 2015-08-25 16:26 ` Stefan Hajnoczi
2015-08-26 10:49 ` Xiao Guangrong
18 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-25 16:26 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> Changlog:
> - Use litten endian for DSM method, thanks for Stefan's suggestion
>
> - introduce a new parameter, @configdata, if it's false, Qemu will
> build a static and readonly namespace in memory and use it serveing
> for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
> reserved region is needed at the end of the @file, it is good for
> the user who want to pass whole nvdimm device and make its data
> completely be visible to guest
>
> - divide the source code into separated files and add maintain info
I have skipped ACPI patches because I'm not very familiar with that
area.
Have you thought about live migration?
Are the contents of the NVDIMM migrated since they are registered as a
RAM region?
Stefan
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-08-25 14:57 ` Stefan Hajnoczi
@ 2015-08-26 9:37 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 9:37 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/25/2015 10:57 PM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
>> +static void set_file(Object *obj, const char *str, Error **errp)
>> +{
>> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> +
>> + if (nvdimm->file) {
>> + g_free(nvdimm->file);
>> + }
>
> g_free(NULL) is a nop so it's safe to replace the if with just
> g_free(nvdimm->file).
>
Yeah, the man page says you're right. Will clean it up.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-25 15:12 ` Stefan Hajnoczi
@ 2015-08-26 9:39 ` Xiao Guangrong
2015-08-26 9:40 ` Xiao Guangrong
1 sibling, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 9:39 UTC (permalink / raw)
To: qemu-devel
On 08/25/2015 11:12 PM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
>> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
>> index a53d235..7a270a8 100644
>> --- a/hw/mem/nvdimm/pc-nvdimm.c
>> +++ b/hw/mem/nvdimm/pc-nvdimm.c
>> @@ -24,6 +24,19 @@
>>
>> #include "hw/mem/pc-nvdimm.h"
>>
>> +#define PAGE_SIZE (1UL << 12)
>
> This macro name is likely to collide with system headers or other code.
>
> Could you use the existing TARGET_PAGE_SIZE constant instead?
Will do it that way in the next version. Thank you for pointing it out.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-25 15:12 ` Stefan Hajnoczi
2015-08-26 9:39 ` Xiao Guangrong
@ 2015-08-26 9:40 ` Xiao Guangrong
1 sibling, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 9:40 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, imammedo, rth
On 08/25/2015 11:12 PM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
>> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
>> index a53d235..7a270a8 100644
>> --- a/hw/mem/nvdimm/pc-nvdimm.c
>> +++ b/hw/mem/nvdimm/pc-nvdimm.c
>> @@ -24,6 +24,19 @@
>>
>> #include "hw/mem/pc-nvdimm.h"
>>
>> +#define PAGE_SIZE (1UL << 12)
>
> This macro name is likely to collide with system headers or other code.
>
> Could you use the existing TARGET_PAGE_SIZE constant instead?
>
Will do it that way in the next version. Thank you for pointing it out.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-25 16:03 ` Stefan Hajnoczi
@ 2015-08-26 10:40 ` Xiao Guangrong
2015-08-28 11:58 ` Stefan Hajnoczi
2015-09-15 16:07 ` Paolo Bonzini
2015-09-15 16:06 ` Paolo Bonzini
1 sibling, 2 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 10:40 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/26/2015 12:03 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
>> The parameter @file is used as backed memory for NVDIMM which is
>> divided into two parts if @dataconfig is true:
>
> s/dataconfig/configdata/
Stupid typo, sorry.
>
>> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
>> set_configdata, NULL);
>> }
>>
>> +static uint64_t get_file_size(int fd)
>> +{
>> + struct stat stat_buf;
>> + uint64_t size;
>> +
>> + if (fstat(fd, &stat_buf) < 0) {
>> + return 0;
>> + }
>> +
>> + if (S_ISREG(stat_buf.st_mode)) {
>> + return stat_buf.st_size;
>> + }
>> +
>> + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
>> + return size;
>> + }
>
> #ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?
>
> There is nothing Linux-specific about emulating NVDIMMs so this code
> should compile on all platforms.
Right. I have no idea how block devices work on other platforms, so
I will only allow Linux to directly use a block device file in the next
version.
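A sketch of such a guarded version (an assumption about the next revision, not the actual patch): the regular-file path stays portable, and only the block-device ioctl is compiled on Linux.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/* Portable get_file_size() sketch: fstat() works on any POSIX host;
 * probing a block device's size is Linux-specific, so that branch
 * compiles only on Linux and other hosts simply return 0. */
static uint64_t get_file_size(int fd)
{
    struct stat stat_buf;

    if (fstat(fd, &stat_buf) < 0) {
        return 0;
    }
    if (S_ISREG(stat_buf.st_mode)) {
        return stat_buf.st_size;
    }
#ifdef __linux__
    if (S_ISBLK(stat_buf.st_mode)) {
        uint64_t size;
        if (ioctl(fd, BLKGETSIZE64, &size) == 0) {
            return size;
        }
    }
#endif
    return 0;   /* unsupported file type or error */
}
```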
>
>> +
>> + return 0;
>> +}
>> +
>> static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
>> {
>> PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
>> + char name[512];
>> + void *buf;
>> + ram_addr_t addr;
>> + uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
>> + int fd;
>>
>> if (!nvdimm->file) {
>> error_setg(errp, "file property is not set");
>> }
>
> Missing return here.
Will fix.
>
>> +
>> + fd = open(nvdimm->file, O_RDWR);
>
> Does it make sense to support read-only NVDIMMs?
>
> It could be handy for sharing a read-only file between unprivileged
> guests. The permissions on the file would only allow read, not write.
Makes sense. Currently this patchset just implements "shared" mode, so
write permission is required; however, please see below:
>
>> + if (fd < 0) {
>> + error_setg(errp, "can not open %s", nvdimm->file);
>
> s/can not/cannot/
>
>> + return;
>> + }
>> +
>> + size = get_file_size(fd);
>> + buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>
> I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
> This can be added in the future.
Good idea, it will allow the guest to write data but discard its content after it
exits. Will implement O_RDONLY + MAP_PRIVATE in the near future.
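A minimal sketch of that future mode (map_backing_file_private is a hypothetical name, not the patch's API): open the file read-only and map it copy-on-write, so guest writes land in private anonymous pages and never reach the backing file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: O_RDONLY + MAP_PRIVATE gives the guest a
 * writable view whose modifications are discarded on exit; the
 * backing file itself is never written. */
static void *map_backing_file_private(const char *path, size_t size)
{
    int fd = open(path, O_RDONLY);
    void *buf;

    if (fd < 0) {
        return MAP_FAILED;
    }
    /* PROT_WRITE is legal here despite O_RDONLY because MAP_PRIVATE
     * writes go to copy-on-write pages, not to the file. */
    buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping keeps its own reference */
    return buf;
}
```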
>
>> + if (buf == MAP_FAILED) {
>> + error_setg(errp, "can not do mmap on %s", nvdimm->file);
>> + goto do_close;
>> + }
>> +
>> + nvdimm->config_data_size = config_size;
>> + if (nvdimm->configdata) {
>> + /* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. */
>> + nvdimm_size = size - config_size;
>> + nvdimm->config_data_addr = buf + nvdimm_size;
>> + } else {
>> + nvdimm_size = size;
>> + nvdimm->config_data_addr = NULL;
>> + }
>> +
>> + if ((int64_t)nvdimm_size <= 0) {
>
> The error cases can be detected before mmap(2). That avoids the int64_t
> cast and also avoids nvdimm_size underflow and the bogus
> nvdimm->config_data_addr calculation above.
Okay.
>
> size = get_file_size(fd);
> if (size == 0) {
> error_setg(errp, "empty file or unable to get file size");
> goto do_close;
> } else if (nvdimm->configdata && size < config_size) {
> error_setg(errp, "file size is too small to store NVDIMM"
> " configure data");
> goto do_close;
> }
>
>> + error_setg(errp, "file size is too small to store NVDIMM"
>> + " configure data");
>> + goto do_unmap;
>> + }
>> +
>> + addr = reserved_range_push(nvdimm_size);
>> + if (!addr) {
>> + error_setg(errp, "do not have enough space for size %#lx.\n", size);
>
> error_setg() messages must not have a newline at the end.
>
> Please use "%#" PRIx64 instead of "%#lx" so compilation works on 32-bit
> hosts where sizeof(long) == 4.
Good catch.
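For illustration, a format helper using PRIx64 behaves identically on 32- and 64-bit hosts (format_size is a hypothetical helper, not from the patch); "%#lx" would instead assume sizeof(long) == 8.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit size portably: PRIx64 from <inttypes.h> expands to
 * the correct length modifier on any host, whereas "%#lx" truncates
 * or warns on 32-bit hosts where long is 4 bytes. */
static void format_size(char *out, size_t outlen, uint64_t size)
{
    snprintf(out, outlen, "%#" PRIx64, size);
}
```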
>
>> + goto do_unmap;
>> + }
>> +
>> + nvdimm->device_index = new_device_index();
>> + sprintf(name, "NVDIMM-%d", nvdimm->device_index);
>> + memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
>> + buf);
>
> How is the autogenerated name used?
>
> Why not just use "pc-nvdimm.memory"?
Ah. Just for debugging purposes :) and I am not sure if a name reused for multiple
MRs (MemoryRegions) is a good idea.
>
>> + vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
>> + memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
>> +
>> + return;
>
> fd is leaked.
Will fix.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 10/18] nvdimm: init the address region used by DSM method
2015-08-25 16:11 ` Stefan Hajnoczi
@ 2015-08-26 10:41 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 10:41 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/26/2015 12:11 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:03PM +0800, Xiao Guangrong wrote:
>> @@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char *buf)
>> }
>> }
>>
>> +struct dsm_buffer {
>> + /* RAM page. */
>> + uint32_t handle;
>> + uint8_t arg0[16];
>> + uint32_t arg1;
>> + uint32_t arg2;
>> + union {
>> + char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
>> + };
>> +
>> + /* MMIO page. */
>> + union {
>> + uint32_t notify;
>> + char pedding[PAGE_SIZE];
>
> s/pedding/padding/
Will fix.
>
>> + };
>> +};
>> +
>> +static ram_addr_t dsm_addr;
>> +static size_t dsm_size;
>> +
>> +static uint64_t dsm_read(void *opaque, hwaddr addr,
>> + unsigned size)
>> +{
>> + return 0;
>> +}
>> +
>> +static void dsm_write(void *opaque, hwaddr addr,
>> + uint64_t val, unsigned size)
>> +{
>> +}
>> +
>> +static const MemoryRegionOps dsm_ops = {
>> + .read = dsm_read,
>> + .write = dsm_write,
>> + .endianness = DEVICE_LITTLE_ENDIAN,
>> +};
>> +
>> +static int build_dsm_buffer(void)
>> +{
>> + MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
>> + ram_addr_t addr;;
>
> s/;;/;/
Will fix.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data
2015-08-25 16:16 ` Stefan Hajnoczi
@ 2015-08-26 10:42 ` Xiao Guangrong
2015-08-28 11:59 ` Stefan Hajnoczi
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 10:42 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/26/2015 12:16 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
>> +#ifdef NVDIMM_DEBUG
>> +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
>> +#else
>> +#define nvdebug(...)
>> +#endif
>
> The following allows the compiler to check format strings and syntax
> check the argument expressions:
>
> #define NVDIMM_DEBUG 0 /* set to 1 for debug output */
> #define nvdebug(fmt, ...) \
> if (NVDIMM_DEBUG) { \
> fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
> }
>
> This approach avoids bitrot (e.g. debug format string arguments have
> become outdated).
>
Really good tip, thanks for sharing.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
2015-08-25 16:23 ` Stefan Hajnoczi
@ 2015-08-26 10:46 ` Xiao Guangrong
2015-08-28 12:01 ` Stefan Hajnoczi
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 10:46 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/26/2015 12:23 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
>> @@ -306,6 +354,18 @@ struct dsm_buffer {
>> static ram_addr_t dsm_addr;
>> static size_t dsm_size;
>>
>> +struct cmd_out_implemented {
>
> QEMU coding style uses typedef struct {} CamelCase. Please follow this
> convention in all user-defined structs (see ./CODING_STYLE).
>
Okay, will adjust all the defines in the next version.
>> static void dsm_write(void *opaque, hwaddr addr,
>> uint64_t val, unsigned size)
>> {
>> + struct MemoryRegion *dsm_ram_mr = opaque;
>> + struct dsm_buffer *dsm;
>> + struct dsm_out *out;
>> + void *buf;
>> +
>> assert(val == NOTIFY_VALUE);
>
> The guest should not be able to cause an abort(3). If val !=
> NOTIFY_VALUE we can do nvdebug() and then return.
The ACPI code and emulation code both come from QEMU; if that happens,
it's really a bug, and aborting the VM is better than emitting a debug
message in this case, to avoid potential data corruption.
>
>> +
>> + buf = memory_region_get_ram_ptr(dsm_ram_mr);
>> + dsm = buf;
>> + out = buf;
>> +
>> + le32_to_cpus(&dsm->handle);
>> + le32_to_cpus(&dsm->arg1);
>> + le32_to_cpus(&dsm->arg2);
>
> Can SMP guests modify DSM RAM while this thread is running?
>
> We must avoid race conditions. It's probably better to copy in data
> before byte-swapping or checking input values.
Yes, my mistake, will fix.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
2015-08-25 16:24 ` Stefan Hajnoczi
@ 2015-08-26 10:47 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 10:47 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/26/2015 12:24 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:52:08PM +0800, Xiao Guangrong wrote:
>> Function 4 is used to get Namespace lable size
>
> s/lable/label/
>
Stupid me, will fix the change log.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
2015-08-25 16:26 ` [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Stefan Hajnoczi
@ 2015-08-26 10:49 ` Xiao Guangrong
2015-10-07 14:02 ` Stefan Hajnoczi
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-26 10:49 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
>> Changlog:
>> - Use litten endian for DSM method, thanks for Stefan's suggestion
>>
>> - introduce a new parameter, @configdata, if it's false, Qemu will
>> build a static and readonly namespace in memory and use it serveing
>> for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>> reserved region is needed at the end of the @file, it is good for
>> the user who want to pass whole nvdimm device and make its data
>> completely be visible to guest
>>
>> - divide the source code into separated files and add maintain info
>
> I have skipped ACPI patches because I'm not very familiar with that
> area.
Thank you very much for your review; your comments are greatly helpful to
me, Stefan!
>
> Have you thought about live migration?
>
> Are the contents of the NVDIMM migrated since they are registered as a
> RAM region?
Will fully test live migration and VM save before sending the V3 out. :)
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-26 10:40 ` Xiao Guangrong
@ 2015-08-28 11:58 ` Stefan Hajnoczi
2015-08-31 6:23 ` Xiao Guangrong
2015-09-15 16:07 ` Paolo Bonzini
1 sibling, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-28 11:58 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
On Wed, Aug 26, 2015 at 06:40:26PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:03 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
> >
> >>+ if (fd < 0) {
> >>+ error_setg(errp, "can not open %s", nvdimm->file);
> >
> >s/can not/cannot/
> >
> >>+ return;
> >>+ }
> >>+
> >>+ size = get_file_size(fd);
> >>+ buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >
> >I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
> >This can be added in the future.
>
> Good idea, it will allow the guest to write data but discard its content after it
> exits. Will implement O_RDONLY + MAP_PRIVATE in the near future.
Great.
> >>+ goto do_unmap;
> >>+ }
> >>+
> >>+ nvdimm->device_index = new_device_index();
> >>+ sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> >>+ memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> >>+ buf);
> >
> >How is the autogenerated name used?
> >
> >Why not just use "pc-nvdimm.memory"?
>
> Ah. Just for debugging purposes :) and I am not sure if a name reused for multiple
> MRs (MemoryRegions) is a good idea.
Other devices use a constant name too (git grep
memory_region_init_ram_ptr) so it seems to be okay. The unique thing is
the OBJECT(dev) which differs for each NVDIMM instance.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data
2015-08-26 10:42 ` Xiao Guangrong
@ 2015-08-28 11:59 ` Stefan Hajnoczi
2015-08-31 6:25 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-28 11:59 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
On Wed, Aug 26, 2015 at 06:42:01PM +0800, Xiao Guangrong wrote:
>
>
> On 08/26/2015 12:16 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
> >>+#ifdef NVDIMM_DEBUG
> >>+#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
> >>+#else
> >>+#define nvdebug(...)
> >>+#endif
> >
> >The following allows the compiler to check format strings and syntax
> >check the argument expressions:
> >
> >#define NVDIMM_DEBUG 0 /* set to 1 for debug output */
> >#define nvdebug(fmt, ...) \
> > if (NVDIMM_DEBUG) { \
> > fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
> > }
> >
> >This approach avoids bitrot (e.g. debug format string arguments have
> >become outdated).
> >
>
> Really good tip, thanks for sharing.
I forgot the do { ... } while (0) in the macro to make nvdebug("hello
world"); work like a normal C statement.
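With that fix folded in, the complete macro might read (a sketch, not the merged patch):

```c
#include <assert.h>
#include <stdio.h>

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */

/* The if (NVDIMM_DEBUG) keeps format strings and argument expressions
 * compile-checked even when debugging is off, so they cannot bitrot;
 * the do { } while (0) wrapper makes the macro a single statement, so
 * "if (x) nvdebug(...); else ..." parses as intended. */
#define nvdebug(fmt, ...)                                       \
    do {                                                        \
        if (NVDIMM_DEBUG) {                                     \
            fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__);    \
        }                                                       \
    } while (0)
```

(The `## __VA_ARGS__` comma-swallowing form is a GNU extension, which is fine for QEMU's GCC/Clang builds.)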
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
2015-08-26 10:46 ` Xiao Guangrong
@ 2015-08-28 12:01 ` Stefan Hajnoczi
2015-08-31 6:51 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-08-28 12:01 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
On Wed, Aug 26, 2015 at 06:46:35PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:23 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> >> static void dsm_write(void *opaque, hwaddr addr,
> >> uint64_t val, unsigned size)
> >> {
> >>+ struct MemoryRegion *dsm_ram_mr = opaque;
> >>+ struct dsm_buffer *dsm;
> >>+ struct dsm_out *out;
> >>+ void *buf;
> >>+
> >> assert(val == NOTIFY_VALUE);
> >
> >The guest should not be able to cause an abort(3). If val !=
> >NOTIFY_VALUE we can do nvdebug() and then return.
>
> The ACPI code and emulation code both come from QEMU; if that happens,
> it's really a bug, and aborting the VM is better than emitting a debug
> message in this case, to avoid potential data corruption.
abort(3) is dangerous because it can create a core dump. If a malicious
guest triggers this repeatedly it could consume a lot of disk space and
I/O or CPU while performing the core dumps.
We cannot trust anything inside the guest, even if the guest code comes
from QEMU because a malicious guest can still read/write to the same
hardware registers.
Stefan
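A sketch of the defensive check described above (hwaddr and NOTIFY_VALUE below are stand-ins for the QEMU type and the patch's constant):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t hwaddr;            /* stand-in for QEMU's hwaddr */
#define NOTIFY_VALUE 0x4E544659ULL  /* illustrative constant only */

static int requests_handled;

/* Validate guest-controlled input and bail out instead of assert()ing:
 * a malicious guest writing a bogus value then only wastes one MMIO
 * write, rather than forcing an abort(3) and a core dump. */
static void dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
{
    (void)opaque;
    (void)addr;
    (void)size;
    if (val != NOTIFY_VALUE) {
        fprintf(stderr, "nvdimm: unexpected notify value %#" PRIx64 "\n", val);
        return;
    }
    requests_handled++;             /* ... process the DSM request ... */
}
```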
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
2015-08-25 15:12 ` Stefan Hajnoczi
2015-08-25 15:39 ` Stefan Hajnoczi
@ 2015-08-28 17:25 ` Eduardo Habkost
2015-08-31 7:01 ` Xiao Guangrong
2015-09-04 12:02 ` Igor Mammedov
3 siblings, 1 reply; 87+ messages in thread
From: Eduardo Habkost @ 2015-08-28 17:25 UTC (permalink / raw)
To: Xiao Guangrong
Cc: kvm, mst, gleb, mtosatti, qemu-devel, stefanha, imammedo,
pbonzini, rth
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
> NVDIMM reserves all the free range above 4G to do:
> - Persistent Memory (PMEM) mapping
> - implement NVDIMM ACPI device _DSM method
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[...]
> @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
> MemoryRegion *ram_below_4g, *ram_above_4g;
> FWCfgState *fw_cfg;
> PCMachineState *pcms = PC_MACHINE(machine);
> + ram_addr_t offset;
"offset" is a very generic name. I suggest naming it "nvdimm_offset".
>
> assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
>
> @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
> exit(EXIT_FAILURE);
> }
>
> + offset = 0x100000000ULL + above_4g_mem_size;
> +
> /* initialize hotplug memory address space */
> if (guest_info->has_reserved_memory &&
> (machine->ram_size < machine->maxram_size)) {
> @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
> exit(EXIT_FAILURE);
> }
>
> - pcms->hotplug_memory.base =
> - ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
> + pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
>
> if (pcms->enforce_aligned_dimm) {
> /* size hotplug region assuming 1G page max alignment per slot */
--
Eduardo
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-28 11:58 ` Stefan Hajnoczi
@ 2015-08-31 6:23 ` Xiao Guangrong
2015-09-01 9:14 ` Stefan Hajnoczi
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-31 6:23 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
Hi Stefan,
On 08/28/2015 07:58 PM, Stefan Hajnoczi wrote:
>
>>>> + goto do_unmap;
>>>> + }
>>>> +
>>>> + nvdimm->device_index = new_device_index();
>>>> + sprintf(name, "NVDIMM-%d", nvdimm->device_index);
>>>> + memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
>>>> + buf);
>>>
>>> How is the autogenerated name used?
>>>
>>> Why not just use "pc-nvdimm.memory"?
>>
>> Ah. Just for debug purposes :) and I am not sure if a name used for multiple
>> MRs (MemoryRegion) is a good idea.
>
> Other devices use a constant name too (git grep
> memory_region_init_ram_ptr) so it seems to be okay. The unique thing is
> the OBJECT(dev) which differs for each NVDIMM instance.
>
When I was digging into the live migration code, I noticed that reusing the same
MR name may cause duplicated "idstr" values; please refer to qemu_ram_set_idstr().
Since nvdimm devices do not have a parent bus, this will trigger the abort() in
that function.
* Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data
2015-08-28 11:59 ` Stefan Hajnoczi
@ 2015-08-31 6:25 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-31 6:25 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
On 08/28/2015 07:59 PM, Stefan Hajnoczi wrote:
> On Wed, Aug 26, 2015 at 06:42:01PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 08/26/2015 12:16 AM, Stefan Hajnoczi wrote:
>>> On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
>>>> +#ifdef NVDIMM_DEBUG
>>>> +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
>>>> +#else
>>>> +#define nvdebug(...)
>>>> +#endif
>>>
>>> The following allows the compiler to check format strings and syntax
>>> check the argument expressions:
>>>
>>> #define NVDIMM_DEBUG 0 /* set to 1 for debug output */
>>> #define nvdebug(fmt, ...) \
>>> if (NVDIMM_DEBUG) { \
>>> fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
>>> }
>>>
>>> This approach avoids bitrot (e.g. debug format string arguments have
>>> become outdated).
>>>
>>
>> Really good tips, thanks for your sharing.
>
> I forgot the do { ... } while (0) in the macro to make nvdebug("hello
> world"); work like a normal C statement.
>
Got it, will keep it in mind.
* Re: [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
2015-08-28 12:01 ` Stefan Hajnoczi
@ 2015-08-31 6:51 ` Xiao Guangrong
2015-09-01 9:16 ` Stefan Hajnoczi
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-31 6:51 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
imammedo, pbonzini, rth
On 08/28/2015 08:01 PM, Stefan Hajnoczi wrote:
> On Wed, Aug 26, 2015 at 06:46:35PM +0800, Xiao Guangrong wrote:
>> On 08/26/2015 12:23 AM, Stefan Hajnoczi wrote:
>>> On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
>>>> static void dsm_write(void *opaque, hwaddr addr,
>>>> uint64_t val, unsigned size)
>>>> {
>>>> + struct MemoryRegion *dsm_ram_mr = opaque;
>>>> + struct dsm_buffer *dsm;
>>>> + struct dsm_out *out;
>>>> + void *buf;
>>>> +
>>>> assert(val == NOTIFY_VALUE);
>>>
>>> The guest should not be able to cause an abort(3). If val !=
>>> NOTIFY_VALUE we can do nvdebug() and then return.
>>
>> The ACPI code and the emulation code both come from QEMU; if that happens,
>> it's really a bug, and aborting the VM is better than printing a debug
>> message in this case, to avoid potential data corruption.
>
> abort(3) is dangerous because it can create a core dump. If a malicious
> guest triggers this repeatedly it could consume a lot of disk space and
> I/O or CPU while performing the core dumps.
>
> We cannot trust anything inside the guest, even if the guest code comes
> from QEMU because a malicious guest can still read/write to the same
> hardware registers.
>
Completely agree with you. :)
How about using exit(1) instead of abort() to kill the VM?
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-28 17:25 ` Eduardo Habkost
@ 2015-08-31 7:01 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-08-31 7:01 UTC (permalink / raw)
To: Eduardo Habkost
Cc: kvm, mst, gleb, mtosatti, qemu-devel, stefanha, imammedo,
pbonzini, rth
Hi Eduardo,
Thank you for reviewing my patches.
On 08/29/2015 01:25 AM, Eduardo Habkost wrote:
> On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
>> NVDIMM reserves all the free range above 4G to do:
>> - Persistent Memory (PMEM) mapping
>> - implement NVDIMM ACPI device _DSM method
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> [...]
>> @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
>> MemoryRegion *ram_below_4g, *ram_above_4g;
>> FWCfgState *fw_cfg;
>> PCMachineState *pcms = PC_MACHINE(machine);
>> + ram_addr_t offset;
>
> "offset" is a very generic name. I suggest naming it "nvdimm_offset".
'offset' is used for a generic purpose, as it is not only used for nvdimm but
also for calculating the hotplug memory base:
pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-31 6:23 ` Xiao Guangrong
@ 2015-09-01 9:14 ` Stefan Hajnoczi
2015-09-15 16:10 ` Paolo Bonzini
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-09-01 9:14 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
On Mon, Aug 31, 2015 at 02:23:43PM +0800, Xiao Guangrong wrote:
>
> Hi Stefan,
>
> On 08/28/2015 07:58 PM, Stefan Hajnoczi wrote:
>
> >
> >>>>+ goto do_unmap;
> >>>>+ }
> >>>>+
> >>>>+ nvdimm->device_index = new_device_index();
> >>>>+ sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> >>>>+ memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> >>>>+ buf);
> >>>
> >>>How is the autogenerated name used?
> >>>
> >>>Why not just use "pc-nvdimm.memory"?
> >>
> >>Ah. Just for debug purposes :) and I am not sure if a name used for multiple
> >>MRs (MemoryRegion) is a good idea.
> >
> >Other devices use a constant name too (git grep
> >memory_region_init_ram_ptr) so it seems to be okay. The unique thing is
> >the OBJECT(dev) which differs for each NVDIMM instance.
> >
>
> When I was digging into the live migration code, I noticed that reusing the same
> MR name may cause duplicated "idstr" values; please refer to qemu_ram_set_idstr().
>
> Since nvdimm devices do not have a parent bus, this will trigger the abort() in
> that function.
I see. The other devices that use a constant name are on a bus so the
abort doesn't trigger.
* Re: [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
2015-08-31 6:51 ` Xiao Guangrong
@ 2015-09-01 9:16 ` Stefan Hajnoczi
0 siblings, 0 replies; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-09-01 9:16 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
imammedo, pbonzini, rth
On Mon, Aug 31, 2015 at 02:51:50PM +0800, Xiao Guangrong wrote:
>
>
> On 08/28/2015 08:01 PM, Stefan Hajnoczi wrote:
> >On Wed, Aug 26, 2015 at 06:46:35PM +0800, Xiao Guangrong wrote:
> >>On 08/26/2015 12:23 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> >>>> static void dsm_write(void *opaque, hwaddr addr,
> >>>> uint64_t val, unsigned size)
> >>>> {
> >>>>+ struct MemoryRegion *dsm_ram_mr = opaque;
> >>>>+ struct dsm_buffer *dsm;
> >>>>+ struct dsm_out *out;
> >>>>+ void *buf;
> >>>>+
> >>>> assert(val == NOTIFY_VALUE);
> >>>
> >>>The guest should not be able to cause an abort(3). If val !=
> >>>NOTIFY_VALUE we can do nvdebug() and then return.
> >>
> >>The ACPI code and the emulation code both come from QEMU; if that happens,
> >>it's really a bug, and aborting the VM is better than printing a debug
> >>message in this case, to avoid potential data corruption.
> >
> >abort(3) is dangerous because it can create a core dump. If a malicious
> >guest triggers this repeatedly it could consume a lot of disk space and
> >I/O or CPU while performing the core dumps.
> >
> >We cannot trust anything inside the guest, even if the guest code comes
> >from QEMU because a malicious guest can still read/write to the same
> >hardware registers.
> >
>
> Completely agree with you. :)
>
> How about using exit(1) instead of abort() to kill the VM?
Most devices on a physical machine do not power off or reset the machine
in case of error.
I think it's good to follow that model and avoid killing the VM.
Otherwise nested virtualization or userspace drivers can take down the
whole VM.
Stefan
* Re: [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset Xiao Guangrong
@ 2015-09-02 8:05 ` Igor Mammedov
0 siblings, 0 replies; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 8:05 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:51:54 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> Currently, the offset in OperationRegion is limited to 32 bit, extend it
> to 64 bit so that we can switch SSDT to 64 bit in later patch
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/aml-build.c | 2 +-
> include/hw/acpi/aml-build.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 0d4b324..02f9e3d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -752,7 +752,7 @@ Aml *aml_package(uint8_t num_elements)
>
> /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefOpRegion */
> Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
> - uint32_t offset, uint32_t len)
> + uint64_t offset, uint32_t len)
> {
> Aml *var = aml_alloc();
> build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index e3afa13..996ac5b 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -222,7 +222,7 @@ Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
> Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
> uint8_t aln, uint8_t len);
> Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
> - uint32_t offset, uint32_t len);
> + uint64_t offset, uint32_t len);
> Aml *aml_irq_no_flags(uint8_t irq);
> Aml *aml_named_field(const char *name, unsigned length);
> Aml *aml_reserved_field(unsigned length);
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract Xiao Guangrong
2015-08-25 14:57 ` Stefan Hajnoczi
@ 2015-09-02 9:58 ` Igor Mammedov
2015-09-02 10:36 ` Xiao Guangrong
1 sibling, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 9:58 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:51:59 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> Introduce "pc-nvdimm" device and it has two parameters:
Why do you use the prefix "pc-"? I suppose we could potentially
use this device not only with x86 targets but with
other targets as well.
I'd just drop the 'pc' prefix throughout the patchset.
> - @file, which is the backed memory file for NVDIMM device
Could you try to split the device into backend/frontend parts,
like it's done with pc-dimm? As I understand, that is the preferred
way to implement this kind of device.
Then you could reuse the memory backends that we already have,
including the file backend.
So CLI could look like:
-object memory-backend-file,id=mem0,file=/storage/foo
-device nvdimm,memdev=mem0,configdata=on
>
> - @configdata, specify if we need to reserve 128k at the end of
> @file for nvdimm device's config data. Default is false
>
> If @configdata is false, Qemu will build a static and readonly
> namespace in memory and use it serveing for
> DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
> This is good for the user who want to pass whole nvdimm device
> and make its data is complete visible to guest
>
> We can use "-device pc-nvdimm,file=/dev/pmem,configdata" in the
> Qemu command to create NVDIMM device for the guest
PS:
please try to fix the commit message, spelling- and grammar-wise.
[...]
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -0,0 +1,99 @@
> +/*
> + * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
s/Implement/Implementation/ in all new files
and maybe s/NVDIMM (A // as it's redundant
[...]
> +static bool has_configdata(Object *obj, Error **errp)
> +{
> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> + return nvdimm->configdata;
> +}
> +
> +static void set_configdata(Object *obj, bool value, Error **errp)
> +{
> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> + nvdimm->configdata = value;
> +}
usually for property setters/getters we use the form:
"device_prefix"_[g|s]et_foo
so
nvdimm_get_configdata ...
[...]
* Re: [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit Xiao Guangrong
@ 2015-09-02 10:06 ` Igor Mammedov
2015-09-02 10:43 ` Xiao Guangrong
2015-09-02 12:05 ` Michael S. Tsirkin
0 siblings, 2 replies; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 10:06 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:51:55 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> Only 512M is left for MMIO below 4G and that are used by PCI, BIOS etc.
> Other components also reserve regions from their internal usage, e.g,
> [0xFED00000, 0xFED00000 + 0x400) is reserved for HPET
>
> Switch SSDT to 64 bit to use the huge free room above 4G. In the later
> patches, we will dynamical allocate free space within this region which
> is used by NVDIMM _DSM method
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/i386/acpi-build.c | 4 ++--
> hw/i386/acpi-dsdt.dsl | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 46eddb8..8ead1c1 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> build_header(linker, table_data,
> (void *)(table_data->data + table_data->len - ssdt->buf->len),
> - "SSDT", ssdt->buf->len, 1);
> + "SSDT", ssdt->buf->len, 2);
That might break Windows XP, since it supports only 1.0b ACPI with some
2.0 extensions.
there are two ways to work around it:
- add an additional Rev2 ssdt table if NVDIMMs are present
and describe them there
- make sure that you use only 32bit arithmetic in AML
(and emulate 64bit like it has been done for memory hotplug)
> free_aml_allocator();
> }
>
> @@ -1586,7 +1586,7 @@ build_dsdt(GArray *table_data, GArray *linker, AcpiMiscInfo *misc)
>
> memset(dsdt, 0, sizeof *dsdt);
> build_header(linker, table_data, dsdt, "DSDT",
> - misc->dsdt_size, 1);
> + misc->dsdt_size, 2);
> }
>
> static GArray *
> diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> index a2d84ec..5cd3f0e 100644
> --- a/hw/i386/acpi-dsdt.dsl
> +++ b/hw/i386/acpi-dsdt.dsl
> @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> DefinitionBlock (
> "acpi-dsdt.aml", // Output Filename
> "DSDT", // Signature
> - 0x01, // DSDT Compliance Revision
> + 0x02, // DSDT Compliance Revision
> "BXPC", // OEMID
> "BXDSDT", // TABLE ID
> 0x1 // OEM Revision
* Re: [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof Xiao Guangrong
@ 2015-09-02 10:16 ` Igor Mammedov
2015-09-02 10:38 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 10:16 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:51:56 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> Implement DeRefOf term which is used by NVDIMM _DSM method in later patch
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/acpi/aml-build.c | 8 ++++++++
> include/hw/acpi/aml-build.h | 1 +
> 2 files changed, 9 insertions(+)
>
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 02f9e3d..9e89efc 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
> return var;
> }
>
> +/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
Please put here the lowest doc revision where the term first appeared
> +Aml *aml_derefof(Aml *arg)
> +{
> + Aml *var = aml_opcode(0x83 /* DerefOfOp */);
> + aml_append(var, arg);
> + return var;
> +}
> +
> void
> build_header(GArray *linker, GArray *table_data,
> AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 996ac5b..21dc5e9 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -275,6 +275,7 @@ Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name);
> Aml *aml_varpackage(uint32_t num_elements);
> Aml *aml_touuid(const char *uuid);
> Aml *aml_unicode(const char *str);
> +Aml *aml_derefof(Aml *arg);
>
> void
> build_header(GArray *linker, GArray *table_data,
* Re: [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof Xiao Guangrong
@ 2015-09-02 10:18 ` Igor Mammedov
2015-09-02 10:39 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 10:18 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:51:57 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> Implement SizeOf term which is used by NVDIMM _DSM method in later patch
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/acpi/aml-build.c | 8 ++++++++
> include/hw/acpi/aml-build.h | 1 +
> 2 files changed, 9 insertions(+)
>
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 9e89efc..a526eed 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
> return var;
> }
>
> +/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
ditto, refer to the first revision where it first appeared
> +Aml *aml_sizeof(Aml *arg)
> +{
> + Aml *var = aml_opcode(0x87 /* SizeOfOp */);
> + aml_append(var, arg);
> + return var;
> +}
> +
> void
> build_header(GArray *linker, GArray *table_data,
> AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 21dc5e9..6b591ab 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -276,6 +276,7 @@ Aml *aml_varpackage(uint32_t num_elements);
> Aml *aml_touuid(const char *uuid);
> Aml *aml_unicode(const char *str);
> Aml *aml_derefof(Aml *arg);
> +Aml *aml_sizeof(Aml *arg);
>
> void
> build_header(GArray *linker, GArray *table_data,
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-09-02 9:58 ` Igor Mammedov
@ 2015-09-02 10:36 ` Xiao Guangrong
2015-09-02 11:31 ` Igor Mammedov
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-02 10:36 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:59 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Introduce "pc-nvdimm" device and it has two parameters:
> Why do you use prefix "pc-", I suppose we potentially
> could use this device not only with x86 targets but with
> other targets as well.
> I'd just drop the 'pc' prefix throughout the patchset.
Yeah, the prefix is stolen from pc-dimm; will drop the
prefix as you suggested.
>
>> - @file, which is the backed memory file for NVDIMM device
> Could you try to split device into backend/frontend parts,
> like it's done with pc-dimm. As I understand it's preferred
> way to implement this kind of devices.
> Then you could reuse memory backends that we already have
> including file backend.
I considered it too, and Stefan and Paolo had the same idea in
the v1 review; however:
| However, file-based memory used by NVDIMM is special, it divides the file
| to two parts, one part is used as PMEM and another part is used to store
| NVDIMM's configure data.
|
| Maybe we can introduce "end-reserved" property to reserve specified size
| at the end of the file. Or create a new class type based on
| memory-backend-file (named nvdimm-backend-file) class to hide this magic
| thing?
Your idea?
>
> So CLI could look like:
> -object memory-backend-file,id=mem0,file=/storage/foo
> -device nvdimm,memdev=mem0,configdata=on
>
>>
>> - @configdata, specify if we need to reserve 128k at the end of
>> @file for nvdimm device's config data. Default is false
>>
>> If @configdata is false, Qemu will build a static and readonly
>> namespace in memory and use it serveing for
>> DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
>> This is good for the user who want to pass whole nvdimm device
>> and make its data is complete visible to guest
>>
>> We can use "-device pc-nvdimm,file=/dev/pmem,configdata" in the
>> Qemu command to create NVDIMM device for the guest
> PS:
> please try to fix the commit message, spelling- and grammar-wise.
Sorry for my carelessness; I will fix them.
>
> [...]
>> +++ b/hw/mem/nvdimm/pc-nvdimm.c
>> @@ -0,0 +1,99 @@
>> +/*
>> + * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
> s/Implement/Implementation/ in all new files
> and maybe s/NVDIMM (A // as it's redundant
Okay, will drop it.
>
> [...]
>> +static bool has_configdata(Object *obj, Error **errp)
>> +{
>> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> +
>> + return nvdimm->configdata;
>> +}
>> +
>> +static void set_configdata(Object *obj, bool value, Error **errp)
>> +{
>> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> +
>> + nvdimm->configdata = value;
>> +}
> usually for property setters/getters we use the form:
> "device_prefix"_[g|s]et_foo
> so
> nvdimm_get_configdata ...
Good to me.
Thanks for your review, Igor!
* Re: [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof
2015-09-02 10:16 ` Igor Mammedov
@ 2015-09-02 10:38 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-02 10:38 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 06:16 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:56 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Implement DeRefOf term which is used by NVDIMM _DSM method in later patch
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>> hw/acpi/aml-build.c | 8 ++++++++
>> include/hw/acpi/aml-build.h | 1 +
>> 2 files changed, 9 insertions(+)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 02f9e3d..9e89efc 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
>> return var;
>> }
>>
>> +/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
> Please put here the lowest doc revision where the term first appeared
Okay, I will figure out the lowest revision.
* Re: [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof
2015-09-02 10:18 ` Igor Mammedov
@ 2015-09-02 10:39 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-02 10:39 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 06:18 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:57 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Implement SizeOf term which is used by NVDIMM _DSM method in later patch
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>> hw/acpi/aml-build.c | 8 ++++++++
>> include/hw/acpi/aml-build.h | 1 +
>> 2 files changed, 9 insertions(+)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 9e89efc..a526eed 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
>> return var;
>> }
>>
>> +/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
> ditto, refer to the first revision where it first appeared
>
Good to me, will do. :)
* Re: [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit
2015-09-02 10:06 ` Igor Mammedov
@ 2015-09-02 10:43 ` Xiao Guangrong
2015-09-02 11:42 ` Igor Mammedov
2015-09-02 12:05 ` Michael S. Tsirkin
1 sibling, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-02 10:43 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 06:06 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:55 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Only 512M is left for MMIO below 4G and that are used by PCI, BIOS etc.
>> Other components also reserve regions from their internal usage, e.g,
>> [0xFED00000, 0xFED00000 + 0x400) is reserved for HPET
>>
>> Switch SSDT to 64 bit to use the huge free room above 4G. In the later
>> patches, we will dynamical allocate free space within this region which
>> is used by NVDIMM _DSM method
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>> hw/i386/acpi-build.c | 4 ++--
>> hw/i386/acpi-dsdt.dsl | 2 +-
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 46eddb8..8ead1c1 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
>> g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
>> build_header(linker, table_data,
>> (void *)(table_data->data + table_data->len - ssdt->buf->len),
>> - "SSDT", ssdt->buf->len, 1);
>> + "SSDT", ssdt->buf->len, 2);
> That might break Windows XP, since it supports only 1.0b ACPI with some
> 2.0 extensions.
> there are two ways to work around it:
> - add an additional Rev2 ssdt table if NVDIMMs are present
> and describe them there
I like this way; it's more straightforward to me.
BTW, IIUC the DSDT still needs to be changed to Rev2 to recognise an SSDT with
Rev2; will that hurt Windows XP?
* Re: [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field Xiao Guangrong
@ 2015-09-02 11:10 ` Igor Mammedov
2015-09-06 5:32 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 11:10 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:51:58 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> Implement CreateField term which are used by NVDIMM _DSM method in later patch
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/acpi/aml-build.c | 14 ++++++++++++++
> include/hw/acpi/aml-build.h | 1 +
> 2 files changed, 15 insertions(+)
>
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index a526eed..debdad2 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1151,6 +1151,20 @@ Aml *aml_sizeof(Aml *arg)
> return var;
> }
>
> +/* ACPI 6.0: 20.2.5.2 Named Objects Encoding: DefCreateField */
ditto, refer to the first revision where it first appeared
> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
index and len can only be of 'Integer' type, so there is no point
in passing them in as Aml; just use uintFOO_t here and convert
them to aml_int() internally. That way call sites will be smaller
and there is less chance of passing a wrong Aml variable.
> +{
> + Aml *var = aml_alloc();
> +
drop newline
> + build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
> + build_append_byte(var->buf, 0x13); /* CreateFieldOp */
> + aml_append(var, srcbuf);
> + aml_append(var, index);
> + aml_append(var, len);
> + build_append_namestring(var->buf, "%s", name);
> + return var;
> +}
> +
> void
> build_header(GArray *linker, GArray *table_data,
> AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 6b591ab..d4dbd44 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -277,6 +277,7 @@ Aml *aml_touuid(const char *uuid);
> Aml *aml_unicode(const char *str);
> Aml *aml_derefof(Aml *arg);
> Aml *aml_sizeof(Aml *arg);
> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
>
> void
> build_header(GArray *linker, GArray *table_data,
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-09-02 10:36 ` Xiao Guangrong
@ 2015-09-02 11:31 ` Igor Mammedov
2015-09-06 6:07 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 11:31 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Wed, 2 Sep 2015 18:36:43 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>
> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> > On Fri, 14 Aug 2015 22:51:59 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> Introduce "pc-nvdimm" device and it has two parameters:
> > Why do you use prefix "pc-", I suppose we potentially
> > could use this device not only with x86 targets but with
> > other targets as well.
> > I'd just drop the 'pc' prefix throughout the patchset.
>
> Yeah, the prefix is stolen from pc-dimm; will drop this
> prefix per your suggestion.
>
> >
> >> - @file, which is the backed memory file for NVDIMM device
> > Could you try to split device into backend/frontend parts,
> > like it's done with pc-dimm. As I understand it's preferred
> > way to implement this kind of devices.
> > Then you could reuse memory backends that we already have
> > including file backend.
>
> I considered it too and Stefan, Paolo got the same idea in
> V1's review, however:
>
> | However, file-based memory used by NVDIMM is special, it divides the file
> | to two parts, one part is used as PMEM and another part is used to store
> | NVDIMM's configure data.
> |
> | Maybe we can introduce "end-reserved" property to reserve specified size
> | at the end of the file. Or create a new class type based on
> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
> | thing?
I'd go with the separate backend/frontend idea.
The question is whether this config area is part of the backend or the frontend.
If we pass through an NVDIMM device, do we need to set configdata=true
so that QEMU would skip building config structures and use the structures
already present on the passed-through device in that place?
>
> Your idea?
[...]
* Re: [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit
2015-09-02 10:43 ` Xiao Guangrong
@ 2015-09-02 11:42 ` Igor Mammedov
2015-09-06 7:01 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-02 11:42 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Wed, 2 Sep 2015 18:43:41 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>
> On 09/02/2015 06:06 PM, Igor Mammedov wrote:
> > On Fri, 14 Aug 2015 22:51:55 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> Only 512M is left for MMIO below 4G, and that is used by PCI, BIOS, etc.
> >> Other components also reserve regions for their internal use, e.g.,
> >> [0xFED00000, 0xFED00000 + 0x400) is reserved for the HPET
> >>
> >> Switch the SSDT to 64 bit to use the huge free room above 4G. In later
> >> patches, we will dynamically allocate free space within this region, which
> >> is used by the NVDIMM _DSM method
> >>
> >> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >> ---
> >> hw/i386/acpi-build.c | 4 ++--
> >> hw/i386/acpi-dsdt.dsl | 2 +-
> >> 2 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index 46eddb8..8ead1c1 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> >> g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> >> build_header(linker, table_data,
> >> (void *)(table_data->data + table_data->len - ssdt->buf->len),
> >> - "SSDT", ssdt->buf->len, 1);
> >> + "SSDT", ssdt->buf->len, 2);
> > That might break Windows XP, since it supports only 1.0b ACPI with some
> > 2.0 extensions.
> > there are 2 ways to work around it:
> > - add an additional Rev2 ssdt table if NVDIMMs are present
> > and describe them there
>
> I like this way, it's more straightforward to me.
>
> BTW, IIUC the DSDT still needs to be changed to Rev2 to recognise an SSDT with Rev2;
> does it hurt Windows XP?
Probably it will, but why should the DSDT be v2 for one of the SSDTs to be v2?
They are separate tables.
Also you might find the following interesting wrt Windows compatibility:
http://www.acpi.info/presentations/S01USMOBS169_OS%20new.ppt
* Re: [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit
2015-09-02 10:06 ` Igor Mammedov
2015-09-02 10:43 ` Xiao Guangrong
@ 2015-09-02 12:05 ` Michael S. Tsirkin
1 sibling, 0 replies; 87+ messages in thread
From: Michael S. Tsirkin @ 2015-09-02 12:05 UTC (permalink / raw)
To: Igor Mammedov
Cc: Xiao Guangrong, ehabkost, kvm, gleb, mtosatti, qemu-devel,
stefanha, pbonzini, rth
On Wed, Sep 02, 2015 at 12:06:02PM +0200, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:55 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
> > Only 512M is left for MMIO below 4G, and that is used by PCI, BIOS, etc.
> > Other components also reserve regions for their internal use, e.g.,
> > [0xFED00000, 0xFED00000 + 0x400) is reserved for the HPET
> >
> > Switch the SSDT to 64 bit to use the huge free room above 4G. In later
> > patches, we will dynamically allocate free space within this region, which
> > is used by the NVDIMM _DSM method
> >
> > Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> > ---
> > hw/i386/acpi-build.c | 4 ++--
> > hw/i386/acpi-dsdt.dsl | 2 +-
> > 2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 46eddb8..8ead1c1 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> > build_header(linker, table_data,
> > (void *)(table_data->data + table_data->len - ssdt->buf->len),
> > - "SSDT", ssdt->buf->len, 1);
> > + "SSDT", ssdt->buf->len, 2);
> That might break Windows XP, since it supports only 1.0b ACPI with some
> 2.0 extensions.
> there are 2 ways to work around it:
> - add an additional Rev2 ssdt table if NVDIMMs are present
> and describe them there
> - make sure that you use only 32bit arithmetic in AML
> (and emulate 64bit like it has been done for memory hotplug)
Add an XSDT and link the 64 bit stuff from there only.
This approach will need some work to avoid breaking UEFI.
> > free_aml_allocator();
> > }
> >
> > @@ -1586,7 +1586,7 @@ build_dsdt(GArray *table_data, GArray *linker, AcpiMiscInfo *misc)
> >
> > memset(dsdt, 0, sizeof *dsdt);
> > build_header(linker, table_data, dsdt, "DSDT",
> > - misc->dsdt_size, 1);
> > + misc->dsdt_size, 2);
> > }
> >
> > static GArray *
> > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> > index a2d84ec..5cd3f0e 100644
> > --- a/hw/i386/acpi-dsdt.dsl
> > +++ b/hw/i386/acpi-dsdt.dsl
> > @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> > DefinitionBlock (
> > "acpi-dsdt.aml", // Output Filename
> > "DSDT", // Signature
> > - 0x01, // DSDT Compliance Revision
> > + 0x02, // DSDT Compliance Revision
> > "BXPC", // OEMID
> > "BXDSDT", // TABLE ID
> > 0x1 // OEM Revision
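Igor's second workaround quoted above (use only 32-bit arithmetic in AML, emulating 64-bit values as is done for memory hotplug) boils down to carrying a 64-bit address as two 32-bit halves and letting the consumer recombine them. A rough standalone sketch, with illustrative helper names:

```c
#include <assert.h>
#include <stdint.h>

/* ACPI 1.0b AML only guarantees 32-bit integer arithmetic, so a 64-bit
 * address is split into two 32-bit halves for rev1 tables... */
static void split64(uint64_t addr, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)(addr & 0xFFFFFFFFULL);
    *hi = (uint32_t)(addr >> 32);
}

/* ...and recombined by whatever understands 64-bit values. */
static uint64_t join64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```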
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
` (2 preceding siblings ...)
2015-08-28 17:25 ` Eduardo Habkost
@ 2015-09-04 12:02 ` Igor Mammedov
2015-09-06 7:22 ` Xiao Guangrong
3 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-04 12:02 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:52:00 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> NVDIMM reserves all the free range above 4G to do:
> - Persistent Memory (PMEM) mapping
> - implement NVDIMM ACPI device _DSM method
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/i386/pc.c | 12 ++++++++++--
> hw/mem/nvdimm/pc-nvdimm.c | 13 +++++++++++++
> include/hw/mem/pc-nvdimm.h | 1 +
> 3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 7661ea9..41af6ea 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -64,6 +64,7 @@
> #include "hw/pci/pci_host.h"
> #include "acpi-build.h"
> #include "hw/mem/pc-dimm.h"
> +#include "hw/mem/pc-nvdimm.h"
> #include "qapi/visitor.h"
> #include "qapi-visit.h"
>
> @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
> MemoryRegion *ram_below_4g, *ram_above_4g;
> FWCfgState *fw_cfg;
> PCMachineState *pcms = PC_MACHINE(machine);
> + ram_addr_t offset;
>
> assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
>
> @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
> exit(EXIT_FAILURE);
> }
>
> + offset = 0x100000000ULL + above_4g_mem_size;
> +
> /* initialize hotplug memory address space */
> if (guest_info->has_reserved_memory &&
> (machine->ram_size < machine->maxram_size)) {
> @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
> exit(EXIT_FAILURE);
> }
>
> - pcms->hotplug_memory.base =
> - ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
> + pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
>
> if (pcms->enforce_aligned_dimm) {
> /* size hotplug region assuming 1G page max alignment per slot */
> @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
> "hotplug-memory", hotplug_mem_size);
> memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
> &pcms->hotplug_memory.mr);
> +
> + offset = pcms->hotplug_memory.base + hotplug_mem_size;
> }
>
> + /* all the space left above 4G is reserved for NVDIMM. */
> + pc_nvdimm_reserve_range(offset);
I'd drop 'offset' in this patch and just use:
foo(pcms->hotplug_memory.base + hotplug_mem_size)
> +
> /* Initialize PC system firmware */
> pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
>
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index a53d235..7a270a8 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -24,6 +24,19 @@
>
> #include "hw/mem/pc-nvdimm.h"
>
> +#define PAGE_SIZE (1UL << 12)
> +
> +static struct nvdimms_info {
> + ram_addr_t current_addr;
> +} nvdimms_info;
no globals please; so far it looks like pcms->hotplug_memory,
so add a similar nvdimm_memory field to PCMachineState
> +
> +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
> +void pc_nvdimm_reserve_range(ram_addr_t offset)
do you plan to reuse this function? If not, then just inline it at the call site
> +{
> + offset = ROUND_UP(offset, PAGE_SIZE);
I'd suggest rounding up to 1GB as we do with mem hotplug
> + nvdimms_info.current_addr = offset;
> +}
> +
> static char *get_file(Object *obj, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
> index 51152b8..8601e9b 100644
> --- a/include/hw/mem/pc-nvdimm.h
> +++ b/include/hw/mem/pc-nvdimm.h
> @@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
> #define PC_NVDIMM(obj) \
> OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
>
> +void pc_nvdimm_reserve_range(ram_addr_t offset);
> #endif
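Igor's suggestions above (no file-local globals, round up to 1GB as memory hotplug does) could be sketched roughly as below. The struct and field names are illustrative stand-ins for a field on PCMachineState, not actual QEMU code:

```c
#include <assert.h>
#include <stdint.h>

/* d must be a power of two in this simplified ROUND_UP */
#define ROUND_UP(n, d) (((n) + (d) - 1) & ~((uint64_t)(d) - 1))

/* Illustrative stand-in for the nvdimm_memory-style field Igor suggests
 * adding to PCMachineState instead of a global. */
typedef struct {
    uint64_t nvdimm_base;   /* start of the range reserved for NVDIMM */
} MachineNVDIMMState;

/* Reserve everything above 'offset' for NVDIMM, rounded up to 1GB as
 * suggested (matching what memory hotplug does). */
static void nvdimm_reserve_range(MachineNVDIMMState *s, uint64_t offset)
{
    s->nvdimm_base = ROUND_UP(offset, 1ULL << 30);
}
```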
* Re: [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field
2015-09-02 11:10 ` Igor Mammedov
@ 2015-09-06 5:32 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-06 5:32 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 07:10 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:58 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Implement CreateField term which are used by NVDIMM _DSM method in later patch
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>> hw/acpi/aml-build.c | 14 ++++++++++++++
>> include/hw/acpi/aml-build.h | 1 +
>> 2 files changed, 15 insertions(+)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index a526eed..debdad2 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1151,6 +1151,20 @@ Aml *aml_sizeof(Aml *arg)
>> return var;
>> }
>>
>> +/* ACPI 6.0: 20.2.5.2 Named Objects Encoding: DefCreateField */
> ditto, refer to the first revision where it first appeared
Will do it.
>
>> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
> index and len can only be of 'Integer' type, so there is no point
> in passing them in as Aml; just use uintFOO_t here and convert
> them to aml_int() internally. That way call sites will be smaller
> and have less chance of passing a wrong Aml variable.
>
Good idea.
BTW, Igor, sorry for the delayed reply; we had a holiday this week.
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-09-02 11:31 ` Igor Mammedov
@ 2015-09-06 6:07 ` Xiao Guangrong
2015-09-07 13:40 ` Igor Mammedov
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-06 6:07 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> On Wed, 2 Sep 2015 18:36:43 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
>>> On Fri, 14 Aug 2015 22:51:59 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>> Introduce "pc-nvdimm" device and it has two parameters:
>>> Why do you use prefix "pc-", I suppose we potentially
>>> could use this device not only with x86 targets but with
>>> other targets as well.
>>> I'd just drop the 'pc' prefix throughout the patchset.
>>
>> Yeah, the prefix is stolen from pc-dimm; will drop this
>> prefix per your suggestion.
>>
>>>
>>>> - @file, which is the backed memory file for NVDIMM device
>>> Could you try to split device into backend/frontend parts,
>>> like it's done with pc-dimm. As I understand it's preferred
>>> way to implement this kind of devices.
>>> Then you could reuse memory backends that we already have
>>> including file backend.
>>
>> I considered it too and Stefan, Paolo got the same idea in
>> V1's review, however:
>>
>> | However, file-based memory used by NVDIMM is special, it divides the file
>> | to two parts, one part is used as PMEM and another part is used to store
>> | NVDIMM's configure data.
>> |
>> | Maybe we can introduce "end-reserved" property to reserve specified size
>> | at the end of the file. Or create a new class type based on
>> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
>> | thing?
> I'd go with the separate backend/frontend idea.
>
> The question is whether this config area is part of the backend or the frontend.
The configdata area is used to store the nvdimm device's configuration;
normally it's namespace info.
Currently, we chose to locate configdata at the end of the nvdimm's
backend memory, as it's easy to configure/use, configdata is naturally
non-volatile, and it matches the layout on a physical device.
However, using two separate backend memories is okay, for example:
-object memory-backend-file,id=mem0,file=/storage/foo
-object memory-backend-file,id=mem1,file=/storage/bar
-device nvdimm,memdev=mem0,configdata=mem1
then configdata is written to a single backend.
Which one is better for you? :)
> If we pass through an NVDIMM device, do we need to set configdata=true
> so that QEMU would skip building config structures and use the structures
> already present on the passed-through device in that place?
>
The file specified by @file is something like a normal disk, e.g. /dev/sda;
the host process can use the whole space on it. If we want to directly pass
it to the guest, we can specify 'configdata=false'. If we allow the guest to
'partition' (create namespaces on) it, then we use 'configdata=true' to
reserve some space to store its partition info (namespace info).
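The configdata split described in this thread can be summarized in a small sketch; the helper `pmem_size` is illustrative and mirrors the 128K reservation from patch 08 rather than quoting it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With configdata=true the last 128K of the backing file is reserved as
 * the config data (label) area and only the remainder is exposed to the
 * guest as PMEM; with configdata=false the whole file is PMEM. The
 * constant mirrors the patch's MIN_CONFIG_DATA_SIZE. */
#define MIN_CONFIG_DATA_SIZE (128 << 10)

static uint64_t pmem_size(uint64_t file_size, bool configdata)
{
    return configdata ? file_size - MIN_CONFIG_DATA_SIZE : file_size;
}
```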
* Re: [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit
2015-09-02 11:42 ` Igor Mammedov
@ 2015-09-06 7:01 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-06 7:01 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/02/2015 07:42 PM, Igor Mammedov wrote:
> On Wed, 2 Sep 2015 18:43:41 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 09/02/2015 06:06 PM, Igor Mammedov wrote:
>>> On Fri, 14 Aug 2015 22:51:55 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>> Only 512M is left for MMIO below 4G, and that is used by PCI, BIOS, etc.
>>>> Other components also reserve regions for their internal use, e.g.,
>>>> [0xFED00000, 0xFED00000 + 0x400) is reserved for the HPET
>>>>
>>>> Switch the SSDT to 64 bit to use the huge free room above 4G. In later
>>>> patches, we will dynamically allocate free space within this region, which
>>>> is used by the NVDIMM _DSM method
>>>>
>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>> ---
>>>> hw/i386/acpi-build.c | 4 ++--
>>>> hw/i386/acpi-dsdt.dsl | 2 +-
>>>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>> index 46eddb8..8ead1c1 100644
>>>> --- a/hw/i386/acpi-build.c
>>>> +++ b/hw/i386/acpi-build.c
>>>> @@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
>>>> g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
>>>> build_header(linker, table_data,
>>>> (void *)(table_data->data + table_data->len - ssdt->buf->len),
>>>> - "SSDT", ssdt->buf->len, 1);
>>>> + "SSDT", ssdt->buf->len, 2);
>>> That might break Windows XP, since it supports only 1.0b ACPI with some
>>> 2.0 extensions.
>>> there are 2 ways to work around it:
>>> - add an additional Rev2 ssdt table if NVDIMMs are present
>>> and describe them there
>>
>> I like this way, it's more straightforward to me.
>>
>> BTW, IIUC the DSDT still needs to be changed to Rev2 to recognise an SSDT with Rev2;
>> does it hurt Windows XP?
> Probably it will, but why should the DSDT be v2 for one of the SSDTs to be v2?
> They are separate tables.
When I made the first version of this patch, I only changed the SSDT to v2
in build_ssdt(), but it failed; it worked only if both SSDT and DSDT were
changed to v2. :(
I will confirm it again and figure it out.
>
> Also you might find the following interesting wrt Windows compatibility:
> http://www.acpi.info/presentations/S01USMOBS169_OS%20new.ppt
That's a great help to me, thanks for sharing, Igor!
* Re: [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
2015-09-04 12:02 ` Igor Mammedov
@ 2015-09-06 7:22 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-06 7:22 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/04/2015 08:02 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:52:00 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> NVDIMM reserves all the free range above 4G to do:
>> - Persistent Memory (PMEM) mapping
>> - implement NVDIMM ACPI device _DSM method
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>> hw/i386/pc.c | 12 ++++++++++--
>> hw/mem/nvdimm/pc-nvdimm.c | 13 +++++++++++++
>> include/hw/mem/pc-nvdimm.h | 1 +
>> 3 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 7661ea9..41af6ea 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -64,6 +64,7 @@
>> #include "hw/pci/pci_host.h"
>> #include "acpi-build.h"
>> #include "hw/mem/pc-dimm.h"
>> +#include "hw/mem/pc-nvdimm.h"
>> #include "qapi/visitor.h"
>> #include "qapi-visit.h"
>>
>> @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
>> MemoryRegion *ram_below_4g, *ram_above_4g;
>> FWCfgState *fw_cfg;
>> PCMachineState *pcms = PC_MACHINE(machine);
>> + ram_addr_t offset;
>>
>> assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
>>
>> @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
>> exit(EXIT_FAILURE);
>> }
>>
>> + offset = 0x100000000ULL + above_4g_mem_size;
>> +
>> /* initialize hotplug memory address space */
>> if (guest_info->has_reserved_memory &&
>> (machine->ram_size < machine->maxram_size)) {
>> @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
>> exit(EXIT_FAILURE);
>> }
>>
>> - pcms->hotplug_memory.base =
>> - ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
>> + pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
>>
>> if (pcms->enforce_aligned_dimm) {
>> /* size hotplug region assuming 1G page max alignment per slot */
>> @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
>> "hotplug-memory", hotplug_mem_size);
>> memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
>> &pcms->hotplug_memory.mr);
>> +
>> + offset = pcms->hotplug_memory.base + hotplug_mem_size;
>> }
>>
>> + /* all the space left above 4G is reserved for NVDIMM. */
>> + pc_nvdimm_reserve_range(offset);
> I'd drop 'offset' in this patch and just use:
> foo(pcms->hotplug_memory.base + hotplug_mem_size)
>
That works only if hotplug is used... however we can enable nvdimm separately.
>> +
>> /* Initialize PC system firmware */
>> pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
>>
>> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
>> index a53d235..7a270a8 100644
>> --- a/hw/mem/nvdimm/pc-nvdimm.c
>> +++ b/hw/mem/nvdimm/pc-nvdimm.c
>> @@ -24,6 +24,19 @@
>>
>> #include "hw/mem/pc-nvdimm.h"
>>
>> +#define PAGE_SIZE (1UL << 12)
>> +
>> +static struct nvdimms_info {
>> + ram_addr_t current_addr;
>> +} nvdimms_info;
> no globals please; so far it looks like pcms->hotplug_memory,
> so add a similar nvdimm_memory field to PCMachineState
>
Okay, sounds good to me.
>> +
>> +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
>> +void pc_nvdimm_reserve_range(ram_addr_t offset)
> do you plan to reuse this function? If not, then just inline it at the call site
I prefer keeping it as an inline function and moving it to the nvdimm.h
file, since that makes it easier to port to other platforms: there is no
need to hunt for the nvdimm-related pieces of code in the x86 arch; only
the functions in nvdimm.h need to be implemented.
>
>> +{
>> + offset = ROUND_UP(offset, PAGE_SIZE);
> I'd suggest rounding up to 1GB as we do with mem hotplug
Okay, sounds good to me.
I really appreciate all your time and comments on the whole patchset, Igor!
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-09-06 6:07 ` Xiao Guangrong
@ 2015-09-07 13:40 ` Igor Mammedov
2015-09-08 14:03 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-07 13:40 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Sun, 6 Sep 2015 14:07:21 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>
> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> > On Wed, 2 Sep 2015 18:36:43 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >>
> >>
> >> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> >>> On Fri, 14 Aug 2015 22:51:59 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>>> Introduce "pc-nvdimm" device and it has two parameters:
> >>> Why do you use prefix "pc-", I suppose we potentially
> >>> could use this device not only with x86 targets but with
> >>> other targets as well.
> >>> I'd just drop the 'pc' prefix throughout the patchset.
> >>
> >> Yeah, the prefix is stolen from pc-dimm; will drop this
> >> prefix per your suggestion.
> >>
> >>>
> >>>> - @file, which is the backed memory file for NVDIMM device
> >>> Could you try to split device into backend/frontend parts,
> >>> like it's done with pc-dimm. As I understand it's preferred
> >>> way to implement this kind of devices.
> >>> Then you could reuse memory backends that we already have
> >>> including file backend.
> >>
> >> I considered it too and Stefan, Paolo got the same idea in
> >> V1's review, however:
> >>
> >> | However, file-based memory used by NVDIMM is special, it divides the file
> >> | to two parts, one part is used as PMEM and another part is used to store
> >> | NVDIMM's configure data.
> >> |
> >> | Maybe we can introduce "end-reserved" property to reserve specified size
> >> | at the end of the file. Or create a new class type based on
> >> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
> >> | thing?
> > I'd go with the separate backend/frontend idea.
> >
> > The question is whether this config area is part of the backend or the frontend.
>
> The configdata area is used to store the nvdimm device's configuration;
> normally it's namespace info.
>
> Currently, we chose to locate configdata at the end of the nvdimm's
> backend memory, as it's easy to configure/use, configdata is naturally
> non-volatile, and it matches the layout on a physical device.
>
> However, using two separate backend memories is okay, for example:
> -object memory-backend-file,id=mem0,file=/storage/foo
> -object memory-backend-file,id=mem1,file=/storage/bar
> -device nvdimm,memdev=mem0,configdata=mem1
> then configdata is written to a single backend.
>
> Which one is better for you? :)
>
> > If we pass through an NVDIMM device, do we need to set configdata=true
> > so that QEMU would skip building config structures and use the structures
> > already present on the passed-through device in that place?
> >
>
> The file specified by @file is something like a normal disk, e.g. /dev/sda;
> the host process can use the whole space on it. If we want to directly pass
> it to the guest, we can specify 'configdata=false'. If we allow the guest to
> 'partition' (create namespaces on) it, then we use 'configdata=true' to
> reserve some space to store its partition info (namespace info).
As far as I understand, Linux currently provides userspace only one
interface, a block device, i.e. /dev/sdX; on top of it userspace can put a
PM/DAX-aware filesystem and use files from it. In either case the kernel
just provides access to separate namespaces, not to a whole NVDIMM
including its 'labels area'. Hence /dev/sdX is not a passed-through
NVDIMM, so we could consider it just a file/storage usable by userspace.
Let's assume that an NVDIMM should always have a 'labels area'.
In that case I'd always reserve space for it and
 * format it (build a new one) if the backend doesn't have a
   valid labels area, dropping the configdata parameter along the way
 * or, if the backing file already has a valid labels area, just use it.
If you need to make the labels area readonly you can introduce an
'NVDIMM.readonly_labels' option and just use the backend's labels without
allowing changes to be written back.
It would be better to make that another series on top of the basic NVDIMM
implementation, if there is an actual use case for it.
PS:
Also, when you write commit messages, comments, and variable names, try to
use terms from the relevant spec, and mention the specs whose data
structures you describe.
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area Xiao Guangrong
2015-08-25 16:03 ` Stefan Hajnoczi
@ 2015-09-07 14:11 ` Igor Mammedov
2015-09-08 13:38 ` Xiao Guangrong
2015-09-15 16:11 ` Paolo Bonzini
1 sibling, 2 replies; 87+ messages in thread
From: Igor Mammedov @ 2015-09-07 14:11 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Fri, 14 Aug 2015 22:52:01 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> The parameter @file is used as the backing memory for the NVDIMM, which is
> divided into two parts if @configdata is true:
> - the first part is (0, size - 128K], which is used as PMEM (Persistent
> Memory)
> - 128K at the end of the file, which is used as the Config Data Area; it's
> used to store Label namespace data
>
> The @file supports both regular files and block devices; of course we
> can assign either of these two kinds of files for test and emulation.
> However, in the real world, for performance reasons, we usually use these
> files as the NVDIMM backing file:
> - a regular file, with DAX enabled, in a filesystem created on an NVDIMM
> device on the host
> - the raw PMEM device on the host, e.g. /dev/pmem0
A lot of code in this series could reuse what QEMU already
uses for implementing pc-dimm devices.
Here are common concepts that could be reused:
- on a physical system both DIMM and NVDIMM devices use
the same slots. We could share QEMU's '-m slots' option between
both devices. The alternative, not sharing, would be to introduce
a '-machine nvdimm_slots' option.
And yes, we need to know the number of NVDIMMs to describe
them all in the ACPI table (taking into account possible future
hotplug of NVDIMM devices).
I'd go the same way as on real hardware and make them share the same slots.
- they share the same physical address space and the limits
on how much memory the system can handle. So I'd suggest sharing the
existing '-m maxmem' option and reusing the hotplug_memory address space.
Essentially what I'm suggesting is to inherit NVDIMM's implementation
from pc-dimm, reusing all of its code/backends and
just overriding the parts that do memory mapping into the guest's address
space to accommodate NVDIMM's requirements.
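Structurally, the inheritance Igor proposes is the usual QOM parent-embedding pattern. A plain-C sketch (deliberately not real QOM code; all type and field names here are illustrative, not QEMU's actual ones):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for pc-dimm device state: slot/addr/backend handling that
 * NVDIMM would inherit unchanged. */
typedef struct {
    uint64_t addr;       /* mapped guest address */
    int slot;            /* shared '-m slots' slot number */
} PCDIMMState;

/* NVDIMM embeds the parent state as its first member and only adds its
 * own fields, e.g. the label (config) area size. */
typedef struct {
    PCDIMMState parent;  /* reuse pc-dimm state/behaviour */
    uint64_t label_size; /* NVDIMM-specific addition */
} NVDIMMState;

/* Upcast works because 'parent' is the first member; this is the same
 * layout trick QOM's cast macros rely on. */
static PCDIMMState *nvdimm_to_dimm(NVDIMMState *nv)
{
    return &nv->parent;
}
```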
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/mem/nvdimm/pc-nvdimm.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
> include/hw/mem/pc-nvdimm.h | 7 +++
> 2 files changed, 115 insertions(+), 1 deletion(-)
>
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index 7a270a8..97710d1 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -22,12 +22,20 @@
> * License along with this library; if not, see <http://www.gnu.org/licenses/>
> */
>
> +#include <sys/mman.h>
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +
> +#include "exec/address-spaces.h"
> #include "hw/mem/pc-nvdimm.h"
>
> -#define PAGE_SIZE (1UL << 12)
> +#define PAGE_SIZE (1UL << 12)
> +
> +#define MIN_CONFIG_DATA_SIZE (128 << 10)
>
> static struct nvdimms_info {
> ram_addr_t current_addr;
> + int device_index;
> } nvdimms_info;
>
> /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
> @@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
> nvdimms_info.current_addr = offset;
> }
>
> +static ram_addr_t reserved_range_push(uint64_t size)
> +{
> + uint64_t current;
> +
> + current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
> +
> + /* do not have enough space? */
> + if (current + size < current) {
> + return 0;
> + }
> +
> + nvdimms_info.current_addr = current + size;
> + return current;
> +}
You can't use all memory above the hotplug_memory area since
we have to tell the guest where the 64-bit PCI window starts,
and currently it should start at reserved-memory-end
(but it isn't, due to a bug; I've just posted a fix to qemu-devel:
"[PATCH 0/2] pc: fix 64-bit PCI window clashing with memory hotplug region"
)
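As an aside, the wrap-around guard in reserved_range_push() quoted above ("current + size < current") can be exercised standalone; the helper name `reserve_push` is illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The "current + size < current" test detects unsigned 64-bit overflow,
 * i.e. the requested size no longer fits below 2^64, in which case the
 * reservation is refused. */
static bool reserve_push(uint64_t *current, uint64_t size, uint64_t *addr)
{
    if (*current + size < *current) {
        return false;           /* not enough address space left */
    }
    *addr = *current;
    *current += size;
    return true;
}
```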
> +
> +static uint32_t new_device_index(void)
> +{
> + return nvdimms_info.device_index++;
> +}
> +
> static char *get_file(Object *obj, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> @@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>
> + if (memory_region_size(&nvdimm->mr)) {
> + error_setg(errp, "cannot change property value");
> + return;
> + }
> +
> if (nvdimm->file) {
> g_free(nvdimm->file);
> }
> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
> set_configdata, NULL);
> }
>
> +static uint64_t get_file_size(int fd)
> +{
> + struct stat stat_buf;
> + uint64_t size;
> +
> + if (fstat(fd, &stat_buf) < 0) {
> + return 0;
> + }
> +
> + if (S_ISREG(stat_buf.st_mode)) {
> + return stat_buf.st_size;
> + }
> +
> + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
> + return size;
> + }
> +
> + return 0;
> +}
All this file stuff I'd leave to already existing backends like
memory-backend-file or even memory-backend-ram which already do
above and more allowing to configure persistent and volatile
NVDIMMs without changing NVDIMM fronted code.
> +
> static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
> + char name[512];
> + void *buf;
> + ram_addr_t addr;
> + uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
> + int fd;
>
> if (!nvdimm->file) {
> error_setg(errp, "file property is not set");
> }
> +
> + fd = open(nvdimm->file, O_RDWR);
> + if (fd < 0) {
> + error_setg(errp, "can not open %s", nvdimm->file);
> + return;
> + }
> +
> + size = get_file_size(fd);
> + buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> + if (buf == MAP_FAILED) {
> + error_setg(errp, "can not do mmap on %s", nvdimm->file);
> + goto do_close;
> + }
> +
> + nvdimm->config_data_size = config_size;
> + if (nvdimm->configdata) {
> + /* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. */
> + nvdimm_size = size - config_size;
> + nvdimm->config_data_addr = buf + nvdimm_size;
> + } else {
> + nvdimm_size = size;
> + nvdimm->config_data_addr = NULL;
> + }
> +
> + if ((int64_t)nvdimm_size <= 0) {
> + error_setg(errp, "file size is too small to store NVDIMM"
> + " configure data");
> + goto do_unmap;
> + }
> +
> + addr = reserved_range_push(nvdimm_size);
> + if (!addr) {
> + error_setg(errp, "do not have enough space for size %#lx.\n", size);
> + goto do_unmap;
> + }
> +
> + nvdimm->device_index = new_device_index();
> + sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> + memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> + buf);
> + vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
> + memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
> +
> + return;
> +
> +do_unmap:
> + munmap(buf, size);
> +do_close:
> + close(fd);
> }
>
> static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
> diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
> index 8601e9b..f617fd2 100644
> --- a/include/hw/mem/pc-nvdimm.h
> +++ b/include/hw/mem/pc-nvdimm.h
> @@ -21,6 +21,13 @@ typedef struct PCNVDIMMDevice {
>
> char *file;
> bool configdata;
> +
> + int device_index;
> +
> + uint64_t config_data_size;
> + void *config_data_addr;
> +
> + MemoryRegion mr;
> } PCNVDIMMDevice;
>
> #define TYPE_PC_NVDIMM "pc-nvdimm"
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-07 14:11 ` Igor Mammedov
@ 2015-09-08 13:38 ` Xiao Guangrong
2015-09-10 10:35 ` Igor Mammedov
2015-09-15 16:11 ` Paolo Bonzini
1 sibling, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-08 13:38 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/07/2015 10:11 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:52:01 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> The parameter @file is used as backed memory for NVDIMM which is
>> divided into two parts if @dataconfig is true:
>> - first parts is (0, size - 128K], which is used as PMEM (Persistent
>> Memory)
>> - 128K at the end of the file, which is used as Config Data Area, it's
>> used to store Label namespace data
>>
>> The @file supports both regular file and block device, of course we
>> can assign any these two kinds of files for test and emulation, however,
>> in the real word for performance reason, we usually used these files as
>> NVDIMM backed file:
>> - the regular file in the filesystem with DAX enabled created on NVDIMM
>> device on host
>> - the raw PMEM device on host, e,g /dev/pmem0
>
> A lot of code in this series could reuse what QEMU already
> uses for implementing pc-dimm devices.
>
> here is common concepts that could be reused.
> - on physical system both DIMM and NVDIMM devices use
> the same slots. We could share QEMU's '-m slots' option between
> both devices. An alternative to not sharing would be to introduce
> '-machine nvdimm_slots' option.
> And yes, we need to know number of NVDIMMs to describe
> them all in ACPI table (taking in amount future hotplug
> include in this possible NVDIMM devices)
> I'd go the same way as on real hardware on make them share the same slots.
I'd prefer sharing slots between pc-dimm and nvdimm; it makes it easier to reuse
the slot-assignment and plug/unplug logic.
> - they share the same physical address space and limits
> on how much memory system can handle. So I'd suggest sharing existing
> '-m maxmem' option and reuse hotplug_memory address space.
Sounds good to me.
>
> Essentially what I'm suggesting is to inherit NVDIMM's implementation
> from pc-dimm reusing all of its code/backends and
> just override parts that do memory mapping into guest's address space to
> accommodate NVDIMM's requirements.
Good idea!
We have to differentiate pc-dimm and nvdimm in the common code, and nvdimm
differs from pc-dimm in several points (for example, it has a reserved region
and needs to support live migration of label data). How about renaming
'pc-nvdimm' to 'memory-device', making it a common device type, and then
building pc-dimm and nvdimm on top of it?
Something like:
static TypeInfo memory_device_info = {
    .name = TYPE_MEM_DEV,
    .parent = TYPE_DEVICE,
};
static TypeInfo pc_dimm_info = {
    .name = TYPE_PC_DIMM,
    .parent = TYPE_MEM_DEV,
};
static TypeInfo nvdimm_info = {
    .name = TYPE_NVDIMM,
    .parent = TYPE_MEM_DEV,
};
It would also make CONFIG_NVDIMM and CONFIG_HOT_PLUG independent of each other.
>
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>> hw/mem/nvdimm/pc-nvdimm.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
>> include/hw/mem/pc-nvdimm.h | 7 +++
>> 2 files changed, 115 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
>> index 7a270a8..97710d1 100644
>> --- a/hw/mem/nvdimm/pc-nvdimm.c
>> +++ b/hw/mem/nvdimm/pc-nvdimm.c
>> @@ -22,12 +22,20 @@
>> * License along with this library; if not, see <http://www.gnu.org/licenses/>
>> */
>>
>> +#include <sys/mman.h>
>> +#include <sys/ioctl.h>
>> +#include <linux/fs.h>
>> +
>> +#include "exec/address-spaces.h"
>> #include "hw/mem/pc-nvdimm.h"
>>
>> -#define PAGE_SIZE (1UL << 12)
>> +#define PAGE_SIZE (1UL << 12)
>> +
>> +#define MIN_CONFIG_DATA_SIZE (128 << 10)
>>
>> static struct nvdimms_info {
>> ram_addr_t current_addr;
>> + int device_index;
>> } nvdimms_info;
>>
>> /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
>> @@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
>> nvdimms_info.current_addr = offset;
>> }
>>
>> +static ram_addr_t reserved_range_push(uint64_t size)
>> +{
>> + uint64_t current;
>> +
>> + current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
>> +
>> + /* do not have enough space? */
>> + if (current + size < current) {
>> + return 0;
>> + }
>> +
>> + nvdimms_info.current_addr = current + size;
>> + return current;
>> +}
> You can't use all memory above hotplug_memory area since
> we have to tell guest where 64-bit PCI window starts,
> and currently it should start at reserved-memory-end
> (but it isn't due to a bug: I've just posted fix to qemu-devel
> "[PATCH 0/2] pc: fix 64-bit PCI window clashing with memory hotplug region"
> )
Ah, got it, thanks for pointing it out.
>
>> +
>> +static uint32_t new_device_index(void)
>> +{
>> + return nvdimms_info.device_index++;
>> +}
>> +
>> static char *get_file(Object *obj, Error **errp)
>> {
>> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> @@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
>> {
>> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>>
>> + if (memory_region_size(&nvdimm->mr)) {
>> + error_setg(errp, "cannot change property value");
>> + return;
>> + }
>> +
>> if (nvdimm->file) {
>> g_free(nvdimm->file);
>> }
>> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
>> set_configdata, NULL);
>> }
>>
>> +static uint64_t get_file_size(int fd)
>> +{
>> + struct stat stat_buf;
>> + uint64_t size;
>> +
>> + if (fstat(fd, &stat_buf) < 0) {
>> + return 0;
>> + }
>> +
>> + if (S_ISREG(stat_buf.st_mode)) {
>> + return stat_buf.st_size;
>> + }
>> +
>> + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
>> + return size;
>> + }
>> +
>> + return 0;
>> +}
> All this file stuff I'd leave to already existing backends like
> memory-backend-file or even memory-backend-ram which already do
> above and more allowing to configure persistent and volatile
> NVDIMMs without changing NVDIMM fronted code.
>
The current memory backends map the whole backing memory into the guest's
address space. However, nvdimm needs a reserved region for its label
data, which is accessed only in QEMU.
How about introducing two parameters, "reserved_size" and "reserved_addr",
to TYPE_MEMORY_BACKEND, so that only the memory region
[0, size - reserved_size) is mapped to the guest and the remaining part is
pointed to by "reserved_addr"?
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-09-07 13:40 ` Igor Mammedov
@ 2015-09-08 14:03 ` Xiao Guangrong
2015-09-10 9:47 ` Igor Mammedov
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-08 14:03 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On 09/07/2015 09:40 PM, Igor Mammedov wrote:
> On Sun, 6 Sep 2015 14:07:21 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
>>> On Wed, 2 Sep 2015 18:36:43 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
>>>>> On Fri, 14 Aug 2015 22:51:59 +0800
>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>
>>>>>> Introduce "pc-nvdimm" device and it has two parameters:
>>>>> Why do you use prefix "pc-", I suppose we potentially
>>>>> could use this device not only with x86 targets but with
>>>>> other targets as well.
>>>>> I'd just drop 'pc' prefix through out patchset.
>>>>
>>>> Yeah, the prefix is stolen from pc-dimm, will drop this
>>>> prefix as your suggestion.
>>>>
>>>>>
>>>>>> - @file, which is the backed memory file for NVDIMM device
>>>>> Could you try to split device into backend/frontend parts,
>>>>> like it's done with pc-dimm. As I understand it's preferred
>>>>> way to implement this kind of devices.
>>>>> Then you could reuse memory backends that we already have
>>>>> including file backend.
>>>>
>>>> I considered it too and Stefan, Paolo got the some idea in
>>>> V1's review, however:
>>>>
>>>> | However, file-based memory used by NVDIMM is special, it divides the file
>>>> | to two parts, one part is used as PMEM and another part is used to store
>>>> | NVDIMM's configure data.
>>>> |
>>>> | Maybe we can introduce "end-reserved" property to reserve specified size
>>>> | at the end of the file. Or create a new class type based on
>>>> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
>>>> | thing?
>>> I'd go with separate backend/frontend idea.
>>>
>>> Question is if this config area is part backend or frontend?
>>
>> Configdata area is used to store nvdimm device's configuration, normally, it's
>> namespace info.
>>
>> Currently, we chose to locate configdata at the end of the nvdimm's backend
>> memory, as it's easy to configure / use, configdata is naturally non-volatile,
>> and it resembles the layout on a physical device.
>>
>> However, using two separate backend memories is okay, for example:
>> -object memory-backend-file,id=mem0,file=/storage/foo
>> -object memory-backend-file,id=mem1,file=/storage/bar
>> -device nvdimm,memdev=mem0,configdata=mem1
>> then configdata is written to a single backend.
>>
>> Which one is better for you? :)
>>
>>> If we pass-through NVDIMM device do we need to set configdata=true
>>> and QEMU would skip building config structures and use structures
>>> that are already present on passed-through device in that place?
>>>
>>
>> The file specified by @file is something like a normal disk, like /dev/sda/,
>> host process can use whole space on it. If we want to directly pass it to guest,
>> we can specify 'configdata=false'. If we allow guest to 'partition' (create
>> namespace on) it then we use 'configdata=true' to reserve some space to store
>> its partition info (namespace info).
> As far as I understand currently linux provides to userspace only one interface
> which is block device i.e. /dev/sdX and on top of it userspace can put
> PM/DAX aware filesystem and use files from it. In either cases kernel
> just provides access to separate namespaces and not to a whole NVDIMM which
> includes 'labels area'. Hence /dev/sdX is not passed-though NVDIMM,
> so we could consider it as just a file/storage that could be used by userspace.
>
Yes, it is.
> Lets assume that NVDIMM should always have 'labels area'.
> In that case I'd always reserve space for it and
> * format it (build a new one) if backend doesn't have a
> valid labels area dropping configdata parameter along the way
> * or if backing-file already has valid labels area I'd just use it.
Yes.
>
> If you need to make labels area readonly you can introduce 'NVDIMM.readonly_labels'
> option and just use labels backend's without allowing changes writeback.
> It would be better to make it another series on top of basic NVDIMM implementation
> if there is an actual usecase for it.
I'd prefer an approach that discards writes not only to the label data but to the
whole nvdimm device, that is, open(, O_RDONLY) + mmap(, MAP_PRIVATE); the idea was
raised by Stefan.
The 'configdata = false' case in this patchset does not aim at making the label
read-only; it provides a way to expose the file as a single partition. For example,
you create an image in /dev/pmem0 and pass it to the guest; then the whole file
appears at /dev/pmem0 in the guest, and the guest can directly use the image on that
device. In this case, no file region is reserved, and the label data is built in
memory and cannot be updated by the guest.
>
> PS:
> Also when you write commit messages, comment and name variables try to use terms from
> relevant spec and mention specs where you describe data structures from them.
Parts of the names/definitions were borrowed from the kernel NVDIMM driver; I will
update them to reflect the specs. Thanks, Igor.
* Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
2015-09-08 14:03 ` Xiao Guangrong
@ 2015-09-10 9:47 ` Igor Mammedov
0 siblings, 0 replies; 87+ messages in thread
From: Igor Mammedov @ 2015-09-10 9:47 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Tue, 8 Sep 2015 22:03:01 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>
> On 09/07/2015 09:40 PM, Igor Mammedov wrote:
> > On Sun, 6 Sep 2015 14:07:21 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >>
> >>
> >> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> >>> On Wed, 2 Sep 2015 18:36:43 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> >>>>> On Fri, 14 Aug 2015 22:51:59 +0800
> >>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>>>
> >>>>>> Introduce "pc-nvdimm" device and it has two parameters:
> >>>>> Why do you use prefix "pc-", I suppose we potentially
> >>>>> could use this device not only with x86 targets but with
> >>>>> other targets as well.
> >>>>> I'd just drop 'pc' prefix through out patchset.
> >>>>
> >>>> Yeah, the prefix is stolen from pc-dimm, will drop this
> >>>> prefix as your suggestion.
> >>>>
> >>>>>
> >>>>>> - @file, which is the backed memory file for NVDIMM device
> >>>>> Could you try to split device into backend/frontend parts,
> >>>>> like it's done with pc-dimm. As I understand it's preferred
> >>>>> way to implement this kind of devices.
> >>>>> Then you could reuse memory backends that we already have
> >>>>> including file backend.
> >>>>
> >>>> I considered it too and Stefan, Paolo got the some idea in
> >>>> V1's review, however:
> >>>>
> >>>> | However, file-based memory used by NVDIMM is special, it divides the file
> >>>> | to two parts, one part is used as PMEM and another part is used to store
> >>>> | NVDIMM's configure data.
> >>>> |
> >>>> | Maybe we can introduce "end-reserved" property to reserve specified size
> >>>> | at the end of the file. Or create a new class type based on
> >>>> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
> >>>> | thing?
> >>> I'd go with separate backend/frontend idea.
> >>>
> >>> Question is if this config area is part backend or frontend?
> >>
> >> Configdata area is used to store nvdimm device's configuration, normally, it's
> >> namespace info.
> >>
> >> Currently, we chosen configdata located at the end of nvdimm's backend-memory
> >> as it's easy to configure / use and configdata is naturally non-volatile and it
> >> is like the layout on physical device.
> >>
> >> However, using two separated backed-memory is okay, for example:
> >> -object memory-backend-file,id=mem0,file=/storage/foo
> >> -object memory-backend-file,id=mem1,file=/storage/bar
> >> -device nvdimm,memdev=mem0,configdata=mem1
> >> then configdata is written to a single backend.
> >>
> >> Which one is better for you? :)
> >>
> >>> If we pass-through NVDIMM device do we need to set configdata=true
> >>> and QEMU would skip building config structures and use structures
> >>> that are already present on passed-through device in that place?
> >>>
> >>
> >> The file specified by @file is something like a normal disk, like /dev/sda/,
> >> host process can use whole space on it. If we want to directly pass it to guest,
> >> we can specify 'configdata=false'. If we allow guest to 'partition' (create
> >> namespace on) it then we use 'configdata=true' to reserve some space to store
> >> its partition info (namesapce info).
> > As far as I understand currently linux provides to userspace only one interface
> > which is block device i.e. /dev/sdX and on top of it userspace can put
> > PM/DAX aware filesystem and use files from it. In either cases kernel
> > just provides access to separate namespaces and not to a whole NVDIMM which
> > includes 'labels area'. Hence /dev/sdX is not passed-though NVDIMM,
> > so we could consider it as just a file/storage that could be used by userspace.
> >
>
> Yes, it is.
>
> > Lets assume that NVDIMM should always have 'labels area'.
> > In that case I'd always reserve space for it and
> > * format it (build a new one) if backend doesn't have a
> > valid labels area dropping configdata parameter along the way
On second glance, QEMU probably shouldn't 'format/build' labels
in this case and should only supply a non-configured NVDIMM to the guest.
It should probably be the guest's responsibility to configure namespaces
on a bare NVDIMM. It would also allow seamlessly introducing a separate labels
backend, as described below, to cover the 'configdata = false' case.
> > * or if backing-file already has valid labels area I'd just use it.
>
> Yes.
>
> >
> > If you need to make labels area readonly you can introduce 'NVDIMM.readonly_labels'
> > option and just use labels backend's without allowing changes writeback.
> > IT would be better to make it another series on top of basic NVDIMM implementation
> > if there is an actual usecase for it.
>
> I'd prefer the way that discards not only its label data but also the whole nvdimm device,
> that is, open(, RDONLY) + mmap(, MAP_PRIVATE), the idea was raised by Stefan.
Yes, we can do this, but it's not related to NVDIMM at all; it's a backend feature
that could also be (re)used by pc-dimm with a file backend.
I think it could be done on top.
>
> The 'configdata = false' in this patchset does not aim at making label readonly, it provides
> a way to make the file in a single partition. For example, you create a image in /dev/pmem0,
> pass it to guest, then the whole file will appear at /dev/pmem0 in guest, and guest can directly
> use the image in that device. Under this case, no file region is reserved and the label data is
> build in memory which can not be updated by guest.
If we have 'configdata = false' as in this patchset, one will have
to take special care to keep it bug-compatible with previous versions,
as implementation changes/fixes might affect compatibility.
I think it is simpler and more robust to always keep the metadata in the backend.
That would also be migration-friendly, since the labels area will stay
consistent with the data it describes, regardless of the QEMU version, when we
do cross-version migration.
Later we could introduce a separate option for a separate labels area
to cover the use case described above.
>
> >
> > PS:
> > Also when you write commit messages, comment and name variables try to use terms from
> > relevant spec and mention specs where you describe data structures from them.
>
> Parts of the names/definitions were stolen from Kernel NVDIMM driver, i will update it to
> let them reflect the specs. Thanks, Igor.
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-08 13:38 ` Xiao Guangrong
@ 2015-09-10 10:35 ` Igor Mammedov
0 siblings, 0 replies; 87+ messages in thread
From: Igor Mammedov @ 2015-09-10 10:35 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
pbonzini, rth
On Tue, 8 Sep 2015 21:38:17 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>
> On 09/07/2015 10:11 PM, Igor Mammedov wrote:
> > On Fri, 14 Aug 2015 22:52:01 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> The parameter @file is used as backed memory for NVDIMM which is
> >> divided into two parts if @dataconfig is true:
> >> - first parts is (0, size - 128K], which is used as PMEM (Persistent
> >> Memory)
> >> - 128K at the end of the file, which is used as Config Data Area, it's
> >> used to store Label namespace data
> >>
> >> The @file supports both regular file and block device, of course we
> >> can assign any these two kinds of files for test and emulation, however,
> >> in the real word for performance reason, we usually used these files as
> >> NVDIMM backed file:
> >> - the regular file in the filesystem with DAX enabled created on NVDIMM
> >> device on host
> >> - the raw PMEM device on host, e,g /dev/pmem0
> >
> > A lot of code in this series could reuse what QEMU already
> > uses for implementing pc-dimm devices.
> >
> > here is common concepts that could be reused.
> > - on physical system both DIMM and NVDIMM devices use
> > the same slots. We could share QEMU's '-m slots' option between
> > both devices. An alternative to not sharing would be to introduce
> > '-machine nvdimm_slots' option.
> > And yes, we need to know number of NVDIMMs to describe
> > them all in ACPI table (taking in amount future hotplug
> > include in this possible NVDIMM devices)
> > I'd go the same way as on real hardware on make them share the same slots.
>
> I'd prefer sharing slots for pc-dimm and nvdimm, it's easier to reuse the
> logic of slot-assignment and plug/unplug.
>
> > - they share the same physical address space and limits
> > on how much memory system can handle. So I'd suggest sharing existing
> > '-m maxmem' option and reuse hotplug_memory address space.
>
> Sounds good to me.
>
> >
> > Essentially what I'm suggesting is to inherit NVDIMM's implementation
> > from pc-dimm reusing all of its code/backends and
> > just override parts that do memory mapping into guest's address space to
> > accommodate NVDIMM's requirements.
>
> Good idea!
>
> We have to differentiate pc-dimm and nvdimm in the common code and nvdimm
> has different points with pc-dimm (for example, its has reserved-region, and
> need support live migration of label data). How about rename 'pc-nvdimm' to
> 'memory-device' and make it as a common device type, then build pc-dimm and
> nvdimm on top of it?
Sounds good; only I'd call it just 'dimm', as 'memory-device' is too broad.
Also, I'd make the base class abstract.
>
> Something like:
> static TypeInfo memory_device_info = {
> .name = TYPE_MEM_DEV,
> .parent = TYPE_DEVICE,
> };
>
> static TypeInfo pc_dimm_info = {
> .name = TYPE_PC_DIMM,
> .parent = TYPE_MEM_DEV,
> };
>
> static TypeInfo nvdimm_info = {
> .name = TYPE_NVDIMM,
> .parent = TYPE_MEM_DEV,
> };
>
> It would also make CONFIG_NVDIMM and CONFIG_HOT_PLUG independent of each other.
>
> >
> >>
> >> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >> ---
> >> hw/mem/nvdimm/pc-nvdimm.c | 109 ++++++++++++++++++++++++++++++++++++++++++++-
> >> include/hw/mem/pc-nvdimm.h | 7 +++
> >> 2 files changed, 115 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> >> index 7a270a8..97710d1 100644
> >> --- a/hw/mem/nvdimm/pc-nvdimm.c
> >> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> >> @@ -22,12 +22,20 @@
> >> * License along with this library; if not, see <http://www.gnu.org/licenses/>
> >> */
> >>
> >> +#include <sys/mman.h>
> >> +#include <sys/ioctl.h>
> >> +#include <linux/fs.h>
> >> +
> >> +#include "exec/address-spaces.h"
> >> #include "hw/mem/pc-nvdimm.h"
> >>
> >> -#define PAGE_SIZE (1UL << 12)
> >> +#define PAGE_SIZE (1UL << 12)
> >> +
> >> +#define MIN_CONFIG_DATA_SIZE (128 << 10)
> >>
> >> static struct nvdimms_info {
> >> ram_addr_t current_addr;
> >> + int device_index;
> >> } nvdimms_info;
> >>
> >> /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
> >> @@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
> >> nvdimms_info.current_addr = offset;
> >> }
> >>
> >> +static ram_addr_t reserved_range_push(uint64_t size)
> >> +{
> >> + uint64_t current;
> >> +
> >> + current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
> >> +
> >> + /* do not have enough space? */
> >> + if (current + size < current) {
> >> + return 0;
> >> + }
> >> +
> >> + nvdimms_info.current_addr = current + size;
> >> + return current;
> >> +}
> > You can't use all memory above hotplug_memory area since
> > we have to tell guest where 64-bit PCI window starts,
> > and currently it should start at reserved-memory-end
> > (but it isn't due to a bug: I've just posted fix to qemu-devel
> > "[PATCH 0/2] pc: fix 64-bit PCI window clashing with memory hotplug region"
> > )
>
> Ah, got it, thanks for pointing it out.
>
> >
> >> +
> >> +static uint32_t new_device_index(void)
> >> +{
> >> + return nvdimms_info.device_index++;
> >> +}
> >> +
> >> static char *get_file(Object *obj, Error **errp)
> >> {
> >> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> >> @@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error **errp)
> >> {
> >> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> >>
> >> + if (memory_region_size(&nvdimm->mr)) {
> >> + error_setg(errp, "cannot change property value");
> >> + return;
> >> + }
> >> +
> >> if (nvdimm->file) {
> >> g_free(nvdimm->file);
> >> }
> >> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
> >> set_configdata, NULL);
> >> }
> >>
> >> +static uint64_t get_file_size(int fd)
> >> +{
> >> + struct stat stat_buf;
> >> + uint64_t size;
> >> +
> >> + if (fstat(fd, &stat_buf) < 0) {
> >> + return 0;
> >> + }
> >> +
> >> + if (S_ISREG(stat_buf.st_mode)) {
> >> + return stat_buf.st_size;
> >> + }
> >> +
> >> + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
> >> + return size;
> >> + }
> >> +
> >> + return 0;
> >> +}
> > All this file stuff I'd leave to already existing backends like
> > memory-backend-file or even memory-backend-ram which already do
> > above and more allowing to configure persistent and volatile
> > NVDIMMs without changing NVDIMM fronted code.
> >
>
> The current memory backends map the whole backing memory into the guest's
> address space. However, nvdimm needs a reserved region for its label
> data, which is accessed only in QEMU.
>
> How about introducing two parameters, "reserved_size" and "reserved_addr",
> to TYPE_MEMORY_BACKEND, so that only the memory region
> [0, size - reserved_size) is mapped to the guest and the remaining part is
> pointed to by "reserved_addr"?
It looks like "reserved_size" alone is sufficient if there are no plans
or reasons to keep the labels area anywhere other than at the end of the NVDIMM.
Also, keeping the reserved area in the same backend MemoryRegion means it should
automatically be migrated during live migration.
To separate out and map the guest-visible data part of the backend's MemoryRegion,
you could use memory_region_init_alias().
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-25 16:03 ` Stefan Hajnoczi
2015-08-26 10:40 ` Xiao Guangrong
@ 2015-09-15 16:06 ` Paolo Bonzini
2015-09-17 8:21 ` Xiao Guangrong
1 sibling, 1 reply; 87+ messages in thread
From: Paolo Bonzini @ 2015-09-15 16:06 UTC (permalink / raw)
To: Stefan Hajnoczi, Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, rth
On 25/08/2015 18:03, Stefan Hajnoczi wrote:
>> >
>> > +static uint64_t get_file_size(int fd)
>> > +{
>> > + struct stat stat_buf;
>> > + uint64_t size;
>> > +
>> > + if (fstat(fd, &stat_buf) < 0) {
>> > + return 0;
>> > + }
>> > +
>> > + if (S_ISREG(stat_buf.st_mode)) {
>> > + return stat_buf.st_size;
>> > + }
>> > +
>> > + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
>> > + return size;
>> > + }
> #ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?
>
> There is nothing Linux-specific about emulating NVDIMMs so this code
> should compile on all platforms.
The code from block/raw-posix.c and block/raw-win32.c's raw_getlength
should probably be extracted to a new function in utils/, and reused here.
Paolo
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-08-26 10:40 ` Xiao Guangrong
2015-08-28 11:58 ` Stefan Hajnoczi
@ 2015-09-15 16:07 ` Paolo Bonzini
2015-09-17 8:23 ` Xiao Guangrong
1 sibling, 1 reply; 87+ messages in thread
From: Paolo Bonzini @ 2015-09-15 16:07 UTC (permalink / raw)
To: Xiao Guangrong, Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, rth
On 26/08/2015 12:40, Xiao Guangrong wrote:
>>>
>>> +
>>> + size = get_file_size(fd);
>>> + buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>
>> I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
>> This can be added in the future.
>
> Good idea, it will allow guest to write data but discards its content
> after it exits. Will implement O_RDONLY + MAP_PRIVATE in the near future.
FWIW, if Igor's backend/frontend idea is implemented, the choice between
MAP_SHARED and MAP_PRIVATE should belong in the backend.
Paolo
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-01 9:14 ` Stefan Hajnoczi
@ 2015-09-15 16:10 ` Paolo Bonzini
2015-09-17 8:39 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Paolo Bonzini @ 2015-09-15 16:10 UTC (permalink / raw)
To: Stefan Hajnoczi, Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
imammedo, rth
On 01/09/2015 11:14, Stefan Hajnoczi wrote:
>> >
>> > When I was digging into the live migration code, I noticed that the same
>> > MR name may cause a duplicate "idstr"; please refer to qemu_ram_set_idstr().
>> >
>> > Since nvdimm devices do not have a parent bus, it will trigger the abort()
>> > in that function.
> I see. The other devices that use a constant name are on a bus so the
> abort doesn't trigger.
However, the MR name must be the same across the two machines, and indices
are not hotplug-friendly. Even though hotplug isn't supported now, we
should prepare for it and avoid changing the migration format when we
support hotplug in the future.
Is there any other fixed value that we can use, for example the base
address of the NVDIMM?
Paolo

^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-07 14:11 ` Igor Mammedov
2015-09-08 13:38 ` Xiao Guangrong
@ 2015-09-15 16:11 ` Paolo Bonzini
1 sibling, 0 replies; 87+ messages in thread
From: Paolo Bonzini @ 2015-09-15 16:11 UTC (permalink / raw)
To: Igor Mammedov, Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha, rth
On 07/09/2015 16:11, Igor Mammedov wrote:
>
> here are common concepts that could be reused.
> - on a physical system both DIMM and NVDIMM devices use
> the same slots. We could share QEMU's '-m slots' option between
> both devices. An alternative to not sharing would be to introduce
> a '-machine nvdimm_slots' option.
> And yes, we need to know the number of NVDIMMs to describe
> them all in the ACPI table (taking into account NVDIMM devices
> that may be hotplugged in the future).
> I'd go the same way as on real hardware and make them share the same slots.
> - they share the same physical address space and the limits
> on how much memory the system can handle. So I'd suggest sharing the
> existing '-m maxmem' option and reusing the hotplug_memory address space.
I agree, and the slot number also provides a nice way to build a
consistent memory region name across multiple systems.
Paolo
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table Xiao Guangrong
@ 2015-09-15 16:12 ` Paolo Bonzini
2015-09-15 17:35 ` Igor Mammedov
0 siblings, 1 reply; 87+ messages in thread
From: Paolo Bonzini @ 2015-09-15 16:12 UTC (permalink / raw)
To: Xiao Guangrong, imammedo, mst
Cc: ehabkost, kvm, gleb, mtosatti, qemu-devel, stefanha, rth
On 14/08/2015 16:52, Xiao Guangrong wrote:
> NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
>
> Currently, we only support PMEM mode. Each device has 3 tables:
> - SPA table, which defines the PMEM region info
>
> - MEM DEV table, which has the @handle used to associate it with the
> ACPI NVDIMM device we will introduce in a later patch.
> Also we can happily ignore the memory device's interleave; the real
> nvdimm hardware access is hidden behind the host
>
> - DCR table, which defines the Vendor ID used to match the vendor-specific
> nvdimm driver. Since we only implement PMEM mode this time, the Command
> window and Data window are not needed
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
> hw/i386/acpi-build.c | 3 +
> hw/mem/Makefile.objs | 2 +-
> hw/mem/nvdimm/acpi.c | 285 +++++++++++++++++++++++++++++++++++++++++++++
> hw/mem/nvdimm/internal.h | 29 +++++
> hw/mem/nvdimm/pc-nvdimm.c | 27 ++++-
> include/hw/mem/pc-nvdimm.h | 2 +
> 6 files changed, 346 insertions(+), 2 deletions(-)
> create mode 100644 hw/mem/nvdimm/acpi.c
> create mode 100644 hw/mem/nvdimm/internal.h
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 8ead1c1..092ed2f 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -39,6 +39,7 @@
> #include "hw/loader.h"
> #include "hw/isa/isa.h"
> #include "hw/acpi/memory_hotplug.h"
> +#include "hw/mem/pc-nvdimm.h"
> #include "sysemu/tpm.h"
> #include "hw/acpi/tpm.h"
> #include "sysemu/tpm_backend.h"
> @@ -1741,6 +1742,8 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
> build_dmar_q35(tables_blob, tables->linker);
> }
>
> + pc_nvdimm_build_nfit_table(table_offsets, tables_blob, tables->linker);
> +
> /* Add tables supplied by user (if any) */
> for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
> unsigned len = acpi_table_len(u);
> diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
> index 4df7482..7a6948d 100644
> --- a/hw/mem/Makefile.objs
> +++ b/hw/mem/Makefile.objs
> @@ -1,2 +1,2 @@
> common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
> -common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o
> +common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o nvdimm/acpi.o
> diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
> new file mode 100644
> index 0000000..f28752f
> --- /dev/null
> +++ b/hw/mem/nvdimm/acpi.c
> @@ -0,0 +1,285 @@
> +/*
> + * NVDIMM (A Non-Volatile Dual In-line Memory Module) NFIT Implementation
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + * Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
> + * and the DSM specification can be found at:
> + * http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> + *
> + * Currently, it only supports PMEM Virtualization.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "qemu-common.h"
> +
> +#include "hw/acpi/aml-build.h"
> +#include "hw/mem/pc-nvdimm.h"
> +
> +#include "internal.h"
> +
> +static void nfit_spa_uuid_pm(void *uuid)
> +{
> + uuid_le uuid_pm = UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d,
> + 0x33, 0x18, 0xb7, 0x8c, 0xdb);
> + memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
> +}
> +
> +enum {
> + NFIT_TABLE_SPA = 0,
> + NFIT_TABLE_MEM = 1,
> + NFIT_TABLE_IDT = 2,
> + NFIT_TABLE_SMBIOS = 3,
> + NFIT_TABLE_DCR = 4,
> + NFIT_TABLE_BDW = 5,
> + NFIT_TABLE_FLUSH = 6,
> +};
> +
> +enum {
> + EFI_MEMORY_UC = 0x1ULL,
> + EFI_MEMORY_WC = 0x2ULL,
> + EFI_MEMORY_WT = 0x4ULL,
> + EFI_MEMORY_WB = 0x8ULL,
> + EFI_MEMORY_UCE = 0x10ULL,
> + EFI_MEMORY_WP = 0x1000ULL,
> + EFI_MEMORY_RP = 0x2000ULL,
> + EFI_MEMORY_XP = 0x4000ULL,
> + EFI_MEMORY_NV = 0x8000ULL,
> + EFI_MEMORY_MORE_RELIABLE = 0x10000ULL,
> +};
> +
> +/*
> + * struct nfit - Nvdimm Firmware Interface Table
> + * @signature: "NFIT"
> + */
> +struct nfit {
> + ACPI_TABLE_HEADER_DEF
> + uint32_t reserved;
> +} QEMU_PACKED;
> +
> +/*
> + * struct nfit_spa - System Physical Address Range Structure
> + */
> +struct nfit_spa {
> + uint16_t type;
> + uint16_t length;
> + uint16_t spa_index;
> + uint16_t flags;
> + uint32_t reserved;
> + uint32_t proximity_domain;
> + uint8_t type_uuid[16];
> + uint64_t spa_base;
> + uint64_t spa_length;
> + uint64_t mem_attr;
> +} QEMU_PACKED;
> +
> +/*
> + * struct nfit_memdev - Memory Device to SPA Map Structure
> + */
> +struct nfit_memdev {
> + uint16_t type;
> + uint16_t length;
> + uint32_t nfit_handle;
> + uint16_t phys_id;
> + uint16_t region_id;
> + uint16_t spa_index;
> + uint16_t dcr_index;
> + uint64_t region_len;
> + uint64_t region_spa_offset;
> + uint64_t region_dpa;
> + uint16_t idt_index;
> + uint16_t interleave_ways;
> + uint16_t flags;
> + uint16_t reserved;
> +} QEMU_PACKED;
> +
> +/*
> + * struct nfit_dcr - NVDIMM Control Region Structure
> + */
> +struct nfit_dcr {
> + uint16_t type;
> + uint16_t length;
> + uint16_t dcr_index;
> + uint16_t vendor_id;
> + uint16_t device_id;
> + uint16_t revision_id;
> + uint16_t sub_vendor_id;
> + uint16_t sub_device_id;
> + uint16_t sub_revision_id;
> + uint8_t reserved[6];
> + uint32_t serial_number;
> + uint16_t fic;
> + uint16_t num_bcw;
> + uint64_t bcw_size;
> + uint64_t cmd_offset;
> + uint64_t cmd_size;
> + uint64_t status_offset;
> + uint64_t status_size;
> + uint16_t flags;
> + uint8_t reserved2[6];
> +} QEMU_PACKED;
> +
> +#define REVSISON_ID 1
> +#define NFIT_FIC1 0x201
> +
> +#define MAX_NVDIMM_NUMBER 10
> +
> +static int get_nvdimm_device_number(GSList *list)
> +{
> + int nr = 0;
> +
> + for (; list; list = list->next) {
> + nr++;
> + }
> +
> + return nr;
> +}
> +
> +static uint32_t nvdimm_index_to_sn(int index)
> +{
> + return 0x123456 + index;
> +}
> +
> +static uint32_t nvdimm_index_to_handle(int index)
> +{
> + return index + 1;
> +}
> +
> +static size_t get_nfit_total_size(int nr)
> +{
> + /* each nvdimm has 3 tables. */
> + return sizeof(struct nfit) + nr * (sizeof(struct nfit_spa) +
> + sizeof(struct nfit_memdev) + sizeof(struct nfit_dcr));
> +}
> +
> +static int build_spa_table(void *buf, PCNVDIMMDevice *nvdimm, int spa_index)
> +{
> + struct nfit_spa *nfit_spa;
> + uint64_t addr = object_property_get_int(OBJECT(&nvdimm->mr), "addr", NULL);
> +
> + nfit_spa = (struct nfit_spa *)buf;
> +
> + /*
> + * nfit_spa->flags is set to zero so that proximity_domain
> + * info is ignored.
> + */
> + nfit_spa->type = cpu_to_le16(NFIT_TABLE_SPA);
> + nfit_spa->length = cpu_to_le16(sizeof(*nfit_spa));
> + nfit_spa_uuid_pm(&nfit_spa->type_uuid);
> + nfit_spa->spa_index = cpu_to_le16(spa_index);
> + nfit_spa->spa_base = cpu_to_le64(addr);
> + nfit_spa->spa_length = cpu_to_le64(memory_region_size(&nvdimm->mr));
> + nfit_spa->mem_attr = cpu_to_le64(EFI_MEMORY_WB | EFI_MEMORY_NV);
> +
> + return sizeof(*nfit_spa);
> +}
> +
> +static int build_memdev_table(void *buf, PCNVDIMMDevice *nvdimm,
> + int spa_index, int dcr_index)
> +{
> + struct nfit_memdev *nfit_memdev;
> + uint64_t addr = object_property_get_int(OBJECT(&nvdimm->mr), "addr", NULL);
> + uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
> +
> + nfit_memdev = (struct nfit_memdev *)buf;
> + nfit_memdev->type = cpu_to_le16(NFIT_TABLE_MEM);
> + nfit_memdev->length = cpu_to_le16(sizeof(*nfit_memdev));
> + nfit_memdev->nfit_handle = cpu_to_le32(handle);
> + /* point to nfit_spa. */
> + nfit_memdev->spa_index = cpu_to_le16(spa_index);
> + /* point to nfit_dcr. */
> + nfit_memdev->dcr_index = cpu_to_le16(dcr_index);
> + nfit_memdev->region_len = cpu_to_le64(memory_region_size(&nvdimm->mr));
> + nfit_memdev->region_dpa = cpu_to_le64(addr);
> + /* Only one interleave for pmem. */
> + nfit_memdev->interleave_ways = cpu_to_le16(1);
> +
> + return sizeof(*nfit_memdev);
> +}
> +
> +static int build_dcr_table(void *buf, PCNVDIMMDevice *nvdimm, int dcr_index)
> +{
> + struct nfit_dcr *nfit_dcr;
> + uint32_t sn = nvdimm_index_to_sn(nvdimm->device_index);
> +
> + nfit_dcr = (struct nfit_dcr *)buf;
> + nfit_dcr->type = cpu_to_le16(NFIT_TABLE_DCR);
> + nfit_dcr->length = cpu_to_le16(sizeof(*nfit_dcr));
> + nfit_dcr->dcr_index = cpu_to_le16(dcr_index);
> + nfit_dcr->vendor_id = cpu_to_le16(0x8086);
> + nfit_dcr->device_id = cpu_to_le16(1);
> + nfit_dcr->revision_id = cpu_to_le16(REVSISON_ID);
> + nfit_dcr->serial_number = cpu_to_le32(sn);
> + nfit_dcr->fic = cpu_to_le16(NFIT_FIC1);
> +
> + return sizeof(*nfit_dcr);
> +}
> +
> +static void build_nfit_table(GSList *device_list, char *buf)
> +{
> + int index = 0;
> +
> + buf += sizeof(struct nfit);
> +
> + for (; device_list; device_list = device_list->next) {
> + PCNVDIMMDevice *nvdimm = device_list->data;
> + int spa_index, dcr_index;
> +
> + spa_index = ++index;
> + dcr_index = ++index;
> +
> + /* build System Physical Address Range Description Table. */
> + buf += build_spa_table(buf, nvdimm, spa_index);
> +
> + /*
> + * build Memory Device to System Physical Address Range Mapping
> + * Table.
> + */
> + buf += build_memdev_table(buf, nvdimm, spa_index, dcr_index);
> +
> + /* build Control Region Descriptor Table. */
> + buf += build_dcr_table(buf, nvdimm, dcr_index);
> + }
> +}
> +
> +void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
> + GArray *linker)
> +{
> + GSList *list = get_nvdimm_built_list();
> + size_t total;
> + char *buf;
> + int nfit_start, nr;
> +
> + nr = get_nvdimm_device_number(list);
> + total = get_nfit_total_size(nr);
> +
> + if (nr <= 0 || nr > MAX_NVDIMM_NUMBER) {
> + goto exit;
> + }
> +
> + nfit_start = table_data->len;
> + acpi_add_table(table_offsets, table_data);
> +
> + buf = acpi_data_push(table_data, total);
> + build_nfit_table(list, buf);
> +
> + build_header(linker, table_data, (void *)(table_data->data + nfit_start),
> + "NFIT", table_data->len - nfit_start, 1);
> +exit:
> + g_slist_free(list);
> +}
> diff --git a/hw/mem/nvdimm/internal.h b/hw/mem/nvdimm/internal.h
> new file mode 100644
> index 0000000..252a222
> --- /dev/null
> +++ b/hw/mem/nvdimm/internal.h
> @@ -0,0 +1,29 @@
> +/*
> + * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implementation
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + * Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef __NVDIMM_INTERNAL_H
> +#define __NVDIMM_INTERNAL_H
> +
> +#define PAGE_SIZE (1UL << 12)
> +
> +typedef struct {
> + uint8_t b[16];
> +} uuid_le;
> +
> +#define UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7) \
> +((uuid_le) \
> +{ { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
> + (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff, \
> + (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } })
> +
> +GSList *get_nvdimm_built_list(void);
> +#endif
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index 97710d1..2a6cfa2 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -29,7 +29,7 @@
> #include "exec/address-spaces.h"
> #include "hw/mem/pc-nvdimm.h"
>
> -#define PAGE_SIZE (1UL << 12)
> +#include "internal.h"
>
> #define MIN_CONFIG_DATA_SIZE (128 << 10)
>
> @@ -65,6 +65,31 @@ static uint32_t new_device_index(void)
> return nvdimms_info.device_index++;
> }
>
> +static int pc_nvdimm_built_list(Object *obj, void *opaque)
> +{
> + GSList **list = opaque;
> +
> + if (object_dynamic_cast(obj, TYPE_PC_NVDIMM)) {
> + PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> + /* only realized NVDIMMs matter */
> + if (memory_region_size(&nvdimm->mr)) {
> + *list = g_slist_append(*list, nvdimm);
> + }
> + }
> +
> + object_child_foreach(obj, pc_nvdimm_built_list, opaque);
> + return 0;
> +}
> +
> +GSList *get_nvdimm_built_list(void)
> +{
> + GSList *list = NULL;
> +
> + object_child_foreach(qdev_get_machine(), pc_nvdimm_built_list, &list);
> + return list;
> +}
> +
> static char *get_file(Object *obj, Error **errp)
> {
> PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
> index f617fd2..b2da8fa 100644
> --- a/include/hw/mem/pc-nvdimm.h
> +++ b/include/hw/mem/pc-nvdimm.h
> @@ -36,4 +36,6 @@ typedef struct PCNVDIMMDevice {
> OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
>
> void pc_nvdimm_reserve_range(ram_addr_t offset);
> +void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
> + GArray *linker);
> #endif
>
Michael, Igor,
how would this compare to the IPMI patches? Is there any interface in
those patches that we can reuse here?
Paolo
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table
2015-09-15 16:12 ` Paolo Bonzini
@ 2015-09-15 17:35 ` Igor Mammedov
0 siblings, 0 replies; 87+ messages in thread
From: Igor Mammedov @ 2015-09-15 17:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
stefanha, rth
On Tue, 15 Sep 2015 18:12:43 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 14/08/2015 16:52, Xiao Guangrong wrote:
> > [...]
>
> Michael, Igor,
>
> how would this compare to the IPMI patches? Is there any interface in
> those patches that we can reuse here?
I don't think so; in the IPMI series we stopped at the need for an abstract
ACPI interface via which devices could report their resources, so that the
device implementation could be cleanly separated from the ACPI code that
describes it.
As for NVDIMM, it's pretty close to the pc-dimm device in structure, and since
this implementation supports only pmem mode I think we can get away with
using the same properties and probably add a couple more on top for
building the NFIT structures.
But I'm still thinking about what to suggest wrt the ACPI parts of this series
to make them cleaner and easier to handle.
>
> Paolo
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-15 16:06 ` Paolo Bonzini
@ 2015-09-17 8:21 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-17 8:21 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, rth
On 09/16/2015 12:06 AM, Paolo Bonzini wrote:
>
>
> On 25/08/2015 18:03, Stefan Hajnoczi wrote:
>>>>
>>>> +static uint64_t get_file_size(int fd)
>>>> +{
>>>> + struct stat stat_buf;
>>>> + uint64_t size;
>>>> +
>>>> + if (fstat(fd, &stat_buf) < 0) {
>>>> + return 0;
>>>> + }
>>>> +
>>>> + if (S_ISREG(stat_buf.st_mode)) {
>>>> + return stat_buf.st_size;
>>>> + }
>>>> +
>>>> + if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
>>>> + return size;
>>>> + }
>> #ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?
>>
>> There is nothing Linux-specific about emulating NVDIMMs so this code
>> should compile on all platforms.
>
> The code from block/raw-posix.c and block/raw-win32.c's raw_getlength
> should probably be extracted to a new function in utils/, and reused here.
>
The function you pointed out is really complex - it mixes nine platforms and each
platform has its own specific implementation. It is hard for us to verify the
change.
I'd prefer to make it Linux-specific first and then share it with other platforms
if it's needed in the future.
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-15 16:07 ` Paolo Bonzini
@ 2015-09-17 8:23 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-17 8:23 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, rth
On 09/16/2015 12:07 AM, Paolo Bonzini wrote:
>
>
> On 26/08/2015 12:40, Xiao Guangrong wrote:
>>>>
>>>> +
>>>> + size = get_file_size(fd);
>>>> + buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>>
>>> I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
>>> This can be added in the future.
>>
>> Good idea: it will allow the guest to write data but discard the contents
>> after it exits. Will implement O_RDONLY + MAP_PRIVATE in the near future.
>
> FWIW, if Igor's backend/frontend idea is implemented, the choice between
> MAP_SHARED and MAP_PRIVATE should belong in the backend.
Yes, I cannot agree with you more! :)
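To make the MAP_SHARED vs MAP_PRIVATE distinction concrete, here is a small standalone sketch (not QEMU code; the helper name is hypothetical) showing that stores through a MAP_PRIVATE mapping are copy-on-write and never reach the backing file - which is exactly the discard-on-exit behaviour wanted for the O_RDONLY + MAP_PRIVATE mode:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of path with MAP_PRIVATE, write through the
 * mapping, then re-read the file to see whether the store hit it.
 * Returns 1 if the write reached the file, 0 if it did not,
 * -1 on error. */
static int private_write_reaches_file(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }

    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }

    buf[0] = 'X';            /* copy-on-write: only the private copy changes */
    munmap(buf, 4096);

    char c;
    ssize_t n = pread(fd, &c, 1, 0);
    close(fd);
    return (n == 1 && c == 'X') ? 1 : 0;
}
```

With MAP_SHARED the same store would be written back to the file, which is why the backend (per Igor's split-backend idea) is the natural place to make this choice.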
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-15 16:10 ` Paolo Bonzini
@ 2015-09-17 8:39 ` Xiao Guangrong
2015-09-17 9:04 ` Igor Mammedov
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-17 8:39 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
imammedo, rth
On 09/16/2015 12:10 AM, Paolo Bonzini wrote:
>
>
> On 01/09/2015 11:14, Stefan Hajnoczi wrote:
>>>>
>>>> When I was digging into live migration code, I noticed that the same MR name may
>>>> cause a collision in the name "idstr"; please refer to qemu_ram_set_idstr().
>>>>
>>>> Since nvdimm devices do not have parent-bus, it will trigger the abort() in that
>>>> function.
>> I see. The other devices that use a constant name are on a bus so the
>> abort doesn't trigger.
>
> However, the MR name must be the same across the two machines. Indices
> are not friendly to hotplug. Even though hotplug isn't supported now,
> we should prepare and try not to change migration format when we support
> hotplug in the future.
>
Thanks for your reminder.
> Is there any other fixed value that we can use, for example the base
> address of the NVDIMM?
How about using object_get_canonical_path(OBJECT(dev)) (where @dev is the
NVDIMM device)?
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-17 8:39 ` Xiao Guangrong
@ 2015-09-17 9:04 ` Igor Mammedov
2015-09-17 9:14 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Igor Mammedov @ 2015-09-17 9:04 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
Stefan Hajnoczi, Paolo Bonzini, rth
On Thu, 17 Sep 2015 16:39:12 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>
> On 09/16/2015 12:10 AM, Paolo Bonzini wrote:
> >
> >
> > On 01/09/2015 11:14, Stefan Hajnoczi wrote:
> >>>>
> >>>> When I was digging into live migration code, I noticed that the same MR name may
> >>>> cause a collision in the name "idstr"; please refer to qemu_ram_set_idstr().
> >>>>
> >>>> Since nvdimm devices do not have parent-bus, it will trigger the abort() in that
> >>>> function.
> >> I see. The other devices that use a constant name are on a bus so the
> >> abort doesn't trigger.
> >
> > However, the MR name must be the same across the two machines. Indices
> > are not friendly to hotplug. Even though hotplug isn't supported now,
> > we should prepare and try not to change migration format when we support
> > hotplug in the future.
> >
>
> Thanks for your reminder.
>
> > Is there any other fixed value that we can use, for example the base
> > address of the NVDIMM?
>
> How about use object_get_canonical_path(OBJECT(dev)) (the @dev is NVDIMM
> device) ?
if you use the split backend/frontend idea then existing backends
already have a stable name derived from the backend's ID, and you won't need
to care about it.
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-17 9:04 ` Igor Mammedov
@ 2015-09-17 9:14 ` Xiao Guangrong
2015-09-17 9:34 ` Paolo Bonzini
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-17 9:14 UTC (permalink / raw)
To: Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
Stefan Hajnoczi, Paolo Bonzini, rth
On 09/17/2015 05:04 PM, Igor Mammedov wrote:
> On Thu, 17 Sep 2015 16:39:12 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 09/16/2015 12:10 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 01/09/2015 11:14, Stefan Hajnoczi wrote:
>>>>>>
>>>>>> When I was digging into live migration code, I noticed that the same MR name may
>>>>>> cause a collision in the name "idstr"; please refer to qemu_ram_set_idstr().
>>>>>>
>>>>>> Since nvdimm devices do not have parent-bus, it will trigger the abort() in that
>>>>>> function.
>>>> I see. The other devices that use a constant name are on a bus so the
>>>> abort doesn't trigger.
>>>
>>> However, the MR name must be the same across the two machines. Indices
>>> are not friendly to hotplug. Even though hotplug isn't supported now,
>>> we should prepare and try not to change migration format when we support
>>> hotplug in the future.
>>>
>>
>> Thanks for your reminder.
>>
>>> Is there any other fixed value that we can use, for example the base
>>> address of the NVDIMM?
>>
>> How about use object_get_canonical_path(OBJECT(dev)) (the @dev is NVDIMM
>> device) ?
> if you use the split backend/frontend idea then existing backends
> already have a stable name derived from the backend's ID, and you won't need
> to care about it.
>
Yes, I am using this idea and addressing your suggestion to use
memory_region_init_alias() to map part of the hostmem into the guest's
address space.
The code is like this:

    /* get the memory region from backend memory. */
    mr = host_memory_backend_get_memory(dimm->hostmem, errp);

    /* nvdimm_mr will map to guest address space. */
    memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dev),
                             object_get_canonical_path(OBJECT(dev)), mr, 0,
                             size - nvdimm->label_size);

    /* the label area at the end of the file is used as label_data of the NVDIMM. */
    ......

So there are two memory regions: one is the backend memory and the other
is nvdimm_mr in the example above. The name I am worried about is the
name of nvdimm_mr.
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-17 9:14 ` Xiao Guangrong
@ 2015-09-17 9:34 ` Paolo Bonzini
2015-09-17 12:43 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Paolo Bonzini @ 2015-09-17 9:34 UTC (permalink / raw)
To: Xiao Guangrong, Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
Stefan Hajnoczi, rth
On 17/09/2015 11:14, Xiao Guangrong wrote:
>
>
> /* get the memory region from backend memory. */
> mr = host_memory_backend_get_memory(dimm->hostmem, errp);
>
> /* nvdimm_mr will map to guest address space. */
> memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dev),
> object_get_canonical_path(OBJECT(dev)), mr, 0,
> size - nvdimm->label_size);
You can just use "memory" for the name here. The name only needs
to be unique for RAM memory regions, and dimm->hostmem will take care of it.
Paolo
* Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
2015-09-17 9:34 ` Paolo Bonzini
@ 2015-09-17 12:43 ` Xiao Guangrong
0 siblings, 0 replies; 87+ messages in thread
From: Xiao Guangrong @ 2015-09-17 12:43 UTC (permalink / raw)
To: Paolo Bonzini, Igor Mammedov
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
Stefan Hajnoczi, rth
On 09/17/2015 05:34 PM, Paolo Bonzini wrote:
>
>
> On 17/09/2015 11:14, Xiao Guangrong wrote:
>>
>>
>> /* get the memory region from backend memory. */
>> mr = host_memory_backend_get_memory(dimm->hostmem, errp);
>>
>> /* nvdimm_mr will map to guest address space. */
>> memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dev),
>> object_get_canonical_path(OBJECT(dev)), mr, 0,
>> size - nvdimm->label_size);
>
> You can just use "memory" for the name here. The name only needs
> to be unique for RAM memory regions, and dimm->hostmem will take care of it.
>
Okay. I will try it, thank you, Paolo.
* Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
2015-08-26 10:49 ` Xiao Guangrong
@ 2015-10-07 14:02 ` Stefan Hajnoczi
2015-10-07 14:43 ` Xiao Guangrong
0 siblings, 1 reply; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-10-07 14:02 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> >Have you thought about live migration?
> >
> >Are the contents of the NVDIMM migrated since they are registered as a
> >RAM region?
>
> Will fully test live migration and VM save before sending the V3 out. :)
Hi,
What is the status of this patch series?
Stefan
* Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
2015-10-07 14:02 ` Stefan Hajnoczi
@ 2015-10-07 14:43 ` Xiao Guangrong
2015-10-09 10:38 ` Stefan Hajnoczi
0 siblings, 1 reply; 87+ messages in thread
From: Xiao Guangrong @ 2015-10-07 14:43 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
imammedo, pbonzini, rth
On 10/07/2015 10:02 PM, Stefan Hajnoczi wrote:
> On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
>> On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
>>> On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
>>> Have you thought about live migration?
>>>
>>> Are the contents of the NVDIMM migrated since they are registered as a
>>> RAM region?
>>
>> Will fully test live migration and VM save before sending the V3 out. :)
>
> Hi,
> What is the status of this patch series?
There are huge changes in v3; the patchset is ready now and it's being tested.
Will post it out (hopefully this week) after the long holiday in China. :)
* Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
2015-10-07 14:43 ` Xiao Guangrong
@ 2015-10-09 10:38 ` Stefan Hajnoczi
0 siblings, 0 replies; 87+ messages in thread
From: Stefan Hajnoczi @ 2015-10-09 10:38 UTC (permalink / raw)
To: Xiao Guangrong
Cc: ehabkost, kvm, mst, gleb, Stefan Hajnoczi, mtosatti, qemu-devel,
pbonzini, imammedo, rth
On Wed, Oct 07, 2015 at 10:43:40PM +0800, Xiao Guangrong wrote:
>
>
> On 10/07/2015 10:02 PM, Stefan Hajnoczi wrote:
> >On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
> >>On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> >>>Have you thought about live migration?
> >>>
> >>>Are the contents of the NVDIMM migrated since they are registered as a
> >>>RAM region?
> >>
> >>Will fully test live migration and VM save before sending the V3 out. :)
> >
> >Hi,
> >What is the status of this patch series?
>
> There are huge changes in v3; the patchset is ready now and it's being tested.
> Will post it out (hopefully this week) after the long holiday in China. :)
Great, thanks!
Stefan
end of thread, other threads: [~2015-10-09 10:38 UTC | newest]
Thread overview: 87+ messages
2015-08-14 14:51 [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 01/18] acpi: allow aml_operation_region() working on 64 bit offset Xiao Guangrong
2015-09-02 8:05 ` Igor Mammedov
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 02/18] i386/acpi-build: allow SSDT to operate on 64 bit Xiao Guangrong
2015-09-02 10:06 ` Igor Mammedov
2015-09-02 10:43 ` Xiao Guangrong
2015-09-02 11:42 ` Igor Mammedov
2015-09-06 7:01 ` Xiao Guangrong
2015-09-02 12:05 ` Michael S. Tsirkin
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 03/18] acpi: add aml_derefof Xiao Guangrong
2015-09-02 10:16 ` Igor Mammedov
2015-09-02 10:38 ` Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 04/18] acpi: add aml_sizeof Xiao Guangrong
2015-09-02 10:18 ` Igor Mammedov
2015-09-02 10:39 ` Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 05/18] acpi: add aml_create_field Xiao Guangrong
2015-09-02 11:10 ` Igor Mammedov
2015-09-06 5:32 ` Xiao Guangrong
2015-08-14 14:51 ` [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract Xiao Guangrong
2015-08-25 14:57 ` Stefan Hajnoczi
2015-08-26 9:37 ` Xiao Guangrong
2015-09-02 9:58 ` Igor Mammedov
2015-09-02 10:36 ` Xiao Guangrong
2015-09-02 11:31 ` Igor Mammedov
2015-09-06 6:07 ` Xiao Guangrong
2015-09-07 13:40 ` Igor Mammedov
2015-09-08 14:03 ` Xiao Guangrong
2015-09-10 9:47 ` Igor Mammedov
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM Xiao Guangrong
2015-08-25 15:12 ` Stefan Hajnoczi
2015-08-26 9:39 ` Xiao Guangrong
2015-08-26 9:40 ` Xiao Guangrong
2015-08-25 15:39 ` Stefan Hajnoczi
2015-08-28 17:25 ` Eduardo Habkost
2015-08-31 7:01 ` Xiao Guangrong
2015-09-04 12:02 ` Igor Mammedov
2015-09-06 7:22 ` Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area Xiao Guangrong
2015-08-25 16:03 ` Stefan Hajnoczi
2015-08-26 10:40 ` Xiao Guangrong
2015-08-28 11:58 ` Stefan Hajnoczi
2015-08-31 6:23 ` Xiao Guangrong
2015-09-01 9:14 ` Stefan Hajnoczi
2015-09-15 16:10 ` Paolo Bonzini
2015-09-17 8:39 ` Xiao Guangrong
2015-09-17 9:04 ` Igor Mammedov
2015-09-17 9:14 ` Xiao Guangrong
2015-09-17 9:34 ` Paolo Bonzini
2015-09-17 12:43 ` Xiao Guangrong
2015-09-15 16:07 ` Paolo Bonzini
2015-09-17 8:23 ` Xiao Guangrong
2015-09-15 16:06 ` Paolo Bonzini
2015-09-17 8:21 ` Xiao Guangrong
2015-09-07 14:11 ` Igor Mammedov
2015-09-08 13:38 ` Xiao Guangrong
2015-09-10 10:35 ` Igor Mammedov
2015-09-15 16:11 ` Paolo Bonzini
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 09/18] nvdimm: build ACPI NFIT table Xiao Guangrong
2015-09-15 16:12 ` Paolo Bonzini
2015-09-15 17:35 ` Igor Mammedov
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 10/18] nvdimm: init the address region used by DSM method Xiao Guangrong
2015-08-25 16:11 ` Stefan Hajnoczi
2015-08-26 10:41 ` Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 11/18] nvdimm: build ACPI nvdimm devices Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 12/18] nvdimm: save arg3 for NVDIMM device _DSM method Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data Xiao Guangrong
2015-08-25 16:16 ` Stefan Hajnoczi
2015-08-26 10:42 ` Xiao Guangrong
2015-08-28 11:59 ` Stefan Hajnoczi
2015-08-31 6:25 ` Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function Xiao Guangrong
2015-08-25 16:23 ` Stefan Hajnoczi
2015-08-26 10:46 ` Xiao Guangrong
2015-08-28 12:01 ` Stefan Hajnoczi
2015-08-31 6:51 ` Xiao Guangrong
2015-09-01 9:16 ` Stefan Hajnoczi
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function Xiao Guangrong
2015-08-25 16:24 ` Stefan Hajnoczi
2015-08-26 10:47 ` Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 16/18] nvdimm: support NFIT_CMD_GET_CONFIG_DATA Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 17/18] nvdimm: support NFIT_CMD_SET_CONFIG_DATA Xiao Guangrong
2015-08-14 14:52 ` [Qemu-devel] [PATCH v2 18/18] nvdimm: add maintain info Xiao Guangrong
2015-08-25 16:26 ` [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM Stefan Hajnoczi
2015-08-26 10:49 ` Xiao Guangrong
2015-10-07 14:02 ` Stefan Hajnoczi
2015-10-07 14:43 ` Xiao Guangrong
2015-10-09 10:38 ` Stefan Hajnoczi