From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: pbonzini@redhat.com, imammedo@redhat.com
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>,
ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
gleb@kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org,
stefanha@redhat.com, rth@twiddle.net
Subject: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
Date: Fri, 14 Aug 2015 22:51:53 +0800
Message-ID: <1439563931-12352-1-git-send-email-guangrong.xiao@linux.intel.com>
Changelog:
- Use little endian for the DSM method, thanks to Stefan for the suggestion
- Introduce a new parameter, @configdata. If it is false, QEMU builds a
static, read-only namespace in memory and uses it to serve the DSM
GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case no reserved region
is needed at the end of @file, which is good for users who want to pass
the whole NVDIMM device and make all of its data visible to the guest
- Split the source code into separate files and add maintainer info
BTW, PCOMMIT virtualization on the KVM side is a work in progress and will
hopefully be posted next week
====== Background ======
NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel platforms. NVDIMMs are discovered via ACPI and configured through
the _DSM method of the NVDIMM device in ACPI. Supporting documents can be
found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
The NVDIMM driver has already been merged into the upstream Linux kernel, and
this patchset tries to enable NVDIMM for virtualization.
====== Design ======
NVDIMM supports two access modes. One is PMEM, which maps the NVDIMM into the
CPU's address space so that the CPU can access it directly as normal memory.
The other is BLK, which exposes it as a block device to reduce the amount of
CPU address space it occupies.
BLK mode accesses the NVDIMM via a Command Register window and a Data Register
window. Virtualizing BLK is expensive, since each sector access causes at
least two VM-EXITs, so this patchset currently implements only vPMEM.
--- vPMEM design ---
We introduce a new device named "pc-nvdimm". It has a parameter, file, which
names the file that backs the memory passed to the guest. The file can be a
regular file or a block device. Any file can be used for testing or
emulation; in the real world, however, the files passed to the guest are:
- a regular file in a DAX-enabled filesystem created on an NVDIMM device
on the host
- a raw PMEM device on the host, e.g. /dev/pmem0
Memory accesses to addresses created by mmap on these kinds of files
directly reach the NVDIMM device on the host.
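As an illustration of the mmap-based access described above, here is a minimal
sketch of mapping a backing file shared and read/write. It is not the code from
this patchset: map_backing_file() is a hypothetical helper, and it only handles
regular files (a block device such as /dev/pmem0 would need a different way to
query its size).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a QEMU function.
 * Maps the whole regular file at @path with MAP_SHARED so that stores go to
 * the file itself (and straight to the NVDIMM when the file is DAX-backed).
 * Returns the mapping and stores its length in *len, or NULL on failure.
 */
void *map_backing_file(const char *path, size_t *len)
{
    struct stat st;
    void *addr;
    int fd;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }
    /* st_size is only meaningful for regular files. */
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode) || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (addr == MAP_FAILED) {
        return NULL;
    }
    *len = (size_t)st.st_size;
    return addr;
}
```

A caller would release the mapping with munmap(addr, len) when the device goes
away. MAP_SHARED is the important choice here: a private mapping would keep
guest writes out of the backing file.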
--- vConfigure data area design ---
Each NVDIMM device has a config data area which is used to store label
namespace data. To emulate this area, we divide the file into two
parts:
- the first part, [0, size - 128K), is used as PMEM
- the last 128K of the file is used as the Config Data Area
This way the label namespace data persists across power loss or system
failure.
--- _DSM method design ---
_DSM in ACPI is used to configure the NVDIMM. Currently we only allow access
to label namespace data, i.e. Get Namespace Label Size (Function Index 4),
Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
(Function Index 6).
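One way to express "only these functions are allowed" is a bitmask indexed by
function number. The sketch below assumes the usual ACPI _DSM convention that
function index 0 is the query function returning such a mask; the enum and
function names are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {                                  /* hypothetical names */
    DSM_FUNC_QUERY          = 0,        /* _DSM query (assumed convention) */
    DSM_FUNC_GET_LABEL_SIZE = 4,        /* Get Namespace Label Size */
    DSM_FUNC_GET_LABEL_DATA = 5,        /* Get Namespace Label Data */
    DSM_FUNC_SET_LABEL_DATA = 6,        /* Set Namespace Label Data */
};

static_assert(DSM_FUNC_SET_LABEL_DATA < 32,
              "function indexes must fit in the 32-bit mask");

/* Bit N set means function index N is implemented. */
uint32_t dsm_supported_mask(void)
{
    return (1u << DSM_FUNC_QUERY) |
           (1u << DSM_FUNC_GET_LABEL_SIZE) |
           (1u << DSM_FUNC_GET_LABEL_DATA) |
           (1u << DSM_FUNC_SET_LABEL_DATA);
}

/* Reject anything outside the mask, including out-of-range indexes. */
bool dsm_func_allowed(uint32_t func)
{
    return func < 32 && ((dsm_supported_mask() >> func) & 1u);
}
```

With the three label functions plus the query function, the mask is 0x71. Any
other function index would be refused before it reaches the label handling
code.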
_DSM uses two pages to transfer data between ACPI and QEMU. The first page
is RAM-based: it holds the input of the _DSM method, and QEMU reuses it to
store the output. The second page is MMIO-based: ACPI writes data to this
page to transfer control to QEMU.
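To make the RAM page exchange more concrete, here is a rough sketch of writing
and reading a request in little endian, as the changelog says the DSM method
now uses. The layout (a 4-byte function index, a 4-byte payload length, then
the payload) is purely an assumption for illustration and is not the layout
used by the patches; all names are hypothetical. The MMIO write that hands
control to QEMU is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DSM_PAGE_SIZE 4096u
#define DSM_HDR_SIZE  8u        /* assumed header: function + length */

/* Store and load a 32-bit value in little endian, independent of the host. */
void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Guest side: fill the RAM page with the request (hypothetical layout). */
int dsm_write_request(uint8_t *page, uint32_t func,
                      const uint8_t *payload, uint32_t len)
{
    assert(page != NULL);
    if (len > DSM_PAGE_SIZE - DSM_HDR_SIZE) {
        return -1;
    }
    put_le32(page, func);
    put_le32(page + 4, len);
    if (len > 0) {
        memcpy(page + DSM_HDR_SIZE, payload, len);
    }
    return 0;
}

/* Device side: parse the header, never trusting the guest-written length. */
int dsm_read_request(const uint8_t *page, uint32_t *func, uint32_t *len)
{
    uint32_t n;

    assert(page != NULL && func != NULL && len != NULL);
    n = get_le32(page + 4);
    if (n > DSM_PAGE_SIZE - DSM_HDR_SIZE) {
        return -1;
    }
    *func = get_le32(page);
    *len = n;
    return 0;
}
```

Encoding byte by byte keeps the page contents identical on big and little
endian hosts, and the length check on the device side matters because the page
is writable by the guest.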
We map these pages in the address region above 4G because there is plenty of
free space there and it avoids overlapping with PCI and other components
with reserved addresses (e.g. HPET). This is also the reason we choose MMIO
notification instead of PIO.
====== Test ======
On the host:
1) create a memory backing file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
2) append '-device pc-nvdimm,file=/tmp/nvdimm' to the QEMU command line
In the guest, download the latest upstream kernel (4.2 merge window) and enable
ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace, and /dev/pmem0
appears. You can do anything with /dev/pmem0, including DAX access.
Currently the Linux NVDIMM driver does not support namespace operations on this
kind of PMEM; apply the change below to support dynamic namespaces:
@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
continue;
}
- if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+ //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+ if (nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;
You can then append another NVDIMM device to the guest and do:
# cd /sys/bus/nd/devices/
# cd namespace1.0/
# echo `uuidgen` > uuid
# echo `expr 1024 \* 1024 \* 128` > size
then reload nd_pmem.ko.
/dev/pmem1 should then appear.
====== TODO ======
1) NVDIMM NUMA support
2) NVDIMM hotplug support
Xiao Guangrong (18):
acpi: allow aml_operation_region() working on 64 bit offset
i386/acpi-build: allow SSDT to operate on 64 bit
acpi: add aml_derefof
acpi: add aml_sizeof
acpi: add aml_create_field
pc: implement NVDIMM device abstract
nvdimm: reserve address range for NVDIMM
nvdimm: init backend memory mapping and config data area
nvdimm: build ACPI NFIT table
nvdimm: init the address region used by DSM method
nvdimm: build ACPI nvdimm devices
nvdimm: save arg3 for NVDIMM device _DSM method
nvdimm: build namespace config data
nvdimm: support NFIT_CMD_IMPLEMENTED function
nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
nvdimm: support NFIT_CMD_GET_CONFIG_DATA
nvdimm: support NFIT_CMD_SET_CONFIG_DATA
nvdimm: add maintain info
MAINTAINERS | 6 +
default-configs/i386-softmmu.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
hw/Makefile.objs | 2 +-
hw/acpi/aml-build.c | 32 +-
hw/i386/acpi-build.c | 9 +-
hw/i386/acpi-dsdt.dsl | 2 +-
hw/i386/pc.c | 12 +-
hw/mem/Makefile.objs | 2 +
hw/mem/nvdimm/acpi.c | 864 +++++++++++++++++++++++++++++++++++++
hw/mem/nvdimm/internal.h | 42 ++
hw/mem/nvdimm/namespace.c | 307 +++++++++++++
hw/mem/nvdimm/pc-nvdimm.c | 244 +++++++++++
include/hw/acpi/aml-build.h | 5 +-
include/hw/mem/pc-nvdimm.h | 45 ++
15 files changed, 1566 insertions(+), 8 deletions(-)
create mode 100644 hw/mem/nvdimm/acpi.c
create mode 100644 hw/mem/nvdimm/internal.h
create mode 100644 hw/mem/nvdimm/namespace.c
create mode 100644 hw/mem/nvdimm/pc-nvdimm.c
create mode 100644 include/hw/mem/pc-nvdimm.h
--
2.4.3