From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38239) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZlTXR-00087C-5q for qemu-devel@nongnu.org; Sun, 11 Oct 2015 23:12:39 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZlTXN-0000eV-SB for qemu-devel@nongnu.org; Sun, 11 Oct 2015 23:12:37 -0400 Received: from mga09.intel.com ([134.134.136.24]:26673) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZlTXN-0000dV-E1 for qemu-devel@nongnu.org; Sun, 11 Oct 2015 23:12:33 -0400 References: <1444535584-18220-1-git-send-email-guangrong.xiao@linux.intel.com> From: Xiao Guangrong Message-ID: <561B23AC.2010901@linux.intel.com> Date: Mon, 12 Oct 2015 11:06:20 +0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Bharata B Rao Cc: Eduardo Habkost , kvm@vger.kernel.org, "mst@redhat.com" , gleb@kernel.org, mtosatti@redhat.com, "qemu-devel@nongnu.org" , Stefan Hajnoczi , Igor Mammedov , Paolo Bonzini , dan.j.williams@intel.com, Richard Henderson On 10/12/2015 10:59 AM, Bharata B Rao wrote: > Xiao, > > Are these patches present in any git tree so that they can be easily tried out. > Sorry, currently no git tree out of my workspace is available :( BTW, this patchset is based on top of the commit b37686f7e on qemu tree: commit b37686f7e84b22cfaf7fd01ac5133f2617cc3027 Merge: 8be6e62 98cf48f Author: Peter Maydell Date: Fri Oct 9 12:18:13 2015 +0100 Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging Thanks. > Regards, > Bharata. > > On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong > wrote: >> Changelog in v3: >> There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, >> Michael for their valuable comments, the patchset finally gets better shape. >> - changes from Igor's comments: >> 1) abstract dimm device type from pc-dimm and create nvdimm device based on >> dimm, then it uses memory backend device as nvdimm's memory and NUMA has >> easily been implemented. >> 2) let file-backend device support any kind of filesystem not only for >> hugetlbfs and let it work on file not only for directory which is >> achieved by extending 'mem-path' - if it's a directory then it works as >> current behavior, otherwise if it's file then directly allocates memory >> from it. >> 3) we figure out a unused memory hole below 4G that is 0xFF00000 ~ >> 0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit >> ACPI SSDT/DSDT table will break windows XP. >> BTW, only make SSDT.rev = 2 can not work since the width is only depended >> on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) >> in ACPI spec: >> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit >> | width of Integer objects is dependent on the ComplianceRevision of the DSDT. >> | If the ComplianceRevision is less than 2, all integers are restricted to 32 >> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets >> | the global integer width for all integers, including integers in SSDTs. >> 4) use the lowest ACPI spec version to document AML terms. >> 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" >> >> - changes from Stefan's comments: >> 1) do not do endian adjustment in-place since _DSM memory is visible to guest >> 2) use target platform's target page size instead of fixed PAGE_SIZE >> definition >> 3) lots of code style improvement and typo fixes. >> 4) live migration fix >> - changes from Paolo's comments: >> 1) improve the name of memory region >> >> - other changes: >> 1) return exact buffer size for _DSM method instead of the page size. >> 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm >> devices. >> 3) NUMA support >> 4) implement _FIT method >> 5) rename "configdata" to "reserve-label-data" >> 6) simplify _DSM arg3 determination >> 7) main changelog update to let it reflect v3. >> >> Changlog in v2: >> - Use litten endian for DSM method, thanks for Stefan's suggestion >> >> - introduce a new parameter, @configdata, if it's false, Qemu will >> build a static and readonly namespace in memory and use it serveing >> for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no >> reserved region is needed at the end of the @file, it is good for >> the user who want to pass whole nvdimm device and make its data >> completely be visible to guest >> >> - divide the source code into separated files and add maintain info >> >> BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will >> be posted on next week >> >> ====== Background ====== >> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported >> on Intel's platform. They are discovered via ACPI and configured by _DSM >> method of NVDIMM device in ACPI. There has some supporting documents which >> can be found at: >> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf >> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf >> DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf >> Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf >> >> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and >> this patchset tries to enable it in virtualization field >> >> ====== Design ====== >> NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's >> address space then CPU can directly access it as normal memory, another is >> BLK which is used as block device to reduce the occupying of CPU address >> space >> >> BLK mode accesses NVDIMM via Command Register window and Data Register window. >> BLK virtualization has high workload since each sector access will cause at >> least two VM-EXIT. So we currently only imperilment vPMEM in this patchset >> >> --- vPMEM design --- >> We introduce a new device named "nvdimm", it uses memory backend device as >> NVDIMM memory. The file in file-backend device can be a regular file and block >> device. We can use any file when we do test or emulation, however, >> in the real word, the files passed to guest are: >> - the regular file in the filesystem with DAX enabled created on NVDIMM device >> on host >> - the raw PMEM device on host, e,g /dev/pmem0 >> Memory access on the address created by mmap on these kinds of files can >> directly reach NVDIMM device on host. >> >> --- vConfigure data area design --- >> Each NVDIMM device has a configure data area which is used to store label >> namespace data. In order to emulating this area, we divide the file into two >> parts: >> - first parts is (0, size - 128K], which is used as PMEM >> - 128K at the end of the file, which is used as Label Data Area >> So that the label namespace data can be persistent during power lose or system >> failure. >> >> We also support passing the whole file to guest without reserve any region for >> label data area which is achieved by "reserve-label-data" parameter - if it's >> false then QEMU will build static and readonly namespace in memory and that >> namespace contains the whole file size. The parameter is false on default. >> >> --- _DSM method design --- >> _DSM in ACPI is used to configure NVDIMM, currently we only allow access of >> label namespace data, i.e, Get Namespace Label Size (Function Index 4), >> Get Namespace Label Data (Function Index 5) and Set Namespace Label Data >> (Function Index 6) >> >> _DSM uses two pages to transfer data between ACPI and Qemu, the first page >> is RAM-based used to save the input info of _DSM method and Qemu reuse it >> store output info and another page is MMIO-based, ACPI write data to this >> page to transfer the control to Qemu >> >> ====== Test ====== >> In host >> 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10 >> 2) append "-object memory-backend-file,share,id=mem1, >> mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data, >> id=nv1" in QEMU command line >> >> In guest, download the latest upsteam kernel (4.2 merge window) and enable >> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM. >> 1) insmod drivers/nvdimm/libnvdimm.ko >> 2) insmod drivers/acpi/nfit.ko >> 3) insmod drivers/nvdimm/nd_btt.ko >> 4) insmod drivers/nvdimm/nd_pmem.ko >> You can see the whole nvdimm device used as a single namespace and /dev/pmem0 >> appears. You can do whatever on /dev/pmem0 including DAX access. >> >> Currently Linux NVDIMM driver does not support namespace operation on this >> kind of PMEM, apply below changes to support dynamical namespace: >> >> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a >> continue; >> } >> >> - if (nfit_mem->bdw && nfit_mem->memdev_pmem) >> + //if (nfit_mem->bdw && nfit_mem->memdev_pmem) >> + if (nfit_mem->memdev_pmem) >> flags |= NDD_ALIASING; >> >> You can append another NVDIMM device in guest and do: >> # cd /sys/bus/nd/devices/ >> # cd namespace1.0/ >> # echo `uuidgen` > uuid >> # echo `expr 1024 \* 1024 \* 128` > size >> then reload nd.pmem.ko >> >> You can see /dev/pmem1 appears >> >> ====== TODO ====== >> NVDIMM hotplug support >> >> Xiao Guangrong (32): >> acpi: add aml_derefof >> acpi: add aml_sizeof >> acpi: add aml_create_field >> acpi: add aml_mutex, aml_acquire, aml_release >> acpi: add aml_concatenate >> acpi: add aml_object_type >> util: introduce qemu_file_get_page_size() >> exec: allow memory to be allocated from any kind of path >> exec: allow file_ram_alloc to work on file >> hostmem-file: clean up memory allocation >> hostmem-file: use whole file size if possible >> pc-dimm: remove DEFAULT_PC_DIMMSIZE >> pc-dimm: make pc_existing_dimms_capacity static and rename it >> pc-dimm: drop the prefix of pc-dimm >> stubs: rename qmp_pc_dimm_device_list.c >> pc-dimm: rename pc-dimm.c and pc-dimm.h >> dimm: abstract dimm device from pc-dimm >> dimm: get mapped memory region from DIMMDeviceClass->get_memory_region >> dimm: keep the state of the whole backend memory >> dimm: introduce realize callback >> nvdimm: implement NVDIMM device abstract >> nvdimm: init the address region used by NVDIMM ACPI >> nvdimm: build ACPI NFIT table >> nvdimm: init the address region used by DSM method >> nvdimm: build ACPI nvdimm devices >> nvdimm: save arg3 for NVDIMM device _DSM method >> nvdimm: support DSM_CMD_IMPLEMENTED function >> nvdimm: support DSM_CMD_NAMESPACE_LABEL_SIZE function >> nvdimm: support DSM_CMD_GET_NAMESPACE_LABEL_DATA >> nvdimm: support DSM_CMD_SET_NAMESPACE_LABEL_DATA >> nvdimm: allow using whole backend memory as pmem >> nvdimm: add maintain info >> >> MAINTAINERS | 6 + >> backends/hostmem-file.c | 58 +- >> default-configs/i386-softmmu.mak | 2 + >> default-configs/x86_64-softmmu.mak | 2 + >> exec.c | 113 ++- >> hmp.c | 2 +- >> hw/Makefile.objs | 2 +- >> hw/acpi/aml-build.c | 83 ++ >> hw/acpi/ich9.c | 8 +- >> hw/acpi/memory_hotplug.c | 26 +- >> hw/acpi/piix4.c | 8 +- >> hw/i386/acpi-build.c | 4 + >> hw/i386/pc.c | 37 +- >> hw/mem/Makefile.objs | 3 + >> hw/mem/{pc-dimm.c => dimm.c} | 240 ++--- >> hw/mem/nvdimm/acpi.c | 961 +++++++++++++++++++++ >> hw/mem/nvdimm/internal.h | 41 + >> hw/mem/nvdimm/namespace.c | 309 +++++++ >> hw/mem/nvdimm/nvdimm.c | 136 +++ >> hw/mem/pc-dimm.c | 506 +---------- >> hw/ppc/spapr.c | 20 +- >> include/hw/acpi/aml-build.h | 8 + >> include/hw/i386/pc.h | 4 +- >> include/hw/mem/{pc-dimm.h => dimm.h} | 68 +- >> include/hw/mem/nvdimm.h | 58 ++ >> include/hw/mem/pc-dimm.h | 105 +-- >> include/hw/ppc/spapr.h | 2 +- >> include/qemu/osdep.h | 1 + >> numa.c | 4 +- >> qapi-schema.json | 8 +- >> qmp.c | 4 +- >> stubs/Makefile.objs | 2 +- >> ...c_dimm_device_list.c => qmp_dimm_device_list.c} | 4 +- >> target-ppc/kvm.c | 21 +- >> trace-events | 8 +- >> util/oslib-posix.c | 16 + >> util/oslib-win32.c | 5 + >> 37 files changed, 2023 insertions(+), 862 deletions(-) >> rename hw/mem/{pc-dimm.c => dimm.c} (65%) >> create mode 100644 hw/mem/nvdimm/acpi.c >> create mode 100644 hw/mem/nvdimm/internal.h >> create mode 100644 hw/mem/nvdimm/namespace.c >> create mode 100644 hw/mem/nvdimm/nvdimm.c >> rewrite hw/mem/pc-dimm.c (91%) >> rename include/hw/mem/{pc-dimm.h => dimm.h} (50%) >> create mode 100644 include/hw/mem/nvdimm.h >> rewrite include/hw/mem/pc-dimm.h (97%) >> rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (56%) >> >> -- >> 1.8.3.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > >