* [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
@ 2019-02-06 17:26 ` Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h Zhang, Yi
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:26 UTC (permalink / raw)
To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, mst, ehabkost
Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi
From: Zhang Yi <yi.z.zhang@linux.intel.com>
Besides the existing 'shared' flag, add an 'is_pmem' parameter to
qemu_ram_mmap(), which indicates that the memory backend file is
persistent memory.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
exec.c | 2 +-
include/qemu/mmap-alloc.h | 21 ++++++++++++++++++++-
util/mmap-alloc.c | 6 +++++-
util/oslib-posix.c | 2 +-
4 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/exec.c b/exec.c
index bb6170d..27cea52 100644
--- a/exec.c
+++ b/exec.c
@@ -1860,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
}
area = qemu_ram_mmap(fd, memory, block->mr->align,
- block->flags & RAM_SHARED);
+ block->flags & RAM_SHARED, block->flags & RAM_PMEM);
if (area == MAP_FAILED) {
error_setg_errno(errp, errno,
"unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..190688a 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,26 @@ size_t qemu_fd_getpagesize(int fd);
size_t qemu_mempath_getpagesize(const char *mem_path);
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ * @fd: the file or the device to mmap
+ * @size: the number of bytes to be mmaped
+ * @align: if not zero, specify the alignment of the starting mapping address;
+ * otherwise, the alignment in use will be determined by QEMU.
+ * @shared: map has RAM_SHARED flag.
+ * @is_pmem: map has RAM_PMEM flag.
+ *
+ * Return:
+ * On success, return a pointer to the mapped area.
+ * On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd,
+ size_t size,
+ size_t align,
+ bool shared,
+ bool is_pmem);
void qemu_ram_munmap(void *ptr, size_t size);
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..97bbeed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -75,7 +75,11 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
return getpagesize();
}
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd,
+ size_t size,
+ size_t align,
+ bool shared,
+ bool is_pmem)
{
/*
* Note: this always allocates at least one extra page of virtual address
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..040937f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,7 @@ void *qemu_memalign(size_t alignment, size_t size)
void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
{
size_t align = QEMU_VMALLOC_ALIGN;
- void *ptr = qemu_ram_mmap(-1, size, align, shared);
+ void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
if (ptr == MAP_FAILED) {
return NULL;
--
2.7.4
* [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h
2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
2019-02-06 17:26 ` [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 3/5] linux-headers: " Zhang, Yi
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, mst, ehabkost
Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi
From: Zhang Yi <yi.z.zhang@linux.intel.com>
Add linux/mman.h, asm/mman.h and the asm-generic headers they depend on
(mman-common.h, mman.h, hugetlb_encode.h) to linux-headers, so that we
can use more mmap2 flags.
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
scripts/update-linux-headers.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 0a964fe..57db5d9 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -95,7 +95,7 @@ for arch in $ARCHLIST; do
rm -rf "$output/linux-headers/asm-$arch"
mkdir -p "$output/linux-headers/asm-$arch"
- for header in kvm.h unistd.h bitsperlong.h; do
+ for header in kvm.h unistd.h bitsperlong.h mman.h; do
cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
done
@@ -126,13 +126,13 @@ done
rm -rf "$output/linux-headers/linux"
mkdir -p "$output/linux-headers/linux"
for header in kvm.h vfio.h vfio_ccw.h vhost.h \
- psci.h psp-sev.h userfaultfd.h; do
+ psci.h psp-sev.h userfaultfd.h mman.h; do
cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
done
rm -rf "$output/linux-headers/asm-generic"
mkdir -p "$output/linux-headers/asm-generic"
-for header in unistd.h bitsperlong.h; do
+for header in unistd.h bitsperlong.h mman-common.h mman.h hugetlb_encode.h; do
cp "$tmpdir/include/asm-generic/$header" "$output/linux-headers/asm-generic"
done
--
2.7.4
* [Qemu-devel] [PATCH V12 3/5] linux-headers: add linux/mman.h.
2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
2019-02-06 17:26 ` [Qemu-devel] [PATCH V12 1/5] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 2/5] scripts/update-linux-headers: add linux/mman.h Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
4 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, mst, ehabkost
Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi
From: Zhang Yi <yi.z.zhang@linux.intel.com>
Update it to 4.20-rc1
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
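linux-headers/asm-arm/mman.h | 4 ++
linux-headers/asm-arm64/mman.h | 1 +
linux-headers/asm-generic/hugetlb_encode.h | 36 ++++++++++
linux-headers/asm-generic/mman-common.h | 77 ++++++++++++++++++++
linux-headers/asm-generic/mman.h | 24 +++++++
linux-headers/asm-mips/mman.h | 108 +++++++++++++++++++++++++++++
linux-headers/asm-powerpc/mman.h | 39 +++++++++++
linux-headers/asm-s390/mman.h | 1 +
linux-headers/asm-x86/mman.h | 31 +++++++++
linux-headers/linux/mman.h | 38 ++++++++++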
10 files changed, 359 insertions(+)
create mode 100644 linux-headers/asm-arm/mman.h
create mode 100644 linux-headers/asm-arm64/mman.h
create mode 100644 linux-headers/asm-generic/hugetlb_encode.h
create mode 100644 linux-headers/asm-generic/mman-common.h
create mode 100644 linux-headers/asm-generic/mman.h
create mode 100644 linux-headers/asm-mips/mman.h
create mode 100644 linux-headers/asm-powerpc/mman.h
create mode 100644 linux-headers/asm-s390/mman.h
create mode 100644 linux-headers/asm-x86/mman.h
create mode 100644 linux-headers/linux/mman.h
diff --git a/linux-headers/asm-arm/mman.h b/linux-headers/asm-arm/mman.h
new file mode 100644
index 0000000..41f99c5
--- /dev/null
+++ b/linux-headers/asm-arm/mman.h
@@ -0,0 +1,4 @@
+#include <asm-generic/mman.h>
+
+#define arch_mmap_check(addr, len, flags) \
+ (((flags) & MAP_FIXED && (addr) < FIRST_USER_ADDRESS) ? -EINVAL : 0)
diff --git a/linux-headers/asm-arm64/mman.h b/linux-headers/asm-arm64/mman.h
new file mode 100644
index 0000000..8eebf89
--- /dev/null
+++ b/linux-headers/asm-arm64/mman.h
@@ -0,0 +1 @@
+#include <asm-generic/mman.h>
diff --git a/linux-headers/asm-generic/hugetlb_encode.h b/linux-headers/asm-generic/hugetlb_encode.h
new file mode 100644
index 0000000..b0f8e87
--- /dev/null
+++ b/linux-headers/asm-generic/hugetlb_encode.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_GENERIC_HUGETLB_ENCODE_H_
+#define _ASM_GENERIC_HUGETLB_ENCODE_H_
+
+/*
+ * Several system calls take a flag to request "hugetlb" huge pages.
+ * Without further specification, these system calls will use the
+ * system's default huge page size. If a system supports multiple
+ * huge page sizes, the desired huge page size can be specified in
+ * bits [26:31] of the flag arguments. The value in these 6 bits
+ * will encode the log2 of the huge page size.
+ *
+ * The following definitions are associated with this huge page size
+ * encoding in flag arguments. System call specific header files
+ * that use this encoding should include this file. They can then
+ * provide definitions based on these with their own specific prefix.
+ * for example:
+ * #define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+ */
+
+#define HUGETLB_FLAG_ENCODE_SHIFT 26
+#define HUGETLB_FLAG_ENCODE_MASK 0x3f
+
+#define HUGETLB_FLAG_ENCODE_64KB (16 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512KB (19 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1MB (20 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2MB (21 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_8MB (23 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16MB (24 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_32MB (25 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_256MB (28 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_512MB (29 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1GB (30 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2GB (31 << HUGETLB_FLAG_ENCODE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16GB (34 << HUGETLB_FLAG_ENCODE_SHIFT)
+
+#endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
new file mode 100644
index 0000000..e7ee328
--- /dev/null
+++ b/linux-headers/asm-generic/mman-common.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_GENERIC_MMAN_COMMON_H
+#define __ASM_GENERIC_MMAN_COMMON_H
+
+/*
+ Author: Michael S. Tsirkin <mst@mellanox.co.il>, Mellanox Technologies Ltd.
+ Based on: asm-xxx/mman.h
+*/
+
+#define PROT_READ 0x1 /* page can be read */
+#define PROT_WRITE 0x2 /* page can be written */
+#define PROT_EXEC 0x4 /* page can be executed */
+#define PROT_SEM 0x8 /* page may be used for atomic ops */
+#define PROT_NONE 0x0 /* page can not be accessed */
+#define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */
+#define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+
+#define MAP_SHARED 0x01 /* Share changes */
+#define MAP_PRIVATE 0x02 /* Changes are private */
+#define MAP_SHARED_VALIDATE 0x03 /* share + validate extension flags */
+#define MAP_TYPE 0x0f /* Mask for type of mapping */
+#define MAP_FIXED 0x10 /* Interpret addr exactly */
+#define MAP_ANONYMOUS 0x20 /* don't use a file */
+#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
+# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
+#else
+# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
+#endif
+
+/* 0x0100 - 0x80000 flags are defined in asm-generic/mman.h */
+#define MAP_FIXED_NOREPLACE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
+
+#define MS_ASYNC 1 /* sync memory asynchronously */
+#define MS_INVALIDATE 2 /* invalidate the caches */
+#define MS_SYNC 4 /* synchronous memory sync */
+
+#define MADV_NORMAL 0 /* no further special treatment */
+#define MADV_RANDOM 1 /* expect random page references */
+#define MADV_SEQUENTIAL 2 /* expect sequential page references */
+#define MADV_WILLNEED 3 /* will need these pages */
+#define MADV_DONTNEED 4 /* don't need these pages */
+
+/* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE 8 /* free pages only if memory pressure */
+#define MADV_REMOVE 9 /* remove these pages & resources */
+#define MADV_DONTFORK 10 /* don't inherit across fork */
+#define MADV_DOFORK 11 /* do inherit across fork */
+#define MADV_HWPOISON 100 /* poison a page for testing */
+#define MADV_SOFT_OFFLINE 101 /* soft offline page for testing */
+
+#define MADV_MERGEABLE 12 /* KSM may merge identical pages */
+#define MADV_UNMERGEABLE 13 /* KSM may not merge identical pages */
+
+#define MADV_HUGEPAGE 14 /* Worth backing with hugepages */
+#define MADV_NOHUGEPAGE 15 /* Not worth backing with hugepages */
+
+#define MADV_DONTDUMP 16 /* Explicity exclude from the core dump,
+ overrides the coredump filter bits */
+#define MADV_DODUMP 17 /* Clear the MADV_DONTDUMP flag */
+
+#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
+
+/* compatibility flags */
+#define MAP_FILE 0
+
+#define PKEY_DISABLE_ACCESS 0x1
+#define PKEY_DISABLE_WRITE 0x2
+#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
+ PKEY_DISABLE_WRITE)
+
+#endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/linux-headers/asm-generic/mman.h b/linux-headers/asm-generic/mman.h
new file mode 100644
index 0000000..653687d
--- /dev/null
+++ b/linux-headers/asm-generic/mman.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __ASM_GENERIC_MMAN_H
+#define __ASM_GENERIC_MMAN_H
+
+#include <asm-generic/mman-common.h>
+
+#define MAP_GROWSDOWN 0x0100 /* stack-like segment */
+#define MAP_DENYWRITE 0x0800 /* ETXTBSY */
+#define MAP_EXECUTABLE 0x1000 /* mark it as an executable */
+#define MAP_LOCKED 0x2000 /* pages are locked */
+#define MAP_NORESERVE 0x4000 /* don't check for reservations */
+#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
+#define MAP_NONBLOCK 0x10000 /* do not block on IO */
+#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_SYNC 0x80000 /* perform synchronous page faults for the mapping */
+
+/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
+
+#define MCL_CURRENT 1 /* lock all current mappings */
+#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+#endif /* __ASM_GENERIC_MMAN_H */
diff --git a/linux-headers/asm-mips/mman.h b/linux-headers/asm-mips/mman.h
new file mode 100644
index 0000000..3035ca4
--- /dev/null
+++ b/linux-headers/asm-mips/mman.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 1995, 1999, 2002 by Ralf Baechle
+ */
+#ifndef _ASM_MMAN_H
+#define _ASM_MMAN_H
+
+/*
+ * Protections are chosen from these bits, OR'd together. The
+ * implementation does not necessarily support PROT_EXEC or PROT_WRITE
+ * without PROT_READ. The only guarantees are that no writing will be
+ * allowed without PROT_WRITE and no access will be allowed for PROT_NONE.
+ */
+#define PROT_NONE 0x00 /* page can not be accessed */
+#define PROT_READ 0x01 /* page can be read */
+#define PROT_WRITE 0x02 /* page can be written */
+#define PROT_EXEC 0x04 /* page can be executed */
+/* 0x08 reserved for PROT_EXEC_NOFLUSH */
+#define PROT_SEM 0x10 /* page may be used for atomic ops */
+#define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */
+#define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+
+/*
+ * Flags for mmap
+ */
+#define MAP_SHARED 0x001 /* Share changes */
+#define MAP_PRIVATE 0x002 /* Changes are private */
+#define MAP_SHARED_VALIDATE 0x003 /* share + validate extension flags */
+#define MAP_TYPE 0x00f /* Mask for type of mapping */
+#define MAP_FIXED 0x010 /* Interpret addr exactly */
+
+/* not used by linux, but here to make sure we don't clash with ABI defines */
+#define MAP_RENAME 0x020 /* Assign page to file */
+#define MAP_AUTOGROW 0x040 /* File may grow by writing */
+#define MAP_LOCAL 0x080 /* Copy on fork/sproc */
+#define MAP_AUTORSRV 0x100 /* Logical swap reserved on demand */
+
+/* These are linux-specific */
+#define MAP_NORESERVE 0x0400 /* don't check for reservations */
+#define MAP_ANONYMOUS 0x0800 /* don't use a file */
+#define MAP_GROWSDOWN 0x1000 /* stack-like segment */
+#define MAP_DENYWRITE 0x2000 /* ETXTBSY */
+#define MAP_EXECUTABLE 0x4000 /* mark it as an executable */
+#define MAP_LOCKED 0x8000 /* pages are locked */
+#define MAP_POPULATE 0x10000 /* populate (prefault) pagetables */
+#define MAP_NONBLOCK 0x20000 /* do not block on IO */
+#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_FIXED_NOREPLACE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */
+
+/*
+ * Flags for msync
+ */
+#define MS_ASYNC 0x0001 /* sync memory asynchronously */
+#define MS_INVALIDATE 0x0002 /* invalidate mappings & caches */
+#define MS_SYNC 0x0004 /* synchronous memory sync */
+
+/*
+ * Flags for mlockall
+ */
+#define MCL_CURRENT 1 /* lock all current mappings */
+#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
+
+#define MADV_NORMAL 0 /* no further special treatment */
+#define MADV_RANDOM 1 /* expect random page references */
+#define MADV_SEQUENTIAL 2 /* expect sequential page references */
+#define MADV_WILLNEED 3 /* will need these pages */
+#define MADV_DONTNEED 4 /* don't need these pages */
+
+/* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE 8 /* free pages only if memory pressure */
+#define MADV_REMOVE 9 /* remove these pages & resources */
+#define MADV_DONTFORK 10 /* don't inherit across fork */
+#define MADV_DOFORK 11 /* do inherit across fork */
+
+#define MADV_MERGEABLE 12 /* KSM may merge identical pages */
+#define MADV_UNMERGEABLE 13 /* KSM may not merge identical pages */
+#define MADV_HWPOISON 100 /* poison a page for testing */
+
+#define MADV_HUGEPAGE 14 /* Worth backing with hugepages */
+#define MADV_NOHUGEPAGE 15 /* Not worth backing with hugepages */
+
+#define MADV_DONTDUMP 16 /* Explicity exclude from the core dump,
+ overrides the coredump filter bits */
+#define MADV_DODUMP 17 /* Clear the MADV_NODUMP flag */
+
+#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
+
+/* compatibility flags */
+#define MAP_FILE 0
+
+#define PKEY_DISABLE_ACCESS 0x1
+#define PKEY_DISABLE_WRITE 0x2
+#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
+ PKEY_DISABLE_WRITE)
+
+#endif /* _ASM_MMAN_H */
diff --git a/linux-headers/asm-powerpc/mman.h b/linux-headers/asm-powerpc/mman.h
new file mode 100644
index 0000000..1c2b3fc
--- /dev/null
+++ b/linux-headers/asm-powerpc/mman.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _ASM_POWERPC_MMAN_H
+#define _ASM_POWERPC_MMAN_H
+
+#include <asm-generic/mman-common.h>
+
+
+#define PROT_SAO 0x10 /* Strong Access Ordering */
+
+#define MAP_RENAME MAP_ANONYMOUS /* In SunOS terminology */
+#define MAP_NORESERVE 0x40 /* don't reserve swap pages */
+#define MAP_LOCKED 0x80
+
+#define MAP_GROWSDOWN 0x0100 /* stack-like segment */
+#define MAP_DENYWRITE 0x0800 /* ETXTBSY */
+#define MAP_EXECUTABLE 0x1000 /* mark it as an executable */
+
+#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
+#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */
+
+#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
+#define MAP_NONBLOCK 0x10000 /* do not block on IO */
+#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
+#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+
+/* Override any generic PKEY permission defines */
+#define PKEY_DISABLE_EXECUTE 0x4
+#undef PKEY_ACCESS_MASK
+#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
+ PKEY_DISABLE_WRITE |\
+ PKEY_DISABLE_EXECUTE)
+#endif /* _ASM_POWERPC_MMAN_H */
diff --git a/linux-headers/asm-s390/mman.h b/linux-headers/asm-s390/mman.h
new file mode 100644
index 0000000..8eebf89
--- /dev/null
+++ b/linux-headers/asm-s390/mman.h
@@ -0,0 +1 @@
+#include <asm-generic/mman.h>
diff --git a/linux-headers/asm-x86/mman.h b/linux-headers/asm-x86/mman.h
new file mode 100644
index 0000000..d4a8d04
--- /dev/null
+++ b/linux-headers/asm-x86/mman.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#define MAP_32BIT 0x40 /* only give out 32bit addresses */
+
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+/*
+ * Take the 4 protection key bits out of the vma->vm_flags
+ * value and turn them in to the bits that we can put in
+ * to a pte.
+ *
+ * Only override these if Protection Keys are available
+ * (which is only on 64-bit).
+ */
+#define arch_vm_get_page_prot(vm_flags) __pgprot( \
+ ((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) | \
+ ((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) | \
+ ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \
+ ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
+
+#define arch_calc_vm_prot_bits(prot, key) ( \
+ ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \
+ ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \
+ ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \
+ ((key) & 0x8 ? VM_PKEY_BIT3 : 0))
+#endif
+
+#include <asm-generic/mman.h>
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/linux-headers/linux/mman.h b/linux-headers/linux/mman.h
new file mode 100644
index 0000000..3c44b6f
--- /dev/null
+++ b/linux-headers/linux/mman.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_MMAN_H
+#define _LINUX_MMAN_H
+
+#include <asm/mman.h>
+#include <asm-generic/hugetlb_encode.h>
+
+#define MREMAP_MAYMOVE 1
+#define MREMAP_FIXED 2
+
+#define OVERCOMMIT_GUESS 0
+#define OVERCOMMIT_ALWAYS 1
+#define OVERCOMMIT_NEVER 2
+
+/*
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h.
+ * All known huge page size encodings are provided here. It is the
+ * responsibility of the application to know which sizes are supported on
+ * the running system. See mmap(2) man page for details.
+ */
+#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
+
+#define MAP_HUGE_64KB HUGETLB_FLAG_ENCODE_64KB
+#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB
+#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB
+#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
+#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB
+#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB
+#define MAP_HUGE_32MB HUGETLB_FLAG_ENCODE_32MB
+#define MAP_HUGE_256MB HUGETLB_FLAG_ENCODE_256MB
+#define MAP_HUGE_512MB HUGETLB_FLAG_ENCODE_512MB
+#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
+#define MAP_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB
+#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB
+
+#endif /* _LINUX_MMAN_H */
--
2.7.4
* [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
` (2 preceding siblings ...)
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 3/5] linux-headers: " Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
2019-02-06 18:25 ` Michael S. Tsirkin
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
4 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, mst, ehabkost
Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi
From: Zhang Yi <yi.z.zhang@linux.intel.com>
When a file supporting DAX is used as a vNVDIMM backend, mmap it with
the MAP_SYNC flag as well. This ensures that file system metadata is
synced on each guest write to the backend file, without any other QEMU
action (e.g., periodic fsync() by QEMU).
Currently, we have the following possible use cases:
1. pmem=on is set, shared=on is set, MAP_SYNC is supported:
a: the backend is a DAX-supporting file.
- MAP_SYNC takes effect.
b: the backend is not a DAX-supporting file.
- mmap triggers a warning, then the MAP_SYNC flag is ignored.
2. All other cases:
- we never pass MAP_SYNC to mmap2.
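The use cases above can be sketched outside of QEMU roughly as follows.
This is an illustrative sketch only, not the code in this patch:
map_backend() and its 'synced' out-parameter are hypothetical names, and
the fallback #defines assume the asm-generic values (0x03 and 0x80000),
which do not hold on every architecture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values taken from asm-generic/mman*.h. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper (not part of the patch): map 'size' bytes of 'fd'.
 * '*synced' is set to true only if the MAP_SYNC mapping succeeded.
 */
static void *map_backend(int fd, size_t size, bool shared, bool is_pmem,
                         bool *synced)
{
    void *p;

    assert(synced != NULL);
    *synced = false;

    if (shared && is_pmem) {
        /* Case 1: ask the kernel to validate the flags and honour MAP_SYNC. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p != MAP_FAILED) {
            *synced = true;                 /* case 1a */
            return p;
        }
        /* Case 1b: request rejected, e.g. the file is not on a DAX fs. */
        fprintf(stderr, "warning: MAP_SYNC mapping failed (%s), "
                "continuing without it\n", strerror(errno));
    }

    /* Case 2, and the fallback for 1b: MAP_SYNC is never passed. */
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                shared ? MAP_SHARED : MAP_PRIVATE, fd, 0);
}
```

On an ordinary (non-DAX) file the first mmap() is expected to fail and the
helper falls back to a plain shared mapping, which matches case 1b.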
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
include/qemu/osdep.h | 7 +++++++
util/mmap-alloc.c | 24 +++++++++++++++++++++++-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..9a94cc3 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -115,6 +115,13 @@ extern int daemon(int, int);
#include "sysemu/os-win32.h"
#endif
+#ifdef CONFIG_LINUX
+#include <linux/mman.h>
+#else /* !CONFIG_LINUX */
+#define MAP_SYNC 0x0
+#define MAP_SHARED_VALIDATE 0x0
+#endif /* CONFIG_LINUX */
+
#ifdef CONFIG_POSIX
#include "sysemu/os-posix.h"
#endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 97bbeed..e4e55fc 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -15,6 +15,7 @@
#include "qemu/host-utils.h"
#define HUGETLBFS_MAGIC 0x958458f6
+#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
#ifdef CONFIG_LINUX
#include <sys/vfs.h>
@@ -101,6 +102,7 @@ void *qemu_ram_mmap(int fd,
#else
void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
#endif
+ int mmap_flags;
size_t offset;
void *ptr1;
@@ -111,13 +113,33 @@ void *qemu_ram_mmap(int fd,
assert(is_power_of_2(align));
/* Always align to host page size */
assert(align >= getpagesize());
+ mmap_flags = shared ? MAP_SHARED : MAP_PRIVATE;
+ if (shared && is_pmem) {
+ mmap_flags |= MAP_SYNC_FLAGS;
+ }
offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
MAP_FIXED |
(fd == -1 ? MAP_ANONYMOUS : 0) |
- (shared ? MAP_SHARED : MAP_PRIVATE),
+ mmap_flags,
fd, 0);
+
+
+ if (ptr1 == MAP_FAILED &&
+ (mmap_flags & MAP_SYNC_FLAGS) == MAP_SYNC_FLAGS) {
+ if (errno == ENOTSUP) {
+ perror("failed to validate with mapping flags");
+ }
+ /* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+ * we will remove these flags to handle compatibility.
+ */
+ ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
+ MAP_FIXED |
+ (fd == -1 ? MAP_ANONYMOUS : 0) |
+ MAP_SHARED,
+ fd, 0);
+ }
if (ptr1 == MAP_FAILED) {
munmap(ptr, total);
return MAP_FAILED;
--
2.7.4
* Re: [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
@ 2019-02-06 18:25 ` Michael S. Tsirkin
0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-02-06 18:25 UTC (permalink / raw)
To: Zhang, Yi
Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams
On Thu, Feb 07, 2019 at 01:27:19AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
>
> When a file supporting DAX is used as a vNVDIMM backend, mmap it with
> the MAP_SYNC flag as well. This ensures that file system metadata is
> synced on each guest write to the backend file, without any other QEMU
> action (e.g., periodic fsync() by QEMU).
>
> Currently, we have the following possible use cases:
>
> 1. pmem=on is set, shared=on is set, MAP_SYNC is supported:
> a: the backend is a DAX-supporting file.
> - MAP_SYNC takes effect.
> b: the backend is not a DAX-supporting file.
> - mmap triggers a warning, then the MAP_SYNC flag is ignored.
>
> 2. All other cases:
> - we never pass MAP_SYNC to mmap2.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
> include/qemu/osdep.h | 7 +++++++
> util/mmap-alloc.c | 24 +++++++++++++++++++++++-
> 2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 457d24e..9a94cc3 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -115,6 +115,13 @@ extern int daemon(int, int);
> #include "sysemu/os-win32.h"
> #endif
>
> +#ifdef CONFIG_LINUX
> +#include <linux/mman.h>
> +#else /* !CONFIG_LINUX */
> +#define MAP_SYNC 0x0
> +#define MAP_SHARED_VALIDATE 0x0
> +#endif /* CONFIG_LINUX */
> +
> #ifdef CONFIG_POSIX
> #include "sysemu/os-posix.h"
> #endif
It's only used in one place. Maybe put this code in mmap-alloc.c?
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 97bbeed..e4e55fc 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -15,6 +15,7 @@
> #include "qemu/host-utils.h"
>
> #define HUGETLBFS_MAGIC 0x958458f6
> +#define MAP_SYNC_FLAGS (MAP_SYNC | MAP_SHARED_VALIDATE)
>
Please don't do this; just put it in a local variable within qemu_ram_mmap().
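A minimal sketch of that suggestion, assuming the flag computation is
pulled into a small helper purely for illustration (ram_mmap_flags() is
a made-up name; in the patch the local would live directly in
qemu_ram_mmap(), and the fallback #defines assume the asm-generic
values):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values taken from asm-generic/mman*.h. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical helper: compute the flags for the final MAP_FIXED mmap(). */
static int ram_mmap_flags(int fd, bool shared, bool is_pmem)
{
    /* Local variable instead of a file-scope MAP_SYNC_FLAGS macro. */
    const int map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
    int flags = MAP_FIXED;

    assert(fd >= -1);

    flags |= (fd == -1) ? MAP_ANONYMOUS : 0;
    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
    if (shared && is_pmem) {
        flags |= map_sync_flags;
    }
    return flags;
}
```

Keeping the combination in a local limits its scope to the one function
that needs it.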
> #ifdef CONFIG_LINUX
> #include <sys/vfs.h>
> @@ -101,6 +102,7 @@ void *qemu_ram_mmap(int fd,
> #else
> void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> #endif
> + int mmap_flags;
> size_t offset;
> void *ptr1;
>
> @@ -111,13 +113,33 @@ void *qemu_ram_mmap(int fd,
> assert(is_power_of_2(align));
> /* Always align to host page size */
> assert(align >= getpagesize());
> + mmap_flags = shared ? MAP_SHARED : MAP_PRIVATE;
> + if (shared && is_pmem) {
> + mmap_flags |= MAP_SYNC_FLAGS;
> + }
>
> offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> MAP_FIXED |
> (fd == -1 ? MAP_ANONYMOUS : 0) |
> - (shared ? MAP_SHARED : MAP_PRIVATE),
> + mmap_flags,
> fd, 0);
> +
> +
> + if (ptr1 == MAP_FAILED &&
> + (mmap_flags & MAP_SYNC_FLAGS) == MAP_SYNC_FLAGS) {
> + if (errno == ENOTSUP) {
> + perror("failed to validate with mapping flags");
I don't think this warning message makes sense.
Are you trying to say:
Warning: requesting persistence across crashes
for file XYZ failed. Proceeding without persistence,
data might become corrupted in case of host crash.
?
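As a rough sketch of how such a message could be produced (hypothetical
helper; the 'path' parameter is an assumption, since qemu_ram_mmap() in
this patch only receives an fd and would need the file name passed in
or looked up some other way):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the warning into 'buf' and return the
 * snprintf() result. 'err' is the errno value left by the failed mmap().
 */
static int format_map_sync_warning(char *buf, size_t len,
                                   const char *path, int err)
{
    assert(buf != NULL && path != NULL);

    return snprintf(buf, len,
                    "warning: requesting persistence across crashes for "
                    "backend file %s failed (%s). Proceeding without "
                    "persistence, data might become corrupted in case of "
                    "host crash.",
                    path, strerror(err));
}
```

In QEMU proper the text would presumably go through the existing
error-reporting helpers rather than perror().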
> + }
> + /* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
> + * we will remove these flags to handle compatibility.
> + */
> + ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> + MAP_FIXED |
> + (fd == -1 ? MAP_ANONYMOUS : 0) |
> + MAP_SHARED,
> + fd, 0);
> + }
> if (ptr1 == MAP_FAILED) {
> munmap(ptr, total);
> return MAP_FAILED;
> --
> 2.7.4
* [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
2019-02-06 17:25 [Qemu-devel] [PATCH V12 0/5] support MAP_SYNC for memory-backend-file Zhang, Yi
` (3 preceding siblings ...)
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 4/5] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() Zhang, Yi
@ 2019-02-06 17:27 ` Zhang, Yi
2019-02-06 18:29 ` Michael S. Tsirkin
4 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yi @ 2019-02-06 17:27 UTC (permalink / raw)
To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, mst, ehabkost
Cc: qemu-devel, imammedo, dan.j.williams, Zhang Yi
From: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
docs/nvdimm.txt | 25 ++++++++++++++++++++++---
qemu-options.hx | 4 ++++
2 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..e2bf89f 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -143,9 +143,28 @@ Guest Data Persistence
----------------------
Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
-is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
-which all guest access do not involve any host-side kernel cache.
+the only backend that can guarantee the guest write persistence is:
+
+A. DAX device (e.g., /dev/dax0.0, ) or
+B. DAX file(mounted with dax option)
+
+both are from the real NVDIMM device, all guest access do not
+involve any host-side kernel cache.
+
+When using B (A file supporting direct mapping of persistent memory)
+as a backend, write persistence is guaranteed if the host kernel has
+support for the MAP_SYNC flag in the mmap system call (available
+since Linux 4.15 and on certain distro kernels) and additionally
+both 'pmem' and 'share' flags are set to 'on' on the backend.
+
+If these conditions are not satisfied i.e. if either 'pmem' or 'share'
+are not set, if the backend file does not support DAX or if MAP_SYNC
+is not supported by the host kernel, write persistence is not
+guaranteed after a system crash. For compatibility reasons, these
+conditions are silently ignored if not satisfied. Currently, no way
+is provided to test for them.
+For more details, please reference mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
When using other types of backends, it's suggested to set 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
If @option{pmem} is set to 'on', QEMU will take necessary operations to
guarantee the persistence of its own writes to @option{mem-path}
(e.g. in vNVDIMM label emulation and live migration).
+Also, we will map the backend-file with MAP_SYNC flag, which can ensure
+the file metadata is in sync to @option{mem-path} in case of host crash
+or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
@item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
--
2.7.4
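
As an illustration of the conditions described in the documentation above,
the choice of mmap flags can be sketched as a small pure function. This is a
minimal sketch under assumptions: pick_mmap_flags() is a hypothetical name
and not a QEMU function, and the fallback #define values are the ones used by
x86 Linux headers. It only shows which flags are requested; whether the
kernel and the filesystem of mem-path accept MAP_SYNC is decided at mmap()
time.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers (assumed x86 Linux values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* MAP_SHARED_VALIDATE already contains the MAP_SHARED bit on Linux. */
static_assert((MAP_SHARED_VALIDATE & MAP_SHARED) == MAP_SHARED,
              "MAP_SHARED_VALIDATE is expected to include MAP_SHARED");

/*
 * Hypothetical helper: MAP_SYNC is requested only when both 'share' and
 * 'pmem' are on; otherwise the mapping is plain shared or private.
 */
static int pick_mmap_flags(bool shared, bool is_pmem)
{
    if (!shared) {
        return MAP_PRIVATE;
    }
    if (is_pmem) {
        return MAP_SHARED_VALIDATE | MAP_SYNC;
    }
    return MAP_SHARED;
}
```

With pmem=on but share=off the helper returns MAP_PRIVATE, which matches the
documented behaviour that persistence is not guaranteed in that case.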
* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
2019-02-06 17:27 ` [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation Zhang, Yi
@ 2019-02-06 18:29 ` Michael S. Tsirkin
2019-02-07 15:16 ` Yi Zhang
0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-02-06 18:29 UTC (permalink / raw)
To: Zhang, Yi
Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams
On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> From: Zhang Yi <yi.z.zhang@linux.intel.com>
>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
> docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> qemu-options.hx | 4 ++++
> 2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index 5f158a6..e2bf89f 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -143,9 +143,28 @@ Guest Data Persistence
> ----------------------
>
> Though QEMU supports multiple types of vNVDIMM backends on Linux,
> -currently the only one that can guarantee the guest write persistence
> -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> -which all guest access do not involve any host-side kernel cache.
> +the only backend that can guarantee the guest write persistence is:
> +
> +A. DAX device (e.g., /dev/dax0.0, ) or
> +B. DAX file(mounted with dax option)
> +
> +both are from the real NVDIMM device, all guest access do not
> +involve any host-side kernel cache.
I'm not sure - what do the above 2 lines mean?
That cache must not be used if persistence is desired?
> +
> +When using B (A file supporting direct mapping of persistent memory)
> +as a backend, write persistence is guaranteed if the host kernel has
> +support for the MAP_SYNC flag in the mmap system call (available
> +since Linux 4.15 and on certain distro kernels) and additionally
> +both 'pmem' and 'share' flags are set to 'on' on the backend.
> +
> +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> +are not set, if the backend file does not support DAX or if MAP_SYNC
> +is not supported by the host kernel, write persistence is not
> +guaranteed after a system crash. For compatibility reasons, these
> +conditions are silently ignored if not satisfied. Currently, no way
> +is provided to test for them.
> +For more details, please reference mmap(2) man page:
> +http://man7.org/linux/man-pages/man2/mmap.2.html.
>
> When using other types of backends, it's suggested to set 'unarmed'
> option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 08f8516..0cd41f4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> If @option{pmem} is set to 'on', QEMU will take necessary operations to
> guarantee the persistence of its own writes to @option{mem-path}
> (e.g. in vNVDIMM label emulation and live migration).
> +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
should be
which ensures
> +the file metadata is in sync to @option{mem-path}
should be
for @option{mem-path}
> in case of host crash
> +or a power failure. MAP_SYNC requires support from both the host kernel
> +(since Linux kernel 4.15) and @option{mem-path}
should be
and the filesystem of @option{mem-path}
> (only files supporting DAX).
>
> @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
>
> --
> 2.7.4
* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
2019-02-06 18:29 ` Michael S. Tsirkin
@ 2019-02-07 15:16 ` Yi Zhang
2019-02-07 14:30 ` Michael S. Tsirkin
0 siblings, 1 reply; 11+ messages in thread
From: Yi Zhang @ 2019-02-07 15:16 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams
On 2019-02-06 at 13:29:37 -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> >
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > ---
> > docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> > qemu-options.hx | 4 ++++
> > 2 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index 5f158a6..e2bf89f 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -143,9 +143,28 @@ Guest Data Persistence
> > ----------------------
> >
> > Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > -currently the only one that can guarantee the guest write persistence
> > -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > -which all guest access do not involve any host-side kernel cache.
> > +the only backend that can guarantee the guest write persistence is:
> > +
> > +A. DAX device (e.g., /dev/dax0.0, ) or
> > +B. DAX file(mounted with dax option)
> > +
> > +both are from the real NVDIMM device, all guest access do not
> > +involve any host-side kernel cache.
>
> I'm not sure - what do above 2 lines mean?
> That cache must not be used if persistence is desired?
It has the same meaning as direct mapping of pmem.
Ah, maybe it would be better to change it to:
"both are backend from the real NVDIMM device, which supportting direct
mapping of persistent memory." ?
>
> > +
> > +When using B (A file supporting direct mapping of persistent memory)
> > +as a backend, write persistence is guaranteed if the host kernel has
> > +support for the MAP_SYNC flag in the mmap system call (available
> > +since Linux 4.15 and on certain distro kernels) and additionally
> > +both 'pmem' and 'share' flags are set to 'on' on the backend.
> > +
> > +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> > +are not set, if the backend file does not support DAX or if MAP_SYNC
> > +is not supported by the host kernel, write persistence is not
> > +guaranteed after a system crash. For compatibility reasons, these
> > +conditions are silently ignored if not satisfied. Currently, no way
> > +is provided to test for them.
> > +For more details, please reference mmap(2) man page:
> > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> >
> > When using other types of backends, it's suggested to set 'unarmed'
> > option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 08f8516..0cd41f4 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > guarantee the persistence of its own writes to @option{mem-path}
> > (e.g. in vNVDIMM label emulation and live migration).
> > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
>
> should be
> which ensures
>
> > +the file metadata is in sync to @option{mem-path}
>
>
> should be
> for @option{mem-path}
>
> > in case of host crash
> > +or a power failure. MAP_SYNC requires support from both the host kernel
> > +(since Linux kernel 4.15) and @option{mem-path}
>
>
> should be
> and the filesystem of @option{mem-path}
Thanks, will update it.
>
> > (only files supporting DAX).
> >
> > @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> >
> > --
> > 2.7.4
* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
2019-02-07 15:16 ` Yi Zhang
@ 2019-02-07 14:30 ` Michael S. Tsirkin
2019-02-08 10:07 ` Yi Zhang
0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-02-07 14:30 UTC (permalink / raw)
To: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams
On Thu, Feb 07, 2019 at 11:16:05PM +0800, Yi Zhang wrote:
> On 2019-02-06 at 13:29:37 -0500, Michael S. Tsirkin wrote:
> > On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > >
> > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > ---
> > > docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> > > qemu-options.hx | 4 ++++
> > > 2 files changed, 26 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > index 5f158a6..e2bf89f 100644
> > > --- a/docs/nvdimm.txt
> > > +++ b/docs/nvdimm.txt
> > > @@ -143,9 +143,28 @@ Guest Data Persistence
> > > ----------------------
> > >
> > > Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > -currently the only one that can guarantee the guest write persistence
> > > -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > > -which all guest access do not involve any host-side kernel cache.
> > > +the only backend that can guarantee the guest write persistence is:
> > > +
> > > +A. DAX device (e.g., /dev/dax0.0, ) or
> > > +B. DAX file(mounted with dax option)
> > > +
> > > +both are from the real NVDIMM device, all guest access do not
> > > +involve any host-side kernel cache.
> >
> > I'm not sure - what do above 2 lines mean?
> > That cache must not be used if persistence is desired?
> same meaning of direct mapping of pmem,
> Ah, Maybe better to change to:
> "both are backend from the real NVDIMM device, which supportting direct
> mapping of persistent memory." ?
Yes but typos aside it is still unclear - what is this? An extra
condition when persistence is guaranteed?
> >
> > > +
> > > +When using B (A file supporting direct mapping of persistent memory)
> > > +as a backend, write persistence is guaranteed if the host kernel has
> > > +support for the MAP_SYNC flag in the mmap system call (available
> > > +since Linux 4.15 and on certain distro kernels) and additionally
> > > +both 'pmem' and 'share' flags are set to 'on' on the backend.
> > > +
> > > +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> > > +are not set, if the backend file does not support DAX or if MAP_SYNC
> > > +is not supported by the host kernel, write persistence is not
> > > +guaranteed after a system crash. For compatibility reasons, these
> > > +conditions are silently ignored if not satisfied. Currently, no way
> > > +is provided to test for them.
> > > +For more details, please reference mmap(2) man page:
> > > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > >
> > > When using other types of backends, it's suggested to set 'unarmed'
> > > option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 08f8516..0cd41f4 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > guarantee the persistence of its own writes to @option{mem-path}
> > > (e.g. in vNVDIMM label emulation and live migration).
> > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> >
> > should be
> > which ensures
> >
> > > +the file metadata is in sync to @option{mem-path}
> >
> >
> > should be
> > for @option{mem-path}
> >
> > > in case of host crash
> > > +or a power failure. MAP_SYNC requires support from both the host kernel
> > > +(since Linux kernel 4.15) and @option{mem-path}
> >
> >
> > should be
> > and the filesystem of @option{mem-path}
> Thanks, will update it.
> >
> > > (only files supporting DAX).
> > >
> > > @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > >
> > > --
> > > 2.7.4
* Re: [Qemu-devel] [PATCH V12 5/5] docs: Added MAP_SYNC documentation
2019-02-07 14:30 ` Michael S. Tsirkin
@ 2019-02-08 10:07 ` Yi Zhang
0 siblings, 0 replies; 11+ messages in thread
From: Yi Zhang @ 2019-02-08 10:07 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: xiaoguangrong.eric, stefanha, pbonzini, pagupta, yu.c.zhang,
richardw.yang, ehabkost, qemu-devel, imammedo, dan.j.williams
On 2019-02-07 at 09:30:12 -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 07, 2019 at 11:16:05PM +0800, Yi Zhang wrote:
> > On 2019-02-06 at 13:29:37 -0500, Michael S. Tsirkin wrote:
> > > On Thu, Feb 07, 2019 at 01:27:29AM +0800, Zhang, Yi wrote:
> > > > From: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > >
> > > > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > > > ---
> > > > docs/nvdimm.txt | 25 ++++++++++++++++++++++---
> > > > qemu-options.hx | 4 ++++
> > > > 2 files changed, 26 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > > > index 5f158a6..e2bf89f 100644
> > > > --- a/docs/nvdimm.txt
> > > > +++ b/docs/nvdimm.txt
> > > > @@ -143,9 +143,28 @@ Guest Data Persistence
> > > > ----------------------
> > > >
> > > > Though QEMU supports multiple types of vNVDIMM backends on Linux,
> > > > -currently the only one that can guarantee the guest write persistence
> > > > -is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
> > > > -which all guest access do not involve any host-side kernel cache.
> > > > +the only backend that can guarantee the guest write persistence is:
> > > > +
> > > > +A. DAX device (e.g., /dev/dax0.0, ) or
> > > > +B. DAX file(mounted with dax option)
> > > > +
> > > > +both are from the real NVDIMM device, all guest access do not
> > > > +involve any host-side kernel cache.
Yes, A and B are both based on direct access for files/devices (no page cache).
> > >
> > > I'm not sure - what do above 2 lines mean?
> > > That cache must not be used if persistence is desired?
> > same meaning of direct mapping of pmem,
> > Ah, Maybe better to change to:
> > "both are backend from the real NVDIMM device, which supportting direct
> > mapping of persistent memory." ?
>
> Yes but typos aside it is still unclear - what is this? An extra
> condition when persistence is guaranteed?
>
>
> > >
> > > > +
> > > > +When using B (A file supporting direct mapping of persistent memory)
> > > > +as a backend, write persistence is guaranteed if the host kernel has
> > > > +support for the MAP_SYNC flag in the mmap system call (available
> > > > +since Linux 4.15 and on certain distro kernels) and additionally
> > > > +both 'pmem' and 'share' flags are set to 'on' on the backend.
> > > > +
> > > > +If these conditions are not satisfied i.e. if either 'pmem' or 'share'
> > > > +are not set, if the backend file does not support DAX or if MAP_SYNC
> > > > +is not supported by the host kernel, write persistence is not
> > > > +guaranteed after a system crash. For compatibility reasons, these
> > > > +conditions are silently ignored if not satisfied. Currently, no way
> > > > +is provided to test for them.
> > > > +For more details, please reference mmap(2) man page:
> > > > +http://man7.org/linux/man-pages/man2/mmap.2.html.
> > > >
> > > > When using other types of backends, it's suggested to set 'unarmed'
> > > > option of '-device nvdimm' to 'on', which sets the unarmed flag of the
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index 08f8516..0cd41f4 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
> > > > If @option{pmem} is set to 'on', QEMU will take necessary operations to
> > > > guarantee the persistence of its own writes to @option{mem-path}
> > > > (e.g. in vNVDIMM label emulation and live migration).
> > > > +Also, we will map the backend-file with MAP_SYNC flag, which can ensure
> > >
> > > should be
> > > which ensures
> > >
> > > > +the file metadata is in sync to @option{mem-path}
> > >
> > >
> > > should be
> > > for @option{mem-path}
> > >
> > > > in case of host crash
> > > > +or a power failure. MAP_SYNC requires support from both the host kernel
> > > > +(since Linux kernel 4.15) and @option{mem-path}
> > >
> > >
> > > should be
> > > and the filesystem of @option{mem-path}
> > Thanks, will update it.
> > >
> > > > (only files supporting DAX).
> > > >
> > > > @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
> > > >
> > > > --
> > > > 2.7.4
>