* [Qemu-devel] [PATCH RFC 1/6] configure: probe for memfd
2015-07-23 1:36 [Qemu-devel] [PATCH RFC 0/6] vhost-user: add migration log support Marc-André Lureau
@ 2015-07-23 1:36 ` Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback Marc-André Lureau
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 1:36 UTC (permalink / raw)
To: qemu-devel
Cc: thibaut.collet, pbonzini, haifeng.lin, Marc-André Lureau,
mst
Check if memfd_create() is part of system libc.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
configure | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/configure b/configure
index cc0338d..9a401d4 100755
--- a/configure
+++ b/configure
@@ -3390,6 +3390,22 @@ if compile_prog "" "" ; then
eventfd=yes
fi
+# check if memfd is supported
+memfd=no
+cat > $TMPC << EOF
+#include <sys/memfd.h>
+
+int main(void)
+{
+ return memfd_create("foo", MFD_ALLOW_SEALING);
+}
+EOF
+if compile_prog "" "" ; then
+ memfd=yes
+fi
+
+
+
# check for fallocate
fallocate=no
cat > $TMPC << EOF
@@ -4770,6 +4786,9 @@ fi
if test "$eventfd" = "yes" ; then
echo "CONFIG_EVENTFD=y" >> $config_host_mak
fi
+if test "$memfd" = "yes" ; then
+ echo "CONFIG_MEMFD=y" >> $config_host_mak
+fi
if test "$fallocate" = "yes" ; then
echo "CONFIG_FALLOCATE=y" >> $config_host_mak
fi
--
2.4.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-23 1:36 [Qemu-devel] [PATCH RFC 0/6] vhost-user: add migration log support Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 1/6] configure: probe for memfd Marc-André Lureau
@ 2015-07-23 1:36 ` Marc-André Lureau
2015-07-23 15:25 ` Michael S. Tsirkin
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 3/6] osdep: add memfd helpers Marc-André Lureau
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 1:36 UTC (permalink / raw)
To: qemu-devel
Cc: thibaut.collet, pbonzini, haifeng.lin, Marc-André Lureau,
mst
Implement memfd_create() fallback if not available in system libc.
memfd_create() is still not included in glibc today, although it's been
available since Linux 3.17 in Oct 2014.
memfd has numerous advantages over traditional shm/mmap for sharing IPC
memory by passing a file descriptor, which we are going to make use of
for vhost-user logging memory in the following patches.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/qemu/osdep.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3247364..adc138b 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -6,6 +6,7 @@
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>
+#include <unistd.h>
#include <sys/types.h>
#ifdef __OpenBSD__
#include <sys/signal.h>
@@ -20,6 +21,64 @@
#include <sys/time.h>
+#ifdef CONFIG_LINUX
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef MFD_CLOEXEC
+#define MFD_CLOEXEC 0x0001U
+#endif
+
+#ifndef __NR_memfd_create
+# if defined __x86_64__
+# define __NR_memfd_create 319
+# elif defined __arm__
+# define __NR_memfd_create 385
+# elif defined __aarch64__
+# define __NR_memfd_create 279
+# elif defined _MIPS_SIM
+# if _MIPS_SIM == _MIPS_SIM_ABI32
+# define __NR_memfd_create 4354
+# endif
+# if _MIPS_SIM == _MIPS_SIM_NABI32
+# define __NR_memfd_create 6318
+# endif
+# if _MIPS_SIM == _MIPS_SIM_ABI64
+# define __NR_memfd_create 5314
+# endif
+# elif defined __i386__
+# define __NR_memfd_create 356
+# else
+# warning "__NR_memfd_create unknown for your architecture"
+# define __NR_memfd_create 0xffffffff
+# endif
+#endif
+
+#ifndef CONFIG_MEMFD
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif
+
+#endif /* LINUX */
+
#if defined(CONFIG_SOLARIS) && CONFIG_SOLARIS_VERSION < 10
/* [u]int_fast*_t not in <sys/int_types.h> */
typedef unsigned char uint_fast8_t;
--
2.4.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
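
As a reading aid for the patch above, here is a minimal standalone sketch of
the same idea: call the memfd_create syscall directly when libc has no
wrapper, then size and seal the file. The names sketch_memfd_create() and
sketch_sealed_memfd() are hypothetical and not part of the series; the
fallback constants mirror the ones in the patch, and the code assumes a
Linux host whose headers provide SYS_memfd_create.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes syscall() and, on newer glibc, MFD_* / F_*_SEALS */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback definitions for older headers, mirroring the patch. */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC       0x0001U
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif
#ifndef F_ADD_SEALS
#define F_ADD_SEALS   (1024 + 9)
#define F_GET_SEALS   (1024 + 10)
#define F_SEAL_SEAL   0x0001    /* prevent further seals from being set */
#define F_SEAL_SHRINK 0x0002    /* prevent file from shrinking */
#define F_SEAL_GROW   0x0004    /* prevent file from growing */
#define F_SEAL_WRITE  0x0008    /* prevent writes */
#endif

/* Hypothetical wrapper: a distinct name cannot clash with a future libc symbol. */
static int sketch_memfd_create(const char *name, unsigned int flags)
{
#ifdef SYS_memfd_create
    return (int)syscall(SYS_memfd_create, name, flags);
#else
    (void)name;
    (void)flags;
    errno = ENOSYS;             /* syscall number unknown at build time */
    return -1;
#endif
}

/* Create a sealable memfd of `size` bytes and apply `seals`; returns fd or -1. */
static int sketch_sealed_memfd(const char *name, off_t size, unsigned int seals)
{
    int fd = sketch_memfd_create(name, MFD_ALLOW_SEALING | MFD_CLOEXEC);

    if (fd == -1) {
        return -1;              /* e.g. ENOSYS on kernels older than 3.17 */
    }
    if (ftruncate(fd, size) == -1 || fcntl(fd, F_ADD_SEALS, seals) == -1) {
        int saved = errno;      /* keep the original error across close() */

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With F_SEAL_GROW|F_SEAL_SHRINK|F_SEAL_SEAL applied, a later ftruncate() on
the fd is expected to fail, which is the property the vhost-user log relies
on when the fd is handed to another process.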
* Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback Marc-André Lureau
@ 2015-07-23 15:25 ` Michael S. Tsirkin
2015-07-28 8:11 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2015-07-23 15:25 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: thibaut.collet, pbonzini, qemu-devel, haifeng.lin
On Thu, Jul 23, 2015 at 03:36:39AM +0200, Marc-André Lureau wrote:
> Implement memfd_create() fallback if not available in system libc.
> memfd_create() is still not included in glibc today, although it's been
> available since Linux 3.17 in Oct 2014.
>
> memfd has numerous advantages over traditional shm/mmap for sharing IPC
> memory by passing a file descriptor, which we are going to make use of
> for vhost-user logging memory in the following patches.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> include/qemu/osdep.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 59 insertions(+)
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 3247364..adc138b 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -6,6 +6,7 @@
> #include <stddef.h>
> #include <stdbool.h>
> #include <stdint.h>
> +#include <unistd.h>
> #include <sys/types.h>
> #ifdef __OpenBSD__
> #include <sys/signal.h>
> @@ -20,6 +21,64 @@
>
> #include <sys/time.h>
>
> +#ifdef CONFIG_LINUX
> +
> +#ifndef F_LINUX_SPECIFIC_BASE
> +#define F_LINUX_SPECIFIC_BASE 1024
> +#endif
> +
> +#ifndef F_ADD_SEALS
> +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
> +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
> +
> +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
> +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
> +#define F_SEAL_GROW 0x0004 /* prevent file from growing */
> +#define F_SEAL_WRITE 0x0008 /* prevent writes */
> +#endif
These are from include/uapi/linux/fcntl.h,
they should be imported into linux-headers I think.
> +
> +#ifndef MFD_ALLOW_SEALING
> +#define MFD_ALLOW_SEALING 0x0002U
> +#endif
> +
> +#ifndef MFD_CLOEXEC
> +#define MFD_CLOEXEC 0x0001U
> +#endif
> +
> +#ifndef __NR_memfd_create
> +# if defined __x86_64__
> +# define __NR_memfd_create 319
> +# elif defined __arm__
> +# define __NR_memfd_create 385
> +# elif defined __aarch64__
> +# define __NR_memfd_create 279
> +# elif defined _MIPS_SIM
> +# if _MIPS_SIM == _MIPS_SIM_ABI32
> +# define __NR_memfd_create 4354
> +# endif
> +# if _MIPS_SIM == _MIPS_SIM_NABI32
> +# define __NR_memfd_create 6318
> +# endif
> +# if _MIPS_SIM == _MIPS_SIM_ABI64
> +# define __NR_memfd_create 5314
> +# endif
What's defining all these macros?
> +# elif defined __i386__
> +# define __NR_memfd_create 356
> +# else
> +# warning "__NR_memfd_create unknown for your architecture"
> +# define __NR_memfd_create 0xffffffff
> +# endif
> +#endif
> +
> +#ifndef CONFIG_MEMFD
> +static inline int memfd_create(const char *name, unsigned int flags)
> +{
> + return syscall(__NR_memfd_create, name, flags);
> +}
> +#endif
How about making these non-inline?
I think we need stubs for non-posix systems, right?
> +
> +#endif /* LINUX */
> +
> #if defined(CONFIG_SOLARIS) && CONFIG_SOLARIS_VERSION < 10
> /* [u]int_fast*_t not in <sys/int_types.h> */
> typedef unsigned char uint_fast8_t;
> --
> 2.4.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-23 15:25 ` Michael S. Tsirkin
@ 2015-07-28 8:11 ` Paolo Bonzini
2015-07-28 10:58 ` Marc-André Lureau
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2015-07-28 8:11 UTC (permalink / raw)
To: Michael S. Tsirkin, Marc-André Lureau
Cc: thibaut.collet, qemu-devel, haifeng.lin
On 23/07/2015 17:25, Michael S. Tsirkin wrote:
> > +#ifdef CONFIG_LINUX
> > +
> > +#ifndef F_LINUX_SPECIFIC_BASE
> > +#define F_LINUX_SPECIFIC_BASE 1024
> > +#endif
> > +
> > +#ifndef F_ADD_SEALS
> > +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
> > +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
> > +
> > +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
> > +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
> > +#define F_SEAL_GROW 0x0004 /* prevent file from growing */
> > +#define F_SEAL_WRITE 0x0008 /* prevent writes */
> > +#endif
>
> These are from include/uapi/linux/fcntl.h,
> they should be imported into linux-headers I think.
linux-headers is usually used for virt-related features that we want in
QEMU a few weeks before they are distributed upstream.
Here, I think just including linux/fcntl.h is enough.
>> +#ifndef __NR_memfd_create
>> +# if defined __x86_64__
>> +# define __NR_memfd_create 319
>> +# elif defined __arm__
>> +# define __NR_memfd_create 385
>> +# elif defined __aarch64__
>> +# define __NR_memfd_create 279
>> +# elif defined _MIPS_SIM
>> +# if _MIPS_SIM == _MIPS_SIM_ABI32
>> +# define __NR_memfd_create 4354
>> +# endif
>> +# if _MIPS_SIM == _MIPS_SIM_NABI32
>> +# define __NR_memfd_create 6318
>> +# endif
>> +# if _MIPS_SIM == _MIPS_SIM_ABI64
>> +# define __NR_memfd_create 5314
>> +# endif
>
> What's defining all these macros?
They're in asm/unistd.h.
I think that, instead of making qemu/osdep.h the new qemu-common.h, the
wrappers added by patch 3 should be declared in a new header
qemu/memfd.h. The implementation in util/memfd.c can include both
linux/fcntl.h and asm/unistd.h.
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-28 8:11 ` Paolo Bonzini
@ 2015-07-28 10:58 ` Marc-André Lureau
2015-07-28 11:50 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-28 10:58 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Linhaifeng, Thibaut Collet, QEMU, Michael S. Tsirkin
Hi
On Tue, Jul 28, 2015 at 10:11 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> What's defining all these macros?
>
> They're in asm/unistd.h.
>
> I think that, instead of making qemu/osdep.h the new qemu-common.h, the
> wrappers added by patch 3 should be declared in a new header
> qemu/memfd.h. The implementation in util/memfd.c can include both
> linux/fcntl.h and asm/unistd.h.
>
OK. Shouldn't it keep the inline function, though? That avoids a future
clash when upgrading glibc.
--
Marc-André Lureau
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-28 10:58 ` Marc-André Lureau
@ 2015-07-28 11:50 ` Paolo Bonzini
2015-07-28 14:25 ` Marc-André Lureau
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2015-07-28 11:50 UTC (permalink / raw)
To: Marc-André Lureau
Cc: Thibaut Collet, Michael S. Tsirkin, Linhaifeng, QEMU
On 28/07/2015 12:58, Marc-André Lureau wrote:
> Hi
>
> On Tue, Jul 28, 2015 at 10:11 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> What's defining all these macros?
>>
>> They're in asm/unistd.h.
>>
>> I think that, instead of making qemu/osdep.h the new qemu-common.h, the
>> wrappers added by patch 3 should be declared in a new header
>> qemu/memfd.h. The implementation in util/memfd.c can include both
>> linux/fcntl.h and asm/unistd.h.
>>
>
> OK. Shouldn't it keep the inline function, though? That avoids a future
> clash when upgrading glibc.
Can the inline function stay in util/memfd.c?
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-28 11:50 ` Paolo Bonzini
@ 2015-07-28 14:25 ` Marc-André Lureau
2015-07-28 16:37 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-28 14:25 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Thibaut Collet, Michael S. Tsirkin, Linhaifeng, QEMU
Hi
On Tue, Jul 28, 2015 at 1:50 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Can the inline function stay in util/memfd.c?
I see little benefit in that: only the qemu_memfd_alloc helpers would
then be exported. The inline is then probably unnecessary if the function
is moved into memfd.c.
--
Marc-André Lureau
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback
2015-07-28 14:25 ` Marc-André Lureau
@ 2015-07-28 16:37 ` Paolo Bonzini
0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2015-07-28 16:37 UTC (permalink / raw)
To: Marc-André Lureau
Cc: Thibaut Collet, Michael S. Tsirkin, Linhaifeng, QEMU
On 28/07/2015 16:25, Marc-André Lureau wrote:
> > Can the inline function stay in util/memfd.c?
> I see little benefit in that: only the qemu_memfd_alloc helpers would
> then be exported. The inline is then probably unnecessary if the function
> is moved into memfd.c.
That's just a matter of taste, I agree.
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC 3/6] osdep: add memfd helpers
2015-07-23 1:36 [Qemu-devel] [PATCH RFC 0/6] vhost-user: add migration log support Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 1/6] configure: probe for memfd Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 2/6] posix: add linux-only memfd fallback Marc-André Lureau
@ 2015-07-23 1:36 ` Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log Marc-André Lureau
` (2 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 1:36 UTC (permalink / raw)
To: qemu-devel
Cc: thibaut.collet, pbonzini, haifeng.lin, Marc-André Lureau,
mst
Add qemu_memfd_alloc/free() helpers.
qemu_memfd_alloc() allocates and seals a memfd, and implements an
open/unlink/mmap fallback for systems that do not support memfd.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
include/qemu/osdep.h | 5 +++++
util/oslib-posix.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 67 insertions(+)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index adc138b..c49145f 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -167,6 +167,11 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align);
void qemu_vfree(void *ptr);
void qemu_anon_ram_free(void *ptr, size_t size);
+void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
+ int *fd);
+void qemu_memfd_free(void *ptr, size_t size, int fd);
+
+
#define QEMU_MADV_INVALID -1
#if defined(CONFIG_MADVISE)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 3ae4987..6e5a143 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -482,3 +482,65 @@ int qemu_read_password(char *buf, int buf_size)
printf("\n");
return ret;
}
+
+void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
+ int *fd)
+{
+ void *ptr;
+ int mfd;
+
+ mfd = memfd_create(name, MFD_ALLOW_SEALING|MFD_CLOEXEC);
+ if (mfd != -1) {
+ if (ftruncate(mfd, size) == -1) {
+ perror("ftruncate");
+ close(mfd);
+ return NULL;
+ }
+
+ if (fcntl(mfd, F_ADD_SEALS, seals) == -1) {
+ perror("fcntl");
+ close(mfd);
+ return NULL;
+ }
+ } else {
+ const char *tmpdir = getenv("TMPDIR");
+ gchar *fname;
+
+ tmpdir = tmpdir ? tmpdir : "/tmp";
+
+ fname = g_strdup_printf("%s/memfd-XXXXXX", tmpdir);
+ mfd = mkstemp(fname);
+ unlink(fname);
+ g_free(fname);
+
+ if (mfd == -1) {
+ perror("mkstemp");
+ return NULL;
+ }
+
+ if (ftruncate(mfd, size) == -1) {
+ perror("ftruncate");
+ close(mfd);
+ return NULL;
+ }
+ }
+
+ ptr = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, mfd, 0);
+ if (ptr == MAP_FAILED) {
+ perror("mmap");
+ close(mfd);
+ return NULL;
+ }
+
+ *fd = mfd;
+ return ptr;
+}
+
+void qemu_memfd_free(void *ptr, size_t size, int fd)
+{
+ if (ptr) {
+ munmap(ptr, size);
+ }
+
+ close(fd);
+}
--
2.4.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
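
The fallback branch of qemu_memfd_alloc() above can be read on its own as
"create an unlinked temporary file, size it, map it shared". The sketch
below isolates that path under a few assumptions: the helper names
sketch_tmpfile_alloc() and sketch_tmpfile_free() are hypothetical, a plain
char buffer replaces g_strdup_printf(), and mkstemp() is checked before
unlink() so that errno still describes the mkstemp() failure. It is not the
code from the series.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of an unlinked temporary file, shared. */
static void *sketch_tmpfile_alloc(size_t size, int *fd)
{
    const char *tmpdir = getenv("TMPDIR");
    char path[4096];
    void *ptr;
    int tmpfd, saved, n;

    if (!tmpdir) {
        tmpdir = "/tmp";
    }
    n = snprintf(path, sizeof(path), "%s/memfd-XXXXXX", tmpdir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return NULL;
    }

    tmpfd = mkstemp(path);
    if (tmpfd == -1) {
        return NULL;            /* nothing was created, nothing to unlink */
    }
    unlink(path);               /* the file lives on until the last fd/mapping is gone */

    if (ftruncate(tmpfd, (off_t)size) == -1) {
        goto fail;
    }
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, tmpfd, 0);
    if (ptr == MAP_FAILED) {
        goto fail;
    }

    *fd = tmpfd;
    return ptr;

fail:
    saved = errno;              /* keep the original error across close() */
    close(tmpfd);
    errno = saved;
    return NULL;
}

/* Release the mapping and the descriptor obtained above. */
static void sketch_tmpfile_free(void *ptr, size_t size, int fd)
{
    if (ptr) {
        munmap(ptr, size);
    }
    if (fd != -1) {
        close(fd);
    }
}
```

Unlike a sealed memfd, a file obtained this way can still be resized by any
process holding the fd, so the receiver cannot rely on the mapping staying
valid.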
* [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log
2015-07-23 1:36 [Qemu-devel] [PATCH RFC 0/6] vhost-user: add migration log support Marc-André Lureau
` (2 preceding siblings ...)
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 3/6] osdep: add memfd helpers Marc-André Lureau
@ 2015-07-23 1:36 ` Marc-André Lureau
2015-07-28 5:28 ` Jason Wang
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 5/6] vhost-user: send log shm fd along with log_base Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 6/6] vhost-user: document migration log Marc-André Lureau
5 siblings, 1 reply; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 1:36 UTC (permalink / raw)
To: qemu-devel
Cc: thibaut.collet, pbonzini, haifeng.lin, Marc-André Lureau,
mst
If the backend is of type VHOST_BACKEND_TYPE_USER, allocate
shareable memory.
Note: vhost_log_get() can use a global "vhost_log" that can be shared by
several vhost devices. We may want instead a common shareable log and a
common non-shareable one.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
hw/virtio/vhost.c | 42 +++++++++++++++++++++++++++++++++---------
include/hw/virtio/vhost.h | 3 ++-
2 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2712c6f..12dd644 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -286,20 +286,34 @@ static uint64_t vhost_get_log_size(struct vhost_dev *dev)
}
return log_size;
}
-static struct vhost_log *vhost_log_alloc(uint64_t size)
+
+static struct vhost_log *vhost_log_alloc(uint64_t size, bool share)
{
- struct vhost_log *log = g_malloc0(sizeof *log + size * sizeof(*(log->log)));
+ struct vhost_log *log;
+ uint64_t logsize = size * sizeof(*(log->log));
+ int fd = -1;
+
+ log = g_new0(struct vhost_log, 1);
+ if (share) {
+ log->log = qemu_memfd_alloc("vhost-log", logsize,
+ F_SEAL_GROW|F_SEAL_SHRINK|F_SEAL_SEAL, &fd);
+ memset(log->log, 0, logsize);
+ } else {
+ log->log = g_malloc0(logsize);
+ }
log->size = size;
log->refcnt = 1;
+ log->fd = fd;
return log;
}
-static struct vhost_log *vhost_log_get(uint64_t size)
+static struct vhost_log *vhost_log_get(uint64_t size, bool share)
{
- if (!vhost_log || vhost_log->size != size) {
- vhost_log = vhost_log_alloc(size);
+ if (!vhost_log || vhost_log->size != size ||
+ (share && vhost_log->fd == -1)) {
+ vhost_log = vhost_log_alloc(size, share);
} else {
++vhost_log->refcnt;
}
@@ -324,21 +338,30 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
if (vhost_log == log) {
vhost_log = NULL;
}
+
+ if (log->fd == -1) {
+ g_free(log->log);
+ } else {
+ qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
+ log->fd);
+ }
g_free(log);
}
}
static inline void vhost_dev_log_resize(struct vhost_dev* dev, uint64_t size)
{
- struct vhost_log *log = vhost_log_get(size);
+ bool share = dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
+ struct vhost_log *log = vhost_log_get(size, share);
uint64_t log_base = (uintptr_t)log->log;
int r;
- r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
- assert(r >= 0);
vhost_log_put(dev, true);
dev->log = log;
dev->log_size = size;
+
+ r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
+ assert(r >= 0);
}
static int vhost_verify_ring_mappings(struct vhost_dev *dev,
@@ -1136,9 +1159,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
if (hdev->log_enabled) {
uint64_t log_base;
+ bool share = hdev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
hdev->log_size = vhost_get_log_size(hdev);
- hdev->log = vhost_log_get(hdev->log_size);
+ hdev->log = vhost_log_get(hdev->log_size, share);
log_base = (uintptr_t)hdev->log->log;
r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_LOG_BASE,
hdev->log_size ? &log_base : NULL);
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6467c73..ab1dcac 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -31,7 +31,8 @@ typedef unsigned long vhost_log_chunk_t;
struct vhost_log {
unsigned long long size;
int refcnt;
- vhost_log_chunk_t log[0];
+ int fd;
+ vhost_log_chunk_t *log;
};
struct vhost_memory;
--
2.4.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log Marc-André Lureau
@ 2015-07-28 5:28 ` Jason Wang
2015-07-28 10:10 ` Michael S. Tsirkin
0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2015-07-28 5:28 UTC (permalink / raw)
To: Marc-André Lureau, qemu-devel
Cc: thibaut.collet, mst, haifeng.lin, pbonzini
On 07/23/2015 09:36 AM, Marc-André Lureau wrote:
> If the backend is of type VHOST_BACKEND_TYPE_USER, allocate
> shareable memory.
>
> Note: vhost_log_get() can use a global "vhost_log" that can be shared by
> several vhost devices. We may want instead a common shareable log and a
> common non-shareable one.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> hw/virtio/vhost.c | 42 +++++++++++++++++++++++++++++++++---------
> include/hw/virtio/vhost.h | 3 ++-
> 2 files changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 2712c6f..12dd644 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -286,20 +286,34 @@ static uint64_t vhost_get_log_size(struct vhost_dev *dev)
> }
> return log_size;
> }
> -static struct vhost_log *vhost_log_alloc(uint64_t size)
> +
> +static struct vhost_log *vhost_log_alloc(uint64_t size, bool share)
> {
> - struct vhost_log *log = g_malloc0(sizeof *log + size * sizeof(*(log->log)));
> + struct vhost_log *log;
> + uint64_t logsize = size * sizeof(*(log->log));
> + int fd = -1;
> +
> + log = g_new0(struct vhost_log, 1);
> + if (share) {
> + log->log = qemu_memfd_alloc("vhost-log", logsize,
> + F_SEAL_GROW|F_SEAL_SHRINK|F_SEAL_SEAL, &fd);
> + memset(log->log, 0, logsize);
> + } else {
> + log->log = g_malloc0(logsize);
> + }
>
> log->size = size;
> log->refcnt = 1;
> + log->fd = fd;
>
> return log;
> }
>
> -static struct vhost_log *vhost_log_get(uint64_t size)
> +static struct vhost_log *vhost_log_get(uint64_t size, bool share)
> {
> - if (!vhost_log || vhost_log->size != size) {
> - vhost_log = vhost_log_alloc(size);
> + if (!vhost_log || vhost_log->size != size ||
> + (share && vhost_log->fd == -1)) {
> + vhost_log = vhost_log_alloc(size, share);
> } else {
> ++vhost_log->refcnt;
> }
> @@ -324,21 +338,30 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
> if (vhost_log == log) {
> vhost_log = NULL;
> }
> +
> + if (log->fd == -1) {
> + g_free(log->log);
> + } else {
> + qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
> + log->fd);
> + }
> g_free(log);
> }
> }
>
> static inline void vhost_dev_log_resize(struct vhost_dev* dev, uint64_t size)
> {
> - struct vhost_log *log = vhost_log_get(size);
> + bool share = dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
> + struct vhost_log *log = vhost_log_get(size, share);
> uint64_t log_base = (uintptr_t)log->log;
> int r;
>
> - r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
> - assert(r >= 0);
> vhost_log_put(dev, true);
> dev->log = log;
> dev->log_size = size;
> +
> + r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
> + assert(r >= 0);
> }
Why is this change needed?
>
> static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> @@ -1136,9 +1159,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>
> if (hdev->log_enabled) {
> uint64_t log_base;
> + bool share = hdev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
>
> hdev->log_size = vhost_get_log_size(hdev);
> - hdev->log = vhost_log_get(hdev->log_size);
> + hdev->log = vhost_log_get(hdev->log_size, share);
> log_base = (uintptr_t)hdev->log->log;
> r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_LOG_BASE,
> hdev->log_size ? &log_base : NULL);
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 6467c73..ab1dcac 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -31,7 +31,8 @@ typedef unsigned long vhost_log_chunk_t;
> struct vhost_log {
> unsigned long long size;
> int refcnt;
> - vhost_log_chunk_t log[0];
> + int fd;
> + vhost_log_chunk_t *log;
> };
>
> struct vhost_memory;
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log
2015-07-28 5:28 ` Jason Wang
@ 2015-07-28 10:10 ` Michael S. Tsirkin
2015-07-28 14:42 ` Marc-André Lureau
0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2015-07-28 10:10 UTC (permalink / raw)
To: Jason Wang
Cc: haifeng.lin, Marc-André Lureau, pbonzini, qemu-devel,
thibaut.collet
On Tue, Jul 28, 2015 at 01:28:05PM +0800, Jason Wang wrote:
>
>
> On 07/23/2015 09:36 AM, Marc-André Lureau wrote:
> > If the backend is of type VHOST_BACKEND_TYPE_USER, allocate
> > shareable memory.
> >
> > Note: vhost_log_get() can use a global "vhost_log" that can be shared by
> > several vhost devices. We may want instead a common shareable log and a
> > common non-shareable one.
> >
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > ---
> > hw/virtio/vhost.c | 42 +++++++++++++++++++++++++++++++++---------
> > include/hw/virtio/vhost.h | 3 ++-
> > 2 files changed, 35 insertions(+), 10 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 2712c6f..12dd644 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -286,20 +286,34 @@ static uint64_t vhost_get_log_size(struct vhost_dev *dev)
> > }
> > return log_size;
> > }
> > -static struct vhost_log *vhost_log_alloc(uint64_t size)
> > +
> > +static struct vhost_log *vhost_log_alloc(uint64_t size, bool share)
> > {
> > - struct vhost_log *log = g_malloc0(sizeof *log + size * sizeof(*(log->log)));
> > + struct vhost_log *log;
> > + uint64_t logsize = size * sizeof(*(log->log));
> > + int fd = -1;
> > +
> > + log = g_new0(struct vhost_log, 1);
> > + if (share) {
> > + log->log = qemu_memfd_alloc("vhost-log", logsize,
> > + F_SEAL_GROW|F_SEAL_SHRINK|F_SEAL_SEAL, &fd);
> > + memset(log->log, 0, logsize);
> > + } else {
> > + log->log = g_malloc0(logsize);
> > + }
> >
> > log->size = size;
> > log->refcnt = 1;
> > + log->fd = fd;
> >
> > return log;
> > }
> >
> > -static struct vhost_log *vhost_log_get(uint64_t size)
> > +static struct vhost_log *vhost_log_get(uint64_t size, bool share)
> > {
> > - if (!vhost_log || vhost_log->size != size) {
> > - vhost_log = vhost_log_alloc(size);
> > + if (!vhost_log || vhost_log->size != size ||
> > + (share && vhost_log->fd == -1)) {
> > + vhost_log = vhost_log_alloc(size, share);
> > } else {
> > ++vhost_log->refcnt;
> > }
> > @@ -324,21 +338,30 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
> > if (vhost_log == log) {
> > vhost_log = NULL;
> > }
> > +
> > + if (log->fd == -1) {
> > + g_free(log->log);
> > + } else {
> > + qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
> > + log->fd);
> > + }
> > g_free(log);
> > }
> > }
> >
> > static inline void vhost_dev_log_resize(struct vhost_dev* dev, uint64_t size)
> > {
> > - struct vhost_log *log = vhost_log_get(size);
> > + bool share = dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
> > + struct vhost_log *log = vhost_log_get(size, share);
> > uint64_t log_base = (uintptr_t)log->log;
> > int r;
> >
> > - r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
> > - assert(r >= 0);
> > vhost_log_put(dev, true);
> > dev->log = log;
> > dev->log_size = size;
> > +
> > + r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
> > + assert(r >= 0);
> > }
>
> Why is this change needed?
I know why it's needed :) But this needs to be stated in the commit log.
Also, it only makes sense if remote supports getting the logfd.
> >
> > static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> > @@ -1136,9 +1159,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >
> > if (hdev->log_enabled) {
> > uint64_t log_base;
> > + bool share = hdev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER;
> >
> > hdev->log_size = vhost_get_log_size(hdev);
> > - hdev->log = vhost_log_get(hdev->log_size);
> > + hdev->log = vhost_log_get(hdev->log_size, share);
> > log_base = (uintptr_t)hdev->log->log;
> > r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_LOG_BASE,
> > hdev->log_size ? &log_base : NULL);
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 6467c73..ab1dcac 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -31,7 +31,8 @@ typedef unsigned long vhost_log_chunk_t;
> > struct vhost_log {
> > unsigned long long size;
> > int refcnt;
> > - vhost_log_chunk_t log[0];
> > + int fd;
> > + vhost_log_chunk_t *log;
> > };
> >
> > struct vhost_memory;
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log
2015-07-28 10:10 ` Michael S. Tsirkin
@ 2015-07-28 14:42 ` Marc-André Lureau
0 siblings, 0 replies; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-28 14:42 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Paolo Bonzini, Jason Wang, Linhaifeng, Thibaut Collet, QEMU
Hi
On Tue, Jul 28, 2015 at 12:10 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> I know why it's needed :) But this needs to be stated in the commit log.
> Also, it only makes sense if remote supports getting the logfd.
Thanks for pointing out this change. Actually, I think the current log
overlap when resizing is on purpose: there shouldn't be any time
without log. I'll rework that to keep the same ordering, keeping a
gap-less log switching. I'll also comment this part, as this is easy
to overlook.
--
Marc-André Lureau
^ permalink raw reply [flat|nested] 18+ messages in thread
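
The ordering discussed in this sub-thread (point the backend at the new log
first, release the old log only afterwards, so there is never a moment
without a log) can be illustrated with a small standalone model. Everything
below is hypothetical: struct sketch_log, struct sketch_dev and the
set_log_base callback only stand in for vhost_log, vhost_dev and the
VHOST_SET_LOG_BASE call, and the sync of the old log that vhost_log_put()
performs is left out.

```c
#include <stdint.h>
#include <stdlib.h>

typedef unsigned long sketch_chunk_t;

struct sketch_log {
    uint64_t size;              /* number of chunks */
    int refcnt;
    sketch_chunk_t *chunks;
};

struct sketch_dev {
    struct sketch_log *log;
    /* Hypothetical stand-in for VHOST_SET_LOG_BASE. */
    int (*set_log_base)(struct sketch_dev *dev, struct sketch_log *new_log);
};

static struct sketch_log *sketch_log_new(uint64_t size)
{
    struct sketch_log *log = calloc(1, sizeof(*log));

    if (!log) {
        return NULL;
    }
    log->chunks = calloc(size ? size : 1, sizeof(*log->chunks));
    if (!log->chunks) {
        free(log);
        return NULL;
    }
    log->size = size;
    log->refcnt = 1;
    return log;
}

static void sketch_log_put(struct sketch_log *log)
{
    if (log && --log->refcnt == 0) {
        free(log->chunks);
        free(log);
    }
}

/* Gap-less switch: the old log stays referenced until the backend uses the new one. */
static int sketch_log_resize(struct sketch_dev *dev, uint64_t size)
{
    struct sketch_log *new_log = sketch_log_new(size);
    struct sketch_log *old_log = dev->log;

    if (!new_log) {
        return -1;
    }
    if (dev->set_log_base(dev, new_log) < 0) {
        sketch_log_put(new_log);        /* keep logging into the old log */
        return -1;
    }
    dev->log = new_log;
    sketch_log_put(old_log);            /* only now drop the old log */
    return 0;
}

/* Example backend: records whether the old log was still held at switch time. */
static int sketch_old_log_held = -1;

static int sketch_backend_set_log_base(struct sketch_dev *dev,
                                       struct sketch_log *new_log)
{
    (void)new_log;
    sketch_old_log_held = dev->log != NULL && dev->log->refcnt > 0;
    return 0;
}
```

In this model the callback always observes a live old log, which is the
invariant the reordering in the RFC patch would have broken.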
* [Qemu-devel] [PATCH RFC 5/6] vhost-user: send log shm fd along with log_base
2015-07-23 1:36 [Qemu-devel] [PATCH RFC 0/6] vhost-user: add migration log support Marc-André Lureau
` (3 preceding siblings ...)
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 4/6] vhost: alloc shareable log Marc-André Lureau
@ 2015-07-23 1:36 ` Marc-André Lureau
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 6/6] vhost-user: document migration log Marc-André Lureau
5 siblings, 0 replies; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 1:36 UTC (permalink / raw)
To: qemu-devel
Cc: thibaut.collet, pbonzini, haifeng.lin, Marc-André Lureau,
mst
Send the shm fd for dirty page logging if the backend supports
VHOST_USER_PROTOCOL_F_LOG_SHMFD.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
hw/virtio/vhost-user.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4993b63..fe75618 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -26,7 +26,9 @@
#define VHOST_MEMORY_MAX_NREGIONS 8
#define VHOST_USER_F_PROTOCOL_FEATURES 30
-#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x0ULL
+
+#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x1ULL
+#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
typedef enum VhostUserRequest {
VHOST_USER_NONE = 0,
@@ -213,8 +215,15 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
need_reply = 1;
break;
- case VHOST_USER_SET_FEATURES:
case VHOST_USER_SET_LOG_BASE:
+ if (__virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_LOG_SHMFD) &&
+ dev->log->fd != -1) {
+ fds[fd_num++] = dev->log->fd;
+ }
+ /* fall through */
+
+ case VHOST_USER_SET_FEATURES:
msg.u64 = *((__u64 *) arg);
msg.size = sizeof(m.u64);
break;
--
2.4.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
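For readers following along: the patch above only queues the log fd on the master side. A minimal sketch of how a file descriptor travels in the ancillary data of a Unix socket message is shown below. The helper names send_with_fd() and recv_with_fd() are hypothetical and are not part of QEMU or of the vhost-user spec; the payload here is a bare buffer, not the real vhost-user message header.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send len payload bytes plus one fd as SCM_RIGHTS. */
static int send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper: receive a payload; *fd_out stays -1 if no fd came. */
static ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd_out)
{
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    assert(fd_out != NULL);
    *fd_out = -1;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return -1;
    }
    /* Walk the control messages and pick up the first passed fd, if any. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(fd_out, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return n;
}
```

A slave built along these lines would presumably mmap() the received fd to reach the log, and treat *fd_out == -1 as "no shm fd was sent", which matches the case where the protocol feature was not negotiated.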
* [Qemu-devel] [PATCH RFC 6/6] vhost-user: document migration log
2015-07-23 1:36 [Qemu-devel] [PATCH RFC 0/6] vhost-user: add migration log support Marc-André Lureau
` (4 preceding siblings ...)
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 5/6] vhost-user: send log shm fd along with log_base Marc-André Lureau
@ 2015-07-23 1:36 ` Marc-André Lureau
2015-07-23 15:30 ` Michael S. Tsirkin
5 siblings, 1 reply; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 1:36 UTC (permalink / raw)
To: qemu-devel
Cc: thibaut.collet, pbonzini, haifeng.lin, Marc-André Lureau,
mst
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
docs/specs/vhost-user.txt | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 0062baa..c2d2e2a 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -120,6 +120,7 @@ There are several messages that the master sends with file descriptors passed
in the ancillary data:
* VHOST_SET_MEM_TABLE
+ * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
* VHOST_SET_LOG_FD
* VHOST_SET_VRING_KICK
* VHOST_SET_VRING_CALL
@@ -135,6 +136,11 @@ As older slaves don't support negotiating protocol features,
a feature bit was dedicated for this purpose:
#define VHOST_USER_F_PROTOCOL_FEATURES 30
+Protocol features
+-----------------
+
+#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
+
Message types
-------------
@@ -301,3 +307,37 @@ Message types
Bits (0-7) of the payload contain the vring index. Bit 8 is the
invalid FD flag. This flag is set when there is no file descriptor
in the ancillary data.
+
+Migration
+---------
+
+During live migration, the master may need to track the modifications
+the slave makes to the memory mapped regions. The client should mark
+the dirty pages in a log. Once it complies with this logging, it may
+declare VHOST_F_LOG_ALL as a vhost feature.
+
+All the modifications to memory pointed by vring "descriptor" should
+be marked. Modifications to "used" vring should be marked if
+VHOST_VRING_F_LOG is part of ring's features.
+
+Dirty pages are of size:
+#define VHOST_LOG_PAGE 0x1000
+
+The log memory fd is provided in the ancillary data of
+VHOST_USER_SET_LOG_BASE message when the slave has
+VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.
+
+The size of the log may be computed by using all the known guest
+addresses. The log covers from address 0 to the maximum of guest
+regions. In pseudo-code, to mark page at "addr" as dirty:
+
+page = addr / VHOST_LOG_PAGE
+log[page / 8] |= 1 << page % 8
+
+VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
+ancillary data, it may be used to inform the master that the log has
+been modified.
+
+Once the source has finished migration, VHOST_USER_RESET_OWNER message
+will be sent by the source. No further update must be done before the
+destination takes over with new regions & rings.
--
2.4.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
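For illustration, the pseudo-code in the Migration section above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: the log is treated as a byte array with one bit per VHOST_LOG_PAGE-sized page, the helper names (log_size_bytes, log_mark_page, log_mark_range) are hypothetical, and the bit is set with the GCC/Clang __atomic_fetch_or builtin because the log lives in memory shared with the master, which may read it while the slave is writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_LOG_PAGE 0x1000ULL

/*
 * Bytes needed for a log covering guest addresses [0, max_addr].
 * Assumption: one bit per page, rounded up to whole bytes.
 */
static size_t log_size_bytes(uint64_t max_addr)
{
    uint64_t pages = max_addr / VHOST_LOG_PAGE + 1;

    return (size_t)((pages + 7) / 8);
}

/* Atomically set the dirty bit of the page that contains addr. */
static void log_mark_page(uint8_t *log, size_t log_size, uint64_t addr)
{
    uint64_t page = addr / VHOST_LOG_PAGE;

    assert(page / 8 < log_size);
    __atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)),
                      __ATOMIC_SEQ_CST);
}

/* Mark every page touched by a write of len bytes starting at addr. */
static void log_mark_range(uint8_t *log, size_t log_size,
                           uint64_t addr, uint64_t len)
{
    uint64_t first, last, page;

    if (len == 0) {
        return;
    }
    first = addr / VHOST_LOG_PAGE;
    last = (addr + len - 1) / VHOST_LOG_PAGE;
    for (page = first; page <= last; page++) {
        log_mark_page(log, log_size, page * VHOST_LOG_PAGE);
    }
}
```

A write that straddles a page boundary dirties both pages, which is why the range helper works on the first and last page index rather than on the start address alone.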
* Re: [Qemu-devel] [PATCH RFC 6/6] vhost-user: document migration log
2015-07-23 1:36 ` [Qemu-devel] [PATCH RFC 6/6] vhost-user: document migration log Marc-André Lureau
@ 2015-07-23 15:30 ` Michael S. Tsirkin
2015-07-23 15:36 ` Marc-André Lureau
0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2015-07-23 15:30 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: thibaut.collet, pbonzini, qemu-devel, haifeng.lin
On Thu, Jul 23, 2015 at 03:36:43AM +0200, Marc-André Lureau wrote:
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
for some reason I didn't get 5/6.
> ---
> docs/specs/vhost-user.txt | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 0062baa..c2d2e2a 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -120,6 +120,7 @@ There are several messages that the master sends with file descriptors passed
> in the ancillary data:
>
> * VHOST_SET_MEM_TABLE
> + * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> * VHOST_SET_LOG_FD
> * VHOST_SET_VRING_KICK
> * VHOST_SET_VRING_CALL
> @@ -135,6 +136,11 @@ As older slaves don't support negotiating protocol features,
> a feature bit was dedicated for this purpose:
> #define VHOST_USER_F_PROTOCOL_FEATURES 30
>
> +Protocol features
> +-----------------
> +
> +#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
> +
> Message types
> -------------
>
> @@ -301,3 +307,37 @@ Message types
> Bits (0-7) of the payload contain the vring index. Bit 8 is the
> invalid FD flag. This flag is set when there is no file descriptor
> in the ancillary data.
> +
> +Migration
> +---------
> +
> +During live migration, the master may need to track the modifications
> +the slave makes to the memory mapped regions. The client should mark
> +the dirty pages in a log. Once it complies to this logging, it may
> +declare VHOST_F_LOG_ALL has a vhost feature.
> +
> +All the modifications to memory pointed by vring "descriptor" should
> +be marked. Modifications to "used" vring should be marked if
> +VHOST_VRING_F_LOG is part of ring's features.
It's device's features I think.
> +
> +Dirty pages are of size:
> +#define VHOST_LOG_PAGE 0x1000
> +
> +The log memory fd is provided in the ancillary data of
> +VHOST_USER_SET_LOG_BASE message when the slave has
> +VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.
> +
> +The size of the log may be computed by using all the known guest
> +addresses. The log covers from address 0 to the maximum of guest
> +regions. In pseudo-code, to mark page at "addr" as dirty:
> +
> +page = addr / VHOST_LOG_PAGE
> +log[page / 8] |= 1 << page % 8
Pls note it must be done atomically.
> +
> +VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
> +ancillary data, it may be used to inform the master that the log has
> +been modified.
> +
> +Once the source has finished migration, VHOST_USER_RESET_OWNER message
> +will be sent by the source. No further update must be done before the
> +destination takes over with new regions & rings.
> --
> 2.4.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC 6/6] vhost-user: document migration log
2015-07-23 15:30 ` Michael S. Tsirkin
@ 2015-07-23 15:36 ` Marc-André Lureau
0 siblings, 0 replies; 18+ messages in thread
From: Marc-André Lureau @ 2015-07-23 15:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: haifeng lin, Marc-André Lureau, pbonzini, qemu-devel,
thibaut collet
Hi
----- Original Message -----
> On Thu, Jul 23, 2015 at 03:36:43AM +0200, Marc-André Lureau wrote:
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>
> for some reason I didn't get 5/6.
>
strange: http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04640.html
> > ---
> > docs/specs/vhost-user.txt | 40 ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 40 insertions(+)
> >
> > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > index 0062baa..c2d2e2a 100644
> > --- a/docs/specs/vhost-user.txt
> > +++ b/docs/specs/vhost-user.txt
> > @@ -120,6 +120,7 @@ There are several messages that the master sends with
> > file descriptors passed
> > in the ancillary data:
> >
> > * VHOST_SET_MEM_TABLE
> > + * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
> > * VHOST_SET_LOG_FD
> > * VHOST_SET_VRING_KICK
> > * VHOST_SET_VRING_CALL
> > @@ -135,6 +136,11 @@ As older slaves don't support negotiating protocol
> > features,
> > a feature bit was dedicated for this purpose:
> > #define VHOST_USER_F_PROTOCOL_FEATURES 30
> >
> > +Protocol features
> > +-----------------
> > +
> > +#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 0
> > +
> > Message types
> > -------------
> >
> > @@ -301,3 +307,37 @@ Message types
> > Bits (0-7) of the payload contain the vring index. Bit 8 is the
> > invalid FD flag. This flag is set when there is no file descriptor
> > in the ancillary data.
> > +
> > +Migration
> > +---------
> > +
> > +During live migration, the master may need to track the modifications
> > +the slave makes to the memory mapped regions. The client should mark
> > +the dirty pages in a log. Once it complies to this logging, it may
> > +declare VHOST_F_LOG_ALL has a vhost feature.
> > +
> > +All the modifications to memory pointed by vring "descriptor" should
> > +be marked. Modifications to "used" vring should be marked if
> > +VHOST_VRING_F_LOG is part of ring's features.
>
> It's device's features I think.
Hmm, it's part of both, not sure why: see vhost_virtqueue_set_addr() and vhost_dev_set_features().
I'm not sure it's correct in the device features; it doesn't seem to be checked in the kernel's vhost.c either.
There are also some dead definitions there, like VHOST_MEMORY_F_LOG.
>
> > +
> > +Dirty pages are of size:
> > +#define VHOST_LOG_PAGE 0x1000
> > +
> > +The log memory fd is provided in the ancillary data of
> > +VHOST_USER_SET_LOG_BASE message when the slave has
> > +VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.
> > +
> > +The size of the log may be computed by using all the known guest
> > +addresses. The log covers from address 0 to the maximum of guest
> > +regions. In pseudo-code, to mark page at "addr" as dirty:
> > +
> > +page = addr / VHOST_LOG_PAGE
> > +log[page / 8] |= 1 << page % 8
>
> Pls note it must be done atomically.
ok
>
>
> > +
> > +VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
> > +ancillary data, it may be used to inform the master that the log has
> > +been modified.
> > +
> > +Once the source has finished migration, VHOST_USER_RESET_OWNER message
> > +will be sent by the source. No further update must be done before the
> > +destination takes over with new regions & rings.
> > --
> > 2.4.3
>
^ permalink raw reply [flat|nested] 18+ messages in thread