* [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
@ 2008-10-12 16:30 Uri Lublin
2008-10-12 16:55 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: Uri Lublin @ 2008-10-12 16:30 UTC (permalink / raw)
To: kvm; +Cc: Uri Lublin
Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but
if both are given (non NULL) we encounter problems:
1. There is only one buffer and index, which may mean data corruption.
2. qemu_flush (which is also called by qemu_fclose) is writing ("flushing")
some of the data that was read (for the reader part).
Currently qemu_fopen_fd registers both get_buffer and put_buffer functions.
This breaks migration for tcp and ssh migration protocols.
The following patch fix the above by:
1. It makes sure that at most one of get_buffer and put_buffer is
given to qemu_fopen_ops.
2. It changes qemu_fopen_fd to register only get_buffer for a reader
and only put_buffer for a writer (adding a 'reader' parameter).
3. The incoming fd migration code calls qemu_fopen_fd as a reader only.
Signed-off-by: Uri Lublin <uril@qumranet.com>
---
qemu/hw/hw.h | 2 +-
qemu/migration.c | 2 +-
qemu/vl.c | 12 ++++++++++--
3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/qemu/hw/hw.h b/qemu/hw/hw.h
index c9390c1..d965c47 100644
--- a/qemu/hw/hw.h
+++ b/qemu/hw/hw.h
@@ -34,7 +34,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
QEMUFileCloseFunc *close,
QEMUFileRateLimit *rate_limit);
QEMUFile *qemu_fopen(const char *filename, const char *mode);
-QEMUFile *qemu_fopen_fd(int fd);
+QEMUFile *qemu_fopen_fd(int fd, int reader);
void qemu_fflush(QEMUFile *f);
int qemu_fclose(QEMUFile *f);
void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/qemu/migration.c b/qemu/migration.c
index 44cb9eb..587c67e 100644
--- a/qemu/migration.c
+++ b/qemu/migration.c
@@ -820,7 +820,7 @@ static int migrate_incoming_page(QEMUFile *f, uint32_t addr)
static int migrate_incoming_fd(int fd)
{
int ret = 0;
- QEMUFile *f = qemu_fopen_fd(fd);
+ QEMUFile *f = qemu_fopen_fd(fd, 1);
uint32_t addr, size;
extern void qemu_announce_self(void);
unsigned char running;
diff --git a/qemu/vl.c b/qemu/vl.c
index 36e3bb7..1ce188b 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -6712,7 +6712,7 @@ static int fd_close(void *opaque)
return 0;
}
-QEMUFile *qemu_fopen_fd(int fd)
+QEMUFile *qemu_fopen_fd(int fd, int reader)
{
QEMUFileFD *s = qemu_mallocz(sizeof(QEMUFileFD));
@@ -6720,7 +6720,10 @@ QEMUFile *qemu_fopen_fd(int fd)
return NULL;
s->fd = fd;
- s->file = qemu_fopen_ops(s, fd_put_buffer, fd_get_buffer, fd_close, NULL);
+ if (reader)
+ s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close, NULL);
+ else
+ s->file = qemu_fopen_ops(s, fd_put_buffer, NULL, fd_close, NULL);
return s->file;
}
@@ -6826,6 +6829,11 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
{
QEMUFile *f;
+ if (put_buffer && get_buffer) {
+ fprintf(stderr, "%s: only one of get_buffer and put_buffer "
+ "functions may be given\n", __FUNCTION__);
+ return NULL;
+ }
f = qemu_mallocz(sizeof(QEMUFile));
if (!f)
return NULL;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-12 16:30 [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user Uri Lublin @ 2008-10-12 16:55 ` Avi Kivity 2008-10-12 18:09 ` Anthony Liguori 2008-10-16 0:13 ` Uri Lublin 0 siblings, 2 replies; 21+ messages in thread From: Avi Kivity @ 2008-10-12 16:55 UTC (permalink / raw) To: Uri Lublin; +Cc: kvm, Anthony Liguori Uri Lublin wrote: > Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but > if both are given (non NULL) we encounter problems: > 1. There is only one buffer and index, which may mean data corruption. > 2. qemu_flush (which is also called by qemu_fclose) is writing ("flushing") > some of the data that was read (for the reader part). > > Currently qemu_fopen_fd registers both get_buffer and put_buffer functions. > > This breaks migration for tcp and ssh migration protocols. > > The following patch fix the above by: > 1. It makes sure that at most one of get_buffer and put_buffer is > given to qemu_fopen_ops. > 2. It changes qemu_fopen_fd to register only get_buffer for a reader > and only put_buffer for a writer (adding a 'reader' parameter). > 3. The incoming fd migration code calls qemu_fopen_fd as a reader only. > > Anthony, this is a problem with qemu-upstream so I'd like to solve it in a way that's acceptable for upstream. The proposed patch is less that ideal IMO as it introduces limitations on what you can do with a file. An alternative implementation would add a read/write mode to the buffer, based on the last access type. When switching from read to write, we drop the buffer, and when switching from write to read, we flush it and then drop it. This is more complex but results in a cleaner API. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-12 16:55 ` Avi Kivity @ 2008-10-12 18:09 ` Anthony Liguori 2008-10-12 18:17 ` Avi Kivity 2008-10-16 0:13 ` Uri Lublin 1 sibling, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-12 18:09 UTC (permalink / raw) To: Avi Kivity; +Cc: Uri Lublin, kvm Avi Kivity wrote: > Uri Lublin wrote: >> Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but >> if both are given (non NULL) we encounter problems: >> 1. There is only one buffer and index, which may mean data corruption. >> 2. qemu_flush (which is also called by qemu_fclose) is writing >> ("flushing") >> some of the data that was read (for the reader part). >> >> Currently qemu_fopen_fd registers both get_buffer and put_buffer >> functions. >> >> This breaks migration for tcp and ssh migration protocols. >> >> The following patch fix the above by: >> 1. It makes sure that at most one of get_buffer and put_buffer is >> given to qemu_fopen_ops. >> 2. It changes qemu_fopen_fd to register only get_buffer for a reader >> and only put_buffer for a writer (adding a 'reader' parameter). >> 3. The incoming fd migration code calls qemu_fopen_fd as a reader >> only. >> >> > > Anthony, this is a problem with qemu-upstream so I'd like to solve it > in a way that's acceptable for upstream. > > The proposed patch is less that ideal IMO as it introduces limitations > on what you can do with a file. An alternative implementation would > add a read/write mode to the buffer, based on the last access type. > When switching from read to write, we drop the buffer, and when > switching from write to read, we flush it and then drop it. This is > more complex but results in a cleaner API. I would think a better solution would introduce two buffers, one for read and one for write. That way, you can have a proper bidirectional stream. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-12 18:09 ` Anthony Liguori @ 2008-10-12 18:17 ` Avi Kivity 2008-10-12 22:18 ` Anthony Liguori 0 siblings, 1 reply; 21+ messages in thread From: Avi Kivity @ 2008-10-12 18:17 UTC (permalink / raw) To: Anthony Liguori; +Cc: Uri Lublin, kvm Anthony Liguori wrote: >> >> The proposed patch is less that ideal IMO as it introduces >> limitations on what you can do with a file. An alternative >> implementation would add a read/write mode to the buffer, based on >> the last access type. When switching from read to write, we drop the >> buffer, and when switching from write to read, we flush it and then >> drop it. This is more complex but results in a cleaner API. > > I would think a better solution would introduce two buffers, one for > read and one for write. That way, you can have a proper bidirectional > stream. > Complexity goes way up. Now you need to intercept reads that go to the write buffer, and vice versa. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-12 18:17 ` Avi Kivity @ 2008-10-12 22:18 ` Anthony Liguori 2008-10-13 3:03 ` Anthony Liguori 0 siblings, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-12 22:18 UTC (permalink / raw) To: Avi Kivity; +Cc: Uri Lublin, kvm Avi Kivity wrote: > Anthony Liguori wrote: > >>> The proposed patch is less that ideal IMO as it introduces >>> limitations on what you can do with a file. An alternative >>> implementation would add a read/write mode to the buffer, based on >>> the last access type. When switching from read to write, we drop the >>> buffer, and when switching from write to read, we flush it and then >>> drop it. This is more complex but results in a cleaner API. >>> >> I would think a better solution would introduce two buffers, one for >> read and one for write. That way, you can have a proper bidirectional >> stream. >> >> > > Complexity goes way up. Now you need to intercept reads that go to the > write buffer, and vice versa. > Yeah, Uri: instead of passing an argument to qemu_fopen_ops, it may be better to direct the cases where we do a write and set a flag. Then in the fflush() function, only do the put_buffer if the is_write flag is set. Also, having checks and the read and write functions to determine if the is_write flag is set along with whether buf_index > 0 that fprintf()'d and aborted would be good for debugging. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-12 22:18 ` Anthony Liguori @ 2008-10-13 3:03 ` Anthony Liguori 2008-10-16 1:36 ` Uri Lublin 0 siblings, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-13 3:03 UTC (permalink / raw) To: Avi Kivity; +Cc: Uri Lublin, kvm [-- Attachment #1: Type: text/plain, Size: 368 bytes --] Anthony Liguori wrote: > > Also, having checks and the read and write functions to determine if > the is_write flag is set along with whether buf_index > 0 that > fprintf()'d and aborted would be good for debugging. I have a patch that does this along with fixing a few other bugs. It's attached. Regards, Anthony Liguori > > Regards, > > Anthony Liguori > > [-- Attachment #2: 006_live_migration.patch --] [-- Type: text/x-patch, Size: 8114 bytes --] diff --git a/hw/hw.h b/hw/hw.h index e130355..8edd788 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -11,8 +11,8 @@ * The pos argument can be ignored if the file is only being used for * streaming. The handler should try to write all of the data it can. */ -typedef void (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf, - int64_t pos, int size); +typedef int (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf, + int64_t pos, int size); /* Read a chunk of data from a file at the given position. The pos argument * can be ignored if the file is only be used for streaming. The number of @@ -64,6 +64,7 @@ unsigned int qemu_get_be16(QEMUFile *f); unsigned int qemu_get_be32(QEMUFile *f); uint64_t qemu_get_be64(QEMUFile *f); int qemu_file_rate_limit(QEMUFile *f); +int qemu_file_has_error(QEMUFile *f); /* Try to send any outstanding data. This function is useful when output is * halted due to rate limiting or EAGAIN errors occur as it can be used to diff --git a/vl.c b/vl.c index 5659fea..d49c648 100644 --- a/vl.c +++ b/vl.c @@ -6197,12 +6197,15 @@ struct QEMUFile { QEMUFileCloseFunc *close; QEMUFileRateLimit *rate_limit; void *opaque; + int is_write; int64_t buf_offset; /* start of buffer when writing, end of buffer when reading */ int buf_index; int buf_size; /* 0 when writing */ uint8_t buf[IO_BUF_SIZE]; + + int has_error; }; typedef struct QEMUFileFD @@ -6211,34 +6214,6 @@ typedef struct QEMUFileFD QEMUFile *file; } QEMUFileFD; -static void fd_put_notify(void *opaque) -{ - QEMUFileFD *s = opaque; - - /* Remove writable callback and do a put notify */ - qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL); - qemu_file_put_notify(s->file); -} - -static void fd_put_buffer(void *opaque, const uint8_t *buf, - int64_t pos, int size) -{ - QEMUFileFD *s = opaque; - ssize_t len; - - do { - len = write(s->fd, buf, size); - } while (len == -1 && errno == EINTR); - - if (len == -1) - len = -errno; - - /* When the fd becomes writable again, register a callback to do - * a put notify */ - if (len == -EAGAIN) - qemu_set_fd_handler2(s->fd, NULL, NULL, fd_put_notify, s); -} - static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) { QEMUFileFD *s = opaque; @@ -6269,7 +6244,7 @@ QEMUFile *qemu_fopen_fd(int fd) return NULL; s->fd = fd; - s->file = qemu_fopen_ops(s, fd_put_buffer, fd_get_buffer, fd_close, NULL); + s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close, NULL); return s->file; } @@ -6278,12 +6253,13 @@ typedef struct QEMUFileStdio FILE *outfile; } QEMUFileStdio; -static void file_put_buffer(void *opaque, const uint8_t *buf, +static int file_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size) { QEMUFileStdio *s = opaque; fseek(s->outfile, pos, SEEK_SET); fwrite(buf, 1, size, s->outfile); + return size; } static int file_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) @@ -6331,11 +6307,12 @@ typedef struct QEMUFileBdrv int64_t base_offset; } QEMUFileBdrv; -static void bdrv_put_buffer(void *opaque, const uint8_t *buf, - int64_t pos, int size) +static int bdrv_put_buffer(void *opaque, const uint8_t *buf, + int64_t pos, int size) { QEMUFileBdrv *s = opaque; bdrv_pwrite(s->bs, s->base_offset + pos, buf, size); + return size; } static int bdrv_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) @@ -6384,18 +6361,29 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer, f->get_buffer = get_buffer; f->close = close; f->rate_limit = rate_limit; + f->is_write = 0; return f; } +int qemu_file_has_error(QEMUFile *f) +{ + return f->has_error; +} + void qemu_fflush(QEMUFile *f) { if (!f->put_buffer) return; - if (f->buf_index > 0) { - f->put_buffer(f->opaque, f->buf, f->buf_offset, f->buf_index); - f->buf_offset += f->buf_index; + if (f->is_write && f->buf_index > 0) { + int len; + + len = f->put_buffer(f->opaque, f->buf, f->buf_offset, f->buf_index); + if (len > 0) + f->buf_offset += f->buf_index; + else + f->has_error = 1; f->buf_index = 0; } } @@ -6407,13 +6395,16 @@ static void qemu_fill_buffer(QEMUFile *f) if (!f->get_buffer) return; - len = f->get_buffer(f->opaque, f->buf, f->buf_offset, IO_BUF_SIZE); - if (len < 0) - len = 0; + if (f->is_write) + abort(); - f->buf_index = 0; - f->buf_size = len; - f->buf_offset += len; + len = f->get_buffer(f->opaque, f->buf, f->buf_offset, IO_BUF_SIZE); + if (len > 0) { + f->buf_index = 0; + f->buf_size = len; + f->buf_offset += len; + } else if (len != -EAGAIN) + f->has_error = 1; } int qemu_fclose(QEMUFile *f) @@ -6434,11 +6425,19 @@ void qemu_file_put_notify(QEMUFile *f) void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size) { int l; - while (size > 0) { + + if (!f->has_error && f->is_write == 0 && f->buf_index > 0) { + fprintf(stderr, + "Attempted to write to buffer while read buffer is not empty\n"); + abort(); + } + + while (!f->has_error && size > 0) { l = IO_BUF_SIZE - f->buf_index; if (l > size) l = size; memcpy(f->buf + f->buf_index, buf, l); + f->is_write = 1; f->buf_index += l; buf += l; size -= l; @@ -6449,7 +6448,14 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size) void qemu_put_byte(QEMUFile *f, int v) { + if (!f->has_error && f->is_write == 0 && f->buf_index > 0) { + fprintf(stderr, + "Attempted to write to buffer while read buffer is not empty\n"); + abort(); + } + f->buf[f->buf_index++] = v; + f->is_write = 1; if (f->buf_index >= IO_BUF_SIZE) qemu_fflush(f); } @@ -6458,6 +6464,9 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1) { int size, l; + if (f->is_write) + abort(); + size = size1; while (size > 0) { l = f->buf_size - f->buf_index; @@ -6479,6 +6488,9 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size1) int qemu_get_byte(QEMUFile *f) { + if (f->is_write) + abort(); + if (f->buf_index >= f->buf_size) { qemu_fill_buffer(f); if (f->buf_index >= f->buf_size) @@ -6671,6 +6683,9 @@ int qemu_savevm_state_begin(QEMUFile *f) se->save_live_state(f, QEMU_VM_SECTION_START, se->opaque); } + if (qemu_file_has_error(f)) + return -EIO; + return 0; } @@ -6693,6 +6708,9 @@ int qemu_savevm_state_iterate(QEMUFile *f) if (ret) return 1; + if (qemu_file_has_error(f)) + return -EIO; + return 0; } @@ -6734,6 +6752,9 @@ int qemu_savevm_state_complete(QEMUFile *f) qemu_put_byte(f, QEMU_VM_EOF); + if (qemu_file_has_error(f)) + return -EIO; + return 0; } @@ -6758,8 +6779,12 @@ int qemu_savevm_state(QEMUFile *f) ret = qemu_savevm_state_complete(f); out: - if (saved_vm_running) + if (qemu_file_has_error(f)) + ret = -EIO; + + if (!ret && saved_vm_running) vm_start(); + return ret; } @@ -6815,6 +6840,10 @@ static int qemu_loadvm_state_v2(QEMUFile *f) /* always seek to exact end of record */ qemu_fseek(f, cur_pos + record_len, SEEK_SET); } + + if (qemu_file_has_error(f)) + return -EIO; + return 0; } @@ -6913,6 +6942,9 @@ out: qemu_free(le); } + if (qemu_file_has_error(f)) + ret = -EIO; + return ret; } @@ -7224,6 +7256,10 @@ static int ram_get_page(QEMUFile *f, uint8_t *buf, int len) default: return -EINVAL; } + + if (qemu_file_has_error(f)) + return -EIO; + return 0; } ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-13 3:03 ` Anthony Liguori @ 2008-10-16 1:36 ` Uri Lublin 2008-10-16 4:14 ` Anthony Liguori 0 siblings, 1 reply; 21+ messages in thread From: Uri Lublin @ 2008-10-16 1:36 UTC (permalink / raw) To: Anthony Liguori; +Cc: Avi Kivity, Uri Lublin, kvm Anthony Liguori wrote: > Anthony Liguori wrote: >> >> Also, having checks and the read and write functions to determine if >> the is_write flag is set along with whether buf_index > 0 that >> fprintf()'d and aborted would be good for debugging. > > I have a patch that does this along with fixing a few other bugs. It's > attached. > Hi Anthony, Thanks for pushing Live Migration into upstream qemu. This patch doesn't provide us with full duplex capability yet: * qemu_fopen_fd() now only register get_buffer function (always a reader). * Also there still a single buffer that should not be shared by reader/writer. * It aborts upon a read-to-write switch and upon a write-to-read switch. That as you've mentioned may be helpful when/if implementing a full duplex qemu_file_* mechanism. Below are some specific code comments. Thanks, Uri. >@@ -6278,12 +6253,13 @@ typedef struct QEMUFileStdio > FILE *outfile; > } QEMUFileStdio; > >-static void file_put_buffer(void *opaque, const uint8_t *buf, >+static int file_put_buffer(void *opaque, const uint8_t *buf, > int64_t pos, int size) > { > QEMUFileStdio *s = opaque; > fseek(s->outfile, pos, SEEK_SET); > fwrite(buf, 1, size, s->outfile); >+ return size; Better return the size that was actually written > } > > static int file_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) >@@ -6331,11 +6307,12 @@ typedef struct QEMUFileBdrv > int64_t base_offset; > } QEMUFileBdrv; > >-static void bdrv_put_buffer(void *opaque, const uint8_t *buf, >- int64_t pos, int size) >+static int bdrv_put_buffer(void *opaque, const uint8_t *buf, >+ int64_t pos, int size) > { > QEMUFileBdrv *s = opaque; > bdrv_pwrite(s->bs, s->base_offset + pos, buf, size); >+ return size; Same here > } > void qemu_fflush(QEMUFile *f) > { > if (!f->put_buffer) > return; >- if (f->buf_index > 0) { >- f->put_buffer(f->opaque, f->buf, f->buf_offset, f->buf_index); >- f->buf_offset += f->buf_index; >+ if (f->is_write && f->buf_index > 0) { >+ int len; >+ >+ len = f->put_buffer(f->opaque, f->buf, f->buf_offset, f->buf_index); >+ if (len > 0) >+ f->buf_offset += f->buf_index; >+ else >+ f->has_error = 1; Untabify. > f->buf_index = 0; > } > } ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 1:36 ` Uri Lublin @ 2008-10-16 4:14 ` Anthony Liguori 2008-10-16 8:13 ` Avi Kivity 2008-10-16 10:52 ` Uri Lublin 0 siblings, 2 replies; 21+ messages in thread From: Anthony Liguori @ 2008-10-16 4:14 UTC (permalink / raw) To: Uri Lublin; +Cc: Avi Kivity, Uri Lublin, kvm Uri Lublin wrote: > Anthony Liguori wrote: I have already cut your text, but I don't understand the comment about not being "full duplex". Is there a reason why migration needs to be bidirectional? I don't think there's a fundamental reason it needs to be and I think there are some advantages to it being unidirectional. > >@@ -6278,12 +6253,13 @@ typedef struct QEMUFileStdio > > FILE *outfile; > > } QEMUFileStdio; > > > >-static void file_put_buffer(void *opaque, const uint8_t *buf, > >+static int file_put_buffer(void *opaque, const uint8_t *buf, > > int64_t pos, int size) > > { > > QEMUFileStdio *s = opaque; > > fseek(s->outfile, pos, SEEK_SET); > > fwrite(buf, 1, size, s->outfile); > >+ return size; > > Better return the size that was actually written At this level in the API, you have to write all of the data. The reason I introduced a return value at all was so that you could return an error. This is necessary to detect that a migration or save/restore has failed. I agree the QEMUFile API isn't ideal ATM. The whole buffered thing is a big hack. I'm open to refactoring suggestions/patches. > > void qemu_fflush(QEMUFile *f) > > { > > if (!f->put_buffer) > > return; > >- if (f->buf_index > 0) { > >- f->put_buffer(f->opaque, f->buf, f->buf_offset, f->buf_index); > >- f->buf_offset += f->buf_index; > >+ if (f->is_write && f->buf_index > 0) { > >+ int len; > >+ > >+ len = f->put_buffer(f->opaque, f->buf, f->buf_offset, > f->buf_index); > >+ if (len > 0) > >+ f->buf_offset += f->buf_index; > >+ else > >+ f->has_error = 1; > > Untabify. I don't see a tab in subversion so perhaps I cleaned that up before committing :-) Thanks for the review! Regards, Anthony Liguori > > > f->buf_index = 0; > > } > > } > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 4:14 ` Anthony Liguori @ 2008-10-16 8:13 ` Avi Kivity 2008-10-16 12:54 ` Anthony Liguori 2008-10-16 10:52 ` Uri Lublin 1 sibling, 1 reply; 21+ messages in thread From: Avi Kivity @ 2008-10-16 8:13 UTC (permalink / raw) To: Anthony Liguori; +Cc: Uri Lublin, kvm Anthony Liguori wrote: > Uri Lublin wrote: >> Anthony Liguori wrote: > > I have already cut your text, but I don't understand the comment about > not being "full duplex". Is there a reason why migration needs to be > bidirectional? I don't think there's a fundamental reason it needs to > be and I think there are some advantages to it being unidirectional. > Weren't there some acks flowing back on the old protocol to let the source now things are fine? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 8:13 ` Avi Kivity @ 2008-10-16 12:54 ` Anthony Liguori 2008-10-16 14:23 ` Uri Lublin 0 siblings, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-16 12:54 UTC (permalink / raw) To: Avi Kivity; +Cc: Uri Lublin, kvm Avi Kivity wrote: > Anthony Liguori wrote: >> Uri Lublin wrote: >>> Anthony Liguori wrote: >> >> I have already cut your text, but I don't understand the comment >> about not being "full duplex". Is there a reason why migration needs >> to be bidirectional? I don't think there's a fundamental reason it >> needs to be and I think there are some advantages to it being >> unidirectional. >> > > Weren't there some acks flowing back on the old protocol to let the > source now things are fine? There were, but there aren't now. They don't improve reliability. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 12:54 ` Anthony Liguori @ 2008-10-16 14:23 ` Uri Lublin 2008-10-16 14:32 ` Avi Kivity 0 siblings, 1 reply; 21+ messages in thread From: Uri Lublin @ 2008-10-16 14:23 UTC (permalink / raw) To: Anthony Liguori; +Cc: Avi Kivity, Uri Lublin, kvm Anthony Liguori wrote: > Avi Kivity wrote: >> Anthony Liguori wrote: >>> Uri Lublin wrote: >>>> Anthony Liguori wrote: >>> >>> I have already cut your text, but I don't understand the comment >>> about not being "full duplex". Is there a reason why migration needs >>> to be bidirectional? I don't think there's a fundamental reason it >>> needs to be and I think there are some advantages to it being >>> unidirectional. >>> >> >> Weren't there some acks flowing back on the old protocol to let the >> source now things are fine? > > There were, but there aren't now. They don't improve reliability. > Why do you think they don't improve reliability ? The Ack/Go messages catch the scenario where SRC finishes sending all its state, and DST have a problem loading that state. In that case SRC thinks migration was completed successfully (and stay in stopped mode), while DST exists. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 14:23 ` Uri Lublin @ 2008-10-16 14:32 ` Avi Kivity 2008-10-16 14:49 ` Uri Lublin 0 siblings, 1 reply; 21+ messages in thread From: Avi Kivity @ 2008-10-16 14:32 UTC (permalink / raw) To: Uri Lublin; +Cc: Anthony Liguori, Uri Lublin, kvm Uri Lublin wrote: >>> >>> Weren't there some acks flowing back on the old protocol to let the >>> source now things are fine? >> >> There were, but there aren't now. They don't improve reliability. >> > > Why do you think they don't improve reliability ? > The Ack/Go messages catch the scenario where SRC finishes sending all > its state, and DST have a problem loading that state. In that case SRC > thinks migration was completed successfully (and stay in stopped > mode), while DST exists. The management tool is in contact with both, and can arbitrate. Once it thinks both source and destination are in a good state, it can continue the destination and quit the source. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 14:32 ` Avi Kivity @ 2008-10-16 14:49 ` Uri Lublin 2008-10-17 2:47 ` Anthony Liguori 0 siblings, 1 reply; 21+ messages in thread From: Uri Lublin @ 2008-10-16 14:49 UTC (permalink / raw) To: Avi Kivity; +Cc: Anthony Liguori, Uri Lublin, kvm Avi Kivity wrote: > Uri Lublin wrote: >>>> >>>> Weren't there some acks flowing back on the old protocol to let the >>>> source now things are fine? >>> >>> There were, but there aren't now. They don't improve reliability. >>> >> >> Why do you think they don't improve reliability ? >> The Ack/Go messages catch the scenario where SRC finishes sending all >> its state, and DST have a problem loading that state. In that case SRC >> thinks migration was completed successfully (and stay in stopped >> mode), while DST exists. > > The management tool is in contact with both, and can arbitrate. Once it > thinks both source and destination are in a good state, it can continue > the destination and quit the source. > That is true, but in the case I mentioned above it would take the management tool some time (guest down time) to realize what happens, and to send "cont" to the SRC. With end-of-migration messages SRC discovers DST fails and immediately continues. I agree those messages add some complexity, and slow things a bit for the good/average case. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 14:49 ` Uri Lublin @ 2008-10-17 2:47 ` Anthony Liguori 2008-10-19 13:46 ` Uri Lublin 0 siblings, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-17 2:47 UTC (permalink / raw) To: Uri Lublin; +Cc: Avi Kivity, kvm Uri Lublin wrote: > > That is true, but in the case I mentioned above it would take the > management tool some time (guest down time) to realize what happens, > and to send "cont" to the SRC. With end-of-migration messages SRC > discovers DST fails and immediately continues. > I agree those messages add some complexity, and slow things a bit for > the good/average case. It's the classic general's dilemma. If SRC waits for DST to send an ACK, DST still doesn't know whether SRC received the ACK so it doesn't know whether it's truly safe to continue. This is why migration doesn't quit SRC immediately, and leaves SRC in the stopped state. It's because the only safe way to handle this is with a third party that is reliable. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-17 2:47 ` Anthony Liguori @ 2008-10-19 13:46 ` Uri Lublin 2008-10-19 22:00 ` Anthony Liguori 0 siblings, 1 reply; 21+ messages in thread From: Uri Lublin @ 2008-10-19 13:46 UTC (permalink / raw) To: Anthony Liguori; +Cc: Avi Kivity, kvm Anthony Liguori wrote: > Uri Lublin wrote: >> >> That is true, but in the case I mentioned above it would take the >> management tool some time (guest down time) to realize what happens, >> and to send "cont" to the SRC. With end-of-migration messages SRC >> discovers DST fails and immediately continues. >> I agree those messages add some complexity, and slow things a bit for >> the good/average case. > > It's the classic general's dilemma. If SRC waits for DST to send an > ACK, DST still doesn't know whether SRC received the ACK so it doesn't > know whether it's truly safe to continue. > > This is why migration doesn't quit SRC immediately, and leaves SRC in > the stopped state. It's because the only safe way to handle this is > with a third party that is reliable. > In the scenario above (with ACK/GO messages), SRC _does_ know that DST have failed (as it does not receive ACK). With ACK/GO messages we only need third party involvement to handle a scenario where GO does not reach DST. Without ACK/GO messages we need third party involvement for almost any state-load function failure. In other words the risk/exposure is smaller with ACK/GO messages. Since in both cases we must have a third party involvement in the worst case, and since on the good/normal case those messages slow down the migration process a bit (and complicate the code a bit), I do not mind dropping those messages. I just wanted to make sure we all understand their benefit. We can always add them later if we'll "miss" them (if we'll find out they are more useful then we think now). In any case, we need to think of a way to get the migration status on the destination. A minimum is to term_printf a message specifying that status. Maybe change 'info migration' result on destination, or add a new info command. Specifically if the guest dies on DST, a management software must be able to distinguish between a migration problem (in which case it must continue running the guest on SRC) and a successful migration followed by a guest dying (in which case it must quit the guest on SRC). Regards, Uri ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-19 13:46 ` Uri Lublin @ 2008-10-19 22:00 ` Anthony Liguori 2008-10-22 16:23 ` Uri Lublin 0 siblings, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-19 22:00 UTC (permalink / raw) To: Uri Lublin; +Cc: Avi Kivity, kvm Uri Lublin wrote: > Anthony Liguori wrote: >> Uri Lublin wrote: >>> >>> That is true, but in the case I mentioned above it would take the >>> management tool some time (guest down time) to realize what happens, >>> and to send "cont" to the SRC. With end-of-migration messages SRC >>> discovers DST fails and immediately continues. >>> I agree those messages add some complexity, and slow things a bit >>> for the good/average case. >> >> It's the classic general's dilemma. If SRC waits for DST to send an >> ACK, DST still doesn't know whether SRC received the ACK so it >> doesn't know whether it's truly safe to continue. >> >> This is why migration doesn't quit SRC immediately, and leaves SRC in >> the stopped state. It's because the only safe way to handle this is >> with a third party that is reliable. >> > > In the scenario above (with ACK/GO messages), SRC _does_ know that DST > have failed (as it does not receive ACK). With ACK/GO messages we only > need third party involvement to handle a scenario where GO does not > reach DST. Without ACK/GO messages we need third party involvement for > almost any state-load function failure. In other words the > risk/exposure is smaller with ACK/GO messages. I think this is a scenario where we have to be careful about layering in the design. The core migration protocol is a mechanism. The goal is to not implement policy. Having an exchange of ack/go messages may increase reliability but they don't do so in a perfect way. How many times you go back and forth therefore becomes a policy which is based on how important reliability is to you trading off latency. If you have a high latency network, the round trip cost of an ack/go message may introduce unwanted latency (which translates to VM downtime). Moreover, if you have a third party orchestrating everything, it's totally unnecessary downtime. This is not to say there is no place for QEMU to support policies. They should be layered in such a way that they don't burden everyone though. The idea behind using migration protocols is to help facilitate this. I think the tcp: protocol should remain a pure migration-over-tcp transport. I think there is room for implementing another migration protocol that was maybe geared toward more average users. An ack/go message may be appropriate for this. I really think it should also have a daemon associated with it that could automatically spawn QEMU instances. I've always felt the ssh: protocol should provide this but it proved less popular than I expected it to be. Anyway, my point is that if you want an ack/go message, you should encapsulate the existing protocol within another protocol (that has it's own versioning) and introduce a new transport. > Since in both cases we must have a third party involvement in the > worst case, and since on the good/normal case those messages slow down > the migration process a bit (and complicate the code a bit), I do not > mind dropping those messages. I just wanted to make sure we all > understand their benefit. We can always add them later if we'll "miss" > them (if we'll find out they are more useful then we think now). > > In any case, we need to think of a way to get the migration status on > the destination. A minimum is to term_printf a message specifying that > status. What's the use case for this? In what circumstances would you have no idea of what was happening on DST such that you'd need to get this from the SRC? The problem with the old migration code is that while there were a lot of error status, in practice, there was only one or two that would ever happen. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-19 22:00 ` Anthony Liguori @ 2008-10-22 16:23 ` Uri Lublin 0 siblings, 0 replies; 21+ messages in thread From: Uri Lublin @ 2008-10-22 16:23 UTC (permalink / raw) To: Anthony Liguori; +Cc: Avi Kivity, kvm hony Liguori wrote: >> Uri Lublin wrote: >> >> In the scenario above (with ACK/GO messages), SRC _does_ know that DST >> have failed (as it does not receive ACK). With ACK/GO messages we only >> need third party involvement to handle a scenario where GO does not >> reach DST. Without ACK/GO messages we need third party involvement for >> almost any state-load function failure. In other words the >> risk/exposure is smaller with ACK/GO messages. > > I think this is a scenario where we have to be careful about layering in > the design. The core migration protocol is a mechanism. The goal is to > not implement policy. Having an exchange of ack/go messages may > increase reliability but they don't do so in a perfect way. How many > times you go back and forth therefore becomes a policy which is based on > how important reliability is to you trading off latency. I think it's either use ACK/GO messages (one round trip) or not. > If you have a > high latency network, the round trip cost of an ack/go message may > introduce unwanted latency (which translates to VM downtime). Moreover, > if you have a third party orchestrating everything, it's totally > unnecessary downtime. Agreed. That's what I meant when I mentioned ACK/GO messages slow the migration a bit on the average/good case. > > This is not to say there is no place for QEMU to support policies. They > should be layered in such a way that they don't burden everyone though. > The idea behind using migration protocols is to help facilitate this. > > I think the tcp: protocol should remain a pure migration-over-tcp > transport. I think there is room for implementing another migration > protocol that was maybe geared toward more average users. An ack/go > message may be appropriate for this. I really think it should also have > a daemon associated with it that could automatically spawn QEMU > instances. I've always felt the ssh: protocol should provide this but > it proved less popular than I expected it to be. This is where management tools start to play. Although very elegant (getting security+compression+auto-spawning "for free"), the problem with the ssh migration protocol is the "local" params of qemu/kvm. For example if VM1 on host A use vnc port : and VM2 on host B use port :9 , then ssh-migration of VM1 from A to B would fail. > > Anyway, my point is that if you want an ack/go message, you should > encapsulate the existing protocol within another protocol (that has it's > own versioning) and introduce a new transport. OK. > >> Since in both cases we must have a third party involvement in the >> worst case, and since on the good/normal case those messages slow down >> the migration process a bit (and complicate the code a bit), I do not >> mind dropping those messages. I just wanted to make sure we all >> understand their benefit. We can always add them later if we'll "miss" >> them (if we'll find out they are more useful then we think now). >> >> In any case, we need to think of a way to get the migration status on >> the destination. A minimum is to term_printf a message specifying that >> status. > > What's the use case for this? In what circumstances would you have no > idea of what was happening on DST such that you'd need to get this from > the SRC? I meant that you need to get DST view of migration status from DST. So a management tool, or a user, would get migration status from both SRC and DST and would be able to tell if migration was successful. That too is not that important on the normal case, but may be valuable on a bad case. Regards, Uri. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 4:14 ` Anthony Liguori 2008-10-16 8:13 ` Avi Kivity @ 2008-10-16 10:52 ` Uri Lublin 1 sibling, 0 replies; 21+ messages in thread From: Uri Lublin @ 2008-10-16 10:52 UTC (permalink / raw) To: Anthony Liguori; +Cc: Avi Kivity, Uri Lublin, kvm Anthony Liguori wrote: > Uri Lublin wrote: >> Anthony Liguori wrote: > > I have already cut your text, but I don't understand the comment about > not being "full duplex". Is there a reason why migration needs to be > bidirectional? I don't think there's a fundamental reason it needs to > be and I think there are some advantages to it being unidirectional. > Avi was concerned that my patch makes QEMUFile uni-directional, so I assumed you are trying to "fix" that too. Unidirectional-only usage of QEMUFile was working. The problem was that the fd implementation registered both get_buffer and put_buffer. If you never register both non-NULL get_buffer and put_buffer (unidirectional), then you do not need the is_write field. Adding the error checking is valuable. Thanks, Uri. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-12 16:55 ` Avi Kivity 2008-10-12 18:09 ` Anthony Liguori @ 2008-10-16 0:13 ` Uri Lublin 2008-10-16 4:10 ` Anthony Liguori 1 sibling, 1 reply; 21+ messages in thread From: Uri Lublin @ 2008-10-16 0:13 UTC (permalink / raw) To: Avi Kivity; +Cc: Uri Lublin, kvm, Anthony Liguori Avi Kivity wrote: > Uri Lublin wrote: >> Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but >> if both are given (non NULL) we encounter problems: >> 1. There is only one buffer and index, which may mean data corruption. >> 2. qemu_flush (which is also called by qemu_fclose) is writing >> ("flushing") >> some of the data that was read (for the reader part). >> >> Currently qemu_fopen_fd registers both get_buffer and put_buffer >> functions. >> >> This breaks migration for tcp and ssh migration protocols. >> >> The following patch fix the above by: >> 1. It makes sure that at most one of get_buffer and put_buffer is >> given to qemu_fopen_ops. >> 2. It changes qemu_fopen_fd to register only get_buffer for a reader >> and only put_buffer for a writer (adding a 'reader' parameter). >> 3. The incoming fd migration code calls qemu_fopen_fd as a reader only. >> >> > > Anthony, this is a problem with qemu-upstream so I'd like to solve it in > a way that's acceptable for upstream. > > The proposed patch is less that ideal IMO as it introduces limitations > on what you can do with a file. An alternative implementation would add > a read/write mode to the buffer, based on the last access type. When > switching from read to write, we drop the buffer, and when switching > from write to read, we flush it and then drop it. This is more complex > but results in a cleaner API. > I am not sure we are allowed to drop the buffer a on read-to-write switch, especially if there are unread bytes in it. We may need to pause the write that causes the switch, similar to pausing the read in order to flush the buffer on a write-to-read switch. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 0:13 ` Uri Lublin @ 2008-10-16 4:10 ` Anthony Liguori 2008-10-16 8:16 ` Avi Kivity 0 siblings, 1 reply; 21+ messages in thread From: Anthony Liguori @ 2008-10-16 4:10 UTC (permalink / raw) To: Uri Lublin; +Cc: Avi Kivity, Uri Lublin, kvm Uri Lublin wrote: >> >> Anthony, this is a problem with qemu-upstream so I'd like to solve it >> in a way that's acceptable for upstream. >> >> The proposed patch is less that ideal IMO as it introduces >> limitations on what you can do with a file. An alternative >> implementation would add a read/write mode to the buffer, based on >> the last access type. When switching from read to write, we drop the >> buffer, and when switching from write to read, we flush it and then >> drop it. This is more complex but results in a cleaner API. >> > > I am not sure we are allowed to drop the buffer a on read-to-write > switch, especially if there are unread bytes in it. We may need to > pause the write that causes the switch, similar to pausing the read in > order to flush the buffer on a write-to-read switch. Yeah, I tried to implement the read/write mode buffer and it ends up not to work very nicely. I don't remember the precise issue, but I hit some sort of wall that I thought was a fundamental limitation. I ended up forcing the QEMUFile to be in either read or write mode similar to your original patch without introducing a new option as an argument. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user 2008-10-16 4:10 ` Anthony Liguori @ 2008-10-16 8:16 ` Avi Kivity 0 siblings, 0 replies; 21+ messages in thread From: Avi Kivity @ 2008-10-16 8:16 UTC (permalink / raw) To: Anthony Liguori; +Cc: Uri Lublin, kvm Anthony Liguori wrote: > > Yeah, I tried to implement the read/write mode buffer and it ends up > not to work very nicely. I don't remember the precise issue, but I > hit some sort of wall that I thought was a fundamental limitation. I > ended up forcing the QEMUFile to be in either read or write mode > similar to your original patch without introducing a new option as an > argument. > It really depends on whether the file descriptor is a file or a socket. With files, the read and write streams are backed by the same storage, so they can intersect. With sockets, the read and write streams are completely independent. Everything is a file, except when it isn't. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2008-10-22 16:23 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-10-12 16:30 [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user Uri Lublin 2008-10-12 16:55 ` Avi Kivity 2008-10-12 18:09 ` Anthony Liguori 2008-10-12 18:17 ` Avi Kivity 2008-10-12 22:18 ` Anthony Liguori 2008-10-13 3:03 ` Anthony Liguori 2008-10-16 1:36 ` Uri Lublin 2008-10-16 4:14 ` Anthony Liguori 2008-10-16 8:13 ` Avi Kivity 2008-10-16 12:54 ` Anthony Liguori 2008-10-16 14:23 ` Uri Lublin 2008-10-16 14:32 ` Avi Kivity 2008-10-16 14:49 ` Uri Lublin 2008-10-17 2:47 ` Anthony Liguori 2008-10-19 13:46 ` Uri Lublin 2008-10-19 22:00 ` Anthony Liguori 2008-10-22 16:23 ` Uri Lublin 2008-10-16 10:52 ` Uri Lublin 2008-10-16 0:13 ` Uri Lublin 2008-10-16 4:10 ` Anthony Liguori 2008-10-16 8:16 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).