* [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
@ 2016-09-23 19:14 ` Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error Dr. David Alan Gilbert (git)
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
If we fail for some reason (e.g. a mismatched RAMBlock)
and it's set the qemu_file error flag, pass that error back to the
peer so it can clean up rather than waiting for some higher level
progress.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/rdma.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/migration/rdma.c b/migration/rdma.c
index 88bdb64..7271292 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2804,6 +2804,9 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
trace_qemu_rdma_close();
if (rioc->rdma) {
+ if (!rioc->rdma->error_state) {
+ rioc->rdma->error_state = qemu_file_get_error(rioc->file);
+ }
qemu_rdma_cleanup(rioc->rdma);
g_free(rioc->rdma);
rioc->rdma = NULL;
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link Dr. David Alan Gilbert (git)
@ 2016-09-23 19:14 ` Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one Dr. David Alan Gilbert (git)
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
If an error occurs in a section load, set the file error flag
so that the transport can get notified to do a cleanup.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/savevm.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 33a2911..a831ec2 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1828,40 +1828,45 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
{
uint8_t section_type;
- int ret;
+ int ret = 0;
while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-
+ ret = 0;
trace_qemu_loadvm_state_section(section_type);
switch (section_type) {
case QEMU_VM_SECTION_START:
case QEMU_VM_SECTION_FULL:
ret = qemu_loadvm_section_start_full(f, mis);
if (ret < 0) {
- return ret;
+ goto out;
}
break;
case QEMU_VM_SECTION_PART:
case QEMU_VM_SECTION_END:
ret = qemu_loadvm_section_part_end(f, mis);
if (ret < 0) {
- return ret;
+ goto out;
}
break;
case QEMU_VM_COMMAND:
ret = loadvm_process_command(f);
trace_qemu_loadvm_state_section_command(ret);
if ((ret < 0) || (ret & LOADVM_QUIT)) {
- return ret;
+ goto out;
}
break;
default:
error_report("Unknown savevm section type %d", section_type);
- return -EINVAL;
+ ret = -EINVAL;
+ goto out;
}
}
- return 0;
+out:
+ if (ret < 0) {
+ qemu_file_set_error(f, ret);
+ }
+ return ret;
}
int qemu_loadvm_state(QEMUFile *f)
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error Dr. David Alan Gilbert (git)
@ 2016-09-23 19:14 ` Dr. David Alan Gilbert (git)
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
2016-10-05 8:58 ` Juan Quintela
4 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
If the other side tells us there's been an error and we fail
the migration, we don't need to signal that failure to the other
side because it already knew.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/rdma.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 7271292..674ccab 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -350,6 +350,7 @@ typedef struct RDMAContext {
*/
int error_state;
int error_reported;
+ int received_error;
/*
* Description of ram blocks used throughout the code.
@@ -1676,6 +1677,9 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
", but got: %s (%d), length: %d",
control_desc[expecting], expecting,
control_desc[head->type], head->type, head->len);
+ if (head->type == RDMA_CONTROL_ERROR) {
+ rdma->received_error = true;
+ }
return -EIO;
}
if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
@@ -2202,7 +2206,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
int ret, idx;
if (rdma->cm_id && rdma->connected) {
- if (rdma->error_state) {
+ if (rdma->error_state && !rdma->received_error) {
RDMAControlHeader head = { .len = 0,
.type = RDMA_CONTROL_ERROR,
.repeat = 1,
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
` (2 preceding siblings ...)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one Dr. David Alan Gilbert (git)
@ 2016-09-23 19:22 ` Michael R. Hines
2016-09-26 7:49 ` Dr. David Alan Gilbert
2016-10-05 8:58 ` Juan Quintela
4 siblings, 1 reply; 8+ messages in thread
From: Michael R. Hines @ 2016-09-23 19:22 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git), qemu-devel, amit.shah, quintela,
michael
Reviewed-by: Michael R. Hines <michael@hinespot.com>
(By the way, I no longer work for IBM and no longer have direct access to RDMA hardware. If someone is willing to let me login to something that does in the future, I don't mind debugging things. I just don't have any hardware of my own anymore to debug, and the last time I tried to use software RDMA it was an unpleasurable experience.)
/*
* Michael R. Hines
* Senior Engineer, DigitalOcean.
*/
On 09/23/2016 02:14 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> lp: https://bugs.launchpad.net/qemu/+bug/1545052
>
> The RDMA code tends to hang if the destination dies
> in the wrong place; this series doesn't completely fix
> that, but in cases where the destination knows there's
> been an error, it makes sure it tells the source and
> that cleans up quickly.
> If the destination just dies, then the source still hangs
> and I still need to look at better ways to fix that.
>
> Dave
>
> Dr. David Alan Gilbert (3):
> migration/rdma: Pass qemu_file errors across link
> migration: Make failed migration load set file error
> migration/rdma: Don't flag an error when we've been told about one
>
> migration/rdma.c | 9 ++++++++-
> migration/savevm.c | 19 ++++++++++++-------
> 2 files changed, 20 insertions(+), 8 deletions(-)
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
@ 2016-09-26 7:49 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2016-09-26 7:49 UTC (permalink / raw)
To: Michael R. Hines; +Cc: qemu-devel, amit.shah, quintela, michael
* Michael R. Hines (mrhines@digitalocean.com) wrote:
> Reviewed-by: Michael R. Hines <michael@hinespot.com>
>
> (By the way, I no longer work for IBM and no longer have direct access to RDMA hardware. If someone is willing to let me login to something that does in the future, I don't mind debugging things. I just don't have any hardware of my own anymore to debug, and the last time I tried to use software RDMA it was an unpleasurable experience.)
Thanks; I did hear a rumour that SoftRoCE was going to go upstream, but
it doesn't seem to have happened yet.
Dave
>
> /*
> * Michael R. Hines
> * Senior Engineer, DigitalOcean.
> */
>
> On 09/23/2016 02:14 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > lp: https://bugs.launchpad.net/qemu/+bug/1545052
> >
> > The RDMA code tends to hang if the destination dies
> > in the wrong place; this series doesn't completely fix
> > that, but in cases where the destination knows there's
> > been an error, it makes sure it tells the source and
> > that cleans up quickly.
> > If the destination just dies, then the source still hangs
> > and I still need to look at better ways to fix that.
> >
> > Dave
> >
> > Dr. David Alan Gilbert (3):
> > migration/rdma: Pass qemu_file errors across link
> > migration: Make failed migration load set file error
> > migration/rdma: Don't flag an error when we've been told about one
> >
> > migration/rdma.c | 9 ++++++++-
> > migration/savevm.c | 19 ++++++++++++-------
> > 2 files changed, 20 insertions(+), 8 deletions(-)
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
` (3 preceding siblings ...)
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
@ 2016-10-05 8:58 ` Juan Quintela
2016-10-05 9:07 ` Dr. David Alan Gilbert
4 siblings, 1 reply; 8+ messages in thread
From: Juan Quintela @ 2016-10-05 8:58 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, amit.shah, michael
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> lp: https://bugs.launchpad.net/qemu/+bug/1545052
>
> The RDMA code tends to hang if the destination dies
> in the wrong place; this series doesn't completely fix
> that, but in cases where the destination knows there's
> been an error, it makes sure it tells the source and
> that cleans up quickly.
> If the destination just dies, then the source still hangs
> and I still need to look at better ways to fix that.
>
> Dave
>
> Dr. David Alan Gilbert (3):
> migration/rdma: Pass qemu_file errors across link
> migration: Make failed migration load set file error
> migration/rdma: Don't flag an error when we've been told about one
>
> migration/rdma.c | 9 ++++++++-
> migration/savevm.c | 19 ++++++++++++-------
> 2 files changed, 20 insertions(+), 8 deletions(-)
Reviewed-by: Juan Quintela <quintela@redhat.com>
and applied.
BTW, second patch is not RDMA error handling O:-)
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-10-05 8:58 ` Juan Quintela
@ 2016-10-05 9:07 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2016-10-05 9:07 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, amit.shah, michael
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > lp: https://bugs.launchpad.net/qemu/+bug/1545052
> >
> > The RDMA code tends to hang if the destination dies
> > in the wrong place; this series doesn't completely fix
> > that, but in cases where the destination knows there's
> > been an error, it makes sure it tells the source and
> > that cleans up quickly.
> > If the destination just dies, then the source still hangs
> > and I still need to look at better ways to fix that.
> >
> > Dave
> >
> > Dr. David Alan Gilbert (3):
> > migration/rdma: Pass qemu_file errors across link
> > migration: Make failed migration load set file error
> > migration/rdma: Don't flag an error when we've been told about one
> >
> > migration/rdma.c | 9 ++++++++-
> > migration/savevm.c | 19 ++++++++++++-------
> > 2 files changed, 20 insertions(+), 8 deletions(-)
>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> and applied.
>
Thanks.
> BTW, second patch is not RDMA error handling O:-)
Ah, but it's there to cause the first patch to trigger.
It's not needed in the case of any of the other transports.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 8+ messages in thread