* [Qemu-devel] [PATCH 0/3] RDMA error handling
@ 2016-09-23 19:14 Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link Dr. David Alan Gilbert (git)
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
lp: https://bugs.launchpad.net/qemu/+bug/1545052
The RDMA code tends to hang if the destination dies
in the wrong place; this series doesn't completely fix
that, but in cases where the destination knows there's
been an error, it makes sure it tells the source and
that cleans up quickly.
If the destination just dies, then the source still hangs
and I still need to look at better ways to fix that.
Dave
Dr. David Alan Gilbert (3):
migration/rdma: Pass qemu_file errors across link
migration: Make failed migration load set file error
migration/rdma: Don't flag an error when we've been told about one
migration/rdma.c | 9 ++++++++-
migration/savevm.c | 19 ++++++++++++-------
2 files changed, 20 insertions(+), 8 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
@ 2016-09-23 19:14 ` Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error Dr. David Alan Gilbert (git)
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
If we fail for some reason (e.g. a mismatched RAMBlock)
and it's set the qemu_file error flag, pass that error back to the
peer so it can clean up rather than waiting for some higher level
progress.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/rdma.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/migration/rdma.c b/migration/rdma.c
index 88bdb64..7271292 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2804,6 +2804,9 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
trace_qemu_rdma_close();
if (rioc->rdma) {
+ if (!rioc->rdma->error_state) {
+ rioc->rdma->error_state = qemu_file_get_error(rioc->file);
+ }
qemu_rdma_cleanup(rioc->rdma);
g_free(rioc->rdma);
rioc->rdma = NULL;
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link Dr. David Alan Gilbert (git)
@ 2016-09-23 19:14 ` Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one Dr. David Alan Gilbert (git)
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
If an error occurs in a section load, set the file error flag
so that the transport can get notified to do a cleanup.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/savevm.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 33a2911..a831ec2 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1828,40 +1828,45 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
{
uint8_t section_type;
- int ret;
+ int ret = 0;
while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-
+ ret = 0;
trace_qemu_loadvm_state_section(section_type);
switch (section_type) {
case QEMU_VM_SECTION_START:
case QEMU_VM_SECTION_FULL:
ret = qemu_loadvm_section_start_full(f, mis);
if (ret < 0) {
- return ret;
+ goto out;
}
break;
case QEMU_VM_SECTION_PART:
case QEMU_VM_SECTION_END:
ret = qemu_loadvm_section_part_end(f, mis);
if (ret < 0) {
- return ret;
+ goto out;
}
break;
case QEMU_VM_COMMAND:
ret = loadvm_process_command(f);
trace_qemu_loadvm_state_section_command(ret);
if ((ret < 0) || (ret & LOADVM_QUIT)) {
- return ret;
+ goto out;
}
break;
default:
error_report("Unknown savevm section type %d", section_type);
- return -EINVAL;
+ ret = -EINVAL;
+ goto out;
}
}
- return 0;
+out:
+ if (ret < 0) {
+ qemu_file_set_error(f, ret);
+ }
+ return ret;
}
int qemu_loadvm_state(QEMUFile *f)
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error Dr. David Alan Gilbert (git)
@ 2016-09-23 19:14 ` Dr. David Alan Gilbert (git)
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
2016-10-05 8:58 ` Juan Quintela
4 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-09-23 19:14 UTC (permalink / raw)
To: qemu-devel, amit.shah, quintela, michael
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
If the other side tells us there's been an error and we fail
the migration, we don't need to signal that failure to the other
side because it already knew.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/rdma.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 7271292..674ccab 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -350,6 +350,7 @@ typedef struct RDMAContext {
*/
int error_state;
int error_reported;
+ int received_error;
/*
* Description of ram blocks used throughout the code.
@@ -1676,6 +1677,9 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
", but got: %s (%d), length: %d",
control_desc[expecting], expecting,
control_desc[head->type], head->type, head->len);
+ if (head->type == RDMA_CONTROL_ERROR) {
+ rdma->received_error = true;
+ }
return -EIO;
}
if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
@@ -2202,7 +2206,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
int ret, idx;
if (rdma->cm_id && rdma->connected) {
- if (rdma->error_state) {
+ if (rdma->error_state && !rdma->received_error) {
RDMAControlHeader head = { .len = 0,
.type = RDMA_CONTROL_ERROR,
.repeat = 1,
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
` (2 preceding siblings ...)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one Dr. David Alan Gilbert (git)
@ 2016-09-23 19:22 ` Michael R. Hines
2016-09-26 7:49 ` Dr. David Alan Gilbert
2016-10-05 8:58 ` Juan Quintela
4 siblings, 1 reply; 8+ messages in thread
From: Michael R. Hines @ 2016-09-23 19:22 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git), qemu-devel, amit.shah, quintela,
michael
Reviewed-by: Michael R. Hines <michael@hinespot.com>
(By the way, I no longer work for IBM and no longer have direct access to RDMA hardware. If someone is willing to let me login to something that does in the future, I don't mind debugging things. I just don't have any hardware of my own anymore to debug, and the last time I tried to use software RDMA it was an unpleasurable experience.)
/*
* Michael R. Hines
* Senior Engineer, DigitalOcean.
*/
On 09/23/2016 02:14 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> lp: https://bugs.launchpad.net/qemu/+bug/1545052
>
> The RDMA code tends to hang if the destination dies
> in the wrong place; this series doesn't completely fix
> that, but in cases where the destination knows there's
> been an error, it makes sure it tells the source and
> that cleans up quickly.
> If the destination just dies, then the source still hangs
> and I still need to look at better ways to fix that.
>
> Dave
>
> Dr. David Alan Gilbert (3):
> migration/rdma: Pass qemu_file errors across link
> migration: Make failed migration load set file error
> migration/rdma: Don't flag an error when we've been told about one
>
> migration/rdma.c | 9 ++++++++-
> migration/savevm.c | 19 ++++++++++++-------
> 2 files changed, 20 insertions(+), 8 deletions(-)
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
@ 2016-09-26 7:49 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2016-09-26 7:49 UTC (permalink / raw)
To: Michael R. Hines; +Cc: qemu-devel, amit.shah, quintela, michael
* Michael R. Hines (mrhines@digitalocean.com) wrote:
> Reviewed-by: Michael R. Hines <michael@hinespot.com>
>
> (By the way, I no longer work for IBM and no longer have direct access to RDMA hardware. If someone is willing to let me login to something that does in the future, I don't mind debugging things. I just don't have any hardware of my own anymore to debug, and the last time I tried to use software RDMA it was an unpleasurable experience.)
Thanks; I did hear a rumour that SoftRoCE was going to go upstream, but
it doesn't seem to have happened yet.
Dave
>
> /*
> * Michael R. Hines
> * Senior Engineer, DigitalOcean.
> */
>
> On 09/23/2016 02:14 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > lp: https://bugs.launchpad.net/qemu/+bug/1545052
> >
> > The RDMA code tends to hang if the destination dies
> > in the wrong place; this series doesn't completely fix
> > that, but in cases where the destination knows there's
> > been an error, it makes sure it tells the source and
> > that cleans up quickly.
> > If the destination just dies, then the source still hangs
> > and I still need to look at better ways to fix that.
> >
> > Dave
> >
> > Dr. David Alan Gilbert (3):
> > migration/rdma: Pass qemu_file errors across link
> > migration: Make failed migration load set file error
> > migration/rdma: Don't flag an error when we've been told about one
> >
> > migration/rdma.c | 9 ++++++++-
> > migration/savevm.c | 19 ++++++++++++-------
> > 2 files changed, 20 insertions(+), 8 deletions(-)
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
` (3 preceding siblings ...)
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
@ 2016-10-05 8:58 ` Juan Quintela
2016-10-05 9:07 ` Dr. David Alan Gilbert
4 siblings, 1 reply; 8+ messages in thread
From: Juan Quintela @ 2016-10-05 8:58 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, amit.shah, michael
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> lp: https://bugs.launchpad.net/qemu/+bug/1545052
>
> The RDMA code tends to hang if the destination dies
> in the wrong place; this series doesn't completely fix
> that, but in cases where the destination knows there's
> been an error, it makes sure it tells the source and
> that cleans up quickly.
> If the destination just dies, then the source still hangs
> and I still need to look at better ways to fix that.
>
> Dave
>
> Dr. David Alan Gilbert (3):
> migration/rdma: Pass qemu_file errors across link
> migration: Make failed migration load set file error
> migration/rdma: Don't flag an error when we've been told about one
>
> migration/rdma.c | 9 ++++++++-
> migration/savevm.c | 19 ++++++++++++-------
> 2 files changed, 20 insertions(+), 8 deletions(-)
Reviewed-by: Juan Quintela <quintela@redhat.com>
and applied.
BTW, second patch is not RDMA error handling O:-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] RDMA error handling
2016-10-05 8:58 ` Juan Quintela
@ 2016-10-05 9:07 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2016-10-05 9:07 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, amit.shah, michael
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > lp: https://bugs.launchpad.net/qemu/+bug/1545052
> >
> > The RDMA code tends to hang if the destination dies
> > in the wrong place; this series doesn't completely fix
> > that, but in cases where the destination knows there's
> > been an error, it makes sure it tells the source and
> > that cleans up quickly.
> > If the destination just dies, then the source still hangs
> > and I still need to look at better ways to fix that.
> >
> > Dave
> >
> > Dr. David Alan Gilbert (3):
> > migration/rdma: Pass qemu_file errors across link
> > migration: Make failed migration load set file error
> > migration/rdma: Don't flag an error when we've been told about one
> >
> > migration/rdma.c | 9 ++++++++-
> > migration/savevm.c | 19 ++++++++++++-------
> > 2 files changed, 20 insertions(+), 8 deletions(-)
>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> and applied.
>
Thanks.
> BTW, second patch is not RDMA error handling O:-)
Ah, but it's there to cause the first patch to trigger.
It's not needed in the case of any of the other transports.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-10-05 9:07 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-23 19:14 [Qemu-devel] [PATCH 0/3] RDMA error handling Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 1/3] migration/rdma: Pass qemu_file errors across link Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 2/3] migration: Make failed migration load set file error Dr. David Alan Gilbert (git)
2016-09-23 19:14 ` [Qemu-devel] [PATCH 3/3] migration/rdma: Don't flag an error when we've been told about one Dr. David Alan Gilbert (git)
2016-09-23 19:22 ` [Qemu-devel] [PATCH 0/3] RDMA error handling Michael R. Hines
2016-09-26 7:49 ` Dr. David Alan Gilbert
2016-10-05 8:58 ` Juan Quintela
2016-10-05 9:07 ` Dr. David Alan Gilbert
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).