From: Peter Xu <peterx@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Juan Quintela <quintela@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v3 03/10] migration: Refactor error handling in source return path
Date: Thu, 5 Oct 2023 12:05:49 -0400
Message-ID: <ZR7e3cmxCH9LAdnS@x1n>
In-Reply-To: <f60f3687-d778-3d8e-41f6-39fbcad8f0f8@linaro.org>

On Thu, Oct 05, 2023 at 08:11:33AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Peter,
> 
> On 5/10/23 00:02, Peter Xu wrote:
> > rp_state.error was a boolean used to show error happened in return path
> > thread.  That's not only duplicating error reporting (migrate_set_error),
> > but also not good enough in that we only do error_report() and set it to
> > true, we never can keep a history of the exact error and show it in
> > query-migrate.
> > 
> > To make this better, a few things done:
> > 
> >    - Use error_setg() rather than error_report() across the whole lifecycle
> >      of return path thread, keeping the error in an Error*.
> > 
> >    - Use migrate_set_error() to apply that captured error to the global
> >      migration object when an error occurred in this thread.
> > 
> >    - With above, no need to have mark_source_rp_bad(), remove it, alongside
> >      with rp_state.error itself.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   migration/migration.h  |   1 -
> >   migration/ram.h        |   5 +-
> >   migration/migration.c  | 123 ++++++++++++++++++-----------------------
> >   migration/ram.c        |  41 +++++++-------
> >   migration/trace-events |   4 +-
> >   5 files changed, 79 insertions(+), 95 deletions(-)
> 
> 
> > -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> > +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
> >   {
> >       int ret = -EINVAL;
> >       /* from_dst_file is always valid because we're within rp_thread */
> 
> 
> > @@ -4193,16 +4194,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> >       ret = qemu_file_get_error(file);
> >       if (ret || size != local_size) {
> > -        error_report("%s: read bitmap failed for ramblock '%s': %d"
> > -                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
> > -                     __func__, block->idstr, ret, local_size, size);
> > +        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
> > +                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
> > +                   block->idstr, ret, local_size, size);
> >           ret = -EIO;
> >           goto out;
> >       }
> >       if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
> > -        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
> > -                     __func__, block->idstr, end_mark);
> > +        error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
> > +                   block->idstr, end_mark);
> >           ret = -EINVAL;
> >           goto out;
> >       }
> 
> This function returns -EIO/-EINVAL errors, propagated to its 2 callers
>  - migrate_handle_rp_recv_bitmap()
>  - migrate_handle_rp_resume_ack()

It is only called from migrate_handle_rp_recv_bitmap(), but I think I see
what you meant.

> which are only used in source_return_path_thread() where the return
> value is only checked as boolean.
> 
> Could we simplify them returning a boolean (which is the pattern with
> functions taking an Error** as last parameter)?

Yes, with errp passed in, the "int" return code is largely redundant.  I can
add one more patch on top of this as further cleanup, as below.
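
For readers unfamiliar with the convention, here is a minimal standalone
sketch of the "return bool, report details through the error pointer"
pattern.  It is not QEMU code: DemoError, demo_error_setg(),
demo_handle_resume_ack(), demo_dispatch() and DEMO_RESUME_ACK_VALUE are
hypothetical stand-ins invented for illustration, loosely modelled on
Error, error_setg() and migrate_handle_rp_resume_ack().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object (not the real type). */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Hypothetical stand-in for error_setg(): allocate and fill *errp. */
void demo_error_setg(DemoError **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller is not interested in details */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Assumed value, chosen only for this sketch. */
#define DEMO_RESUME_ACK_VALUE 1u

/* Returns true on success; on failure returns false and sets *errp. */
bool demo_handle_resume_ack(uint32_t value, DemoError **errp)
{
    if (value != DEMO_RESUME_ACK_VALUE) {
        demo_error_setg(errp, "illegal resume_ack value %" PRIu32, value);
        return false;
    }
    return true;
}

/*
 * The caller tests only the boolean.  The human-readable reason travels
 * in the error object, so no errno-style code is needed.
 */
bool demo_dispatch(uint32_t value, char *out, size_t out_len)
{
    DemoError *err = NULL;

    if (!demo_handle_resume_ack(value, &err)) {
        snprintf(out, out_len, "%s", err->msg);
        free(err);
        return false;
    }
    out[0] = '\0';
    return true;
}
```

With this shape the call site reads "if (!handler(..., &err)) goto out;",
which matches what the callers in source_return_path_thread() end up
looking like in the patch that follows.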

Thanks,

===8<===
From b1052befd72beb129012afddf5647339fe4e257c Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 5 Oct 2023 12:03:44 -0400
Subject: [PATCH] migration: Change ram_dirty_bitmap_reload() retval to bool
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that we have an Error ** passed through the return path thread stack,
which is even clearer than an int retval, change ram_dirty_bitmap_reload()
and its callers to return a bool instead of errnos.

Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.h       |  2 +-
 migration/migration.c | 18 +++++++++---------
 migration/ram.c       | 24 +++++++++++-------------
 3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index 14ed666d58..af0290f8ab 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -72,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
+bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 1a7f214fcf..e7375810be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1837,29 +1837,29 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     ram_save_queue_pages(rbname, start, len, errp);
 }
 
-static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
-                                         Error **errp)
+static bool migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
+                                          Error **errp)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
 
     if (!block) {
         error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
                    block_name);
-        return -EINVAL;
+        return false;
     }
 
     /* Fetch the received bitmap and refresh the dirty bitmap */
     return ram_dirty_bitmap_reload(s, block, errp);
 }
 
-static int migrate_handle_rp_resume_ack(MigrationState *s,
-                                        uint32_t value, Error **errp)
+static bool migrate_handle_rp_resume_ack(MigrationState *s,
+                                         uint32_t value, Error **errp)
 {
     trace_source_return_path_thread_resume_ack(value);
 
     if (value != MIGRATION_RESUME_ACK_VALUE) {
         error_setg(errp, "illegal resume_ack value %"PRIu32, value);
-        return -1;
+        return false;
     }
 
     /* Now both sides are active. */
@@ -1869,7 +1869,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s,
     /* Notify send thread that time to continue send pages */
     migration_rp_kick(s);
 
-    return 0;
+    return true;
 }
 
 /*
@@ -2021,14 +2021,14 @@ static void *source_return_path_thread(void *opaque)
             }
             /* Format: len (1B) + idstr (<255B). This ends the idstr. */
             buf[buf[0] + 1] = '\0';
-            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
+            if (!migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
                 goto out;
             }
             break;
 
         case MIG_RP_MSG_RESUME_ACK:
             tmp32 = ldl_be_p(buf);
-            if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
+            if (!migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
                 goto out;
             }
             break;
diff --git a/migration/ram.c b/migration/ram.c
index 2565f53f5c..982fbbeee1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4157,23 +4157,25 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
  * to resume from a middle point.
+ *
+ * Returns true if succeeded, false for errors.
  */
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
+bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
 {
-    int ret = -EINVAL;
     /* from_dst_file is always valid because we're within rp_thread */
     QEMUFile *file = s->rp_state.from_dst_file;
     unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
     uint64_t local_size = DIV_ROUND_UP(nbits, 8);
     uint64_t size, end_mark;
     RAMState *rs = ram_state;
+    bool result = false;
 
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
     if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
         error_setg(errp, "Reload bitmap in incorrect state %s",
                    MigrationStatus_str(s->state));
-        return -EINVAL;
+        return false;
     }
 
     /*
@@ -4191,26 +4193,22 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     if (size != local_size) {
         error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64
                    " != 0x%"PRIx64")", block->idstr, size, local_size);
-        ret = -EINVAL;
         goto out;
     }
 
     size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
     end_mark = qemu_get_be64(file);
 
-    ret = qemu_file_get_error(file);
-    if (ret || size != local_size) {
-        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
-                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
-                   block->idstr, ret, local_size, size);
-        ret = -EIO;
+    if (qemu_file_get_error(file) || size != local_size) {
+        error_setg(errp, "read bitmap failed for ramblock '%s': "
+                   "(size 0x%"PRIx64", got: 0x%"PRIx64")",
+                   block->idstr, local_size, size);
         goto out;
     }
 
     if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
         error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
                    block->idstr, end_mark);
-        ret = -EINVAL;
         goto out;
     }
 
@@ -4243,10 +4241,10 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
      */
     migration_rp_kick(s);
 
-    ret = 0;
+    result = true;
 out:
     g_free(le_bitmap);
-    return ret;
+    return result;
 }
 
 static int ram_resume_prepare(MigrationState *s, void *opaque)
-- 
2.41.0


-- 
Peter Xu


