qemu-devel.nongnu.org archive mirror
From: "Philippe Mathieu-Daudé" <philmd@linaro.org>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Juan Quintela <quintela@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v3 03/10] migration: Refactor error handling in source return path
Date: Sun, 8 Oct 2023 13:39:16 +0200
Message-ID: <bb34f0ec-06b4-c635-dce9-385c3b87c57e@linaro.org>
In-Reply-To: <ZR7e3cmxCH9LAdnS@x1n>

On 5/10/23 18:05, Peter Xu wrote:
> On Thu, Oct 05, 2023 at 08:11:33AM +0200, Philippe Mathieu-Daudé wrote:
>> Hi Peter,
>>
>> On 5/10/23 00:02, Peter Xu wrote:
>>> rp_state.error was a boolean used to show that an error happened in the
>>> return path thread.  That not only duplicates error reporting
>>> (migrate_set_error), but is also not good enough: we only call
>>> error_report() and set it to true, so we can never keep a history of the
>>> exact error and show it in query-migrate.
>>>
>>> To make this better, a few things are done:
>>>
>>>     - Use error_setg() rather than error_report() across the whole lifecycle
>>>       of return path thread, keeping the error in an Error*.
>>>
>>>     - Use migrate_set_error() to apply that captured error to the global
>>>       migration object when an error occurred in this thread.
>>>
>>>     - With the above, mark_source_rp_bad() is no longer needed; remove it,
>>>       along with rp_state.error itself.
>>>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>    migration/migration.h  |   1 -
>>>    migration/ram.h        |   5 +-
>>>    migration/migration.c  | 123 ++++++++++++++++++-----------------------
>>>    migration/ram.c        |  41 +++++++-------
>>>    migration/trace-events |   4 +-
>>>    5 files changed, 79 insertions(+), 95 deletions(-)
>>
>>
>>> -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
>>> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
>>>    {
>>>        int ret = -EINVAL;
>>>        /* from_dst_file is always valid because we're within rp_thread */
>>
>>
>>> @@ -4193,16 +4194,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
>>>        ret = qemu_file_get_error(file);
>>>        if (ret || size != local_size) {
>>> -        error_report("%s: read bitmap failed for ramblock '%s': %d"
>>> -                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
>>> -                     __func__, block->idstr, ret, local_size, size);
>>> +        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
>>> +                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
>>> +                   block->idstr, ret, local_size, size);
>>>            ret = -EIO;
>>>            goto out;
>>>        }
>>>        if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
>>> -        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
>>> -                     __func__, block->idstr, end_mark);
>>> +        error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
>>> +                   block->idstr, end_mark);
>>>            ret = -EINVAL;
>>>            goto out;
>>>        }
>>
>> This function returns -EIO/-EINVAL errors, propagated to its 2 callers
>>   - migrate_handle_rp_recv_bitmap()
>>   - migrate_handle_rp_resume_ack()
> 
> It is only called in migrate_handle_rp_recv_bitmap(), but I think I see
> what you meant.
> 
>> which are only used in source_return_path_thread() where the return
>> value is only checked as boolean.
>>
>> Could we simplify them by returning a boolean (which is the pattern for
>> functions taking an Error** as last parameter)?
> 
> Yes, with errp passed in, the "int" retcode is slightly redundant.  I can
> add one more patch on top of this as further cleanup, as below.
> 
> Thanks,
> 
> ===8<===
>  From b1052befd72beb129012afddf5647339fe4e257c Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Thu, 5 Oct 2023 12:03:44 -0400
> Subject: [PATCH] migration: Change ram_dirty_bitmap_reload() retval to bool
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Now we have an Error** passed down the return path thread stack, which is
> even clearer than an int retval.  Change ram_dirty_bitmap_reload() and its
> callers to return a bool instead of errnos.
> 
> Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   migration/ram.h       |  2 +-
>   migration/migration.c | 18 +++++++++---------
>   migration/ram.c       | 24 +++++++++++-------------
>   3 files changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/migration/ram.h b/migration/ram.h
> index 14ed666d58..af0290f8ab 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -72,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
>   void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
>   int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>                                     const char *block_name);
> -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
> +bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
>   bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
>   void postcopy_preempt_shutdown_file(MigrationState *s);
>   void *postcopy_preempt_thread(void *opaque);
> diff --git a/migration/migration.c b/migration/migration.c
> index 1a7f214fcf..e7375810be 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1837,29 +1837,29 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>       ram_save_queue_pages(rbname, start, len, errp);
>   }
>   
> -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
> -                                         Error **errp)
> +static bool migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
> +                                          Error **errp)
>   {
>       RAMBlock *block = qemu_ram_block_by_name(block_name);
>   
>       if (!block) {
>           error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
>                      block_name);
> -        return -EINVAL;
> +        return false;
>       }
>   
>       /* Fetch the received bitmap and refresh the dirty bitmap */
>       return ram_dirty_bitmap_reload(s, block, errp);
>   }
>   
> -static int migrate_handle_rp_resume_ack(MigrationState *s,
> -                                        uint32_t value, Error **errp)
> +static bool migrate_handle_rp_resume_ack(MigrationState *s,
> +                                         uint32_t value, Error **errp)
>   {
>       trace_source_return_path_thread_resume_ack(value);
>   
>       if (value != MIGRATION_RESUME_ACK_VALUE) {
>           error_setg(errp, "illegal resume_ack value %"PRIu32, value);
> -        return -1;
> +        return false;
>       }
>   
>       /* Now both sides are active. */
> @@ -1869,7 +1869,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s,
>       /* Notify send thread that time to continue send pages */
>       migration_rp_kick(s);
>   
> -    return 0;
> +    return true;
>   }
>   
>   /*
> @@ -2021,14 +2021,14 @@ static void *source_return_path_thread(void *opaque)
>               }
>               /* Format: len (1B) + idstr (<255B). This ends the idstr. */
>               buf[buf[0] + 1] = '\0';
> -            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
> +            if (!migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
>                   goto out;
>               }
>               break;
>   
>           case MIG_RP_MSG_RESUME_ACK:
>               tmp32 = ldl_be_p(buf);
> -            if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
> +            if (!migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
>                   goto out;
>               }
>               break;
> diff --git a/migration/ram.c b/migration/ram.c
> index 2565f53f5c..982fbbeee1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4157,23 +4157,25 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
>    * Read the received bitmap, revert it as the initial dirty bitmap.
>    * This is only used when the postcopy migration is paused but wants
>    * to resume from a middle point.
> + *
> + * Returns true if succeeded, false for errors.
>    */
> -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
> +bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
>   {
> -    int ret = -EINVAL;
>       /* from_dst_file is always valid because we're within rp_thread */
>       QEMUFile *file = s->rp_state.from_dst_file;
>       unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
>       uint64_t local_size = DIV_ROUND_UP(nbits, 8);
>       uint64_t size, end_mark;
>       RAMState *rs = ram_state;
> +    bool result = false;
>   
>       trace_ram_dirty_bitmap_reload_begin(block->idstr);
>   
>       if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
>           error_setg(errp, "Reload bitmap in incorrect state %s",
>                      MigrationStatus_str(s->state));
> -        return -EINVAL;
> +        return false;
>       }
>   
>       /*
> @@ -4191,26 +4193,22 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
>       if (size != local_size) {
>           error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64
>                      " != 0x%"PRIx64")", block->idstr, size, local_size);
> -        ret = -EINVAL;
>           goto out;
>       }
>   
>       size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
>       end_mark = qemu_get_be64(file);
>   
> -    ret = qemu_file_get_error(file);
> -    if (ret || size != local_size) {
> -        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
> -                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
> -                   block->idstr, ret, local_size, size);
> -        ret = -EIO;
> +    if (qemu_file_get_error(file) || size != local_size) {
> +        error_setg(errp, "read bitmap failed for ramblock '%s': "
> +                   "(size 0x%"PRIx64", got: 0x%"PRIx64")",
> +                   block->idstr, local_size, size);
>           goto out;
>       }
>   
>       if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
>           error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
>                      block->idstr, end_mark);
> -        ret = -EINVAL;
>           goto out;
>       }
>   
> @@ -4243,10 +4241,10 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
>        */
>       migration_rp_kick(s);
>   
> -    ret = 0;
> +    result = true;
>   out:
>       g_free(le_bitmap);
> -    return ret;
> +    return result;
>   }
>   
>   static int ram_resume_prepare(MigrationState *s, void *opaque)

Yes, exactly what I meant. For the embedded patch:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

A further step would be to use g_autofree for le_bitmap, which removes the
need for this annoying 'out' label. I'll send the patch.
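
To illustrate the idea, here is a rough, untested sketch. It is not the
real ram.c code: load_bitmap(), the autofree macro and the size check are
made-up stand-ins, and the error is written to a plain buffer instead of an
Error **. The macro uses the GCC/Clang cleanup attribute directly (the
mechanism g_autofree relies on) so that the snippet builds without GLib.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for g_autofree: free the pointer on scope exit. */
static void autofree_cleanup(void *p)
{
    free(*(void **)p);
}
#define autofree __attribute__((cleanup(autofree_cleanup)))

/*
 * Simplified, hypothetical shape of a bitmap reload helper.  Every
 * failure path returns directly; no 'out' label and no explicit free.
 */
static bool load_bitmap(size_t expected, size_t received,
                        char *errbuf, size_t errlen)
{
    autofree uint8_t *bitmap = calloc(1, expected ? expected : 1);

    if (!bitmap) {
        snprintf(errbuf, errlen, "out of memory");
        return false;
    }

    if (received != expected) {
        snprintf(errbuf, errlen, "bitmap size mismatch (0x%zx != 0x%zx)",
                 received, expected);
        return false;           /* bitmap is released automatically */
    }

    /* ... consume the bitmap here ... */

    return true;                /* released here as well */
}
```

With that, the 'result' local and the trailing cleanup block go away, and
callers keep testing the bool exactly as in the patch above.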

