From: Juan Quintela <quintela@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>
Subject: Re: [PATCH v3 03/10] migration: Refactor error handling in source return path
Date: Thu, 05 Oct 2023 10:22:52 +0200 [thread overview]
Message-ID: <87wmw1o4v7.fsf@secure.mitica> (raw)
In-Reply-To: <20231004220240.167175-4-peterx@redhat.com> (Peter Xu's message of "Wed, 4 Oct 2023 18:02:33 -0400")
Peter Xu <peterx@redhat.com> wrote:
> rp_state.error was a boolean used to show error happened in return path
> thread. That's not only duplicating error reporting (migrate_set_error),
> but also not good enough in that we only do error_report() and set it to
> true, we never can keep a history of the exact error and show it in
> query-migrate.
>
> To make this better, a few things done:
>
> - Use error_setg() rather than error_report() across the whole lifecycle
> of return path thread, keeping the error in an Error*.
Good.
> - Use migrate_set_error() to apply that captured error to the global
> migration object when error occured in this thread.
Good.
> - With above, no need to have mark_source_rp_bad(), remove it, alongside
> with rp_state.error itself.
Good.
> uint64_t ram_pagesize_summary(void);
> -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
> +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
> + Error **errp);
good.
> @@ -1793,37 +1782,36 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
> */
> if (!QEMU_IS_ALIGNED(start, our_host_ps) ||
> !QEMU_IS_ALIGNED(len, our_host_ps)) {
> - error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
> - " len: %zd", __func__, start, len);
> - mark_source_rp_bad(ms);
> + error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:"
> + RAM_ADDR_FMT " len: %zd", start, len);
> return;
> }
>
> - if (ram_save_queue_pages(rbname, start, len)) {
> - mark_source_rp_bad(ms);
> - }
> + ram_save_queue_pages(rbname, start, len, errp);
ram_save_queue_pages() returns an int.
I think this function should return an int.
Next is independent of this patch:
> -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
> +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
> + Error **errp)
> {
> RAMBlock *block = qemu_ram_block_by_name(block_name);
>
> if (!block) {
> - error_report("%s: invalid block name '%s'", __func__, block_name);
> + error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
> + block_name);
> return -EINVAL;
We sent -EINVAL.
> }
>
> /* Fetch the received bitmap and refresh the dirty bitmap */
> - return ram_dirty_bitmap_reload(s, block);
> + return ram_dirty_bitmap_reload(s, block, errp);
> }
>
> -static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> +static int migrate_handle_rp_resume_ack(MigrationState *s,
> + uint32_t value, Error **errp)
> {
> trace_source_return_path_thread_resume_ack(value);
>
> if (value != MIGRATION_RESUME_ACK_VALUE) {
> - error_report("%s: illegal resume_ack value %"PRIu32,
> - __func__, value);
> + error_setg(errp, "illegal resume_ack value %"PRIu32, value);
> return -1;
And here -1.
On both callers we just check if it is different from zero. We never
use the return value as errno, so I think we should move to -1, if there
is an error, that is what errp is for.
> -/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
> -static int await_return_path_close_on_source(MigrationState *ms)
> +static void await_return_path_close_on_source(MigrationState *ms)
> {
> - int ret;
> -
> if (!ms->rp_state.rp_thread_created) {
> - return 0;
> + return;
> }
>
> trace_migration_return_path_end_before();
> @@ -2060,18 +2050,10 @@ static int await_return_path_close_on_source(MigrationState *ms)
> }
> }
>
> - trace_await_return_path_close_on_source_joining();
> qemu_thread_join(&ms->rp_state.rp_thread);
> ms->rp_state.rp_thread_created = false;
> - trace_await_return_path_close_on_source_close();
> -
> - ret = ms->rp_state.error;
> - ms->rp_state.error = false;
> -
> migration_release_dst_files(ms);
> -
> - trace_migration_return_path_end_after(ret);
> - return ret;
> + trace_migration_return_path_end_after();
> }
>
> static inline void
> @@ -2367,7 +2349,10 @@ static void migration_completion(MigrationState *s)
> goto fail;
> }
>
> - if (await_return_path_close_on_source(s)) {
> + await_return_path_close_on_source(s);
> +
> + /* If return path has error, should have been set here */
> + if (migrate_has_error(s)) {
> goto fail;
> }
In general, I think this is bad. We are moving for
int foo(..)
{
}
....
if (foo()) {
goto fail;
}
to:
void foo(..)
{
}
....
foo();
if (bar()) {
goto fail;
}
I would preffer to move the other way around. Move the error
synchrconously. My plan is that at some point in time
qemu_file_get_error() dissapears, i.e. we return the error when we
receive it and we handle it synchronously.
And yes, that is a something will take a lot of time, but I will hope we
move on that direction, not in trusting more setting internal errors,
use void functions and then check with yet another functions.
On top of your changes:
> -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
> {
> int ret = -EINVAL;
> /* from_dst_file is always valid because we're within rp_thread */
> @@ -4163,8 +4165,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
> trace_ram_dirty_bitmap_reload_begin(block->idstr);
>
> if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> - error_report("%s: incorrect state %s", __func__,
> - MigrationStatus_str(s->state));
> + error_setg(errp, "Reload bitmap in incorrect state %s",
> + MigrationStatus_str(s->state));
> return -EINVAL;
return -1
same for the rest of the cases. Callers only check for != 0, and if you
want details, you need to look at errp.
See the nice series for migration/rdma.c for why this is better (and
more consistent).
Rest of the patch is very nice.
Thanks, Juan.
next prev parent reply other threads:[~2023-10-05 8:24 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-04 22:02 [PATCH v3 00/10] migration: Better error handling in rp thread, allow failures in recover Peter Xu
2023-10-04 22:02 ` [PATCH v3 01/10] migration: Display error in query-migrate irrelevant of status Peter Xu
2023-10-05 7:28 ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 02/10] migration: Introduce migrate_has_error() Peter Xu
2023-10-05 7:30 ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 03/10] migration: Refactor error handling in source return path Peter Xu
2023-10-05 6:11 ` Philippe Mathieu-Daudé
2023-10-05 16:05 ` Peter Xu
2023-10-08 11:39 ` Philippe Mathieu-Daudé
2023-10-05 8:22 ` Juan Quintela [this message]
2023-10-05 19:35 ` Peter Xu
2023-10-05 12:57 ` Fabiano Rosas
2023-10-05 19:35 ` Peter Xu
2023-10-04 22:02 ` [PATCH v3 04/10] migration: Deliver return path file error to migrate state too Peter Xu
2023-10-05 7:32 ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 05/10] qemufile: Always return a verbose error Peter Xu
2023-10-05 7:42 ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 06/10] migration: Remember num of ramblocks to sync during recovery Peter Xu
2023-10-05 7:43 ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 07/10] migration: Add migration_rp_wait|kick() Peter Xu
2023-10-05 7:49 ` Juan Quintela
2023-10-05 20:47 ` Peter Xu
2023-10-04 22:02 ` [PATCH v3 08/10] migration: Allow network to fail even during recovery Peter Xu
2023-10-05 13:25 ` Fabiano Rosas
2023-10-04 22:02 ` [PATCH v3 09/10] migration: Allow RECOVER->PAUSED convertion for dest qemu Peter Xu
2023-10-05 8:24 ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER Peter Xu
2023-10-05 13:24 ` Fabiano Rosas
2023-10-05 13:37 ` Fabiano Rosas
2023-10-05 20:55 ` Peter Xu
2023-10-05 21:10 ` Fabiano Rosas
2023-10-05 21:44 ` Peter Xu
2023-10-05 22:01 ` Fabiano Rosas
2023-10-09 16:50 ` Fabiano Rosas
2023-10-10 16:00 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wmw1o4v7.fsf@secure.mitica \
--to=quintela@redhat.com \
--cc=farosas@suse.de \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).