From: David Gibson <david@gibson.dropbear.id.au>
To: Markus Armbruster <armbru@redhat.com>
Cc: Aravinda Prasad <arawinda.p@gmail.com>,
Ganesh Goudar <ganeshgr@linux.ibm.com>,
qemu-devel@nongnu.org
Subject: Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?
Date: Mon, 19 Jul 2021 17:20:58 +1000 [thread overview]
Message-ID: <YPUn2quWrztTqyML@yekko> (raw)
In-Reply-To: <87lf62ydow.fsf@dusky.pond.sub.org>
[-- Attachment #1: Type: text/plain, Size: 4459 bytes --]
On Mon, Jul 19, 2021 at 09:18:07AM +0200, Markus Armbruster wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
> > On Thu, Jul 15, 2021 at 03:32:06PM +0200, Markus Armbruster wrote:
> >> Commit 2500fb423a "migration: Include migration support for machine
> >> check handling" adds this:
> >>
> >> ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >> if (ret == -EBUSY) {
> >> /*
> >> * We don't want to abort so we let the migration to continue.
> >> * In a rare case, the machine check handler will run on the target.
> >> * Though this is not preferable, it is better than aborting
> >> * the migration or killing the VM.
> >> */
> >> warn_report("Received a fwnmi while migration was in progress");
> >> }
> >>
> >> migrate_add_blocker() can fail in two ways:
> >>
> >> 1. -EBUSY: migration is already in progress
> >>
> >> Ignoring this one is clearly intentional. The comment explains why.
> >> I'm taking it at face value (I'm a spapr ignoramus).
> >
> > Right. The argument isn't really about papr particularly, except
> > insofar as understanding what fwnmi is. fwnmi (FirmWare assisted NMI)
> > is a reporting mechanism for certain low-level hardware failures
> > (think memory ECC or cpu level faults, IIRC). If we migrate between
> > detecting and reporting the error, then the particulars we report will
> > be mostly meaningless since they relate to hardware we're no longer
> > running on. Hence the migration blocker.
> >
> > However, migrating away from a (non-fatal) fwnmi error is a pretty
> > reasonable response, so we don't want to actually fail a migration if
> > its already in progress.
> >
> >> Aside: I doubt
> >> the warning is going to help users.
> >
> > You're probably right, but it's not very clear how to do better. It
> > might possibly help someone in tech support explain why the reported
> > fwnmi doesn't seem to match the hardware the guest is (now) running
> > on.
>
> Perhaps pointing to the actual problem could help: the FWNMI's
> information is mostly meaningless.
Sorry, I don't follow what you're suggesting.
>
> >> 2. -EACCES: we're running with -only-migratable
> >>
> >> Why may we ignore -only-migratable here?
> >
> > Short answer: because I didn't think about that case. Long answer:
> > I think we probably shoud ignore it anyway. As above, receiving a
> > fwnmi doesn't really prevent migration, it just means that if you're
> > unlucky it can report stale information. Since migrating away from a
> > possibly-dubious host would be a reasonable response to a non-fatal
> > fwnmi, I don't think we want to simply prohibit fwnmi entirely with
> > -only-migratable.
>
> I think the comment text and placement could be improved to make clear
> ignoring this failure is intentional, too. How do you like the
> following?
That's fair..
>
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index a8f2cc6bdc..54d8e856d3 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -911,16 +911,14 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> }
> }
>
> + /*
> + * Try to block migration while FWNMI is being handled, so the
> + * machine check handler runs where the information passed to it
> + * actually makes sense. This won't actually block migration,
> + * only delay it slightly. If the attempt fails, carry on.
> + */
> ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, NULL);
> if (ret == -EBUSY) {
> - /*
> - * We don't want to abort so we let the migration to continue.
> - * In a rare case, the machine check handler will run on the target.
> - * Though this is not preferable, it is better than aborting
> - * the migration or killing the VM. It is okay to call
> - * migrate_del_blocker on a blocker that was not added (which the
> - * nmi-interlock handler would do when it's called after this).
> - */
> warn_report("Received a fwnmi while migration was in progress");
> }
LGTM.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2021-07-19 7:24 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-15 13:32 spapr_events: Sure we may ignore migrate_add_blocker() failure? Markus Armbruster
2021-07-19 2:31 ` David Gibson
2021-07-19 7:18 ` Markus Armbruster
2021-07-19 7:20 ` David Gibson [this message]
2021-07-19 10:41 ` Markus Armbruster
2021-07-19 11:00 ` -only-migrate and the two different uses of migration blockers (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?) Markus Armbruster
2021-07-19 12:42 ` Dr. David Alan Gilbert
2021-07-20 5:30 ` -only-migrate and the two different uses of migration blockers Markus Armbruster
2021-07-21 6:32 ` David Gibson
2021-07-22 18:00 ` Dr. David Alan Gilbert
2021-07-25 6:25 ` David Gibson
2021-11-02 14:32 ` Juan Quintela
2021-11-02 14:30 ` Juan Quintela
2021-07-21 6:26 ` spapr_events: Sure we may ignore migrate_add_blocker() failure? David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YPUn2quWrztTqyML@yekko \
--to=david@gibson.dropbear.id.au \
--cc=arawinda.p@gmail.com \
--cc=armbru@redhat.com \
--cc=ganeshgr@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).