From: David Gibson <david@gibson.dropbear.id.au>
To: Markus Armbruster <armbru@redhat.com>
Cc: Aravinda Prasad <arawinda.p@gmail.com>,
Ganesh Goudar <ganeshgr@linux.ibm.com>,
qemu-devel@nongnu.org
Subject: Re: spapr_events: Sure we may ignore migrate_add_blocker() failure?
Date: Mon, 19 Jul 2021 17:20:58 +1000 [thread overview]
Message-ID: <YPUn2quWrztTqyML@yekko> (raw)
In-Reply-To: <87lf62ydow.fsf@dusky.pond.sub.org>
[-- Attachment #1: Type: text/plain, Size: 4459 bytes --]
On Mon, Jul 19, 2021 at 09:18:07AM +0200, Markus Armbruster wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
> > On Thu, Jul 15, 2021 at 03:32:06PM +0200, Markus Armbruster wrote:
> >> Commit 2500fb423a "migration: Include migration support for machine
> >> check handling" adds this:
> >>
> >> ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, &local_err);
> >> if (ret == -EBUSY) {
> >> /*
> >> * We don't want to abort so we let the migration to continue.
> >> * In a rare case, the machine check handler will run on the target.
> >> * Though this is not preferable, it is better than aborting
> >> * the migration or killing the VM.
> >> */
> >> warn_report("Received a fwnmi while migration was in progress");
> >> }
> >>
> >> migrate_add_blocker() can fail in two ways:
> >>
> >> 1. -EBUSY: migration is already in progress
> >>
> >> Ignoring this one is clearly intentional. The comment explains why.
> >> I'm taking it at face value (I'm a spapr ignoramus).
> >
> > Right. The argument isn't really about papr particularly, except
> > insofar as understanding what fwnmi is. fwnmi (FirmWare assisted NMI)
> > is a reporting mechanism for certain low-level hardware failures
> > (think memory ECC or cpu level faults, IIRC). If we migrate between
> > detecting and reporting the error, then the particulars we report will
> > be mostly meaningless since they relate to hardware we're no longer
> > running on. Hence the migration blocker.
> >
> > However, migrating away from a (non-fatal) fwnmi error is a pretty
> > reasonable response, so we don't want to actually fail a migration if
> > its already in progress.
> >
> >> Aside: I doubt
> >> the warning is going to help users.
> >
> > You're probably right, but it's not very clear how to do better. It
> > might possibly help someone in tech support explain why the reported
> > fwnmi doesn't seem to match the hardware the guest is (now) running
> > on.
>
> Perhaps pointing to the actual problem could help: the FWNMI's
> information is mostly meaningless.
Sorry, I don't follow what you're suggesting.
>
> >> 2. -EACCES: we're running with -only-migratable
> >>
> >> Why may we ignore -only-migratable here?
> >
> > Short answer: because I didn't think about that case. Long answer:
> > I think we probably shoud ignore it anyway. As above, receiving a
> > fwnmi doesn't really prevent migration, it just means that if you're
> > unlucky it can report stale information. Since migrating away from a
> > possibly-dubious host would be a reasonable response to a non-fatal
> > fwnmi, I don't think we want to simply prohibit fwnmi entirely with
> > -only-migratable.
>
> I think the comment text and placement could be improved to make clear
> ignoring this failure is intentional, too. How do you like the
> following?
That's fair..
>
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index a8f2cc6bdc..54d8e856d3 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -911,16 +911,14 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> }
> }
>
> + /*
> + * Try to block migration while FWNMI is being handled, so the
> + * machine check handler runs where the information passed to it
> + * actually makes sense. This won't actually block migration,
> + * only delay it slightly. If the attempt fails, carry on.
> + */
> ret = migrate_add_blocker(spapr->fwnmi_migration_blocker, NULL);
> if (ret == -EBUSY) {
> - /*
> - * We don't want to abort so we let the migration to continue.
> - * In a rare case, the machine check handler will run on the target.
> - * Though this is not preferable, it is better than aborting
> - * the migration or killing the VM. It is okay to call
> - * migrate_del_blocker on a blocker that was not added (which the
> - * nmi-interlock handler would do when it's called after this).
> - */
> warn_report("Received a fwnmi while migration was in progress");
> }
LGTM.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2021-07-19 7:24 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-15 13:32 spapr_events: Sure we may ignore migrate_add_blocker() failure? Markus Armbruster
2021-07-19 2:31 ` David Gibson
2021-07-19 7:18 ` Markus Armbruster
2021-07-19 7:20 ` David Gibson [this message]
2021-07-19 10:41 ` Markus Armbruster
2021-07-19 11:00 ` -only-migrate and the two different uses of migration blockers (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?) Markus Armbruster
2021-07-19 12:42 ` Dr. David Alan Gilbert
2021-07-20 5:30 ` -only-migrate and the two different uses of migration blockers Markus Armbruster
2021-07-21 6:32 ` David Gibson
2021-07-22 18:00 ` Dr. David Alan Gilbert
2021-07-25 6:25 ` David Gibson
2021-11-02 14:32 ` Juan Quintela
2021-11-02 14:30 ` Juan Quintela
2021-07-21 6:26 ` spapr_events: Sure we may ignore migrate_add_blocker() failure? David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YPUn2quWrztTqyML@yekko \
--to=david@gibson.dropbear.id.au \
--cc=arawinda.p@gmail.com \
--cc=armbru@redhat.com \
--cc=ganeshgr@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.