From: Paul Mackerras <paulus@ozlabs.org>
To: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@ozlabs.org>,
Michael Ellerman <mpe@ellerman.id.au>,
KVM-PPC <kvm-ppc@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 2/3] powerpc/book3s: Fix TB corruption in guest exit path on HMI interrupt.
Date: Fri, 18 Mar 2016 15:51:32 +1100 [thread overview]
Message-ID: <20160318045132.GB1393@fergus.ozlabs.ibm.com> (raw)
In-Reply-To: <20160114031503.1287.54860.stgit@mars>
On Thu, Jan 14, 2016 at 08:45:04AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> When a guest is assigned to a core it converts the host Timebase (TB)
> into guest TB by adding guest timebase offset before entering into
> guest. During guest exit it restores the guest TB to host TB. This means
> under certain conditions (Guest migration) host TB and guest TB can differ.
>
> When we get an HMI for TB related issues the opal HMI handler would
> try fixing errors and restore the correct host TB value. With no guest
> running, we don't have any issues. But with guest running on the core
> we run into TB corruption issues.
>
> If we get an HMI while in the guest, the current HMI handler invokes opal
> hmi handler before forcing guest to exit. The guest exit path subtracts
> the guest TB offset from the current TB value which may have already
> been restored with host value by opal hmi handler. This leads to incorrect
> host and guest TB values.
>
> With split-core, things become more complex. With split-core, TB also gets
> split and each subcore gets its own TB register. When a hmi handler fixes
> a TB error and restores the TB value, it affects all the TB values of
> sibling subcores on the same core. On TB errors all the thread in the core
> gets HMI. With existing code, the individual threads call opal hmi handle
> independently which can easily throw TB out of sync if we have guest
> running on subcores. Hence we will need to co-ordinate with all the
> threads before making opal hmi handler call followed by TB resync.
>
> This patch introduces a sibling subcore state structure (shared by all
> threads in the core) in paca which holds information about whether sibling
> subcores are in Guest mode or host mode. An array in_guest[] of size
> MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
> The subcore id is used as index into in_guest[] array. Only primary
> thread entering/exiting the guest is responsible to set/unset its
> designated array element.
>
> On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
> this patch will now force guest to vacate the core/subcore. Primary
> thread from each subcore will then turn off its respective bit
> from the above bitmap during the guest exit path just after the
> guest->host partition switch is complete.
>
> All other threads that have just exited the guest OR were already in host
> will wait until all other subcores clears their respective bit.
> Once all the subcores turn off their respective bit, all threads will
> will make call to opal hmi handler.
>
> It is not necessary that opal hmi handler would resync the TB value for
> every HMI interrupts. It would do so only for the HMI caused due to
> TB errors. For rest, it would not touch TB value. Hence to make things
> simpler, primary thread would call TB resync explicitly once for each
> core immediately after opal hmi handler instead of subtracting guest
> offset from TB. TB resync call will restore the TB with host value.
> Thus we can be sure about the TB state.
>
> One of the primary threads exiting the guest will take up the
> responsibility of calling TB resync. It will use one of the top bits
> (bit 63) from subcore state flags bitmap to make the decision. The first
> primary thread (among the subcores) that is able to set the bit will
> have to call the TB resync. Rest all other threads will wait until TB
> resync is complete. Once TB resync is complete all threads will then
> proceed.
This patch looks pretty good; the one nit that I can see is that you
have a vcpu parameter to kvmppc_realmode_hmi_handler which is
completely unused; which is just as well, because you get it from r9
at the point where it is called, but at that point r9 may or may not
contain a vcpu pointer, depending on the path we take to get there.
Why not just remove the vcpu argument?
Also, patch 3/3 seems like it is just something that was missed from
this patch (2/3). Why not fold it in?
Paul.
next prev parent reply other threads:[~2016-03-18 4:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-14 3:14 [PATCH 1/3] Powernv: Remove the usage of PACAR1 from opal wrappers Mahesh J Salgaonkar
2016-01-14 3:15 ` [PATCH 2/3] powerpc/book3s: Fix TB corruption in guest exit path on HMI interrupt Mahesh J Salgaonkar
2016-03-18 4:51 ` Paul Mackerras [this message]
2016-05-15 4:21 ` Mahesh Jagannath Salgaonkar
2016-01-14 3:15 ` [PATCH 3/3] KVM: PPC: Book3S HV: Fix soft lockups in KVM on HMI for time base errors Mahesh J Salgaonkar
2016-03-18 3:33 ` [1/3] Powernv: Remove the usage of PACAR1 from opal wrappers Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160318045132.GB1393@fergus.ozlabs.ibm.com \
--to=paulus@ozlabs.org \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=mahesh@linux.vnet.ibm.com \
--cc=mpe@ellerman.id.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).