From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>,
kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
Date: Fri, 10 Aug 2012 01:25:32 +0000
Message-ID: <20120810012532.GA15142@amt.cnet>
In-Reply-To: <20120810003439.GB26420@bloggs.ozlabs.ibm.com>
On Fri, Aug 10, 2012 at 10:34:39AM +1000, Paul Mackerras wrote:
> On Thu, Aug 09, 2012 at 03:16:12PM -0300, Marcelo Tosatti wrote:
>
> > The !memslot->npages case is handled in __kvm_set_memory_region
> > (please read that part, before kvm_arch_prepare_memory_region() call).
> >
> > kvm_arch_flush_shadow should be implemented.
>
> Book3S HV doesn't have shadow page tables per se, rather the hardware
> page table is under the control of the hypervisor (i.e. KVM), and
> entries are added and removed by the guest using hypercalls. On
> recent machines (POWER7) the hypervisor can choose whether or not to
> have the hardware PTE point to a real page of memory; if it doesn't,
> access by the guest will trap to the hypervisor. On older machines
> (PPC970) we don't have that flexibility, and we have to provide a real
> page of memory (i.e. RAM or I/O) behind every hardware PTE. (This is
> because PPC970 provides no way for page faults in the guest to go to
> the hypervisor.)
>
> I could implement kvm_arch_flush_shadow to remove the backing pages
> behind every hardware PTE, but that would be very slow and inefficient
> on POWER7, and would break the guest on PPC970, particularly in the
> case where userspace is removing a small memory slot containing some
> I/O device and leaving the memory slot for system RAM untouched.
>
> So the reason for unmapping the hardware PTEs in
> kvm_arch_prepare_memory_region rather than kvm_arch_flush_shadow is
> that that way we know which memslot is going away.
>
> What exactly are the semantics of kvm_arch_flush_shadow?

It removes all translations mapped via memslots. It's used in cases where
the translations become stale, or during shutdown.

> I presume that on x86 with NPT/EPT it basically does nothing - is that right?

No, it does do something: it removes all NPT/EPT ptes (named "sptes" in
arch/x86/kvm/). The translations are rebuilt on demand (when accesses by
the guest fault into the HV).
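To illustrate the flush-then-refault behaviour described above, here is a
rough userspace sketch. It is not kernel code: every name in it (toy_mmu,
toy_flush_shadow, toy_guest_access) is hypothetical, and the gfn -> pfn
mapping is made up purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Hypothetical stand-in for a second-level translation (an "spte"). */
struct toy_spte {
	bool present;
	unsigned long pfn;
};

struct toy_mmu {
	struct toy_spte spte[TOY_NPAGES];
	unsigned long faults;	/* how often the guest trapped to the HV */
};

/* Drop every translation, as a flush-everything call would. */
static void toy_flush_shadow(struct toy_mmu *mmu)
{
	for (size_t gfn = 0; gfn < TOY_NPAGES; gfn++)
		mmu->spte[gfn].present = false;
}

/* Guest access: a missing translation faults and is rebuilt on demand. */
static unsigned long toy_guest_access(struct toy_mmu *mmu, size_t gfn)
{
	assert(gfn < TOY_NPAGES);
	if (!mmu->spte[gfn].present) {
		mmu->faults++;
		mmu->spte[gfn].pfn = 0x1000 + gfn;	/* made-up mapping */
		mmu->spte[gfn].present = true;
	}
	return mmu->spte[gfn].pfn;
}
```

In this model a flush costs nothing in correctness terms: the next guest
access simply takes one more fault and reinstalls the entry.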
> > > + if (old->npages) {
> > > + /* modifying guest_phys or flags */
> > > + if (old->base_gfn != memslot->base_gfn)
> > > + kvmppc_unmap_memslot(kvm, old);
> >
> > This case is also handled generically by the last kvm_arch_flush_shadow
> > call in __kvm_set_memory_region.
>
> Again, to use this we would need to know which memslot we're
> flushing. If we could change __kvm_set_memory_region to pass the
> memslot for these kvm_arch_flush_shadow calls, then I could do as you
> suggest. (Though I would need to think carefully about what would
> happen with guest invalidations of hardware PTEs in the interval
> between the rcu_assign_pointer(kvm->memslots, slots) and the
> kvm_arch_flush_shadow, and whether the invalidation would find the
> correct location in the rmap array, given that we have updated the
> base_gfn in the memslot without first getting rid of any references to
> those pages in the hardware page table.)

That can be done.

I'll send a patch to flush per memslot in the next few days; you can work
out the PPC details in the meantime.

To be clear: this is necessary to have consistent behaviour across
arches in the kvm_set_memory codepath, which is tricky (this is not
nitpicking).

Alternatively, kvm_arch_flush_shadow can be split into two methods (but
that's not necessary if memslot information is sufficient for PPC).
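As a rough idea of what "flush per memslot" means, here is a userspace
sketch under simplifying assumptions. It is not the patch itself, and all
names (toy_memslot, toy_flush_memslot, toy_count_mapped) are hypothetical;
the point is only that the flush is given the slot and touches nothing
outside its gfn range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Hypothetical, simplified memslot: a contiguous range of guest frames. */
struct toy_memslot {
	size_t base_gfn;
	size_t npages;
};

struct toy_mmu {
	bool mapped[TOY_NPAGES];	/* one "translation present" bit per gfn */
};

/* Flush only the translations that fall inside the given slot. */
static void toy_flush_memslot(struct toy_mmu *mmu,
			      const struct toy_memslot *slot)
{
	assert(slot->base_gfn + slot->npages <= TOY_NPAGES);
	for (size_t i = 0; i < slot->npages; i++)
		mmu->mapped[slot->base_gfn + i] = false;
}

/* Count the translations that are still present. */
static size_t toy_count_mapped(const struct toy_mmu *mmu)
{
	size_t n = 0;

	for (size_t gfn = 0; gfn < TOY_NPAGES; gfn++)
		n += mmu->mapped[gfn];
	return n;
}
```

With a large RAM slot and a small I/O slot, flushing the I/O slot leaves
every RAM translation in place, which is the case Paul is worried about.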
> > > + if (memslot->dirty_bitmap &&
> > > + old->dirty_bitmap != memslot->dirty_bitmap)
> > > + kvmppc_hv_get_dirty_log(kvm, old);
> > > + return 0;
> > > + }
> >
> > Better clear dirty log unconditionally on kvm_arch_commit_memory_region,
> > similarly to x86 (just so its consistent).
>
> OK.
>
> > > --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > > +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > > @@ -81,7 +81,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
> > > ptel = rev->guest_rpte |= rcbits;
> > > gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
> > > memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
> > > - if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
> > > + if (!memslot)
> > > return;
> >
> > Why remove this check? (i don't know why it was there in the first
> > place, just checking).
>
> This is where we are removing the page backing a hardware PTE and thus
> removing the hardware PTE from the reverse-mapping list for the page.
> We want to be able to do that properly even if the memslot is in the
> process of going away. I had the flags check in there originally
> because other places that used a memslot had that check, but when I
> read __kvm_set_memory_region more carefully I realized that the
> KVM_MEMSLOT_INVALID flag indicates that we should not create any more
> references to pages in the memslot, but we do still need to be able to
> handle references going away, i.e. pages in the memslot getting
> unmapped.
>
> Paul.

Yes, that's it. kvm_arch_flush_shadow requires functional memslot lookup,
for example.
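The distinction can be sketched in userspace as follows. This is only an
illustration, not the kernel's lookup code: TOY_SLOT_INVALID stands in for
KVM_MEMSLOT_INVALID, and toy_find_slot, toy_may_map and toy_may_unmap are
hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOT_INVALID 0x1u	/* hypothetical analogue of KVM_MEMSLOT_INVALID */

struct toy_memslot {
	size_t base_gfn;
	size_t npages;
	unsigned int flags;
};

/* Plain lookup: finds the slot covering gfn, valid or not. */
static struct toy_memslot *toy_find_slot(struct toy_memslot *slots,
					 size_t nslots, size_t gfn)
{
	for (size_t i = 0; i < nslots; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return &slots[i];
	return NULL;
}

/* Creating a new reference must refuse a slot that is going away. */
static bool toy_may_map(struct toy_memslot *slots, size_t nslots, size_t gfn)
{
	struct toy_memslot *slot = toy_find_slot(slots, nslots, gfn);

	return slot && !(slot->flags & TOY_SLOT_INVALID);
}

/* Dropping an existing reference only needs the slot to be found. */
static bool toy_may_unmap(struct toy_memslot *slots, size_t nslots, size_t gfn)
{
	return toy_find_slot(slots, nslots, gfn) != NULL;
}
```

So the invalid flag gates the map path only; the unmap path (and a flush)
still has to resolve the gfn to its slot while the slot is being removed.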