Kernel KVM-PPC virtualization development
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
Date: Fri, 10 Aug 2012 01:25:32 +0000
Message-ID: <20120810012532.GA15142@amt.cnet>
In-Reply-To: <20120810003439.GB26420@bloggs.ozlabs.ibm.com>

On Fri, Aug 10, 2012 at 10:34:39AM +1000, Paul Mackerras wrote:
> On Thu, Aug 09, 2012 at 03:16:12PM -0300, Marcelo Tosatti wrote:
> 
> > The !memslot->npages case is handled in __kvm_set_memory_region
> > (please read that part, before kvm_arch_prepare_memory_region() call).
> > 
> > kvm_arch_flush_shadow should be implemented.
> 
> Book3S HV doesn't have shadow page tables per se, rather the hardware
> page table is under the control of the hypervisor (i.e. KVM), and
> entries are added and removed by the guest using hypercalls.  On
> recent machines (POWER7) the hypervisor can choose whether or not to
> have the hardware PTE point to a real page of memory; if it doesn't,
> access by the guest will trap to the hypervisor.  On older machines
> (PPC970) we don't have that flexibility, and we have to provide a real
> page of memory (i.e. RAM or I/O) behind every hardware PTE.  (This is
> because PPC970 provides no way for page faults in the guest to go to
> the hypervisor.)
> 
> I could implement kvm_arch_flush_shadow to remove the backing pages
> behind every hardware PTE, but that would be very slow and inefficient
> on POWER7, and would break the guest on PPC970, particularly in the
> case where userspace is removing a small memory slot containing some
> I/O device and leaving the memory slot for system RAM untouched.
> 
> So the reason for unmapping the hardware PTEs in
> kvm_arch_prepare_memory_region rather than kvm_arch_flush_shadow is
> that that way we know which memslot is going away.
> 
> What exactly are the semantics of kvm_arch_flush_shadow?  

It removes all translations mapped via memslots. It's used in cases where
the translations become stale, or during shutdown.

> I presume that on x86 with NPT/EPT it basically does nothing - is that right?

No, it does do work there: it removes all NPT/EPT ptes (named "sptes" in
arch/x86/kvm/). The translations are rebuilt on demand (when accesses by
the guest fault into the HV).
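
To illustrate the flush-then-rebuild behaviour described above, here is a
minimal userspace sketch. It is a toy model, not kernel code: struct toy_mmu,
toy_flush_shadow() and toy_guest_access() are hypothetical names invented for
this example, and a single boolean per guest frame stands in for an spte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_GFNS 16	/* assumption: tiny guest address space for the demo */

/* Toy stand-in for the sptes: one "present" bit per guest frame number. */
struct toy_mmu {
	bool present[TOY_NR_GFNS];
	unsigned long faults;	/* times the guest trapped into the hypervisor */
};

/* Analogue of a flush-all: drop every translation, keep the fault count. */
static void toy_flush_shadow(struct toy_mmu *mmu)
{
	assert(mmu != NULL);
	memset(mmu->present, 0, sizeof(mmu->present));
}

/*
 * Guest access to gfn. A missing translation counts as a fault into the
 * hypervisor, which rebuilds the translation on demand.
 * Returns false only when gfn is outside the toy address space.
 */
static bool toy_guest_access(struct toy_mmu *mmu, size_t gfn)
{
	assert(mmu != NULL);
	if (gfn >= TOY_NR_GFNS)
		return false;
	if (!mmu->present[gfn]) {
		mmu->faults++;
		mmu->present[gfn] = true;
	}
	return true;
}
```

In this model a flush costs nothing up front beyond clearing the table; the
price is paid later, one fault per page the guest touches again.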

> > > +	if (old->npages) {
> > > +		/* modifying guest_phys or flags */
> > > +		if (old->base_gfn != memslot->base_gfn)
> > > +			kvmppc_unmap_memslot(kvm, old);
> > 
> > This case is also handled generically by the last kvm_arch_flush_shadow
> > call in __kvm_set_memory_region.
> 
> Again, to use this we would need to know which memslot we're
> flushing.  If we could change __kvm_set_memory_region to pass the
> memslot for these kvm_arch_flush_shadow calls, then I could do as you
> suggest.  (Though I would need to think carefully about what would
> happen with guest invalidations of hardware PTEs in the interval
> between the rcu_assign_pointer(kvm->memslots, slots) and the
> kvm_arch_flush_shadow, and whether the invalidation would find the
> correct location in the rmap array, given that we have updated the
> base_gfn in the memslot without first getting rid of any references to
> those pages in the hardware page table.)

That can be done.

I'll send a patch to flush per memslot in the next few days; you can work
out the PPC details in the meantime.

To be clear: this is necessary to have consistent behaviour across
arches in the kvm_set_memory codepath, which is tricky (I'm not nitpicking).

Alternatively, kvm_arch_flush_shadow can be split into two methods (but
that's not necessary if memslot information is sufficient for PPC).
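
A per-memslot flush of the kind discussed here could be sketched as follows.
This is a toy userspace model under stated assumptions, not the patch itself:
struct toy_memslot, struct toy_mmu and toy_flush_memslot() are hypothetical
names, and only the base_gfn/npages fields mirror the real memslot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_GFNS 32	/* assumption: tiny guest address space for the demo */

/* Only the two fields needed to describe a guest-physical range. */
struct toy_memslot {
	size_t base_gfn;
	size_t npages;
};

/* One "present" bit per guest frame number stands in for a translation. */
struct toy_mmu {
	bool present[TOY_NR_GFNS];
};

/*
 * Drop only the translations that fall inside the given slot and leave
 * everything else untouched. Returns the number of translations removed.
 */
static size_t toy_flush_memslot(struct toy_mmu *mmu,
				const struct toy_memslot *slot)
{
	size_t removed = 0;
	size_t end;

	assert(mmu != NULL && slot != NULL);

	/* Clamp the slot's range to the toy address space. */
	end = slot->base_gfn + slot->npages;
	if (end > TOY_NR_GFNS)
		end = TOY_NR_GFNS;

	for (size_t gfn = slot->base_gfn; gfn < end; gfn++) {
		if (mmu->present[gfn]) {
			mmu->present[gfn] = false;
			removed++;
		}
	}
	return removed;
}
```

Passing the slot lets the caller remove a small I/O slot without disturbing
the translations that back system RAM, which is the PPC970 concern raised
in the quoted text.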

> > > +		if (memslot->dirty_bitmap &&
> > > +		    old->dirty_bitmap != memslot->dirty_bitmap)
> > > +			kvmppc_hv_get_dirty_log(kvm, old);
> > > +		return 0;
> > > +	}
> > 
> > Better clear dirty log unconditionally on kvm_arch_commit_memory_region,
> > similarly to x86 (just so its consistent).
> 
> OK.
> 
> > > --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > > +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > > @@ -81,7 +81,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
> > >  	ptel = rev->guest_rpte |= rcbits;
> > >  	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
> > >  	memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
> > > -	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
> > > +	if (!memslot)
> > >  		return;
> > 
> > Why remove this check? (i don't know why it was there in the first
> > place, just checking).
> 
> This is where we are removing the page backing a hardware PTE and thus
> removing the hardware PTE from the reverse-mapping list for the page.
> We want to be able to do that properly even if the memslot is in the
> process of going away.  I had the flags check in there originally
> because other places that used a memslot had that check, but when I
> read __kvm_set_memory_region more carefully I realized that the
> KVM_MEMSLOT_INVALID flag indicates that we should not create any more
> references to pages in the memslot, but we do still need to be able to
> handle references going away, i.e. pages in the memslot getting
> unmapped.
> 
> Paul.

Yes, that's it. kvm_arch_flush_shadow requires functional memslot lookup,
for example.
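
The rule agreed on above (an invalid memslot blocks new references but must
still allow existing ones to be torn down) can be sketched in a toy userspace
model. All names here are hypothetical: TOY_MEMSLOT_INVALID stands in for
KVM_MEMSLOT_INVALID, and the functions are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MEMSLOT_INVALID (1u << 0)	/* stand-in for KVM_MEMSLOT_INVALID */
#define TOY_SLOT_PAGES 8		/* assumption: fixed slot size for the demo */

struct toy_memslot {
	size_t base_gfn;
	unsigned int flags;
	bool mapped[TOY_SLOT_PAGES];	/* which pages currently have a reference */
};

/* Lookup ignores the flags: a slot that is going away is still found. */
static struct toy_memslot *toy_gfn_to_memslot(struct toy_memslot *slot,
					      size_t gfn)
{
	assert(slot != NULL);
	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= TOY_SLOT_PAGES)
		return NULL;
	return slot;
}

/* Creating a reference is refused once the slot is marked invalid. */
static bool toy_map_page(struct toy_memslot *slot, size_t gfn)
{
	struct toy_memslot *s = toy_gfn_to_memslot(slot, gfn);

	if (!s || (s->flags & TOY_MEMSLOT_INVALID))
		return false;
	s->mapped[gfn - s->base_gfn] = true;
	return true;
}

/* Dropping a reference only needs the lookup to succeed, not a valid slot. */
static bool toy_unmap_page(struct toy_memslot *slot, size_t gfn)
{
	struct toy_memslot *s = toy_gfn_to_memslot(slot, gfn);

	if (!s)
		return false;
	s->mapped[gfn - s->base_gfn] = false;
	return true;
}
```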

