public inbox for kvm@vger.kernel.org
From: Marcelo Tosatti <marcelo@kvack.org>
To: Avi Kivity <avi@qumranet.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>,
	kvm-devel <kvm-devel@lists.sourceforge.net>
Subject: Re: large page support for kvm
Date: Tue, 19 Feb 2008 17:37:33 -0300	[thread overview]
Message-ID: <20080219203733.GA7558@dmt> (raw)
In-Reply-To: <47B800AB.20905@qumranet.com>

On Sun, Feb 17, 2008 at 11:38:51AM +0200, Avi Kivity wrote:
> >+ * Return the pointer to the largepage write count for a given
> >+ * gfn, handling slots that are not large page aligned.
> >+ */
> >+static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
> >+{
> >+	unsigned long idx;
> >+
> >+	idx = (gfn - slot->base_gfn) + hpage_align_diff(slot->base_gfn);
> >+	idx /= KVM_PAGES_PER_HPAGE;
> >+	return &slot->lpage_info[idx].write_count;
> >+}
> >  
> 
> Can be further simplified to (gfn / KVM_PAGES_PER_HPAGE) - 
> (slot->base_gfn / KVM_PAGES_PER_HPAGE).  Sorry for not noticing earlier.

Right.

> >+	/* guest has 4M pages, host 2M */
> >+	if (!is_pae(vcpu) && HPAGE_SHIFT == 21)
> >+		return 0;
> >  
> 
> Is this check necessary?  I think that if we remove it things will just 
> work.  A 4MB page will be have either one or two 2MB sptes (which may 
> even belong to different slots).

You mentioned that before; I agree it's not necessary.

> >+			/*
> >+ 			 * Largepage creation is susceptible to an upper-level
> >+ 			 * table being shadowed and write-protected in the
> >+ 			 * area being mapped. If that is the case, invalidate
> >+ 			 * the entry and let the instruction fault again
> >+ 			 * and use 4K mappings.
> >+ 			 */
> >+			if (largepage) {
> >+				spte = shadow_trap_nonpresent_pte;
> >+				kvm_x86_ops->tlb_flush(vcpu);
> >+				goto unshadowed;
> >+			}
> >  
> 
> Would it not repeat exactly the same code path?  Or is this just for the 
> case of the pte_update path?

The problem is if the instruction writing to one of the roots can't be
emulated.

kvm_mmu_unprotect_page() does not know about largepages, so it will zap
a gfn inside the large page frame, but not the large translation itself.

And zapping that gfn brings the shadowed-page count for the large frame to
zero, which lets has_wrprotected_page() succeed and the large mapping be
reinstalled: endless, unfixable write faults.

> >-	page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
> >+	if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))
> >+	    && is_physical_memory(vcpu->kvm, gfn)) {
> >+		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
> >+		largepage = 1;
> >+	}
> >  
> 
> Doesn't is_largepage_backed() imply is_physical_memory?

Hum. I'll verify... it seems that, now that the ends of the slots have
write_count set to 1, that should be true.

> > 
> >Index: kvm.largepages/arch/x86/kvm/x86.c
> >===================================================================
> >--- kvm.largepages.orig/arch/x86/kvm/x86.c
> >+++ kvm.largepages/arch/x86/kvm/x86.c
> >@@ -86,6 +86,7 @@ struct kvm_stats_debugfs_item debugfs_en
> > 	{ "mmu_recycled", VM_STAT(mmu_recycled) },
> > 	{ "mmu_cache_miss", VM_STAT(mmu_cache_miss) },
> > 	{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
> >+	{ "lpages", VM_STAT(lpages) },
> > 	{ NULL }
> > };
> >  
> 
> s/lpages/largepages/, this is user visible.

OK.

> >+		new.lpage_info = vmalloc(largepages * sizeof(*new.lpage_info));
> >+
> >+		if (!new.lpage_info)
> >+			goto out_free;
> >+
> >+		memset(new.lpage_info, 0, largepages * sizeof(*new.lpage_info));
> >+		/* large page crosses memslot boundary */
> >+		if (npages % KVM_PAGES_PER_HPAGE) {
> >+			new.lpage_info[0].write_count = 1;
> >  
> 
> This seems wrong: say a 3MB slot at 1GB; you kill the first largepage 
> even though it is fully covered and fine.
> 
> >+			new.lpage_info[largepages-1].write_count = 1;
> >  
> 
> OTOH, for a 3MB slot at 3MB, the last largepage is fine yet still gets 
> killed. The check needs to be against base_gfn and base_gfn + npages, 
> not the number of pages.

Right, will fix. Will post an updated patch soon.




Thread overview: 14+ messages
2008-01-29 17:20 large page support for kvm Avi Kivity
     [not found] ` <479F604C.20107-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-30 18:40   ` Joerg Roedel
     [not found]     ` <20080130184035.GS6960-5C7GfCeVMHo@public.gmane.org>
2008-01-31  5:44       ` Avi Kivity
2008-02-11 15:49         ` Marcelo Tosatti
2008-02-12 11:55           ` Avi Kivity
2008-02-13  0:15             ` Marcelo Tosatti
2008-02-13  6:45               ` Avi Kivity
2008-02-14 23:17                 ` Marcelo Tosatti
2008-02-15  7:40                   ` Roedel, Joerg
2008-02-17  9:38                   ` Avi Kivity
2008-02-19 20:37                     ` Marcelo Tosatti [this message]
2008-02-20 14:25                       ` Avi Kivity
2008-02-22  2:01                         ` Marcelo Tosatti
2008-02-22  7:16                           ` Avi Kivity
