From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Shaohua Li <shaoh.li-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [RFC]kvm: swapout guest page
Date: Mon, 21 May 2007 12:17:31 +0300
Message-ID: <465163AB.8030402@qumranet.com>
In-Reply-To: <288dbef70705210112t710bc904pe546840f7b9cfcfa-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Shaohua Li wrote:
> Hi,
> I saw some discussions on the topic but no progress. I did an
> experiment to make guest pages be allocated dynamically and swapped out.
> Please see the attached patches. It's not yet for merge but I'd like to get
> some suggestions and help. The patches (against kvm-19) work here but
> may not be very stable, as there are probably some locking issues in
> swapout, which I'll check further later. If you are brave, please try :).

Nice work.  This is fairly different from what I had in mind - I wanted 
to use regular address spaces in kvm, whereas this patchset adds swapout 
capability to the kvm address space.

Differences between the two approaches include:

- yours is probably simpler :)
- possibly less intrusive core mm changes when using regular address spaces
- automatic hugetlbfs support (this was my main motivation for generic 
address spaces, esp. with npt/ept).  of course hugetlbfs can be 
implemented with your approach as well
- your approach allows kvm to continue using page->private, so it saves 
memory and requires less kvm modification
- using Linux address spaces allows paging to file-backed storage, not 
just swap

Ultimately I think the balance is in favor of your approach, as it is 
more tightly coupled with kvm and can therefore be faster.  The 
simplicity also helps a lot.

> Some
> issues I have:
> 1. There is a spinlock to protect the kvm struct, and we can't sleep
> while holding it. A possible solution is 'release lock, sleep and
> retry', but that does not look easy to apply to the shadow page fault
> path. The spinlock also prevents the vcpu from being migrated to other
> cpus, as vmx operations must be done on the cpu the vcpu runs on. I
> changed it to a semaphore plus a cpu affinity setting. It's a little
> hacky; I'd like to see if there are better approaches.

My plan is to teach the scheduler about kvm, so it can call a callback 
when a vcpu is migrated.  That will allow re-enabling preemption in all 
kvm code except the actual entry/exit sequence.  This is an improvement 
all over (for realtime, for easier coding, for latency) so I hope to do 
it soon.
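
To illustrate the idea, here is a userspace sketch, not kernel code.
struct sched_notifier, its sched_out/sched_in callbacks and
sched_switch_to_cpu() are invented names standing in for whatever
interface the scheduler would eventually offer; the only point is that
the vcpu learns about a migration through a callback instead of
preventing it with a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier a task registers with the scheduler. */
struct sched_notifier {
	void (*sched_out)(struct sched_notifier *n);
	void (*sched_in)(struct sched_notifier *n, int cpu);
};

struct vcpu {
	int cpu;	/* cpu currently holding this vcpu's state, -1 if none */
	int loads;	/* how many times the state was (re)loaded */
	struct sched_notifier notifier;
};

/* Recover the vcpu from its embedded notifier. */
#define vcpu_of(n) \
	((struct vcpu *)((char *)(n) - offsetof(struct vcpu, notifier)))

static void vcpu_sched_out(struct sched_notifier *n)
{
	/* Stands in for saving/clearing the per-cpu hardware state. */
	vcpu_of(n)->cpu = -1;
}

static void vcpu_sched_in(struct sched_notifier *n, int cpu)
{
	struct vcpu *v = vcpu_of(n);

	/* Stands in for loading the state on the cpu we now run on. */
	v->cpu = cpu;
	v->loads++;
}

static void vcpu_init(struct vcpu *v)
{
	v->cpu = -1;
	v->loads = 0;
	v->notifier.sched_out = vcpu_sched_out;
	v->notifier.sched_in = vcpu_sched_in;
}

/* Stand-in for the scheduler switching the task out and back in on 'cpu'. */
static void sched_switch_to_cpu(struct sched_notifier *n, int cpu)
{
	n->sched_out(n);
	n->sched_in(n, cpu);
}
```

With callbacks like these, only the entry/exit sequence itself would
still need preemption disabled.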

> 2. Linux page reclaim can't tell whether a guest page is referenced often.
> My current patch just blindly adds guest pages to the lru; it is not optimized.

Well, that will always be a problem with paging guest memory.  There are 
some patches floating around to allow a guest to give hints to the host 
about page recency, for s390, which may help.

> 3. kvm_ops.tlb_flush should really send an IPI to make the vcpu flush
> its tlb, as it might be called on a cpu other than the one the vcpu runs
> on. This means the swapout path is not able to zap shadow page tables. My
> patch just skips any guest page that a shadow page table points to.
> I assume kvm smp guest support will improve tlb_flush.
>

Yes.  The apic patchset includes mechanisms for interrupting a running 
vcpu which can be used for this.
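
A rough userspace model of how such a mechanism could serve tlb_flush.
This is not what the apic patchset contains: REQ_TLB_FLUSH, the
struct vcpu fields and all function names below are made up, and the
"kick" is just a counter standing in for the IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_TLB_FLUSH (1u << 0)	/* hypothetical request bit */

struct vcpu {
	atomic_uint requests;	/* pending requests from other cpus */
	atomic_bool in_guest;	/* true while the vcpu runs guest code */
	unsigned int kicks;	/* stands in for IPIs sent to the vcpu's cpu */
	unsigned int tlb_flushes;
};

static void vcpu_init(struct vcpu *v)
{
	atomic_init(&v->requests, 0);
	atomic_init(&v->in_guest, false);
	v->kicks = 0;
	v->tlb_flushes = 0;
}

/* May be called from any cpu, e.g. from the swapout path. */
static void request_tlb_flush(struct vcpu *v)
{
	atomic_fetch_or(&v->requests, REQ_TLB_FLUSH);
	/* A running vcpu has to be forced out so that it sees the request. */
	if (atomic_load(&v->in_guest))
		v->kicks++;
}

/* Runs on the vcpu's own cpu just before entering the guest. */
static void vcpu_enter_guest(struct vcpu *v)
{
	unsigned int pending;

	/* Publish in_guest first so that a late requester will kick us. */
	atomic_store(&v->in_guest, true);
	pending = atomic_exchange(&v->requests, 0);
	if (pending & REQ_TLB_FLUSH)
		v->tlb_flushes++;	/* stands in for the real flush */
}

static void vcpu_exit_guest(struct vcpu *v)
{
	atomic_store(&v->in_guest, false);
}
```

The flush itself always happens on the cpu the vcpu runs on; the other
cpus only post a request and, if needed, interrupt the vcpu.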

> @@ -151,9 +151,8 @@
>  		walker->inherited_ar &= walker->table[index];
>  		table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>  		paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
> -		kunmap_atomic(walker->table, KM_USER0);
> -		walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
> -					    KM_USER0);
> +		kunmap(walker->table);
> +		walker->table = kmap(pfn_to_page(paddr >> PAGE_SHIFT));
>   

kunmap() wants a struct page IIRC.  It's also much slower than the 
atomic variant on i386+HIGHMEM, so I'd rather avoid it.
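
For illustration, a userspace mock of the calling convention: kmap()
takes a struct page and returns the mapped address, and kunmap() takes
the struct page again, not the address. The struct page, kmap() and
kunmap() below are stand-ins for the kernel ones, and the table_page
field and the walker_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's struct page, kmap() and kunmap(). */
struct page {
	void *vaddr;	/* backing storage for the mock */
	int map_count;	/* outstanding kmap() calls */
};

static void *kmap(struct page *page)
{
	page->map_count++;
	return page->vaddr;
}

static void kunmap(struct page *page)
{
	assert(page->map_count > 0);
	page->map_count--;
}

struct guest_walker {
	struct page *table_page;	/* hypothetical: kept for kunmap() */
	unsigned long *table;		/* mapped address of the guest table */
};

/* Drop the current mapping, if any, and map the next level's table. */
static void walker_switch_table(struct guest_walker *walker,
				struct page *next)
{
	if (walker->table_page)
		kunmap(walker->table_page);
	walker->table_page = next;
	walker->table = kmap(next);
}

static void walker_release(struct guest_walker *walker)
{
	if (walker->table_page) {
		kunmap(walker->table_page);
		walker->table_page = NULL;
		walker->table = NULL;
	}
}
```

So a kmap()-based walker has to remember the page it mapped, which the
atomic variant does not need.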

> @@ -1099,11 +1121,23 @@
>  	}
>  }
>  
> +static void mmu_zap_active_pages(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_mmu_page *page;
> +
> +	while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
> +		page = container_of(vcpu->kvm->active_mmu_pages.next,
> +				    struct kvm_mmu_page, link);
> +		kvm_mmu_zap_page(vcpu, page);
> +	}
> +}
> +
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
>  {
>  	int r;
>  
>  	destroy_kvm_mmu(vcpu);
> +	mmu_zap_active_pages(vcpu);
>  	r = init_kvm_mmu(vcpu);
>  	if (r < 0)
>  		goto out;
>   

This is called on set_cr0(), which can be called fairly often.  However, 
I think the zap can be made conditional on the paging-related bits 
actually changing.
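
A minimal sketch of such a check, assuming the bits that matter are
CR0.PG and CR0.WP (CR0.PE could be added to the mask). The helper name
cr0_paging_bits_changed() and the CR0_PAGING_BITS mask are invented
here and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_WP_MASK (1ULL << 16)	/* write protect */
#define CR0_PG_MASK (1ULL << 31)	/* paging enable */

/* Assumption: only these bits change how shadow page tables are built. */
#define CR0_PAGING_BITS (CR0_PG_MASK | CR0_WP_MASK)

static_assert(CR0_PAGING_BITS == 0x80010000ULL, "PG is bit 31, WP is bit 16");

/*
 * Hypothetical helper: true if a cr0 write touches a paging-related bit,
 * so the caller can skip zapping the shadow pages for other writes.
 */
static bool cr0_paging_bits_changed(uint64_t old_cr0, uint64_t new_cr0)
{
	return ((old_cr0 ^ new_cr0) & CR0_PAGING_BITS) != 0;
}
```

A write that only flips, say, CR0.TS would then leave the shadow pages
alone.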

> Index: kvm/kernel/paging_tmpl.h
> ===================================================================
> --- kvm.orig/kernel/paging_tmpl.h	2007-05-21 09:20:11.000000000 +0800
> +++ kvm/kernel/paging_tmpl.h	2007-05-21 09:20:26.000000000 +0800
> @@ -369,7 +369,7 @@
>  	*shadow_ent |= PT_WRITABLE_MASK;
>  	FNAME(mark_pagetable_dirty)(vcpu->kvm, walker);
>  	*guest_ent |= PT_DIRTY_MASK;
> -	rmap_add(vcpu, shadow_ent);
> +//	rmap_add(vcpu, shadow_ent);
>   

??

> +
> +static void kvm_invalidatepage(struct page *page, unsigned long offset)
> +{
> +	/*
> +	 * truncate_page is done after vcpu_free, that means all shadow page
> +	 * table should be freed already, we should never get here
> +	 */
> +	BUG();
> +}
>   

Eventually we'll want to add support for invalidating a vm page, to 
support ballooning and similar mechanisms.


-- 
error compiling committee.c: too many arguments to function



