From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Shaohua Li <shaoh.li-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [RFC]kvm: swapout guest page
Date: Mon, 21 May 2007 12:17:31 +0300
Message-ID: <465163AB.8030402@qumranet.com>
In-Reply-To: <288dbef70705210112t710bc904pe546840f7b9cfcfa-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Shaohua Li wrote:
> Hi,
> I saw some discussions on this topic but no progress. I did an
> experiment to make guest pages be allocated dynamically and swapped out;
> please see the attached patches. They're not ready for merging yet, but I'd
> like to get some suggestions and help. The patches (against kvm-19) work
> here but may not be very stable, as there are probably some locking issues
> in the swapout path, which I'll check further later. If you are brave,
> please try :).
Nice work. This is fairly different from what I had in mind - I wanted
to use regular address spaces in kvm, whereas this patchset adds swapout
capability to the kvm address space.
Differences between the two approaches include:
- yours is probably simpler :)
- possibly less intrusive mm code changes when using regular address spaces
- automatic hugetlbfs support (this was my main motivation for generic
address spaces, esp. with npt/ept). Of course hugetlbfs can be
implemented with your approach as well
- your approach allows kvm to continue using page->private, so it saves
memory and requires less kvm modification
- using Linux address spaces allows paging to file-backed storage, not
just swap
Ultimately I think the balance is in favor of your approach, as it is
more tightly coupled with kvm and can therefore be faster. The
simplicity also helps a lot.
> Some
> issues I have:
> 1. There is a spinlock protecting the kvm struct, and we can't sleep while
> holding it. A possible solution is 'release lock, sleep and retry', but the
> shadow page fault path doesn't look easy to adapt to that. The spinlock also
> prevents the vcpu from being migrated to other cpus, which matters because
> vmx operations must be done on the cpu the vcpu runs on. I changed it to a
> semaphore plus a cpu affinity setting. It's a little hacky; I'll see if
> there are better approaches.
My plan is to teach the scheduler about kvm, so it can call a callback
when a vcpu is migrated. That will allow re-enabling preemption in all
kvm code except the actual entry/exit sequence. This is an improvement
all over (for realtime, for easier coding, for latency), so I hope to do
it soon.
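To make the scheduler-callback idea concrete, here is a userspace sketch, not kernel code: a hook table with sched_out/sched_in functions that a scheduler would call around a context switch, so a vcpu can notice that it was migrated and reload its per-cpu VMX state on the new cpu instead of pinning itself. All names (struct sched_hook, vcpu_sched_in, simulate_reschedule, ...) are hypothetical, and the "reschedule" is simulated by invoking the hooks directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook table: a scheduler would call sched_out before the
 * task loses its cpu and sched_in once it runs again, possibly on a
 * different cpu. */
struct sched_hook;
struct sched_hook_ops {
	void (*sched_in)(struct sched_hook *hook, int cpu);
	void (*sched_out)(struct sched_hook *hook);
};
struct sched_hook {
	const struct sched_hook_ops *ops;
};

/* Hypothetical vcpu: tracks which cpu holds its per-cpu VMX state. */
struct vcpu {
	struct sched_hook hook;
	bool running;    /* currently on a cpu */
	int loaded_cpu;  /* -1 if the state is not loaded anywhere */
	int reloads;     /* times the state had to be (re)loaded */
};

static struct vcpu *hook_to_vcpu(struct sched_hook *hook)
{
	return (struct vcpu *)((char *)hook - offsetof(struct vcpu, hook));
}

static void vcpu_sched_out(struct sched_hook *hook)
{
	/* A real implementation would save per-cpu state here. */
	hook_to_vcpu(hook)->running = false;
}

static void vcpu_sched_in(struct sched_hook *hook, int cpu)
{
	struct vcpu *v = hook_to_vcpu(hook);

	if (v->loaded_cpu != cpu) {
		/* Migrated (or first run): reload state on the new cpu. */
		v->loaded_cpu = cpu;
		v->reloads++;
	}
	v->running = true;
}

static const struct sched_hook_ops vcpu_hook_ops = {
	.sched_in  = vcpu_sched_in,
	.sched_out = vcpu_sched_out,
};

static void vcpu_init(struct vcpu *v)
{
	v->hook.ops = &vcpu_hook_ops;
	v->running = false;
	v->loaded_cpu = -1;
	v->reloads = 0;
}

/* Toy stand-in for the scheduler: preempt the task, then resume it on
 * next_cpu, invoking the hooks around the switch. */
static void simulate_reschedule(struct sched_hook *hook, int next_cpu)
{
	assert(hook->ops != NULL);
	hook->ops->sched_out(hook);
	hook->ops->sched_in(hook, next_cpu);
}
```

With hooks like these, preemption only has to be disabled around the actual guest entry/exit sequence, because migration is handled when it happens rather than prevented. Linux's later preempt notifiers (struct preempt_ops with sched_in/sched_out) have broadly this shape.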
> 2. Linux page reclaim can't tell whether a guest page is referenced often.
> My current patch just blindly adds guest pages to the LRU; it's not optimized.
Well, that will always be a problem with paging guest memory. There are
some patches floating around to allow a guest to give hints to the host
about page recency, for s390, which may help.
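For context on issue 2: host reclaim can only rank pages by recency if something sets a referenced bit for them. The second-chance (clock) policy below is a simplified stand-in, not Linux's actual active/inactive LRU; the hypothetical `referenced` flag marks where an accessed bit harvested from shadow PTEs, or a guest recency hint, would feed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page record; "referenced" stands in for an accessed
 * bit harvested from shadow PTEs or a recency hint from the guest. */
struct guest_page {
	unsigned long gfn;
	bool referenced;
};

struct clock {
	struct guest_page *pages;
	size_t n;
	size_t hand;
};

/* Second-chance (clock) victim selection: a referenced page has its bit
 * cleared and is skipped once; the first unreferenced page is chosen.
 * Terminates within two sweeps because the first sweep clears all bits. */
static size_t clock_pick_victim(struct clock *c)
{
	assert(c->n > 0);
	for (;;) {
		size_t idx = c->hand;
		struct guest_page *p = &c->pages[idx];

		c->hand = (c->hand + 1) % c->n;
		if (!p->referenced)
			return idx;
		p->referenced = false;	/* give it a second chance */
	}
}
```

If nothing ever sets `referenced`, as with guest pages blindly added to the LRU, every page looks equally cold and the clock degenerates into FIFO eviction.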
> 3. kvm_ops.tlb_flush should really send an IPI to make the vcpu flush its
> TLB, as it might be called on a cpu other than the one the vcpu runs on.
> This prevents the swapout path from zapping shadow page tables. My
> patch just skips any guest page that a shadow page table points to.
> I assume kvm smp guest support will improve tlb_flush.
>
Yes. The apic patchset includes mechanisms for interrupting a running
vcpu which can be used for this.
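One common shape for such a remote flush is "post a request bit, then kick the vcpu if it is in guest mode"; the vcpu consumes requests on its own cpu right before entering the guest. This is a userspace sketch with C11 atomics under that assumption: REQ_TLB_FLUSH, kick_vcpu and the other names are hypothetical, and the kick only counts instead of sending a real IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_TLB_FLUSH (1u << 0)	/* hypothetical request bit */

struct vcpu {
	atomic_uint requests;	/* pending requests from other cpus */
	atomic_bool in_guest;	/* true while executing guest code */
	atomic_uint kicks;	/* simulated IPIs sent */
	unsigned flushes;	/* flushes done on the vcpu's own cpu */
};

static void vcpu_init(struct vcpu *v)
{
	atomic_init(&v->requests, 0);
	atomic_init(&v->in_guest, false);
	atomic_init(&v->kicks, 0);
	v->flushes = 0;
}

/* Stand-in for an IPI that would force a VM exit on the target cpu. */
static void kick_vcpu(struct vcpu *v)
{
	atomic_fetch_add(&v->kicks, 1);
}

/* Any cpu (e.g. the swapout path) after zapping a shadow PTE. */
static void request_remote_tlb_flush(struct vcpu *v)
{
	atomic_fetch_or(&v->requests, REQ_TLB_FLUSH);
	/* Only interrupt a vcpu that is in the guest; otherwise it sees the
	 * request before its next entry. */
	if (atomic_load(&v->in_guest))
		kick_vcpu(v);
}

/* Runs on the vcpu's own cpu. */
static void vcpu_process_requests(struct vcpu *v)
{
	unsigned req = atomic_exchange(&v->requests, 0);

	if (req & REQ_TLB_FLUSH)
		v->flushes++;	/* the real code would flush the TLB here */
}

/* Returns true if the vcpu may enter the guest now. The seq_cst store to
 * in_guest followed by the load of requests pairs with the requester's
 * fetch_or then load: either we see the request or it sees in_guest. */
static bool vcpu_try_enter_guest(struct vcpu *v)
{
	atomic_store(&v->in_guest, true);
	if (atomic_load(&v->requests)) {
		atomic_store(&v->in_guest, false);
		vcpu_process_requests(v);
		return false;	/* caller retries entry */
	}
	return true;
}
```

The flush itself always happens on the vcpu's own cpu, which is what the swapout path needs before it can safely zap shadow page tables that point at a page.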
> @@ -151,9 +151,8 @@
> walker->inherited_ar &= walker->table[index];
> table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
> paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
> - kunmap_atomic(walker->table, KM_USER0);
> - walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
> - KM_USER0);
> + kunmap(walker->table);
> + walker->table = kmap(pfn_to_page(paddr >> PAGE_SHIFT));
>
kunmap() wants a struct page IIRC. It's also much slower than the
atomic variant on i386+HIGHMEM, so I'd rather avoid it.
> @@ -1099,11 +1121,23 @@
> }
> }
>
> +static void mmu_zap_active_pages(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_mmu_page *page;
> +
> + while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
> + page = container_of(vcpu->kvm->active_mmu_pages.next,
> + struct kvm_mmu_page, link);
> + kvm_mmu_zap_page(vcpu, page);
> + }
> +}
> +
> int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> {
> int r;
>
> destroy_kvm_mmu(vcpu);
> + mmu_zap_active_pages(vcpu);
> r = init_kvm_mmu(vcpu);
> if (r < 0)
> goto out;
>
This is called from set_cr0(), which can be called fairly often. However,
I think the call can be made conditional on the paging-related bits
actually changing.
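A minimal sketch of that qualification, assuming PG and WP are the only CR0 bits that change how the shadow MMU walks guest page tables (bit positions per the x86 manuals; the helper name is hypothetical): reset the MMU context only when one of them flips, so a write that merely toggles something like TS skips the zap.

```c
#include <assert.h>
#include <stdbool.h>

/* CR0 bit positions from the x86 architecture manuals. */
#define CR0_PE (1ul << 0)	/* protected mode enable */
#define CR0_TS (1ul << 3)	/* task switched */
#define CR0_WP (1ul << 16)	/* write protect */
#define CR0_PG (1ul << 31)	/* paging enable */

/* Assumption: only these bits affect guest page table walks. */
#define CR0_MMU_BITS (CR0_PG | CR0_WP)

/* Hypothetical helper: true if old -> new flips a paging-related bit. */
static bool cr0_change_needs_mmu_reset(unsigned long old_cr0,
				       unsigned long new_cr0)
{
	return ((old_cr0 ^ new_cr0) & CR0_MMU_BITS) != 0;
}
```

set_cr0() would then call kvm_mmu_reset_context() only when this returns true.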
> Index: kvm/kernel/paging_tmpl.h
> ===================================================================
> --- kvm.orig/kernel/paging_tmpl.h 2007-05-21 09:20:11.000000000 +0800
> +++ kvm/kernel/paging_tmpl.h 2007-05-21 09:20:26.000000000 +0800
> @@ -369,7 +369,7 @@
> *shadow_ent |= PT_WRITABLE_MASK;
> FNAME(mark_pagetable_dirty)(vcpu->kvm, walker);
> *guest_ent |= PT_DIRTY_MASK;
> - rmap_add(vcpu, shadow_ent);
> +// rmap_add(vcpu, shadow_ent);
>
??
> +
> +static void kvm_invalidatepage(struct page *page, unsigned long offset)
> +{
> + /*
> + * truncate_page is done after vcpu_free, that means all shadow page
> + * table should be freed already, we should never get here
> + */
> + BUG();
> +}
>
Eventually we'll want to add support for invalidating a vm page, to
support ballooning and similar mechanisms.
--
error compiling committee.c: too many arguments to function