From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Shaohua Li <shaoh.li-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [RFC]kvm: swapout guest page
Date: Mon, 21 May 2007 12:17:31 +0300
Message-ID: <465163AB.8030402@qumranet.com>
In-Reply-To: <288dbef70705210112t710bc904pe546840f7b9cfcfa-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Shaohua Li wrote:
> Hi,
> I saw some discussions on the topic but no progress. I did an
> experiment to make guest page be allocated dynamically and swap out.
> please see attachment patches. It's not yet for merge but I'd like get
> some suggestions and help. Patches (against kvm-19) work here but
> maybe not very stable as there should be some lock issue for swapout,
> which I'll do more check later. If you are brave, please try :).
Nice work. This is fairly different from what I had in mind - I wanted
to use regular address spaces in kvm, whereas this patchset adds swapout
capability to the kvm address space.
Differences between the two approaches include:
- yours is probably simpler :)
- possibly less intrusive core mm changes when using regular address spaces
- automatic hugetlbfs support (this was my main motivation for generic
address spaces, esp. with npt/ept). of course hugetlbfs can be
implemented with your approach as well
- your approach allows kvm to continue using page->private, so it saves
memory and requires less kvm modification
- using Linux address spaces allows paging to file-backed storage, not
just swap
Ultimately I think the balance is in favor of your approach, as it is
more tightly coupled with kvm and can therefore be faster. The
simplicity also helps a lot.
> Some
> issues I have:
> 1. there is a spinlock to protect kvm struct, we can't sleep in it. A
> possible solution is do a 'release lock, sleep and retry', but the
> shadow page fault path sounds not easy to follow it. The spinlock also
> prevents vcpu is migrated to other cpus as vmx operation must be done
> in the cpu vcpu runs. I changed it to a semaphore plus a cpu affinity
> setting. It's a little hacky, I'd see if there are better approaches.
My plan is to teach the scheduler about kvm, so it can call a callback
when a vcpu is migrated. That will allow re-enabling preemption in all
kvm code except the actual entry/exit sequence. This is an improvement
all over (for realtime, for easier coding, for latency) so I hope to do
it soon.
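To make the idea concrete, here is a minimal userspace sketch of such a
callback scheme. It is not kvm or scheduler code: the struct names, the
sched_out/sched_in hooks and simulate_switch_to() are all hypothetical
stand-ins. The point it illustrates is that per-cpu hardware state only
has to be moved when the vcpu actually lands on a different cpu.

```c
/* Hypothetical hooks a scheduler could invoke for a vcpu task. */
struct vcpu;

struct vcpu_sched_ops {
	void (*sched_out)(struct vcpu *v);         /* vcpu task is descheduled */
	void (*sched_in)(struct vcpu *v, int cpu); /* vcpu task resumes on 'cpu' */
};

struct vcpu {
	int cpu;     /* cpu currently holding this vcpu's hardware state, -1 if none */
	int reloads; /* number of times the state had to be moved to another cpu */
	const struct vcpu_sched_ops *ops;
};

static void vcpu_sched_out(struct vcpu *v)
{
	/* Nothing to do in this sketch: the state stays cached on v->cpu. */
	(void)v;
}

static void vcpu_sched_in(struct vcpu *v, int cpu)
{
	if (v->cpu != cpu) {
		/*
		 * Migration detected. Real code would release the state on
		 * the old cpu and load it on the new one at this point.
		 */
		v->cpu = cpu;
		v->reloads++;
	}
}

static const struct vcpu_sched_ops vcpu_ops = {
	.sched_out = vcpu_sched_out,
	.sched_in  = vcpu_sched_in,
};

/* Stand-in for the scheduler switching the vcpu task out and back in. */
static void simulate_switch_to(struct vcpu *v, int cpu)
{
	v->ops->sched_out(v);
	v->ops->sched_in(v, cpu);
}
```

With hooks like these, being rescheduled on the same cpu costs nothing,
and only a real migration triggers a reload.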
> 2. Linux page reclaim can't get if a guest page is referenced often.
> My current patch just blindly adds guest page to lru, not optimized.
Well, that will always be a problem with paging guest memory. There are
some patches floating around to allow a guest to give hints to the host
about page recency, for s390, which may help.
> 3. kvm_ops.tlb_flush should really send an IPI to make the vcpu flush
> tlb, as it might be called in other cpus other than the cpu vcpu run.
> This makes the swapout path not be able to zap shadow page tables. My
> patch just skip any guest page which has shadow page table points to.
> I assume kvm smp guest support will improve the tlb_flush.
>
Yes. The apic patchset includes mechanisms for interrupting a running
vcpu which can be used for this.
> @@ -151,9 +151,8 @@
> walker->inherited_ar &= walker->table[index];
> table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
> paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
> - kunmap_atomic(walker->table, KM_USER0);
> - walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
> - KM_USER0);
> + kunmap(walker->table);
> + walker->table = kmap(pfn_to_page(paddr >> PAGE_SHIFT));
>
kunmap() wants a struct page IIRC. It's also much slower than the
atomic variant on i386+HIGHMEM, so I'd rather avoid it.
> @@ -1099,11 +1121,23 @@
> }
> }
>
> +static void mmu_zap_active_pages(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_mmu_page *page;
> +
> + while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
> + page = container_of(vcpu->kvm->active_mmu_pages.next,
> + struct kvm_mmu_page, link);
> + kvm_mmu_zap_page(vcpu, page);
> + }
> +}
> +
> int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> {
> int r;
>
> destroy_kvm_mmu(vcpu);
> + mmu_zap_active_pages(vcpu);
> r = init_kvm_mmu(vcpu);
> if (r < 0)
> goto out;
>
This is called on set_cr0(), which can be called fairly often. However,
I think the zap can be made conditional on the paging-related bits
actually changing.
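A rough sketch of that check, as standalone C rather than kvm code. The
helper name and the choice of mask are assumptions; only the bit
positions of PE, WP and PG are the architectural ones. Which bits kvm
would really need to include is left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions on x86. */
#define CR0_PE (UINT64_C(1) << 0)  /* protected mode enable */
#define CR0_WP (UINT64_C(1) << 16) /* write protect */
#define CR0_PG (UINT64_C(1) << 31) /* paging enable */

/* Assumption: these are the CR0 bits that matter to the shadow MMU. */
#define CR0_MMU_BITS (CR0_PE | CR0_WP | CR0_PG)

/*
 * Hypothetical helper: returns true only if a CR0 write changes one of
 * the paging-related bits, i.e. only then would the shadow pages need
 * to be zapped and the MMU context reset.
 */
static bool cr0_write_needs_mmu_reset(uint64_t old_cr0, uint64_t new_cr0)
{
	return ((old_cr0 ^ new_cr0) & CR0_MMU_BITS) != 0;
}
```

A guest that only toggles CR0.TS (bit 3) would then skip the zap, while
enabling paging or flipping WP would still trigger it.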
> Index: kvm/kernel/paging_tmpl.h
> ===================================================================
> --- kvm.orig/kernel/paging_tmpl.h 2007-05-21 09:20:11.000000000 +0800
> +++ kvm/kernel/paging_tmpl.h 2007-05-21 09:20:26.000000000 +0800
> @@ -369,7 +369,7 @@
> *shadow_ent |= PT_WRITABLE_MASK;
> FNAME(mark_pagetable_dirty)(vcpu->kvm, walker);
> *guest_ent |= PT_DIRTY_MASK;
> - rmap_add(vcpu, shadow_ent);
> +// rmap_add(vcpu, shadow_ent);
>
??
> +
> +static void kvm_invalidatepage(struct page *page, unsigned long offset)
> +{
> + /*
> + * truncate_page is done after vcpu_free, that means all shadow page
> + * table should be freed already, we should never get here
> + */
> + BUG();
> +}
>
Eventually we'll want to add support for invalidating a vm page, to
support ballooning and similar mechanisms.
--
error compiling committee.c: too many arguments to function