From: Marcelo Tosatti <mtosatti@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"ddutile@redhat.com" <ddutile@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"avi@redhat.com" <avi@redhat.com>,
"chrisw@redhat.com" <chrisw@redhat.com>
Subject: Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
Date: Mon, 31 Jan 2011 17:18:34 -0200
Message-ID: <20110131191834.GA27154@amt.cnet>
In-Reply-To: <1295933876.3230.46.camel@x201>
On Mon, Jan 24, 2011 at 10:37:56PM -0700, Alex Williamson wrote:
> On Mon, 2011-01-24 at 08:44 -0700, Alex Williamson wrote:
> > I'll look at how we might be
> > able to allocate slots on demand. Thanks,
>
> Here's a first cut just to see if this looks agreeable. This allows the
> slot array to grow on demand. This works with current userspace, as well
> as with userspace trivially modified to double KVMState.slots, tested by
> hotplugging enough pci-assign devices to exceed the previous limit (with
> & without EPT). Hopefully I got the RCU bits correct. Does this look
> like the right path? If so, I'll work on removing the fixed limit from
> userspace next. Thanks,
>
> Alex
>
>
> kvm: Allow memory slot array to grow on demand
>
> Remove the fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> to grow on demand. Private slots are now allocated at the
> front instead of the end. Only x86 seems to use private slots,
> so this is now zero for all other archs. The memslots pointer
> is already updated using RCU, so changing the size of the
> array when it's replaced is straightforward. x86 also keeps
> a bitmap of slots used by each kvm_mmu_page, which requires a
> shadow TLB flush whenever we increase the number of slots.
> This forces the pages to be rebuilt with the new bitmap size.
>
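For reference, the reader side that makes the resize safe is already in
place: lookups dereference kvm->memslots under srcu and only ever index
into the copy they dereferenced, roughly:

        struct kvm_memslots *slots;
        int idx;

        idx = srcu_read_lock(&kvm->srcu);
        slots = kvm_memslots(kvm);      /* rcu_dereference(kvm->memslots) under the hood */
        /* only slots->memslots[0 .. slots->nmemslots - 1] is valid here */
        srcu_read_unlock(&kvm->srcu, idx);

so a newly published, larger array never confuses an in-flight reader.
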
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>
> arch/ia64/include/asm/kvm_host.h | 4 --
> arch/ia64/kvm/kvm-ia64.c | 2 +
> arch/powerpc/include/asm/kvm_host.h | 3 --
> arch/s390/include/asm/kvm_host.h | 3 --
> arch/x86/include/asm/kvm_host.h | 3 +-
> arch/x86/include/asm/vmx.h | 6 ++-
> arch/x86/kvm/mmu.c | 7 +++-
> arch/x86/kvm/x86.c | 6 ++-
> include/linux/kvm_host.h | 7 +++-
> virt/kvm/kvm_main.c | 65 ++++++++++++++++++++++++-----------
> 10 files changed, 63 insertions(+), 43 deletions(-)
>
>
> diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
> index 2689ee5..11d0ab2 100644
> --- a/arch/ia64/include/asm/kvm_host.h
> +++ b/arch/ia64/include/asm/kvm_host.h
> @@ -23,10 +23,6 @@
> #ifndef __ASM_KVM_HOST_H
> #define __ASM_KVM_HOST_H
>
> -#define KVM_MEMORY_SLOTS 32
> -/* memory slots that does not exposed to userspace */
> -#define KVM_PRIVATE_MEM_SLOTS 4
> -
> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>
> /* define exit reasons from vmm to kvm*/
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 70d224d..f1adda2 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1814,7 +1814,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> mutex_lock(&kvm->slots_lock);
>
> r = -EINVAL;
> - if (log->slot >= KVM_MEMORY_SLOTS)
> + if (log->slot >= kvm->memslots->nmemslots)
> goto out;
>
> memslot = &kvm->memslots->memslots[log->slot];
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index bba3b9b..dc80057 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -29,9 +29,6 @@
> #include <asm/kvm_asm.h>
>
> #define KVM_MAX_VCPUS 1
> -#define KVM_MEMORY_SLOTS 32
> -/* memory slots that does not exposed to userspace */
> -#define KVM_PRIVATE_MEM_SLOTS 4
>
> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index cef7dbf..92a964c 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -20,9 +20,6 @@
> #include <asm/cpu.h>
>
> #define KVM_MAX_VCPUS 64
> -#define KVM_MEMORY_SLOTS 32
> -/* memory slots that does not exposed to userspace */
> -#define KVM_PRIVATE_MEM_SLOTS 4
>
> struct sca_entry {
> atomic_t scn;
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ffd7f8d..df1382c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -27,7 +27,6 @@
> #include <asm/msr-index.h>
>
> #define KVM_MAX_VCPUS 64
> -#define KVM_MEMORY_SLOTS 32
> /* memory slots that does not exposed to userspace */
> #define KVM_PRIVATE_MEM_SLOTS 4
>
> @@ -207,7 +206,7 @@ struct kvm_mmu_page {
> * One bit set per slot which has memory
> * in this shadow page.
> */
> - DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
> + unsigned long *slot_bitmap;
> bool multimapped; /* More than one parent_pte? */
> bool unsync;
> int root_count; /* Currently serving as active root */
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 84471b8..7fd8c89 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -370,9 +370,9 @@ enum vmcs_field {
>
> #define AR_RESERVD_MASK 0xfffe0f00
>
> -#define TSS_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 0)
> -#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 1)
> -#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 2)
> +#define TSS_PRIVATE_MEMSLOT 0
> +#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT 1
> +#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT 2
>
> #define VMX_NR_VPIDS (1 << 16)
> #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ccacf0b..8c2533a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1032,6 +1032,7 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> ASSERT(is_empty_shadow_page(sp->spt));
> hlist_del(&sp->hash_link);
> list_del(&sp->link);
> + kfree(sp->slot_bitmap);
> __free_page(virt_to_page(sp->spt));
> if (!sp->role.direct)
> __free_page(virt_to_page(sp->gfns));
> @@ -1048,6 +1049,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
> u64 *parent_pte, int direct)
> {
> struct kvm_mmu_page *sp;
> + struct kvm_memslots *slots = kvm_memslots(vcpu->kvm);
>
> sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
> sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
> @@ -1056,7 +1058,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
> PAGE_SIZE);
> set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
> - bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
> + sp->slot_bitmap = kzalloc(sizeof(long) *
> + BITS_TO_LONGS(slots->nmemslots), GFP_KERNEL);
> + if (!sp->slot_bitmap)
> + return NULL;
> sp->multimapped = 0;
> sp->parent_pte = parent_pte;
> kvm_mod_used_mmu_pages(vcpu->kvm, +1);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5eccdba..c002fac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1978,7 +1978,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> r = KVM_MAX_VCPUS;
> break;
> case KVM_CAP_NR_MEMSLOTS:
> - r = KVM_MEMORY_SLOTS;
> + r = INT_MAX - KVM_PRIVATE_MEM_SLOTS;
> break;
> case KVM_CAP_PV_MMU: /* obsolete */
> r = 0;
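One consequence for userspace worth noting: the limit now has to be probed
at runtime rather than assumed. Roughly (sketch only, with kvm_fd standing
in for the /dev/kvm fd):

        int nr_slots = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);
        if (nr_slots <= 0)
                nr_slots = 32;  /* no cap reported: assume the old fixed limit */
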
> @@ -3201,7 +3201,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> mutex_lock(&kvm->slots_lock);
>
> r = -EINVAL;
> - if (log->slot >= KVM_MEMORY_SLOTS)
> + if (log->slot >= kvm->memslots->nmemslots)
> goto out;
>
> memslot = &kvm->memslots->memslots[log->slot];
> @@ -6068,7 +6068,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
>
> /* Prevent internal slot pages from being moved by fork()/COW. */
> - if (memslot->id >= KVM_MEMORY_SLOTS)
> + if (memslot->id < KVM_PRIVATE_MEM_SLOTS)
> map_flags = MAP_SHARED | MAP_ANONYMOUS;
>
> /*To keep backward compatibility with older userspace,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b5021db..4cb9f94 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -27,6 +27,10 @@
>
> #include <asm/kvm_host.h>
>
> +#ifndef KVM_PRIVATE_MEM_SLOTS
> + #define KVM_PRIVATE_MEM_SLOTS 0
> +#endif
> +
> /*
> * vcpu->requests bit members
> */
> @@ -206,8 +210,7 @@ struct kvm_irq_routing_table {};
> struct kvm_memslots {
> int nmemslots;
> u64 generation;
> - struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
> - KVM_PRIVATE_MEM_SLOTS];
> + struct kvm_memory_slot memslots[];
> };
>
> struct kvm {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fd67bcd..32f023c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -623,13 +623,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
> struct kvm_userspace_memory_region *mem,
> int user_alloc)
> {
> - int r;
> + int r, nmemslots;
> gfn_t base_gfn;
> unsigned long npages;
> unsigned long i;
> - struct kvm_memory_slot *memslot;
> - struct kvm_memory_slot old, new;
> + struct kvm_memory_slot *memslot = NULL;
> + struct kvm_memory_slot old = {}, new = {};
> struct kvm_memslots *slots, *old_memslots;
> + bool flush = false;
>
> r = -EINVAL;
> /* General sanity checks */
> @@ -639,12 +640,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
> goto out;
> if (user_alloc && (mem->userspace_addr & (PAGE_SIZE - 1)))
> goto out;
> - if (mem->slot >= KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
> - goto out;
> if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
> goto out;
>
> - memslot = &kvm->memslots->memslots[mem->slot];
> base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
> npages = mem->memory_size >> PAGE_SHIFT;
>
> @@ -655,7 +653,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
> if (!npages)
> mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
>
> - new = old = *memslot;
> + if (mem->slot < kvm->memslots->nmemslots) {
> + memslot = &kvm->memslots->memslots[mem->slot];
> + new = old = *memslot;
> + }
> if (s == memslot || !s->npages)
> @@ -752,12 +753,19 @@ skip_lpage:
>
> if (!npages) {
> r = -ENOMEM;
> - slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> +
> + nmemslots = (mem->slot >= kvm->memslots->nmemslots) ?
> + mem->slot + 1 : kvm->memslots->nmemslots;
> +
> + slots = kzalloc(sizeof(struct kvm_memslots) +
> + nmemslots * sizeof(struct kvm_memory_slot),
> + GFP_KERNEL);
> if (!slots)
> goto out_free;
> - memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> - if (mem->slot >= slots->nmemslots)
> - slots->nmemslots = mem->slot + 1;
> + memcpy(slots, kvm->memslots,
> + sizeof(struct kvm_memslots) + kvm->memslots->nmemslots *
> + sizeof(struct kvm_memory_slot));
> + slots->nmemslots = nmemslots;

Other than enforcing the upper limit, this should disallow increasing the
number of slots when the new slot is being deleted (npages == 0), so none
of these additions to the !npages case should be necessary.

Also, it must disallow shrinking the number of slots.
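
Something along these lines next to the other sanity checks would cover the
first point (untested sketch, using the names from this patch):

        /* A deletion (npages == 0) must not be allowed to grow the array. */
        if (!npages && mem->slot >= kvm->memslots->nmemslots)
                goto out;

With that in place the !npages path can copy nmemslots unchanged, which also
keeps the array from ever shrinking.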
> slots->generation++;
> slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
>
> @@ -787,12 +795,21 @@ skip_lpage:
> }
>
> r = -ENOMEM;
> - slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> +
> + if (mem->slot >= kvm->memslots->nmemslots) {
> + nmemslots = mem->slot + 1;
> + flush = true;
> + } else
> + nmemslots = kvm->memslots->nmemslots;
> +
> + slots = kzalloc(sizeof(struct kvm_memslots) +
> + nmemslots * sizeof(struct kvm_memory_slot),
> + GFP_KERNEL);
> if (!slots)
> goto out_free;
> - memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> - if (mem->slot >= slots->nmemslots)
> - slots->nmemslots = mem->slot + 1;
> + memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
> + kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
> + slots->nmemslots = nmemslots;
> slots->generation++;
>
> /* actual memory is freed via old in kvm_free_physmem_slot below */
> @@ -808,6 +825,9 @@ skip_lpage:
> rcu_assign_pointer(kvm->memslots, slots);

It should be:

        spin_lock(&kvm->mmu_lock);
        rcu_assign_pointer(kvm->memslots, slots);
        kvm_arch_flush_shadow(kvm);
        spin_unlock(&kvm->mmu_lock);

But kvm_arch_flush_shadow() currently takes mmu_lock itself, so that needs
fixing first.
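
For illustration only, assuming a hypothetical kvm_arch_flush_shadow_locked()
variant that expects mmu_lock to be held, the update could then look like:

        spin_lock(&kvm->mmu_lock);
        rcu_assign_pointer(kvm->memslots, slots);
        kvm_arch_flush_shadow_locked(kvm);      /* hypothetical: no mmu_lock inside */
        spin_unlock(&kvm->mmu_lock);
        synchronize_srcu_expedited(&kvm->srcu);
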
Otherwise looks fine.