* [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
@ 2012-09-12 0:34 Paul Mackerras
2012-09-12 0:35 ` [PATCH 1/3] KVM: PPC: Book3S HV: Add a more " Paul Mackerras
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Paul Mackerras @ 2012-09-12 0:34 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, kvm
This series of 3 patches makes it possible for guests to allocate
whatever size of HPT they need from linear memory preallocated at
boot, rather than being restricted to a single size of HPT (by
default, 16MB) and having to use the kernel page allocator for
anything else -- which in practice limits them to at most 16MB given
the default value for the maximum page order. Instead of allocating
many individual pieces of memory, this allocates a single contiguous
area and uses a simple bitmap-based allocator to hand out pieces of it
as required.
Paul.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] KVM: PPC: Book3S HV: Add a more flexible allocator for linear memory
2012-09-12 0:34 [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory Paul Mackerras
@ 2012-09-12 0:35 ` Paul Mackerras
2012-09-12 0:36 ` [PATCH 2/3] KVM: PPC: Book3S HV: Allocate user-requested size of HPT from preallocated pool Paul Mackerras
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2012-09-12 0:35 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, kvm
HV-style KVM requires some large regions of physically contiguous
memory for the guest hashed page table (HPT), and on PPC970 processors,
the real mode area (RMA). Currently we can allocate, at boot time, some
number of HPTs of a single size and some number of RMAs of a single
size, and use them later to run guests. However, the desired size of
the HPT depends on how much RAM a guest is given, and the optimal size
of the RMA may also differ between guests.
Therefore, this changes the boot-time allocation to allocate a single,
contiguous area of memory, rather than many separate areas, and adds a
simple bitmap-based allocator which can find aligned or unaligned free
regions of a power-of-2 size within the area. Each bit of the bitmap
represents a "chunk", a 256kB aligned piece of the area. If the bit
is set, it means that the chunk is in use.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_hv_builtin.c | 388 ++++++++++++++++++++++++++++------
1 file changed, 321 insertions(+), 67 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index ec0a9e5..f0c51c5 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -18,11 +18,14 @@
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
+/* Use a chunk size of 256kB, since this is the smallest allowed HPT size */
+#define KVM_CHUNK_ORDER 18
+#define KVM_CHUNK_SIZE (1ul << KVM_CHUNK_ORDER)
+
#define KVM_LINEAR_RMA 0
#define KVM_LINEAR_HPT 1
-static void __init kvm_linear_init_one(ulong size, int count, int type);
-static struct kvmppc_linear_info *kvm_alloc_linear(int type);
+static struct kvmppc_linear_info *kvm_alloc_linear(int type, int order);
static void kvm_release_linear(struct kvmppc_linear_info *ri);
int kvm_hpt_order = KVM_DEFAULT_HPT_ORDER;
@@ -38,7 +41,8 @@ EXPORT_SYMBOL_GPL(kvm_hpt_order);
* much physically contiguous memory after the system is up and running,
* we preallocate a set of RMAs in early boot for KVM to use.
*/
-static unsigned long kvm_rma_size = 64 << 20; /* 64MB */
+#define DEFAULT_RMA_SIZE (64 << 20) /* 64MB */
+static unsigned long kvm_rma_size = DEFAULT_RMA_SIZE;
static unsigned long kvm_rma_count;
/* Work out RMLS (real mode limit selector) field value for a given RMA size.
@@ -91,7 +95,7 @@ early_param("kvm_rma_count", early_parse_rma_count);
struct kvmppc_linear_info *kvm_alloc_rma(void)
{
- return kvm_alloc_linear(KVM_LINEAR_RMA);
+ return kvm_alloc_linear(KVM_LINEAR_RMA, __ilog2(kvm_rma_size));
}
EXPORT_SYMBOL_GPL(kvm_alloc_rma);
@@ -124,7 +128,7 @@ early_param("kvm_hpt_count", early_parse_hpt_count);
struct kvmppc_linear_info *kvm_alloc_hpt(void)
{
- return kvm_alloc_linear(KVM_LINEAR_HPT);
+ return kvm_alloc_linear(KVM_LINEAR_HPT, kvm_hpt_order);
}
EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
@@ -134,73 +138,295 @@ void kvm_release_hpt(struct kvmppc_linear_info *li)
}
EXPORT_SYMBOL_GPL(kvm_release_hpt);
-/*************** generic *************/
+/* Bitmap allocator functions */
-static LIST_HEAD(free_linears);
-static DEFINE_SPINLOCK(linear_lock);
+/*
+ * Set `nbits' bits in a bitmap starting at bit `start', checking that
+ * the bits are 0 to begin with. If they aren't, stop and return the
+ * number of bits remaining (not set to 1).
+ */
+static long set_zero_bits(unsigned long *bitmap, unsigned long start,
+ unsigned long nbits)
+{
+ unsigned long mask;
+ unsigned long offset;
+ unsigned long *p;
+
+ offset = start & (BITS_PER_LONG - 1);
+ mask = ~0UL << offset;
+ nbits += offset;
+ p = bitmap + start / BITS_PER_LONG;
+ while (nbits >= BITS_PER_LONG) {
+ if (*p & mask)
+ return nbits;
+ *p++ |= mask;
+ mask = ~0UL;
+ nbits -= BITS_PER_LONG;
+ }
+ if (nbits) {
+ mask &= ~(~0UL << nbits);
+ if (*p & mask)
+ return nbits;
+ *p |= mask;
+ }
+ return 0;
+}
-static void __init kvm_linear_init_one(ulong size, int count, int type)
+/*
+ * Clear `nbits' bits in a bitmap starting at bit `start', checking that
+ * the bits are 1 to begin with. If they aren't, stop and return the
+ * number of bits remaining (not set to 0).
+ */
+static long clear_one_bits(unsigned long *bitmap, unsigned long start,
+ unsigned long nbits)
{
- unsigned long i;
- unsigned long j, npages;
- void *linear;
- struct page *pg;
- const char *typestr;
- struct kvmppc_linear_info *linear_info;
+ unsigned long mask;
+ unsigned long offset;
+ unsigned long *p;
+
+ offset = start & (BITS_PER_LONG - 1);
+ mask = ~0UL << offset;
+ nbits += offset;
+ p = bitmap + start / BITS_PER_LONG;
+ while (nbits >= BITS_PER_LONG) {
+ if (~*p & mask)
+ return nbits;
+ *p++ &= ~mask;
+ mask = ~0UL;
+ nbits -= BITS_PER_LONG;
+ }
+ if (nbits) {
+ mask &= ~(~0UL << nbits);
+ if (~*p & mask)
+ return nbits;
+ *p &= ~mask;
+ }
+ return 0;
+}
- if (!count)
- return;
+static void bitmap_free(unsigned long *bitmap, unsigned long start,
+ unsigned long n)
+{
+ long nr;
+
+ nr = clear_one_bits(bitmap, start, n);
+ if (nr) {
+ pr_err("KVM: oops, freeing not-in-use linear memory\n");
+ if (nr < n)
+ set_zero_bits(bitmap, start, n - nr);
+ }
+}
+
+static unsigned long lsbs[6] = {
+ ~0UL, /* LSBs of aligned 1-bit fields */
+ ~0UL / 3, /* LSBs of aligned 2-bit fields */
+ ~0UL / 0xf, /* LSBs of aligned 4-bit fields */
+ ~0UL / 0xff, /* LSBs of aligned 8-bit fields */
+ ~0UL / 0xffff, /* LSBs of aligned 16-bit fields */
+ ~0UL / 0xffffffff, /* LSBs of aligned 32-bit fields */
+};
+
+static inline int find_ls_one(unsigned long n)
+{
+ return __ffs(n);
+}
+
+static inline int find_ms_one(unsigned long n)
+{
+ return __fls(n);
+}
- typestr = (type == KVM_LINEAR_RMA) ? "RMA" : "HPT";
-
- npages = size >> PAGE_SHIFT;
- linear_info = alloc_bootmem(count * sizeof(struct kvmppc_linear_info));
- for (i = 0; i < count; ++i) {
- linear = alloc_bootmem_align(size, size);
- pr_debug("Allocated KVM %s at %p (%ld MB)\n", typestr, linear,
- size >> 20);
- linear_info[i].base_virt = linear;
- linear_info[i].base_pfn = __pa(linear) >> PAGE_SHIFT;
- linear_info[i].npages = npages;
- linear_info[i].type = type;
- list_add_tail(&linear_info[i].list, &free_linears);
- atomic_set(&linear_info[i].use_count, 0);
-
- pg = pfn_to_page(linear_info[i].base_pfn);
- for (j = 0; j < npages; ++j) {
- atomic_inc(&pg->_count);
- ++pg;
+/*
+ * Allocate a contiguous region of 1 << order bits in a bitmap.
+ * If aligned != 0, require it to be aligned on a multiple of its size.
+ * Returns -1 if no suitable free region could be found.
+ */
+static int bitmap_alloc(unsigned long *bitmap, unsigned long map_length,
+ int order, int aligned)
+{
+ unsigned long n = 1ul << order;
+ int aorder = order;
+ unsigned long map, map2, *p;
+ unsigned long ls, ms;
+ unsigned long i, j, t;
+ unsigned long b, nb, nw, delta;
+
+ if (!order)
+ aligned = 1;
+ else if (!aligned)
+ --aorder;
+
+ /*
+ * If we are allocating 32 bits or fewer, we can use arithmetic
+ * tricks to find aligned free regions of 2^n bits.
+ * If we need an aligned allocation, we just look for aligned
+ * free regions of 2^order bits.
+ * If we don't need an aligned allocation, we check if each word
+ * has any aligned free regions of 2^(order - 1) bits, and if it
+ * does, we find any free regions of n bits by smearing out the
+ * one (in-use) bits in the word over the following n-1 bits,
+ * and then looking for any remaining zero bits.
+ */
+ p = bitmap;
+ if (n < BITS_PER_LONG) {
+ ls = lsbs[aorder];
+ ms = ls << ((1 << aorder) - 1);
+ j = map_length;
+ for (i = 0; i < map_length; i += BITS_PER_LONG) {
+ map = *p++;
+ t = (map - ls) & ~map & ms;
+ if (!t)
+ continue;
+ /* found aligned free space of size 1<<aorder */
+ if (aligned) {
+ j = i + (find_ls_one(t) & ~(n - 1));
+ break;
+ }
+ /* look for unaligned free space */
+ map2 = 0;
+ if (i < map_length - BITS_PER_LONG)
+ map2 = *p;
+ for (b = 1; b < n; b <<= 1) {
+ map |= map >> b;
+ map |= map2 << (BITS_PER_LONG - b);
+ map2 |= map2 >> b;
+ }
+ if (map != ~0UL) {
+ j = i + find_ls_one(~map);
+ break;
+ }
}
+ } else if (aligned) {
+ /*
+ * Aligned allocations of 64 bits or more: we just
+ * have to find `nw' consecutive 0 words in the bitmap
+ * starting at multiple of `nw'.
+ */
+ nw = n / BITS_PER_LONG;
+ j = 0;
+ for (i = 0; i < map_length; i += BITS_PER_LONG) {
+ map = *p++;
+ if (map) {
+ delta = (n - BITS_PER_LONG) & ~i;
+ i += delta;
+ j = i + BITS_PER_LONG;
+ p += delta / BITS_PER_LONG;
+ nw = n / BITS_PER_LONG;
+ } else if (--nw == 0)
+ break;
+ }
+ if (nw)
+ j = map_length;
+ } else {
+ /*
+ * Unaligned allocations of 64 bits or more: we only
+ * need to consider groups of consecutive zeros at
+ * either end of a word, and words that are all zeros.
+ */
+ nb = 0;
+ j = 0;
+ for (i = 0; i < map_length; i += BITS_PER_LONG) {
+ map = *p++;
+ if (map == 0) {
+ nb += BITS_PER_LONG;
+ if (nb >= n)
+ break;
+ continue;
+ }
+ nb += find_ls_one(map);
+ if (nb >= n)
+ break;
+ j = find_ms_one(map) + 1;
+ nb = BITS_PER_LONG - j;
+ j += i;
+ }
+ if (nb < n)
+ j = map_length;
}
+
+ if (j >= map_length)
+ return -1;
+
+ nb = set_zero_bits(bitmap, j, n);
+ if (nb) {
+ pr_err("KVM: Bug in bitmap_alloc\n");
+ if (nb < n)
+ clear_one_bits(bitmap, j, n - nb);
+ return -1;
+ }
+
+ return j;
}
-static struct kvmppc_linear_info *kvm_alloc_linear(int type)
+/*************** generic *************/
+
+static unsigned long *linear_bitmap;
+static unsigned long linear_bitmap_len;
+static char *linear_mem;
+static DEFINE_SPINLOCK(linear_lock);
+
+static struct kvmppc_linear_info *kvm_alloc_linear(int type, int order)
{
- struct kvmppc_linear_info *ri, *ret;
+ struct kvmppc_linear_info *ri;
+ int aligned = 1;
+ int index;
+
+ /* Do we have any preallocated memory at all? */
+ if (!linear_bitmap_len)
+ return NULL;
+
+ /* Convert log2(bytes) to log2(chunks) */
+ order -= KVM_CHUNK_ORDER;
+ if (order < 0)
+ order = 0;
+
+ /*
+ * Assume arch 2.06 means POWER7, which doesn't require
+ * the HPT to be aligned on a multiple of its size,
+ * but only on a 256kB boundary.
+ */
+ if (type == KVM_LINEAR_HPT && cpu_has_feature(CPU_FTR_ARCH_206))
+ aligned = 0;
+
+ ri = kzalloc(sizeof(*ri), GFP_KERNEL);
+ if (!ri)
+ return NULL;
- ret = NULL;
spin_lock(&linear_lock);
- list_for_each_entry(ri, &free_linears, list) {
- if (ri->type != type)
- continue;
-
- list_del(&ri->list);
- atomic_inc(&ri->use_count);
- memset(ri->base_virt, 0, ri->npages << PAGE_SHIFT);
- ret = ri;
- break;
- }
+ index = bitmap_alloc(linear_bitmap, linear_bitmap_len, order, aligned);
spin_unlock(&linear_lock);
- return ret;
+ if (index < 0) {
+ kfree(ri);
+ return NULL;
+ }
+
+ ri->base_virt = linear_mem + index * KVM_CHUNK_SIZE;
+ ri->base_pfn = __pa(ri->base_virt) >> PAGE_SHIFT;
+ ri->npages = (KVM_CHUNK_SIZE >> PAGE_SHIFT) << order;
+ ri->type = type;
+ atomic_set(&ri->use_count, 1);
+
+ memset(ri->base_virt, 0, ri->npages << PAGE_SHIFT);
+
+ return ri;
}
static void kvm_release_linear(struct kvmppc_linear_info *ri)
{
- if (atomic_dec_and_test(&ri->use_count)) {
- spin_lock(&linear_lock);
- list_add_tail(&ri->list, &free_linears);
- spin_unlock(&linear_lock);
+ unsigned long index, n;
+ if (atomic_dec_and_test(&ri->use_count)) {
+ index = ((char *)ri->base_virt - linear_mem) / KVM_CHUNK_SIZE;
+ n = ri->npages / (KVM_CHUNK_SIZE >> PAGE_SHIFT);
+ if (index < linear_bitmap_len &&
+ index + n <= linear_bitmap_len) {
+ spin_lock(&linear_lock);
+ bitmap_free(linear_bitmap, index, n);
+ spin_unlock(&linear_lock);
+ } else {
+ pr_err("KVM: oops, corrupted linear_info %p\n", ri);
+ }
+ kfree(ri);
}
}
@@ -211,23 +437,51 @@ static void kvm_release_linear(struct kvmppc_linear_info *ri)
*/
void __init kvm_linear_init(void)
{
- /* HPT */
- kvm_linear_init_one(1 << kvm_hpt_order, kvm_hpt_count, KVM_LINEAR_HPT);
+ unsigned long total, nchunks, align;
+ struct page *pg;
+ unsigned long j, npages;
- /* RMA */
- /* Only do this on PPC970 in HV mode */
- if (!cpu_has_feature(CPU_FTR_HVMODE) ||
- !cpu_has_feature(CPU_FTR_ARCH_201))
+ /* only do this if HV KVM is possible on this cpu */
+ if (!cpu_has_feature(CPU_FTR_HVMODE))
return;
- if (!kvm_rma_size || !kvm_rma_count)
- return;
+ /* HPT */
+ total = kvm_hpt_count << kvm_hpt_order;
+
+ /* On PPC970 in HV mode, allow space for RMAs */
+ if (cpu_has_feature(CPU_FTR_ARCH_201)) {
+ if (lpcr_rmls(kvm_rma_size) < 0) {
+ pr_err("RMA size of 0x%lx not supported, using 0x%x\n",
+ kvm_rma_size, DEFAULT_RMA_SIZE);
+ kvm_rma_size = DEFAULT_RMA_SIZE;
+ }
+ total += kvm_rma_count * kvm_rma_size;
+ }
- /* Check that the requested size is one supported in hardware */
- if (lpcr_rmls(kvm_rma_size) < 0) {
- pr_err("RMA size of 0x%lx not supported\n", kvm_rma_size);
+ if (!total)
return;
- }
- kvm_linear_init_one(kvm_rma_size, kvm_rma_count, KVM_LINEAR_RMA);
+ /* round up to multiple of amount represented by a bitmap word (16MB) */
+ total = (total + KVM_CHUNK_SIZE * BITS_PER_LONG - 1) &
+ ~(KVM_CHUNK_SIZE * BITS_PER_LONG - 1);
+
+ /* allocate bitmap of in-use chunks */
+ nchunks = total / KVM_CHUNK_SIZE;
+ linear_bitmap = alloc_bootmem(nchunks / BITS_PER_BYTE);
+ memset(linear_bitmap, 0, nchunks / BITS_PER_BYTE);
+ linear_bitmap_len = nchunks;
+
+ /* ask for maximum useful alignment, i.e. max power of 2 <= total */
+ align = 1ul << __ilog2(total);
+
+ linear_mem = alloc_bootmem_align(total, align);
+ pr_info("Allocated KVM memory at %p (%ld MB)\n", linear_mem,
+ total >> 20);
+
+ npages = total >> PAGE_SHIFT;
+ pg = virt_to_page(linear_mem);
+ for (j = 0; j < npages; ++j) {
+ atomic_inc(&pg->_count);
+ ++pg;
+ }
}
--
1.7.10.rc3.219.g53414
* [PATCH 2/3] KVM: PPC: Book3S HV: Allocate user-requested size of HPT from preallocated pool
2012-09-12 0:34 [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory Paul Mackerras
2012-09-12 0:35 ` [PATCH 1/3] KVM: PPC: Book3S HV: Add a more " Paul Mackerras
@ 2012-09-12 0:36 ` Paul Mackerras
2012-09-12 0:36 ` [PATCH 3/3] KVM: PPC: Book3S HV: Add command-line option for amount of KVM linear memory Paul Mackerras
2012-09-13 23:32 ` [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for " Alexander Graf
3 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2012-09-12 0:36 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, kvm
Currently, although userspace can request any size it wants that is a
power of 2 between 256k and 64GB for the hashed page table (HPT) for
the guest, the HPT allocator code only uses the preallocated memory
pool for HPTs of the standard size, 16MB, and uses the kernel page
allocator for anything else. Now that we can allocate arbitrary
power-of-2 sizes from the preallocated memory pool, this patch makes
the HPT allocator use the pool for any requested size.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 33 ++++++++++-----------------------
arch/powerpc/kvm/book3s_hv_builtin.c | 4 ++--
3 files changed, 13 insertions(+), 26 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5dccdc5..925a869 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -141,7 +141,7 @@ extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm,
struct kvm_allocate_rma *rma);
extern struct kvmppc_linear_info *kvm_alloc_rma(void);
extern void kvm_release_rma(struct kvmppc_linear_info *ri);
-extern struct kvmppc_linear_info *kvm_alloc_hpt(void);
+extern struct kvmppc_linear_info *kvm_alloc_hpt(long);
extern void kvm_release_hpt(struct kvmppc_linear_info *li);
extern int kvmppc_core_init_vm(struct kvm *kvm);
extern void kvmppc_core_destroy_vm(struct kvm *kvm);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d95d113..148e444 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -42,7 +42,7 @@
long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
{
- unsigned long hpt;
+ unsigned long hpt = 0;
struct revmap_entry *rev;
struct kvmppc_linear_info *li;
long order = kvm_hpt_order;
@@ -53,34 +53,21 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
order = PPC_MIN_HPT_ORDER;
}
- /*
- * If the user wants a different size from default,
- * try first to allocate it from the kernel page allocator.
- */
- hpt = 0;
- if (order != kvm_hpt_order) {
- hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
- __GFP_NOWARN, order - PAGE_SHIFT);
- if (!hpt)
- --order;
- }
-
- /* Next try to allocate from the preallocated pool */
- if (!hpt) {
- li = kvm_alloc_hpt();
+ /* Try successively smaller sizes */
+ while (order > PPC_MIN_HPT_ORDER) {
+ /* First try allocating from the preallocated contiguous pool */
+ li = kvm_alloc_hpt(order);
if (li) {
hpt = (ulong)li->base_virt;
kvm->arch.hpt_li = li;
- order = kvm_hpt_order;
+ break;
}
- }
-
- /* Lastly try successively smaller sizes from the page allocator */
- while (!hpt && order > PPC_MIN_HPT_ORDER) {
+ /* If that doesn't work, try the kernel page allocator */
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
__GFP_NOWARN, order - PAGE_SHIFT);
- if (!hpt)
- --order;
+ if (hpt)
+ break;
+ --order;
}
if (!hpt)
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index f0c51c5..0c4633c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -126,9 +126,9 @@ static int __init early_parse_hpt_count(char *p)
}
early_param("kvm_hpt_count", early_parse_hpt_count);
-struct kvmppc_linear_info *kvm_alloc_hpt(void)
+struct kvmppc_linear_info *kvm_alloc_hpt(long order)
{
- return kvm_alloc_linear(KVM_LINEAR_HPT, kvm_hpt_order);
+ return kvm_alloc_linear(KVM_LINEAR_HPT, order);
}
EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
--
1.7.10.rc3.219.g53414
* [PATCH 3/3] KVM: PPC: Book3S HV: Add command-line option for amount of KVM linear memory
2012-09-12 0:34 [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory Paul Mackerras
2012-09-12 0:35 ` [PATCH 1/3] KVM: PPC: Book3S HV: Add a more " Paul Mackerras
2012-09-12 0:36 ` [PATCH 2/3] KVM: PPC: Book3S HV: Allocate user-requested size of HPT from preallocated pool Paul Mackerras
@ 2012-09-12 0:36 ` Paul Mackerras
2012-09-13 23:32 ` [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for " Alexander Graf
3 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2012-09-12 0:36 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, kvm
This adds a kernel command line option to allow the user to specify how
much memory should be reserved in early boot for use for hashed page
tables (HPTs) and real mode areas (RMAs) for KVM guests. The option is
called "kvm_memory" and the amount can be specified as an absolute
amount (for example, "kvm_memory=128M") or as a percentage of system
RAM (for example, "kvm_memory=5%").
If the option is not given, it defaults to 3%, but this is only
allocated on systems where KVM can run in HV mode. In particular it
isn't allocated when the kernel is running as a guest, whether under
KVM or PowerVM.
The amount actually allocated is the larger of the amount specified with
the kvm_memory option, and the amount specified with the existing
kvm_rma_count, kvm_rma_size and kvm_hpt_count options. The
kvm_rma_count and kvm_hpt_count options default to 0.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_hv_builtin.c | 40 ++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 0c4633c..2cebd02 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -431,6 +431,29 @@ static void kvm_release_linear(struct kvmppc_linear_info *ri)
}
/*
+ * Default to reserving 3% of RAM
+ * (it only gets reserved if HV KVM is possible on this processor).
+ */
+static u64 kvm_memory = 3;
+static int kvm_memory_percent = 1;
+
+static int __init early_parse_kvm_memory(char *p)
+{
+ char *endp;
+
+ if (!p)
+ return 1;
+
+ kvm_memory = memparse(p, &endp);
+ kvm_memory_percent = 0;
+ if (*endp == '%')
+ kvm_memory_percent = 1;
+
+ return 0;
+}
+early_param("kvm_memory", early_parse_kvm_memory);
+
+/*
* Called at boot time while the bootmem allocator is active,
* to allocate contiguous physical memory for the hash page
* tables for guests.
@@ -458,6 +481,23 @@ void __init kvm_linear_init(void)
total += kvm_rma_count * kvm_rma_size;
}
+ /*
+ * See if an explicit amount or percentage is requested;
+ * if so treat it as a minimum.
+ */
+ if (kvm_memory) {
+ u64 memsize = max_pfn << PAGE_SHIFT;
+
+ if (!kvm_memory_percent) {
+ if (kvm_memory < memsize && kvm_memory > total)
+ total = kvm_memory;
+ } else if (kvm_memory < 100) {
+ memsize = (memsize * kvm_memory) / 100;
+ if (memsize > total)
+ total = memsize;
+ }
+ }
+
if (!total)
return;
--
1.7.10.rc3.219.g53414
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-09-12 0:34 [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory Paul Mackerras
` (2 preceding siblings ...)
2012-09-12 0:36 ` [PATCH 3/3] KVM: PPC: Book3S HV: Add command-line option for amount of KVM linear memory Paul Mackerras
@ 2012-09-13 23:32 ` Alexander Graf
2012-09-14 8:11 ` Paul Mackerras
3 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2012-09-13 23:32 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm-ppc, kvm
On 12.09.2012, at 02:34, Paul Mackerras wrote:
> This series of 3 patches makes it possible for guests to allocate
> whatever size of HPT they need from linear memory preallocated at
> boot, rather than being restricted to a single size of HPT (by
> default, 16MB) and having to use the kernel page allocator for
> anything else -- which in practice limits them to at most 16MB given
> the default value for the maximum page order. Instead of allocating
> many individual pieces of memory, this allocates a single contiguous
> area and uses a simple bitmap-based allocator to hand out pieces of it
> as required.
Have you tried to play with CMA for this? It sounds like it could buy us exactly what we need.
Alex
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-09-13 23:32 ` [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for " Alexander Graf
@ 2012-09-14 8:11 ` Paul Mackerras
2012-09-14 12:13 ` Alexander Graf
0 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2012-09-14 8:11 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, kvm
On Fri, Sep 14, 2012 at 01:32:23AM +0200, Alexander Graf wrote:
>
> On 12.09.2012, at 02:34, Paul Mackerras wrote:
>
> > This series of 3 patches makes it possible for guests to allocate
> > whatever size of HPT they need from linear memory preallocated at
> > boot, rather than being restricted to a single size of HPT (by
> > default, 16MB) and having to use the kernel page allocator for
> > anything else -- which in practice limits them to at most 16MB given
> > the default value for the maximum page order. Instead of allocating
> > many individual pieces of memory, this allocates a single contiguous
> > area and uses a simple bitmap-based allocator to hand out pieces of it
> > as required.
>
> Have you tried to play with CMA for this? It sounds like it could buy us exactly what we need.
Interesting, I hadn't noticed that there. I had a bit of a look at
it, and it's certainly in the right general direction, however it
would need some changes to do what we need. It limits the alignment
to at most 512 pages, i.e. 2MB with 4k pages or 32MB with 64k pages,
but we need RMAs of 64MB to 256MB for PPC970 and they have to be
aligned on their size, as do the HPTs for PPC970.
Secondly, it has a link with the page allocator that I don't fully
understand, but it seems from the comments in alloc_contig_range()
(mm/page_alloc.c) that you can allocate at most MAX_ORDER_NR_PAGES
pages at once, and that defaults to 16MB for ppc64, which isn't nearly
enough. If that's true then it would make it unusable for this.
Paul.
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-09-14 8:11 ` Paul Mackerras
@ 2012-09-14 12:13 ` Alexander Graf
2012-09-14 12:45 ` Paul Mackerras
0 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2012-09-14 12:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm-ppc, KVM list, linux-mm, m.nazarewicz
On 14.09.2012, at 10:11, Paul Mackerras wrote:
> On Fri, Sep 14, 2012 at 01:32:23AM +0200, Alexander Graf wrote:
>>
>> On 12.09.2012, at 02:34, Paul Mackerras wrote:
>>
>>> This series of 3 patches makes it possible for guests to allocate
>>> whatever size of HPT they need from linear memory preallocated at
>>> boot, rather than being restricted to a single size of HPT (by
>>> default, 16MB) and having to use the kernel page allocator for
>>> anything else -- which in practice limits them to at most 16MB given
>>> the default value for the maximum page order. Instead of allocating
>>> many individual pieces of memory, this allocates a single contiguous
>>> area and uses a simple bitmap-based allocator to hand out pieces of it
>>> as required.
>>
>> Have you tried to play with CMA for this? It sounds like it could buy us exactly what we need.
>
> Interesting, I hadn't noticed that there. I had a bit of a look at
> it, and it's certainly in the right general direction, however it
> would need some changes to do what we need. It limits the alignment
> to at most 512 pages, i.e. 2MB with 4k pages or 32MB with 64k pages,
> but we need RMAs of 64MB to 256MB for PPC970 and they have to be
> aligned on their size, as do the HPTs for PPC970.
>
> Secondly, it has a link with the page allocator that I don't fully
> understand, but it seems from the comments in alloc_contig_range()
> (mm/page_alloc.c) that you can allocate at most MAX_ORDER_NR_PAGES
> pages at once, and that defaults to 16MB for ppc64, which isn't nearly
> enough. If that's true then it would make it unusable for this.
So do you think it makes more sense to reimplement a large page allocator in KVM, as this patch set does, or improve CMA to get us really big chunks of linear memory?
Let's ask the Linux mm guys too :). Maybe they have an idea.
Alex
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-09-14 12:13 ` Alexander Graf
@ 2012-09-14 12:45 ` Paul Mackerras
2012-09-14 13:15 ` Alexander Graf
0 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2012-09-14 12:45 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, KVM list, linux-mm, m.nazarewicz
On Fri, Sep 14, 2012 at 02:13:37PM +0200, Alexander Graf wrote:
> So do you think it makes more sense to reimplement a large page allocator in KVM, as this patch set does, or improve CMA to get us really big chunks of linear memory?
>
> Let's ask the Linux mm guys too :). Maybe they have an idea.
I asked the authors of CMA, and apparently it's not limited to
MAX_ORDER as I feared. It has the advantage that the memory can be
used for other things such as page cache when it's not needed, but not
for immovable allocations such as kmalloc. I'm going to try it out.
It will need a patch to increase the maximum alignment it allows.
Paul.
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-09-14 12:45 ` Paul Mackerras
@ 2012-09-14 13:15 ` Alexander Graf
2012-10-26 1:17 ` Paul Mackerras
0 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2012-09-14 13:15 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm-ppc, KVM list, linux-mm, mina86
On 14.09.2012, at 14:45, Paul Mackerras wrote:
> On Fri, Sep 14, 2012 at 02:13:37PM +0200, Alexander Graf wrote:
>
>> So do you think it makes more sense to reimplement a large page allocator in KVM, as this patch set does, or improve CMA to get us really big chunks of linear memory?
>>
>> Let's ask the Linux mm guys too :). Maybe they have an idea.
>
> I asked the authors of CMA, and apparently it's not limited to
> MAX_ORDER as I feared. It has the advantage that the memory can be
> used for other things such as page cache when it's not needed, but not
> for immovable allocations such as kmalloc. I'm going to try it out.
> It will need a patch to increase the maximum alignment it allows.
Awesome. Thanks a lot. I'd really prefer if we can stick to generic Linux solutions rather than invent our own :).
Alex
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-09-14 13:15 ` Alexander Graf
@ 2012-10-26 1:17 ` Paul Mackerras
2012-10-30 9:12 ` Alexander Graf
0 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2012-10-26 1:17 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm-ppc, KVM list, linux-mm, mina86
On Fri, Sep 14, 2012 at 03:15:32PM +0200, Alexander Graf wrote:
>
> On 14.09.2012, at 14:45, Paul Mackerras wrote:
>
> > On Fri, Sep 14, 2012 at 02:13:37PM +0200, Alexander Graf wrote:
> >
> >> So do you think it makes more sense to reimplement a large page allocator in KVM, as this patch set does, or improve CMA to get us really big chunks of linear memory?
> >>
> >> Let's ask the Linux mm guys too :). Maybe they have an idea.
> >
> > I asked the authors of CMA, and apparently it's not limited to
> > MAX_ORDER as I feared. It has the advantage that the memory can be
> > used for other things such as page cache when it's not needed, but not
> > for immovable allocations such as kmalloc. I'm going to try it out.
> > It will need a patch to increase the maximum alignment it allows.
>
> Awesome. Thanks a lot. I'd really prefer if we can stick to generic Linux solutions rather than invent our own :).
Turns out there is a difficulty with this. When we have a guest page
that we want to pin in memory, and that page happens to have been
allocated within the CMA region, we would need to migrate it out of
the CMA region before pinning it, since otherwise it would reduce the
amount of contiguous memory available. But it appears that there
isn't any way to do that.
Paul.
* Re: [PATCH 0/3] KVM: PPC: Book3S HV: More flexible allocator for linear memory
2012-10-26 1:17 ` Paul Mackerras
@ 2012-10-30 9:12 ` Alexander Graf
0 siblings, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2012-10-30 9:12 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm-ppc, KVM list, linux-mm, mina86
On 26.10.2012, at 03:17, Paul Mackerras wrote:
> On Fri, Sep 14, 2012 at 03:15:32PM +0200, Alexander Graf wrote:
>>
>> On 14.09.2012, at 14:45, Paul Mackerras wrote:
>>
>>> On Fri, Sep 14, 2012 at 02:13:37PM +0200, Alexander Graf wrote:
>>>
>>>> So do you think it makes more sense to reimplement a large page allocator in KVM, as this patch set does, or improve CMA to get us really big chunks of linear memory?
>>>>
>>>> Let's ask the Linux mm guys too :). Maybe they have an idea.
>>>
>>> I asked the authors of CMA, and apparently it's not limited to
>>> MAX_ORDER as I feared. It has the advantage that the memory can be
>>> used for other things such as page cache when it's not needed, but not
>>> for immovable allocations such as kmalloc. I'm going to try it out.
>>> It will need a patch to increase the maximum alignment it allows.
>>
>> Awesome. Thanks a lot. I'd really prefer if we can stick to generic Linux solutions rather than invent our own :).
>
> Turns out there is a difficulty with this. When we have a guest page
> that we want to pin in memory, and that page happens to have been
> allocated within the CMA region, we would need to migrate it out of
> the CMA region before pinning it, since otherwise it would reduce the
> amount of contiguous memory available. But it appears that there
> isn't any way to do that.
How does this work for other users of CMA? I can't possibly believe that we only ever want a static amount of contiguous memory on the system.
Alex