* [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Sasha Levin @ 2011-12-12 14:47 UTC
To: penberg; +Cc: mingo, gorcunov, asias.hejun, kvm, ajsween, Sasha Levin
This patch mmaps the guest kernel into its own memory slot instead of reading
it into guest memory.
The advantages are:
- Smaller memory footprint (same effect as KSM if running multiple guests)
- Faster loading of larger kernels.
Suggested-by: "Sweeney, Andrew John" <ajsween@sandia.gov>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
tools/kvm/x86/include/kvm/kvm-arch.h | 4 ++
tools/kvm/x86/kvm.c | 60 +++++++++++++++++++++++++++-------
2 files changed, 52 insertions(+), 12 deletions(-)
diff --git a/tools/kvm/x86/include/kvm/kvm-arch.h b/tools/kvm/x86/include/kvm/kvm-arch.h
index 686b1b8..3f7a311 100644
--- a/tools/kvm/x86/include/kvm/kvm-arch.h
+++ b/tools/kvm/x86/include/kvm/kvm-arch.h
@@ -35,6 +35,10 @@ struct kvm {
u64 ram_size;
void *ram_start;
+ int bz_fd;
+ void *bz_start;
+ u32 bz_len;
+
bool nmi_disabled;
bool single_step;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index da4a6b6..42d7810 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -9,6 +9,7 @@
#include <asm/bootparam.h>
#include <linux/kvm.h>
+#include <linux/kernel.h>
#include <sys/types.h>
#include <sys/ioctl.h>
@@ -93,24 +94,51 @@ void kvm__init_ram(struct kvm *kvm)
void *host_mem;
if (kvm->ram_size < KVM_32BIT_GAP_START) {
- /* Use a single block of RAM for 32bit RAM */
-
+ /* Memory between 0 and where the kernel starts */
+ u64 bzl = ALIGN(kvm->bz_len, PAGE_SIZE);
phys_start = 0;
- phys_size = kvm->ram_size;
+ phys_size = BZ_KERNEL_START;
host_mem = kvm->ram_start;
kvm__register_mem(kvm, phys_start, phys_size, host_mem);
- } else {
- /* First RAM range from zero to the PCI gap: */
+ /* Mapped kernel */
+ phys_start = BZ_KERNEL_START;
+ phys_size = bzl;
+ host_mem = kvm->bz_start;
+
+ kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+
+ /* Rest of the memory */
+ phys_start = BZ_KERNEL_START + bzl;
+ phys_size = kvm->ram_size - (BZ_KERNEL_START + bzl);
+ host_mem = kvm->ram_start + (BZ_KERNEL_START + bzl);
+
+ kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+ } else {
+ /* Memory between 0 and where the kernel starts */
+ u64 bzl = ALIGN(kvm->bz_len, PAGE_SIZE);
phys_start = 0;
- phys_size = KVM_32BIT_GAP_START;
+ phys_size = BZ_KERNEL_START;
host_mem = kvm->ram_start;
kvm__register_mem(kvm, phys_start, phys_size, host_mem);
- /* Second RAM range from 4GB to the end of RAM: */
+ /* Mapped kernel */
+ phys_start = BZ_KERNEL_START;
+ phys_size = bzl;
+ host_mem = kvm->bz_start;
+ kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+
+ /* Rest of the memory until the 4GB gap */
+ phys_start = BZ_KERNEL_START + bzl;
+ phys_size = KVM_32BIT_GAP_START - (BZ_KERNEL_START + bzl);
+ host_mem = kvm->ram_start + (BZ_KERNEL_START + bzl);
+
+ kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+
+ /* RAM range from 4GB to the end of RAM */
phys_start = 0x100000000ULL;
phys_size = kvm->ram_size - phys_size;
host_mem = kvm->ram_start + phys_start;
@@ -232,7 +260,8 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
struct boot_params *kern_boot;
unsigned long setup_sects;
struct boot_params boot;
- size_t cmdline_size;
+ struct stat st;
+ size_t cmdline_size, setup_end;
ssize_t setup_size;
void *p;
int nr;
@@ -242,6 +271,9 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
* memory layout.
*/
+ if (fstat(fd_kernel, &st) < 0)
+ die_perror("fstat");
+
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
@@ -268,11 +300,15 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
if (read(fd_kernel, p, setup_size) != setup_size)
die_perror("read");
- /* copy vmlinux.bin to BZ_KERNEL_START*/
- p = guest_flat_to_host(kvm, BZ_KERNEL_START);
+ /* mmap the actual kernel */
+ kvm->bz_fd = dup(fd_kernel);
+ kvm->bz_len = st.st_size;
+ setup_end = ALIGN(setup_size - PAGE_SIZE, PAGE_SIZE); /* Need it aligned to PAGE_SIZE */
+ kvm->bz_start = mmap(NULL, kvm->bz_len, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE, kvm->bz_fd, setup_end);
- while ((nr = read(fd_kernel, p, 65536)) > 0)
- p += nr;
+ /* NOP everything before the kernel start */
+ memset(kvm->bz_start, 0x90, setup_size - setup_end);
p = guest_flat_to_host(kvm, BOOT_CMDLINE_OFFSET);
if (kernel_cmdline) {
--
1.7.8
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Pekka Enberg @ 2011-12-12 14:54 UTC
To: Sasha Levin; +Cc: mingo, gorcunov, asias.hejun, kvm, ajsween
On Mon, Dec 12, 2011 at 4:47 PM, Sasha Levin <levinsasha928@gmail.com> wrote:
> This patch mmaps the guest kernel into its own memory slot instead of reading
> it into guest memory.
>
> The advantages are:
> - Smaller memory footprint (same effect as KSM if running multiple guests)
KSM isn't free, so this gets the same smaller footprint with fewer CPU
cycles. It's mostly useful for the special case of running tons of guests
using the same kernel image.
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Sasha Levin @ 2011-12-12 17:47 UTC
To: Pekka Enberg; +Cc: mingo, gorcunov, asias.hejun, kvm, ajsween
On Mon, 2011-12-12 at 16:54 +0200, Pekka Enberg wrote:
> On Mon, Dec 12, 2011 at 4:47 PM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > This patch mmaps the guest kernel into its own memory slot instead of reading
> > it into guest memory.
> >
> > The advantages are:
> > - Smaller memory footprint (same effect as KSM if running multiple guests)
>
> KSM isn't free, so this gets the same smaller footprint with fewer CPU
> cycles. It's mostly useful for the special case of running tons of guests
> using the same kernel image.
Another use case, which I first heard about only a couple of days ago, is
loading extremely large bzImages. Those images are 300MB+ in size and
come with a built-in filesystem. Apparently it's some sort of a live CD
variant.
--
Sasha.
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Andrew Walrond @ 2011-12-12 16:40 UTC
To: Sasha Levin; +Cc: Pekka Enberg, mingo, gorcunov, asias.hejun, kvm, ajsween
On Mon, Dec 12, 2011 at 07:47:35PM +0200, Sasha Levin wrote:
>
> Another use case, which I first heard about only a couple of days ago, is
> loading extremely large bzImages. Those images are 300MB+ in size and
> come with a built-in filesystem. Apparently it's some sort of a live CD
> variant.
>
Having a bootable one-file kernel/distro is useful for lots of reasons,
upgrading all my servers by replacing one file not the least of them ;)
But since I often run a pile of these simultaneously in VMs, I can vouch that
this functionality will be very useful indeed.
Andrew Walrond
Now if only Linus would accept the patch that makes rootfs a tmpfs instead of a ramfs...
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Pekka Enberg @ 2011-12-12 15:59 UTC
To: Sasha Levin; +Cc: mingo, gorcunov, asias.hejun, kvm, ajsween
On Mon, Dec 12, 2011 at 4:47 PM, Sasha Levin <levinsasha928@gmail.com> wrote:
> + /* mmap the actual kernel */
> + kvm->bz_fd = dup(fd_kernel);
> + kvm->bz_len = st.st_size;
> + setup_end = ALIGN(setup_size - PAGE_SIZE, PAGE_SIZE); /* Need it aligned to PAGE_SIZE */
> + kvm->bz_start = mmap(NULL, kvm->bz_len, PROT_READ | PROT_WRITE,
> + MAP_PRIVATE, kvm->bz_fd, setup_end);
>
> - while ((nr = read(fd_kernel, p, 65536)) > 0)
> - p += nr;
> + /* NOP everything before the kernel start */
> + memset(kvm->bz_start, 0x90, setup_size - setup_end);
So what's the deal with this NOP thing? It really needs a comment that
explains it all.
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Sasha Levin @ 2011-12-12 18:14 UTC
To: Pekka Enberg; +Cc: mingo, gorcunov, asias.hejun, kvm, ajsween
On Mon, 2011-12-12 at 17:59 +0200, Pekka Enberg wrote:
> On Mon, Dec 12, 2011 at 4:47 PM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > + /* mmap the actual kernel */
> > + kvm->bz_fd = dup(fd_kernel);
> > + kvm->bz_len = st.st_size;
> > + setup_end = ALIGN(setup_size - PAGE_SIZE, PAGE_SIZE); /* Need it aligned to PAGE_SIZE */
> > + kvm->bz_start = mmap(NULL, kvm->bz_len, PROT_READ | PROT_WRITE,
> > + MAP_PRIVATE, kvm->bz_fd, setup_end);
> >
> > - while ((nr = read(fd_kernel, p, 65536)) > 0)
> > - p += nr;
> > + /* NOP everything before the kernel start */
> > + memset(kvm->bz_start, 0x90, setup_size - setup_end);
>
> So what's the deal with this NOP thing? It really needs a comment that
> explains it all.
Right, I'll explain it here, and if it sounds right to you I'll add it
to the patch.
The actual kernel image starts somewhere inside the bzImage, and that
offset is not aligned to anything, so we can't mmap() starting exactly
at its beginning.
So what we do is mmap the kernel together with the fewer than PAGE_SIZE
bytes of setup code that precede it.
KVM expects page-aligned values for both the guest physical start
address of a memory slot and the corresponding userspace address. This
means we can't simply register an offset into the mapping we created,
since it won't be page aligned.
The solution is to NOP out the setup-code bytes right before the kernel
start. In practice this means that fewer than PAGE_SIZE NOPs execute
before the actual kernel code starts running.
--
Sasha.
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Avi Kivity @ 2011-12-12 18:03 UTC
To: Sasha Levin; +Cc: Pekka Enberg, mingo, gorcunov, asias.hejun, kvm, ajsween
On 12/12/2011 08:14 PM, Sasha Levin wrote:
> On Mon, 2011-12-12 at 17:59 +0200, Pekka Enberg wrote:
> > On Mon, Dec 12, 2011 at 4:47 PM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > + /* mmap the actual kernel */
> > > + kvm->bz_fd = dup(fd_kernel);
> > > + kvm->bz_len = st.st_size;
> > > + setup_end = ALIGN(setup_size - PAGE_SIZE, PAGE_SIZE); /* Need it aligned to PAGE_SIZE */
> > > + kvm->bz_start = mmap(NULL, kvm->bz_len, PROT_READ | PROT_WRITE,
> > > + MAP_PRIVATE, kvm->bz_fd, setup_end);
> > >
> > > - while ((nr = read(fd_kernel, p, 65536)) > 0)
> > > - p += nr;
> > > + /* NOP everything before the kernel start */
> > > + memset(kvm->bz_start, 0x90, setup_size - setup_end);
> >
> > So what's the deal with this NOP thing? It really needs a comment that
> > explains it all.
>
> Right, I'll explain it here, and if it sounds right to you I'll add it
> to the patch.
>
> The actual kernel image starts somewhere inside the bzImage, and that
> offset is not aligned to anything, so we can't mmap() starting exactly
> at its beginning.
>
> So what we do is mmap the kernel together with the fewer than PAGE_SIZE
> bytes of setup code that precede it.
>
> KVM expects page-aligned values for both the guest physical start
> address of a memory slot and the corresponding userspace address. This
> means we can't simply register an offset into the mapping we created,
> since it won't be page aligned.
>
> The solution is to NOP out the setup-code bytes right before the kernel
> start. In practice this means that fewer than PAGE_SIZE NOPs execute
> before the actual kernel code starts running.
Can't you just adjust rip to point to the starting code?
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Ingo Molnar @ 2011-12-12 17:18 UTC
To: Sasha Levin; +Cc: penberg, gorcunov, asias.hejun, kvm, ajsween
* Sasha Levin <levinsasha928@gmail.com> wrote:
> + kvm->bz_start = mmap(NULL, kvm->bz_len, PROT_READ | PROT_WRITE,
> + MAP_PRIVATE, kvm->bz_fd, setup_end);
>
> + /* NOP everything before the kernel start */
> + memset(kvm->bz_start, 0x90, setup_size - setup_end);
You should really, really think about the case where mmap()
fails.
Thanks,
Ingo
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Avi Kivity @ 2011-12-12 18:08 UTC
To: Sasha Levin; +Cc: penberg, mingo, gorcunov, asias.hejun, kvm, ajsween
On 12/12/2011 04:47 PM, Sasha Levin wrote:
> This patch mmaps the guest kernel into its own memory slot instead of reading
> it into guest memory.
>
> The advantages are:
> - Smaller memory footprint (same effect as KSM if running multiple guests)
> - Faster loading of larger kernels.
>
How many COW faults do you get when loading a kernel? It would be
interesting to try both SMP and UP guests.
Surprisingly, there is no tracepoint for that as far as I can see, but you
can probe do_wp_page() and count the hits.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Avi Kivity @ 2011-12-12 18:10 UTC
To: Sasha Levin; +Cc: penberg, mingo, gorcunov, asias.hejun, kvm, ajsween
On 12/12/2011 04:47 PM, Sasha Levin wrote:
> This patch mmaps the guest kernel into its own memory slot instead of reading
> it into guest memory.
>
> - } else {
> - /* First RAM range from zero to the PCI gap: */
>
> + /* Mapped kernel */
> + phys_start = BZ_KERNEL_START;
> + phys_size = bzl;
> + host_mem = kvm->bz_start;
> +
> + kvm__register_mem(kvm, phys_start, phys_size, host_mem);
> +
> + /* Rest of the memory */
> + phys_start = BZ_KERNEL_START + bzl;
> + phys_size = kvm->ram_size - (BZ_KERNEL_START + bzl);
> + host_mem = kvm->ram_start + (BZ_KERNEL_START + bzl);
> +
> + kvm__register_mem(kvm, phys_start, phys_size, host_mem);
>
You don't actually need separate slots for this (there is no requirement
that a slot correspond to a single VMA).
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory
From: Sasha Levin @ 2011-12-13 6:46 UTC
To: Avi Kivity; +Cc: penberg, mingo, gorcunov, asias.hejun, kvm, ajsween
On Mon, 2011-12-12 at 20:10 +0200, Avi Kivity wrote:
> On 12/12/2011 04:47 PM, Sasha Levin wrote:
> > This patch mmaps the guest kernel into its own memory slot instead of reading
> > it into guest memory.
> >
> > - } else {
> > - /* First RAM range from zero to the PCI gap: */
> >
> > + /* Mapped kernel */
> > + phys_start = BZ_KERNEL_START;
> > + phys_size = bzl;
> > + host_mem = kvm->bz_start;
> > +
> > + kvm__register_mem(kvm, phys_start, phys_size, host_mem);
> > +
> > + /* Rest of the memory */
> > + phys_start = BZ_KERNEL_START + bzl;
> > + phys_size = kvm->ram_size - (BZ_KERNEL_START + bzl);
> > + host_mem = kvm->ram_start + (BZ_KERNEL_START + bzl);
> > +
> > + kvm__register_mem(kvm, phys_start, phys_size, host_mem);
> >
>
>
> You don't actually need separate slots for this (there is no requirement
> that a slot correspond to a single VMA).
How exactly would I put it into one slot?
--
Sasha.