public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: Marcelo Tosatti <marcelo@kvack.org>
Cc: kvm-devel <kvm-devel@lists.sourceforge.net>,
	Avi Kivity <avi@qumranet.com>
Subject: Re: [PATCH] QEMU/KVM: large page support
Date: Sat, 23 Feb 2008 12:29:27 -0600	[thread overview]
Message-ID: <47C06607.80200@codemonkey.ws> (raw)
In-Reply-To: <20080223144812.GB1040@dmt>

Hi Marcelo,

Marcelo Tosatti wrote:
> Add an option so the user can specify the hugetlbfs mounted path, with
> fallback to 4k pages on error.
>
> Align the 4GB+ memslot on large page boundary.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm-userspace.gitpara/qemu/hw/pc.c
> ===================================================================
> --- kvm-userspace.gitpara.orig/qemu/hw/pc.c
> +++ kvm-userspace.gitpara/qemu/hw/pc.c
> @@ -854,6 +854,15 @@ static void pc_init1(ram_addr_t ram_size
>      /* above 4giga memory allocation */
>      if (above_4g_mem_size > 0) {
>          ram_addr = qemu_ram_alloc(above_4g_mem_size);
> +	if (hpagesize) {
> +		if (ram_addr & (hpagesize-1)) {
> +			unsigned long aligned_addr;
> +			aligned_addr = (ram_addr + hpagesize - 1) &
> +					 ~(hpagesize-1);
> +			qemu_ram_alloc(aligned_addr - ram_addr);
> +			ram_addr = aligned_addr;
> +		}
> +	}
>   

In general, I don't think it causes any real harm if we always align the 
ram address to a large page boundary.  If we aren't on Linux (and can't 
determine what the large page size is), we can just set hpagesize to 
getpagesize().  I think there's a good reason for this that I'll explain 
below.

> +
> +void *alloc_huge_area(unsigned long memory, const char *path)
> +{
> +	void *area;
> +	int fd;
> +	char *filename;
> +	char *tmpfile = "/kvm.XXXXXX";
> +
> +	filename = qemu_malloc(4096);
> +	if (!filename) 
> +		return NULL;
> +
> +	memset(filename, 0, 4096);
> +	strncpy(filename, path, 4096 - strlen(tmpfile) - 1);
> +	strcat(filename, tmpfile);
> +	
> +	hpagesize = gethugepagesize() * 1024;
> +	if (!hpagesize)
> +		return NULL;
> +
> +	mkstemp(filename);
>   

mkstemp returns a file descriptor so the following open is not required.

> +	fd = open(filename, O_RDWR);
> +	if (fd < 0) {
> +		perror("open");
> +		hpagesize = 0;
> +		exit(0);
> +	}
> +	memory = (memory+hpagesize-1) & ~(hpagesize-1);
>   

I'm a little surprised that hugetlbfs doesn't require an ftruncate() 
before the mmap().  Does an ftruncate() do any harm?  If so, it would be 
better to have one.

> +	area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
>   

I think a MAP_SHARED may have some advantages in that it then becomes 
possible to pass this file descriptor around to other processes so they 
can mmap() the same memory region.  I don't know if that works with 
hugetlbfs but it certainly does with tmpfs.  My thinking is that this 
code can be made generic so it works with either hugetlbfs or tmpfs.

Furthermore, I think it would be interesting if we defaulted to trying 
to create the memory file in something like /dev/kvm-mem or something 
more appropriately named.  An administrator can then either mount a 
hugetlbfs or tmpfs mount there.  We still probably want to provide an 
option to override where to create the file and we want to be able to 
fall back to a normal malloc() of course but at least this makes it 
possible for the distros to set things up so that it Just Work without 
the user having to understand things like hugetlbfs and tmpfs.

Maybe we'll even see something that merges the two filesystems in the 
future so that if a huge page allocation fails, it falls back to 
creating a normal tmpfs file.  Perhaps that's a reasonable mount option 
to add to hugetlbfs.

> +	if (area == MAP_FAILED) {
> +		perror("mmap");
> +		hpagesize = 0;
> +		exit(0);
> +	}
> +
> +	hugetlbfile = filename;
> +	atexit(cleanup_hugetlb);
>   

Instead of registering an atexit() handler, I think it would be better 
to unlink immediately after doing the mkstemp().  This reduces the 
possibility of leaking the file in the event of catastrophe (like a kill 
-SIGKILL).

Regards,

Anthony Liguori

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

  reply	other threads:[~2008-02-23 18:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-23 14:48 [PATCH] QEMU/KVM: large page support Marcelo Tosatti
2008-02-23 18:29 ` Anthony Liguori [this message]
2008-02-23 22:54   ` Marcelo Tosatti
2008-02-23 22:56     ` Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47C06607.80200@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=avi@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=marcelo@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox