public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo@kvack.org>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Marcelo Tosatti <marcelo@kvack.org>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Avi Kivity <avi@qumranet.com>
Subject: Re: [PATCH] QEMU/KVM: large page support
Date: Sat, 23 Feb 2008 19:54:07 -0300	[thread overview]
Message-ID: <20080223225407.GA8465@dmt> (raw)
In-Reply-To: <47C06607.80200@codemonkey.ws>

Hi Anthony, 

Thanks for your comments.

On Sat, Feb 23, 2008 at 12:29:27PM -0600, Anthony Liguori wrote:

> In general, I don't think it causes any real harm if we always align the 
> ram address to a large page boundary.  If we aren't on Linux (and can't 
> determine what the large page size is), we can just set hpagesize to 
> getpagesize().  I think there's a good reason for this that I'll explain 
> below.

I thought about doing that (gets rid of the 4GB+ special casing) but we
lose the ability to compact smaller allocations in a single largepage.

Right now the VGA BIOS and the BIOS fit in the same largepage, for
example.

> > +
> > +void *alloc_huge_area(unsigned long memory, const char *path)
> > +{
> > +	void *area;
> > +	int fd;
> > +	char *filename;
> > +	char *tmpfile = "/kvm.XXXXXX";
> > +
> > +	filename = qemu_malloc(4096);
> > +	if (!filename) 
> > +		return NULL;
> > +
> > +	memset(filename, 0, 4096);
> > +	strncpy(filename, path, 4096 - strlen(tmpfile) - 1);
> > +	strcat(filename, tmpfile);
> > +	
> > +	hpagesize = gethugepagesize() * 1024;
> > +	if (!hpagesize)
> > +		return NULL;
> > +
> > +	mkstemp(filename);
> >   
> 
> mkstemp returns a file descriptor so the following open is not required.

Right, will fix.

> > +	fd = open(filename, O_RDWR);
> > +	if (fd < 0) {
> > +		perror("open");
> > +		hpagesize = 0;
> > +		exit(0);
> > +	}
> > +	memory = (memory+hpagesize-1) & ~(hpagesize-1);
> >   
> 
> I'm a little surprised that hugetlbfs doesn't require an ftruncate() 
> before the mmap().  Does an ftruncate() do any harm?  If so, it would be 
> better to have one.

Its not needed because hugetlbfs automatically adjusts the inode size on
demand.

Don't think it will do any harm and its compliant with normal
filesystems.

> > +	area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> >   
> 
> I think a MAP_SHARED may have some advantages in that it then becomes 
> possible to pass this file descriptor around to other processes so they 
> can mmap() the same memory region.  I don't know if that works with 
> hugetlbfs but it certainly does with tmpfs.  My thinking is that this 
> code can be made generic so it works with either hugetlbfs or tmpfs.

Yes, makes sense.

> Furthermore, I think it would be interesting if we defaulted to trying 
> to create the memory file in something like /dev/kvm-mem or something 
> more appropriately named.  An administrator can then either mount a 
> hugetlbfs or tmpfs mount there.  We still probably want to provide an 
> option to override where to create the file and we want to be able to 
> fall back to a normal malloc() of course but at least this makes it 
> possible for the distros to set things up so that it Just Work without 
> the user having to understand things like hugetlbfs and tmpfs.

I like the idea. I'll change "-hugetlb-path" to "-mem-path" for now (and
test it with tmpfs). Not so fond of hardcoding a path in QEMU though.

> Maybe we'll even see something that merges the two filesystems in the 
> future so that if a huge page allocation fails, it falls back to 
> creating a normal tmpfs file.  Perhaps that's a reasonable mount option 
> to add to hugetlbfs.

> Instead of registering an atexit() handler, I think it would be better 
> to unlink immediately after doing the mkstemp().  This reduces the 
> possibility of leaking the file in the event of catastrophe (like a kill 
> -SIGKILL).

Right, will fix.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

  reply	other threads:[~2008-02-23 22:54 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-23 14:48 [PATCH] QEMU/KVM: large page support Marcelo Tosatti
2008-02-23 18:29 ` Anthony Liguori
2008-02-23 22:54   ` Marcelo Tosatti [this message]
2008-02-23 22:56     ` Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080223225407.GA8465@dmt \
    --to=marcelo@kvack.org \
    --cc=anthony@codemonkey.ws \
    --cc=avi@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox