All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo@kvack.org>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Marcelo Tosatti <marcelo@kvack.org>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Avi Kivity <avi@qumranet.com>
Subject: Re: [PATCH] QEMU/KVM: large page support
Date: Sat, 23 Feb 2008 19:54:07 -0300	[thread overview]
Message-ID: <20080223225407.GA8465@dmt> (raw)
In-Reply-To: <47C06607.80200@codemonkey.ws>

Hi Anthony, 

Thanks for your comments.

On Sat, Feb 23, 2008 at 12:29:27PM -0600, Anthony Liguori wrote:

> In general, I don't think it causes any real harm if we always align the 
> ram address to a large page boundary.  If we aren't on Linux (and can't 
> determine what the large page size is), we can just set hpagesize to 
> getpagesize().  I think there's a good reason for this that I'll explain 
> below.

I thought about doing that (gets rid of the 4GB+ special casing) but we
lose the ability to compact smaller allocations in a single largepage.

Right now the VGA BIOS and the BIOS fit in the same largepage, for
example.

> > +
> > +void *alloc_huge_area(unsigned long memory, const char *path)
> > +{
> > +	void *area;
> > +	int fd;
> > +	char *filename;
> > +	char *tmpfile = "/kvm.XXXXXX";
> > +
> > +	filename = qemu_malloc(4096);
> > +	if (!filename) 
> > +		return NULL;
> > +
> > +	memset(filename, 0, 4096);
> > +	strncpy(filename, path, 4096 - strlen(tmpfile) - 1);
> > +	strcat(filename, tmpfile);
> > +	
> > +	hpagesize = gethugepagesize() * 1024;
> > +	if (!hpagesize)
> > +		return NULL;
> > +
> > +	mkstemp(filename);
> >   
> 
> mkstemp returns a file descriptor so the following open is not required.

Right, will fix.

> > +	fd = open(filename, O_RDWR);
> > +	if (fd < 0) {
> > +		perror("open");
> > +		hpagesize = 0;
> > +		exit(0);
> > +	}
> > +	memory = (memory+hpagesize-1) & ~(hpagesize-1);
> >   
> 
> I'm a little surprised that hugetlbfs doesn't require an ftruncate() 
> before the mmap().  Does an ftruncate() do any harm?  If so, it would be 
> better to have one.

Its not needed because hugetlbfs automatically adjusts the inode size on
demand.

Don't think it will do any harm and its compliant with normal
filesystems.

> > +	area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> >   
> 
> I think a MAP_SHARED may have some advantages in that it then becomes 
> possible to pass this file descriptor around to other processes so they 
> can mmap() the same memory region.  I don't know if that works with 
> hugetlbfs but it certainly does with tmpfs.  My thinking is that this 
> code can be made generic so it works with either hugetlbfs or tmpfs.

Yes, makes sense.

> Furthermore, I think it would be interesting if we defaulted to trying 
> to create the memory file in something like /dev/kvm-mem or something 
> more appropriately named.  An administrator can then either mount a 
> hugetlbfs or tmpfs mount there.  We still probably want to provide an 
> option to override where to create the file and we want to be able to 
> fall back to a normal malloc() of course but at least this makes it 
> possible for the distros to set things up so that it Just Work without 
> the user having to understand things like hugetlbfs and tmpfs.

I like the idea. I'll change "-hugetlb-path" to "-mem-path" for now (and
test it with tmpfs). Not so fond of hardcoding a path in QEMU though.

> Maybe we'll even see something that merges the two filesystems in the 
> future so that if a huge page allocation fails, it falls back to 
> creating a normal tmpfs file.  Perhaps that's a reasonable mount option 
> to add to hugetlbfs.

> Instead of registering an atexit() handler, I think it would be better 
> to unlink immediately after doing the mkstemp().  This reduces the 
> possibility of leaking the file in the event of catastrophe (like a kill 
> -SIGKILL).

Right, will fix.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

  reply	other threads:[~2008-02-23 22:54 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-23 14:48 [PATCH] QEMU/KVM: large page support Marcelo Tosatti
2008-02-23 18:29 ` Anthony Liguori
2008-02-23 22:54   ` Marcelo Tosatti [this message]
2008-02-23 22:56     ` Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080223225407.GA8465@dmt \
    --to=marcelo@kvack.org \
    --cc=anthony@codemonkey.ws \
    --cc=avi@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.