From: Marcelo Tosatti <marcelo@kvack.org>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Marcelo Tosatti <marcelo@kvack.org>,
kvm-devel <kvm-devel@lists.sourceforge.net>,
Avi Kivity <avi@qumranet.com>
Subject: Re: [PATCH] QEMU/KVM: large page support
Date: Sat, 23 Feb 2008 19:54:07 -0300
Message-ID: <20080223225407.GA8465@dmt>
In-Reply-To: <47C06607.80200@codemonkey.ws>
Hi Anthony,
Thanks for your comments.
On Sat, Feb 23, 2008 at 12:29:27PM -0600, Anthony Liguori wrote:
> In general, I don't think it causes any real harm if we always align the
> ram address to a large page boundary. If we aren't on Linux (and can't
> determine what the large page size is), we can just set hpagesize to
> getpagesize(). I think there's a good reason for this that I'll explain
> below.
I thought about doing that (it gets rid of the 4GB+ special casing), but
we would lose the ability to pack smaller allocations into a single
large page. Right now the VGA BIOS and the BIOS fit in the same large
page, for example.
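
For reference, the fallback Anthony describes and the rounding used in the
patch could be sketched roughly as below. This is only an illustration:
detect_hpagesize() and align_up() are hypothetical names, not functions from
the patch, and the sketch assumes the huge page size is read from the
"Hugepagesize:" line of /proc/meminfo on Linux. It uses
sysconf(_SC_PAGESIZE) as a stand-in for getpagesize().

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: huge page size in bytes, or the base page size
 * when the huge page size cannot be determined (e.g. not on Linux). */
static unsigned long detect_hpagesize(void)
{
    unsigned long kb = 0;
    char line[256];
    long base;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f) {
        while (fgets(line, sizeof(line), f)) {
            /* Only the "Hugepagesize:" line matches this format. */
            if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
                break;
        }
        fclose(f);
    }
    if (kb != 0)
        return kb * 1024;

    base = sysconf(_SC_PAGESIZE);
    return base > 0 ? (unsigned long)base : 4096UL;
}

/* Round size up to a multiple of align (align must be a power of two). */
static unsigned long align_up(unsigned long size, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

Always aligning every allocation with align_up(size, detect_hpagesize())
is what removes the special casing, at the cost of the packing mentioned
above.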
> > +
> > +void *alloc_huge_area(unsigned long memory, const char *path)
> > +{
> > +    void *area;
> > +    int fd;
> > +    char *filename;
> > +    char *tmpfile = "/kvm.XXXXXX";
> > +
> > +    filename = qemu_malloc(4096);
> > +    if (!filename)
> > +        return NULL;
> > +
> > +    memset(filename, 0, 4096);
> > +    strncpy(filename, path, 4096 - strlen(tmpfile) - 1);
> > +    strcat(filename, tmpfile);
> > +
> > +    hpagesize = gethugepagesize() * 1024;
> > +    if (!hpagesize)
> > +        return NULL;
> > +
> > +    mkstemp(filename);
> >
>
> mkstemp returns a file descriptor so the following open is not required.
Right, will fix.
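
A minimal sketch of what that fix might look like, assuming the file name is
built with snprintf() instead of strncpy()/strcat(). The helper name
create_backing_file() and its parameters are made up for illustration and do
not come from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a unique file under dir and return its open
 * descriptor, or -1 on error. buf receives the generated file name. */
static int create_backing_file(const char *dir, char *buf, size_t buflen)
{
    int n;

    assert(dir != NULL && buf != NULL);

    n = snprintf(buf, buflen, "%s/kvm.XXXXXX", dir);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* the path would have been truncated */

    /* mkstemp() replaces the XXXXXX, creates the file and returns a
     * descriptor open for reading and writing, so no open() is needed. */
    return mkstemp(buf);
}
```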
> > +    fd = open(filename, O_RDWR);
> > +    if (fd < 0) {
> > +        perror("open");
> > +        hpagesize = 0;
> > +        exit(0);
> > +    }
> > +    memory = (memory+hpagesize-1) & ~(hpagesize-1);
> >
>
> I'm a little surprised that hugetlbfs doesn't require an ftruncate()
> before the mmap(). Does an ftruncate() do any harm? If so, it would be
> better to have one.
It's not needed because hugetlbfs automatically adjusts the inode size on
demand.
I don't think an ftruncate() will do any harm, and it keeps the code
compatible with normal filesystems.
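
As a rough sketch of the ftruncate()-then-mmap() ordering under discussion:
map_backing_file() is a hypothetical name, and the sketch assumes the caller
has already rounded size up to the page size of the backing filesystem. On a
normal filesystem the ftruncate() is what makes the whole mapping safe to
touch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size the file behind fd, then map it.
 * Returns the mapping, or NULL on error. */
static void *map_backing_file(int fd, size_t size)
{
    void *area;

    assert(fd >= 0 && size > 0);

    /* Make sure the file is at least as large as the mapping. */
    if (ftruncate(fd, (off_t)size) < 0) {
        perror("ftruncate");
        return NULL;
    }

    /* MAP_PRIVATE mirrors the flag used in the patch under review. */
    area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (area == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return area;
}
```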
> > +    area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> >
>
> I think a MAP_SHARED may have some advantages in that it then becomes
> possible to pass this file descriptor around to other processes so they
> can mmap() the same memory region. I don't know if that works with
> hugetlbfs but it certainly does with tmpfs. My thinking is that this
> code can be made generic so it works with either hugetlbfs or tmpfs.
Yes, makes sense.
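
To illustrate the difference, here is a small sketch run against an ordinary
file (not tested against hugetlbfs). The helper child_write_is_visible() is
hypothetical, and fork() is used only as a stand-in for handing the file
descriptor to another process. The child maps the same file on its own and
writes one byte, and the parent checks whether it can see that write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a byte written by a child through its own mapping of fd is
 * visible in the parent's mapping, 0 if it is not, -1 on error.
 * flags is MAP_SHARED or MAP_PRIVATE. */
static int child_write_is_visible(int fd, size_t size, int flags)
{
    unsigned char *mine, *theirs;
    pid_t pid;
    int status, seen;

    assert(fd >= 0 && size > 0);

    if (ftruncate(fd, (off_t)size) < 0)
        return -1;
    mine = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (mine == MAP_FAILED)
        return -1;
    mine[0] = 0;

    pid = fork();
    if (pid < 0) {
        munmap(mine, size);
        return -1;
    }
    if (pid == 0) {
        /* Child: map the same file independently and write one byte. */
        theirs = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
        if (theirs == MAP_FAILED)
            _exit(1);
        theirs[0] = 0x5a;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        munmap(mine, size);
        return -1;
    }

    seen = (*(volatile unsigned char *)mine == 0x5a);
    munmap(mine, size);
    return seen;
}
```

With MAP_SHARED the parent sees the child's write. With MAP_PRIVATE each
process writes to its own copy-on-write page, so it does not.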
> Furthermore, I think it would be interesting if we defaulted to trying
> to create the memory file in something like /dev/kvm-mem or something
> more appropriately named. An administrator can then either mount a
> hugetlbfs or tmpfs mount there. We still probably want to provide an
> option to override where to create the file and we want to be able to
> fall back to a normal malloc() of course but at least this makes it
> possible for the distros to set things up so that it Just Works without
> the user having to understand things like hugetlbfs and tmpfs.
I like the idea. I'll change "-hugetlb-path" to "-mem-path" for now (and
test it with tmpfs). Not so fond of hardcoding a path in QEMU though.
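
A sketch of how a "-mem-path" style allocation with a malloc() fallback might
be structured. None of this is from the patch: alloc_from_path() and
alloc_guest_ram() are hypothetical names, the sketch assumes size is already
a multiple of the backing filesystem's page size, and it closes the
descriptor after mapping, which would have to change if the descriptor is to
be passed to other processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: back size bytes with a file created under mem_path
 * (a hugetlbfs or tmpfs mount, for instance). Returns NULL on failure. */
static void *alloc_from_path(const char *mem_path, size_t size)
{
    char name[4096];
    void *area;
    int fd, n;

    n = snprintf(name, sizeof(name), "%s/kvm.XXXXXX", mem_path);
    if (n < 0 || (size_t)n >= sizeof(name))
        return NULL;

    fd = mkstemp(name);
    if (fd < 0)
        return NULL;
    unlink(name);               /* nothing is left behind if we die */

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the file alive */
    return area == MAP_FAILED ? NULL : area;
}

/* Try the file-backed path first, then fall back to plain malloc().
 * *file_backed tells the caller which one it got. */
static void *alloc_guest_ram(const char *mem_path, size_t size,
                             int *file_backed)
{
    void *area = NULL;

    assert(size > 0 && file_backed != NULL);

    if (mem_path)
        area = alloc_from_path(mem_path, size);
    *file_backed = (area != NULL);
    if (!area)
        area = malloc(size);
    return area;
}
```

The caller needs file_backed to know whether to release the memory with
munmap() or free().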
> Maybe we'll even see something that merges the two filesystems in the
> future so that if a huge page allocation fails, it falls back to
> creating a normal tmpfs file. Perhaps that's a reasonable mount option
> to add to hugetlbfs.
> Instead of registering an atexit() handler, I think it would be better
> to unlink immediately after doing the mkstemp(). This reduces the
> possibility of leaking the file in the event of catastrophe (like a kill
> -SIGKILL).
Right, will fix.
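
The unlink-right-after-mkstemp() pattern could look something like the
sketch below. open_unlinked() is a hypothetical helper, not code from the
patch. The descriptor stays usable after the name is gone, and the kernel
frees the file once the last descriptor and mapping go away, even after a
SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: tmpl must end in "XXXXXX" and be writable.
 * Returns an open descriptor for a file that no longer has a name,
 * or -1 on error. */
static int open_unlinked(char *tmpl)
{
    int fd;

    assert(tmpl != NULL);

    fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;

    /* Drop the name immediately instead of relying on atexit(). */
    if (unlink(tmpl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```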
Thread overview: 4+ messages
2008-02-23 14:48 [PATCH] QEMU/KVM: large page support Marcelo Tosatti
2008-02-23 18:29 ` Anthony Liguori
2008-02-23 22:54 ` Marcelo Tosatti [this message]
2008-02-23 22:56 ` Anthony Liguori