From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [PATCH] QEMU/KVM: large page support Date: Sat, 23 Feb 2008 19:54:07 -0300 Message-ID: <20080223225407.GA8465@dmt> References: <20080223144812.GB1040@dmt> <47C06607.80200@codemonkey.ws> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , kvm-devel , Avi Kivity To: Anthony Liguori Return-path: Content-Disposition: inline In-Reply-To: <47C06607.80200@codemonkey.ws> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: kvm-devel-bounces@lists.sourceforge.net Errors-To: kvm-devel-bounces@lists.sourceforge.net List-Id: kvm.vger.kernel.org Hi Anthony, Thanks for your comments. On Sat, Feb 23, 2008 at 12:29:27PM -0600, Anthony Liguori wrote: > In general, I don't think it causes any real harm if we always align the > ram address to a large page boundary. If we aren't on Linux (and can't > determine what the large page size is), we can just set hpagesize to > getpagesize(). I think there's a good reason for this that I'll explain > below. I thought about doing that (gets rid of the 4GB+ special casing) but we lose the ability to compact smaller allocations in a single largepage. Right now the VGA BIOS and the BIOS fit in the same largepage, for example. > > + > > +void *alloc_huge_area(unsigned long memory, const char *path) > > +{ > > + void *area; > > + int fd; > > + char *filename; > > + char *tmpfile = "/kvm.XXXXXX"; > > + > > + filename = qemu_malloc(4096); > > + if (!filename) > > + return NULL; > > + > > + memset(filename, 0, 4096); > > + strncpy(filename, path, 4096 - strlen(tmpfile) - 1); > > + strcat(filename, tmpfile); > > + > > + hpagesize = gethugepagesize() * 1024; > > + if (!hpagesize) > > + return NULL; > > + > > + mkstemp(filename); > > > > mkstemp returns a file descriptor so the following open is not required. Right, will fix. > > + fd = open(filename, O_RDWR); > > + if (fd < 0) { > > + perror("open"); > > + hpagesize = 0; > > + exit(0); > > + } > > + memory = (memory+hpagesize-1) & ~(hpagesize-1); > > > > I'm a little surprised that hugetlbfs doesn't require an ftruncate() > before the mmap(). Does an ftruncate() do any harm? If so, it would be > better to have one. Its not needed because hugetlbfs automatically adjusts the inode size on demand. Don't think it will do any harm and its compliant with normal filesystems. > > + area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); > > > > I think a MAP_SHARED may have some advantages in that it then becomes > possible to pass this file descriptor around to other processes so they > can mmap() the same memory region. I don't know if that works with > hugetlbfs but it certainly does with tmpfs. My thinking is that this > code can be made generic so it works with either hugetlbfs or tmpfs. Yes, makes sense. > Furthermore, I think it would be interesting if we defaulted to trying > to create the memory file in something like /dev/kvm-mem or something > more appropriately named. An administrator can then either mount a > hugetlbfs or tmpfs mount there. We still probably want to provide an > option to override where to create the file and we want to be able to > fall back to a normal malloc() of course but at least this makes it > possible for the distros to set things up so that it Just Work without > the user having to understand things like hugetlbfs and tmpfs. I like the idea. I'll change "-hugetlb-path" to "-mem-path" for now (and test it with tmpfs). Not so fond of hardcoding a path in QEMU though. > Maybe we'll even see something that merges the two filesystems in the > future so that if a huge page allocation fails, it falls back to > creating a normal tmpfs file. Perhaps that's a reasonable mount option > to add to hugetlbfs. > Instead of registering an atexit() handler, I think it would be better > to unlink immediately after doing the mkstemp(). This reduces the > possibility of leaking the file in the event of catastrophe (like a kill > -SIGKILL). Right, will fix. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/