Re: Resend: patch: qemu + hugetlbfs..

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: john cooper <john.cooper@third-harmonic.com>
To: kvm@vger.kernel.org, aliguori@us.ibm.com
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	john.cooper@redhat.com, avi@redhat.com
Subject: Re: Resend: patch: qemu + hugetlbfs..
Date: Thu, 15 Jan 2009 21:19:55 -0500	[thread overview]
Message-ID: <496FEECB.2060208@third-harmonic.com> (raw)
In-Reply-To: <48B33AAD.8000508@third-harmonic.com>

[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]

This trivial patch never did manage to find its way
in.  Marcelo called it to my attention earlier in
the week.  I've tweaked it to apply to kvm-83 and
the resulting patch is attached.  I've left the
prior e-mail discussion below for reference.

-john


john cooper wrote:
> This patch from over a month ago doesn't seem to have
> made it into kvm-73 and may have been lost in the
> shuffle.  Attached is essentially the same patch but
> as applied to kvm-73, and validated relative to that
> version.
> 
> In a nutshell the intention here is to allow
> preallocation of guest huge page backed memory at
> qemu initialization time to avoid a quirk in the
> kernel's huge page accounting allowing overcommit
> of huge pages.  Failure of the kernel to resolve a
> guest fault to overcommitted huge page memory during
> runtime results in sigkill termination of the guest.
> This patch provides the option of avoiding such
> behavior at the cost of up-front preallocation of
> physical huge pages backing the guest.
> 
> -john
> 
> 
> Anthony Liguori wrote:
>> john cooper wrote:
>>> Anthony Liguori wrote:
>>>> john cooper wrote:
>>>>> As it currently exists alloc_hpage_mem() is tied to
>>>>> the notion of huge page allocation as it will reference
>>>>> gethugepagesize() irrespective of *mem_path.  So even
>>>>> in the case of tmpfs backed files, if the host kernel
>>>>> has been configured with CONFIG_HUGETLBFS we will wind
>>>>> up doing allocations of /dev/shm mapped files at
>>>>> /proc/meminfo:Hugepagesize granularity.
>>>>
>>>> Which is fine.  It just means we round -m values up to even numbers.
>>>
>>> Well, yes it will round the allocation.  But from a
>>> minimally sufficient 4KB boundary to that of 4MB/2MB
>>> relative to a 32/64 bit x86 host which is excessive.
>>>
>>>>> Probably not what was intended but probably not too
>>>>> much of a concern as "-mem-path /dev/shm" is likely
>>>>> only used in debug of this flag and associated logic.
>>>>> I don't see it currently being worth the trouble to
>>>>> correct from a squeaky clean POV, and doing so may
>>>>> drag in far more than the header file we've just
>>>>> booted above to deal with this architecture/config
>>>>> dependency.
>>>>
>>>> Renaming a function to a name that's less accurate seems bad to me.  
>>>> I don't mean to be pedantic, but it seems like a strange thing to 
>>>> do.  I prefer it the way it was before.
>>>
>>> I don't see any harm reverting the name.  But I do
>>> believe it is largely cosmetic as given the above,
>>> the current code does require some work to make it
>>> independent of huge page assumptions.  Update attached.
>>>
>>> -john
>>
>> Looks good to me.
>>
>> Acked-by: Anthony Liguori <aliguori@us.ibm.com>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 


-- 
john.cooper@third-harmonic.com

[-- Attachment #2: prealloc.diff-090115 --]
[-- Type: text/plain, Size: 3552 bytes --]

 kernel/x86/Kbuild |    4 ++--
 qemu/vl.c         |   27 ++++++++++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)
=================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -237,6 +237,7 @@ int semihosting_enabled = 0;
 int time_drift_fix = 0;
 unsigned int kvm_shadow_memory = 0;
 const char *mem_path = NULL;
+int mem_prealloc = 1;	/* force preallocation of physical target memory */
 int hpagesize = 0;
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
@@ -4116,7 +4117,10 @@ static void help(int exitcode)
 #endif
            "-tdf            inject timer interrupts that got lost\n"
            "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
-           "-mem-path       set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+           "-mem-path       set the path to hugetlbfs/tmpfs mounted directory, also\n"
+           "                enables allocation of guest memory with huge pages\n"
+           "-mem-prealloc   toggles preallocation of -mem-path backed physical memory\n"
+           "                at startup.  Default is enabled.\n"
 	   "-option-rom rom load a file, rom, into the option ROM space\n"
 #ifdef TARGET_SPARC
            "-prom-env variable=value  set OpenBIOS nvram variables\n"
@@ -4246,6 +4250,7 @@ enum {
     QEMU_OPTION_tdf,
     QEMU_OPTION_kvm_shadow_memory,
     QEMU_OPTION_mempath,
+    QEMU_OPTION_mem_prealloc
 };
 
 typedef struct QEMUOption {
@@ -4381,6 +4386,7 @@ static const QEMUOption qemu_options[] =
     { "icount", HAS_ARG, QEMU_OPTION_icount },
     { "incoming", HAS_ARG, QEMU_OPTION_incoming },
     { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+    { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
     { NULL },
 };
 
@@ -4662,7 +4668,7 @@ void *alloc_mem_area(size_t memory, unsi
 {
     char *filename;
     void *area;
-    int fd;
+    int fd, flags;
 
     if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
 	return NULL;
@@ -4690,13 +4696,17 @@ void *alloc_mem_area(size_t memory, unsi
      */
     ftruncate(fd, memory);
 
-    area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+    /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+     * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
+     * to sidestep this quirk.
+     */
+    flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+    area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
     if (area == MAP_FAILED) {
-	perror("mmap");
-	close(fd);
-	return NULL;
+        perror("alloc_mem_area: can't mmap hugetlbfs pages");
+        close(fd);
+        return (NULL);
     }
-
     *len = memory;
     return area;
 }
@@ -5377,6 +5387,9 @@ int main(int argc, char **argv, char **e
             case QEMU_OPTION_mempath:
 		mem_path = optarg;
 		break;
+            case QEMU_OPTION_mem_prealloc:
+		mem_prealloc = !mem_prealloc;
+		break;
             case QEMU_OPTION_name:
                 qemu_name = optarg;
                 break;
=================================================================
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e
 ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 kvm-objs += kvm_trace.o
 endif
-ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+ifeq ($(CONFIG_IOMMU_API),y)
+kvm-objs += iommu.o
 endif
 kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
 kvm-amd-objs := svm.o ../external-module-compat.o

next prev parent reply	other threads:[~2009-01-16  2:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
2008-07-08 23:09 ` Anthony Liguori
2008-07-09  0:23   ` john cooper
2008-07-09  1:08     ` Anthony Liguori
2008-07-09 17:03       ` Marcelo Tosatti
2008-07-09 17:11         ` Anthony Liguori
2008-07-10 16:40           ` john cooper
2008-07-10 17:58             ` Anthony Liguori
2008-07-10 20:16               ` john cooper
2008-07-10 20:47                 ` Anthony Liguori
2008-07-10 21:12                   ` john cooper
2008-07-10 21:38                     ` Anthony Liguori
2008-08-25 23:05                       ` Resend: " john cooper
2008-08-26  8:11                         ` Avi Kivity
2008-08-27  4:13                           ` john cooper
2009-01-16  2:19                         ` john cooper [this message]
2009-01-20 10:29                           ` Avi Kivity
2009-01-23 21:21                             ` john cooper
2009-02-05 15:42                               ` Avi Kivity
2009-02-05 16:12                                 ` Marcelo Tosatti
2009-02-05 16:15                                   ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=496FEECB.2060208@third-harmonic.com \
    --to=john.cooper@third-harmonic.com \
    --cc=aliguori@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=john.cooper@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox