All of lore.kernel.org
 help / color / mirror / Atom feed
From: john cooper <john.cooper@third-harmonic.com>
To: kvm@vger.kernel.org, aliguori@us.ibm.com
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	john.cooper@redhat.com, avi@redhat.com
Subject: Re: Resend: patch: qemu + hugetlbfs..
Date: Thu, 15 Jan 2009 21:19:55 -0500	[thread overview]
Message-ID: <496FEECB.2060208@third-harmonic.com> (raw)
In-Reply-To: <48B33AAD.8000508@third-harmonic.com>

[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]

This trivial patch never did manage to find its way
in.  Marcelo called it to my attention earlier in
the week.  I've tweaked it to apply to kvm-83 and
the resulting patch is attached.  I've left the
prior e-mail discussion below for reference.

-john


john cooper wrote:
> This patch from over a month ago doesn't seem to have
> made it into kvm-73 and may have been lost in the
> shuffle.  Attached is essentially the same patch but
> as applied to kvm-73, and validated relative to that
> version.
> 
> In a nutshell the intention here is to allow
> preallocation of guest huge page backed memory at
> qemu initialization time to avoid a quirk in the
> kernel's huge page accounting allowing overcommit
> of huge pages.  Failure of the kernel to resolve a
> guest fault to overcommitted huge page memory during
> runtime results in sigkill termination of the guest.
> This patch provides the option of avoiding such
> behavior at the cost of up-front preallocation of
> physical huge pages backing the guest.
> 
> -john
> 
> 
> Anthony Liguori wrote:
>> john cooper wrote:
>>> Anthony Liguori wrote:
>>>> john cooper wrote:
>>>>> As it currently exists alloc_hpage_mem() is tied to
>>>>> the notion of huge page allocation as it will reference
>>>>> gethugepagesize() irrespective of *mem_path.  So even
>>>>> in the case of tmpfs backed files, if the host kernel
>>>>> has been configured with CONFIG_HUGETLBFS we will wind
>>>>> up doing allocations of /dev/shm mapped files at
>>>>> /proc/meminfo:Hugepagesize granularity.
>>>>
>>>> Which is fine.  It just means we round -m values up to even numbers.
>>>
>>> Well, yes it will round the allocation.  But from a
>>> minimally sufficient 4KB boundary to that of 4MB/2MB
>>> relative to a 32/64 bit x86 host which is excessive.
>>>
>>>>> Probably not what was intended but probably not too
>>>>> much of a concern as "-mem-path /dev/shm" is likely
>>>>> only used in debug of this flag and associated logic.
>>>>> I don't see it currently being worth the trouble to
>>>>> correct from a squeaky clean POV, and doing so may
>>>>> drag in far more than the header file we've just
>>>>> booted above to deal with this architecture/config
>>>>> dependency.
>>>>
>>>> Renaming a function to a name that's less accurate seems bad to me.  
>>>> I don't mean to be pedantic, but it seems like a strange thing to 
>>>> do.  I prefer it the way it was before.
>>>
>>> I don't see any harm reverting the name.  But I do
>>> believe it is largely cosmetic as given the above,
>>> the current code does require some work to make it
>>> independent of huge page assumptions.  Update attached.
>>>
>>> -john
>>
>> Looks good to me.
>>
>> Acked-by: Anthony Liguori <aliguori@us.ibm.com>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 


-- 
john.cooper@third-harmonic.com

[-- Attachment #2: prealloc.diff-090115 --]
[-- Type: text/plain, Size: 3552 bytes --]

 kernel/x86/Kbuild |    4 ++--
 qemu/vl.c         |   27 ++++++++++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)
=================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -237,6 +237,7 @@ int semihosting_enabled = 0;
 int time_drift_fix = 0;
 unsigned int kvm_shadow_memory = 0;
 const char *mem_path = NULL;
+int mem_prealloc = 1;	/* force preallocation of physical target memory */
 int hpagesize = 0;
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
@@ -4116,7 +4117,10 @@ static void help(int exitcode)
 #endif
            "-tdf            inject timer interrupts that got lost\n"
            "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
-           "-mem-path       set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+           "-mem-path       set the path to hugetlbfs/tmpfs mounted directory, also\n"
+           "                enables allocation of guest memory with huge pages\n"
+           "-mem-prealloc   toggles preallocation of -mem-path backed physical memory\n"
+           "                at startup.  Default is enabled.\n"
 	   "-option-rom rom load a file, rom, into the option ROM space\n"
 #ifdef TARGET_SPARC
            "-prom-env variable=value  set OpenBIOS nvram variables\n"
@@ -4246,6 +4250,7 @@ enum {
     QEMU_OPTION_tdf,
     QEMU_OPTION_kvm_shadow_memory,
     QEMU_OPTION_mempath,
+    QEMU_OPTION_mem_prealloc
 };
 
 typedef struct QEMUOption {
@@ -4381,6 +4386,7 @@ static const QEMUOption qemu_options[] =
     { "icount", HAS_ARG, QEMU_OPTION_icount },
     { "incoming", HAS_ARG, QEMU_OPTION_incoming },
     { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+    { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
     { NULL },
 };
 
@@ -4662,7 +4668,7 @@ void *alloc_mem_area(size_t memory, unsi
 {
     char *filename;
     void *area;
-    int fd;
+    int fd, flags;
 
     if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
 	return NULL;
@@ -4690,13 +4696,17 @@ void *alloc_mem_area(size_t memory, unsi
      */
     ftruncate(fd, memory);
 
-    area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+    /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+     * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
+     * to sidestep this quirk.
+     */
+    flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+    area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
     if (area == MAP_FAILED) {
-	perror("mmap");
-	close(fd);
-	return NULL;
+        perror("alloc_mem_area: can't mmap hugetlbfs pages");
+        close(fd);
+        return (NULL);
     }
-
     *len = memory;
     return area;
 }
@@ -5377,6 +5387,9 @@ int main(int argc, char **argv, char **e
             case QEMU_OPTION_mempath:
 		mem_path = optarg;
 		break;
+            case QEMU_OPTION_mem_prealloc:
+		mem_prealloc = !mem_prealloc;
+		break;
             case QEMU_OPTION_name:
                 qemu_name = optarg;
                 break;
=================================================================
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e
 ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 kvm-objs += kvm_trace.o
 endif
-ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+ifeq ($(CONFIG_IOMMU_API),y)
+kvm-objs += iommu.o
 endif
 kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
 kvm-amd-objs := svm.o ../external-module-compat.o

  parent reply	other threads:[~2009-01-16  2:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
2008-07-08 23:09 ` Anthony Liguori
2008-07-09  0:23   ` john cooper
2008-07-09  1:08     ` Anthony Liguori
2008-07-09 17:03       ` Marcelo Tosatti
2008-07-09 17:11         ` Anthony Liguori
2008-07-10 16:40           ` john cooper
2008-07-10 17:58             ` Anthony Liguori
2008-07-10 20:16               ` john cooper
2008-07-10 20:47                 ` Anthony Liguori
2008-07-10 21:12                   ` john cooper
2008-07-10 21:38                     ` Anthony Liguori
2008-08-25 23:05                       ` Resend: " john cooper
2008-08-26  8:11                         ` Avi Kivity
2008-08-27  4:13                           ` john cooper
2009-01-16  2:19                         ` john cooper [this message]
2009-01-20 10:29                           ` Avi Kivity
2009-01-23 21:21                             ` john cooper
2009-02-05 15:42                               ` Avi Kivity
2009-02-05 16:12                                 ` Marcelo Tosatti
2009-02-05 16:15                                   ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=496FEECB.2060208@third-harmonic.com \
    --to=john.cooper@third-harmonic.com \
    --cc=aliguori@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=john.cooper@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.