All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: john cooper <john.cooper@third-harmonic.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, john.cooper@redhat.com
Subject: Re: patch: qemu + hugetlbfs..
Date: Thu, 10 Jul 2008 12:58:07 -0500	[thread overview]
Message-ID: <48764DAF.6060502@codemonkey.ws> (raw)
In-Reply-To: <48763B86.6060402@third-harmonic.com>

john cooper wrote:
> Anthony Liguori wrote:
>> Marcelo Tosatti wrote:
>>> This is Linux's behaviour for all filesystems. There is no error
>>> checking on MAP_POPULATE's attempt to prefault pages.
>>>   
>>
>> I thought that mmap() will fail with hugetlbfs if enough large pages 
>> can't be reserved?
>
> This appears to be the case for MAP_SHARED.  For MAP_PRIVATE
> the accounting is side-stepped, the result of which was the
> best-effort prealloc/populate behavior we had see seen.
>
> Marcelo and I batted this around some yesterday.  And there
> doesn't appear to be any concern using MAP_SHARED for the
> phys_ram map as potential access to the backing file is
> still regulated by filesystem permissions.
>
> Doing so dispenses with the syscall/useraddr hack to fault
> in pages since mmap() fails as expected in the case
> sufficient pages are unavailable immediately to back
> MAP_SHARED pages.  We also decided to defer the map-fallback
> logic.  This is essentially behavior we'd like to see
> provided by kvm, ideally on a dynamic (guest fault) basis
> and was technically beyond the issue we were trying to
> target.
>
> Attached is the resulting patch which also addresses the
> other concerns which arose.
>
> -john
>
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -45,6 +45,7 @@
>  #include <signal.h>
>  #include <time.h>
>  #include <errno.h>
> +#include <asm/param.h>
>   

I don't think this is necessary anymore.  Depending on a Linux headers 
breaks the QEMU build on other unices so it's a bad thing.

>  #include <sys/time.h>
>  #include <zlib.h>
>  
> @@ -234,6 +235,7 @@ int autostart = 1;
>  int time_drift_fix = 0;
>  unsigned int kvm_shadow_memory = 0;
>  const char *mem_path = NULL;
> +int mem_prealloc = 1;	/* force preallocation of physical target memory */
>  int hpagesize = 0;
>  const char *cpu_vendor_string;
>  #ifdef TARGET_ARM
> @@ -7809,7 +7811,10 @@ static void help(int exitcode)
>  #endif
>             "-tdf            inject timer interrupts that got lost\n"
>             "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
> -           "-mem-path       set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
> +           "-mem-path       set the path to hugetlbfs/tmpfs mounted directory, also\n"
> +           "                enables allocation of guest memory with huge pages\n"
> +           "-mem-prealloc   toggles preallocation of huge page physical memory at\n"
> +           "                startup when -mem-path is specified.  Default is enabled.\n"
>   

This really isn't huge page specific.  It should work equally well with 
tmpfs, no?

>  	   "-option-rom rom load a file, rom, into the option ROM space\n"
>  #ifdef TARGET_SPARC
>             "-prom-env variable=value  set OpenBIOS nvram variables\n"
> @@ -7932,6 +7937,7 @@ enum {
>      QEMU_OPTION_tdf,
>      QEMU_OPTION_kvm_shadow_memory,
>      QEMU_OPTION_mempath,
> +    QEMU_OPTION_mem_prealloc
>  };
>  
>  typedef struct QEMUOption {
> @@ -8059,6 +8065,7 @@ const QEMUOption qemu_options[] = {
>      { "startdate", HAS_ARG, QEMU_OPTION_startdate },
>      { "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
>      { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
> +    { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
>      { NULL },
>  };
>  
> @@ -8276,11 +8283,13 @@ static int gethugepagesize(void)
>      return hugepagesize;
>  }
>  
> -void *alloc_mem_area(unsigned long memory, const char *path)
> +/* attempt to allocate memory mmap'ed to mem_path
> + */
> +void *alloc_hpage_mem(unsigned long memory, const char *path)
>  {
>      char *filename;
>      void *area;
> -    int fd;
> +    int fd, flags;
>  
>      if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
>  	return NULL;
> @@ -8308,26 +8317,24 @@ void *alloc_mem_area(unsigned long memor
>       */
>      ftruncate(fd, memory);
>  
> -    area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> -    if (area == MAP_FAILED) {
> -	perror("mmap");
> -	close(fd);
> -	return NULL;
> -    }
> -
> -    return area;
> +    /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> +     * MAP_PRIVATE is requested.  For mem_prealloc we mmap as MAP_SHARED
> +     * to sidestep this quirk.
> +     */
> +    flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
> +    area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
> +    if (area != MAP_FAILED)
> +	return (area);
> +    perror("alloc_hpage_mem: can't mmap hugetlbfs pages");
> +    close(fd);
> +    return (NULL);
>  }
>  
> -void *qemu_alloc_physram(unsigned long memory)
> +/* allocate guest memory as requested
> + */
> +void *qemu_alloc_physram(unsigned long size)
>  {
> -    void *area = NULL;
> -
> -    if (mem_path)
> -	area = alloc_mem_area(memory, mem_path);
> -    if (!area)
> -	area = qemu_vmalloc(memory);
> -
> -    return area;
> +    return (mem_path ? alloc_hpage_mem(size, mem_path) : qemu_vmalloc(size));
>  }
>   

I'd prefer if you didn't convert from a if/else to conditional.  hpage 
is a misnomer too as we aren't actually dependent on huge pages (this 
code should work equally well for tmpfs).

Regards,

Anthony Liguori

>  int main(int argc, char **argv)
> @@ -8962,6 +8969,9 @@ int main(int argc, char **argv)
>              case QEMU_OPTION_mempath:
>  		mem_path = optarg;
>  		break;
> +            case QEMU_OPTION_mem_prealloc:
> +		mem_prealloc = !mem_prealloc;
> +		break;
>              case QEMU_OPTION_name:
>                  qemu_name = optarg;
>                  break;
>   


  reply	other threads:[~2008-07-10 17:59 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
2008-07-08 23:09 ` Anthony Liguori
2008-07-09  0:23   ` john cooper
2008-07-09  1:08     ` Anthony Liguori
2008-07-09 17:03       ` Marcelo Tosatti
2008-07-09 17:11         ` Anthony Liguori
2008-07-10 16:40           ` john cooper
2008-07-10 17:58             ` Anthony Liguori [this message]
2008-07-10 20:16               ` john cooper
2008-07-10 20:47                 ` Anthony Liguori
2008-07-10 21:12                   ` john cooper
2008-07-10 21:38                     ` Anthony Liguori
2008-08-25 23:05                       ` Resend: " john cooper
2008-08-26  8:11                         ` Avi Kivity
2008-08-27  4:13                           ` john cooper
2009-01-16  2:19                         ` john cooper
2009-01-20 10:29                           ` Avi Kivity
2009-01-23 21:21                             ` john cooper
2009-02-05 15:42                               ` Avi Kivity
2009-02-05 16:12                                 ` Marcelo Tosatti
2009-02-05 16:15                                   ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48764DAF.6060502@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=john.cooper@redhat.com \
    --cc=john.cooper@third-harmonic.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.