Re: [Qemu-devel] [RFC PATCH v2] spapr: Enable use of huge pages

From: Alexander Graf <agraf@suse.de>
To: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH v2] spapr: Enable use of huge pages
Date: Thu, 10 Jul 2014 12:29:05 +0200
Message-ID: <53BE6AF1.1020905@suse.de>
In-Reply-To: <1404914381-9953-1-git-send-email-aik@ozlabs.ru>


On 09.07.14 15:59, Alexey Kardashevskiy wrote:
> On 07/09/2014 05:46 PM, Paolo Bonzini wrote:
>> On 09/07/2014 07:57, Alexey Kardashevskiy wrote:
>>> 0b183fc87 "memory: move mem_path handling to
>>> memory_region_allocate_system_memory" disabled -mem-path use for all
>>> machines that do not use memory_region_allocate_system_memory() to
>>> register RAM. Since SPAPR uses memory_region_init_ram(), huge page
>>> support was disabled for it.
>>>
>>> This replaces memory_region_init_ram()+vmstate_register_ram_global() with
>>> memory_region_allocate_system_memory() to get huge pages back.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Hu Tao <hutao@cn.fujitsu.com>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>   hw/ppc/spapr.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index a23c0f0..8fa9f7e 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1337,8 +1337,8 @@ static void ppc_spapr_init(MachineState *machine)
>>>           ram_addr_t nonrma_base = rma_alloc_size;
>>>           ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
>>>
>>> -        memory_region_init_ram(ram, NULL, "ppc_spapr.ram", nonrma_size);
>>> -        vmstate_register_ram_global(ram);
>>> +        memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
>>> +                                             nonrma_size);
>> The reason why I didn't do this in the simple way is that depending on the
>> value of nonrma_base you may get smaller hugepages than you wanted.
>>
>> For example, if the hugepage size is 1G but nonrma_base is 32M, you will
>> not be able to get a page size larger than 32M.
>>
>> Depending on the value of nonrma_base, it may be better to allocate the
>> whole spapr->ram_limit to ppc_spapr.ram, and just ignore the first part of it.
>>
>> I see in target-ppc/kvm.c that rma_alloc_size is capped to 256M, and in
>> practice it is 128M (arch/powerpc/kvm/book3s_hv_builtin.c). Considering
>> that Linux overcommits, so the memory isn't lost in the non-hugepage case, I
>> think it's better to just waste the 128M of address space.
>>
>> Paolo
>>
>>>           memory_region_add_subregion(sysmem, nonrma_base, ram);
>>>       }
> Did you mean something like the patch below? If so, I guess I have to
> change the MR tree and place the RMA under RAM.
> I'll try it tomorrow on bare PPC970.
>
>
>
>
> ---
>   hw/ppc/spapr.c       | 19 ++++++++++++-------
>   target-ppc/kvm.c     |  9 +--------
>   target-ppc/kvm_ppc.h |  2 +-
>   3 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a23c0f0..47ae6c1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1223,6 +1223,7 @@ static void ppc_spapr_init(MachineState *machine)
>       int i;
>       MemoryRegion *sysmem = get_system_memory();
>       MemoryRegion *ram = g_new(MemoryRegion, 1);
> +    MemoryRegion *rma_region;
>       hwaddr rma_alloc_size;
>       hwaddr node0_size = (nb_numa_nodes > 1) ? numa_info[0].node_mem : ram_size;
>       uint32_t initrd_base = 0;
> @@ -1230,6 +1231,7 @@ static void ppc_spapr_init(MachineState *machine)
>       long load_limit, rtas_limit, fw_size;
>       bool kernel_le = false;
>       char *filename;
> +    void *rma = NULL;
>   
>       msi_supported = true;
>   
> @@ -1239,7 +1241,7 @@ static void ppc_spapr_init(MachineState *machine)
>       cpu_ppc_hypercall = emulate_spapr_hypercall;
>   
>       /* Allocate RMA if necessary */
> -    rma_alloc_size = kvmppc_alloc_rma("ppc_spapr.rma", sysmem);
> +    rma_alloc_size = kvmppc_alloc_rma(&rma);
>   
>       if (rma_alloc_size == -1) {
>           hw_error("qemu: Unable to create RMA\n");
> @@ -1333,13 +1335,16 @@ static void ppc_spapr_init(MachineState *machine)
>   
>       /* allocate RAM */
>       spapr->ram_limit = ram_size;
> -    if (spapr->ram_limit > rma_alloc_size) {
> -        ram_addr_t nonrma_base = rma_alloc_size;
> -        ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
> +    memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
> +                                         spapr->ram_limit);
> +    memory_region_add_subregion(sysmem, 0, ram);
>   
> -        memory_region_init_ram(ram, NULL, "ppc_spapr.ram", nonrma_size);
> -        vmstate_register_ram_global(ram);
> -        memory_region_add_subregion(sysmem, nonrma_base, ram);
> +    if (rma_alloc_size && rma) {
> +        rma_region = g_new(MemoryRegion, 1);
> +        memory_region_init_ram_ptr(rma_region, NULL, "ppc_spapr.rma",
> +                                   rma_alloc_size, rma);
> +        vmstate_register_ram_global(rma_region);
> +        memory_region_add_subregion(sysmem, 0, rma_region);
>       }
>   
>       filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 995706a..9ca14d2 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -1582,13 +1582,11 @@ int kvmppc_smt_threads(void)
>   }
>   
>   #ifdef TARGET_PPC64
> -off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
> +off_t kvmppc_alloc_rma(void **rma)
>   {
> -    void *rma;
>       off_t size;
>       int fd;
>       struct kvm_allocate_rma ret;
> -    MemoryRegion *rma_region;
>   
>       /* If cap_ppc_rma == 0, contiguous RMA allocation is not supported
>        * if cap_ppc_rma == 1, contiguous RMA allocation is supported, but
> @@ -1617,11 +1615,6 @@ off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
>           return -1;
>       };
>   
> -    rma_region = g_new(MemoryRegion, 1);
> -    memory_region_init_ram_ptr(rma_region, NULL, name, size, rma);
> -    vmstate_register_ram_global(rma_region);
> -    memory_region_add_subregion(sysmem, 0, rma_region);
> -

I don't see where you set *rma here.
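
To make the point concrete, here is a minimal sketch of the out-parameter
contract I would expect. It is not the real function: alloc_rma_sketch() and
its size argument are hypothetical, and malloc() stands in for the mmap() of
the KVM RMA fd. Only the return convention (0 = no RMA, -1 = failure,
otherwise the size) is taken from the existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for kvmppc_alloc_rma(void **rma). */
static off_t alloc_rma_sketch(void **rma, size_t size)
{
    void *buf;

    assert(rma != NULL);
    *rma = NULL;            /* caller sees a defined value on every path */

    if (size == 0) {
        return 0;           /* no contiguous RMA needed */
    }

    buf = malloc(size);     /* assumption: replaces the real mmap() call */
    if (buf == NULL) {
        return -1;
    }

    *rma = buf;             /* the store that the quoted hunk does not show */
    return (off_t)size;
}
```

With that in place, the "rma_alloc_size && rma" check in ppc_spapr_init()
would see a non-NULL pointer exactly when an RMA was allocated.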

Apart from that, while I think that with hugetlbfs we might actually
waste a few MB of RAM, I don't think that's a real problem for systems
that require an RMA. So semantically the change works well for me.
Please verify that it works, though :).
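
For reference, Paolo's alignment argument quoted above can be checked with
a bit of arithmetic. This is a simplified model, assuming the usable page
size is bounded by the natural alignment of the guest base address; both
helper names are made up for illustration and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr (its lowest set bit). */
static uint64_t natural_alignment(uint64_t addr)
{
    assert(addr != 0);
    return addr & (~addr + 1);
}

/* Hypothetical helper: largest page size usable for a region at 'base'. */
static uint64_t max_usable_page_size(uint64_t base, uint64_t hugepage_size)
{
    uint64_t align;

    if (base == 0) {
        return hugepage_size;   /* base 0 is aligned to every page size */
    }
    align = natural_alignment(base);
    return align < hugepage_size ? align : hugepage_size;
}
```

With a base of 32M and 1G huge pages this yields 32M, which matches the
example in the quoted mail; with the region mapped at 0 the full huge page
size is usable.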


Alex
