From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59905)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1X5BbF-0000hy-76
	for qemu-devel@nongnu.org; Thu, 10 Jul 2014 06:29:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1X5Bb8-00037w-Qt
	for qemu-devel@nongnu.org; Thu, 10 Jul 2014 06:29:13 -0400
Message-ID: <53BE6AF1.1020905@suse.de>
Date: Thu, 10 Jul 2014 12:29:05 +0200
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <53BCF352.7070005@redhat.com>
	<1404914381-9953-1-git-send-email-aik@ozlabs.ru>
In-Reply-To: <1404914381-9953-1-git-send-email-aik@ozlabs.ru>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v2] spapr: Enable use of huge pages
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org


On 09.07.14 15:59, Alexey Kardashevskiy wrote:
> On 07/09/2014 05:46 PM, Paolo Bonzini wrote:
>> On 09/07/2014 07:57, Alexey Kardashevskiy wrote:
>>> 0b183fc87 "memory: move mem_path handling to
>>> memory_region_allocate_system_memory" disabled -mem-path use for all
>>> machines that do not use memory_region_allocate_system_memory() to
>>> register RAM. Since SPAPR uses memory_region_init_ram(), the huge pages
>>> support was disabled for it.
>>>
>>> This replaces memory_region_init_ram()+vmstate_register_ram_global() with
>>> memory_region_allocate_system_memory() to get huge pages back.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Hu Tao <hutao@cn.fujitsu.com>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>   hw/ppc/spapr.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index a23c0f0..8fa9f7e 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -1337,8 +1337,8 @@ static void ppc_spapr_init(MachineState *machine)
>>>           ram_addr_t nonrma_base = rma_alloc_size;
>>>           ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
>>>
>>> -        memory_region_init_ram(ram, NULL, "ppc_spapr.ram", nonrma_size);
>>> -        vmstate_register_ram_global(ram);
>>> +        memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
>>> +                                             nonrma_size);
>> The reason why I didn't do this in the simple way is that depending on the
>> value of nonrma_base you may get smaller hugepages than you wanted.
>>
>> For example, if the hugepage size is 1G but nonrma_base is 32M, you will
>> not be able to get a page size larger than 32M.
>>
>> Depending on the value of nonrma_base, it may be better to allocate the
>> whole spapr->ram_limit to ppc_spapr.ram, and just ignore the first part of it.
>>
>> I see in target-ppc/kvm.c that rma_alloc_size is capped to 256M, and in
>> practice it is 128M (arch/powerpc/kvm/book3s_hv_builtin.c).  Considering
>> that Linux overcommits so the memory isn't lost in the non-hugepage case, I
>> think it's better to just waste the 128M of address space.
>>
>> Paolo
>>
>>>           memory_region_add_subregion(sysmem, nonrma_base, ram);
>>>       }
> Did you mean something like below? If so, I have to change MR tree and
> place RMA under RAM, I guess.
> I'll give it a try tomorrow on bare PPC970.
>
> ---
>   hw/ppc/spapr.c       | 19 ++++++++++++-------
>   target-ppc/kvm.c     |  9 +--------
>   target-ppc/kvm_ppc.h |  2 +-
>   3 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a23c0f0..47ae6c1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1223,6 +1223,7 @@ static void ppc_spapr_init(MachineState *machine)
>       int i;
>       MemoryRegion *sysmem = get_system_memory();
>       MemoryRegion *ram = g_new(MemoryRegion, 1);
> +    MemoryRegion *rma_region;
>       hwaddr rma_alloc_size;
>       hwaddr node0_size = (nb_numa_nodes > 1) ? numa_info[0].node_mem : ram_size;
>       uint32_t initrd_base = 0;
> @@ -1230,6 +1231,7 @@ static void ppc_spapr_init(MachineState *machine)
>       long load_limit, rtas_limit, fw_size;
>       bool kernel_le = false;
>       char *filename;
> +    void *rma = NULL;
>   
>       msi_supported = true;
>   
> @@ -1239,7 +1241,7 @@ static void ppc_spapr_init(MachineState *machine)
>       cpu_ppc_hypercall = emulate_spapr_hypercall;
>   
>       /* Allocate RMA if necessary */
> -    rma_alloc_size = kvmppc_alloc_rma("ppc_spapr.rma", sysmem);
> +    rma_alloc_size = kvmppc_alloc_rma(&rma);
>   
>       if (rma_alloc_size == -1) {
>           hw_error("qemu: Unable to create RMA\n");
> @@ -1333,13 +1335,16 @@ static void ppc_spapr_init(MachineState *machine)
>   
>       /* allocate RAM */
>       spapr->ram_limit = ram_size;
> -    if (spapr->ram_limit > rma_alloc_size) {
> -        ram_addr_t nonrma_base = rma_alloc_size;
> -        ram_addr_t nonrma_size = spapr->ram_limit - rma_alloc_size;
> +    memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
> +                                         spapr->ram_limit);
> +    memory_region_add_subregion(sysmem, 0, ram);
>   
> -        memory_region_init_ram(ram, NULL, "ppc_spapr.ram", nonrma_size);
> -        vmstate_register_ram_global(ram);
> -        memory_region_add_subregion(sysmem, nonrma_base, ram);
> +    if (rma_alloc_size && rma) {
> +        rma_region = g_new(MemoryRegion, 1);
> +        memory_region_init_ram_ptr(rma_region, NULL, "ppc_spapr.rma",
> +                                   rma_alloc_size, rma);
> +        vmstate_register_ram_global(rma_region);
> +        memory_region_add_subregion(sysmem, 0, rma_region);
>       }
>   
>       filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 995706a..9ca14d2 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -1582,13 +1582,11 @@ int kvmppc_smt_threads(void)
>   }
>   
>   #ifdef TARGET_PPC64
> -off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
> +off_t kvmppc_alloc_rma(void **rma)
>   {
> -    void *rma;
>       off_t size;
>       int fd;
>       struct kvm_allocate_rma ret;
> -    MemoryRegion *rma_region;
>   
>       /* If cap_ppc_rma == 0, contiguous RMA allocation is not supported
>        * if cap_ppc_rma == 1, contiguous RMA allocation is supported, but
> @@ -1617,11 +1615,6 @@ off_t kvmppc_alloc_rma(const char *name, MemoryRegion *sysmem)
>           return -1;
>       };
>   
> -    rma_region = g_new(MemoryRegion, 1);
> -    memory_region_init_ram_ptr(rma_region, NULL, name, size, rma);
> -    vmstate_register_ram_global(rma_region);
> -    memory_region_add_subregion(sysmem, 0, rma_region);
> -

I don't see where you set *rma here.
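
To illustrate what I would expect, here is a minimal standalone sketch of the
out-parameter pattern. It is not the real kvmppc_alloc_rma(): the name
alloc_rma_sketch() and its size argument are made up for illustration, and
anonymous memory stands in for the mmap() of the fd that KVM returns. The
point is only that the function has to store the mapping into *rma before it
returns the size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for kvmppc_alloc_rma(void **rma).
 * Assumption: anonymous memory replaces the KVM-provided fd so that the
 * sketch runs anywhere. Returns the mapped size, 0 if nothing was
 * requested, or -1 on error.
 */
static off_t alloc_rma_sketch(void **rma, size_t size)
{
    void *p;

    assert(rma != NULL);
    *rma = NULL;                /* give the caller a defined value on every path */

    if (size == 0) {
        return 0;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    *rma = p;                   /* hand the mapping back to the caller */
    return (off_t)size;
}
```

With something along these lines, the "rma_alloc_size && rma" check in
ppc_spapr_init() would have a pointer it can actually rely on.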

Apart from that, while I think that with hugetlbfs we might actually
waste a few MB of RAM, I don't think that's a real problem for systems
that require an RMA. So semantically the change works well for me.
Please verify that it works though :).
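
For reference, Paolo's alignment argument quoted above can be written down as
a small calculation. This is only a sketch of how I read his point: the helper
name max_backing_page_size() is hypothetical, and it assumes page sizes are
powers of two and that a region can only be mapped with pages no larger than
the natural alignment of its guest-physical base.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest page size usable for a RAM region that
 * starts at guest-physical address `base`, when the backing store offers
 * pages of `hugepage_size` bytes. Assumption: both are powers of two (or
 * base is 0), and the usable size is limited by the alignment of `base`.
 */
static uint64_t max_backing_page_size(uint64_t base, uint64_t hugepage_size)
{
    uint64_t base_align;

    if (base == 0) {
        return hugepage_size;           /* address 0 is aligned to anything */
    }

    base_align = base & (~base + 1);    /* lowest set bit = natural alignment */
    return base_align < hugepage_size ? base_align : hugepage_size;
}
```

With nonrma_base at 32M and 1G huge pages this gives 32M, which is Paolo's
example. Mapping the whole of spapr->ram_limit at offset 0 keeps the full
huge page size, at the cost of the part that the RMA region overlays.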


Alex