qemu-devel.nongnu.org archive mirror
From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: andre.przywara@amd.com, aliguori@us.ibm.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com,
	Wanlong Gao <gaowanlong@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
Date: Tue, 11 Jun 2013 15:22:13 +0800
Message-ID: <51B6D025.3040606@cn.fujitsu.com>
In-Reply-To: <20130605134505.GS2580@otherpad.lan.raisama.net>

On 06/05/2013 09:46 PM, Eduardo Habkost wrote:
> On Wed, Jun 05, 2013 at 11:58:25AM +0800, Wanlong Gao wrote:
>> Add monitor command mem-nodes to show the huge mapped
>> memory nodes locations.
>>
> 
> This is for machine consumption, so we need a QMP command.
> 
>> (qemu) info mem-nodes
>> /proc/14132/fd/13: 00002aaaaac00000-00002aaaeac00000: node0
>> /proc/14132/fd/13: 00002aaaeac00000-00002aab2ac00000: node1
>> /proc/14132/fd/14: 00002aab2ac00000-00002aab2b000000: node0
>> /proc/14132/fd/14: 00002aab2b000000-00002aab2b400000: node1
> 
> Are node0/node1 _host_ nodes?
> 
> How do I know what's the _guest_ address/node corresponding to each
> file/range above?
> 
> What I am really looking for is:
> 
>  * The correspondence between guest (virtual) NUMA nodes and guest
>    physical address ranges (it could be provided by the QMP version of
>    "info numa")

AFAIK, the guest NUMA nodes and guest physical address ranges are set
up by SeaBIOS, so we can't get this information from QEMU. I also think
this information is not useful for pinning memory ranges to host nodes.

>  * The correspondence between guest physical address ranges and ranges
>    inside the mapped files (so external tools could set the policy on
>    those files instead of requiring QEMU to set it directly)
> 
> I understand that your use case may require additional information and
> additional interfaces. But if we provide the information above we will
> allow external components set the policy on the hugetlbfs files before
> we add new interfaces required for your use case.

But file-backed memory is not a good fit for a host that runs many
virtual machines, and in this situation we can't handle anonymous THP yet.

And as I mentioned, the cross-NUMA-node access performance regression
is caused by PCI passthrough. It is a long-standing bug, so we should
also backport the host memory pinning patch to older QEMU versions to
resolve this performance problem.

Thanks,
Wanlong Gao

> 
> Also, what about making it conditional to OSes where we really know
> "/proc/<pid>/fd/<fd>" is available?
> 
> 
>>
>> Refer to the proposal of Eduardo and Daniel.
>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/93476
> 
>>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>>  monitor.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 45 insertions(+)
>>
>> diff --git a/monitor.c b/monitor.c
>> index eefc7f0..85c865f 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -74,6 +74,10 @@
>>  #endif
>>  #include "hw/lm32/lm32_pic.h"
>>  
>> +#if defined(CONFIG_NUMA)
>> +#include <numaif.h>
>> +#endif
>> +
>>  //#define DEBUG
>>  //#define DEBUG_COMPLETION
>>  
>> @@ -1759,6 +1763,38 @@ static void mem_info(Monitor *mon, const QDict *qdict)
>>  }
>>  #endif
>>  
>> +#if defined(CONFIG_NUMA)
>> +static void mem_nodes(Monitor *mon, const QDict *qdict)
>> +{
>> +    RAMBlock *block;
>> +    int prevnode, node;
>> +    unsigned long long c, start, area;
>> +    int fd;
>> +    int pid = getpid();
>> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>> +        if (!(fd = block->fd))
>> +            continue;
>> +        prevnode = -1;
>> +        start = 0;
>> +        area = (unsigned long long)block->host;
>> +        for (c = 0; c < block->length; c += TARGET_PAGE_SIZE) {
>> +            if (get_mempolicy(&node, NULL, 0, c + block->host,
>> +                              MPOL_F_ADDR | MPOL_F_NODE) < 0)
>> +                continue;
>> +            if (node == prevnode)
>> +                continue;
>> +            if (prevnode != -1)
>> +                monitor_printf(mon, "/proc/%d/fd/%d: %016Lx-%016Lx: node%d\n",
>> +                               pid, fd, start + area, c + area, prevnode);
>> +            prevnode = node;
>> +            start = c;
>> +         }
>> +         monitor_printf(mon, "/proc/%d/fd/%d: %016Lx-%016Lx: node%d\n",
>> +                        pid, fd, start + area, c + area, prevnode);
>> +    }
>> +}
>> +#endif
>> +
>>  #if defined(TARGET_SH4)
>>  
>>  static void print_tlb(Monitor *mon, int idx, tlb_t *tlb)
>> @@ -2567,6 +2603,15 @@ static mon_cmd_t info_cmds[] = {
>>          .mhandler.cmd = mem_info,
>>      },
>>  #endif
>> +#if defined(CONFIG_NUMA)
>> +    {
>> +        .name       = "mem-nodes",
>> +        .args_type  = "",
>> +        .params     = "",
>> +        .help       = "show the huge mapped memory nodes location",
>> +        .mhandler.cmd = mem_nodes,
>> +    },
>> +#endif
>>      {
>>          .name       = "mtree",
>>          .args_type  = "",
>> -- 
>> 1.8.3.rc2.10.g0c2b1cf
>>
> 
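
For reference, the range-coalescing step of mem_nodes() quoted above can be
sketched on its own, separated from the get_mempolicy() lookup. This is only
a minimal sketch and not the patch's code: struct node_range,
coalesce_nodes() and the "-1 means the lookup failed" convention are
hypothetical names chosen for illustration. Unlike the quoted loop, it ends a
run at a page whose lookup failed instead of folding that page into the
current run.

```c
#include <assert.h>
#include <stddef.h>

/* A run of consecutive pages that all sit on the same host node. */
struct node_range {
    size_t start_page;  /* index of the first page in the run */
    size_t end_page;    /* one past the index of the last page */
    int node;           /* host node id shared by the run */
};

/*
 * Coalesce per-page node ids into ranges.
 * page_node[i] is the host node of page i, or -1 if the lookup failed
 * (hypothetical convention for this sketch).
 * Writes at most max_out ranges to out and returns how many were written.
 */
static size_t coalesce_nodes(const int *page_node, size_t npages,
                             struct node_range *out, size_t max_out)
{
    size_t n = 0;
    size_t i = 0;

    assert(page_node != NULL && out != NULL);

    while (i < npages && n < max_out) {
        size_t j;

        if (page_node[i] < 0) {
            i++;            /* no answer for this page: it joins no run */
            continue;
        }
        /* Extend the run while the following pages report the same node. */
        for (j = i + 1; j < npages && page_node[j] == page_node[i]; j++) {
        }
        out[n].start_page = i;
        out[n].end_page = j;
        out[n].node = page_node[i];
        n++;
        i = j;
    }
    return n;
}
```

With per-page ids {0, 0, 1, 1, 1, -1, 0} this yields three ranges: pages
[0,2) on node 0, pages [2,5) on node 1 and pages [6,7) on node 0. Multiplying
the page indices by the page size and adding the block's host address would
give address ranges of the kind shown in the "info mem-nodes" output.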
