* [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect
@ 2020-07-25  7:26 Vishnu Dixit
  2020-07-27 17:33   ` Igor
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Vishnu Dixit @ 2020-07-25  7:26 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

I was trying to configure latencies and bandwidths between nodes in a
NUMA emulation using QEMU 5.0.0.

Host : Ubuntu 20.04 64 bit
Guest : Ubuntu 18.04 64 bit

The configured machine has 2 nodes. Each node has 2 CPUs and is
allocated 3 GB of memory. For a local access (i.e., from initiator 0 to
target 0, and from initiator 1 to target 1), the memory access latency
and bandwidth are set to 40 ns and 10 GB/s respectively. For a remote
access (i.e., from initiator 1 to target 0, and from initiator 0 to
target 1), they are set to 80 ns and 5 GB/s respectively.

The command line launch is as follows.

sudo x86_64-softmmu/qemu-system-x86_64  \
-machine hmat=on \
-boot c \
-enable-kvm \
-m 6G,slots=2,maxmem=7G \
-object memory-backend-ram,size=3G,id=m0 \
-object memory-backend-ram,size=3G,id=m1 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1 \
-smp 4,sockets=4,maxcpus=4  \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1 \
-numa cpu,node-id=1,socket-id=2 \
-numa cpu,node-id=1,socket-id=3 \
-numa dist,src=0,dst=1,val=20 \
-net nic \
-net user \
-hda testing.img \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
-numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
-numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G
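
The eight hmat-lb options above follow one pattern: one latency/bandwidth
pair for local accesses and another for remote ones. As a minimal sketch,
they could be generated rather than typed by hand. The helper name
hmat_lb_args and its argument order are hypothetical, not part of QEMU;
only the option strings it prints are taken from the command line above.

```shell
#!/bin/sh
# Hypothetical helper: print the -numa hmat-lb options for an N-node guest
# in which every local pair shares one latency/bandwidth and every remote
# pair shares another.
# Usage: hmat_lb_args NODES LOCAL_LAT LOCAL_BW REMOTE_LAT REMOTE_BW
hmat_lb_args() {
    nodes=$1 local_lat=$2 local_bw=$3 remote_lat=$4 remote_bw=$5
    i=0
    while [ "$i" -lt "$nodes" ]; do
        t=0
        while [ "$t" -lt "$nodes" ]; do
            if [ "$i" -eq "$t" ]; then
                lat=$local_lat bw=$local_bw      # initiator and target on the same node
            else
                lat=$remote_lat bw=$remote_bw    # cross-node access
            fi
            printf '%s\n' "-numa hmat-lb,initiator=$i,target=$t,hierarchy=memory,data-type=access-latency,latency=$lat"
            printf '%s\n' "-numa hmat-lb,initiator=$i,target=$t,hierarchy=memory,data-type=access-bandwidth,bandwidth=$bw"
            t=$((t + 1))
        done
        i=$((i + 1))
    done
}

# Same values as in the report: 40 ns / 10G local, 80 ns / 5G remote.
hmat_lb_args 2 40 10G 80 5G
```

Each printed line is one option; they would still have to be appended to
the qemu-system-x86_64 invocation.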

Then the latencies and bandwidths between the nodes were tested using
the Intel Memory Latency Checker v3.9
(https://software.intel.com/content/www/us/en/develop/articles/intelr-memory-latency-checker.html).
But the obtained results did not match the configuration. The following
are the results obtained.

Latency_matrix with idle latencies (in ns)

Numa Node       0       1
        0    36.2    36.4
        1    34.9    35.4

Bandwidth_matrix with memory bandwidths (in MB/s)

Numa Node         0         1
        0   15167.1   15308.9
        1   15226.0   15234.0
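
To make the mismatch concrete, the remote-to-local ratios can be compared
directly. This is only a small arithmetic sketch using the numbers from the
configuration and the two matrices above (node 0 as initiator); the helper
name ratio is made up for illustration.

```shell
#!/bin/sh
# Print remote/local as a ratio with two decimals.
ratio() {
    awk -v remote="$1" -v loc="$2" 'BEGIN { printf "%.2f\n", remote / loc }'
}

echo "configured latency ratio:   $(ratio 80 40)"            # 80 ns remote vs 40 ns local
echo "measured latency ratio:     $(ratio 36.4 36.2)"        # node 0 -> node 1 vs node 0 -> node 0
echo "configured bandwidth ratio: $(ratio 5 10)"             # 5 GB/s remote vs 10 GB/s local
echo "measured bandwidth ratio:   $(ratio 15308.9 15167.1)"  # node 0 -> node 1 vs node 0 -> node 0
```

The configured ratios are 2.00 for latency and 0.50 for bandwidth, while
both measured ratios come out at about 1.01, i.e. local and remote accesses
are effectively indistinguishable.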

A test was also conducted with the tool “lat_mem_rd” from lmbench to
measure the memory read latencies. This also gave results which did not
match the config.

Any information on why the configured latency and bandwidth values are
not applied would be appreciated.

** Affects: qemu
     Importance: Undecided
         Status: New


** Tags: bandwidth hmat hmat-lb latency

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1888923

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1888923/+subscriptions



* Re: [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect
@ 2020-07-27 17:33   ` Igor
  0 siblings, 0 replies; 5+ messages in thread
From: Igor Mammedov @ 2020-07-27 17:33 UTC (permalink / raw)
  To: Vishnu Dixit; +Cc: qemu-devel

On Sat, 25 Jul 2020 07:26:38 -0000
Vishnu Dixit <1888923@bugs.launchpad.net> wrote:

> Public bug reported:
> 
> I was trying to configure latencies and bandwidths between nodes in a
> NUMA emulation using QEMU 5.0.0.
> 
> Host : Ubuntu 20.04 64 bit
> Guest : Ubuntu 18.04 64 bit
> 
> The machine configured has 2 nodes. Each node has 2 CPUs and has been
> allocated 3GB of memory. The memory access latencies and bandwidths for
> a local access (i.e from initiator 0 to target 0, and from initiator 1
> to target 1) are set as 40ns and 10GB/s respectively. The memory access
> latencies and bandwidths for a remote access (i.e from initiator 1 to
> target 0, and from initiator 0 to target 1) are set as 80ns and 5GB/s
> respectively.
> 
> The command line launch is as follows.
> 
> sudo x86_64-softmmu/qemu-system-x86_64  \
> -machine hmat=on \
> -boot c \
> -enable-kvm \
> -m 6G,slots=2,maxmem=7G \
> -object memory-backend-ram,size=3G,id=m0 \
> -object memory-backend-ram,size=3G,id=m1 \
> -numa node,nodeid=0,memdev=m0 \
> -numa node,nodeid=1,memdev=m1 \
> -smp 4,sockets=4,maxcpus=4  \
> -numa cpu,node-id=0,socket-id=0 \
> -numa cpu,node-id=0,socket-id=1 \
> -numa cpu,node-id=1,socket-id=2 \
> -numa cpu,node-id=1,socket-id=3 \
> -numa dist,src=0,dst=1,val=20 \
> -net nic \
> -net user \
> -hda testing.img \
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
> -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
> -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
> -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
> -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
> 
> Then the latencies and bandwidths between the nodes were tested using
> the Intel Memory Latency Checker v3.9
> (https://software.intel.com/content/www/us/en/develop/articles/intelr-
> memory-latency-checker.html). But the obtained results did not match the
> configuration. The following are the results obtained.
> 
> Latency_matrix with idle latencies (in ns)
> 
> Numa Node
> . .0. . .1.
> 0 36.2 36.4
> 1 34.9 35.4
> 
> Bandwidth_matrix with memory bandwidths (in MB/s)
> 
> Numa Node
> . . .0. . . .1. 
> 0 15167.1 15308.9
> 1 15226.0 15234.0
> 
> A test was also conducted with the tool “lat_mem_rd” from lmbench to
> measure the memory read latencies. This also gave results which did not
> match the config.
> 
> Any information on why the config latency and bandwidth values are not
> applied, would be appreciated.

There is no information about the host hardware, so I can only hazard a
guess: the host is a non-NUMA machine, so all guest RAM and CPUs are in the
same latency domain, which is why you are seeing pretty much the same timings.
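
One way to check that guess on a Linux host is to count the node
directories the kernel exposes in sysfs. A minimal sketch, assuming the
usual /sys/devices/system/node layout; the function name and its optional
root argument are illustrative only (the argument exists so the function
can be tried against a fake directory tree).

```shell
#!/bin/sh
# Count NUMA nodes under a sysfs-style root (default: the running host).
count_numa_nodes() {
    root=${1:-/sys/devices/system/node}
    n=0
    for d in "$root"/node[0-9]*; do
        # An unmatched glob stays literal and is not a directory, so it is skipped.
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

nodes=$(count_numa_nodes)
if [ "$nodes" -le 1 ]; then
    echo "host exposes $nodes NUMA node(s): all guest memory shares one latency domain"
else
    echo "host exposes $nodes NUMA nodes"
fi
```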

Neither QEMU nor KVM simulates HW latencies at all. Everything configured
with '-numa hmat-lb' is intended for guest OS consumption as a hint for
smarter memory allocation, and it's up to the user to pin CPUs and RAM to
concrete host NUMA nodes and use the host's values in '-numa hmat-lb' to
actually get performance benefits from it on a NUMA machine.
On a non-NUMA host it's rather pointless, except for cases where one
needs to fake a NUMA config (like testing some aspects of NUMA-related code
in the guest OS).
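
As a rough sketch of the RAM half of that pinning, each guest memory
backend can be bound to a host node with the host-nodes= and policy=
properties of the memory backend objects. The helper below only prints the
option strings; its name is hypothetical, and the ids and sizes are the
ones from the reported command line. It assumes a host that really has
nodes 0 and 1.

```shell
#!/bin/sh
# Print a memory backend option bound to a single host NUMA node.
# Usage: bound_backend ID SIZE HOST_NODE
bound_backend() {
    id=$1 size=$2 host_node=$3
    printf '%s\n' "-object memory-backend-ram,size=$size,id=$id,host-nodes=$host_node,policy=bind"
}

bound_backend m0 3G 0   # guest node 0 memory taken from host node 0
bound_backend m1 3G 1   # guest node 1 memory taken from host node 1
```

The vCPU threads would still need to be pinned separately (for example
with taskset on the host, or through a management layer such as libvirt),
and the hmat-lb values should then describe what the host actually
delivers.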



* Re: [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect
@ 2020-07-27 17:33   ` Igor
  0 siblings, 0 replies; 5+ messages in thread
From: Igor @ 2020-07-27 17:33 UTC (permalink / raw)
  To: qemu-devel

On Sat, 25 Jul 2020 07:26:38 -0000
Vishnu Dixit <1888923@bugs.launchpad.net> wrote:

> Public bug reported:
> 
> I was trying to configure latencies and bandwidths between nodes in a
> NUMA emulation using QEMU 5.0.0.
> 
> Host : Ubuntu 20.04 64 bit
> Guest : Ubuntu 18.04 64 bit
> 
> The machine configured has 2 nodes. Each node has 2 CPUs and has been
> allocated 3GB of memory. The memory access latencies and bandwidths for
> a local access (i.e from initiator 0 to target 0, and from initiator 1
> to target 1) are set as 40ns and 10GB/s respectively. The memory access
> latencies and bandwidths for a remote access (i.e from initiator 1 to
> target 0, and from initiator 0 to target 1) are set as 80ns and 5GB/s
> respectively.
> 
> The command line launch is as follows.
> 
> sudo x86_64-softmmu/qemu-system-x86_64  \
> -machine hmat=on \
> -boot c \
> -enable-kvm \
> -m 6G,slots=2,maxmem=7G \
> -object memory-backend-ram,size=3G,id=m0 \
> -object memory-backend-ram,size=3G,id=m1 \
> -numa node,nodeid=0,memdev=m0 \
> -numa node,nodeid=1,memdev=m1 \
> -smp 4,sockets=4,maxcpus=4  \
> -numa cpu,node-id=0,socket-id=0 \
> -numa cpu,node-id=0,socket-id=1 \
> -numa cpu,node-id=1,socket-id=2 \
> -numa cpu,node-id=1,socket-id=3 \
> -numa dist,src=0,dst=1,val=20 \
> -net nic \
> -net user \
> -hda testing.img \
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
> -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
> -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
> -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
> -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
> 
> Then the latencies and bandwidths between the nodes were tested using
> the Intel Memory Latency Checker v3.9
> (https://software.intel.com/content/www/us/en/develop/articles/intelr-
> memory-latency-checker.html). But the obtained results did not match the
> configuration. The following are the results obtained.
> 
> Latency_matrix with idle latencies (in ns)
> 
> Numa Node
> . .0. . .1.
> 0 36.2 36.4
> 1 34.9 35.4
> 
> Bandwidth_matrix with memory bandwidths (in MB/s)
> 
> Numa Node
> . . .0. . . .1. 
> 0 15167.1 15308.9
> 1 15226.0 15234.0
> 
> A test was also conducted with the tool “lat_mem_rd” from lmbench to
> measure the memory read latencies. This also gave results which did not
> match the config.
> 
> Any information on why the config latency and bandwidth values are not
> applied, would be appreciated.

There is no information about the host hardware, so I can only hazard a
guess that the host is a non-NUMA machine. In that case all guest RAM and
CPUs are in the same latency domain, which is why you are seeing pretty
much the same timings.

Neither QEMU nor KVM simulates HW latencies at all. Everything configured
with '-numa hmat-lb' is intended for guest OS consumption, as a hint for
smarter memory allocation. It is up to the user to pin CPUs and RAM to
concrete host NUMA nodes and to use the host's values in '-numa hmat-lb'
to actually get performance benefits from it on a NUMA machine.
On a non-NUMA host it's rather pointless, except for cases where one
needs to fake a NUMA config (like testing some aspects of NUMA-related
code in the guest OS).
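
A rough sketch of the memory pinning part, in case it helps. It assumes a
Linux host that exposes its NUMA nodes under /sys/devices/system/node and a
1:1 mapping of guest node N to host node N. The helper names
count_numa_nodes and backend_args are made up for illustration. host-nodes=
and policy=bind are properties of QEMU memory backends. vCPU threads still
have to be pinned separately, which this sketch does not cover.

```shell
#!/bin/sh
# Count the NUMA nodes found under a sysfs-style directory.
# Defaults to the real host location; the argument exists for testing.
count_numa_nodes() {
    root="${1:-/sys/devices/system/node}"
    n=0
    for d in "$root"/node[0-9]*; do
        # An unmatched glob stays literal and fails the -d test.
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Print one memory backend option per guest node, each bound to the
# host node with the same number (assumption: 1:1 guest/host mapping).
backend_args() {
    guest_nodes="$1"
    size="$2"
    i=0
    while [ "$i" -lt "$guest_nodes" ]; do
        echo "-object memory-backend-ram,size=$size,id=m$i,host-nodes=$i,policy=bind"
        i=$((i + 1))
    done
}

if [ "$(count_numa_nodes)" -lt 2 ]; then
    # Nothing to bind to: the guest will measure uniform timings.
    echo "host has fewer than two NUMA nodes" >&2
else
    backend_args 2 3G
fi
```

On a two-node host the printed options would replace the two
'-object memory-backend-ram,...' lines of the original command line, and
the hmat-lb values should then be taken from what the host actually
measures rather than chosen freely.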

> 
> ** Affects: qemu
>      Importance: Undecided
>          Status: New
> 
> 
> ** Tags: bandwidth hmat hmat-lb latency
> 
> ** Description changed:
> 
>   I was trying to configure latencies and bandwidths between nodes in a
>   NUMA emulation using QEMU 5.0.0.
>   
>   Host : Ubuntu 20.04 64 bit
>   Guest : Ubuntu 18.04 64 bit
> -  
> - The machine configured has 2 nodes. Each node has 2 CPUs and has been allocated 3GB of memory. The memory access latencies and bandwidths for a local access (i.e from initiator 0 to target 0, and from initiator 1 to target 1) are set as 40ns and 10GB/s respectively. The memory access latencies and bandwidths for a remote access (i.e from initiator 1 to target 0, and from initiator 0 to target 1) are set as 80ns and 5GB/s respectively. 
> + 
> + The machine configured has 2 nodes. Each node has 2 CPUs and has been
> + allocated 3GB of memory. The memory access latencies and bandwidths for
> + a local access (i.e from initiator 0 to target 0, and from initiator 1
> + to target 1) are set as 40ns and 10GB/s respectively. The memory access
> + latencies and bandwidths for a remote access (i.e from initiator 1 to
> + target 0, and from initiator 0 to target 1) are set as 80ns and 5GB/s
> + respectively.
>   
>   The command line launch is as follows.
>   
>   sudo x86_64-softmmu/qemu-system-x86_64  \
>   -machine hmat=on \
>   -boot c \
>   -enable-kvm \
>   -m 6G,slots=2,maxmem=7G \
>   -object memory-backend-ram,size=3G,id=m0 \
>   -object memory-backend-ram,size=3G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 4,sockets=4,maxcpus=4  \
>   -numa cpu,node-id=0,socket-id=0 \
>   -numa cpu,node-id=0,socket-id=1 \
>   -numa cpu,node-id=1,socket-id=2 \
>   -numa cpu,node-id=1,socket-id=3 \
>   -numa dist,src=0,dst=1,val=20 \
>   -net nic \
>   -net user \
>   -hda testing.img \
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
>   -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
>   -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
>   -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
>   -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
>   
>   Then the latencies and bandwidths between the nodes were tested using
>   the Intel Memory Latency Checker v3.9
>   (https://software.intel.com/content/www/us/en/develop/articles/intelr-
>   memory-latency-checker.html). But the obtained results did not match the
>   configuration. The following are the results obtained.
>   
>   Latency_matrix with idle latencies (in ns)
>   
>   Numa
> - Node  0     1
> + lat node0 node1
>   0    36.2 36.4
>   1    34.9 35.4
>   
>   Bandwidth_matrix with memory bandwidths (in MB/s)
>   
> - Numa
> - Node 0       1
> + Numa Node 
> + bw node0 .bw node1
>   0 15167.1 15308.9
>   1 15226.0 15234.0
>   
>   A test was also conducted with the tool “lat_mem_rd” from lmbench to
>   measure the memory read latencies. This also gave results which did not
>   match the config.
>   
>   Any information on why the config latency and bandwidth values are not
>   applied, would be appreciated.
> 
> ** Description changed:
> 
>   I was trying to configure latencies and bandwidths between nodes in a
>   NUMA emulation using QEMU 5.0.0.
>   
>   Host : Ubuntu 20.04 64 bit
>   Guest : Ubuntu 18.04 64 bit
>   
>   The machine configured has 2 nodes. Each node has 2 CPUs and has been
>   allocated 3GB of memory. The memory access latencies and bandwidths for
>   a local access (i.e from initiator 0 to target 0, and from initiator 1
>   to target 1) are set as 40ns and 10GB/s respectively. The memory access
>   latencies and bandwidths for a remote access (i.e from initiator 1 to
>   target 0, and from initiator 0 to target 1) are set as 80ns and 5GB/s
>   respectively.
>   
>   The command line launch is as follows.
>   
>   sudo x86_64-softmmu/qemu-system-x86_64  \
>   -machine hmat=on \
>   -boot c \
>   -enable-kvm \
>   -m 6G,slots=2,maxmem=7G \
>   -object memory-backend-ram,size=3G,id=m0 \
>   -object memory-backend-ram,size=3G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 4,sockets=4,maxcpus=4  \
>   -numa cpu,node-id=0,socket-id=0 \
>   -numa cpu,node-id=0,socket-id=1 \
>   -numa cpu,node-id=1,socket-id=2 \
>   -numa cpu,node-id=1,socket-id=3 \
>   -numa dist,src=0,dst=1,val=20 \
>   -net nic \
>   -net user \
>   -hda testing.img \
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
>   -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
>   -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
>   -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
>   -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
>   
>   Then the latencies and bandwidths between the nodes were tested using
>   the Intel Memory Latency Checker v3.9
>   (https://software.intel.com/content/www/us/en/develop/articles/intelr-
>   memory-latency-checker.html). But the obtained results did not match the
>   configuration. The following are the results obtained.
>   
>   Latency_matrix with idle latencies (in ns)
>   
> - Numa
> + Numa Node
>   lat node0 node1
>   0    36.2 36.4
>   1    34.9 35.4
>   
>   Bandwidth_matrix with memory bandwidths (in MB/s)
>   
> - Numa Node 
> - bw node0 .bw node1
> + Numa Node
> + bw node0  bw node1
>   0 15167.1 15308.9
>   1 15226.0 15234.0
>   
>   A test was also conducted with the tool “lat_mem_rd” from lmbench to
>   measure the memory read latencies. This also gave results which did not
>   match the config.
>   
>   Any information on why the config latency and bandwidth values are not
>   applied, would be appreciated.
> 
> ** Description changed:
> 
>   I was trying to configure latencies and bandwidths between nodes in a
>   NUMA emulation using QEMU 5.0.0.
>   
>   Host : Ubuntu 20.04 64 bit
>   Guest : Ubuntu 18.04 64 bit
>   
>   The machine configured has 2 nodes. Each node has 2 CPUs and has been
>   allocated 3GB of memory. The memory access latencies and bandwidths for
>   a local access (i.e from initiator 0 to target 0, and from initiator 1
>   to target 1) are set as 40ns and 10GB/s respectively. The memory access
>   latencies and bandwidths for a remote access (i.e from initiator 1 to
>   target 0, and from initiator 0 to target 1) are set as 80ns and 5GB/s
>   respectively.
>   
>   The command line launch is as follows.
>   
>   sudo x86_64-softmmu/qemu-system-x86_64  \
>   -machine hmat=on \
>   -boot c \
>   -enable-kvm \
>   -m 6G,slots=2,maxmem=7G \
>   -object memory-backend-ram,size=3G,id=m0 \
>   -object memory-backend-ram,size=3G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 4,sockets=4,maxcpus=4  \
>   -numa cpu,node-id=0,socket-id=0 \
>   -numa cpu,node-id=0,socket-id=1 \
>   -numa cpu,node-id=1,socket-id=2 \
>   -numa cpu,node-id=1,socket-id=3 \
>   -numa dist,src=0,dst=1,val=20 \
>   -net nic \
>   -net user \
>   -hda testing.img \
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
>   -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
>   -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
>   -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
>   -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
>   -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
>   
>   Then the latencies and bandwidths between the nodes were tested using
>   the Intel Memory Latency Checker v3.9
>   (https://software.intel.com/content/www/us/en/develop/articles/intelr-
>   memory-latency-checker.html). But the obtained results did not match the
>   configuration. The following are the results obtained.
>   
>   Latency_matrix with idle latencies (in ns)
>   
>   Numa Node
> - lat node0 node1
> - 0    36.2 36.4
> - 1    34.9 35.4
> + . .0. . .1.
> + 0 36.2 36.4
> + 1 34.9 35.4
>   
>   Bandwidth_matrix with memory bandwidths (in MB/s)
>   
>   Numa Node
> - bw node0  bw node1
> + . . .0. . . .1. 
>   0 15167.1 15308.9
>   1 15226.0 15234.0
>   
>   A test was also conducted with the tool “lat_mem_rd” from lmbench to
>   measure the memory read latencies. This also gave results which did not
>   match the config.
>   
>   Any information on why the config latency and bandwidth values are not
>   applied, would be appreciated.
>

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1888923

Title:
  Configured Memory access latency and bandwidth not taking effect

Status in QEMU:
  New

Bug description:
  I was trying to configure latencies and bandwidths between nodes in a
  NUMA emulation using QEMU 5.0.0.

  Host : Ubuntu 20.04 64 bit
  Guest : Ubuntu 18.04 64 bit

  The machine configured has 2 nodes. Each node has 2 CPUs and has been
  allocated 3GB of memory. The memory access latencies and bandwidths
  for a local access (i.e from initiator 0 to target 0, and from
  initiator 1 to target 1) are set as 40ns and 10GB/s respectively. The
  memory access latencies and bandwidths for a remote access (i.e from
  initiator 1 to target 0, and from initiator 0 to target 1) are set as
  80ns and 5GB/s respectively.

  The command line launch is as follows.

  sudo x86_64-softmmu/qemu-system-x86_64  \
  -machine hmat=on \
  -boot c \
  -enable-kvm \
  -m 6G,slots=2,maxmem=7G \
  -object memory-backend-ram,size=3G,id=m0 \
  -object memory-backend-ram,size=3G,id=m1 \
  -numa node,nodeid=0,memdev=m0 \
  -numa node,nodeid=1,memdev=m1 \
  -smp 4,sockets=4,maxcpus=4  \
  -numa cpu,node-id=0,socket-id=0 \
  -numa cpu,node-id=0,socket-id=1 \
  -numa cpu,node-id=1,socket-id=2 \
  -numa cpu,node-id=1,socket-id=3 \
  -numa dist,src=0,dst=1,val=20 \
  -net nic \
  -net user \
  -hda testing.img \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
  -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
  -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
  -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
  -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \

  Then the latencies and bandwidths between the nodes were tested using
  the Intel Memory Latency Checker v3.9
  (https://software.intel.com/content/www/us/en/develop/articles/intelr-
  memory-latency-checker.html). But the obtained results did not match
  the configuration. The following are the results obtained.

  Latency_matrix with idle latencies (in ns)

  Numa Node
  . .0. . .1.
  0 36.2 36.4
  1 34.9 35.4

  Bandwidth_matrix with memory bandwidths (in MB/s)

  Numa Node
  . . .0. . . .1. 
  0 15167.1 15308.9
  1 15226.0 15234.0

  A test was also conducted with the tool “lat_mem_rd” from lmbench to
  measure the memory read latencies. This also gave results which did
  not match the config.

  Any information on why the config latency and bandwidth values are not
  applied, would be appreciated.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1888923/+subscriptions



* [Bug 1888923] Re: Configured Memory access latency and bandwidth not taking effect
  2020-07-25  7:26 [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect Vishnu Dixit
  2020-07-27 17:33   ` Igor
@ 2020-07-28  7:05 ` Thomas Huth
  2020-07-28 10:40 ` Vishnu Dixit
  2 siblings, 0 replies; 5+ messages in thread
From: Thomas Huth @ 2020-07-28  7:05 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: New => Invalid


* [Bug 1888923] Re: Configured Memory access latency and bandwidth not taking effect
  2020-07-25  7:26 [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect Vishnu Dixit
  2020-07-27 17:33   ` Igor
  2020-07-28  7:05 ` [Bug 1888923] " Thomas Huth
@ 2020-07-28 10:40 ` Vishnu Dixit
  2 siblings, 0 replies; 5+ messages in thread
From: Vishnu Dixit @ 2020-07-28 10:40 UTC (permalink / raw)
  To: qemu-devel

It was indeed a non-NUMA machine.

Thread overview: 5+ messages
2020-07-25  7:26 [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect Vishnu Dixit
2020-07-27 17:33 ` Igor Mammedov
2020-07-27 17:33   ` Igor
2020-07-28  7:05 ` [Bug 1888923] " Thomas Huth
2020-07-28 10:40 ` Vishnu Dixit
