linux-mm.kvack.org archive mirror
From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Marek Kedzierski <mkedzier@redhat.com>,
	Hui Zhu <teawater@gmail.com>,
	Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	Wei Yang <richard.weiyang@linux.alibaba.com>,
	Michal Hocko <mhocko@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-acpi@vger.kernel.org
Subject: Re: [PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory groups
Date: Tue, 8 Jun 2021 11:42:49 +0200
Message-ID: <20210608094244.GA22894@linux>
In-Reply-To: <20210607195430.48228-1-david@redhat.com>

On Mon, Jun 07, 2021 at 09:54:18PM +0200, David Hildenbrand wrote:
> Hi,
> 
> this series aims at improving in-kernel auto-online support. It tackles the
> fundamental problems that:

Hi David,

the idea sounds good to me, and I like that this series takes away part of the
responsibility from the user to know where the memory should go.
I think the kernel is a much better fit for that as it has all the required
information to balance things.

I also glanced over the series and, besides some things here and there, the
whole approach looks sane.
I plan to have a closer look at it in a few days; for the time being I just
have some high-level questions:

>  1) We can create zone imbalances when onlining all memory blindly to
>     ZONE_MOVABLE, in the worst case crashing the system. We have to know
>     upfront how much memory we are going to hotplug such that we can
>     safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
>     via "online_movable". This is far from practical and only applicable in
>     limited setups -- like inside VMs under the RHV/oVirt hypervisor which
>     will never hotplug more than 3 times the boot memory (and the
>     limitation is only in place due to the Linux limitation).

Could you give more insight into the problems created by zone imbalances (e.g.,
a lot of movable memory and little kernel memory)?

>  2) We see more setups that implement dynamic VM resizing, hot(un)plugging
>     memory to resize VM memory. In these setups, we might hotplug a lot of
>     memory, but it might happen in various small steps in both directions
>     (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
>     primary driver of this upstream right now, performing such dynamic
>     resizing NUMA-aware via multiple virtio-mem devices.
> 
>     Onlining all hotplugged memory to ZONE_NORMAL means we basically have
>     no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
>     easily run into zone imbalances when growing a VM. We want a mixture,
>     and we want as much memory as reasonable/configured in ZONE_MOVABLE.
> 
>  3) Memory devices consist of 1..X memory block devices, however, the
>     kernel doesn't really track the relationship. Consequently, also user
>     space has no idea. We want to make per-device decisions. As one
>     example, for memory hotunplug it doesn't make sense to use a mixture of
>     zones within a single DIMM: we want all MOVABLE if possible, otherwise
>     all !MOVABLE, because any !MOVABLE part will easily block the DIMM from
>     getting hotunplugged. As another example, virtio-mem operates on
>     individual units that span 1..X memory blocks. Similar to a DIMM, we
>     want a unit to either be all MOVABLE or !MOVABLE. Further, we want
>     as much memory of a virtio-mem device to be MOVABLE as possible.

So, a virtio-mem unit could be seen as a DIMM, right?

>  4) We want memory onlining to be done right from the kernel while adding
>     memory; for example, this is required for fast memory hotplug for
>     drivers that add individual memory blocks, like virtio-mem. We want a
>     way to configure a policy in the kernel and avoid implementing advanced
>     policies in user space.

"we want memory onlining to be done right from the kernel while adding memory"

Isn't that always the case when a driver adds memory? The user has no
interaction with that, right?

> The auto-onlining support we have in the kernel is not sufficient. All we
> have is a) online everything movable (online_movable) b) online everything
> !movable (online_kernel) c) keep zones contiguous (online). This series
> allows configuring c) to mean instead "online movable if possible according
> to the configuration, driven by a maximum MOVABLE:KERNEL ratio" -- a new
> onlining policy.
> 
> This series does 3 things:
> 
>   1) Introduces the "auto-movable" online policy that initially operates on
>      individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
>      to make a decision whether a memory block will be onlined to
>      ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
>      memory does not allow for more MOVABLE memory (details in the
>      patches). CMA memory is treated like MOVABLE memory.

How would a user know which ratio is sane? Could we add some info to the
documentation that sets some "basic" rules?
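
As a rough way to reason about the ratio, here is a minimal sketch of what a
given MOVABLE:KERNEL percentage implies. It assumes the basic form described
in the cover letter, where only boot-time KERNEL memory counts towards the
limit; the function names are mine and purely illustrative, not taken from
the patches.

```python
def movable_limit_gib(boot_kernel_gib, ratio_percent):
    """Upper bound on MOVABLE memory (GiB) for a given MOVABLE:KERNEL ratio.

    Assumption: only boot-time KERNEL memory counts, as in the basic form
    of the policy described in the cover letter.
    """
    return boot_kernel_gib * ratio_percent / 100


def max_movable_share(ratio_percent):
    """Largest fraction of total memory that can be MOVABLE at this ratio."""
    return ratio_percent / (100 + ratio_percent)


if __name__ == "__main__":
    # 4 GiB of boot memory with the 301% ratio used in the cover letter.
    print(movable_limit_gib(4, 301))
    print(max_movable_share(301))
```

With 4 GiB of boot memory, 301% leaves room for just over 12 GiB of MOVABLE
memory, i.e. roughly three quarters of the resulting total.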

>   2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
>      groups and uses group information to make decisions in the
>      "auto-movable" online policy across memory blocks of a single memory
>      device (modeled as memory group).

So, the distinction is that a DIMM cannot grow larger, but we can add more
memory to a virtio-mem unit? I feel I am missing some insight here.

>   3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
>      allowing ZONE_NORMAL memory within a dynamic memory group to allow for
>      more ZONE_MOVABLE memory within the same memory group. The target use
>      case is dynamic VM resizing using virtio-mem.

Sorry, I got lost in this one. Care to explain a bit more? 

> The target usage will be:
> 
>   1) Linux boots with "mhp_default_online_type=offline"
> 
>   2) User space (e.g., systemd unit) configures memory onlining (according
>      to a config file and system properties), for example:
>      * Setting memory_hotplug.online_policy=auto-movable
>      * Setting memory_hotplug.auto_movable_ratio=301
>      * Setting memory_hotplug.auto_movable_numa_aware=true

I think we would need to document those in order to let the user know what
is best for them, e.g., when do we want to enable auto_movable_numa_aware?
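
For documentation purposes, something like the following sketch could show
how a user-space unit applies the three settings. It assumes the parameters
are exposed under /sys/module/memory_hotplug/parameters/ like other module
parameters; that path and the helper name apply_hotplug_policy are my
assumptions, not something the cover letter states.

```python
from pathlib import Path

# Assumed location of the parameters; the cover letter only names them as
# memory_hotplug.<name>, so treat this path as an assumption.
PARAMS_DIR = "/sys/module/memory_hotplug/parameters"

# Values taken from the example in the cover letter.
SETTINGS = {
    "online_policy": "auto-movable",
    "auto_movable_ratio": "301",
    "auto_movable_numa_aware": "true",
}


def apply_hotplug_policy(params_dir, settings):
    """Write each setting to <params_dir>/<name>; return the names written."""
    written = []
    for name, value in settings.items():
        (Path(params_dir) / name).write_text(f"{value}\n")
        written.append(name)
    return written


# Usage (needs root on a kernel that provides these parameters):
#   apply_hotplug_policy(PARAMS_DIR, SETTINGS)
```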

> For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
> 301% results in the following layout:
> 	Memory block 1-15:    DMA32   (early)
> 	Memory block 32-47:   Normal  (early)
> 	Memory block 48-79:   Movable (DIMM 0)
> 	Memory block 80-111:  Movable (DIMM 1)
> 	Memory block 112-143: Movable (DIMM 2)
> 	Memory block 144-175: Normal  (DIMM 3)
> 	Memory block 176-207: Normal  (DIMM 4)
> 	... all Normal
> 	(-> hotplugged Normal memory does not allow for more Movable memory)

Uhm, I am sorry for being dense here:

On x86_64, 4GB = 32 sections (of 128MB each). Why do the memory blocks span from #1 to #47?
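
To check my understanding of the DIMM case, here is a simplified model of the
per-DIMM decision. It assumes 128 MiB memory blocks, that a DIMM is onlined
either all Movable or all Normal, and that only boot-time kernel memory
counts towards the ratio. The function name and structure are mine, not the
code from the patches.

```python
BLOCK_MIB = 128  # memory block size assumed for x86_64


def online_dimms(boot_gib, dimm_gib, n_dimms, ratio_percent):
    """Return the zone chosen for each hotplugged DIMM, in hotplug order.

    Simplified model: a DIMM goes to Movable only if all of its blocks fit
    under the MOVABLE:KERNEL limit; hotplugged Normal memory does not raise
    that limit.
    """
    blocks_per_gib = 1024 // BLOCK_MIB
    kernel_early_blocks = boot_gib * blocks_per_gib
    dimm_blocks = dimm_gib * blocks_per_gib
    movable_blocks = 0
    zones = []
    for _ in range(n_dimms):
        fits = (movable_blocks + dimm_blocks) * 100 <= kernel_early_blocks * ratio_percent
        if fits:
            movable_blocks += dimm_blocks
            zones.append("Movable")
        else:
            zones.append("Normal")
    return zones


if __name__ == "__main__":
    # 4 GiB VM, 4 GiB DIMMs (32 blocks each), ratio 301%.
    print(online_dimms(4, 4, 5, 301))
```

Under these assumptions the first three DIMMs end up Movable and all later
ones Normal, which matches the quoted layout.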

> For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM
> will result in the following layout:
> 	Memory block 1-15:    DMA32   (early)
> 	Memory block 32-47:   Normal  (early)
> 	Memory block 48-143:  Movable (virtio-mem, first 12 GiB)
> 	Memory block 144:     Normal  (virtio-mem, next 128 MiB)
> 	Memory block 145-147: Movable (virtio-mem, next 384 MiB)
> 	Memory block 148:     Normal  (virtio-mem, next 128 MiB)
> 	Memory block 149-151: Movable (virtio-mem, next 384 MiB)
> 	... Normal/Movable mixture as above
> 	(-> hotplugged Normal memory allows for more Movable memory within
> 	    the same device)
> 
> Which gives us maximum flexibility when dynamically growing/shrinking a
> VM in smaller steps. When shrinking, virtio-mem will prioritize unplug of
> MOVABLE memory with [1] sent last week, such that we won't accidentally
> trigger zone imbalances in more complicated setups that involve multiple
> virtio-mem devices.
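
For the virtio-mem case, this is how I read the "Normal memory within the
same device allows for more Movable memory" part, as a simplified model. It
assumes decisions are made one 128 MiB block at a time and that Normal blocks
of the same dynamic group count as kernel memory for the ratio. The names are
illustrative; this is not the algorithm from the patches.

```python
def online_virtio_mem_blocks(kernel_early_blocks, n_blocks, ratio_percent):
    """Return the zone chosen for each block added by one virtio-mem device.

    Simplified model: Normal blocks onlined within the same dynamic group
    count as kernel memory and therefore raise the Movable limit.
    """
    movable = 0
    group_normal = 0
    zones = []
    for _ in range(n_blocks):
        kernel = kernel_early_blocks + group_normal
        if (movable + 1) * 100 <= kernel * ratio_percent:
            movable += 1
            zones.append("Movable")
        else:
            group_normal += 1
            zones.append("Normal")
    return zones


if __name__ == "__main__":
    # 4 GiB VM (32 early blocks), ratio 301%, first 104 hotplugged blocks.
    zones = online_virtio_mem_blocks(32, 104, 301)
    print(zones[:96].count("Movable"), zones[96:])
```

With 32 early blocks and 301%, the first 96 blocks (12 GiB) go to Movable,
followed by one Normal block and then three Movable blocks, repeating, as in
the quoted layout.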

-- 
Oscar Salvador
SUSE L3

