From: Ross Zwisler <zwisler@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: collision between ZONE_MOVABLE and memblock allocations
Date: Wed, 19 Jul 2023 17:05:15 -0600
Message-ID: <20230719230515.GA3654720@google.com>
In-Reply-To: <9770454d-f840-c7cf-314e-ce81839393e3@redhat.com>

On Wed, Jul 19, 2023 at 10:14:59AM +0200, David Hildenbrand wrote:
> On 19.07.23 10:06, Michal Hocko wrote:
> > On Wed 19-07-23 10:59:52, Mike Rapoport wrote:
> > > On Wed, Jul 19, 2023 at 08:14:48AM +0200, Michal Hocko wrote:
> > > > On Tue 18-07-23 16:01:06, Ross Zwisler wrote:
> > > > [...]
> > > > > I do think that we need to fix this collision between ZONE_MOVABLE and memmap
> > > > > allocations, because this issue essentially makes the movablecore= kernel
> > > > > command line parameter useless in many cases, as the ZONE_MOVABLE region it
> > > > > creates will often actually be unmovable.
> > > >
> > > > movablecore is kinda hack and I would be more inclined to get rid of it
> > > > rather than build more into it. Could you be more specific about your
> > > > use case?
> > > >
> > > > > Here are the options I currently see for resolution:
> > > > >
> > > > > 1. Change the way ZONE_MOVABLE memory is allocated so that it is allocated from
> > > > > the beginning of the NUMA node instead of the end. This should fix my use case,
> > > > > but again is prone to breakage in other configurations (# of NUMA nodes, other
> > > > > architectures) where ZONE_MOVABLE and memblock allocations might overlap. I
> > > > > think that this should be relatively straightforward and low risk, though.
> > > > >
> > > > > 2. Make the code which processes the movablecore= command line option aware of
> > > > > the memblock allocations, and have it choose a region for ZONE_MOVABLE which
> > > > > does not have these allocations. This might be done by checking for
> > > > > PageReserved() as we do with offlining memory, though that will take some boot
> > > > > time reordering, or we'll have to figure out the overlap in another way. This
> > > > > may also result in us having two ZONE_NORMAL zones for a given NUMA node, with
> > > > > a ZONE_MOVABLE section in between them. I'm not sure if this is allowed?
> > > >
> > > > Yes, this is no problem. Zones are allowed to be sparse.
> > >
> > > The current initialization order is roughly
> > >
> > > * very early initialization with some memblock allocations
> > > * determine zone locations and sizes
> > > * initialize memory map
> > >   - memblock_alloc(lots of memory)
> > > * lots of unrelated initializations that may allocate memory
> > > * release free pages from memblock to the buddy allocator
> > >
> > > With 2) we can make sure the memory map and early allocations won't be in
> > > the ZONE_MOVABLE, but we may still have reserved pages there.
> >
> > Yes this will always be fragile. If the specific placement of the movable
> > memory is not important and the only thing that matters is the size and
> > numa locality then an easier to maintain solution would be to simply
> > offline enough memory blocks very early in the userspace bring up and
> > online it back as movable. If offlining fails just try another
> > memblock. This doesn't require any kernel code change.
>
> As an alternative, we might use the "memmap=nn[KMG]!ss[KMG]" [1] parameter
> to mark some memory as protected.
>
> That memory can then be configured as devdax device and online to
> ZONE_MOVABLE (dev/dax).
>
> [1] https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap

I've previously been reconfiguring devdax memory like this:

  ndctl create-namespace --reconfig=namespace0.0 -m devdax -f
  daxctl reconfigure-device --mode=system-ram dax0.0

Is this how you've been doing it, or is there something else I should
consider?

I just sent mail to Michal outlining my use case, hopefully it makes sense.

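For reference, the userspace approach Michal describes above (offline memory
blocks early, online them back as movable, and move on to another block if
offlining fails) might look roughly like the sketch below. This is only an
illustration under some assumptions: it relies on the memory hotplug sysfs
interface being available, the function name online_as_movable and the
MEM_SYSFS override are made up for this example, and it ignores NUMA
locality entirely.

```shell
#!/bin/sh
# Sketch only: convert a number of memory blocks to ZONE_MOVABLE from
# early userspace via the memory hotplug sysfs interface.
# MEM_SYSFS is a hypothetical override so the logic can be tried against
# a fake directory tree instead of the real sysfs.
MEM_SYSFS="${MEM_SYSFS:-/sys/devices/system/memory}"

# online_as_movable WANTED
# Try to offline each memory block in turn; when that succeeds, online it
# again as movable. Stop once WANTED blocks have been converted.
# Returns non-zero if fewer than WANTED blocks could be converted.
online_as_movable() {
    wanted=$1
    converted=0
    for blk in "$MEM_SYSFS"/memory[0-9]*; do
        if [ "$converted" -ge "$wanted" ]; then
            break
        fi
        if [ ! -f "$blk/state" ]; then
            continue
        fi
        # On a real system the write fails when the block cannot be
        # offlined (for example because it holds unmovable pages).
        # In that case just try the next block.
        if echo offline > "$blk/state" 2>/dev/null; then
            if echo online_movable > "$blk/state" 2>/dev/null; then
                converted=$((converted + 1))
            else
                # Could not online as movable: put the block back.
                echo online > "$blk/state" 2>/dev/null
            fi
        fi
    done
    [ "$converted" -ge "$wanted" ]
}
```

Something like "online_as_movable 16" would then be run as root from an
early boot script. How many blocks to convert depends on the block size
reported in block_size_bytes and on how much movable memory is wanted.
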
I had thought about using 'memmap=' in the first kernel and the worry was that
I'd have to support many different machines with different memory
configurations, and have to hard-code memory offsets and lengths for the
various memmap= kernel command line parameters. If I can make ZONE_MOVABLE
work that's preferable because the kernel will choose the correct usermem-only
region for me, and then I can just use that region for the crash kernel and
3rd kernel boots.
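
To make the hard-coding concern concrete: memmap=nn[KMG]!ss[KMG] takes a
size nn and a start offset ss, so every machine type would need its own
pair of values on the command line. A trivial sketch, where the helper name
memmap_arg and the 4G/12G values are purely illustrative and not taken from
any real configuration:

```shell
#!/bin/sh
# Sketch only: build a memmap= argument from a size and a start offset.
# Both values would have to be chosen per machine by hand.
memmap_arg() {
    size=$1
    start=$2
    printf 'memmap=%s!%s\n' "$size" "$start"
}

# Hypothetical machine with a 4G protected region starting at 12G:
memmap_arg 4G 12G   # prints: memmap=4G!12G
```
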