From: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
To: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
Gavin Shan <gshan@redhat.com>, Will Deacon <will@kernel.org>,
David Hildenbrand <david@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
linux-kernel@vger.kernel.org, Steven Price <steven.price@arm.com>,
Logan Gunthorpe <logang@deltatee.com>,
Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
linux-arm-kernel@lists.infradead.org, pratikp@codeaurora.org
Subject: Re: arm64: dropping prevent_bootmem_remove_notifier
Date: Thu, 29 Oct 2020 14:02:09 -0700 [thread overview]
Message-ID: <57952529a81c6d5ba2521c203dfc54b5@codeaurora.org> (raw)
In-Reply-To: <f125083d-4885-d174-f732-9cd96c45ddb4@arm.com>
Hi Anshuman, David,
Thanks for all the detailed explanations for the reasoning to have
bootmem protected from being removed. Also, I do agree drivers being
able to mark memory sections isn't the right thing to do.
We went ahead with the approach of using "mem=" as you suggested to
limit the bootmem and add remaining blocks using
add_memory_driver_managed() so that driver has ownership of these
blocks.
We do have some follow-up questions regarding this - will initiate a
discussion soon.
On 2020-10-18 22:37, Anshuman Khandual wrote:
> Hello Sudarshan,
>
> On 10/17/2020 04:41 AM, Sudarshan Rajagopalan wrote:
>>
>> Hello Anshuman,
>>
>> In the patch that enables memory hot-remove (commit bbd6ec605c0f
>> ("arm64/mm: Enable memory hot remove")) for arm64, there’s a notifier
>> put in place that prevents boot memory from being offlined and
>> removed. Also commit text mentions that boot memory on arm64 cannot be
>> removed. We wanted to understand more about the reasoning for this.
>> X86 and other archs doesn’t seem to do this prevention. There’s also
>> comment in the code that this notifier could be dropped in future if
>> and when boot memory can be removed.
>
> Right and till then the notifier cannot be dropped. There was a lot of
> discussions
> around this topic during multiple iterations of memory hot remove
> series. Hence, I
> would just request you to please go through them first. This list here
> is from one
> such series (https://lwn.net/Articles/809179/) but might not be
> exhaustive.
>
> -----------------
> On arm64 platform, it is essential to ensure that the boot time
> discovered
> memory couldn't be hot-removed so that,
>
> 1. FW data structures used across kexec are idempotent
> e.g. the EFI memory map.
>
> 2. linear map or vmemmap would not have to be dynamically split, and
> can
> map boot memory at a large granularity
>
> 3. Avoid penalizing paths that have to walk page tables, where we can
> be
> certain that the memory is not hot-removable
> -----------------
>
> The primary reason being kexec which would need substantial rework
> otherwise.
>
>>
>> The current logic is that only “new” memory blocks which are hot-added
>> can later be offlined and removed. The memory that system booted up
>> with cannot be offlined and removed. But there could be many usercases
>> such as inter-VM memory sharing where a primary VM could offline and
>> hot-remove a block/section of memory and lend it to secondary VM where
>> it could hot-add it. And after usecase is done, the reverse happens
>> where secondary VM hot-removes and gives it back to primary which can
>> hot-add it back. In such cases, the present logic for arm64 doesn’t
>> allow this hot-remove in primary to happen.
>
> That is not true. Each VM could just boot with a minimum boot memory
> which can
> not be offlined or removed but then a possible larger portion of memory
> can be
> hot added during the boot process itself, making them available for any
> future
> inter VM sharing purpose. Hence this problem could easily be solved in
> the user
> space itself.
>
>>
>> Also, on systems with movable zone that sort of guarantees pages to be
>> migrated and isolated so that blocks can be offlined, this logic also
>> defeats the purpose of having a movable zone which system can rely on
>> memory hot-plugging, which say virt-io mem also relies on for fully
>> plugged memory blocks.
> ZONE_MOVABLE does not really guarantee migration, isolation and
> removal. There
> are reasons an offline request might just fail. I agree that those
> reasons are
> normally not platform related but core memory gives platform an
> opportunity to
> decline an offlining request via a notifier. Hence ZONE_MOVABLE offline
> can be
> denied. Semantics wise we are still okay.
>
> This might look bit inconsistent that
> movablecore/kernelcore/movable_node with
> firmware sending in 'hot pluggable' memory (IIRC arm64 does not really
> support
> this yet), the system might end up with ZONE_MOVABLE marked boot memory
> which
> cannot be offlined or removed. But an offline notifier action is
> orthogonal.
> Hence did not block those kernel command line paths that creates
> ZONE_MOVABLE
> during boot to preserve existing behavior.
>
>>
>> I understand that some region of boot RAM shouldn’t be allowed to be
>> removed, but such regions won’t be allowed to be offlined in first
>> place since pages cannot be migrated and isolated, example reserved
>> pages.
>>
>> So we’re trying to understand the reasoning for such a prevention put
>> in place for arm64 arch alone.
>
> Primary reason being kexec. During kexec on arm64, next kernel's memory
> map is
> derived from firmware and not from current running kernel. So the next
> kernel
> will crash if it would access memory that might have been removed in
> running
> kernel. Until kexec on arm64 changes substantially and takes into
> account the
> real available memory on the current kernel, boot memory cannot be
> removed.
>
>>
>> One possible way to solve this is by marking the required sections as
>> “non-early” by removing the SECTION_IS_EARLY bit in its
>> section_mem_map.
>
> That is too intrusive from core memory perspective.
>
> This puts these sections in the context of “memory hotpluggable”
> which can be offlined-removed and added-onlined which are part of boot
> RAM itself and doesn’t need any extra blocks to be hot added. This way
> of marking certain sections as “non-early” could be exported so that
> module drivers can set the required number of sections as “memory
> hotpluggable”. This could have certain checks put in place to see
> which sections are allowed, example only movable zone sections can be
> marked as “non-early”.
>
> Giving modules the right to mark memory hotpluggable ? That is too
> intrusive
> and would still not solve the problem with kexec.
>
>>
>> Your thoughts on this? We are also looking for different ways to solve
>> the problem without having to completely dropping this notifier, but
>> just putting out the concern here about the notifier logic that is
>> breaking our usecase which is a generic memory sharing usecase using
>> memory hotplug feature.
>
> Completely preventing boot memory offline and removal is essential for
> kexec
> to work as expected afterwards. As suggested previously, splitting the
> VM
> memory into boot and non boot chunks during init can help work around
> this
> restriction effectively in userspace itself and would not require any
> kernel
> changes.
>
> - Anshuman
Sudarshan
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
prev parent reply other threads:[~2020-10-29 21:04 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-16 23:11 arm64: dropping prevent_bootmem_remove_notifier Sudarshan Rajagopalan
2020-10-17 9:35 ` David Hildenbrand
2020-10-19 6:30 ` Anshuman Khandual
2020-10-19 5:37 ` Anshuman Khandual
2020-10-29 21:02 ` Sudarshan Rajagopalan [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57952529a81c6d5ba2521c203dfc54b5@codeaurora.org \
--to=sudaraja@codeaurora.org \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=david@redhat.com \
--cc=gshan@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=mark.rutland@arm.com \
--cc=pratikp@codeaurora.org \
--cc=steven.price@arm.com \
--cc=surenb@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).