From: David Hildenbrand <david@redhat.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au,
npiggin@gmail.com, christophe.leroy@csgroup.eu,
Tarun Sahu <tsahu@linux.ibm.com>
Cc: foraker1@llnl.gov
Subject: Re: [PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter
Date: Mon, 19 Jun 2023 18:28:56 +0200
Message-ID: <bdb94911-ec9a-02ca-06fc-f850b6c815b2@redhat.com>
In-Reply-To: <875y7jifzc.fsf@linux.ibm.com>

On 19.06.23 18:17, Aneesh Kumar K.V wrote:
> David Hildenbrand <david@redhat.com> writes:
>
>> On 09.06.23 08:08, Aneesh Kumar K.V wrote:
>>> Certain devices can possess non-standard memory capacities, not constrained
>>> to multiples of 1GB. Provide a kernel parameter so that we can map the
>>> device memory completely on memory hotplug.
>>
>> So, the unfortunate thing is that these devices would have worked out of
>> the box before the memory block size was increased from 256 MiB to 1 GiB
>> in these setups. Now, one has to fine-tune the memory block size. The
>> only other arch that I know, which supports setting the memory block
>> size, is x86 for special (large) UV systems -- and at least in the past
>> 128 MiB vs. 2 GiB memory blocks made a performance difference during
>> boot (maybe no longer today, who knows).
>>
>>
>> Obviously, fewer tunables and getting stuff simply working out of the box
>> is preferable.
>>
>> Two questions:
>>
>> 1) Isn't there a way to improve auto-detection to fallback to 256 MiB in
>> these setups, to avoid specifying these parameters?
>
> The patch does try to detect as much as possible by looking at device tree
> nodes and aperture window size. But there are still cases where we find
> a memory aperture of size X GB and the device driver hotplugs X.Y GB of memory.
>

Okay, and I assume we can't detect that case easily.

Which interface is that device driver using to hotplug memory? It's
quite surprising I have to say ...
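
For concreteness, here is a minimal sketch of the mismatch described above. It assumes, as the generic memory hotplug code requires, that memory can only be added in whole, aligned memory blocks. The 15.75 GiB device size and the function name are made up for illustration; they do not come from the patch or from any driver.

```python
MiB = 1 << 20
GiB = 1 << 30


def hotpluggable_bytes(device_bytes: int, block_bytes: int) -> int:
    """Largest part of the device memory that fits in whole memory blocks."""
    return (device_bytes // block_bytes) * block_bytes


# Hypothetical device exposing 15.75 GiB ("X.Y GB") of memory.
device = 15 * GiB + 768 * MiB

for block in (1 * GiB, 256 * MiB):
    usable = hotpluggable_bytes(device, block)
    lost = device - usable
    print(f"block={block // MiB} MiB usable={usable // MiB} MiB lost={lost // MiB} MiB")
```

With 1 GiB blocks the trailing 768 MiB cannot be covered, while 256 MiB blocks cover the device completely.
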
>>
>> 2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On
>> x86-64, experiments (with direct map fragmentation) showed that the
>> effective performance boost is pretty insignificant, so I wonder how big
>> the 1 GiB direct map performance improvement is.
>
>
> Tarun is running some tests to evaluate the impact. We used to use 1GiB
> mapping always. This was later switched to use memory block size to fix
> issues with memory unplug.
> Commit af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory")
> explains some details related to that change.
>

IIUC, that commit (conditionally) increased the memory block size to
avoid the splitting, correct? By that, it broke the device driver use case.

>
>>
>>
>> I guess the only real issue with 256 MiB memory blocks and 1 GiB direct
>> mapping is memory unplug of boot memory: when unplugging a 256 MiB
>> block, one would have to remap the 1 GiB range using 2 MiB ranges.
>
>>
>> ... I was wondering what would happen if you simply leave the direct
>> mapping in this corner case in place instead of doing this remapping.
>> IOW, remove the memory but keep the direct map pointing at the removed
>> memory. Nobody should be touching it, or are there any cases where that
>> could hurt?
>>
>>
>> Or is there any other reason why we really want 1 GiB memory blocks
>> instead of defaulting to 256 MiB the way it used to be?
>>
>
> The idea we are working towards is to keep the memory block size small

That would be preferable, yes ...

> but map the boot memory using 1G. An unplug request can split that 1G
> mapping later. We could look at the possibility of leaving that mapping
> without splitting. But not sure why we would want to do that if we can
> correctly split things. Right now there is no splitting support in powerpc.

If splitting over-complicates the matter (and well, it will even consume
more memory), it might at least be worth looking into that. Yes, it's
cleaner.

I think there is also the option to fail memory offlining (and therefore
unplug) if we have a 1 GiB mapping and don't want to split. For
hotplugged memory it would always work to unplug again. aarch64 blocks
any boot memory from getting unplugged.

But I guess that might break existing use cases (unplug boot memory) on
ppc64 that rely on ZONE_MOVABLE to have it working with guarantees,
right? Could be optimized but not sure if that's the best approach.
--
Cheers,
David / dhildenb