From: David Hildenbrand <david@redhat.com>
To: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
linuxppc-dev@ozlabs.org, Michal Suchanek <msuchanek@suse.de>,
Laurent Dufour <ldufour@linux.vnet.ibm.com>,
Rick Lindsley <ricklind@linux.vnet.ibm.com>
Subject: Re: [PATCH v3] pseries/hotplug-memory: hot-add: skip redundant LMB lookup
Date: Wed, 16 Sep 2020 16:45:57 +0200
Message-ID: <954f530d-99fb-64a1-0733-772d3a9e98ff@redhat.com>
In-Reply-To: <20200916143913.o4o63mh4mums2qfm@rascal.austin.ibm.com>
On 16.09.20 16:39, Scott Cheloha wrote:
> On Wed, Sep 16, 2020 at 09:39:53AM +0200, David Hildenbrand wrote:
>> On 15.09.20 21:46, Scott Cheloha wrote:
>>> During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
>>> to determine which node id (nid) to use when later calling __add_memory().
>>>
>>> This is wasteful. On pseries, memory_add_physaddr_to_nid() finds an
>>> appropriate nid for a given address by looking up the LMB containing the
>>> address and then passing that LMB to of_drconf_to_nid_single() to get the
>>> nid. In dlpar_add_lmb() we get this address from the LMB itself.
>>>
>>> In short, we have a pointer to an LMB and then we are searching for
>>> that LMB *again* in order to find its nid.
>>>
>>> If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
>>> can skip the redundant lookup. The only error handling we need to
>>> duplicate from memory_add_physaddr_to_nid() is the fallback to the
>>> default nid when of_drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
>>> an invalid nid.
>>>
>>> Skipping the extra lookup makes hot-add operations faster, especially
>>> on machines with many LMBs.
>>>
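(For context, the nid-selection change described above boils down to
roughly the following in dlpar_add_lmb() -- a sketch only, not the patch
itself; surrounding code and error handling are omitted:)

    /* Before: re-find the LMB we already hold just to learn its nid. */
    nid = memory_add_physaddr_to_nid(lmb->base_addr);

    /* After: ask the device tree directly and apply the same fallback
     * that memory_add_physaddr_to_nid() would have used.
     */
    nid = of_drconf_to_nid_single(lmb);
    if (nid < 0 || !node_possible(nid))
            nid = first_online_node;

    rc = __add_memory(nid, lmb->base_addr, block_sz);
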
>>> Consider an LPAR with 126976 LMBs. In one test, hot-adding 126000
>>> LMBs on an unpatched kernel took ~3.5 hours while a patched kernel
>>> completed the same operation in ~2 hours:
>>>
>>> Unpatched (12450 seconds):
>>> Sep 9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
>>> Sep 9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
>>> [...]
>>> Sep 9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
>>>
>>> Patched (7065 seconds):
>>> Sep 8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
>>> Sep 8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
>>> [...]
>>> Sep 8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
>>>
>>> It should be noted that the speedup is more substantial when
>>> hot-adding LMBs toward the end of the drconf range, since that is
>>> where skipping the linear LMB search saves the most.
>>>
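(The linear search in question: on pseries, memory_add_physaddr_to_nid()
ends up walking the whole drconf LMB array, along these lines --
paraphrased, not an exact copy of the numa code:)

    /* O(#LMBs) per hot-added block: scan until we find the LMB that
     * contains the address, then derive the nid from that LMB.
     */
    for_each_drmem_lmb(lmb) {
            if ((lmb->flags & DRCONF_MEM_RESERVED) ||
                !(lmb->flags & DRCONF_MEM_ASSIGNED))
                    continue;

            if (addr < lmb->base_addr ||
                addr >= lmb->base_addr + drmem_lmb_size())
                    continue;

            nid = of_drconf_to_nid_single(lmb);
            break;
    }
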
>>> To see the distinction, consider a smaller hot-add test on the same
>>> LPAR. A perf-stat run with 10 iterations showed that hot-adding 4096
>>> LMBs completed less than 1 second faster on a patched kernel:
>>>
>>> Unpatched:
>>> Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
>>>
>>> 104,753.42 msec task-clock # 0.992 CPUs utilized ( +- 0.55% )
>>> 4,708 context-switches # 0.045 K/sec ( +- 0.69% )
>>> 2,444 cpu-migrations # 0.023 K/sec ( +- 1.25% )
>>> 394 page-faults # 0.004 K/sec ( +- 0.22% )
>>> 445,902,503,057 cycles # 4.257 GHz ( +- 0.55% ) (66.67%)
>>> 8,558,376,740 stalled-cycles-frontend # 1.92% frontend cycles idle ( +- 0.88% ) (49.99%)
>>> 300,346,181,651 stalled-cycles-backend # 67.36% backend cycles idle ( +- 0.76% ) (50.01%)
>>> 258,091,488,691 instructions # 0.58 insn per cycle
>>> # 1.16 stalled cycles per insn ( +- 0.22% ) (66.67%)
>>> 70,568,169,256 branches # 673.660 M/sec ( +- 0.17% ) (50.01%)
>>> 3,100,725,426 branch-misses # 4.39% of all branches ( +- 0.20% ) (49.99%)
>>>
>>> 105.583 +- 0.589 seconds time elapsed ( +- 0.56% )
>>>
>>> Patched:
>>> Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
>>>
>>> 104,055.69 msec task-clock # 0.993 CPUs utilized ( +- 0.32% )
>>> 4,606 context-switches # 0.044 K/sec ( +- 0.20% )
>>> 2,463 cpu-migrations # 0.024 K/sec ( +- 0.93% )
>>> 394 page-faults # 0.004 K/sec ( +- 0.25% )
>>> 442,951,129,921 cycles # 4.257 GHz ( +- 0.32% ) (66.66%)
>>> 8,710,413,329 stalled-cycles-frontend # 1.97% frontend cycles idle ( +- 0.47% ) (50.06%)
>>> 299,656,905,836 stalled-cycles-backend # 67.65% backend cycles idle ( +- 0.39% ) (50.02%)
>>> 252,731,168,193 instructions # 0.57 insn per cycle
>>> # 1.19 stalled cycles per insn ( +- 0.20% ) (66.66%)
>>> 68,902,851,121 branches # 662.173 M/sec ( +- 0.13% ) (49.94%)
>>> 3,100,242,882 branch-misses # 4.50% of all branches ( +- 0.15% ) (49.98%)
>>>
>>> 104.829 +- 0.325 seconds time elapsed ( +- 0.31% )
>>>
>>> This is consistent with that explanation. An add-by-count hot-add
>>> operation adds LMBs
>>> greedily, so LMBs near the start of the drconf range are considered
>>> first. On an otherwise idle LPAR with so many LMBs we would expect to
>>> find the LMBs we need near the start of the drconf range, hence the
>>> smaller speedup.
>>>
>>> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
>>
>>
>> Hi Scott,
>>
>> IIRC, ppc DLPAR does a single add_memory() [...]
>
> Yes.
>
>> [...] for each LMB (16 MB).
>
> The block size is set by the hypervisor. The default is 256MB. In
> this test I had a block size of 256MB.
Oh, I wasn't aware that it's configurable, thanks for pointing that out
(missed the custom memory_block_size_bytes() implementation).
I wonder how that interacts with pseries_remove_memblock(), which passes
MIN_MEMORY_BLOCK_SIZE to __remove_memory(): whenever that size is
smaller than memory_block_size_bytes(), try_remove_memory() will hit
BUG_ON(check_hotplug_memory_range(start, size)).
Maybe that path is simply never taken on such machines ...
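(The check in question, roughly as it reads in mm/memory_hotplug.c --
paraphrased, error message dropped:)

    /* try_remove_memory() does BUG_ON(check_hotplug_memory_range()),
     * and the check rejects any range not aligned to the memory block
     * size -- which a MIN_MEMORY_BLOCK_SIZE-sized removal cannot be on
     * a machine with 256MB blocks.
     */
    static int check_hotplug_memory_range(u64 start, u64 size)
    {
            if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
                !IS_ALIGNED(size, memory_block_size_bytes()))
                    return -EINVAL;

            return 0;
    }
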
>
> On multi-terabyte machines I would effectively always expect a block
> size of 256MB. 16MB blocks are supported, but that is not the default
> setting, so it is increasingly rare.
>
>> With tons of LMBs, this will also make /proc/iomem explode in size (it
>> is backed by a list-based tree), making traversal significantly slower,
>> e.g., on insertions and System RAM walks.
>>
>> I was wondering if you would get another performance boost under ppc
>> when using MEMHP_MERGE_RESOURCE [1]. AFAIK, the resource boundaries are
>> not of interest here. No guarantees, but it might be worth a try.
>
> I'll give it a shot.
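(Illustrative only -- the exact flag name and signature depend on which
version of the series in [1] lands; the flags argument is not in
mainline at the time of writing:)

    /* Ask the core to merge the new range into an adjacent System RAM
     * resource instead of adding one /proc/iomem entry per LMB.
     */
    rc = __add_memory(nid, lmb->base_addr, block_sz,
                      MEMHP_MERGE_RESOURCE);
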
>
>> Did you investigate what else makes memory hotplug that slow? (126000
>> LMBs correspond to roughly 2TB, that shouldn't take 2 hours ...)
>
> It was about ~31TB in 256MB blocks. It's a worst-case test (add all
> the memory), but I'm pretty happy with a 1.5 hour improvement :)
Yeah, definitely :)
--
Thanks,
David / dhildenb