From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
ligang.bdlg@bytedance.com, Gang Li <gang.li@linux.dev>
Subject: [PATCH v5 0/7] hugetlb: parallelize hugetlb page init on boot
Date: Fri, 26 Jan 2024 23:24:04 +0800
Message-ID: <20240126152411.1238072-1-gang.li@linux.dev>
Hi all, hugetlb init parallelization has now been updated to v5.
This version is tested on next-20240125.
Update Summary:
- Use prep_and_add_allocated_folios in 2M hugetlb parallelization
- Update huge_boot_pages in arch/powerpc/mm/hugetlbpage.c
- Revise struct padata_mt_job comment
- Add 'max_threads' section in cover letter
- Collect more Reviewed-by
# Introduction
Hugetlb initialization during boot takes up a considerable amount of time.
For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2
seconds of a 10-second boot. Initializing 11,776 1GB pages on a 12TB Intel
host takes more than 1 minute[1]. Both figures are a significant share of
boot time.
Inspired by [2] and [3], hugetlb initialization can also be accelerated
through parallelization. The kernel already has infrastructure such as
padata_do_multithreaded; this series uses it to achieve effective results
with minimal modifications.
[1] https://lore.kernel.org/all/783f8bac-55b8-5b95-eb6a-11a583675000@google.com/
[2] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/
[3] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/
[4] https://lore.kernel.org/all/76becfc1-e609-e3e8-2966-4053143170b6@google.com/
# max_threads
This series uses `padata_do_multithreaded` like this:
```
job.max_threads = num_node_state(N_MEMORY) * multiplier;
padata_do_multithreaded(&job);
```
To fully utilize the CPU, the number of parallel threads needs to be
carefully considered. `max_threads = num_node_state(N_MEMORY)` does
not fully utilize the CPU, so we need to multiply it by a multiplier.
Tests below indicate that a multiplier of 2 significantly improves
performance, and although larger values also provide improvements,
the gains are marginal.
 multiplier       1       2       3       4       5
 ------------ ------- ------- ------- ------- -------
 256G 2node    358ms   215ms   157ms   134ms   126ms
 2T   4node    979ms   679ms   543ms   489ms   481ms
 50G  2node     71ms    44ms    37ms    30ms    31ms
Therefore, choosing 2 as the multiplier strikes a good balance between
enhancing parallel processing capabilities and maintaining efficient
resource management.
# Test result
      test case        no patch(ms)   patched(ms)    saved
 ------------------- -------------- ------------- --------
  256c2T(4 node) 1G            4745          2024   57.34%
  128c1T(2 node) 1G            3358          1712   49.02%
     12T         1G           77000         18300   76.23%
  256c2T(4 node) 2M            3336          1051   68.52%
  128c1T(2 node) 2M            1943           716   63.15%
# Change log
Changes in v5:
- Use prep_and_add_allocated_folios in 2M hugetlb parallelization
- Update huge_boot_pages in arch/powerpc/mm/hugetlbpage.c
- Revise struct padata_mt_job comment
- Add 'max_threads' section in cover letter
- Collect more Reviewed-by
Changes in v4:
- https://lore.kernel.org/r/20240118123911.88833-1-gang.li@linux.dev
- Make padata_do_multithreaded dispatch all jobs with a global iterator
- Revise commit message
- Rename some functions
- Collect Tested-by and Reviewed-by
Changes in v3:
- https://lore.kernel.org/all/20240102131249.76622-1-gang.li@linux.dev/
- Select CONFIG_PADATA as we use padata_do_multithreaded
- Fix a race condition in h->next_nid_to_alloc
- Fix local variable initialization issues
- Remove RFC tag
Changes in v2:
- https://lore.kernel.org/all/20231208025240.4744-1-gang.li@linux.dev/
- Reduce complexity with `padata_do_multithreaded`
- Support 1G hugetlb
v1:
- https://lore.kernel.org/all/20231123133036.68540-1-gang.li@linux.dev/
- parallelize 2M hugetlb initialization with workqueue
Gang Li (7):
  hugetlb: code clean for hugetlb_hstate_alloc_pages
  hugetlb: split hugetlb_hstate_alloc_pages
  padata: dispatch works on different nodes
  hugetlb: pass *next_nid_to_alloc directly to
    for_each_node_mask_to_alloc
  hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA
  hugetlb: parallelize 2M hugetlb allocation and initialization
  hugetlb: parallelize 1G hugetlb initialization
 arch/powerpc/mm/hugetlbpage.c |   2 +-
 fs/Kconfig                    |   1 +
 include/linux/hugetlb.h       |   2 +-
 include/linux/padata.h        |   2 +
 kernel/padata.c               |  14 +-
 mm/hugetlb.c                  | 234 +++++++++++++++++++++++-----------
 mm/mm_init.c                  |   1 +
 7 files changed, 175 insertions(+), 81 deletions(-)
--
2.20.1