From: Gang Li <gang.li@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Muchun Song <muchun.song@linux.dev>,
Tim Chen <tim.c.chen@linux.intel.com>,
Steffen Klassert <steffen.klassert@secunet.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Jane Chu <jane.chu@oracle.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Randy Dunlap <rdunlap@infradead.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
ligang.bdlg@bytedance.com, Gang Li <gang.li@linux.dev>
Subject: [PATCH v6 4/8] padata: dispatch works on different nodes
Date: Thu, 22 Feb 2024 22:04:17 +0800
Message-ID: <20240222140422.393911-5-gang.li@linux.dev>
In-Reply-To: <20240222140422.393911-1-gang.li@linux.dev>
When a group of tasks that access different nodes are scheduled on the
same node, they may encounter bandwidth bottlenecks and increased access
latency.
Introduce a numa_aware flag in struct padata_mt_job. When it is set,
padata distributes the works across the nodes that have CPUs in a
round-robin fashion, so that multi-node systems are used to full advantage.
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
---
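Note for readers: the node selection in the kernel/padata.c hunk below can
be modelled in userspace as a rough sketch. It is not the kernel code.
The 64-bit mask standing in for node_states[N_CPU], the local
next_node_in() reimplementation, MAX_NODES, and the pick_next_node()
helper are all assumptions made for illustration, and C11 atomics replace
atomic_try_cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_NODES 64	/* hypothetical limit for this sketch */

/*
 * Userspace stand-in for the kernel's next_node_in(): return the next
 * node after nid whose bit is set in mask, wrapping around to the start.
 */
static int next_node_in(int nid, unsigned long long mask)
{
	assert(mask != 0);
	for (int i = 1; i <= MAX_NODES; i++) {
		int cand = (nid + i) % MAX_NODES;

		if (mask & (1ULL << cand))
			return cand;
	}
	return nid;	/* not reached while mask != 0 */
}

/* Shared cursor, analogous to last_used_nid in the patch. */
static atomic_int last_used_nid;

/* Hypothetical helper: claim the next node in round-robin order. */
static int pick_next_node(unsigned long long cpu_nodes)
{
	int old_node = atomic_load(&last_used_nid);
	int nid;

	do {
		/* On a failed exchange old_node is reloaded, so recompute. */
		nid = next_node_in(old_node, cpu_nodes);
	} while (!atomic_compare_exchange_weak(&last_used_nid, &old_node, nid));

	return nid;
}
```

With nodes 0, 2 and 3 populated (mask 0xD) and the cursor starting at 0,
successive calls yield 2, 3, 0, 2: the first work lands on the node after
the initial cursor value rather than on node 0 itself.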
include/linux/padata.h | 2 ++
kernel/padata.c | 14 ++++++++++++--
mm/mm_init.c | 1 +
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/linux/padata.h b/include/linux/padata.h
index 495b16b6b4d72..8f418711351bc 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -137,6 +137,7 @@ struct padata_shell {
* appropriate for one worker thread to do at once.
* @max_threads: Max threads to use for the job, actual number may be less
* depending on task size and minimum chunk size.
+ * @numa_aware: Distribute jobs to different nodes with CPU in a round robin fashion.
*/
struct padata_mt_job {
void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
@@ -146,6 +147,7 @@ struct padata_mt_job {
unsigned long align;
unsigned long min_chunk;
int max_threads;
+ bool numa_aware;
};
/**
diff --git a/kernel/padata.c b/kernel/padata.c
index 179fb1518070c..e3f639ff16707 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -485,7 +485,8 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
struct padata_work my_work, *pw;
struct padata_mt_job_state ps;
LIST_HEAD(works);
- int nworks;
+ int nworks, nid;
+ static atomic_t last_used_nid __initdata;
if (job->size == 0)
return;
@@ -517,7 +518,16 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
ps.chunk_size = roundup(ps.chunk_size, job->align);
list_for_each_entry(pw, &works, pw_list)
- queue_work(system_unbound_wq, &pw->pw_work);
+ if (job->numa_aware) {
+ int old_node = atomic_read(&last_used_nid);
+
+ do {
+ nid = next_node_in(old_node, node_states[N_CPU]);
+ } while (!atomic_try_cmpxchg(&last_used_nid, &old_node, nid));
+ queue_work_node(nid, system_unbound_wq, &pw->pw_work);
+ } else {
+ queue_work(system_unbound_wq, &pw->pw_work);
+ }
/* Use the current thread, which saves starting a workqueue worker. */
padata_work_init(&my_work, padata_mt_helper, &ps, PADATA_WORK_ONSTACK);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2c19f5515e36c..549e76af8f82a 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2231,6 +2231,7 @@ static int __init deferred_init_memmap(void *data)
.align = PAGES_PER_SECTION,
.min_chunk = PAGES_PER_SECTION,
.max_threads = max_threads,
+ .numa_aware = false,
};
padata_do_multithreaded(&job);
--
2.20.1
Thread overview (18+ messages):
2024-02-22 14:04 [PATCH v6 0/8] hugetlb: parallelize hugetlb page init on boot Gang Li
2024-02-22 14:04 ` [PATCH v6 1/8] hugetlb: code clean for hugetlb_hstate_alloc_pages Gang Li
2024-02-22 14:04 ` [PATCH v6 2/8] hugetlb: split hugetlb_hstate_alloc_pages Gang Li
2024-02-22 14:04 ` [PATCH v6 3/8] hugetlb: pass *next_nid_to_alloc directly to for_each_node_mask_to_alloc Gang Li
2024-02-22 14:04 ` Gang Li [this message]
2024-02-27 21:24 ` [PATCH v6 4/8] padata: dispatch works on different nodes Daniel Jordan
2024-03-05 2:49 ` Gang Li
2024-03-08 15:42 ` Daniel Jordan
2024-02-22 14:04 ` [PATCH v6 5/8] padata: downgrade padata_do_multithreaded to serial execution for non-SMP Gang Li
2024-02-27 21:26 ` Daniel Jordan
2024-03-05 3:24 ` Gang Li
2024-02-22 14:04 ` [PATCH v6 6/8] hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA Gang Li
2024-02-27 21:26 ` Daniel Jordan
2024-02-22 14:04 ` [PATCH v6 7/8] hugetlb: parallelize 2M hugetlb allocation and initialization Gang Li
2024-03-08 17:11 ` Daniel Jordan
2024-02-22 14:04 ` [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization Gang Li
2024-03-08 17:35 ` Daniel Jordan
2024-03-12 2:26 ` Gang Li