* [RFC PATCH 11/20] kthread: Make sure kthread hasn't started while binding it
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
	Zqiang, rcu

Make sure the kthread is still sleeping in its initial
schedule_preempt_disabled() call, before its handler has run, whenever
kthread_bind[_mask]() is called on it. This provides a sanity check
verifying that the task is not randomly blocked somewhere later within
its function handler, in which case it could be concurrently woken up,
leaving the call to do_set_cpus_allowed() without any effect until the
next voluntary sleep.

Rely on the wake-up ordering to ensure that the newly introduced "started"
field returns the expected value:

    TASK A                                   TASK B
    ------                                   ------
READ kthread->started
wake_up_process(B)
   rq_lock()
   ...
   rq_unlock() // RELEASE
                                           schedule()
                                              rq_lock() // ACQUIRE
                                              // schedule task B
                                              rq_unlock()
                                              WRITE kthread->started

Similarly, writing kthread->started before subsequent voluntary sleeps
will be visible after calling wait_task_inactive() in
__kthread_bind_mask(), reporting potential misuse of the API.

Upcoming patches will make further use of this facility.
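
For illustration, the kind of misuse the new check catches (hypothetical
threadfn and cpu, for illustration only):

	struct task_struct *t = kthread_run(threadfn, NULL, "example");

	if (!IS_ERR(t))
		/*
		 * Too late: the kthread has already been woken up and may
		 * be blocked anywhere within threadfn(), so this now warns.
		 */
		kthread_bind(t, cpu);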

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/kthread.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index f7be976ff88a..ecb719f54f7a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -53,6 +53,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	int started;
 	int result;
 	int (*threadfn)(void *);
 	void *data;
@@ -382,6 +383,8 @@ static int kthread(void *_create)
 	schedule_preempt_disabled();
 	preempt_enable();
 
+	self->started = 1;
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -540,7 +543,9 @@ static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int state)
 
 void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 
 /**
@@ -554,7 +559,9 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
  */
 void kthread_bind(struct task_struct *p, unsigned int cpu)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 EXPORT_SYMBOL(kthread_bind);
 
-- 
2.45.2




* [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
	Zqiang, rcu

Affining kthreads currently follows one of three different patterns:

1) Per-CPU kthreads must stay affine to a single CPU and never execute
   relevant code on any other CPU. This is currently handled by smpboot
   code which takes care of CPU-hotplug operations.

2) Kthreads that _have_ to be affine to a specific set of CPUs and
   can't run anywhere else. The affinity is set through
   kthread_bind_mask() and the subsystem handles CPU-hotplug
   operations by itself.

3) Kthreads that have a _preferred_ affinity but can run anywhere
   without breaking correctness. Userspace can override the affinity.
   It is set manually like for any other task and CPU-hotplug is
   supposed to be handled by the relevant subsystem so that the task
   is properly reaffined whenever a given CPU from the preferred
   affinity comes up or goes down. Care must also be taken so that
   the preferred affinity doesn't cross housekeeping cpumask
   boundaries.

Currently the preferred affinity pattern has at least 4 identified
users, with varying degrees of success in handling CPU-hotplug
operations and the housekeeping cpumask.

Provide an infrastructure to handle this use case pattern. A new
kthread_affine_preferred() API is introduced, to be used just like
kthread_bind_mask(), right after kthread creation and before the first
wake-up. The kthread is then affined right away to the cpumask passed
through the API if it contains online housekeeping CPUs. Otherwise it
is affined to all online housekeeping CPUs as a last resort.

It is aware of CPU hotplug events such that:

* When a housekeeping CPU comes up and is part of the preferred
  affinity of a given kthread, it is added to the kthread's applied
  affinity set (and the last-resort fallback of all online housekeeping
  CPUs is dropped from the set, if it was in use).

* When a housekeeping CPU goes down while it was part of the preferred
  affinity of a kthread, it is removed from the kthread's applied
  affinity. If the resulting set would be empty, the kthread falls back
  to all online housekeeping CPUs.
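
The expected usage then looks like this (a sketch, mirroring the
kcompactd conversion later in this series; threadfn, data and nid are
placeholders):

	struct task_struct *t = kthread_create(threadfn, data, "mykthread");

	if (IS_ERR(t))
		return PTR_ERR(t);
	/* Preferred, not mandatory, affinity: set before the first wake-up */
	kthread_affine_preferred(t, cpumask_of_node(nid));
	wake_up_process(t);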

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/cpuhotplug.h |   1 +
 include/linux/kthread.h    |   1 +
 kernel/kthread.c           | 121 +++++++++++++++++++++++++++++++++++++
 3 files changed, 123 insertions(+)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 7a5785f405b6..5c204bd0fed6 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -238,6 +238,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 40,
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 int kthread_stop_put(struct task_struct *k);
 bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ecb719f54f7a..cfa6e1b8d933 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,10 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static struct cpumask kthread_online_mask;
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -64,6 +68,9 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
+	struct cpumask *preferred_affinity;
 };
 
 enum KTHREAD_BITS {
@@ -124,6 +131,7 @@ bool set_kthread_struct(struct task_struct *p)
 	init_completion(&kthread->parked);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +322,16 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (kthread->preferred_affinity) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		/* Make sure the kthread never gets re-affined globally */
+		set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
+		mutex_unlock(&kthreads_hotplug_lock);
+
+		kfree(kthread->preferred_affinity);
+		kthread->preferred_affinity = NULL;
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -779,6 +797,109 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+static void kthread_fetch_affinity(struct kthread *k, struct cpumask *mask)
+{
+	cpumask_and(mask, k->preferred_affinity, &kthread_online_mask);
+	cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	if (cpumask_empty(mask))
+		cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+	struct kthread *kthread = to_kthread(p);
+	cpumask_var_t affinity;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+	if (!kthread->preferred_affinity) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_copy(kthread->preferred_affinity, mask);
+	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	kthread_fetch_affinity(kthread, affinity);
+
+	/* It's safe because the task is inactive. */
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	do_set_cpus_allowed(p, mask);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	mutex_unlock(&kthreads_hotplug_lock);
+out:
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
+static int kthreads_hotplug_update(void)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int err = 0;
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE(!k->preferred_affinity)) {
+			err = -EINVAL;
+			break;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return err;
+}
+
+static int kthreads_offline_cpu(unsigned int cpu)
+{
+	int ret = 0;
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_clear_cpu(cpu, &kthread_online_mask);
+	ret = kthreads_hotplug_update();
+	mutex_unlock(&kthreads_hotplug_lock);
+
+	return ret;
+}
+
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	int ret = 0;
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_set_cpu(cpu, &kthread_online_mask);
+	ret = kthreads_hotplug_update();
+	mutex_unlock(&kthreads_hotplug_lock);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				kthreads_online_cpu, kthreads_offline_cpu);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
-- 
2.45.2




* [RFC PATCH 13/20] mm: Make Kcompactd use kthread's preferred affinity
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC
  To: LKML
  Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, Andrew Morton,
	linux-mm, Peter Zijlstra, Thomas Gleixner

Now that kthreads have an infrastructure to handle preferred affinity
against CPU hotplug and the housekeeping cpumask, convert kcompactd to
use it instead of handling all the constraints halfway by itself.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/compaction.c | 44 ++++----------------------------------------
 1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 739b1bf3d637..64a6486f06e1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3179,15 +3179,9 @@ void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx)
 static int kcompactd(void *p)
 {
 	pg_data_t *pgdat = (pg_data_t *)p;
-	struct task_struct *tsk = current;
 	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
 	long timeout = default_timeout;
 
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
-
 	set_freezable();
 
 	pgdat->kcompactd_max_order = 0;
@@ -3258,11 +3252,14 @@ void __meminit kcompactd_run(int nid)
 	if (pgdat->kcompactd)
 		return;
 
-	pgdat->kcompactd = kthread_run(kcompactd, pgdat, "kcompactd%d", nid);
+	pgdat->kcompactd = kthread_create(kcompactd, pgdat, "kcompactd%d", nid);
 	if (IS_ERR(pgdat->kcompactd)) {
 		pr_err("Failed to start kcompactd on node %d\n", nid);
 		pgdat->kcompactd = NULL;
+		return;
 	}
+	kthread_affine_preferred(pgdat->kcompactd, cpumask_of_node(pgdat->node_id));
+	wake_up_process(pgdat->kcompactd);
 }
 
 /*
@@ -3279,30 +3275,6 @@ void __meminit kcompactd_stop(int nid)
 	}
 }
 
-/*
- * It's optimal to keep kcompactd on the same CPUs as their memory, but
- * not required for correctness. So if the last cpu in a node goes
- * away, we get changed to run anywhere: as the first one comes back,
- * restore their cpu bindings.
- */
-static int kcompactd_cpu_online(unsigned int cpu)
-{
-	int nid;
-
-	for_each_node_state(nid, N_MEMORY) {
-		pg_data_t *pgdat = NODE_DATA(nid);
-		const struct cpumask *mask;
-
-		mask = cpumask_of_node(pgdat->node_id);
-
-		if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
-			/* One of our CPUs online: restore mask */
-			if (pgdat->kcompactd)
-				set_cpus_allowed_ptr(pgdat->kcompactd, mask);
-	}
-	return 0;
-}
-
 static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
 		int write, void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -3362,15 +3334,6 @@ static struct ctl_table vm_compaction[] = {
 static int __init kcompactd_init(void)
 {
 	int nid;
-	int ret;
-
-	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
-					"mm/compaction:online",
-					kcompactd_cpu_online, NULL);
-	if (ret < 0) {
-		pr_err("kcompactd: failed to register hotplug callbacks.\n");
-		return ret;
-	}
 
 	for_each_node_state(nid, N_MEMORY)
 		kcompactd_run(nid);
-- 
2.45.2




* [RFC PATCH 14/20] mm: Allocate kcompactd on its node
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC
  To: LKML
  Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, linux-mm,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner

kcompactd runs preferably on a specific node. Allocate its task
structure accordingly for better memory locality.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 64a6486f06e1..dc4347c4e352 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3252,7 +3252,7 @@ void __meminit kcompactd_run(int nid)
 	if (pgdat->kcompactd)
 		return;
 
-	pgdat->kcompactd = kthread_create(kcompactd, pgdat, "kcompactd%d", nid);
+	pgdat->kcompactd = kthread_create_on_node(kcompactd, pgdat, nid, "kcompactd%d", nid);
 	if (IS_ERR(pgdat->kcompactd)) {
 		pr_err("Failed to start kcompactd on node %d\n", nid);
 		pgdat->kcompactd = NULL;
-- 
2.45.2




* [RFC PATCH 15/20] mm: Make kswapd use kthread's preferred affinity
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC
  To: LKML
  Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, linux-mm,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner

Now that kthreads have an infrastructure to handle preferred affinity
against CPU hotplug and the housekeeping cpumask, convert kswapd to use
it instead of handling all the constraints halfway by itself.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/vmscan.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2e34de9cd0d4..94359a893b4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7116,10 +7116,6 @@ static int kswapd(void *p)
 	unsigned int highest_zoneidx = MAX_NR_ZONES - 1;
 	pg_data_t *pgdat = (pg_data_t *)p;
 	struct task_struct *tsk = current;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
 
 	/*
 	 * Tell the memory management that we're a "memory allocator",
@@ -7288,7 +7284,7 @@ void __meminit kswapd_run(int nid)
 
 	pgdat_kswapd_lock(pgdat);
 	if (!pgdat->kswapd) {
-		pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+		pgdat->kswapd = kthread_create(kswapd, pgdat, "kswapd%d", nid);
 		if (IS_ERR(pgdat->kswapd)) {
 			/* failure at boot is fatal */
 			pr_err("Failed to start kswapd on node %d,ret=%ld\n",
@@ -7296,6 +7292,10 @@ void __meminit kswapd_run(int nid)
 			BUG_ON(system_state < SYSTEM_RUNNING);
 			pgdat->kswapd = NULL;
 		}
+		if (pgdat->kswapd) {
+			kthread_affine_preferred(pgdat->kswapd, cpumask_of_node(pgdat->node_id));
+			wake_up_process(pgdat->kswapd);
+		}
 	}
 	pgdat_kswapd_unlock(pgdat);
 }
-- 
2.45.2




* [RFC PATCH 16/20] mm: Allocate kswapd on its node
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC
  To: LKML
  Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, linux-mm,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner

kswapd runs preferably on a specific node. Allocate its task
structure accordingly for better memory locality.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 94359a893b4f..adf8c1e7e89d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7284,7 +7284,7 @@ void __meminit kswapd_run(int nid)
 
 	pgdat_kswapd_lock(pgdat);
 	if (!pgdat->kswapd) {
-		pgdat->kswapd = kthread_create(kswapd, pgdat, "kswapd%d", nid);
+		pgdat->kswapd = kthread_create_on_node(kswapd, pgdat, nid, "kswapd%d", nid);
 		if (IS_ERR(pgdat->kswapd)) {
 			/* failure at boot is fatal */
 			pr_err("Failed to start kswapd on node %d,ret=%ld\n",
-- 
2.45.2




* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-07-26 22:31 UTC
  To: LKML
  Cc: Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, linux-mm, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu

On Fri, Jul 26, 2024 at 11:56:48PM +0200, Frederic Weisbecker wrote:
> +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
> +{
> +	struct kthread *kthread = to_kthread(p);
> +	cpumask_var_t affinity;
> +	unsigned long flags;
> +	int ret;
> +
> +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
> +		WARN_ON(1);
> +		return -EINVAL;
> +	}
> +
> +	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
> +	if (!kthread->preferred_affinity) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	mutex_lock(&kthreads_hotplug_lock);
> +	cpumask_copy(kthread->preferred_affinity, mask);
> +	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
> +	kthread_fetch_affinity(kthread, affinity);
> +
> +	/* It's safe because the task is inactive. */
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	do_set_cpus_allowed(p, mask);

s/mask/affinity



* Re: [RFC PATCH 11/20] kthread: Make sure kthread hasn't started while binding it
From: Vlastimil Babka @ 2024-07-30 15:20 UTC
  To: Frederic Weisbecker, LKML
  Cc: Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> Make sure the kthread is still sleeping in its initial
> schedule_preempt_disabled() call, before its handler has run, whenever
> kthread_bind[_mask]() is called on it.
[...]
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Vlastimil Babka @ 2024-07-30 15:49 UTC
  To: Frederic Weisbecker, LKML
  Cc: Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> Affining kthreads currently follows one of three different patterns:
[...]
> Provide an infrastructure to handle this use case pattern. A new
> kthread_affine_preferred() API is introduced, to be used just like
> kthread_bind_mask(), right after kthread creation and before the first
> wake-up.
[...]
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Nit:

> +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
> +{
> +	struct kthread *kthread = to_kthread(p);
> +	cpumask_var_t affinity;
> +	unsigned long flags;
> +	int ret;
> +
> +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
> +		WARN_ON(1);
> +		return -EINVAL;
> +	}
> +

Should we also fail if kthread->preferred_affinity already exists? In
case somebody calls this twice.
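
I.e. something like this early check (sketch only):

	/* Reject a second attempt at setting a preferred affinity */
	if (WARN_ON_ONCE(kthread->preferred_affinity))
		return -EINVAL;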

Also, for some of the use cases (kswapd, kcompactd) it would make sense
to be able to add the cpus of a node as they are onlined, which it seems
we didn't do, except for some corner case handling in kcompactd, but
maybe we should? I wonder if the current implementation of onlining a
completely new node with cpus does the right thing as a result of the
individual onlining operations, or whether we end up affined to a single
cpu (or none).

But that would need some kind of kthread_affine_preferred_update()
implementation?
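
(A sketch of the idea, name made up:)

	/*
	 * Hypothetical helper: replace the preferred mask and reapply
	 * the effective affinity under kthreads_hotplug_lock.
	 */
	int kthread_affine_preferred_update(struct task_struct *p,
					    const struct cpumask *mask);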


* Re: [RFC PATCH 13/20] mm: Make Kcompactd use kthread's preferred affinity
From: Vlastimil Babka @ 2024-07-31 15:03 UTC
  To: Frederic Weisbecker, LKML
  Cc: Michal Hocko, Andrew Morton, linux-mm, Peter Zijlstra,
	Thomas Gleixner

On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> Now that kthreads have an infrastructure to handle preferred affinity
> against CPU hotplug and the housekeeping cpumask, convert kcompactd to
> use it instead of handling all the constraints halfway by itself.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [RFC PATCH 14/20] mm: Allocate kcompactd on its node
From: Vlastimil Babka @ 2024-07-31 15:07 UTC
  To: Frederic Weisbecker, LKML
  Cc: Michal Hocko, linux-mm, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> kcompactd runs preferably on a specific node. Allocate its task
> structure accordingly for better memory locality.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

could even squash it to previous patch since you touch that line anyway and
it should be a no-brainer


* Re: [RFC PATCH 15/20] mm: Make kswapd use kthread's preferred affinity
From: Vlastimil Babka @ 2024-07-31 15:08 UTC
  To: Frederic Weisbecker, LKML
  Cc: Michal Hocko, linux-mm, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> Now that kthreads have an infrastructure to handle preferred affinity
> against CPU hotplug and the housekeeping cpumask, convert kswapd to use
> it instead of handling all the constraints halfway by itself.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [RFC PATCH 16/20] mm: Allocate kswapd on its node
From: Vlastimil Babka @ 2024-07-31 15:08 UTC
  To: Frederic Weisbecker, LKML
  Cc: Michal Hocko, linux-mm, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> kswapd runs preferably on a specific node. Allocate its task
> structure accordingly for better memory locality.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

also squashable to the previous one


* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-08-05 14:28 UTC
  To: Vlastimil Babka
  Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On Tue, Jul 30, 2024 at 05:49:51PM +0200, Vlastimil Babka wrote:
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Nit:
> 
> > +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
> > +{
> > +	struct kthread *kthread = to_kthread(p);
> > +	cpumask_var_t affinity;
> > +	unsigned long flags;
> > +	int ret;
> > +
> > +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
> > +		WARN_ON(1);
> > +		return -EINVAL;
> > +	}
> > +
> 
> Should we also fail if kthread->preferred_affinity already exists? In
> case somebody calls this twice.

Good point!

> 
> Also, for some of the use cases (kswapd, kcompactd) it would make sense
> to be able to add the cpus of a node as they are onlined, which it seems
> we didn't do, except for some corner case handling in kcompactd, but
> maybe we should? I wonder if the current implementation of onlining a
> completely new node with cpus does the right thing as a result of the
> individual onlining operations, or whether we end up affined to a single
> cpu (or none).
> 
> But that would need some kind of kthread_affine_preferred_update()
> implementation?

So you mean that the "for_each_node_state()" loop in kcompactd doesn't
handle all possible nodes but only those online when it's called? Or
am I confused?

If all users of preferred affinity were to use NUMA nodes, it could be
a good idea to do a flavour of kernel/smpboot.c that would handle
per-node kthreads instead of per-cpu kthreads. I initially thought
about that. It would have handled the whole lifecycle of those kthreads,
including creation, against hotplug. Unfortunately RCU doesn't rely on
per-NUMA nodes but rather uses its own tree.

If there were more users of real per-NUMA-node kthreads than kswapd and
kcompactd, it would of course be well worth considering.

Thanks.



* Re: [RFC PATCH 14/20] mm: Allocate kcompactd on its node
From: Frederic Weisbecker @ 2024-08-05 14:30 UTC
  To: Vlastimil Babka
  Cc: LKML, Michal Hocko, linux-mm, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

On Wed, Jul 31, 2024 at 05:07:48PM +0200, Vlastimil Babka wrote:
> On 7/26/24 11:56 PM, Frederic Weisbecker wrote:
> > kcompactd runs preferably on a specific node. Allocate its task
> > structure accordingly for better memory locality.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> could even squash it to previous patch since you touch that line anyway and
> it should be a no-brainer

It looks like a no-brainer indeed, but I'm very careful about
introducing subtle logical changes ;-)

Thanks.



* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Vlastimil Babka @ 2024-08-05 14:53 UTC
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On 8/5/24 16:28, Frederic Weisbecker wrote:
> On Tue, Jul 30, 2024 at 05:49:51PM +0200, Vlastimil Babka wrote:
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> 
>> Nit:
>> 
>> > +int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
>> > +{
>> > +	struct kthread *kthread = to_kthread(p);
>> > +	cpumask_var_t affinity;
>> > +	unsigned long flags;
>> > +	int ret;
>> > +
>> > +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
>> > +		WARN_ON(1);
>> > +		return -EINVAL;
>> > +	}
>> > +
>> 
>> Should we also fail if kthread->preferred_affinity already exists? In
>> case somebody calls this twice.
> 
> Good point!
> 
>> 
>> Also, for some of the use cases (kswapd, kcompactd) it would make sense
>> to be able to add the cpus of a node as they are onlined, which it seems
>> we didn't do, except for some corner case handling in kcompactd, but
>> maybe we should? I wonder if the current implementation of onlining a
>> completely new node with cpus does the right thing as a result of the
>> individual onlining operations, or whether we end up affined to a single
>> cpu (or none).
>> 
>> But that would need some kind of kthread_affine_preferred_update()
>> implementation?
> 
> So you mean that the "for_each_node_state()" loop in kcompactd doesn't
> handle all possible nodes but only those online when it's called? Or
> am I confused?

If you mean the loop in kcompactd_init() then indeed, but we also have a
hook in online_pages() to start new threads on newly onlined nodes, so
that's not a problem.

The problem (I think) I see is that cpumask_of_node(pgdat->node_id) is a
snapshot of the cpus running on the NUMA node at the time, and is never
updated later as new cpus might be brought up.

kcompactd_cpu_online() does try to update that when cpus are onlined (in
a clumsy way); there was nothing like that for kswapd, and after your
series this update is also removed for kcompactd.

> If all users of preferred affinity were to use NUMA nodes, it could be
> a good idea to do a flavour of kernel/smpboot.c that would handle
> per-node kthreads instead of per-cpu kthreads. I initially thought
> about that. It would have handled the whole lifecycle of those kthreads,
> including creation, against hotplug. Unfortunately RCU doesn't rely on
> per-NUMA nodes but rather uses its own tree.
> 
> If there were more users of real per-NUMA-node kthreads than kswapd and
> kcompactd, it would of course be well worth considering.

Yeah, it's not that compelling, but a way to update the preferred
affinity mask in response to cpu hotplug events, which kswapd and
kcompactd could use, would be sufficient. And maybe more widely useful.

I guess there could be a callback defined for a kthread to provide a new
preferred_affinity, that you'd call from kthreads_hotplug_update()?
And kcompactd and kswapd could both use the same callback that interprets
kthread_data() as pgdat and fetches a new cpumask from it?
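
For instance (sketch only, all names made up):

	/*
	 * Callback a kthread could register, returning its current
	 * preferred mask; kthreads_hotplug_update() would invoke it as
	 * k->pref_mask_fn(kthread_data(k->task)) before reapplying.
	 */
	typedef const struct cpumask *(*kthread_pref_mask_fn_t)(void *data);

	/* Shared by kswapd and kcompactd: their kthread_data() is the pgdat */
	static const struct cpumask *node_kthread_pref_mask(void *data)
	{
		pg_data_t *pgdat = data;

		return cpumask_of_node(pgdat->node_id);
	}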


* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-08-05 16:23 UTC
  To: Vlastimil Babka
  Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On Mon, Aug 05, 2024 at 04:53:51PM +0200, Vlastimil Babka wrote:
> If you mean the loop in kcompactd_init() then indeed, but we also have a
> hook in online_pages() to start new threads on newly onlined nodes, so
> that's not a problem.
> 
> The problem (I think) I see is that cpumask_of_node(pgdat->node_id) is a
> snapshot of the cpus running on the NUMA node at the time, and is never
> updated later as new cpus might be brought up.

Oh I see now...

> 
> kcompactd_cpu_online() does try to update that when cpus are onlined (in
> a clumsy way); there was nothing like that for kswapd, and after your
> series this update is also removed for kcompactd.

Ok...

> 
> > If all users of preferred affinity were to use NUMA nodes, it could be
> > a good idea to do a flavour of kernel/smpboot.c that would handle
> > per-node kthreads instead of per-cpu kthreads. I initially thought
> > about that. It would have handled the whole lifecycle of those kthreads,
> > including creation, against hotplug. Unfortunately RCU doesn't rely on
> > per-NUMA nodes but rather uses its own tree.
> > 
> > If there were more users of real per-NUMA-node kthreads than kswapd and
> > kcompactd, it would of course be well worth considering.
> 
> Yeah, it's not that compelling, but a way to update the preferred
> affinity mask in response to cpu hotplug events, which kswapd and
> kcompactd could use, would be sufficient. And maybe more widely useful.
> 
> I guess there could be a callback defined for a kthread to provide a new
> preferred_affinity, that you'd call from kthreads_hotplug_update()?
> And kcompactd and kswapd could both use the same callback that interprets
> kthread_data() as pgdat and fetches a new cpumask from it?

It's too bad we don't have a way to have a cpumask_possible_of_node(). I've
looked into the guts of numa but that doesn't look easy to do.

Or there could be kthread_set_preferred_node()... ?

Thanks.


* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Vlastimil Babka @ 2024-08-05 21:25 UTC
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On 8/5/24 18:23, Frederic Weisbecker wrote:
> On Mon, Aug 05, 2024 at 04:53:51PM +0200, Vlastimil Babka wrote:
>> If you mean the loop in kcompactd_init() then indeed, but we also have a
>> hook in online_pages() to start new threads on newly onlined nodes, so
>> that's not a problem.
>> 
>> The problem (I think) I see is that cpumask_of_node(pgdat->node_id) is a
>> snapshot of the cpus running on the NUMA node at the time, and is never
>> updated later as new cpus might be brought up.
> 
> Oh I see now...
> 
>> 
>> kcompactd_cpu_online() does try to update that when cpus are onlined (in
>> a clumsy way); there was nothing like that for kswapd, and after your
>> series this update is also removed for kcompactd.
> 
> Ok...
> 
>> 
>> > If all users of preferred affinity were to use NUMA nodes, it could be
>> > a good idea to do a flavour of kernel/smpboot.c that would handle
>> > per-node kthreads instead of per-cpu kthreads. I initially thought
>> > about that. It would have handled the whole lifecycle of those kthreads,
>> > including creation, against hotplug. Unfortunately RCU doesn't rely on
>> > per-NUMA nodes but rather uses its own tree.
>> > 
>> > If there were more users of real per-NUMA-node kthreads than kswapd and
>> > kcompactd, it would of course be well worth considering.
>> 
>> Yeah, it's not that compelling, but a way to update the preferred
>> affinity mask in response to cpu hotplug events, which kswapd and
>> kcompactd could use, would be sufficient. And maybe more widely useful.
>> 
>> I guess there could be a callback defined for a kthread to provide a new
>> preferred_affinity, that you'd call from kthreads_hotplug_update()?
>> And kcompactd and kswapd could both use the same callback that interprets
>> kthread_data() as pgdat and fetches a new cpumask from it?
> 
> It's too bad we don't have a way to have a cpumask_possible_of_node(). I've
> looked into the guts of numa but that doesn't look easy to do.

That was my impression as well. Maybe not even possible because exact cpu
ids might not be pre-determined like this?

> Or there could be kthread_set_preferred_node()... ?

Possible instead of the callback idea suggested above?
kthreads_hotplug_update() could check if this is set and construct the mask
accordingly.


* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-08-05 23:59 UTC
  To: Vlastimil Babka
  Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On Mon, Aug 05, 2024 at 11:25:59PM +0200, Vlastimil Babka wrote:
> > It's too bad we don't have a way to have a cpumask_possible_of_node(). I've
> > looked into the guts of numa but that doesn't look easy to do.
> 
> That was my impression as well. Maybe not even possible because exact cpu
> ids might not be pre-determined like this?

Probably.

> 
> > Or there could be kthread_set_preferred_node()... ?
> 
> Possible instead of the callback idea suggested above?
> kthreads_hotplug_update() could check if this is set and construct the mask
> accordingly.

Or even better: callers of kthread_create_on_node() with an actual node
passed (!NUMA_NO_NODE) can get a preferred affinity to the corresponding
node by default, unless told otherwise (that is, unless kthread_bind() or
kthread_affine_preferred() has been called before the first wake-up, and
that includes kthread_create_on_cpu()).

There are a few callers concerned: kswapd, kcompactd, and some drivers:
drivers/block/mtip32xx/mtip32xx.c, drivers/firmware/stratix10-svc.c,
kernel/dma/map_benchmark.c, net/sunrpc/svc.c.

After all, kthread_create_on_cpu() affines to the corresponding CPU. So
it sounds natural that kthread_create_on_node() affines to the
corresponding node.

And then it's handled on hotplug just as a special case of preferred affinity.

Or is there something that wouldn't make that work?
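
Roughly like this (sketch only, assuming a new "preferred_node" field in
struct kthread):

	/* At creation: remember the node passed to kthread_create_on_node() */
	kthread->preferred_node = node;

	/*
	 * In kthread_fetch_affinity(): fall back to that node's cpumask
	 * when no explicit preferred affinity has been set.
	 */
	if (k->preferred_affinity)
		cpumask_and(mask, k->preferred_affinity, &kthread_online_mask);
	else if (k->preferred_node != NUMA_NO_NODE)
		cpumask_and(mask, cpumask_of_node(k->preferred_node),
			    &kthread_online_mask);
	cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
	if (cpumask_empty(mask))
		cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));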

Thanks.



* Re: [RFC PATCH 12/20] kthread: Implement preferred affinity
From: Vlastimil Babka @ 2024-08-06 11:08 UTC
  Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Michal Hocko, linux-mm, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, rcu

On 8/6/24 01:59, Frederic Weisbecker wrote:
> On Mon, Aug 05, 2024 at 11:25:59PM +0200, Vlastimil Babka wrote:
>> > It's too bad we don't have a way to have a cpumask_possible_of_node(). I've
>> > looked into the guts of numa but that doesn't look easy to do.
>> 
>> That was my impression as well. Maybe not even possible because exact cpu
>> ids might not be pre-determined like this?
> 
> Probably.
> 
>> 
>> > Or there could be kthread_set_preferred_node()... ?
>> 
>> Possible instead of the callback idea suggested above?
>> kthreads_hotplug_update() could check if this is set and construct the mask
>> accordingly.
> 
> Or even better: callers of kthread_create_on_node() with an actual node
> passed (!NUMA_NO_NODE) can get a preferred affinity to the corresponding
> node by default, unless told otherwise (that is, unless kthread_bind() or
> kthread_affine_preferred() has been called before the first wake-up, and
> that includes kthread_create_on_cpu()).

Sounds logical and great!

> There are a few callers concerned: kswapd, kcompactd, and some drivers:
> drivers/block/mtip32xx/mtip32xx.c, drivers/firmware/stratix10-svc.c,
> kernel/dma/map_benchmark.c, net/sunrpc/svc.c.
> 
> After all, kthread_create_on_cpu() affines to the corresponding CPU. So
> it sounds natural that kthread_create_on_node() affines to the
> corresponding node.

Yes.

> And then it's handled on hotplug just as a special case of preferred affinity.
> 
> Or is there something that wouldn't make that work?

Hopefully not.
