public inbox for linux-kernel@vger.kernel.org
* [PATCH 05/20] scsi: bnx2i: Use kthread_create_on_cpu()
  2024-07-26 21:56 [PATCH 00/20] kthread: Introduce preferred affinity Frederic Weisbecker
@ 2024-07-26 21:56 ` Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-07-26 21:56 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

However it looks like the bnx2i_percpu_io_thread() kthread could be
replaced by a high prio workqueue instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/scsi/bnx2i/bnx2i_init.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
index 872ad37e2a6e..cecc3a026762 100644
--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -415,14 +415,11 @@ static int bnx2i_cpu_online(unsigned int cpu)
 
 	p = &per_cpu(bnx2i_percpu, cpu);
 
-	thread = kthread_create_on_node(bnx2i_percpu_io_thread, (void *)p,
-					cpu_to_node(cpu),
-					"bnx2i_thread/%d", cpu);
+	thread = kthread_create_on_cpu(bnx2i_percpu_io_thread, (void *)p,
+				       cpu, "bnx2i_thread/%d");
 	if (IS_ERR(thread))
 		return PTR_ERR(thread);
 
-	/* bind thread to the cpu */
-	kthread_bind(thread, cpu);
 	p->iothread = thread;
 	wake_up_process(thread);
 	return 0;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 00/20] kthread: Introduce preferred affinity v4
@ 2024-09-26 22:48 Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 01/20] arm/bL_switcher: Use kthread_run_on_cpu() Frederic Weisbecker
                   ` (19 more replies)
  0 siblings, 20 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Vlastimil Babka

Kthread affinity currently follows one of 4 existing patterns:

1) Per-CPU kthreads must stay affine to a single CPU and never execute
   relevant code on any other CPU. This is currently handled by smpboot
   code which takes care of CPU-hotplug operations.

2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
   run anywhere else. The affinity is set through kthread_bind_mask()
   and the subsystem handles CPU-hotplug operations by itself.

3) Kthreads that _prefer_ to be affine to a specific NUMA node.

4) Similar to the previous point, but the kthreads have a _preferred_
   affinity different from a node. It is set manually like for any other
   task and CPU-hotplug is supposed to be handled by the relevant
   subsystem so that the task is properly reaffined whenever a given CPU
   from the preferred affinity comes up or down. Care must also be taken
   so that the preferred affinity doesn't cross housekeeping cpumask
   boundaries.
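For illustration, the open-coded sequence this series consolidates ("create,
then bind, then wake") has a rough userspace analogue. This sketch uses POSIX
thread affinity, not the kernel API; all names here are invented for the
example:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Userspace analogue of "create + bind + wake": the creator pins the
 * thread's affinity before its payload runs, much like kthread_bind()
 * must happen before wake_up_process(). Illustrative only. */
static void *payload(void *arg)
{
	/* The payload records the CPU it ended up bound to. */
	*(int *)arg = sched_getcpu();
	return NULL;
}

int run_bound(int cpu)
{
	pthread_t t;
	pthread_attr_t attr;
	cpu_set_t set;
	int seen = -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_attr_init(&attr);
	/* Bind before the thread starts running. */
	pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (pthread_create(&t, &attr, payload, &seen)) {
		pthread_attr_destroy(&attr);
		return -1;
	}
	pthread_join(t, NULL);
	pthread_attr_destroy(&attr);
	return seen;
}
```

The point of kthread_run_on_cpu() and friends is that this bind-before-run
dance lives in one place instead of being repeated by every caller.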

Currently the preferred affinity patterns (3 and 4) have at least 4
identified users, with varying degrees of success when it comes to
handling CPU-hotplug operations and CPU isolation.

This is an infrastructure proposal to handle this (after cleanups from
patches 01 to 10).

Changes since v3:

_ Handle CPU down from scheduler fallback (new patch 11/20), suggested
  by Michal Hocko
_ Simplify 13/20 and 16/20 accordingly
_ Add acks

Frederic Weisbecker (20):
  arm/bL_switcher: Use kthread_run_on_cpu()
  x86/resctrl: Use kthread_run_on_cpu()
  firmware: stratix10-svc: Use kthread_run_on_cpu()
  scsi: bnx2fc: Use kthread_create_on_cpu()
  scsi: bnx2i: Use kthread_create_on_cpu()
  scsi: qedi: Use kthread_create_on_cpu()
  soc/qman: test: Use kthread_run_on_cpu()
  kallsyms: Use kthread_run_on_cpu()
  lib: test_objpool: Use kthread_run_on_cpu()
  net: pktgen: Use kthread_create_on_node()
  sched: Handle CPU isolation on last resort fallback rq selection
  kthread: Make sure kthread hasn't started while binding it
  kthread: Default affine kthread to its preferred NUMA node
  mm: Create/affine kcompactd to its preferred node
  mm: Create/affine kswapd to its preferred node
  kthread: Implement preferred affinity
  rcu: Use kthread preferred affinity for RCU boost
  kthread: Unify kthread_create_on_cpu() and
    kthread_create_worker_on_cpu() automatic format
  treewide: Introduce kthread_run_worker[_on_cpu]()
  rcu: Use kthread preferred affinity for RCU exp kworkers

 arch/arm/common/bL_switcher.c                 |  10 +-
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c     |  28 +--
 arch/x86/kvm/i8254.c                          |   2 +-
 crypto/crypto_engine.c                        |   2 +-
 drivers/cpufreq/cppc_cpufreq.c                |   2 +-
 drivers/firmware/stratix10-svc.c              |   9 +-
 drivers/gpu/drm/drm_vblank_work.c             |   2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |   2 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/gt/selftest_slpc.c       |   2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |   8 +-
 drivers/gpu/drm/msm/disp/msm_disp_snapshot.c  |   2 +-
 drivers/gpu/drm/msm/msm_atomic.c              |   2 +-
 drivers/gpu/drm/msm/msm_gpu.c                 |   2 +-
 drivers/gpu/drm/msm/msm_kms.c                 |   2 +-
 .../platform/chips-media/wave5/wave5-vpu.c    |   2 +-
 drivers/net/dsa/mv88e6xxx/chip.c              |   2 +-
 drivers/net/ethernet/intel/ice/ice_dpll.c     |   2 +-
 drivers/net/ethernet/intel/ice/ice_gnss.c     |   2 +-
 drivers/net/ethernet/intel/ice/ice_ptp.c      |   2 +-
 drivers/platform/chrome/cros_ec_spi.c         |   2 +-
 drivers/ptp/ptp_clock.c                       |   2 +-
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c             |   7 +-
 drivers/scsi/bnx2i/bnx2i_init.c               |   7 +-
 drivers/scsi/qedi/qedi_main.c                 |   6 +-
 drivers/soc/fsl/qbman/qman_test_stash.c       |   6 +-
 drivers/spi/spi.c                             |   2 +-
 drivers/usb/typec/tcpm/tcpm.c                 |   2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c              |   2 +-
 drivers/watchdog/watchdog_dev.c               |   2 +-
 fs/erofs/zdata.c                              |   2 +-
 include/linux/cpuhotplug.h                    |   1 +
 include/linux/kthread.h                       |  56 ++++-
 kernel/kallsyms_selftest.c                    |   4 +-
 kernel/kthread.c                              | 201 ++++++++++++++++--
 kernel/rcu/tree.c                             |  94 ++------
 kernel/rcu/tree_plugin.h                      |  11 +-
 kernel/sched/core.c                           |  17 +-
 kernel/sched/ext.c                            |   2 +-
 kernel/workqueue.c                            |   2 +-
 lib/test_objpool.c                            |  19 +-
 mm/compaction.c                               |  43 +---
 mm/vmscan.c                                   |   8 +-
 net/core/pktgen.c                             |   7 +-
 net/dsa/tag_ksz.c                             |   2 +-
 net/dsa/tag_ocelot_8021q.c                    |   2 +-
 net/dsa/tag_sja1105.c                         |   2 +-
 48 files changed, 339 insertions(+), 261 deletions(-)

-- 
2.46.0



* [PATCH 01/20] arm/bL_switcher: Use kthread_run_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 02/20] x86/resctrl: " Frederic Weisbecker
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Russell King, linux-arm-kernel,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Dave Martin,
	Nicolas Pitre

Use the proper API instead of open coding it.

Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm/common/bL_switcher.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c
index 9a9aa53547a6..d1e82a318e3b 100644
--- a/arch/arm/common/bL_switcher.c
+++ b/arch/arm/common/bL_switcher.c
@@ -307,13 +307,11 @@ static struct task_struct *bL_switcher_thread_create(int cpu, void *arg)
 {
 	struct task_struct *task;
 
-	task = kthread_create_on_node(bL_switcher_thread, arg,
-				      cpu_to_node(cpu), "kswitcher_%d", cpu);
-	if (!IS_ERR(task)) {
-		kthread_bind(task, cpu);
-		wake_up_process(task);
-	} else
+	task = kthread_run_on_cpu(bL_switcher_thread, arg,
+				  cpu, "kswitcher_%d");
+	if (IS_ERR(task))
 		pr_err("%s failed for CPU %d\n", __func__, cpu);
+
 	return task;
 }
 
-- 
2.46.0



* [PATCH 02/20] x86/resctrl: Use kthread_run_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 01/20] arm/bL_switcher: Use kthread_run_on_cpu() Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 03/20] firmware: stratix10-svc: " Frederic Weisbecker
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Fenghua Yu, Reinette Chatre, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner

Use the proper API instead of open coding it.

Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 28 +++++++----------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index e69489d48625..ae1f0c28eee6 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1205,20 +1205,14 @@ static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp, int sel)
 	plr->cpu = cpu;
 
 	if (sel == 1)
-		thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
-						cpu_to_node(cpu),
-						"pseudo_lock_measure/%u",
-						cpu);
+		thread = kthread_run_on_cpu(measure_cycles_lat_fn, plr,
+					    cpu, "pseudo_lock_measure/%u");
 	else if (sel == 2)
-		thread = kthread_create_on_node(measure_l2_residency, plr,
-						cpu_to_node(cpu),
-						"pseudo_lock_measure/%u",
-						cpu);
+		thread = kthread_run_on_cpu(measure_l2_residency, plr,
+					    cpu, "pseudo_lock_measure/%u");
 	else if (sel == 3)
-		thread = kthread_create_on_node(measure_l3_residency, plr,
-						cpu_to_node(cpu),
-						"pseudo_lock_measure/%u",
-						cpu);
+		thread = kthread_run_on_cpu(measure_l3_residency, plr,
+					    cpu, "pseudo_lock_measure/%u");
 	else
 		goto out;
 
@@ -1226,8 +1220,6 @@ static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp, int sel)
 		ret = PTR_ERR(thread);
 		goto out;
 	}
-	kthread_bind(thread, cpu);
-	wake_up_process(thread);
 
 	ret = wait_event_interruptible(plr->lock_thread_wq,
 				       plr->thread_done == 1);
@@ -1315,18 +1307,14 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 
 	plr->thread_done = 0;
 
-	thread = kthread_create_on_node(pseudo_lock_fn, rdtgrp,
-					cpu_to_node(plr->cpu),
-					"pseudo_lock/%u", plr->cpu);
+	thread = kthread_run_on_cpu(pseudo_lock_fn, rdtgrp,
+				    plr->cpu, "pseudo_lock/%u");
 	if (IS_ERR(thread)) {
 		ret = PTR_ERR(thread);
 		rdt_last_cmd_printf("Locking thread returned error %d\n", ret);
 		goto out_cstates;
 	}
 
-	kthread_bind(thread, plr->cpu);
-	wake_up_process(thread);
-
 	ret = wait_event_interruptible(plr->lock_thread_wq,
 				       plr->thread_done == 1);
 	if (ret < 0) {
-- 
2.46.0



* [PATCH 03/20] firmware: stratix10-svc: Use kthread_run_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 01/20] arm/bL_switcher: Use kthread_run_on_cpu() Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 02/20] x86/resctrl: " Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 04/20] scsi: bnx2fc: Use kthread_create_on_cpu() Frederic Weisbecker
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Dinh Nguyen, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/firmware/stratix10-svc.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/stratix10-svc.c b/drivers/firmware/stratix10-svc.c
index 528f37417aea..4cf5bd5647a4 100644
--- a/drivers/firmware/stratix10-svc.c
+++ b/drivers/firmware/stratix10-svc.c
@@ -967,18 +967,15 @@ int stratix10_svc_send(struct stratix10_svc_chan *chan, void *msg)
 	/* first client will create kernel thread */
 	if (!chan->ctrl->task) {
 		chan->ctrl->task =
-			kthread_create_on_node(svc_normal_to_secure_thread,
-					      (void *)chan->ctrl,
-					      cpu_to_node(cpu),
-					      "svc_smc_hvc_thread");
+			kthread_run_on_cpu(svc_normal_to_secure_thread,
+					   (void *)chan->ctrl,
+					   cpu, "svc_smc_hvc_thread");
 			if (IS_ERR(chan->ctrl->task)) {
 				dev_err(chan->ctrl->dev,
 					"failed to create svc_smc_hvc_thread\n");
 				kfree(p_data);
 				return -EINVAL;
 			}
-		kthread_bind(chan->ctrl->task, cpu);
-		wake_up_process(chan->ctrl->task);
 	}
 
 	pr_debug("%s: sent P-va=%p, P-com=%x, P-size=%u\n", __func__,
-- 
2.46.0



* [PATCH 04/20] scsi: bnx2fc: Use kthread_create_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 03/20] firmware: stratix10-svc: " Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 05/20] scsi: bnx2i: " Frederic Weisbecker
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Saurav Kashyap, Javed Hasan,
	GR-QLogic-Storage-Upstream, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

However it looks like the bnx2fc_percpu_io_thread() kthread could be
replaced by a high prio workqueue instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index f49783b89d04..36126030e76d 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -2610,14 +2610,11 @@ static int bnx2fc_cpu_online(unsigned int cpu)
 
 	p = &per_cpu(bnx2fc_percpu, cpu);
 
-	thread = kthread_create_on_node(bnx2fc_percpu_io_thread,
-					(void *)p, cpu_to_node(cpu),
-					"bnx2fc_thread/%d", cpu);
+	thread = kthread_create_on_cpu(bnx2fc_percpu_io_thread,
+				       (void *)p, cpu, "bnx2fc_thread/%d");
 	if (IS_ERR(thread))
 		return PTR_ERR(thread);
 
-	/* bind thread to the cpu */
-	kthread_bind(thread, cpu);
 	p->iothread = thread;
 	wake_up_process(thread);
 	return 0;
-- 
2.46.0



* [PATCH 05/20] scsi: bnx2i: Use kthread_create_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 04/20] scsi: bnx2fc: Use kthread_create_on_cpu() Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 06/20] scsi: qedi: " Frederic Weisbecker
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

However it looks like the bnx2i_percpu_io_thread() kthread could be
replaced by a high prio workqueue instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/scsi/bnx2i/bnx2i_init.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
index 872ad37e2a6e..cecc3a026762 100644
--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -415,14 +415,11 @@ static int bnx2i_cpu_online(unsigned int cpu)
 
 	p = &per_cpu(bnx2i_percpu, cpu);
 
-	thread = kthread_create_on_node(bnx2i_percpu_io_thread, (void *)p,
-					cpu_to_node(cpu),
-					"bnx2i_thread/%d", cpu);
+	thread = kthread_create_on_cpu(bnx2i_percpu_io_thread, (void *)p,
+				       cpu, "bnx2i_thread/%d");
 	if (IS_ERR(thread))
 		return PTR_ERR(thread);
 
-	/* bind thread to the cpu */
-	kthread_bind(thread, cpu);
 	p->iothread = thread;
 	wake_up_process(thread);
 	return 0;
-- 
2.46.0



* [PATCH 06/20] scsi: qedi: Use kthread_create_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 05/20] scsi: bnx2i: " Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 07/20] soc/qman: test: Use kthread_run_on_cpu() Frederic Weisbecker
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

However it looks like the qedi_percpu_io_thread() kthread could be
replaced by a high prio workqueue instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/scsi/qedi/qedi_main.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index c5aec26019d6..4b2a9cd811c4 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1960,13 +1960,11 @@ static int qedi_cpu_online(unsigned int cpu)
 	struct qedi_percpu_s *p = this_cpu_ptr(&qedi_percpu);
 	struct task_struct *thread;
 
-	thread = kthread_create_on_node(qedi_percpu_io_thread, (void *)p,
-					cpu_to_node(cpu),
-					"qedi_thread/%d", cpu);
+	thread = kthread_create_on_cpu(qedi_percpu_io_thread, (void *)p,
+				       cpu, "qedi_thread/%d");
 	if (IS_ERR(thread))
 		return PTR_ERR(thread);
 
-	kthread_bind(thread, cpu);
 	p->iothread = thread;
 	wake_up_process(thread);
 	return 0;
-- 
2.46.0



* [PATCH 07/20] soc/qman: test: Use kthread_run_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 06/20] scsi: qedi: " Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 08/20] kallsyms: " Frederic Weisbecker
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, linuxppc-dev, linux-arm-kernel,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner

Use the proper API instead of open coding it.

However it looks like the kthreads here could be replaced by a per-cpu
workqueue instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/soc/fsl/qbman/qman_test_stash.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman_test_stash.c b/drivers/soc/fsl/qbman/qman_test_stash.c
index b7e8e5ec884c..f4d3c2146f4f 100644
--- a/drivers/soc/fsl/qbman/qman_test_stash.c
+++ b/drivers/soc/fsl/qbman/qman_test_stash.c
@@ -108,14 +108,12 @@ static int on_all_cpus(int (*fn)(void))
 			.fn = fn,
 			.started = ATOMIC_INIT(0)
 		};
-		struct task_struct *k = kthread_create(bstrap_fn, &bstrap,
-			"hotpotato%d", cpu);
+		struct task_struct *k = kthread_run_on_cpu(bstrap_fn, &bstrap,
+							   cpu, "hotpotato%d");
 		int ret;
 
 		if (IS_ERR(k))
 			return -ENOMEM;
-		kthread_bind(k, cpu);
-		wake_up_process(k);
 		/*
 		 * If we call kthread_stop() before the "wake up" has had an
 		 * effect, then the thread may exit with -EINTR without ever
-- 
2.46.0



* [PATCH 08/20] kallsyms: Use kthread_run_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 07/20] soc/qman: test: Use kthread_run_on_cpu() Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 09/20] lib: test_objpool: " Frederic Weisbecker
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Kees Cook, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/kallsyms_selftest.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
index 873f7c445488..cf4af5728307 100644
--- a/kernel/kallsyms_selftest.c
+++ b/kernel/kallsyms_selftest.c
@@ -435,13 +435,11 @@ static int __init kallsyms_test_init(void)
 {
 	struct task_struct *t;
 
-	t = kthread_create(test_entry, NULL, "kallsyms_test");
+	t = kthread_run_on_cpu(test_entry, NULL, 0, "kallsyms_test");
 	if (IS_ERR(t)) {
 		pr_info("Create kallsyms selftest task failed\n");
 		return PTR_ERR(t);
 	}
-	kthread_bind(t, 0);
-	wake_up_process(t);
 
 	return 0;
 }
-- 
2.46.0



* [PATCH 09/20] lib: test_objpool: Use kthread_run_on_cpu()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 08/20] kallsyms: " Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-26 22:48 ` [PATCH 10/20] net: pktgen: Use kthread_create_on_node() Frederic Weisbecker
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Matt Wu, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 lib/test_objpool.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/lib/test_objpool.c b/lib/test_objpool.c
index 5a3f6961a70f..896c0131c9a8 100644
--- a/lib/test_objpool.c
+++ b/lib/test_objpool.c
@@ -371,14 +371,10 @@ static int ot_start_sync(struct ot_test *test)
 		if (!cpu_online(cpu))
 			continue;
 
-		work = kthread_create_on_node(ot_thread_worker, item,
-				cpu_to_node(cpu), "ot_worker_%d", cpu);
-		if (IS_ERR(work)) {
+		work = kthread_run_on_cpu(ot_thread_worker, item,
+					  cpu, "ot_worker_%d");
+		if (IS_ERR(work))
 			pr_err("failed to create thread for cpu %d\n", cpu);
-		} else {
-			kthread_bind(work, cpu);
-			wake_up_process(work);
-		}
 	}
 
 	/* wait a while to make sure all threads waiting at start line */
@@ -562,14 +558,9 @@ static int ot_start_async(struct ot_test *test)
 		if (!cpu_online(cpu))
 			continue;
 
-		work = kthread_create_on_node(ot_thread_worker, item,
-				cpu_to_node(cpu), "ot_worker_%d", cpu);
-		if (IS_ERR(work)) {
+		work = kthread_run_on_cpu(ot_thread_worker, item, cpu, "ot_worker_%d");
+		if (IS_ERR(work))
 			pr_err("failed to create thread for cpu %d\n", cpu);
-		} else {
-			kthread_bind(work, cpu);
-			wake_up_process(work);
-		}
 	}
 
 	/* wait a while to make sure all threads waiting at start line */
-- 
2.46.0



* [PATCH 10/20] net: pktgen: Use kthread_create_on_node()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 09/20] lib: test_objpool: " Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-27  7:58   ` Eric Dumazet
  2024-09-30 17:19   ` Vishal Chourasia
  2024-09-26 22:48 ` [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
                   ` (9 subsequent siblings)
  19 siblings, 2 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

Use the proper API instead of open coding it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 net/core/pktgen.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 34f68ef74b8f..7fcb4fc7a5d6 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3883,17 +3883,14 @@ static int __net_init pktgen_create_thread(int cpu, struct pktgen_net *pn)
 	list_add_tail(&t->th_list, &pn->pktgen_threads);
 	init_completion(&t->start_done);
 
-	p = kthread_create_on_node(pktgen_thread_worker,
-				   t,
-				   cpu_to_node(cpu),
-				   "kpktgend_%d", cpu);
+	p = kthread_create_on_cpu(pktgen_thread_worker, t, cpu, "kpktgend_%d");
 	if (IS_ERR(p)) {
 		pr_err("kthread_create_on_node() failed for cpu %d\n", t->cpu);
 		list_del(&t->th_list);
 		kfree(t);
 		return PTR_ERR(p);
 	}
-	kthread_bind(p, cpu);
+
 	t->tsk = p;
 
 	pe = proc_create_data(t->tsk->comm, 0600, pn->proc_dir,
-- 
2.46.0



* [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 10/20] net: pktgen: Use kthread_create_on_node() Frederic Weisbecker
@ 2024-09-26 22:48 ` Frederic Weisbecker
  2024-09-27  7:26   ` Michal Hocko
  2024-10-08 10:54   ` Will Deacon
  2024-09-26 22:49 ` [PATCH 12/20] kthread: Make sure kthread hasn't started while binding it Frederic Weisbecker
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:48 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Will Deacon, Peter Zijlstra, Vincent Guittot,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang,
	Uladzislau Rezki, rcu, Michal Hocko

When a kthread or any other task has an affinity mask that is fully
offline or not allowed, the scheduler reaffines the task to all possible
CPUs as a last resort.

This default decision doesn't mix well with nohz_full CPUs, which are
part of the possible cpumask but don't want to be disturbed by unbound
kthreads or even detached pinned user tasks.

Make the fallback affinity setting aware of nohz_full. This applies to
all architectures supporting nohz_full except arm32. However that
architecture, which overrides the task possible mask, is unlikely to be
willing to integrate new development.
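The decision described above reduces to a small pure function. As a
hypothetical userspace model, using plain bitmasks in place of cpumasks
(all names invented for illustration, not the kernel implementation):

```c
/* Model of the last-resort fallback choice: if the architecture
 * overrides the task's possible mask, honor that override as-is
 * (the architecture must then handle CPU isolation itself);
 * otherwise restrict the fallback to housekeeping CPUs, i.e.
 * exclude nohz_full CPUs. */
unsigned long fallback_mask(unsigned long arch_possible,
			    unsigned long cpu_possible,
			    unsigned long housekeeping)
{
	if (arch_possible != cpu_possible)
		return arch_possible;
	return housekeeping;
}
```

With 4 CPUs (mask 0xF) of which CPUs 2-3 are nohz_full (housekeeping
0x3), an unrestricted task falls back to 0x3, while an arch-restricted
mask such as 0x7 is returned unchanged.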

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/sched/core.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 43e453ab7e20..d4b759c1cbf1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3421,6 +3421,21 @@ void kick_process(struct task_struct *p)
 }
 EXPORT_SYMBOL_GPL(kick_process);
 
+static const struct cpumask *task_cpu_fallback_mask(struct task_struct *p)
+{
+	const struct cpumask *mask;
+
+	mask = task_cpu_possible_mask(p);
+	/*
+	 * Architectures that override the task possible mask
+	 * must handle CPU isolation.
+	 */
+	if (mask != cpu_possible_mask)
+		return mask;
+	else
+		return housekeeping_cpumask(HK_TYPE_TICK);
+}
+
 /*
  * ->cpus_ptr is protected by both rq->lock and p->pi_lock
  *
@@ -3489,7 +3504,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
+			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
 			state = fail;
 			break;
 		case fail:
-- 
2.46.0



* [PATCH 12/20] kthread: Make sure kthread hasn't started while binding it
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2024-09-26 22:48 ` [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 13/20] kthread: Default affine kthread to its preferred NUMA node Frederic Weisbecker
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
	Zqiang, rcu, Uladzislau Rezki

Make sure the kthread is sleeping in the schedule_preempt_disabled()
call before calling its handler when kthread_bind[_mask]() is called
on it. This provides a sanity check verifying that the task is not
randomly blocked later at some point within its function handler, in
which case it could just be concurrently awakened, leaving the call to
do_set_cpus_allowed() without any effect until the next voluntary sleep.

Rely on the wake-up ordering to ensure that the newly introduced "started"
field returns the expected value:

    TASK A                                   TASK B
    ------                                   ------
READ kthread->started
wake_up_process(B)
   rq_lock()
   ...
   rq_unlock() // RELEASE
                                           schedule()
                                              rq_lock() // ACQUIRE
                                              // schedule task B
                                              rq_unlock()
                                              WRITE kthread->started

Similarly, the write to kthread->started performed before subsequent
voluntary sleeps will be visible after calling wait_task_inactive() in
__kthread_bind_mask(), reporting potential misuse of the API.

Upcoming patches will make further use of this facility.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/kthread.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index db4ceb0f503c..1527a522cdd3 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -53,6 +53,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	int started;
 	int result;
 	int (*threadfn)(void *);
 	void *data;
@@ -382,6 +383,8 @@ static int kthread(void *_create)
 	schedule_preempt_disabled();
 	preempt_enable();
 
+	self->started = 1;
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -540,7 +543,9 @@ static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int
 
 void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 
 /**
@@ -554,7 +559,9 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
  */
 void kthread_bind(struct task_struct *p, unsigned int cpu)
 {
+	struct kthread *kthread = to_kthread(p);
 	__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
+	WARN_ON_ONCE(kthread->started);
 }
 EXPORT_SYMBOL(kthread_bind);
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 13/20] kthread: Default affine kthread to its preferred NUMA node
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 12/20] kthread: Make sure kthread hasn't started while binding it Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 14/20] mm: Create/affine kcompactd to its preferred node Frederic Weisbecker
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
	Uladzislau Rezki, Zqiang, rcu

Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.

A more precise affinity is usually notified by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.

For the others, a default affinity to the node is desired and sometimes
implemented with more or less success when it comes to dealing with
hotplug events and nohz_full / CPU Isolation interactions:

- kcompactd is affine to its node and handles hotplug but not CPU Isolation
- kswapd is affine to its node and ignores hotplug and CPU Isolation
- A bunch of drivers create their kthreads on a specific node and
  don't take care of affining them any further.

Handle that default node affinity preference at the generic level
instead, provided a kthread is created on an actual node and doesn't
apply any specific affinity such as a given CPU or a custom cpumask to
bind to before its first wake-up.

This generic handling is aware of CPU hotplug events and CPU isolation
such that:

* When a housekeeping CPU goes up that is part of the node of a given
  kthread, the related task is re-affined to its own node if it was
  previously running on the default last resort online housekeeping set
  from other nodes.

* When a housekeeping CPU goes down while it was part of the node of a
  kthread, the running task is migrated (or the sleeping task is woken
  up) automatically by the scheduler to other housekeepers within the
  same node or, as a last resort, to all housekeepers from other nodes.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/cpuhotplug.h |   1 +
 kernel/kthread.c           | 106 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 2361ed4d2b15..228f27150a93 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -239,6 +239,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1527a522cdd3..736276d313c2 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -53,6 +56,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	unsigned int node;
 	int started;
 	int result;
 	int (*threadfn)(void *);
@@ -64,6 +68,8 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
 };
 
 enum KTHREAD_BITS {
@@ -122,8 +128,11 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
+	INIT_LIST_HEAD(&kthread->hotplug_node);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
+	kthread->node = tsk_fork_get_node(current);
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +323,11 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (!list_empty(&kthread->hotplug_node)) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		mutex_unlock(&kthreads_hotplug_lock);
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +353,48 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
 }
 EXPORT_SYMBOL(kthread_complete_and_exit);
 
+static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
+{
+	cpumask_and(cpumask, cpumask_of_node(kthread->node),
+		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+
+	if (cpumask_empty(cpumask))
+		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+static void kthread_affine_node(void)
+{
+	struct kthread *kthread = to_kthread(current);
+	cpumask_var_t affinity;
+
+	WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+	if (kthread->node == NUMA_NO_NODE) {
+		housekeeping_affine(current, HK_TYPE_RCU);
+	} else {
+		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+
+		mutex_lock(&kthreads_hotplug_lock);
+		WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+		list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+		/*
+		 * The node cpumask is racy when read from kthread() but:
+		 * - a racing CPU going down will either fail on the subsequent
+		 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
+		 *   afterwards by the scheduler.
+		 * - a racing CPU going up will be handled by kthreads_online_cpu()
+		 */
+		kthread_fetch_affinity(kthread, affinity);
+		set_cpus_allowed_ptr(current, affinity);
+		mutex_unlock(&kthreads_hotplug_lock);
+
+		free_cpumask_var(affinity);
+	}
+}
+
 static int kthread(void *_create)
 {
 	static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
+	if (!(current->flags & PF_NO_SETAFFINITY))
+		kthread_affine_node();
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -779,6 +837,52 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers in case the preferred affinity doesn't
+ * apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int ret;
+
+	guard(mutex)(&kthreads_hotplug_lock);
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = 0;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+				 kthread_is_per_cpu(k->task) ||
+				 k->node == NUMA_NO_NODE)) {
+			ret = -EINVAL;
+			continue;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				kthreads_online_cpu, NULL);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 14/20] mm: Create/affine kcompactd to its preferred node
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 13/20] kthread: Default affine kthread to its preferred NUMA node Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 15/20] mm: Create/affine kswapd " Frederic Weisbecker
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, Andrew Morton,
	linux-mm, Peter Zijlstra, Thomas Gleixner, Michal Hocko

Kcompactd is dedicated to a specific node. As such it wants to be
preferably affine to it, memory- and CPU-wise.

Use the proper kthread API to achieve that. As a bonus it takes care of
CPU-hotplug events and CPU-isolation on its behalf.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/compaction.c | 43 +++----------------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a2b16b08cbbf..a31c0f5758cf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3154,15 +3154,9 @@ void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx)
 static int kcompactd(void *p)
 {
 	pg_data_t *pgdat = (pg_data_t *)p;
-	struct task_struct *tsk = current;
 	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
 	long timeout = default_timeout;
 
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
-
 	set_freezable();
 
 	pgdat->kcompactd_max_order = 0;
@@ -3233,10 +3227,12 @@ void __meminit kcompactd_run(int nid)
 	if (pgdat->kcompactd)
 		return;
 
-	pgdat->kcompactd = kthread_run(kcompactd, pgdat, "kcompactd%d", nid);
+	pgdat->kcompactd = kthread_create_on_node(kcompactd, pgdat, nid, "kcompactd%d", nid);
 	if (IS_ERR(pgdat->kcompactd)) {
 		pr_err("Failed to start kcompactd on node %d\n", nid);
 		pgdat->kcompactd = NULL;
+	} else {
+		wake_up_process(pgdat->kcompactd);
 	}
 }
 
@@ -3254,30 +3250,6 @@ void __meminit kcompactd_stop(int nid)
 	}
 }
 
-/*
- * It's optimal to keep kcompactd on the same CPUs as their memory, but
- * not required for correctness. So if the last cpu in a node goes
- * away, we get changed to run anywhere: as the first one comes back,
- * restore their cpu bindings.
- */
-static int kcompactd_cpu_online(unsigned int cpu)
-{
-	int nid;
-
-	for_each_node_state(nid, N_MEMORY) {
-		pg_data_t *pgdat = NODE_DATA(nid);
-		const struct cpumask *mask;
-
-		mask = cpumask_of_node(pgdat->node_id);
-
-		if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
-			/* One of our CPUs online: restore mask */
-			if (pgdat->kcompactd)
-				set_cpus_allowed_ptr(pgdat->kcompactd, mask);
-	}
-	return 0;
-}
-
 static int proc_dointvec_minmax_warn_RT_change(const struct ctl_table *table,
 		int write, void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -3337,15 +3309,6 @@ static struct ctl_table vm_compaction[] = {
 static int __init kcompactd_init(void)
 {
 	int nid;
-	int ret;
-
-	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
-					"mm/compaction:online",
-					kcompactd_cpu_online, NULL);
-	if (ret < 0) {
-		pr_err("kcompactd: failed to register hotplug callbacks.\n");
-		return ret;
-	}
 
 	for_each_node_state(nid, N_MEMORY)
 		kcompactd_run(nid);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 15/20] mm: Create/affine kswapd to its preferred node
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 14/20] mm: Create/affine kcompactd to its preferred node Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 16/20] kthread: Implement preferred affinity Frederic Weisbecker
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, linux-mm,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko

kswapd is dedicated to a specific node. As such it wants to be
preferably affine to it, memory- and CPU-wise.

Use the proper kthread API to achieve that. As a bonus it takes care of
CPU-hotplug events and CPU-isolation on its behalf.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/vmscan.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749cdc110c74..2f2b75536d9c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7162,10 +7162,6 @@ static int kswapd(void *p)
 	unsigned int highest_zoneidx = MAX_NR_ZONES - 1;
 	pg_data_t *pgdat = (pg_data_t *)p;
 	struct task_struct *tsk = current;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(tsk, cpumask);
 
 	/*
 	 * Tell the memory management that we're a "memory allocator",
@@ -7334,13 +7330,15 @@ void __meminit kswapd_run(int nid)
 
 	pgdat_kswapd_lock(pgdat);
 	if (!pgdat->kswapd) {
-		pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+		pgdat->kswapd = kthread_create_on_node(kswapd, pgdat, nid, "kswapd%d", nid);
 		if (IS_ERR(pgdat->kswapd)) {
 			/* failure at boot is fatal */
 			pr_err("Failed to start kswapd on node %d,ret=%ld\n",
 				   nid, PTR_ERR(pgdat->kswapd));
 			BUG_ON(system_state < SYSTEM_RUNNING);
 			pgdat->kswapd = NULL;
+		} else {
+			wake_up_process(pgdat->kswapd);
 		}
 	}
 	pgdat_kswapd_unlock(pgdat);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 16/20] kthread: Implement preferred affinity
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 15/20] mm: Create/affine kswapd " Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 17/20] rcu: Use kthread preferred affinity for RCU boost Frederic Weisbecker
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
	Uladzislau Rezki, Zqiang, rcu

Affining kthreads currently follows one of four existing patterns:

1) Per-CPU kthreads must stay affine to a single CPU and never execute
   relevant code on any other CPU. This is currently handled by smpboot
   code which takes care of CPU-hotplug operations.

2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
   run anywhere else. The affinity is set through kthread_bind_mask()
   and the subsystem handles CPU-hotplug operations by itself.

3) Kthreads that prefer to be affine to a specific NUMA node. That
   preferred affinity is applied by default when an actual node ID is
   passed on kthread creation, provided the kthread is not per-CPU and
   no call to kthread_bind_mask() has been issued before the first
   wake-up.

4) Similar to the previous point but kthreads have a preferred affinity
   different from a node. It is set manually like any other task and
   CPU-hotplug is supposed to be handled by the relevant subsystem so
   that the task is properly reaffined whenever a given CPU from the
   preferred affinity comes up. Also care must be taken so that the
   preferred affinity doesn't cross housekeeping cpumask boundaries.

Provide a function to handle the last use case, mostly reusing the
current node default affinity infrastructure. kthread_affine_preferred()
is introduced, to be used just like kthread_bind_mask(), right after
kthread creation and before the first wake-up. The kthread is then
affined right away to the cpumask passed through the API if it has
online housekeeping CPUs. Otherwise it will be affined to all online
housekeeping CPUs as a last resort.

As with node affinity, it is aware of CPU hotplug events such that:

* When a housekeeping CPU goes up that is part of the preferred affinity
  of a given kthread, the related task is re-affined to that preferred
  affinity if it was previously running on the default last resort
  online housekeeping set.

* When a housekeeping CPU goes down while it was part of the preferred
  affinity of a kthread, the running task is migrated (or the sleeping
  task is woken up) automatically by the scheduler to other housekeepers
  within the preferred affinity or, as a last resort, to all
  housekeepers from other nodes.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/kthread.h |  1 +
 kernel/kthread.c        | 68 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 int kthread_stop_put(struct task_struct *k);
 bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 736276d313c2..91037533afda 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -70,6 +70,7 @@ struct kthread {
 	char *full_name;
 	struct task_struct *task;
 	struct list_head hotplug_node;
+	struct cpumask *preferred_affinity;
 };
 
 enum KTHREAD_BITS {
@@ -327,6 +328,11 @@ void __noreturn kthread_exit(long result)
 		mutex_lock(&kthreads_hotplug_lock);
 		list_del(&kthread->hotplug_node);
 		mutex_unlock(&kthreads_hotplug_lock);
+
+		if (kthread->preferred_affinity) {
+			kfree(kthread->preferred_affinity);
+			kthread->preferred_affinity = NULL;
+		}
 	}
 	do_exit(0);
 }
@@ -355,9 +361,17 @@ EXPORT_SYMBOL(kthread_complete_and_exit);
 
 static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
 {
-	cpumask_and(cpumask, cpumask_of_node(kthread->node),
-		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+	const struct cpumask *pref;
 
+	if (kthread->preferred_affinity) {
+		pref = kthread->preferred_affinity;
+	} else {
+		if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
+			return;
+		pref = cpumask_of_node(kthread->node);
+	}
+
+	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
 	if (cpumask_empty(cpumask))
 		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
 }
@@ -440,7 +454,7 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
-	if (!(current->flags & PF_NO_SETAFFINITY))
+	if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity)
 		kthread_affine_node();
 
 	ret = -EINTR;
@@ -837,12 +851,53 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+	struct kthread *kthread = to_kthread(p);
+	cpumask_var_t affinity;
+	unsigned long flags;
+	int ret;
+
+	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	WARN_ON_ONCE(kthread->preferred_affinity);
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+	if (!kthread->preferred_affinity) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_copy(kthread->preferred_affinity, mask);
+	WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	kthread_fetch_affinity(kthread, affinity);
+
+	/* It's safe because the task is inactive. */
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	do_set_cpus_allowed(p, affinity);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	mutex_unlock(&kthreads_hotplug_lock);
+	ret = 0;
+out:
+	free_cpumask_var(affinity);
+	return ret;
+}
+
 /*
  * Re-affine kthreads according to their preferences
  * and the newly online CPU. The CPU down part is handled
  * by select_fallback_rq() which default re-affines to
- * housekeepers in case the preferred affinity doesn't
- * apply anymore.
+ * housekeepers from other nodes in case the preferred
+ * affinity doesn't apply anymore.
  */
 static int kthreads_online_cpu(unsigned int cpu)
 {
@@ -862,8 +917,7 @@ static int kthreads_online_cpu(unsigned int cpu)
 
 	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
 		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
-				 kthread_is_per_cpu(k->task) ||
-				 k->node == NUMA_NO_NODE)) {
+				 kthread_is_per_cpu(k->task))) {
 			ret = -EINVAL;
 			continue;
 		}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 17/20] rcu: Use kthread preferred affinity for RCU boost
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (15 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 16/20] kthread: Implement preferred affinity Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 18/20] kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format Frederic Weisbecker
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul E. McKenney, Uladzislau Rezki,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko,
	Vlastimil Babka

Now that kthreads have an infrastructure to handle preferred affinity
against CPU hotplug and housekeeping cpumask, convert RCU boost to use
it instead of handling all the constraints by itself.

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c        | 27 +++++++++++++++++++--------
 kernel/rcu/tree_plugin.h | 11 ++---------
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a60616e69b66..c1e9f0818d51 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -149,7 +149,6 @@ static int rcu_scheduler_fully_active __read_mostly;
 
 static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
 			      unsigned long gps, unsigned long flags);
-static struct task_struct *rcu_boost_task(struct rcu_node *rnp);
 static void invoke_rcu_core(void);
 static void rcu_report_exp_rdp(struct rcu_data *rdp);
 static void sync_sched_exp_online_cleanup(int cpu);
@@ -5007,6 +5006,22 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	return 0;
 }
 
+static void rcu_thread_affine_rnp(struct task_struct *t, struct rcu_node *rnp)
+{
+	cpumask_var_t affinity;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return;
+
+	for_each_leaf_node_possible_cpu(rnp, cpu)
+		cpumask_set_cpu(cpu, affinity);
+
+	kthread_affine_preferred(t, affinity);
+
+	free_cpumask_var(affinity);
+}
+
 /*
  * Update kthreads affinity during CPU-hotplug changes.
  *
@@ -5026,19 +5041,18 @@ static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu)
 	unsigned long mask;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
-	struct task_struct *task_boost, *task_exp;
+	struct task_struct *task_exp;
 
 	rdp = per_cpu_ptr(&rcu_data, cpu);
 	rnp = rdp->mynode;
 
-	task_boost = rcu_boost_task(rnp);
 	task_exp = rcu_exp_par_gp_task(rnp);
 
 	/*
-	 * If CPU is the boot one, those tasks are created later from early
+	 * If CPU is the boot one, this task is created later from early
 	 * initcall since kthreadd must be created first.
 	 */
-	if (!task_boost && !task_exp)
+	if (!task_exp)
 		return;
 
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
@@ -5060,9 +5074,6 @@ static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu)
 	if (task_exp)
 		set_cpus_allowed_ptr(task_exp, cm);
 
-	if (task_boost)
-		set_cpus_allowed_ptr(task_boost, cm);
-
 	mutex_unlock(&rnp->kthread_mutex);
 
 	free_cpumask_var(cm);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1c7cbd145d5e..223f3a02351e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1217,16 +1217,13 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rnp->boost_kthread_task = t;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
 	sp.sched_priority = kthread_prio;
 	sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+	rcu_thread_affine_rnp(t, rnp);
 	wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */
 }
 
-static struct task_struct *rcu_boost_task(struct rcu_node *rnp)
-{
-	return READ_ONCE(rnp->boost_kthread_task);
-}
-
 #else /* #ifdef CONFIG_RCU_BOOST */
 
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
@@ -1243,10 +1240,6 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 {
 }
 
-static struct task_struct *rcu_boost_task(struct rcu_node *rnp)
-{
-	return NULL;
-}
 #endif /* #else #ifdef CONFIG_RCU_BOOST */
 
 /*
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 18/20] kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (16 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 17/20] rcu: Use kthread preferred affinity for RCU boost Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 19/20] treewide: Introduce kthread_run_worker[_on_cpu]() Frederic Weisbecker
  2024-09-26 22:49 ` [PATCH 20/20] rcu: Use kthread preferred affinity for RCU exp kworkers Frederic Weisbecker
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul E. McKenney, Uladzislau Rezki,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko,
	Vlastimil Babka

kthread_create_on_cpu() uses the CPU argument as an implicit and unique
printf argument to add to the format whereas
kthread_create_worker_on_cpu() still relies on explicitly passing the
printf arguments. This difference in behaviour is error-prone and
doesn't help standardize per-CPU kthread names.

Unify the behaviours and convert kthread_create_worker_on_cpu() to
use the printf behaviour of kthread_create_on_cpu().

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 fs/erofs/zdata.c        |  2 +-
 include/linux/kthread.h | 21 +++++++++++----
 kernel/kthread.c        | 59 ++++++++++++++++++++++++-----------------
 3 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 8936790618c6..050aaa016ec8 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -318,7 +318,7 @@ static void erofs_destroy_percpu_workers(void)
 static struct kthread_worker *erofs_init_percpu_worker(int cpu)
 {
 	struct kthread_worker *worker =
-		kthread_create_worker_on_cpu(cpu, 0, "erofs_worker/%u", cpu);
+		kthread_create_worker_on_cpu(cpu, 0, "erofs_worker/%u");
 
 	if (IS_ERR(worker))
 		return worker;
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 30209bdf83a2..0c66e7c1092a 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -187,13 +187,24 @@ extern void __kthread_init_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
-__printf(2, 3)
+__printf(3, 4)
+struct kthread_worker *kthread_create_worker_on_node(unsigned int flags,
+						     int node,
+						     const char namefmt[], ...);
+
+#define kthread_create_worker(flags, namefmt, ...) \
+({									   \
+	struct kthread_worker *__kw					   \
+		= kthread_create_worker_on_node(flags, NUMA_NO_NODE,	   \
+						namefmt, ## __VA_ARGS__);  \
+	if (!IS_ERR(__kw))						   \
+		wake_up_process(__kw->task);				   \
+	__kw;								   \
+})
+
 struct kthread_worker *
-kthread_create_worker(unsigned int flags, const char namefmt[], ...);
-
-__printf(3, 4) struct kthread_worker *
 kthread_create_worker_on_cpu(int cpu, unsigned int flags,
-			     const char namefmt[], ...);
+			     const char namefmt[]);
 
 bool kthread_queue_work(struct kthread_worker *worker,
 			struct kthread_work *work);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 91037533afda..7eb93c248c59 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1028,12 +1028,11 @@ int kthread_worker_fn(void *worker_ptr)
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
 static __printf(3, 0) struct kthread_worker *
-__kthread_create_worker(int cpu, unsigned int flags,
-			const char namefmt[], va_list args)
+__kthread_create_worker_on_node(unsigned int flags, int node,
+				const char namefmt[], va_list args)
 {
 	struct kthread_worker *worker;
 	struct task_struct *task;
-	int node = NUMA_NO_NODE;
 
 	worker = kzalloc(sizeof(*worker), GFP_KERNEL);
 	if (!worker)
@@ -1041,20 +1040,14 @@ __kthread_create_worker(int cpu, unsigned int flags,
 
 	kthread_init_worker(worker);
 
-	if (cpu >= 0)
-		node = cpu_to_node(cpu);
-
 	task = __kthread_create_on_node(kthread_worker_fn, worker,
-						node, namefmt, args);
+					node, namefmt, args);
 	if (IS_ERR(task))
 		goto fail_task;
 
-	if (cpu >= 0)
-		kthread_bind(task, cpu);
-
 	worker->flags = flags;
 	worker->task = task;
-	wake_up_process(task);
+
 	return worker;
 
 fail_task:
@@ -1065,6 +1058,7 @@ __kthread_create_worker(int cpu, unsigned int flags,
 /**
  * kthread_create_worker - create a kthread worker
  * @flags: flags modifying the default behavior of the worker
+ * @node: task structure for the thread is allocated on this node
  * @namefmt: printf-style name for the kthread worker (task).
  *
  * Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM)
@@ -1072,25 +1066,49 @@ __kthread_create_worker(int cpu, unsigned int flags,
  * when the caller was killed by a fatal signal.
  */
 struct kthread_worker *
-kthread_create_worker(unsigned int flags, const char namefmt[], ...)
+kthread_create_worker_on_node(unsigned int flags, int node, const char namefmt[], ...)
 {
 	struct kthread_worker *worker;
 	va_list args;
 
 	va_start(args, namefmt);
-	worker = __kthread_create_worker(-1, flags, namefmt, args);
+	worker = __kthread_create_worker_on_node(flags, node, namefmt, args);
 	va_end(args);
 
+	if (!IS_ERR(worker))
+		wake_up_process(worker->task);
+
+	return worker;
+}
+EXPORT_SYMBOL(kthread_create_worker_on_node);
+
+static __printf(3, 4) struct kthread_worker *
+__kthread_create_worker_on_cpu(int cpu, unsigned int flags,
+			       const char namefmt[], ...)
+{
+	struct kthread_worker *worker;
+	va_list args;
+
+	va_start(args, namefmt);
+	worker = __kthread_create_worker_on_node(flags, cpu_to_node(cpu),
+						 namefmt, args);
+	va_end(args);
+
+	if (!IS_ERR(worker)) {
+		kthread_bind(worker->task, cpu);
+		wake_up_process(worker->task);
+	}
+
 	return worker;
 }
-EXPORT_SYMBOL(kthread_create_worker);
 
 /**
  * kthread_create_worker_on_cpu - create a kthread worker and bind it
  *	to a given CPU and the associated NUMA node.
  * @cpu: CPU number
  * @flags: flags modifying the default behavior of the worker
- * @namefmt: printf-style name for the kthread worker (task).
+ * @namefmt: printf-style name for the thread. Format is restricted
+ *	     to "name.*%u". Code fills in cpu number.
  *
  * Use a valid CPU number if you want to bind the kthread worker
  * to the given CPU and the associated NUMA node.
@@ -1122,16 +1140,9 @@ EXPORT_SYMBOL(kthread_create_worker);
  */
 struct kthread_worker *
 kthread_create_worker_on_cpu(int cpu, unsigned int flags,
-			     const char namefmt[], ...)
+			     const char namefmt[])
 {
-	struct kthread_worker *worker;
-	va_list args;
-
-	va_start(args, namefmt);
-	worker = __kthread_create_worker(cpu, flags, namefmt, args);
-	va_end(args);
-
-	return worker;
+	return __kthread_create_worker_on_cpu(cpu, flags, namefmt, cpu);
 }
 EXPORT_SYMBOL(kthread_create_worker_on_cpu);
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 19/20] treewide: Introduce kthread_run_worker[_on_cpu]()
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (17 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 18/20] kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  2024-09-27  5:39   ` Paul E. McKenney
  2024-09-26 22:49 ` [PATCH 20/20] rcu: Use kthread preferred affinity for RCU exp kworkers Frederic Weisbecker
  19 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul E. McKenney, Uladzislau Rezki,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko,
	Vlastimil Babka

kthread_create() creates a kthread without running it yet. kthread_run()
creates a kthread and runs it.

On the other hand, kthread_create_worker() creates a kthread worker and
runs it.

This inconsistency in behaviour is confusing. Also there is currently no
way to create a kthread worker and affine it using kthread_bind_mask() or
kthread_affine_preferred() before it starts running.

Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]()
that behaves just like kthread_run(). kthread_create_worker[_on_cpu]()
will now only create a kthread worker without starting it.
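
The resulting call patterns can be sketched as follows (illustrative
only; example_mask is a hypothetical cpumask, not part of this series):

```c
/* One-step create-and-run, as before, now spelled kthread_run_worker(): */
struct kthread_worker *kw = kthread_run_worker(0, "example-worker");

/*
 * Two-step pattern enabled by this change: create the worker stopped,
 * apply an affinity preference, then start it.
 */
struct kthread_worker *kw2 = kthread_create_worker(0, "example-worker");

if (!IS_ERR(kw2)) {
	kthread_affine_preferred(kw2->task, example_mask); /* hypothetical mask */
	wake_up_process(kw2->task);
}
```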

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/kvm/i8254.c                          |  2 +-
 crypto/crypto_engine.c                        |  2 +-
 drivers/cpufreq/cppc_cpufreq.c                |  2 +-
 drivers/gpu/drm/drm_vblank_work.c             |  2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_slpc.c       |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  8 ++--
 drivers/gpu/drm/msm/disp/msm_disp_snapshot.c  |  2 +-
 drivers/gpu/drm/msm/msm_atomic.c              |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c                 |  2 +-
 drivers/gpu/drm/msm/msm_kms.c                 |  2 +-
 .../platform/chips-media/wave5/wave5-vpu.c    |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c              |  2 +-
 drivers/net/ethernet/intel/ice/ice_dpll.c     |  2 +-
 drivers/net/ethernet/intel/ice/ice_gnss.c     |  2 +-
 drivers/net/ethernet/intel/ice/ice_ptp.c      |  2 +-
 drivers/platform/chrome/cros_ec_spi.c         |  2 +-
 drivers/ptp/ptp_clock.c                       |  2 +-
 drivers/spi/spi.c                             |  2 +-
 drivers/usb/typec/tcpm/tcpm.c                 |  2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c              |  2 +-
 drivers/watchdog/watchdog_dev.c               |  2 +-
 fs/erofs/zdata.c                              |  2 +-
 include/linux/kthread.h                       | 48 ++++++++++++++++---
 kernel/kthread.c                              | 31 +++---------
 kernel/rcu/tree.c                             |  4 +-
 kernel/sched/ext.c                            |  2 +-
 kernel/workqueue.c                            |  2 +-
 net/dsa/tag_ksz.c                             |  2 +-
 net/dsa/tag_ocelot_8021q.c                    |  2 +-
 net/dsa/tag_sja1105.c                         |  2 +-
 33 files changed, 83 insertions(+), 66 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index cd57a517d04a..d7ab8780ab9e 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -681,7 +681,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pid_nr = pid_vnr(pid);
 	put_pid(pid);
 
-	pit->worker = kthread_create_worker(0, "kvm-pit/%d", pid_nr);
+	pit->worker = kthread_run_worker(0, "kvm-pit/%d", pid_nr);
 	if (IS_ERR(pit->worker))
 		goto fail_kthread;
 
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index e60a0eb628e8..c7c16da5e649 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -517,7 +517,7 @@ struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
 	crypto_init_queue(&engine->queue, qlen);
 	spin_lock_init(&engine->queue_lock);
 
-	engine->kworker = kthread_create_worker(0, "%s", engine->name);
+	engine->kworker = kthread_run_worker(0, "%s", engine->name);
 	if (IS_ERR(engine->kworker)) {
 		dev_err(dev, "failed to create crypto request pump task\n");
 		return NULL;
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 1a5ad184d28f..9b91cba133c9 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -241,7 +241,7 @@ static void __init cppc_freq_invariance_init(void)
 	if (fie_disabled)
 		return;
 
-	kworker_fie = kthread_create_worker(0, "cppc_fie");
+	kworker_fie = kthread_run_worker(0, "cppc_fie");
 	if (IS_ERR(kworker_fie)) {
 		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
 			PTR_ERR(kworker_fie));
diff --git a/drivers/gpu/drm/drm_vblank_work.c b/drivers/gpu/drm/drm_vblank_work.c
index 1752ffb44e1d..9cc71120246f 100644
--- a/drivers/gpu/drm/drm_vblank_work.c
+++ b/drivers/gpu/drm/drm_vblank_work.c
@@ -277,7 +277,7 @@ int drm_vblank_worker_init(struct drm_vblank_crtc *vblank)
 
 	INIT_LIST_HEAD(&vblank->pending_work);
 	init_waitqueue_head(&vblank->work_wait_queue);
-	worker = kthread_create_worker(0, "card%d-crtc%d",
+	worker = kthread_run_worker(0, "card%d-crtc%d",
 				       vblank->dev->primary->index,
 				       vblank->pipe);
 	if (IS_ERR(worker))
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 89d4dc8b60c6..eb0158e43417 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -369,7 +369,7 @@ static int live_parallel_switch(void *arg)
 		if (!data[n].ce[0])
 			continue;
 
-		worker = kthread_create_worker(0, "igt/parallel:%s",
+		worker = kthread_run_worker(0, "igt/parallel:%s",
 					       data[n].ce[0]->engine->name);
 		if (IS_ERR(worker)) {
 			err = PTR_ERR(worker);
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 222ca7c44951..81c31396eceb 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3574,7 +3574,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
 			arg[id].batch = NULL;
 		arg[id].count = 0;
 
-		worker[id] = kthread_create_worker(0, "igt/smoke:%d", id);
+		worker[id] = kthread_run_worker(0, "igt/smoke:%d", id);
 		if (IS_ERR(worker[id])) {
 			err = PTR_ERR(worker[id]);
 			break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 9ce8ff1c04fe..9d3aeb237295 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1025,7 +1025,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			threads[tmp].engine = other;
 			threads[tmp].flags = flags;
 
-			worker = kthread_create_worker(0, "igt/%s",
+			worker = kthread_run_worker(0, "igt/%s",
 						       other->name);
 			if (IS_ERR(worker)) {
 				err = PTR_ERR(worker);
diff --git a/drivers/gpu/drm/i915/gt/selftest_slpc.c b/drivers/gpu/drm/i915/gt/selftest_slpc.c
index 4ecc4ae74a54..e218b229681f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_slpc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_slpc.c
@@ -489,7 +489,7 @@ static int live_slpc_tile_interaction(void *arg)
 		return -ENOMEM;
 
 	for_each_gt(gt, i915, i) {
-		threads[i].worker = kthread_create_worker(0, "igt/slpc_parallel:%d", gt->info.id);
+		threads[i].worker = kthread_run_worker(0, "igt/slpc_parallel:%d", gt->info.id);
 
 		if (IS_ERR(threads[i].worker)) {
 			ret = PTR_ERR(threads[i].worker);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index acae30a04a94..88870844b5bd 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -492,7 +492,7 @@ static int mock_breadcrumbs_smoketest(void *arg)
 	for (n = 0; n < ncpus; n++) {
 		struct kthread_worker *worker;
 
-		worker = kthread_create_worker(0, "igt/%d", n);
+		worker = kthread_run_worker(0, "igt/%d", n);
 		if (IS_ERR(worker)) {
 			ret = PTR_ERR(worker);
 			ncpus = n;
@@ -1645,7 +1645,7 @@ static int live_parallel_engines(void *arg)
 		for_each_uabi_engine(engine, i915) {
 			struct kthread_worker *worker;
 
-			worker = kthread_create_worker(0, "igt/parallel:%s",
+			worker = kthread_run_worker(0, "igt/parallel:%s",
 						       engine->name);
 			if (IS_ERR(worker)) {
 				err = PTR_ERR(worker);
@@ -1806,7 +1806,7 @@ static int live_breadcrumbs_smoketest(void *arg)
 			unsigned int i = idx * ncpus + n;
 			struct kthread_worker *worker;
 
-			worker = kthread_create_worker(0, "igt/%d.%d", idx, n);
+			worker = kthread_run_worker(0, "igt/%d.%d", idx, n);
 			if (IS_ERR(worker)) {
 				ret = PTR_ERR(worker);
 				goto out_flush;
@@ -3219,7 +3219,7 @@ static int perf_parallel_engines(void *arg)
 
 			memset(&engines[idx].p, 0, sizeof(engines[idx].p));
 
-			worker = kthread_create_worker(0, "igt:%s",
+			worker = kthread_run_worker(0, "igt:%s",
 						       engine->name);
 			if (IS_ERR(worker)) {
 				err = PTR_ERR(worker);
diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
index e75b97127c0d..2be00b11e557 100644
--- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
+++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
@@ -109,7 +109,7 @@ int msm_disp_snapshot_init(struct drm_device *drm_dev)
 
 	mutex_init(&kms->dump_mutex);
 
-	kms->dump_worker = kthread_create_worker(0, "%s", "disp_snapshot");
+	kms->dump_worker = kthread_run_worker(0, "%s", "disp_snapshot");
 	if (IS_ERR(kms->dump_worker))
 		DRM_ERROR("failed to create disp state task\n");
 
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index 9c45d641b521..a7a2384044ff 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -115,7 +115,7 @@ int msm_atomic_init_pending_timer(struct msm_pending_timer *timer,
 	timer->kms = kms;
 	timer->crtc_idx = crtc_idx;
 
-	timer->worker = kthread_create_worker(0, "atomic-worker-%d", crtc_idx);
+	timer->worker = kthread_run_worker(0, "atomic-worker-%d", crtc_idx);
 	if (IS_ERR(timer->worker)) {
 		int ret = PTR_ERR(timer->worker);
 		timer->worker = NULL;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a274b8466423..15f74e9dfc9e 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -859,7 +859,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	gpu->funcs = funcs;
 	gpu->name = name;
 
-	gpu->worker = kthread_create_worker(0, "gpu-worker");
+	gpu->worker = kthread_run_worker(0, "gpu-worker");
 	if (IS_ERR(gpu->worker)) {
 		ret = PTR_ERR(gpu->worker);
 		gpu->worker = NULL;
diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
index af6a6fcb1173..8db9f3afb8ac 100644
--- a/drivers/gpu/drm/msm/msm_kms.c
+++ b/drivers/gpu/drm/msm/msm_kms.c
@@ -269,7 +269,7 @@ int msm_drm_kms_init(struct device *dev, const struct drm_driver *drv)
 		/* initialize event thread */
 		ev_thread = &priv->event_thread[drm_crtc_index(crtc)];
 		ev_thread->dev = ddev;
-		ev_thread->worker = kthread_create_worker(0, "crtc_event:%d", crtc->base.id);
+		ev_thread->worker = kthread_run_worker(0, "crtc_event:%d", crtc->base.id);
 		if (IS_ERR(ev_thread->worker)) {
 			ret = PTR_ERR(ev_thread->worker);
 			DRM_DEV_ERROR(dev, "failed to create crtc_event kthread\n");
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
index 7273254ecb03..c49f5ed461cf 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
@@ -231,7 +231,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
 		hrtimer_init(&dev->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 		dev->hrtimer.function = &wave5_vpu_timer_callback;
-		dev->worker = kthread_create_worker(0, "vpu_irq_thread");
+		dev->worker = kthread_run_worker(0, "vpu_irq_thread");
 		if (IS_ERR(dev->worker)) {
 			dev_err(&pdev->dev, "failed to create vpu irq worker\n");
 			ret = PTR_ERR(dev->worker);
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5b4e2ce5470d..a5908e2ff2cf 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -393,7 +393,7 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
 	kthread_init_delayed_work(&chip->irq_poll_work,
 				  mv88e6xxx_irq_poll);
 
-	chip->kworker = kthread_create_worker(0, "%s", dev_name(chip->dev));
+	chip->kworker = kthread_run_worker(0, "%s", dev_name(chip->dev));
 	if (IS_ERR(chip->kworker))
 		return PTR_ERR(chip->kworker);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index cd95705d1e7f..1f11a24387f3 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -2050,7 +2050,7 @@ static int ice_dpll_init_worker(struct ice_pf *pf)
 	struct kthread_worker *kworker;
 
 	kthread_init_delayed_work(&d->work, ice_dpll_periodic_work);
-	kworker = kthread_create_worker(0, "ice-dplls-%s",
+	kworker = kthread_run_worker(0, "ice-dplls-%s",
 					dev_name(ice_pf_to_dev(pf)));
 	if (IS_ERR(kworker))
 		return PTR_ERR(kworker);
diff --git a/drivers/net/ethernet/intel/ice/ice_gnss.c b/drivers/net/ethernet/intel/ice/ice_gnss.c
index c8ea1af51ad3..fcd1f808b696 100644
--- a/drivers/net/ethernet/intel/ice/ice_gnss.c
+++ b/drivers/net/ethernet/intel/ice/ice_gnss.c
@@ -182,7 +182,7 @@ static struct gnss_serial *ice_gnss_struct_init(struct ice_pf *pf)
 	pf->gnss_serial = gnss;
 
 	kthread_init_delayed_work(&gnss->read_work, ice_gnss_read);
-	kworker = kthread_create_worker(0, "ice-gnss-%s", dev_name(dev));
+	kworker = kthread_run_worker(0, "ice-gnss-%s", dev_name(dev));
 	if (IS_ERR(kworker)) {
 		kfree(gnss);
 		return NULL;
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index ef2e858f49bb..cd7da48bdf91 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -3185,7 +3185,7 @@ static int ice_ptp_init_work(struct ice_pf *pf, struct ice_ptp *ptp)
 	/* Allocate a kworker for handling work required for the ports
 	 * connected to the PTP hardware clock.
 	 */
-	kworker = kthread_create_worker(0, "ice-ptp-%s",
+	kworker = kthread_run_worker(0, "ice-ptp-%s",
 					dev_name(ice_pf_to_dev(pf)));
 	if (IS_ERR(kworker))
 		return PTR_ERR(kworker);
diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
index 86a3d32a7763..08f566cc1480 100644
--- a/drivers/platform/chrome/cros_ec_spi.c
+++ b/drivers/platform/chrome/cros_ec_spi.c
@@ -715,7 +715,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
 	int err;
 
 	ec_spi->high_pri_worker =
-		kthread_create_worker(0, "cros_ec_spi_high_pri");
+		kthread_run_worker(0, "cros_ec_spi_high_pri");
 
 	if (IS_ERR(ec_spi->high_pri_worker)) {
 		err = PTR_ERR(ec_spi->high_pri_worker);
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index c56cd0f63909..89a4420972e7 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -295,7 +295,7 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 
 	if (ptp->info->do_aux_work) {
 		kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
-		ptp->kworker = kthread_create_worker(0, "ptp%d", ptp->index);
+		ptp->kworker = kthread_run_worker(0, "ptp%d", ptp->index);
 		if (IS_ERR(ptp->kworker)) {
 			err = PTR_ERR(ptp->kworker);
 			pr_err("failed to create ptp aux_worker %d\n", err);
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index c1dad30a4528..f2f4b6ee25d4 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2053,7 +2053,7 @@ static int spi_init_queue(struct spi_controller *ctlr)
 	ctlr->busy = false;
 	ctlr->queue_empty = true;
 
-	ctlr->kworker = kthread_create_worker(0, dev_name(&ctlr->dev));
+	ctlr->kworker = kthread_run_worker(0, dev_name(&ctlr->dev));
 	if (IS_ERR(ctlr->kworker)) {
 		dev_err(&ctlr->dev, "failed to create message pump kworker\n");
 		return PTR_ERR(ctlr->kworker);
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index fc619478200f..66ae934ad196 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -7577,7 +7577,7 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
 	mutex_init(&port->lock);
 	mutex_init(&port->swap_lock);
 
-	port->wq = kthread_create_worker(0, dev_name(dev));
+	port->wq = kthread_run_worker(0, dev_name(dev));
 	if (IS_ERR(port->wq))
 		return ERR_CAST(port->wq);
 	sched_set_fifo(port->wq->task);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 8ffea8430f95..c204fc8e471a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -229,7 +229,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	dev = &vdpasim->vdpa.dev;
 
 	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
-	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
+	vdpasim->worker = kthread_run_worker(0, "vDPA sim worker: %s",
 						dev_attr->name);
 	if (IS_ERR(vdpasim->worker))
 		goto err_iommu;
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 4190cb800cc4..19698d87dc57 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -1229,7 +1229,7 @@ int __init watchdog_dev_init(void)
 {
 	int err;
 
-	watchdog_kworker = kthread_create_worker(0, "watchdogd");
+	watchdog_kworker = kthread_run_worker(0, "watchdogd");
 	if (IS_ERR(watchdog_kworker)) {
 		pr_err("Failed to create watchdog kworker\n");
 		return PTR_ERR(watchdog_kworker);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 050aaa016ec8..bf6b4d8cb283 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -318,7 +318,7 @@ static void erofs_destroy_percpu_workers(void)
 static struct kthread_worker *erofs_init_percpu_worker(int cpu)
 {
 	struct kthread_worker *worker =
-		kthread_create_worker_on_cpu(cpu, 0, "erofs_worker/%u");
+		kthread_run_worker_on_cpu(cpu, 0, "erofs_worker/%u");
 
 	if (IS_ERR(worker))
 		return worker;
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 0c66e7c1092a..8d27403888ce 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -193,19 +193,53 @@ struct kthread_worker *kthread_create_worker_on_node(unsigned int flags,
 						     const char namefmt[], ...);
 
 #define kthread_create_worker(flags, namefmt, ...) \
-({									   \
-	struct kthread_worker *__kw					   \
-		= kthread_create_worker_on_node(flags, NUMA_NO_NODE,	   \
-						namefmt, ## __VA_ARGS__);  \
-	if (!IS_ERR(__kw))						   \
-		wake_up_process(__kw->task);				   \
-	__kw;								   \
+	kthread_create_worker_on_node(flags, NUMA_NO_NODE, namefmt, ## __VA_ARGS__)
+
+/**
+ * kthread_run_worker - create and wake a kthread worker.
+ * @flags: flags modifying the default behavior of the worker
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create_worker() followed by
+ * wake_up_process().  Returns the kthread_worker or ERR_PTR(-ENOMEM).
+ */
+#define kthread_run_worker(flags, namefmt, ...)					\
+({										\
+	struct kthread_worker *__kw						\
+		= kthread_create_worker(flags, namefmt, ## __VA_ARGS__);	\
+	if (!IS_ERR(__kw))							\
+		wake_up_process(__kw->task);					\
+	__kw;									\
 })
 
 struct kthread_worker *
 kthread_create_worker_on_cpu(int cpu, unsigned int flags,
 			     const char namefmt[]);
 
+/**
+ * kthread_run_worker_on_cpu - create and wake a cpu bound kthread worker.
+ * @cpu: CPU number
+ * @flags: flags modifying the default behavior of the worker
+ * @namefmt: printf-style name for the thread. Format is restricted
+ *	     to "name.*%u". Code fills in cpu number.
+ *
+ * Description: Convenient wrapper for kthread_create_worker_on_cpu()
+ * followed by wake_up_process().  Returns the kthread_worker or
+ * ERR_PTR(-ENOMEM).
+ */
+static inline struct kthread_worker *
+kthread_run_worker_on_cpu(int cpu, unsigned int flags,
+			  const char namefmt[])
+{
+	struct kthread_worker *kw;
+
+	kw = kthread_create_worker_on_cpu(cpu, flags, namefmt);
+	if (!IS_ERR(kw))
+		wake_up_process(kw->task);
+
+	return kw;
+}
+
 bool kthread_queue_work(struct kthread_worker *worker,
 			struct kthread_work *work);
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 7eb93c248c59..d9fee08e9a66 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1075,33 +1075,10 @@ kthread_create_worker_on_node(unsigned int flags, int node, const char namefmt[]
 	worker = __kthread_create_worker_on_node(flags, node, namefmt, args);
 	va_end(args);
 
-	if (!IS_ERR(worker))
-		wake_up_process(worker->task);
-
 	return worker;
 }
 EXPORT_SYMBOL(kthread_create_worker_on_node);
 
-static __printf(3, 4) struct kthread_worker *
-__kthread_create_worker_on_cpu(int cpu, unsigned int flags,
-			       const char namefmt[], ...)
-{
-	struct kthread_worker *worker;
-	va_list args;
-
-	va_start(args, namefmt);
-	worker = __kthread_create_worker_on_node(flags, cpu_to_node(cpu),
-						 namefmt, args);
-	va_end(args);
-
-	if (worker) {
-		kthread_bind(worker->task, cpu);
-		wake_up_process(worker->task);
-	}
-
-	return worker;
-}
-
 /**
  * kthread_create_worker_on_cpu - create a kthread worker and bind it
  *	to a given CPU and the associated NUMA node.
@@ -1142,7 +1119,13 @@ struct kthread_worker *
 kthread_create_worker_on_cpu(int cpu, unsigned int flags,
 			     const char namefmt[])
 {
-	return __kthread_create_worker_on_cpu(cpu, flags, namefmt, cpu);
+	struct kthread_worker *worker;
+
+	worker = kthread_create_worker_on_node(flags, cpu_to_node(cpu), namefmt, cpu);
+	if (!IS_ERR(worker))
+		kthread_bind(worker->task, cpu);
+
+	return worker;
 }
 EXPORT_SYMBOL(kthread_create_worker_on_cpu);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1e9f0818d51..a44228b0949a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4902,7 +4902,7 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp)
 	if (rnp->exp_kworker)
 		return;
 
-	kworker = kthread_create_worker(0, name, rnp_index);
+	kworker = kthread_run_worker(0, name, rnp_index);
 	if (IS_ERR_OR_NULL(kworker)) {
 		pr_err("Failed to create par gp kworker on %d/%d\n",
 		       rnp->grplo, rnp->grphi);
@@ -4929,7 +4929,7 @@ static void __init rcu_start_exp_gp_kworker(void)
 	const char *name = "rcu_exp_gp_kthread_worker";
 	struct sched_param param = { .sched_priority = kthread_prio };
 
-	rcu_exp_gp_kworker = kthread_create_worker(0, name);
+	rcu_exp_gp_kworker = kthread_run_worker(0, name);
 	if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
 		pr_err("Failed to create %s!\n", name);
 		rcu_exp_gp_kworker = NULL;
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c09e3dc38c34..4835fa4d9326 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4885,7 +4885,7 @@ static struct kthread_worker *scx_create_rt_helper(const char *name)
 {
 	struct kthread_worker *helper;
 
-	helper = kthread_create_worker(0, name);
+	helper = kthread_run_worker(0, name);
 	if (helper)
 		sched_set_fifo(helper->task);
 	return helper;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9949ffad8df0..f5c7447ae1de 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7814,7 +7814,7 @@ static void __init wq_cpu_intensive_thresh_init(void)
 	unsigned long thresh;
 	unsigned long bogo;
 
-	pwq_release_worker = kthread_create_worker(0, "pool_workqueue_release");
+	pwq_release_worker = kthread_run_worker(0, "pool_workqueue_release");
 	BUG_ON(IS_ERR(pwq_release_worker));
 
 	/* if the user set it to a specific value, keep it */
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 281bbac5539d..c33d4bf17929 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -66,7 +66,7 @@ static int ksz_connect(struct dsa_switch *ds)
 	if (!priv)
 		return -ENOMEM;
 
-	xmit_worker = kthread_create_worker(0, "dsa%d:%d_xmit",
+	xmit_worker = kthread_run_worker(0, "dsa%d:%d_xmit",
 					    ds->dst->index, ds->index);
 	if (IS_ERR(xmit_worker)) {
 		ret = PTR_ERR(xmit_worker);
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..6ce0bc166792 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -110,7 +110,7 @@ static int ocelot_connect(struct dsa_switch *ds)
 	if (!priv)
 		return -ENOMEM;
 
-	priv->xmit_worker = kthread_create_worker(0, "felix_xmit");
+	priv->xmit_worker = kthread_run_worker(0, "felix_xmit");
 	if (IS_ERR(priv->xmit_worker)) {
 		err = PTR_ERR(priv->xmit_worker);
 		kfree(priv);
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 3e902af7eea6..02adec693811 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -707,7 +707,7 @@ static int sja1105_connect(struct dsa_switch *ds)
 
 	spin_lock_init(&priv->meta_lock);
 
-	xmit_worker = kthread_create_worker(0, "dsa%d:%d_xmit",
+	xmit_worker = kthread_run_worker(0, "dsa%d:%d_xmit",
 					    ds->dst->index, ds->index);
 	if (IS_ERR(xmit_worker)) {
 		err = PTR_ERR(xmit_worker);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 20/20] rcu: Use kthread preferred affinity for RCU exp kworkers
  2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
                   ` (18 preceding siblings ...)
  2024-09-26 22:49 ` [PATCH 19/20] treewide: Introduce kthread_run_worker[_on_cpu]() Frederic Weisbecker
@ 2024-09-26 22:49 ` Frederic Weisbecker
  19 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-09-26 22:49 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Paul E. McKenney, Uladzislau Rezki,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko,
	Vlastimil Babka

Now that kthreads have an infrastructure to handle preferred affinity
against CPU hotplug and the housekeeping cpumask, convert the RCU exp
kworkers to use it instead of handling all the constraints themselves.
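
Condensed from the diff below, the spawn path now follows a
create/affine/wake sequence (a sketch, not the verbatim code):

```c
/* Create the exp par gp kworker without starting it yet. */
kworker = kthread_create_worker(0, name, rnp_index);
if (!IS_ERR_OR_NULL(kworker)) {
	/*
	 * Declare the leaf node's CPUs as the kthread's preferred
	 * affinity (via kthread_affine_preferred()), then start it.
	 * Hotplug and housekeeping constraints are handled by the
	 * kthread core from now on.
	 */
	rcu_thread_affine_rnp(kworker->task, rnp);
	wake_up_process(kworker->task);
}
```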

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 105 +++++++++-------------------------------------
 1 file changed, 19 insertions(+), 86 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a44228b0949a..d377a162c56c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4890,6 +4890,22 @@ rcu_boot_init_percpu_data(int cpu)
 	rcu_boot_init_nocb_percpu_data(rdp);
 }
 
+static void rcu_thread_affine_rnp(struct task_struct *t, struct rcu_node *rnp)
+{
+	cpumask_var_t affinity;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return;
+
+	for_each_leaf_node_possible_cpu(rnp, cpu)
+		cpumask_set_cpu(cpu, affinity);
+
+	kthread_affine_preferred(t, affinity);
+
+	free_cpumask_var(affinity);
+}
+
 struct kthread_worker *rcu_exp_gp_kworker;
 
 static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp)
@@ -4902,7 +4918,7 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp)
 	if (rnp->exp_kworker)
 		return;
 
-	kworker = kthread_run_worker(0, name, rnp_index);
+	kworker = kthread_create_worker(0, name, rnp_index);
 	if (IS_ERR_OR_NULL(kworker)) {
 		pr_err("Failed to create par gp kworker on %d/%d\n",
 		       rnp->grplo, rnp->grphi);
@@ -4912,16 +4928,9 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp)
 
 	if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD))
 		sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, &param);
-}
 
-static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp)
-{
-	struct kthread_worker *kworker = READ_ONCE(rnp->exp_kworker);
-
-	if (!kworker)
-		return NULL;
-
-	return kworker->task;
+	rcu_thread_affine_rnp(kworker->task, rnp);
+	wake_up_process(kworker->task);
 }
 
 static void __init rcu_start_exp_gp_kworker(void)
@@ -5006,79 +5015,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	return 0;
 }
 
-static void rcu_thread_affine_rnp(struct task_struct *t, struct rcu_node *rnp)
-{
-	cpumask_var_t affinity;
-	int cpu;
-
-	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
-		return;
-
-	for_each_leaf_node_possible_cpu(rnp, cpu)
-		cpumask_set_cpu(cpu, affinity);
-
-	kthread_affine_preferred(t, affinity);
-
-	free_cpumask_var(affinity);
-}
-
-/*
- * Update kthreads affinity during CPU-hotplug changes.
- *
- * Set the per-rcu_node kthread's affinity to cover all CPUs that are
- * served by the rcu_node in question.  The CPU hotplug lock is still
- * held, so the value of rnp->qsmaskinit will be stable.
- *
- * We don't include outgoingcpu in the affinity set, use -1 if there is
- * no outgoing CPU.  If there are no CPUs left in the affinity set,
- * this function allows the kthread to execute on any CPU.
- *
- * Any future concurrent calls are serialized via ->kthread_mutex.
- */
-static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu)
-{
-	cpumask_var_t cm;
-	unsigned long mask;
-	struct rcu_data *rdp;
-	struct rcu_node *rnp;
-	struct task_struct *task_exp;
-
-	rdp = per_cpu_ptr(&rcu_data, cpu);
-	rnp = rdp->mynode;
-
-	task_exp = rcu_exp_par_gp_task(rnp);
-
-	/*
-	 * If CPU is the boot one, this task is created later from early
-	 * initcall since kthreadd must be created first.
-	 */
-	if (!task_exp)
-		return;
-
-	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
-		return;
-
-	mutex_lock(&rnp->kthread_mutex);
-	mask = rcu_rnp_online_cpus(rnp);
-	for_each_leaf_node_possible_cpu(rnp, cpu)
-		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
-		    cpu != outgoingcpu)
-			cpumask_set_cpu(cpu, cm);
-	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
-	if (cpumask_empty(cm)) {
-		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
-		if (outgoingcpu >= 0)
-			cpumask_clear_cpu(outgoingcpu, cm);
-	}
-
-	if (task_exp)
-		set_cpus_allowed_ptr(task_exp, cm);
-
-	mutex_unlock(&rnp->kthread_mutex);
-
-	free_cpumask_var(cm);
-}
-
 /*
  * Has the specified (known valid) CPU ever been fully online?
  */
@@ -5107,7 +5043,6 @@ int rcutree_online_cpu(unsigned int cpu)
 	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return 0; /* Too early in boot for scheduler work. */
 	sync_sched_exp_online_cleanup(cpu);
-	rcutree_affinity_setting(cpu, -1);
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
@@ -5324,8 +5259,6 @@ int rcutree_offline_cpu(unsigned int cpu)
 	rnp->ffmask &= ~rdp->grpmask;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
-	rcutree_affinity_setting(cpu, cpu);
-
 	// nohz_full CPUs need the tick for stop-machine to work quickly
 	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

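The hunks above reorder the RCU exp-par-gp kworker setup so that the preferred affinity is applied before the first wakeup, which is why `kthread_run_worker()` is switched back to `kthread_create_worker()` and an explicit `wake_up_process()` is added. A pseudocode-style sketch of the resulting flow in rcu_spawn_exp_par_gp_kworker() (kernel-internal APIs, not buildable outside the tree; names taken from the patch):

```
/* Sketch only: create the worker stopped, affine it, then start it. */
struct kthread_worker *kworker;

kworker = kthread_create_worker(0, name, rnp_index);  /* created, NOT running yet */
if (IS_ERR_OR_NULL(kworker))
	return;

/* rcu_thread_affine_rnp() builds the leaf-node cpumask and calls
 * kthread_affine_preferred() on the still-sleeping task. */
rcu_thread_affine_rnp(kworker->task, rnp);

wake_up_process(kworker->task);  /* only now does the worker run */
```

Because the affinity is in place before the task ever schedules, the old hotplug-time `rcutree_affinity_setting()` fixups removed later in this patch become unnecessary.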
* Re: [PATCH 19/20] treewide: Introduce kthread_run_worker[_on_cpu]()
  2024-09-26 22:49 ` [PATCH 19/20] treewide: Introduce kthread_run_worker[_on_cpu]() Frederic Weisbecker
@ 2024-09-27  5:39   ` Paul E. McKenney
  0 siblings, 0 replies; 34+ messages in thread
From: Paul E. McKenney @ 2024-09-27  5:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Uladzislau Rezki, Neeraj Upadhyay, Joel Fernandes,
	Boqun Feng, Zqiang, rcu, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Michal Hocko, Vlastimil Babka

On Fri, Sep 27, 2024 at 12:49:07AM +0200, Frederic Weisbecker wrote:
> kthread_create() creates a kthread without running it yet. kthread_run()
> creates a kthread and runs it.
> 
> On the other hand, kthread_create_worker() creates a kthread worker and
> runs it.
> 
> This difference in behaviours is confusing. Also there is no way to
> create a kthread worker and affine it using kthread_bind_mask() or
> kthread_affine_preferred() before starting it.
> 
> Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]()
> that behaves just like kthread_run(). kthread_create_worker[_on_cpu]()
> will now only create a kthread worker without starting it.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

For the RCU pieces:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  arch/x86/kvm/i8254.c                          |  2 +-
>  crypto/crypto_engine.c                        |  2 +-
>  drivers/cpufreq/cppc_cpufreq.c                |  2 +-
>  drivers/gpu/drm/drm_vblank_work.c             |  2 +-
>  .../drm/i915/gem/selftests/i915_gem_context.c |  2 +-
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  2 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  2 +-
>  drivers/gpu/drm/i915/gt/selftest_slpc.c       |  2 +-
>  drivers/gpu/drm/i915/selftests/i915_request.c |  8 ++--
>  drivers/gpu/drm/msm/disp/msm_disp_snapshot.c  |  2 +-
>  drivers/gpu/drm/msm/msm_atomic.c              |  2 +-
>  drivers/gpu/drm/msm/msm_gpu.c                 |  2 +-
>  drivers/gpu/drm/msm/msm_kms.c                 |  2 +-
>  .../platform/chips-media/wave5/wave5-vpu.c    |  2 +-
>  drivers/net/dsa/mv88e6xxx/chip.c              |  2 +-
>  drivers/net/ethernet/intel/ice/ice_dpll.c     |  2 +-
>  drivers/net/ethernet/intel/ice/ice_gnss.c     |  2 +-
>  drivers/net/ethernet/intel/ice/ice_ptp.c      |  2 +-
>  drivers/platform/chrome/cros_ec_spi.c         |  2 +-
>  drivers/ptp/ptp_clock.c                       |  2 +-
>  drivers/spi/spi.c                             |  2 +-
>  drivers/usb/typec/tcpm/tcpm.c                 |  2 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c              |  2 +-
>  drivers/watchdog/watchdog_dev.c               |  2 +-
>  fs/erofs/zdata.c                              |  2 +-
>  include/linux/kthread.h                       | 48 ++++++++++++++++---
>  kernel/kthread.c                              | 31 +++---------
>  kernel/rcu/tree.c                             |  4 +-
>  kernel/sched/ext.c                            |  2 +-
>  kernel/workqueue.c                            |  2 +-
>  net/dsa/tag_ksz.c                             |  2 +-
>  net/dsa/tag_ocelot_8021q.c                    |  2 +-
>  net/dsa/tag_sja1105.c                         |  2 +-
>  33 files changed, 83 insertions(+), 66 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index cd57a517d04a..d7ab8780ab9e 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -681,7 +681,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
>  	pid_nr = pid_vnr(pid);
>  	put_pid(pid);
>  
> -	pit->worker = kthread_create_worker(0, "kvm-pit/%d", pid_nr);
> +	pit->worker = kthread_run_worker(0, "kvm-pit/%d", pid_nr);
>  	if (IS_ERR(pit->worker))
>  		goto fail_kthread;
>  
> diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
> index e60a0eb628e8..c7c16da5e649 100644
> --- a/crypto/crypto_engine.c
> +++ b/crypto/crypto_engine.c
> @@ -517,7 +517,7 @@ struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
>  	crypto_init_queue(&engine->queue, qlen);
>  	spin_lock_init(&engine->queue_lock);
>  
> -	engine->kworker = kthread_create_worker(0, "%s", engine->name);
> +	engine->kworker = kthread_run_worker(0, "%s", engine->name);
>  	if (IS_ERR(engine->kworker)) {
>  		dev_err(dev, "failed to create crypto request pump task\n");
>  		return NULL;
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 1a5ad184d28f..9b91cba133c9 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -241,7 +241,7 @@ static void __init cppc_freq_invariance_init(void)
>  	if (fie_disabled)
>  		return;
>  
> -	kworker_fie = kthread_create_worker(0, "cppc_fie");
> +	kworker_fie = kthread_run_worker(0, "cppc_fie");
>  	if (IS_ERR(kworker_fie)) {
>  		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
>  			PTR_ERR(kworker_fie));
> diff --git a/drivers/gpu/drm/drm_vblank_work.c b/drivers/gpu/drm/drm_vblank_work.c
> index 1752ffb44e1d..9cc71120246f 100644
> --- a/drivers/gpu/drm/drm_vblank_work.c
> +++ b/drivers/gpu/drm/drm_vblank_work.c
> @@ -277,7 +277,7 @@ int drm_vblank_worker_init(struct drm_vblank_crtc *vblank)
>  
>  	INIT_LIST_HEAD(&vblank->pending_work);
>  	init_waitqueue_head(&vblank->work_wait_queue);
> -	worker = kthread_create_worker(0, "card%d-crtc%d",
> +	worker = kthread_run_worker(0, "card%d-crtc%d",
>  				       vblank->dev->primary->index,
>  				       vblank->pipe);
>  	if (IS_ERR(worker))
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index 89d4dc8b60c6..eb0158e43417 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -369,7 +369,7 @@ static int live_parallel_switch(void *arg)
>  		if (!data[n].ce[0])
>  			continue;
>  
> -		worker = kthread_create_worker(0, "igt/parallel:%s",
> +		worker = kthread_run_worker(0, "igt/parallel:%s",
>  					       data[n].ce[0]->engine->name);
>  		if (IS_ERR(worker)) {
>  			err = PTR_ERR(worker);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 222ca7c44951..81c31396eceb 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -3574,7 +3574,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
>  			arg[id].batch = NULL;
>  		arg[id].count = 0;
>  
> -		worker[id] = kthread_create_worker(0, "igt/smoke:%d", id);
> +		worker[id] = kthread_run_worker(0, "igt/smoke:%d", id);
>  		if (IS_ERR(worker[id])) {
>  			err = PTR_ERR(worker[id]);
>  			break;
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 9ce8ff1c04fe..9d3aeb237295 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1025,7 +1025,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			threads[tmp].engine = other;
>  			threads[tmp].flags = flags;
>  
> -			worker = kthread_create_worker(0, "igt/%s",
> +			worker = kthread_run_worker(0, "igt/%s",
>  						       other->name);
>  			if (IS_ERR(worker)) {
>  				err = PTR_ERR(worker);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_slpc.c b/drivers/gpu/drm/i915/gt/selftest_slpc.c
> index 4ecc4ae74a54..e218b229681f 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_slpc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_slpc.c
> @@ -489,7 +489,7 @@ static int live_slpc_tile_interaction(void *arg)
>  		return -ENOMEM;
>  
>  	for_each_gt(gt, i915, i) {
> -		threads[i].worker = kthread_create_worker(0, "igt/slpc_parallel:%d", gt->info.id);
> +		threads[i].worker = kthread_run_worker(0, "igt/slpc_parallel:%d", gt->info.id);
>  
>  		if (IS_ERR(threads[i].worker)) {
>  			ret = PTR_ERR(threads[i].worker);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index acae30a04a94..88870844b5bd 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -492,7 +492,7 @@ static int mock_breadcrumbs_smoketest(void *arg)
>  	for (n = 0; n < ncpus; n++) {
>  		struct kthread_worker *worker;
>  
> -		worker = kthread_create_worker(0, "igt/%d", n);
> +		worker = kthread_run_worker(0, "igt/%d", n);
>  		if (IS_ERR(worker)) {
>  			ret = PTR_ERR(worker);
>  			ncpus = n;
> @@ -1645,7 +1645,7 @@ static int live_parallel_engines(void *arg)
>  		for_each_uabi_engine(engine, i915) {
>  			struct kthread_worker *worker;
>  
> -			worker = kthread_create_worker(0, "igt/parallel:%s",
> +			worker = kthread_run_worker(0, "igt/parallel:%s",
>  						       engine->name);
>  			if (IS_ERR(worker)) {
>  				err = PTR_ERR(worker);
> @@ -1806,7 +1806,7 @@ static int live_breadcrumbs_smoketest(void *arg)
>  			unsigned int i = idx * ncpus + n;
>  			struct kthread_worker *worker;
>  
> -			worker = kthread_create_worker(0, "igt/%d.%d", idx, n);
> +			worker = kthread_run_worker(0, "igt/%d.%d", idx, n);
>  			if (IS_ERR(worker)) {
>  				ret = PTR_ERR(worker);
>  				goto out_flush;
> @@ -3219,7 +3219,7 @@ static int perf_parallel_engines(void *arg)
>  
>  			memset(&engines[idx].p, 0, sizeof(engines[idx].p));
>  
> -			worker = kthread_create_worker(0, "igt:%s",
> +			worker = kthread_run_worker(0, "igt:%s",
>  						       engine->name);
>  			if (IS_ERR(worker)) {
>  				err = PTR_ERR(worker);
> diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
> index e75b97127c0d..2be00b11e557 100644
> --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
> +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
> @@ -109,7 +109,7 @@ int msm_disp_snapshot_init(struct drm_device *drm_dev)
>  
>  	mutex_init(&kms->dump_mutex);
>  
> -	kms->dump_worker = kthread_create_worker(0, "%s", "disp_snapshot");
> +	kms->dump_worker = kthread_run_worker(0, "%s", "disp_snapshot");
>  	if (IS_ERR(kms->dump_worker))
>  		DRM_ERROR("failed to create disp state task\n");
>  
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index 9c45d641b521..a7a2384044ff 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -115,7 +115,7 @@ int msm_atomic_init_pending_timer(struct msm_pending_timer *timer,
>  	timer->kms = kms;
>  	timer->crtc_idx = crtc_idx;
>  
> -	timer->worker = kthread_create_worker(0, "atomic-worker-%d", crtc_idx);
> +	timer->worker = kthread_run_worker(0, "atomic-worker-%d", crtc_idx);
>  	if (IS_ERR(timer->worker)) {
>  		int ret = PTR_ERR(timer->worker);
>  		timer->worker = NULL;
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index a274b8466423..15f74e9dfc9e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -859,7 +859,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  	gpu->funcs = funcs;
>  	gpu->name = name;
>  
> -	gpu->worker = kthread_create_worker(0, "gpu-worker");
> +	gpu->worker = kthread_run_worker(0, "gpu-worker");
>  	if (IS_ERR(gpu->worker)) {
>  		ret = PTR_ERR(gpu->worker);
>  		gpu->worker = NULL;
> diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
> index af6a6fcb1173..8db9f3afb8ac 100644
> --- a/drivers/gpu/drm/msm/msm_kms.c
> +++ b/drivers/gpu/drm/msm/msm_kms.c
> @@ -269,7 +269,7 @@ int msm_drm_kms_init(struct device *dev, const struct drm_driver *drv)
>  		/* initialize event thread */
>  		ev_thread = &priv->event_thread[drm_crtc_index(crtc)];
>  		ev_thread->dev = ddev;
> -		ev_thread->worker = kthread_create_worker(0, "crtc_event:%d", crtc->base.id);
> +		ev_thread->worker = kthread_run_worker(0, "crtc_event:%d", crtc->base.id);
>  		if (IS_ERR(ev_thread->worker)) {
>  			ret = PTR_ERR(ev_thread->worker);
>  			DRM_DEV_ERROR(dev, "failed to create crtc_event kthread\n");
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> index 7273254ecb03..c49f5ed461cf 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> @@ -231,7 +231,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
>  		dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
>  		hrtimer_init(&dev->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
>  		dev->hrtimer.function = &wave5_vpu_timer_callback;
> -		dev->worker = kthread_create_worker(0, "vpu_irq_thread");
> +		dev->worker = kthread_run_worker(0, "vpu_irq_thread");
>  		if (IS_ERR(dev->worker)) {
>  			dev_err(&pdev->dev, "failed to create vpu irq worker\n");
>  			ret = PTR_ERR(dev->worker);
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 5b4e2ce5470d..a5908e2ff2cf 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -393,7 +393,7 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
>  	kthread_init_delayed_work(&chip->irq_poll_work,
>  				  mv88e6xxx_irq_poll);
>  
> -	chip->kworker = kthread_create_worker(0, "%s", dev_name(chip->dev));
> +	chip->kworker = kthread_run_worker(0, "%s", dev_name(chip->dev));
>  	if (IS_ERR(chip->kworker))
>  		return PTR_ERR(chip->kworker);
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
> index cd95705d1e7f..1f11a24387f3 100644
> --- a/drivers/net/ethernet/intel/ice/ice_dpll.c
> +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
> @@ -2050,7 +2050,7 @@ static int ice_dpll_init_worker(struct ice_pf *pf)
>  	struct kthread_worker *kworker;
>  
>  	kthread_init_delayed_work(&d->work, ice_dpll_periodic_work);
> -	kworker = kthread_create_worker(0, "ice-dplls-%s",
> +	kworker = kthread_run_worker(0, "ice-dplls-%s",
>  					dev_name(ice_pf_to_dev(pf)));
>  	if (IS_ERR(kworker))
>  		return PTR_ERR(kworker);
> diff --git a/drivers/net/ethernet/intel/ice/ice_gnss.c b/drivers/net/ethernet/intel/ice/ice_gnss.c
> index c8ea1af51ad3..fcd1f808b696 100644
> --- a/drivers/net/ethernet/intel/ice/ice_gnss.c
> +++ b/drivers/net/ethernet/intel/ice/ice_gnss.c
> @@ -182,7 +182,7 @@ static struct gnss_serial *ice_gnss_struct_init(struct ice_pf *pf)
>  	pf->gnss_serial = gnss;
>  
>  	kthread_init_delayed_work(&gnss->read_work, ice_gnss_read);
> -	kworker = kthread_create_worker(0, "ice-gnss-%s", dev_name(dev));
> +	kworker = kthread_run_worker(0, "ice-gnss-%s", dev_name(dev));
>  	if (IS_ERR(kworker)) {
>  		kfree(gnss);
>  		return NULL;
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
> index ef2e858f49bb..cd7da48bdf91 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
> @@ -3185,7 +3185,7 @@ static int ice_ptp_init_work(struct ice_pf *pf, struct ice_ptp *ptp)
>  	/* Allocate a kworker for handling work required for the ports
>  	 * connected to the PTP hardware clock.
>  	 */
> -	kworker = kthread_create_worker(0, "ice-ptp-%s",
> +	kworker = kthread_run_worker(0, "ice-ptp-%s",
>  					dev_name(ice_pf_to_dev(pf)));
>  	if (IS_ERR(kworker))
>  		return PTR_ERR(kworker);
> diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
> index 86a3d32a7763..08f566cc1480 100644
> --- a/drivers/platform/chrome/cros_ec_spi.c
> +++ b/drivers/platform/chrome/cros_ec_spi.c
> @@ -715,7 +715,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
>  	int err;
>  
>  	ec_spi->high_pri_worker =
> -		kthread_create_worker(0, "cros_ec_spi_high_pri");
> +		kthread_run_worker(0, "cros_ec_spi_high_pri");
>  
>  	if (IS_ERR(ec_spi->high_pri_worker)) {
>  		err = PTR_ERR(ec_spi->high_pri_worker);
> diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
> index c56cd0f63909..89a4420972e7 100644
> --- a/drivers/ptp/ptp_clock.c
> +++ b/drivers/ptp/ptp_clock.c
> @@ -295,7 +295,7 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
>  
>  	if (ptp->info->do_aux_work) {
>  		kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
> -		ptp->kworker = kthread_create_worker(0, "ptp%d", ptp->index);
> +		ptp->kworker = kthread_run_worker(0, "ptp%d", ptp->index);
>  		if (IS_ERR(ptp->kworker)) {
>  			err = PTR_ERR(ptp->kworker);
>  			pr_err("failed to create ptp aux_worker %d\n", err);
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index c1dad30a4528..f2f4b6ee25d4 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -2053,7 +2053,7 @@ static int spi_init_queue(struct spi_controller *ctlr)
>  	ctlr->busy = false;
>  	ctlr->queue_empty = true;
>  
> -	ctlr->kworker = kthread_create_worker(0, dev_name(&ctlr->dev));
> +	ctlr->kworker = kthread_run_worker(0, dev_name(&ctlr->dev));
>  	if (IS_ERR(ctlr->kworker)) {
>  		dev_err(&ctlr->dev, "failed to create message pump kworker\n");
>  		return PTR_ERR(ctlr->kworker);
> diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
> index fc619478200f..66ae934ad196 100644
> --- a/drivers/usb/typec/tcpm/tcpm.c
> +++ b/drivers/usb/typec/tcpm/tcpm.c
> @@ -7577,7 +7577,7 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
>  	mutex_init(&port->lock);
>  	mutex_init(&port->swap_lock);
>  
> -	port->wq = kthread_create_worker(0, dev_name(dev));
> +	port->wq = kthread_run_worker(0, dev_name(dev));
>  	if (IS_ERR(port->wq))
>  		return ERR_CAST(port->wq);
>  	sched_set_fifo(port->wq->task);
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 8ffea8430f95..c204fc8e471a 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -229,7 +229,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>  	dev = &vdpasim->vdpa.dev;
>  
>  	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
> -	vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
> +	vdpasim->worker = kthread_run_worker(0, "vDPA sim worker: %s",
>  						dev_attr->name);
>  	if (IS_ERR(vdpasim->worker))
>  		goto err_iommu;
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 4190cb800cc4..19698d87dc57 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -1229,7 +1229,7 @@ int __init watchdog_dev_init(void)
>  {
>  	int err;
>  
> -	watchdog_kworker = kthread_create_worker(0, "watchdogd");
> +	watchdog_kworker = kthread_run_worker(0, "watchdogd");
>  	if (IS_ERR(watchdog_kworker)) {
>  		pr_err("Failed to create watchdog kworker\n");
>  		return PTR_ERR(watchdog_kworker);
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 050aaa016ec8..bf6b4d8cb283 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -318,7 +318,7 @@ static void erofs_destroy_percpu_workers(void)
>  static struct kthread_worker *erofs_init_percpu_worker(int cpu)
>  {
>  	struct kthread_worker *worker =
> -		kthread_create_worker_on_cpu(cpu, 0, "erofs_worker/%u");
> +		kthread_run_worker_on_cpu(cpu, 0, "erofs_worker/%u");
>  
>  	if (IS_ERR(worker))
>  		return worker;
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 0c66e7c1092a..8d27403888ce 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -193,19 +193,53 @@ struct kthread_worker *kthread_create_worker_on_node(unsigned int flags,
>  						     const char namefmt[], ...);
>  
>  #define kthread_create_worker(flags, namefmt, ...) \
> -({									   \
> -	struct kthread_worker *__kw					   \
> -		= kthread_create_worker_on_node(flags, NUMA_NO_NODE,	   \
> -						namefmt, ## __VA_ARGS__);  \
> -	if (!IS_ERR(__kw))						   \
> -		wake_up_process(__kw->task);				   \
> -	__kw;								   \
> +	kthread_create_worker_on_node(flags, NUMA_NO_NODE, namefmt, ## __VA_ARGS__);
> +
> +/**
> + * kthread_run_worker - create and wake a kthread worker.
> + * @flags: flags modifying the default behavior of the worker
> + * @namefmt: printf-style name for the thread.
> + *
> + * Description: Convenient wrapper for kthread_create_worker() followed by
> + * wake_up_process().  Returns the kthread_worker or ERR_PTR(-ENOMEM).
> + */
> +#define kthread_run_worker(flags, namefmt, ...)					\
> +({										\
> +	struct kthread_worker *__kw						\
> +		= kthread_create_worker(flags, namefmt, ## __VA_ARGS__);	\
> +	if (!IS_ERR(__kw))							\
> +		wake_up_process(__kw->task);					\
> +	__kw;									\
>  })
>  
>  struct kthread_worker *
>  kthread_create_worker_on_cpu(int cpu, unsigned int flags,
>  			     const char namefmt[]);
>  
> +/**
> + * kthread_run_worker_on_cpu - create and wake a cpu bound kthread worker.
> + * @cpu: CPU number
> + * @flags: flags modifying the default behavior of the worker
> + * @namefmt: printf-style name for the thread. Format is restricted
> + *	     to "name.*%u". Code fills in cpu number.
> + *
> + * Description: Convenient wrapper for kthread_create_worker_on_cpu()
> + * followed by wake_up_process().  Returns the kthread_worker or
> + * ERR_PTR(-ENOMEM).
> + */
> +static inline struct kthread_worker *
> +kthread_run_worker_on_cpu(int cpu, unsigned int flags,
> +			  const char namefmt[])
> +{
> +	struct kthread_worker *kw;
> +
> +	kw = kthread_create_worker_on_cpu(cpu, flags, namefmt);
> +	if (!IS_ERR(kw))
> +		wake_up_process(kw->task);
> +
> +	return kw;
> +}
> +
>  bool kthread_queue_work(struct kthread_worker *worker,
>  			struct kthread_work *work);
>  
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 7eb93c248c59..d9fee08e9a66 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1075,33 +1075,10 @@ kthread_create_worker_on_node(unsigned int flags, int node, const char namefmt[]
>  	worker = __kthread_create_worker_on_node(flags, node, namefmt, args);
>  	va_end(args);
>  
> -	if (worker)
> -		wake_up_process(worker->task);
> -
>  	return worker;
>  }
>  EXPORT_SYMBOL(kthread_create_worker_on_node);
>  
> -static __printf(3, 4) struct kthread_worker *
> -__kthread_create_worker_on_cpu(int cpu, unsigned int flags,
> -			       const char namefmt[], ...)
> -{
> -	struct kthread_worker *worker;
> -	va_list args;
> -
> -	va_start(args, namefmt);
> -	worker = __kthread_create_worker_on_node(flags, cpu_to_node(cpu),
> -						 namefmt, args);
> -	va_end(args);
> -
> -	if (worker) {
> -		kthread_bind(worker->task, cpu);
> -		wake_up_process(worker->task);
> -	}
> -
> -	return worker;
> -}
> -
>  /**
>   * kthread_create_worker_on_cpu - create a kthread worker and bind it
>   *	to a given CPU and the associated NUMA node.
> @@ -1142,7 +1119,13 @@ struct kthread_worker *
>  kthread_create_worker_on_cpu(int cpu, unsigned int flags,
>  			     const char namefmt[])
>  {
> -	return __kthread_create_worker_on_cpu(cpu, flags, namefmt, cpu);
> +	struct kthread_worker *worker;
> +
> +	worker = kthread_create_worker_on_node(flags, cpu_to_node(cpu), namefmt, cpu);
> +	if (worker)
> +		kthread_bind(worker->task, cpu);
> +
> +	return worker;
>  }
>  EXPORT_SYMBOL(kthread_create_worker_on_cpu);
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c1e9f0818d51..a44228b0949a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4902,7 +4902,7 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp)
>  	if (rnp->exp_kworker)
>  		return;
>  
> -	kworker = kthread_create_worker(0, name, rnp_index);
> +	kworker = kthread_run_worker(0, name, rnp_index);
>  	if (IS_ERR_OR_NULL(kworker)) {
>  		pr_err("Failed to create par gp kworker on %d/%d\n",
>  		       rnp->grplo, rnp->grphi);
> @@ -4929,7 +4929,7 @@ static void __init rcu_start_exp_gp_kworker(void)
>  	const char *name = "rcu_exp_gp_kthread_worker";
>  	struct sched_param param = { .sched_priority = kthread_prio };
>  
> -	rcu_exp_gp_kworker = kthread_create_worker(0, name);
> +	rcu_exp_gp_kworker = kthread_run_worker(0, name);
>  	if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
>  		pr_err("Failed to create %s!\n", name);
>  		rcu_exp_gp_kworker = NULL;
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index c09e3dc38c34..4835fa4d9326 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4885,7 +4885,7 @@ static struct kthread_worker *scx_create_rt_helper(const char *name)
>  {
>  	struct kthread_worker *helper;
>  
> -	helper = kthread_create_worker(0, name);
> +	helper = kthread_run_worker(0, name);
>  	if (helper)
>  		sched_set_fifo(helper->task);
>  	return helper;
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9949ffad8df0..f5c7447ae1de 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7814,7 +7814,7 @@ static void __init wq_cpu_intensive_thresh_init(void)
>  	unsigned long thresh;
>  	unsigned long bogo;
>  
> -	pwq_release_worker = kthread_create_worker(0, "pool_workqueue_release");
> +	pwq_release_worker = kthread_run_worker(0, "pool_workqueue_release");
>  	BUG_ON(IS_ERR(pwq_release_worker));
>  
>  	/* if the user set it to a specific value, keep it */
> diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
> index 281bbac5539d..c33d4bf17929 100644
> --- a/net/dsa/tag_ksz.c
> +++ b/net/dsa/tag_ksz.c
> @@ -66,7 +66,7 @@ static int ksz_connect(struct dsa_switch *ds)
>  	if (!priv)
>  		return -ENOMEM;
>  
> -	xmit_worker = kthread_create_worker(0, "dsa%d:%d_xmit",
> +	xmit_worker = kthread_run_worker(0, "dsa%d:%d_xmit",
>  					    ds->dst->index, ds->index);
>  	if (IS_ERR(xmit_worker)) {
>  		ret = PTR_ERR(xmit_worker);
> diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
> index 8e8b1bef6af6..6ce0bc166792 100644
> --- a/net/dsa/tag_ocelot_8021q.c
> +++ b/net/dsa/tag_ocelot_8021q.c
> @@ -110,7 +110,7 @@ static int ocelot_connect(struct dsa_switch *ds)
>  	if (!priv)
>  		return -ENOMEM;
>  
> -	priv->xmit_worker = kthread_create_worker(0, "felix_xmit");
> +	priv->xmit_worker = kthread_run_worker(0, "felix_xmit");
>  	if (IS_ERR(priv->xmit_worker)) {
>  		err = PTR_ERR(priv->xmit_worker);
>  		kfree(priv);
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 3e902af7eea6..02adec693811 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -707,7 +707,7 @@ static int sja1105_connect(struct dsa_switch *ds)
>  
>  	spin_lock_init(&priv->meta_lock);
>  
> -	xmit_worker = kthread_create_worker(0, "dsa%d:%d_xmit",
> +	xmit_worker = kthread_run_worker(0, "dsa%d:%d_xmit",
>  					    ds->dst->index, ds->index);
>  	if (IS_ERR(xmit_worker)) {
>  		err = PTR_ERR(xmit_worker);
> -- 
> 2.46.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-09-26 22:48 ` [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
@ 2024-09-27  7:26   ` Michal Hocko
  2024-10-08 10:54   ` Will Deacon
  1 sibling, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2024-09-27  7:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Will Deacon, Peter Zijlstra, Vincent Guittot,
	Thomas Gleixner, Vlastimil Babka, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang,
	Uladzislau Rezki, rcu

On Fri 27-09-24 00:48:59, Frederic Weisbecker wrote:
> When a kthread or any other task has an affinity mask that is fully
> offline or unallowed, the scheduler reaffines the task to all possible
> CPUs as a last resort.
> 
> This default decision doesn't mix well with nohz_full CPUs that
> are part of the possible cpumask but don't want to be disturbed by
> unbound kthreads or even detached pinned user tasks.
> 
> Make the fallback affinity setting aware of nohz_full. This applies to
> all architectures supporting nohz_full except arm32. However that
> architecture, which overrides the task possible mask, is unlikely to
> integrate new development.
> 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Thanks, this makes sense to me. Up to the scheduler maintainers whether this
makes sense in general though.

Thanks for looking into this Frederic!

> ---
>  kernel/sched/core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 43e453ab7e20..d4b759c1cbf1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3421,6 +3421,21 @@ void kick_process(struct task_struct *p)
>  }
>  EXPORT_SYMBOL_GPL(kick_process);
>  
> +static const struct cpumask *task_cpu_fallback_mask(struct task_struct *p)
> +{
> +	const struct cpumask *mask;
> +
> +	mask = task_cpu_possible_mask(p);
> +	/*
> +	 * Architectures that override the task possible mask
> +	 * must handle CPU isolation.
> +	 */
> +	if (mask != cpu_possible_mask)
> +		return mask;
> +	else
> +		return housekeeping_cpumask(HK_TYPE_TICK);
> +}
> +
>  /*
>   * ->cpus_ptr is protected by both rq->lock and p->pi_lock
>   *
> @@ -3489,7 +3504,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>  			 *
>  			 * More yuck to audit.
>  			 */
> -			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
> +			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
>  			state = fail;
>  			break;
>  		case fail:
> -- 
> 2.46.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/20] net: pktgen: Use kthread_create_on_node()
  2024-09-26 22:48 ` [PATCH 10/20] net: pktgen: Use kthread_create_on_node() Frederic Weisbecker
@ 2024-09-27  7:58   ` Eric Dumazet
  2024-09-30 17:19   ` Vishal Chourasia
  1 sibling, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2024-09-27  7:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner

On Fri, Sep 27, 2024 at 12:49 AM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> Use the proper API instead of open coding it.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/20] net: pktgen: Use kthread_create_on_node()
  2024-09-26 22:48 ` [PATCH 10/20] net: pktgen: Use kthread_create_on_node() Frederic Weisbecker
  2024-09-27  7:58   ` Eric Dumazet
@ 2024-09-30 17:19   ` Vishal Chourasia
  2024-10-24 14:29     ` Frederic Weisbecker
  1 sibling, 1 reply; 34+ messages in thread
From: Vishal Chourasia @ 2024-09-30 17:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner

On Fri, Sep 27, 2024 at 12:48:58AM +0200, Frederic Weisbecker wrote:
> Use the proper API instead of open coding it.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  net/core/pktgen.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 34f68ef74b8f..7fcb4fc7a5d6 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -3883,17 +3883,14 @@ static int __net_init pktgen_create_thread(int cpu, struct pktgen_net *pn)
>  	list_add_tail(&t->th_list, &pn->pktgen_threads);
>  	init_completion(&t->start_done);
>  
> -	p = kthread_create_on_node(pktgen_thread_worker,
> -				   t,
> -				   cpu_to_node(cpu),
> -				   "kpktgend_%d", cpu);
> +	p = kthread_create_on_cpu(pktgen_thread_worker, t, cpu, "kpktgend_%d");
Hi Frederic, 

The Subject line says "Use kthread_create_on_node()" while
kthread_create_on_cpu is used in the diff.


>  	if (IS_ERR(p)) {
>  		pr_err("kthread_create_on_node() failed for cpu %d\n", t->cpu);
>  		list_del(&t->th_list);
>  		kfree(t);
>  		return PTR_ERR(p);
>  	}
> -	kthread_bind(p, cpu);
> +
>  	t->tsk = p;
>  
>  	pe = proc_create_data(t->tsk->comm, 0600, pn->proc_dir,
> -- 
> 2.46.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-09-26 22:48 ` [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
  2024-09-27  7:26   ` Michal Hocko
@ 2024-10-08 10:54   ` Will Deacon
  2024-10-08 12:27     ` Frederic Weisbecker
  2024-10-15 13:48     ` Frederic Weisbecker
  1 sibling, 2 replies; 34+ messages in thread
From: Will Deacon @ 2024-10-08 10:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> When a kthread or any other task has an affinity mask that is fully
> offline or unallowed, the scheduler reaffines the task to all possible
> CPUs as a last resort.
> 
> This default decision doesn't mix up very well with nohz_full CPUs that
> are part of the possible cpumask but don't want to be disturbed by
> unbound kthreads or even detached pinned user tasks.
> 
> Make the fallback affinity setting aware of nohz_full. This applies to
> all architectures supporting nohz_full except arm32. However this
> architecture that overrides the task possible mask is unlikely to be
> willing to integrate new development.

I'm not sure I understand this last sentence. The possible mask is
overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
to integrate what?

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-10-08 10:54   ` Will Deacon
@ 2024-10-08 12:27     ` Frederic Weisbecker
  2024-10-15 13:48     ` Frederic Weisbecker
  1 sibling, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-10-08 12:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

Le Tue, Oct 08, 2024 at 11:54:35AM +0100, Will Deacon a écrit :
> On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> > When a kthread or any other task has an affinity mask that is fully
> > offline or unallowed, the scheduler reaffines the task to all possible
> > CPUs as a last resort.
> > 
> > This default decision doesn't mix up very well with nohz_full CPUs that
> > are part of the possible cpumask but don't want to be disturbed by
> > unbound kthreads or even detached pinned user tasks.
> > 
> > Make the fallback affinity setting aware of nohz_full. This applies to
> > all architectures supporting nohz_full except arm32. However this
> > architecture that overrides the task possible mask is unlikely to be
> > willing to integrate new development.
> 
> I'm not sure I understand this last sentence. The possible mask is
> overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
> featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
> to integrate what?

Right, I've been lazy on that, assuming that nohz_full is a niche, and
nohz_full on arm 32 bits tasks must be even more of a niche. But I can make
it a macro just like task_cpu_possible_mask() so that architectures
can override it?

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-10-08 10:54   ` Will Deacon
  2024-10-08 12:27     ` Frederic Weisbecker
@ 2024-10-15 13:48     ` Frederic Weisbecker
  2024-10-28 16:25       ` Will Deacon
  1 sibling, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2024-10-15 13:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

Le Tue, Oct 08, 2024 at 11:54:35AM +0100, Will Deacon a écrit :
> On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> > When a kthread or any other task has an affinity mask that is fully
> > offline or unallowed, the scheduler reaffines the task to all possible
> > CPUs as a last resort.
> > 
> > This default decision doesn't mix up very well with nohz_full CPUs that
> > are part of the possible cpumask but don't want to be disturbed by
> > unbound kthreads or even detached pinned user tasks.
> > 
> > Make the fallback affinity setting aware of nohz_full. This applies to
> > all architectures supporting nohz_full except arm32. However this
> > architecture that overrides the task possible mask is unlikely to be
> > willing to integrate new development.
> 
> I'm not sure I understand this last sentence. The possible mask is
> overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
> featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
> to integrate what?
> 
> Will

Will, how does the (untested) following look? The rationale is that
we must deal with the fact that a CPU supporting 32-bits el0 may appear at
any time and those may not intersect housekeeping CPUs (housekeeping CPUs
are CPUs that are not part of nohz_full=. If nohz_full= isn't used then
it's cpu_possible_mask). If there is a housekeeping CPU supporting el0 32bits
then it will be disallowed to be ever offlined. But if the first mismatching
CPU supporting el0 that pops up is not housekeeping then we may end up
with that CPU disallowed to be offlined + later if a housekeeping CPU appears
that also supports 32bits el0 will also be disallowed to be offlined. Ideally
it should turn back the previous CPU to be offlinable but there may be
other things that have forbidden that CPU to be offline so...

Oh well I made a mess.

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 3d261cc123c1..992d782f2899 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -663,6 +663,7 @@ static inline bool supports_clearbhb(int scope)
 }
 
 const struct cpumask *system_32bit_el0_cpumask(void);
+const struct cpumask *fallback_32bit_el0_cpumask(void);
 DECLARE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
 
 static inline bool system_supports_32bit_el0(void)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 7c09d47e09cb..30cb30438fec 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -282,6 +282,19 @@ task_cpu_possible_mask(struct task_struct *p)
 }
 #define task_cpu_possible_mask	task_cpu_possible_mask
 
+static inline const struct cpumask *
+task_cpu_fallback_mask(struct task_struct *p)
+{
+	if (!static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return housekeeping_cpumask(HK_TYPE_TICK);
+
+	if (!is_compat_thread(task_thread_info(p)))
+		return housekeeping_cpumask(HK_TYPE_TICK);
+
+	return fallback_32bit_el0_cpumask();
+}
+#define task_cpu_fallback_mask	task_cpu_fallback_mask
+
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 718728a85430..3e4400df588f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -133,6 +133,7 @@ DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
  * Only valid if arm64_mismatched_32bit_el0 is enabled.
  */
 static cpumask_var_t cpu_32bit_el0_mask __cpumask_var_read_mostly;
+static cpumask_var_t fallback_32bit_el0_mask __cpumask_var_read_mostly;
 
 void dump_cpu_features(void)
 {
@@ -1618,6 +1619,21 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+const struct cpumask *fallback_32bit_el0_cpumask(void)
+{
+	if (!system_supports_32bit_el0())
+		return cpu_none_mask;
+
+	if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+		if (!cpumask_empty(fallback_32bit_el0_mask))
+			return fallback_32bit_el0_mask;
+		else
+			return cpu_32bit_el0_mask;
+	}
+
+	return housekeeping_cpumask(HK_TYPE_TICK);
+}
+
 static int __init parse_32bit_el0_param(char *str)
 {
 	allow_mismatched_32bit_el0 = true;
@@ -3598,20 +3614,30 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 	 * be offlined by userspace. -1 indicates we haven't yet onlined
 	 * a 32-bit-capable CPU.
 	 */
-	static int lucky_winner = -1;
+	static int unofflinable = nr_cpu_ids;
+	static int unofflinable_hk = nr_cpu_ids;
 
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
 	if (cpu_32bit) {
 		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
+		if (housekeeping_cpu(cpu, HK_TYPE_TICK))
+			cpumask_set_cpu(cpu, fallback_32bit_el0_mask);
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
 	}
 
-	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+	if (unofflinable < nr_cpu_ids) {
+		if (unofflinable_hk >= nr_cpu_ids && cpu_32bit && housekeeping_cpu(cpu, HK_TYPE_TICK)) {
+			unofflinable_hk = cpu;
+			get_cpu_device(unofflinable_hk)->offline_disabled = true;
+			pr_info("Asymmetric 32-bit EL0 support detected on housekeeping CPU %u; "
+				"CPU hot-unplug disabled on CPU %u\n", cpu, cpu);
+		}
 		return 0;
+	}
 
-	if (lucky_winner >= 0)
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
 		return 0;
 
 	/*
@@ -3619,9 +3645,13 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
 	 * every CPU in the system for a 32-bit task.
 	 */
-	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
-							 cpu_active_mask);
-	get_cpu_device(lucky_winner)->offline_disabled = true;
+	unofflinable_hk = cpumask_any_and(fallback_32bit_el0_mask, cpu_active_mask);
+	if (unofflinable_hk < nr_cpu_ids)
+		unofflinable = unofflinable_hk;
+	else
+		unofflinable = cpumask_any_and(cpu_32bit_el0_mask, cpu_active_mask);
+
+	get_cpu_device(unofflinable)->offline_disabled = true;
 	setup_elf_hwcaps(compat_elf_hwcaps);
 	elf_hwcap_fixup();
 	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
@@ -3637,6 +3667,9 @@ static int __init init_32bit_el0_mask(void)
 	if (!zalloc_cpumask_var(&cpu_32bit_el0_mask, GFP_KERNEL))
 		return -ENOMEM;
 
+	if (!zalloc_cpumask_var(&fallback_32bit_el0_mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				 "arm64/mismatched_32bit_el0:online",
 				 enable_mismatched_32bit_el0, NULL);
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index bbaec80c78c5..5b8d017a17f9 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -28,6 +28,10 @@ static inline void leave_mm(void) { }
 # define task_cpu_possible(cpu, p)	cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
 #endif
 
+#ifndef task_cpu_fallback_mask
+# define task_cpu_fallback_mask(p)	housekeeping_cpumask(HK_TYPE_TICK)
+#endif
+
 #ifndef mm_untag_mask
 static inline unsigned long mm_untag_mask(struct mm_struct *mm)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aeb595514461..1edce360f1a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3489,7 +3489,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
+			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
 			state = fail;
 			break;
 		case fail:
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/20] net: pktgen: Use kthread_create_on_node()
  2024-09-30 17:19   ` Vishal Chourasia
@ 2024-10-24 14:29     ` Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-10-24 14:29 UTC (permalink / raw)
  To: Vishal Chourasia
  Cc: LKML, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner

Le Mon, Sep 30, 2024 at 10:49:39PM +0530, Vishal Chourasia a écrit :
> On Fri, Sep 27, 2024 at 12:48:58AM +0200, Frederic Weisbecker wrote:
> > Use the proper API instead of open coding it.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  net/core/pktgen.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> > index 34f68ef74b8f..7fcb4fc7a5d6 100644
> > --- a/net/core/pktgen.c
> > +++ b/net/core/pktgen.c
> > @@ -3883,17 +3883,14 @@ static int __net_init pktgen_create_thread(int cpu, struct pktgen_net *pn)
> >  	list_add_tail(&t->th_list, &pn->pktgen_threads);
> >  	init_completion(&t->start_done);
> >  
> > -	p = kthread_create_on_node(pktgen_thread_worker,
> > -				   t,
> > -				   cpu_to_node(cpu),
> > -				   "kpktgend_%d", cpu);
> > +	p = kthread_create_on_cpu(pktgen_thread_worker, t, cpu, "kpktgend_%d");
> Hi Frederic, 
> 
> The Subject line says "Use kthread_create_on_node()" while
> kthread_create_on_cpu is used in the diff.

Thanks!

Fixing this.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-10-15 13:48     ` Frederic Weisbecker
@ 2024-10-28 16:25       ` Will Deacon
  2024-10-28 16:51         ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Will Deacon @ 2024-10-28 16:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

Hi Frederic,

Thanks for having a crack at this, but I'm pretty confused now so please
prepare for a bunch of silly questions!

On Tue, Oct 15, 2024 at 03:48:55PM +0200, Frederic Weisbecker wrote:
> Le Tue, Oct 08, 2024 at 11:54:35AM +0100, Will Deacon a écrit :
> > On Fri, Sep 27, 2024 at 12:48:59AM +0200, Frederic Weisbecker wrote:
> > > When a kthread or any other task has an affinity mask that is fully
> > > offline or unallowed, the scheduler reaffines the task to all possible
> > > CPUs as a last resort.
> > > 
> > > This default decision doesn't mix up very well with nohz_full CPUs that
> > > are part of the possible cpumask but don't want to be disturbed by
> > > unbound kthreads or even detached pinned user tasks.
> > > 
> > > Make the fallback affinity setting aware of nohz_full. This applies to
> > > all architectures supporting nohz_full except arm32. However this
> > > architecture that overrides the task possible mask is unlikely to be
> > > willing to integrate new development.
> > 
> > I'm not sure I understand this last sentence. The possible mask is
> > overridden for 32-bit tasks on an *arm64* kernel when running on an SoC
> > featuring some CPUs that can execute only 64-bit tasks. Who is unwilling
> > to integrate what?

I should've been clearer in my reply, but I think the most important thing
here for the arm64 heterogeneous SoCs is that we document whatever the
behaviour is in Documentation/arch/arm64/asymmetric-32bit.rst. There are
a few other kernel features that don't play well (e.g. SCHED_DEADLINE),
so it might be sufficient just to call out the limitations relating to
CPU isolation there.

However:

> Will, how does the (untested) following look? The rationale is that
> we must deal with the fact that a CPU supporting 32-bits el0 may appear at
> any time and those may not intersect housekeeping CPUs (housekeeping CPUs
> are CPUs that are not part of nohz_full=.

In the funky SoCs, all CPUs support 64-bit and we have a 64-bit kernel.
Some CPUs additionally support 32-bit but that should only be a concern
for the scheduling of user tasks.

> If nohz_full= isn't used then
> it's cpu_possible_mask). If there is a housekeeping CPU supporting el0 32bits
> then it will be disallowed to be ever offlined. But if the first mismatching
> CPU supporting el0 that pops up is not housekeeping then we may end up
> with that CPU disallowed to be offlined + later, if a housekeeping CPU appears
> that also supports 32bits el0, it will also be disallowed to be offlined. Ideally
> it should turn back the previous CPU to be offlinable but there may be
> other things that have forbidden that CPU to be offline so...

I'd have thought the bigger problem would be if the set of nohz_full=
CPUs was defined as the set of CPUs that support 32-bit. In that case,
executing a 32-bit task will give the scheduler no choice but to run
the task on a !housekeeping core.

So perhaps we could turn this on its head and explicitly mark the first
32-bit capable CPU as a housekeeping core when the mismatched mode is
enabled? We're already preventing CPU hotplug for the thing, so it's
"special" already. If that conflicts with the nohz_full_option, we can
emit a warning message that we're overriding it. I think that's ok, as
the user will have had to specify 'allow_mismatched_32bit_el0' as well.

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection
  2024-10-28 16:25       ` Will Deacon
@ 2024-10-28 16:51         ` Frederic Weisbecker
  2024-10-28 16:54           ` [PATCH 1/2] arm64: Keep first mismatched 32bits el0 capable CPU online through its callbacks Frederic Weisbecker
  2024-10-28 16:56           ` [PATCH 2/2] sched,arm64: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
  0 siblings, 2 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-10-28 16:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

Le Mon, Oct 28, 2024 at 04:25:15PM +0000, Will Deacon a écrit :
> > If nohz_full= isn't used then
> > it's cpu_possible_mask). If there is a housekeeping CPU supporting el0 32bits
> > then it will be disallowed to be ever offlined. But if the first mismatching
> > CPU supporting el0 that pops up is not housekeeping then we may end up
> > with that CPU disallowed to be offlined + later, if a housekeeping CPU appears
> > that also supports 32bits el0, it will also be disallowed to be offlined. Ideally
> > it should turn back the previous CPU to be offlinable but there may be
> > other things that have forbidden that CPU to be offline so...
> 
> I'd have thought the bigger problem would be if the set of nohz_full=
> CPUs was defined as the set of CPUs that support 32-bit. In that case,
> executing a 32-bit task will give the scheduler no choice but to run
> the task on a !housekeeping core.

Right.

> 
> So perhaps we could turn this on its head and explicitly mark the first
> 32-bit capable CPU as a housekeeping core when the mismatched mode is
> enabled? We're already preventing CPU hotplug for the thing, so it's
> "special" already. If that conflicts with the nohz_full_option, we can
> emit a warning message that we're overriding it. I think that's ok, as
> the user will have had to specify 'allow_mismatched_32bit_el0' as well.

It's very complicated to revert a CPU once it is set as nohz_full. But we can
restrain a 32 bits capable nohz_full CPU from offlining until we finally find
a non-nohz_full 32 bits capable CPU. I was about to repost the whole kthread
patchset but lemme post just the specific bits of interest here, it's "just"
two patches.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/2] arm64: Keep first mismatched 32bits el0 capable CPU online through its callbacks
  2024-10-28 16:51         ` Frederic Weisbecker
@ 2024-10-28 16:54           ` Frederic Weisbecker
  2024-10-28 16:56           ` [PATCH 2/2] sched,arm64: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
  1 sibling, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-10-28 16:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

The first mismatched 32bits el0 capable CPU is designated as the last
resort CPU for compat 32 bits tasks. As such this CPU is forbidden to
go offline.

However this restriction is applied to the device object of the CPU,
which is not easy to revert later if needed because other components may
have forbidden the target to be offline and they are not tracked.

But the task cpu possible mask is going to be made aware of housekeeping
CPUs. In that context, a better 32 bits el0 last resort CPU may be found
later on boot. When that happens, the old fallback can be made
offlineable again.

To make this possible and more flexible, drive the offlineable decision
from the cpuhotplug callbacks themselves.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 718728a85430..53ee8ce38d5b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -3591,15 +3591,15 @@ void __init setup_user_features(void)
 	minsigstksz_setup();
 }
 
-static int enable_mismatched_32bit_el0(unsigned int cpu)
-{
-	/*
-	 * The first 32-bit-capable CPU we detected and so can no longer
-	 * be offlined by userspace. -1 indicates we haven't yet onlined
-	 * a 32-bit-capable CPU.
-	 */
-	static int lucky_winner = -1;
+/*
+ * The first 32-bit-capable CPU we detected and so can no longer
+ * be offlined by userspace. -1 indicates we haven't yet onlined
+ * a 32-bit-capable CPU.
+ */
+static int cpu_32bit_unofflineable = -1;
 
+static int mismatched_32bit_el0_online(unsigned int cpu)
+{
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
@@ -3611,7 +3611,7 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
 		return 0;
 
-	if (lucky_winner >= 0)
+	if (cpu_32bit_unofflineable >= 0)
 		return 0;
 
 	/*
@@ -3619,16 +3619,20 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
 	 * every CPU in the system for a 32-bit task.
 	 */
-	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
-							 cpu_active_mask);
-	get_cpu_device(lucky_winner)->offline_disabled = true;
+	cpu_32bit_unofflineable = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
+								    cpu_active_mask);
 	setup_elf_hwcaps(compat_elf_hwcaps);
 	elf_hwcap_fixup();
 	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
-		cpu, lucky_winner);
+		cpu, cpu_32bit_unofflineable);
 	return 0;
 }
 
+static int mismatched_32bit_el0_offline(unsigned int cpu)
+{
+	return cpu == cpu_32bit_unofflineable ? -EBUSY : 0;
+}
+
 static int __init init_32bit_el0_mask(void)
 {
 	if (!allow_mismatched_32bit_el0)
@@ -3639,7 +3643,7 @@ static int __init init_32bit_el0_mask(void)
 
 	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				 "arm64/mismatched_32bit_el0:online",
-				 enable_mismatched_32bit_el0, NULL);
+				 mismatched_32bit_el0_online, mismatched_32bit_el0_offline);
 }
 subsys_initcall_sync(init_32bit_el0_mask);
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/2] sched,arm64: Handle CPU isolation on last resort fallback rq selection
  2024-10-28 16:51         ` Frederic Weisbecker
  2024-10-28 16:54           ` [PATCH 1/2] arm64: Keep first mismatched 32bits el0 capable CPU online through its callbacks Frederic Weisbecker
@ 2024-10-28 16:56           ` Frederic Weisbecker
  1 sibling, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2024-10-28 16:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: LKML, Peter Zijlstra, Vincent Guittot, Thomas Gleixner,
	Michal Hocko, Vlastimil Babka, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Zqiang, Uladzislau Rezki, rcu,
	Michal Hocko

When a kthread or any other task has an affinity mask that is fully
offline or unallowed, the scheduler reaffines the task to all possible
CPUs as a last resort.

This default decision doesn't mix up very well with nohz_full CPUs that
are part of the possible cpumask but don't want to be disturbed by
unbound kthreads or even detached pinned user tasks.

Make the fallback affinity setting aware of nohz_full. ARM64 is a
special case and its last resort EL0 32bits capable CPU can be updated
as housekeeping CPUs appear on boot.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h  |  1 +
 arch/arm64/include/asm/mmu_context.h |  2 ++
 arch/arm64/kernel/cpufeature.c       | 47 +++++++++++++++++++++++-----
 include/linux/mmu_context.h          |  1 +
 kernel/sched/core.c                  |  2 +-
 5 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 3d261cc123c1..992d782f2899 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -663,6 +663,7 @@ static inline bool supports_clearbhb(int scope)
 }
 
 const struct cpumask *system_32bit_el0_cpumask(void);
+const struct cpumask *fallback_32bit_el0_cpumask(void);
 DECLARE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
 
 static inline bool system_supports_32bit_el0(void)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 7c09d47e09cb..8d481e16271b 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -282,6 +282,8 @@ task_cpu_possible_mask(struct task_struct *p)
 }
 #define task_cpu_possible_mask	task_cpu_possible_mask
 
+const struct cpumask *task_cpu_fallback_mask(struct task_struct *p);
+
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 53ee8ce38d5b..4eabe0f02cc8 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -75,6 +75,7 @@
 #include <linux/cpu.h>
 #include <linux/kasan.h>
 #include <linux/percpu.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/cpu.h>
 #include <asm/cpufeature.h>
@@ -133,6 +134,7 @@ DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
  * Only valid if arm64_mismatched_32bit_el0 is enabled.
  */
 static cpumask_var_t cpu_32bit_el0_mask __cpumask_var_read_mostly;
+static cpumask_var_t fallback_32bit_el0_mask __cpumask_var_read_mostly;
 
 void dump_cpu_features(void)
 {
@@ -1618,6 +1620,23 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+const struct cpumask *task_cpu_fallback_mask(struct task_struct *p)
+{
+	if (!static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return housekeeping_cpumask(HK_TYPE_TICK);
+
+	if (!is_compat_thread(task_thread_info(p)))
+		return housekeeping_cpumask(HK_TYPE_TICK);
+
+	if (!system_supports_32bit_el0())
+		return cpu_none_mask;
+
+	if (!cpumask_empty(fallback_32bit_el0_mask))
+		return fallback_32bit_el0_mask;
+	else
+		return cpu_32bit_el0_mask;
+}
+
 static int __init parse_32bit_el0_param(char *str)
 {
 	allow_mismatched_32bit_el0 = true;
@@ -3605,22 +3624,33 @@ static int mismatched_32bit_el0_online(unsigned int cpu)
 
 	if (cpu_32bit) {
 		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
+		if (housekeeping_cpu(cpu, HK_TYPE_TICK))
+			cpumask_set_cpu(cpu, fallback_32bit_el0_mask);
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
 	}
 
-	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+	if (cpu_32bit_unofflineable >= 0) {
+		if (!housekeeping_cpu(cpu_32bit_unofflineable, HK_TYPE_TICK) &&
+		    cpu_32bit && housekeeping_cpu(cpu, HK_TYPE_TICK)) {
+			cpu_32bit_unofflineable = cpu;
+			pr_info("Asymmetric 32-bit EL0 support detected on housekeeping CPU %u; "
+				"CPU hot-unplug disabled on CPU %u\n", cpu, cpu);
+		}
 		return 0;
+	}
 
-	if (cpu_32bit_unofflineable >= 0)
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
 		return 0;
 
 	/*
-	 * We've detected a mismatch. We need to keep one of our CPUs with
-	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
-	 * every CPU in the system for a 32-bit task.
+	 * We've detected a mismatch. We need to keep one of our CPUs, preferably
+	 * housekeeping, with 32-bit EL0 online so that is_cpu_allowed() doesn't end up
+	 * rejecting every CPU in the system for a 32-bit task.
 	 */
-	cpu_32bit_unofflineable = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
-								    cpu_active_mask);
+	cpu_32bit_unofflineable = cpumask_any_and(fallback_32bit_el0_mask, cpu_active_mask);
+	if (cpu_32bit_unofflineable >= nr_cpu_ids)
+		cpu_32bit_unofflineable = cpumask_any_and(cpu_32bit_el0_mask, cpu_active_mask);
+
 	setup_elf_hwcaps(compat_elf_hwcaps);
 	elf_hwcap_fixup();
 	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
@@ -3641,6 +3671,9 @@ static int __init init_32bit_el0_mask(void)
 	if (!zalloc_cpumask_var(&cpu_32bit_el0_mask, GFP_KERNEL))
 		return -ENOMEM;
 
+	if (!zalloc_cpumask_var(&fallback_32bit_el0_mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				 "arm64/mismatched_32bit_el0:online",
 				 mismatched_32bit_el0_online, mismatched_32bit_el0_offline);
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index bbaec80c78c5..ac01dc4eb2ce 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -24,6 +24,7 @@ static inline void leave_mm(void) { }
 #ifndef task_cpu_possible_mask
 # define task_cpu_possible_mask(p)	cpu_possible_mask
 # define task_cpu_possible(cpu, p)	true
+# define task_cpu_fallback_mask(p)	housekeeping_cpumask(HK_TYPE_TICK)
 #else
 # define task_cpu_possible(cpu, p)	cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aeb595514461..1edce360f1a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3489,7 +3489,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
+			do_set_cpus_allowed(p, task_cpu_fallback_mask(p));
 			state = fail;
 			break;
 		case fail:
-- 
2.46.0




Thread overview: 34+ messages
2024-09-26 22:48 [PATCH 00/20] kthread: Introduce preferred affinity v4 Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 01/20] arm/bL_switcher: Use kthread_run_on_cpu() Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 02/20] x86/resctrl: " Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 03/20] firmware: stratix10-svc: " Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 04/20] scsi: bnx2fc: Use kthread_create_on_cpu() Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 05/20] scsi: bnx2i: " Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 06/20] scsi: qedi: " Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 07/20] soc/qman: test: Use kthread_run_on_cpu() Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 08/20] kallsyms: " Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 09/20] lib: test_objpool: " Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 10/20] net: pktgen: Use kthread_create_on_node() Frederic Weisbecker
2024-09-27  7:58   ` Eric Dumazet
2024-09-30 17:19   ` Vishal Chourasia
2024-10-24 14:29     ` Frederic Weisbecker
2024-09-26 22:48 ` [PATCH 11/20] sched: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
2024-09-27  7:26   ` Michal Hocko
2024-10-08 10:54   ` Will Deacon
2024-10-08 12:27     ` Frederic Weisbecker
2024-10-15 13:48     ` Frederic Weisbecker
2024-10-28 16:25       ` Will Deacon
2024-10-28 16:51         ` Frederic Weisbecker
2024-10-28 16:54           ` [PATCH 1/2] arm64: Keep first mismatched 32bits el0 capable CPU online through its callbacks Frederic Weisbecker
2024-10-28 16:56           ` [PATCH 2/2] sched,arm64: Handle CPU isolation on last resort fallback rq selection Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 12/20] kthread: Make sure kthread hasn't started while binding it Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 13/20] kthread: Default affine kthread to its preferred NUMA node Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 14/20] mm: Create/affine kcompactd to its preferred node Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 15/20] mm: Create/affine kswapd " Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 16/20] kthread: Implement preferred affinity Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 17/20] rcu: Use kthread preferred affinity for RCU boost Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 18/20] kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format Frederic Weisbecker
2024-09-26 22:49 ` [PATCH 19/20] treewide: Introduce kthread_run_worker[_on_cpu]() Frederic Weisbecker
2024-09-27  5:39   ` Paul E. McKenney
2024-09-26 22:49 ` [PATCH 20/20] rcu: Use kthread preferred affinity for RCU exp kworkers Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2024-07-26 21:56 [PATCH 00/20] kthread: Introduce preferred affinity Frederic Weisbecker
2024-07-26 21:56 ` [PATCH 05/20] scsi: bnx2i: Use kthread_create_on_cpu() Frederic Weisbecker
