public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <frederic@kernel.org>
To: Costa Shulyupin <costa.shul@redhat.com>
Cc: "Waiman Long" <longman@redhat.com>, "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Dan Carpenter" <dan.carpenter@linaro.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Chen Yu" <yu.c.chen@intel.com>, "Kees Cook" <kees@kernel.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [RFC PATCH v1] Add kthreads_update_affinity()
Date: Wed, 22 Jan 2025 18:11:16 +0100
Message-ID: <Z5EmtNh_lryVj0S3@localhost.localdomain>
In-Reply-To: <20250113190911.800623-1-costa.shul@redhat.com>

Hi Costa,

On Mon, Jan 13, 2025 at 09:08:54PM +0200, Costa Shulyupin wrote:
> Changing and using housekeeping and isolated CPUs requires reboot.
> 
> The goal is to change CPU isolation dynamically without reboot.
> 
> This patch is based on the parent patch
> cgroup/cpuset: Exclude isolated CPUs from housekeeping CPU masks
> https://lore.kernel.org/lkml/20240821142312.236970-3-longman@redhat.com/
> Its purpose is to update isolation cpumasks.
> 
> However, some subsystems may use outdated housekeeping CPU masks. To
> prevent the use of these isolated CPUs, it is essential to explicitly
> propagate updates to the housekeeping masks across all subsystems that
> depend on them.
> 
> This patch is not intended to be merged and disrupt the kernel.
> It is still a proof-of-concept for research purposes.
> 
> The questions are:
> - Is this the right direction, or should I explore an alternative approach?
> - What factors need to be considered?
> - Any suggestions or advice?
> - Have similar attempts been made before?

Since the kthreads preferred affinity patchset just got merged, I don't think
anything needs to be done anymore on the kthread front to toggle nohz_full
through cpusets safely. Let's look at the details below:

> 
> Update the affinity of kthreadd and trigger the recalculation of kthread
> affinities using kthreads_online_cpu().
> 
> The argument passed to kthreads_online_cpu() is irrelevant, as the
> function reassigns affinities of kthreads based on their
> preferred_affinity and the updated housekeeping_cpumask(HK_TYPE_KTHREAD).
> 
> Currently only RCU uses kthread_affine_preferred().
> 
> I dare to try calling kthread_affine_preferred() from kthread_run() to
> set preferred_affinity as cpu_possible_mask for kthreads without a
> specific affinity, enabling their management through
> kthreads_online_cpu().
> 
> Any objections?
> 
> For details about kthreads affinity patterns please see:
> https://lore.kernel.org/lkml/20241211154035.75565-16-frederic@kernel.org/
> 
> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
> ---
>  include/linux/kthread.h | 5 ++++-
>  kernel/cgroup/cpuset.c  | 1 +
>  kernel/kthread.c        | 6 ++++++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 8d27403888ce9..b43c5aeb2cfd7 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -52,8 +52,10 @@ bool kthread_is_per_cpu(struct task_struct *k);
>  ({									   \
>  	struct task_struct *__k						   \
>  		= kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
> -	if (!IS_ERR(__k))						   \
> +	if (!IS_ERR(__k)) {						   \
> +		kthread_affine_preferred(__k, cpu_possible_mask);	   \
>  		wake_up_process(__k);					   \
> +	}								   \
>  	__k;								   \
>  })
>  
> @@ -270,4 +272,5 @@ struct cgroup_subsys_state *kthread_blkcg(void);
>  #else
>  static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { }
>  #endif
> +void kthreads_update_affinity(void);
>  #endif /* _LINUX_KTHREAD_H */
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 65658a5c2ac81..7d71acc7f46b6 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1355,6 +1355,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
>  	trl();
>  	ret = housekeeping_exlude_isolcpus(isolated_cpus, HOUSEKEEPING_FLAGS);
>  	WARN_ON_ONCE((ret < 0) && (ret != -EOPNOTSUPP));
> +	kthreads_update_affinity();

A few things to consider:

1) update_isolation_cpumasks() will be called with cpus_read_lock() held
   (cf: sched_partition_write() and cpuset_write_resmask()), therefore
   kthreads_online_cpu(), which runs under cpus_write_lock(), can't run
   concurrently.

2) The constraint to turn on/off a CPU as nohz_full will be that the
   target CPU is offline.

So when the target CPU later comes back online, kthreads_online_cpu() will
take care of the newly modified housekeeping_cpumask() (which will be visible
because cpus_write_lock() is held) and apply the appropriate effective
affinity accordingly.

The only thing that might need to be done by update_isolation_cpumasks() is
to hold kthreads_hotplug_lock so that a subsequent call to
kthread_affine_preferred() sees the "freshest" update to
housekeeping_cpumask(). But even that may not be mandatory because the
concerned CPUs are offline and kthreads_online_cpu() will fix the race when
they come back online.
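
To illustrate the locking described above, here is a rough, uncompiled
sketch (kthreads_hotplug_lock is the lock name from this discussion; treat
the exact shape as an assumption, not the final implementation):

```c
/*
 * Sketch only: serialize the housekeeping mask update against
 * kthread_affine_preferred() so new kthreads see the fresh mask.
 */
void kthreads_update_affinity(void)
{
	/*
	 * Callers hold cpus_read_lock(), so kthreads_online_cpu()
	 * (a hotplug callback running under cpus_write_lock())
	 * cannot run concurrently.
	 */
	lockdep_assert_cpus_held();

	mutex_lock(&kthreads_hotplug_lock);
	/*
	 * The toggled CPUs are offline at this point; when they come
	 * back online, kthreads_online_cpu() recomputes each kthread's
	 * effective affinity from its preferred_affinity and the
	 * updated housekeeping_cpumask(HK_TYPE_KTHREAD).
	 */
	mutex_unlock(&kthreads_hotplug_lock);
}
```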

Oh, and the special case of kthreadd might need a direct affinity update.
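
For kthreadd, such a direct update could be sketched as follows (uncompiled
illustration; the helper name is hypothetical, while kthreadd_task and
set_cpus_allowed_ptr() are existing kernel symbols):

```c
/*
 * Sketch only: kthreadd has no preferred_affinity and is not
 * re-affined by kthreads_online_cpu(), so push the updated
 * housekeeping mask to it directly.
 */
static void kthreadd_update_affinity(void)
{
	set_cpus_allowed_ptr(kthreadd_task,
			     housekeeping_cpumask(HK_TYPE_KTHREAD));
}
```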

So the good news is that we have a lot of things sorted out to prepare
for that cpuset interface:

_ RCU NOCB is ready
_ kthreads are ready
_ timers are fine since we toggle only offline CPUs and timer_migration.c
  should work without changes.

Still some details need to be taken care of:

* scheduler (see the housekeeping_cpumask() references, especially the ILB
  (idle load balancer), which is my biggest worry; get_nohz_timer_target()
  shouldn't be an issue)

* posix cpu timers (make tick_dep unconditional?)

* kernel/watchdog.c (make proc_watchdog_cpumask() and
  lockup_detector_online_cpu() safe against update_isolation_cpumasks())

* workqueue unbound mask (just apply the new one?)

* some RCU tick_dep to handle (make them unconditional since they
  apply on slow path anyway?)
  
* other things? (grep for tick_nohz_full_cpu(), housekeeping_* and tick_dep_*)

But we are getting closer!

Thanks!

Thread overview: 3+ messages
2025-01-13 19:08 [RFC PATCH v1] Add kthreads_update_affinity() Costa Shulyupin
2025-01-22 17:11 ` Frederic Weisbecker [this message]
2025-02-20 13:57   ` Costa Shulyupin
