From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E82462F5A2E for ; Wed, 19 Nov 2025 16:50:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763571026; cv=none; b=XIp7bcxqCDWinEg+prlUChkxTuurpuE024nf6NzGcZ/QWPGR9JQICKjdHktrHlUaLigB69Acam6M7G1VO7ZVvcF+DO7q9EmhGZcSOGKHmbl2Y2izKz2xtWBYAEZ9om7tM8+ErUWOmW/I4s++qkHAyvtryOxTJe8+UWmT+sz2gV0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763571026; c=relaxed/simple; bh=6be5iNbOIjg30wf2AWrJmk6w/3SY2c5gqQ8+vOmy9lU=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=PXwx5EPKbu39e0t4xnPS7fwiYerzVst5V9w8mSMTbqjQ6+KdKlv+/H4tbHOHJKkW/sotApEPmZ7axuFhk87d9yNMDcJtEaIm8pPSQsOaht/ayhdmjX5bpg/CR5lht3uwwtJhhagzrKPq5UvbcvOn7Wq6GL5IhDSBN9qAy5ZE4u0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=0038JqJ/; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=ynC4hQPc; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="0038JqJ/"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="ynC4hQPc" From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1763571016; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1a/2pCptLQLTUelnwLk9vLjUuNrCtLbjuH6swgavBAs=; b=0038JqJ/4guPSk0bBeG1WASIQaax5IZzcnhylxVlXGpkpFeFCnyDUIRAsBtw/LFK5dwOPg 9L5/YoiMIis9Rbk4U5RTv7aB6kdZxL4H9qY6VKOJRn6MJuFZ2XkBOUqbk/u9xfwfi8UHic xqc6Y7sfFEidlt3uZpRHrDlsWmbOVOeCm4o9iXtdChU9g2j91FKsklpYCF6C+0r4Baj8vN Foas4b+EyJWntI+5iKQtSAHUfxk8AelU62x5PAcI2DLjauG1l3/8DxRa49pc22WvDUHlnt 5N5JoFigo1kqbhSyhgumw4SoKE4lkn7FBkWL8MvrEjvztNVk6QReKFZCrxr2fQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1763571016; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1a/2pCptLQLTUelnwLk9vLjUuNrCtLbjuH6swgavBAs=; b=ynC4hQPcvQaMmC7cEEPdmJPNORH05huQBs2DdVHZ8apOsQbEdu3VNjF40uylavx+8q05tr ISCsvEYUd4IJs9Dg== To: Gabriele Monaco , linux-kernel@vger.kernel.org, Anna-Maria Behnsen , Frederic Weisbecker , Waiman Long Cc: Gabriele Monaco , "John B. Wyatt IV" , "John B. Wyatt IV" Subject: Re: [PATCH v15 7/7] timers: Exclude isolated cpus from timer migration In-Reply-To: <20251113083324.33490-8-gmonaco@redhat.com> References: <20251113083324.33490-1-gmonaco@redhat.com> <20251113083324.33490-8-gmonaco@redhat.com> Date: Wed, 19 Nov 2025 17:50:15 +0100 Message-ID: <87pl9eklvc.ffs@tglx> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain On Thu, Nov 13 2025 at 09:33, Gabriele Monaco wrote: > +/* Enabled during late initcall */ > +static DEFINE_STATIC_KEY_FALSE(tmigr_exclude_isolated); > + > #define TMIGR_NONE 0xFF > #define BIT_CNT 8 > > @@ -438,6 +442,33 @@ static inline bool tmigr_is_not_available(struct tmigr_cpu *tmc) > return !(tmc->tmgroup && tmc->available); > } > > +/* > + * Returns true if @cpu should be excluded from the hierarchy as isolated. > + * Domain isolated CPUs don't participate in timer migration, nohz_full CPUs > + * are still part of the hierarchy but become idle (from a tick and timer > + * migration perspective) when they stop their tick. This lets the timekeeping > + * CPU handle their global timers. Marking also isolated CPUs as idle would be > + * too costly, hence they are completely excluded from the hierarchy. > + * This check is necessary, for instance, to prevent offline isolated CPUs from > + * being incorrectly marked as available once getting back online. > + * > + * This function returns false during early boot and the isolation logic is > + * enabled only after isolated CPUs are marked as unavailable at late boot. > + * The tick CPU can be isolated at boot, however we cannot mark it as > + * unavailable to avoid having no global migrator for the nohz_full CPUs. This > + * should be ensured by the callers of this function: implicitly from hotplug > + * callbacs and explicitly in tmigr_init_isolation and callbacks tmigr_init_isolation() > + * tmigr_isolated_exclude_cpumask. tmigr_isolated_exclude_cpumask() It's documented how functions should be denoted in comments and change logs, no? > + */ > +static inline bool tmigr_is_isolated(int cpu) > +{ > + if (static_branch_unlikely(&tmigr_exclude_isolated)) > + return (!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) || > + cpuset_cpu_is_isolated(cpu)) && > + housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE); Lacks brackets on the if () https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#bracket-rules Also you can make this way more readable by inverting the condition: if (!static_branch_unlikely(&tmigr_exclude_isolated)) return false; return .....; No? > + return false; > +} > + > /* > * Returns true, when @childmask corresponds to the group migrator or when the > * group is not active - so no migrator is set. > @@ -1439,8 +1470,9 @@ static int tmigr_clear_cpu_available(unsigned int cpu) > int migrator; > u64 firstexp; > > - cpumask_clear_cpu(cpu, tmigr_available_cpumask); By removing this the function name does not make any sense any more. Splitting the cpumask_clear_set() out, renaming the function > scoped_guard(raw_spinlock_irq, &tmc->lock) { > + if (!tmc->available) > + return 0; and adding this > tmc->available = false; > WRITE_ONCE(tmc->wakeup, KTIME_MAX); > > @@ -1453,11 +1485,11 @@ static int tmigr_clear_cpu_available(unsigned int cpu) > } > > if (firstexp != KTIME_MAX) { > - migrator = cpumask_any(tmigr_available_cpumask); > + migrator = cpumask_any_but(tmigr_available_cpumask, cpu); and this should be done in a preparatory patch along with a reasonable explanation in the change log. > work_on_cpu(migrator, tmigr_trigger_active, NULL); > } > > - return 0; > + return 1; But thinking more about it. What's the actual point of moving this 'clear' out instead of just moving it further down? It does not matter at all whether the isol/unisol muck clears an already cleared bit or not. But it would keep the function name comprehensible and avoid all this online/offline wrapper nonsense. > } > > static int tmigr_set_cpu_available(unsigned int cpu) > @@ -1468,17 +1500,130 @@ static int tmigr_set_cpu_available(unsigned int cpu) > if (WARN_ON_ONCE(!tmc->tmgroup)) > return -EINVAL; > > - cpumask_set_cpu(cpu, tmigr_available_cpumask); Ditto. > +static int __tmigr_isolated_exclude_cpumask(struct cpumask *exclude_cpumask) > +{ > + struct work_struct __percpu *works __free(free_percpu) = > + alloc_percpu(struct work_struct); > + cpumask_var_t cpumask_unisol __free(free_cpumask_var) = CPUMASK_VAR_NULL; > + cpumask_var_t cpumask_isol __free(free_cpumask_var) = CPUMASK_VAR_NULL; > + int cpu; > + > + if (!alloc_cpumask_var(&cpumask_isol, GFP_KERNEL)) > + return -ENOMEM; > + if (!alloc_cpumask_var(&cpumask_unisol, GFP_KERNEL)) > + return -ENOMEM; > + if (!works) > + return -ENOMEM; Checking the first allocation after trying to allocate other stuff makes a lot of sense - NOT. > + cpumask_andnot(cpumask_unisol, cpu_online_mask, exclude_cpumask); > + cpumask_andnot(cpumask_unisol, cpumask_unisol, tmigr_available_cpumask); > + /* Set up the mask earlier to avoid races with the migrator CPU */ > + cpumask_or(tmigr_available_cpumask, tmigr_available_cpumask, cpumask_unisol); Your new line key is broken. This comment is barely noticeable. What's worse is that it completely fails to explain what the actual race is. > + for_each_cpu(cpu, cpumask_unisol) { > + struct work_struct *work = per_cpu_ptr(works, cpu); > + > + INIT_WORK(work, tmigr_cpu_unisolate); > + schedule_work_on(cpu, work); > + } > + > + cpumask_and(cpumask_isol, exclude_cpumask, tmigr_available_cpumask); > + cpumask_and(cpumask_isol, cpumask_isol, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)); > + /* > + * Handle this here and not in the cpuset code because exclude_cpumask > + * might include also the tick CPU if included in isolcpus. > + */ > + for_each_cpu(cpu, cpumask_isol) { > + if (!tick_nohz_cpu_hotpluggable(cpu)) { > + cpumask_clear_cpu(cpu, cpumask_isol); > + break; > + } > + } > + /* Set up the mask earlier to avoid races with the migrator CPU */ > + cpumask_andnot(tmigr_available_cpumask, tmigr_available_cpumask, cpumask_isol); > + for_each_cpu(cpu, cpumask_isol) { > + struct work_struct *work = per_cpu_ptr(works, cpu); This lacks a comment explaining that cpumask_unisol and _isol are not overlapping. I had to stare at this five times to convince myself that it's correct. > + > + INIT_WORK(work, tmigr_cpu_isolate); > + schedule_work_on(cpu, work); > + } > + > + for_each_cpu_or(cpu, cpumask_isol, cpumask_unisol) > + flush_work(per_cpu_ptr(works, cpu)); > + > return 0; > } > > +/** > + * tmigr_isolated_exclude_cpumask - Exclude given CPUs from hierarchy > + * @exclude_cpumask: the cpumask to be excluded from timer migration hierarchy > + * > + * This function can be called from cpuset code to provide the new set of > + * isolated CPUs that should be excluded from the hierarchy. > + * Online CPUs not present in exclude_cpumask but already excluded are brought > + * back to the hierarchy. > + * Functions to isolate/unisolate need to be called locally and can sleep. > + */ > +int tmigr_isolated_exclude_cpumask(struct cpumask *exclude_cpumask) > +{ > + lockdep_assert_cpus_held(); > + return __tmigr_isolated_exclude_cpumask(exclude_cpumask); This wrapper is required because... > +} > + > +static int __init tmigr_init_isolation(void) > +{ > + cpumask_var_t cpumask __free(free_cpumask_var) = CPUMASK_VAR_NULL; > + > + static_branch_enable(&tmigr_exclude_isolated); > + > + if (!housekeeping_enabled(HK_TYPE_DOMAIN)) > + return 0; > + if (!alloc_cpumask_var(&cpumask, GFP_KERNEL)) > + return -ENOMEM; > + > + cpumask_andnot(cpumask, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); ... it would be too sensible to guard this with: guard(cpus_read_lock)(); for consistency sake _AND_ what's more important to protect it against the RCU torture test code which plays with CPU hotplug starting in device_initcall(), which runs before > +late_initcall(tmigr_init_isolation); Thanks, tglx