Linux Documentation


* Re: [PATCH v2 09/12] cgroup/cpuset: Introduce CPUSet-driven dynamic housekeeping (DHM)
From: Frederic Weisbecker @ 2026-04-15  9:57 UTC
  To: Qiliang Yuan
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen,
	Ingo Molnar, Thomas Gleixner, Tejun Heo, Andrew Morton,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
	Chen Ridong, Michal Koutný, Jonathan Corbet, Shuah Khan,
	Shuah Khan, linux-kernel, rcu, linux-mm, cgroups, linux-doc,
	linux-kselftest
In-Reply-To: <20260413-wujing-dhm-v2-9-06df21caba5d@gmail.com>

On Mon, Apr 13, 2026 at 03:43:15PM +0800, Qiliang Yuan wrote:
> Currently, subsystem housekeeping masks are generally static and can
> only be configured via boot-time parameters (e.g., isolcpus, nohz_full).
> This inflexible approach forces a system reboot whenever an orchestrator
> needs to change workload isolation boundaries.
> 
> This patch introduces CPUSet-driven Dynamic Housekeeping Management (DHM)
> by exposing the `cpuset.housekeeping.cpus` control file on the root cgroup.
> Writing a new cpumask to this file dynamically updates the housekeeping
> masks of all registered subsystems (scheduler, RCU, timers, tick, workqueues,
> and managed IRQs) simultaneously, without restarting the node.

There is already the "isolated" partition type, which handles scheduler, timer
and workqueue isolation. Shouldn't we extend that to dynamically apply nohz_full
instead of adding a new unrelated file?

I don't know which form that should take. Perhaps reuse the "isolated" partition
but add some sort of parameter to define if we want only domain isolation or
also full isolation (that is nohz_full). Waiman should have a better idea for an
interface here.
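
For reference, a minimal userspace sketch of how an existing cpuset cgroup is
switched to the "isolated" partition type today, by writing to its
cpuset.cpus.partition file. The helper names and the cgroup directory are
hypothetical and only serve the illustration:

```c
#include <stdio.h>
#include <string.h>

/* Write a string to a control file; returns 0 on success, -1 on failure. */
int write_control_file(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Turn the cpuset cgroup at cgroup_dir into an isolated partition.
 * cgroup_dir is an assumption, e.g. "/sys/fs/cgroup/rt".
 */
int make_isolated_partition(const char *cgroup_dir)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/cpuset.cpus.partition",
			 cgroup_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_control_file(path, "isolated");
}
```

Calling make_isolated_partition("/sys/fs/cgroup/rt") would need root and a
cgroup that already has its cpuset.cpus configured.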

> 
> At the cpuset and isolation core level, this change implements:
> 1. `housekeeping_update_all_types(const struct cpumask *new_mask)` API inside
>    `isolation.c` to safely allocate, update, and replace all enabled hk_type
>    masks.

HK_TYPE_DOMAIN is handled by "isolated" partitions. What remains to handle
is HK_TYPE_KERNEL_NOISE.

As for managed IRQs, this will require more thinking, but we should include that
in "full isolation" in the future.

> +int housekeeping_update_all_types(const struct cpumask *new_mask)

Please reuse housekeeping_update().

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v2 01/12] sched/isolation: Separate housekeeping types in enum hk_type
From: Frederic Weisbecker @ 2026-04-15 10:02 UTC
  To: Waiman Long
  Cc: Qiliang Yuan, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen,
	Ingo Molnar, Thomas Gleixner, Tejun Heo, Andrew Morton,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, Chen Ridong,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Shuah Khan,
	linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest
In-Reply-To: <fd77bca8-bee8-4997-a11a-932a1693edf7@redhat.com>

On Mon, Apr 13, 2026 at 03:25:46PM -0400, Waiman Long wrote:
> On 4/13/26 3:43 AM, Qiliang Yuan wrote:
> > Most kernel noise types (TICK, TIMER, RCU, etc.) are currently aliased
> > to a single HK_TYPE_KERNEL_NOISE enum value. This prevents fine-grained
> > runtime isolation control as all masks are forced to be identical.
> > 
> > Un-alias service-specific housekeeping types in enum hk_type. This
> > separation provides the necessary granularity for DHM subsystems to
> > subscribe to and maintain independent affinity masks.
> 
> Usually, if we want to run a latency sensitive workload like DPDK, we try to
> minimize all sorts of kernel noises or interference as much as possible. Do
> you have a good use case where it is advantageous to remove some types of
> kernel noises from a given set of CPUs but not the others?

Right, what we want to do here is to remove the aliases (HK_TYPE_TIMER,
HK_TYPE_WQ, ...) and rename them to HK_TYPE_KERNEL_NOISE.
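
To illustrate why the aliases carry no information of their own, here is a
simplified userspace model (not the kernel header) in which every alias has
the same value as HK_TYPE_KERNEL_NOISE and therefore indexes the same mask.
The 64-bit mask array and hk_test_cpu() are assumptions made for the sketch:

```c
#include <stdbool.h>

/* Simplified model of the enum: aliases share one value. */
enum hk_type {
	HK_TYPE_DOMAIN,
	HK_TYPE_MANAGED_IRQ,
	HK_TYPE_KERNEL_NOISE,
	HK_TYPE_MAX,

	/* Aliases: same value, so they select the same mask slot. */
	HK_TYPE_TICK  = HK_TYPE_KERNEL_NOISE,
	HK_TYPE_TIMER = HK_TYPE_KERNEL_NOISE,
	HK_TYPE_RCU   = HK_TYPE_KERNEL_NOISE,
	HK_TYPE_WQ    = HK_TYPE_KERNEL_NOISE,
};

/* One mask per distinct type; bit N set means CPU N does housekeeping. */
unsigned long long hk_masks[HK_TYPE_MAX];

/* Hypothetical lookup helper; cpu must be in the range 0..63. */
bool hk_test_cpu(int cpu, enum hk_type type)
{
	return (hk_masks[type] >> cpu) & 1ULL;
}
```

Since HK_TYPE_TIMER and HK_TYPE_WQ already resolve to the same slot, replacing
them with HK_TYPE_KERNEL_NOISE at the call sites does not change behavior.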

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v12 00/15] arm64/riscv: Add support for crashkernel CMA reservation
From: Jinjie Ruan @ 2026-04-15 10:04 UTC
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, vgoyal, dyoung,
	rdunlap, peterz, pawan.kumar.gupta, feng.tang, dapeng1.mi, kees,
	elver, paulmck, lirongqing, rppt, leitao, ardb, jbohac, cfsworks,
	tangyouling, sourabhjain, ritesh.list, hbathini, eajames, guoren,
	songshuaishuai, kevin.brodsky, vishal.moola, junhui.liu, coxu,
	fuqiang.wang, liaoyuanhong, takahiro.akashi, james.morse,
	lizhengyu3, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
In-Reply-To: <20260402072701.628293-1-ruanjinjie@huawei.com>



On 4/2/2026 3:26 PM, Jinjie Ruan wrote:
> The crash memory allocation and the exclusion of crashk_res, crashk_low_res
> and crashk_cma memory are almost identical across architectures.
> This patch set handles them in the crash core in a generic way, which
> eliminates a lot of duplicated code.
> 
> And add support for crashkernel CMA reservation for arm64 and riscv.
> 
> Rebased on v7.0-rc1.
> 
> Basic second kernel boot tests were performed on QEMU platforms for x86,
> ARM64, and RISC-V architectures with the following parameters:
> 
> 	"cma=256M crashkernel=256M crashkernel=64M,cma"
> 
> Changes in v12:
> - Remove the unused "nr_mem_ranges" for x86.
> - Add "Fix crashk_low_res not exclude bug" test log.
> - Provide a separate patch for each architecture for using
>   crash_prepare_headers(), which will make the review more convenient.
> - Add Reviewed-by and Tested-by.
> - Link to v11: https://lore.kernel.org/all/20260328074013.3589544-1-ruanjinjie@huawei.com/
> 
> Changes in v11:
> - Avoid silently dropping crash memory if the crash kernel is built without
>   CONFIG_CMA.
> - Remove unnecessary "cmem->nr_ranges = 0" for arch_crash_populate_cmem()
>   as we use kvzalloc().
> - Provide a separate patch for each architecture to fix the existing
>   buffer overflow issue.

Are there any further review comments? Especially regarding the patch
for fixing out-of-bounds access (suggested by the AI review), as well as
the changes for LoongArch, x86 and riscv architectures.

> - Add Acked-bys for arm64.
> 
> Changes in v10:
> - Fix crashk_low_res not excluded bug in the existing
>   RISC-V code.
> - Fix an existing memory leak issue in the existing PowerPC code.
> - Fix the ordering issue of adding CMA ranges to
>   "linux,usable-memory-range".
> - Fix an existing concurrency issue. A concurrent memory hotplug may occur
>   between reading memblock and attempting to fill cmem during kexec_load()
>   for almost all existing architectures.
> - Link to v9: https://lore.kernel.org/all/20260323072745.2481719-1-ruanjinjie@huawei.com/
> 
> Changes in v9:
> - Collect Reviewed-by and Acked-by, and prepare for Sashiko AI review.
> - Link to v8: https://lore.kernel.org/all/20260302035315.3892241-1-ruanjinjie@huawei.com/
> 
> Changes in v8:
> - Fix the build issues reported by kernel test robot and Sourabh.
> - Link to v7: https://lore.kernel.org/all/20260226130437.1867658-1-ruanjinjie@huawei.com/
> 
> Changes in v7:
> - Correct the inclusion of CMA-reserved ranges for kdump kernel in of/kexec
>   for arm64 and riscv.
> - Add Acked-by.
> - Link to v6: https://lore.kernel.org/all/20260224085342.387996-1-ruanjinjie@huawei.com/
> 
> Changes in v6:
> - Update the crash core exclude code as Mike suggested.
> - Rebased on v7.0-rc1.
> - Add acked-by.
> - Link to v5: https://lore.kernel.org/all/20260212101001.343158-1-ruanjinjie@huawei.com/
> 
> Jinjie Ruan (14):
>   riscv: kexec_file: Fix crashk_low_res not exclude bug
>   powerpc/crash: Fix possible memory leak in update_crash_elfcorehdr()
>   x86/kexec: Fix potential buffer overflow in prepare_elf_headers()
>   arm64: kexec_file: Fix potential buffer overflow in
>     prepare_elf_headers()
>   riscv: kexec_file: Fix potential buffer overflow in
>     prepare_elf_headers()
>   LoongArch: kexec: Fix potential buffer overflow in
>     prepare_elf_headers()
>   crash: Add crash_prepare_headers() to exclude crash kernel memory
>   arm64: kexec_file: Use crash_prepare_headers() helper to simplify code
>   x86/kexec: Use crash_prepare_headers() helper to simplify code
>   riscv: kexec_file: Use crash_prepare_headers() helper to simplify code
>   LoongArch: kexec: Use crash_prepare_headers() helper to simplify code
>   crash: Use crash_exclude_core_ranges() on powerpc
>   arm64: kexec: Add support for crashkernel CMA reservation
>   riscv: kexec: Add support for crashkernel CMA reservation
> 
> Sourabh Jain (1):
>   powerpc/crash: sort crash memory ranges before preparing elfcorehdr
> 
>  .../admin-guide/kernel-parameters.txt         |  16 +--
>  arch/arm64/kernel/machine_kexec_file.c        |  43 +++-----
>  arch/arm64/mm/init.c                          |   5 +-
>  arch/loongarch/kernel/machine_kexec_file.c    |  43 +++-----
>  arch/powerpc/include/asm/kexec_ranges.h       |   1 -
>  arch/powerpc/kexec/crash.c                    |   7 +-
>  arch/powerpc/kexec/ranges.c                   | 101 +-----------------
>  arch/riscv/kernel/machine_kexec_file.c        |  42 +++-----
>  arch/riscv/mm/init.c                          |   5 +-
>  arch/x86/kernel/crash.c                       |  92 +++-------------
>  drivers/of/fdt.c                              |   9 +-
>  drivers/of/kexec.c                            |   9 ++
>  include/linux/crash_core.h                    |   9 ++
>  include/linux/crash_reserve.h                 |   4 +-
>  kernel/crash_core.c                           |  89 ++++++++++++++-
>  15 files changed, 193 insertions(+), 282 deletions(-)
> 



* Re: [PATCH v10 00/11] ADF41513/ADF41510 PLL frequency synthesizers
From: Andy Shevchenko @ 2026-04-15 10:24 UTC
  To: rodrigo.alencar
  Cc: linux-kernel, linux-iio, devicetree, linux-doc, Jonathan Cameron,
	David Lechner, Andy Shevchenko, Lars-Peter Clausen,
	Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
	Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan,
	Krzysztof Kozlowski
In-Reply-To: <20260415-adf41513-iio-driver-v10-0-df61046d5457@analog.com>

On Wed, Apr 15, 2026 at 10:51:43AM +0100, Rodrigo Alencar via B4 Relay wrote:
> This patch series adds support for the Analog Devices ADF41513 and ADF41510
> ultralow noise PLL frequency synthesizers. These devices are designed for
> implementing local oscillators (LOs) in high-frequency applications.
> The ADF41513 covers frequencies from 1 GHz to 26.5 GHz, while the ADF41510
> operates from 1 GHz to 10 GHz.
> 
> Key features supported by this driver:
> - Integer-N and fractional-N operation modes
> - High maximum PFD frequency (250 MHz integer-N, 125 MHz fractional-N)
> - 25-bit fixed modulus or 49-bit variable modulus fractional modes
> - Digital lock detect functionality
> - Phase resync capability for consistent output phase
> - Load Enable vs Reference signal synchronization
> 
> The series includes:
> 1. PLL driver implementation
> 2. Device tree bindings documentation
> 3. IIO ABI documentation
> 
> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
> ---
> Changes in v10:
> - Drop simple_strntoull() changes
> - Create kstrtodec64() and kstrtoudec64() helpers. 

On a brief look this looks quite good. I will review it later on. We still have
several weeks' time.

-- 
With Best Regards,
Andy Shevchenko




* [PATCH] net: ipv4: igmp: add sysctl option to ignore inbound llm_reports
From: Steffen Trumtrar @ 2026-04-15 10:26 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern
  Cc: netdev, linux-doc, linux-kernel, Steffen Trumtrar

Add a new sysctl option 'igmp_link_local_mcast_reports_drop' that allows
dropping inbound IGMP reports for link-local multicast groups in the
224.0.0.X range. This prevents the local system from processing IGMP
reports for link-local multicast groups, so the kernel keeps sending its
own outbound IGMP reports.

Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
---
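The description above limits the new option to the 224.0.0.X range. As a
rough userspace sketch of such a range-limited decision: the mask test below
is the same comparison the kernel's ipv4_is_local_multicast() performs, while
should_drop_report() and its parameters are hypothetical names used only for
illustration and are not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* True for groups in 224.0.0.0/24; group_be is in network byte order. */
bool is_link_local_mcast(uint32_t group_be)
{
	return (group_be & htonl(0xffffff00)) == htonl(0xe0000000);
}

/*
 * Hypothetical helper: drop an inbound report only when the option is
 * enabled and the reported group is link-local.
 */
bool should_drop_report(uint32_t group_be, bool llm_reports_drop)
{
	return llm_reports_drop && is_link_local_mcast(group_be);
}
```

With the option enabled, a report for 224.0.0.22 would be dropped under this
model, while one for 239.1.2.3 would still be processed.
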
 Documentation/networking/ip-sysctl.rst                       | 12 ++++++++++++
 .../networking/net_cachelines/netns_ipv4_sysctl.rst          |  1 +
 include/net/netns/ipv4.h                                     |  1 +
 net/ipv4/af_inet.c                                           |  1 +
 net/ipv4/igmp.c                                              |  2 ++
 net/ipv4/sysctl_net_ipv4.c                                   |  7 +++++++
 6 files changed, 24 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 6921d8594b849..2da4cd6ac7202 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2306,6 +2306,18 @@ igmp_link_local_mcast_reports - BOOLEAN
 
 	Default TRUE
 
+igmp_link_local_mcast_reports_drop - BOOLEAN
+	Drop inbound IGMP reports for link local multicast groups in
+	the 224.0.0.X range. When enabled, IGMP membership reports for
+	link local multicast addresses are silently dropped without
+	processing.
+	When the kernel receives inbound IGMP reports, it stops sending
+	its own IGMP reports. Dropping the inbound reports instead of
+	processing them keeps the kernel sending its own reports, even
+	when IGMP reports from other hosts are seen on the network.
+
+	Default FALSE
+
 Alexey Kuznetsov.
 kuznet@ms2.inr.ac.ru
 
diff --git a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
index beaf1880a19bf..703afe2ba063b 100644
--- a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
+++ b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
@@ -140,6 +140,7 @@ int                             sysctl_udp_rmem_min
 u8                              sysctl_fib_notify_on_flag_change
 u8                              sysctl_udp_l3mdev_accept
 u8                              sysctl_igmp_llm_reports
+u8                              sysctl_igmp_llm_reports_drop
 int                             sysctl_igmp_max_memberships
 int                             sysctl_igmp_max_msf
 int                             sysctl_igmp_qrv
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 8e971c7bf1646..1453f825ffd4d 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -258,6 +258,7 @@ struct netns_ipv4 {
 	u8 sysctl_igmp_llm_reports;
 	int sysctl_igmp_max_memberships;
 	int sysctl_igmp_max_msf;
+	u8 sysctl_igmp_llm_reports_drop;
 	int sysctl_igmp_qrv;
 
 	struct ping_group_range ping_group_range;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c7731e300a442..b8f96a5d8afdc 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1825,6 +1825,7 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_max_msf = 10;
 	/* IGMP reports for link-local multicast groups are enabled by default */
 	net->ipv4.sysctl_igmp_llm_reports = 1;
+	net->ipv4.sysctl_igmp_llm_reports_drop = 0;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
 	net->ipv4.sysctl_fib_notify_on_flag_change = 0;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index a674fb44ec25b..3a4932e4108bd 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -931,6 +931,8 @@ static bool igmp_heard_report(struct in_device *in_dev, __be32 group)
 	if (ipv4_is_local_multicast(group) &&
 	    !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
 		return false;
+	if (READ_ONCE(net->ipv4.sysctl_igmp_llm_reports_drop))
+		return true;
 
 	rcu_read_lock();
 	for_each_pmc_rcu(in_dev, im) {
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 5654cc9c8a0b9..24dde84d289e4 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -948,6 +948,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dou8vec_minmax,
 	},
+	{
+		.procname	= "igmp_link_local_mcast_reports_drop",
+		.data		= &init_net.ipv4.sysctl_igmp_llm_reports_drop,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+	},
 	{
 		.procname	= "igmp_max_memberships",
 		.data		= &init_net.ipv4.sysctl_igmp_max_memberships,

---
base-commit: 028ef9c96e96197026887c0f092424679298aae8
change-id: 20260415-v7-0-topic-igmp-llm-drop-e4c13dbf17cc

Best regards,
--  
Steffen Trumtrar <s.trumtrar@pengutronix.de>



* Re: [PATCH v2 03/12] rcu: Support runtime NOCB initialization and dynamic offloading
From: Frederic Weisbecker @ 2026-04-15 10:39 UTC
  To: Qiliang Yuan
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen,
	Ingo Molnar, Thomas Gleixner, Tejun Heo, Andrew Morton,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
	Chen Ridong, Michal Koutný, Jonathan Corbet, Shuah Khan,
	Shuah Khan, linux-kernel, rcu, linux-mm, cgroups, linux-doc,
	linux-kselftest
In-Reply-To: <20260413-wujing-dhm-v2-3-06df21caba5d@gmail.com>

On Mon, Apr 13, 2026 at 03:43:09PM +0800, Qiliang Yuan wrote:
> Context:
> The RCU Non-Callback (NOCB) infrastructure traditionally requires
> boot-time parameters (e.g., rcu_nocbs) to allocate masks and spawn
> management kthreads (rcuog/rcuo). This prevents systems from activating
> offloading on-demand without a reboot.
> 
> Problem:
> Dynamic Housekeeping Management requires CPUs to transition to
> NOCB mode at runtime when they are newly isolated. Without boot-time
> setup, the NOCB masks are unallocated, and critical kthreads are missing,
> preventing effective tick suppression and isolation.
> 
> Solution:
> Refactor RCU initialization to support dynamic on-demand setup.
> - Introduce rcu_init_nocb_dynamic() to allocate masks and organize
>   kthreads if the system wasn't initially configured for NOCB.
> - Introduce rcu_housekeeping_reconfigure() to iterate over CPUs and
>   perform safe offload/deoffload transitions via hotplug sequences
>   (cpu_down -> offload -> cpu_up) when a housekeeping cpuset triggers
>   a notifier event.
> - Remove __init from rcu_organize_nocb_kthreads to allow runtime
>   reconfiguration of the callback management hierarchy.
> 
> This enables a true "Zero-Conf" isolation experience where any CPU
> can be fully isolated at runtime regardless of boot parameters.
> 
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> ---
>  kernel/rcu/rcu.h       |  4 +++
>  kernel/rcu/tree.c      | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/rcu/tree.h      |  2 +-
>  kernel/rcu/tree_nocb.h | 31 +++++++++++++--------
>  4 files changed, 100 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 9b10b57b79ada..282874443c96b 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -663,8 +663,12 @@ unsigned long srcu_batches_completed(struct srcu_struct *sp);
>  #endif // #else // #ifdef CONFIG_TINY_SRCU
>  
>  #ifdef CONFIG_RCU_NOCB_CPU
> +void rcu_init_nocb_dynamic(void);
> +void rcu_spawn_cpu_nocb_kthread(int cpu);
>  void rcu_bind_current_to_nocb(void);
>  #else
> +static inline void rcu_init_nocb_dynamic(void) { }
> +static inline void rcu_spawn_cpu_nocb_kthread(int cpu) { }
>  static inline void rcu_bind_current_to_nocb(void) { }
>  #endif
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 55df6d37145e8..84c8388cf89a1 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4928,4 +4928,79 @@ void __init rcu_init(void)
>  #include "tree_stall.h"
>  #include "tree_exp.h"
>  #include "tree_nocb.h"
> +
> +#ifdef CONFIG_SMP
> +static int rcu_housekeeping_reconfigure(struct notifier_block *nb,
> +					unsigned long action, void *data)
> +{
> +	struct housekeeping_update *upd = data;
> +	struct task_struct *t;
> +	int cpu;
> +
> +	if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_RCU)
> +		return NOTIFY_OK;
> +
> +	rcu_init_nocb_dynamic();
> +
> +	for_each_possible_cpu(cpu) {
> +		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +		bool isolated = !cpumask_test_cpu(cpu, upd->new_mask);
> +		bool offloaded = rcu_rdp_is_offloaded(rdp);
> +
> +		if (isolated && !offloaded) {
> +			/* Transition to NOCB */
> +			pr_info("rcu: CPU %d transitioning to NOCB mode\n", cpu);
> +			if (cpu_online(cpu)) {
> +				remove_cpu(cpu);

We plan to assume that the CPU is offline while updating HK_TYPE_KERNEL_NOISE
through cpusets. So you shouldn't need to care about offlining here.
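
As a rough userspace model of what remains once hotplug is out of the picture:
the per-CPU decision reduces to comparing the new housekeeping mask with the
current offload state. All names below are hypothetical, the masks are plain
bitmaps, and the caller is assumed to guarantee that the affected CPUs are
offline:

```c
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical small system */

enum nocb_action { NOCB_NONE, NOCB_OFFLOAD, NOCB_DEOFFLOAD };

/* Decide what one CPU needs, given the new housekeeping mask. */
enum nocb_action nocb_plan(int cpu, unsigned long new_hk_mask,
			   unsigned long offloaded_mask)
{
	bool isolated = !((new_hk_mask >> cpu) & 1UL);
	bool offloaded = (offloaded_mask >> cpu) & 1UL;

	if (isolated && !offloaded)
		return NOCB_OFFLOAD;
	if (!isolated && offloaded)
		return NOCB_DEOFFLOAD;
	return NOCB_NONE;
}

/* Apply the plan to every CPU and return the resulting offload mask. */
unsigned long nocb_reconfigure(unsigned long new_hk_mask,
			       unsigned long offloaded_mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		switch (nocb_plan(cpu, new_hk_mask, offloaded_mask)) {
		case NOCB_OFFLOAD:
			offloaded_mask |= 1UL << cpu;
			break;
		case NOCB_DEOFFLOAD:
			offloaded_mask &= ~(1UL << cpu);
			break;
		case NOCB_NONE:
			break;
		}
	}
	return offloaded_mask;
}
```

No remove_cpu()/add_cpu() pair appears in the model because the offline state
is a precondition rather than something the RCU code has to establish.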


> +				rcu_spawn_cpu_nocb_kthread(cpu);
> +				rcu_nocb_cpu_offload(cpu);
> +				add_cpu(cpu);
> +			} else {
> +				rcu_spawn_cpu_nocb_kthread(cpu);
> +				rcu_nocb_cpu_offload(cpu);
> +			}
> +		} else if (!isolated && offloaded) {
> +			/* Transition to CB */
> +			pr_info("rcu: CPU %d transitioning to CB mode\n", cpu);
> +			if (cpu_online(cpu)) {
> +				remove_cpu(cpu);
> +				rcu_nocb_cpu_deoffload(cpu);
> +				add_cpu(cpu);
> +			} else {
> +				rcu_nocb_cpu_deoffload(cpu);
> +			}
> +		}
> +	}
> +
> +	t = READ_ONCE(rcu_state.gp_kthread);
> +	if (t)
> +		housekeeping_affine(t, HK_TYPE_RCU);
> +
> +#ifdef CONFIG_TASKS_RCU
> +	t = get_rcu_tasks_gp_kthread();
> +	if (t)
> +		housekeeping_affine(t, HK_TYPE_RCU);
> +#endif
> +
> +#ifdef CONFIG_TASKS_RUDE_RCU
> +	t = get_rcu_tasks_rude_gp_kthread();
> +	if (t)
> +		housekeeping_affine(t, HK_TYPE_RCU);
> +#endif

No need to handle kthreads affinities. This is already taken care of by isolated
cpuset partitions.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v2 07/12] sched/core: Dynamically update scheduler domain housekeeping mask
From: Frederic Weisbecker @ 2026-04-15 10:47 UTC
  To: Qiliang Yuan
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen,
	Ingo Molnar, Thomas Gleixner, Tejun Heo, Andrew Morton,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
	Chen Ridong, Michal Koutný, Jonathan Corbet, Shuah Khan,
	Shuah Khan, linux-kernel, rcu, linux-mm, cgroups, linux-doc,
	linux-kselftest
In-Reply-To: <20260413-wujing-dhm-v2-7-06df21caba5d@gmail.com>

On Mon, Apr 13, 2026 at 03:43:13PM +0800, Qiliang Yuan wrote:
> Scheduler domains rely on HK_TYPE_DOMAIN to identify which CPUs are
> isolated from general load balancing. Currently, these boundaries are
> static and determined only during boot-time domain initialization.
> 
> Trigger a scheduler domain rebuild when the HK_TYPE_DOMAIN mask changes.
> 
> This ensures that scheduler isolation boundaries can be reconfigured
> at runtime via the DHEI sysfs or cpuset interface.
> 
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> ---
>  kernel/sched/core.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 496dff740dcaf..b71c433bbc420 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -39,6 +39,7 @@
>  #include <linux/sched/nohz.h>
>  #include <linux/sched/rseq_api.h>
>  #include <linux/sched/rt.h>
> +#include <linux/sched/topology.h>
>  
>  #include <linux/blkdev.h>
>  #include <linux/context_tracking.h>
> @@ -10959,3 +10960,25 @@ void sched_change_end(struct sched_change_ctx *ctx)
>  		p->sched_class->prio_changed(rq, p, ctx->prio);
>  	}
>  }
> +
> +static int sched_housekeeping_update(struct notifier_block *nb,
> +				     unsigned long action, void *data)
> +{
> +	struct housekeeping_update *update = data;
> +
> +	if (action == HK_UPDATE_MASK && update->type == HK_TYPE_DOMAIN)
> +		rebuild_sched_domains();
> +
> +	return NOTIFY_OK;
> +}

This is already handled by cpuset isolated partitions.

Thanks.

> +
> +static struct notifier_block sched_housekeeping_nb = {
> +	.notifier_call = sched_housekeeping_update,
> +};
> +
> +static int __init sched_housekeeping_init(void)
> +{
> +	housekeeping_register_notifier(&sched_housekeeping_nb);
> +	return 0;
> +}
> +late_initcall(sched_housekeeping_init);
> 
> -- 
> 2.43.0
> 

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v2 08/12] workqueue, mm: Support dynamic housekeeping mask updates
From: Frederic Weisbecker @ 2026-04-15 10:50 UTC
  To: Qiliang Yuan
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen,
	Ingo Molnar, Thomas Gleixner, Tejun Heo, Andrew Morton,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
	Chen Ridong, Michal Koutný, Jonathan Corbet, Shuah Khan,
	Shuah Khan, linux-kernel, rcu, linux-mm, cgroups, linux-doc,
	linux-kselftest
In-Reply-To: <20260413-wujing-dhm-v2-8-06df21caba5d@gmail.com>

On Mon, Apr 13, 2026 at 03:43:14PM +0800, Qiliang Yuan wrote:
> Unbound workqueues and kcompactd threads determine their default CPU
> affinity from housekeeping masks (HK_TYPE_WQ, HK_TYPE_DOMAIN, and
> HK_TYPE_KTHREAD) at boot. Currently, these boundaries are static and
> are not updated if housekeeping is reconfigured at runtime.
> 
> Implement housekeeping notifiers for both workqueue and mm compaction.
> 
> This ensures that unbound workqueue tasks and background compaction
> threads honor dynamic isolation boundaries configured via sysfs or
> cpuset at runtime.
> 
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>

Unbound workqueues and kthreads are already handled by cpuset
isolated partitions.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* [PATCH v4] KEYS: trusted: Debugging as a feature
From: Jarkko Sakkinen @ 2026-04-15 11:12 UTC
  To: linux-integrity
  Cc: Jarkko Sakkinen, Nayna Jain, Srish Srinivasan, Jonathan Corbet,
	Shuah Khan, James Bottomley, Mimi Zohar, David Howells,
	Paul Moore, James Morris, Serge E. Hallyn, Ahmad Fatoum,
	Pengutronix Kernel Team, Andrew Morton, Borislav Petkov (AMD),
	Randy Dunlap, Dave Hansen, Pawan Gupta, Feng Tang, Dapeng Mi,
	Kees Cook, Marco Elver, Li RongQing, Paul E. McKenney,
	Thomas Gleixner, Bjorn Helgaas, linux-doc, linux-kernel, keyrings,
	linux-security-module

TPM_DEBUG and other similar flags are a non-standard way to specify a
feature in the Linux kernel. Introduce CONFIG_TRUSTED_KEYS_DEBUG for trusted
keys, and use it to replace these ad-hoc feature flags.

Given that trusted keys debug dumps can contain sensitive data, harden the
feature as follows:

1. State in the Kconfig description that pr_debug() statements must be
   used.
2. Use pr_debug() statements in TPM 1.x driver to print the protocol dump.
3. Require trusted.debug=1 on the kernel command line (default: 0) to
   activate dumps at runtime, even when CONFIG_TRUSTED_KEYS_DEBUG=y.

Traces, when actually needed, can be easily enabled by providing
trusted.dyndbg='+p' and trusted.debug=1 on the kernel command line.

Reported-by: Nayna Jain <nayna@linux.ibm.com>
Closes: https://lore.kernel.org/all/7f8b8478-5cd8-4d97-bfd0-341fd5cf10f9@linux.ibm.com/
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Srish Srinivasan <ssrish@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v4:
- Added kernel parameter documentation.
- Added tags from Srish and Nayna.
- Sanity check round. This version will be applied unless there is
  something specific to address.
v3:
- Add kernel-command line option for enabling the traces.
- Add safety information to the Kconfig entry.
v2:
- Implement for all trusted keys backends.
- Add HAVE_TRUSTED_KEYS_DEBUG as it is a good practice despite full
  coverage.
---
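The gating described above can be pictured with a small userspace model: a
build-time switch decides whether the dump code exists at all, and a runtime
flag that defaults to off decides whether it runs. The macro and function
names are stand-ins chosen for the sketch, and the third gate (dynamic debug
via trusted.dyndbg='+p') is not modeled here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for CONFIG_TRUSTED_KEYS_DEBUG; set to 0 to compile dumps out. */
#define TRUSTED_KEYS_DEBUG 1

/* Stand-in for the trusted.debug= parameter; off by default. */
bool trusted_debug = false;

/* Returns true only if a dump was actually emitted. */
bool dump_payload_model(const unsigned char *key, unsigned int key_len)
{
#if TRUSTED_KEYS_DEBUG
	/* Runtime gate: stay silent unless explicitly enabled. */
	if (!trusted_debug)
		return false;

	fprintf(stderr, "key_len %u\n", key_len);
	for (unsigned int i = 0; i < key_len; i++)
		fprintf(stderr, "%02x ", key[i]);
	fprintf(stderr, "\n");
	return true;
#else
	(void)key;
	(void)key_len;
	return false;
#endif
}
```

With both switches on, the model prints to stderr; in the kernel the output
would additionally depend on the pr_debug() call sites being enabled.
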
 .../admin-guide/kernel-parameters.txt         | 16 +++++++
 include/keys/trusted-type.h                   | 21 +++++----
 security/keys/trusted-keys/Kconfig            | 23 ++++++++++
 security/keys/trusted-keys/trusted_caam.c     |  7 ++-
 security/keys/trusted-keys/trusted_core.c     |  6 +++
 security/keys/trusted-keys/trusted_tpm1.c     | 44 +++++++++++--------
 6 files changed, 87 insertions(+), 30 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2ce1f4975c1..f1515668c8ab 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7917,6 +7917,22 @@ Kernel parameters
 			first trust source as a backend which is initialized
 			successfully during iteration.
 
+	trusted.debug=	[KEYS]
+			Format: <bool>
+			Enable trusted keys debug traces at runtime when
+			CONFIG_TRUSTED_KEYS_DEBUG=y.
+
+			To make the traces visible after enabling the option,
+			use trusted.dyndbg='+p' as needed. By convention,
+			the subsystem uses pr_debug() for these traces.
+
+			SAFETY: The traces can leak sensitive data, so be
+			cautious before enabling this. They remain inactive
+			unless this parameter is set to a true
+			value.
+
+			Default: false
+
 	trusted.rng=	[KEYS]
 			Format: <string>
 			The RNG used to generate key material for trusted keys.
diff --git a/include/keys/trusted-type.h b/include/keys/trusted-type.h
index 03527162613f..9f9940482da4 100644
--- a/include/keys/trusted-type.h
+++ b/include/keys/trusted-type.h
@@ -83,18 +83,21 @@ struct trusted_key_source {
 
 extern struct key_type key_type_trusted;
 
-#define TRUSTED_DEBUG 0
+#ifdef CONFIG_TRUSTED_KEYS_DEBUG
+extern bool trusted_debug;
 
-#if TRUSTED_DEBUG
 static inline void dump_payload(struct trusted_key_payload *p)
 {
-	pr_info("key_len %d\n", p->key_len);
-	print_hex_dump(KERN_INFO, "key ", DUMP_PREFIX_NONE,
-		       16, 1, p->key, p->key_len, 0);
-	pr_info("bloblen %d\n", p->blob_len);
-	print_hex_dump(KERN_INFO, "blob ", DUMP_PREFIX_NONE,
-		       16, 1, p->blob, p->blob_len, 0);
-	pr_info("migratable %d\n", p->migratable);
+	if (!trusted_debug)
+		return;
+
+	pr_debug("key_len %d\n", p->key_len);
+	print_hex_dump_debug("key ", DUMP_PREFIX_NONE,
+			     16, 1, p->key, p->key_len, 0);
+	pr_debug("bloblen %d\n", p->blob_len);
+	print_hex_dump_debug("blob ", DUMP_PREFIX_NONE,
+			     16, 1, p->blob, p->blob_len, 0);
+	pr_debug("migratable %d\n", p->migratable);
 }
 #else
 static inline void dump_payload(struct trusted_key_payload *p)
diff --git a/security/keys/trusted-keys/Kconfig b/security/keys/trusted-keys/Kconfig
index 9e00482d886a..e5a4a53aeab2 100644
--- a/security/keys/trusted-keys/Kconfig
+++ b/security/keys/trusted-keys/Kconfig
@@ -1,10 +1,29 @@
 config HAVE_TRUSTED_KEYS
 	bool
 
+config HAVE_TRUSTED_KEYS_DEBUG
+	bool
+
+config TRUSTED_KEYS_DEBUG
+	bool "Debug trusted keys"
+	depends on HAVE_TRUSTED_KEYS_DEBUG
+	default n
+	help
+	  Trusted key backends and core code that support debug traces can
+	  opt in to that feature here. Traces must only use debug level output,
+	  as sensitive data may pass by. On the kernel command line, traces can
+	  be enabled via trusted.dyndbg='+p'.
+
+	  SAFETY: Debug dumps are inactive at runtime until trusted.debug is set
+	  to a true value on the kernel command line. Think very carefully
+	  before enabling this feature on a production build. The general
+	  advice is not to do this.
+
 config TRUSTED_KEYS_TPM
 	bool "TPM-based trusted keys"
 	depends on TCG_TPM >= TRUSTED_KEYS
 	default y
+	select HAVE_TRUSTED_KEYS_DEBUG
 	select CRYPTO_HASH_INFO
 	select CRYPTO_LIB_SHA1
 	select CRYPTO_LIB_UTILS
@@ -23,6 +42,7 @@ config TRUSTED_KEYS_TEE
 	bool "TEE-based trusted keys"
 	depends on TEE >= TRUSTED_KEYS
 	default y
+	select HAVE_TRUSTED_KEYS_DEBUG
 	select HAVE_TRUSTED_KEYS
 	help
 	  Enable use of the Trusted Execution Environment (TEE) as trusted
@@ -33,6 +53,7 @@ config TRUSTED_KEYS_CAAM
 	depends on CRYPTO_DEV_FSL_CAAM_JR >= TRUSTED_KEYS
 	select CRYPTO_DEV_FSL_CAAM_BLOB_GEN
 	default y
+	select HAVE_TRUSTED_KEYS_DEBUG
 	select HAVE_TRUSTED_KEYS
 	help
 	  Enable use of NXP's Cryptographic Accelerator and Assurance Module
@@ -42,6 +63,7 @@ config TRUSTED_KEYS_DCP
 	bool "DCP-based trusted keys"
 	depends on CRYPTO_DEV_MXS_DCP >= TRUSTED_KEYS
 	default y
+	select HAVE_TRUSTED_KEYS_DEBUG
 	select HAVE_TRUSTED_KEYS
 	help
 	  Enable use of NXP's DCP (Data Co-Processor) as trusted key backend.
@@ -50,6 +72,7 @@ config TRUSTED_KEYS_PKWM
 	bool "PKWM-based trusted keys"
 	depends on PSERIES_PLPKS >= TRUSTED_KEYS
 	default y
+	select HAVE_TRUSTED_KEYS_DEBUG
 	select HAVE_TRUSTED_KEYS
 	help
 	  Enable use of IBM PowerVM Key Wrapping Module (PKWM) as a trusted key backend.
diff --git a/security/keys/trusted-keys/trusted_caam.c b/security/keys/trusted-keys/trusted_caam.c
index 601943ce0d60..6a33dbf2a7f5 100644
--- a/security/keys/trusted-keys/trusted_caam.c
+++ b/security/keys/trusted-keys/trusted_caam.c
@@ -28,10 +28,13 @@ static const match_table_t key_tokens = {
 	{opt_err, NULL}
 };
 
-#ifdef CAAM_DEBUG
+#ifdef CONFIG_TRUSTED_KEYS_DEBUG
 static inline void dump_options(const struct caam_pkey_info *pkey_info)
 {
-	pr_info("key encryption algo %d\n", pkey_info->key_enc_algo);
+	if (!trusted_debug)
+		return;
+
+	pr_debug("key encryption algo %d\n", pkey_info->key_enc_algo);
 }
 #else
 static inline void dump_options(const struct caam_pkey_info *pkey_info)
diff --git a/security/keys/trusted-keys/trusted_core.c b/security/keys/trusted-keys/trusted_core.c
index 0b142d941cd2..6aed17bee09d 100644
--- a/security/keys/trusted-keys/trusted_core.c
+++ b/security/keys/trusted-keys/trusted_core.c
@@ -31,6 +31,12 @@ static char *trusted_rng = "default";
 module_param_named(rng, trusted_rng, charp, 0);
 MODULE_PARM_DESC(rng, "Select trusted key RNG");
 
+#ifdef CONFIG_TRUSTED_KEYS_DEBUG
+bool trusted_debug;
+module_param_named(debug, trusted_debug, bool, 0);
+MODULE_PARM_DESC(debug, "Enable trusted keys debug traces (default: 0)");
+#endif
+
 static char *trusted_key_source;
 module_param_named(source, trusted_key_source, charp, 0);
 MODULE_PARM_DESC(source, "Select trusted keys source (tpm, tee, caam, dcp or pkwm)");
diff --git a/security/keys/trusted-keys/trusted_tpm1.c b/security/keys/trusted-keys/trusted_tpm1.c
index 6ea728f1eae6..13513819991e 100644
--- a/security/keys/trusted-keys/trusted_tpm1.c
+++ b/security/keys/trusted-keys/trusted_tpm1.c
@@ -46,38 +46,44 @@ enum {
 	SRK_keytype = 4
 };
 
-#define TPM_DEBUG 0
-
-#if TPM_DEBUG
+#ifdef CONFIG_TRUSTED_KEYS_DEBUG
 static inline void dump_options(struct trusted_key_options *o)
 {
-	pr_info("sealing key type %d\n", o->keytype);
-	pr_info("sealing key handle %0X\n", o->keyhandle);
-	pr_info("pcrlock %d\n", o->pcrlock);
-	pr_info("pcrinfo %d\n", o->pcrinfo_len);
-	print_hex_dump(KERN_INFO, "pcrinfo ", DUMP_PREFIX_NONE,
-		       16, 1, o->pcrinfo, o->pcrinfo_len, 0);
+	if (!trusted_debug)
+		return;
+
+	pr_debug("sealing key type %d\n", o->keytype);
+	pr_debug("sealing key handle %0X\n", o->keyhandle);
+	pr_debug("pcrlock %d\n", o->pcrlock);
+	pr_debug("pcrinfo %d\n", o->pcrinfo_len);
+	print_hex_dump_debug("pcrinfo ", DUMP_PREFIX_NONE,
+			     16, 1, o->pcrinfo, o->pcrinfo_len, 0);
 }
 
 static inline void dump_sess(struct osapsess *s)
 {
-	print_hex_dump(KERN_INFO, "trusted-key: handle ", DUMP_PREFIX_NONE,
-		       16, 1, &s->handle, 4, 0);
-	pr_info("secret:\n");
-	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_NONE,
-		       16, 1, &s->secret, SHA1_DIGEST_SIZE, 0);
-	pr_info("trusted-key: enonce:\n");
-	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_NONE,
-		       16, 1, &s->enonce, SHA1_DIGEST_SIZE, 0);
+	if (!trusted_debug)
+		return;
+
+	print_hex_dump_debug("trusted-key: handle ", DUMP_PREFIX_NONE,
+			     16, 1, &s->handle, 4, 0);
+	pr_debug("secret:\n");
+	print_hex_dump_debug("", DUMP_PREFIX_NONE,
+			     16, 1, &s->secret, SHA1_DIGEST_SIZE, 0);
+	pr_debug("trusted-key: enonce:\n");
+	print_hex_dump_debug("", DUMP_PREFIX_NONE,
+			     16, 1, &s->enonce, SHA1_DIGEST_SIZE, 0);
 }
 
 static inline void dump_tpm_buf(unsigned char *buf)
 {
 	int len;
 
-	pr_info("\ntpm buffer\n");
+	if (!trusted_debug)
+		return;
+	pr_debug("\ntpm buffer\n");
 	len = LOAD32(buf, TPM_SIZE_OFFSET);
-	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_NONE, 16, 1, buf, len, 0);
+	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, buf, len, 0);
 }
 #else
 static inline void dump_options(struct trusted_key_options *o)
-- 
2.39.5


^ permalink raw reply related

* Re: [PATCH v2] bootconfig: Apply early options from embedded config
From: Breno Leitao @ 2026-04-15 11:15 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, oss, paulmck, rostedt, kernel-team, Kiryl Shutsemau
In-Reply-To: <adTX6h4Ej5jOcONP@gmail.com>

On Tue, Apr 07, 2026 at 03:19:09AM -0700, Breno Leitao wrote:
> On Fri, Apr 03, 2026 at 11:45:19AM +0900, Masami Hiramatsu wrote:
> > > I'm still uncertain about this approach. The goal is to identify and
> > > categorize the early parameters that are parsed prior to bootconfig
> > > initialization.
> >
> > Yes, if we support early parameters in bootconfig, we need to clarify
> > which parameters are inherently unsupportable, and document it.
> > Currently it is easy to say that it does not support the parameter
> > defined with "early_param()". Similarly, maybe we should introduce
> > "arch_param()" or something like it (or support all of them).
> >
> > >
> > > Moreover, this work could become obsolete if bootconfig's initialization
> > > point shifts earlier or later in the boot sequence, necessitating
> > > another comprehensive analysis.
> >
> > If we can init it before calling setup_arch(), yes, we don't need to
> > check it. So that is another option. Do you think it is feasible to
> > support all of them? (Of course, theoretically we can do so, but the
> > question is the use case and requirements.)
>
> I don't believe all early parameters can be supported by bootconfig.
> Some are inherently incompatible as far as I understand, while others
> depend on bootconfig's initialization point in the boot sequence.

I've developed a patch series that relocates bootconfig initialization
to occur before setup_arch().

Adopting this approach would streamline the categorization considerably,
as only a small subset of kernel parameters are parsed before
setup_arch() is called.

This enables a clearer distinction: parameters processed *before*
setup_arch() versus those handled afterward, rather than classifying
based on what occurs before bootconfig initialization.

Just to close the loop and link both discussions together, the proposed
patch series is available at:

https://lore.kernel.org/all/20260415-bootconfig_earlier-v1-0-cf160175de5e@debian.org/

^ permalink raw reply

* [PATCH 9/8] docs: maintainers_include: improve its output
From: Mauro Carvalho Chehab @ 2026-04-15 11:21 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, linux-kernel, Dan Williams, Randy Dunlap,
	Shuah Khan
In-Reply-To: <cover.1776242739.git.mchehab+huawei@kernel.org>

There are three "types" of profiles:
1. Profiles already included inside subsystem-specific documentation.
   This is the most common case;
2. Profiles that are hosted externally;
3. Profiles that are at the same location as maintainer-handbooks.rst.

For (3), we need to create a TOC, as they don't exist elsewhere.

Change the logic to create TOC just for (3), prepending the
content of maintainer-handbooks with a sorted entry of all types,
before the TOC.

With this change, we have a single sorted list of profiles,
with each entry labelled by its subsystem name.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/sphinx/maintainers_include.py | 76 +++++++++++----------
 1 file changed, 40 insertions(+), 36 deletions(-)

diff --git a/Documentation/sphinx/maintainers_include.py b/Documentation/sphinx/maintainers_include.py
index 7ab921820612..5413c1350bba 100755
--- a/Documentation/sphinx/maintainers_include.py
+++ b/Documentation/sphinx/maintainers_include.py
@@ -21,7 +21,7 @@ import sys
 import re
 import os.path
 
-from textwrap import indent
+from glob import glob
 
 from docutils import statemachine
 from docutils.parsers.rst import Directive
@@ -36,8 +36,8 @@ class MaintainersParser:
     """Parse MAINTAINERS file(s) content"""
 
     def __init__(self, base_path, path):
-        self.profiles = {}
-        self.profile_urls = {}
+        self.profile_toc = set()
+        self.profile_entries = {}
 
         result = list()
         result.append(".. _maintainers:")
@@ -73,26 +73,24 @@ class MaintainersParser:
             # Drop needless input whitespace.
             line = line.rstrip()
 
+            #
+            # Handle profile entries - either as files or as https refs
+            #
             match = re.match(r"P:\s*(Documentation/\S+)\.rst", line)
             if match:
-                fname = os.path.relpath(match.group(1), base_path)
-                if fname.startswith("../"):
-                    if self.profiles.get(fname) is None:
-                        self.profiles[fname] = subsystem_name
-                    else:
-                        self.profiles[fname] += f", {subsystem_name}"
+                entry = os.path.relpath(match.group(1), base_path)
+                if "*" in entry:
+                    for e in glob(entry):
+                        self.profile_toc.add(e)
+                        self.profile_entries[subsystem_name] = e
                 else:
-                    self.profiles[fname] = None
-
-            match = re.match(r"P:\s*(https?://.*)", line)
-            if match:
-                url = match.group(1).strip()
-                if url not in self.profile_urls:
-                    if self.profile_urls.get(url) is None:
-                        self.profile_urls[url] = subsystem_name
-                    else:
-                        self.profile_urls[url] += f", {subsystem_name}"
-
+                    self.profile_toc.add(entry)
+                    self.profile_entries[subsystem_name] = entry
+            else:
+                match = re.match(r"P:\s*(https?://.*)", line)
+                if match:
+                    entry = match.group(1).strip()
+                    self.profile_entries[subsystem_name] = entry
 
             # Linkify all non-wildcard refs to ReST files in Documentation/.
             pat = r'(Documentation/([^\s\?\*]*)\.rst)'
@@ -234,26 +232,32 @@ class MaintainersProfile(Include):
 
         maint = MaintainersParser(base_path, path)
 
-        output  = ".. toctree::\n"
-        output += "   :maxdepth: 1\n\n"
+        #
+        # Produce a list with all maintainer profiles, sorted by subsystem name
+        #
+        output = ""
 
-        items = sorted(maint.profiles.items(),
-                       key=lambda kv: (kv[1] or "", kv[0]))
-        for fname, profile in items:
-            if profile:
-                output += f"   {profile} <{fname}>\n"
+        for profile, entry in maint.profile_entries.items():
+            if entry.startswith("http"):
+                if profile:
+                    output += f"- `{profile} <{entry}>`_\n"
+                else:
+                    output += f"- `<{entry}>`_\n"
             else:
-                output += f"   {fname}\n"
+                if profile:
+                    output += f"- :doc:`{profile} <{entry}>`\n"
+                else:
+                    output += f"- :doc:`<{entry}>`\n"
 
-        output += "\n**External profiles**\n\n"
+        #
+        # Create a hidden TOC table with all profiles. That allows adding
+        # profiles without needing to add them on any index.rst file.
+        #
+        output += "\n.. toctree::\n"
+        output += "   :hidden:\n\n"
 
-        items = sorted(maint.profile_urls.items(),
-                       key=lambda kv: (kv[1] or "", kv[0]))
-        for url, profile in items:
-            if profile:
-                output += f"- {profile} <{url}>\n"
-            else:
-                output += f"- {url}\n"
+        for fname in maint.profile_toc:
+            output += f"   {fname}\n"
 
         output += "\n"
 
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH] crash: Support high memory reservation for range syntax
From: Baoquan He @ 2026-04-15 11:29 UTC (permalink / raw)
  To: Youling Tang
  Cc: Baoquan He, Sourabh Jain, Andrew Morton, Jonathan Corbet,
	Vivek Goyal, Dave Young, kexec, linux-kernel, linux-doc,
	Youling Tang
In-Reply-To: <ea389ca2-8980-4022-a7d0-d96c913f671c@linux.dev>

On 04/09/26 at 09:55am, Youling Tang wrote:
> Hi, Baoquan
> 
> On 4/8/26 21:32, Baoquan He wrote:
> > On 04/08/26 at 10:01am, Sourabh Jain wrote:
> > > Hello Youling,
> > > 
> > > On 04/04/26 13:11, Youling Tang wrote:
> > > > From: Youling Tang <tangyouling@kylinos.cn>
> > > > 
> > > > The crashkernel range syntax (range1:size1[,range2:size2,...]) allows
> > > > automatic size selection based on system RAM, but it always reserves
> > > > from low memory. When a large crashkernel is selected, this can
> > > > consume most of the low memory, causing subsequent hardware
> > > > hotplug or drivers requiring low memory to fail due to allocation
> > > > failures.
> > > 
> > > Support for high crashkernel reservation has been added to
> > > address the above problem.
> > > 
> > > However, high crashkernel reservation is not supported with
> > > range-based crashkernel kernel command-line arguments.
> > > For example: crashkernel=0M-1G:100M,1G-4G:160M,4G-8G:192M
> > > 
> > > Many users, including some distributions, use range-based
> > > crashkernel configuration. So, adding support for high crashkernel
> > > reservation with range-based configuration would be useful.
> > > Sorry for the late response. I also have to apologize because I lean
> > > against this change.
> > 
> > We use crashkernel=xM|G and crashkernel=range1:size1[,range2:size2,...]
> > as default setting, so that people only need to set suggested amount
> > of memory. While crashkernel=,high|low is for advanced user to customize
> > their crashkernel value. In that case, user knows what's high memory and
> > low memory, and how much is needed separately to achieve their goal, e.g
> > saving low memory, taking away more high memory.
> > 
> > > To be honest, the above grammars sound simple, right? I believe both of you
> > > know very well how complicated the current crashkernel code is. I would
> > > suggest not letting it become more and more complicated by extending
> > > the grammar further and further, unless you hit an unavoidable issue with
> > > the existing grammar.
> > > 
> > > Here comes my question: do you hit an unavoidable issue with the existing
> > > grammar when you use crashkernel=range1:size1[,range2:size2,...] and
> > > think it's not satisfactory, and at the same time crashkernel=,high|low
> > > can't meet your demand either?
> 
> Yes, regular users generally don't know about high memory and low memory,
> and probably don't know how much crashkernel memory should be reserved
> either. They mostly just use the default crashkernel parameters configured
> by the distribution.
> 
> For advanced users, the current grammar is sufficient, because
> 'crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset],>boundary'
> can definitely be replaced with 'crashkernel=size,high'.
> 
> The main purpose of this patch is to provide distributions with a more
> reasonable default parameter configuration (satisfying most requirements),
> without having to set different distribution default parameters for
> different scenarios (physical machines, virtual machines) and different
> machine models.

OK, do you have a concrete case? E.g. in your distro, what would you set
with this patchset applied? Let's see if it can cover all cases with one
simple and satisfying parameter.
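
For reference, the selection that the existing range syntax performs can be modeled roughly as below. This is a simplified Python sketch rather than the kernel's parser: parse_size() and select_crash_size() are hypothetical names, and the optional @offset part is ignored.

```python
# Hypothetical model of crashkernel=range1:size1[,range2:size2,...]
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '160M' or '4G' to bytes."""
    if text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def select_crash_size(spec, system_ram):
    """Return the size of the first range containing system_ram, else 0."""
    for item in spec.split(","):
        bounds, size = item.split(":")
        start, _, end = bounds.partition("-")
        lo = parse_size(start)
        # An empty end means the range is open-ended.
        hi = parse_size(end) if end else float("inf")
        if lo <= system_ram < hi:
            return parse_size(size)
    return 0


spec = "0M-1G:100M,1G-4G:160M,4G-8G:192M"
# 2G of RAM falls into the 1G-4G range.
print(select_crash_size(spec, 2 << 30) >> 20)  # prints 160
```

With this spec, a machine with 8G or more matches no range and gets no reservation, which is one reason a single default line is hard to get right across machine models.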

^ permalink raw reply

* Re: [PATCH net-next v2 05/14] libie: add bookkeeping support for control queue messages
From: Larysa Zaremba @ 2026-04-15 11:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev,
	Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
	jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
	richardcochran, linux-doc, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <b559c877-7712-4ed7-adb4-d2b667e16e74@redhat.com>

On Thu, Apr 09, 2026 at 11:07:02AM +0200, Paolo Abeni wrote:
> On 4/3/26 9:49 PM, Tony Nguyen wrote:
> > +static bool
> > +libie_ctlq_xn_process_recv(struct libie_ctlq_xn_recv_params *params,
> > +			   struct libie_ctlq_msg *ctlq_msg)
> > +{
> > +	struct libie_ctlq_xn_manager *xnm = params->xnm;
> > +	struct libie_ctlq_xn *xn;
> > +	u16 msg_cookie, xn_index;
> > +	struct kvec *response;
> > +	int status;
> > +	u16 data;
> > +
> > +	data = ctlq_msg->sw_cookie;
> > +	xn_index = FIELD_GET(LIBIE_CTLQ_XN_INDEX_M, data);
> > +	msg_cookie = FIELD_GET(LIBIE_CTLQ_XN_COOKIE_M, data);
> > +	status = ctlq_msg->chnl_retval ? -EFAULT : 0;
> > +
> > +	xn = &xnm->ring[xn_index];
> > +	if (ctlq_msg->chnl_opcode != xn->virtchnl_opcode ||
> > +	    msg_cookie != xn->cookie)
> > +		return false;
> > +
> > +	spin_lock(&xn->xn_lock);
> 
> Sashiko says:
> 
> ---
> Because the cookie and opcode are checked before acquiring the lock, is
> it possible for the transaction to time out, be returned to the free
> list, and get reallocated for a new message before the lock is acquired?
> If that happens, could the old delayed response falsely complete the
> newly allocated transaction since the identifiers are not re-verified
> inside the lock?
> ---
> 

Yes, there is a race condition risk that is easy to fix.
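
One possible shape of the fix is to compare the identifiers only while holding the lock. The following is a toy Python model of that idea, not the driver code: the Xn class and process_recv() are hypothetical stand-ins for the transaction slot and the receive path.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Xn:
    """Toy stand-in for one transaction slot (hypothetical)."""
    opcode: int
    cookie: int
    completed: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock)


def process_recv(xn, msg_opcode, msg_cookie):
    """Complete xn only if the message still matches it under the lock."""
    with xn.lock:
        # The identifiers are compared while holding the lock, so the slot
        # cannot be recycled between the check and the completion.
        if msg_opcode != xn.opcode or msg_cookie != xn.cookie:
            return False
        xn.completed = True
        return True


xn = Xn(opcode=1, cookie=7)
# The slot times out and is reallocated for a new message.
with xn.lock:
    xn.opcode, xn.cookie = 2, 8
print(process_recv(xn, 1, 7))  # False: stale response is rejected
print(process_recv(xn, 2, 8))  # True: matching response completes the slot
```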

> > +/**
> > + * libie_xn_check_async_timeout - Check for asynchronous message timeouts
> > + * @xnm: Xn transaction manager
> > + *
> > + * Call the corresponding callback to notify the caller about the timeout.
> > + */
> > +static void libie_xn_check_async_timeout(struct libie_ctlq_xn_manager *xnm)
> > +{
> > +	u32 idx;
> > +
> > +	for_each_clear_bit(idx, xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES) {
> 
> Sashiko says:
> 
> ---
> This iterates over the bitmap without holding the lock. Concurrently,
> other paths modify this bitmap using non-atomic bitwise operations like
> __clear_bit() and __set_bit() under the lock. Will this cause torn reads
> or data races that might lead the timeout handler to skip valid
> transactions or examine invalid ones?
> ---
>

This should create only false negatives, which is not a problem: the timeout is
much longer than the period at which libie_xn_check_async_timeout() is called.

> 
> > +		params->ctlq_msg->sw_cookie = cookie;
> > +		params->ctlq_msg->send_mem = *dma_mem;
> > +		params->ctlq_msg->data_len = buf_len;
> > +		params->ctlq_msg->chnl_opcode = params->chnl_opcode;
> > +		ret = libie_ctlq_send(params->ctlq, params->ctlq_msg, 1);
> > +	}
> > +
> > +	if (ret && !libie_cp_can_send_onstack(buf_len))
> > +		libie_cp_unmap_dma_mem(dev, dma_mem);
> 
> Sashiko says:
> 
> ---
> When libie_ctlq_send() fails here, the DMA memory is unmapped and the
> buffer is freed by the caller. However, the software tracking ring at
> tx_msg[next_to_use] still contains the populated send_mem details and a
> non-zero data_len.
> 
> During driver teardown, libie_ctlq_xn_send_clean() is invoked with
> params->force = true, which processes the ring without checking the
> hardware completion bit. Could this cause the cleanup routine to process
> the failed slot again, resulting in a double-free and double-unmap?
> ---

Yes, I think that in trying to avoid unnecessary copying, I shot myself in the 
foot, will fix.

> 
> There are more remarks on the following patch, please have a look.
>

A few of the AI's comments will also result in fixes to stable.

> Also, it would be very helpful if you could help triaging such
> (overwhelming amount of) feedback on future submissions, explicitly
> commenting on the ML. Sashiko tends to be quite noisy on device driver code.
> 
> Thanks,
> 
> Paolo
> 

^ permalink raw reply

* Re: maintainer profiles
From: Mauro Carvalho Chehab @ 2026-04-15 11:43 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Linux Documentation, Linux Kernel Mailing List, Jonathan Corbet,
	Linux Kernel Workflows
In-Reply-To: <d8804a85-dd2b-481e-903f-c6fea5d24c97@infradead.org>

On Sat, 11 Apr 2026 16:54:00 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> Also, does anyone know why some of these profiles are numbered and some
> are not?  See
>   https://docs.kernel.org/maintainer/maintainer-entry-profile.html#existing-profiles
> for odd numbering.

Patch 9/8 fixes it, while solving other issues:

- https://lore.kernel.org/linux-doc/cfff2b313d1f79a5919f400020a1b1a4064a7143.1776252056.git.mchehab+huawei@kernel.org/T/#u

Basically, it creates a hidden TOC which is not displayed, creating
this ReST output:

	- :doc:`Arm And Arm64 Soc Sub-Architectures (Common Parts) <maintainer-soc>`
	- :doc:`Arm/Samsung S3C, S5P And Exynos Arm Architectures <maintainer-soc-clean-dts>`
	- :doc:`Arm/Tesla Fsd Soc Support <maintainer-soc-clean-dts>`
	- `Audit Subsystem <https://github.com/linux-audit/audit-kernel/blob/main/README.md>`_
	- :doc:`Damon <../mm/damon/maintainer-profile>`
	- :doc:`Documentation <../doc-guide/maintainer-profile>`
	- :doc:`Google Tensor Soc Support <maintainer-soc-clean-dts>`
	- :doc:`Kernel Nfsd, Sunrpc, And Lockd Servers <../filesystems/nfs/nfsd-maintainer-entry-profile>`
	- :doc:`Kernel Virtual Machine For X86 (Kvm/X86) <maintainer-kvm-x86>`
	- :doc:`Libnvdimm Btt: Block Translation Table <../nvdimm/maintainer-entry-profile>`
	- :doc:`Libnvdimm Pmem: Persistent Memory Driver <../nvdimm/maintainer-entry-profile>`
	- :doc:`Libnvdimm: Non-Volatile Memory Device Subsystem <../nvdimm/maintainer-entry-profile>`
	- :doc:`Media Input Infrastructure (V4L/Dvb) <../driver-api/media/maintainer-entry-profile>`
	- :doc:`Networking Drivers <maintainer-netdev>`
	- :doc:`Networking [General] <maintainer-netdev>`
	- :doc:`Risc-V Architecture <../arch/riscv/patch-acceptance>`
	- `Rust <https://rust-for-linux.com/contributing>`_
	- `Security Subsystem <https://github.com/LinuxSecurityModule/kernel/blob/main/README.md>`_
	- `Selinux Security Module <https://github.com/SELinuxProject/selinux-kernel/blob/main/README.md>`_
	- :doc:`Vfio Pci Device Specific Drivers <../driver-api/vfio-pci-device-specific-driver-acceptance>`
	- :doc:`X86 Architecture (32-Bit And 64-Bit) <maintainer-tip>`
	- :doc:`Xfs Filesystem <../filesystems/xfs/xfs-maintainer-entry-profile>`

	.. toctree::
	   :hidden:

	   ../filesystems/xfs/xfs-maintainer-entry-profile
	   ../driver-api/vfio-pci-device-specific-driver-acceptance
	   maintainer-netdev
	   ../nvdimm/maintainer-entry-profile
	   maintainer-soc
	   maintainer-soc-clean-dts
	   ../doc-guide/maintainer-profile
	   maintainer-kvm-x86
	   ../mm/damon/maintainer-profile
	   ../driver-api/media/maintainer-entry-profile
	   ../arch/riscv/patch-acceptance
	   ../filesystems/nfs/nfsd-maintainer-entry-profile
	   maintainer-tip

I.e., instead of showing the contents of the TOC tree, it shows a
per-subsystem sorted list of items. The TOC tree is used there just
to avoid warnings that a .rst file is not included in any TOC.

The advantage of this approach is that there's now one item in
the list for each "P:" tag in MAINTAINERS. All of them are
displayed using the name of the subsystem as described there,
e.g. it outputs:

     • Arm And Arm64 Soc Sub-Architectures (Common Parts)
     • Arm/Samsung S3C, S5P And Exynos Arm Architectures
     • Arm/Tesla Fsd Soc Support
     • Audit Subsystem
     • Damon
     • Documentation
     • Google Tensor Soc Support
     • Kernel Nfsd, Sunrpc, And Lockd Servers
     • Kernel Virtual Machine For X86 (Kvm/X86)
     • Libnvdimm Btt: Block Translation Table
     • Libnvdimm Pmem: Persistent Memory Driver
     • Libnvdimm: Non-Volatile Memory Device Subsystem
     • Media Input Infrastructure (V4L/Dvb)
     • Networking Drivers
     • Networking [General]
     • Risc-V Architecture
     • Rust
     • Security Subsystem
     • Selinux Security Module
     • Vfio Pci Device Specific Drivers
     • X86 Architecture (32-Bit And 64-Bit)
     • Xfs Filesystem

Each entry there has either a cross-reference to a document or
a reference to an external site.
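
As a rough illustration of how such output can be produced (a simplified sketch, not the code from the patch: render_profiles() is a hypothetical helper, and its two arguments stand in for what the parser collects from the "P:" tags):

```python
def render_profiles(profile_entries, profile_toc):
    """Render a bullet list of profiles followed by a hidden toctree."""
    output = ""
    for subsystem, entry in sorted(profile_entries.items()):
        if entry.startswith("http"):
            # External profile: plain ReST hyperlink.
            output += f"- `{subsystem} <{entry}>`_\n"
        else:
            # In-tree profile: Sphinx cross-reference.
            output += f"- :doc:`{subsystem} <{entry}>`\n"

    # Hidden TOC: avoids warnings about documents that are not part of
    # any toctree, without displaying anything.
    output += "\n.. toctree::\n   :hidden:\n\n"
    for fname in sorted(profile_toc):
        output += f"   {fname}\n"
    return output


entries = {
    "Rust": "https://rust-for-linux.com/contributing",
    "Documentation": "../doc-guide/maintainer-profile",
}
print(render_profiles(entries, {"../doc-guide/maintainer-profile"}))
```

Only the in-tree documents go into the hidden TOC; external URLs appear in the visible list alone.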

Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: KaFai Wan @ 2026-04-15 12:52 UTC (permalink / raw)
  To: Jiayuan Chen, bpf
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David Ahern, netdev, linux-doc, linux-kernel
In-Reply-To: <0b3a3a41-f709-4414-8a5d-d2eb4959db3f@linux.dev>

On Wed, 2026-04-15 at 09:47 +0800, Jiayuan Chen wrote:
> 
> On 4/14/26 11:37 PM, mkf wrote:
> > On Tue, 2026-04-14 at 18:57 +0800, Jiayuan Chen wrote:
> 
> Hi Martin, I saw your patch. Your solution is better, please ignore mine :)
> 
I'm not Martin, just same first name :). Ok, I'll continue.
> 
> 

-- 
Thanks,
KaFai

^ permalink raw reply

* [PATCH v4 1/3] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
From: Breno Leitao @ 2026-04-15 12:55 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260415-ecc_panic-v4-0-2d0277f8f601@debian.org>

When get_hwpoison_page() returns a negative value, distinguish
reserved pages from other failure cases by reporting MF_MSG_KERNEL
instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
and should be classified accordingly for proper handling.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/memory-failure.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d43613097..7b67e43dafbd1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2432,7 +2432,16 @@ int memory_failure(unsigned long pfn, int flags)
 		}
 		goto unlock_mutex;
 	} else if (res < 0) {
-		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+		/*
+		 * PageReserved is stable here: reserved pages have
+		 * PG_reserved set at boot or by drivers and are never
+		 * freed through the page allocator.
+		 */
+		if (PageReserved(p))
+			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		else
+			res = action_result(pfn, MF_MSG_GET_HWPOISON,
+					    MF_IGNORED);
 		goto unlock_mutex;
 	}
 

-- 
2.52.0


^ permalink raw reply related

* [PATCH v4 0/3] mm/memory-failure: add panic option for unrecoverable pages
From: Breno Leitao @ 2026-04-15 12:54 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team

When the memory failure handler encounters an in-use kernel page that it
cannot recover (slab, page tables, kernel stacks, vmalloc, etc.), it
currently logs the error as "Ignored" and continues operation.

This leaves corrupted data accessible to the kernel, which will inevitably
cause either silent data corruption or a delayed crash when the poisoned memory
is next accessed.

This is a common problem on large fleets. We frequently observe multi-bit ECC
errors hitting kernel slab pages, where memory_failure() fails to recover them
and the system crashes later at an unrelated code path, making root cause
analysis unnecessarily difficult.

Here is one specific example from production on an arm64 server: a multi-bit
ECC error hit a dentry cache slab page, memory_failure() failed to recover it
(slab pages are not supported by the hwpoison recovery mechanism), and 67
seconds later d_lookup() accessed the poisoned cache line causing
a synchronous external abort:

    [88690.479680] [Hardware Error]: error_type: 3, multi-bit ECC
    [88690.498473] Memory failure: 0x40272d: unhandlable page.
    [88690.498619] Memory failure: 0x40272d: recovery action for
                   get hwpoison page: Ignored
    ...
    [88757.847126] Internal error: synchronous external abort:
                   0000000096000410 [#1] SMP
    [88758.061075] pc : d_lookup+0x5c/0x220

This series adds a new sysctl vm.panic_on_unrecoverable_memory_failure
(default 0) that, when enabled, panics immediately on unrecoverable
memory failures. This provides a clean crash dump at the time of the
error, which is far more useful for diagnosis than a random crash later
at an unrelated code path.

This also categorizes reserved pages as MF_MSG_KERNEL, and panics on
unknown page types (MF_MSG_UNKNOWN).

Note that dynamically allocated kernel memory (SLAB/SLUB, vmalloc,
kernel stacks, page tables) shares the MF_MSG_GET_HWPOISON return path
with transient refcount races, so it is intentionally excluded from the
panic conditions to avoid false positives.
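
The panic condition described above can be modeled as a small predicate. This is only a Python sketch of the decision logic, not the kernel code: should_panic() and the enum classes are hypothetical names, and the set of panicking types (including MF_MSG_KERNEL_HIGH_ORDER) is taken from the v2 changelog below.

```python
from enum import Enum, auto


class MfMsg(Enum):
    """Subset of the action page types relevant to the panic decision."""
    KERNEL = auto()
    KERNEL_HIGH_ORDER = auto()
    GET_HWPOISON = auto()
    UNKNOWN = auto()


class MfResult(Enum):
    IGNORED = auto()
    RECOVERED = auto()


# Types treated as unrecoverable; GET_HWPOISON is deliberately absent
# because it shares a return path with transient refcount races.
PANIC_TYPES = {MfMsg.KERNEL, MfMsg.KERNEL_HIGH_ORDER, MfMsg.UNKNOWN}


def should_panic(sysctl_enabled, msg_type, result):
    """Return True when the modeled sysctl would trigger a panic."""
    return bool(sysctl_enabled
                and result is MfResult.IGNORED
                and msg_type in PANIC_TYPES)


print(should_panic(1, MfMsg.KERNEL, MfResult.IGNORED))        # True
print(should_panic(1, MfMsg.GET_HWPOISON, MfResult.IGNORED))  # False
print(should_panic(0, MfMsg.UNKNOWN, MfResult.IGNORED))       # False
```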

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v4:
- Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
- Split the reserved page classification (MF_MSG_KERNEL) into its own
  patch, separate from the panic mechanism.
- Document why the buddy allocator TOCTOU race (between
  get_hwpoison_page() and is_free_buddy_page()) cannot cause false
  positives: PG_hwpoison is set beforehand and check_new_page() in the
  page allocator rejects hwpoisoned pages.
- Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
  its mitigation via identify_page_state()'s two-pass design.
- Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
  panic conditions (shared path with transient races and non-reserved
  kernel memory).
- Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org

Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
  as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
  similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
  how the retry mechanism mitigates false positives from buddy allocator
  races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org

Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
  instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
  instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org

---
Breno Leitao (3):
      mm/memory-failure: report MF_MSG_KERNEL for reserved pages
      mm/memory-failure: add panic option for unrecoverable pages
      Documentation: document panic_on_unrecoverable_memory_failure sysctl

 Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++
 mm/memory-failure.c                     | 92 ++++++++++++++++++++++++++++++++-
 2 files changed, 128 insertions(+), 1 deletion(-)
---
base-commit: e6efabc0afca02efa263aba533f35d90117ab283
change-id: 20260323-ecc_panic-4e473b83087c

Best regards,
--  
Breno Leitao <leitao@debian.org>



* [PATCH v4 2/3] mm/memory-failure: add panic option for unrecoverable pages
From: Breno Leitao @ 2026-04-15 12:55 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260415-ecc_panic-v4-0-2d0277f8f601@debian.org>

Add a sysctl panic_on_unrecoverable_memory_failure that triggers a
kernel panic when memory_failure() encounters pages that cannot be
recovered. This provides a clean crash with useful debug information
rather than allowing silent data corruption.

The panic is triggered for three categories of unrecoverable failures,
all requiring result == MF_IGNORED:

- MF_MSG_KERNEL: reserved pages identified via PageReserved.

- MF_MSG_KERNEL_HIGH_ORDER: pages with refcount 0 that are not in the
  buddy allocator (e.g., tail pages of high-order kernel allocations).
  A TOCTOU race between get_hwpoison_page() and is_free_buddy_page()
  is possible when CONFIG_DEBUG_VM is disabled, since check_new_pages()
  is gated by is_check_pages_enabled() and becomes a no-op. Panicking
  is still correct: the physical memory has a hardware error regardless
  of who allocated the page.

- MF_MSG_UNKNOWN: pages that do not match any known recoverable state
  in error_states[]. A theoretical false positive from concurrent LRU
  isolation is mitigated by identify_page_state()'s two-pass design
  which rechecks using saved page_flags.

MF_MSG_GET_HWPOISON is intentionally excluded: it covers both
non-reserved kernel memory (SLAB/SLUB, vmalloc, kernel stacks, page
tables) and transient refcount races, so panicking would risk false
positives.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/memory-failure.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7b67e43dafbd1..311344f332449 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -1281,6 +1292,59 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+/*
+ * Determine whether to panic on an unrecoverable memory failure.
+ *
+ * Design rationale: This design opts for immediate panic on kernel memory
+ * failures, capturing clean crashes rather than random crashes on MF_IGNORED
+ * pages.
+ *
+ * This panics on three categories of failures (all requiring result ==
+ * MF_IGNORED, meaning the page was not recovered):
+ *
+ * - MF_MSG_KERNEL: Reserved pages (identified via PageReserved) that belong
+ *   to the kernel and cannot be recovered.
+ *
+ * - MF_MSG_KERNEL_HIGH_ORDER: Pages that get_hwpoison_page() observed as free
+ *   (refcount 0) but are not in the buddy allocator. These are kernel pages
+ *   in a transient state between allocation and freeing. A TOCTOU race
+ *   (page allocated between get_hwpoison_page() and is_free_buddy_page())
+ *   is possible when CONFIG_DEBUG_VM is disabled, since check_new_pages()
+ *   is gated by is_check_pages_enabled() and becomes a no-op. However,
+ *   panicking is still correct in this case: the physical memory has a
+ *   hardware error, so an allocated hwpoisoned page is unrecoverable.
+ *
+ * - MF_MSG_UNKNOWN: Pages that reached identify_page_state() but did not
+ *   match any known recoverable state in error_states[]. This is the
+ *   catch-all for pages whose flags do not indicate a recoverable user or
+ *   cache page (no LRU, no swapcache, no mlock, etc). A theoretical false
+ *   positive exists if concurrent LRU isolation clears PG_lru between
+ *   folio_lock() and saving page_flags, but this window is very narrow and
+ *   mitigated by identify_page_state()'s two-pass design which rechecks
+ *   using saved page_flags.
+ *
+ * Pages intentionally NOT included:
+ * - MF_MSG_GET_HWPOISON: get_hwpoison_page() failure on non-reserved pages.
+ *   This includes dynamically allocated kernel memory (SLAB/SLUB, vmalloc,
+ *   kernel stacks, page tables) which are not PageReserved and fail
+ *   get_hwpoison_page() with -EBUSY/-EIO. These share the return path with
+ *   transient refcount races, so panicking here would risk false positives.
+ *
+ * Note: Some transient races in the buddy allocator path are mitigated by
+ * memory_failure()'s retry mechanism. When take_page_off_buddy() fails,
+ * the code clears PageHWPoison and retries the entire memory_failure()
+ * flow, allowing pages to be properly reclassified with updated flags.
+ */
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	return sysctl_panic_on_unrecoverable_mf &&
+	       result == MF_IGNORED &&
+	       (type == MF_MSG_KERNEL ||
+		type == MF_MSG_KERNEL_HIGH_ORDER ||
+		type == MF_MSG_UNKNOWN);
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1298,6 +1362,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
@@ -2428,6 +2495,20 @@ int memory_failure(unsigned long pfn, int flags)
 			}
 			res = action_result(pfn, MF_MSG_BUDDY, res);
 		} else {
+			/*
+			 * The page has refcount 0 but is not in the buddy
+			 * allocator — it is a non-compound high-order kernel
+			 * page (e.g., a tail page of a high-order allocation).
+			 *
+			 * A TOCTOU race where the page transitions from
+			 * free-buddy to allocated between get_hwpoison_page()
+			 * and is_free_buddy_page() is possible when
+			 * CONFIG_DEBUG_VM is disabled (check_new_pages() is
+			 * gated by is_check_pages_enabled() and becomes a
+			 * no-op). Panicking is still correct: the physical
+			 * memory has a hardware error regardless of who
+			 * allocated the page.
+			 */
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;

-- 
2.52.0



* [PATCH v4 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl
From: Breno Leitao @ 2026-04-15 12:55 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260415-ecc_panic-v4-0-2d0277f8f601@debian.org>

Add documentation for the new vm.panic_on_unrecoverable_memory_failure
sysctl, describing the three categories of failures that trigger a
panic and noting which kernel page types are not yet covered.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c9..592ce9ec38c4b 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,42 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on three categories of
+unrecoverable failures: reserved kernel pages, non-buddy kernel pages
+with zero refcount (e.g. tail pages of high-order allocations), and
+pages whose state cannot be classified as recoverable.
+
+Note that some kernel page types — such as slab objects, vmalloc
+allocations, kernel stacks, and page tables — share a failure path
+with transient refcount races and are not currently covered by this
+option. I.e., the kernel does not panic when it is not confident of the page status.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+Example::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 

-- 
2.52.0



* RE: [PATCH v7 5/6] iio: adc: ad4691: add oversampling support
From: Sabau, Radu bogdan @ 2026-04-15 13:03 UTC (permalink / raw)
  To: Nuno Sá, David Lechner
  Cc: Jonathan Cameron, Lars-Peter Clausen, Hennerich, Michael,
	Sa, Nuno, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
	Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
	Jonathan Corbet, Shuah Khan, linux-iio@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pwm@vger.kernel.org, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org
In-Reply-To: <ad9J9C5K7tyxuztU@nsa>

> -----Original Message-----
> From: Nuno Sá <noname.nuno@gmail.com>
> Sent: Wednesday, April 15, 2026 11:21 AM

...

> > >
> > > More than this, if the OSR is 32 the maximum effective rate would be
> 31250, so 25kHz
> > > would make it the closest available one. If the user would select 1MHz from
> the available
> > > list it would be weird I would say. So perhaps a solution for this is to display
> the avail list
> > > depending on the set OSR value.
> >
> > Yes, the available list should reflect the current state of any other attributes
> > that affect it.
> 
> IMO, the above makes total sense to me.
> 
> - Nuno Sá
> 

Hi everyone and thank you so much for your feedback!

After thinking this through carefully and testing on hardware (ad4692), here is
the design I have in mind:

in_voltageN_sampling_frequency = effective rate = `osc_freq / osr[N]`:

The chip has a single internal oscillator shared by all channels; each channel
independently accumulates osr[N] oscillator cycles before producing a result.

Writing in_voltageN_sampling_frequency = freq:

The driver computes needed_osc = freq * osr[N] and snaps down to the largest
available oscillator table entry that satisfies both `osc <= needed_osc` and
exact divisibility by osr[N]. The divisibility constraint ensures the read-back
is always an exact integer.
The result is stored in a single shared `target_osc_freq_Hz`, so writing the
attribute for any channel changes the shared oscillator and therefore the
read-back of all other channels.
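
For illustration, the snap-down rule can be sketched in plain C. This is only
a sketch under assumptions: the table values below were picked to be consistent
with the numbers quoted in this mail and are not taken from the driver, and the
names `osc_table_hz` and `snap_osc_freq` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed oscillator table in Hz, sorted in descending order. */
static const unsigned int osc_table_hz[] = {
	1000000, 500000, 400000, 250000, 200000, 125000, 100000,
	50000, 25000, 12500, 10000, 5000, 2500, 1250,
};
#define OSC_TABLE_LEN (sizeof(osc_table_hz) / sizeof(osc_table_hz[0]))

/*
 * Return the largest table entry that is <= freq * osr and divisible by
 * osr, or 0 if no entry qualifies (a driver would return -EINVAL there).
 */
static unsigned int snap_osc_freq(unsigned int freq, unsigned int osr)
{
	unsigned long long needed_osc;
	size_t i;

	assert(osr > 0);
	needed_osc = (unsigned long long)freq * osr;

	/* The table is descending, so the first match is the largest one. */
	for (i = 0; i < OSC_TABLE_LEN; i++) {
		if (osc_table_hz[i] <= needed_osc && osc_table_hz[i] % osr == 0)
			return osc_table_hz[i];
	}
	return 0;
}
```

With this assumed table, a request of 10000 at OSR=32 gives needed_osc=320000
and snaps to 200000, i.e. an effective rate of 6250.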

in_voltageN_sampling_frequency_available:

Computed dynamically from the channel's current OSR. The list naturally becomes
sparser as OSR increases and is capped at `max_rate / osr[N]`, which matches the
chip's behaviour and is therefore more intuitive for the user.
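
A possible way to build that list, again as a hedged sketch: the table is an
assumption chosen to match the numbers in this mail, and `sampling_freq_avail`
is a hypothetical helper, not the driver's function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed oscillator table in Hz, sorted in descending order. */
static const unsigned int osc_table_hz[] = {
	1000000, 500000, 400000, 250000, 200000, 125000, 100000,
	50000, 25000, 12500, 10000, 5000, 2500, 1250,
};
#define OSC_TABLE_LEN (sizeof(osc_table_hz) / sizeof(osc_table_hz[0]))

/*
 * Fill out[] with osc / osr for every table entry that divides evenly by
 * osr. Returns the number of entries written, which is at most max_out.
 */
static size_t sampling_freq_avail(unsigned int osr, unsigned int *out,
				  size_t max_out)
{
	size_t i, n = 0;

	assert(osr > 0);
	for (i = 0; i < OSC_TABLE_LEN && n < max_out; i++) {
		if (osc_table_hz[i] % osr == 0)
			out[n++] = osc_table_hz[i] / osr;
	}
	return n;
}
```

For OSR=32 this yields five entries, from 31250 down to 3125; for OSR=1 every
table entry is listed unchanged.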

OSC_FREQ_REG write timing:

`target_osc_freq_Hz` is written to hardware at two points:
- Single-shot read: immediately before starting accumulation.
- CNV burst buffer enable: inside enter_conversion_mode, after the manual mode
  early return (manual mode uses SPI CS toggling, not the internal oscillator, so
  the write is skipped there).

This keeps the deferred-write benefit: both sampling_frequency and osr can be
set in any order before enabling the buffer or doing a single-shot read.

Buffer Mode:

After the user sets the desired rates/OSR for each channel, reading back the
sampling frequency of each channel gives the true effective rate. That
information can then be used to set the buffer sampling frequency accordingly,
which makes it more intuitive to use the chip with correct synchronization.

I also ran the following test on hardware and got correct results
(ad4692, 1MHz maximum internal oscillator rate):

1. Set channel 0 OSR=32. Available list: {31250, 15625, 12500, 6250, 3125}.
    Write sampling_frequency=10000 (not in the list) -> snaps to 6250 (osc=200000Hz).
    Correct readback = 6250.
2. Set channel 1 OSR=4. Read channel 1 sampling frequency -> 50000 (=200000/4).
    Shared oscillator correctly reflected across channels.
3. Change channel 0 OSR from 32 to 8. The driver recomputes as follows: the
    effective rate stays 6250 as before and needed_osc becomes 50000, an exact
    table hit. Readback channel 0: 6250 (rate preserved). Readback channel 1
    (OSR=4): 12500 (oscillator change visible).
    Channel 0 can of course be set to another available value as well, to match
    the initially requested 50k of channel 1 (in this case, set channel 0 to 25k).
4. -EINVAL rejection is atomic: with OSR=1 and SF=1250 at start for, let's say,
    channel 0, writing OSR=32 is rejected since needed_osc=40000 is not a table
    entry and there is no table entry <= 40000 that is divisible by 32. Both OSR
    and SF remain unchanged. Raising SF to 500000 first and then writing OSR=32
    succeeds: osc snaps to 1000000, readback SF=31250.

    In case (4) we could still let the user keep the sampling frequency as is
    (1250/32=39.0625), though the result would not be an exact integer but a
    rounded one (39). When another channel's OSR/rate is then changed, it would
    imply a messy change in the previous channel's SF and require an internal
    osc value that does not exist in the table (most of the time a fractional
    one), and the true SF would be lost.

Do you think this approach is the most suitable one?

Thanks,
Radu

>
> > >
> > > Linking the two together is perhaps wrong to begin with from my end,
> since in this
> > > driver's case, the per-channel sampling frequency is controlled by the
> internal oscillator
> > > which has static available values. So perhaps sampling frequency should be
> separate, and
> > > OSR separate as well, which would make everything cleaner.
> > >
> > > Indeed, the effective rate is changed by OSR, but perhaps that is something
> the user
> > > should be aware of, since the sampling frequency is the rate at which the
> channel samples
> > > (1 sample per period) and OSR is how many times the channel samples
> upon a final sample
> > > is to be read. The user already has to take this into account when setting
> the buffer
> > > sampling frequency, so it would make sense to take this into account here
> too.
> >
> > We can't change the definition of the IIO ABI just to make one driver simpler
> > to implement. The OSR and sample rate can't be completely independent.
> >
> > If you want to leave it the way it is currently implemented though, that is
> fine.
> >
> > >
> > > Please let me know your thoughts on this,
> > > Radu
> >


* Re: Volunteering to do more reviews
From: Jonathan Corbet @ 2026-04-15 13:16 UTC (permalink / raw)
  To: Konstantin Ryabitsev, linux-doc
In-Reply-To: <20260414-valiant-sticky-piculet-3b7b3f@lemur>

Konstantin Ryabitsev <mricon@kernel.org> writes:

> Jon and others:
>
> I need more direct hands-on experience doing reviews and using my own
> tooling, so I'd like to offer to do more reviewing of patches sent to
> linux-doc, if that sort of thing is welcome and I won't be stepping on
> anyone's toes.

Of course it's welcome!  I'd love to see it.

Thanks,

jon


* RE: [PATCH v7 5/6] iio: adc: ad4691: add oversampling support
From: Sabau, Radu bogdan @ 2026-04-15 13:26 UTC (permalink / raw)
  To: Nuno Sá, David Lechner
  Cc: Jonathan Cameron, Lars-Peter Clausen, Hennerich, Michael,
	Sa, Nuno, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
	Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
	Jonathan Corbet, Shuah Khan, linux-iio@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pwm@vger.kernel.org, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org
In-Reply-To: <LV9PR03MB8414CFF38DAD2BEB7AE3E704F7222@LV9PR03MB8414.namprd03.prod.outlook.com>



> -----Original Message-----
> From: Sabau, Radu bogdan
> Sent: Wednesday, April 15, 2026 4:03 PM

...
 
> > > >
> > > > More than this, if the OSR is 32 the maximum effective rate would be
> > 31250, so 25kHz
> > > > would make it the closest available one. If the user would select 1MHz
> from
> > the available
> > > > list it would be weird I would say. So perhaps a solution for this is to
> display
> > the avail list
> > > > depending on the set OSR value.
> > >
> > > Yes, the available list should reflect the current state of any other attributes
> > > that affect it.
> >
> > IMO, the above makes total sense to me.
> >
> > - Nuno Sá
> >
> 
> Hi everyone and thank you so much for your feedback!
> 
> After thinking this through carefully and testing on hardware (ad4692), here is
> the design I have in mind:
> 
> in_voltageN_sampling_frequency = effective rate = `osc_freq / osr[N]`:
> 
> The chip has a single internal oscillator shared by all channels; each channel
> independently accumulates osr[N] oscillator cycles before producing a result.
> 
> Writing in_voltageN_sampling_frequency = freq:
> 
> The driver computes the needed_osc = freq * osr[N] and snaps down to the
> largest
> available oscillator table entry satisfying both `osc <= needed_osc` and an exact
> division to osr. The divisibility constraint ensures the read-back is always an
> exact
> integer.
> The result is stored in a single shared `target_osc_freq_Hz` - writing the
> attribute
> for any channel changes the shared oscillator and therefore the read-back of
> all
> other channels.
> 
> in_voltageN_sampling_frequency_available:
> 
> Computed dynamically from the channel's current OSR. The list naturally
> becomes
> sparser as OSR increases, capping at `max_rate / osr[N]` which is exactly the
> chip's
> behaviour, and therefore more intuitive for the user.
> 
> OSC_FREQ_REG write timing:
> 
> `target_osc_freq_Hz` is written to hardware at two points:
> - Single-shot read: immediately before starting accumulation.
> - CNV burst buffer enable: inside enter_conversion_mode, after the manual
> mode
> early return (manual mode uses SPI CS toggling, not the internal oscillator, so
> the
> write is skipped there).
> 
> This keeps the deferred-write benefit - both sampling_frequency and osr can
> be
> set in any order before enabling the buffer/single-shot reading.
> 
> Buffer Mode:
> 
> After desired rates/osr are set by the user for each channel, reading back the
> sampling
> frequency of each channel gives him the true effective rate for each. Therefore
> he can use that information in order to set the buffer sampling frequency
> accordingly
> and helping him use the chip with correct synchronization more intuitively.
> 
> I have also performed the next test using the hardware and got correct results:
> - test case (ad4692, 1MHz maximum internal oscillator rate):
> 
> 1. Set channel 0 OSR=32. Available list: {31250, 15625, 12500, 6250, 3125}.
>     Write sampling_frequency=10000 (not in the list) -> snaps to 6250
> (osc=200000Hz).
>     Correct readback = 6250.
> 2. Set channel 1 OSR=4. Read channel 1 sampling frequency -> 50000
> (=200000/4).
>     Shared oscillator correctly reflected across channels.
> 3. Change channel 0 OSR from 32 to 8. Driver recomputes as follows : effective
> stays
>     6250 as before and needed_osc becomes 50000, exact table hit. Readback
> channel 0:
>     6250 (rate preserved). Readback channel 1 (OSR=4): 12500. (oscillator
> change visible).
>     The sampling for channel 0 can be of course set to another available value as
> well and
>     Make match with the initial requested 50k of channel 1. (in this case, set
> channel 0 to
>     25k).
> 4. -EINVAL rejection is atomic: with OSR=1 and SF=1250 at start for lets say
> channel 0, writing
>     OSR=32 is rejected since the needed_osc=40000, which is not a table entry
> and also has no
>     table entry <= 40000 that is divisible by 32). Both OSR and SF remain
> unchanged. Raising SF
>     to 500000 first then writing OSR=32 succeeds - osc snaps to 1000000,
> readback SF=31250.
> 
>     In (4) case we could still let the user have its sampling frequency as is
> (1250/32=39.0625),
>     though it won't result in a precise true integer value, but a rounded (39)
> one, and when
>     other channel would have OSR/rate changed it would imply a messy change
> in the previous
>     channel's SF and requiring a non-existent/matching internal osc value (most
> of the times
>     a float one), and true SF would be lost.
> 
> Do you guys think this approach suits the best?
> 
> Thanks,
> Radu

Hmm, perhaps changing the internal osc value when changing OSR is not correct.
If OSR is changed, only the effective SF of the respective channel should change,
not the whole internal osc value. The effective rate read-back then becomes
target_osc_freq / new_osr automatically: no oscillator recalculation on OSR write,
no -EINVAL.

Then, if after an OSR change the effective rate is not on the available list (as in
the earlier edge case that rounds to 39), writing `sampling_frequency` (choosing a
new available value) fixes it. The rounded 39 would still work correctly; the
reported value just would not be exact. I guess the user should be aware that
1250/32 is not exactly 39, right?
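
As a side note, the rounding could be made visible instead of hidden. The
sketch below splits the effective rate into an integer part and a micro-Hz
part, in the spirit of IIO's IIO_VAL_INT_PLUS_MICRO return type. Whether the
driver should report the rate this way is an assumption for discussion, and
`effective_rate` is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Split osc_hz / osr into an integer part (*val, in Hz) and a fractional
 * part (*val2, in micro-Hz), so that a non-integer rate is not silently
 * truncated on read-back.
 */
static void effective_rate(unsigned int osc_hz, unsigned int osr,
			   unsigned int *val, unsigned int *val2)
{
	unsigned long long rem;

	assert(osr > 0);
	*val = osc_hz / osr;
	rem = osc_hz % osr;
	/* Scale the remainder to millionths before dividing. */
	*val2 = (unsigned int)(rem * 1000000ULL / osr);
}
```

For osc=1250 and OSR=32 this gives 39 and 62500, i.e. 39.0625 Hz, while any
exact division such as 200000/4 reports a zero fractional part.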


* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-15 13:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Darrick J. Wong, John Groves, Miklos Szeredi, Joanne Koong,
	Bernd Schubert, John Groves, Dan Williams, Bernd Schubert,
	Alison Schofield, John Groves, Jonathan Corbet, Shuah Khan,
	Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <f254f6fc-dc06-4612-82d7-35bb10dbd32e@kernel.org>

On Wed, Apr 15, 2026 at 10:16:38AM +0200, David Hildenbrand (Arm) wrote:
> On 4/15/26 00:20, Gregory Price wrote:
> > On Tue, Apr 14, 2026 at 11:57:40AM -0700, Darrick J. Wong wrote:
> >>>
> >>> I very strongly object to making this a prerequisite to merging. This
> >>> is an untested idea that will certainly delay us by at least a couple
> >>> of merge windows when products are shipping now, and the existing approach
> >>> has been in circulation for a long time. It is TOO LATE!!!!!!
> >>
> > ...
> >>
> >> That said, you're clearly pissed at the goalposts changing yet again,
> >> and that's really not fair that we collectively keep moving them.
> >>
> > 
> > This seems a bit more than moving a goalpost.
> > 
> > We're now gating working software, for real working hardware, on a novel,
> > unproven BPF ops structure that controls page table mappings on page table
> > faults which would be used by exactly 1 user : FAMFS.
> 
> Are MM people on board with even letting BPF do that? Honest question,
> if someone has a pointer to how that should work, that would be appreciated.
> 

This was my first reaction when I realized the BPF program would be
controlling iomap return value in the fault path.  Big ol' (!)  popped
up over my head.

~Gregory


* Re: [PATCH v4 04/13] dt-bindings: power: supply: document Samsung S2M series PMIC charger device
From: Kaustabh Chakraborty @ 2026-04-15 14:03 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260415-swinging-radical-junglefowl-85dcf7@quoll>

On 2026-04-15 09:18 +02:00, Krzysztof Kozlowski wrote:
> On Tue, Apr 14, 2026 at 12:02:56PM +0530, Kaustabh Chakraborty wrote:
>> +description: |
>> +  The Samsung S2M series PMIC battery charger manages power interfacing
>> +  of the USB port. It may supply power, as done in USB OTG operation
>> +  mode, or it may accept power and redirect it to the battery fuelgauge
>> +  for charging.
>> +
>> +  This is a part of device tree bindings for S2M and S5M family of Power
>> +  Management IC (PMIC).
>> +
>> +  See also Documentation/devicetree/bindings/mfd/samsung,s2mps11.yaml for
>> +  additional information and example.
>> +
>> +allOf:
>> +  - $ref: power-supply.yaml#
>> +
>> +properties:
>> +  compatible:
>> +    enum:
>> +      - samsung,s2mu005-charger
>> +
>> +  port:
>> +    $ref: /schemas/graph.yaml#/properties/port
>
> That port is internal part of the device, thus should be dropped which
> leaves you with only one property - monitored battery - and therefore
> fold the node into the parent node.

And that monitored-battery belongs to power-supply.yaml. Do I then
include the allOf block in the mfd/samsung,s2mps11.yaml under the
s2mu005 compatible?

>
> Best regards,
> Krzysztof



* Re: [PATCH V10 00/10] famfs: port into fuse
From: Miklos Szeredi @ 2026-04-15 14:04 UTC (permalink / raw)
  To: Gregory Price
  Cc: David Hildenbrand (Arm), Darrick J. Wong, John Groves,
	Joanne Koong, Bernd Schubert, John Groves, Dan Williams,
	Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
	Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-UAMcALRubBcHk@gourry-fedora-PF4VCD3F>

On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:

> This was my first reaction when I realized the BPF program would be
> controlling iomap return value in the fault path.  Big ol' (!)  popped
> up over my head.

I'm wondering which part of this triggers the big (!).

BPF program being run in the fault path?

Or that the return value from the BPF function is used as iomap?

Or something else?

Thanks,
Miklos
