Linux Documentation

* [PATCH v7 2/9] bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
From: Breno Leitao @ 2026-06-26 12:50 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

xbc_node_for_each_key_value() walks to the first leaf under @root, and
when @root is itself a leaf it yields @root. That happens not only for
an empty "kernel {}" subtree, but also when @root carries both a value
and subkeys, e.g.

	kernel = x
	kernel.foo = bar

Here @root ("kernel") is a leaf because its first child is the value
node "x", so the iterator returns @root first. Feeding @root back into
xbc_node_compose_key_after(root, root) returns -EINVAL, which the only
in-kernel caller papers over with a "len <= 0" check -- but the
follow-up tools/bootconfig -C user propagates the error and turns such
a bootconfig into a build failure. Worse, short-circuiting the whole
call on a leaf @root would silently drop the valid "kernel.foo = bar"
descendant that this patch should render.

Skip @root inside the loop instead of bailing out: the value-only entry
is dropped (it is rendered through the "kernel" cmdline path, not here),
while real descendant keys are still emitted. An entirely empty subtree
now renders nothing and returns 0 rather than -EINVAL, matching the
"nothing to render is not an error" semantics expected by the new
build-time caller.
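
The skip-instead-of-bail behaviour described above can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
struct node, compose_key_after() and render_keys() are hypothetical toy
stand-ins, not the kernel's struct xbc_node or the real bootconfig API,
and the "leaves" array stands in for whatever order the real iterator
yields.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Toy node: a name and a parent link; not the kernel's struct xbc_node. */
struct node {
	const char *name;
	const struct node *parent;
};

/* Compose the dotted key of @n relative to @root; -EINVAL if @n is @root. */
static int compose_key_after(const struct node *root, const struct node *n,
			     char *buf, size_t size)
{
	int len, ret;

	if (n == root || !n->parent)
		return -EINVAL;
	if (n->parent == root)
		return snprintf(buf, size, "%s", n->name);

	len = compose_key_after(root, n->parent, buf, size);
	if (len < 0)
		return len;
	if ((size_t)len >= size)
		return -ERANGE;
	ret = snprintf(buf + len, size - len, ".%s", n->name);
	return ret < 0 ? ret : len + ret;
}

/*
 * Render every key in @leaves as "key ". A leaf equal to @root is skipped
 * rather than treated as an error, so a value-only or empty root renders
 * nothing and the function returns 0.
 */
static int render_keys(const struct node *root,
		       const struct node *const leaves[], size_t n,
		       char *out, size_t size)
{
	char key[64];
	size_t len = 0;
	size_t i;
	int ret;

	assert(root && (out || size == 0));
	if (size)
		out[0] = '\0';

	for (i = 0; i < n; i++) {
		if (leaves[i] == root)
			continue;	/* skip @root, keep its descendants */

		ret = compose_key_after(root, leaves[i], key, sizeof(key));
		if (ret < 0)
			return ret;

		ret = snprintf(len < size ? out + len : NULL,
			       len < size ? size - len : 0, "%s ", key);
		if (ret < 0)
			return ret;
		len += ret;
	}
	return (int)len;
}
```

With a toy tree kernel -> foo -> bar and an iteration order of
{ kernel, bar }, the sketch renders only "foo.bar "; with { kernel } alone
it renders an empty string and returns 0.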

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 2ed9ee3dc81c7..926094d97397e 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -440,6 +440,17 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 	 * itself is well defined and returns the would-be length.
 	 */
 	xbc_node_for_each_key_value(root, knode, val) {
+		/*
+		 * An empty or value-only @root (e.g. "kernel {}" or
+		 * "kernel = x", possibly alongside "kernel.foo = bar")
+		 * yields @root itself here. Skip it: composing a key for it
+		 * would fail with -EINVAL, yet any real descendant keys must
+		 * still be rendered. An entirely empty subtree then renders
+		 * nothing and returns 0 rather than an error.
+		 */
+		if (knode == root)
+			continue;
+
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
 		if (ret < 0)

-- 
2.53.0-Meta

* [PATCH v7 0/9] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-06-26 12:50 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier

The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
already landed; this series wires the rendered cmdline into the kernel.

Motivation: today the embedded bootconfig is parsed at runtime, after
parse_early_param() has already run, so early_param() handlers can't
see embedded values. Folding the kernel.* subtree into the cmdline at
build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
users without forcing them to maintain two cmdline sources.

Behaviorally, the "kernel" subtree is rendered to a flat string at
build time and stashed in .init.rodata. setup_arch() prepends it to
boot_command_line before parse_early_param() runs. Overflow is a soft
error: the helper logs and leaves boot_command_line untouched rather
than panicking, so an oversized embedded bconf cannot brick a boot.
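
The "overflow is a soft error" behaviour can be illustrated with a small
userspace sketch. prepend_cmdline() is a hypothetical name used only for
this example; it is not the xbc_prepend_embedded_cmdline() helper from
this series, whose signature and logging are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: prepend @extra plus a separating space to the
 * NUL-terminated string in @cmdline, which has capacity @size. On
 * overflow, leave @cmdline untouched and report failure instead of
 * truncating.
 */
static bool prepend_cmdline(char *cmdline, size_t size, const char *extra)
{
	size_t cur, add;

	assert(cmdline && extra);
	cur = strlen(cmdline);
	add = strlen(extra);

	if (add == 0)
		return true;		/* nothing to prepend */

	/* Room needed: extra, one space, the old text, the NUL. */
	if (add + 1 + cur + 1 > size)
		return false;		/* soft error: caller logs and moves on */

	memmove(cmdline + add + 1, cmdline, cur + 1);	/* old text + NUL */
	memcpy(cmdline, extra, add);
	cmdline[add] = ' ';
	return true;
}
```

The size check happens before any byte is moved, which is what makes the
failure path safe: the caller either gets the full combined string or the
original one, never a half-written buffer.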

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v7:
- The runtime opt-in now shares one helper instead of open-coding its
  own. (Masami)
- bootconfig_cmdline_requested() moved into generic lib code (Masami)
- Link to v6: https://lore.kernel.org/r/20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org

Changes in v6:
- renamed CONFIG_BOOT_CONFIG_EMBED_CMDLINE to
  CONFIG_CMDLINE_FROM_BOOTCONFIG
- prepend embedded bootconfig cmdline before parse_early_param
- Link to v5: https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org

Changes in v5:
- Patch 3 (Kconfig): drop the redundant "depends on BOOT_CONFIG_EMBED"
  from CMDLINE_FROM_BOOTCONFIG; Julian Braha.
- Patch 6 (Documentation): spell out how the embedded cmdline interacts
  with the bootloader cmdline, an initrd bootconfig, and the embedded
  bootconfig
- Link to v4: https://lore.kernel.org/r/20260609-bootconfig_using_tools-v4-0-73c463f03a97@debian.org

Changes in v4:
- Patch 3 (build pipeline): clear CROSS_COMPILE= in the kernel-side
  tools/bootconfig sub-make. Without it, an LLVM=1 cross build
  inherits CROSS_COMPILE and tools/scripts/Makefile.include injects
  --target=/--sysroot= into the host clang, producing a target
  binary that fails to exec.
- Patch 3 (build pipeline): place embedded-cmdline.S in its own
  .init.rodata.embed_cmdline subsection ("a") so ld.lld does not
  see a section-type mismatch against lib/bootconfig-data.S's
  writable .init.rodata ("aw"). The linker's *(.init.rodata
  .init.rodata.*) glob still folds it into the init image.
- Patch 6 (x86/setup): also accept the bootconfig=<anything> form
  via cmdline_find_option(), matching the runtime parse_args() loop.
  Without it, bootconfig=0/=off would skip the early prepend but
  still trigger the late runtime apply -- a split-brain state.
- New patch 7: document CONFIG_CMDLINE_FROM_BOOTCONFIG in
  Documentation/admin-guide/bootconfig.rst (semantics, opt-in,
  precedence, overflow behavior, example).
- Link to v3: https://lore.kernel.org/r/20260608-bootconfig_using_tools-v3-0-4ddd079a0696@debian.org

Changes in v3:
- Patch 3: Move HOSTCC override to the kernel-side rule; tool keeps
  $(CC) for standalone/cross builds.
- Patch 6: Drop the false fail-safe wording; document the
  BOOT_CONFIG_FORCE=y default interaction.
- Link to v2:
  https://lore.kernel.org/r/20260605-bootconfig_using_tools-v2-0-d309f544b5f7@debian.org

Changes in v2 (addressing review of v1):
- Split out a standalone fix for the NULL-pointer arithmetic in
  xbc_snprint_cmdline() so the build-time render cannot trip host
  UBSan/FORTIFY_SOURCE.
- Rework the leaf-root handling: instead of returning early, skip @root
  inside the loop so a root carrying both a value and subkeys
  (kernel = x together with kernel.foo = bar) still renders its
  descendant keys.
- Build tools/bootconfig with $(HOSTCC) so cross-compiled (ARCH=...)
  builds render the cmdline on the build host instead of failing with
  "Exec format error".
- Mark the embedded cmdline section read-only (drop the "w" flag from
  .init.rodata).
- Add a make-clean hook so tools/bootconfig artifacts are removed by
  make clean.
- Gate the x86 prepend on "bootconfig" being present on the command
  line (or CONFIG_BOOT_CONFIG_FORCE), matching the init.* opt-in
  semantics documented in bootconfig.rst and preserving fail-safe
  recovery: dropping "bootconfig" from the bootloader cmdline now also
  disables the embedded kernel.* keys.
- Link to v1: https://patch.msgid.link/20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org

---
Breno Leitao (9):
      bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
      bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
      bootconfig: render embedded bootconfig as a kernel cmdline at build time
      bootconfig: clean build-time tools/bootconfig from make clean
      bootconfig: add xbc_prepend_embedded_cmdline() helper
      Documentation: bootconfig: document build-time cmdline rendering
      x86/setup: prepend embedded bootconfig cmdline before parse_early_param
      bootconfig: skip runtime kernel.* render once prepended early
      init/main.c: use bootconfig_cmdline_requested() for the runtime opt-in

 Documentation/admin-guide/bootconfig.rst |  81 ++++++++++++++++
 MAINTAINERS                              |   1 +
 Makefile                                 |  27 +++++-
 arch/x86/Kconfig                         |   1 +
 arch/x86/kernel/setup.c                  |  14 ++-
 include/linux/bootconfig.h               |  14 +++
 init/Kconfig                             |  36 +++++++
 init/main.c                              |  52 +++++-----
 lib/Makefile                             |  16 +++
 lib/bootconfig.c                         | 162 +++++++++++++++++++++++++++++--
 lib/embedded-cmdline.S                   |  16 +++
 tools/bootconfig/Makefile                |   4 +-
 12 files changed, 388 insertions(+), 36 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a

Best regards,
-- 
Breno Leitao <leitao@debian.org>


* [PATCH v7 1/9] bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
From: Breno Leitao @ 2026-06-26 12:50 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

xbc_snprint_cmdline() is meant to be called twice: first with
buf=NULL, size=0 to probe the rendered length, then with a real
buffer to fill it (the standard snprintf() two-pass pattern). The
probe call makes the function compute "buf + size" (NULL + 0) and,
on every iteration, advance "buf += ret" from that NULL base and
pass the result back into snprintf().

Pointer arithmetic on a NULL pointer is undefined behavior. It is
harmless in the in-kernel callers today, but the follow-up patches
run this same code in the userspace tools/bootconfig parser at kernel
build time, where host UBSan / FORTIFY_SOURCE abort the build.

Track a running written length (size_t) instead of mutating @buf, and
only form "buf + len" when @buf is non-NULL. snprintf(NULL, 0, ...)
is itself well defined and returns the would-be length, so the
two-pass "probe then fill" usage returns identical byte counts.
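
As a rough userspace illustration of the running-length approach (not the
kernel code itself): render_pairs() and rest() below are hypothetical
names, and the sketch additionally passes NULL once the buffer is full so
that it never forms a pointer past the end of @buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Remaining space for snprintf(): 0 once the buffer is full or absent. */
static size_t rest(size_t len, size_t size)
{
	return len < size ? size - len : 0;
}

/*
 * Hypothetical renderer: prints "key=value " pairs into @buf and returns
 * the length the full output would need, as snprintf() does.
 */
static int render_pairs(char *buf, size_t size,
			const char *const keys[], const char *const vals[],
			size_t n)
{
	size_t len = 0;
	size_t i;
	int ret;

	/* A NULL buffer is only valid for the size probe. */
	assert(buf || size == 0);

	for (i = 0; i < n; i++) {
		/* Form "buf + len" only for a real, not yet full buffer. */
		char *dst = (buf && len < size) ? buf + len : NULL;

		ret = snprintf(dst, rest(len, size), "%s=%s ",
			       keys[i], vals[i]);
		if (ret < 0)
			return ret;
		len += ret;
	}
	return (int)len;
}
```

A caller would first invoke render_pairs(NULL, 0, ...) to learn the
required length, allocate that length plus one byte, and call it again
with the real buffer; both calls return the same count.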

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd9..2ed9ee3dc81c7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -427,10 +427,18 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
 int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 {
 	struct xbc_node *knode, *vnode;
-	char *end = buf + size;
 	const char *val, *q;
+	size_t len = 0;
 	int ret;
 
+	/*
+	 * Track the running written length rather than advancing @buf, so we
+	 * never form "buf + size" or "buf += ret" while @buf is NULL (the
+	 * size-probe call passes buf=NULL, size=0). NULL pointer arithmetic
+	 * is undefined behavior and trips host UBSan / FORTIFY_SOURCE when
+	 * this renderer runs at kernel build time. snprintf(NULL, 0, ...)
+	 * itself is well defined and returns the would-be length.
+	 */
 	xbc_node_for_each_key_value(root, knode, val) {
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
@@ -439,10 +447,11 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 
 		vnode = xbc_node_get_child(knode);
 		if (!vnode) {
-			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s ", xbc_namebuf);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 			continue;
 		}
 		xbc_array_for_each_value(vnode, val) {
@@ -452,15 +461,15 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 			 * whitespace.
 			 */
 			q = strpbrk(val, " \t\r\n") ? "\"" : "";
-			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
-				       xbc_namebuf, q, val, q);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s=%s%s%s ", xbc_namebuf, q, val, q);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 		}
 	}
 
-	return buf - (end - size);
+	return len;
 }
 #undef rest
 

-- 
2.53.0-Meta


* Re: [PATCH v5 04/24] cpumask: Introduce cpu_preferred_mask
From: Yury Norov @ 2026-06-26 12:40 UTC
  To: Peter Zijlstra
  Cc: Shrikanth Hegde, linux-kernel, mingo, juri.lelli, vincent.guittot,
	yury.norov, kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini,
	seanjc, vschneid, huschle, rostedt, dietmar.eggemann, maddy,
	srikar, hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260626093901.GN1181229@noisy.programming.kicks-ass.net>

On Fri, Jun 26, 2026 at 11:39:01AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2026 at 06:16:28PM +0530, Shrikanth Hegde wrote:
> 
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index 80211900f373..5a643d608ea6 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -120,12 +120,20 @@ extern struct cpumask __cpu_enabled_mask;
> >  extern struct cpumask __cpu_present_mask;
> >  extern struct cpumask __cpu_active_mask;
> >  extern struct cpumask __cpu_dying_mask;
> > +
> > +#ifdef CONFIG_PREFERRED_CPU
> > +extern struct cpumask __cpu_preferred_mask;
> > +#else
> > +#define __cpu_preferred_mask __cpu_active_mask
> > +#endif
> 
> This is cute, but does it not result in set_cpu_preferred() changing
> the active mask, and is that not somewhat unexpected behaviour?

I agree, and I think I already commented on it in the previous round.
set_cpu_preferred() should be protected the same way as the
corresponding mask, and should be a NOP when CONFIG_PREFERRED_CPU
is disabled.
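
A minimal userspace sketch of that shape, for illustration only: the
masks are toy unsigned long bitmaps rather than struct cpumask, and
cpu_preferred_bits() and NR_TOY_CPUS are hypothetical names that do not
appear in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's cpumask: one bit per CPU. */
#define NR_TOY_CPUS	(8 * sizeof(unsigned long))

static unsigned long active_mask;

static inline void set_cpu_active(unsigned int cpu, bool active)
{
	assert(cpu < NR_TOY_CPUS);
	if (active)
		active_mask |= 1UL << cpu;
	else
		active_mask &= ~(1UL << cpu);
}

#ifdef CONFIG_PREFERRED_CPU
static unsigned long preferred_mask;

static inline void set_cpu_preferred(unsigned int cpu, bool preferred)
{
	assert(cpu < NR_TOY_CPUS);
	if (preferred)
		preferred_mask |= 1UL << cpu;
	else
		preferred_mask &= ~(1UL << cpu);
}
#define cpu_preferred_bits()	(preferred_mask)
#else
/* No separate mask: readers see the active mask, writers do nothing. */
static inline void set_cpu_preferred(unsigned int cpu, bool preferred)
{
	(void)cpu;
	(void)preferred;
}
#define cpu_preferred_bits()	(active_mask)
#endif
```

With the option off, set_cpu_preferred(cpu, false) can no longer clear a
bit in the active mask behind the caller's back, while readers of the
preferred mask still see the active one.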

> >  #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
> >  #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
> >  #define cpu_enabled_mask   ((const struct cpumask *)&__cpu_enabled_mask)
> >  #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
> >  #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
> >  #define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
> > +#define cpu_preferred_mask ((const struct cpumask *)&__cpu_preferred_mask)
> >  
> >  extern atomic_t __num_online_cpus;
> >  extern unsigned int __num_possible_cpus;
> 
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index bc4f7a9ba64e..d623a9c5554a 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -3107,6 +3107,11 @@ EXPORT_SYMBOL(__cpu_dying_mask);
> >  atomic_t __num_online_cpus __read_mostly;
> >  EXPORT_SYMBOL(__num_online_cpus);
> >  
> > +#ifdef CONFIG_PREFERRED_CPU
> > +struct cpumask __cpu_preferred_mask __read_mostly;
> > +EXPORT_SYMBOL(__cpu_preferred_mask);
> > +#endif
> 
> Precedent is definitely towards !GPL exports for this, but could we get
> away with making this one GPL?
> 
> 
> > @@ -3164,6 +3169,7 @@ void __init boot_cpu_init(void)
> >  	/* Mark the boot cpu "present", "online" etc for SMP and UP case */
> >  	set_cpu_online(cpu, true);
> >  	set_cpu_active(cpu, true);
> > +	set_cpu_preferred(cpu, true);
> 
> This sets active twice, which is harmless, but wasteful...

I think a good criterion for the correctness of this series would be
identical binaries before the series and after it with
CONFIG_PREFERRED_CPU off. At least as a mental model. This double-set
chunk breaks that model.

Thanks,
Yury
 
> >  	set_cpu_present(cpu, true);
> >  	set_cpu_possible(cpu, true);
> >  
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2f4530eb543f..9e16946c9d62 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8685,6 +8685,9 @@ int sched_cpu_activate(unsigned int cpu)
> >  	 */
> >  	sched_set_rq_online(rq, cpu);
> >  
> > +	/* preferred is subset of active and follows its state */
> > +	set_cpu_preferred(cpu, true);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -8698,6 +8701,8 @@ int sched_cpu_deactivate(unsigned int cpu)
> >  	if (ret)
> >  		return ret;
> >  
> > +	set_cpu_preferred(cpu, false);
> > +
> >  	/*
> >  	 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
> >  	 * load balancing when not active
> 
> But this one clears active earlier, is that not a problem?
> 
> Perhaps it is best if the modifier is a no-op when preferred mask does
> not exist?

* Re: [PATCH net-next] Documentation: networking: Add a test plan for ethtool pause validation
From: Andrew Lunn @ 2026-06-26 12:39 UTC
  To: Maxime Chevallier
  Cc: Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni, Simon Horman,
	Russell King, Heiner Kallweit, Jonathan Corbet, Shuah Khan,
	Oleksij Rempel, Vladimir Oltean, Florian Fainelli,
	thomas.petazzoni, netdev, linux-kernel, linux-doc
In-Reply-To: <65d26fd2-fbb3-49cd-a9ac-07863d9a8909@bootlin.com>

On Fri, Jun 26, 2026 at 10:33:50AM +0200, Maxime Chevallier wrote:
> 
> > Sphinx follows Python's object-oriented structure. So you could have a
> > class test_ethtool_pause_advertising, with class documentation. And
> > then methods within the class which are individual tests.  The
> > commented out section would then be method documentation.
> 
> Good point, so maybe something along these lines :
> 
>  - A class for the test
>  - methods for individual tests
>  - For readability, I've written what the internal test helper would look
>    like (_adv_test), and what a test would look like without the helper in
>    adv_rx_on_tx_on().
> 
> I'm already diving into coding, but it helps me a bit in the definition of the
> "description" format :)
> 
> this is what the class would look like :

I like this :-)

> 
> 
>     @ksft_ethtool_needs_supported_allof([Pause])
>     def adv_rx_on_tx_on(cfg, peer) -> None:

Using decorators is a nice idea. Since it is not a C concept, please
give the decorator a good comment explaining what it does. We should
not assume driver developers know Python.

>         """Advertising test with rx on tx on
> 
>         - run 'ethtool -A ethX rx on tx on autoneg on'
>         - FAIL if the return isn't 0
>         - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values do not contain
>           "Pause" or contains "Asym_Pause"
>         - FAIL if peer's lp_advertising doesn't contain "Pause" or contains
>           "Asym_Pause"
>         - Succeed otherwise
>         """
>         ret = cfg.run('ethtool -A ethX rx on tx on autoneg on')
>         ksft_eq(ret, 0)
> 
>         linkmodes = cfg.get_advertising()
>         ksft_in('Pause', linkmodes, "rx on tx on must advertise Pause")
>         ksft_not_in('Asym_Pause', linkmodes, "rx on tx on must not advertise Asym_Pause")
> 
>         remote_linkmodes = peer.get_lp_advertising()
>         ksft_in('Pause', remote_linkmodes, "PHY does not advertise Pause")
>         ksft_not_in('Asym_Pause', remote_linkmodes, "PHY incorrectly advertises Asym_Pause")

There should be a sleep in here somewhere, to allow the autoneg to
complete.

	Andrew

* Re: [PATCH v3] mm/mempool: Untangle CONFIG_SLUB_DEBUG_ON abuse and switch to static key
From: Vlastimil Babka (SUSE) @ 2026-06-26 11:14 UTC
  To: lirongqing, Jonathan Corbet, Shuah Khan, Harry Yoo, Andrew Morton,
	Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	linux-doc, linux-kernel, linux-mm
  Cc: Matthew Wilcox, Usama Arif
In-Reply-To: <20260604110318.2089-1-lirongqing@baidu.com>

On 6/4/26 13:03, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> The mempool subsystem historically wrapped its debugging logic inside an
> #ifdef CONFIG_SLUB_DEBUG_ON block. That option merely defines
> compile-time defaults for SLUB, and relying on it caused two flaws:
> 
> 1. On production kernels where CONFIG_SLUB_DEBUG=y but
>    CONFIG_SLUB_DEBUG_ON=n, mempool debugging was completely compiled out
>    at compile time.
> 2. On kernels with CONFIG_SLUB_DEBUG_ON=y, mempool debugging stayed active
>    even if a user explicitly disabled slub debugging at boot time.
> 
> Clean up this mess by removing the #ifdef and switching to a runtime static
> key (mempool_debug_enabled), allowing mempool debugging to be toggled
> cleanly via its own boot parameter.
> 
> Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Harry Yoo <harry@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hao Li <hao.li@linux.dev>
> Cc: Christoph Lameter <cl@gentwo.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Usama Arif <usama.arif@linux.dev>

Added to slab/for-7.3/misc, will expose to -next after 7.2-rc1.
Thanks!


* Re: [PATCH 19/19] MAINTAINERS: add Rambus CryptoManager Hub (CMH)
From: Krzysztof Kozlowski @ 2026-06-26 10:57 UTC
  To: Saravanakrishnan Krishnamoorthy
  Cc: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
	devicetree, Joel Wittenauer, linux-api, linux-crypto, linux-doc,
	linux-kernel, linux-kselftest, linux-riscv, Shuah Khan,
	sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-20-skrishnamoorthy@rambus.com>

On Thu, Jun 25, 2026 at 10:33:27AM -0700, Saravanakrishnan Krishnamoorthy wrote:
> From: Alex Ousherovitch <aousherovitch@rambus.com>
> 
> Add MAINTAINERS entry for the CRI CryptoManager Hub (CMH) hardware
> crypto accelerator driver under drivers/crypto/cmh/.
> 
> Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
> Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
> Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
> Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
> Reviewed-by: Thi Nguyen <thin@rambus.com>

Did these people really provide you with a Reviewer's statement of
oversight? Do they understand what it means?

> ---
>  MAINTAINERS | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 90034eb7874e..ecb389795e3d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6797,6 +6797,25 @@ F:       kernel/cred.c
>  F:     rust/kernel/cred.rs
>  F:     Documentation/security/credentials.rst
> 
> +CRI CRYPTOMANAGER HUB (CMH) HARDWARE CRYPTO ACCELERATOR
> +M:     Alex Ousherovitch <aousherovitch@rambus.com>
> +M:     Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
> +R:     Joel Wittenauer <Joel.Wittenauer@cryptography.com>
> +R:     Thi Nguyen <thin@rambus.com>
> +L:     linux-crypto@vger.kernel.org
> +L:     sipsupport@rambus.com (moderated for non-subscribers)

NAK, drop. You are not allowed to add here internal moderated mailing
lists. We are not going to participate in your corporate dances.

> +S:     Maintained
> +T:     git https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git

Drop, you do not have commit rights there.

> +F:     Documentation/ABI/testing/cmh-mgmt
> +F:     Documentation/ABI/testing/debugfs-driver-cmh
> +F:     Documentation/ABI/testing/sysfs-driver-cmh
> +F:     Documentation/crypto/device_drivers/cmh.rst
> +F:     Documentation/devicetree/bindings/crypto/cri,cmh.yaml
> +F:     Documentation/userspace-api/ioctl/cmh_mgmt.rst
> +F:     drivers/crypto/cmh/
> +F:     include/uapi/linux/cmh_mgmt_ioctl.h
> +F:     tools/testing/selftests/drivers/crypto/cmh/
> +
>  INTEL CRPS COMMON REDUNDANT PSU DRIVER
>  M:     Ninad Palsule <ninad@linux.ibm.com>
>  L:     linux-hwmon@vger.kernel.org
> --
> 2.43.7
> 
> 
> ** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Heh, I should have ignored your message...


Best regards,
Krzysztof


* Re: [PATCH 01/19] dt-bindings: crypto: add Rambus CryptoManager Hub
From: Krzysztof Kozlowski @ 2026-06-26 10:55 UTC
  To: Saravanakrishnan Krishnamoorthy
  Cc: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
	devicetree, Joel Wittenauer, linux-api, linux-crypto, linux-doc,
	linux-kernel, linux-kselftest, linux-riscv, Shuah Khan,
	sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-2-skrishnamoorthy@rambus.com>

On Thu, Jun 25, 2026 at 10:33:09AM -0700, Saravanakrishnan Krishnamoorthy wrote:
> From: Alex Ousherovitch <aousherovitch@rambus.com>
> 
> Add device tree binding schema for the CRI CryptoManager Hub (CMH)
> hardware crypto accelerator.  The binding covers the parent SoC-level
> node with register region, interrupt, DMA properties, and per-core
> child nodes identified by compatible string and unit address.

...

> 
> ** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

OK, we are done. I am removing your posting from Patchwork.

Best regards,
Krzysztof


* Re: [PATCH v2 2/2] hwmon: (chipcap2) Add support for label
From: Javier Carrasco @ 2026-06-26 10:14 UTC
  To: Flaviu Nistor, Guenter Roeck, Javier Carrasco, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, linux-kernel, devicetree, linux-doc
In-Reply-To: <20260625160423.17882-2-flaviu.nistor@gmail.com>

On Thu Jun 25, 2026 at 6:04 PM CEST, Flaviu Nistor wrote:
> Add support for label sysfs attribute similar to other hwmon devices.
> This is particularly useful for systems with multiple sensors on the
> same board, where identifying individual sensors is much easier since
> labels can be defined via device tree.
>
> Signed-off-by: Flaviu Nistor <flaviu.nistor@gmail.com>
> ---
> Changes in v2:
> - No change for this patch in the patch series.
> - Link to v1: https://lore.kernel.org/all/20260622122200.14245-1-flaviu.nistor@gmail.com/
>
>  Documentation/hwmon/chipcap2.rst |  2 ++
>  drivers/hwmon/chipcap2.c         | 25 +++++++++++++++++++++++--
>  2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/hwmon/chipcap2.rst b/Documentation/hwmon/chipcap2.rst
> index dc165becc64c..c38d87b91b69 100644
> --- a/Documentation/hwmon/chipcap2.rst
> +++ b/Documentation/hwmon/chipcap2.rst
> @@ -70,4 +70,6 @@ humidity1_min_hyst:             RW      humidity low hystersis
>  humidity1_max_hyst:             RW      humidity high hystersis
>  humidity1_min_alarm:            RO      humidity low alarm indicator
>  humidity1_max_alarm:            RO      humidity high alarm indicator
> +humidity1_label:                RO      descriptive name for the sensor
> +temp1_label:                    RO      descriptive name for the sensor
>  =============================== ======= ========================================
> diff --git a/drivers/hwmon/chipcap2.c b/drivers/hwmon/chipcap2.c
> index 4aecf463180f..086571d556b7 100644
> --- a/drivers/hwmon/chipcap2.c
> +++ b/drivers/hwmon/chipcap2.c
> @@ -22,6 +22,8 @@
>  #include <linux/irq.h>
>  #include <linux/module.h>
>  #include <linux/regulator/consumer.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/property.h>
>
>  #define CC2_START_CM			0xA0
>  #define CC2_START_NOM			0x80
> @@ -83,6 +85,7 @@ struct cc2_data {
>  	struct i2c_client *client;
>  	struct regulator *regulator;
>  	const char *name;
> +	const char *label;
>  	int irq_ready;
>  	int irq_low;
>  	int irq_high;
> @@ -449,6 +452,8 @@ static umode_t cc2_is_visible(const void *data, enum hwmon_sensor_types type,
>  		switch (attr) {
>  		case hwmon_humidity_input:
>  			return 0444;
> +		case hwmon_humidity_label:
> +			return cc2->label ? 0444 : 0;
>  		case hwmon_humidity_min_alarm:
>  			return cc2->rh_alarm.low_alarm_visible ? 0444 : 0;
>  		case hwmon_humidity_max_alarm:
> @@ -466,6 +471,8 @@ static umode_t cc2_is_visible(const void *data, enum hwmon_sensor_types type,
>  		switch (attr) {
>  		case hwmon_temp_input:
>  			return 0444;
> +		case hwmon_temp_label:
> +			return cc2->label ? 0444 : 0;
>  		default:
>  			return 0;
>  		}
> @@ -552,6 +559,16 @@ static int cc2_humidity_max_alarm_status(struct cc2_data *data, long *val)
>  	return 0;
>  }
>
> +static int cc2_read_string(struct device *dev, enum hwmon_sensor_types type,
> +			   u32 attr, int channel, const char **str)
> +{
> +	struct cc2_data *data = dev_get_drvdata(dev);
> +
> +	*str = data->label;
> +
> +	return 0;
> +}
> +
>  static int cc2_read(struct device *dev, enum hwmon_sensor_types type, u32 attr,
>  		    int channel, long *val)
>  {
> @@ -670,8 +687,9 @@ static int cc2_request_alarm_irqs(struct cc2_data *data, struct device *dev)
>  }
>
>  static const struct hwmon_channel_info *cc2_info[] = {
> -	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT),
> -	HWMON_CHANNEL_INFO(humidity, HWMON_H_INPUT | HWMON_H_MIN | HWMON_H_MAX |
> +	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_LABEL),
> +	HWMON_CHANNEL_INFO(humidity, HWMON_H_INPUT | HWMON_H_LABEL |
> +			   HWMON_H_MIN | HWMON_H_MAX |
>  			   HWMON_H_MIN_HYST | HWMON_H_MAX_HYST |
>  			   HWMON_H_MIN_ALARM | HWMON_H_MAX_ALARM),
>  	NULL
> @@ -680,6 +698,7 @@ static const struct hwmon_channel_info *cc2_info[] = {
>  static const struct hwmon_ops cc2_hwmon_ops = {
>  	.is_visible = cc2_is_visible,
>  	.read = cc2_read,
> +	.read_string = cc2_read_string,
>  	.write = cc2_write,
>  };
>
> @@ -710,6 +729,8 @@ static int cc2_probe(struct i2c_client *client)
>  		return dev_err_probe(dev, PTR_ERR(data->regulator),
>  				     "Failed to get regulator\n");
>
> +	device_property_read_string(dev, "label", &data->label);
> +
>  	ret = cc2_request_ready_irq(data, dev);
>  	if (ret)
>  		return dev_err_probe(dev, ret, "Failed to request ready irq\n");

Reviewed-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>

* Re: [PATCH] Docs: conf.py: fix typos in comments
From: Mauro Carvalho Chehab @ 2026-06-26 10:07 UTC
  To: Randy Dunlap
  Cc: linux-doc, Jonathan Corbet, Shuah Khan, Mauro Carvalho Chehab
In-Reply-To: <20260626005248.1121464-1-rdunlap@infradead.org>

On Thu, 25 Jun 2026 17:52:48 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> Change "variabled" to "variables".
> Change "relative patch" to "relative path".
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <skhan@linuxfoundation.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> 
>  Documentation/conf.py |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- linext-2026-0623.orig/Documentation/conf.py
> +++ linext-2026-0623/Documentation/conf.py
> @@ -61,12 +61,12 @@ manpages_url = 'https://man7.org/linux/m
>  
>  def config_init(app, config):
>      """
> -    Initialize path-dependent variabled
> +    Initialize path-dependent variables
>  
>      On Sphinx, all directories are relative to what it is passed as
>      SOURCEDIR parameter for sphinx-build. Due to that, all patterns
>      that have directory names on it need to be dynamically set, after
> -    converting them to a relative patch.
> +    converting them to a relative path.
>  
>      As Sphinx doesn't include any patterns outside SOURCEDIR, we should
>      exclude relative patterns that start with "../".

-- 
Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH v2 1/2] dt-bindings: hwmon: chipcap2: Add label property
From: Krzysztof Kozlowski @ 2026-06-26 10:05 UTC (permalink / raw)
  To: Flaviu Nistor
  Cc: Guenter Roeck, Javier Carrasco, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Jonathan Corbet, Shuah Khan, linux-hwmon,
	linux-kernel, devicetree, linux-doc
In-Reply-To: <20260625160423.17882-1-flaviu.nistor@gmail.com>

On Thu, Jun 25, 2026 at 07:04:22PM +0300, Flaviu Nistor wrote:
> Add support for an optional label property similar to other hwmon devices.
> This allows, in case of boards with multiple CHIPCAP2 sensors, to assign
> distinct names to each instance.
> 
> Signed-off-by: Flaviu Nistor <flaviu.nistor@gmail.com>
> ---
> Changes in v2:
> - Implement suggestion from Javier Carrasco as proposed by Krzysztof Kozlowski.
> - Link to v1: https://lore.kernel.org/all/20260622122200.14245-1-flaviu.nistor@gmail.com/
> 
>  .../devicetree/bindings/hwmon/amphenol,chipcap2.yaml        | 6 ++++++
>  1 file changed, 6 insertions(+)

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>

Best regards,
Krzysztof


^ permalink raw reply

* Re: [PATCH v5 09/24] sched/fair: Pull the load on preferred CPU
From: Peter Zijlstra @ 2026-06-26 10:00 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, mingo, juri.lelli, vincent.guittot, yury.norov,
	kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini, seanjc,
	vschneid, huschle, rostedt, dietmar.eggemann, maddy, srikar,
	hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260625124648.802832-10-sshegde@linux.ibm.com>

On Thu, Jun 25, 2026 at 06:16:33PM +0530, Shrikanth Hegde wrote:

> @@ -14375,6 +14379,10 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>  	if (!cpu_active(this_cpu))
>  		return 0;
>  
> +	/* Do not pull to a !preferred CPU just to push it out next */
> +	if (!cpu_preferred(this_cpu))
> +		return 0;
> +
>  	/*
>  	 * This is OK, because current is on_cpu, which avoids it being picked
>  	 * for load-balance and preemption/IRQs are still disabled avoiding

Why not just replace the cpu_active() check above?

^ permalink raw reply

* Re: [PATCH v5 07/24] sched/fair: Select preferred CPU at wakeup when possible
From: Peter Zijlstra @ 2026-06-26  9:59 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, mingo, juri.lelli, vincent.guittot, yury.norov,
	kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini, seanjc,
	vschneid, huschle, rostedt, dietmar.eggemann, maddy, srikar,
	hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260625124648.802832-8-sshegde@linux.ibm.com>

On Thu, Jun 25, 2026 at 06:16:31PM +0530, Shrikanth Hegde wrote:
> Update available_idle_cpu to consider preferred CPUs. This takes care of
> lot of decisions at wakeup to use only preferred CPUs. There is no need to
> put those explicit checks everywhere.
> 
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
>  kernel/sched/sched.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5d009c2529b2..148fe6145f1a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1434,6 +1434,9 @@ static inline bool available_idle_cpu(int cpu)
>  	if (!idle_rq(cpu_rq(cpu)))
>  		return 0;
>  
> +	if (!cpu_preferred(cpu))
> +		return 0;
> +
>  	if (vcpu_is_preempted(cpu))
>  		return 0;
>  

This one might hurt, it is a whole extra cacheline in otherwise already
sensitive (wakeup) paths.


^ permalink raw reply

* Re: [PATCH v5 04/24] cpumask: Introduce cpu_preferred_mask
From: Peter Zijlstra @ 2026-06-26  9:41 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, mingo, juri.lelli, vincent.guittot, yury.norov,
	kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini, seanjc,
	vschneid, huschle, rostedt, dietmar.eggemann, maddy, srikar,
	hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260626093901.GN1181229@noisy.programming.kicks-ass.net>

On Fri, Jun 26, 2026 at 11:39:01AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2026 at 06:16:28PM +0530, Shrikanth Hegde wrote:
> 
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index 80211900f373..5a643d608ea6 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -120,12 +120,20 @@ extern struct cpumask __cpu_enabled_mask;
> >  extern struct cpumask __cpu_present_mask;
> >  extern struct cpumask __cpu_active_mask;
> >  extern struct cpumask __cpu_dying_mask;
> > +
> > +#ifdef CONFIG_PREFERRED_CPU
> > +extern struct cpumask __cpu_preferred_mask;
> > +#else
> > +#define __cpu_preferred_mask __cpu_active_mask
> > +#endif
> 
> This is cure, but does it not result in set_cpu_preferred() changing
s/cure/cute/
> active mask, and it that not somewhat unexpected behaviour?
s/it/is/

Typing hard, clearly. Also hitting 30C before noon :-(



^ permalink raw reply

* Re: [PATCH v5 04/24] cpumask: Introduce cpu_preferred_mask
From: Peter Zijlstra @ 2026-06-26  9:39 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, mingo, juri.lelli, vincent.guittot, yury.norov,
	kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini, seanjc,
	vschneid, huschle, rostedt, dietmar.eggemann, maddy, srikar,
	hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260625124648.802832-5-sshegde@linux.ibm.com>

On Thu, Jun 25, 2026 at 06:16:28PM +0530, Shrikanth Hegde wrote:

> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index 80211900f373..5a643d608ea6 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -120,12 +120,20 @@ extern struct cpumask __cpu_enabled_mask;
>  extern struct cpumask __cpu_present_mask;
>  extern struct cpumask __cpu_active_mask;
>  extern struct cpumask __cpu_dying_mask;
> +
> +#ifdef CONFIG_PREFERRED_CPU
> +extern struct cpumask __cpu_preferred_mask;
> +#else
> +#define __cpu_preferred_mask __cpu_active_mask
> +#endif

This is cure, but does it not result in set_cpu_preferred() changing
active mask, and it that not somewhat unexpected behaviour?

>  #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
>  #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
>  #define cpu_enabled_mask   ((const struct cpumask *)&__cpu_enabled_mask)
>  #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
>  #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
>  #define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
> +#define cpu_preferred_mask ((const struct cpumask *)&__cpu_preferred_mask)
>  
>  extern atomic_t __num_online_cpus;
>  extern unsigned int __num_possible_cpus;

> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index bc4f7a9ba64e..d623a9c5554a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -3107,6 +3107,11 @@ EXPORT_SYMBOL(__cpu_dying_mask);
>  atomic_t __num_online_cpus __read_mostly;
>  EXPORT_SYMBOL(__num_online_cpus);
>  
> +#ifdef CONFIG_PREFERRED_CPU
> +struct cpumask __cpu_preferred_mask __read_mostly;
> +EXPORT_SYMBOL(__cpu_preferred_mask);
> +#endif

Precedent is definitely towards !GPL exports for this, but could we get
away with making this one GPL?


> @@ -3164,6 +3169,7 @@ void __init boot_cpu_init(void)
>  	/* Mark the boot cpu "present", "online" etc for SMP and UP case */
>  	set_cpu_online(cpu, true);
>  	set_cpu_active(cpu, true);
> +	set_cpu_preferred(cpu, true);

This sets active twice, which is harmless, but wasteful...

>  	set_cpu_present(cpu, true);
>  	set_cpu_possible(cpu, true);
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2f4530eb543f..9e16946c9d62 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8685,6 +8685,9 @@ int sched_cpu_activate(unsigned int cpu)
>  	 */
>  	sched_set_rq_online(rq, cpu);
>  
> +	/* preferred is subset of active and follows its state */
> +	set_cpu_preferred(cpu, true);
> +
>  	return 0;
>  }
>  
> @@ -8698,6 +8701,8 @@ int sched_cpu_deactivate(unsigned int cpu)
>  	if (ret)
>  		return ret;
>  
> +	set_cpu_preferred(cpu, false);
> +
>  	/*
>  	 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
>  	 * load balancing when not active

But this one clears active earlier, is that not a problem?

Perhaps it is best if the modifier is a no-op when preferred mask does
not exist?
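
To make the aliasing concern concrete, here is a minimal userspace sketch
(not kernel code). struct mask, HAVE_PREFERRED and the helper names are
hypothetical stand-ins for struct cpumask, CONFIG_PREFERRED_CPU and the
set_cpu_*() helpers; only the #define alias mirrors the quoted hunk. It
assumes a setter that simply writes through the aliased name.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
struct mask {
	unsigned long bits;
};

static struct mask active_mask;

#ifdef HAVE_PREFERRED			/* models CONFIG_PREFERRED_CPU=y */
static struct mask preferred_mask;
#else
#define preferred_mask active_mask	/* the alias from the quoted hunk */
#endif

static void mask_assign(struct mask *m, int cpu, bool on)
{
	if (on)
		m->bits |= 1UL << cpu;
	else
		m->bits &= ~(1UL << cpu);
}

static bool mask_test(const struct mask *m, int cpu)
{
	return m->bits & (1UL << cpu);
}

/* Naive setter: with the alias in place this writes the active mask. */
static void set_preferred_naive(int cpu, bool on)
{
	mask_assign(&preferred_mask, cpu, on);
}

/* Suggested shape: a no-op when no separate preferred mask exists. */
static void set_preferred_noop(int cpu, bool on)
{
#ifdef HAVE_PREFERRED
	mask_assign(&preferred_mask, cpu, on);
#else
	(void)cpu;
	(void)on;
#endif
}
```

Built without HAVE_PREFERRED, set_preferred_naive(cpu, false) clears the
bit in active_mask, while set_preferred_noop() leaves it untouched.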

^ permalink raw reply

* Re: [PATCH v5 04/24] cpumask: Introduce cpu_preferred_mask
From: Peter Zijlstra @ 2026-06-26  9:34 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, mingo, juri.lelli, vincent.guittot, yury.norov,
	kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini, seanjc,
	vschneid, huschle, rostedt, dietmar.eggemann, maddy, srikar,
	hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260625124648.802832-5-sshegde@linux.ibm.com>

On Thu, Jun 25, 2026 at 06:16:28PM +0530, Shrikanth Hegde wrote:
> This patch does
> - Declare and Define cpu_preferred_mask.
> - Get/Set helpers for it.

There is a blurb in submitting-patches.rst about how 'this patch' is
basically a red-flag for a changelog.

The changelog is per-definition pertaining to 'this patch', therefore
stating this is a tautology. Further, it is often fairly clear what the
patch does, but less clear as to why.

So the suggestion is to phrase this like:

Provide cpu_preferred_mask infrastructure (definitions, declarations and
helper methods) to facilitate ....


> Values are set/clear by the scheduler by detecting the steal time values.
> 
> A CPU is set to preferred when it becomes active. Later it may be
> marked as non-preferred depending on steal time values with
> steal monitor being enabled.
> 
> Always maintain design construct of preferred is subset of active.
> i.e. preferred ⊆ active ⊆ online ⊆ present ⊆ possible
> 
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>

^ permalink raw reply

* Re: [PATCH v4 2/5] mm/zswap: Factor writeback loop out of shrink_worker()
From: Hao Jia @ 2026-06-26  9:34 UTC (permalink / raw)
  To: Yosry Ahmed, nphamcs
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, chengming.zhou,
	muchun.song, roman.gushchin, linux-mm, linux-kernel, linux-doc,
	Hao Jia
In-Reply-To: <CAO9r8zOYgjbuG5i+LrCcMK764nVpOS+muo-5Q45ZFdiVus-dTA@mail.gmail.com>



On 2026/6/26 01:59, Yosry Ahmed wrote:
>>>> static long zswap_shrink_one(struct mem_cgroup *memcg,
>>>>                     struct zswap_shrink_state *s)
>>>> {
>>>>        long shrunk;
>>>>
>>>>        shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
>>>>        if (shrunk == -ENOENT)
>>>>            return 0;
>>>>
>>>>        s->attempts++;
>>>>        if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
>>>>            s->stop = true;
>>>
>>> Do we need 'stop' or can we just return a value here to indicate that
>>> we should stop (e.g. -EBUSY)?
>>>
>>
>> Perhaps we could return -EAGAIN instead of -EBUSY? This would align with
>> the semantics of the memory.reclaim interface, which returns -EAGAIN
>> when it reclaims fewer bytes than requested.
> 
> Hmm but -EAGAIN tells the caller to try again, while here -EAGAIN
> tells the caller *not* to try again because we exhausted all retries?
> 

Okay, let's go with -EBUSY.

>>>
>>> I think splitting the shrink/retry logic over 2 functions makes it
>>> more difficult to follow, so yeah I think fold
>>> zswap_shrink_no_candidate() into zswap_shrink_one(). Then the callers
>>> only need to iterate memcgs (depending on the context) and call
>>> zswap_shrink_one() for each of them.
>>
>> So, something like this?
> 
> Yeah, something like this :)
> 
>> /* Track progress of a memcg-tree writeback walk. */
>> struct zswap_shrink_state {
>>       int attempts;
> 
> While at it, I think "attempts" is really the number of scans, right?
> Should we rename it? Maybe "scans" or similar?
> 
>>       int failures;
>> };
>>
>> /*
>>    * Take one step of a memcg-tree writeback walk driven by the caller's
>>    * iterator, and fold the result into @s, the retry bookkeeping shared
>>    * across steps. @memcg is the iterator's current memcg, or NULL once
>>    * it has wrapped around after a full pass over the tree.
>>    *
>>    * The function returns -EAGAIN to signal the caller to abort the walk
>>    * after encountering the following conditions MAX_RECLAIM_RETRIES times:
>>    * - No writeback-candidate memcgs were found in a memcg tree walk.
>>    * - Shrinking a writeback-candidate memcg failed.
> 
> Orthogonal to this patch, but I wonder if this can be simplified. I
> wonder if these two conditions can be replaced with "shrinking a memcg
> that has zswap entries failed". The "no writeback-candidate memcgs in
> the tree" case seems like we should abort right away instead of
> retrying?
> 
> Nhat, WDYT?
> 

Perhaps something like the following is what you had in mind? I've 
drafted the implementation below to make it easier for Nhat to compare 
with the previous behavior.


>>    *
>>    * Return: The number of compressed bytes written back (>= 0), or -EAGAIN
>>    * once the retry budget is exhausted and the caller should abort the walk.
>>    */
>> static long zswap_shrink_one(struct mem_cgroup *memcg,
> 
> Nit: zswap_shrink_one_memcg()
> 
> BTW, the existing writeback logic has been broken for a while now when
> memcg is disabled. I think we constantly hit the !memcg case and run
> out of retries. Not sure if your patch changes this in any way, or if
> you want to fix that while you're at it :)

Yes, I'd be happy to do that. However, would it be better to submit a 
separate fix patch or combine it with this one?

> 
>>                    struct zswap_shrink_state *s)
>> {
>>       long shrunk;
>>
>>       /*
>>        * If the iterator has completed a full pass, update the shrink state
>>        * and check whether we should keep going.
>>        */
>>       if (!memcg) {
>>           /*
>>            * Continue shrinking without incrementing failures if we found
>>            * candidate memcgs in the last tree walk.
>>            */
>>           if (!s->attempts && ++s->failures == MAX_RECLAIM_RETRIES)
>>               return -EAGAIN;
>>           s->attempts = 0;
>>           return 0;
>>       }
>>
>>       shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
>>
>>       /*
>>        * There are no writeback-candidate pages in the memcg. This is not an
>>        * issue as long as we can find another memcg with pages in zswap. Skip
>>        * this without incrementing attempts and failures.
>>        */
>>       if (shrunk == -ENOENT)
>>           return 0;
>>       s->attempts++;
>>
>>       if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
>>           return -EAGAIN;
>>
>>       return shrunk;
>> }
>>
>> static void shrink_worker(struct work_struct *w)
>> {
>>       struct zswap_shrink_state s = {};
>>       unsigned long thr;
>>
>>       /* Reclaim down to the accept threshold */
>>       thr = zswap_accept_thr_pages();
>>
>>       while (zswap_total_pages() > thr) {
>>           struct mem_cgroup *memcg;
>>           long ret;
>>
>>           cond_resched();
>>
>>           memcg = zswap_iter_global();
> 
> Do we still need this helper? Or should we just keep the memcg
> iteration open-coded?

Done.
> 
>>           ret = zswap_shrink_one(memcg, &s);
>>           /* drop the extra reference taken by zswap_iter_global() */
>>           mem_cgroup_put(memcg);
>>           if (ret == -EAGAIN)
>>               break;
>>       }
>> }


/* Track progress of a memcg-tree writeback walk. */
struct zswap_shrink_state {
     int scans;
     int failures;
};

/*
  * Take one step of a memcg-tree writeback walk driven by the caller's
  * iterator, and fold the result into @s, the retry bookkeeping shared
  * across steps. @memcg is the iterator's current memcg, or NULL once
  * it has wrapped around after a full pass over the tree.
  *
  * The function returns -EBUSY to signal the caller to abort the walk when
  * either of the following occurs:
  * - A full pass over the tree found no writeback-candidate memcg.
  * - Shrinking a writeback-candidate memcg failed MAX_RECLAIM_RETRIES
  *   times.
  *
  * When memory cgroup is disabled, the iterator always yields NULL. All
  * zswap entries then live on the root list_lru, so NULL is treated as the
  * root memcg and shrunk directly rather than as a completed tree pass.
  *
  * Return: The number of compressed bytes written back (>= 0), or -EBUSY
  * when the caller should abort the walk.
  */
static long zswap_shrink_one_memcg(struct mem_cgroup *memcg,
                    struct zswap_shrink_state *s)
{
     bool disabled = mem_cgroup_disabled();
     long shrunk;

     /*
      * If the iterator has completed a full pass, update the shrink state
      * and check whether we should keep going.
      * With memcg disabled the iterator always yields NULL, so fall through
      * and shrink the root memcg directly instead.
      */
     if (!memcg && !disabled) {
         /*
          * Abort if no writeback-candidate memcgs in the last tree walk.
          * Otherwise reset the scans count and continue.
          */
         if (!s->scans)
             return -EBUSY;
         s->scans = 0;
         return 0;
     }

     shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);

     /*
      * There are no writeback-candidate pages in the memcg. With memcg
      * enabled this is not an issue as long as we can find another memcg
      * with pages in zswap, so skip without counting it as a candidate.
      * With memcg disabled the root LRU is the only target, so we should
      * abort if it has no writeback-candidate pages.
      */
     if (shrunk == -ENOENT)
         return disabled ? -EBUSY : 0;
     s->scans++;

     if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
         return -EBUSY;

     return shrunk;
}

static void shrink_worker(struct work_struct *w)
{
     struct zswap_shrink_state s = {};
     unsigned long thr;

     /* Reclaim down to the accept threshold */
     thr = zswap_accept_thr_pages();

     /*
      * Global reclaim will select cgroup in a round-robin fashion from all
      * online memcgs, but memcgs that have no pages in zswap and
      * writeback-disabled memcgs (memory.zswap.writeback=0) are not
      * candidates for shrinking.
      *
      * We save iteration cursor memcg into zswap_next_shrink,
      * which can be modified by the offline memcg cleaner
      * zswap_memcg_offline_cleanup().
      *
      * Since the offline cleaner is called only once, we cannot leave an
      * offline memcg reference in zswap_next_shrink.
      * We can rely on the cleaner only if we get online memcg under lock.
      *
      * If we get an offline memcg, we cannot determine if the cleaner has
      * already been called or will be called later. We must put back the
      * reference before returning from this function. Otherwise, the
      * offline memcg left in zswap_next_shrink will hold the reference
      * until the next run of shrink_worker().
      */
     while (zswap_total_pages() > thr) {
         struct mem_cgroup *memcg;
         long ret;

         cond_resched();
         /*
          * Start shrinking from the next memcg after zswap_next_shrink.
          * When the offline cleaner has already advanced the cursor,
          * advancing the cursor here overlooks one memcg, but this
          * should be negligibly rare.
          *
          * If we get an online memcg, keep the extra reference in case
          * the original one obtained by mem_cgroup_iter() is dropped by
          * zswap_memcg_offline_cleanup() while we are shrinking the
          * memcg.
          */
         spin_lock(&zswap_shrink_lock);
         do {
             memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
             zswap_next_shrink = memcg;
         } while (memcg && !mem_cgroup_tryget_online(memcg));
         spin_unlock(&zswap_shrink_lock);

         ret = zswap_shrink_one_memcg(memcg, &s);
         /* drop the extra reference taken above */
         mem_cgroup_put(memcg);
         if (ret == -EBUSY)
             break;
     }
}
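
The return paths of the draft above can be exercised in userspace with a
small model. This is only a sketch under stated assumptions: shrink_step()
and struct shrink_state are hypothetical names, the result of shrink_memcg()
is passed in as @shrunk rather than computed, and the value 16 for
MAX_RECLAIM_RETRIES is assumed for the model.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_RECLAIM_RETRIES 16	/* value assumed for this model */

/* Mirrors the two counters of the draft's zswap_shrink_state. */
struct shrink_state {
	int scans;
	int failures;
};

/*
 * Model of one step of the walk.
 * @have_memcg:     false when the iterator yielded NULL
 * @memcg_disabled: models mem_cgroup_disabled()
 * @shrunk:         what shrink_memcg() would have returned
 */
static long shrink_step(bool have_memcg, bool memcg_disabled, long shrunk,
			struct shrink_state *s)
{
	/* Full pass completed: abort if it found no candidate at all. */
	if (!have_memcg && !memcg_disabled) {
		if (!s->scans)
			return -EBUSY;
		s->scans = 0;
		return 0;
	}

	/* No candidate pages here: skip, or abort if root is the only target. */
	if (shrunk == -ENOENT)
		return memcg_disabled ? -EBUSY : 0;
	s->scans++;

	/* Count failed shrinks against the shared retry budget. */
	if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
		return -EBUSY;

	return shrunk;
}
```

In this model a pass with no scans aborts immediately, while failed shrinks
return their own value until the failure count reaches the budget.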


Thanks,
Hao

^ permalink raw reply

* Re: [PATCH v5 13/24] virt/steal_monitor: Add documentation
From: Peter Zijlstra @ 2026-06-26  9:28 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, mingo, juri.lelli, vincent.guittot, yury.norov,
	kprateek.nayak, iii, corbet, tglx, gregkh, pbonzini, seanjc,
	vschneid, huschle, rostedt, dietmar.eggemann, maddy, srikar,
	hdanton, chleroy, vineeth, frederic, arighi, pauld,
	christian.loehle, tj, tommaso.cucinotta, maz, rafael, rdunlap,
	kernellwp, linux-doc
In-Reply-To: <20260625124648.802832-14-sshegde@linux.ibm.com>

On Thu, Jun 25, 2026 at 06:16:37PM +0530, Shrikanth Hegde wrote:

> +Core idea:
> +==========
> +steal time is an indication available today in Guest which shows contention
> +for underlying physical CPU. Use it as a hint in the guest to fold the
> +workload to a reduced set of vCPUs. When there is contention, steal time
> +will show up in all the guests. When each guest honors the hint and folds
> +the workload to a smaller set of vCPUs(Preferred CPUs), it reduces the
> +contention and thereby reduces vCPU preemption.
> +This is achieved without any cross-guest communication.
> +
> +Steal monitor driver effectively does:
> +
> +1. Periodically computes steal time across the system.
> +
> +2. If steal time is greater than high threshold, reduce the number of
> +   preferred CPUs by 1 core. Ensure at least one core is left always.
> +   This avoids running into extreme cases.
> +
> +3. If steal time is lower or equal to low threshold, increase the
> +   number of preferred CPUs by 1 core. If preferred is same as active,
> +   nothing to be done.
> +
> +4. Ensure preferred CPUs is always subset of active CPUs.
> +   On feature disable it is same as active CPUs.


So this is very much a co-operative scheme. Perhaps add a few words to
describe the effect of a non cooperative guest. IIRC the result is not
worse than the status quo. That is, if one (or more) guests refuse to
co-operate it will not make things worse, it will just not result in
improvements, right?
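
For reference, the four quoted rules amount to a small hysteresis step. The
sketch below is a userspace model only, not the driver: struct steal_policy,
next_preferred_cores() and the percent units are hypothetical, and counts
are kept in whole cores.

```c
#include <assert.h>

/* Hypothetical policy: thresholds expressed as percent steal time. */
struct steal_policy {
	unsigned int low_pct;
	unsigned int high_pct;
};

/* Return the preferred core count for the next period. */
static unsigned int next_preferred_cores(const struct steal_policy *p,
					 unsigned int steal_pct,
					 unsigned int preferred,
					 unsigned int active)
{
	assert(p->low_pct <= p->high_pct);
	assert(active >= 1);

	/* Rule 4: preferred stays a subset of active, and never empty. */
	if (preferred > active)
		preferred = active;
	if (preferred < 1)
		preferred = 1;

	/* Rule 2: above the high threshold drop one core, keep at least one. */
	if (steal_pct > p->high_pct)
		return preferred > 1 ? preferred - 1 : 1;

	/* Rule 3: at or below the low threshold grow by one core, up to active. */
	if (steal_pct <= p->low_pct)
		return preferred < active ? preferred + 1 : preferred;

	/* Between the thresholds: leave the set alone. */
	return preferred;
}
```

In this model, a guest that never runs the step just keeps preferred equal
to active, which is the case the question above is about.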

^ permalink raw reply

* Re: [PATCH v3 00/11] vfio/pci: Add CXL Type-2 device passthrough support
From: Richard Cheng @ 2026-06-26  9:16 UTC (permalink / raw)
  To: mhonap
  Cc: djbw, alex, jgg, jic23, dave.jiang, ankita,
	alejandro.lucero-palau, alison.schofield, dave, dmatlack, gourry,
	ira.weiny, cjia, kjaju, vsethi, zhiw, kvm, linux-cxl, linux-doc,
	linux-kernel, linux-kselftest
In-Reply-To: <20260625165407.1769572-1-mhonap@nvidia.com>

On Thu, Jun 25, 2026 at 10:23:56PM +0800, mhonap@nvidia.com wrote:
> From: Manish Honap <mhonap@nvidia.com>
> 
> CXL Type-2 accelerators (CXL.mem-capable GPUs and similar) cannot be
> passed through to virtual machines with stock vfio-pci because the
> driver has no concept of HDM decoder management, HDM region exposure,
> or component register virtualization.  This series adds those three
> pieces, sufficient for a guest to use the device's firmware-committed
> coherent memory under UVM / ATS.
> 
> v3 is a rewrite of the v2 framework form, responding to Dan's request
> in the v2 review for "less emulation, narrower interfaces, and a
> closer mapping to the spec language."
> In this release, cxl-core exposes four EXPORT_SYMBOL_GPL helpers behind
> an opaque handle.  vfio-pci becomes a thin transport on top of those.
> Please see "Changes since v2" and "Reviewer feedback addressed" below for
> the per-area summary.
>

Hi Manish,

Thanks for the work. I ran some tests with your patches applied on a real
CXL Type-2 device, a GPU with a FW-committed HDM decoder. I want to
report the result early: the acquire path works, but the first CPU access
to the mapped HDM region crashes the host.

The device BDF is 0002:81:00.0, with CXLCtl: Cache+ IO+ Mem+ and a
firmware-committed HDM decoder.

Binding the device to vfio-pci brought the CXL Type-2 path up cleanly
"""
# modprobe vfio-pci
# echo vfio-pci > /sys/bus/pci/devices/0002:81:00.0/driver_override
# echo 0002:81:00.0 > /sys/bus/pci/drivers_probe
"""

mem0/endpoint19/region1 appeared, and the device_is_cxl selftest passed.

When running the 9th patch's selftest
"""
# sudo ./vfio_cxl_type2_test 0002:81:00.0
ok 1 cxl_type2.device_is_cxl
#  RUN  cxl_type2.hdm_region_mmap_rw
"""
At this point, the machine hung and crashed.

hdm_region_mmap_rw mmaps the HDM region and does a CPU read/write to it. That
access never returned. I couldn't capture dmesg or a trace before it crashed.

I'm not sure if this is a platform/FW issue or something in how the region
is mapped.
Have you exercised hdm_region_mmap_rw on your machine, or only against the
cxl_test mock?

If a guest can hang the host just by touching its mapped memory, it needs to be fixed.

Best regards,
Richard Cheng.



> Motivation
> ==========
> 
> A CXL Type-2 device exposes its HDM-mapped device memory through HDM
> decoders that BIOS programs and commits at boot.  To pass such a
> device to a guest, vfio-pci has to do three things at once:
> 
>   1. Surface the firmware-committed HDM-mapped HPA range as a guest-
>      mmappable region.
> 
>   2. Surface a CXL-spec-compliant view of the CXL Device DVSEC body,
>      the HDM Decoder Capability block, and the CXL.cache/mem cap-array
>      prefix, so the guest's CXL driver enumerates the same topology
>      the host saw.
> 
>   3. Keep the host's committed decoder configuration intact (the
>      physical decoder is never reprogrammed) while letting the guest
>      observe and manage a shadow that follows the per-field write
>      semantics in the spec.
> 
> The series builds on Alejandro Lucero-Palau's v28 work
> applied on for-7.3/cxl-type2-enabling [1] (sfc is the in-tree consumer
> today). vfio-pci becomes the second consumer.
> 
> Architecture
> ============
> 
> cxl-core owns the CXL semantics.  A new file
> drivers/cxl/core/passthrough.c (gated by hidden Kconfig
> CXL_VFIO_PASSTHROUGH) provides four exported symbols:
> 
>     struct cxl_passthrough *
>     devm_cxl_passthrough_create(struct device *dev,
>                                 struct cxl_dev_state *cxlds);
> 
>     int cxl_passthrough_dvsec_rw(p, off, val, sz, write);
>     int cxl_passthrough_hdm_rw  (p, off, val,      write);
>     int cxl_passthrough_cm_rw   (p, off, val,      write);
> 
> cxl_passthrough is an opaque handle; vfio-pci sees no cxl-internal
> struct pointers.  The shadows are snapshotted at create time: the
> DVSEC body from PCI config space dword by dword, the CM cap-array and
> HDM block from the cxl-core MMIO mapping at cxlds->reg_map.base.
> Per-field write semantics follow below:
> CXL r4.0 8.1.3 DVSEC:
> - LOCK is RWO,
> - CONTROL/CONTROL2 are RWL gated on CONFIG_LOCK,
> - STATUS/STATUS2 are RW1C,
> - RANGE1 is HwInit, RANGE2 is RsvdZ
> CXL r4.0 8.2.4.20 HDM:
> - GLOBAL_CTRL RW,
> - decoder CTRL implements COMMIT/COMMITTED,
> - decoder BASE/SIZE RWL gated on COMMITTED or LOCK_ON_COMMIT,
> - cap header HwInit).
> 
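
As a reading aid for the attribute names in the list above, a simplified
userspace model of three of them (RWO, RWL, RW1C). This is only a sketch:
struct reg_shadow and the shadow_write_*() helpers are hypothetical, the
register layout is not the one used by the series, and RWO is modelled as
"only the first write takes effect".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical three-register shadow; not the layout used by the series. */
struct reg_shadow {
	uint16_t lock;		/* RWO: only the first write takes effect */
	bool lock_written;
	uint16_t control;	/* RWL: writable until bit 0 of lock is set */
	uint16_t status;	/* RW1C: writing 1 clears a bit */
};

/* RWO: latch the first value written, ignore later writes. */
static void shadow_write_lock(struct reg_shadow *s, uint16_t val)
{
	if (s->lock_written)
		return;
	s->lock = val;
	s->lock_written = true;
}

/* RWL: accept writes only while the lock bit is clear. */
static void shadow_write_control(struct reg_shadow *s, uint16_t val)
{
	if (s->lock & 1)
		return;
	s->control = val;
}

/* RW1C: each 1 bit in @val clears the matching status bit. */
static void shadow_write_status(struct reg_shadow *s, uint16_t val)
{
	s->status &= (uint16_t)~val;
}
```

HwInit and RsvdZ fields are left out of the model; from software's point of
view they would simply ignore writes.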
> vfio-pci becomes a thin transport.  The new module
> drivers/vfio/pci/cxl/ exposes two VFIO regions.
> 
>   VFIO_REGION_SUBTYPE_CXL (HDM region): mmappable view of the
>   HDM-mapped HPA. The mmap fault handler calls vmf_insert_pfn() from
>   the physical HPA. pread/pwrite go through the memremap_wb() kva
>   captured at bind time.
> 
>   VFIO_REGION_SUBTYPE_CXL_COMP_REGS (component register shadow):
>   pread/pwrite only, dword-aligned (-EINVAL on misalignment).
>   Each dword dispatches by offset to cxl_passthrough_cm_rw() or
>   cxl_passthrough_hdm_rw(). No shadow state on the vfio side; cxl-core
>   enforces the spec.
> 
> CXL DVSEC config-space accesses use a clipping shim in
> vfio_pci_config_rw_single(). A config-space chunk that crosses the
> DVSEC body boundary is split: header bytes go through the generic
> perm-bits path, body bytes go through cxl_passthrough_dvsec_rw().
> The shim replaces v2's approach of repointing ecap_perms[]
> 
> Sparse-mmap is exposed on the component BAR so userspace can mmap the
> non-component portions directly; only the CXL component register
> sub-range goes through pread/pwrite emulation. The CXL sub-range is
> also skipped from vfio_pci-core's request_selected_regions() set
> because cxl-core's devm_cxl_probe_mem() already holds a
> request_mem_region() on it; the asymmetric skip is matched by an
> asymmetric release on disable().
> 
> Scope and out-of-scope
> ======================
> 
> In scope (rejected at create time with -EOPNOTSUPP otherwise):
> 
>   - Firmware-committed devices (HOST_FIRMWARE_COMMITTED set).
>   - Single HDM decoder (hdm_count == 1).
>   - No interleave (IW == 0).
> 
> Out of scope, deferred for follow-on work:
> 
>   - Multi-decoder devices and interleave.
>   - Guest-driven (non-firmware-committed) HDM commit.
>   - Hotplug, FLR, and sibling-function reset of CXL Type-2 devices.
> 
> Changes since v2
> ================
> 
> This is a rewrite, not an incremental update.  The structure of the
> series changed (20 patches in v2 to 11 in v3) because v3 collapses
> v2 patches 9-15 (detection, HDM emulation, media readiness, region
> management, HDM region, DVSEC emulation) into one cxl-core helper
> file and one vfio-pci consumer.
> 
> Framework replaced by narrow opaque-handle helpers (patches 6, 8)
> 
>   v2 carried a generic register-emulation framework split across four
>   state-machine files in cxl-core.
>   v3 collapses it into one file: drivers/cxl/core/passthrough.c
>   exposing the four EXPORT_SYMBOL_GPL helpers above behind a struct
>   cxl_passthrough opaque handle.
> 
> Shadow ownership moved into cxl-core (patches 6, 8)
> 
>   vfio-pci no longer keeps any per-field state. It forwards
>   (offset, value) into cxl-core, and cxl-core enforces the spec
>   (RWO, RWL, RW1C, HwInit, RsvdZ) with explicit CXL r4.0 section
>   references in the switch arms.
> 
> DVSEC config-space clipping shim (patch 8)
> 
>   v2 repointed ecap_perms[] to redirect CXL DVSEC reads and writes.
>   v3 keeps ecap_perms[] untouched and clips per-config-access chunks
>   at the DVSEC body boundary in vfio_pci_config_rw_single(); header bytes
>   go through the generic perm-bits path, body bytes go through
>   cxl_passthrough_dvsec_rw(). The shim is local to the per-device
>   path.
> 
> CONFIG_VFIO_PCI_CXL gates the new module (patch 7)
> 
>   v2 had a CONFIG_VFIO_CXL_CORE Kconfig stub; v3 renames it to
>   CONFIG_VFIO_PCI_CXL to match the vfio-pci naming convention.
>   The hidden CXL_VFIO_PASSTHROUGH selects the cxl-core helper file
>   on demand. With both disabled, the cxl-core size is unchanged.
> 
> UAPI rewritten with named fields (patch 5)
> 
>   vfio_device_info_cap_cxl in v3 carries:
>     flags + HOST_FIRMWARE_COMMITTED bit
>     hdm_region_idx
>     comp_reg_region_idx
>     comp_reg_bar
>     comp_reg_offset
>     comp_reg_size
>   The DPA terminology is renamed to HDM region throughout.
>   CACHE_CAPABLE (HDM-DB indicator) is dropped;
>   it was informational only in v2 with no caller, and can be re-added
>   later by a series that actually plumbs CXL.cache.
> 
> Selftests trimmed (patch 9)
> 
>   v2 carried selftests for device detection, capability parsing,
>   region enumeration, HDM register emulation, HDM mmap with
>   page-fault insertion, FLR invalidation, and DVSEC register
>   emulation. v3 keeps a smoke-test set of six focused tests:
> 
>     device_is_cxl                  GET_INFO advertises FLAGS_CXL
>                                    and a populated CAP_CXL.
>     hdm_region_mmap_rw             mmap one page, write+read back.
>     component_bar_sparse_mmap      SPARSE_MMAP cap excludes the
>                                    CXL component register sub-range.
>     comp_regs_cm_cap_array_read    pread of the CM cap-array
>                                    header at CXL_CM_OFFSET succeeds
>                                    (CAP_ID == 1).
>     dvsec_lock_byte_read           pread of the DVSEC CONFIG_LOCK
>                                    byte through the clipping shim
>                                    succeeds.
>     hdm_decoder_commit_fsm         COMMIT / COMMITTED state machine
>                                    and LOCK_ON_COMMIT behaviour.
> 
>   FLR invalidation, page-fault insertion under load, and full
>   DVSEC field-by-field write coverage are deferred to a follow-on
>   selftest series. The current six are the minimal set that
>   exercises the kernel-side contract end-to-end.
> 
> cxl-core prep patches split (patches 1-4)
> 
>   v3 keeps the cxl-side enablers from v2 patches 1-4 but each as
>   a standalone change so the cxl maintainer can review the helper
>   API independently of the vfio consumer:
> 
>     [1/11] cxl_get_hdm_info()
>     [2/11] cxl_await_range_active() split from media-ready wait
>     [3/11] cxl_register_map records BIR + BAR offset
>     [4/11] component/HDM register defines moved to uapi/cxl/cxl_regs.h
> 
> Reviewer feedback addressed
> ===========================
> 
> Dan
> ---
> 
> - VFIO exposes HDM/host-visible region, not raw DPA; docs/UAPI say HDM
>   region, DPA only inside cxl-core where appropriate.
> - One vfio-pci device = one HDM region / one decoder, no interleave;
>   hdm_count != 1 → -EOPNOTSUPP.
> - Global HDM on DVSEC Range Base treated as legacy; RANGE1/RANGE2
>   read-only snapshot, guest writes dropped.
> - No guest/kernel lock games; DVSEC LOCK and HDM LOCK_ON_COMMIT RWO,
>   fixed at create from firmware snapshot.
> - Opaque cxl_passthrough handle only; vfio gets HPA via memdev probe +
>   layout via cxl_get_hdm_info(), rw via helpers.
> - No multi-region accelerator case in v3; single region enforced,
>   multi-region deferred.
> - cxl_await_range_active stays in cxl-core probe; not exported, vfio does
>   not call it.
> - No guest LOCK→0 reprogram; guest cannot clear LOCK to remap host HPA;
>   kernel uncommit tied to COMMIT, not LOCK alone.
> 
> Jason / Gregory / Dan
> ---------------------
> 
> - memremap(WB) + request_mem_region on HPA; conflicting direct-map/EFI use
>   fails probe with -EBUSY.
> 
> Jonathan
> --------
> 
> - uapi/cxl/cxl_regs.h for register defines so VMMs need no private
>   kernel headers.
> - __free() locals on cxl-core/passthrough error paths instead of
>   struct-owned temporaries.
> - No "precommitted at probe" assumption; acquire checks COMMITTED in
>   HDM shadow and refuses if missing.
> 
> Dave
> ----
> 
> - memremap(MEMREMAP_WB) for HDM host mapping (not ioremap_cache).
> - Renamed cap flag to VFIO_CXL_CAP_HOST_FIRMWARE_COMMITTED for clarity.
> - __free() / DEFINE_FREE() cleanup in new passthrough.c create path.
> 
> Patch series
> ============
> 
>  [1/11] cxl: Add cxl_get_hdm_info() helper for HDM decoder metadata
>  [2/11] cxl: Split cxl_await_range_active() from media-ready wait
>  [3/11] cxl: Record BIR and BAR offset in cxl_register_map
>  [4/11] cxl: Move component/HDM register defines to
>         uapi/cxl/cxl_regs.h
>  [5/11] vfio: UAPI for CXL Type-2 device passthrough
>  [6/11] cxl: Add register-virtualization helpers for vfio Type-2
>         passthrough
>  [7/11] vfio/pci: Add CONFIG_VFIO_PCI_CXL with bind-time CXL Type-2
>         acquisition
>  [8/11] vfio/pci/cxl: Add HDM + COMP_REGS regions and DVSEC clipping
>         shim
>  [9/11] selftests/vfio: Add CXL Type-2 device passthrough smoke test
> [10/11] docs: vfio-pci: Document CXL Type-2 device passthrough
> [11/11] vfio/pci: Provide opt-out for CXL Type-2 extensions
> 
> Dependencies
> ============
> 
> [1] [PATCH v28 0/5] Type2 device basic support
> https://lore.kernel.org/linux-cxl/20260618181806.118745-1-alejandro.lucero-palau@amd.com/
> 
> [2] Previous version of this patch series
> [PATCH v2 00/20] vfio/pci: Add CXL Type-2 device passthrough support
> https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/
> 
> [3] Companion QEMU series
> [RFC 0/9] QEMU: CXL Type-2 device passthrough via vfio-pci
> https://lore.kernel.org/linux-cxl/20260427181235.3003865-1-mhonap@nvidia.com/
> 
> Manish Honap (11):
>   cxl: Add cxl_get_hdm_info() helper for HDM decoder metadata
>   cxl: Split cxl_await_range_active() from media-ready wait
>   cxl: Record BIR and BAR offset in cxl_register_map
>   cxl: Move component/HDM register defines to uapi/cxl/cxl_regs.h
>   vfio: UAPI for CXL Type-2 device passthrough
>   cxl: Add register-virtualization helpers for vfio Type-2 passthrough
>   vfio/pci: Add CONFIG_VFIO_PCI_CXL with bind-time CXL Type-2
>     acquisition
>   vfio/pci/cxl: Add HDM + COMP_REGS regions and DVSEC clipping shim
>   selftests/vfio: Add CXL Type-2 device passthrough smoke test
>   docs: vfio-pci: Document CXL Type-2 device passthrough
>   vfio/pci: Provide opt-out for CXL Type-2 extensions
> 
>  Documentation/driver-api/index.rst            |   1 +
>  Documentation/driver-api/vfio-pci-cxl.rst     | 282 ++++++
>  drivers/cxl/Kconfig                           |   7 +
>  drivers/cxl/core/Makefile                     |   1 +
>  drivers/cxl/core/passthrough.c                | 590 ++++++++++++
>  drivers/cxl/core/pci.c                        |  70 +-
>  drivers/cxl/core/regs.c                       |  35 +
>  drivers/cxl/cxl.h                             |  52 +-
>  drivers/vfio/pci/Kconfig                      |   2 +
>  drivers/vfio/pci/Makefile                     |   1 +
>  drivers/vfio/pci/cxl/Kconfig                  |  34 +
>  drivers/vfio/pci/cxl/Makefile                 |   2 +
>  drivers/vfio/pci/cxl/vfio_cxl_core.c          | 889 ++++++++++++++++++
>  drivers/vfio/pci/cxl/vfio_cxl_priv.h          |  71 ++
>  drivers/vfio/pci/vfio_pci.c                   |   9 +
>  drivers/vfio/pci/vfio_pci_config.c            |  31 +
>  drivers/vfio/pci/vfio_pci_core.c              |  68 +-
>  drivers/vfio/pci/vfio_pci_priv.h              |  93 ++
>  drivers/vfio/pci/vfio_pci_rdwr.c              |  17 +
>  include/cxl/cxl.h                             |  18 +
>  include/cxl/passthrough.h                     | 121 +++
>  include/linux/vfio_pci_core.h                 |   8 +
>  include/uapi/cxl/cxl_regs.h                   |  63 ++
>  include/uapi/linux/vfio.h                     |  46 +
>  tools/testing/selftests/vfio/Makefile         |   1 +
>  .../selftests/vfio/lib/vfio_pci_device.c      |  11 +-
>  .../selftests/vfio/vfio_cxl_type2_test.c      | 350 +++++++
>  27 files changed, 2821 insertions(+), 52 deletions(-)
>  create mode 100644 Documentation/driver-api/vfio-pci-cxl.rst
>  create mode 100644 drivers/cxl/core/passthrough.c
>  create mode 100644 drivers/vfio/pci/cxl/Kconfig
>  create mode 100644 drivers/vfio/pci/cxl/Makefile
>  create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_core.c
>  create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_priv.h
>  create mode 100644 include/cxl/passthrough.h
>  create mode 100644 include/uapi/cxl/cxl_regs.h
>  create mode 100644 tools/testing/selftests/vfio/vfio_cxl_type2_test.c
> 
> base-commit: 90cf2e0d702c8a132ccbe72e7687f33c04c14658
> -- 
> 2.25.1
> 
> 
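For readers following the "DVSEC config-space clipping shim" item in the
cover letter quoted above, the clipping idea can be modeled in a few lines
of Python. This is only a sketch, not code from the series: the function
name, the chunk tags and the 0x0A header length (the PCIe extended
capability header plus the two DVSEC header registers) are assumptions made
for illustration, and the real boundary is whatever patch 8 defines.

```python
def clip_dvsec_access(pos, count, dvsec_start, dvsec_len, header_len=0x0A):
    """Split the config access [pos, pos + count) at the DVSEC body boundary.

    Returns a list of (path, offset, length) tuples, where path is
    "perm_bits" for bytes handled by the generic permission-bits code and
    "dvsec_body" for bytes that would be forwarded to the DVSEC helper.
    All names and the default header_len are hypothetical.
    """
    body_start = dvsec_start + header_len
    body_end = dvsec_start + dvsec_len
    end = pos + count
    chunks = []
    while pos < end:
        if body_start <= pos < body_end:
            path, limit = "dvsec_body", body_end
        elif pos < body_start:
            path, limit = "perm_bits", body_start
        else:
            path, limit = "perm_bits", end
        # Never let one chunk cross into the other handling path.
        length = min(end, limit) - pos
        chunks.append((path, pos, length))
        pos += length
    return chunks


# A 4-byte access that straddles the header/body boundary of a DVSEC
# placed at 0x100 is split into two 2-byte chunks.
print(clip_dvsec_access(0x108, 4, dvsec_start=0x100, dvsec_len=0x3C))
```

An access that lies entirely inside the header or entirely inside the body
comes back as a single chunk, so the common case costs nothing extra.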

^ permalink raw reply

* Re: [PATCH] Docs: SMP: add an SMP docbook chapter
From: Thomas Gleixner @ 2026-06-26  9:04 UTC (permalink / raw)
  To: Randy Dunlap, linux-kernel
  Cc: Randy Dunlap, Jonathan Corbet, Shuah Khan, linux-doc,
	Peter Zijlstra, Andrew Morton
In-Reply-To: <20260625210331.1050915-1-rdunlap@infradead.org>

On Thu, Jun 25 2026 at 14:03, Randy Dunlap wrote:

> Add SMP primitives to the core-api documentation.
>
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>

Acked-by: Thomas Gleixner <tglx@kernel.org>

^ permalink raw reply

* Re: [PATCH v4 2/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-26  8:51 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260625234158.GA261868@ax162>

On Thu, 25 Jun 2026 16:41:58 -0700
Nathan Chancellor <nathan@kernel.org> wrote:


> The following diff resolves it for me, should I send it as a separate
> patch or do you want to just fold it in with a note?
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 621566345406..2301a701ffbb 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -10,6 +10,7 @@
>  #ifndef __LINUX_LOCKDEP_H
>  #define __LINUX_LOCKDEP_H
>  
> +#include <linux/instruction_pointer.h>

Ah, so this breaks because lockdep.h was relying on
instruction_pointer.h, which just happened to be included in kernel.h
via trace_printk.h.

This is a separate issue, so it should be a separate patch. I'll add it
as patch 1 of this series.

Can you send me the config you used? This didn't trigger in my tests.

Thanks,

-- Steve



>  #include <linux/lockdep_types.h>
>  #include <linux/smp.h>
>  #include <asm/percpu.h>


^ permalink raw reply

* Re: [PATCH net-next] Documentation: networking: Add a test plan for ethtool pause validation
From: Maxime Chevallier @ 2026-06-26  8:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni, Simon Horman,
	Russell King, Heiner Kallweit, Jonathan Corbet, Shuah Khan,
	Oleksij Rempel, Vladimir Oltean, Florian Fainelli,
	thomas.petazzoni, netdev, linux-kernel, linux-doc
In-Reply-To: <58f37d6e-973b-4242-be82-0561ccdb1a6f@lunn.ch>


> Sphinx follows Python's object-oriented structure. So you could have a
> class test_ethtool_pause_advertising, with class documentation. And
> then methods within the class which are individual tests.  The
> commented out section would then be method documentation.

Good point, so maybe something along these lines:

 - A class for the test
 - Methods for individual tests
 - For readability, I've written what the internal test helper would look
   like (_adv_test), and what a test would look like without the helper in
   adv_rx_on_tx_on().

I'm already diving into coding, but it helps me a bit in the definition of the
"description" format :)

This is what the class would look like:


class test_ethtool_pause_advertising:
    """Pause advertisement

    Validate that changing pause params through the ETHTOOL_MSG_PAUSE command
    translates to a change in the advertised pause params, and that these
    parameters are correct w.r.t the supported pause params and requested pause
    params.
    
    This exercises the .set_pauseparam() ethtool op for MAC configuration,
    as well as the reconfiguration of the PHY's advertising and negotiation.

    On non-phylink MACs, the MAC should call phy_set_sym_pause() to update the
    PHY's advertising, and restart a negotiation with phy_start_aneg() if
    need be. Failure to do so will result in the wrong advertising parameters.

    On phylink-enabled MACs, phylink deals with the PHY reconfiguration provided
    the MAC driver calls phylink_ethtool_set_pauseparam().

    Failing this test likely means that pause settings are not advertised
    correctly, either because the MAC does not trigger a PHY reconfiguration,
    because the PHY misconfigures its advertising registers, or because the
    PHY driver mishandles the phydev->advertising bitfield directly.
    
    The validation is made by looking at the advertised modes locally, as well as
    what the peer's 'lp_advertising' values report.

    cfg -- local device's interface configuration
    peer -- peer device handle
    """

    def _adv_test(cfg, peer, rx, tx, adv, not_adv):
        ret = cfg.run(f"ethtool -A ethX rx {rx} tx {tx} autoneg on")
        ksft_eq(ret, 0)

        linkmodes = cfg.get_advertising()
        if adv:
            ksft_in(adv, linkmodes, f"rx {rx} tx {tx} must advertise {adv}")

        if not_adv:
            ksft_not_in(not_adv, linkmodes, f"rx {rx} tx {tx} must not advertise {not_adv}")

        # Check the peer's view: what it received as link-partner advertisement
        remote_linkmodes = peer.get_lp_advertising()

        if adv:
            ksft_in(adv, remote_linkmodes, f"PHY does not advertise {adv}")

        if not_adv:
            ksft_not_in(not_adv, remote_linkmodes, f"PHY incorrectly advertises {not_adv}")


    @ksft_ethtool_needs_supported_allof([Pause])
    def adv_rx_on_tx_on(cfg, peer) -> None:
        """Advertising test with rx on tx on

        - run 'ethtool -A ethX rx on tx on autoneg on'
        - FAIL if the return isn't 0
        - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values do not contain
          "Pause" or contains "Asym_Pause"
        - FAIL if peer's lp_advertising doesn't contain "Pause" or contains
          "Asym_Pause"
        - Succeed otherwise
        """
        ret = cfg.run('ethtool -A ethX rx on tx on autoneg on')
        ksft_eq(ret, 0)

        linkmodes = cfg.get_advertising()
        ksft_in('Pause', linkmodes, "rx on tx on must advertise Pause")
        ksft_not_in('Asym_Pause', linkmodes, "rx on tx on must not advertise Asym_Pause")

        # Check the peer's view: what it received as link-partner advertisement
        remote_linkmodes = peer.get_lp_advertising()
        ksft_in('Pause', remote_linkmodes, "PHY does not advertise Pause")
        ksft_not_in('Asym_Pause', remote_linkmodes, "PHY incorrectly advertises Asym_Pause")


    @ksft_ethtool_needs_supported_allof([Pause, Asym_Pause])
    def adv_rx_on_tx_off(cfg, peer) -> None:
        """Advertising test with rx on tx off

        - run 'ethtool -A ethX rx on tx off autoneg on'
        - FAIL if the return isn't 0
        - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values do not contain
          "Pause" and "Asym_Pause"
        - FAIL if peer's lp_advertising doesn't contain "Pause" and "Asym_Pause"
        - Succeed otherwise
        """

        _adv_test(cfg, peer, 'on', 'off', ["Pause", "Asym_Pause"], [])

    @ksft_ethtool_needs_supported_allof([Asym_Pause])
    def adv_rx_off_tx_on(cfg, peer) -> None:
        """Advertising test with rx off tx on

        - run 'ethtool -A ethX rx off tx on autoneg on'
        - FAIL if the return isn't 0
        - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values do not contain
          "Asym_Pause" or contains "Pause"
        - FAIL if peer's lp_advertising doesn't contain "Asym_Pause" or contains
          "Pause"
        - Succeed otherwise
        """

        _adv_test(cfg, peer, 'off', 'on', ["Asym_Pause"], ["Pause"])
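
The three cases above follow one rule, which can be written down as a tiny
standalone helper. This is a sketch for discussion rather than part of the
proposed test: expected_pause_adv() is a made-up name, and the rule
(advertise Pause when rx is on, advertise Asym_Pause when rx and tx differ)
is the one I understand linkmode_set_pause() to implement in phylib.

```python
def expected_pause_adv(rx: bool, tx: bool) -> tuple:
    """Return (must_advertise, must_not_advertise) for an rx/tx pause request.

    Hypothetical helper: the mode names match the ethtool link mode strings
    used in the test descriptions, the function itself is illustrative only.
    """
    wanted = {
        "Pause": rx,             # symmetric pause bit follows rx
        "Asym_Pause": rx != tx,  # asymmetric bit is set when rx and tx differ
    }
    adv = sorted(mode for mode, on in wanted.items() if on)
    not_adv = sorted(mode for mode, on in wanted.items() if not on)
    return adv, not_adv


for rx in (True, False):
    for tx in (True, False):
        print(f"rx={rx} tx={tx} ->", expected_pause_adv(rx, tx))
```

With such a helper the per-case methods would only need to pass rx and tx,
and the rx off / tx off case (nothing advertised) would come for free.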


Maxime

^ permalink raw reply

* Re: [PATCH v13 0/4] kunit: Add support for suppressing warning backtraces
From: patchwork-bot+linux-riscv @ 2026-06-26  8:21 UTC (permalink / raw)
  To: Albert Esteve
  Cc: linux-riscv, arnd, brendan.higgins, david, raemoar63,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona, corbet,
	skhan, akpm, pjw, palmer, aou, alex, linux-kernel, linux-arch,
	linux-kselftest, kunit-dev, dri-devel, workflows, linux-doc,
	peterz, acarmina, linux, kees, lkft, mcanal, error27,
	simona.vetter
In-Reply-To: <20260515-kunit_add_support-v13-0-18ee42f96e7b@redhat.com>

Hello:

This series was applied to riscv/linux.git (fixes)
by Shuah Khan <skhan@linuxfoundation.org>:

On Fri, 15 May 2026 14:29:31 +0200 you wrote:
> Some unit tests intentionally trigger warning backtraces by passing bad
> parameters to kernel API functions. Such unit tests typically check the
> return value from such calls, not the existence of the warning backtrace.
> 
> Such intentionally generated warning backtraces are neither desirable
> nor useful for a number of reasons:
> - They can result in overlooked real problems.
> - A warning that suddenly starts to show up in unit tests needs to be
>   investigated and has to be marked to be ignored, for example by
>   adjusting filter scripts. Such filters are ad hoc because there is
>   no real standard format for warnings. On top of that, such filter
>   scripts would require constant maintenance.
> 
> [...]

Here is the summary with links:
  - [v13,1/4] bug/kunit: Core support for suppressing warning backtraces
    (no matching commit)
  - [v13,2/4] kunit: Add backtrace suppression self-tests
    (no matching commit)
  - [v13,3/4] drm: Suppress intentional warning backtraces in scaling unit tests
    (no matching commit)
  - [v13,4/4] kunit: Add documentation for warning backtrace suppression API
    https://git.kernel.org/riscv/c/5c1553dd5db3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE
From: David Woodhouse @ 2026-06-26  7:45 UTC (permalink / raw)
  To: Sean Christopherson, Gerd Hoffmann
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Paul Durrant, kvm, linux-doc, linux-kernel, linux-kselftest
In-Reply-To: <aj21KctIXuf7b_5G@google.com>

On Thu, 2026-06-25 at 16:09 -0700, Sean Christopherson wrote:
> On Tue, Apr 28, 2026, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > Commit 3617c0ee7decb ("KVM: x86/xen: Only write Xen hypercall page for
> > guest writes to MSR") blocked host-initiated writes from triggering the
> > Xen hypercall page setup, to fix an SRCU usage violation when the
> > hypercall MSR index collides with a real MSR written during vCPU reset.
> > 
> > However, some VMMs legitimately need to trigger hypercall page setup
> > from host context. For example, a VMM may intercept the guest's MSR
> > write to track an epoch (for kexec/crash recovery), and then replay the
> > write as a host-initiated KVM_SET_MSRS to populate the hypercall page.
> > The host_initiated check breaks this use case.
> > 
> > Add KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE as a new vcpu attribute
> > that explicitly invokes kvm_xen_write_hypercall_page() under proper
> > locking. This gives userspace a safe interface to trigger hypercall page
> > setup without going through the MSR write path, preserving the
> > host_initiated defence in depth while restoring the lost functionality.
> 
> This is all kinda silly.  Userspace provides KVM a blob, then userspace intercepts
> the MSR write that triggers doing something with said blob, only to call back into
> KVM to consume the blob that userspace provided in the first place.
> 
> Any chance we can deprecate KVM's kvm_xen_write_hypercall_page(), and instead
> rely on userspace to fill the page?  This extra bit obviously isn't much code to
> carry, but it's yet one more Xen thing to maintain, and we've accumulated a lot
> of those over the years...

We don't actually use the 'blob' mode. That was added in commit
ffde22ac53b6d in 2009 with a comment saying, "A generic mechanism to
delegate MSR writes to userspace seems overkill and risks encouraging
similar MSR abuse in the future.  Thus this patch adds special support
for the Xen HVM MSR."

When João and I came along almost a decade later, in 23200b7a30de3
where we added hypercall interception support we said, "Since this
means KVM owns the ABI, dispense with the facility for the VMM to
provide its own copy of the hypercall pages; just fill them in directly
using VMCALL/VMMCALL as we do for the Hyper-V hypercall page."

I think we could probably rip out the blob mode without any fear of
breaking userspace. Even in 2018 I don't think we could even find the
alleged code from 2009 that used the old support. At least, not in
buildable and usable form?

> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 91fd3673c09a..c16b4560c9e7 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -907,6 +907,13 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
> >  {
> >  	int idx, r = -ENOENT;
> >  
> > +	/*
> > +	 * kvm_xen_write_hypercall_page() manages its own locking.
> > +	 * Handle it before taking xen_lock to avoid a deadlock.
> 
> Do we actually want the side effects that necessitate taking xen.xen_lock?  From
> a uAPI perspective, it's odd to effectively bundle KVM_XEN_ATTR_TYPE_LONG_MODE
> into KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE.

That's *guest* ABI, and it's derived from Xen behaviour. Xen will
'latch' its idea of whether a guest VM is 32-bit or 64-bit, for the
purpose of shared data structures (shared_info page, vcpu_info,
runstate).

Xen latches this from the current mode of the running vCPU in *two*
places:
 • When the hypercall MSR is invoked
 • When the guest sets the event channel GSI (HVM_PARAM_CALLBACK_IRQ).

Thus far, the former has been handled in the kernel (in the code you're
looking at), while the latter is why we have the ioctl to explicitly
latch the guest's long_mode from userspace too, as userspace handles
the HVMOP_set_param calls.

> The other question is, why does kvm_xen_write_hypercall_page() drop xen_lock
> when writing guest memory?  That seems odd and unnecessary.

Huh? It takes the lock to do the thing that needs the lock, then drops
it. That is not "odd and unnecessary" at all.

You've been spending too long with these scope-guarded locks. I *hate*
them. I hate the way they slowly spread around the whole kernel, making
every lock holder hold their locks for just a *little* bit longer than
they need to, slowly increasing lock contention "just a little bit; it
doesn't matter" at a time. I hate the way they stop us thinking about
which locks are needed and in which order, and make it unclear whether
some action in the tail of a function actually *needed* the lock, or
was just caught up in it as collateral damage.

> > +	 */
> > +	if (data->type == KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE)
> > +		return kvm_xen_write_hypercall_page(vcpu, data->u.gpa) ? -EIO : 0;
> 
> -EIO is rather weird, wouldn't -EINVAL be more appropriate?  Ah, and both are
> wrong if copying the blob fails.

-EINVAL is more for "you asked me to do something that doesn't make
sense". -EIO is for "something went wrong when I tried".

Arguably, the thing that's most likely to go wrong is the
kvm_vcpu_write_guest() where it writes instructions[] to the guest, and
maybe that ought to be -EFAULT? But I'm not sure that's quite the right
semantic to return from the ioctl?

> > +
> >  	mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
> >  	idx = srcu_read_lock(&vcpu->kvm->srcu);
> 
> Speaking of writing memory, kvm_xen_write_hypercall_page() expects the caller
> to be in a read-side SRCU critical section (I didn't actually run this with
> PROVE_LOCKING=y, but I don't think I'm missing anything?)

Yes, good catch. Thanks.

> So, if this uAPI is unavoidable seems like we want something like the below.
> Either that or guard all of kvm_xen_write_hypercall_page() with a lock, and put
> the entire thing in a helper so that KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE
> can be handled in a case-statement and doesn't need to grab SRCU on its own.

Makes sense (with the test, of course). Want me to put them together
and resend?

^ permalink raw reply

* [PATCH v2 9/9] hwmon: adm1275: Support ROHM BD12790
From: Matti Vaittinen @ 2026-06-26  7:26 UTC (permalink / raw)
  To: Matti Vaittinen, Matti Vaittinen, Matti Vaittinen
  Cc: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan, Wensheng Wang, Matti Vaittinen,
	Ashish Yadav, Vasileios Amoiridis, Kim Seer Paller, ChiShih Tsai,
	Chris Packham, Robert Coulson, linux-hwmon, devicetree,
	linux-kernel, linux-doc
In-Reply-To: <cover.1782458224.git.mazziesaccount@gmail.com>

From: Matti Vaittinen <mazziesaccount@gmail.com>

Add support for ROHM BD12790 hot-swap controller which is largely
similar to Analog Devices adm1272.

The BD12790 uses the same selectable 60V/100V voltage ranges and
15mV/30mV current-sense ranges as the ADM1272, and the same VRANGE
(bit 5) and IRANGE (bit 0) layout in PMON_CONFIG. It therefore uses
a dedicated coefficient table that mirrors adm1272_coefficients, with
the following differences derived from BD12790 datasheet Table 1 (p.18):
- power 60V/30mV: m=17560 (vs. 17561)
- power 100V/30mV: m=10536 (vs. 10535)
- temperature: b=31880 (vs. 31871, reflecting T[11:0] = 4.2*T + 3188)

Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Assisted-by: GitHub Copilot:claude-sonnet-4.6

---
Revision history:
v1 => v2: Changes mostly based on the review by Guenter:
 - Add own 'case' for the bd12790.
 - Don't use wildcard in the macro name.
 - Fix the coefficient computation comment for power to take the
   shunt-resistor scale into account.
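
As a cross-check of the coefficient table in the patch below, the values can
be recomputed from the datasheet constants quoted in the code comment. A
rough sketch, assuming the PMBus direct-format relation
code = (m * X + b) * 10^R and a 1 mOhm shunt (RS = 1e-3) as the reference
for current and power; direct_coeff() is a helper made up for this note.

```python
def direct_coeff(gain, offset, r):
    """Return (m, b, R) such that code = (m * X + b) * 10**R.

    gain   -- ADC codes per unit of X (volt, ampere, watt, degree C)
    offset -- ADC code at X == 0
    r      -- chosen PMBus exponent R
    """
    scale = 10 ** (-r)
    return round(gain * scale), round(offset * scale), r


RS = 1e-3  # assumed reference shunt of 1 mOhm

bd12790 = [
    direct_coeff(1 / 14.77e-3, 0, -2),      # voltage, 60V range
    direct_coeff(1 / 24.62e-3, 0, -2),      # voltage, 100V range
    direct_coeff(132802.1 * RS, 2048, -1),  # current, 15mV range
    direct_coeff(66401.06 * RS, 2048, -1),  # current, 30mV range
    direct_coeff(35119.94 * RS, 0, -2),     # power, 60V / 15mV
    direct_coeff(21071.44 * RS, 0, -3),     # power, 100V / 15mV
    direct_coeff(17559.97 * RS, 0, -3),     # power, 60V / 30mV
    direct_coeff(10535.72 * RS, 0, -3),     # power, 100V / 30mV
    direct_coeff(4.2, 3188, -1),            # temperature
]

for index, coeff in enumerate(bd12790):
    print(index, coeff)
```

The rounded results line up with bd12790_coefficients[0..8], including the
three entries that differ from the adm1272 table.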
---
 drivers/hwmon/pmbus/Kconfig   |  4 +-
 drivers/hwmon/pmbus/adm1275.c | 79 +++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index b3c27f3b2712..6ebc01e26db3 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -52,8 +52,8 @@ config SENSORS_ADM1275
 	help
 	  If you say yes here you get hardware monitoring support for Analog
 	  Devices ADM1075, ADM1272, ADM1273, ADM1275, ADM1276, ADM1278, ADM1281,
-	  ADM1293, ADM1294, ROHM BD12780, and SQ24905C Hot-Swap Controller and
-	  Digital Power Monitors.
+	  ADM1293, ADM1294, ROHM BD12780, ROHM BD12790, and SQ24905C
+	  Hot-Swap Controller and Digital Power Monitors.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called adm1275.
diff --git a/drivers/hwmon/pmbus/adm1275.c b/drivers/hwmon/pmbus/adm1275.c
index 81c50fab7687..406b44852376 100644
--- a/drivers/hwmon/pmbus/adm1275.c
+++ b/drivers/hwmon/pmbus/adm1275.c
@@ -19,7 +19,7 @@
 #include "pmbus.h"
 
 enum chips { adm1075, adm1272, adm1273, adm1275, adm1276, adm1278, adm1281,
-	 adm1293, adm1294, bd12780, sq24905c };
+	 adm1293, adm1294, bd12780, bd12790, sq24905c };
 
 #define ADM1275_MFR_STATUS_IOUT_WARN2	BIT(0)
 #define ADM1293_MFR_STATUS_VAUX_UV_WARN	BIT(5)
@@ -47,7 +47,7 @@ enum chips { adm1075, adm1272, adm1273, adm1275, adm1276, adm1278, adm1281,
 #define ADM1278_VOUT_EN			BIT(1)
 
 #define ADM1278_PMON_DEFCONFIG		(ADM1278_VOUT_EN | ADM1278_TEMP1_EN | ADM1278_TSFILT)
-/* The BD12780 data sheets mark TSFILT bit as reserved. */
+/* The BD127[89]0 data sheets mark TSFILT bit as reserved. */
 #define BD12780_PMON_DEFCONFIG		(ADM1278_VOUT_EN | ADM1278_TEMP1_EN)
 
 #define ADM1293_IRANGE_25		0
@@ -136,6 +136,30 @@ static const struct coefficients adm1272_coefficients[] = {
 
 };
 
+/*
+ * BD12790 coefficients derived from preliminary datasheet, Table 1 (p.18)
+ * and the PMBus direct-format relationship X = (Y * 10^(-R) - b) / m.
+ *
+ * Voltage: V[V] = 14.77e-3 * code (60V) / 24.62e-3 * code (100V)
+ *   -> m = 6770, R=-2 / m = 4062, R=-2
+ * Current: code = I[A] * RS * 132802.1 + 2048 (15mV) / * 66401.06 + 2048 (30mV)
+ *   -> m = 1328, b = 2048 * 10^(-R) = 20480, R=-1 / m = 664, same b and R
+ * Power: code = k * RS * PIN, k = 35119.94 / 17559.97 / 21071.44 / 10535.72
+ *   -> m = round(k * 10^(-3-R)), R=-2 for 60V/15mV, R=-3 for the other three
+ * Temperature: code = 4.2 * T + 3188 -> m = 42, b = 3188 * 10 = 31880, R=-1
+ */
+static const struct coefficients bd12790_coefficients[] = {
+	[0] = { 6770, 0, -2 },		/* voltage, vrange 60V */
+	[1] = { 4062, 0, -2 },		/* voltage, vrange 100V */
+	[2] = { 1328, 20480, -1 },	/* current, vsense range 15mV */
+	[3] = { 664, 20480, -1 },	/* current, vsense range 30mV */
+	[4] = { 3512, 0, -2 },		/* power, vrange 60V, irange 15mV */
+	[5] = { 21071, 0, -3 },		/* power, vrange 100V, irange 15mV */
+	[6] = { 17560, 0, -3 },		/* power, vrange 60V, irange 30mV */
+	[7] = { 10536, 0, -3 },		/* power, vrange 100V, irange 30mV */
+	[8] = { 42, 31880, -1 },	/* temperature */
+};
+
 static const struct coefficients adm1275_coefficients[] = {
 	[0] = { 19199, 0, -2 },		/* voltage, vrange set */
 	[1] = { 6720, 0, -1 },		/* voltage, vrange not set */
@@ -490,6 +514,7 @@ static const struct i2c_device_id adm1275_id[] = {
 	{ "adm1293", adm1293 },
 	{ "adm1294", adm1294 },
 	{ "bd12780", bd12780 },
+	{ "bd12790", bd12790 },
 	{ "mc09c", sq24905c },
 	{ }
 };
@@ -567,7 +592,8 @@ static int adm1275_probe(struct i2c_client *client)
 	if (mid->driver_data == adm1272 || mid->driver_data == adm1273 ||
 	    mid->driver_data == adm1278 || mid->driver_data == adm1281 ||
 	    mid->driver_data == adm1293 || mid->driver_data == adm1294 ||
-	    mid->driver_data == bd12780 || mid->driver_data == sq24905c)
+	    mid->driver_data == bd12780 || mid->driver_data == bd12790 ||
+	    mid->driver_data == sq24905c)
 		config_read_fn = i2c_smbus_read_word_data;
 	else
 		config_read_fn = i2c_smbus_read_byte_data;
@@ -647,6 +673,7 @@ static int adm1275_probe(struct i2c_client *client)
 		data->have_power_sampling = true;
 
 		coefficients = adm1272_coefficients;
+
 		vindex = (config & ADM1275_VRANGE) ? 1 : 0;
 		cindex = (config & ADM1272_IRANGE) ? 3 : 2;
 		/* pindex depends on the combination of the above */
@@ -679,6 +706,51 @@ static int adm1275_probe(struct i2c_client *client)
 		if (config & ADM1278_VIN_EN)
 			info->func[0] |= PMBUS_HAVE_VIN;
 		break;
+
+	/*
+	 * The BD12790 is almost identical to the adm1272. Only the defconfig
+	 * and coefficients have minor differences.
+	 */
+	case bd12790:
+		data->have_vout = true;
+		data->have_pin_max = true;
+		data->have_temp_max = true;
+		data->have_power_sampling = true;
+
+		coefficients = bd12790_coefficients;
+
+		vindex = (config & ADM1275_VRANGE) ? 1 : 0;
+		cindex = (config & ADM1272_IRANGE) ? 3 : 2;
+		/* pindex depends on the combination of the above */
+		switch (config & (ADM1275_VRANGE | ADM1272_IRANGE)) {
+		case 0:
+		default:
+			pindex = 4;
+			break;
+		case ADM1275_VRANGE:
+			pindex = 5;
+			break;
+		case ADM1272_IRANGE:
+			pindex = 6;
+			break;
+		case ADM1275_VRANGE | ADM1272_IRANGE:
+			pindex = 7;
+			break;
+		}
+		tindex = 8;
+
+		info->func[0] |= PMBUS_HAVE_PIN | PMBUS_HAVE_STATUS_INPUT |
+			PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT |
+			PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP;
+
+		ret = adm1275_enable_vout_temp(data, client, config,
+					       BD12780_PMON_DEFCONFIG);
+		if (ret)
+			return ret;
+
+		if (config & ADM1278_VIN_EN)
+			info->func[0] |= PMBUS_HAVE_VIN;
+		break;
 	case adm1275:
 		if (device_config & ADM1275_IOUT_WARN2_SELECT)
 			data->have_oc_fault = true;
@@ -919,6 +991,7 @@ static const struct of_device_id adm1275_of_match[] = {
 	{ .compatible = "adi,adm1293", },
 	{ .compatible = "adi,adm1294", },
 	{ .compatible = "rohm,bd12780", },
+	{ .compatible = "rohm,bd12790", },
 	{ .compatible = "silergy,mc09c", },
 	{ }
 };
-- 
2.54.0


^ permalink raw reply related
