Linux Documentation


* Re: [PATCH 08/24] nfsd: update the fsnotify mark when setting or removing a dir delegation
From: Chuck Lever @ 2026-04-08 18:24 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro, Christian Brauner, Jan Kara,
	Chuck Lever, Alexander Aring, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, Amir Goldstein
  Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
	linux-doc, linux-nfs
In-Reply-To: <20260407-dir-deleg-v1-8-aaf68c478abd@kernel.org>


On Tue, Apr 7, 2026, at 9:21 AM, Jeff Layton wrote:
> Add a new helper function that will update the mask on the nfsd_file's
> fsnotify_mark to be a union of all current directory delegations on an
> inode. Call that when directory delegations are added or removed.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c8fb84c38637..9a4cff08c67d 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c

> @@ -1266,6 +1297,7 @@ static void nfs4_unlock_deleg_lease(struct 
> nfs4_delegation *dp)
>  	WARN_ON_ONCE(!fp->fi_delegees);
> 
>  	nfsd4_finalize_deleg_timestamps(dp, nf->nf_file);
> +	nfsd_fsnotify_recalc_mask(nf);
>  	kernel_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
>  	put_deleg_file(fp);
>  }

The grant path in nfsd_get_dir_deleg() uses a different ordering
(setlease first, recalc_mask after).

Here, since the delegation being removed is still in flc_lease,
inode_lease_ignore_mask() includes its ignore flags. The mask is
computed as if the delegation is still present.

The result is that stale FS_CREATE/FS_DELETE/FS_RENAME bits remain
in the fsnotify mark. It might be harmless in practice since the
handler finds no leases and returns early, but it creates
unnecessary work.

Should nfs4_unlock_deleg_lease call nfsd_fsnotify_recalc_mask()
after kernel_setlease(F_UNLCK)?
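
The ordering concern can be illustrated with a toy userspace model. This is
only a sketch: the toy_* names and EV_* bits are hypothetical stand-ins, and
it assumes the mark is a plain union of per-lease masks, which simplifies
whatever inode_lease_ignore_mask() really does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FS_CREATE/FS_DELETE/FS_RENAME bits. */
#define EV_CREATE 0x1u
#define EV_DELETE 0x2u
#define EV_RENAME 0x4u

#define TOY_MAX_LEASES 8

/* Toy inode: one event mask per directory delegation on the lease list. */
struct toy_inode {
	uint32_t lease_mask[TOY_MAX_LEASES];
	size_t nr_leases;
	uint32_t mark_mask;	/* models the fsnotify mark's mask */
};

/* Recompute the mark as the union of all leases still on the list. */
static void toy_recalc_mask(struct toy_inode *ino)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < ino->nr_leases; i++)
		mask |= ino->lease_mask[i];
	ino->mark_mask = mask;
}

/* Drop the most recently added lease; models kernel_setlease(F_UNLCK). */
static void toy_unlock(struct toy_inode *ino)
{
	assert(ino->nr_leases > 0);
	ino->nr_leases--;
}

/* Ordering in the quoted hunk: recalc first, then unlock. */
static uint32_t toy_remove_recalc_first(struct toy_inode *ino)
{
	toy_recalc_mask(ino);
	toy_unlock(ino);
	return ino->mark_mask;
}

/* Ordering asked about above: unlock first, then recalc. */
static uint32_t toy_remove_unlock_first(struct toy_inode *ino)
{
	toy_unlock(ino);
	toy_recalc_mask(ino);
	return ino->mark_mask;
}
```

In this model, removing the only delegation with the recalc-first ordering
leaves all of its bits in the mark, while the unlock-first ordering clears
them.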


-- 
Chuck Lever


* Re: [PATCH 01/24] filelock: add support for ignoring deleg breaks for dir change events
From: Chuck Lever @ 2026-04-08 18:16 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro, Christian Brauner, Jan Kara,
	Chuck Lever, Alexander Aring, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, Amir Goldstein
  Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
	linux-doc, linux-nfs
In-Reply-To: <20260407-dir-deleg-v1-1-aaf68c478abd@kernel.org>

On Tue, Apr 7, 2026, at 9:21 AM, Jeff Layton wrote:
> If a NFS client requests a directory delegation with a notification
> bitmask covering directory change events, the server shouldn't recall
> the delegation. Instead the client will be notified of the change after
> the fact.
>
> Add support for ignoring lease breaks on directory changes. Add a new
> flags parameter to try_break_deleg() and teach __break_lease how to
> ignore certain types of delegation break events.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---

> diff --git a/fs/locks.c b/fs/locks.c
> index 8e44b1f6c15a..dafa0752fdce 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c

> @@ -1670,7 +1709,7 @@ int __break_lease(struct inode *inode, unsigned int flags)
>  			locks_delete_lock_ctx(&fl->c, &dispose);
>  	}
> 
> -	if (list_empty(&ctx->flc_lease))
> +	if (!visible_leases_remaining(inode, flags))
>  		goto out;
> 
>  	if (flags & LEASE_BREAK_NONBLOCK) {

After breaking visible leases, the restart: label calls any_leases_conflict(),
which does not filter ignored dir-delegation leases. When only ignored leases
remain, any_leases_conflict() returns true, but visible_leases_remaining() also
returned true (triggering the wait). The code picks the first lease (possibly
ignored), computes break_time = 1 jiffy, blocks, then loops.

For example, suppose you have two directory delegations on a directory, one
with FL_IGN_DIR_DELETE and one without. After the non-ignored one is broken
and removed, the ignored one keeps any_leases_conflict() returning true. The
loop spins at 1-jiffy intervals until the ignored delegation is released.

Should the restart: block skip ignored leases?
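
The scenario above can be modelled in userspace. This is a rough sketch only:
the toy_* names are hypothetical, each loop iteration stands in for one
1-jiffy sleep, and the real __break_lease() logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lease: "ignored" means this break type is ignored for the lease. */
struct toy_lease {
	bool ignored;
	bool present;
};

typedef bool (*toy_conflict_fn)(const struct toy_lease *, size_t);

/* Unfiltered check: any lease still on the list counts as a conflict. */
static bool toy_any_conflict(const struct toy_lease *l, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (l[i].present)
			return true;
	return false;
}

/* Filtered check: only leases that are not ignored count. */
static bool toy_any_visible_conflict(const struct toy_lease *l, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (l[i].present && !l[i].ignored)
			return true;
	return false;
}

/*
 * Model of the wait loop. Each pass is one short sleep, after which the
 * holder of one visible lease gives it up. Ignored leases are never
 * released here. Returns the number of sleeps, capped at "cap".
 */
static unsigned int toy_wait_loop(struct toy_lease *l, size_t n,
				  toy_conflict_fn conflict, unsigned int cap)
{
	unsigned int waits = 0;

	assert(l != NULL);
	while (conflict(l, n) && waits < cap) {
		waits++;
		for (size_t i = 0; i < n; i++) {
			if (l[i].present && !l[i].ignored) {
				l[i].present = false;
				break;
			}
		}
	}
	return waits;
}
```

With one visible and one ignored lease, the unfiltered check keeps the loop
going until the cap is hit, while the filtered check stops after one wait.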

-- 
Chuck Lever


* Re: [PATCH v2] checkpatch: add --json output mode
From: Joe Perches @ 2026-04-08 18:16 UTC (permalink / raw)
  To: Sasha Levin, dwaipayanray1, lukas.bulwahn
  Cc: mricon, corbet, skhan, apw, workflows, linux-doc, linux-kernel
In-Reply-To: <20260408172435.1268067-1-sashal@kernel.org>

On Wed, 2026-04-08 at 13:24 -0400, Sasha Levin wrote:

Adding --json seems sensible but some of the
added checkpatch code seems odd to me.

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> @@ -2395,6 +2400,18 @@ sub report {
>  
>  	push(our @report, $output);
>  
> +	if ($json) {
> +		our ($realfile, $realline);

Seems an odd way to check if $realfile/$realline is set.

> +		my %issue = (
> +			level => $level,
> +			type => $type,
> +			message => $msg,
> +		);
> +		$issue{file} = $realfile if (defined $realfile && $realfile ne '');
> +		$issue{line} = $realline + 0 if (defined $realline && $realline);

All the uses of + 0 seem unnecessary, but I gather it's for
string/decimal conversions.

> +sub json_print_result {
> +	my ($filename, $total_errors, $total_warnings, $total_checks,
> +	    $total_lines, $issues, $used_types, $ignored_types) = @_;
> +	my %result = (
> +		filename       => $filename,
> +		total_errors   => $total_errors + 0,
> +		total_warnings => $total_warnings + 0,
> +		total_checks   => $total_checks + 0,
> +		total_lines    => $total_lines + 0,
> +		issues         => $issues,
> +	);
> +	$result{used_types} = $used_types if (defined $used_types);
> +	$result{ignored_types} = $ignored_types if (defined $ignored_types);
> +	my $json_encoder = JSON::PP->new->canonical->utf8;

Maybe add JSON pretty too?

> +	print $json_encoder->encode(\%result) . "\n";

Still missing parentheses around print args.
I do know that not all existing print uses have parentheses.
I just prefer them, as they read more like C.

> +}
> +
>  sub fixup_current_range {
>  	my ($lineRef, $offset, $length) = @_;
>  
> @@ -2690,14 +2724,15 @@ sub process {
>  	my $last_coalesced_string_linenr = -1;
>  
>  	our @report = ();
> +	our @json_issues = ();
>  	our $cnt_lines = 0;
>  	our $cnt_error = 0;
>  	our $cnt_warn = 0;
>  	our $cnt_chk = 0;
>  
>  	# Trace the real file/line as we go.
> -	my $realfile = '';
> -	my $realline = 0;
> +	our $realfile = '';
> +	our $realline = 0;

?

> @@ -7791,18 +7826,27 @@ sub process {
>  	# If we have no input at all, then there is nothing to report on
>  	# so just keep quiet.
>  	if ($#rawlines == -1) {
> +		if ($json) {
> +			json_print_result($filename, 0, 0, 0, 0, []);
> +		}
>  		exit(0);
>  	}
>  
>  	# In mailback mode only produce a report in the negative, for
>  	# things that appear to be patches.
>  	if ($mailback && ($clean == 1 || !$is_patch)) {
> +		if ($json) {
> +			json_print_result($filename, 0, 0, 0, 0, []);
> +		}
>  		exit(0);
>  	}
>  
>  	# This is not a patch, and we are in 'no-patch' mode so
>  	# just keep quiet.
>  	if (!$chk_patch && !$is_patch) {
> +		if ($json) {
> +			json_print_result($filename, 0, 0, 0, 0, []);
> +		}
>  		exit(0);
>  	}


Duplicated code, maybe use a function or consolidate the code?
Something like:

 	if (($#rawlines == -1) ||
			# If we have no input, there's nothing to report
 	    ($mailback && ($clean == 1 || !$is_patch)) ||
			# In mailback mode only produce a report for what seems to be a patch
 	    (!$chk_patch && !$is_patch)) {
			# This is not a patch, and we are in 'no-patch' mode.
		json_print_result($filename, 0, 0, 0, 0, []) if ($json);
 		exit(0);
 	}

>  
> @@ -7850,6 +7894,13 @@ sub process {
>  		}
>  	}
>  
> +	if ($json) {
> +		my @used = sort keys %use_type;
> +		my @ignored = sort keys %ignore_type;
> +		json_print_result($filename, $cnt_error, $cnt_warn,
> +				  $cnt_chk, $cnt_lines, \@json_issues,
> +				  \@used, \@ignored);
> +	} else {
>  	print report_dump();
>  	if ($summary && !($clean == 1 && $quiet == 1)) {
>  		print "$filename " if ($summary_file);
> @@ -7878,8 +7929,9 @@ NOTE: Whitespace errors detected.
>  EOM
>  		}
>  	}
> +	} # end !$json

I quite dislike misleading indentation.

Perhaps it's unnecessary here and simpler to use an
exit in the new block at line 7850.

>  
> -	if ($clean == 0 && $fix &&
> +	if (!$json && $clean == 0 && $fix &&
>  	    ("@rawlines" ne "@fixed" ||
>  	     $#fixed_inserted >= 0 || $#fixed_deleted >= 0)) {
>  		my $newfile = $filename;
> @@ -7918,7 +7970,7 @@ EOM
>  		}
>  	}
>  
> -	if ($quiet == 0) {
> +	if (!$json && $quiet == 0) {
>  		print "\n";
>  		if ($clean == 1) {
>  			print "$vname has no obvious style problems and is ready for submission.\n";
> 


* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Danilo Krummrich @ 2026-04-08 18:01 UTC (permalink / raw)
  To: Joel Fernandes, Eliot Courtney
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, alexeyi, joel, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev
In-Reply-To: <da8d03f8-0294-417b-b684-2c20d577f94a@nvidia.com>

On Wed Apr 8, 2026 at 6:58 PM CEST, Joel Fernandes wrote:
> So you're making the code much much worse than before actually. We don't
> new traits and types pointlessly.

I had a look at both approaches and yes, the traits can be considered
boilerplate. But they are not complex: they just list method signatures that
each version's types already implement functionality-wise, and they get rid of
a lot of dispatch sites. The implementation turns out cleaner as there is less
parameter threading throughout call chains, etc. Overall, it seems more
scalable.

On the other hand, there are indeed more abstractions and type indirections to
understand in Eliot's code. I.e. there are advantages and disadvantages to both
approaches.

That said, please engage with Eliot's proposal, it is not as off as your reply
implies and dismissing it right away is not what I'd like to see in this
situation.


* [PATCH v19 1/2] ACPI:RAS2: Add driver for the ACPI RAS2 feature table
From: shiju.jose @ 2026-04-08 17:28 UTC (permalink / raw)
  To: rafael, bp, akpm, rppt, dferguson, linux-edac, linux-acpi,
	linux-mm, linux-doc, tony.luck, lenb, leo.duran, Yazen.Ghannam,
	mchehab
  Cc: jonathan.cameron, linuxarm, rientjes, jiaqiyan, Jon.Grimm,
	dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang, shiju.jose, shijujose2008
In-Reply-To: <20260408172850.183-1-shiju.jose@huawei.com>

From: Shiju Jose <shiju.jose@huawei.com>

The ACPI 6.5 specification, section 5.2.21, defines the RAS2 feature table
(RAS2). Add a driver for the RAS2 feature table, which provides interfaces for
platform RAS features, e.g., HW-based memory scrubbing and a logical to
physical address (PA) translation service. RAS2 uses a PCC channel subspace to
communicate with the ACPI-compliant HW platform.

Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/Kconfig  |  11 ++
 drivers/acpi/Makefile |   1 +
 drivers/acpi/bus.c    |   3 +
 drivers/acpi/ras2.c   | 441 ++++++++++++++++++++++++++++++++++++++++++
 include/acpi/ras2.h   |  57 ++++++
 5 files changed, 513 insertions(+)
 create mode 100644 drivers/acpi/ras2.c
 create mode 100644 include/acpi/ras2.h

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 6f4b545f7377..bc5ca7281f9f 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -294,6 +294,17 @@ config ACPI_CPPC_LIB
 	  If your platform does not support CPPC in firmware,
 	  leave this option disabled.
 
+config ACPI_RAS2
+	bool "ACPI RAS2 driver"
+	select AUXILIARY_BUS
+	depends on MAILBOX
+	depends on PCC
+	help
+	  Add support for the RAS2 feature table and provide interfaces for
+	  platform RAS features, such as hardware-based memory scrubbing.
+
+	  If unsure, select N.
+
 config ACPI_PROCESSOR
 	tristate "Processor"
 	depends on X86 || ARM64 || LOONGARCH || RISCV
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index d1b0affb844f..abfec6745724 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
 obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
+obj-$(CONFIG_ACPI_RAS2)		+= ras2.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
 obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 obj-$(CONFIG_ACPI_PFRUT)	+= pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 2ec095e2009e..b1f614416c9a 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -31,6 +31,7 @@
 #include <acpi/apei.h>
 #include <linux/suspend.h>
 #include <linux/prmt.h>
+#include <acpi/ras2.h>
 
 #include "internal.h"
 
@@ -1528,6 +1529,8 @@ static int __init acpi_init(void)
 	acpi_debugger_init();
 	acpi_setup_sb_notify_handler();
 	acpi_viot_init();
+	acpi_ras2_init();
+
 	return 0;
 }
 
diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
new file mode 100644
index 000000000000..2513b597e0f8
--- /dev/null
+++ b/drivers/acpi/ras2.c
@@ -0,0 +1,441 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ACPI RAS2 feature table driver.
+ *
+ * Copyright (c) 2024-2026 HiSilicon Limited.
+ *
+ * Support for RAS2 table - ACPI 6.5 Specification, section 5.2.21, which
+ * provides interfaces for platform RAS features, e.g., for HW-based memory
+ * scrubbing, and logical to physical address translation service. RAS2 uses
+ * PCC channel subspace for communicating with the ACPI compliant HW platform.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) "ACPI RAS2: " fmt
+
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/iopoll.h>
+#include <linux/ktime.h>
+#include <acpi/pcc.h>
+#include <acpi/ras2.h>
+
+/**
+ * struct ras2_sspcc - Data structure for PCC communication
+ * @mbox_client:	struct mbox_client object
+ * @pcc_chan:		Pointer to struct pcc_mbox_chan
+ * @comm_addr:		Pointer to RAS2 PCC shared memory region
+ * @pcc_lock:		PCC lock to provide mutually exclusive access
+ *			to PCC channel subspace
+ * @deadline_us:	Poll PCC status register timeout in microsecs
+ *			for PCC command completion
+ * @pcc_mpar:		Maximum Periodic Access Rate (MPAR) for PCC channel
+ * @pcc_mrtt:		Minimum Request Turnaround Time (MRTT) in microsecs
+ *			OS must wait after completion of a PCC command before
+ *			issuing next command
+ * @last_cmd_cmpl_time: Completion time of last PCC command
+ * @last_mpar_reset:	Time of last MPAR count reset
+ * @mpar_count:		MPAR count
+ * @pcc_id:		Identifier of the RAS2 platform communication channel
+ * @last_cmd:		Last PCC command
+ * @pcc_chnl_acq:	Status of PCC channel acquired
+ */
+struct ras2_sspcc {
+	struct mbox_client		mbox_client;
+	struct pcc_mbox_chan		*pcc_chan;
+	struct acpi_ras2_shmem __iomem	*comm_addr;
+	struct mutex			pcc_lock;
+	unsigned int			deadline_us;
+	unsigned int			pcc_mpar;
+	unsigned int			pcc_mrtt;
+	ktime_t				last_cmd_cmpl_time;
+	ktime_t				last_mpar_reset;
+	int				mpar_count;
+	int				pcc_id;
+	u16				last_cmd;
+	bool				pcc_chnl_acq;
+};
+
+/*
+ * Arbitrary retries for PCC commands because the remote processor could be
+ * much slower to reply. Keep it high enough to cover emulators where the
+ * processors run painfully slow.
+ */
+#define PCC_NUM_RETRIES 600ULL
+#define PCC_CHNL_DEFAULT_LATENCY 1000
+#define PCC_MIN_POLL_USECS 3
+
+#define RAS2_MAX_NUM_PCC_DESCS 100
+#define RAS2_FEAT_TYPE_MEMORY 0x00
+
+static int check_pcc_chan(struct ras2_sspcc *sspcc)
+{
+	struct acpi_ras2_shmem __iomem *gen_comm_base = sspcc->comm_addr;
+	u32 cap_status;
+	u16 status;
+	int rc;
+
+	/*
+	 * As per ACPI spec, the PCC space will be initialized by the
+	 * platform and should have set the command completion bit when
+	 * PCC can be used by OSPM.
+	 *
+	 * Poll PCC status register every PCC_MIN_POLL_USECS for maximum of
+	 * PCC_NUM_RETRIES * PCC channel latency until PCC command complete
+	 * bit is set.
+	 */
+	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
+					status & PCC_STATUS_CMD_COMPLETE,
+					PCC_MIN_POLL_USECS, sspcc->deadline_us);
+	if (rc) {
+		pr_warn("PCC ID: 0x%x: PCC check channel timeout for last command: 0x%x rc=%d\n",
+			sspcc->pcc_id, sspcc->last_cmd, rc);
+
+		return rc;
+	}
+
+	if (status & PCC_STATUS_ERROR) {
+		pr_warn("PCC ID: 0x%x: Error in executing last command: 0x%x\n",
+			sspcc->pcc_id, sspcc->last_cmd);
+		status &= ~PCC_STATUS_ERROR;
+		iowrite16(status, &gen_comm_base->status);
+		return -EIO;
+	}
+
+	cap_status = ioread32(&gen_comm_base->set_caps_status);
+	switch (cap_status) {
+	case ACPI_RAS2_NOT_VALID:
+	case ACPI_RAS2_NOT_SUPPORTED:
+		rc = -EPERM;
+		break;
+	case ACPI_RAS2_BUSY:
+		rc = -EBUSY;
+		break;
+	case ACPI_RAS2_FAILED:
+	case ACPI_RAS2_ABORTED:
+	case ACPI_RAS2_INVALID_DATA:
+		rc = -EINVAL;
+		break;
+	default:
+		rc = 0;
+	}
+
+	iowrite32(0x0, &gen_comm_base->set_caps_status);
+
+	return rc;
+}
+
+/**
+ * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel
+ * @ras2_ctx:	pointer to the RAS2 context structure
+ * @cmd:	RAS2 command to send
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd)
+{
+	struct acpi_ras2_shmem __iomem *gen_comm_base;
+	struct mbox_chan *pcc_channel;
+	struct ras2_sspcc *sspcc;
+	s64 time_delta;
+	int rc;
+
+	if (!ras2_ctx)
+		return -EINVAL;
+
+	lockdep_assert_held(ras2_ctx->pcc_lock);
+	sspcc = ras2_ctx->sspcc;
+	gen_comm_base = sspcc->comm_addr;
+
+	rc = check_pcc_chan(sspcc);
+	if (rc < 0)
+		return rc;
+
+	pcc_channel = sspcc->pcc_chan->mchan;
+
+	/*
+	 * Handle the Minimum Request Turnaround Time (MRTT): the minimum
+	 * amount of time that OSPM must wait after the completion of
+	 * a command before issuing the next command, in microseconds.
+	 */
+	if (sspcc->pcc_mrtt) {
+		time_delta = ktime_us_delta(ktime_get(), sspcc->last_cmd_cmpl_time);
+		if (sspcc->pcc_mrtt > time_delta)
+			udelay(sspcc->pcc_mrtt - time_delta);
+	}
+
+	/*
+	 * Handle the non-zero Maximum Periodic Access Rate (MPAR): the
+	 * maximum number of periodic requests that the subspace channel can
+	 * support, reported in commands per minute. 0 indicates no
+	 * limitation.
+	 *
+	 * This parameter should be ideally zero or large enough so that it
+	 * can handle maximum number of requests that all the cores in the
+	 * system can collectively generate. If it is not, follow the spec and
+	 * just not send the request to the platform after hitting the MPAR
+	 * limit in any 60s window.
+	 */
+	if (sspcc->pcc_mpar) {
+		if (!sspcc->mpar_count) {
+			time_delta = ktime_ms_delta(ktime_get(), sspcc->last_mpar_reset);
+			if ((time_delta < 60 * MSEC_PER_SEC) && sspcc->last_mpar_reset) {
+				dev_dbg(ras2_ctx->dev,
+					"PCC command 0x%x not sent due to MPAR limit\n", cmd);
+				return -EIO;
+			}
+			sspcc->last_mpar_reset = ktime_get();
+			sspcc->mpar_count = sspcc->pcc_mpar;
+		}
+		sspcc->mpar_count--;
+	}
+
+	/* Write to the shared comm region */
+	iowrite16(cmd, &gen_comm_base->command);
+
+	/* Flip CMD COMPLETE bit */
+	iowrite16(0, &gen_comm_base->status);
+
+	/* Ring doorbell */
+	rc = mbox_send_message(pcc_channel, &cmd);
+	/*
+	 * mbox_send_message() returns a non-negative integer for successful submission
+	 * and a negative value on failure.
+	 */
+	if (rc < 0) {
+		dev_warn(ras2_ctx->dev,
+			 "Error sending PCC mbox message command: 0x%x, rc:%d\n", cmd, rc);
+		return rc;
+	} else {
+		rc = 0;
+	}
+
+	sspcc->last_cmd = cmd;
+
+	/*
+	 * If Minimum Request Turnaround Time is non-zero, need to record the
+	 * completion time of both READ and WRITE commands for proper handling
+	 * of MRTT, so need to check for pcc_mrtt in addition to PCC_CMD_EXEC_RAS2.
+	 */
+	if (cmd == PCC_CMD_EXEC_RAS2 || sspcc->pcc_mrtt) {
+		rc = check_pcc_chan(sspcc);
+		if (sspcc->pcc_mrtt)
+			sspcc->last_cmd_cmpl_time = ktime_get();
+	}
+
+	/*
+	 * Both mbox_chan_txdone() and mbox_client_txdone() require the status
+	 * of the last transmission as the second argument.
+	 */
+	if (pcc_channel->mbox->txdone_irq)
+		mbox_chan_txdone(pcc_channel, rc);
+	else
+		mbox_client_txdone(pcc_channel, rc);
+
+	return rc;
+}
+EXPORT_SYMBOL_FOR_MODULES(ras2_send_pcc_cmd, "acpi_ras2");
+
+static int register_pcc_channel(struct ras2_mem_ctx *ras2_ctx, int pcc_id)
+{
+	struct pcc_mbox_chan *pcc_chan;
+	struct ras2_sspcc *sspcc;
+
+	if (pcc_id < 0)
+		return -EINVAL;
+
+	sspcc = kzalloc(sizeof(*sspcc), GFP_KERNEL);
+	if (!sspcc)
+		return -ENOMEM;
+
+	pcc_chan = pcc_mbox_request_channel(&sspcc->mbox_client, pcc_id);
+	if (IS_ERR(pcc_chan)) {
+		kfree(sspcc);
+		return PTR_ERR(pcc_chan);
+	}
+
+	sspcc->pcc_id		= pcc_id;
+	sspcc->pcc_chan		= pcc_chan;
+	sspcc->comm_addr	= pcc_chan->shmem;
+	if (pcc_chan->latency)
+		sspcc->deadline_us = PCC_NUM_RETRIES * pcc_chan->latency;
+	else
+		sspcc->deadline_us = PCC_NUM_RETRIES * PCC_CHNL_DEFAULT_LATENCY;
+	sspcc->pcc_mrtt		= pcc_chan->min_turnaround_time;
+	sspcc->pcc_mpar		= pcc_chan->max_access_rate;
+	sspcc->mbox_client.knows_txdone	= true;
+
+	ras2_ctx->sspcc		= sspcc;
+	ras2_ctx->comm_addr	= sspcc->comm_addr;
+	ras2_ctx->dev		= pcc_chan->mchan->mbox->dev;
+
+	mutex_init(&sspcc->pcc_lock);
+	ras2_ctx->pcc_lock	= &sspcc->pcc_lock;
+
+	return 0;
+}
+
+static DEFINE_IDA(ras2_ida);
+static void ras2_release(struct device *device)
+{
+	struct auxiliary_device *auxdev = to_auxiliary_dev(device);
+	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+	struct ras2_sspcc *sspcc;
+
+	ida_free(&ras2_ida, auxdev->id);
+	sspcc = ras2_ctx->sspcc;
+	pcc_mbox_free_channel(sspcc->pcc_chan);
+	kfree(sspcc);
+	kfree(ras2_ctx);
+}
+
+static struct ras2_mem_ctx *add_aux_device(char *name, int channel, u32 pxm_inst)
+{
+	struct ras2_mem_ctx *ras2_ctx;
+	struct ras2_sspcc *sspcc;
+	u32 comp_nid;
+	int id, rc;
+
+	comp_nid = pxm_to_node(pxm_inst);
+	if (comp_nid == NUMA_NO_NODE) {
+		pr_debug("Invalid NUMA node, channel=%d pxm_inst=%d\n", channel, pxm_inst);
+		return ERR_PTR(-EINVAL);
+	}
+
+	ras2_ctx = kzalloc(sizeof(*ras2_ctx), GFP_KERNEL);
+	if (!ras2_ctx)
+		return ERR_PTR(-ENOMEM);
+
+	ras2_ctx->sys_comp_nid = comp_nid;
+
+	rc = register_pcc_channel(ras2_ctx, channel);
+	if (rc < 0) {
+		pr_debug("Failed to register PCC channel=%d pxm_inst=%d rc=%d\n", channel,
+			 pxm_inst, rc);
+		goto ctx_free;
+	}
+
+	id = ida_alloc(&ras2_ida, GFP_KERNEL);
+	if (id < 0) {
+		rc = id;
+		goto pcc_free;
+	}
+
+	ras2_ctx->adev.id		= id;
+	ras2_ctx->adev.name		= name;
+	ras2_ctx->adev.dev.release	= ras2_release;
+	ras2_ctx->adev.dev.parent	= ras2_ctx->dev;
+
+	rc = auxiliary_device_init(&ras2_ctx->adev);
+	if (rc)
+		goto ida_free;
+
+	rc = auxiliary_device_add(&ras2_ctx->adev);
+	if (rc) {
+		auxiliary_device_uninit(&ras2_ctx->adev);
+		return ERR_PTR(rc);
+	}
+
+	return ras2_ctx;
+
+ida_free:
+	ida_free(&ras2_ida, id);
+pcc_free:
+	sspcc = ras2_ctx->sspcc;
+	pcc_mbox_free_channel(sspcc->pcc_chan);
+	kfree(sspcc);
+ctx_free:
+	kfree(ras2_ctx);
+
+	return ERR_PTR(rc);
+}
+
+static void remove_aux_device(struct ras2_mem_ctx *ras2_ctx)
+{
+	if (!ras2_ctx)
+		return;
+
+	auxiliary_device_delete(&ras2_ctx->adev);
+	auxiliary_device_uninit(&ras2_ctx->adev);
+}
+
+static int parse_ras2_table(struct acpi_table_ras2 *ras2_tab)
+{
+	struct acpi_ras2_pcc_desc *pcc_desc_list;
+	struct ras2_mem_ctx **pctx_list;
+	struct ras2_mem_ctx *ras2_ctx;
+	u16 tot_tbl_len;
+	u16 i;
+
+	if (ras2_tab->header.length < sizeof(*ras2_tab)) {
+		pr_warn(FW_WARN "ACPI RAS2 table present but broken (too short, size=%u)\n",
+			ras2_tab->header.length);
+		return -EINVAL;
+	}
+
+	if (!ras2_tab->num_pcc_descs || ras2_tab->num_pcc_descs > RAS2_MAX_NUM_PCC_DESCS) {
+		pr_warn(FW_WARN "No/Invalid number of PCC descs(%d) in ACPI RAS2 table\n",
+			ras2_tab->num_pcc_descs);
+		return -EINVAL;
+	}
+
+	tot_tbl_len = sizeof(*ras2_tab) + ras2_tab->num_pcc_descs * sizeof(*pcc_desc_list);
+	if (ras2_tab->header.length < tot_tbl_len) {
+		pr_warn(FW_WARN "RAS2 table too short for PCC descs (descs=%d size=%u)\n",
+			ras2_tab->num_pcc_descs, ras2_tab->header.length);
+		return -EINVAL;
+	}
+
+	pctx_list = kcalloc(ras2_tab->num_pcc_descs, sizeof(*pctx_list), GFP_KERNEL);
+	if (!pctx_list)
+		return -ENOMEM;
+
+	pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
+	for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
+		if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
+			continue;
+
+		ras2_ctx = add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_desc_list->channel_id,
+					  pcc_desc_list->instance);
+		if (IS_ERR(ras2_ctx)) {
+			pr_warn("Failed to add RAS2 auxiliary device rc=%ld\n", PTR_ERR(ras2_ctx));
+			for (; i > 0; i--) {
+				if (pctx_list[i - 1])
+					remove_aux_device(pctx_list[i - 1]);
+			}
+			kfree(pctx_list);
+			return PTR_ERR(ras2_ctx);
+		}
+		pctx_list[i] = ras2_ctx;
+	}
+	kfree(pctx_list);
+
+	return 0;
+}
+
+/**
+ * acpi_ras2_init - RAS2 driver initialization function.
+ *
+ * Extracts the ACPI RAS2 table and retrieves ID for the PCC channel subspace
+ * for communicating with the ACPI compliant HW platform. Driver adds an
+ * auxiliary device, which binds to the memory ACPI RAS2 driver, for each RAS2
+ * memory feature.
+ *
+ * Returns: none.
+ */
+void __init acpi_ras2_init(void)
+{
+	struct acpi_table_ras2 *ras2_tab;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_RAS2, 0, (struct acpi_table_header **)&ras2_tab);
+	if (ACPI_FAILURE(status)) {
+		pr_debug("Failed to get table, %s\n", acpi_format_exception(status));
+		return;
+	}
+
+	if (parse_ras2_table(ras2_tab))
+		pr_debug("Failed to parse RAS2 table\n");
+
+	acpi_put_table((struct acpi_table_header *)ras2_tab);
+}
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
new file mode 100644
index 000000000000..f4574e8e0a12
--- /dev/null
+++ b/include/acpi/ras2.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ACPI RAS2 (RAS Feature Table) methods.
+ *
+ * Copyright (c) 2024-2026 HiSilicon Limited
+ */
+
+#ifndef _ACPI_RAS2_H
+#define _ACPI_RAS2_H
+
+#include <linux/acpi.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct device;
+
+/*
+ * ACPI spec 6.5 Table 5.82: PCC command codes used by
+ * RAS2 platform communication channel.
+ */
+#define PCC_CMD_EXEC_RAS2 0x01
+
+#define RAS2_AUX_DEV_NAME "ras2"
+#define RAS2_MEM_DEV_ID_NAME "acpi_ras2_mem"
+
+/**
+ * struct ras2_mem_ctx - Context for RAS2 memory features
+ * @adev:		Auxiliary device object
+ * @comm_addr:		Pointer to RAS2 PCC shared memory region
+ * @dev:		Pointer to device backing struct mbox_controller for PCC
+ * @sspcc:		Pointer to local data structure for PCC communication
+ * @pcc_lock:		Pointer to PCC lock to provide mutually exclusive access
+ *			to PCC channel subspace
+ * @sys_comp_nid:	Node ID of the system component that the RAS feature
+ *			is associated with. See ACPI spec 6.5 Table 5.80: RAS2
+ *			Platform Communication Channel Descriptor format,
+ *			Field: Instance
+ */
+struct ras2_mem_ctx {
+	struct auxiliary_device		adev;
+	struct acpi_ras2_shmem __iomem	*comm_addr;
+	struct device			*dev;
+	void				*sspcc;
+	struct mutex			*pcc_lock;
+	u32				sys_comp_nid;
+};
+
+#ifdef CONFIG_ACPI_RAS2
+void __init acpi_ras2_init(void);
+int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd);
+#else
+static inline void acpi_ras2_init(void) { }
+#endif
+
+#endif /* _ACPI_RAS2_H */
-- 
2.43.0
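
The MPAR handling in ras2_send_pcc_cmd() above can be followed more easily in
a userspace model. This is a sketch under stated assumptions: the toy_* names
are hypothetical, time is passed in as a millisecond value instead of read
with ktime_get(), and a last-reset time of 0 means "never reset", as in the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy copy of the MPAR bookkeeping kept in struct ras2_sspcc. */
struct toy_mpar {
	unsigned int limit;	/* commands per minute, 0 means no limit */
	int count;		/* commands left in the current window */
	int64_t last_reset_ms;	/* start of the current window, 0 = never */
};

/*
 * Decide whether a command may be sent at time now_ms.
 * Returns 0 if it may be sent, -EIO if the per-minute budget is used up.
 */
static int toy_mpar_admit(struct toy_mpar *m, int64_t now_ms)
{
	/* 0 is reserved as the "never reset" marker. */
	assert(now_ms > 0);

	if (!m->limit)
		return 0;

	if (!m->count) {
		/* Budget exhausted: refuse until 60s after the last reset. */
		if (m->last_reset_ms && now_ms - m->last_reset_ms < 60 * 1000)
			return -EIO;
		/* Open a new window with a full budget. */
		m->last_reset_ms = now_ms;
		m->count = (int)m->limit;
	}
	m->count--;
	return 0;
}
```

With a limit of 2, two commands are admitted, a third within the same 60s
window is refused, and a command arriving 60s after the window started opens
a new window.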



* [PATCH v19 0/2] ACPI: Add support for ACPI RAS2 feature table
From: shiju.jose @ 2026-04-08 17:28 UTC (permalink / raw)
  To: rafael, bp, akpm, rppt, dferguson, linux-edac, linux-acpi,
	linux-mm, linux-doc, tony.luck, lenb, leo.duran, Yazen.Ghannam,
	mchehab
  Cc: jonathan.cameron, linuxarm, rientjes, jiaqiyan, Jon.Grimm,
	dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang, shiju.jose, shijujose2008

From: Shiju Jose <shiju.jose@huawei.com>

Add support for the ACPI RAS2 feature table (RAS2), defined in the
ACPI 6.5 specification, section 5.2.21, and for the RAS2 HW-based memory
scrubbing feature.

ACPI RAS2 patches were part of the EDAC series [1].

The code is based on linux.git v7.0-rc7 [2].

1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
2. https://github.com/torvalds/linux.git

Changes
=======
v18 -> v19:
1. Fixed issues reported by the gemini tool and sent by Borislav. Thanks.
https://sashiko.dev/#/patchset/20260325165714.294-1-shiju.jose%40huawei.com
 - Use iowriteX() and ioreadX() for accessing fields in the RAS2 shared memory
   tables throughout the patches, to account for big-endian architectures.
 - In ras2_send_pcc_cmd(), added an extra check for non-zero last_mpar_reset,
   changed time_delta to s64, and added lockdep_assert_held().
 - In register_pcc_channel(), handled the case where pcc_chan->latency is 0
   and fixed a timeout of 0 being passed to readw_relaxed_poll_timeout().
 - Fixed a double-free case when auxiliary_device_add() fails and the driver
   calls auxiliary_device_uninit(&ras2_ctx->adev).
 - In parse_ras2_table(), added a check that the table length is large enough
   to contain the num_pcc_descs elements it iterates over.
 - Acquire pcc_lock in some places where it was missing, such as
   ras2_hw_scrub_read_addr() and ras2_hw_scrub_read_size().
 - Removed clearing of base and size in ras2_scrub_monitor_thread() when demand
   scrubbing has finished, to avoid clearing user-set values, though the
   chance of that is very small.
 - Added a new field, set_scrub_cycle, to ras2_ctx so that the user-set value
   is not cleared when ras2_update_patrol_scrub_params_cache() is called.
 - Redesigned ras2_hw_scrub_set_enabled_od() to avoid prematurely restarting
   the background scrub due to a race condition in ras2_scrub_monitor_thread().
 - Renamed ras2_probe() to ras2_mem_drv_probe().
 - Added ras2_mem_drv_remove(), which calls kthread_stop() to stop the
   ras2_scrub_monitor_thread(). Unregistering the EDAC device registered in
   ras2_mem_drv_probe() happens automatically in EDAC via
   devm_add_action_or_reset() in edac_dev_register(), edac_dev_unreg() and
   edac_dev_release().
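
Two of the v19 items above, the zero-latency fallback and the table length
check, can be sketched in userspace as follows. This is illustrative only:
the toy_* names are hypothetical, header and descriptor sizes are passed in
as parameters instead of being taken from the ACPI structures, and the
constants mirror PCC_NUM_RETRIES, PCC_CHNL_DEFAULT_LATENCY and
RAS2_MAX_NUM_PCC_DESCS from patch 1/2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_RETRIES		600ULL	/* mirrors PCC_NUM_RETRIES */
#define TOY_DEFAULT_LATENCY	1000u	/* mirrors PCC_CHNL_DEFAULT_LATENCY */
#define TOY_MAX_DESCS		100u	/* mirrors RAS2_MAX_NUM_PCC_DESCS */

/*
 * Poll deadline in microseconds. A channel latency of 0 falls back to a
 * default so that the poll timeout is never 0.
 */
static unsigned long long toy_deadline_us(unsigned int latency_us)
{
	if (!latency_us)
		latency_us = TOY_DEFAULT_LATENCY;
	return TOY_NUM_RETRIES * latency_us;
}

/*
 * Check that a table of table_len bytes can hold a header of hdr_size
 * bytes followed by num_descs descriptors of desc_size bytes each.
 */
static bool toy_table_len_ok(uint32_t table_len, size_t hdr_size,
			     size_t desc_size, uint16_t num_descs)
{
	size_t need;

	assert(desc_size > 0);

	if (table_len < hdr_size)
		return false;
	if (num_descs == 0 || num_descs > TOY_MAX_DESCS)
		return false;

	/* num_descs is capped above, so this sum stays small. */
	need = hdr_size + (size_t)num_descs * desc_size;
	return table_len >= need;
}
```

The descriptor count is bounded before the multiplication, so the required
size is computed from a value that has already been validated.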

v17 -> v18:
1. Fixed a few AI-tool-reported issues shared by Borislav. Thanks.
https://lore.kernel.org/all/20260312165247.GSabLvX5DjzhDtmyuh@fat_crate.local/
2. Re-added support for the user setting the scrub address range, following
   Daniel's reply on v16. It was removed in v13 after a request to simplify
   the code, with the expectation that the firmware would do full-node demand
   scrubbing and that these attributes might be enabled later in follow-up
   patches.
   https://lore.kernel.org/all/df5fe0ed-3483-4ac5-8096-447e4e560816@os.amperecomputing.com/

v16 -> v17:
1. Merged all changes suggested by Borislav.
https://lore.kernel.org/all/20260126171552.GJaXehSJp33nFnpvVd@fat_crate.local/
2. Changes for Borislav's feedback "Add remove_aux_device() which unwinds everything
   add_aux_device() does for all those devices".

v15 -> v16:
Attempted to address the comments below from Borislav throughout the code and logs.
Thanks for the comments.
https://lore.kernel.org/all/20251125073627.GLaSVce7hBqGH1a3ni@fat_crate.local/
https://lore.kernel.org/all/20251231131512.GBaVUh4NSWqvr2xhbM@fat_crate.local/
https://lore.kernel.org/all/20260119111701.GBaW4Sres045xnfkpz@fat_crate.local/

v14 -> v15:
1. Incorporated new changes suggested by Borislav on v13.
   https://lore.kernel.org/all/20251231131512.GBaVUh4NSWqvr2xhbM@fat_crate.local/

2. Rebase to v6.19-rc5.

v13 -> v14:
1. Modifications for changes wanted by Borislav.
   https://lore.kernel.org/all/20251125073627.GLaSVce7hBqGH1a3ni@fat_crate.local/

2. Changes for the comments from Randy Dunlap 
   https://lore.kernel.org/all/4807417b-a8f7-47a3-b38a-94ea7bdbf775@infradead.org/
   https://lore.kernel.org/all/af7b6cdc-c0a7-4896-ba6b-6bb933898d37@infradead.org/
   https://lore.kernel.org/all/26083ba9-1979-4d14-8465-3f54f2f96d23@infradead.org/

v12 -> v13:
1. Fixed some bugs reported and changes wanted by Borislav.
   https://lore.kernel.org/all/20250910192707.GAaMHRCxWx37XitN3t@fat_crate.local/ 

2. Tried modifying the patch header as commented by Borislav.

3. Fixed a bug reported by Yazen.
   https://lore.kernel.org/all/20250909162434.GB11602@yaz-khff2.amd.com/

4. Changed setting 'Requested Address Range' for GET_PATROL_PARAMETERS
   command to meet the requirements from Daniel for Ampere Computing
   platform. 
   https://lore.kernel.org/all/7a211c5c-174c-438b-9a98-fd47b057ea4a@os.amperecomputing.com/

5. In RAS2 driver, removed support for scrub control attributes 'addr' and
   'size' for the time being with the expectation that a firmware will do
   the full node demand scrubbing and may enable these attributes in the
   future.

6. Add 'enable_demand' attribute to the EDAC scrub interface to start/stop
   the demand scrub, which is used for the RAS2 demand scrub control.

v11 -> v12:
1. Modified logic for finding the lowest contiguous phy memory addr range for
NUMA domain using node_start_pfn() and node_spanned_pages() according to the
feedback from Mike Rapoport in v11.
https://lore.kernel.org/all/aKsIlFTkBsAF5sqD@kernel.org/

2. Rebase to 6.17-rc4.

v10 -> v11:
1. Simplified code by removing workarounds previously added to support
   non-compliant case of single PCC channel shared across all proximity
   domains (which is no longer required). 
   https://lore.kernel.org/all/f5b28977-0b80-4c39-929b-cf02ab1efb97@os.amperecomputing.com/

2. Fix for the comments from Borislav (Thanks).
   https://lore.kernel.org/all/20250811152805.GQaJoMBecC4DSDtTAu@fat_crate.local/

3. Rebase to 6.17-rc1.

v9 -> v10:
1. Use pcc_chan->shmem instead of
   acpi_os_ioremap(pcc_chan->shmem_base_addr,...), as the PCC driver already
   maps it internally with acpi_os_ioremap() to pcc_chan->shmem.

2. Changes required for the Ampere Computing system, which uses a single
   PCC channel for RAS2 memory features across all NUMA domains. Based on the
   requirements from Daniel on v9
   https://lore.kernel.org/all/547ed8fb-d6b7-4b6b-a38b-bf13223971b1@os.amperecomputing.com/
   and discussion with Jonathan.
2.1 Add node_to_range lookup facility to numa_memblks. This is to retrieve the lowest
    physical continuous memory range of the memory associated with a NUMA domain.
2.2. Set the requested addr range to the memory region's base addr and size
   when sending the RAS2 cmd GET_PATROL_PARAMETERS
   in functions ras2_update_patrol_scrub_params_cache() &
   ras2_get_patrol_scrub_running().
2.3. Split struct ras2_mem_ctx into struct ras2_mem_ctx_hdr and struct ras2_pxm_domain
   to support both cases: a single PCC channel for RAS2 scrubbers across all NUMA
   domains, and a PCC channel per RAS2 scrub instance, given that the ACPI spec defines
   a single memory scrubber per NUMA domain.
2.4. EDAC feature sysfs folder for RAS2 changed from "acpi_ras_memX" to "acpi_ras_mem_idX"
   because memory scrub instances across all NUMA domains would be presented under
   "acpi_ras_mem_id0" when a system uses a single PCC channel for RAS2 scrubbers across
   all NUMA domains etc.
2.5. Removed Acked-by: Rafael from patch [2], because of the several above changes from v9.

v8 -> v9:
1. Added following changes for feedback from Yazen.
 1.1 In ras2_check_pcc_chan(..) function
    - u32 variables moved to the same line.
    - Updated error log for readw_relaxed_poll_timeout()
    - Added error log for if (status & PCC_STATUS_ERROR), error condition.
    - Removed an impossible condition check.
  1.2. Added guard for ras2_pc_list_lock in ras2_get_pcc_subspace().

2. Rebased to linux.git v6.16-rc2 [2].

v7 -> v8:
1. Rebased to linux.git v6.16-rc1 [2].

v6 -> v7:
1. Fix for the issue reported by Daniel:
   in ras2_check_pcc_chan(), read, clear and check RAS2 set_cap_status outside
   the if (status & PCC_STATUS_ERROR) check.
   https://lore.kernel.org/all/51bcb52c-4132-4daf-8903-29b121c485a1@os.amperecomputing.com/

v5 -> v6:
1. Fix for the issue reported by Daniel with starting scrubbing with a correct addr and size
   after the firmware returns an INVALID DATA error for a scrub request with an invalid addr or size.
   https://lore.kernel.org/all/8cdf7885-31b3-4308-8a7c-f4e427486429@os.amperecomputing.com/

v4 -> v5:
1. Fix for the build warnings reported by kernel test robot.
   https://patchwork.kernel.org/project/linux-edac/patch/20250423163511.1412-3-shiju.jose@huawei.com/
2. Removed patch "ACPI: ACPI 6.5: RAS2: Rename RAS2 table structure and field names"
   from the series as the patch was merged to linux-pm.git : branch linux-next
3. Rebased to ras.git: edac-for-next branch merged with linux-pm.git : linux-next branch.

v3 -> v4:
1.  Changes for feedback from Yazen on v3.
    https://lore.kernel.org/all/20250415210504.GA854098@yaz-khff2.amd.com/

v2 -> v3:
1. Rename RAS2 table structure and field names in
   include/acpi/actbl2.h, limited to those necessary
   for the RAS2 scrub feature.
2. Changes for feedback from Jonathan on v2.
3. Daniel reported a known behaviour: reading back the 'size' attribute after
   setting it returns 0 until scrubbing is started via the 'addr' attribute.
   Changes added to fix this.
4. Daniel reported that firmware cannot update the status of demand scrubbing
   via the 'Actual Address Range (OUTPUT)', so added a workaround in the
   kernel to update the sysfs 'addr' attribute with the status of demand
   scrubbing.
5. Optimized logic in ras2_check_pcc_chan() function
   (patch - ACPI:RAS2: Add ACPI RAS2 driver).
6. Add a PCC channel lock to struct ras2_pcc_subspace and change the
   lock in ras2_mem_ctx to a pointer to the PCC channel lock to make sure
   writing to PCC subspace shared memory is protected from race conditions.

v1 -> v2:
1.  Changes for feedback from Borislav.
    - Shorten ACPI RAS2 structures and variables names.
    - Shorten some of the other variables in the RAS2 drivers.
    - Fixed a few CamelCase names.

2.  Changes for feedback from Yazen.
    - Added a newline after a number of '}' and return statements.
    - Changed the return type of ras2_add_aux_device() to 'int'.
    - Deleted a duplicate acpi_get_table("RAS2",...) call in ras2_acpi_parse_table().
    - Added "FW_WARN" to a few error logs in ras2_acpi_parse_table().
    - Renamed ras2_acpi_init() to acpi_ras2_init() and modified to call acpi_ras2_init()
      function from the acpi_init().
    - Moved scrub-related variables of struct ras2_mem_ctx from patch
      "ACPI:RAS2: Add ACPI RAS2 driver" to "ras: mem: Add memory ACPI RAS2 driver".

Shiju Jose (2):
  ACPI:RAS2: Add driver for the ACPI RAS2 feature table
  ras: mem: Add ACPI RAS2 memory driver

 Documentation/ABI/testing/sysfs-edac-scrub |  13 +-
 Documentation/edac/scrub.rst               |  70 +++
 drivers/acpi/Kconfig                       |  11 +
 drivers/acpi/Makefile                      |   1 +
 drivers/acpi/bus.c                         |   3 +
 drivers/acpi/ras2.c                        | 441 +++++++++++++++++
 drivers/edac/scrub.c                       |  12 +
 drivers/ras/Kconfig                        |  13 +
 drivers/ras/Makefile                       |   1 +
 drivers/ras/acpi_ras2.c                    | 540 +++++++++++++++++++++
 include/acpi/ras2.h                        |  84 ++++
 include/linux/edac.h                       |   4 +
 12 files changed, 1188 insertions(+), 5 deletions(-)
 create mode 100644 drivers/acpi/ras2.c
 create mode 100644 drivers/ras/acpi_ras2.c
 create mode 100644 include/acpi/ras2.h

-- 
2.43.0

^ permalink raw reply

* [PATCH v19 2/2] ras: mem: Add ACPI RAS2 memory driver
From: shiju.jose @ 2026-04-08 17:28 UTC (permalink / raw)
  To: rafael, bp, akpm, rppt, dferguson, linux-edac, linux-acpi,
	linux-mm, linux-doc, tony.luck, lenb, leo.duran, Yazen.Ghannam,
	mchehab
  Cc: jonathan.cameron, linuxarm, rientjes, jiaqiyan, Jon.Grimm,
	dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang, shiju.jose, shijujose2008
In-Reply-To: <20260408172850.183-1-shiju.jose@huawei.com>

From: Shiju Jose <shiju.jose@huawei.com>

The ACPI 6.5 specification, section 5.2.21, defines the RAS2 feature table
(RAS2). The driver adds support for the RAS2 feature table, which provides
interfaces for platform RAS features, e.g., HW-based memory scrubbing and
a logical to PA translation service. RAS2 uses a PCC channel subspace to
communicate with the ACPI-compliant HW platform.

The ACPI RAS2 auxiliary driver for the memory features binds to the
auxiliary device, which is added by the RAS2 table parser in the ACPI RAS2
driver.

In the presence of disjoint address ranges, the address range provided to
userspace may be the lowest scrub address range, skipping address ranges
that belong to other NUMA nodes but happen to lie within this range.

The driver retrieves the PA range of the NUMA domain and uses it as the
'Requested Address Range' when sending the RAS2 command
GET_PATROL_PARAMETERS to get parameters that apply to all addresses in the
NUMA domain.

A device with the ACPI RAS2 scrub feature registers with the EDAC device
driver, which retrieves the scrub descriptor from EDAC scrub and exposes
the scrub control attributes for the RAS2 scrub instance to userspace in
/sys/bus/edac/devices/acpi_ras_memX/scrub0/.

Add an 'enable_demand' attribute to the EDAC scrub interface to start/stop
the demand scrub, which is used in the RAS2 demand scrub control.
When a demand scrub is started, any background scrub currently in progress
is stopped and then automatically restarted from the beginning once the
demand scrub has completed.

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 Documentation/ABI/testing/sysfs-edac-scrub |  13 +-
 Documentation/edac/scrub.rst               |  70 +++
 drivers/edac/scrub.c                       |  12 +
 drivers/ras/Kconfig                        |  13 +
 drivers/ras/Makefile                       |   1 +
 drivers/ras/acpi_ras2.c                    | 540 +++++++++++++++++++++
 include/acpi/ras2.h                        |  27 ++
 include/linux/edac.h                       |   4 +
 8 files changed, 675 insertions(+), 5 deletions(-)
 create mode 100644 drivers/ras/acpi_ras2.c
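
As a reading aid for the scrub cycle handling in this patch, here is a minimal
userspace sketch of how the packed 'scrub_params_out' value is decoded into
seconds and how a cycle written in seconds is converted back to hours and range
checked. The bit positions and the 3600 factor are taken from the
RAS2_PS_*_OUT_MASK and RAS2_HOUR_IN_SECS definitions in the diff below; all
demo_* and DEMO_* names are hypothetical and this is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions as in the patch: hours are packed one byte per field. */
#define DEMO_SC_HRS_OUT_SHIFT		0	/* bits 7:0, current cycle */
#define DEMO_MIN_SC_HRS_OUT_SHIFT	8	/* bits 15:8, minimum cycle */
#define DEMO_MAX_SC_HRS_OUT_SHIFT	16	/* bits 23:16, maximum cycle */
#define DEMO_FIELD_MASK			0xffu
#define DEMO_HOUR_IN_SECS		3600u

struct demo_scrub_cycles {
	uint32_t cur_secs;
	uint32_t min_secs;
	uint32_t max_secs;
};

/* Extract one 8-bit hours field and convert it to seconds. */
static uint32_t demo_field_secs(uint32_t reg, unsigned int shift)
{
	return ((reg >> shift) & DEMO_FIELD_MASK) * DEMO_HOUR_IN_SECS;
}

static struct demo_scrub_cycles demo_decode_params_out(uint32_t params_out)
{
	struct demo_scrub_cycles c = {
		.cur_secs = demo_field_secs(params_out, DEMO_SC_HRS_OUT_SHIFT),
		.min_secs = demo_field_secs(params_out, DEMO_MIN_SC_HRS_OUT_SHIFT),
		.max_secs = demo_field_secs(params_out, DEMO_MAX_SC_HRS_OUT_SHIFT),
	};

	return c;
}

/*
 * Convert a requested cycle in seconds to whole hours (truncating) and
 * check it against the supported range. Returns 0 on success, -1 if the
 * result is out of range.
 */
static int demo_cycle_secs_to_hrs(uint32_t secs, uint32_t min_hrs,
				  uint32_t max_hrs, uint32_t *hrs)
{
	uint32_t h = secs / DEMO_HOUR_IN_SECS;

	assert(hrs);
	if (h < min_hrs || h > max_hrs)
		return -1;

	*hrs = h;
	return 0;
}
```

With a params_out value of 0x18010A, this yields the 36000, 3600 and 86400
second values used in the scrub.rst examples. Because the conversion truncates
to whole hours, a request of 5400 seconds is treated as one hour, and anything
below 3600 seconds is rejected when the minimum is one hour.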

diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
index ab6014743da5..3f68f63556f4 100644
--- a/Documentation/ABI/testing/sysfs-edac-scrub
+++ b/Documentation/ABI/testing/sysfs-edac-scrub
@@ -20,11 +20,7 @@ KernelVersion:	6.15
 Contact:	linux-edac@vger.kernel.org
 Description:
 		(RW) The base address of the memory region to be scrubbed
-		for on-demand scrubbing. Setting address starts scrubbing.
-		The size must be set before that.
-
-		The readback addr value is non-zero if the requested
-		on-demand scrubbing is in progress, zero otherwise.
+		for demand scrubbing.
 
 What:		/sys/bus/edac/devices/<dev-name>/scrubX/size
 Date:		March 2025
@@ -34,6 +30,13 @@ Description:
 		(RW) The size of the memory region to be scrubbed
 		(on-demand scrubbing).
 
+What:		/sys/bus/edac/devices/<dev-name>/scrubX/enable_demand
+Date:		Jan 2026
+KernelVersion:	6.19
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Start/Stop demand scrubbing if supported.
+
 What:		/sys/bus/edac/devices/<dev-name>/scrubX/enable_background
 Date:		March 2025
 KernelVersion:	6.15
diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index 2cfa74fa1ffd..562bfd6ff630 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -340,3 +340,73 @@ controller or platform when unexpectedly high error rates are detected.
 
 Sysfs files for scrubbing are documented in
 `Documentation/ABI/testing/sysfs-edac-ecs`
+
+3. ACPI RAS2 Hardware-based Memory Scrubbing
+
+3.1. Demand scrubbing for a specific memory region.
+
+3.1.1. Query the status of demand scrubbing
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_demand
+
+0
+
+3.1.2. Query the device's default/current scrub cycle setting.
+
+Applicable to both demand and background scrubbing. The unit of the
+scrub cycle is seconds.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+36000
+
+3.1.3. Query the range of device supported scrub cycle for a memory region.
+The unit of the scrub cycle range is seconds.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/min_cycle_duration
+
+3600
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/max_cycle_duration
+
+86400
+
+3.1.4. Program scrubbing for the memory region in RAS2 device to repeat every
+43200 seconds (half a day).
+
+# echo 43200 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+3.1.5. Set address range.
+
+Set 'addr' of the memory region to scrub.
+
+# echo 0x80000000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+Set 'size' of the memory region to scrub.
+
+# echo 0x200000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/size
+
+3.1.6. Start 'demand scrubbing'.
+
+When a demand scrub is started, any background scrub currently in progress
+will be stopped and then automatically restarted at the beginning when the
+demand scrub has completed.
+
+# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_demand
+
+3.2. Background scrubbing the entire memory
+
+3.2.1. Query the status of background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
+
+0
+
+3.2.2. Program background scrubbing for the RAS2 device to repeat every 21600
+seconds (quarter of a day).
+
+# echo 21600 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+3.2.3. Start 'background scrubbing'.
+
+# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
diff --git a/drivers/edac/scrub.c b/drivers/edac/scrub.c
index f9d02af2fc3a..f3b9a2f04950 100644
--- a/drivers/edac/scrub.c
+++ b/drivers/edac/scrub.c
@@ -14,6 +14,7 @@ enum edac_scrub_attributes {
 	SCRUB_ADDRESS,
 	SCRUB_SIZE,
 	SCRUB_ENABLE_BACKGROUND,
+	SCRUB_ENABLE_DEMAND,
 	SCRUB_MIN_CYCLE_DURATION,
 	SCRUB_MAX_CYCLE_DURATION,
 	SCRUB_CUR_CYCLE_DURATION,
@@ -55,6 +56,7 @@ static ssize_t attrib##_show(struct device *ras_feat_dev,			\
 EDAC_SCRUB_ATTR_SHOW(addr, read_addr, u64, "0x%llx\n")
 EDAC_SCRUB_ATTR_SHOW(size, read_size, u64, "0x%llx\n")
 EDAC_SCRUB_ATTR_SHOW(enable_background, get_enabled_bg, bool, "%u\n")
+EDAC_SCRUB_ATTR_SHOW(enable_demand, get_enabled_od, bool, "%u\n")
 EDAC_SCRUB_ATTR_SHOW(min_cycle_duration, get_min_cycle, u32, "%u\n")
 EDAC_SCRUB_ATTR_SHOW(max_cycle_duration, get_max_cycle, u32, "%u\n")
 EDAC_SCRUB_ATTR_SHOW(current_cycle_duration, get_cycle_duration, u32, "%u\n")
@@ -84,6 +86,7 @@ static ssize_t attrib##_store(struct device *ras_feat_dev,			\
 EDAC_SCRUB_ATTR_STORE(addr, write_addr, u64, kstrtou64)
 EDAC_SCRUB_ATTR_STORE(size, write_size, u64, kstrtou64)
 EDAC_SCRUB_ATTR_STORE(enable_background, set_enabled_bg, unsigned long, kstrtoul)
+EDAC_SCRUB_ATTR_STORE(enable_demand, set_enabled_od, unsigned long, kstrtoul)
 EDAC_SCRUB_ATTR_STORE(current_cycle_duration, set_cycle_duration, unsigned long, kstrtoul)
 
 static umode_t scrub_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id)
@@ -119,6 +122,14 @@ static umode_t scrub_attr_visible(struct kobject *kobj, struct attribute *a, int
 				return 0444;
 		}
 		break;
+	case SCRUB_ENABLE_DEMAND:
+		if (ops->get_enabled_od) {
+			if (ops->set_enabled_od)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
 	case SCRUB_MIN_CYCLE_DURATION:
 		if (ops->get_min_cycle)
 			return a->mode;
@@ -164,6 +175,7 @@ static int scrub_create_desc(struct device *scrub_dev,
 		[SCRUB_ADDRESS] = EDAC_SCRUB_ATTR_RW(addr, instance),
 		[SCRUB_SIZE] = EDAC_SCRUB_ATTR_RW(size, instance),
 		[SCRUB_ENABLE_BACKGROUND] = EDAC_SCRUB_ATTR_RW(enable_background, instance),
+		[SCRUB_ENABLE_DEMAND] = EDAC_SCRUB_ATTR_RW(enable_demand, instance),
 		[SCRUB_MIN_CYCLE_DURATION] = EDAC_SCRUB_ATTR_RO(min_cycle_duration, instance),
 		[SCRUB_MAX_CYCLE_DURATION] = EDAC_SCRUB_ATTR_RO(max_cycle_duration, instance),
 		[SCRUB_CUR_CYCLE_DURATION] = EDAC_SCRUB_ATTR_RW(current_cycle_duration, instance)
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..a1e6aed8bcc8 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -46,4 +46,17 @@ config RAS_FMPM
 	  Memory will be retired during boot time and run time depending on
 	  platform-specific policies.
 
+config MEM_ACPI_RAS2
+	tristate "Memory ACPI RAS2 driver"
+	depends on ACPI_RAS2
+	depends on EDAC
+	depends on EDAC_SCRUB
+	select NUMA_KEEP_MEMINFO
+	help
+	  The driver binds to the auxiliary device added by the ACPI RAS2
+	  feature table parser. The driver uses a PCC channel subspace to
+	  communicate with the ACPI-compliant platform and provides
+	  control of the HW-based memory scrubber parameters to the user
+	  through the EDAC scrub interface.
+
 endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..a0e6e903d6b0 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_RAS)	+= ras.o
 obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_RAS_CEC)	+= cec.o
+obj-$(CONFIG_MEM_ACPI_RAS2)	+= acpi_ras2.o
 
 obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
 obj-y			+= amd/atl/
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
new file mode 100644
index 000000000000..129503fdd0bb
--- /dev/null
+++ b/drivers/ras/acpi_ras2.c
@@ -0,0 +1,540 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2024-2026 HiSilicon Limited.
+ *
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"ACPI RAS2 MEMORY: " fmt
+
+#include <linux/bitfield.h>
+#include <linux/delay.h>
+#include <linux/edac.h>
+#include <linux/kthread.h>
+#include <linux/platform_device.h>
+#include <acpi/ras2.h>
+
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB 0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS 0x01
+#define RAS2_START_PATROL_SCRUBBER 0x02
+#define RAS2_STOP_PATROL_SCRUBBER 0x03
+
+/*
+ * RAS2 patrol scrub
+ */
+#define RAS2_PS_SC_HRS_IN_MASK GENMASK(15, 8)
+#define RAS2_PS_EN_BACKGROUND BIT(0)
+#define RAS2_PS_SC_HRS_OUT_MASK GENMASK(7, 0)
+#define RAS2_PS_MIN_SC_HRS_OUT_MASK GENMASK(15, 8)
+#define RAS2_PS_MAX_SC_HRS_OUT_MASK GENMASK(23, 16)
+#define RAS2_PS_FLAG_SCRUB_RUNNING BIT(0)
+
+#define RAS2_SCRUB_NAME_LEN 128
+#define RAS2_HOUR_IN_SECS 3600
+
+struct acpi_ras2_ps_shared_mem {
+	struct acpi_ras2_shmem common;
+	struct acpi_ras2_patrol_scrub_param params;
+};
+
+#define TO_ACPI_RAS2_PS_SHMEM(_addr) \
+	container_of(_addr, struct acpi_ras2_ps_shared_mem, common)
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable);
+
+static int ras2_is_patrol_scrub_support(struct ras2_mem_ctx *ras2_ctx)
+{
+	struct acpi_ras2_shmem __iomem *common = (void *)ras2_ctx->comm_addr;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	common->set_caps[0] = 0;
+
+	return common->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_update_patrol_scrub_params_cache(struct ras2_mem_ctx *ras2_ctx)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	u32 scrub_params_out;
+	int ret;
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	iowrite16(RAS2_GET_PATROL_PARAMETERS, &ps_sm->params.command);
+	iowrite64(ras2_ctx->mem_base, &ps_sm->params.req_addr_range[0]);
+	iowrite64(ras2_ctx->mem_size, &ps_sm->params.req_addr_range[1]);
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "Failed to read patrol scrub parameters\n");
+		return ret;
+	}
+
+	scrub_params_out = ioread32(&ps_sm->params.scrub_params_out);
+	ras2_ctx->min_scrub_cycle = FIELD_GET(RAS2_PS_MIN_SC_HRS_OUT_MASK,
+					      scrub_params_out);
+	ras2_ctx->max_scrub_cycle = FIELD_GET(RAS2_PS_MAX_SC_HRS_OUT_MASK,
+					      scrub_params_out);
+	ras2_ctx->scrub_cycle_hrs = FIELD_GET(RAS2_PS_SC_HRS_OUT_MASK,
+					      scrub_params_out);
+	if (ras2_ctx->bg_scrub) {
+		ras2_ctx->od_scrub = false;
+		ras2_ctx->base = 0;
+		ras2_ctx->size = 0;
+		return 0;
+	}
+
+	if (ioread32(&ps_sm->params.flags) & RAS2_PS_FLAG_SCRUB_RUNNING) {
+		ras2_ctx->od_scrub = true;
+		ras2_ctx->base = ioread64(&ps_sm->params.actl_addr_range[0]);
+		ras2_ctx->size = ioread64(&ps_sm->params.actl_addr_range[1]);
+	} else {
+		ras2_ctx->od_scrub = false;
+	}
+
+	return 0;
+}
+
+/* Context - PCC lock must be held */
+static int ras2_get_demand_scrub_running(struct ras2_mem_ctx *ras2_ctx, bool *running)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	int ret;
+
+	if (!ras2_ctx->od_scrub) {
+		*running = false;
+		return 0;
+	}
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	iowrite16(RAS2_GET_PATROL_PARAMETERS, &ps_sm->params.command);
+	iowrite64(ras2_ctx->mem_base, &ps_sm->params.req_addr_range[0]);
+	iowrite64(ras2_ctx->mem_size, &ps_sm->params.req_addr_range[1]);
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "Failed to read patrol scrub parameters\n");
+		return ret;
+	}
+
+	*running = ioread32(&ps_sm->params.flags) & RAS2_PS_FLAG_SCRUB_RUNNING;
+	if (!(*running))
+		ras2_ctx->od_scrub = false;
+
+	return 0;
+}
+
+static int ras2_scrub_monitor_thread(void *p)
+{
+	struct ras2_mem_ctx *ras2_ctx = (struct ras2_mem_ctx *)p;
+	bool running;
+	int ret;
+
+	while (!kthread_should_stop()) {
+		if (!ras2_ctx->reenable_bg_scrub)
+			return 0;
+
+		mutex_lock(ras2_ctx->pcc_lock);
+		ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+		mutex_unlock(ras2_ctx->pcc_lock);
+		if (ret)
+			return ret;
+
+		if (!running)
+			return ras2_hw_scrub_set_enabled_bg(ras2_ctx->dev, ras2_ctx, true);
+
+		msleep(1000);
+	}
+
+	ras2_ctx->thread = NULL;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data, u32 *min)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*min = ras2_ctx->min_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data, u32 *max)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*max = ras2_ctx->max_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_cycle_read(struct device *dev, void *drv_data, u32 *scrub_cycle_secs)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*scrub_cycle_secs = ras2_ctx->scrub_cycle_hrs * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_cycle_write(struct device *dev, void *drv_data, u32 scrub_cycle_secs)
+{
+	u32 scrub_cycle_hrs = scrub_cycle_secs / RAS2_HOUR_IN_SECS;
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	if (scrub_cycle_hrs < ras2_ctx->min_scrub_cycle ||
+	    scrub_cycle_hrs > ras2_ctx->max_scrub_cycle)
+		return -EINVAL;
+
+	ras2_ctx->set_scrub_cycle = scrub_cycle_hrs;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_addr(struct device *dev, void *drv_data, u64 *base)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	int ret;
+
+	/*
+	 * When BG scrubbing is enabled the actual address range is not valid.
+	 * Return -EBUSY for now, until a method to retrieve the actual full PA range is found.
+	 */
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	*base = ras2_ctx->base;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	int ret;
+
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	*size = ras2_ctx->size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	ras2_ctx->base = base;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_write_size(struct device *dev, void *drv_data, u64 size)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	if (!size)
+		return -EINVAL;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	ras2_ctx->size = size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*enabled = ras2_ctx->bg_scrub;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	u32 scrub_params_in;
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	if (enable) {
+		if (ras2_ctx->bg_scrub || running)
+			return -EBUSY;
+
+		iowrite64(0, &ps_sm->params.req_addr_range[0]);
+		iowrite64(0, &ps_sm->params.req_addr_range[1]);
+		scrub_params_in = ioread32(&ps_sm->params.scrub_params_in);
+		scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+		scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK, ras2_ctx->set_scrub_cycle);
+		iowrite32(scrub_params_in, &ps_sm->params.scrub_params_in);
+		iowrite16(RAS2_START_PATROL_SCRUBBER, &ps_sm->params.command);
+	} else {
+		if (!ras2_ctx->bg_scrub)
+			return -EPERM;
+
+		iowrite16(RAS2_STOP_PATROL_SCRUBBER, &ps_sm->params.command);
+	}
+
+	scrub_params_in = ioread32(&ps_sm->params.scrub_params_in);
+	scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+	scrub_params_in |= FIELD_PREP(RAS2_PS_EN_BACKGROUND, enable);
+	iowrite32(scrub_params_in, &ps_sm->params.scrub_params_in);
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(dev, "Failed to %s background scrubbing\n",
+			str_enable_disable(enable));
+		return ret;
+	}
+
+	ras2_ctx->bg_scrub = enable;
+	if (enable) {
+		ras2_ctx->reenable_bg_scrub = false;
+		/* Update the cache to account for rounding of supplied parameters and similar */
+		return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	}
+
+	return 0;
+}
+
+static int ras2_hw_scrub_get_enabled_od(struct device *dev, void *drv_data, bool *enabled)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	*enabled = running;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_od(struct device *dev, void *drv_data, bool enable)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm = TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	u32 scrub_params_in;
+	bool running;
+	int ret;
+
+	if (!enable)
+		return -EOPNOTSUPP;
+
+	/* Stop any background scrub currently in progress */
+	if (ras2_ctx->bg_scrub) {
+		ret = ras2_hw_scrub_set_enabled_bg(dev, drv_data, false);
+		if (ret)
+			return ret;
+
+		ras2_ctx->reenable_bg_scrub = true;
+	}
+
+	mutex_lock(ras2_ctx->pcc_lock);
+	ret = ras2_get_demand_scrub_running(ras2_ctx, &running);
+	if (ret)
+		goto enable_bg_scrub;
+
+	if (running) {
+		ret = -EBUSY;
+		goto enable_bg_scrub;
+	}
+
+	/* May add more validity checks for the address range in the future if necessary */
+	if (!ras2_ctx->size || ras2_ctx->base < ras2_ctx->mem_base) {
+		dev_err(dev, "%s: Invalid address range, base=0x%llx size=0x%llx\n",
+			__func__, ras2_ctx->base, ras2_ctx->size);
+		ret = -ERANGE;
+		goto enable_bg_scrub;
+	}
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	scrub_params_in = ioread32(&ps_sm->params.scrub_params_in);
+	scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+	scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK, ras2_ctx->set_scrub_cycle);
+	scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+	iowrite32(scrub_params_in, &ps_sm->params.scrub_params_in);
+	iowrite64(ras2_ctx->base, &ps_sm->params.req_addr_range[0]);
+	iowrite64(ras2_ctx->size, &ps_sm->params.req_addr_range[1]);
+	iowrite16(RAS2_START_PATROL_SCRUBBER, &ps_sm->params.command);
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(dev, "Failed to start demand scrubbing rc(%d)\n", ret);
+		if (ret != -EBUSY) {
+			iowrite64(0, &ps_sm->params.req_addr_range[0]);
+			iowrite64(0, &ps_sm->params.req_addr_range[1]);
+			ras2_ctx->od_scrub = false;
+			ras2_ctx->base = 0;
+			ras2_ctx->size = 0;
+		}
+		goto enable_bg_scrub;
+	}
+
+	ras2_ctx->od_scrub = enable;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+
+	if (ras2_ctx->reenable_bg_scrub && !ras2_ctx->thread) {
+		ras2_ctx->thread = kthread_run(ras2_scrub_monitor_thread, ras2_ctx,
+					       "ras2_scrub_nid%d", ras2_ctx->sys_comp_nid);
+		if (IS_ERR(ras2_ctx->thread)) {
+			ret = PTR_ERR(ras2_ctx->thread);
+			ras2_ctx->thread = NULL;
+			goto enable_bg_scrub;
+		}
+	}
+	mutex_unlock(ras2_ctx->pcc_lock);
+
+	return ret;
+
+enable_bg_scrub:
+	mutex_unlock(ras2_ctx->pcc_lock);
+	if (ras2_ctx->reenable_bg_scrub) {
+		ras2_ctx->reenable_bg_scrub = false;
+		ras2_hw_scrub_set_enabled_bg(dev, drv_data, true);
+	}
+
+	return ret;
+}
+
+static const struct edac_scrub_ops ras2_scrub_ops = {
+	.read_addr = ras2_hw_scrub_read_addr,
+	.read_size = ras2_hw_scrub_read_size,
+	.write_addr = ras2_hw_scrub_write_addr,
+	.write_size = ras2_hw_scrub_write_size,
+	.get_enabled_bg = ras2_hw_scrub_get_enabled_bg,
+	.set_enabled_bg = ras2_hw_scrub_set_enabled_bg,
+	.get_enabled_od = ras2_hw_scrub_get_enabled_od,
+	.set_enabled_od = ras2_hw_scrub_set_enabled_od,
+	.get_min_cycle = ras2_hw_scrub_read_min_scrub_cycle,
+	.get_max_cycle = ras2_hw_scrub_read_max_scrub_cycle,
+	.get_cycle_duration = ras2_hw_scrub_cycle_read,
+	.set_cycle_duration = ras2_hw_scrub_cycle_write,
+};
+
+static void ras2_mem_drv_remove(struct auxiliary_device *auxdev)
+{
+	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+
+	if (ras2_ctx && ras2_ctx->thread) {
+		kthread_stop(ras2_ctx->thread);
+		ras2_ctx->thread = NULL;
+	}
+}
+
+static int ras2_mem_drv_probe(struct auxiliary_device *auxdev, const struct auxiliary_device_id *id)
+{
+	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+	struct edac_dev_feature ras_features;
+	char scrub_name[RAS2_SCRUB_NAME_LEN];
+	unsigned long start_pfn, size_pfn;
+	int ret;
+
+	if (!ras2_is_patrol_scrub_support(ras2_ctx))
+		return -EOPNOTSUPP;
+
+	/*
+	 * Retrieve the PA range of the NUMA domain and use it as the
+	 * 'Requested Address Range' when sending the RAS2 command
+	 * GET_PATROL_PARAMETERS to get parameters that apply to all addresses
+	 * in the NUMA domain, and when sending the START_PATROL_SCRUBBER
+	 * command to start demand scrubbing.
+	 */
+	start_pfn = node_start_pfn(ras2_ctx->sys_comp_nid);
+	size_pfn = node_spanned_pages(ras2_ctx->sys_comp_nid);
+	if (!size_pfn) {
+		pr_debug("Failed to find PA range of NUMA node(%u)\n", ras2_ctx->sys_comp_nid);
+		return -EPERM;
+	}
+
+	ras2_ctx->mem_base = __pfn_to_phys(start_pfn);
+	ras2_ctx->mem_size = __pfn_to_phys(size_pfn);
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	sprintf(scrub_name, "acpi_ras_mem%d", auxdev->id);
+
+	ras_features.ft_type	= RAS_FEAT_SCRUB;
+	ras_features.instance	= 0;
+	ras_features.scrub_ops	= &ras2_scrub_ops;
+	ras_features.ctx	= ras2_ctx;
+
+	return edac_dev_register(&auxdev->dev, scrub_name, NULL, 1, &ras_features);
+}
+
+static const struct auxiliary_device_id ras2_mem_dev_id_table[] = {
+	{ .name = RAS2_AUX_DEV_NAME "." RAS2_MEM_DEV_ID_NAME, },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(auxiliary, ras2_mem_dev_id_table);
+
+static struct auxiliary_driver ras2_mem_driver = {
+	.name = RAS2_MEM_DEV_ID_NAME,
+	.probe = ras2_mem_drv_probe,
+	.remove = ras2_mem_drv_remove,
+	.id_table = ras2_mem_dev_id_table,
+};
+module_auxiliary_driver(ras2_mem_driver);
+
+MODULE_IMPORT_NS("ACPI_RAS2");
+MODULE_DESCRIPTION("ACPI RAS2 memory driver");
+MODULE_LICENSE("GPL");
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
index f4574e8e0a12..6b4323b03747 100644
--- a/include/acpi/ras2.h
+++ b/include/acpi/ras2.h
@@ -37,6 +37,21 @@ struct device;
  *			is associated with. See ACPI spec 6.5 Table 5.80: RAS2
  *			Platform Communication Channel Descriptor format,
  *			Field: Instance
+ * @mem_base:		Base of the lowest physically contiguous memory range
+ *			of the memory associated with the NUMA domain
+ * @mem_size:		Size of the lowest physically contiguous memory range
+ *			of the memory associated with the NUMA domain
+ * @base:		Base address of the memory region to scrub
+ * @size:		Size of the memory region to scrub
+ * @scrub_cycle_hrs:	Current scrub rate in hours
+ * @set_scrub_cycle:	Scrub rate to set in hours
+ * @min_scrub_cycle:	Minimum scrub rate supported
+ * @max_scrub_cycle:	Maximum scrub rate supported
+ * @od_scrub:		Status of demand scrubbing (memory region)
+ * @bg_scrub:		Status of background patrol scrubbing
+ * @reenable_bg_scrub:	Flag indicating that background scrubbing is restarted
+ *			after demand scrubbing finishes
+ * @thread:		Demand scrub monitor kthread
  */
 struct ras2_mem_ctx {
 	struct auxiliary_device		adev;
@@ -45,6 +60,18 @@ struct ras2_mem_ctx {
 	void				*sspcc;
 	struct mutex			*pcc_lock;
 	u32				sys_comp_nid;
+	u64				mem_base;
+	u64				mem_size;
+	u64				base;
+	u64				size;
+	u8				scrub_cycle_hrs;
+	u8				set_scrub_cycle;
+	u8				min_scrub_cycle;
+	u8				max_scrub_cycle;
+	bool				od_scrub;
+	bool				bg_scrub;
+	bool				reenable_bg_scrub;
+	struct task_struct		*thread;
 };
 
 #ifdef CONFIG_ACPI_RAS2
diff --git a/include/linux/edac.h b/include/linux/edac.h
index fa32f2aca22f..2342ff38e9d5 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -680,6 +680,8 @@ enum edac_dev_feat {
  * @write_size: set offset of the scrubbing range.
  * @get_enabled_bg: check if currently performing background scrub.
  * @set_enabled_bg: start or stop a bg-scrub.
+ * @get_enabled_od: check if currently performing demand scrub.
+ * @set_enabled_od: start or stop a demand-scrub.
  * @get_min_cycle: get minimum supported scrub cycle duration in seconds.
  * @get_max_cycle: get maximum supported scrub cycle duration in seconds.
  * @get_cycle_duration: get current scrub cycle duration in seconds.
@@ -692,6 +694,8 @@ struct edac_scrub_ops {
 	int (*write_size)(struct device *dev, void *drv_data, u64 size);
 	int (*get_enabled_bg)(struct device *dev, void *drv_data, bool *enable);
 	int (*set_enabled_bg)(struct device *dev, void *drv_data, bool enable);
+	int (*get_enabled_od)(struct device *dev, void *drv_data, bool *enable);
+	int (*set_enabled_od)(struct device *dev, void *drv_data, bool enable);
 	int (*get_min_cycle)(struct device *dev, void *drv_data,  u32 *min);
 	int (*get_max_cycle)(struct device *dev, void *drv_data,  u32 *max);
 	int (*get_cycle_duration)(struct device *dev, void *drv_data, u32 *cycle);
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH 3/4] docs/zh_CN: update rust/quick-start.rst translation
From: Gary Guo @ 2026-04-08 17:43 UTC (permalink / raw)
  To: Ben Guo, Gary Guo, Alex Shi, Yanteng Si, Dongliang Mu,
	Jonathan Corbet
  Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <46eb585f-4983-4821-9be8-ef57571c3516@openatom.club>

On Wed Apr 8, 2026 at 5:51 PM BST, Ben Guo wrote:
> On 4/8/26 7:33 PM, Gary Guo wrote:
>> Hi Ben,
>> 
>> Thanks for updating the doc translation. There have been new changes to
>> quick-start.rst on rust-next; could you update the translation to be based
>> on that, please?
>> 
>> Thanks,
>> Gary
>
> Hi Gary,
>
> Thanks for the review. This series is based on the Chinese documentation
> maintainer's tree (alexs/linux.git docs-next), which does not yet have
> the latest quick-start.rst changes from the Rust-for-Linux rust-next
> tree.
>
> Would it be better to wait until those changes land in our base tree
> and then resend with the updated translation? Or would you prefer a
> different approach?
>
> Thanks,
> Ben

I don't see an issue with sending a translation of the latest quick-start.rst
even if it's not in your base yet. By the time the changes land upstream, the
original quick-start.rst will already be there.

Best,
Gary

^ permalink raw reply

* [PATCH v2] checkpatch: add --json output mode
From: Sasha Levin @ 2026-04-08 17:24 UTC (permalink / raw)
  To: dwaipayanray1, lukas.bulwahn
  Cc: joe, mricon, corbet, skhan, apw, workflows, linux-doc,
	linux-kernel, Sasha Levin
In-Reply-To: <20260406170039.4034716-1-sashal@kernel.org>

Add a --json flag to checkpatch.pl that emits structured JSON output,
making results machine-parseable for CI systems, IDE integrations, and
AI-assisted code review tools.

The JSON output includes per-file totals (errors, warnings, checks,
lines) and an array of individual issues with structured fields for
level, type, message, file path, and line number.

The --json flag is mutually exclusive with --terse and --emacs.
Normal text output behavior is completely unchanged when --json is
not specified.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Changes since v1:
- Replace hand-rolled json_escape()/json_encode_issue() with JSON::PP
  module (core since perl 5.14), as suggested by Konstantin and Joe
- Factor duplicated empty-result JSON blocks into json_print_result()
  helper
- Include used_types and ignored_types arrays in JSON output instead of
  suppressing hash_show_words, per Joe's suggestion
---
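
As a rough illustration of how a CI job might consume this output, here is a
minimal Python sketch. It assumes one JSON object per checked file, one per
line, with the keys emitted by json_print_result() in this patch. The sample
data, the summarize() helper and the file names are made up for the example
and are not part of checkpatch.

```python
import json

# Hypothetical sample of one line of `checkpatch.pl --json` output. The keys
# follow json_print_result() in this patch; the values are invented.
SAMPLE = (
    '{"filename":"0001-example.patch","issues":['
    '{"file":"drivers/foo/bar.c","level":"WARNING","line":42,'
    '"message":"line length of 103 exceeds 100 columns","type":"LONG_LINE"},'
    '{"level":"ERROR","message":"Missing Signed-off-by: line(s)",'
    '"type":"MISSING_SIGN_OFF"}],'
    '"total_checks":0,"total_errors":1,"total_lines":57,"total_warnings":1}'
)


def summarize(stream_text):
    """Parse one JSON object per non-empty line and group issues by level."""
    summary = {}
    for line in stream_text.splitlines():
        line = line.strip()
        if not line:
            continue
        result = json.loads(line)
        for issue in result["issues"]:
            # "file" and "line" are only set when known, so fall back to
            # the top-level filename and omit the line number if absent.
            where = issue.get("file", result["filename"])
            if "line" in issue:
                where = f"{where}:{issue['line']}"
            summary.setdefault(issue["level"], []).append(
                (issue["type"], where)
            )
    return summary


if __name__ == "__main__":
    for level, items in sorted(summarize(SAMPLE).items()):
        for issue_type, where in items:
            print(f"{level} {issue_type} {where}")
```

Reading line by line rather than parsing the whole stream at once keeps the
sketch usable when several files are checked in one invocation, since each
file then produces its own object.
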
 Documentation/dev-tools/checkpatch.rst |  7 +++
 scripts/checkpatch.pl                  | 64 +++++++++++++++++++++++---
 2 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst
index dccede68698ca..17e5744d3dee6 100644
--- a/Documentation/dev-tools/checkpatch.rst
+++ b/Documentation/dev-tools/checkpatch.rst
@@ -64,6 +64,13 @@ Available options:
 
    Output only one line per report.
 
+ - --json
+
+   Output results as a JSON object.  The object includes total error, warning,
+   and check counts, plus an array of individual issues with structured fields
+   for level, type, message, file, and line number.  Cannot be used with
+   --terse or --emacs.
+
  - --showfile
 
    Show the diffed file position instead of the input file position.
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index e56374662ff79..38d1a4a13ee8e 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -14,6 +14,7 @@ use File::Basename;
 use Cwd 'abs_path';
 use Term::ANSIColor qw(:constants);
 use Encode qw(decode encode);
+use JSON::PP;
 
 my $P = $0;
 my $D = dirname(abs_path($P));
@@ -33,6 +34,7 @@ my $chk_patch = 1;
 my $tst_only;
 my $emacs = 0;
 my $terse = 0;
+my $json = 0;
 my $showfile = 0;
 my $file = 0;
 my $git = 0;
@@ -93,6 +95,7 @@ Options:
   --patch                    treat FILE as patchfile (default)
   --emacs                    emacs compile window format
   --terse                    one line per report
+  --json                     output results as JSON
   --showfile                 emit diffed file position, not input file position
   -g, --git                  treat FILE as a single commit or git revision range
                              single git commit with:
@@ -320,6 +323,7 @@ GetOptions(
 	'patch!'	=> \$chk_patch,
 	'emacs!'	=> \$emacs,
 	'terse!'	=> \$terse,
+	'json!'		=> \$json,
 	'showfile!'	=> \$showfile,
 	'f|file!'	=> \$file,
 	'g|git!'	=> \$git,
@@ -379,6 +383,7 @@ help($help - 1) if ($help);
 
 die "$P: --git cannot be used with --file or --fix\n" if ($git && ($file || $fix));
 die "$P: --verbose cannot be used with --terse\n" if ($verbose && $terse);
+die "$P: --json cannot be used with --terse or --emacs\n" if ($json && ($terse || $emacs));
 
 if ($color =~ /^[01]$/) {
 	$color = !$color;
@@ -1351,7 +1356,7 @@ for my $filename (@ARGV) {
 	}
 	close($FILE);
 
-	if ($#ARGV > 0 && $quiet == 0) {
+	if (!$json && $#ARGV > 0 && $quiet == 0) {
 		print '-' x length($vname) . "\n";
 		print "$vname\n";
 		print '-' x length($vname) . "\n";
@@ -1372,7 +1377,7 @@ for my $filename (@ARGV) {
 	$file = $oldfile if ($is_git_file);
 }
 
-if (!$quiet) {
+if (!$quiet && !$json) {
 	hash_show_words(\%use_type, "Used");
 	hash_show_words(\%ignore_type, "Ignored");
 
@@ -2395,6 +2400,18 @@ sub report {
 
 	push(our @report, $output);
 
+	if ($json) {
+		our ($realfile, $realline);
+		my %issue = (
+			level => $level,
+			type => $type,
+			message => $msg,
+		);
+		$issue{file} = $realfile if (defined $realfile && $realfile ne '');
+		$issue{line} = $realline + 0 if (defined $realline && $realline);
+		push(our @json_issues, \%issue);
+	}
+
 	return 1;
 }
 
@@ -2402,6 +2419,23 @@ sub report_dump {
 	our @report;
 }
 
+sub json_print_result {
+	my ($filename, $total_errors, $total_warnings, $total_checks,
+	    $total_lines, $issues, $used_types, $ignored_types) = @_;
+	my %result = (
+		filename       => $filename,
+		total_errors   => $total_errors + 0,
+		total_warnings => $total_warnings + 0,
+		total_checks   => $total_checks + 0,
+		total_lines    => $total_lines + 0,
+		issues         => $issues,
+	);
+	$result{used_types} = $used_types if (defined $used_types);
+	$result{ignored_types} = $ignored_types if (defined $ignored_types);
+	my $json_encoder = JSON::PP->new->canonical->utf8;
+	print $json_encoder->encode(\%result) . "\n";
+}
+
 sub fixup_current_range {
 	my ($lineRef, $offset, $length) = @_;
 
@@ -2690,14 +2724,15 @@ sub process {
 	my $last_coalesced_string_linenr = -1;
 
 	our @report = ();
+	our @json_issues = ();
 	our $cnt_lines = 0;
 	our $cnt_error = 0;
 	our $cnt_warn = 0;
 	our $cnt_chk = 0;
 
 	# Trace the real file/line as we go.
-	my $realfile = '';
-	my $realline = 0;
+	our $realfile = '';
+	our $realline = 0;
 	my $realcnt = 0;
 	my $here = '';
 	my $context_function;		#undef'd unless there's a known function
@@ -7791,18 +7826,27 @@ sub process {
 	# If we have no input at all, then there is nothing to report on
 	# so just keep quiet.
 	if ($#rawlines == -1) {
+		if ($json) {
+			json_print_result($filename, 0, 0, 0, 0, []);
+		}
 		exit(0);
 	}
 
 	# In mailback mode only produce a report in the negative, for
 	# things that appear to be patches.
 	if ($mailback && ($clean == 1 || !$is_patch)) {
+		if ($json) {
+			json_print_result($filename, 0, 0, 0, 0, []);
+		}
 		exit(0);
 	}
 
 	# This is not a patch, and we are in 'no-patch' mode so
 	# just keep quiet.
 	if (!$chk_patch && !$is_patch) {
+		if ($json) {
+			json_print_result($filename, 0, 0, 0, 0, []);
+		}
 		exit(0);
 	}
 
@@ -7850,6 +7894,13 @@ sub process {
 		}
 	}
 
+	if ($json) {
+		my @used = sort keys %use_type;
+		my @ignored = sort keys %ignore_type;
+		json_print_result($filename, $cnt_error, $cnt_warn,
+				  $cnt_chk, $cnt_lines, \@json_issues,
+				  \@used, \@ignored);
+	} else {
 	print report_dump();
 	if ($summary && !($clean == 1 && $quiet == 1)) {
 		print "$filename " if ($summary_file);
@@ -7878,8 +7929,9 @@ NOTE: Whitespace errors detected.
 EOM
 		}
 	}
+	} # end !$json
 
-	if ($clean == 0 && $fix &&
+	if (!$json && $clean == 0 && $fix &&
 	    ("@rawlines" ne "@fixed" ||
 	     $#fixed_inserted >= 0 || $#fixed_deleted >= 0)) {
 		my $newfile = $filename;
@@ -7918,7 +7970,7 @@ EOM
 		}
 	}
 
-	if ($quiet == 0) {
+	if (!$json && $quiet == 0) {
 		print "\n";
 		if ($clean == 1) {
 			print "$vname has no obvious style problems and is ready for submission.\n";
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-08 16:58 UTC (permalink / raw)
  To: Eliot Courtney, linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Alexandre Courbot, Andrea Righi,
	Andy Ritger, Zhi Wang, Balbir Singh, Philipp Stanner,
	Elle Rhumsaa, alexeyi, joel, linux-doc, amd-gfx, intel-gfx,
	intel-xe, linux-fbdev
In-Reply-To: <DHNT32C2Q5HN.LLME0RV17Z8V@nvidia.com>

Hi Eliot,

On 4/8/2026 9:26 AM, Eliot Courtney wrote:
> On Tue Apr 7, 2026 at 10:59 PM JST, Joel Fernandes wrote:
>> Hi Eliot,
>>
>> On 4/7/2026 9:42 AM, Eliot Courtney wrote:
>>> On Tue Apr 7, 2026 at 6:55 AM JST, Joel Fernandes wrote:
>>>>>> +    /// Compute upper bound on page table pages needed for `num_virt_pages`.
>>>>>> +    ///
>>>>>> +    /// Walks from PTE level up through PDE levels, accumulating the tree.
>>>>>> +    pub(crate) fn pt_pages_upper_bound(&self, num_virt_pages: usize) -> usize {
>>>>>> +        let mut total = 0;
>>>>>> +
>>>>>> +        // PTE pages at the leaf level.
>>>>>> +        let pte_epp = self.entries_per_page(self.pte_level());
>>>>>> +        let mut pages_at_level = num_virt_pages.div_ceil(pte_epp);
>>>>>> +        total += pages_at_level;
>>>>>> +
>>>>>> +        // Walk PDE levels bottom-up (reverse of pde_levels()).
>>>>>> +        for &level in self.pde_levels().iter().rev() {
>>>>>> +            let epp = self.entries_per_page(level);
>>>>>> +
>>>>>> +            // How many pages at this level do we need to point to
>>>>>> +            // the previous pages_at_level?
>>>>>> +            pages_at_level = pages_at_level.div_ceil(epp);
>>>>>> +            total += pages_at_level;
>>>>>> +        }
>>>>>> +
>>>>>> +        total
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>
>>>>> We have a lot of matches on the MMU version here (and below in Pte, Pde,
>>>>> DualPde). What about making MmuVersion into a trait (e.g. Mmu) with
>>>>> associated types for Pte, Pde, DualPde which can implement traits
>>>>> defining their common operations too?
>>>>
>>>> I coded this up and it did not look pretty, there's not much LOC savings and the
>>>> code becomes harder to read because of parametrization of several functions. Also:
>>>
>>> Thanks for looking into it. Sorry to be a bother, but would you have a
>>> branch around with the code? I'm curious what didn't look good about it.
>>
> >> Sorry, but I already mentioned that above: the parameterizing of dozens of
> >> function call sites, 3-4 new traits (because each struct like
> >> Pte/Pde/DualPde etc. needs its own trait which different MMU versions
> >> implement) etc. The code becomes hard to read, and readability is the top
> >> criterion for me - I am personally strictly against "Let's use shiny
> >> features in the language at the cost of making code unreadable", because
> >> that translates into bugs and a nightmare for maintainability.
> >>
> >> I don't have the code at the moment, but if you still want to spend time
> >> on this direction, feel free to share a tree. I am happy to take a look.
> 
> I had a go at this, you can see the branch here [1] - it might not be
> perfect, but I think the shape is directionally good. It's structured so
> the HEAD commit has the diff from the current approach to the
> parametrised approach. The main decision is where to do the type
> erasure, I chose in `Vmm` since it looks like the main top level API for
> this code, but could do `BarUser` instead. I think it's overall better.
> I also think Alex's point about associated types making it easier to use
> the appropriate Bounded type is a good one.
> 
> [1]: https://github.com/Edgeworth/linux/commits/review/nova-mm-v10/

First, thanks for the effort. I looked through this; it's pretty much what I
had before when I used traits. I don't think it is better, to be honest. In
fact your version is worse: it adds many new types and things like the
following, which I did not need before.

To put it mildly, the following suggestion should not be anywhere near my code:

/// Type-erased MMU-specific [`Vmm`] implementations.
enum VmmInner {
    /// `Vmm` implementation for MMU v2.
    V2(VmmImpl<MmuV2>),
    /// `Vmm` implementation for MMU v3.
    V3(VmmImpl<MmuV3>),
}

/// MMU-specific [`Vmm`] implementation.
struct VmmImpl<M: Mmu> {

Seriously, I have to pass on this. :-)

And you unfortunately seem to have ignored my point about requiring 4 NEW
traits (Mmu, PteOps, PdeOps, DualPdeOps etc.), which I did not need before.
So you're actually making the code much worse than before. We don't add
new traits and types pointlessly.

The only positive thing I could take away from your diff is the following
(I thought I had already done that, but I'll double check).

-    fn level_index(&self, level: u64) -> u64 {
+    fn level_index(&self, level: PageTableLevel) -> u64 {

Also you're parametrizing VirtualAddress as well which I did not have before:

-     let va = VirtualAddress::from(vfn);
+     let va = M::va(VirtualAddress::from(vfn));

This is another step back.

> I also think Alex's point about associated types making it easier to use
> the appropriate Bounded type is a good one.

I will reply to Alex's thread separately. I have some good data that should
hopefully convince you and Alex that my approach in this patch is better
(version-struct-based dispatch rather than monomorphization). I would
emphasize, as we all know, that we should make optimizations and changes
based on real data and proper technical arguments, so in the spirit of that
I have collected data with both approaches and I will reply to Alex's email
with all of it.

Also, the usage of bounded types is orthogonal to version-parameterization.
That can be done regardless: we already use the bitfield macro in this code
and can use bounded types within it if needed to restrict type creation. So
I don't think we should mix the 2 concepts "bounded types" and
"parameterization".

thanks,

--
Joel Fernandes



^ permalink raw reply

* Re: [PATCH v8 0/2] PCI: s390: Expose the UID as an arch specific PCI slot attribute
From: Bjorn Helgaas @ 2026-04-08 16:57 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Bjorn Helgaas, Niklas Schnelle, Jonathan Corbet, Lukas Wunner,
	Shuah Khan, Farhan Ali, Alexander Gordeev, Christian Borntraeger,
	Gerald Schaefer, Gerd Bayer, Heiko Carstens, Julian Ruess,
	Matthew Rosato, Peter Oberparleiter, Ramesh Errabolu,
	Sven Schnelle, linux-doc, linux-kernel, linux-pci, linux-s390,
	Randy Dunlap
In-Reply-To: <ttd6cui@ub.hpns>

On Wed, Apr 08, 2026 at 02:18:18PM +0200, Vasily Gorbik wrote:
> On Tue, Apr 07, 2026 at 03:24:44PM +0200, Niklas Schnelle wrote:
> > Add a mechanism for architecture specific attributes on
> > PCI slots in order to add the user-defined ID (UID) as an s390 specific
> > PCI slot attribute. First though improve some issues with the s390 specific
> > documentation of PCI sysfs attributes noticed during development.
> 
> > Niklas Schnelle (2):
> >       docs: s390/pci: Improve and update PCI documentation
> >       PCI: s390: Expose the UID as an arch specific PCI slot attribute
> > 
> >  Documentation/arch/s390/pci.rst | 151 +++++++++++++++++++++++++++-------------
> >  arch/s390/include/asm/pci.h     |   4 ++
> >  arch/s390/pci/pci_sysfs.c       |  20 ++++++
> >  drivers/pci/slot.c              |  13 +++-
> >  4 files changed, 140 insertions(+), 48 deletions(-)
> 
> Bjorn, would you like to take this through the PCI tree? I think Niklas
> phrased the subject with that in mind.
> 
> Otherwise, I can take it through the s390 tree. If so, could you give
> me your Acked-by?

I did ack it, but I guess it was a previous version:

  https://lore.kernel.org/all/20260407193205.GA247806@bhelgaas

It'd be great if you merged it via s390.  The interesting parts are
really in arch/s390.

^ permalink raw reply

* Re: [PATCH v8 2/2] PCI: s390: Expose the UID as an arch specific PCI slot attribute
From: Bjorn Helgaas @ 2026-04-08 16:57 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Jonathan Corbet, Lukas Wunner, Shuah Khan,
	Farhan Ali, Alexander Gordeev, Christian Borntraeger,
	Gerald Schaefer, Gerd Bayer, Heiko Carstens, Julian Ruess,
	Matthew Rosato, Peter Oberparleiter, Ramesh Errabolu,
	Sven Schnelle, Vasily Gorbik, linux-doc, linux-kernel, linux-pci,
	linux-s390
In-Reply-To: <20260407-uid_slot-v8-2-15ae4409d2ce@linux.ibm.com>

On Tue, Apr 07, 2026 at 03:24:46PM +0200, Niklas Schnelle wrote:
> On s390, an individual PCI function can generally be identified by two
> identifiers, the FID and the UID. Which identifier is used depends on
> the scope and the platform configuration.
> 
> The first identifier, the FID, is always available and identifies a PCI
> device uniquely within a machine. The FID may be virtualized by
> hypervisors, but on the LPAR level, the machine scope makes it
> impossible to create the same configuration based on FIDs on two
> different LPARs of the same machine, and difficult to reuse across
> machines.
> 
> Such matching LPAR configurations are useful, though, allowing
> standardized setups and booting a Linux installation on different LPARs.
> To this end the UID, or user-defined identifier, was introduced. While
> it is only guaranteed to be unique within an LPAR and only if indicated
> by firmware, it allows users to replicate PCI device setups.
> 
> On s390, which uses a machine hypervisor, a per PCI function hotplug
> model is used. The shortcoming with the UID then is, that it is not
> visible to the user without first attaching the PCI function and
> accessing the "uid" device attribute. The FID, on the other hand, is
> used as the slot name and is thus known even with the PCI function in
> standby.
> 
> Remedy this shortcoming by providing the UID as an attribute on the slot
> allowing the user to identify a PCI function based on the UID without
> having to first attach it. Do this via a macro mechanism analogous to
> what was introduced by commit 265baca69a07 ("s390/pci: Stop usurping
> pdev->dev.groups") for the PCI device attributes.
> 
> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
> Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/slot.c

> ---
>  Documentation/arch/s390/pci.rst |  7 +++++++
>  arch/s390/include/asm/pci.h     |  4 ++++
>  arch/s390/pci/pci_sysfs.c       | 20 ++++++++++++++++++++
>  drivers/pci/slot.c              | 13 ++++++++++++-
>  4 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/s390/pci.rst b/Documentation/arch/s390/pci.rst
> index c3476de4f03278d07099aa32cbea0f868b6e9c9c..80f4ba19315994da056a10b4d216d61ff22ea5aa 100644
> --- a/Documentation/arch/s390/pci.rst
> +++ b/Documentation/arch/s390/pci.rst
> @@ -58,6 +58,13 @@ Entries specific to zPCI functions and entries that hold zPCI information.
>  
>    - /sys/bus/pci/slots/XXXXXXXX/power
>  
> +  In addition to using the FID as the name of the slot, the slot directory
> +  also contains the following s390-specific slot attributes.
> +
> +  - uid:
> +    The User-defined identifier (UID) of the function which may be configured
> +    by this slot. See also the corresponding attribute of the device.
> +
>    A physical function that currently supports a virtual function cannot be
>    powered off until all virtual functions are removed with:
>    echo 0 > /sys/bus/pci/devices/DDDD:BB:dd.f/sriov_numvf
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index c0ff19dab5807c7e1aabb48a0e9436aac45ec97d..5dcf35f0f325f5f44b28109a1c8d9aef18401035 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -208,6 +208,10 @@ extern const struct attribute_group zpci_ident_attr_group;
>  			    &pfip_attr_group,		 \
>  			    &zpci_ident_attr_group,
>  
> +extern const struct attribute_group zpci_slot_attr_group;
> +
> +#define ARCH_PCI_SLOT_GROUPS (&zpci_slot_attr_group)
> +
>  extern unsigned int s390_pci_force_floating __initdata;
>  extern unsigned int s390_pci_no_rid;
>  
> diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
> index c2444a23e26c4218832bb91930b5f0ffd498d28f..d98d97df792adb3c7e415a8d374cc2f3a65fbb52 100644
> --- a/arch/s390/pci/pci_sysfs.c
> +++ b/arch/s390/pci/pci_sysfs.c
> @@ -187,6 +187,17 @@ static ssize_t index_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(index);
>  
> +static ssize_t zpci_uid_slot_show(struct pci_slot *slot, char *buf)
> +{
> +	struct zpci_dev *zdev = container_of(slot->hotplug, struct zpci_dev,
> +					     hotplug_slot);
> +
> +	return sysfs_emit(buf, "0x%x\n", zdev->uid);
> +}
> +
> +static struct pci_slot_attribute zpci_slot_attr_uid =
> +	__ATTR(uid, 0444, zpci_uid_slot_show, NULL);
> +
>  static umode_t zpci_index_is_visible(struct kobject *kobj,
>  				     struct attribute *attr, int n)
>  {
> @@ -243,6 +254,15 @@ const struct attribute_group pfip_attr_group = {
>  	.attrs = pfip_attrs,
>  };
>  
> +static struct attribute *zpci_slot_attrs[] = {
> +	&zpci_slot_attr_uid.attr,
> +	NULL,
> +};
> +
> +const struct attribute_group zpci_slot_attr_group = {
> +	.attrs = zpci_slot_attrs,
> +};
> +
>  static struct attribute *clp_fw_attrs[] = {
>  	&uid_checking_attr.attr,
>  	NULL,
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 787311614e5b6ebb39e7284f9b9f205a0a684d6d..2f8fcfbbec24e73d0bb6e40fd04c05a94f518045 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -96,7 +96,18 @@ static struct attribute *pci_slot_default_attrs[] = {
>  	&pci_slot_attr_cur_speed.attr,
>  	NULL,
>  };
> -ATTRIBUTE_GROUPS(pci_slot_default);
> +
> +static const struct attribute_group pci_slot_default_group = {
> +	.attrs = pci_slot_default_attrs,
> +};
> +
> +static const struct attribute_group *pci_slot_default_groups[] = {
> +	&pci_slot_default_group,
> +#ifdef ARCH_PCI_SLOT_GROUPS
> +	ARCH_PCI_SLOT_GROUPS,
> +#endif
> +	NULL,
> +};
>  
>  static const struct kobj_type pci_slot_ktype = {
>  	.sysfs_ops = &pci_slot_sysfs_ops,
> 
> -- 
> 2.51.0
> 
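
To illustrate what the new attribute allows from userspace, here is a small
Python sketch that lists the UID of every slot without attaching the
function first. The sysfs location and the "0x%x" format come from the patch
above; the read_slot_uids() helper and its slots_dir parameter are
hypothetical names chosen for this example.

```python
import os


def read_slot_uids(slots_dir="/sys/bus/pci/slots"):
    """Return {slot_name: uid} for every slot directory exposing a uid file."""
    uids = {}
    if not os.path.isdir(slots_dir):
        return uids
    for slot in sorted(os.listdir(slots_dir)):
        uid_path = os.path.join(slots_dir, slot, "uid")
        try:
            with open(uid_path) as f:
                # The show function emits "0x%x\n", so parse as base 16.
                uids[slot] = int(f.read().strip(), 16)
        except (OSError, ValueError):
            # Slot without a readable, well-formed uid attribute; skip it.
            continue
    return uids


if __name__ == "__main__":
    for slot, uid in read_slot_uids().items():
        print(f"slot {slot}: uid 0x{uid:x}")
```

On s390 the slot name is the FID, so the resulting mapping goes from FID to
UID for functions that are still in standby.
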

^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Ackerley Tng @ 2026-04-08 16:54 UTC (permalink / raw)
  To: Sean Christopherson, Michael Roth
  Cc: Vishal Annapurve, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, suzuki.poulose, aneesh.kumar, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <adWidf8UgZeYctr1@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Apr 07, 2026, Michael Roth wrote:
>> On Tue, Apr 07, 2026 at 02:50:58PM -0700, Vishal Annapurve wrote:
>> > On Tue, Apr 7, 2026 at 2:09 PM Michael Roth <michael.roth@amd.com> wrote:
>> > >
>> > > > TLDR:
>> > > >
>> > > > + Think of populate ioctls not as KVM touching memory, but platform
>> > > >   handling population.
>> > > > + KVM code (kvm_gmem_populate) still doesn't touch memory contents
>> > > > + post_populate is platform-specific code that handles loading into
>> > > >   private destination memory just to support legacy non-in-place
>> > > >   conversion.
>> > > > + Don't complicate populate ioctls by doing conversion just to support
>> > > >   legacy use-cases where platform-specific code has to do copying on
>> > > >   the host.
>> > >
>> > > That's a good point: these are only considerations in the context of
>> > > actually copying from src->dst, but with in-place conversion the
>> > > primary/more-performant approach will be for userspace to initialize
>> > > directly. I.e. if we enforced that, then gmem could rightly ascertain
>> > > that it isn't even writing to private pages via these hooks and any
>> > > manipulation of that memory is purely on the part of the trusted entity
>> > > handling initial encryption/etc.
>> > > I understand that we decided to keep the option of allowing separate
>> > > src/dst even with in-place conversion, but it doesn't seem worthwhile if
>> > > that necessarily means we need to glue population+conversion together in
>> > > 1 clumsy interface that needs to handle partial return/error responses to
>> > > userspace (or potentially get stuck forever in the conversion path).
>> >
>> > I think ARM needs userspace to specify separate source and destination
>> > memory ranges for initial population as ARM doesn't support in-place
>> > memory encryption. [1]
>> >
>> > [1] https://lore.kernel.org/kvm/20260318155413.793430-25-steven.price@arm.com/
>> >
>> > >
>> > > So I agree with Ackerley's proposal (which I guess is the same as what's
>> > > in this series).
>> > >
>> > > However, 1 other alternative would be to do what was suggested on the
>> > > call, but require userspace to subsequently handle the shared->private
>> > > conversion. I think that would be workable too.
>> >
>> > IIUC, Converting memory ranges to private after it essentially is
>> > treated as private by the KVM CC backend will expose the
>> > implementation to the same risk of userspace being able to access
>> > private memory and compromise host safety which guest_memfd was
>> > invented to address.
>>
>> Doh, fair point. Doing conversion as part of the populate call would allow
>> us to use the filemap write-lock to avoid userspace being able to fault
>> in private (as tracked by trusted entity) pages before they are
>> transitioned to private (as tracked by KVM), so it's safer than having
>> userspace drive it.
>>
>> But obviously I still think Ackerley's original proposal has more
>> upsides than the alternatives mentioned so far.
>
> I'm a bit lost.  What exactly is/was Ackerley's original proposal?  If the answer
> is "convert pages from shared=>private when populating via in-place conversion",
> then I agree, because AFAICT, that's the only sane option.

Discussed this at PUCK today 2026-04-08.

The update is that the KVM_SET_MEMORY_ATTRIBUTES2 guest_memfd ioctl will
now support the PRESERVE flag for TDX and SNP only if the setup for the
VM in question hasn't yet been completed (KVM_TDX_FINALIZE_VM or
KVM_SEV_SNP_LAUNCH_FINISH hasn't completed yet).

The populate flow will be

1a. Get contents to be loaded in guest_memfd (src_addr: NULL) as shared
OR
1b. Provide contents from some other userspace address (src_addr:
    userspace address)

2.  KVM_SET_MEMORY_ATTRIBUTES2(attribute: PRIVATE and flags: PRESERVE)
3.  KVM_SEV_SNP_LAUNCH_UPDATE() or KVM_TDX_INIT_MEM_REGION()
...
4.  KVM_SEV_SNP_LAUNCH_FINISH() or KVM_TDX_FINALIZE_VM()

This applies whether src_addr is some userspace address that is shared
or NULL, so the non-in-place loading flow is not considered legacy. ARM
CCA can still use that flow :)

Other than supporting PRESERVE only if the setup for the VM in question
hasn't yet been completed, KVM's fault path will also not permit faults
if the setup hasn't been completed. (Some exception setup will be used
for TDX to be able to perform the required fault.)


* Re: [PATCH 0/4] docs/zh_CN: update rust/ subsystem translations
From: Ben Guo @ 2026-04-08 16:54 UTC (permalink / raw)
  To: Dongliang Mu, Alex Shi, Yanteng Si, Jonathan Corbet
  Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <8dd6239f-eac6-4e81-a1b5-a4e6c45d07fd@hust.edu.cn>

On 4/8/26 7:44 PM, Dongliang Mu wrote:
> Hi Guo,
> 
> I found an issue in this patchset: please do not directly include my 
> review tag from the internal mailing list [1].
> 
> After you submit it to the linux‑doc mailing list, I will add my review 
> tag at that time. Including it now would look inappropriate.
> 
> Our internal review is only intended to maintain patch quality for our 
> open‑source club.

Hi Dongliang,

Thanks for pointing this out.

I will remove your Reviewed-by from all patches and resend as v2.

Thanks,
Ben


* Re: [PATCH 3/4] docs/zh_CN: update rust/quick-start.rst translation
From: Ben Guo @ 2026-04-08 16:51 UTC (permalink / raw)
  To: Gary Guo, Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
  Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <DHNQOSMQJV1A.18UJB6VG0QK70@garyguo.net>

On 4/8/26 7:33 PM, Gary Guo wrote:
> Hi Ben,
> 
> Thanks for updating the doc translation. There have been new changes to
> quick-start.rst on rust-next; could you rebase the translation on that,
> please?
> 
> Thanks,
> Gary

Hi Gary, 

Thanks for the review. This series is based on the Chinese documentation
maintainer's tree (alexs/linux.git docs-next), which does not yet have
the latest quick-start.rst changes from the Rust-for-Linux rust-next
tree.

Would it be better to wait until those changes land in our base tree
and then resend with the updated translation? Or would you prefer a
different approach?

Thanks,
Ben


* Re: [RFC PATCH v3 00/10] mm/damon: introduce DAMOS failed region quota charge ratio
From: Bijan Tabatabai @ 2026-04-08 16:48 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Bijan Tabatabai, Liam R. Howlett, Andrew Morton, Brendan Higgins,
	David Gow, David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Shuah Khan, Shuah Khan,
	Suren Baghdasaryan, Vlastimil Babka, damon, kunit-dev, linux-doc,
	linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <20260407010536.83603-1-sj@kernel.org>

On Mon,  6 Apr 2026 18:05:22 -0700 SeongJae Park <sj@kernel.org> wrote:

Hi SJ,

> TL; DR: Let users set different DAMOS quota charge ratios for DAMOS
> action failed regions, for deterministic and consistent DAMOS action
> progress.
> 
> Common Reports: Unexpectedly Slow DAMOS
> =======================================
> 
> One common issue report that we get from DAMON users is that DAMOS
> action applying progress speed is sometimes much slower than expected.
> And one common root cause is that the DAMOS quota is exceeded by the
> action applying failed memory regions.
> 
> For example, a group of users tried to run DAMOS-based proactive memory
> reclamation (DAMON_RECLAIM) with 100 MiB per second DAMOS quota.  They
> ran it on a system having no active workload which means all memory of
> the system is cold.  The expectation was that the system will show 100
> MiB per second reclamation until (nearly) all memory is reclaimed. But
> what they found is that the speed is quite inconsistent: sometimes it
> becomes much slower than expected, and sometimes there is no reclamation
> at all for tens of seconds.  The upper limit of the speed (100 MiB
> per second) was being kept as expected, though.
> 
> By monitoring the qt_exceeds (number of DAMOS quota exceed events) DAMOS
> stat, we found DAMOS quota is always exceeded when the speed is slow. By
> monitoring sz_tried and sz_applied (the total amount of DAMOS action
> tried memory and succeeded memory) DAMOS stats together, we found the
> reclamation attempts nearly always failed when the speed is slow.
> 
> DAMOS quota charges DAMOS action tried regions regardless of the
> successfulness of the try.  Hence in the example reported case, there
> was unreclaimable memory spread around the system memory.  Sometimes
> nearly 100 MiB of memory that DAMOS tried to reclaim in the given quota
> interval was reclaimable, and therefore showed nearly 100 MiB per second
> speed.  Sometimes nearly 99 MiB of memory that DAMOS was trying to
> reclaim in the given quota interval was unreclaimable, and therefore
> showing only about 1 MiB per second reclaim speed.
> 
> We explained it is an expected behavior of the feature rather than a
> bug, as DAMOS quota is there for only the upper-limit of the speed.  The
> users agreed and later reported a huge win from the adoption of
> DAMON_RECLAIM on their products.

Thanks for this series. This is a problem I have come across, and I am looking
forward to seeing this land.

> It is Not a Bug but a Feature; But...
> =====================================
> 
> So nothing is broken.  DAMOS quota is working as intended, as the upper
> limit of the speed.  It also provides its behavior observability via
> DAMOS stat.  In real-world production environments that run long-term
> active workloads and where stability matters, the speed sometimes being
> slow is not a real problem.
> 
> But, the non-deterministic behavior is sometimes annoying, especially in
> lab environments.  Even in a realistic production environment, when
> there is a huge amount of DAMOS action unapplicable memory, the speed
> could be problematically slow.  Suppose a virtual machine provider
> sets up 99% of the host memory as hugetlb pages that cannot be
> reclaimed, to give it to virtual machines.  Also, when aim-oriented
> DAMOS auto-tuning is applied, this could confuse the internal feedback
> loop.
> 
> The intention of the current behavior was that trying DAMOS action to
> regions would anyway impose some overhead, and therefore somehow be
> charged.  But in the real world, the overhead for failed action is much
> lighter than successful action.  Charging those at the same ratio may be
> unfair, or at least suboptimum in some environments.
> 
> DAMOS Action Failed Region Quota Charge Ratio
> =============================================
> 
> Let users set the charge ratio for the action-failed memory, for more
> optimal and deterministic use of DAMOS.  It allows users to specify the
> numerator and the denominator of the ratio for flexible setup.  For
> example, let's suppose the numerator and the denominator are set to 1
> and 4,096, respectively.  The ratio is 1 / 4,096.  A DAMOS scheme action
> is applied to 5 GiB of memory.  For 1 GiB of the memory, the action
> succeeds.  For the rest (4 GiB), the action fails.  Then, only 1 GiB
> plus 1 MiB (4 GiB / 4,096) of quota is charged.
> 
> The optimal charge ratio will depend on the use case and
> system/workload.  I'd recommend starting by setting the numerator to 1
> and the denominator to PAGE_SIZE, then tuning based on the results, because
> many DAMOS actions are applied at page level.

This makes sense, but the quota is also considered when setting the minimum
allowable score in damos_adjust_quota(), which, to my understanding, assumes
that the action will be applied to all of a region's data. If an action fails for
a significant amount of the memory, a lower score than what was calculated in
damos_adjust_quota() could be valid. If that's the case, the scheme would be
applied to fewer regions than strictly necessary.

As you mention above, this is not a correctness issue because the quota only
guarantees an upper limit on the amount of data the scheme is applied to.
Additionally, it may very well be true that what I listed above would not be
very noticeable in practice. I just thought this was worth pointing out as
something to think about.

Thanks,
Bijan

<snip>

Sent using hkml (https://github.com/sjp38/hackermail)


* [PATCH v3 1/2] platform/x86/intel-uncore-freq: Rename instance_id
From: Maciej Wieczor-Retman @ 2026-04-08 16:27 UTC (permalink / raw)
  To: skhan, ilpo.jarvinen, hansg, corbet, srinivas.pandruvada
  Cc: linux-kernel, platform-driver-x86, linux-doc, m.wieczorretman,
	Maciej Wieczor-Retman
In-Reply-To: <cover.1775665057.git.m.wieczorretman@pm.me>

From: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>

The "instance" word has a specific meaning in TPMI. It is a physical
index related to compute dies and IO dies present on a single TPMI
partition (which is also a single TPMI device). It's used for mapping
MMIO blocks for direct TPMI register access.

The currently used "instance_id" uncore_data struct field is a
sequentially generated value that's used for appending to uncore
directories inside the /sys/devices/system/cpu/intel_uncore_frequency
directory. It has no relation to the physical TPMI elements.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
Changelog v3:
- Add Srinivas' Acked-by.

Changelog v2:
- Redid the first paragraph to better describe what "instance" is.
- Rename seqname_id to seqnum_id to emphasize it's a sequential number
  not sequential name.

 .../x86/intel/uncore-frequency/uncore-frequency-common.c    | 6 +++---
 .../x86/intel/uncore-frequency/uncore-frequency-common.h    | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
index 7070c94324e0..25ab511ed8d2 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
@@ -268,7 +268,7 @@ int uncore_freq_add_entry(struct uncore_data *data, int cpu)
 		if (ret < 0)
 			goto uncore_unlock;
 
-		data->instance_id = ret;
+		data->seqnum_id = ret;
 		scnprintf(data->name, sizeof(data->name), "uncore%02d", ret);
 	} else {
 		scnprintf(data->name, sizeof(data->name), "package_%02d_die_%02d",
@@ -281,7 +281,7 @@ int uncore_freq_add_entry(struct uncore_data *data, int cpu)
 	ret = create_attr_group(data, data->name);
 	if (ret) {
 		if (data->domain_id != UNCORE_DOMAIN_ID_INVALID)
-			ida_free(&intel_uncore_ida, data->instance_id);
+			ida_free(&intel_uncore_ida, data->seqnum_id);
 	} else {
 		data->control_cpu = cpu;
 		data->valid = true;
@@ -301,7 +301,7 @@ void uncore_freq_remove_die_entry(struct uncore_data *data)
 	data->control_cpu = -1;
 	data->valid = false;
 	if (data->domain_id != UNCORE_DOMAIN_ID_INVALID)
-		ida_free(&intel_uncore_ida, data->instance_id);
+		ida_free(&intel_uncore_ida, data->seqnum_id);
 
 	mutex_unlock(&uncore_lock);
 }
diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
index 0abe850ef54e..0d5fd91ee0aa 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
@@ -35,7 +35,7 @@
  * @die_id:		Die id for this instance
  * @domain_id:		Power domain id for this instance
  * @cluster_id:		cluster id in a domain
- * @instance_id:	Unique instance id to append to directory name
+ * @seqnum_id:		Unique sequential id to append to directory name
  * @name:		Sysfs entry name for this instance
  * @agent_type_mask:	Bit mask of all hardware agents for this domain
  * @uncore_attr_group:	Attribute group storage
@@ -71,7 +71,7 @@ struct uncore_data {
 	int die_id;
 	int domain_id;
 	int cluster_id;
-	int instance_id;
+	int seqnum_id;
 	char name[32];
 	u16  agent_type_mask;
 
-- 
2.53.0




* [PATCH v3 2/2] platform/x86/intel-uncore-freq: Expose instance ID in the sysfs
From: Maciej Wieczor-Retman @ 2026-04-08 16:27 UTC (permalink / raw)
  To: skhan, ilpo.jarvinen, hansg, corbet, srinivas.pandruvada
  Cc: linux-kernel, platform-driver-x86, linux-doc, m.wieczorretman,
	Maciej Wieczor-Retman
In-Reply-To: <cover.1775665057.git.m.wieczorretman@pm.me>

From: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>

Insufficient data is exported to allow direct access to TPMI registers
through MMIO. On non-partitioned systems domain_id can be used both for
mapping CPUs to their compute die IDs and for mapping die indices to
their MMIO memory blocks presented to userspace via TPMI debugfs.
However on partitioned systems the debugfs association doesn't work
anymore. This is due to how TPMI partitioning influences domain_id
calculation. The previous association is lost on partitioned systems in
order to keep using domain_id for mapping CPUs to compute dies.

Expose the instance ID in sysfs that's unique in the scope of one TPMI
partition (and hence one TPMI device). It's a physical index into mapped
MMIO blocks and can be used by userspace to figure out how to directly
access TPMI registers.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
---
Changelog v3:
- Change sprintf -> sysfs_emit in show_instance_id().
- Change part of patch message 'MMIO memory blocks mapped' -> 'MMIO
  memory blocks presented to userspace...'
- Change assigning function to static inline.

Changelog v2:
- Redo the patch message.
- Redo the function comment that assigns instance_id.
- Modify the documentation.

 .../pm/intel_uncore_frequency_scaling.rst         |  7 +++++++
 .../uncore-frequency/uncore-frequency-common.c    | 10 ++++++++++
 .../uncore-frequency/uncore-frequency-common.h    |  6 +++++-
 .../uncore-frequency/uncore-frequency-tpmi.c      | 15 ++++++++++++++-
 4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
index d367ba4d744a..b43ad4d5e333 100644
--- a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
+++ b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
@@ -88,8 +88,15 @@ and "fabric_cluster_id" in the directory.
 
 Attributes in each directory:
 
+``instance_id``
+	This attribute is used to get die indices in userspace mapped MMIO
+	blocks. Indices are local to a single TPMI partition. Needed for direct
+	TPMI register access.
+
 ``domain_id``
 	This attribute is used to get the power domain id of this instance.
+	Indices are unique in all TPMI partitions on a given CPU package. Can be
+	used to map compute dies to corresponding CPUs.
 
 ``die_id``
 	This attribute is used to get the Linux die id of this instance.
diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
index 25ab511ed8d2..3b554418a7a3 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
@@ -29,6 +29,13 @@ static ssize_t show_domain_id(struct kobject *kobj, struct kobj_attribute *attr,
 	return sysfs_emit(buf, "%u\n", data->domain_id);
 }
 
+static ssize_t show_instance_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+	struct uncore_data *data = container_of(attr, struct uncore_data, instance_id_kobj_attr);
+
+	return sysfs_emit(buf, "%u\n", data->instance_id);
+}
+
 static ssize_t show_fabric_cluster_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
 {
 	struct uncore_data *data = container_of(attr, struct uncore_data, fabric_cluster_id_kobj_attr);
@@ -200,6 +207,9 @@ static int create_attr_group(struct uncore_data *data, char *name)
 	if (data->domain_id != UNCORE_DOMAIN_ID_INVALID) {
 		init_attribute_root_ro(domain_id);
 		data->uncore_attrs[index++] = &data->domain_id_kobj_attr.attr;
+		init_attribute_root_ro(instance_id);
+		data->uncore_attrs[index++] = &data->instance_id_kobj_attr.attr;
+
 		init_attribute_root_ro(fabric_cluster_id);
 		data->uncore_attrs[index++] = &data->fabric_cluster_id_kobj_attr.attr;
 		init_attribute_root_ro(package_id);
diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
index 0d5fd91ee0aa..e319448dc1a4 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
@@ -36,6 +36,7 @@
  * @domain_id:		Power domain id for this instance
  * @cluster_id:		cluster id in a domain
  * @seqnum_id:		Unique sequential id to append to directory name
+ * @instance_id:	Die indices or feature instances for a single TPMI device
  * @name:		Sysfs entry name for this instance
  * @agent_type_mask:	Bit mask of all hardware agents for this domain
  * @uncore_attr_group:	Attribute group storage
@@ -56,6 +57,7 @@
  * @elc_floor_freq_khz_kobj_attr: Storage for kobject attribute elc_floor_freq_khz
  * @agent_types_kobj_attr: Storage for kobject attribute agent_type
  * @die_id_kobj_attr:	Attribute storage for die_id information
+ * @instance_id_kobj_attr: Attribute storage for instance_id value
  * @uncore_attrs:	Attribute storage for group creation
  *
  * This structure is used to encapsulate all data related to uncore sysfs
@@ -72,6 +74,7 @@ struct uncore_data {
 	int domain_id;
 	int cluster_id;
 	int seqnum_id;
+	int instance_id;
 	char name[32];
 	u16  agent_type_mask;
 
@@ -90,7 +93,8 @@ struct uncore_data {
 	struct kobj_attribute elc_floor_freq_khz_kobj_attr;
 	struct kobj_attribute agent_types_kobj_attr;
 	struct kobj_attribute die_id_kobj_attr;
-	struct attribute *uncore_attrs[15];
+	struct kobj_attribute instance_id_kobj_attr;
+	struct attribute *uncore_attrs[16];
 };
 
 #define UNCORE_DOMAIN_ID_INVALID	-1
diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c
index 1237d9570886..32d03bee09a0 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c
@@ -385,7 +385,19 @@ static u8 io_die_index_next;
 /* Lock to protect io_die_start, io_die_index_next */
 static DEFINE_MUTEX(domain_lock);
 
-static void set_domain_id(int id,  int num_resources,
+static inline void set_instance_id(int id, struct tpmi_uncore_cluster_info *cluster_info)
+{
+	/*
+	 * On non-partitioned systems domain_id can be used for mapping both
+	 * CPUs to compute die IDs and physical die indexes to MMIO mapped
+	 * memory. However on partitioned systems domain_id loses the second
+	 * association. Therefore instance_id should be used for that instead,
+	 * while domain_id should still be used to match CPUs to compute dies.
+	 */
+	cluster_info->uncore_data.instance_id = id;
+}
+
+static void set_domain_id(int id, int num_resources,
 			  struct oobmsm_plat_info *plat_info,
 			  struct tpmi_uncore_cluster_info *cluster_info)
 {
@@ -686,6 +698,7 @@ static int uncore_probe(struct auxiliary_device *auxdev, const struct auxiliary_
 			set_cdie_id(i, cluster_info, plat_info);
 
 			set_domain_id(i, num_resources, plat_info, cluster_info);
+			set_instance_id(i, cluster_info);
 
 			cluster_info->uncore_root = tpmi_uncore;
 
-- 
2.53.0




* [PATCH v3 0/2] platform/x86/intel-uncore-freq: Expose instance ID in the sysfs
From: Maciej Wieczor-Retman @ 2026-04-08 16:27 UTC (permalink / raw)
  To: skhan, ilpo.jarvinen, hansg, corbet, srinivas.pandruvada
  Cc: linux-kernel, platform-driver-x86, linux-doc, m.wieczorretman

--- Motivation

This patchset is about exporting instance ID, a value used to uniquely
identify MMIO blocks in TPMI devices. Userspace tools like "pepc" [1]
can use it for direct MMIO reads or writes.

Currently exported information allows doing this on non-partitioned
systems, but partitioned systems require additional steps to map MMIO
blocks.

[1] https://github.com/intel/pepc

--- Background

* TPMI MMIO organization
For each TPMI device a direct register access is possible through MMIO
mapped blocks, where:
- Each block belongs to a different power domain.
- Each power domain is exposed in sysfs via a domain_id attribute.
- Power domain scope is per-die (either IO dies or compute dies).
- Compute die blocks are ordered first, before IO die blocks in
  MMIO space.

* Domain ID mapping
For compute dies, the mapping is architectural through a CPUID leaf or
via MSR 0x54:
- Compute die IDs directly correspond to CPU die IDs
- CPU die ID can be obtained from MSR 0x54 or recent CPUID leaves
- Example: domain_id equal to 1 applies to all CPUs with die ID 1

* IO die mapping
For IO dies, the relationship is generation/platform specific. It's
generally not recommended to assume any specific IO organization but
uncore sysfs provides an attribute to differentiate die types.

* Partitioning
In partitioned systems multiple TPMI devices exist per package. However
CPUs are still enumerated package-wide and so die IDs (domain_id) are
unique per-package. For example a single partition (single TPMI device)
Granite Rapids might order its dies in the following way:

+---------------------+-----------+
| Die type and number | Domain ID |
+---------------------+-----------+
| Compute die 0       |         0 |
| Compute die 1       |         1 |
| IO die 0            |         2 |
| IO die 1            |         3 |
+---------------------+-----------+

While a two partition system may be numbered in this way:

+---------------------+-------------+-------------+
| Die type and number |         Domain ID         |
| local in single     +-------------+-------------+
| partition scope     | Partition 0 | Partition 1 |
+---------------------+-------------+-------------+
| Compute die 0       |           0 |           2 |
| Compute die 1       |           1 |           3 |
| IO die 0            |           4 |           6 |
| IO die 1            |           5 |           7 |
+---------------------+-------------+-------------+

The cd_mask value from the TPMI bus info register is a bitmap that shows
which compute dies belong to which partition.

* Instance ID
Partition ID is not an architectural value, meaning there is no CPUID or
MSR to map a CPU to a partition number. Therefore to allow mapping CPUs
to compute dies as well as mapping TPMI registers in MMIO mapped space
two numbers need to be exported:
- domain_id
	- Whether the system is partitioned or not it still allows
	  mapping CPUs to compute die IDs.
- instance_id
	- A per-partition (and hence per-device) physical index to still
	  allow mapping MMIO blocks to both compute and IO dies. On
	  partitioned systems mapping IO dies would be very difficult
	  since they are only indexed after all the compute dies are
	  numbered.

As one can see, on non-partitioned systems the instance ID and domain ID
have the same value. It's only on partitioned systems that both values
are needed to keep all mapping functionality. To better show the
relationship this is how values on a partitioned system can look:

+---------------------+-------------+-------------+-------------+-------------+
| Die type and number |         Domain ID         |        Instance ID        |
| local in single     +-------------+-------------+-------------+-------------+
| partition scope     | Partition 0 | Partition 1 | Partition 0 | Partition 1 |
+---------------------+-------------+-------------+-------------+-------------+
| Compute die 0       |           0 |           2 |           0 |           0 |
| Compute die 1       |           1 |           3 |           1 |           1 |
| IO die 0            |           4 |           6 |           2 |           2 |
| IO die 1            |           5 |           7 |           3 |           3 |
+---------------------+-------------+-------------+-------------+-------------+

Changes in v3:
- Remove sentence from the cover letter claiming that the motivation was
  to replace doing the same thing through MSRs - that was deprecated and
  it's not available.
- sprintf() -> sysfs_emit() in show_instance_id().
- static -> static inline in set_instance_id().
- Small correction to 2/2 patch message.

Maciej Wieczor-Retman (2):
  platform/x86/intel-uncore-freq: Rename instance_id
  platform/x86/intel-uncore-freq: Expose instance ID in the sysfs

 .../pm/intel_uncore_frequency_scaling.rst        |  7 +++++++
 .../uncore-frequency/uncore-frequency-common.c   | 16 +++++++++++++---
 .../uncore-frequency/uncore-frequency-common.h   |  8 ++++++--
 .../uncore-frequency/uncore-frequency-tpmi.c     | 15 ++++++++++++++-
 4 files changed, 40 insertions(+), 6 deletions(-)

-- 
2.53.0




* htmldocs: Warning: sound/soc/codecs/tas67524.c references a file that doesn't exist: Documentation/sound/codecs/tas675x.rst
From: kernel test robot @ 2026-04-08 16:13 UTC (permalink / raw)
  To: Sen Wang; +Cc: oe-kbuild-all, 0day robot, linux-doc

tree:   https://github.com/intel-lab-lkp/linux/commits/Sen-Wang/ASoC-dt-bindings-Add-ti-tas67524/20260408-141601
head:   6d18e62ff6aa71d56585dca8035437bc9218eb19
commit: 6e3145ebbb92b213c028232cad30d7d99d2ecdbd ASoC: codecs: Add TAS67524 quad-channel audio amplifier driver
date:   10 hours ago
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260408/202604081804.ImZjoifC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604081804.ImZjoifC-lkp@intel.com/

All warnings (new ones prefixed by >>):

   Warning: Documentation/translations/zh_CN/scsi/scsi_mid_low_api.rst references a file that doesn't exist: Documentation/Configure.help
   Warning: MAINTAINERS references a file that doesn't exist: Documentation/ABI/testing/sysfs-platform-ayaneo
   Warning: MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/display/bridge/megachips-stdpxxxx-ge-b850v3-fw.txt
   Warning: arch/powerpc/sysdev/mpic.c references a file that doesn't exist: Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
   Warning: rust/kernel/sync/atomic/ordering.rs references a file that doesn't exist: srctree/tools/memory-model/Documentation/explanation.txt
>> Warning: sound/soc/codecs/tas67524.c references a file that doesn't exist: Documentation/sound/codecs/tas675x.rst
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: Documentation/virtual/lguest/lguest.c
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,\b(\S*)(Documentation/[A-Za-z0-9
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: Documentation/devicetree/dt-object-internal.txt
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,^Documentation/scheduler/sched-pelt
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,(Documentation/translations/[

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 01/24] filelock: add support for ignoring deleg breaks for dir change events
From: Jeff Layton @ 2026-04-08 14:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Chuck Lever, Alexander Aring,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, NeilBrown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker,
	Amir Goldstein, Calum Mackay, linux-fsdevel, linux-kernel,
	linux-trace-kernel, linux-doc, linux-nfs
In-Reply-To: <snnggefctfffpb3rsyhjdwmxozqdklqmweiojmxy7owettksgz@6vud2iacgeqc>

On Wed, 2026-04-08 at 15:45 +0200, Jan Kara wrote:
> On Tue 07-04-26 09:21:14, Jeff Layton wrote:
> > If a NFS client requests a directory delegation with a notification
> > bitmask covering directory change events, the server shouldn't recall
> > the delegation. Instead the client will be notified of the change after
> > the fact.
> > 
> > Add support for ignoring lease breaks on directory changes. Add a new
> > flags parameter to try_break_deleg() and teach __break_lease how to
> > ignore certain types of delegation break events.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> Looks good. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> > @@ -222,6 +225,10 @@ struct file_lease *locks_alloc_lease(void);
> >  #define LEASE_BREAK_LAYOUT		BIT(2)	// break layouts only
> >  #define LEASE_BREAK_NONBLOCK		BIT(3)	// non-blocking break
> >  #define LEASE_BREAK_OPEN_RDONLY		BIT(4)	// readonly open event
> > +#define LEASE_BREAK_DIR_CREATE		BIT(6)  // dir deleg create event
> > +#define LEASE_BREAK_DIR_DELETE		BIT(7)  // dir deleg delete event
> > +#define LEASE_BREAK_DIR_RENAME		BIT(8)  // dir deleg rename event
> 
> Just curious why you've left out bit 5 here... :)
> 
> 								Honza

No reason. I've had this series for a couple of years now, and I think
bit 5 got removed at some point after I originally did this patch, and
I didn't notice when I fixed up the conflict. I'll plan to renumber
this for neatness' sake.

Thanks for the review!
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2] Documentation: gpio: update the preferred method for using software node lookup
From: Bartosz Golaszewski @ 2026-04-08 13:55 UTC (permalink / raw)
  To: Linus Walleij, Bartosz Golaszewski, Jonathan Corbet, Shuah Khan,
	Dmitry Torokhov, Bartosz Golaszewski
  Cc: linux-gpio, linux-doc, linux-kernel
In-Reply-To: <20260403-doc-gpio-swnodes-v2-1-c705f5897b80@oss.qualcomm.com>


On Fri, 03 Apr 2026 15:04:55 +0200, Bartosz Golaszewski wrote:
> In its current version, the manual for converting board files from
> using GPIO lookup tables to software nodes recommends leaving the
> software nodes representing GPIO controllers as "free-floating", not
> attached objects and relying on the matching of their names against the
> GPIO controller's name. This is an abuse of the software node API and
> makes it impossible to create fw_devlinks between GPIO suppliers and
> consumers in this case. We want to remove this behavior from GPIOLIB and
> to this end, work on converting all existing drivers to using "attached"
> software nodes.
> 
> [...]

Applied, thanks!

[1/1] Documentation: gpio: update the preferred method for using software node lookup
      https://git.kernel.org/brgl/c/d129779da5e3f8878e105fb3ca8519d9ff759a91

Best regards,
-- 
Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

^ permalink raw reply

* Re: [PATCH 00/24] vfs/nfsd: add support for CB_NOTIFY callbacks in directory delegations
From: Jan Kara @ 2026-04-08 13:55 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
	Alexander Aring, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, Amir Goldstein, Calum Mackay, linux-fsdevel,
	linux-kernel, linux-trace-kernel, linux-doc, linux-nfs
In-Reply-To: <20260407-dir-deleg-v1-0-aaf68c478abd@kernel.org>

On Tue 07-04-26 09:21:13, Jeff Layton wrote:
> This patchset builds on the directory delegation work we did a few
> months ago, to add support for CB_NOTIFY callbacks for some events. In
> particular, creates, unlinks and renames. The server also sends updated
> directory attributes in the notifications. With this support, the client
> can register interest in a directory and get notifications about changes
> within it without losing its lease.
> 
> The series starts with patches to allow the vfs to ignore certain types
> of events on directories. nfsd can then request these sorts of
> delegations on directories, and then set up inotify watches on the
> directory to trigger sending CB_NOTIFY events.
> 
> This has mainly been tested with pynfs, with some new testcases that
> I'll be posting soon. They seem to work fine with those tests, but I
> don't think we'll want to merge these until we have a complete
> client-side implementation to test against.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

The fsnotify changes and generic file locking changes look OK to me. I
don't feel confident enough with NFSD stuff to really review that :)

								Honza

> ---
> Jeff Layton (24):
>       filelock: add support for ignoring deleg breaks for dir change events
>       filelock: add a tracepoint to start of break_lease()
>       filelock: add an inode_lease_ignore_mask helper
>       nfsd: add protocol support for CB_NOTIFY
>       nfs_common: add new NOTIFY4_* flags proposed in RFC8881bis
>       nfsd: allow nfsd to get a dir lease with an ignore mask
>       vfs: add fsnotify_modify_mark_mask()
>       nfsd: update the fsnotify mark when setting or removing a dir delegation
>       nfsd: make nfsd4_callback_ops->prepare operation bool return
>       nfsd: add callback encoding and decoding linkages for CB_NOTIFY
>       nfsd: use RCU to protect fi_deleg_file
>       nfsd: add data structures for handling CB_NOTIFY
>       nfsd: add notification handlers for dir events
>       nfsd: add tracepoint to dir_event handler
>       nfsd: apply the notify mask to the delegation when requested
>       nfsd: add helper to marshal a fattr4 from completed args
>       nfsd: allow nfsd4_encode_fattr4_change() to work with no export
>       nfsd: send basic file attributes in CB_NOTIFY
>       nfsd: allow encoding a filehandle into fattr4 without a svc_fh
>       nfsd: add a fi_connectable flag to struct nfs4_file
>       nfsd: add the filehandle to returned attributes in CB_NOTIFY
>       nfsd: properly track requested child attributes
>       nfsd: track requested dir attributes
>       nfsd: add support to CB_NOTIFY for dir attribute changes
> 
>  Documentation/sunrpc/xdr/nfs4_1.x    | 264 ++++++++++++++-
>  fs/attr.c                            |   2 +-
>  fs/locks.c                           |  89 +++++-
>  fs/namei.c                           |  31 +-
>  fs/nfsd/filecache.c                  |  57 +++-
>  fs/nfsd/nfs4callback.c               |  60 +++-
>  fs/nfsd/nfs4layouts.c                |   5 +-
>  fs/nfsd/nfs4proc.c                   |  15 +
>  fs/nfsd/nfs4state.c                  | 524 ++++++++++++++++++++++++++----
>  fs/nfsd/nfs4xdr.c                    | 300 ++++++++++++++---
>  fs/nfsd/nfs4xdr_gen.c                | 601 ++++++++++++++++++++++++++++++++++-
>  fs/nfsd/nfs4xdr_gen.h                |  20 +-
>  fs/nfsd/state.h                      |  70 +++-
>  fs/nfsd/trace.h                      |  21 ++
>  fs/nfsd/xdr4.h                       |   5 +
>  fs/nfsd/xdr4cb.h                     |  12 +
>  fs/notify/mark.c                     |  29 ++
>  fs/posix_acl.c                       |   4 +-
>  fs/xattr.c                           |   4 +-
>  include/linux/filelock.h             |  54 +++-
>  include/linux/fsnotify_backend.h     |   1 +
>  include/linux/nfs4.h                 | 127 --------
>  include/linux/sunrpc/xdrgen/nfs4_1.h | 291 ++++++++++++++++-
>  include/trace/events/filelock.h      |  38 ++-
>  include/uapi/linux/nfs4.h            |   2 -
>  25 files changed, 2321 insertions(+), 305 deletions(-)
> ---
> base-commit: bd5b9fd5e3d55bc412cec4bebe5a11da2151de4a
> change-id: 20260325-dir-deleg-339066dd1017
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: (sashiko review) [PATCH v6 1/1] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
From: SeongJae Park @ 2026-04-08 13:54 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
In-Reply-To: <CALa+Y17YnrOe=UXWBMKJ1U6seKJuauDqAdTDYo1cCYnrP_vSFg@mail.gmail.com>

On Tue, 7 Apr 2026 19:33:43 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> On Tue, Apr 7, 2026 at 9:05 AM SeongJae Park <sj@kernel.org> wrote:
[...]
> Yes SJ. I think we can make it work with single goal now that the
> below commit is part of mainline. will give it a try and post an
> update.

Sounds good, please don't hesitate to ask any questions.


Thanks,
SJ

[...]

^ permalink raw reply

* Re: [PATCH 08/24] nfsd: update the fsnotify mark when setting or removing a dir delegation
From: Jan Kara @ 2026-04-08 13:53 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
	Alexander Aring, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, Amir Goldstein, Calum Mackay, linux-fsdevel,
	linux-kernel, linux-trace-kernel, linux-doc, linux-nfs
In-Reply-To: <20260407-dir-deleg-v1-8-aaf68c478abd@kernel.org>

On Tue 07-04-26 09:21:21, Jeff Layton wrote:
> Add a new helper function that will update the mask on the nfsd_file's
> fsnotify_mark to be a union of all current directory delegations on an
> inode. Call that when directory delegations are added or removed.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/nfsd/nfs4state.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c8fb84c38637..9a4cff08c67d 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1258,6 +1258,37 @@ static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct f
>  	}
>  }
>  
> +static void nfsd_fsnotify_recalc_mask(struct nfsd_file *nf)
> +{
> +	struct fsnotify_mark *mark = &nf->nf_mark->nfm_mark;
> +	struct inode *inode = file_inode(nf->nf_file);
> +	u32 lease_mask, set = 0, clear = 0;
> +
> +	/* This is only needed when adding or removing dir delegs */
> +	if (!S_ISDIR(inode->i_mode))
> +		return;
> +
> +	/* Set up notifications for any ignored delegation events */
> +	lease_mask = inode_lease_ignore_mask(inode);
> +
> +	if (lease_mask & FL_IGN_DIR_CREATE)
> +		set |= FS_CREATE;
> +	else
> +		clear |= FS_CREATE;
> +
> +	if (lease_mask & FL_IGN_DIR_DELETE)
> +		set |= FS_DELETE;
> +	else
> +		clear |= FS_DELETE;
> +
> +	if (lease_mask & FL_IGN_DIR_RENAME)
> +		set |= FS_RENAME;
> +	else
> +		clear |= FS_RENAME;
> +
> +	fsnotify_modify_mark_mask(mark, set, clear);
> +}
> +
>  static void nfs4_unlock_deleg_lease(struct nfs4_delegation *dp)
>  {
>  	struct nfs4_file *fp = dp->dl_stid.sc_file;
> @@ -1266,6 +1297,7 @@ static void nfs4_unlock_deleg_lease(struct nfs4_delegation *dp)
>  	WARN_ON_ONCE(!fp->fi_delegees);
>  
>  	nfsd4_finalize_deleg_timestamps(dp, nf->nf_file);
> +	nfsd_fsnotify_recalc_mask(nf);
>  	kernel_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
>  	put_deleg_file(fp);
>  }
> @@ -9652,6 +9684,7 @@ nfsd_get_dir_deleg(struct nfsd4_compound_state *cstate,
>  
>  	if (!status) {
>  		put_nfs4_file(fp);
> +		nfsd_fsnotify_recalc_mask(nf);
>  		return dp;
>  	}
>  
> 
> -- 
> 2.53.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply
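
[Editorial sketch, not part of the thread.] A user-space model of the
set/clear computation in the quoted nfsd_fsnotify_recalc_mask(), for
readers who want to experiment with it outside the kernel. The flag values
below are placeholders, and union_ignore_mask(), recalc_mask() and
apply_update() are hypothetical helpers. In particular, modelling
inode_lease_ignore_mask() as a plain OR over per-lease masks and
fsnotify_modify_mark_mask() as "set bits, then clear bits" are assumptions
based only on the commit message and the function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values: the real FL_IGN_DIR_* flags are not shown here. */
#define FL_IGN_DIR_CREATE	(1u << 0)
#define FL_IGN_DIR_DELETE	(1u << 1)
#define FL_IGN_DIR_RENAME	(1u << 2)

/* Stand-ins for the fsnotify event bits; the values are illustrative. */
#define FS_CREATE		(1u << 8)
#define FS_DELETE		(1u << 9)
#define FS_RENAME		(1u << 28)

struct mask_update {
	uint32_t set;	/* event bits to add to the mark */
	uint32_t clear;	/* event bits to drop from the mark */
};

/* Assumed model: the inode's ignore mask is the OR of all lease masks. */
static uint32_t union_ignore_mask(const uint32_t *leases, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= leases[i];
	return mask;
}

/* Same decision as the patch: each ignore flag maps to one event bit. */
static struct mask_update recalc_mask(uint32_t lease_mask)
{
	static const struct { uint32_t ign, ev; } map[] = {
		{ FL_IGN_DIR_CREATE, FS_CREATE },
		{ FL_IGN_DIR_DELETE, FS_DELETE },
		{ FL_IGN_DIR_RENAME, FS_RENAME },
	};
	struct mask_update u = { 0, 0 };

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (lease_mask & map[i].ign)
			u.set |= map[i].ev;
		else
			u.clear |= map[i].ev;
	}
	return u;
}

/* Assumed semantics of applying the update to an existing mark mask. */
static uint32_t apply_update(uint32_t mark_mask, struct mask_update u)
{
	return (mark_mask | u.set) & ~u.clear;
}
```

In this model, a lease that is still in the array when union_ignore_mask()
runs contributes its bits to the result, so whether the recalculation
happens before or after a lease is dropped changes the outcome.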
