linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table
@ 2025-05-07 21:43 shiju.jose
  2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: shiju.jose @ 2025-05-07 21:43 UTC (permalink / raw)
  To: linux-edac, linux-acpi, linux-doc
  Cc: bp, rafael, tony.luck, lenb, leo.duran, Yazen.Ghannam, mchehab,
	jonathan.cameron, linux-mm, linuxarm, rientjes, jiaqiyan,
	Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang, shiju.jose

From: Shiju Jose <shiju.jose@huawei.com>

Add support for ACPI RAS2 feature table (RAS2) defined in the
ACPI 6.5 specification, section 5.2.21 and RAS2 HW based memory
scrubbing feature.

ACPI RAS2 patches were part of the EDAC series [1].

The code is based on ras.git: edac-for-next branch [2]
merged with linux-pm.git [3] : linux-next branch.

1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
2. https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-for-next
3. https://web.git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/

Changes
=======
v4 -> v5:
1. Fix for the build warnings reported by kernel test robot.
   https://patchwork.kernel.org/project/linux-edac/patch/20250423163511.1412-3-shiju.jose@huawei.com/
2. Removed patch "ACPI: ACPI 6.5: RAS2: Rename RAS2 table structure and field names"
   from the series as the ACPICA patch was merged to linux-pm.git : branch linux-next
3. Rebased to ras.git: edac-for-next branch merged with linux-pm.git : linux-next branch.
      
v3 -> v4:
1.  Changes for feedbacks from Yazen on v3.
    https://lore.kernel.org/all/20250415210504.GA854098@yaz-khff2.amd.com/

v2 -> v3:
1. Rename RAS2 table structure and field names in 
   include/acpi/actbl2.h limited to only necessary
   for RAS2 scrub feature. Not for merging.
   Corresponding changes are merged in the acpica:
   https://github.com/acpica/acpica/commit/2c8a38f747de9d977491a494faf0dfaf799b777b
2. Changes for feedbacks from Jonathan on v2.
3. Daniel reported a known behaviour: when readback 'size' attribute after
   setting in, returns 0 before starting scrubbing via 'addr' attribute.
   Changes added to fix this.
4. Daniel reported that firmware cannot update status of demand scrubbing
   via the 'Actual Address Range (OUTPUT)', thus add workaround in the
   kernel to update sysfs 'addr' attribute with the status of demand
   scrubbing.
5. Optimized logic in ras2_check_pcc_chan() function
   (patch - ACPI:RAS2: Add ACPI RAS2 driver).
6. Add PCC channel lock to struct ras2_pcc_subspace and change
   lock in ras2_mem_ctx as a pointer to pcc channel lock to make sure
   writing to PCC subspace shared memory is protected from race conditions.
   
v1 -> v2:
1.  Changes for feedbacks from Borislav.
    - Shorten ACPI RAS2 structures and variables names.
    - Shorten some of the other variables in the RAS2 drivers.
    - Fixed few CamelCases.

2.  Changes for feedbacks from Yazen.
    - Added newline after number of '}' and return statements.
    - Changed return type for "ras2_add_aux_device() to 'int'.
    - Deleted a duplication of acpi_get_table("RAS2",...) in the ras2_acpi_parse_table().
    - Add "FW_WARN" to few error logs in the ras2_acpi_parse_table().
    - Rename ras2_acpi_init() to acpi_ras2_init() and modified to call acpi_ras2_init()
      function from the acpi_init().
    - Moved scrub related variables from the struct ras2_mem_ctx from  patch
      "ACPI:RAS2: Add ACPI RAS2 driver" to "ras: mem: Add memory ACPI RAS2 driver".  

Shiju Jose (2):
  ACPI:RAS2: Add ACPI RAS2 driver
  ras: mem: Add memory ACPI RAS2 driver

 Documentation/edac/scrub.rst |  76 ++++++
 drivers/acpi/Kconfig         |  11 +
 drivers/acpi/Makefile        |   1 +
 drivers/acpi/bus.c           |   3 +
 drivers/acpi/ras2.c          | 451 +++++++++++++++++++++++++++++++++++
 drivers/ras/Kconfig          |  11 +
 drivers/ras/Makefile         |   1 +
 drivers/ras/acpi_ras2.c      | 406 +++++++++++++++++++++++++++++++
 include/acpi/ras2.h          |  70 ++++++
 9 files changed, 1030 insertions(+)
 create mode 100644 drivers/acpi/ras2.c
 create mode 100644 drivers/ras/acpi_ras2.c
 create mode 100644 include/acpi/ras2.h

-- 
2.43.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
  2025-05-07 21:43 [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
@ 2025-05-07 21:43 ` shiju.jose
  2025-05-14  2:55   ` Daniel Ferguson
  2025-05-07 21:43 ` [PATCH v5 2/2] ras: mem: Add memory " shiju.jose
  2025-05-12  8:16 ` [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
  2 siblings, 1 reply; 9+ messages in thread
From: shiju.jose @ 2025-05-07 21:43 UTC (permalink / raw)
  To: linux-edac, linux-acpi, linux-doc
  Cc: bp, rafael, tony.luck, lenb, leo.duran, Yazen.Ghannam, mchehab,
	jonathan.cameron, linux-mm, linuxarm, rientjes, jiaqiyan,
	Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang, shiju.jose

From: Shiju Jose <shiju.jose@huawei.com>

Add support for ACPI RAS2 feature table (RAS2) defined in the
ACPI 6.5 Specification, section 5.2.21.
Driver defines RAS2 Init, which extracts the RAS2 table and driver
adds auxiliary device for each memory feature which binds to the
RAS2 memory driver.

Driver uses PCC mailbox to communicate with the ACPI HW and the
driver adds OSPM interfaces to send RAS2 commands.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/Kconfig  |  11 ++
 drivers/acpi/Makefile |   1 +
 drivers/acpi/bus.c    |   3 +
 drivers/acpi/ras2.c   | 451 ++++++++++++++++++++++++++++++++++++++++++
 include/acpi/ras2.h   |  54 +++++
 5 files changed, 520 insertions(+)
 create mode 100644 drivers/acpi/ras2.c
 create mode 100644 include/acpi/ras2.h

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 7f10aa38269d..7b470cf2fd71 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -293,6 +293,17 @@ config ACPI_CPPC_LIB
 	  If your platform does not support CPPC in firmware,
 	  leave this option disabled.
 
+config ACPI_RAS2
+	bool "ACPI RAS2 driver"
+	select AUXILIARY_BUS
+	select MAILBOX
+	select PCC
+	help
+	  The driver adds support for ACPI RAS2 feature table(extracts RAS2
+	  table from OS system table) and OSPM interfaces to send RAS2
+	  commands via PCC mailbox subspace. Driver adds platform device for
+	  the RAS2 memory features which binds to the RAS2 memory driver.
+
 config ACPI_PROCESSOR
 	tristate "Processor"
 	depends on X86 || ARM64 || LOONGARCH || RISCV
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 797070fc9a3f..da2cc89c7f9a 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -104,6 +104,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
 obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
+obj-$(CONFIG_ACPI_RAS2)		+= ras2.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
 obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 obj-$(CONFIG_ACPI_PFRUT)	+= pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 058910af82bc..d45a2bb066dd 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -31,6 +31,7 @@
 #include <acpi/apei.h>
 #include <linux/suspend.h>
 #include <linux/prmt.h>
+#include <acpi/ras2.h>
 
 #include "internal.h"
 
@@ -1472,6 +1473,8 @@ static int __init acpi_init(void)
 	acpi_debugger_init();
 	acpi_setup_sb_notify_handler();
 	acpi_viot_init();
+	acpi_ras2_init();
+
 	return 0;
 }
 
diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
new file mode 100644
index 000000000000..bc5caf2566be
--- /dev/null
+++ b/drivers/acpi/ras2.c
@@ -0,0 +1,451 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Implementation of ACPI RAS2 driver.
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21
+ *
+ * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and
+ * get the PCC channel subspace for communicating with the ACPI compliant
+ * HW platform which supports ACPI RAS2. Driver adds auxiliary devices
+ * for each RAS2 memory feature which binds to the memory ACPI RAS2 driver.
+ */
+
+#define pr_fmt(fmt) "ACPI RAS2: " fmt
+
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/iopoll.h>
+#include <linux/ktime.h>
+#include <acpi/pcc.h>
+#include <acpi/ras2.h>
+
+static struct acpi_table_ras2 *__read_mostly ras2_tab;
+
+/**
+ * struct ras2_pcc_subspace - Data structure for PCC communication
+ * @mbox_client:	struct mbox_client object
+ * @pcc_chan:		Pointer to struct pcc_mbox_chan
+ * @comm_addr:		Pointer to RAS2 PCC shared memory region
+ * @elem:		List for registered RAS2 PCC channel subspaces
+ * @pcc_lock:		PCC lock to provide mutually exclusive access
+ *			to PCC channel subspace
+ * @deadline_us:	Poll PCC status register timeout in micro secs
+ *			for PCC command complete
+ * @pcc_mpar:		Maximum Periodic Access Rate(MPAR) for PCC channel
+ * @pcc_mrtt:		Minimum Request Turnaround Time(MRTT) in micro secs
+ *			OS must wait after completion of a PCC command before
+ *			issue next command
+ * @last_cmd_cmpl_time:	completion time of last PCC command
+ * @last_mpar_reset:	Time of last MPAR count reset.
+ * @mpar_count:		MPAR count
+ * @pcc_id:		Identifier of the RAS2 platform communication channel
+ * @pcc_chnl_acq:	Status of PCC channel acquired.
+ * @kref:		kref object
+ */
+struct ras2_pcc_subspace {
+	struct mbox_client		mbox_client;
+	struct pcc_mbox_chan		*pcc_chan;
+	struct acpi_ras2_shmem __iomem	*comm_addr;
+	struct list_head		elem;
+	struct mutex			pcc_lock;
+	unsigned int			deadline_us;
+	unsigned int			pcc_mpar;
+	unsigned int			pcc_mrtt;
+	ktime_t				last_cmd_cmpl_time;
+	ktime_t				last_mpar_reset;
+	int				mpar_count;
+	int				pcc_id;
+	bool				pcc_chnl_acq;
+	struct kref			kref;
+};
+
+/*
+ * Arbitrary retries for PCC commands because the remote processor
+ * could be much slower to reply. Keeping it high enough to cover
+ * emulators where the processors run painfully slow.
+ */
+#define RAS2_NUM_RETRIES 600ULL
+
+#define RAS2_FEAT_TYPE_MEMORY 0x00
+
+/* Static variables for the RAS2 PCC subspaces */
+static DEFINE_MUTEX(ras2_pcc_list_lock);
+static LIST_HEAD(ras2_pcc_subspaces);
+
+static int ras2_report_cap_error(u32 cap_status)
+{
+	switch (cap_status) {
+	case ACPI_RAS2_NOT_VALID:
+	case ACPI_RAS2_NOT_SUPPORTED:
+		return -EPERM;
+	case ACPI_RAS2_BUSY:
+		return -EBUSY;
+	case ACPI_RAS2_FAILED:
+	case ACPI_RAS2_ABORTED:
+	case ACPI_RAS2_INVALID_DATA:
+		return -EINVAL;
+	default: /* 0 or other, Success */
+		return 0;
+	}
+}
+
+static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
+{
+	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr;
+	u32 cap_status;
+	u16 status;
+	u32 rc;
+
+	/*
+	 * As per ACPI spec, the PCC space will be initialized by
+	 * platform and should have set the command completion bit when
+	 * PCC can be used by OSPM.
+	 *
+	 * Poll PCC status register every 3us(delay_us) for maximum of
+	 * deadline_us(timeout_us) until PCC command complete bit is set(cond).
+	 */
+	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
+					status & PCC_STATUS_CMD_COMPLETE, 3,
+					pcc_subspace->deadline_us);
+	if (rc) {
+		pr_warn("PCC check channel failed for : %d rc=%d\n",
+			pcc_subspace->pcc_id, rc);
+		return rc;
+	}
+
+	if (status & PCC_STATUS_ERROR) {
+		cap_status = readw_relaxed(&gen_comm_base->set_caps_status);
+		rc = ras2_report_cap_error(cap_status);
+
+		status &= ~PCC_STATUS_ERROR;
+		writew_relaxed(status, &gen_comm_base->status);
+		return rc;
+	}
+
+	if (status & PCC_STATUS_CMD_COMPLETE)
+		return 0;
+
+	return -EIO;
+}
+
+/**
+ * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel
+ * @ras2_ctx:	pointer to the RAS2 context structure
+ * @cmd:	command to send
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd)
+{
+	struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr;
+	struct mbox_chan *pcc_channel;
+	unsigned int time_delta;
+	int rc;
+
+	rc = ras2_check_pcc_chan(pcc_subspace);
+	if (rc < 0)
+		return rc;
+
+	pcc_channel = pcc_subspace->pcc_chan->mchan;
+
+	/*
+	 * Handle the Minimum Request Turnaround Time(MRTT).
+	 * "The minimum amount of time that OSPM must wait after the completion
+	 * of a command before issuing the next command, in microseconds."
+	 */
+	if (pcc_subspace->pcc_mrtt) {
+		time_delta = ktime_us_delta(ktime_get(),
+					    pcc_subspace->last_cmd_cmpl_time);
+		if (pcc_subspace->pcc_mrtt > time_delta)
+			udelay(pcc_subspace->pcc_mrtt - time_delta);
+	}
+
+	/*
+	 * Handle the non-zero Maximum Periodic Access Rate(MPAR).
+	 * "The maximum number of periodic requests that the subspace channel can
+	 * support, reported in commands per minute. 0 indicates no limitation."
+	 *
+	 * This parameter should be ideally zero or large enough so that it can
+	 * handle maximum number of requests that all the cores in the system can
+	 * collectively generate. If it is not, we will follow the spec and just
+	 * not send the request to the platform after hitting the MPAR limit in
+	 * any 60s window.
+	 */
+	if (pcc_subspace->pcc_mpar) {
+		if (pcc_subspace->mpar_count == 0) {
+			time_delta = ktime_ms_delta(ktime_get(),
+						    pcc_subspace->last_mpar_reset);
+			if (time_delta < 60 * MSEC_PER_SEC) {
+				dev_dbg(ras2_ctx->dev,
+					"PCC cmd not sent due to MPAR limit");
+				return -EIO;
+			}
+			pcc_subspace->last_mpar_reset = ktime_get();
+			pcc_subspace->mpar_count = pcc_subspace->pcc_mpar;
+		}
+		pcc_subspace->mpar_count--;
+	}
+
+	/* Write to the shared comm region */
+	writew_relaxed(cmd, &gen_comm_base->command);
+
+	/* Flip CMD COMPLETE bit */
+	writew_relaxed(0, &gen_comm_base->status);
+
+	/* Ring doorbell */
+	rc = mbox_send_message(pcc_channel, &cmd);
+	if (rc < 0) {
+		dev_warn(ras2_ctx->dev,
+			 "Err sending PCC mbox message. cmd:%d, rc:%d\n", cmd, rc);
+		return rc;
+	}
+
+	/*
+	 * If Minimum Request Turnaround Time is non-zero, we need
+	 * to record the completion time of both READ and WRITE
+	 * command for proper handling of MRTT, so we need to check
+	 * for pcc_mrtt in addition to CMD_READ.
+	 */
+	if (cmd == PCC_CMD_EXEC_RAS2 || pcc_subspace->pcc_mrtt) {
+		rc = ras2_check_pcc_chan(pcc_subspace);
+		if (pcc_subspace->pcc_mrtt)
+			pcc_subspace->last_cmd_cmpl_time = ktime_get();
+	}
+
+	if (pcc_channel->mbox->txdone_irq)
+		mbox_chan_txdone(pcc_channel, rc);
+	else
+		mbox_client_txdone(pcc_channel, rc);
+
+	return rc >= 0 ? 0 : rc;
+}
+EXPORT_SYMBOL_GPL(ras2_send_pcc_cmd);
+
+static void ras2_list_pcc_release(struct kref *kref)
+{
+	struct ras2_pcc_subspace *pcc_subspace =
+		container_of(kref, struct ras2_pcc_subspace, kref);
+
+	guard(mutex)(&ras2_pcc_list_lock);
+	list_del(&pcc_subspace->elem);
+	pcc_mbox_free_channel(pcc_subspace->pcc_chan);
+	kfree(pcc_subspace);
+}
+
+static void ras2_pcc_get(struct ras2_pcc_subspace *pcc_subspace)
+{
+	kref_get(&pcc_subspace->kref);
+}
+
+static void ras2_pcc_put(struct ras2_pcc_subspace *pcc_subspace)
+{
+	kref_put(&pcc_subspace->kref,  &ras2_list_pcc_release);
+}
+
+static struct ras2_pcc_subspace *ras2_get_pcc_subspace(int pcc_id)
+{
+	struct ras2_pcc_subspace *pcc_subspace;
+
+	mutex_lock(&ras2_pcc_list_lock);
+	list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
+		if (pcc_subspace->pcc_id != pcc_id)
+			continue;
+		ras2_pcc_get(pcc_subspace);
+		mutex_unlock(&ras2_pcc_list_lock);
+		return pcc_subspace;
+	}
+	mutex_unlock(&ras2_pcc_list_lock);
+
+	return NULL;
+}
+
+static int ras2_register_pcc_channel(struct ras2_mem_ctx *ras2_ctx, int pcc_id)
+{
+	struct ras2_pcc_subspace *pcc_subspace;
+	struct pcc_mbox_chan *pcc_chan;
+	struct mbox_client *mbox_cl;
+
+	if (pcc_id < 0)
+		return -EINVAL;
+
+	pcc_subspace = ras2_get_pcc_subspace(pcc_id);
+	if (pcc_subspace) {
+		ras2_ctx->pcc_subspace	= pcc_subspace;
+		ras2_ctx->comm_addr	= pcc_subspace->comm_addr;
+		ras2_ctx->dev		= pcc_subspace->pcc_chan->mchan->mbox->dev;
+		ras2_ctx->pcc_lock	= &pcc_subspace->pcc_lock;
+		return 0;
+	}
+
+	pcc_subspace = kzalloc(sizeof(*pcc_subspace), GFP_KERNEL);
+	if (!pcc_subspace)
+		return -ENOMEM;
+
+	mbox_cl			= &pcc_subspace->mbox_client;
+	mbox_cl->knows_txdone	= true;
+
+	pcc_chan = pcc_mbox_request_channel(mbox_cl, pcc_id);
+	if (IS_ERR(pcc_chan)) {
+		kfree(pcc_subspace);
+		return PTR_ERR(pcc_chan);
+	}
+
+	pcc_subspace->pcc_id		= pcc_id;
+	pcc_subspace->pcc_chan		= pcc_chan;
+	pcc_subspace->comm_addr		= acpi_os_ioremap(pcc_chan->shmem_base_addr,
+							  pcc_chan->shmem_size);
+	pcc_subspace->deadline_us	= RAS2_NUM_RETRIES * pcc_chan->latency;
+	pcc_subspace->pcc_mrtt		= pcc_chan->min_turnaround_time;
+	pcc_subspace->pcc_mpar		= pcc_chan->max_access_rate;
+	pcc_subspace->mbox_client.knows_txdone	= true;
+	pcc_subspace->pcc_chnl_acq	= true;
+
+	kref_init(&pcc_subspace->kref);
+
+	mutex_lock(&ras2_pcc_list_lock);
+	list_add(&pcc_subspace->elem, &ras2_pcc_subspaces);
+	ras2_pcc_get(pcc_subspace);
+	mutex_unlock(&ras2_pcc_list_lock);
+
+	ras2_ctx->pcc_subspace	= pcc_subspace;
+	ras2_ctx->comm_addr	= pcc_subspace->comm_addr;
+	ras2_ctx->dev		= pcc_chan->mchan->mbox->dev;
+
+	mutex_init(&pcc_subspace->pcc_lock);
+	ras2_ctx->pcc_lock	= &pcc_subspace->pcc_lock;
+
+	return 0;
+}
+
+static DEFINE_IDA(ras2_ida);
+static void ras2_release(struct device *device)
+{
+	struct auxiliary_device *auxdev = container_of(device, struct auxiliary_device, dev);
+	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+
+	ida_free(&ras2_ida, auxdev->id);
+	ras2_pcc_put(ras2_ctx->pcc_subspace);
+	kfree(ras2_ctx);
+}
+
+static int ras2_add_aux_device(char *name, int channel)
+{
+	struct ras2_mem_ctx *ras2_ctx;
+	int id, rc;
+
+	ras2_ctx = kzalloc(sizeof(*ras2_ctx), GFP_KERNEL);
+	if (!ras2_ctx)
+		return -ENOMEM;
+
+	rc = ras2_register_pcc_channel(ras2_ctx, channel);
+	if (rc < 0) {
+		pr_debug("failed to register pcc channel rc=%d\n", rc);
+		goto ctx_free;
+	}
+
+	id = ida_alloc(&ras2_ida, GFP_KERNEL);
+	if (id < 0) {
+		rc = id;
+		goto pcc_free;
+	}
+
+	ras2_ctx->id			= id;
+	ras2_ctx->adev.id		= id;
+	ras2_ctx->adev.name		= RAS2_MEM_DEV_ID_NAME;
+	ras2_ctx->adev.dev.release	= ras2_release;
+	ras2_ctx->adev.dev.parent	= ras2_ctx->dev;
+
+	rc = auxiliary_device_init(&ras2_ctx->adev);
+	if (rc)
+		goto ida_free;
+
+	rc = auxiliary_device_add(&ras2_ctx->adev);
+	if (rc) {
+		auxiliary_device_uninit(&ras2_ctx->adev);
+		return rc;
+	}
+
+	return 0;
+
+ida_free:
+	ida_free(&ras2_ida, id);
+pcc_free:
+	ras2_pcc_put(ras2_ctx->pcc_subspace);
+ctx_free:
+	kfree(ras2_ctx);
+
+	return rc;
+}
+
+static int acpi_ras2_parse(void)
+{
+	struct acpi_ras2_pcc_desc *pcc_desc_list;
+	u16 i, count;
+	int pcc_id;
+	int rc;
+
+	if (ras2_tab->header.length < sizeof(*ras2_tab)) {
+		pr_warn(FW_WARN "ACPI RAS2 table present but broken (too short #1)\n");
+		return -EINVAL;
+	}
+
+	if (!ras2_tab->num_pcc_descs) {
+		pr_warn(FW_WARN "No PCC descs in ACPI RAS2 table\n");
+		return -EINVAL;
+	}
+
+	pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
+	/* Double scan for the case of only one actual controller */
+	pcc_id = -1;
+	for (i = 0, count = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
+		if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
+			continue;
+		if (pcc_id == -1) {
+			pcc_id = pcc_desc_list->channel_id;
+			count++;
+		}
+		if (pcc_desc_list->channel_id != pcc_id)
+			count++;
+	}
+
+	/*
+	 * Workaround for the client platform with multiple scrub devices
+	 * but uses single PCC subspace for communication.
+	 */
+	if (count == 1) {
+		/* Add auxiliary device and bind ACPI RAS2 memory driver */
+		return ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_id);
+	}
+
+	pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
+	for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
+		if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
+			continue;
+
+		rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, pcc_desc_list->channel_id);
+		if (rc)
+			pr_warn("Failed to add RAS2 auxiliary device rc=%d\n", rc);
+	}
+
+	return 0;
+}
+
+void __init acpi_ras2_init(void)
+{
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_RAS2, 0,
+				(struct acpi_table_header **)&ras2_tab);
+	if (ACPI_FAILURE(status)) {
+		pr_err("Failed to get table, %s\n", acpi_format_exception(status));
+		return;
+	}
+
+	if (acpi_ras2_parse())
+		pr_err("Failed to parse RAS2 table\n");
+
+	acpi_put_table((struct acpi_table_header *)ras2_tab);
+}
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
new file mode 100644
index 000000000000..f8937069cd4a
--- /dev/null
+++ b/include/acpi/ras2.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ACPI RAS2 driver header file
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited
+ */
+
+#ifndef _ACPI_RAS2_H
+#define _ACPI_RAS2_H
+
+#include <linux/acpi.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct device;
+
+/*
+ * ACPI spec 6.5 Table 5.82: PCC command codes used by
+ * RAS2 platform communication channel.
+ */
+#define PCC_CMD_EXEC_RAS2 0x01
+
+#define RAS2_AUX_DEV_NAME "ras2"
+#define RAS2_MEM_DEV_ID_NAME "acpi_ras2_mem"
+
+/**
+ * struct ras2_mem_ctx - Context for RAS2 memory features
+ * @id:			Unique identifier for the RAS2 feature
+ * @adev:		Auxiliary device object
+ * @comm_addr:		Pointer to RAS2 PCC shared memory region
+ * @dev:		Pointer to device backing struct mbox_controller for PCC
+ * @pcc_subspace:	Pointer to local data structure for PCC communication
+ * @pcc_lock:		Pointer to PCC lock to provide mutually exclusive access
+ *			to PCC channel subspace
+ */
+struct ras2_mem_ctx {
+	int				id;
+	struct auxiliary_device		adev;
+	struct acpi_ras2_shmem __iomem	*comm_addr;
+	struct device			*dev;
+	void				*pcc_subspace;
+	struct mutex			*pcc_lock;
+};
+
+#ifdef CONFIG_ACPI_RAS2
+void __init acpi_ras2_init(void);
+int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd);
+#else
+static inline void acpi_ras2_init(void) { }
+#endif
+
+#endif /* _ACPI_RAS2_H */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 2/2] ras: mem: Add memory ACPI RAS2 driver
  2025-05-07 21:43 [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
  2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2025-05-07 21:43 ` shiju.jose
  2025-05-12  8:16 ` [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
  2 siblings, 0 replies; 9+ messages in thread
From: shiju.jose @ 2025-05-07 21:43 UTC (permalink / raw)
  To: linux-edac, linux-acpi, linux-doc
  Cc: bp, rafael, tony.luck, lenb, leo.duran, Yazen.Ghannam, mchehab,
	jonathan.cameron, linux-mm, linuxarm, rientjes, jiaqiyan,
	Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang, shiju.jose

From: Shiju Jose <shiju.jose@huawei.com>

Memory ACPI RAS2 auxiliary driver binds to the auxiliary device
add by the ACPI RAS2 table parser.

Driver uses a PCC subspace for communicating with the ACPI compliant
platform.

Device with ACPI RAS2 scrub feature registers with EDAC device driver,
which retrieves the scrub descriptor from EDAC scrub and exposes
the scrub control attributes for RAS2 scrub instance to userspace in
/sys/bus/edac/devices/acpi_ras_mem0/scrubX/.

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 Documentation/edac/scrub.rst |  76 +++++++
 drivers/ras/Kconfig          |  11 +
 drivers/ras/Makefile         |   1 +
 drivers/ras/acpi_ras2.c      | 406 +++++++++++++++++++++++++++++++++++
 include/acpi/ras2.h          |  16 ++
 5 files changed, 510 insertions(+)
 create mode 100644 drivers/ras/acpi_ras2.c

diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index daab929cdba1..171d70bff731 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -264,3 +264,79 @@ Sysfs files are documented in
 `Documentation/ABI/testing/sysfs-edac-scrub`
 
 `Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The usage takes the form shown in these examples:
+
+1. ACPI RAS2
+
+1.1 On demand scrubbing for a specific memory region.
+
+1.1.1. Query what is device default/current scrub cycle setting.
+
+       Applicable to both on-demand and background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+36000
+
+1.1.2 Query the range of device supported scrub cycle for a memory region.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/min_cycle_duration
+
+3600
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/max_cycle_duration
+
+86400
+
+1.1.3. Program scrubbing for the memory region in RAS2 device to repeat every
+43200 seconds (half a day).
+
+# echo 43200 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+1.1.4. Program address and size of the memory region to scrub
+
+Readback 'addr', non-zero - demand scrub is in progress, zero - scrub is finished.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0
+
+Write 'size' of the memory region to scrub.
+
+# echo 0x300000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/size
+
+Write 'addr' starts demand scrubbing, please make sure other attributes are
+set prior to that.
+
+# echo 0x200000 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+Readback 'addr', non-zero - demand scrub is in progress, zero - scrub is finished.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0x200000
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/addr
+
+0
+
+1.2 Background scrubbing the entire memory
+
+1.2.3 Query the status of background scrubbing.
+
+# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
+
+0
+
+1.2.4. Program background scrubbing for RAS2 device to repeat in every 21600
+seconds (quarter of a day).
+
+# echo 21600 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_duration
+
+1.2.5. Start 'background scrubbing'.
+
+# echo 1 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..a88002f1f462 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -46,4 +46,15 @@ config RAS_FMPM
 	  Memory will be retired during boot time and run time depending on
 	  platform-specific policies.
 
+config MEM_ACPI_RAS2
+	tristate "Memory ACPI RAS2 driver"
+	depends on ACPI_RAS2
+	depends on EDAC
+	depends on EDAC_SCRUB
+	help
+	  The driver binds to the platform device added by the ACPI RAS2
+	  table parser. Use a PCC channel subspace for communicating with
+	  the ACPI compliant platform to provide control of memory scrub
+	  parameters to the user via the EDAC scrub.
+
 endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..a0e6e903d6b0 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_RAS)	+= ras.o
 obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_RAS_CEC)	+= cec.o
+obj-$(CONFIG_MEM_ACPI_RAS2)	+= acpi_ras2.o
 
 obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
 obj-y			+= amd/atl/
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
new file mode 100644
index 000000000000..4d9cfd3bdf45
--- /dev/null
+++ b/drivers/ras/acpi_ras2.c
@@ -0,0 +1,406 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ *
+ */
+
+#define pr_fmt(fmt)	"ACPI RAS2 MEMORY: " fmt
+
+#include <linux/bitfield.h>
+#include <linux/edac.h>
+#include <linux/platform_device.h>
+#include <acpi/ras2.h>
+
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB 0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS 0x01
+#define RAS2_START_PATROL_SCRUBBER 0x02
+#define RAS2_STOP_PATROL_SCRUBBER 0x03
+
+/*
+ * RAS2 patrol scrub
+ */
+#define RAS2_PS_SC_HRS_IN_MASK GENMASK(15, 8)
+#define RAS2_PS_EN_BACKGROUND BIT(0)
+#define RAS2_PS_SC_HRS_OUT_MASK GENMASK(7, 0)
+#define RAS2_PS_MIN_SC_HRS_OUT_MASK GENMASK(15, 8)
+#define RAS2_PS_MAX_SC_HRS_OUT_MASK GENMASK(23, 16)
+#define RAS2_PS_FLAG_SCRUB_RUNNING BIT(0)
+
+#define RAS2_SCRUB_NAME_LEN 128
+#define RAS2_HOUR_IN_SECS 3600
+
+enum ras2_od_scrub_status {
+	OD_SCRUB_STS_IDLE,
+	OD_SCRUB_STS_INIT,
+	OD_SCRUB_STS_ACTIVE,
+};
+
+struct acpi_ras2_ps_shared_mem {
+	struct acpi_ras2_shmem common;
+	struct acpi_ras2_patrol_scrub_param params;
+};
+
+#define TO_ACPI_RAS2_PS_SHMEM(_addr) \
+	container_of(_addr, struct acpi_ras2_ps_shared_mem, common)
+
+static int ras2_is_patrol_scrub_support(struct ras2_mem_ctx *ras2_ctx)
+{
+	struct acpi_ras2_shmem __iomem *common = (void *)ras2_ctx->comm_addr;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	common->set_caps[0] = 0;
+
+	return common->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_update_patrol_scrub_params_cache(struct ras2_mem_ctx *ras2_ctx)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	int ret;
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ps_sm->params.command = RAS2_GET_PATROL_PARAMETERS;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to read parameters\n");
+		return ret;
+	}
+
+	ras2_ctx->min_scrub_cycle = FIELD_GET(RAS2_PS_MIN_SC_HRS_OUT_MASK,
+					      ps_sm->params.scrub_params_out);
+	ras2_ctx->max_scrub_cycle = FIELD_GET(RAS2_PS_MAX_SC_HRS_OUT_MASK,
+					      ps_sm->params.scrub_params_out);
+	ras2_ctx->scrub_cycle_hrs = FIELD_GET(RAS2_PS_SC_HRS_OUT_MASK,
+					      ps_sm->params.scrub_params_out);
+	if (ras2_ctx->bg_scrub) {
+		ras2_ctx->base = 0;
+		ras2_ctx->size = 0;
+		ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
+		return 0;
+	}
+
+	if  (ps_sm->params.flags & RAS2_PS_FLAG_SCRUB_RUNNING) {
+		ras2_ctx->base = ps_sm->params.actl_addr_range[0];
+		ras2_ctx->size = ps_sm->params.actl_addr_range[1];
+	} else if (ras2_ctx->od_scrub_sts != OD_SCRUB_STS_INIT) {
+		/*
+		 * When demand scrubbing is finished driver resets actual
+		 * address range to 0 when readback. Otherwise userspace
+		 * assumes demand scrubbing is in progress.
+		 */
+		ras2_ctx->base = 0;
+		ras2_ctx->size = 0;
+		ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
+	}
+
+	return 0;
+}
+
+/* Context - PCC lock must be held */
+static int ras2_get_patrol_scrub_running(struct ras2_mem_ctx *ras2_ctx,
+					 bool *running)
+{
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	int ret;
+
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ps_sm->params.command = RAS2_GET_PATROL_PARAMETERS;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "failed to read parameters\n");
+		return ret;
+	}
+
+	*running = ps_sm->params.flags & RAS2_PS_FLAG_SCRUB_RUNNING;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
+					      u32 *min)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*min = ras2_ctx->min_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
+					      u32 *max)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*max = ras2_ctx->max_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_cycle_read(struct device *dev, void *drv_data,
+				    u32 *scrub_cycle_secs)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*scrub_cycle_secs = ras2_ctx->scrub_cycle_hrs * RAS2_HOUR_IN_SECS;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_cycle_write(struct device *dev, void *drv_data,
+				     u32 scrub_cycle_secs)
+{
+	u8 scrub_cycle_hrs = scrub_cycle_secs / RAS2_HOUR_IN_SECS;
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	if (scrub_cycle_hrs < ras2_ctx->min_scrub_cycle ||
+	    scrub_cycle_hrs > ras2_ctx->max_scrub_cycle)
+		return -EINVAL;
+
+	ras2_ctx->scrub_cycle_hrs = scrub_cycle_hrs;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_addr(struct device *dev, void *drv_data, u64 *base)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	int ret;
+
+	/*
+	 * When BG scrubbing is enabled the actual address range is not valid.
+	 * Return -EBUSY now unless find out a method to retrieve actual full PA range.
+	 */
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	*base = ras2_ctx->base;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_read_size(struct device *dev, void *drv_data, u64 *size)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	int ret;
+
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	*size = ras2_ctx->size;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	if (ras2_ctx->bg_scrub)
+		return -EBUSY;
+
+	if (!base || !ras2_ctx->size) {
+		dev_warn(ras2_ctx->dev,
+			 "%s: Invalid address range, base=0x%llx size=0x%llx\n",
+			 __func__, base, ras2_ctx->size);
+		return -ERANGE;
+	}
+
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+	ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
+						    ras2_ctx->scrub_cycle_hrs);
+	ps_sm->params.req_addr_range[0] = base;
+	ps_sm->params.req_addr_range[1] = ras2_ctx->size;
+	ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+	ps_sm->params.command = RAS2_START_PATROL_SCRUBBER;
+
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "Failed to start demand scrubbing\n");
+		return ret;
+	}
+	ras2_ctx->od_scrub_sts = OD_SCRUB_STS_ACTIVE;
+
+	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+}
+
+static int ras2_hw_scrub_write_size(struct device *dev, void *drv_data, u64 size)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (running)
+		return -EBUSY;
+
+	if (!size) {
+		dev_warn(dev, "%s: Invalid address range size=0x%llx\n",
+			 __func__, size);
+		return -EINVAL;
+	}
+
+	ras2_ctx->size = size;
+	ras2_ctx->od_scrub_sts = OD_SCRUB_STS_INIT;
+
+	return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
+		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
+	bool running;
+	int ret;
+
+	guard(mutex)(ras2_ctx->pcc_lock);
+	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+	if (ret)
+		return ret;
+
+	if (enable) {
+		if (ras2_ctx->bg_scrub || running)
+			return -EBUSY;
+		ps_sm->params.req_addr_range[0] = 0;
+		ps_sm->params.req_addr_range[1] = 0;
+		ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
+		ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
+							    ras2_ctx->scrub_cycle_hrs);
+		ps_sm->params.command = RAS2_START_PATROL_SCRUBBER;
+	} else {
+		if (!ras2_ctx->bg_scrub)
+			return -EPERM;
+		ps_sm->params.command = RAS2_STOP_PATROL_SCRUBBER;
+	}
+
+	ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
+	ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_EN_BACKGROUND,
+						    enable);
+	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
+	if (ret) {
+		dev_err(ras2_ctx->dev, "Failed to %s background scrubbing\n",
+			str_enable_disable(enable));
+		return ret;
+	}
+
+	if (enable) {
+		ras2_ctx->bg_scrub = true;
+		/* Update the cache to account for rounding of supplied parameters and similar */
+		ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	} else {
+		ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+		ras2_ctx->bg_scrub = false;
+	}
+
+	return ret;
+}
+
+static int ras2_hw_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
+{
+	struct ras2_mem_ctx *ras2_ctx = drv_data;
+
+	*enabled = ras2_ctx->bg_scrub;
+
+	return 0;
+}
+
+static const struct edac_scrub_ops ras2_scrub_ops = {
+	.read_addr = ras2_hw_scrub_read_addr,
+	.read_size = ras2_hw_scrub_read_size,
+	.write_addr = ras2_hw_scrub_write_addr,
+	.write_size = ras2_hw_scrub_write_size,
+	.get_enabled_bg = ras2_hw_scrub_get_enabled_bg,
+	.set_enabled_bg = ras2_hw_scrub_set_enabled_bg,
+	.get_min_cycle = ras2_hw_scrub_read_min_scrub_cycle,
+	.get_max_cycle = ras2_hw_scrub_read_max_scrub_cycle,
+	.get_cycle_duration = ras2_hw_scrub_cycle_read,
+	.set_cycle_duration = ras2_hw_scrub_cycle_write,
+};
+
+static int ras2_probe(struct auxiliary_device *auxdev,
+		      const struct auxiliary_device_id *id)
+{
+	struct ras2_mem_ctx *ras2_ctx = container_of(auxdev, struct ras2_mem_ctx, adev);
+	struct edac_dev_feature ras_features;
+	char scrub_name[RAS2_SCRUB_NAME_LEN];
+	int ret;
+
+	if (!ras2_is_patrol_scrub_support(ras2_ctx))
+		return -EOPNOTSUPP;
+
+	ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+	if (ret)
+		return ret;
+
+	sprintf(scrub_name, "acpi_ras_mem%d", ras2_ctx->id);
+
+	ras_features.ft_type	= RAS_FEAT_SCRUB;
+	ras_features.instance	= ras2_ctx->instance;
+	ras_features.scrub_ops	= &ras2_scrub_ops;
+	ras_features.ctx	= ras2_ctx;
+
+	return edac_dev_register(&auxdev->dev, scrub_name, NULL, 1,
+				 &ras_features);
+}
+
+static const struct auxiliary_device_id ras2_mem_dev_id_table[] = {
+	{ .name = RAS2_AUX_DEV_NAME "." RAS2_MEM_DEV_ID_NAME, },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(auxiliary, ras2_mem_dev_id_table);
+
+static struct auxiliary_driver ras2_mem_driver = {
+	.name = RAS2_MEM_DEV_ID_NAME,
+	.probe = ras2_probe,
+	.id_table = ras2_mem_dev_id_table,
+};
+module_auxiliary_driver(ras2_mem_driver);
+
+MODULE_IMPORT_NS("ACPI_RAS2");
+MODULE_DESCRIPTION("ACPI RAS2 memory driver");
+MODULE_LICENSE("GPL");
diff --git a/include/acpi/ras2.h b/include/acpi/ras2.h
index f8937069cd4a..fe011c079dda 100644
--- a/include/acpi/ras2.h
+++ b/include/acpi/ras2.h
@@ -34,6 +34,14 @@ struct device;
  * @pcc_subspace:	Pointer to local data structure for PCC communication
  * @pcc_lock:		Pointer to PCC lock to provide mutually exclusive access
  *			to PCC channel subspace
+ * @instance:		Feature instance
+ * @base:		Base address of the memory region to scrub
+ * @size:		Size of the memory region to scrub
+ * @scrub_cycle_hrs:	Current scrub rate in hours
+ * @min_scrub_cycle:	Minimum scrub rate supported
+ * @max_scrub_cycle:	Maximum scrub rate supported
+ * @od_scrub_sts:	Status of demand scrubbing (memory region)
+ * @bg_scrub:		Status of background patrol scrubbing
  */
 struct ras2_mem_ctx {
 	int				id;
@@ -42,6 +50,14 @@ struct ras2_mem_ctx {
 	struct device			*dev;
 	void				*pcc_subspace;
 	struct mutex			*pcc_lock;
+	u8				instance;
+	u64				base;
+	u64				size;
+	u8				scrub_cycle_hrs;
+	u8				min_scrub_cycle;
+	u8				max_scrub_cycle;
+	u8				od_scrub_sts;
+	bool				bg_scrub;
 };
 
 #ifdef CONFIG_ACPI_RAS2
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table
  2025-05-07 21:43 [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
  2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
  2025-05-07 21:43 ` [PATCH v5 2/2] ras: mem: Add memory " shiju.jose
@ 2025-05-12  8:16 ` Jonathan Cameron
  2025-05-12  8:38   ` Borislav Petkov
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2025-05-12  8:16 UTC (permalink / raw)
  To: shiju.jose
  Cc: linux-edac, linux-acpi, linux-doc, bp, rafael, tony.luck, lenb,
	leo.duran, Yazen.Ghannam, mchehab, linux-mm, linuxarm, rientjes,
	jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
	jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
	wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
	roberto.sassu, kangkang.shen, wanghuiqiang

On Wed, 7 May 2025 22:43:41 +0100
<shiju.jose@huawei.com> wrote:

> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add support for ACPI RAS2 feature table (RAS2) defined in the
> ACPI 6.5 specification, section 5.2.21 and RAS2 HW based memory
> scrubbing feature.
> 
> ACPI RAS2 patches were part of the EDAC series [1].
> 
> The code is based on ras.git: edac-for-next branch [2]
> merged with linux-pm.git [3] : linux-next branch.
> 
> 1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
> 2. https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-for-next
> 3. https://web.git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/

Rafael, Borislav,

What path do we expect this little series to take forwards?

I'm kind of assuming through ACPI given the acpica dependency, but
anything works for us. Last few iterations have been minor and around
the renames of stuff in acpica to shorten some silly long lines. So I hope
this is ready to go.

Yazen, Daniel, others thanks for your reviews and testing etc - if you
want to give RB tags or similar that would be great as well!

Jonathan


> 
> Changes
> =======
> v4 -> v5:
> 1. Fix for the build warnings reported by kernel test robot.
>    https://patchwork.kernel.org/project/linux-edac/patch/20250423163511.1412-3-shiju.jose@huawei.com/
> 2. Removed patch "ACPI: ACPI 6.5: RAS2: Rename RAS2 table structure and field names"
>    from the series as the ACPICA patch was merged to linux-pm.git : branch linux-next
> 3. Rebased to ras.git: edac-for-next branch merged with linux-pm.git : linux-next branch.
>       
> v3 -> v4:
> 1.  Changes for feedbacks from Yazen on v3.
>     https://lore.kernel.org/all/20250415210504.GA854098@yaz-khff2.amd.com/
> 
> v2 -> v3:
> 1. Rename RAS2 table structure and field names in 
>    include/acpi/actbl2.h limited to only necessary
>    for RAS2 scrub feature. Not for merging.
>    Corresponding changes are merged in the acpica:
>    https://github.com/acpica/acpica/commit/2c8a38f747de9d977491a494faf0dfaf799b777b
> 2. Changes for feedbacks from Jonathan on v2.
> 3. Daniel reported a known behaviour: when readback 'size' attribute after
>    setting in, returns 0 before starting scrubbing via 'addr' attribute.
>    Changes added to fix this.
> 4. Daniel reported that firmware cannot update status of demand scrubbing
>    via the 'Actual Address Range (OUTPUT)', thus add workaround in the
>    kernel to update sysfs 'addr' attribute with the status of demand
>    scrubbing.
> 5. Optimized logic in ras2_check_pcc_chan() function
>    (patch - ACPI:RAS2: Add ACPI RAS2 driver).
> 6. Add PCC channel lock to struct ras2_pcc_subspace and change
>    lock in ras2_mem_ctx as a pointer to pcc channel lock to make sure
>    writing to PCC subspace shared memory is protected from race conditions.
>    
> v1 -> v2:
> 1.  Changes for feedbacks from Borislav.
>     - Shorten ACPI RAS2 structures and variables names.
>     - Shorten some of the other variables in the RAS2 drivers.
>     - Fixed few CamelCases.
> 
> 2.  Changes for feedbacks from Yazen.
>     - Added newline after number of '}' and return statements.
>     - Changed return type for "ras2_add_aux_device() to 'int'.
>     - Deleted a duplication of acpi_get_table("RAS2",...) in the ras2_acpi_parse_table().
>     - Add "FW_WARN" to few error logs in the ras2_acpi_parse_table().
>     - Rename ras2_acpi_init() to acpi_ras2_init() and modified to call acpi_ras2_init()
>       function from the acpi_init().
>     - Moved scrub related variables from the struct ras2_mem_ctx from  patch
>       "ACPI:RAS2: Add ACPI RAS2 driver" to "ras: mem: Add memory ACPI RAS2 driver".  
> 
> Shiju Jose (2):
>   ACPI:RAS2: Add ACPI RAS2 driver
>   ras: mem: Add memory ACPI RAS2 driver
> 
>  Documentation/edac/scrub.rst |  76 ++++++
>  drivers/acpi/Kconfig         |  11 +
>  drivers/acpi/Makefile        |   1 +
>  drivers/acpi/bus.c           |   3 +
>  drivers/acpi/ras2.c          | 451 +++++++++++++++++++++++++++++++++++
>  drivers/ras/Kconfig          |  11 +
>  drivers/ras/Makefile         |   1 +
>  drivers/ras/acpi_ras2.c      | 406 +++++++++++++++++++++++++++++++
>  include/acpi/ras2.h          |  70 ++++++
>  9 files changed, 1030 insertions(+)
>  create mode 100644 drivers/acpi/ras2.c
>  create mode 100644 drivers/ras/acpi_ras2.c
>  create mode 100644 include/acpi/ras2.h
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table
  2025-05-12  8:16 ` [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
@ 2025-05-12  8:38   ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2025-05-12  8:38 UTC (permalink / raw)
  To: Jonathan Cameron, shiju.jose
  Cc: linux-edac, linux-acpi, linux-doc, rafael, tony.luck, lenb,
	leo.duran, Yazen.Ghannam, mchehab, linux-mm, linuxarm, rientjes,
	jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
	jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen, gthelen,
	wschwartz, dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng,
	roberto.sassu, kangkang.shen, wanghuiqiang

On May 12, 2025 10:16:44 AM GMT+02:00, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>What path do we expect this little series to take forwards?
>
>I'm kind of assuming through ACPI given the acpica dependency, 

Yeah, probably better thru the ACPI tree...

Thx.

-- 
Sent from a small device: formatting sucks and brevity is inevitable.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
  2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2025-05-14  2:55   ` Daniel Ferguson
  2025-05-14 11:31     ` Shiju Jose
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Ferguson @ 2025-05-14  2:55 UTC (permalink / raw)
  To: shiju.jose, linux-edac, linux-acpi, linux-doc
  Cc: bp, rafael, tony.luck, lenb, leo.duran, Yazen.Ghannam, mchehab,
	jonathan.cameron, linux-mm, linuxarm, rientjes, jiaqiyan,
	Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse, jthoughton,
	somasundaram.a, erdemaktas, pgonda, duenwen, gthelen, wschwartz,
	dferguson, wbs, nifan.cxl, tanxiaofei, prime.zeng, roberto.sassu,
	kangkang.shen, wanghuiqiang

> +static int ras2_report_cap_error(u32 cap_status)
> +{
> +	switch (cap_status) {
> +	case ACPI_RAS2_NOT_VALID:
> +	case ACPI_RAS2_NOT_SUPPORTED:
> +		return -EPERM;
> +	case ACPI_RAS2_BUSY:
> +		return -EBUSY;
> +	case ACPI_RAS2_FAILED:
> +	case ACPI_RAS2_ABORTED:
> +	case ACPI_RAS2_INVALID_DATA:
> +		return -EINVAL;
> +	default: /* 0 or other, Success */
> +		return 0;
> +	}
> +}
> +
> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
> +{
> +	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr;
> +	u32 cap_status;
> +	u16 status;
> +	u32 rc;
> +
> +	/*
> +	 * As per ACPI spec, the PCC space will be initialized by
> +	 * platform and should have set the command completion bit when
> +	 * PCC can be used by OSPM.
> +	 *
> +	 * Poll PCC status register every 3us(delay_us) for maximum of
> +	 * deadline_us(timeout_us) until PCC command complete bit is set(cond).
> +	 */
> +	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
> +					status & PCC_STATUS_CMD_COMPLETE, 3,
> +					pcc_subspace->deadline_us);
> +	if (rc) {
> +		pr_warn("PCC check channel failed for : %d rc=%d\n",
> +			pcc_subspace->pcc_id, rc);
> +		return rc;
> +	}
> +
> +	if (status & PCC_STATUS_ERROR) {
> +		cap_status = readw_relaxed(&gen_comm_base->set_caps_status);
> +		rc = ras2_report_cap_error(cap_status);
> +
> +		status &= ~PCC_STATUS_ERROR;
> +		writew_relaxed(status, &gen_comm_base->status);
> +		return rc;
> +	}
> +
> +	if (status & PCC_STATUS_CMD_COMPLETE)
> +		return 0;
> +
> +	return -EIO;
> +}

We still have an outstanding problem. This may sound familiar.

If a user specifies an invalid address, our firmware will set an error code in
the set_caps_status field of the acpi_ras2_shmem structure. In our case, the
error code is ACPI_RAS2_INVALID_DATA, and the user will observe an EINVAL. This
is expected.

However, if the user then subsequently attempts to write a VALID address,
ras2_get_patrol_scrub_running will indirectly call ras2_check_pcc_chan using the
previously INVALID address to determine if the scrubber is still running.
Unfortunately, the INVALID address causes ras2_get_patrol_scrub_running to fail,
therefore preventing the user from specifying a VALID address after specifying
an INVALID address.

The only way to move forward from this inescapable condition is to reboot the
system.

Here is a demo of the problem as I roughly see it on our system (I've labeled
the line numbers for sake of discussion):
1  [root@myhost scrub0]# echo 0x100000000 > size
2  [root@myhost scrub0]# echo 0x1f00000000 > addr
3  [root@myhost scrub0]# echo 0xcf00000000 > addr
4  write error: Invalid argument
5  [  214.446338] PCCT PCCT: Failed to start demand scrubbing
6  [root@myhost scrub0]# echo 0x1f00000000 > addr
7  write error: Invalid argument
8  [  242.263909] PCCT PCCT: failed to read parameters
9  [root@myhost scrub0]# echo 0x100000000 > size
10 write error: Invalid argument
11 [  246.190196] PCCT PCCT: failed to read parameters

The upper most memory address on this system is 0xbf00000000. Line 1 and 2 use
valid values, and line 2 produces the expected results. On line 3, I've
specified an INVALID address (outside of valid range). The error on line 5 is
expected after executing the START_PATROL_SCRUBBER command with an INVALID address.

Line 6 show how I attempt to specify a VALID address. Unfortunately,
ras2_get_patrol_scrub_running encounters and error after executing
GET_PATROL_PARAMETERS because it used the OLD INVALID values in
ps_sm->params.req_addr_range. Line 7 and 8 are the result. Since the flow of
execution if aborted at this point, you can never rectify the situation and
insert a valid value into ps_sm->params.req_addr_range, unless you reboot
the system.

One half baked solution to this problem, is to modify
ras2_get_patrol_scrub_running so that if there is a non-zero address or size
specified, AND the last error code we received was INVALID DATA, then assume the
scrubber is NOT running.

Regards,
~Daniel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
  2025-05-14  2:55   ` Daniel Ferguson
@ 2025-05-14 11:31     ` Shiju Jose
  2025-05-16  1:49       ` Daniel Ferguson
  0 siblings, 1 reply; 9+ messages in thread
From: Shiju Jose @ 2025-05-14 11:31 UTC (permalink / raw)
  To: Daniel Ferguson, linux-edac@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
  Cc: bp@alien8.de, rafael@kernel.org, tony.luck@intel.com,
	lenb@kernel.org, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	mchehab@kernel.org, Jonathan Cameron, linux-mm@kvack.org,
	Linuxarm, rientjes@google.com, jiaqiyan@google.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	gthelen@google.com, wschwartz@amperecomputing.com,
	dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
	nifan.cxl@gmail.com, tanxiaofei, Zengtao (B), Roberto Sassu,
	kangkang.shen@futurewei.com, wanghuiqiang

>-----Original Message-----
>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>Sent: 14 May 2025 03:55
>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-doc@vger.kernel.org
>Cc: bp@alien8.de; rafael@kernel.org; tony.luck@intel.com; lenb@kernel.org;
>leo.duran@amd.com; Yazen.Ghannam@amd.com; mchehab@kernel.org;
>Jonathan Cameron <jonathan.cameron@huawei.com>; linux-mm@kvack.org;
>Linuxarm <linuxarm@huawei.com>; rientjes@google.com;
>jiaqiyan@google.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>
>Subject: Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
>
>> +static int ras2_report_cap_error(u32 cap_status) {
>> +	switch (cap_status) {
>> +	case ACPI_RAS2_NOT_VALID:
>> +	case ACPI_RAS2_NOT_SUPPORTED:
>> +		return -EPERM;
>> +	case ACPI_RAS2_BUSY:
>> +		return -EBUSY;
>> +	case ACPI_RAS2_FAILED:
>> +	case ACPI_RAS2_ABORTED:
>> +	case ACPI_RAS2_INVALID_DATA:
>> +		return -EINVAL;
>> +	default: /* 0 or other, Success */
>> +		return 0;
>> +	}
>> +}
>> +
>> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace
>> +*pcc_subspace) {
>> +	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace-
>>comm_addr;
>> +	u32 cap_status;
>> +	u16 status;
>> +	u32 rc;
>> +
>> +	/*
>> +	 * As per ACPI spec, the PCC space will be initialized by
>> +	 * platform and should have set the command completion bit when
>> +	 * PCC can be used by OSPM.
>> +	 *
>> +	 * Poll PCC status register every 3us(delay_us) for maximum of
>> +	 * deadline_us(timeout_us) until PCC command complete bit is
>set(cond).
>> +	 */
>> +	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
>> +					status &
>PCC_STATUS_CMD_COMPLETE, 3,
>> +					pcc_subspace->deadline_us);
>> +	if (rc) {
>> +		pr_warn("PCC check channel failed for : %d rc=%d\n",
>> +			pcc_subspace->pcc_id, rc);
>> +		return rc;
>> +	}
>> +
>> +	if (status & PCC_STATUS_ERROR) {
>> +		cap_status = readw_relaxed(&gen_comm_base-
>>set_caps_status);
>> +		rc = ras2_report_cap_error(cap_status);
>> +
>> +		status &= ~PCC_STATUS_ERROR;
>> +		writew_relaxed(status, &gen_comm_base->status);
>> +		return rc;
>> +	}
>> +
>> +	if (status & PCC_STATUS_CMD_COMPLETE)
>> +		return 0;
>> +
>> +	return -EIO;
>> +}
>
>We still have an outstanding problem. This may sound familiar.
>
>If a user specifies an invalid address, our firmware will set an error code in the
>set_caps_status field of the acpi_ras2_shmem structure. In our case, the error
>code is ACPI_RAS2_INVALID_DATA, and the user will observe an EINVAL. This is
>expected.
>
>However, if the user then subsequently attempts to write a VALID address,
>ras2_get_patrol_scrub_running will indirectly call ras2_check_pcc_chan using
>the previously INVALID address to determine if the scrubber is still running.
>Unfortunately, the INVALID address causes ras2_get_patrol_scrub_running to
>fail, therefore preventing the user from specifying a VALID address after
>specifying an INVALID address.
>
>The only way to move forward from this inescapable condition is to reboot the
>system.
>
>Here is a demo of the problem as I roughly see it on our system (I've labeled the
>line numbers for sake of discussion):
>1  [root@myhost scrub0]# echo 0x100000000 > size
>2  [root@myhost scrub0]# echo 0x1f00000000 > addr
>3  [root@myhost scrub0]# echo 0xcf00000000 > addr
>4  write error: Invalid argument
>5  [  214.446338] PCCT PCCT: Failed to start demand scrubbing
>6  [root@myhost scrub0]# echo 0x1f00000000 > addr
>7  write error: Invalid argument
>8  [  242.263909] PCCT PCCT: failed to read parameters
>9  [root@myhost scrub0]# echo 0x100000000 > size
>10 write error: Invalid argument
>11 [  246.190196] PCCT PCCT: failed to read parameters
>
>The upper most memory address on this system is 0xbf00000000. Line 1 and 2
>use valid values, and line 2 produces the expected results. On line 3, I've
>specified an INVALID address (outside of valid range). The error on line 5 is
>expected after executing the START_PATROL_SCRUBBER command with an
>INVALID address.
>
>Line 6 show how I attempt to specify a VALID address. Unfortunately,
>ras2_get_patrol_scrub_running encounters and error after executing
>GET_PATROL_PARAMETERS because it used the OLD INVALID values in ps_sm-
>>params.req_addr_range. Line 7 and 8 are the result. Since the flow of
>execution if aborted at this point, you can never rectify the situation and insert a
>valid value into ps_sm->params.req_addr_range, unless you reboot the system.
>
>One half baked solution to this problem, is to modify
>ras2_get_patrol_scrub_running so that if there is a non-zero address or size
>specified, AND the last error code we received was INVALID DATA, then assume
>the scrubber is NOT running.
Hi Daniel,

Thanks for reporting the issue.
Can you check whether following change fix the issue in your test setup?
=============================================
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
index 4d9cfd3bdf45..ff4aa1b75860 100644
--- a/drivers/ras/acpi_ras2.c
+++ b/drivers/ras/acpi_ras2.c
@@ -255,6 +255,13 @@ static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base
        ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
        if (ret) {
                dev_err(ras2_ctx->dev, "Failed to start demand scrubbing\n");
+               if (ret == -EPERM || ret == -EINVAL) {
+                       ps_sm->params.req_addr_range[0] = 0;
+                       ps_sm->params.req_addr_range[1] = 0;
+                       ras2_ctx->base = 0;
+                       ras2_ctx->size = 0;
+                       ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
+               }
                return ret;
        }
        ras2_ctx->od_scrub_sts = OD_SCRUB_STS_ACTIVE;
==============================================
Thanks,
Shiju 

>
>Regards,
>~Daniel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
  2025-05-14 11:31     ` Shiju Jose
@ 2025-05-16  1:49       ` Daniel Ferguson
  2025-05-16 13:36         ` Shiju Jose
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Ferguson @ 2025-05-16  1:49 UTC (permalink / raw)
  To: Shiju Jose, linux-edac@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
  Cc: bp@alien8.de, rafael@kernel.org, tony.luck@intel.com,
	lenb@kernel.org, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	mchehab@kernel.org, Jonathan Cameron, linux-mm@kvack.org,
	Linuxarm, rientjes@google.com, jiaqiyan@google.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	gthelen@google.com, wschwartz@amperecomputing.com,
	dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
	nifan.cxl@gmail.com, tanxiaofei, Zengtao (B), Roberto Sassu,
	kangkang.shen@futurewei.com, wanghuiqiang



On 5/14/2025 4:31 AM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Daniel Ferguson <danielf@os.amperecomputing.com>
>> Sent: 14 May 2025 03:55
>> To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
>> acpi@vger.kernel.org; linux-doc@vger.kernel.org
>> Cc: bp@alien8.de; rafael@kernel.org; tony.luck@intel.com; lenb@kernel.org;
>> leo.duran@amd.com; Yazen.Ghannam@amd.com; mchehab@kernel.org;
>> Jonathan Cameron <jonathan.cameron@huawei.com>; linux-mm@kvack.org;
>> Linuxarm <linuxarm@huawei.com>; rientjes@google.com;
>> jiaqiyan@google.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>> naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>> somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>> duenwen@google.com; gthelen@google.com;
>> wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>> wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
>> <tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>> Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>> wanghuiqiang <wanghuiqiang@huawei.com>
>> Subject: Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
>>
>>> +static int ras2_report_cap_error(u32 cap_status) {
>>> +	switch (cap_status) {
>>> +	case ACPI_RAS2_NOT_VALID:
>>> +	case ACPI_RAS2_NOT_SUPPORTED:
>>> +		return -EPERM;
>>> +	case ACPI_RAS2_BUSY:
>>> +		return -EBUSY;
>>> +	case ACPI_RAS2_FAILED:
>>> +	case ACPI_RAS2_ABORTED:
>>> +	case ACPI_RAS2_INVALID_DATA:
>>> +		return -EINVAL;
>>> +	default: /* 0 or other, Success */
>>> +		return 0;
>>> +	}
>>> +}
>>> +
>>> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace
>>> +*pcc_subspace) {
>>> +	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace-
>>> comm_addr;
>>> +	u32 cap_status;
>>> +	u16 status;
>>> +	u32 rc;
>>> +
>>> +	/*
>>> +	 * As per ACPI spec, the PCC space will be initialized by
>>> +	 * platform and should have set the command completion bit when
>>> +	 * PCC can be used by OSPM.
>>> +	 *
>>> +	 * Poll PCC status register every 3us(delay_us) for maximum of
>>> +	 * deadline_us(timeout_us) until PCC command complete bit is
>> set(cond).
>>> +	 */
>>> +	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
>>> +					status &
>> PCC_STATUS_CMD_COMPLETE, 3,
>>> +					pcc_subspace->deadline_us);
>>> +	if (rc) {
>>> +		pr_warn("PCC check channel failed for : %d rc=%d\n",
>>> +			pcc_subspace->pcc_id, rc);
>>> +		return rc;
>>> +	}
>>> +
>>> +	if (status & PCC_STATUS_ERROR) {
>>> +		cap_status = readw_relaxed(&gen_comm_base-
>>> set_caps_status);
>>> +		rc = ras2_report_cap_error(cap_status);
>>> +
>>> +		status &= ~PCC_STATUS_ERROR;
>>> +		writew_relaxed(status, &gen_comm_base->status);
>>> +		return rc;
>>> +	}
>>> +
>>> +	if (status & PCC_STATUS_CMD_COMPLETE)
>>> +		return 0;
>>> +
>>> +	return -EIO;
>>> +}
>>
>> We still have an outstanding problem. This may sound familiar.
>>
>> If a user specifies an invalid address, our firmware will set an error code in the
>> set_caps_status field of the acpi_ras2_shmem structure. In our case, the error
>> code is ACPI_RAS2_INVALID_DATA, and the user will observe an EINVAL. This is
>> expected.
>>
>> However, if the user then subsequently attempts to write a VALID address,
>> ras2_get_patrol_scrub_running will indirectly call ras2_check_pcc_chan using
>> the previously INVALID address to determine if the scrubber is still running.
>> Unfortunately, the INVALID address causes ras2_get_patrol_scrub_running to
>> fail, therefore preventing the user from specifying a VALID address after
>> specifying an INVALID address.
>>
>> The only way to move forward from this inescapable condition is to reboot the
>> system.
>>
>> Here is a demo of the problem as I roughly see it on our system (I've labeled the
>> line numbers for sake of discussion):
>> 1  [root@myhost scrub0]# echo 0x100000000 > size
>> 2  [root@myhost scrub0]# echo 0x1f00000000 > addr
>> 3  [root@myhost scrub0]# echo 0xcf00000000 > addr
>> 4  write error: Invalid argument
>> 5  [  214.446338] PCCT PCCT: Failed to start demand scrubbing
>> 6  [root@myhost scrub0]# echo 0x1f00000000 > addr
>> 7  write error: Invalid argument
>> 8  [  242.263909] PCCT PCCT: failed to read parameters
>> 9  [root@myhost scrub0]# echo 0x100000000 > size
>> 10 write error: Invalid argument
>> 11 [  246.190196] PCCT PCCT: failed to read parameters
>>
>> The upper most memory address on this system is 0xbf00000000. Line 1 and 2
>> use valid values, and line 2 produces the expected results. On line 3, I've
>> specified an INVALID address (outside of valid range). The error on line 5 is
>> expected after executing the START_PATROL_SCRUBBER command with an
>> INVALID address.
>>
>> Line 6 show how I attempt to specify a VALID address. Unfortunately,
>> ras2_get_patrol_scrub_running encounters and error after executing
>> GET_PATROL_PARAMETERS because it used the OLD INVALID values in ps_sm-
>>> params.req_addr_range. Line 7 and 8 are the result. Since the flow of
>> execution if aborted at this point, you can never rectify the situation and insert a
>> valid value into ps_sm->params.req_addr_range, unless you reboot the system.
>>
>> One half baked solution to this problem, is to modify
>> ras2_get_patrol_scrub_running so that if there is a non-zero address or size
>> specified, AND the last error code we received was INVALID DATA, then assume
>> the scrubber is NOT running.
> Hi Daniel,
> 
> Thanks for reporting the issue.
> Can you check whether following change fix the issue in your test setup?
> =============================================
> diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
> index 4d9cfd3bdf45..ff4aa1b75860 100644
> --- a/drivers/ras/acpi_ras2.c
> +++ b/drivers/ras/acpi_ras2.c
> @@ -255,6 +255,13 @@ static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base
>         ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
>         if (ret) {
>                 dev_err(ras2_ctx->dev, "Failed to start demand scrubbing\n");
> +               if (ret == -EPERM || ret == -EINVAL) {
> +                       ps_sm->params.req_addr_range[0] = 0;
> +                       ps_sm->params.req_addr_range[1] = 0;
> +                       ras2_ctx->base = 0;
> +                       ras2_ctx->size = 0;
> +                       ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
> +               }
>                 return ret;
>         }
>         ras2_ctx->od_scrub_sts = OD_SCRUB_STS_ACTIVE;
> ==============================================
> Thanks,
> Shiju 

We're happy! with this fix.

For this to work, we had to no-op the START_PATROL_SCRUBBER and
GET_PATROL_PARAMETERS commands when base and size are equal to zero.
Previously, we returned INVALID DATA when base and size were zero.

Maybe we should amend the ACPI spec with this special knowledge.

Anyways; as of now, with this fix, this driver will work out of the box on our
systems.

Thanks a lot,
~Daniel
> 
>>
>> Regards,
>> ~Daniel



^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
  2025-05-16  1:49       ` Daniel Ferguson
@ 2025-05-16 13:36         ` Shiju Jose
  0 siblings, 0 replies; 9+ messages in thread
From: Shiju Jose @ 2025-05-16 13:36 UTC (permalink / raw)
  To: Daniel Ferguson, linux-edac@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
  Cc: bp@alien8.de, rafael@kernel.org, tony.luck@intel.com,
	lenb@kernel.org, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	mchehab@kernel.org, Jonathan Cameron, linux-mm@kvack.org,
	Linuxarm, rientjes@google.com, jiaqiyan@google.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	gthelen@google.com, wschwartz@amperecomputing.com,
	dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
	nifan.cxl@gmail.com, tanxiaofei, Zengtao (B), Roberto Sassu,
	kangkang.shen@futurewei.com, wanghuiqiang


>-----Original Message-----
>From: Daniel Ferguson <danielf@os.amperecomputing.com>
>Sent: 16 May 2025 02:49
>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-doc@vger.kernel.org
>Cc: bp@alien8.de; rafael@kernel.org; tony.luck@intel.com; lenb@kernel.org;
>leo.duran@amd.com; Yazen.Ghannam@amd.com; mchehab@kernel.org;
>Jonathan Cameron <jonathan.cameron@huawei.com>; linux-mm@kvack.org;
>Linuxarm <linuxarm@huawei.com>; rientjes@google.com;
>jiaqiyan@google.com; Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>
>Subject: Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
>
>
>
>On 5/14/2025 4:31 AM, Shiju Jose wrote:
>>> -----Original Message-----
>>> From: Daniel Ferguson <danielf@os.amperecomputing.com>
>>> Sent: 14 May 2025 03:55
>>> To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org;
>>> linux- acpi@vger.kernel.org; linux-doc@vger.kernel.org
>>> Cc: bp@alien8.de; rafael@kernel.org; tony.luck@intel.com;
>>> lenb@kernel.org; leo.duran@amd.com; Yazen.Ghannam@amd.com;
>>> mchehab@kernel.org; Jonathan Cameron
><jonathan.cameron@huawei.com>;
>>> linux-mm@kvack.org; Linuxarm <linuxarm@huawei.com>;
>>> rientjes@google.com; jiaqiyan@google.com; Jon.Grimm@amd.com;
>>> dave.hansen@linux.intel.com; naoya.horiguchi@nec.com;
>>> james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com;
>>> erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>>> gthelen@google.com; wschwartz@amperecomputing.com;
>>> dferguson@amperecomputing.com; wbs@os.amperecomputing.com;
>>> nifan.cxl@gmail.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
>>> <prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>>> kangkang.shen@futurewei.com; wanghuiqiang
><wanghuiqiang@huawei.com>
>>> Subject: Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
>>>
>>>> +static int ras2_report_cap_error(u32 cap_status) {
>>>> +	switch (cap_status) {
>>>> +	case ACPI_RAS2_NOT_VALID:
>>>> +	case ACPI_RAS2_NOT_SUPPORTED:
>>>> +		return -EPERM;
>>>> +	case ACPI_RAS2_BUSY:
>>>> +		return -EBUSY;
>>>> +	case ACPI_RAS2_FAILED:
>>>> +	case ACPI_RAS2_ABORTED:
>>>> +	case ACPI_RAS2_INVALID_DATA:
>>>> +		return -EINVAL;
>>>> +	default: /* 0 or other, Success */
>>>> +		return 0;
>>>> +	}
>>>> +}
>>>> +
>>>> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace
>>>> +*pcc_subspace) {
>>>> +	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace-
>>>> comm_addr;
>>>> +	u32 cap_status;
>>>> +	u16 status;
>>>> +	u32 rc;
>>>> +
>>>> +	/*
>>>> +	 * As per ACPI spec, the PCC space will be initialized by
>>>> +	 * platform and should have set the command completion bit when
>>>> +	 * PCC can be used by OSPM.
>>>> +	 *
>>>> +	 * Poll PCC status register every 3us(delay_us) for maximum of
>>>> +	 * deadline_us(timeout_us) until PCC command complete bit is
>>> set(cond).
>>>> +	 */
>>>> +	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
>>>> +					status &
>>> PCC_STATUS_CMD_COMPLETE, 3,
>>>> +					pcc_subspace->deadline_us);
>>>> +	if (rc) {
>>>> +		pr_warn("PCC check channel failed for : %d rc=%d\n",
>>>> +			pcc_subspace->pcc_id, rc);
>>>> +		return rc;
>>>> +	}
>>>> +
>>>> +	if (status & PCC_STATUS_ERROR) {
>>>> +		cap_status = readw_relaxed(&gen_comm_base-
>>>> set_caps_status);
>>>> +		rc = ras2_report_cap_error(cap_status);
>>>> +
>>>> +		status &= ~PCC_STATUS_ERROR;
>>>> +		writew_relaxed(status, &gen_comm_base->status);
>>>> +		return rc;
>>>> +	}
>>>> +
>>>> +	if (status & PCC_STATUS_CMD_COMPLETE)
>>>> +		return 0;
>>>> +
>>>> +	return -EIO;
>>>> +}
>>>
>>> We still have an outstanding problem. This may sound familiar.
>>>
>>> If a user specifies an invalid address, our firmware will set an
>>> error code in the set_caps_status field of the acpi_ras2_shmem
>>> structure. In our case, the error code is ACPI_RAS2_INVALID_DATA, and
>>> the user will observe an EINVAL. This is expected.
>>>
>>> However, if the user then subsequently attempts to write a VALID
>>> address, ras2_get_patrol_scrub_running will indirectly call
>>> ras2_check_pcc_chan using the previously INVALID address to determine if
>the scrubber is still running.
>>> Unfortunately, the INVALID address causes
>>> ras2_get_patrol_scrub_running to fail, therefore preventing the user
>>> from specifying a VALID address after specifying an INVALID address.
>>>
>>> The only way to move forward from this inescapable condition is to
>>> reboot the system.
>>>
>>> Here is a demo of the problem as I roughly see it on our system (I've
>>> labeled the line numbers for sake of discussion):
>>> 1  [root@myhost scrub0]# echo 0x100000000 > size
>>> 2  [root@myhost scrub0]# echo 0x1f00000000 > addr
>>> 3  [root@myhost scrub0]# echo 0xcf00000000 > addr
>>> 4  write error: Invalid argument
>>> 5  [  214.446338] PCCT PCCT: Failed to start demand scrubbing
>>> 6  [root@myhost scrub0]# echo 0x1f00000000 > addr
>>> 7  write error: Invalid argument
>>> 8  [  242.263909] PCCT PCCT: failed to read parameters
>>> 9  [root@myhost scrub0]# echo 0x100000000 > size
>>> 10 write error: Invalid argument
>>> 11 [  246.190196] PCCT PCCT: failed to read parameters
>>>
>>> The upper most memory address on this system is 0xbf00000000. Line 1
>>> and 2 use valid values, and line 2 produces the expected results. On
>>> line 3, I've specified an INVALID address (outside of valid range).
>>> The error on line 5 is expected after executing the
>>> START_PATROL_SCRUBBER command with an INVALID address.
>>>
>>> Line 6 show how I attempt to specify a VALID address. Unfortunately,
>>> ras2_get_patrol_scrub_running encounters and error after executing
>>> GET_PATROL_PARAMETERS because it used the OLD INVALID values in
>>> ps_sm-
>>>> params.req_addr_range. Line 7 and 8 are the result. Since the flow
>>>> of
>>> execution if aborted at this point, you can never rectify the
>>> situation and insert a valid value into ps_sm->params.req_addr_range, unless
>you reboot the system.
>>>
>>> One half baked solution to this problem, is to modify
>>> ras2_get_patrol_scrub_running so that if there is a non-zero address
>>> or size specified, AND the last error code we received was INVALID
>>> DATA, then assume the scrubber is NOT running.
>> Hi Daniel,
>>
>> Thanks for reporting the issue.
>> Can you check whether following change fix the issue in your test setup?
>> =============================================
>> diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c index
>> 4d9cfd3bdf45..ff4aa1b75860 100644
>> --- a/drivers/ras/acpi_ras2.c
>> +++ b/drivers/ras/acpi_ras2.c
>> @@ -255,6 +255,13 @@ static int ras2_hw_scrub_write_addr(struct device
>*dev, void *drv_data, u64 base
>>         ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
>>         if (ret) {
>>                 dev_err(ras2_ctx->dev, "Failed to start demand
>> scrubbing\n");
>> +               if (ret == -EPERM || ret == -EINVAL) {
>> +                       ps_sm->params.req_addr_range[0] = 0;
>> +                       ps_sm->params.req_addr_range[1] = 0;
>> +                       ras2_ctx->base = 0;
>> +                       ras2_ctx->size = 0;
>> +                       ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
>> +               }
>>                 return ret;
>>         }
>>         ras2_ctx->od_scrub_sts = OD_SCRUB_STS_ACTIVE;
>> ==============================================
>> Thanks,
>> Shiju
>
>We're happy! with this fix.
>
>For this to work, we had to no-op the START_PATROL_SCRUBBER and
>GET_PATROL_PARAMETERS commands when base and size are equal to zero.
>Previously, we returned INVALID DATA when base and size were zero.

Thanks Daniel for verifying the changes.

For demand scrubbing, kernel does not send START_PATROL_SCRUBBER command
when size is zero.
However GET_PATROL_PARAMETERS command from ras2_update_patrol_scrub_params_cache()
is valid case when base and size are equal to zero.

>
>Maybe we should amend the ACPI spec with this special knowledge.
>
>Anyways; as of now, with this fix, this driver will work out of the box on our
>systems.
  
Please tag V6 sent with fix.
https://lore.kernel.org/all/20250516132205.789-1-shiju.jose@huawei.com/
 
>
>Thanks a lot,
>~Daniel
>>
>>>
>>> Regards,
>>> ~Daniel
>

Thanks,
Shiju



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-05-16 13:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-07 21:43 [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-05-14  2:55   ` Daniel Ferguson
2025-05-14 11:31     ` Shiju Jose
2025-05-16  1:49       ` Daniel Ferguson
2025-05-16 13:36         ` Shiju Jose
2025-05-07 21:43 ` [PATCH v5 2/2] ras: mem: Add memory " shiju.jose
2025-05-12  8:16 ` [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
2025-05-12  8:38   ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).