linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Ferguson <danielf@os.amperecomputing.com>
To: shiju.jose@huawei.com, rafael@kernel.org,
	linux-edac@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-doc@vger.kernel.org, bp@alien8.de, tony.luck@intel.com,
	lenb@kernel.org, leo.duran@amd.com, Yazen.Ghannam@amd.com,
	mchehab@kernel.org
Cc: jonathan.cameron@huawei.com, linux-mm@kvack.org,
	linuxarm@huawei.com, rientjes@google.com, jiaqiyan@google.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	gthelen@google.com, wschwartz@amperecomputing.com,
	dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
	nifan.cxl@gmail.com, tanxiaofei@huawei.com,
	prime.zeng@hisilicon.com, roberto.sassu@huawei.com,
	kangkang.shen@futurewei.com, wanghuiqiang@huawei.com,
	vanshikonda@os.amperecomputing.com
Subject: Re: [PATCH v9 2/2] ras: mem: Add memory ACPI RAS2 driver
Date: Fri, 25 Jul 2025 09:36:18 -0700	[thread overview]
Message-ID: <547ed8fb-d6b7-4b6b-a38b-bf13223971b1@os.amperecomputing.com> (raw)
In-Reply-To: <20250617161417.1681-3-shiju.jose@huawei.com>


> +static int ras2_hw_scrub_write_addr(struct device *dev, void *drv_data, u64 base)
> +{
> +	struct ras2_mem_ctx *ras2_ctx = drv_data;
> +	struct acpi_ras2_ps_shared_mem __iomem *ps_sm =
> +		TO_ACPI_RAS2_PS_SHMEM(ras2_ctx->comm_addr);
> +	bool running;
> +	int ret;
> +
> +	if (ras2_ctx->bg_scrub)
> +		return -EBUSY;
> +
> +	guard(mutex)(ras2_ctx->pcc_lock);
> +	ps_sm->common.set_caps[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
> +
> +	if (!ras2_ctx->size) {
> +		dev_warn(ras2_ctx->dev,
> +			 "%s: Invalid address range, base=0x%llx size=0x%llx\n",
> +			 __func__, base, ras2_ctx->size);
> +		return -ERANGE;
> +	}
> +
> +	ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
> +	if (ret)
> +		return ret;
> +
> +	if (running)
> +		return -EBUSY;
> +
> +	ras2_ctx->base = base;
> +	ps_sm->params.scrub_params_in &= ~RAS2_PS_SC_HRS_IN_MASK;
> +	ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PS_SC_HRS_IN_MASK,
> +						    ras2_ctx->scrub_cycle_hrs);
> +	ps_sm->params.req_addr_range[0] = ras2_ctx->base;
> +	ps_sm->params.req_addr_range[1] = ras2_ctx->size;
> +	ps_sm->params.scrub_params_in &= ~RAS2_PS_EN_BACKGROUND;
> +	ps_sm->params.command = RAS2_START_PATROL_SCRUBBER;
> +
> +	ret = ras2_send_pcc_cmd(ras2_ctx, PCC_CMD_EXEC_RAS2);
> +	if (ret) {
> +		dev_err(ras2_ctx->dev, "Failed to start demand scrubbing rc(%d)\n", ret);
> +		if (ret != -EBUSY) {
> +			ps_sm->params.req_addr_range[0] = 0;
> +			ps_sm->params.req_addr_range[1] = 0;
> +			ras2_ctx->base = 0;
> +			ras2_ctx->size = 0;
> +			ras2_ctx->od_scrub_sts = OD_SCRUB_STS_IDLE;
> +		}
> +		return ret;
> +	}
> +	ras2_ctx->od_scrub_sts = OD_SCRUB_STS_ACTIVE;
> +
> +	return ras2_update_patrol_scrub_params_cache(ras2_ctx);
> +}

After a lot more discussion within Ampere, I've come to believe that using
base==0 && size==0 as a special case, when invoking GET_PATROL_PARAMETERS after
an error, is not a good idea.

There are three problems with the current solution.

1) This is undefined behavior and will likely behave differently on different
implementations. For example, 0 is an invalid physical address of the Ampere
systems, could be a valid address on some systems, or could be device memory on
another. In case it is a valid DDR address, the firmware can fill in the Scrub
Parameters, Extended Data Region (ACPI 6.6) with proper information. But in case
it is an invalid address, or device memory, the firmware has to indicate an
error to prevent the OS from consuming bad data reported through these output
parameters.

2) Pretend you are using acpi_ras2_mem0/scrub1, which corresponds to the 2nd
NUMA node in the system. If we use the current OS driver implementation as-is
and zero out base and size after an error, then when we issue the next
GET_PATROL_PARAMETERS we will have retrieved the parameters corresponding to the
wrong NUMA node. The driver will have fetched the parameters for NUMA node 1,
even though we are interacting with the scrubber corresponding to NUMA node 2.
This could be problematic in a situation where patrol parameters differ across
different NUMA nodes. Additionally, and perhaps most importantly, using the
GET_PATROL_PARAMETERS to determine if a Scrubber is running require the correct
address range. The current implementation is not getting the running status of
the correct range.

3) This is a special case of #2. During driver load, ras2_probe issues a
GET_PATROL_PARAMETERS using a base and size equal to 0. The base and size are
zero because requested_address_range is allocated with kzalloc. In this case, we
may not be getting the initial values from the correct range.

Proposed Solution:
What we propose, is to instead of zeroing out the base and size after an error,
use the full range of the current NUMA node. We believe that a superset of a
currently active scrub range can properly report all the relevant and correct
information.
To be compliant with the specification, FW should set "Flags" field if there is
any on-demand scrub in progress on any memory range in the NUMA node. Again,
this solution assumes that the driver does not allow more than one scrubber to
run within a single NUMA node. All three problems can be solved in the same way.

What do you think?

Regards,
~Daniel

  reply	other threads:[~2025-07-25 16:36 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-17 16:14 [PATCH v9 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-06-17 16:14 ` [PATCH v9 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-06-17 16:14 ` [PATCH v9 2/2] ras: mem: Add memory " shiju.jose
2025-07-25 16:36   ` Daniel Ferguson [this message]
2025-08-01 17:28     ` Shiju Jose
2025-08-01 17:33       ` Shiju Jose
2025-07-16 14:28 ` [PATCH v9 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
2025-07-16 14:42   ` Rafael J. Wysocki
2025-07-16 15:56     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=547ed8fb-d6b7-4b6b-a38b-bf13223971b1@os.amperecomputing.com \
    --to=danielf@os.amperecomputing.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vanshikonda@os.amperecomputing.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).