From: Daniel Ferguson <danielf@os.amperecomputing.com>
To: shiju.jose@huawei.com, linux-edac@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
Cc: bp@alien8.de, rafael@kernel.org, tony.luck@intel.com,
lenb@kernel.org, leo.duran@amd.com, Yazen.Ghannam@amd.com,
mchehab@kernel.org, jonathan.cameron@huawei.com,
linux-mm@kvack.org, linuxarm@huawei.com, rientjes@google.com,
jiaqiyan@google.com, Jon.Grimm@amd.com,
dave.hansen@linux.intel.com, naoya.horiguchi@nec.com,
james.morse@arm.com, jthoughton@google.com,
somasundaram.a@hpe.com, erdemaktas@google.com, pgonda@google.com,
duenwen@google.com, gthelen@google.com,
wschwartz@amperecomputing.com, dferguson@amperecomputing.com,
wbs@os.amperecomputing.com, nifan.cxl@gmail.com,
tanxiaofei@huawei.com, prime.zeng@hisilicon.com,
roberto.sassu@huawei.com, kangkang.shen@futurewei.com,
wanghuiqiang@huawei.com
Subject: Re: [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver
Date: Tue, 13 May 2025 19:55:08 -0700 [thread overview]
Message-ID: <8cdf7885-31b3-4308-8a7c-f4e427486429@os.amperecomputing.com> (raw)
In-Reply-To: <20250507214344.709-2-shiju.jose@huawei.com>
> +static int ras2_report_cap_error(u32 cap_status)
> +{
> + switch (cap_status) {
> + case ACPI_RAS2_NOT_VALID:
> + case ACPI_RAS2_NOT_SUPPORTED:
> + return -EPERM;
> + case ACPI_RAS2_BUSY:
> + return -EBUSY;
> + case ACPI_RAS2_FAILED:
> + case ACPI_RAS2_ABORTED:
> + case ACPI_RAS2_INVALID_DATA:
> + return -EINVAL;
> + default: /* 0 or other, Success */
> + return 0;
> + }
> +}
> +
> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
> +{
> + struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr;
> + u32 cap_status;
> + u16 status;
> + u32 rc;
> +
> + /*
> + * As per ACPI spec, the PCC space will be initialized by
> + * platform and should have set the command completion bit when
> + * PCC can be used by OSPM.
> + *
> + * Poll PCC status register every 3us(delay_us) for maximum of
> + * deadline_us(timeout_us) until PCC command complete bit is set(cond).
> + */
> + rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
> + status & PCC_STATUS_CMD_COMPLETE, 3,
> + pcc_subspace->deadline_us);
> + if (rc) {
> + pr_warn("PCC check channel failed for : %d rc=%d\n",
> + pcc_subspace->pcc_id, rc);
> + return rc;
> + }
> +
> + if (status & PCC_STATUS_ERROR) {
> + cap_status = readw_relaxed(&gen_comm_base->set_caps_status);
> + rc = ras2_report_cap_error(cap_status);
> +
> + status &= ~PCC_STATUS_ERROR;
> + writew_relaxed(status, &gen_comm_base->status);
> + return rc;
> + }
> +
> + if (status & PCC_STATUS_CMD_COMPLETE)
> + return 0;
> +
> + return -EIO;
> +}
We still have an outstanding problem. This may sound familiar.
If a user specifies an invalid address, our firmware will set an error code in
the set_caps_status field of the acpi_ras2_shmem structure. In our case, the
error code is ACPI_RAS2_INVALID_DATA, and the user will observe an EINVAL. This
is expected.
However, if the user then subsequently attempts to write a VALID address,
ras2_get_patrol_scrub_running will indirectly call ras2_check_pcc_chan using the
previously INVALID address to determine if the scrubber is still running.
Unfortunately, the INVALID address causes ras2_get_patrol_scrub_running to fail,
therefore preventing the user from specifying a VALID address after specifying
an INVALID address.
The only way to move forward from this inescapable condition is to reboot the
system.
Here is a demo of the problem as I roughly see it on our system (I've labeled
the line numbers for sake of discussion):
1 [root@myhost scrub0]# echo 0x100000000 > size
2 [root@myhost scrub0]# echo 0x1f00000000 > addr
3 [root@myhost scrub0]# echo 0xcf00000000 > addr
4 write error: Invalid argument
5 [ 214.446338] PCCT PCCT: Failed to start demand scrubbing
6 [root@myhost scrub0]# echo 0x1f00000000 > addr
7 write error: Invalid argument
8 [ 242.263909] PCCT PCCT: failed to read parameters
9 [root@myhost scrub0]# echo 0x100000000 > size
10 write error: Invalid argument
11 [ 246.190196] PCCT PCCT: failed to read parameters
The upper most memory address on this system is 0xbf00000000. Line 1 and 2 use
valid values, and line 2 produces the expected results. On line 3, I've
specified an INVALID address (outside of valid range). The error on line 5 is
expected after executing the START_PATROL_SCRUBBER command with an INVALID address.
Line 6 show how I attempt to specify a VALID address. Unfortunately,
ras2_get_patrol_scrub_running encounters and error after executing
GET_PATROL_PARAMETERS because it used the OLD INVALID values in
ps_sm->params.req_addr_range. Line 7 and 8 are the result. Since the flow of
execution if aborted at this point, you can never rectify the situation and
insert a valid value into ps_sm->params.req_addr_range, unless you reboot
the system.
One half baked solution to this problem, is to modify
ras2_get_patrol_scrub_running so that if there is a non-zero address or size
specified, AND the last error code we received was INVALID DATA, then assume the
scrubber is NOT running.
Regards,
~Daniel
next prev parent reply other threads:[~2025-05-14 2:55 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-07 21:43 [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table shiju.jose
2025-05-07 21:43 ` [PATCH v5 1/2] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-05-14 2:55 ` Daniel Ferguson [this message]
2025-05-14 11:31 ` Shiju Jose
2025-05-16 1:49 ` Daniel Ferguson
2025-05-16 13:36 ` Shiju Jose
2025-05-07 21:43 ` [PATCH v5 2/2] ras: mem: Add memory " shiju.jose
2025-05-12 8:16 ` [PATCH v5 0/2] ACPI: Add support for ACPI RAS2 feature table Jonathan Cameron
2025-05-12 8:38 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8cdf7885-31b3-4308-8a7c-f4e427486429@os.amperecomputing.com \
--to=danielf@os.amperecomputing.com \
--cc=Jon.Grimm@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dferguson@amperecomputing.com \
--cc=duenwen@google.com \
--cc=erdemaktas@google.com \
--cc=gthelen@google.com \
--cc=james.morse@arm.com \
--cc=jiaqiyan@google.com \
--cc=jonathan.cameron@huawei.com \
--cc=jthoughton@google.com \
--cc=kangkang.shen@futurewei.com \
--cc=lenb@kernel.org \
--cc=leo.duran@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxarm@huawei.com \
--cc=mchehab@kernel.org \
--cc=naoya.horiguchi@nec.com \
--cc=nifan.cxl@gmail.com \
--cc=pgonda@google.com \
--cc=prime.zeng@hisilicon.com \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=roberto.sassu@huawei.com \
--cc=shiju.jose@huawei.com \
--cc=somasundaram.a@hpe.com \
--cc=tanxiaofei@huawei.com \
--cc=tony.luck@intel.com \
--cc=wanghuiqiang@huawei.com \
--cc=wbs@os.amperecomputing.com \
--cc=wschwartz@amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox