From: Dave Jiang <dave.jiang@intel.com>
To: shiju.jose@huawei.com, linux-cxl@vger.kernel.org,
dan.j.williams@intel.com, jonathan.cameron@huawei.com,
dave@stgolabs.net, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com
Cc: linux-edac@vger.kernel.org, linux-doc@vger.kernel.org,
bp@alien8.de, tony.luck@intel.com, lenb@kernel.org,
leo.duran@amd.com, Yazen.Ghannam@amd.com, mchehab@kernel.org,
nifan.cxl@gmail.com, linuxarm@huawei.com, tanxiaofei@huawei.com,
prime.zeng@hisilicon.com, roberto.sassu@huawei.com,
kangkang.shen@futurewei.com, wanghuiqiang@huawei.com
Subject: Re: [PATCH v3 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
Date: Mon, 28 Apr 2025 10:45:08 -0700
Message-ID: <2df68c68-f1a8-4327-abc9-d265326c133d@intel.com>
In-Reply-To: <20250407174920.625-2-shiju.jose@huawei.com>
On 4/7/25 10:49 AM, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Update the Documentation/edac/scrub.rst to include usecases and
> policies for CXL memory device-based, CXL region-based patrol scrub
> control and CXL Error Check Scrub (ECS).
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> Documentation/edac/scrub.rst | 75 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 75 insertions(+)
>
> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> index daab929cdba1..6132853a02fe 100644
> --- a/Documentation/edac/scrub.rst
> +++ b/Documentation/edac/scrub.rst
> @@ -264,3 +264,78 @@ Sysfs files are documented in
> `Documentation/ABI/testing/sysfs-edac-scrub`
>
> `Documentation/ABI/testing/sysfs-edac-ecs`
> +
> +Examples
> +--------
> +
> +The usage takes the form shown in these examples:
> +
> +1. CXL memory Patrol Scrub
> +
> +The following are the usecases identified why we might increase the scrub rate.
> +
> +- Scrubbing is needed at device granularity because a device is showing
> + unexpectedly high errors, the scrub control needs to be at device
> + granularity
I'm not sure what the second part of the sentence has to do with defining the use case.
When the per-device control is detailed in section 1.1, you can refer back to the first use case.
> +
> +- Scrubbing may apply to memory that isn't online at all yet.Likely this
Missing space after the period.
> + is setting system wide defaults on boot.
Suggest: "is a system-wide default setting on boot."
> +
> +- Scrubbing at higher rate because software has decided that we want
> + more reliability for particular data, calling this Differentiated
> + Reliability. That data sits in a region which may cover part of multiple
> + devices. The region interfaces are about supporting this use case.
Please consider:
Scrubbing at a higher rate because the monitor software has determined that
more reliability is necessary for a particular data set. This is called
Differentiated Reliability.
The last sentence is not needed. The third use case can be referred to when describing
region scrubbing in section 1.2.
> +
> +1.1. Device based scrubbing
> +
> +CXL memory is exposed to memory management subsystem and ultimately userspace
> +via CXL devices.
> +
> +When combining control via the device interfaces and region interfaces see
> +1.2 Region bases scrubbing.
"see section 1.2 ..."
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub`
> +
> +1.2. Region based scrubbing
> +
> +CXL memory is exposed to memory management subsystem and ultimately userspace
> +via CXL regions. CXL Regions represent mapped memory capacity in system
> +physical address space. These can incorporate one or more parts of multiple CXL
> +memory devices with traffic interleaved across them. The user may want to control
> +the scrub rate via this more abstract region instead of having to figure out the
> +constituent devices and program them separately. The scrub rate for each device
> +covers the whole device. Thus if multiple regions use parts of that device then
> +requests for scrubbing of other regions may result in a higher scrub rate than
> +requested for this specific region.
> +
> +Userspace must follow below set of rules on how to set the scrub rates for any
> +mixture of requirements.
> +
> +1. Taking each region in turn from lowest desired scrub rate to highest and set
> + their scrub rates. Later regions may override the scrub rate on individual
> + devices (and hence potentially whole regions).
> +
> +2. Take each device for which enhanced scrubbing is required (higher rate) and
> + set those scrub rates. This will override the scrub rates of individual devices
> + leaving any that are not specifically set to scrub at the maximum rate required
> + for any of the regions they are involved in backing.
I'm having trouble understanding what the second part of this sentence is attempting to convey.
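For what it's worth, here is how I currently read the two rules, as a rough Python sketch. Everything in it is an assumption for illustration: the function name, the region and device names, and the use of an abstract "rate" number where larger means more frequent scrubbing. It only models the ordering and does not touch sysfs.

```python
def apply_scrub_policy(regions, region_rates, device_overrides):
    """Model the two-step ordering described in the quoted rules.

    regions:          region name -> list of backing device names (hypothetical)
    region_rates:     region name -> desired rate (abstract; larger = more frequent)
    device_overrides: device name -> desired rate for enhanced scrubbing
    Returns a dict of device name -> resulting rate.
    """
    device_rate = {}

    # Rule 1: walk regions from lowest to highest desired rate. A device's
    # rate covers the whole device, so a later (higher) region overrides
    # whatever an earlier (lower) region set on a shared device.
    for region in sorted(region_rates, key=region_rates.get):
        for dev in regions[region]:
            device_rate[dev] = region_rates[region]

    # Rule 2: apply per-device requests last so they win over region settings.
    for dev, rate in device_overrides.items():
        device_rate[dev] = rate

    return device_rate


# Two regions sharing mem1, plus one device that needs enhanced scrubbing.
regions = {"region0": ["mem0", "mem1"], "region1": ["mem1", "mem2"]}
region_rates = {"region0": 1, "region1": 4}
device_overrides = {"mem0": 8}

result = apply_scrub_policy(regions, region_rates, device_overrides)
print(result)
```

In this reading, mem1 and mem2 are never set individually, so they end up at the highest rate requested by any region they back, while mem0 takes its explicit per-device value. If that is the intent of the second half of the sentence, stating it that plainly would help.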
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub`
> +
> +2. CXL memory Error Check Scrub (ECS)
> +
> +The Error Check Scrub (ECS) feature enables a memory device to perform error
> +checking and correction (ECC) and count single-bit errors. The associated
> +memory controller triggers the ECS mode with a trigger sent to the memory
> +device. However, CXL ECS control allows the user to change the attributes
> +for error count mode and threshold for reporting errors and reset the ECS
CXL ECS control allows the user to change the attributes for error count mode,
the threshold for reporting errors, and reset the ECS counter.
I think that's where the commas should go to make the sentence clearer.
> +counter only. Thus, the scope of start Error Check Scrub on a memory device
> +lies within a memory controller or platform when it is detecting unexpectedly
> +high errors. Userspace allows to control the error count mode, threshold
> +number of errors for a segment count indicating a number of segments
> +having at least a threshold number of errors and reset the ECS counter.
Need a comma before 'and', although the middle part is excessively long and hard to digest.
Please consider rephrasing.
> +
> +Sysfs files for scrubbing are documented in
> +`Documentation/ABI/testing/sysfs-edac-ecs`