Linux CXL
From: Shiju Jose <shiju.jose@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	Linuxarm <linuxarm@huawei.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	Roberto Sassu <roberto.sassu@huawei.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>
Subject: RE: [PATCH v3 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
Date: Thu, 1 May 2025 09:22:25 +0000
Message-ID: <29baba1b74af40c58c905b946680557d@huawei.com>
In-Reply-To: <2df68c68-f1a8-4327-abc9-d265326c133d@intel.com>

>-----Original Message-----
>From: Dave Jiang <dave.jiang@intel.com>
>Sent: 28 April 2025 18:45
>To: Shiju Jose <shiju.jose@huawei.com>; linux-cxl@vger.kernel.org;
>dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; dave@stgolabs.net;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com
>Cc: linux-edac@vger.kernel.org; linux-doc@vger.kernel.org; bp@alien8.de;
>tony.luck@intel.com; lenb@kernel.org; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; mchehab@kernel.org; nifan.cxl@gmail.com;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>; Roberto Sassu
><roberto.sassu@huawei.com>; kangkang.shen@futurewei.com; wanghuiqiang
><wanghuiqiang@huawei.com>
>Subject: Re: [PATCH v3 1/8] EDAC: Update documentation for the CXL memory
>patrol scrub control feature
>
>
>
>On 4/7/25 10:49 AM, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Update the Documentation/edac/scrub.rst to include usecases and
>> policies for CXL memory device-based, CXL region-based patrol scrub
>> control and CXL Error Check Scrub (ECS).
>>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  Documentation/edac/scrub.rst | 75 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 75 insertions(+)
>>
>> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
>> index daab929cdba1..6132853a02fe 100644
>> --- a/Documentation/edac/scrub.rst
>> +++ b/Documentation/edac/scrub.rst
>> @@ -264,3 +264,78 @@ Sysfs files are documented in
>> `Documentation/ABI/testing/sysfs-edac-scrub`
>>
>>  `Documentation/ABI/testing/sysfs-edac-ecs`
>> +
>> +Examples
>> +--------
>> +
>> +The usage takes the form shown in these examples:
>> +
>> +1. CXL memory Patrol Scrub
>> +
>> +The following are the usecases identified why we might increase the scrub rate.
>> +
>> +- Scrubbing is needed at device granularity because a device is showing
>> +  unexpectedly high errors, the scrub control needs to be at device
>> +  granularity
>
>Not sure what the second part of the sentence has to do with defining the use case.
>When the per device control is detailed in 1.1, you can refer to the first use case.

Hi Dave,

Thanks for the comments.
Sure, I will correct it.
>
>> +
>> +- Scrubbing may apply to memory that isn't online at all yet.Likely this
>space after period
>
>> +  is setting system wide defaults on boot.
>
>is a system wide default setting on boot.

Will update.
>
>> +
>> +- Scrubbing at higher rate because software has decided that we want
>> +  more reliability for particular data, calling this Differentiated
>> +  Reliability.  That data sits in a region which may cover part of multiple
>> +  devices. The region interfaces are about supporting this use case.
>
>Please consider:
>Scrubbing at a higher rate because the monitor software has determined that
>more reliability is necessary for a particular data set. This is called
>Differentiated Reliability.
Will update.
>
>The last sentence is not needed. When describing region scrubbing in 1.2, the
>third use case can be referred to.

Will do.
>
>> +
>> +1.1. Device based scrubbing
>> +
>> +CXL memory is exposed to memory management subsystem and ultimately
>> +userspace via CXL devices.
>> +
>> +When combining control via the device interfaces and region interfaces see
>> +1.2 Region bases scrubbing.
>
>"see section 1.2 ..."
Ok.
>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub`
>> +
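For illustration only, a minimal Python sketch of device-based control, assuming the scrub attribute layout described in `Documentation/ABI/testing/sysfs-edac-scrub` (`/sys/bus/edac/devices/<dev-name>/scrubX/` with `min_cycle_duration`, `max_cycle_duration` and `current_cycle_duration`, in seconds). The helper `set_device_scrub_cycle()` and the device name `cxl_mem0` are hypothetical, and clamping to the reported range is a choice made in this sketch, not something the ABI mandates.

```python
from pathlib import Path

# Assumed layout, following Documentation/ABI/testing/sysfs-edac-scrub:
#   /sys/bus/edac/devices/<dev-name>/scrubX/{min,max,current}_cycle_duration
EDAC_ROOT = Path("/sys/bus/edac/devices")


def set_device_scrub_cycle(dev_name: str, seconds: int, scrub_idx: int = 0,
                           root: Path = EDAC_ROOT) -> int:
    """Write a scrub cycle duration (seconds) for one device's scrubber.

    A shorter cycle means a higher scrub rate. The requested value is
    clamped to the range the scrubber reports as supported, and the value
    actually written is returned.
    """
    scrub_dir = root / dev_name / f"scrub{scrub_idx}"
    lo = int((scrub_dir / "min_cycle_duration").read_text())
    hi = int((scrub_dir / "max_cycle_duration").read_text())
    value = max(lo, min(seconds, hi))
    (scrub_dir / "current_cycle_duration").write_text(f"{value}\n")
    return value
```

For example, `set_device_scrub_cycle("cxl_mem0", 3600)` would request a one-hour cycle, subject to the limits the device reports.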
>> +1.2. Region based scrubbing
>> +
>> +CXL memory is exposed to memory management subsystem and ultimately
>> +userspace via CXL regions. CXL Regions represent mapped memory
>> +capacity in system physical address space. These can incorporate one
>> +or more parts of multiple CXL memory devices with traffic interleaved
>> +across them. The user may want to control the scrub rate via this
>> +more abstract region instead of having to figure out the constituent
>> +devices and program them separately. The scrub rate for each device
>> +covers the whole device. Thus if multiple regions use parts of that
>> +device then requests for scrubbing of other regions may result in a higher scrub rate than requested for this specific region.
>> +
>> +Userspace must follow below set of rules on how to set the scrub
>> +rates for any mixture of requirements.
>> +
>> +1. Taking each region in turn from lowest desired scrub rate to highest and set
>> +   their scrub rates. Later regions may override the scrub rate on individual
>> +   devices (and hence potentially whole regions).
>> +
>> +2. Take each device for which enhanced scrubbing is required (higher rate) and
>> +   set those scrub rates. This will override the scrub rates of individual devices
>
>> +   leaving any that are not specifically set to scrub at the maximum rate required
>> +   for any of the regions they are involved in backing.
>
>I'm having trouble understanding what the second part of this sentence is
>attempting to convey.
Will rephrase the sentence.
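One way to read the two rules above, sketched in Python under assumed inputs: every request is expressed as a scrub cycle duration in seconds (shorter cycle, higher rate), region requests are written from the longest cycle to the shortest, and a region write programs every device backing that region. The helpers and names (`plan_scrub_writes()`, `effective_device_cycles()`, `region0`, `mem0`) are hypothetical; they only model the resulting per-device settings and do not touch sysfs.

```python
from typing import Dict, List, Tuple

# Hypothetical model: region name -> (requested cycle in seconds, backing devices).
Regions = Dict[str, Tuple[int, List[str]]]


def plan_scrub_writes(regions: Regions,
                      device_overrides: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order the writes as the rules describe: regions first, then devices."""
    # Rule 1: lowest desired rate first, i.e. longest cycle duration first.
    ordered = sorted(regions.items(), key=lambda kv: kv[1][0], reverse=True)
    writes = [(name, cycle) for name, (cycle, _devs) in ordered]
    # Rule 2: devices needing enhanced scrubbing are written last, so they win.
    writes.extend(device_overrides.items())
    return writes


def effective_device_cycles(regions: Regions,
                            device_overrides: Dict[str, int]) -> Dict[str, int]:
    """Replay the writes; a region write programs every device backing it.

    Assumes region names and device names do not collide.
    """
    result: Dict[str, int] = {}
    for target, cycle in plan_scrub_writes(regions, device_overrides):
        devs = regions[target][1] if target in regions else [target]
        for dev in devs:
            result[dev] = cycle  # later writes override earlier ones
    return result
```

With `region0` at 43200 s on `mem0`/`mem1`, `region1` at 3600 s on `mem1`/`mem2`, and an override of 1800 s on `mem0`, `mem1` ends up at 3600 s, i.e. scrubbed faster than `region0` asked for. A device that is not set explicitly keeps the highest rate requested by any region it backs.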

>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub`
>> +
>> +2. CXL memory Error Check Scrub (ECS)
>> +
>> +The Error Check Scrub (ECS) feature enables a memory device to
>> +perform error checking and correction (ECC) and count single-bit
>> +errors. The associated memory controller triggers the ECS mode with a
>> +trigger sent to the memory device. However, CXL ECS control allows
>> +the user to change the attributes for error count mode and threshold
>> +for reporting errors and reset the ECS
>
>CXL ECS control allows the user to change the attributes for error count mode,
>the threshold for reporting errors, and reset the ECS counter.
>
>I think that's where the commas should go to make the sentence clearer.

Will correct.
>
>> +counter only. Thus, the scope of start Error Check Scrub on a memory
>> +device lies within a memory controller or platform when it is
>> +detecting unexpectedly high errors. Userspace allows to control the
>> +error count mode, threshold number of errors for a segment count
>> +indicating a number of segments having at least a threshold number of errors and reset the ECS counter.
>
>Need a comma before 'and'. Although the middle part is excessively long and
>hard to digest.
>Please consider rephrase.
Sure.
>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-ecs`
>
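A minimal sketch of the ECS controls discussed above (error count mode, threshold, counter reset), assuming the per-FRU layout described in `Documentation/ABI/testing/sysfs-edac-ecs` (`/sys/bus/edac/devices/<dev-name>/ecs_fruX/` with `mode`, `threshold` and `reset` attributes). The helper `configure_ecs_fru()`, the device name `cxl_mem0` and the FRU index are hypothetical; the ABI document is the authority on which values each attribute accepts.

```python
from pathlib import Path
from typing import Optional

# Assumed layout, following Documentation/ABI/testing/sysfs-edac-ecs:
#   /sys/bus/edac/devices/<dev-name>/ecs_fruX/{mode,threshold,reset}
EDAC_ROOT = Path("/sys/bus/edac/devices")


def configure_ecs_fru(dev_name: str, fru: int, *, mode: Optional[int] = None,
                      threshold: Optional[int] = None, reset: bool = False,
                      root: Path = EDAC_ROOT) -> None:
    """Update the ECS controls of one media FRU.

    Attributes passed as None are left untouched; no value validation is
    done here, so the kernel decides whether a written value is accepted.
    """
    fru_dir = root / dev_name / f"ecs_fru{fru}"
    if mode is not None:
        (fru_dir / "mode").write_text(f"{mode}\n")
    if threshold is not None:
        (fru_dir / "threshold").write_text(f"{threshold}\n")
    if reset:
        # Writing 1 requests a reset of the ECS error counter.
        (fru_dir / "reset").write_text("1\n")
```

For example, `configure_ecs_fru("cxl_mem0", 0, threshold=1024, reset=True)` would set the threshold (assuming 1024 is among the supported values) and then clear the counter.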

Thanks,
Shiju

Thread overview: 15+ messages
2025-04-07 17:49 [PATCH v3 0/8] cxl: support CXL memory RAS features shiju.jose
2025-04-07 17:49 ` [PATCH v3 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
2025-04-28 17:45   ` Dave Jiang
2025-05-01  9:22     ` Shiju Jose [this message]
2025-04-07 17:49 ` [PATCH v3 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
2025-04-28 18:22   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
2025-04-07 17:49 ` [PATCH v3 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
2025-04-28 20:14   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command shiju.jose
2025-04-07 17:49 ` [PATCH v3 6/8] cxl/edac: Support for finding memory operation attributes from the current boot shiju.jose
2025-04-28 20:53   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
2025-04-28 21:46   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 8/8] cxl/edac: Add CXL memory device soft PPR " shiju.jose
