From: Shiju Jose <shiju.jose@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"leo.duran@amd.com" <leo.duran@amd.com>,
	"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	Linuxarm <linuxarm@huawei.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	Roberto Sassu <roberto.sassu@huawei.com>,
	"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>
Subject: RE: [PATCH v3 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature
Date: Thu, 1 May 2025 09:22:25 +0000
Message-ID: <29baba1b74af40c58c905b946680557d@huawei.com>
In-Reply-To: <2df68c68-f1a8-4327-abc9-d265326c133d@intel.com>

>-----Original Message-----
>From: Dave Jiang <dave.jiang@intel.com>
>Sent: 28 April 2025 18:45
>To: Shiju Jose <shiju.jose@huawei.com>; linux-cxl@vger.kernel.org;
>dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; dave@stgolabs.net;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com
>Cc: linux-edac@vger.kernel.org; linux-doc@vger.kernel.org; bp@alien8.de;
>tony.luck@intel.com; lenb@kernel.org; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; mchehab@kernel.org; nifan.cxl@gmail.com;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>; Roberto Sassu
><roberto.sassu@huawei.com>; kangkang.shen@futurewei.com; wanghuiqiang
><wanghuiqiang@huawei.com>
>Subject: Re: [PATCH v3 1/8] EDAC: Update documentation for the CXL memory
>patrol scrub control feature
>
>
>
>On 4/7/25 10:49 AM, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Update the Documentation/edac/scrub.rst to include usecases and
>> policies for CXL memory device-based, CXL region-based patrol scrub
>> control and CXL Error Check Scrub (ECS).
>>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  Documentation/edac/scrub.rst | 75 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 75 insertions(+)
>>
>> diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
>> index daab929cdba1..6132853a02fe 100644
>> --- a/Documentation/edac/scrub.rst
>> +++ b/Documentation/edac/scrub.rst
>> @@ -264,3 +264,78 @@ Sysfs files are documented in
>> `Documentation/ABI/testing/sysfs-edac-scrub`
>>
>>  `Documentation/ABI/testing/sysfs-edac-ecs`
>> +
>> +Examples
>> +--------
>> +
>> +The usage takes the form shown in these examples:
>> +
>> +1. CXL memory Patrol Scrub
>> +
>> +The following are the usecases identified why we might increase the scrub rate.
>> +
>> +- Scrubbing is needed at device granularity because a device is showing
>> +  unexpectedly high errors, the scrub control needs to be at device
>> +  granularity
>
>Not sure what the second part of the sentence has to do with defining the use
>case.
>When the per device control is detailed in 1.1, you can refer to the first use case.

Hi Dave,

Thanks for the comments.
Sure. I will correct.
>
>> +
>> +- Scrubbing may apply to memory that isn't online at all yet.Likely this
>space after period
>
>> +  is setting system wide defaults on boot.
>
>is a system wide default setting on boot.

Will update.
>
>> +
>> +- Scrubbing at higher rate because software has decided that we want
>> +  more reliability for particular data, calling this Differentiated
>> +  Reliability.  That data sits in a region which may cover part of multiple
>> +  devices. The region interfaces are about supporting this use case.
>
>Please consider:
>Scrubbing at a higher rate because the monitor software has determined that
>more reliability is necessary for a particular data set. This is called
>Differentiated Reliability.
Will update.
>
>The last sentence is not needed. When describing region scrubbing in 1.2, the
>third use case can be referred to.

Will do.
>
>> +
>> +1.1. Device based scrubbing
>> +
>> +CXL memory is exposed to memory management subsystem and ultimately
>> +userspace via CXL devices.
>> +
>> +When combining control via the device interfaces and region interfaces see
>> +1.2 Region bases scrubbing.
>
>"see section 1.2 ..."
Ok.
>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub`
>> +
>> +1.2. Region based scrubbing
>> +
>> +CXL memory is exposed to memory management subsystem and ultimately
>> +userspace via CXL regions. CXL Regions represent mapped memory
>> +capacity in system physical address space. These can incorporate one
>> +or more parts of multiple CXL memory devices with traffic interleaved
>> +across them. The user may want to control the scrub rate via this
>> +more abstract region instead of having to figure out the constituent
>> +devices and program them separately. The scrub rate for each device
>> +covers the whole device. Thus if multiple regions use parts of that
>> +device then requests for scrubbing of other regions may result in a higher scrub rate than requested for this specific region.
>> +
>> +Userspace must follow below set of rules on how to set the scrub
>> +rates for any mixture of requirements.
>> +
>> +1. Taking each region in turn from lowest desired scrub rate to highest and set
>> +   their scrub rates. Later regions may override the scrub rate on individual
>> +   devices (and hence potentially whole regions).
>> +
>> +2. Take each device for which enhanced scrubbing is required (higher rate) and
>> +   set those scrub rates. This will override the scrub rates of individual devices
>
>> +   leaving any that are not specifically set to scrub at the maximum rate required
>> +   for any of the regions they are involved in backing.
>
>I'm having trouble understanding what the second part of this sentence is
>attempting to convey.
Will rephrase the sentence.
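
As an illustration only (not part of the patch), the ordering in the two
rules quoted above can be modelled in a few lines of Python. The function
name, the dictionary layout, and the region and device names are all
hypothetical. Rates are abstract numbers where larger means more frequent
scrubbing, and no sysfs access is performed.

```python
def plan_scrub_rates(region_rates, region_devices, device_rates=None):
    """Model the final per-device scrub rate after applying both rules.

    region_rates:   {region name: desired scrub rate}
    region_devices: {region name: list of devices backing that region}
    device_rates:   {device name: enhanced scrub rate}, optional
    """
    effective = {}

    # Rule 1: walk the regions from lowest to highest desired rate.
    # A later (higher-rate) region overrides the rate on shared devices.
    for region in sorted(region_rates, key=region_rates.get):
        for dev in region_devices[region]:
            effective[dev] = region_rates[region]

    # Rule 2: devices that need enhanced scrubbing are set last, so
    # their explicit rate wins over anything set through a region.
    for dev, rate in (device_rates or {}).items():
        effective[dev] = rate

    return effective


# Hypothetical layout: mem1 is shared by both regions.
rates = plan_scrub_rates(
    region_rates={"region0": 1, "region1": 4},
    region_devices={"region0": ["mem0", "mem1"],
                    "region1": ["mem1", "mem2"]},
    device_rates={"mem0": 8},
)
print(rates)
```

In this model mem1 ends up at rate 4 even though region0 only asked for
rate 1, which is the "higher scrub rate than requested" effect described
in section 1.2. Devices without an explicit setting are left at the
maximum rate required by any region they back.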

>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub`
>> +
>> +2. CXL memory Error Check Scrub (ECS)
>> +
>> +The Error Check Scrub (ECS) feature enables a memory device to
>> +perform error checking and correction (ECC) and count single-bit
>> +errors. The associated memory controller triggers the ECS mode with a
>> +trigger sent to the memory device. However, CXL ECS control allows
>> +the user to change the attributes for error count mode and threshold
>> +for reporting errors and reset the ECS
>
>CXL ECS control allows the user to change the attributes for error count mode,
>the threshold for reporting errors, and reset the ECS counter.
>
>I think that's where the commas should go to make the sentence clearer.

Will correct.
>
>> +counter only. Thus, the scope of start Error Check Scrub on a memory
>> +device lies within a memory controller or platform when it is
>> +detecting unexpectedly high errors. Userspace allows to control the
>> +error count mode, threshold number of errors for a segment count
>> +indicating a number of segments having at least a threshold number of errors and reset the ECS counter.
>
>Need a comma before 'and'. Although the middle part is excessively long and
>hard to digest.
>Please consider rephrasing.
Sure.
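
For illustration only, the "segment count" wording in the quoted text can
be read as the small Python model below. The function name and the sample
per-segment error counts are made up for the example. In practice the
counting is done by the memory device, not by software.

```python
def segments_over_threshold(errors_per_segment, threshold):
    """Count the segments that have at least `threshold` errors."""
    return sum(1 for count in errors_per_segment if count >= threshold)


# Hypothetical error counts for five segments, threshold of 16 errors.
print(segments_over_threshold([0, 3, 16, 64, 70], 16))
```

Under this reading, userspace chooses the threshold and the count mode
and can reset the counter, while starting ECS remains with the memory
controller or platform.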
>
>> +
>> +Sysfs files for scrubbing are documented in
>> +`Documentation/ABI/testing/sysfs-edac-ecs`
>

Thanks,
Shiju

Thread overview: 15+ messages
2025-04-07 17:49 [PATCH v3 0/8] cxl: support CXL memory RAS features shiju.jose
2025-04-07 17:49 ` [PATCH v3 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
2025-04-28 17:45   ` Dave Jiang
2025-05-01  9:22     ` Shiju Jose [this message]
2025-04-07 17:49 ` [PATCH v3 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
2025-04-28 18:22   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
2025-04-07 17:49 ` [PATCH v3 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
2025-04-28 20:14   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command shiju.jose
2025-04-07 17:49 ` [PATCH v3 6/8] cxl/edac: Support for finding memory operation attributes from the current boot shiju.jose
2025-04-28 20:53   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
2025-04-28 21:46   ` Dave Jiang
2025-04-07 17:49 ` [PATCH v3 8/8] cxl/edac: Add CXL memory device soft PPR " shiju.jose
