linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
@ 2025-05-30 12:28 Li Ming
  2025-05-30 18:27 ` Alison Schofield
  0 siblings, 1 reply; 9+ messages in thread
From: Li Ming @ 2025-05-30 12:28 UTC
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, shiju.jose
  Cc: linux-cxl, linux-kernel, Li Ming

Updating the scrub_cycle of a CXL region means updating the scrub_cycle of
each memdev in that region. The CXL driver must therefore guarantee that
the new scrub_cycle is greater than or equal to the min_scrub_cycle of
every memdev; otherwise the update fails (per Table 8-223 in CXL r3.2
section 8.2.10.9.11.1).

The current implementation computes a region's min_scrub_cycle by reading
the min_scrub_cycle of each memdev in the region and taking the smallest
of these values. The new scrub_cycle is checked against that value and, if
it passes, written to each memdev. The problem arises when the memdevs
report different min_scrub_cycle values: a new scrub_cycle can be greater
than the smallest min_scrub_cycle but less than the largest one. It passes
the region check, yet the update always fails on every memdev whose
min_scrub_cycle is greater than the new scrub_cycle.

The correct approach is to use the largest of the memdevs' min_scrub_cycle
values as the region's min_scrub_cycle. A new scrub_cycle that is greater
than or equal to this value is valid for every memdev in the region.
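A minimal userspace sketch (not driver code) makes the failure mode
concrete. The helper names and the example minimums of 12 and 24 hours are
hypothetical, and it assumes that a memdev rejects a scrub_cycle below its
own min_scrub_cycle. Note that the starting value depends on the operator:
U8_MAX is the identity for min() and 0 is the identity for max(), which is
why the patch also changes the initializer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-patch rule: start at UINT8_MAX and keep the smallest per-memdev minimum. */
static uint8_t region_min_cycle_old(const uint8_t *min_cycle, size_t n)
{
	uint8_t m = UINT8_MAX;

	for (size_t i = 0; i < n; i++)
		if (min_cycle[i] < m)
			m = min_cycle[i];
	return m;
}

/* Patched rule: start at 0 (identity for max) and keep the largest minimum. */
static uint8_t region_min_cycle_new(const uint8_t *min_cycle, size_t n)
{
	uint8_t m = 0;

	for (size_t i = 0; i < n; i++)
		if (min_cycle[i] > m)
			m = min_cycle[i];
	return m;
}

/* Count memdevs that would reject @cycle because it is below their own minimum. */
static size_t nr_rejecting_memdevs(const uint8_t *min_cycle, size_t n,
				   uint8_t cycle)
{
	size_t rejects = 0;

	for (size_t i = 0; i < n; i++)
		if (cycle < min_cycle[i])
			rejects++;
	return rejects;
}

/* Region-level check: accept @cycle only if it is not below the region minimum. */
static bool region_accepts(uint8_t region_min, uint8_t cycle)
{
	return cycle >= region_min;
}
```

With minimums of 12 and 24 hours, a requested cycle of 18 hours passes the
old region check (18 >= 12) but is rejected by the memdev whose minimum is
24, while the patched rule reports 24 and rejects the request at the region
level.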

This change also affects cxl_patrol_scrub_get_min_scrub_cycle(). Before
the change, it returned the smallest min_scrub_cycle among the memdevs in
the region; with the change, it returns the largest.

Signed-off-by: Li Ming <ming.li@zohomail.com>
---
I made this change based on my understanding of the spec and the current
CXL EDAC code, but I am not sure whether this is a bug or intended behavior.

base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
---
 drivers/cxl/core/edac.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 2cbc664e5d62..ad243cfe00e7 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
 				u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
 {
 	struct cxl_mailbox *cxl_mbox;
-	u8 min_scrub_cycle = U8_MAX;
 	struct cxl_region_params *p;
 	struct cxl_memdev *cxlmd;
 	struct cxl_region *cxlr;
+	u8 min_scrub_cycle = 0;
 	int i, ret;
 
 	if (!cxl_ps_ctx->cxlr) {
@@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
 		if (ret)
 			return ret;
 
+		/*
+		 * The min_scrub_cycle of a region is the maximum value among
+		 * the min_scrub_cycle of all the memdevs under the region.
+		 */
 		if (min_cycle)
-			min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
+			min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
 	}
 
 	if (min_cycle)
-- 
2.34.1



* Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-05-30 12:28 [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation Li Ming
@ 2025-05-30 18:27 ` Alison Schofield
  2025-05-31 11:52   ` Li Ming
  2025-06-02  8:23   ` Shiju Jose
  0 siblings, 2 replies; 9+ messages in thread
From: Alison Schofield @ 2025-05-30 18:27 UTC
  To: Li Ming
  Cc: dave, jonathan.cameron, dave.jiang, vishal.l.verma, ira.weiny,
	dan.j.williams, shiju.jose, linux-cxl, linux-kernel

On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
> When trying to update the scrub_cycle value of a cxl region, which means
> updating the scrub_cycle value of each memdev under a cxl region. cxl
> driver needs to guarantee the new scrub_cycle value is greater than the
> min_scrub_cycle value of a memdev, otherwise the updating operation will
> fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).
> 
> Current implementation logic of getting the min_scrub_cycle value of a
> cxl region is that getting the min_scrub_cycle value of each memdevs
> under the cxl region, then using the minimum min_scrub_cycle value as
> the region's min_scrub_cycle. Checking if the new scrub_cycle value is
> greater than this value. If yes, updating the new scrub_cycle value to
> each memdevs. The issue is that the new scrub_cycle value is possibly
> greater than the minimum min_scrub_cycle value of all memdevs but less
> than the maximum min_scrub_cycle value of all memdevs if memdevs have
> a different min_scrub_cycle value. The updating operation will always
> fail on these memdevs which have a greater min_scrub_cycle than the new
> scrub_cycle.
> 
> The correct implementation logic is to get the maximum value of these
> memdevs' min_scrub_cycle, check if the new scrub_cycle value is greater
> than the value. If yes, the new scrub_cycle value is fit for the region.
> 
> The change also impacts the result of
> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
> minimum min_scrub_cycle value among all memdevs under the region before
> the change. The interface will return the maximum min_scrub_cycle value
> among all memdevs under the region with the change.
> 
> Signed-off-by: Li Ming <ming.li@zohomail.com>
> ---
> I made this change based on my understanding on the SPEC and current CXL
> EDAC code, but I am not sure if it is a bug or it is designed this way.

The attribute is defined to show (per Documentation/ABI/testing/sysfs-edac-scrub)
   "Supported minimum scrub cycle duration in seconds by the memory scrubber."

Your fix, making the min the max of the mins, looks needed.

I took a look at the max attribute. If the min is the max of the mins,
then by the same reasoning the max should be the min of the maxes. But
that is not what we do. Instead we have:

	*max = U8_MAX * 3600; /* Max set by register size */

The comment isn't helping me, especially since the sysfs description
doesn't explain that we are using a constant max.




* Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-05-30 18:27 ` Alison Schofield
@ 2025-05-31 11:52   ` Li Ming
  2025-06-02  8:23   ` Shiju Jose
  1 sibling, 0 replies; 9+ messages in thread
From: Li Ming @ 2025-05-31 11:52 UTC
  To: Alison Schofield
  Cc: dave, jonathan.cameron, dave.jiang, vishal.l.verma, ira.weiny,
	dan.j.williams, shiju.jose, linux-cxl, linux-kernel

On 5/31/2025 2:27 AM, Alison Schofield wrote:
> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>> When trying to update the scrub_cycle value of a cxl region, which means
>> updating the scrub_cycle value of each memdev under a cxl region. cxl
>> driver needs to guarantee the new scrub_cycle value is greater than the
>> min_scrub_cycle value of a memdev, otherwise the updating operation will
>> fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).
>>
>> Current implementation logic of getting the min_scrub_cycle value of a
>> cxl region is that getting the min_scrub_cycle value of each memdevs
>> under the cxl region, then using the minimum min_scrub_cycle value as
>> the region's min_scrub_cycle. Checking if the new scrub_cycle value is
>> greater than this value. If yes, updating the new scrub_cycle value to
>> each memdevs. The issue is that the new scrub_cycle value is possibly
>> greater than the minimum min_scrub_cycle value of all memdevs but less
>> than the maximum min_scrub_cycle value of all memdevs if memdevs have
>> a different min_scrub_cycle value. The updating operation will always
>> fail on these memdevs which have a greater min_scrub_cycle than the new
>> scrub_cycle.
>>
>> The correct implementation logic is to get the maximum value of these
>> memdevs' min_scrub_cycle, check if the new scrub_cycle value is greater
>> than the value. If yes, the new scrub_cycle value is fit for the region.
>>
>> The change also impacts the result of
>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>> minimum min_scrub_cycle value among all memdevs under the region before
>> the change. The interface will return the maximum min_scrub_cycle value
>> among all memdevs under the region with the change.
>>
>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>> ---
>> I made this change based on my understanding on the SPEC and current CXL
>> EDAC code, but I am not sure if it is a bug or it is designed this way.
> The attribute is defined to show (per Documentation/ABI/testing/sysfs-edac-scrub)
>    "Supported minimum scrub cycle duration in seconds by the memory scrubber."
>
> Your fix, making the min the max of the mins, looks needed.
>
> I took a look at the max attribute. If the min is the max on the mins, then
> the max should be the max of the maxes. But, not true. We do this:
>
> instead: *max = U8_MAX * 3600; /* Max set by register size */
>
> The comment isn't helping me, esp since the sysfs description doesn't
> explain that we are using a constant max.
>

The CXL spec implies that the max value is FFh; see Table 8-222 and
Table 8-223 in CXL r3.2 section 8.2.10.9.11.1.


Ming




* RE: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-05-30 18:27 ` Alison Schofield
  2025-05-31 11:52   ` Li Ming
@ 2025-06-02  8:23   ` Shiju Jose
  2025-06-02 16:48     ` Alison Schofield
  1 sibling, 1 reply; 9+ messages in thread
From: Shiju Jose @ 2025-06-02  8:23 UTC
  To: Alison Schofield, Li Ming
  Cc: dave@stgolabs.net, Jonathan Cameron, dave.jiang@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

>-----Original Message-----
>From: Alison Schofield <alison.schofield@intel.com>
>Sent: 30 May 2025 19:28
>To: Li Ming <ming.li@zohomail.com>
>Cc: dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>dave.jiang@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dan.j.williams@intel.com; Shiju Jose <shiju.jose@huawei.com>; linux-
>cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>miscalculation
>
>On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>> When trying to update the scrub_cycle value of a cxl region, which
>> means updating the scrub_cycle value of each memdev under a cxl
>> region. cxl driver needs to guarantee the new scrub_cycle value is
>> greater than the min_scrub_cycle value of a memdev, otherwise the
>> updating operation will fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).
>>
>> Current implementation logic of getting the min_scrub_cycle value of a
>> cxl region is that getting the min_scrub_cycle value of each memdevs
>> under the cxl region, then using the minimum min_scrub_cycle value as
>> the region's min_scrub_cycle. Checking if the new scrub_cycle value is
>> greater than this value. If yes, updating the new scrub_cycle value to
>> each memdevs. The issue is that the new scrub_cycle value is possibly
>> greater than the minimum min_scrub_cycle value of all memdevs but less
>> than the maximum min_scrub_cycle value of all memdevs if memdevs have
>> a different min_scrub_cycle value. The updating operation will always
>> fail on these memdevs which have a greater min_scrub_cycle than the
>> new scrub_cycle.
>>
>> The correct implementation logic is to get the maximum value of these
>> memdevs' min_scrub_cycle, check if the new scrub_cycle value is
>> greater than the value. If yes, the new scrub_cycle value is fit for the region.
>>
>> The change also impacts the result of
>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>> minimum min_scrub_cycle value among all memdevs under the region
>> before the change. The interface will return the maximum
>> min_scrub_cycle value among all memdevs under the region with the change.
>>
>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>> ---
>> I made this change based on my understanding on the SPEC and current
>> CXL EDAC code, but I am not sure if it is a bug or it is designed this way.
>
>The attribute is defined to show (per Documentation/ABI/testing/sysfs-edac-
>scrub)
>   "Supported minimum scrub cycle duration in seconds by the memory
>scrubber."
>
>Your fix, making the min the max of the mins, looks needed.
>
>I took a look at the max attribute. If the min is the max on the mins, then the
>max should be the max of the maxes. But, not true. We do this:
>
>instead: *max = U8_MAX * 3600; /* Max set by register size */
>
>The comment isn't helping me, esp since the sysfs description doesn't explain
>that we are using a constant max.

CXL spec r3.2 Table 8-222 (Device Patrol Scrub Control Feature Readable
Attributes) does not define a field for "max scrub cycle supported". Thus,
for the max scrub cycle, we return the maximum value the patrol scrub cycle
field can hold (U8_MAX).
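
For readers following the units, here is a minimal userspace sketch of the
arithmetic. It assumes, as the quoted `U8_MAX * 3600` line suggests, that
the EDAC sysfs attributes are in seconds while the CXL patrol scrub cycle
field holds hours in a single byte. The helper names and the truncating
seconds-to-hours conversion with a range check are illustrative
assumptions, not the driver's actual code.

```c
#include <stdint.h>

#define SECS_PER_HOUR 3600u

/* The patrol scrub cycle fields are one byte wide and count hours. */
static uint32_t scrub_cycle_hours_to_secs(uint8_t hours)
{
	return (uint32_t)hours * SECS_PER_HOUR;
}

/*
 * With no "max scrub cycle supported" field, the upper bound is the
 * largest value a one-byte field can hold: 255 h = 918000 s.
 */
static uint32_t max_scrub_cycle_secs(void)
{
	return scrub_cycle_hours_to_secs(UINT8_MAX);
}

/*
 * Hypothetical conversion of a request in seconds to whole hours
 * (truncating), refusing values the one-byte field cannot encode.
 */
static int scrub_cycle_secs_to_hours(uint32_t secs, uint8_t *hours)
{
	if (secs > max_scrub_cycle_secs())
		return -1;
	*hours = (uint8_t)(secs / SECS_PER_HOUR);
	return 0;
}
```

With this encoding the largest reportable cycle is 918000 seconds, and in
this sketch a request of 5400 seconds truncates to 1 hour.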

Thanks,
Shiju



* Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-06-02  8:23   ` Shiju Jose
@ 2025-06-02 16:48     ` Alison Schofield
  2025-06-02 17:25       ` Shiju Jose
  0 siblings, 1 reply; 9+ messages in thread
From: Alison Schofield @ 2025-06-02 16:48 UTC
  To: Shiju Jose
  Cc: Li Ming, dave@stgolabs.net, Jonathan Cameron,
	dave.jiang@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, dan.j.williams@intel.com,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
> >-----Original Message-----
> >From: Alison Schofield <alison.schofield@intel.com>
> >Sent: 30 May 2025 19:28
> >To: Li Ming <ming.li@zohomail.com>
> >Cc: dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
> >dave.jiang@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
> >dan.j.williams@intel.com; Shiju Jose <shiju.jose@huawei.com>; linux-
> >cxl@vger.kernel.org; linux-kernel@vger.kernel.org
> >Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
> >miscalculation
> >
> >On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
> >> When trying to update the scrub_cycle value of a cxl region, which
> >> means updating the scrub_cycle value of each memdev under a cxl
> >> region. cxl driver needs to guarantee the new scrub_cycle value is
> >> greater than the min_scrub_cycle value of a memdev, otherwise the
> >> updating operation will fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).
> >>
> >> Current implementation logic of getting the min_scrub_cycle value of a
> >> cxl region is that getting the min_scrub_cycle value of each memdevs
> >> under the cxl region, then using the minimum min_scrub_cycle value as
> >> the region's min_scrub_cycle. Checking if the new scrub_cycle value is
> >> greater than this value. If yes, updating the new scrub_cycle value to
> >> each memdevs. The issue is that the new scrub_cycle value is possibly
> >> greater than the minimum min_scrub_cycle value of all memdevs but less
> >> than the maximum min_scrub_cycle value of all memdevs if memdevs have
> >> a different min_scrub_cycle value. The updating operation will always
> >> fail on these memdevs which have a greater min_scrub_cycle than the
> >> new scrub_cycle.
> >>
> >> The correct implementation logic is to get the maximum value of these
> >> memdevs' min_scrub_cycle, check if the new scrub_cycle value is
> >> greater than the value. If yes, the new scrub_cycle value is fit for the region.
> >>
> >> The change also impacts the result of
> >> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
> >> minimum min_scrub_cycle value among all memdevs under the region
> >> before the change. The interface will return the maximum
> >> min_scrub_cycle value among all memdevs under the region with the change.
> >>
> >> Signed-off-by: Li Ming <ming.li@zohomail.com>
> >> ---
> >> I made this change based on my understanding on the SPEC and current
> >> CXL EDAC code, but I am not sure if it is a bug or it is designed this way.
> >
> >The attribute is defined to show (per Documentation/ABI/testing/sysfs-edac-
> >scrub)
> >   "Supported minimum scrub cycle duration in seconds by the memory
> >scrubber."
> >
> >Your fix, making the min the max of the mins, looks needed.
> >
> >I took a look at the max attribute. If the min is the max on the mins, then the
> >max should be the max of the maxes. But, not true. We do this:
> >
> >instead: *max = U8_MAX * 3600; /* Max set by register size */
> >
> >The comment isn't helping me, esp since the sysfs description doesn't explain
> >that we are using a constant max.
> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature Readable Attributes
> does not define a field for "max scrub cycle supported".  Thus for max scrub 
> cycle, returning max value of (U8_MAX) of patrol scrub cycle field. 

Understood now, thanks. I'm still wondering whether both of these deserve
more explanation in Documentation/ABI/testing/sysfs-edac-scrub describing
the calculations. For example, if the device represents an aggregate of
devices, such as a region, the min scrub cycle is the max of the mins,
whereas for a single device it is exactly what the device returned. And
for the max, explain what you replied above.

Regardless of the noise I'm making about the docs, I think Ming should go
ahead and send v1 of the fix for the min calc.

--Alison



* RE: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-06-02 16:48     ` Alison Schofield
@ 2025-06-02 17:25       ` Shiju Jose
  2025-06-02 23:57         ` Li Ming
  0 siblings, 1 reply; 9+ messages in thread
From: Shiju Jose @ 2025-06-02 17:25 UTC
  To: Alison Schofield
  Cc: Li Ming, dave@stgolabs.net, Jonathan Cameron,
	dave.jiang@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, dan.j.williams@intel.com,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

>-----Original Message-----
>From: Alison Schofield <alison.schofield@intel.com>
>Sent: 02 June 2025 17:48
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: Li Ming <ming.li@zohomail.com>; dave@stgolabs.net; Jonathan Cameron
><jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>vishal.l.verma@intel.com; ira.weiny@intel.com; dan.j.williams@intel.com; linux-
>cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>miscalculation
>
>On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>> >-----Original Message-----
>> >From: Alison Schofield <alison.schofield@intel.com>
>> >Sent: 30 May 2025 19:28
>> >To: Li Ming <ming.li@zohomail.com>
>> >Cc: dave@stgolabs.net; Jonathan Cameron
>> ><jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>> >vishal.l.verma@intel.com; ira.weiny@intel.com;
>> >dan.j.williams@intel.com; Shiju Jose <shiju.jose@huawei.com>; linux-
>> >cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>> >Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>> >region miscalculation
>> >
>> >On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>> >> When trying to update the scrub_cycle value of a cxl region, which
>> >> means updating the scrub_cycle value of each memdev under a cxl
>> >> region. cxl driver needs to guarantee the new scrub_cycle value is
>> >> greater than the min_scrub_cycle value of a memdev, otherwise the
>> >> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>8.2.10.9.11.1).
>> >>
>> >> Current implementation logic of getting the min_scrub_cycle value
>> >> of a cxl region is that getting the min_scrub_cycle value of each
>> >> memdevs under the cxl region, then using the minimum
>> >> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>> >> the new scrub_cycle value is greater than this value. If yes,
>> >> updating the new scrub_cycle value to each memdevs. The issue is
>> >> that the new scrub_cycle value is possibly greater than the minimum
>> >> min_scrub_cycle value of all memdevs but less than the maximum
>> >> min_scrub_cycle value of all memdevs if memdevs have a different
>> >> min_scrub_cycle value. The updating operation will always fail on
>> >> these memdevs which have a greater min_scrub_cycle than the new
>scrub_cycle.
>> >>
>> >> The correct implementation logic is to get the maximum value of
>> >> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>> >> is greater than the value. If yes, the new scrub_cycle value is fit for the
>region.
>> >>
>> >> The change also impacts the result of
>> >> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>> >> minimum min_scrub_cycle value among all memdevs under the region
>> >> before the change. The interface will return the maximum
>> >> min_scrub_cycle value among all memdevs under the region with the
>change.
>> >>
>> >> Signed-off-by: Li Ming <ming.li@zohomail.com>
>> >> ---
>> >> I made this change based on my understanding on the SPEC and
>> >> current CXL EDAC code, but I am not sure if it is a bug or it is designed this
>way.
>> >
>> >The attribute is defined to show (per
>> >Documentation/ABI/testing/sysfs-edac-
>> >scrub)
>> >   "Supported minimum scrub cycle duration in seconds by the memory
>> >scrubber."
>> >
>> >Your fix, making the min the max of the mins, looks needed.
>> >
>> >I took a look at the max attribute. If the min is the max on the
>> >mins, then the max should be the max of the maxes. But, not true. We do
>this:
>> >
>> >instead: *max = U8_MAX * 3600; /* Max set by register size */
>> >
>> >The comment isn't helping me, esp since the sysfs description doesn't
>> >explain that we are using a constant max.
>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>> Readable Attributes does not define a field for "max scrub cycle
>> supported".  Thus for max scrub cycle, returning max value of (U8_MAX) of
>patrol scrub cycle field.
>
>Understand that now, thanks. I'm still wondering if both these deserve more
>explanation in Documentation/ABI/testing/sysfs-edac-scrub
>explaining the calculations. Like if the device represents an aggregate of
>devices, like a region, the min scrub cycle is the max of the mins, whereas if the
>device is a single, it's exactly what the device returned.  And for max, explaining
>what you replied above.

I'm not sure it is appropriate to add these CXL scrub specific details to
the generic file Documentation/ABI/testing/sysfs-edac-scrub.

CXL region specific details were added under section 1.2, "Region based
scrubbing", of Documentation/edac/scrub.rst. Maybe it would be better to
add these details of the CXL specific min and max scrub cycle calculation
to Documentation/edac/scrub.rst?

How do you want to post these suggested doc changes? In a follow-up patch
now?

Thanks,
Shiju
>
>Regardless of this noise I'm making about the Docs.. I think Ming should go
>ahead and v1 the fix for the min calc.
>
>--Alison



* Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-06-02 17:25       ` Shiju Jose
@ 2025-06-02 23:57         ` Li Ming
  2025-06-03 10:11           ` Shiju Jose
  0 siblings, 1 reply; 9+ messages in thread
From: Li Ming @ 2025-06-02 23:57 UTC
  To: Shiju Jose, Alison Schofield
  Cc: dave@stgolabs.net, Jonathan Cameron, dave.jiang@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 6/3/2025 1:25 AM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Alison Schofield <alison.schofield@intel.com>
>> Sent: 02 June 2025 17:48
>> To: Shiju Jose <shiju.jose@huawei.com>
>> Cc: Li Ming <ming.li@zohomail.com>; dave@stgolabs.net; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>> vishal.l.verma@intel.com; ira.weiny@intel.com; dan.j.williams@intel.com; linux-
>> cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>> miscalculation
>>
>> On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>>>> -----Original Message-----
>>>> From: Alison Schofield <alison.schofield@intel.com>
>>>> Sent: 30 May 2025 19:28
>>>> To: Li Ming <ming.li@zohomail.com>
>>>> Cc: dave@stgolabs.net; Jonathan Cameron
>>>> <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>>>> vishal.l.verma@intel.com; ira.weiny@intel.com;
>>>> dan.j.williams@intel.com; Shiju Jose <shiju.jose@huawei.com>; linux-
>>>> cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>> region miscalculation
>>>>
>>>> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>>>>> When trying to update the scrub_cycle value of a cxl region, which
>>>>> means updating the scrub_cycle value of each memdev under a cxl
>>>>> region. cxl driver needs to guarantee the new scrub_cycle value is
>>>>> greater than the min_scrub_cycle value of a memdev, otherwise the
>>>>> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>>>>> 8.2.10.9.11.1).
>>>>> Current implementation logic of getting the min_scrub_cycle value
>>>>> of a cxl region is that getting the min_scrub_cycle value of each
>>>>> memdevs under the cxl region, then using the minimum
>>>>> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>>>>> the new scrub_cycle value is greater than this value. If yes,
>>>>> updating the new scrub_cycle value to each memdevs. The issue is
>>>>> that the new scrub_cycle value is possibly greater than the minimum
>>>>> min_scrub_cycle value of all memdevs but less than the maximum
>>>>> min_scrub_cycle value of all memdevs if memdevs have a different
>>>>> min_scrub_cycle value. The updating operation will always fail on
>>>>> these memdevs which have a greater min_scrub_cycle than the new
>>>>> scrub_cycle.
>>>>> The correct implementation logic is to get the maximum value of
>>>>> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>>>>> is greater than the value. If yes, the new scrub_cycle value is fit for the
>>>>> region.
>>>>> The change also impacts the result of
>>>>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>>>>> minimum min_scrub_cycle value among all memdevs under the region
>>>>> before the change. The interface will return the maximum
>>>>> min_scrub_cycle value among all memdevs under the region with the
>>>>> change.
>>>>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>>>>> ---
>>>>> I made this change based on my understanding on the SPEC and
>>>>> current CXL EDAC code, but I am not sure if it is a bug or it is designed this
>>>>> way.
>>>> The attribute is defined to show (per
>>>> Documentation/ABI/testing/sysfs-edac-scrub)
>>>>   "Supported minimum scrub cycle duration in seconds by the memory
>>>> scrubber."
>>>>
>>>> Your fix, making the min the max of the mins, looks needed.
>>>>
>>>> I took a look at the max attribute. If the min is the max on the
>>>> mins, then the max should be the max of the maxes. But, not true. We do
>>>> this:
>>>> instead: *max = U8_MAX * 3600; /* Max set by register size */
>>>>
>>>> The comment isn't helping me, esp since the sysfs description doesn't
>>>> explain that we are using a constant max.
>>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>>> Readable Attributes does not define a field for "max scrub cycle
>>> supported".  Thus for max scrub cycle, returning max value of (U8_MAX) of
>>> patrol scrub cycle field.
>>
>> Understand that now, thanks. I'm still wondering if both these deserve more
>> explanation in Documentation/ABI/testing/sysfs-edac-scrub
>> explaining the calculations. Like if the device represents an aggregate of
>> devices, like a region, the min scrub cycle is the max of the mins, whereas if the
>> device is a single, it's exactly what the device returned.  And for max, explaining
>> what you replied above.
> Not sure is it appropriate to add these CXL scrub specific details to the generic file   
> Documentation/ABI/testing/sysfs-edac-scrub?
>
> CXL region specific details were added under section 1.2. Region based scrubbing
> of Documentation/edac/scrub.rst. May be better add these details for CXL specific
> min and max scrub cycle calculation to the Documentation/edac/scrub.rst?
>
> How do you want to post these suggested doc changes, in a follow-up patch now?
>
> Thanks,
> Shiju

I can include the doc changes in the next version.


Thanks

Ming


>> Regardless of this noise I'm making about the Docs.. I think Ming should go
>> ahead and v1 the fix for the min calc.
>>
>> --Alison
>>
>>> Thanks,
>>> Shiju
>>>>
>>>>> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>>>>> ---
>>>>>  drivers/cxl/core/edac.c | 8 ++++++--
>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>>>>> index 2cbc664e5d62..ad243cfe00e7 100644
>>>>> --- a/drivers/cxl/core/edac.c
>>>>> +++ b/drivers/cxl/core/edac.c
>>>>> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>  				u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>>>>>  {
>>>>>  	struct cxl_mailbox *cxl_mbox;
>>>>> -	u8 min_scrub_cycle = U8_MAX;
>>>>>  	struct cxl_region_params *p;
>>>>>  	struct cxl_memdev *cxlmd;
>>>>>  	struct cxl_region *cxlr;
>>>>> +	u8 min_scrub_cycle = 0;
>>>>>  	int i, ret;
>>>>>
>>>>>  	if (!cxl_ps_ctx->cxlr) {
>>>>> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>  		if (ret)
>>>>>  			return ret;
>>>>>
>>>>> +		/*
>>>>> +		 * The min_scrub_cycle of a region is the maximum value among
>>>>> +		 * the min_scrub_cycle of all the memdevs under the region.
>>>>> +		 */
>>>>>  		if (min_cycle)
>>>>> -			min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>>>>> +			min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
>>>>>  	}
>>>>>
>>>>>  	if (min_cycle)
>>>>> --
>>>>> 2.34.1
>>>>>
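
To make the failure mode described in the commit message concrete, here is a minimal user-space sketch in C (not driver code). It assumes per-memdev minimum cycles are u8 values in hours, as the `U8_MAX * 3600` expression quoted above suggests, and that a request equal to a device's minimum is accepted. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Old region aggregation: start at UINT8_MAX and keep the smallest
 * per-memdev minimum. A request that passes this check can still be
 * below another memdev's minimum and be rejected by that memdev.
 */
uint8_t region_min_cycle_old(const uint8_t *min_cycle, size_t nr)
{
	uint8_t min_scrub_cycle = UINT8_MAX;

	assert(nr > 0); /* a region is assumed to have at least one memdev */
	for (size_t i = 0; i < nr; i++)
		if (min_cycle[i] < min_scrub_cycle)
			min_scrub_cycle = min_cycle[i];
	return min_scrub_cycle;
}

/*
 * Fixed aggregation, mirroring the patch: start at 0 and keep the
 * largest per-memdev minimum, so any request that passes the region
 * check is at or above every memdev's minimum.
 */
uint8_t region_min_cycle_new(const uint8_t *min_cycle, size_t nr)
{
	uint8_t min_scrub_cycle = 0;

	assert(nr > 0);
	for (size_t i = 0; i < nr; i++)
		if (min_cycle[i] > min_scrub_cycle)
			min_scrub_cycle = min_cycle[i];
	return min_scrub_cycle;
}

/*
 * Count memdevs whose minimum is above the requested cycle, i.e. the
 * memdevs that would reject it (equal to the minimum is assumed OK).
 */
size_t nr_rejecting_memdevs(const uint8_t *min_cycle, size_t nr,
			    uint8_t request)
{
	size_t rejected = 0;

	for (size_t i = 0; i < nr; i++)
		if (request < min_cycle[i])
			rejected++;
	return rejected;
}
```

With hypothetical minimums of {12, 24, 6} hours, the old aggregation reports 6, so a 12-hour request passes the region check yet the 24-hour memdev rejects it; the fixed aggregation reports 24, and any request at or above that is accepted by all three.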



* RE: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-06-02 23:57         ` Li Ming
@ 2025-06-03 10:11           ` Shiju Jose
  2025-06-03 10:13             ` Li Ming
  0 siblings, 1 reply; 9+ messages in thread
From: Shiju Jose @ 2025-06-03 10:11 UTC (permalink / raw)
  To: Li Ming, Alison Schofield
  Cc: dave@stgolabs.net, Jonathan Cameron, dave.jiang@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

>-----Original Message-----
>From: Li Ming <ming.li@zohomail.com>
>Sent: 03 June 2025 00:57
>To: Shiju Jose <shiju.jose@huawei.com>; Alison Schofield
><alison.schofield@intel.com>
>Cc: dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>dave.jiang@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dan.j.williams@intel.com; linux-cxl@vger.kernel.org;
>linux-kernel@vger.kernel.org
>Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>miscalculation
>
>On 6/3/2025 1:25 AM, Shiju Jose wrote:
>>> -----Original Message-----
>>> From: Alison Schofield <alison.schofield@intel.com>
>>> Sent: 02 June 2025 17:48
>>> To: Shiju Jose <shiju.jose@huawei.com>
>>> Cc: Li Ming <ming.li@zohomail.com>; dave@stgolabs.net; Jonathan
>>> Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>>> vishal.l.verma@intel.com; ira.weiny@intel.com;
>>> dan.j.williams@intel.com; linux-cxl@vger.kernel.org;
>>> linux-kernel@vger.kernel.org
>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>> region miscalculation
>>>
>>> On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>>>>> -----Original Message-----
>>>>> From: Alison Schofield <alison.schofield@intel.com>
>>>>> Sent: 30 May 2025 19:28
>>>>> To: Li Ming <ming.li@zohomail.com>
>>>>> Cc: dave@stgolabs.net; Jonathan Cameron
>>>>> <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>>>>> vishal.l.verma@intel.com; ira.weiny@intel.com;
>>>>> dan.j.williams@intel.com; Shiju Jose <shiju.jose@huawei.com>;
>>>>> linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>>> region miscalculation
>>>>>
>>>>> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>>>>>> When trying to update the scrub_cycle value of a cxl region, which
>>>>>> means updating the scrub_cycle value of each memdev under a cxl
>>>>>> region. cxl driver needs to guarantee the new scrub_cycle value is
>>>>>> greater than the min_scrub_cycle value of a memdev, otherwise the
>>>>>> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>>>>>> 8.2.10.9.11.1).
>>>>>> Current implementation logic of getting the min_scrub_cycle value
>>>>>> of a cxl region is that getting the min_scrub_cycle value of each
>>>>>> memdevs under the cxl region, then using the minimum
>>>>>> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>>>>>> the new scrub_cycle value is greater than this value. If yes,
>>>>>> updating the new scrub_cycle value to each memdevs. The issue is
>>>>>> that the new scrub_cycle value is possibly greater than the
>>>>>> minimum min_scrub_cycle value of all memdevs but less than the
>>>>>> maximum min_scrub_cycle value of all memdevs if memdevs have a
>>>>>> different min_scrub_cycle value. The updating operation will
>>>>>> always fail on these memdevs which have a greater min_scrub_cycle
>>>>>> than the new scrub_cycle.
>>>>>> The correct implementation logic is to get the maximum value of
>>>>>> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>>>>>> is greater than the value. If yes, the new scrub_cycle value is
>>>>>> fit for the region.
>>>>>> The change also impacts the result of
>>>>>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>>>>>> minimum min_scrub_cycle value among all memdevs under the region
>>>>>> before the change. The interface will return the maximum
>>>>>> min_scrub_cycle value among all memdevs under the region with the
>>>>>> change.
>>>>>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>>>>>> ---
>>>>>> I made this change based on my understanding on the SPEC and
>>>>>> current CXL EDAC code, but I am not sure if it is a bug or it is
>>>>>> designed this way.
>>>>> The attribute is defined to show (per
>>>>> Documentation/ABI/testing/sysfs-edac-scrub)
>>>>>   "Supported minimum scrub cycle duration in seconds by the memory
>>>>> scrubber."
>>>>>
>>>>> Your fix, making the min the max of the mins, looks needed.
>>>>>
>>>>> I took a look at the max attribute. If the min is the max on the
>>>>> mins, then the max should be the max of the maxes. But, not true.
>>>>> We do this:
>>>>> instead: *max = U8_MAX * 3600; /* Max set by register size */
>>>>>
>>>>> The comment isn't helping me, esp since the sysfs description
>>>>> doesn't explain that we are using a constant max.
>>>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>>>> Readable Attributes does not define a field for "max scrub cycle
>>>> supported".  Thus for max scrub cycle, returning max value of
>>>> (U8_MAX) of patrol scrub cycle field.
>>>
>>> Understand that now, thanks. I'm still wondering if both these
>>> deserve more explanation in
>>> Documentation/ABI/testing/sysfs-edac-scrub
>>> explaining the calculations. Like if the device represents an
>>> aggregate of devices, like a region, the min scrub cycle is the max
>>> of the mins, whereas if the device is a single, it's exactly what the
>>> device returned.  And for max, explaining what you replied above.
>> Not sure is it appropriate to add these CXL scrub specific details to the generic file
>> Documentation/ABI/testing/sysfs-edac-scrub?
>>
>> CXL region specific details were added under section 1.2. Region based
>> scrubbing of Documentation/edac/scrub.rst. May be better add these
>> details for CXL specific min and max scrub cycle calculation to the
>> Documentation/edac/scrub.rst?
>>
>> How do you want to post these suggested doc changes, in a follow-up patch now?
>>
>> Thanks,
>> Shiju
>
>I can include the doc changes in next version.

Thanks Ming.

Maybe like this?

diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
index c43be90deab4..ab6014743da5 100644
--- a/Documentation/ABI/testing/sysfs-edac-scrub
+++ b/Documentation/ABI/testing/sysfs-edac-scrub
@@ -49,6 +49,12 @@ Description:
                (RO) Supported minimum scrub cycle duration in seconds
                by the memory scrubber.
 
+               Device-based scrub: returns the minimum scrub cycle
+               supported by the memory device.
+
+               Region-based scrub: returns the max of minimum scrub cycles
+               supported by individual memory devices that back the region.
+
 What:          /sys/bus/edac/devices/<dev-name>/scrubX/max_cycle_duration
 Date:          March 2025
 KernelVersion: 6.15
@@ -57,6 +63,16 @@ Description:
                (RO) Supported maximum scrub cycle duration in seconds
                by the memory scrubber.
 
+               Device-based scrub: returns the maximum scrub cycle supported
+               by the memory device.
+
+               Region-based scrub: returns the min of maximum scrub cycles
+               supported by individual memory devices that back the region.
+
+               If the memory device does not provide maximum scrub cycle
+               information, returns the maximum value the scrub cycle
+               field can hold, converted to seconds.
+
 What:          /sys/bus/edac/devices/<dev-name>/scrubX/current_cycle_duration
 Date:          March 2025
 KernelVersion: 6.15
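
The region rules in the proposed text amount to intersecting the per-device ranges: the region minimum is the largest device minimum and the region maximum is the smallest device maximum. A rough C sketch, assuming hour-granular u8 fields converted to seconds for sysfs; `struct memdev_scrub_caps` and its `has_max` flag are hypothetical. Per the thread, CXL r3.2 defines no max field, so today every device would take the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The cycle field is assumed hour-granular; sysfs reports seconds. */
#define SECS_PER_HOUR 3600u
/* Fallback when a device reports no maximum: largest u8 field value. */
#define FIELD_MAX_SECS (UINT8_MAX * SECS_PER_HOUR)

/* Hypothetical per-memdev capabilities, not a kernel structure. */
struct memdev_scrub_caps {
	uint8_t min_cycle_hours;
	bool has_max;		/* no such field in CXL r3.2 */
	uint8_t max_cycle_hours;
};

/* Region min_cycle_duration: the max of the per-memdev minimums. */
unsigned int region_min_cycle_secs(const struct memdev_scrub_caps *caps,
				   size_t nr)
{
	unsigned int secs = 0;

	assert(nr > 0); /* a region is assumed to have at least one memdev */
	for (size_t i = 0; i < nr; i++) {
		unsigned int s = caps[i].min_cycle_hours * SECS_PER_HOUR;

		if (s > secs)
			secs = s;
	}
	return secs;
}

/*
 * Region max_cycle_duration: the min of the per-memdev maximums, where
 * a device without a reported maximum contributes FIELD_MAX_SECS.
 */
unsigned int region_max_cycle_secs(const struct memdev_scrub_caps *caps,
				   size_t nr)
{
	unsigned int secs = FIELD_MAX_SECS;

	assert(nr > 0);
	for (size_t i = 0; i < nr; i++) {
		unsigned int s = caps[i].has_max ?
			caps[i].max_cycle_hours * SECS_PER_HOUR : FIELD_MAX_SECS;

		if (s < secs)
			secs = s;
	}
	return secs;
}
```

When no device reports a maximum, the region maximum is 255 * 3600 = 918000 seconds, which matches the constant `*max = U8_MAX * 3600` discussed earlier in the thread.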


>
>
>Thanks
>
>Ming
>
>
>>> Regardless of this noise I'm making about the Docs.. I think Ming
>>> should go ahead and v1 the fix for the min calc.
>>>
>>> --Alison
>>>
>>>> Thanks,
>>>> Shiju
>>>>>
>>>>>> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>>>>>> ---
>>>>>>  drivers/cxl/core/edac.c | 8 ++++++--
>>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>>>>>> index 2cbc664e5d62..ad243cfe00e7 100644
>>>>>> --- a/drivers/cxl/core/edac.c
>>>>>> +++ b/drivers/cxl/core/edac.c
>>>>>> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>>  				u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>>>>>>  {
>>>>>>  	struct cxl_mailbox *cxl_mbox;
>>>>>> -	u8 min_scrub_cycle = U8_MAX;
>>>>>>  	struct cxl_region_params *p;
>>>>>>  	struct cxl_memdev *cxlmd;
>>>>>>  	struct cxl_region *cxlr;
>>>>>> +	u8 min_scrub_cycle = 0;
>>>>>>  	int i, ret;
>>>>>>
>>>>>>  	if (!cxl_ps_ctx->cxlr) {
>>>>>> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>>  		if (ret)
>>>>>>  			return ret;
>>>>>>
>>>>>> +		/*
>>>>>> +		 * The min_scrub_cycle of a region is the maximum value among
>>>>>> +		 * the min_scrub_cycle of all the memdevs under the region.
>>>>>> +		 */
>>>>>>  		if (min_cycle)
>>>>>> -			min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>>>>>> +			min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
>>>>>>  	}
>>>>>>
>>>>>>  	if (min_cycle)
>>>>>> --
>>>>>> 2.34.1
>>>>>>



* Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
  2025-06-03 10:11           ` Shiju Jose
@ 2025-06-03 10:13             ` Li Ming
  0 siblings, 0 replies; 9+ messages in thread
From: Li Ming @ 2025-06-03 10:13 UTC (permalink / raw)
  To: Shiju Jose, Alison Schofield
  Cc: dave@stgolabs.net, Jonathan Cameron, dave.jiang@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 6/3/2025 6:11 PM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Li Ming <ming.li@zohomail.com>
>> Sent: 03 June 2025 00:57
>> To: Shiju Jose <shiju.jose@huawei.com>; Alison Schofield
>> <alison.schofield@intel.com>
>> Cc: dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>> dave.jiang@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>> dan.j.williams@intel.com; linux-cxl@vger.kernel.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>> miscalculation
>>
>> On 6/3/2025 1:25 AM, Shiju Jose wrote:
>>>> -----Original Message-----
>>>> From: Alison Schofield <alison.schofield@intel.com>
>>>> Sent: 02 June 2025 17:48
>>>> To: Shiju Jose <shiju.jose@huawei.com>
>>>> Cc: Li Ming <ming.li@zohomail.com>; dave@stgolabs.net; Jonathan
>>>> Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>>>> vishal.l.verma@intel.com; ira.weiny@intel.com;
>>>> dan.j.williams@intel.com; linux-cxl@vger.kernel.org;
>>>> linux-kernel@vger.kernel.org
>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>> region miscalculation
>>>>
>>>> On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>>>>>> -----Original Message-----
>>>>>> From: Alison Schofield <alison.schofield@intel.com>
>>>>>> Sent: 30 May 2025 19:28
>>>>>> To: Li Ming <ming.li@zohomail.com>
>>>>>> Cc: dave@stgolabs.net; Jonathan Cameron
>>>>>> <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>>>>>> vishal.l.verma@intel.com; ira.weiny@intel.com;
>>>>>> dan.j.williams@intel.com; Shiju Jose <shiju.jose@huawei.com>;
>>>>>> linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org
>>>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>>>> region miscalculation
>>>>>>
>>>>>> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>>>>>>> When trying to update the scrub_cycle value of a cxl region, which
>>>>>>> means updating the scrub_cycle value of each memdev under a cxl
>>>>>>> region. cxl driver needs to guarantee the new scrub_cycle value is
>>>>>>> greater than the min_scrub_cycle value of a memdev, otherwise the
>>>>>>> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>>>>>>> 8.2.10.9.11.1).
>>>>>>> Current implementation logic of getting the min_scrub_cycle value
>>>>>>> of a cxl region is that getting the min_scrub_cycle value of each
>>>>>>> memdevs under the cxl region, then using the minimum
>>>>>>> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>>>>>>> the new scrub_cycle value is greater than this value. If yes,
>>>>>>> updating the new scrub_cycle value to each memdevs. The issue is
>>>>>>> that the new scrub_cycle value is possibly greater than the
>>>>>>> minimum min_scrub_cycle value of all memdevs but less than the
>>>>>>> maximum min_scrub_cycle value of all memdevs if memdevs have a
>>>>>>> different min_scrub_cycle value. The updating operation will
>>>>>>> always fail on these memdevs which have a greater min_scrub_cycle
>>>>>>> than the new scrub_cycle.
>>>>>>> The correct implementation logic is to get the maximum value of
>>>>>>> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>>>>>>> is greater than the value. If yes, the new scrub_cycle value is
>>>>>>> fit for the region.
>>>>>>> The change also impacts the result of
>>>>>>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>>>>>>> minimum min_scrub_cycle value among all memdevs under the region
>>>>>>> before the change. The interface will return the maximum
>>>>>>> min_scrub_cycle value among all memdevs under the region with the
>>>>>>> change.
>>>>>>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>>>>>>> ---
>>>>>>> I made this change based on my understanding on the SPEC and
>>>>>>> current CXL EDAC code, but I am not sure if it is a bug or it is
>>>>>>> designed this way.
>>>>>> The attribute is defined to show (per
>>>>>> Documentation/ABI/testing/sysfs-edac-scrub)
>>>>>>   "Supported minimum scrub cycle duration in seconds by the memory
>>>>>> scrubber."
>>>>>>
>>>>>> Your fix, making the min the max of the mins, looks needed.
>>>>>>
>>>>>> I took a look at the max attribute. If the min is the max on the
>>>>>> mins, then the max should be the max of the maxes. But, not true.
>>>>>> We do this:
>>>>>> instead: *max = U8_MAX * 3600; /* Max set by register size */
>>>>>>
>>>>>> The comment isn't helping me, esp since the sysfs description
>>>>>> doesn't explain that we are using a constant max.
>>>>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>>>>> Readable Attributes does not define a field for "max scrub cycle
>>>>> supported".  Thus for max scrub cycle, returning max value of
>>>>> (U8_MAX) of patrol scrub cycle field.
>>>>
>>>> Understand that now, thanks. I'm still wondering if both these
>>>> deserve more explanation in
>>>> Documentation/ABI/testing/sysfs-edac-scrub
>>>> explaining the calculations. Like if the device represents an
>>>> aggregate of devices, like a region, the min scrub cycle is the max
>>>> of the mins, whereas if the device is a single, it's exactly what the
>>>> device returned.  And for max, explaining what you replied above.
>>> Not sure is it appropriate to add these CXL scrub specific details to the generic file
>>> Documentation/ABI/testing/sysfs-edac-scrub?
>>>
>>> CXL region specific details were added under section 1.2. Region based
>>> scrubbing of Documentation/edac/scrub.rst. May be better add these
>>> details for CXL specific min and max scrub cycle calculation to the
>>> Documentation/edac/scrub.rst?
>>>
>>> How do you want to post these suggested doc changes, in a follow-up patch now?
>>>
>>> Thanks,
>>> Shiju
>> I can include the doc changes in next version.
> Thanks Ming.
>
> May be like this?
>
> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
> index c43be90deab4..ab6014743da5 100644
> --- a/Documentation/ABI/testing/sysfs-edac-scrub
> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
> @@ -49,6 +49,12 @@ Description:
>                 (RO) Supported minimum scrub cycle duration in seconds
>                 by the memory scrubber.
>  
> +               Device-based scrub: returns the minimum scrub cycle
> +               supported by the memory device.
> +
> +               Region-based scrub: returns the max of minimum scrub cycles
> +               supported by individual memory devices that back the region.
> +
>  What:          /sys/bus/edac/devices/<dev-name>/scrubX/max_cycle_duration
>  Date:          March 2025
>  KernelVersion: 6.15
> @@ -57,6 +63,16 @@ Description:
>                 (RO) Supported maximum scrub cycle duration in seconds
>                 by the memory scrubber.
>  
> +               Device-based scrub: returns the maximum scrub cycle supported
> +               by the memory device.
> +
> +               Region-based scrub: returns the min of maximum scrub cycles
> +               supported by individual memory devices that back the region.
> +
> +               If the memory device does not provide maximum scrub cycle
> +               information, return the maximum supported value of the scrub
> +               cycle field.
> +
>  What:          /sys/bus/edac/devices/<dev-name>/scrubX/current_cycle_duration
>  Date:          March 2025
>  KernelVersion: 6.15
>
Sure, will do it, thanks.


Ming

>>
>> Thanks
>>
>> Ming
>>
>>
>>>> Regardless of this noise I'm making about the Docs.. I think Ming
>>>> should go ahead and v1 the fix for the min calc.
>>>>
>>>> --Alison
>>>>
>>>>> Thanks,
>>>>> Shiju
>>>>>>> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>>>>>>> ---
>>>>>>>  drivers/cxl/core/edac.c | 8 ++++++--
>>>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>>>>>>> index 2cbc664e5d62..ad243cfe00e7 100644
>>>>>>> --- a/drivers/cxl/core/edac.c
>>>>>>> +++ b/drivers/cxl/core/edac.c
>>>>>>> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>>>  				u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>>>>>>>  {
>>>>>>>  	struct cxl_mailbox *cxl_mbox;
>>>>>>> -	u8 min_scrub_cycle = U8_MAX;
>>>>>>>  	struct cxl_region_params *p;
>>>>>>>  	struct cxl_memdev *cxlmd;
>>>>>>>  	struct cxl_region *cxlr;
>>>>>>> +	u8 min_scrub_cycle = 0;
>>>>>>>  	int i, ret;
>>>>>>>
>>>>>>>  	if (!cxl_ps_ctx->cxlr) {
>>>>>>> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>>>  		if (ret)
>>>>>>>  			return ret;
>>>>>>>
>>>>>>> +		/*
>>>>>>> +		 * The min_scrub_cycle of a region is the maximum value among
>>>>>>> +		 * the min_scrub_cycle of all the memdevs under the region.
>>>>>>> +		 */
>>>>>>>  		if (min_cycle)
>>>>>>> -			min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>>>>>>> +			min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
>>>>>>>  	}
>>>>>>>
>>>>>>>  	if (min_cycle)
>>>>>>> --
>>>>>>> 2.34.1
>>>>>>>



end of thread, other threads:[~2025-06-03 10:13 UTC | newest]

Thread overview: 9+ messages
2025-05-30 12:28 [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation Li Ming
2025-05-30 18:27 ` Alison Schofield
2025-05-31 11:52   ` Li Ming
2025-06-02  8:23   ` Shiju Jose
2025-06-02 16:48     ` Alison Schofield
2025-06-02 17:25       ` Shiju Jose
2025-06-02 23:57         ` Li Ming
2025-06-03 10:11           ` Shiju Jose
2025-06-03 10:13             ` Li Ming
