All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Jiang <dave.jiang@intel.com>
To: shiju.jose@huawei.com, linux-cxl@vger.kernel.org,
	dan.j.williams@intel.com, jonathan.cameron@huawei.com,
	dave@stgolabs.net, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com
Cc: linux-edac@vger.kernel.org, linux-doc@vger.kernel.org,
	bp@alien8.de, tony.luck@intel.com, lenb@kernel.org,
	Yazen.Ghannam@amd.com, mchehab@kernel.org, nifan.cxl@gmail.com,
	linuxarm@huawei.com, tanxiaofei@huawei.com,
	prime.zeng@hisilicon.com, roberto.sassu@huawei.com,
	kangkang.shen@futurewei.com, wanghuiqiang@huawei.com
Subject: Re: [PATCH v6 0/8] cxl: support CXL memory RAS features
Date: Fri, 23 May 2025 13:38:13 -0700	[thread overview]
Message-ID: <0ece175b-ee76-4814-a591-7a75cac321ea@intel.com> (raw)
In-Reply-To: <20250521124749.817-1-shiju.jose@huawei.com>



On 5/21/25 5:47 AM, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Support for CXL memory EDAC features: patrol scrub, ECS, soft-PPR and
> memory sparing.
> 
> Detailed history of the complete EDAC series with CXL EDAC patches
> up to V20 [1] and this CXL specific series had separated from V20 of
> the above series.
> 
> The series is based on [2] v6.15-rc4 (based on comment from Dave
> in the thread [4]).
> 
> Also applied(no conflicts) and tested on cxl.git [3] branch: next
> 
> 1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
> 2. https://github.com/torvalds/linux.git
> 3. https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
> 4. https://lore.kernel.org/all/d83a83d1-37e7-4192-913f-243098f679e3@intel.com/
> 
> Userspace code for CXL memory repair features [5] and
> sample boot-script for CXL memory repair [6].
> 
> [5]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
> [6]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/

Applied to cxl/next

> 
> Changes
> =======
> v5 -> v6:
> 1. Fixed feedback from Randy Dunlap on CXL EDAC documentation.
> 
> 2. Feedback from Alison:
>   - Replace #ifdef using IS_ENABLED() in the series
>   - Fix for the kfree() oops in devm_cxl_memdev_edac_release()
>     while unloading cxl-test module. 
>   - Added separate helper functions for scrub set attributes for
>     dev scrub and region scrub.
>   - renaming to scrub_cycle and scrub_region_id.
>       
> 3. Feedback from Dave:
>   - Fix for the kfree() oops in devm_cxl_memdev_edac_release()
>     while unloading cxl-test module.
>   - Add cxl_test inclusion of edac.o
>   - Check return from cxl_feature_info() with IS_ERR in the series. 
>  
> 4. Rebased to linux.git [2] v6.15-rc4 (based on comment from Dave
> in the thread [4]).
>    
> v4 -> v5:
> 1. Fixed a compilation warning introduced by v3->v4, reported by Dave Jiang on v4. 
>    drivers/cxl/core/edac.c: In function ‘cxl_mem_perform_sparing’:
>    drivers/cxl/core/edac.c:1335:29: warning: the comparison will always evaluate as ‘true’ for the address of ‘validity_flags’ will never be NULL [-Waddress]
>  1335 |                         if (!rec->media_hdr.validity_flags)
>       |                             ^
>    In file included from ./drivers/cxl/cxlmem.h:10,
>                  from drivers/cxl/core/edac.c:21:
>    ./include/cxl/event.h:35:12: note: ‘validity_flags’ declared here
>    35 |         u8 validity_flags[2];
>       |            ^~~~~~~~~~~~~~
> 2. Updated patches for tags given.
> 
> v3 -> v4:
> 1. Feedback from Dave Jiang on v3,
> 1.1. Changes for comments in EDAC scrub documentation for CXL use cases.
>      https://lore.kernel.org/all/2df68c68-f1a8-4327-abc9-d265326c133d@intel.com/
> 1.2. Changes for comments in CXL memory sparing control feature.
>      https://lore.kernel.org/all/4ee3323c-fb27-4fbe-b032-78fd54bc21a0@intel.com/
>       
> v2 -> v3:
> 1. Feedback from Dan Williams on v2,
>    https://lore.kernel.org/linux-mm/20250320180450.539-1-shiju.jose@huawei.com/
>   - Modified get_support_feature_info() in fwctl series generic to use in
>     cxl/fxctl and cxl/edac and replace cxl_get_feature_entry() in the CXL edac
>     series.
>   - Add usecase note for CXL ECS in Documentation/edac/scrub.rst.
>   - Add info message when device scrub rate set by a region overwritten with a
>     local device scrub rate or another region's scrub rate.
>   - Replace 'ps' with patrol_scrub in the patrol scrub feature.
>   - Replaced usage of intermediate objects struct cxl_memdev_ps_params and
>     enum cxl_scrub_param etc for patrol scrub and did same for ECS.
>   - Rename CXL_MEMDEV_PS_* macros.
>   - Rename scrub_cycle_hrs-> scrub_cycle_hours
>   - Add if (!cxl_dev_name)
> 	return -ENOMEM;  to devm_cxl_memdev_edac_register()
>   - Add  devm_cxl_region_edac_register(cxlr) for CXL_PARTMODE_PMEM case.
>   - Add separate configurations for CXL scrub, ECS and memory repair
>     CXL_EDAC_SCRUB, CXL_EDAC_ECS and CXL_EDAC_MEM_REPAIR.
>   - Add 
>        if (!capable(CAP_SYS_RAWIO))
>              return -EPERM; for set attributes callbacks for CXL scrub, ECS and
>     memory repair.  	       
>   - In patch "cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command"
>     * cxl_do_maintenance() -> cxl_perform_maintenance() and moved to cxl/core/edac.c 
>     * kmalloc() -> kvzalloc()
>   - In patch, "cxl: Support for finding memory operation attributes from the current boot"  
>     * Moved code from drivers/cxl/core/ras.c to drivers/cxl/core/edac.c
>     * Add few logics to releasing the cache to give safety with respect to error storms and burning
>     * unlimited memory. 
>     * Add estimated memory overhead expense of this feature documented in the Kconfig.
>     * Unified various names such as attr, param, attrbs throughout the patches.
>     * Moved > struct xarray rec_gen_media and struct xarray rec_dram; out of struct cxl_memdev
>       to CXL edac object, but there is required a pointer to this object in struct cxl_memdev
>       because the error records are reported and thus stored in the cxl_memdev context not
>       in the CXL EDAC context.
>       
> 2. Feedback from Borislav on v2,
>   - In include/linux/edac.h 
>     Replace EDAC_PPR -> EDAC_REPAIR_PPR
>             EDAC_CACHELINE_SPARING -> EDAC_REPAIR_CACHELINE_SPARING etc.
> 
> v1 -> v2:
> 1. Feedback from Dan Williams on v1,
>    https://lore.kernel.org/linux-mm/20250307091137.00006a0a@huawei.com/T/
>   - Fixed lock issues in region scrubbing, added local cxl_acquire()
>     and cxl_unlock.
>   - Replaced CXL examples using cat and echo from EDAC .rst docs
>     with short description and ref to ABI docs. Also corrections
>     in existing descriptions as suggested by Dan.
>   - Add policy description for the scrub control feature.
>     However this may require inputs from CXL experts.
>   - Replaced CONFIG_CXL_RAS_FEATURES with CONFIG_CXL_EDAC_MEM_FEATURES.
>   - Few changes to depends part of CONFIG_CXL_EDAC_MEM_FEATURES.
>   - Rename drivers/cxl/core/memfeatures.c as drivers/cxl/core/edac.c
>   - snprintf() -> kasprintf() in few places.
> 
> 2. Feedback from Alison on v1,
>   - In cxl_get_feature_entry()(patch 1), return NULL on failures and
>     reintroduced checks in cxl_get_feature_entry().
>   - Changed logic in for loop in region based scrubbing code.
>   - Replace cxl_are_decoders_committed() to cxl_is_memdev_memory_online()
>     and add as a local function to drivers/cxl/core/edac.c
>   - Changed few multiline comments to single line comments.
>   - Removed unnecessary comments from the code.
>   - Reduced line length of few macros in ECS and memory repair code.
>   - In new files, changed "GPL-2.0-or-later" -> "GPL-2.0-only".
>   - Ran clang-format for new files and updated.                                                                                                                                          
> 3. Changes for feedbacks from Jonathan on v1.
>   - Changed few multiline comments to single line comments.
> 
> Shiju Jose (8):
>   EDAC: Update documentation for the CXL memory patrol scrub control
>     feature
>   cxl: Update prototype of function get_support_feature_info()
>   cxl/edac: Add CXL memory device patrol scrub control feature
>   cxl/edac: Add CXL memory device ECS control feature
>   cxl/edac: Add support for PERFORM_MAINTENANCE command
>   cxl/edac: Support for finding memory operation attributes from the
>     current boot
>   cxl/edac: Add CXL memory device memory sparing control feature
>   cxl/edac: Add CXL memory device soft PPR control feature
> 
>  Documentation/edac/memory_repair.rst |   31 +
>  Documentation/edac/scrub.rst         |   76 +
>  drivers/cxl/Kconfig                  |   71 +
>  drivers/cxl/core/Makefile            |    1 +
>  drivers/cxl/core/core.h              |    2 +
>  drivers/cxl/core/edac.c              | 2103 ++++++++++++++++++++++++++
>  drivers/cxl/core/features.c          |   17 +-
>  drivers/cxl/core/mbox.c              |   11 +-
>  drivers/cxl/core/memdev.c            |    1 +
>  drivers/cxl/core/region.c            |   10 +
>  drivers/cxl/cxl.h                    |   10 +
>  drivers/cxl/cxlmem.h                 |   30 +
>  drivers/cxl/mem.c                    |    4 +
>  drivers/edac/mem_repair.c            |    9 +
>  include/linux/edac.h                 |    7 +
>  tools/testing/cxl/Kbuild             |    1 +
>  16 files changed, 2372 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/cxl/core/edac.c
> 


      parent reply	other threads:[~2025-05-23 20:38 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-21 12:47 [PATCH v6 0/8] cxl: support CXL memory RAS features shiju.jose
2025-05-21 12:47 ` [PATCH v6 1/8] EDAC: Update documentation for the CXL memory patrol scrub control feature shiju.jose
2025-05-21 16:28   ` Fan Ni
2025-05-21 12:47 ` [PATCH v6 2/8] cxl: Update prototype of function get_support_feature_info() shiju.jose
2025-05-21 16:31   ` Fan Ni
2025-05-21 12:47 ` [PATCH v6 3/8] cxl/edac: Add CXL memory device patrol scrub control feature shiju.jose
2025-05-21 14:40   ` Jonathan Cameron
2025-05-21 23:55     ` Dave Jiang
2025-05-21 17:07   ` Alison Schofield
2025-05-21 17:48     ` Jonathan Cameron
2025-05-21 20:17       ` Alison Schofield
2025-05-21 12:47 ` [PATCH v6 4/8] cxl/edac: Add CXL memory device ECS " shiju.jose
2025-05-21 12:47 ` [PATCH v6 5/8] cxl/edac: Add support for PERFORM_MAINTENANCE command shiju.jose
2025-05-21 12:47 ` [PATCH v6 6/8] cxl/edac: Support for finding memory operation attributes from the current boot shiju.jose
2025-05-21 12:47 ` [PATCH v6 7/8] cxl/edac: Add CXL memory device memory sparing control feature shiju.jose
2025-05-23 18:50   ` Dan Williams
2025-05-21 12:47 ` [PATCH v6 8/8] cxl/edac: Add CXL memory device soft PPR " shiju.jose
2025-05-21 14:59 ` [PATCH v6 0/8] cxl: support CXL memory RAS features Jonathan Cameron
2025-05-21 20:19 ` Alison Schofield
2025-05-23 18:53 ` Dan Williams
2025-05-23 20:38 ` Dave Jiang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0ece175b-ee76-4814-a591-7a75cac321ea@intel.com \
    --to=dave.jiang@intel.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=nifan.cxl@gmail.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=roberto.sassu@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.