linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <shiju.jose@huawei.com>
Cc: <linux-edac@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-acpi@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <bp@alien8.de>,
	<tony.luck@intel.com>, <rafael@kernel.org>, <lenb@kernel.org>,
	<mchehab@kernel.org>, <dan.j.williams@intel.com>,
	<dave@stgolabs.net>, <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <david@redhat.com>,
	<Vilas.Sridharan@amd.com>, <leo.duran@amd.com>,
	<Yazen.Ghannam@amd.com>, <rientjes@google.com>,
	<jiaqiyan@google.com>, <Jon.Grimm@amd.com>,
	<dave.hansen@linux.intel.com>, <naoya.horiguchi@nec.com>,
	<james.morse@arm.com>, <jthoughton@google.com>,
	<somasundaram.a@hpe.com>, <erdemaktas@google.com>,
	<pgonda@google.com>, <duenwen@google.com>, <gthelen@google.com>,
	<wschwartz@amperecomputing.com>, <dferguson@amperecomputing.com>,
	<wbs@os.amperecomputing.com>, <nifan.cxl@gmail.com>,
	<tanxiaofei@huawei.com>, <prime.zeng@hisilicon.com>,
	<roberto.sassu@huawei.com>, <kangkang.shen@futurewei.com>,
	<wanghuiqiang@huawei.com>, <linuxarm@huawei.com>
Subject: Re: [PATCH v13 15/18] EDAC: Add memory repair control feature
Date: Mon, 14 Oct 2024 17:23:12 +0100	[thread overview]
Message-ID: <20241014172312.00007034@Huawei.com> (raw)
In-Reply-To: <20241009124120.1124-16-shiju.jose@huawei.com>

On Wed, 9 Oct 2024 13:41:16 +0100
<shiju.jose@huawei.com> wrote:

> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
> memory sparing etc, control driver in order to control memory repairs
> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard memory
> sparing, memory sparing at cacheline/row/bank/rank granularity etc.
> Device with memory repair features registers with EDAC device driver,
> which retrieves memory repair descriptor from EDAC memory repair driver and
> exposes the sysfs repair control attributes to userspace in
> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
> 
> The common memory repair control interface abstracts the control of an
> arbitrary memory repair functionality to a common set of functions.
> The sysfs memory repair attribute nodes would be present only if the client
> driver has implemented the corresponding attribute callback function and
> passed in ops to the EDAC device driver during registration.
> 
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
The question inline that we discussed offlist.

Whether it makes sense to potentially have one device provide
several mem_repairX differing in granularity (and may type) of
repair, or one mem_repairX that has a control over granularity?

The CXL spec has it designed as separate control interfaces but
I'm not sure if we should follow that precedence or not.

> ---
>  .../ABI/testing/sysfs-edac-mem-repair         | 152 +++++++++
>  drivers/edac/Makefile                         |   2 +-
>  drivers/edac/edac_device.c                    |  31 ++
>  drivers/edac/mem_repair.c                     | 317 ++++++++++++++++++
>  include/linux/edac.h                          |  67 ++++
>  5 files changed, 568 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-mem-repair
>  create mode 100755 drivers/edac/mem_repair.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-edac-mem-repair b/Documentation/ABI/testing/sysfs-edac-mem-repair
> new file mode 100644
> index 000000000000..9a8712ed9d47
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-mem-repair
> @@ -0,0 +1,152 @@
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
> +		belongs to the memory media repair features control, such as
> +		PPR (Post Package Repair), memory sparing etc, where<dev-name>
> +		directory corresponds to a device registered with the EDAC
> +		device driver for the memory repair features.
> +		/mem_repairX belongs to either sPPR (Soft PPR) or hPPR (Hard PPR)
> +		feature of PPR feature, hard or soft memory sparing etc. The memory
> +		sparing is a repair function that replaces a portion of memory
> +		(spared memory) with a portion of functional memory. The memory
> +		sparing has cacheline/row/bank/rank sparing granularities.
> +		The sysfs memory repair attr nodes would be only present if a
> +		memory repair feature is supported.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) Type of the repair instance. For eg. sPPR, hPPR, cacheline/
> +		row/bank/rank memory sparing etc.
So this is the open question for me with this feature.
Do we do a monolithic 'device' that does all repair types for which we pick a mode
or do we (as here) allow for one mem_repairX for each supported type?

I don't particularly mind but it is a design question I'd like input on
from a wider audience.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(WO) Set HPA (Host Physical Address) for memory repair.

Can we not just read back what was written?  Seems like userspace
might expect that?



  reply	other threads:[~2024-10-14 16:23 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-09 12:41 [PATCH v13 00/18] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-10-09 12:41 ` [PATCH v13 01/18] EDAC: Add support for EDAC device features control shiju.jose
2024-10-14 14:18   ` Jonathan Cameron
2024-10-17  8:37     ` Shiju Jose
2024-10-16 10:58   ` Borislav Petkov
2024-10-17  8:37     ` Shiju Jose
2024-10-09 12:41 ` [PATCH v13 02/18] EDAC: Add scrub control feature shiju.jose
2024-10-14 14:26   ` Jonathan Cameron
2024-10-22 19:04   ` Borislav Petkov
2024-10-23 16:04     ` Shiju Jose
2024-10-23 16:16       ` Borislav Petkov
2024-10-09 12:41 ` [PATCH v13 03/18] EDAC: Add ECS " shiju.jose
2024-10-14 14:33   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 04/18] cxl: move cxl headers to new include/cxl/ directory shiju.jose
2024-10-14 14:34   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 05/18] cxl: Move mailbox related bits to the same context shiju.jose
2024-10-14 14:42   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 06/18] cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input shiju.jose
2024-10-09 12:41 ` [PATCH v13 07/18] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-10-14 15:05   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 08/18] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-10-14 15:08   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 09/18] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-10-14 15:12   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 10/18] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-10-14 15:28   ` Jonathan Cameron
2024-10-14 18:02   ` Fan Ni
2024-10-15 16:32     ` Shiju Jose
2024-10-09 12:41 ` [PATCH v13 11/18] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-10-14 15:40   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 12/18] platform: Add __free() based cleanup function for platform_device_put shiju.jose
2024-10-14 15:43   ` Jonathan Cameron
2024-10-14 16:00     ` Greg KH
2024-10-14 16:04       ` Greg KH
2024-10-14 17:16         ` Jonathan Cameron
2024-10-14 18:06           ` Rafael J. Wysocki
2024-10-15  9:10             ` Jonathan Cameron
2024-10-15  9:40               ` Jonathan Cameron
2024-10-15 10:17                 ` Greg KH
2024-10-15 13:32                   ` Rafael J. Wysocki
2024-10-15 14:19                     ` Jonathan Cameron
2024-10-15 15:35                       ` Rafael J. Wysocki
2024-10-16  9:00                         ` Jonathan Cameron
2024-10-15 13:34                   ` Jonathan Cameron
2024-10-15 13:37                     ` Rafael J. Wysocki
2024-10-09 12:41 ` [PATCH v13 13/18] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-14 15:49   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 14/18] ras: mem: Add memory " shiju.jose
2024-10-09 12:41 ` [PATCH v13 15/18] EDAC: Add memory repair control feature shiju.jose
2024-10-14 16:23   ` Jonathan Cameron [this message]
2024-10-14 16:39     ` Shiju Jose
2024-10-14 17:02       ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 16/18] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-10-14 16:26   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 17/18] cxl/memfeature: Add CXL memory device PPR control feature shiju.jose
2024-10-14 16:38   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 18/18] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2024-10-14 17:00   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241014172312.00007034@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).