All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: <shiju.jose@huawei.com>
Cc: <linux-edac@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-acpi@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <bp@alien8.de>,
	<tony.luck@intel.com>, <rafael@kernel.org>, <lenb@kernel.org>,
	<mchehab@kernel.org>, <dan.j.williams@intel.com>,
	<dave@stgolabs.net>, <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <david@redhat.com>,
	<Vilas.Sridharan@amd.com>, <leo.duran@amd.com>,
	<Yazen.Ghannam@amd.com>, <rientjes@google.com>,
	<jiaqiyan@google.com>, <Jon.Grimm@amd.com>,
	<dave.hansen@linux.intel.com>, <naoya.horiguchi@nec.com>,
	<james.morse@arm.com>, <jthoughton@google.com>,
	<somasundaram.a@hpe.com>, <erdemaktas@google.com>,
	<pgonda@google.com>, <duenwen@google.com>, <gthelen@google.com>,
	<wschwartz@amperecomputing.com>, <dferguson@amperecomputing.com>,
	<wbs@os.amperecomputing.com>, <nifan.cxl@gmail.com>,
	<tanxiaofei@huawei.com>, <prime.zeng@hisilicon.com>,
	<roberto.sassu@huawei.com>, <kangkang.shen@futurewei.com>,
	<wanghuiqiang@huawei.com>, <linuxarm@huawei.com>
Subject: Re: [PATCH v13 15/18] EDAC: Add memory repair control feature
Date: Mon, 14 Oct 2024 17:23:12 +0100	[thread overview]
Message-ID: <20241014172312.00007034@Huawei.com> (raw)
In-Reply-To: <20241009124120.1124-16-shiju.jose@huawei.com>

On Wed, 9 Oct 2024 13:41:16 +0100
<shiju.jose@huawei.com> wrote:

> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
> memory sparing etc, control driver in order to control memory repairs
> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard memory
> sparing, memory sparing at cacheline/row/bank/rank granularity etc.
> Device with memory repair features registers with EDAC device driver,
> which retrieves memory repair descriptor from EDAC memory repair driver and
> exposes the sysfs repair control attributes to userspace in
> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
> 
> The common memory repair control interface abstracts the control of an
> arbitrary memory repair functionality to a common set of functions.
> The sysfs memory repair attribute nodes would be present only if the client
> driver has implemented the corresponding attribute callback function and
> passed in ops to the EDAC device driver during registration.
> 
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
The question inline that we discussed offlist.

Whether it makes sense to potentially have one device provide
several mem_repairX differing in granularity (and may type) of
repair, or one mem_repairX that has a control over granularity?

The CXL spec has it designed as separate control interfaces but
I'm not sure if we should follow that precedence or not.

> ---
>  .../ABI/testing/sysfs-edac-mem-repair         | 152 +++++++++
>  drivers/edac/Makefile                         |   2 +-
>  drivers/edac/edac_device.c                    |  31 ++
>  drivers/edac/mem_repair.c                     | 317 ++++++++++++++++++
>  include/linux/edac.h                          |  67 ++++
>  5 files changed, 568 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-mem-repair
>  create mode 100755 drivers/edac/mem_repair.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-edac-mem-repair b/Documentation/ABI/testing/sysfs-edac-mem-repair
> new file mode 100644
> index 000000000000..9a8712ed9d47
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-mem-repair
> @@ -0,0 +1,152 @@
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
> +		belongs to the memory media repair features control, such as
> +		PPR (Post Package Repair), memory sparing etc, where<dev-name>
> +		directory corresponds to a device registered with the EDAC
> +		device driver for the memory repair features.
> +		/mem_repairX belongs to either sPPR (Soft PPR) or hPPR (Hard PPR)
> +		feature of PPR feature, hard or soft memory sparing etc. The memory
> +		sparing is a repair function that replaces a portion of memory
> +		(spared memory) with a portion of functional memory. The memory
> +		sparing has cacheline/row/bank/rank sparing granularities.
> +		The sysfs memory repair attr nodes would be only present if a
> +		memory repair feature is supported.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) Type of the repair instance. For eg. sPPR, hPPR, cacheline/
> +		row/bank/rank memory sparing etc.
So this is the open question for me with this feature.
Do we do a monolithic 'device' that does all repair types for which we pick a mode
or do we (as here) allow for one mem_repairX for each supported type?

I don't particularly mind but it is a design question I'd like input on
from a wider audience.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(WO) Set HPA (Host Physical Address) for memory repair.

Can we not just read back what was written?  Seems like userspace
might expect that?


  reply	other threads:[~2024-10-14 16:23 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-09 12:41 [PATCH v13 00/18] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-10-09 12:41 ` [PATCH v13 01/18] EDAC: Add support for EDAC device features control shiju.jose
2024-10-14 14:18   ` Jonathan Cameron
2024-10-17  8:37     ` Shiju Jose
2024-10-16 10:58   ` Borislav Petkov
2024-10-17  8:37     ` Shiju Jose
2024-10-09 12:41 ` [PATCH v13 02/18] EDAC: Add scrub control feature shiju.jose
2024-10-14 14:26   ` Jonathan Cameron
2024-10-22 19:04   ` Borislav Petkov
2024-10-23 16:04     ` Shiju Jose
2024-10-23 16:16       ` Borislav Petkov
2024-10-09 12:41 ` [PATCH v13 03/18] EDAC: Add ECS " shiju.jose
2024-10-14 14:33   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 04/18] cxl: move cxl headers to new include/cxl/ directory shiju.jose
2024-10-14 14:34   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 05/18] cxl: Move mailbox related bits to the same context shiju.jose
2024-10-14 14:42   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 06/18] cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input shiju.jose
2024-10-09 12:41 ` [PATCH v13 07/18] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-10-14 15:05   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 08/18] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-10-14 15:08   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 09/18] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-10-14 15:12   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 10/18] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-10-14 15:28   ` Jonathan Cameron
2024-10-14 18:02   ` Fan Ni
2024-10-15 16:32     ` Shiju Jose
2024-10-09 12:41 ` [PATCH v13 11/18] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-10-14 15:40   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 12/18] platform: Add __free() based cleanup function for platform_device_put shiju.jose
2024-10-14 15:43   ` Jonathan Cameron
2024-10-14 16:00     ` Greg KH
2024-10-14 16:04       ` Greg KH
2024-10-14 17:16         ` Jonathan Cameron
2024-10-14 18:06           ` Rafael J. Wysocki
2024-10-15  9:10             ` Jonathan Cameron
2024-10-15  9:40               ` Jonathan Cameron
2024-10-15 10:17                 ` Greg KH
2024-10-15 13:32                   ` Rafael J. Wysocki
2024-10-15 14:19                     ` Jonathan Cameron
2024-10-15 15:35                       ` Rafael J. Wysocki
2024-10-16  9:00                         ` Jonathan Cameron
2024-10-15 13:34                   ` Jonathan Cameron
2024-10-15 13:37                     ` Rafael J. Wysocki
2024-10-09 12:41 ` [PATCH v13 13/18] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-14 15:49   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 14/18] ras: mem: Add memory " shiju.jose
2024-10-09 12:41 ` [PATCH v13 15/18] EDAC: Add memory repair control feature shiju.jose
2024-10-14 16:23   ` Jonathan Cameron [this message]
2024-10-14 16:39     ` Shiju Jose
2024-10-14 17:02       ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 16/18] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-10-14 16:26   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 17/18] cxl/memfeature: Add CXL memory device PPR control feature shiju.jose
2024-10-14 16:38   ` Jonathan Cameron
2024-10-09 12:41 ` [PATCH v13 18/18] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2024-10-14 17:00   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241014172312.00007034@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.