From: Shiju Jose <shiju.jose@huawei.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"tony.luck@intel.com" <tony.luck@intel.com>,
"rafael@kernel.org" <rafael@kernel.org>,
"lenb@kernel.org" <lenb@kernel.org>,
"mchehab@kernel.org" <mchehab@kernel.org>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"dave@stgolabs.net" <dave@stgolabs.net>,
"Jonathan Cameron" <jonathan.cameron@huawei.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"sudeep.holla@arm.com" <sudeep.holla@arm.com>,
"jassisinghbrar@gmail.com" <jassisinghbrar@gmail.com>,
"dave.jiang@intel.com" <dave.jiang@intel.com>,
"alison.schofield@intel.com" <alison.schofield@intel.com>,
"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
"ira.weiny@intel.com" <ira.weiny@intel.com>,
"david@redhat.com" <david@redhat.com>,
"Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
"leo.duran@amd.com" <leo.duran@amd.com>,
"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
"rientjes@google.com" <rientjes@google.com>,
"jiaqiyan@google.com" <jiaqiyan@google.com>,
"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
"james.morse@arm.com" <james.morse@arm.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
"erdemaktas@google.com" <erdemaktas@google.com>,
"pgonda@google.com" <pgonda@google.com>,
"duenwen@google.com" <duenwen@google.com>,
"gthelen@google.com" <gthelen@google.com>,
"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
tanxiaofei <tanxiaofei@huawei.com>,
"Zengtao (B)" <prime.zeng@hisilicon.com>,
"Roberto Sassu" <roberto.sassu@huawei.com>,
"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
wanghuiqiang <wanghuiqiang@huawei.com>,
Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH v14 03/14] EDAC: Add ECS control feature
Date: Mon, 28 Oct 2024 16:03:31 +0000 [thread overview]
Message-ID: <2263e5a551e24a6ca63dbe33538a5ce8@huawei.com> (raw)
In-Reply-To: <20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local>
Thanks for the comments.
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 28 October 2024 11:17
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; gregkh@linuxfoundation.org;
>sudeep.holla@arm.com; jassisinghbrar@gmail.com; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature
>
>On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@huawei.com wrote:
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) The log entry type of how the DDR5 ECS log is reported.
>> + 00b - per DRAM.
>> + 01b - per memory media FRU.
>
>If the conversion function here is kstrtoul(), why are those values not "0"
>and "1" but in binary format?
Followed the description for the " Log Entry Type " in the CXL spec rev 3.1 Table 8-210.
DDR5 ECS Control Feature Readable Attributes and Table 8-211. DDR5 ECS Control Feature
Writable Attributes. This was written as " Common DDR5 ECS Log Capabilities " though actually returns
the status of the log_entry_type.
Common DDR5 ECS Log Capabilities
• Bits[1:0]: Log Entry Type: The log entry type of how the ECS log is
reported. The entry type is defined commonly for all memory media FRUs
within the device.
— 00b = Per DRAM
— 01b = Per Memory Media FRU
— All other encodings are reserved
• Bits[7:2]: Reserved.
I will update to 0 and 1.
>
>> +
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type_per_dram
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RO) True if current log entry type is per DRAM.
>> +
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type_per_memory_media
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RO) True if current log entry type is per memory media FRU.
>
>What's the point of those two if log_entry_type already gives you the same info?
>
This was written as " Common DDR5 ECS Log Capabilities " in the CXL spec rev 3.1
Table 8-210. DDR5 ECS Control Feature Readable Attributes, though actually returns
the status of the log_entry_type. I will remove these extra log type attributes.
>And the filename length is a bit too much...
>
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/mode
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) The mode of how the DDR5 ECS counts the errors.
>> + 0 - ECS counts rows with errors.
>> + 1 - ECS counts codewords with errors.
>
>Now we have "0" and "1"s. Oh well.
>
>What are "rows", what are "codewords"? Explain them here pls for the user.
Will do.
>
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/mode_counts_rows
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RO) True if current mode is ECS counts rows with errors.
>> +
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/mode_counts_codewords
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RO) True if current mode is ECS counts codewords with errors.
>
>Same question as above - redundant files.
I will remove redundant files.
>
>> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (WO) ECS reset ECC counter.
>> + 1 - reset ECC counter to the default value.
>
>1 or any value?
>
>Looks like any to me...
>
>You should restrict it to "1" in case you want to extend this interface with "2" in
>the future, for example, doing something a bit different.
I will update. Returns error for any other value set than '1'.
>
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) ECS threshold count per gigabits of memory cells.
>
>That definitely needs more explanation.
Sure.
>
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index
>> 188501e676c7..b24c2c112d9c 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>>
>> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
>> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
>> -edac_core-y += scrub.o
>> +edac_core-y += scrub.o ecs.o
>>
>> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>>
>> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c new file mode
>> 100755 index 000000000000..a2b64d7bf6b6
>> --- /dev/null
>> +++ b/drivers/edac/ecs.c
>> @@ -0,0 +1,240 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Generic ECS driver in order to support control the on die
>> + * error check scrub (e.g. DDR5 ECS).
>
>This sentence needs grammar check.
Will do.
>
>> The common sysfs ECS
>> + * interface abstracts the control of an arbitrary ECS
>> + * functionality to a common set of functions.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>
>#undef pr_fmt
>
>> +#define pr_fmt(fmt) "EDAC ECS: " fmt
>
>Grep the tree for examples how to do that properly.
>
>Also, this pr_fmt looks unused.
I will remove.
>
>> +static umode_t ecs_attr_visible(struct kobject *kobj, struct
>> +attribute *a, int attr_id) {
>> + struct device *ras_feat_dev = kobj_to_dev(kobj);
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
>> +
>> + switch (attr_id) {
>> + case ECS_LOG_ENTRY_TYPE:
>> + if (ops->get_log_entry_type) {
>> + if (ops->set_log_entry_type)
>> + return a->mode;
>> + else
>> + return 0444;
>
>What is the goal for the access mode of all those sysfs entries? I sure hope it is
>going to be root-only no-matter what. I don't want normal users to cause scrub
>activity. Please make sure your whole set does that.
This is the attribute_group's is_visible() callback for controlling the visibility
of the attributes based on whether attribute is added or not by the parent driver.
Yes. Writable only by the root.
>
>> + }
>> + break;
>> + case ECS_LOG_ENTRY_TYPE_PER_DRAM:
>> + if (ops->get_log_entry_type_per_dram)
>> + return a->mode;
>> + break;
[...]
> EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
>> + fru_ctx-
>>ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] =
>> +
> EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
>> + fru_ctx->ecs_dev_attr[ECS_MODE] =
>EDAC_ECS_ATTR_RW(mode, fru);
>> + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] =
>> +
> EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
>> + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] =
>> +
> EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
>> + fru_ctx->ecs_dev_attr[ECS_RESET] =
>EDAC_ECS_ATTR_WO(reset, fru);
>> + fru_ctx->ecs_dev_attr[ECS_THRESHOLD] =
>EDAC_ECS_ATTR_RW(threshold,
>> +fru);
>
>Clearly too long variable and define names. Shorten pls.
Will do
>
>Also, a new line here:
Ok.
>
><---
>
>
[...]
>> + case RAS_FEAT_ECS:
>> + if (!ras_features->ecs_ops)
>> + goto data_mem_free;
>
><---- newline here.
Wil do.
>
>> + dev_data = &ctx->ecs;
>> + dev_data->ecs_ops = ras_features->ecs_ops;
>> + dev_data->private = ras_features->ctx;
>> + ret = edac_ecs_get_desc(parent,
>&ras_attr_groups[attr_gcnt],
>> + ras_features-
>>ecs_info.num_media_frus);
>> + if (ret)
>> + goto data_mem_free;
>
>Ditto.
Wil do.
>
>> + attr_gcnt += ras_features->ecs_info.num_media_frus;
>> + break;
>> default:
>> ret = -EINVAL;
>> goto data_mem_free;
>
>--
>Regards/Gruss,
> Boris.
Thanks,
Shiju
next prev parent reply other threads:[~2024-10-28 16:03 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-25 17:13 [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-10-25 17:13 ` [PATCH v14 01/14] EDAC: Add support for EDAC device features control shiju.jose
2024-10-25 17:13 ` [PATCH v14 02/14] EDAC: Add scrub control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 03/14] EDAC: Add ECS " shiju.jose
2024-10-28 11:16 ` Borislav Petkov
2024-10-28 16:03 ` Shiju Jose [this message]
2024-10-29 20:07 ` Borislav Petkov
2024-10-25 17:13 ` [PATCH v14 04/14] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-10-25 17:13 ` [PATCH v14 05/14] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-10-29 15:47 ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 06/14] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-10-29 15:51 ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-10-29 16:31 ` Dave Jiang
2024-10-29 17:00 ` Shiju Jose
2024-10-29 18:32 ` Dave Jiang
2024-10-30 16:16 ` Jonathan Cameron
2024-10-30 16:46 ` Dave Jiang
2024-10-29 20:15 ` Borislav Petkov
2024-10-30 12:52 ` Shiju Jose
2024-10-29 20:16 ` Borislav Petkov
2024-10-30 11:26 ` Shiju Jose
2024-10-25 17:13 ` [PATCH v14 08/14] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-10-25 17:13 ` [PATCH v14 09/14] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-25 17:13 ` [PATCH v14 10/14] ras: mem: Add memory " shiju.jose
2024-10-25 17:13 ` [PATCH v14 11/14] EDAC: Add memory repair control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 12/14] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-10-25 17:13 ` [PATCH v14 13/14] cxl/memfeature: Add CXL memory device sPPR control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 14/14] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2024-10-26 10:35 ` [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2263e5a551e24a6ca63dbe33538a5ce8@huawei.com \
--to=shiju.jose@huawei.com \
--cc=Jon.Grimm@amd.com \
--cc=Vilas.Sridharan@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dferguson@amperecomputing.com \
--cc=duenwen@google.com \
--cc=erdemaktas@google.com \
--cc=gregkh@linuxfoundation.org \
--cc=gthelen@google.com \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=jassisinghbrar@gmail.com \
--cc=jiaqiyan@google.com \
--cc=jonathan.cameron@huawei.com \
--cc=jthoughton@google.com \
--cc=kangkang.shen@futurewei.com \
--cc=lenb@kernel.org \
--cc=leo.duran@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxarm@huawei.com \
--cc=mchehab@kernel.org \
--cc=naoya.horiguchi@nec.com \
--cc=nifan.cxl@gmail.com \
--cc=pgonda@google.com \
--cc=prime.zeng@hisilicon.com \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=roberto.sassu@huawei.com \
--cc=somasundaram.a@hpe.com \
--cc=sudeep.holla@arm.com \
--cc=tanxiaofei@huawei.com \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
--cc=wanghuiqiang@huawei.com \
--cc=wbs@os.amperecomputing.com \
--cc=wschwartz@amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.