All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: shiju.jose@huawei.com
Cc: linux-edac@vger.kernel.org, linux-cxl@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, tony.luck@intel.com,
	rafael@kernel.org, lenb@kernel.org, mchehab@kernel.org,
	dan.j.williams@intel.com, dave@stgolabs.net,
	jonathan.cameron@huawei.com, gregkh@linuxfoundation.org,
	sudeep.holla@arm.com, jassisinghbrar@gmail.com,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, david@redhat.com,
	Vilas.Sridharan@amd.com, leo.duran@amd.com,
	Yazen.Ghannam@amd.com, rientjes@google.com, jiaqiyan@google.com,
	Jon.Grimm@amd.com, dave.hansen@linux.intel.com,
	naoya.horiguchi@nec.com, james.morse@arm.com,
	jthoughton@google.com, somasundaram.a@hpe.com,
	erdemaktas@google.com, pgonda@google.com, duenwen@google.com,
	gthelen@google.com, wschwartz@amperecomputing.com,
	dferguson@amperecomputing.com, wbs@os.amperecomputing.com,
	nifan.cxl@gmail.com, tanxiaofei@huawei.com,
	prime.zeng@hisilicon.com, roberto.sassu@huawei.com,
	kangkang.shen@futurewei.com, wanghuiqiang@huawei.com,
	linuxarm@huawei.com
Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature
Date: Mon, 28 Oct 2024 12:16:37 +0100	[thread overview]
Message-ID: <20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local> (raw)
In-Reply-To: <20241025171356.1377-4-shiju.jose@huawei.com>

On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@huawei.com wrote:
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) The log entry type of how the DDR5 ECS log is reported.
> +		00b - per DRAM.
> +		01b - per memory media FRU.

If the conversion function here is kstrtoul(), why are those values not "0"
and "1" but in binary format?

> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type_per_dram
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current log entry type is per DRAM.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type_per_memory_media
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current log entry type is per memory media FRU.

What's the point of those two if log_entry_type already gives you the same
info?

And the filename length is a bit too much...

> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) The mode of how the DDR5 ECS counts the errors.
> +		0 - ECS counts rows with errors.
> +		1 - ECS counts codewords with errors.

Now we have "0" and "1"s. Oh well.

What are "rows", what are "codewords"? Explain them here pls for the user.

> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_rows
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current mode is ECS counts rows with errors.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_codewords
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if current mode is ECS counts codewords with errors.

Same question as above - redundant files.

> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(WO) ECS reset ECC counter.
> +		1 - reset ECC counter to the default value.

1 or any value?

Looks like any to me...

You should restrict it to "1" in case you want to extend this interface with
"2" in the future, for example, doing something a bit different.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) ECS threshold count per gigabits of memory cells.

That definitely needs more explanation.

> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index 188501e676c7..b24c2c112d9c 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC)			:= edac_core.o
>  
>  edac_core-y	:= edac_mc.o edac_device.o edac_mc_sysfs.o
>  edac_core-y	+= edac_module.o edac_device_sysfs.o wq.o
> -edac_core-y	+= scrub.o
> +edac_core-y	+= scrub.o ecs.o
>  
>  edac_core-$(CONFIG_EDAC_DEBUG)		+= debugfs.o
>  
> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c
> new file mode 100755
> index 000000000000..a2b64d7bf6b6
> --- /dev/null
> +++ b/drivers/edac/ecs.c
> @@ -0,0 +1,240 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Generic ECS driver in order to support control the on die
> + * error check scrub (e.g. DDR5 ECS).

This sentence needs grammar check.

> The common sysfs ECS
> + * interface abstracts the control of an arbitrary ECS
> + * functionality to a common set of functions.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +

#undef pr_fmt

> +#define pr_fmt(fmt)     "EDAC ECS: " fmt

Grep the tree for examples how to do that properly.

Also, this pr_fmt looks unused.

> +static umode_t ecs_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id)
> +{
> +	struct device *ras_feat_dev = kobj_to_dev(kobj);
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> +	const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> +
> +	switch (attr_id) {
> +	case ECS_LOG_ENTRY_TYPE:
> +		if (ops->get_log_entry_type)  {
> +			if (ops->set_log_entry_type)
> +				return a->mode;
> +			else
> +				return 0444;

What is the goal for the access mode of all those sysfs entries? I sure hope
it is going to be root-only no-matter what. I don't want normal users to cause
scrub activity. Please make sure your whole set does that.

> +		}
> +		break;
> +	case ECS_LOG_ENTRY_TYPE_PER_DRAM:
> +		if (ops->get_log_entry_type_per_dram)
> +			return a->mode;
> +		break;
> +	case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA:
> +		if (ops->get_log_entry_type_per_memory_media)
> +			return a->mode;
> +		break;
> +	case ECS_MODE:
> +		if (ops->get_mode) {
> +			if (ops->set_mode)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case ECS_MODE_COUNTS_ROWS:
> +		if (ops->get_mode_counts_rows)
> +			return a->mode;
> +		break;
> +	case ECS_MODE_COUNTS_CODEWORDS:
> +		if (ops->get_mode_counts_codewords)
> +			return a->mode;
> +		break;
> +	case ECS_RESET:
> +		if (ops->reset)
> +			return a->mode;
> +		break;
> +	case ECS_THRESHOLD:
> +		if (ops->get_threshold) {
> +			if (ops->set_threshold)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +#define EDAC_ECS_ATTR_RO(_name, _fru_id)       \
> +	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \
> +				     .fru_id = _fru_id })
> +
> +#define EDAC_ECS_ATTR_WO(_name, _fru_id)       \
> +	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \
> +				     .fru_id = _fru_id })
> +
> +#define EDAC_ECS_ATTR_RW(_name, _fru_id)       \
> +	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \
> +				     .fru_id = _fru_id })
> +
> +static int ecs_create_desc(struct device *ecs_dev,
> +			   const struct attribute_group **attr_groups, u16 num_media_frus)
> +{
> +	struct edac_ecs_context *ecs_ctx;
> +	u32 fru;
> +
> +	ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL);
> +	if (!ecs_ctx)
> +		return -ENOMEM;
> +
> +	ecs_ctx->num_media_frus = num_media_frus;
> +	ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus,
> +					 sizeof(*ecs_ctx->fru_ctxs),
> +					 GFP_KERNEL);
> +	if (!ecs_ctx->fru_ctxs)
> +		return -ENOMEM;
> +
> +	for (fru = 0; fru < num_media_frus; fru++) {
> +		struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru];
> +		struct attribute_group *group = &fru_ctx->group;
> +		int i;
> +
> +		fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE] = EDAC_ECS_ATTR_RW(log_entry_type, fru);
> +		fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_DRAM] =
> +					EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
> +		fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] =
> +					EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
> +		fru_ctx->ecs_dev_attr[ECS_MODE] = EDAC_ECS_ATTR_RW(mode, fru);
> +		fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] =
> +					EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
> +		fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] =
> +					EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
> +		fru_ctx->ecs_dev_attr[ECS_RESET] = EDAC_ECS_ATTR_WO(reset, fru);
> +		fru_ctx->ecs_dev_attr[ECS_THRESHOLD] = EDAC_ECS_ATTR_RW(threshold, fru);

Clearly too long variable and define names. Shorten pls.

Also, a new line here:

<---


> +		for (i = 0; i < ECS_MAX_ATTRS; i++)
> +			fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr;
> +
> +		sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru);
> +		group->name = fru_ctx->name;
> +		group->attrs = fru_ctx->ecs_attrs;
> +		group->is_visible  = ecs_attr_visible;
> +
> +		attr_groups[fru] = group;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * edac_ecs_get_desc - get EDAC ECS descriptors
> + * @ecs_dev: client device, supports ECS feature
> + * @attr_groups: pointer to attribute group container
> + * @num_media_frus: number of media FRUs in the device
> + *
> + * Return:
> + *  * %0	- Success.
> + *  * %-EINVAL	- Invalid parameters passed.
> + *  * %-ENOMEM	- Dynamic memory allocation failed.
> + */
> +int edac_ecs_get_desc(struct device *ecs_dev,
> +		      const struct attribute_group **attr_groups, u16 num_media_frus)
> +{
> +	if (!ecs_dev || !attr_groups || !num_media_frus)
> +		return -EINVAL;
> +
> +	return ecs_create_desc(ecs_dev, attr_groups, num_media_frus);
> +}
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 91552271b34a..5fc3ec7f25eb 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -626,6 +626,9 @@ int edac_dev_register(struct device *parent, char *name,
>  			attr_gcnt++;
>  			scrub_cnt++;
>  			break;
> +		case RAS_FEAT_ECS:
> +			attr_gcnt += ras_features[feat].ecs_info.num_media_frus;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
> @@ -667,6 +670,18 @@ int edac_dev_register(struct device *parent, char *name,
>  			scrub_inst++;
>  			attr_gcnt++;
>  			break;
> +		case RAS_FEAT_ECS:
> +			if (!ras_features->ecs_ops)
> +				goto data_mem_free;

<---- newline here.

> +			dev_data = &ctx->ecs;
> +			dev_data->ecs_ops = ras_features->ecs_ops;
> +			dev_data->private = ras_features->ctx;
> +			ret = edac_ecs_get_desc(parent, &ras_attr_groups[attr_gcnt],
> +						ras_features->ecs_info.num_media_frus);
> +			if (ret)
> +				goto data_mem_free;

Ditto.

> +			attr_gcnt += ras_features->ecs_info.num_media_frus;
> +			break;
>  		default:
>  			ret = -EINVAL;
>  			goto data_mem_free;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2024-10-28 11:17 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25 17:13 [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-10-25 17:13 ` [PATCH v14 01/14] EDAC: Add support for EDAC device features control shiju.jose
2024-10-25 17:13 ` [PATCH v14 02/14] EDAC: Add scrub control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 03/14] EDAC: Add ECS " shiju.jose
2024-10-28 11:16   ` Borislav Petkov [this message]
2024-10-28 16:03     ` Shiju Jose
2024-10-29 20:07       ` Borislav Petkov
2024-10-25 17:13 ` [PATCH v14 04/14] cxl: Add Get Supported Features command for kernel usage shiju.jose
2024-10-25 17:13 ` [PATCH v14 05/14] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2024-10-29 15:47   ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 06/14] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-10-29 15:51   ` Dave Jiang
2024-10-25 17:13 ` [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-10-29 16:31   ` Dave Jiang
2024-10-29 17:00     ` Shiju Jose
2024-10-29 18:32       ` Dave Jiang
2024-10-30 16:16         ` Jonathan Cameron
2024-10-30 16:46           ` Dave Jiang
2024-10-29 20:15   ` Borislav Petkov
2024-10-30 12:52     ` Shiju Jose
2024-10-29 20:16   ` Borislav Petkov
2024-10-30 11:26     ` Shiju Jose
2024-10-25 17:13 ` [PATCH v14 08/14] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2024-10-25 17:13 ` [PATCH v14 09/14] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-10-25 17:13 ` [PATCH v14 10/14] ras: mem: Add memory " shiju.jose
2024-10-25 17:13 ` [PATCH v14 11/14] EDAC: Add memory repair control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 12/14] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-10-25 17:13 ` [PATCH v14 13/14] cxl/memfeature: Add CXL memory device sPPR control feature shiju.jose
2024-10-25 17:13 ` [PATCH v14 14/14] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2024-10-26 10:35 ` [PATCH v14 00/14] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241028111637.GSZx9yleFPOjTklIVr@fat_crate.local \
    --to=bp@alien8.de \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jassisinghbrar@gmail.com \
    --cc=jiaqiyan@google.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=roberto.sassu@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=sudeep.holla@arm.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.