From: Dave Jiang <dave.jiang@intel.com>
To: shiju.jose@huawei.com, linux-cxl@vger.kernel.org,
dan.j.williams@intel.com, jonathan.cameron@huawei.com,
alison.schofield@intel.com, dave@stgolabs.net,
vishal.l.verma@intel.com, ira.weiny@intel.com
Cc: tanxiaofei@huawei.com, prime.zeng@hisilicon.com, linuxarm@huawei.com
Subject: Re: [PATCH 2/4] cxl/events: Add extra validity checks for corrected memory error count in General Media Event Record
Date: Wed, 16 Jul 2025 14:40:26 -0700
Message-ID: <94f2fdd8-56d4-45c8-8ed3-5c23522425b7@intel.com>
In-Reply-To: <20250716104945.2002-3-shiju.jose@huawei.com>

On 7/16/25 3:49 AM, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> According to the CXL Specification Revision 3.2, Section 8.2.10.2.1.1,
> Table 8-57 (General Media Event Record), the Corrected Memory Error Count
> field is valid under the following conditions:
> 1. The Threshold Event bit is set in the Memory Event Descriptor field, and
> 2. The Corrected Memory Error Count must be greater than 0 for events where
> the Advanced Programmable Threshold Counter has expired.
>
> Additionally, if the Advanced Programmable Corrected Memory Error Counter
> Expire bit in the Memory Event Type field is set, then the Threshold Event
> bit in the Memory Event Descriptor field shall also be set.
>
> Add validity checks for the above conditions while reporting the event to
> the userspace.
>
> Note: CXL spec rev3.2 Table 8-57. General Media Event Record
> Field: Corrected Memory Error Count at Event) "For events in which the
> advanced programmable threshold counter has expired, this field value
> shall be a value greater than 0. Counter expiration events in which
> the corrected memory error count is 0 shall not generate a media event
> record".
> Q: Should kernel drop the event record in this case or user space
> to handle?
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>

As Jonathan mentioned, the open question (the "Q:" paragraph above) doesn't
belong in the commit log.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
> drivers/cxl/core/mbox.c | 9 +++++++++
> drivers/cxl/core/trace.h | 5 ++++-
> 2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 2689e6453c5a..5a30d3891b17 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -926,6 +926,15 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt))
> dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n");
>
> + if (evt->gen_media.media_hdr.descriptor &
> + CXL_GMER_EVT_DESC_THRESHOLD_EVENT)
> + WARN_ON_ONCE((evt->gen_media.media_hdr.type &
> + CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE) &&
> + !evt->gen_media.cme_count);
> + else
> + WARN_ON_ONCE(evt->gen_media.media_hdr.type &
> + CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE);
> +
> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
> hpa_alias, &evt->gen_media);
> } else if (event_type == CXL_CPER_EVENT_DRAM) {
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index a77487a257b3..c38f94ca0ca1 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -506,7 +506,10 @@ TRACE_EVENT(cxl_general_media,
> uuid_copy(&__entry->region_uuid, &uuid_null);
> }
> __entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags;
> - __entry->cme_count = get_unaligned_le24(rec->cme_count);
> + if (rec->media_hdr.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT)
> + __entry->cme_count = get_unaligned_le24(rec->cme_count);
> + else
> + __entry->cme_count = 0;
> ),
>
> CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
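
For readers following the thread, the validity rules described in the commit
message can be sketched as a small standalone C predicate. This is only an
illustration under stated assumptions, not the code in the patch: `struct
gmer_view`, `le24_to_u32()`, `reported_cme_count()` and
`gmer_cme_is_consistent()` are hypothetical names, and the two flags are
passed as already-decoded booleans so that no bit layout from the kernel
headers or the spec is assumed. The sketch decodes the 3-byte count before
comparing it against zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields discussed in the patch. */
struct gmer_view {
	bool threshold_event;   /* Threshold Event bit, Memory Event Descriptor */
	bool counter_expired;   /* AP CME counter expiry, Memory Event Type */
	uint8_t cme_count[3];   /* Corrected Memory Error Count, 24-bit LE */
};

/* Decode a 24-bit little-endian value from three bytes. */
uint32_t le24_to_u32(const uint8_t b[3])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}

/* The count is only meaningful when the Threshold Event bit is set. */
uint32_t reported_cme_count(const struct gmer_view *r)
{
	return r->threshold_event ? le24_to_u32(r->cme_count) : 0;
}

/*
 * Consistency rules from the commit message:
 *  - counter expiry implies the Threshold Event bit is set;
 *  - counter expiry implies a corrected error count greater than 0.
 */
bool gmer_cme_is_consistent(const struct gmer_view *r)
{
	if (!r->counter_expired)
		return true;
	if (!r->threshold_event)
		return false;
	return le24_to_u32(r->cme_count) > 0;
}
```

A caller could warn when `gmer_cme_is_consistent()` returns false and record
`reported_cme_count()` in the trace entry; whether an inconsistent record
should be dropped or passed through is the open question raised in the
original posting.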