From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>
Cc: <shiju.jose@huawei.com>, <linux-cxl@vger.kernel.org>,
	<dan.j.williams@intel.com>, <alison.schofield@intel.com>,
	<dave@stgolabs.net>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <tanxiaofei@huawei.com>,
	<prime.zeng@hisilicon.com>, <linuxarm@huawei.com>
Subject: Re: [PATCH] cxl/events: Update memory event type for patrol scrub cycle end event
Date: Mon, 15 Dec 2025 10:52:53 +0000
Message-ID: <20251215105253.00003fd6@huawei.com>
In-Reply-To: <a98e51a3-7939-4fca-9983-413d250f178a@intel.com>

On Wed, 10 Dec 2025 07:47:20 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 12/10/25 6:12 AM, shiju.jose@huawei.com wrote:
> > From: Shiju Jose <shiju.jose@huawei.com>
> > 
> > According to the CXL Specification Revision 4.0, Advanced CVME (Corrected
> > Volatile Memory Error) enhancements added additional granularity control
> > and event generation for Patrol Scrub cycle end.
> > 
> > Update Memory Event Type field in the trace events for section
> > 8.2.10.2.1.1, Table 8-224 (General Media Event Record), and section
> > 8.2.10.2.1.2, Table 8-225 (DRAM Event Record), to include the event type
> > 'Patrol Scrub cycle end'.
> > 
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>  
> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> > ---
> > Open Question,
> > Option to enable event generation for 'Patrol Scrub cycle end' is given in
> > the Memory Error Threshold feature, Section 8.2.10.9.11.3 Advanced
> > Programmable Corrected Volatile Memory Error Threshold Feature Discovery
> > and Configuration. Is support for this Memory Error Threshold feature
> > required in the kernel or via fwctl?
> 
> Any thoughts Jonathan? Is that something that would be exposed through EDAC?

I asked Shiju to add this comment because I wasn't sure of the answer about those
controls.  Given the conservative viewpoint currently being taken around more
complex scrub features in general in EDAC (which I'm not saying I disagree with!),
these might be very hard to land unless there are similar facilities in other
scrub controllers and we can argue it is about generalizing the interface.

For anyone following along, this new stuff is about counting granularity,
thresholds, resets of counters etc., not the reporting of individual errors.
So it is more telemetry than error detection.

So my gut feeling is these are probably a fwctl / userspace tool problem, but I
may well be wrong, and then we run into that question of whether we can rip out
existing functionality exposed via fwctl later. IIRC we decided we could, but we
still don't want to do that unless we have to!

Jonathan



> 
> DJ
> 
> > ---
> >  drivers/cxl/core/trace.h | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index a972e4ef1936..e79c2bd415af 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h
> > @@ -367,6 +367,7 @@ TRACE_EVENT(cxl_generic_event,
> >  #define CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR	0x04
> >  #define CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE	0x05
> >  #define CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION		0x06
> > +#define CXL_GMER_MEM_EVT_TYPE_AP_CME_PS_CYCLE_END	0x07
> >  #define show_gmer_mem_event_type(type)	__print_symbolic(type,				\
> >  	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },			\
> >  	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },		\
> > @@ -374,7 +375,8 @@ TRACE_EVENT(cxl_generic_event,
> >  	{ CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION,	"TE State Violation" },		\
> >  	{ CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR,	"Scrub Media ECC Error" },	\
> >  	{ CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE,	"Adv Prog CME Counter Expiration" },	\
> > -	{ CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION,		"CKID Violation" }		\
> > +	{ CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION,		"CKID Violation" },		\
> > +	{ CXL_GMER_MEM_EVT_TYPE_AP_CME_PS_CYCLE_END,	"Adv Prog CME Patrol Scrub Cycle End" }	\
> >  )
> >  
> >  #define CXL_GMER_TRANS_UNKNOWN				0x00
> > @@ -554,6 +556,7 @@ TRACE_EVENT(cxl_general_media,
> >  #define CXL_DER_MEM_EVT_TYPE_TE_STATE_VIOLATION	0x04
> >  #define CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE	0x05
> >  #define CXL_DER_MEM_EVT_TYPE_CKID_VIOLATION		0x06
> > +#define CXL_DER_MEM_EVT_TYPE_AP_CME_PS_CYCLE_END	0x07
> >  #define show_dram_mem_event_type(type)	__print_symbolic(type,					\
> >  	{ CXL_DER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },				\
> >  	{ CXL_DER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR,	"Scrub Media ECC Error" },		\
> > @@ -561,7 +564,8 @@ TRACE_EVENT(cxl_general_media,
> >  	{ CXL_DER_MEM_EVT_TYPE_DATA_PATH_ERROR,		"Data Path Error" },			\
> >  	{ CXL_DER_MEM_EVT_TYPE_TE_STATE_VIOLATION,	"TE State Violation" },			\
> >  	{ CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE,	"Adv Prog CME Counter Expiration" },	\
> > -	{ CXL_DER_MEM_EVT_TYPE_CKID_VIOLATION,		"CKID Violation" }			\
> > +	{ CXL_DER_MEM_EVT_TYPE_CKID_VIOLATION,		"CKID Violation" },			\
> > +	{ CXL_DER_MEM_EVT_TYPE_AP_CME_PS_CYCLE_END,	"Adv Prog CME Patrol Scrub Cycle End" }	\
> >  )
> >  
> >  #define CXL_DER_VALID_CHANNEL				BIT(0)  
> 
> 
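
For readers less familiar with the trace macros, the effect of the new table
entry can be sketched in plain userspace C. This is only an illustration, not
the kernel implementation: the struct and function names below
(mem_evt_sym, show_mem_event_type_sketch) are hypothetical, and only the four
General Media values visible in the quoted diff (0x04 to 0x07) are included.
The fallback to printing the raw hex value mirrors what __print_symbolic()
does when no entry matches.

```c
#include <stddef.h>
#include <stdio.h>

/* Values and strings copied from the quoted diff (General Media Event Record). */
#define GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR	0x04
#define GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE	0x05
#define GMER_MEM_EVT_TYPE_CKID_VIOLATION	0x06
#define GMER_MEM_EVT_TYPE_AP_CME_PS_CYCLE_END	0x07

/* Hypothetical stand-in for the { value, "name" } pairs given to __print_symbolic(). */
struct mem_evt_sym {
	unsigned long val;
	const char *name;
};

static const struct mem_evt_sym gmer_syms[] = {
	{ GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR, "Scrub Media ECC Error" },
	{ GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE, "Adv Prog CME Counter Expiration" },
	{ GMER_MEM_EVT_TYPE_CKID_VIOLATION, "CKID Violation" },
	{ GMER_MEM_EVT_TYPE_AP_CME_PS_CYCLE_END, "Adv Prog CME Patrol Scrub Cycle End" },
};

/*
 * Hypothetical helper: return the symbolic name for 'type', or format the
 * raw value as hex into 'buf' when the table has no matching entry.
 */
static const char *show_mem_event_type_sketch(unsigned long type,
					      char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(gmer_syms) / sizeof(gmer_syms[0]); i++) {
		if (gmer_syms[i].val == type)
			return gmer_syms[i].name;
	}
	snprintf(buf, len, "0x%lx", type);
	return buf;
}
```

Without the 0x07 entry, a record carrying the new event type would fall
through to the hex fallback in the trace output, which is the gap the patch
closes for both the General Media and DRAM event records.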

