qemu-devel.nongnu.org archive mirror
From: Tony Luck <tony.luck@intel.com>
To: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: qemu-devel@nongnu.org, linux-cxl@vger.kernel.org,
	Jonathan.Cameron@huawei.com, dan.j.williams@intel.com
Subject: Re: [RFC PATCH 3/5] cxl/core: introduce cxl_mem_report_poison()
Date: Wed, 14 Feb 2024 17:19:36 -0800
Message-ID: <Zc1mqOp9WiV49_Yi@agluck-desk3>
In-Reply-To: <20240209115417.724638-6-ruansy.fnst@fujitsu.com>

On Fri, Feb 09, 2024 at 07:54:15PM +0800, Shiyang Ruan wrote:
> If poison is detected (reported from a cxl memdev), the OS should be
> notified to handle it.  Introduce this function to:
>   1. translate DPA to HPA;
>   2. construct an MCE instance; (TODO: more details need to be filled in)
>   3. log it into the MCE event queue;
> 
> After that, the MCE mechanism can walk over its notifier chain to execute
> specific handlers.

This looks like a useful proof of concept patch to pass errors to all
the existing logging systems (console, mcelog, rasdaemon, EDAC). But
it's a bare minimum (just passing the address and dropping any other
interesting information about the error). I think we need something
more advanced that covers more CXL error types.
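
To make "bare minimum" concrete, here is a minimal user-space sketch of
what the quoted patch actually hands to the logging systems. The bit
positions and MCI_MISC_ADDR_PHYS mirror arch/x86/include/asm/mce.h;
struct fake_mce and the helper names are hypothetical stand-ins for
illustration, not kernel code.

```c
#include <stdint.h>

/* Bit positions as in arch/x86/include/asm/mce.h */
#define MCI_STATUS_VAL     (1ULL << 63)  /* record is valid */
#define MCI_STATUS_EN      (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV   (1ULL << 59)  /* misc field is valid */
#define MCI_STATUS_ADDRV   (1ULL << 58)  /* addr field is valid */
#define MCI_MISC_ADDR_PHYS 2             /* address mode: physical */

/* Hypothetical stand-in for the three struct mce fields the patch sets. */
struct fake_mce {
	uint64_t status;
	uint64_t misc;
	uint64_t addr;
};

/* Build the same "fake memory read error" record as the quoted patch. */
static struct fake_mce build_poison_record(uint64_t hpa)
{
	struct fake_mce m = { 0 };

	/* 0x9f: memory controller error, read, channel not specified */
	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV |
		   MCI_STATUS_MISCV | 0x9f;
	/* Address mode in bits 8:6; LSB field (bits 5:0) left at 0 */
	m.misc = (uint64_t)MCI_MISC_ADDR_PHYS << 6;
	m.addr = hpa;
	return m;
}

/* Low 16 bits of status: the MCA error code. */
static unsigned int mca_error_code(uint64_t status)
{
	return (unsigned int)(status & 0xffff);
}

/* Bits 8:6 of misc: the address mode. */
static unsigned int misc_addr_mode(uint64_t misc)
{
	return (unsigned int)((misc >> 6) & 0x7);
}

/* Bits 5:0 of misc: least significant valid bit of the address. */
static unsigned int misc_addr_lsb(uint64_t misc)
{
	return (unsigned int)(misc & 0x3f);
}
```

Everything a consumer can recover from such a record is the error code,
the address mode and the address itself. Nothing identifies the CXL
device, the DPA, or the source of the poison.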

> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c |  1 +
>  drivers/cxl/core/mbox.c        | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index bc39252bc54f..a64c0aceb7e0 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -131,6 +131,7 @@ void mce_setup(struct mce *m)
>  	m->ppin = cpu_data(m->extcpu).ppin;
>  	m->microcode = boot_cpu_data.microcode;
>  }
> +EXPORT_SYMBOL_GPL(mce_setup);
>  
>  DEFINE_PER_CPU(struct mce, injectm);
>  EXPORT_PER_CPU_SYMBOL_GPL(injectm);
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 27166a411705..f9b6f50fbe80 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -4,6 +4,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/ktime.h>
>  #include <linux/mutex.h>
> +#include <asm/mce.h>
>  #include <asm/unaligned.h>
>  #include <cxlpci.h>
>  #include <cxlmem.h>
> @@ -1290,6 +1291,38 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
>  
> +static void cxl_mem_report_poison(struct cxl_memdev *cxlmd,
> +				  struct cxl_poison_record *poison)
> +{
> +	struct mce m;
> +	u64 dpa = le64_to_cpu(poison->address) & CXL_POISON_START_MASK;
> +	u64 len = le64_to_cpu(poison->length), i;
> +	phys_addr_t phys_addr = cxl_memdev_dpa_to_hpa(cxlmd, dpa);
> +
> +	if (phys_addr)
> +		return;
> +
> +	/*
> +	 * Initialize struct mce.  Call preempt_disable() to avoid
> +	 * "BUG: using smp_processor_id() in preemptible" for now, not sure
> +	 * if this is a correct way.
> +	 */
> +	preempt_disable();
> +	mce_setup(&m);
> +	preempt_enable();
> +
> +	m.bank = -1;
> +	/* Fake a memory read error with unknown channel */
> +	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV |
> +		   MCI_STATUS_MISCV | 0x9f;
> +	m.misc = (MCI_MISC_ADDR_PHYS << 6);
> +
> +	for (i = 0; i < len; i++) {
> +		m.addr = phys_addr++;
> +		mce_log(&m);

This loop looks wrong. What values do you expect for "len" (a.k.a.
poison->length)? Creating one log for each byte in the range will
be very noisy!
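
As a rough illustration of the noise, here is a small user-space sketch
that counts how many records a range needs at a given granularity, and
how that granularity could be encoded in the low six bits of the misc
value (the field the kernel reads with MCI_MISC_ADDR_LSB()). It assumes
the range is expressed in bytes; records_for_range() and
misc_for_granule() are hypothetical helpers, not existing kernel
functions, and whether cache-line or page granularity is the right
choice here is an open question.

```c
#include <assert.h>
#include <stdint.h>

#define MCI_MISC_ADDR_PHYS 2  /* address mode: physical (asm/mce.h) */

/*
 * Number of records needed to cover [start, start + len) when each
 * record describes one naturally aligned granule of (1 << lsb) bytes.
 */
static uint64_t records_for_range(uint64_t start, uint64_t len,
				  unsigned int lsb)
{
	uint64_t mask, first, last;

	assert(lsb < 64);
	if (len == 0)
		return 0;

	mask = ~((1ULL << lsb) - 1);
	first = start & mask;             /* granule holding the first byte */
	last = (start + len - 1) & mask;  /* granule holding the last byte */
	return ((last - first) >> lsb) + 1;
}

/* misc value: physical address mode in bits 8:6, granule LSB in bits 5:0 */
static uint64_t misc_for_granule(unsigned int lsb)
{
	return ((uint64_t)MCI_MISC_ADDR_PHYS << 6) | (lsb & 0x3f);
}
```

For a 4096-byte range starting at 0x1000, records_for_range() gives
4096 records at byte granularity (lsb = 0, what the quoted loop does),
64 at cache-line granularity (lsb = 6), and 1 at page granularity
(lsb = 12).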

> +	}
> +}
> +
>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>  		       struct cxl_region *cxlr)
>  {
> -- 
> 2.34.1

-Tony

