From: Tony Luck <tony.luck@intel.com>
To: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: qemu-devel@nongnu.org, linux-cxl@vger.kernel.org,
Jonathan.Cameron@huawei.com, dan.j.williams@intel.com
Subject: Re: [RFC PATCH 3/5] cxl/core: introduce cxl_mem_report_poison()
Date: Wed, 14 Feb 2024 17:19:36 -0800
Message-ID: <Zc1mqOp9WiV49_Yi@agluck-desk3>
In-Reply-To: <20240209115417.724638-6-ruansy.fnst@fujitsu.com>

On Fri, Feb 09, 2024 at 07:54:15PM +0800, Shiyang Ruan wrote:
> If poison is detected(reported from cxl memdev), OS should be notified to
> handle it. Introduce this function:
> 1. translate DPA to HPA;
> 2. construct a MCE instance; (TODO: more details need to be filled)
> 3. log it into MCE event queue;
>
> After that, MCE mechanism can walk over its notifier chain to execute
> specific handlers.

This looks like a useful proof of concept patch to pass errors to all
the existing logging systems (console, mcelog, rasdaemon, EDAC). But
it's a bare minimum (just passing the address and dropping any other
interesting information about the error). I think we need something
more advanced that covers more CXL error types.

> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
> arch/x86/kernel/cpu/mce/core.c | 1 +
> drivers/cxl/core/mbox.c | 33 +++++++++++++++++++++++++++++++++
> 2 files changed, 34 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index bc39252bc54f..a64c0aceb7e0 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -131,6 +131,7 @@ void mce_setup(struct mce *m)
> m->ppin = cpu_data(m->extcpu).ppin;
> m->microcode = boot_cpu_data.microcode;
> }
> +EXPORT_SYMBOL_GPL(mce_setup);
>
> DEFINE_PER_CPU(struct mce, injectm);
> EXPORT_PER_CPU_SYMBOL_GPL(injectm);
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 27166a411705..f9b6f50fbe80 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -4,6 +4,7 @@
> #include <linux/debugfs.h>
> #include <linux/ktime.h>
> #include <linux/mutex.h>
> +#include <asm/mce.h>
> #include <asm/unaligned.h>
> #include <cxlpci.h>
> #include <cxlmem.h>
> @@ -1290,6 +1291,38 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
>
> +static void cxl_mem_report_poison(struct cxl_memdev *cxlmd,
> +				  struct cxl_poison_record *poison)
> +{
> +	struct mce m;
> +	u64 dpa = le64_to_cpu(poison->address) & CXL_POISON_START_MASK;
> +	u64 len = le64_to_cpu(poison->length), i;
> +	phys_addr_t phys_addr = cxl_memdev_dpa_to_hpa(cxlmd, dpa);
> +
> +	if (phys_addr)
> +		return;
> +
> +	/*
> +	 * Initialize struct mce. Call preempt_disable() to avoid
> +	 * "BUG: using smp_processor_id() in preemptible" for now, not sure
> +	 * if this is a correct way.
> +	 */
> +	preempt_disable();
> +	mce_setup(&m);
> +	preempt_enable();
> +
> +	m.bank = -1;
> +	/* Fake a memory read error with unknown channel */
> +	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV |
> +		   MCI_STATUS_MISCV | 0x9f;
> +	m.misc = (MCI_MISC_ADDR_PHYS << 6);
> +
> +	for (i = 0; i < len; i++) {
> +		m.addr = phys_addr++;
> +		mce_log(&m);

This loop looks wrong. What values do you expect for "len" (a.k.a.
poison->length)? Creating one log for each byte in the range will
be very noisy!

> +	}
> +}
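
To put rough numbers on the per-byte logging concern above, here is a
minimal userspace sketch (not kernel code and not part of the patch). It
compares how many log entries a poisoned range produces when every byte
address is logged, as the quoted loop does, against logging each touched
page once. The helper names, the 4 KiB page size, and the treatment of
the length as a byte count are all assumptions made for illustration;
the actual unit of poison->length is exactly what the question above
asks about.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: entries produced when every byte address in
 * [start, start + len) gets its own log record, as in the quoted loop.
 */
static uint64_t events_per_byte(uint64_t start, uint64_t len)
{
	(void)start;	/* the count depends only on the length */
	return len;
}

/*
 * Hypothetical helper: entries produced when each page touched by
 * [start, start + len) is logged exactly once.
 */
static uint64_t events_per_page(uint64_t start, uint64_t len)
{
	uint64_t first, last;

	if (len == 0)
		return 0;

	/* The range must not wrap around the 64-bit address space. */
	assert(start + len - 1 >= start);

	first = start >> SKETCH_PAGE_SHIFT;
	last = (start + len - 1) >> SKETCH_PAGE_SHIFT;
	return last - first + 1;
}
```

For a hypothetical 128-byte range starting at 0x100000fc0, which
straddles a page boundary, the per-byte scheme emits 128 records while
the per-page scheme emits 2. Whether one record per page, per cache
line, or per poison record is the right granularity is a design question
this sketch does not settle.
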
> +
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr)
> {
> --
> 2.34.1

-Tony