From: Alison Schofield <alison.schofield@intel.com>
To: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: qemu-devel@nongnu.org, linux-cxl@vger.kernel.org,
Jonathan.Cameron@huawei.com, dan.j.williams@intel.com,
dave@stgolabs.net, ira.weiny@intel.com
Subject: Re: [RFC PATCH v2 0/6] cxl: add poison event handler
Date: Fri, 29 Mar 2024 10:43:30 -0700
Message-ID: <Zgb9wjTIu1CE4S5r@aschofie-mobl2>
In-Reply-To: <20240329063614.362763-1-ruansy.fnst@fujitsu.com>
On Fri, Mar 29, 2024 at 02:36:08PM +0800, Shiyang Ruan wrote:
> Changes:
> RFCv1 -> RFCv2:
> 1. update commit message of PATCH 1
> 2. use memory_failure_queue() instead of MCE
> 3. also report poison in debugfs when injecting poison
> 4. correct DPA->HPA logic:
> find memdev's endpoint decoder to find the region it belongs to
> 5. distinguish transaction_type of GMER, only handle POISON related
> event for now
>
>
> Currently driver only traces cxl events, poison injection (for both vmem
> and pmem type) on cxl memdev is silent. OS needs to be notified then it
> could handle poison range in time. Per CXL spec, the device error event
> could be signaled through FW-First and OS-First methods.
>
> So, add poison event handler in OS-First method:
> - qemu:
> - CXL device report POISON event to OS by MSI by sending GMER after
> injecting a poison record
> - CXL driver <-- this patchset
> a. parse the POISON event from GMER;
> b. retrieve POISON list from memdev;
> c. translate poisoned DPA to HPA;
> d. enqueue poisoned PFN to memory_failure's work queue;
Hi,

Yesterday I posted code adding the HPAs to cxl_general_media & dram
events[1], so as I review this patchset today it's fresh in my mind.

Can we integrate this into the trace_ path directly:

1) On any GMER/poison, trigger a new poison list read
   BTW - I'm not sure where to trigger that because we want to keep all
   the locking in place and read by endpoints like is done now. It may
   not be safe to sneak in a direct call to cxl_mem_get_poison()
   as is done in this patch set.

2) Teach the poison list read trace event handler to call
   memory_failure_queue().
Upon receipt of that new poison list, call memory_failure_queue()
on *any* poison in a mapped space. Is that OK? Can we call
memory_failure_queue() on any and every poison report that is in
HPA space regardless of whether it first came to us through a GMER?
I'm actually wondering if that is going to be the next ask anyway -
ie report all poison.
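
As a rough illustration of the flow discussed above (re-read the poison
list, translate each poisoned DPA to an HPA, and queue every page that
lands in mapped space), here is a small userspace model. It is not code
from the patch set or from the kernel: every name prefixed with model_,
both structs, the single non-interleaved region mapping, and the 4 KiB
page size are assumptions made for illustration, and
model_memory_failure_queue() is only a stand-in that records PFNs
instead of calling the real memory_failure_queue().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)
#define MODEL_MAX_QUEUED 64

/* Hypothetical poison list record: a poisoned DPA range on the device. */
struct model_poison_rec {
	uint64_t dpa_start;
	uint64_t len;                            /* length in bytes */
};

/*
 * Hypothetical region mapping with no interleave, so that
 * HPA = hpa_base + (DPA - dpa_base). Both bases are assumed page aligned.
 */
struct model_region_map {
	uint64_t dpa_base;
	uint64_t hpa_base;
	uint64_t size;
};

static uint64_t model_queued_pfn[MODEL_MAX_QUEUED];
static size_t model_nr_queued;

/* Stand-in for memory_failure_queue(): only records the PFN. */
static void model_memory_failure_queue(uint64_t pfn)
{
	if (model_nr_queued < MODEL_MAX_QUEUED)
		model_queued_pfn[model_nr_queued++] = pfn;
}

/* Translate a DPA to an HPA; returns false if the DPA is not mapped. */
static bool model_dpa_to_hpa(const struct model_region_map *map,
			     uint64_t dpa, uint64_t *hpa)
{
	if (dpa < map->dpa_base || dpa - map->dpa_base >= map->size)
		return false;
	*hpa = map->hpa_base + (dpa - map->dpa_base);
	return true;
}

/*
 * Walk a freshly read poison list and queue every page that falls in
 * mapped HPA space, regardless of how the poison was first reported.
 * Returns the number of pages queued by this call.
 */
static size_t model_report_poison_list(const struct model_region_map *map,
				       const struct model_poison_rec *recs,
				       size_t nr_recs)
{
	size_t before = model_nr_queued;

	for (size_t i = 0; i < nr_recs; i++) {
		if (recs[i].len == 0)
			continue;

		uint64_t last = recs[i].dpa_start + recs[i].len - 1;
		uint64_t dpa = recs[i].dpa_start & ~(MODEL_PAGE_SIZE - 1);

		/* Step through the record one page at a time. */
		for (; dpa <= last; dpa += MODEL_PAGE_SIZE) {
			uint64_t hpa;

			if (model_dpa_to_hpa(map, dpa, &hpa))
				model_memory_failure_queue(hpa >> MODEL_PAGE_SHIFT);
		}
	}
	return model_nr_queued - before;
}
```

The model deliberately leaves out the locking and per-endpoint read
ordering raised in point 1), as well as interleave math; it only shows
the "queue anything that maps to an HPA" filtering step.
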
I'll comment a bit more on individual patches.
--Alison
[1] https://lore.kernel.org/linux-cxl/cover.1711598777.git.alison.schofield@intel.com/
>
>
> Shiyang Ruan (6):
> cxl/core: correct length of DPA field masks
> cxl/core: introduce cxl_mem_report_poison()
> cxl/core: add report option for cxl_mem_get_poison()
> cxl/core: report poison when injecting from debugfs
> cxl: add definition for transaction_type
> cxl/core: add poison injection event handler
>
> drivers/cxl/core/mbox.c | 126 +++++++++++++++++++++++++++++++++-----
> drivers/cxl/core/memdev.c | 5 +-
> drivers/cxl/core/region.c | 8 +--
> drivers/cxl/core/trace.h | 6 +-
> drivers/cxl/cxlmem.h | 13 ++--
> include/linux/cxl-event.h | 17 ++++-
> 6 files changed, 144 insertions(+), 31 deletions(-)
>
> --
> 2.34.1
>