From: Shiyang Ruan via <qemu-devel@nongnu.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org, qemu-devel@nongnu.org,
Jonathan.Cameron@huawei.com, dave@stgolabs.net,
ira.weiny@intel.com, alison.schofield@intel.com
Subject: Re: [PATCH v3 2/2] cxl/core: add poison creation event handler
Date: Tue, 21 May 2024 13:35:12 +0800 [thread overview]
Message-ID: <b14ed74c-7fc5-461a-9c5f-fbb94df50e7d@fujitsu.com> (raw)
In-Reply-To: <b5eb1017-5f0a-4d68-bb63-41e628706a78@fujitsu.com>
在 2024/5/3 19:32, Shiyang Ruan 写道:
>
>
> 在 2024/4/24 2:40, Dan Williams 写道:
>> Shiyang Ruan wrote:
>>> Currently driver only traces cxl events, poison creation (for both vmem
>>> and pmem type) on cxl memdev is silent.
>>
>> As it should be.
>>
>>> OS needs to be notified then it could handle poison pages in time.
>>
>> No, it was always the case that latent poison is an "action optional"
>> event. I am not understanding the justification for this approach. What
>> breaks if the kernel does not forward events to memory_failure_queue()?
>
> I think for type3(pmem) device, it should be handled like NVDIMM. If
> there are processes or filesystems running on it, they could be notified
> then operate a friendly shutdown if POISON happens.
>
>>
>> Consider that in the CPU consumption case that the firmware first path
>> will do its own memory_failure_queue() and in the native case the MCE
>> handler will take care of this. So that leaves pages that are accessed
>> by DMA or background operation that encounter poison. Those are "action
>> optional" scenarios and it is not clear to me how the driver tells the
>> difference.
>
> So for real CXL device, it always use FW-First path to notify such
> failure event? Then, there is nothing to do with OS-First path?
>
>>
>> This needs more precision on which agent is repsonsible for what level
>> of reporting. The distribution of responsibility between ACPI GHES,
>> EDAC, and the CXL driver is messy and I expect this changelog to
>> demonstrate it understands all those considerations.
>
> Ok, I'll try to understand them.
Hi Dan,
I checked the GHES, EDAC codes. I think they belong to FW-First path.
GHES polls mem errors, then
1. report by EDAC
2. construct a MCE, mce_log(), and handle in work queue
3. queue it into memory_failure right now if needed (sync, ...)
And community is adding CXL FW-First trace support[1].
But in OS-First path, error record is sent to CXL driver via MSI, it
won't conflict with FW-First path, I think.
[1]
https://lore.kernel.org/linux-cxl/43ab39e9-c9c2-bfe4-7d1c-bad462221c9c@amd.com/T/#t
>
>>
>>> Per CXL spec, the device error event could be signaled through
>>> FW-First and OS-First methods.
>>>
>>> So, add poison creation event handler in OS-First method:
>>> - Qemu:
>>
>> Why is QEMU relevant for this patch? QEMU is only a development vehicle
>> the upstream enabling should be reference shipping or expected to be
>> shipping hardware implementations.
>
> Yes, but currently we don't have a real CXL device so developing and
> verification could only be done on Qemu with simulated CXL device.
>
>>
>>> - CXL device reports POISON creation event to OS by MSI by sending
>>> GMER/DER after injecting a poison record;
>>
>> When you say "inject" here do you mean "add to the poison list if
>> present". Because "inject" to me means the "Inject Poison" Memory Device
>> Command.
>
> It's a Qemu qmp command called "cxl-inject-poison", which only adds a
> given address,length record to CXL's poison list, doesn't send
> INJECT_POISON mbox command.
>
>>
>>> - CXL driver:
>>> a. parse the POISON event from GMER/DER;
>>> b. translate poisoned DPA to HPA (PFN);
>>> c. enqueue poisoned PFN to memory_failure's work queue;
>>>
>>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>>> ---
>>> drivers/cxl/core/mbox.c | 119 +++++++++++++++++++++++++++++++++-----
>>> drivers/cxl/cxlmem.h | 8 +--
>>> include/linux/cxl-event.h | 18 +++++-
>>> 3 files changed, 125 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>>> index f0f54aeccc87..76af0d73859d 100644
>>> --- a/drivers/cxl/core/mbox.c
>>> +++ b/drivers/cxl/core/mbox.c
>>> @@ -837,25 +837,116 @@ int cxl_enumerate_cmds(struct cxl_memdev_state
>>> *mds)
>>> }
>>> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>>> - enum cxl_event_log_type type,
>>> - enum cxl_event_type event_type,
>>> - const uuid_t *uuid, union cxl_event *evt)
>>> +static void cxl_report_poison(struct cxl_memdev *cxlmd, struct
>>> cxl_region *cxlr,
>>> + u64 dpa)
>>> {
>>> - if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
>>> + u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
>>> + unsigned long pfn = PHYS_PFN(hpa);
>>> +
>>> + if (!IS_ENABLED(CONFIG_MEMORY_FAILURE))
>>> + return;
>>
>> No need for this check, memory_failure_queue() is already stubbed out in
>> the CONFIG_MEMORY_FAILURE=n case.
> Yes, I'm overthinking it.
>
>>
>>> + memory_failure_queue(pfn, MF_ACTION_REQUIRED);
>>
>> My expectation is MF_ACTION_REQUIRED is not appropriate for CXL event
>> reported errors since action is only required for direct consumption
>> events and those need not be reported through the device event queue.
> Got it.
I'm not very sure about 'Host write/read' type. In my opinion, these
two types of event should be sent from device when CPU is accessing a
bad memory address, they could be thought of a sync event which needs
the 'MF_ACTION_REQUIRED' flag. Then, we can determine the flag by the
types like this:
- CXL_EVENT_TRANSACTION_READ | CXL_EVENT_TRANSACTION_WRITE
=> MF_ACTION_REQUIRED
- CXL_EVENT_TRANSACTION_INJECT_POISON => MF_SW_SIMULATED
- others => 0
--
Thanks,
Ruan.
>
>>
>> It would be useful to collaborate with a BIOS firmware engineer so that
>> the kernel ends up with similar logic as is used to set CPER record
>> severity, or at least understands why it would want to be different.
>> See how ghes_handle_memory_failure() determines the
>> memory_failure_queue() flags.
>
> Ok, thanks for the advice.
>
>
> --
> Ruan.
>
>
>
next prev parent reply other threads:[~2024-05-21 5:36 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-17 7:50 [PATCH v3 0/2] cxl: add poison creation event handler Shiyang Ruan via
2024-04-17 7:50 ` [PATCH v3 1/2] cxl/core: correct length of DPA field masks Shiyang Ruan via
2024-04-23 17:35 ` Ira Weiny
2024-04-23 17:35 ` Dan Williams
2024-04-23 17:42 ` Alison Schofield
2024-04-23 21:04 ` Ira Weiny
2024-04-25 10:05 ` Shiyang Ruan via
2024-04-25 16:04 ` Ira Weiny
2024-04-30 21:00 ` Alison Schofield
2024-05-03 11:37 ` Shiyang Ruan via
2024-04-17 7:50 ` [PATCH v3 2/2] cxl/core: add poison creation event handler Shiyang Ruan via
2024-04-17 17:30 ` Dave Jiang
2024-04-18 9:01 ` Shiyang Ruan via
2024-04-21 12:14 ` kernel test robot
2024-04-23 17:57 ` Ira Weiny
2024-05-03 10:42 ` Shiyang Ruan via
2024-05-08 16:15 ` Jonathan Cameron via
2024-04-23 18:40 ` Dan Williams
2024-05-03 11:32 ` Shiyang Ruan via
2024-05-21 5:35 ` Shiyang Ruan via [this message]
2024-05-22 6:45 ` Dan Williams
2024-05-24 15:15 ` Shiyang Ruan via
2024-05-28 10:13 ` Shiyang Ruan via
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b14ed74c-7fc5-461a-9c5f-fbb94df50e7d@fujitsu.com \
--to=qemu-devel@nongnu.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=ruansy.fnst@fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).