Re: [PATCH v3 2/2] cxl/core: add poison creation event handler

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Shiyang Ruan via <qemu-devel@nongnu.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org, qemu-devel@nongnu.org,
	Jonathan.Cameron@huawei.com, dave@stgolabs.net,
	ira.weiny@intel.com, alison.schofield@intel.com
Subject: Re: [PATCH v3 2/2] cxl/core: add poison creation event handler
Date: Tue, 21 May 2024 13:35:12 +0800	[thread overview]
Message-ID: <b14ed74c-7fc5-461a-9c5f-fbb94df50e7d@fujitsu.com> (raw)
In-Reply-To: <b5eb1017-5f0a-4d68-bb63-41e628706a78@fujitsu.com>



在 2024/5/3 19:32, Shiyang Ruan 写道:
> 
> 
> 在 2024/4/24 2:40, Dan Williams 写道:
>> Shiyang Ruan wrote:
>>> Currently driver only traces cxl events, poison creation (for both vmem
>>> and pmem type) on cxl memdev is silent.
>>
>> As it should be.
>>
>>> OS needs to be notified then it could handle poison pages in time.
>>
>> No, it was always the case that latent poison is an "action optional"
>> event. I am not understanding the justification for this approach. What
>> breaks if the kernel does not forward events to memory_failure_queue()?
> 
> I think for type3(pmem) device, it should be handled like NVDIMM.  If 
> there are processes or filesystems running on it, they could be notified 
> then operate a friendly shutdown if POISON happens.
> 
>>
>> Consider that in the CPU consumption case that the firmware first path
>> will do its own memory_failure_queue() and in the native case the MCE
>> handler will take care of this. So that leaves pages that are accessed
>> by DMA or background operation that encounter poison. Those are "action
>> optional" scenarios and it is not clear to me how the driver tells the
>> difference.
> 
> So for real CXL device, it always use FW-First path to notify such 
> failure event?  Then, there is nothing to do with OS-First path?
> 
>>
>> This needs more precision on which agent is repsonsible for what level
>> of reporting. The distribution of responsibility between ACPI GHES,
>> EDAC, and the CXL driver is messy and I expect this changelog to
>> demonstrate it understands all those considerations.
> 
> Ok, I'll try to understand them.

Hi Dan,

I checked the GHES, EDAC codes. I think they belong to FW-First path. 
GHES polls mem errors, then
  1. report by EDAC
  2. construct a MCE, mce_log(), and handle in work queue
  3. queue it into memory_failure right now if needed (sync, ...)
And community is adding CXL FW-First trace support[1].

But in OS-First path, error record is sent to CXL driver via MSI, it 
won't conflict with FW-First path, I think.

[1] 
https://lore.kernel.org/linux-cxl/43ab39e9-c9c2-bfe4-7d1c-bad462221c9c@amd.com/T/#t

> 
>>
>>> Per CXL spec, the device error event could be signaled through
>>> FW-First and OS-First methods.
>>>
>>> So, add poison creation event handler in OS-First method:
>>>    - Qemu:
>>
>> Why is QEMU relevant for this patch? QEMU is only a development vehicle
>> the upstream enabling should be reference shipping or expected to be
>> shipping hardware implementations.
> 
> Yes, but currently we don't have a real CXL device so developing and 
> verification could only be done on Qemu with simulated CXL device.
> 
>>
>>>      - CXL device reports POISON creation event to OS by MSI by sending
>>>        GMER/DER after injecting a poison record;
>>
>> When you say "inject" here do you mean "add to the poison list if
>> present". Because "inject" to me means the "Inject Poison" Memory Device
>> Command.
> 
> It's a Qemu qmp command called "cxl-inject-poison", which only adds a 
> given address,length record to CXL's poison list, doesn't send 
> INJECT_POISON mbox command.
> 
>>
>>>    - CXL driver:
>>>      a. parse the POISON event from GMER/DER;
>>>      b. translate poisoned DPA to HPA (PFN);
>>>      c. enqueue poisoned PFN to memory_failure's work queue;
>>>
>>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>>> ---
>>>   drivers/cxl/core/mbox.c   | 119 +++++++++++++++++++++++++++++++++-----
>>>   drivers/cxl/cxlmem.h      |   8 +--
>>>   include/linux/cxl-event.h |  18 +++++-
>>>   3 files changed, 125 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>>> index f0f54aeccc87..76af0d73859d 100644
>>> --- a/drivers/cxl/core/mbox.c
>>> +++ b/drivers/cxl/core/mbox.c
>>> @@ -837,25 +837,116 @@ int cxl_enumerate_cmds(struct cxl_memdev_state 
>>> *mds)
>>>   }
>>>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>>> -                enum cxl_event_log_type type,
>>> -                enum cxl_event_type event_type,
>>> -                const uuid_t *uuid, union cxl_event *evt)
>>> +static void cxl_report_poison(struct cxl_memdev *cxlmd, struct 
>>> cxl_region *cxlr,
>>> +                  u64 dpa)
>>>   {
>>> -    if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
>>> +    u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
>>> +    unsigned long pfn = PHYS_PFN(hpa);
>>> +
>>> +    if (!IS_ENABLED(CONFIG_MEMORY_FAILURE))
>>> +        return;
>>
>> No need for this check, memory_failure_queue() is already stubbed out in
>> the CONFIG_MEMORY_FAILURE=n case.
> Yes, I'm overthinking it.
> 
>>
>>> +    memory_failure_queue(pfn, MF_ACTION_REQUIRED);
>>
>> My expectation is MF_ACTION_REQUIRED is not appropriate for CXL event
>> reported errors since action is only required for direct consumption
>> events and those need not be reported through the device event queue.
> Got it.

I'm not very sure about 'Host write/read' type.  In my opinion, these 
two types of event should be sent from device when CPU is accessing a 
bad memory address, they could be thought of a sync event which needs 
the 'MF_ACTION_REQUIRED' flag.  Then, we can determine the flag by the 
types like this:
- CXL_EVENT_TRANSACTION_READ | CXL_EVENT_TRANSACTION_WRITE
                                               => MF_ACTION_REQUIRED
- CXL_EVENT_TRANSACTION_INJECT_POISON         => MF_SW_SIMULATED
- others                                      => 0


--
Thanks,
Ruan.

> 
>>
>> It would be useful to collaborate with a BIOS firmware engineer so that
>> the kernel ends up with similar logic as is used to set CPER record
>> severity, or at least understands why it would want to be different.
>> See how ghes_handle_memory_failure() determines the
>> memory_failure_queue() flags.
> 
> Ok, thanks for the advice.
> 
> 
> -- 
> Ruan.
> 
> 
>

next prev parent reply	other threads:[~2024-05-21  5:36 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-17  7:50 [PATCH v3 0/2] cxl: add poison creation event handler Shiyang Ruan via
2024-04-17  7:50 ` [PATCH v3 1/2] cxl/core: correct length of DPA field masks Shiyang Ruan via
2024-04-23 17:35   ` Ira Weiny
2024-04-23 17:35   ` Dan Williams
2024-04-23 17:42   ` Alison Schofield
2024-04-23 21:04     ` Ira Weiny
2024-04-25 10:05       ` Shiyang Ruan via
2024-04-25 16:04         ` Ira Weiny
2024-04-30 21:00   ` Alison Schofield
2024-05-03 11:37     ` Shiyang Ruan via
2024-04-17  7:50 ` [PATCH v3 2/2] cxl/core: add poison creation event handler Shiyang Ruan via
2024-04-17 17:30   ` Dave Jiang
2024-04-18  9:01     ` Shiyang Ruan via
2024-04-21 12:14   ` kernel test robot
2024-04-23 17:57   ` Ira Weiny
2024-05-03 10:42     ` Shiyang Ruan via
2024-05-08 16:15       ` Jonathan Cameron via
2024-04-23 18:40   ` Dan Williams
2024-05-03 11:32     ` Shiyang Ruan via
2024-05-21  5:35       ` Shiyang Ruan via [this message]
2024-05-22  6:45         ` Dan Williams
2024-05-24 15:15           ` Shiyang Ruan via
2024-05-28 10:13             ` Shiyang Ruan via

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b14ed74c-7fc5-461a-9c5f-fbb94df50e7d@fujitsu.com \
    --to=qemu-devel@nongnu.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=ruansy.fnst@fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).