Linux CXL
* Some thoughts and questions about CXL & MCE
@ 2023-12-22 16:33 Shiyang Ruan
  2024-01-02 17:29 ` Jonathan Cameron
  2024-01-03 16:45 ` Dan Williams
  0 siblings, 2 replies; 7+ messages in thread
From: Shiyang Ruan @ 2023-12-22 16:33 UTC (permalink / raw)
  To: linux-cxl
  Cc: dan.j.williams@intel.com, Jonathan Cameron, dave.jiang,
	vishal.l.verma

Hi guys,

I have some thoughts and questions about the CXL & MCE mechanism.

CXL type-3 devices can be used as volatile or persistent memory, so a
poisoned page on them should also trigger a memory failure.  That lets
the OS handle the processes using the page and lets the device driver
recover the page.  I am now investigating this.

Currently, CXL RAS is under development.  We can now inject POISON on a
CXL device with QEMU (QMP commands), and `cxl list -L` then shows
those poisoned areas.  But the POISON injection is silent.  I think we
need a signal here to notify the OS to handle those poisoned areas at
injection time.  According to CXL 3.0 spec Figure 12-5, there are 2
methods to send the signal: FW-First and OS-First.
My understanding of them is:
- FW-First method:
   a. The CXL device reports POISON to firmware
   b. GHES calls the CXL driver handler[1], which handles the POISON
   c. The CXL driver handler translates DPA to HPA, constructs an mce
instance, then calls mce_log() to queue this MCE (? not sure)
- OS-First method:
   a. The CXL device reports POISON to the OS by MSI
   b. The CXL driver handles the POISON
   c. Same as step c. above
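
The two lists above can be modelled as two entry points that converge on
one handler.  This is only a user-space sketch of that shape, not kernel
code: every name in it (notify_path, poison_report, cxl_poison_handle,
fw_first_notify, os_first_notify) is hypothetical, and the "handler" just
records what it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only; none of these names are kernel APIs. */
enum notify_path { PATH_FW_FIRST, PATH_OS_FIRST };

struct poison_report {
	enum notify_path path;	/* how the report reached the OS */
	uint64_t dpa;		/* device physical address of the poison */
};

#define MAX_REPORTS 8
static struct poison_report handled[MAX_REPORTS];
static size_t nr_handled;

/* Steps b and c of both lists: the common driver-side handler. */
static int cxl_poison_handle(enum notify_path path, uint64_t dpa)
{
	assert(nr_handled <= MAX_REPORTS);
	if (nr_handled == MAX_REPORTS)
		return -1;	/* model is full; drop the report */
	handled[nr_handled].path = path;
	handled[nr_handled].dpa = dpa;
	nr_handled++;
	return 0;
}

/* FW-First: firmware sees the error first and forwards it (GHES). */
static int fw_first_notify(uint64_t dpa)
{
	return cxl_poison_handle(PATH_FW_FIRST, dpa);
}

/* OS-First: the device interrupts the OS directly (MSI). */
static int os_first_notify(uint64_t dpa)
{
	return cxl_poison_handle(PATH_OS_FIRST, dpa);
}
```

In this model the only difference between the two methods is which entry
point runs; everything after that is shared.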

So, I think:
Firstly, and obviously, we need to add a signal when injecting POISON in
QEMU.  For example, call `cxl_event_insert()` after injection.

Secondly, implement a method in the CXL driver to turn POISON into an
MCE and push it into the mce_evt_pool for the decode chain to process;
mce_uc_nb on this chain will then finally call memory_failure().
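
A rough user-space model of that second step, to make the data flow
concrete.  It assumes a single non-interleaved region (a real DPA to HPA
translation also has to undo interleaving) and 4 KiB pages.  All names
here (region_model, mce_model, queue_poison, drain_pool, failed_pfns) are
hypothetical stand-ins; the small ring only imitates the role of
mce_evt_pool, and the consumer records the PFN where the real chain would
call memory_failure().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define POOL_SIZE 4

/* Hypothetical single, non-interleaved mapping of device memory. */
struct region_model {
	uint64_t hpa_base;
	uint64_t dpa_base;
	uint64_t size;
};

/* Stand-in for the address carried by an MCE record. */
struct mce_model {
	uint64_t addr;
};

static struct mce_model pool[POOL_SIZE];	/* imitates mce_evt_pool */
static size_t pool_head, pool_count;
static uint64_t failed_pfns[POOL_SIZE];		/* what the consumer saw */
static size_t nr_failed;

/* Translate DPA to HPA; fails if the DPA is outside the region. */
static int dpa_to_hpa(const struct region_model *r, uint64_t dpa,
		      uint64_t *hpa)
{
	if (dpa < r->dpa_base || dpa - r->dpa_base >= r->size)
		return -1;
	*hpa = r->hpa_base + (dpa - r->dpa_base);
	return 0;
}

/* Producer: build a record from the poisoned DPA and queue it. */
static int queue_poison(const struct region_model *r, uint64_t dpa)
{
	uint64_t hpa;

	if (pool_count == POOL_SIZE || dpa_to_hpa(r, dpa, &hpa))
		return -1;
	pool[(pool_head + pool_count) % POOL_SIZE].addr = hpa;
	pool_count++;
	return 0;
}

/* Consumer: where the decode chain would end in memory_failure(pfn). */
static void drain_pool(void)
{
	while (pool_count) {
		uint64_t pfn = pool[pool_head].addr >> MODEL_PAGE_SHIFT;

		if (nr_failed < POOL_SIZE)
			failed_pfns[nr_failed++] = pfn;
		pool_head = (pool_head + 1) % POOL_SIZE;
		pool_count--;
	}
}
```

The producer and consumer are kept separate on purpose: the report may
arrive in a context where page handling cannot run, so the record is
queued and processed later.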

And a question:
How can the CXL device be configured to choose the FW-First or OS-First
signal method (for both QEMU and bare metal, if possible)?


I don't fully understand the CXL spec yet (it's difficult for me), so 
the above ideas may be immature, but I really want to figure out how we 
can make CXL & MCE work.  I'd really appreciate it if you could help me 
on this!

[1] 
https://lore.kernel.org/linux-cxl/20231220-cxl-cper-v5-0-1bb8a4ca2c7a@intel.com/T/#u


--
Thanks,
Ruan


end of thread, other threads:[~2024-01-09 19:59 UTC | newest]

Thread overview: 7+ messages
2023-12-22 16:33 Some thoughts and questions about CXL & MCE Shiyang Ruan
2024-01-02 17:29 ` Jonathan Cameron
2024-01-03 16:45 ` Dan Williams
2024-01-08 12:37   ` Jonathan Cameron
2024-01-08 21:14     ` Dan Williams
2024-01-09 16:18       ` Jonathan Cameron
2024-01-09 19:59         ` Dan Williams
