From: Dave Jiang <dave.jiang@intel.com>
To: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org
Cc: rafael@kernel.org, bp@alien8.de, dan.j.williams@intel.com,
tony.luck@intel.com, dave@stgolabs.net,
jonathan.cameron@huawei.com, alison.schofield@intel.com,
ira.weiny@intel.com
Subject: [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support
Date: Fri, 27 Sep 2024 07:16:52 -0700
Message-ID: <20240927142108.1156362-1-dave.jiang@intel.com>
Hi all,
I'm looking for comments on the approach and implementation for handling
this exclusive caching configuration. I have concerns about how the I/O hole
in the memory mapping is discovered and handled, and I'm looking for
suggestions on whether there are better ways to do it. I will be taking a
4-week sabbatical starting next week, and I apologize in advance for the
delay in responses. Thank you in advance for reviewing the patches.
The MCE folks will be interested in patch 6/6 where MCE_PRIO_CXL is added.
Certain systems provide an exclusive caching memory configuration where a
1:1 layout of DRAM and far memory (FM) such as CXL memory is utilized. In
this configuration, the memory is presented to the OS as a single memory
region. For example:
              128GB DRAM                        128GB CXL memory
|------------------------------------|------------------------------------|
The kernel sees the region as a 256GB system memory region. Data can reside
in either DRAM or FM with no replication. Hot data is swapped into DRAM by
the hardware behind the scenes.
This kernel series introduces code that lets the kernel enumerate the side
cache when the system is set up in an exclusive-cache configuration. It also
adds RAS support to deal with the aliased memory addresses.
An ECN [1] to the ACPI HMAT table was introduced and approved to describe
the "extended-linear" addressing for direct-mapped memory-side caches. A
reserved field in the Memory Side Cache Information Structure of HMAT is
redefined as "Address Mode" where a value of 1 is defined as Extended-linear
mode. This value is valid if the cache is direct mapped. "It indicates that
the associated address range (SRAT.MemoryAffinityStructure.Length) is
comprised of the backing store capacity extended by the cache capacity." By
augmenting the HMAT and SRAT parsing code, this new information can be stored
by the HMAT handling code.
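As a rough illustration of the ECN wording quoted above, the following
user-space C sketch splits an SRAT range length into its backing-store
portion. The struct and function names (side_cache_info,
ADDR_MODE_EXTENDED_LINEAR, is_extended_linear, backing_store_size) are
hypothetical and are not the ACPICA or kernel definitions; only the value 1
for extended-linear mode and the direct-mapped requirement come from the
description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; only the value 1 is taken from the ECN description. */
#define ADDR_MODE_EXTENDED_LINEAR 1

/* Hypothetical, simplified view of a memory-side cache entry. */
struct side_cache_info {
	uint64_t cache_size;	/* memory-side cache capacity in bytes */
	bool direct_mapped;	/* address mode is only valid if direct mapped */
	uint16_t address_mode;	/* field that was previously reserved */
};

/* True when the SRAT range covers backing store plus cache capacity. */
bool is_extended_linear(const struct side_cache_info *c)
{
	return c->direct_mapped &&
	       c->address_mode == ADDR_MODE_EXTENDED_LINEAR;
}

/*
 * Return the backing-store share of an SRAT Memory Affinity Structure
 * length. Returns 0 if the length is too small to contain the cache.
 */
uint64_t backing_store_size(const struct side_cache_info *c,
			    uint64_t srat_length)
{
	if (!is_extended_linear(c))
		return srat_length;	/* range is backing store only */
	if (srat_length < c->cache_size)
		return 0;		/* inconsistent table contents */
	return srat_length - c->cache_size;
}
```

With the 128GB DRAM + 128GB CXL layout from the example, a 256GB SRAT length
and a 128GB cache would yield a 128GB backing store.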
The current CXL region enumeration code is not aware of the side cache
configuration and therefore reports only the size of the CXL-backed portion
as the region size. Add support to allow the CXL region enumeration code to
query the HMAT handling code, retrieve the side cache information, and adjust
the region size accordingly. This should allow the CXL CLI to display the
full region size rather than just the CXL-only region size.
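The query-and-adjust step could look roughly like the user-space C sketch
below. It is not the code from this series: the types and helpers
(mem_range, cache_target, query_cache_size, extend_region) are hypothetical,
and it assumes the DRAM cache sits directly below the CXL range, as in the
diagram earlier in this letter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range, end is inclusive. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical record kept by the HMAT side: SRAT range and cache size. */
struct cache_target {
	struct mem_range srat;
	uint64_t cache_size;
};

/* Return the cache size of the target that contains the CXL range, or 0. */
uint64_t query_cache_size(const struct cache_target *t, size_t n,
			  const struct mem_range *cxl)
{
	for (size_t i = 0; i < n; i++) {
		if (cxl->start >= t[i].srat.start &&
		    cxl->end <= t[i].srat.end)
			return t[i].cache_size;
	}
	return 0;
}

/*
 * Grow the region downward by the cache size so that it covers the whole
 * DRAM + CXL span. Assumes the cache occupies the addresses just below
 * the CXL range; the region is returned unchanged if no cache is found.
 */
struct mem_range extend_region(const struct cache_target *t, size_t n,
			       struct mem_range cxl)
{
	uint64_t cache = query_cache_size(t, n, &cxl);

	if (cache && cxl.start >= cache)
		cxl.start -= cache;
	return cxl;
}
```

For a 128GB CXL range that is the upper half of a 256GB SRAT range, the
extended region covers the full 256GB.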
There are 3 sources from which the kernel may be notified that a memory
error has been detected.
1. CXL DRAM event. This is a CXL event that is generated when an error is
detected by the CXL device patrol or demand scrubber. The trace_event is
augmented to display the aliased System Physical Address (SPA) in addition
to the alerted address. However, reporting of memory failure is TBD until
the discussion [2] of failure reporting is settled upstream.
2. UCNA event from DRAM patrol or demand scrubber. This should eventually go
through the MCE callback chain.
3. MCE from the kernel consuming poison.
It is possible that all 3 sources report at the same time and that all of
them report the same error.
For 2 and 3, an MCE notifier callback is registered by the CXL driver on a
per-device basis. The callback determines whether the reported address falls
in one of the extended linear cache regions and, if so, offlines the aliased
address as well.
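A minimal user-space C sketch of the address check such a callback might
perform is shown below. It assumes the 1:1 layout described earlier, where
the region is exactly twice the cache size and an address in one half
aliases the address at the same offset in the other half. The names
(elc_region, spa_cache_alias, NO_ALIAS) are hypothetical and do not come
from the patches.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "address has no alias". */
#define NO_ALIAS UINT64_MAX

/* Hypothetical description of one extended linear cache region. */
struct elc_region {
	uint64_t start;		/* first SPA of the combined region */
	uint64_t size;		/* combined DRAM + CXL size in bytes */
	uint64_t cache_size;	/* DRAM cache size in bytes */
};

/*
 * Return the aliased SPA for an address inside the region, or NO_ALIAS
 * if the address is outside the region or the layout is not 1:1.
 */
uint64_t spa_cache_alias(const struct elc_region *r, uint64_t spa)
{
	/* Only the 1:1 DRAM-to-far-memory layout is handled here. */
	if (!r->cache_size || r->size != 2 * r->cache_size)
		return NO_ALIAS;

	/* Reported address is not in this region. */
	if (spa < r->start || spa - r->start >= r->size)
		return NO_ALIAS;

	/* Upper half aliases down, lower half aliases up. */
	if (spa - r->start >= r->cache_size)
		return spa - r->cache_size;
	return spa + r->cache_size;
}
```

A notifier built on this would hand any result other than NO_ALIAS to the
memory offlining path in addition to the reported address.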
[1]: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
[2]: https://lore.kernel.org/linux-cxl/20240808151328.707869-2-ruansy.fnst@fujitsu.com/
---
Dave Jiang (6):
ACPICA: actbl1.h: Add extended linear address mode to MSCIS
acpi: numa: Add support to enumerate and store extended linear address mode
acpi/hmat / cxl: Add extended linear cache support for CXL
acpi/hmat: Add helper functions to provide extended linear cache translation
cxl: Add extended linear cache address alias emission for cxl events
cxl: Add mce notifier to emit aliased address for extended linear cache
Documentation/ABI/stable/sysfs-devices-node | 7 ++
arch/x86/include/asm/mce.h | 1 +
arch/x86/mm/pat/set_memory.c | 1 +
drivers/acpi/numa/hmat.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
drivers/base/node.c | 2 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/acpi.c | 21 ++++++
drivers/cxl/core/core.h | 10 +++
drivers/cxl/core/mbox.c | 87 ++++++++++++++++++++++-
drivers/cxl/core/region.c | 78 +++++++++++++++++++--
drivers/cxl/core/trace.h | 24 ++++---
drivers/cxl/cxl.h | 8 +++
drivers/cxl/cxlmem.h | 2 +
include/acpi/actbl1.h | 5 +-
include/linux/acpi.h | 22 ++++++
include/linux/node.h | 7 ++
tools/testing/cxl/Kbuild | 1 +
17 files changed, 443 insertions(+), 17 deletions(-)
Thread overview: 25+ messages
2024-09-27 14:16 Dave Jiang [this message]
2024-09-27 14:16 ` [RFC PATCH 1/6] ACPICA: actbl1.h: Add extended linear address mode to MSCIS Dave Jiang
2024-10-02 17:57 ` Rafael J. Wysocki
2024-09-27 14:16 ` [RFC PATCH 2/6] acpi: numa: Add support to enumerate and store extended linear address mode Dave Jiang
2024-10-17 16:00 ` Jonathan Cameron
2024-10-29 21:01 ` Dave Jiang
2024-09-27 14:16 ` [RFC PATCH 3/6] acpi/hmat / cxl: Add extended linear cache support for CXL Dave Jiang
2024-10-17 16:20 ` Jonathan Cameron
2024-10-29 22:04 ` Dave Jiang
2024-09-27 14:16 ` [RFC PATCH 4/6] acpi/hmat: Add helper functions to provide extended linear cache translation Dave Jiang
2024-10-17 16:33 ` Jonathan Cameron
2024-10-17 16:46 ` Luck, Tony
2024-10-17 16:59 ` Jonathan Cameron
2024-10-29 22:51 ` Dave Jiang
2024-10-30 22:53 ` Dave Jiang
2024-11-01 11:56 ` Jonathan Cameron
2024-09-27 14:16 ` [RFC PATCH 5/6] cxl: Add extended linear cache address alias emission for cxl events Dave Jiang
2024-10-17 16:38 ` Jonathan Cameron
2024-10-30 23:29 ` Dave Jiang
2024-09-27 14:16 ` [RFC PATCH 6/6] cxl: Add mce notifier to emit aliased address for extended linear cache Dave Jiang
2024-10-17 16:40 ` Jonathan Cameron
2024-10-30 23:37 ` Dave Jiang
2024-10-31 21:12 ` Dave Jiang
2024-10-17 16:46 ` [RFC PATCH 0/6] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support Jonathan Cameron
2024-10-29 22:55 ` Dave Jiang