From: Dave Jiang <dave.jiang@intel.com>
To: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org
Cc: rafael@kernel.org, bp@alien8.de, dan.j.williams@intel.com,
	tony.luck@intel.com, dave@stgolabs.net,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	ira.weiny@intel.com, ming.li@zohomail.com
Subject: Re: [PATCH v5 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support
Date: Wed, 26 Feb 2025 14:31:18 -0700
Message-ID: <d8459e65-74df-4ce8-8475-cd9badc19a82@intel.com>
In-Reply-To: <20250226162224.3633792-1-dave.jiang@intel.com>

On 2/26/25 9:21 AM, Dave Jiang wrote:
> v5:
> - Update a couple of dev_dbg() emits. (Alison)
> - Add hpa_alias emits for poison events. (Alison)
> - Drop cxlr_hpa_cache_alias() and opencode the one invocation. (Alison)
> - See individual patches for detailed changes.

Applied to cxl/next.

>
> v4:
> - Add alias adjustment for cxl_dpa_to_hpa() (Alison)
> - Add check of adjusted region start against CFMWS (Alison)
> - Use ULLONG_MAX consistently. (Alison)
> - Use hpa_alias0 consistently. (Alison)
> - Move devm_add_action_or_reset() to devm_cxl_add_mce_notifier(). (Ming)
> - See individual patches for detailed changes.
>
> v3:
> - Drop region to nid function, dead code.
> - Set hpa_alias default to ~0ULL to indicate no alias. (Jonathan)
> - Add endpoint check for mce handler. (Ming)
> - Add mce notifier unregister. (Ming)
>
> v2:
> - Fix 0-day issues
> - Fix checking of cache flag. (Ming)
> - Add comment about cache range vs CFMWS. (Ming)
> - Update EXPORT_SYMBOL_(). (Jonathan)
> - Fix various code comments. (Jonathan)
> - Emit hpa_alias0 instead of hpa_alias. (Jonathan)
> - Introduce CONFIG_CXL_MCE to address kernel build dep issues.
>
> v1:
> - Drop RFC prefix
> - Drop MMIO hole discovery. Will implement if there is a real-world implementation.
> - Drop MCE_PRI_CXL. Use MCE_PRI_UC. (Boris)
> - Minor refactors and grammar fixes. (Jonathan)
> - Rename 'mode' to 'address_mode'. (Jonathan)
>
> RFCv2:
> - Dropped 1/6 (ACPICA definition merged)
> - Change UNKNOWN to RESERVED for cache definition. (Jonathan)
> - Fix spelling errors (Jonathan)
> - Rename region_res_match_range() to region_res_match_cxl_range(). (Jonathan)
> - Add warning when cache is not 1:1 with backing region. (Jonathan)
> - Code and comments cleanup. (Jonathan)
> - Make MCE code access in CXL arch independent. (Jonathan)
> - Fixup 0-day reports.
>
> Certain systems provide an exclusive caching memory configuration where a
> 1:1 layout of DRAM and far memory (FM) such as CXL memory is utilized. In
> this configuration, the memory is presented to the OS as a single memory
> region. For example:
>
>               128GB DRAM                        128GB CXL memory
> |------------------------------------|------------------------------------|
>
> The kernel sees the region as a 256GB system memory region. Data can reside
> in either DRAM or FM with no replication. Hot data is swapped into DRAM by
> the hardware behind the scenes.
>
> This series adds kernel code to enumerate the side cache when the system is
> configured in an exclusive-cache configuration. It also adds RAS support
> to deal with the aliased memory addresses.
>
> A new ECN [1] to the ACPI HMAT table was introduced and approved to describe
> the "extended-linear" addressing for direct-mapped memory-side caches. A
> reserved field in the Memory Side Cache Information Structure of HMAT is
> redefined as "Address Mode" where a value of 1 is defined as Extended-linear
> mode. This value is valid if the cache is direct mapped. "It indicates that
> the associated address range (SRAT.MemoryAffinityStructure.Length) is
> comprised of the backing store capacity extended by the cache capacity." By
> augmenting the HMAT and SRAT parsing code, this new information can be stored
> by the HMAT handling code.
>
> The current CXL region enumeration code is not aware of the side cache
> configuration and therefore presents the region size as the size of the CXL
> memory alone. Add support to allow the CXL region enumeration code to query
> the HMAT handling code, retrieve the side cache information, and adjust the
> region size accordingly. This should allow the CXL CLI to display the full
> region size rather than just the CXL-only region size.
>
> There are 3 sources from which the kernel may be notified that a memory error
> has been detected:
> 1. CXL DRAM event. This is a CXL event that is generated when an error is
>    detected by the CXL device patrol or demand scrubber. The trace_event is
>    augmented to display the aliased System Physical Address (SPA) in addition
>    to the alerted address. However, reporting of memory failure is TBD until
>    the discussion [2] of failure reporting is settled upstream.
> 2. UCNA event from the DRAM patrol or demand scrubber. This should eventually
>    go through the MCE callback chain.
> 3. MCE from the kernel consuming poison.
>
> It is possible that all 3 sources report the same error at the same time.
>
> For 2 and 3, an MCE notifier callback is registered by the CXL driver on a
> per-device basis. The callback determines whether the reported address is in
> one of the special regions and, if so, offlines the aliased address.
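
As an illustration of the aliasing, the following sketch computes the alias
of an address inside an extended linear region. It is a simplified model, not
the code in the series. It assumes the 1:1 layout from the diagram above, so
an address in one half aliases the address at the same offset in the other
half, and it uses all-ones for "no alias" as described in the v3 and v4
changelogs. NO_ALIAS and cache_alias() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones marks "no alias", the same value as ~0ULL / ULLONG_MAX. */
#define NO_ALIAS UINT64_MAX

/*
 * Return the aliased address for addr in a region that starts at
 * region_start and consists of cache_size bytes of DRAM cache followed by
 * cache_size bytes of far memory (assumed 1:1 layout). Returns NO_ALIAS if
 * there is no cache or addr lies outside the region.
 */
static uint64_t cache_alias(uint64_t region_start, uint64_t cache_size,
			    uint64_t addr)
{
	uint64_t off;

	if (cache_size == 0)
		return NO_ALIAS;

	/* The whole region must be representable without wrapping. */
	assert(cache_size <= (UINT64_MAX - region_start) / 2);

	if (addr < region_start)
		return NO_ALIAS;

	off = addr - region_start;
	if (off < cache_size)
		return addr + cache_size;	/* lower half -> upper half */
	if (off - cache_size < cache_size)
		return addr - cache_size;	/* upper half -> lower half */

	return NO_ALIAS;			/* beyond the region */
}
```

A notifier following the description above would call such a helper with the
reported address and, when the result is not NO_ALIAS, offline the page that
contains the alias in addition to the page the standard MCE path handles.
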
>
> [1]: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> [2]: https://lore.kernel.org/linux-cxl/20240808151328.707869-2-ruansy.fnst@fujitsu.com/
>
> ---
>
> Dave Jiang (4):
>   acpi: numa: Add support to enumerate and store extended linear address mode
>   acpi/hmat / cxl: Add extended linear cache support for CXL
>   cxl: Add extended linear cache address alias emission for cxl events
>   cxl: Add mce notifier to emit aliased address for extended linear cache
>
>  Documentation/ABI/stable/sysfs-devices-node |   6 +++
>  arch/x86/mm/pat/set_memory.c                |   1 +
>  drivers/acpi/numa/hmat.c                    |  44 +++++++++++++++++++
>  drivers/base/node.c                         |   2 +
>  drivers/cxl/Kconfig                         |   4 ++
>  drivers/cxl/core/Makefile                   |   2 +
>  drivers/cxl/core/acpi.c                     |  11 +++++
>  drivers/cxl/core/core.h                     |   3 ++
>  drivers/cxl/core/mbox.c                     |  20 +++++++--
>  drivers/cxl/core/mce.c                      |  65 +++++++++++++++++++++++++++
>  drivers/cxl/core/mce.h                      |  20 +++++++++
>  drivers/cxl/core/region.c                   | 114 +++++++++++++++++++++++++++++++++++++++++++++---
>  drivers/cxl/core/trace.h                    |  31 ++++++++-----
>  drivers/cxl/cxl.h                           |   8 ++++
>  drivers/cxl/cxlmem.h                        |   2 +
>  include/linux/acpi.h                        |  11 +++++
>  include/linux/node.h                        |   7 +++
>  tools/testing/cxl/Kbuild                    |   2 +
>  18 files changed, 332 insertions(+), 21 deletions(-)
>
> base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
>