linux-acpi.vger.kernel.org archive mirror
From: Dave Jiang <dave.jiang@intel.com>
To: linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org
Cc: rafael@kernel.org, bp@alien8.de, dan.j.williams@intel.com,
	tony.luck@intel.com, dave@stgolabs.net,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	ira.weiny@intel.com, ming.li@zohomail.com
Subject: Re: [PATCH v5 0/4] acpi/hmat / cxl: Add exclusive caching enumeration and RAS support
Date: Wed, 26 Feb 2025 14:31:18 -0700
Message-ID: <d8459e65-74df-4ce8-8475-cd9badc19a82@intel.com>
In-Reply-To: <20250226162224.3633792-1-dave.jiang@intel.com>



On 2/26/25 9:21 AM, Dave Jiang wrote:
> v5:
> - Update a couple of dev_dbg() emits. (Alison)
> - Add hpa_alias emits for poison events. (Alison)
> - Drop cxlr_hpa_cache_alias() and opencode the one invocation. (Alison)
> - See individual patches for detailed changes.

Applied to cxl/next

> 
> v4:
> - Add alias adjustment for cxl_dpa_to_hpa() (Alison)
> - Add check of adjusted region start against CFMWS (Alison)
> - Use ULLONG_MAX consistently. (Alison)
> - Use hpa_alias0 consistently. (Alison)
> - Move devm_add_action_or_reset() to devm_cxl_add_mce_notifier(). (Ming)
> - See individual patches for detailed changes.
> 
> v3:
> - Drop region to nid function, deadcode.
> - Set hpa_alias default to ~0ULL to indicate no alias. (Jonathan)
> - Add endpoint check for mce handler. (Ming)
> - Add mce notifier unregister. (Ming)
> 
> v2:
> - Fix 0-day issues
> - Fix checking of cache flag. (Ming)
> - Add comment about cache range vs CFMWS. (Ming)
> - Update EXPORT_SYMBOL_(). (Jonathan)
> - Fix various code comments. (Jonathan)
> - Emit hpa_alias0 instead of hpa_alias. (Jonathan)
> - Introduce CONFIG_CXL_MCE to address kernel build dep issues.
> 
> v1:
> - Drop RFC prefix
> - Drop MMIO hole discovery. Will implement if there is a real-world implementation.
> - Drop MCE_PRI_CXL. Use MCE_PRI_UC. (Boris)
> - Minor refactors and grammar fixes. (Jonathan)
> - Rename 'mode' to 'address_mode'. (Jonathan)
> 
> RFCv2:
> - Dropped 1/6 (ACPICA definition merged)
> - Change UNKNOWN to RESERVED for cache definition. (Jonathan)
> - Fix spelling errors (Jonathan)
> - Rename region_res_match_range() to region_res_match_cxl_range(). (Jonathan)
> - Add warning when cache is not 1:1 with backing region. (Jonathan)
> - Code and comments cleanup. (Jonathan)
> - Make MCE code access in CXL arch independent. (Jonathan)
> - Fixup 0-day reports.
> 
> Certain systems provide an exclusive caching memory configuration where a
> 1:1 layout of DRAM and far memory (FM), such as CXL memory, is utilized. In
> this configuration, the memory is presented to the OS as a single memory
> region, for example:
> 
>              128GB DRAM                         128GB CXL memory
> |------------------------------------|------------------------------------|
> 
> The kernel sees the region as a 256GB system memory region. Data can reside
> in either DRAM or FM with no replication. Hot data is swapped into DRAM by
> the hardware behind the scenes.
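
To make the aliasing concrete, the following is a minimal user-space sketch, not the kernel implementation, of how an address maps to its alias in the 1:1 layout drawn above. It assumes the DRAM cache range occupies the lower half of the region and the CXL range the upper half; spa_cache_alias() and NO_ALIAS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NO_ALIAS UINT64_MAX /* hypothetical "no alias" marker */

/*
 * Hypothetical helper: return the address that aliases @spa in a 1:1
 * extended-linear layout where [region_start, region_start + cache_size)
 * is the DRAM cache range and the next cache_size bytes are the CXL range.
 */
static uint64_t spa_cache_alias(uint64_t region_start, uint64_t cache_size,
				uint64_t spa)
{
	uint64_t region_end;

	/* Guard against the region wrapping past the top of the address space. */
	assert(region_start <= UINT64_MAX - 2 * cache_size);
	region_end = region_start + 2 * cache_size; /* exclusive */

	if (!cache_size || spa < region_start || spa >= region_end)
		return NO_ALIAS;

	/* CXL (upper) half aliases down into DRAM half, and vice versa. */
	if (spa >= region_start + cache_size)
		return spa - cache_size;
	return spa + cache_size;
}
```

For a region starting at 0 with a 128GiB cache, address 0x1000 (DRAM half) aliases to 128GiB + 0x1000, and 128GiB + 0x2000 (CXL half) aliases back to 0x2000.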
> 
> This series adds code for the kernel to enumerate the side cache when the
> system is in an exclusive-cache configuration. It also adds RAS support to
> handle the aliased memory addresses.
> 
> A new ECN [1] to the ACPI HMAT was introduced and approved to describe the
> "extended-linear" addressing for direct-mapped memory-side caches. A
> reserved field in the HMAT Memory Side Cache Information Structure is
> redefined as "Address Mode", where a value of 1 denotes Extended-linear
> mode. This value is valid if the cache is direct mapped. "It indicates that
> the associated address range (SRAT.MemoryAffinityStructure.Length) is
> comprised of the backing store capacity extended by the cache capacity." By
> augmenting the HMAT and SRAT parsing code, the HMAT handling code can store
> this new information.
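
A rough sketch of how parsed HMAT fields could be interpreted. The constants reflect an assumed reading of the Memory Side Cache Information Structure (Address Mode value 1 for extended-linear, associativity in bits 11:8 of Cache Attributes with 1 meaning direct mapped) and should be checked against the ACPI specification; the struct and function names are hypothetical and are not ACPICA definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings; verify against the ACPI HMAT definition. */
#define HYP_ADDR_MODE_EXTENDED_LINEAR 1u          /* "Address Mode" == 1 */
#define HYP_CACHE_ASSOC_MASK          0x00000F00u /* Cache Attributes 11:8 */
#define HYP_CACHE_ASSOC_SHIFT         8
#define HYP_CACHE_ASSOC_DIRECT_MAPPED 1u

/* Hypothetical subset of the parsed structure that this sketch needs. */
struct hyp_side_cache_info {
	uint16_t address_mode;
	uint32_t cache_attributes;
	uint64_t cache_size; /* bytes */
};

/* Address Mode is only honored here when the cache is direct mapped. */
static bool is_extended_linear(const struct hyp_side_cache_info *c)
{
	uint32_t assoc = (c->cache_attributes & HYP_CACHE_ASSOC_MASK) >>
			 HYP_CACHE_ASSOC_SHIFT;

	return assoc == HYP_CACHE_ASSOC_DIRECT_MAPPED &&
	       c->address_mode == HYP_ADDR_MODE_EXTENDED_LINEAR;
}

/*
 * In extended-linear mode the SRAT Memory Affinity Length covers the
 * backing store plus the cache, so the backing store is the difference.
 */
static uint64_t backing_store_size(const struct hyp_side_cache_info *c,
				   uint64_t srat_length)
{
	if (!is_extended_linear(c))
		return srat_length;
	assert(srat_length >= c->cache_size);
	return srat_length - c->cache_size;
}
```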
> 
> The current CXL region enumeration code is not aware of the side cache
> configuration and therefore reports only the CXL capacity as the region
> size. Add support for the CXL region enumeration code to query the HMAT
> handling code, retrieve the side cache information, and adjust the region
> size accordingly. This should allow the CXL CLI to display the full region
> size rather than only the CXL portion.
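
The region-size adjustment can be sketched as below. This is an illustration under stated assumptions, not the series' code: it assumes the cache range sits immediately in front of the CXL range, that the cache is 1:1 with the CXL capacity, and that the adjusted start must remain inside the CFMWS window. hyp_range and hyp_extend_region_for_cache() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical inclusive address range. */
struct hyp_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/*
 * Grow a CXL-only region so it also covers the DRAM cache in front of it.
 * @window is the CFMWS range that must contain the adjusted region.
 * Returns 0 on success or -ENXIO if the layout is not supported.
 */
static int hyp_extend_region_for_cache(struct hyp_range *region,
				       const struct hyp_range *window,
				       uint64_t cache_size)
{
	uint64_t cxl_size;

	assert(region->end >= region->start);
	cxl_size = region->end - region->start + 1;

	if (!cache_size)
		return 0;       /* no side cache: nothing to adjust */
	if (cache_size != cxl_size)
		return -ENXIO;  /* only the 1:1 layout is handled */
	if (region->start < cache_size ||
	    region->start - cache_size < window->start)
		return -ENXIO;  /* cache range would fall outside the CFMWS */

	region->start -= cache_size;
	return 0;
}
```

After a successful call, end - start + 1 is the full capacity (cache plus CXL) that a tool could report.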
> 
> There are three sources through which the kernel may be notified that a
> memory error has been detected:
> 1. CXL DRAM event. This is a CXL event generated when an error is detected
>    by the CXL device patrol or demand scrubber. The trace_event is augmented
>    to display the aliased System Physical Address (SPA) in addition to the
>    alerted address. However, reporting of memory failure is TBD until the
>    discussion [2] of failure reporting is settled upstream.
> 2. UCNA event from the DRAM patrol or demand scrubber. This should
>    eventually go through the MCE callback chain.
> 3. MCE from the kernel consuming poison.
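
For source 1, the extra trace field could be derived roughly as follows. The sketch assumes that the cache range precedes the CXL range, so an address translated from a device DPA lands in the CXL part and its alias is cache_size bytes lower; ULLONG_MAX marks "no alias", following the hpa_alias0 convention in the changelog. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical subset of the addresses an event record would carry. */
struct hyp_event_addrs {
	unsigned long long hpa;        /* ULLONG_MAX if DPA was not translated */
	unsigned long long hpa_alias0; /* ULLONG_MAX if there is no alias */
};

static struct hyp_event_addrs hyp_fill_event_addrs(unsigned long long hpa,
						   unsigned long long cache_size)
{
	struct hyp_event_addrs a = { .hpa = hpa, .hpa_alias0 = ULLONG_MAX };

	/* Only emit an alias when the HPA is known and a side cache exists. */
	if (hpa != ULLONG_MAX && cache_size) {
		assert(hpa >= cache_size);
		a.hpa_alias0 = hpa - cache_size; /* matching DRAM address */
	}
	return a;
}
```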
> 
> It is possible that all 3 sources report the same error at the same time.
> 
> For 2 and 3, an MCE notifier callback is registered by the CXL driver on a
> per-device basis. The callback determines whether the reported address is in
> one of the special regions and, if so, offlines the aliased address.
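
The per-device callback for sources 2 and 3 can be modeled in user space as below. The kernel's notifier chain and memory-failure handling are replaced by a plain callback, the return values only loosely mirror NOTIFY_DONE/NOTIFY_OK, and 4 KiB pages are assumed; every name here is hypothetical. The page at the reported address itself is left to the normal MCE path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_PAGE_SHIFT  12 /* assumes 4 KiB pages */
#define HYP_NOTIFY_DONE 0  /* not ours: let other handlers run */
#define HYP_NOTIFY_OK   1  /* alias page was handed to @offline */

/* Hypothetical description of one extended-linear region. */
struct hyp_region {
	uint64_t start;      /* start of the DRAM cache range */
	uint64_t cache_size; /* the CXL range follows immediately */
};

typedef void (*hyp_offline_fn)(uint64_t pfn, void *ctx);

/* Example callback: record the PFN instead of offlining it. */
static void hyp_record_pfn(uint64_t pfn, void *ctx)
{
	*(uint64_t *)ctx = pfn;
}

/* If @spa falls in one of @regions, offline the page holding its alias. */
static int hyp_handle_mce(const struct hyp_region *regions, size_t n,
			  uint64_t spa, hyp_offline_fn offline, void *ctx)
{
	assert(regions || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct hyp_region *r = &regions[i];
		uint64_t alias;

		if (!r->cache_size || spa < r->start ||
		    spa - r->start >= 2 * r->cache_size)
			continue;

		/* CXL half maps down to DRAM, DRAM half maps up to CXL. */
		alias = spa - r->start >= r->cache_size ?
			spa - r->cache_size : spa + r->cache_size;
		offline(alias >> HYP_PAGE_SHIFT, ctx);
		return HYP_NOTIFY_OK;
	}
	return HYP_NOTIFY_DONE;
}
```

Returning the "done" value for addresses outside every region lets other handlers on the chain process the event unchanged.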
> 
> [1]: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
> [2]: https://lore.kernel.org/linux-cxl/20240808151328.707869-2-ruansy.fnst@fujitsu.com/
> 
> ---
> 
> Dave Jiang (4):
>       acpi: numa: Add support to enumerate and store extended linear address mode
>       acpi/hmat / cxl: Add extended linear cache support for CXL
>       cxl: Add extended linear cache address alias emission for cxl events
>       cxl: Add mce notifier to emit aliased address for extended linear cache
> 
>  Documentation/ABI/stable/sysfs-devices-node |   6 +++
>  arch/x86/mm/pat/set_memory.c                |   1 +
>  drivers/acpi/numa/hmat.c                    |  44 +++++++++++++++++++
>  drivers/base/node.c                         |   2 +
>  drivers/cxl/Kconfig                         |   4 ++
>  drivers/cxl/core/Makefile                   |   2 +
>  drivers/cxl/core/acpi.c                     |  11 +++++
>  drivers/cxl/core/core.h                     |   3 ++
>  drivers/cxl/core/mbox.c                     |  20 +++++++--
>  drivers/cxl/core/mce.c                      |  65 +++++++++++++++++++++++++++
>  drivers/cxl/core/mce.h                      |  20 +++++++++
>  drivers/cxl/core/region.c                   | 114 +++++++++++++++++++++++++++++++++++++++++++++---
>  drivers/cxl/core/trace.h                    |  31 ++++++++-----
>  drivers/cxl/cxl.h                           |   8 ++++
>  drivers/cxl/cxlmem.h                        |   2 +
>  include/linux/acpi.h                        |  11 +++++
>  include/linux/node.h                        |   7 +++
>  tools/testing/cxl/Kbuild                    |   2 +
>  18 files changed, 332 insertions(+), 21 deletions(-)
> 
>  base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
> 

