From: Richard Cheng <icheng@nvidia.com>
To: dave@stgolabs.net, jic23@kernel.org, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	djbw@kernel.org, iweiny@kernel.org, danwilliams@nvidia.com
Cc: ming.li@zohomail.com, terry.bowman@amd.com, alucerop@amd.com,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	newtonl@nvidia.com, kristinc@nvidia.com, kaihengf@nvidia.com,
	kobak@nvidia.com, mochs@nvidia.com,
	Richard Cheng <icheng@nvidia.com>
Subject: [PATCH v5 0/2] Support zero-sized HDM decoders
Date: Tue, 23 Jun 2026 17:10:17 +0800
Message-ID: <20260623091019.33417-1-icheng@nvidia.com>

CXL r4.0 8.2.4.20.12 ("Committing Decoder Programming") and 14.13.10
("CXL HDM Decoder Zero Size Commit") permit committing an HDM decoder with
size 0. BIOS may commit and lock such decoders so the OS cannot program
regions through them; this is a design choice rather than a spec
requirement.
The kernel rejected these with -ENXIO during port enumeration and aborted
the whole port, so affected systems showed nothing under 'cxl list'.

This series enumerates empty committed decoders into the topology and keeps
them out of region assembly.

Patch 1 stops rejecting empty committed decoders and enumerates them.
Empty decoders are now first-class: they take a real resource instead of
bypassing the reservation path, so port->hdm_end and the in-order DPA
checks stay consistent. discover_region() skips them so autodiscovery does
not build a phantom region for an empty decoder. Poison collection is
fixed for the case where commit_end references a decoder with no DPA
resource.
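
As a rough userspace illustration of the two ideas above (not the code in
patch 1), the sketch below models an in-order reservation where an empty
decoder still advances the port's last-reserved index, plus a filter that
keeps empty decoders out of autodiscovery. Every name here (toy_decoder,
toy_port, toy_reserve_dpa, toy_wants_region) is hypothetical and the
checks are simplified assumptions about the ordering rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a committed endpoint decoder. */
struct toy_decoder {
	int id;			/* decoder instance number */
	bool committed;
	uint64_t dpa_base;
	uint64_t dpa_len;	/* 0 for an empty committed decoder */
};

/* Hypothetical stand-in for the per-port bookkeeping. */
struct toy_port {
	int hdm_end;		/* id of the last reserved decoder, -1 if none */
	uint64_t dpa_next;	/* first DPA not yet covered by a reservation */
};

/*
 * Reserve DPA for a decoder. An empty decoder goes through the same
 * in-order check and advances hdm_end like any other decoder; it just
 * does not consume any DPA.
 * Returns 0 on success, -1 if out of order or overlapping.
 */
static int toy_reserve_dpa(struct toy_port *port, const struct toy_decoder *d)
{
	if (d->id != port->hdm_end + 1)
		return -1;
	if (d->dpa_len && d->dpa_base < port->dpa_next)
		return -1;
	if (d->dpa_len)
		port->dpa_next = d->dpa_base + d->dpa_len;
	port->hdm_end = d->id;
	return 0;
}

/* Autodiscovery filter: only a committed, non-empty decoder seeds a region. */
static bool toy_wants_region(const struct toy_decoder *d)
{
	return d->committed && d->dpa_len != 0;
}
```

In this model, reserving decoder0 (sized), decoder1 (empty) and decoder2
(sized) in that order succeeds, whereas bypassing the reservation for
decoder1 would leave hdm_end at 0 and make decoder2 fail the in-order
check.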

Patch 2 adds a mock_zero_size_decoders cxl_test module parameter (default
off) that commits empty (zero-size + locked) decoders under host-bridge0,
so the unit tests can exercise the new enumeration and poison paths.

The ndctl tests Alison suggested will be sent as a separate patch.

Tested with the ndctl cxl unit suite (cxl_test) and on real hardware.

Full cxl suite with mock_zero_size_decoders off -- no regressions:
"""
$ sudo env "PATH=$PATH" meson test -C build --suite cxl --num-processes 1 -t 6 --print-errorlogs
 1/14 ndctl:cxl / cxl-topology.sh       OK                2.89s
 2/14 ndctl:cxl / cxl-region-sysfs.sh   OK                2.40s
 3/14 ndctl:cxl / cxl-labels.sh         OK                2.38s
 4/14 ndctl:cxl / cxl-create-region.sh  OK                2.93s
 5/14 ndctl:cxl / cxl-xor-region.sh     OK                2.40s
 6/14 ndctl:cxl / cxl-events.sh         OK                2.20s
 7/14 ndctl:cxl / cxl-sanitize.sh       OK                5.19s
 8/14 ndctl:cxl / cxl-destroy-region.sh OK                2.23s
 9/14 ndctl:cxl / cxl-qos-class.sh      OK                2.28s
10/14 ndctl:cxl / cxl-translate.sh      OK                0.51s
11/14 ndctl:cxl / cxl-elc.sh            OK                2.31s
12/14 ndctl:cxl / cxl-security.sh       SKIP              0.01s   exit status 77
13/14 ndctl:cxl / cxl-features.sh       SKIP              1.04s   exit status 77
14/14 ndctl:cxl / cxl-poison.sh         OK                7.07s

Ok:                 12
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            2
Timeout:            0
"""

The subset Alison asked about:
"""
$ sudo env "PATH=$PATH" meson test -C build --num-processes 1 -t 6 --print-errorlogs cxl-region-sysfs.sh cxl-create-region.sh cxl-xor-region.sh cxl-destroy-region.sh cxl-qos-class.sh cxl-poison.sh
1/6 ndctl:cxl / cxl-region-sysfs.sh   OK                2.39s
2/6 ndctl:cxl / cxl-create-region.sh  OK                2.92s
3/6 ndctl:cxl / cxl-xor-region.sh     OK                2.39s
4/6 ndctl:cxl / cxl-destroy-region.sh OK                2.23s
5/6 ndctl:cxl / cxl-qos-class.sh      OK                2.26s
6/6 ndctl:cxl / cxl-poison.sh         OK                7.02s

Ok:                 6
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0
"""

With mock_zero_size_decoders on, cxl-poison's by-memdev-by-dpa case returns
all 4 records (clean under KASAN).

On a Montage CXL Type 3 device, enumeration is now clean: no -ENXIO, the
existing 128 GiB RAM region stays intact, and the empty decoder appears
with size 0 and locked.

Changelog:

v4 -> v5:
  - Patch 1: rework per Dan Williams -- make the zero-size DPA reservation
    first-class instead of special-casing size 0 and bypassing the
    reservation path; keep port->hdm_end consistent.
  - Patch 1: skip zero-size committed decoders in discover_region() so
    autodiscovery does not fail building a region for an empty decoder (fixes
    a -ENXIO region-attach hit while enumerating a real Type 3 device).
  - Patch 1: poison fixes (sashiko) -- gate per-decoder queries on a valid
    partition and fall back to scanning all partitions when no sized
    decoder was walked; add a len == 0 guard to cxl_mem_get_poison(); add
    the missing return after kfree() in the zero-size release path; tag the
    zero-size marker resource IORESOURCE_MEM so partition resolution
    succeeds.
  - Patch 2: gate the mock behind mock_zero_size_decoders (default off) so
    the shared cxl_test topology stays undisturbed.
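
The poison items above can be pictured with a small userspace model. This
is only a sketch under assumptions, not the kernel code: toy_range,
toy_get_poison and toy_collect_poison are hypothetical names, and the
"query" just counts how many times the device would be asked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DPA span: a decoder's reservation or a partition. */
struct toy_range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical device query; an empty range is never sent to the device. */
static int toy_get_poison(uint64_t start, uint64_t len, int *queries)
{
	(void)start;
	if (len == 0)
		return 0;
	(*queries)++;
	return 0;
}

/*
 * Query each sized decoder. If none was walked (all decoders were
 * empty), fall back to scanning every partition.
 */
static int toy_collect_poison(const struct toy_range *dec, size_t nr_dec,
			      const struct toy_range *part, size_t nr_part,
			      int *queries)
{
	size_t walked = 0;
	size_t i;
	int rc;

	for (i = 0; i < nr_dec; i++) {
		if (dec[i].len == 0)
			continue;
		rc = toy_get_poison(dec[i].start, dec[i].len, queries);
		if (rc)
			return rc;
		walked++;
	}
	if (walked)
		return 0;

	for (i = 0; i < nr_part; i++) {
		rc = toy_get_poison(part[i].start, part[i].len, queries);
		if (rc)
			return rc;
	}
	return 0;
}
```

With a single empty decoder and two partitions this model issues two
queries (one per partition); with one sized and one empty decoder it
issues exactly one.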

Richard Cheng (2):
  cxl/hdm: Allow zero sized HDM decoders
  tools/testing/cxl: Enable zero sized decoders under hb0

 drivers/cxl/core/hdm.c       |  52 ++++++++++++------
 drivers/cxl/core/mbox.c      |   3 ++
 drivers/cxl/core/region.c    |  49 +++++++++++------
 drivers/cxl/cxl.h            |   9 ++++
 drivers/cxl/port.c           |   3 ++
 tools/testing/cxl/test/cxl.c | 100 ++++++++++++++++++++++++++++++-----
 6 files changed, 168 insertions(+), 48 deletions(-)


base-commit: eb3f4b7426cfd2b79d65b7d37155480b32259a11
-- 
2.43.0

