Linux CXL
* [PATCH v2 0/5] cxl: Sashiko bug fixes
@ 2026-07-02  9:08 Richard Cheng
  2026-07-02  9:08 ` [PATCH v2 1/5] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Richard Cheng @ 2026-07-02  9:08 UTC (permalink / raw)
  To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
	danwilliams
  Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
	newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng

Five independent, pre-existing bugs in the CXL core, reported by sashiko.

Patch 1: Get/Set Feature stored offset + transfer-size into a 16-bit
field via cpu_to_le16() with no bounds check, so a large offset/count
from the fwctl interface silently wrapped and steered the device to the
wrong feature offset. Reject offset + size > U16_MAX up front.
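A minimal sketch of the failure mode, not the driver code: the helper
names, the uint32_t parameter types, and the -EINVAL return value are
assumptions made for illustration. It shows a sum being truncated into a
16-bit field, and a checked variant that widens the sum before comparing.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: what a 16-bit field ends up holding when
 * offset + count is stored without a range check. */
static uint16_t store_unchecked(uint32_t offset, uint32_t count)
{
	return (uint16_t)(offset + count);	/* high bits are dropped */
}

/* Hypothetical helper: refuse the request instead of letting it wrap.
 * The sum is formed in 64 bits so the comparison itself cannot overflow. */
static int store_checked(uint32_t offset, uint32_t count, uint16_t *out)
{
	if ((uint64_t)offset + count > UINT16_MAX)
		return -EINVAL;	/* error code is an assumption */
	*out = (uint16_t)(offset + count);
	return 0;
}
```

With offset 0xFFF0 and count 0x20, the unchecked helper stores 0x0010,
which is the kind of silent redirection described above; the checked
helper rejects the same request.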

Patch 2: cxl_get_poison_unmapped() aborted its whole partition sweep on
the first fully-mapped partition, silently skipping unmapped poison in
all later partitions. Skip that partition instead.
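The difference between aborting and skipping can be sketched as below.
struct part and both sweep functions are hypothetical stand-ins, not
kernel structures; only the break-versus-continue control flow is taken
from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device partition. */
struct part {
	bool fully_mapped;	/* no unmapped DPA left to scan */
	int unmapped_poison;	/* poison records in the unmapped range */
};

/* Behaviour as described before the fix: the first fully-mapped
 * partition ends the whole sweep. */
static int sweep_abort(const struct part *p, size_t n)
{
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (p[i].fully_mapped)
			break;
		found += p[i].unmapped_poison;
	}
	return found;
}

/* Behaviour after the fix: only that partition is skipped. */
static int sweep_skip(const struct part *p, size_t n)
{
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (p[i].fully_mapped)
			continue;
		found += p[i].unmapped_poison;
	}
	return found;
}
```

For a fully-mapped first partition followed by a partition with three
unmapped poison records, sweep_abort() reports none and sweep_skip()
reports all three.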

Patch 3: the same function tolerated the -EFAULT a RAM partition returns
for Get Poison List but left it in rc, so a benign fault on the last
scanned partition surfaced as a spurious read failure. Clear rc, as
poison_by_decoder() already does.
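A sketch of the leaked return code, under the assumption that the scan
is a simple loop over per-partition query results. struct part_result
and the two functions are hypothetical; they only model "tolerate
-EFAULT from a RAM partition" with and without clearing rc.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the result a Get Poison List query produced for one
 * partition, and whether that partition is RAM. */
struct part_result {
	bool is_ram;
	int rc;
};

/* Tolerates the RAM -EFAULT but leaves it in rc. */
static int scan_leaky(const struct part_result *r, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = r[i].rc;
		if (rc == -EFAULT && r[i].is_ram)
			continue;	/* tolerated, yet rc is still -EFAULT */
		if (rc)
			break;
	}
	return rc;
}

/* Tolerates the RAM -EFAULT and clears it. */
static int scan_cleared(const struct part_result *r, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = r[i].rc;
		if (rc == -EFAULT && r[i].is_ram) {
			rc = 0;
			continue;
		}
		if (rc)
			break;
	}
	return rc;
}
```

When the last partition scanned is RAM and returns -EFAULT,
scan_leaky() hands that value to the caller while scan_cleared()
returns 0.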

Patch 4: the same function also ignored the ctx->offset handoff from
poison_by_decoder() and derived its scan start from the highest DPA
allocation, so the DPA of allocated-but-uncommitted decoders was never
scanned by either phase. Resume the sweep at ctx->offset.
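The gap can be illustrated with a toy model. struct dec and both
helpers are hypothetical, and the model assumes decoders are sorted by
DPA and that the per-decoder phase stops at the first uncommitted one;
neither assumption is taken from the driver source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder model: a DPA range plus its commit state. */
struct dec {
	uint64_t start;
	uint64_t len;
	bool committed;
};

/* Where the per-decoder phase stops: the end of the last committed
 * decoder. This plays the role of the ctx->offset handoff. */
static uint64_t committed_end(const struct dec *d, size_t n)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++) {
		if (!d[i].committed)
			break;
		end = d[i].start + d[i].len;
	}
	return end;
}

/* Scan start derived from the highest allocation, committed or not. */
static uint64_t allocated_end(const struct dec *d, size_t n)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++)
		if (d[i].start + d[i].len > end)
			end = d[i].start + d[i].len;
	return end;
}
```

With one committed decoder covering [0x0, 0x1000) and one
allocated-but-uncommitted decoder covering [0x1000, 0x2000), the first
phase ends at 0x1000 but a sweep started at allocated_end() begins at
0x2000, so the range in between is visited by neither phase.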

Patch 5: cxl_get_poison_by_memdev() overwrote rc on each partition
query, so an earlier partition's failure was masked by a later success
and unscanned poison was reported as a clean list. Stop on any error
not tolerated as a RAM -EFAULT.
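A sketch of how a later success can mask an earlier failure.
struct part_query and the two functions are hypothetical; the loop
shape is assumed from the description above rather than copied from
cxl_get_poison_by_memdev().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: outcome of the poison query for one partition. */
struct part_query {
	bool is_ram;
	int rc;
};

/* rc is overwritten on every iteration, so only the last partition's
 * result reaches the caller. */
static int query_overwrite(const struct part_query *q, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = q[i].rc;
		if (rc == -EFAULT && q[i].is_ram)
			rc = 0;
	}
	return rc;
}

/* Stops at the first error that is not a tolerated RAM -EFAULT. */
static int query_stop_on_error(const struct part_query *q, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = q[i].rc;
		if (rc == -EFAULT && q[i].is_ram)
			rc = 0;
		if (rc)
			break;
	}
	return rc;
}
```

If the first partition fails with -EIO and the second succeeds,
query_overwrite() returns 0 and query_stop_on_error() returns -EIO.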

Changes since v1 [1]:
- Patch 1: write the bounds checks as size > U16_MAX - offset so the
  check itself cannot wrap on 32-bit architectures (sashiko)
- Patch 2: commit message wording fix (Dave)
- New patches 4 and 5, fixing the pre-existing issues sashiko raised on
  the v1 patch 3 thread [2]
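Why the rewritten comparison matters can be sketched as follows. Here
uint32_t stands in for a 32-bit size_t, the function names are
hypothetical, and the extra offset > UINT16_MAX guard is an addition for
this sketch that is only needed when offset can itself exceed 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive form: the 32-bit sum can wrap before it is compared. */
static bool exceeds_u16_naive(uint32_t offset, uint32_t size)
{
	return offset + size > UINT16_MAX;
}

/* Subtraction form: no intermediate value can wrap. */
static bool exceeds_u16_safe(uint32_t offset, uint32_t size)
{
	if (offset > UINT16_MAX)	/* guard added for this sketch */
		return true;
	return size > UINT16_MAX - offset;
}
```

For offset 0x10 and size 0xFFFFFFF8 the 32-bit sum wraps to 0x8, so the
naive form accepts the request while the subtraction form rejects it.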

[1]:
https://lore.kernel.org/linux-cxl/20260630074657.43077-1-icheng@nvidia.com/
[2]:
https://lore.kernel.org/linux-cxl/20260630100022.A621A1F000E9@smtp.kernel.org/

Richard Cheng (5):
  cxl/features: Reject feature offset that overflows 16-bit field
  cxl/region: Scan all partitions for unmapped poison
  cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan
  cxl/region: Start unmapped poison scan at the committed decoder
    boundary
  cxl/memdev: Don't overwrite the error from an earlier partition poison
    query

 drivers/cxl/core/features.c |  6 ++++++
 drivers/cxl/core/memdev.c   |  2 ++
 drivers/cxl/core/region.c   | 13 ++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)


base-commit: dc59e4fea9d83f03bad6bddf3fa2e52491777482
-- 
2.43.0



end of thread, other threads:[~2026-07-02 11:22 UTC | newest]

Thread overview: 8+ messages
2026-07-02  9:08 [PATCH v2 0/5] cxl: Sashiko bug fixes Richard Cheng
2026-07-02  9:08 ` [PATCH v2 1/5] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
2026-07-02 11:22   ` sashiko-bot
2026-07-02  9:08 ` [PATCH v2 2/5] cxl/region: Scan all partitions for unmapped poison Richard Cheng
2026-07-02  9:08 ` [PATCH v2 3/5] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan Richard Cheng
2026-07-02  9:20   ` sashiko-bot
2026-07-02  9:08 ` [PATCH v2 4/5] cxl/region: Start unmapped poison scan at the committed decoder boundary Richard Cheng
2026-07-02  9:08 ` [PATCH v2 5/5] cxl/memdev: Don't overwrite the error from an earlier partition poison query Richard Cheng
