From: Gregory Price <gourry@gourry.net>
To: Rakie Kim <rakie.kim@sk.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
byungchul@sk.com, ying.huang@linux.alibaba.com,
apopple@nvidia.com, david@kernel.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, dave@stgolabs.net,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
dan.j.williams@intel.com, harry.yoo@oracle.com,
lsf-pc@lists.linux-foundation.org, kernel_team@skhynix.com,
honggyu.kim@sk.com, yunjeong.mun@sk.com,
Keith Busch <kbusch@kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] [RFC PATCH 0/4] mm/mempolicy: introduce socket-aware weighted interleave
Date: Thu, 26 Mar 2026 21:54:30 -0400
Message-ID: <acXjVmTLSi75elvo@gourry-fedora-PF4VCD3F>
In-Reply-To: <20260324053549.324-1-rakie.kim@sk.com>
[-- Attachment #1: Type: text/plain, Size: 874 bytes --]
On Tue, Mar 24, 2026 at 02:35:45PM +0900, Rakie Kim wrote:
> On Fri, 20 Mar 2026 16:56:05 +0000 Jonathan Cameron <jonathan.cameron@huawei.com> wrote:
>
> Init->Target | node0 | node1 | node2 | node3
> node0 | 0x38B | 0x89F | 0x9C4 | 0x3AFC
> node1 | 0x89F | 0x38B | 0x3AFC| 0x4268
>
> I used the identical type of DRAM and CXL memory for both sockets.
> However, looking at the table, the local CXL access latency from
> node0->node2 (0x9C4) and node1->node3 (0x4268) shows a massive,
> unjustified difference. This asymmetry proves that the table is
> currently unreliable.
>
Can you dump the CDAT for each device so you can at least check whether
each device reports the same latency?

That would at least tell the interested parties whether this is a firmware
issue or a BIOS issue.

sudo cat /sys/bus/cxl/devices/endpointN/CDAT | python3 cdat_dump.py
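
If eyeballing two full dumps is tedious, the comparison can be sketched
roughly as below. This is not part of the attached script: the helper names
`dslbis_values` and `compare` are made up for illustration, the
`endpoint*/CDAT` glob is an assumption based on the path in the command
above, and the field layout simply mirrors the struct formats used in
cdat_dump.py. It only looks at Entry[0] of each DSLBIS and needs root to
read sysfs.

```python
#!/usr/bin/env python3
# Sketch: collect DSLBIS Entry[0] * base_unit per endpoint and flag keys
# whose device-reported values differ between endpoints.
import glob
import struct

CDAT_HDR_SIZE = 16            # u32 length, u8 rev, u8 cksum, u8 rsvd[6], u32 seq
SUBTBL_HDR = "<BBH"           # u8 type, u8 reserved, u16 length
DSLBIS_TYPE = 1
DSLBIS_BODY = "<BBBBQHHHH"    # handle, flags, data_type, rsvd, base_unit, entry[3], rsvd
DATA_TYPES = {0: "access_lat", 1: "read_lat", 2: "write_lat",
              3: "access_bw", 4: "read_bw", 5: "write_bw"}


def dslbis_values(blob):
    """Return {(handle, data_type_name): entry0 * base_unit} for one CDAT blob."""
    out = {}
    if len(blob) < CDAT_HDR_SIZE:
        return out
    length = min(struct.unpack_from("<I", blob)[0], len(blob))
    off = CDAT_HDR_SIZE
    while off + 4 <= length:
        stype, _, slen = struct.unpack_from(SUBTBL_HDR, blob, off)
        if slen < 4 or off + slen > length:
            break  # malformed subtable; stop rather than guess
        if stype == DSLBIS_TYPE and slen >= 4 + struct.calcsize(DSLBIS_BODY):
            handle, _, dtype, _, unit, e0, _, _, _ = struct.unpack_from(
                DSLBIS_BODY, blob, off + 4)
            out[(handle, DATA_TYPES.get(dtype, f"type{dtype}"))] = e0 * unit
        off += slen
    return out


def compare(reports):
    """Given {device: dslbis_values(...)}, return the keys whose values differ."""
    keys = sorted(set().union(*reports.values()))
    return [k for k in keys
            if len({r.get(k) for r in reports.values()}) > 1]


def main():
    reports = {}
    for path in sorted(glob.glob("/sys/bus/cxl/devices/endpoint*/CDAT")):
        try:
            with open(path, "rb") as f:
                reports[path.split("/")[-2]] = dslbis_values(f.read())
        except OSError as e:
            print(f"skipping {path}: {e}")
    for dev, vals in reports.items():
        for (handle, name), value in sorted(vals.items()):
            print(f"{dev}: handle={handle} {name} = {value}")
    for key in compare(reports):
        print(f"MISMATCH handle={key[0]} {key[1]}:",
              {d: r.get(key) for d, r in reports.items()})


if __name__ == "__main__":
    main()
```

Latency values come out in picoseconds and bandwidth values in MB/s, same
as in the attached script. If identical devices show up under MISMATCH, the
devices themselves disagree; if they match, the asymmetry is presumably
introduced later when the platform table is built.
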
~Gregory
---
[-- Attachment #2: cdat_dump.py --]
[-- Type: text/plain, Size: 13718 bytes --]
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0-only
# Copyright(c) 2026 Meta Platforms, Inc. and affiliates.
#
# cdat_dump.py - Dump and decode CDAT (Coherent Device Attribute Table)
# from CXL devices via sysfs
#
# Usage:
# cdat_dump.py # dump all CXL devices with CDAT
# cdat_dump.py /sys/bus/cxl/devices/endpoint0/CDAT
# cdat_dump.py --raw cdat_binary.bin # decode from raw file
# cdat_dump.py --hex # include hex dump of each entry
import argparse
import glob
import os
import struct
import sys
# CDAT Header: u32 length, u8 revision, u8 checksum, u8 reserved[6], u32 sequence
# (4 + 1 + 1 + 6 + 4 = 16 bytes; unpacks to 10 values, sequence is vals[9])
CDAT_HDR_FMT = "<IBB6BI"
CDAT_HDR_SIZE = 16
# Common subtable header: u8 type, u8 reserved, u16 length
CDAT_SUBTBL_HDR_FMT = "<BBH"
CDAT_SUBTBL_HDR_SIZE = 4
# DSMAS (type 0): handle, flags, reserved(u16), dpa_base(u64), dpa_length(u64)
DSMAS_FMT = "<BBHQQ"
DSMAS_SIZE = 20
# DSLBIS (type 1): handle, flags, data_type, reserved, entry_base_unit(u64),
# entry[3](u16 x3), reserved2(u16)
DSLBIS_FMT = "<BBBBQHHHH"
DSLBIS_SIZE = 20
# DSMSCIS (type 2): dsmas_handle, reserved[3], side_cache_size(u64),
# cache_attributes(u32)
DSMSCIS_FMT = "<BBBBQI"
DSMSCIS_SIZE = 16
# DSIS (type 3): flags, handle, reserved(u16)
DSIS_FMT = "<BBH"
DSIS_SIZE = 4
# DSEMTS (type 4): dsmas_handle, memory_type, reserved(u16),
# dpa_offset(u64), range_length(u64)
DSEMTS_FMT = "<BBHQQ"
DSEMTS_SIZE = 20
# SSLBIS (type 5) fixed part: data_type, reserved[3], entry_base_unit(u64)
SSLBIS_FMT = "<BBBBQ"
SSLBIS_SIZE = 12
# SSLBE entry: portx_id(u16), porty_id(u16), latency_or_bandwidth(u16),
# reserved(u16)
SSLBE_FMT = "<HHHH"
SSLBE_SIZE = 8

CDAT_TYPE_NAMES = {
    0: "DSMAS (Device Scoped Memory Affinity Structure)",
    1: "DSLBIS (Device Scoped Latency and Bandwidth Information Structure)",
    2: "DSMSCIS (Device Scoped Memory Side Cache Information Structure)",
    3: "DSIS (Device Scoped Initiator Structure)",
    4: "DSEMTS (Device Scoped EFI Memory Type Structure)",
    5: "SSLBIS (Switch Scoped Latency and Bandwidth Information Structure)",
}

HMAT_DATA_TYPES = {
    0: "Access Latency",
    1: "Read Latency",
    2: "Write Latency",
    3: "Access Bandwidth",
    4: "Read Bandwidth",
    5: "Write Bandwidth",
}

EFI_MEM_TYPES = {
    0: "EfiConventionalMemory",
    1: "EfiConventionalMemory (EFI_MEMORY_SP)",
    2: "EfiReservedMemoryType",
}

CACHE_ASSOCIATIVITY = {
    0: "None",
    1: "Direct Mapped",
    2: "Complex Cache Indexing",
}

CACHE_WRITE_POLICY = {
    0: "None",
    1: "Write Back",
    2: "Write Through",
}


def hexdump(data, indent=" "):
    lines = []
    for i in range(0, len(data), 16):
        chunk = data[i:i+16]
        hexstr = " ".join(f"{b:02x}" for b in chunk)
        ascstr = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{indent}{i:04x}: {hexstr:<48s} {ascstr}")
    return "\n".join(lines)


def fmt_size(size):
    if size >= (1 << 40):
        return f"{size / (1 << 40):.2f} TiB"
    if size >= (1 << 30):
        return f"{size / (1 << 30):.2f} GiB"
    if size >= (1 << 20):
        return f"{size / (1 << 20):.2f} MiB"
    if size >= (1 << 10):
        return f"{size / (1 << 10):.2f} KiB"
    return f"{size} B"


def fmt_port(port_id):
    if port_id == 0xFFFF:
        return "ANY"
    if port_id == 0x0100:
        return "USP (upstream)"
    return f"DSP {port_id}"


def decode_latency_bandwidth(entry_val, base_unit, data_type):
    """Decode a DSLBIS/SSLBIS entry value into human-readable form."""
    if entry_val == 0xFFFF or entry_val == 0:
        return "N/A"
    raw = entry_val * base_unit
    if data_type <= 2:  # latency types (picoseconds -> nanoseconds)
        ns = raw / 1000.0
        if ns >= 1000:
            return f"{ns/1000:.2f} us ({raw} ps)"
        return f"{ns:.2f} ns ({raw} ps)"
    else:  # bandwidth types (MB/s)
        if raw >= 1024:
            return f"{raw/1024:.2f} GB/s ({raw} MB/s)"
        return f"{raw} MB/s"


def decode_dsmas(data, show_hex):
    handle, flags, _, dpa_base, dpa_length = struct.unpack_from(DSMAS_FMT, data)
    flag_strs = []
    if flags & (1 << 2):
        flag_strs.append("NonVolatile")
    if flags & (1 << 3):
        flag_strs.append("Shareable")
    if flags & (1 << 6):
        flag_strs.append("ReadOnly")
    flag_desc = ", ".join(flag_strs) if flag_strs else "None"
    print(f" DSMAS Handle: {handle}")
    print(f" Flags: 0x{flags:02x} ({flag_desc})")
    print(f" DPA Base: 0x{dpa_base:016x}")
    print(f" DPA Length: 0x{dpa_length:016x} ({fmt_size(dpa_length)})")


def decode_dslbis(data, show_hex):
    handle, flags, data_type, _, base_unit, e0, e1, e2, _ = \
        struct.unpack_from(DSLBIS_FMT, data)
    dt_name = HMAT_DATA_TYPES.get(data_type, f"Unknown ({data_type})")
    print(f" Handle: {handle}")
    print(f" Flags: 0x{flags:02x}")
    print(f" Data Type: {data_type} ({dt_name})")
    print(f" Base Unit: {base_unit}")
    print(f" Entry[0]: {e0} -> {decode_latency_bandwidth(e0, base_unit, data_type)}")
    if e1:
        print(f" Entry[1]: {e1} -> {decode_latency_bandwidth(e1, base_unit, data_type)}")
    if e2:
        print(f" Entry[2]: {e2} -> {decode_latency_bandwidth(e2, base_unit, data_type)}")


def decode_dsmscis(data, show_hex):
    dsmas_handle, _, _, _, cache_size, cache_attr = \
        struct.unpack_from(DSMSCIS_FMT, data)
    total_levels = cache_attr & 0xF
    cache_level = (cache_attr >> 4) & 0xF
    assoc = (cache_attr >> 8) & 0xF
    write_pol = (cache_attr >> 12) & 0xF
    line_size = (cache_attr >> 16) & 0xFFFF
    assoc_str = CACHE_ASSOCIATIVITY.get(assoc, f"Unknown ({assoc})")
    wp_str = CACHE_WRITE_POLICY.get(write_pol, f"Unknown ({write_pol})")
    print(f" DSMAS Handle: {dsmas_handle}")
    print(f" Cache Size: 0x{cache_size:016x} ({fmt_size(cache_size)})")
    print(f" Cache Attrs: 0x{cache_attr:08x}")
    print(f" Total Levels: {total_levels}")
    print(f" Cache Level: {cache_level}")
    print(f" Associativity: {assoc_str}")
    print(f" Write Policy: {wp_str}")
    print(f" Line Size: {line_size} bytes")


def decode_dsis(data, show_hex):
    flags, handle, _ = struct.unpack_from(DSIS_FMT, data)
    mem_attached = bool(flags & 1)
    if mem_attached:
        handle_desc = f"DSMAS handle {handle}"
    else:
        handle_desc = f"Initiator handle {handle} (no memory)"
    print(f" Flags: 0x{flags:02x} (Memory Attached: {mem_attached})")
    print(f" Handle: {handle} ({handle_desc})")


def decode_dsemts(data, show_hex):
    dsmas_handle, mem_type, _, dpa_offset, range_length = \
        struct.unpack_from(DSEMTS_FMT, data)
    mt_str = EFI_MEM_TYPES.get(mem_type, f"Reserved ({mem_type})")
    print(f" DSMAS Handle: {dsmas_handle}")
    print(f" Memory Type: {mem_type} ({mt_str})")
    print(f" DPA Offset: 0x{dpa_offset:016x}")
    print(f" Range Length: 0x{range_length:016x} ({fmt_size(range_length)})")


def decode_sslbis(data, total_len, show_hex):
    dt, _, _, _, base_unit = struct.unpack_from(SSLBIS_FMT, data)
    dt_name = HMAT_DATA_TYPES.get(dt, f"Unknown ({dt})")
    print(f" Data Type: {dt} ({dt_name})")
    print(f" Base Unit: {base_unit}")
    # Variable number of SSLBE entries after the fixed header
    entries_data = data[SSLBIS_SIZE:]
    n_entries = len(entries_data) // SSLBE_SIZE
    for i in range(n_entries):
        off = i * SSLBE_SIZE
        px, py, val, _ = struct.unpack_from(SSLBE_FMT, entries_data, off)
        decoded = decode_latency_bandwidth(val, base_unit, dt)
        print(f" Entry[{i}]: {fmt_port(px)} <-> {fmt_port(py)}: "
              f"{val} -> {decoded}")


DECODERS = {
    0: decode_dsmas,
    1: decode_dslbis,
    2: decode_dsmscis,
    3: decode_dsis,
    4: decode_dsemts,
}


def decode_cdat(data, source="", show_hex=False):
    if len(data) < CDAT_HDR_SIZE:
        print(f"Error: data too short for CDAT header ({len(data)} < {CDAT_HDR_SIZE})")
        return False

    # Parse header
    vals = struct.unpack_from(CDAT_HDR_FMT, data)
    length = vals[0]
    revision = vals[1]
    checksum = vals[2]
    # vals[3:9] are the 6 reserved bytes
    sequence = vals[9]

    # Verify checksum
    cksum = sum(data[:length]) & 0xFF
    cksum_ok = "OK" if cksum == 0 else f"FAIL (sum=0x{cksum:02x})"

    if source:
        print(f"=== CDAT from {source} ===")
    print(f"CDAT Header:")
    print(f" Length: {length} bytes")
    print(f" Revision: {revision}")
    print(f" Checksum: 0x{checksum:02x} ({cksum_ok})")
    print(f" Sequence: {sequence}")
    if show_hex:
        print(f" Raw header:")
        print(hexdump(data[:CDAT_HDR_SIZE], " "))

    if length > len(data):
        print(f"Warning: CDAT length ({length}) > available data ({len(data)})")
        length = len(data)

    # Parse subtables
    offset = CDAT_HDR_SIZE
    entry_num = 0
    counts = {}
    while offset + CDAT_SUBTBL_HDR_SIZE <= length:
        stype, _, slen = struct.unpack_from(CDAT_SUBTBL_HDR_FMT, data, offset)
        if slen < CDAT_SUBTBL_HDR_SIZE:
            print(f"\nError: subtable at offset {offset} has invalid length {slen}")
            break
        if offset + slen > length:
            print(f"\nError: subtable at offset {offset} extends past end "
                  f"(offset+len={offset+slen} > {length})")
            break
        counts[stype] = counts.get(stype, 0) + 1
        type_name = CDAT_TYPE_NAMES.get(stype, f"Unknown (type={stype})")
        print(f"\n [{entry_num}] {type_name}")
        print(f" Offset: {offset}, Length: {slen}")
        if show_hex:
            print(hexdump(data[offset:offset+slen], " "))
        # Decode the subtable body (skip the 4-byte common header)
        body = data[offset + CDAT_SUBTBL_HDR_SIZE:offset + slen]
        if stype == 5:
            # SSLBIS has variable length, pass total subtable body length
            decode_sslbis(body, slen - CDAT_SUBTBL_HDR_SIZE, show_hex)
        elif stype in DECODERS:
            DECODERS[stype](body, show_hex)
        else:
            print(f" (unknown type, raw data follows)")
            print(hexdump(body, " "))
        offset += slen
        entry_num += 1

    # Summary
    print(f"\nSummary: {entry_num} entries")
    for t in sorted(counts):
        name = CDAT_TYPE_NAMES.get(t, f"Unknown ({t})")
        print(f" {name}: {counts[t]}")
    if offset < length:
        trailing = length - offset
        print(f"\nWarning: {trailing} trailing bytes after last subtable")
    print()
    return True


def find_cdat_sysfs():
    """Find all CXL devices with CDAT attributes in sysfs."""
    paths = []
    for dev_path in sorted(glob.glob("/sys/bus/cxl/devices/*")):
        cdat_path = os.path.join(dev_path, "CDAT")
        if os.path.exists(cdat_path):
            paths.append(cdat_path)
    return paths


def read_cdat(path):
    """Read binary CDAT data from a sysfs attribute or file."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except PermissionError:
        print(f"Error: permission denied reading {path} (need root?)")
        return None
    except OSError as e:
        print(f"Error reading {path}: {e}")
        return None


def main():
    parser = argparse.ArgumentParser(
        description="Dump and decode CXL CDAT (Coherent Device Attribute Table)",
        epilog="Without arguments, discovers and dumps CDAT from all CXL devices.\n"
               "Requires root access to read sysfs CDAT attributes.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "path", nargs="*",
        help="Path to sysfs CDAT attribute or raw CDAT binary file",
    )
    parser.add_argument(
        "--raw", action="store_true",
        help="Treat input as raw CDAT binary file (not sysfs)",
    )
    parser.add_argument(
        "--hex", action="store_true",
        help="Include hex dump of each entry",
    )
    args = parser.parse_args()
    paths = args.path

    # Read from stdin if piped or explicitly given "-"
    if not sys.stdin.isatty() and not paths:
        data = sys.stdin.buffer.read()
        if not data:
            print("Error: no data on stdin")
            return 1
        return 0 if decode_cdat(data, "stdin", show_hex=args.hex) else 1
    if paths == ["-"]:
        data = sys.stdin.buffer.read()
        if not data:
            print("Error: no data on stdin")
            return 1
        return 0 if decode_cdat(data, "stdin", show_hex=args.hex) else 1

    if not paths:
        paths = find_cdat_sysfs()
        if not paths:
            print("No CXL devices with CDAT found in sysfs.")
            print("Check that CXL devices are present and the cxl_port driver is loaded.")
            return 1

    ok = True
    for path in paths:
        data = read_cdat(path)
        if data is None:
            ok = False
            continue
        if not data:
            dev = os.path.basename(os.path.dirname(path)) if not args.raw else path
            print(f"{dev}: CDAT is empty (read from device failed at probe time)")
            ok = False
            continue
        source = path
        if not args.raw and "/sys/" in path:
            source = os.path.basename(os.path.dirname(path))
        if not decode_cdat(data, source, show_hex=args.hex):
            ok = False
    return 0 if ok else 1


if __name__ == "__main__":
    sys.exit(main())
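
For trying the script on a machine without CXL hardware, a synthetic CDAT
image can be built along these lines. This is only a sketch: `make_cdat` and
`checksum_fix` are hypothetical helpers, the revision value of 1 and the
1 GiB DPA range are arbitrary assumptions, and the layout follows the struct
formats defined at the top of cdat_dump.py rather than any authoritative
source.

```python
import struct


def checksum_fix(blob):
    """Set header byte 5 (checksum) so that all bytes sum to 0 modulo 256."""
    b = bytearray(blob)
    b[5] = 0
    b[5] = (-sum(b)) & 0xFF
    return bytes(b)


def make_cdat(read_latency_entry=25, base_unit=1000):
    """Build a 64-byte CDAT: header, one DSMAS, one DSLBIS (read latency)."""
    # DSMAS (type 0): 4-byte subtable header + 20-byte body
    dsmas = struct.pack("<BBH", 0, 0, 24) + \
        struct.pack("<BBHQQ", 0, 0, 0, 0, 1 << 30)
    # DSLBIS (type 1): data_type 1 = Read Latency, only Entry[0] populated
    dslbis = struct.pack("<BBH", 1, 0, 24) + \
        struct.pack("<BBBBQHHHH", 0, 0, 1, 0, base_unit,
                    read_latency_entry, 0, 0, 0)
    body = dsmas + dslbis
    # Header: length, revision (assumed 1), checksum placeholder,
    # 6 reserved pad bytes, sequence
    hdr = struct.pack("<IBB6xI", 16 + len(body), 1, 0, 0)
    return checksum_fix(hdr + body)
```

Writing `make_cdat()` to a file and passing that file to
`cdat_dump.py --raw` should then show a header checksum of OK and a
DSLBIS Entry[0] of 25 decoded as 25.00 ns (25000 ps).
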