Linux CXL
* QEMU freeze with CXL memory in Normal zone and stress-ng
@ 2023-08-18 14:20 Dimitrios Palyvos
  2023-08-23 16:55 ` Jonathan Cameron
From: Dimitrios Palyvos @ 2023-08-18 14:20 UTC
  To: linux-cxl

Hello,

I have noticed a system-wide freeze when using CXL memory as RAM in
the Normal zone to run stress-ng. I am writing to check if this is a
known issue and/or if anyone has hints on how to debug this.

Versions tested:
- linux-stable v6.4.11
- QEMU from https://gitlab.com/jic23/qemu/ - branches cxl-2023-05-19,
cxl-2023-05-25, cxl-2023-07-17 (cxl-2023-07-17 also tested with linux
v6.5-rc6; haven’t managed to boot with cxl-2023-08-07)
- ndctl v77
- stress-ng, version 0.15.06
- Debian GNU/Linux 12 (bookworm)

To reproduce, start QEMU with the command:
    qemu-system-x86_64 \
    -drive file=/images/debian-12-cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
    -m 2G,slots=8,maxmem=8G \
    -smp 4 \
    -kernel /linux/arch/x86_64/boot/bzImage \
    -append "root=/dev/sda1 console=ttyS0 serial" \
    -machine type=q35,nvdimm=on,cxl=on \
    -object memory-backend-ram,id=cxl-mem1,share=on,size=1G \
    -object memory-backend-ram,id=cxl-lsa1,share=on,size=128M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port_cxl,chassis=0,slot=2 \
    -device cxl-type3,bus=root_port_cxl,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-mem0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=1G

Initialize the CXL region as system RAM in the Normal zone:
    cxl create-region -d decoder0.0
    ndctl create-namespace --region=region0 --mode devdax --continue
    echo offline > /sys/devices/system/memory/auto_online_blocks
    daxctl reconfigure-device --no-movable --mode=system-ram all

Running "ls" on the CXL memory (NUMA node 1) works fine:
    root@cxl-img:~# numactl --membind 1 ls /usr
    bin    include  lib32  libexec  local  share
    games  lib      lib64  libx32   sbin   src

Running stress-ng on the CXL memory completely freezes the system. No
interaction with the guest is possible after a few seconds:
    root@cxl-img:~# numactl --membind 1 stress-ng --vm 1 --vm-bytes 10M -t 10s
    stress-ng: info: [238] setting to a 10 second run per stressor
    stress-ng: info: [238] dispatching hogs: 1 vm

Running stress-ng on NUMA node 0 (not CXL) works fine. When the VM
freezes, the QEMU monitor can still be accessed, but the guest kernel
does not seem to respond to any external commands, e.g., (qemu)
sendkey alt-sysrq-c. QEMU itself then also freezes when I try to quit it.
I have tried to debug the guest kernel using gdb (starting QEMU with
the -s flag) but, after the freeze happens, gdb reports that “The
target is not responding to interrupt requests”.
Debugging QEMU itself works, but I haven’t managed to find anything
helpful that way. I also briefly tried kdb, with no luck there either:
the kernel does not respond at all.
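
Since nothing can be inspected after the freeze, one option is to record
the state of node 1 just before it. The following is a rough sketch only:
node_memfree_kb is a hypothetical helper, and it assumes the per-node sysfs
meminfo format ("Node 1 MemFree: <n> kB").

```shell
#!/bin/sh
# Hypothetical helper: print MemFree (in kB) for one NUMA node.
# Usage: node_memfree_kb NODE [MEMINFO_FILE]
node_memfree_kb() {
    node="$1"
    file="${2:-/sys/devices/system/node/node${node}/meminfo}"
    # Lines look like: "Node 1 MemFree:          917504 kB"
    awk '$3 == "MemFree:" { print $4 }' "$file"
}

# Example (in the guest, started before stress-ng); the last line printed
# on the serial console shows how much of node 1 was free before the hang:
#   while true; do echo "node1 free: $(node_memfree_kb 1) kB"; sleep 1; done &
```

Whether the last value is close to zero or not may help tell an
out-of-memory style stall apart from a hang that is unrelated to memory
pressure.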

Patching hw/mem/cxl_type3.c functions cxl_type3_read() and
cxl_type3_write() to count the calls shows that CXL accesses happen in
both cases. In the "ls" invocation, I see around 100k reads and 100k
writes; in the "stress-ng" case, I see approximately 4 million reads
and 2.3 million writes before the VM freezes.

The issue does not appear if the CXL memory is initialized in the
Movable zone instead, i.e., when using the daxctl command without the
--no-movable flag:
    daxctl reconfigure-device --mode=system-ram all

The issue does, however, also appear when using a volatile CXL device and
initializing the CXL memory as Normal with the command:
    cxl create-region -d decoder0.0 -s 1073741824 -t ram

Any ideas are welcome, thanks in advance!

Kind regards,
Dimitris


Thread overview: 4+ messages
2023-08-18 14:20 QEMU freeze with CXL memory in Normal zone and stress-ng Dimitrios Palyvos
2023-08-23 16:55 ` Jonathan Cameron
2023-08-23 19:39   ` Gregory Price
2023-08-28 23:59     ` Dimitrios Palyvos
