* QEMU freeze with CXL memory in Normal zone and stress-ng
@ 2023-08-18 14:20 Dimitrios Palyvos
2023-08-23 16:55 ` Jonathan Cameron
0 siblings, 1 reply; 4+ messages in thread
From: Dimitrios Palyvos @ 2023-08-18 14:20 UTC (permalink / raw)
To: linux-cxl
Hello,
I have noticed a system-wide freeze when using CXL memory as RAM in
the Normal zone to run stress-ng. I am writing to check if this is a
known issue and/or if anyone has hints on how to debug this.
Versions tested:
- linux-stable v6.4.11
- QEMU from https://gitlab.com/jic23/qemu/ - branches cxl-2023-05-19,
cxl-2023-05-25, cxl-2023-07-17 (cxl-2023-07-17 also tested with linux
v6.5-rc6; haven’t managed to boot with cxl-2023-08-07)
- ndctl v77
- stress-ng, version 0.15.06
- Debian GNU/Linux 12 (bookworm)
To reproduce, start QEMU with the command:
qemu-system-x86_64 \
-drive file=/images/debian-12-cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 2G,slots=8,maxmem=8G \
-smp 4 \
-kernel /linux/arch/x86_64/boot/bzImage \
-append "root=/dev/sda1 console=ttyS0 serial" \
-machine type=q35,nvdimm=on,cxl=on \
-object memory-backend-ram,id=cxl-mem1,share=on,size=1G \
-object memory-backend-ram,id=cxl-lsa1,share=on,size=128M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port_cxl,chassis=0,slot=2 \
-device cxl-type3,bus=root_port_cxl,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=1G
Initialize CXL region as RAM in zone Normal:
cxl create-region -d decoder0.0
ndctl create-namespace --region=region0 --mode devdax --continue
echo offline > /sys/devices/system/memory/auto_online_blocks
daxctl reconfigure-device --no-movable --mode=system-ram all
Running "ls" on the CXL memory (NUMA node 1) works fine:
root@cxl-img:~# numactl --membind 1 ls /usr
bin include lib32 libexec local share
games lib lib64 libx32 sbin src
Running stress-ng in CXL completely freezes the system. No interaction
with the guest is possible after a few seconds:
root@cxl-img:~# numactl --membind 1 stress-ng --vm 1 --vm-bytes 10M -t 10s
stress-ng: info: [238] setting to a 10 second run per stressor
stress-ng: info: [238] dispatching hogs: 1 vm
Running stress-ng in NUMA node 0 (not CXL) works fine. When the VM
freezes, the QEMU monitor can still be accessed, but the guest kernel
does not seem to respond to any external commands, e.g., (qemu)
sendkey alt-sysrq-c. Then, QEMU also freezes when trying to quit it.
I have tried to debug the (guest) kernel using gdb (starting QEMU with
the -s flag) but, after the freeze happens, gdb reports that “The
target is not responding to interrupt requests”.
Debugging QEMU itself works, but I haven’t managed to find anything
helpful that way. I also (briefly) tried kdb, with no luck there either -
the kernel does not respond at all.
Patching hw/mem/cxl_type3.c functions cxl_type3_read() and
cxl_type3_write() to count the calls shows that CXL accesses happen in
both cases. In the "ls" invocation, I see around 100k reads and 100k
writes; in the "stress-ng" case, I see approximately 4 million reads
and 2.3 million writes before the VM freezes.
The issue does not appear if the CXL memory is initialized in the
Movable zone instead, i.e., when using the daxctl command without the
--no-movable flag:
daxctl reconfigure-device --mode=system-ram all
The issue however appears when using a volatile CXL device and
initializing CXL as Normal with the command:
cxl create-region -d decoder0.0 -s 1073741824 -t ram
Any ideas are welcome, thanks in advance!
Kind regards,
Dimitris
* Re: QEMU freeze with CXL memory in Normal zone and stress-ng
2023-08-18 14:20 QEMU freeze with CXL memory in Normal zone and stress-ng Dimitrios Palyvos
@ 2023-08-23 16:55 ` Jonathan Cameron
2023-08-23 19:39 ` Gregory Price
0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Cameron @ 2023-08-23 16:55 UTC (permalink / raw)
To: Dimitrios Palyvos; +Cc: linux-cxl
On Fri, 18 Aug 2023 16:20:55 +0200
Dimitrios Palyvos <dimitrios.palyvos@zptcorp.com> wrote:
> Hello,
>
> I have noticed a system-wide freeze when using CXL memory as RAM in
> the Normal zone to run stress-ng. I am writing to check if this is a
> known issue and/or if anyone has hints on how to debug this.
>
> Versions tested:
> - linux-stable v6.4.11
> - QEMU from https://gitlab.com/jic23/qemu/ - branches cxl-2023-05-19,
> cxl-2023-05-25, cxl-2023-07-17 (cxl-2023-07-17 also tested with linux
> v6.5-rc6; haven’t managed to boot with cxl-2023-08-07)
> - ndctl v77
> - stress-ng, version 0.15.06
> - Debian GNU/Linux 12 (bookworm)
>
> To reproduce, start QEMU with the command:
> qemu-system-x86_64 \
> -drive file=/images/debian-12-cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
> -m 2G,slots=8,maxmem=8G \
> -smp 4 \
> -kernel /linux/arch/x86_64/boot/bzImage \
> -append "root=/dev/sda1 console=ttyS0 serial" \
> -machine type=q35,nvdimm=on,cxl=on \
> -object memory-backend-ram,id=cxl-mem1,share=on,size=1G \
> -object memory-backend-ram,id=cxl-lsa1,share=on,size=128M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port_cxl,chassis=0,slot=2 \
> -device cxl-type3,bus=root_port_cxl,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-mem0 \
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=1G
>
> Initialize CXL region as RAM in zone Normal:
> cxl create-region -d decoder0.0
> ndctl create-namespace --region=region0 --mode devdax --continue
> echo offline > /sys/devices/system/memory/auto_online_blocks
> daxctl reconfigure-device --no-movable --mode=system-ram all
>
> Running "ls" on the CXL memory (NUMA node 1) works fine:
> root@cxl-img:~# numactl --membind 1 ls /usr
> bin include lib32 libexec local share
> games lib lib64 libx32 sbin src
>
> Running stress-ng in CXL completely freezes the system. No interaction
> with the guest is possible after a few seconds:
> root@cxl-img:~# numactl --membind 1 stress-ng --vm 1 --vm-bytes 10M -t 10s
> stress-ng: info: [238] setting to a 10 second run per stressor
> stress-ng: info: [238] dispatching hogs: 1 vm
>
> Running stress-ng in NUMA node 0 (not CXL) works fine. When the VM
> freezes, the QEMU monitor can still be accessed, but the guest kernel
> does not seem to respond to any external commands, e.g., (qemu)
> sendkey alt-sysrq-c. Then, QEMU also freezes when trying to quit it.
> I have tried to debug the (guest) kernel using gdb (starting QEMU with
> the -s flag) but, after the freeze happens, gdb reports that “The
> target is not responding to interrupt requests”.
> Debugging QEMU works but I haven’t managed to find anything
> helpful that way. Also tried (briefly) kdb with no luck there either -
> the kernel does not respond at all.
>
> Patching hw/mem/cxl_type3.c functions cxl_type3_read() and
> cxl_type3_write() to count the calls shows that CXL accesses happen in
> both cases. In the "ls" invocation, I see around 100k reads and 100k
> writes; in the "stress-ng" case, I see approximately 4 million reads
> and 2.3 million writes before the VM freezes.
Long shot, but can you add code to print the address and size of each access.
There might be something nasty around edge conditions that we've gotten
wrong in the emulation - I thought I'd poked them all but maybe not.
Right now I can't boot QEMU x86_64 TCG due to an unrelated crash (nothing
to do with CXL at all, but present in the 8.1.0 release), so it's hard for
me to try to replicate this :(
Jonathan
>
> The issue does not appear if the CXL memory is initialized in the
> Movable zone instead, i.e., when using the daxctl command without the
> --no-movable flag:
> daxctl reconfigure-device --mode=system-ram all
>
> The issue however appears when using a volatile CXL device and
> initializing CXL as Normal with the command:
> cxl create-region -d decoder0.0 -s 1073741824 -t ram
>
> Any ideas are welcome, thanks in advance!
>
> Kind regards,
> Dimitris
>
* Re: QEMU freeze with CXL memory in Normal zone and stress-ng
2023-08-23 16:55 ` Jonathan Cameron
@ 2023-08-23 19:39 ` Gregory Price
2023-08-28 23:59 ` Dimitrios Palyvos
0 siblings, 1 reply; 4+ messages in thread
From: Gregory Price @ 2023-08-23 19:39 UTC (permalink / raw)
To: Jonathan Cameron; +Cc: Dimitrios Palyvos, linux-cxl
On Wed, Aug 23, 2023 at 05:55:26PM +0100, Jonathan Cameron wrote:
> On Fri, 18 Aug 2023 16:20:55 +0200
> Dimitrios Palyvos <dimitrios.palyvos@zptcorp.com> wrote:
>
> > Hello,
> >
> > I have noticed a system-wide freeze when using CXL memory as RAM in
> > the Normal zone to run stress-ng. I am writing to check if this is a
> > known issue and/or if anyone has hints on how to debug this.
> >
...
> >
> > Running stress-ng in NUMA node 0 (not CXL) works fine. When the VM
> > freezes, the QEMU monitor can still be accessed, but the guest kernel
> > does not seem to respond to any external commands, e.g., (qemu)
> > sendkey alt-sysrq-c. Then, QEMU also freezes when trying to quit it.
> > I have tried to debug the (guest) kernel using gdb (starting QEMU with
> > the -s flag) but, after the freeze happens, gdb reports that “The
> > target is not responding to interrupt requests”.
> > Debugging QEMU works but I haven’t managed to find anything
> > helpful that way. Also tried (briefly) kdb with no luck there either -
> > the kernel does not respond at all.
> >
> > Patching hw/mem/cxl_type3.c functions cxl_type3_read() and
> > cxl_type3_write() to count the calls shows that CXL accesses happen in
> > both cases. In the "ls" invocation, I see around 100k reads and 100k
> > writes; in the "stress-ng" case, I see approximately 4 million reads
> > and 2.3 million writes before the VM freezes.
>
> Long shot, but can you add code to print the address and size of each access.
> There might be something nasty around edge conditions that we've gotten
> wrong in the emulation - I thought I'd poked them all but maybe not.
>
> Right now I can't boot QEMU x86_64 TCG due to an unrelated crash (nothing
> to do with CXL at all, but present in the 8.1.0 release), so it's hard for
> me to try to replicate this :(
>
> Jonathan
>
> >
> > The issue does not appear if the CXL memory is initialized in the
> > Movable zone instead, i.e., when using the daxctl command without the
> > --no-movable flag:
> > daxctl reconfigure-device --mode=system-ram all
> >
> > The issue however appears when using a volatile CXL device and
> > initializing CXL as Normal with the command:
> > cxl create-region -d decoder0.0 -s 1073741824 -t ram
> >
> > Any ideas are welcome, thanks in advance!
> >
> > Kind regards,
> > Dimitris
> >
>
Something I think is not well understood is just HOW slow the
performance of CXL memory in QEMU is right now.
1) No caching of this region is allowed at all because it is considered
an MMIO region by QEMU/TCG.
2) Code running out of this region cannot be translated into TCG buffers,
so any code page hosted on this region must be constantly re-fetched by
the TCG non-JIT/binary-translation emulation engine - even if it was
previously executed.
This can cause instructions/sec to drop from 100s of millions to less
than a million in my experience. Degenerate cases can be very bad.
3) Beyond instruction fetching, any data access requires an MMIO-style
data-fetch, as opposed to a simple memory buffer mapping and direct
access (e.g. what normally happens in a TCG buffer cache).
When you initialize the region in ZONE_NORMAL (--no-movable), what
you're really saying is "sure, place kernel resources there". Once you
get memory pressure, you have the potential to start having the entire
system utilize cxl memory for kernel resources, as opposed to just
stress-ng.
To me, what you're describing isn't the system freezing. I have
observed that the performance of CXL memory is so poor that the kernel
will simply prefer not to use the memory at all (as in, it will prefer
using swap space instead, because it's that slow).
When a system crawls to a halt like this, it's anyone's guess as to
whether things like watchdogs and background tasks start preventing
forward progress. Your interrupt injections may be masked by emulated
timers and all kinds of other stuff. Basically you end up in a
starvation situation, and the only real answer to that problem is
"execute faster".
Until there is work to enable caching of CXL-hosted memory, I'm inclined
to say "Working as intended" because the accesses are happening and the
system appears stable - if extremely slow and non-responsive.
~Gregory
* Re: QEMU freeze with CXL memory in Normal zone and stress-ng
2023-08-23 19:39 ` Gregory Price
@ 2023-08-28 23:59 ` Dimitrios Palyvos
0 siblings, 0 replies; 4+ messages in thread
From: Dimitrios Palyvos @ 2023-08-28 23:59 UTC (permalink / raw)
To: Gregory Price; +Cc: Jonathan Cameron, linux-cxl
Thanks for the responses Jonathan & Gregory!
On Wed, Aug 23, 2023 at 9:39 PM Gregory Price
<gregory.price@memverge.com> wrote:
> To me, what you're describing isn't the system freezing. I have
> observed that the performance of CXL memory is so poor that the kernel
> will simply prefer not to use the memory at all (as in, it will prefer
> using swap space instead, because it's that slow).
>
> When a system crawls to a halt like this, it's anyone's guess as to
> whether things like watchdogs and background tasks start preventing
> forward progress. Your interrupt injections may be masked by emulated
> timers and all kinds of other stuff. Basically you end up in a
> starvation situation, and the only real answer to that problem is
> "execute faster".
The explanation about the slowness and ZONE_NORMAL makes sense. I just
want to stress that I cannot see *any* CXL memory accesses after the
guest system freezes. However, I guess that a potential deadlock in the
kernel could explain this behavior.
Another - maybe unrelated - thing I noticed when investigating this
issue: QEMU branch cxl-2023-07-17 (from https://gitlab.com/jic23/qemu)
seems to have issues with gdb even in normal execution: pressing
<Ctrl-C> in gdb while targeting QEMU started with the -s flag does not
work ("The target is not responding to interrupt requests"). This issue
does not exist in cxl-2023-05-19, where gdb kernel debugging works fine
until the guest freezes.
On Wed, Aug 23, 2023 at 6:55 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
> Long shot, but can you add code to print the address and size of each access.
> There might be something nasty around edge conditions that we've gotten
> wrong in the emulation - I thought I'd poked them all but maybe not.
That's something I can try. I will collect a detailed memory access
trace and ping you back if I notice anything interesting.
Kind regards,
Dimitris