Re: QEMU freeze with CXL memory in Normal zone and stress-ng

From: Gregory Price <gregory.price@memverge.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dimitrios Palyvos <dimitrios.palyvos@zptcorp.com>,
	linux-cxl@vger.kernel.org
Subject: Re: QEMU freeze with CXL memory in Normal zone and stress-ng
Date: Wed, 23 Aug 2023 15:39:02 -0400
Message-ID: <ZOZgVm3fx8fEf8xq@memverge.com>
In-Reply-To: <20230823175526.0000368e@Huawei.com>

On Wed, Aug 23, 2023 at 05:55:26PM +0100, Jonathan Cameron wrote:
> On Fri, 18 Aug 2023 16:20:55 +0200
> Dimitrios Palyvos <dimitrios.palyvos@zptcorp.com> wrote:
> 
> > Hello,
> > 
> > I have noticed a system-wide freeze when using CXL memory as RAM in
> > the Normal zone to run stress-ng. I am writing to check if this is a
> > known issue and/or if anyone has hints on how to debug this.
> > 
...
> > 
> > Running stress-ng in NUMA node 0 (not CXL) works fine. When the VM
> > freezes, the QEMU monitor can still be accessed, but the guest kernel
> > does not seem to respond to any external commands, e.g., (qemu)
> > sendkey alt-sysrq-c. Then, QEMU also freezes when trying to quit it.
> > I have tried to debug the (guest) kernel using gdb (starting QEMU with
> > the -s flag) but, after the freeze happens, gdb reports that “The
> > target is not responding to interrupt requests”.
> > Debugging QEMU works but I haven’t managed to find something
> > helpful that way. Also tried (briefly) kdb with no luck there either -
> > the kernel does not respond at all.
> > 
> > Patching hw/mem/cxl_type3.c functions cxl_type3_read() and
> > cxl_type3_write() to count the calls shows that CXL accesses happen in
> > both cases. In the "ls" invocation, I see around 100k reads and 100k
> > writes; in the "stress-ng" case, I see approximately 4 million reads
> > and 2.3 million writes before the VM freezes.
> 
> Long shot, but can you add code to print the address and size of each access.
> There might be something nasty around edge conditions that we've gotten
> wrong in the emulation - I thought I'd poked them all but maybe not.
> 
> Right now I can't boot QEMU x86_64 TCG due to an unrelated crash (nothing
> to do with CXL at all but is present in 8.1.0 release) so hard for me to
> try and replicate :(
> 
> Jonathan
> 
> > 
> > The issue does not appear if the CXL memory is initialized in the
> > Movable zone instead, i.e., when using the daxctl command without the
> > --no-movable flag:
> >     daxctl reconfigure-device --mode=system-ram all
> > 
> > The issue however appears when using a volatile CXL device and
> > initializing CXL as Normal with the command:
> >     cxl create-region -d decoder0.0 -s 1073741824 -t ram
> > 
> > Any ideas are welcome, thanks in advance!
> > 
> > Kind regards,
> > Dimitris
> > 
> 

Something I think is not well understood is just HOW slow CXL memory
in QEMU is right now.

1) No caching of this region is allowed at all because it is considered
   an MMIO region by QEMU/TCG.

2) Code running out of this region cannot produce TCG buffers, and so
   any code page hosted on this region must be constantly re-fetched by
   the TCG non-JIT/binary translation emulation engine, even if it was
   previously executed.

   This can cause instructions/sec to drop from 100s of millions to less
   than a million in my experience.  Degenerate cases can be very bad.

3) Beyond instruction fetching, any data access requires an MMIO-style
   data-fetch, as opposed to a simple memory buffer mapping and direct
   access (e.g. what normally happens in a TCG buffer cache).
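
To put rough numbers on that, here is a back-of-the-envelope sketch.
The 200M instr/sec baseline and the 1-billion-instruction workload are
assumptions picked for illustration, not measurements; only the "less
than a million" figure comes from what I have seen.

```python
# Illustrative arithmetic only: all three constants are assumptions.
NATIVE_IPS = 200e6  # assumed TCG rate when code/data sit in normal RAM
CXL_IPS = 1e6       # assumed upper bound when running out of CXL memory
WORK = 1e9          # hypothetical workload size, in guest instructions


def slowdown(fast_ips, slow_ips):
    """How many times slower the slow path is than the fast path."""
    return fast_ips / slow_ips


def wall_time_seconds(instructions, ips):
    """Seconds needed to retire `instructions` at `ips` instructions/sec."""
    return instructions / ips


print(f"slowdown: {slowdown(NATIVE_IPS, CXL_IPS):.0f}x")
print(f"normal RAM: {wall_time_seconds(WORK, NATIVE_IPS):.0f} s")
print(f"CXL memory: {wall_time_seconds(WORK, CXL_IPS):.0f} s")
```

Under those assumptions, something that takes seconds from normal RAM
takes on the order of a quarter of an hour from CXL memory, and the
degenerate cases are worse than that.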

When you initialize the region in ZONE_NORMAL (--no-movable), what
you're really saying is "sure, place kernel resources there".  Once you
get memory pressure, the entire system can start using CXL memory for
kernel resources, as opposed to just stress-ng.
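
If you want to confirm which zone the CXL blocks actually landed in, a
minimal sketch along these lines should do it from inside the guest.
It assumes the CXL memory shows up as NUMA node 1 (adjust for your
setup) and relies on the per-block "state" and "valid_zones" sysfs
attributes; the function name is just something I made up.

```python
from collections import Counter
from pathlib import Path


def zones_for_node(node, sysfs_root="/sys/devices/system/node"):
    """Count online memory blocks per zone for one NUMA node.

    Each nodeN directory links to the memory blocks it owns; for an
    online block, valid_zones names the zone the block belongs to.
    """
    counts = Counter()
    node_dir = Path(sysfs_root) / f"node{node}"
    for block in sorted(node_dir.glob("memory[0-9]*")):
        try:
            state = (block / "state").read_text().strip()
            zone = (block / "valid_zones").read_text().strip()
        except OSError:
            continue  # block vanished or attribute not readable
        if state == "online":
            counts[zone] += 1
    return counts
```

Calling print(zones_for_node(1)) in the guest should show whether the
node's blocks are counted under Normal or Movable.  If they are Normal,
the kernel is free to put its own allocations there.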

To me, what you're describing isn't the system freezing.  I have
observed that the performance of CXL memory is so poor that the kernel
will simply prefer not to use the memory at all (as in, it will prefer
using swap space instead, because it's that slow).

When a system crawls to a halt like this, it's anyone's guess as to
whether things like watchdogs and background tasks start preventing
forward progress.  Your interrupt injections may be masked by emulated
timers and all kinds of other stuff.  Basically you end up in a
starvation situation, and the only real answer to that problem is
"execute faster".

Until there is work to enable caching of CXL-hosted memory, I'm inclined
to say "Working as intended" because the accesses are happening and the
system appears stable - if extremely slow and non-responsive.

~Gregory
