qemu-devel.nongnu.org archive mirror
From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: Sajjan Rao <sajjanr@gmail.com>
Cc: Gregory Price <gregory.price@memverge.com>,
	Dimitrios Palyvos <dimitrios.palyvos@zptcorp.com>,
	<linux-cxl@vger.kernel.org>, <qemu-devel@nongnu.org>,
	<richard.henderson@linaro.org>
Subject: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1
Date: Thu, 1 Feb 2024 13:04:38 +0000
Message-ID: <20240201130438.00001384@Huawei.com>
In-Reply-To: <CAAg4ParQKj9FUe0DRX0Wmk1KT0bnxx2F7W=ic38781j7eVz+OQ@mail.gmail.com>

On Tue, 30 Jan 2024 13:50:18 +0530
Sajjan Rao <sajjanr@gmail.com> wrote:

> Hi Jonathan,
> 
> The QEMU command line in the original email was corrected back in
> August 2023 based on the subsequent responses.
> 
> My current QEMU command line is below. As you can see, I am not
> assigning a NUMA node to the CXL memory object.
> 
> qemu-system-x86_64 \
>  -hda /var/lib/libvirt/images/CXL-Test_1.qcow2 \
>  -machine type=q35,nvdimm=on,cxl=on \
>  -accel tcg,thread=single \
>  -m 4G \
>  -smp cpus=4 \
>  -object memory-backend-ram,size=4G,id=m0 \
>  -object memory-backend-ram,size=256M,id=cxl-mem1 \
>  -object memory-backend-ram,size=256M,id=cxl-mem2 \
>  -numa node,memdev=m0,cpus=0-3,nodeid=0 \
>  -netdev user,id=net0,net=192.168.0.0/24,dhcpstart=192.168.0.9,hostfwd=tcp::2222-:22 \
>  -device virtio-net-pci,netdev=net0 \
>  -device pxb-cxl,bus_nr=2,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
>  -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
>  -device cxl-upstream,bus=cxl_rp_port0,id=us0,addr=0.0,multifunction=on, \
>  -device cxl-switch-mailbox-cci,bus=cxl_rp_port0,addr=0.2,target=us0 \
>  -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
>  -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=8 \
>  -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem1,id=cxl-vmem1 \
>  -device cxl-type3,bus=swport1,volatile-memdev=cxl-mem2,id=cxl-vmem2 \
>  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=512M,cxl-fmw.0.interleave-granularity=2k \
>  -D /tmp/qemu.log \
>  -nographic
> 
> Until I moved to QEMU version 8.2 recently, I was able to create
> regions and run native Linux commands on CXL memory using
> #numactl --membind <cxl NUMA#> top
> 
> You had advised me to turn off KVM and use TCG, since the membind
> command will run code out of CXL memory, which is not supported. With
> KVM disabled, the membind command worked fine.
> However, with QEMU version 8.2 the same membind command results in a
> kernel hard crash.

Just to check: is it the kernel that crashes, or QEMU?

I think I've replicated this, and it seems to be QEMU that is going down with a TCG issue.

Bisection underway.

This may take a while.
Our use of TCG with what QEMU thinks of as I/O memory is unusual,
so we tend to run into corner cases no one else cares about.

Richard, +CC on the off chance you can guess what has happened and save
me a bisection run.

x86 machine pretty much as described above

root@localhost:~/devmem2# numactl --membind=1 touch a
qemu: fatal: cpu_io_recompile: could not find TB for pc=(nil)
RAX=00d6b969c0000000 RBX=ff294696c0044440 RCX=0000000000000028 RDX=0000000000000000
RSI=0000000000000275 RDI=0000000000000000 RBP=0000000490000000 RSP=ff4f8767805d3d20
R8 =0000000000000000 R9 =ff4f8767805d3cdc R10=0000000000000000 R11=0000000000000040
R12=ff294696c0044980 R13=0000000000000000 R14=ff294696c51d0000 R15=0000000000000000
RIP=ffffffff9d270fed RFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ff2946973bc00000 00000000 00000000
LDT=0000 0000000000000000 00000000 00008200 DPL=0 LDT
TR =0040 fffffe37d29e7000 00004087 00008900 DPL=0 TSS64-avl
GDT=     fffffe37d29e5000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=00007f2972bdc450 CR3=0000000490000000 CR4=00751ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00d6b969c0000000 CCD=0000000490000000 CCO=ADDQ
EFER=0000000000000d01
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
YMM00=0000000000000000 0000000000000000 3a3a3a3a3a3a3a3a 3a3a3a3a3a3a3a3a
YMM01=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM02=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM03=0000000000000000 0000000000000000 00ff0000000000ff 0000000000000000
YMM04=0000000000000000 0000000000000000 5f796c7261655f63 62696c5f5f004554
YMM05=0000000000000000 0000000000000000 0000000000000000 0000000000000060
YMM06=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM07=0000000000000000 0000000000000000 0909090909090909 0909090909090909
YMM08=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM09=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM10=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM11=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM12=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM13=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM14=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM15=0000000000000000 0000000000000000 0000000000000000 0000000000000000

Jonathan



> I wanted to check whether this is a known issue with 8.2 and whether
> there is a way around it.
> 
> Thanks,
> Sajjan
> 
> On Fri, Jan 26, 2024 at 10:42 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Fri, 26 Jan 2024 10:43:43 -0500
> > Gregory Price <gregory.price@memverge.com> wrote:
> >  
> > > On Fri, Jan 26, 2024 at 12:39:26PM +0000, Jonathan Cameron wrote:  
> > > > On Thu, 25 Jan 2024 13:45:09 +0530
> > > > Sajjan Rao <sajjanr@gmail.com> wrote:
> > > >  
> > > > > Looks like something changed in QEMU 8.2 that broke running code out
> > > > > of CXL memory with KVM disabled.
> > > > > I used "numactl --membind 2 ls" as suggested by Dimitrios earlier,
> > > > > this worked for me until I updated to the latest QEMU.
> > > > >
> > > > > Is this a known issue? Or am I missing something?  
> > > >
> > > > I'm confused about how the description below ever worked.
> > > > Assigning the underlying memdev=cxl-mem1 to a NUMA node isn't going
> > > > to correctly build the connections to the CFMWS PA range.
> > > >  
> > >
> > > I've now seen 3-4 occasions where people have done this and run into
> > > trouble (for obvious reasons).  Is there anything we can do to disallow
> > > the double-registering of a single memdev to both a numa node and a cxl
> > > device?
> > >  
> > It would be novel for us to prevent people shooting themselves
> > in the foot ;) but I guess this should be fairly easy, as the
> > NUMA node logic prevents the same one being used multiple times, so we can
> > copy how that is done.
> >
> > This should do the trick (very lightly tested).
> > It's end of day Friday here so a formal patch can wait for next week.
> >
> >
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index f29346fae7..d4194bb757 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -827,6 +827,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >              error_setg(errp, "volatile memdev must have backing device");
> >              return false;
> >          }
> > +        if (host_memory_backend_is_mapped(ct3d->hostvmem)) {
> > +            error_setg(errp, "memory backend %s can't be used multiple times.",
> > +               object_get_canonical_path_component(OBJECT(ct3d->hostvmem)));
> > +            return false;
> > +        }
> >          memory_region_set_nonvolatile(vmr, false);
> >          memory_region_set_enabled(vmr, true);
> >          host_memory_backend_set_mapped(ct3d->hostvmem, true);
> > @@ -850,6 +855,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >              error_setg(errp, "persistent memdev must have backing device");
> >              return false;
> >          }
> > +        if (host_memory_backend_is_mapped(ct3d->hostpmem)) {
> > +            error_setg(errp, "memory backend %s can't be used multiple times.",
> > +               object_get_canonical_path_component(OBJECT(ct3d->hostpmem)));
> > +            return false;
> > +        }
> >          memory_region_set_nonvolatile(pmr, true);
> >          memory_region_set_enabled(pmr, true);
> >          host_memory_backend_set_mapped(ct3d->hostpmem, true);
> > @@ -880,6 +890,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >              error_setg(errp, "dynamic capacity must have backing device");
> >              return false;
> >          }
> > +        if (host_memory_backend_is_mapped(ct3d->dc.host_dc)) {
> > +            error_setg(errp, "memory backend %s can't be used multiple times.",
> > +               object_get_canonical_path_component(OBJECT(ct3d->dc.host_dc)));
> > +            return false;
> > +        }
> >          /* FIXME: set dc as nonvolatile for now */
> >          memory_region_set_nonvolatile(dc_mr, true);
> >          memory_region_set_enabled(dc_mr, true);
> >
> >
> >
> >
> >  
> > > ~Gregory  
> >  
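
To illustrate the guard that the quoted patch adds, here is a minimal
standalone sketch of the "already mapped" check. It is not QEMU code:
DemoBackend and demo_claim_backend() are hypothetical stand-ins for the
host memory backend and for the host_memory_backend_is_mapped() /
host_memory_backend_set_mapped() pair used in the diff. Only the error
string is copied from the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a host memory backend; not the real QEMU type. */
typedef struct {
    const char *id;  /* name given on the command line, e.g. "cxl-mem1" */
    bool mapped;     /* set once some user (NUMA node, device) has claimed it */
} DemoBackend;

/*
 * Claim a backend for exclusive use.
 * Returns true on success and marks the backend as mapped.
 * Returns false if it was already claimed; in that case a message is
 * written to err and the backend is left unchanged.
 */
static bool demo_claim_backend(DemoBackend *be, char *err, size_t err_len)
{
    if (be->mapped) {
        snprintf(err, err_len,
                 "memory backend %s can't be used multiple times.", be->id);
        return false;
    }
    be->mapped = true;
    return true;
}
```

With a check of this shape, whichever user claims a given memdev second
(for example a cxl-type3 device after a NUMA node, as in the
double-registration case Gregory describes) is rejected with an error
instead of silently sharing the backend.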


