From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Hao Xiang <hao.xiang@bytedance.com>
Cc: "Gregory Price" <gregory.price@memverge.com>,
	"Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Gregory Price" <gourry.memverge@gmail.com>,
	"Fan Ni" <fan.ni@samsung.com>, "Ira Weiny" <ira.weiny@intel.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	qemu-devel@nongnu.org, "Ho-Ren (Jack) Chuang" <horenc@vt.edu>,
	linux-cxl@vger.kernel.org
Subject: Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'
Date: Wed, 10 Jan 2024 14:31:46 +0000	[thread overview]
Message-ID: <20240110143146.00001a56@Huawei.com> (raw)
In-Reply-To: <CAAYibXhTUgd+z3Xqk7yeWqQmHxtDmf3Ud_01iEHS0KRj9GhjUw@mail.gmail.com>

On Tue, 9 Jan 2024 15:55:46 -0800
Hao Xiang <hao.xiang@bytedance.com> wrote:

> On Tue, Jan 9, 2024 at 2:13 PM Gregory Price <gregory.price@memverge.com> wrote:
> >
> > On Tue, Jan 09, 2024 at 01:27:28PM -0800, Hao Xiang wrote:  
> > > On Tue, Jan 9, 2024 at 11:58 AM Gregory Price
> > > <gregory.price@memverge.com> wrote:  
> > > >
> > > > If you drop this line:
> > > >
> > > > -numa node,memdev=vmem0,nodeid=1  
> > >
> > > We tried this as well, and it works after going through the cxl-cli
> > > process and creating the devdax device. The problem is that without the
> > > "nodeid=1" configuration, we cannot connect with the explicit per-NUMA-node
> > > latency/bandwidth configuration "-numa hmat-lb". I glanced at the
> > > code in hw/numa.c; parse_numa_hmat_lb() looks like the function that
> > > passes the lb information to the VM's HMAT.
> > >  
> >
> > Yeah, this is what Jonathan was saying - right now there isn't a good
> > way (in QEMU) to pass the hmat/cdat stuff down through the device.
> > Needs to be plumbed out.
> >
> > In the meantime: You should just straight up drop the cxl device from
> > your QEMU config.  It doesn't actually get you anything.
> >  
> > > From what I understand so far: 1. The guest kernel dynamically
> > > creates a numa node after a cxl devdax device is created, which means
> > > we don't know the numa node until after VM boot. 2. QEMU can only
> > > statically pass the lb information to the VM at boot time. How do we
> > > connect these two things?
> >
> > During boot, the kernel discovers all the memory regions exposed to
> > the BIOS. In this QEMU configuration you have defined:
> >
> > region 0: CPU + DRAM node
> > region 1: DRAM only node
> > region 2: CXL Fixed Memory Window (the last line of the cxl stuff)
> >
> > The kernel reads this information on boot and reserves 1 numa node for
> > each of these regions.
> >
> > The kernel then automatically brings up regions 0 and 1 in nodes 0 and 1
> > respectively.
> >
> > Node 2 sits dormant until you go through the cxl-cli startup sequence.
> >
> >
> > What you're asking for is for the QEMU team to plumb hmat/cdat
> > information down through the type3 device.  I *think* that is presently
> > possible with a custom CDAT file - but Jonathan probably has more
> > details on that.  You'll have to go digging for answers on that one.  
> 
> I think this is exactly what I was looking for. When we started with
> the idea of having an explicit CXL memory backend, we wanted to
> 1) Bind a virtual CXL device to an actual CXL memory node on host.
> 2) Pass the latency/bandwidth information from the CXL backend into
> the virtual CXL device.
> I didn't have a concrete idea of how to do 2).
> With the discussion here, I learned that the information is passed
> via CDAT. I just looked into the virtual CXL code and found that
> ct3_build_cdat_entries_for_mr() is the function that builds this
> information. But the latency and bandwidth there are currently
> hard-coded. I think it makes sense to have an explicit CXL memory
> backend where QEMU can query the CXL memory attributes from the host
> and pass that information from the CXL backend into the virtual CXL
> type-3 device.

There is probably an argument for letting a memory backend take perf
numbers in general (I don't see this as being CXL specific), or for
adding more parameters to the cxl device entry, but for now you can
inject a CDAT file that presents whatever you like.
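
As a rough illustration of what such a file could contain, here is a
minimal Python sketch that packs a CDAT table with one DSMAS entry and
four DSLBIS entries. The struct layouts are an assumption based on the
CDAT structures as mirrored in QEMU's include/hw/cxl/cxl_cdat.h. The
helper names, the 4 GiB range, the revision value, and the latency and
bandwidth figures are all hypothetical, so check everything against the
CDAT specification before relying on the output.

```python
import struct

# Assumed little-endian, packed layouts; verify against the CDAT spec.
HEADER = struct.Struct("<IBB6sI")      # length, revision, checksum, reserved, sequence
DSMAS = struct.Struct("<BBHBBHQQ")     # type, rsvd, length, handle, flags, rsvd, DPA base, DPA length
DSLBIS = struct.Struct("<BBHBBBBQ3HH") # type, rsvd, length, handle, flags, data type, rsvd,
                                       # entry base unit, entry[3], rsvd

# HMAT-style data type codes (assumed to match ACPI HMAT).
READ_LATENCY, WRITE_LATENCY, READ_BANDWIDTH, WRITE_BANDWIDTH = 1, 2, 4, 5


def dsmas(handle, dpa_base, dpa_length, flags=0):
    """Describe one device physical address range (structure type 0)."""
    return DSMAS.pack(0, 0, DSMAS.size, handle, flags, 0, dpa_base, dpa_length)


def dslbis(handle, data_type, base_unit, value):
    """Attach one latency or bandwidth figure to a DSMAS handle (type 1)."""
    return DSLBIS.pack(1, 0, DSLBIS.size, handle, 0, data_type, 0,
                       base_unit, value, 0, 0, 0)


def build_cdat(entries, revision=1, sequence=0):
    """Prepend the table header and set the checksum so all bytes sum to 0 mod 256."""
    body = b"".join(entries)
    length = HEADER.size + len(body)
    draft = HEADER.pack(length, revision, 0, bytes(6), sequence) + body
    checksum = (-sum(draft)) & 0xFF
    return HEADER.pack(length, revision, checksum, bytes(6), sequence) + body


# Hypothetical numbers: latency in units of 10 ns (base unit in picoseconds),
# bandwidth in units of 1 GB/s (base unit in MB/s).
table = build_cdat([
    dsmas(handle=0, dpa_base=0, dpa_length=4 << 30),
    dslbis(0, READ_LATENCY, 10_000, 25),    # 250 ns
    dslbis(0, WRITE_LATENCY, 10_000, 30),   # 300 ns
    dslbis(0, READ_BANDWIDTH, 1_000, 32),   # 32 GB/s
    dslbis(0, WRITE_BANDWIDTH, 1_000, 24),  # 24 GB/s
])
```

Writing `table` out to a file and pointing the cxl-type3 device's cdat
property at that path is how the blob would be injected. Whether QEMU
accepts this exact table has not been tested here.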

What we are missing though is generic port creation, so even with
everything else in place it won't quite work yet. There was a hacky
patch for generic ports, but it's not upstream yet (or in my tree).

Usefully, there is work under review to add generic initiators to
QEMU, most of which we can repurpose for generic ports.

Jonathan
