From: Gregory Price <gregory.price@memverge.com>
To: Hao Xiang <hao.xiang@bytedance.com>
Cc: "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
"Gregory Price" <gourry.memverge@gmail.com>,
"Fan Ni" <fan.ni@samsung.com>, "Ira Weiny" <ira.weiny@intel.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"David Hildenbrand" <david@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
qemu-devel@nongnu.org, "Ho-Ren (Jack) Chuang" <horenc@vt.edu>,
linux-cxl@vger.kernel.org
Subject: Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'
Date: Tue, 9 Jan 2024 17:13:21 -0500 [thread overview]
Message-ID: <ZZ3FAegN/ezu9Ghy@memverge.com> (raw)
In-Reply-To: <CAAYibXjcVu-13h9jHpt2fZ3wmjz-_Dyj+WfUh0AEpid87GLriQ@mail.gmail.com>
On Tue, Jan 09, 2024 at 01:27:28PM -0800, Hao Xiang wrote:
> On Tue, Jan 9, 2024 at 11:58 AM Gregory Price
> <gregory.price@memverge.com> wrote:
> >
> > If you drop this line:
> >
> > -numa node,memdev=vmem0,nodeid=1
>
> We tried this as well and it works after going through the cxlcli
> process and created the devdax device. The problem is that without the
> "nodeid=1" configuration, we cannot connect with the explicit per numa
> node latency/bandwidth configuration "-numa hmat-lb". I glanced at the
> code in hw/numa.c, parse_numa_hmat_lb() looks like the one passing the
> lb information to VM's hmat.
>
Yeah, this is what Jonathan was saying - right now there isn't a good
way (in QEMU) to pass the hmat/cdat stuff down through the device.
Needs to be plumbed out.
In the meantime: You should just straight up drop the cxl device from
your QEMU config. It doesn't actually get you anything.
> From what I understand so far, the guest kernel will dynamically
> create a numa node after a cxl devdax device is created. That means we
> don't know the numa node until after VM boot. 2. QEMU can only
> statically parse the lb information to the VM at boot time. How do we
> connect these two things?
during boot, the kernel discovers all the memory regions exposed to
bios. In this qemu configuration you have defined:
region 0: CPU + DRAM node
region 1: DRAM only node
region 2: CXL Fixed Memory Window (the last line of the cxl stuff)
The kernel reads this information on boot and reserves 1 numa node for
each of these regions.
The kernel then automatically brings up regions 0 and 1 in nodes 0 and 1
respectively.
Node2 sits dormant until you go through the cxl-cli startup sequence.
What you're asking for is for the QEMU team to plumb hmat/cdat
information down through the type3 device. I *think* that is presently
possible with a custom CDAT file - but Jonathan probably has more
details on that. You'll have to go digging for answers on that one.
Now - even if you did that - the current state of the cxl-type3 device
is *not what you want* because your memory accesses will be routed
through the read/write functions in the emulated device.
What Jonathan and I discussed on the other thread is how you might go
about slimming this down to allow pass-through of the memory without the
need for all the fluff. This is a non-trivial refactor of the existing
device, so i would not expect that any time soon.
At the end of the day, quickest way to get-there-from-here is to just
drop the cxl related lines from your QEMU config, and keep everything
else.
>
> Assuming that the same issue applies to a physical server with CXL.
> Were you able to see a host kernel getting the correct lb information
> for a CXL devdax device?
>
Yes, if you bring up a CXL device via cxl-cli on real hardware, the
subsequent numa node ends up in the "lower tier" of the memory-tiering
infrastructure.
~Gregory
next prev parent reply other threads:[~2024-01-09 22:13 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-01 7:53 [QEMU-devel][RFC PATCH 0/1] Introduce HostMemType for 'memory-backend-*' Ho-Ren (Jack) Chuang
2024-01-01 7:53 ` [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram' Ho-Ren (Jack) Chuang
2024-01-02 10:29 ` Philippe Mathieu-Daudé
2024-01-02 13:03 ` David Hildenbrand
2024-01-06 0:45 ` [External] " Hao Xiang
2024-01-03 21:56 ` Gregory Price
2024-01-06 5:59 ` [External] " Hao Xiang
2024-01-08 17:15 ` Gregory Price
2024-01-08 22:47 ` Hao Xiang
2024-01-09 1:05 ` Hao Xiang
2024-01-09 1:13 ` Gregory Price
2024-01-09 19:33 ` Hao Xiang
2024-01-09 19:57 ` Gregory Price
2024-01-09 21:27 ` Hao Xiang
2024-01-09 22:13 ` Gregory Price [this message]
2024-01-09 23:55 ` Hao Xiang
2024-01-10 14:31 ` Jonathan Cameron
2024-01-12 15:32 ` Markus Armbruster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZZ3FAegN/ezu9Ghy@memverge.com \
--to=gregory.price@memverge.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=david@redhat.com \
--cc=eblake@redhat.com \
--cc=eduardo@habkost.net \
--cc=fan.ni@samsung.com \
--cc=gourry.memverge@gmail.com \
--cc=hao.xiang@bytedance.com \
--cc=horenc@vt.edu \
--cc=horenchuang@bytedance.com \
--cc=imammedo@redhat.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox