From: Gregory Price <gregory.price@memverge.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Fan Ni <fan.ni@samsung.com>,
"Verma, Vishal L" <vishal.l.verma@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
Adam Manzanares <a.manzanares@samsung.com>,
"dave@stgolabs.net" <dave@stgolabs.net>
Subject: Re: [GIT preview] for-6.3/cxl-ram-region
Date: Wed, 1 Feb 2023 17:05:51 -0500
Message-ID: <Y9riPw7ccA34FQG6@memverge.com>
In-Reply-To: <20230202160314.00002cfa@Huawei.com>
On Thu, Feb 02, 2023 at 04:03:14PM +0000, Jonathan Cameron wrote:
>
> Note that there is another QEMU issue that needs resolving if you intend
> to use this as normal memory and it's worse under KVM. Affects the
> corner case where an instruction crosses the boundary from normal memory
> into CXL memory.
>
> Thanks to the various QEMU folk who are helping us figure out what to do
> about this for the explanations that follow!
>
> We currently handle the region as MMIO - in QEMU terms, no actual relationship
> to what the OS sees (as need to mess with the address
> mappings on each access for interleaving). That's a problem for KVM
> (which may not cope with sub page granularity remapping under the hood).
>
> https://lore.kernel.org/qemu-devel/ff3f25ee-1c98-242b-905e-0b01d9f0948d@linaro.org/#r
>
> Also a problem in TCG because the handling of executing out of MMIO takes
> a shortcut. It is fine (though very low performance as using a fall back path)
> for fully in MMIO regions, but not the corner case where the start of the instruction
> is in normal RAM (with all the related fast paths and instruction caching) and
> the end of the instruction is in the CXL MMIO region (a CFMWS window).
>
> Currently looks like the fix will be to use the slow path for this case.
> Patches welcome!
>
> Anyhow, in meantime beware.
>
> Jonathan
>
This all tracks, and is similar to what I've seen on other hypervisor
platforms when attempting to execute out of MMIO.
The reality is that CXL is not MMIO and not RAM or ROM or any of that
and is intended (eventually) to even be shared between QEMU instances.
That means it's likely going to require its own MemoryRegion model and
some deep dark corners of TCG and friends are going to require some
updates to make that work.
Whether it's worth the effort when the intent is to just let the
hardware handle that in the future, I don't really know.
Some speculation here:
The crux of the issue, as I understand it, is the invalidation path.
MMIO doesn't traditionally have a mechanism to tell the caches "hey,
I got updated, boot this cache line", so whenever the translator fetches
code from MMIO it does, at best, a fetch-and-discard, meaning that
instruction translations can never be cached. That's the source of the
slowdown on the QEMU side: you're constantly redoing the translations.
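
A toy model of the caching behaviour described above, in Python. It is
not QEMU code: ToyTranslator, is_ram and the "tb@..." strings are made up
for illustration, and the only thing it tries to show is that code in RAM
gets translated once while code in an MMIO-backed range gets translated
on every single fetch.

```python
from typing import Callable, Dict


class ToyTranslator:
    """Toy model only: caches translations for RAM, never for MMIO."""

    def __init__(self, is_ram: Callable[[int], bool]) -> None:
        self.is_ram = is_ram              # hypothetical address classifier
        self.cache: Dict[int, str] = {}   # pc -> cached translation
        self.translations = 0             # how many times we had to translate

    def _translate(self, pc: int) -> str:
        # Stand-in for the expensive step of translating guest code.
        self.translations += 1
        return f"tb@{pc:#x}"

    def fetch(self, pc: int) -> str:
        if not self.is_ram(pc):
            # MMIO: nothing can tell us the backing bytes changed, so the
            # translation is used once and thrown away.
            return self._translate(pc)
        tb = self.cache.get(pc)
        if tb is None:
            tb = self._translate(pc)
            self.cache[pc] = tb
        return tb


def count_translations(pc: int, fetches: int, ram_limit: int = 0x1000) -> int:
    """Run `fetches` fetches of the same pc; addresses below ram_limit are RAM."""
    t = ToyTranslator(lambda addr: addr < ram_limit)
    for _ in range(fetches):
        t.fetch(pc)
    return t.translations
```

With the default (assumed) layout, count_translations(0x100, 100) does a
single translation, while count_translations(0x2000, 100) does one per
fetch.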
On the KVM side, it likely requires a VMExit to handle the MMIO, and
when it sees that it's an instruction fetch it probably just falls back
to emulator mode to execute the instruction before re-entering. Maybe
there's a mild optimization where it continues executing until it leaves
that MMIO region, but you're still getting emulation-level performance
under KVM.
So that all makes sense to me.
To me, the solution here isn't to change QEMU, it's to change the kernel
to try to get it to aggressively keep executable regions out of CXL by
placing CXL regions in a new zone type that essentially says "Use this
as a last resort only for executable (X) pages". But that would likely
require adding migration code to the likes of mprotect and friends.
In the meantime, it sure would be nice to have a userland program that
grooms running software to detect this problem and migrate X pages to DRAM.
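
A minimal sketch of the detection half of such a tool, in Python. The
assumptions: the CXL memory shows up as its own NUMA node whose id you
already know, and each line of /proc/<pid>/numa_maps can be matched to a
line of /proc/<pid>/maps by its start address. The names
exec_ranges_on_node and ExecRange are hypothetical. It only reports
executable mappings that have pages on the given node; the migration
itself is not shown (something like move_pages(2) would be one way to do
it).

```python
import re
from typing import Dict, List, NamedTuple


class ExecRange(NamedTuple):
    start: int
    end: int
    path: str
    pages_on_node: int


# /proc/<pid>/maps: "start-end perms offset dev inode [pathname]"
MAPS_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) (\S{4}) \S+ \S+ \S+\s*(.*)$")
# numa_maps per-node page counts look like "N0=12"
NODE_RE = re.compile(r"N(\d+)=(\d+)")


def pages_per_node(numa_maps_text: str) -> Dict[int, Dict[int, int]]:
    """Map each range's start address to {node id: page count}."""
    result: Dict[int, Dict[int, int]] = {}
    for line in numa_maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            start = int(fields[0], 16)
        except ValueError:
            continue  # not a numa_maps entry
        counts: Dict[int, int] = {}
        for field in fields[1:]:
            m = NODE_RE.fullmatch(field)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
        result[start] = counts
    return result


def exec_ranges_on_node(maps_text: str, numa_maps_text: str,
                        node: int) -> List[ExecRange]:
    """Return executable mappings that have at least one page on `node`."""
    per_range = pages_per_node(numa_maps_text)
    hits: List[ExecRange] = []
    for line in maps_text.splitlines():
        m = MAPS_RE.match(line)
        if m is None or "x" not in m.group(3):
            continue  # unparsable line or non-executable mapping
        start = int(m.group(1), 16)
        count = per_range.get(start, {}).get(node, 0)
        if count:
            hits.append(ExecRange(start, int(m.group(2), 16),
                                  m.group(4).strip(), count))
    return hits
```

To try it, read the two files for a pid into strings and call
exec_ranges_on_node(maps_text, numa_maps_text, cxl_node), where cxl_node
is whatever node id the CXL region got on your system.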
~Gregory