From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Fan Ni <fan.ni@samsung.com>,
"Verma, Vishal L" <vishal.l.verma@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
Adam Manzanares <a.manzanares@samsung.com>,
"dave@stgolabs.net" <dave@stgolabs.net>
Subject: Re: [GIT preview] for-6.3/cxl-ram-region
Date: Thu, 2 Feb 2023 18:13:57 +0000
Message-ID: <20230202181357.00002827@Huawei.com>
In-Reply-To: <Y9riPw7ccA34FQG6@memverge.com>
On Wed, 1 Feb 2023 17:05:51 -0500
Gregory Price <gregory.price@memverge.com> wrote:
> On Thu, Feb 02, 2023 at 04:03:14PM +0000, Jonathan Cameron wrote:
> >
> > Note that there is another QEMU issue that needs resolving if you intend
> > to use this as normal memory and it's worse under KVM. It affects the
> > corner case where an instruction crosses the boundary from normal memory
> > into CXL memory.
> >
> > Thanks to the various QEMU folk who are helping us figure out what to do
> > about this for the explanations that follow!
> >
> > We currently handle the region as MMIO - in QEMU terms, no actual relationship
> > to what the OS sees (as we need to mess with the address
> > mappings on each access for interleaving). That's a problem for KVM
> > (which may not cope with sub-page granularity remapping under the hood).
> >
> > https://lore.kernel.org/qemu-devel/ff3f25ee-1c98-242b-905e-0b01d9f0948d@linaro.org/#r
> >
> > Also a problem in TCG because the handling of executing out of MMIO takes
> > a shortcut. It is fine (though very low performance as using a fall back path)
> > for fully in MMIO regions, but not the corner case where the start of the instruction
> > is in normal RAM (with all the related fast paths and instruction caching) and
> > the end of the instruction is in the CXL MMIO region (a CFMWS window).
> >
> > Currently looks like the fix will be to use the slow path for this case.
> > Patches welcome!
> >
> > Anyhow, in the meantime beware.
> >
> > Jonathan
> >
>
> This all tracks, and is similar to what i've seen on other hypervisor
> platforms when attempting to execute out of MMIO.
>
> The reality is that CXL is not MMIO and not RAM or ROM or any of that
> and is intended (eventually) to even be shared between QEMU instances.
Indeed it's an oddity - but this is all smoke and mirrors anyway
as far as the OS is concerned. The OS / firmware doesn't need to know
anything about how we are modeling things in QEMU - we need to make
QEMU functionally correct.
>
> That means it's likely going to require its own MemoryRegion model and
> some deep dark corners of TCG and friends are going to require some
> updates to make that work.
>
> Whether it's worth the effort when the intent is to just let the
> hardware handle that in the future, i don't really know.
I think we need a fix, though perf can be terrible ;)
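
For illustration only, here is a minimal sketch of the check a slow-path
fix would hinge on: does an instruction start on one side of a CFMWS
window boundary and end on the other? This is not QEMU code. The
struct cxl_window type and both function names are hypothetical, and it
assumes the instruction bytes are contiguous in guest physical address
space and that pc + len does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one CFMWS window in guest physical space. */
struct cxl_window {
    uint64_t base;  /* first guest physical address of the window */
    uint64_t size;  /* window length in bytes, assumed non-zero */
};

/* True if addr falls inside the window. */
static bool in_window(const struct cxl_window *w, uint64_t addr)
{
    return addr >= w->base && addr - w->base < w->size;
}

/*
 * True if an instruction of len bytes starting at pc has its first and
 * last bytes on different sides of the window boundary, i.e. the case
 * that would have to be diverted to the slow path.
 */
static bool insn_straddles_window(const struct cxl_window *w,
                                  uint64_t pc, unsigned len)
{
    if (len == 0) {
        return false;
    }
    uint64_t last = pc + len - 1;  /* assumes no wrap-around */
    return in_window(w, pc) != in_window(w, last);
}
```

An instruction fully in RAM or fully in the window returns false and
keeps whatever path it uses today. Only the mixed case returns true.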
>
>
>
> Some speculation here:
>
> The crux of the issue, as i understand it, is the invalidation path.
> MMIO doesn't traditionally have a mechanism to tell the caches "hey
> i got updated, boot this cache line", so whenever your compiler accesses
> MMIO it - at best - does a fetch-and-discard, meaning that instruction
> translations can never be cached. That's the source of the slow down on
> the QEMU side, you're constantly re-compiling the translations.
If we actually care, I think we could do some tricks with creating
a cache of pages in between the interleaved underlying element and TCG
so that it could behave as if it were dealing with a normal page when
executing, but when writing it would drop such a cache so the writes
reach the interleave elements. I don't think we do care for now though.
Slow down on TCG is fine (if possibly rather painful). We just need to
work around the currently broken corner.
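
A rough sketch of that cache idea, purely as an illustration and not a
proposal for the actual QEMU code. Everything here is made up for the
example: a toy 2-way interleave with a 256 byte granule and simple
modulo decode, a single cached page, and the names slow_ptr(),
exec_page_get() and cxl_write_byte(). Offsets are assumed to be below
BACKING_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE    4096u
#define N_TARGETS    2u
#define GRANULE      256u
#define BACKING_SIZE (4u * PAGE_SIZE)

/* Toy backing store: one array per interleave target. */
static uint8_t target_mem[N_TARGETS][BACKING_SIZE / N_TARGETS];

/* Slow path: decode a window offset to a byte in one of the targets. */
static uint8_t *slow_ptr(uint64_t off)
{
    uint64_t granule = off / GRANULE;
    uint64_t target = granule % N_TARGETS;
    uint64_t dev_off = (granule / N_TARGETS) * GRANULE + off % GRANULE;
    return &target_mem[target][dev_off];
}

/* A single linear copy of one page, usable like normal RAM for execution. */
struct exec_page_cache {
    bool valid;
    uint64_t page_base;
    uint8_t data[PAGE_SIZE];
};

/* Return a linear view of the page containing off, filling it if needed. */
static const uint8_t *exec_page_get(struct exec_page_cache *c, uint64_t off)
{
    uint64_t base = off & ~(uint64_t)(PAGE_SIZE - 1);
    if (!c->valid || c->page_base != base) {
        for (uint32_t i = 0; i < PAGE_SIZE; i++) {
            c->data[i] = *slow_ptr(base + i);
        }
        c->page_base = base;
        c->valid = true;
    }
    return c->data;
}

/* Writes always reach the interleaved targets and drop a stale cached page. */
static void cxl_write_byte(struct exec_page_cache *c, uint64_t off, uint8_t val)
{
    *slow_ptr(off) = val;
    if (c->valid && off >= c->page_base && off - c->page_base < PAGE_SIZE) {
        c->valid = false;
    }
}
```

Reads for execution go through exec_page_get() and see one contiguous
page. Any write to that page invalidates the copy, so the next fetch
refills it from the interleaved targets.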
>
> On the KVM side, it likely requires a VMExit to handle the MMIO, and
> when it sees that it's an instruction fetch it probably just falls back
> to emulator mode to execute the instruction before re-entering. Maybe
> there's a mild optimization where it continues executing until it leaves
> that MMIO region, but you're still getting QEMU performance over KVM.
Might be ways to improve that perf, but TCG is good enough for testing
so I'm not sure we care. I tend to be doing cross architecture anyway
so don't care much about KVM :)
>
> So that all makes sense to me.
>
> To me, the solution here isn't to change QEMU, it's to change the kernel
> to try to get it to aggressively keep executable regions out of CXL by
> marking CXL regions into a new zone type that essentially says "Use this
> as a last resort only for X pages". But that would likely require
> adding migration code to the likes of mprotect and friends.
No. This is a QEMU emulation issue, not a real hardware one - so we shouldn't
touch the kernel. On real hardware (with the exception of the shared case) it
should be fine to execute from CXL memory.
>
> In the meantime, sure would be nice to have a userland program that
> grooms software to detect this problem and migrate X pages to DRAM.
Lots of program memory is exceedingly cold (typically things like init and
exit code that is only touched once per program run), so we very much do not want
to do this. In some cases we want to push executable memory to the CXL memory.
Jonathan
>
>
> ~Gregory