* [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to
@ 2024-08-09 7:47 Nicholas Piggin
2024-08-09 8:26 ` Philippe Mathieu-Daudé
2024-08-12 1:25 ` Richard Henderson
0 siblings, 2 replies; 5+ messages in thread
From: Nicholas Piggin @ 2024-08-09 7:47 UTC (permalink / raw)
To: qemu-devel, qemu-ppc; +Cc: Nicholas Piggin, Richard Henderson, Paolo Bonzini
This is not a clean patch, but does fix a problem I hit with TB
invalidation due to the target software writing to memory with TBs.
Lockup messages are triggering in Linux due to page clearing taking a
long time when a code page has been freed, because it takes a lot of
notdirty notifiers, which massively slows things down. Linux might
possibly have a bug here too because it seems to hang indefinitely in
some cases, but even if it didn't, the latency of clearing these pages
is very high.
This showed when running KVM on the emulated machine, starting and
stopping guests. That causes lots of instruction pages to be freed.
Usually if you're just running Linux, executable pages remain in
pagecache so you get fewer of these bombs in the kernel memory
allocator. But page reclaim, JITs, deleting executable files, etc.,
could trigger it too.
Invalidating all TBs from the page on any hit seems to avoid the problem
and generally speeds things up.
How important is the precise invalidation? These days I assume the
tricky kind of SMC that frequently writes code close to where it's
executing is pretty rare and might not be something we really care about
for performance. Could we remove sub-page TB invalidation entirely?
Thanks,
Nick
---
accel/tcg/tb-maint.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index cc0f5afd47..d9a76b1665 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
#endif /* TARGET_HAS_PRECISE_SMC */
+ start &= TARGET_PAGE_MASK;
+ last |= ~TARGET_PAGE_MASK;
+
/* Range may not cross a page. */
tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
--
2.45.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to
2024-08-09 7:47 [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to Nicholas Piggin
@ 2024-08-09 8:26 ` Philippe Mathieu-Daudé
2024-08-12 1:25 ` Richard Henderson
1 sibling, 0 replies; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-08-09 8:26 UTC (permalink / raw)
To: Nicholas Piggin, qemu-devel, qemu-ppc
Cc: Richard Henderson, Paolo Bonzini, Brian Cain, Mark Burton,
Anton Johansson, Emilio Cota, Alex Bennée
(Widening Cc list)
On 9/8/24 09:47, Nicholas Piggin wrote:
> This is not a clean patch, but does fix a problem I hit with TB
> invalidation due to the target software writing to memory with TBs.
>
> Lockup messages are triggering in Linux due to page clearing taking a
> long time when a code page has been freed, because it takes a lot of
> notdirty notifiers, which massively slows things down. Linux might
> possibly have a bug here too because it seems to hang indefinitely in
> some cases, but even if it didn't, the latency of clearing these pages
> is very high.
>
> This showed when running KVM on the emulated machine, starting and
> stopping guests. That causes lots of instruction pages to be freed.
> Usually if you're just running Linux, executable pages remain in
> pagecache so you get fewer of these bombs in the kernel memory
> allocator. But page reclaim, JITs, deleting executable files, etc.,
> could trigger it too.
>
> Invalidating all TBs from the page on any hit seems to avoid the problem
> and generally speeds things up.
>
> How important is the precise invalidation? These days I assume the
> tricky kind of SMC that frequently writes code close to where it's
> executing is pretty rare and might not be something we really care about
> for performance. Could we remove sub-page TB invalidation entirely?
>
> Thanks,
> Nick
> ---
> accel/tcg/tb-maint.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
> index cc0f5afd47..d9a76b1665 100644
> --- a/accel/tcg/tb-maint.c
> +++ b/accel/tcg/tb-maint.c
> @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
> #endif /* TARGET_HAS_PRECISE_SMC */
>
> + start &= TARGET_PAGE_MASK;
> + last |= ~TARGET_PAGE_MASK;
> +
> /* Range may not cross a page. */
> tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to
2024-08-09 7:47 [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to Nicholas Piggin
2024-08-09 8:26 ` Philippe Mathieu-Daudé
@ 2024-08-12 1:25 ` Richard Henderson
2024-08-14 6:09 ` Nicholas Piggin
1 sibling, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2024-08-12 1:25 UTC (permalink / raw)
To: Nicholas Piggin, qemu-devel, qemu-ppc; +Cc: Paolo Bonzini
On 8/9/24 17:47, Nicholas Piggin wrote:
> This is not a clean patch, but does fix a problem I hit with TB
> invalidation due to the target software writing to memory with TBs.
>
> Lockup messages are triggering in Linux due to page clearing taking a
> long time when a code page has been freed, because it takes a lot of
> notdirty notifiers, which massively slows things down. Linux might
> possibly have a bug here too because it seems to hang indefinitely in
> some cases, but even if it didn't, the latency of clearing these pages
> is very high.
>
> This showed when running KVM on the emulated machine, starting and
> stopping guests. That causes lots of instruction pages to be freed.
> Usually if you're just running Linux, executable pages remain in
> pagecache so you get fewer of these bombs in the kernel memory
> allocator. But page reclaim, JITs, deleting executable files, etc.,
> could trigger it too.
>
> Invalidating all TBs from the page on any hit seems to avoid the problem
> and generally speeds things up.
>
> How important is the precise invalidation? These days I assume the
> tricky kind of SMC that frequently writes code close to where it's
> executing is pretty rare and might not be something we really care about
> for performance. Could we remove sub-page TB invalidation entirely?
Happens on x86 and s390 regularly enough, so we can't remove it.
> @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
> #endif /* TARGET_HAS_PRECISE_SMC */
>
> + start &= TARGET_PAGE_MASK;
> + last |= ~TARGET_PAGE_MASK;
> +
> /* Range may not cross a page. */
> tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
This would definitely break SMC.
However, there's a better solution. We're already iterating over all of the TBs on the
current page only. Move *everything* except the tb_phys_invalidate__locked call into the
SMC ifdef, and unconditionally invalidate every TB selected in the loop.
We experimented with something like this for aarch64, which used to spend a lot of the
kernel startup time invalidating code pages from the (somewhat bloated) EDK2 bios. But it
turned out the bigger problem was address space randomization, and with CF_PCREL the
problem appeared to go away.
I don't think we've done any kvm-under-tcg performance testing, but lockup messages would
certainly be something to look for...
r~
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to
2024-08-12 1:25 ` Richard Henderson
@ 2024-08-14 6:09 ` Nicholas Piggin
2024-08-20 23:16 ` Richard Henderson
0 siblings, 1 reply; 5+ messages in thread
From: Nicholas Piggin @ 2024-08-14 6:09 UTC (permalink / raw)
To: Richard Henderson, qemu-devel, qemu-ppc; +Cc: Paolo Bonzini
On Mon Aug 12, 2024 at 11:25 AM AEST, Richard Henderson wrote:
> On 8/9/24 17:47, Nicholas Piggin wrote:
> > This is not a clean patch, but does fix a problem I hit with TB
> > invalidation due to the target software writing to memory with TBs.
> >
> > Lockup messages are triggering in Linux due to page clearing taking a
> > long time when a code page has been freed, because it takes a lot of
> > notdirty notifiers, which massively slows things down. Linux might
> > possibly have a bug here too because it seems to hang indefinitely in
> > some cases, but even if it didn't, the latency of clearing these pages
> > is very high.
> >
> > This showed when running KVM on the emulated machine, starting and
> > stopping guests. That causes lots of instruction pages to be freed.
> > Usually if you're just running Linux, executable pages remain in
> > pagecache so you get fewer of these bombs in the kernel memory
> > allocator. But page reclaim, JITs, deleting executable files, etc.,
> > could trigger it too.
> >
> > Invalidating all TBs from the page on any hit seems to avoid the problem
> > and generally speeds things up.
> >
> > How important is the precise invalidation? These days I assume the
> > tricky kind of SMC that frequently writes code close to where it's
> > executing is pretty rare and might not be something we really care about
> > for performance. Could we remove sub-page TB invalidation entirely?
>
> Happens on x86 and s390 regularly enough, so we can't remove it.
>
> > @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> > TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
> > #endif /* TARGET_HAS_PRECISE_SMC */
> >
> > + start &= TARGET_PAGE_MASK;
> > + last |= ~TARGET_PAGE_MASK;
> > +
> > /* Range may not cross a page. */
> > tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
>
> This would definitely break SMC.
They can't invalidate the instruction currently being executed?
I'll experiment a bit more.
> However, there's a better solution. We're already iterating over all of the TBs on the
> current page only. Move *everything* except the tb_phys_invalidate__locked call into the
> SMC ifdef, and unconditionally invalidate every TB selected in the loop.
Okay. I suspect *most* of the time even the strict SMC archs would
not be writing to the same page they're executing either. But I can
start with the !SMC.
> We experimented with something like this for aarch64, which used to spend a lot of the
> kernel startup time invalidating code pages from the (somewhat bloated) EDK2 bios. But it
> turned out the bigger problem was address space randomization, and with CF_PCREL the
> problem appeared to go away.
Interesting.
> I don't think we've done any kvm-under-tcg performance testing, but lockup messages would
> certainly be something to look for...
Yeah, actually Linux is throwing the messages a bit more recently
at least on distros that enable page clearing at alloc for security,
because that clearing is a big chunk that can happen in critical
sections.
Thanks for the suggestion, I'll give it a try.
Thanks,
Nick
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to
2024-08-14 6:09 ` Nicholas Piggin
@ 2024-08-20 23:16 ` Richard Henderson
0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2024-08-20 23:16 UTC (permalink / raw)
To: Nicholas Piggin, qemu-devel, qemu-ppc; +Cc: Paolo Bonzini
On 8/14/24 16:09, Nicholas Piggin wrote:
>>> @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>>> TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
>>> #endif /* TARGET_HAS_PRECISE_SMC */
>>>
>>> + start &= TARGET_PAGE_MASK;
>>> + last |= ~TARGET_PAGE_MASK;
>>> +
>>> /* Range may not cross a page. */
>>> tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
>>
>> This would definitely break SMC.
>
> They can't invalidate the instruction currently being executed?
They can. But that's where adjusting start/last would break things, because we've lost
track of exactly what's being modified, which changes the result of "did we modify the
current instruction".
r~
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-08-20 23:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-09 7:47 [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to Nicholas Piggin
2024-08-09 8:26 ` Philippe Mathieu-Daudé
2024-08-12 1:25 ` Richard Henderson
2024-08-14 6:09 ` Nicholas Piggin
2024-08-20 23:16 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).