From: Joelle van Dyne <j@getutm.app>
To: Roman Bolshakov <r.bolshakov@yadro.com>
Cc: QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [PATCH] tcg: Fix execution on Apple Silicon
Date: Sun, 3 Jan 2021 08:46:27 -0800 [thread overview]
Message-ID: <CA+E+eSBtxV996L+vNo5DBdrgafH=CX+TooRjj0KsfOtcvqYxWQ@mail.gmail.com> (raw)
In-Reply-To: <X/HSr0q0ihc9uBUO@SPB-NB-133.local>
Can you test with a low memory (-m 512) and also with single threaded
SMP (-smp 4)? Wondering if you're hitting all the edge cases because
there's oddities with cache flushing (can be done both in code gen and
code exec) and interrupt handling that caused issues for me.
-j
On Sun, Jan 3, 2021 at 6:20 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
>
> On Sat, Jan 02, 2021 at 11:55:29AM -0800, Joelle van Dyne wrote:
> > I see you didn't touch cpu_loop_exit() and I'm curious how async
> > interrupts are handled. Have you tested this and it works i.e. booting
> > Windows 7 or Ubuntu 20.04? Also I've seen do_tb_phys_invalidate()
> > called both from code generation context (write unlocked) and
> > execution context (write locked), how does this patch differentiate
> > the two?
> >
>
> Hi Joelle,
>
> I used the following rule of thumb when finding places where exec/write
> protection has to be lifted. If it returns EXC_BAD_ACCESS under lldb and
> stack backtrace is meaningful, then a write-protected region is
> accessed. If the stack couldn't be unwinded and EXC_BAD_ACCESS is
> returned then the region has exec restrictions.
>
> With the patch I wasn't able to see any EXC_BAD_ACCESS.
>
> I've tested x86_64 Ubuntu 20.04 Desktop. It boots but it's very slow
> (but faster than TCG on x86). Windows XP is much faster and quite
> responsive. I also tested Windows 95. I'll test Win 7/Win 10 a bit
> later.
>
> I'm going to update v2 shortly and going to introduce assymetric changes of
> permissions akin to Apple's JavaScriptCore. In v2, I'm not changing
> permission back and force unless it's needed to avoid EXC_BAD_ACCESS.
>
> Regards,
> Roman
>
> > -j
> >
> > On Sat, Jan 2, 2021 at 8:13 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
> > >
> > > On Sat, Jan 02, 2021 at 03:21:02PM +0300, Roman Bolshakov wrote:
> > > > Pages can't be both write and executable at the same time on Apple
> > > > Silicon. macOS provides public API to switch write protection [1] for
> > > > JIT applications, like TCG.
> > > >
> > > > 1. https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon
> > > >
> > > > Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> > > > ---
> > > >
> > > > Happy holidays, everyone.
> > > >
> > > > This is somewhat similar to https://patchwork.kernel.org/project/qemu-devel/patch/20201108232425.1705-7-j@getutm.app/
> > > > but I couldn't apply the series so I started from scratch.
> > > >
> > > > The primary difference from the patch above is that public API is used.
> > > > Other differences:
> > > > * TB pages are mostly kept write-locked except around tcg_qemu_tb_exec()
> > > > * x86_64 macOS doesn't use MAP_JIT and W^X switches
> > > >
> > > > Regards,
> > > > Roman
> > > >
> > > > accel/tcg/cpu-exec.c | 10 ++++++++++
> > > > accel/tcg/translate-all.c | 26 ++++++++++++++++++++++++++
> > > > include/exec/exec-all.h | 2 ++
> > > > tcg/tcg.c | 1 +
> > > > 4 files changed, 39 insertions(+)
> > > >
> > > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > > > index 8689c54499..0042fc9f2b 100644
> > > > --- a/accel/tcg/cpu-exec.c
> > > > +++ b/accel/tcg/cpu-exec.c
> > > > @@ -175,7 +175,9 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
> > > > }
> > > > #endif /* DEBUG_DISAS */
> > > >
> > > > + tb_write_lock();
> > > > ret = tcg_qemu_tb_exec(env, tb_ptr);
> > > > + tb_write_unlock();
> > > > cpu->can_do_io = 1;
> > > > last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
> > > > tb_exit = ret & TB_EXIT_MASK;
> > > > @@ -220,9 +222,11 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
> > > > cflags |= MIN(max_cycles, CF_COUNT_MASK);
> > > >
> > > > mmap_lock();
> > > > + tb_write_unlock();
> > > > tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base,
> > > > orig_tb->flags, cflags);
> > > > tb->orig_tb = orig_tb;
> > > > + tb_write_lock();
> > > > mmap_unlock();
> > > >
> > > > /* execute the generated code */
> > > > @@ -268,7 +272,9 @@ void cpu_exec_step_atomic(CPUState *cpu)
> > > > tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
> > > > if (tb == NULL) {
> > > > mmap_lock();
> > > > + tb_write_unlock();
> > > > tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> > > > + tb_write_lock();
> > > > mmap_unlock();
> > > > }
> > > >
> > > > @@ -428,7 +434,9 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
> > > > tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
> > > > if (tb == NULL) {
> > > > mmap_lock();
> > > > + tb_write_unlock();
> > > > tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
> > > > + tb_write_lock();
> > > > mmap_unlock();
> > > > /* We add the TB in the virtual pc hash table for the fast lookup */
> > > > qatomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
> > > > @@ -444,7 +452,9 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
> > > > #endif
> > > > /* See if we can patch the calling TB. */
> > > > if (last_tb) {
> > > > + tb_write_unlock();
> > > > tb_add_jump(last_tb, tb_exit, tb);
> > > > + tb_write_lock();
> > > > }
> > > > return tb;
> > > > }
> > > > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > > > index b7d50a73d4..1562076ffb 100644
> > > > --- a/accel/tcg/translate-all.c
> > > > +++ b/accel/tcg/translate-all.c
> > > > @@ -1072,6 +1072,9 @@ static inline void *alloc_code_gen_buffer(void)
> > > > size_t size = tcg_ctx->code_gen_buffer_size;
> > > > void *buf;
> > > >
> > > > +#if defined(__APPLE__) && defined(__aarch64__)
> > > > + flags |= MAP_JIT;
> > > > +#endif
> > > > buf = mmap(NULL, size, prot, flags, -1, 0);
> > > > if (buf == MAP_FAILED) {
> > > > return NULL;
> > > > @@ -1485,7 +1488,9 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
> > > >
> > > > static void tb_phys_invalidate__locked(TranslationBlock *tb)
> > > > {
> > > > + tb_write_unlock();
> > > > do_tb_phys_invalidate(tb, true);
> > > > + tb_write_lock();
> > > > }
> > > >
> > > > /* invalidate one TB
> > > > @@ -2722,3 +2727,24 @@ void tcg_flush_softmmu_tlb(CPUState *cs)
> > > > tlb_flush(cs);
> > > > #endif
> > > > }
> > > > +
> > > > +#if defined(__APPLE__) && defined(__aarch64__)
> > > > +static void tb_write_protect(bool locked)
> > > > +{
> > > > + if (pthread_jit_write_protect_supported_np()){
> > > > + pthread_jit_write_protect_np(locked);
> > > > + }
> > > > +}
> > > > +#else
> > > > +static void tb_write_protect(bool locked) {}
> > > > +#endif
> > > > +
> > > > +void tb_write_lock(void)
> > > > +{
> > > > + tb_write_protect(true);
> > > > +}
> > > > +
> > > > +void tb_write_unlock(void)
> > > > +{
> > > > + tb_write_protect(false);
> > > > +}
> > > > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > > > index fab573da06..962dca0975 100644
> > > > --- a/include/exec/exec-all.h
> > > > +++ b/include/exec/exec-all.h
> > > > @@ -549,6 +549,8 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
> > > > target_ulong cs_base, uint32_t flags,
> > > > uint32_t cf_mask);
> > > > void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
> > > > +void tb_write_lock(void);
> > > > +void tb_write_unlock(void);
> > > >
> > > > /* GETPC is the true target of the return instruction that we'll execute. */
> > > > #if defined(CONFIG_TCG_INTERPRETER)
> > > > diff --git a/tcg/tcg.c b/tcg/tcg.c
> > > > index 43c6cf8f52..303bb436bd 100644
> > > > --- a/tcg/tcg.c
> > > > +++ b/tcg/tcg.c
> > > > @@ -1065,6 +1065,7 @@ void tcg_prologue_init(TCGContext *s)
> > > > s->pool_labels = NULL;
> > > > #endif
> > > >
> > > > + tb_write_unlock();
> > > > /* Generate the prologue. */
> > > > tcg_target_qemu_prologue(s);
> > > >
> > > > --
> > > > 2.29.2
> > > >
> > >
> > > I've also noticed that Apple doesn't worry about sticking to particular
> > > W^X mode:
> > >
> > > https://bugs.webkit.org/attachment.cgi?id=402515&action=prettypatch
> > >
> > > We might also drop lock/unlock symmetry from here. E.g. we can have two
> > > functions that switch the mode (they might be moved to util/osdep.c):
> > >
> > > qemu_jit_write();
> > > qemu_jit_execute();
> > >
> > > Then we use them just before writing or before executing like advised on
> > > their documentation page.
> > >
> > > -Roman
next prev parent reply other threads:[~2021-01-03 16:48 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-02 12:21 [PATCH] tcg: Fix execution on Apple Silicon Roman Bolshakov
2021-01-02 12:39 ` no-reply
2021-01-02 16:13 ` Roman Bolshakov
2021-01-02 19:55 ` Joelle van Dyne
2021-01-03 14:20 ` Roman Bolshakov
2021-01-03 16:46 ` Joelle van Dyne [this message]
2021-01-03 22:54 ` Roman Bolshakov
2021-01-04 1:47 ` Joelle van Dyne
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CA+E+eSBtxV996L+vNo5DBdrgafH=CX+TooRjj0KsfOtcvqYxWQ@mail.gmail.com' \
--to=j@getutm.app \
--cc=qemu-devel@nongnu.org \
--cc=r.bolshakov@yadro.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).