Re: TCG Address Sanitizer Optimization.


From: "Alex Bennée" <alex.bennee@linaro.org>
To: Jon Wilson <jonwilson030981@googlemail.com>
Cc: qemu-devel@nongnu.org,
	Richard Henderson <richard.henderson@linaro.org>,
	Pierrick Bouvier <pierrick.bouvier@linaro.org>
Subject: Re: TCG Address Sanitizer Optimization.
Date: Mon, 02 Jun 2025 16:09:42 +0100
Message-ID: <87h60y41u1.fsf@draig.linaro.org>
In-Reply-To: <CAJHT5-JwrQ31MEKKSNL-E0RPm+cg7UOiqzV5cPL-mnTOPa7eUA@mail.gmail.com> (Jon Wilson's message of "Mon, 2 Jun 2025 13:02:08 +0100")

Jon Wilson <jonwilson030981@googlemail.com> writes:

(Adding Richard, the TCG maintainer, to CC)

> I am attempting to optimize some TCG code which I have previously written for
> qemu-libafl-bridge (https://github.com/AFLplusplus/qemu-libafl-bridge), the 
> component used by the LibAFL project when fuzzing binaries using QEMU to 
> provide runtime instrumentation. The code is used to write additional TCG into 
> basic blocks whenever a load or store operation is performed in order to provide
> address sanitizer functionality.

I would like to figure out if we can push any of this instrumentation
into TCG plugins so you can avoid patching QEMU itself. I guess you need
something that allows you to hook a memory address into some sort of
gadget?

> Address sanitizer is quite simple and works by mapping each 8-byte region of
> address space to a single byte within a region called the shadow map. The address
> (on a 64-bit platform) of the shadow map for a given address is:
>
>     Shadow = (Mem >> 3) + 0x7fff8000;
>
> The value in the shadow map encodes the accessibility of an address:
>
>     0  - The whole 8 byte region is accessible.
>     1 .. 7 - The first n bytes are accessible.
>     negative - The whole 8 byte region is inaccessible.
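
To make the arithmetic concrete, here is a minimal C sketch of the mapping and the shadow-byte encoding described above. The names mem_to_shadow, accessible_bytes and SHADOW_OFFSET_64 are made up for illustration and are not taken from the LibAFL code; only the 64-bit offset from the formula above is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow offset for 64-bit targets, taken from the formula above. */
#define SHADOW_OFFSET_64 UINT64_C(0x7fff8000)

/* Map a guest address to the address of its shadow byte (hypothetical name). */
static inline uint64_t mem_to_shadow(uint64_t mem)
{
    /* Each shadow byte covers one 8-byte granule, hence the shift by 3. */
    return (mem >> 3) + SHADOW_OFFSET_64;
}

/*
 * Decode a shadow byte into the number of leading bytes of the 8-byte
 * granule that may be accessed (hypothetical helper).
 */
static inline unsigned accessible_bytes(int8_t shadow_value)
{
    if (shadow_value == 0) {
        return 8;               /* whole granule accessible */
    }
    if (shadow_value < 0) {
        return 0;               /* whole granule inaccessible */
    }
    assert(shadow_value <= 7);  /* 1..7: only the first n bytes */
    return (unsigned)shadow_value;
}
```

All eight addresses in one granule share a shadow byte, so 0x1000 through 0x1007 map to the same shadow address and 0x1008 maps to the next one.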
>
> The following pseudo code shows the algorithm:
>
> ////////////////////////////////////////////////////////////////////////////////
>
> https://github.com/google/sanitizers/wiki/addresssanitizeralgorithm
>
> byte *shadow_address = MemToShadow(address);
> byte shadow_value = *shadow_address;
> if (shadow_value) {
>   if (SlowPathCheck(shadow_value, address, kAccessSize)) {
>     ReportError(address, kAccessSize, kIsWrite);
>   }
> }
>
> // Check the cases where we access first k bytes of the qword
> // and these k bytes are unpoisoned.
> bool SlowPathCheck(shadow_value, address, kAccessSize) {
>   last_accessed_byte = (address & 7) + kAccessSize - 1;
>   return (last_accessed_byte >= shadow_value);
> }
>
> ////////////////////////////////////////////////////////////////////////////////
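
For comparison with the TCG versions below, the pseudo code can be written as plain C that takes the shadow byte as an argument instead of loading it from memory. This is only a sketch: asan_access_is_bad is a hypothetical name, and the access size is assumed to be between 1 and 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Slow path from the pseudo code: the access is bad when the index of the
 * last byte touched within the granule reaches the shadow value. A negative
 * shadow value therefore always fails, because the index is never negative.
 */
static inline bool slow_path_check(int8_t shadow_value, uint64_t address,
                                   size_t size)
{
    int last_accessed_byte = (int)(address & 7) + (int)size - 1;
    return last_accessed_byte >= shadow_value;
}

/* Full check (hypothetical name): true when the access should be reported. */
static inline bool asan_access_is_bad(int8_t shadow_value, uint64_t address,
                                      size_t size)
{
    assert(size >= 1 && size <= 8);   /* assumption of this sketch */
    if (shadow_value == 0) {
        return false;                 /* fast path: granule fully accessible */
    }
    return slow_path_check(shadow_value, address, size);
}
```

With a shadow value of 4, a 4-byte access at offset 0 of the granule is fine (last byte index 3), while the same access at offset 1 is reported (last byte index 4).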
>
> My current implementation makes use of conditional move instructions to trigger
> a segfault by way of null dereference in the event that the shadow map indicates
> that a memory access is invalid.
>
> ////////////////////////////////////////////////////////////////////////////////
>
> #if TARGET_LONG_BITS == 32
> #define SHADOW_BASE (0x20000000)
> #elif TARGET_LONG_BITS == 64
> #define SHADOW_BASE (0x7fff8000)
> #else
> #error Unhandled TARGET_LONG_BITS value
> #endif
>
> void libafl_tcg_gen_asan(TCGTemp * addr, size_t size)
> {
>     if (size == 0)
>         return;
>
>     TCGv addr_val = temp_tcgv_tl(addr);
>     TCGv k = tcg_temp_new();
>     TCGv shadow_addr = tcg_temp_new();
>     TCGv_ptr shadow_ptr = tcg_temp_new_ptr();
>     TCGv shadow_val = tcg_temp_new();
>     TCGv test_addr = tcg_temp_new();
>     TCGv_ptr test_ptr = tcg_temp_new_ptr();
>
>     tcg_gen_andi_tl(k, addr_val, 7);
>     tcg_gen_addi_tl(k, k, size - 1);
>
>     tcg_gen_shri_tl(shadow_addr, addr_val, 3);
>     tcg_gen_addi_tl(shadow_addr, shadow_addr, SHADOW_BASE);
>     tcg_gen_tl_ptr(shadow_ptr, shadow_addr);
>     tcg_gen_ld8s_tl(shadow_val, shadow_ptr, 0);
>
>     /*
>      * Making conditional branches here appears to cause QEMU issues with dead
>      * temporaries so we will instead avoid branches.

This sounds like a TCG bug that may have been fixed.

>      We will cause the guest
>      * to perform a NULL dereference in the event of an ASAN fault. Note that
>      * we will do this by using a store rather than a load, since the TCG may
>      * otherwise determine that the result of the load is unused and simply
>      * discard the operation. In the event that the shadow memory doesn't
>      * detect a fault, we will simply write the value read from the shadow
>      * memory back to its original location. If, however, the shadow memory
>      * detects an invalid access, we will instead attempt to write the value
>      * at 0x0.
>      */

Why not conditionally call a helper here? Forcing the guest to actually
fault seems a bit heavy-handed.

>     tcg_gen_movcond_tl(TCG_COND_EQ, test_addr,
>         shadow_val, tcg_constant_tl(0),
>         shadow_addr, tcg_constant_tl(0));
>
>     if (size < 8)
>     {
>         tcg_gen_movcond_tl(TCG_COND_GE, test_addr,
>             k, shadow_val,
>             test_addr, shadow_addr);
>     }
>
>     tcg_gen_tl_ptr(test_ptr, test_addr);
>     tcg_gen_st8_tl(shadow_val, test_ptr, 0);
> }
>
> ////////////////////////////////////////////////////////////////////////////////
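
As I read it, the two movcond operations above select the address for the final store. A plain C model of that selection, with each conditional expression standing in for one tcg_gen_movcond_tl, might look like the sketch below. asan_store_target is a made-up name, and the shadow value and shadow address are passed in rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the branchless selection: returns the address the final store
 * would hit. A result of 0 stands for the deliberate NULL dereference;
 * otherwise the shadow byte is written back to its own location.
 */
static inline uint64_t asan_store_target(uint64_t addr, int8_t shadow_val,
                                         uint64_t shadow_addr, size_t size)
{
    assert(size >= 1);

    /* k = (addr & 7) + size - 1, as in the andi/addi pair. */
    int64_t k = (int64_t)(addr & 7) + (int64_t)size - 1;

    /* movcond EQ: shadow_val == 0 ? shadow_addr : 0 */
    uint64_t test_addr = (shadow_val == 0) ? shadow_addr : 0;

    if (size < 8) {
        /* movcond GE (signed): k >= shadow_val ? test_addr : shadow_addr */
        test_addr = (k >= shadow_val) ? test_addr : shadow_addr;
    }
    return test_addr;
}
```

For accesses of 8 bytes or more the second selection is skipped, so any non-zero shadow value ends in the NULL store. That matches the reference algorithm, since k is at least 7 and a positive shadow value is at most 7.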
>
> However, I would like to test an implementation more like the following to see how
> the performance compares. Whilst this introduces branches, the fast path is much
> more likely to be executed than the slow path, so by bypassing the additional
> checks and unnecessary memory writes I am hopeful it will improve performance.
>
> ////////////////////////////////////////////////////////////////////////////////
>
> void libafl_tcg_gen_asan(TCGTemp* addr, size_t size)
> {
>     if (size == 0) {
>         return;
>     }
>
>     if (size > 8) {
>         size = 8;
>     }
>
>     TCGLabel *done = gen_new_label();
>
>     TCGv addr_val = temp_tcgv_tl(addr);
>     TCGv shadow_addr = tcg_temp_new();
>     TCGv_ptr shadow_ptr = tcg_temp_new_ptr();
>     TCGv shadow_val = tcg_temp_new();
>     TCGv k = tcg_temp_new();
>     TCGv zero = tcg_constant_tl(0);
>     TCGv_ptr null_ptr = tcg_temp_new_ptr();
>
>     tcg_gen_shri_tl(shadow_addr, addr_val, 3);
>     tcg_gen_addi_tl(shadow_addr, shadow_addr, SHADOW_BASE);
>     tcg_gen_tl_ptr(shadow_ptr, shadow_addr);
>     tcg_gen_ld8s_tl(shadow_val, shadow_ptr, 0);
>
>     tcg_gen_brcond_tl(TCG_COND_EQ, shadow_val, zero, done);
>
>     tcg_gen_andi_tl(k, addr_val, 7);
>     tcg_gen_addi_tl(k, k, size - 1);
>
>     /* The access is valid only when shadow_val > k (signed compare). */
>     tcg_gen_brcond_tl(TCG_COND_GT, shadow_val, k, done);
>
>     tcg_gen_tl_ptr(null_ptr, zero);
>     tcg_gen_st8_tl(zero, null_ptr, 0);
>
>     gen_set_label(done);
> }
>
> ////////////////////////////////////////////////////////////////////////////////
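
To pin down the intended control flow independently of TCG, here is a plain C model of the fast-path/slow-path structure, following the pseudo code earlier in the mail. It also shows the reporting-callback shape I have in mind instead of a forced fault. All names (asan_check_branchy, asan_report_fn, count_fault) are hypothetical, and in QEMU the callback would be a TCG helper rather than a C function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reporting hook, standing in for a conditionally called helper. */
typedef void (*asan_report_fn)(uint64_t addr, size_t size);

/* Example reporter that only counts faults. */
static int asan_fault_count;

static inline void count_fault(uint64_t addr, size_t size)
{
    (void)addr;
    (void)size;
    asan_fault_count++;
}

/* Returns true (and calls report, if set) when the access is invalid. */
static inline bool asan_check_branchy(uint64_t addr, int8_t shadow_val,
                                      size_t size, asan_report_fn report)
{
    if (size == 0) {
        return false;
    }
    if (size > 8) {
        size = 8;
    }
    if (shadow_val == 0) {
        return false;           /* first branch: fast path */
    }
    int k = (int)(addr & 7) + (int)size - 1;
    if (k < shadow_val) {
        return false;           /* second branch: partial granule, in range */
    }
    if (report) {
        report(addr, size);     /* slow path: report rather than fault */
    }
    return true;
}
```

The second branch skips the report only when k is strictly less than the shadow value, so a negative shadow value always reaches the report.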
>
> However, when I change to using this implementation, I get the following error.
> I have tested it with a trivial hello world implementation for x86_64 running in
> qemu-user. It doesn't occur the first time the block is executed, therefore I
> think the issue is caused by the surrounding TCG in the block it is injected
> into?
>
> ////////////////////////////////////////////////////////////////////////////////
> runner-x86_64: ../tcg/tcg.c:4852: tcg_reg_alloc_mov: Assertion `ts->val_type == TEMP_VAL_REG' failed.
> Aborted (core dumped)
> ////////////////////////////////////////////////////////////////////////////////
>
> I would be very grateful for any advice on how to resolve this issue, or any
> alternative approaches I could use to optimize my original implementation. The
> code is obviously a very hot path and so even a tiny performance improvement
> could result in a large performance gain overall.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Thread overview: 6+ messages
2025-06-02 12:02 TCG Address Sanitizer Optimization Jon Wilson
2025-06-02 15:09 ` Alex Bennée [this message]
2025-06-02 15:54   ` Jon Wilson
2025-06-02 15:58     ` Richard Henderson
2025-06-02 16:26     ` Alex Bennée
2025-06-03  8:20       ` Jon Wilson
