public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Nathan Chancellor <nathan@kernel.org>, x86-ml <x86@kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	llvm@lists.linux.dev, Peter Zijlstra <peterz@infradead.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: clang memcpy calls
Date: Fri, 25 Mar 2022 12:15:28 +0000	[thread overview]
Message-ID: <Yj2yYFloadFobRPx@lakrids> (raw)
In-Reply-To: <CAKwvOdkw0Bbm+=ZyViXQhBE1L6uSbvkstHJuHpQ21tzJRftgAw@mail.gmail.com>

On Thu, Mar 24, 2022 at 11:43:46AM -0700, Nick Desaulniers wrote:
> On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > Hi folks,
> >
> > so I've been looking at a recent objtool noinstr warning from clang
> > builds:
> >
> > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> >
> > The issue is that clang generates a memcpy() call when a struct copy
> > happens:
> >
> >         if (regs != eregs)
> >                 *regs = *eregs;
> 
> Specifically, this is copying one struct pt_regs to another. It looks
> like the sizeof struct pt_regs is just large enough to have clang emit
> the libcall.
> https://godbolt.org/z/scx6aa8jq
> Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> the structs are below ARBITRARY_THRESHOLD.  Should ARBITRARY_THRESHOLD
> be raised so that we continue to inline the memcpy? *shrug*
> 
> Though, looking at the compiled memcpy (`llvm-objdump -D
> --disassemble-symbols=memcpy vmlinux`), maybe we *should* try harder.
> Filed
> https://github.com/llvm/llvm-project/issues/54535.
> 
> > see below for asm output.
> >
> > While gcc does simply generate an actual "rep; movsq".
> >
> > So, how hard would it be to make clang do that too pls?
> 
> As Mark said in the sibling reply; I don't know of general ways to
> inhibit libcall optimizations on the level you're looking for, short
> of heavy handy methods of disabling optimizations entirely.  There's
> games that can be played with -fno-builtin-*, but they're not super
> portable, and I think there's a handful of *blessed* functions that
> must exist in any env, freestanding or not: memcpy, memmove, memset,
> and memcmp for which you cannot yet express "these do not exist."

Talking with Peter on IRC, I think there's an oversight on the compiler
side here w.r.t. the expectations around these blessed functions, since
either:

a) The compiler expects the out-of-line implementations of functions
   ARE NOT instrumented by address-sanitizer.

   If this is the case, then it's legitimate for the compiler to call
   these functions anywhere, and we should NOT instrument the kernel
   implementations of these. If the compiler wants those instrumented it
   needs to add the instrumentation in the caller.

b) The compiler expects the out-of-line implementations of functions
   ARE instrumented by address-sanitizer.

   If this is the case, the compiler MUST NOT generate implicit calls to
   these "blessed" functions from functions marked with:

     __attribute__((no_sanitize_address)).

   ... or the compiler is violating the premise of that attribute.

   AFAICT The two options for the compiler here are:

   1) Always inline an uninstrumented form of the function in this case

   2) Have distinct instrumented/uninstrumented out-of-line
      implementations, and call the uninstrumented form in this case.

To see what clang and GCC do today, I hacked the following in:

| diff --git a/init/main.c b/init/main.c
| index 65fa2e41a9c0..30406c472b5d 100644
| --- a/init/main.c
| +++ b/init/main.c
| @@ -1637,3 +1637,31 @@ static noinline void __init kernel_init_freeable(void)
|  
|         integrity_load_keys();
|  }
| +
| +void
| +test_implicit_memcpy(struct task_struct *dest,
| +                    const struct task_struct *src)
| +{
| +       *dest = *src;
| +}
| +
| +void
| +test_explicit_memcpy(struct task_struct *dest,
| +                    const struct task_struct *src)
| +{
| +       memcpy(dest, src, sizeof(*dest));
| +}
| +
| +void __no_sanitize_address
| +test_implicit_memcpy_nokasan(struct task_struct *dest,
| +                            const struct task_struct *src)
| +{
| +       *dest = *src;
| +}
| +
| +void __no_sanitize_address
| +test_explicit_memcpy_nokasan(struct task_struct *dest,
| +                            const struct task_struct *src)
| +{
| +       memcpy(dest, src, sizeof(*dest));
| +}


For arm64, GCC 11.1.0, KASAN_OUTLINE I see:

| <test_implicit_memcpy>:
|        d503245f        bti     c
|        d503233f        paciasp
|        a9be7bfd        stp     x29, x30, [sp, #-32]!
|        910003fd        mov     x29, sp
|        a90153f3        stp     x19, x20, [sp, #16]
|        aa0103f3        mov     x19, x1
|        aa0003f4        mov     x20, x0
|        d281c001        mov     x1, #0xe00                      // #3584
|        940b9534        bl      ffff8000082f9d90 <__asan_storeN>
|        aa1303e0        mov     x0, x19
|        d281c001        mov     x1, #0xe00                      // #3584
|        940b951e        bl      ffff8000082f9d44 <__asan_loadN>
|        aa1303e1        mov     x1, x19
|        aa1403e0        mov     x0, x20
|        d281c002        mov     x2, #0xe00                      // #3584
|        940b98c5        bl      ffff8000082fabf0 <memcpy>
|        a94153f3        ldp     x19, x20, [sp, #16]
|        a8c27bfd        ldp     x29, x30, [sp], #32
|        d50323bf        autiasp
|        d65f03c0        ret
| 
| <test_explicit_memcpy>:
|        d503245f        bti     c
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        d281c002        mov     x2, #0xe00                      // #3584
|        910003fd        mov     x29, sp
|        940b98bb        bl      ffff8000082fabf0 <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret
| 
| <test_implicit_memcpy_nokasan>:
|        d503245f        bti     c
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        d281c002        mov     x2, #0xe00                      // #3584
|        910003fd        mov     x29, sp
|        940b98b2        bl      ffff8000082fabf0 <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret
| 
| <test_explicit_memcpy_nokasan>:
|        d503245f        bti     c
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        d281c002        mov     x2, #0xe00                      // #3584
|        910003fd        mov     x29, sp
|        940b98a7        bl      ffff8000082fabf0 <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret

For x86_64, GCC 11.1.0, KASAN_OUTLINE I see:

| <test_implicit_memcpy>:
|        41 54                   push   %r12
|        49 89 fc                mov    %rdi,%r12
|        55                      push   %rbp
|        48 89 f5                mov    %rsi,%rbp
|        be 40 1c 00 00          mov    $0x1c40,%esi
|        e8 0d 9a 32 00          call   ffffffff8132b0f0 <__asan_storeN>
|        48 89 ef                mov    %rbp,%rdi
|        be 40 1c 00 00          mov    $0x1c40,%esi
|        e8 f0 99 32 00          call   ffffffff8132b0e0 <__asan_loadN>
|        4c 89 e7                mov    %r12,%rdi
|        48 89 ee                mov    %rbp,%rsi
|        b9 88 03 00 00          mov    $0x388,%ecx
|        f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
|        5d                      pop    %rbp
|        41 5c                   pop    %r12
|        c3                      ret   
| 
| <test_explicit_memcpy>:
|        ba 40 1c 00 00          mov    $0x1c40,%edx
|        e9 b6 9f 32 00          jmp    ffffffff8132b6d0 <memcpy>
| 
| 
| <test_implicit_memcpy_nokasan>:
|        b9 88 03 00 00          mov    $0x388,%ecx
|        f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
|        c3                      ret    
| 
| <test_explicit_memcpy_nokasan>:
|        ba 40 1c 00 00          mov    $0x1c40,%edx
|        e9 96 9f 32 00          jmp    ffffffff8132b6d0 <memcpy>

So from those examples it seems GCC falls into bucket (a), and assumes the
blessed functions ARE NOT instrumented.

We can make this noinstr-safe AND get instrumentation for the first two cases
by removing the instrumentation from the out-of-line copies (always using
noinstr asm implementations) and using ifdeffery to make the explicit calls
target as distinct kasan_instrumented_memcpy() or similar...


For arm64, clang 13.0.0, KASAN_OUTLINE I see:

| <test_implicit_memcpy>:
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        910003fd        mov     x29, sp
|        5281c002        mov     w2, #0xe00                      // #3584
|        940c0f66        bl      ffff8000083185fc <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret
| 
| <test_explicit_memcpy>:
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        910003fd        mov     x29, sp
|        5281c002        mov     w2, #0xe00                      // #3584
|        940c0f5e        bl      ffff8000083185fc <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret
| 
| <test_implicit_memcpy_nokasan>:
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        910003fd        mov     x29, sp
|        5281c002        mov     w2, #0xe00                      // #3584
|        940c0f56        bl      ffff8000083185fc <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret
| 
| <test_explicit_memcpy_nokasan>:
|        d503233f        paciasp
|        a9bf7bfd        stp     x29, x30, [sp, #-16]!
|        910003fd        mov     x29, sp
|        5281c002        mov     w2, #0xe00                      // #3584
|        940c0f4e        bl      ffff8000083185fc <memcpy>
|        a8c17bfd        ldp     x29, x30, [sp], #16
|        d50323bf        autiasp
|        d65f03c0        ret

For x86_64, clang 13.0.0, KASAN_OUTLINE I see:

| <test_implicit_memcpy>:
|        ba 40 1c 00 00          mov    $0x1c40,%edx
|        e8 d6 94 36 00          call   ffffffff8136a830 <memcpy>
|        c3                      ret  
| 
| <test_explicit_memcpy>:
|        ba 40 1c 00 00          mov    $0x1c40,%edx
|        e8 c6 94 36 00          call   ffffffff8136a830 <memcpy>
|        c3                      ret    
| 
| 
| <test_implicit_memcpy_nokasan>:
|        ba 40 1c 00 00          mov    $0x1c40,%edx
|        e9 b6 94 36 00          jmp    ffffffff8136a830 <memcpy>
| 
| 
| <test_explicit_memcpy_nokasan>:
|        ba 40 1c 00 00          mov    $0x1c40,%edx
|        e9 a6 94 36 00          jmp    ffffffff8136a830 <memcpy>

... for which the first two suggests clang thinks the blessed functions *are*
instrumented, which means that generating calls to those in the latter two
cases is a bug.

We can make this noinstr-safe as with the GCC case, but we'll lose the
desirable instrumentation for the test_implicit_memcpy() case.

I think something has to change on the compiler side here (e.g. as per
options above), and we should align GCC and clang on the same
approach...

Thanks,
Mark.

  parent reply	other threads:[~2022-03-25 12:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-24 11:19 clang memcpy calls Borislav Petkov
2022-03-24 15:29 ` Mark Rutland
2022-03-24 18:43 ` Nick Desaulniers
2022-03-24 22:54   ` David Laight
2022-03-25 12:15   ` Mark Rutland [this message]
2022-03-25 14:13     ` Peter Zijlstra
2022-03-25 15:12       ` Segher Boessenkool
2022-03-28  9:52         ` Mark Rutland
2022-03-28 10:20           ` Jakub Jelinek
2022-03-28 11:54             ` Peter Zijlstra
2022-03-28 12:55             ` Mark Rutland
2022-03-28 13:12               ` Jakub Jelinek
2022-03-28 13:44                 ` Mark Rutland
2022-03-30 14:45                   ` Marco Elver
2022-03-28 14:22           ` Segher Boessenkool
2022-03-28 14:58             ` Mark Rutland
2022-03-28 15:59               ` Segher Boessenkool
2022-03-28 16:16                 ` Peter Zijlstra
2022-03-28 16:58                   ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yj2yYFloadFobRPx@lakrids \
    --to=mark.rutland@arm.com \
    --cc=bp@alien8.de \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox