Building the Linux kernel with Clang and LLVM
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: "kernel test robot" <lkp@intel.com>,
	jpoimboe@kernel.org, llvm@lists.linux.dev,
	oe-kbuild-all@lists.linux.dev,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>
Subject: Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
Date: Wed, 24 Jun 2026 11:28:06 +0200	[thread overview]
Message-ID: <20260624092806.GX48970@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CABXGCsMFOo-X_JeZ6qiwN8L=_4wOcf9xeS_RcDAb8XfzA6+WBA@mail.gmail.com>

On Wed, Jun 24, 2026 at 01:57:46AM +0500, Mikhail Gavrilov wrote:
> [+Josh, +Peter: objtool question below]
> 
> On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@intel.com> wrote:
> > >> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
> 
> I looked into this. It is an objtool false positive on a computed goto,
> not a problem in the patch, and not specific to clang 22.1.3.

Urrgh, computed gotos :-(

> Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
> OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
> objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
> reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
> on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
> 220108), so it is not a 22.1.3-only codegen issue.
> 
> What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
> object survives):
> 
>   17f: 48 c7 c0 00 00 00 00   mov    $0x0,%rax
>           R_X86_64_32S  .text.amdgpu_vm_handle_fault+0x196
>   186: ff e0                  jmp    *%rax
> 
> %rax is loaded via an R_X86_64_32S relocation with the address of label
> .text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
> back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
> This is the drm_exec_retry_on_contention() computed goto
> (goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
> -> +0x188 for the first loop. Both targets are inside the function; this
> is not a tail call into another function. (svm_range_restore_pages() is
> the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
> 
> So clang materialized the label address as mov $imm(reloc); jmp *%rax
> instead of folding it into a direct jmp to the label.

So, if my heat-addled brain isn't completely gone, then this all boils
down to something like:

  drm_exec_init(&exec);
label:
  for (;drm_exec_cleanup(&exec);) {
     ...
     if (unlikely(drm_exec_is_contended(&exec)))
       goto label;
     ...
  }
  ...
  drm_exec_fini(&exec);

Except, you've laundered that label through a computed goto to allow
multiple such constructs in a single function -- because can't have
multiple identical labels etc.

And then clang, can't untangle the web and makes a mess of it. Which is
really rather unfortunate, because indirect calls are yuck -- also
retpolines and cfi and all that jazz.

I do think this is very much a compiler issue, clang should never emit
an indirect call for this. Doing that is just aweful.

Also, note that if anybody were ever to use a guard() inside this
drm_exec_until_all_locked() construct, there is no guarantee the
computed goto will actually trip the __cleanup(). Computed goto's and
scope do not play well together. And the moment clang emits an indirect
call, it means it lost track of things and things *will* go sideways.

And the only reason you want that label, is because nested loops,
because without that, a simple 'continue' would do just fine.

That is, I think this drm_exec_until_all_locked() thing has some
fundamental problems the moment clang fails to optimize it away.
Correctness really depends on the compiler not actually ever doing a
computed goto.

> For the indirect
> jmp objtool looks for a jump table in .rodata, finds none (this is a
> single relocated label, not an indexed table), and falls back to
> treating it as an indirect sibling call. The frame is already set up
> (push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
> modified stack frame". Runtime is fine, the jmp lands on the intended
> in-function label; this is purely an objtool classification issue.
> KASAN probably
> tips the balance: the function is inflated with __asan_* checks
> and shadow tests, and on that body clang keeps the label in a register
> rather than folding it.

Right.

> The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
> are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
> just the first to put such a loop into amdgpu_vm_handle_fault().

It is also the first to cause clang to loose its mind and emit an
indirect call. Which as I've explained above, really is a problem.

> objtool question: should objtool resolve an indirect jmp whose target
> register is loaded from a relocation pointing at a .text label inside
> the same function, and treat it as an intra-function jump rather than a
> sibling call? That would cover the labels-as-values / computed-goto
> pattern that this and any future drm_exec user under LTO+KASAN will hit.
> 
> I am not respinning the series for this, the page-fault conversion is a
> pure refactor with no manual stack work. Happy to test an objtool fix,
> or to help with a drm_exec.h workaround if that is preferred over an
> objtool change.

While I do think we could teach objtool about this, I also think it
should still emit a warning, because as stated, this pattern is very
very suspect and likely broken :-(



  reply	other threads:[~2026-06-24  9:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 15:15 [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame kernel test robot
2026-06-23 20:57 ` Mikhail Gavrilov
2026-06-24  9:28   ` Peter Zijlstra [this message]
2026-06-24  9:48     ` Christian König
2026-06-24 10:56       ` Peter Zijlstra
2026-06-24 11:04         ` Christian König
2026-06-24 11:16           ` Peter Zijlstra
2026-06-24 11:21             ` Peter Zijlstra
2026-06-24 12:36     ` Mikhail Gavrilov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624092806.GX48970@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=alexander.deucher@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=llvm@lists.linux.dev \
    --cc=mikhail.v.gavrilov@gmail.com \
    --cc=oe-kbuild-all@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox