All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: "kernel test robot" <lkp@intel.com>,
	jpoimboe@kernel.org, llvm@lists.linux.dev,
	oe-kbuild-all@lists.linux.dev,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>
Subject: Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
Date: Wed, 24 Jun 2026 11:28:06 +0200	[thread overview]
Message-ID: <20260624092806.GX48970@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CABXGCsMFOo-X_JeZ6qiwN8L=_4wOcf9xeS_RcDAb8XfzA6+WBA@mail.gmail.com>

On Wed, Jun 24, 2026 at 01:57:46AM +0500, Mikhail Gavrilov wrote:
> [+Josh, +Peter: objtool question below]
> 
> On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@intel.com> wrote:
> > >> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
> 
> I looked into this. It is an objtool false positive on a computed goto,
> not a problem in the patch, and not specific to clang 22.1.3.

Urrgh, computed gotos :-(

> Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
> OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
> objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
> reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
> on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
> 220108), so it is not a 22.1.3-only codegen issue.
> 
> What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
> object survives):
> 
>   17f: 48 c7 c0 00 00 00 00   mov    $0x0,%rax
>           R_X86_64_32S  .text.amdgpu_vm_handle_fault+0x196
>   186: ff e0                  jmp    *%rax
> 
> %rax is loaded via an R_X86_64_32S relocation with the address of label
> .text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
> back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
> This is the drm_exec_retry_on_contention() computed goto
> (goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
> -> +0x188 for the first loop. Both targets are inside the function; this
> is not a tail call into another function. (svm_range_restore_pages() is
> the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
> 
> So clang materialized the label address as mov $imm(reloc); jmp *%rax
> instead of folding it into a direct jmp to the label.

So, if my heat-addled brain isn't completely gone, then this all boils
down to something like:

  drm_exec_init(&exec);
label:
  for (;drm_exec_cleanup(&exec);) {
     ...
     if (unlikely(drm_exec_is_contended(&exec)))
       goto label;
     ...
  }
  ...
  drm_exec_fini(&exec);

Except, you've laundered that label through a computed goto to allow
multiple such constructs in a single function -- because can't have
multiple identical labels etc.

And then clang, can't untangle the web and makes a mess of it. Which is
really rather unfortunate, because indirect calls are yuck -- also
retpolines and cfi and all that jazz.

I do think this is very much a compiler issue, clang should never emit
an indirect call for this. Doing that is just aweful.

Also, note that if anybody were ever to use a guard() inside this
drm_exec_until_all_locked() construct, there is no guarantee the
computed goto will actually trip the __cleanup(). Computed goto's and
scope do not play well together. And the moment clang emits an indirect
call, it means it lost track of things and things *will* go sideways.

And the only reason you want that label, is because nested loops,
because without that, a simple 'continue' would do just fine.

That is, I think this drm_exec_until_all_locked() thing has some
fundamental problems the moment clang fails to optimize it away.
Correctness really depends on the compiler not actually ever doing a
computed goto.

> For the indirect
> jmp objtool looks for a jump table in .rodata, finds none (this is a
> single relocated label, not an indexed table), and falls back to
> treating it as an indirect sibling call. The frame is already set up
> (push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
> modified stack frame". Runtime is fine, the jmp lands on the intended
> in-function label; this is purely an objtool classification issue.
> KASAN probably
> tips the balance: the function is inflated with __asan_* checks
> and shadow tests, and on that body clang keeps the label in a register
> rather than folding it.

Right.

> The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
> are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
> just the first to put such a loop into amdgpu_vm_handle_fault().

It is also the first to cause clang to loose its mind and emit an
indirect call. Which as I've explained above, really is a problem.

> objtool question: should objtool resolve an indirect jmp whose target
> register is loaded from a relocation pointing at a .text label inside
> the same function, and treat it as an intra-function jump rather than a
> sibling call? That would cover the labels-as-values / computed-goto
> pattern that this and any future drm_exec user under LTO+KASAN will hit.
> 
> I am not respinning the series for this, the page-fault conversion is a
> pure refactor with no manual stack work. Happy to test an objtool fix,
> or to help with a drm_exec.h workaround if that is preferred over an
> objtool change.

While I do think we could teach objtool about this, I also think it
should still emit a warning, because as stated, this pattern is very
very suspect and likely broken :-(



  reply	other threads:[~2026-06-24  9:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 15:15 [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame kernel test robot
2026-06-23 20:57 ` Mikhail Gavrilov
2026-06-24  9:28   ` Peter Zijlstra [this message]
2026-06-24  9:48     ` Christian König
2026-06-24 10:56       ` Peter Zijlstra
2026-06-24 11:04         ` Christian König
2026-06-24 11:16           ` Peter Zijlstra
2026-06-24 11:21             ` Peter Zijlstra
2026-06-24 12:36     ` Mikhail Gavrilov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624092806.GX48970@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=alexander.deucher@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=llvm@lists.linux.dev \
    --cc=mikhail.v.gavrilov@gmail.com \
    --cc=oe-kbuild-all@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.