* [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
@ 2026-06-23 15:15 kernel test robot
2026-06-23 20:57 ` Mikhail Gavrilov
0 siblings, 1 reply; 9+ messages in thread
From: kernel test robot @ 2026-06-23 15:15 UTC (permalink / raw)
To: Mikhail Gavrilov; +Cc: llvm, oe-kbuild-all, Alex Deucher, Christian König
Hi Mikhail,
FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head: 948efecf22e49aa4bf55bb73ec79a0ddcfd38571
commit: 14682de8ad377bf13ea66e47c26dcfea0b19a21d [14191/14955] drm/amdgpu: convert amdgpu_vm_lock_by_pasid() to drm_exec
config: x86_64-randconfig-006 (https://download.01.org/0day-ci/archive/20260623/202606232356.gwHMAJAW-lkp@intel.com/config)
compiler: clang version 22.1.3 (https://github.com/llvm/llvm-project e9846648fd6183ee6d8cbdb4502213fcf902a211)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260623/202606232356.gwHMAJAW-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606232356.gwHMAJAW-lkp@intel.com/
All errors (new ones prefixed by >>):
>> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-23 15:15 [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame kernel test robot
@ 2026-06-23 20:57 ` Mikhail Gavrilov
2026-06-24 9:28 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Mikhail Gavrilov @ 2026-06-23 20:57 UTC (permalink / raw)
To: kernel test robot, jpoimboe, Peter Zijlstra
Cc: llvm, oe-kbuild-all, Alex Deucher, Christian König,
Linux List Kernel Mailing
[+Josh, +Peter: objtool question below]
On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@intel.com> wrote:
> >> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
I looked into this. It is an objtool false positive on a computed goto,
not a problem in the patch, and not specific to clang 22.1.3.
Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
220108), so it is not a 22.1.3-only codegen issue.
What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
object survives):
17f: 48 c7 c0 00 00 00 00 mov $0x0,%rax
R_X86_64_32S .text.amdgpu_vm_handle_fault+0x196
186: ff e0 jmp *%rax
%rax is loaded via an R_X86_64_32S relocation with the address of label
.text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
This is the drm_exec_retry_on_contention() computed goto
(goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
-> +0x188 for the first loop. Both targets are inside the function; this
is not a tail call into another function. (svm_range_restore_pages() is
the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
So clang materialized the label address as mov $imm(reloc); jmp *%rax
instead of folding it into a direct jmp to the label. For the indirect
jmp objtool looks for a jump table in .rodata, finds none (this is a
single relocated label, not an indexed table), and falls back to
treating it as an indirect sibling call. The frame is already set up
(push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
modified stack frame". Runtime is fine, the jmp lands on the intended
in-function label; this is purely an objtool classification issue.
KASAN probably
tips the balance: the function is inflated with __asan_* checks
and shadow tests, and on that body clang keeps the label in a register
rather than folding it.
The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
just the first to put such a loop into amdgpu_vm_handle_fault().
objtool question: should objtool resolve an indirect jmp whose target
register is loaded from a relocation pointing at a .text label inside
the same function, and treat it as an intra-function jump rather than a
sibling call? That would cover the labels-as-values / computed-goto
pattern that this and any future drm_exec user under LTO+KASAN will hit.
I am not respinning the series for this, the page-fault conversion is a
pure refactor with no manual stack work. Happy to test an objtool fix,
or to help with a drm_exec.h workaround if that is preferred over an
objtool change.
config: https://download.01.org/0day-ci/archive/20260623/202606232356.gwHMAJAW-lkp@intel.com/config
compiler: clang 22.1.3 (robot report) and clang 22.1.8 (Fedora
22.1.8-4.fc45, my reproduction)
--
Best Regards,
Mikhail Gavrilov.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-23 20:57 ` Mikhail Gavrilov
@ 2026-06-24 9:28 ` Peter Zijlstra
2026-06-24 9:48 ` Christian König
2026-06-24 12:36 ` Mikhail Gavrilov
0 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-24 9:28 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: kernel test robot, jpoimboe, llvm, oe-kbuild-all, Alex Deucher,
Christian König, Linux List Kernel Mailing
On Wed, Jun 24, 2026 at 01:57:46AM +0500, Mikhail Gavrilov wrote:
> [+Josh, +Peter: objtool question below]
>
> On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@intel.com> wrote:
> > >> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
>
> I looked into this. It is an objtool false positive on a computed goto,
> not a problem in the patch, and not specific to clang 22.1.3.
Urrgh, computed gotos :-(
> Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
> OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
> objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
> reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
> on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
> 220108), so it is not a 22.1.3-only codegen issue.
>
> What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
> object survives):
>
> 17f: 48 c7 c0 00 00 00 00 mov $0x0,%rax
> R_X86_64_32S .text.amdgpu_vm_handle_fault+0x196
> 186: ff e0 jmp *%rax
>
> %rax is loaded via an R_X86_64_32S relocation with the address of label
> .text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
> back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
> This is the drm_exec_retry_on_contention() computed goto
> (goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
> -> +0x188 for the first loop. Both targets are inside the function; this
> is not a tail call into another function. (svm_range_restore_pages() is
> the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
>
> So clang materialized the label address as mov $imm(reloc); jmp *%rax
> instead of folding it into a direct jmp to the label.
So, if my heat-addled brain isn't completely gone, then this all boils
down to something like:
drm_exec_init(&exec);
label:
for (;drm_exec_cleanup(&exec);) {
...
if (unlikely(drm_exec_is_contended(&exec)))
goto label;
...
}
...
drm_exec_fini(&exec);
Except, you've laundered that label through a computed goto to allow
multiple such constructs in a single function -- because can't have
multiple identical labels etc.
And then clang, can't untangle the web and makes a mess of it. Which is
really rather unfortunate, because indirect calls are yuck -- also
retpolines and cfi and all that jazz.
I do think this is very much a compiler issue, clang should never emit
an indirect call for this. Doing that is just aweful.
Also, note that if anybody were ever to use a guard() inside this
drm_exec_until_all_locked() construct, there is no guarantee the
computed goto will actually trip the __cleanup(). Computed goto's and
scope do not play well together. And the moment clang emits an indirect
call, it means it lost track of things and things *will* go sideways.
And the only reason you want that label, is because nested loops,
because without that, a simple 'continue' would do just fine.
That is, I think this drm_exec_until_all_locked() thing has some
fundamental problems the moment clang fails to optimize it away.
Correctness really depends on the compiler not actually ever doing a
computed goto.
> For the indirect
> jmp objtool looks for a jump table in .rodata, finds none (this is a
> single relocated label, not an indexed table), and falls back to
> treating it as an indirect sibling call. The frame is already set up
> (push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
> modified stack frame". Runtime is fine, the jmp lands on the intended
> in-function label; this is purely an objtool classification issue.
> KASAN probably
> tips the balance: the function is inflated with __asan_* checks
> and shadow tests, and on that body clang keeps the label in a register
> rather than folding it.
Right.
> The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
> are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
> just the first to put such a loop into amdgpu_vm_handle_fault().
It is also the first to cause clang to loose its mind and emit an
indirect call. Which as I've explained above, really is a problem.
> objtool question: should objtool resolve an indirect jmp whose target
> register is loaded from a relocation pointing at a .text label inside
> the same function, and treat it as an intra-function jump rather than a
> sibling call? That would cover the labels-as-values / computed-goto
> pattern that this and any future drm_exec user under LTO+KASAN will hit.
>
> I am not respinning the series for this, the page-fault conversion is a
> pure refactor with no manual stack work. Happy to test an objtool fix,
> or to help with a drm_exec.h workaround if that is preferred over an
> objtool change.
While I do think we could teach objtool about this, I also think it
should still emit a warning, because as stated, this pattern is very
very suspect and likely broken :-(
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-24 9:28 ` Peter Zijlstra
@ 2026-06-24 9:48 ` Christian König
2026-06-24 10:56 ` Peter Zijlstra
2026-06-24 12:36 ` Mikhail Gavrilov
1 sibling, 1 reply; 9+ messages in thread
From: Christian König @ 2026-06-24 9:48 UTC (permalink / raw)
To: Peter Zijlstra, Mikhail Gavrilov
Cc: kernel test robot, jpoimboe, llvm, oe-kbuild-all, Alex Deucher,
Linux List Kernel Mailing
On 6/24/26 11:28, Peter Zijlstra wrote:
> On Wed, Jun 24, 2026 at 01:57:46AM +0500, Mikhail Gavrilov wrote:
>> [+Josh, +Peter: objtool question below]
>>
>> On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@intel.com> wrote:
>>>>> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
>>
>> I looked into this. It is an objtool false positive on a computed goto,
>> not a problem in the patch, and not specific to clang 22.1.3.
>
> Urrgh, computed gotos :-(
>
>> Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
>> OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
>> objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
>> reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
>> on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
>> 220108), so it is not a 22.1.3-only codegen issue.
>>
>> What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
>> object survives):
>>
>> 17f: 48 c7 c0 00 00 00 00 mov $0x0,%rax
>> R_X86_64_32S .text.amdgpu_vm_handle_fault+0x196
>> 186: ff e0 jmp *%rax
>>
>> %rax is loaded via an R_X86_64_32S relocation with the address of label
>> .text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
>> back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
>> This is the drm_exec_retry_on_contention() computed goto
>> (goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
>> -> +0x188 for the first loop. Both targets are inside the function; this
>> is not a tail call into another function. (svm_range_restore_pages() is
>> the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
>>
>> So clang materialized the label address as mov $imm(reloc); jmp *%rax
>> instead of folding it into a direct jmp to the label.
>
> So, if my heat-addled brain isn't completely gone, then this all boils
> down to something like:
>
> drm_exec_init(&exec);
> label:
> for (;drm_exec_cleanup(&exec);) {
> ...
> if (unlikely(drm_exec_is_contended(&exec)))
> goto label;
> ...
> }
> ...
> drm_exec_fini(&exec);
>
> Except, you've laundered that label through a computed goto to allow
> multiple such constructs in a single function -- because can't have
> multiple identical labels etc.
Just FYI: That's actually not the reason for this, the label name is unique now.
The computed goto is used to limit the scope you can use the retry macros because we initially had problems with people using that outside of the loop.
>
> And then clang, can't untangle the web and makes a mess of it. Which is
> really rather unfortunate, because indirect calls are yuck -- also
> retpolines and cfi and all that jazz.
>
> I do think this is very much a compiler issue, clang should never emit
> an indirect call for this. Doing that is just aweful.
>
> Also, note that if anybody were ever to use a guard() inside this
> drm_exec_until_all_locked() construct, there is no guarantee the
> computed goto will actually trip the __cleanup(). Computed goto's and
> scope do not play well together. And the moment clang emits an indirect
> call, it means it lost track of things and things *will* go sideways.
Good to know.
>
> And the only reason you want that label, is because nested loops,
> because without that, a simple 'continue' would do just fine.
Exactly that, yes. The nested loops are unfortunately a very common use case here.
>
> That is, I think this drm_exec_until_all_locked() thing has some
> fundamental problems the moment clang fails to optimize it away.
> Correctness really depends on the compiler not actually ever doing a
> computed goto.
Mhm, that's not good at all but I also of hand don't see a way to avoid the computed goto.
Thanks for pointing out what's wrong here, I was already scratching my head about that quite a bit.
Regards,
Christian.
>
>> For the indirect
>> jmp objtool looks for a jump table in .rodata, finds none (this is a
>> single relocated label, not an indexed table), and falls back to
>> treating it as an indirect sibling call. The frame is already set up
>> (push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
>> modified stack frame". Runtime is fine, the jmp lands on the intended
>> in-function label; this is purely an objtool classification issue.
>> KASAN probably
>> tips the balance: the function is inflated with __asan_* checks
>> and shadow tests, and on that body clang keeps the label in a register
>> rather than folding it.
>
> Right.
>
>> The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
>> are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
>> just the first to put such a loop into amdgpu_vm_handle_fault().
>
> It is also the first to cause clang to loose its mind and emit an
> indirect call. Which as I've explained above, really is a problem.
>
>> objtool question: should objtool resolve an indirect jmp whose target
>> register is loaded from a relocation pointing at a .text label inside
>> the same function, and treat it as an intra-function jump rather than a
>> sibling call? That would cover the labels-as-values / computed-goto
>> pattern that this and any future drm_exec user under LTO+KASAN will hit.
>>
>> I am not respinning the series for this, the page-fault conversion is a
>> pure refactor with no manual stack work. Happy to test an objtool fix,
>> or to help with a drm_exec.h workaround if that is preferred over an
>> objtool change.
>
> While I do think we could teach objtool about this, I also think it
> should still emit a warning, because as stated, this pattern is very
> very suspect and likely broken :-(
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-24 9:48 ` Christian König
@ 2026-06-24 10:56 ` Peter Zijlstra
2026-06-24 11:04 ` Christian König
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-24 10:56 UTC (permalink / raw)
To: Christian König
Cc: Mikhail Gavrilov, kernel test robot, jpoimboe, llvm,
oe-kbuild-all, Alex Deucher, Linux List Kernel Mailing
On Wed, Jun 24, 2026 at 11:48:09AM +0200, Christian König wrote:
> > So, if my heat-addled brain isn't completely gone, then this all boils
> > down to something like:
> >
> > drm_exec_init(&exec);
> > label:
> > for (;drm_exec_cleanup(&exec);) {
> > ...
> > if (unlikely(drm_exec_is_contended(&exec)))
> > goto label;
> > ...
> > }
> > ...
> > drm_exec_fini(&exec);
> >
> > Except, you've laundered that label through a computed goto to allow
> > multiple such constructs in a single function -- because can't have
> > multiple identical labels etc.
>
> Just FYI: That's actually not the reason for this, the label name is unique now.
Yeah, I know, but I thought that the only way to transfer the name to
the drm_exec_retry() sites was to create that computed-goto alias.
> The computed goto is used to limit the scope you can use the retry
> macros because we initially had problems with people using that
> outside of the loop.
Ah, also a very good reason.
> Exactly that, yes. The nested loops are unfortunately a very common use case here.
Yep, saw that.
> > That is, I think this drm_exec_until_all_locked() thing has some
> > fundamental problems the moment clang fails to optimize it away.
> > Correctness really depends on the compiler not actually ever doing a
> > computed goto.
>
> Mhm, that's not good at all but I also of hand don't see a way to
> avoid the computed goto.
Best I can come up with is something like this. Not exactly nice, but
should be fairly trivial to do.
---
diff --git a/drivers/gpu/drm/tests/drm_exec_test.c b/drivers/gpu/drm/tests/drm_exec_test.c
index 2fc47f3b463b..7d69c06b2dd3 100644
--- a/drivers/gpu/drm/tests/drm_exec_test.c
+++ b/drivers/gpu/drm/tests/drm_exec_test.c
@@ -60,6 +60,7 @@ static void test_lock(struct kunit *test)
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
+drm_exec_label:
ret = drm_exec_lock_obj(&exec, &gobj);
drm_exec_retry_on_contention(&exec);
KUNIT_EXPECT_EQ(test, ret, 0);
@@ -80,6 +81,7 @@ static void test_lock_unlock(struct kunit *test)
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
+drm_exec_label:
ret = drm_exec_lock_obj(&exec, &gobj);
drm_exec_retry_on_contention(&exec);
KUNIT_EXPECT_EQ(test, ret, 0);
@@ -107,6 +109,7 @@ static void test_duplicates(struct kunit *test)
drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
drm_exec_until_all_locked(&exec) {
+drm_exec_label:
ret = drm_exec_lock_obj(&exec, &gobj);
drm_exec_retry_on_contention(&exec);
KUNIT_EXPECT_EQ(test, ret, 0);
@@ -134,6 +137,7 @@ static void test_prepare(struct kunit *test)
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
+drm_exec_label:
ret = drm_exec_prepare_obj(&exec, &gobj, 1);
drm_exec_retry_on_contention(&exec);
KUNIT_EXPECT_EQ(test, ret, 0);
@@ -166,9 +170,10 @@ static void test_prepare_array(struct kunit *test)
drm_gem_private_object_init(priv->drm, gobj2, PAGE_SIZE);
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
- drm_exec_until_all_locked(&exec)
- ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array),
- 1);
+ drm_exec_until_all_locked(&exec) {
+drm_exec_label:
+ ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array), 1);
+ }
KUNIT_EXPECT_EQ(test, ret, 0);
drm_exec_fini(&exec);
@@ -181,15 +186,15 @@ static void test_multiple_loops(struct kunit *test)
struct drm_exec exec;
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
- drm_exec_until_all_locked(&exec)
- {
+ drm_exec_until_all_locked(&exec) {
+drm_exec_label:
break;
}
drm_exec_fini(&exec);
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
- drm_exec_until_all_locked(&exec)
- {
+ drm_exec_until_all_locked(&exec) {
+drm_exec_label:
break;
}
drm_exec_fini(&exec);
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index 8725ba92ff91..f2bfa6aa7b7b 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -101,17 +101,6 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
#define drm_exec_for_each_locked_object_reverse(exec, obj) \
__drm_exec_for_each_locked_object_reverse(exec, obj, __UNIQUE_ID(drm_exec))
-/*
- * Helper to drm_exec_until_all_locked(). Don't use directly.
- *
- * Since labels can't be defined local to the loop's body we use a jump pointer
- * to make sure that the retry is only used from within the loop's body.
- */
-#define __drm_exec_until_all_locked(exec, _label) \
-_label: \
- for (void *const __maybe_unused __drm_exec_retry_ptr = &&_label; \
- drm_exec_cleanup(exec);)
-
/**
* drm_exec_until_all_locked - loop until all GEM objects are locked
* @exec: drm_exec object
@@ -121,7 +110,18 @@ _label: \
* guaranteed that no GEM object is locked.
*/
#define drm_exec_until_all_locked(exec) \
- __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
+ while (drm_exec_cleanup(exec))
+
+#define drm_exec_label \
+ __label__ __drm_exec_retry; \
+ __label__ __drm_foo; \
+ if (0) { \
+ __drm_exec_retry: __maybe_unused; \
+ continue; \
+ goto __drm_foo; \
+ } \
+ __drm_foo
+
/**
* drm_exec_retry_on_contention - restart the loop to grap all locks
@@ -133,7 +133,7 @@ _label: \
#define drm_exec_retry_on_contention(exec) \
do { \
if (unlikely(drm_exec_is_contended(exec))) \
- goto *__drm_exec_retry_ptr; \
+ goto __drm_exec_retry; \
} while (0)
/**
@@ -158,7 +158,7 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
#define drm_exec_retry(_exec) \
do { \
WARN_ON((_exec)->contended != DRM_EXEC_DUMMY); \
- goto *__drm_exec_retry_ptr; \
+ goto __drm_exec_retry; \
} while (0)
/**
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-24 10:56 ` Peter Zijlstra
@ 2026-06-24 11:04 ` Christian König
2026-06-24 11:16 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-06-24 11:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mikhail Gavrilov, kernel test robot, jpoimboe, llvm,
oe-kbuild-all, Alex Deucher, Linux List Kernel Mailing
On 6/24/26 12:56, Peter Zijlstra wrote:
> On Wed, Jun 24, 2026 at 11:48:09AM +0200, Christian König wrote:
>
>>> So, if my heat-addled brain isn't completely gone, then this all boils
>>> down to something like:
>>>
>>> drm_exec_init(&exec);
>>> label:
>>> for (;drm_exec_cleanup(&exec);) {
>>> ...
>>> if (unlikely(drm_exec_is_contended(&exec)))
>>> goto label;
>>> ...
>>> }
>>> ...
>>> drm_exec_fini(&exec);
>>>
>>> Except, you've laundered that label through a computed goto to allow
>>> multiple such constructs in a single function -- because can't have
>>> multiple identical labels etc.
>>
>> Just FYI: That's actually not the reason for this, the label name is unique now.
>
> Yeah, I know, but I thought that the only way to transfer the name to
> the drm_exec_retry() sites was to create that computed-goto alias.
Yes, that was also a problem.
>
>> The computed goto is used to limit the scope you can use the retry
>> macros because we initially had problems with people using that
>> outside of the loop.
>
> Ah, also a very good reason.
>
>> Exactly that, yes. The nested loops are unfortunately a very common use case here.
>
> Yep, saw that.
>
>>> That is, I think this drm_exec_until_all_locked() thing has some
>>> fundamental problems the moment clang fails to optimize it away.
>>> Correctness really depends on the compiler not actually ever doing a
>>> computed goto.
>>
>> Mhm, that's not good at all but I also of hand don't see a way to
>> avoid the computed goto.
>
> Best I can come up with is something like this. Not exactly nice, but
> should be fairly trivial to do.
Yeah, definitely not nice. The macro was created exactly because people messed this approach up more than once.
I could live with some other hack/workaround, but clearly not with changing all users.
Maybe we can goto the fixed named label but have an extra variable to check that we are in the right scope or something like that.
Regards,
Christian.
>
> ---
> diff --git a/drivers/gpu/drm/tests/drm_exec_test.c b/drivers/gpu/drm/tests/drm_exec_test.c
> index 2fc47f3b463b..7d69c06b2dd3 100644
> --- a/drivers/gpu/drm/tests/drm_exec_test.c
> +++ b/drivers/gpu/drm/tests/drm_exec_test.c
> @@ -60,6 +60,7 @@ static void test_lock(struct kunit *test)
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> ret = drm_exec_lock_obj(&exec, &gobj);
> drm_exec_retry_on_contention(&exec);
> KUNIT_EXPECT_EQ(test, ret, 0);
> @@ -80,6 +81,7 @@ static void test_lock_unlock(struct kunit *test)
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> ret = drm_exec_lock_obj(&exec, &gobj);
> drm_exec_retry_on_contention(&exec);
> KUNIT_EXPECT_EQ(test, ret, 0);
> @@ -107,6 +109,7 @@ static void test_duplicates(struct kunit *test)
>
> drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
> drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> ret = drm_exec_lock_obj(&exec, &gobj);
> drm_exec_retry_on_contention(&exec);
> KUNIT_EXPECT_EQ(test, ret, 0);
> @@ -134,6 +137,7 @@ static void test_prepare(struct kunit *test)
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> ret = drm_exec_prepare_obj(&exec, &gobj, 1);
> drm_exec_retry_on_contention(&exec);
> KUNIT_EXPECT_EQ(test, ret, 0);
> @@ -166,9 +170,10 @@ static void test_prepare_array(struct kunit *test)
> drm_gem_private_object_init(priv->drm, gobj2, PAGE_SIZE);
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> - drm_exec_until_all_locked(&exec)
> - ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array),
> - 1);
> + drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> + ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array), 1);
> + }
> KUNIT_EXPECT_EQ(test, ret, 0);
> drm_exec_fini(&exec);
>
> @@ -181,15 +186,15 @@ static void test_multiple_loops(struct kunit *test)
> struct drm_exec exec;
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> - drm_exec_until_all_locked(&exec)
> - {
> + drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> break;
> }
> drm_exec_fini(&exec);
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> - drm_exec_until_all_locked(&exec)
> - {
> + drm_exec_until_all_locked(&exec) {
> +drm_exec_label:
> break;
> }
> drm_exec_fini(&exec);
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> index 8725ba92ff91..f2bfa6aa7b7b 100644
> --- a/include/drm/drm_exec.h
> +++ b/include/drm/drm_exec.h
> @@ -101,17 +101,6 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
> #define drm_exec_for_each_locked_object_reverse(exec, obj) \
> __drm_exec_for_each_locked_object_reverse(exec, obj, __UNIQUE_ID(drm_exec))
>
> -/*
> - * Helper to drm_exec_until_all_locked(). Don't use directly.
> - *
> - * Since labels can't be defined local to the loop's body we use a jump pointer
> - * to make sure that the retry is only used from within the loop's body.
> - */
> -#define __drm_exec_until_all_locked(exec, _label) \
> -_label: \
> - for (void *const __maybe_unused __drm_exec_retry_ptr = &&_label; \
> - drm_exec_cleanup(exec);)
> -
> /**
> * drm_exec_until_all_locked - loop until all GEM objects are locked
> * @exec: drm_exec object
> @@ -121,7 +110,18 @@ _label: \
> * guaranteed that no GEM object is locked.
> */
> #define drm_exec_until_all_locked(exec) \
> - __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
> + while (drm_exec_cleanup(exec))
> +
> +#define drm_exec_label \
> + __label__ __drm_exec_retry; \
> + __label__ __drm_foo; \
> + if (0) { \
> + __drm_exec_retry: __maybe_unused; \
> + continue; \
> + goto __drm_foo; \
> + } \
> + __drm_foo
> +
>
> /**
> * drm_exec_retry_on_contention - restart the loop to grap all locks
> @@ -133,7 +133,7 @@ _label: \
> #define drm_exec_retry_on_contention(exec) \
> do { \
> if (unlikely(drm_exec_is_contended(exec))) \
> - goto *__drm_exec_retry_ptr; \
> + goto __drm_exec_retry; \
> } while (0)
>
> /**
> @@ -158,7 +158,7 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> #define drm_exec_retry(_exec) \
> do { \
> WARN_ON((_exec)->contended != DRM_EXEC_DUMMY); \
> - goto *__drm_exec_retry_ptr; \
> + goto __drm_exec_retry; \
> } while (0)
>
> /**
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-24 11:04 ` Christian König
@ 2026-06-24 11:16 ` Peter Zijlstra
2026-06-24 11:21 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-24 11:16 UTC (permalink / raw)
To: Christian König
Cc: Mikhail Gavrilov, kernel test robot, jpoimboe, llvm,
oe-kbuild-all, Alex Deucher, Linux List Kernel Mailing
On Wed, Jun 24, 2026 at 01:04:36PM +0200, Christian König wrote:
> > Best I can come up with is something like this. Not exactly nice, but
> > should be fairly trivial to do.
>
> Yeah, definitely not nice. The macro was created exactly because
> people messed this approach up more than once.
>
> I could live with some other hack/workaround, but clearly not with
> changing all users.
>
> Maybe we can goto the fixed named label but have an extra variable to
> check that we are in the right scope or something like that.
The fixed name will break test_multiple_loops() and any other user that
actually does that (couldn't find one in a hurry).
That would look like so; by making the drm_exec_retry*() helpers depend
on __drm_exec_loop, it can only be used inside
drm_exec_until_all_locked().
---
diff --git a/drivers/gpu/drm/tests/drm_exec_test.c b/drivers/gpu/drm/tests/drm_exec_test.c
index 2fc47f3b463b..fbb887930e6d 100644
--- a/drivers/gpu/drm/tests/drm_exec_test.c
+++ b/drivers/gpu/drm/tests/drm_exec_test.c
@@ -166,9 +166,9 @@ static void test_prepare_array(struct kunit *test)
drm_gem_private_object_init(priv->drm, gobj2, PAGE_SIZE);
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
- drm_exec_until_all_locked(&exec)
- ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array),
- 1);
+ drm_exec_until_all_locked(&exec) {
+ ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array), 1);
+ }
KUNIT_EXPECT_EQ(test, ret, 0);
drm_exec_fini(&exec);
@@ -176,26 +176,6 @@ static void test_prepare_array(struct kunit *test)
drm_gem_private_object_fini(gobj2);
}
-static void test_multiple_loops(struct kunit *test)
-{
- struct drm_exec exec;
-
- drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
- drm_exec_until_all_locked(&exec)
- {
- break;
- }
- drm_exec_fini(&exec);
-
- drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
- drm_exec_until_all_locked(&exec)
- {
- break;
- }
- drm_exec_fini(&exec);
- KUNIT_SUCCEED(test);
-}
-
static struct kunit_case drm_exec_tests[] = {
KUNIT_CASE(sanitycheck),
KUNIT_CASE(test_lock),
@@ -203,7 +183,6 @@ static struct kunit_case drm_exec_tests[] = {
KUNIT_CASE(test_duplicates),
KUNIT_CASE(test_prepare),
KUNIT_CASE(test_prepare_array),
- KUNIT_CASE(test_multiple_loops),
{}
};
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index 8725ba92ff91..4a6b6aba4f15 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -101,17 +101,6 @@ drm_exec_obj(struct drm_exec *exec, unsigned long index)
#define drm_exec_for_each_locked_object_reverse(exec, obj) \
__drm_exec_for_each_locked_object_reverse(exec, obj, __UNIQUE_ID(drm_exec))
-/*
- * Helper to drm_exec_until_all_locked(). Don't use directly.
- *
- * Since labels can't be defined local to the loop's body we use a jump pointer
- * to make sure that the retry is only used from within the loop's body.
- */
-#define __drm_exec_until_all_locked(exec, _label) \
-_label: \
- for (void *const __maybe_unused __drm_exec_retry_ptr = &&_label; \
- drm_exec_cleanup(exec);)
-
/**
* drm_exec_until_all_locked - loop until all GEM objects are locked
* @exec: drm_exec object
@@ -121,7 +110,8 @@ _label: \
* guaranteed that no GEM object is locked.
*/
#define drm_exec_until_all_locked(exec) \
- __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
+ __drm_exec_retry: __maybe_unused; \
+ for (bool __drm_exec_loop __maybe_unused = true; drm_exec_cleanup(exec);)
/**
* drm_exec_retry_on_contention - restart the loop to grap all locks
@@ -132,8 +122,9 @@ _label: \
*/
#define drm_exec_retry_on_contention(exec) \
do { \
- if (unlikely(drm_exec_is_contended(exec))) \
- goto *__drm_exec_retry_ptr; \
+ if (unlikely(__drm_exec_loop && \
+ drm_exec_is_contended(exec))) \
+ goto __drm_exec_retry; \
} while (0)
/**
@@ -157,8 +148,9 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
*/
#define drm_exec_retry(_exec) \
do { \
- WARN_ON((_exec)->contended != DRM_EXEC_DUMMY); \
- goto *__drm_exec_retry_ptr; \
+ WARN_ON(__drm_exec_loop && \
+ (_exec)->contended != DRM_EXEC_DUMMY); \
+ goto __drm_exec_retry; \
} while (0)
/**
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-24 11:16 ` Peter Zijlstra
@ 2026-06-24 11:21 ` Peter Zijlstra
0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-24 11:21 UTC (permalink / raw)
To: Christian König
Cc: Mikhail Gavrilov, kernel test robot, jpoimboe, llvm,
oe-kbuild-all, Alex Deucher, Linux List Kernel Mailing
On Wed, Jun 24, 2026 at 01:16:11PM +0200, Peter Zijlstra wrote:
> @@ -121,7 +110,8 @@ _label: \
> * guaranteed that no GEM object is locked.
> */
> #define drm_exec_until_all_locked(exec) \
> - __drm_exec_until_all_locked(exec, __UNIQUE_ID(drm_exec))
> + __drm_exec_retry: __maybe_unused; \
> + for (bool __drm_exec_loop __maybe_unused = true; drm_exec_cleanup(exec);)
An alternative would be something like:
while (drm_exec_cleanup()) \
if (0) { \
__drm_exec_retry: __maybe_unused; \
continue; \
} else
But that still breaks the multi case.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
2026-06-24 9:28 ` Peter Zijlstra
2026-06-24 9:48 ` Christian König
@ 2026-06-24 12:36 ` Mikhail Gavrilov
1 sibling, 0 replies; 9+ messages in thread
From: Mikhail Gavrilov @ 2026-06-24 12:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kernel test robot, jpoimboe, llvm, oe-kbuild-all, Alex Deucher,
Christian König, Linux List Kernel Mailing
On Wed, Jun 24, 2026 at 2:28 PM Peter Zijlstra <peterz@infradead.org> wrote:
> I do think this is very much a compiler issue, clang should never emit
> an indirect call for this. Doing that is just aweful.
Filed upstream as https://github.com/llvm/llvm-project/issues/205544
with a standalone reproducer (the drm_exec computed-goto pattern reduced
to one .c file; plain -O2, no LTO and no KASAN needed). clang lowers the
single-target computed goto to jmp *%reg; gcc folds the same source to a
direct jmp. So the missed fold is tracked on the LLVM side regardless of
the drm_exec.h rework happening here.
I have the KASAN + ThinLTO + clang config that triggered the original
objtool warning, so I can give a Tested-by once the drm_exec.h change
settles.
--
Thanks,
Mikhail.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-24 12:37 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-23 15:15 [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame kernel test robot
2026-06-23 20:57 ` Mikhail Gavrilov
2026-06-24 9:28 ` Peter Zijlstra
2026-06-24 9:48 ` Christian König
2026-06-24 10:56 ` Peter Zijlstra
2026-06-24 11:04 ` Christian König
2026-06-24 11:16 ` Peter Zijlstra
2026-06-24 11:21 ` Peter Zijlstra
2026-06-24 12:36 ` Mikhail Gavrilov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox