* [PATCH v2] perf test: Make leafloop workload immune to compiler options
@ 2026-05-11 9:19 James Clark
2026-05-11 15:38 ` Ian Rogers
0 siblings, 1 reply; 3+ messages in thread
From: James Clark @ 2026-05-11 9:19 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter
Cc: linux-perf-users, linux-kernel, James Clark
Since the leafloop test program was moved into the main Perf binary as a
workload, it inherited the same compiler options as Perf. In this case
the -fstack-protector option broke the assumption that simple leaf
functions don't have a stack frame on Arm. This causes
test_arm_callgraph_fp.sh to pass even if the callchain isn't augmented
with the link register, making the test useless.
Fix it by rewriting the leaf function in assembly seeing as it's so
simple. Adding -fno-stack-protector would also work, but wouldn't be
robust against other future compiler option additions.
The local variables and 'a' variable were never needed so remove them to
simplify.
Assisted-by: GitHub-Copilot:GPT-5.5
Signed-off-by: James Clark <james.clark@linaro.org>
---
Changes in v2:
- Push and pop asm sections - (Sashiko)
- Add .size directive - (Sashiko)
- Add asm label for done and test with LTO enabled - (Sashiko)
- Link to v1: https://lore.kernel.org/r/20260508-james-perf-leafloop-stack-v1-1-637c260b2da8@linaro.org
---
tools/perf/tests/workloads/leafloop.c | 40 +++++++++++++++++++++++++++--------
1 file changed, 31 insertions(+), 9 deletions(-)
diff --git a/tools/perf/tests/workloads/leafloop.c b/tools/perf/tests/workloads/leafloop.c
index f7561767e32c..c20c75f7ba49 100644
--- a/tools/perf/tests/workloads/leafloop.c
+++ b/tools/perf/tests/workloads/leafloop.c
@@ -6,26 +6,48 @@
#include "../tests.h"
/* We want to check these symbols in perf script */
-noinline void leaf(volatile int b);
-noinline void parent(volatile int b);
+noinline void leaf(void);
+noinline void parent(void);
-static volatile int a;
-static volatile sig_atomic_t done;
+static volatile sig_atomic_t done asm("leafloop_done");
static void sighandler(int sig __maybe_unused)
{
done = 1;
}
-noinline void leaf(volatile int b)
+#if defined(__aarch64__)
+/*
+ * Write leaf() in assembly so it stays as a minimal leaf function with no
+ * stack frame and won't get silently broken in the future by any Perf wide
+ * compilation options like -fstack-protector-all.
+ */
+asm(
+ ".pushsection .text,\"ax\",%progbits\n"
+ ".global leaf\n"
+ ".type leaf, %function\n"
+ "leaf:\n"
+ " adrp x1, leafloop_done\n"
+ " ldr w2, [x1, #:lo12:leafloop_done]\n"
+ " cbz w2, leaf\n"
+ " ret\n"
+ ".size leaf, .-leaf\n"
+ ".popsection\n"
+);
+
+#else
+
+noinline void leaf(void)
{
while (!done)
- a += b;
+ ;
}
-noinline void parent(volatile int b)
+#endif
+
+noinline void parent(void)
{
- leaf(b);
+ leaf();
}
static int leafloop(int argc, const char **argv)
@@ -39,7 +61,7 @@ static int leafloop(int argc, const char **argv)
signal(SIGALRM, sighandler);
alarm(sec);
- parent(sec);
+ parent();
return 0;
}
---
base-commit: 8c8f2093614373ea8179b562320212a25cf937c0
change-id: 20260508-james-perf-leafloop-stack-c221600eddf2
Best regards,
--
James Clark <james.clark@linaro.org>
* Re: [PATCH v2] perf test: Make leafloop workload immune to compiler options
2026-05-11 9:19 [PATCH v2] perf test: Make leafloop workload immune to compiler options James Clark
@ 2026-05-11 15:38 ` Ian Rogers
2026-05-11 16:17 ` James Clark
0 siblings, 1 reply; 3+ messages in thread
From: Ian Rogers @ 2026-05-11 15:38 UTC
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, linux-perf-users, linux-kernel
On Mon, May 11, 2026 at 2:19 AM James Clark <james.clark@linaro.org> wrote:
>
> Since the leafloop test program was moved into the main Perf binary as a
> workload, it inherited the same compiler options as Perf. In this case
> the -fstack-protector option broke the assumption that simple leaf
> frames don't have a stack frame on Arm. This causes
> test_arm_callgraph_fp.sh to pass even if the stack isn't augmented with
> the link register, making the test useless.
>
> Fix it by rewriting the leaf function in assembly seeing as it's so
> simple. Adding -fno-stack-protector would also work, but wouldn't be
> robust against other future compiler option additions.
>
> The local variables and 'a' variable were never needed so remove them to
> simplify.
>
> Assisted-by: GitHub-Copilot:GPT-5.5
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> Changes in v2:
> - Push and pop asm sections - (Sashiko)
> - Add .size directive - (Sashiko)
> - Add asm label for done and test with LTO enabled - (Sashiko)
> - Link to v1: https://lore.kernel.org/r/20260508-james-perf-leafloop-stack-v1-1-637c260b2da8@linaro.org
> ---
> tools/perf/tests/workloads/leafloop.c | 40 +++++++++++++++++++++++++++--------
> 1 file changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/tests/workloads/leafloop.c b/tools/perf/tests/workloads/leafloop.c
> index f7561767e32c..c20c75f7ba49 100644
> --- a/tools/perf/tests/workloads/leafloop.c
> +++ b/tools/perf/tests/workloads/leafloop.c
> @@ -6,26 +6,48 @@
> #include "../tests.h"
>
> /* We want to check these symbols in perf script */
> -noinline void leaf(volatile int b);
> -noinline void parent(volatile int b);
> +noinline void leaf(void);
> +noinline void parent(void);
>
> -static volatile int a;
> -static volatile sig_atomic_t done;
> +static volatile sig_atomic_t done asm("leafloop_done");
>
> static void sighandler(int sig __maybe_unused)
> {
> done = 1;
> }
>
> -noinline void leaf(volatile int b)
> +#if defined(__aarch64__)
> +/*
> + * Write leaf() in assembly so it stays as a minimal leaf function with no
> + * stack frame and won't get silently broken in the future by any Perf wide
> + * compilation options like -fstack-protector-all.
> + */
> +asm(
> + ".pushsection .text,\"ax\",%progbits\n"
> + ".global leaf\n"
> + ".type leaf, %function\n"
> + "leaf:\n"
> + " adrp x1, leafloop_done\n"
> + " ldr w2, [x1, #:lo12:leafloop_done]\n"
> + " cbz w2, leaf\n"
> + " ret\n"
> + ".size leaf, .-leaf\n"
> + ".popsection\n"
> +);
On reading this I thought, why can't we just use an asm block in a
function, but I get it, you want specific function entry/exit to test
the leaf unwinding feature.
Reviewed-by: Ian Rogers <irogers@google.com>
Fwiw, looking at the test the command line is somewhat unusual:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_arm_callgraph_fp.sh?h=perf-tools-next#n32
```
perf record -o "$PERF_DATA" --call-graph fp -e cycles//u \
    --user-callchains -- $TEST_PROGRAM
```
cycles//u rather than cycles:u and why the extra --user-callchains?
Thanks,
Ian
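[Editorial sketch, not part of the original message.] For readers who want to try the workload outside the perf tree, the non-aarch64 fallback path of the patch can be approximated as a standalone C file. The `noinline` macro is defined locally here because the perf headers are not included, and `run_leafloop()` is a hypothetical helper standing in for the workload entry point; neither is taken from the patch as-is.

```c
#include <signal.h>
#include <unistd.h>

/* The perf tree provides `noinline`; defined here so the sketch stands alone. */
#define noinline __attribute__((noinline))

/* Kept as separate, non-inlined symbols so both show up in a callchain. */
noinline void leaf(void);
noinline void parent(void);

static volatile sig_atomic_t done;

static void sighandler(int sig)
{
	(void)sig;
	done = 1;
}

/* Spin until the signal handler sets `done`; the volatile read keeps the loop. */
noinline void leaf(void)
{
	while (!done)
		;
}

noinline void parent(void)
{
	leaf();
}

/* Hypothetical helper: spin in leaf() until SIGALRM fires after `sec` seconds. */
int run_leafloop(unsigned int sec)
{
	done = 0;
	signal(SIGALRM, sighandler);
	alarm(sec);
	parent();
	return done;
}
```

Calling `run_leafloop(1)` from a `main()` spins for about one second and then returns 1. Whether `leaf()` ends up with a stack frame in this C version depends on the compiler flags in use, which is the fragility the assembly version in the patch is meant to avoid.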
> +
> +#else
> +
> +noinline void leaf(void)
> {
> while (!done)
> - a += b;
> + ;
> }
>
> -noinline void parent(volatile int b)
> +#endif
> +
> +noinline void parent(void)
> {
> - leaf(b);
> + leaf();
> }
>
> static int leafloop(int argc, const char **argv)
> @@ -39,7 +61,7 @@ static int leafloop(int argc, const char **argv)
> signal(SIGALRM, sighandler);
> alarm(sec);
>
> - parent(sec);
> + parent();
> return 0;
> }
>
>
> ---
> base-commit: 8c8f2093614373ea8179b562320212a25cf937c0
> change-id: 20260508-james-perf-leafloop-stack-c221600eddf2
>
> Best regards,
> --
> James Clark <james.clark@linaro.org>
>
* Re: [PATCH v2] perf test: Make leafloop workload immune to compiler options
2026-05-11 15:38 ` Ian Rogers
@ 2026-05-11 16:17 ` James Clark
0 siblings, 0 replies; 3+ messages in thread
From: James Clark @ 2026-05-11 16:17 UTC
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, linux-perf-users, linux-kernel
On 11/05/2026 4:38 pm, Ian Rogers wrote:
> On Mon, May 11, 2026 at 2:19 AM James Clark <james.clark@linaro.org> wrote:
>>
>> Since the leafloop test program was moved into the main Perf binary as a
>> workload, it inherited the same compiler options as Perf. In this case
>> the -fstack-protector option broke the assumption that simple leaf
>> frames don't have a stack frame on Arm. This causes
>> test_arm_callgraph_fp.sh to pass even if the stack isn't augmented with
>> the link register, making the test useless.
>>
>> Fix it by rewriting the leaf function in assembly seeing as it's so
>> simple. Adding -fno-stack-protector would also work, but wouldn't be
>> robust against other future compiler option additions.
>>
>> The local variables and 'a' variable were never needed so remove them to
>> simplify.
>>
>> Assisted-by: GitHub-Copilot:GPT-5.5
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> Changes in v2:
>> - Push and pop asm sections - (Sashiko)
>> - Add .size directive - (Sashiko)
>> - Add asm label for done and test with LTO enabled - (Sashiko)
>> - Link to v1: https://lore.kernel.org/r/20260508-james-perf-leafloop-stack-v1-1-637c260b2da8@linaro.org
>> ---
>> tools/perf/tests/workloads/leafloop.c | 40 +++++++++++++++++++++++++++--------
>> 1 file changed, 31 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/perf/tests/workloads/leafloop.c b/tools/perf/tests/workloads/leafloop.c
>> index f7561767e32c..c20c75f7ba49 100644
>> --- a/tools/perf/tests/workloads/leafloop.c
>> +++ b/tools/perf/tests/workloads/leafloop.c
>> @@ -6,26 +6,48 @@
>> #include "../tests.h"
>>
>> /* We want to check these symbols in perf script */
>> -noinline void leaf(volatile int b);
>> -noinline void parent(volatile int b);
>> +noinline void leaf(void);
>> +noinline void parent(void);
>>
>> -static volatile int a;
>> -static volatile sig_atomic_t done;
>> +static volatile sig_atomic_t done asm("leafloop_done");
>>
>> static void sighandler(int sig __maybe_unused)
>> {
>> done = 1;
>> }
>>
>> -noinline void leaf(volatile int b)
>> +#if defined(__aarch64__)
>> +/*
>> + * Write leaf() in assembly so it stays as a minimal leaf function with no
>> + * stack frame and won't get silently broken in the future by any Perf wide
>> + * compilation options like -fstack-protector-all.
>> + */
>> +asm(
>> + ".pushsection .text,\"ax\",%progbits\n"
>> + ".global leaf\n"
>> + ".type leaf, %function\n"
>> + "leaf:\n"
>> + " adrp x1, leafloop_done\n"
>> + " ldr w2, [x1, #:lo12:leafloop_done]\n"
>> + " cbz w2, leaf\n"
>> + " ret\n"
>> + ".size leaf, .-leaf\n"
>> + ".popsection\n"
>> +);
>
> On reading this I thought, why can't we just use an asm block in a
> function, but I get it, you want specific function entry/exit to test
> the leaf unwinding feature.
>
> Reviewed-by: Ian Rogers <irogers@google.com>
>
> Fwiw, looking at the test the command line is somewhat unusual:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_arm_callgraph_fp.sh?h=perf-tools-next#n32
> ```
> perf record -o "$PERF_DATA" --call-graph fp -e cycles//u
> --user-callchains -- $TEST_PROGRAM
> ```
> cycles//u rather than cycles:u and why the extra --user-callchains?
>
> Thanks,
> Ian
>
--user-callchains looks like it was to make the grep in the test easier
so it didn't have to ignore the kernel part. But it might be redundant
now after a later change I made.
cycles//u has always been there so there's no explanation. I thought
that was a valid way to open an event? Is it weird because // is for
options in perf record and not stat?
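[Editorial sketch, not part of the original message.] The two knobs discussed above act at different levels, which may be easier to see in terms of `struct perf_event_attr`. The helper below is hypothetical and is not perf's own code. It assumes, based on my reading of perf's event parsing and evsel setup, that the `u` modifier sets `exclude_kernel` and `exclude_hv`, and that `--user-callchains` sets `exclude_callchain_kernel`.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper, not perf's own code: build roughly the attr that a
 * user-space-only cycles event with user-only callchains maps to.
 */
struct perf_event_attr user_cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;

	/* Assumed effect of the 'u' modifier: count only while in user space. */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* Assumed effect of --user-callchains: drop kernel frames from callchains. */
	attr.exclude_callchain_kernel = 1;

	return attr;
}
```

Under these assumptions, the first pair of bits restricts when the event counts, while the last bit only trims the recorded callchain. That is consistent with the suggestion above that `--user-callchains` may now be redundant for a user-only event.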
>> +
>> +#else
>> +
>> +noinline void leaf(void)
>> {
>> while (!done)
>> - a += b;
>> + ;
>> }
>>
>> -noinline void parent(volatile int b)
>> +#endif
>> +
>> +noinline void parent(void)
>> {
>> - leaf(b);
>> + leaf();
>> }
>>
>> static int leafloop(int argc, const char **argv)
>> @@ -39,7 +61,7 @@ static int leafloop(int argc, const char **argv)
>> signal(SIGALRM, sighandler);
>> alarm(sec);
>>
>> - parent(sec);
>> + parent();
>> return 0;
>> }
>>
>>
>> ---
>> base-commit: 8c8f2093614373ea8179b562320212a25cf937c0
>> change-id: 20260508-james-perf-leafloop-stack-c221600eddf2
>>
>> Best regards,
>> --
>> James Clark <james.clark@linaro.org>
>>