From: Steven Rostedt <rostedt@goodmis.org>
To: Indu Bhagat <indu.bhagat@oracle.com>
Cc: Jens Remus <jremus@linux.ibm.com>,
Sterling Augustine <saugustine@google.com>,
Pavel Labath <labath@google.com>,
Andrii Nakryiko <andrii@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Serhei Makarov <smakarov@redhat.com>,
Binutils <binutils@sourceware.org>,
"linux-toolchains@vger.kernel.org"
<linux-toolchains@vger.kernel.org>
Subject: Re: Unaligned access trade-offs for SFrame FRE layout
Date: Fri, 12 Sep 2025 15:18:55 -0400
Message-ID: <20250912151855.3af8c2ab@gandalf.local.home>
In-Reply-To: <b7b139c6-1963-4ffc-a872-518010a50563@oracle.com>
On Fri, 12 Sep 2025 10:34:42 -0700
Indu Bhagat <indu.bhagat@oracle.com> wrote:
> TL;DR: Thinking and experimenting a bit on the possible approaches for
> avoiding unaligned accesses in the SFrame FRE layout (in SFrame V3), I
> am not convinced that avoiding unaligned accesses for performance is
> worth it. IMO, forsaking compactness for avoiding unaligned accesses is
> not a good trade off for SFrame.
>
> Problem Statement
> On architectures such as x86_64, AArch64, and s390x, unaligned memory
> accesses are handled transparently by the hardware but incur a
> performance penalty. The objective of this analysis is to evaluate if
> these unaligned accesses can be eliminated from the SFrame FRE layout
> and if doing so provides a net performance benefit.
I guess the question is really, is it that big of a performance hit?
I know some others were worried about the performance, but we should look
at measurements too. Is it going to be a big enough issue in the stack
unwinding code to even notice?
>
> The central challenge is that any alternative must demonstrate a clear
> performance improvement while avoiding significant size overhead.
> Introducing "bloat" to the format to solve a potential performance issue
> is a poor trade-off.
Correct. I would like to see performance numbers before we invest too much
time in this.
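
A minimal sketch of how such a measurement could be set up, not taken from any SFrame reader: it sums 32-bit values read at a chosen starting offset and stride, so the same loop can be timed with start = 0 (aligned, assuming the buffer itself is aligned) and start = 1 (misaligned). The names sum_u32 and time_loads are hypothetical, and clock() is only a coarse timer; a real evaluation would time an actual unwinder on real SFrame sections.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Sum `count` 32-bit values read from buf + start, stepping `stride`
 * bytes between reads.  memcpy keeps the reads well-defined in C even
 * when the address is not 4-byte aligned. */
static uint64_t sum_u32(const unsigned char *buf, size_t start,
                        size_t stride, size_t count)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t v;

        memcpy(&v, buf + start + i * stride, sizeof(v));
        sum += v;
    }
    return sum;
}

/* Hypothetical helper: CPU seconds spent on `reps` passes over the
 * buffer.  The checksum is accumulated into *sink so the loads have a
 * visible effect. */
static double time_loads(const unsigned char *buf, size_t start,
                         size_t stride, size_t count, unsigned reps,
                         uint64_t *sink)
{
    clock_t t0 = clock();

    for (unsigned r = 0; r < reps; r++)
        *sink += sum_u32(buf, start, stride, count);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Comparing time_loads(buf, 0, 4, n, reps, &sink) against time_loads(buf, 1, 4, n, reps, &sink) on a buffer of at least 4 * n + 1 bytes gives a rough upper bound on the per-load penalty; the effect on a full unwinder may well be smaller.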
>
> Source of unaligned accesses in SFrame FRE
> - (#1) Access to the SFrame FRE start address (sfre_start_address)
> - (#2) Access to the SFrame FRE stack offsets. This is the varlen data
> trailing the SFrame FRE top-level members (sfre_start_address and FRE
> info), usually interpreted as stack offsets.
BTW, we should also look at how often unaligned accesses actually occur.
All the time, or only some percentage of the time? If it is a percentage,
what is that percentage?
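
As a rough illustration of the frequency question, the sketch below counts how many back-to-back packed records start at a misaligned offset. The record sizes are assumptions for illustration only (for example, a 2-byte start address, a 1-byte info word and two 1-byte offsets make a 5-byte record); real FRE sizes vary per function and per FRE, so real percentages have to come from real binaries. read_u16 and count_misaligned are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable read of a possibly unaligned 16-bit field: memcpy lets the
 * compiler emit whatever load is safe on the target. */
static uint16_t read_u16(const unsigned char *p)
{
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Count how many of `count` records of `rec_size` bytes, laid out back
 * to back from offset 0, begin at an offset that is not a multiple of
 * `align`. */
static size_t count_misaligned(size_t rec_size, size_t count, size_t align)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if ((i * rec_size) % align != 0)
            n++;
    return n;
}
```

With 5-byte records and a 2-byte leading field, every other record starts at an odd offset, so half of the leading-field reads are misaligned; with 4-byte records none are.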
>
> (Note that in the SFrame specification, SFrame Header, and SFrame FDE
> (function descriptor entry) have aligned accesses.)
>
> Updated notes on the various approaches and respective evaluation notes
> on the wiki page:
> https://sourceware.org/binutils/wiki/sframe/sframev3todo#Avoid_unaligned_accesses
>
> Summary of Approaches and Analysis/Notes
> Unaligned accesses may mean lower performance, but the alternative we
> pick must at least provide better performance. It is also important
> that the chosen approach does not add bloat to the format. Avoiding
> unaligned accesses at the expense of bloating up the format is not a
> good idea IMO.
>
> Approach 1a: Bucketed members
> Pros: Negligible bloat.
> Cons: 1. Writing out the FRE data is somewhat more involved. Affects
> assemblers, linkers. 2. For the common case though, accessing stack
> offsets now needs more memory accesses per FRE. This approach will not
> bring clear performance benefits; the additional complexity in SFrame
> readers and writers is not justified then either.
Right. If this causes more cache misses or worse, more page faults, to save
from an unaligned access, I don't think it's worth it.
>
> Approach 1b: Bucketed members with Index
> Cons: Significant bloat (~30%).
I personally believe 30% is too much overhead.
>
> Approach 2: De-duplicated "stack offsets"
> Pros: Will help reduce the size of SFrame sections.
> Cons: 1. The SFrame FRE layout is designed to be flexible so that it can
> serve the needs of new ABIs: the varlen data is interpreted as stack
> offsets on x86_64 and AArch64, but that may not be the case for other
> ABIs. De-duplicating non-structured data is not meaningful. 2. Writing
> out the FRE data is considerably more involved, increasing complexity in
> the toolchain.
I don't know enough to comment about the above.
>
> Approach 3: Good old basic padding
> Cons: Significant bloat (~22%). Performance win arguable as well.
I think 22% is also too much.
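
For intuition on where padding overhead comes from, here is a small arithmetic sketch. The record sizes are made up for illustration and do not reproduce the ~22% figure quoted above, which comes from the measurements on the wiki page. pad_to and bloat_percent are hypothetical helpers.

```c
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (align must be > 0). */
static size_t pad_to(size_t size, size_t align)
{
    return (size + align - 1) / align * align;
}

/* Size overhead, in whole percent (rounded down), of padding a packed
 * record of `packed` bytes so that the next record starts aligned. */
static unsigned bloat_percent(size_t packed, size_t align)
{
    size_t padded = pad_to(packed, align);

    return (unsigned)((padded - packed) * 100 / packed);
}
```

A 5-byte packed record padded to 2-byte alignment grows to 6 bytes, a 20% increase; because the records are small, even one padding byte per record is a large relative cost.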
>
> IMO, none of these approaches provides a viable way forward. The
> proposed methods either fail to deliver a clear performance gain or
> introduce a significant size penalty or complexity, which is an
> unacceptable trade-off.
>
> Would like to gather inputs from the interested folks on this. Please
> take a look and chime in. Other ideas welcome.
As stated above, I'd like to know how much of a performance benefit this
is. It may not be worth it.
I wasn't one of the people who brought up unaligned accesses. I'd like to
hear from them to get their input.
-- Steve