linux-toolchains.vger.kernel.org archive mirror
From: Indu Bhagat <indu.bhagat@oracle.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Jens Remus <jremus@linux.ibm.com>,
	Sterling Augustine <saugustine@google.com>,
	Pavel Labath <labath@google.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Serhei Makarov <smakarov@redhat.com>,
	Binutils <binutils@sourceware.org>,
	"linux-toolchains@vger.kernel.org"
	<linux-toolchains@vger.kernel.org>
Subject: Re: Unaligned access trade-offs for SFrame FRE layout
Date: Sat, 13 Sep 2025 00:56:34 -0700
Message-ID: <0c524c2c-cbfa-4058-b360-b97cd361a190@oracle.com>
In-Reply-To: <20250912151855.3af8c2ab@gandalf.local.home>

On 9/12/25 12:18 PM, Steven Rostedt wrote:
> On Fri, 12 Sep 2025 10:34:42 -0700
> Indu Bhagat <indu.bhagat@oracle.com> wrote:
> 
>> TL;DR: Thinking and experimenting a bit on the possible approaches for
>> avoiding unaligned accesses in the SFrame FRE layout (in SFrame V3), I
>> am not convinced that avoiding unaligned accesses for performance is
>> worth it.  IMO, forsaking compactness for avoiding unaligned accesses is
>> not a good trade off for SFrame.
>>
>> Problem Statement
>> On architectures such as x86_64, AArch64, and s390x, unaligned memory
>> accesses are handled transparently by the hardware but incur a
>> performance penalty. The objective of this analysis is to evaluate if
>> these unaligned accesses can be eliminated from the SFrame FRE layout
>> and if doing so provides a net performance benefit.
> 
> I guess the question is really, is it that big of a performance hit?
> 
> I know some others were worried about the performance, but we should look
> at measurements too. Is it going to be a big enough issue in the stack
> unwinding code to even notice?
> 

I think quantifying the performance impact of unaligned accesses for
stack tracing using SFrame sections will be a larger experiment, and the
results will be hardware dependent.

https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/

It seems that, on newer architectures, if the unaligned access stays
within a single cache line, the cycle impact is minimal.  When the
unaligned access crosses a cache line boundary, there may be a few
cycles of impact.  When it crosses a page boundary, the impact becomes
noticeable.
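
To make the three cases above concrete, here is a minimal sketch that
classifies a single access by the boundaries it straddles.  The 64-byte
cache line and 4096-byte page are assumptions (both are hardware and
configuration dependent), and classify_access() is a hypothetical helper
for illustration, not part of any SFrame reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes; the real values depend on the hardware and kernel config. */
#define CACHE_LINE_SIZE 64u
#define PAGE_SIZE_BYTES 4096u

enum access_kind {
    ACCESS_ALIGNED,             /* naturally aligned */
    ACCESS_UNALIGNED_SAME_LINE, /* unaligned, but within one cache line */
    ACCESS_CROSSES_CACHE_LINE,  /* straddles two cache lines */
    ACCESS_CROSSES_PAGE         /* straddles two pages */
};

/* Classify an access of `size` bytes (a power of two) starting at `addr`. */
static enum access_kind classify_access(uintptr_t addr, size_t size)
{
    assert(size > 0 && (size & (size - 1)) == 0);

    uintptr_t last = addr + size - 1; /* address of the last byte touched */

    if (addr / PAGE_SIZE_BYTES != last / PAGE_SIZE_BYTES)
        return ACCESS_CROSSES_PAGE;
    if (addr / CACHE_LINE_SIZE != last / CACHE_LINE_SIZE)
        return ACCESS_CROSSES_CACHE_LINE;
    if (addr % size != 0)
        return ACCESS_UNALIGNED_SAME_LINE;
    return ACCESS_ALIGNED;
}
```

Running something like this over the field addresses of a mapped SFrame
section would show how many of the unaligned accesses fall into the
more expensive categories, but it says nothing about actual cycle counts.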

That said, I can give some static numbers from some SFrame sections on 
x86_64 for now.  I see that across the SFrame sections for GNU Binutils 
binaries (-O2 binaries):
   - ~30% of SFrame FRE start addr across all functions are unaligned[*]
   - ~4% of all stack offsets are unaligned.

[*] Caveat: This should not be construed to mean that 1 out of every 3
SFrame FRE start addr is unaligned.  There may be functions where the
SFrame FRE start addr are all aligned (e.g., because they are 1 byte
long).  The above data is an average across all functions in one binary.
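
As an illustration of the caveat, here is a minimal sketch of one way
such a per-function average could be computed.  It assumes the figure is
the mean of per-function unaligned fractions; struct func_fre_offsets
and both functions are hypothetical names, not taken from any existing
tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function summary, for illustration only. */
struct func_fre_offsets {
    const uint32_t *fre_offsets; /* byte offset of each FRE start addr field */
    size_t num_fres;             /* number of FREs in this function */
    unsigned addr_size;          /* size of the start addr field: 1, 2 or 4 */
};

/* Fraction of FRE start addr fields in one function that are unaligned. */
static double unaligned_fraction(const struct func_fre_offsets *f)
{
    size_t unaligned = 0;

    if (f->num_fres == 0)
        return 0.0;
    for (size_t i = 0; i < f->num_fres; i++)
        if (f->fre_offsets[i] % f->addr_size != 0)
            unaligned++;
    return (double)unaligned / (double)f->num_fres;
}

/* Mean of the per-function fractions across all functions in a binary. */
static double mean_unaligned_fraction(const struct func_fre_offsets *funcs,
                                      size_t num_funcs)
{
    double sum = 0.0;

    if (num_funcs == 0)
        return 0.0;
    for (size_t i = 0; i < num_funcs; i++)
        sum += unaligned_fraction(&funcs[i]);
    return sum / (double)num_funcs;
}
```

With this definition, a function whose start addr fields are 1 byte long
always contributes 0 to the mean, which is why the average cannot be
read as a per-FRE probability.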

>>
>> The central challenge is that any alternative must demonstrate a clear
>> performance improvement while avoiding significant size overhead.
>> Introducing "bloat" to the format to solve a potential performance issue
>> is a poor trade-off.
> 
> Correct. I would like to see performance numbers before we invest too much
> time in this.
> 
>>
>> Source of unaligned accesses in SFrame FRE
>>    - (#1) Access to the SFrame FRE start address (sfre_start_address)
>>    - (#2) Access to the SFrame FRE stack offsets.  This is varlen data
>> trailing the SFrame FRE top-level members (sfre_start_address and FRE
>> info), usually interpreted as stack offsets.
> 
> BTW, we should also look at how often are there unaligned accesses? All the
> time? or just a percentage of time? If it is a percentage, what is that
> percentage?
> 

The stack offsets for an FRE are accessed once per frame (and an SFrame 
FRE may have an average of 2 stack offsets on x86_64).

WRT FRE start addr, multiple SFrame FRE start addresses may need to be
read until the applicable SFrame FRE is found.  SFrame FRE lookup is
serial.  The SFrame FRE start addr can be 1, 2 or 4 bytes wide (one size
is chosen per function).
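
The serial lookup described above can be sketched roughly as follows.
Note that the byte layout used here (start addr, then a one-byte offset
count, then one-byte offsets) is a simplified, hypothetical stand-in and
NOT the actual SFrame FRE encoding; read_start_addr() and find_fre() are
likewise illustrative names.  The memcpy() reads are one portable way to
load a possibly unaligned field in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 1-, 2- or 4-byte start address that may be unaligned. */
static uint32_t read_start_addr(const uint8_t *p, unsigned addr_size)
{
    switch (addr_size) {
    case 1: return *p;
    case 2: { uint16_t v; memcpy(&v, p, sizeof v); return v; }
    case 4: { uint32_t v; memcpy(&v, p, sizeof v); return v; }
    default: assert(0 && "unsupported start addr size"); return 0;
    }
}

/*
 * Hypothetical entry layout (not the real SFrame encoding):
 *   [start addr: addr_size bytes][num_offsets: 1 byte][offsets: 1 byte each]
 *
 * Walk the entries in order and return the last one whose start address
 * is <= pc_offset, or NULL if there is none.
 */
static const uint8_t *find_fre(const uint8_t *fres, size_t len,
                               unsigned num_fres, unsigned addr_size,
                               uint32_t pc_offset)
{
    const uint8_t *p = fres;
    const uint8_t *end = fres + len;
    const uint8_t *best = NULL;

    for (unsigned i = 0; i < num_fres; i++) {
        if ((size_t)(end - p) < addr_size + 1u)
            break;                      /* truncated entry header */
        if (read_start_addr(p, addr_size) > pc_offset)
            break;                      /* entries are sorted by start addr */
        best = p;

        size_t entry_size = addr_size + 1u + p[addr_size];
        if ((size_t)(end - p) < entry_size)
            break;                      /* truncated offsets */
        p += entry_size;
    }
    return best;
}
```

Because entries are variable length, every candidate before the match
costs one start addr read, and with 2- or 4-byte start addresses some of
those reads land on odd offsets.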

The larger point I was trying to make was:  The alternative layouts of
SFrame FREs may fare worse in performance or compactness or both.  So
either way, avoiding unaligned accesses does not look feasible with any
of those approaches.

>>
>> (Note that in the SFrame specification, SFrame Header, and SFrame FDE
>> (function descriptor entry) have aligned accesses.)
>>
>> Updated notes on the various approaches and respective evaluation notes
>> on the wiki page:
>> https://sourceware.org/binutils/wiki/sframe/sframev3todo#Avoid_unaligned_accesses
>>
>> Summary of Approaches and Analysis/Notes
>> Unaligned accesses may mean lower performance, but the alternative we
>> pick must at least provide better performance.  It is also important
>> that the chosen approach does not add bloat to the format.  Avoiding
>> unaligned accesses at the expense of bloating up the format is not a
>> good idea IMO.
>>
>> Approach 1a: Bucketed members
>>    Pros: Negligible bloat.
>>    Cons: 1. Writing out the FRE data is somewhat more involved. Affects
>>     assemblers, linkers. 2. For the common case though, accessing stack
>> offsets now needs more memory accesses per FRE.  This approach will not
>> bring clear performance benefits; the additional complexity in SFrame
>> readers and writers is not justified then either.
> 
> Right. If this causes more cache misses or worse, more page faults, to save
> from an unaligned access, I don't think it's worth it.
> 
>>
>> Approach 1b: Bucketed members with Index
>>    Cons: Significant bloat (~30%).
> 
> I personally believe 30% is too much overhead.
> 
>>
>> Approach 2: De-duplicated "stack offsets"
>>    Pros: Will help reduce the size of SFrame sections.
>>    Cons: 1. SFrame FRE layout is designed to be flexible so that it can
>>     serve needs of new ABIs:  The varlen data is interpreted as stack
>> offsets on x86_64, and AArch64, but may not be the case for other ABIs.
>> De-duplicating non-structured data is not meaningful. 2. Writing out the
>> FRE data is quite more involved, increasing the complexity in Toolchain.
> 
> I don't know enough to comment about the above.
> 
>>
>> Approach 3: Good old basic padding
>>    Cons: Significant bloat (~22%).  Performance win arguable as well.
> 
> I think 22% is also too much.
> 
>>
>> IMO, none of these approaches provides a viable way to move forward. The
>> proposed methods either fail to deliver the desired clear performance
>> gain or introduce a significant size penalty or complexity, which is an
>> unacceptable trade-off.
>>
>> Would like to gather inputs from the interested folks on this. Please
>> take a look and chime in.  Other ideas welcome.
> 
> As stated above, I'd like to know how much of a performance benefit this
> is. It may not be worth it.
> 
> I wasn't one of the people who brought up unaligned accesses. I'd like to
> hear from them to get their input.
> 
> -- Steve

