From: Steven Rostedt <rostedt@goodmis.org>
To: Indu Bhagat <indu.bhagat@oracle.com>
Cc: Jens Remus <jremus@linux.ibm.com>,
	Sterling Augustine <saugustine@google.com>,
	Pavel Labath <labath@google.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Serhei Makarov <smakarov@redhat.com>,
	Binutils <binutils@sourceware.org>,
	"linux-toolchains@vger.kernel.org"
	<linux-toolchains@vger.kernel.org>
Subject: Re: Unaligned access trade-offs for SFrame FRE layout
Date: Fri, 12 Sep 2025 15:18:55 -0400
Message-ID: <20250912151855.3af8c2ab@gandalf.local.home>
In-Reply-To: <b7b139c6-1963-4ffc-a872-518010a50563@oracle.com>

On Fri, 12 Sep 2025 10:34:42 -0700
Indu Bhagat <indu.bhagat@oracle.com> wrote:

> TL;DR: After thinking about and experimenting with the possible
> approaches for avoiding unaligned accesses in the SFrame FRE layout (in
> SFrame V3), I am not convinced that avoiding unaligned accesses for
> performance is worth it.  IMO, forsaking compactness to avoid unaligned
> accesses is not a good trade-off for SFrame.
> 
> Problem Statement
> On architectures such as x86_64, AArch64, and s390x, unaligned memory 
> accesses are handled transparently by the hardware but incur a 
> performance penalty. The objective of this analysis is to evaluate if 
> these unaligned accesses can be eliminated from the SFrame FRE layout 
> and if doing so provides a net performance benefit.

I guess the question is really, is it that big of a performance hit?

I know some others were worried about the performance, but we should look
at measurements too. Is it going to be a big enough issue in the stack
unwinding code to even notice?
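
A rough way to get a first measurement, as a sketch only: time a loop of
32-bit loads at an aligned skew against the same loop at a misaligned
skew. The helper names below (load_u32, sum_loads, time_loads) are made
up for illustration and are not part of any SFrame code. A real
measurement would have to run inside an actual unwinder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Load a 32-bit value from a possibly unaligned address.  memcpy keeps
   this well-defined C; on targets that permit unaligned access the
   compiler typically emits a single load. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Sum `count` 32-bit loads taken `stride` bytes apart, starting `skew`
   bytes into `buf`.  The caller must provide at least
   skew + (count - 1) * stride + 4 bytes. */
static uint64_t sum_loads(const unsigned char *buf, size_t skew,
                          size_t stride, size_t count)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < count; i++)
        sum += load_u32(buf + skew + i * stride);
    return sum;
}

/* Time `reps` passes of sum_loads() in CPU seconds.  The checksum is
   stored through `out` so the loads have an observable result; a
   serious benchmark would also need a compiler barrier between passes. */
static double time_loads(const unsigned char *buf, size_t skew,
                         size_t stride, size_t count, int reps,
                         uint64_t *out)
{
    clock_t start = clock();
    uint64_t sum = 0;

    for (int r = 0; r < reps; r++)
        sum += sum_loads(buf, skew, stride, count);
    *out = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_loads() on a 4-byte-aligned buffer with skew 0 and then
with skew 1 (same stride, count and reps) gives two timings to compare.
The buffer size should be varied as well, since cache behavior is likely
to dominate.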

> 
> The central challenge is that any alternative must demonstrate a clear
> performance improvement while avoiding significant size overhead. 
> Introducing "bloat" to the format to solve a potential performance issue 
> is a poor trade-off.

Correct. I would like to see performance numbers before we invest too much
time in this.

> 
> Source of unaligned accesses in SFrame FRE
>   - (#1) Access to the SFrame FRE start address (sfre_start_address)
>   - (#2) Access to the SFrame FRE stack offsets.  This is varlen data
>     trailing the SFrame FRE top-level members (sfre_start_address and
>     FRE info), usually interpreted as stack offsets.

BTW, we should also look at how often unaligned accesses actually occur.
All the time, or just a percentage of the time? If it is a percentage,
what is that percentage?
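
One way to estimate that percentage, sketched under assumptions: walk a
list of back-to-back packed FREs and count how many field reads land on
an offset that is not a multiple of the field size. The struct
fre_shape below is a simplified stand-in (a start address of 1, 2 or 4
bytes, one info byte, then N same-sized offsets), not the real SFrame
definitions. A real tool would decode actual .sframe sections instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shape of one packed FRE: a start address of
   addr_size bytes, one info byte, then offset_count offsets of
   offset_size bytes each. */
struct fre_shape {
    uint8_t addr_size;    /* assumed to be 1, 2 or 4 */
    uint8_t offset_size;  /* assumed to be 1, 2 or 4 */
    uint8_t offset_count;
};

struct align_stats {
    size_t accesses;   /* start-address and offset reads counted */
    size_t unaligned;  /* reads at a position not a multiple of the size */
};

/* Record one field read of `size` bytes at byte position `pos`. */
static void count_field(struct align_stats *st, size_t pos, size_t size)
{
    st->accesses++;
    if (size > 1 && pos % size != 0)
        st->unaligned++;
}

/* Walk `n` packed FREs laid out back to back starting at byte position
   `base` and count aligned versus unaligned field reads.  The info
   byte is skipped because a single byte is always aligned. */
static struct align_stats fre_align_stats(const struct fre_shape *fres,
                                          size_t n, size_t base)
{
    struct align_stats st = { 0, 0 };
    size_t pos = base;

    for (size_t i = 0; i < n; i++) {
        count_field(&st, pos, fres[i].addr_size);
        pos += fres[i].addr_size;
        pos += 1; /* info byte */
        for (size_t j = 0; j < fres[i].offset_count; j++) {
            count_field(&st, pos, fres[i].offset_size);
            pos += fres[i].offset_size;
        }
    }
    return st;
}
```

Dividing unaligned by accesses gives the fraction asked about. The
answer depends entirely on the mix of FRE shapes in real binaries, which
this sketch does not know.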

> 
> (Note that in the SFrame specification, the SFrame header and the
> SFrame FDE (function descriptor entry) have aligned accesses.)
> 
> Updated notes on the various approaches, with their respective
> evaluations, are on the wiki page:
> https://sourceware.org/binutils/wiki/sframe/sframev3todo#Avoid_unaligned_accesses
> 
> Summary of Approaches and Analysis/Notes
> Unaligned accesses may mean lower performance, but the alternative we 
> pick must at least provide better performance.  It is also important 
> that the chosen approach does not add bloat to the format.  Avoiding 
> unaligned accesses at the expense of bloating up the format is not a 
> good idea IMO.
> 
> Approach 1a: Bucketed members
>   Pros: Negligible bloat.
>   Cons: 1. Writing out the FRE data is somewhat more involved, which
>            affects assemblers and linkers.
>         2. In the common case, accessing stack offsets now needs more
>            memory accesses per FRE.
>   This approach will not bring clear performance benefits, so the
>   additional complexity in SFrame readers and writers is not justified
>   either.

Right. If this causes more cache misses or, worse, more page faults, just
to avoid an unaligned access, I don't think it's worth it.

> 
> Approach 1b: Bucketed members with Index
>   Cons: Significant bloat (~30%).

I personally believe 30% is too much overhead.

> 
> Approach 2: De-duplicated "stack offsets"
>   Pros: Will help reduce the size of SFrame sections.
>   Cons: 1. The SFrame FRE layout is designed to be flexible so that it
>            can serve the needs of new ABIs.  The varlen data is
>            interpreted as stack offsets on x86_64 and AArch64, but that
>            may not be the case for other ABIs.  De-duplicating
>            non-structured data is not meaningful.
>         2. Writing out the FRE data is considerably more involved,
>            increasing the complexity in the toolchain.

I don't know enough to comment about the above.

> 
> Approach 3: Good old basic padding
>   Cons: Significant bloat (~22%).  Performance win arguable as well.

I think 22% is also too much.
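
For a feel of where padding bloat comes from, here is a minimal sketch
that compares the packed size of a run of FREs with the size after
padding every field to its natural alignment. This is only one possible
reading of "basic padding". The struct fre_shape is a simplified,
hypothetical stand-in for the real FRE types, and the sketch does not
reproduce the ~22% figure, which comes from Indu's analysis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shape of one FRE: a start address of
   addr_size bytes, one info byte, then offset_count offsets of
   offset_size bytes each. */
struct fre_shape {
    uint8_t addr_size;    /* assumed to be 1, 2 or 4 */
    uint8_t offset_size;  /* assumed to be 1, 2 or 4 */
    uint8_t offset_count;
};

struct layout_size {
    size_t packed;  /* bytes with all fields back to back */
    size_t padded;  /* bytes with each field naturally aligned */
};

/* Round pos up to the next multiple of align (0 or 1 is a no-op). */
static size_t align_up(size_t pos, size_t align)
{
    if (align <= 1)
        return pos;
    return (pos + align - 1) / align * align;
}

/* Compute both sizes for `n` FREs laid out one after another. */
static struct layout_size fre_layout_size(const struct fre_shape *fres,
                                          size_t n)
{
    struct layout_size sz = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        const struct fre_shape *f = &fres[i];
        size_t offsets = (size_t)f->offset_size * f->offset_count;

        /* Packed: start address, info byte, offsets, no gaps. */
        sz.packed += (size_t)f->addr_size + 1 + offsets;

        /* Padded: align the start address, add the info byte, then
           align once before the offsets (they share one size, so the
           rest stay aligned). */
        sz.padded = align_up(sz.padded, f->addr_size) + f->addr_size + 1;
        if (f->offset_count > 0)
            sz.padded = align_up(sz.padded, f->offset_size) + offsets;
    }
    return sz;
}
```

The overhead is (padded - packed) / packed. As with any such estimate,
the result is only as representative as the FRE shapes fed into it.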

> 
> IMO, none of these approaches provides a viable way to move forward.
> The proposed methods either fail to deliver the desired clear
> performance gain or introduce a significant size penalty or complexity,
> which is an unacceptable trade-off.
> 
> Would like to gather inputs from the interested folks on this. Please 
> take a look and chime in.  Other ideas welcome.

As stated above, I'd like to know how much of a performance benefit this
is. It may not be worth it.

I wasn't one of the people who brought up unaligned accesses. I'd like to
hear from them to get their input.

-- Steve
