Re: Unaligned access trade-offs for SFrame FRE layout

From: Indu Bhagat <indu.bhagat@oracle.com>
To: Fangrui Song <maskray@sourceware.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Jan Beulich <jbeulich@suse.com>,
	Rainer Orth <ro@cebitec.uni-bielefeld.de>,
	"linux-toolchains@vger.kernel.org"
	<linux-toolchains@vger.kernel.org>,
	Jens Remus <jremus@linux.ibm.com>,
	Sterling Augustine <saugustine@google.com>,
	Pavel Labath <labath@google.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Serhei Makarov <smakarov@redhat.com>,
	Binutils <binutils@sourceware.org>
Subject: Re: Unaligned access trade-offs for SFrame FRE layout
Date: Tue, 16 Sep 2025 10:33:52 -0700
Message-ID: <332704dc-1e33-444b-afb3-8f3d776fb2a8@oracle.com>
In-Reply-To: <CAN30aBFW1T7WhBn9QBDig6i1Nh23XTJ1eRFHeNLNU2nfahv_7Q@mail.gmail.com>

On 9/16/25 9:32 AM, Fangrui Song wrote:
> On Tue, Sep 16, 2025 at 9:03 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>>
>> On 9/15/25 11:05 PM, Fangrui Song wrote:
>>> On Mon, Sep 15, 2025 at 9:12 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>>
>>>> On Sun, 14 Sep 2025 22:42:46 -0700
>>>> Indu Bhagat <indu.bhagat@oracle.com> wrote:
>>>>
>>>>> In such cases, the routines reading the SFrame data under consideration
>>>>> here (SFrame FRE start addr, and SFrame FRE stack offsets) from memory
>>>>> will need to use a memcpy to copy out the data to an aligned location.
>>>>>
>>>>> In GNU Binutils libsframe (used by ld), we do the above. Such a "SFrame
>>>>> FRE decoding" routine could be provided in an arch-specific manner in
>>>>> SFrame stack tracers.
>>>>
>>>> I'm perfectly fine with making it a requirement for the reader of the
>>>> SFrame section having to use memcpy into an aligned structure for reading
>>>> if the architecture requires it. Let only the architectures that have
>>>> issues with unaligned access take the performance hit.
>>>>
>>>> -- Steve
>>>
>>> I agree. Unaligned access has nearly zero performance impact on modern
>>> architectures, provided the access doesn't span additional cache
>>> lines.
>>> The padding required for alignment would increase the size, likely
>>> creating more overhead than any alignment benefit would justify.
>>>
>>> (
>>>   From a linker and binary utilities perspective, I'd even suggest
>>> adopting a universal little-endian format regardless of the target
>>> system's native endianness.
>>> This would eliminate the need for endianness templates in the C++ code
>>> and simplify toolchain implementation across platforms.
>>>
>>
>> (Perhaps I am missing something) Wouldn't a toolchain implementation need
>> endianness handling anyway to support cross toolchains?
>>
>>> On the big-endian z/Architecture, this is efficient: the LOAD REVERSED
>>> instructions are used by the bswap versions in the following program,
>>> not even requiring extra instructions.
>>> #define WIDTH(x) \
>>> typedef __UINT##x##_TYPE__ [[gnu::aligned(1)]] uint##x; \
>>> uint##x load_inc##x(uint##x *p) { return *p+1; } \
>>> uint##x load_bswap_inc##x(uint##x *p) { return __builtin_bswap##x(*p)+1; }; \
>>> uint##x load_eq##x(uint##x *p) { return *p==3; } \
>>> uint##x load_bswap_eq##x(uint##x *p) { return __builtin_bswap##x(*p)==3; }; \
>>>
>>> WIDTH(16);
>>> WIDTH(32);
>>> WIDTH(64);
>>> )
>>
>> For AArch64 which SFrame supports too, this is not true. AArch64 has
>> both LE and BE.
> 
> While runtime consumers typically handle a single endianness, other
> tools like linkers and binary utilities must support both. They have
> to support cross compilation, producing a big-endian executable from a
> little-endian host.
> 

Right.  Sorry, I am still missing the link between the "complexity of
endianness templates in the C++ code" and what you say in the next
paragraph: endian-aware read/write is necessary anyway.

> A universal little-endian approach simplifies code. Instead of using a
> function like read32le(config, p), where config->endian specifies the
> object file's endianness, or read32(p) with an internal endianness
> check, the code can simply use read32le(p).
> 
> The read32le(p) function is either a standard read or a byte-swapped
> read. This byte-swapping is fast on aarch64be (thanks to REV16 and
> REV32 instructions) and s390x (byte-swap load).
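
For reference, the two reader styles being contrasted could be sketched
roughly as below.  This is only an illustration: `struct config`, its
`big_endian` field, and the helper bodies are assumptions made for the
example, not code from any particular linker or from libsframe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-object settings, standing in for a linker's config. */
struct config {
    bool big_endian;   /* endianness of the object file being read */
};

/* Byte-wise reads: correct on any host endianness and any alignment. */
static uint32_t read32le(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static uint32_t read32be(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[3] | (uint32_t)b[2] << 8 |
           (uint32_t)b[1] << 16 | (uint32_t)b[0] << 24;
}

/* Style 1: the format follows the target, so the reader picks at run time. */
static uint32_t read32(const struct config *cfg, const void *p)
{
    return cfg->big_endian ? read32be(p) : read32le(p);
}

/* Style 2: the format is always little-endian, so callers use read32le(p)
 * directly and need no config argument. */
```

With a fixed little-endian format only `read32le` would be called for
SFrame data; the run-time selection in `read32` would still be needed
wherever the tool reads target-endian data.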

The rev* instruction sits in the data dependency chain.  So using
little-endian on AArch64 BE defers the endian swap to the stack
tracers.  E.g., aarch64 (one added insn in the dependency chain):

load_inc16:
         ldrh    w0, [x0]
         add     w0, w0, 1
         ret
load_bswap_inc16:
         ldrh    w0, [x0]
         rev16   w0, w0
         add     w0, w0, 1
         ret

s390x (same height of dependency chain):

load_inc16:
         lh      %r2,0(%r2)
         ahi     %r2,1
         llghr   %r2,%r2
         br      %r14
load_bswap_inc16:
         lrvh    %r2,0(%r2)
         ahi     %r2,1
         llghr   %r2,%r2
         br      %r14
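
As a rough sketch of the memcpy-based decoding mentioned earlier in the
thread for targets that cannot do unaligned loads, a reader for a 16-bit
field might look like the following.  The function names are
hypothetical, and `__builtin_bswap16` assumes GCC or Clang, as in the
test program quoted above.

```c
#include <stdint.h>
#include <string.h>

/* Load a possibly unaligned 16-bit value in host byte order.
 * memcpy into an aligned local avoids an unaligned access; compilers
 * typically turn it into a plain load where the target allows that. */
static uint16_t load_u16(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Same load followed by a byte swap, for data whose byte order differs
 * from the host's.  The swap is the extra step in the dependency chain
 * unless the target has a byte-reversing load. */
static uint16_t load_u16_swapped(const void *p)
{
    return __builtin_bswap16(load_u16(p));
}
```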
