From: Indu Bhagat <indu.bhagat@oracle.com>
To: Fangrui Song <maskray@sourceware.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Jan Beulich <jbeulich@suse.com>,
Rainer Orth <ro@cebitec.uni-bielefeld.de>,
"linux-toolchains@vger.kernel.org"
<linux-toolchains@vger.kernel.org>,
Jens Remus <jremus@linux.ibm.com>,
Sterling Augustine <saugustine@google.com>,
Pavel Labath <labath@google.com>,
Andrii Nakryiko <andrii@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Serhei Makarov <smakarov@redhat.com>,
Binutils <binutils@sourceware.org>
Subject: Re: Unaligned access trade-offs for SFrame FRE layout
Date: Tue, 16 Sep 2025 10:33:52 -0700
Message-ID: <332704dc-1e33-444b-afb3-8f3d776fb2a8@oracle.com>
In-Reply-To: <CAN30aBFW1T7WhBn9QBDig6i1Nh23XTJ1eRFHeNLNU2nfahv_7Q@mail.gmail.com>
On 9/16/25 9:32 AM, Fangrui Song wrote:
> On Tue, Sep 16, 2025 at 9:03 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>>
>> On 9/15/25 11:05 PM, Fangrui Song wrote:
>>> On Mon, Sep 15, 2025 at 9:12 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>>
>>>> On Sun, 14 Sep 2025 22:42:46 -0700
>>>> Indu Bhagat <indu.bhagat@oracle.com> wrote:
>>>>
>>>>> In such cases, the routines reading the SFrame data under consideration
>>>>> here (SFrame FRE start addr, and SFrame FRE stack offsets) from memory
>>>>> will need to use a memcpy to copy out the data to an aligned location.
>>>>>
>>>>> In GNU Binutils libsframe (used by ld), we do the above. Such a "SFrame
>>>>> FRE decoding" routine could be provided in an arch-specific manner in
>>>>> SFrame stack tracers.
>>>>
>>>> I'm perfectly fine with making it a requirement for the reader of the
>>>> SFrame section having to use memcpy into an aligned structure for reading
>>>> if the architecture requires it. Let only the architectures that have
>>>> issues with unaligned access take the performance hit.
>>>>
>>>> -- Steve
>>>
>>> I agree. Unaligned access has nearly zero performance impact on modern
>>> architectures, provided the access doesn't span additional cache
>>> lines.
>>> The padding required for alignment would increase the size, likely
>>> creating more overhead than any alignment benefit would justify.
>>>
>>> (
>>> From a linker and binary utilities perspective, I'd even suggest
>>> adopting a universal little-endian format regardless of the target
>>> system's native endianness.
>>> This would eliminate the need for endianness templates in the C++ code
>>> and simplify toolchain implementation across platforms.
>>>
>>
>> (Perhaps I am missing something) Wouldn't a toolchain implementation need
>> endianness handling anyway to support cross toolchains?
>>
>>> On the big-endian z/Architecture, this is efficient: the LOAD REVERSED
>>> instructions are used by the bswap versions in the following program,
>>> not even requiring extra instructions.
>>> #define WIDTH(x) \
>>> typedef __UINT##x##_TYPE__ [[gnu::aligned(1)]] uint##x; \
>>> uint##x load_inc##x(uint##x *p) { return *p+1; } \
>>> uint##x load_bswap_inc##x(uint##x *p) { return __builtin_bswap##x(*p)+1; }; \
>>> uint##x load_eq##x(uint##x *p) { return *p==3; } \
>>> uint##x load_bswap_eq##x(uint##x *p) { return __builtin_bswap##x(*p)==3; }; \
>>>
>>> WIDTH(16);
>>> WIDTH(32);
>>> WIDTH(64);
>>> )
>>
>> For AArch64 which SFrame supports too, this is not true. AArch64 has
>> both LE and BE.
>
> While runtime consumers typically handle a single endianness, other
> tools like linkers and binary utilities must support both. They have
> to support cross compilation, producing a big-endian executable from a
> little-endian host.
>
Right. Sorry, I am still missing the link between the "complexity of
endianness templates in the C++ code" and what you say in the next
paragraph: endian-aware reads and writes are necessary anyway.
> A universal little-endian approach simplifies code. Instead of using a
> function like read32le(config, p), where config->endian specifies the
> object file's endianness, or read32(p) with an internal endianness
> check, the code can simply use read32le(p).
>
> The read32le(p) function is either a standard read or a byte-swapped
> read. This byte-swapping is fast on aarch64be (thanks to REV16 and
> REV32 instructions) and s390x (byte-swap load).
The rev* instruction sits in the data dependency chain. So using
little-endian on AArch64 BE defers the cost of the endian swap to the
stack tracers. E.g., aarch64 (one added insn in the dependency chain):
load_inc16:
        ldrh    w0, [x0]
        add     w0, w0, 1
        ret
load_bswap_inc16:
        ldrh    w0, [x0]
        rev16   w0, w0
        add     w0, w0, 1
        ret

s390x (same height of dependency chain):

load_inc16:
        lh      %r2,0(%r2)
        ahi     %r2,1
        llghr   %r2,%r2
        br      %r14
load_bswap_inc16:
        lrvh    %r2,0(%r2)
        ahi     %r2,1
        llghr   %r2,%r2
        br      %r14
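
For reference, the three kinds of read being compared in this thread can
be sketched in C roughly as follows. This is only an illustration under
assumptions: the function names are made up for this mail and are not
libsframe or linker APIs, and a 16-bit field stands in for an SFrame FRE
start address or stack offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-order load from a possibly unaligned address.  memcpy avoids
   undefined behavior; compilers typically lower it to a single load on
   targets that permit unaligned access.  */
static inline uint16_t load_u16_host(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Fixed little-endian load, independent of host endianness.  On a
   big-endian host this amounts to a load plus a byte swap.  */
static inline uint16_t load_u16_le(const void *p)
{
    const unsigned char *b = p;
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Target-endian load: the flavour a cross tool needs when the on-disk
   format follows the target's endianness (hypothetical flag argument).  */
static inline uint16_t load_u16_target(const void *p, int target_is_big_endian)
{
    const unsigned char *b = p;
    return target_is_big_endian ? (uint16_t)((b[0] << 8) | b[1])
                                : (uint16_t)(b[0] | (b[1] << 8));
}

/* Small self-check on a deliberately misaligned field.  */
static void load_u16_selfcheck(void)
{
    /* One pad byte so the 16-bit field starts at an odd offset.  */
    const unsigned char buf[] = { 0x00, 0x34, 0x12 };
    assert(load_u16_le(buf + 1) == 0x1234);
    assert(load_u16_target(buf + 1, 0) == 0x1234);
    assert(load_u16_target(buf + 1, 1) == 0x3412);
}
```

On a little-endian host, load_u16_le and load_u16_host would typically
compile to the same load. On a big-endian host, load_u16_le carries the
swap: an extra rev16 on AArch64 BE, whereas on s390x it is folded into
lrvh, as in the listings above.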