From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id 203192701B6
	for ; Tue, 16 Sep 2025 17:55:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=63.228.1.57
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1758045320; cv=none;
	b=aIKMlpsjLIJ2iJsqlRQJgU6Ia/u3Au3o0L9MxUcAhxaeRs3eSb/jovoXpWNLXVp4QVQXalwtISG91uanQoQ3EsdpPuSlxkyLQkvW5WrKpFuO5MThGFn6OQQdnG+aHO82V9rb27XWLI7ROLIMYWwYJO4ebn17dKbhnBv3nrHHXQc=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1758045320; c=relaxed/simple;
	bh=rMaXQgmBQoRlT88oCJXpFj8zYA1R1B+xUIlAMW/QuJo=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To;
	b=h/XIj5ANask/NDI7mE7TkyfPn7MnXl0DMg7vR4nqPo3DHZyZRigPSMAol0iMjIUev8Hc9tQ5AqeLW9firNuLAefvqjI5LOdcsCKVB4W03gMsOaGYfPxJO94PGvbfu0EnK9Y5x/evAP4Kvd+fmW2nVRxp2ZaAfT9wOji801h6HFM=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org;
	dmarc=none (p=none dis=none) header.from=kernel.crashing.org;
	spf=pass smtp.mailfrom=kernel.crashing.org;
	arc=none smtp.client-ip=63.228.1.57
Authentication-Results: smtp.subspace.kernel.org;
	dmarc=none (p=none dis=none) header.from=kernel.crashing.org
Authentication-Results: smtp.subspace.kernel.org;
	spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost [127.0.0.1])
	by gate.crashing.org (8.18.1/8.18.1/Debian-2) with ESMTP id 58GHsQhE303221;
	Tue, 16 Sep 2025 12:54:26 -0500
Received: (from segher@localhost)
	by gate.crashing.org (8.18.1/8.18.1/Submit) id 58GHsPId303220;
	Tue, 16 Sep 2025 12:54:25 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f
Date: Tue, 16 Sep 2025 12:54:24 -0500
From: Segher Boessenkool
To: Fangrui Song
Cc: Indu Bhagat, Steven Rostedt, Jan Beulich, Rainer Orth,
	"linux-toolchains@vger.kernel.org", Jens Remus
	, Sterling Augustine, Pavel Labath, Andrii Nakryiko, Josh Poimboeuf,
	Serhei Makarov, Binutils
Subject: Re: Unaligned access trade-offs for SFrame FRE layout
Message-ID:
References: <26895e7a-5d54-4c89-aeb4-bcd094ba081d@suse.com>
	<1308e9fa-90c8-4c52-b53d-afd24542b4c8@suse.com>
	<76b8c89e-5d80-48da-aff1-580d539d1b87@oracle.com>
	<20250915120742.7ff2f781@gandalf.local.home>
Precedence: bulk
X-Mailing-List: linux-toolchains@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:

On Tue, Sep 16, 2025 at 11:44:26AM -0500, Segher Boessenkool wrote:
> On Tue, Sep 16, 2025 at 09:32:30AM -0700, Fangrui Song wrote:
> > The read32le(p) function is either a standard read or a byte-swapped
> > read.
>
> You should never overcomplicate things by doing byte-swaps.  Instead,
> just say what you mean:
>
> u32 read32le(u8 *p)
> {
> 	return p[0] + 0x100*p[1] + 0x10000*p[2] + 0x1000000*p[3];
> }
>
> or something like that.  The compiler can optimise such things just
> fine!  There is no need to go via extra indirections.

The following actually compiles to optimal code, both with -mbig and
with -mlittle:

===
typedef unsigned int u32;
typedef unsigned char u8;

u32 read32le(u8 *p)
{
	return (u32)p[0] | (u32)p[1]<<8 | (u32)p[2]<<16 | (u32)p[3]<<24;
}
===

With -O2 -mbig:

	lwbrx 3,0,3	 # 10	[c=8 l=4]  bswapsi2_load
	blr		 # 18	[c=4 l=4]  simple_return

(on a BE system), and with -O2 -mlittle:

	lwz 3,0(3)	 # 11	[c=8 l=4]  *movsi_internal1/3
	blr		 # 19	[c=4 l=4]  simple_return

(I used -mcpu=power10, because a) why not, and b) with an ancient CPU
GCC will make more sure not to do misaligned accesses.  Power8 is fine
already, 970 (aka Apple G5) isn't (for the LE accesses on a BE host):
and that is good, because such accesses will frequently trap, so on
average they are quite expensive if done as a single read.)


Segher
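
PS: for anyone who wants to poke at this locally, here is a minimal
sketch of the same function in a form that can be dropped into a test
file.  The fixed-width types, the const qualifier and the
read32le_selfcheck() helper are my own assumptions for illustration;
they are not from the thread.  The casts matter: without them p[3] is
promoted to (signed) int, and shifting a byte >= 0x80 left by 24 does
not fit in an int.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit value byte by byte.  Each byte is
 * widened to uint32_t before shifting, so the arithmetic stays unsigned.
 * Only byte loads appear in the source, so p may be misaligned. */
static uint32_t read32le(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: returns 1 if read32le() decodes a known byte
 * sequence correctly from an odd (misaligned) offset, 0 otherwise. */
static int read32le_selfcheck(void)
{
	static const uint8_t buf[] = { 0xff, 0x78, 0x56, 0x34, 0x12 };

	return read32le(buf + 1) == 0x12345678u;
}
```

The result is the same on a BE and an LE host, since the source never
looks at the in-memory layout of a u32.  Whether the compiler merges
the four byte loads into a single load depends on the target and the
-mcpu setting, as described above.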