From mboxrd@z Thu Jan 1 00:00:00 1970
From: david.woodhouse@intel.com (Woodhouse, David)
Date: Sun, 26 May 2013 09:30:00 +0000
Subject: [PATCH v7] arm: use built-in byte swap function
In-Reply-To: <51A19FDD.9040403@gmail.com>
References: <20130129181046.GC25415@pd.tnic>
 <20130219203115.114eab79e8d2099c6306d921@freescale.com>
 <1361356696.13482.269.camel@i7.infradead.org>
 <1361367842.13482.279.camel@i7.infradead.org>
 <1361372008.13482.280.camel@i7.infradead.org>
 <20130220214943.9b28a5b208da9f081387c55e@freescale.com>
 <20130221005221.15279b1372501af12c1e4f32@freescale.com>
 <20130221203327.6558f89277468f7ffffa6506@freescale.com>
 <20130222194032.f7b44aefa5e2723d16767a1b@freescale.com>
 <1361661654.18110.102.camel@shinybook.infradead.org>
 <20130523114654.1f273241725205c6703b2226@freescale.com>
 <51A19FDD.9040403@gmail.com>
Message-ID: <1369560593.27719.67.camel@shinybook.infradead.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Sun, 2013-05-26 at 07:38 +0200, Dirk Behme wrote:
> On 23.05.2013 18:46, Kim Phillips wrote:
> > Enable the compiler intrinsic for byte swapping on arch ARM. This
> > allows the compiler to detect and be able to optimize out byte
> > swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> > 4.8):
>
> I'm no GCC tool chain expert, so I just have an understanding
> question: Could anyone kindly give a brief explanation (*) of what the
> advantage of this is on ARM?
>
> http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
>
> mentions "lwbrx/stwbrx on PowerPC, movbe on Atom". But for ARM?
>
> I haven't understood yet why the __arch_swabXX() in
> arch/arm/include/asm/swab.h [1] aren't sufficient? How can this be
> done better? E.g. does anybody have a disassembly without/with this
> change to illustrate that?

The point is just that the compiler gets to *see* what's happening.
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55177 for a bunch of
examples of things that GCC ought to be able to optimise, even without
the CPU having load-and-swap instructions. Not that it always does;
hence the PR. But there are some that it does currently manage,
evidently.

You'll see this if you follow the link above, but as an example: imagine
a code sequence that goes load, swap, mask, swap, store. With the swaps
done by opaque inline asm, there's nothing the compiler can do to
optimise this. But if it *knows* what's going on, it can optimise it
into a single load, a mask with a pre-byte-swapped constant, and a
store.

Having said that, I can't actually answer your question: I don't know
which optimisations the compiler *is* doing to provide the "modest
benefit" that Kim mentions; every class of optimisation I explicitly
checked for was missing. Again, hence the PR. But evidently it does
manage to get *something* right.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse at intel.com                           Intel Corporation
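
As a rough illustration of the load, swap, mask, swap, store sequence
described above, here is a minimal C sketch. The function names are
hypothetical and not taken from the patch; __builtin_bswap32() is the GCC
builtin, used here as a stand-in for the kernel's swab32(). The sketch
only shows that the two forms compute the same result; it says nothing
about whether a particular GCC version actually performs the folding.

```c
#include <stdint.h>

/* Hypothetical helper: load, swap, mask, swap, store, written out
 * step by step. With the swaps expressed as a builtin, the compiler
 * is at least able to see that they are byte swaps. */
static inline void mask_be_field_naive(uint32_t *p, uint32_t mask)
{
    uint32_t v = *p;               /* load  */
    v = __builtin_bswap32(v);      /* swap  */
    v &= mask;                     /* mask  */
    v = __builtin_bswap32(v);      /* swap  */
    *p = v;                        /* store */
}

/* Hypothetical helper: the folded form. A byte swap only permutes
 * bits, so it distributes over AND:
 *   bswap(bswap(v) & mask) == v & bswap(mask)
 * For a constant mask, bswap(mask) can be computed at compile time,
 * leaving a load, one AND with a pre-swapped constant, and a store. */
static inline void mask_be_field_folded(uint32_t *p, uint32_t mask)
{
    *p &= __builtin_bswap32(mask);
}
```

Calling both helpers on the same value with the same mask should leave
identical results in memory; if the swaps were hidden inside inline asm
instead, the compiler would have no way to prove that equivalence.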