From mboxrd@z Thu Jan 1 00:00:00 1970 From: linux@arm.linux.org.uk (Russell King - ARM Linux) Date: Mon, 12 Jan 2015 16:36:48 +0000 Subject: CFT: move outer_cache_sync() out of line Message-ID: <20150112163648.GL12302@n2100.arm.linux.org.uk> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org The theory is that moving outer_cache_sync() out of line moves the conditional test from multiple sites to a single location, which in turn means that the branch predictor stands a better chance of making the correct decision. However, I'm a hardware pauper - I have the choice of testing this on iMX6 (which seems to have bus limitations, and seems to produce a wide variation on measured performance which makes it hard to evaluate any differences) or the Versatile Express, which really is not suitable for making IO performance measurements. So, without help from people with "good" hardware, I can't evaluate the performance impact of this change. Anyone willing to do some performance testing on this and feedback any numbers? Obviously, drivers which make use of a lot of non-relaxed IO accessors will be most affected by this - so you really have to know your drivers when deciding how to evaluate this (sorry, I can't say "measure network bandwidth" or "measure SSD sata performance" is a good test.) Theoretically, this should help overall system performance, since the branch predictor should be able to predict this better, but it's entirely possible that trying to benchmark a single workload won't be measurably different. In terms of kernel size figures, this change alone saves almost 17K of 10MB of kernel text on my iMX6 kernels - which is bordering on insignificant since that's not quite a 0.2% saving. So... right now I can't justify this change, but I'm hoping some can come up with some figures which shows that it benefits their workload without causing a performance regression for others. diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h index 891a56b35bcf..807e4e71c8e7 100644 --- a/arch/arm/include/asm/outercache.h +++ b/arch/arm/include/asm/outercache.h @@ -133,11 +133,7 @@ static inline void outer_resume(void) { } * Ensure that all outer cache operations are complete and any store * buffers are drained. */ -static inline void outer_sync(void) -{ - if (outer_cache.sync) - outer_cache.sync(); -} +extern void outer_sync(void); #else static inline void outer_sync(void) { } diff --git a/arch/arm/mm/l2c-common.c b/arch/arm/mm/l2c-common.c index 10a3cf28c362..b1c24c8c1eb9 100644 --- a/arch/arm/mm/l2c-common.c +++ b/arch/arm/mm/l2c-common.c @@ -7,9 +7,17 @@ * published by the Free Software Foundation. */ #include +#include #include #include +void outer_sync(void) +{ + if (outer_cache.sync) + outer_cache.sync(); +} +EXPORT_SYMBOL(outer_sync); + void outer_disable(void) { WARN_ON(!irqs_disabled()); -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net.