From mboxrd@z Thu Jan 1 00:00:00 1970
From: dave.martin@linaro.org (Dave Martin)
Date: Tue, 25 Jun 2013 15:29:13 +0100
Subject: [PATCH 0/5] Kernel mode NEON for XOR and RAID6
In-Reply-To:
References: <1370530985-20619-1-git-send-email-ard.biesheuvel@linaro.org>
 <20130606151705.GG16794@mudshark.cambridge.arm.com>
 <20130607175007.GH8111@mudshark.cambridge.arm.com>
 <20130621093311.GD6983@mudshark.cambridge.arm.com>
 <51C46A0D.8070505@codeaurora.org>
 <20130625135648.GA2327@linaro.org>
Message-ID: <20130625142913.GD2327@linaro.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tue, Jun 25, 2013 at 04:14:13PM +0200, Ard Biesheuvel wrote:
> On 25 June 2013 15:56, Dave Martin wrote:
> > Significant benchmarks on the boot path would be unacceptable, unless they
> > are *fast* (and by fast, I mean fast on all platforms, not just fast on
> > the fast platforms). If one second gets added onto the boot path for each
> > optimised algorithm, that sounds like a fail. If all the benchmarks
> > combined take one second in total, that's not quite as bad.
> >
> > Maybe benchmarks could be time-bounded (i.e., see how much data we can
> > chug through in X milliseconds) instead of size-bounded. This would avoid
> > unreasonable slowdown on slow platforms, while avoiding trivially small
> > benchmark payloads on faster platforms which may typically have a more
> > complex architecture, bigger caches etc. which would cause them to take
> > longer to reach saturated performance when running a particular algorithm.
> >
>
> Benchmarks are already time bounded, at least the instances I am aware
> of (xor and raid6) are. They each measure, for each available
> implementation, the amount of work performed during a fixed time. For
> RAID6, this is 16 jiffies, for XOR it's only 1 jiffy but each test is
> repeated 5 times.
>
> So I think this should not be a problem, especially as it is unlikely
> that newly added implementations (such as NEON) will be able to
> execute on older/slower platforms in the first place.

The tree I was originally looking at might be out of date ... apologies
for the trolling.

If the XOR benchmark really only takes 50 ms per implementation, I guess
that shouldn't be too bad.

Cheers
---Dave
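
For illustration only, here is a minimal userspace sketch of the
time-bounded scheme discussed above: count how much work fits into a
fixed window, repeat a few times, and keep the best result. This is not
the kernel's code. The names xor_block(), now_ns() and
bench_time_bounded() are hypothetical, and the 10 ms window mentioned
after the code assumes HZ=100 (1 jiffy = 10 ms), which is what makes
5 repeats come to roughly the 50 ms figure above.

```c
#define _POSIX_C_SOURCE 199309L

#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one XOR implementation: dst ^= src. */
static void xor_block(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Monotonic clock in nanoseconds (userspace substitute for jiffies). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Time-bounded benchmark: run the routine until window_ns has elapsed,
 * counting completed blocks. Repeat 'repeats' times and return the best
 * count, so a slow platform never spends more than about
 * repeats * window_ns in here, regardless of how fast the routine is.
 */
unsigned long bench_time_bounded(uint64_t window_ns, int repeats)
{
	static uint8_t dst[4096], src[4096];
	unsigned long best = 0;

	for (int r = 0; r < repeats; r++) {
		unsigned long count = 0;
		uint64_t deadline = now_ns() + window_ns;

		do {
			xor_block(dst, src, sizeof(dst));
			count++;
		} while (now_ns() < deadline);

		if (count > best)
			best = count;
	}
	return best;
}
```

Calling bench_time_bounded(10 * 1000 * 1000, 5) would spend about 50 ms
in total under the HZ=100 assumption. The returned block count only
means something relative to other implementations measured with the
same window.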