From mboxrd@z Thu Jan 1 00:00:00 1970
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
Date: Sat, 3 Jul 2010 21:28:45 +0100
Subject: Some benchmarks on ARM
In-Reply-To: <4C2F98BD.9020803@xenomai.org>
References: <20100702180257.GA8767@pengutronix.de> <4C2F98BD.9020803@xenomai.org>
Message-ID: <20100703202845.GA23954@n2100.arm.linux.org.uk>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Sat, Jul 03, 2010 at 10:08:29PM +0200, Gilles Chanteperdrix wrote:
> Robert Schwebel wrote:
> > Hi,
> >
> > We have recently made some benchmarks, in order to get a little bit
> > better feeling about where ARM cpus are today, especially when it comes
> > to the "recent" ones, and in comparison to the Atom. So we collected a
> > few benchmarks (most from lmbench) and did some actual measurements.
> >
> > Here is a little article:
> > http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
> >
> > I'm pretty sure that there are quite a few things where people on ALKML
> > have good ideas where the effects come from or how to improve the
> > methodology - so I'd be glad to get some feedback from the community!
> >
> > All measurements have been done on 2.6.34.
>
> The context switch time for PXA270 looks really suspicious. The worst
> case context switch time of an AT91RM9200, an armv4 running at 180MHz,
> is less than 300us, so, I doubt that the context switch time of a PXA
> can be that worse.

The measurement is of the thread and MM switch time, which'll involve
cache flushes on <= ARMv5.

On PXA, we have to 'read' (by means of D cache line allocations) 32K of
data into the cache in order to cause the existing data to be written
out. These are done from a range of addresses which don't exist in the
page tables, and so should not cause any bus activity other than the
write-outs. However, the cache still has to interact with the MMU to
try to fetch the requested data - which probably consumes some cycles.

ARM920, on the other hand, can walk through every cache line and
clean+invalidate it. It has 8 segments of 64 lines each, and each line
is 32 bytes long, which gives a cache size of 8 x 64 x 32 = 16K. I'd
therefore expect ARM920's cache flushing to be quicker (in terms of
cycles consumed) than PXA.