Message-ID: <4506B3DF.3000400@bluegap.ch>
Date: Tue, 12 Sep 2006 15:19:27 +0200
From: Markus Schiltknecht
Subject: Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
References: <20060912072037.96809.qmail@web38105.mail.mud.yahoo.com> <200609121343.38015.paul@codesourcery.com>
In-Reply-To: <200609121343.38015.paul@codesourcery.com>
Reply-To: qemu-devel@nongnu.org
To: qemu-devel@nongnu.org

Paul Brook wrote:
> Modern CPUs are complicated, with many factors affecting execution speed
> (pipeline interlocks, multiple levels of cache). A cycle accurate simulator
> will generally be orders of magnitude slower than qemu.

Would it be feasible to just limit the number of instructions qemu simulates per second? Of course that's not very precise and most probably far from cycle accurate. But it would make it easy to scale down a virtual machine, which in turn would allow comparable benchmarks and the like.

One step further, and somewhat more accurate, would be to weight the different assembler instructions with known (measured) average execution times.
Taking into account 40% cache misses, for example. (Hm, but since cache misses are very expensive, that might not lead to a much better approximation, I guess.)

Any documents or discussion threads related to that topic? Other projects trying to achieve that?

Since my benchmarking application is mostly disk- and network-I/O bound, I'll first try to scale that down, which seems far easier to do (with much less of a performance penalty).

Regards

Markus
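
To make the weighting idea above concrete, here is a minimal sketch in Python. It is not how qemu works internally. The instruction classes, the per-class cycle costs and the 50-cycle miss penalty are made-up placeholders, and the function names are hypothetical. Only the 40% miss rate is taken from the message.

```python
# Hypothetical average cost per instruction class, in cycles.
# These are placeholder values, not measurements.
AVG_CYCLES = {"alu": 1.0, "branch": 3.0, "load": 2.0, "store": 2.0}

# Classes that touch memory and can therefore miss the cache.
MEM_CLASSES = {"load", "store"}


def expected_cycles(counts, miss_rate=0.40, miss_penalty=50.0):
    """Estimate the cycles needed for a batch of executed instructions.

    counts maps an instruction class to how many were executed.
    miss_penalty is an assumed cost (in cycles) of one cache miss.
    """
    total = 0.0
    for cls, n in counts.items():
        cost = AVG_CYCLES[cls]
        if cls in MEM_CLASSES:
            # Average in the miss penalty, weighted by the miss rate.
            cost += miss_rate * miss_penalty
        total += n * cost
    return total


def throttle_delay(counts, target_hz, host_seconds, **model):
    """Seconds to sleep so the batch appears to run at target_hz.

    host_seconds is how long the emulator actually needed for the batch.
    Returns 0.0 when the host was already slower than the target.
    """
    virtual_seconds = expected_cycles(counts, **model) / target_hz
    return max(0.0, virtual_seconds - host_seconds)


if __name__ == "__main__":
    batch = {"alu": 100, "load": 10}
    print(expected_cycles(batch))
    print(throttle_delay(batch, target_hz=200e6, host_seconds=0.0))
```

With these placeholder numbers, the 10 loads contribute 200 of the batch's 320 estimated cycles purely through the averaged miss penalty, which illustrates the concern that a single average miss rate dominates the estimate without making it much more accurate.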