From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1GN8Ar-0006tS-Co
	for qemu-devel@nongnu.org; Tue, 12 Sep 2006 09:19:37 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1GN8An-0006tE-RK
	for qemu-devel@nongnu.org; Tue, 12 Sep 2006 09:19:37 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1GN8An-0006tB-LC
	for qemu-devel@nongnu.org; Tue, 12 Sep 2006 09:19:33 -0400
Received: from [213.133.111.57] (helo=bugaboo.mu)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_3DES_EDE_CBC_SHA:24) (Exim 4.52) id 1GN8CI-0005C9-IO
	for qemu-devel@nongnu.org; Tue, 12 Sep 2006 09:21:06 -0400
Message-ID: <4506B3DF.3000400@bluegap.ch>
Date: Tue, 12 Sep 2006 15:19:27 +0200
From: Markus Schiltknecht <markus@bluegap.ch>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
References: <20060912072037.96809.qmail@web38105.mail.mud.yahoo.com>
	<200609121343.38015.paul@codesourcery.com>
In-Reply-To: <200609121343.38015.paul@codesourcery.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Paul Brook wrote:
> Modern CPUs are complicated, with many factors affecting execution speed 
> (pipeline interlocks, multiple levels of cache). A cycle accurate simulator 
> will generally be orders of magnitude slower than qemu.

Would it be feasible to just limit the number of instructions qemu
simulates per unit of time? Of course that's not very precise and most
probably far from cycle accurate. But it would make it easy to scale
down a virtual machine, which in turn would allow comparable benchmarks.
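
As a rough sketch of that idea (plain Python, not qemu code): give the
guest an instruction budget per time slice and sleep for the rest of
the slice. The names slice_budget, idle_time, run_throttled and the
execute_block hook are hypothetical and only illustrate the scheme.

```python
import time


def slice_budget(target_ips, slice_s):
    """Instructions the guest may execute in one time slice (at least 1)."""
    return max(1, round(target_ips * slice_s))


def idle_time(slice_s, busy_s):
    """Host seconds to sleep after the budget was used up in busy_s seconds."""
    return max(0.0, slice_s - busy_s)


def run_throttled(execute_block, total_instructions, target_ips,
                  slice_s=0.01, clock=time.monotonic, sleep=time.sleep):
    """Run total_instructions at roughly target_ips instructions per second.

    execute_block(n) is a hypothetical hook that executes up to n guest
    instructions and returns how many it actually executed (assumed > 0).
    """
    budget = slice_budget(target_ips, slice_s)
    done = 0
    while done < total_instructions:
        start = clock()
        done += execute_block(min(budget, total_instructions - done))
        # Pad the slice so the guest never runs faster than target_ips.
        sleep(idle_time(slice_s, clock() - start))
    return done
```

This only caps the average instruction rate. If the host cannot
execute the budget within one slice, the guest simply runs slower than
the target, so the result is an upper bound, not a cycle count.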

A somewhat more accurate step would be to weight the different
assembler instructions by known (measured) average execution times,
taking into account, for example, a 40% cache-miss rate. (Hm.. but
since cache misses are very expensive, that might not lead to a much
better approximation, I guess)
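
A minimal sketch of such a weighting, again in plain Python. All
numbers are made up for illustration: the base cycle counts, the
instruction classes and the 100-cycle miss penalty are assumptions,
not measurements of any real ARM core. Only the 40% miss rate comes
from the example above.

```python
# Hypothetical average cycle costs per instruction class (not measured).
BASE_CYCLES = {"alu": 1, "mul": 3, "load": 2, "store": 2, "branch": 2}
MEMORY_OPS = {"load", "store"}


def expected_cycles(op, miss_rate, miss_penalty):
    """Average cost of one instruction, adding the expected miss cost
    for instructions that access memory."""
    cost = BASE_CYCLES[op]
    if op in MEMORY_OPS:
        cost += miss_rate * miss_penalty
    return cost


def estimate_cycles(op_counts, miss_rate=0.4, miss_penalty=100):
    """Estimated cycles for a mix given as {instruction class: count}."""
    return sum(count * expected_cycles(op, miss_rate, miss_penalty)
               for op, count in op_counts.items())
```

With these assumed numbers a load costs 2 + 0.4 * 100 = 42 cycles on
average, so the miss term dominates the estimate and the per-class
base costs hardly matter.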

Any documents or discussion threads related to that topic? Other 
projects trying to achieve that?

Since my benchmarking application is mostly disk- and network-I/O bound,
I'll first try to scale that down, which seems far easier to do
(and carries a much smaller performance penalty).

Regards

Markus