From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [Qemu-devel] Storing code caching
From: "John R. Hogerhuis" <jhoger@pobox.com>
In-Reply-To: <F85D0A87-D0D9-11D8-B4E3-000A95B1EB4C@mac.com>
References: <F85D0A87-D0D9-11D8-B4E3-000A95B1EB4C@mac.com>
Content-Type: text/plain
Message-Id: <1089306349.12383.1723.camel@aragorn>
Mime-Version: 1.0
Date: Thu, 08 Jul 2004 10:05:49 -0700
Content-Transfer-Encoding: 7bit
Reply-To: jhoger@pobox.com, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

On Thu, 2004-07-08 at 05:26, Martin Williams wrote:
> Has anyone thought about trying to store the code caching on disk?

Are you talking about "save machine state", i.e. "suspend/resume"?
That is certainly possible and I believe it has been discussed on the
list.

The other possibility, that you wish to permanently associate
untranslated code with translated code by having a big cache available
on disk, runs into "the halting problem" in the general case: you cannot
decide ahead of time which bytes will be executed as code, or whether
they will be modified first, so there can be no general algorithm for
it. So you've been warned: There Be Dragons Here...

However, this is real life, so there are probably some things you can do.

Some things to understand:

1. Basic blocks of code in the cache are found by their addresses in
memory, not their content. You can imagine that from one run to the next
code would load in different spots in memory. I suppose you could come
up with a set of heuristics for recognizing a basic block:
a) the location is not permanent, but it might be a good clue. Perhaps
with virtual memory a program always loads at the same virtual
addresses, even though the physical addresses differ from run to run?
b) the length of the block never changes. That could be a good
heuristic.
c) a checksum of the code that skips over the absolute addresses that
have been "fixed up" in the code. These addresses may be different from
run to run. Remember, though, that adding a checksum is an efficiency
tradeoff. It may not be worth it.
d) self modifying code, self modifying code, self modifying code...
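
To make (b) and (c) a bit more concrete, here is a rough sketch in
Python (not QEMU code). It assumes you somehow know where the fixed-up
absolute addresses sit inside a block; the fixup_offsets argument and
the 4-byte address size are my assumptions for illustration only. It
zeroes those bytes before hashing, so the same block relocated to a
different address still matches.

```python
import hashlib

ADDR_SIZE = 4  # assumption: 32-bit absolute addresses


def block_fingerprint(code, fixup_offsets=()):
    """Return (length, digest) for a block of guest code.

    Bytes at each offset in fixup_offsets (hypothetical input) are
    zeroed before hashing so relocated addresses do not affect the result.
    """
    masked = bytearray(code)
    for off in fixup_offsets:
        if off < 0 or off + ADDR_SIZE > len(masked):
            raise ValueError("fix-up offset outside block")
        masked[off:off + ADDR_SIZE] = bytes(ADDR_SIZE)
    return len(code), hashlib.sha1(bytes(masked)).hexdigest()


# x86 "mov eax, [addr]; ret" with two different absolute addresses.
run1 = bytes.fromhex("a100104000c3")
run2 = bytes.fromhex("a100105000c3")

same_ignoring_fixups = block_fingerprint(run1, [1]) == block_fingerprint(run2, [1])
same_raw = block_fingerprint(run1) == block_fingerprint(run2)
```

With the address bytes masked the two runs compare equal; without
masking they do not. Nothing here deals with (d): a matching
fingerprint only tells you the bytes looked the same when you hashed
them.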

In coming up with heuristics for recognizing already translated code
available in the cache, remember you are trading off against just
retranslating. If the heuristic is expensive to compute, it may cost
more than the translation it saves.
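
One way to reason about that tradeoff, as a back-of-the-envelope sketch
in Python. The cost model and all parameter names are my own
assumptions, and the numbers are made up, not measurements: you pay the
check on every block, you pay a load on a hit, and you still pay the
translation on a miss.

```python
def reuse_is_worthwhile(check_cost, translate_cost, hit_rate, load_cost=0.0):
    """Compare expected per-block cost with the cache against plain translation.

    All costs are in the same arbitrary unit; hit_rate is between 0 and 1.
    """
    with_cache = (check_cost
                  + hit_rate * load_cost
                  + (1.0 - hit_rate) * translate_cost)
    return with_cache < translate_cost


# Made-up numbers: the check costs 2 units, a translation costs 10.
frequent_hits = reuse_is_worthwhile(2, 10, hit_rate=0.5)
rare_hits = reuse_is_worthwhile(2, 10, hit_rate=0.1)
```

With load_cost at zero this reduces to requiring hit_rate to exceed
check_cost / translate_cost, so a cheap heuristic that rarely matches
can still lose to simply retranslating.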

If you think hard about it there are probably some things you could do
efficiently to reuse basic blocks from previous runs. "User mode" QEMU
is probably an easier case than the general one of running an entire OS
image. And maybe you would want to do the lookup at load time: when
given a program to run, checksum it once and check your on-disk cache
to see whether you have already saved a cache image for this program.
If so, load it up. Portions of the code whose translations get
invalidated at run time (self-modifying or dynamically generated code)
will result in "dead areas" which should never be cached.

Anyway, an interesting problem for a grad student, I'd say... you have
some prototyping/analysis to do in order to come up with some heuristics
for matching up real code with cached code.

-- John.