Subject: Re: [Qemu-devel] Storing code caching
From: "John R. Hogerhuis"
Date: Thu, 08 Jul 2004 10:05:49 -0700
Message-Id: <1089306349.12383.1723.camel@aragorn>
Reply-To: jhoger@pobox.com, qemu-devel@nongnu.org
To: qemu-devel@nongnu.org

On Thu, 2004-07-08 at 05:26, Martin Williams wrote:
> Has anyone thought about trying to store the code caching on disk?

Are you talking about "save machine state", essentially "suspend/resume"? That is certainly possible, and I believe it has been discussed on the list.

The other possibility, that you wish to permanently associate untranslated code with translated code by having a big cache available on disk, is in the general case "the halting problem", and there can be no algorithm for that. So you've been warned: There Be Dragons Here...

However, this is real life, so there are probably some things you can do.

Some things to understand:

1. Basic blocks of code in the cache are found by their addresses in memory, not their content. You can imagine that from one run to the next, code would load in different spots in memory.

I suppose you could come up with a set of heuristics for recognizing a basic block:

a) The location is not permanent, but it might be a good clue. Perhaps, though, with a virtual address space, programs always load at the same place in the virtual map even though they end up in different spots in the physical map?

b) The length of the block never changes. That could be a good heuristic.

c) A checksum of the code, with consideration for absolute addresses that have been "fixed up" in the code. These addresses may be different from run to run. Remember, though, that adding in a checksum is an efficiency tradeoff. It may not be worth it.

d) Self-modifying code, self-modifying code, self-modifying code...

In coming up with heuristics for recognizing already-translated code available in the cache, remember you are trading off against just retranslating. Depending on the complexity and resource intensity of the computations for your heuristic, it may not be worth doing them.

If you think hard about it, there are probably some things you could do efficiently to reuse basic blocks from previous runs.

"User mode" QEMU is probably an easier case than the general one of running an entire OS image.

And maybe you would want to look at load time... When given a program to run, you check your on-disk cache to see if you have loaded this program before. Checksum it once to see if you have already saved a cache image for this program. If so, load it up. Encountering dynamically translated (invalidated cache) portions of the code will result in "dead areas" which should never be cached.

Anyway, an interesting problem for a grad student, I'd say... you have some prototyping/analysis to do in order to come up with some heuristics for matching up real code with cached code.

-- John.
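
To make heuristics (b) and (c) and the load-time lookup a bit more concrete, here is a minimal sketch in Python. It is not QEMU code. The names block_key, cache_file_for, fixup_offsets and the ".tcache" suffix are made up for illustration, SHA-1 merely stands in for whatever checksum would be cheap enough, and the sketch assumes you already know which byte offsets in a block hold fixed-up absolute addresses, which is the hard part in practice.

```python
import hashlib
import os


def block_key(code: bytes, fixup_offsets, addr_size: int = 4):
    """Return (length, digest) for a basic block, ignoring fixed-up addresses.

    fixup_offsets is a hypothetical input: byte offsets inside the block
    where absolute addresses were patched in at load time.
    """
    masked = bytearray(code)
    for off in fixup_offsets:
        # Zero each address field so relocation does not change the digest.
        # Clamp to the block end so the masked copy keeps its length.
        end = min(off + addr_size, len(masked))
        masked[off:end] = bytes(end - off)
    return len(code), hashlib.sha1(bytes(masked)).hexdigest()


def cache_file_for(program: bytes, cache_dir: str) -> str:
    """Path where a saved cache image for this program would be looked up."""
    # Checksum the whole program once at load time, as suggested above.
    return os.path.join(cache_dir, hashlib.sha1(program).hexdigest() + ".tcache")


# The same block seen in two runs: only the embedded 32-bit address differs.
run1 = b"\xb8" + (0x08048000).to_bytes(4, "little") + b"\xc3"
run2 = b"\xb8" + (0x40001000).to_bytes(4, "little") + b"\xc3"

# With the address field masked, both runs produce the same key.
assert block_key(run1, [1]) == block_key(run2, [1])
# Without masking, the keys differ and the cached block would be missed.
assert block_key(run1, []) != block_key(run2, [])
```

Whether computing such a key is cheaper than simply retranslating the block is exactly the tradeoff mentioned above, and the sketch does nothing about self-modifying code.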