Subject: Re: [Qemu-devel] Storing code caching
From: "John R. Hogerhuis"
Date: Thu, 08 Jul 2004 11:28:44 -0700
To: qemu-devel@nongnu.org
Reply-To: jhoger@pobox.com, qemu-devel@nongnu.org
Message-Id: <1089311324.12380.1750.camel@aragorn>
In-Reply-To: <431F9EDC-D108-11D8-8B8C-000A95B1EB4C@mac.com>
References: <1089306349.12383.1723.camel@aragorn> <431F9EDC-D108-11D8-8B8C-000A95B1EB4C@mac.com>

On Thu, 2004-07-08 at 10:57, Martin Williams wrote:
> My idea is to write a program that caches individual files code (rather
> than everything) - based around the idea that when a block is started
> executing, the cache would be accessed and address minus the base
> address (in other words the offset of the block) would be used to find
> it within the cache (some algorithm is needed for an efficient method
> of storing and locating these blocks as they will not be the same size
> as the originals).
> The basic idea would then be that once qemu detects
> a self modifying piece of code, (by a write to a memory address), it
> would then black list the block in which the write happened (is this
> possible?).
>

Again... are you talking about User Mode QEMU, where you are only
running one program plus its libraries, or are you talking about QEMU
running an operating system?

I can imagine that when running an OS you are going to get a lot of
collisions where two blocks are at the same offset. Throw in the length
of the block and I suppose you will have a lot fewer, but I would guess
it will still happen often enough to be a problem. You'll have to try
and see.

The difference between the regular QEMU cache and one saved to disk is
that the segment selectors are not valid across runs.

In any event, if you do get a collision you can do a memcmp to see if
the two blocks are really the exact same thing. The trick is to have
good enough heuristics that you don't do the memcmp very often.

> The program I would write would basically use the qemu core to process
> an entire executable, creating the blocks that are executable on the
> host machine, and store them. Then start work on modifying qemu to
> recognise the existence of the cache file and use the blocks. Then deal
> with the self-modifying code issue as above ...
>

QEMU loads the binary into RAM and builds the cache as it simulates its
execution. Working from an executable file without simulating is a bit
different. For that you should probably look into how a disassembler
works, and also realize you have a lot of reworking to do if you want
to take that approach. I wouldn't, unless your goal is to be able to
actually make a new shippable translated binary image. If you are
looking to do that, I think that's, well... ambitious would be the nice
word.
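
To make the offset-plus-length lookup, the memcmp fallback, and the blacklist-on-write idea from this thread a bit more concrete, here is a minimal Python sketch. It is not QEMU code: `BlockCache`, `CachedBlock`, `store`, `lookup` and `note_write` are hypothetical names invented for illustration, the "host code" is just a placeholder byte string, and the bytes equality check stands in for memcmp. It also ignores the segment selector problem entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedBlock:
    guest_bytes: bytes  # guest code the translation was generated from
    host_code: bytes    # placeholder for the translated host code


class BlockCache:
    """Hypothetical translation cache keyed by (offset, length)."""

    def __init__(self):
        self._blocks = {}        # (offset, length) -> list of CachedBlock
        self._blacklist = set()  # keys of blocks that were written to
        self.full_compares = 0   # how often the memcmp-style check ran

    def store(self, offset, guest_bytes, host_code):
        key = (offset, len(guest_bytes))
        self._blocks.setdefault(key, []).append(
            CachedBlock(bytes(guest_bytes), bytes(host_code)))

    def lookup(self, offset, guest_bytes):
        """Return cached host code, or None if the block must be retranslated."""
        key = (offset, len(guest_bytes))
        if key in self._blacklist:
            return None
        # Offset and length are the cheap heuristic; only candidates that
        # match both get the full byte comparison (the memcmp step).
        for block in self._blocks.get(key, []):
            self.full_compares += 1
            if block.guest_bytes == guest_bytes:
                return block.host_code
        return None

    def note_write(self, write_offset):
        """Blacklist every cached block whose byte range covers the write."""
        for offset, length in self._blocks:
            if offset <= write_offset < offset + length:
                self._blacklist.add((offset, length))


# Two different blocks that collide on offset and length.
cache = BlockCache()
cache.store(0x100, b"\x90\x90\xc3", b"HOST-A")
cache.store(0x100, b"\x31\xc0\xc3", b"HOST-B")
hit = cache.lookup(0x100, b"\x31\xc0\xc3")
```

The `full_compares` counter is only there so you can measure how often the expensive comparison actually runs against a real workload, which is the "try and see" part.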
> Martin
>
> PS - I'm a CS undergrad, but I'm game for it anyway :)

Well, that just means you can attack the halting problem with more
optimism than some of the other folks here ;-)

Actually, when people bring that up you just need to keep in mind that
it is theory. Important, but in engineering the solution just has to
meet the threshold of "good enough."