From: Lluís Vilanova
References: <150529642278.10902.18234057937634437857.stgit@frigg.lan>
 <150529666493.10902.14830445134051381968.stgit@frigg.lan>
 <87poasgjyh.fsf@frigg.lan> <87d16o53xr.fsf@frigg.lan>
 <87o9pywt8k.fsf@frigg.lan> <87shf5zlty.fsf@frigg.lan>
 <20170929175943.GA25038@flamenco>
Date: Sat, 30 Sep 2017 00:46:33 +0300
In-Reply-To: <20170929175943.GA25038@flamenco> (Emilio G. Cota's message of
 "Fri, 29 Sep 2017 13:59:43 -0400")
Message-ID: <87vak1w53a.fsf@frigg.lan>
Subject: Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation
To: "Emilio G. Cota"
Cc: Peter Maydell, QEMU Developers, Stefan Hajnoczi, Markus Armbruster

Emilio G Cota writes:

> On Fri, Sep 29, 2017 at 16:16:41 +0300, Lluís Vilanova wrote:
>> Lluís Vilanova writes:
>> [...]
>> > This was working on a much older version of instrumentation for QEMU, but I can
>> > implement something that does the first use-case point above and some filtering
>> > example (second use-case point) to see what's the performance difference.
>>
>> Ok, so here's some numbers for the discussion (booting Emilio's ARM full system
>> image that immediately shuts down):
>>
>> * Without instrumentation
>>
>>   real 0m10,099s
>>   user 0m9,876s
>>   sys  0m0,128s
>>
>> * Count number of memory access writes, by instrumenting only when they are
>>   executed
>>
>>   real 0m15,896s
>>   user 0m15,752s
>>   sys  0m0,108s
>>
>> * Counting same, but the filtering is done at translation time (i.e., not
>>   generate an execute-time callback if it's not a write)
>>
>>   real 0m11,084s
>>   user 0m10,880s
>>   sys  0m0,112s
>>
>> As Peter said, the filtering can be added into the API to take advantage of
>> this "speedup", without exposing translation vs execution time callbacks.

> I'm not sure I understand this concept of filtering. Are you saying that in
> the first case, all memory accesses are instrumented, and then in the
> "access helper" we only call the user's callback if it's a memory write?
> And in the second case, we simply just generate a "write helper" instead
> of an "access helper". Am I understanding this correctly?

In the first case (no filtering), the user callback is always called when a
memory access is *executed*, and the user then checks whether the access mode is
a write to decide whether to increment a counter.

In the second case (with filtering), a user callback is called when a memory
access is *translated*. If the access mode is a write, the user generates a call
to a second callback, which then runs every time that access is executed. The
call is generated only for memory writes, the ones we care about.

Is this clearer?

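The difference between the two cases can be illustrated with a toy model. This
is only a sketch under stated assumptions, not QEMU code and not the API from
this series: `MemAccess`, `Stats` and all function names are hypothetical, and
a translated block is modelled as a plain array of accesses that gets executed
`runs` times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical, not QEMU): one guest memory access in a block. */
typedef struct {
    bool is_write;
} MemAccess;

typedef struct {
    uint64_t writes;          /* counter the instrumenter cares about */
    uint64_t exec_callbacks;  /* execute-time callback invocations (the cost) */
} Stats;

/* No filtering: runs for every executed access and checks the mode itself. */
static void on_mem_exec_unfiltered(Stats *s, const MemAccess *a)
{
    s->exec_callbacks++;
    if (a->is_write) {
        s->writes++;
    }
}

/* With filtering: this execute-time callback is only generated for writes. */
static void on_write_exec(Stats *s)
{
    s->exec_callbacks++;
    s->writes++;
}

/* Case 1: every executed access triggers an execute-time callback. */
Stats count_writes_exec_time(const MemAccess *tb, size_t n, unsigned runs)
{
    Stats s = {0, 0};

    assert(tb != NULL || n == 0);
    for (unsigned r = 0; r < runs; r++) {
        for (size_t i = 0; i < n; i++) {
            on_mem_exec_unfiltered(&s, &tb[i]);
        }
    }
    return s;
}

/* Case 2: a translate-time pass runs once per access and emits an
 * execute-time call only for writes; execution then runs only those calls. */
Stats count_writes_translate_time(const MemAccess *tb, size_t n, unsigned runs)
{
    Stats s = {0, 0};
    size_t emitted = 0;

    assert(tb != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {       /* translation: once per access */
        if (tb[i].is_write) {
            emitted++;
        }
    }
    for (unsigned r = 0; r < runs; r++) {  /* execution: emitted calls only */
        for (size_t c = 0; c < emitted; c++) {
            on_write_exec(&s);
        }
    }
    return s;
}
```

For a block of five accesses of which two are writes, executed 100 times, both
functions report 200 writes, but the first makes 500 execute-time callbacks and
the second only 200. The mode check moves out of the hot path and into the
one-time translation pass.
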
>> * Counting number of executed instructions, by instrumenting the beginning of
>>   each one of them
>>
>>   real 0m24,583s
>>   user 0m24,352s
>>   sys  0m0,184s
>>
>> * Counting same, but per-TB numbers are collected at translation-time, and we
>>   only generate a per-TB execution time callback to add the corresponding number
>>   of instructions for that TB
>>
>>   real 0m11,151s
>>   user 0m10,952s
>>   sys  0m0,092s
>>
>> This really needs to expose translation vs execution time callbacks to take
>> advantage of this "speedup".

> Clearly instrumenting per-TB is a significant net gain. I think we should
> definitely allow instrumenters to use this option.

> FWIW my experiments so far show similar numbers for instrumenting each
> instruction (haven't done the per-tb yet). The difference is that I'm
> exposing to instrumenters a copy of the guest instructions (const void *data,
> size_t size). These copies are kept around until TB's are flushed.
> Luckily there seems to be very little overhead in keeping these around,
> apart from the memory overhead -- but in terms of performance, the
> necessary allocations do not induce significant overhead.

To keep this use-case simpler, I added the memory access API I posted in this
series, where instrumenters can read guest memory (more general than passing a
copy of the current instruction).


Cheers,
  Lluis
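
For reference, the per-TB instruction counting discussed above can be sketched
with a similar toy model. Again this is only an illustration under stated
assumptions, not QEMU code: `ToyTB`, `InsnStats` and both functions are
hypothetical names, and each TB is reduced to its instruction count plus the
number of times it is executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical, not QEMU): a translated block. */
typedef struct {
    unsigned n_insns;  /* guest instructions in the TB, known at translation */
    unsigned n_execs;  /* how many times the toy run executes this TB */
} ToyTB;

typedef struct {
    uint64_t insns;           /* executed guest instructions */
    uint64_t exec_callbacks;  /* execute-time callback invocations (the cost) */
} InsnStats;

/* Per-instruction instrumentation: one execute-time callback at the
 * beginning of every executed instruction. */
InsnStats count_insns_per_insn(const ToyTB *tbs, size_t n)
{
    InsnStats s = {0, 0};

    assert(tbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (unsigned e = 0; e < tbs[i].n_execs; e++) {
            for (unsigned k = 0; k < tbs[i].n_insns; k++) {
                s.exec_callbacks++;
                s.insns++;
            }
        }
    }
    return s;
}

/* Per-TB instrumentation: the instruction count is collected once at
 * translation time, and a single execute-time callback per TB execution
 * adds that count. */
InsnStats count_insns_per_tb(const ToyTB *tbs, size_t n)
{
    InsnStats s = {0, 0};

    assert(tbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned tb_insns = tbs[i].n_insns;  /* translation-time result */
        for (unsigned e = 0; e < tbs[i].n_execs; e++) {
            s.exec_callbacks++;
            s.insns += tb_insns;
        }
    }
    return s;
}
```

With two TBs, one of 10 instructions executed 1000 times and one of 3
instructions executed 50 times, both functions count 10150 instructions, but
the per-instruction variant makes 10150 execute-time callbacks while the
per-TB variant makes 1050.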