* [Qemu-devel] [RFC] Static instrumentation (aka guest code tracing)
From: Lluís @ 2010-08-03 21:42 UTC
To: qemu-devel; +Cc: Stefan Hajnoczi, Yufei Chen, Eduardo Cruz, Jun Koi

Ok, sorry for the delay. Here's a "report" on the current status. Please
comment if you feel that any decision has gone down the wrong path. Also,
if you send me patches I'll happily push them into the repository.


Quick status summary
--------------------

* minimal set of instrumentation points in place for testing (FETCH, VMEM)

* examples at ./backdoor/examples and ./instrument/examples

* code available at:
    https://projects.gso.ac.upc.edu/projects/qemu-instrument
    git clone https://code.gso.ac.upc.edu/git/qemu-instrument


How instrumentation currently works
-----------------------------------

Instrumentation points take the form of preprocessor macro calls. The user
defines each of these in a separate file (selected at configure time).

Each CPU has an "instrumentation state" variable that the host can change
dynamically (e.g., by defining a backdoor that calls the instrumentation
control API). The number of states is defined by the user.

The macros are called at code generation time, where the user can check
whether a specific instrumentation state is active on the current CPU
(assuming 'cpu_single_env' points to the CPU object that originated the
request for disassembling the current instruction).

Changing the instrumentation state triggers TB flushes, so that new
disassembly calls take the new instrumentation state into account.

As of now, all CPUs must have the same state (see below).


What is lacking
---------------

1) immediate end of TB on backdoor instruction

   I use backdoor instructions to control the instrumentation state from
   the guest, triggering a call to a host-side code helper associated with
   the backdoor instruction.
   For this to work when controlling instrumentation state, the
   disassembly of an instrumentation backdoor must immediately end the
   current TB. The problem is that calling 'end_eob' (i386) produces code
   that infinitely re-executes that backdoor instruction.

2) instrumenting i386 is extremely time-consuming (for the developer)

   As my work is not tied to a specific target architecture, I was
   thinking of shifting into PPC, as the ISA is pretty regular and that
   would certainly make the process easier by just patching a small set
   of places in the code.

3) per-CPU instrumentation state

   The goal is to achieve minimal performance impact when executing TBs:
   no instrumentation state checks when executing TBs (perform checks at
   TB generation time), and negligible performance impact when executing
   non-instrumented TBs (I still have to check that the modifications
   described below have no performance impact).

   The original idea was to expand the arrays holding TBs ('tbs' and
   'tb_phys_hash') into 2-dimensional arrays, where the first dimension
   would contain one entry for each possible instrumentation state. When
   a CPU looks up a TB, it is searched for (or added) in the array for the
   current state. If the CPU-specific state changes, 'tb_jmp_cache' is
   flushed and lookups will continue wherever they must according to the
   current state.

   It is still unclear whether PageDesc should also contain an array of
   'first_tb' or whether 'l1_map' should be 2-dimensional; I still have to
   look into that code in more detail to see the feasibility and
   performance costs of each one.

4) KVM

   I've performed tests only on i386-linux-user, but backdoor
   instructions and calls to the instrumentation control API should
   switch from KVM to softmmu (and disabling all instrumentation should
   jump back to KVM). I don't really know what would happen right now.
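The two-dimensional lookup from item 3 can be pictured roughly as follows. This is only a minimal sketch, not the code in the repository: the types `TB` and `CPU`, the table `tb_hash`, and the functions `tb_insert`, `tb_lookup` and `cpu_set_instr_state` are hypothetical stand-ins for the roles played by 'tb_phys_hash' and 'tb_jmp_cache', and the table sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and sizes, for illustration only. */
#define INSTR_STATES   2    /* user-defined number of instrumentation states */
#define HASH_SIZE      256
#define JMP_CACHE_SIZE 64

typedef struct TB {
    uint64_t pc;            /* guest address the block was translated from */
    struct TB *hash_next;   /* chaining inside one hash bucket */
} TB;

typedef struct CPU {
    unsigned instr_state;            /* current instrumentation state */
    TB *jmp_cache[JMP_CACHE_SIZE];   /* per-CPU fast path */
} CPU;

/* First dimension: instrumentation state. Second: the usual hash bucket. */
static TB *tb_hash[INSTR_STATES][HASH_SIZE];

static unsigned hash_pc(uint64_t pc)
{
    return (unsigned)(pc % HASH_SIZE);
}

/* Register a TB that was generated under the given state. */
static void tb_insert(unsigned state, TB *tb)
{
    assert(state < INSTR_STATES);
    unsigned h = hash_pc(tb->pc);
    tb->hash_next = tb_hash[state][h];
    tb_hash[state][h] = tb;
}

/* Look up a TB for this CPU; only the table of its current state is used. */
static TB *tb_lookup(CPU *cpu, uint64_t pc)
{
    unsigned slot = (unsigned)(pc % JMP_CACHE_SIZE);
    TB *tb = cpu->jmp_cache[slot];

    if (tb != NULL && tb->pc == pc) {
        return tb;
    }
    for (tb = tb_hash[cpu->instr_state][hash_pc(pc)]; tb; tb = tb->hash_next) {
        if (tb->pc == pc) {
            cpu->jmp_cache[slot] = tb;
            return tb;
        }
    }
    return NULL;   /* the caller would translate under the current state */
}

/* Changing the state only flushes this CPU's jump cache. */
static void cpu_set_instr_state(CPU *cpu, unsigned state)
{
    assert(state < INSTR_STATES);
    if (cpu->instr_state != state) {
        cpu->instr_state = state;
        memset(cpu->jmp_cache, 0, sizeof(cpu->jmp_cache));
    }
}
```

With this layout no state check is executed inside a TB; the only extra cost on the lookup path is one additional array index. Whether the same holds once PageDesc and 'l1_map' are involved is exactly the open question above.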

What needs to be decided
------------------------

1) instrumentation points

   Which static instrumentation points must be present, and which
   arguments they should have in order to provide a target-agnostic
   interface.

   The current example points are:

       FETCH(vaddress, size, used_registers, defined_registers)
       VMEM(vaddress, size, read_or_write)

2) instrumentation from code helpers

   A second set of calls to user-provided macros might be unavoidable in
   order to instrument from code helpers, as these must not generate
   code, but call the user-provided instrumentation code helpers.

   Another option would be to re-define INSTR_GEN_* into a plain
   function call to the user macro.

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth
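As a rough illustration of how the FETCH and VMEM points from this message could be gated on the instrumentation state, here is a minimal sketch. Everything in it is an assumption made for the example: the macro names `INSTR_GEN_FETCH` and `INSTR_GEN_VMEM` are merely guessed from the INSTR_GEN_* pattern mentioned above, `cpu_single_env_sketch` stands in for 'cpu_single_env', and the user callbacks only count events, whereas a real implementation would emit TCG code calling a helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU object behind 'cpu_single_env'. */
typedef struct CPUSketch {
    unsigned instr_state;
} CPUSketch;

static CPUSketch *cpu_single_env_sketch;

/* User-defined instrumentation states (the user chooses how many). */
enum { INSTR_STATE_OFF, INSTR_STATE_TRACE, INSTR_STATE_COUNT };
enum { VMEM_READ, VMEM_WRITE };

/* User-provided side of the interface; here it only records events. */
static unsigned fetch_events, vmem_events;
static uint64_t last_fetch_vaddr;

static void user_fetch(uint64_t vaddr, unsigned size,
                       uint32_t used_regs, uint32_t defined_regs)
{
    (void)size; (void)used_regs; (void)defined_regs;
    fetch_events++;
    last_fetch_vaddr = vaddr;
}

static void user_vmem(uint64_t vaddr, unsigned size, int read_or_write)
{
    (void)vaddr; (void)size; (void)read_or_write;
    vmem_events++;
}

/* Checked at code generation time, never while a TB executes. */
static int instr_state_is(unsigned state)
{
    assert(state < INSTR_STATE_COUNT);
    return cpu_single_env_sketch->instr_state == state;
}

/* The macros live in the user-selected file and are called by the decoder. */
#define INSTR_GEN_FETCH(vaddr, size, used, defined)               \
    do {                                                          \
        if (instr_state_is(INSTR_STATE_TRACE)) {                  \
            user_fetch((vaddr), (size), (used), (defined));       \
        }                                                         \
    } while (0)

#define INSTR_GEN_VMEM(vaddr, size, rw)                           \
    do {                                                          \
        if (instr_state_is(INSTR_STATE_TRACE)) {                  \
            user_vmem((vaddr), (size), (rw));                     \
        }                                                         \
    } while (0)
```

Because the check happens when the macro is expanded in the translator, a TB generated in the "off" state contains no trace of the instrumentation, which is why a state change has to flush TBs.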
* Re: [Qemu-devel] [RFC] Static instrumentation (aka guest code tracing)
From: Paul Brook @ 2010-11-26 19:06 UTC
To: qemu-devel; +Cc: Stefan Hajnoczi, Yufei Chen, Lluís, Eduardo Cruz, Jun Koi

> 2) instrumenting i386 is extremely time-consuming (for the developer)
>
>    As my work is not tied to a specific target architecture, I was thinking of
>    shifting into PPC, as the ISA is pretty regular and that would certainly
>    make the process easier by just patching a small set of places in the
>    code.
>
>...
> The current example points are:
>
>       FETCH(vaddress, size, used_registers, defined_registers)

Duplicating the insn decoder to determine which registers are accessed is not
a maintainable solution.

Likewise requiring separate tracing hooks be added to the existing decoders is
extremely unlikely to be a feasible long-term solution.

Any solution that tries to separate CPU instrumentation/tracing from code
generation is IMO fundamentally flawed and will rapidly bitrot beyond
usefulness.

I'd also posit that instrumenting changes in state is of very limited use if
you don't know what the new value is.

You almost certainly want to do this using the equivalent of a memory
watchpoint on the CPUState structure.

Paul
* Re: [Qemu-devel] [RFC] Static instrumentation (aka guest code tracing)
From: Lluís @ 2010-11-26 20:19 UTC
To: Paul Brook; +Cc: Stefan Hajnoczi, Yufei Chen, qemu-devel, Eduardo Cruz, Jun Koi

Paul Brook writes:

>> 2) instrumenting i386 is extremely time-consuming (for the developer)
>>
>> As my work is not tied to a specific target architecture, I was thinking of
>> shifting into PPC, as the ISA is pretty regular and that would certainly
>> make the process easier by just patching a small set of places in the
>> code.
>>
>> ...
>> The current example points are:
>>
>> FETCH(vaddress, size, used_registers, defined_registers)

> Duplicating the insn decoder to determine which registers are accessed is not
> a maintainable solution.

Right. On ISAs like PowerPC this can be solved much more easily, but on x86
the implementation was based on manually searching for all uses of the
register arrays, and adding the required call to set/define register.

Instead I could "hide" these structures in CPUState (not that I can really do
that in C), and provide two accessors that will do the job instead.

> Likewise requiring separate tracing hooks be added to the existing
> decoders is extremely unlikely to be a feasible long-term
> solution.

You mean having to modify each "translate.c"? The worst event to handle is
instruction fetch on x86.

Memory accesses are already automatically handled by simply including a header
that wraps the tcg_gen_qemu_ld/st functions, and other events like privilege
level change are very localized, so bitrotting is much harder there.

> Any solution that tries to separate CPU instrumentation/tracing
> from code generation is IMO fundamentally flawed and will rapidly
> bitrot beyond usefulness.

That's fundamentally correct, but I think only for certain events and
architectures.
As I said, this could be solved by forcing the programmer to use some
well-known interface for accessing, e.g., registers.

> I'd also posit that instrumenting changes in state is of very limited use if
> you don't know what the new value is.

I don't understand what you mean here.

> You almost certainly want to do this using the equivalent of a memory
> watchpoint on the CPUState structure.

Sorry, do what?

Thanks,
  Lluis
* Re: [Qemu-devel] [RFC] Static instrumentation (aka guest code tracing)
From: Paul Brook @ 2010-11-26 21:33 UTC
To: Lluís; +Cc: Stefan Hajnoczi, Yufei Chen, qemu-devel, Eduardo Cruz, Jun Koi

> > Likewise requiring separate tracing hooks be added to the existing
> > decoders is extremely unlikely to be a feasible long-term
> > solution.
>
> You mean having to modify each "translate.c"? The worst event to handle
> is instruction fetch on x86.

Instruction fetches are trivial, you just intercept calls to ld*_code.

> > I'd also posit that instrumenting changes in state is of very limited use
> > if you don't know what the new value is.
>
> I don't understand what you mean here.

Your proposed FETCH macro instrumented which registers are modified by an
insn, but did not provide the actual values about to be written to those
registers.

> > You almost certainly want to do this using the equivalent of a memory
> > watchpoint on the CPUState structure.
>
> Sorry, do what?

All guest register values are held in the CPUState structure. So to instrument
accesses to guest state you just need to intercept TCG accesses to this
structure, either via explicit ld/st ops, or via a global_mem. To a first
approximation you can probably get away with just the latter.

Paul
* Re: [Qemu-devel] [RFC] Static instrumentation (aka guest code tracing)
From: Lluís @ 2010-11-29 15:04 UTC
To: Paul Brook; +Cc: Stefan Hajnoczi, Yufei Chen, qemu-devel, Eduardo Cruz, Jun Koi

Paul Brook writes:

>> > Likewise requiring separate tracing hooks be added to the existing
>> > decoders is extremely unlikely to be a feasible long-term
>> > solution.
>>
>> You mean having to modify each "translate.c"? The worst event to handle
>> is instruction fetch on x86.

> Instruction fetches are trivial, you just intercept calls to ld*_code.

For instruction fetch I mean:

* instruction address and length
* instruction opcode "type" (I'd like to have some kind of a rough generic
  opcode type: ALU, branch, etc.)
* set of used/defined registers

For the loads, ld*_code is not accurate, as the code might call it multiple
times, even on positions after the actual instruction, or repeat calls for the
same position. It's much easier (I think) to do the kind of trivial math I do
on x86 with "s->pc".

>> > I'd also posit that instrumenting changes in state is of very limited use
>> > if you don't know what the new value is.
>>
>> I don't understand what you mean here.

> Your proposed FETCH macro instrumented which registers are modified by an
> insn, but did not provide the actual values about to be written to those
> registers.

Right. Register values are not part of what I was looking for.

Still, with an accurate set of used/defined registers, you could be given the
option of gathering their values at "commit" time by, e.g., adding a new
tracing event after generating code for an instruction, with a CPUState
argument.

The combinations of data you might want to gather from guest code are nearly
infinite, and that's why I wanted to provide the minimum set of information
that, when you enable instrumentation of tracing events, will let you gather
all the rest (like values).
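To make the "s->pc" point above concrete, here is a toy sketch of why counting code loads does not give the instruction length while the program-counter difference does. The decoder is invented for the example and only knows two opcodes that happen to match x86 encodings (0x90, one byte; 0xB8, one byte plus a 32-bit immediate); `DisasSketch`, `ld8_code`, `ld32_code` and `decode_one` are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder context; 'pc' plays the role of "s->pc". */
typedef struct DisasSketch {
    const uint8_t *code;   /* guest code bytes */
    uint64_t base;         /* guest address of code[0] */
    uint64_t pc;           /* address of the next byte to decode */
    unsigned ld_calls;     /* how many code loads were issued */
} DisasSketch;

/* Stand-ins for the ld*_code family: each call is one "fetch event". */
static uint8_t ld8_code(DisasSketch *s, uint64_t addr)
{
    s->ld_calls++;
    return s->code[addr - s->base];
}

static uint32_t ld32_code(DisasSketch *s, uint64_t addr)
{
    uint32_t value;
    s->ld_calls++;
    memcpy(&value, &s->code[addr - s->base], sizeof(value));
    return value;
}

/* Decode one instruction and return its length in bytes. */
static unsigned decode_one(DisasSketch *s)
{
    uint64_t pc_start = s->pc;
    uint8_t opcode = ld8_code(s, s->pc++);

    if (opcode == 0xB8) {              /* opcode followed by a 32-bit immediate */
        (void)ld32_code(s, s->pc);
        s->pc += 4;
    }
    /* Length is derived from the program counter, not from load counts. */
    return (unsigned)(s->pc - pc_start);
}
```

For the 5-byte instruction the decoder issues two loads of different widths, so neither the number of calls nor any single call describes the instruction; the difference between the final and initial pc does.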

>> > You almost certainly want to do this using the equivalent of a memory
>> > watchpoint on the CPUState structure.
>>
>> Sorry, do what?

> All guest register values are held in the CPUState structure. So to instrument
> accesses to guest state you just need to intercept TCG accesses to this
> structure, either via explicit ld/st ops, or via a global_mem. To a first
> approximation you can probably get away with just the latter.

I think I understand what you mean, and it would certainly simplify the
implementation, but it depends on being able to efficiently identify, among
all memory accesses, those directed to the interesting CPUState fields. That
is easy when using functions like "tcg_gen_ld/st_i64", but will get almost
impossible when the code generator starts using multiple TCG operations to
calculate the address of a single access to a CPUState field.

But indeed this would be much easier to maintain for the common case,
although probably slower:

- you have to register all the possible TCGv_ptr that can point to a CPUState
  (if you use something else, it will no longer work, although I don't know if
  there is any such case anywhere)

- register the "interesting" offsets (which can be multiple and
  non-consecutive: general-purpose registers, control registers in x86, mtrr,
  etc.)

- "decode" from those offsets which specific field is being accessed

This is supposing that CPUState field access detection is embedded into
"tcg-op.h", so that I have all the info after translating, not during
translated code execution (as then I would be unable to have all the fetch
info before actually executing translated code).

Still, if this is what you meant, I think it's way better than the
time-consuming and easily bitrotting task that I did.

The only drawback is that it would force all targets to produce the fetch
event after the whole instruction translation, and thus be forced to do the
x86 trick of moving buffers at the end of the instruction translation.
I hoped that this would be necessary only for x86, and that other
architectures would let me do it more easily before starting the real
translation (e.g., using the translation tables found in PowerPC).

Thanks,
  Lluis
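The "register the interesting offsets, then decode them" steps from this message might look roughly like the sketch below. It is not QEMU code: `CPUStateSketch` is an invented layout, and `FieldRange`, `decode_offset` and `field_index` are hypothetical helpers. It also assumes the simple case where the offset into the structure is a known constant at translation time, which is precisely the case that explicit ld/st ops on the CPUState pointer provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Invented layout; a real CPUState is far larger and target-specific. */
typedef struct CPUStateSketch {
    uint64_t regs[8];    /* general-purpose registers */
    uint64_t eip;        /* deliberately NOT registered as interesting */
    uint64_t cr[5];      /* control registers */
    uint32_t hflags;
} CPUStateSketch;

/* One registered, possibly non-adjacent, range of interesting offsets. */
typedef struct FieldRange {
    size_t start;        /* first byte offset of the field */
    size_t end;          /* one past the last byte offset */
    size_t elem_size;    /* size of one array element */
    const char *name;
} FieldRange;

#define FIELD_RANGE(field, elem_type)                             \
    { offsetof(CPUStateSketch, field),                            \
      offsetof(CPUStateSketch, field) +                           \
          sizeof(((CPUStateSketch *)0)->field),                   \
      sizeof(elem_type), #field }

static const FieldRange interesting[] = {
    FIELD_RANGE(regs, uint64_t),
    FIELD_RANGE(cr, uint64_t),
};

/* Map the constant offset of a ld/st op to a registered field, if any. */
static const FieldRange *decode_offset(size_t offset)
{
    size_t count = sizeof(interesting) / sizeof(interesting[0]);
    for (size_t i = 0; i < count; i++) {
        if (offset >= interesting[i].start && offset < interesting[i].end) {
            return &interesting[i];
        }
    }
    return NULL;   /* access to a field nobody asked to instrument */
}

/* Which array element inside the field does this offset touch? */
static size_t field_index(const FieldRange *range, size_t offset)
{
    assert(offset >= range->start && offset < range->end);
    return (offset - range->start) / range->elem_size;
}
```

A linear scan is enough for a handful of ranges; with many registered fields a sorted table with binary search would be the obvious refinement. None of this helps when the address is computed by several TCG operations, which is the limitation raised above.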