qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/5] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
@ 2016-09-15 15:50 Lluís Vilanova
From: Lluís Vilanova @ 2016-09-15 15:50 UTC
  To: qemu-devel; +Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi

Avoids generating TCG code to call a guest code tracing event on vCPUs that are
not dynamically tracing that event.

Currently, every event with the 'tcg' property generates TCG code to trace that
event at guest code execution time, which is when its dynamic tracing state is
checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (e.g., tracing
  the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series replicates the shared physical TB cache,
creating a separate physical TB cache for every combination of event states
(those with the 'vcpu' and 'tcg' properties). Then, all vCPUs tracing the same
events will use the same physical TB cache.
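
As a rough illustration of the layout described above (not code from the
patches), the sketch below keeps one physical TB cache per combination of event
states and selects it from a per-vCPU bitmask. All names and sizes
(NUM_TCG_VCPU_EVENTS, TBCache, VCPU, vcpu_tb_cache) are hypothetical and chosen
only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and sizes; none of this is taken from the series. */
#define NUM_TCG_VCPU_EVENTS 3                      /* events with 'tcg' and 'vcpu' */
#define NUM_TB_CACHES (1u << NUM_TCG_VCPU_EVENTS)  /* one cache per state combination */
#define TB_CACHE_SIZE 1024                         /* slots per cache */

typedef struct TB TB;   /* opaque stand-in for a translated block */

/* A physical TB cache is modelled as a plain array of TB pointers. */
typedef struct TBCache {
    TB *slots[TB_CACHE_SIZE];
} TBCache;

typedef struct VCPU {
    uint32_t trace_state;   /* bit i set: event i is traced on this vCPU */
} VCPU;

/* Only these arrays of pointers are replicated, once per combination. */
static TBCache tb_caches[NUM_TB_CACHES];

/* vCPUs with the same event-state bitmask get the same cache. */
static TBCache *vcpu_tb_cache(const VCPU *cpu)
{
    assert(cpu->trace_state < NUM_TB_CACHES);
    return &tb_caches[cpu->trace_state];
}
```

In this model, two vCPUs tracing the same set of events resolve to the same
TBCache, while a vCPU tracing nothing always uses tb_caches[0].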

Sharing physical TBs makes this very space efficient, since only the physical TB
caches (simple arrays of pointers) are replicated. Sharing physical TB caches
maximizes TB reuse across vCPUs whenever possible, and makes dynamic event state
changes more efficient (a vCPU simply switches to a different TB array).

The physical TB cache array is indexed with the vCPU's trace event state
bitmask. This is simpler and more efficient than emitting TCG code to check
whether an event needs tracing; with such a check, we would still have to either
move the tracing call code to a cold path (making tracing performance worse) or
leave it inlined (making non-tracing performance worse).
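
To make the trade-off concrete, here is a toy model (an assumption-laden sketch,
not QEMU's translator) where the decision to emit the tracing call is taken once
at translation time from the event-state bitmask, so the generated block carries
no run-time check. ToyTB, toy_translate and toy_set_event are hypothetical names
used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; real TCG ops and TBs look nothing like this. */
enum { OP_GUEST_INSN, OP_TRACE_CALL };

typedef struct ToyTB {
    int ops[4];
    int n_ops;
} ToyTB;

/*
 * "Translate" one guest instruction for a given event-state bitmask.
 * The tracing call is emitted only if the event is enabled in that state;
 * otherwise the block contains neither the call nor a check for it.
 */
static ToyTB toy_translate(uint32_t trace_state, unsigned event_bit)
{
    ToyTB tb = { .n_ops = 0 };
    if (trace_state & (1u << event_bit)) {
        tb.ops[tb.n_ops++] = OP_TRACE_CALL;
    }
    tb.ops[tb.n_ops++] = OP_GUEST_INSN;
    return tb;
}

/*
 * Changing an event's dynamic state yields a new bitmask, which in the
 * scheme described above would select a different TB cache.
 */
static uint32_t toy_set_event(uint32_t trace_state, unsigned event_bit,
                              bool enable)
{
    uint32_t mask = 1u << event_bit;
    return enable ? (trace_state | mask) : (trace_state & ~mask);
}
```

A block translated under state 0 holds only the guest instruction; after
toy_set_event() enables the event, blocks translated under the new state also
hold the tracing call, and blocks already translated under the old state stay
valid for vCPUs still in that state.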

It is also more efficient than eliding TCG code only when *zero* vCPUs are
tracing an event, since with that approach enabling the event on a single vCPU
would impact the performance of all other vCPUs that are not tracing it.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---

Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].


Lluís Vilanova (5):
      exec: [tcg] Refactor flush of per-CPU virtual TB cache
      exec: [tcg] Use multiple physical TB caches
      exec: [tcg] Switch physical TB cache based on vCPU tracing state
      trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
      trace: [tcg,trivial] Re-align generated code


 cpu-exec.c                               |   11 ++++
 cputlb.c                                 |    2 -
 include/exec/exec-all.h                  |   12 ++++
 include/exec/tb-context.h                |    2 -
 include/qom/cpu.h                        |    4 +
 qom/cpu.c                                |    1 
 scripts/tracetool/backend/dtrace.py      |    2 -
 scripts/tracetool/backend/ftrace.py      |   20 ++++---
 scripts/tracetool/backend/log.py         |   16 +++---
 scripts/tracetool/backend/simple.py      |    2 -
 scripts/tracetool/backend/syslog.py      |    6 +-
 scripts/tracetool/backend/ust.py         |    2 -
 scripts/tracetool/format/h.py            |   23 ++++++--
 scripts/tracetool/format/tcg_h.py        |   20 ++++++-
 scripts/tracetool/format/tcg_helper_c.py |    3 +
 trace/control-target.c                   |    2 +
 trace/control.h                          |    3 +
 translate-all.c                          |   83 ++++++++++++++++++++++++++----
 translate-all.h                          |   43 ++++++++++++++++
 translate-all.inc.h                      |   13 +++++
 20 files changed, 221 insertions(+), 49 deletions(-)
 create mode 100644 translate-all.inc.h


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Eric Blake <eblake@redhat.com>

Thread overview: 9+ messages
2016-09-15 15:50 [Qemu-devel] [PATCH v2 0/5] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
2016-09-15 15:50 ` [Qemu-devel] [PATCH v2 1/5] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
2016-09-15 15:50 ` [Qemu-devel] [PATCH v2 2/5] exec: [tcg] Use multiple physical TB caches Lluís Vilanova
2016-09-15 15:50 ` [Qemu-devel] [PATCH v2 3/5] exec: [tcg] Switch physical TB cache based on vCPU tracing state Lluís Vilanova
2016-09-15 15:50 ` [Qemu-devel] [PATCH v2 4/5] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events Lluís Vilanova
2016-09-26 13:41   ` Stefan Hajnoczi
2016-09-26 16:12     ` Lluís Vilanova
2016-09-15 15:51 ` [Qemu-devel] [PATCH v2 5/5] trace: [tcg, trivial] Re-align generated code Lluís Vilanova
2016-09-26 13:37 ` [Qemu-devel] [PATCH v2 0/5] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Stefan Hajnoczi
