* [Qemu-devel] [PATCH 0/3] per-TLB lock
@ 2018-10-02 21:29 Emilio G. Cota
From: Emilio G. Cota @ 2018-10-02 21:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson, Alex Bennée

This series introduces a per-TLB lock. This removes existing undefined
behavior (UB), e.g. the memset done while flushing racing with a cmpxchg
on the same entries from another thread, and in my opinion makes the TLB
code simpler to understand.
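A minimal sketch of the idea, assuming a toy TLB: ToyTLB, ToyTLBEntry, TOY_TLB_NOTDIRTY and the toy_tlb_* functions are hypothetical names, not QEMU's, and a plain POSIX mutex stands in for the series' lock. The point is that the flushing memset and a cross-thread read-modify-write of an entry take the same per-TLB lock, so they can no longer overlap.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified TLB; not QEMU's CPUTLBEntry/CPUArchState. */
#define TOY_TLB_SIZE 256
#define TOY_TLB_NOTDIRTY ((uintptr_t)1 << 4) /* hypothetical flag bit */

typedef struct {
    uintptr_t addr_read;
    uintptr_t addr_write;
    uintptr_t addr_code;
} ToyTLBEntry;

typedef struct {
    pthread_mutex_t lock;            /* one lock per TLB, i.e. per vCPU */
    ToyTLBEntry table[TOY_TLB_SIZE];
} ToyTLB;

void toy_tlb_init(ToyTLB *tlb)
{
    int rc = pthread_mutex_init(&tlb->lock, NULL);
    assert(rc == 0);
    (void)rc;
    /* All-ones bytes mark every entry as invalid. */
    memset(tlb->table, 0xff, sizeof(tlb->table));
}

/* Flush: wipe every entry while holding the lock, so no other thread can
 * be halfway through modifying an entry while memset overwrites it. */
void toy_tlb_flush(ToyTLB *tlb)
{
    pthread_mutex_lock(&tlb->lock);
    memset(tlb->table, 0xff, sizeof(tlb->table));
    pthread_mutex_unlock(&tlb->lock);
}

/* Cross-thread update, e.g. clearing a flag bit in addr_write. Taking the
 * same lock serializes this read-modify-write with toy_tlb_flush(). */
void toy_tlb_clear_notdirty(ToyTLB *tlb, size_t idx)
{
    assert(idx < TOY_TLB_SIZE);
    pthread_mutex_lock(&tlb->lock);
    tlb->table[idx].addr_write &= ~TOY_TLB_NOTDIRTY;
    pthread_mutex_unlock(&tlb->lock);
}
```

Because each vCPU's TLB has its own lock, threads operating on different TLBs never contend with each other.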

I had a bit of trouble finding the best place to initialize the
mutex, since it must be initialized before the first tlb_flush, and
tlb_flush is called quite early during CPU initialization. I settled on
cpu_exec_realizefn, since by then cpu->env_ptr has been set but
tlb_flush has not yet been called.
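The ordering constraint can be sketched as follows. ToyCPU, ToyCPUState, toy_tlb_init, toy_tlb_flush and toy_cpu_realize are hypothetical stand-ins for cpu->env_ptr, the lock initialization, tlb_flush and cpu_exec_realizefn. The tlb_lock_ready flag is an illustrative debug aid (it assumes the state starts zero-filled): it turns a flush-before-init ordering bug into an assertion failure instead of a use of an uninitialized mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    pthread_mutex_t tlb_lock;
    bool tlb_lock_ready;      /* set once tlb_lock has been initialized;
                                 assumes the struct starts zero-filled */
    unsigned long tlb[64];    /* toy TLB contents */
} ToyCPUState;

typedef struct {
    ToyCPUState *env_ptr;
} ToyCPU;

/* Needs cpu->env_ptr, so it can only run once env_ptr has been set. */
void toy_tlb_init(ToyCPU *cpu)
{
    ToyCPUState *env = cpu->env_ptr;
    int rc = pthread_mutex_init(&env->tlb_lock, NULL);
    assert(rc == 0);
    (void)rc;
    env->tlb_lock_ready = true;
}

void toy_tlb_flush(ToyCPU *cpu)
{
    ToyCPUState *env = cpu->env_ptr;
    assert(env->tlb_lock_ready);  /* flushing before init would be a bug */
    pthread_mutex_lock(&env->tlb_lock);
    memset(env->tlb, 0xff, sizeof(env->tlb));
    pthread_mutex_unlock(&env->tlb_lock);
}

/* Realize step: env_ptr is valid here and no flush has happened yet,
 * so this is where the lock gets created. */
void toy_cpu_realize(ToyCPU *cpu, ToyCPUState *env)
{
    cpu->env_ptr = env;
    toy_tlb_init(cpu);
    toy_tlb_flush(cpu);           /* first flush: the lock already exists */
}
```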

Performance-wise, this change does have a small cost: a ~2% slowdown
on the aarch64 bootup+shutdown test, of which 1.2% comes from
consistently using atomic_read. I think this is a fair price for
avoiding UB. Numbers below.

Initially I tried using atomics instead of memset for flushing (i.e.
no mutex), but the slowdown was close to 2x due to the repeated
(full) memory barriers. That's when I turned to using a lock.
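The two flushing strategies can be contrasted in a sketch, assuming the lock-free attempt used one sequentially consistent store per field (the message does not say which atomics were used). On x86, a sequentially consistent store is typically compiled to an xchg, or a mov followed by mfence, i.e. a full barrier per store; the locked version instead pays for one lock/unlock pair and then does a plain memset. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
} ToyTLBEntry;

/* Lock-free flush: 3 * n sequentially consistent stores, each typically
 * a full memory barrier on x86. */
void toy_flush_atomic(ToyTLBEntry *table, size_t n)
{
    assert(n == 0 || table != NULL);
    for (size_t i = 0; i < n; i++) {
        __atomic_store_n(&table[i].addr_read, UINT64_MAX, __ATOMIC_SEQ_CST);
        __atomic_store_n(&table[i].addr_write, UINT64_MAX, __ATOMIC_SEQ_CST);
        __atomic_store_n(&table[i].addr_code, UINT64_MAX, __ATOMIC_SEQ_CST);
    }
}

/* Locked flush: one lock/unlock pair, then a plain memset with no
 * per-store barriers. */
void toy_flush_locked(pthread_mutex_t *lock, ToyTLBEntry *table, size_t n)
{
    assert(n == 0 || table != NULL);
    pthread_mutex_lock(lock);
    memset(table, 0xff, n * sizeof(*table));
    pthread_mutex_unlock(lock);
}
```

Both leave the table in the same all-ones state; they differ only in how many barriers the flush executes.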

Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

- Before this series:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7464.797838      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.14% )
    31,473,652,436      cycles                    #    4.216 GHz                      ( +-  0.14% )
    57,032,288,549      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,239,975,873      branches                  # 1371.769 M/sec                    ( +-  0.07% )
       172,150,358      branch-misses             #    1.68% of all branches          ( +-  0.12% )

       7.482009203 seconds time elapsed                                          ( +-  0.18% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7621.625434      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.10% )
    32,149,898,976      cycles                    #    4.218 GHz                      ( +-  0.10% )
    58,168,454,452      instructions              #    1.81  insns per cycle          ( +-  0.10% )
    10,486,183,612      branches                  # 1375.846 M/sec                    ( +-  0.10% )
       173,900,633      branch-misses             #    1.66% of all branches          ( +-  0.11% )

       7.632067213 seconds time elapsed                                          ( +-  0.10% )
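The headline figure can be checked directly from the perf output: the relative change is (after - before) / before. The small helper below (hypothetical names; the counts are copied from the two runs above) computes it for each counter.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from before to after, in percent. */
double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Counters copied from the "Before this series" and "After" runs. */
static const struct {
    const char *name;
    double before, after;
} perf_rows[] = {
    { "task-clock (msec)", 7464.797838,   7621.625434   },
    { "cycles",            31473652436.0, 32149898976.0 },
    { "instructions",      57032288549.0, 58168454452.0 },
    { "branches",          10239975873.0, 10486183612.0 },
    { "elapsed (s)",       7.482009203,   7.632067213   },
};

void print_perf_deltas(void)
{
    for (size_t i = 0; i < sizeof(perf_rows) / sizeof(perf_rows[0]); i++)
        printf("%-18s %+6.2f%%\n", perf_rows[i].name,
               pct_change(perf_rows[i].before, perf_rows[i].after));
}
```

On these numbers, elapsed time and instructions both come out near +2.0% and cycles and task-clock near +2.1%, while IPC stays at 1.81 in both runs, so the slowdown tracks the extra instructions executed.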

This series is checkpatch-clean. You can fetch the code from:
  https://github.com/cota/qemu/tree/tlb-lock

Thanks,

		Emilio



Thread overview: 12+ messages
2018-10-02 21:29 [Qemu-devel] [PATCH 0/3] per-TLB lock Emilio G. Cota
2018-10-02 21:29 ` [Qemu-devel] [PATCH 1/3] exec: introduce tlb_init Emilio G. Cota
2018-10-02 21:29 ` [Qemu-devel] [PATCH 2/3] cputlb: serialize tlb updates with env->tlb_lock Emilio G. Cota
2018-10-03  9:19   ` Alex Bennée
2018-10-03 10:02     ` Paolo Bonzini
2018-10-03 15:48       ` Emilio G. Cota
2018-10-03 15:52         ` Paolo Bonzini
2018-10-03 17:02           ` Emilio G. Cota
2018-10-03 17:05             ` Paolo Bonzini
2018-10-03 18:07               ` Emilio G. Cota
2018-10-02 21:29 ` [Qemu-devel] [PATCH 3/3] cputlb: read CPUTLBEntry.addr_write atomically Emilio G. Cota
2018-10-03  7:56 ` [Qemu-devel] [PATCH 0/3] per-TLB lock Paolo Bonzini
