Re: [Qemu-devel] RFC Multi-threaded TCG design document

From: Frederic Konrad <fred.konrad@greensocs.com>
To: Mark Burton <mark.burton@greensocs.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: mttcg@greensocs.com, Peter Maydell <peter.maydell@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Alexander Graf <agraf@suse.de>,
	Guillaume Delbergue <guillaume.delbergue@greensocs.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Alex Bennée <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] RFC Multi-threaded TCG design document
Date: Wed, 17 Jun 2015 23:45:52 +0200
Message-ID: <5581EA90.5020004@greensocs.com>
In-Reply-To: <63D89881-446B-4523-B877-D2110E361345@greensocs.com>

On 17/06/2015 20:23, Mark Burton wrote:
>> On 17 Jun 2015, at 18:57, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>
>> * Alex Bennée (alex.bennee@linaro.org) wrote:
>>> Hi,
>>> Shared Data Structures
>>> ======================
>>>
>>> Global TCG State
>>> ----------------
>>>
>>> We need to protect the entire code generation cycle including any post
>>> generation patching of the translated code. This also implies a shared
>>> translation buffer which contains code running on all cores. Any
>>> execution path that comes to the main run loop will need to hold a
>>> mutex for code generation. This also includes times when we need to
>>> flush code or jumps from the tb_cache.
>>>
>>> DESIGN REQUIREMENT: Add locking around all code generation, patching
>>> and jump cache modification
>> I don't think that you require a shared translation buffer between
>> cores to do this - although it *might* be the easiest way.
>> You could have a per-core translation buffer, the only requirement is
>> that most invalidation operations happen on all the buffers
>> (although that might depend on the emulated architecture).
>> With a per-core translation buffer, each core could generate new translations
>> without locking the other cores as long as no one is doing invalidations.
> I agree it’s not a design requirement - however we’ve kind of gone round this loop in terms of getting things to work.
> Fred will doubtless fill in some details, but basically it looks like making TCG able to run several code generators in parallel is a nightmare. We seem to get reasonable performance with just one CPU at a time generating TBs. At the same time, of course, the way QEMU is constructed there are actually several ‘layers’ of buffer - from the CPU-local ones through to the TB ‘pool’. So, by accident or design, we benefit from a sort of caching structure.
>
True, it seems to be very complex, at least on ARM, because of the
disassembly context etc. On the other hand, invalidation might be easier,
I guess.
As for performance, I'm not sure which approach is better.
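
To make the "one CPU at a time generating TBs" idea concrete, here is a
minimal sketch of a shared translation cache guarded by a single mutex. It is
purely illustrative and is not taken from the actual patches: every name in
it (ToyTB, toy_tb_find, toy_tb_flush and so on) is hypothetical, the "code
generation" is a stand-in that only records the guest PC, and the cache is a
simple direct-mapped array.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not QEMU's real TB cache. */
#define TOY_TB_SLOTS 256

typedef struct {
    uint64_t pc;    /* guest PC the block was translated from */
    int valid;      /* cleared when the block is flushed */
} ToyTB;

static ToyTB toy_tb_cache[TOY_TB_SLOTS];
static pthread_mutex_t toy_tb_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long toy_translations;  /* blocks generated so far */

/*
 * Look up the block for pc, generating it if it is missing. Lookup,
 * generation and cache update all happen under one lock, so only one
 * thread translates at a time. The block is returned by value so the
 * caller never keeps a pointer into the shared cache.
 */
ToyTB toy_tb_find(uint64_t pc)
{
    ToyTB result;

    pthread_mutex_lock(&toy_tb_lock);
    ToyTB *slot = &toy_tb_cache[pc % TOY_TB_SLOTS];
    if (!slot->valid || slot->pc != pc) {
        /* Stand-in for real code generation and patching. */
        slot->pc = pc;
        slot->valid = 1;
        toy_translations++;
    }
    result = *slot;
    pthread_mutex_unlock(&toy_tb_lock);

    assert(result.valid && result.pc == pc);
    return result;
}

/* Invalidate every block; takes the same lock as code generation. */
void toy_tb_flush(void)
{
    pthread_mutex_lock(&toy_tb_lock);
    for (size_t i = 0; i < TOY_TB_SLOTS; i++) {
        toy_tb_cache[i].valid = 0;
    }
    pthread_mutex_unlock(&toy_tb_lock);
}

/* Read the translation counter under the lock. */
unsigned long toy_tb_translation_count(void)
{
    pthread_mutex_lock(&toy_tb_lock);
    unsigned long n = toy_translations;
    pthread_mutex_unlock(&toy_tb_lock);
    return n;
}
```

With a per-core buffer the lock in toy_tb_find could in principle go away,
but toy_tb_flush would then have to visit every core's buffer, which is the
trade-off being discussed here.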

Fred
>>> Memory maps and TLBs
>>> --------------------
>>>
>>> The memory handling code is fairly critical to the speed of memory
>>> access in the emulated system.
>>>
>>>   - Memory regions (dividing up access to PIO, MMIO and RAM)
>>>   - Dirty page tracking (for code gen, migration and display)
>>>   - Virtual TLB (for translating guest address->real address)
>>>
>>> There is both a fast path walked by the generated code and a slow
>>> path when resolution is required. When the TLB tables are updated we
>>> need to ensure they are done in a safe way by bringing all executing
>>> threads to a halt before making the modifications.
>>>
>>> DESIGN REQUIREMENTS:
>>>
>>>   - TLB Flush All/Page
>>>     - can be across-CPUs
>>>     - will need all other CPUs brought to a halt
>>>   - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
>>>     - This is a per-CPU table - by definition can't race
>>>     - updated by its own thread when the slow-path is forced
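
The fast-path/slow-path split and the per-CPU table described in the quoted
requirements can be sketched as a small direct-mapped software TLB. This is
an illustration under stated assumptions, not QEMU's CPUTLBEntry layout: the
names (ToyCPU, toy_translate, toy_page_walk), the 4 KiB page size, the table
size and the fixed-offset "page walk" are all hypothetical. It only covers
the per-CPU update and flush; bringing other CPUs to a halt for a cross-CPU
flush is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters: 4 KiB pages, 256-entry direct-mapped table. */
#define TOY_PAGE_BITS 12
#define TOY_PAGE_MASK (~(((uint64_t)1 << TOY_PAGE_BITS) - 1))
#define TOY_TLB_SIZE  256
#define TOY_TLB_INVALID UINT64_MAX  /* not page aligned, so it never matches */

typedef struct {
    uint64_t guest_page;  /* page-aligned guest address, or TOY_TLB_INVALID */
    uint64_t host_page;   /* page-aligned translated address */
} ToyTLBEntry;

typedef struct {
    ToyTLBEntry tlb[TOY_TLB_SIZE];   /* one table per CPU */
    unsigned long slow_path_fills;   /* how often the slow path ran */
} ToyCPU;

static size_t toy_tlb_index(uint64_t gaddr)
{
    return (size_t)((gaddr >> TOY_PAGE_BITS) % TOY_TLB_SIZE);
}

/* Flush every entry of this CPU's table. */
void toy_tlb_flush_all(ToyCPU *cpu)
{
    for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
        cpu->tlb[i].guest_page = TOY_TLB_INVALID;
    }
}

/* Flush only the entry covering gaddr, if it is currently cached. */
void toy_tlb_flush_page(ToyCPU *cpu, uint64_t gaddr)
{
    ToyTLBEntry *e = &cpu->tlb[toy_tlb_index(gaddr)];
    if (e->guest_page == (gaddr & TOY_PAGE_MASK)) {
        e->guest_page = TOY_TLB_INVALID;
    }
}

void toy_cpu_init(ToyCPU *cpu)
{
    cpu->slow_path_fills = 0;
    toy_tlb_flush_all(cpu);
}

/* Stand-in for a real page-table walk: a fixed, page-aligned offset. */
static uint64_t toy_page_walk(uint64_t guest_page)
{
    return guest_page + 0x40000000ULL;
}

/* Fast path: one compare on a hit. Slow path: walk and refill the entry. */
uint64_t toy_translate(ToyCPU *cpu, uint64_t gaddr)
{
    ToyTLBEntry *e = &cpu->tlb[toy_tlb_index(gaddr)];
    uint64_t page = gaddr & TOY_PAGE_MASK;

    if (e->guest_page != page) {
        e->guest_page = page;
        e->host_page = toy_page_walk(page);
        cpu->slow_path_fills++;
    }
    assert((e->host_page & ~TOY_PAGE_MASK) == 0);
    return e->host_page | (gaddr & ~TOY_PAGE_MASK);
}
```

Because each ToyCPU owns its table, the refill in toy_translate needs no lock
as long as only the owning thread calls it; it is the flush requested by
another CPU that needs the extra synchronisation.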
>>>
>>> Emulated hardware state
>>> -----------------------
>>>
>>> Currently the hardware emulation has no protection against
>>> multiple-accesses. However guest systems accessing emulated hardware
>>> should be carrying out their own locking to prevent multiple CPUs
>>> confusing the hardware. Of course there is no guarantee that there
>>> couldn't be a broken guest that doesn't lock, so you could get racing
>>> accesses to the hardware.
>>>
>>> There is the class of paravirtualized hardware (VIRTIO) that works in
>>> a purely MMIO mode, often setting flags directly in guest memory as a
>>> result of a guest-triggered transaction.
>>>
>>> DESIGN REQUIREMENTS:
>>>
>>>   - Access to IO Memory should be serialised by an IOMem mutex
>>>   - The mutex should be recursive (e.g. allowing the owning thread to relock it)
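
A recursive IOMem mutex of the kind listed in these requirements can be
sketched with plain POSIX threads. The pthread calls are standard; everything
else (the toy_ names, the two fake registers, the depth counters) is
hypothetical and only there to show a nested access relocking on the same
thread without deadlocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical device state guarded by one recursive "IOMem" mutex. */
static pthread_mutex_t toy_iomem_lock;
static unsigned toy_status_reg;
static unsigned toy_data_reg;
static int toy_depth;      /* current nesting level on the owning thread */
static int toy_max_depth;  /* deepest nesting seen so far */

/* Create the mutex with the recursive type; returns 0 on success. */
int toy_iomem_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&toy_iomem_lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}

static void toy_iomem_enter(void)
{
    pthread_mutex_lock(&toy_iomem_lock);
    toy_depth++;
    if (toy_depth > toy_max_depth) {
        toy_max_depth = toy_depth;
    }
}

static void toy_iomem_leave(void)
{
    assert(toy_depth > 0);
    toy_depth--;
    pthread_mutex_unlock(&toy_iomem_lock);
}

/* A simple register write: takes and releases the IOMem lock. */
void toy_iomem_write_status(unsigned value)
{
    toy_iomem_enter();
    toy_status_reg = value;
    toy_iomem_leave();
}

/*
 * A write whose handler triggers a second access while the lock is
 * already held. With a non-recursive mutex this would self-deadlock.
 */
void toy_iomem_write_data(unsigned value)
{
    toy_iomem_enter();
    toy_data_reg = value;
    toy_iomem_write_status(1);  /* nested access relocks on this thread */
    toy_iomem_leave();
}
```

Each lock must be balanced by an unlock on the same thread; the mutex is only
released to other CPUs when the outermost access finishes.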
>>>
>>> IO Subsystem
>>> ------------
>>>
>>> The I/O subsystem is heavily used by KVM and has seen a lot of
>>> improvements to offload I/O tasks to dedicated IOThreads. There should
>>> be no additional locking required once we reach the Block Driver.
>>>
>>> DESIGN REQUIREMENTS:
>>>
>>>   - The dataplane should continue to be protected by the iothread locks
>> Watch out for where DMA invalidates the translated code.
>>
>
> need to check - that might be a great catch !
>
> Cheers
>
> Mark.
>
>> Dave
>>
>>>
>>> References
>>> ==========
>>>
>>> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
>>> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
>>> [3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297
>>>
>>>
>>>
>>> -- 
>>> Alex Bennée
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> 	 +44 (0)20 7100 3485 x 210
>   +33 (0)5 33 52 01 77x 210
>
> 	+33 (0)603762104
> 	mark.burton
>
>
