From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42205)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <fred.konrad@greensocs.com>) id 1Z5L9o-0003ET-H9
	for qemu-devel@nongnu.org; Wed, 17 Jun 2015 17:46:06 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <fred.konrad@greensocs.com>) id 1Z5L9l-0007B9-7j
	for qemu-devel@nongnu.org; Wed, 17 Jun 2015 17:46:04 -0400
Received: from greensocs.com ([193.104.36.180]:33888)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <fred.konrad@greensocs.com>) id 1Z5L9k-0007AC-QW
	for qemu-devel@nongnu.org; Wed, 17 Jun 2015 17:46:01 -0400
Message-ID: <5581EA90.5020004@greensocs.com>
Date: Wed, 17 Jun 2015 23:45:52 +0200
From: Frederic Konrad <fred.konrad@greensocs.com>
MIME-Version: 1.0
References: <878uborigh.fsf@linaro.org> <20150617165716.GM2122@work-vm>
	<63D89881-446B-4523-B877-D2110E361345@greensocs.com>
In-Reply-To: <63D89881-446B-4523-B877-D2110E361345@greensocs.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] RFC Multi-threaded TCG design document
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Mark Burton <mark.burton@greensocs.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: mttcg@greensocs.com, Peter Maydell <peter.maydell@linaro.org>, QEMU Developers <qemu-devel@nongnu.org>, Alexander Graf <agraf@suse.de>, Guillaume Delbergue <guillaume.delbergue@greensocs.com>, Paolo Bonzini <pbonzini@redhat.com>, Alex Benn?e <alex.bennee@linaro.org>

On 17/06/2015 20:23, Mark Burton wrote:
>> On 17 Jun 2015, at 18:57, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>
>> * Alex Bennée (alex.bennee@linaro.org) wrote:
>>> Hi,
>>> Shared Data Structures
>>> ======================
>>>
>>> Global TCG State
>>> ----------------
>>>
>>> We need to protect the entire code generation cycle including any post
>>> generation patching of the translated code. This also implies a shared
>>> translation buffer which contains code running on all cores. Any
>>> execution path that comes to the main run loop will need to hold a
>>> mutex for code generation. This also includes times when we need to flush
>>> code or jumps from the tb_cache.
>>>
>>> DESIGN REQUIREMENT: Add locking around all code generation, patching
>>> and jump cache modification
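
For illustration, the requirement above could look roughly like the sketch
below. It assumes one global mutex taken around every lookup, generation and
flush. The names codegen_lock, tb_find_or_gen and tb_flush_all are made up
for this example and are not existing QEMU functions, and the "translation"
is only a stand-in that fills a cache slot.

```c
#include <pthread.h>

#define TB_CACHE_SIZE 64

/* Hypothetical stand-in for a translated block. */
struct tb {
    unsigned long pc;
    int valid;
};

/* One global lock covering generation, patching and cache changes. */
static pthread_mutex_t codegen_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tb tb_cache[TB_CACHE_SIZE];
static int tb_generated;    /* counts "translations", for illustration only */

/* Look up a block, generating it under the lock if it is missing. */
static struct tb *tb_find_or_gen(unsigned long pc)
{
    struct tb *tb;

    pthread_mutex_lock(&codegen_lock);
    tb = &tb_cache[pc % TB_CACHE_SIZE];
    if (!tb->valid || tb->pc != pc) {
        /* Placeholder for real code generation. */
        tb->pc = pc;
        tb->valid = 1;
        tb_generated++;
    }
    pthread_mutex_unlock(&codegen_lock);
    return tb;
}

/* Invalidate every cached block; also serialised by the same lock. */
static void tb_flush_all(void)
{
    int i;

    pthread_mutex_lock(&codegen_lock);
    for (i = 0; i < TB_CACHE_SIZE; i++) {
        tb_cache[i].valid = 0;
    }
    pthread_mutex_unlock(&codegen_lock);
}
```

Note that the sketch hands back a pointer after dropping the lock, so a real
implementation would also have to make sure no CPU is still executing a block
when it gets flushed.
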
>> I don't think that you require a shared translation buffer between
>> cores to do this - although it *might* be the easiest way.
>> You could have a per-core translation buffer, the only requirement is
>> that most invalidation operations happen on all the buffers
>> (although that might depend on the emulated architecture).
>> With a per-core translation buffer, each core could generate new translations
>> without locking the other cores as long as no one is doing invalidations.
> I agree it’s not a design requirement - however we’ve kind of gone round
> this loop in terms of getting things to work.
> Fred will doubtless fill in some details, but basically it looks like
> making the TCG so you could run several in parallel is a nightmare. We
> seem to get reasonable performance having just one CPU at a time
> generating TBs. At the same time, of course, the way Qemu is constructed
> there are actually several ‘layers’ of buffer - from the CPU local ones
> through to the TB ‘pool’. So, actually, by accident or design, we benefit
> from a sort of caching structure.
>
True, it seems to be very complex, at least on ARM, because of the disassemble
context etc. On the other hand, invalidation might be easier, I guess.
For performance, I'm not sure which approach is better.

Fred
>>> Memory maps and TLBs
>>> --------------------
>>>
>>> The memory handling code is fairly critical to the speed of memory
>>> access in the emulated system.
>>>
>>>   - Memory regions (dividing up access to PIO, MMIO and RAM)
>>>   - Dirty page tracking (for code gen, migration and display)
>>>   - Virtual TLB (for translating guest address->real address)
>>>
>>> There is both a fast path walked by the generated code and a slow
>>> path when resolution is required. When the TLB tables are updated we
>>> need to ensure they are done in a safe way by bringing all executing
>>> threads to a halt before making the modifications.
>>>
>>> DESIGN REQUIREMENTS:
>>>
>>>   - TLB Flush All/Page
>>>     - can be across-CPUs
>>>     - will need all other CPUs brought to a halt
>>>   - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
>>>     - This is a per-CPU table - by definition can't race
>>>     - updated by its own thread when the slow-path is forced
>>>
>>> Emulated hardware state
>>> -----------------------
>>>
>>> Currently the hardware emulation has no protection against
>>> multiple-accesses. However guest systems accessing emulated hardware
>>> should be carrying out their own locking to prevent multiple CPUs
>>> confusing the hardware. Of course there is no guarantee that there
>>> couldn't be a broken guest that doesn't lock so you could get racing
>>> accesses to the hardware.
>>>
>>> There is the class of paravirtualized hardware (VIRTIO) that works in
>>> a purely MMIO mode, often setting flags directly in guest memory as a
>>> result of a guest-triggered transaction.
>>>
>>> DESIGN REQUIREMENTS:
>>>
>>>   - Access to IO Memory should be serialised by an IOMem mutex
>>>   - The mutex should be recursive (e.g. allowing pid to relock itself)
>>>
>>> IO Subsystem
>>> ------------
>>>
>>> The I/O subsystem is heavily used by KVM and has seen a lot of
>>> improvements to offload I/O tasks to dedicated IOThreads. There should
>>> be no additional locking required once we reach the Block Driver.
>>>
>>> DESIGN REQUIREMENTS:
>>>
>>>   - The dataplane should continue to be protected by the iothread locks
>> Watch out for where DMA invalidates the translated code.
>>
>
> need to check - that might be a great catch !
>
> Cheers
>
> Mark.
>
>> Dave
>>
>>>
>>> References
>>> ==========
>>>
>>> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
>>> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
>>> [3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297
>>>
>>>
>>>
>>> -- 
>>> Alex Bennée
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK