From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56638)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1Y1a8g-0000a6-1f
	for qemu-devel@nongnu.org; Thu, 18 Dec 2014 07:25:11 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1Y1a8a-00012j-KH
	for qemu-devel@nongnu.org; Thu, 18 Dec 2014 07:25:05 -0500
Received: from cantor2.suse.de ([195.135.220.15]:33394 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1Y1a8a-00012O-Ad
	for qemu-devel@nongnu.org; Thu, 18 Dec 2014 07:25:00 -0500
Message-ID: <5492C798.8070503@suse.de>
Date: Thu, 18 Dec 2014 13:24:56 +0100
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <1418721234-9588-1-git-send-email-fred.konrad@greensocs.com>
	<CAFEAcA-v_ObUU7aw-e3UFew-SQ=GyfY8aVhUkyJpX2w_TEB-qw@mail.gmail.com>
	<54915A76.3000408@greensocs.com> <54915AE8.3010809@suse.de>
	<DC28F09E-E330-4E78-A80B-CFDEDA643E06@greensocs.com>
	<54915EC6.2050708@suse.de>
	<8B6B4BF9-3400-4125-8571-F4EF9F12AA89@greensocs.com>
	<5491666A.7060001@suse.de> <54916829.3020200@redhat.com>
	<CAFEAcA9hW2JeTmCLE9DK8n4VwHaXYPSu5Lt-NnDtTXgL5VfRYA@mail.gmail.com>
	<60A11491-8466-4EBC-9877-22E341688DD9@greensocs.com>
	<CAFEAcA_MVtXGKOjz+ExhWPzB69LvJqHLC=wcs-XngYjXCdkp5A@mail.gmail.com>
	<6B541656-15EA-47CB-8043-AE3B18FC60D4@greensocs.com>
	<CAFEAcA_8U0zsiZ4e43tmi4mQDgMYmK21d3CouZpN8aNHmBR-pg@mail.gmail.com>
	<F8D0B38D-5900-45B5-BFE2-2807498F9B33@greensocs.com>
In-Reply-To: <F8D0B38D-5900-45B5-BFE2-2807498F9B33@greensocs.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Mark Burton <mark.burton@greensocs.com>, Peter Maydell <peter.maydell@linaro.org>
Cc: mttcg@listserver.greensocs.com, Paolo Bonzini <pbonzini@redhat.com>, =?UTF-8?B?TGx1w61zIFZpbGFub3Zh?= <vilanova@ac.upc.edu>, QEMU Developers <qemu-devel@nongnu.org>, =?UTF-8?B?S09OUkFEIEZyw6lkw6lyaWM=?= <fred.konrad@greensocs.com>


On 18.12.14 10:12, Mark Burton wrote:
>
>> On 17 Dec 2014, at 17:39, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On 17 December 2014 at 16:29, Mark Burton <mark.burton@greensocs.com> wrote:
>>>> On 17 Dec 2014, at 17:27, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> I think a mutex is fine, personally -- I just don't want
>>>> to see fifteen hand-hacked mutexes in the target-* code.
>>>>
>>>
>>> Which would seem to favour the helper function approach?
>>> Or am I missing something?
>>
>> You need at least some support from QEMU core -- consider
>> what happens with this patch if the ldrex takes a data
>> abort, for instance.
>>
>> And if you need the "stop all other CPUs while I do this"
>
> It looks like a corner case, but working this through - the 'simple' put-a-mutex-around-the-atomic-instructions approach would indeed need to ensure that no other core was doing anything - that just happens to be true for qemu today (or - we would have to put a mutex around all writes); in order to ensure the case where a store exclusive could potentially fail if a non-atomic instruction wrote (a different value) to the same address. This is currently guaranteed by the implementation in Qemu - how useful it is I don't know, but if we break it, we run the risk that something will fail (at the least, we could not claim to have kept things the same).
>
> This also has implications for the idea of adding TCG ops I think...
> The ideal scenario is that we could 'fallback' on the same semantics that are there today - allowing specific target/host combinations to be optimised (and to improve their functionality).
> But that means, from within the TCG Op, we would need to have a mechanism, to cause other TCG's to take an exit.... etc etc... In the end, I'm sure it's possible, but it feels so awkward.

That's the nice thing about transactions - they guarantee that no other
CPU accesses the same cache line at the same time. So you're safe
against other vcpus even without blocking them manually.

For the non-transactional implementation we probably would need an "IPI
others and halt them until we're done with the critical section"
approach. But I really wouldn't concentrate on making things fast on old
CPUs.

Also keep in mind that for the UP case we can always omit all the magic
- we only need to detect when we move into an SMP case (linux-user clone
or -smp on system).
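
To make the UP shortcut concrete, here is a minimal sketch of what I
have in mind. All names (smp_active, note_second_cpu, guest_cmpxchg32)
are made up for illustration and are not existing QEMU code. It assumes
the flag is set before the second thread is started, so the switch to
the locked path is never raced:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is existing QEMU API. */
static pthread_mutex_t excl_lock = PTHREAD_MUTEX_INITIALIZER;
static bool smp_active;   /* false while only one CPU/thread exists */

/*
 * To be called from wherever a second CPU appears: clone() in
 * linux-user, or -smp N with N > 1 in system mode. Assumed to run
 * before the new thread is started.
 */
static void note_second_cpu(void)
{
    smp_active = true;
}

/* Emulated 32-bit compare-and-swap on guest memory; true on success. */
static bool guest_cmpxchg32(uint32_t *addr, uint32_t expected,
                            uint32_t newval)
{
    bool locked = smp_active;   /* UP case: skip all the locking */
    bool ok;

    assert(addr != NULL);
    if (locked) {
        pthread_mutex_lock(&excl_lock);
    }
    ok = (*addr == expected);
    if (ok) {
        *addr = newval;
    }
    if (locked) {
        pthread_mutex_unlock(&excl_lock);
    }
    return ok;
}
```

The mutex is only a placeholder for whatever the SMP path ends up
being; the point is merely that the UP path never pays for it.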

>
> To re-cap where we are (for my own benefit if nobody else):
> We have several propositions in terms of implementing Atomic instructions
>
> 1/ We wrap the atomic instructions in a mutex using helper functions (this is the approach others have taken, it's simple, but it is not clean, as stated above).

This is horrible. Imagine you have this split approach with a load
exclusive and then a store, where the load takes the mutex and the
store releases it. At that point, if the store creates a segfault you'll
be left with a dangling mutex.

This stuff really belongs in the TCG core.
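
To illustrate the dangling mutex, a minimal sketch. The helper names
are hypothetical and this is not the patch under discussion; longjmp
stands in for the exit back to the CPU loop that a guest fault would
cause:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; a stand-in for the split lock/unlock scheme. */
static pthread_mutex_t excl_lock = PTHREAD_MUTEX_INITIALIZER;
static jmp_buf cpu_loop_env;   /* stands in for the CPU loop exit */

/* Load-exclusive half: takes the lock and returns with it held. */
static uint32_t helper_ldrex(const uint32_t *addr)
{
    pthread_mutex_lock(&excl_lock);
    return *addr;
}

/* Store-exclusive half: the only place that ever unlocks. */
static void helper_strex(uint32_t *addr, uint32_t val, bool fault)
{
    if (fault) {
        /* Guest fault: we leave the helper before reaching the unlock. */
        longjmp(cpu_loop_env, 1);
    }
    *addr = val;
    pthread_mutex_unlock(&excl_lock);
}

/* Returns true if the lock is still held after the guest sequence. */
static bool lock_left_dangling(bool fault)
{
    static uint32_t mem;
    bool dangling;
    int rc;

    if (setjmp(cpu_loop_env) == 0) {
        uint32_t v = helper_ldrex(&mem);
        helper_strex(&mem, v + 1, fault);
    }
    /* trylock reports EBUSY if nobody released the lock. */
    rc = pthread_mutex_trylock(&excl_lock);
    assert(rc == 0 || rc == EBUSY);
    dangling = (rc == EBUSY);
    /* Either way this thread owns the lock now; release it again. */
    pthread_mutex_unlock(&excl_lock);
    return dangling;
}
```

With fault == false the sequence leaves the lock free; with
fault == true the lock is still held when we are back in the CPU loop,
and every target would need its own cleanup for that.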

>
> 1.5/ We add a mechanism to ensure that when the mutex is taken, all other cores are 'stopped'.
>
> 2/ We add some TCG ops to effectively do the same thing, but this would give us the benefit of being able to provide better implementations. This is attractive, but we would end up needing ops to cover at least exclusive load/store and atomic compare exchange. To me this looks less than elegant (being pulled close to the target, rather than being able to generalise), but it's not clear how we would implement the operations as we would like, with a machine instruction, unless we did split them out along these lines. This approach also (probably) requires the 1.5 mechanism above.

I'm still in favor of just forcing the semantics of transactions onto
this. If the host doesn't implement transactions, tough luck - do the
"halt all others" IPI.

>
> 3/ We have discussed a 'h/w' approach to the problem. In this case, all atomic instructions are forced to take the slow path - and additional flags are added to the memory API. We then deal with the issue closer to the memory where we can record who has a lock on a memory address. For this to work - we would also either
> a) need to add a mprotect type approach to ensure no 'non atomic' writes occur - or
> b) need to force all cores to mark the page with the exclusive memory as IO or similar to ensure that all write accesses followed the slow path.
>
> 4/ There is an option to implement exclusive operations within the TCG using mprotect (and signal handlers). I have some concerns on this : would we need to have support for each host O/S... I also think we might end up with a lot of protected regions causing a lot of SIGSEGV's because an errant guest doesn't behave well - basically we will need to see the impact on performance - finally - this will be really painful to deal with for cases where the exclusive memory is held in what Qemu considers IO space !!!
> 	In other words - putting the mprotect inside TCG looks to me like it's mutually exclusive to supporting a memory-based scheme like (3).

Again, I don't think it's worth caring about legacy host systems too
much. A few years from now transactional memory will be commodity,
just like KVM is today.


Alex

> My personal preference is for 3b) it is "safe" - it's where the hardware is.
> 3a is an optimization of that.
> to me, (2) is an optimisation again. We are effectively saying, if you are able to do this directly, then you don't need to pass via the slow path. Otherwise, you always have the option of reverting to the slow path.
>
> Frankly - 1 and 1.5 are hacks - they are not optimisations, they are just dirty hacks. However - their saving grace is that they are hacks that exist and "work". I dislike patching the hack, but it did seem to offer the fastest solution to get around this problem - at least for now. I am no longer convinced.
>
> 4/ is something I'd like other people's views on too... Is it a better approach? What about the slow path?
>
> I increasingly begin to feel that we should really approach this from the other end, and provide a 'correct' solution using the memory - then worry about making that faster...
>
> Cheers
>
> Mark.
>
>> semantics linux-user currently uses then that definitely needs
>> core code support. (Maybe linux-user is being over-zealous
>> there; I haven't thought about it.)
>>
>> -- PMM