From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56638)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Y1a8g-0000a6-1f
	for qemu-devel@nongnu.org; Thu, 18 Dec 2014 07:25:11 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Y1a8a-00012j-KH
	for qemu-devel@nongnu.org; Thu, 18 Dec 2014 07:25:05 -0500
Received: from cantor2.suse.de ([195.135.220.15]:33394 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Y1a8a-00012O-Ad
	for qemu-devel@nongnu.org; Thu, 18 Dec 2014 07:25:00 -0500
Message-ID: <5492C798.8070503@suse.de>
Date: Thu, 18 Dec 2014 13:24:56 +0100
From: Alexander Graf
MIME-Version: 1.0
References: <1418721234-9588-1-git-send-email-fred.konrad@greensocs.com>
	<54915A76.3000408@greensocs.com> <54915AE8.3010809@suse.de>
	<54915EC6.2050708@suse.de>
	<8B6B4BF9-3400-4125-8571-F4EF9F12AA89@greensocs.com>
	<5491666A.7060001@suse.de> <54916829.3020200@redhat.com>
	<60A11491-8466-4EBC-9877-22E341688DD9@greensocs.com>
	<6B541656-15EA-47CB-8043-AE3B18FC60D4@greensocs.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
To: Mark Burton, Peter Maydell
Cc: mttcg@listserver.greensocs.com, Paolo Bonzini, Lluís Vilanova,
	QEMU Developers, KONRAD Frédéric

On 18.12.14 10:12, Mark Burton wrote:
>
>> On 17 Dec 2014, at 17:39, Peter Maydell wrote:
>>
>> On 17 December 2014 at 16:29, Mark Burton wrote:
>>>> On 17 Dec 2014, at 17:27, Peter Maydell wrote:
>>>> I think a mutex is fine, personally -- I just don't want
>>>> to see fifteen hand-hacked mutexes in the target-* code.
>>>>
>>>
>>> Which would seem to favour the helper function approach?
>>> Or am I missing something?
>>
>> You need at least some support from QEMU core -- consider
>> what happens with this patch if the ldrex takes a data
>> abort, for instance.
>>
>> And if you need the "stop all other CPUs while I do this"
>
> It looks like a corner case, but working this through - the ‘simple’
> put-a-mutex-around-the-atomic-instructions approach would indeed need
> to ensure that no other core was doing anything - that just happens to
> be true for QEMU today (or - we would have to put a mutex around all
> writes); in order to ensure the case where a store exclusive could
> potentially fail if a non-atomic instruction wrote (a different value)
> to the same address. This is currently guaranteed by the implementation
> in QEMU - how useful it is I don't know, but if we break it, we run the
> risk that something will fail (at the least, we could not claim to have
> kept things the same).
>
> This also has implications for the idea of adding TCG ops I think...
> The ideal scenario is that we could ‘fall back’ on the same semantics
> that are there today - allowing specific target/host combinations to be
> optimised (and to improve their functionality).
> But that means, from within the TCG op, we would need to have a
> mechanism to cause other TCG’s to take an exit… etc etc… In the end,
> I’m sure it’s possible, but it feels so awkward.

That's the nice thing about transactions - they guarantee that no other
CPU accesses the same cache line at the same time. So you're safe
against other vcpus even without blocking them manually.

For the non-transactional implementation we probably would need an "IPI
others and halt them until we're done with the critical section"
approach. But I really wouldn't concentrate on making things fast on old
CPUs.
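
To make the control flow concrete, here is a minimal single-threaded
model of "try a transaction, otherwise halt everyone else". It is only a
sketch under stated assumptions: VCpu, Probe, try_host_transaction(),
run_exclusive() and probe_section() are hypothetical names, not QEMU
APIs, the "halt" is a flag rather than a real IPI, and the transaction
stub always reports failure so that the fallback path is the one
exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU state; this is not QEMU's CPUState. */
typedef struct {
    bool halted;  /* true while parked by another vCPU's critical section */
} VCpu;

typedef void (*ExclusiveFn)(void *opaque);

/*
 * Stand-in for running fn inside a host memory transaction.
 * Assumption: returns true only if the transaction committed. This stub
 * models a host without transactional memory, so it never runs fn.
 */
static bool try_host_transaction(ExclusiveFn fn, void *opaque)
{
    (void)fn;
    (void)opaque;
    return false;
}

/* Run fn so that no other vCPU can interleave with it. */
static void run_exclusive(VCpu *cpus, size_t n, size_t self,
                          ExclusiveFn fn, void *opaque)
{
    assert(self < n);

    if (try_host_transaction(fn, opaque)) {
        return;  /* the host guaranteed isolation for us */
    }

    /* Fallback: "IPI others and halt them until we're done". */
    for (size_t i = 0; i < n; i++) {
        if (i != self) {
            cpus[i].halted = true;
        }
    }
    fn(opaque);
    for (size_t i = 0; i < n; i++) {
        if (i != self) {
            cpus[i].halted = false;
        }
    }
}

/* Example critical section: counts how many vCPUs are parked right now. */
typedef struct {
    VCpu *cpus;
    size_t n;
    size_t halted_seen;
} Probe;

static void probe_section(void *opaque)
{
    Probe *p = opaque;

    p->halted_seen = 0;
    for (size_t i = 0; i < p->n; i++) {
        if (p->cpus[i].halted) {
            p->halted_seen++;
        }
    }
}
```

With four vCPUs and vCPU 1 entering the section, probe_section() sees
the other three parked, and all of them are released again afterwards. A
real implementation would also have to wait until each halted vCPU has
actually left its current translation block before running fn.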

Also keep in mind that for the UP case we can always omit all the
magic - we only need to detect when we move into an SMP case (linux-user
clone or -smp on system).

>
> To re-cap where we are (for my own benefit if nobody else):
> We have several propositions in terms of implementing atomic
> instructions
>
> 1/ We wrap the atomic instructions in a mutex using helper functions
> (this is the approach others have taken, it’s simple, but it is not
> clean, as stated above).

This is horrible. Imagine you have this split approach with a load
exclusive followed by a store, where the load takes the mutex and the
store releases it. At that point, if the store causes a segfault, you'll
be left with a dangling mutex.

This stuff really belongs in the TCG core.

>
> 1.5/ We add a mechanism to ensure that when the mutex is taken, all
> other cores are ‘stopped’.
>
> 2/ We add some TCG ops to effectively do the same thing, but this would
> give us the benefit of being able to provide better implementations.
> This is attractive, but we would end up needing ops to cover at least
> exclusive load/store and atomic compare exchange. To me this looks less
> than elegant (being pulled close to the target, rather than being able
> to generalise), but it’s not clear how we would implement the
> operations as we would like, with a machine instruction, unless we did
> split them out along these lines. This approach also (probably)
> requires the 1.5 mechanism above.

I'm still in favor of just forcing the semantics of transactions onto
this. If the host doesn't implement transactions, tough luck - do the
"halt all others" IPI.

>
> 3/ We have discussed a ‘h/w’ approach to the problem. In this case, all
> atomic instructions are forced to take the slow path - and additional
> flags are added to the memory API. We then deal with the issue closer
> to the memory where we can record who has a lock on a memory address.
> For this to work - we would also either
> a) need to add a mprotect type approach to ensure no ‘non atomic’
> writes occur - or
> b) need to force all cores to mark the page with the exclusive memory
> as IO or similar to ensure that all write accesses followed the slow
> path.
>
> 4/ There is an option to implement exclusive operations within the TCG
> using mprotect (and signal handlers). I have some concerns about this:
> would we need to have support for each host O/S…? I also think we might
> end up with a lot of protected regions causing a lot of SIGSEGVs
> because an errant guest doesn’t behave well - basically we will need to
> see the impact on performance - finally - this will be really painful
> to deal with for cases where the exclusive memory is held in what QEMU
> considers IO space !!!
> In other words - putting the mprotect inside TCG looks to me like it’s
> mutually exclusive to supporting a memory-based scheme like (3).

Again, I don't think it's worth caring about legacy host systems too
much. A few years from now transactional memory will be a commodity,
just like KVM is today.


Alex

> My personal preference is for 3b) it is “safe” - it's where the
> hardware is.
> 3a is an optimization of that.
> To me, (2) is an optimisation again. We are effectively saying, if you
> are able to do this directly, then you don't need to pass via the slow
> path. Otherwise, you always have the option of reverting to the slow
> path.
>
> Frankly - 1 and 1.5 are hacks - they are not optimisations, they are
> just dirty hacks. However - their saving grace is that they are hacks
> that exist and “work”. I dislike patching the hack, but it did seem to
> offer the fastest solution to get around this problem - at least for
> now. I am no longer convinced.
>
> 4/ is something I’d like other people's views on too… Is it a better
> approach? What about the slow path?
>
> I increasingly begin to feel that we should really approach this from
> the other end, and provide a ‘correct’ solution using the memory -
> then worry about making that faster…
>
> Cheers
>
> Mark.
>
>> semantics linux-user currently uses then that definitely needs
>> core code support. (Maybe linux-user is being over-zealous
>> there; I haven't thought about it.)
>>
>> -- PMM
>
> +44 (0)20 7100 3485 x 210
> +33 (0)5 33 52 01 77x 210
>
> +33 (0)603762104
> mark.burton
>
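
For reference, the bookkeeping behind option 3 in the recap quoted above
(record which CPU holds an exclusive reservation on an address, and
route every write through a slow path that can break it) can be sketched
roughly as follows. This is a single-threaded model under stated
assumptions: ExclusiveMonitor, monitor_init(), load_exclusive(),
plain_store() and store_exclusive() are hypothetical names, not QEMU
functions, and a real multi-threaded version would need its own
synchronisation around the monitor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 4
#define NO_RESERVATION UINTPTR_MAX

/* Hypothetical monitor: at most one reserved address per vCPU. */
typedef struct {
    uintptr_t reserved[MAX_CPUS];
} ExclusiveMonitor;

static void monitor_init(ExclusiveMonitor *m)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        m->reserved[i] = NO_RESERVATION;
    }
}

/* Load-exclusive: read the word and remember that this vCPU reserved it. */
static uint32_t load_exclusive(ExclusiveMonitor *m, int cpu, uint32_t *addr)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    m->reserved[cpu] = (uintptr_t)addr;
    return *addr;
}

/*
 * Slow-path store: every write to monitored memory is assumed to come
 * through here, so it can drop other vCPUs' reservations on addr.
 */
static void plain_store(ExclusiveMonitor *m, int cpu, uint32_t *addr,
                        uint32_t val)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    for (int i = 0; i < MAX_CPUS; i++) {
        if (i != cpu && m->reserved[i] == (uintptr_t)addr) {
            m->reserved[i] = NO_RESERVATION;
        }
    }
    *addr = val;
}

/*
 * Store-exclusive: returns 0 on success and 1 on failure, mirroring the
 * status value of the ARM strex instruction.
 */
static int store_exclusive(ExclusiveMonitor *m, int cpu, uint32_t *addr,
                           uint32_t val)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (m->reserved[cpu] != (uintptr_t)addr) {
        return 1;  /* reservation was lost or never taken */
    }
    plain_store(m, cpu, addr, val);  /* also breaks other reservations */
    m->reserved[cpu] = NO_RESERVATION;
    return 0;
}
```

In this model a plain store from vCPU 1 between vCPU 0's load-exclusive
and store-exclusive makes the store-exclusive fail, which is the
behaviour the mutex-only approach cannot provide without stopping the
other cores. Note that the sketch fails the store-exclusive on any
intervening write, not only on a write of a different value.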