From: Paolo Bonzini
Date: Wed, 12 Aug 2015 14:36:58 +0200
Subject: Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list
To: alvise rigo
Cc: mttcg@greensocs.com, Claudio Fontana, QEMU Developers, Jani Kokkonen, VirtualOpenSystems Technical Team, Alex Bennée

On 12/08/2015 09:31, alvise rigo wrote:
> I think that tlb_flush_entry is not enough, since in theory another
> vCPU could have a different TLB address referring the same phys
> address.

You're right, this is a TLB so it's virtually-indexed. :(  I'm not sure
what happens on ARM, since it has a virtually indexed (VIVT or VIPT)
cache, but indeed it would be a problem when implementing e.g. CMPXCHG
using the TCG ll/sc ops.

I'm a bit worried about adding such a big bitmap.  It's only used by
TCG, but it is also allocated on KVM, and on KVM you can have hundreds
of VCPUs.  Wasting 200 bits per guest memory page (i.e. ~0.6% of guest
memory) is obviously not a great idea. :(  Perhaps we can use a bytemap
instead:

- 0..253 = TLB_EXCL must be set in all VCPUs except CPU n.  A VCPU that
  loads the TLB for this vaddr does not have to set it.

- 254 = TLB_EXCL must be set in all VCPUs.  A VCPU that loads the TLB
  for this vaddr has to set it.

- 255 = TLB_EXCL not set in at least two VCPUs.

Transitions:

- ll transitions: anything -> 254

- sc transitions: 254 -> current CPU_ID

- TLB_EXCL store transitions: 254 -> current CPU_ID

- tlb_set_page transitions: CPU_ID other than current -> 255

The initial value is 255 on SMP guests, 0 on UP guests.

The algorithms are very similar to yours, just using this approximate
representation.
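For concreteness, here is a minimal C sketch of that representation; the
names (llsc_bytemap, llsc_bytemap_init) are made up for illustration and
are not existing QEMU symbols:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * One byte of LL/SC state per guest RAM page (illustrative only):
 *   0..253: TLB_EXCL must be set in all VCPUs except the CPU with this id
 *   254:    TLB_EXCL must be set in all VCPUs
 *   255:    TLB_EXCL not set in at least two VCPUs
 */
static uint8_t *llsc_bytemap;

static void llsc_bytemap_init(size_t ram_pages, int smp_cpus)
{
    llsc_bytemap = malloc(ram_pages);
    /* Initial value: 255 on SMP guests, 0 on UP guests. */
    memset(llsc_bytemap, smp_cpus > 1 ? 255 : 0, ram_pages);
}

With 4K guest pages that is one byte per 4096, i.e. about 0.02% of guest
memory, instead of ~0.6% for the 200-bit-per-page bitmap.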
ll algorithm:
   llsc_value = bytemap[vaddr]
   if llsc_value == CPU_ID
        do nothing
   elseif llsc_value < 254
        flush TLB of CPU llsc_value
   elseif llsc_value == 255
        flush all TLBs
   set TLB_EXCL
   bytemap[vaddr] = 254
   load

tlb_set_page algorithm:
   llsc_value = bytemap[vaddr]
   if llsc_value == CPU_ID or llsc_value == 255
        do nothing
   else if llsc_value == 254
        set TLB_EXCL
   else
        # two CPUs without TLB_EXCL
        bytemap[vaddr] = 255

TLB_EXCL slow path algorithm:
   if bytemap[vaddr] == 254
        bytemap[vaddr] = CPU_ID
   else
        # two CPUs without TLB_EXCL
        bytemap[vaddr] = 255
   clear TLB_EXCL in this CPU
   store

sc algorithm:
   if bytemap[vaddr] == CPU_ID or bytemap[vaddr] == 254
        bytemap[vaddr] = CPU_ID
        clear TLB_EXCL in this CPU
        store
        succeed
   else
        fail

clear algorithm:
   if bytemap[vaddr] == 254
        bytemap[vaddr] = CPU_ID

The UP case is optimized because bytemap[vaddr] will always be 0 or 254.
The algorithm requires the LL to be cleared e.g. on exceptions.  A rough
C sketch of these transitions is appended below the quoted text.

Paolo

> alvise
>
> On Tue, Aug 11, 2015 at 6:32 PM, Paolo Bonzini wrote:
>>
>>
>> On 11/08/2015 18:11, alvise rigo wrote:
>>>>> Why flush the entire cache (I understand you mean TLB)?
>>> Sorry, I meant the TLB.
>>> If for each removal of an exclusive entry we set also the bit to 1, we
>>> force the following LL to make a tlb_flush() on every vCPU.
>>
>> What if you only flush one entry with tlb_flush_entry (on every vCPU)?
>>
>> Paolo
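And the promised sketch: a rough C rendering of the ll, TLB_EXCL slow path
and sc transitions above.  Everything here is hypothetical glue
(tlb_flush_cpu, tlb_flush_all, set_tlb_excl, clear_tlb_excl, PAGE_SHIFT
standing in for TARGET_PAGE_BITS), not the real cputlb.c interfaces; the
tlb_set_page and clear cases would follow the same pattern.

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* stand-in for TARGET_PAGE_BITS */
#define PAGE_IDX(vaddr) ((vaddr) >> PAGE_SHIFT)

extern uint8_t *llsc_bytemap;                  /* the bytemap sketched earlier */

/* Hypothetical hooks into the TLB code. */
void tlb_flush_cpu(int cpu_id);
void tlb_flush_all(void);
void set_tlb_excl(int cpu_id, uint64_t vaddr);
void clear_tlb_excl(int cpu_id, uint64_t vaddr);

/* ll: make every VCPU trap on stores to this page, then do the load. */
void llsc_ll(int cpu_id, uint64_t vaddr)
{
    uint8_t v = llsc_bytemap[PAGE_IDX(vaddr)];

    if (v == cpu_id) {
        /* do nothing: only our own TLB may lack TLB_EXCL */
    } else if (v < 254) {
        tlb_flush_cpu(v);                      /* flush TLB of CPU v */
    } else if (v == 255) {
        tlb_flush_all();                       /* flush all TLBs */
    }
    set_tlb_excl(cpu_id, vaddr);
    llsc_bytemap[PAGE_IDX(vaddr)] = 254;
    /* ... perform the load ... */
}

/* TLB_EXCL slow path: an ordinary store hit a TLB_EXCL entry. */
void llsc_excl_slow_path(int cpu_id, uint64_t vaddr)
{
    if (llsc_bytemap[PAGE_IDX(vaddr)] == 254) {
        llsc_bytemap[PAGE_IDX(vaddr)] = cpu_id;
    } else {
        llsc_bytemap[PAGE_IDX(vaddr)] = 255;   /* two CPUs without TLB_EXCL */
    }
    clear_tlb_excl(cpu_id, vaddr);
    /* ... perform the store ... */
}

/* sc: succeed only if no other VCPU's store downgraded the entry. */
bool llsc_sc(int cpu_id, uint64_t vaddr)
{
    uint8_t v = llsc_bytemap[PAGE_IDX(vaddr)];

    if (v == cpu_id || v == 254) {
        llsc_bytemap[PAGE_IDX(vaddr)] = cpu_id;
        clear_tlb_excl(cpu_id, vaddr);
        /* ... perform the store ... */
        return true;                           /* succeed */
    }
    return false;                              /* fail */
}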