Date: Fri, 6 Feb 2015 13:54:15 +1100
From: David Gibson
To: Alexander Graf
Cc: "aik@ozlabs.ru", "qemu-ppc@nongnu.org", Paul Mackerras, "qemu-devel@nongnu.org", "mdroth@us.ibm.com"
Subject: Re: [Qemu-devel] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
Message-ID: <20150206025415.GZ25675@voom.fritz.box>
In-Reply-To: <54D35A41.6020907@suse.de>

On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote:
> 
> 
> On 05.02.15 12:30, David Gibson wrote:
> > On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote:
[snip]
> >>>>>>>> [snip]
> >>>>>>>>
> >>>>>>>>> +    ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD);
> >>>>>>>>> +    if (ret1 != 0) {
> >>>>>>>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:"
> >>>>>>>>> +                " %s\n", strerror(errno));
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE);
> >>>>>>>>> +    if (ret2 != 0) {
> >>>>>>>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:"
> >>>>>>>>> +                " %s\n", strerror(errno));
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    if ((ret1 != 0) || (ret2 != 0)) {
> >>>>>>>>> +        fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF"
> >>>>>>>>> +                " may be unable to operate devices with in-kernel emulation\n");
> >>>>>>>>> +    }
> >>>>>>>>
> >>>>>>>> You'll always get these warnings if you're running on an old (meaning
> >>>>>>>> current upstream) kernel, which could be annoying.
> >>>>>>>
> >>>>>>> True.
> >>>>>>>
> >>>>>>>> Is there any way
> >>>>>>>> to tell whether you have configured any devices which need the
> >>>>>>>> in-kernel MMIO emulation and only warn if you have?
> >>>>>>>
> >>>>>>> In theory, I guess so.  In practice I can't see how you'd enumerate
> >>>>>>> all devices that might require kernel intervention without something
> >>>>>>> horribly invasive.
> >>>>>>
> >>>>>> We could WARN_ONCE in QEMU if we emulate such a hypercall, but its
> >>>>>> handler is io_mem_unassigned (or we add another minimum priority huge
> >>>>>> memory region on all 64bits of address space that reports the breakage).
> >>>>>
> >>>>> Would that work for the virtio+iothread case?  I had the impression
> >>>>> the kernel handled notification region was layered over the qemu
> >>>>> emulated region in that case.
> >>>>
> >>>> IIRC we don't have a way to call back into kvm saying "please write to
> >>>> this in-kernel device". But we could at least defer the warning to a
> >>>> point where we know that we actually hit it.
> >>>
> >>> Right, but I'm saying we might miss the warning in cases where we want
> >>> it, because the KVM device is shadowed by a qemu device, so qemu won't
> >>> see the IO as unassigned or unhandled.
> >>>
> >>> In particular, I think that will happen in the case of virtio-blk with
> >>> iothread, which is the simplest case in which to observe the problem.
> >>> The virtio-blk device exists in qemu and is functional, but we rely on
> >>> KVM catching the queue notification MMIO before it reaches the qemu
> >>> implementation of the rest of the device's IO space.
> >>
> >> But in that case the VM stays functional and will merely see a
> >> performance hit when using virtio in SLOF, no? I don't think that's
> >> a problem worth worrying users about.
> > 
> > Alas, no.  The iothread stuff *relies* on the in-kernel notification,
> > so it will not work if the IO gets punted to qemu.  This is the whole
> > reason for the in-kernel hcall implementation.
> 
> So at least with vhost-net the in-kernel trapping is optional. If we
> happen to get MMIO into QEMU, we'll just handle it there.
> 
> Enlighten me why the iothread stuff can't handle it that way too.

So, as I understand it, it could, but it doesn't.  Working out how to
fix it properly requires a better understanding of the dataplane code
than I currently possess.

So, using virtio-blk as the example case.  Normally the queue notify
mmio will get routed by the general virtio code to
virtio_blk_handle_output().  In the case of dataplane, that just calls
virtio_blk_data_plane_start().

So the first time we get a vq notify, the dataplane is started.  That
sets up the host notifier (VirtioBusClass::set_host_notifier ->
virtio_pci_set_host_notifier -> virtio_pci_set_host_notifier_internal
-> memory_region_add_eventfd() -> memory_region_transaction_commit()
-> address_space_update_ioeventfds -> address_space_add_del_ioeventfds
-> kvm_mem_ioeventfd_add -> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD
ioctl).

From this point on, further calls to virtio_blk_handle_output() are,
IIUC, a "can't happen", because vq notifies should go to the eventfd
instead, where they will kick the iothread.

So, with SLOF, the first request is ok - it hits
virtio_blk_handle_output(), which starts the iothread, which goes on
to process the request.

On the second request, however, we get back into
virtio_blk_data_plane_start(), which sees the iothread is already
running and aborts.  I think it is assuming that this must be the
result of a race with another vcpu starting the dataplane, and so
assumes the racing thread will have woken the dataplane, which will
then handle this vcpu's request as well.

In our case, however, the IO hcalls go through to
virtio_blk_handle_output() when the dataplane is already going, and
become no-ops without waking it up again to handle the new request.

Enlightened enough yet?
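For illustration only, here is a self-contained toy model of the
interaction described above.  This is not the actual QEMU dataplane
code; all of the names (ToyDataPlane, toy_dataplane_start,
toy_handle_output) are invented for the sketch, and it only captures
the shape of the problem: a slow-path notify that arrives after the
dataplane is up gets treated as a benign race and dropped.

/* Toy model (hypothetical names, not QEMU code) of the dataplane
 * start-up race described above. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool started;           /* dataplane/iothread already running */
    bool ioeventfd_armed;   /* stands in for the KVM_IOEVENTFD registration */
    int  pending_requests;  /* requests sitting in the virtqueue */
} ToyDataPlane;

static void toy_dataplane_start(ToyDataPlane *s)
{
    if (s->started) {
        /* Assumed to be a race with another vcpu that already started
         * the dataplane; the caller's request is NOT kicked. */
        return;
    }
    s->started = true;
    s->ioeventfd_armed = true;  /* further notifies expected via eventfd */
    printf("dataplane started, processing %d request(s)\n",
           s->pending_requests);
    s->pending_requests = 0;
}

/* Slow-path MMIO notify, i.e. what happens when the queue notify is
 * punted to userspace instead of hitting the in-kernel ioeventfd. */
static void toy_handle_output(ToyDataPlane *s)
{
    toy_dataplane_start(s);
}

int main(void)
{
    ToyDataPlane s = { false, false, 0 };

    s.pending_requests = 1;
    toy_handle_output(&s);   /* first request: dataplane starts, request handled */

    s.pending_requests = 1;
    toy_handle_output(&s);   /* second request: silently dropped */
    printf("requests still pending: %d\n", s.pending_requests);
    return 0;
}

Compiled and run, the model reports that the second request stays
pending, which mirrors the behaviour described above when the hcall
path bypasses the in-kernel ioeventfd.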
-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson