From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39309)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1YkQeF-0001Mm-Gh
	for qemu-devel@nongnu.org; Tue, 21 Apr 2015 01:23:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1YkQeC-0008AX-9c
	for qemu-devel@nongnu.org; Tue, 21 Apr 2015 01:23:03 -0400
Received: from mx1.redhat.com ([209.132.183.28]:43875)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1YkQeC-0008AI-1m
	for qemu-devel@nongnu.org; Tue, 21 Apr 2015 01:23:00 -0400
Date: Tue, 21 Apr 2015 07:22:53 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20150421070941-mutt-send-email-mst@redhat.com>
References: <1429257573-7359-1-git-send-email-famz@redhat.com>
	<20150420175905-mutt-send-email-mst@redhat.com>
	<20150421023700.GC8048@fam-t430.nay.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20150421023700.GC8048@fam-t430.nay.redhat.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support
 "VIRTIO_CONFIG_S_NEEDS_RESET"
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Fam Zheng <famz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Rusty Russell <rusty@rustcorp.com.au>, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, Stefan Hajnoczi <stefanha@redhat.com>, Amit Shah <amit.shah@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>

On Tue, Apr 21, 2015 at 10:37:00AM +0800, Fam Zheng wrote:
> On Mon, 04/20 19:36, Michael S. Tsirkin wrote:
> > On Fri, Apr 17, 2015 at 03:59:15PM +0800, Fam Zheng wrote:
> > > Currently, virtio code chooses to kill QEMU if the guest passes any invalid
> > > data with vring.
> > > That has drawbacks such as losing unsaved data (e.g. when
> > > guest user is writing a very long email), or possible denial of service in
> > > a nested vm use case where virtio device is passed through.
> > >
> > > virtio-1 has introduced a new status bit "NEEDS RESET" which could be used to
> > > improve this by communicating the error state between virtio devices and
> > > drivers. The device notifies guest upon setting the bit, then the guest driver
> > > should detect this bit and report to userspace, or recover the device by
> > > resetting it.
> >
> > Unfortunately, virtio 1 spec does not have a conformance statement
> > that requires driver to recover. We merely have a non-normative looking
> > text:
> > 	Note: For example, the driver can’t assume requests in flight
> > 	will be completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> > 	they have not been completed. A good implementation will try to recover
> > 	by issuing a reset.
> >
> > Implementing this reset for all devices in a race-free manner might also
> > be far from trivial.  I think we'd need a feature bit for this.
> > OTOH as long as we make this a new feature, would an ability to
> > reset a single VQ be a better match for what you are trying to
> > achieve?
>
> I think that is too complicated as a recovery measure, a device level resetting
> will be better to get to a deterministic state, at least.

The question is how hard it is to stop the host from using all queues,
retrieve all host OS state, and re-program it into the device.
If we need to shadow all OS state within the driver, then that's a lot
of poorly tested code, with the potential to introduce more bugs.

> >
> > > This series makes necessary changes in virtio core code, based on which
> > > virtio-blk is converted. Other devices now keep the existing behavior by
> > > passing in "error_abort". They will be converted in following series. The Linux
> > > driver part will also be worked on.
> > >
> > > One concern with this behavior change is that it's now harder to notice the
> > > actual driver bug that caused the error, as the guest continues to run.  To
> > > address that, we could probably add a new error action option to virtio
> > > devices,  similar to the "read/write werror" in block layer, so the vm could be
> > > paused and the management will get an event in QMP like pvpanic.  This work can
> > > be done on top.
> >
> > At the architectural level, that's only one concern. Others would be
> > - workloads such as openstack handle guest crash better than
> >   a guest that's e.g. slow because of a memory leak
>
> What memory leak are you referring to?

That was just an example.  If the host detects a malformed ring, it will
crash.  But often it doesn't; the result is that buffers are never used,
so the guest can't free them up.

> > - it's easier for guests to probe host for security issues
> >   if guest isn't killed
> > - guest can flood host log with guest-triggered errors
>
> We can still abort() if guest is triggering error too quickly.
>
> Fam


Absolutely, and if it looked like I'm against error detection and
recovery, this was not my intent.

I am merely saying we can't apply this patchset as is and defer
addressing the issues to patches on top.

But I have an idea: refactor the code to use error_abort. This way we
can apply the patchset without making functional changes, and you can
make progress to complete this, on top.
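
A minimal sketch of that refactoring pattern follows, for illustration
only. It is not actual QEMU code: the Error type, report_error() and
vring_check_index() are hypothetical stand-ins that only mimic the
general shape of QEMU's Error API, where a callee reports failures
through an "Error **errp" argument and a caller that passes
&error_abort asks for any error to be fatal.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real definition. */
typedef struct Error {
    const char *msg;
} Error;

/* Sentinel: passing &error_abort means "treat any error as fatal". */
static Error *error_abort;

/* Record an error in *errp, or abort if the caller passed &error_abort. */
static void report_error(Error **errp, const char *msg)
{
    static Error storage;   /* single static slot keeps the sketch simple */

    assert(msg != NULL);
    if (errp == &error_abort) {
        fprintf(stderr, "fatal: %s\n", msg);
        abort();
    }
    if (errp != NULL) {
        storage.msg = msg;
        *errp = &storage;
    }
}

/* Hypothetical check: reject a descriptor index beyond the ring size. */
static int vring_check_index(unsigned idx, unsigned ring_size, Error **errp)
{
    if (idx >= ring_size) {
        report_error(errp, "descriptor index out of range");
        return -1;
    }
    return 0;
}
```

Callers that pass &error_abort keep today's behavior (the process dies
on invalid guest data), so the refactoring itself changes nothing
functionally. A device converted later would instead pass a local
"Error *err = NULL" and decide what to do on failure, for example
marking the device as needing a reset.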


-- 
MST