From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39309)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1YkQeF-0001Mm-Gh for qemu-devel@nongnu.org;
	Tue, 21 Apr 2015 01:23:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1YkQeC-0008AX-9c for qemu-devel@nongnu.org;
	Tue, 21 Apr 2015 01:23:03 -0400
Received: from mx1.redhat.com ([209.132.183.28]:43875)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1YkQeC-0008AI-1m for qemu-devel@nongnu.org;
	Tue, 21 Apr 2015 01:23:00 -0400
Date: Tue, 21 Apr 2015 07:22:53 +0200
From: "Michael S. Tsirkin"
Message-ID: <20150421070941-mutt-send-email-mst@redhat.com>
References: <1429257573-7359-1-git-send-email-famz@redhat.com>
	<20150420175905-mutt-send-email-mst@redhat.com>
	<20150421023700.GC8048@fam-t430.nay.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20150421023700.GC8048@fam-t430.nay.redhat.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support "VIRTIO_CONFIG_S_NEEDS_RESET"
To: Fam Zheng
Cc: Kevin Wolf, Rusty Russell, qemu-devel@nongnu.org,
	virtualization@lists.linux-foundation.org, "Aneesh Kumar K.V",
	Stefan Hajnoczi, Amit Shah, Paolo Bonzini

On Tue, Apr 21, 2015 at 10:37:00AM +0800, Fam Zheng wrote:
> On Mon, 04/20 19:36, Michael S. Tsirkin wrote:
> > On Fri, Apr 17, 2015 at 03:59:15PM +0800, Fam Zheng wrote:
> > > Currently, virtio code chooses to kill QEMU if the guest passes any invalid
> > > data with vring.
> > > That has drawbacks such as losing unsaved data (e.g. when
> > > guest user is writing a very long email), or possible denial of service in
> > > a nested vm use case where virtio device is passed through.
> > >
> > > virtio-1 has introduced a new status bit "NEEDS RESET" which could be used to
> > > improve this by communicating the error state between virtio devices and
> > > drivers. The device notifies guest upon setting the bit, then the guest driver
> > > should detect this bit and report to userspace, or recover the device by
> > > resetting it.
> >
> > Unfortunately, virtio 1 spec does not have a conformance statement
> > that requires driver to recover. We merely have a non-normative looking
> > text:
> >     Note: For example, the driver can’t assume requests in flight
> >     will be completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> >     they have not been completed. A good implementation will try to recover
> >     by issuing a reset.
> >
> > Implementing this reset for all devices in a race-free manner might also
> > be far from trivial. I think we'd need a feature bit for this.
> > OTOH as long as we make this a new feature, would an ability to
> > reset a single VQ be a better match for what you are trying to
> > achieve?
>
> I think that is too complicated as a recovery measure, a device level resetting
> will be better to get to a deterministic state, at least.

Question would be, how hard is it to stop host from using all queues,
retrieve all host OS state and re-program it into the device.
If we need to shadow all OS state within the driver, then that's a lot
of not well tested code with a possibility of introducing more bugs.

> >
> > > This series makes necessary changes in virtio core code, based on which
> > > virtio-blk is converted. Other devices now keep the existing behavior by
> > > passing in "error_abort". They will be converted in following series. The Linux
> > > driver part will also be worked on.
> > >
> > > One concern with this behavior change is that it's now harder to notice the
> > > actual driver bug that caused the error, as the guest continues to run.
> > > To address that, we could probably add a new error action option to virtio
> > > devices, similar to the "read/write werror" in block layer, so the vm could be
> > > paused and the management will get an event in QMP like pvpanic. This work can
> > > be done on top.
> >
> > At the architectural level, that's only one concern. Others would be
> > - workloads such as openstack handle guest crash better than
> >   a guest that's e.g. slow because of a memory leak
>
> What memory leak are you referring to?

That was just an example.  If host detects a malformed ring, it will
crash.  But often it doesn't, result is buffers not being used, so guest
can't free them up.

> > - it's easier for guests to probe host for security issues
> >   if guest isn't killed
> > - guest can flood host log with guest-triggered errors
>
> We can still abort() if guest is triggering error too quickly.
>
> Fam

Absolutely, and if it looked like I'm against error detection and
recovery, this was not my intent.

I am merely saying we can't apply this patchset as is, deferring
addressing the issues to patches on top.

But I have an idea: refactor the code to use error_abort.  This way we
can apply the patchset without making functional changes, and you can
make progress to complete this, on top.

-- 
MST
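
For illustration only, the error_abort idea can be sketched roughly as
below.  This is not QEMU code: ToyVirtioDevice, toy_virtio_error() and
toy_virtio_reset() are hypothetical names, and the abort_on_error flag
merely stands in for a caller passing error_abort.  Only the status bit
value (0x40) is taken from the virtio 1.0 spec.  With the flag set, the
behavior is what we have today (QEMU dies); with it cleared, the device
marks itself as needing a reset and stops until the driver resets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Status bit value as defined by the virtio 1.0 spec. */
#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40

/* Hypothetical, simplified device model; not QEMU's VirtIODevice. */
typedef struct ToyVirtioDevice {
    uint8_t status;        /* device status field */
    bool broken;           /* stop processing queues until reset */
    bool abort_on_error;   /* true keeps the current "kill QEMU" behavior */
} ToyVirtioDevice;

/* Called when the device finds invalid data in a vring. */
static void toy_virtio_error(ToyVirtioDevice *dev, const char *msg)
{
    fprintf(stderr, "virtio error: %s\n", msg);
    if (dev->abort_on_error) {
        abort();           /* no functional change for unconverted devices */
    }
    dev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
    dev->broken = true;
    /* A real device would also notify the guest at this point. */
}

/* Driver-initiated reset: clear the status field and resume. */
static void toy_virtio_reset(ToyVirtioDevice *dev)
{
    dev->status = 0;
    dev->broken = false;
}
```

A device converted this way would clear abort_on_error once its reset
path is known to be race-free; everything else keeps aborting.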