From: Peter Lieven
Message-ID: <562E06DA.3070902@kamp.de>
Date: Mon, 26 Oct 2015 11:56:26 +0100
In-Reply-To: <20151026104252.GB20111@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, jcody@redhat.com, jsnow@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org

On 26.10.2015 at 11:42, Stefan Hajnoczi wrote:
> On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
>> This series aims at avoiding a hanging main-loop if a vserver has a
>> CDROM image mounted from an NFS share and that NFS share goes down.
>> A typical situation is that users mount a CDROM ISO to install something
>> and then forget to eject that CDROM afterwards.
>> As a consequence, this mounted CD is able to bring down the
>> whole vserver if the backend NFS share is unreachable. This is bad,
>> especially if the CDROM itself is not needed anymore at this point.
>>
>> This series aims at fixing 2 blocking I/O operations that would
>> hang if the NFS server is unavailable:
>>  - ATAPI PIO read requests used sync calls to blk_read; convert
>>    them to an async variant where possible.
>>  - If a busmaster DMA request is cancelled, all requests are drained.
>>    Convert the drain to an async request cancellation.
>>
>> v1->v2: - fix offset for 2352 byte sector size [Kevin]
>>         - use a sync request if we continue an elementary transfer.
>>           As John pointed out, we enter a race condition between the next
>>           IDE command and the async transfer otherwise. This is still not
>>           optimal, but it fixes the NFS down problems for all cases where
>>           the NFS server goes down while there is no PIO CD activity.
>>           Of course, it could still happen during a PIO transfer, but I
>>           expect this to be the less likely case.
>>           I spent some effort trying to read more sectors at once and
>>           avoiding continuation of elementary transfers, but whatever
>>           I came up with broke migration between different
>>           Qemu versions. I have a quite hackish patch that works and
>>           should survive migration, but I am not happy with it. So I
>>           would like to start with this version, as it is a big improvement
>>           already.
>>         - Dropped Patch 5 because it is upstream meanwhile.
>>
>> Peter Lieven (4):
>>   ide/atapi: make PIO read requests async
>>   ide/atapi: blk_aio_readv may return NULL
>>   ide: add support for cancelable read requests
>>   ide/atapi: enable cancelable requests
>>
>>  hw/ide/atapi.c    | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
>>  hw/ide/core.c     | 55 +++++++++++++++++++++++++++++++
>>  hw/ide/internal.h | 16 +++++++++
>>  hw/ide/pci.c      | 42 +++++++++++++++-------
>>  4 files changed, 188 insertions(+), 24 deletions(-)
> Any reason why write and discard requests aren't covered in this series?
>
> If this is a good idea for CD-ROM it should be a good idea for all PCI
> IDE devices.
>
> Having a specialized code path is often a sign that it hasn't been
> tested enough. Can we get confident enough to enable this everywhere?

The reason is that the buffered request trick only works for read-only
devices (like a CDROM). A write request that is completed on the backend
storage at a later point (after the OS thinks the request has been
canceled) can corrupt the filesystem.

Peter
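A minimal, self-contained C sketch of the read-only argument above; it is not the code from the series. All names (`HypoReadReq`, `hypo_read_submit`, `hypo_read_cancel`, `hypo_backend_complete`, `HYPO_PENDING`) are hypothetical. The sketch assumes the backend delivers read data into a request-private bounce buffer, and that this data is copied to guest memory only if the guest has not canceled the request in the meantime.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status value meaning "request still in flight". */
#define HYPO_PENDING 1

/* Hypothetical read request: the backend fills a private bounce buffer,
 * never guest memory directly. */
typedef struct {
    unsigned char *bounce;     /* request-private buffer */
    unsigned char *guest_buf;  /* guest destination */
    size_t len;
    int *guest_status;         /* status reported to the guest */
    bool canceled;             /* guest has already seen -ECANCELED */
} HypoReadReq;

/* Issue a read; the backend completes it later via hypo_backend_complete(). */
static HypoReadReq *hypo_read_submit(unsigned char *guest_buf, size_t len,
                                     int *guest_status)
{
    HypoReadReq *req = calloc(1, sizeof(*req));
    if (!req) {
        return NULL;
    }
    req->bounce = malloc(len);
    if (!req->bounce) {
        free(req);
        return NULL;
    }
    req->guest_buf = guest_buf;
    req->len = len;
    req->guest_status = guest_status;
    *guest_status = HYPO_PENDING;
    return req;
}

/* Cancel towards the guest immediately instead of draining: the backend
 * (e.g. an unreachable NFS server) may not answer for a long time. */
static void hypo_read_cancel(HypoReadReq *req)
{
    assert(req != NULL && !req->canceled);
    req->canceled = true;
    *req->guest_status = -ECANCELED;
}

/* Late backend completion. `data` stands in for what the backend
 * delivered into the bounce buffer; `ret` is 0 or a negative errno. */
static void hypo_backend_complete(HypoReadReq *req, const unsigned char *data,
                                  int ret)
{
    if (ret == 0) {
        memcpy(req->bounce, data, req->len);
    }
    if (!req->canceled) {
        if (ret == 0) {
            memcpy(req->guest_buf, req->bounce, req->len);
        }
        *req->guest_status = ret;
    }
    /* A canceled read's data is dropped together with the bounce buffer,
     * so nothing the guest can see changes. A write has no such escape
     * hatch: once submitted, it may still reach the disk after the guest
     * was told it was canceled. */
    free(req->bounce);
    free(req);
}
```

In this sketch a canceled read reports `-ECANCELED` to the guest right away, so nothing waits on the dead backend. When the backend finally answers, the bounce buffer is freed without touching `guest_buf`. There is no equivalent for writes, which is the point made in the reply.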