From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=44547 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OdoKb-0007Qk-NU
	for qemu-devel@nongnu.org; Tue, 27 Jul 2010 13:52:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OdoCg-0007RF-C1
	for qemu-devel@nongnu.org; Tue, 27 Jul 2010 13:44:35 -0400
Received: from mail-gy0-f173.google.com ([209.85.160.173]:36258)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OdoCg-0007R2-7i
	for qemu-devel@nongnu.org; Tue, 27 Jul 2010 13:44:34 -0400
Received: by gyd10 with SMTP id 10so1524485gyd.4
	for <qemu-devel@nongnu.org>; Tue, 27 Jul 2010 10:44:33 -0700 (PDT)
Message-ID: <4C4F1AFB.3020100@codemonkey.ws>
Date: Tue, 27 Jul 2010 12:44:27 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] ide_dma_cancel will result in partial DMA
	transfer (resend #4)
References: <20100727173050.GK16655@random.random>
In-Reply-To: <20100727173050.GK16655@random.random>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: qemu-devel@nongnu.org

On 07/27/2010 12:30 PM, Andrea Arcangeli wrote:
> Subject: avoid canceling ide dma
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> The reason for not actually canceling the I/O is because with
> virtualization and lots of VMs running, a guest fs may mistake an
> overload of the host for an IDE timeout. So rather than canceling the
> I/O, it's safer to wait I/O completion and simulate that the I/O has
> completed just before the io cancellation was requested by the
> guest. This way if ntfs or an app writes data without checking for
> -EIO retval, and it thinks the write has succeeded, it's less likely
> to run into troubles. Similar issues for reads.
>
> Furthermore, because the DMA operation is split into many synchronous
> aio_read/write if there's more than one entry in the SG table, without this
> patch the DMA would be cancelled in the middle, something we've no idea if it
> happens on real hardware too or not. Overall this seems a great risk for zero
> gain.
>
> This approach is surely safer than the previous code, given we can't expect all guest
> fs code out there to check for errors and replay the DMA if it was completed
> partially, given a timeout would never materialize on a real harddisk unless
> there are defective blocks (and defective blocks are practically only an issue
> for reads never for writes in any recent hardware as writing to blocks is the
> way to fix them) or the harddisk breaks as a whole.
>
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 780fc5f..9f6d42a 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -40,8 +40,25 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
>       printf("%s: 0x%08x\n", __func__, val);
>   #endif
>       if (!(val & BM_CMD_START)) {
> -        /* XXX: do it better */
> -        ide_dma_cancel(bm);
> +        /*
> +	 * We can't cancel Scatter Gather DMA in the middle of the
> +	 * operation or a partial (not full) DMA transfer would reach
> +	 * the storage so we wait for completion instead (we behave
> +	 * like if the DMA was completed by the time the guest trying
> +	 * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> +	 * set).
> +	 *
> +	 * In the future we'll be able to safely cancel the I/O if the
> +	 * whole DMA operation will be submitted to disk with a single
> +	 * aio operation with preadv/pwritev.
> +	 */
> +	if (bm->aiocb) {
> +		qemu_aio_flush();
> +		if (bm->aiocb)
> +			printf("ide_dma_cancel: aiocb still pending");
> +		if (bm->status & BM_STATUS_DMAING)
> +			printf("ide_dma_cancel: BM_STATUS_DMAING still pending");
>    


printf()s?

Regards,

Anthony Liguori

> +	}
>           bm->cmd = val & 0x09;
>       } else {
>           if (!(bm->status & BM_STATUS_DMAING)) {
>
>
>
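
For illustration only, the idea in the quoted patch (wait for the
outstanding transfer to finish instead of cancelling it mid-way, then
report anything still pending) can be sketched as a standalone toy
model. Everything below is hypothetical: struct sketch_bm,
sketch_flush() and sketch_stop_dma() are made-up stand-ins and are not
QEMU's BMDMAState, qemu_aio_flush() or bmdma_cmd_writeb(). The sketch
also sends its diagnostic to stderr with a trailing newline, which is
one possible way to address the printf() question above, not
necessarily the one intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the bus-master DMA state; not QEMU's type. */
struct sketch_bm {
    int segments_total;   /* entries in the scatter-gather table */
    int segments_done;    /* entries already transferred */
    bool dmaing;          /* plays the role of BM_STATUS_DMAING */
};

/* Hypothetical stand-in for waiting on outstanding AIO: completes every
 * remaining segment, as draining the pending requests would. */
static void sketch_flush(struct sketch_bm *bm)
{
    while (bm->segments_done < bm->segments_total) {
        bm->segments_done++;
    }
    bm->dmaing = false;
}

/* Called when the guest clears the start bit mid-transfer. Returns true
 * if the whole transfer completed, false if something is still pending. */
static bool sketch_stop_dma(struct sketch_bm *bm)
{
    assert(bm != NULL);
    if (bm->dmaing) {
        sketch_flush(bm);   /* drain instead of cancelling part-way */
    }
    if (bm->dmaing || bm->segments_done != bm->segments_total) {
        /* Diagnostics go to stderr rather than stdout. */
        fprintf(stderr, "%s: DMA still pending after flush\n", __func__);
        return false;
    }
    return true;
}
```

With a transfer stopped after one of four segments, the sketch finishes
the remaining three before acknowledging the stop, so no partial
transfer is ever observable by the caller.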