From: Avi Kivity <avi@redhat.com>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] honor IDE_DMA_BUF_SECTORS
Date: Fri, 27 Mar 2009 12:52:15 +0300
Message-ID: <49CCA1CF.4000003@redhat.com>
In-Reply-To: <20090326231801.GE5458@const.famille.thibault.fr>
Samuel Thibault wrote:
> Ah, I thought you understood that the posix driver has the same kind of
> limitation
It's not the same limitation. The posix driver has no limits on DMA
size; it will happily transfer a gigabyte of data if you ask it to.
> (and qemu is actually _bugged_ in that regard).
>
It has a bug in that it does not correctly interpret the return value of
pread()/pwrite(). It's a minor bug, since no system supported by qemu
will actually return a short read or write (I think), and since we
hope disk errors are rare. Nevertheless it should be fixed (it's an
easy fix too). However, implementing DMA limits like you propose
(IDE_DMA_BUF_SECTORS) will not fix the bug, only reduce performance.
>
>>> I'm here just pointing out that the problem is not
>>> _only_ in the xen-specific driver, but also in the posix driver, on any
>>> OS that doesn't necessarily do all the work the caller asked for (which
>>> is _allowed_ by POSIX).
>>>
>>>
>> But that's not limited DMA (or at least, not limited up-front). And
>> it's easily corrected, place a while loop around preadv/pwritev, no need
>> to split a request a priori somewhere up the stack.
>>
>
> Sure, and I could do the same in the block-vbd driver, thus then my
> original remark "it should be centralized in the block layer instead of
> placing the burden on all block format drivers". Just to make sure: I'm
> _not_ saying that should be done in the DMA code. I said it should be
> done in the block layer, shared by all block drivers.
>
A generic fix will have to issue a new aio request. block-raw-posix
need not do that; it just needs a while loop.
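
For illustration, a minimal sketch of such a loop around pread(). The
helper name pread_full is hypothetical and is not taken from the qemu
tree; the sketch only assumes standard POSIX pread() semantics, where a
return value smaller than the requested count is legal and 0 means end
of file.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the qemu source): keep calling pread()
 * until `count` bytes have been read, EOF is reached, or an error occurs.
 * Returns the number of bytes read (less than `count` only at EOF),
 * or -1 with errno set by pread().
 */
static ssize_t pread_full(int fd, void *buf, size_t count, off_t offset)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);

        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before any data moved: retry */
            }
            return -1;      /* real error; errno identifies it */
        }
        if (n == 0) {
            break;          /* end of file: report what was read so far */
        }
        done += (size_t)n;  /* short read: loop again for the remainder */
    }
    return (ssize_t)done;
}
```

The request is never split up front: the common case is still a single
pread() call, and the loop only iterates when the system actually
returns less than was asked for. A pwrite() variant would look the same
with the call swapped.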
>> And it wouldn't be right for block-vbd - you should split your requests
>> as late as possible, IMO.
>>
>
> Why making it "late"? Exposing the lower limits to let upper layers
> decide how they should manage fragmentation usually gets better
> performance. (Note that in my case there is no system involved, so it's
> really _not_ costly to do the fragmentation on the qemu side).
>
If ring entries can be more than a page (if the request is contiguous),
then the limit can be expanded. In other words, it's a worst-case
limit, not a hard limit. Exposing the worst case limit will lead to
pessimistic choices.
That's how virtio-blk works; I don't know about xen vbd (it might not
work due to the need to transfer grants?)
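
To make the worst-case versus hard-limit distinction concrete, here is
a rough sketch that counts how many ring entries a request would need
under two assumptions: one entry per page, and one entry per contiguous
run. The struct and function names are hypothetical and do not reflect
the actual virtio-blk or xen vbd ring layouts; page_size is assumed to
be nonzero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one buffer segment of a request. */
struct seg {
    uint64_t addr;  /* start address of the segment */
    size_t len;     /* length in bytes */
};

/* Worst case: each ring entry describes at most one page. */
static size_t entries_page_granular(const struct seg *s, size_t n,
                                    size_t page_size)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].len == 0) {
            continue;
        }
        uint64_t first = s[i].addr / page_size;
        uint64_t last = (s[i].addr + s[i].len - 1) / page_size;
        total += (size_t)(last - first + 1);  /* pages touched */
    }
    return total;
}

/* One ring entry per contiguous run: adjacent segments are merged. */
static size_t entries_contiguous(const struct seg *s, size_t n)
{
    size_t total = 0;
    uint64_t next = 0;  /* address right after the previous segment */
    int have_prev = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].len == 0) {
            continue;
        }
        if (!have_prev || s[i].addr != next) {
            total++;    /* gap in the address range: start a new entry */
        }
        next = s[i].addr + s[i].len;
        have_prev = 1;
    }
    return total;
}
```

For a mostly contiguous request the second count is far smaller than
the first, so a limit derived from "ring size times one page" would
reject or split requests that actually fit in the ring.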
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.