From: Andrew Morton <akpm@linux-foundation.org>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [patch] RFC directio: partial writes support
Date: Mon, 1 Mar 2010 15:21:49 -0800
Message-ID: <20100301152149.7ce78e14.akpm@linux-foundation.org>
In-Reply-To: <87iq9lxz3t.fsf@openvz.org>

On Thu, 25 Feb 2010 15:45:58 +0300
Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> Can someone please explain to me why directio denies partial writes?
> For example, if someone tries to write 100MB but the file system has less
> free space, it hits ENOSPC in the middle of block allocation.
> All allocated blocks will be truncated (it may be 100MB - 4k) and
> ENOSPC will be returned. As far as I remember, direct_io has always acted
> like this, but I never asked why.
> Why do we have to give up all the progress we made?
> In fact, partial writes are possible in the case of holes, when we
> fall back to buffered writes. XFS has implemented partial writes.

The problem with direct-io writes is that the writes don't necessarily
complete in file-offset-ascending order. So if we've issued 50 write
BIOs and then hit an EIO on a BIO then we could have a hunk of
unwritten data with newly-written data on either side of it. If we get a
bunch of discontiguous EIO BIOs coming in then the problem gets even
messier - we have a span of disk which has a random mix of
correctly-written and not-correctly-written runs of sectors. What do
we do with that?

The code _could_ perhaps go back and crawl through the request and
identify the number of successfully-written bytes between
start-of-request and first-EIO and then return that. But we didn't
bother.
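
To make the "crawl through the request" idea concrete, here is a minimal
userspace sketch. It is not the kernel's direct-io code: struct seg_result
and contiguous_written() are hypothetical names invented for illustration,
and the sketch assumes the per-BIO results have already been collected into
an array sorted by ascending offset.

```c
#include <stddef.h>

/* Hypothetical record of one completed write BIO; not a kernel structure. */
struct seg_result {
    size_t offset;  /* byte offset from the start of the request */
    size_t len;     /* number of bytes this BIO covered */
    int    error;   /* 0 on success, negative errno (e.g. -5 for EIO) */
};

/*
 * Return the number of bytes known to be written between the start of the
 * request and the first failed (or missing) segment.  Segments that
 * completed successfully *after* the first failure are deliberately
 * ignored, because the caller can only report one contiguous prefix.
 *
 * Assumption: segs[] is sorted by ascending offset.
 */
static size_t contiguous_written(const struct seg_result *segs, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        /* Stop at the first error, or at a gap in the covered range. */
        if (segs[i].error != 0 || segs[i].offset != done)
            break;
        done += segs[i].len;
    }
    return done;
}
```

For a request split into four 4096-byte BIOs where the third one fails with
EIO, this returns 8192 even though the fourth BIO succeeded: the data beyond
the failure is on disk but cannot be reported as part of a short write.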

ENOSPC errors are handled via the same code path and hence got
deoptimised due to this EIO handling. We could perhaps improve the
ENOSPC handling along the lines you propose, as long as we
appropriately take care of EIO considerations. Which, afaict, your
patch didn't do.

The presence of the opt-in DIO_PARTIAL_WRITE thing is rather unfortunate -
it would be better to make this change for all filesystems in one hit.
But I guess DIO_PARTIAL_WRITE permits us to migrate filesystems
one-at-a-time as testing permits. But the aim should be to remove
DIO_PARTIAL_WRITE altogether once all the conversion and testing is
completed.
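
As a rough illustration of how such an opt-in flag could gate the return
value during the migration, consider the userspace sketch below. Everything
here is an assumption made for illustration: the bit value given to
DIO_PARTIAL_WRITE is a placeholder, dio_write_result() is a hypothetical
helper, and the actual patch may wire the flag in quite differently.

```c
#include <stddef.h>
#include <sys/types.h>

/* Placeholder bit value for illustration only; not taken from the patch. */
#define DIO_PARTIAL_WRITE 0x1

/*
 * Decide what a direct-io write would return to the caller.
 *
 * written_prefix: bytes known to be written contiguously from the start
 *                 of the request.
 * error:          0 on success, or a negative errno such as -28 (ENOSPC).
 *
 * Without the flag, any error discards all progress (the existing
 * behaviour described above).  With the flag, a non-empty prefix is
 * reported as a short write instead.
 */
static ssize_t dio_write_result(int flags, size_t written_prefix, int error)
{
    if (error == 0)
        return (ssize_t)written_prefix;

    if ((flags & DIO_PARTIAL_WRITE) && written_prefix > 0)
        return (ssize_t)written_prefix;  /* short write */

    return (ssize_t)error;               /* all-or-nothing */
}
```

Once every filesystem passes the flag, the flags test becomes dead code and
the flag can be dropped, which is the end state argued for above.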

Thread overview: 6+ messages
2010-02-25 12:45 [patch] RFC directio: partial writes support Dmitry Monakhov
2010-02-27 11:10 ` Dmitry Monakhov
2010-03-01 23:21 ` Andrew Morton [this message]
2010-03-02 9:25 ` Nick Piggin
2010-03-02 11:34 ` Jan Kara
2010-03-02 12:37 ` Nick Piggin