From: Christoph Hellwig <hch@infradead.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>, Dave Chinner <david@fromorbit.com>,
	linux-ext4@vger.kernel.org
Subject: Re: Query about DIO/AIO WRITE throttling and ext4 serialization
Date: Thu, 2 Jun 2011 21:02:33 -0400
Message-ID: <20110603010233.GA17726@infradead.org>
In-Reply-To: <20110603005403.GB27129@redhat.com>
On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> Just wondering why ext4 and XFS behavior are different and which is a
> more appropriate behavior. ext4 does not seem to be waiting for all
> pending AIO/DIO to finish while XFS does.

They're both wrong. Ext4 completely lacks support in fsync and sync for
catching pending unwritten extent conversions, and thus fails to honor
the data integrity guarantee. XFS is being rather stupid about the
amount of synchronization it requires. The untested patch below
should help avoid that synchronization if you're purely doing
overwrites:

Index: xfs/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_aops.c 2011-06-03 09:54:52.964337556 +0900
+++ xfs/fs/xfs/linux-2.6/xfs_aops.c 2011-06-03 09:57:06.877674259 +0900
@@ -270,7 +270,7 @@ xfs_finish_ioend_sync(
  * (vs. incore size).
  */
 STATIC xfs_ioend_t *
-xfs_alloc_ioend(
+__xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type)
 {
@@ -290,7 +290,6 @@ xfs_alloc_ioend(
 	ioend->io_inode = inode;
 	ioend->io_buffer_head = NULL;
 	ioend->io_buffer_tail = NULL;
-	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = 0;
 	ioend->io_size = 0;
 	ioend->io_iocb = NULL;
@@ -300,6 +299,18 @@ xfs_alloc_ioend(
 	return ioend;
 }
 
+STATIC xfs_ioend_t *
+xfs_alloc_ioend(
+	struct inode		*inode,
+	unsigned int		type)
+{
+	struct xfs_ioend	*ioend;
+
+	ioend = __xfs_alloc_ioend(inode, type);
+	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
+	return ioend;
+}
+
 STATIC int
 xfs_map_blocks(
 	struct inode		*inode,
@@ -1318,6 +1329,7 @@ xfs_end_io_direct_write(
 	 */
 	iocb->private = NULL;
 
+	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = offset;
 	ioend->io_size = size;
 	if (private && size > 0)
@@ -1354,7 +1366,7 @@ xfs_vm_direct_IO(
 	ssize_t			ret;
 
 	if (rw & WRITE) {
-		iocb->private = xfs_alloc_ioend(inode, IO_DIRECT);
+		iocb->private = __xfs_alloc_ioend(inode, IO_DIRECT);
 
 		ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov,
 					    offset, nr_segs,
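
The effect of moving the i_iocount increment in the patch above (from
allocation time in xfs_alloc_ioend to completion time in
xfs_end_io_direct_write) can be illustrated with a small userspace
model. This is a sketch under simplifying assumptions, not XFS code:
every name below (model_submit, model_end_io, model_destroy,
model_fsync_waits and the COUNT_* policies) is hypothetical, the
workqueue-driven unwritten extent conversion is collapsed into a single
call, and the model's iocount merely stands in for i_iocount.
COUNT_AT_SUBMIT mimics the unpatched behavior, COUNT_AT_COMPLETION the
patched one, and COUNT_NEVER loosely models the ext4 problem described
above, where nothing makes fsync wait for a pending conversion.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model (not kernel code): where a direct-write
 * ioend takes its reference on the per-inode I/O count that an
 * fsync-style waiter drains.
 */
enum count_policy {
	COUNT_AT_SUBMIT,     /* reference taken when the DIO is submitted */
	COUNT_AT_COMPLETION, /* reference taken in the end_io callback */
	COUNT_NEVER,         /* no reference: waiter ignores conversions */
};

struct model_inode {
	int iocount;             /* stands in for i_iocount */
	int pending_conversions; /* unwritten extents not yet converted */
};

struct model_ioend {
	bool unwritten; /* hit an unwritten extent; cleared once converted */
	bool counted;   /* currently holds a reference on iocount */
};

static void model_submit(struct model_inode *ip, struct model_ioend *io,
			 bool unwritten, enum count_policy p)
{
	io->unwritten = unwritten;
	io->counted = false;
	if (p == COUNT_AT_SUBMIT) {
		ip->iocount++;
		io->counted = true;
	}
}

/* Conversion finished (or none was needed): drop any reference held. */
static void model_destroy(struct model_inode *ip, struct model_ioend *io)
{
	if (io->unwritten) {
		ip->pending_conversions--;
		io->unwritten = false;
	}
	if (io->counted) {
		ip->iocount--;
		io->counted = false;
	}
}

/* Device completion; loosely analogous to xfs_end_io_direct_write(). */
static void model_end_io(struct model_inode *ip, struct model_ioend *io,
			 enum count_policy p)
{
	if (p == COUNT_AT_COMPLETION) {
		ip->iocount++;
		io->counted = true;
	}
	if (io->unwritten)
		ip->pending_conversions++; /* queued for later conversion */
	else
		model_destroy(ip, io);     /* pure overwrite: nothing to do */
}

/*
 * Three pure overwrites are still in flight at the device; one write
 * into an unwritten extent has completed and awaits conversion.
 * Returns how many references a waiter must drain right now and
 * stores the number of outstanding conversions in *pending.
 */
int model_fsync_waits(enum count_policy p, int *pending)
{
	struct model_inode ip = { 0, 0 };
	struct model_ioend io[4];

	for (int i = 0; i < 3; i++)
		model_submit(&ip, &io[i], false, p);
	model_submit(&ip, &io[3], true, p);
	model_end_io(&ip, &io[3], p);

	*pending = ip.pending_conversions;
	return ip.iocount;
}
```

Driven from a small main, the model gives 4 references to drain under
COUNT_AT_SUBMIT (every in-flight overwrite blocks the waiter), 1 under
COUNT_AT_COMPLETION (only the write awaiting conversion), and 0 under
COUNT_NEVER even though one conversion is still pending, which is the
data integrity hole.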