From: Eric Sandeen <sandeen@sgi.com>
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Some O_DIRECT vs. block_truncate_page problems
Date: 23 Sep 2002 13:09:37 -0500 [thread overview]
Message-ID: <1032804577.27949.17.camel@stout.americas.sgi.com> (raw)
I'm trying to track down a bug I'm seeing after I run xfs's
defragmenter, where data is present past EOF in the last block.
xfs_fsr does this (essentially):
o allocate some contiguous space with an xfs-specific ioctl
o copy the fragmented file into the new file with O_DIRECT
o truncate() the new file back to the correct size (since it's currently
a multiple of the block size)
o sync the new file
o do "xfs_inval_cached_pages", which is essentially:
- filemap_fdatasync(ip->i_mapping);
- fsync_inode_data_buffers(ip);
- filemap_fdatawait(ip->i_mapping);
- truncate_inode_pages(ip->i_mapping, first);
o swap the extents between the old & the new files
When this is all done, I see data past EOF in the resulting file.
I believe that this is because the buffer from the truncate() call is
not being synced to disk, and gets thrown away when we do
truncate_inode_pages().
block_truncate_page() does a __mark_buffer_dirty(bh) at the end, but it
does not file the buffer on the inode's dirty data queue, so the
fsync_inode_data_buffers() does not see it. The inode only has a page
on it's clean pages list, so filemap_fdatasync does not see it either.
Since generic_file_direct_IO expects to do filemap_fdatasync,
fsync_inode_data_buffers, filemap_fdatawait to flush data to disk, I
think that the behavior after a truncate() will be broken for any
filesystem.
This patch seems to fix things up for me; is there a reason that the
buffer was not inserted originally?
--- linux/fs/buffer.c_1.109 Mon Sep 23 13:10:56 2002
+++ linux/fs/buffer.c Mon Sep 23 13:04:44 2002
@@ -2028,7 +2028,12 @@
flush_dcache_page(page);
kunmap(page);
- __mark_buffer_dirty(bh);
+ if (!atomic_set_buffer_dirty(bh)) {
+ __mark_dirty(bh);
+ buffer_insert_inode_data_queue(bh, inode);
+ balance_dirty();
+ }
+
err = 0;
On a related note, while testing some O_DIRECT cases on ext2, I found
another, possibly related, bug.
If you O_DIRECT write 2 blocks into a new file, truncate() the file to
1.5 blocks, then do an O_DIRECT read of the last block, you get
incorrect data. Rather than 1/2 data, 1/2 zeroes, I get all zeroes.
This test passes on xfs.
Comments?
-Eric
--
Eric Sandeen XFS for Linux http://oss.sgi.com/projects/xfs
sandeen@sgi.com SGI, Inc. 651-683-3102
next reply other threads:[~2002-09-23 18:16 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-09-23 18:09 Eric Sandeen [this message]
2002-09-24 5:39 ` Some O_DIRECT vs. block_truncate_page problems Andrew Morton
2002-09-24 12:48 ` Some O_DIRECT vs. block_truncate_page problemsr Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1032804577.27949.17.camel@stout.americas.sgi.com \
--to=sandeen@sgi.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).