linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Some O_DIRECT vs. block_truncate_page problems
@ 2002-09-23 18:09 Eric Sandeen
  2002-09-24  5:39 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Sandeen @ 2002-09-23 18:09 UTC (permalink / raw)
  To: linux-fsdevel

I'm trying to track down a bug I'm seeing after I run xfs's
defragmenter, where data is present past EOF in the last block.

xfs_fsr does this (essentially):

o allocate some contiguous space with an xfs-specific ioctl
o copy the fragmented file into the new file with O_DIRECT
o truncate() the new file back to the correct size (since it's currently
  a multiple of the block size)
o sync the new file
o do "xfs_inval_cached_pages", which is essentially:
    - filemap_fdatasync(ip->i_mapping);
    - fsync_inode_data_buffers(ip);
    - filemap_fdatawait(ip->i_mapping);
    - truncate_inode_pages(ip->i_mapping, first);
o swap the extents between the old & the new files

When this is all done, I see data past EOF in the resulting file.

I believe that this is because the buffer from the truncate() call is
not being synced to disk, and gets thrown away when we do
truncate_inode_pages().

block_truncate_page() does a __mark_buffer_dirty(bh) at the end, but it
does not file the buffer on the inode's dirty data queue, so the
fsync_inode_data_buffers() does not see it.  The inode only has a page
on it's clean pages list, so filemap_fdatasync does not see it either.

Since generic_file_direct_IO expects to do filemap_fdatasync,
fsync_inode_data_buffers, filemap_fdatawait to flush data to disk, I
think that the behavior after a truncate() will be broken for any
filesystem.

This patch seems to fix things up for me; is there a reason that the
buffer was not inserted originally?

--- linux/fs/buffer.c_1.109	Mon Sep 23 13:10:56 2002
+++ linux/fs/buffer.c	Mon Sep 23 13:04:44 2002
@@ -2028,7 +2028,12 @@
 	flush_dcache_page(page);
 	kunmap(page);
 
-	__mark_buffer_dirty(bh);
+	if (!atomic_set_buffer_dirty(bh)) {
+		__mark_dirty(bh);
+		buffer_insert_inode_data_queue(bh, inode);
+		balance_dirty();
+	}
+
 	err = 0;


On a related note, while testing some O_DIRECT cases on ext2, I found
another, possibly related, bug.

If you O_DIRECT write 2 blocks into a new file, truncate() the file to
1.5 blocks, then do an O_DIRECT read of the last block, you get
incorrect data.  Rather than 1/2 data, 1/2 zeroes, I get all zeroes.
This test passes on xfs.

Comments?

-Eric


-- 
Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
sandeen@sgi.com   SGI, Inc.         651-683-3102


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2002-09-24 12:48 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-09-23 18:09 Some O_DIRECT vs. block_truncate_page problems Eric Sandeen
2002-09-24  5:39 ` Andrew Morton
2002-09-24 12:48   ` Some O_DIRECT vs. block_truncate_page problemsr Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).