From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Sandeen <sandeen@sgi.com>
Subject: Some O_DIRECT vs. block_truncate_page problems
Date: 23 Sep 2002 13:09:37 -0500
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <1032804577.27949.17.camel@stout.americas.sgi.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from zeus-e8.americas.sgi.com (zeus-e8.americas.sgi.com [192.48.203.6])
	by tolkor.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id g8NIGHS9000817
	for <linux-fsdevel@vger.kernel.org>; Mon, 23 Sep 2002 13:16:17 -0500
Received: from poppy-e185.americas.sgi.com (poppy-e185.americas.sgi.com [128.162.185.207]) by zeus-e8.americas.sgi.com (SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id NAA05104 for <linux-fsdevel@vger.kernel.org>; Mon, 23 Sep 2002 13:15:37 -0500 (CDT)
Received: from stout.americas.sgi.com (stout.americas.sgi.com [128.162.187.5]) by poppy-e185.americas.sgi.com (980427.SGI.8.8.8/SGI-server-1.8) with ESMTP id NAA07363 for <linux-fsdevel@vger.kernel.org>; Mon, 23 Sep 2002 13:15:37 -0500 (CDT)
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>
List-Id: linux-fsdevel.vger.kernel.org

I'm trying to track down a bug I'm seeing after I run xfs's
defragmenter, where data is present past EOF in the last block.

xfs_fsr does this (essentially):

o allocate some contiguous space with an xfs-specific ioctl
o copy the fragmented file into the new file with O_DIRECT
o truncate() the new file back to the correct size (since it's currently
  a multiple of the block size)
o sync the new file
o do "xfs_inval_cached_pages", which is essentially:
    - filemap_fdatasync(ip->i_mapping);
    - fsync_inode_data_buffers(ip);
    - filemap_fdatawait(ip->i_mapping);
    - truncate_inode_pages(ip->i_mapping, first);
o swap the extents between the old & the new files

When this is all done, I see data past EOF in the resulting file.
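
The copy, truncate and sync steps above can be approximated from user
space. This is only a sketch under stated assumptions: the helper name
copy_direct_then_truncate() and the 0xEE filler byte are hypothetical,
the source is read with ordinary buffered I/O, and the xfs-specific
allocation, xfs_inval_cached_pages and extent swap steps are left out
entirely. It shows why the truncate() is needed at all: the O_DIRECT
writes leave the new file at a multiple of the block size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes O_DIRECT only with this set */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of xfs_fsr.
 * Copies src to dst in whole blocks with O_DIRECT writes, truncates dst
 * back to the real size, and syncs it.
 * Returns the final size of dst, or -1 on error / no O_DIRECT support.
 */
long long copy_direct_then_truncate(const char *src, const char *dst,
                                    size_t blksz)
{
#ifndef O_DIRECT
    (void)src; (void)dst; (void)blksz;
    return -1;
#else
    struct stat st, dst_st;
    void *buf = NULL;
    long long result = -1;
    off_t off = 0;
    int in = -1, out = -1;

    assert(blksz >= 512 && (blksz & (blksz - 1)) == 0);
    if (posix_memalign(&buf, blksz, blksz) != 0)
        return -1;

    in = open(src, O_RDONLY);
    if (in < 0 || fstat(in, &st) != 0)
        goto done;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (out < 0)
        goto done;

    while (off < st.st_size) {
        ssize_t n;

        /* Filler stands in for stale data in the tail of the last block. */
        memset(buf, 0xEE, blksz);
        n = pread(in, buf, blksz, off);
        if (n <= 0 || ((size_t)n < blksz && off + n < st.st_size))
            goto done;
        /* O_DIRECT wants whole, aligned blocks, so write the full block. */
        if (pwrite(out, buf, blksz, off) != (ssize_t)blksz)
            goto done;
        off += (off_t)blksz;
    }

    /* dst is now a multiple of blksz; cut it back to the real size. */
    if (ftruncate(out, st.st_size) != 0 || fsync(out) != 0)
        goto done;
    if (fstat(out, &dst_st) == 0)
        result = (long long)dst_st.st_size;
done:
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    free(buf);
    return result;
#endif
}
```

The filler byte only makes the tail of the last block recognisable if
it later turns up past EOF; it is not something xfs_fsr does.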

I believe that this is because the buffer dirtied by the truncate()
call (the partially zeroed last block) is not being synced to disk, and
gets thrown away when we do truncate_inode_pages().

block_truncate_page() does a __mark_buffer_dirty(bh) at the end, but it
does not file the buffer on the inode's dirty data queue, so
fsync_inode_data_buffers() does not see it.  The inode only has the page
on its clean pages list, so filemap_fdatasync() does not see it either.

Since generic_file_direct_IO relies on filemap_fdatasync,
fsync_inode_data_buffers and filemap_fdatawait to flush data to disk, I
think that O_DIRECT behavior after a truncate() will be broken for any
filesystem.

This patch seems to fix things up for me; is there a reason that the
buffer was not inserted originally?

--- linux/fs/buffer.c_1.109	Mon Sep 23 13:10:56 2002
+++ linux/fs/buffer.c	Mon Sep 23 13:04:44 2002
@@ -2028,7 +2028,12 @@
 	flush_dcache_page(page);
 	kunmap(page);
 
-	__mark_buffer_dirty(bh);
+	if (!atomic_set_buffer_dirty(bh)) {
+		__mark_dirty(bh);
+		buffer_insert_inode_data_queue(bh, inode);
+		balance_dirty();
+	}
+
 	err = 0;


On a related note, while testing some O_DIRECT cases on ext2, I found
another, possibly related, bug.

If you O_DIRECT write 2 blocks into a new file, truncate() the file to
1.5 blocks, then do an O_DIRECT read of the last block, you get
incorrect data.  Rather than 1/2 data, 1/2 zeroes, I get all zeroes.
This test passes on xfs.
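
A minimal sketch of that test, assuming a Linux system with glibc. The
helper name check_truncated_tail(), the 0xAA pattern and the return
codes are hypothetical choices for illustration. It only checks the
half block that lies before EOF, since that is the part the read is
required to return.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes O_DIRECT only with this set */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper.
 * Returns 0 if the first half of the last block reads back as written,
 * 1 if it does not (the symptom described above), and -1 on a setup
 * error or when O_DIRECT is not available.
 */
int check_truncated_tail(const char *path, size_t blksz)
{
#ifndef O_DIRECT
    (void)path; (void)blksz;
    return -1;
#else
    void *mem = NULL;
    unsigned char *buf;
    size_t half = blksz / 2, i;
    ssize_t n;
    int fd, result = -1;

    assert(blksz >= 512 && (blksz & (blksz - 1)) == 0);
    if (posix_memalign(&mem, blksz, 2 * blksz) != 0)
        return -1;
    buf = mem;
    memset(buf, 0xAA, 2 * blksz);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0) {
        free(mem);
        return -1;
    }

    /* O_DIRECT write of 2 blocks into the new file. */
    if (pwrite(fd, buf, 2 * blksz, 0) != (ssize_t)(2 * blksz))
        goto out;
    /* Truncate to 1.5 blocks. */
    if (ftruncate(fd, (off_t)(blksz + half)) != 0)
        goto out;

    /* Clear the buffer so the pattern can only come from the file. */
    memset(buf, 0, blksz);
    /* O_DIRECT read of the last block; expect at least the half before EOF. */
    n = pread(fd, buf, blksz, (off_t)blksz);
    if (n < (ssize_t)half)
        goto out;

    result = 0;
    for (i = 0; i < half; i++) {
        if (buf[i] != 0xAA) {
            result = 1;
            break;
        }
    }
out:
    close(fd);
    unlink(path);
    free(mem);
    return result;
#endif
}
```

Call it with a path on the filesystem under test and that filesystem's
block size, for example check_truncated_tail("testfile", 4096). The
file is removed again before the function returns.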

Comments?

-Eric


-- 
Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
sandeen@sgi.com   SGI, Inc.         651-683-3102