From: Eric Sandeen
Subject: Some O_DIRECT vs. block_truncate_page problems
Date: 23 Sep 2002 13:09:37 -0500
Message-ID: <1032804577.27949.17.camel@stout.americas.sgi.com>
To: linux-fsdevel
List-Id: linux-fsdevel.vger.kernel.org

I'm trying to track down a bug I'm seeing after I run xfs's defragmenter,
where data is present past EOF in the last block.

xfs_fsr does this (essentially):

 o allocate some contiguous space with an xfs-specific ioctl
 o copy the fragmented file into the new file with O_DIRECT
 o truncate() the new file back to the correct size (since it's
   currently a multiple of the block size)
 o sync the new file
 o do "xfs_inval_cached_pages", which is essentially:
    - filemap_fdatasync(ip->i_mapping);
    - fsync_inode_data_buffers(ip);
    - filemap_fdatawait(ip->i_mapping);
    - truncate_inode_pages(ip->i_mapping, first);
 o swap the extents between the old & the new files

When this is all done, I see data past EOF in the resulting file.

I believe that this is because the buffer from the truncate() call is not
being synced to disk, and gets thrown away when we do
truncate_inode_pages().
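
The suspected failure can be pictured with a small user-space model. This
is only a sketch: struct toy_buffer, struct toy_inode and every toy_*
function are hypothetical stand-ins that mimic the shape of the calls named
above, not kernel code. It assumes that the sync step writes out only
buffers filed on the inode's queue, and that invalidation discards whatever
is still only in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 4

/* Hypothetical stand-in for a buffer head; not a kernel type. */
struct toy_buffer {
    bool dirty;  /* in-memory copy is newer than "disk" */
    int  mem;    /* in-memory contents of the block tail */
    int  disk;   /* what has actually reached "disk" */
};

/* Hypothetical stand-in for an inode with a dirty data queue. */
struct toy_inode {
    struct toy_buffer *queue[TOY_QUEUE_MAX];
    size_t nqueued;
};

/* Mark dirty only, without filing the buffer on the inode's queue. */
static void toy_mark_dirty(struct toy_buffer *bh)
{
    bh->dirty = true;
}

/* Mark dirty and, if it was clean before, file it on the inode's queue. */
static void toy_mark_dirty_queued(struct toy_buffer *bh, struct toy_inode *inode)
{
    bool was_dirty = bh->dirty;

    bh->dirty = true;
    if (!was_dirty) {
        assert(inode->nqueued < TOY_QUEUE_MAX);
        inode->queue[inode->nqueued++] = bh;
    }
}

/* Per-inode sync: writes out only the buffers it finds on the queue. */
static void toy_fsync_inode(struct toy_inode *inode)
{
    for (size_t i = 0; i < inode->nqueued; i++) {
        inode->queue[i]->disk = inode->queue[i]->mem;
        inode->queue[i]->dirty = false;
    }
    inode->nqueued = 0;
}

/* Invalidate: drop the cached copy without writing it back. */
static void toy_invalidate(struct toy_buffer *bh)
{
    bh->mem = bh->disk;
    bh->dirty = false;
}

/* Run truncate -> sync -> invalidate; return what ends up on "disk". */
int toy_run(bool queue_on_dirty)
{
    struct toy_inode inode = { .nqueued = 0 };
    struct toy_buffer bh = { .dirty = false, .mem = 0xAA, .disk = 0xAA };

    bh.mem = 0;  /* truncate zeroes the tail past EOF in memory */
    if (queue_on_dirty)
        toy_mark_dirty_queued(&bh, &inode);
    else
        toy_mark_dirty(&bh);

    toy_fsync_inode(&inode);
    toy_invalidate(&bh);
    return bh.disk;
}
```

Under those assumptions, toy_run(false) leaves the stale 0xAA on "disk",
while toy_run(true) writes the zeroed tail out before it is dropped.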
block_truncate_page() does a __mark_buffer_dirty(bh) at the end, but it
does not file the buffer on the inode's dirty data queue, so
fsync_inode_data_buffers() does not see it. The inode only has a page on
its clean pages list, so filemap_fdatasync() does not see it either.

Since generic_file_direct_IO expects filemap_fdatasync,
fsync_inode_data_buffers, and filemap_fdatawait to flush data to disk,
I think that the behavior after a truncate() will be broken for any
filesystem.

This patch seems to fix things up for me; is there a reason that the
buffer was not inserted originally?

--- linux/fs/buffer.c_1.109	Mon Sep 23 13:10:56 2002
+++ linux/fs/buffer.c	Mon Sep 23 13:04:44 2002
@@ -2028,7 +2028,12 @@
 	flush_dcache_page(page);
 	kunmap(page);
 
-	__mark_buffer_dirty(bh);
+	if (!atomic_set_buffer_dirty(bh)) {
+		__mark_dirty(bh);
+		buffer_insert_inode_data_queue(bh, inode);
+		balance_dirty();
+	}
+
 	err = 0;
 

On a related note, while testing some O_DIRECT cases on ext2, I found
another, possibly related, bug. If you O_DIRECT write 2 blocks into a new
file, truncate() the file to 1.5 blocks, then do an O_DIRECT read of the
last block, you get incorrect data. Rather than 1/2 data, 1/2 zeroes, I
get all zeroes. This test passes on xfs.

Comments?

-Eric

-- 
Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
sandeen@sgi.com   SGI, Inc.         651-683-3102
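
The ext2 case described above could be reproduced from user space roughly
as follows. This is a sketch, not the test that was actually run: the
function names, the 0x5a fill byte and the block size passed in are all
assumptions. The read buffer is zeroed before the read, so the check still
expects half data, half zeroes even if pread() returns only the bytes
before EOF.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for O_DIRECT with glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* fallback so the sketch compiles; I/O is then buffered */
#endif

/* Count bytes that differ from "valid bytes of fill, then zeroes". */
size_t tail_mismatches(const unsigned char *blk, size_t blksz,
                       size_t valid, unsigned char fill)
{
    size_t bad = 0;

    assert(valid <= blksz);
    for (size_t i = 0; i < blksz; i++) {
        unsigned char want = (i < valid) ? fill : 0;
        if (blk[i] != want)
            bad++;
    }
    return bad;
}

/*
 * Write 2 blocks with O_DIRECT, truncate to 1.5 blocks, read the last
 * block back with O_DIRECT. Returns the number of wrong bytes in that
 * block, or -1 if any step fails (for example, no O_DIRECT support).
 * blksz must be a power of two that the filesystem accepts for O_DIRECT.
 */
long odirect_truncate_check(const char *path, size_t blksz)
{
    void *buf = NULL;
    long result = -1;
    ssize_t n;
    int fd;

    if (posix_memalign(&buf, blksz, 2 * blksz) != 0)
        return -1;
    memset(buf, 0x5a, 2 * blksz);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0)
        goto out_free;

    if (pwrite(fd, buf, 2 * blksz, 0) != (ssize_t)(2 * blksz))
        goto out_close;
    if (ftruncate(fd, (off_t)(blksz + blksz / 2)) != 0)
        goto out_close;

    memset(buf, 0, blksz);      /* anything past EOF must stay zero */
    n = pread(fd, buf, blksz, (off_t)blksz);
    if (n < 0)
        goto out_close;

    result = (long)tail_mismatches(buf, blksz, blksz / 2, 0x5a);

out_close:
    close(fd);
    unlink(path);
out_free:
    free(buf);
    return result;
}
```

Calling odirect_truncate_check() with a path on the filesystem under test
(for instance a hypothetical "/mnt/ext2/testfile") and a block size of 4096
returns 0 when the last block reads back as half data, half zeroes, a
positive count when it does not, and -1 when the setup itself fails.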