Date: Tue, 21 Jun 2011 09:50:40 +1000
From: Dave Chinner
To: Kevan Rehm
Cc: xfs@oss.sgi.com
Subject: Re: question on xfs_vm_writepage in combination with fsync
Message-ID: <20110620235040.GQ561@dastard>
References: <4DFFB3F3.3070606@sgi.com>
In-Reply-To: <4DFFB3F3.3070606@sgi.com>

On Mon, Jun 20, 2011 at 03:56:19PM -0500, Kevan Rehm wrote:
> Greetings,
>
> I've run into a case where the fsync() system call seems to have
> returned before all file data was actually on disk. (A SLES11SP1
> system crash occurred shortly after an fsync which had returned
> zero. After restarting the machine, the last I/O before the fsync
> is not in the file.) In attempting to find the problem, I've come
> across code I don't understand, and am hoping someone can enlighten
> me as to how things are supposed to work.
>
> Routine xfs_vm_writepage has various situations under which it will
> decide it can't currently initiate writeback on a page, and in that
> case calls redirty_page_for_writepage, unlocks the page, and
> returns zero. That seems to me to be incompatible with fsync(), so
> I'm obviously missing some key piece of logic.
>
> The calling sequence of routines involved in fsync is:
>
> do_fsync->vfs_fsync->vfs_fsync_range->
>   filemap_write_and_wait_range->
>     __filemap_fdatawrite_range->
>       do_writepages->generic_writepages->
>         write_cache_pages
>
> Routine write_cache_pages walks the radix tree and calls
> clear_page_dirty_for_io and then __writepage on each dirty page to
> initiate writeback. __writepage calls xfs_vm_writepage. That
> routine is occasionally unable to immediately start writeback of
> the page, and so it calls redirty_page_for_writepage without
> setting the writeback flag.

Hi Kevan,

The current xfs_vm_writepage mainline code will only enter the
redirty path if:

	- it is called from direct memory reclaim
	- it is called within a transaction context and we need to do
	  an allocation transaction
	- it is WB_SYNC_NONE writeback and we can't get the inode lock
	  without blocking during block mapping (EAGAIN case)

(There's a quick code sketch of this decision below.)

None of these cases are triggered by fsync()-driven (WB_SYNC_ALL)
writeback, so AFAICT fsync()-based writeback should not be skipping
writeback of dirty pages in the given fsync range.

So for a mainline kernel I don't think there are any problems w.r.t.
fsync() and redirtying pages causing dirty pages to be skipped during
writeback. However, the mainline writeback path has had significant
change (especially to WB_SYNC_ALL writeback) since sles11sp1 was
snapshotted (2.6.32, right?). Hence it is possible that one (or
several) of the changes fixed this bug without us even realising it
was a problem.
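
To make that concrete, here's a minimal standalone C model of the
redirty decision (a sketch only - not the actual kernel source; the
struct and field names are illustrative):

#include <stdbool.h>
#include <stdio.h>

enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* Illustrative context for a single ->writepage() call. */
struct wb_ctx {
	bool in_reclaim;	/* called from direct memory reclaim */
	bool in_transaction;	/* caller already holds a transaction */
	bool needs_alloc;	/* mapping needs an alloc transaction */
	bool ilock_contended;	/* non-blocking inode lock attempt failed */
	enum sync_mode mode;
};

/* Return true when writepage redirties the page and bails out. */
static bool must_redirty(const struct wb_ctx *c)
{
	if (c->in_reclaim)
		return true;	/* case 1: direct memory reclaim */
	if (c->in_transaction && c->needs_alloc)
		return true;	/* case 2: allocation inside a transaction */
	if (c->mode == WB_SYNC_NONE && c->ilock_contended)
		return true;	/* case 3: EAGAIN, async writeback only */
	return false;
}

int main(void)
{
	/*
	 * fsync-driven writeback: WB_SYNC_ALL, normal process
	 * context, no transaction held. Even with the inode lock
	 * contended, sync writeback blocks on the lock rather than
	 * redirtying the page.
	 */
	struct wb_ctx fsync_wb = {
		.in_reclaim = false,
		.in_transaction = false,
		.needs_alloc = false,
		.ilock_contended = true,
		.mode = WB_SYNC_ALL,
	};

	printf("redirty under fsync: %s\n",
	       must_redirty(&fsync_wb) ? "yes" : "no"); /* prints "no" */
	return 0;
}

Note the third test: only WB_SYNC_NONE writeback bails out on lock
contention, which is why none of the three cases fire for fsync.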
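
And a similarly simplified model of the write_cache_pages() loop you
traced, showing why WB_SYNC_ALL writeback waits on a page already
under I/O instead of skipping it the way WB_SYNC_NONE does (again a
sketch under the same caveats, not the real code):

#include <stdbool.h>
#include <stdio.h>

enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

struct page {
	bool dirty;
	bool writeback;	/* I/O already in flight */
};

/* Stand-ins for the real page cache primitives. */
static void wait_on_writeback(struct page *p)
{
	p->writeback = false;	/* pretend the in-flight I/O finished */
}

static bool clear_dirty_for_io(struct page *p)
{
	bool was_dirty = p->dirty;
	p->dirty = false;
	return was_dirty;
}

static void writepage(struct page *p)
{
	p->writeback = true;	/* ->writepage() starts the I/O */
}

static void write_cache_pages_model(struct page *pages, int n,
				    enum sync_mode mode)
{
	for (int i = 0; i < n; i++) {
		struct page *p = &pages[i];

		if (p->writeback) {
			if (mode == WB_SYNC_ALL)
				wait_on_writeback(p);	/* sync: wait */
			else
				continue;		/* async: skip */
		}
		if (clear_dirty_for_io(p))
			writepage(p);	/* ends up in xfs_vm_writepage */
	}
}

int main(void)
{
	struct page range[2] = {
		{ .dirty = true, .writeback = false },
		{ .dirty = true, .writeback = true },	/* busy page */
	};

	write_cache_pages_model(range, 2, WB_SYNC_ALL);
	printf("dirty pages left: %d\n",
	       range[0].dirty + range[1].dirty);	/* prints 0 */
	return 0;
}

In the real call chain, filemap_write_and_wait_range() then waits for
all of the I/O started here to complete before fsync() returns.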
That said, having dirty pages after an fsync is not necessarily an
fsync bug - something could have dirtied them while the fsync was in
progress. I don't know any details of how this occurred, so I'm
simply speculating that there could be other causes of the dirty
pages you are seeing...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com