From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave Chinner
Subject: Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
Date: Tue, 25 Aug 2015 08:27:20 +1000
Message-ID: <20150824222720.GD714@dastard>
References: <20150818091603.GA12317@quack.suse.cz>
 <20150818174718.GA15739@mtj.duckdns.org>
 <20150818195439.GB15739@mtj.duckdns.org>
 <20150818215611.GD3902@dastard>
 <20150820061224.GG17933@dhcp-13-216.nay.redhat.com>
 <20150820143626.GI17933@dhcp-13-216.nay.redhat.com>
 <20150820143735.GJ17933@dhcp-13-216.nay.redhat.com>
 <20150820165537.GA2044@mtj.duckdns.org>
 <20150820230451.GT714@dastard>
 <20150824181038.GA28944@mtj.duckdns.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Jens Axboe, Jan Kara, Eryu Guan, linux-kernel@vger.kernel.org,
 xfs@oss.sgi.com, axboe@fb.com, Jan Kara, linux-fsdevel@vger.kernel.org,
 kernel-team@fb.com
To: Tejun Heo
Content-Disposition: inline
In-Reply-To: <20150824181038.GA28944@mtj.duckdns.org>
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
List-Id: linux-fsdevel.vger.kernel.org

On Mon, Aug 24, 2015 at 02:10:38PM -0400, Tejun Heo wrote:
> Hello, Dave.
>
> On Fri, Aug 21, 2015 at 09:04:51AM +1000, Dave Chinner wrote:
> > > Maybe I'm misunderstanding the code but all xfs_writepage() calls are
> > > from unbound workqueues - the writeback workers - while
> > > xfs_setfilesize() are from bound workqueues, so I wondered why that
> > > was and looked at the code and the setsize functions are run off of a
> > > separate work item which is queued from the end_bio callback and I
> > > can't tell who would be waiting for them.  Dave, what am I missing?
> >
> > xfs_setfilesize runs transactions, so it can't be run from IO
> > completion context as it needs to block (i.e. on log space or inode
> > locks). It also can't block log IO completion, nor metadata IO
> > completion, as only log IO completion can free log space, and the
> > inode lock might be waiting on metadata buffer IO completion (e.g.
> > during delayed allocation). Hence we have multiple IO completion
> > workqueues to keep these things separated and deadlock free. i.e.
> > they all get punted to a workqueue where they are then processed in
> > a context that can block safely.
>
> I'm still a bit confused.  What prevents the following from happening?
>
> 1. io completion of last dirty page of an inode and work item for
>    xfs_setfilesize() is queued.
>
> 2. inode removed from dirty list.

The inode has already been removed from the dirty list - that
happens at inode writeback submission time, not at IO completion.

> 3. __sync_filesystem() invokes sync_inodes_sb().  There are no dirty
>    pages, so it finishes.

There are no dirty pages, but the pages aren't clean, either, i.e.
they are still under writeback. Hence we need to invoke
wait_inodes_sb() to wait for writeback on all pages to complete
before returning.

> 4. xfs_fs_sync_fs() is called which calls _xfs_log_force() but the
>    work item from #1 hasn't run yet, so the size update isn't written
>    out.

The bug here is that wait_inodes_sb() has not been run, and therefore
->syncfs is being run before IO completions have been processed and
pages marked clean.

> 5. Crash.
>
> Is it that _xfs_log_force() waits for the setfilesize transaction
> created during writepage?

No, it's wait_inodes_sb() that does the waiting for data IO
completion for sync.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
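
For readers following the thread, the ordering argued for in the replies
above can be sketched as a toy, single-threaded userspace model. This is
not kernel code: every toy_* name is hypothetical, and the model assumes
(as the replies describe) that pages only stop being "under writeback"
once the queued completion work has run, so waiting for writeback implies
the size update transaction exists before the log is forced.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one inode during sync. All names here are hypothetical. */
struct toy_inode {
	bool on_dirty_list;         /* still on the dirty inode list */
	int  pages_under_writeback; /* pages submitted but not yet clean */
	bool completion_queued;     /* work item queued from IO completion */
	bool size_update_logged;    /* size update transaction has run */
	bool size_update_on_disk;   /* log force has written it out */
};

/* Submission: the inode leaves the dirty list here, not at completion. */
static void toy_submit_writeback(struct toy_inode *ip, int npages)
{
	assert(npages > 0);
	ip->on_dirty_list = false;
	ip->pages_under_writeback = npages;
}

/* IO completion context cannot block, so it only queues a work item. */
static void toy_end_io(struct toy_inode *ip)
{
	ip->completion_queued = true;
}

/* Workqueue context may block: run the size update, mark pages clean. */
static void toy_run_completion_work(struct toy_inode *ip)
{
	if (!ip->completion_queued)
		return;
	ip->completion_queued = false;
	ip->size_update_logged = true;
	ip->pages_under_writeback = 0;
}

/* Stand-in for the wait step: loop until no page is under writeback. */
static void toy_wait_for_writeback(struct toy_inode *ip)
{
	while (ip->pages_under_writeback > 0) {
		toy_end_io(ip);
		toy_run_completion_work(ip);
	}
}

/* Stand-in for the log force: only already-logged updates reach disk. */
static void toy_log_force(struct toy_inode *ip)
{
	if (ip->size_update_logged)
		ip->size_update_on_disk = true;
}

/* Returns true if the size update is on disk when sync returns. */
static bool toy_sync(bool wait_for_io)
{
	struct toy_inode ip = { .on_dirty_list = true };

	toy_submit_writeback(&ip, 1);
	toy_end_io(&ip);                /* step 1: work queued, not yet run */
	if (wait_for_io)
		toy_wait_for_writeback(&ip); /* the step the bug skips */
	toy_log_force(&ip);             /* step 4 */
	return ip.size_update_on_disk;
}
```

In this model, toy_sync(true) returns true and toy_sync(false) returns
false, which mirrors the failure sequence in steps 1-5: without the wait,
the log force runs before the completion work has produced anything to
write.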