Date: Fri, 20 Apr 2012 13:19:46 -0500
From: Mark Tinguely
To: Christoph Hellwig
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 09/10] xfs: on-stack delayed write buffer lists

On 03/27/12 11:44, Christoph Hellwig wrote:
> Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
> and write back the buffers per-process instead of by waking up xfsbufd.
>
> This is now easily doable given that we have very few places left that write
> delwri buffers:
>
>  - log recovery:
>	Only done at mount time, and already forcing out the buffers
>	synchronously using xfs_flush_buftarg
>
>  - quotacheck:
>	Same story.
>
>  - dquot reclaim:
>	Writes out dirty dquots on the LRU under memory pressure. We might
>	want to look into doing more of this via xfsaild, but it is already
>	an improvement over the synchronous inode reclaim that writes each
>	buffer synchronously.
>
>  - xfsaild:
>	This is the main beneficiary of the change. By keeping a local list
>	of buffers to write we reduce the latency of writing out buffers, and
>	more importantly we can remove all the delwri list promotions which
>	were hitting the buffer cache hard under sustained metadata loads.
>
> The implementation is very straightforward - xfs_buf_delwri_queue now gets
> a new list_head pointer that it adds the delwri buffers to, and all callers
> need to eventually submit the list using xfs_buf_delwri_submit or
> xfs_buf_delwri_submit_nowait. Buffers that are already on a delwri list are
> skipped in xfs_buf_delwri_queue, on the assumption that whoever queued them
> will submit them. The biggest change needed to pass down the buffer list was
> to the AIL pushing code. Now that we operate on buffers, the trylock, push
> and pushbuf log item methods are merged into a single push routine, which
> tries to lock the item and, if possible, adds the buffer that needs
> writeback to the buffer list. This leads to much simpler code than the
> previous split, but requires the individual IOP_PUSH instances to unlock
> and reacquire the AIL lock around calls to blocking routines.
>
> Given that xfsaild now also handles writing out buffers, the conditions for
> log forcing and the sleep times needed some small changes. The most
> important one is that we consider the AIL busy as long as we still have
> buffers to push; the other is that we increment the pushed LSN for buffers
> that are currently being flushed, but still count them towards the stuck
> items for restart purposes. Without this we could hammer on stuck items
> without ever forcing the log, making no progress under heavy random delete
> workloads on fast flash storage devices.
>
> Signed-off-by: Christoph Hellwig

Test 106 runs to completion with patch 06. Patches 07 and 08 do not
compile without patch 09.
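
For reference, as I read it, the new calling convention boils down to the
pattern below. This is a minimal userspace sketch of my own, not code from
the patch: the list helpers mimic <linux/list.h>, and buf_delwri_queue()
and buf_delwri_submit() are illustrative stand-ins for the real
xfs_buf_delwri_queue() and xfs_buf_delwri_submit():

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdio.h>

	/* Minimal intrusive list, mimicking the kernel's <linux/list.h>. */
	struct list_head {
		struct list_head *next, *prev;
	};

	#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

	static void list_add_tail(struct list_head *new, struct list_head *head)
	{
		new->prev = head->prev;
		new->next = head;
		head->prev->next = new;
		head->prev = new;
	}

	static void list_del_init(struct list_head *entry)
	{
		entry->prev->next = entry->next;
		entry->next->prev = entry->prev;
		entry->next = entry->prev = entry;
	}

	static bool list_empty(const struct list_head *head)
	{
		return head->next == head;
	}

	/* Toy buffer; b_on_delwri stands in for the real delwri-queued flag. */
	struct buf {
		struct list_head b_list;
		int		 b_id;
		bool		 b_on_delwri;
	};

	/*
	 * Queue a buffer on the caller's local delwri list.  A buffer that
	 * is already on some delwri list is skipped, as the commit message
	 * describes for xfs_buf_delwri_queue.
	 */
	static bool buf_delwri_queue(struct buf *bp, struct list_head *list)
	{
		if (bp->b_on_delwri)
			return false;
		bp->b_on_delwri = true;
		list_add_tail(&bp->b_list, list);
		return true;
	}

	/* Submit (here: just print) every buffer queued on the local list. */
	static void buf_delwri_submit(struct list_head *list)
	{
		while (!list_empty(list)) {
			struct buf *bp = (struct buf *)((char *)list->next -
					 offsetof(struct buf, b_list));

			list_del_init(&bp->b_list);
			bp->b_on_delwri = false;
			printf("writing buffer %d\n", bp->b_id); /* real I/O here */
		}
	}

	int main(void)
	{
		LIST_HEAD(buffer_list);		/* the on-stack delwri list */
		struct buf a = { .b_id = 1 }, b = { .b_id = 2 };

		buf_delwri_queue(&a, &buffer_list);
		buf_delwri_queue(&b, &buffer_list);
		buf_delwri_queue(&a, &buffer_list); /* skipped: already queued */

		buf_delwri_submit(&buffer_list);
		return 0;
	}

The real routines of course issue actual disk I/O and take buffer locks
and references; the sketch only shows the queue-locally, submit-once shape.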
Starting with patch 09, I get the following hang on every run of test 106:

PID: 27992  TASK: ffff8808310d00c0  CPU: 2  COMMAND: "mount"
 #0 [ffff880834237938] __schedule at ffffffff81417200
 #1 [ffff880834237a80] schedule at ffffffff81417574
 #2 [ffff880834237a90] schedule_timeout at ffffffff81415805
 #3 [ffff880834237b30] wait_for_common at ffffffff81416a67
 #4 [ffff880834237bc0] wait_for_completion at ffffffff81416bd8
 #5 [ffff880834237bd0] xfs_buf_iowait at ffffffffa04fc5a5 [xfs]
 #6 [ffff880834237c00] xfs_buf_delwri_submit at ffffffffa04fe4b9 [xfs]
 #7 [ffff880834237c40] xfs_qm_quotacheck at ffffffffa055cb2d [xfs]
 #8 [ffff880834237cc0] xfs_qm_mount_quotas at ffffffffa055cdf0 [xfs]
 #9 [ffff880834237cf0] xfs_mountfs at ffffffffa054c041 [xfs]
#10 [ffff880834237d40] xfs_fs_fill_super at ffffffffa050ca80 [xfs]
#11 [ffff880834237d70] mount_bdev at ffffffff81150c5c
#12 [ffff880834237de0] xfs_fs_mount at ffffffffa050ac00 [xfs]
#13 [ffff880834237df0] mount_fs at ffffffff811505f8
#14 [ffff880834237e40] vfs_kern_mount at ffffffff8116c070
#15 [ffff880834237e80] do_kern_mount at ffffffff8116c16e
#16 [ffff880834237ec0] do_mount at ffffffff8116d6f0
#17 [ffff880834237f20] sys_mount at ffffffff8116d7f3
#18 [ffff880834237f80] system_call_fastpath at ffffffff814203b9

The workers seem to be idle. For example, the xfsaild:

PID: 27676  TASK: ffff880832880240  CPU: 3  COMMAND: "xfsaild/sda7"
 #0 [ffff880832933cb0] __schedule at ffffffff81417200
 #1 [ffff880832933df8] schedule at ffffffff81417574
 #2 [ffff880832933e08] schedule_timeout at ffffffff81415805
 #3 [ffff880832933ea8] xfsaild at ffffffffa0555935 [xfs]
 #4 [ffff880832933ee8] kthread at ffffffff8105dd6e
 #5 [ffff880832933f48] kernel_thread_helper at ffffffff814216a4

The hang is on the third quotacheck. It should be easy to duplicate.

--Mark Tinguely