From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111])
	by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id q840CsW7004490
	for ; Mon, 3 Sep 2012 19:12:54 -0500
Message-ID: <504547BC.3000907@sgi.com>
Date: Mon, 03 Sep 2012 19:13:48 -0500
From: Mark Tinguely
MIME-Version: 1.0
Subject: Re: [PATCH V2 00/13] xfs: remove the xfssyncd mess
References: <1346328017-2795-1-git-send-email-david@fromorbit.com> <5040C3A0.2050107@sgi.com> <20120903040523.GP15292@dastard>
In-Reply-To: <20120903040523.GP15292@dastard>
List-Id: XFS Filesystem from SGI
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Sender: xfs-bounces@oss.sgi.com
Errors-To: xfs-bounces@oss.sgi.com
To: Dave Chinner
Cc: xfs@oss.sgi.com

On 09/02/12 23:05, Dave Chinner wrote:
> On Fri, Aug 31, 2012 at 09:01:04AM -0500, Mark Tinguely wrote:
>> On 08/30/12 07:00, Dave Chinner wrote:
>>> Version 2 of the patchset I described here:
>>>
>>> http://oss.sgi.com/archives/xfs/2012-06/msg00064.html
>>>
>>> This version has run through xfstests completely once, so it's
>>> less likely to let smoke out....
>>>
>>> Version 2:
>>> - fix writeback_inodes_sb_if_idle call in xfs_create()
>>> - refreshed patch 13 before sending.
>>>
>>> _______________________________________________
>>> xfs mailing list
>>> xfs@oss.sgi.com
>>> http://oss.sgi.com/mailman/listinfo/xfs
>>
>> I wanted to get a fast look at your patch series. I am getting the
>> following ASSERT on xfstest 179 when running the series with the
>> latest OSS sources. The ASSERT appears to start at patch number 3.
>> Sorry these boxes won't kdump the top of tree kernels:
>>
>> [17474.545964] XFS: Assertion failed: atomic_read(&bp->b_hold) > 0,
>> file: /root/xfs/fs/xfs/xfs_buf.c, line: 896
>
> FWIW, when you paste stack traces, can you turn off line wrapping
> when you paste it so the crash is simple to quote in reply? (use
> :set paste in mutt, then :set nopaste when finished pasting it in).
>
>> [17474.559784] Process umount (pid: 26427, threadinfo
> ...
>> [17474.559784] Call Trace:
>> [17474.559784] [] xfs_buf_rele+0xa4/0x1b0 [xfs]
>> [17474.559784] [] xfs_buf_iodone_work+0x46/0x50 [xfs]
>> [17474.559784] [] xfs_buf_ioend+0x96/0x120 [xfs]
>> [17474.559784] [] xfs_buf_iodone_callbacks+0x59/0x230 [xfs]
>> [17474.559784] [] xfs_buf_iodone_work+0x21/0x50 [xfs]
>> [17474.559784] [] xfs_buf_ioend+0x96/0x120 [xfs]
>> [17474.559784] [] xfs_buf_item_unpin+0x289/0x2d0 [xfs]
>> [17474.559784] [] xfs_trans_committed_bulk+0x213/0x300 [xfs]
>> [17474.559784] [] xlog_cil_committed+0x36/0x130 [xfs]
>> [17474.559784] [] xlog_cil_push+0x308/0x430 [xfs]
>> [17474.559784] [] xlog_cil_force_lsn+0x146/0x1b0 [xfs]
>> [17474.559784] [] _xfs_log_force+0x64/0x280 [xfs]
>> [17474.559784] [] xfs_log_force+0x54/0x80 [xfs]
>> [17474.559784] [] xfs_fs_sync_fs+0x2d/0x50 [xfs]
>> [17474.559784] [] __sync_filesystem+0x2b/0x50
>> [17474.559784] [] sync_filesystem+0x43/0x60
>> [17474.559784] [] generic_shutdown_super+0x36/0xe0
>> [17474.559784] [] kill_block_super+0x2c/0x80
>> [17474.559784] [] deactivate_locked_super+0x38/0x90
>> [17474.559784] [] deactivate_super+0x61/0x70
>> [17474.559784] [] mntput_no_expire+0x149/0x1b0
>> [17474.559784] [] sys_umount+0x6e/0xd0
>
> Nothing has been shut down in XFS at this point (i.e. .put_super()
> has not yet been called) so none of the shutdown changes could have
> caused this problem.
>
> Indeed, it looks like this is during a forced shutdown here in
> xfs_buf_item_unpin:
>
>     } else if (freed && remove) {
>         xfs_buf_lock(bp);
>         xfs_buf_ioerror(bp, EIO);
>         XFS_BUF_UNDONE(bp);
>         xfs_buf_stale(bp);
>>>>>>>         xfs_buf_ioend(bp, 0);
>     }
>
> Now, xfs_buf_stale() does this:
>
>     ASSERT(atomic_read(&bp->b_hold) >= 1);
>
> Which means that in calling xfs_buf_ioend(), at least two references
> to the buffer are being dropped. Working out why that is occurring
> will find the root cause of this problem.
>
> All that I can say at this point is that I find it highly unlikely
> that it is caused by the changes in this patchset.
>
>> I got this ASSERT when I ran it on the 8/27 OSS sources:
>>
>> [188646.952426] XFS: Assertion failed:
>> atomic_read(&iclog->ic_refcnt) == 0, file:
>> /root/xfs/fs/xfs/xfs_log.c, line: 2590
>
>> [188646.967020] Process kworker/2:1H (pid: 356, threadinfo ffff8808396a4000, task ffff88083a9aa1c0)
>> [188646.967020] Call Trace:
>> [188646.967020] [] xlog_state_done_syncing+0x7f/0x110 [xfs]
>> [188646.967020] [] xlog_iodone+0x7e/0x100 [xfs]
>> [188646.967020] [] xfs_buf_iodone_work+0x21/0x50 [xfs]
>> [188646.967020] [] process_one_work+0x1d3/0x370
>> [188646.967020] [] worker_thread+0x133/0x390
>> [188646.967020] [] kthread+0x9e/0xb0
>> [188646.967020] [] kernel_thread_helper+0x4/0x10
>
> I've never seen that ASSERT fire. That implies we've got a log
> buffer that is being actively modified under IO, but I cannot see
> how that would happen. Was this during an unmount? What test?
>
> /me is starting to wonder about memory errors...
>
> Cheers,
>
> Dave.

They all panic on xfstest 179, on 3 different machines: 2 are x86_64
and one is x86_32. I believe all have XFS debug turned on.

I will see what else I can find out.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs