From: Brian Foster
To: xfs@oss.sgi.com
Date: Wed, 19 Aug 2015 14:39:05 -0400
Subject: v5 filesystem corruption due to log recovery lsn ordering
Message-ID: <20150819183904.GB49174@bfoster.bfoster>

Hi all,

Here's another issue I've run into from recent log recovery testing...

Many on-disk data structures for v5 filesystems have the LSN from the
last modification stamped in the associated header.
As of the following commit, log recovery compares the LSN of the recovery
item against the LSN of the on-disk structure to avoid restoring stale
contents:

  50d5c8d xfs: check LSN ordering for v5 superblocks during recovery

This presumably addresses some problems where recovery of the stale
contents leads to CRC failure. The problem here is that xfs_repair clears
the log (even when the fs is clean), so the current LSN is reset on the
next mount. This creates a situation where logging is ineffective for any
structure that has not yet been modified since the current LSN was reset.

I'm not quite sure how pervasive this is in practice, but the following
is a corruption reproducer for directory buffers:

- mkfs (-m crc=1), mount and fsstress a filesystem for a bit such that
  the LSN increases a decent amount (e.g., several log cycles or so).

  # cat /sys/fs/xfs/dm-3/log/log_*
  3:9378
  3:9376

- Kill fsstress, create a new directory and populate it with some files:

  # mkdir /mnt/dir
  # for i in $(seq 0 999); do touch /mnt/dir/$i; done

- Unmount the fs, run xfs_repair, mount the fs and verify that the LSN has
  been reset:

  # cat /sys/fs/xfs/dm-3/log/log_*
  1:2
  1:2

- Remove a file from the previously created directory and immediately
  shut down the fs, flushing the log:

  # rm -f /mnt/dir/0; ~/xfstests-dev/src/godown -f /mnt/
  # umount /mnt

- Remount the fs to replay the log. Unmount and repair once more:

  # mount /mnt; umount /mnt
  # xfs_repair -n ...
  imap claims in-use inode 3082 is free, would correct imap
  ...

... and the filesystem is inconsistent. This occurs because the log
recovery records are tagged with an LSN based on the reset value (1:2),
while the buffers to be recovered, which had not been rewritten before
the shutdown, carry an LSN from around the time fsstress was stopped. The
target buffer is incorrectly seen as "newer" than the recovery item, and
thus recovery of this buffer is skipped.

Note that the resulting behavior is not always consistent.
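To make the ordering failure concrete, here is a minimal shell model of the check described above. It is a sketch under stated assumptions, not the kernel code: it assumes an LSN is written as cycle:block (as in the sysfs log_* output), that two LSNs are ordered by cycle first and then by block, and that recovery skips an item whenever the on-disk LSN is greater than or equal to the item's LSN. The function names lsn_cmp and recovery_action are hypothetical, and 3:9376 is only an illustrative stand-in for the directory buffer's on-disk LSN, borrowed from the pre-repair sysfs output.

```bash
#!/usr/bin/env bash
# Simplified model of the v5 log recovery LSN ordering check.
# Assumptions (not taken from the kernel source): an LSN is written as
# "cycle:block" like the sysfs log_* files, LSNs are ordered by cycle and
# then by block, and an item is skipped when the on-disk LSN is >= the
# recovery item's LSN. Function names are hypothetical.

# lsn_cmp A B: print -1, 0 or 1 if A is older than, equal to or newer than B.
lsn_cmp() {
	local a_cycle=${1%%:*} a_block=${1##*:}
	local b_cycle=${2%%:*} b_block=${2##*:}

	if (( a_cycle != b_cycle )); then
		if (( a_cycle < b_cycle )); then echo -1; else echo 1; fi
	elif (( a_block != b_block )); then
		if (( a_block < b_block )); then echo -1; else echo 1; fi
	else
		echo 0
	fi
}

# recovery_action ITEM_LSN DISK_LSN: print "skip" if the on-disk structure
# looks at least as new as the recovery item, otherwise "replay".
recovery_action() {
	if (( $(lsn_cmp "$2" "$1") >= 0 )); then
		echo skip
	else
		echo replay
	fi
}

# Item logged after xfs_repair reset the LSN, against a buffer last
# written before the reset (illustrative values).
recovery_action 1:2 3:9376     # prints "skip": the update is lost
# Without the reset, a later item against the same buffer is replayed.
recovery_action 3:9378 3:9376  # prints "replay"
```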
I have seen log recovery ignore the file removal such that the fs is
consistent and the modification is simply lost. The original instance I
hit on a separate machine caused repair to complain about and fix the
directory rather than the imap, but that could have been a repair thing.

The larger question is how to resolve this problem. I don't think this is
something that is ultimately addressed in xfs_repair. Even if we stopped
clearing the log, that doesn't help users who might have had to forcibly
zero the log to recover a filesystem. Another option in theory might be
to unconditionally reset the LSN of everything on disk, but that sounds
like overkill just to preserve the current kernel workaround.

It sounds more to me like we have to adjust this behavior on the kernel
side. That said, the original commit presumably addresses some log
recovery shutdown problems that we do not want to reintroduce. I haven't
yet wrapped my head around what that original problem was, but I wanted
to get this reported. If the issue was early buffer I/O submission,
perhaps we need a new mechanism to defer this I/O submission until a
point where CRC verification is expected to pass (or otherwise generate a
filesystem error)? Or perhaps do something similar with CRC verification?

Any other thoughts, issues or things I might have missed here?

Brian

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs