From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B67BAC433F5 for ; Mon, 7 Feb 2022 22:33:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236354AbiBGWd5 (ORCPT ); Mon, 7 Feb 2022 17:33:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234065AbiBGWd5 (ORCPT ); Mon, 7 Feb 2022 17:33:57 -0500 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 094ACC061355 for ; Mon, 7 Feb 2022 14:33:55 -0800 (PST) Received: from dread.disaster.area (pa49-180-69-7.pa.nsw.optusnet.com.au [49.180.69.7]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id CDF5210C7938; Tue, 8 Feb 2022 09:33:54 +1100 (AEDT) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1nHCa1-009KUK-0P; Tue, 08 Feb 2022 09:33:53 +1100 Date: Tue, 8 Feb 2022 09:33:52 +1100 From: Dave Chinner To: Sean Caron Cc: linux-xfs@vger.kernel.org Subject: Re: XFS disaster recovery Message-ID: <20220207223352.GG59729@dread.disaster.area> References: <20220201233312.GX59729@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=deDjYVbe c=1 sm=1 tr=0 ts=62019e53 a=NB+Ng1P8A7U24Uo7qoRq4Q==:117 a=NB+Ng1P8A7U24Uo7qoRq4Q==:17 a=kj9zAlcOel0A:10 a=oGFeUVbbRNcA:10 a=7-415B0cAAAA:8 a=jpeW6z9rN1Rjkf2ewFgA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Mon, Feb 07, 2022 at 05:03:03PM -0500, Sean Caron wrote: > Hi Dave, > > OK! With your patch and help on that other thread pertaining to > xfs_metadump I was able to get a full metadata dump of this > filesystem. > > I used xfs_mdrestore to set up a sparse image for this volume using my > dumped metadata: > > xfs_mdrestore /exports/home/work/md4.metadump /exports/home/work/md4.img > > Then set up a loopback device for it and tried to mount. > > losetup --show --find /exports/home/work/md4.img > mount /dev/loop0 /mnt > > When I do this, I get a "Structure needs cleaning" error and the > following in dmesg: > > [523615.874581] XFS (loop0): Corruption warning: Metadata has LSN > (7095:2330880) ahead of current LSN (7095:2328512). Please unmount and > run xfs_repair (>= v4.3) to resolve. > [523615.874637] XFS (loop0): Metadata corruption detected at > xfs_agi_verify+0xef/0x180 [xfs], xfs_agi block 0x10 > [523615.874666] XFS (loop0): Unmount and run xfs_repair > [523615.874679] XFS (loop0): First 128 bytes of corrupted metadata buffer: > [523615.874695] 00000000: 58 41 47 49 00 00 00 01 00 00 00 00 0f ff ff > f8 XAGI............ > [523615.874713] 00000010: 00 03 ba 40 00 04 ef 7e 00 00 00 02 00 00 00 > 34 ...@...~.......4 > [523615.874732] 00000020: 00 30 09 40 ff ff ff ff ff ff ff ff ff ff ff > ff .0.@............ > [523615.874750] 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ................ > [523615.874768] 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ................ > [523615.874787] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ................ > [523615.874806] 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ................ > [523615.874824] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ................ > [523615.874914] XFS (loop0): metadata I/O error in > "xfs_trans_read_buf_map" at daddr 0x10 len 8 error 117 > [523615.874998] XFS (loop0): xfs_imap_lookup: xfs_ialloc_read_agi() > returned error -117, agno 0 > [523615.876866] XFS (loop0): Failed to read root inode 0x80, error 117 Hmmm - I think this is after log recovery. The nature of the error (metadata LSN a few blocks larger than the current recovered LSN) implies that part of the log was lost during device failure/recovery and hence not recovered when mounting the filesystem. > Seems like the next step is to just run xfs_repair (with or without > log zeroing?) on this image and see what shakes out? Yup. You may be able to run it on the image file without log zeroing after the failed mount if there were no pending intents that needed replay. But it doesn't matter if you do zero the log at this point, as it's already replayed everything it can replay back into the filesystem and it will be as consistent as it's going to get. Regardless, you are still likely to get a bunch of "unlinked but not freed" inode warnings and inconsistent free space because the mount failed between the initial recovery phase and the final recovery phase that runs intent replay and processes unlinked inodes. Cheers, Dave. -- Dave Chinner david@fromorbit.com