From: Dave Chinner
To: Sean Caron
Cc: linux-xfs@vger.kernel.org
Subject: Re: XFS disaster recovery
Date: Wed, 9 Feb 2022 07:56:40 +1100
Message-ID: <20220208205640.GJ59729@dread.disaster.area>

On Tue, Feb 08, 2022 at 10:46:45AM -0500, Sean Caron wrote:
> Hi Dave,
>
> I'm sorry for some imprecise language. The array is around 450 TB raw
> and I will refer to it as roughly half a petabyte, but factoring out
> RAID parity disks and spare disks it should indeed be around 384 TB
> formatted.

Ah, OK, looks like it was a complete dump, then.

> I found that if I ran the dev tree xfs_repair with the -P option, I
> could get xfs_repair to complete a run. It exits with return code 130,
> but the resulting loopback image filesystem is mountable and I see
> around 27 TB in lost+found, which would represent around 9% loss in
> terms of what was actually on the filesystem.

I'm sure that if that much ended up in lost+found, xfs_repair also
threw away a whole load of metadata, which means data will have been
lost. And this much metadata corruption tends to imply that there is
widespread data corruption, too. Hence I think it's worth pointing out
(maybe unnecessarily!) that xfs_repair doesn't tell you about (or fix)
data corruption - it just rebuilds the metadata back into a consistent
state.
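As a rough illustration of what a post-repair inspection pass could
look like - the image path, loop device and mount point below are
made up, and the checksum step assumes a manifest collected before
the corruption (without one, file contents can only be checked
against backups or by hand):

    # Attach the repaired image to a loop device and mount it
    # read-only so inspection can't modify it further.
    LOOPDEV=$(losetup --find --show /data/recovery/fs.img)
    mount -o ro "$LOOPDEV" /mnt/recovered

    # See how much ended up orphaned in lost+found.
    du -sh /mnt/recovered/lost+found

    # xfs_repair won't flag corrupt file contents, so spot-check
    # them against pre-corruption checksums if any exist.
    (cd /mnt/recovered && md5sum -c /data/recovery/manifest.md5) \
        | grep -v ': OK$'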
> Given where we started I think this is acceptable (more than
> acceptable, IMO; I was getting to the point of expecting to have to
> write off the majority of the filesystem) and it seems like a way
> forward to get the majority of the data off this old filesystem.

Yes, but you are still going to have to verify that the data you can
still access is not corrupted - random offsets within files could now
contain garbage, regardless of whether the file was moved to
lost+found or not.

> Is there anything further I should check, or any caveats that I
> should bear in mind when applying this xfs_repair to the real
> filesystem? Or does it seem reasonable to go ahead, repair this and
> start copying off?

Seems reasonable to repeat the process on the real filesystem, but
given the caveat about data corruption above, I suspect that the
entire dataset on the filesystem might still end up being a complete
write-off.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com