Date: Wed, 7 Nov 2018 17:48:04 +1100
From: Dave Chinner <david@fromorbit.com>
To: Arkadiusz Miśkiewicz
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/7] xfs_repair: scale to 150,000 iops
Message-ID: <20181107064804.GW19305@dastard>
References: <20181030112043.6034-1-david@fromorbit.com>

On Wed, Nov 07, 2018 at 06:44:54AM +0100, Arkadiusz Miśkiewicz wrote:
> On 30/10/2018 12:20, Dave Chinner wrote:
> > Hi folks,
> >
> > This patchset enables me to successfully repair a rather large
> > metadump image (~500GB of metadata) that was provided to us because
> > it crashed xfs_repair. Darrick and Eric have already posted patches
> > to fix the crash bugs, and this series is built on top of them.
>
> I was finally able to repair my big fs using for-next + these patches,
> but it wasn't as easy as just running repair.
>
> With the default bhash, repair was OOM-killed about a third of the way
> through phase 6 (128GB of RAM + 50GB of SSD swap). bhash=256000 worked.

Yup, we need to work on the default bhash sizing. It comes out at about
750,000 for 128GB of RAM on your fs; it needs to be much smaller.

> Sometimes a segfault happens, but unfortunately I don't have a stack
> trace, and trying to reproduce it on my other test machine gave me no
> luck.
>
> One time I got:
> xfs_repair: workqueue.c:142: workqueue_add: Assertion `wq->item_count ==
> 0' failed.

Yup, I think I've fixed that - a race condition related to throttling
wakeups - but I'm still trying to reproduce it to confirm the fix...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
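
A note on the bhash knob being discussed: it sets the size of xfs_repair's
buffer cache hash, and it can be forced on the command line with -o bhash=N
(e.g. xfs_repair -o bhash=256000 <device>) when the default overshoots.
Below is a minimal sketch of the kind of memory-based default under
discussion; the divisor, the cap and the helper name are illustrative
assumptions, not xfs_repair's actual heuristic.

/*
 * Illustrative sketch only: derive a default buffer cache hash size
 * (the -o bhash=N knob) from physical memory.  The divisor, the cap
 * and the fallback below are assumptions for illustration, not
 * xfs_repair's actual sizing code.
 */
#include <stdio.h>
#include <unistd.h>

static unsigned long
default_bhash_size(void)
{
	long			pages = sysconf(_SC_PHYS_PAGES);
	long			pagesize = sysconf(_SC_PAGESIZE);
	unsigned long long	mem_bytes;
	unsigned long		bhash;

	if (pages <= 0 || pagesize <= 0)
		return 16384;			/* conservative fallback */

	mem_bytes = (unsigned long long)pages * (unsigned long long)pagesize;

	/*
	 * Each hash entry anchors cached buffers, so size the hash as a
	 * small fraction of physical memory rather than scaling it up
	 * without bound.
	 */
	bhash = mem_bytes / (512ULL * 1024);	/* ~1 entry per 512KiB */

	/* Cap the default so a 128GB machine doesn't land near 750,000. */
	if (bhash > 256000)
		bhash = 256000;
	return bhash;
}

int
main(void)
{
	printf("default bhash: %lu\n", default_bhash_size());
	return 0;
}

Whatever formula ends up in xfs_repair, the observation above stands: a
default derived from physical memory alone currently comes out around
750,000 entries on a 128GB machine and drives the cache past RAM, while
256000 was enough to complete the repair.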
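
On the workqueue assertion: the throttling being referred to is presumably
a bounded workqueue, where the producer blocks on a condition variable
while the queue is at its limit and workers wake it as they drain items.
The sketch below shows the general shape of that pattern; it is an
illustration under that assumption, not xfs_repair's workqueue.c, and the
names and constants are made up.

/*
 * Generic sketch of bounded-workqueue throttling: the producer blocks
 * on a condition variable while MAX_QUEUED items are outstanding, and
 * each worker signals it after taking an item.  Illustration only,
 * not xfs_repair's workqueue.c.
 */
#include <pthread.h>
#include <stdio.h>

#define MAX_QUEUED	4
#define NITEMS		32
#define NWORKERS	2

static pthread_mutex_t	lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t	queue_has_room = PTHREAD_COND_INITIALIZER;
static pthread_cond_t	queue_has_work = PTHREAD_COND_INITIALIZER;
static int		item_count;	/* items currently queued */
static int		items_done;	/* items already processed */

static void *
worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);
		while (item_count == 0 && items_done < NITEMS)
			pthread_cond_wait(&queue_has_work, &lock);
		if (item_count == 0) {		/* all work finished */
			pthread_mutex_unlock(&lock);
			return NULL;
		}
		item_count--;
		items_done++;
		/* Wake a producer throttled on a full queue. */
		pthread_cond_signal(&queue_has_room);
		pthread_mutex_unlock(&lock);
		/* ... the real work would happen here, unlocked ... */
	}
}

/* Producer side: the equivalent of a throttled workqueue_add(). */
static void
queue_item(void)
{
	pthread_mutex_lock(&lock);
	/* Throttle: always re-check the predicate in a loop. */
	while (item_count >= MAX_QUEUED)
		pthread_cond_wait(&queue_has_room, &lock);
	item_count++;
	pthread_cond_signal(&queue_has_work);
	pthread_mutex_unlock(&lock);
}

int
main(void)
{
	pthread_t	tid[NWORKERS];
	int		i;

	for (i = 0; i < NWORKERS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (i = 0; i < NITEMS; i++)
		queue_item();
	/* No more work: wake any idle workers so they can exit. */
	pthread_mutex_lock(&lock);
	pthread_cond_broadcast(&queue_has_work);
	pthread_mutex_unlock(&lock);
	for (i = 0; i < NWORKERS; i++)
		pthread_join(tid[i], NULL);
	printf("processed %d items\n", items_done);
	return 0;
}

The fragile part, and the usual home of wakeup races, is the pairing of
the predicate checks with the signals: both sides have to re-check their
condition in a loop under the mutex after every wakeup.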