From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Chinner
Subject: Re: [3.2-rc7] slowdown, warning + oops creating lots of files
Date: Thu, 5 Jan 2012 22:43:40 +1100
Message-ID: <20120105114340.GF24466@dastard>
References: <20120104214445.GE17026@dastard> <20120104221105.GF17026@dastard> <4F04D178.2070006@csamuel.org> <20120104230122.GA24466@dastard> <4F050996.1060206@cn.fujitsu.com> <20120105022630.GD24466@dastard> <4F05F5E3.70600@cn.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Chris Samuel , linux-btrfs@vger.kernel.org
To: Liu Bo
Return-path:
In-Reply-To: <4F05F5E3.70600@cn.fujitsu.com>
List-ID:

On Thu, Jan 05, 2012 at 02:11:31PM -0500, Liu Bo wrote:
> On 01/04/2012 09:26 PM, Dave Chinner wrote:
> > On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
> >> On 01/04/2012 06:01 PM, Dave Chinner wrote:
> >>> On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> >>>> On 05/01/12 09:11, Dave Chinner wrote:
> >>>>
> >>>>> Looks to be reproducable.
> >>>> Does this happen with rc6 ?
> >>> I haven't tried. All I'm doing is running some benchmarks to get
> >>> numbers for a talk I'm giving about improvements in XFS metadata
> >>> scalability, so I wanted to update my last set of numbers from
> >>> 2.6.39.
> >>>
> >>> As it was, these benchmarks also failed on btrfs with oopsen and
> >>> corruptions back in 2.6.39 time frame. e.g. same VM, same
> >>> test, different crashes, similar slowdowns as reported here:
> >>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> >>>
> >>> Given that there is now a history of this simple test uncovering
> >>> problems, perhaps this is a test that should be run more regularly
> >>> by btrfs developers?
> >>>
> >>>> If not then it might be easy to track down as there are only
> >>>> 2 modifications between rc6 and rc7..
> >>> They don't look like they'd be responsible for fixing an extent tree
> >>> corruption, and I don't really have the time to do an open-ended
> >>> bisect to find where the problem fix arose.
> >>>
> >>> As it is, 3rd attempt failed at 22m inodes, without the warning this
> >>> time:
> >
> > .....
> >
> >>> It's hard to tell exactly what path gets to that BUG_ON(), so much
> >>> code is inlined by the compiler into run_clustered_refs() that I
> >>> can't tell exactly how it got to the BUG_ON() triggered in
> >>> alloc_reserved_tree_block().
> >>>
> >> This seems to be an oops led by ENOSPC.
> >
> > At the time of the oops, this is the space used on the filesystem:
> >
> > $ df -h /mnt/scratch
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/vdc         17T   31G   17T   1% /mnt/scratch
> >
> > It's less than 0.2% full, so I think ENOSPC can be ruled out here.
> >
>
> This bug has done something with our block reservation allocator, not the real disk space.
>
> Can you try the below one and see what happens?

Still crashes, still has severe slowdowns.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com