From: Liu Bo <liubo2009@cn.fujitsu.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Chris Samuel <chris@csamuel.org>, linux-btrfs@vger.kernel.org
Subject: Re: [3.2-rc7] slowdown, warning + oops creating lots of files
Date: Thu, 05 Jan 2012 14:11:31 -0500
Message-ID: <4F05F5E3.70600@cn.fujitsu.com>
In-Reply-To: <20120105022630.GD24466@dastard>
On 01/04/2012 09:26 PM, Dave Chinner wrote:
> On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
>> On 01/04/2012 06:01 PM, Dave Chinner wrote:
>>> On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
>>>> On 05/01/12 09:11, Dave Chinner wrote:
>>>>
>>>>> Looks to be reproducable.
>>>> Does this happen with rc6?
>>> I haven't tried. All I'm doing is running some benchmarks to get
>>> numbers for a talk I'm giving about improvements in XFS metadata
>>> scalability, so I wanted to update my last set of numbers from
>>> 2.6.39.
>>>
>>> As it was, these benchmarks also failed on btrfs with oopsen and
>>> corruptions back in the 2.6.39 time frame, e.g. same VM, same
>>> test, different crashes, similar slowdowns as reported here:
>>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
>>>
>>> Given that there is now a history of this simple test uncovering
>>> problems, perhaps this is a test that should be run more regularly
>>> by btrfs developers?
>>>
>>>> If not then it might be easy to track down as there are only
>>>> 2 modifications between rc6 and rc7..
>>> They don't look like they'd be responsible for an extent tree
>>> corruption, and I don't really have the time to do an open-ended
>>> bisect to find where the problem arose.
>>>
>>> As it is, 3rd attempt failed at 22m inodes, without the warning this
>>> time:
>
> .....
>
>>> It's hard to tell exactly what path gets to that BUG_ON(); so much
>>> code is inlined by the compiler into run_clustered_refs() that I
>>> can't tell how it reached the BUG_ON() triggered in
>>> alloc_reserved_tree_block().
>>>
>> This seems to be an oops caused by ENOSPC.
>
> At the time of the oops, this is the space used on the filesystem:
>
> $ df -h /mnt/scratch
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdc 17T 31G 17T 1% /mnt/scratch
>
> It's less than 0.2% full, so I think ENOSPC can be ruled out here.
>
This bug is in our block reservation allocator, not the real disk space accounting.
Can you try the patch below and see what happens?
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b1c8732..5a7f918 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3978,8 +3978,8 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
 		    csum_size * 2;
 	num_bytes += div64_u64(data_used + meta_used, 50);
 
-	if (num_bytes * 3 > meta_used)
-		num_bytes = div64_u64(meta_used, 3);
+	if (num_bytes * 2 > meta_used)
+		num_bytes = div64_u64(meta_used, 2);
 
 	return ALIGN(num_bytes, fs_info->extent_root->leafsize << 10);
 }
> I have noticed one thing, however, in that there are significant
> numbers of reads coming from disk when the slowdowns and oops occur.
> When everything runs fast, there are virtually no reads occurring at
> all. It looks to me like the working set of metadata is being
> kicked out of memory, only to be read back in again a short while
> later. Maybe that is a contributing factor.
>
> BTW, there is a lot of CPU time being spent on the tree locks. perf
> shows this as the top 2 CPU consumers:
>
> - 9.49% [kernel] [k] __write_lock_failed
> - __write_lock_failed
> - 99.80% _raw_write_lock
> - 79.35% btrfs_try_tree_write_lock
> 99.99% btrfs_search_slot
> - 20.63% btrfs_tree_lock
> 89.19% btrfs_search_slot
> 10.54% btrfs_lock_root_node
> btrfs_search_slot
> - 9.25% [kernel] [k] _raw_spin_unlock_irqrestore
> - _raw_spin_unlock_irqrestore
> - 55.87% __wake_up
> + 93.89% btrfs_clear_lock_blocking_rw
> + 3.46% btrfs_tree_read_unlock_blocking
> + 2.35% btrfs_tree_unlock
>
Hmm, the new extent_buffer locking scheme written by Chris is aimed at
avoiding exactly these cases; maybe he can provide some advice.
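For anyone reading the profile cold: btrfs_search_slot() first attempts a
non-blocking write lock on a tree block and only falls back to a blocking
lock when that fails, so heavy contention shows up as time spent in
__write_lock_failed. Below is a rough user-space analog of that pattern
(pthreads; a sketch of the shape only, not the kernel implementation,
which uses a custom scheme mixing a spinning rwlock with blocking states):

/*
 * User-space analog of the try-then-block locking pattern behind the
 * perf profile above (btrfs_try_tree_write_lock falling back to
 * btrfs_tree_lock).  Hypothetical names; not kernel code.
 */
#include <pthread.h>

struct tree_node {
	pthread_rwlock_t lock;
	/* ... btree payload ... */
};

/* Fast path: try to take the write lock without sleeping. */
static int try_tree_write_lock(struct tree_node *node)
{
	return pthread_rwlock_trywrlock(&node->lock) == 0;
}

/* Slow path: block until the write lock becomes available. */
static void tree_write_lock(struct tree_node *node)
{
	pthread_rwlock_wrlock(&node->lock);
}

void lock_node_for_update(struct tree_node *node)
{
	if (try_tree_write_lock(node))
		return;	/* uncontended: cheap fast path */
	/*
	 * Contended: with many concurrent writers hammering the same
	 * upper-level tree blocks, most callers land here, which is
	 * what the __write_lock_failed samples above correspond to.
	 */
	tree_write_lock(node);
}

The unlock side (pthread_rwlock_unlock) is omitted for brevity.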
thanks,
liubo
> Cheers,
>
> Dave.