linux-btrfs.vger.kernel.org archive mirror
* [3.2-rc7] slowdown, warning + oops creating lots of files
@ 2012-01-04 21:44 Dave Chinner
  2012-01-04 22:11 ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-01-04 21:44 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5739 bytes --]

Hi there buttery folks,

I just hit this warning and oops running a parallel fs_mark create
workload on a test VM using a 17TB btrfs filesystem (12 disk dm
RAID0) using default mkfs and mount parameters, mounted on
/mnt/scratch. The VM has 8p and 4GB RAM, and the fs_mark command
line was:

$ ./fs_mark  -D  10000  -S0  -n  100000 -s  0  -L  250 \
	-d /mnt/scratch/0  -d /mnt/scratch/1 \
	-d /mnt/scratch/2  -d /mnt/scratch/3 \
	-d /mnt/scratch/4  -d /mnt/scratch/5 \
	-d /mnt/scratch/6  -d /mnt/scratch/7

The attached image should give you a better idea of the performance
drop-off that was well under way when the crash occurred at about 96
million files created.

I'm rerunning the test on a fresh filesystem, so I guess I'll see if
this is a one-off in the next couple of hours....

Cheers,

Dave.

[70972.788725] device fsid 5c8b0884-4e94-41ac-9a7e-8628a03a6f6d devid 1 transid 4 /dev/vdc
[70972.795441] btrfs: disk space caching is enabled
[121727.817810] ------------[ cut here ]------------
[121727.819430] WARNING: at fs/btrfs/extent-tree.c:4771 __btrfs_free_extent+0x630/0x6d0()
[121727.821729] Hardware name: Bochs
[121727.823417] Modules linked in:
[121727.824601] Pid: 2984, comm: btrfs-transacti Not tainted 3.2.0-rc7-dgc+ #167
[121727.826784] Call Trace:
[121727.827508]  [<ffffffff8108a69f>] warn_slowpath_common+0x7f/0xc0
[121727.829290]  [<ffffffff8108a6fa>] warn_slowpath_null+0x1a/0x20
[121727.830785]  [<ffffffff815b0680>] __btrfs_free_extent+0x630/0x6d0
[121727.832486]  [<ffffffff815ac110>] ? block_group_cache_tree_search+0x90/0xc0
[121727.833907]  [<ffffffff815b42f1>] run_clustered_refs+0x381/0x800
[121727.834627]  [<ffffffff815b483a>] btrfs_run_delayed_refs+0xca/0x220
[121727.836118]  [<ffffffff815c5469>] btrfs_commit_transaction+0x359/0x840
[121727.837042]  [<ffffffff810ac420>] ? add_wait_queue+0x60/0x60
[121727.838053]  [<ffffffff815c5da4>] ? start_transaction+0x94/0x2b0
[121727.839133]  [<ffffffff810986d0>] ? usleep_range+0x50/0x50
[121727.840642]  [<ffffffff815be0d5>] transaction_kthread+0x265/0x280
[121727.842918]  [<ffffffff815bde70>] ? btrfs_congested_fn+0xb0/0xb0
[121727.844480]  [<ffffffff810ab97c>] kthread+0x8c/0xa0
[121727.845772]  [<ffffffff81abb5f4>] kernel_thread_helper+0x4/0x10
[121727.847204]  [<ffffffff810ab8f0>] ? flush_kthread_worker+0xa0/0xa0
[121727.848803]  [<ffffffff81abb5f0>] ? gs_change+0x13/0x13
[121727.850073] ---[ end trace 804317a2ff1cf200 ]---
[121727.851207] btrfs unable to find ref byte nr 14368411648 parent 0 root 2  owner 0 offset 0
[121727.853324] BUG: unable to handle kernel NULL pointer dereference at           (null)
[121727.855261] IP: [<ffffffff815e60f2>] map_private_extent_buffer+0x12/0x150
[121727.856730] PGD 0
[121727.857105] Oops: 0000 [#1] SMP
[121727.857243] CPU 1
[121727.857243] Modules linked in:
[121727.857243]
[121727.857243] Pid: 2984, comm: btrfs-transacti Tainted: G        W    3.2.0-rc7-dgc+ #167 Bochs Bochs
[121727.857243] RIP: 0010:[<ffffffff815e60f2>]  [<ffffffff815e60f2>] map_private_extent_buffer+0x12/0x150
[121727.857243] RSP: 0018:ffff880117b49b00  EFLAGS: 00010286
[121727.857243] RAX: 0000000000000000 RBX: 0000000000000065 RCX: ffff880117b49b30
[121727.857243] RDX: 0000000000000004 RSI: 000000000000007a RDI: 0000000000000000
[121727.857243] RBP: ffff880117b49b20 R08: ffff880117b49b38 R09: ffff880117b49b40
[121727.857243] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000007a
[121727.857243] R13: 0000000000000000 R14: 00000000ffffffe4 R15: 0000000000001000
[121727.857243] FS:  0000000000000000(0000) GS:ffff88011fc40000(0000) knlGS:0000000000000000
[121727.857243] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[121727.857243] CR2: 0000000000000000 CR3: 0000000001ec1000 CR4: 00000000000006e0
[121727.857243] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[121727.857243] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[121727.857243] Process btrfs-transacti (pid: 2984, threadinfo ffff880117b48000, task ffff88010f38e280)
[121727.857243] Stack:
[121727.857243]  00000000ffffffe4 0000000000000065 000000000000007a 0000000000000000
[121727.857243]  ffff880117b49b70 ffffffff815db96c ffff880117b49b80 ffff880117b49b40
[121727.857243]  0000000000000002 00000003586c9000 0000000000000000 ffff8801042d97f0
[121727.857243] Call Trace:
[121727.857243]  [<ffffffff815db96c>] btrfs_item_size+0x3c/0x90
[121727.857243]  [<ffffffff815b0282>] __btrfs_free_extent+0x232/0x6d0
[121727.857243]  [<ffffffff815ac110>] ? block_group_cache_tree_search+0x90/0xc0
[121727.857243]  [<ffffffff815b42f1>] run_clustered_refs+0x381/0x800
[121727.857243]  [<ffffffff815b483a>] btrfs_run_delayed_refs+0xca/0x220
[121727.857243]  [<ffffffff815c5469>] btrfs_commit_transaction+0x359/0x840
[121727.857243]  [<ffffffff810ac420>] ? add_wait_queue+0x60/0x60
[121727.857243]  [<ffffffff815c5da4>] ? start_transaction+0x94/0x2b0
[121727.857243]  [<ffffffff810986d0>] ? usleep_range+0x50/0x50
[121727.857243]  [<ffffffff815be0d5>] transaction_kthread+0x265/0x280
[121727.857243]  [<ffffffff815bde70>] ? btrfs_congested_fn+0xb0/0xb0
[121727.857243]  [<ffffffff810ab97c>] kthread+0x8c/0xa0
[121727.857243]  [<ffffffff81abb5f4>] kernel_thread_helper+0x4/0x10
[121727.857243]  [<ffffffff810ab8f0>] ? flush_kthread_worker+0xa0/0xa0
[121727.857243]  [<ffffffff81abb5f0>] ? gs_change+0x13/0x13
[121727.857243] Code: 83 c0 01 48 89 85 78 ff ff ff e9 c0 fc ff ff 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 <4c> 8b 27 48 89
[121727.857243] RIP  [<ffffffff815e60f2>] map_private_extent_buffer+0x12/0x150
[121727.857243]  RSP <ffff880117b49b00>
[121727.857243] CR2: 0000000000000000
[121727.904461] ---[ end trace 804317a2ff1cf201 ]---
-- 
Dave Chinner
david@fromorbit.com

[-- Attachment #2: xfs-btrfs-ext4.png --]
[-- Type: image/png, Size: 44436 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-04 21:44 [3.2-rc7] slowdown, warning + oops creating lots of files Dave Chinner
@ 2012-01-04 22:11 ` Dave Chinner
  2012-01-04 22:23   ` Chris Samuel
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-01-04 22:11 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Jan 05, 2012 at 08:44:45AM +1100, Dave Chinner wrote:
> Hi there buttery folks,
> 
> I just hit this warning and oops running a parallel fs_mark create
> workload on a test VM using a 17TB btrfs filesystem (12 disk dm
> RAID0) using default mkfs and mount parmeters, mounted on
> /mnt/scratch. The VM has 8p and 4GB RAM, and the fs_mark command
> line was:
> 
> $ ./fs_mark  -D  10000  -S0  -n  100000 -s  0  -L  250 \
> 	-d /mnt/scratch/0  -d /mnt/scratch/1 \
> 	-d /mnt/scratch/2  -d /mnt/scratch/3 \
> 	-d /mnt/scratch/4  -d /mnt/scratch/5 \
> 	-d /mnt/scratch/6  -d /mnt/scratch/7
> 
> The attached image should give you a better idea of the performance
> drop-off that was well under way when the crash occurred at about 96
> million files created.
> 
> I'm rerunning the test on a fresh filesystem, so I guess I'll see if
> this a one-off in the next couple of hours....

Looks to be reproducible. With a fresh filesystem, performance was all
over the place from the start, and the warning/oops occurred at
about 43M files created. The failure stacks this time are:

[ 1490.841957] device fsid 4b7ec51b-9747-4244-a568-fbecdb157383 devid 1 transid 4 /dev/vdc
[ 1490.847408] btrfs: disk space caching is enabled
[ 3027.690722] ------------[ cut here ]------------
[ 3027.692612] WARNING: at fs/btrfs/extent-tree.c:4771 __btrfs_free_extent+0x630/0x6d0()
[ 3027.695607] Hardware name: Bochs
[ 3027.696968] Modules linked in:
[ 3027.697894] Pid: 3460, comm: fs_mark Not tainted 3.2.0-rc7-dgc+ #167
[ 3027.699581] Call Trace:
[ 3027.700452]  [<ffffffff8108a69f>] warn_slowpath_common+0x7f/0xc0
[ 3027.701973]  [<ffffffff8108a6fa>] warn_slowpath_null+0x1a/0x20
[ 3027.703480]  [<ffffffff815b0680>] __btrfs_free_extent+0x630/0x6d0
[ 3027.704981]  [<ffffffff815ac110>] ? block_group_cache_tree_search+0x90/0xc0
[ 3027.706368]  [<ffffffff815b42f1>] run_clustered_refs+0x381/0x800
[ 3027.707897]  [<ffffffff815b483a>] btrfs_run_delayed_refs+0xca/0x220
[ 3027.709347]  [<ffffffff815b8f1c>] ? btrfs_update_root+0x9c/0xe0
[ 3027.710909]  [<ffffffff815c3c33>] commit_cowonly_roots+0x33/0x1e0
[ 3027.712370]  [<ffffffff81ab168e>] ? _raw_spin_lock+0xe/0x20
[ 3027.713220]  [<ffffffff815c54cf>] btrfs_commit_transaction+0x3bf/0x840
[ 3027.714412]  [<ffffffff810ac420>] ? add_wait_queue+0x60/0x60
[ 3027.715460]  [<ffffffff815c5da4>] ? start_transaction+0x94/0x2b0
[ 3027.716790]  [<ffffffff815ac80c>] may_commit_transaction+0x6c/0x100
[ 3027.717843]  [<ffffffff815b2b47>] reserve_metadata_bytes.isra.71+0x5a7/0x660
[ 3027.719223]  [<ffffffff81073c23>] ? __wake_up+0x53/0x70
[ 3027.720328]  [<ffffffff815a43ba>] ? btrfs_free_path+0x2a/0x40
[ 3027.721511]  [<ffffffff815b2f9e>] btrfs_block_rsv_add+0x3e/0x70
[ 3027.722610]  [<ffffffff81666dfb>] ? security_d_instantiate+0x1b/0x30
[ 3027.723765]  [<ffffffff815c5f65>] start_transaction+0x255/0x2b0
[ 3027.725204]  [<ffffffff815c6283>] btrfs_start_transaction+0x13/0x20
[ 3027.726273]  [<ffffffff815d2236>] btrfs_create+0x46/0x220
[ 3027.727275]  [<ffffffff8116c204>] vfs_create+0xb4/0xf0
[ 3027.728344]  [<ffffffff8116e1d7>] do_last.isra.45+0x547/0x7c0
[ 3027.729400]  [<ffffffff8116f7ab>] path_openat+0xcb/0x3d0
[ 3027.730363]  [<ffffffff81ab168e>] ? _raw_spin_lock+0xe/0x20
[ 3027.731394]  [<ffffffff8117cc1e>] ? vfsmount_lock_local_unlock+0x1e/0x30
[ 3027.733077]  [<ffffffff8116fbd2>] do_filp_open+0x42/0xa0
[ 3027.733949]  [<ffffffff8117c487>] ? alloc_fd+0xf7/0x150
[ 3027.734911]  [<ffffffff8115f8e7>] do_sys_open+0xf7/0x1d0
[ 3027.735894]  [<ffffffff810b572a>] ? do_gettimeofday+0x1a/0x50
[ 3027.737304]  [<ffffffff8115f9e0>] sys_open+0x20/0x30
[ 3027.738099]  [<ffffffff81ab9502>] system_call_fastpath+0x16/0x1b
[ 3027.739199] ---[ end trace df586861a93ef3bf ]---
[ 3027.740348] btrfs unable to find ref byte nr 19982405632 parent 0 root 2  owner 0 offset 0
[ 3027.742001] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 3027.743502] IP: [<ffffffff815e60f2>] map_private_extent_buffer+0x12/0x150
[ 3027.744982] PGD 109d8e067 PUD 1050a9067 PMD 0
[ 3027.745968] Oops: 0000 [#1] SMP
[ 3027.745968] CPU 7
[ 3027.745968] Modules linked in:
[ 3027.745968]
[ 3027.745968] Pid: 3460, comm: fs_mark Tainted: G        W    3.2.0-rc7-dgc+ #167 Bochs Bochs
[ 3027.745968] RIP: 0010:[<ffffffff815e60f2>]  [<ffffffff815e60f2>] map_private_extent_buffer+0x12/0x150
[ 3027.745968] RSP: 0018:ffff8800d2ac36d8  EFLAGS: 00010296
[ 3027.745968] RAX: 0000000000000000 RBX: 0000000000000065 RCX: ffff8800d2ac3708
[ 3027.745968] RDX: 0000000000000004 RSI: 000000000000007a RDI: 0000000000000000
[ 3027.745968] RBP: ffff8800d2ac36f8 R08: ffff8800d2ac3710 R09: ffff8800d2ac3718
[ 3027.745968] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000007a
[ 3027.745968] R13: 0000000000000000 R14: 00000000ffffffe4 R15: 0000000000001000
[ 3027.745968] FS:  00007f3bf8ab5700(0000) GS:ffff88011fdc0000(0000) knlGS:0000000000000000
[ 3027.745968] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3027.745968] CR2: 00007fe424b0e000 CR3: 0000000106c33000 CR4: 00000000000006e0
[ 3027.745968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3027.745968] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3027.745968] Process fs_mark (pid: 3460, threadinfo ffff8800d2ac2000, task ffff88006c81c100)
[ 3027.745968] Stack:
[ 3027.745968]  00000000ffffffe4 0000000000000065 000000000000007a 0000000000000000
[ 3027.745968]  ffff8800d2ac3748 ffffffff815db96c ffff8800d2ac3758 ffff8800d2ac3718
[ 3027.745968]  0000000000000002 00000004a70b5000 0000000000000000 ffff88010bf1a910
[ 3027.745968] Call Trace:
[ 3027.745968]  [<ffffffff815db96c>] btrfs_item_size+0x3c/0x90
[ 3027.745968]  [<ffffffff815b0282>] __btrfs_free_extent+0x232/0x6d0
[ 3027.745968]  [<ffffffff815ac110>] ? block_group_cache_tree_search+0x90/0xc0
[ 3027.745968]  [<ffffffff815b42f1>] run_clustered_refs+0x381/0x800
[ 3027.745968]  [<ffffffff815b483a>] btrfs_run_delayed_refs+0xca/0x220
[ 3027.745968]  [<ffffffff815b8f1c>] ? btrfs_update_root+0x9c/0xe0
[ 3027.745968]  [<ffffffff815c3c33>] commit_cowonly_roots+0x33/0x1e0
[ 3027.745968]  [<ffffffff81ab168e>] ? _raw_spin_lock+0xe/0x20
[ 3027.745968]  [<ffffffff815c54cf>] btrfs_commit_transaction+0x3bf/0x840
[ 3027.745968]  [<ffffffff810ac420>] ? add_wait_queue+0x60/0x60
[ 3027.745968]  [<ffffffff815c5da4>] ? start_transaction+0x94/0x2b0
[ 3027.745968]  [<ffffffff815ac80c>] may_commit_transaction+0x6c/0x100
[ 3027.745968]  [<ffffffff815b2b47>] reserve_metadata_bytes.isra.71+0x5a7/0x660
[ 3027.745968]  [<ffffffff81073c23>] ? __wake_up+0x53/0x70
[ 3027.745968]  [<ffffffff815a43ba>] ? btrfs_free_path+0x2a/0x40
[ 3027.745968]  [<ffffffff815b2f9e>] btrfs_block_rsv_add+0x3e/0x70
[ 3027.745968]  [<ffffffff81666dfb>] ? security_d_instantiate+0x1b/0x30
[ 3027.745968]  [<ffffffff815c5f65>] start_transaction+0x255/0x2b0
[ 3027.745968]  [<ffffffff815c6283>] btrfs_start_transaction+0x13/0x20
[ 3027.745968]  [<ffffffff815d2236>] btrfs_create+0x46/0x220
[ 3027.745968]  [<ffffffff8116c204>] vfs_create+0xb4/0xf0
[ 3027.745968]  [<ffffffff8116e1d7>] do_last.isra.45+0x547/0x7c0
[ 3027.745968]  [<ffffffff8116f7ab>] path_openat+0xcb/0x3d0
[ 3027.745968]  [<ffffffff81ab168e>] ? _raw_spin_lock+0xe/0x20
[ 3027.745968]  [<ffffffff8117cc1e>] ? vfsmount_lock_local_unlock+0x1e/0x30
[ 3027.745968]  [<ffffffff8116fbd2>] do_filp_open+0x42/0xa0
[ 3027.745968]  [<ffffffff8117c487>] ? alloc_fd+0xf7/0x150
[ 3027.745968]  [<ffffffff8115f8e7>] do_sys_open+0xf7/0x1d0
[ 3027.745968]  [<ffffffff810b572a>] ? do_gettimeofday+0x1a/0x50
[ 3027.745968]  [<ffffffff8115f9e0>] sys_open+0x20/0x30
[ 3027.745968]  [<ffffffff81ab9502>] system_call_fastpath+0x16/0x1b
[ 3027.745968] Code: 83 c0 01 48 89 85 78 ff ff ff e9 c0 fc ff ff 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 <4c> 8b 27 48 89
[ 3027.745968] RIP  [<ffffffff815e60f2>] map_private_extent_buffer+0x12/0x150
[ 3027.745968]  RSP <ffff8800d2ac36d8>
[ 3027.745968] CR2: 0000000000000000
[ 3027.817725] ---[ end trace df586861a93ef3c0 ]---

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-04 22:11 ` Dave Chinner
@ 2012-01-04 22:23   ` Chris Samuel
  2012-01-04 23:01     ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Samuel @ 2012-01-04 22:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-btrfs

On 05/01/12 09:11, Dave Chinner wrote:

> Looks to be reproducable.

Does this happen with rc6 ?

If not then it might be easy to track down as there are only
2 modifications between rc6 and rc7..

commit 08c422c27f855d27b0b3d9fa30ebd938d4ae6f1f
Author: Al Viro <viro@ZenIV.linux.org.uk>
Date:   Fri Dec 23 07:58:13 2011 -0500

    Btrfs: call d_instantiate after all ops are setup
    
    This closes races where btrfs is calling d_instantiate too soon during
    inode creation.  All of the callers of btrfs_add_nondir are updated to
    instantiate after the inode is fully setup in memory.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Chris Mason <chris.mason@oracle.com>

commit 8d532b2afb2eacc84588db709ec280a3d1219be3
Author: Chris Mason <chris.mason@oracle.com>
Date:   Fri Dec 23 07:53:00 2011 -0500

    Btrfs: fix worker lock misuse in find_worker
    
    Dan Carpenter noticed that we were doing a double unlock on the worker
    lock, and sometimes picking a worker thread without the lock held.
    
    This fixes both errors.
    
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>


-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-04 22:23   ` Chris Samuel
@ 2012-01-04 23:01     ` Dave Chinner
  2012-01-05  2:23       ` Liu Bo
  2012-01-05 18:46       ` Chris Mason
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2012-01-04 23:01 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> On 05/01/12 09:11, Dave Chinner wrote:
> 
> > Looks to be reproducable.
> 
> Does this happen with rc6 ?

I haven't tried. All I'm doing is running some benchmarks to get
numbers for a talk I'm giving about improvements in XFS metadata
scalability, so I wanted to update my last set of numbers from
2.6.39.

As it was, these benchmarks also failed on btrfs with oopsen and
corruptions back in the 2.6.39 time frame, e.g. same VM, same
test, different crashes, similar slowdowns as reported here:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062

Given that there is now a history of this simple test uncovering
problems, perhaps this is a test that should be run more regularly
by btrfs developers?

> If not then it might be easy to track down as there are only
> 2 modifications between rc6 and rc7..

They don't look like they'd be responsible for fixing an extent tree
corruption, and I don't really have the time to do an open-ended
bisect to find where the problem fix arose.

As it is, the 3rd attempt failed at 22m inodes, without the warning this
time:

[   59.433452] device fsid 4d27dc14-562d-4722-9591-723bd2bbe94c devid 1 transid 4 /dev/vdc
[   59.437050] btrfs: disk space caching is enabled
[  753.258465] ------------[ cut here ]------------
[  753.259806] kernel BUG at fs/btrfs/extent-tree.c:5797!
[  753.260014] invalid opcode: 0000 [#1] SMP
[  753.260014] CPU 7
[  753.260014] Modules linked in:
[  753.260014]
[  753.260014] Pid: 2874, comm: fs_mark Not tainted 3.2.0-rc7-dgc+ #167 Bochs Bochs
[  753.260014] RIP: 0010:[<ffffffff815b475b>]  [<ffffffff815b475b>] run_clustered_refs+0x7eb/0x800
[  753.260014] RSP: 0018:ffff8800430258a8  EFLAGS: 00010286
[  753.260014] RAX: 00000000ffffffe4 RBX: ffff88009c8ab1c0 RCX: 0000000000000000
[  753.260014] RDX: 0000000000000008 RSI: 0000000000000282 RDI: 0000000000000000
[  753.260014] RBP: ffff880043025988 R08: 0000000000000000 R09: 0000000000000002
[  753.260014] R10: ffff8801188f6000 R11: ffff880101b50d20 R12: ffff88008fc1ad40
[  753.260014] R13: ffff88003940a6c0 R14: ffff880118a49000 R15: ffff88010fc77e80
[  753.260014] FS:  00007f416ce90700(0000) GS:ffff88011fdc0000(0000) knlGS:0000000000000000
[  753.260014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  753.260014] CR2: 00007f416c2f6000 CR3: 000000003aaea000 CR4: 00000000000006e0
[  753.260014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  753.260014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  753.260014] Process fs_mark (pid: 2874, threadinfo ffff880043024000, task ffff8800090e6180)
[  753.260014] Stack:
[  753.260014]  0000000000000000 0000000000000000 ffff880000000001 0000000000000000
[  753.260014]  ffff88010fc77f38 0000000000000e92 0000000000000000 0000000000000002
[  753.260014]  0000000000000e03 0000000000000e68 0000000000000000 ffff8800430259d8
[  753.260014] Call Trace:
[  753.260014]  [<ffffffff815b483a>] btrfs_run_delayed_refs+0xca/0x220
[  753.260014]  [<ffffffff815c5469>] btrfs_commit_transaction+0x359/0x840
[  753.260014]  [<ffffffff810ac420>] ? add_wait_queue+0x60/0x60
[  753.260014]  [<ffffffff815c5da4>] ? start_transaction+0x94/0x2b0
[  753.260014]  [<ffffffff815ac80c>] may_commit_transaction+0x6c/0x100
[  753.260014]  [<ffffffff815b2b47>] reserve_metadata_bytes.isra.71+0x5a7/0x660
[  753.260014]  [<ffffffff81073c23>] ? __wake_up+0x53/0x70
[  753.260014]  [<ffffffff815a43ba>] ? btrfs_free_path+0x2a/0x40
[  753.260014]  [<ffffffff815b2f9e>] btrfs_block_rsv_add+0x3e/0x70
[  753.260014]  [<ffffffff81666dfb>] ? security_d_instantiate+0x1b/0x30
[  753.260014]  [<ffffffff815c5f65>] start_transaction+0x255/0x2b0
[  753.260014]  [<ffffffff815c6283>] btrfs_start_transaction+0x13/0x20
[  753.260014]  [<ffffffff815d2236>] btrfs_create+0x46/0x220
[  753.260014]  [<ffffffff8116c204>] vfs_create+0xb4/0xf0
[  753.260014]  [<ffffffff8116e1d7>] do_last.isra.45+0x547/0x7c0
[  753.260014]  [<ffffffff8116f7ab>] path_openat+0xcb/0x3d0
[  753.260014]  [<ffffffff81ab168e>] ? _raw_spin_lock+0xe/0x20
[  753.260014]  [<ffffffff8117cc1e>] ? vfsmount_lock_local_unlock+0x1e/0x30
[  753.260014]  [<ffffffff8116fbd2>] do_filp_open+0x42/0xa0
[  753.260014]  [<ffffffff8117c487>] ? alloc_fd+0xf7/0x150
[  753.260014]  [<ffffffff8115f8e7>] do_sys_open+0xf7/0x1d0
[  753.260014]  [<ffffffff810b572a>] ? do_gettimeofday+0x1a/0x50
[  753.260014]  [<ffffffff8115f9e0>] sys_open+0x20/0x30
[  753.260014]  [<ffffffff81ab9502>] system_call_fastpath+0x16/0x1b
[  753.260014] Code: ff e9 37 f9 ff ff be 95 00 00 00 48 c7 c7 43 6f df 81 e8 99 5f ad ff e9 36 f9 ff ff 80 fa b2 0f 84 d0 f9 ff ff 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 0f
[  753.260014] RIP  [<ffffffff815b475b>] run_clustered_refs+0x7eb/0x800
[  753.260014]  RSP <ffff8800430258a8>
[  753.330089] ---[ end trace f3d0e286a928c349 ]---

It's hard to tell exactly what path gets to that BUG_ON(): so much
code is inlined by the compiler into run_clustered_refs() that I
can't tell how it reached the BUG_ON() triggered in
alloc_reserved_tree_block().

(As an aside, this is why XFS uses noinline for most of its static
functions - so that stack traces are accurate when a problem occurs.
Debuggability of complex code paths is far more important than the
small speed improvement automatic inlining of static functions
gives...)
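
A minimal sketch of that pattern, assuming plain GCC attributes; the
STATIC macro and the helper below are illustrative only, not copied
from XFS or btrfs source:

#include <stddef.h>

/* Keep static helpers out of line so they stay visible in stack traces. */
#define STATIC static __attribute__((noinline))

STATIC int check_ref_counts(const unsigned long *refs, size_t nr)
{
	/* If this helper crashes, the backtrace names check_ref_counts
	 * instead of folding it into its caller. */
	if (refs == NULL)
		return -1;
	for (size_t i = 0; i < nr; i++)
		if (refs[i] == 0)
			return -1;
	return 0;
}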

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-04 23:01     ` Dave Chinner
@ 2012-01-05  2:23       ` Liu Bo
  2012-01-05  2:26         ` Dave Chinner
  2012-01-05 18:46       ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Liu Bo @ 2012-01-05  2:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Chris Samuel, linux-btrfs

On 01/04/2012 06:01 PM, Dave Chinner wrote:
> On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
>> On 05/01/12 09:11, Dave Chinner wrote:
>>
>>> Looks to be reproducable.
>> Does this happen with rc6 ?
> 
> I haven't tried. All I'm doing is running some benchmarks to get
> numbers for a talk I'm giving about improvements in XFS metadata
> scalability, so I wanted to update my last set of numbers from
> 2.6.39.
> 
> As it was, these benchmarks also failed on btrfs with oopsen and
> corruptions back in 2.6.39 time frame.  e.g. same VM, same
> test, different crashes, similar slowdowns as reported here:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> 
> Given that there is now a history of this simple test uncovering
> problems, perhaps this is a test that should be run more regularly
> by btrfs developers?
> 
>> If not then it might be easy to track down as there are only
>> 2 modifications between rc6 and rc7..
> 
> They don't look like they'd be responsible for fixing an extent tree
> corruption, and I don't really have the time to do an open-ended
> bisect to find where the problem fix arose.
> 
> As it is, 3rd attempt failed at 22m inodes, without the warning this
> time:
> 
> [   59.433452] device fsid 4d27dc14-562d-4722-9591-723bd2bbe94c devid 1 transid 4 /dev/vdc
> [   59.437050] btrfs: disk space caching is enabled
> [  753.258465] ------------[ cut here ]------------
> [  753.259806] kernel BUG at fs/btrfs/extent-tree.c:5797!
> [  753.260014] invalid opcode: 0000 [#1] SMP
> [  753.260014] CPU 7
> [  753.260014] Modules linked in:
> [  753.260014]
> [  753.260014] Pid: 2874, comm: fs_mark Not tainted 3.2.0-rc7-dgc+ #167 Bochs Bochs
> [  753.260014] RIP: 0010:[<ffffffff815b475b>]  [<ffffffff815b475b>] run_clustered_refs+0x7eb/0x800
> [  753.260014] RSP: 0018:ffff8800430258a8  EFLAGS: 00010286
> [  753.260014] RAX: 00000000ffffffe4 RBX: ffff88009c8ab1c0 RCX: 0000000000000000
> [  753.260014] RDX: 0000000000000008 RSI: 0000000000000282 RDI: 0000000000000000
> [  753.260014] RBP: ffff880043025988 R08: 0000000000000000 R09: 0000000000000002
> [  753.260014] R10: ffff8801188f6000 R11: ffff880101b50d20 R12: ffff88008fc1ad40
> [  753.260014] R13: ffff88003940a6c0 R14: ffff880118a49000 R15: ffff88010fc77e80
> [  753.260014] FS:  00007f416ce90700(0000) GS:ffff88011fdc0000(0000) knlGS:0000000000000000
> [  753.260014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  753.260014] CR2: 00007f416c2f6000 CR3: 000000003aaea000 CR4: 00000000000006e0
> [  753.260014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  753.260014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  753.260014] Process fs_mark (pid: 2874, threadinfo ffff880043024000, task ffff8800090e6180)
> [  753.260014] Stack:
> [  753.260014]  0000000000000000 0000000000000000 ffff880000000001 0000000000000000
> [  753.260014]  ffff88010fc77f38 0000000000000e92 0000000000000000 0000000000000002
> [  753.260014]  0000000000000e03 0000000000000e68 0000000000000000 ffff8800430259d8
> [  753.260014] Call Trace:
> [  753.260014]  [<ffffffff815b483a>] btrfs_run_delayed_refs+0xca/0x220
> [  753.260014]  [<ffffffff815c5469>] btrfs_commit_transaction+0x359/0x840
> [  753.260014]  [<ffffffff810ac420>] ? add_wait_queue+0x60/0x60
> [  753.260014]  [<ffffffff815c5da4>] ? start_transaction+0x94/0x2b0
> [  753.260014]  [<ffffffff815ac80c>] may_commit_transaction+0x6c/0x100
> [  753.260014]  [<ffffffff815b2b47>] reserve_metadata_bytes.isra.71+0x5a7/0x660
> [  753.260014]  [<ffffffff81073c23>] ? __wake_up+0x53/0x70
> [  753.260014]  [<ffffffff815a43ba>] ? btrfs_free_path+0x2a/0x40
> [  753.260014]  [<ffffffff815b2f9e>] btrfs_block_rsv_add+0x3e/0x70
> [  753.260014]  [<ffffffff81666dfb>] ? security_d_instantiate+0x1b/0x30
> [  753.260014]  [<ffffffff815c5f65>] start_transaction+0x255/0x2b0
> [  753.260014]  [<ffffffff815c6283>] btrfs_start_transaction+0x13/0x20
> [  753.260014]  [<ffffffff815d2236>] btrfs_create+0x46/0x220
> [  753.260014]  [<ffffffff8116c204>] vfs_create+0xb4/0xf0
> [  753.260014]  [<ffffffff8116e1d7>] do_last.isra.45+0x547/0x7c0
> [  753.260014]  [<ffffffff8116f7ab>] path_openat+0xcb/0x3d0
> [  753.260014]  [<ffffffff81ab168e>] ? _raw_spin_lock+0xe/0x20
> [  753.260014]  [<ffffffff8117cc1e>] ? vfsmount_lock_local_unlock+0x1e/0x30
> [  753.260014]  [<ffffffff8116fbd2>] do_filp_open+0x42/0xa0
> [  753.260014]  [<ffffffff8117c487>] ? alloc_fd+0xf7/0x150
> [  753.260014]  [<ffffffff8115f8e7>] do_sys_open+0xf7/0x1d0
> [  753.260014]  [<ffffffff810b572a>] ? do_gettimeofday+0x1a/0x50
> [  753.260014]  [<ffffffff8115f9e0>] sys_open+0x20/0x30
> [  753.260014]  [<ffffffff81ab9502>] system_call_fastpath+0x16/0x1b
> [  753.260014] Code: ff e9 37 f9 ff ff be 95 00 00 00 48 c7 c7 43 6f df 81 e8 99 5f ad ff e9 36 f9 ff ff 80 fa b2 0f 84 d0 f9 ff ff 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 0f
> [  753.260014] RIP  [<ffffffff815b475b>] run_clustered_refs+0x7eb/0x800
> [  753.260014]  RSP <ffff8800430258a8>
> [  753.330089] ---[ end trace f3d0e286a928c349 ]---
> 
> It's hard to tell exactly what path gets to that BUG_ON(), so much
> code is inlined by the compiler into run_clustered_refs() that I
> can't tell exactly how it got to the BUG_ON() triggered in
> alloc_reserved_tree_block().
> 

This seems to be an oops caused by ENOSPC.

Thanks for the report.


> (As an aside, this is why XFS use noinline for most of it's static
> functions - so that stack traces are accurate when a problem occurs.
> Debuggability of complex code paths is far more important than the
> small speed improvement automatic inlining of static functions
> gives...)
> 

Makes sense.

thanks,
liubo

> Cheers,
> 
> Dave.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05  2:23       ` Liu Bo
@ 2012-01-05  2:26         ` Dave Chinner
  2012-01-05 19:11           ` Liu Bo
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-01-05  2:26 UTC (permalink / raw)
  To: Liu Bo; +Cc: Chris Samuel, linux-btrfs

On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
> On 01/04/2012 06:01 PM, Dave Chinner wrote:
> > On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> >> On 05/01/12 09:11, Dave Chinner wrote:
> >>
> >>> Looks to be reproducable.
> >> Does this happen with rc6 ?
> > 
> > I haven't tried. All I'm doing is running some benchmarks to get
> > numbers for a talk I'm giving about improvements in XFS metadata
> > scalability, so I wanted to update my last set of numbers from
> > 2.6.39.
> > 
> > As it was, these benchmarks also failed on btrfs with oopsen and
> > corruptions back in 2.6.39 time frame.  e.g. same VM, same
> > test, different crashes, similar slowdowns as reported here:
> > http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> > 
> > Given that there is now a history of this simple test uncovering
> > problems, perhaps this is a test that should be run more regularly
> > by btrfs developers?
> > 
> >> If not then it might be easy to track down as there are only
> >> 2 modifications between rc6 and rc7..
> > 
> > They don't look like they'd be responsible for fixing an extent tree
> > corruption, and I don't really have the time to do an open-ended
> > bisect to find where the problem fix arose.
> > 
> > As it is, 3rd attempt failed at 22m inodes, without the warning this
> > time:

.....

> > It's hard to tell exactly what path gets to that BUG_ON(), so much
> > code is inlined by the compiler into run_clustered_refs() that I
> > can't tell exactly how it got to the BUG_ON() triggered in
> > alloc_reserved_tree_block().
> > 
> 
> This seems to be an oops led by ENOSPC.

At the time of the oops, this is the space used on the filesystem:

$ df -h /mnt/scratch
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc         17T   31G   17T   1% /mnt/scratch

It's less than 0.2% full, so I think ENOSPC can be ruled out here.

I have noticed one thing, however: there are significant
numbers of reads coming from disk when the slowdowns and oops occur.
When everything runs fast, there are virtually no reads occurring at
all.  It looks to me that maybe the working set of metadata is being
kicked out of memory, only to be read back in again a short while
later. Maybe that is a contributing factor.

BTW, there is a lot of CPU time being spent on the tree locks. perf
shows this as the top 2 CPU consumers:

-   9.49%  [kernel]  [k] __write_lock_failed
   - __write_lock_failed
      - 99.80% _raw_write_lock
         - 79.35% btrfs_try_tree_write_lock
              99.99% btrfs_search_slot
         - 20.63% btrfs_tree_lock
              89.19% btrfs_search_slot
              10.54% btrfs_lock_root_node
                 btrfs_search_slot
-   9.25%  [kernel]  [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      - 55.87% __wake_up
         + 93.89% btrfs_clear_lock_blocking_rw
         + 3.46% btrfs_tree_read_unlock_blocking
         + 2.35% btrfs_tree_unlock

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 19:11           ` Liu Bo
@ 2012-01-05 11:43             ` Dave Chinner
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2012-01-05 11:43 UTC (permalink / raw)
  To: Liu Bo; +Cc: Chris Samuel, linux-btrfs

On Thu, Jan 05, 2012 at 02:11:31PM -0500, Liu Bo wrote:
> On 01/04/2012 09:26 PM, Dave Chinner wrote:
> > On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
> >> On 01/04/2012 06:01 PM, Dave Chinner wrote:
> >>> On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> >>>> On 05/01/12 09:11, Dave Chinner wrote:
> >>>>
> >>>>> Looks to be reproducable.
> >>>> Does this happen with rc6 ?
> >>> I haven't tried. All I'm doing is running some benchmarks to get
> >>> numbers for a talk I'm giving about improvements in XFS metadata
> >>> scalability, so I wanted to update my last set of numbers from
> >>> 2.6.39.
> >>>
> >>> As it was, these benchmarks also failed on btrfs with oopsen and
> >>> corruptions back in 2.6.39 time frame.  e.g. same VM, same
> >>> test, different crashes, similar slowdowns as reported here:
> >>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> >>>
> >>> Given that there is now a history of this simple test uncovering
> >>> problems, perhaps this is a test that should be run more regularly
> >>> by btrfs developers?
> >>>
> >>>> If not then it might be easy to track down as there are only
> >>>> 2 modifications between rc6 and rc7..
> >>> They don't look like they'd be responsible for fixing an extent tree
> >>> corruption, and I don't really have the time to do an open-ended
> >>> bisect to find where the problem fix arose.
> >>>
> >>> As it is, 3rd attempt failed at 22m inodes, without the warning this
> >>> time:
> > 
> > .....
> > 
> >>> It's hard to tell exactly what path gets to that BUG_ON(), so much
> >>> code is inlined by the compiler into run_clustered_refs() that I
> >>> can't tell exactly how it got to the BUG_ON() triggered in
> >>> alloc_reserved_tree_block().
> >>>
> >> This seems to be an oops led by ENOSPC.
> > 
> > At the time of the oops, this is the space used on the filesystem:
> > 
> > $ df -h /mnt/scratch
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/vdc         17T   31G   17T   1% /mnt/scratch
> > 
> > It's less than 0.2% full, so I think ENOSPC can be ruled out here.
> > 
> 
> This bug has done something with our block reservation allocator, not the real disk space.
> 
> Can you try the below one and see what happens?

Still crashes, still has severe slowdowns.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-04 23:01     ` Dave Chinner
  2012-01-05  2:23       ` Liu Bo
@ 2012-01-05 18:46       ` Chris Mason
  2012-01-05 19:45         ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Mason @ 2012-01-05 18:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Chris Samuel, linux-btrfs

On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
> On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> > On 05/01/12 09:11, Dave Chinner wrote:
> > 
> > > Looks to be reproducable.
> > 
> > Does this happen with rc6 ?
> 
> I haven't tried. All I'm doing is running some benchmarks to get
> numbers for a talk I'm giving about improvements in XFS metadata
> scalability, so I wanted to update my last set of numbers from
> 2.6.39.
> 
> As it was, these benchmarks also failed on btrfs with oopsen and
> corruptions back in 2.6.39 time frame.  e.g. same VM, same
> test, different crashes, similar slowdowns as reported here:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> 
> Given that there is now a history of this simple test uncovering
> problems, perhaps this is a test that should be run more regularly
> by btrfs developers?

Unfortunately, this one works for me.  I'll try it again and see if I
can push harder.  If not, I'll see if I can trade beer for some
diagnostic runs.

Thanks Dave

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05  2:26         ` Dave Chinner
@ 2012-01-05 19:11           ` Liu Bo
  2012-01-05 11:43             ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Liu Bo @ 2012-01-05 19:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Chris Samuel, linux-btrfs

On 01/04/2012 09:26 PM, Dave Chinner wrote:
> On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
>> On 01/04/2012 06:01 PM, Dave Chinner wrote:
>>> On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
>>>> On 05/01/12 09:11, Dave Chinner wrote:
>>>>
>>>>> Looks to be reproducable.
>>>> Does this happen with rc6 ?
>>> I haven't tried. All I'm doing is running some benchmarks to get
>>> numbers for a talk I'm giving about improvements in XFS metadata
>>> scalability, so I wanted to update my last set of numbers from
>>> 2.6.39.
>>>
>>> As it was, these benchmarks also failed on btrfs with oopsen and
>>> corruptions back in 2.6.39 time frame.  e.g. same VM, same
>>> test, different crashes, similar slowdowns as reported here:
>>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
>>>
>>> Given that there is now a history of this simple test uncovering
>>> problems, perhaps this is a test that should be run more regularly
>>> by btrfs developers?
>>>
>>>> If not then it might be easy to track down as there are only
>>>> 2 modifications between rc6 and rc7..
>>> They don't look like they'd be responsible for fixing an extent tree
>>> corruption, and I don't really have the time to do an open-ended
>>> bisect to find where the problem fix arose.
>>>
>>> As it is, 3rd attempt failed at 22m inodes, without the warning this
>>> time:
> 
> .....
> 
>>> It's hard to tell exactly what path gets to that BUG_ON(), so much
>>> code is inlined by the compiler into run_clustered_refs() that I
>>> can't tell exactly how it got to the BUG_ON() triggered in
>>> alloc_reserved_tree_block().
>>>
>> This seems to be an oops led by ENOSPC.
> 
> At the time of the oops, this is the space used on the filesystem:
> 
> $ df -h /mnt/scratch
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/vdc         17T   31G   17T   1% /mnt/scratch
> 
> It's less than 0.2% full, so I think ENOSPC can be ruled out here.
> 

This bug has done something with our block reservation allocator, not the real disk space.

Can you try the patch below and see what happens?

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b1c8732..5a7f918 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3978,8 +3978,8 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
 		    csum_size * 2;
 	num_bytes += div64_u64(data_used + meta_used, 50);
 
-	if (num_bytes * 3 > meta_used)
-		num_bytes = div64_u64(meta_used, 3);
+	if (num_bytes * 2 > meta_used)
+		num_bytes = div64_u64(meta_used, 2);
 
 	return ALIGN(num_bytes, fs_info->extent_root->leafsize << 10);
 }

> I have noticed one thing, however, in that the there are significant
> numbers of reads coming from disk when the slowdowns and oops occur.
> When everything runs fast, there are virtually no reads occurring at
> all.  It looks to me that maybe the working set of metadata is being
> kicked out of memory, only to be read back in again short while
> later. Maybe that is a contributing factor.
> 
> BTW, there is a lot of CPU time being spent on the tree locks. perf
> shows this as the top 2 CPU consumers:
> 
> -   9.49%  [kernel]  [k] __write_lock_failed
>    - __write_lock_failed
>       - 99.80% _raw_write_lock
>          - 79.35% btrfs_try_tree_write_lock
>               99.99% btrfs_search_slot
>          - 20.63% btrfs_tree_lock
>               89.19% btrfs_search_slot
>               10.54% btrfs_lock_root_node
>                  btrfs_search_slot
> -   9.25%  [kernel]  [k] _raw_spin_unlock_irqrestore
>    - _raw_spin_unlock_irqrestore
>       - 55.87% __wake_up
>          + 93.89% btrfs_clear_lock_blocking_rw
>          + 3.46% btrfs_tree_read_unlock_blocking
>          + 2.35% btrfs_tree_unlock
> 

Hmm, the new extent_buffer lock scheme written by Chris is aimed at
avoiding such cases; maybe he can provide some advice.

thanks,
liubo

> Cheers,
> 
> Dave.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 18:46       ` Chris Mason
@ 2012-01-05 19:45         ` Chris Mason
  2012-01-05 20:12           ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2012-01-05 19:45 UTC (permalink / raw)
  To: Dave Chinner, Chris Samuel, linux-btrfs

On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
> On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
> > On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> > > On 05/01/12 09:11, Dave Chinner wrote:
> > > 
> > > > Looks to be reproducable.
> > > 
> > > Does this happen with rc6 ?
> > 
> > I haven't tried. All I'm doing is running some benchmarks to get
> > numbers for a talk I'm giving about improvements in XFS metadata
> > scalability, so I wanted to update my last set of numbers from
> > 2.6.39.
> > 
> > As it was, these benchmarks also failed on btrfs with oopsen and
> > corruptions back in 2.6.39 time frame.  e.g. same VM, same
> > test, different crashes, similar slowdowns as reported here:
> > http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> > 
> > Given that there is now a history of this simple test uncovering
> > problems, perhaps this is a test that should be run more regularly
> > by btrfs developers?
> 
> Unfortunately, this one works for me.  I'll try it again and see if I
> can push harder.  If not, I'll see if I can trade beer for some
> diagnostic runs.

Aha, if I try it just on the ssd instead of on my full array it triggers
at 88M files.  Great.

-chris


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 19:45         ` Chris Mason
@ 2012-01-05 20:12           ` Dave Chinner
  2012-01-05 21:02             ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-01-05 20:12 UTC (permalink / raw)
  To: Chris Mason, Chris Samuel, linux-btrfs

On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
> On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
> > On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> > > > On 05/01/12 09:11, Dave Chinner wrote:
> > > > 
> > > > > Looks to be reproducable.
> > > > 
> > > > Does this happen with rc6 ?
> > > 
> > > I haven't tried. All I'm doing is running some benchmarks to get
> > > numbers for a talk I'm giving about improvements in XFS metadata
> > > scalability, so I wanted to update my last set of numbers from
> > > 2.6.39.
> > > 
> > > As it was, these benchmarks also failed on btrfs with oopsen and
> > > corruptions back in 2.6.39 time frame.  e.g. same VM, same
> > > test, different crashes, similar slowdowns as reported here:
> > > http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> > > 
> > > Given that there is now a history of this simple test uncovering
> > > problems, perhaps this is a test that should be run more regularly
> > > by btrfs developers?
> > 
> > Unfortunately, this one works for me.  I'll try it again and see if I
> > can push harder.  If not, I'll see if I can trade beer for some
> > diagnostic runs.
> 
> Aha, if I try it just on the ssd instead of on my full array it triggers
> at 88M files.  Great.

Good to know.  The error that is generating the BUG on my machine is
-28 (ENOSPC).  Given there's 17TB free on my filesystem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 20:12           ` Dave Chinner
@ 2012-01-05 21:02             ` Chris Mason
  2012-01-05 21:24               ` Chris Samuel
  2012-01-07 21:34               ` Christian Brunner
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Mason @ 2012-01-05 21:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Chris Samuel, linux-btrfs

On Fri, Jan 06, 2012 at 07:12:16AM +1100, Dave Chinner wrote:
> On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
> > On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
> > > On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
> > > > On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
> > > > > On 05/01/12 09:11, Dave Chinner wrote:
> > > > > 
> > > > > > Looks to be reproducable.
> > > > > 
> > > > > Does this happen with rc6 ?
> > > > 
> > > > I haven't tried. All I'm doing is running some benchmarks to get
> > > > numbers for a talk I'm giving about improvements in XFS metadata
> > > > scalability, so I wanted to update my last set of numbers from
> > > > 2.6.39.
> > > > 
> > > > As it was, these benchmarks also failed on btrfs with oopsen and
> > > > corruptions back in 2.6.39 time frame.  e.g. same VM, same
> > > > test, different crashes, similar slowdowns as reported here:
> > > > http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
> > > > 
> > > > Given that there is now a history of this simple test uncovering
> > > > problems, perhaps this is a test that should be run more regularly
> > > > by btrfs developers?
> > > 
> > > Unfortunately, this one works for me.  I'll try it again and see if I
> > > can push harder.  If not, I'll see if I can trade beer for some
> > > diagnostic runs.
> > 
> > Aha, if I try it just on the ssd instead of on my full array it triggers
> > at 88M files.  Great.
> 
> Good to know.  The error that is generating the BUG on my machine is
> -28 (ENOSPC).  Given there's 17TB free on my filesystem....

Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
not allocating more metadata chunks from the drive because of where the
allocation is happening, so it is just a check for "do we need a new
chunk" in the right place.

I'll make sure it can fill my ssd and then send to you.

-chris


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 21:02             ` Chris Mason
@ 2012-01-05 21:24               ` Chris Samuel
  2012-01-06  1:22                 ` Chris Mason
  2012-01-07 21:34               ` Christian Brunner
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Samuel @ 2012-01-05 21:24 UTC (permalink / raw)
  To: Chris Mason, Dave Chinner; +Cc: linux-btrfs

On Fri, 6 Jan 2012, 08:02:55 EST, Chris Mason <chris.mason@oracle.com> wrote:

> Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
> not allocating more metadata chunks from the drive because of where the
> allocation is happening, so it is just a check for "do we need a new
> chunk" in the right place.
>
> I'll make sure it can fill my ssd and then send to you.

*If* the fix works, would this be a candidate for a
stable 3.2.1 release rather than having to wait for
3.3?

cheers,
Chris
-- 
Chris Samuel - http://www.csamuel.org/ - on mobile

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 21:24               ` Chris Samuel
@ 2012-01-06  1:22                 ` Chris Mason
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Mason @ 2012-01-06  1:22 UTC (permalink / raw)
  To: Chris Samuel; +Cc: Dave Chinner, linux-btrfs

On Fri, Jan 06, 2012 at 08:24:32AM +1100, Chris Samuel wrote:
> On Fri, 6 Jan 2012, 08:02:55 EST, Chris Mason <chris.mason@oracle.com> wrote:
>
> > Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
> > not allocating more metadata chunks from the drive because of where the
> > allocation is happening, so it is just a check for "do we need a new
> > chunk" in the right place.
> >
> > I'll make sure it can fill my ssd and then send to you.
>
> *If* the fix works would this be a candidate for a
> stable 3.2.1 release rather than having to wait for
> 3.3?

Yes, it's pretty safe.

-chris


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-05 21:02             ` Chris Mason
  2012-01-05 21:24               ` Chris Samuel
@ 2012-01-07 21:34               ` Christian Brunner
  2012-01-12 16:18                 ` Christian Brunner
  1 sibling, 1 reply; 16+ messages in thread
From: Christian Brunner @ 2012-01-07 21:34 UTC (permalink / raw)
  To: Chris Mason, Dave Chinner, Chris Samuel, linux-btrfs

2012/1/5 Chris Mason <chris.mason@oracle.com>:
> On Fri, Jan 06, 2012 at 07:12:16AM +1100, Dave Chinner wrote:
>> On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
>> > On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
>> > > On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
>> > > > On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
>> > > > > On 05/01/12 09:11, Dave Chinner wrote:
>> > > > >
>> > > > > > Looks to be reproducable.
>> > > > >
>> > > > > Does this happen with rc6 ?
>> > > >
>> > > > I haven't tried. All I'm doing is running some benchmarks to get
>> > > > numbers for a talk I'm giving about improvements in XFS metadata
>> > > > scalability, so I wanted to update my last set of numbers from
>> > > > 2.6.39.
>> > > >
>> > > > As it was, these benchmarks also failed on btrfs with oopsen and
>> > > > corruptions back in 2.6.39 time frame.  e.g. same VM, same
>> > > > test, different crashes, similar slowdowns as reported here:
>> > > > http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
>> > > >
>> > > > Given that there is now a history of this simple test uncovering
>> > > > problems, perhaps this is a test that should be run more regularly
>> > > > by btrfs developers?
>> > >
>> > > Unfortunately, this one works for me.  I'll try it again and see if I
>> > > can push harder.  If not, I'll see if I can trade beer for some
>> > > diagnostic runs.
>> >
>> > Aha, if I try it just on the ssd instead of on my full array it triggers
>> > at 88M files.  Great.
>>
>> Good to know.  The error that is generating the BUG on my machine is
>> -28 (ENOSPC).  Given there's 17TB free on my filesystem....
>
> Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
> not allocating more metadata chunks from the drive because of where the
> allocation is happening, so it is just a check for "do we need a new
> chunk" in the right place.
>
> I'll make sure it can fill my ssd and then send to you.

Could you send the patch to the list (or to me), please? Judging from
what you mentioned on IRC this sounds quite interesting, and I would
like to see if this solves my performance problems with ceph, too...

Thanks,
Christian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [3.2-rc7] slowdown, warning + oops creating lots of files
  2012-01-07 21:34               ` Christian Brunner
@ 2012-01-12 16:18                 ` Christian Brunner
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Brunner @ 2012-01-12 16:18 UTC (permalink / raw)
  To: Chris Mason, Dave Chinner, Chris Samuel, linux-btrfs

2012/1/7 Christian Brunner <chb@muc.de>:
> 2012/1/5 Chris Mason <chris.mason@oracle.com>:
>> On Fri, Jan 06, 2012 at 07:12:16AM +1100, Dave Chinner wrote:
>>> On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
>>> > On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
>>> > >
>>> > > Unfortunately, this one works for me.  I'll try it again and see if I
>>> > > can push harder.  If not, I'll see if I can trade beer for some
>>> > > diagnostic runs.
>>> >
>>> > Aha, if I try it just on the ssd instead of on my full array it triggers
>>> > at 88M files.  Great.
>>>
>>> Good to know.  The error that is generating the BUG on my machine is
>>> -28 (ENOSPC).  Given there's 17TB free on my filesystem....
>>
>> Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
>> not allocating more metadata chunks from the drive because of where the
>> allocation is happening, so it is just a check for "do we need a new
>> chunk" in the right place.
>>
>> I'll make sure it can fill my ssd and then send to you.
>
> Could you send the patch to the list (or to me), please? Telling from
> what you mentioned on IRC this sounds quite interesting and I would
> like to see if this solves my performance problems with ceph, too...

I apologize for bothering you again, but I would really like to give it a spin.

Thanks,
Christian

^ permalink raw reply	[flat|nested] 16+ messages in thread

Thread overview: 16+ messages
2012-01-04 21:44 [3.2-rc7] slowdown, warning + oops creating lots of files Dave Chinner
2012-01-04 22:11 ` Dave Chinner
2012-01-04 22:23   ` Chris Samuel
2012-01-04 23:01     ` Dave Chinner
2012-01-05  2:23       ` Liu Bo
2012-01-05  2:26         ` Dave Chinner
2012-01-05 19:11           ` Liu Bo
2012-01-05 11:43             ` Dave Chinner
2012-01-05 18:46       ` Chris Mason
2012-01-05 19:45         ` Chris Mason
2012-01-05 20:12           ` Dave Chinner
2012-01-05 21:02             ` Chris Mason
2012-01-05 21:24               ` Chris Samuel
2012-01-06  1:22                 ` Chris Mason
2012-01-07 21:34               ` Christian Brunner
2012-01-12 16:18                 ` Christian Brunner
