From: Dave Chinner <david@fromorbit.com>
To: linux-btrfs@vger.kernel.org
Subject: [BUG 3.0-rc1] oops during file removal, severe lock contention
Date: Thu, 2 Jun 2011 11:11:39 +1000 [thread overview]
Message-ID: <20110602011139.GP561@dastard> (raw)
Hi Folks,
Running on 3.0-rc1 on an 8p/4G RAM VM with a 16TB filesystem (12
disk DM stripe) a 50 million inode 8-way fsmark creation workload
via:
$ /usr/bin/time ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 63 \
> -d /mnt/scratch/0 -d /mnt/scratch/1 \
> -d /mnt/scratch/2 -d /mnt/scratch/3 \
> -d /mnt/scratch/4 -d /mnt/scratch/5 \
> -d /mnt/scratch/6 -d /mnt/scratch/7
followed by an 8-way rm -rf on the result via:
$ for i in /mnt/scratch/*; do /usr/bin/time rm -rf $i 2>&1 & done
resulted in this oops:
[ 2671.052861] device fsid 84f7a99b2f193c6-c3228aae4c5a2f8a devid 1 transid 7 /dev/vda
[ 8626.879250] BUG: unable to handle kernel paging request at ffff880120000000
[ 8626.880020] IP: [<ffffffff81659a43>] chksum_update+0x23/0x50
[ 8626.880020] PGD 1ef2063 PUD 11fffa067 PMD 0
[ 8626.880020] Oops: 0000 [#1] SMP
[ 8626.880020] CPU 5
[ 8626.880020] Modules linked in:
[ 8626.880020]
[ 8626.880020] Pid: 3326, comm: btrfs-transacti Not tainted 3.0.0-rc1-dgc+ #1272 Bochs Bochs
[ 8626.880020] RIP: 0010:[<ffffffff81659a43>] [<ffffffff81659a43>] chksum_update+0x23/0x50
[ 8626.880020] RSP: 0018:ffff88010fba7a30 EFLAGS: 00010283
[ 8626.880020] RAX: 000000002dda3ac0 RBX: ffff88010fba7a50 RCX: ffff880120000001
[ 8626.880020] RDX: 000000000000009d RSI: ffff880120000000 RDI: ffff88010fba7a50
[ 8626.880020] RBP: ffff88010fba7a30 R08: ffff880217846000 R09: 0000000000001025
[ 8626.880020] R10: ffff88011affe0c0 R11: dead000000200200 R12: ffff88010fba7bd0
[ 8626.880020] R13: ffff880117846025 R14: 0000000000000000 R15: 0000000000000001
[ 8626.880020] FS: 0000000000000000(0000) GS:ffff88011fd40000(0000) knlGS:0000000000000000
[ 8626.880020] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8626.880020] CR2: ffff880120000000 CR3: 0000000001ef1000 CR4: 00000000000006e0
[ 8626.880020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8626.880020] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8626.880020] Process btrfs-transacti (pid: 3326, threadinfo ffff88010fba6000, task ffff88011affe0c0)
[ 8626.880020] Stack:
[ 8626.880020] ffff88010fba7a40 ffffffff81651688 ffff88010fba7a90 ffffffff81688967
[ 8626.880020] ffff880119b065c0 0000000800000000 ffff8801ffffffff 0000005005818de0
[ 8626.880020] ffff88010fba7bd0 ffff880105818de0 ffff88010fba7aa0 ffff8800cc6c63a0
[ 8626.880020] Call Trace:
[ 8626.880020] [<ffffffff81651688>] crypto_shash_update+0x18/0x30
[ 8626.880020] [<ffffffff81688967>] crc32c+0x47/0x60
[ 8626.880020] [<ffffffff8159d5e2>] btrfs_csum_data+0x12/0x20
[ 8626.880020] [<ffffffff815df671>] __btrfs_write_out_cache+0x601/0xc70
[ 8626.880020] [<ffffffff815abdc6>] ? __btrfs_prealloc_file_range+0x196/0x220
[ 8626.880020] [<ffffffff81aff90e>] ? _raw_spin_lock+0xe/0x20
[ 8626.880020] [<ffffffff815dfd42>] btrfs_write_out_ino_cache+0x62/0xb0
[ 8626.880020] [<ffffffff8159b33e>] btrfs_save_ino_cache+0x11e/0x210
[ 8626.880020] [<ffffffff815a291d>] commit_fs_roots+0xad/0x180
[ 8626.880020] [<ffffffff81afe77e>] ? mutex_lock+0x1e/0x50
[ 8626.880020] [<ffffffff8158056a>] ? btrfs_free_path+0x2a/0x40
[ 8626.880020] [<ffffffff815a3855>] btrfs_commit_transaction+0x375/0x7b0
[ 8626.880020] [<ffffffff810a40e0>] ? wake_up_bit+0x40/0x40
[ 8626.880020] [<ffffffff8159cf63>] transaction_kthread+0x293/0x2b0
[ 8626.880020] [<ffffffff8159ccd0>] ? btrfs_bio_wq_end_io+0x90/0x90
[ 8626.880020] [<ffffffff810a3b96>] kthread+0x96/0xa0
[ 8626.880020] [<ffffffff81b08724>] kernel_thread_helper+0x4/0x10
[ 8626.880020] [<ffffffff810a3b00>] ? kthread_worker_fn+0x190/0x190
[ 8626.880020] [<ffffffff81b08720>] ? gs_change+0x13/0x13
[ 8626.880020] Code: ea ff ff ff c9 c3 66 90 55 48 89 e5 66 66 66 66 90 8b 47 10 85 d2 74 2d 48 8d 4e 01 44 8d 42 ff 4e 8d 04 01 eb 05 66 90 48 ff c1 <0f> b6 16 48 89
[ 8626.880020] RIP [<ffffffff81659a43>] chksum_update+0x23/0x50
[ 8626.880020] RSP <ffff88010fba7a30>
[ 8626.880020] CR2: ffff880120000000
[ 8626.880020] ---[ end trace dad2f8b74a28cc71 ]---
Also, there is massive lock contention while running these workloads.
perf top shows this for the create after about 5m inodes have been
created:
samples pcnt function DSO
_______ _____ _____________________________ _________________
20626.00 25.6% __ticket_spin_lock [kernel.kallsyms]
5148.00 6.4% _raw_spin_unlock_irqrestore [kernel.kallsyms]
3769.00 4.7% test_range_bit [kernel.kallsyms]
2239.00 2.8% chksum_update [kernel.kallsyms]
2143.00 2.7% finish_task_switch [kernel.kallsyms]
1912.00 2.4% inode_tree_add [kernel.kallsyms]
1825.00 2.3% radix_tree_lookup [kernel.kallsyms]
1449.00 1.8% generic_bin_search [kernel.kallsyms]
1205.00 1.5% btrfs_search_slot [kernel.kallsyms]
1198.00 1.5% btrfs_tree_lock [kernel.kallsyms]
1104.00 1.4% mutex_spin_on_owner [kernel.kallsyms]
1023.00 1.3% kmem_cache_alloc [kernel.kallsyms]
1016.00 1.3% map_private_extent_buffer [kernel.kallsyms]
931.00 1.2% verify_parent_transid [kernel.kallsyms]
895.00 1.1% find_extent_buffer [kernel.kallsyms]
785.00 1.0% kmem_cache_free [kernel.kallsyms]
778.00 1.0% memmove [kernel.kallsyms]
745.00 0.9% __d_lookup_rcu [kernel.kallsyms]
697.00 0.9% mark_page_accessed [kernel.kallsyms]
638.00 0.8% btrfs_try_spin_lock [kernel.kallsyms]
600.00 0.7% map_extent_buffer [kernel.kallsyms]
570.00 0.7% btrfs_item_offset [kernel.kallsyms]
539.00 0.7% __memcpy [kernel.kallsyms]
507.00 0.6% __ticket_spin_unlock [kernel.kallsyms]
500.00 0.6% link_path_walk [kernel.kallsyms]
457.00 0.6% __do_softirq [kernel.kallsyms]
442.00 0.5% __d_lookup [kernel.kallsyms]
438.00 0.5% free_extent_buffer [kernel.kallsyms]
430.00 0.5% block_group_cache_tree_search [kernel.kallsyms]
428.00 0.5% pvclock_clocksource_read [kernel.kallsyms]
By 10 million inodes:
25928.00 32.0% __ticket_spin_lock [kernel.kallsyms]
3715.00 4.6% test_range_bit [kernel.kallsyms]
3681.00 4.5% _raw_spin_unlock_irqrestore [kernel.kallsyms]
And it kept getting worse all the way to 50m inodes where
__ticket_spin_lock was taking ~60% of the CPU time. While the
unlinking was running, I was seeing similar levels of lock
contention.
FYI, performance on this create workload my test box is roughly 45k
creates/s for btrfs, 75k creates/s for ext4 and 110k create/s for
XFS. btrfs is without doubt being slowed down by the lock contention
problems....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next reply other threads:[~2011-06-02 1:11 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-02 1:11 Dave Chinner [this message]
2011-06-02 19:46 ` [BUG 3.0-rc1] oops during file removal, severe lock contention Andi Kleen
2011-06-02 22:21 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110602011139.GP561@dastard \
--to=david@fromorbit.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).