From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Pratt
Subject: Re: New experimental btrfs branch ready for testing
Date: Thu, 04 Jun 2009 14:02:20 -0500
Message-ID: <4A281A3C.6000006@austin.ibm.com>
References: <20090601210447.GC3890@think>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
To: Chris Mason, linux-btrfs@vger.kernel.org
In-Reply-To: <20090601210447.GC3890@think>

Chris Mason wrote:
> Hello everyone,
>
> Yan Zheng has been doing some major surgery to the back references and
> extent allocation code, tackling bottlenecks in the code that tracks
> extents. It scales better with many snapshots and performs better in
> the common case of no snapshots at all.
>
> THE NEW CODE IS A FORWARD ROLLING DISK FORMAT CHANGE. This means it is
> compatible with the current btrfs disk format, but once you mount a
> filesystem with the new code, it WILL NO LONGER BE MOUNTABLE FROM OLD
> KERNELS. Old kernels spit out an error message when you try them on new
> format filesystems.
>
> This is a large change, and I'm hoping to have it stable in time for the
> 2.6.31 merge window. I've been testing it for about a week now, and
> haven't been able to cause major problems yet. But, testing the
> compatibility with old format filesystems is the hard part, and
> everyone that pulls the new code should backup their data first.
>
> I've setup git branches called newformat where you can pull the new code.
>
> For the kernel (based on 2.6.30-rc7):
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git newformat
>

So I started the performance runs on this. The base tests completed fine
on the RAID system, and I will post results as soon as I can finish
postprocessing, but when I tried to run nodatacow on that machine it
crashed pretty early. Here is the console log:

btrfs2 kernel: [82057.882255] ------------[ cut here ]------------
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] invalid opcode: 0000 [#1] SMP
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] Stack:
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] ffff88011786d800 ffff8801259f6ea0 000000b21f256030 00000000000000e9
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] 000000352231b250 ffff880089abbf40 ffff88013d0e2440 0000000000000001
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] Call Trace:
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] [] run_one_delayed_ref+0x382/0x42f [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] [] ? map_extent_buffer+0xab/0xbe [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] [] run_clustered_refs+0x237/0x2b4 [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] [] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:47 2009 ...
btrfs2 kernel: [82057.882535] [] btrfs_run_delayed_refs+0xac/0x195 [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] __btrfs_end_transaction+0x59/0xfe [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] btrfs_end_transaction+0xb/0xd [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] btrfs_finish_ordered_io+0x224/0x24d [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] btrfs_writepage_end_io_hook+0x10/0x12 [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] end_bio_extent_writepage+0xa3/0x18f [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] ? del_timer_sync+0x14/0x20
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] bio_endio+0x26/0x28
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] end_workqueue_fn+0x111/0x11e [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] worker_loop+0x67/0x1ee [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] ? worker_loop+0x0/0x1ee [btrfs]
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] kthread+0x56/0x86
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] child_rip+0xa/0x20
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] ? kthread+0x0/0x86
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] [] ? child_rip+0x0/0x20
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...
btrfs2 kernel: [82057.882535] Code: 08 4c 8d 45 d4 41 8d 44 24 18 48 8b 73 20 48 8b 4d 18 41 b9 01 00 00 00 48 8b 7d b8 4c 89 ea 89 45 d4 e8 df e3 ff ff 85 c0 74 04 <0f> 0b eb fe 49 63 75 40 4d 8b 65 00 49 83 cf 01 4c 89 e7 48 6b
Message from syslogd@ at Thu Jun 4 08:02:48 2009 ...

I also ran this on the single disk system and it did not make it through
the base tests. The errors are different.

[101511.664497] Pid: 28597, comm: btrfs-transacti Tainted: G D 2.6.30-rc7-autokern1 #1 IBM x3950-[88726RU]-
[101511.675497] RIP: 0010:[] [] _spin_lock+0x14/0x1a
[101511.684494] RSP: 0018:ffff8801309bbb40 EFLAGS: 00000297
[101511.689494] RAX: 0000000000001514 RBX: ffff8801309bbb40 RCX: ffff8801309bbb40
[101511.697493] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800b7427d70
[101511.705491] RBP: ffffffff8020c50e R08: 0000000000000001 R09: ffff8801309bba68
[101511.713490] R10: ffff88012231b910 R11: ffff8800478ad5b0 R12: 0000001a00000032
[101511.721488] R13: ffffffffa04370b1 R14: ffff8801309bbb60 R15: 00000000000003bf
[101511.729486] FS: 0000000000000000(0000) GS:ffff88002bac0000(0000) knlGS:0000000000000000
[101511.738483] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[101511.744482] CR2: 00007fbcd3ff1b80 CR3: 0000000000201000 CR4: 00000000000006e0
[101511.752480] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[101511.760479] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[101511.768478] Call Trace:
[101511.771478] [] ? btrfs_try_spin_lock+0x1c/0x61 [btrfs]
[101511.778476] [] ? btrfs_search_slot+0x619/0x73e [btrfs]
[101511.786474] [] ? btrfs_insert_empty_items+0x5e/0xa9 [btrfs]
[101511.803472] [] ? alloc_reserved_file_extent+0x89/0x1c3 [btrfs]
[101511.811470] [] ? update_reserved_extents+0x98/0xab [btrfs]
[101511.819468] [] ? run_one_delayed_ref+0x382/0x42f [btrfs]
[101511.827467] [] ? cache_flusharray+0xa2/0xae
[101511.833466] [] ? run_clustered_refs+0x237/0x2b4 [btrfs]
[101511.840463] [] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
[101511.848462] [] ? thread_return+0x3e/0x91
[101511.854461] [] ? btrfs_run_delayed_refs+0xac/0x195 [btrfs]
[101511.862459] [] ? btrfs_commit_transaction+0x7b/0x69c [btrfs]
[101511.870458] [] ? autoremove_wake_function+0x0/0x38
[101511.877458] [] ? start_transaction+0x103/0x10f [btrfs]
[101511.885456] [] ? transaction_kthread+0x17f/0x20a [btrfs]
[101511.892453] [] ?
transaction_kthread+0x0/0x20a [btrfs]
[101511.900453] [] ? transaction_kthread+0x0/0x20a [btrfs]
[101511.907452] [] ? kthread+0x56/0x86
[101511.912450] [] ? child_rip+0xa/0x20
[101511.918449] [] ? kthread+0x0/0x86
[101511.923449] [] ? child_rip+0x0/0

[101536.249729] Pid: 28594, comm: btrfs-endio-wri Tainted: G D 2.6.30-rc7-autokern1 #1 IBM x3950-[88726RU]-
[101536.249729] RIP: 0010:[] [] _spin_lock+0x14/0x1a
[101536.249729] RSP: 0018:ffff88011a80da80 EFLAGS: 00000297
[101536.249729] RAX: 000000000000c6c2 RBX: ffff88011a80da80 RCX: 0000000000000000
[101536.249729] RDX: 0000000000000000 RSI: ffff88013d080000 RDI: ffff8800478ad6b0
[101536.249729] RBP: ffffffff8020c50e R08: 000000000000004c R09: 0000000000000001
[101536.249729] R10: 0000000000000008 R11: 0000000000086000 R12: ffff88011a80da40
[101536.249729] R13: ffff8800aa254800 R14: 0000000b470c7fff R15: ffff88011f256030
[101536.249729] FS: 0000000000000000(0000) GS:ffff88002ba30000(0000) knlGS:0000000000000000
[101536.249729] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[101536.249729] CR2: 000000000065b078 CR3: 0000000000201000 CR4: 00000000000006e0
[101536.249729] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[101536.249729] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[101536.249729] Call Trace:
[101536.249729] [] ? btrfs_tree_lock+0x54/0x9e [btrfs]
[101536.249729] [] ? btrfs_wake_function+0x0/0x10 [btrfs]
[101536.249729] [] ? btrfs_lock_root_node+0x1d/0x4b [btrfs]
[101536.249729] [] ? btrfs_search_slot+0xc7/0x73e [btrfs]
[101536.249729] [] ? btrfs_insert_empty_items+0x5e/0xa9 [btrfs]
[101536.249729] [] ? run_one_delayed_ref+0x164/0x42f [btrfs]
[101536.249729] [] ? run_clustered_refs+0x237/0x2b4 [btrfs]
[101536.249729] [] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
[101536.249729] [] ? btrfs_run_delayed_refs+0xac/0x195 [btrfs]
[101536.249729] [] ? __btrfs_end_transaction+0x59/0xfe [btrfs]
[101536.249729] [] ? btrfs_end_transaction+0xb/0xd [btrfs]
[101536.249729] [] ?
btrfs_finish_ordered_io+0x224/0x24d [btrfs]
[101536.249729] [] ? btrfs_writepage_end_io_hook+0x10/0x12 [btrfs]
[101536.249729] [] ? end_bio_extent_writepage+0xa3/0x18f [btrfs]
[101536.249729] [] ? del_timer_sync+0x14/0x20
[101536.249729] [] ? bio_endio+0x26/0x28
[101536.249729] [] ? end_workqueue_fn+0x111/0x11e [btrfs]
[101536.249729] [] ? worker_loop+0x67/0x1ee [btrfs]
:

> For the progs:
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git newformat
>

I should mention that I missed the part about the new user tools, so
while these were newly formatted filesystems, they were created with the
old tools. Both systems are running 64-bit. I plan to install the new
tools and re-run.

Steve

> The main benefit of the new code is that backrefs on the extent
> allocation tree use a fuzzier format. It basically means that we search
> for the key in the extent allocation tree instead of providing an exact
> backref to the parent block.
>
> This means we can predict how many blocks will be changed when changing
> the extent allocation tree, and it makes enospc much less complex. It
> is also significantly faster.
>
> For regular subvolume trees, a similar change is made as long as there
> are no snapshots against a given block. This is the common case, and it
> makes COW less expensive overall.
>
> Yan Zheng also worked out a way to free blocks during the transaction
> without needing to do an explicit snapshot deletion on the old root when
> the transaction was done. This gets rid of some complex caching code,
> and fixes worst-case problems where btrfs could take a very very long
> time to unmount.
>
> btrfs-vol -b is faster with the new code as well, he added caching of
> high levels in the tree to speed things up.
>
> (Many kudos to Yan Zheng for all of this work!)
>
> -chris
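
For readers following the quoted announcement: the "fuzzier" backref idea
(search for the key in the tree instead of recording an exact pointer to
the parent block) can be illustrated with a toy model. The sketch below is
not btrfs code and does not reflect the on-disk format; the names Leaf,
Tree, resolve_exact and resolve_fuzzy, and the block numbers, are all made
up for illustration. It only shows why a key-based reference can still be
resolved after a copy-on-write moves a block, whereas a reference that
stores the block number would have to be rewritten.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Leaf:
    blocknr: int   # hypothetical on-disk location of this leaf
    keys: list     # sorted keys held by the leaf


@dataclass
class Tree:
    leaves: list = field(default_factory=list)  # kept sorted by first key

    def search(self, key):
        """Return the leaf whose key range would contain `key`."""
        first_keys = [leaf.keys[0] for leaf in self.leaves]
        idx = bisect.bisect_right(first_keys, key) - 1
        return self.leaves[max(idx, 0)]

    def leaf_at(self, blocknr):
        """Return the leaf stored at `blocknr`, or None if nothing is there."""
        for leaf in self.leaves:
            if leaf.blocknr == blocknr:
                return leaf
        return None


def resolve_exact(tree, backref):
    # Exact style: the backref stores the parent's block number directly.
    return tree.leaf_at(backref["parent_blocknr"])


def resolve_fuzzy(tree, backref):
    # Fuzzy style: the backref stores only a key; the parent is found by search.
    return tree.search(backref["key"])


tree = Tree([Leaf(1000, [10, 20, 30]), Leaf(1001, [100, 150, 200])])
exact = {"parent_blocknr": 1001}
fuzzy = {"key": 150}

# Copy-on-write: the second leaf is rewritten at a new location.
tree.leaves[1].blocknr = 2001

stale = resolve_exact(tree, exact)   # nothing lives at block 1001 any more
found = resolve_fuzzy(tree, fuzzy)   # still found through the key search
print(stale, found.blocknr)
```

In this toy the exact reference goes stale as soon as the leaf moves,
while the key-based one needs no update. That is one way to read the
quoted remark that the number of changed blocks becomes predictable; the
real implementation may differ in every detail.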