From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cn.fujitsu.com ([222.73.24.84]:35523 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1751268Ab2HCBya (ORCPT ); Thu, 2 Aug 2012 21:54:30 -0400 Message-ID: <501B2F20.7080903@cn.fujitsu.com> Date: Fri, 03 Aug 2012 09:53:36 +0800 From: Miao Xie Reply-To: miaox@cn.fujitsu.com MIME-Version: 1.0 To: Linux Btrfs , Josef Bacik , Wu Fengguang , Seto Hidetoshi , David Sterba Subject: Re: [PATCH V3 2/2] Btrfs: fix the snapshot that should not exist References: <501A04CE.9090408@cn.fujitsu.com> <20120802114631.GM17430@twin.jikos.cz> In-Reply-To: <20120802114631.GM17430@twin.jikos.cz> Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Thu, 2 Aug 2012 13:46:31 +0200, David Sterba wrote: > Hi, > > appologies for late reply, > > On Thu, Aug 02, 2012 at 12:40:46PM +0800, Miao Xie wrote: >> Changelog v1 -> v2: >> - add comment to explain why we need deal with the delayed items after >> snapshot creation and why this operation do not corrupt the metadata. > > I'm sorry, the comment did not fix the bug :) > > The subvol stress is able to hit this: > > [ 2360.444321] ------------[ cut here ]------------ > [ 2360.448019] kernel BUG at fs/btrfs/extent-tree.c:6047! > [ 2360.448019] invalid opcode: 0000 [#1] SMP > [ 2360.448019] CPU 0 > [ 2360.448019] Modules linked in: btrfs aoe [last unloaded: btrfs] > [ 2360.448019] > [ 2360.448019] Pid: 8212, comm: btrfs Not tainted 3.5.0-default+ #170 Intel Corporation Santa Rosa platform/Matanzas > [ 2360.448019] RIP: 0010:[] [] run_clustered_refs+0xa11/0xa20 [btrfs] > [ 2360.448019] RSP: 0018:ffff88003eca1a68 EFLAGS: 00010246 > [ 2360.448019] RAX: 00000000000007ff RBX: ffff880017a694c8 RCX: ffff88003eca1a08 > [ 2360.448019] RDX: ffff880028aa9000 RSI: 00000000000007fe RDI: ffff880064223cf0 > [ 2360.448019] RBP: ffff88003eca1b48 R08: 00000000000007ff R09: ffff88003eca19f8 > [ 2360.448019] R10: ffff88002435d1e8 R11: 0000000000000000 R12: ffff880025d66d28 > [ 2360.448019] R13: ffff880038640000 R14: ffff8800778dfa88 R15: ffff880060f010d0 > [ 2360.448019] FS: 00007f3289f35740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 > [ 2360.448019] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 2360.448019] CR2: ffffffffff600400 CR3: 000000002e112000 CR4: 00000000000007f0 > [ 2360.448019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 2360.448019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 2360.448019] Process btrfs (pid: 8212, threadinfo ffff88003eca0000, task ffff88001d834200) > [ 2360.448019] Stack: > [ 2360.448019] 0000000000000000 0000000000000000 0000000000000001 0000000000000000 > [ 2360.448019] 00000000000007ed ffff88002435d1e8 000000003eca1b18 0000000000000000 > [ 2360.448019] 0000000000000770 0000000000000000 000000005cb1e000 ffff88003eca1c08 > [ 2360.448019] Call Trace: > [ 2360.448019] [] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs] > [ 2360.448019] [] ? trace_hardirqs_on_caller+0x155/0x1d0 > [ 2360.448019] [] ? btrfs_free_path+0x2a/0x40 [btrfs] > [ 2360.448019] [] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs] > [ 2360.448019] [] btrfs_commit_transaction+0x605/0xb00 [btrfs] > [ 2360.448019] [] ? lock_release_holdtime+0x3d/0x1c0 > [ 2360.448019] [] ? btrfs_mksubvol+0x298/0x360 [btrfs] > [ 2360.448019] [] ? wake_up_bit+0x40/0x40 > [ 2360.448019] [] ? do_raw_spin_unlock+0x5e/0xb0 > [ 2360.448019] [] btrfs_mksubvol+0x358/0x360 [btrfs] > [ 2360.448019] [] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs] > [ 2360.448019] [] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs] > [ 2360.448019] [] btrfs_ioctl+0x48e/0x1340 [btrfs] > [ 2360.448019] [] ? do_page_fault+0x2d0/0x580 > [ 2360.448019] [] ? _raw_spin_unlock_irq+0x30/0x50 > [ 2360.448019] [] ? finish_task_switch+0x83/0xf0 > [ 2360.448019] [] do_vfs_ioctl+0x98/0x560 > [ 2360.448019] [] ? retint_swapgs+0x13/0x1b > [ 2360.448019] [] sys_ioctl+0x4f/0x80 > [ 2360.448019] [] system_call_fastpath+0x16/0x1b > [ 2360.448019] Code: 8b 76 40 48 89 d7 48 89 55 a0 e8 2b 74 ff ff 83 f8 17 0f 87 1e ff ff ff 0f 0b 80 fa b2 0f 84 b4 f8 ff ff 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 > [ 2360.448019] RIP [] run_clustered_refs+0xa11/0xa20 [btrfs] > [ 2360.448019] RSP > [ 2360.814508] ---[ end trace 555a16cac3620ccb ]--- > [ 2360.820398] note: btrfs[8212] exited with preempt_count 1 > [ 2360.827072] BUG: sleeping function called from invalid context at kernel/rwsem.c:20 > [ 2360.836047] in_atomic(): 1, irqs_disabled(): 0, pid: 8212, name: btrfs > [ 2360.843859] INFO: lockdep is turned off. > [ 2360.849021] Pid: 8212, comm: btrfs Tainted: G D 3.5.0-default+ #170 > [ 2360.849022] Call Trace: > [ 2360.849027] [] __might_sleep+0xfc/0x130 > [ 2360.849030] [] down_read+0x26/0xa0 > [ 2360.849034] [] acct_collect+0x4b/0x1b0 > [ 2360.849038] [] do_exit+0x718/0x9a0 > [ 2360.849041] [] ? kmsg_dump+0x26/0x140 > [ 2360.849043] [] oops_end+0xb0/0xf0 > [ 2360.849046] [] die+0x5b/0x90 > [ 2360.849048] [] do_trap+0xc4/0x170 > [ 2360.849052] [] do_invalid_op+0x95/0xb0 > [ 2360.849067] [] ? run_clustered_refs+0xa11/0xa20 [btrfs] > [ 2360.849071] [] ? trace_hardirqs_off_thunk+0x3a/0x3c > [ 2360.849073] [] ? restore_args+0x30/0x30 > [ 2360.849076] [] invalid_op+0x1b/0x20 > [ 2360.849087] [] ? run_clustered_refs+0xa11/0xa20 [btrfs] > [ 2360.849097] [] ? run_clustered_refs+0x69b/0xa20 [btrfs] > [ 2360.849108] [] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs] > [ 2360.849110] [] ? trace_hardirqs_on_caller+0x155/0x1d0 > [ 2360.849119] [] ? btrfs_free_path+0x2a/0x40 [btrfs] > [ 2360.849133] [] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs] > [ 2360.849145] [] btrfs_commit_transaction+0x605/0xb00 [btrfs] > [ 2360.849148] [] ? lock_release_holdtime+0x3d/0x1c0 > [ 2360.849161] [] ? btrfs_mksubvol+0x298/0x360 [btrfs] > [ 2360.849164] [] ? wake_up_bit+0x40/0x40 > [ 2360.849166] [] ? do_raw_spin_unlock+0x5e/0xb0 > [ 2360.849180] [] btrfs_mksubvol+0x358/0x360 [btrfs] > [ 2360.849194] [] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs] > [ 2360.849207] [] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs] > [ 2360.849221] [] btrfs_ioctl+0x48e/0x1340 [btrfs] > [ 2360.849224] [] ? do_page_fault+0x2d0/0x580 > [ 2360.849226] [] ? _raw_spin_unlock_irq+0x30/0x50 > [ 2360.849229] [] ? finish_task_switch+0x83/0xf0 > [ 2360.849231] [] do_vfs_ioctl+0x98/0x560 > [ 2360.849234] [] ? retint_swapgs+0x13/0x1b > [ 2360.849236] [] sys_ioctl+0x4f/0x80 > [ 2360.849239] [] system_call_fastpath+0x16/0x1b > [ 2360.849255] BUG: scheduling while atomic: btrfs/8212/0x10000002 > [ 2360.849256] INFO: lockdep is turned off. > [ 2360.849257] Modules linked in: btrfs aoe [last unloaded: btrfs] > [ 2360.849261] Pid: 8212, comm: btrfs Tainted: G D 3.5.0-default+ #170 > [ 2360.849262] Call Trace: > [ 2360.849262] [] __schedule_bug+0x68/0x90 > [ 2360.849265] [] __schedule+0x73c/0x810 > [ 2360.849268] [] __cond_resched+0x2a/0x40 > [ 2360.849270] [] _cond_resched+0x31/0x40 > [ 2360.849273] [] unmap_single_vma+0x493/0x750 > [ 2360.849276] [] ? lru_deactivate_fn+0x1e0/0x1e0 > [ 2360.849279] [] ? trace_hardirqs_on_caller+0x20/0x1d0 > [ 2360.849281] [] unmap_vmas+0x3c/0x60 > [ 2360.849284] [] exit_mmap+0x81/0x140 > [ 2360.849287] [] mmput+0x74/0x130 > [ 2360.849289] [] exit_mm+0x100/0x120 > [ 2360.849292] [] do_exit+0x738/0x9a0 > [ 2360.849294] [] ? kmsg_dump+0x26/0x140 > [ 2360.849297] [] oops_end+0xb0/0xf0 > [ 2360.849299] [] die+0x5b/0x90 > [ 2360.849301] [] do_trap+0xc4/0x170 > [ 2360.849304] [] do_invalid_op+0x95/0xb0 > [ 2360.849307] [] ? run_clustered_refs+0xa11/0xa20 [btrfs] > [ 2360.849317] [] ? trace_hardirqs_off_thunk+0x3a/0x3c > [ 2360.849320] [] ? restore_args+0x30/0x30 > [ 2360.849322] [] invalid_op+0x1b/0x20 > [ 2360.849325] [] ? run_clustered_refs+0xa11/0xa20 [btrfs] > [ 2360.849335] [] ? run_clustered_refs+0x69b/0xa20 [btrfs] > [ 2360.849346] [] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs] > [ 2360.849356] [] ? trace_hardirqs_on_caller+0x155/0x1d0 > [ 2360.849358] [] ? btrfs_free_path+0x2a/0x40 [btrfs] > [ 2360.849367] [] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs] > [ 2360.849380] [] btrfs_commit_transaction+0x605/0xb00 [btrfs] > [ 2360.849393] [] ? lock_release_holdtime+0x3d/0x1c0 > [ 2360.849395] [] ? btrfs_mksubvol+0x298/0x360 [btrfs] > [ 2360.849409] [] ? wake_up_bit+0x40/0x40 > [ 2360.849411] [] ? do_raw_spin_unlock+0x5e/0xb0 > [ 2360.849413] [] btrfs_mksubvol+0x358/0x360 [btrfs] > [ 2360.849427] [] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs] > [ 2360.849441] [] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs] > [ 2360.849455] [] btrfs_ioctl+0x48e/0x1340 [btrfs] > [ 2360.849469] [] ? do_page_fault+0x2d0/0x580 > [ 2360.849471] [] ? _raw_spin_unlock_irq+0x30/0x50 > [ 2360.849473] [] ? finish_task_switch+0x83/0xf0 > [ 2360.849476] [] do_vfs_ioctl+0x98/0x560 > [ 2360.849478] [] ? retint_swapgs+0x13/0x1b > [ 2360.849481] [] sys_ioctl+0x4f/0x80 > [ 2360.849483] [] system_call_fastpath+0x16/0x1b > > fs/btrfs/extent-tree.c:6047 > > 6046 if (parent > 0) { > 6047 BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)); > 6048 btrfs_set_extent_inline_ref_type(leaf, iref, > 6049 BTRFS_SHARED_BLOCK_REF_KEY); > 6050 btrfs_set_extent_inline_ref_offset(leaf, iref, parent); > 6051 } else { > 6052 btrfs_set_extent_inline_ref_type(leaf, iref, > 6053 BTRFS_TREE_BLOCK_REF_KEY); > 6054 btrfs_set_extent_inline_ref_offset(leaf, iref, root_objectid); > 6055 } This bug is similar to the one which is reported by Daniel J Blueman a month ago. And Josef have fixed it, but the patch has not been merged into for-linus branch till now. Did you applied that patch? > > Currently for-linux hangs early during the test, so I applied V3 patches on top > of 3.5. > > The filesystem is freshly created, the load is to simultaneously unpack large tar, > snapshot the fs, delete random snapshot, looped rm of the untarred dir. Crashes after > some minutes, reliably. Could you send the test tool to me? I want to look into it. Thanks Miao > > Fsck spits lots of errors: > > ref mismatch on [1133031424 4096] extent item 1, found 0 > Backref 1133031424 root 5 not referenced back 0x7d1f40 > Incorrect global backref count on 1133031424 found 1 wanted 0 > backpointer mismatch on [1133031424 4096] > owner ref check failed [1133031424 4096] > > ref mismatch on [11213131776 16384] extent item 1, found 0 > Incorrect local backref count on 11213131776 root 5 owner 34509 offset 0 found 0 wanted 1 back 0x1424d8e0 > backpointer mismatch on [11213131776 16384] > owner ref check failed [11213131776 16384] > > fs tree 260 refs 6 not found > unresolved ref root 263 dir 256 index 4 namelen 14 name snap2748615355 error 600 > unresolved ref root 267 dir 256 index 4 namelen 14 name snap2748615355 error 600 > unresolved ref root 269 dir 256 index 4 namelen 14 name snap2748615355 error 600 > unresolved ref root 273 dir 256 index 4 namelen 14 name snap2748615355 error 600 > unresolved ref root 274 dir 256 index 4 namelen 14 name snap2748615355 error 600 > unresolved ref root 276 dir 256 index 4 namelen 14 name snap2748615355 error 600 > > > I've asked Josef to pull those patches out of btrfs-next, feel free to send me any testing > version if you can't reproduce it on your side. > > > david >