From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from magic.merlins.org ([209.81.13.136]:54174 "EHLO mail1.merlins.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754668AbcARXkO (ORCPT ); Mon, 18 Jan 2016 18:40:14 -0500
Date: Mon, 18 Jan 2016 15:39:43 -0800
From: Marc MERLIN
To: Duncan <1i5t5.duncan@cox.net>, Hugo Mills , Btrfs mailing list
Subject: Re: BTRFS: bdev /dev/mapper/dshelf1 errs: wr 2970, rd 848, flush 0, corrupt 189, gen 0
Message-ID: <20160118233943.GF17679@merlins.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20160118124533.GU422@carfax.org.uk>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

On Mon, Jan 18, 2016 at 03:21:53AM +0000, Duncan wrote:
> No. Those are monotonically increasing counts that are never
> automatically reset over the life of the filesystem. Use ...
>
> btrfs dev stats
>
> ... to display them on the commandline (vs. the kernel log at filesystem
> mount), with the -z option to reset them after display, if desired.

It's a bit counterintuitive that check --repair doesn't reset the error
counts after fixing the underlying errors, but maybe there is a good
reason :)

btrfs dev stats -z /dev/mapper/dshelf1
put everything back to 0 as you hinted, so I can now watch what's going
on from here on. Thanks.

On Mon, Jan 18, 2016 at 12:45:33PM +0000, Hugo Mills wrote:
> > bad extent [8697338122240, 8697338126336), type mismatch with chunk
> > bad extent [8697338126336, 8697338130432), type mismatch with chunk
> > bad extent [8697338130432, 8697338134528), type mismatch with chunk
>
> This is, I think, a symptom of an FS created with a broken
> mkfs.btrfs, and it needs to be re-created. Take a look for that error
> message in the mailing list archives -- there's been a few posts about
> it in the last couple of months.

Thanks for that other hint. The filesystem was created quite a while ago
(over a year), and it ran fine until I had an unexpected crash.
If I have to rebuild it, I will, but that will take 2+ days due to the
size and the time needed to get the backup back (also, I'm away from home
for a week, so restoring backups remotely isn't going to be fun).

But sure enough, although it ran perfectly fine for a long time, since
check --repair my machine has been crashing every hour or so, with the
crash below.
I had to run an older kernel because of a separate problem where PMP
(SATA port multiplier) support was broken in more recent kernels.
I've just put 4.3.3 on it to see if it still crashes, but either way, I
thought btrfs had gotten rid of its last BUG_ON(xxx). The system should
never crash because of unexpected filesystem state on a non-system
partition.

Mmmh, scratch that, here's the BUG_ON; it's an out-of-memory problem :(
fs/btrfs/ctree.c:5200
	/* save our key for returning back */
	btrfs_node_key_to_cpu(cur, &found_key, slot);
	path->slots[level] = slot;
	if (level == path->lowest_level) {
		ret = 0;
		goto out;
	}
	btrfs_set_path_blocking(path);
	cur = read_node_slot(root, cur, slot);
	BUG_ON(!cur); /* -ENOMEM */

Did check --repair potentially change my FS in a way that now makes the
kernel use a lot of RAM and crash?

------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.c:5200!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 29041 Comm: btrfs Tainted: G W 4.2.5-amd64-i915-volpreempt-20150421 #4
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
task: ffff88002f8482c0 ti: ffff8800c9058000 task.ti: ffff8800c9058000
RIP: 0010:[] [] btrfs_search_forward+0x1a6/0x24b
RSP: 0018:ffff8800c905bbc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801494409b0 RCX: 000000000003c6c8
RDX: 0000000000000000 RSI: 000000000aa60000 RDI: fffffffffffffffb
RBP: ffff8800c905bc38 R08: 0000000001c00000 R09: 0000000000000000
R10: 0000000000aaaaaa R11: ffffffff818799e0 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801494409b4
FS: 00007fa9d46848c0(0000) GS:ffff88021e2c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f77cd000 CR3: 0000000143780000 CR4: 00000000001406e0
Stack:
 ffff8800c9bb6050 0001880149440a18 ffff8800c905bc87 ffff8800c939a800
 0000000000000003 1a0000000000002e 84000000000000b3 000000000003c59e
 000000000000002c ffff8801494409b0 0000000000000000 ffff8800c905bce0
Call Trace:
 [] search_ioctl+0xfc/0x167
 [] btrfs_ioctl_tree_search+0x5b/0x8e
 [] btrfs_ioctl+0x396/0x232f
 [] ? get_page+0xe/0x28
 [] ? __lru_cache_add+0x23/0x44
 [] ? lru_cache_add_active_or_unevictable+0x2d/0x6b
 [] ? set_pte_at+0x9/0xd
 [] ? handle_mm_fault+0x90e/0xe9b
 [] ? paravirt_write_msr+0xf/0x13
 [] do_vfs_ioctl+0x39b/0x412
 [] ? current_kernel_time+0xe/0x32
 [] ? __audit_syscall_entry+0xbe/0xe0
 [] SyS_ioctl+0x5a/0x7f
 [] entry_SYSCALL_64_fastpath+0x16/0x75
Code: 89 47 40 44 3b ab 84 00 00 00 74 69 48 89 df e8 f7 a3 ff ff 8b 55 b8 48 8b 7d a8 4c 89 e6 e8 d8 86 ff ff 48 85 c0 <0f> 0b 48 89 c7 e8 23 a7 04 00 41 8d 45 ff 41 c7 47 5c 02 00 00
RIP [] btrfs_search_forward+0x1a6/0x24b
 RSP
---[ end trace e3c37adcaa703f63 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
drm_kms_helper: panic occurred, switching back to text console

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901