From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from magic.merlins.org ([209.81.13.136]:60930 "EHLO mail1.merlins.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751913AbcAYUzI (ORCPT ); Mon, 25 Jan 2016 15:55:08 -0500
Date: Mon, 25 Jan 2016 12:55:00 -0800
From: Marc MERLIN
To: Qu Wenruo
Cc: David Sterba, Btrfs mailing list
Message-ID: <20160125205500.GK23751@merlins.org>
References: <20160123170354.GA10113@merlins.org> <56A57C59.1040203@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <56A57C59.1040203@cn.fujitsu.com>
Subject: Re: BTRFS: bdev /dev/mapper/dshelf1 errs: wr 2970, rd 848, flush 0, corrupt 189, gen 0
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Mon, Jan 25, 2016 at 09:37:29AM +0800, Qu Wenruo wrote:
> >+David, +Qu
> >about
> >1) kernel crash on BUG_ON
>
> From your code mentioned, and your second kernel warning, it's out
> of memory.
> Such case also happened when I was debugging in-band de-dup patches.

Right. So it's obviously a bug, since it's on a lightly loaded server with
8GB of RAM, and this only started happening after my FS started having
problems.

> Things seems that by some method, btrfs used a lot of memory for
> dirty page caches. Maybe metadata pages.
>
> Normally when such case happens, VFS should trigger a sync to free
> dirty pages, but btrfs seems to either delayed the sync due to
> running trans or the VFS sync is already too late.

Oh, I see.

> But it's also possible that large leafsize is related to such problem.
> The larger leafsize is, the harder to alloc continuous memory for kmalloc().

So basically, we seem to understand how we get there, but not quite why,
or how to fix it, correct?

> If you're using old version btrfsck, then it's possible such error
> is a false alert. Update btrfsck and try again is a good idea.

I had 4.3 as the latest in debian unstable, but now I see 4.4 just came
out, so I installed it.

> Even if it's not a false alert, mail list says it shouldn't cause
> huge problem, only known problems happens is related to scrub.
> And there is already some user reporting balance can fix it,
> although you need to balance all chunks.

Thanks for that tip.

> >3) say more about "root 45948 inode 204452 errors 1000, some csum missing",
> >that they aren't being fixed, and whether they're a big deal or not.
>
> Personally speaking, I didn't consider it as a big problem itself.
> If csum is missing/corrupted, btrfsck --init-csum-tree can rebuild it.

Any idea why check --repair isn't fixing them too? Is that expected?

gargamel:~# btrfs --version
btrfs-progs v4.4
gargamel:~# btrfs check --repair --init-csum-tree -p /dev/mapper/dshelf1 2>&1 | tee check7
Reinit crc root
crc refilling failed  <<<< is that bad?
enabling repair mode
Creating a new CRC tree
Checking filesystem on /dev/mapper/dshelf1
UUID: 6358304a-2234-4243-b02d-4944c9af47d7

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901