From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-btrfs-owner@vger.kernel.org>
Received: from mail-ea0-f174.google.com ([209.85.215.174]:51271 "EHLO
	mail-ea0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754467Ab2JNUWs (ORCPT
	<rfc822;linux-btrfs@vger.kernel.org>);
	Sun, 14 Oct 2012 16:22:48 -0400
Received: by mail-ea0-f174.google.com with SMTP id c13so1001713eaa.19
        for <linux-btrfs@vger.kernel.org>; Sun, 14 Oct 2012 13:22:47 -0700 (PDT)
Message-ID: <507B1F2B.1030803@gmail.com>
Date: Sun, 14 Oct 2012 22:23:07 +0200
From: Goffredo Baroncelli <kreijack@gmail.com>
MIME-Version: 1.0
To: Tommy Pettersson <ptp@lysator.liu.se>
CC: linux-btrfs@vger.kernel.org
Subject: Re: btrfs suddenly lost all of my huge free space
References: <20121014001912.GA1247@fruity> <507AE44E.4040008@gmail.com> <20121014183503.GA7616@fruity>
In-Reply-To: <20121014183503.GA7616@fruity>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: <linux-btrfs.vger.kernel.org>

On 2012-10-14 20:35, Tommy Pettersson wrote:
> The problem has been resolved, but I think it will be impossible
> to figure out what went wrong. The root cause was I accidentally
> messed up my initrd so that btrfs was mounted without prior dev
> scan (which I think didn't work with earlier kernels, but now
> (3.4.9-gentoo) it "worked" in a very bad way it seems), and
> possibly that I also mounted subvolid=0 (containing the subvol I
> previously mounted as / ) with conflicting mount options for
> space_cache.

This is very strange behaviour; I am not aware of any bug that could
explain it.
>
> But after I had realized and fixed that, it was too late. Both
> Scrub and Balance, and reading from the filesystem, behaved
> strangely. The output of df jumped between 95 % and 12 %, while I
> got many lines about wrong checksums, unexpected tree parent
> generation something, and free space inode generation (0) did
> not match free space cache. It sometimes said it corrected
> things, but it didn't seem to help, and at random points I would
> get a kernel panic.
>
> # uname -a
> Linux fruit64 3.4.9-gentoo #2 SMP PREEMPT Sat Sep 1 17:34:38 CEST 2012 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux

3.4.9 is quite an old kernel. I wonder whether a recent kernel would
still behave as you described.

>
> # btrfs --version
> Btrfs Btrfs v0.19
>
> It would have been nice to debug this mess so that btrfs could
> handle it in the future, and not do all the strange things with
> the free space and cause kernel panics, but I had to get my
> system back up.
>
> The good news is that even this torture of my bits didn't
> actually kill them. I eventually cleared the btrfs master record
> on one of the disks, mounted in degraded mode, added it back,
> waited seven hours for balance to finish, and now my filesystem
> is consistent again, and everything is back to normal. So no
> need to restore from my daily backup yet. :-)

Good!
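
For the record, the Use% jumps you saw can be reproduced from the df
columns alone. As far as I know, GNU df computes Use% as
used / (used + avail), rounded up, and does not use the Size column.
The sketch below uses the figures from your two df outputs. The third
call is only my assumption of what a "healthy" Avail would be (raw
Size minus Used); df_use_percent is a name made up for this example.

```python
import math


def df_use_percent(used_gib: float, avail_gib: float) -> int:
    """Use% as df derives it: used / (used + avail), rounded up."""
    return math.ceil(100.0 * used_gib / (used_gib + avail_gib))


# Figures copied from the two "df -h" outputs quoted in this thread.
print(df_use_percent(426, 134))  # 77, first report
print(df_use_percent(403, 25))   # 95, after the aborted balance

# Assumption: a sane Avail would be roughly the raw size of both
# 1.82TB devices minus Used. This is a guess, not a measured value.
raw_size_gib = 2 * 1.82 * 1024
print(df_use_percent(426, raw_size_gib - 426))  # 12
```

So Used barely changed between the reports; only Avail collapsed, and
that alone is enough to move Use% from about 12 to 77 and then 95.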

>
>
> Regards,
> Tommy
>
>
> On Sun, Oct 14, 2012 at 06:11:58PM +0200, Goffredo Baroncelli wrote:
>> Hi,
>>
>> did you use the latest kernel version?
>> The other thing you could try is a scrub, to look for a defective
>> page, but I don't think that is the cause.
>>
>> BR
>> G.Baroncelli
>>
>>
>>
>> On 2012-10-14 02:19, Tommy Pettersson wrote:
>>> Hi,
>>>
>>> (I'm not subscribed to the list, so please CC me.)
>>>
>>> I have a btrfs with raid1 on two identical unpartitioned disks.
>>> Today I noticed that df (normal df) said I am 77 % full. This
>>> was a shock, because since forever it has been around 12 %.
>>>
>>>
>>> # btrfs fi show
>>> Label: 'green'  uuid: dd83031c-2447-4736-a8f6-9bd9cdeea879
>>>           Total devices 2 FS bytes used 212.88GB
>>>           devid    2 size 1.82TB used 356.04GB path /dev/sdb
>>>           devid    1 size 1.82TB used 356.06GB path /dev/sda
>>>
>>> # btrfs fi df /
>>> Data, RAID1: total=276.00GB, used=209.02GB
>>> Data: total=8.00M, used=0.00
>>> System, RAID1: total=40.00MB, used=64.00KB
>>> System: total=4.00MB, used=0.00
>>> Metadata, RAID1: total=80.00GB, used=3.88GB
>>> Metadata: total=8.00MB, used=0.00
>>>
>>> # df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> rootfs          3.7T  426G   134G  77% /
>>>
>>>
>>> The thing that has drastically changed is Avail in the output
>>> from df.
>>>
>>> I tried a btrfs balance, which self-aborted after some hours
>>> with No space left on device. I deleted two snapshots, so I got
>>> some free space and could use the system again.
>>>
>>> The balance, although it didn't finish, seems to have reduced
>>> the used space, but it also reduced the "available" space:
>>>
>>>
>>> # btrfs fi show
>>> Label: 'green'  uuid: dd83031c-2447-4736-a8f6-9bd9cdeea879
>>>           Total devices 2 FS bytes used 212.88GB
>>>           devid    2 size 1.82TB used 356.04GB path /dev/sdb
>>>           devid    1 size 1.82TB used 215.01GB path /dev/sda
>>>
>>> # btrfs fi df /
>>> Data, RAID1: total=210.00GB, used=197.97GB
>>> System, RAID1: total=8.00MB, used=44.00KB
>>> System: total=4.00MB, used=0.00
>>> Metadata, RAID1: total=5.00GB, used=3.41GB
>>>
>>> # df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> rootfs          3.7T  403G   25G  95% /
>>>
>>>
>>> I made an unqualified guess that the space cache was corrupted,
>>> and tried to mount with the options clear_cache and nospace_cache.
>>> Both of them caused btrfs to scan my disks for a couple of
>>> minutes at boot, but the amount of available space did not
>>> improve.
>>>
>>> What can I do to help locate the cause of this problem?
>>>
>>>
>>> Regards,
>>> Tommy
>>>
>>
>
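
P.S. A small cross-check on the Used column in the quoted df output.
The numbers are consistent with df counting both RAID1 copies: twice
the sum of the "used" values from "btrfs fi df" lands on the df
figure once rounded up, as "df -h" does. This is only an inference
from the numbers in this thread, not a statement about how the kernel
fills in statfs; raw_used_gib is a name made up for this sketch.

```python
import math

KIB_PER_GIB = 1024 * 1024


def raw_used_gib(data_gib, system_kib, metadata_gib, copies=2):
    """Sum the 'used' values of 'btrfs fi df' and count every copy."""
    logical = data_gib + system_kib / KIB_PER_GIB + metadata_gib
    return copies * logical


# "used" values from the two "btrfs fi df /" outputs quoted above.
first = raw_used_gib(209.02, 64.0, 3.88)
second = raw_used_gib(197.97, 44.0, 3.41)

# "df -h" rounds up to the displayed unit.
print(math.ceil(first))   # 426, df showed 426G
print(math.ceil(second))  # 403, df showed 403G
```

So the Used figure looks trustworthy in both reports, which again
points at the Avail estimate as the part that went wrong.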