From: Marc Joliet <marcec@gmx.de>
To: linux-btrfs@vger.kernel.org
Subject: ENOSPC errors during balance
Date: Sat, 19 Jul 2014 17:26:05 +0200
Message-ID: <20140719172605.445e8445@marcec>


Hello all

I'm a somewhat new btrfs user, and have recently finalized the conversion of my
desktop to btrfs with the 1TB backup partition on my 3TB USB3 drive.

The concrete issue is that I have seen two ENOSPC errors this week while running
a full balance on the backup partition.

A little background on the system first: it consists, as of now, of 3 btrfs
file systems (for more details look towards the bottom of this email):

- / on a single 120GB Crucial M500 SSD (data single, metadata DUP)
- /home on 4x320GB disks (data raid10, metadata raid1)
- the aforementioned backup partition on the external USB3 drive (data single,
  metadata DUP)

This is the end result of a conversion process I had started about two to three
months ago.  My system started off with mdraid + LVM2 (/ and /boot on RAID1,
LVM on RAID10), and no SSD, and no backups (nasty, I know).  Both the SSD and
the backup partition were originally ext4, and were converted using
btrfs-convert.

After converting the backup partition about a week ago, following the wiki entry
on ext4 conversion, I eventually ran a full balance (after first converting
metadata to DUP and running a "lighter" balance with -dusage=50 or so, although
that was probably a waste of time).

The full balance was still running the same night, but the morning after I found
that it aborted with ENOSPC (after about 12-13 hours, I think).  My syslog said
the following:

  kernel: BTRFS info (device sdg2): 4 enospc errors during balance

Some additional information: a balance was running on /home (which finished
successfully), and fcron started a weekly scrub on the backup partition, which
finished (also without errors) shortly after the ENOSPC error.

Then, since I wasn't thinking properly, I started a new balance (which ran
fine) before collecting the information below. From memory I can say this:
the total size of metadata and data was nowhere *near* full disk capacity. In
fact, it was very close to the "btrfs fi df" output below. Furthermore, the
output of "btrfs balance status" reported "(-nan)" where "(N considered)"
normally appears (I think the other two numbers were incorrect, too, like "0
out of about 0", but don't remember their exact values).

Now, when I first set up my backups, I decided on rsnapshot.  After converting
the backup partition to btrfs, rsnapshot got *really* slow, so this week I
switched to plain rsync + btrfs-snap, wrapped in two custom shell scripts (I
will switch to btrfs-send/recv once I think that they are stable). This is
consistently faster, but still more erratic and slower than rsnapshot with an
ext4 target file system (about 8-23 minutes vs. 5-10 minutes).

Finally we arrive at today, where, after deleting the rest of my old rsnapshot
backups, I did a full rebalance (because it freed up a *lot* of space, both
data and metadata), which also aborted with ENOSPC:

  kernel: BTRFS info (device sdg2): 2 enospc errors during balance

This time, after about 3 hours, it was almost done (241 of 244 chunks), and
"balance status" didn't show the -nan I had seen the previous time.  The only
thing I noticed was that "total" space jumped by several GB (from 229 to 236 or
so), while "used" only increased by 1-2 GB a few minutes before the balance
aborted.  Starting and aborting another balance freed the space again.

Looking through my local ML archive, I found some problem reports related to
balance.  The one most similar to mine (AFAICT) is "3.14.2 Debian kernel BTRFS
corruption after balance" from Russell Coker, although in my case the file
system has yet to end up corrupted.

And finally, I'd like to make clear that up to now I have been very happy with
btrfs and that this is the first real issue I have encountered with it
(although I don't use a lot of its features yet).  For my usage I definitely
like it a lot more than mdraid + LVM2 :-) .

The requested output as per wiki, from shortly after I got the first error:

    marcec marcec # uname -a

    Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
    x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

    marcec marcec # btrfs --version

    Btrfs v3.14.2

    marcec marcec # btrfs fi show

    Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
            Total devices 1 FS bytes used 42.19GiB
            devid    1 size 107.79GiB used 50.06GiB path /dev/sdf1

    Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
            Total devices 4 FS bytes used 476.02GiB
            devid    1 size 298.09GiB used 238.03GiB path /dev/sda
            devid    2 size 298.09GiB used 239.03GiB path /dev/sdb
            devid    3 size 298.09GiB used 240.00GiB path /dev/sdc
            devid    4 size 298.09GiB used 239.00GiB path /dev/sdd

    Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
            Total devices 1 FS bytes used 167.81GiB
            devid    1 size 976.56GiB used 180.06GiB path /dev/sdg2

    Btrfs v3.14.2

    marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP 

    Data, single: total=160.00GiB, used=159.06GiB
    System, DUP: total=32.00MiB, used=28.00KiB
    Metadata, DUP: total=10.00GiB, used=8.80GiB
    unknown, single: total=512.00MiB, used=40.95MiB

And now from today:

    marcec marcec # uname -a

    Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
    x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

    marcec marcec # btrfs --version 

    Btrfs v3.14.2

    marcec marcec # btrfs fi show 

    Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
            Total devices 1 FS bytes used 42.16GiB
            devid    1 size 107.79GiB used 50.06GiB path /dev/sdf1

    Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
            Total devices 4 FS bytes used 474.84GiB
            devid    1 size 298.09GiB used 238.03GiB path /dev/sda
            devid    2 size 298.09GiB used 239.03GiB path /dev/sdb
            devid    3 size 298.09GiB used 240.00GiB path /dev/sdc
            devid    4 size 298.09GiB used 239.00GiB path /dev/sdd

    Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
            Total devices 1 FS bytes used 231.97GiB
            devid    1 size 976.56GiB used 237.06GiB path /dev/sdg2

    Btrfs v3.14.2

    marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP 

    Data, single: total=229.00GiB, used=228.77GiB
    System, DUP: total=32.00MiB, used=36.00KiB
    Metadata, DUP: total=4.00GiB, used=3.19GiB
    unknown, single: total=512.00MiB, used=0.00
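
As a sanity check on the numbers above, here is a minimal Python sketch (not
part of the original report) that adds up the chunk allocation reported by
"btrfs fi df" and compares it with the per-device "used" figure from "btrfs fi
show".  It rests on a few assumptions: DUP chunks are counted twice and single
chunks once, the "unknown" line is skipped because the sums only match the
device figure without it, and a value printed without a unit (such as "0.00")
is read as bytes.  The names allocated_gib, FIRST_RUN and TODAY are made up for
this illustration.

```python
import re

# Unit suffixes as printed by "btrfs fi df", expressed in GiB.
UNIT_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}

# Copies kept on disk per profile (assumption: only the profiles seen here).
COPIES = {"single": 1, "DUP": 2}

LINE_RE = re.compile(
    r"^\s*(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB)?\s*$"
)


def to_gib(value, unit):
    """Convert a printed size to GiB; a bare number is assumed to be bytes."""
    if unit is None:
        return float(value) / 1024**3
    return float(value) * UNIT_GIB[unit]


def allocated_gib(fi_df_output):
    """Sum the raw device space allocated to chunks, in GiB."""
    total = 0.0
    for line in fi_df_output.splitlines():
        m = LINE_RE.match(line)
        if m is None or m.group("kind") == "unknown":
            # Assumption: the "unknown" line is not a separate allocation.
            continue
        size = to_gib(m.group("total"), m.group("tunit"))
        total += size * COPIES[m.group("profile")]
    return total


FIRST_RUN = """
    Data, single: total=160.00GiB, used=159.06GiB
    System, DUP: total=32.00MiB, used=28.00KiB
    Metadata, DUP: total=10.00GiB, used=8.80GiB
    unknown, single: total=512.00MiB, used=40.95MiB
"""

TODAY = """
    Data, single: total=229.00GiB, used=228.77GiB
    System, DUP: total=32.00MiB, used=36.00KiB
    Metadata, DUP: total=4.00GiB, used=3.19GiB
    unknown, single: total=512.00MiB, used=0.00
"""

DEVICE_SIZE_GIB = 976.56  # "size" of /dev/sdg2 from "btrfs fi show"

if __name__ == "__main__":
    for label, text in (("first error", FIRST_RUN), ("today", TODAY)):
        alloc = allocated_gib(text)
        print(f"{label}: allocated {alloc:.2f} GiB, "
              f"unallocated {DEVICE_SIZE_GIB - alloc:.2f} GiB")
```

Under those assumptions the sums come to about 180.06 GiB and 237.06 GiB, which
agrees with the "used" values that "btrfs fi show" reports for /dev/sdg2, and
leaves well over 700 GiB of the partition unallocated in both cases.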

The output of dmesg from both times is attached; however, to avoid exceeding
100KiB, I compressed them with xz first.

Greetings,
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg.log.xz --]
[-- Type: application/x-xz, Size: 36848 bytes --]

[-- Attachment #1.3: dmesg2.log.xz --]
[-- Type: application/x-xz, Size: 26568 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
