linux-btrfs.vger.kernel.org archive mirror
* ENOSPC errors during balance
@ 2014-07-19 15:26 Marc Joliet
  2014-07-19 17:38 ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-19 15:26 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 7014 bytes --]

Hello all

I'm a somewhat new btrfs user, and have recently finished converting my
desktop to btrfs, ending with the 1TB backup partition on my 3TB USB3 drive.

The concrete issue is that I have seen two ENOSPC errors this week while running
a full balance on the backup partition.

A little background on the system first: it consists, as of now, of 3 btrfs
file systems (for more details look towards the bottom of this email):

- / on a single 120GB Crucial M500 SSD (data single, metadata DUP)
- /home on 4x320GB disks (data raid10, metadata raid1)
- the aforementioned backup partition on the external USB3 drive (data single,
  metadata DUP)

This is the end result of a conversion process I had started about two to three
months ago.  My system started off with mdraid + LVM2 (/ and /boot on RAID1,
LVM on RAID10), and no SSD, and no backups (nasty, I know).  Both the SSD and
the backup partition were originally ext4, and were converted using
btrfs-convert.
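
To illustrate with the backup partition, the conversion itself boils down to
the following (with the file system unmounted; a forced fsck first is a
sensible precaution):

# fsck.ext4 -f /dev/sdg2
# btrfs-convert /dev/sdg2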

After converting the backup partition about a week ago, following the wiki entry
on ext4 conversion, I eventually ran a full balance (after first converting
metadata to DUP and running a "lighter" balance with -dusage=50 or so, although
that was probably a waste of time).
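
For reference, those preparatory steps were roughly the following
(reconstructed from memory, so the exact filter value may have differed):

# btrfs balance start -mconvert=dup /run/media/marcec/MARCEC_BACKUP
# btrfs balance start -dusage=50 /run/media/marcec/MARCEC_BACKUP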

The full balance was still running the same night, but the next morning I
found that it had aborted with ENOSPC (after about 12-13 hours, I think).  My
syslog said the following:

  kernel: BTRFS info (device sdg2): 4 enospc errors during balance

Some additional information: a balance was running on /home (which finished
successfully), and fcron started a weekly scrub on the backup partition, which
finished (also without errors) shortly after the ENOSPC error.

Then, since I wasn't thinking properly, I started a new balance (which ran
fine) before collecting the information below. From memory I can say this:
the total size of metadata and data was nowhere *near* full disk capacity. In
fact, it was very close to the "btrfs fi df" output below. Furthermore, the
output of "btrfs balance status" reported "(-nan)" where "(N considered)"
normally appears (I think the other two numbers were incorrect, too, like "0
out of about 0", but don't remember their exact values).

Now, when I first set up my backups, I decided on rsnapshot.  After converting
the backup partition to btrfs, rsnapshot got *really* slow, so this week I
switched to plain rsync + btrfs-snap, wrapped in two custom shell scripts (I
will switch to btrfs-send/recv once I consider them stable).  This setup is
consistently faster than rsnapshot was on btrfs, but still more erratic and
slower than rsnapshot with an ext4 target file system (about 8-23 minutes vs.
5-10 minutes).
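
Stripped of error handling, the two scripts together amount to something like
this (a simplified sketch; btrfs-snap handles the snapshot naming and
rotation, and the paths are illustrative):

#!/bin/sh
SRC=/home/
DST=/run/media/marcec/MARCEC_BACKUP/home
# mirror the live data into the backup subvolume, rewriting changed files
# in place rather than creating temporary copies
rsync -aHAX --delete --inplace "$SRC" "$DST"
# then take a read-only, timestamped snapshot of the result
btrfs subvolume snapshot -r "$DST" \
    /run/media/marcec/MARCEC_BACKUP/snapshots/home-$(date +%F_%H%M)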

Finally we arrive at today, where, after deleting the rest of my old rsnapshot
backups, I did a full rebalance (because deleting them freed up a *lot* of
space, both data and metadata), which also aborted with ENOSPC:

  kernel: BTRFS info (device sdg2): 2 enospc errors during balance

This time, after about 3 hours, it was almost done (241 of 244 chunks), and
"balance status" didn't show the -nan I had seen the previous time.  The only
thing I noticed was that "total" space jumped by several GB (from 229 to 236 or
so), while "used" only increased by 1-2 GB a few minutes before the balance
aborted.  Starting and aborting another balance freed the space again.
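
(Concretely, that was just:

# btrfs balance start /run/media/marcec/MARCEC_BACKUP
# btrfs balance cancel /run/media/marcec/MARCEC_BACKUP

with the cancel issued a few seconds after the start.)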

Looking through my local ML archive, I found some problem reports related to
balance.  The one most similar to mine (AFAICT) is "3.14.2 Debian kernel BTRFS
corruption after balance" from Russell Coker, although in my case the file
system has yet to end up corrupted.

And finally, I'd like to make clear that up to now I have been very happy with
btrfs and that this is the first real issue I have encountered with it
(although I don't use a lot of its features yet).  For my usage I definitely
like it a lot more than mdraid + LVM2 :-) .

The requested output as per wiki, from shortly after I got the first error:

    marcec marcec # uname -a

    Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
    x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

    marcec marcec # btrfs --version

    Btrfs v3.14.2

    marcec marcec # btrfs fi show

    Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
            Total devices 1 FS bytes used 42.19GiB
            devid    1 size 107.79GiB used 50.06GiB path /dev/sdf1

    Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
            Total devices 4 FS bytes used 476.02GiB
            devid    1 size 298.09GiB used 238.03GiB path /dev/sda
            devid    2 size 298.09GiB used 239.03GiB path /dev/sdb
            devid    3 size 298.09GiB used 240.00GiB path /dev/sdc
            devid    4 size 298.09GiB used 239.00GiB path /dev/sdd

    Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
            Total devices 1 FS bytes used 167.81GiB
            devid    1 size 976.56GiB used 180.06GiB path /dev/sdg2

    Btrfs v3.14.2

    marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP 

    Data, single: total=160.00GiB, used=159.06GiB
    System, DUP: total=32.00MiB, used=28.00KiB
    Metadata, DUP: total=10.00GiB, used=8.80GiB
    unknown, single: total=512.00MiB, used=40.95MiB

And now from today:

    marcec marcec # uname -a

    Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
    x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

    marcec marcec # btrfs --version 

    Btrfs v3.14.2

    marcec marcec # btrfs fi show 

    Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
            Total devices 1 FS bytes used 42.16GiB
            devid    1 size 107.79GiB used 50.06GiB path /dev/sdf1

    Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
            Total devices 4 FS bytes used 474.84GiB
            devid    1 size 298.09GiB used 238.03GiB path /dev/sda
            devid    2 size 298.09GiB used 239.03GiB path /dev/sdb
            devid    3 size 298.09GiB used 240.00GiB path /dev/sdc
            devid    4 size 298.09GiB used 239.00GiB path /dev/sdd

    Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
            Total devices 1 FS bytes used 231.97GiB
            devid    1 size 976.56GiB used 237.06GiB path /dev/sdg2

    Btrfs v3.14.2

    marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP 

    Data, single: total=229.00GiB, used=228.77GiB
    System, DUP: total=32.00MiB, used=36.00KiB
    Metadata, DUP: total=4.00GiB, used=3.19GiB
    unknown, single: total=512.00MiB, used=0.00

The output of dmesg from both times is attached; however, to avoid exceeding
100KiB, I compressed the logs with xz first.

Greetings,
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg.log.xz --]
[-- Type: application/x-xz, Size: 36848 bytes --]

[-- Attachment #1.3: dmesg2.log.xz --]
[-- Type: application/x-xz, Size: 26568 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-19 15:26 Marc Joliet
@ 2014-07-19 17:38 ` Chris Murphy
  2014-07-19 21:06   ` Piotr Szymaniak
  2014-07-20  2:39   ` Duncan
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Murphy @ 2014-07-19 17:38 UTC (permalink / raw)
  To: Marc Joliet; +Cc: linux-btrfs

The 2nd dmesg (I didn't look at the 1st) has many instances like this:

[96241.882138] ata2.00: exception Emask 0x1 SAct 0x7ffe0fff SErr 0x0 action 0x6 frozen
[96241.882139] ata2.00: Ata error. fis:0x21
[96241.882142] ata2.00: failed command: READ FPDMA QUEUED
[96241.882148] ata2.00: cmd 60/08:00:68:0a:2d/00:00:18:00:00/40 tag 0 ncq 4096 in
         res 41/00:58:40:5c:2c/00:00:18:00:00/40 Emask 0x1 (device error)

I'm not sure what this error is; it acts like an unrecoverable read error, but I'm not seeing a UNC reported. It looks like ata2.00 is sdb, which is a member of a btrfs raid10 volume. So this isn't related to your sdg2 ENOSPC error; it's a different problem.


I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc errors during balance", but it seems informational rather than either a warning or a problem. I'd treat ext4->btrfs converted file systems as something of an odd duck: the conversion is uncommon, therefore it isn't getting as much testing, and extra caution is a good idea. Make frequent backups.

Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Fw: ENOSPC errors during balance
@ 2014-07-19 20:10 Marc Joliet
  2014-07-19 20:58 ` Marc Joliet
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-19 20:10 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2902 bytes --]



Begin forwarded message:
Huh, turns out the Reply-To was to Chris Murphy, so here it is again for the
whole list.

Date: Sat, 19 Jul 2014 20:34:34 +0200
From: Marc Joliet <marcec@gmx.de>
To: Chris Murphy <lists@colorremedies.com>
Subject: Re: ENOSPC errors during balance


On Sat, 19 Jul 2014 11:38:08 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> The 2nd dmesg (I didn't look at the 1st) has many instances like this:
> 
> [96241.882138] ata2.00: exception Emask 0x1 SAct 0x7ffe0fff SErr 0x0 action 0x6 frozen
> [96241.882139] ata2.00: Ata error. fis:0x21
> [96241.882142] ata2.00: failed command: READ FPDMA QUEUED
> [96241.882148] ata2.00: cmd 60/08:00:68:0a:2d/00:00:18:00:00/40 tag 0 ncq 4096 in
>          res 41/00:58:40:5c:2c/00:00:18:00:00/40 Emask 0x1 (device error)
> 
> I'm not sure what this error is; it acts like an unrecoverable read error, but I'm not seeing a UNC reported. It looks like ata2.00 is sdb, which is a member of a btrfs raid10 volume. So this isn't related to your sdg2 ENOSPC error; it's a different problem.

Yeah, from what I remember reading it's related to nforce2 chipsets, but I
never pursued it, since I never really noticed any consequences (this is an old
computer that I originally built in 2006).  IIRC one workaround is to switch to
1.5 Gbps instead of 3 Gbps (but then, that port already is at 1.5 Gbps, while
none of the other ports are?  Might be the hard drive; I *think* it's older
than the others.)  Another workaround is related to irqbalance (which I had
forgotten about; I've just switched it off and will see if the messages stop,
but then again, my first dmesg didn't have any of those messages).
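
(If it really is that one drive, its SMART data would be a cheap thing to
check, e.g.:

# smartctl -a /dev/sdb

and then look at the error log and the reallocated/pending sector counts.)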

Anyway, yes, it's unrelated to my problem :-) .

> I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc errors during balance", but it seems informational rather than either a warning or a problem. I'd treat ext4->btrfs converted file systems as something of an odd duck: the conversion is uncommon, therefore it isn't getting as much testing, and extra caution is a good idea. Make frequent backups.

Well, I *could* just recreate the file system.  Since these are my only backups
(no offsite backup as of yet), I wanted to keep the existing ones.  So
btrfs-convert was a convenient way to upgrade.

But since I ended up deleting those backups anyway, I would only be losing my
hourly and a few daily backups.  But it's not as if the file system is otherwise
misbehaving.

Another random idea:  the number of errors decreased the second time I ran
balance (from 4 to 2), so I could run another full balance and see if it keeps
decreasing.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-19 20:10 Fw: ENOSPC errors during balance Marc Joliet
@ 2014-07-19 20:58 ` Marc Joliet
  2014-07-20  0:53   ` Chris Murphy
  2014-07-20  1:11   ` Chris Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Marc Joliet @ 2014-07-19 20:58 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1111 bytes --]

On Sat, 19 Jul 2014 22:10:51 +0200
Marc Joliet <marcec@gmx.de> wrote:

[...]
> Another random idea:  the number of errors decreased the second time I ran
> balance (from 4 to 2), so I could run another full balance and see if it keeps
> decreasing.

Well, this time there were still 2 ENOSPC errors.  But I can show the df output
after such an ENOSPC error, to illustrate what I meant by the sudden surge
in total usage:

# btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
Data, single: total=236.00GiB, used=229.04GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.20GiB
unknown, single: total=512.00MiB, used=0.00

And then after running a balance and (almost) immediately cancelling:

# btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
Data, single: total=230.00GiB, used=229.04GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.20GiB
unknown, single: total=512.00MiB, used=0.00

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-19 17:38 ` Chris Murphy
@ 2014-07-19 21:06   ` Piotr Szymaniak
  2014-07-20  2:39   ` Duncan
  1 sibling, 0 replies; 22+ messages in thread
From: Piotr Szymaniak @ 2014-07-19 21:06 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Marc Joliet, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 796 bytes --]

On Sat, Jul 19, 2014 at 11:38:08AM -0600, Chris Murphy wrote:
> [96241.882138] ata2.00: exception Emask 0x1 SAct 0x7ffe0fff SErr 0x0 action 0x6 frozen
> [96241.882139] ata2.00: Ata error. fis:0x21
> [96241.882142] ata2.00: failed command: READ FPDMA QUEUED
> [96241.882148] ata2.00: cmd 60/08:00:68:0a:2d/00:00:18:00:00/40 tag 0 ncq 4096 in
>          res 41/00:58:40:5c:2c/00:00:18:00:00/40 Emask 0x1 (device error)

AFAIR those are somehow related to NCQ.


Piotr Szymaniak.
-- 
The audience chamber of the golden Bruno surpassed everything I had
seen so far.  He must have employed dozens of programmers and creators
to create such a wanton and refined interior.  The sounds, colors,
shapes and scents induced erections.
  -- Marcin Przybylek, "Gamedec: Syndrom Adelheima"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-19 20:58 ` Marc Joliet
@ 2014-07-20  0:53   ` Chris Murphy
  2014-07-20  9:50     ` Marc Joliet
  2014-07-20  1:11   ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2014-07-20  0:53 UTC (permalink / raw)
  To: Marc Joliet; +Cc: linux-btrfs


On Jul 19, 2014, at 2:58 PM, Marc Joliet <marcec@gmx.de> wrote:

> On Sat, 19 Jul 2014 22:10:51 +0200
> Marc Joliet <marcec@gmx.de> wrote:
> 
> [...]
>> Another random idea:  the number of errors decreased the second time I ran
>> balance (from 4 to 2), so I could run another full balance and see if it keeps
>> decreasing.
> 
> Well, this time there were still 2 ENOSPC errors.  But I can show the df output
> after such an ENOSPC error, to illustrate what I meant by the sudden surge
> in total usage:
> 
> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> Data, single: total=236.00GiB, used=229.04GiB
> System, DUP: total=32.00MiB, used=36.00KiB
> Metadata, DUP: total=4.00GiB, used=3.20GiB
> unknown, single: total=512.00MiB, used=0.00
> 
> And then after running a balance and (almost) immediately cancelling:
> 
> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> Data, single: total=230.00GiB, used=229.04GiB
> System, DUP: total=32.00MiB, used=36.00KiB
> Metadata, DUP: total=4.00GiB, used=3.20GiB
> unknown, single: total=512.00MiB, used=0.00

I think it's a bit weird. Two options: a. Keep using the file system, with judicious backups; if a dev wants more info they'll reply to the thread. b. Migrate the data to a new file system; first capture the file system with btrfs-image in case a dev wants more info after you've blown away the filesystem, then move the data to a new btrfs fs. I'd use send/receive for this to preserve subvolumes and snapshots.


Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-19 20:58 ` Marc Joliet
  2014-07-20  0:53   ` Chris Murphy
@ 2014-07-20  1:11   ` Chris Murphy
  2014-07-20  9:48     ` Marc Joliet
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2014-07-20  1:11 UTC (permalink / raw)
  To: Marc Joliet; +Cc: linux-btrfs

I'm seeing this also in the 2nd dmesg:

[  249.893310] BTRFS error (device sdg2): free space inode generation (0) did not match free space cache generation (26286)


So you could try unmounting the volume and doing a one-time mount with the clear_cache mount option. Give it some time to rebuild the space cache.

After that you could unmount again, mount with enospc_debug, and try to reproduce the ENOSPC with another balance; see if dmesg contains more information this time.
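
Something along these lines (device and mount point taken from your earlier
output):

# umount /run/media/marcec/MARCEC_BACKUP
# mount -o clear_cache /dev/sdg2 /mnt
# ... let it sit mounted for a while, then:
# umount /mnt
# mount -o enospc_debug /dev/sdg2 /mnt
# btrfs balance start /mnt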


Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-19 17:38 ` Chris Murphy
  2014-07-19 21:06   ` Piotr Szymaniak
@ 2014-07-20  2:39   ` Duncan
  2014-07-20 10:22     ` Marc Joliet
  1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2014-07-20  2:39 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Sat, 19 Jul 2014 11:38:08 -0600 as excerpted:

> I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc
> errors during balance", but it seems informational rather than either a
> warning or a problem. I'd treat ext4->btrfs converted file systems as
> something of an odd duck: the conversion is uncommon, therefore it isn't
> getting as much testing, and extra caution is a good idea. Make frequent
> backups.

Expanding on that a bit...

Balance simply rewrites chunks, combining where possible and possibly 
converting to a different layout (single/dup/raid0/1/10/5/6[1]) in the 
process.  The most common reason for enospc during balance is of course 
all space allocated to chunks, with various workarounds for that if it 
happens, but that doesn't seem to be what was happening to you
(Marc J./OP).

Based on very similar issues reported by another ext4 -> btrfs converter 
and the discussion on that thread, here's what I think happened:

First a critical question for you as it's a critical piece of this 
scenario that you didn't mention in your summary.  The wiki page on
ext4 -> btrfs conversion suggests deleting the ext2_saved subvolume and 
then doing a full defrag and rebalance.  You're attempting a full 
rebalance, but have you yet deleted ext2_saved and did you do the defrag 
before attempting the rebalance?

I'm guessing not, as was the case with the other user that reported this 
issue.  Here's what apparently happened in his case and how we fixed it:

The problem is that btrfs data chunks are 1 GiB each.  Thus, the maximum 
size of a btrfs extent is 1 GiB.  But ext4 doesn't have an arbitrary 
limitation on extent size, and for files over a GiB in size, ext4 extents 
can /also/ be over a GiB in size.

That results in two potential issues at balance time.  First, btrfs 
treats the ext2_saved subvolume as a read-only snapshot and won't touch 
it, thus keeping the ext* data intact in case the user wishes to rollback 
to ext*.  I don't think btrfs touches that data during a balance either, 
as it really couldn't do so /safely/ without incorporating all of the 
ext* code into btrfs.  I'm not sure how it expresses that situation, but 
it's quite possible that btrfs treats it as enospc.

Second, for files that had ext4 extents greater than a GiB, balance will 
naturally enospc, because even the biggest possible btrfs extent, a full 
1 GiB data chunk, is too small to hold the existing file extent.  Of 
course this only happens on filesystems converted from ext*, because 
natively btrfs has no way to make an extent larger than a GiB, so it 
won't run into the problem if it was created natively instead of 
converted from ext*.

Once the ext2_saved subvolume/snapshot is deleted, defragging should cure 
the problem as it rewrites those files to btrfs-native chunks, normally 
defragging but in this case fragging to the 1 GiB btrfs-native data-chunk-
size extent size.

Alternatively, and this is what the other guy did, one can find all the 
files from the original ext*fs over a GiB in size, and move them off-
filesystem and back.  AFAIK he had several gigs of spare RAM and no files
larger than that, so he used tmpfs as the temporary storage location, 
which is memory so the only I/O is that on the btrfs in question.  By 
doing that he deleted the existing files on btrfs and recreated them, 
naturally splitting the extents on data-chunk-boundaries as btrfs 
normally does, in the recreation.
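
For a single affected file, "off-filesystem and back" can be as simple as
this (a sketch, assuming /dev/shm is a tmpfs with room for the file; the
path is illustrative):

  mv /mnt/path/to/bigfile /dev/shm/
  mv /dev/shm/bigfile /mnt/path/to/bigfile

Both moves cross a filesystem boundary, so the data is actually rewritten,
and btrfs recreates the extents within its native 1 GiB limit.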

If you had deleted the ext2_saved subvolume/snapshot and done the defrag 
already, that explanation doesn't work as-is, but I'd still consider it 
an artifact from the conversion, and try the alternative move-off-
filesystem-temporarily method.

If you don't have any files over a GiB in size, then I don't know... 
perhaps it's some other bug.

---
[1] Raid5/6 support not yet complete.  Operational code is there but 
recovery code is still incomplete.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20  1:11   ` Chris Murphy
@ 2014-07-20  9:48     ` Marc Joliet
  2014-07-20 19:46       ` Marc Joliet
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-20  9:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Murphy


[-- Attachment #1.1: Type: text/plain, Size: 1421 bytes --]

On Sat, 19 Jul 2014 19:11:00 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> I'm seeing this also in the 2nd dmesg:
> 
> [  249.893310] BTRFS error (device sdg2): free space inode generation (0) did not match free space cache generation (26286)
> 
> 
> So you could try unmounting the volume and doing a one-time mount with the clear_cache mount option. Give it some time to rebuild the space cache.
> 
> After that you could unmount again, mount with enospc_debug, and try to reproduce the ENOSPC with another balance; see if dmesg contains more information this time.

OK, I did that, and the new dmesg is attached. Also, some outputs again, first
"filesystem df" (that "total" surge at the end sure is consistent):

# btrfs filesystem df /mnt           
Data, single: total=237.00GiB, used=229.67GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.50GiB, used=3.49GiB
unknown, single: total=512.00MiB, used=0.00

And here is what I described in my initial post, the output of "balance status"
immediately after the error (turns out my memory was correct):

btrfs filesystem balance status /mnt
Balance on '/mnt' is running
0 out of about 0 chunks balanced (0 considered), -nan% left

(Also, this is with Gentoo kernel 3.15.6 now.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg4.log.xz --]
[-- Type: application/x-xz, Size: 26508 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20  0:53   ` Chris Murphy
@ 2014-07-20  9:50     ` Marc Joliet
  0 siblings, 0 replies; 22+ messages in thread
From: Marc Joliet @ 2014-07-20  9:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Murphy

[-- Attachment #1: Type: text/plain, Size: 2124 bytes --]

On Sat, 19 Jul 2014 18:53:03 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> 
> On Jul 19, 2014, at 2:58 PM, Marc Joliet <marcec@gmx.de> wrote:
> 
> > On Sat, 19 Jul 2014 22:10:51 +0200
> > Marc Joliet <marcec@gmx.de> wrote:
> > 
> > [...]
> >> Another random idea:  the number of errors decreased the second time I ran
> >> balance (from 4 to 2), so I could run another full balance and see if it keeps
> >> decreasing.
> > 
> > Well, this time there were still 2 ENOSPC errors.  But I can show the df output
> > after such an ENOSPC error, to illustrate what I meant by the sudden surge
> > in total usage:
> > 
> > # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> > Data, single: total=236.00GiB, used=229.04GiB
> > System, DUP: total=32.00MiB, used=36.00KiB
> > Metadata, DUP: total=4.00GiB, used=3.20GiB
> > unknown, single: total=512.00MiB, used=0.00
> > 
> > And then after running a balance and (almost) immediately cancelling:
> > 
> > # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> > Data, single: total=230.00GiB, used=229.04GiB
> > System, DUP: total=32.00MiB, used=36.00KiB
> > Metadata, DUP: total=4.00GiB, used=3.20GiB
> > unknown, single: total=512.00MiB, used=0.00
> 
> I think it's a bit weird. Two options: a. Keep using the file system, with judicious backups; if a dev wants more info they'll reply to the thread. b. Migrate the data to a new file system; first capture the file system with btrfs-image in case a dev wants more info after you've blown away the filesystem, then move the data to a new btrfs fs. I'd use send/receive for this to preserve subvolumes and snapshots.

OK, I'll keep that in mind.  I'll keep running the file system for now, just in
case it's a run-time error (i.e., a bug in the balance code, and not a problem
with the file system itself).  If it gets trashed on its own, or I move to a new
file system, I'll be sure to follow the steps you outlined.

> Chris Murphy

Thanks
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20  2:39   ` Duncan
@ 2014-07-20 10:22     ` Marc Joliet
  2014-07-20 11:40       ` Marc Joliet
  2014-07-20 12:59       ` Duncan
  0 siblings, 2 replies; 22+ messages in thread
From: Marc Joliet @ 2014-07-20 10:22 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5543 bytes --]

On Sun, 20 Jul 2014 02:39:27 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:

> Chris Murphy posted on Sat, 19 Jul 2014 11:38:08 -0600 as excerpted:
> 
> > I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc
> > errors during balance", but it seems informational rather than either a
> > warning or a problem. I'd treat ext4->btrfs converted file systems as
> > something of an odd duck: the conversion is uncommon, therefore it isn't
> > getting as much testing, and extra caution is a good idea. Make frequent
> > backups.
> 
> Expanding on that a bit...
> 
> Balance simply rewrites chunks, combining where possible and possibly 
> converting to a different layout (single/dup/raid0/1/10/5/6[1]) in the 
> process.  The most common reason for enospc during balance is of course 
> all space allocated to chunks, with various workarounds for that if it 
> happens, but that doesn't seem to be what was happening to you
> (Marc J./OP).
> 
> Based on very similar issues reported by another ext4 -> btrfs converter 
> and the discussion on that thread, here's what I think happened:
> 
> First a critical question for you as it's a critical piece of this 
> scenario that you didn't mention in your summary.  The wiki page on
> ext4 -> btrfs conversion suggests deleting the ext2_saved subvolume and 
> then doing a full defrag and rebalance.  You're attempting a full 
> rebalance, but have you yet deleted ext2_saved and did you do the defrag 
> before attempting the rebalance?
> 
> I'm guessing not, as was the case with the other user that reported this 
> issue.  Here's what apparently happened in his case and how we fixed it:

Ah, I actually did, in fact; I only said so implicitly.  Here's what I
wrote:

"After converting the backup partition about a week ago, following the wiki
entry on ext4 conversion, I eventually ran a full balance [...]"

The wiki says to run a full balance (and defragment before that, but that was
sloooooooow, so I didn't do it), *after* deleting the ext4 file system image.
So the full balance was right after doing that :) .
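
(Concretely, as per the wiki, that was:

# btrfs subvolume delete /run/media/marcec/MARCEC_BACKUP/ext2_saved
# btrfs balance start /run/media/marcec/MARCEC_BACKUP

give or take the exact mount point.)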

> The problem is that btrfs data chunks are 1 GiB each.  Thus, the maximum 
> size of a btrfs extent is 1 GiB.  But ext4 doesn't have an arbitrary 
> limitation on extent size, and for files over a GiB in size, ext4 extents 
> can /also/ be over a GiB in size.
> 
> That results in two potential issues at balance time.  First, btrfs 
> treats the ext2_saved subvolume as a read-only snapshot and won't touch 
> it, thus keeping the ext* data intact in case the user wishes to roll back
> to ext*.  I don't think btrfs touches that data during a balance either, 
> as it really couldn't do so /safely/ without incorporating all of the 
> ext* code into btrfs.  I'm not sure how it expresses that situation, but 
> it's quite possible that btrfs treats it as enospc.
> 
> Second, for files that had ext4 extents greater than a GiB, balance will 
> naturally enospc, because even the biggest possible btrfs extent, a full 
> 1 GiB data chunk, is too small to hold the existing file extent.  Of 
> course this only happens on filesystems converted from ext*, because 
> natively btrfs has no way to make an extent larger than a GiB, so it 
> won't run into the problem if it was created natively instead of 
> converted from ext*.
> 
> Once the ext2_saved subvolume/snapshot is deleted, defragging should cure 
> the problem as it rewrites those files to btrfs-native chunks, normally 
> defragging but in this case fragging to the 1 GiB btrfs-native data-chunk-
> size extent size.

Hmm, well, I didn't defragment because it would have taken *forever* to go
through all those hardlinks; plus, my experience is that ext* doesn't fragment
much at all, so I skipped that step.  But I certainly have files over 1GB in
size.

On the other hand, the wiki [0] says that defragmentation (and balancing) is
optional, and the only reason stated for doing either is that they "will have
impact on performance".

> Alternatively, and this is what the other guy did, one can find all the 
> files from the original ext*fs over a GiB in size, and move them off-
> filesystem and back.  AFAIK he had several gigs of spare RAM and no files 
> larger than that, so he used tmpfs as the temporary storage location, 
> which is memory so the only I/O is that on the btrfs in question.  By 
> doing that he deleted the existing files on btrfs and recreated them, 
> naturally splitting the extents on data-chunk-boundaries as btrfs 
> normally does, in the recreation.
> 
> If you had deleted the ext2_saved subvolume/snapshot and done the defrag 
> already, that explanation doesn't work as-is, but I'd still consider it 
> an artifact from the conversion, and try the alternative move-off-
> filesystem-temporarily method.

I'll try this and see, but I think I have more files >1GB than would account
for this error (which comes towards the end of the balance when only a few
chunks are left).  I'll see what "find /mnt -type f -size +1G" finds :) .

> If you don't have any files over a GiB in size, then I don't know... 
> perhaps it's some other bug.
> 
> ---
> [1] Raid5/6 support not yet complete.  Operational code is there but 
> recovery code is still incomplete.

[0] https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3

Thanks
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20 10:22     ` Marc Joliet
@ 2014-07-20 11:40       ` Marc Joliet
  2014-07-20 19:44         ` Marc Joliet
  2014-07-20 12:59       ` Duncan
  1 sibling, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-20 11:40 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

On Sun, 20 Jul 2014 12:22:33 +0200
Marc Joliet <marcec@gmx.de> wrote:

[...]
> I'll try this and see, but I think I have more files >1GB than would account
> for this error (which comes towards the end of the balance when only a few
> chunks are left).  I'll see what "find /mnt -type f -size +1G" finds :) .

Now that I think about it, though, it sounds like it could explain the sudden
surge in total data size: for one very big file, several chunks/extents are
created, but the data cannot be copied from the original ext4 extent.

So far, the above find command has only found a handful of files (plus all
the reflinks in the snapshots), much to my surprise. It still has one subvolume
to go through, though.

And just for completeness, that same find command didn't find any files on /,
which I also converted from ext4, and for which a full balance completed
successfully.  So maybe this is in the right direction, but I'll wait and see
what Chris Murphy (or anyone else) might find in my latest dmesg output.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20 10:22     ` Marc Joliet
  2014-07-20 11:40       ` Marc Joliet
@ 2014-07-20 12:59       ` Duncan
  2014-07-21 11:01         ` Brendan Hide
  1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2014-07-20 12:59 UTC (permalink / raw)
  To: linux-btrfs

Marc Joliet posted on Sun, 20 Jul 2014 12:22:33 +0200 as excerpted:

> On the other hand, the wiki [0] says that defragmentation (and
> balancing) is optional, and the only reason stated for doing either is
> that they "will have impact on performance".

Yes.  That's what threw off the other guy as well.  He decided to skip it 
for the same reason.

If I had a wiki account I'd change it, but for whatever reason I tend to 
be far more comfortable writing list replies, sometimes repeatedly, than 
writing anything on the web, which I tend to treat as read-only.  So I've 
never gotten a wiki account and thus haven't changed it, and apparently 
the other guy with the problem and anyone else that knows hasn't changed 
it either, so the conversion page still continues to underemphasize the 
importance of completing the conversion steps, including the defrag, in 
proper order.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20 11:40       ` Marc Joliet
@ 2014-07-20 19:44         ` Marc Joliet
  2014-07-21  2:41           ` Duncan
  2014-07-21 13:22           ` Marc Joliet
  0 siblings, 2 replies; 22+ messages in thread
From: Marc Joliet @ 2014-07-20 19:44 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1148 bytes --]

On Sun, 20 Jul 2014 13:40:54 +0200
Marc Joliet <marcec@gmx.de> wrote:

> On Sun, 20 Jul 2014 12:22:33 +0200
> Marc Joliet <marcec@gmx.de> wrote:
> 
> [...]
> > I'll try this and see, but I think I have more files >1GB than would account
> > for this error (which comes towards the end of the balance when only a few
> > chunks are left).  I'll see what "find /mnt -type f -size +1G" finds :) .
> 
> Now that I think about it, though, it sounds like it could explain the sudden
> surge in total data size: for one very big file, several chunks/extents are
> created, but the data cannot be copied from the original ext4 extent.

Well, turns out that was it!

What I did:

- delete the single largest file on the file system, a 12 GB VM image, along
  with all subvolumes that contained it
- rsync it over again
- start a full balance
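
In actual commands that was roughly the following (reconstructed here; the
snapshot list was assembled by hand, and the image path is illustrative):

# btrfs subvolume delete -c /mnt/snapshots/<each snapshot containing the image>
# rm /mnt/<path to the VM image>
# rsync -aHAX --inplace /home/<path to the VM image> /mnt/<path to the VM image>
# btrfs balance start /mnt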

This time, the balance finished successfully :-) .

I'll do another full balance in a few days, to see if it sticks.  Otherwise,
thanks for all the help!

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20  9:48     ` Marc Joliet
@ 2014-07-20 19:46       ` Marc Joliet
  0 siblings, 0 replies; 22+ messages in thread
From: Marc Joliet @ 2014-07-20 19:46 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 248 bytes --]

Oh, and because I'm forgetful, here is the new dmesg output.  The new content
(relative to dmesg4) starts at line 2513.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg5.log.xz --]
[-- Type: application/x-xz, Size: 24548 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20 19:44         ` Marc Joliet
@ 2014-07-21  2:41           ` Duncan
  2014-07-21 13:22           ` Marc Joliet
  1 sibling, 0 replies; 22+ messages in thread
From: Duncan @ 2014-07-21  2:41 UTC (permalink / raw)
  To: linux-btrfs

Marc Joliet posted on Sun, 20 Jul 2014 21:44:40 +0200 as excerpted:

> On Sun, 20 Jul 2014 13:40:54 +0200, Marc Joliet <marcec@gmx.de> wrote:
> 
>> On Sun, 20 Jul 2014 12:22:33 +0200, Marc Joliet <marcec@gmx.de> wrote:
>> 
>> [...]
>> > I'll try this and see, but I think I have more files >1GB than would
>> > account for this error (which comes towards the end of the balance
>> > when only a few chunks are left).  I'll see what "find /mnt -type f
>> > -size +1G" finds :) .

Note that it's extents over 1 GiB on the converted former ext4, not 
necessarily files over 1 GiB.  You may have files over a GiB that were 
already broken into extents that are all less than a GiB, and btrfs would 
be able to deal with them fine.  It's only when a single extent ended up 
larger than a GiB on the former ext4 that btrfs can't deal with it.

>> Now that I think about it, though, it sounds like it could explain the
>> sudden surge in total data size: for one very big file, several
>> chunks/extents are created, but the data cannot be copied from the
>> original ext4 extent.

I hadn't thought about that effect, but good deductive reasoning. =:^)

> Well, turns out that was it!
> 
> What I did:
> 
> - delete the single largest file on the file system, a 12 GB VM image,
> along with all subvolumes that contained it
> - rsync it over again
> - start a full balance
> 
> This time, the balance finished successfully :-) .

Good to read!

We're now two for two on this technique working around this problem! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20 12:59       ` Duncan
@ 2014-07-21 11:01         ` Brendan Hide
  0 siblings, 0 replies; 22+ messages in thread
From: Brendan Hide @ 2014-07-21 11:01 UTC (permalink / raw)
  To: Duncan, linux-btrfs Mailing list; +Cc: Marc Joliet

On 20/07/14 14:59, Duncan wrote:
> Marc Joliet posted on Sun, 20 Jul 2014 12:22:33 +0200 as excerpted:
>
>> On the other hand, the wiki [0] says that defragmentation (and
>> balancing) is optional, and the only reason stated for doing either is
>> that they "will have impact on performance".
> Yes.  That's what threw off the other guy as well.  He decided to skip it
> for the same reason.
>
> If I had a wiki account I'd change it, but for whatever reason I tend to
> be far more comfortable writing list replies, sometimes repeatedly, than
> writing anything on the web, which I tend to treat as read-only.  So I've
> never gotten a wiki account and thus haven't changed it, and apparently
> the other guy with the problem and anyone else that knows hasn't changed
> it either, so the conversion page still continues to underemphasize the
> importance of completing the conversion steps, including the defrag, in
> proper order.
>
I've inserted information specific to this in the wiki. Others with wiki 
accounts, feel free to review:
https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3#Before_first_use

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-20 19:44         ` Marc Joliet
  2014-07-21  2:41           ` Duncan
@ 2014-07-21 13:22           ` Marc Joliet
  2014-07-21 22:30             ` Marc Joliet
  1 sibling, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-21 13:22 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1974 bytes --]

On Sun, 20 Jul 2014 21:44:40 +0200
Marc Joliet <marcec@gmx.de> wrote:

[...]
> What I did:
> 
> - delete the single largest file on the file system, a 12 GB VM image, along
>   with all subvolumes that contained it
> - rsync it over again
[...]

I want to point out, though, that doing those two steps freed a
disproportionate amount of space.  The image file is only 12 GB, and it hadn't
changed in any of the snapshots (I haven't used this VM since June), so
"subvolume delete -c <snapshots>" returned after a few seconds.  Yet deleting
it seems to have freed up twice as much space.  You can see this from the
"filesystem df" output: before, "used" was at 229.04 GiB, and after deleting
it and copying it back (and after a day's worth of backups) it went down to
218 GiB.

Does anyone have any idea how this happened?

Actually, now I remember something that is probably related: when I first
moved to my current backup scheme last week, I first copied the data from the
last rsnapshot based backup with "cp --reflink" to the new backup location, but
forgot to use "-a".  I interrupted it and ran "cp -a -u --reflink", but it had
already copied a lot, and I was too impatient to start over; after all, the
data hadn't changed.  Then, when rsync (with --inplace) ran for the first time,
all of these files with wrong permissions and different time stamps were copied
over, but for some reason, the space used increased *greatly*; *much* more than
I would expect from changed metadata.

The total size of the file system data should be around 142 GB (+ snapshots),
but, well, it's more than 1.5 times as much.

Perhaps cp --reflink treats hard links differently than expected?  I would have
expected the data pointed to by the hard link to have been referenced, but
maybe something else happened?

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-21 13:22           ` Marc Joliet
@ 2014-07-21 22:30             ` Marc Joliet
  2014-07-21 23:30               ` Marc Joliet
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-21 22:30 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2325 bytes --]

On Mon, 21 Jul 2014 15:22:16 +0200
Marc Joliet <marcec@gmx.de> wrote:

> On Sun, 20 Jul 2014 21:44:40 +0200
> Marc Joliet <marcec@gmx.de> wrote:
> 
> [...]
> > What I did:
> > 
> > - delete the single largest file on the file system, a 12 GB VM image, along
> >   with all subvolumes that contained it
> > - rsync it over again
> [...]
> 
> I want to point out, though, that doing those two steps freed a
> disproportionate amount of space.  The image file is only 12 GB, and it hadn't
> changed in any of the snapshots (I haven't used this VM since June), so
> "subvolume delete -c <snapshots>" returned after a few seconds.  Yet deleting
> it seems to have freed up twice as much space.  You can see this from the
> "filesystem df" output: before, "used" was at 229.04 GiB, and after deleting
> it and copying it back (and after a day's worth of backups) it went down to
> 218 GiB.
> 
> Does anyone have any idea how this happened?
> 
> Actually, now I remember something that is probably related: when I first
> moved to my current backup scheme last week, I first copied the data from the
> last rsnapshot based backup with "cp --reflink" to the new backup location, but
> forgot to use "-a".  I interrupted it and ran "cp -a -u --reflink", but it had
> already copied a lot, and I was too impatient to start over; after all, the
> data hadn't changed.  Then, when rsync (with --inplace) ran for the first time,
> all of these files with wrong permissions and different time stamps were copied
> over, but for some reason, the space used increased *greatly*; *much* more than
> I would expect from changed metadata.
> 
> The total size of the file system data should be around 142 GB (+ snapshots),
> but, well, it's more than 1.5 times as much.
> 
> Perhaps cp --reflink treats hard links differently than expected?  I would have
> expected the data pointed to by the hard link to have been referenced, but
> maybe something else happened?

Hah, OK, apparently when my daily backup removed the oldest daily snapshot, it
freed up whatever was taking up so much space, so as of now the file system
uses only 169.14 GiB (from 218).  Weird.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-21 22:30             ` Marc Joliet
@ 2014-07-21 23:30               ` Marc Joliet
  2014-07-22  3:26                 ` Duncan
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Joliet @ 2014-07-21 23:30 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3248 bytes --]

On Tue, 22 Jul 2014 00:30:57 +0200
Marc Joliet <marcec@gmx.de> wrote:

> On Mon, 21 Jul 2014 15:22:16 +0200
> Marc Joliet <marcec@gmx.de> wrote:
> 
> > On Sun, 20 Jul 2014 21:44:40 +0200
> > Marc Joliet <marcec@gmx.de> wrote:
> > 
> > [...]
> > > What I did:
> > > 
> > > - delete the single largest file on the file system, a 12 GB VM image, along
> > >   with all subvolumes that contained it
> > > - rsync it over again
> > [...]
> > 
> > I want to point out, though, that doing those two steps freed a
> > disproportionate amount of space.  The image file is only 12 GB, and it hadn't
> > changed in any of the snapshots (I haven't used this VM since June), so
> > "subvolume delete -c <snapshots>" returned after a few seconds.  Yet deleting
> > it seems to have freed up twice as much space.  You can see this from the
> > "filesystem df" output: before, "used" was at 229.04 GiB, and after deleting
> > it and copying it back (and after a day's worth of backups) it went down to
> > 218 GiB.
> > 
> > Does anyone have any idea how this happened?
> > 
> > Actually, now I remember something that is probably related: when I first
> > moved to my current backup scheme last week, I first copied the data from the
> > last rsnapshot based backup with "cp --reflink" to the new backup location, but
> > forgot to use "-a".  I interrupted it and ran "cp -a -u --reflink", but it had
> > already copied a lot, and I was too impatient to start over; after all, the
> > data hadn't changed.  Then, when rsync (with --inplace) ran for the first time,
> > all of these files with wrong permissions and different time stamps were copied
> > over, but for some reason, the space used increased *greatly*; *much* more than
> > I would expect from changed metadata.
> > 
> > The total size of the file system data should be around 142 GB (+ snapshots),
> > but, well, it's more than 1.5 times as much.
> > 
> > Perhaps cp --reflink treats hard links differently than expected?  I would have
> > expected the data pointed to by the hard link to have been referenced, but
> > maybe something else happened?
> 
> Hah, OK, apparently when my daily backup removed the oldest daily snapshot, it
> freed up whatever was taking up so much space, so as of now the file system
> uses only 169.14 GiB (from 218).  Weird.

And now that the background deletion of the old snapshots is done, the file
system ended up at:

# btrfs filesystem df /run/media/marcec/MARCEC_BACKUP    
Data, single: total=219.00GiB, used=140.13GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.50GiB, used=2.40GiB
unknown, single: total=512.00MiB, used=0.00

I don't know how reliable du is for this, but I used it to estimate how much
used data I should expect, and I get 138 GiB.  That means that the snapshots
yield about 2 GiB "overhead", which is very reasonable, I think.  Obviously
I'll be starting a full balance now.

I still think this whole... thing is very odd; hopefully somebody can shed
some light on it for me (maybe it's obvious, but I don't see it).

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-21 23:30               ` Marc Joliet
@ 2014-07-22  3:26                 ` Duncan
  2014-07-22  7:37                   ` Marc Joliet
  0 siblings, 1 reply; 22+ messages in thread
From: Duncan @ 2014-07-22  3:26 UTC (permalink / raw)
  To: linux-btrfs

Marc Joliet posted on Tue, 22 Jul 2014 01:30:22 +0200 as excerpted:

> And now that the background deletion of the old snapshots is done, the file
> system ended up at:
> 
> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP    
> Data, single: total=219.00GiB, used=140.13GiB
> System, DUP: total=32.00MiB, used=36.00KiB
> Metadata, DUP: total=4.50GiB, used=2.40GiB
> unknown, single: total=512.00MiB, used=0.00
> 
> I don't know how reliable du is for this, but I used it to estimate how much
> used data I should expect, and I get 138 GiB.  That means that the snapshots
> yield about 2 GiB "overhead", which is very reasonable, I think.  Obviously
> I'll be starting a full balance now.

FWIW, the balance should reduce the data total quite a bit, to 141-ish GiB
(might be 142 or 145, but it should definitely come down from 219 GiB),
because the spread between total and used is relatively high now, and balance
is what's used to bring that back down.

Metadata total will probably come down a bit as well, to 3.00 GiB or so.

What's going on there is this:  Btrfs allocates and deallocates data and
metadata in two stages.  First it allocates chunks: 1 GiB in size for
data, 256 MiB in size for metadata (but because metadata is DUP by default,
it allocates two chunks at once, so half a GiB at a time there).  Then the
actual file data and metadata can be written into the pre-allocated chunks,
filling them up.  As they near full, more chunks will be allocated from the
unallocated pool as necessary.

But on file deletion, btrfs only automatically handles the file
data/metadata level; it doesn't (yet) automatically deallocate the chunks,
nor can it change the allocation from, say, a data chunk to a metadata chunk.
So once a chunk is allocated, it stays allocated.

That's the spread you see in btrfs filesystem df, between total and used,
for each chunk type.

The way to recover those allocated but unused chunks to the unallocated
pool, so they can be reallocated between data and metadata as necessary,
is with a balance.  That balance, therefore, should reduce the spread
seen in the above between total and used.
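
(A filtered balance is usually enough for that, and much cheaper than a full
one; the usage threshold is a judgment call, e.g.:

  btrfs balance start -dusage=20 -musage=20 /mnt

which only rewrites chunks that are at most 20% full, returning the rest of
their space to the unallocated pool.)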

Meanwhile, btrfs filesystem df shows the spread between allocated and
used for each type, but what about unallocated?  Simple.  Btrfs
filesystem show lists total filesystem size as well as allocated
usage for each device.  (The total line is something else; I recommend
ignoring it, as it's simply confusing.  Only pay attention to the
individual device lines.)

Thus, to get a proper picture of the space usage status on a btrfs
filesystem, you need both the btrfs filesystem show and
btrfs filesystem df output for that filesystem: show tells
you how much of the total space is chunk-allocated for each device,
and df tells you what those allocations are and how much of the
chunk-allocated space is actually used, for each allocation type.

It's wise to keep track of the show output in particular, and
when the spread between used (allocated) and total for each
device gets low (under a few GiB), check btrfs fi df to see
what's using that space unnecessarily, and then do a balance
to recover it, if possible.
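
In practice that's just the pair of commands from above, run side by side
(mount point illustrative):

  btrfs filesystem show /mnt
  btrfs filesystem df /mnt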


-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ENOSPC errors during balance
  2014-07-22  3:26                 ` Duncan
@ 2014-07-22  7:37                   ` Marc Joliet
  0 siblings, 0 replies; 22+ messages in thread
From: Marc Joliet @ 2014-07-22  7:37 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1160 bytes --]

On Tue, 22 Jul 2014 03:26:39 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:

> Marc Joliet posted on Tue, 22 Jul 2014 01:30:22 +0200 as excerpted:
> 
> > And now that the background deletion of the old snapshots is done, the file
> > system ended up at:
> > 
> > # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP    
> > Data, single: total=219.00GiB, used=140.13GiB
> > System, DUP: total=32.00MiB, used=36.00KiB
> > Metadata, DUP: total=4.50GiB, used=2.40GiB
> > unknown, single: total=512.00MiB, used=0.00
> > 
> > I don't know how reliable du is for this, but I used it to estimate how much
> > used data I should expect, and I get 138 GiB.  That means that the snapshots
> > yield about 2 GiB "overhead", which is very reasonable, I think.  Obviously
> > I'll be starting a full balance now.
> 
[snip total/used discussion]

No, you misunderstand: read my email three steps above yours (from the 21st at
15:22).  I am wondering why the disk usage ballooned to >200 GiB in the
first place.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-07-22  7:37 UTC | newest]

Thread overview: 22+ messages
2014-07-19 20:10 Fw: ENOSPC errors during balance Marc Joliet
2014-07-19 20:58 ` Marc Joliet
2014-07-20  0:53   ` Chris Murphy
2014-07-20  9:50     ` Marc Joliet
2014-07-20  1:11   ` Chris Murphy
2014-07-20  9:48     ` Marc Joliet
2014-07-20 19:46       ` Marc Joliet
  -- strict thread matches above, loose matches on Subject: below --
2014-07-19 15:26 Marc Joliet
2014-07-19 17:38 ` Chris Murphy
2014-07-19 21:06   ` Piotr Szymaniak
2014-07-20  2:39   ` Duncan
2014-07-20 10:22     ` Marc Joliet
2014-07-20 11:40       ` Marc Joliet
2014-07-20 19:44         ` Marc Joliet
2014-07-21  2:41           ` Duncan
2014-07-21 13:22           ` Marc Joliet
2014-07-21 22:30             ` Marc Joliet
2014-07-21 23:30               ` Marc Joliet
2014-07-22  3:26                 ` Duncan
2014-07-22  7:37                   ` Marc Joliet
2014-07-20 12:59       ` Duncan
2014-07-21 11:01         ` Brendan Hide
