linux-btrfs.vger.kernel.org archive mirror
* Fw: ENOSPC errors during balance
@ 2014-07-19 20:10 Marc Joliet
From: Marc Joliet @ 2014-07-19 20:10 UTC
  To: linux-btrfs

Begin forwarded message:
Huh, turns out the Reply-To was to Chris Murphy, so here it is again for the
whole list.

Date: Sat, 19 Jul 2014 20:34:34 +0200
From: Marc Joliet <marcec@gmx.de>
To: Chris Murphy <lists@colorremedies.com>
Subject: Re: ENOSPC errors during balance


On Sat, 19 Jul 2014 11:38:08 -0600,
Chris Murphy <lists@colorremedies.com> wrote:

> The 2nd dmesg (didn't look at the 1st), has many instances like this;
> 
> [96241.882138] ata2.00: exception Emask 0x1 SAct 0x7ffe0fff SErr 0x0 action 0x6 frozen
> [96241.882139] ata2.00: Ata error. fis:0x21
> [96241.882142] ata2.00: failed command: READ FPDMA QUEUED
> [96241.882148] ata2.00: cmd 60/08:00:68:0a:2d/00:00:18:00:00/40 tag 0 ncq 4096 in
>          res 41/00:58:40:5c:2c/00:00:18:00:00/40 Emask 0x1 (device error)
> 
> I'm not sure what this error is. It acts like an unrecoverable read error, but I'm not seeing UNC reported. It looks like ata2.00 is sdb, which is a member of a btrfs raid10 volume, so this isn't related to your sdg2 ENOSPC error; it's a different problem.

Yeah, from what I remember reading, it's related to nforce2 chipsets, but I
never pursued it, since I never really noticed any consequences (this is an old
computer that I originally built in 2006).  IIRC one workaround is to switch to
1.5 Gbps instead of 3 Gbps (but then, that port is already at 1.5 Gbps, while
none of the other ports are; it might be the hard drive, which I *think* is
older than the others).  Another is related to irqbalance, which I had
forgotten about: I've just switched it off and will see if the messages stop
(but then again, my first dmesg didn't have any of those messages).

Anyway, yes, it's unrelated to my problem :-) .

> I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc errors during balance" message, but it seems informational rather than a warning or a problem. I'd treat ext4->btrfs converted file systems as something of an odd duck: they're uncommon, so they aren't getting as much testing, and extra caution is a good idea. Make frequent backups.

Well, I *could* just recreate the file system.  Since these are my only backups
(no offsite backup as of yet), I wanted to keep the existing ones.  So
btrfs-convert was a convenient way to upgrade.

But since I ended up deleting those backups anyway, I would only be losing my
hourly and a few daily backups.  But it's not as if the file system is otherwise
misbehaving.

Another random idea:  the number of errors decreased the second time I ran
balance (from 4 to 2), so I could run another full balance and see if it keeps
decreasing.
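
A minimal sketch of how the counts from successive balance runs could be
compared. It assumes the kernel message format quoted in this thread
("BTRFS info (device sdg2): N enospc errors during balance"); the function
name and the sample list are only illustrative, not part of any btrfs tool.

```python
import re

# Pattern based on the kernel message quoted in this thread; other kernel
# versions may word the message differently (assumption).
ENOSPC_RE = re.compile(
    r"BTRFS info \(device (?P<dev>[^)\s]+)\): "
    r"(?P<count>\d+) enospc errors during balance"
)


def enospc_counts(lines, device):
    """Return the enospc error counts reported for `device`, in log order."""
    counts = []
    for line in lines:
        match = ENOSPC_RE.search(line)
        if match and match.group("dev") == device:
            counts.append(int(match.group("count")))
    return counts


# The two syslog messages reported for the balance runs in this thread.
sample_log = [
    "kernel: BTRFS info (device sdg2): 4 enospc errors during balance",
    "kernel: BTRFS info (device sdg2): 2 enospc errors during balance",
]

counts = enospc_counts(sample_log, "sdg2")
# True only if every run reported fewer errors than the one before it.
decreasing = all(a > b for a, b in zip(counts, counts[1:]))
print(counts, "strictly decreasing:", decreasing)
```

Feeding the syslog lines of a third run into the same function would show
whether the trend continues.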

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup



* ENOSPC errors during balance
@ 2014-07-19 15:26 Marc Joliet
From: Marc Joliet @ 2014-07-19 15:26 UTC
  To: linux-btrfs


Hello all

I'm a somewhat new btrfs user, and have recently finalized the conversion of my
desktop to btrfs with the 1TB backup partition on my 3TB USB3 drive.

The concrete issue is that I have seen two enospc errors this week while running
a full balance on the backup partition.

A little background on the system first: it consists, as of now, of 3 btrfs
file systems (for more details look towards the bottom of this email):

- / on a single 120GB Crucial M500 SSD (data single, metadata DUP)
- /home on 4x320GB disks (data raid10, metadata raid1)
- the aforementioned backup partition on the external USB3 drive (data single,
  metadata DUP)

This is the end result of a conversion process I had started about two to three
months ago.  My system started off with mdraid + LVM2 (/ and /boot on RAID1,
LVM on RAID10), and no SSD, and no backups (nasty, I know).  Both the SSD and
the backup partition were originally ext4, and were converted using
btrfs-convert.

After converting the backup partition about a week ago, following the wiki entry
on ext4 conversion, I eventually ran a full balance (after first converting
metadata to DUP and running a "lighter" balance with -dusage=50 or so, although
that was probably a waste of time).
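
The sequence described above can be sketched as follows. This only builds
the command lines and does not run them. It assumes the `usage=` and
`convert=` balance filters of btrfs-progs and uses the mount point shown
later in this mail; the helper name `balance_command` is hypothetical.

```python
import shlex


def balance_command(mount_point, data_usage=None, metadata_profile=None):
    """Build (but do not run) a `btrfs balance start` command line."""
    argv = ["btrfs", "balance", "start"]
    if data_usage is not None:
        if not 0 <= data_usage <= 100:
            raise ValueError("usage filter is a percentage between 0 and 100")
        # Only touch data chunks that are at most this full.
        argv.append(f"-dusage={data_usage}")
    if metadata_profile is not None:
        # Rewrite metadata chunks with the given profile, e.g. "dup".
        argv.append(f"-mconvert={metadata_profile}")
    argv.append(mount_point)
    return argv


mount = "/run/media/marcec/MARCEC_BACKUP"
steps = [
    balance_command(mount, metadata_profile="dup"),  # convert metadata to DUP
    balance_command(mount, data_usage=50),           # the "lighter" balance
    balance_command(mount),                          # full balance, no filters
]
for argv in steps:
    print(shlex.join(argv))
```

The value 50 stands in for the "-dusage=50 or so" mentioned above; the exact
figure that was used is not known.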

The full balance was still running the same night, but the morning after I found
that it had aborted with ENOSPC (after about 12-13 hours, I think).  My syslog
said the following:

  kernel: BTRFS info (device sdg2): 4 enospc errors during balance

Some additional information: a balance was running on /home (which finished
successfully), and fcron started a weekly scrub on the backup partition, which
finished (also without errors) shortly after the ENOSPC error.

Then, since I wasn't thinking properly, I started a new balance (which ran
fine) before collecting the information below. From memory I can say this:
the total size of metadata and data was nowhere *near* full disk capacity. In
fact, it was very close to the "btrfs fi df" output below. Furthermore, the
output of "btrfs balance status" reported "(-nan)" where "(N considered)"
normally appears (I think the other two numbers were incorrect, too, like "0
out of about 0", but don't remember their exact values).

Now, when I first set up my backups, I decided on rsnapshot.  After converting
the backup partition to btrfs, rsnapshot got *really* slow, so this week I
switched to plain rsync + btrfs-snap, wrapped in two custom shell scripts (I
will switch to btrfs-send/recv once I think that they are stable). This is
consistently faster than rsnapshot on btrfs, but still more erratic and slower
than rsnapshot with an ext4 target file system (about 8-23 minutes vs. 5-10
minutes).

Finally we arrive at today, where, after deleting the rest of my old rsnapshot
backups, I did a full rebalance (because it freed up a *lot* of space, both
data and metadata), which also aborted with ENOSPC:

  kernel: BTRFS info (device sdg2): 2 enospc errors during balance

This time, after about 3 hours, it was almost done (241 of 244 chunks), and
"balance status" didn't show the -nan I had seen the previous time.  The only
thing I noticed was that "total" space jumped by several GB (from 229 to 236 or
so), while "used" only increased by 1-2 GB a few minutes before the balance
aborted.  Starting and aborting another balance freed the space again.

Looking through my local ML archive, I found some problem reports related to
balance.  The one most similar to mine (AFAICT) is "3.14.2 Debian kernel BTRFS
corruption after balance" from Russell Coker, although in my case the file
system has yet to end up corrupted.

And finally, I'd like to make clear that up to now I have been very happy with
btrfs and that this is the first real issue I have encountered with it
(although I don't use a lot of its features yet).  For my usage I definitely
like it a lot more than mdraid + LVM2 :-) .

The requested output as per wiki, from shortly after I got the first error:

    marcec marcec # uname -a

    Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
    x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

    marcec marcec # btrfs --version

    Btrfs v3.14.2

    marcec marcec # btrfs fi show

    Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
            Total devices 1 FS bytes used 42.19GiB
            devid    1 size 107.79GiB used 50.06GiB path /dev/sdf1

    Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
            Total devices 4 FS bytes used 476.02GiB
            devid    1 size 298.09GiB used 238.03GiB path /dev/sda
            devid    2 size 298.09GiB used 239.03GiB path /dev/sdb
            devid    3 size 298.09GiB used 240.00GiB path /dev/sdc
            devid    4 size 298.09GiB used 239.00GiB path /dev/sdd

    Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
            Total devices 1 FS bytes used 167.81GiB
            devid    1 size 976.56GiB used 180.06GiB path /dev/sdg2

    Btrfs v3.14.2

    marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP 

    Data, single: total=160.00GiB, used=159.06GiB
    System, DUP: total=32.00MiB, used=28.00KiB
    Metadata, DUP: total=10.00GiB, used=8.80GiB
    unknown, single: total=512.00MiB, used=40.95MiB

And now from today:

    marcec marcec # uname -a

    Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
    x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

    marcec marcec # btrfs --version 

    Btrfs v3.14.2

    marcec marcec # btrfs fi show 

    Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
            Total devices 1 FS bytes used 42.16GiB
            devid    1 size 107.79GiB used 50.06GiB path /dev/sdf1

    Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
            Total devices 4 FS bytes used 474.84GiB
            devid    1 size 298.09GiB used 238.03GiB path /dev/sda
            devid    2 size 298.09GiB used 239.03GiB path /dev/sdb
            devid    3 size 298.09GiB used 240.00GiB path /dev/sdc
            devid    4 size 298.09GiB used 239.00GiB path /dev/sdd

    Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
            Total devices 1 FS bytes used 231.97GiB
            devid    1 size 976.56GiB used 237.06GiB path /dev/sdg2

    Btrfs v3.14.2

    marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP 

    Data, single: total=229.00GiB, used=228.77GiB
    System, DUP: total=32.00MiB, used=36.00KiB
    Metadata, DUP: total=4.00GiB, used=3.19GiB
    unknown, single: total=512.00MiB, used=0.00
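
To make the "nowhere near full" observation concrete, here is a small
sketch that parses the "btrfs fi df" output above and computes the slack
inside each chunk type, plus the unallocated space from the "btrfs fi show"
figures for /dev/sdg2. The regular expression is an assumption based on the
v3.14.2 output format in this mail, and `parse_fi_df` is a hypothetical
helper.

```python
import re

# One line of `btrfs fi df` output as printed by Btrfs v3.14.2 in this mail;
# other versions may format it differently (assumption).
DF_LINE_RE = re.compile(
    r"^\s*(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<total_unit>[KMGT]iB)?, "
    r"used=(?P<used>[\d.]+)(?P<used_unit>[KMGT]iB)?\s*$"
)

# Conversion factors to GiB; a missing unit (as in "used=0.00") is taken
# to mean plain bytes.
UNIT_TO_GIB = {
    None: 1 / 2**30,
    "KiB": 1 / 2**20,
    "MiB": 1 / 2**10,
    "GiB": 1.0,
    "TiB": float(2**10),
}


def parse_fi_df(text):
    """Map chunk type -> (allocated GiB, used GiB) from `btrfs fi df` text."""
    result = {}
    for line in text.splitlines():
        m = DF_LINE_RE.match(line)
        if m:
            total = float(m["total"]) * UNIT_TO_GIB[m["total_unit"]]
            used = float(m["used"]) * UNIT_TO_GIB[m["used_unit"]]
            result[m["kind"]] = (total, used)
    return result


todays_output = """\
Data, single: total=229.00GiB, used=228.77GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.19GiB
unknown, single: total=512.00MiB, used=0.00
"""

usage = parse_fi_df(todays_output)
for kind, (total, used) in usage.items():
    print(f"{kind:9s} allocated {total:8.2f} GiB, slack {total - used:6.2f} GiB")

# Device size and allocated ("used") figures from `btrfs fi show` for /dev/sdg2.
device_size_gib = 976.56
device_allocated_gib = 237.06
unallocated_gib = device_size_gib - device_allocated_gib
print(f"unallocated: {unallocated_gib:.2f} GiB")
```

The allocated figure is consistent with the chunk totals: DUP chunks are
stored twice, so 229.00 + 2 x 4.00 + 2 x 0.03 GiB gives roughly the
237.06 GiB reported by "btrfs fi show".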

The output of dmesg from both times is attached; however, to avoid exceeding
100KiB, I compressed them with xz first.

Greetings,
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg.log.xz --]
[-- Type: application/x-xz, Size: 36848 bytes --]

[-- Attachment #1.3: dmesg2.log.xz --]
[-- Type: application/x-xz, Size: 26568 bytes --]

