Fw: ENOSPC errors during balance

linux-btrfs.vger.kernel.org archive mirror

* Fw: ENOSPC errors during balance
@ 2014-07-19 20:10 Marc Joliet
  2014-07-19 20:58 ` Marc Joliet
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Joliet @ 2014-07-19 20:10 UTC (permalink / raw)
  To: linux-btrfs


Huh, turns out the Reply-To pointed to Chris Murphy, so here it is again for the
whole list.

Begin forwarded message:

Date: Sat, 19 Jul 2014 20:34:34 +0200
From: Marc Joliet <marcec@gmx.de>
To: Chris Murphy <lists@colorremedies.com>
Subject: Re: ENOSPC errors during balance

On Sat, 19 Jul 2014 11:38:08 -0600,
Chris Murphy <lists@colorremedies.com> wrote:

> The 2nd dmesg (didn't look at the 1st), has many instances like this;
> 
> [96241.882138] ata2.00: exception Emask 0x1 SAct 0x7ffe0fff SErr 0x0 action 0x6 frozen
> [96241.882139] ata2.00: Ata error. fis:0x21
> [96241.882142] ata2.00: failed command: READ FPDMA QUEUED
> [96241.882148] ata2.00: cmd 60/08:00:68:0a:2d/00:00:18:00:00/40 tag 0 ncq 4096 in
>          res 41/00:58:40:5c:2c/00:00:18:00:00/40 Emask 0x1 (device error)
> 
> I'm not sure what this error is; it acts like an unrecoverable read error, but I'm not seeing UNC reported. It looks like ata2.00 is sdb, which is a member of a btrfs raid10 volume. So this isn't related to your sdg2 and enospc error; it's a different problem.

Yeah, from what I remember reading, it's related to nforce2 chipsets, but I
never pursued it, since I never really noticed any consequences (this is an old
computer that I originally built in 2006).  IIRC one workaround is to switch to
1.5 Gbps instead of 3 Gbps.  That port is already at 1.5 Gbps, though, while
none of the other ports are, so it might be the hard drive; I *think* it's older
than the others.  Another workaround is related to irqbalance, which I had
forgotten about.  I've just switched it off and will see if the messages stop,
but then again, my first dmesg didn't have any of those messages.

Anyway, yes, it's unrelated to my problem :-) .

> I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc errors during balance" but it seems informational rather than either a warning or problem. I'd treat ext4->btrfs converted file systems to be something of an odd duck, in that it's uncommon, therefore isn't getting as much testing and extra caution is a good idea. Make frequent backups.

Well, I *could* just recreate the file system.  Since these are my only backups
(no offsite backup as of yet), I wanted to keep the existing ones.  So
btrfs-convert was a convenient way to upgrade.

But since I ended up deleting those backups anyway, I would only be losing my
hourly and a few daily backups.  But it's not as if the file system is otherwise
misbehaving.

Another random idea:  the number of errors decreased the second time I ran
balance (from 4 to 2), I could run another full balance and see if it keeps
decreasing.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: ENOSPC errors during balance
  2014-07-19 20:10 Fw: ENOSPC errors during balance Marc Joliet
@ 2014-07-19 20:58 ` Marc Joliet
  2014-07-20  0:53   ` Chris Murphy
  2014-07-20  1:11   ` Chris Murphy
  0 siblings, 2 replies; 7+ messages in thread
From: Marc Joliet @ 2014-07-19 20:58 UTC (permalink / raw)
  To: linux-btrfs


On Sat, 19 Jul 2014 22:10:51 +0200,
Marc Joliet <marcec@gmx.de> wrote:

[...]
> Another random idea:  the number of errors decreased the second time I ran
> balance (from 4 to 2), I could run another full balance and see if it keeps
> decreasing.

Well, this time there were still 2 ENOSPC errors.  But I can show the df output
after such an ENOSPC error, to illustrate what I meant by the sudden surge
in total usage:

# btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
Data, single: total=236.00GiB, used=229.04GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.20GiB
unknown, single: total=512.00MiB, used=0.00

And then after running a balance and (almost) immediately cancelling:

# btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
Data, single: total=230.00GiB, used=229.04GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.20GiB
unknown, single: total=512.00MiB, used=0.00
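
To put a number on that surge, the two outputs can be reduced to "allocated
minus used" per block group type.  The following is only a rough sketch: the
helper name df_slack and its output format are made up for illustration, and
it assumes the "total=..., used=..." line layout shown above.

```shell
#!/bin/sh
# df_slack (hypothetical helper): read `btrfs filesystem df` output on stdin
# and print, per block group type, the allocated-but-unused space in GiB.
df_slack() {
    awk -F'[:,=]' '
    # Convert a value such as "236.00GiB" or "36.00KiB" to GiB.
    function to_gib(v,   n) {
        n = v + 0                      # numeric prefix of the string
        if (v ~ /KiB/) return n / 1048576
        if (v ~ /MiB/) return n / 1024
        if (v ~ /GiB/) return n
        if (v ~ /TiB/) return n * 1024
        return n / 1073741824          # no suffix: treat as bytes
    }
    # Fields after splitting on ":", "," and "=":
    #   $1 = type, $4 = total, $6 = used
    /total=.*used=/ {
        printf "%s %.2f\n", $1, to_gib($4) - to_gib($6)
    }'
}
```

Piping the first output above through df_slack gives "Data 6.96", and the one
taken after cancelling the balance gives "Data 0.96", so roughly 6 GiB of data
chunks were allocated without being filled.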

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: ENOSPC errors during balance
  2014-07-19 20:58 ` Marc Joliet
@ 2014-07-20  0:53   ` Chris Murphy
  2014-07-20  9:50     ` Marc Joliet
  2014-07-20  1:11   ` Chris Murphy
  1 sibling, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2014-07-20  0:53 UTC (permalink / raw)
  To: Marc Joliet; +Cc: linux-btrfs


On Jul 19, 2014, at 2:58 PM, Marc Joliet <marcec@gmx.de> wrote:

> On Sat, 19 Jul 2014 22:10:51 +0200,
> Marc Joliet <marcec@gmx.de> wrote:
> 
> [...]
>> Another random idea:  the number of errors decreased the second time I ran
>> balance (from 4 to 2), I could run another full balance and see if it keeps
>> decreasing.
> 
> Well, this time there were still 2 ENOSPC errors.  But I can show the df output
> after such an ENOSPC error, to illustrate what I meant with the sudden surge
> in total usage:
> 
> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> Data, single: total=236.00GiB, used=229.04GiB
> System, DUP: total=32.00MiB, used=36.00KiB
> Metadata, DUP: total=4.00GiB, used=3.20GiB
> unknown, single: total=512.00MiB, used=0.00
> 
> And then after running a balance and (almost) immediately cancelling:
> 
> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> Data, single: total=230.00GiB, used=229.04GiB
> System, DUP: total=32.00MiB, used=36.00KiB
> Metadata, DUP: total=4.00GiB, used=3.20GiB
> unknown, single: total=512.00MiB, used=0.00

I think it's a bit weird. Two options:

a. Keep using the file system, with judicious backups; if a dev wants more info they'll reply to the thread.

b. Migrate the data to a new file system. First capture the file system with btrfs-image, in case a dev wants more info after you've blown away the file system, and then move the data to a new btrfs fs. I'd use send/receive for this to preserve subvolumes and snapshots.
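
As a rough sketch of the migration route (nothing here was run in this thread),
the sequence might look like the following. The device sdg2 is from the earlier
messages; OLD, NEW, IMG and SUBVOL are hypothetical placeholders, and the
function only prints the commands rather than executing them.

```shell
#!/bin/sh
# Sketch only. DEV comes from the thread; OLD, NEW, IMG and SUBVOL are
# hypothetical placeholders, not paths anyone in the thread mentioned.
DEV=${DEV:-/dev/sdg2}
OLD=${OLD:-/mnt/old}
NEW=${NEW:-/mnt/new}
IMG=${IMG:-/var/tmp/sdg2.img}
SUBVOL=${SUBVOL:-backups}

# Print the commands instead of running them, so nothing is touched by accident.
migration_plan() {
    # 1. Dump the file system metadata for developers (best done while the
    #    file system is unmounted). -c9 selects the highest compression level.
    echo "btrfs-image -c9 $DEV $IMG"
    # 2. btrfs send needs a read-only snapshot as its source.
    echo "btrfs subvolume snapshot -r $OLD/$SUBVOL $OLD/$SUBVOL.ro"
    # 3. Stream that snapshot into the new file system.
    echo "btrfs send $OLD/$SUBVOL.ro | btrfs receive $NEW"
}
```

Calling migration_plan prints the three commands for review. Steps 2 and 3
would be repeated per subvolume or snapshot; sending later snapshots
incrementally with "btrfs send -p <parent>" is one way to keep shared data
shared on the new file system.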


Chris Murphy


* Re: ENOSPC errors during balance
  2014-07-19 20:58 ` Marc Joliet
  2014-07-20  0:53   ` Chris Murphy
@ 2014-07-20  1:11   ` Chris Murphy
  2014-07-20  9:48     ` Marc Joliet
  1 sibling, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2014-07-20  1:11 UTC (permalink / raw)
  To: Marc Joliet; +Cc: linux-btrfs

I'm seeing this also in the 2nd dmesg:

[  249.893310] BTRFS error (device sdg2): free space inode generation (0) did not match free space cache generation (26286)

So you could try unmounting the volume and doing a one-time mount with the clear_cache mount option. Give it some time to rebuild the space cache.

After that you could umount again, and mount with enospc_debug and try to reproduce the enospc with another balance and see if dmesg contains more information this time.
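
Spelled out as commands, those steps might look roughly like this. It is a
sketch under assumptions: the device is taken to be /dev/sdg2 (from the dmesg
line above) and the mount point /mnt is a placeholder. The run helper is made
up for illustration and only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: DEV is taken from the dmesg line, MNT is an assumed mount point.
DEV=${DEV:-/dev/sdg2}
MNT=${MNT:-/mnt}

# Print each command by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

balance_debug_steps() {
    run umount "$MNT"
    # One-time mount that discards and rebuilds the free space cache.
    # Leave it mounted for a while before continuing.
    run mount -o clear_cache "$DEV" "$MNT"
    run umount "$MNT"
    # Remount with extra ENOSPC diagnostics, then try to reproduce the error.
    run mount -o enospc_debug "$DEV" "$MNT"
    run btrfs balance start "$MNT"
    # Any additional information should show up in the kernel log.
    run dmesg
}
```

Running balance_debug_steps as-is just lists the six commands, which makes it
easy to check device and mount point before doing anything for real.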

Chris Murphy


* Re: ENOSPC errors during balance
  2014-07-20  1:11   ` Chris Murphy
@ 2014-07-20  9:48     ` Marc Joliet
  2014-07-20 19:46       ` Marc Joliet
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Joliet @ 2014-07-20  9:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Murphy



On Sat, 19 Jul 2014 19:11:00 -0600,
Chris Murphy <lists@colorremedies.com> wrote:

> I'm seeing this also in the 2nd dmesg:
> 
> [  249.893310] BTRFS error (device sdg2): free space inode generation (0) did not match free space cache generation (26286)
> 
> 
> So you could try unmounting the volume and doing a one-time mount with the clear_cache mount option. Give it some time to rebuild the space cache.
> 
> After that you could umount again, and mount with enospc_debug and try to reproduce the enospc with another balance and see if dmesg contains more information this time.

OK, I did that, and the new dmesg is attached. Also, some outputs again, first
"filesystem df" (that "total" surge at the end sure is consistent):

# btrfs filesystem df /mnt           
Data, single: total=237.00GiB, used=229.67GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.50GiB, used=3.49GiB
unknown, single: total=512.00MiB, used=0.00

And here is what I described in my initial post, the output of "balance status"
immediately after the error (turns out my memory was correct):

# btrfs filesystem balance status /mnt
Balance on '/mnt' is running
0 out of about 0 chunks balanced (0 considered), -nan% left

(Also, this is with Gentoo kernel 3.15.6 now.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg4.log.xz --]
[-- Type: application/x-xz, Size: 26508 bytes --]


* Re: ENOSPC errors during balance
  2014-07-20  0:53   ` Chris Murphy
@ 2014-07-20  9:50     ` Marc Joliet
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Joliet @ 2014-07-20  9:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Murphy


On Sat, 19 Jul 2014 18:53:03 -0600,
Chris Murphy <lists@colorremedies.com> wrote:

> 
> On Jul 19, 2014, at 2:58 PM, Marc Joliet <marcec@gmx.de> wrote:
> 
> > On Sat, 19 Jul 2014 22:10:51 +0200,
> > Marc Joliet <marcec@gmx.de> wrote:
> > 
> > [...]
> >> Another random idea:  the number of errors decreased the second time I ran
> >> balance (from 4 to 2), I could run another full balance and see if it keeps
> >> decreasing.
> > 
> > Well, this time there were still 2 ENOSPC errors.  But I can show the df output
> > after such an ENOSPC error, to illustrate what I meant with the sudden surge
> > in total usage:
> > 
> > # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> > Data, single: total=236.00GiB, used=229.04GiB
> > System, DUP: total=32.00MiB, used=36.00KiB
> > Metadata, DUP: total=4.00GiB, used=3.20GiB
> > unknown, single: total=512.00MiB, used=0.00
> > 
> > And then after running a balance and (almost) immediately cancelling:
> > 
> > # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
> > Data, single: total=230.00GiB, used=229.04GiB
> > System, DUP: total=32.00MiB, used=36.00KiB
> > Metadata, DUP: total=4.00GiB, used=3.20GiB
> > unknown, single: total=512.00MiB, used=0.00
> 
> I think it's a bit weird. Two options: a. Keep using the file system, with judicious backups, if a dev wants more info they'll reply to the thread; b. Migrate the data to a new file system, first capture the file system with btrfs-image in case a dev wants more info and you've since blown away the filesystem, and then move it to a new btrfs fs. I'd use send/receive for this to preserve subvolumes and snapshots.

OK, I'll keep that in mind.  I'll keep running the file system for now, just in
case it's a run-time error (i.e., a bug in the balance code, and not a problem
with the file system itself).  If it gets trashed on its own, or I move to a new
file system, I'll be sure to follow the steps you outlined.

> Chris Murphy

Thanks
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: ENOSPC errors during balance
  2014-07-20  9:48     ` Marc Joliet
@ 2014-07-20 19:46       ` Marc Joliet
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Joliet @ 2014-07-20 19:46 UTC (permalink / raw)
  To: linux-btrfs



Oh, and because I'm forgetful, here is the new dmesg output.  The new content
(relative to dmesg4) starts at line 2513.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #1.2: dmesg5.log.xz --]
[-- Type: application/x-xz, Size: 24548 bytes --]
