Linux Btrfs filesystem development
* intermittent -ENOSPC errors on btrfs filesystem with 170G free
@ 2015-05-26 17:36 Lennert Buytenhek
  2015-05-26 17:50 ` Chris Murphy
  2015-05-26 18:08 ` Hugo Mills
  0 siblings, 2 replies; 9+ messages in thread
From: Lennert Buytenhek @ 2015-05-26 17:36 UTC (permalink / raw)
  To: linux-btrfs

Hi!

The btrfs filesystem on my newly installed laptop has managed to
hose itself rather thoroughly, and it's now in a state where it
works okay if you don't write too much to it, but if you do, it
starts returning -ENOSPC on a random subset of your filesystem
operations until you let it cool down again.

This was a fresh Fedora 21 install, upgraded to F22, installed
about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
and this system has only ever run 4.0, and it has never had more
than ~60G on it.  It's currently running:

# uname -a
Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.0

# btrfs fi show
Label: 'foobox'  uuid: [...]
        Total devices 1 FS bytes used 58.87GiB
        devid    1 size 229.97GiB used 229.97GiB path /dev/[...]

# btrfs fi df /
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B

There are no btrfs-related messages in the logs, nor anything disk- or
filesystem-related, except a whole bunch of:

[1843230.259205] systemd-journald[692]: /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: IO error, rotating.
[1843230.290505] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1843230.315511] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1843230.348163] systemd-journald[692]: Failed to write entry (23 items, 626 bytes) despite vacuuming, ignoring: Bad message
[1843230.372496] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1843230.385662] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1843230.385750] systemd-journald[692]: Failed to write entry (23 items, 585 bytes), ignoring: Bad message
[1848548.026408] systemd-journald[692]: Failed to sync system journal: Input/output error
[1848642.374799] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1848642.392197] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1848642.392433] systemd-journald[692]: Failed to write entry (21 items, 796 bytes), ignoring: Bad message
[1848642.405032] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1848642.416944] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device

This is with ~170G free.

It's currently still in a funky state:

[root@foobox lib]# df /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/dm-0      241145856 62743256 178027464  27% /
[root@foobox lib]# pwd
/var/lib
[root@foobox lib]# touch foo
[root@foobox lib]# rm -f foo
[root@foobox lib]# ls -ald abrt
drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
[root@foobox lib]# mv abrt abrt2
mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
[root@foobox lib]# 

Any tests anyone wants to run on this before I wipe and reinstall
the box?


cheers,
Lennert


* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 17:36 intermittent -ENOSPC errors on btrfs filesystem with 170G free Lennert Buytenhek
@ 2015-05-26 17:50 ` Chris Murphy
  2015-05-26 18:02   ` Lennert Buytenhek
  2015-05-26 18:08 ` Hugo Mills
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2015-05-26 17:50 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: Btrfs BTRFS

Before wiping, I suggest making an image with btrfs-image. Then see
whether any additional messages appear with the enospc_debug mount
option. Also check with lsattr whether there's any correlation between
the journald-reported failures (the specific .journal file) and whether
that file is +C. Fedora 21 systemd journal files do not have +C by
default, whereas Fedora 22 systemd journal files do, so I'd expect the
journal files to be a mix. Also let us know if there are snapshots that
affect /var/log/journal (either reflink copies of the journal files or
a snapshot of the root subvolume, which contains /var/log/journal).


Chris Murphy


* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 17:50 ` Chris Murphy
@ 2015-05-26 18:02   ` Lennert Buytenhek
  0 siblings, 0 replies; 9+ messages in thread
From: Lennert Buytenhek @ 2015-05-26 18:02 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, May 26, 2015 at 11:50:00AM -0600, Chris Murphy wrote:

> Before wiping I suggest making an image with btrfs-image. And then
> also see if any additional messages appear with the enospc_debug mount
> option. And also see if there's any correlation between the journald
> reported failures (the specific .journal file) and whether it's +C by
> using lsattr. Fedora 21 systemd journal files do not have +C by
> default, whereas Fedora 22 systemd journal files do use +C by default.
> So I'd expect the journal files to be a mix. Also let us know if there
> are snapshots that affect /var/log/journal (either reflink copies of
> the journal files or a snapshot of the root subvolume which contains
> /var/log/journal).

I have:

[root@foobox 5997c521ad1f4293842511a8ae54ff19]# pwd
/var/log/journal/5997c521ad1f4293842511a8ae54ff19
[root@foobox 5997c521ad1f4293842511a8ae54ff19]# lsattr
---------------C ./system@7ea3ae63cd0247cbbe8f33bfec625725-0000000000000001-0005140f825e6aa3.journal
---------------C ./user-1000@f2b557cdffd5430caff9632200741e18-00000000000007dd-0005140f840dd4e4.journal
---------------C ./user-1000@f2b557cdffd5430caff9632200741e18-0000000000019aba-0005167499cd62e4.journal
---------------C ./user-1000.journal
---------------C ./system@00000000000000000000000000000000-0000000000000000-0000000000000000.journal
---------------C ./system.journal
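
For anyone wanting to script the +C check Chris asked for, here is a
minimal Python sketch (not part of the original message). It assumes
lsattr output in the two-column "flags path" form shown above; the
function name has_nocow is made up for illustration, and the second
sample line is hypothetical, added only to show the negative case.

```python
def has_nocow(lsattr_line):
    """Return (path, nocow) for one line of lsattr output.

    The uppercase 'C' flag marks a file as No_COW; lowercase 'c'
    (compression) is a different attribute, so the test is case-sensitive.
    """
    flags, path = lsattr_line.split(None, 1)
    return path.strip(), "C" in flags


sample = [
    "---------------C ./system.journal",   # copied from the listing above
    "---------------- ./example.journal",  # hypothetical file without +C
]

for line in sample:
    path, nocow = has_nocow(line)
    print(f"{path}: {'+C' if nocow else 'no +C'}")
```

On the listing above every journal file carries +C, so there is no mix
of attributes to correlate with the failures.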


And I tried remounting with enospc_debug, which seems to have
succeeded:

[root@foobox ~]# cat /proc/mounts  | grep enos
/dev/dm-0 / btrfs rw,seclabel,relatime,ssd,space_cache,enospc_debug 0 0


But I still get the -ENOSPC errors:

[root@foobox ~]# cd /var/lib
[root@foobox lib]# mv abrt abrt2
mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
[root@foobox lib]#

Yet nothing is appearing in dmesg about this.


I didn't create any snapshots, and I don't think I have any.

[root@foobox ~]# btrfs subvolume list /
ID 257 gen 125768 top level 5 path root
ID 260 gen 31676 top level 257 path var/lib/machines


I think the journal thing is somewhat of a red herring, though, as lots
of I/O is returning -ENOSPC right now, not just the journal related
operations, yet only journald is syslogging the I/O errors it sees.


* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 17:36 intermittent -ENOSPC errors on btrfs filesystem with 170G free Lennert Buytenhek
  2015-05-26 17:50 ` Chris Murphy
@ 2015-05-26 18:08 ` Hugo Mills
  2015-05-26 18:18   ` Chris Murphy
  2015-06-09 10:54   ` Lennert Buytenhek
  1 sibling, 2 replies; 9+ messages in thread
From: Hugo Mills @ 2015-05-26 18:08 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: linux-btrfs


On Tue, May 26, 2015 at 08:36:34PM +0300, Lennert Buytenhek wrote:
> Hi!
> 
> The btrfs filesystem on my newly installed laptop has managed to
> hose itself rather thoroughly, and it's now in a state where it
> works okay if you don't write too much to it, but if you do, it
> starts returning -ENOSPC on a random subset of your filesystem
> operations until you let it cool down again.
> 
> This was a fresh Fedora 21 install, upgraded to F22, installed
> about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
> and this system has only ever run 4.0, and it has never had more
> than ~60G on it.  It's currently running:
>
> # uname -a
> Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> # btrfs --version
> btrfs-progs v4.0
> 
> # btrfs fi show
> Label: 'foobox'  uuid: [...]
>         Total devices 1 FS bytes used 58.87GiB
>         devid    1 size 229.97GiB used 229.97GiB path /dev/[...]

   All the space has been allocated for some purpose.

> # btrfs fi df /
> Data, single: total=227.94GiB, used=58.16GiB
> System, DUP: total=8.00MiB, used=48.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=1.00GiB, used=730.80MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=256.00MiB, used=0.00B

   For a filesystem of this size, a very small proportion has gone to
metadata for some reason. This is odd. Given that it's happened,
though, all the other behaviour is as expected.
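
To make these figures concrete, here is a minimal Python sketch (not
part of the original message) that parses the quoted btrfs fi df output
and reports, per block-group type, how much space is allocated but
unused. The function name parse_fi_df and the regular expression are
illustrative assumptions; the input text is copied from the report.

```python
import re

# Output of `btrfs fi df /` as quoted in this thread.
FI_DF = """\
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B
"""

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$")


def parse_fi_df(text):
    """Return a list of (type, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a block-group line
        kind, profile, total, t_unit, used, u_unit = m.groups()
        rows.append((kind, profile,
                     float(total) * UNITS[t_unit],
                     float(used) * UNITS[u_unit]))
    return rows


for kind, profile, total, used in parse_fi_df(FI_DF):
    slack_gib = (total - used) / 2**30
    print(f"{kind:14s}{profile:7s} allocated but unused: {slack_gib:8.2f} GiB")
```

On these numbers about 169.78 GiB sits in data chunks that are
allocated but unused, while metadata has under 300 MiB of slack. With
btrfs fi show reporting the device as fully allocated, that is
consistent with the reading above: the free space exists, but it is
locked up in data chunks.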

> It's currently still in a funky state:
> 
> [root@foobox lib]# df /
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/dm-0      241145856 62743256 178027464  27% /

   Looks reasonable, given the figures above.

> [root@foobox lib]# pwd
> /var/lib
> [root@foobox lib]# touch foo
> [root@foobox lib]# rm -f foo
> [root@foobox lib]# ls -ald abrt
> drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
> [root@foobox lib]# mv abrt abrt2
> mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
> [root@foobox lib]# 
> 
> Any tests anyone wants to run on this before I wipe and reinstall
> the box?

   No tests needed. Just run a filtered balance on it to clean up
unused chunks:

# btrfs balance start -dusage=5 /

as suggested in the FAQ [1].

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29
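
If the balance needs to be repeated with progressively higher usage
thresholds, a small wrapper can generate the commands. The following
Python sketch is not from the original message: the function names and
the dry_run parameter are illustrative, and the default thresholds are
simply the values Lennert tries later in this thread. Only the
btrfs balance start -dusage=N form quoted above is used.

```python
import subprocess


def balance_commands(mountpoint, thresholds=(5, 10, 20, 50, 80)):
    """Build one `btrfs balance start -dusage=N` argv per threshold."""
    return [
        ["btrfs", "balance", "start", f"-dusage={pct}", mountpoint]
        for pct in sorted(thresholds)
    ]


def run_incremental_balance(mountpoint, thresholds=(5, 10, 20, 50, 80),
                            dry_run=True):
    """Print, or actually run, the balances in order of increasing threshold."""
    for argv in balance_commands(mountpoint, thresholds):
        if dry_run:
            print(" ".join(argv))
        else:
            # Needs root and a mounted btrfs filesystem; raises on failure.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_incremental_balance("/")  # dry run: only prints the commands
```

Starting low keeps the amount of data that has to be rewritten small;
higher thresholds select fuller chunks and therefore cost more I/O.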

-- 
Hugo Mills             | Two things came out of Berkeley in the 1960s: LSD
hugo@... carfax.org.uk | and Unix. This is not a coincidence.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |



* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 18:08 ` Hugo Mills
@ 2015-05-26 18:18   ` Chris Murphy
  2015-05-26 18:50     ` Calvin Walton
  2015-05-26 18:57     ` Holger Hoffstätte
  2015-06-09 10:54   ` Lennert Buytenhek
  1 sibling, 2 replies; 9+ messages in thread
From: Chris Murphy @ 2015-05-26 18:18 UTC (permalink / raw)
  To: Hugo Mills, Lennert Buytenhek, Btrfs BTRFS

Oh yeah easy to miss, but obvious once pointed out:

> Data, single: total=227.94GiB, used=58.16GiB

I thought we had automatic deallocation of unused chunks but I guess
it hasn't landed yet.


Chris Murphy


* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 18:18   ` Chris Murphy
@ 2015-05-26 18:50     ` Calvin Walton
  2015-05-26 19:29       ` Chris Murphy
  2015-05-26 18:57     ` Holger Hoffstätte
  1 sibling, 1 reply; 9+ messages in thread
From: Calvin Walton @ 2015-05-26 18:50 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Hugo Mills, Lennert Buytenhek, Btrfs BTRFS

On Tue, 2015-05-26 at 12:18 -0600, Chris Murphy wrote:
> Oh yeah easy to miss, but obvious once pointed out:
> 
> > Data, single: total=227.94GiB, used=58.16GiB
> 
> I thought we had automatic deallocation of unused chunks but I guess
> it hasn't landed yet.

We do have automatic deallocation of unused chunks. (I know it's in 
4.0, dunno about earlier versions.)

Unfortunately, this only deallocates *completely* unused chunks - if 
you're e.g. writing a bunch of small files and large files at the same 
time, then delete the large files, you could end up with a bunch of 
data chunks that are mostly, but not completely, empty: they still 
have the small files hanging around.

In this case a balance is still necessary to clean things up.
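
The difference between the two mechanisms can be illustrated with a toy
model. This Python sketch is not from the original message and does not
reflect the kernel's actual allocator: the 1 GiB chunk size, the class
and function names, and the tight repacking are all simplifying
assumptions made for illustration.

```python
from dataclasses import dataclass

CHUNK_SIZE_GIB = 1.0  # assumed nominal data chunk size for this toy model


@dataclass
class Chunk:
    used_gib: float


def reclaim_empty(chunks):
    """Automatic cleanup as described above: drop only completely empty chunks."""
    return [c for c in chunks if c.used_gib > 0]


def balance_usage(chunks, percent):
    """Toy usage-filtered balance: repack every chunk below `percent` full."""
    limit = CHUNK_SIZE_GIB * percent / 100
    keep = [c for c in chunks if c.used_gib >= limit]
    moved = sum(c.used_gib for c in chunks if c.used_gib < limit)
    # Pack the relocated data into as few new chunks as possible.
    while moved > 1e-9:
        fill = min(CHUNK_SIZE_GIB, moved)
        keep.append(Chunk(fill))
        moved -= fill
    return keep


# Ten chunks that each still hold ~10 MiB of small files after the large
# files were deleted, plus two chunks that are completely empty.
chunks = [Chunk(0.01) for _ in range(10)] + [Chunk(0.0) for _ in range(2)]

after_auto = reclaim_empty(chunks)
after_balance = balance_usage(after_auto, 5)

print(len(chunks), len(after_auto), len(after_balance))  # prints: 12 10 1
```

In this model the automatic cleanup frees only the two empty chunks,
while the filtered balance consolidates the remaining ten into one.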

-- 
Calvin Walton <calvin.walton@kepstin.ca>


* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 18:18   ` Chris Murphy
  2015-05-26 18:50     ` Calvin Walton
@ 2015-05-26 18:57     ` Holger Hoffstätte
  1 sibling, 0 replies; 9+ messages in thread
From: Holger Hoffstätte @ 2015-05-26 18:57 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 26 May 2015 12:18:40 -0600, Chris Murphy wrote:

> Oh yeah easy to miss, but obvious once pointed out:
> 
>> Data, single: total=227.94GiB, used=58.16GiB
> 
> I thought we had automatic deallocation of unused chunks but I guess
> it hasn't landed yet.

It did (in 3.18 IIRC), but that doesn't help with - for whatever reason -
severely unbalanced chunks, i.e. all allocated but each used only a
little bit. Now, how and why that happened..

-h



* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 18:50     ` Calvin Walton
@ 2015-05-26 19:29       ` Chris Murphy
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2015-05-26 19:29 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Chris Murphy, Hugo Mills, Lennert Buytenhek, Btrfs BTRFS

On Tue, May 26, 2015 at 12:50 PM, Calvin Walton
<calvin.walton@kepstin.ca> wrote:
> On Tue, 2015-05-26 at 12:18 -0600, Chris Murphy wrote:
>> Oh yeah easy to miss, but obvious once pointed out:
>>
>> > Data, single: total=227.94GiB, used=58.16GiB
>>
>> I thought we had automatic deallocation of unused chunks but I guess
>> it hasn't landed yet.
>
> We do have automatic deallocation of unused chunks. (I know it's in
> 4.0, dunno about earlier versions.)
>
> Unfortunately, this only deallocates *completely* unused chunks

oic.

-- 
Chris Murphy


* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
  2015-05-26 18:08 ` Hugo Mills
  2015-05-26 18:18   ` Chris Murphy
@ 2015-06-09 10:54   ` Lennert Buytenhek
  1 sibling, 0 replies; 9+ messages in thread
From: Lennert Buytenhek @ 2015-06-09 10:54 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs

On Tue, May 26, 2015 at 06:08:20PM +0000, Hugo Mills wrote:

> > The btrfs filesystem on my newly installed laptop has managed to
> > hose itself rather thoroughly, and it's now in a state where it
> > works okay if you don't write too much to it, but if you do, it
> > starts returning -ENOSPC on a random subset of your filesystem
> > operations until you let it cool down again.
> > 
> > This was a fresh Fedora 21 install, upgraded to F22, installed
> > about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
> > and this system has only ever run 4.0, and it has never had more
> > than ~60G on it.  It's currently running:
> >
> > # uname -a
> > Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > # btrfs --version
> > btrfs-progs v4.0
> > 
> > # btrfs fi show
> > Label: 'foobox'  uuid: [...]
> >         Total devices 1 FS bytes used 58.87GiB
> >         devid    1 size 229.97GiB used 229.97GiB path /dev/[...]
> 
>    All the space has been allocated for some purpose.
> 
> > # btrfs fi df /
> > Data, single: total=227.94GiB, used=58.16GiB
> > System, DUP: total=8.00MiB, used=48.00KiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, DUP: total=1.00GiB, used=730.80MiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=256.00MiB, used=0.00B
> 
>    For a filesystem of this size, a very small proportion has gone to
> metadata for some reason. This is odd. Given that it's happened,
> though, all the other behaviour is as expected.
> 
> > It's currently still in a funky state:
> > 
> > [root@foobox lib]# df /
> > Filesystem     1K-blocks     Used Available Use% Mounted on
> > /dev/dm-0      241145856 62743256 178027464  27% /
> 
>    Looks reasonable, given the figures above.
> 
> > [root@foobox lib]# pwd
> > /var/lib
> > [root@foobox lib]# touch foo
> > [root@foobox lib]# rm -f foo
> > [root@foobox lib]# ls -ald abrt
> > drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
> > [root@foobox lib]# mv abrt abrt2
> > mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
> > [root@foobox lib]# 
> > 
> > Any tests anyone wants to run on this before I wipe and reinstall
> > the box?
> 
>    No tests needed. Just run a filtered balance on it to clean up
> unused chunks:
> 
> # btrfs balance start -dusage=5 /
> 
> as suggested in the FAQ [1].

Doing this helped for a few days, but now I'm back in a state where
I can't create any files at all -- everything fails with -ENOSPC.  The
filesystem isn't even half full and has never been more than half full.

[root@foobox ~]# uname -a
Linux foobox.wantstofly.org 4.0.4-301.fc22.x86_64 #1 SMP Thu May 21 13:10:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@foobox ~]# btrfs --version
btrfs-progs v4.0

[root@foobox ~]# btrfs fi show
Label: 'foobox'  uuid: [...]
        Total devices 1 FS bytes used 98.72GiB
        devid    1 size 229.97GiB used 100.97GiB path /dev/mapper/[...]

btrfs-progs v4.0

[root@foobox ~]# btrfs fi df /
Data, single: total=98.00GiB, used=98.00GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.47GiB, used=737.73MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B

[root@foobox ~]# df /
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/dm-0      241145856 104533304 135265480  44% /


The balancing trick doesn't do anything:

[root@foobox ~]# btrfs balance start -dusage=5 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=10 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=20 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=50 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=80 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# 


Any ideas?
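
As a cross-check of the figures in this last report, the following
Python sketch (an addition, not part of the original message) adds up
the block-group totals and compares them with what btrfs fi show
reports. It assumes DUP block groups occupy twice their reported total
on disk and that GlobalReserve is carved out of metadata rather than
allocated separately; the variable names are illustrative.

```python
# Figures copied from the `btrfs fi show` and `btrfs fi df` output above (GiB).
device_size = 229.97
device_allocated = 100.97

# (reported total in GiB, copies on disk): DUP stores two copies, single one.
# GlobalReserve is left out on the assumption that it lives inside metadata.
block_groups = {
    "Data, single":     (98.00, 1),
    "System, DUP":      (8.00 / 1024, 2),
    "System, single":   (4.00 / 1024, 1),
    "Metadata, DUP":    (1.47, 2),
    "Metadata, single": (8.00 / 1024, 1),
}

raw_allocated = sum(total * copies for total, copies in block_groups.values())
unallocated = device_size - device_allocated

print(f"sum of block groups: {raw_allocated:.2f} GiB")
print(f"reported allocated:  {device_allocated:.2f} GiB")
print(f"unallocated:         {unallocated:.2f} GiB")
```

The block groups add up to about 100.97 GiB, matching the device
figure, which leaves roughly 129 GiB unallocated. Data is reported as
98.00 GiB used out of 98.00 GiB, which is consistent with every -dusage
filter up to 80 finding nothing to relocate. The arithmetic does not by
itself explain the -ENOSPC errors, and the thread ends without a
diagnosis.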

