* intermittent -ENOSPC errors on btrfs filesystem with 170G free
@ 2015-05-26 17:36 Lennert Buytenhek
2015-05-26 17:50 ` Chris Murphy
2015-05-26 18:08 ` Hugo Mills
0 siblings, 2 replies; 9+ messages in thread
From: Lennert Buytenhek @ 2015-05-26 17:36 UTC (permalink / raw)
To: linux-btrfs
Hi!
The btrfs filesystem on my newly installed laptop has managed to
hose itself rather thoroughly, and it's now in a state where it
works okay if you don't write too much to it, but if you do, it
starts returning -ENOSPC on a random subset of your filesystem
operations until you let it cool down again.
This was a fresh Fedora 21 install, upgraded to F22, installed
about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
and this system has only ever run 4.0, and it has never had more
than ~60G on it. It's currently running:
# uname -a
Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.0
# btrfs fi show
Label: 'foobox' uuid: [...]
Total devices 1 FS bytes used 58.87GiB
devid 1 size 229.97GiB used 229.97GiB path /dev/[...]
# btrfs fi df /
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B
There are zero btrfs-related messages in the logs, or anything disk or
filesystem related, except a whole bunch of:
[1843230.259205] systemd-journald[692]: /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: IO error, rotating.
[1843230.290505] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1843230.315511] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1843230.348163] systemd-journald[692]: Failed to write entry (23 items, 626 bytes) despite vacuuming, ignoring: Bad message
[1843230.372496] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1843230.385662] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1843230.385750] systemd-journald[692]: Failed to write entry (23 items, 585 bytes), ignoring: Bad message
[1848548.026408] systemd-journald[692]: Failed to sync system journal: Input/output error
[1848642.374799] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1848642.392197] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1848642.392433] systemd-journald[692]: Failed to write entry (21 items, 796 bytes), ignoring: Bad message
[1848642.405032] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1848642.416944] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
This is with ~170G free.
It's currently still in a funky state:
[root@foobox lib]# df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 241145856 62743256 178027464 27% /
[root@foobox lib]# pwd
/var/lib
[root@foobox lib]# touch foo
[root@foobox lib]# rm -f foo
[root@foobox lib]# ls -ald abrt
drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
[root@foobox lib]# mv abrt abrt2
mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
[root@foobox lib]#
Any tests anyone wants to run on this before I wipe and reinstall
the box?
cheers,
Lennert
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Chris Murphy @ 2015-05-26 17:50 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: Btrfs BTRFS
Before wiping I suggest making an image with btrfs-image. And then
also see if any additional messages appear with the enospc_debug mount
option. And also see if there's any correlation between the journald
reported failures (the specific .journal file) and whether it's +C by
using lsattr. Fedora 21 systemd journal files do not have +C by
default, whereas Fedora 22 systemd journal files do use +C by default.
So I'd expect the journal files to be a mix. Also let us know if there
are snapshots that affect /var/log/journal (either reflink copies of
the journal files or a snapshot of the root subvolume which contains
/var/log/journal).
Chris Murphy
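The lsattr check suggested above could be scripted roughly as follows. This is only a sketch: `list_cow_journals` is a hypothetical helper name, and it assumes lsattr's usual two-column output (attribute flags, then path) and paths without whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read `lsattr` output on stdin and print the paths
# whose attribute field does NOT contain the C (No_COW) flag.
list_cow_journals() {
    awk '$1 !~ /C/ { print $2 }'
}

# Possible use on the journal directory from the report:
#   lsattr /var/log/journal/*/*.journal | list_cow_journals
```

An empty result would mean every listed journal file already carries +C, so the failures could not be attributed to a mix of COW and No_COW journals.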
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Lennert Buytenhek @ 2015-05-26 18:02 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Tue, May 26, 2015 at 11:50:00AM -0600, Chris Murphy wrote:
> Before wiping I suggest making an image with btrfs-image. And then
> also see if any additional messages appear with the enospc_debug mount
> option. And also see if there's any correlation between the journald
> reported failures (the specific .journal file) and whether it's +C by
> using lsattr. Fedora 21 systemd journal files do not have +C by
> default, whereas Fedora 22 systemd journal files do use +C by default.
> So I'd expect the journal files to be a mix. Also let us know if there
> are snapshots that affect /var/log/journal (either reflink copies of
> the journal files or a snapshot of the root subvolume which contains
> /var/log/journal).
I have:
[root@foobox 5997c521ad1f4293842511a8ae54ff19]# pwd
/var/log/journal/5997c521ad1f4293842511a8ae54ff19
[root@foobox 5997c521ad1f4293842511a8ae54ff19]# lsattr
---------------C ./system@7ea3ae63cd0247cbbe8f33bfec625725-0000000000000001-0005140f825e6aa3.journal
---------------C ./user-1000@f2b557cdffd5430caff9632200741e18-00000000000007dd-0005140f840dd4e4.journal
---------------C ./user-1000@f2b557cdffd5430caff9632200741e18-0000000000019aba-0005167499cd62e4.journal
---------------C ./user-1000.journal
---------------C ./system@00000000000000000000000000000000-0000000000000000-0000000000000000.journal
---------------C ./system.journal
And I tried remounting with enospc_debug, which seems to have
succeeded:
[root@foobox ~]# cat /proc/mounts | grep enos
/dev/dm-0 / btrfs rw,seclabel,relatime,ssd,space_cache,enospc_debug 0 0
But I still get the -ENOSPC errors:
[root@foobox ~]# cd /var/lib
[root@foobox lib]# mv abrt abrt2
mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
[root@foobox lib]#
Yet nothing is appearing in dmesg about this.
I didn't create any snapshots, and I don't think I have any.
[root@foobox ~]# btrfs subvolume list /
ID 257 gen 125768 top level 5 path root
ID 260 gen 31676 top level 257 path var/lib/machines
I think the journal thing is somewhat of a red herring, though, as lots
of I/O is returning -ENOSPC right now, not just the journal related
operations, yet only journald is syslogging the I/O errors it sees.
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Hugo Mills @ 2015-05-26 18:08 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: linux-btrfs
On Tue, May 26, 2015 at 08:36:34PM +0300, Lennert Buytenhek wrote:
> Hi!
>
> The btrfs filesystem on my newly installed laptop has managed to
> hose itself rather thoroughly, and it's now in a state where it
> works okay if you don't write too much to it, but if you do, it
> starts returning -ENOSPC on a random subset of your filesystem
> operations until you let it cool down again.
>
> This was a fresh Fedora 21 install, upgraded to F22, installed
> about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
> and this system has only ever run 4.0, and it has never had more
> than ~60G on it. It's currently running:
>
> # uname -a
> Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.0
>
> # btrfs fi show
> Label: 'foobox' uuid: [...]
> Total devices 1 FS bytes used 58.87GiB
> devid 1 size 229.97GiB used 229.97GiB path /dev/[...]
All the space has been allocated for some purpose.
> # btrfs fi df /
> Data, single: total=227.94GiB, used=58.16GiB
> System, DUP: total=8.00MiB, used=48.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=1.00GiB, used=730.80MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=256.00MiB, used=0.00B
For a filesystem of this size, a very small proportion has gone to
metadata for some reason. This is odd. Given that it's happened,
though, all the other behaviour is as expected.
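The point about allocation can be checked from the quoted figures. A rough sketch follows; `chunk_slack` is a hypothetical helper name, and it assumes the `total=...`/`used=...` line format shown in the `btrfs fi df` output above.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs fi df` output on stdin and print, for each
# line, how much space is allocated to chunks but not used (in MiB).
chunk_slack() {
    awk '
    # Convert a value such as "227.94GiB" or "48.00KiB" to MiB.
    function to_mib(s,    n) {
        n = s + 0                      # numeric prefix of the string
        if (s ~ /TiB$/) return n * 1024 * 1024
        if (s ~ /GiB$/) return n * 1024
        if (s ~ /MiB$/) return n
        if (s ~ /KiB$/) return n / 1024
        return n / 1024 / 1024         # plain bytes, e.g. "0.00B"
    }
    /total=/ && /used=/ {
        label = $0; sub(/: total=.*/, "", label)
        total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
        used = $0;  sub(/.*used=/, "", used)
        printf "%s: %.2f MiB allocated but unused\n", label, to_mib(total) - to_mib(used)
    }'
}

# Possible use: btrfs fi df / | chunk_slack
```

On the figures quoted above this reports 173854.72 MiB (about 169.78 GiB) of slack in data chunks and 293.20 MiB in DUP metadata, while `btrfs fi show` gives size 229.97GiB and used 229.97GiB, i.e. nothing left unallocated for new chunks.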
> It's currently still in a funky state:
>
> [root@foobox lib]# df /
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/dm-0 241145856 62743256 178027464 27% /
Looks reasonable, given the figures above.
> [root@foobox lib]# pwd
> /var/lib
> [root@foobox lib]# touch foo
> [root@foobox lib]# rm -f foo
> [root@foobox lib]# ls -ald abrt
> drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
> [root@foobox lib]# mv abrt abrt2
> mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
> [root@foobox lib]#
>
> Any tests anyone wants to run on this before I wipe and reinstall
> the box?
No tests needed. Just run a filtered balance on it to clean up
unused chunks:
# btrfs balance start -dusage=5 /
as suggested in the FAQ [1].
Hugo.
[1] https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29
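The filtered balance above is often run in steps with a rising usage threshold. A minimal sketch, assuming a hypothetical wrapper `stepped_balance` and a hypothetical `BTRFS` variable that exists only so the loop can be dry-run; the `btrfs balance start -dusage=N <mountpoint>` invocation itself is the one quoted above.

```shell
#!/bin/sh
# Sketch: run filtered data balances with increasing usage thresholds.
# BTRFS is a hypothetical override for dry runs (e.g. BTRFS=echo).
stepped_balance() {
    mountpoint=$1
    shift
    for pct in "$@"; do
        # -dusage=N restricts the balance to data chunks below N% usage.
        "${BTRFS:-btrfs}" balance start "-dusage=$pct" "$mountpoint" || return 1
    done
}

# Possible use: stepped_balance / 5 10 20 50
```

Starting with a low threshold keeps each pass cheap, since nearly empty chunks need little data moved; the loop stops at the first pass that fails.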
--
Hugo Mills | Two things came out of Berkeley in the 1960s: LSD
hugo@... carfax.org.uk | and Unix. This is not a coincidence.
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Chris Murphy @ 2015-05-26 18:18 UTC (permalink / raw)
To: Hugo Mills, Lennert Buytenhek, Btrfs BTRFS
Oh yeah easy to miss, but obvious once pointed out:
> Data, single: total=227.94GiB, used=58.16GiB
I thought we had automatic deallocation of unused chunks but I guess
it hasn't landed yet.
Chris Murphy
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Calvin Walton @ 2015-05-26 18:50 UTC (permalink / raw)
To: Chris Murphy; +Cc: Hugo Mills, Lennert Buytenhek, Btrfs BTRFS
On Tue, 2015-05-26 at 12:18 -0600, Chris Murphy wrote:
> Oh yeah easy to miss, but obvious once pointed out:
>
> > Data, single: total=227.94GiB, used=58.16GiB
>
> I thought we had automatic deallocation of unused chunks but I guess
> it hasn't landed yet.
We do have automatic deallocation of unused chunks. (I know it's in
4.0, dunno about earlier versions.)
Unfortunately, this only deallocates *completely* unused chunks - if
you're e.g. writing a bunch of small files and large files at the same
time, then delete the large files, you could end up with a bunch of
data chunks that are mostly, but not completely, empty: they still
have the small files hanging around.
In this case a balance is still necessary to clean things up.
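The distinction described above can be illustrated with a toy model. This is not btrfs code: `classify_chunks` is a hypothetical helper, and the per-chunk usage percentages fed to it are made up for illustration.

```shell
#!/bin/sh
# Toy model, not btrfs code: classify hypothetical per-chunk usage percentages
# (one per line on stdin) against a balance usage threshold given as $1.
classify_chunks() {
    awk -v limit="$1" '
    $1 == 0     { auto++; next }      # completely empty: freed automatically
    $1 < limit  { balance++; next }   # partly used: needs a filtered balance
                { untouched++ }       # at or above the filter threshold
    END { printf "auto=%d balance=%d untouched=%d\n", auto, balance, untouched }'
}

# Example with made-up usage figures:
#   printf '0\n3\n4\n90\n' | classify_chunks 5
#   prints: auto=1 balance=2 untouched=1
```

In this model the chunks at 3% and 4% are the "small files hanging around" case: they are never reclaimed on their own, but a balance with `-dusage=5` would pick them up.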
--
Calvin Walton <calvin.walton@kepstin.ca>
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Holger Hoffstätte @ 2015-05-26 18:57 UTC (permalink / raw)
To: linux-btrfs
On Tue, 26 May 2015 12:18:40 -0600, Chris Murphy wrote:
> Oh yeah easy to miss, but obvious once pointed out:
>
>> Data, single: total=227.94GiB, used=58.16GiB
>
> I thought we had automatic deallocation of unused chunks but I guess
> it hasn't landed yet.
It did (in 3.18 IIRC), but that doesn't help with - for whatever reason -
severely unbalanced chunks, i.e. all allocated but each used only a
little bit. Now, how and why that happened..
-h
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Chris Murphy @ 2015-05-26 19:29 UTC (permalink / raw)
To: Calvin Walton; +Cc: Chris Murphy, Hugo Mills, Lennert Buytenhek, Btrfs BTRFS
On Tue, May 26, 2015 at 12:50 PM, Calvin Walton
<calvin.walton@kepstin.ca> wrote:
> On Tue, 2015-05-26 at 12:18 -0600, Chris Murphy wrote:
>> Oh yeah easy to miss, but obvious once pointed out:
>>
>> > Data, single: total=227.94GiB, used=58.16GiB
>>
>> I thought we had automatic deallocation of unused chunks but I guess
>> it hasn't landed yet.
>
> We do have automatic deallocation of unused chunks. (I know it's in
> 4.0, dunno about earlier versions.)
>
> Unfortunately, this only deallocates *completely* unused chunks
oic.
--
Chris Murphy
* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Lennert Buytenhek @ 2015-06-09 10:54 UTC (permalink / raw)
To: Hugo Mills, linux-btrfs
On Tue, May 26, 2015 at 06:08:20PM +0000, Hugo Mills wrote:
> > The btrfs filesystem on my newly installed laptop has managed to
> > hose itself rather thoroughly, and it's now in a state where it
> > works okay if you don't write too much to it, but if you do, it
> > starts returning -ENOSPC on a random subset of your filesystem
> > operations until you let it cool down again.
> >
> > This was a fresh Fedora 21 install, upgraded to F22, installed
> > about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
> > and this system has only ever run 4.0, and it has never had more
> > than ~60G on it. It's currently running:
> >
> > # uname -a
> > Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > # btrfs --version
> > btrfs-progs v4.0
> >
> > # btrfs fi show
> > Label: 'foobox' uuid: [...]
> > Total devices 1 FS bytes used 58.87GiB
> > devid 1 size 229.97GiB used 229.97GiB path /dev/[...]
>
> All the space has been allocated for some purpose.
>
> > # btrfs fi df /
> > Data, single: total=227.94GiB, used=58.16GiB
> > System, DUP: total=8.00MiB, used=48.00KiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, DUP: total=1.00GiB, used=730.80MiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=256.00MiB, used=0.00B
>
> For a filesystem of this size, a very small proportion has gone to
> metadata for some reason. This is odd. Given that it's happened,
> though, all the other behaviour is as expected.
>
> > It's currently still in a funky state:
> >
> > [root@foobox lib]# df /
> > Filesystem 1K-blocks Used Available Use% Mounted on
> > /dev/dm-0 241145856 62743256 178027464 27% /
>
> Looks reasonable, given the figures above.
>
> > [root@foobox lib]# pwd
> > /var/lib
> > [root@foobox lib]# touch foo
> > [root@foobox lib]# rm -f foo
> > [root@foobox lib]# ls -ald abrt
> > drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
> > [root@foobox lib]# mv abrt abrt2
> > mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
> > [root@foobox lib]#
> >
> > Any tests anyone wants to run on this before I wipe and reinstall
> > the box?
>
> No tests needed. Just run a filtered balance on it to clean up
> unused chunks:
>
> # btrfs balance start -dusage=5 /
>
> as suggested in the FAQ [1].
Doing this helped for a few days, but now I'm back in a state where
I can't create any files at all -- everything fails with -ENOSPC. The
filesystem isn't even half full and has never been more than half full.
[root@foobox ~]# uname -a
Linux foobox.wantstofly.org 4.0.4-301.fc22.x86_64 #1 SMP Thu May 21 13:10:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@foobox ~]# btrfs --version
btrfs-progs v4.0
[root@foobox ~]# btrfs fi show
Label: 'foobox' uuid: [...]
Total devices 1 FS bytes used 98.72GiB
devid 1 size 229.97GiB used 100.97GiB path /dev/mapper/[...]
btrfs-progs v4.0
[root@foobox ~]# btrfs fi df /
Data, single: total=98.00GiB, used=98.00GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.47GiB, used=737.73MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B
[root@foobox ~]# df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 241145856 104533304 135265480 44% /
The balancing trick doesn't do anything:
[root@foobox ~]# btrfs balance start -dusage=5 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=10 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=20 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=50 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=80 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]#
Any ideas?
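For comparison with the first report, the unallocated space can be read off the `btrfs fi show` devid line. A small sketch; `unallocated_gib` is a hypothetical helper, and it assumes both figures on the line are printed in GiB, as they are in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs fi show` output on stdin and print the
# unallocated space (size minus used) for each devid line, in GiB.
# Assumes both figures are printed in GiB, as in the output above.
unallocated_gib() {
    awk '$1 == "devid" && $3 == "size" && $5 == "used" {
        printf "devid %s: %.2f GiB unallocated\n", $2, ($4 + 0) - ($6 + 0)
    }'
}

# Possible use: btrfs fi show | unallocated_gib
```

On the two reports in this thread this gives 0.00 GiB unallocated for the first and 129.00 GiB for the second. In the second report the data chunks are shown as full (total=98.00GiB, used=98.00GiB), which is consistent with usage filters up to 80 relocating 0 chunks.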