* intermittent -ENOSPC errors on btrfs filesystem with 170G free
@ 2015-05-26 17:36 Lennert Buytenhek
From: Lennert Buytenhek @ 2015-05-26 17:36 UTC
To: linux-btrfs
Hi!
The btrfs filesystem on my newly installed laptop has managed to
hose itself rather thoroughly, and it's now in a state where it
works okay if you don't write too much to it, but if you do, it
starts returning -ENOSPC on a random subset of your filesystem
operations until you let it cool down again.
This was a fresh Fedora 21 install, upgraded to F22, installed
about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
and this system has only ever run 4.0, and it has never had more
than ~60G on it. It's currently running:
# uname -a
Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.0
# btrfs fi show
Label: 'foobox' uuid: [...]
Total devices 1 FS bytes used 58.87GiB
devid 1 size 229.97GiB used 229.97GiB path /dev/[...]
# btrfs fi df /
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B
There's zero btrfs related messages in the logs, or anything disk or
filesystem related, except a whole bunch of:
[1843230.259205] systemd-journald[692]: /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: IO error, rotating.
[1843230.290505] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1843230.315511] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1843230.348163] systemd-journald[692]: Failed to write entry (23 items, 626 bytes) despite vacuuming, ignoring: Bad message
[1843230.372496] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1843230.385662] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1843230.385750] systemd-journald[692]: Failed to write entry (23 items, 585 bytes), ignoring: Bad message
[1848548.026408] systemd-journald[692]: Failed to sync system journal: Input/output error
[1848642.374799] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1848642.392197] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
[1848642.392433] systemd-journald[692]: Failed to write entry (21 items, 796 bytes), ignoring: Bad message
[1848642.405032] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/system.journal: No space left on device
[1848642.416944] systemd-journald[692]: Failed to rotate /var/log/journal/5997c521ad1f4293842511a8ae54ff19/user-1000.journal: No space left on device
This is with ~170G free.
It's currently still in a funky state:
[root@foobox lib]# df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 241145856 62743256 178027464 27% /
[root@foobox lib]# pwd
/var/lib
[root@foobox lib]# touch foo
[root@foobox lib]# rm -f foo
[root@foobox lib]# ls -ald abrt
drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
[root@foobox lib]# mv abrt abrt2
mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
[root@foobox lib]#
Any tests anyone wants to run on this before I wipe and reinstall
the box?
cheers,
Lennert

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Chris Murphy @ 2015-05-26 17:50 UTC
To: Lennert Buytenhek; +Cc: Btrfs BTRFS

Before wiping I suggest making an image with btrfs-image. And then
also see if any additional messages appear with the enospc_debug mount
option. And also see if there's any correlation between the journald
reported failures (the specific .journal file) and whether it's +C by
using lsattr. Fedora 21 systemd journal files do not have +C by
default, whereas Fedora 22 systemd journal files do use +C by default.
So I'd expect the journal files to be a mix. Also let us know if there
are snapshots that affect /var/log/journal (either reflink copies of
the journal files or a snapshot of the root subvolume which contains
/var/log/journal).

Chris Murphy

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Lennert Buytenhek @ 2015-05-26 18:02 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, May 26, 2015 at 11:50:00AM -0600, Chris Murphy wrote:

> Before wiping I suggest making an image with btrfs-image. And then
> also see if any additional messages appear with the enospc_debug mount
> option. And also see if there's any correlation between the journald
> reported failures (the specific .journal file) and whether it's +C by
> using lsattr. Fedora 21 systemd journal files do not have +C by
> default, whereas Fedora 22 systemd journal files do use +C by default.
> So I'd expect the journal files to be a mix. Also let us know if there
> are snapshots that affect /var/log/journal (either reflink copies of
> the journal files or a snapshot of the root subvolume which contains
> /var/log/journal).

I have:

[root@foobox 5997c521ad1f4293842511a8ae54ff19]# pwd
/var/log/journal/5997c521ad1f4293842511a8ae54ff19
[root@foobox 5997c521ad1f4293842511a8ae54ff19]# lsattr
---------------C ./system@7ea3ae63cd0247cbbe8f33bfec625725-0000000000000001-0005140f825e6aa3.journal
---------------C ./user-1000@f2b557cdffd5430caff9632200741e18-00000000000007dd-0005140f840dd4e4.journal
---------------C ./user-1000@f2b557cdffd5430caff9632200741e18-0000000000019aba-0005167499cd62e4.journal
---------------C ./user-1000.journal
---------------C ./system@00000000000000000000000000000000-0000000000000000-0000000000000000.journal
---------------C ./system.journal

And I tried remounting with enospc_debug, which seems to have succeeded:

[root@foobox ~]# cat /proc/mounts | grep enos
/dev/dm-0 / btrfs rw,seclabel,relatime,ssd,space_cache,enospc_debug 0 0

But I still get the -ENOSPC errors:

[root@foobox ~]# cd /var/lib
[root@foobox lib]# mv abrt abrt2
mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
[root@foobox lib]#

Yet nothing is appearing in dmesg about this.

I didn't create any snapshots, and I don't think I have any.

[root@foobox ~]# btrfs subvolume list /
ID 257 gen 125768 top level 5 path root
ID 260 gen 31676 top level 257 path var/lib/machines

I think the journal thing is somewhat of a red herring, though, as
lots of I/O is returning -ENOSPC right now, not just the journal
related operations, yet only journald is syslogging the I/O errors
it sees.
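
The "did the remount take effect" check above can also be scripted instead of grepped by eye. A minimal sketch, assuming the usual /proc/mounts field order (device, mount point, filesystem type, comma-separated options, two trailing numbers); the function name has_mount_option is made up for illustration, and the sample line is the one reported in this message.

```python
def has_mount_option(mounts_text, mount_point, option):
    """Return True if `option` is set on `mount_point` in /proc/mounts-style text.

    If the mount point is listed more than once, the last entry wins,
    since later mounts shadow earlier ones.
    """
    found = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[1] == mount_point:
            found = option in fields[3].split(",")
    return found


# The /proc/mounts line quoted in the message above.
SAMPLE = "/dev/dm-0 / btrfs rw,seclabel,relatime,ssd,space_cache,enospc_debug 0 0"

if __name__ == "__main__":
    print(has_mount_option(SAMPLE, "/", "enospc_debug"))
```

On a live system the first argument would be the contents of /proc/mounts rather than the pasted sample.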

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Hugo Mills @ 2015-05-26 18:08 UTC
To: Lennert Buytenhek; +Cc: linux-btrfs

On Tue, May 26, 2015 at 08:36:34PM +0300, Lennert Buytenhek wrote:
> Hi!
>
> The btrfs filesystem on my newly installed laptop has managed to
> hose itself rather thoroughly, and it's now in a state where it
> works okay if you don't write too much to it, but if you do, it
> starts returning -ENOSPC on a random subset of your filesystem
> operations until you let it cool down again.
>
> This was a fresh Fedora 21 install, upgraded to F22, installed
> about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
> and this system has only ever run 4.0, and it has never had more
> than ~60G on it. It's currently running:
>
> # uname -a
> Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.0
>
> # btrfs fi show
> Label: 'foobox' uuid: [...]
> Total devices 1 FS bytes used 58.87GiB
> devid 1 size 229.97GiB used 229.97GiB path /dev/[...]

All the space has been allocated for some purpose.

> # btrfs fi df /
> Data, single: total=227.94GiB, used=58.16GiB
> System, DUP: total=8.00MiB, used=48.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=1.00GiB, used=730.80MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=256.00MiB, used=0.00B

For a filesystem of this size, a very small proportion has gone to
metadata for some reason. This is odd. Given that it's happened,
though, all the other behaviour is as expected.

> It's currently still in a funky state:
>
> [root@foobox lib]# df /
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/dm-0 241145856 62743256 178027464 27% /

Looks reasonable, given the figures above.

> [root@foobox lib]# pwd
> /var/lib
> [root@foobox lib]# touch foo
> [root@foobox lib]# rm -f foo
> [root@foobox lib]# ls -ald abrt
> drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
> [root@foobox lib]# mv abrt abrt2
> mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
> [root@foobox lib]#
>
> Any tests anyone wants to run on this before I wipe and reinstall
> the box?

No tests needed. Just run a filtered balance on it to clean up
unused chunks:

# btrfs balance start -dusage=5 /

as suggested in the FAQ [1].

Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

-- 
Hugo Mills             | Two things came out of Berkeley in the 1960s: LSD
hugo@... carfax.org.uk | and Unix. This is not a coincidence.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
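
The gap between allocated ("total") and actually used space that this reply points at can be read off the `btrfs fi df` output mechanically. A rough sketch, assuming the line format exactly as pasted in this thread; parse_fi_df and fill_percent are illustrative names, not part of btrfs-progs, and other btrfs-progs versions may print the output differently.

```python
import re

# Byte multipliers for the binary units that appear in the pasted output.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Matches lines such as "Data, single: total=227.94GiB, used=58.16GiB".
LINE_RE = re.compile(
    r"^(?P<type>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>(?:[KMGT]i)?B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>(?:[KMGT]i)?B)$"
)


def parse_fi_df(text):
    """Return a list of (type, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a space-info line
        total = float(m["total"]) * UNITS[m["tunit"]]
        used = float(m["used"]) * UNITS[m["uunit"]]
        rows.append((m["type"], m["profile"], total, used))
    return rows


def fill_percent(rows, chunk_type):
    """Percentage of the space allocated to `chunk_type` that is in use."""
    total = sum(t for ty, _, t, _ in rows if ty == chunk_type)
    used = sum(u for ty, _, _, u in rows if ty == chunk_type)
    return 100.0 * used / total if total else 0.0


# Output quoted from the original report.
SAMPLE_FI_DF = """\
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B
"""

if __name__ == "__main__":
    rows = parse_fi_df(SAMPLE_FI_DF)
    for kind in ("Data", "Metadata"):
        print(f"{kind}: {fill_percent(rows, kind):.1f}% of allocated space in use")
```

For the figures above this shows data chunks roughly a quarter full while metadata chunks are roughly 70% full, which is the imbalance the reply describes.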

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Chris Murphy @ 2015-05-26 18:18 UTC
To: Hugo Mills, Lennert Buytenhek, Btrfs BTRFS

Oh yeah easy to miss, but obvious once pointed out:

> Data, single: total=227.94GiB, used=58.16GiB

I thought we had automatic deallocation of unused chunks but I guess
it hasn't landed yet.

Chris Murphy

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Calvin Walton @ 2015-05-26 18:50 UTC
To: Chris Murphy; +Cc: Hugo Mills, Lennert Buytenhek, Btrfs BTRFS

On Tue, 2015-05-26 at 12:18 -0600, Chris Murphy wrote:
> Oh yeah easy to miss, but obvious once pointed out:
>
> > Data, single: total=227.94GiB, used=58.16GiB
>
> I thought we had automatic deallocation of unused chunks but I guess
> it hasn't landed yet.

We do have automatic deallocation of unused chunks. (I know it's in
4.0, dunno about earlier versions.)

Unfortunately, this only deallocates *completely* unused chunks - if
you're e.g. writing a bunch of small files and large files at the same
time, then delete the large files, you could end up with a bunch of
data chunks that are mostly, but not completely, empty: they still
have the small files hanging around. In this case a balance is still
necessary to clean things up.

-- 
Calvin Walton <calvin.walton@kepstin.ca>
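
The difference between the automatic cleanup and a filtered balance described above can be illustrated with a toy model. This is only a sketch, not the kernel's logic: it assumes fixed 1 GiB data chunks, treats "usage below N percent" as a strict comparison, and pretends relocated data is packed into fresh chunks; auto_reclaim and filtered_balance are made-up names.

```python
import math

CHUNK_MIB = 1024  # assumed nominal data chunk size for this toy model


def auto_reclaim(used_per_chunk):
    """Drop only chunks that hold no data at all (the automatic cleanup)."""
    return [used for used in used_per_chunk if used > 0]


def filtered_balance(used_per_chunk, usage_percent):
    """Model `balance start -dusage=N`: relocate chunks filled below N percent.

    Returns (chunks_relocated, chunks_allocated_afterwards), assuming the
    relocated data is repacked as tightly as possible into new chunks.
    """
    limit = usage_percent * CHUNK_MIB
    moved = [used for used in used_per_chunk if 100 * used < limit]
    kept = [used for used in used_per_chunk if 100 * used >= limit]
    repacked = math.ceil(sum(moved) / CHUNK_MIB)
    return len(moved), len(kept) + repacked


if __name__ == "__main__":
    # MiB used in each chunk: two empty, three nearly empty, one full.
    chunks = [0, 10, 40, 1024, 0, 30]
    remaining = auto_reclaim(chunks)
    print("after automatic cleanup:", remaining)
    print("after -dusage=5 (relocated, allocated):", filtered_balance(remaining, 5))
```

In this model the automatic cleanup frees only the two empty chunks, while the filtered balance also consolidates the three nearly empty ones into a single chunk.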

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Chris Murphy @ 2015-05-26 19:29 UTC
To: Calvin Walton; +Cc: Chris Murphy, Hugo Mills, Lennert Buytenhek, Btrfs BTRFS

On Tue, May 26, 2015 at 12:50 PM, Calvin Walton <calvin.walton@kepstin.ca> wrote:
> On Tue, 2015-05-26 at 12:18 -0600, Chris Murphy wrote:
>> Oh yeah easy to miss, but obvious once pointed out:
>>
>> > Data, single: total=227.94GiB, used=58.16GiB
>>
>> I thought we had automatic deallocation of unused chunks but I guess
>> it hasn't landed yet.
>
> We do have automatic deallocation of unused chunks. (I know it's in
> 4.0, dunno about earlier versions.)
>
> Unfortunately, this only deallocates *completely* unused chunks

oic.

-- 
Chris Murphy

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Holger Hoffstätte @ 2015-05-26 18:57 UTC
To: linux-btrfs

On Tue, 26 May 2015 12:18:40 -0600, Chris Murphy wrote:

> Oh yeah easy to miss, but obvious once pointed out:
>
>> Data, single: total=227.94GiB, used=58.16GiB
>
> I thought we had automatic deallocation of unused chunks but I guess
> it hasn't landed yet.

It did (in 3.18 IIRC), but that doesn't help with - for whatever
reason - severely unbalanced chunks, i.e. all allocated but each used
only a little bit.

Now, how and why that happened..

-h

* Re: intermittent -ENOSPC errors on btrfs filesystem with 170G free
From: Lennert Buytenhek @ 2015-06-09 10:54 UTC
To: Hugo Mills, linux-btrfs

On Tue, May 26, 2015 at 06:08:20PM +0000, Hugo Mills wrote:

> > The btrfs filesystem on my newly installed laptop has managed to
> > hose itself rather thoroughly, and it's now in a state where it
> > works okay if you don't write too much to it, but if you do, it
> > starts returning -ENOSPC on a random subset of your filesystem
> > operations until you let it cool down again.
> >
> > This was a fresh Fedora 21 install, upgraded to F22, installed
> > about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
> > and this system has only ever run 4.0, and it has never had more
> > than ~60G on it. It's currently running:
> >
> > # uname -a
> > Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > # btrfs --version
> > btrfs-progs v4.0
> >
> > # btrfs fi show
> > Label: 'foobox' uuid: [...]
> > Total devices 1 FS bytes used 58.87GiB
> > devid 1 size 229.97GiB used 229.97GiB path /dev/[...]
>
> All the space has been allocated for some purpose.
>
> > # btrfs fi df /
> > Data, single: total=227.94GiB, used=58.16GiB
> > System, DUP: total=8.00MiB, used=48.00KiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, DUP: total=1.00GiB, used=730.80MiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=256.00MiB, used=0.00B
>
> For a filesystem of this size, a very small proportion has gone to
> metadata for some reason. This is odd. Given that it's happened,
> though, all the other behaviour is as expected.
>
> > It's currently still in a funky state:
> >
> > [root@foobox lib]# df /
> > Filesystem 1K-blocks Used Available Use% Mounted on
> > /dev/dm-0 241145856 62743256 178027464 27% /
>
> Looks reasonable, given the figures above.
>
> > [root@foobox lib]# pwd
> > /var/lib
> > [root@foobox lib]# touch foo
> > [root@foobox lib]# rm -f foo
> > [root@foobox lib]# ls -ald abrt
> > drwxr-xr-x. 1 root root 56 May 21 10:02 abrt
> > [root@foobox lib]# mv abrt abrt2
> > mv: cannot move ‘abrt’ to ‘abrt2’: No space left on device
> > [root@foobox lib]#
> >
> > Any tests anyone wants to run on this before I wipe and reinstall
> > the box?
>
> No tests needed. Just run a filtered balance on it to clean up
> unused chunks:
>
> # btrfs balance start -dusage=5 /
>
> as suggested in the FAQ [1].

Doing this helped for a few days, but now I'm back in a state where
I can't create any files at all -- everything fails with -ENOSPC.

The filesystem isn't even half full and has never been more than
half full.

[root@foobox ~]# uname -a
Linux foobox.wantstofly.org 4.0.4-301.fc22.x86_64 #1 SMP Thu May 21 13:10:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@foobox ~]# btrfs --version
btrfs-progs v4.0
[root@foobox ~]# btrfs fi show
Label: 'foobox' uuid: [...]
Total devices 1 FS bytes used 98.72GiB
devid 1 size 229.97GiB used 100.97GiB path /dev/mapper/[...]

btrfs-progs v4.0
[root@foobox ~]# btrfs fi df /
Data, single: total=98.00GiB, used=98.00GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.47GiB, used=737.73MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=256.00MiB, used=0.00B
[root@foobox ~]# df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 241145856 104533304 135265480 44% /

The balancing trick doesn't do anything:

[root@foobox ~]# btrfs balance start -dusage=5 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=10 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=20 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=50 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]# btrfs balance start -dusage=80 /
Done, had to relocate 0 out of 104 chunks
[root@foobox ~]#

Any ideas?
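
The figures in this last message can at least be cross-checked arithmetically. The sketch below only restates numbers pasted above and, as an assumption, treats `-dusage=N` as "relocate data chunks filled below N percent"; it accounts for the "0 out of 104 chunks" results but does not explain the -ENOSPC itself. The function names are illustrative.

```python
# Figures copied from the `btrfs fi show` and `btrfs fi df` output above (GiB).
DEVICE_SIZE_GIB = 229.97
DEVICE_ALLOCATED_GIB = 100.97
DATA_TOTAL_GIB = 98.00
DATA_USED_GIB = 98.00
TRIED_THRESHOLDS = [5, 10, 20, 50, 80]  # the -dusage values that were tried


def unallocated_gib(size, allocated):
    """Device space not yet assigned to any chunk."""
    return round(size - allocated, 2)


def thresholds_that_match(fill_percent, thresholds):
    """Thresholds under which a chunk at `fill_percent` would be relocated."""
    return [t for t in thresholds if fill_percent < t]


if __name__ == "__main__":
    data_fill = 100 * DATA_USED_GIB / DATA_TOTAL_GIB
    print("unallocated GiB:", unallocated_gib(DEVICE_SIZE_GIB, DEVICE_ALLOCATED_GIB))
    print("data fill %:", data_fill)
    print("matching thresholds:", thresholds_that_match(data_fill, TRIED_THRESHOLDS))
```

With data chunks reported as completely full, none of the tried thresholds can select anything, while about 129 GiB of the device is shown as not allocated to any chunk, so this state differs from the fully allocated one in the original report.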