* ENOSPC with mkdir and rename
@ 2014-08-02 23:35 Peter Waller
2014-08-03 0:28 ` Mitch Harder
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Peter Waller @ 2014-08-02 23:35 UTC (permalink / raw)
To: linux-btrfs
Hi All,

My TL;DR questions are at the bottom, before the stack trace.

I'm running Ubuntu 14.04. I wonder if this problem is related to the
thread titled "Machine lockup due to btrfs-transaction on AWS EC2
Ubuntu 14.04" which I started on the 29th of July:

> http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224

Kernel: 3.15.7-031507-generic

I'm on a single block device system, i.e., no RAID.

I was observing ENOSPC from `mkdir` and `rename` on this system, with
a good amount of free disk space (df -h reports 62 GB remaining). I added
enospc_debug (full umount/mount, not just mount -o remount), but this
had no apparent effect when receiving ENOSPC from userland.

$ sudo btrfs fi df /path/to/volume
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB

After a thorough search of the internet for ENOSPC BTRFS I found
various resources and came to understand a little bit more. One thing
which broke my intuition severely: I expected that with a large number
of free GiB, things would continue to work.

In this case, for example, metadata has 0.5GiB free ("sounds like
plenty for metadata for one mkdir to me"). Data has 62GiB free. Why
would I get ENOSPC for a file rename?

I expected that if metadata needed more space, it would just eat it
from the 'data'. Now I believe this not to be the case and that it
wanted to allocate > 0.5GiB, and this is why I was getting ENOSPC.

I tried a rebalance with btrfs balance start -dusage=10 and tried
increasing the value until I saw reallocations in dmesg.

This spat out a large number of messages in dmesg, of this form:

> [376096.546353] BTRFS info (device dm-0): relocating block group 530457821184 flags 1
> [376010.736879] BTRFS info (device dm-0): 40 enospc errors during balance

(and a full stack trace at the end of this message).

The rebalance printed:

> ERROR: error during balancing '/path/to/volume' - No space left on device
> There may be more info in syslog - try dmesg | tail

Eventually, not knowing what else to do, I had to take my escape hatch
and enlarge the volume. When I did this, metadata grew by 1GiB:

> Data, single: total=490.97GiB, used=427.75GiB
> System, DUP: total=8.00MiB, used=60.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=5.50GiB, used=4.50GiB
> Metadata, single: total=8.00MiB, used=0.00
> unknown, single: total=512.00MiB, used=0.00

A few questions:

* Why didn't the metadata grow before enlarging the disk?
* Why didn't the rebalance enable the metadata to grow?
* Why is it necessary to rebalance? Can't it automatically take some
  free space from 'data'?
* Are my machine lockups related to the fact I was low on space?
* Can we improve the documentation/FAQ for this? I was scratching my
  head in particular because my notion of free space definitely does not
  match up with BTRFS', and I didn't find the FAQ very helpful for
  getting out of this mess.
* It isn't documented on the wiki what enospc_debug is supposed to do,
  so I couldn't tell whether I should have expected it to tell me
  anything in my circumstances.
* What is the best course of action to take (other than enlarging the
  disk or deleting files) if I encounter this situation again?

Thanks in advance,

- Peter
[376007.681938] ------------[ cut here ]------------
[376007.681957] WARNING: CPU: 1 PID: 27021 at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6946 use_block_rsv+0xfd/0x1a0 [btrfs]()
[376007.681958] BTRFS: block rsv returned -28
[376007.681959] Modules linked in: softdog tcp_diag inet_diag dm_crypt ppdev xen_fbfront fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_piix4 serio_raw parport_pc parport mac_hid isofs xt_tcpudp iptable_filter xt_owner ip_tables x_tables btrfs xor raid6_pq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd floppy psmouse
[376007.681980] CPU: 1 PID: 27021 Comm: pam_script_ses_ Tainted: G W 3.15.7-031507-generic #201407281235
[376007.681981] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
[376007.681983] 0000000000001b22 ffff8800acca39d8 ffffffff8176f115 0000000000000007
[376007.681986] ffff8800acca3a28 ffff8800acca3a18 ffffffff8106ceac ffff8801efc37870
[376007.681989] ffff88017db0ff00 ffff8801aedcd800 0000000000001000 ffff88001c987000
[376007.681992] Call Trace:
[376007.682000] [<ffffffff8176f115>] dump_stack+0x46/0x58
[376007.682005] [<ffffffff8106ceac>] warn_slowpath_common+0x8c/0xc0
[376007.682008] [<ffffffff8106cf96>] warn_slowpath_fmt+0x46/0x50
[376007.682016] [<ffffffffa00d9d1d>] use_block_rsv+0xfd/0x1a0 [btrfs]
[376007.682024] [<ffffffffa00de687>] btrfs_alloc_free_block+0x57/0x220 [btrfs]
[376007.682027] [<ffffffff8178033c>] ? __do_page_fault+0x28c/0x550
[376007.682031] [<ffffffff8119749f>] ? page_add_file_rmap+0x6f/0xb0
[376007.682037] [<ffffffffa00c8a3c>] btrfs_copy_root+0xfc/0x2b0 [btrfs]
[376007.682041] [<ffffffff811c60b9>] ? memcg_check_events+0x29/0x50
[376007.682051] [<ffffffffa013a583>] ? create_reloc_root+0x33/0x2c0 [btrfs]
[376007.682061] [<ffffffffa013a743>] create_reloc_root+0x1f3/0x2c0 [btrfs]
[376007.682064] [<ffffffff811dd073>] ? generic_permission+0xf3/0x120
[376007.682073] [<ffffffffa0140eb8>] btrfs_init_reloc_root+0xb8/0xd0 [btrfs]
[376007.682082] [<ffffffffa00ee967>] record_root_in_trans.part.30+0x97/0x100 [btrfs]
[376007.682090] [<ffffffffa00ee9f4>] record_root_in_trans+0x24/0x30 [btrfs]
[376007.682098] [<ffffffffa00efeb1>] btrfs_record_root_in_trans+0x51/0x80 [btrfs]
[376007.682106] [<ffffffffa00f13d6>] start_transaction.part.35+0x86/0x560 [btrfs]
[376007.682109] [<ffffffff8132c197>] ? apparmor_capable+0x27/0x80
[376007.682117] [<ffffffffa00f18d9>] start_transaction+0x29/0x30 [btrfs]
[376007.682125] [<ffffffffa00f19a7>] btrfs_join_transaction+0x17/0x20 [btrfs]
[376007.682133] [<ffffffffa00f7fa8>] btrfs_dirty_inode+0x58/0xe0 [btrfs]
[376007.682141] [<ffffffffa00fcaf2>] btrfs_setattr+0xa2/0xf0 [btrfs]
[376007.682144] [<ffffffff811eec74>] notify_change+0x1c4/0x3b0
[376007.682146] [<ffffffff811dde96>] ? final_putname+0x26/0x50
[376007.682149] [<ffffffff811d088d>] chown_common+0x16d/0x1a0
[376007.682153] [<ffffffff811f2b08>] ? __mnt_want_write+0x58/0x70
[376007.682156] [<ffffffff811d1a8f>] SyS_fchownat+0xbf/0x100
[376007.682159] [<ffffffff811d1aed>] SyS_chown+0x1d/0x20
[376007.682163] [<ffffffff817858bf>] tracesys+0xe1/0xe6
[376007.682165] ---[ end trace 1853311c87a5cd94 ]---
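
The "-28" in the "block rsv returned -28" line of the trace is a negated errno value. A minimal sketch for decoding such a value, assuming a Linux host (where errno 28 is ENOSPC); the helper name describe_kernel_rc is made up for illustration:

```python
import errno
import os


def describe_kernel_rc(rc):
    """Map a negative kernel return code such as -28 to (errno name, message)."""
    code = -rc
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    print(describe_kernel_rc(-28))
```

On Linux the message part should read the same as the error printed by the failed balance, which suggests the warning and the balance failure share one cause.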
^ permalink raw reply [flat|nested] 44+ messages in thread

* Re: ENOSPC with mkdir and rename
2014-08-02 23:35 ENOSPC with mkdir and rename Peter Waller
@ 2014-08-03 0:28 ` Mitch Harder
2014-08-03 1:52 ` Nick Krause
2014-08-03 2:39 ` Russell Coker
2014-08-04 1:38 ` Qu Wenruo
2 siblings, 1 reply; 44+ messages in thread
From: Mitch Harder @ 2014-08-03 0:28 UTC (permalink / raw)
To: Peter Waller; +Cc: linux-btrfs

On Sat, Aug 2, 2014 at 6:35 PM, Peter Waller <peter@scraperwiki.com> wrote:
> * What is the best course of action to take (other than enlarging the
> disk or deleting files) if I encounter this situation again?

Looking at this line:

> Data, single: total=489.97GiB, used=427.75GiB

I see that btrfs has allocated almost the entire disk to Data, and it
appears you are starved for Metadata room.

Once btrfs allocates space for either Data or Metadata, there are
currently no built-in kernel mechanisms to re-allocate that space. We
have to use the userland balance tools.

I agree that this behavior can become a "gotcha". Btrfs has the
capability to run in a mode where Data and Metadata are combined, but
there is a speed penalty running in Mixed Data/Metadata mode.

The btrfs balance tools have the ability to use filters to run a
quicker pass on just the mostly-empty block groups, skipping a full
balance.

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

I would suggest this as the next step.
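
To make the split Mitch describes concrete, here is a rough Python sketch that parses the `btrfs fi df` output quoted at the top of the thread and reports, per type, how much space is free inside chunks that are already allocated. The regex and the helper names (parse_fi_df, free_inside_chunks) are illustrative assumptions and not part of btrfs-progs; the format is assumed to match the lines shown in this thread.

```python
import re

# Output quoted from the original report (before the volume was enlarged).
FI_DF = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB
"""

UNITS = {"": 1, "B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tu>[A-Za-z]*), "
    r"used=(?P<used>[\d.]+)(?P<uu>[A-Za-z]*)$"
)


def parse_fi_df(text):
    """Return a list of (kind, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not a summary line
        total = float(m["total"]) * UNITS[m["tu"]]
        used = float(m["used"]) * UNITS[m["uu"]]
        rows.append((m["kind"], m["profile"], total, used))
    return rows


def free_inside_chunks(rows):
    """Sum (total - used) per type, i.e. free space in allocated chunks."""
    free = {}
    for kind, _profile, total, used in rows:
        free[kind] = free.get(kind, 0.0) + (total - used)
    return free


if __name__ == "__main__":
    for kind, nbytes in free_inside_chunks(parse_fi_df(FI_DF)).items():
        print(f"{kind:<9} {nbytes / 2**30:8.2f} GiB free in allocated chunks")
```

The roughly 62GiB that df reports as free all sits inside Data chunks, while Metadata has only about half a GiB left in its own chunks, which is the imbalance a filtered balance is meant to address.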

* Re: ENOSPC with mkdir and rename
2014-08-03 0:28 ` Mitch Harder
@ 2014-08-03 1:52 ` Nick Krause
0 siblings, 0 replies; 44+ messages in thread
From: Nick Krause @ 2014-08-03 1:52 UTC (permalink / raw)
To: Mitch Harder; +Cc: Peter Waller, linux-btrfs

On Sat, Aug 2, 2014 at 8:28 PM, Mitch Harder <mitch.harder@sabayonlinux.org> wrote:
> I see that btrfs has allocated almost the entire disk to Data, and it
> appears you are starved for Metadata room.

Mitch,

I have run into this error too and this seems to be a rather big issue,
as ext4 seems to never run out of metadata room, at least from my
testing. I feel strongly that this part of btrfs needs to be improved
and moved into a function or set of functions for rebalancing metadata
in the kernel itself.

Regards,
Nick

* Re: ENOSPC with mkdir and rename
2014-08-02 23:35 ENOSPC with mkdir and rename Peter Waller
2014-08-03 0:28 ` Mitch Harder
@ 2014-08-03 2:39 ` Russell Coker
2014-08-03 2:59 ` Nick Krause
2014-08-04 1:38 ` Qu Wenruo
2 siblings, 1 reply; 44+ messages in thread
From: Russell Coker @ 2014-08-03 2:39 UTC (permalink / raw)
To: Peter Waller; +Cc: linux-btrfs

On Sun, 3 Aug 2014 00:35:28 Peter Waller wrote:
> I'm running Ubuntu 14.04. I wonder if this problem is related to the
> thread titled "Machine lockup due to btrfs-transaction on AWS EC2
> Ubuntu 14.04" which I started on the 29th of July:
> > http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224
>
> Kernel: 3.15.7-031507-generic

As an aside, I'm still on 3.14 kernels for my systems and have no
immediate plans to use 3.15. There has been discussion here about a
number of problems with 3.15, so I don't think that any testing I do
with 3.15 will help the developers and it will just take more of my
time.

> $ sudo btrfs fi df /path/to/volume
> Data, single: total=489.97GiB, used=427.75GiB
> Metadata, DUP: total=5.00GiB, used=4.50GiB

As has been noted, all the space has been allocated to 1G data chunks
and the system can't allocate more 256M metadata chunks (which are
allocated in pairs because the profile is "DUP", so 512M is allocated
at a time).

> In this case, for example, metadata has 0.5GiB free ("sounds like
> plenty for metadata for one mkdir to me"). Data has 62GiB free. Why
> would I get ENOSPC for a file rename?

Some space is always reserved. Due to the way BTRFS works, changes to a
file require writing a new copy of the tree. So the amount of metadata
space required for an operation that is conceptually simple can be
significant.

One thing that can sometimes solve that problem is to delete a subvol.
But note that it can take a considerable amount of time to free the
space, particularly if you are running out of metadata space. So you
could delete a couple of subvols, run "sync" a couple of times, and
have a coffee break.

If possible, avoid rebooting as that can make things much worse. This
was a particular problem with kernels 3.13 and earlier, which could
enter a CPU loop requiring a reboot, and then you would have big
problems.

> I tried a rebalance with btrfs balance start -dusage=10 and tried
> increasing the value until I saw reallocations in dmesg.

/sbin/btrfs fi balance start -dusage=30 -musage=10 /

It's a good idea to have a cron job running a rebalance. The above is
what I use on some of my systems; it will free data chunks that are up
to 30% used and metadata chunks that are up to 10% used. It almost
never frees metadata chunks and regularly frees data chunks, which is
what I want.

> and enlarge the volume. When I did this, metadata grew by 1GiB:
> > Data, single: total=490.97GiB, used=427.75GiB
> > System, DUP: total=8.00MiB, used=60.00KiB
> > System, single: total=4.00MiB, used=0.00
> > Metadata, DUP: total=5.50GiB, used=4.50GiB
> > Metadata, single: total=8.00MiB, used=0.00
> > unknown, single: total=512.00MiB, used=0.00

Now that you have solved that problem you could balance the filesystem
(deallocating ~60 data chunks) and then shrink it. In the past I've
added a USB flash disk to a filesystem to give it enough space to allow
a balance and then removed it (NB: you have to do a btrfs device delete
before removing the USB stick).

> * Why didn't the metadata grow before enlarging the disk?
> * Why didn't the rebalance enable the metadata to grow?
> * Why is it necessary to rebalance? Can't it automatically take some
> free space from 'data'?

It would be nice if it could automatically rebalance. It's
theoretically possible, as the btrfs program just asks the kernel to do
it. But there's nothing stopping you from having a regular cron job to
do it. You could even write a daemon to poll the status of a btrfs
filesystem and run balance when appropriate, if you were keen enough.

> * What is the best course of action to take (other than enlarging the
> disk or deleting files) if I encounter this situation again?

Have a cron job run a balance regularly.

On Sat, 2 Aug 2014 21:52:36 Nick Krause wrote:
> I have run into this error too and this seems to be a rather big issue,
> as ext4 seems to never run out of metadata room, at least from my
> testing. I feel strongly that this part of btrfs needs to be improved
> and moved into a function or set of functions for rebalancing metadata
> in the kernel itself.

Ext4 has fixed-size inode tables that are assigned at mkfs time. If you
run out of inodes then you can't create new files. If the inode tables
are too big then you waste disk space and have a longer fsck time (at
least before uninit_bg).

The other metadata for Ext4 is allocated from data blocks, so it will
run out when data space runs out (e.g. if mkdir fails due to lack of
space on ext4 then you can delete a file to make it work).

But really BTRFS is just a totally different filesystem. Ext4 lacks the
features such as full data checksums and subvolume support that make
these things difficult.

I always found the CP/M filesystem to be easier. It was when they added
support for directories that things started getting difficult. :-#

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
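
The chunk arithmetic in Russell's reply can be checked against the first `btrfs fi df` output. The sketch below adds up the raw space claimed by chunks, counting DUP twice, and tests whether one more pair of 256M metadata chunks would fit. The chunk size is taken from Russell's message; the device sizes in the usage note are hypothetical, since the thread never states the size of the volume; and the "unknown" line is left out on the assumption that it is carved out of metadata rather than allocated separately. The function names are made up for illustration.

```python
GIB = 2**30
MIB = 2**20

# Copies written per profile: DUP keeps two copies on the same device.
COPIES = {"single": 1, "DUP": 2}

# (type, profile, total bytes) from the first `btrfs fi df` output in the
# thread. The "unknown" line is omitted (assumed to be part of metadata).
ALLOCATED = [
    ("Data", "single", 489.97 * GIB),
    ("System", "DUP", 8 * MIB),
    ("System", "single", 4 * MIB),
    ("Metadata", "DUP", 5.00 * GIB),
    ("Metadata", "single", 8 * MIB),
]


def raw_allocated(rows):
    """Raw device bytes claimed by chunks, counting every copy."""
    return sum(total * COPIES[profile] for _kind, profile, total in rows)


def can_allocate_metadata_chunk(device_bytes, rows, chunk=256 * MIB, profile="DUP"):
    """True if the unallocated space could hold one more metadata chunk."""
    unallocated = device_bytes - raw_allocated(rows)
    return unallocated >= chunk * COPIES[profile]


if __name__ == "__main__":
    print(f"raw allocated: {raw_allocated(ALLOCATED) / GIB:.2f} GiB")
    # Hypothetical device sizes, not stated anywhere in the thread.
    for size_gib in (500, 502):
        ok = can_allocate_metadata_chunk(size_gib * GIB, ALLOCATED)
        print(f"{size_gib} GiB device: new metadata chunk fits = {ok}")
```

The raw total comes to just under 500GiB. If the volume were about that size (an assumption), there would be no room left for another 512M of DUP metadata, which would match the behaviour seen before the volume was enlarged.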

* Re: ENOSPC with mkdir and rename
2014-08-03 2:39 ` Russell Coker
@ 2014-08-03 2:59 ` Nick Krause
0 siblings, 0 replies; 44+ messages in thread
From: Nick Krause @ 2014-08-03 2:59 UTC (permalink / raw)
To: russell; +Cc: Peter Waller, linux-btrfs@vger.kernel.org SYSTEM list:BTRFS FILE

On Sat, Aug 2, 2014 at 10:39 PM, Russell Coker <russell@coker.com.au> wrote:
> Ext4 has fixed-size inode tables that are assigned at mkfs time. If you
> run out of inodes then you can't create new files. If the inode tables
> are too big then you waste disk space and have a longer fsck time (at
> least before uninit_bg).
>
> The other metadata for Ext4 is allocated from data blocks, so it will
> run out when data space runs out (e.g. if mkdir fails due to lack of
> space on ext4 then you can delete a file to make it work).

No, that's fine; it seems valid as of reading this message.
Thanks again, Russell.

Regards,
Nick
* Re: ENOSPC with mkdir and rename 2014-08-02 23:35 ENOSPC with mkdir and rename Peter Waller 2014-08-03 0:28 ` Mitch Harder 2014-08-03 2:39 ` Russell Coker @ 2014-08-04 1:38 ` Qu Wenruo 2014-08-04 8:14 ` Peter Waller 2 siblings, 1 reply; 44+ messages in thread From: Qu Wenruo @ 2014-08-04 1:38 UTC (permalink / raw) To: Peter Waller, linux-btrfs Hi, Peter Some explain below inline. -------- Original Message -------- Subject: ENOSPC with mkdir and rename From: Peter Waller <peter@scraperwiki.com> To: <linux-btrfs@vger.kernel.org> Date: 2014年08月03日 07:35 > Hi All, > > My TL;DR questions are at the bottom, before the stack trace. > > I'm running Ubuntu 14.04. I wonder if this problem is related to the > thread titled "Machine lockup due to btrfs-transaction on AWS EC2 > Ubuntu 14.04" which I started on the 29th of July: > >> http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224 > Kernel: 3.15.7-031507-generic > > I'm on a single block device system, i.e, no RAID. > > I was observing ENOSPC from `mkdir` and `rename` on this system, with > a good amount of free disk space (df -h reports 62 GB remain). I added > enospc_debug (full umount/mount, not just mount -o remount), but this > had no apparent effect when receiving ENOSPC from userland. > > $ sudo btrfs fi df /path/to/volume > Data, single: total=489.97GiB, used=427.75GiB > System, DUP: total=8.00MiB, used=60.00KiB > System, single: total=4.00MiB, used=0.00 > Metadata, DUP: total=5.00GiB, used=4.50GiB In fact, all your metadata is used. It seems strange since there should be 500MB(to be precious 512MiB) free, but I'll explain it below. 
> Metadata, single: total=8.00MiB, used=0.00 > unknown, single: total=512.00MiB, used=820.00KiB Here the "unknown" is in fact "global data reserve", reserved for COW tree write (except FS-tree and subvolume tree if I'm right) If you use latest btrfs-progs, it will not show "unknown" but "GlobalReserve" and it should not be used under most cases, but it is used, which really shows the shortage of space. So saddly, there is really no space for metadata for mkdir and rename(*). *: since rename will modify the metadata and since btrfs will do COW for metadata tree, and rename/mkdir will not use space from global reserve, so ENOSPC is normal. The good thing is that rm will steel space from global reserve, so you should be OK to remove files and hope to free enough metadata space. Or you can try to add more device to this btrfs. Thanks, Qu > > After a thorough search of the internet for ENOSPC BTRFS I found > various resources and came to understand a little bit more. One thing > which broke my intuition severely is that I expected if there is a > large number of free GiB, I should expect things to continue to work. > > In this case, for example, metadata has 0.5GiB free ("sounds like > plenty for metadata for one mkdir to me"). Data has 62GiB free. Why > would I get ENOSPC for a file rename? > > I expected that if metadata needed more space, it would just eat it > from the 'data'. Now I believe this not to be the case and that it > wanted to allocate > 0.5GiB, and this is why I was getting ENOSPC. > > I tried a rebalance with btrfs balance start -dusage=10 and tried > increasing the value until I saw reallocations in dmesg. > > This spat out a large number of messages in dmesg, of this form: > >> [376096.546353] BTRFS info (device dm-0): relocating block group 530457821184 flags 1 >> [376010.736879] BTRFS info (device dm-0): 40 enospc errors during balance > (and a full stack trace at the end of this message). 
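The arithmetic behind Qu's explanation can be checked against the `btrfs fi df` listing quoted in this thread. What follows is a minimal sketch, not part of btrfs-progs: the helper names `parse_fi_df` and `metadata_headroom` are hypothetical, and it assumes (following Qu's reading) that the 512MiB "unknown" reserve has to be subtracted from the apparent metadata free space.

```python
import re

# Output copied from the `btrfs fi df` listing quoted in this thread.
FI_DF_OUTPUT = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB
"""

UNITS = {"": 1, "B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE_RE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w*), used=([\d.]+)(\w*)$")


def to_bytes(number, unit):
    """Convert a value such as ('4.50', 'GiB') to an integer byte count."""
    return int(round(float(number) * UNITS[unit]))


def parse_fi_df(text):
    """Hypothetical helper: map (type, profile) -> (total_bytes, used_bytes)."""
    groups = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kind, profile, total, t_unit, used, u_unit = match.groups()
            groups[(kind, profile)] = (to_bytes(total, t_unit), to_bytes(used, u_unit))
    return groups


def metadata_headroom(groups, reserve_name="unknown"):
    """Assumption: the reserve is carved out of the Metadata, DUP allocation."""
    total, used = groups[("Metadata", "DUP")]
    reserve_total, _ = groups[(reserve_name, "single")]
    return total - used - reserve_total


if __name__ == "__main__":
    groups = parse_fi_df(FI_DF_OUTPUT)
    data_total, data_used = groups[("Data", "single")]
    print("data total - used (GiB):", round((data_total - data_used) / 2**30, 2))
    print("metadata headroom (bytes):", metadata_headroom(groups))
```

Under that assumption the 0.5GiB of apparent metadata free space is exactly the size of the reserve, which is consistent with mkdir and rename failing while Data still shows about 62GiB between total and used. With a newer btrfs-progs the reserve line would be named "GlobalReserve", so `reserve_name` would need to change accordingly.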
* Re: ENOSPC with mkdir and rename
From: Peter Waller @ 2014-08-04 8:14 UTC
To: Qu Wenruo; +Cc: linux-btrfs

Thanks for the responses.

All of this is *very* surprising. I'm not new to BTRFS, I've been
using it on my own machines for multiple years. I didn't realise there
was an un-holstered footgun on my lap at this point. How can it be
made clear how to avoid the ENOSPC problem to myself and other
sysadmins? Or preferably not exist as a problem?

One thing which continues to puzzle me is "How do I make an alarm to
warn of an impending ENOSPC condition on BTRFS?". ENOSPC is a bad
place to be. All of the standard monitoring tools warn on the output
of `df`.

My first thought was to make a graph and put a threshold on `metadata
total - used`.

However, I was fortunate enough in this case to know about `btrfs fi
df`. When I looked at "metadata free" I concluded that there was plenty
free, not knowing that it was allocated in blocks larger than the
amount presented as free (total - used = 0.5GiB). So these numbers
were quite misleading in this case. If I had seen total=used, or
available=0, the problem would have been much clearer. Why present
space as available when it can't be used?

In the end, it seems that metadata should be able to steal space from
"data" on demand. That would make the output of "df" more informative,
since you wouldn't see "60 GB free" and get ENOSPC, which is an
utterly confusing situation and harmful to production. Is there
something fundamental preventing that from happening, or is it just
that no-one has gotten around to it yet?

Thanks,

- Peter

On 4 August 2014 02:38, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Hi, Peter
>
> Some explanation below, inline.
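The alarm Peter asks about could start from the same numbers, with the reserve subtracted before a threshold is applied. This is only a sketch under stated assumptions: the function name `metadata_alarm` and the 0.25GiB threshold are hypothetical, the reserve is assumed to come out of the metadata allocation as Qu describes, and the check ignores whether the device still has unallocated space from which a new metadata chunk could be created.

```python
def metadata_alarm(total_gib, used_gib, reserve_gib, min_headroom_gib=0.25):
    """Return (alarm, headroom_gib) for one metadata block-group type.

    headroom = total - used - reserve. The threshold is an illustrative
    value chosen for this sketch, not a recommendation from the thread.
    """
    headroom = total_gib - used_gib - reserve_gib
    return headroom < min_headroom_gib, headroom


if __name__ == "__main__":
    # Figures quoted in the thread: before and after enlarging the volume.
    before = metadata_alarm(total_gib=5.00, used_gib=4.50, reserve_gib=0.50)
    after = metadata_alarm(total_gib=5.50, used_gib=4.50, reserve_gib=0.50)
    print("before enlarging:", before)
    print("after enlarging:", after)
```

A naive `total - used` threshold would have reported 0.5GiB free for the first case and raised no alarm, which is the misleading reading Peter describes.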
* Re: ENOSPC with mkdir and rename
From: Clemens Eisserer @ 2014-08-04 9:22 UTC
To: linux-btrfs

Hi Peter,

> All of this is *very* surprising. I'm not new to BTRFS, I've been
> using it on my own machines for multiple years. I didn't realise there
> was an un-holstered footgun on my lap at this point. How can it be
> made clear how to avoid the ENOSPC problem to myself and other
> sysadmins? Or preferably not exist as a problem?

I've also found the fixed metadata/data split to be an uncomfortable
implementation detail, and a more flexible approach would be very
welcome from my side.

So far I've used BTRFS' mixed mode, mentioned in the mkfs.btrfs man page:

> -M|--mixed
>     Mix data and metadata chunks together for more efficient space utilization.
>     This feature incurs a performance penalty in larger filesystems.
>     It is recommended for use with filesystems of 1 GiB or smaller.

However, I didn't find any information on how large the mentioned
overhead is, or where it originates from.

Best regards, Clemens
* Re: ENOSPC with mkdir and rename
From: Chris Samuel @ 2014-08-04 9:39 UTC
To: linux-btrfs

On Mon, 4 Aug 2014 09:14:19 AM Peter Waller wrote:

> All of this is *very* surprising.

Hmm, it shouldn't be, the ENOSPC issues are well known and have been discussed
here for years.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: ENOSPC with mkdir and rename
From: Clemens Eisserer @ 2014-08-04 9:56 UTC
To: linux-btrfs

Hi Chris,

> Hmm, it shouldn't be, the ENOSPC issues are well known and have been discussed
> here for years.

Which doesn't protect the *average* user from running into issues like this.
Just because it has been discussed doesn't mean nothing can/should be
done about it ;)

However, as I am only a user too and can't contribute in terms of
code, I'll stay patient and observe how btrfs is evolving.
One day or another, the ENOSPC issues will get fixed or worked around.

Regards, Clemens
* Re: ENOSPC with mkdir and rename
From: Chris Samuel @ 2014-08-04 10:24 UTC
To: linux-btrfs

On Mon, 4 Aug 2014 11:56:46 AM Clemens Eisserer wrote:

> Which doesn't protect the *average* user from running into issues like this.

No, but they need to be aware of it.

> Just because it has been discussed, doesn't mean nothing can/should be done
> about it

Indeed, and a lot of work has been done over the years on it and it's a lot
better than it used to be. :-)

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: ENOSPC with mkdir and rename
From: Duncan @ 2014-08-05 8:06 UTC
To: linux-btrfs

Chris Samuel posted on Mon, 04 Aug 2014 20:24:46 +1000 as excerpted:

> On Mon, 4 Aug 2014 11:56:46 AM Clemens Eisserer wrote:
>
>> Which doesn't protect the *average* user from running into issues like
>> this.
>
> No, but they need to be aware of it.

Actually, an ordinary user/admin /should/ have no more need to be aware
of it than they do on any other filesystem. Since that issue doesn't
occur on ext* or reiserfs, to pick two examples I'm familiar with, they
shouldn't need to worry about it on btrfs either.

But then, just such an "ordinary admin" shouldn't yet be running btrfs on
their system, as it's simply not to that point of readiness and maturity
yet.

Which is why I'm not particularly happy with seeing all the "btrfs is
still not stable, use at your own risk" warnings disappearing. With them
there, people who chose to run btrfs /could/ be expected to have done
their research and have btrfs specific knowledge such as this, because
btrfs was clearly marked as /not/ ready for "ordinary users" not prepared
to do such research on their own.

But now that those warnings are all being removed, btrfs should "just
work" for all those "ordinary users".

But it doesn't. Btrfs is still special and requires btrfs-domain
specific knowledge to properly administer, as the fixes that would remove
that requirement, in this case perhaps a background thread that would
check for data/metadata imbalance and at least log a warning suggesting a
rebalance, if not triggering that rebalance on its own, simply aren't
there yet.

IMO, without those fixes, btrfs is still experimental, or at least not
entirely stable yet and requiring btrfs-domain-specific knowledge, and
should keep the warnings saying exactly that.

Unfortunately...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: ENOSPC with mkdir and rename
From: Russell Coker @ 2014-08-05 12:20 UTC
To: Duncan; +Cc: linux-btrfs

On Tue, 5 Aug 2014 08:06:12 Duncan wrote:
> Which is why I'm not particularly happy with seeing all the "btrfs is
> still not stable, use at your own risk" warnings disappearing. With them
> there, people who chose to run btrfs /could/ be expected to have done
> their research and have btrfs specific knowledge such as this, because
> btrfs was clearly marked as /not/ ready for "ordinary users" not prepared
> to do such research on their own.
>
> But now that those warnings are all being removed, btrfs should "just
> work" for all those "ordinary users".
>
> But it doesn't. Btrfs is still special and requires btrfs-domain
> specific knowledge to properly administer, as the fixes that would remove
> that requirement, in this case perhaps a background thread that would
> check for data/metadata imbalance and at least log a warning suggesting a
> rebalance, if not triggering that rebalance on its own, simply aren't
> there yet.

Currently the Debian/Jessie freeze is approaching. The Debian kernel team
have chosen 3.16 and don't have any plans for significant back-ports from
later kernels.

Based on what I've read on this list it seems that BTRFS is less stable in
3.15 than in 3.14. Even 3.14 isn't something I'd recommend to random people
who want something to just work.

The Debian installer has BTRFS in a list of filesystems to choose with no
special notice about it. I'm thinking of filing a Debian bug requesting that
they put a warning against it.

What do people here think?

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: ENOSPC with mkdir and rename
From: Clemens Eisserer @ 2014-08-05 12:58 UTC
To: linux-btrfs

Hi Russell,

> The Debian installer has BTRFS in a list of filesystems to choose with no
> special notice about it. I'm thinking of filing a Debian bug requesting that
> they put a warning against it.

As long as it is not selected as the default filesystem, I think it is fine.
Other distributions have been offering btrfs for some time now, too.

Regards, Clemens
* Re: ENOSPC with mkdir and rename
From: Peter Waller @ 2014-08-05 13:02 UTC
To: Clemens Eisserer; +Cc: linux-btrfs

On 5 August 2014 13:58, Clemens Eisserer <linuxhippy@gmail.com> wrote:
> As long as it is not selected as the default filesystem, I think it is fine.
> Other distributions have been offering btrfs for some time now, too.

How do you warn non-BTRFS-developers in this case that they need to
run a regular rebalance or they may end up in a
difficult/expensive/impossible to fix ENOSPC condition at an
inconvenient moment?
* Re: ENOSPC with mkdir and rename
From: Martin Steigerwald @ 2014-08-10 17:21 UTC
To: Clemens Eisserer; +Cc: linux-btrfs

On Tuesday, 5 August 2014, 14:58:34, Clemens Eisserer wrote:
> Hi Russel,
>
> > The Debian installer has BTRFS in a list of filesystems to choose with no
> > special notice about it. I'm thinking of filing a Debian bug requesting
> > that they put a warning against it.
>
> As long as it is not selected as the default filesystem, I think it is fine.
> Other distributions have been offering btrfs for some time now, too.

For example SLES 11 SP 2.

A Linux training VM image, which I had used to develop some slides about
implementing an OpenLDAP server on SLES, was totally broken the next day:

- no space left on device
- snapper had created tons of snapshots
- yet df -h still showed 2 GB free
- rm on a logfile returned no space left on device
- btrfs subvol delete returned no space left on device
- I think I also tried btrfs balance, which also failed with no space left
  on device, but I am not 100% sure

At that time I just created a snapshot of the broken state and returned to a
previous snapshot to have the VM fixed.

And note: This is on a distro that has enterprise support for using BTRFS on
the root filesystem, while using a 3.0 kernel with hopefully some, but
apparently not enough, backports.

Granted, it would still be nice to add a warning to Debian.

Better still would be just to have stability fixes for the hangs go into
3.16-stable and thus also into Debian Jessie.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: ENOSPC with mkdir and rename
From: Chris Samuel @ 2014-08-05 13:36 UTC
To: linux-btrfs

On Tue, 5 Aug 2014 10:20:33 PM Russell Coker wrote:

> The Debian installer has BTRFS in a list of filesystems to choose with no
> special notice about it. I'm thinking of filing a Debian bug requesting
> that they put a warning against it.

I think it's a good plan. People should be aware of the risks they are
running.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: ENOSPC with mkdir and rename
From: Duncan @ 2014-08-06 0:04 UTC
To: linux-btrfs

Russell Coker posted on Tue, 05 Aug 2014 22:20:33 +1000 as excerpted:

> The Debian installer has BTRFS in a list of filesystems to choose with
> no special notice about it. I'm thinking of filing a Debian bug
> requesting that they put a warning against it.
>
> What do people here think?

You already have my general feeling: a warning is still appropriate.

For Debian, I believe it's fair to characterize people running stable as
relatively conservative. As such, a warning may be appropriate, but if
they're actually /that/ conservative, is it needed, or will users'
natural inclination to filesystem conservatism be enough, and a btrfs
warning thus look more serious than it is?

I'd say warn, unless that warning /will/ be seen as "eat your babies"
level, even when it's more "appropriate care and a regular backup program
recommended."

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: ENOSPC with mkdir and rename
From: ronnie sahlberg @ 2014-08-06 0:38 UTC
To: russell; +Cc: Duncan, Btrfs BTRFS

On Tue, Aug 5, 2014 at 5:20 AM, Russell Coker <russell@coker.com.au> wrote:
>
> Based on what I've read on this list it seems that BTRFS is less stable in
> 3.15 than in 3.14. Even 3.14 isn't something I'd recommend to random people
> who want something to just work.
>
> The Debian installer has BTRFS in a list of filesystems to choose with no
> special notice about it. I'm thinking of filing a Debian bug requesting that
> they put a warning against it.
>
> What do people here think?

+1 for a warning.

btrfs is still a young filesystem and not as stable as, say, ext4.
I think it would be very prudent to have a small warning.
* Re: ENOSPC with mkdir and rename
From: Nick Krause @ 2014-08-06 1:18 UTC
To: ronnie sahlberg; +Cc: Russell Coker, Duncan, Btrfs BTRFS

On Tue, Aug 5, 2014 at 8:38 PM, ronnie sahlberg <ronniesahlberg@gmail.com> wrote:
> On Tue, Aug 5, 2014 at 5:20 AM, Russell Coker <russell@coker.com.au> wrote:
>
>>
>> Based on what I've read on this list it seems that BTRFS is less stable in
>> 3.15 than in 3.14. Even 3.14 isn't something I'd recommend to random people
>> who want something to just work.
>>
>> The Debian installer has BTRFS in a list of filesystems to choose with no
>> special notice about it. I'm thinking of filing a Debian bug requesting that
>> they put a warning against it.
>>
>> What do people here think?
>
> +1 for a warning.
>
> btrfs is still a young filesystem and not as stable as say ext4.
> I think it would be very prudent to have a small warning.

I agree here and feel this is very important. +1

Nick
* Re: ENOSPC with mkdir and rename
From: Peter Waller @ 2014-08-04 10:09 UTC
To: Chris Samuel; +Cc: linux-btrfs

On 4 August 2014 10:39, Chris Samuel <chris@csamuel.org> wrote:
> On Mon, 4 Aug 2014 09:14:19 AM Peter Waller wrote:
>> All of this is *very* surprising.
>
> Hmm, it shouldn't be, the ENOSPC issues are well known and have been discussed
> here for years.

I accept that. It's all very well if you read the BTRFS list and/or
are a BTRFS developer. But if you're trying to work it out in the heat
of battle, as our sysadmins would have to, there is a
combination of things here that makes it unreasonable and harmful for
production.

I was in a situation where I was getting sporadic ENOSPC and none of
the instructions I could find helped. I did a thorough search of the
wiki and mailing list - I found a plethora of similar sounding
problems and none of the advice given helped.

Our usage is a simple case: no RAID, no subvolumes, no snapshots. We
had >60GiB free and apparently some metadata free.

I still can't find a clear answer to the question "How do I make an
alarm to warn of an impending ENOSPC condition on BTRFS?"

Is that because there is no clear answer?

The nature of "running out of disk space" as a problem means you won't
hit it until you've been using it for a long while, which makes this
problem of the form "a ticking time bomb". Is there no way to make
this operationally easier? Or should only BTRFS developers use BTRFS?

I'm breaking the rest out below, in case you are interested in
understanding more of the problems I was having.

Thanks,

- Peter

More thoughts to illustrate the problems with the existing documentation:

Getting started contains no warning of what's different about free
space compared with other filesystems one might be familiar with:

https://btrfs.wiki.kernel.org/index.php/Getting_started

The sysadmin guide doesn't appear to mention free space at all:

https://btrfs.wiki.kernel.org/index.php/SysadminGuide

The FAQ has a question:

https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_like_I_should_have_lots_left.21

Which starts out "Free space is a tricky concept in Btrfs" but then
doesn't explain it very well. None of the advice given there helped in
my case. There is talk about a mixed mode, but not how to move an
existing filesystem to it. I'm yet to find an explanation of
rebalancing which isn't focussed on what it means for RAID, and it
still isn't crystal clear to me what rebalancing means for
metadata/data on one disk. Rebalancing didn't work in my case. Must I
construct an image of the underlying BTRFS datastructures in my head?
I'm fine if I have to do that, but nowhere makes it clear what mental
tools I need to tackle this.

This link is mentioned by the above but not directly linked to by it
(and has "are" and "is" changed compared with the above text):

https://btrfs.wiki.kernel.org/index.php/FAQ#Why_are_there_so_many_ways_to_check_the_amount_of_free_space.3F

This link would have helped a bit but wasn't cross referenced by any
of the other materials which I did find, so I couldn't find it in the
heat of battle:

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space

One problem is that it isn't clear what "chunks" are. Does an operator
of a BTRFS filesystem need to understand this in the simple case of no
snapshots, no RAID? How did the whole disk come to be allocated to
data given that we hadn't used all of it? Is it because the data is
using chunks inefficiently? How does this come to be in the simple
case? The documentation could use some illustrations to make this
clear.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:09 ` Peter Waller
@ 2014-08-04 10:22 ` Hugo Mills
2014-08-04 10:31 ` Peter Waller
2014-08-04 11:04 ` Clemens Eisserer
2014-08-04 10:50 ` Chris Samuel
2014-08-10 17:26 ` Martin Steigerwald
2 siblings, 2 replies; 44+ messages in thread
From: Hugo Mills @ 2014-08-04 10:22 UTC (permalink / raw)
To: Peter Waller; +Cc: Chris Samuel, linux-btrfs

On Mon, Aug 04, 2014 at 11:09:23AM +0100, Peter Waller wrote:
> On 4 August 2014 10:39, Chris Samuel <chris@csamuel.org> wrote:
> > On Mon, 4 Aug 2014 09:14:19 AM Peter Waller wrote:
> >> All of this is *very* surprising.
> >
> > Hmm, it shouldn't be, the ENOSPC issues are well known and have been discussed
> > here for years.
>
> I accept that. It's all very well if you read the BTRFS list and/or
> are a BTRFS developer. But if you're trying to work it out in the heat
> of battle, as we have sysadmins who would have to, there is a
> combination of things here that makes it unreasonable and harmful for
> production.
>
> I was in a situation where I was getting sporadic ENOSPC and none of
> the instructions I could find helped. I did a thorough search of the
> wiki and mailing list - I found a plethora of similar sounding
> problems and none of the advice given helped.
>
> Our usage is a simple case: no RAID, no subvolumes, no snapshots. We
> had >60GiB free and apparently some metadata free.
>
> I still can't find a clear answer to the question "How do I make an
> alarm to warn of an impending ENOSPC condition on BTRFS?"

On the 3.15+ kernels, the block reserve is split out of metadata
and reported separately. This helps with the following process:

* btrfs fi show
  - look at the total and used values. If used < total, you're OK.
    If used == total, then you could potentially hit ENOSPC.

* btrfs fi df
  - look at metadata used vs total. If these are close to zero (on
    3.15+) or close to 512 MiB (on <3.15), then you are in danger of
    ENOSPC.
  - look at data used vs total. If the used is much smaller than
    total, you can reclaim some of the allocation with a filtered
    balance (btrfs balance start -dusage=5), which will then give
    you unallocated space again (see the btrfs fi show test).

> Is that because there is no clear answer?
>
> The nature of "running out of disk space" as a problem means you won't
> hit it until you've been using it for a long while, which makes this
> problem of the form "a ticking time bomb". Is there no way to make
> this operationally easier? Or should only BTRFS developers use BTRFS?
>
> I'm breaking the rest out below if you are interested to try and
> understand more the problems I was having.
>
> Thanks,
>
> - Peter
>
> More thoughts to illustrate the problems with the existing documentation:
>
> Getting started contains no warning of what's different about free
> space compared with other filesystems one might be familiar with:
>
> https://btrfs.wiki.kernel.org/index.php/Getting_started
>
> The sysadmin guide doesn't appear to mention free space at all:
>
> https://btrfs.wiki.kernel.org/index.php/SysadminGuide
>
> The FAQ has a question:
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_like_I_should_have_lots_left.21
>
> Which starts out "Free space is a tricky concept in Btrfs" but then
> doesn't explain it very well. None of the advice given there helped in
> my case. There is talk about a mixed mode, but not how to move an
> existing filesystem to it. I'm yet to find an explanation of
> rebalancing which isn't focussed on what it means for RAID, and it
> still isn't crystal clear to me what rebalancing means for
> metadata/data on one disk. Rebalancing didn't work in my case. Must I
> construct an image of the underlying BTRFS data structures in my head?
> I'm fine if I have to do that, but nowhere makes it clear what mental
> tools I need to tackle this.

This FAQ entry is pretty horrible, I'm afraid. I actually started
rewriting it here to try to make it clearer what's going on. I'll try
to work on it a bit more this week and put out a better version for
the wiki.

> This link is mentioned by the above but not directly linked to by it
> (and has "are" and "is" changed compared with the above text):
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#Why_are_there_so_many_ways_to_check_the_amount_of_free_space.3F
>
> This link would have helped a bit but wasn't cross referenced by any
> of the other materials which I did find, so I couldn't find it in the
> heat of battle:
>
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space
>
> One problem is that it isn't clear what "chunks" are. Does an operator
> of a BTRFS filesystem need to understand this in the simple case of no
> snapshots, no RAID?
>
> How did the whole disk come to be allocated to data given that we
> hadn't used all of it? Is it because the data is using chunks
> inefficiently? How does this come to be in the simple case?

Two ways: Write lots of data, delete it again. (This could also
happen with snapshots). Alternatively, kernels earlier than about 3.10
had a bug that massively overallocated data chunks when it didn't need
to.

Please do feel free to add more crosslinks or text to the wiki to
make it clearer where to look. The "pretty horrible" FAQ entry
mentioned above is the canonical location for dealing with early
ENOSPC problems, so other things should probably point at that.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You stay in the theatre because you're afraid of having no ---
money? There's irony...
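The `btrfs fi df` half of the check described above can be sketched in Python. This is a minimal illustration, assuming output in the same format as the listing at the start of this thread; the function names and the idea of subtracting a reserve on pre-3.15 kernels are assumptions made for the example, not part of btrfs-progs. It reports how much allocated-but-unused metadata space (total minus used) is left, which is the figure to watch.

```python
import re

# Size suffixes as printed by `btrfs fi df` (binary units).
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches lines such as "Metadata, DUP: total=5.00GiB, used=4.50GiB".
LINE_RE = re.compile(
    r"^(?P<kind>[^,]+), (?P<profile>[^:]+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[A-Za-z]*), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[A-Za-z]*)$"
)

# Sample taken from the first message of this thread.
SAMPLE = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB
"""


def to_bytes(number, unit):
    """Convert e.g. ('4.50', 'GiB') to bytes; a bare '0.00' has no unit."""
    return int(float(number) * UNITS[unit or "B"])


def parse_fi_df(text):
    """Return a list of (kind, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m["kind"], m["profile"],
                         to_bytes(m["total"], m["tunit"]),
                         to_bytes(m["used"], m["uunit"])))
    return rows


def metadata_headroom(rows, reserve_in_metadata=0):
    """Allocated-but-unused metadata bytes.

    On kernels older than 3.15 the block reserve is counted inside the
    Metadata line, so pass its size (assumed 512 MiB in this thread) as
    `reserve_in_metadata` to subtract it.
    """
    free = sum(total - used for kind, _, total, used in rows
               if kind == "Metadata")
    return free - reserve_in_metadata


if __name__ == "__main__":
    headroom = metadata_headroom(parse_fi_df(SAMPLE))
    print(f"metadata headroom: {headroom / 2**20:.0f} MiB")
```

An alarm would compare the returned headroom against a threshold chosen by the operator, and combine it with the used/total comparison from `btrfs fi show`, since metadata can only grow if unallocated space remains.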
* Re: ENOSPC with mkdir and rename
2014-08-04 10:22 ` Hugo Mills
@ 2014-08-04 10:31 ` Peter Waller
2014-08-04 10:39 ` Hugo Mills
2014-08-04 17:09 ` Austin S Hemmelgarn
2014-08-04 11:04 ` Clemens Eisserer
1 sibling, 2 replies; 44+ messages in thread
From: Peter Waller @ 2014-08-04 10:31 UTC (permalink / raw)
To: Hugo Mills, Peter Waller, Chris Samuel, linux-btrfs

Thanks Hugo, this is the most informative e-mail yet! (more inline)

On 4 August 2014 11:22, Hugo Mills <hugo@carfax.org.uk> wrote:
>
> * btrfs fi show
>   - look at the total and used values. If used < total, you're OK.
>     If used == total, then you could potentially hit ENOSPC.

Another thing which is unclear and undocumented anywhere I can find is
what the meaning of `btrfs fi show` is.

I'm sure it is totally obvious if you are a developer or if you have
used it for long enough. But it isn't covered in the manpage, nor in
the Oracle documentation, nor anywhere on the wiki that I could find.

When I looked at it in my problematic situation, it said "500 GiB /
500 GiB". That sounded fine to me because I interpreted the output as
what fraction of which RAID devices BTRFS was using. In other words, I
thought "Oh, BTRFS will just make use of the whole device that's
available to it.". I thought that `btrfs fi df` was the source of
information for how much space was free inside of that.

> * btrfs fi df
>   - look at metadata used vs total. If these are close to zero (on
>     3.15+) or close to 512 MiB (on <3.15), then you are in danger of
>     ENOSPC.

Hmm. It's unfortunate that this could indicate an amount of space
which is free when it actually isn't.

>   - look at data used vs total. If the used is much smaller than
>     total, you can reclaim some of the allocation with a filtered
>     balance (btrfs balance start -dusage=5), which will then give
>     you unallocated space again (see the btrfs fi show test).

So the filtered balance didn't help in my situation. I understand it's
something to do with the "5" parameter. But I do not understand what
the impact of changing this parameter is. It is something to do with a
fraction of something, but those things are still not present in my
mental model despite a large amount of reading. Is there an
illustration which could clear this up?

Among other things I also got the kernel stack trace I pasted at the
bottom of the first e-mail to this thread when I did the rebalance.

> This FAQ entry is pretty horrible, I'm afraid. I actually started
> rewriting it here to try to make it clearer what's going on. I'll try
> to work on it a bit more this week and put out a better version for
> the wiki.

This is great to hear! :)

Thanks for your response Hugo, that really cleared up a lot of mental
model problems. I hope the documentation can be improved so that
others can learn from my mistakes.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:31 ` Peter Waller
@ 2014-08-04 10:39 ` Hugo Mills
2014-08-04 10:48 ` Peter Waller
2014-08-04 17:09 ` Austin S Hemmelgarn
1 sibling, 1 reply; 44+ messages in thread
From: Hugo Mills @ 2014-08-04 10:39 UTC (permalink / raw)
To: Peter Waller; +Cc: Chris Samuel, linux-btrfs

On Mon, Aug 04, 2014 at 11:31:57AM +0100, Peter Waller wrote:
> Thanks Hugo, this is the most informative e-mail yet! (more inline)
>
> On 4 August 2014 11:22, Hugo Mills <hugo@carfax.org.uk> wrote:
> >
> > * btrfs fi show
> >   - look at the total and used values. If used < total, you're OK.
> >     If used == total, then you could potentially hit ENOSPC.
>
> Another thing which is unclear and undocumented anywhere I can find is
> what the meaning of `btrfs fi show` is.
>
> I'm sure it is totally obvious if you are a developer or if you have
> used it for long enough. But it isn't covered in the manpage, nor in
> the Oracle documentation, nor anywhere on the wiki that I could find.
>
> When I looked at it in my problematic situation, it said "500 GiB /
> 500 GiB". That sounded fine to me because I interpreted the output as
> what fraction of which RAID devices BTRFS was using. In other words, I
> thought "Oh, BTRFS will just make use of the whole device that's
> available to it.". I thought that `btrfs fi df` was the source of
> information for how much space was free inside of that.

That's actually pretty much accurate. The problem is that btrfs
distinguishes between "space available for data" and "space available
for metadata", and doesn't trade off one for the other once they've
been allocated. The balance operation frees up some of the allocation,
allowing the newly-freed space to be allocated again for something
else. All of the information about the data/metadata split, and what's
used out of that, is revealed by btrfs fi df.

> > * btrfs fi df
> >   - look at metadata used vs total. If these are close to zero (on
> >     3.15+) or close to 512 MiB (on <3.15), then you are in danger of
> >     ENOSPC.
>
> Hmm. It's unfortunate that this could indicate an amount of space
> which is free when it actually isn't.

That's why the 512 MiB block reserve was split out of metadata --
so that you don't look at metadata and say "oh, I've got half a gig
free, that's OK".

> >   - look at data used vs total. If the used is much smaller than
> >     total, you can reclaim some of the allocation with a filtered
> >     balance (btrfs balance start -dusage=5), which will then give
> >     you unallocated space again (see the btrfs fi show test).
>
> So the filtered balance didn't help in my situation. I understand it's
> something to do with the "5" parameter. But I do not understand what
> the impact of changing this parameter is. It is something to do with a
> fraction of something, but those things are still not present in my
> mental model despite a large amount of reading. Is there an
> illustration which could clear this up?

The 5 is 5%. So, it'll only look at chunks which are less than 5%
full. David Sterba published a patch that would balance the
(approximately N) least-used chunks, which is a considerably more
usable approach, but I don't know what happened to that one.

> Among other things I also got the kernel stack trace I pasted at the
> bottom of the first e-mail to this thread when I did the rebalance.

OK, I'll go back and read that. You probably shouldn't have had
it, though. :)

> > This FAQ entry is pretty horrible, I'm afraid. I actually started
> > rewriting it here to try to make it clearer what's going on. I'll try
> > to work on it a bit more this week and put out a better version for
> > the wiki.
>
> This is great to hear! :)
>
> Thanks for your response Hugo, that really cleared up a lot of mental
> model problems. I hope the documentation can be improved so that
> others can learn from my mistakes.

I do try to work on it every so often. Note to self: win lottery,
or get cloned.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You stay in the theatre because you're afraid of having no ---
money? There's irony...
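To make the "5 is 5%" point concrete, here is a toy model in Python of which chunks a usage filter would visit. The fill levels are invented for illustration, and the function is only a sketch of the rule as stated in this thread (a chunk qualifies when it is less than N percent full), not the kernel's implementation.

```python
def selected_by_usage(chunk_fill_percent, usage):
    """Return the fill levels of chunks a `-dusage=<usage>` filter would visit.

    Toy model: a chunk qualifies when it is less than `usage` percent full.
    """
    return [pct for pct in chunk_fill_percent if pct < usage]


# Hypothetical fill levels (percent) for ten data chunks.
chunks = [92, 88, 95, 61, 87, 90, 34, 99, 85, 12]

for usage in (5, 20, 50, 80):
    picked = selected_by_usage(chunks, usage)
    print(f"-dusage={usage}: {len(picked)} chunk(s) considered {picked}")
```

With these made-up numbers, a filter of 5 visits nothing at all, which is one way a filtered balance can finish without reclaiming any space; raising the value brings progressively fuller chunks into scope.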
* Re: ENOSPC with mkdir and rename
2014-08-04 10:39 ` Hugo Mills
@ 2014-08-04 10:48 ` Peter Waller
2014-08-04 11:29 ` Hugo Mills
0 siblings, 1 reply; 44+ messages in thread
From: Peter Waller @ 2014-08-04 10:48 UTC (permalink / raw)
To: Hugo Mills, Peter Waller, Chris Samuel, linux-btrfs

On 4 August 2014 11:39, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > * btrfs fi df
>> >   - look at metadata used vs total. If these are close to zero (on
>> >     3.15+) or close to 512 MiB (on <3.15), then you are in danger of
>> >     ENOSPC.
>>
>> Hmm. It's unfortunate that this could indicate an amount of space
>> which is free when it actually isn't.
>
> That's why the 512 MiB block reserve was split out of metadata --
> so that you don't look at metadata and say "oh, I've got half a gig
> free, that's OK".

I don't quite follow this. Is it a recent development I missed? When
was it "split out"? More recently than the software I'm using?
Otherwise I'm having difficulty parsing this.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:48 ` Peter Waller
@ 2014-08-04 11:29 ` Hugo Mills
0 siblings, 0 replies; 44+ messages in thread
From: Hugo Mills @ 2014-08-04 11:29 UTC (permalink / raw)
To: Peter Waller; +Cc: Chris Samuel, linux-btrfs

On Mon, Aug 04, 2014 at 11:48:17AM +0100, Peter Waller wrote:
> On 4 August 2014 11:39, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> > * btrfs fi df
> >> >   - look at metadata used vs total. If these are close to zero (on
> >> >     3.15+) or close to 512 MiB (on <3.15), then you are in danger of
> >> >     ENOSPC.
> >>
> >> Hmm. It's unfortunate that this could indicate an amount of space
> >> which is free when it actually isn't.
> >
> > That's why the 512 MiB block reserve was split out of metadata --
> > so that you don't look at metadata and say "oh, I've got half a gig
> > free, that's OK".
>
> I don't quite follow this. Is it a recent development I missed? When
> was it "split out"? More recently than the software I'm using?
> Otherwise I'm having difficulty parsing this.

It's purely a change in the way that the kernel reports this info.
Before 3.15, the block reserve was included in the "Metadata" report
in btrfs fi df. From 3.15 onwards, the kernel reports the block
reserve as its own separate item in btrfs fi df (either as "BlockRsv",
or "unknown", depending on how old your userspace is).

The theory is, the change is made to make it clearer how much is
used/reserved/free and thus to make this kind of calculation simpler
in the long run.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Reading Mein Kampf won't make you a Nazi. Reading Das Kapital ---
won't make you a communist. But most trolls started out
with a copy of Lord of the Rings.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:31 ` Peter Waller
2014-08-04 10:39 ` Hugo Mills
@ 2014-08-04 17:09 ` Austin S Hemmelgarn
2014-08-05 8:20 ` Duncan
1 sibling, 1 reply; 44+ messages in thread
From: Austin S Hemmelgarn @ 2014-08-04 17:09 UTC (permalink / raw)
To: Peter Waller; +Cc: Hugo Mills, Chris Samuel, linux-btrfs

On 2014-08-04 06:31, Peter Waller wrote:
> Thanks Hugo, this is the most informative e-mail yet! (more inline)
>
> On 4 August 2014 11:22, Hugo Mills <hugo@carfax.org.uk> wrote:
>>
>> * btrfs fi show
>>   - look at the total and used values. If used < total, you're OK.
>>     If used == total, then you could potentially hit ENOSPC.
>
> Another thing which is unclear and undocumented anywhere I can find is
> what the meaning of `btrfs fi show` is.
>
> I'm sure it is totally obvious if you are a developer or if you have
> used it for long enough. But it isn't covered in the manpage, nor in
> the Oracle documentation, nor anywhere on the wiki that I could find.
>
You didn't look very hard then, because there is information in the
manpage (oh wait, you mentioned Oracle, you're probably using RHEL or
CentOS, which are the last thing you should be using if you want to use
stuff like BTRFS that is under heavy development), and it is documented
on the wiki.
> When I looked at it in my problematic situation, it said "500 GiB /
> 500 GiB". That sounded fine to me because I interpreted the output as
> what fraction of which RAID devices BTRFS was using. In other words, I
> thought "Oh, BTRFS will just make use of the whole device that's
> available to it.". I thought that `btrfs fi df` was the source of
> information for how much space was free inside of that.
>
>> * btrfs fi df
>>   - look at metadata used vs total. If these are close to zero (on
>>     3.15+) or close to 512 MiB (on <3.15), then you are in danger of
>>     ENOSPC.
>
> Hmm. It's unfortunate that this could indicate an amount of space
> which is free when it actually isn't.
That depends on what you mean by 'free'.
>
>>   - look at data used vs total. If the used is much smaller than
>>     total, you can reclaim some of the allocation with a filtered
>>     balance (btrfs balance start -dusage=5), which will then give
>>     you unallocated space again (see the btrfs fi show test).
>
> So the filtered balance didn't help in my situation. I understand it's
> something to do with the "5" parameter. But I do not understand what
> the impact of changing this parameter is. It is something to do with a
> fraction of something, but those things are still not present in my
> mental model despite a large amount of reading. Is there an
> illustration which could clear this up?
>
Think of each chunk like a box, and each block as a block, and that you
have two different types of block (data and metadata) and two different
types of box (also data and metadata). The data boxes are four times the
size of the metadata boxes, and they all have to fit in one really big
container (the device itself). You can only put data blocks in the data
boxes, and you can only put metadata blocks in metadata boxes. Say that
in total, you can fit 128 data boxes in the large container, or you can
replace one data box with up to four metadata boxes. Even though you
may only have a few blocks in a given box, the box still takes up the
same amount of space in the larger container. Thus, it's possible to
have only a few blocks stored, but not be able to add any more boxes to
the larger container. A balance operation is essentially the equivalent
of taking all of the blocks of a given type, and fitting them into the
smallest number of boxes possible.
> Among other things I also got the kernel stack trace I pasted at the
> bottom of the first e-mail to this thread when I did the rebalance.
>
>> This FAQ entry is pretty horrible, I'm afraid. I actually started
>> rewriting it here to try to make it clearer what's going on. I'll try
>> to work on it a bit more this week and put out a better version for
>> the wiki.
>
> This is great to hear! :)
>
> Thanks for your response Hugo, that really cleared up a lot of mental
> model problems. I hope the documentation can be improved so that
> others can learn from my mistakes.
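The box analogy above can be played out in a few lines of Python. This is an idealised sketch: the box capacity and block counts are invented, and it assumes a balance packs blocks perfectly, which a real balance is not guaranteed to do.

```python
def balance(used_per_box, box_capacity):
    """Repack the blocks of same-type boxes into as few boxes as possible.

    `used_per_box` lists how many blocks each box currently holds.
    Returns the per-box block counts after an idealised, perfect repack.
    """
    total_blocks = sum(used_per_box)
    full_boxes, remainder = divmod(total_blocks, box_capacity)
    return [box_capacity] * full_boxes + ([remainder] if remainder else [])


# Hypothetical state: six data boxes, each able to hold 8 blocks,
# most of them nearly empty.
data_boxes = [3, 1, 0, 2, 5, 1]
capacity = 8

after = balance(data_boxes, capacity)
freed = len(data_boxes) - len(after)

print(f"before: {len(data_boxes)} boxes holding {sum(data_boxes)} blocks")
print(f"after:  {len(after)} boxes holding {sum(after)} blocks")
# In the analogy, each freed data box leaves room for up to four metadata boxes.
print(f"freed:  {freed} data boxes, room for up to {freed * 4} metadata boxes")
```

The number of blocks stored never changes; only the number of boxes occupying the container does, and that is the space a new metadata box needs.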
* Re: ENOSPC with mkdir and rename
2014-08-04 17:09 ` Austin S Hemmelgarn
@ 2014-08-05 8:20 ` Duncan
2014-08-05 11:31 ` Austin S Hemmelgarn
0 siblings, 1 reply; 44+ messages in thread
From: Duncan @ 2014-08-05 8:20 UTC (permalink / raw)
To: linux-btrfs

Austin S Hemmelgarn posted on Mon, 04 Aug 2014 13:09:23 -0400 as
excerpted:

> Think of each chunk like a box, and each block as a block, and that you
> have two different types of block (data and metadata) and two different
> types of box (also data and metadata). The data boxes are four times the
> size of the metadata boxes, and they all have to fit in one really big
> container (the device itself). You can only put data blocks in the data
> boxes, and you can only put metadata blocks in metadata boxes. Say that
> in total, you can fit 128 data boxes in the large container, or you can
> replace one data box with up to four metadata boxes. Even though you
> may only have a few blocks in a given box, the box still takes up the
> same amount of space in the larger container. Thus, it's possible to
> have only a few blocks stored, but not be able to add any more boxes to
> the larger container. A balance operation is essentially the equivalent
> of taking all of the blocks of a given type, and fitting them into the
> smallest number of boxes possible.

FWIW, that's a great analogy to stick up on the wiki somewhere, probably
somewhere in the FAQ related to ENOSPC. Please consider doing so.

(Someone took one of my explanations from the list and stuck it in the
wiki, virtually word-for-word, with a link to the list post in the
archives for more. I was glad, as for some reason I just seem to work
best on the lists, and seem to treat web pages as read-only, even if
they're on a wiki I in theory have or can get write-privs on. I'm
suggesting someone, doesn't have to be you tho great if it is, do the
same with this.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: ENOSPC with mkdir and rename
2014-08-05 8:20 ` Duncan
@ 2014-08-05 11:31 ` Austin S Hemmelgarn
0 siblings, 0 replies; 44+ messages in thread
From: Austin S Hemmelgarn @ 2014-08-05 11:31 UTC (permalink / raw)
To: Duncan, linux-btrfs

On 2014-08-05 04:20, Duncan wrote:
> Austin S Hemmelgarn posted on Mon, 04 Aug 2014 13:09:23 -0400 as
> excerpted:
>
>> Think of each chunk like a box, and each block as a block, and that you
>> have two different types of block (data and metadata) and two different
>> types of box (also data and metadata). The data boxes are four times the
>> size of the metadata boxes, and they all have to fit in one really big
>> container (the device itself). You can only put data blocks in the data
>> boxes, and you can only put metadata blocks in metadata boxes. Say that
>> in total, you can fit 128 data boxes in the large container, or you can
>> replace one data box with up to four metadata boxes. Even though you
>> may only have a few blocks in a given box, the box still takes up the
>> same amount of space in the larger container. Thus, it's possible to
>> have only a few blocks stored, but not be able to add any more boxes to
>> the larger container. A balance operation is essentially the equivalent
>> of taking all of the blocks of a given type, and fitting them into the
>> smallest number of boxes possible.
>
> FWIW, that's a great analogy to stick up on the wiki somewhere, probably
> somewhere in the FAQ related to ENOSPC. Please consider doing so.
>
> (Someone took one of my explanations from the list and stuck it in the
> wiki, virtually word-for-word, with a link to the list post in the
> archives for more. I was glad, as for some reason I just seem to work
> best on the lists, and seem to treat web pages as read-only, even if
> they're on a wiki I in theory have or can get write-privs on. I'm
> suggesting someone, doesn't have to be you tho great if it is, do the
> same with this.)
>
I would love to have it up on the wiki, but don't have an account or
write privileges. FWIW, I consider anything I post on a mailing list
that isn't marked otherwise (except patches) to be public domain, so
everyone feel free to use it however you want.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:22 ` Hugo Mills
2014-08-04 10:31 ` Peter Waller
@ 2014-08-04 11:04 ` Clemens Eisserer
2014-08-04 11:32 ` Hugo Mills
1 sibling, 1 reply; 44+ messages in thread
From: Clemens Eisserer @ 2014-08-04 11:04 UTC (permalink / raw)
To: linux-btrfs

Hi Hugo,

> On the 3.15+ kernels, the block reserve is split out of metadata
> and reported separately. This helps with the following process:

Thanks a lot for pointing this out, I hadn't noticed this change until now.

One thing I didn't find any information about is the overhead
introduced by mixed-mode.
It would be great if you could explain it in a few sentences.

Thank you in advance, Clemens
* Re: ENOSPC with mkdir and rename
2014-08-04 11:04 ` Clemens Eisserer
@ 2014-08-04 11:32 ` Hugo Mills
2014-08-04 13:17 ` Peter Waller
0 siblings, 1 reply; 44+ messages in thread
From: Hugo Mills @ 2014-08-04 11:32 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-btrfs

On Mon, Aug 04, 2014 at 01:04:25PM +0200, Clemens Eisserer wrote:
> Hi Hugo,
>
> > On the 3.15+ kernels, the block reserve is split out of metadata
> > and reported separately. This helps with the following process:
>
> Thanks a lot for pointing this out, I hadn't noticed this change until now.
>
> One thing I didn't find any information about is the overhead
> introduced by mixed-mode.
> It would be great if you could explain it in a few sentences.

I don't know, I'm afraid. I don't think we've got any benchmarks on
the scale of the slowdown.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Reading Mein Kampf won't make you a Nazi. Reading Das Kapital ---
won't make you a communist. But most trolls started out
with a copy of Lord of the Rings.
* Re: ENOSPC with mkdir and rename
2014-08-04 11:32 ` Hugo Mills
@ 2014-08-04 13:17 ` Peter Waller
2014-08-04 13:35 ` Hugo Mills
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Peter Waller @ 2014-08-04 13:17 UTC (permalink / raw)
To: linux-btrfs

For anyone else having this problem, this article is fairly useful for
understanding disk full problems and rebalance:

http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html

It actually covers the problem that I had, which is that a rebalance
can't take place because it is full.

I still am unsure what is really wrong with this whole situation. Is
it that I wasn't careful to do a rebalance when I should have been
doing? Is it that BTRFS doesn't do a rebalance automatically when it
could in principle?

It's pretty bad to end up in a situation (with spare space) where the
only way out is to add more storage, which may be impractical,
difficult or expensive.

The other thing that I still don't understand I've seen repeated in a
few places, from the above article:

"because the filesystem is only 55% full, I can ask balance to rewrite
all chunks that are more than 55% full"

Then he uses `btrfs balance start -dusage=55 /mnt/btrfs_pool1`. I
don't understand the relationship between "the FS is 55% full" and
"chunks more than 55% full". What's going on here?

I conclude that now since I have added more storage, the rebalance
won't fail and if I keep rebalancing from a cron job I won't hit this
problem again (unless the filesystem fills up very fast! what then?).
I don't know however what value to assign to `-dusage` in general for
the cron rebalance. Any hints?
* Re: ENOSPC with mkdir and rename
2014-08-04 13:17 ` Peter Waller
@ 2014-08-04 13:35 ` Hugo Mills
2014-08-04 14:02 ` Austin S Hemmelgarn
2014-08-04 14:47 ` Russell Coker
2 siblings, 0 replies; 44+ messages in thread
From: Hugo Mills @ 2014-08-04 13:35 UTC (permalink / raw)
To: Peter Waller; +Cc: linux-btrfs

On Mon, Aug 04, 2014 at 02:17:02PM +0100, Peter Waller wrote:
> For anyone else having this problem, this article is fairly useful for
> understanding disk full problems and rebalance:
>
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>
> It actually covers the problem that I had, which is that a rebalance
> can't take place because it is full.
>
> I still am unsure what is really wrong with this whole situation. Is
> it that I wasn't careful to do a rebalance when I should have been
> doing? Is it that BTRFS doesn't do a rebalance automatically when it
> could in principle?

This latter one. Well, actually two things: the FS should be
capable of autonomously rebalancing at low bandwidth to prevent this
problem, but nobody's got round to implementing it yet. Secondly, it
should not be possible to get into a state where you can't run the
balance -- Josef spent about three kernel revisions fixing the block
reserve code to that end. However, since about 3.14, there have been
more cases like yours showing up, so I think there's been a
regression.

It's not very common, though. I think we've had maybe a dozen
reported instances in the last 6 months. Someone on IRC had it just
now, though, and captured a metadata image, so at least we've got some
(meta)data to work with now.

> It's pretty bad to end up in a situation (with spare space) where the
> only way out is to add more storage, which may be impractical,
> difficult or expensive.
>
> The other thing that I still don't understand I've seen repeated in a
> few places, from the above article:
>
> "because the filesystem is only 55% full, I can ask balance to rewrite
> all chunks that are more than 55% full"
>
> Then he uses `btrfs balance start -dusage=55 /mnt/btrfs_pool1`. I
> don't understand the relationship between "the FS is 55% full" and
> "chunks more than 55% full". What's going on here?

Pigeonhole principle -- if the FS is 55% full, there must be at
least one chunk <= 55% full.

> I conclude that now since I have added more storage, the rebalance
> won't fail and if I keep rebalancing from a cron job I won't hit this
> problem again (unless the filesystem fills up very fast! what then?).
> I don't know however what value to assign to `-dusage` in general for
> the cron rebalance. Any hints?

Try with increasing values until you've moved as many chunks as you
want to. This is what David's "balance at least N chunks" patch did.
I'd suggest start with 5, and go up in increments of 5, if you're
making it an automatic process. Stop when you reach some threshold
(like, say, 80), or when it reports that it's actually moved some
chunks. Doing it manually, I usually recommend 5, 10, 20, 50, 80.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, you don't get to be a kernel hacker simply by looking ---
good in Speedos. -- Rusty Russell
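The stepped approach suggested above could be scripted for a cron job roughly as follows. This is a sketch under assumptions: it expects the `btrfs balance start` summary to contain text of the form "relocate N out of M chunks" (check what your btrfs-progs version prints), it must run as root, and the function names are made up for the example. The `runner` parameter exists so the loop can be exercised without touching a real filesystem.

```python
import re
import subprocess

# Manual sequence suggested in the thread; an automated job could use
# range(5, 85, 5) instead.
THRESHOLDS = (5, 10, 20, 50, 80)


def run_balance(mountpoint, usage):
    """Run one filtered data balance and return the command's text output.

    Raises subprocess.CalledProcessError if the balance itself fails,
    for example with ENOSPC.
    """
    result = subprocess.run(
        ["btrfs", "balance", "start", f"-dusage={usage}", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def relocated_chunks(output):
    """Extract the relocated-chunk count from balance output, 0 if absent."""
    m = re.search(r"relocate (\d+) out of (\d+) chunks", output)
    return int(m.group(1)) if m else 0


def stepped_balance(mountpoint, thresholds=THRESHOLDS, runner=run_balance):
    """Raise the usage filter until a pass moves at least one chunk.

    Returns (usage, chunks_moved), or (None, 0) if no pass moved anything.
    """
    for usage in thresholds:
        moved = relocated_chunks(runner(mountpoint, usage))
        if moved > 0:
            return usage, moved
    return None, 0
```

Calling `stepped_balance("/path/to/volume")` stops at the first threshold that relocates something, which keeps each run cheap; a job that wants to reclaim more per run could continue past the first hit instead.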
* Re: ENOSPC with mkdir and rename
2014-08-04 13:17 ` Peter Waller
2014-08-04 13:35 ` Hugo Mills
@ 2014-08-04 14:02 ` Austin S Hemmelgarn
2014-08-04 14:11 ` Peter Waller
2014-08-04 14:47 ` Russell Coker
2 siblings, 1 reply; 44+ messages in thread
From: Austin S Hemmelgarn @ 2014-08-04 14:02 UTC (permalink / raw)
To: Peter Waller, linux-btrfs

On 2014-08-04 09:17, Peter Waller wrote:
> For anyone else having this problem, this article is fairly useful for
> understanding disk full problems and rebalance:
>
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>
> It actually covers the problem that I had, which is that a rebalance
> can't take place because it is full.
>
> I still am unsure what is really wrong with this whole situation. Is
> it that I wasn't careful to do a rebalance when I should have been
> doing? Is it that BTRFS doesn't do a rebalance automatically when it
> could in principle?
>
> It's pretty bad to end up in a situation (with spare space) where the
> only way out is to add more storage, which may be impractical,
> difficult or expensive.

I really disagree with the statement that adding more storage is
difficult or expensive. All you need to do is plug in a 2G USB flash
drive, or allocate a ramdisk, and add the device to the filesystem only
long enough to do a full balance.

> The other thing that I still don't understand I've seen repeated in a
> few places, from the above article:
>
> "because the filesystem is only 55% full, I can ask balance to rewrite
> all chunks that are more than 55% full"
>
> Then he uses `btrfs balance start -dusage=55 /mnt/btrfs_pool1`. I
> don't understand the relationship between "the FS is 55% full" and
> "chunks more than 55% full". What's going on here?

To understand this, you have to understand that BTRFS uses a two-level
allocation scheme. At the top level, you have chunks, which are
contiguous regions of the disk that get used for storing a specific
block type. Data chunks default to 1G in size; metadata chunks default
to 256M. When a filesystem is created, you get the minimum number of
chunks of each type based on the replication profiles chosen for each
chunk type; with no extra options, this means 1 data chunk and 2
metadata chunks for a single-disk filesystem.

Within each chunk, BTRFS then allocates and frees individual blocks on
demand; these blocks are the analogue of blocks in most other
filesystems. When there are no free blocks in any chunks of a given
type, BTRFS allocates new chunks of that type based on the replication
profile. Unlike blocks, however, chunks aren't freed automatically
(there are good reasons for this behavior, but they are kind of long to
explain here). This is where balance comes in: it takes all of the
blocks in the filesystem and sends them back through the block
allocator. This usually causes all of the free blocks to end up in a
single chunk, and frees the unneeded chunks.

When someone talks about a chunk being x% full, they mean that x% of
the space in that chunk is used by allocated blocks. Talking about how
full the filesystem is can get tricky because of the replication
profiles, but the usual consensus is to treat that as the percentage of
the filesystem that contains blocks that are being used.

It should say LESS than 55% full in the various articles, as the
-dusage=x option tells balance to only consider chunks that are less
than x% full for balancing. In general, if your filesystem is totally
full, you should use numbers starting with 0 and work your way up from
there. You may even get lucky: using -dusage=0 -musage=0 may free up
enough chunks that you don't need to add more storage.

> I conclude that now since I have added more storage, the rebalance
> won't fail and if I keep rebalancing from a cron job I won't hit this
> problem again (unless the filesystem fills up very fast! what then?).
> I don't know however what value to assign to `-dusage` in general for
> the cron rebalance. Any hints?

I've found that something between 25 and 50 tends to do well; much
outside of that range and you start to get diminishing returns. The
exact value tends to be more personal preference. I use 25 on most of
my systems, because I don't like saturating the disks with I/O for very
long. Do make sure, however, to add -musage=x as well; metadata should
also be balanced (especially if you have very large numbers of small
files).
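The two-level scheme and the usage filter can be made concrete with a toy model. The sketch below is not btrfs code: `Chunk`, `balance_candidates` and `chunks_freed` are hypothetical names, all chunks are assumed to be the same size, and the filter uses `<=` so that a threshold of 0 selects completely empty chunks (the real filter's boundary behaviour is not modelled). It shows which chunks a given threshold would select, how many chunks a repack of those could free, and why a filesystem that is N% full must contain at least one chunk that is at most N% full.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    size_mib: int  # space reserved for this chunk
    used_mib: int  # space inside the chunk held by allocated blocks

    @property
    def usage_percent(self):
        return 100.0 * self.used_mib / self.size_mib


def filesystem_usage_percent(chunks):
    """Used space as a percentage of all space reserved by chunks."""
    return 100.0 * sum(c.used_mib for c in chunks) / sum(c.size_mib for c in chunks)


def balance_candidates(chunks, usage):
    """Chunks a usage filter with threshold `usage` would pick (toy rule)."""
    return [c for c in chunks if c.usage_percent <= usage]


def chunks_freed(chunks, usage, chunk_size_mib=1024):
    """Chunks released if the selected chunks are repacked as tightly as possible."""
    selected = balance_candidates(chunks, usage)
    repacked = math.ceil(sum(c.used_mib for c in selected) / chunk_size_mib)
    return len(selected) - repacked


chunks = [Chunk(1024, 1000), Chunk(1024, 200), Chunk(1024, 0), Chunk(1024, 1024)]
overall = filesystem_usage_percent(chunks)

# Pigeonhole: the emptiest chunk can never be fuller than the overall average.
assert min(c.usage_percent for c in chunks) <= overall

print(f"filesystem is {overall:.1f}% full")
for threshold in (0, 25, 55):
    print(threshold, len(balance_candidates(chunks, threshold)),
          chunks_freed(chunks, threshold))
```

Low thresholds touch only nearly empty chunks, so they move little data per chunk freed, which matches the advice to start low and work upwards.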
* Re: ENOSPC with mkdir and rename
2014-08-04 14:02 ` Austin S Hemmelgarn
@ 2014-08-04 14:11 ` Peter Waller
2014-08-04 14:26 ` Austin S Hemmelgarn
0 siblings, 1 reply; 44+ messages in thread
From: Peter Waller @ 2014-08-04 14:11 UTC (permalink / raw)
To: Austin S Hemmelgarn; +Cc: linux-btrfs

On 4 August 2014 15:02, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
> I really disagree with the statement that adding more storage is
> difficult or expensive, all you need to do is plug in a 2G USB flash
> drive, or allocate a ramdisk, and add the device to the filesystem only
> long enough to do a full balance.

What if the machine is a server in a datacenter you don't have
physical access to and the problem is an emergency preventing your
users from being able to get work done?

What happens if you use a RAM disk and there is a power failure?

Thanks for the other explanations and advice also,

- Peter
* Re: ENOSPC with mkdir and rename
2014-08-04 14:11 ` Peter Waller
@ 2014-08-04 14:26 ` Austin S Hemmelgarn
0 siblings, 0 replies; 44+ messages in thread
From: Austin S Hemmelgarn @ 2014-08-04 14:26 UTC (permalink / raw)
To: Peter Waller; +Cc: linux-btrfs

On 2014-08-04 10:11, Peter Waller wrote:
> On 4 August 2014 15:02, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>> I really disagree with the statement that adding more storage is
>> difficult or expensive, all you need to do is plug in a 2G USB flash
>> drive, or allocate a ramdisk, and add the device to the filesystem only
>> long enough to do a full balance.
>
> What if the machine is a server in a datacenter you don't have
> physical access to and the problem is an emergency preventing your
> users from being able to get work done?
>
> What happens if you use a RAM disk and there is a power failure?

I'm not saying that either option is a perfect solution. In fact, the
only reason that I even mentioned the ramdisk is because I have had
good success with that on my laptop, but then laptops essentially have
a built-in UPS. I personally wouldn't use a ramdisk except as a last
resort if you don't have some sort of UPS or redundancy in the PSU.
* Re: ENOSPC with mkdir and rename
2014-08-04 13:17 ` Peter Waller
2014-08-04 13:35 ` Hugo Mills
2014-08-04 14:02 ` Austin S Hemmelgarn
@ 2014-08-04 14:47 ` Russell Coker
2014-08-04 15:19 ` Mitch Harder
2 siblings, 1 reply; 44+ messages in thread
From: Russell Coker @ 2014-08-04 14:47 UTC (permalink / raw)
To: Peter Waller; +Cc: linux-btrfs

On Mon, 4 Aug 2014 14:17:02 Peter Waller wrote:
> For anyone else having this problem, this article is fairly useful for
> understanding disk full problems and rebalance:
>
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>
> It actually covers the problem that I had, which is that a rebalance
> can't take place because it is full.
>
> I still am unsure what is really wrong with this whole situation. Is
> it that I wasn't careful to do a rebalance when I should have been
> doing? Is it that BTRFS doesn't do a rebalance automatically when it
> could in principle?

Yes and yes. The fact that BTRFS can't avoid getting into such
situations and can't recover when it does are both bugs in BTRFS. The
fact that you didn't run a balance to prevent this is due to not being
careful enough with a filesystem that's still in a development stage.

> It's pretty bad to end up in a situation (with spare space) where the
> only way out is to add more storage, which may be impractical,
> difficult or expensive.

Absolutely.

> I conclude that now since I have added more storage, the rebalance
> won't fail and if I keep rebalancing from a cron job I won't hit this
> problem again

Yes.

> (unless the filesystem fills up very fast! what then?).
> I don't know however what value to assign to `-dusage` in general for
> the cron rebalance. Any hints?

If you regularly run a scrub with options such as "-dusage=50
-musage=10" then the amount of free space in metadata chunks will tend
to be a lot greater than that in data chunks.

Another option I've considered is to write a program that creates
millions of files with 1000 byte random file names. After creating a
filesystem I could run that program to cause a sufficient number of
metadata chunks to be allocated and then remove the subvol containing
all those files (which incidentally is a lot faster than "rm -rf").

Another thing I've considered is making a filesystem for a file server
with a RAID-1 array of SSDs and running the above program to allocate
all chunks for metadata. Then when the SSDs are totally assigned to
metadata I would add a pair of SATA disks for data. A filesystem with
all metadata on SSD and all data on SATA disks should give great
performance as well as having lots of space.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: ENOSPC with mkdir and rename
2014-08-04 14:47 ` Russell Coker
@ 2014-08-04 15:19 ` Mitch Harder
0 siblings, 0 replies; 44+ messages in thread
From: Mitch Harder @ 2014-08-04 15:19 UTC (permalink / raw)
To: russell; +Cc: Peter Waller, linux-btrfs

On Mon, Aug 4, 2014 at 9:47 AM, Russell Coker <russell@coker.com.au> wrote:
> If you regularly run a scrub with options such as "-dusage=50 -musage=10" then
> the amount of free space in metadata chunks will tend to be a lot greater than
> that in data chunks.

Just to clarify for posterity, I'm pretty sure you meant 'balance'
with "-dusage=50 -musage=10" instead of 'scrub'.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:09 ` Peter Waller
2014-08-04 10:22 ` Hugo Mills
@ 2014-08-04 10:50 ` Chris Samuel
2014-08-04 10:59 ` Peter Waller
2014-08-10 17:26 ` Martin Steigerwald
2 siblings, 1 reply; 44+ messages in thread
From: Chris Samuel @ 2014-08-04 10:50 UTC (permalink / raw)
To: linux-btrfs

On Mon, 4 Aug 2014 11:09:23 AM Peter Waller wrote:
> I accept that. It's all very well if you read the BTRFS list and/or
> are a BTRFS developer. But if you're trying to work it out in the heat
> of battle, as we have sysadmins who would have to, there is a
> combination of things here that makes it unreasonable and harmful for
> production.

To be honest I'm not sure I'd suggest btrfs for production use at all
at present, it's only recently been unmarked as experimental and to be
honest I feel that was premature. :-(

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: ENOSPC with mkdir and rename
2014-08-04 10:50 ` Chris Samuel
@ 2014-08-04 10:59 ` Peter Waller
2014-08-04 21:27 ` Chris Samuel
0 siblings, 1 reply; 44+ messages in thread
From: Peter Waller @ 2014-08-04 10:59 UTC (permalink / raw)
To: Chris Samuel; +Cc: linux-btrfs

On 4 August 2014 11:50, Chris Samuel <chris@csamuel.org> wrote:
> To be honest I'm not sure I'd suggest btrfs for production use at all at
> present, it's only recently been unmarked as experimental and to be honest I
> feel that was premature. :-(

Thanks for the honest answer.

There are very positive signals out there which I had perhaps taken
too literally. I'd love to see it become ready, there are a lot of
things about BTRFS which appeal greatly. So I hope I'm helping by
trying to make it clear the problems that I encountered.
* Re: ENOSPC with mkdir and rename
2014-08-04 10:59 ` Peter Waller
@ 2014-08-04 21:27 ` Chris Samuel
0 siblings, 0 replies; 44+ messages in thread
From: Chris Samuel @ 2014-08-04 21:27 UTC (permalink / raw)
To: linux-btrfs

Hi Peter,

On Mon, 4 Aug 2014 11:59:19 AM Peter Waller wrote:
> On 4 August 2014 11:50, Chris Samuel <chris@csamuel.org> wrote:
> > To be honest I'm not sure I'd suggest btrfs for production use at all at
> > present, it's only recently been unmarked as experimental and to be honest
> > I feel that was premature.
>
> Thanks for the honest answer.

That's OK, I am enthusiastic about btrfs (being a pre-mainline merge
user), but I don't think it serves it well to signal that it's more
ready than it actually is.

> There are very positive signals out there which I had perhaps taken
> too literally. I'd love to see it become ready, there are a lot of things
> about BTRFS which appeal greatly. So I hope I'm helping by trying
> to make it clear the problems that I encountered.

Oh indeed, please don't take my brief replies as being anything other
than brief due to preparing to do some astronomy! :-)

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: ENOSPC with mkdir and rename
2014-08-04 10:09 ` Peter Waller
2014-08-04 10:22 ` Hugo Mills
2014-08-04 10:50 ` Chris Samuel
@ 2014-08-10 17:26 ` Martin Steigerwald
2 siblings, 0 replies; 44+ messages in thread
From: Martin Steigerwald @ 2014-08-10 17:26 UTC (permalink / raw)
To: Peter Waller; +Cc: Chris Samuel, linux-btrfs

On Monday, 4 August 2014, 11:09:23, Peter Waller wrote:
> On 4 August 2014 10:39, Chris Samuel <chris@csamuel.org> wrote:
> > On Mon, 4 Aug 2014 09:14:19 AM Peter Waller wrote:
> >> All of this is *very* surprising.
> >
> > Hmm, it shouldn't be, the ENOSPC issues are well known and have been
> > discussed here for years.
>
> I accept that. It's all very well if you read the BTRFS list and/or
> are a BTRFS developer. But if you're trying to work it out in the heat
> of battle, as we have sysadmins who would have to, there is a
> combination of things here that makes it unreasonable and harmful for
> production.

Well, maybe, just maybe... BTRFS is not yet ready for production use.

I installed it on a server recently, my own VM, and I expect that I may
need to fix things up there. I did it with a *huge* free space margin,
and I am still running the Debian backport kernel 3.14. I won't change
it to 3.16 until I have seen that it runs nicely on my laptop again.

Test it on non-critical servers: yes.

Use it on critical production servers? My answer is a no for this,
despite partial support in SLES 11 SP 2 and support (partial?) for it
in Oracle Unbreakable Linux.

BTRFS is just not yet there, from my current experience with it.

One thing that will usually keep you covered: make the filesystem five
times as large as the data you put on it, and monitor it for situations
where you had better rebalance in advance.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
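As a rough idea of what such monitoring could look at, the sketch below parses text in the `btrfs fi df` format shown in the first message of this thread and reports, per block-group type, how full the allocated chunks are and how much allocated-but-unused space they hold. The helper names (`parse_fi_df`, `fill_ratio`, `slack_bytes`) are invented for this example, the regular expression is fitted only to the sample output quoted in this thread and may not match other btrfs-progs versions, and what counts as "time to rebalance" is left to the reader.

```python
import re

# Unit suffixes seen in the sample output; a bare "0.00" carries no suffix.
_UNITS = {"": 1, "B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

_LINE = re.compile(
    r"^\s*(?P<kind>[^,]+), (?P<profile>[^:]+): "
    r"total=(?P<total>[\d.]+)(?P<total_unit>[A-Za-z]*), "
    r"used=(?P<used>[\d.]+)(?P<used_unit>[A-Za-z]*)\s*$"
)


def parse_fi_df(text):
    """Turn `btrfs fi df` style lines into dicts with sizes in bytes."""
    rows = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # skip anything that is not a block-group line
        rows.append({
            "kind": m["kind"],
            "profile": m["profile"],
            "total": float(m["total"]) * _UNITS[m["total_unit"]],
            "used": float(m["used"]) * _UNITS[m["used_unit"]],
        })
    return rows


def fill_ratio(rows, kind):
    """Fraction of the space allocated to `kind` chunks that is in use."""
    total = sum(r["total"] for r in rows if r["kind"] == kind)
    used = sum(r["used"] for r in rows if r["kind"] == kind)
    return used / total if total else 0.0


def slack_bytes(rows, kind):
    """Allocated-but-unused bytes in `kind` chunks (what a balance may reclaim)."""
    return sum(r["total"] - r["used"] for r in rows if r["kind"] == kind)


SAMPLE = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB
"""

rows = parse_fi_df(SAMPLE)
print(f"metadata fill: {fill_ratio(rows, 'Metadata'):.1%}")
print(f"data slack:    {slack_bytes(rows, 'Data') / 2**30:.2f} GiB")
```

Nearly full metadata chunks combined with a lot of slack in data chunks is the pattern reported at the start of this thread, so a cron job could watch these two numbers together.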
* Re: ENOSPC with mkdir and rename
2014-08-04 8:14 ` Peter Waller
2014-08-04 9:22 ` Clemens Eisserer
2014-08-04 9:39 ` Chris Samuel
@ 2014-08-05 8:51 ` Qu Wenruo
2014-08-05 12:17 ` Russell Coker
2 siblings, 1 reply; 44+ messages in thread
From: Qu Wenruo @ 2014-08-05 8:51 UTC (permalink / raw)
To: Peter Waller; +Cc: linux-btrfs

-------- Original Message --------
Subject: Re: ENOSPC with mkdir and rename
From: Peter Waller <peter@scraperwiki.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2014-08-04 16:14
> Thanks for responses.
>
> All of this is *very* surprising. I'm not new to BTRFS, I've been
> using it on my own machines for multiple years. I didn't realise there
> was an un-holstered footgun on my lap at this point. How can it be
> made clear how to avoid the ENOSPC problem to myself and other
> sysadmins? Or preferably not exist as a problem?
[snip]

In fact such a "defect" (or whatever you call it) is not really a
btrfs-only problem. ext* has similar behavior: it has an upper limit on
the number of inodes, fixed at mkfs time. (When you run mkfs.ext*, you
are shown the upper limit of inodes.) However, other metadata in ext*
is stored together with data, so there is no ENOSPC problem like in
btrfs.

Btrfs only makes ENOSPC more likely because it completely splits data
and metadata, and keeps an extra reserve for metadata. If you like the
ext* way, as already mentioned you can run mkfs.btrfs with the -M flag.

But IMO, some tuning of the btrfs chunk allocation algorithm may help.
For example, say we have a 20G disk, and 14G of space is allocated to
data/metadata chunks. In such a situation, if btrfs needs a new data
chunk, it will allocate up to 10% of the disk, which is 2G. But when it
comes to metadata, it will only allocate a metadata chunk of up to
256M. This makes it very easy for all of the remaining space to be
allocated to data chunks.

But if btrfs can use the free space in a more careful way when space is
running low, metadata and data usage should be more balanced and fewer
ENOSPC errors will occur.

If nobody dislikes the idea, I'd like to try to implement this later.

Thanks,
Qu
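The arithmetic in Qu's example can be replayed with a small toy model. It is only a sketch of the numbers quoted in the message (a 20G disk, 14G already allocated, data chunks of up to 10% of the disk, metadata chunks of up to 256M); the function `simulate_allocation` is hypothetical, it assumes a request is granted whatever remains when less than a full chunk is left, and it does not model the real btrfs allocator.

```python
MIB_PER_GIB = 1024


def simulate_allocation(disk_mib, allocated_mib, requests,
                        data_percent=10, metadata_chunk_mib=256):
    """Replay chunk requests ("data" or "metadata") against unallocated space.

    Returns (granted, free): MiB granted per type and MiB left unallocated.
    """
    free = disk_mib - allocated_mib
    data_chunk_mib = disk_mib * data_percent // 100
    granted = {"data": 0, "metadata": 0}
    for kind in requests:
        want = data_chunk_mib if kind == "data" else metadata_chunk_mib
        got = min(want, free)  # toy rule: take whatever is left
        free -= got
        granted[kind] += got
    return granted, free


disk = 20 * MIB_PER_GIB
allocated = 14 * MIB_PER_GIB
requests = ["data", "metadata", "data", "data", "metadata"]

granted, free = simulate_allocation(disk, allocated, requests)
print(granted, free)
```

With this request sequence the three data requests consume almost all of the 6G that was unallocated, and the final metadata request gets nothing, which is the imbalance the message describes.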
* Re: ENOSPC with mkdir and rename
2014-08-05 8:51 ` Qu Wenruo
@ 2014-08-05 12:17 ` Russell Coker
0 siblings, 0 replies; 44+ messages in thread
From: Russell Coker @ 2014-08-05 12:17 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs

On Tue, 5 Aug 2014 16:51:44 Qu Wenruo wrote:
> In fact such a "defect" (or whatever you call it) is not really a
> btrfs-only problem. ext* has similar behavior: it has an upper limit
> on the number of inodes, fixed at mkfs time.
> (When you run mkfs.ext*, you are shown the upper limit of inodes.)
> However, other metadata in ext* is stored together with data, so
> there is no ENOSPC problem like in btrfs.

There is a huge difference between BTRFS and Ext* in this regard.

The way that Ext* has always worked is that if you delete one file,
pipe or socket that isn't hard-linked, or one sym-link or directory,
then you free up 1 inode. 1 free inode allows you to create 1 file,
pipe, socket, sym-link, or directory.

Deleting a file or directory on BTRFS takes MORE metadata space (at
least temporarily) because it writes a new copy of the tree. So not
only will deleting files not immediately solve a lack of metadata space
on BTRFS, but it might even make things worse.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/