* enospace regression in 4.4
From: Julian Taylor @ 2016-04-12 10:24 UTC
To: linux-btrfs
hi,
I have a system with two filesystems, both affected by the
notorious enospc bug that strikes even when plenty of unallocated space is
available. The system has a raid0 across two 900 GiB disks and an iSCSI
single/dup volume of 1.4 TiB.
To deal with the problem I use a cronjob that uses fallocate to give me
advance notice of the issue, so I can apply the only workaround that
works for me, which is to shrink the fs to the minimum and grow it again.
This has worked fine for a couple of months.
I have now updated from 4.2 to 4.4.6, and it appears my cronjob actually
triggers an immediate enospc in the balance after removing the
fallocated file, and the shrink/resize workaround does not work anymore.
The filesystem is mounted with enospc_debug, but that just says "2 enospc in
balance". There is nothing else useful in the log.
I had to revert to 4.2 to get the system running again, so it is
currently not available for more testing, but I may be able to do more
tests in the future if required.
The cronjob does this once a day:
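#!/bin/bash
sync

check() {
    date
    mnt=$1
    # light metadata and data balance before probing
    time btrfs fi balance start -mlimit=2 "$mnt"
    btrfs fi balance start -dusage=5 "$mnt"
    sync
    # free space in bytes minus a 50 GiB margin
    freespace=$(df -B1 "$mnt" | tail -n 1 | awk '{print $4 - 50*1024*1024*1024}')
    # probe: allocate almost all of the reported free space, then release it
    fallocate -l "$freespace" "$mnt/falloc"
    /usr/sbin/filefrag "$mnt/falloc"
    rm -f "$mnt/falloc"
    # drop the data chunks left empty by the probe file
    btrfs fi balance start -dusage=0 "$mnt"
    time btrfs fi balance start -mlimit=2 "$mnt"
    time btrfs fi balance start -dlimit=10 "$mnt"
    date
}

check /data
check /data/nas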
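The size handed to fallocate in the cronjob is the free space reported by df minus a 50 GiB margin. A minimal sketch of that arithmetic as a standalone helper follows; the name probe_size and the guard against a non-positive result are assumptions added for illustration and are not part of the original script.

```shell
# Hypothetical helper: compute the probe file size in bytes.
# $1 = available bytes (column 4 of `df -B1`), $2 = margin in GiB (default 50).
probe_size() {
    local avail margin_gib size
    avail=$1
    margin_gib=${2:-50}
    size=$(( avail - margin_gib * 1024 * 1024 * 1024 ))
    # Fail without printing anything if the margin exceeds the free space.
    if [ "$size" -le 0 ]; then
        return 1
    fi
    echo "$size"
}

# Example: 100 GiB available leaves a 50 GiB probe file (53687091200 bytes).
probe_size $(( 100 * 1024 * 1024 * 1024 ))
```

With such a guard the script could skip the fallocate step instead of passing a negative length when less than 50 GiB is free.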
btrfs info:
~ $ btrfs --version
btrfs-progs v4.4
sagan5 ~ $ sudo btrfs fi show
Label: none uuid: e4aef349-7a56-4287-93b1-79233e016aae
Total devices 2 FS bytes used 898.18GiB
devid 1 size 880.00GiB used 473.03GiB path /dev/mapper/data-linear1
devid 2 size 880.00GiB used 473.03GiB path /dev/mapper/data-linear2
Label: none uuid: 14040f9b-53c8-46cf-be6b-35de746c3153
Total devices 1 FS bytes used 557.19GiB
devid 1 size 1.36TiB used 585.95GiB path /dev/sdd
~ $ sudo btrfs fi df /data
Data, RAID0: total=938.00GiB, used=895.09GiB
System, RAID1: total=32.00MiB, used=112.00KiB
Metadata, RAID1: total=4.00GiB, used=3.10GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
sagan5 ~ $ sudo btrfs fi usage /data
Overall:
Device size: 1.72TiB
Device allocated: 946.06GiB
Device unallocated: 813.94GiB
Device missing: 0.00B
Used: 901.27GiB
Free (estimated): 856.85GiB (min: 449.88GiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Data,RAID0: Size:938.00GiB, Used:895.09GiB
/dev/dm-1 469.00GiB
/dev/mapper/data-linear1 469.00GiB
Metadata,RAID1: Size:4.00GiB, Used:3.09GiB
/dev/dm-1 4.00GiB
/dev/mapper/data-linear1 4.00GiB
System,RAID1: Size:32.00MiB, Used:112.00KiB
/dev/dm-1 32.00MiB
/dev/mapper/data-linear1 32.00MiB
Unallocated:
/dev/dm-1 406.97GiB
/dev/mapper/data-linear1 406.97GiB
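The "Free (estimated)" figures in the usage output above appear consistent with a simple sum: the unallocated space plus the free space inside already allocated data chunks. The "min" figure matches the same sum with the unallocated space halved. The sketch below reproduces both numbers from the values shown; the halving rule is inferred from these numbers only and is an assumption, not something taken from the btrfs-progs source, and free_estimate is a hypothetical name.

```shell
# Hypothetical helper: recompute the "Free (estimated)" line.
# $1 = unallocated GiB, $2 = data chunk size GiB, $3 = data used GiB.
free_estimate() {
    awk -v u="$1" -v s="$2" -v d="$3" 'BEGIN {
        in_chunks = s - d                    # free space inside data chunks
        printf "estimated: %.2fGiB\n", u + in_chunks      # data ratio 1.00
        printf "min: %.2fGiB\n", u / 2 + in_chunks        # assumed worst-case ratio 2
    }'
}

# Values copied from the `btrfs fi usage /data` output above.
free_estimate 813.94 938.00 895.09
```

This prints 856.85GiB and 449.88GiB, the same values btrfs reported, which suggests that almost all of the "free" space here is unallocated rather than sitting inside existing chunks.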
* Re: enospace regression in 4.4
From: Julian Taylor @ 2016-04-12 15:52 UTC
To: linux-btrfs
Here is a smaller testcase that shows the immediate enospc after fallocate -> rm,
though I don't know if it is really related to the full filesystem
bugging out, as the balance does work if you wait a few seconds after the
rm.
But this sequence of commands did work in 4.2.
$ sudo btrfs fi show /dev/mapper/lvm-testing
Label: none uuid: 25889ba9-a957-415a-83b0-e34a62cb3212
Total devices 1 FS bytes used 225.18MiB
devid 1 size 5.00GiB used 788.00MiB path /dev/mapper/lvm-testing
$ fallocate -l 4.4G test.dat
$ rm -f test.dat
$ sudo btrfs fi balance start -dusage=0 .
ERROR: error during balancing '.': No space left on device
There may be more info in syslog - try dmesg | tail
* Re: enospace regression in 4.4
From: Henk Slager @ 2016-04-12 18:09 UTC
To: Julian Taylor; +Cc: linux-btrfs
On Tue, Apr 12, 2016 at 5:52 PM, Julian Taylor
<jtaylor.debian@googlemail.com> wrote:
> smaller testcase that shows the immediate enospc after fallocate -> rm,
> though I don't know if it is really related to the full filesystem
> bugging out as the balance does work if you wait a few seconds after the
> balance.
> But this sequence of commands did work in 4.2.
>
> $ sudo btrfs fi show /dev/mapper/lvm-testing
> Label: none uuid: 25889ba9-a957-415a-83b0-e34a62cb3212
> Total devices 1 FS bytes used 225.18MiB
> devid 1 size 5.00GiB used 788.00MiB path /dev/mapper/lvm-testing
>
> $ fallocate -l 4.4G test.dat
> $ rm -f test.dat
> $ sudo btrfs fi balance start -dusage=0 .
> ERROR: error during balancing '.': No space left on device
> There may be more info in syslog - try dmesg | tail
It seems that kernel 4.4.6 waits longer before deallocating empty
chunks, so the balance kicks in at a time when the 5 GiB is still
completely filled with chunks. As balance needs unallocated space (at
device level; how much depends on the profiles), this error can be
expected.
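If that explanation holds, a script could check how much space is unallocated before it starts a balance. The sketch below reads the "Device unallocated" line of `btrfs filesystem usage -b` (raw byte output). The helper names and the 1 GiB default threshold are assumptions made for illustration, and on a multi-device filesystem the per-device figures would matter rather than this overall total.

```shell
# Hypothetical helper: extract the "Device unallocated" value (in bytes)
# from `btrfs filesystem usage -b` output read on stdin.
unallocated_bytes() {
    awk '/Device unallocated:/ { print $3 }'
}

# Hypothetical guard: succeed only if at least $2 bytes are unallocated
# on the filesystem mounted at $1 (default 1 GiB, an arbitrary threshold).
has_unallocated() {
    local mnt min have
    mnt=$1
    min=${2:-1073741824}
    have=$(btrfs filesystem usage -b "$mnt" | unallocated_bytes)
    [ -n "$have" ] && [ "$have" -ge "$min" ]
}

# Usage (not run here): only balance when there is room to relocate chunks.
# has_unallocated /data && btrfs fi balance start -dusage=0 /data
```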
> On 04/12/2016 12:24 PM, Julian Taylor wrote:
>> hi,
>> I have a system with two filesystems which are both affected by the
>> notorious enospace bug when there is plenty of unallocated space
>> available. The system is a raid0 on two 900 GiB disks and an iscsi
>> single/dup 1.4TiB.
>> To deal with the problem I use a cronjob that uses fallocate to give me
>> an advance notice on the issue so I can apply the only workaround that
>> works for me, which is shrink the fs to the minimum and grow it again.
>> This has worked fine for a couple of month.
>>
>> I now updated from 4.2 to 4.4.6 and it appears my cronjob actually
>> triggers an immediate enospc in the balance after removing the
>> fallocated file and the shrink/resize workaround does not work anymore.
The filesystem itself is not resized AFAIU, correct?
>> it is mounted with enospc_debug but that just says "2 enospc in
>> balance". Nothing else useful in the log.
>>
>> I had to revert back to 4.2 to get the system running again so it is
>> currently not available for more testing, but I may be able to do more
>> tests if required in future.
>>
>> The cronjob does this once a day:
>>
>> #!/bin/bash
>> sync
>>
>> check() {
>> date
>> mnt=$1
>> time btrfs fi balance start -mlimit=2 $mnt
>> btrfs fi balance start -dusage=5 $mnt
>> sync
>> freespace=$(df -B1 $mnt | tail -n 1 | awk '{print $4 -
>> 50*1024*1024*1024}')
>> fallocate -l $freespace $mnt/falloc
>> /usr/sbin/filefrag $mnt/falloc
>> rm -f $mnt/falloc
>> btrfs fi balance start -dusage=0 $mnt
See my comment on the smaller test. Maybe you could put a delay longer
than the commit time before this balance, to give the kernel itself
the chance to clean up empty chunks.
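One way to sketch such a delay is to retry the balance for a bounded time instead of sleeping for a fixed period. The helper name retry_until and the 60 s / 5 s values are assumptions chosen for illustration; they only need to exceed the commit interval, which defaults to 30 seconds unless the commit= mount option changes it.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# $1 = timeout in seconds, $2 = pause between attempts, rest = the command.
retry_until() {
    local timeout pause waited
    timeout=$1
    pause=$2
    shift 2
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$pause"
        waited=$(( waited + pause ))
    done
}

# Usage (not run here): give the kernel up to a minute to clean up the
# chunks emptied by removing the probe file.
# rm -f "$mnt/falloc"
# retry_until 60 5 btrfs fi balance start -dusage=0 "$mnt"
```

A plain `sleep 35` before the balance would be simpler; retrying has the advantage that the script continues as soon as the balance succeeds.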
>> time btrfs fi balance start -mlimit=2 $mnt
>> time btrfs fi balance start -dlimit=10 $mnt
>> date
>> }
>>
>> check /data
>> check /data/nas
It could be that now, with kernel 4.4.6 or newer, the original enospc
(so not the ones due to balances) does not pop up anymore. That would
mean the cronjob workaround itself creates a problem now. Can you give
some background on what other (types of) enospc occurred in the past,
and whether this was with the 4.2 kernel or older?
You could shrink a filesystem by a few GiB (without changing the
size of the underlying device), so that once it really gets filled up
and hits enospc, you can resize to max again and delete files or snapshots
or something. Of course this is no option for a 24/7 unattended system, but
maybe for a client laptop as a test.
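The reserve idea above can be sketched with `btrfs filesystem resize`, which accepts a relative size such as -5G as well as the keyword max. The wrapper names and the DRY_RUN switch are assumptions added so the commands can be inspected without touching a filesystem; on a multi-device filesystem the size would also need a devid prefix.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hold back $2 GiB of the filesystem mounted at $1 as an emergency reserve.
hold_reserve() {
    run btrfs filesystem resize "-${2}G" "$1"
}

# Give the reserve back by growing the filesystem to the full device size.
release_reserve() {
    run btrfs filesystem resize max "$1"
}

# Show what would be executed.
DRY_RUN=1 hold_reserve /data 5
DRY_RUN=1 release_reserve /data
```

The dry run prints `btrfs filesystem resize -5G /data` followed by `btrfs filesystem resize max /data`.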
* Re: enospace regression in 4.4
From: Julian Taylor @ 2016-04-12 19:01 UTC
To: linux-btrfs
On 12.04.2016 20:09, Henk Slager wrote:
> On Tue, Apr 12, 2016 at 5:52 PM, Julian Taylor
> <jtaylor.debian@googlemail.com> wrote:
>> smaller testcase that shows the immediate enospc after fallocate -> rm,
>> though I don't know if it is really related to the full filesystem
>> bugging out as the balance does work if you wait a few seconds after the
>> balance.
>> But this sequence of commands did work in 4.2.
>>
>> $ sudo btrfs fi show /dev/mapper/lvm-testing
>> Label: none uuid: 25889ba9-a957-415a-83b0-e34a62cb3212
>> Total devices 1 FS bytes used 225.18MiB
>> devid 1 size 5.00GiB used 788.00MiB path /dev/mapper/lvm-testing
>>
>> $ fallocate -l 4.4G test.dat
>> $ rm -f test.dat
>> $ sudo btrfs fi balance start -dusage=0 .
>> ERROR: error during balancing '.': No space left on device
>> There may be more info in syslog - try dmesg | tail
>
> It seems that kernel 4.4.6 waits longer with de-allocating empty
> chunks and the balance kicks in at a time when the 5 GiB is still
> completely filled with chunks. As balance needs unallocated space (on
> device level, how much depends on profiles), this error can be
> expected.
Hm, ok, I'll put a sleep in the script then.
fallocate; rm; fallocate seems to work, so it's probably ok in normal usage.
>
>> On 04/12/2016 12:24 PM, Julian Taylor wrote:
>>> hi,
>>> I have a system with two filesystems which are both affected by the
>>> notorious enospace bug when there is plenty of unallocated space
>>> available. The system is a raid0 on two 900 GiB disks and an iscsi
>>> single/dup 1.4TiB.
>>> To deal with the problem I use a cronjob that uses fallocate to give me
>>> an advance notice on the issue so I can apply the only workaround that
>>> works for me, which is shrink the fs to the minimum and grow it again.
>>> This has worked fine for a couple of month.
>>>
>>> I now updated from 4.2 to 4.4.6 and it appears my cronjob actually
>>> triggers an immediate enospc in the balance after removing the
>>> fallocated file and the shrink/resize workaround does not work anymore.
>
> The filesystem itself is not resized AFAIU, correct?
btrfs resize -XG /mount
So I resize the filesystem but not the underlying device.
Actually, the system just went into enospc again with unallocated space free,
even after the revert to 4.2, and the shrink trick doesn't want to work
anymore either ...
Though the 4.2 running now is not the same one on which the shrink workaround
worked. I'll have to check the changelog to see if there are btrfs-related
changes in it.
>
> You could shrink a file-system by a few GiB's (without changing the
> size of the underlying device), so that once it really gets filled up
> and hits enospc, you resize to max again and delete files or snapshot
> or something. Of course no option for a 24/7 unattended system, but
> maybe for a client laptop as testing.
>
That is basically what I have been doing: I used the cronjob to see when
the enospc issue occurred and then shrank and regrew the filesystem to fix it.
It was relatively rare; I had to do it maybe every two months.
But now for some reason that trick doesn't work anymore either. I can
shrink it by 200G and resize it back to max and it still complains about
no free space. So now I'm at a loss as to how to keep this system working.
* Re: enospace regression in 4.4
From: Duncan @ 2016-04-13 3:13 UTC
To: linux-btrfs
Julian Taylor posted on Tue, 12 Apr 2016 17:52:57 +0200 as excerpted:
> $ sudo btrfs fi balance start -dusage=0 .
> ERROR: error during balancing '.': No space left on device
Not much to add, but this one really surprises me and it may be related
to the new problem you're seeing.
I don't recall ever seeing a -dusage=0 actually error out due to ENOSPC
before. It normally either works, killing some empty chunks, or runs
without error but also without finding any empty chunks to kill, thus
"doing nothing, successfully" (to borrow the one-line name and
description for true (1)).
That even a balance with -dusage=0 is actually failing, not just
completing without doing anything as might be expected, is strange
indeed. With a bit of luck that's a strong hint to the devs as to what
has actually gone wrong and how to fix it.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: enospace regression in 4.4
From: Henk Slager @ 2016-04-13 11:56 UTC
To: linux-btrfs
On Tue, Apr 12, 2016 at 5:52 PM, Julian Taylor
<jtaylor.debian@googlemail.com> wrote:
> smaller testcase that shows the immediate enospc after fallocate -> rm,
> though I don't know if it is really related to the full filesystem
> bugging out as the balance does work if you wait a few seconds after the
> balance.
> But this sequence of commands did work in 4.2.
>
> $ sudo btrfs fi show /dev/mapper/lvm-testing
> Label: none uuid: 25889ba9-a957-415a-83b0-e34a62cb3212
> Total devices 1 FS bytes used 225.18MiB
> devid 1 size 5.00GiB used 788.00MiB path /dev/mapper/lvm-testing
>
> $ fallocate -l 4.4G test.dat
> $ rm -f test.dat
> $ sudo btrfs fi balance start -dusage=0 .
> ERROR: error during balancing '.': No space left on device
> There may be more info in syslog - try dmesg | tail
The effect is the same with kernel v4.6.0-rc3 and progs v4.5.1.
It also doesn't matter whether test.dat is created with
fallocate -l 4400M test.dat or with
dd if=/dev/zero of=test.dat bs=1M count=4400
(I was looking at the --dig-holes and --punch-hole options earlier and was
wondering if the use of fallocate would make a difference).