Metadata balance fails ENOSPC

linux-btrfs.vger.kernel.org archive mirror

* Metadata balance fails ENOSPC
@ 2016-11-30 21:03 Stefan Priebe - Profihost AG
  2016-11-30 23:02 ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-11-30 21:03 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

Hello,

# btrfs balance start -v -dusage=0 -musage=1 /ssddisk/
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
  METADATA (flags 0x2): balancing, usage=1
  SYSTEM (flags 0x2): balancing, usage=1
ERROR: error during balancing '/ssddisk/': No space left on device
There may be more info in syslog - try dmesg | tail

# btrfs filesystem show /ssddisk/
Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
        Total devices 1 FS bytes used 305.67GiB
        devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1

# btrfs filesystem usage /ssddisk/
Overall:
    Device size:                 500.00GiB
    Device allocated:            500.00GiB
    Device unallocated:            1.05MiB
    Device missing:                  0.00B
    Used:                        305.69GiB
    Free (estimated):            185.78GiB      (min: 185.78GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 608.00KiB)

Data,single: Size:483.97GiB, Used:298.18GiB
   /dev/vdb1     483.97GiB

Metadata,single: Size:16.00GiB, Used:7.51GiB
   /dev/vdb1      16.00GiB

System,single: Size:32.00MiB, Used:144.00KiB
   /dev/vdb1      32.00MiB

Unallocated:
   /dev/vdb1       1.05MiB

How can I get balance working again?

Greets,
Stefan


* Re: Metadata balance fails ENOSPC
  2016-11-30 21:03 Metadata balance fails ENOSPC Stefan Priebe - Profihost AG
@ 2016-11-30 23:02 ` Chris Murphy
  2016-12-01  7:49   ` Stefan Priebe - Profihost AG
  2016-12-01  8:18   ` Duncan
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2016-11-30 23:02 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org

On Wed, Nov 30, 2016 at 2:03 PM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> Hello,
>
> # btrfs balance start -v -dusage=0 -musage=1 /ssddisk/
> Dumping filters: flags 0x7, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=0
>   METADATA (flags 0x2): balancing, usage=1
>   SYSTEM (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/ssddisk/': No space left on device
> There may be more info in syslog - try dmesg | tail

You haven't provided kernel messages at the time of the error.

Also useful is the kernel version.



>
> # btrfs filesystem show /ssddisk/
> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>         Total devices 1 FS bytes used 305.67GiB
>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>
> # btrfs filesystem usage /ssddisk/
> Overall:
>     Device size:                 500.00GiB
>     Device allocated:            500.00GiB
>     Device unallocated:            1.05MiB

Drive is actually fully allocated so if Btrfs needs to create a new
chunk right now, it can't. However,



>
> Data,single: Size:483.97GiB, Used:298.18GiB
>    /dev/vdb1     483.97GiB
>
> Metadata,single: Size:16.00GiB, Used:7.51GiB
>    /dev/vdb1      16.00GiB
>
> System,single: Size:32.00MiB, Used:144.00KiB
>    /dev/vdb1      32.00MiB

All three chunk types have quite a bit of unused space in them, so
it's unclear why there's a no space left error.
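
To make the difference between "allocated" and "used" concrete, the figures quoted above can be worked through in a short sketch. The dictionary layout and the helper name are purely illustrative (they are not a btrfs API); the numbers are copied from the `btrfs filesystem usage` output in this thread.

```python
# Figures copied from the quoted `btrfs filesystem usage` output, in GiB.
# The structure and names here are illustrative, not part of any btrfs API.
MIB_PER_GIB = 1024

chunks_gib = {
    "Data": {"size": 483.97, "used": 298.18},
    "Metadata": {"size": 16.00, "used": 7.51},
}
unallocated_mib = 1.05  # "Device unallocated" line


def slack_gib(kind):
    """Space already allocated to chunks of this type but not yet used."""
    c = chunks_gib[kind]
    return c["size"] - c["used"]


for kind in chunks_gib:
    print(f"{kind:9s} free inside existing chunks: {slack_gib(kind):7.2f} GiB")
print(f"Unallocated, available for new chunks: {unallocated_mib:.2f} MiB "
      f"({unallocated_mib / MIB_PER_GIB:.4f} GiB)")
```

The data figure comes out at about 185.79GiB, which is close to the 185.78GiB reported as "Free (estimated)" in the original post. Almost all of the free space therefore sits inside chunks that are already allocated, while only about 1MiB remains from which a new chunk could be created.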

Try remounting with enospc_debug, and then trigger the problem again,
and post the resulting kernel messages.


-- 
Chris Murphy


* Re: Metadata balance fails ENOSPC
  2016-11-30 23:02 ` Chris Murphy
@ 2016-12-01  7:49   ` Stefan Priebe - Profihost AG
  2016-12-01  8:12     ` Andrei Borzenkov
  2016-12-01  8:18   ` Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-12-01  7:49 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs@vger.kernel.org

On 01.12.2016 at 00:02, Chris Murphy wrote:
> On Wed, Nov 30, 2016 at 2:03 PM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> Hello,
>>
>> # btrfs balance start -v -dusage=0 -musage=1 /ssddisk/
>> Dumping filters: flags 0x7, state 0x0, force is off
>>   DATA (flags 0x2): balancing, usage=0
>>   METADATA (flags 0x2): balancing, usage=1
>>   SYSTEM (flags 0x2): balancing, usage=1
>> ERROR: error during balancing '/ssddisk/': No space left on device
>> There may be more info in syslog - try dmesg | tail
> 
> You haven't provided kernel messages at the time of the error.

Kernel Message:
[  429.107723] BTRFS info (device vdb1): 1 enospc errors during balance

> Also useful is the kernel version.

Custom 4.4 kernel with patches up to 4.10. But I have already tried 4.9-rc7,
which behaves the same.


>> # btrfs filesystem show /ssddisk/
>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>         Total devices 1 FS bytes used 305.67GiB
>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>
>> # btrfs filesystem usage /ssddisk/
>> Overall:
>>     Device size:                 500.00GiB
>>     Device allocated:            500.00GiB
>>     Device unallocated:            1.05MiB
> 
> Drive is actually fully allocated so if Btrfs needs to create a new
> chunk right now, it can't. However,

Yes, but there's a lot of free space:
    Free (estimated):            193.46GiB      (min: 193.46GiB)

How does this match?


> All three chunk types have quite a bit of unused space in them, so
> it's unclear why there's a no space left error.
> 
> Try remounting with enoscp_debug, and then trigger the problem again,
> and post the resulting kernel messages.

With enospc_debug it says:
[39193.425682] BTRFS warning (device vdb1): no space to allocate a new chunk for block group 839941881856
[39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance

Greets,
Stefan


* Re: Metadata balance fails ENOSPC
  2016-12-01  7:49   ` Stefan Priebe - Profihost AG
@ 2016-12-01  8:12     ` Andrei Borzenkov
  2016-12-01 11:55       ` Stefan Priebe - Profihost AG
  2016-12-01 13:51       ` Hans van Kranenburg
  0 siblings, 2 replies; 14+ messages in thread
From: Andrei Borzenkov @ 2016-12-01  8:12 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Chris Murphy, linux-btrfs@vger.kernel.org

On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
...
>
> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
> which does the same.
>
>
>>> # btrfs filesystem show /ssddisk/
>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>         Total devices 1 FS bytes used 305.67GiB
>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>
>>> # btrfs filesystem usage /ssddisk/
>>> Overall:
>>>     Device size:                 500.00GiB
>>>     Device allocated:            500.00GiB
>>>     Device unallocated:            1.05MiB
>>
>> Drive is actually fully allocated so if Btrfs needs to create a new
>> chunk right now, it can't. However,
>
> Yes but there's lot of free space:
>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>
> How does this match?
>
>
>> All three chunk types have quite a bit of unused space in them, so
>> it's unclear why there's a no space left error.
>>

I remember a discussion that balance always tries to pre-allocate one
chunk in advance, and I believe there was a patch to correct it, but I am
not sure whether it was merged.

>> Try remounting with enoscp_debug, and then trigger the problem again,
>> and post the resulting kernel messages.
>
> With enospc debug it says:
> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
> chunk for block group 839941881856
> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
>


* Re: Metadata balance fails ENOSPC
  2016-12-01  8:12     ` Andrei Borzenkov
@ 2016-12-01 11:55       ` Stefan Priebe - Profihost AG
  2016-12-01 13:32         ` E V
  2016-12-01 13:51       ` Hans van Kranenburg
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-12-01 11:55 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, linux-btrfs@vger.kernel.org


On 01.12.2016 at 09:12, Andrei Borzenkov wrote:
> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
> ...
>>
>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>> which does the same.
>>
>>
>>>> # btrfs filesystem show /ssddisk/
>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>
>>>> # btrfs filesystem usage /ssddisk/
>>>> Overall:
>>>>     Device size:                 500.00GiB
>>>>     Device allocated:            500.00GiB
>>>>     Device unallocated:            1.05MiB
>>>
>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>> chunk right now, it can't. However,
>>
>> Yes but there's lot of free space:
>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>
>> How does this match?
>>
>>
>>> All three chunk types have quite a bit of unused space in them, so
>>> it's unclear why there's a no space left error.
>>>
> 
> I remember discussion that balance always tries to pre-allocate one
> chunk in advance, and I believe there was patch to correct it but I am
> not sure whether it was merged.

Otherwise, is there a way to turn the free space back into unallocated space?

Stefan

> 
>>> Try remounting with enoscp_debug, and then trigger the problem again,
>>> and post the resulting kernel messages.
>>
>> With enospc debug it says:
>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>> chunk for block group 839941881856
>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
>>


* Re: Metadata balance fails ENOSPC
  2016-12-01 11:55       ` Stefan Priebe - Profihost AG
@ 2016-12-01 13:32         ` E V
  0 siblings, 0 replies; 14+ messages in thread
From: E V @ 2016-12-01 13:32 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Andrei Borzenkov, Chris Murphy, linux-btrfs@vger.kernel.org

I've frequently seen free space cache corruption lead to phantom
ENOSPC. You could try clearing the space cache, and/or mounting with
nospace_cache.

On Thu, Dec 1, 2016 at 6:55 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> Am 01.12.2016 um 09:12 schrieb Andrei Borzenkov:
>> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
>> <s.priebe@profihost.ag> wrote:
>> ...
>>>
>>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>>> which does the same.
>>>
>>>
>>>>> # btrfs filesystem show /ssddisk/
>>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>>
>>>>> # btrfs filesystem usage /ssddisk/
>>>>> Overall:
>>>>>     Device size:                 500.00GiB
>>>>>     Device allocated:            500.00GiB
>>>>>     Device unallocated:            1.05MiB
>>>>
>>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>>> chunk right now, it can't. However,
>>>
>>> Yes but there's lot of free space:
>>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>>
>>> How does this match?
>>>
>>>
>>>> All three chunk types have quite a bit of unused space in them, so
>>>> it's unclear why there's a no space left error.
>>>>
>>
>> I remember discussion that balance always tries to pre-allocate one
>> chunk in advance, and I believe there was patch to correct it but I am
>> not sure whether it was merged.
>
> Is there otherwise a possibility to make the free space unallocated again?
>
> Stefan
>
>>
>>>> Try remounting with enoscp_debug, and then trigger the problem again,
>>>> and post the resulting kernel messages.
>>>
>>> With enospc debug it says:
>>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>>> chunk for block group 839941881856
>>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Metadata balance fails ENOSPC
  2016-12-01  8:12     ` Andrei Borzenkov
  2016-12-01 11:55       ` Stefan Priebe - Profihost AG
@ 2016-12-01 13:51       ` Hans van Kranenburg
  2016-12-01 14:10         ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 14+ messages in thread
From: Hans van Kranenburg @ 2016-12-01 13:51 UTC (permalink / raw)
  To: Andrei Borzenkov, Stefan Priebe - Profihost AG
  Cc: Chris Murphy, linux-btrfs@vger.kernel.org

On 12/01/2016 09:12 AM, Andrei Borzenkov wrote:
> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
> ...
>>
>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>> which does the same.
>>
>>
>>>> # btrfs filesystem show /ssddisk/
>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>
>>>> # btrfs filesystem usage /ssddisk/
>>>> Overall:
>>>>     Device size:                 500.00GiB
>>>>     Device allocated:            500.00GiB
>>>>     Device unallocated:            1.05MiB
>>>
>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>> chunk right now, it can't. However,
>>
>> Yes but there's lot of free space:
>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>
>> How does this match?
>>
>>
>>> All three chunk types have quite a bit of unused space in them, so
>>> it's unclear why there's a no space left error.
>>>
> 
> I remember discussion that balance always tries to pre-allocate one
> chunk in advance, and I believe there was patch to correct it but I am
> not sure whether it was merged.

http://www.spinics.net/lists/linux-btrfs/msg56772.html

>>> Try remounting with enoscp_debug, and then trigger the problem again,
>>> and post the resulting kernel messages.
>>
>> With enospc debug it says:
>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>> chunk for block group 839941881856
>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance

-- 
Hans van Kranenburg


* Re: Metadata balance fails ENOSPC
  2016-12-01 13:51       ` Hans van Kranenburg
@ 2016-12-01 14:10         ` Stefan Priebe - Profihost AG
  2016-12-01 15:48           ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-12-01 14:10 UTC (permalink / raw)
  To: Hans van Kranenburg, Andrei Borzenkov
  Cc: Chris Murphy, linux-btrfs@vger.kernel.org


On 01.12.2016 at 14:51, Hans van Kranenburg wrote:
> On 12/01/2016 09:12 AM, Andrei Borzenkov wrote:
>> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
>> <s.priebe@profihost.ag> wrote:
>> ...
>>>
>>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>>> which does the same.
>>>
>>>
>>>>> # btrfs filesystem show /ssddisk/
>>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>>
>>>>> # btrfs filesystem usage /ssddisk/
>>>>> Overall:
>>>>>     Device size:                 500.00GiB
>>>>>     Device allocated:            500.00GiB
>>>>>     Device unallocated:            1.05MiB
>>>>
>>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>>> chunk right now, it can't. However,
>>>
>>> Yes but there's lot of free space:
>>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>>
>>> How does this match?
>>>
>>>
>>>> All three chunk types have quite a bit of unused space in them, so
>>>> it's unclear why there's a no space left error.
>>>>
>>
>> I remember discussion that balance always tries to pre-allocate one
>> chunk in advance, and I believe there was patch to correct it but I am
>> not sure whether it was merged.
> 
> http://www.spinics.net/lists/linux-btrfs/msg56772.html

Thanks - I still don't understand why that one is not upstream or why it
was reverted. It looks absolutely reasonable to me. The other option would be
a way to make allocated but unused space unallocated again - no
idea how to do that.

> 
>>>> Try remounting with enoscp_debug, and then trigger the problem again,
>>>> and post the resulting kernel messages.
>>>
>>> With enospc debug it says:
>>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>>> chunk for block group 839941881856
>>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
> 


* Re: Metadata balance fails ENOSPC
  2016-12-01 14:10         ` Stefan Priebe - Profihost AG
@ 2016-12-01 15:48           ` Chris Murphy
  2016-12-01 18:43             ` Stefan Priebe - Profihost AG
  2016-12-03  4:43             ` Andrei Borzenkov
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2016-12-01 15:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Hans van Kranenburg, Andrei Borzenkov, Chris Murphy,
	linux-btrfs@vger.kernel.org

On Thu, Dec 1, 2016 at 7:10 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> Am 01.12.2016 um 14:51 schrieb Hans van Kranenburg:
>> On 12/01/2016 09:12 AM, Andrei Borzenkov wrote:
>>> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@profihost.ag> wrote:
>>> ...
>>>>
>>>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>>>> which does the same.
>>>>
>>>>
>>>>>> # btrfs filesystem show /ssddisk/
>>>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>>>
>>>>>> # btrfs filesystem usage /ssddisk/
>>>>>> Overall:
>>>>>>     Device size:                 500.00GiB
>>>>>>     Device allocated:            500.00GiB
>>>>>>     Device unallocated:            1.05MiB
>>>>>
>>>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>>>> chunk right now, it can't. However,
>>>>
>>>> Yes but there's lot of free space:
>>>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>>>
>>>> How does this match?
>>>>
>>>>
>>>>> All three chunk types have quite a bit of unused space in them, so
>>>>> it's unclear why there's a no space left error.
>>>>>
>>>
>>> I remember discussion that balance always tries to pre-allocate one
>>> chunk in advance, and I believe there was patch to correct it but I am
>>> not sure whether it was merged.
>>
>> http://www.spinics.net/lists/linux-btrfs/msg56772.html
>
> Thanks - still don't understand why that one is not upstream or why it
> was reverted. Looks absolutely reasonable to me.

It is upstream and hasn't been reverted.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/volumes.c?id=refs/tags/v4.8.11
line 3650

I would try Duncan's idea of using just one filter and seeing what happens:

'btrfs balance start -dusage=1 <mp>'


>>>> With enospc debug it says:
>>>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>>>> chunk for block group 839941881856
>>>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance

It might be nice if this stated what kind of chunk it's trying to allocate.



-- 
Chris Murphy


* Re: Metadata balance fails ENOSPC
  2016-12-01 15:48           ` Chris Murphy
@ 2016-12-01 18:43             ` Stefan Priebe - Profihost AG
  2016-12-03  4:43             ` Andrei Borzenkov
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-12-01 18:43 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Hans van Kranenburg, Andrei Borzenkov,
	linux-btrfs@vger.kernel.org


On 01.12.2016 at 16:48, Chris Murphy wrote:
> On Thu, Dec 1, 2016 at 7:10 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>> Am 01.12.2016 um 14:51 schrieb Hans van Kranenburg:
>>> On 12/01/2016 09:12 AM, Andrei Borzenkov wrote:
>>>> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe@profihost.ag> wrote:
>>>> ...
>>>>>
>>>>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>>>>> which does the same.
>>>>>
>>>>>
>>>>>>> # btrfs filesystem show /ssddisk/
>>>>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>>>>
>>>>>>> # btrfs filesystem usage /ssddisk/
>>>>>>> Overall:
>>>>>>>     Device size:                 500.00GiB
>>>>>>>     Device allocated:            500.00GiB
>>>>>>>     Device unallocated:            1.05MiB
>>>>>>
>>>>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>>>>> chunk right now, it can't. However,
>>>>>
>>>>> Yes but there's lot of free space:
>>>>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>>>>
>>>>> How does this match?
>>>>>
>>>>>
>>>>>> All three chunk types have quite a bit of unused space in them, so
>>>>>> it's unclear why there's a no space left error.
>>>>>>
>>>>
>>>> I remember discussion that balance always tries to pre-allocate one
>>>> chunk in advance, and I believe there was patch to correct it but I am
>>>> not sure whether it was merged.
>>>
>>> http://www.spinics.net/lists/linux-btrfs/msg56772.html
>>
>> Thanks - still don't understand why that one is not upstream or why it
>> was reverted. Looks absolutely reasonable to me.
> 
> It is upstream and hasn't been reverted.
> 
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/volumes.c?id=refs/tags/v4.8.11
> line 3650
> 
> I would try Duncan's idea of using just one filter and seeing what happens:
> 
> 'btrfs balance start -dusage=1 <mp>'

See below:

[zabbix-db ~]# btrfs balance start -dusage=1 /ssddisk/
Done, had to relocate 0 out of 505 chunks
[zabbix-db ~]# btrfs balance start -dusage=10 /ssddisk/
Done, had to relocate 0 out of 505 chunks
[zabbix-db ~]# btrfs balance start -musage=1 /ssddisk/
ERROR: error during balancing '/ssddisk/': No space left on device
There may be more info in syslog - try dmesg | tail
[zabbix-db ~]# dmesg
[78306.288834] BTRFS warning (device vdb1): no space to allocate a new chunk for block group 839941881856
[78306.289197] BTRFS info (device vdb1): 1 enospc errors during balance
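
Both data runs above relocated 0 chunks, which suggests that no data chunk is at or below 10% usage. One possible next step, offered only as a sketch and not something proposed in the messages above, is to raise the data usage threshold in stages. The helper below only builds the command lines and does not run anything; the function name and the thresholds 25 and 50 are assumptions, while `-dusage=` and `-musage=` are the filters already used in this thread.

```python
def balance_commands(mountpoint, kind, thresholds):
    """Build (but do not run) one balance command per usage threshold.

    kind is "data" or "metadata"; thresholds are percentages.
    """
    flag = {"data": "-dusage", "metadata": "-musage"}[kind]
    return [
        ["btrfs", "balance", "start", f"{flag}={pct}", mountpoint]
        for pct in thresholds
    ]


# Hypothetical escalation: start with the values already tried, then go higher.
for cmd in balance_commands("/ssddisk/", "data", [1, 10, 25, 50]):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` one at a time, stopping as soon as a run reports relocated chunks and `btrfs filesystem usage` shows unallocated space again.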

> 
> 
>>>>> With enospc debug it says:
>>>>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>>>>> chunk for block group 839941881856
>>>>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
> 
> It might be nice if this stated what kind of chunk it's trying to allocate.
> 
> 
> 


* Re: Metadata balance fails ENOSPC
  2016-12-01 15:48           ` Chris Murphy
  2016-12-01 18:43             ` Stefan Priebe - Profihost AG
@ 2016-12-03  4:43             ` Andrei Borzenkov
  2016-12-05 11:12               ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 14+ messages in thread
From: Andrei Borzenkov @ 2016-12-03  4:43 UTC (permalink / raw)
  To: Chris Murphy, Stefan Priebe - Profihost AG
  Cc: Hans van Kranenburg, linux-btrfs@vger.kernel.org

01.12.2016 18:48, Chris Murphy wrote:
> On Thu, Dec 1, 2016 at 7:10 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>> Am 01.12.2016 um 14:51 schrieb Hans van Kranenburg:
>>> On 12/01/2016 09:12 AM, Andrei Borzenkov wrote:
>>>> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe@profihost.ag> wrote:
>>>> ...
>>>>>
>>>>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>>>>> which does the same.
>>>>>
>>>>>
>>>>>>> # btrfs filesystem show /ssddisk/
>>>>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>>>>
>>>>>>> # btrfs filesystem usage /ssddisk/
>>>>>>> Overall:
>>>>>>>     Device size:                 500.00GiB
>>>>>>>     Device allocated:            500.00GiB
>>>>>>>     Device unallocated:            1.05MiB
>>>>>>
>>>>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>>>>> chunk right now, it can't. However,
>>>>>
>>>>> Yes but there's lot of free space:
>>>>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>>>>
>>>>> How does this match?
>>>>>
>>>>>
>>>>>> All three chunk types have quite a bit of unused space in them, so
>>>>>> it's unclear why there's a no space left error.
>>>>>>
>>>>
>>>> I remember discussion that balance always tries to pre-allocate one
>>>> chunk in advance, and I believe there was patch to correct it but I am
>>>> not sure whether it was merged.
>>>
>>> http://www.spinics.net/lists/linux-btrfs/msg56772.html
>>
>> Thanks - still don't understand why that one is not upstream or why it
>> was reverted. Looks absolutely reasonable to me.
> 
> It is upstream and hasn't been reverted.
> 
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/volumes.c?id=refs/tags/v4.8.11
> line 3650
> 
> I would try Duncan's idea of using just one filter and seeing what happens:
> 
> 'btrfs balance start -dusage=1 <mp>'
> 

Actually, I just hit exactly the same symptoms on my VM, where the device was
fully allocated and metadata balance failed, but data balance succeeded
in freeing up space, which then allowed metadata balance to run too. This is
under 4.8.10.

So it appears that the balance logic for data and metadata is somehow
different.

As this VM gets into the 100% allocated condition fairly often, I'll try to get
a better understanding next time.


> 
>>>>> With enospc debug it says:
>>>>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>>>>> chunk for block group 839941881856
>>>>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
> 
> It might be nice if this stated what kind of chunk it's trying to allocate.
> 
> 
> 



* Re: Metadata balance fails ENOSPC
  2016-12-03  4:43             ` Andrei Borzenkov
@ 2016-12-05 11:12               ` Stefan Priebe - Profihost AG
  2016-12-05 11:51                 ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-12-05 11:12 UTC (permalink / raw)
  To: Andrei Borzenkov, Chris Murphy
  Cc: Hans van Kranenburg, linux-btrfs@vger.kernel.org

Isn't there a way to turn free space back into unallocated space?


On 03.12.2016 at 05:43, Andrei Borzenkov wrote:
> 01.12.2016 18:48, Chris Murphy пишет:
>> On Thu, Dec 1, 2016 at 7:10 AM, Stefan Priebe - Profihost AG
>> <s.priebe@profihost.ag> wrote:
>>>
>>> Am 01.12.2016 um 14:51 schrieb Hans van Kranenburg:
>>>> On 12/01/2016 09:12 AM, Andrei Borzenkov wrote:
>>>>> On Thu, Dec 1, 2016 at 10:49 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@profihost.ag> wrote:
>>>>> ...
>>>>>>
>>>>>> Custom 4.4 kernel with patches up to 4.10. But i already tried 4.9-rc7
>>>>>> which does the same.
>>>>>>
>>>>>>
>>>>>>>> # btrfs filesystem show /ssddisk/
>>>>>>>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>>>>>>>         Total devices 1 FS bytes used 305.67GiB
>>>>>>>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1
>>>>>>>>
>>>>>>>> # btrfs filesystem usage /ssddisk/
>>>>>>>> Overall:
>>>>>>>>     Device size:                 500.00GiB
>>>>>>>>     Device allocated:            500.00GiB
>>>>>>>>     Device unallocated:            1.05MiB
>>>>>>>
>>>>>>> Drive is actually fully allocated so if Btrfs needs to create a new
>>>>>>> chunk right now, it can't. However,
>>>>>>
>>>>>> Yes but there's lot of free space:
>>>>>>     Free (estimated):            193.46GiB      (min: 193.46GiB)
>>>>>>
>>>>>> How does this match?
>>>>>>
>>>>>>
>>>>>>> All three chunk types have quite a bit of unused space in them, so
>>>>>>> it's unclear why there's a no space left error.
>>>>>>>
>>>>>
>>>>> I remember discussion that balance always tries to pre-allocate one
>>>>> chunk in advance, and I believe there was patch to correct it but I am
>>>>> not sure whether it was merged.
>>>>
>>>> http://www.spinics.net/lists/linux-btrfs/msg56772.html
>>>
>>> Thanks - still don't understand why that one is not upstream or why it
>>> was reverted. Looks absolutely reasonable to me.
>>
>> It is upstream and hasn't been reverted.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/volumes.c?id=refs/tags/v4.8.11
>> line 3650
>>
>> I would try Duncan's idea of using just one filter and seeing what happens:
>>
>> 'btrfs balance start -dusage=1 <mp>'
>>
> 
> Actually I just hit exactly the same symptoms on my VM where device was
> fully allocated and metadata balance failed, but data balance succeeded
> to free up space which allowed metadata balance to run too. This is
> under 4.8.10.
> 
> So it appears that balance logic between data and metadata is somehow
> different.
> 
> As this VM gets in 100% allocated condition fairly often I'd try to get
> better understanding next time.
> 
> 
>>
>>>>>> With enospc debug it says:
>>>>>> [39193.425682] BTRFS warning (device vdb1): no space to allocate a new
>>>>>> chunk for block group 839941881856
>>>>>> [39193.426033] BTRFS info (device vdb1): 1 enospc errors during balance
>>
>> It might be nice if this stated what kind of chunk it's trying to allocate.
>>
>>
>>
> 


* Re: Metadata balance fails ENOSPC
  2016-12-05 11:12               ` Stefan Priebe - Profihost AG
@ 2016-12-05 11:51                 ` Duncan
  0 siblings, 0 replies; 14+ messages in thread
From: Duncan @ 2016-12-05 11:51 UTC (permalink / raw)
  To: linux-btrfs

Stefan Priebe - Profihost AG posted on Mon, 05 Dec 2016 12:12:12 +0100 as
excerpted:

> isn't there a way to move free space to unallocated space again?

Yes, btrfs balance, but...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Metadata balance fails ENOSPC
  2016-11-30 23:02 ` Chris Murphy
  2016-12-01  7:49   ` Stefan Priebe - Profihost AG
@ 2016-12-01  8:18   ` Duncan
  1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2016-12-01  8:18 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Wed, 30 Nov 2016 16:02:29 -0700 as excerpted:

> On Wed, Nov 30, 2016 at 2:03 PM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> Hello,
>>
>> # btrfs balance start -v -dusage=0 -musage=1 /ssddisk/
>> Dumping filters: flags 0x7, state 0x0, force is off
>>   DATA (flags 0x2): balancing, usage=0
>>   METADATA (flags 0x2): balancing, usage=1
>>   SYSTEM (flags 0x2): balancing, usage=1
>> ERROR: error during balancing '/ssddisk/': No space left on device
>> There may be more info in syslog - try dmesg | tail
> 
> You haven't provided kernel messages at the time of the error.
> 
> Also useful is the kernel version.

I won't disagree here as often it's kernel-version-specific behavior in 
question, but in this case I think the behavior is generic and the 
question can thus be answered on that basis, without the kernel version 
or dmesg output.

@ Chris: Note that the ENOSPC wasn't during ordinary use, but 
/specifically/ during balance, which behaves a bit differently regarding 
ENOSPC, and I believe it's that version-generic behavior difference 
that's in focus, here.

>> # btrfs filesystem show /ssddisk/
>> Label: none  uuid: a69d2e90-c2ca-4589-9876-234446868adc
>>         Total devices 1 FS bytes used 305.67GiB
>>         devid    1 size 500.00GiB used 500.00GiB path /dev/vdb1

Device line says 100% used (meaning allocated).  The below simply shows 
it a different way, confirming the 100% used.

>> # btrfs filesystem usage /ssddisk/
>> Overall:
>>     Device size:                 500.00GiB
>>     Device allocated:            500.00GiB
>>     Device unallocated:            1.05MiB
> 
> Drive is actually fully allocated so if Btrfs needs to create a new
> chunk right now, it can't.

... And that right there is the problem.

When doing chunk consolidation, with one exception noted below, btrfs 
balance creates new chunks to write into, then rewrites the content from 
the old into the new.  But there's no unallocated space left to allocate 
new chunks from (1 MiB isn't enough), so balance errors out with ENOSPC.
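
A quick way to check for this condition before starting a balance is to 
look at the unallocated figure.  The sketch below is only an 
illustration: the helper names are made up, and it assumes the 
"Device unallocated:" line of `btrfs filesystem usage -b` (raw bytes) 
looks the same as in the output quoted above, just without the unit 
suffix.

```shell
# Hypothetical helpers, not part of btrfs-progs.
# Read `btrfs filesystem usage -b <mnt>` output on stdin and print the
# "Device unallocated" value in bytes.
unallocated_bytes() {
    awk -F: '/Device unallocated:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Succeed only if stdin reports at least $1 bytes unallocated,
# e.g. 268435456 for one 256 MiB metadata chunk.
has_room_for_chunk() {
    need=$1
    have=$(unallocated_bytes)
    [ -n "$have" ] && [ "$have" -ge "$need" ]
}
```

Usage would be something like 
`btrfs filesystem usage -b /ssddisk/ | has_room_for_chunk 268435456`, 
which on the filesystem above (about 1 MiB unallocated) would fail.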

>> Data,single: Size:483.97GiB, Used:298.18GiB
>>    /dev/vdb1     483.97GiB
>>
>> Metadata,single: Size:16.00GiB, Used:7.51GiB
>>    /dev/vdb1      16.00GiB
>>
>> System,single: Size:32.00MiB, Used:144.00KiB
>>    /dev/vdb1      32.00MiB
> 
> All three chunk types have quite a bit of unused space in them, so it's
> unclear why there's a no space left error.

Normal usage can still write into the existing chunks since they're not 
yet entirely full, but that's not where the error occurred.  There's no 
space left unallocated to allocate further chunks from, and with one 
single exception, balance must first allocate a new chunk in order to 
write into it, so it errors out.

The one single exception is when there's actually nothing to rewrite, the 
usage=0 case, in which case balance will simply erase any entirely empty 
chunks of the appropriate type (-d=data, -m=metadata).

This _used_ to be required somewhat regularly, as the kernel knew how to 
allocate new chunks but couldn't deallocate chunks, even entirely empty 
chunks, without a balance.  However, since 3.18 (IIRC), the kernel has 
been able to deallocate entirely empty chunks entirely on its own 
(automatically), and does so reasonably regularly in normal usage, so the 
issue of lingering empty chunks is far rarer than it used to be.

But apparently there's still a bug or two somewhere, as we still get 
reports of the usage=0 filter actually deallocating some empty chunks 
back to unallocated, even on kernels that should be doing that 
automatically.  It's not as common as it once was, but it does still 
happen.

So the usage=0 filter is the only case where the kernel doesn't have to 
create a new chunk in order to clear space during a balance, because it's 
not actually writing a new chunk, only deleting an empty one.  It still 
makes sense to try, because sometimes it _does_ work, and in the 100% 
allocated case it's the simplest thing to try.  It's worth trying even 
tho there's a good chance it won't work, because the kernel is 
/supposed/ to be removing those chunks automatically now, and /usually/ 
does just that.

OK, so what was wrong with the above command, and what should be tried 
instead?

The above command used TWO filters, -dusage=0 -musage=1 .  It choked on 
the -musage=1, apparently because it tried a less-than-1%-full but not 
/entirely/ empty metadata chunk first, before trying data chunks with the 
-dusage=0, which should have succeeded, even if it found no empty data 
chunks to remove.

So the fix is to try either -dusage=0 -musage=0 together, first, or to 
try -dusage=0 by itself first (and possibly -musage=0 after that), before 
trying -musage=1.
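
As a rough sketch of that ordering (the function name is made up for 
illustration, and by default the commands are only printed, not run):

```shell
# Dry run by default: commands are echoed instead of executed.
# Set BTRFS=btrfs (and run as root) to execute them for real.
BTRFS=${BTRFS:-echo btrfs}

# Hypothetical helper: remove entirely empty chunks first, since that
# is the one balance case that needs no new chunk allocation.
reclaim_empty_chunks() {
    mnt=$1
    # Entirely empty data chunks.
    $BTRFS balance start -dusage=0 "$mnt" || return 1
    # Entirely empty metadata chunks.
    $BTRFS balance start -musage=0 "$mnt" || return 1
}
```

Calling `reclaim_empty_chunks /ssddisk/` prints the two balance commands 
in the order they'd be run.  Only after both succeed does it make sense 
to move on to -musage=1.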

If it works and there are empty chunks of either type that can be 
removed, hopefully that will free up enough unallocated space to create 
at least one more metadata chunk (it'd be two with dup metadata, but here 
it's single so just one, typically 256 MiB tho it can be larger).  The 
-musage= value can then be slowly incremented as necessary until there's 
enough space unallocated to work more freely once again.
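
The incrementing could be sketched as a loop like the one below.  The 
function name and the percentage steps (1 2 5 10 20) are arbitrary 
choices for illustration, not recommended values, and the commands are 
only printed unless BTRFS is overridden.

```shell
# Dry run by default: commands are echoed instead of executed.
# Set BTRFS=btrfs (and run as root) to execute them for real.
BTRFS=${BTRFS:-echo btrfs}

# Hypothetical helper: raise the metadata usage filter step by step,
# stopping at the first balance that fails (e.g. with ENOSPC).
step_metadata_balance() {
    mnt=$1
    for pct in 1 2 5 10 20; do
        $BTRFS balance start -musage="$pct" "$mnt" || return 1
    done
}
```

In practice you'd check `btrfs filesystem usage` between steps and stop 
as soon as enough space shows up as unallocated.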

The reason to tackle metadata first, once the usage=0 filters have 
cleared the entirely empty chunks out, is that metadata chunks are 
typically only 256 MiB, while data chunks are nominally 1 GiB in size.  
So if the usage=0 filters clear out more than 256 MiB but under a GiB, 
space will still be tight enough to only do metadata, but hopefully doing 
it will clear even more space (it should given the numbers, but you may 
have to increment the usage= some, first), GiBs worth, so then data can 
be done as well.

If neither -dusage=0 nor -musage=0 clears anything, as may well be the 
case if the kernel is indeed clearing the empty chunks as it should, then 
it's time for more drastic measures.

Since you still have free space in both data and metadata, you're not in 
/too/ bad a shape, and deleting some files (which should be backed up 
anyway, given that btrfs is still stabilizing and ready backups are 
strongly recommended) or snapshots should eventually empty out some 
chunks.  The trouble is, it's a bit of trial and error to know what and 
how much to delete in order to empty some chunks, which costs time and 
hassle.  (You could use the debug commands to trace files down to 
individual chunks, but that'd be quite some manual work unless somebody 
has already written a tool to help with the task, which they may have.)

The other alternative is to btrfs device add a temporary device of 
perhaps 30-60 GiB or so -- a thumbdrive will work if necessary.  Then do 
the balance (being sure to specify single profile metadata as it'll 
default to raid1 as soon as there's a second device) using usage= to 
rewrite and combine chunks as necessary to free up some of that allocated 
but unused data and metadata space.  Then once a suitable amount of space 
has been freed, btrfs device remove the temporary device once again, thus 
triggering balance to write everything from it back to the original 
device once again.
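
The temporary device dance might look roughly like this.  It's a sketch 
under assumptions: the function name, the device path and the usage 
percentage are placeholders, no convert filter is passed (so check the 
profiles shown by `btrfs filesystem usage` before removing the device), 
and by default the commands are only printed.

```shell
# Dry run by default: commands are echoed instead of executed.
# Set BTRFS=btrfs (and run as root) to execute them for real.
BTRFS=${BTRFS:-echo btrfs}

# Hypothetical helper.  $1 = mount point, $2 = temporary device,
# $3 = usage percentage for the balance filters.
temp_device_balance() {
    mnt=$1
    tmpdev=$2
    pct=$3
    # Add the temporary device so balance has unallocated space.
    $BTRFS device add "$tmpdev" "$mnt" || return 1
    # Consolidate partly used chunks.  If this fails, the temporary
    # device is left attached for manual follow-up.
    $BTRFS balance start -dusage="$pct" -musage="$pct" "$mnt" || return 1
    # Remove the device again; its chunks are relocated back.
    $BTRFS device remove "$tmpdev" "$mnt"
}
```

For example, `temp_device_balance /ssddisk/ /dev/sdX 20` prints the three 
commands, with /dev/sdX standing in for whatever the temporary device 
turns out to be.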

Which is why the usage=0 is still worth trying, even tho it doesn't work 
a lot of the time these days, because there's no empty chunks -- it's by 
far the easiest of the three alternatives, and when it /does/ work it's 
very fast and nearly hassle-free, certainly compared to either of the 
other two alternatives, deleting stuff, or doing the temporary device 
dance.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


end of thread, other threads:[~2016-12-05 11:51 UTC | newest]

Thread overview: 14+ messages
2016-11-30 21:03 Metadata balance fails ENOSPC Stefan Priebe - Profihost AG
2016-11-30 23:02 ` Chris Murphy
2016-12-01  7:49   ` Stefan Priebe - Profihost AG
2016-12-01  8:12     ` Andrei Borzenkov
2016-12-01 11:55       ` Stefan Priebe - Profihost AG
2016-12-01 13:32         ` E V
2016-12-01 13:51       ` Hans van Kranenburg
2016-12-01 14:10         ` Stefan Priebe - Profihost AG
2016-12-01 15:48           ` Chris Murphy
2016-12-01 18:43             ` Stefan Priebe - Profihost AG
2016-12-03  4:43             ` Andrei Borzenkov
2016-12-05 11:12               ` Stefan Priebe - Profihost AG
2016-12-05 11:51                 ` Duncan
2016-12-01  8:18   ` Duncan
