* 4.13: No space left with plenty of free space (/home/kernel/COD/linux/fs/btrfs/extent-tree.c:6989 __btrfs_free_extent.isra.62+0xc2c/0xdb0)
From: Tomasz Chmielewski @ 2017-09-08 4:33 UTC (permalink / raw)
To: linux-btrfs
Just got this one in dmesg with btrfs RAID-1 on top of Linux software
RAID-5.
Why does it say "No space left" if we have 9 TB free there?
[233787.920933] BTRFS: Transaction aborted (error -28)
[233787.920953] ------------[ cut here ]------------
[233787.920971] WARNING: CPU: 1 PID: 2235 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:6989 __btrfs_free_extent.isra.62+0xc2c/0xdb0 [btrfs]
[233787.920971] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 xt_NFLOG xt_conntrack ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables nfnetlink_log nfnetlink rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc bluetooth ecdh_generic binfmt_misc veth bridge stp llc btrfs intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ppdev aesni_intel aes_x86_64 crypto_simd glue_helper cryptd eeepc_wmi intel_cstate intel_rapl_perf input_leds asus_wmi sparse_keymap serio_raw wmi_bmof parport_pc shpchp ie31200_edac tpm_infineon lpc_ich parport
[233787.920992] mac_hid autofs4 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid10 raid6_pq libcrc32c raid1 ahci r8169 libahci mii wmi video
[233787.921001] CPU: 1 PID: 2235 Comm: btrfs-transacti Not tainted 4.13.0-041300-generic #201709031731
[233787.921002] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 9002 05/30/2014
[233787.921002] task: ffff943b0a779740 task.stack: ffffb1c4491a4000
[233787.921012] RIP: 0010:__btrfs_free_extent.isra.62+0xc2c/0xdb0 [btrfs]
[233787.921013] RSP: 0018:ffffb1c4491a7b08 EFLAGS: 00010286
[233787.921013] RAX: 0000000000000026 RBX: 00000cdf3dddc000 RCX: 0000000000000000
[233787.921014] RDX: 0000000000000000 RSI: ffff943b5fa4dc78 RDI: ffff943b5fa4dc78
[233787.921014] RBP: ffffb1c4491a7bb0 R08: 0000000000000001 R09: 00000000000004c5
[233787.921015] R10: 00000013dd7ec000 R11: 0000000000000000 R12: ffff943b0d1c0000
[233787.921015] R13: 00000000ffffffe4 R14: 0000000000000000 R15: ffff943b0cadcee0
[233787.921016] FS: 0000000000000000(0000) GS:ffff943b5fa40000(0000) knlGS:0000000000000000
[233787.921016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[233787.921017] CR2: 00007ffc9230dde8 CR3: 000000075a009000 CR4: 00000000001406e0
[233787.921018] Call Trace:
[233787.921031] ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
[233787.921039] __btrfs_run_delayed_refs+0x6f0/0x1380 [btrfs]
[233787.921047] btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
[233787.921054] btrfs_write_dirty_block_groups+0x158/0x390 [btrfs]
[233787.921063] commit_cowonly_roots+0x221/0x2c0 [btrfs]
[233787.921071] btrfs_commit_transaction+0x46e/0x8d0 [btrfs]
[233787.921079] transaction_kthread+0x1a2/0x1c0 [btrfs]
[233787.921081] kthread+0x125/0x140
[233787.921088] ? btrfs_cleanup_transaction+0x500/0x500 [btrfs]
[233787.921089] ? kthread_create_on_node+0x70/0x70
[233787.921091] ret_from_fork+0x25/0x30
[233787.921092] Code: 3e d3 0f ff eb d0 44 89 ee 48 c7 c7 40 53 7a c0 e8 0b ba 3e d3 0f ff e9 76 fb ff ff 44 89 ee 48 c7 c7 40 53 7a c0 e8 f5 b9 3e d3 <0f> ff e9 f7 f4 ff ff 8b 55 20 48 89 c1 49 89 d8 48 c7 c6 20 54
[233787.921107] ---[ end trace f4e71e70fbc200d2 ]---
[233787.921132] BTRFS: error (device md2) in __btrfs_free_extent:6989: errno=-28 No space left
[233787.921189] BTRFS info (device md2): forced readonly
[233787.921191] BTRFS: error (device md2) in btrfs_run_delayed_refs:3009: errno=-28 No space left
[233789.507669] BTRFS warning (device md2): Skipping commit of aborted transaction.
[233789.507672] BTRFS: error (device md2) in cleanup_transaction:1873: errno=-28 No space left
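For readers decoding the trace: a minimal sketch of how the "-28" in the BTRFS lines maps to a symbolic errno. The helper names are made up for illustration. Reading R13 from the register dump as a signed 32-bit value is only an interpretation of the dump, not something stated in this thread, though it does come out to the same number.

```python
import errno
import os


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe_kernel_errno(code):
    """Map a negative kernel return code to (symbolic name, libc message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# "error -28" / "errno=-28" as printed in the BTRFS lines of the log.
name, message = describe_kernel_errno(-28)
print(name, "-", message)

# R13 from the register dump, read as a signed 32-bit value (assumption).
print(to_signed32(0x00000000FFFFFFE4))
```

So the abort is a plain ENOSPC returned somewhere under __btrfs_free_extent; the number alone does not say which kind of space ran out.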
# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2         17T  7.3T  9.1T  45% /data
# btrfs fi show /data
Label: 'data' uuid: fddbd057-4fa6-4b2e-a9ca-993829bab4b9
Total devices 1 FS bytes used 7.21TiB
devid 1 size 16.30TiB used 12.99TiB path /dev/md2
# btrfs fi df /data
Data, single: total=12.84TiB, used=7.13TiB
System, DUP: total=8.00MiB, used=1.48MiB
Metadata, DUP: total=79.00GiB, used=77.87GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
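A rough sketch of turning the `btrfs fi df` lines above into per-type fill percentages. The regex, the function name and the 95% threshold are assumptions made for this illustration; they only cover the line format seen in this output.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Matches lines such as "Metadata, DUP: total=79.00GiB, used=77.87GiB".
LINE_RE = re.compile(
    r"^(?P<type>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)

FI_DF_OUTPUT = """\
Data, single: total=12.84TiB, used=7.13TiB
System, DUP: total=8.00MiB, used=1.48MiB
Metadata, DUP: total=79.00GiB, used=77.87GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""


def fill_percentages(text):
    """Return {block group type: percent of allocated chunks in use}."""
    report = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a "type, profile:" line
        total = float(match["total"]) * UNITS[match["tunit"]]
        used = float(match["used"]) * UNITS[match["uunit"]]
        report[match["type"]] = 100.0 * used / total if total else 0.0
    return report


report = fill_percentages(FI_DF_OUTPUT)
# 95% is an arbitrary threshold chosen for this example.
nearly_full = [kind for kind, pct in report.items() if pct >= 95.0]
for kind, pct in report.items():
    print(f"{kind:14s} {pct:6.2f}%")
print("nearly full:", nearly_full)
```

With these numbers only the Metadata chunks come out as nearly full, while Data is a little over half used.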
root@srv8 ~ # btrfs fi usage /data
Overall:
    Device size:                  16.30TiB
    Device allocated:             12.99TiB
    Device unallocated:            3.31TiB
    Device missing:                  0.00B
    Used:                          7.29TiB
    Free (estimated):              9.01TiB      (min: 7.36TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
Data,single: Size:12.84TiB, Used:7.13TiB
   /dev/md2       12.84TiB
Metadata,DUP: Size:79.00GiB, Used:77.87GiB
   /dev/md2      158.00GiB
System,DUP: Size:8.00MiB, Used:1.48MiB
   /dev/md2       16.00MiB
Unallocated:
   /dev/md2        3.31TiB
Tomasz Chmielewski
https://lxadm.com
* Re: 4.13: No space left with plenty of free space (/home/kernel/COD/linux/fs/btrfs/extent-tree.c:6989 __btrfs_free_extent.isra.62+0xc2c/0xdb0)
From: Tomasz Chmielewski @ 2017-09-08 4:56 UTC (permalink / raw)
To: linux-btrfs
On 2017-09-08 13:33, Tomasz Chmielewski wrote:
> Just got this one in dmesg with btrfs RAID-1 on top of Linux software
> RAID-5.
Should say: with btrfs _single_ on top of Linux software RAID-5.
* Re: 4.13: No space left with plenty of free space (/home/kernel/COD/linux/fs/btrfs/extent-tree.c:6989 __btrfs_free_extent.isra.62+0xc2c/0xdb0)
From: Peter Grandi @ 2017-09-08 11:02 UTC (permalink / raw)
To: Linux fs Btrfs
[ ... ]
> [233787.921018] Call Trace:
> [233787.921031] ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
> [233787.921039] __btrfs_run_delayed_refs+0x6f0/0x1380 [btrfs]
> [233787.921047] btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
> [233787.921054] btrfs_write_dirty_block_groups+0x158/0x390 [btrfs]
> [233787.921063] commit_cowonly_roots+0x221/0x2c0 [btrfs]
> [233787.921071] btrfs_commit_transaction+0x46e/0x8d0 [btrfs]
[ ... ]
> [233787.921191] BTRFS: error (device md2) in
> btrfs_run_delayed_refs:3009: errno=-28 No space left
> [233789.507669] BTRFS warning (device md2): Skipping commit of aborted
> transaction.
> [233789.507672] BTRFS: error (device md2) in cleanup_transaction:1873:
> errno=-28 No space left
[ ... ]
So the numbers that matter are:
> Data,single: Size:12.84TiB, Used:7.13TiB
> /dev/md2 12.84TiB
> Metadata,DUP: Size:79.00GiB, Used:77.87GiB
> /dev/md2 158.00GiB
> Unallocated:
> /dev/md2 3.31TiB
The metadata allocation is nearly full, so it could be the
usual story with the two-level allocator, where there are no
unallocated chunks left for metadata expansion. But since you have
3TiB of 'unallocated' space, there is no obvious reason why
allocating the metadata needed for a new root transaction flush
should abort, so this is about "guessing" which corner case or
bug applies:
* If you are using the 'space_cache' it has a known issue:
https://btrfs.wiki.kernel.org/index.php/Gotchas#Free_space_cache
* Some versions of Btrfs (IIRC around 4.8-4.9) had some other
allocator bug.
* Maybe some previous issue, hw or sw, had damaged internal
filesystem structures.
I also notice that your volume's data free space seems to be
extremely fragmented, as the large difference between allocated
and used space shows:
"Data,single: Size:12.84TiB, Used:7.13TiB".
That may mean that it is mounted with 'ssd' and/or has gone a
long time without a 'balance', and conceivably this can make it
easier for the free space cache to fail to find space (some
handwaving here).
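To make the arithmetic behind those numbers explicit, here is a small sketch. The formula used for "Free (estimated)" (unallocated space divided by the data ratio, plus the slack inside already-allocated data chunks) and for its "min" value (the same, but dividing unallocated space by the largest ratio, here 2.00 for DUP) is an assumption that happens to reproduce the figures in the `btrfs fi usage` output to within rounding. It is not taken from this thread.

```python
# Figures copied from the "btrfs fi usage /data" output (TiB unless noted).
UNALLOCATED = 3.31
DATA_SIZE, DATA_USED = 12.84, 7.13
DATA_RATIO = 1.00          # "Data ratio" (single)
MAX_RATIO = 2.00           # "Metadata ratio" (DUP), the largest ratio in use
META_SIZE_GIB, META_USED_GIB = 79.00, 77.87


def free_estimate(unallocated, data_size, data_used, ratio):
    """Slack inside allocated data chunks plus unallocated space / ratio."""
    return (data_size - data_used) + unallocated / ratio


free_est = free_estimate(UNALLOCATED, DATA_SIZE, DATA_USED, DATA_RATIO)
free_min = free_estimate(UNALLOCATED, DATA_SIZE, DATA_USED, MAX_RATIO)
meta_free_gib = META_SIZE_GIB - META_USED_GIB

print(f"free (estimated): {free_est:.2f} TiB")   # tool reports 9.01TiB
print(f"free (min):       {free_min:.3f} TiB")   # tool reports 7.36TiB
print(f"metadata slack:   {meta_free_gib:.2f} GiB")
```

Read this way, almost all of the "9 TB free" is either slack inside data chunks or unallocated space, while the metadata chunks themselves have only about 1 GiB of room left.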
* Re: 4.13: No space left with plenty of free space (/home/kernel/COD/linux/fs/btrfs/extent-tree.c:6989 __btrfs_free_extent.isra.62+0xc2c/0xdb0)
From: Tomasz Chmielewski @ 2017-09-08 12:45 UTC (permalink / raw)
To: linux-btrfs
> So the numbers that matter are:
>
>> Data,single: Size:12.84TiB, Used:7.13TiB
>> /dev/md2 12.84TiB
>> Metadata,DUP: Size:79.00GiB, Used:77.87GiB
>> /dev/md2 158.00GiB
>> Unallocated:
>> /dev/md2 3.31TiB
> * If you are using the 'space_cache' it has a known issue:
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Free_space_cache
# mount | grep btrfs
/dev/md2 on /data type btrfs
(rw,noatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)
Citing from the URL you pasted:
Free space cache
Currently sometimes the free space cache v1 and v2 lose track of
free space and a volume can be reported as not having free space when it
obviously does.
Fix: disable use of the free space cache with mount option
nospace_cache.
Fix: remount the volume with -o remount,clear_cache.
Switch to to new free space tree.
What does "switch to to new free space tree" mean / how to do it?
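As background, and not something stated in this thread: to my knowledge the "free space tree" is what the `space_cache=v2` mount option selects (available since kernel 4.5), and a one-time mount with `-o clear_cache,space_cache=v2` is the usual way to switch. A minimal sketch of telling the modes apart from a mount-options string like the one above; the function name and the returned labels are made up for this example.

```python
def space_cache_mode(mount_options):
    """Classify the btrfs free-space tracking mode from a mount-options string."""
    options = mount_options.strip().strip("()").split(",")
    if "nospace_cache" in options:
        return "disabled"
    if "space_cache=v2" in options:
        return "v2 (free space tree)"
    if "space_cache" in options or "space_cache=v1" in options:
        return "v1 (free space cache)"
    return "not shown"


# Options string copied from the "mount | grep btrfs" output above.
current = "(rw,noatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)"
print(space_cache_mode(current))
```

For the volume in question this reports the v1 cache, i.e. the one the wiki entry is warning about.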
> I also notice that your volume's data free space seems to be
> extremely fragmented, as the large difference here shows
> "Data,single: Size:12.84TiB, Used:7.13TiB".
Yes, it may well be very fragmented: lots of rsync + inplace
and many snapshots. Also, not sure if it matters, but IO load is at or
close to 100% for most of the day.
> Which may mean that it is mounted with 'ssd' and/or has gone a
> long time without a 'balance', and conceivably this can make it
> easier for the free space cache to fail finding space (some
> handwaving here).
It's using HDDs, not mounted with "ssd" option.
I don't think a balance was ever run there. Since a full balance may
take a few months to finish (!) and causes even more IO, I'm not a big
fan of running it.
Still, it does seem like a bug to me to fail with "no space left" when
there is a lot of space left.
Tomasz Chmielewski
https://lxadm.com
* Re: 4.13: No space left with plenty of free space (/home/kernel/COD/linux/fs/btrfs/extent-tree.c:6989 __btrfs_free_extent.isra.62+0xc2c/0xdb0)
From: Josef Bacik @ 2017-09-08 19:57 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: linux-btrfs
On Fri, Sep 08, 2017 at 01:33:43PM +0900, Tomasz Chmielewski wrote:
> Just got this one in dmesg with btrfs RAID-1 on top of Linux software
> RAID-5.
>
> Why does it say "No space left" if we have 9 TB free there?
>
We've exhausted the global reserve. I have a patch that I'm pretty sure fixes
this problem. Would you mind trying
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git btrfs-readdir
You need to make sure you are on the btrfs-readdir branch, and that you have the
patch entitled
Btrfs: only check delayed ref usage in should_end_transaction
If you still see the problem, let me know. I really want to nail this issue down
and haven't been able to reproduce it locally. Thanks,
Josef