* FS Remounted RO due to false-positive for OOS?
From: Ellis H. Wilson III @ 2020-03-03 21:32 UTC (permalink / raw)
To: BTRFS
Hi all,
I encountered the following issue and wasn't sure if it was known or not
yet. I'll be glad to hear it matches a fingerprint of a known or fixed
bug as I'm admittedly running an older kernel, but my searching skills
have failed me.
I have an mdraid array formatted with BTRFS: 6x12TB drives in raid0.
Only about 240GB of 72TB was consumed at the time of the out-of-space
(OOS) error.
/etc/fstab mount options:
/dev/md0 /pandata/0 btrfs defaults,space_cache=v2,noauto 0 0
uname:
Linux 4d00fa3d419078 4.12.14-lp150.11-default #1 SMP Fri May 11 08:28:30 UTC 2018 (a9fee09) x86_64 x86_64 x86_64 GNU/Linux
dmesg output:
[17939.536301] BTRFS: Transaction aborted (error -28)
[17939.536331] ------------[ cut here ]------------
[17939.542058] WARNING: CPU: 7 PID: 3372 at ../fs/btrfs/extent-tree.c:6988 __btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.553779] Modules linked in: binfmt_misc af_packet bonding
iscsi_ibft iscsi_boot_sysfs msr nls_iso8859_1 nls_cp437 vfat intel_rapl
fat skx_edac x86_pkg_temp_thermal btrfs intel_powerclamp coretemp xor
ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
crc32c_intel raid0 iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcbc
dax_pmem ixgbe device_dax md_mod ptp nd_pmem pps_core mdio nd_btt
aesni_intel aes_x86_64 raid6_pq crypto_simd glue_helper cryptd i2c_i801
lpc_ich ioatdma ipmi_si pcspkr mei_me mei nfit ipmi_devintf shpchp dca
wmi ipmi_msghandler libnvdimm acpi_pad button joydev hid_generic usbhid
ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops xhci_pci ttm xhci_hcd nvme drm ahci
drm_panel_orientation_quirks nvme_core usbcore libahci sg dm_multipath
dm_mod
[17939.631713] scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[17939.638341] CPU: 7 PID: 3372 Comm: btrfs-transacti Not tainted 4.12.14-lp150.11-default #1 openSUSE Leap 15.0 (unreleased)
[17939.650466] Hardware name: Supermicro SYS-F629P3-RTB/X11DPFR-S, BIOS 3.0c_PI021_2e 11/26/2019
[17939.660095] task: ffff88083b975680 task.stack: ffffc9000a238000
[17939.667128] RIP: 0010:__btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.674653] RSP: 0018:ffffc9000a23bc78 EFLAGS: 00010296
[17939.680953] RAX: 0000000000000026 RBX: 0000000000000000 RCX: 0000000000000000
[17939.689172] RDX: ffff88085c1dfd40 RSI: ffff88085c1d7a68 RDI: ffff88085c1d7a68
[17939.697386] RBP: 00000012b9a5c000 R08: 0000000000000511 R09: 0000000000000007
[17939.705602] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8808530ae000
[17939.713803] R13: 00000000ffffffe4 R14: ffff8802edf64870 R15: ffff8801368c0230
[17939.722017] FS: 0000000000000000(0000) GS:ffff88085c1c0000(0000) knlGS:0000000000000000
[17939.731203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17939.738051] CR2: 00007f12998bea08 CR3: 000000000200a003 CR4: 00000000007606e0
[17939.746292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17939.754525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17939.762735] PKRU: 55555554
[17939.766521] Call Trace:
[17939.770075] __btrfs_run_delayed_refs+0x5b9/0x1300 [btrfs]
[17939.776682] btrfs_run_delayed_refs+0x68/0x250 [btrfs]
[17939.782948] btrfs_commit_transaction+0x2df/0x900 [btrfs]
[17939.789462] ? wait_woken+0x80/0x80
[17939.794087] transaction_kthread+0x186/0x1a0 [btrfs]
[17939.800201] ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
[17939.806983] kthread+0x11a/0x130
[17939.811308] ? kthread_create_on_node+0x40/0x40
[17939.816939] ret_from_fork+0x1f/0x40
[17939.821591] Code: 00 00 48 c7 c6 c0 07 8e a0 4c 89 f7 41 bd ea ff ff ff e8 4d d0 09 00 e9 a0 f5 ff ff 44 89 ee 48 c7 c7 18 71 8e a0 e8 d9 95 96 e0 <0f> 0b e9 73 f5 ff ff 49 8b 46 60 f0 0f ba a8 30 17 00 00 02 72
[17939.842686] ---[ end trace 179787a3004a4525 ]---
[17939.848482] BTRFS: error (device md0) in __btrfs_free_extent:6988: errno=-28 No space left
[17939.857923] BTRFS info (device md0): forced readonly
[17939.864081] BTRFS: error (device md0) in btrfs_run_delayed_refs:3016: errno=-28 No space left
[17939.873811] BTRFS warning (device md0): Skipping commit of aborted transaction.
[17939.882319] BTRFS: error (device md0) in cleanup_transaction:1876: errno=-28 No space left
[17940.192941] BTRFS error (device md0): pending csums is 334954496
Immediately after the above, fsyncs for a running application began to
return "fileio: no more space", and the mount went RO.
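As a side note on reading the trace: the "error -28" in the log is the negated errno for ENOSPC, and the same value seems to be visible in the register dump as R13 (0xffffffe4). The sketch below decodes that value as a signed 32-bit integer and looks up its name. Treating R13 as the register that carries the error code is an assumption based on the dump alone, and the helper name to_int32 is made up for illustration.

```python
import errno
import os


def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# R13 as printed in the register dump above (assumed to hold the error code).
r13 = 0x00000000FFFFFFE4
code = to_int32(r13)

print(code)                     # signed value of the low 32 bits
print(errno.errorcode[-code])   # symbolic errno name on this platform
print(os.strerror(-code))       # human-readable message from the C library
```

On Linux, errno 28 is ENOSPC, which matches the "No space left" text in the BTRFS error lines.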
btrfs check output:
4d00fa3d419078:~ # btrfs check -p /dev/md0
Checking filesystem on /dev/md0
UUID: 2a71b152-ade6-4be6-9b2f-8db1e736455a
checking extents [O]
checking free space cache [o]
checking fs roots [.]
checking csums
checking root refs
found 242851065856 bytes used, no error found
total csum bytes: 234919228
total tree bytes: 2293776384
total fs tree bytes: 910114816
total extent tree bytes: 998359040
btree space waste bytes: 440673068
file data blocks allocated: 450663858176
referenced 236223201280
A remount following btrfs check worked just fine.
btrfs fi usage reports:
# btrfs fi usage /pandata/0/
Overall:
    Device size:                  65.48TiB
    Device allocated:            276.02GiB
    Device unallocated:           65.21TiB
    Device missing:                  0.00B
    Used:                        227.67GiB
    Free (estimated):             65.26TiB      (min: 32.65TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:268.00GiB, Used:223.57GiB
   /dev/md0      268.00GiB

Metadata,DUP: Size:4.00GiB, Used:2.05GiB
   /dev/md0        8.00GiB

System,DUP: Size:8.00MiB, Used:48.00KiB
   /dev/md0       16.00MiB

Unallocated:
   /dev/md0       65.21TiB
I suspect this is a free space cache issue: a bug that falsely reports
up the chain that there is no more space, which then forces the FS into
RO mode. But why it doesn't hit on check or remount is unclear to me.
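For reference, the usage figures above can be cross-checked with a little arithmetic. The sketch below recomputes "Device allocated" and "Used" from the per-profile lines (single counts once, DUP counts twice) and shows how much room remains inside the already-allocated metadata chunks. The numbers are copied from the output above; the dictionary layout and variable names are only illustrative, and this says nothing about why the kernel returned ENOSPC.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

# (chunk size, bytes used, number of copies) per block-group type,
# copied from the `btrfs fi usage` output above.
profiles = {
    "data": (268.00 * GIB, 223.57 * GIB, 1),      # single
    "metadata": (4.00 * GIB, 2.05 * GIB, 2),      # DUP
    "system": (8.00 * MIB, 48.00 * KIB, 2),       # DUP
}

# Raw device space consumed by chunks, and raw space used inside them.
allocated = sum(size * copies for size, _, copies in profiles.values())
used = sum(in_use * copies for _, in_use, copies in profiles.values())

# Logical free space left inside the existing metadata chunks.
metadata_size, metadata_used, _ = profiles["metadata"]
metadata_free = metadata_size - metadata_used

print(f"allocated: {allocated / GIB:.2f} GiB")
print(f"used: {used / GIB:.2f} GiB")
print(f"free in metadata chunks: {metadata_free / GIB:.2f} GiB")
```

The recomputed totals agree with the reported 276.02GiB allocated and 227.67GiB used, and roughly half of the allocated metadata space is still free, with 65.21TiB unallocated on top of that.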
Any and all thoughts are greatly appreciated,
ellis
* Re: FS Remounted RO due to false-positive for OOS?
From: Nikolay Borisov @ 2020-03-03 23:36 UTC (permalink / raw)
To: Ellis H. Wilson III, BTRFS
On 3.03.20 at 23:32, Ellis H. Wilson III wrote:
> I have an mdraid array formatted with BTRFS. 6x12TB drives in raid0.
> Only about 240GB of 72TB consumed at the time of OOS.
>
> [...]
>
> [17939.848482] BTRFS: error (device md0) in __btrfs_free_extent:6988:
> errno=-28 No space left
> [17939.857923] BTRFS info (device md0): forced readonly
>
There were multiple fixes to the ENOSPC machinery. In particular:
https://patchwork.kernel.org/cover/10709795/
But this series might depend on other fixes, so you'd have to do the
backporting yourself.
* Re: FS Remounted RO due to false-positive for OOS?
From: Ellis H. Wilson III @ 2020-03-04 15:23 UTC (permalink / raw)
To: Nikolay Borisov, BTRFS
On 3/3/20 6:36 PM, Nikolay Borisov wrote:
> There were multiple fixes to the ENOSPC machinery. In particular:
>
> https://patchwork.kernel.org/cover/10709795/
>
> But this series might depend on other fixes, so you'd have to do the
> backporting yourself.
Ah, that does look very relevant -- thank you!
We are moving to a newer kernel within the next few months, and this is
the first time after years of running with BTRFS that I've hit this
specific failure, so I'm ok with living with this risk until we upgrade.
I'll report back if I see something similar on newer bits.
Best,
ellis