public inbox for linux-btrfs@vger.kernel.org
* FS Remounted RO due to false-positive for OOS?
@ 2020-03-03 21:32 Ellis H. Wilson III
  2020-03-03 23:36 ` Nikolay Borisov
From: Ellis H. Wilson III @ 2020-03-03 21:32 UTC
  To: BTRFS

Hi all,

I encountered the following issue and am not sure whether it is already
known. I'd be glad to hear that it matches the fingerprint of a known or
fixed bug, since I'm admittedly running an older kernel, but my searches
have turned up nothing.

I have an mdraid RAID0 array of 6x12TB drives formatted with BTRFS.
Only about 240GB of the 72TB was consumed at the time of the
out-of-space (OOS) error.
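
A quick unit check shows these figures line up with the tool output further down. This is a sketch that assumes "12TB" means the vendor's decimal terabytes (10**12 bytes) and ignores any space md reserves for its own metadata. On that basis, 72TB is about the 65.48TiB that btrfs fi usage reports as "Device size". The 242851065856 bytes that btrfs check reports as used are the "about 240GB" above, roughly 0.34% of the array:

```python
# Decimal (SI) vs binary (IEC) units: drive vendors quote TB = 10**12 bytes,
# while the btrfs tools print TiB/GiB = 2**40 / 2**30 bytes.
TB = 10**12
GB = 10**9
TiB = 2**40
GiB = 2**30

raw_bytes = 6 * 12 * TB          # 6x12TB drives striped in RAID0 (no redundancy)
used_bytes = 242_851_065_856     # "found ... bytes used" from btrfs check

print(f"array size: {raw_bytes / TiB:.2f} TiB")   # compare with "Device size"
print(f"used: {used_bytes / GB:.1f} GB = {used_bytes / GiB:.2f} GiB")
print(f"fraction used: {used_bytes / raw_bytes:.2%}")
```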

/etc/fstab mount options:

/dev/md0        /pandata/0      btrfs   defaults,space_cache=v2,noauto  0 0

uname:

Linux 4d00fa3d419078 4.12.14-lp150.11-default #1 SMP Fri May 11 08:28:30 UTC 2018 (a9fee09) x86_64 x86_64 x86_64 GNU/Linux

dmesg output:

[17939.536301] BTRFS: Transaction aborted (error -28)
[17939.536331] ------------[ cut here ]------------
[17939.542058] WARNING: CPU: 7 PID: 3372 at ../fs/btrfs/extent-tree.c:6988 __btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.553779] Modules linked in: binfmt_misc af_packet bonding 
iscsi_ibft iscsi_boot_sysfs msr nls_iso8859_1 nls_cp437 vfat intel_rapl 
fat skx_edac x86_pkg_temp_thermal btrfs intel_powerclamp coretemp xor 
ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
crc32c_intel raid0 iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcbc 
dax_pmem ixgbe device_dax md_mod ptp nd_pmem pps_core mdio nd_btt 
aesni_intel aes_x86_64 raid6_pq crypto_simd glue_helper cryptd i2c_i801 
lpc_ich ioatdma ipmi_si pcspkr mei_me mei nfit ipmi_devintf shpchp dca 
wmi ipmi_msghandler libnvdimm acpi_pad button joydev hid_generic usbhid 
ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops xhci_pci ttm xhci_hcd nvme drm ahci 
drm_panel_orientation_quirks nvme_core usbcore libahci sg dm_multipath 
dm_mod
[17939.631713]  scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[17939.638341] CPU: 7 PID: 3372 Comm: btrfs-transacti Not tainted 4.12.14-lp150.11-default #1 openSUSE Leap 15.0 (unreleased)
[17939.650466] Hardware name: Supermicro SYS-F629P3-RTB/X11DPFR-S, BIOS 3.0c_PI021_2e 11/26/2019
[17939.660095] task: ffff88083b975680 task.stack: ffffc9000a238000
[17939.667128] RIP: 0010:__btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.674653] RSP: 0018:ffffc9000a23bc78 EFLAGS: 00010296
[17939.680953] RAX: 0000000000000026 RBX: 0000000000000000 RCX: 0000000000000000
[17939.689172] RDX: ffff88085c1dfd40 RSI: ffff88085c1d7a68 RDI: ffff88085c1d7a68
[17939.697386] RBP: 00000012b9a5c000 R08: 0000000000000511 R09: 0000000000000007
[17939.705602] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8808530ae000
[17939.713803] R13: 00000000ffffffe4 R14: ffff8802edf64870 R15: ffff8801368c0230
[17939.722017] FS:  0000000000000000(0000) GS:ffff88085c1c0000(0000) knlGS:0000000000000000
[17939.731203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17939.738051] CR2: 00007f12998bea08 CR3: 000000000200a003 CR4: 00000000007606e0
[17939.746292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17939.754525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17939.762735] PKRU: 55555554
[17939.766521] Call Trace:
[17939.770075]  __btrfs_run_delayed_refs+0x5b9/0x1300 [btrfs]
[17939.776682]  btrfs_run_delayed_refs+0x68/0x250 [btrfs]
[17939.782948]  btrfs_commit_transaction+0x2df/0x900 [btrfs]
[17939.789462]  ? wait_woken+0x80/0x80
[17939.794087]  transaction_kthread+0x186/0x1a0 [btrfs]
[17939.800201]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
[17939.806983]  kthread+0x11a/0x130
[17939.811308]  ? kthread_create_on_node+0x40/0x40
[17939.816939]  ret_from_fork+0x1f/0x40
[17939.821591] Code: 00 00 48 c7 c6 c0 07 8e a0 4c 89 f7 41 bd ea ff ff 
ff e8 4d d0 09 00 e9 a0 f5 ff ff 44 89 ee 48 c7 c7 18 71 8e a0 e8 d9 95 
96 e0 <0f> 0b e9 73 f5 ff ff 49 8b 46 60 f0 0f ba a8 30 17 00 00 02 72
[17939.842686] ---[ end trace 179787a3004a4525 ]---
[17939.848482] BTRFS: error (device md0) in __btrfs_free_extent:6988: errno=-28 No space left
[17939.857923] BTRFS info (device md0): forced readonly
[17939.864081] BTRFS: error (device md0) in btrfs_run_delayed_refs:3016: errno=-28 No space left
[17939.873811] BTRFS warning (device md0): Skipping commit of aborted transaction.
[17939.882319] BTRFS: error (device md0) in cleanup_transaction:1876: errno=-28 No space left
[17940.192941] BTRFS error (device md0): pending csums is 334954496
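
For readers decoding the trace: errno -28 is ENOSPC ("No space left on device") on Linux, which is presumably what the application surfaced as "fileio: no more space". The R13 value 00000000ffffffe4 also looks like the same -28 held as a 32-bit int. That register reading is an inference from the value, not something the trace states. A minimal Python sketch of both decodings:

```python
import errno
import os

# The kernel logs failures as negative errno values ("error -28", "errno=-28").
err = -28
name = errno.errorcode[-err]   # symbolic name for the positive code
text = os.strerror(-err)       # libc message for the same code
print(name, "-", text)

# R13 from the register dump. Taking the low 32 bits and interpreting them as
# a signed two's-complement int yields the same value as the logged error.
r13 = 0x00000000FFFFFFE4
low32 = r13 & 0xFFFFFFFF
signed = low32 - (1 << 32) if low32 & (1 << 31) else low32
print(signed)
```

On Linux this prints ENOSPC with its libc description, followed by -28.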

Immediately after the above, fsyncs from a running application began
returning "fileio: no more space", and the mount went read-only.

btrfs check output:

4d00fa3d419078:~ # btrfs check -p /dev/md0
Checking filesystem on /dev/md0
UUID: 2a71b152-ade6-4be6-9b2f-8db1e736455a
checking extents [O]
checking free space cache [o]
checking fs roots [.]
checking csums
checking root refs
found 242851065856 bytes used, no error found
total csum bytes: 234919228
total tree bytes: 2293776384
total fs tree bytes: 910114816
total extent tree bytes: 998359040
btree space waste bytes: 440673068
file data blocks allocated: 450663858176
  referenced 236223201280

A remount following btrfs check worked just fine.

btrfs fi usage reports:

# btrfs fi usage /pandata/0/
Overall:
     Device size:                  65.48TiB
     Device allocated:            276.02GiB
     Device unallocated:           65.21TiB
     Device missing:                  0.00B
     Used:                        227.67GiB
     Free (estimated):             65.26TiB      (min: 32.65TiB)
     Data ratio:                       1.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:268.00GiB, Used:223.57GiB
    /dev/md0      268.00GiB

Metadata,DUP: Size:4.00GiB, Used:2.05GiB
    /dev/md0        8.00GiB

System,DUP: Size:8.00MiB, Used:48.00KiB
    /dev/md0       16.00MiB

Unallocated:
    /dev/md0       65.21TiB
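
To make the "plenty of space" point concrete, here is the arithmetic behind the overall figures, as a sketch. It assumes that btrfs-progs computes "Free (estimated)" as the unused part of the data chunks plus the unallocated space divided by the data ratio. It further assumes that "min" instead divides the unallocated space by the highest ratio on the filesystem (2 here, from DUP metadata). That is an assumption about the tool, not something the output states, although it reproduces the reported numbers:

```python
# Figures copied from the `btrfs fi usage` output above, expressed in GiB.
GiB = 1.0
TiB = 1024 * GiB
MiB = GiB / 1024
KiB = MiB / 1024

data_size, data_used = 268.00 * GiB, 223.57 * GiB   # Data,single
meta_size, meta_used = 4.00 * GiB, 2.05 * GiB       # Metadata,DUP
sys_size, sys_used = 8.00 * MiB, 48.00 * KiB        # System,DUP
unallocated = 65.21 * TiB
data_ratio, meta_ratio = 1.0, 2.0                   # System is DUP as well

# Allocated: chunk sizes multiplied by their profile's on-disk ratio.
allocated = data_size * data_ratio + (meta_size + sys_size) * meta_ratio
# Used counts every on-disk copy, so DUP metadata is counted twice.
used = data_used * data_ratio + (meta_used + sys_used) * meta_ratio

# Assumed estimate: free space inside data chunks, plus unallocated space
# divided by the data ratio ("estimated") or by the worst ratio ("min").
data_free = (data_size - data_used) / data_ratio
free_est = data_free + unallocated / data_ratio
free_min = data_free + unallocated / max(data_ratio, meta_ratio)

print(f"allocated {allocated:.2f} GiB, used {used:.2f} GiB")
print(f"free est {free_est / TiB:.2f} TiB, min {free_min / TiB:.2f} TiB")
print(f"metadata chunks {meta_used / meta_size:.0%} full")
```

Allocated (276.02GiB), used (227.67GiB), and the 32.65TiB minimum come out as reported. The estimate prints as 65.25TiB rather than 65.26TiB only because the inputs are themselves rounded to two decimals. With metadata chunks about half full and over 65TiB unallocated, these numbers are consistent with the suspicion that the ENOSPC was spurious.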

I suspect a free space cache issue: a bug that falsely reports up the
chain that there is no space left, which then forces the FS read-only.
Why it doesn't reproduce on check or remount is unclear to me.

Any and all thoughts are greatly appreciated,

ellis

Thread overview: 3+ messages
2020-03-03 21:32 FS Remounted RO due to false-positive for OOS? Ellis H. Wilson III
2020-03-03 23:36 ` Nikolay Borisov
2020-03-04 15:23   ` Ellis H. Wilson III