From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Martin Raiber <martin@urbackup.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11
Date: Mon, 24 Jun 2019 07:36:07 -0400 [thread overview]
Message-ID: <20190624113607.GC11820@hungrycats.org> (raw)
In-Reply-To: <0102016b88eef8ed-42aa5e72-7483-4c58-9555-5b1991a0da5e-000000@eu-west-1.amazonses.com>
[-- Attachment #1: Type: text/plain, Size: 17097 bytes --]
On Mon, Jun 24, 2019 at 10:01:31AM +0000, Martin Raiber wrote:
> I've fixed the same problem(s) by increasing the global metadata size as
> well. Though I haven't encountered them since Josef Bacik's block rsv
> rework in 5.0.
Now _that_ is interesting--for me, this issue never happened in five
years _before_ 5.0.
> Another problem with increasing the global metadata size is, that I
> think it is the only way dirty metadata is throttled. If increased too
> much (as a percentage of RAM) the system goes OOM depending on work
> load. As far as I can see dirty metdata isn't included into the
> dirty_ratio calculation as well, causing issues on that front as well.
> Another thing that I think helps is to run with "nodatasum" -- probably
> because then less metadata needs to be changed when deleting.
>
> On 23.06.2019 16:14 Zygo Blaxell wrote:
> > On Tue, Apr 23, 2019 at 07:06:51PM -0400, Zygo Blaxell wrote:
> >> I had a test filesystem that ran out of unallocated space, then ran
> >> out of metadata space during a snapshot delete, and forced readonly.
> >> The workload before the failure was a lot of rsync and bees dedupe
> >> combined with random snapshot creates and deletes.
> > Had this happen again on a production filesystem, this time on 5.1.11,
> > and it happened during orphan inode cleanup instead of snapshot delete:
> >
> > [14303.076134][T20882] BTRFS: error (device dm-21) in add_to_free_space_tree:1037: errno=-28 No space left
> > [14303.076144][T20882] BTRFS: error (device dm-21) in __btrfs_free_extent:7196: errno=-28 No space left
> > [14303.076157][T20882] BTRFS: error (device dm-21) in btrfs_run_delayed_refs:3008: errno=-28 No space left
> > [14303.076203][T20882] BTRFS error (device dm-21): Error removing orphan entry, stopping orphan cleanup
> > [14303.076210][T20882] BTRFS error (device dm-21): could not do orphan cleanup -22
> > [14303.076281][T20882] BTRFS error (device dm-21): commit super ret -30
> > [14303.357337][T20882] BTRFS error (device dm-21): open_ctree failed
> >
> > Same fix: I bumped the reserved size limit from 512M to 2G and mounted
> > normally. (OK, technically, I booted my old 5.0.21 kernel--but my 5.0.21
> > kernel has the 2G reserved space patch below in it.)
> >
> > I've not been able to repeat this ENOSPC behavior under test conditions
> > in the last two months of trying, but it's now happened twice in different
> > places, so it has non-zero repeatability.
> >
> >> I tried the usual fix strategies:
> >>
> >> 1. Immediately after mount, try to balance to free space for
> >> metadata
> >>
> >> 2. Immediately after mount, add additional disks to provide
> >> unallocated space for metadata
> >>
> >> 3. Mount -o nossd to increase metadata density
> >>
> >> #3 had no effect. #1 failed consistently.
> >>
> >> #2 was successful, but the additional space was not used because
> >> btrfs couldn't allocate chunks for metadata because it ran out of
> >> metadata space for new metadata chunks.
> >>
> >> When btrfs-cleaner tried to remove the first pending deleted snapshot,
> >> it started a transaction that failed due to lack of metadata space.
> >> Since the transaction failed, the filesystem reverts to its earlier state,
> >> and exactly the same thing happens on the next mount. The 'btrfs dev
> >> add' in #2 is successful only if it is executed immediately after mount,
> >> before the btrfs-cleaner thread wakes up.
> >>
> >> Here's what the kernel said during one of the attempts:
> >>
> >> [41263.822252] BTRFS info (device dm-3): use zstd compression, level 0
> >> [41263.825135] BTRFS info (device dm-3): using free space tree
> >> [41263.827319] BTRFS info (device dm-3): has skinny extents
> >> [42046.463356] ------------[ cut here ]------------
> >> [42046.463387] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> [42046.463404] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> [42046.463407] BTRFS info (device dm-3): forced readonly
> >> [42046.463414] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >> [42046.463429] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >> [42046.463548] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >> [42046.471363] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >> [42046.471475] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >> [42046.471506] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >> [42046.473672] BTRFS: error (device dm-3) in btrfs_drop_snapshot:9489: errno=-28 No space left
> >> [42046.475643] WARNING: CPU: 0 PID: 10187 at fs/btrfs/extent-tree.c:7056 __btrfs_free_extent+0x364/0xf60
> >> [42046.475645] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio joydev ppdev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod snd_pcm aesni_intel aes_x86_64 snd_timer crypto_simd cryptd glue_helper sr_mod snd cdrom psmouse sg soundcore input_leds pcspkr serio_raw ide_pci_generic i2c_piix4 bochs_drm parport_pc piix rtc_cmos floppy parport pcc_cpufreq ide_core qemu_fw_cfg evbug evdev ip_tables x_tables ipv6 crc_ccitt autofs4
> >> [42046.475677] CPU: 0 PID: 10187 Comm: btrfs-transacti Tainted: G B W 5.0.8-zb64-10a85e8a1569+ #1
> >> [42046.475678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >> [42046.475681] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> >> [42046.475684] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 00 85 be b8 e8 3c 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> >> [42046.475685] RSP: 0018:ffff888103d8f8e0 EFLAGS: 00010282
> >> [42046.475688] RAX: 0000000000000000 RBX: ffff88802d9ce370 RCX: ffffffffb7224ca2
> >> [42046.475689] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88811960dfac
> >> [42046.475691] RBP: ffff888103d8fa98 R08: ffffed10232c24c1 R09: ffffed10232c24c0
> >> [42046.475693] R10: ffff888025c94aeb R11: ffffed10232c24c1 R12: 0000000000a5054e
> >> [42046.475694] R13: 0000000000003000 R14: 000000000000214a R15: ffff888103d8fa70
> >> [42046.475696] FS: 0000000000000000(0000) GS:ffff888119400000(0000) knlGS:0000000000000000
> >> [42046.475698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [42046.475700] CR2: 00007f1b84271718 CR3: 00000000a3636001 CR4: 00000000001606f0
> >> [42046.475702] Call Trace:
> >> [42046.475707] ? trace_hardirqs_off+0x24/0x120
> >> [42046.475714] ? update_block_group+0x6d0/0x6d0
> >> [42046.475718] ? __lock_acquire+0xf8/0x26e0
> >> [42046.475721] ? __kasan_slab_free+0x14d/0x230
> >> [42046.475724] ? btrfs_delayed_ref_lock+0x3e/0x100
> >> [42046.475727] ? __lock_acquire+0xf8/0x26e0
> >> [42046.475730] ? kthread+0x1a9/0x200
> >> [42046.475733] ? debug_show_all_locks+0x210/0x210
> >> [42046.475737] ? __btrfs_run_delayed_refs+0xce/0x210
> >> [42046.475742] ? debug_show_all_locks+0x210/0x210
> >> [42046.475747] run_delayed_data_ref+0x15a/0x400
> >> [42046.475752] ? alloc_reserved_file_extent+0x520/0x520
> >> [42046.475756] ? rb_next+0x25/0x90
> >> [42046.475761] run_one_delayed_ref+0x71/0xe0
> >> [42046.475765] btrfs_run_delayed_refs_for_head+0x284/0x580
> >> [42046.475772] __btrfs_run_delayed_refs+0xeb/0x210
> >> [42046.475777] ? btrfs_run_delayed_refs_for_head+0x580/0x580
> >> [42046.475781] ? debug_show_all_locks+0x210/0x210
> >> [42046.475786] btrfs_run_delayed_refs+0xc0/0x260
> >> [42046.475792] btrfs_commit_transaction+0xf0/0x10f0
> >> [42046.475795] ? do_raw_spin_unlock+0xa8/0x140
> >> [42046.475799] ? _raw_spin_unlock+0x27/0x40
> >> [42046.475803] ? btrfs_record_root_in_trans+0x24/0xb0
> >> [42046.475807] ? btrfs_apply_pending_changes+0xb0/0xb0
> >> [42046.475809] ? start_transaction+0x14a/0x760
> >> [42046.475816] transaction_kthread+0x218/0x250
> >> [42046.475822] kthread+0x1a9/0x200
> >> [42046.475824] ? btrfs_cleanup_transaction+0xb80/0xb80
> >> [42046.475827] ? kthread_park+0xb0/0xb0
> >> [42046.475831] ret_from_fork+0x3a/0x50
> >> [42046.475837] irq event stamp: 0
> >> [42046.475839] hardirqs last enabled at (0): [<0000000000000000>] (null)
> >> [42046.475843] hardirqs last disabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
> >> [42046.475846] softirqs last enabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
> >> [42046.475847] softirqs last disabled at (0): [<0000000000000000>] (null)
> >> [42046.475849] ---[ end trace b02d556137ea688c ]---
> >> [42046.475852] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> [42046.475859] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >> [42046.475878] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >> [42046.475905] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >>
> >> I noticed that during the few minutes when the filesystem was running,
> >> the "global reserve" number in "btrfs fi usage" output kept going up to
> >> 503M or so, then the transaction failed.
> >>
> >> I then applied this patch to the kernel (5.0.8):
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index 83791d13c204..59ba456e2314 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -5790,7 +5790,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
> >> spin_lock(&sinfo->lock);
> >> spin_lock(&block_rsv->lock);
> >>
> >> - block_rsv->size = min_t(u64, num_bytes, SZ_512M);
> >> + block_rsv->size = min_t(u64, num_bytes, SZ_1G);
> >>
> >> if (block_rsv->reserved < block_rsv->size) {
> >> num_bytes = btrfs_space_info_used(sinfo, true);
> >>
> >> Then mounted the filesystem.
> >>
> >> With the larger reserved block size, btrfs-cleaner was able to get
> >> past its earlier failures, allocating 3GB of new metadata chunks in the
> >> first transaction after it started deleting snapshots. After this, the
> >> filesystem deleted some more snapshots (there are currently 2470 deleted
> >> snapshots remaining on this filesystem).
> >>
> >> This isn't really a solution, though, for two reasons:
> >>
> >> 1. The commit that adds the limit on block_rsv->size (fdf30d1c1b3
> >> "Btrfs: limit the global reserve to 512mb") points out:
> >>
> >> A user reported a problem where he was getting early ENOSPC with
> >> hundreds of gigs of free data space and 6 gigs of free metadata space.
> >> This is because the global block reserve was taking up the entire
> >> free metadata space. This is ridiculous, we have infrastructure in
> >> place to throttle if we start using too much of the global reserve,
> >> so instead of letting it get this huge just limit it to 512mb so that
> >> users can still get work done. This allowed the user to complete
> >> his rsync without issues.
> >>
> >> In other words, something should be handling the case when reserved
> >> space runs out, and that something is failing.
> >>
> >> 2. The 1G reserved limit still isn't sufficient. The filesystem deletes
> >> a few dozen snapshots, then fails again later:
> >>
> >> [ 3705.477036] CPU: 2 PID: 8388 Comm: btrfs-transacti Tainted: G B 5.0.9-zb64-b6eed6abc880+ #3
> >> [ 3705.481510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >> [ 3705.484038] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> >> [ 3705.485061] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 80 85 be 8b e8 fc 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> >> [ 3705.488046] RSP: 0018:ffff8881c29cf8e0 EFLAGS: 00010282
> >> [ 3705.488868] RAX: 0000000000000000 RBX: ffff888121dbdf20 RCX: ffffffff8a224a92
> >> [ 3705.490129] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8881f5e0dfac
> >> [ 3705.491246] RBP: ffff8881c29cfa98 R08: ffffed103ebc24c1 R09: ffffed103ebc24c0
> >> [ 3705.492376] R10: ffff88819e6d430b R11: ffffed103ebc24c1 R12: 00000000009ecb21
> >> [ 3705.493501] R13: 0000000000040000 R14: 00000000000021b3 R15: ffff8881c29cfa70
> >> [ 3705.494619] FS: 0000000000000000(0000) GS:ffff8881f5c00000(0000) knlGS:0000000000000000
> >> [ 3705.495883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 3705.496781] CR2: 0000556bc833ec70 CR3: 00000001ef9e0001 CR4: 00000000001606e0
> >> [ 3705.497872] Call Trace:
> >> [ 3705.498279] ? update_block_group+0x6d0/0x6d0
> >> [ 3705.498950] ? btrfs_run_delayed_refs_for_head+0x352/0x580
> >> [ 3705.499816] ? kthread+0x1a9/0x200
> >> [ 3705.500379] ? debug_show_all_locks+0x210/0x210
> >> [ 3705.501078] ? __btrfs_run_delayed_refs+0xce/0x210
> >> [ 3705.501813] ? debug_show_all_locks+0x210/0x210
> >> [ 3705.502516] run_delayed_data_ref+0x15a/0x400
> >> [ 3705.503193] ? alloc_reserved_file_extent+0x520/0x520
> >> [ 3705.503968] ? rb_next+0x3c/0x90
> >> [ 3705.504479] run_one_delayed_ref+0x71/0xe0
> >> [ 3705.505116] btrfs_run_delayed_refs_for_head+0x284/0x580
> >> [ 3705.505935] __btrfs_run_delayed_refs+0xeb/0x210
> >> [ 3705.506649] ? btrfs_run_delayed_refs_for_head+0x580/0x580
> >> [ 3705.507495] ? debug_show_all_locks+0x210/0x210
> >> [ 3705.508201] btrfs_run_delayed_refs+0xc0/0x260
> >> [ 3705.508912] btrfs_commit_transaction+0xf0/0x10f0
> >> [ 3705.509654] ? do_raw_spin_unlock+0xa8/0x140
> >> [ 3705.510343] ? _raw_spin_unlock+0x27/0x40
> >> [ 3705.510979] ? btrfs_record_root_in_trans+0x24/0xb0
> >> [ 3705.511748] ? btrfs_apply_pending_changes+0xb0/0xb0
> >> [ 3705.512533] ? start_transaction+0x14a/0x760
> >> [ 3705.513216] transaction_kthread+0x218/0x250
> >> [ 3705.513900] kthread+0x1a9/0x200
> >> [ 3705.514425] ? btrfs_cleanup_transaction+0xb80/0xb80
> >> [ 3705.515233] ? kthread_park+0xb0/0xb0
> >> [ 3705.515829] ret_from_fork+0x3a/0x50
> >> [ 3705.516408] irq event stamp: 0
> >> [ 3705.516898] hardirqs last enabled at (0): [<0000000000000000>] (null)
> >> [ 3705.518054] hardirqs last disabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
> >> [ 3705.519419] softirqs last enabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
> >> [ 3705.520779] softirqs last disabled at (0): [<0000000000000000>] (null)
> >> [ 3705.521944] ---[ end trace 3a5f720e889bc0d3 ]---
> >> [ 3705.522713] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> [ 3705.524069] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >>
> >> If I increase the reserved limit again to 2G, btrfs-cleaner gets past
> >> this shapshot (so far). It takes more than 10 minutes to delete some
> >> of the snapshots, so this will take a while to finish all 2470.
> >>
> >> The current btrfs fi usage output worries me:
> >>
> >> Overall:
> >> Device size: 2.02TiB
> >> Device allocated: 2.01TiB
> >> Device unallocated: 10.03GiB
> >> Device missing: 0.00B
> >> Used: 1.70TiB
> >> Free (estimated): 296.02GiB (min: 291.01GiB)
> >> Data ratio: 1.00
> >> Metadata ratio: 2.00
> >> Global reserve: 2.00GiB (used: 864.00KiB)
> >>
> >> Data,single: Size:1.81TiB, Used:1.53TiB
> >> /dev/mapper/vgnebtest-tvdb 928.97GiB
> >> /dev/mapper/vgnebtest-tvdc 929.00GiB
> >>
> >> Metadata,RAID1: Size:99.99GiB, Used:82.29GiB
> >> /dev/mapper/vgnebtest-lvol1 5.00GiB
> >> /dev/mapper/vgnebtest-lvol2 5.00GiB
> >> /dev/mapper/vgnebtest-tvdb 94.99GiB
> >> /dev/mapper/vgnebtest-tvdc 94.99GiB
> >>
> >> System,RAID1: Size:8.00MiB, Used:240.00KiB
> >> /dev/mapper/vgnebtest-tvdb 8.00MiB
> >> /dev/mapper/vgnebtest-tvdc 8.00MiB
> >>
> >> Unallocated:
> >> /dev/mapper/vgnebtest-lvol1 5.00GiB
> >> /dev/mapper/vgnebtest-lvol2 5.00GiB
> >> /dev/mapper/vgnebtest-tvdb 30.00MiB
> >> /dev/mapper/vgnebtest-tvdc 1.00MiB
> >>
> >> Note "lvol1" and "lvol2" were added after the initial failure.
> >>
> >> There is clearly a lot of metadata space available, but the filesystem still
> >> intermittently hits ENOSPC while trying to free space.
> >
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
next prev parent reply other threads:[~2019-06-24 11:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-23 23:06 Global reserve and ENOSPC while deleting snapshots on 5.0.9 Zygo Blaxell
2019-04-24 2:57 ` Paul Jones
2019-04-24 3:58 ` Zygo Blaxell
2019-06-23 14:14 ` Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11 Zygo Blaxell
2019-06-24 10:01 ` Martin Raiber
2019-06-24 11:36 ` Zygo Blaxell [this message]
2019-07-01 3:07 ` Global reserve and ENOSPC while deleting orphan inodes on 5.1.14 Zygo Blaxell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190624113607.GC11820@hungrycats.org \
--to=ce3g8jdj@umail.furryterror.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=martin@urbackup.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).