From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz, jeffm@suse.com
Subject: Re: [PATCH 00/14] Qgroup metadata reservation rework
Date: Tue, 12 Dec 2017 16:16:08 +0200
Message-ID: <4c56a262-51fb-b66d-3d1b-f4b0a906cef5@suse.com>
In-Reply-To: <20171212073436.16447-1-wqu@suse.com>
On 12.12.2017 09:34, Qu Wenruo wrote:
> [Overall]
> The previous rework of the qgroup reservation system put a lot of effort
> into data, which works quite well.
>
> But it paid less attention to metadata reservation, causing problems
> like metadata reservation underflow and noisy kernel warnings.
>
> This patchset will try to address the remaining problem of metadata
> reservation.
>
> The idea of new qgroup metadata reservation is to use 2 types of
> metadata reservation:
> 1) Per-transaction reservation
> Life span will be inside a transaction. Will be freed at transaction
> commit time.
>
> 2) Preallocated reservation
> For cases where we reserve space before starting a transaction.
> Operations like delalloc and delayed-inode/item belong to this type.
>
> This works similarly to block_rsv: its reservation can be
> reserved/released whenever the caller likes.
>
> The only point to notice is, if preallocated reservation is used and
> finished without problem, it should be converted to per-transaction
> type instead of just freeing.
> This is to co-operate with qgroup update at commit time.
>
> For preallocated type, this patch will integrate them into inode_rsv
> mechanism reworked by Josef, and delayed-inode/item reservation.
>
>
> [Problem: Over-reserve]
> Currently the patchset addresses metadata underflow quite well, but
> due to the over-reserving nature of btrfs and the tight coupling to
> inode_rsv, qgroup metadata reservation also tends to be over-reserved.
>
> This is especially obvious for small limits.
> For a 128M limit, it will only be able to write about 70M before hitting
> the quota limit.
> For a larger limit, like 5G, it can reach 4.5G or more
> before hitting the limit.
>
> Such over-reserving behavior can cause problems with existing test
> cases (where the limit is normally less than 20M).
>
> This could also be addressed by using more accurate space estimates
> rather than maximum estimates.
>
> For example, to calculate the metadata needed for delalloc, we use
> btrfs_calc_trans_metadata_size(), which always tries to reserve space for
> CoWing a full-height tree, and also includes the csum size.
> Both calculations are overkill for qgroup metadata reservation.
In a private chat with Chris a couple of months ago we discussed making
the reservation a lot less pessimistic. One assumption we could exploit
is that a tree split is unlikely to create more than 1 additional level
in the tree. So we could potentially modify
btrfs_calc_trans_metadata_size() to take a root parameter and, instead of
BTRFS_MAX_LEVEL * 2, use root_level * 2. How does that sound?
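
To make the difference concrete, here is a rough userspace sketch, not
kernel code. calc_metadata_size_max() follows the current
"nodesize * BTRFS_MAX_LEVEL * 2 * num_items" shape as I understand it;
calc_metadata_size_by_level() and its root_levels parameter are
hypothetical names for the suggestion above. The sketch assumes
root_levels means the number of levels in the tree (1 for a single leaf),
since a literal 0-based root level would reserve nothing for a
single-leaf tree.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum tree height in btrfs (BTRFS_MAX_LEVEL in the kernel). */
#define BTRFS_MAX_LEVEL 8

/*
 * Worst-case estimate: every item is assumed to CoW a full-height tree,
 * with a factor of 2 to cover splits.
 */
static inline uint64_t calc_metadata_size_max(uint32_t nodesize,
					      unsigned int num_items)
{
	return (uint64_t)nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
}

/*
 * Hypothetical variant: scale by the number of levels the tree has right
 * now instead of the maximum. root_levels is an assumed parameter
 * (1 = single leaf), not an existing kernel field.
 */
static inline uint64_t calc_metadata_size_by_level(uint32_t nodesize,
						   unsigned int root_levels,
						   unsigned int num_items)
{
	assert(root_levels >= 1 && root_levels <= BTRFS_MAX_LEVEL);
	return (uint64_t)nodesize * root_levels * 2 * num_items;
}
```

With a 16K nodesize and a two-level tree, the per-item reservation would
drop from 256K to 64K under these assumptions. For a tree already at
BTRFS_MAX_LEVEL the two estimates are the same.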
>
> [Patch structure]
> The patchset consists of 2 main parts:
> 1) Type based qgroup reservation
> The original patchset was sent several months ago.
> Nothing is modified at all, just rebased, with no conflicts.
>
> It's from patch 1 to patch 6.
>
> 2) Split meta qgroup reservation into per-trans and prealloc sub types
> The real work to address metadata underflow.
> Due to the over-reserve problem, this part is still in RFC state.
> But the framework should mostly be fine, only needs extra fine-tuning
> to get more accurate qgroup rsv to avoid too early limit.
>
> It's from patch 7 to 14.
>
> Qu Wenruo (14):
> btrfs: qgroup: Skeleton to support separate qgroup reservation type
> btrfs: qgroup: Introduce helpers to update and access new qgroup rsv
> btrfs: qgroup: Make qgroup_reserve and its callers to use separate
> reservation type
> btrfs: qgroup: Fix wrong qgroup reservation update for relationship
> modification
> btrfs: qgroup: Update trace events to use new separate rsv types
> btrfs: qgroup: Cleanup the remaining old reservation counters
> btrfs: qgroup: Split meta rsv type into meta_prealloc and
> meta_pertrans
> btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup
> btrfs: qgroup: Introduce function to convert META_PREALLOC into
> META_PERTRANS
> btrfs: qgroup: Use separate meta reservation type for delalloc
> btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and
> item
> btrfs: qgroup: Use root->qgroup_meta_rsv_* to record qgroup meta
> reserved space
> btrfs: qgroup: Update trace events for metadata reservation
> Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
>
> fs/btrfs/ctree.h | 15 +-
> fs/btrfs/delayed-inode.c | 50 +++++--
> fs/btrfs/disk-io.c | 2 +-
> fs/btrfs/extent-tree.c | 49 +++---
> fs/btrfs/file.c | 15 +-
> fs/btrfs/free-space-cache.c | 2 +-
> fs/btrfs/inode-map.c | 4 +-
> fs/btrfs/inode.c | 27 ++--
> fs/btrfs/ioctl.c | 10 +-
> fs/btrfs/ordered-data.c | 2 +-
> fs/btrfs/qgroup.c | 350 ++++++++++++++++++++++++++++++++-----------
> fs/btrfs/qgroup.h | 102 ++++++++++++-
> fs/btrfs/relocation.c | 9 +-
> fs/btrfs/transaction.c | 8 +-
> include/trace/events/btrfs.h | 73 ++++++++-
> 15 files changed, 537 insertions(+), 181 deletions(-)
>