linux-btrfs.vger.kernel.org archive mirror
* [PATCH v3 00/21] Rework btrfs qgroup reserved space framework
@ 2015-10-13  2:20 Qu Wenruo
From: Qu Wenruo @ 2015-10-13  2:20 UTC
  To: linux-btrfs

The previous qgroup rework succeeded in fixing the qgroup accounting
part, making the rfer/excl numbers accurate.

But that is only part of the qgroup work. The other part, qgroup
reserved space, still has quite a lot of problems and can lead to EDQUOT
even when we are far from the limit.

[[BUG]]
The easiest way to trigger the bug is,
1) Enable quota
2) Limit excl of qgroup 5 to 16M
3) Write [0,2M) of a file inside subvol 5 10 times without sync

EDQUOT will be triggered at about the 8th write.
But after a remount, we can still write until about 15M.
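
Step 3 of the reproducer can be sketched as a small script. This is only
a sketch under assumptions: the helper name rewrite_range() is
hypothetical, the target path must live inside the limited subvolume,
and steps 1 and 2 are assumed to have been done beforehand (for example
with "btrfs quota enable <mnt>" and "btrfs qgroup limit -e 16M 0/5
<mnt>"). On a filesystem without the bug, or without quota, every round
is expected to succeed.

```python
import errno
import os


def rewrite_range(path, length=2 * 1024 * 1024, rounds=10):
    """Buffered-write [0, length) `rounds` times without any sync.

    Returns the number of rounds completed before EDQUOT was hit,
    or `rounds` if every write succeeded.
    """
    buf = b"\xaa" * length
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for done in range(rounds):
            offset = 0
            try:
                # pwrite() may write fewer bytes than asked; loop until
                # the whole range [0, length) has been written.
                while offset < length:
                    offset += os.pwrite(fd, buf[offset:], offset)
            except OSError as exc:
                if exc.errno == errno.EDQUOT:
                    return done
                raise
        return rounds
    finally:
        os.close(fd)
```

A return value smaller than `rounds` means EDQUOT was hit although the
file never holds more than `length` bytes of data.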

[[CAUSE]]
The problem is caused by the fact that qgroup reserves space even when
the data space is already reserved.

In the above reproducer, each buffered write of [0,2M) makes qgroup
reserve 2M of space. But in fact the 1st write has already reserved 2M,
and from then on we don't need to reserve any more data space, as we are
only rewriting [0,2M).

Also, the reserved space will only be freed *ONCE*, when its backref is
run at commit_transaction() time.

That is what causes the reserved space to leak.
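
The arithmetic of the leak can be illustrated with a toy model. It is
not kernel code: NaiveReservation and simulate() are hypothetical names,
and the model only counts bytes, assuming that every buffered write
reserves its full length again and that the commit frees the extent
length exactly once.

```python
MiB = 1024 * 1024


class NaiveReservation:
    """Toy counter: each buffered write reserves its full length again."""

    def __init__(self):
        self.reserved = 0

    def buffered_write(self, start, length):
        # The range is ignored on purpose: no record of what is
        # already reserved is kept, which is the root of the problem.
        self.reserved += length

    def commit(self, extent_length):
        # Freed only once, when the backref of the extent is run.
        self.reserved -= extent_length


def simulate(rounds=10, length=2 * MiB):
    """Return (reserved bytes after each write, bytes left after commit)."""
    q = NaiveReservation()
    history = []
    for _ in range(rounds):
        q.buffered_write(0, length)
        history.append(q.reserved)
    q.commit(length)
    return history, q.reserved
```

In this model the counter reaches 16M at the 8th write, which is
consistent with EDQUOT showing up at about that point, and 18M is still
counted as reserved after the commit although only 2M was ever dirty.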

[[FIX]]
Reuse the existing io_tree facilities to record which ranges are already
reserved for qgroup.

Although the qgroup reserved space behavior is quite similar to the
already existing DELALLOC flag, fallocate doesn't go through the DELALLOC
flag, so we introduce a new extent flag, EXTENT_QGROUP_RESERVED, for our
own purpose, without interfering with any existing flag.

The new API itself is quite safe: a careless caller that reserves or
frees a range twice or more won't cause any problem, due to the nature
of the design.
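
Why a range record makes double reserve/free harmless can be shown with
a minimal sketch. It is an illustration only, not the io_tree
implementation: ReservedRanges is a hypothetical class that keeps a
plain sorted list of [start, end) ranges, whereas the patchset records
the state as EXTENT_QGROUP_RESERVED bits in the existing io_tree.

```python
class ReservedRanges:
    """Toy range record: reserve/free report only the bytes that changed."""

    def __init__(self):
        self.ranges = []  # sorted, disjoint [start, end) tuples

    def _covered(self, start, end):
        # Bytes of [start, end) that are already recorded as reserved.
        return sum(min(e, end) - max(s, start)
                   for s, e in self.ranges if s < end and e > start)

    def reserve(self, start, length):
        """Mark the range reserved; return the newly reserved bytes."""
        end = start + length
        newly = length - self._covered(start, end)
        kept = []
        for s, e in self.ranges:
            if e < start or s > end:
                kept.append((s, e))
            else:
                # Overlapping or adjacent: merge into the new range.
                start, end = min(s, start), max(e, end)
        kept.append((start, end))
        self.ranges = sorted(kept)
        return newly

    def free(self, start, length):
        """Clear the range; return the bytes that were really reserved."""
        end = start + length
        freed = self._covered(start, end)
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))
                continue
            if s < start:
                kept.append((s, start))
            if e > end:
                kept.append((end, e))
        self.ranges = kept
        return freed
```

Reserving [0,2M) ten times returns 2M for the first call and 0 for the
other nine, and freeing an already freed range returns 0, so the qgroup
counter would only ever be adjusted by the bytes that really changed
state.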

[[PATCH STRUCTURE]]
As the patchset is fairly large, it can be split into the following parts:
1) Accurate reserve space framework API (Patch 1 ~ 8)
   Use io_tree to implement the needed data reserve API,
   and slightly change the metadata reserve API.

2) Apply needed hooks to related callers (Patch 9 ~ 18)
   The following functions need to be converted to use the new qgroup
   reserve API:
   btrfs_check_data_free_space()
   btrfs_free_reserved_data_space()
   btrfs_delalloc_reserve_space()
   btrfs_delalloc_release_space()

   And the following function needs to change its behavior for accurate
   qgroup reserve space:
   btrfs_fallocate()

   Also add ftrace support for new APIs in patch 17.

3) Minor enhancements and fixes (Patch 19 ~ 21)
   Avoid unneeded page truncating (Patch 19)
   Fix a deadlock caused by locking io_tree while the io_tree lock is
   already held in set_bit_hook() (Patch 20)
   And finally, make leaked qgroup reserved space much more obvious for
   further debugging (Patch 21)

[[Changelog]]
v2:
  Add new handlers to avoid leaking reserved space for a buffered write
  followed by a truncate:
    btrfs_invalidatepage()
    evict_inode_truncate_page()
  Add new handlers to avoid leaking reserved space in error handling
  routines:
    btrfs_free_reserved_data_space()
    btrfs_delalloc_release_space()

v3:
  Use io_tree to implement the data reserve map, which hugely reduced the
  patchset size, from 1300+ net inserted lines to 600+.
  Suggested-by: Josef Bacik <jbacik@fb.com>

Qu Wenruo (21):
  btrfs: extent_io: Introduce needed structure for recoding set/clear
    bits
  btrfs: extent_io: Introduce new function set_record_extent_bits
  btrfs: extent_io: Introduce new function clear_record_extent_bits()
  btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
  btrfs: qgroup: Introduce functions to release/free qgroup reserve data
    space
  btrfs: delayed_ref: Add new function to record reserved space into
    delayed ref
  btrfs: delayed_ref: release and free qgroup reserved at proper timing
  btrfs: qgroup: Introduce new functions to reserve/free metadata
  btrfs: qgroup: Use new metadata reservation.
  btrfs: extent-tree: Add new version of btrfs_check_data_free_space and
    btrfs_free_reserved_data_space.
  btrfs: extent-tree: Switch to new check_data_free_space and
    free_reserved_data_space
  btrfs: extent-tree: Add new version of
    btrfs_delalloc_reserve/release_space
  btrfs: extent-tree: Switch to new delalloc space reserve and release
  btrfs: qgroup: Cleanup old inaccurate facilities
  btrfs: qgroup: Add handler for NOCOW and inline
  btrfs: Add handler for invalidate page
  btrfs: qgroup: Add new trace point for qgroup data reserve
  btrfs: fallocate: Add support to accurate qgroup reserve
  btrfs: Avoid truncate tailing page if fallocate range doesn't exceed
    inode size
  btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in
    clear_bit_hook
  btrfs: qgroup: Check if qgroup reserved space leaked

 fs/btrfs/ctree.h             |  14 ++-
 fs/btrfs/delayed-ref.c       |  29 +++++++
 fs/btrfs/delayed-ref.h       |  14 +++
 fs/btrfs/disk-io.c           |   1 +
 fs/btrfs/extent-tree.c       | 149 ++++++++++++++++++++++----------
 fs/btrfs/extent_io.c         | 121 +++++++++++++++++++-------
 fs/btrfs/extent_io.h         |  19 +++++
 fs/btrfs/file.c              | 193 +++++++++++++++++++++++++++++------------
 fs/btrfs/inode-map.c         |   6 +-
 fs/btrfs/inode.c             |  86 ++++++++++++++++---
 fs/btrfs/ioctl.c             |  10 ++-
 fs/btrfs/qgroup.c            | 199 ++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/qgroup.h            |  31 ++++++-
 fs/btrfs/relocation.c        |   8 +-
 fs/btrfs/transaction.c       |  34 ++------
 fs/btrfs/transaction.h       |   1 -
 include/trace/events/btrfs.h | 113 ++++++++++++++++++++++++
 17 files changed, 832 insertions(+), 196 deletions(-)

-- 
2.6.1


Thread overview: 35+ messages
2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 01/21] btrfs: extent_io: Introduce needed structure for recoding set/clear bits Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 02/21] btrfs: extent_io: Introduce new function set_record_extent_bits Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 03/21] btrfs: extent_io: Introduce new function clear_record_extent_bits() Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 04/21] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 05/21] btrfs: qgroup: Introduce functions to release/free qgroup reserve data space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
2015-10-25 14:39   ` Filipe Manana
2015-10-26  1:27     ` Qu Wenruo
2015-10-27  4:13     ` Qu Wenruo
2015-10-27  5:14       ` Chris Mason
2015-10-27  5:48         ` Qu Wenruo
2015-10-27  6:12           ` Chris Mason
2015-10-27  7:26             ` Qu Wenruo
2015-10-27  9:05             ` Qu Wenruo
2015-10-27 11:34               ` Chris Mason
2015-10-28  0:25                 ` Qu Wenruo
2015-10-28 13:36                 ` Holger Hoffstätte
2015-10-29  6:29                   ` Chris Mason
2015-10-27  9:22       ` Filipe Manana
2015-10-13  2:20 ` [PATCH v3 07/21] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 08/21] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 09/21] btrfs: qgroup: Use new metadata reservation Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 10/21] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 11/21] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 12/21] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 13/21] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 14/21] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 15/21] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 16/21] btrfs: Add handler for invalidate page Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 17/21] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 18/21] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 19/21] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 20/21] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 21/21] btrfs: qgroup: Check if qgroup reserved space leaked Qu Wenruo
