linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Mahoney <jeffm@suse.com>
To: fdmanana@gmail.com
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2] btrfs: iterate over unused chunk space in FITRIM
Date: Fri, 08 May 2015 18:17:26 -0400	[thread overview]
Message-ID: <554D35F6.8040601@suse.com> (raw)
In-Reply-To: <CAL3q7H5O0w+_EPSbpMyWTYzyzyCOxHmU-wPRx7w85ftUzHzXgQ@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/8/15 4:38 PM, Filipe David Manana wrote:
> On Tue, May 5, 2015 at 7:19 PM, Jeff Mahoney <jeffm@suse.com>
> wrote:
>> Since we now clean up block groups automatically as they become 
>> empty, iterating over block groups is no longer sufficient to
>> discard unused space.
>> 
>> This patch iterates over the unused chunk space and discards any
>> regions that are unallocated, regardless of whether they were
>> ever used.  This is a change for btrfs but is consistent with
>> other file systems.
>> 
>> We do this in a transactionless manner since the discard process
>> can take a substantial amount of time and a transaction would
>> need to be started before the acquisition of the device list
>> lock.  That would mean a transaction would be held open across
>> /all/ of the discards collectively. In order to prevent other
>> threads from allocating or freeing chunks, we hold the chunks
>> lock across the search and discard calls.  We release it between
>> searches to allow the file system to perform more-or-less 
>> normally.  Since the running transaction can commit and disappear
>> while we're using the transaction pointer, we take a reference to
>> it and release it after the search.  This is safe since it would
>> happen normally at the end of the transaction commit after any
>> locks are released anyway.
>> 
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com> ---
> 
> Hi Jeff,
> 
> Missing changelog describing what changed between v1 and v2.
> 
>> fs/btrfs/extent-tree.c | 96
>> ++++++++++++++++++++++++++++++++++++++++++++++++++ 
>> fs/btrfs/volumes.c     | 61 ++++++++++++++++++++------------ 
>> fs/btrfs/volumes.h     |  3 ++ 3 files changed, 137
>> insertions(+), 23 deletions(-)
>> 
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c 
>> index 0ec8e22..6d1d74d 100644 --- a/fs/btrfs/extent-tree.c +++
>> b/fs/btrfs/extent-tree.c @@ -10031,10 +10031,94 @@ int
>> btrfs_error_unpin_extent_range(struct btrfs_root *root, u64
>> start, u64 end) return unpin_extent_range(root, start, end,
>> false); }
>> 
>> +/* + * It used to be that old block groups would be left around
>> forever. + * Iterating over them would be enough to trim unused
>> space.  Since we + * now automatically remove them, we also need
>> to iterate over unallocated + * space. + * + * We don't want a
>> transaction for this since the discard may take a + * substantial
>> amount of time.  We don't require that a transaction be + *
>> running, but we do need to take a running transaction into
>> account + * to ensure that we're not discarding chunks that were
>> released in + * the current transaction. + * + * Holding the
>> chunks lock will prevent other threads from allocating + * or
>> releasing chunks, but it won't prevent a running transaction + *
>> from committing and releasing the memory that the pending chunks 
>> + * list head uses.  For that, we need to take a reference to
>> the + * transaction. + */ +static int
>> btrfs_trim_free_extents(struct btrfs_device *device, +
>> u64 minlen, u64 *trimmed) +{ +       u64 start = 0, len = 0; +
>> int ret; + +       *trimmed = 0; + +       /* Not writeable =
>> nothing to do. */ +       if (!device->writeable) +
>> return 0; + +       /* No free space = nothing to do. */ +
>> if (device->total_bytes <= device->bytes_used) +
>> return 0; + +       ret = 0; + +       while (1) { +
>> struct btrfs_fs_info *fs_info = device->dev_root->fs_info; +
>> struct btrfs_transaction *trans; + +               ret =
>> mutex_lock_interruptible(&fs_info->chunk_mutex); +
>> if (ret) +                       return ret; + +
>> spin_lock(&fs_info->trans_lock); +               trans =
>> fs_info->running_transaction; +               if (trans) +
>> atomic_inc(&trans->use_count); +
>> spin_unlock(&fs_info->trans_lock); + +               ret =
>> find_free_dev_extent_start(trans, device, minlen, start, +
>> &start, &len); +               if (trans) +
>> btrfs_put_transaction(trans); + +               if (ret) { +
>> mutex_unlock(&fs_info->chunk_mutex); +                       if
>> (ret == -ENOSPC) +                               ret = 0; +
>> break; +               } + +               ret =
>> btrfs_issue_discard(device->bdev, start, len); +
>> mutex_unlock(&fs_info->chunk_mutex); + +               if (ret) +
>> break; + +               start += len; +               *trimmed
>> += len; + +               if (fatal_signal_pending(current)) { +
>> ret = -ERESTARTSYS; +                       break; +
>> } + +               cond_resched(); +       } + +       return
>> ret; +} + int btrfs_trim_fs(struct btrfs_root *root, struct
>> fstrim_range *range) { struct btrfs_fs_info *fs_info =
>> root->fs_info; struct btrfs_block_group_cache *cache = NULL; +
>> struct btrfs_device *device; +       struct list_head *devices; 
>> u64 group_trimmed; u64 start; u64 end; @@ -10089,6 +10173,18 @@
>> int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range
>> *range) cache = next_block_group(fs_info->tree_root, cache); }
>> 
>> +
>> mutex_lock(&root->fs_info->fs_devices->device_list_mutex); +
>> devices = &root->fs_info->fs_devices->alloc_list; +
>> list_for_each_entry(device, devices, dev_alloc_list) { +
>> ret = btrfs_trim_free_extents(device, range->minlen, +
>> &group_trimmed); +               if (ret) +
>> break; + +               trimmed += group_trimmed; +       } +
>> mutex_unlock(&root->fs_info->fs_devices->device_list_mutex); + 
>> range->len = trimmed; return ret; } diff --git
>> a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 96aebf3..ada8965
>> 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@
>> -1051,15 +1051,19 @@ out: return ret; }
>> 
>> -static int contains_pending_extent(struct btrfs_trans_handle
>> *trans, +static int contains_pending_extent(struct
>> btrfs_transaction *transaction, struct btrfs_device *device, u64
>> *start, u64 len) { +       struct btrfs_fs_info *fs_info =
>> device->dev_root->fs_info; struct extent_map *em; -       struct
>> list_head *search_list = &trans->transaction->pending_chunks; +
>> struct list_head *search_list = &transaction->pending_chunks;
> 
> transaction can be NULL so we need to check for that first (as you
> do below).

The ordering here is safe. We're not dereferencing the transaction,
we're getting the address of one of the members. search_list will
point to 0x100 or something. We don't ever dereference that without
checking the transaction against NULL.

- -Jeff

>> int ret = 0; u64 physical_start = *start;
>> 
>> +       if (!transaction) +               search_list =
>> &fs_info->pinned_chunks; + again: list_for_each_entry(em,
>> search_list, list) { struct map_lookup *map; @@ -1078,8 +1082,8
>> @@ again: ret = 1; } } -       if (search_list ==
>> &trans->transaction->pending_chunks) { -
>> search_list = &trans->root->fs_info->pinned_chunks; +       if
>> (search_list == &transaction->pending_chunks) { +
>> search_list = &fs_info->pinned_chunks; goto again; }
>> 
>> @@ -1088,12 +1092,13 @@ again:
>> 
>> 
>> /* - * find_free_dev_extent - find free space in the specified
>> device - * @device:    the device which we search the free space
>> in - * @num_bytes: the size of the free space that we need - *
>> @start:     store the start of the free space. - * @len:
>> the size of the free space. that we find, or the size of the max 
>> - *             free space if we don't find suitable free space +
>> * find_free_dev_extent_start - find free space in the specified
>> device + * @device:      the device which we search the free
>> space in + * @num_bytes:   the size of the free space that we
>> need + * @search_start: the position from which to begin the
>> search + * @start:       store the start of the free space. + *
>> @len:         the size of the free space. that we find, or the
>> size + *               of the max free space if we don't find
>> suitable free space * * this uses a pretty simple search, the
>> expectation is that it is * called very infrequently and that a
>> given device has a small number @@ -1107,9 +1112,9 @@ again: *
>> But if we don't find suitable free space, it is used to store the
>> size of * the max free space. */ -int find_free_dev_extent(struct
>> btrfs_trans_handle *trans, -                        struct
>> btrfs_device *device, u64 num_bytes, -                        u64
>> *start, u64 *len) +int find_free_dev_extent_start(struct
>> btrfs_transaction *transaction, +
>> struct btrfs_device *device, u64 num_bytes, +
>> u64 search_start, u64 *start, u64 *len) { struct btrfs_key key; 
>> struct btrfs_root *root = device->dev_root; @@ -1119,19 +1124,11
>> @@ int find_free_dev_extent(struct btrfs_trans_handle *trans, u64
>> max_hole_start; u64 max_hole_size; u64 extent_end; -       u64
>> search_start; u64 search_end = device->total_bytes; int ret; int
>> slot; struct extent_buffer *l;
>> 
>> -       /* FIXME use last free of some kind */ - -       /* we
>> don't want to overwrite the superblock on the drive, -        *
>> so we make sure to start at an offset of at least 1MB -
>> */ -       search_start = max(root->fs_info->alloc_start, 1024ull
>> * 1024); - path = btrfs_alloc_path(); if (!path) return -ENOMEM; 
>> @@ -1192,7 +1189,7 @@ again: * Have to check before we set
>> max_hole_start, otherwise * we could end up sending back this
>> offset anyway. */ -                       if
>> (contains_pending_extent(trans, device, +
>> if (contains_pending_extent(transaction, device, &search_start, 
>> hole_size)) { if (key.offset >= search_start) { @@ -1241,7
>> +1238,7 @@ next: if (search_end > search_start) { hole_size =
>> search_end - search_start;
>> 
>> -               if (contains_pending_extent(trans, device,
>> &search_start, +               if
>> (contains_pending_extent(transaction, device, &search_start, 
>> hole_size)) { btrfs_release_path(path); goto again; @@ -1267,6
>> +1264,24 @@ out: return ret; }
>> 
>> +int find_free_dev_extent(struct btrfs_trans_handle *trans, +
>> struct btrfs_device *device, u64 num_bytes, +
>> u64 *start, u64 *len) +{ +       struct btrfs_root *root =
>> device->dev_root; +       u64 search_start; + +       /* FIXME
>> use last free of some kind */ + +       /* +        * we don't
>> want to overwrite the superblock on the drive, +        * so we
>> make sure to start at an offset of at least 1MB +        */ +
>> search_start = max(root->fs_info->alloc_start, 1024ull * 1024); +
>> return find_free_dev_extent_start(trans->transaction, device, +
>> num_bytes, search_start, start, len); +} + static int
>> btrfs_free_dev_extent(struct btrfs_trans_handle *trans, struct
>> btrfs_device *device, u64 start, u64 *dev_extent_len) diff --git
>> a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index ebc3133..30918a8
>> 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@
>> -449,6 +449,9 @@ int btrfs_cancel_balance(struct btrfs_fs_info
>> *fs_info); int btrfs_create_uuid_tree(struct btrfs_fs_info
>> *fs_info); int btrfs_check_uuid_tree(struct btrfs_fs_info
>> *fs_info); int btrfs_chunk_readonly(struct btrfs_root *root, u64
>> chunk_offset); +int find_free_dev_extent_start(struct
>> btrfs_transaction *transaction, +                        struct
>> btrfs_device *device, u64 num_bytes, +                        u64
>> search_start, u64 *start, u64 *max_avail); int
>> find_free_dev_extent(struct btrfs_trans_handle *trans, struct
>> btrfs_device *device, u64 num_bytes, u64 *start, u64
>> *max_avail); -- 1.8.5.6
>> 
>> 
>> -- Jeff Mahoney SUSE Labs -- To unsubscribe from this list: send
>> the line "unsubscribe linux-btrfs" in the body of a message to
>> majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
> 
> 
> 


- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJVTTX2AAoJEB57S2MheeWyUGYP/iIl5JchDH1JqFV/AWtdrf4B
huxL1gFYH8hAVj0pr1av7+EnbPp9rPwIFZXanvbJC9KFM3SCGEnF9/IYHeTzxiUl
ZH7UC6E4xnr4xXrgKVT5nhOIDwg5cSdvuBzGfG6HttdL7KxhiHNGATDAXIn/hU1h
XbU+aze8RR5+AHmh4ZboMAJr0/2VRU7C5aHyzvnCJbBwY6IqaI+RLyJ5ClQsIK1H
PTMAgNmKogieEmNcFDq4UKaMZDZ0lj4A3QDJ6XAGvXyFcecncz8vF9QrurusslDO
b074P1QVnzCPlejiIxsalPjT441tvWTPc5tcIa7Is6Zuoe9ezjN3LFwUA1Yvr061
0/D667gN2/+Tjgak5zAvD/efkvzXdZV1GOm9fZdFeR6cBhXVYv2sZVBmf7llEtn6
EAk6gqqHmdJQT1SgG1hOqSGVAKH4FbGUGq40zuKpey8Yna0iOaYxMssYxEnJIeeP
EPt56nIv84Np69pFIsembp1ZB6Evy5yFm6DzjFwUQ8kH8NHbz1GqMDKaOCuCAXPn
hiYG0VK6tNrfwmzthxmgwlTwYtA2wx9ug/i8fnOLwkFyxnOJJxX3RGBRe5Wiqcaf
RKP2a3ZbXSl+OyNguqr0Zg4GXcgtrccSqXj9FGvUqNSVTdYoo5B6ctbKPZtxA0u1
v5n8rcnwDvuLDKJuldG4
=HZMM
-----END PGP SIGNATURE-----

  parent reply	other threads:[~2015-05-08 22:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-05 18:19 [PATCH v2] btrfs: iterate over unused chunk space in FITRIM Jeff Mahoney
2015-05-08 20:38 ` Filipe David Manana
2015-05-08 21:16   ` Filipe David Manana
2015-05-08 22:45     ` Jeff Mahoney
2015-05-14  9:33       ` Filipe David Manana
2015-05-14 13:07         ` Jeff Mahoney
2015-05-08 22:17   ` Jeff Mahoney [this message]
2015-05-09  0:18     ` Omar Sandoval
2015-05-09  3:57       ` Jeff Mahoney
2015-05-14  9:40 ` Filipe David Manana
2015-05-14 12:54   ` Jeff Mahoney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=554D35F6.8040601@suse.com \
    --to=jeffm@suse.com \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).