From: Jeff Mahoney <jeffm@suse.com>
To: fdmanana@gmail.com
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2] btrfs: iterate over unused chunk space in FITRIM
Date: Thu, 14 May 2015 09:07:51 -0400 [thread overview]
Message-ID: <55549E27.9060006@suse.com> (raw)
In-Reply-To: <CAL3q7H5cA6h8AFgOui6Lqo6LJ+t5zme+ZMC6RJd26Rn30GC5eA@mail.gmail.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 5/14/15 5:33 AM, Filipe David Manana wrote:
> On Fri, May 8, 2015 at 11:45 PM, Jeff Mahoney <jeffm@suse.com>
> wrote:
>>
>>> On May 8, 2015, at 5:16 PM, Filipe David Manana
>>> <fdmanana@gmail.com> wrote:
>>>
>>>> On Fri, May 8, 2015 at 9:38 PM, Filipe David Manana
>>>> <fdmanana@gmail.com> wrote:
>>>>> On Tue, May 5, 2015 at 7:19 PM, Jeff Mahoney
>>>>> <jeffm@suse.com> wrote: Since we now clean up block groups
>>>>> automatically as they become empty, iterating over block
>>>>> groups is no longer sufficient to discard unused space.
>>>>>
>>>>> This patch iterates over the unused chunk space and
>>>>> discards any regions that are unallocated, regardless of
>>>>> whether they were ever used. This is a change for btrfs
>>>>> but is consistent with other file systems.
>>>>>
>>>>> We do this in a transactionless manner since the discard
>>>>> process can take a substantial amount of time and a
>>>>> transaction would need to be started before the acquisition
>>>>> of the device list lock. That would mean a transaction
>>>>> would be held open across /all/ of the discards
>>>>> collectively. In order to prevent other threads from
>>>>> allocating or freeing chunks, we hold the chunks lock
>>>>> across the search and discard calls. We release it between
>>>>> searches to allow the file system to perform more-or-less
>>>>> normally. Since the running transaction can commit and
>>>>> disappear while we're using the transaction pointer, we
>>>>> take a reference to it and release it after the search.
>>>>> This is safe since it would happen normally at the end of
>>>>> the transaction commit after any locks are released
>>>>> anyway.
>>>>>
>>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com> ---
>>>>
>>>> Hi Jeff,
>>>>
>>>> Missing changelog describing what changed between v1 and v2.
>>>>
>>>>> fs/btrfs/extent-tree.c | 96
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> fs/btrfs/volumes.c | 61
>>>>> ++++++++++++++++++++------------ fs/btrfs/volumes.h |
>>>>> 3 ++ 3 files changed, 137 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/extent-tree.c
>>>>> b/fs/btrfs/extent-tree.c index 0ec8e22..6d1d74d 100644 ---
>>>>> a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@
>>>>> -10031,10 +10031,94 @@ int
>>>>> btrfs_error_unpin_extent_range(struct btrfs_root *root, u64
>>>>> start, u64 end) return unpin_extent_range(root, start, end,
>>>>> false); }
>>>>>
>>>>> +/* + * It used to be that old block groups would be left
>>>>> around forever. + * Iterating over them would be enough to
>>>>> trim unused space. Since we + * now automatically remove
>>>>> them, we also need to iterate over unallocated + * space. +
>>>>> * + * We don't want a transaction for this since the
>>>>> discard may take a + * substantial amount of time. We
>>>>> don't require that a transaction be + * running, but we do
>>>>> need to take a running transaction into account + * to
>>>>> ensure that we're not discarding chunks that were released
>>>>> in + * the current transaction. + * + * Holding the chunks
>>>>> lock will prevent other threads from allocating + * or
>>>>> releasing chunks, but it won't prevent a running
>>>>> transaction + * from committing and releasing the memory
>>>>> that the pending chunks + * list head uses. For that, we
>>>>> need to take a reference to the + * transaction. + */
>>>>> +static int btrfs_trim_free_extents(struct btrfs_device
>>>>> *device, + u64 minlen, u64
>>>>> *trimmed) +{ + u64 start = 0, len = 0; + int
>>>>> ret; + + *trimmed = 0; + + /* Not writeable =
>>>>> nothing to do. */ + if (!device->writeable) +
>>>>> return 0; + + /* No free space = nothing to do. */ +
>>>>> if (device->total_bytes <= device->bytes_used) +
>>>>> return 0; + + ret = 0; + + while (1) { +
>>>>> struct btrfs_fs_info *fs_info = device->dev_root->fs_info;
>>>>> + struct btrfs_transaction *trans; + +
>>>>> ret = mutex_lock_interruptible(&fs_info->chunk_mutex); +
>>>>> if (ret) + return ret; + +
>>>>> spin_lock(&fs_info->trans_lock); + trans =
>>>>> fs_info->running_transaction; + if (trans) +
>>>>> atomic_inc(&trans->use_count); +
>>>>> spin_unlock(&fs_info->trans_lock); + + ret =
>>>>> find_free_dev_extent_start(trans, device, minlen, start, +
>>>>> &start, &len); + if (trans) +
>>>>> btrfs_put_transaction(trans); + + if (ret) {
>>>>> +
>>>>> mutex_unlock(&fs_info->chunk_mutex); +
>>>>> if (ret == -ENOSPC) + ret =
>>>>> 0; + break; + } + +
>>>>> ret = btrfs_issue_discard(device->bdev, start, len); +
>>>>> mutex_unlock(&fs_info->chunk_mutex); + + if
>>>>> (ret) + break; + +
>>>>> start += len; + *trimmed += len; + +
>>>>> if (fatal_signal_pending(current)) { +
>>>>> ret = -ERESTARTSYS; + break; +
>>>>> } + + cond_resched(); + } + +
>>>>> return ret; +} + int btrfs_trim_fs(struct btrfs_root *root,
>>>>> struct fstrim_range *range) { struct btrfs_fs_info *fs_info
>>>>> = root->fs_info; struct btrfs_block_group_cache *cache =
>>>>> NULL; + struct btrfs_device *device; + struct
>>>>> list_head *devices; u64 group_trimmed; u64 start; u64 end;
>>>>> @@ -10089,6 +10173,18 @@ int btrfs_trim_fs(struct
>>>>> btrfs_root *root, struct fstrim_range *range) cache =
>>>>> next_block_group(fs_info->tree_root, cache); }
>>>>>
>>>>> +
>>>>> mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
>>>>> + devices = &root->fs_info->fs_devices->alloc_list; +
>>>>> list_for_each_entry(device, devices, dev_alloc_list) { +
>>>>> ret = btrfs_trim_free_extents(device, range->minlen, +
>>>>> &group_trimmed); + if (ret) +
>>>>> break; + + trimmed += group_trimmed; +
>>>>> } +
>>>>> mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
>>>>>
>>>>>
+
>>>>> range->len = trimmed; return ret; } diff --git
>>>>> a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index
>>>>> 96aebf3..ada8965 100644 --- a/fs/btrfs/volumes.c +++
>>>>> b/fs/btrfs/volumes.c @@ -1051,15 +1051,19 @@ out: return
>>>>> ret; }
>>>>>
>>>>> -static int contains_pending_extent(struct
>>>>> btrfs_trans_handle *trans, +static int
>>>>> contains_pending_extent(struct btrfs_transaction
>>>>> *transaction, struct btrfs_device *device, u64 *start, u64
>>>>> len) { + struct btrfs_fs_info *fs_info =
>>>>> device->dev_root->fs_info; struct extent_map *em; -
>>>>> struct list_head *search_list =
>>>>> &trans->transaction->pending_chunks; + struct
>>>>> list_head *search_list = &transaction->pending_chunks;
>>>>
>>>> transaction can be NULL so we need to check for that first
>>>> (as you do below).
>>>>
>>>>> int ret = 0; u64 physical_start = *start;
>>>>>
>>>>> + if (!transaction) + search_list =
>>>>> &fs_info->pinned_chunks; + again: list_for_each_entry(em,
>>>>> search_list, list) { struct map_lookup *map; @@ -1078,8
>>>>> +1082,8 @@ again: ret = 1; } } - if (search_list ==
>>>>> &trans->transaction->pending_chunks) { -
>>>>> search_list = &trans->root->fs_info->pinned_chunks; +
>>>>> if (search_list == &transaction->pending_chunks) {
>>>
>>> And here missing the null test too, which should be something
>>> like:
>>>
>>> if (transaction && search_list == &transaction->pending_chunks)
>>> {
>>>
>>
>> That's the same case as above. We're not dereferencing
>> transaction here either. The test in the if statement will fail
>> because &transaction->pending_chunks will be 0x100 or something.
>
> Right. Sorry friday evening review. What you are doing is valid
> then and something very close to the offsetof implementation.
Exactly. I got enough pushback on it that I just changed it around so
there aren't any questions, though.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
iQIcBAEBAgAGBQJVVJ4nAAoJEB57S2MheeWyOJUQAKRdRkTzw6QGPBbEHL4mz5fV
u/1EUvnsf3nUf1xvjxhnimAkxdVFJ66jV4Et3MaZXQwx/ejVByn4VfT7wi58AiYD
x3mzbGgMBB2lxHrS4OZD2yO7nh+n9IRSKE3yVuKkqgXWqL0meyJ5FvU3cpVWaj5u
v7ka5pyFEIlaymJqcvFJX57uKYKgUKJkAreGYqEvvCpZYc+BWibmp1svsN66824l
kO0aBLIz0WCPc0CuG3i9YSRXMSIbf8sV6QygpQiaCQ2WfJiDNZINjNwUzK+mwBq9
UtMs3LiyqRMMIINSM6Ey14s842ov6Xh3DsIDS1YzqzyE0it4tYjS6o5lTK2vpmrM
RQVDGxu3btlFlWfTPOP32Ts1tOWdHB6bMOezca53p3I/sLvA5g5snonqbEDPltov
DnHNTskDdFAuh/3QzTpGHU93cPZJE3vp8bpQfc1UQuUs9NzN/b26ghxau1Rto87q
v0KKcp9qnU+2nQ/6TxKekudKBWTT+glene73ddHct0WRwtQWwPIZihL028llzHF4
Iu9JEE3ov879PMTvUzFFfc6cSKPQGy9UNjn/O0E0G6km/GFAW/BHNDeywa0O+Smh
DPJ4GLXYHKw4JHqGYtbuoMwt2KXZd/9XmCUiBPo0s+wRCqz9Ms3J1T1UMmhqJDOt
lLa/DggUBMYfjGOVHS/2
=WAYC
-----END PGP SIGNATURE-----
next prev parent reply other threads:[~2015-05-14 13:07 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-05 18:19 [PATCH v2] btrfs: iterate over unused chunk space in FITRIM Jeff Mahoney
2015-05-08 20:38 ` Filipe David Manana
2015-05-08 21:16 ` Filipe David Manana
2015-05-08 22:45 ` Jeff Mahoney
2015-05-14 9:33 ` Filipe David Manana
2015-05-14 13:07 ` Jeff Mahoney [this message]
2015-05-08 22:17 ` Jeff Mahoney
2015-05-09 0:18 ` Omar Sandoval
2015-05-09 3:57 ` Jeff Mahoney
2015-05-14 9:40 ` Filipe David Manana
2015-05-14 12:54 ` Jeff Mahoney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55549E27.9060006@suse.com \
--to=jeffm@suse.com \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).