From: Jeff Mahoney <jeffm@suse.com>
To: fdmanana@gmail.com
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2] btrfs: iterate over unused chunk space in FITRIM
Date: Thu, 14 May 2015 09:07:51 -0400 [thread overview]
Message-ID: <55549E27.9060006@suse.com> (raw)
In-Reply-To: <CAL3q7H5cA6h8AFgOui6Lqo6LJ+t5zme+ZMC6RJd26Rn30GC5eA@mail.gmail.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 5/14/15 5:33 AM, Filipe David Manana wrote:
> On Fri, May 8, 2015 at 11:45 PM, Jeff Mahoney <jeffm@suse.com>
> wrote:
>>
>>> On May 8, 2015, at 5:16 PM, Filipe David Manana
>>> <fdmanana@gmail.com> wrote:
>>>
>>>> On Fri, May 8, 2015 at 9:38 PM, Filipe David Manana
>>>> <fdmanana@gmail.com> wrote:
>>>>> On Tue, May 5, 2015 at 7:19 PM, Jeff Mahoney
>>>>> <jeffm@suse.com> wrote: Since we now clean up block groups
>>>>> automatically as they become empty, iterating over block
>>>>> groups is no longer sufficient to discard unused space.
>>>>>
>>>>> This patch iterates over the unused chunk space and
>>>>> discards any regions that are unallocated, regardless of
>>>>> whether they were ever used. This is a change for btrfs
>>>>> but is consistent with other file systems.
>>>>>
>>>>> We do this in a transactionless manner since the discard
>>>>> process can take a substantial amount of time and a
>>>>> transaction would need to be started before the acquisition
>>>>> of the device list lock. That would mean a transaction
>>>>> would be held open across /all/ of the discards
>>>>> collectively. In order to prevent other threads from
>>>>> allocating or freeing chunks, we hold the chunks lock
>>>>> across the search and discard calls. We release it between
>>>>> searches to allow the file system to perform more-or-less
>>>>> normally. Since the running transaction can commit and
>>>>> disappear while we're using the transaction pointer, we
>>>>> take a reference to it and release it after the search.
>>>>> This is safe since it would happen normally at the end of
>>>>> the transaction commit after any locks are released
>>>>> anyway.
>>>>>
>>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com> ---
>>>>
>>>> Hi Jeff,
>>>>
>>>> Missing changelog describing what changed between v1 and v2.
>>>>
>>>>> fs/btrfs/extent-tree.c | 96
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> fs/btrfs/volumes.c | 61
>>>>> ++++++++++++++++++++------------ fs/btrfs/volumes.h |
>>>>> 3 ++ 3 files changed, 137 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/extent-tree.c
>>>>> b/fs/btrfs/extent-tree.c index 0ec8e22..6d1d74d 100644 ---
>>>>> a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@
>>>>> -10031,10 +10031,94 @@ int
>>>>> btrfs_error_unpin_extent_range(struct btrfs_root *root, u64
>>>>> start, u64 end) return unpin_extent_range(root, start, end,
>>>>> false); }
>>>>>
>>>>> +/* + * It used to be that old block groups would be left
>>>>> around forever. + * Iterating over them would be enough to
>>>>> trim unused space. Since we + * now automatically remove
>>>>> them, we also need to iterate over unallocated + * space. +
>>>>> * + * We don't want a transaction for this since the
>>>>> discard may take a + * substantial amount of time. We
>>>>> don't require that a transaction be + * running, but we do
>>>>> need to take a running transaction into account + * to
>>>>> ensure that we're not discarding chunks that were released
>>>>> in + * the current transaction. + * + * Holding the chunks
>>>>> lock will prevent other threads from allocating + * or
>>>>> releasing chunks, but it won't prevent a running
>>>>> transaction + * from committing and releasing the memory
>>>>> that the pending chunks + * list head uses. For that, we
>>>>> need to take a reference to the + * transaction. + */
>>>>> +static int btrfs_trim_free_extents(struct btrfs_device
>>>>> *device, + u64 minlen, u64
>>>>> *trimmed) +{ + u64 start = 0, len = 0; + int
>>>>> ret; + + *trimmed = 0; + + /* Not writeable =
>>>>> nothing to do. */ + if (!device->writeable) +
>>>>> return 0; + + /* No free space = nothing to do. */ +
>>>>> if (device->total_bytes <= device->bytes_used) +
>>>>> return 0; + + ret = 0; + + while (1) { +
>>>>> struct btrfs_fs_info *fs_info = device->dev_root->fs_info;
>>>>> + struct btrfs_transaction *trans; + +
>>>>> ret = mutex_lock_interruptible(&fs_info->chunk_mutex); +
>>>>> if (ret) + return ret; + +
>>>>> spin_lock(&fs_info->trans_lock); + trans =
>>>>> fs_info->running_transaction; + if (trans) +
>>>>> atomic_inc(&trans->use_count); +
>>>>> spin_unlock(&fs_info->trans_lock); + + ret =
>>>>> find_free_dev_extent_start(trans, device, minlen, start, +
>>>>> &start, &len); + if (trans) +
>>>>> btrfs_put_transaction(trans); + + if (ret) {
>>>>> +
>>>>> mutex_unlock(&fs_info->chunk_mutex); +
>>>>> if (ret == -ENOSPC) + ret =
>>>>> 0; + break; + } + +
>>>>> ret = btrfs_issue_discard(device->bdev, start, len); +
>>>>> mutex_unlock(&fs_info->chunk_mutex); + + if
>>>>> (ret) + break; + +
>>>>> start += len; + *trimmed += len; + +
>>>>> if (fatal_signal_pending(current)) { +
>>>>> ret = -ERESTARTSYS; + break; +
>>>>> } + + cond_resched(); + } + +
>>>>> return ret; +} + int btrfs_trim_fs(struct btrfs_root *root,
>>>>> struct fstrim_range *range) { struct btrfs_fs_info *fs_info
>>>>> = root->fs_info; struct btrfs_block_group_cache *cache =
>>>>> NULL; + struct btrfs_device *device; + struct
>>>>> list_head *devices; u64 group_trimmed; u64 start; u64 end;
>>>>> @@ -10089,6 +10173,18 @@ int btrfs_trim_fs(struct
>>>>> btrfs_root *root, struct fstrim_range *range) cache =
>>>>> next_block_group(fs_info->tree_root, cache); }
>>>>>
>>>>> +
>>>>> mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
>>>>> + devices = &root->fs_info->fs_devices->alloc_list; +
>>>>> list_for_each_entry(device, devices, dev_alloc_list) { +
>>>>> ret = btrfs_trim_free_extents(device, range->minlen, +
>>>>> &group_trimmed); + if (ret) +
>>>>> break; + + trimmed += group_trimmed; +
>>>>> } +
>>>>> mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
>>>>>
>>>>>
+
>>>>> range->len = trimmed; return ret; } diff --git
>>>>> a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index
>>>>> 96aebf3..ada8965 100644 --- a/fs/btrfs/volumes.c +++
>>>>> b/fs/btrfs/volumes.c @@ -1051,15 +1051,19 @@ out: return
>>>>> ret; }
>>>>>
>>>>> -static int contains_pending_extent(struct
>>>>> btrfs_trans_handle *trans, +static int
>>>>> contains_pending_extent(struct btrfs_transaction
>>>>> *transaction, struct btrfs_device *device, u64 *start, u64
>>>>> len) { + struct btrfs_fs_info *fs_info =
>>>>> device->dev_root->fs_info; struct extent_map *em; -
>>>>> struct list_head *search_list =
>>>>> &trans->transaction->pending_chunks; + struct
>>>>> list_head *search_list = &transaction->pending_chunks;
>>>>
>>>> transaction can be NULL so we need to check for that first
>>>> (as you do below).
>>>>
>>>>> int ret = 0; u64 physical_start = *start;
>>>>>
>>>>> + if (!transaction) + search_list =
>>>>> &fs_info->pinned_chunks; + again: list_for_each_entry(em,
>>>>> search_list, list) { struct map_lookup *map; @@ -1078,8
>>>>> +1082,8 @@ again: ret = 1; } } - if (search_list ==
>>>>> &trans->transaction->pending_chunks) { -
>>>>> search_list = &trans->root->fs_info->pinned_chunks; +
>>>>> if (search_list == &transaction->pending_chunks) {
>>>
>>> And here missing the null test too, which should be something
>>> like:
>>>
>>> if (transaction && search_list == &transaction->pending_chunks)
>>> {
>>>
>>
>> That's the same case as above. We're not dereferencing
>> transaction here either. The test in the if statement will fail
>> because &transaction->pending_chunks will be 0x100 or something.
>
> Right. Sorry friday evening review. What you are doing is valid
> then and something very close to the offsetof implementation.
Exactly. I got enough pushback on it that I just changed it around so
there aren't any questions, though.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
iQIcBAEBAgAGBQJVVJ4nAAoJEB57S2MheeWyOJUQAKRdRkTzw6QGPBbEHL4mz5fV
u/1EUvnsf3nUf1xvjxhnimAkxdVFJ66jV4Et3MaZXQwx/ejVByn4VfT7wi58AiYD
x3mzbGgMBB2lxHrS4OZD2yO7nh+n9IRSKE3yVuKkqgXWqL0meyJ5FvU3cpVWaj5u
v7ka5pyFEIlaymJqcvFJX57uKYKgUKJkAreGYqEvvCpZYc+BWibmp1svsN66824l
kO0aBLIz0WCPc0CuG3i9YSRXMSIbf8sV6QygpQiaCQ2WfJiDNZINjNwUzK+mwBq9
UtMs3LiyqRMMIINSM6Ey14s842ov6Xh3DsIDS1YzqzyE0it4tYjS6o5lTK2vpmrM
RQVDGxu3btlFlWfTPOP32Ts1tOWdHB6bMOezca53p3I/sLvA5g5snonqbEDPltov
DnHNTskDdFAuh/3QzTpGHU93cPZJE3vp8bpQfc1UQuUs9NzN/b26ghxau1Rto87q
v0KKcp9qnU+2nQ/6TxKekudKBWTT+glene73ddHct0WRwtQWwPIZihL028llzHF4
Iu9JEE3ov879PMTvUzFFfc6cSKPQGy9UNjn/O0E0G6km/GFAW/BHNDeywa0O+Smh
DPJ4GLXYHKw4JHqGYtbuoMwt2KXZd/9XmCUiBPo0s+wRCqz9Ms3J1T1UMmhqJDOt
lLa/DggUBMYfjGOVHS/2
=WAYC
-----END PGP SIGNATURE-----
next prev parent reply other threads:[~2015-05-14 13:07 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-05 18:19 [PATCH v2] btrfs: iterate over unused chunk space in FITRIM Jeff Mahoney
2015-05-08 20:38 ` Filipe David Manana
2015-05-08 21:16 ` Filipe David Manana
2015-05-08 22:45 ` Jeff Mahoney
2015-05-14 9:33 ` Filipe David Manana
2015-05-14 13:07 ` Jeff Mahoney [this message]
2015-05-08 22:17 ` Jeff Mahoney
2015-05-09 0:18 ` Omar Sandoval
2015-05-09 3:57 ` Jeff Mahoney
2015-05-14 9:40 ` Filipe David Manana
2015-05-14 12:54 ` Jeff Mahoney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55549E27.9060006@suse.com \
--to=jeffm@suse.com \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.