public inbox for linux-btrfs@vger.kernel.org
From: Nikolay Borisov <nborisov@suse.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org,
	Jeff Mahoney <jeffm@suse.com>
Subject: Re: [PATCH v3 07/12] btrfs: replace pending/pinned chunks lists with io tree
Date: Mon, 25 Mar 2019 18:43:49 +0200
Message-ID: <75db110a-db25-e1ff-f74d-997b61532bf7@suse.com>
In-Reply-To: <20190325162651.GI10640@twin.jikos.cz>



On 25.03.19 18:26, David Sterba wrote:
> On Mon, Mar 25, 2019 at 02:31:27PM +0200, Nikolay Borisov wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> The pending chunks list contains chunks that are allocated in the
>> current transaction but haven't been created yet. The pinned chunks
>> list contains chunks that are being released in the current transaction.
>> Both describe chunks that are not reflected on disk as in use but are
>> unavailable just the same.
>>
>> The pending chunks list is anchored by the transaction handle, which
>> means that we need to hold a reference to a transaction when working
>> with the list.
>>
>> We use these lists to ensure that we don't end up discarding chunks
>> that are allocated or released in the current transaction.  What we r
>>
>> The way we use them is by iterating over both lists to perform
>> comparisons on the stripes they describe for each device. This is
>> backwards and requires that we keep a transaction handle open while
>> we're trimming.
>>
>> This patchset adds an extent_io_tree to btrfs_device that maintains
>> the allocation state of the device.  Extents are set dirty when
>> chunks are first allocated -- when the extent maps are added to the
>> mapping tree. They're cleared when last removed -- when the extent
>> maps are removed from the mapping tree. This matches the lifespan
>> of the pending and pinned chunks list and allows us to do trims
>> on unallocated space safely without pinning the transaction for what
>> may be a lengthy operation. We can also use this io tree to mark
>> which chunks have already been trimmed so we don't repeat the operation.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/ctree.h            |  6 ---
>>  fs/btrfs/disk-io.c          | 11 -----
>>  fs/btrfs/extent-tree.c      | 28 -----------
>>  fs/btrfs/extent_io.c        |  2 +-
>>  fs/btrfs/extent_io.h        |  6 ++-
>>  fs/btrfs/extent_map.c       | 36 ++++++++++++++
>>  fs/btrfs/extent_map.h       |  1 -
>>  fs/btrfs/free-space-cache.c |  4 --
>>  fs/btrfs/transaction.c      |  9 ----
>>  fs/btrfs/transaction.h      |  1 -
>>  fs/btrfs/volumes.c          | 96 +++++++++++++------------------------
>>  fs/btrfs/volumes.h          |  2 +
>>  12 files changed, 76 insertions(+), 126 deletions(-)
>>
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -918,7 +918,7 @@ static void cache_state(struct extent_state *state,
>>   * [start, end] is inclusive This takes the tree lock.
>>   */
>>  
>> -static int __must_check
>> +int __must_check
>>  __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>>  		 unsigned bits, unsigned exclusive_bits,
>>  		 u64 *failed_start, struct extent_state **cached_state,
> 
> Does this really need to be exported again? There are helpers that can
> wrap specific combinations of parameters.

It is exported so that __set_extent_bit can be called with GFP_NOWAIT.
Without that I was getting lockdep splats: extent_map_device_set_bits
(called from add_extent_mapping) runs under a write_lock, so we can't
sleep there.
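
For illustration, the "helper that wraps a specific combination of
parameters" idea could look roughly like the sketch below. This is a
userspace mock, not the kernel code: mock_set_extent_bit, the GFP_*
values and the trimmed-down struct extent_io_tree are stand-ins, and
set_extent_bits_nowait is a hypothetical name for such a wrapper.

```c
#include <stdint.h>

typedef uint64_t u64;
typedef unsigned int gfp_t;

/* Stand-in values; the real flags live in the kernel's gfp headers. */
#define GFP_NOFS   0x1u	/* allocation may sleep */
#define GFP_NOWAIT 0x2u	/* allocation must not sleep */

/* Trimmed-down stand-in that only remembers the last call. */
struct extent_io_tree {
	gfp_t last_mask;
	unsigned last_bits;
};

/* Stand-in for __set_extent_bit(): records its arguments, nothing else. */
static int mock_set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
			       unsigned bits, gfp_t mask)
{
	(void)start;
	(void)end;
	tree->last_bits = bits;
	tree->last_mask = mask;
	return 0;
}

/*
 * Hypothetical wrapper: hard-codes GFP_NOWAIT so that callers holding a
 * write_lock never pass a mask that would allow sleeping.
 */
static inline int set_extent_bits_nowait(struct extent_io_tree *tree,
					 u64 start, u64 end, unsigned bits)
{
	return mock_set_extent_bit(tree, start, end, bits, GFP_NOWAIT);
}
```

With a wrapper of this shape the low-level function could stay static,
and only the non-sleeping variant would be visible to extent_map.c.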

> 
>> @@ -335,6 +335,8 @@ void btrfs_free_device(struct btrfs_device *device)
>>  {
>>  	WARN_ON(!list_empty(&device->post_commit_list));
>>  	rcu_string_free(device->name);
>> +	if (!in_softirq())
>> +		extent_io_tree_release(&device->alloc_state);

This distinguishes between the two kinds of callers of
btrfs_free_device. The first is btrfs_close_devices in close_ctree, or
any of the error handlers, i.e. non-RCU (hence no softirq) context. The
second is free_device_rcu. In the latter case the extent io tree is
already freed in btrfs_close_one_device, so there is no need to do it in
the RCU callback.

There is also a comment saying that the extent io tree cannot be
destroyed in RCU context: extent_io_tree_release calls
cond_resched_lock, which can sleep, and sleeping is forbidden in RCU
context.
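
A minimal userspace sketch of that gating, with the kind of comment the
hunk could carry. Everything prefixed mock_ is a stand-in invented for
this example (a flag instead of in_softirq(), a counter instead of the
real extent_io_tree_release()); it only models the control flow
described above.

```c
#include <stdbool.h>

static bool mock_in_softirq;	/* models the result of in_softirq() */
static int mock_tree_releases;	/* counts calls to the release stand-in */

struct mock_device {
	int alloc_state;	/* stand-in for the per-device io tree */
};

static void mock_extent_io_tree_release(int *tree)
{
	(void)tree;
	mock_tree_releases++;
}

static void mock_free_device(struct mock_device *device)
{
	/*
	 * Releasing the tree may sleep, so it must not happen from the RCU
	 * callback (softirq context). On that path the tree has already
	 * been released before the callback was queued, so skipping it
	 * here leaks nothing.
	 */
	if (!mock_in_softirq)
		mock_extent_io_tree_release(&device->alloc_state);
}
```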

> 
> This needs a comment
> 
>>  	bio_put(device->flush_bio);
>>  	kfree(device);
>>  }
> 
> The commit is quite big but I don't see how to shrink it, the changes
> need to be done in several places. So, probably ok.
> 

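As a rough model of what the commit message describes (tracking
allocated device ranges so that trim only walks unallocated space), here
is a small userspace sketch. It is not the btrfs extent_io_tree: a
sorted array stands in for the tree, and struct range and
first_clear_range are names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive byte range, mirroring the [start, end] convention. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Find the first unallocated hole at or after 'from' on a device of
 * 'dev_size' bytes. 'alloc' must be sorted by start, non-overlapping,
 * and contained within the device. Returns false if there is no hole.
 */
static bool first_clear_range(const struct range *alloc, size_t n,
			      uint64_t from, uint64_t dev_size,
			      uint64_t *hole_start, uint64_t *hole_end)
{
	uint64_t pos = from;

	for (size_t i = 0; i < n; i++) {
		if (alloc[i].end < pos)
			continue;	/* range lies entirely before pos */
		if (alloc[i].start > pos) {
			/* Gap between pos and the next allocated range. */
			*hole_start = pos;
			*hole_end = alloc[i].start - 1;
			return true;
		}
		pos = alloc[i].end + 1;	/* skip over the allocated range */
	}
	if (pos < dev_size) {
		/* Free space after the last allocated range. */
		*hole_start = pos;
		*hole_end = dev_size - 1;
		return true;
	}
	return false;
}
```

A trim loop built on this would repeatedly ask for the next hole,
discard it, and continue from hole_end + 1, without needing a
transaction handle to consult per-transaction lists.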