Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support


From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
Date: Fri, 11 Dec 2020 18:57:46 +0200
Message-ID: <047a66b1-5804-ac05-26ac-4eaf71f5c4df@suse.com>
In-Reply-To: <a2732cae-4dea-744e-2eda-8b8e5f2b6710@suse.com>

On 11.12.20 at 14:11, Qu Wenruo wrote:
> 
> 
> On 2020/12/11 8:00 PM, Nikolay Borisov wrote:
>>
>>
>> On 10.12.20 at 8:38, Qu Wenruo wrote:
>>> Unlike the original try_release_extent_buffer,
>>> try_release_subpage_extent_buffer() will iterate through
>>> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 73 insertions(+)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index 141e414b1ab9..4d55803302e9 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -6258,10 +6258,83 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
>>>  	}
>>>  }
>>>  
>>> +static int try_release_subpage_extent_buffer(struct page *page)
>>> +{
>>> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>>> +	u64 page_start = page_offset(page);
>>> +	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
>>
>> Remove this variable and directly use BTRFS_SUBPAGE_BITMAP_SIZE as a
>> terminating condition
>>
>>> +	int bit_start = 0;
>>> +	int ret;
>>> +
>>> +	while (bit_start < bitmap_size) {
>>
>> You really want to iterate for a fixed number of items so switch that to
>> a for loop.
> 
> The problem here is, it's not always fixed.
> 
> If it finds one bit set, it will skip (nodesize >> sectorsize_bits) bits.
> 
> But if not found, it will skip to just next bit.
> 
> Thus I'm not sure if for loop is really a good choice here for
> differential step.
> 
>>
>>> +		struct btrfs_subpage *subpage;
>>> +		struct extent_buffer *eb;
>>> +		unsigned long flags;
>>> +		u16 tmp = 1 << bit_start;
>>> +		u64 start;
>>> +
>>> +		/*
>>> +		 * Make sure the page still has private, as previous run can
>>> +		 * detach the private
>>> +		 */
>>
>> But if previous run has run it would have disposed of this eb and you
>> won't find this page at all, no ?
> 
> For the "previous run" I mean, previous iteration in the same loop.
> 
> E.g. the page has 4 bits set, just one eb (16K nodesize).

Isn't it guaranteed that, when you iterate the ebs in a page, an unset
bit means the whole extent buffer is gone? If so, instead of doing
bit_start++ you ought to advance by nodesize in that case as well.

For example, assume a page with 16 sectors and room for 4 EBs:

 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15
 x| x| x| x| 0| 0| 0| 0| x| x| x| x| 0| 0| 0| 0

The first bit is set, so you call release_extent_buffer on that eb,
which clears the first 4 bits in tree_block_bitmap. In this case you've
advanced by nodesize, so the next iteration begins at index 4. You
detect it's unset (0), so you increment by 1, and you repeat this for
the next 3 bits; then you free the whole of the next eb. I argue that
you also need to advance by nodesize when a bit is not set, because you
can never see a partially freed eb, i.e. you cannot see the following
state:

 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15
 x| x| x| x| x| 0| 0| 0| x| x| x| x| 0| 0| 0| 0

Am I missing something?
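
To make the argument concrete, here is a small user-space model of the
two ways of walking the bitmap. It is only a sketch, not kernel code:
MODEL_BITMAP_BITS, walk_mixed_stride() and walk_fixed_stride() are
made-up names, the 16-bit bitmap stands in for tree_block_bitmap, and it
assumes every eb starts on a nodesize-aligned sector inside the page
(which is exactly the assumption behind my question). Nothing is
released here; the functions only record which eb start bits were found
and count how many times the bitmap was probed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-bit tree_block_bitmap of one page. */
#define MODEL_BITMAP_BITS 16

/*
 * Walk as in the patch: skip a whole eb when its first bit is set,
 * otherwise step a single bit. Returns the number of bitmap probes and
 * stores the start bits of the ebs that were found in *found.
 */
static int walk_mixed_stride(uint16_t bitmap, int bits_per_eb,
			     uint16_t *found)
{
	int probes = 0;
	int bit = 0;

	assert(bits_per_eb > 0);
	*found = 0;
	while (bit < MODEL_BITMAP_BITS) {
		probes++;
		if (bitmap & (1U << bit)) {
			*found |= (uint16_t)(1U << bit);
			bit += bits_per_eb;
		} else {
			bit++;
		}
	}
	return probes;
}

/*
 * Walk with a fixed stride: always advance by one eb. Only valid under
 * the assumption that ebs are nodesize-aligned within the page, so a
 * clear first bit means the whole slot is empty.
 */
static int walk_fixed_stride(uint16_t bitmap, int bits_per_eb,
			     uint16_t *found)
{
	int probes = 0;

	assert(bits_per_eb > 0 && MODEL_BITMAP_BITS % bits_per_eb == 0);
	*found = 0;
	for (int bit = 0; bit < MODEL_BITMAP_BITS; bit += bits_per_eb) {
		probes++;
		if (bitmap & (1U << bit))
			*found |= (uint16_t)(1U << bit);
	}
	return probes;
}
```

For the first bitmap above (0x0f0f, 4 bits per eb) both walks find the
same two ebs; the fixed stride just probes fewer bits and fits a plain
for loop. If ebs can start at an unaligned sector the fixed stride would
miss them, so that alignment assumption is the thing to confirm.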

> 
> For the first run, it release the only eb of the page, and cleared page
> private.
> For the second run, since private is cleared, we need to break out.
> 
>>
>>> +		spin_lock(&page->mapping->private_lock);
>>> +		if (!PagePrivate(page)) {
>>> +			spin_unlock(&page->mapping->private_lock);
>>> +			break;
>>> +		}

Aren't we guaranteed that the page has private when this function is
called?

>>> +		subpage = (struct btrfs_subpage *)page->private;
>>> +		spin_unlock(&page->mapping->private_lock);
>>> +
>>> +		spin_lock_irqsave(&subpage->lock, flags);
>>> +		if (!(tmp & subpage->tree_block_bitmap))  {
>>> +			spin_unlock_irqrestore(&subpage->lock, flags);
>>> +			bit_start++;
>>> +			continue;
>>> +		}
>>> +		spin_unlock_irqrestore(&subpage->lock, flags);
>>> +
>>> +		start = bit_start * fs_info->sectorsize + page_start;
>>> +		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;

<snip>

> Thanks,
> Qu
>>
>>
>>> +		/*
>>> +		 * Here we can't call find_extent_buffer() which will increase
>>> +		 * eb->refs.
>>> +		 */
>>> +		rcu_read_lock();
>>> +		eb = radix_tree_lookup(&fs_info->buffer_radix,
>>> +				start >> fs_info->sectorsize_bits);
>>> +		rcu_read_unlock();

Your usage of radix_tree_lookup + rcu lock is wrong. RCU guarantees that
an EB you get won't be freed while the RCU section is active; however,
you obtain a pointer to the EB and do not increment its ref count
WHILE holding the RCU critical section. Consult find_extent_buffer for
the correct usage pattern.
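
For illustration, the shape of that pattern is: look the object up and
take the reference with an increment-if-not-zero, both inside the read
side critical section, and only then leave it. Below is a user-space
model of that shape using C11 atomics. It is a sketch under loud
assumptions: struct eb_model, read_lock_model(), read_unlock_model(),
inc_not_zero() and lookup_and_get() are invented names, the lock
functions are empty placeholders for rcu_read_lock()/rcu_read_unlock(),
and the "lookup" is just a pointer passed in by the caller rather than a
radix tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer: only the ref count. */
struct eb_model {
	atomic_int refs;
};

/* Placeholders marking where the read side critical section is. */
static void read_lock_model(void) { }
static void read_unlock_model(void) { }

/*
 * Take a reference only if the count is still non-zero, i.e. the
 * object is not already on its way to being freed.
 */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Look up and pin in one critical section. The returned pointer is
 * either NULL or carries a reference that the caller must drop.
 */
static struct eb_model *lookup_and_get(struct eb_model *slot)
{
	struct eb_model *eb;

	read_lock_model();
	eb = slot;		/* models the radix tree lookup */
	if (eb && !inc_not_zero(&eb->refs))
		eb = NULL;	/* found, but already dying */
	read_unlock_model();
	return eb;
}

/* Drop a reference taken by lookup_and_get(). */
static void put_ref(struct eb_model *eb)
{
	int prev = atomic_fetch_sub(&eb->refs, 1);

	assert(prev > 0);
	(void)prev;
}
```

The point is that the pointer never escapes the critical section
without a reference attached. How to combine that with the
"refs == 1" check further down is a separate question for this patch.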

Frankly the locking in this function is insane. First
mapping->private_lock is acquired to check whether PagePrivate is set,
and then page->private is dereferenced, but that is not signalled at
all. Then subpage->lock is taken to check the tree_block_bitmap, and
then the lock is dropped. At that point no locks are held, so this page
could possibly be referenced by someone else? Then the buggy locking is
used to get the eb, then you lock refs_lock and call
release_extent_buffer...

>>> +		ASSERT(eb);

Doing this outside of the rcu read side critical section _without_
incrementing the ref count is buggy!

>>> +		spin_lock(&eb->refs_lock);
>>> +		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
>>> +		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
>>> +			spin_unlock(&eb->refs_lock);
>>> +			continue;
>>> +		}
>>> +		/*

<snip>
