Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
Date: Wed, 6 Jan 2021 14:54:46 +0800	[thread overview]
Message-ID: <844a6c6b-c5e9-c0c8-0177-fe3ae015c108@suse.com> (raw)
In-Reply-To: <20210106010201.37864-9-wqu@suse.com>



On 2021/1/6 上午9:01, Qu Wenruo wrote:
> For subpage case, we need to allocate new memory for each metadata page.
> 
> So we need to:
> - Allow attach_extent_buffer_page() to return int
>    To indicate allocation failure
> 
> - Prealloc page->private for alloc_extent_buffer()
>    We don't want to call memory allocation with spinlock hold, so
>    do preallocation before we acquire the spin lock.
> 
> - Handle subpage and regular case differently in
>    attach_extent_buffer_page()
>    For regular case, just do the usual thing.
>    For subpage case, allocate new memory and update the tree_block
>    bitmap.
> 
>    The bitmap update will be handled by new subpage specific helper,
>    btrfs_subpage_set_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 74 ++++++++++++++++++++++++++++++++++----------
>   fs/btrfs/subpage.h   | 50 ++++++++++++++++++++++++++++++
>   2 files changed, 108 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d60f1837f8fb..2eeff925450f 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -24,6 +24,7 @@
>   #include "rcu-string.h"
>   #include "backref.h"
>   #include "disk-io.h"
> +#include "subpage.h"
>   
>   static struct kmem_cache *extent_state_cache;
>   static struct kmem_cache *extent_buffer_cache;
> @@ -3140,22 +3141,41 @@ static int submit_extent_page(unsigned int opf,
>   	return ret;
>   }
>   
> -static void attach_extent_buffer_page(struct extent_buffer *eb,
> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>   				      struct page *page)
>   {
> -	/*
> -	 * If the page is mapped to btree inode, we should hold the private
> -	 * lock to prevent race.
> -	 * For cloned or dummy extent buffers, their pages are not mapped and
> -	 * will not race with any other ebs.
> -	 */
> -	if (page->mapping)
> -		lockdep_assert_held(&page->mapping->private_lock);
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	int ret;
>   
> -	if (!PagePrivate(page))
> -		attach_page_private(page, eb);
> -	else
> -		WARN_ON(page->private != (unsigned long)eb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		/*
> +		 * If the page is mapped to btree inode, we should hold the
> +		 * private lock to prevent race.
> +		 * For cloned or dummy extent buffers, their pages are not
> +		 * mapped and will not race with any other ebs.
> +		 */
> +		if (page->mapping)
> +			lockdep_assert_held(&page->mapping->private_lock);
> +
> +		if (!PagePrivate(page))
> +			attach_page_private(page, eb);
> +		else
> +			WARN_ON(page->private != (unsigned long)eb);
> +		return 0;
> +	}
> +
> +	/* Already mapped, just update the existing range */
> +	if (PagePrivate(page))
> +		goto update_bitmap;
> +
> +	/* Do new allocation to attach subpage */
> +	ret = btrfs_attach_subpage(fs_info, page);
> +	if (ret < 0)
> +		return ret;
> +
> +update_bitmap:
> +	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
> +	return 0;
>   }
>   
>   void set_page_extent_mapped(struct page *page)
> @@ -5063,21 +5083,29 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>   	if (new == NULL)
>   		return NULL;
>   
> +	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
> +	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
> +
>   	for (i = 0; i < num_pages; i++) {
> +		int ret;
> +
>   		p = alloc_page(GFP_NOFS);
>   		if (!p) {
>   			btrfs_release_extent_buffer(new);
>   			return NULL;
>   		}
> -		attach_extent_buffer_page(new, p);
> +		ret = attach_extent_buffer_page(new, p);
> +		if (ret < 0) {
> +			put_page(p);
> +			btrfs_release_extent_buffer(new);
> +			return NULL;
> +		}
>   		WARN_ON(PageDirty(p));
>   		SetPageUptodate(p);
>   		new->pages[i] = p;
>   		copy_page(page_address(p), page_address(src->pages[i]));
>   	}
>   
> -	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
> -	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
>   
>   	return new;
>   }
> @@ -5316,6 +5344,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			goto free_eb;
>   		}
>   
> +		/*
> +		 * Preallocate page->private for subpage case, so that
> +		 * we won't allocate memory with private_lock hold.
> +		 */
> +		ret = btrfs_attach_subpage(fs_info, p);

Although we try to preallocate the subpage structure here, it's not 
reliable.

I have hit a case just minutes before, where we still try to allocate 
memory inside that private_lock spinlock, and causes sleep inside atomic 
warning.

The problem is, we can have a race where the page has one existing eb, 
and it's being freed.

At this point, we still have page::private.

But before we acquire that private_lock, the eb get freed and since it's 
the only eb of that page (our eb hasn't yet being added to the page), it 
detach page private.

> +		if (ret < 0) {
> +			unlock_page(p);
> +			put_page(p);
> +			exists = ERR_PTR(-ENOMEM);
> +			goto free_eb;
> +		}
> +
>   		spin_lock(&mapping->private_lock);
>   		exists = grab_extent_buffer(p);
>   		if (exists) {
> @@ -5325,8 +5365,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			mark_extent_buffer_accessed(exists, p);
>   			goto free_eb;
>   		}
> +		/* Should not fail, as we have attached the subpage already */
>   		attach_extent_buffer_page(eb, p);

So here we can not rely on any result before we acquire private_lock.

Thus I guess we have to pre-allocate the memory manually and pass the 
pointer in.

Why I didn't hit the bug before sending the patches...

Thanks,
Qu

>   		spin_unlock(&mapping->private_lock);
> +
>   		WARN_ON(PageDirty(p));
>   		eb->pages[i] = p;
>   		if (!PageUptodate(p))
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 96f3b226913e..e49d4a7329e1 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -23,9 +23,59 @@
>   struct btrfs_subpage {
>   	/* Common members for both data and metadata pages */
>   	spinlock_t lock;
> +	union {
> +		/* Structures only used by metadata */
> +		struct {
> +			u16 tree_block_bitmap;
> +		};
> +		/* structures only used by data */
> +	};
>   };
>   
>   int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>   void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>   
> +/*
> + * Convert the [start, start + len) range into a u16 bitmap
> + *
> + * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
> + */
> +static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
> +	int nbits = len >> fs_info->sectorsize_bits;
> +
> +	/* Basic checks */
> +	ASSERT(PagePrivate(page) && page->private);
> +	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
> +	       IS_ALIGNED(len, fs_info->sectorsize));
> +
> +	/*
> +	 * The range check only works for mapped page, we can
> +	 * still have unampped page like dummy extent buffer pages.
> +	 */
> +	if (page->mapping)
> +		ASSERT(page_offset(page) <= start &&
> +			start + len <= page_offset(page) + PAGE_SIZE);
> +	/*
> +	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
> +	 * first left shift to be calculated in unsigned long (u32), then
> +	 * truncate the result to u16.
> +	 */
> +	return (u16)(((1UL << nbits) - 1) << bit_start);
> +}
> +
> +static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	unsigned long flags;
> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->tree_block_bitmap |= tmp;
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
>   #endif /* BTRFS_SUBPAGE_H */
> 


  reply	other threads:[~2021-01-06  7:04 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 01/22] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 02/22] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 03/22] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 04/22] btrfs: extent_io: update locked page dirty/writeback/error bits in __process_pages_contig() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 05/22] btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into PAGE_START_WRITEBACK Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
2021-01-12 15:08   ` David Sterba
2021-01-06  1:01 ` [PATCH v3 07/22] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
2021-01-06  6:54   ` Qu Wenruo [this message]
2021-01-07  1:35   ` Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 09/22] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 10/22] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 11/22] btrfs: extent_io: attach private to dummy extent buffer pages Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 12/22] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 13/22] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 14/22] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 15/22] btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage compatible Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
2021-01-06  8:24   ` Qu Wenruo
2021-01-06  8:43     ` Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 17/22] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 18/22] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 19/22] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
2021-01-06  5:04   ` kernel test robot
2021-01-06  5:32     ` Qu Wenruo
2021-01-06  6:48       ` Rong Chen
2021-01-09  9:53   ` kernel test robot
2021-01-06  1:02 ` [PATCH v3 21/22] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
2021-01-06  1:02 ` [PATCH v3 22/22] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2021-01-12 15:14 ` [PATCH v3 00/22] btrfs: add read-only support for subpage sector size David Sterba
2021-01-13  5:06   ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=844a6c6b-c5e9-c0c8-0177-fe3ae015c108@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox