* Re: [PATCH 2/2] btrfs: reduce scope of extent locks during buffered write
2024-08-28 12:45 ` [PATCH 2/2] btrfs: reduce scope of extent locks during buffered write Goldwyn Rodrigues
@ 2024-08-28 13:17 ` Filipe Manana
2024-08-28 14:21 ` Filipe Manana
1 sibling, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2024-08-28 13:17 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues
On Wed, Aug 28, 2024 at 1:54 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> Reduce the scope of locking while performing writes. The scope is
> limited to changing extent bits, which is in btrfs_dirty_pages().
> btrfs_dirty_pages() is also called from __btrfs_write_out_cache(), and
> it expects extent locks held. So, perform extent locking around
> btrfs_dirty_pages() only.
>
> The inode lock will ensure that no other process is writing to this
> file, so we don't need to worry about multiple processes dirtying
> folios.
>
> However, the write has to make sure that there are no ordered extents in
> the range specified. So, call btrfs_wait_ordered_range() before
> initiating the write. In case of nowait, bail if there exists an
> ordered range within the write range.
if there exists an ordered range -> if there exists an ordered extent
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/btrfs/file.c | 109 +++++++-----------------------------------------
> 1 file changed, 16 insertions(+), 93 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 76f4cc686af9..6c5f712bfa0f 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -976,83 +976,6 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
>
> }
>
> -/*
> - * This function locks the extent and properly waits for data=ordered extents
> - * to finish before allowing the pages to be modified if need.
> - *
> - * The return value:
> - * 1 - the extent is locked
> - * 0 - the extent is not locked, and everything is OK
> - * -EAGAIN - need re-prepare the pages
> - * the other < 0 number - Something wrong happens
> - */
> -static noinline int
> -lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
> - size_t num_pages, loff_t pos,
> - size_t write_bytes,
> - u64 *lockstart, u64 *lockend, bool nowait,
> - struct extent_state **cached_state)
> -{
> - struct btrfs_fs_info *fs_info = inode->root->fs_info;
> - u64 start_pos;
> - u64 last_pos;
> - int i;
> - int ret = 0;
> -
> - start_pos = round_down(pos, fs_info->sectorsize);
> - last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
> -
> - if (start_pos < inode->vfs_inode.i_size) {
> - struct btrfs_ordered_extent *ordered;
> -
> - if (nowait) {
> - if (!try_lock_extent(&inode->io_tree, start_pos, last_pos,
> - cached_state)) {
> - for (i = 0; i < num_pages; i++) {
> - unlock_page(pages[i]);
> - put_page(pages[i]);
> - pages[i] = NULL;
> - }
> -
> - return -EAGAIN;
> - }
> - } else {
> - lock_extent(&inode->io_tree, start_pos, last_pos, cached_state);
> - }
> -
> - ordered = btrfs_lookup_ordered_range(inode, start_pos,
> - last_pos - start_pos + 1);
> - if (ordered &&
> - ordered->file_offset + ordered->num_bytes > start_pos &&
> - ordered->file_offset <= last_pos) {
> - unlock_extent(&inode->io_tree, start_pos, last_pos,
> - cached_state);
> - for (i = 0; i < num_pages; i++) {
> - unlock_page(pages[i]);
> - put_page(pages[i]);
> - }
> - btrfs_start_ordered_extent(ordered);
> - btrfs_put_ordered_extent(ordered);
> - return -EAGAIN;
> - }
> - if (ordered)
> - btrfs_put_ordered_extent(ordered);
> -
> - *lockstart = start_pos;
> - *lockend = last_pos;
> - ret = 1;
> - }
> -
> - /*
> - * We should be called after prepare_pages() which should have locked
> - * all pages in the range.
> - */
> - for (i = 0; i < num_pages; i++)
> - WARN_ON(!PageLocked(pages[i]));
> -
> - return ret;
> -}
> -
> /*
> * Check if we can do nocow write into the range [@pos, @pos + @write_bytes)
> *
> @@ -1246,7 +1169,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> size_t copied;
> size_t dirty_sectors;
> size_t num_sectors;
> - int extents_locked;
> + int extents_locked = false;
This is an int, so either assign 0 to it or, preferably, change the type to bool.
>
> /*
> * Fault pages before locking them in prepare_pages
> @@ -1310,13 +1233,19 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> }
>
> release_bytes = reserve_bytes;
> -again:
> ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
> if (ret) {
> btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
> break;
> }
>
> + if (nowait && !btrfs_has_ordered_extent(BTRFS_I(inode), pos, write_bytes)) {
The logic is not correct: the ! should be removed.
I.e. if it's a nowait write and there are ordered extents in the range,
we want to bail out, because waiting for them to complete would block us.
> + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
> + break;
Before the break we also need: ret = -EAGAIN
We should also avoid looking up ordered extents if the write starts
at or beyond i_size, just like before this patch.
> + } else {
> + btrfs_wait_ordered_range(BTRFS_I(inode), pos, write_bytes);
> + }
> +
> /*
> * This is going to setup the pages array with the number of
> * pages we want, so we don't really need to worry about the
> @@ -1330,20 +1259,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> break;
> }
>
> - extents_locked = lock_and_cleanup_extent_if_need(
> - BTRFS_I(inode), pages,
> - num_pages, pos, write_bytes, &lockstart,
> - &lockend, nowait, &cached_state);
> - if (extents_locked < 0) {
> - if (!nowait && extents_locked == -EAGAIN)
> - goto again;
> -
> - btrfs_delalloc_release_extents(BTRFS_I(inode),
> - reserve_bytes);
> - ret = extents_locked;
> - break;
> - }
> -
> copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
>
> num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
> @@ -1389,6 +1304,14 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> release_bytes = round_up(copied + sector_offset,
> fs_info->sectorsize);
>
> + if (pos < inode->i_size) {
> + lockstart = round_down(pos, fs_info->sectorsize);
> + lockend = round_up(pos + copied, fs_info->sectorsize);
> + lock_extent(&BTRFS_I(inode)->io_tree, lockstart,
> + lockend, &cached_state);
We should respect the nowait semantics and do a try_lock_extent() if
we're a nowait write.
> + extents_locked = true;
Same here, the type is int and not bool.
Thanks.
> + }
> +
> ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
> dirty_pages, pos, copied,
> &cached_state, only_release_metadata);
> --
> 2.46.0
>
>
^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: reduce scope of extent locks during buffered write
2024-08-28 12:45 ` [PATCH 2/2] btrfs: reduce scope of extent locks during buffered write Goldwyn Rodrigues
2024-08-28 13:17 ` Filipe Manana
@ 2024-08-28 14:21 ` Filipe Manana
1 sibling, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2024-08-28 14:21 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues
On Wed, Aug 28, 2024 at 1:54 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> Reduce the scope of locking while performing writes. The scope is
> limited to changing extent bits, which is in btrfs_dirty_pages().
> btrfs_dirty_pages() is also called from __btrfs_write_out_cache(), and
> it expects extent locks held. So, perform extent locking around
> btrfs_dirty_pages() only.
There's one fundamental problem I missed before. Comments inlined below.
>
> The inode lock will ensure that no other process is writing to this
> file, so we don't need to worry about multiple processes dirtying
> folios.
Well, there's an exception: memory mapped writes, as they don't take
the inode lock.
So this patch introduces a race, see below.
>
> However, the write has to make sure that there are no ordered extents in
> the range specified. So, call btrfs_wait_ordered_range() before
> initiating the write. In case of nowait, bail if there exists an
> ordered range within the write range.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/btrfs/file.c | 109 +++++++-----------------------------------------
> 1 file changed, 16 insertions(+), 93 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 76f4cc686af9..6c5f712bfa0f 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -976,83 +976,6 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
>
> }
>
> -/*
> - * This function locks the extent and properly waits for data=ordered extents
> - * to finish before allowing the pages to be modified if need.
> - *
> - * The return value:
> - * 1 - the extent is locked
> - * 0 - the extent is not locked, and everything is OK
> - * -EAGAIN - need re-prepare the pages
> - * the other < 0 number - Something wrong happens
> - */
> -static noinline int
> -lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
> - size_t num_pages, loff_t pos,
> - size_t write_bytes,
> - u64 *lockstart, u64 *lockend, bool nowait,
> - struct extent_state **cached_state)
> -{
> - struct btrfs_fs_info *fs_info = inode->root->fs_info;
> - u64 start_pos;
> - u64 last_pos;
> - int i;
> - int ret = 0;
> -
> - start_pos = round_down(pos, fs_info->sectorsize);
> - last_pos = round_up(pos + write_bytes, fs_info->sectorsize) - 1;
> -
> - if (start_pos < inode->vfs_inode.i_size) {
> - struct btrfs_ordered_extent *ordered;
> -
> - if (nowait) {
> - if (!try_lock_extent(&inode->io_tree, start_pos, last_pos,
> - cached_state)) {
> - for (i = 0; i < num_pages; i++) {
> - unlock_page(pages[i]);
> - put_page(pages[i]);
> - pages[i] = NULL;
> - }
> -
> - return -EAGAIN;
> - }
> - } else {
> - lock_extent(&inode->io_tree, start_pos, last_pos, cached_state);
> - }
> -
> - ordered = btrfs_lookup_ordered_range(inode, start_pos,
> - last_pos - start_pos + 1);
> - if (ordered &&
> - ordered->file_offset + ordered->num_bytes > start_pos &&
> - ordered->file_offset <= last_pos) {
> - unlock_extent(&inode->io_tree, start_pos, last_pos,
> - cached_state);
> - for (i = 0; i < num_pages; i++) {
> - unlock_page(pages[i]);
> - put_page(pages[i]);
> - }
> - btrfs_start_ordered_extent(ordered);
> - btrfs_put_ordered_extent(ordered);
> - return -EAGAIN;
> - }
> - if (ordered)
> - btrfs_put_ordered_extent(ordered);
> -
> - *lockstart = start_pos;
> - *lockend = last_pos;
> - ret = 1;
> - }
> -
> - /*
> - * We should be called after prepare_pages() which should have locked
> - * all pages in the range.
> - */
> - for (i = 0; i < num_pages; i++)
> - WARN_ON(!PageLocked(pages[i]));
> -
> - return ret;
> -}
> -
> /*
> * Check if we can do nocow write into the range [@pos, @pos + @write_bytes)
> *
> @@ -1246,7 +1169,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> size_t copied;
> size_t dirty_sectors;
> size_t num_sectors;
> - int extents_locked;
> + int extents_locked = false;
>
> /*
> * Fault pages before locking them in prepare_pages
> @@ -1310,13 +1233,19 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> }
>
> release_bytes = reserve_bytes;
> -again:
> ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
> if (ret) {
> btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
> break;
> }
>
> + if (nowait && !btrfs_has_ordered_extent(BTRFS_I(inode), pos, write_bytes)) {
> + btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
> + break;
> + } else {
> + btrfs_wait_ordered_range(BTRFS_I(inode), pos, write_bytes);
So after waiting for ordered extents, we call prepare_pages() below,
which is what gets and locks the pages.
But between the wait and the page locking, an mmap write can happen and
dirty pages in the range. If delalloc is then flushed before we lock
the pages at prepare_pages(), an ordered extent is created for the
range - and we will miss it here and not wait for it to complete.
Before this patch that couldn't happen, as we always lock the extent
range before locking pages.
Thanks.
> + }
> +
> /*
> * This is going to setup the pages array with the number of
> * pages we want, so we don't really need to worry about the
> @@ -1330,20 +1259,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> break;
> }
>
> - extents_locked = lock_and_cleanup_extent_if_need(
> - BTRFS_I(inode), pages,
> - num_pages, pos, write_bytes, &lockstart,
> - &lockend, nowait, &cached_state);
> - if (extents_locked < 0) {
> - if (!nowait && extents_locked == -EAGAIN)
> - goto again;
> -
> - btrfs_delalloc_release_extents(BTRFS_I(inode),
> - reserve_bytes);
> - ret = extents_locked;
> - break;
> - }
> -
> copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
>
> num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
> @@ -1389,6 +1304,14 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
> release_bytes = round_up(copied + sector_offset,
> fs_info->sectorsize);
>
> + if (pos < inode->i_size) {
> + lockstart = round_down(pos, fs_info->sectorsize);
> + lockend = round_up(pos + copied, fs_info->sectorsize);
> + lock_extent(&BTRFS_I(inode)->io_tree, lockstart,
> + lockend, &cached_state);
> + extents_locked = true;
> + }
> +
> ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
> dirty_pages, pos, copied,
> &cached_state, only_release_metadata);
> --
> 2.46.0
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread