Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH v3 0/2] btrfs: fix logging unwritten extents after failure in write paths
@ 2024-05-17 16:52 fdmanana
  2024-05-17 16:52 ` [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: fdmanana @ 2024-05-17 16:52 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

There's a bug where a fast fsync can log extent maps that were not written
due to an error in a write path or during writeback. This affects both
direct IO writes and buffered writes, and triggering the failure depends on
a race, since ordered extent completion happens in a work queue and a fast
fsync doesn't wait for ordered extent completion before logging. The
details are in the change log of the first patch.

V3: Change the approach of patch 1/2 to not drop extent maps at
    btrfs_finish_ordered_extent(), since that runs in irq context and
    dropping an extent map range triggers NOFS extent map allocations,
    which can trigger reclaim, and reclaim can't run in irq context.
    Updated comments and changelog to distinguish the differences between
    failures for direct IO writes and buffered writes.

V2: Reworked the solution, since other error paths caused the same problem,
    to make it more generic.
    Added more details to the change log and comment about what's going on,
    and why reads aren't affected.

    https://lore.kernel.org/linux-btrfs/cover.1715798440.git.fdmanana@suse.com/

V1: https://lore.kernel.org/linux-btrfs/cover.1715688057.git.fdmanana@suse.com/

Filipe Manana (2):
  btrfs: ensure fast fsync waits for ordered extents after a write failure
  btrfs: make btrfs_finish_ordered_extent() return void

 fs/btrfs/btrfs_inode.h  | 19 +++++++++++++------
 fs/btrfs/file.c         | 15 +++++++++++++++
 fs/btrfs/ordered-data.c | 34 ++++++++++++++++++++++++++++++++--
 fs/btrfs/ordered-data.h |  2 +-
 4 files changed, 61 insertions(+), 9 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure
  2024-05-17 16:52 [PATCH v3 0/2] btrfs: fix logging unwritten extents after failure in write paths fdmanana
@ 2024-05-17 16:52 ` fdmanana
  2024-05-18  5:28   ` Qu Wenruo
  2024-05-17 16:52 ` [PATCH v3 2/2] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
  2 siblings, 1 reply; 14+ messages in thread
From: fdmanana @ 2024-05-17 16:52 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

If a write path in COW mode fails, either before submitting a bio for the
new extents or because an actual IO error happens, we can end up allowing a
fast fsync to log file extent items that point to unwritten extents.

This is because dropping the extent maps happens when completing ordered
extents, at btrfs_finish_one_ordered(), and the completion of an ordered
extent is executed in a work queue.

This can result in a fast fsync starting to log file extent items based
on existing extent maps before the ordered extents complete, producing a
log that has file extent items pointing to unwritten extents, and
therefore a corrupt file if a crash happens afterwards and the log tree
is replayed the next time the fs is mounted.

This can happen for both direct IO writes and buffered writes.

For example consider a direct IO write, in COW mode, that fails at
btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
error:

1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
   set to false, meaning an error happened;

2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
   flag;

3) btrfs_finish_ordered_extent() queues the completion of the ordered
   extent - so that btrfs_finish_one_ordered() will be executed later in
   a work queue. That function will drop extent maps in the range when
   it's executed, since the extent maps point to unwritten locations
   (signaled by the BTRFS_ORDERED_IOERR flag);

4) After calling btrfs_finish_ordered_extent() we keep going down the
   write path and unlock the inode;

5) After that a fast fsync starts and locks the inode;

6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
   task sees the extent maps that point to the unwritten locations and
   logs file extent items based on them - it does not know they are
   unwritten, and the fast fsync path does not wait for ordered extents
   to complete, which is an intentional behaviour in order to reduce
   latency.

For the buffered write case, here's one example:

1) A fast fsync begins, and it starts by flushing delalloc and waiting for
   the writeback to complete by calling filemap_fdatawait_range();

2) Flushing the delalloc created a new extent map X;

3) During the writeback some IO error happened, and at the end io callback
   (end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
   sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
   completion;

4) After queuing the ordered extent completion, the end io callback clears
   the writeback flag from all pages (or folios), and from that moment the
   fast fsync can proceed;

5) The fast fsync proceeds, sees extent map X and logs a file extent item
   based on extent map X, resulting in a log that points to an unwritten
   data extent, because the ordered extent completion hasn't run yet; it
   happens only after the logging.

To fix this, make btrfs_finish_ordered_extent() set the inode flag
BTRFS_INODE_NEEDS_FULL_SYNC in case an error happened for a COW write,
so that a fast fsync will wait for ordered extent completion.

Note that this issue of using extent maps that point to unwritten
locations cannot happen for reads, because in the read paths we start by
locking the extent range and waiting for any ordered extents in the range
to complete before looking up extent maps.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/btrfs_inode.h  | 19 +++++++++++++------
 fs/btrfs/file.c         | 15 +++++++++++++++
 fs/btrfs/ordered-data.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 3c8bc7a8ebdd..26ee7251ad6c 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -56,7 +56,9 @@ enum {
 	 /*
 	  * Always set under the VFS' inode lock, otherwise it can cause races
 	  * during fsync (we start as a fast fsync and then end up in a full
-	  * fsync racing with ordered extent completion).
+	  * fsync racing with ordered extent completion). It's also safe to be
+	  * set during writeback without holding the VFS' inode lock because
+	  * a fsync flushes and waits for delalloc before starting to log.
 	  */
 	BTRFS_INODE_NEEDS_FULL_SYNC,
 	BTRFS_INODE_COPY_EVERYTHING,
@@ -453,14 +455,19 @@ static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
 }
 
 /*
- * Should be called while holding the inode's VFS lock in exclusive mode, or
- * while holding the inode's mmap lock (struct btrfs_inode::i_mmap_lock) in
+ * Should be called while holding the inode's VFS lock in exclusive/shared mode,
+ * or while holding the inode's mmap lock (struct btrfs_inode::i_mmap_lock) in
  * either shared or exclusive mode, or in a context where no one else can access
  * the inode concurrently (during inode creation or when loading an inode from
- * disk).
+ * disk). Can also be called in irq context from btrfs_finish_ordered_extent()
+ * because at that point we are either in a direct IO write, in which case the
+ * inode's VFS lock is held, or in end IO callback of a buffered write, in which
 +	  * case fsync always waits for writeback completion.
  */
 static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
 {
+	unsigned long flags;
+
 	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
 	/*
 	 * The inode may have been part of a reflink operation in the last
@@ -478,10 +485,10 @@ static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
 	 * while ->last_trans was not yet updated in the current transaction,
 	 * and therefore has a lower value.
 	 */
-	spin_lock(&inode->lock);
+	spin_lock_irqsave(&inode->lock, flags);
 	if (inode->last_reflink_trans < inode->last_trans)
 		inode->last_reflink_trans = inode->last_trans;
-	spin_unlock(&inode->lock);
+	spin_unlock_irqrestore(&inode->lock, flags);
 }
 
 static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0c7c1b42028e..d635bc0c01df 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1894,6 +1894,21 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		btrfs_get_ordered_extents_for_logging(BTRFS_I(inode),
 						      &ctx.ordered_extents);
 		ret = filemap_fdatawait_range(inode->i_mapping, start, end);
+		if (ret)
+			goto out_release_extents;
+
+		/*
+		 * Check again the full sync flag, because it may have been set
+		 * during the end IO callback (end_bbio_data_write() ->
+		 * btrfs_finish_ordered_extent()) in case an error happened and
+		 * we need to wait for ordered extents to complete so that any
+		 * extent maps that point to unwritten locations are dropped and
+		 * we don't log them.
+		 */
+		full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+				     &BTRFS_I(inode)->runtime_flags);
+		if (full_sync)
+			ret = btrfs_wait_ordered_range(inode, start, len);
 	}
 
 	if (ret)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 44157e43fd2a..55a9aeed7344 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -388,6 +388,37 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 	ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
 	spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
 
+	/*
+	 * If this is a COW write it means we created new extent maps for the
+	 * range and they point to unwritten locations if we got an error either
+	 * before submitting a bio or during IO.
+	 *
+	 * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
+	 * are queuing its completion below. During completion, at
+	 * btrfs_finish_one_ordered(), we will drop the extent maps for the
+	 * unwritten extents.
+	 *
+	 * However because completion runs in a work queue we can end up having
+	 * a fast fsync running before that. In the case of direct IO, once we
+	 * unlock the inode the fsync might start, and we queue the completion
+	 * before unlocking the inode. In the case of buffered IO when writeback
+	 * finishes (end_bbio_data_write()) we queue the completion, so if the
+	 * writeback was triggered by a fast fsync, the fsync might start
+	 * logging before ordered extent completion runs in the work queue.
+	 *
+	 * The fast fsync will log file extent items based on the extent maps it
+	 * finds, so if by the time it collects extent maps the ordered extent
+	 * completion didn't happen yet, it will log file extent items that
+	 * point to unwritten extents, resulting in a corruption if a crash
+	 * happens and the log tree is replayed. Note that a fast fsync does not
+	 * wait for completion of ordered extents in order to reduce latency.
+	 *
+	 * Set a flag in the inode so that the next fast fsync will wait for
+	 * ordered extents to complete before starting to log.
+	 */
+	if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
+		btrfs_set_inode_full_sync(inode);
+
 	if (ret)
 		btrfs_queue_ordered_fn(ordered);
 	return ret;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/2] btrfs: make btrfs_finish_ordered_extent() return void
  2024-05-17 16:52 [PATCH v3 0/2] btrfs: fix logging unwritten extents after failure in write paths fdmanana
  2024-05-17 16:52 ` [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
@ 2024-05-17 16:52 ` fdmanana
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
  2 siblings, 0 replies; 14+ messages in thread
From: fdmanana @ 2024-05-17 16:52 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Currently btrfs_finish_ordered_extent() returns a boolean indicating if
the ordered extent was added to the work queue for completion, but none
of its callers cares about it, so make it return void.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ordered-data.c | 3 +--
 fs/btrfs/ordered-data.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 55a9aeed7344..adc274605776 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -374,7 +374,7 @@ static void btrfs_queue_ordered_fn(struct btrfs_ordered_extent *ordered)
 	btrfs_queue_work(wq, &ordered->work);
 }
 
-bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+void btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 				 struct page *page, u64 file_offset, u64 len,
 				 bool uptodate)
 {
@@ -421,7 +421,6 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 
 	if (ret)
 		btrfs_queue_ordered_fn(ordered);
-	return ret;
 }
 
 /*
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index b6f6c6b91732..bef22179e7c5 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -162,7 +162,7 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 void btrfs_put_ordered_extent(struct btrfs_ordered_extent *entry);
 void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 				struct btrfs_ordered_extent *entry);
-bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+void btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 				 struct page *page, u64 file_offset, u64 len,
 				 bool uptodate);
 void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure
  2024-05-17 16:52 ` [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
@ 2024-05-18  5:28   ` Qu Wenruo
  2024-05-20  9:46     ` Filipe Manana
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-05-18  5:28 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2024/5/18 02:22, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If a write path in COW mode fails, either before submitting a bio for the
> new extents or an actual IO error happens, we can end up allowing a fast
> fsync to log file extent items that point to unwritten extents.
>
> This is because dropping the extent maps happens when completing ordered
> extents, at btrfs_finish_one_ordered(), and the completion of an ordered
> extent is executed in a work queue.
>
> This can result in a fast fsync to start logging file extent items based
> on existing extent maps before the ordered extents complete, therefore
> resulting in a log that has file extent items that point to unwritten
> extents, resulting in a corrupt file if a crash happens after and the log
> tree is replayed the next time the fs is mounted.
>
> This can happen for both direct IO writes and buffered writes.
>
> For example consider a direct IO write, in COW mode, that fails at
> btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
> error:
>
> 1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
>     set to false, meaning an error happened;
>
> 2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
>     flag;
>
> 3) btrfs_finish_ordered_extent() queues the completion of the ordered
>     extent - so that btrfs_finish_one_ordered() will be executed later in
>     a work queue. That function will drop extent maps in the range when
>     it's executed, since the extent maps point to unwritten locations
>     (signaled by the BTRFS_ORDERED_IOERR flag);
>
> 4) After calling btrfs_finish_ordered_extent() we keep going down the
>     write path and unlock the inode;
>
> 5) After that a fast fsync starts and locks the inode;
>
> 6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
>     task sees the extent maps that point to the unwritten locations and
>     logs file extent items based on them - it does not know they are
>     unwritten, and the fast fsync path does not wait for ordered extents
>     to complete, which is an intentional behaviour in order to reduce
>     latency.
>
> For the buffered write case, here's one example:
>
> 1) A fast fsync begins, and it starts by flushing delalloc and waiting for
>     the writeback to complete by calling filemap_fdatawait_range();
>
> 2) Flushing the delalloc created a new extent map X;
>
> 3) During the writeback some IO error happened, and at the end io callback
>     (end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
>     sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
>     completion;
>
> 4) After queuing the ordered extent completion, the end io callback clears
>     the writeback flag from all pages (or folios), and from that moment the
>     fast fsync can proceed;
>
> 5) The fast fsync proceeds, sees extent map X and logs a file extent item
>     based on extent map X, resulting in a log that points to an unwritten
>     data extent - because the ordered extent completion hasn't run yet, it
>     happens only after the logging.
>
> To fix this make btrfs_finish_ordered_extent() set the inode flag
> BTRFS_INODE_NEEDS_FULL_SYNC in case an error happened for a COW write,
> so that a fast fsync will wait for ordered extent completion.
>
> Note that this issue of using extent maps that point to unwritten
> locations cannot happen for reads, because in read paths we start by
> locking the extent range and wait for any ordered extents in the range
> to complete before looking for extent maps.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Thanks for the updated commit message, it makes the race window much clearer.

And since we no longer try to run finish_ordered_io() inside the endio
function, we should no longer hit the memory allocation warning inside
irq context.

But the inode->lock usage seems unsafe to me, comment inlined below:
[...]
> @@ -478,10 +485,10 @@ static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
>   	 * while ->last_trans was not yet updated in the current transaction,
>   	 * and therefore has a lower value.
>   	 */
> -	spin_lock(&inode->lock);
> +	spin_lock_irqsave(&inode->lock, flags);
>   	if (inode->last_reflink_trans < inode->last_trans)
>   		inode->last_reflink_trans = inode->last_trans;
> -	spin_unlock(&inode->lock);
> +	spin_unlock_irqrestore(&inode->lock, flags);

IIRC this is not how we change the lock usage to be irq safe.
We need all lock users to use irq variants.

Or we can hit situation like:

	Thread A

	spin_lock(&inode->lock);
--- IRQ happens for the endio ---
	spin_lock_irqsave();

Then we dead lock.

Thus all inode->lock users need to use the irq variant, which would be
a huge change.

I guess unconditionally waiting for ordered extents inside
btrfs_sync_file() would be too slow?

Thanks,
Qu

>   }
>
>   static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0c7c1b42028e..d635bc0c01df 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1894,6 +1894,21 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>   		btrfs_get_ordered_extents_for_logging(BTRFS_I(inode),
>   						      &ctx.ordered_extents);
>   		ret = filemap_fdatawait_range(inode->i_mapping, start, end);
> +		if (ret)
> +			goto out_release_extents;
> +
> +		/*
> +		 * Check again the full sync flag, because it may have been set
> +		 * during the end IO callback (end_bbio_data_write() ->
> +		 * btrfs_finish_ordered_extent()) in case an error happened and
> +		 * we need to wait for ordered extents to complete so that any
> +		 * extent maps that point to unwritten locations are dropped and
> +		 * we don't log them.
> +		 */
> +		full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> +				     &BTRFS_I(inode)->runtime_flags);
> +		if (full_sync)
> +			ret = btrfs_wait_ordered_range(inode, start, len);
>   	}
>
>   	if (ret)
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 44157e43fd2a..55a9aeed7344 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -388,6 +388,37 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
>   	ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
>   	spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
>
> +	/*
> +	 * If this is a COW write it means we created new extent maps for the
> +	 * range and they point to unwritten locations if we got an error either
> +	 * before submitting a bio or during IO.
> +	 *
> +	 * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
> +	 * are queuing its completion below. During completion, at
> +	 * btrfs_finish_one_ordered(), we will drop the extent maps for the
> +	 * unwritten extents.
> +	 *
> +	 * However because completion runs in a work queue we can end up having
> +	 * a fast fsync running before that. In the case of direct IO, once we
> +	 * unlock the inode the fsync might start, and we queue the completion
> +	 * before unlocking the inode. In the case of buffered IO when writeback
> +	 * finishes (end_bbio_data_write()) we queue the completion, so if the
> +	 * writeback was triggered by a fast fsync, the fsync might start
> +	 * logging before ordered extent completion runs in the work queue.
> +	 *
> +	 * The fast fsync will log file extent items based on the extent maps it
> +	 * finds, so if by the time it collects extent maps the ordered extent
> +	 * completion didn't happen yet, it will log file extent items that
> +	 * point to unwritten extents, resulting in a corruption if a crash
> +	 * happens and the log tree is replayed. Note that a fast fsync does not
> +	 * wait for completion of ordered extents in order to reduce latency.
> +	 *
> +	 * Set a flag in the inode so that the next fast fsync will wait for
> +	 * ordered extents to complete before starting to log.
> +	 */
> +	if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
> +		btrfs_set_inode_full_sync(inode);
> +
>   	if (ret)
>   		btrfs_queue_ordered_fn(ordered);
>   	return ret;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure
  2024-05-18  5:28   ` Qu Wenruo
@ 2024-05-20  9:46     ` Filipe Manana
  0 siblings, 0 replies; 14+ messages in thread
From: Filipe Manana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Sat, May 18, 2024 at 6:28 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2024/5/18 02:22, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > If a write path in COW mode fails, either before submitting a bio for the
> > new extents or an actual IO error happens, we can end up allowing a fast
> > fsync to log file extent items that point to unwritten extents.
> >
> > This is because dropping the extent maps happens when completing ordered
> > extents, at btrfs_finish_one_ordered(), and the completion of an ordered
> > extent is executed in a work queue.
> >
> > This can result in a fast fsync to start logging file extent items based
> > on existing extent maps before the ordered extents complete, therefore
> > resulting in a log that has file extent items that point to unwritten
> > extents, resulting in a corrupt file if a crash happens after and the log
> > tree is replayed the next time the fs is mounted.
> >
> > This can happen for both direct IO writes and buffered writes.
> >
> > For example consider a direct IO write, in COW mode, that fails at
> > btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
> > error:
> >
> > 1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
> >     set to false, meaning an error happened;
> >
> > 2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
> >     flag;
> >
> > 3) btrfs_finish_ordered_extent() queues the completion of the ordered
> >     extent - so that btrfs_finish_one_ordered() will be executed later in
> >     a work queue. That function will drop extent maps in the range when
> >     it's executed, since the extent maps point to unwritten locations
> >     (signaled by the BTRFS_ORDERED_IOERR flag);
> >
> > 4) After calling btrfs_finish_ordered_extent() we keep going down the
> >     write path and unlock the inode;
> >
> > 5) After that a fast fsync starts and locks the inode;
> >
> > 6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
> >     task sees the extent maps that point to the unwritten locations and
> >     logs file extent items based on them - it does not know they are
> >     unwritten, and the fast fsync path does not wait for ordered extents
> >     to complete, which is an intentional behaviour in order to reduce
> >     latency.
> >
> > For the buffered write case, here's one example:
> >
> > 1) A fast fsync begins, and it starts by flushing delalloc and waiting for
> >     the writeback to complete by calling filemap_fdatawait_range();
> >
> > 2) Flushing the delalloc created a new extent map X;
> >
> > 3) During the writeback some IO error happened, and at the end io callback
> >     (end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
> >     sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
> >     completion;
> >
> > 4) After queuing the ordered extent completion, the end io callback clears
> >     the writeback flag from all pages (or folios), and from that moment the
> >     fast fsync can proceed;
> >
> > 5) The fast fsync proceeds, sees extent map X and logs a file extent item
> >     based on extent map X, resulting in a log that points to an unwritten
> >     data extent - because the ordered extent completion hasn't run yet, it
> >     happens only after the logging.
> >
> > To fix this make btrfs_finish_ordered_extent() set the inode flag
> > BTRFS_INODE_NEEDS_FULL_SYNC in case an error happened for a COW write,
> > so that a fast fsync will wait for ordered extent completion.
> >
> > Note that this issue of using extent maps that point to unwritten
> > locations cannot happen for reads, because in read paths we start by
> > locking the extent range and wait for any ordered extents in the range
> > to complete before looking for extent maps.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Thanks for the updated commit message, it makes the race window much clearer.
>
> And since we no longer try to run finish_ordered_io() inside the endio
> function, we should no longer hit the memory allocation warning inside
> irq context.
>
> But the inode->lock usage seems unsafe to me, comment inlined below:
> [...]
> > @@ -478,10 +485,10 @@ static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
> >        * while ->last_trans was not yet updated in the current transaction,
> >        * and therefore has a lower value.
> >        */
> > -     spin_lock(&inode->lock);
> > +     spin_lock_irqsave(&inode->lock, flags);
> >       if (inode->last_reflink_trans < inode->last_trans)
> >               inode->last_reflink_trans = inode->last_trans;
> > -     spin_unlock(&inode->lock);
> > +     spin_unlock_irqrestore(&inode->lock, flags);
>
> IIRC this is not how we change the lock usage to be irq safe.
> We need all lock users to use irq variants.
>
> Or we can hit situation like:
>
>         Thread A
>
>         spin_lock(&inode->lock);
> --- IRQ happens for the endio ---
>         spin_lock_irqsave();
>
> Then we dead lock.
>
> Thus all inode->lock users need to use the irq variant, which would be
> a huge change.

Indeed, I missed that, thanks.

>
> I guess unconditionally waiting for ordered extents inside
> btrfs_sync_file() would be too slow?

No way. Not waiting for ordered extent completion is one of the main
things that makes the fast fsync faster than the full fsync.
It's ok to wait only in case of errors, since they are unexpected and
unlikely, and in error cases the ordered extent completion doesn't do
much (no trees to update).

Fixed in v4, thanks.

>
> Thanks,
> Qu
>
> >   }
> >
> >   static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 0c7c1b42028e..d635bc0c01df 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1894,6 +1894,21 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >               btrfs_get_ordered_extents_for_logging(BTRFS_I(inode),
> >                                                     &ctx.ordered_extents);
> >               ret = filemap_fdatawait_range(inode->i_mapping, start, end);
> > +             if (ret)
> > +                     goto out_release_extents;
> > +
> > +             /*
> > +              * Check again the full sync flag, because it may have been set
> > +              * during the end IO callback (end_bbio_data_write() ->
> > +              * btrfs_finish_ordered_extent()) in case an error happened and
> > +              * we need to wait for ordered extents to complete so that any
> > +              * extent maps that point to unwritten locations are dropped and
> > +              * we don't log them.
> > +              */
> > +             full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> > +                                  &BTRFS_I(inode)->runtime_flags);
> > +             if (full_sync)
> > +                     ret = btrfs_wait_ordered_range(inode, start, len);
> >       }
> >
> >       if (ret)
> > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> > index 44157e43fd2a..55a9aeed7344 100644
> > --- a/fs/btrfs/ordered-data.c
> > +++ b/fs/btrfs/ordered-data.c
> > @@ -388,6 +388,37 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
> >       ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
> >       spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
> >
> > +     /*
> > +      * If this is a COW write it means we created new extent maps for the
> > +      * range and they point to unwritten locations if we got an error either
> > +      * before submitting a bio or during IO.
> > +      *
> > +      * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
> > +      * are queuing its completion below. During completion, at
> > +      * btrfs_finish_one_ordered(), we will drop the extent maps for the
> > +      * unwritten extents.
> > +      *
> > +      * However because completion runs in a work queue we can end up having
> > +      * a fast fsync running before that. In the case of direct IO, once we
> > +      * unlock the inode the fsync might start, and we queue the completion
> > +      * before unlocking the inode. In the case of buffered IO when writeback
> > +      * finishes (end_bbio_data_write()) we queue the completion, so if the
> > +      * writeback was triggered by a fast fsync, the fsync might start
> > +      * logging before ordered extent completion runs in the work queue.
> > +      *
> > +      * The fast fsync will log file extent items based on the extent maps it
> > +      * finds, so if by the time it collects extent maps the ordered extent
> > +      * completion didn't happen yet, it will log file extent items that
> > +      * point to unwritten extents, resulting in a corruption if a crash
> > +      * happens and the log tree is replayed. Note that a fast fsync does not
> > +      * wait for completion of ordered extents in order to reduce latency.
> > +      *
> > +      * Set a flag in the inode so that the next fast fsync will wait for
> > +      * ordered extents to complete before starting to log.
> > +      */
> > +     if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
> > +             btrfs_set_inode_full_sync(inode);
> > +
> >       if (ret)
> >               btrfs_queue_ordered_fn(ordered);
> >       return ret;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths
  2024-05-17 16:52 [PATCH v3 0/2] btrfs: fix logging unwritten extents after failure in write paths fdmanana
  2024-05-17 16:52 ` [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
  2024-05-17 16:52 ` [PATCH v3 2/2] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
@ 2024-05-20  9:46 ` fdmanana
  2024-05-20  9:46   ` [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
                     ` (6 more replies)
  2 siblings, 7 replies; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

There's a bug where a fast fsync can log extent maps that were not written
due to an error in a write path or during writeback. This affects both
direct IO writes and buffered writes. Besides the write failure itself,
hitting the bug depends on a race, since ordered extent completion happens
in a work queue and a fast fsync doesn't wait for ordered extent completion
before logging. The details are in the change log of the first patch.

V4: Use a slightly different approach to avoid a deadlock on the inode's
    spinlock due to it being used both in irq and non-irq context, pointed
    out by Qu.
    Added some cleanup patches (patches 3, 4, 5 and 6).

V3: Change the approach of patch 1/2 to not drop extent maps at
    btrfs_finish_ordered_extent() since that runs in irq context and
    dropping an extent map range triggers NOFS extent map allocations,
    which can trigger a reclaim and that can't run in irq context.
    Updated comments and changelog to distinguish differences between
    failures for direct IO writes and buffered writes.

V2: Rework solution since other error paths caused the same problem, make
    it more generic.
    Added more details to change log and comment about what's going on,
    and why reads aren't affected.

    https://lore.kernel.org/linux-btrfs/cover.1715798440.git.fdmanana@suse.com/

V1: https://lore.kernel.org/linux-btrfs/cover.1715688057.git.fdmanana@suse.com/

Filipe Manana (6):
  btrfs: ensure fast fsync waits for ordered extents after a write failure
  btrfs: make btrfs_finish_ordered_extent() return void
  btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx)
  btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()
  btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()
  btrfs: use a btrfs_inode local variable at btrfs_sync_file()

 fs/btrfs/btrfs_inode.h      | 10 ++++++
 fs/btrfs/file.c             | 63 ++++++++++++++++++++++---------------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  4 +--
 fs/btrfs/inode.c            | 16 +++++-----
 fs/btrfs/ordered-data.c     | 40 ++++++++++++++++++++---
 fs/btrfs/ordered-data.h     |  4 +--
 fs/btrfs/reflink.c          |  8 ++---
 fs/btrfs/relocation.c       |  2 +-
 fs/btrfs/tree-log.c         | 10 +++---
 fs/btrfs/tree-log.h         |  4 +--
 11 files changed, 108 insertions(+), 55 deletions(-)

-- 
2.43.0



* [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
@ 2024-05-20  9:46   ` fdmanana
  2024-05-20 10:20     ` Qu Wenruo
  2024-05-20  9:46   ` [PATCH v4 2/6] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

If a write path in COW mode fails, either before submitting a bio for the
new extents or because an actual IO error happened, we can end up allowing
a fast fsync to log file extent items that point to unwritten extents.

This is because dropping the extent maps happens when completing ordered
extents, at btrfs_finish_one_ordered(), and the completion of an ordered
extent is executed in a work queue.

This can result in a fast fsync starting to log file extent items based
on existing extent maps before the ordered extents complete, therefore
producing a log that has file extent items pointing to unwritten extents,
which leads to a corrupt file if a crash happens afterwards and the log
tree is replayed the next time the fs is mounted.

This can happen for both direct IO writes and buffered writes.

For example consider a direct IO write, in COW mode, that fails at
btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
error:

1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
   set to false, meaning an error happened;

2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
   flag;

3) btrfs_finish_ordered_extent() queues the completion of the ordered
   extent - so that btrfs_finish_one_ordered() will be executed later in
   a work queue. That function will drop extent maps in the range when
   it's executed, since the extent maps point to unwritten locations
   (signaled by the BTRFS_ORDERED_IOERR flag);

4) After calling btrfs_finish_ordered_extent() we keep going down the
   write path and unlock the inode;

5) After that a fast fsync starts and locks the inode;

6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
   task sees the extent maps that point to the unwritten locations and
   logs file extent items based on them - it does not know they are
   unwritten, and the fast fsync path does not wait for ordered extents
   to complete, which is an intentional behaviour in order to reduce
   latency.

For the buffered write case, here's one example:

1) A fast fsync begins, and it starts by flushing delalloc and waiting for
   the writeback to complete by calling filemap_fdatawait_range();

2) Flushing the delalloc created a new extent map X;

3) During the writeback some IO error happened, and in the end IO callback
   (end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
   sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
   completion;

4) After queuing the ordered extent completion, the end io callback clears
   the writeback flag from all pages (or folios), and from that moment the
   fast fsync can proceed;

5) The fast fsync proceeds, sees extent map X and logs a file extent item
   based on it, resulting in a log that points to an unwritten data
   extent - because the ordered extent completion hasn't run yet, it
   happens only after the logging.

To fix this make btrfs_finish_ordered_extent() set the inode flag
BTRFS_INODE_COW_WRITE_ERROR whenever an error happens for a COW write,
so that a fast fsync will wait for ordered extent completion.

Note that this issue of using extent maps that point to unwritten
locations can not happen for reads, because in the read paths we start by
locking the extent range and waiting for any ordered extents in the range
to complete before looking for extent maps.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/btrfs_inode.h  | 10 ++++++++++
 fs/btrfs/file.c         | 16 ++++++++++++++++
 fs/btrfs/ordered-data.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 3c8bc7a8ebdd..46db4027bf15 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -112,6 +112,16 @@ enum {
 	 * done at new_simple_dir(), called from btrfs_lookup_dentry().
 	 */
 	BTRFS_INODE_ROOT_STUB,
+	/*
+	 * Set if an error happened when doing a COW write before submitting a
+	 * bio or during writeback. Used for both buffered writes and direct IO
+	 * writes. This is to signal a fast fsync that it has to wait for
+	 * ordered extents to complete and therefore not log extent maps that
+	 * point to unwritten extents (when an ordered extent completes and it
+	 * has the BTRFS_ORDERED_IOERR flag set, it drops extent maps in its
+	 * range).
+	 */
+	BTRFS_INODE_COW_WRITE_ERROR,
 };
 
 /* in memory btrfs inode */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0c7c1b42028e..00670596bf06 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1885,6 +1885,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	if (full_sync || btrfs_is_zoned(fs_info)) {
 		ret = btrfs_wait_ordered_range(inode, start, len);
+		clear_bit(BTRFS_INODE_COW_WRITE_ERROR, &BTRFS_I(inode)->runtime_flags);
 	} else {
 		/*
 		 * Get our ordered extents as soon as possible to avoid doing
@@ -1894,6 +1895,21 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		btrfs_get_ordered_extents_for_logging(BTRFS_I(inode),
 						      &ctx.ordered_extents);
 		ret = filemap_fdatawait_range(inode->i_mapping, start, end);
+		if (ret)
+			goto out_release_extents;
+
+		/*
+		 * Check and clear the BTRFS_INODE_COW_WRITE_ERROR now after
+		 * starting and waiting for writeback, because for buffered IO
+		 * it may have been set during the end IO callback
+		 * (end_bbio_data_write() -> btrfs_finish_ordered_extent()) in
+		 * case an error happened and we need to wait for ordered
+		 * extents to complete so that any extent maps that point to
+		 * unwritten locations are dropped and we don't log them.
+		 */
+		if (test_and_clear_bit(BTRFS_INODE_COW_WRITE_ERROR,
+				       &BTRFS_I(inode)->runtime_flags))
+			ret = btrfs_wait_ordered_range(inode, start, len);
 	}
 
 	if (ret)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 44157e43fd2a..7d175d10a6d0 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -388,6 +388,37 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 	ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
 	spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
 
+	/*
+	 * If this is a COW write it means we created new extent maps for the
+	 * range and they point to unwritten locations if we got an error either
+	 * before submitting a bio or during IO.
+	 *
+	 * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
+	 * are queuing its completion below. During completion, at
+	 * btrfs_finish_one_ordered(), we will drop the extent maps for the
+	 * unwritten extents.
+	 *
+	 * However because completion runs in a work queue we can end up having
+	 * a fast fsync running before that. In the case of direct IO, once we
+	 * unlock the inode the fsync might start, and we queue the completion
+	 * before unlocking the inode. In the case of buffered IO when writeback
+	 * finishes (end_bbio_data_write()) we queue the completion, so if the
+	 * writeback was triggered by a fast fsync, the fsync might start
+	 * logging before ordered extent completion runs in the work queue.
+	 *
+	 * The fast fsync will log file extent items based on the extent maps it
+	 * finds, so if by the time it collects extent maps the ordered extent
+	 * completion didn't happen yet, it will log file extent items that
+	 * point to unwritten extents, resulting in a corruption if a crash
+	 * happens and the log tree is replayed. Note that a fast fsync does not
+	 * wait for completion of ordered extents in order to reduce latency.
+	 *
+	 * Set a flag in the inode so that the next fast fsync will wait for
+	 * ordered extents to complete before starting to log.
+	 */
+	if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
+		set_bit(BTRFS_INODE_COW_WRITE_ERROR, &inode->runtime_flags);
+
 	if (ret)
 		btrfs_queue_ordered_fn(ordered);
 	return ret;
-- 
2.43.0



* [PATCH v4 2/6] btrfs: make btrfs_finish_ordered_extent() return void
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
  2024-05-20  9:46   ` [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
@ 2024-05-20  9:46   ` fdmanana
  2024-05-20  9:46   ` [PATCH v4 3/6] btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx) fdmanana
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Currently btrfs_finish_ordered_extent() returns a boolean indicating if
the ordered extent was added to the work queue for completion, but none
of its callers cares about it, so make it return void.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ordered-data.c | 3 +--
 fs/btrfs/ordered-data.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 7d175d10a6d0..16f9ddd2831c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -374,7 +374,7 @@ static void btrfs_queue_ordered_fn(struct btrfs_ordered_extent *ordered)
 	btrfs_queue_work(wq, &ordered->work);
 }
 
-bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+void btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 				 struct page *page, u64 file_offset, u64 len,
 				 bool uptodate)
 {
@@ -421,7 +421,6 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 
 	if (ret)
 		btrfs_queue_ordered_fn(ordered);
-	return ret;
 }
 
 /*
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index b6f6c6b91732..bef22179e7c5 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -162,7 +162,7 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 void btrfs_put_ordered_extent(struct btrfs_ordered_extent *entry);
 void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 				struct btrfs_ordered_extent *entry);
-bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
+void btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 				 struct page *page, u64 file_offset, u64 len,
 				 bool uptodate);
 void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
-- 
2.43.0



* [PATCH v4 3/6] btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx)
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
  2024-05-20  9:46   ` [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
  2024-05-20  9:46   ` [PATCH v4 2/6] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
@ 2024-05-20  9:46   ` fdmanana
  2024-05-20  9:46   ` [PATCH v4 4/6] btrfs: pass a btrfs_inode to btrfs_fdatawrite_range() fdmanana
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of using an inode pointer, use a btrfs_inode pointer in the log
context structure, as this is generally what we need and allows for some
internal APIs to take a btrfs_inode instead, making them more consistent
with most of the code base. This will later help remove a lot of
BTRFS_I() calls in btrfs_sync_file().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c     |  4 ++--
 fs/btrfs/tree-log.c | 10 +++++-----
 fs/btrfs/tree-log.h |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 00670596bf06..506eabcd809d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1758,7 +1758,7 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
 
 static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
 {
-	struct btrfs_inode *inode = BTRFS_I(ctx->inode);
+	struct btrfs_inode *inode = ctx->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 
 	if (btrfs_inode_in_log(inode, btrfs_get_fs_generation(fs_info)) &&
@@ -1805,7 +1805,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 	trace_btrfs_sync_file(file, datasync);
 
-	btrfs_init_log_ctx(&ctx, inode);
+	btrfs_init_log_ctx(&ctx, BTRFS_I(inode));
 
 	/*
 	 * Always set the range to a full range, otherwise we can get into
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 2e762b89d4a2..51a167559ae8 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2821,7 +2821,7 @@ static void wait_for_writer(struct btrfs_root *root)
 	finish_wait(&root->log_writer_wait, &wait);
 }
 
-void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct inode *inode)
+void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct btrfs_inode *inode)
 {
 	ctx->log_ret = 0;
 	ctx->log_transid = 0;
@@ -2840,7 +2840,7 @@ void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct inode *inode)
 
 void btrfs_init_log_ctx_scratch_eb(struct btrfs_log_ctx *ctx)
 {
-	struct btrfs_inode *inode = BTRFS_I(ctx->inode);
+	struct btrfs_inode *inode = ctx->inode;
 
 	if (!test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
 	    !test_bit(BTRFS_INODE_COPY_EVERYTHING, &inode->runtime_flags))
@@ -2858,7 +2858,7 @@ void btrfs_release_log_ctx_extents(struct btrfs_log_ctx *ctx)
 	struct btrfs_ordered_extent *ordered;
 	struct btrfs_ordered_extent *tmp;
 
-	ASSERT(inode_is_locked(ctx->inode));
+	ASSERT(inode_is_locked(&ctx->inode->vfs_inode));
 
 	list_for_each_entry_safe(ordered, tmp, &ctx->ordered_extents, log_list) {
 		list_del_init(&ordered->log_list);
@@ -5908,7 +5908,7 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans,
 			if (ret < 0) {
 				return ret;
 			} else if (ret > 0 &&
-				   other_ino != btrfs_ino(BTRFS_I(ctx->inode))) {
+				   other_ino != btrfs_ino(ctx->inode)) {
 				if (ins_nr > 0) {
 					ins_nr++;
 				} else {
@@ -7570,7 +7570,7 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			goto out;
 	}
 
-	btrfs_init_log_ctx(&ctx, &inode->vfs_inode);
+	btrfs_init_log_ctx(&ctx, inode);
 	ctx.logging_new_name = true;
 	btrfs_init_log_ctx_scratch_eb(&ctx);
 	/*
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 22e9cbc81577..fa0a689259b1 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -37,7 +37,7 @@ struct btrfs_log_ctx {
 	bool logging_new_delayed_dentries;
 	/* Indicate if the inode being logged was logged before. */
 	bool logged_before;
-	struct inode *inode;
+	struct btrfs_inode *inode;
 	struct list_head list;
 	/* Only used for fast fsyncs. */
 	struct list_head ordered_extents;
@@ -55,7 +55,7 @@ struct btrfs_log_ctx {
 	struct extent_buffer *scratch_eb;
 };
 
-void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct inode *inode);
+void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx, struct btrfs_inode *inode);
 void btrfs_init_log_ctx_scratch_eb(struct btrfs_log_ctx *ctx);
 void btrfs_release_log_ctx_extents(struct btrfs_log_ctx *ctx);
 
-- 
2.43.0



* [PATCH v4 4/6] btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
                     ` (2 preceding siblings ...)
  2024-05-20  9:46   ` [PATCH v4 3/6] btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx) fdmanana
@ 2024-05-20  9:46   ` fdmanana
  2024-05-20  9:46   ` [PATCH v4 5/6] btrfs: pass a btrfs_inode to btrfs_wait_ordered_range() fdmanana
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of passing a (VFS) inode pointer argument, pass a btrfs_inode
instead, as this is generally what we do for internal APIs, making it
more consistent with most of the code base. This will later help remove
a lot of BTRFS_I() calls in btrfs_sync_file().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c             | 18 +++++++++---------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/ordered-data.c     |  2 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 506eabcd809d..23c5510f6271 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1625,7 +1625,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	 * able to read what was just written.
 	 */
 	endbyte = pos + written_buffered - 1;
-	ret = btrfs_fdatawrite_range(inode, pos, endbyte);
+	ret = btrfs_fdatawrite_range(BTRFS_I(inode), pos, endbyte);
 	if (ret)
 		goto out;
 	ret = filemap_fdatawait_range(inode->i_mapping, pos, endbyte);
@@ -1738,7 +1738,7 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
+static int start_ordered_ops(struct btrfs_inode *inode, loff_t start, loff_t end)
 {
 	int ret;
 	struct blk_plug plug;
@@ -1825,7 +1825,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * multi-task, and make the performance up.  See
 	 * btrfs_wait_ordered_range for an explanation of the ASYNC check.
 	 */
-	ret = start_ordered_ops(inode, start, end);
+	ret = start_ordered_ops(BTRFS_I(inode), start, end);
 	if (ret)
 		goto out;
 
@@ -1851,7 +1851,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * So trigger writeback for any eventual new dirty pages and then we
 	 * wait for all ordered extents to complete below.
 	 */
-	ret = start_ordered_ops(inode, start, end);
+	ret = start_ordered_ops(BTRFS_I(inode), start, end);
 	if (ret) {
 		btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
 		goto out;
@@ -4045,8 +4045,9 @@ const struct file_operations btrfs_file_operations = {
 	.remap_file_range = btrfs_remap_file_range,
 };
 
-int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end)
+int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end)
 {
+	struct address_space *mapping = inode->vfs_inode.i_mapping;
 	int ret;
 
 	/*
@@ -4063,10 +4064,9 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end)
 	 * know better and pull this out at some point in the future, it is
 	 * right and you are wrong.
 	 */
-	ret = filemap_fdatawrite_range(inode->i_mapping, start, end);
-	if (!ret && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-			     &BTRFS_I(inode)->runtime_flags))
-		ret = filemap_fdatawrite_range(inode->i_mapping, start, end);
+	ret = filemap_fdatawrite_range(mapping, start, end);
+	if (!ret && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags))
+		ret = filemap_fdatawrite_range(mapping, start, end);
 
 	return ret;
 }
diff --git a/fs/btrfs/file.h b/fs/btrfs/file.h
index 77aaca208c7b..ce93ed7083ab 100644
--- a/fs/btrfs/file.h
+++ b/fs/btrfs/file.h
@@ -37,7 +37,7 @@ int btrfs_release_file(struct inode *inode, struct file *file);
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		      size_t num_pages, loff_t pos, size_t write_bytes,
 		      struct extent_state **cached, bool noreserve);
-int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
+int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end);
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 			   size_t *write_bytes, bool nowait);
 void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index c8a05d5eb9cb..e6d599efd713 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1483,7 +1483,7 @@ static int __btrfs_write_out_cache(struct inode *inode,
 	io_ctl->entries = entries;
 	io_ctl->bitmaps = bitmaps;
 
-	ret = btrfs_fdatawrite_range(inode, 0, (u64)-1);
+	ret = btrfs_fdatawrite_range(BTRFS_I(inode), 0, (u64)-1);
 	if (ret)
 		goto out;
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 16f9ddd2831c..605d88e09525 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -859,7 +859,7 @@ int btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 	/* start IO across the range first to instantiate any delalloc
 	 * extents
 	 */
-	ret = btrfs_fdatawrite_range(inode, start, orig_end);
+	ret = btrfs_fdatawrite_range(BTRFS_I(inode), start, orig_end);
 	if (ret)
 		return ret;
 
-- 
2.43.0



* [PATCH v4 5/6] btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
                     ` (3 preceding siblings ...)
  2024-05-20  9:46   ` [PATCH v4 4/6] btrfs: pass a btrfs_inode to btrfs_fdatawrite_range() fdmanana
@ 2024-05-20  9:46   ` fdmanana
  2024-05-20  9:46   ` [PATCH v4 6/6] btrfs: use a btrfs_inode local variable at btrfs_sync_file() fdmanana
  2024-05-20 10:23   ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths Qu Wenruo
  6 siblings, 0 replies; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of passing a (VFS) inode pointer argument, pass a btrfs_inode
instead, as this is generally what we do for internal APIs, making it
more consistent with most of the code base. This will later help remove
a lot of BTRFS_I() calls in btrfs_sync_file().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c             | 10 +++++-----
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/inode.c            | 16 ++++++++--------
 fs/btrfs/ordered-data.c     |  8 ++++----
 fs/btrfs/ordered-data.h     |  2 +-
 fs/btrfs/reflink.c          |  8 ++++----
 fs/btrfs/relocation.c       |  2 +-
 7 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 23c5510f6271..21a48d326b99 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1884,7 +1884,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * to wait for the IO to stabilize the logical address.
 	 */
 	if (full_sync || btrfs_is_zoned(fs_info)) {
-		ret = btrfs_wait_ordered_range(inode, start, len);
+		ret = btrfs_wait_ordered_range(BTRFS_I(inode), start, len);
 		clear_bit(BTRFS_INODE_COW_WRITE_ERROR, &BTRFS_I(inode)->runtime_flags);
 	} else {
 		/*
@@ -1909,7 +1909,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		 */
 		if (test_and_clear_bit(BTRFS_INODE_COW_WRITE_ERROR,
 				       &BTRFS_I(inode)->runtime_flags))
-			ret = btrfs_wait_ordered_range(inode, start, len);
+			ret = btrfs_wait_ordered_range(BTRFS_I(inode), start, len);
 	}
 
 	if (ret)
@@ -2014,7 +2014,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		ret = btrfs_end_transaction(trans);
 		if (ret)
 			goto out;
-		ret = btrfs_wait_ordered_range(inode, start, len);
+		ret = btrfs_wait_ordered_range(BTRFS_I(inode), start, len);
 		if (ret)
 			goto out;
 
@@ -2814,7 +2814,7 @@ static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len)
 
 	btrfs_inode_lock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
 
-	ret = btrfs_wait_ordered_range(inode, offset, len);
+	ret = btrfs_wait_ordered_range(BTRFS_I(inode), offset, len);
 	if (ret)
 		goto out_only_mutex;
 
@@ -3309,7 +3309,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	 * the file range and, due to the previous locking we did, we know there
 	 * can't be more delalloc or ordered extents in the range.
 	 */
-	ret = btrfs_wait_ordered_range(inode, alloc_start,
+	ret = btrfs_wait_ordered_range(BTRFS_I(inode), alloc_start,
 				       alloc_end - alloc_start);
 	if (ret)
 		goto out;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e6d599efd713..8bed59fe937c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1268,7 +1268,7 @@ static int flush_dirty_cache(struct inode *inode)
 {
 	int ret;
 
-	ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+	ret = btrfs_wait_ordered_range(BTRFS_I(inode), 0, (u64)-1);
 	if (ret)
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, 0, inode->i_size - 1,
 				 EXTENT_DELALLOC, NULL);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2f3129fe0e58..3cf32bc721d2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5081,7 +5081,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 
 		if (btrfs_is_zoned(fs_info)) {
-			ret = btrfs_wait_ordered_range(inode,
+			ret = btrfs_wait_ordered_range(BTRFS_I(inode),
 					ALIGN(newsize, fs_info->sectorsize),
 					(u64)-1);
 			if (ret)
@@ -5111,7 +5111,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 			 * wait for disk_i_size to be stable and then update the
 			 * in-memory size to match.
 			 */
-			err = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+			err = btrfs_wait_ordered_range(BTRFS_I(inode), 0, (u64)-1);
 			if (err)
 				return err;
 			i_size_write(inode, BTRFS_I(inode)->disk_i_size);
@@ -7955,7 +7955,7 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	 * if we have delalloc in those ranges.
 	 */
 	if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC) {
-		ret = btrfs_wait_ordered_range(inode, 0, LLONG_MAX);
+		ret = btrfs_wait_ordered_range(btrfs_inode, 0, LLONG_MAX);
 		if (ret)
 			return ret;
 	}
@@ -7969,7 +7969,7 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	 * possible a new write may have happened in between those two steps.
 	 */
 	if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC) {
-		ret = btrfs_wait_ordered_range(inode, 0, LLONG_MAX);
+		ret = btrfs_wait_ordered_range(btrfs_inode, 0, LLONG_MAX);
 		if (ret) {
 			btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
 			return ret;
@@ -8238,7 +8238,7 @@ static int btrfs_truncate(struct btrfs_inode *inode, bool skip_writeback)
 	const u64 min_size = btrfs_calc_metadata_size(fs_info, 1);
 
 	if (!skip_writeback) {
-		ret = btrfs_wait_ordered_range(&inode->vfs_inode,
+		ret = btrfs_wait_ordered_range(inode,
 					       inode->vfs_inode.i_size & (~mask),
 					       (u64)-1);
 		if (ret)
@@ -10059,7 +10059,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 	for (;;) {
 		struct btrfs_ordered_extent *ordered;
 
-		ret = btrfs_wait_ordered_range(&inode->vfs_inode, start,
+		ret = btrfs_wait_ordered_range(inode, start,
 					       lockend - start + 1);
 		if (ret)
 			goto out_unlock_inode;
@@ -10302,7 +10302,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	for (;;) {
 		struct btrfs_ordered_extent *ordered;
 
-		ret = btrfs_wait_ordered_range(&inode->vfs_inode, start, num_bytes);
+		ret = btrfs_wait_ordered_range(inode, start, num_bytes);
 		if (ret)
 			goto out_folios;
 		ret = invalidate_inode_pages2_range(inode->vfs_inode.i_mapping,
@@ -10575,7 +10575,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	 * file changes again after this, the user is doing something stupid and
 	 * we don't really care.
 	 */
-	ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+	ret = btrfs_wait_ordered_range(BTRFS_I(inode), 0, (u64)-1);
 	if (ret)
 		return ret;
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 605d88e09525..e2c176f7c387 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -840,7 +840,7 @@ void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry)
 /*
  * Used to wait on ordered extents across a large range of bytes.
  */
-int btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
+int btrfs_wait_ordered_range(struct btrfs_inode *inode, u64 start, u64 len)
 {
 	int ret = 0;
 	int ret_wb = 0;
@@ -859,7 +859,7 @@ int btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 	/* start IO across the range first to instantiate any delalloc
 	 * extents
 	 */
-	ret = btrfs_fdatawrite_range(BTRFS_I(inode), start, orig_end);
+	ret = btrfs_fdatawrite_range(inode, start, orig_end);
 	if (ret)
 		return ret;
 
@@ -870,11 +870,11 @@ int btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 	 * before the ordered extents complete - to avoid failures (-EEXIST)
 	 * when adding the new ordered extents to the ordered tree.
 	 */
-	ret_wb = filemap_fdatawait_range(inode->i_mapping, start, orig_end);
+	ret_wb = filemap_fdatawait_range(inode->vfs_inode.i_mapping, start, orig_end);
 
 	end = orig_end;
 	while (1) {
-		ordered = btrfs_lookup_first_ordered_extent(BTRFS_I(inode), end);
+		ordered = btrfs_lookup_first_ordered_extent(inode, end);
 		if (!ordered)
 			break;
 		if (ordered->file_offset > orig_end) {
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index bef22179e7c5..4a4dd15d38ba 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -181,7 +181,7 @@ void btrfs_add_ordered_sum(struct btrfs_ordered_extent *entry,
 struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *inode,
 							 u64 file_offset);
 void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry);
-int btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len);
+int btrfs_wait_ordered_range(struct btrfs_inode *inode, u64 start, u64 len);
 struct btrfs_ordered_extent *
 btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset);
 struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index d0a3fcecc46a..df6b93b927cd 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -733,7 +733,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
 		 * we found the previous extent covering eof and before we
 		 * attempted to increment its reference count).
 		 */
-		ret = btrfs_wait_ordered_range(inode, wb_start,
+		ret = btrfs_wait_ordered_range(BTRFS_I(inode), wb_start,
 					       destoff - wb_start);
 		if (ret)
 			return ret;
@@ -755,7 +755,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
 	 * range, so wait for writeback to complete before truncating pages
 	 * from the page cache. This is a rare case.
 	 */
-	wb_ret = btrfs_wait_ordered_range(inode, destoff, len);
+	wb_ret = btrfs_wait_ordered_range(BTRFS_I(inode), destoff, len);
 	ret = ret ? ret : wb_ret;
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -835,11 +835,11 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	if (ret < 0)
 		return ret;
 
-	ret = btrfs_wait_ordered_range(inode_in, ALIGN_DOWN(pos_in, bs),
+	ret = btrfs_wait_ordered_range(BTRFS_I(inode_in), ALIGN_DOWN(pos_in, bs),
 				       wb_len);
 	if (ret < 0)
 		return ret;
-	ret = btrfs_wait_ordered_range(inode_out, ALIGN_DOWN(pos_out, bs),
+	ret = btrfs_wait_ordered_range(BTRFS_I(inode_out), ALIGN_DOWN(pos_out, bs),
 				       wb_len);
 	if (ret < 0)
 		return ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9f35524b6664..8ce337ec033c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4149,7 +4149,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
 		 * out of the loop if we hit an error.
 		 */
 		if (rc->stage == MOVE_DATA_EXTENTS && rc->found_file_extent) {
-			ret = btrfs_wait_ordered_range(rc->data_inode, 0,
+			ret = btrfs_wait_ordered_range(BTRFS_I(rc->data_inode), 0,
 						       (u64)-1);
 			if (ret)
 				err = ret;
-- 
2.43.0



* [PATCH v4 6/6] btrfs: use a btrfs_inode local variable at btrfs_sync_file()
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
                     ` (4 preceding siblings ...)
  2024-05-20  9:46   ` [PATCH v4 5/6] btrfs: pass a btrfs_inode to btrfs_wait_ordered_range() fdmanana
@ 2024-05-20  9:46   ` fdmanana
  2024-05-20 10:23   ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths Qu Wenruo
  6 siblings, 0 replies; 14+ messages in thread
From: fdmanana @ 2024-05-20  9:46 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Instead of using a VFS inode local pointer and then doing many BTRFS_I()
calls inside btrfs_sync_file(), use a btrfs_inode pointer instead. This
makes everything a bit easier to read and less confusing, and allows
making some statements shorter.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c | 43 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 21a48d326b99..af58a1b33498 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1794,9 +1794,9 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
 int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 {
 	struct dentry *dentry = file_dentry(file);
-	struct inode *inode = d_inode(dentry);
-	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
-	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_inode *inode = BTRFS_I(d_inode(dentry));
+	struct btrfs_root *root = inode->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_log_ctx ctx;
 	int ret = 0, err;
@@ -1805,7 +1805,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 	trace_btrfs_sync_file(file, datasync);
 
-	btrfs_init_log_ctx(&ctx, BTRFS_I(inode));
+	btrfs_init_log_ctx(&ctx, inode);
 
 	/*
 	 * Always set the range to a full range, otherwise we can get into
@@ -1825,11 +1825,11 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * multi-task, and make the performance up.  See
 	 * btrfs_wait_ordered_range for an explanation of the ASYNC check.
 	 */
-	ret = start_ordered_ops(BTRFS_I(inode), start, end);
+	ret = start_ordered_ops(inode, start, end);
 	if (ret)
 		goto out;
 
-	btrfs_inode_lock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+	btrfs_inode_lock(inode, BTRFS_ILOCK_MMAP);
 
 	atomic_inc(&root->log_batch);
 
@@ -1851,9 +1851,9 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * So trigger writeback for any eventual new dirty pages and then we
 	 * wait for all ordered extents to complete below.
 	 */
-	ret = start_ordered_ops(BTRFS_I(inode), start, end);
+	ret = start_ordered_ops(inode, start, end);
 	if (ret) {
-		btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+		btrfs_inode_unlock(inode, BTRFS_ILOCK_MMAP);
 		goto out;
 	}
 
@@ -1865,8 +1865,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * running delalloc the full sync flag may be set if we need to drop
 	 * extra extent map ranges due to temporary memory allocation failures.
 	 */
-	full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-			     &BTRFS_I(inode)->runtime_flags);
+	full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
 
 	/*
 	 * We have to do this here to avoid the priority inversion of waiting on
@@ -1884,17 +1883,16 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * to wait for the IO to stabilize the logical address.
 	 */
 	if (full_sync || btrfs_is_zoned(fs_info)) {
-		ret = btrfs_wait_ordered_range(BTRFS_I(inode), start, len);
-		clear_bit(BTRFS_INODE_COW_WRITE_ERROR, &BTRFS_I(inode)->runtime_flags);
+		ret = btrfs_wait_ordered_range(inode, start, len);
+		clear_bit(BTRFS_INODE_COW_WRITE_ERROR, &inode->runtime_flags);
 	} else {
 		/*
 		 * Get our ordered extents as soon as possible to avoid doing
 		 * checksum lookups in the csum tree, and use instead the
 		 * checksums attached to the ordered extents.
 		 */
-		btrfs_get_ordered_extents_for_logging(BTRFS_I(inode),
-						      &ctx.ordered_extents);
-		ret = filemap_fdatawait_range(inode->i_mapping, start, end);
+		btrfs_get_ordered_extents_for_logging(inode, &ctx.ordered_extents);
+		ret = filemap_fdatawait_range(inode->vfs_inode.i_mapping, start, end);
 		if (ret)
 			goto out_release_extents;
 
@@ -1908,8 +1906,8 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		 * unwritten locations are dropped and we don't log them.
 		 */
 		if (test_and_clear_bit(BTRFS_INODE_COW_WRITE_ERROR,
-				       &BTRFS_I(inode)->runtime_flags))
-			ret = btrfs_wait_ordered_range(BTRFS_I(inode), start, len);
+				       &inode->runtime_flags))
+			ret = btrfs_wait_ordered_range(inode, start, len);
 	}
 
 	if (ret)
@@ -1923,8 +1921,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		 * modified so clear this flag in case it was set for whatever
 		 * reason, it's no longer relevant.
 		 */
-		clear_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-			  &BTRFS_I(inode)->runtime_flags);
+		clear_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
 		/*
 		 * An ordered extent might have started before and completed
 		 * already with io errors, in which case the inode was not
@@ -1932,7 +1929,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		 * for any errors that might have happened since we last
 		 * checked called fsync.
 		 */
-		ret = filemap_check_wb_err(inode->i_mapping, file->f_wb_err);
+		ret = filemap_check_wb_err(inode->vfs_inode.i_mapping, file->f_wb_err);
 		goto out_release_extents;
 	}
 
@@ -1982,7 +1979,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * file again, but that will end up using the synchronization
 	 * inside btrfs_sync_log to keep things safe.
 	 */
-	btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+	btrfs_inode_unlock(inode, BTRFS_ILOCK_MMAP);
 
 	if (ret == BTRFS_NO_LOG_SYNC) {
 		ret = btrfs_end_transaction(trans);
@@ -2014,7 +2011,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		ret = btrfs_end_transaction(trans);
 		if (ret)
 			goto out;
-		ret = btrfs_wait_ordered_range(BTRFS_I(inode), start, len);
+		ret = btrfs_wait_ordered_range(inode, start, len);
 		if (ret)
 			goto out;
 
@@ -2051,7 +2048,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 out_release_extents:
 	btrfs_release_log_ctx_extents(&ctx);
-	btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_MMAP);
+	btrfs_inode_unlock(inode, BTRFS_ILOCK_MMAP);
 	goto out;
 }
 
-- 
2.43.0



* Re: [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure
  2024-05-20  9:46   ` [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
@ 2024-05-20 10:20     ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2024-05-20 10:20 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2024/5/20 19:16, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If a write path in COW mode fails, either before submitting a bio for the
> new extents or an actual IO error happens, we can end up allowing a fast
> fsync to log file extent items that point to unwritten extents.
>
> This is because dropping the extent maps happens when completing ordered
> extents, at btrfs_finish_one_ordered(), and the completion of an ordered
> extent is executed in a work queue.
>
> This can result in a fast fsync starting to log file extent items based
> on existing extent maps before the ordered extents complete, producing
> a log that has file extent items pointing to unwritten extents, and
> therefore a corrupt file if a crash happens afterwards and the log
> tree is replayed the next time the fs is mounted.
>
> This can happen for both direct IO writes and buffered writes.
>
> For example consider a direct IO write, in COW mode, that fails at
> btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an
> error:
>
> 1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter
>     set to false, meaning an error happened;
>
> 2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR
>     flag;
>
> 3) btrfs_finish_ordered_extent() queues the completion of the ordered
>     extent - so that btrfs_finish_one_ordered() will be executed later in
>     a work queue. That function will drop extent maps in the range when
>     it's executed, since the extent maps point to unwritten locations
>     (signaled by the BTRFS_ORDERED_IOERR flag);
>
> 4) After calling btrfs_finish_ordered_extent() we keep going down the
>     write path and unlock the inode;
>
> 5) After that a fast fsync starts and locks the inode;
>
> 6) Before the work queue executes btrfs_finish_one_ordered(), the fsync
>     task sees the extent maps that point to the unwritten locations and
>     logs file extent items based on them - it does not know they are
>     unwritten, and the fast fsync path does not wait for ordered extents
>     to complete, which is an intentional behaviour in order to reduce
>     latency.
>
> For the buffered write case, here's one example:
>
> 1) A fast fsync begins, and it starts by flushing delalloc and waiting for
>     the writeback to complete by calling filemap_fdatawait_range();
>
> 2) Flushing the delalloc created a new extent map X;
>
> 3) During the writeback some IO error happened, and at the end io callback
>     (end_bbio_data_write()) we call btrfs_finish_ordered_extent(), which
>     sets the BTRFS_ORDERED_IOERR flag in the ordered extent and queues its
>     completion;
>
> 4) After queuing the ordered extent completion, the end io callback clears
>     the writeback flag from all pages (or folios), and from that moment the
>     fast fsync can proceed;
>
> 5) The fast fsync proceeds, sees extent map X and logs a file extent item
>     based on extent map X, resulting in a log that points to an unwritten
>     data extent - because the ordered extent completion hasn't run yet, it
>     happens only after the logging.
>
> To fix this make btrfs_finish_ordered_extent() set the inode flag
> BTRFS_INODE_NEEDS_FULL_SYNC in case an error happened for a COW write,
> so that a fast fsync will wait for ordered extent completion.
>
> Note that this issue of using extent maps that point to unwritten
> locations can not happen for reads, because in read paths we start by
> locking the extent range and wait for any ordered extents in the range
> to complete before looking for extent maps.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

So a new inode flag without touching the spinlock, that's a solid solution.

Thanks,
Qu
> ---
>   fs/btrfs/btrfs_inode.h  | 10 ++++++++++
>   fs/btrfs/file.c         | 16 ++++++++++++++++
>   fs/btrfs/ordered-data.c | 31 +++++++++++++++++++++++++++++++
>   3 files changed, 57 insertions(+)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 3c8bc7a8ebdd..46db4027bf15 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -112,6 +112,16 @@ enum {
>   	 * done at new_simple_dir(), called from btrfs_lookup_dentry().
>   	 */
>   	BTRFS_INODE_ROOT_STUB,
> +	/*
> +	 * Set if an error happened when doing a COW write before submitting a
> +	 * bio or during writeback. Used for both buffered writes and direct IO
> +	 * writes. This is to signal a fast fsync that it has to wait for
> +	 * ordered extents to complete and therefore not log extent maps that
> +	 * point to unwritten extents (when an ordered extent completes and it
> +	 * has the BTRFS_ORDERED_IOERR flag set, it drops extent maps in its
> +	 * range).
> +	 */
> +	BTRFS_INODE_COW_WRITE_ERROR,
>   };
>
>   /* in memory btrfs inode */
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0c7c1b42028e..00670596bf06 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1885,6 +1885,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>   	 */
>   	if (full_sync || btrfs_is_zoned(fs_info)) {
>   		ret = btrfs_wait_ordered_range(inode, start, len);
> +		clear_bit(BTRFS_INODE_COW_WRITE_ERROR, &BTRFS_I(inode)->runtime_flags);
>   	} else {
>   		/*
>   		 * Get our ordered extents as soon as possible to avoid doing
> @@ -1894,6 +1895,21 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>   		btrfs_get_ordered_extents_for_logging(BTRFS_I(inode),
>   						      &ctx.ordered_extents);
>   		ret = filemap_fdatawait_range(inode->i_mapping, start, end);
> +		if (ret)
> +			goto out_release_extents;
> +
> +		/*
> +		 * Check and clear the BTRFS_INODE_COW_WRITE_ERROR now after
> +		 * starting and waiting for writeback, because for buffered IO
> +		 * it may have been set during the end IO callback
> +		 * (end_bbio_data_write() -> btrfs_finish_ordered_extent()) in
> +		 * case an error happened and we need to wait for ordered
> +		 * extents to complete so that any extent maps that point to
> +		 * unwritten locations are dropped and we don't log them.
> +		 */
> +		if (test_and_clear_bit(BTRFS_INODE_COW_WRITE_ERROR,
> +				       &BTRFS_I(inode)->runtime_flags))
> +			ret = btrfs_wait_ordered_range(inode, start, len);
>   	}
>
>   	if (ret)
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 44157e43fd2a..7d175d10a6d0 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -388,6 +388,37 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
>   	ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
>   	spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
>
> +	/*
> +	 * If this is a COW write it means we created new extent maps for the
> +	 * range and they point to unwritten locations if we got an error either
> +	 * before submitting a bio or during IO.
> +	 *
> +	 * We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
> +	 * are queuing its completion below. During completion, at
> +	 * btrfs_finish_one_ordered(), we will drop the extent maps for the
> +	 * unwritten extents.
> +	 *
> +	 * However because completion runs in a work queue we can end up having
> +	 * a fast fsync running before that. In the case of direct IO, once we
> +	 * unlock the inode the fsync might start, and we queue the completion
> +	 * before unlocking the inode. In the case of buffered IO when writeback
> +	 * finishes (end_bbio_data_write()) we queue the completion, so if the
> +	 * writeback was triggered by a fast fsync, the fsync might start
> +	 * logging before ordered extent completion runs in the work queue.
> +	 *
> +	 * The fast fsync will log file extent items based on the extent maps it
> +	 * finds, so if by the time it collects extent maps the ordered extent
> +	 * completion didn't happen yet, it will log file extent items that
> +	 * point to unwritten extents, resulting in a corruption if a crash
> +	 * happens and the log tree is replayed. Note that a fast fsync does not
> +	 * wait for completion of ordered extents in order to reduce latency.
> +	 *
> +	 * Set a flag in the inode so that the next fast fsync will wait for
> +	 * ordered extents to complete before starting to log.
> +	 */
> +	if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
> +		set_bit(BTRFS_INODE_COW_WRITE_ERROR, &inode->runtime_flags);
> +
>   	if (ret)
>   		btrfs_queue_ordered_fn(ordered);
>   	return ret;


* Re: [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths
  2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
                     ` (5 preceding siblings ...)
  2024-05-20  9:46   ` [PATCH v4 6/6] btrfs: use a btrfs_inode local variable at btrfs_sync_file() fdmanana
@ 2024-05-20 10:23   ` Qu Wenruo
  6 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2024-05-20 10:23 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2024/5/20 19:16, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> There's a bug where a fast fsync can log extent maps that were not written
> due to an error in a write path or during writeback. This affects both
> direct IO writes and buffered writes, and besides the failure depends on
> a race due to the fact that ordered extent completion happens in a work
> queue and a fast fsync doesn't wait for ordered extent completion before
> logging. The details are in the change log of the first patch.
>
> V4: Use a slightly different approach to avoid a deadlock on the inode's
>      spinlock due to it being used both in irq and non-irq context, pointed
>      out by Qu.
>      Added some cleanup patches (patches 3, 4, 5 and 6).

The whole series looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> V3: Change the approach of patch 1/2 to not drop extent maps at
>      btrfs_finish_ordered_extent() since that runs in irq context and
>      dropping an extent map range triggers NOFS extent map allocations,
>      which can trigger a reclaim and that can't run in irq context.
>      Updated comments and changelog to distinguish differences between
>      failures for direct IO writes and buffered writes.
>
> V2: Rework solution since other error paths caused the same problem, make
>      it more generic.
>      Added more details to change log and comment about what's going on,
>      and why reads aren't affected.
>
>      https://lore.kernel.org/linux-btrfs/cover.1715798440.git.fdmanana@suse.com/
>
> V1: https://lore.kernel.org/linux-btrfs/cover.1715688057.git.fdmanana@suse.com/
>
> Filipe Manana (6):
>    btrfs: ensure fast fsync waits for ordered extents after a write failure
>    btrfs: make btrfs_finish_ordered_extent() return void
>    btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx)
>    btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()
>    btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()
>    btrfs: use a btrfs_inode local variable at btrfs_sync_file()
>
>   fs/btrfs/btrfs_inode.h      | 10 ++++++
>   fs/btrfs/file.c             | 63 ++++++++++++++++++++++---------------
>   fs/btrfs/file.h             |  2 +-
>   fs/btrfs/free-space-cache.c |  4 +--
>   fs/btrfs/inode.c            | 16 +++++-----
>   fs/btrfs/ordered-data.c     | 40 ++++++++++++++++++++---
>   fs/btrfs/ordered-data.h     |  4 +--
>   fs/btrfs/reflink.c          |  8 ++---
>   fs/btrfs/relocation.c       |  2 +-
>   fs/btrfs/tree-log.c         | 10 +++---
>   fs/btrfs/tree-log.h         |  4 +--
>   11 files changed, 108 insertions(+), 55 deletions(-)
>


end of thread, other threads:[~2024-05-20 10:23 UTC | newest]

Thread overview: 14+ messages
-- links below jump to the message on this page --
2024-05-17 16:52 [PATCH v3 0/2] btrfs: fix logging unwritten extents after failure in write paths fdmanana
2024-05-17 16:52 ` [PATCH v3 1/2] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
2024-05-18  5:28   ` Qu Wenruo
2024-05-20  9:46     ` Filipe Manana
2024-05-17 16:52 ` [PATCH v3 2/2] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
2024-05-20  9:46 ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths fdmanana
2024-05-20  9:46   ` [PATCH v4 1/6] btrfs: ensure fast fsync waits for ordered extents after a write failure fdmanana
2024-05-20 10:20     ` Qu Wenruo
2024-05-20  9:46   ` [PATCH v4 2/6] btrfs: make btrfs_finish_ordered_extent() return void fdmanana
2024-05-20  9:46   ` [PATCH v4 3/6] btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx) fdmanana
2024-05-20  9:46   ` [PATCH v4 4/6] btrfs: pass a btrfs_inode to btrfs_fdatawrite_range() fdmanana
2024-05-20  9:46   ` [PATCH v4 5/6] btrfs: pass a btrfs_inode to btrfs_wait_ordered_range() fdmanana
2024-05-20  9:46   ` [PATCH v4 6/6] btrfs: use a btrfs_inode local variable at btrfs_sync_file() fdmanana
2024-05-20 10:23   ` [PATCH v4 0/6] btrfs: fix logging unwritten extents after failure in write paths Qu Wenruo
