[PATCH AUTOSEL 5.17 11/21] btrfs: harden identification of a stale device

public inbox for linux-btrfs@vger.kernel.org

* [PATCH AUTOSEL 5.17 11/21] btrfs: harden identification of a stale device
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Sasha Levin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Anand Jain, Josef Bacik, David Sterba, Sasha Levin, clm, jbacik,
	linux-btrfs

From: Anand Jain <anand.jain@oracle.com>

[ Upstream commit 770c79fb65506fc7c16459855c3839429f46cb32 ]

Identifying and removing the stale device from the fs_uuids list is done
by btrfs_free_stale_devices().  btrfs_free_stale_devices() in turn
depends on device_path_matched() to check if the device appears in more
than one btrfs_device structure.

Devices are currently matched by comparing their path strings. However,
when device mapper is in use, the dm device paths are nothing but links
to the actual block device, so device_path_matched() fails to find the
match.

Fix this by matching the dev_t as provided by lookup_bdev() instead of
plain string compare of the device paths.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/volumes.c | 45 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b07d382d53a8..24e559d90b6a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -534,15 +534,48 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 	return ret;
 }
 
-static bool device_path_matched(const char *path, struct btrfs_device *device)
+/*
+ * Check if the device in the path matches the device in the given struct device.
+ *
+ * Returns:
+ *   true  If it is the same device.
+ *   false If it is not the same device or on error.
+ */
+static bool device_matched(const struct btrfs_device *device, const char *path)
 {
-	int found;
+	char *device_name;
+	dev_t dev_old;
+	dev_t dev_new;
+	int ret;
+
+	/*
+	 * If we are looking for a device with the matching dev_t, then skip
+	 * device without a name (a missing device).
+	 */
+	if (!device->name)
+		return false;
+
+	device_name = kzalloc(BTRFS_PATH_NAME_MAX, GFP_KERNEL);
+	if (!device_name)
+		return false;
 
 	rcu_read_lock();
-	found = strcmp(rcu_str_deref(device->name), path);
+	scnprintf(device_name, BTRFS_PATH_NAME_MAX, "%s", rcu_str_deref(device->name));
 	rcu_read_unlock();
 
-	return found == 0;
+	ret = lookup_bdev(device_name, &dev_old);
+	kfree(device_name);
+	if (ret)
+		return false;
+
+	ret = lookup_bdev(path, &dev_new);
+	if (ret)
+		return false;
+
+	if (dev_old == dev_new)
+		return true;
+
+	return false;
 }
 
 /*
@@ -575,9 +608,7 @@ static int btrfs_free_stale_devices(const char *path,
 					 &fs_devices->devices, dev_list) {
 			if (skip_device && skip_device == device)
 				continue;
-			if (path && !device->name)
-				continue;
-			if (path && !device_path_matched(path, device))
+			if (path && !device_matched(device, path))
 				continue;
 			if (fs_devices->opened) {
 				/* for an already deleted device return 0 */
-- 
2.34.1



* [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio()
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 11/21] btrfs: harden identification of a stale device Sasha Levin
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-29 19:57   ` Omar Sandoval
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 13/21] btrfs: make search_csum_tree return 0 if we get -EFBIG Sasha Levin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Omar Sandoval, Josef Bacik, Nikolay Borisov, David Sterba,
	Sasha Levin, clm, jbacik, linux-btrfs

From: Omar Sandoval <osandov@fb.com>

[ Upstream commit e331f6b19f8adde2307588bb325ae5de78617c20 ]

btrfs_csum_one_bio() loops over each filesystem block in the bio while
keeping a cursor of its current logical position in the file in order to
look up the ordered extent to add the checksums to. However, this
doesn't make much sense for compressed extents, as a sector on disk does
not correspond to a sector of decompressed file data. It happens to work
because:

1) the compressed bio always covers one ordered extent
2) the size of the bio is always less than the size of the ordered
   extent

However, the second point will not always be true for encoded writes.

Let's add a boolean parameter to btrfs_csum_one_bio() to indicate that
it can assume that the bio only covers one ordered extent. Since we're
already changing the signature, let's get rid of the contig parameter
and make it implied by the offset parameter, similar to the change we
recently made to btrfs_lookup_bio_sums(). Additionally, let's rename
nr_sectors to blockcount to make it clear that it's the number of
filesystem blocks, not the number of 512-byte sectors.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/compression.c |  2 +-
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/file-item.c   | 37 +++++++++++++++++--------------------
 fs/btrfs/inode.c       |  8 ++++----
 4 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 71e5b2e9a1ba..8b3bca269de3 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -591,7 +591,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 		if (submit) {
 			if (!skip_sum) {
-				ret = btrfs_csum_one_bio(inode, bio, start, 1);
+				ret = btrfs_csum_one_bio(inode, bio, start, true);
 				if (ret)
 					goto finish_cb;
 			}
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ebb2d109e8bb..dc70f37f2131 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3155,7 +3155,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   struct btrfs_ordered_sum *sums);
 blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
-				u64 file_start, int contig);
+				u64 offset, bool one_ordered);
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 			     struct list_head *list, int search_commit);
 void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 90c5c38836ab..42c1073a4e13 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -612,32 +612,33 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 	return ret;
 }
 
-/*
- * btrfs_csum_one_bio - Calculates checksums of the data contained inside a bio
+/**
+ * Calculate checksums of the data contained inside a bio
+ *
  * @inode:	 Owner of the data inside the bio
  * @bio:	 Contains the data to be checksummed
- * @file_start:  offset in file this bio begins to describe
- * @contig:	 Boolean. If true/1 means all bio vecs in this bio are
- *		 contiguous and they begin at @file_start in the file. False/0
- *		 means this bio can contain potentially discontiguous bio vecs
- *		 so the logical offset of each should be calculated separately.
+ * @offset:      If (u64)-1, @bio may contain discontiguous bio vecs, so the
+ *               file offsets are determined from the page offsets in the bio.
+ *               Otherwise, this is the starting file offset of the bio vecs in
+ *               @bio, which must be contiguous.
+ * @one_ordered: If true, @bio only refers to one ordered extent.
  */
 blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
-		       u64 file_start, int contig)
+				u64 offset, bool one_ordered)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	struct btrfs_ordered_sum *sums;
 	struct btrfs_ordered_extent *ordered = NULL;
+	const bool use_page_offsets = (offset == (u64)-1);
 	char *data;
 	struct bvec_iter iter;
 	struct bio_vec bvec;
 	int index;
-	int nr_sectors;
+	unsigned int blockcount;
 	unsigned long total_bytes = 0;
 	unsigned long this_sum_bytes = 0;
 	int i;
-	u64 offset;
 	unsigned nofs_flag;
 
 	nofs_flag = memalloc_nofs_save();
@@ -651,18 +652,13 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
 	sums->len = bio->bi_iter.bi_size;
 	INIT_LIST_HEAD(&sums->list);
 
-	if (contig)
-		offset = file_start;
-	else
-		offset = 0; /* shut up gcc */
-
 	sums->bytenr = bio->bi_iter.bi_sector << 9;
 	index = 0;
 
 	shash->tfm = fs_info->csum_shash;
 
 	bio_for_each_segment(bvec, bio, iter) {
-		if (!contig)
+		if (use_page_offsets)
 			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
 
 		if (!ordered) {
@@ -681,13 +677,14 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
 			}
 		}
 
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
+		blockcount = BTRFS_BYTES_TO_BLKS(fs_info,
 						 bvec.bv_len + fs_info->sectorsize
 						 - 1);
 
-		for (i = 0; i < nr_sectors; i++) {
-			if (offset >= ordered->file_offset + ordered->num_bytes ||
-			    offset < ordered->file_offset) {
+		for (i = 0; i < blockcount; i++) {
+			if (!one_ordered &&
+			    !in_range(offset, ordered->file_offset,
+				      ordered->num_bytes)) {
 				unsigned long bytes_left;
 
 				sums->len = this_sum_bytes;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5bbea5ec31fc..826f94b2fda5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2310,7 +2310,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
 static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
 					   u64 dio_file_offset)
 {
-	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
+	return btrfs_csum_one_bio(BTRFS_I(inode), bio, (u64)-1, false);
 }
 
 /*
@@ -2562,7 +2562,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 					  0, btrfs_submit_bio_start);
 		goto out;
 	} else if (!skip_sum) {
-		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
+		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, (u64)-1, false);
 		if (ret)
 			goto out;
 	}
@@ -7831,7 +7831,7 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
 						     struct bio *bio,
 						     u64 dio_file_offset)
 {
-	return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, 1);
+	return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, false);
 }
 
 static void btrfs_end_dio_bio(struct bio *bio)
@@ -7888,7 +7888,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		 * If we aren't doing async submit, calculate the csum of the
 		 * bio now.
 		 */
-		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, 1);
+		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, false);
 		if (ret)
 			goto err;
 	} else {
-- 
2.34.1
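
The range check and block count used in the patch above can be sketched
in standalone C. This is an illustration under assumptions, not the
kernel implementation: offset_in_range(), bytes_to_blocks_round_up() and
must_switch_ordered() are hypothetical names. The first mirrors the
half-open semantics of the kernel's in_range() helper, the second mirrors
passing "len + sectorsize - 1" to BTRFS_BYTES_TO_BLKS().

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open range test: true when first <= b < first + len. */
static bool offset_in_range(uint64_t b, uint64_t first, uint64_t len)
{
	return b >= first && b < first + len;
}

/* Number of filesystem blocks needed to cover len bytes, rounding up. */
static unsigned int bytes_to_blocks_round_up(uint64_t len, uint32_t blocksize)
{
	return (unsigned int)((len + blocksize - 1) / blocksize);
}

/*
 * Decide whether the checksum loop has to look up a different ordered
 * extent.  When the caller promises that the bio covers exactly one
 * ordered extent (one_ordered), the file offset is never consulted, so
 * it does not matter that it is meaningless for compressed data.
 */
static bool must_switch_ordered(uint64_t offset, uint64_t file_offset,
				uint64_t num_bytes, bool one_ordered)
{
	return !one_ordered &&
	       !offset_in_range(offset, file_offset, num_bytes);
}
```

For an ordered extent at file offset 4096 with 8192 bytes, offset 12288
is the first byte past the extent, so the pre-patch condition and the
in_range() form both report it as out of range. With one_ordered set, the
switch never happens regardless of the offset.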



* [PATCH AUTOSEL 5.17 13/21] btrfs: make search_csum_tree return 0 if we get -EFBIG
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 11/21] btrfs: harden identification of a stale device Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Sasha Levin
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 14/21] btrfs: handle csum lookup errors properly on reads Sasha Levin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Josef Bacik, Boris Burkov, Johannes Thumshirn, David Sterba,
	Sasha Levin, clm, jbacik, linux-btrfs

From: Josef Bacik <josef@toxicpanda.com>

[ Upstream commit 03ddb19d2ea745228879b9334f3b550c88acb10a ]

We can either fail to find a csum entry at all and return -ENOENT, or we
can find a range that is close, but return -EFBIG.  In essence these
both mean the same thing when we are doing a lookup for a csum in an
existing range: we didn't find a csum.  We want to treat both of these
errors the same way and complain loudly that there wasn't a csum.  This
currently happens anyway because we do

	count = search_csum_tree();
	if (count <= 0) {
		// reloc and error handling
	}

However, it forces us to incorrectly treat EIO or ENOMEM errors as
on-disk corruption.  Fix this by returning 0 if we get either -ENOENT or
-EFBIG from btrfs_lookup_csum() so we can do proper error handling.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/file-item.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 42c1073a4e13..f9813853eaf8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -305,7 +305,7 @@ static int search_csum_tree(struct btrfs_fs_info *fs_info,
 	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
 			ret * csum_size);
 out:
-	if (ret == -ENOENT)
+	if (ret == -ENOENT || ret == -EFBIG)
 		ret = 0;
 	return ret;
 }
-- 
2.34.1
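
The one-line change above can be restated as a small standalone helper.
This is a sketch with a hypothetical function name,
normalize_csum_lookup(), showing only the return-value convention: both
"not found" errors collapse to 0, while every other negative errno is
passed through unchanged so the caller can tell it apart from a missing
csum.

```c
#include <errno.h>

/*
 * Collapse the two "no csum found" results into 0.
 * Positive values (number of csums found) and all other negative
 * errno values are returned unchanged.
 */
static int normalize_csum_lookup(int ret)
{
	if (ret == -ENOENT || ret == -EFBIG)
		ret = 0;
	return ret;
}
```

After this normalization a caller can distinguish three cases by sign:
negative means a real error such as -EIO or -ENOMEM, zero means no csum
was found, and a positive value is a count of csums.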



* [PATCH AUTOSEL 5.17 14/21] btrfs: handle csum lookup errors properly on reads
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 13/21] btrfs: make search_csum_tree return 0 if we get -EFBIG Sasha Levin
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 15/21] btrfs: do not double complete bio on errors during compressed reads Sasha Levin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Josef Bacik, David Sterba, Sasha Levin, clm, jbacik, linux-btrfs

From: Josef Bacik <josef@toxicpanda.com>

[ Upstream commit 1784b7d502a94b561eae58249adde5f72c26eb3c ]

Currently any error we get while trying to look up csums during reads
shows up as a missing csum, and then on the read completion side we
print an error saying there was a csum mismatch and we increase the
device corruption count.

However, we could have gotten an EIO from the lookup.  We could also be
inside a memory-constrained container and gotten an ENOMEM while
trying to do the read.  In either case we don't want to make this look
like a file system corruption problem; we want to make it look like the
actual error it is.  Capture any negative value, convert it to the
appropriate blk_status_t, free the csum array if we have one and bail.

Note: a possible improvement would be to make the relocation code look
up the owning inode and see if it's marked as NODATASUM and set
EXTENT_NODATASUM there, that way if there's corruption and there isn't a
checksum when we want it we can fail here rather than later.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/file-item.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index f9813853eaf8..70f5cbd9020b 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -368,6 +368,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	struct btrfs_bio *bbio = NULL;
 	struct btrfs_path *path;
 	const u32 sectorsize = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
@@ -377,6 +378,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	u8 *csum;
 	const unsigned int nblocks = orig_len >> fs_info->sectorsize_bits;
 	int count = 0;
+	blk_status_t ret = BLK_STS_OK;
 
 	if ((BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ||
 	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
@@ -400,7 +402,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 		return BLK_STS_RESOURCE;
 
 	if (!dst) {
-		struct btrfs_bio *bbio = btrfs_bio(bio);
+		bbio = btrfs_bio(bio);
 
 		if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
 			bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
@@ -456,21 +458,27 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 
 		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
 					 search_len, csum_dst);
-		if (count <= 0) {
-			/*
-			 * Either we hit a critical error or we didn't find
-			 * the csum.
-			 * Either way, we put zero into the csums dst, and skip
-			 * to the next sector.
-			 */
+		if (count < 0) {
+			ret = errno_to_blk_status(count);
+			if (bbio)
+				btrfs_bio_free_csum(bbio);
+			break;
+		}
+
+		/*
+		 * We didn't find a csum for this range.  We need to make sure
+		 * we complain loudly about this, because we are not NODATASUM.
+		 *
+		 * However for the DATA_RELOC inode we could potentially be
+		 * relocating data extents for a NODATASUM inode, so the inode
+		 * itself won't be marked with NODATASUM, but the extent we're
+		 * copying is in fact NODATASUM.  If we don't find a csum we
+		 * assume this is the case.
+		 */
+		if (count == 0) {
 			memset(csum_dst, 0, csum_size);
 			count = 1;
 
-			/*
-			 * For data reloc inode, we need to mark the range
-			 * NODATASUM so that balance won't report false csum
-			 * error.
-			 */
 			if (BTRFS_I(inode)->root->root_key.objectid ==
 			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
 				u64 file_offset;
@@ -491,7 +499,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	}
 
 	btrfs_free_path(path);
-	return BLK_STS_OK;
+	return ret;
 }
 
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
-- 
2.34.1
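
The three-way handling introduced above (error, missing csum, csums
found) can be sketched outside the kernel. Everything here is a
simplified stand-in: enum sketch_status, sketch_errno_to_status() and
handle_lookup() are hypothetical names, and the status mapping is reduced
to the two cases named in the commit message (ENOMEM and everything else
as an I/O error).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for blk_status_t. */
enum sketch_status { SKETCH_OK = 0, SKETCH_IOERR, SKETCH_RESOURCE };

/* Map a negative errno to a status; unknown errors become I/O errors. */
static enum sketch_status sketch_errno_to_status(int err)
{
	switch (err) {
	case 0:
		return SKETCH_OK;
	case -ENOMEM:
		return SKETCH_RESOURCE;
	default:
		return SKETCH_IOERR;
	}
}

/*
 * Handle the result of one csum lookup.
 *   count < 0:  real error, report it via *status and advance by 0
 *   count == 0: no csum found, zero-fill one csum and advance by 1
 *   count > 0:  that many csums were found, advance by count
 */
static int handle_lookup(int count, unsigned char *csum_dst,
			 size_t csum_size, enum sketch_status *status)
{
	*status = SKETCH_OK;
	if (count < 0) {
		*status = sketch_errno_to_status(count);
		return 0;
	}
	if (count == 0) {
		memset(csum_dst, 0, csum_size);
		return 1;
	}
	return count;
}
```

The point of the split is that only the count == 0 branch produces a
zeroed csum, so an -EIO or -ENOMEM from the lookup is no longer reported
later as a checksum mismatch.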



* [PATCH AUTOSEL 5.17 15/21] btrfs: do not double complete bio on errors during compressed reads
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 14/21] btrfs: handle csum lookup errors properly on reads Sasha Levin
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 16/21] btrfs: do not clean up repair bio if submit fails Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode Sasha Levin
  6 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Josef Bacik, David Sterba, Sasha Levin, clm, jbacik, linux-btrfs

From: Josef Bacik <josef@toxicpanda.com>

[ Upstream commit f9f15de85d74e7eef021af059ca53a15f041cdd8 ]

I hit some weird panics while fixing up the error handling from
btrfs_lookup_bio_sums().  Turns out the compression path will complete
the bio we use if we set up any of the compression bios and then return
an error, and then btrfs_submit_data_bio() will also call bio_endio() on
the bio.

Fix this by making btrfs_submit_compressed_read() responsible for
calling bio_endio() on the bio if there are any errors.  Previously it
only did so if we had created the compression bios; otherwise it
depended on btrfs_submit_data_bio() to do the right thing.  This
creates the above problem, so fix up btrfs_submit_compressed_read() to
always call bio_endio() in case of an error, and then simply return from
btrfs_submit_data_bio() if we had to call
btrfs_submit_compressed_read().

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/compression.c | 20 ++++++++++++++------
 fs/btrfs/inode.c       |  8 +++++++-
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 8b3bca269de3..59f50d362db3 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -808,7 +808,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	u64 em_len;
 	u64 em_start;
 	struct extent_map *em;
-	blk_status_t ret = BLK_STS_RESOURCE;
+	blk_status_t ret;
 	int faili = 0;
 	u8 *sums;
 
@@ -821,14 +821,18 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	read_lock(&em_tree->lock);
 	em = lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
 	read_unlock(&em_tree->lock);
-	if (!em)
-		return BLK_STS_IOERR;
+	if (!em) {
+		ret = BLK_STS_IOERR;
+		goto out;
+	}
 
 	ASSERT(em->compress_type != BTRFS_COMPRESS_NONE);
 	compressed_len = em->block_len;
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
-	if (!cb)
+	if (!cb) {
+		ret = BLK_STS_RESOURCE;
 		goto out;
+	}
 
 	refcount_set(&cb->pending_sectors, compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
@@ -851,8 +855,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	nr_pages = DIV_ROUND_UP(compressed_len, PAGE_SIZE);
 	cb->compressed_pages = kcalloc(nr_pages, sizeof(struct page *),
 				       GFP_NOFS);
-	if (!cb->compressed_pages)
+	if (!cb->compressed_pages) {
+		ret = BLK_STS_RESOURCE;
 		goto fail1;
+	}
 
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
 		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS);
@@ -938,7 +944,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			comp_bio = NULL;
 		}
 	}
-	return 0;
+	return BLK_STS_OK;
 
 fail2:
 	while (faili >= 0) {
@@ -951,6 +957,8 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	kfree(cb);
 out:
 	free_extent_map(em);
+	bio->bi_status = ret;
+	bio_endio(bio);
 	return ret;
 finish_cb:
 	if (comp_bio) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 826f94b2fda5..5aace4c13519 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2538,10 +2538,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			goto out;
 
 		if (bio_flags & EXTENT_BIO_COMPRESSED) {
+			/*
+			 * btrfs_submit_compressed_read will handle completing
+			 * the bio if there were any errors, so just return
+			 * here.
+			 */
 			ret = btrfs_submit_compressed_read(inode, bio,
 							   mirror_num,
 							   bio_flags);
-			goto out;
+			goto out_no_endio;
 		} else {
 			/*
 			 * Lookup bio sums does extra checks around whether we
@@ -2575,6 +2580,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 		bio->bi_status = ret;
 		bio_endio(bio);
 	}
+out_no_endio:
 	return ret;
 }
 
-- 
2.34.1
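
The ownership rule established above (the compressed read path completes
the bio on every error, so its caller must not complete it again) can be
modelled with a mock object. This is a sketch only: struct mock_bio and
the mock_* functions are hypothetical and merely count completions, they
do not reproduce the real bio API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bio that records how often it completed. */
struct mock_bio {
	int status;
	int completions;
};

static void mock_bio_endio(struct mock_bio *bio)
{
	bio->completions++;
}

/* Post-patch contract: on any error the callee completes the bio. */
static int mock_submit_compressed_read(struct mock_bio *bio, bool fail)
{
	if (fail) {
		bio->status = -1;
		mock_bio_endio(bio);
		return -1;
	}
	return 0;
}

/*
 * Caller: for the compressed path, return right away (the equivalent of
 * "goto out_no_endio"), because the callee already owns completion on
 * error.  Every other path still completes the bio here on failure.
 */
static int mock_submit_data_bio(struct mock_bio *bio, bool compressed,
				bool fail)
{
	int ret;

	if (compressed)
		return mock_submit_compressed_read(bio, fail);

	ret = fail ? -1 : 0;
	if (ret) {
		bio->status = ret;
		mock_bio_endio(bio);
	}
	return ret;
}
```

In this model a failing compressed read leaves the completion counter at
exactly one. If the caller also completed the bio on that path, the
counter would reach two, which corresponds to the double completion the
commit message describes.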



* [PATCH AUTOSEL 5.17 16/21] btrfs: do not clean up repair bio if submit fails
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 15/21] btrfs: do not double complete bio on errors during compressed reads Sasha Levin
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode Sasha Levin
  6 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Josef Bacik, Boris Burkov, David Sterba, Sasha Levin, clm, jbacik,
	linux-btrfs

From: Josef Bacik <josef@toxicpanda.com>

[ Upstream commit 8cbc3001a3264d998d6b6db3e23f935c158abd4d ]

The submit helper will always run bio_endio() on the bio if it fails to
submit, so cleaning up the bio just leads to a variety of use-after-free
and NULL pointer dereference bugs because we race with the endio
function that is cleaning up the bio.  Instead just return BLK_STS_OK as
the repair function has to continue to process the rest of the pages,
and the endio for the repair bio will do the appropriate cleanup for the
page that it was given.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4c91060d103a..2c4f75348282 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2639,7 +2639,6 @@ int btrfs_repair_one_sector(struct inode *inode,
 	const int icsum = bio_offset >> fs_info->sectorsize_bits;
 	struct bio *repair_bio;
 	struct btrfs_bio *repair_bbio;
-	blk_status_t status;
 
 	btrfs_debug(fs_info,
 		   "repair read error: read error at %llu", start);
@@ -2678,13 +2677,13 @@ int btrfs_repair_one_sector(struct inode *inode,
 		    "repair read error: submitting new read to mirror %d",
 		    failrec->this_mirror);
 
-	status = submit_bio_hook(inode, repair_bio, failrec->this_mirror,
-				 failrec->bio_flags);
-	if (status) {
-		free_io_failure(failure_tree, tree, failrec);
-		bio_put(repair_bio);
-	}
-	return blk_status_to_errno(status);
+	/*
+	 * At this point we have a bio, so any errors from submit_bio_hook()
+	 * will be handled by the endio on the repair_bio, so we can't return an
+	 * error here.
+	 */
+	submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->bio_flags);
+	return BLK_STS_OK;
 }
 
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
-- 
2.34.1



* [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode
       [not found] <20220328194157.1585642-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 16/21] btrfs: do not clean up repair bio if submit fails Sasha Levin
@ 2022-03-28 19:41 ` Sasha Levin
  2022-03-29  9:59   ` Filipe Manana
  6 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2022-03-28 19:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Filipe Manana, David Sterba, Sasha Levin, clm, jbacik,
	linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 23e3337faf73e5bb2610697977e175313d48acb0 ]

When an inode has a last_reflink_trans matching the current transaction,
we have to take special care when logging its checksums in order to
avoid getting checksum items with overlapping ranges in a log tree,
which could result in missing checksums after log replay (more on that
in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
fix corrupt log due to concurrent fsync of inodes with shared extents")).
We also need to make sure a full fsync will copy all old file extent
items it finds in modified leaves, because they might have been copied
from some other inode.

However once we fsync an inode, we don't need to keep paying the price of
that extra special care in future fsyncs done in the same transaction,
unless the inode is used for another reflink operation or the full sync
flag is set on it (truncate, failure to allocate extent maps for holes,
and other exceptional and infrequent cases).

So after we fsync an inode, reset its last_reflink_trans to zero. In case
another reflink happens, we continue to update the last_reflink_trans of
the inode, just as before. Also set last_reflink_trans to the generation
of the last transaction that modified the inode whenever we need to set
the full sync flag on the inode, just like when we need to load an inode
from disk after eviction.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/btrfs_inode.h | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/file.c        |  7 +++----
 fs/btrfs/inode.c       | 12 +++++-------
 fs/btrfs/reflink.c     |  5 ++---
 fs/btrfs/tree-log.c    |  8 ++++++++
 5 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b3e46aabc3d8..d0b52b106041 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -333,6 +333,36 @@ static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
 	spin_unlock(&inode->lock);
 }
 
+/*
+ * Should be called while holding the inode's VFS lock in exclusive mode or in a
+ * context where no one else can access the inode concurrently (during inode
+ * creation or when loading an inode from disk).
+ */
+static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
+{
+	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+	/*
+	 * The inode may have been part of a reflink operation in the last
+	 * transaction that modified it, and then a fsync has reset the
+	 * last_reflink_trans to avoid subsequent fsyncs in the same
+	 * transaction to do unnecessary work. So update last_reflink_trans
+	 * to the last_trans value (we have to be pessimistic and assume a
+	 * reflink happened).
+	 *
+	 * The ->last_trans is protected by the inode's spinlock and we can
+	 * have a concurrent ordered extent completion update it. Also set
+	 * last_reflink_trans to ->last_trans only if the former is less than
+	 * the later, because we can be called in a context where
+	 * last_reflink_trans was set to the current transaction generation
+	 * while ->last_trans was not yet updated in the current transaction,
+	 * and therefore has a lower value.
+	 */
+	spin_lock(&inode->lock);
+	if (inode->last_reflink_trans < inode->last_trans)
+		inode->last_reflink_trans = inode->last_trans;
+	spin_unlock(&inode->lock);
+}
+
 static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
 {
 	bool ret = false;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a0179cc62913..f38cc706a6cf 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2474,7 +2474,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	hole_em = alloc_extent_map();
 	if (!hole_em) {
 		btrfs_drop_extent_cache(inode, offset, end - 1, 0);
-		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+		btrfs_set_inode_full_sync(inode);
 	} else {
 		hole_em->start = offset;
 		hole_em->len = end - offset;
@@ -2495,8 +2495,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 		} while (ret == -EEXIST);
 		free_extent_map(hole_em);
 		if (ret)
-			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-					&inode->runtime_flags);
+			btrfs_set_inode_full_sync(inode);
 	}
 
 	return 0;
@@ -2850,7 +2849,7 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
 	 * maps for the replacement extents (or holes).
 	 */
 	if (extent_info && !extent_info->is_new_extent)
-		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+		btrfs_set_inode_full_sync(inode);
 
 	if (ret)
 		goto out_trans;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5aace4c13519..3783fdf78da8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -423,7 +423,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start,
 		goto out;
 	}
 
-	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+	btrfs_set_inode_full_sync(inode);
 out:
 	/*
 	 * Don't forget to free the reserved space, as for inlined extent
@@ -4882,8 +4882,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
 			if (!hole_em) {
-				set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-					&inode->runtime_flags);
+				btrfs_set_inode_full_sync(inode);
 				goto next;
 			}
 			hole_em->start = cur_offset;
@@ -6146,7 +6145,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	 * sync since it will be a full sync anyway and this will blow away the
 	 * old info in the log.
 	 */
-	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
+	btrfs_set_inode_full_sync(BTRFS_I(inode));
 
 	key[0].objectid = objectid;
 	key[0].type = BTRFS_INODE_ITEM_KEY;
@@ -8740,7 +8739,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 	 * extents beyond i_size to drop.
 	 */
 	if (control.extents_found > 0)
-		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
+		btrfs_set_inode_full_sync(BTRFS_I(inode));
 
 	return ret;
 }
@@ -10027,8 +10026,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 
 		em = alloc_extent_map();
 		if (!em) {
-			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-				&BTRFS_I(inode)->runtime_flags);
+			btrfs_set_inode_full_sync(BTRFS_I(inode));
 			goto next;
 		}
 
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index a3930da4eb3f..e37a61ad87df 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -277,7 +277,7 @@ static int clone_copy_inline_extent(struct inode *dst,
 						  path->slots[0]),
 			    size);
 	btrfs_update_inode_bytes(BTRFS_I(dst), datal, drop_args.bytes_found);
-	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(dst)->runtime_flags);
+	btrfs_set_inode_full_sync(BTRFS_I(dst));
 	ret = btrfs_inode_set_file_extent_range(BTRFS_I(dst), 0, aligned_end);
 out:
 	if (!ret && !trans) {
@@ -575,8 +575,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 		 * replaced file extent items.
 		 */
 		if (last_dest_end >= i_size_read(inode))
-			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-				&BTRFS_I(inode)->runtime_flags);
+			btrfs_set_inode_full_sync(BTRFS_I(inode));
 
 		ret = btrfs_replace_file_extents(BTRFS_I(inode), path,
 				last_dest_end, destoff + len - 1, NULL, &trans);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6bc8834ac8f7..607527a924c2 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5836,6 +5836,14 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 	if (inode_only != LOG_INODE_EXISTS)
 		inode->last_log_commit = inode->last_sub_trans;
 	spin_unlock(&inode->lock);
+
+	/*
+	 * Reset the last_reflink_trans so that the next fsync does not need to
+	 * go through the slower path when logging extents and their checksums.
+	 */
+	if (inode_only == LOG_INODE_ALL)
+		inode->last_reflink_trans = 0;
+
 out_unlock:
 	mutex_unlock(&inode->log_mutex);
 out:
-- 
2.34.1
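
As a reading aid, and not part of the patch: a minimal user-space sketch of the bookkeeping this change introduces. The struct and all model_* names are hypothetical stand-ins. Only the field names and the compare-and-raise logic are taken from the diff above; the spinlock and log mutex are left out, and the clearing of the full sync flag on fsync is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few btrfs_inode fields the patch touches. */
struct model_inode {
	uint64_t last_trans;         /* generation of the last transaction that modified the inode */
	uint64_t last_reflink_trans; /* generation of the last reflink, or 0 after a full fsync */
	bool needs_full_sync;        /* models BTRFS_INODE_NEEDS_FULL_SYNC */
};

/*
 * Models btrfs_set_inode_full_sync(): set the flag and pessimistically
 * raise last_reflink_trans to last_trans. The value is never lowered.
 */
static inline void model_set_full_sync(struct model_inode *inode)
{
	inode->needs_full_sync = true;
	if (inode->last_reflink_trans < inode->last_trans)
		inode->last_reflink_trans = inode->last_trans;
}

/* Models the new tail of btrfs_log_inode() for the LOG_INODE_ALL case. */
static inline void model_log_inode_all(struct model_inode *inode)
{
	inode->last_reflink_trans = 0;
}

/*
 * Per the changelog, special care is needed when last_reflink_trans
 * matches the current transaction.
 */
static inline bool model_needs_slow_path(const struct model_inode *inode,
					 uint64_t cur_transid)
{
	return inode->last_reflink_trans == cur_transid;
}
```

Starting with last_trans and last_reflink_trans both equal to the current transaction ID, model_needs_slow_path() is true, turns false after model_log_inode_all(), and turns true again once model_set_full_sync() runs.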


* Re: [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode Sasha Levin
@ 2022-03-29  9:59   ` Filipe Manana
  2022-03-31 16:59     ` Sasha Levin
  0 siblings, 1 reply; 12+ messages in thread
From: Filipe Manana @ 2022-03-29  9:59 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Filipe Manana, David Sterba, clm, jbacik,
	linux-btrfs

On Mon, Mar 28, 2022 at 03:41:52PM -0400, Sasha Levin wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> [ Upstream commit 23e3337faf73e5bb2610697977e175313d48acb0 ]
> 
> When an inode has a last_reflink_trans matching the current transaction,
> we have to take special care when logging its checksums in order to
> avoid getting checksum items with overlapping ranges in a log tree,
> which could result in missing checksums after log replay (more on that
> in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
> checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
> fix corrupt log due to concurrent fsync of inodes with shared extents")).
> We also need to make sure a full fsync will copy all old file extent
> items it finds in modified leaves, because they might have been copied
> from some other inode.
> 
> However once we fsync an inode, we don't need to keep paying the price of
> that extra special care in future fsyncs done in the same transaction,
> unless the inode is used for another reflink operation or the full sync
> flag is set on it (truncate, failure to allocate extent maps for holes,
> and other exceptional and infrequent cases).
> 
> So after we fsync an inode reset its last_reflink_trans to zero. In case
> another reflink happens, we continue to update the last_reflink_trans of
> the inode, just as before. Also set last_reflink_trans to the generation
> of the last transaction that modified the inode whenever we need to set
> the full sync flag on the inode, just like when we need to load an inode
> from disk after eviction.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>

What's the motivation to backport this to stable?

It doesn't fix a bug or any regression, as far as I know at least.
Or is it to make some other backport easier?

Thanks.

> ---
>  fs/btrfs/btrfs_inode.h | 30 ++++++++++++++++++++++++++++++
>  fs/btrfs/file.c        |  7 +++----
>  fs/btrfs/inode.c       | 12 +++++-------
>  fs/btrfs/reflink.c     |  5 ++---
>  fs/btrfs/tree-log.c    |  8 ++++++++
>  5 files changed, 48 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index b3e46aabc3d8..d0b52b106041 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -333,6 +333,36 @@ static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
>  	spin_unlock(&inode->lock);
>  }
>  
> +/*
> + * Should be called while holding the inode's VFS lock in exclusive mode or in a
> + * context where no one else can access the inode concurrently (during inode
> + * creation or when loading an inode from disk).
> + */
> +static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
> +{
> +	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> +	/*
> +	 * The inode may have been part of a reflink operation in the last
> +	 * transaction that modified it, and then a fsync has reset the
> +	 * last_reflink_trans to avoid subsequent fsyncs in the same
> +	 * transaction to do unnecessary work. So update last_reflink_trans
> +	 * to the last_trans value (we have to be pessimistic and assume a
> +	 * reflink happened).
> +	 *
> +	 * The ->last_trans is protected by the inode's spinlock and we can
> +	 * have a concurrent ordered extent completion update it. Also set
> +	 * last_reflink_trans to ->last_trans only if the former is less than
> +	 * the latter, because we can be called in a context where
> +	 * last_reflink_trans was set to the current transaction generation
> +	 * while ->last_trans was not yet updated in the current transaction,
> +	 * and therefore has a lower value.
> +	 */
> +	spin_lock(&inode->lock);
> +	if (inode->last_reflink_trans < inode->last_trans)
> +		inode->last_reflink_trans = inode->last_trans;
> +	spin_unlock(&inode->lock);
> +}
> +
>  static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
>  {
>  	bool ret = false;
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a0179cc62913..f38cc706a6cf 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2474,7 +2474,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>  	hole_em = alloc_extent_map();
>  	if (!hole_em) {
>  		btrfs_drop_extent_cache(inode, offset, end - 1, 0);
> -		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> +		btrfs_set_inode_full_sync(inode);
>  	} else {
>  		hole_em->start = offset;
>  		hole_em->len = end - offset;
> @@ -2495,8 +2495,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>  		} while (ret == -EEXIST);
>  		free_extent_map(hole_em);
>  		if (ret)
> -			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> -					&inode->runtime_flags);
> +			btrfs_set_inode_full_sync(inode);
>  	}
>  
>  	return 0;
> @@ -2850,7 +2849,7 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
>  	 * maps for the replacement extents (or holes).
>  	 */
>  	if (extent_info && !extent_info->is_new_extent)
> -		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> +		btrfs_set_inode_full_sync(inode);
>  
>  	if (ret)
>  		goto out_trans;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5aace4c13519..3783fdf78da8 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -423,7 +423,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start,
>  		goto out;
>  	}
>  
> -	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
> +	btrfs_set_inode_full_sync(inode);
>  out:
>  	/*
>  	 * Don't forget to free the reserved space, as for inlined extent
> @@ -4882,8 +4882,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
>  						cur_offset + hole_size - 1, 0);
>  			hole_em = alloc_extent_map();
>  			if (!hole_em) {
> -				set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> -					&inode->runtime_flags);
> +				btrfs_set_inode_full_sync(inode);
>  				goto next;
>  			}
>  			hole_em->start = cur_offset;
> @@ -6146,7 +6145,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
>  	 * sync since it will be a full sync anyway and this will blow away the
>  	 * old info in the log.
>  	 */
> -	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
> +	btrfs_set_inode_full_sync(BTRFS_I(inode));
>  
>  	key[0].objectid = objectid;
>  	key[0].type = BTRFS_INODE_ITEM_KEY;
> @@ -8740,7 +8739,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
>  	 * extents beyond i_size to drop.
>  	 */
>  	if (control.extents_found > 0)
> -		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
> +		btrfs_set_inode_full_sync(BTRFS_I(inode));
>  
>  	return ret;
>  }
> @@ -10027,8 +10026,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>  
>  		em = alloc_extent_map();
>  		if (!em) {
> -			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> -				&BTRFS_I(inode)->runtime_flags);
> +			btrfs_set_inode_full_sync(BTRFS_I(inode));
>  			goto next;
>  		}
>  
> diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> index a3930da4eb3f..e37a61ad87df 100644
> --- a/fs/btrfs/reflink.c
> +++ b/fs/btrfs/reflink.c
> @@ -277,7 +277,7 @@ static int clone_copy_inline_extent(struct inode *dst,
>  						  path->slots[0]),
>  			    size);
>  	btrfs_update_inode_bytes(BTRFS_I(dst), datal, drop_args.bytes_found);
> -	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(dst)->runtime_flags);
> +	btrfs_set_inode_full_sync(BTRFS_I(dst));
>  	ret = btrfs_inode_set_file_extent_range(BTRFS_I(dst), 0, aligned_end);
>  out:
>  	if (!ret && !trans) {
> @@ -575,8 +575,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
>  		 * replaced file extent items.
>  		 */
>  		if (last_dest_end >= i_size_read(inode))
> -			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> -				&BTRFS_I(inode)->runtime_flags);
> +			btrfs_set_inode_full_sync(BTRFS_I(inode));
>  
>  		ret = btrfs_replace_file_extents(BTRFS_I(inode), path,
>  				last_dest_end, destoff + len - 1, NULL, &trans);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 6bc8834ac8f7..607527a924c2 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -5836,6 +5836,14 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
>  	if (inode_only != LOG_INODE_EXISTS)
>  		inode->last_log_commit = inode->last_sub_trans;
>  	spin_unlock(&inode->lock);
> +
> +	/*
> +	 * Reset the last_reflink_trans so that the next fsync does not need to
> +	 * go through the slower path when logging extents and their checksums.
> +	 */
> +	if (inode_only == LOG_INODE_ALL)
> +		inode->last_reflink_trans = 0;
> +
>  out_unlock:
>  	mutex_unlock(&inode->log_mutex);
>  out:
> -- 
> 2.34.1
> 

* Re: [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio()
  2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Sasha Levin
@ 2022-03-29 19:57   ` Omar Sandoval
  2022-03-31 16:58     ` Sasha Levin
  0 siblings, 1 reply; 12+ messages in thread
From: Omar Sandoval @ 2022-03-29 19:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Omar Sandoval, Josef Bacik, Nikolay Borisov,
	David Sterba, clm, jbacik, linux-btrfs

On Mon, Mar 28, 2022 at 03:41:47PM -0400, Sasha Levin wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> [ Upstream commit e331f6b19f8adde2307588bb325ae5de78617c20 ]
> 
> btrfs_csum_one_bio() loops over each filesystem block in the bio while
> keeping a cursor of its current logical position in the file in order to
> look up the ordered extent to add the checksums to. However, this
> doesn't make much sense for compressed extents, as a sector on disk does
> not correspond to a sector of decompressed file data. It happens to work
> because:
> 
> 1) the compressed bio always covers one ordered extent
> 2) the size of the bio is always less than the size of the ordered
>    extent
> 
> However, the second point will not always be true for encoded writes.
> 
> Let's add a boolean parameter to btrfs_csum_one_bio() to indicate that
> it can assume that the bio only covers one ordered extent. Since we're
> already changing the signature, let's get rid of the contig parameter
> and make it implied by the offset parameter, similar to the change we
> recently made to btrfs_lookup_bio_sums(). Additionally, let's rename
> nr_sectors to blockcount to make it clear that it's the number of
> filesystem blocks, not the number of 512-byte sectors.
> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/btrfs/compression.c |  2 +-
>  fs/btrfs/ctree.h       |  2 +-
>  fs/btrfs/file-item.c   | 37 +++++++++++++++++--------------------
>  fs/btrfs/inode.c       |  8 ++++----
>  4 files changed, 23 insertions(+), 26 deletions(-)

Hi, Sasha,

This patch doesn't fix a real bug, so it should be dropped from both
5.16 and 5.17.

Thanks!
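
As a reading aid, and not part of the thread: a simplified user-space sketch of the cursor problem the quoted changelog describes. All model_* names and the array-based lookup are hypothetical; this is not the kernel's btrfs_csum_one_bio(), only an illustration of why a per-block file offset cursor can leave the ordered extent when a compressed bio is larger than the extent, and how a one_ordered hint avoids the lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered extent: a byte range in the file. */
struct model_ordered_extent {
	uint64_t file_offset;
	uint64_t num_bytes;
};

static inline bool model_in_range(const struct model_ordered_extent *oe,
				  uint64_t offset)
{
	return offset >= oe->file_offset &&
	       offset - oe->file_offset < oe->num_bytes;
}

/* Linear search standing in for an ordered extent lookup. */
static inline const struct model_ordered_extent *
model_lookup(const struct model_ordered_extent *exts, size_t nr, uint64_t offset)
{
	for (size_t i = 0; i < nr; i++)
		if (model_in_range(&exts[i], offset))
			return &exts[i];
	return NULL;
}

/*
 * Walk nr_blocks blocks of a bio whose data starts at file offset 'offset'.
 * Returns how many blocks were attributed to the first ordered extent, or
 * -1 if the cursor reaches an offset that no ordered extent covers.
 * With one_ordered set, the cursor is not advanced and no further lookup
 * is done (assumption: the whole bio belongs to one ordered extent).
 */
static inline int model_csum_one_bio(const struct model_ordered_extent *exts,
				     size_t nr, uint64_t offset,
				     unsigned int nr_blocks, uint32_t blocksize,
				     bool one_ordered)
{
	const struct model_ordered_extent *first = model_lookup(exts, nr, offset);
	const struct model_ordered_extent *ordered = first;
	int in_first = 0;

	if (!ordered)
		return -1;

	for (unsigned int i = 0; i < nr_blocks; i++) {
		if (!one_ordered && !model_in_range(ordered, offset)) {
			ordered = model_lookup(exts, nr, offset);
			if (!ordered)
				return -1;
		}
		if (ordered == first)
			in_first++;
		if (!one_ordered)
			offset += blocksize;
	}
	return in_first;
}
```

In this model, a two-block bio over a single 4096-byte ordered extent fails the second lookup when one_ordered is false, and attributes both blocks to the extent when one_ordered is true.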

* Re: [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio()
  2022-03-29 19:57   ` Omar Sandoval
@ 2022-03-31 16:58     ` Sasha Levin
  0 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2022-03-31 16:58 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-kernel, stable, Omar Sandoval, Josef Bacik, Nikolay Borisov,
	David Sterba, clm, jbacik, linux-btrfs

On Tue, Mar 29, 2022 at 12:57:23PM -0700, Omar Sandoval wrote:
>On Mon, Mar 28, 2022 at 03:41:47PM -0400, Sasha Levin wrote:
>> From: Omar Sandoval <osandov@fb.com>
>>
>> [ Upstream commit e331f6b19f8adde2307588bb325ae5de78617c20 ]
>>
>> btrfs_csum_one_bio() loops over each filesystem block in the bio while
>> keeping a cursor of its current logical position in the file in order to
>> look up the ordered extent to add the checksums to. However, this
>> doesn't make much sense for compressed extents, as a sector on disk does
>> not correspond to a sector of decompressed file data. It happens to work
>> because:
>>
>> 1) the compressed bio always covers one ordered extent
>> 2) the size of the bio is always less than the size of the ordered
>>    extent
>>
>> However, the second point will not always be true for encoded writes.
>>
>> Let's add a boolean parameter to btrfs_csum_one_bio() to indicate that
>> it can assume that the bio only covers one ordered extent. Since we're
>> already changing the signature, let's get rid of the contig parameter
>> and make it implied by the offset parameter, similar to the change we
>> recently made to btrfs_lookup_bio_sums(). Additionally, let's rename
>> nr_sectors to blockcount to make it clear that it's the number of
>> filesystem blocks, not the number of 512-byte sectors.
>>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Omar Sandoval <osandov@fb.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>> ---
>>  fs/btrfs/compression.c |  2 +-
>>  fs/btrfs/ctree.h       |  2 +-
>>  fs/btrfs/file-item.c   | 37 +++++++++++++++++--------------------
>>  fs/btrfs/inode.c       |  8 ++++----
>>  4 files changed, 23 insertions(+), 26 deletions(-)
>
>Hi, Sasha,
>
>This patch doesn't fix a real bug, so it should be dropped from both
>5.16 and 5.17.

I'll drop it, thanks.

-- 
Thanks,
Sasha

* Re: [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode
  2022-03-29  9:59   ` Filipe Manana
@ 2022-03-31 16:59     ` Sasha Levin
  2022-03-31 17:57       ` Filipe Manana
  0 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2022-03-31 16:59 UTC (permalink / raw)
  To: Filipe Manana
  Cc: linux-kernel, stable, Filipe Manana, David Sterba, clm, jbacik,
	linux-btrfs

On Tue, Mar 29, 2022 at 10:59:33AM +0100, Filipe Manana wrote:
>On Mon, Mar 28, 2022 at 03:41:52PM -0400, Sasha Levin wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> [ Upstream commit 23e3337faf73e5bb2610697977e175313d48acb0 ]
>>
>> When an inode has a last_reflink_trans matching the current transaction,
>> we have to take special care when logging its checksums in order to
>> avoid getting checksum items with overlapping ranges in a log tree,
>> which could result in missing checksums after log replay (more on that
>> in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
>> checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
>> fix corrupt log due to concurrent fsync of inodes with shared extents")).
>> We also need to make sure a full fsync will copy all old file extent
>> items it finds in modified leaves, because they might have been copied
>> from some other inode.
>>
>> However once we fsync an inode, we don't need to keep paying the price of
>> that extra special care in future fsyncs done in the same transaction,
>> unless the inode is used for another reflink operation or the full sync
>> flag is set on it (truncate, failure to allocate extent maps for holes,
>> and other exceptional and infrequent cases).
>>
>> So after we fsync an inode reset its last_reflink_trans to zero. In case
>> another reflink happens, we continue to update the last_reflink_trans of
>> the inode, just as before. Also set last_reflink_trans to the generation
>> of the last transaction that modified the inode whenever we need to set
>> the full sync flag on the inode, just like when we need to load an inode
>> from disk after eviction.
>>
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
>What's the motivation to backport this to stable?
>
>It doesn't fix a bug or any regression, as far as I know at least.
>Or is it to make some other backport easier?

I wasn't sure if it's needed for completeness for the mentioned fixes,
so I took it. Can drop it if it's not needed.

-- 
Thanks,
Sasha

* Re: [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode
  2022-03-31 16:59     ` Sasha Levin
@ 2022-03-31 17:57       ` Filipe Manana
  0 siblings, 0 replies; 12+ messages in thread
From: Filipe Manana @ 2022-03-31 17:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linux Kernel Mailing List, stable, Filipe Manana, David Sterba,
	Chris Mason, Josef Bacik, linux-btrfs

On Thu, Mar 31, 2022 at 5:59 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Tue, Mar 29, 2022 at 10:59:33AM +0100, Filipe Manana wrote:
> >On Mon, Mar 28, 2022 at 03:41:52PM -0400, Sasha Levin wrote:
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> [ Upstream commit 23e3337faf73e5bb2610697977e175313d48acb0 ]
> >>
> >> When an inode has a last_reflink_trans matching the current transaction,
> >> we have to take special care when logging its checksums in order to
> >> avoid getting checksum items with overlapping ranges in a log tree,
> >> which could result in missing checksums after log replay (more on that
> >> in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
> >> checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
> >> fix corrupt log due to concurrent fsync of inodes with shared extents")).
> >> We also need to make sure a full fsync will copy all old file extent
> >> items it finds in modified leaves, because they might have been copied
> >> from some other inode.
> >>
> >> However once we fsync an inode, we don't need to keep paying the price of
> >> that extra special care in future fsyncs done in the same transaction,
> >> unless the inode is used for another reflink operation or the full sync
> >> flag is set on it (truncate, failure to allocate extent maps for holes,
> >> and other exceptional and infrequent cases).
> >>
> >> So after we fsync an inode reset its last_reflink_trans to zero. In case
> >> another reflink happens, we continue to update the last_reflink_trans of
> >> the inode, just as before. Also set last_reflink_trans to the generation
> >> of the last transaction that modified the inode whenever we need to set
> >> the full sync flag on the inode, just like when we need to load an inode
> >> from disk after eviction.
> >>
> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >> Signed-off-by: David Sterba <dsterba@suse.com>
> >> Signed-off-by: Sasha Levin <sashal@kernel.org>
> >
> >What's the motivation to backport this to stable?
> >
> >It doesn't fix a bug or any regression, as far as I know at least.
> >Or is it to make some other backport easier?
>
> I wasn't sure if it's needed for completeness for the mentioned fixes,
> so I took it. Can drop it if it's not needed.

Yes, please drop it. It's not needed (nor was intended) to go to any
stable releases.

Thanks.

>
> --
> Thanks,
> Sasha
