public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] fix buffered write path on -ENOSPC
@ 2024-10-18  4:40 Kent Overstreet
  2024-10-18  4:40 ` [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() Kent Overstreet
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-10-18  4:40 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

xfstests generic/299 turned up a data corruption when we do a short
write because of -ENOSPC - when we truncate the write we might no longer
be doing a full write to a !uptodate folio, which then needs to be read
in.

Amusingly, fixing this turned up a bug in fio where verify mode behaves
badly in the presence of short writes, which took longer to figure out
than the original bug...

Kent Overstreet (3):
  bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()
  bcachefs: bch2_folio_reservation_get_partial() is now better behaved
  bcachefs: Fix data corruption on -ENOSPC in buffered write path

 fs/bcachefs/buckets.c         |  7 +++-
 fs/bcachefs/buckets.h         | 12 +++---
 fs/bcachefs/fs-io-buffered.c  |  6 +++
 fs/bcachefs/fs-io-pagecache.c | 70 ++++++++++++++++++++++-------------
 4 files changed, 63 insertions(+), 32 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()
  2024-10-18  4:40 [PATCH 0/3] fix buffered write path on -ENOSPC Kent Overstreet
@ 2024-10-18  4:40 ` Kent Overstreet
  2024-10-18 11:49   ` wangjianjian (C)
  2024-10-18  4:40 ` [PATCH 2/3] bcachefs: bch2_folio_reservation_get_partial() is now better behaved Kent Overstreet
  2024-10-18  4:40 ` [PATCH 3/3] bcachefs: Fix data corruption on -ENOSPC in buffered write path Kent Overstreet
  2 siblings, 1 reply; 5+ messages in thread
From: Kent Overstreet @ 2024-10-18  4:40 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

bch2_disk_reservation_put() zeroes out the reservation - oops.

This fixes a disk reservation leak when getting a quota reservation
returned an error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/fs-io-pagecache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/bcachefs/fs-io-pagecache.c b/fs/bcachefs/fs-io-pagecache.c
index af3a24546aa3..cde3a0445ee9 100644
--- a/fs/bcachefs/fs-io-pagecache.c
+++ b/fs/bcachefs/fs-io-pagecache.c
@@ -431,9 +431,9 @@ int bch2_folio_reservation_get(struct bch_fs *c,
 		ret = bch2_quota_reservation_add(c, inode, &res->quota, quota_sectors, true);
 		if (unlikely(ret)) {
 			struct disk_reservation tmp = { .sectors = disk_sectors };
+			res->disk.sectors -= disk_sectors;
 
 			bch2_disk_reservation_put(c, &tmp);
-			res->disk.sectors -= disk_sectors;
 			return ret;
 		}
 	}
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] bcachefs: bch2_folio_reservation_get_partial() is now better behaved
  2024-10-18  4:40 [PATCH 0/3] fix buffered write path on -ENOSPC Kent Overstreet
  2024-10-18  4:40 ` [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() Kent Overstreet
@ 2024-10-18  4:40 ` Kent Overstreet
  2024-10-18  4:40 ` [PATCH 3/3] bcachefs: Fix data corruption on -ENOSPC in buffered write path Kent Overstreet
  2 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-10-18  4:40 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

bch2_folio_reservation_get_partial(), on partial success, will now
return a reservation that's aligned to the filesystem blocksize.

This is a partial fix for fstests generic/299 - fio verify is badly
behaved in the presence of short writes that aren't aligned to its
blocksize.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/buckets.c         |  7 +++-
 fs/bcachefs/buckets.h         | 12 +++---
 fs/bcachefs/fs-io-pagecache.c | 70 ++++++++++++++++++++++-------------
 3 files changed, 57 insertions(+), 32 deletions(-)

diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c
index 546cd01a72e3..ec7d9a59bea9 100644
--- a/fs/bcachefs/buckets.c
+++ b/fs/bcachefs/buckets.c
@@ -1160,11 +1160,11 @@ int bch2_trans_mark_dev_sbs(struct bch_fs *c)
 #define SECTORS_CACHE	1024
 
 int __bch2_disk_reservation_add(struct bch_fs *c, struct disk_reservation *res,
-			      u64 sectors, int flags)
+				u64 sectors, enum bch_reservation_flags flags)
 {
 	struct bch_fs_pcpu *pcpu;
 	u64 old, get;
-	s64 sectors_available;
+	u64 sectors_available;
 	int ret;
 
 	percpu_down_read(&c->mark_lock);
@@ -1202,6 +1202,9 @@ int __bch2_disk_reservation_add(struct bch_fs *c, struct disk_reservation *res,
 	percpu_u64_set(&c->pcpu->sectors_available, 0);
 	sectors_available = avail_factor(__bch2_fs_usage_read_short(c).free);
 
+	if (sectors_available && (flags & BCH_DISK_RESERVATION_PARTIAL))
+		sectors = min(sectors, sectors_available);
+
 	if (sectors <= sectors_available ||
 	    (flags & BCH_DISK_RESERVATION_NOFAIL)) {
 		atomic64_set(&c->sectors_available,
diff --git a/fs/bcachefs/buckets.h b/fs/bcachefs/buckets.h
index e2cb7b24b220..fd5e6ccad45e 100644
--- a/fs/bcachefs/buckets.h
+++ b/fs/bcachefs/buckets.h
@@ -344,14 +344,16 @@ static inline void bch2_disk_reservation_put(struct bch_fs *c,
 	}
 }
 
-#define BCH_DISK_RESERVATION_NOFAIL		(1 << 0)
+enum bch_reservation_flags {
+	BCH_DISK_RESERVATION_NOFAIL	= 1 << 0,
+	BCH_DISK_RESERVATION_PARTIAL	= 1 << 1,
+};
 
-int __bch2_disk_reservation_add(struct bch_fs *,
-				struct disk_reservation *,
-				u64, int);
+int __bch2_disk_reservation_add(struct bch_fs *, struct disk_reservation *,
+				u64, enum bch_reservation_flags);
 
 static inline int bch2_disk_reservation_add(struct bch_fs *c, struct disk_reservation *res,
-					    u64 sectors, int flags)
+					    u64 sectors, enum bch_reservation_flags flags)
 {
 #ifdef __KERNEL__
 	u64 old, new;
diff --git a/fs/bcachefs/fs-io-pagecache.c b/fs/bcachefs/fs-io-pagecache.c
index cde3a0445ee9..1d4910ea0f1d 100644
--- a/fs/bcachefs/fs-io-pagecache.c
+++ b/fs/bcachefs/fs-io-pagecache.c
@@ -399,14 +399,17 @@ void bch2_folio_reservation_put(struct bch_fs *c,
 	bch2_quota_reservation_put(c, inode, &res->quota);
 }
 
-int bch2_folio_reservation_get(struct bch_fs *c,
+static int __bch2_folio_reservation_get(struct bch_fs *c,
 			struct bch_inode_info *inode,
 			struct folio *folio,
 			struct bch2_folio_reservation *res,
-			size_t offset, size_t len)
+			size_t offset, size_t len,
+			bool partial)
 {
 	struct bch_folio *s = bch2_folio_create(folio, 0);
 	unsigned i, disk_sectors = 0, quota_sectors = 0;
+	struct disk_reservation disk_res = {};
+	size_t reserved = len;
 	int ret;
 
 	if (!s)
@@ -422,48 +425,65 @@ int bch2_folio_reservation_get(struct bch_fs *c,
 	}
 
 	if (disk_sectors) {
-		ret = bch2_disk_reservation_add(c, &res->disk, disk_sectors, 0);
+		ret = bch2_disk_reservation_add(c, &disk_res, disk_sectors,
+				partial ? BCH_DISK_RESERVATION_PARTIAL : 0);
 		if (unlikely(ret))
 			return ret;
+
+		if (unlikely(disk_res.sectors != disk_sectors)) {
+			disk_sectors = quota_sectors = 0;
+
+			for (i = round_down(offset, block_bytes(c)) >> 9;
+			     i < round_up(offset + len, block_bytes(c)) >> 9;
+			     i++) {
+				disk_sectors += sectors_to_reserve(&s->s[i], res->disk.nr_replicas);
+				if (disk_sectors > disk_res.sectors) {
+					/*
+					 * Make sure to get a reservation that's
+					 * aligned to the filesystem blocksize:
+					 */
+					unsigned reserved_offset = round_down(i << 9, block_bytes(c));
+					reserved = clamp(reserved_offset, offset, offset + len) - offset;
+
+					if (!reserved) {
+						bch2_disk_reservation_put(c, &disk_res);
+						return -BCH_ERR_ENOSPC_disk_reservation;
+					}
+					break;
+				}
+				quota_sectors += s->s[i].state == SECTOR_unallocated;
+			}
+		}
 	}
 
 	if (quota_sectors) {
 		ret = bch2_quota_reservation_add(c, inode, &res->quota, quota_sectors, true);
 		if (unlikely(ret)) {
-			struct disk_reservation tmp = { .sectors = disk_sectors };
-			res->disk.sectors -= disk_sectors;
-
-			bch2_disk_reservation_put(c, &tmp);
+			bch2_disk_reservation_put(c, &disk_res);
 			return ret;
 		}
 	}
 
-	return 0;
+	res->disk.sectors += disk_res.sectors;
+	return partial ? reserved : 0;
 }
 
-ssize_t bch2_folio_reservation_get_partial(struct bch_fs *c,
+int bch2_folio_reservation_get(struct bch_fs *c,
 			struct bch_inode_info *inode,
 			struct folio *folio,
 			struct bch2_folio_reservation *res,
 			size_t offset, size_t len)
 {
-	size_t l, reserved = 0;
-	int ret;
-
-	while ((l = len - reserved)) {
-		while ((ret = bch2_folio_reservation_get(c, inode, folio, res, offset, l))) {
-			if ((offset & (block_bytes(c) - 1)) + l <= block_bytes(c))
-				return reserved ?: ret;
-
-			len = reserved + l;
-			l /= 2;
-		}
-
-		offset += l;
-		reserved += l;
-	}
+	return __bch2_folio_reservation_get(c, inode, folio, res, offset, len, false);
+}
 
-	return reserved;
+ssize_t bch2_folio_reservation_get_partial(struct bch_fs *c,
+			struct bch_inode_info *inode,
+			struct folio *folio,
+			struct bch2_folio_reservation *res,
+			size_t offset, size_t len)
+{
+	return __bch2_folio_reservation_get(c, inode, folio, res, offset, len, true);
 }
 
 static void bch2_clear_folio_bits(struct folio *folio)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] bcachefs: Fix data corruption on -ENOSPC in buffered write path
  2024-10-18  4:40 [PATCH 0/3] fix buffered write path on -ENOSPC Kent Overstreet
  2024-10-18  4:40 ` [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() Kent Overstreet
  2024-10-18  4:40 ` [PATCH 2/3] bcachefs: bch2_folio_reservation_get_partial() is now better behaved Kent Overstreet
@ 2024-10-18  4:40 ` Kent Overstreet
  2 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-10-18  4:40 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

Found by generic/299: When we have to truncate a write due to -ENOSPC,
we may have to read in the folio we're writing to if we're now no longer
doing a complete write to a !uptodate folio.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/fs-io-buffered.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
index 48a1ab9a649b..95972809e76d 100644
--- a/fs/bcachefs/fs-io-buffered.c
+++ b/fs/bcachefs/fs-io-buffered.c
@@ -856,6 +856,12 @@ static int __bch2_buffered_write(struct bch_inode_info *inode,
 				folios_trunc(&fs, fi);
 				end = min(end, folio_end_pos(darray_last(fs)));
 			} else {
+				if (!folio_test_uptodate(f)) {
+					ret = bch2_read_single_folio(f, mapping);
+					if (ret)
+						goto out;
+				}
+
 				folios_trunc(&fs, fi + 1);
 				end = f_pos + f_reserved;
 			}
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()
  2024-10-18  4:40 ` [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() Kent Overstreet
@ 2024-10-18 11:49   ` wangjianjian (C)
  0 siblings, 0 replies; 5+ messages in thread
From: wangjianjian (C) @ 2024-10-18 11:49 UTC (permalink / raw)
  To: Kent Overstreet, linux-bcachefs

On 2024/10/18 12:40, Kent Overstreet wrote:
> bch2_disk_reservation_put() zeroes out the reservation - oops.
> 
> This fixes a disk reservation leak when getting a quota reservation
> returned an error.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---
>   fs/bcachefs/fs-io-pagecache.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/bcachefs/fs-io-pagecache.c b/fs/bcachefs/fs-io-pagecache.c
> index af3a24546aa3..cde3a0445ee9 100644
> --- a/fs/bcachefs/fs-io-pagecache.c
> +++ b/fs/bcachefs/fs-io-pagecache.c
> @@ -431,9 +431,9 @@ int bch2_folio_reservation_get(struct bch_fs *c,
>   		ret = bch2_quota_reservation_add(c, inode, &res->quota, quota_sectors, true);
>   		if (unlikely(ret)) {
>   			struct disk_reservation tmp = { .sectors = disk_sectors };
> +			res->disk.sectors -= disk_sectors;
>   
>   			bch2_disk_reservation_put(c, &tmp);
> -			res->disk.sectors -= disk_sectors;
I am new to bcachefs, but seems bch2_disk_reservation_put zeros out 
tmp->sectors, no affect to 'res' ?
>   			return ret;
>   		}
>   	}
-- 
Regards


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-10-18 11:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-18  4:40 [PATCH 0/3] fix buffered write path on -ENOSPC Kent Overstreet
2024-10-18  4:40 ` [PATCH 1/3] bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() Kent Overstreet
2024-10-18 11:49   ` wangjianjian (C)
2024-10-18  4:40 ` [PATCH 2/3] bcachefs: bch2_folio_reservation_get_partial() is now better behaved Kent Overstreet
2024-10-18  4:40 ` [PATCH 3/3] bcachefs: Fix data corruption on -ENOSPC in buffered write path Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox