* [PATCH 0/2] Reduce scope of extent locking while reading
@ 2024-08-19 20:00 Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 1/2] btrfs: clear folio writeback after completing ordered range Goldwyn Rodrigues
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Goldwyn Rodrigues @ 2024-08-19 20:00 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Extent locking is not required for the entire read process. Once the
extent map is read all information to retrieve is available in the
extent map, and is cached in em_cached for later retrieval. The extent
map cached is also refcounted so it would not disappear.
Limit the extent locking while reading the extnet maps only. The rest is
just creating bio and fetching/decompressing the data according to the
extent map provided.
The only reason this was not working is because we would get EIO as
the CRCs would not match (or were not committed to disk as yet) for
folios that were written and released. In order to get over it, mark the
extent as finished only after all folios are cleared of writeback.
I have run the xfstests on it without any deadlocks or corruptions.
However, a fresh outlook on what could go wrong with a limiting the
scope of locks would be good.
Goldwyn Rodrigues (2):
btrfs: clear folio writeback after completing ordered range
btrfs: take extent lock only while reading extent map
fs/btrfs/compression.c | 5 ---
fs/btrfs/extent_io.c | 99 +++---------------------------------------
2 files changed, 7 insertions(+), 97 deletions(-)
--
2.46.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] btrfs: clear folio writeback after completing ordered range
2024-08-19 20:00 [PATCH 0/2] Reduce scope of extent locking while reading Goldwyn Rodrigues
@ 2024-08-19 20:00 ` Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 2/2] btrfs: take extent lock only while reading extent map Goldwyn Rodrigues
2024-08-29 17:53 ` [PATCH 0/2] Reduce scope of extent locking while reading David Sterba
2 siblings, 0 replies; 5+ messages in thread
From: Goldwyn Rodrigues @ 2024-08-19 20:00 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Mark the folio as completed writeback only after ordered range has been
marked as finished. This is to make sure that the CRCs are written to
disk before a disk read is performed.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
fs/btrfs/extent_io.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0bc4fa1f15cb..695589f02263 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -470,11 +470,11 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
"incomplete page write with offset %zu and length %zu",
fi.offset, fi.length);
- btrfs_finish_ordered_extent(bbio->ordered, folio, start, len,
- !error);
if (error)
mapping_set_error(folio->mapping, error);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
+ btrfs_finish_ordered_extent(bbio->ordered, folio, start, len,
+ !error);
}
bio_put(bio);
--
2.46.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] btrfs: take extent lock only while reading extent map
2024-08-19 20:00 [PATCH 0/2] Reduce scope of extent locking while reading Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 1/2] btrfs: clear folio writeback after completing ordered range Goldwyn Rodrigues
@ 2024-08-19 20:00 ` Goldwyn Rodrigues
2024-08-29 17:53 ` [PATCH 0/2] Reduce scope of extent locking while reading David Sterba
2 siblings, 0 replies; 5+ messages in thread
From: Goldwyn Rodrigues @ 2024-08-19 20:00 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Limit the scope of the extent lock only while reading extent map.
The extent_map is refcounted so we don't need to worry about holding the
lock for it's lifecycle (in em_cached).
This removes all the processed extent structures and functions for
unlocking the extent as a whole on finishing reads.
This also removes all unlock extents for page boundaries in case of a
failure while performing reads.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
fs/btrfs/compression.c | 5 ---
fs/btrfs/extent_io.c | 95 +++---------------------------------------
2 files changed, 5 insertions(+), 95 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 832ab8984c41..094d46cdfd1b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -424,11 +424,9 @@ static noinline int add_ra_bio_pages(struct inode *inode,
struct extent_map *em;
struct address_space *mapping = inode->i_mapping;
struct extent_map_tree *em_tree;
- struct extent_io_tree *tree;
int sectors_missed = 0;
em_tree = &BTRFS_I(inode)->extent_tree;
- tree = &BTRFS_I(inode)->io_tree;
if (isize == 0)
return 0;
@@ -499,7 +497,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
}
page_end = (pg_index << PAGE_SHIFT) + folio_size(folio) - 1;
- lock_extent(tree, cur, page_end, NULL);
read_lock(&em_tree->lock);
em = lookup_extent_mapping(em_tree, cur, page_end + 1 - cur);
read_unlock(&em_tree->lock);
@@ -514,7 +511,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
(extent_map_block_start(em) >> SECTOR_SHIFT) !=
orig_bio->bi_iter.bi_sector) {
free_extent_map(em);
- unlock_extent(tree, cur, page_end, NULL);
folio_unlock(folio);
folio_put(folio);
break;
@@ -534,7 +530,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
if (!bio_add_folio(orig_bio, folio, add_size,
offset_in_folio(folio, cur))) {
- unlock_extent(tree, cur, page_end, NULL);
folio_unlock(folio);
folio_put(folio);
break;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 695589f02263..b16b80895dc7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -480,75 +480,6 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
bio_put(bio);
}
-/*
- * Record previously processed extent range
- *
- * For endio_readpage_release_extent() to handle a full extent range, reducing
- * the extent io operations.
- */
-struct processed_extent {
- struct btrfs_inode *inode;
- /* Start of the range in @inode */
- u64 start;
- /* End of the range in @inode */
- u64 end;
- bool uptodate;
-};
-
-/*
- * Try to release processed extent range
- *
- * May not release the extent range right now if the current range is
- * contiguous to processed extent.
- *
- * Will release processed extent when any of @inode, @uptodate, the range is
- * no longer contiguous to the processed range.
- *
- * Passing @inode == NULL will force processed extent to be released.
- */
-static void endio_readpage_release_extent(struct processed_extent *processed,
- struct btrfs_inode *inode, u64 start, u64 end,
- bool uptodate)
-{
- struct extent_state *cached = NULL;
- struct extent_io_tree *tree;
-
- /* The first extent, initialize @processed */
- if (!processed->inode)
- goto update;
-
- /*
- * Contiguous to processed extent, just uptodate the end.
- *
- * Several things to notice:
- *
- * - bio can be merged as long as on-disk bytenr is contiguous
- * This means we can have page belonging to other inodes, thus need to
- * check if the inode still matches.
- * - bvec can contain range beyond current page for multi-page bvec
- * Thus we need to do processed->end + 1 >= start check
- */
- if (processed->inode == inode && processed->uptodate == uptodate &&
- processed->end + 1 >= start && end >= processed->end) {
- processed->end = end;
- return;
- }
-
- tree = &processed->inode->io_tree;
- /*
- * Now we don't have range contiguous to the processed range, release
- * the processed range now.
- */
- unlock_extent(tree, processed->start, processed->end, &cached);
-
-update:
- /* Update processed to current range */
- processed->inode = inode;
- processed->start = start;
- processed->end = end;
- processed->uptodate = uptodate;
-}
-
static void begin_folio_read(struct btrfs_fs_info *fs_info, struct folio *folio)
{
ASSERT(folio_test_locked(folio));
@@ -575,7 +506,6 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
{
struct btrfs_fs_info *fs_info = bbio->fs_info;
struct bio *bio = &bbio->bio;
- struct processed_extent processed = { 0 };
struct folio_iter fi;
const u32 sectorsize = fs_info->sectorsize;
@@ -640,11 +570,7 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
/* Update page status and unlock. */
end_folio_read(folio, uptodate, start, len);
- endio_readpage_release_extent(&processed, BTRFS_I(inode),
- start, end, uptodate);
}
- /* Release the last extent */
- endio_readpage_release_extent(&processed, NULL, 0, 0, false);
bio_put(bio);
}
@@ -973,6 +899,7 @@ static struct extent_map *__get_extent_map(struct inode *inode,
u64 len, struct extent_map **em_cached)
{
struct extent_map *em;
+ struct extent_state *cached_state = NULL;
ASSERT(em_cached);
@@ -988,12 +915,16 @@ static struct extent_map *__get_extent_map(struct inode *inode,
*em_cached = NULL;
}
+ btrfs_lock_and_flush_ordered_range(BTRFS_I(inode), start, start + len - 1, &cached_state);
+
em = btrfs_get_extent(BTRFS_I(inode), folio, start, len);
if (!IS_ERR(em)) {
BUG_ON(*em_cached);
refcount_inc(&em->refs);
*em_cached = em;
}
+ unlock_extent(&BTRFS_I(inode)->io_tree, start, start + len - 1, &cached_state);
+
return em;
}
/*
@@ -1019,11 +950,9 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
size_t pg_offset = 0;
size_t iosize;
size_t blocksize = fs_info->sectorsize;
- struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
ret = set_folio_extent_mapped(folio);
if (ret < 0) {
- unlock_extent(tree, start, end, NULL);
folio_unlock(folio);
return ret;
}
@@ -1047,14 +976,12 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
if (cur >= last_byte) {
iosize = folio_size(folio) - pg_offset;
folio_zero_range(folio, pg_offset, iosize);
- unlock_extent(tree, cur, cur + iosize - 1, NULL);
end_folio_read(folio, true, cur, iosize);
break;
}
em = __get_extent_map(inode, folio, cur, end - cur + 1,
em_cached);
if (IS_ERR(em)) {
- unlock_extent(tree, cur, end, NULL);
end_folio_read(folio, false, cur, end + 1 - cur);
return PTR_ERR(em);
}
@@ -1123,7 +1050,6 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
if (block_start == EXTENT_MAP_HOLE) {
folio_zero_range(folio, pg_offset, iosize);
- unlock_extent(tree, cur, cur + iosize - 1, NULL);
end_folio_read(folio, true, cur, iosize);
cur = cur + iosize;
pg_offset += iosize;
@@ -1131,7 +1057,6 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
}
/* the get_extent function already copied into the folio */
if (block_start == EXTENT_MAP_INLINE) {
- unlock_extent(tree, cur, cur + iosize - 1, NULL);
end_folio_read(folio, true, cur, iosize);
cur = cur + iosize;
pg_offset += iosize;
@@ -1156,15 +1081,10 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
int btrfs_read_folio(struct file *file, struct folio *folio)
{
- struct btrfs_inode *inode = folio_to_inode(folio);
- u64 start = folio_pos(folio);
- u64 end = start + folio_size(folio) - 1;
struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ };
struct extent_map *em_cached = NULL;
int ret;
- btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
-
ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL);
free_extent_map(em_cached);
@@ -2345,15 +2265,10 @@ int btrfs_writepages(struct address_space *mapping, struct writeback_control *wb
void btrfs_readahead(struct readahead_control *rac)
{
struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD };
- struct btrfs_inode *inode = BTRFS_I(rac->mapping->host);
struct folio *folio;
- u64 start = readahead_pos(rac);
- u64 end = start + readahead_length(rac) - 1;
struct extent_map *em_cached = NULL;
u64 prev_em_start = (u64)-1;
- btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
-
while ((folio = readahead_folio(rac)) != NULL)
btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
--
2.46.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] Reduce scope of extent locking while reading
2024-08-19 20:00 [PATCH 0/2] Reduce scope of extent locking while reading Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 1/2] btrfs: clear folio writeback after completing ordered range Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 2/2] btrfs: take extent lock only while reading extent map Goldwyn Rodrigues
@ 2024-08-29 17:53 ` David Sterba
2024-08-29 21:59 ` Goldwyn Rodrigues
2 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2024-08-29 17:53 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues, josef
On Mon, Aug 19, 2024 at 04:00:51PM -0400, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> Extent locking is not required for the entire read process. Once the
> extent map is read all information to retrieve is available in the
> extent map, and is cached in em_cached for later retrieval. The extent
> map cached is also refcounted so it would not disappear.
> Limit the extent locking while reading the extnet maps only. The rest is
> just creating bio and fetching/decompressing the data according to the
> extent map provided.
>
> The only reason this was not working is because we would get EIO as
> the CRCs would not match (or were not committed to disk as yet) for
> folios that were written and released. In order to get over it, mark the
> extent as finished only after all folios are cleared of writeback.
>
> I have run the xfstests on it without any deadlocks or corruptions.
> However, a fresh outlook on what could go wrong with a limiting the
> scope of locks would be good.
>
> Goldwyn Rodrigues (2):
> btrfs: clear folio writeback after completing ordered range
> btrfs: take extent lock only while reading extent map
The first patch seems relevant but it had been sent before Josef added
the read locking updates so I don't know if it still needs to be merged.
The second patch is covered by "btrfs: do not hold the extent lock for
entire read".
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] Reduce scope of extent locking while reading
2024-08-29 17:53 ` [PATCH 0/2] Reduce scope of extent locking while reading David Sterba
@ 2024-08-29 21:59 ` Goldwyn Rodrigues
0 siblings, 0 replies; 5+ messages in thread
From: Goldwyn Rodrigues @ 2024-08-29 21:59 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs, josef
On 19:53 29/08, David Sterba wrote:
> On Mon, Aug 19, 2024 at 04:00:51PM -0400, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >
> > Extent locking is not required for the entire read process. Once the
> > extent map is read all information to retrieve is available in the
> > extent map, and is cached in em_cached for later retrieval. The extent
> > map cached is also refcounted so it would not disappear.
> > Limit the extent locking while reading the extnet maps only. The rest is
> > just creating bio and fetching/decompressing the data according to the
> > extent map provided.
> >
> > The only reason this was not working is because we would get EIO as
> > the CRCs would not match (or were not committed to disk as yet) for
> > folios that were written and released. In order to get over it, mark the
> > extent as finished only after all folios are cleared of writeback.
> >
> > I have run the xfstests on it without any deadlocks or corruptions.
> > However, a fresh outlook on what could go wrong with a limiting the
> > scope of locks would be good.
> >
> > Goldwyn Rodrigues (2):
> > btrfs: clear folio writeback after completing ordered range
> > btrfs: take extent lock only while reading extent map
>
> The first patch seems relevant but it had been sent before Josef added
> the read locking updates so I don't know if it still needs to be merged.
> The second patch is covered by "btrfs: do not hold the extent lock for
> entire read".
The first patch is not relevant anymore because of the way extent locking
is used for reads. ie, a read must make sure all ordered extents are
submitted beore proceeding. This was not the case before and hence I
used to receive CRC errors for reads. This was a small race, where a
process would writeback and invalidate pages, and a fresh read is
performed simultaneously. The CRC tree was not updated and reads would
fail with EIO.
The second patch is incorporated in Josef's v2 version of the patchset.
In short, ignore this series.
--
Goldwyn
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-08-29 21:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-19 20:00 [PATCH 0/2] Reduce scope of extent locking while reading Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 1/2] btrfs: clear folio writeback after completing ordered range Goldwyn Rodrigues
2024-08-19 20:00 ` [PATCH 2/2] btrfs: take extent lock only while reading extent map Goldwyn Rodrigues
2024-08-29 17:53 ` [PATCH 0/2] Reduce scope of extent locking while reading David Sterba
2024-08-29 21:59 ` Goldwyn Rodrigues
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.