* [PATCH v2] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
@ 2019-03-14 7:52 Nikolay Borisov
2019-03-19 18:40 ` David Sterba
0 siblings, 1 reply; 2+ messages in thread
From: Nikolay Borisov @ 2019-03-14 7:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
If a an eb fails to be read for whatever reason - it's corrupted on disk
and parent transid/key validations fail or IO for eb pages fail then
this buffer must be removed from the buffer cache. Currently the code
calls free_extent_buffer if an error occurs. Unfortunately this doesn't
achieve the desired behavior since btrfs_find_create_tree_block returns
with eb->refs == 2. On the other hand free_extent_buffer will only
decrement the refs once leavin it added to the buffer cache radix tree.
This enables later code to look up the buffer from the cache and utilize
it potentially leading to a crash.
The correct way to free the buffer is call free_extent_buffer_stale.
This function will correctly call atomic_dec explicitly for the buffer
and subsequently call release_extent_buffer which will decrement the
final reference thus correctly remove the invalid buffer from buffer
cache. This change affects only newly allocated buffers since they have
eb->refs == 2.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
Changes since v1:
* Immediately remove a corrupted extent buffer from readahead_tree_block as
well.
fs/btrfs/disk-io.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f7010312d171..9dda16c27255 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1021,9 +1021,12 @@ void readahead_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr)
buf = btrfs_find_create_tree_block(fs_info, bytenr);
if (IS_ERR(buf))
return;
- read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree,
- buf, WAIT_NONE, 0);
- free_extent_buffer(buf);
+
+ if (read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree,
+ buf, WAIT_NONE, 0))
+ free_extent_buffer_stale(buf);
+ else
+ free_extent_buffer(buf);
}
int reada_tree_block_flagged(struct btrfs_fs_info *fs_info, u64 bytenr,
@@ -1043,12 +1046,12 @@ int reada_tree_block_flagged(struct btrfs_fs_info *fs_info, u64 bytenr,
ret = read_extent_buffer_pages(io_tree, buf, WAIT_PAGE_LOCK,
mirror_num);
if (ret) {
- free_extent_buffer(buf);
+ free_extent_buffer_stale(buf);
return ret;
}
if (test_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags)) {
- free_extent_buffer(buf);
+ free_extent_buffer_stale(buf);
return -EIO;
} else if (extent_buffer_uptodate(buf)) {
*eb = buf;
@@ -1102,7 +1105,7 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
ret = btree_read_extent_buffer_pages(fs_info, buf, parent_transid,
level, first_key);
if (ret) {
- free_extent_buffer(buf);
+ free_extent_buffer_stale(buf);
return ERR_PTR(ret);
}
return buf;
--
2.17.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH v2] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
2019-03-14 7:52 [PATCH v2] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails Nikolay Borisov
@ 2019-03-19 18:40 ` David Sterba
0 siblings, 0 replies; 2+ messages in thread
From: David Sterba @ 2019-03-19 18:40 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs
On Thu, Mar 14, 2019 at 09:52:35AM +0200, Nikolay Borisov wrote:
> If a an eb fails to be read for whatever reason - it's corrupted on disk
> and parent transid/key validations fail or IO for eb pages fail then
> this buffer must be removed from the buffer cache. Currently the code
> calls free_extent_buffer if an error occurs. Unfortunately this doesn't
> achieve the desired behavior since btrfs_find_create_tree_block returns
> with eb->refs == 2. On the other hand free_extent_buffer will only
> decrement the refs once leavin it added to the buffer cache radix tree.
> This enables later code to look up the buffer from the cache and utilize
> it potentially leading to a crash.
>
> The correct way to free the buffer is call free_extent_buffer_stale.
> This function will correctly call atomic_dec explicitly for the buffer
> and subsequently call release_extent_buffer which will decrement the
> final reference thus correctly remove the invalid buffer from buffer
> cache. This change affects only newly allocated buffers since they have
> eb->refs == 2.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>
> Changes since v1:
> * Immediately remove a corrupted extent buffer from readahead_tree_block as
> well.
>
> fs/btrfs/disk-io.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f7010312d171..9dda16c27255 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1021,9 +1021,12 @@ void readahead_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr)
> buf = btrfs_find_create_tree_block(fs_info, bytenr);
> if (IS_ERR(buf))
> return;
> - read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree,
> - buf, WAIT_NONE, 0);
> - free_extent_buffer(buf);
> +
> + if (read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree,
> + buf, WAIT_NONE, 0))
I've switched that to
ret = read_extent_buffer_pages()
if (ret < 0)
as this is the common style. Function calls inside the conditions should
be only predicates, the rest should store and check the return value
separately.
Otherwise ok, though the dedicated helper for this would still be nice
to have, but this cann follow later and this patch can go to stable.
Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2019-03-19 18:39 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-14 7:52 [PATCH v2] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails Nikolay Borisov
2019-03-19 18:40 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).