From: Jan Kara <jack@suse.cz>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Subject: Re: [RFC 1/6] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit
Date: Tue, 1 Feb 2022 12:21:34 +0100
Message-ID: <20220201112134.aps3kd2ffv4trlhs@quack3.lan>
In-Reply-To: <a9770b46522c03989bdd96f63f7d0bfb2cf499ab.1643642105.git.riteshh@linux.ibm.com>

On Mon 31-01-22 20:46:50, Ritesh Harjani wrote:
> In case of flex_bg feature (which is by default enabled), extents for
> any given inode might span across blocks from two different block group.
> ext4_mb_mark_bb() only reads the buffer_head of block bitmap once for the
> starting block group, but it fails to read it again when the extent length
> boundary overflows to another block group. Then in this below loop it
> accesses memory beyond the block group bitmap buffer_head and results
> into a data abort.
>
> for (i = 0; i < clen; i++)
> if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> already++;
>
> This patch adds this functionality for checking block group boundary in
> ext4_mb_mark_bb() and update the buffer_head(bitmap_bh) for every different
> block group.
>
> w/o this patch, I was easily able to hit a data access abort using Power platform.
>
> <...>
> [ 74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters
> [ 74.533214] EXT4-fs (loop3): shut down requested (2)
> [ 74.536705] Aborting journal on device loop3-8.
> [ 74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000
> [ 74.703727] Faulting instruction address: 0xc0000000007bffb8
> cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060]
> pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0
> lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0
> sp: c000000015db7300
> msr: 800000000280b033
> dar: c00000005e980000
> dsisr: 40000000
> current = 0xc000000027af6880
> paca = 0xc00000003ffd5200 irqmask: 0x03 irq_happened: 0x01
> pid = 5167, comm = mount
> <...>
> enter ? for help
> [c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410
> [c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000
> [c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0
> [c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0
> [c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0
> [c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10
> [c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350
> [c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40
> [c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100
> [c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70
> [c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550
> [c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0
> [c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250
> --- Exception: c00 (System Call) at 00007ffff7dbfaa4
>
> Fixes: 8016e29f4362e28 ("ext4: fast commit recovery path")
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> ---
> fs/ext4/mballoc.c | 30 +++++++++++++++++++++++++++---
> 1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index c781974df9d0..8d23108cf9d7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3899,12 +3899,29 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> ext4_group_t group;
> ext4_grpblk_t blkoff;
> - int i, clen, err;
> + int i, err;
> int already;
> + unsigned int clen, overflow;
>
> - clen = EXT4_B2C(sbi, len);
> -
> +again:

And maybe structure this as a while loop? Like:

	while (len > 0) {
		...
	}

> + overflow = 0;
> ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> +
> + /*
> + * Check to see if we are freeing blocks across a group
> + * boundary.
> + * In case of flex_bg, this can happen that (block, len) may span across
> + * more than one group. In that case we need to get the corresponding
> + * group metadata to work with. For this we have goto again loop.
> + */
> + if (EXT4_C2B(sbi, blkoff) + len > EXT4_BLOCKS_PER_GROUP(sb)) {
> + overflow = EXT4_C2B(sbi, blkoff) + len -
> + EXT4_BLOCKS_PER_GROUP(sb);
> + len -= overflow;

Why not just:

	thisgrp_len = min_t(int, len,
			    EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
	clen = EXT4_NUM_B2C(sbi, thisgrp_len);

It seems easier to understand to me.
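
Taken together, the two suggestions amount to a loop that clamps each pass
to the end of the current group. A minimal userspace sketch of that
arithmetic follows; it is not the kernel code. It assumes one block per
cluster and a first data block of 0, and BLOCKS_PER_GROUP, struct chunk and
split_by_group() are hypothetical names standing in for the superblock
geometry and for the per-group bitmap work:

```c
#include <assert.h>

/* Assumed geometry: 32768 blocks per group, cluster size == block size. */
#define BLOCKS_PER_GROUP 32768ULL

/* One per-group piece of a (block, len) range. Hypothetical helper type. */
struct chunk {
	unsigned long long group;	/* block group number */
	unsigned long long blkoff;	/* offset of the piece inside the group */
	unsigned long long len;		/* number of blocks in the piece */
};

/*
 * Split [block, block + len) into pieces that never cross a group
 * boundary. Returns the number of pieces stored in out[], at most max.
 */
static int split_by_group(unsigned long long block, unsigned long long len,
			  struct chunk *out, int max)
{
	int n = 0;

	assert(out && max > 0);

	while (len > 0 && n < max) {
		unsigned long long group = block / BLOCKS_PER_GROUP;
		unsigned long long blkoff = block % BLOCKS_PER_GROUP;
		unsigned long long room = BLOCKS_PER_GROUP - blkoff;
		/* Same idea as the min_t() above: clamp to this group. */
		unsigned long long thisgrp_len = len < room ? len : room;

		out[n].group = group;
		out[n].blkoff = blkoff;
		out[n].len = thisgrp_len;
		n++;

		/* Advance to the first block of the next group, if any. */
		block += thisgrp_len;
		len -= thisgrp_len;
	}
	return n;
}
```

With these assumptions, a range starting at block 32760 with len 16 comes
out as two pieces: 8 blocks at offset 32760 in group 0, then 8 blocks at
offset 0 in group 1. In the real function each pass would read that
group's bitmap and release it before moving on.
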
Honza
> + }
> +
> + clen = EXT4_NUM_B2C(sbi, len);
> +
> bitmap_bh = ext4_read_block_bitmap(sb, group);
> if (IS_ERR(bitmap_bh)) {
> err = PTR_ERR(bitmap_bh);
> @@ -3960,6 +3977,13 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> sync_dirty_buffer(gdp_bh);
>
> + if (overflow && !err) {
> + block += len;
> + len = overflow;
> + put_bh(bitmap_bh);
> + goto again;
> + }
> +
> out_err:
> brelse(bitmap_bh);
> }
> --
> 2.31.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR