From: Eric Sandeen <sandeen@redhat.com>
To: Lukas Czerner <lczerner@redhat.com>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, psusi@ubuntu.com
Subject: Re: [PATCH 1/2] e2fsck: Discard only unused parts of inode table
Date: Wed, 22 Feb 2012 18:51:37 -0600 [thread overview]
Message-ID: <4F458D99.30802@redhat.com> (raw)
In-Reply-To: <1329840132-16808-1-git-send-email-lczerner@redhat.com>
On 2/21/12 10:02 AM, Lukas Czerner wrote:
> When calling e2fsck with '-E discard' option it might happen that
> valid inodes are discarded accidentally. This is because we just
> discard the part of inode table which lies past the highest used
> inode. This is terribly wrong (sorry!).
>
> This patch fixes it so only the free parts of an inode table
> is discarded, leaving used inodes intact. This was tested with highly
> fragmented inode tables with block size 4k and 1k.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reported-by: Phillip Susi <psusi@ubuntu.com>
> ---
> e2fsck/pass5.c | 59 +++++++++++++++++++++++++++++++++++--------------------
> 1 files changed, 37 insertions(+), 22 deletions(-)
>
> diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> index 1e836e3..9cc4a20 100644
> --- a/e2fsck/pass5.c
> +++ b/e2fsck/pass5.c
> @@ -94,6 +94,26 @@ static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
> ctx->options &= ~E2F_OPT_DISCARD;
> }
I could really use a comment here. Is "start" the inode nr within
the group? Is the first inode in the group 0 or 1? The calling
function's loop starts at inode 1 ... but based on some testing
it looks like you call the first free inode in an empty group
"inode 0" so I'll assume that, but comments please.
> +static void e2fsck_discard_inodes(e2fsck_t ctx, int group,
> + int start, int count)
I'm not sure if ints are good here, but it's what the calling
function uses, I guess.
> +{
> + ext2_filsys fs = ctx->fs;
> + blk64_t blk, num;
> +
> + /*
> + * We can discard only blocks containing only unused
> + * inodes in the table.
> + */
> + blk = DIV_ROUND_UP(start,
> + EXT2_INODES_PER_BLOCK(fs->super));
> + count -= (blk * EXT2_INODES_PER_BLOCK(fs->super) - start);
> + blk += ext2fs_inode_table_loc(fs, group);
> + num = count / EXT2_INODES_PER_BLOCK(fs->super);
> +
> + if (num > 0)
> + e2fsck_discard_blocks(ctx, fs->io->manager, blk, num);
> +}
I can't make intuitive sense of that at all, and had to count on my fingers
and draw diagrams. Maybe it's just my old brain. :(
Let's say we have 4 inode blocks in this table, with
1 == used, 0 == unused, and we start counting inodes at 0.
|1111111100000000|0000000000000000|0000000000000000|0000000000000001|
012345678 <- start
so start = 8, count = 55 (note one inode used at the end)
In this case we should get:
blk = 1 (DIV_ROUND_UP(8, 16)) - ok, that's next block after the partial block
count = 55 - ((1 * 16) - 8) = 47
num = 47 / 16 = 2
Ok, so that works, assuming the first inode in the group is "0"
We can make a filesystem kind of like that, to test it:
# mkfs.ext4 -I 256 -b 4096 -N 256 -G 1 -g 256 /dev/loop0 1024
with 4 groups each having 64 inodes.
If I run a e2fsck -E discard on that fresh filesystem, with some debugging, I see:
Pass 5: Checking group summary information
got start 12 count 51; count reduced to 47 inodes, discarding 2 blocks starting at block 36
Ok, that is a little weird; on a fresh filesystem lost+found/ is 11, so the first free
inode is 12; but now we seem to have started counting at 1, not 0?
got start 0 count 63; count reduced to 63 inodes, discarding 3 blocks starting at block 291
got start 0 count 63; count reduced to 63 inodes, discarding 3 blocks starting at block 514
got start 0 count 63; count reduced to 63 inodes, discarding 3 blocks starting at block 803
Oh, now we're back to "the first inode in the group is 0" :(
And here it looks like we are not discarding all the free blocks in the completely-free tables;
the last group has:
Inode table at 803-806 (+35)
217 free blocks, 64 free inodes, 0 directories, 64 unused inodes
but we only discarded 803 through 805, for example, because we were only told to discard
63 inodes, not 64 as I'd have expected since 64 are free.
I can't really make heads or tails out of how we got here, the calling function is so
messy it's really hard for me to keep track of. I think perhaps a clean up series
for the calling function first, and then we can try to properly extend it...?
I think we need ultra-careful attention to which variables are 0-based and which
are 1-based, for starters.
For e2fsck_discard_inodes() I wonder if it could be made more intuitive by:
1) find first totally free block based on start
2) find last totally free block based on the last free inode (start+count-1)
3) find free block length, free_length = last_blk - first_blk + 1
4) e2fsck_discard_blocks(..., ..., first_blk, free_length);
i.e. get rid of the count -= ...
_assuming_ we start counting at 0:
/* round up to next block after a partially free one */
first_blk = DIV_ROUND_UP(start,
EXT2_INODES_PER_BLOCK(fs->super));
first_blk += ext2fs_inode_table_loc(fs, group);
last_free_inode = start + count - 1;
/* round down to block before a partially free one */
last_blk = last_free_inode / EXT2_INODES_PER_BLOCK(fs->super));
free_blks = last_blk - first_blk + 1;
if (free_blks > 0)
e2fsck_discard_blocks(ctx, fs->io->manager, first_blk, free_blks);
I probably have off-by-ones there too, but you get the idea...
-Eric
> +
> #define NO_BLK ((blk64_t) -1)
>
> static void print_bitmap_problem(e2fsck_t ctx, int problem,
> @@ -422,6 +442,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
> ext2_ino_t i;
> unsigned int free_inodes = 0;
> int group_free = 0;
> + int first_free = fs->super->s_inodes_per_group;
> int dirs_count = 0;
> int group = 0;
> unsigned int inodes = 0;
> @@ -497,6 +518,7 @@ redo_counts:
> * are 0, count the free inode,
> * skip the current block group.
> */
> + first_free = 0;
> inodes = fs->super->s_inodes_per_group - 1;
> group_free = inodes;
> free_inodes += inodes;
> @@ -561,37 +583,27 @@ redo_counts:
> ctx->options &= ~E2F_OPT_DISCARD;
>
> do_counts:
> + inodes++;
> if (bitmap) {
> if (ext2fs_test_inode_bitmap2(ctx->inode_dir_map, i))
> dirs_count++;
> + if (inodes - 1 > first_free) {
> + e2fsck_discard_inodes(ctx, group, first_free,
> + inodes - first_free - 1);
> + first_free = fs->super->s_inodes_per_group;
> + }
> } else if (!skip_group || csum_flag) {
> group_free++;
> free_inodes++;
> + if (first_free > inodes)
> + first_free = inodes;
> }
>
> - inodes++;
> if ((inodes == fs->super->s_inodes_per_group) ||
> (i == fs->super->s_inodes_count)) {
> -
> - free_array[group] = group_free;
> - dir_array[group] = dirs_count;
> -
> - /* Discard inode table */
> - if (ctx->options & E2F_OPT_DISCARD) {
> - blk64_t used_blks, blk, num;
> -
> - used_blks = DIV_ROUND_UP(
> - (EXT2_INODES_PER_GROUP(fs->super) -
> - group_free),
> - EXT2_INODES_PER_BLOCK(fs->super));
> -
> - blk = ext2fs_inode_table_loc(fs, group) +
> - used_blks;
> - num = fs->inode_blocks_per_group -
> - used_blks;
> - e2fsck_discard_blocks(ctx, manager, blk, num);
> - }
> -
> + if (inodes - 1 > first_free)
> + e2fsck_discard_inodes(ctx, group, first_free,
> + inodes - first_free - 1);
> /*
> * If discard zeroes data and the group inode table
> * was not zeroed yet, set itable as zeroed
> @@ -599,12 +611,15 @@ do_counts:
> if ((ctx->options & E2F_OPT_DISCARD) &&
> (io_channel_discard_zeroes_data(fs->io)) &&
> !(ext2fs_bg_flags_test(fs, group,
> - EXT2_BG_INODE_ZEROED))) {
> + EXT2_BG_INODE_ZEROED))) {
> ext2fs_bg_flags_set(fs, group,
> EXT2_BG_INODE_ZEROED);
> ext2fs_group_desc_csum_set(fs, group);
> }
>
> + first_free = fs->super->s_inodes_per_group;
> + free_array[group] = group_free;
> + dir_array[group] = dirs_count;
> group ++;
> inodes = 0;
> skip_group = 0;
next prev parent reply other threads:[~2012-02-23 0:51 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-21 16:02 [PATCH 1/2] e2fsck: Discard only unused parts of inode table Lukas Czerner
2012-02-21 16:02 ` [PATCH 2/2] e2fsck: Do not forget to discard last block group Lukas Czerner
2012-02-23 0:51 ` Eric Sandeen [this message]
2012-02-23 13:05 ` [PATCH 1/2] e2fsck: Discard only unused parts of inode table Lukas Czerner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F458D99.30802@redhat.com \
--to=sandeen@redhat.com \
--cc=lczerner@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=psusi@ubuntu.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.