All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: Lukas Czerner <lczerner@redhat.com>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, psusi@ubuntu.com
Subject: Re: [PATCH 1/2 v2] e2fsck: Discard only unused parts of inode table
Date: Tue, 28 Feb 2012 10:31:21 -0600	[thread overview]
Message-ID: <4F4D0159.1090409@sandeen.net> (raw)
In-Reply-To: <1330014566-2020-1-git-send-email-lczerner@redhat.com>

On 02/23/2012 10:29 AM, Lukas Czerner wrote:
> When calling e2fsck with '-E discard' option it might happen that
> valid inodes are discarded accidentally. This is because we just
> discard the part of inode table which lies past the highest used
> inode. This is terribly wrong (sorry!).
> 
> This patch fixes it so only the free parts of an inode table
> is discarded, leaving used inodes intact. This was tested with highly
> fragmented inode tables with block size 4k and 1k.
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reported-by: Phillip Susi <psusi@ubuntu.com>

Apologies for being so iterative about this review but it's taken a while
for this function to make sense to me.

This patch does seem to fix the original bug, but I have a few more
nitpicks below.

> ---
> v2: reworked so that we comply with inode number counting and adjust
>     start in the e2fsck_discard_inodes(). Add some comments

>  e2fsck/pass5.c |   82 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 59 insertions(+), 23 deletions(-)
> 
> diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> index 1e836e3..57a207d 100644
> --- a/e2fsck/pass5.c
> +++ b/e2fsck/pass5.c
> @@ -94,6 +94,43 @@ static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
>  		ctx->options &= ~E2F_OPT_DISCARD;
>  }
>  
> +/*
> + * This will try to discard number 'count' starting at inode
> + * number 'start'. Note that 'start' is a inode number and hence
> + * it is counted from 1, it means that we need to adjust it
> + * by -1 in this function to compute right offset in the
> + * inode table.
> + */

But that's not quite right... a casual reader would assume that we
pass in the inode number of the starting inode we want to free;
that's true for the first group, but not the rest:

$ e2fsck/e2fsck -f -E discard fsfile 
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
discard starting at 12
discard starting at 1
discard starting at 1
discard starting at 1
discard starting at 1
discard starting at 1

...

we are not discarding inode 1 that many times ;) 

The comment should reflect that the function expects to receive
the inode number within this group, i.e. the Nth inode, starting
with the 1st, i.e. 1-based.  (which is a little weird, but I don't
see a better way).

The calling loop _does_ keep track of the actual inode number
within the system ('i') - but that's not what's passed in here,
and I don't think ctx->fs has the info we need to be able to
do it that way.

> +static void e2fsck_discard_inodes(e2fsck_t ctx, int group,
> +				  int start, int count)
> +{
> +	ext2_filsys fs = ctx->fs;
> +	blk64_t blk, num;
> +	int orig = count;
> +
> +	if ((ctx->options & E2F_OPT_NO) || !(ctx->options & E2F_OPT_DISCARD))
> +		return;
> +
> +	/*
> +	 * Start is inode number which starts counting from 1,

... inode number within the group ...

> +	 * so we need to adjust it.
> +	 */
> +	start -= 1;

I wonder if we need defense against programming errors here; if start
inadvertently comes in at 0, what happens?  I suppose blk goes off the end
and other checks hopefully catch it... probably not 'til it hits the kernel
though?

> +	/*
> +	 * We can discard only blocks containing only unused
> +	 * inodes in the table.
> +	 */
> +	blk = DIV_ROUND_UP(start,
> +		EXT2_INODES_PER_BLOCK(fs->super));
> +	count -= (blk * EXT2_INODES_PER_BLOCK(fs->super) - start);
> +	blk += ext2fs_inode_table_loc(fs, group);
> +	num = count / EXT2_INODES_PER_BLOCK(fs->super);
> +
> +	if (num > 0)
> +		e2fsck_discard_blocks(ctx, fs->io->manager, blk, num);
> +}
> +
>  #define NO_BLK ((blk64_t) -1)
>  
>  static void print_bitmap_problem(e2fsck_t ctx, int problem,
> @@ -435,6 +472,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
>  	int		skip_group = 0;
>  	int		redo_flag = 0;
>  	io_manager	manager = ctx->fs->io->manager;
> +	unsigned long long	first_free = fs->super->s_inodes_per_group + 1;

>  
>  	clear_problem_context(&pctx);
>  	free_array = (int *) e2fsck_allocate_memory(ctx,
> @@ -497,6 +535,7 @@ redo_counts:
>  				 * are 0, count the free inode,
>  				 * skip the current block group.
>  				 */
> +				first_free = 1;
>  				inodes = fs->super->s_inodes_per_group - 1;
>  				group_free = inodes;
>  				free_inodes += inodes;
> @@ -561,50 +600,47 @@ redo_counts:
>  		ctx->options &= ~E2F_OPT_DISCARD;
>  
>  do_counts:
> +		inodes++;
>  		if (bitmap) {
>  			if (ext2fs_test_inode_bitmap2(ctx->inode_dir_map, i))
>  				dirs_count++;
> +			if (inodes > first_free) {
> +				e2fsck_discard_inodes(ctx, group, first_free,
> +						      inodes - first_free);
> +				first_free = fs->super->s_inodes_per_group + 1;
> +			}
>  		} else if (!skip_group || csum_flag) {
>  			group_free++;
>  			free_inodes++;
> +			if (first_free > inodes)
> +				first_free = inodes;
>  		}
>  
> -		inodes++;
>  		if ((inodes == fs->super->s_inodes_per_group) ||
>  		    (i == fs->super->s_inodes_count)) {
> -
> -			free_array[group] = group_free;
> -			dir_array[group] = dirs_count;
> -
> -			/* Discard inode table */
> -			if (ctx->options & E2F_OPT_DISCARD) {
> -				blk64_t used_blks, blk, num;
> -
> -				used_blks = DIV_ROUND_UP(
> -					(EXT2_INODES_PER_GROUP(fs->super) -
> -					group_free),
> -					EXT2_INODES_PER_BLOCK(fs->super));
> -
> -				blk = ext2fs_inode_table_loc(fs, group) +
> -				      used_blks;
> -				num = fs->inode_blocks_per_group -
> -				      used_blks;
> -				e2fsck_discard_blocks(ctx, manager, blk, num);
> -			}
> -
> +			/*
> +			 * If the last inode is free, we can discard it as well.
> +			 */
> +			if (inodes >= first_free)
> +				e2fsck_discard_inodes(ctx, group, first_free,
> +						      inodes - first_free + 1);

This is unrelated to the changelog

>  			/*
>  			 * If discard zeroes data and the group inode table
>  			 * was not zeroed yet, set itable as zeroed
>  			 */
>  			if ((ctx->options & E2F_OPT_DISCARD) &&
> -			    (io_channel_discard_zeroes_data(fs->io)) &&
> +			    !(ctx->options & E2F_OPT_NO) &&

This is unrelated as well...

Rather than continuing to check _OPT_NO here and in e2fsck_discard_blocks(), it
might be better (in a separate patch) ;) to simply turn off OPT_DISCARD right
after options processing if -n was specified, and not worry about it down here.

> +			    io_channel_discard_zeroes_data(fs->io) &&
>  			    !(ext2fs_bg_flags_test(fs, group,
> -						  EXT2_BG_INODE_ZEROED))) {
> +						   EXT2_BG_INODE_ZEROED))) {
>  				ext2fs_bg_flags_set(fs, group,
>  						    EXT2_BG_INODE_ZEROED);
>  				ext2fs_group_desc_csum_set(fs, group);
>  			}
>  
> +			first_free = fs->super->s_inodes_per_group + 1;
> +			free_array[group] = group_free;
> +			dir_array[group] = dirs_count;
>  			group ++;
>  			inodes = 0;
>  			skip_group = 0;


  parent reply	other threads:[~2012-02-28 16:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-23 16:29 [PATCH 1/2 v2] e2fsck: Discard only unused parts of inode table Lukas Czerner
2012-02-27  5:42 ` Ted Ts'o
2012-02-27  7:03   ` Lukas Czerner
2012-02-28 16:31 ` Eric Sandeen [this message]
2012-02-29  7:22   ` Lukas Czerner
2012-03-01 17:35     ` Eric Sandeen
2012-03-06 22:22       ` Ted Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F4D0159.1090409@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=psusi@ubuntu.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.