* [PATCH 01/31] misc: fix clang warnings and a resource leak
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
@ 2014-12-20 21:16 ` Darrick J. Wong
2015-01-19 21:39 ` Theodore Ts'o
2014-12-20 21:16 ` [PATCH 02/31] debugfs: document new commands Darrick J. Wong
` (34 subsequent siblings)
35 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:16 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/symlink.c | 2 +-
misc/e2fuzz.c | 2 +-
resize/resize2fs.c | 6 ++++--
3 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/lib/ext2fs/symlink.c b/lib/ext2fs/symlink.c
index 0732afe..279f48b 100644
--- a/lib/ext2fs/symlink.c
+++ b/lib/ext2fs/symlink.c
@@ -112,7 +112,7 @@ need_block:
ext2fs_iblk_set(fs, &inode, 1);
/* Slow symlinks, target stored in the first block */
memset(block_buf, 0, fs->blocksize);
- strcpy(block_buf, target);
+ strncpy(block_buf, target, fs->blocksize);
if (fs->super->s_feature_incompat &
EXT3_FEATURE_INCOMPAT_EXTENTS) {
/*
diff --git a/misc/e2fuzz.c b/misc/e2fuzz.c
index c08e3df..f786eac 100644
--- a/misc/e2fuzz.c
+++ b/misc/e2fuzz.c
@@ -273,7 +273,7 @@ int process_fs(const char *fsname)
if ((rand() % 2) && c < 128)
c |= 0x80;
if (verbose)
- printf("Corrupting byte %jd in block %jd to 0x%x\n",
+ printf("Corrupting byte %zu in block %zu to 0x%x\n",
off % fs->blocksize, off / fs->blocksize, c);
if (dryrun)
continue;
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 3fa13cf..2febfde 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -2311,8 +2311,10 @@ static errcode_t move_itables(ext2_resize_t rfs)
ext2fs_inode_table_loc(fs, i))
to_move++;
- if (to_move == 0)
- return 0;
+ if (to_move == 0) {
+ retval = 0;
+ goto errout;
+ }
if (rfs->progress) {
retval = rfs->progress(rfs, E2_RSZ_MOVE_ITABLE_PASS,
* [PATCH 02/31] debugfs: document new commands
From: Darrick J. Wong @ 2014-12-20 21:16 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Document the new journal and xattr commands in the debugfs manpage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
debugfs/debugfs.8.in | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
debugfs/do_journal.c | 3 ++-
2 files changed, 48 insertions(+), 2 deletions(-)
diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index 8f44ced..04c280d 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -262,6 +262,32 @@ not stored in filesystem data structures. Hence, the values displayed
may not necessarily by accurate and does not indicate a problem or
corruption in the file system.)
.TP
+.BI ea_get " [-f outfile] filespec attr_name"
+Retrieve the value of the extended attribute
+.I attr_name
+in the file
+.I filespec
+and write it either to stdout or to \fIoutfile\fR.
+.TP
+.BI ea_list " filespec"
+List the extended attributes associated with the file
+.I filespec
+to standard output.
+.TP
+.BI ea_set " [-f infile] filespec attr_name attr_value"
+Set the value of the extended attribute
+.I attr_name
+in the file
+.I filespec
+to the string value
+.I attr_value
+or read it from \fIinfile\fR.
+.TP
+.BI ea_rm " filespec attr_names..."
+Remove the extended attributes
+.I attr_names
+from the file \fIfilespec\fR.
+.TP
.BI expand_dir " filespec"
Expand the directory
.IR filespec .
@@ -373,6 +399,26 @@ to do this, use the
program. This is just a call to the low-level library, which sets up
the superblock and block descriptors.
.TP
+.BI journal_close
+Close the open journal.
+.TP
+.BI journal_open " [-c] [-v ver] [-f ext_jnl]"
+Opens the journal for reading and writing. Journal checksumming can
+be enabled by supplying \fI-c\fR; checksum formats 2 and 3 can be
+selected with the \fI-v\fR option. An external journal can be loaded
+from \fIext_jnl\fR.
+.TP
+.BI journal_run
+Replay all transactions in the open journal.
+.TP
+.BI journal_write " [-b blocks] [-r revoke] [-c] file"
+Write a transaction to the open journal. The list of blocks to write
+should be supplied as a comma-separated list in \fIblocks\fR; the
+blocks themselves should be readable from \fIfile\fR. A list of
+blocks to revoke can be supplied as a comma-separated list in
+\fIrevoke\fR. By default, a commit record is written at the end; the
+\fI-c\fR switch writes an uncommitted transaction.
+.TP
.BI kill_file " filespec"
Deallocate the inode
.I filespec
@@ -667,7 +713,6 @@ into a newly-created file in the filesystem named
.IR out_file .
.TP
.BI zap_block " [-f filespec] [-o offset] [-l length] [-p pattern] block_num"
-.TP
Overwrite the block specified by
.I block_num
with zero (NUL) bytes, or if
diff --git a/debugfs/do_journal.c b/debugfs/do_journal.c
index a17af6e..46d1793 100644
--- a/debugfs/do_journal.c
+++ b/debugfs/do_journal.c
@@ -925,9 +925,10 @@ void do_journal_open(int argc, char *argv[])
}
break;
default:
- printf("%s: [-c] [-v ver]\n", argv[0]);
+ printf("%s: [-c] [-v ver] [-f ext_jnl]\n", argv[0]);
printf("-c: Enable journal checksumming.\n");
printf("-v: Use this version checksum format.\n");
+ printf("-f: Load this external journal.\n");
}
}
* [PATCH 03/31] libext2fs: zero blocks via FALLOC_FL_ZERO_RANGE in ext2fs_zero_blocks
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Plumb a new call into the IO manager to support translating
ext2fs_zero_blocks calls into the equivalent FALLOC_FL_ZERO_RANGE
fallocate flag primitive when possible. This patch provides _only_
support for file-based images.
v2: Remove zero-out for block devices until BLKZEROOUT is cleaned up.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
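Reviewer's sketch (not part of the patch): the dispatch-and-fallback shape
described above, reduced to a toy in-memory "device". All toy_* names and
ERR_UNIMPLEMENTED are hypothetical stand-ins, not libext2fs symbols; the
real code goes through io_channel_zeroout() and EXT2_ET_UNIMPLEMENTED.

```c
#include <stddef.h>
#include <string.h>

#define ERR_UNIMPLEMENTED 1	/* stand-in for EXT2_ET_UNIMPLEMENTED */

struct toy_channel;

/* Hypothetical, pared-down manager: only the optional zeroout hook. */
struct toy_manager {
	int (*zeroout)(struct toy_channel *ch, unsigned long long block,
		       unsigned long long count);
};

struct toy_channel {
	const struct toy_manager *manager;
	unsigned char *data;		/* in-memory "device" */
	size_t block_size;
	unsigned long long nblocks;
	int fast_calls;			/* times the hook did the work */
	int slow_calls;			/* blocks zeroed by the fallback */
};

/* Dispatch to the manager's hook, or report that there is none. */
int toy_channel_zeroout(struct toy_channel *ch, unsigned long long block,
			unsigned long long count)
{
	if (ch->manager->zeroout)
		return ch->manager->zeroout(ch, block, count);
	return ERR_UNIMPLEMENTED;
}

/* A manager hook that zeroes the whole range in one operation. */
int toy_fast_zeroout(struct toy_channel *ch, unsigned long long block,
		     unsigned long long count)
{
	memset(ch->data + block * ch->block_size, 0, count * ch->block_size);
	ch->fast_calls++;
	return 0;
}

/* Caller: try the hook first, fall back to zeroing block by block. */
int toy_zero_blocks(struct toy_channel *ch, unsigned long long block,
		    unsigned long long count)
{
	unsigned long long i;

	if (count == 0)
		return 0;
	if (block > ch->nblocks || count > ch->nblocks - block)
		return -1;	/* range does not fit on the toy device */

	if (toy_channel_zeroout(ch, block, count) == 0)
		return 0;

	/* Any failure of the fast path lands here, as in the patch. */
	for (i = 0; i < count; i++) {
		memset(ch->data + (block + i) * ch->block_size, 0,
		       ch->block_size);
		ch->slow_calls++;
	}
	return 0;
}
```

Treating every nonzero return from the hook as "use the slow path" keeps
the caller simple: a manager that lacks the hook and one whose hook fails
at run time look the same to it.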
lib/ext2fs/ext2_io.h | 7 ++++-
lib/ext2fs/io_manager.c | 11 +++++++
lib/ext2fs/mkjournal.c | 5 +++
lib/ext2fs/test_io.c | 21 ++++++++++++++
lib/ext2fs/unix_io.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 4c5a5c5..1faa720 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -93,7 +93,9 @@ struct struct_io_manager {
errcode_t (*cache_readahead)(io_channel channel,
unsigned long long block,
unsigned long long count);
- long reserved[15];
+ errcode_t (*zeroout)(io_channel channel, unsigned long long block,
+ unsigned long long count);
+ long reserved[14];
};
#define IO_FLAG_RW 0x0001
@@ -125,6 +127,9 @@ extern errcode_t io_channel_write_blk64(io_channel channel,
extern errcode_t io_channel_discard(io_channel channel,
unsigned long long block,
unsigned long long count);
+extern errcode_t io_channel_zeroout(io_channel channel,
+ unsigned long long block,
+ unsigned long long count);
extern errcode_t io_channel_alloc_buf(io_channel channel,
int count, void *ptr);
extern errcode_t io_channel_cache_readahead(io_channel io,
diff --git a/lib/ext2fs/io_manager.c b/lib/ext2fs/io_manager.c
index dc5888d..c395d61 100644
--- a/lib/ext2fs/io_manager.c
+++ b/lib/ext2fs/io_manager.c
@@ -112,6 +112,17 @@ errcode_t io_channel_discard(io_channel channel, unsigned long long block,
return EXT2_ET_UNIMPLEMENTED;
}
+errcode_t io_channel_zeroout(io_channel channel, unsigned long long block,
+ unsigned long long count)
+{
+ EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+
+ if (channel->manager->zeroout)
+ return (channel->manager->zeroout)(channel, block, count);
+
+ return EXT2_ET_UNIMPLEMENTED;
+}
+
errcode_t io_channel_alloc_buf(io_channel io, int count, void *ptr)
{
size_t size;
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index fcc6741..c42cb98 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -170,6 +170,11 @@ errcode_t ext2fs_zero_blocks2(ext2_filsys fs, blk64_t blk, int num,
if (num <= 0)
return 0;
+ /* Try a zero out command, if supported */
+ retval = io_channel_zeroout(fs->io, blk, num);
+ if (retval == 0)
+ return 0;
+
/* Allocate the zeroizing buffer if necessary */
if (num > stride_length && stride_length < MAX_STRIDE_LENGTH) {
void *p;
diff --git a/lib/ext2fs/test_io.c b/lib/ext2fs/test_io.c
index b03a939..f7c50d1 100644
--- a/lib/ext2fs/test_io.c
+++ b/lib/ext2fs/test_io.c
@@ -86,6 +86,7 @@ void (*test_io_cb_write_byte)
#define TEST_FLAG_SET_OPTION 0x20
#define TEST_FLAG_DISCARD 0x40
#define TEST_FLAG_READAHEAD 0x80
+#define TEST_FLAG_ZEROOUT 0x100
static void test_dump_block(io_channel channel,
struct test_private_data *data,
@@ -507,6 +508,25 @@ static errcode_t test_cache_readahead(io_channel channel,
return retval;
}
+static errcode_t test_zeroout(io_channel channel, unsigned long long block,
+ unsigned long long count)
+{
+ struct test_private_data *data;
+ errcode_t retval = 0;
+
+ EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+ data = (struct test_private_data *) channel->private_data;
+ EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_TEST_IO_CHANNEL);
+
+ if (data->real)
+ retval = io_channel_zeroout(data->real, block, count);
+ if (data->flags & TEST_FLAG_ZEROOUT)
+ fprintf(data->outfile,
+ "Test_io: zeroout(%llu, %llu) returned %s\n",
+ block, count, retval ? error_message(retval) : "OK");
+ return retval;
+}
+
static struct struct_io_manager struct_test_manager = {
.magic = EXT2_ET_MAGIC_IO_MANAGER,
.name = "Test I/O Manager",
@@ -523,6 +543,7 @@ static struct struct_io_manager struct_test_manager = {
.write_blk64 = test_write_blk64,
.discard = test_discard,
.cache_readahead = test_cache_readahead,
+ .zeroout = test_zeroout,
};
io_manager test_io_manager = &struct_test_manager;
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index c3a8ea5..3408599 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -56,6 +56,9 @@
#if HAVE_LINUX_FALLOC_H
#include <linux/falloc.h>
#endif
+#if HAVE_STDINT_H
+#include <stdint.h>
+#endif
#if defined(__linux__) && defined(_IO) && !defined(BLKROGET)
#define BLKROGET _IO(0x12, 94) /* Get read-only status (0 = read_write). */
@@ -987,6 +990,72 @@ unimplemented:
return EXT2_ET_UNIMPLEMENTED;
}
+static errcode_t unix_zeroout(io_channel channel, unsigned long long block,
+ unsigned long long count)
+{
+ struct unix_private_data *data;
+ int ret;
+
+ EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+ data = (struct unix_private_data *) channel->private_data;
+ EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+ if (getenv("UNIX_IO_NOZEROOUT"))
+ goto unimplemented;
+
+ if (channel->flags & CHANNEL_FLAGS_BLOCK_DEVICE) {
+ /* Not implemented until the BLKZEROOUT mess is fixed */
+ goto unimplemented;
+ } else {
+ /* Regular file, try to use truncate/punch/zero. */
+#if defined(HAVE_FALLOCATE) && (defined(FALLOC_FL_ZERO_RANGE) || \
+ (defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)))
+ struct stat statbuf;
+
+ if (count == 0)
+ return 0;
+ /*
+ * If we're trying to zero a range past the end of the file,
+ * extend the file size, then punch (or zero_range) everything.
+ */
+ ret = fstat(data->dev, &statbuf);
+ if (ret)
+ goto err;
+ if (statbuf.st_size < (block + count) * channel->block_size) {
+ ret = ftruncate(data->dev,
+ (block + count) * channel->block_size);
+ if (ret)
+ goto err;
+ }
+#if defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)
+ ret = fallocate(data->dev,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ (off_t)(block) * channel->block_size,
+ (off_t)(count) * channel->block_size);
+ if (ret == 0)
+ goto err;
+#endif
+#ifdef FALLOC_FL_ZERO_RANGE
+ ret = fallocate(data->dev,
+ FALLOC_FL_ZERO_RANGE,
+ (off_t)(block) * channel->block_size,
+ (off_t)(count) * channel->block_size);
+#endif
+#else
+ goto unimplemented;
+#endif /* HAVE_FALLOCATE && (ZERO_RANGE || (PUNCH_HOLE && KEEP_SIZE)) */
+ }
+err:
+ if (ret < 0) {
+ if (errno == EOPNOTSUPP)
+ goto unimplemented;
+ return errno;
+ }
+ return 0;
+unimplemented:
+ return EXT2_ET_UNIMPLEMENTED;
+}
+
static struct struct_io_manager struct_unix_manager = {
.magic = EXT2_ET_MAGIC_IO_MANAGER,
.name = "Unix I/O Manager",
@@ -1003,6 +1072,7 @@ static struct struct_io_manager struct_unix_manager = {
.write_blk64 = unix_write_blk64,
.discard = unix_discard,
.cache_readahead = unix_cache_readahead,
+ .zeroout = unix_zeroout,
};
io_manager unix_io_manager = &struct_unix_manager;
* [PATCH 04/31] libext2fs: ext2fs_new_block2() should call alloc_block hook
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
If ext2fs_new_block2() is called without a specific block map, we
should call the alloc_block hook before checking fs->block_map. This
helps us to avoid a bug in e2fsck where we need to allocate a block
but instead of consulting block_found_map, we use the FS bitmaps,
which (prior to pass 5) could be wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
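Reviewer's sketch (not part of the patch): the "swap out the hook while it
runs" guard, shown on a toy allocator. struct toy_fs, toy_new_block() and
reentrant_hook() are hypothetical; only the save/NULL/restore sequence
mirrors what the patch does with fs->get_alloc_block.

```c
#include <stddef.h>

struct toy_fs;
typedef int (*alloc_hook_t)(struct toy_fs *fs, unsigned long goal,
			    unsigned long *ret);

struct toy_fs {
	alloc_hook_t get_alloc_block;	/* optional client hook */
	unsigned long next_free;	/* stand-in for a bitmap search */
	int hook_calls;
};

/*
 * Allocate a block.  With no caller-supplied map, defer to the client
 * hook, but clear the hook pointer for the duration of the call so that
 * a hook which calls back into this function cannot recurse forever.
 */
int toy_new_block(struct toy_fs *fs, unsigned long goal, int have_map,
		  unsigned long *ret)
{
	if (!have_map && fs->get_alloc_block) {
		alloc_hook_t hook = fs->get_alloc_block;
		int err;

		fs->get_alloc_block = NULL;
		err = hook(fs, goal, ret);
		fs->get_alloc_block = hook;
		return err;
	}

	/* Toy default allocator: first free block at or after the goal. */
	*ret = goal > fs->next_free ? goal : fs->next_free;
	fs->next_free = *ret + 1;
	return 0;
}

/* A client hook that itself calls the allocator with no map. */
int reentrant_hook(struct toy_fs *fs, unsigned long goal,
		   unsigned long *ret)
{
	fs->hook_calls++;
	return toy_new_block(fs, goal, 0, ret);
}
```

The inner call sees a NULL hook and takes the default path, so the hook
runs exactly once per outer allocation and is back in place afterwards.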
e2fsck/pass1.c | 2 +-
lib/ext2fs/alloc.c | 15 +++++++++++++++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 14877d7..5f6e1dc 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -3729,7 +3729,7 @@ static errcode_t e2fsck_get_alloc_block(ext2_filsys fs, blk64_t goal,
return retval;
}
- retval = ext2fs_new_block2(fs, goal, 0, &new_block);
+ retval = ext2fs_new_block2(fs, goal, fs->block_map, &new_block);
if (retval)
return retval;
}
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 62b36fe..9901ca5 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -137,9 +137,23 @@ errcode_t ext2fs_new_block2(ext2_filsys fs, blk64_t goal,
{
errcode_t retval;
blk64_t b = 0;
+ errcode_t (*gab)(ext2_filsys fs, blk64_t goal, blk64_t *ret);
EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+ if (!map && fs->get_alloc_block) {
+ /*
+ * In case there are clients out there whose get_alloc_block
+ * handlers call ext2fs_new_block2 with a NULL block map,
+ * temporarily swap out the function pointer so that we don't
+ * end up in an infinite loop.
+ */
+ gab = fs->get_alloc_block;
+ fs->get_alloc_block = NULL;
+ retval = gab(fs, goal, &b);
+ fs->get_alloc_block = gab;
+ goto allocated;
+ }
if (!map)
map = fs->block_map;
if (!map)
@@ -153,6 +167,7 @@ errcode_t ext2fs_new_block2(ext2_filsys fs, blk64_t goal,
if ((retval == ENOENT) && (goal != fs->super->s_first_data_block))
retval = ext2fs_find_first_zero_block_bitmap2(map,
fs->super->s_first_data_block, goal - 1, &b);
+allocated:
if (retval == ENOENT)
return EXT2_ET_BLOCK_ALLOC_FAIL;
if (retval)
* [PATCH 05/31] tune2fs: disable csum verification before resizing inode
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
When we're turning on metadata checksumming /and/ resizing the inode
at the same time, disable checksum verification during the
resize_inode() call because the subroutines it calls will try to
verify the checksums (which have not yet been set), causing the
operation to fail unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
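Reviewer's sketch (not part of the patch): the set-flag, call, clear-flag
bracket described above, on a toy structure. TOY_FLAG_IGNORE_CSUM_ERRORS,
struct toy_fs and both functions are hypothetical stand-ins for
EXT2_FLAG_IGNORE_CSUM_ERRORS, ext2_filsys and resize_inode().

```c
#define TOY_FLAG_IGNORE_CSUM_ERRORS 0x1u

struct toy_fs {
	unsigned int flags;
};

typedef int (*toy_op_t)(struct toy_fs *fs);

/*
 * Run op with checksum verification suppressed, but only when the caller
 * is about to rewrite every checksum anyway.  Like the patch, this clears
 * the flag unconditionally afterwards rather than restoring a saved value.
 */
int toy_run_ignoring_csums(struct toy_fs *fs, int rewrite_checksums,
			   toy_op_t op)
{
	int ret;

	if (rewrite_checksums)
		fs->flags |= TOY_FLAG_IGNORE_CSUM_ERRORS;
	ret = op(fs);
	if (rewrite_checksums)
		fs->flags &= ~TOY_FLAG_IGNORE_CSUM_ERRORS;
	return ret;
}

/* Toy operation that fails if it has to verify not-yet-written checksums. */
int toy_verifying_op(struct toy_fs *fs)
{
	return (fs->flags & TOY_FLAG_IGNORE_CSUM_ERRORS) ? 0 : -1;
}
```

Capturing the return value before clearing the flag is what lets the real
code test retval afterwards instead of calling resize_inode() inside the
if condition.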
misc/tune2fs.c | 7 +++-
tests/t_iexpand_mcsum/expect | 54 ++++++++++++++++++++++++++++
tests/t_iexpand_mcsum/name | 1 +
tests/t_iexpand_mcsum/script | 80 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 141 insertions(+), 1 deletion(-)
create mode 100644 tests/t_iexpand_mcsum/expect
create mode 100644 tests/t_iexpand_mcsum/name
create mode 100644 tests/t_iexpand_mcsum/script
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index b510c49..f01b05b 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -2961,8 +2961,13 @@ retry_open:
* We want to update group descriptor also
* with the new free inode count
*/
+ if (rewrite_checksums)
+ fs->flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
- if (resize_inode(fs, new_inode_size) == 0) {
+ retval = resize_inode(fs, new_inode_size);
+ if (rewrite_checksums)
+ fs->flags &= ~EXT2_FLAG_IGNORE_CSUM_ERRORS;
+ if (retval == 0) {
printf(_("Setting inode size %lu\n"),
new_inode_size);
rewrite_checksums = 1;
diff --git a/tests/t_iexpand_mcsum/expect b/tests/t_iexpand_mcsum/expect
new file mode 100644
index 0000000..2a6d705
--- /dev/null
+++ b/tests/t_iexpand_mcsum/expect
@@ -0,0 +1,54 @@
+tune2fs test
+Creating filesystem with 786432 1k blocks and 98304 inodes
+Superblock backups stored on blocks:
+ 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409, 663553
+
+Allocating group tables: \b\b\b\b\bdone
+Writing inode tables: \b\b\b\b\bdone
+Creating journal (16384 blocks): done
+Creating 6334 huge file(s) with 117 blocks each: done
+Writing superblocks and filesystem accounting information: \b\b\b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
+tune2fs -I 256 -O metadata_csum test.img
+Setting inode size 256
+
+Please run e2fsck -D on the filesystem.
+
+Exit status is 0
+Backing up journal inode block information.
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+
+
+Change in FS metadata:
+@@ -5 +5 @@
+-Filesystem features: has_journal ext_attr dir_index filetype extent 64bit sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
++Filesystem features: has_journal ext_attr dir_index filetype extent 64bit sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
+@@ -21 +21 @@
+-Inode blocks per group: 128
++Inode blocks per group: 256
+@@ -27 +27 @@
+-Inode size: 128
++Inode size: 256
+@@ -30,0 +31 @@
++Checksum type: crc32c
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
diff --git a/tests/t_iexpand_mcsum/name b/tests/t_iexpand_mcsum/name
new file mode 100644
index 0000000..e767715
--- /dev/null
+++ b/tests/t_iexpand_mcsum/name
@@ -0,0 +1 @@
+expand inodes and turn on metadata_csum
diff --git a/tests/t_iexpand_mcsum/script b/tests/t_iexpand_mcsum/script
new file mode 100644
index 0000000..aa6592e
--- /dev/null
+++ b/tests/t_iexpand_mcsum/script
@@ -0,0 +1,80 @@
+if test -x $RESIZE2FS_EXE -a -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fn
+OUT=$test_name.log
+EXP=$test_dir/expect
+CONF=$TMPFILE.conf
+
+#gzip -d < $EXP.gz > $EXP
+
+cat > $CONF << ENDL
+[fs_types]
+ ext4h = {
+ features = has_journal,extent,huge_file,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,^resize_inode,^meta_bg,^flex_bg,^metadata_csum,64bit
+ blocksize = 1024
+ inode_size = 256
+ make_hugefiles = true
+ hugefiles_dir = /
+ hugefiles_slack = 16000K
+ hugefiles_name = aaaaa
+ hugefiles_digits = 4
+ hugefiles_size = 117K
+ zero_hugefiles = false
+ }
+ENDL
+
+echo "tune2fs test" > $OUT
+
+MKE2FS_CONFIG=$CONF $MKE2FS -F -T ext4h -I 128 $TMPFILE 786432 >> $OUT 2>&1
+rm -rf $CONF
+
+# dump and check
+($DUMPE2FS -h $TMPFILE | grep -v '^Free blocks:'; $DUMPE2FS -g $TMPFILE) 2>&1 | sed -f $cmd_dir/filter.sed -e '/^Checksum:.*/d' >> $OUT.before 2> /dev/null
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+# convert it
+echo "tune2fs -I 256 -O metadata_csum test.img" >> $OUT
+dd if=/dev/zero of=$TMPFILE conv=notrunc bs=1 count=1 seek=3221225471 2> /dev/null
+$TUNE2FS -I 256 -O metadata_csum $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+$FSCK -N test_filesys -y -f -D $TMPFILE >> $OUT 2>&1
+
+# dump and check
+($DUMPE2FS -h $TMPFILE | grep -v '^Free blocks:'; $DUMPE2FS -g $TMPFILE) 2>&1 | sed -f $cmd_dir/filter.sed -e '/^Checksum:.*/d' >> $OUT.after 2> /dev/null
+echo "Change in FS metadata:" >> $OUT
+diff -u0 $OUT.before $OUT.after | sed -e '/^---.*/d' -e '/^+++.*/d' >> $OUT
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+rm $TMPFILE
+
+#
+# Do the verification
+#
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" -e 's/test_filesys:.*//g' < $OUT > $OUT.new
+mv $OUT.new $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+rm $OUT.before $OUT.after
+
+unset IMAGE FSCK_OPT OUT EXP CONF
+
+else #if test -x $RESIZE2FS_EXE -a -x $DEBUGFS_EXE; then
+ echo "$test_name: $test_description: skipped"
+fi
+
* [PATCH 06/31] e2fsck: clear i_block[] when there are too many bad mappings on a special inode
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
If we decide to clear a special inode because of bad mappings, we need
to zero the i_block array. The clearing routine depends on setting
i_links_count to zero to keep us from re-checking the block maps,
but that field isn't checked for special inodes. Therefore, if we
haven't erased the mappings, check_blocks will restart fsck and fsck
will try to check the blocks again, leading to an infinite loop.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
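Reviewer's sketch (not part of the patch): a toy model of why the block
map has to be wiped for special inodes. struct toy_inode and both
functions are hypothetical and only model the rule stated above: regular
inodes are skipped once their link count is zero, special inodes are not.

```c
#include <stddef.h>
#include <string.h>

#define TOY_N_BLOCKS 15

struct toy_inode {
	unsigned short links_count;
	unsigned int i_block[TOY_N_BLOCKS];
};

/* Clear an inode; special inodes (ino < first_ino) also lose their map. */
void toy_clear_inode(struct toy_inode *inode, unsigned int ino,
		     unsigned int first_ino)
{
	if (ino < first_ino)
		memset(inode->i_block, 0, sizeof(inode->i_block));
	inode->links_count = 0;
}

/* Would a later pass walk this inode's block map again? */
int toy_needs_block_check(const struct toy_inode *inode, unsigned int ino,
			  unsigned int first_ino)
{
	size_t i;

	/* Regular inodes are skipped once their link count is zero. */
	if (ino >= first_ino && inode->links_count == 0)
		return 0;
	/* Special inodes are checked whenever any mapping remains. */
	for (i = 0; i < TOY_N_BLOCKS; i++)
		if (inode->i_block[i])
			return 1;
	return 0;
}
```

Without the memset, a special inode with a bad mapping would still report
"needs check" after being cleared, which is the restart loop the commit
message describes.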
e2fsck/pass1.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 5f6e1dc..a861177 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2862,6 +2862,14 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
}
if (pb.clear) {
+ /*
+ * If a special inode has such rotten block mappings that we
+ * want to clear the whole inode, be sure to actually zap
+ * the block maps because i_links_count isn't checked for
+ * special inodes, and we'll end up right back here.
+ */
+ if (ino < EXT2_FIRST_INODE(fs->super))
+ memset(inode->i_block, 0, sizeof(inode->i_block));
e2fsck_clear_inode(ctx, ino, inode, E2F_FLAG_RESTART,
"check_blocks");
return;
* [PATCH 07/31] libext2fs/e2fsck: provide routines to read-ahead metadata
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
This patch adds to e2fsck the ability to pre-fetch metadata into the
page cache in the hopes of speeding up fsck runs. There are two new
functions -- the first allows a caller to readahead a list of blocks,
and the second is a helper function that uses that first mechanism to
load group data (bitmaps, inode tables).
These new e2fsck routines require the addition of a dblist API to
allow us to iterate a subset of a dblist. This will enable
incremental directory block readahead in e2fsck pass 2.
There's also a function to estimate the readahead size for a given FS.
v2: Add an API to create a dblist with a given number of list elements
pre-allocated. This enables us to save ~2ms per call to
e2fsck_readahead() (assuming a 2MB RA buffer) by not having to
repeatedly call ext2_resize_mem as we add blocks to the list.
v3: Instead of creating dblists of arbitrary size, change the dblist
iterator to allow iterating a sub-range. This eliminates a lot of
unnecessary list copying during e2fsck pass 2.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
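Reviewer's sketch (not part of the patch): coalescing a sub-range of a
block list into contiguous runs, so that each run can be handed to a
single readahead call. struct toy_run and toy_coalesce_runs() are
hypothetical; the patch does the equivalent inside readahead_dir_block()
while iterating a dblist sub-range. The sketch assumes each list entry
covers exactly one block.

```c
#include <stddef.h>

struct toy_run {
	unsigned long long start;	/* first block of the run */
	unsigned long long len;		/* number of contiguous blocks */
};

/*
 * Walk blocks[start .. start+count) and merge physically adjacent entries
 * into runs.  At most max_runs runs are stored; the number stored is
 * returned.
 */
size_t toy_coalesce_runs(const unsigned long long *blocks, size_t nblocks,
			 size_t start, size_t count,
			 struct toy_run *runs, size_t max_runs)
{
	struct toy_run cur = { 0, 0 };
	size_t i, end, nruns = 0;

	if (start >= nblocks)
		return 0;
	end = (count > nblocks - start) ? nblocks : start + count;

	for (i = start; i < end; i++) {
		/* Extends the current run? */
		if (cur.len && blocks[i] == cur.start + cur.len) {
			cur.len++;
			continue;
		}
		/* Otherwise flush the finished run and start a new one. */
		if (cur.len) {
			if (nruns == max_runs)
				return nruns;
			runs[nruns++] = cur;
		}
		cur.start = blocks[i];
		cur.len = 1;
	}
	/* Flush the final run, if there is room for it. */
	if (cur.len && nruns < max_runs)
		runs[nruns++] = cur;
	return nruns;
}
```

Passing start and count instead of copying the entries of interest into a
fresh list is the v3 change: the caller can prefetch a window of the
directory block list without building a second list.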
configure | 2
configure.in | 1
e2fsck/Makefile.in | 9 +-
e2fsck/e2fsck.h | 18 ++++
e2fsck/readahead.c | 252 +++++++++++++++++++++++++++++++++++++++++++++++++++
e2fsck/util.c | 51 ++++++++++
lib/config.h.in | 3 +
lib/ext2fs/dblist.c | 21 ++++
lib/ext2fs/ext2fs.h | 10 ++
9 files changed, 359 insertions(+), 8 deletions(-)
create mode 100644 e2fsck/readahead.c
diff --git a/configure b/configure
index f59d232..fdc93c0 100755
--- a/configure
+++ b/configure
@@ -12414,7 +12414,7 @@ fi
done
fi
-for ac_header in dirent.h errno.h execinfo.h getopt.h malloc.h mntent.h paths.h semaphore.h setjmp.h signal.h stdarg.h stdint.h stdlib.h termios.h termio.h unistd.h utime.h attr/xattr.h linux/falloc.h linux/fd.h linux/major.h linux/loop.h net/if_dl.h netinet/in.h sys/disklabel.h sys/disk.h sys/file.h sys/ioctl.h sys/mkdev.h sys/mman.h sys/mount.h sys/prctl.h sys/resource.h sys/select.h sys/socket.h sys/sockio.h sys/stat.h sys/syscall.h sys/sysmacros.h sys/time.h sys/types.h sys/un.h sys/wait.h
+for ac_header in dirent.h errno.h execinfo.h getopt.h malloc.h mntent.h paths.h semaphore.h setjmp.h signal.h stdarg.h stdint.h stdlib.h termios.h termio.h unistd.h utime.h attr/xattr.h linux/falloc.h linux/fd.h linux/major.h linux/loop.h net/if_dl.h netinet/in.h sys/disklabel.h sys/disk.h sys/file.h sys/ioctl.h sys/mkdev.h sys/mman.h sys/mount.h sys/prctl.h sys/resource.h sys/select.h sys/socket.h sys/sockio.h sys/stat.h sys/syscall.h sys/sysctl.h sys/sysmacros.h sys/time.h sys/types.h sys/un.h sys/wait.h
do :
as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index 9069234..73cfeb4 100644
--- a/configure.in
+++ b/configure.in
@@ -949,6 +949,7 @@ AC_CHECK_HEADERS(m4_flatten([
sys/sockio.h
sys/stat.h
sys/syscall.h
+ sys/sysctl.h
sys/sysmacros.h
sys/time.h
sys/types.h
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index d0e64eb..e40e51b 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -62,7 +62,7 @@ OBJS= dict.o unix.o e2fsck.o super.o pass1.o pass1b.o pass2.o \
pass3.o pass4.o pass5.o journal.o badblocks.o util.o dirinfo.o \
dx_dirinfo.o ehandler.o problem.o message.o quota.o recovery.o \
region.o revoke.o ea_refcount.o rehash.o profile.o prof_err.o \
- logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o
+ logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o readahead.o
PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
profiled/super.o profiled/pass1.o profiled/pass1b.o \
@@ -73,7 +73,8 @@ PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
profiled/recovery.o profiled/region.o profiled/revoke.o \
profiled/ea_refcount.o profiled/rehash.o profiled/profile.o \
profiled/prof_err.o profiled/logfile.o \
- profiled/sigcatcher.o profiled/plausible.o
+ profiled/sigcatcher.o profiled/plausible.o \
+ profiled/readahead.o
SRCS= $(srcdir)/e2fsck.c \
$(srcdir)/dict.c \
@@ -97,6 +98,7 @@ SRCS= $(srcdir)/e2fsck.c \
$(srcdir)/message.c \
$(srcdir)/ea_refcount.c \
$(srcdir)/rehash.c \
+ $(srcdir)/readahead.c \
$(srcdir)/region.c \
$(srcdir)/profile.c \
$(srcdir)/sigcatcher.c \
@@ -541,3 +543,6 @@ plausible.o: $(srcdir)/../misc/plausible.c $(top_builddir)/lib/config.h \
$(top_builddir)/lib/ext2fs/ext2_err.h \
$(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
$(srcdir)/../misc/nls-enable.h $(srcdir)/../misc/plausible.h
+readahead.o: $(srcdir)/readahead.c $(top_builddir)/lib/config.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/e2fsck.h prof_err.h
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 615ad75..0252824 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -490,6 +490,23 @@ extern ext2_ino_t e2fsck_get_lost_and_found(e2fsck_t ctx, int fix);
extern errcode_t e2fsck_adjust_inode_count(e2fsck_t ctx, ext2_ino_t ino,
int adj);
+/* readahead.c */
+#define E2FSCK_READA_SUPER (0x01)
+#define E2FSCK_READA_GDT (0x02)
+#define E2FSCK_READA_BBITMAP (0x04)
+#define E2FSCK_READA_IBITMAP (0x08)
+#define E2FSCK_READA_ITABLE (0x10)
+#define E2FSCK_READA_ALL_FLAGS (0x1F)
+errcode_t e2fsck_readahead(ext2_filsys fs, int flags, dgrp_t start,
+ dgrp_t ngroups);
+#define E2FSCK_RA_DBLIST_IGNORE_BLOCKCNT (0x01)
+#define E2FSCK_RA_DBLIST_ALL_FLAGS (0x01)
+errcode_t e2fsck_readahead_dblist(ext2_filsys fs, int flags,
+ ext2_dblist dblist,
+ unsigned long long start,
+ unsigned long long count);
+int e2fsck_can_readahead(ext2_filsys fs);
+unsigned long long e2fsck_guess_readahead(ext2_filsys fs);
/* region.c */
extern region_t region_create(region_addr_t min, region_addr_t max);
@@ -577,6 +594,7 @@ extern errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs,
int default_type,
const char *profile_name,
ext2fs_block_bitmap *ret);
+unsigned long long get_memory_size(void);
/* unix.c */
extern void e2fsck_clear_progbar(e2fsck_t ctx);
diff --git a/e2fsck/readahead.c b/e2fsck/readahead.c
new file mode 100644
index 0000000..a35f9f8
--- /dev/null
+++ b/e2fsck/readahead.c
@@ -0,0 +1,252 @@
+/*
+ * readahead.c -- Prefetch filesystem metadata to speed up fsck.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+
+#include "e2fsck.h"
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+struct read_dblist {
+ errcode_t err;
+ blk64_t run_start;
+ blk64_t run_len;
+ int flags;
+};
+
+static int readahead_dir_block(ext2_filsys fs, struct ext2_db_entry2 *db,
+ void *priv_data)
+{
+ struct read_dblist *pr = priv_data;
+ e2_blkcnt_t count = (pr->flags & E2FSCK_RA_DBLIST_IGNORE_BLOCKCNT ?
+ 1 : db->blockcnt);
+
+ if (!pr->run_len || db->blk != pr->run_start + pr->run_len) {
+ if (pr->run_len) {
+ pr->err = io_channel_cache_readahead(fs->io,
+ pr->run_start,
+ pr->run_len);
+ dbg_printf("readahead start=%llu len=%llu err=%d\n",
+ pr->run_start, pr->run_len,
+ (int)pr->err);
+ }
+ pr->run_start = db->blk;
+ pr->run_len = 0;
+ }
+ pr->run_len += count;
+
+ return pr->err ? DBLIST_ABORT : 0;
+}
+
+errcode_t e2fsck_readahead_dblist(ext2_filsys fs, int flags,
+ ext2_dblist dblist,
+ unsigned long long start,
+ unsigned long long count)
+{
+ errcode_t err;
+ struct read_dblist pr;
+
+ dbg_printf("%s: flags=0x%x\n", __func__, flags);
+ if (flags & ~E2FSCK_RA_DBLIST_ALL_FLAGS)
+ return EXT2_ET_INVALID_ARGUMENT;
+
+ memset(&pr, 0, sizeof(pr));
+ pr.flags = flags;
+ err = ext2fs_dblist_iterate3(dblist, readahead_dir_block, start,
+ count, &pr);
+ if (pr.err)
+ return pr.err;
+ if (err)
+ return err;
+
+ if (pr.run_len)
+ err = io_channel_cache_readahead(fs->io, pr.run_start,
+ pr.run_len);
+
+ return err;
+}
+
+static errcode_t e2fsck_readahead_bitmap(ext2_filsys fs,
+ ext2fs_block_bitmap ra_map)
+{
+ blk64_t start, end, out;
+ errcode_t err;
+
+ start = 1;
+ end = ext2fs_blocks_count(fs->super) - 1;
+
+ err = ext2fs_find_first_set_block_bitmap2(ra_map, start, end, &out);
+ while (err == 0) {
+ start = out;
+ err = ext2fs_find_first_zero_block_bitmap2(ra_map, start, end,
+ &out);
+ if (err == ENOENT) {
+ out = end;
+ err = 0;
+ } else if (err)
+ break;
+
+ err = io_channel_cache_readahead(fs->io, start, out - start);
+ if (err)
+ break;
+ start = out;
+ err = ext2fs_find_first_set_block_bitmap2(ra_map, start, end,
+ &out);
+ }
+
+ if (err == ENOENT)
+ err = 0;
+
+ return err;
+}
+
+/* Try not to spew bitmap range errors for readahead */
+static errcode_t mark_bmap_range(ext2_filsys fs, ext2fs_block_bitmap map,
+ blk64_t blk, unsigned int num)
+{
+ if (blk >= ext2fs_get_generic_bmap_start(map) &&
+ blk + num <= ext2fs_get_generic_bmap_end(map))
+ ext2fs_mark_block_bitmap_range2(map, blk, num);
+ else
+ return EXT2_ET_INVALID_ARGUMENT;
+ return 0;
+}
+
+static errcode_t mark_bmap(ext2_filsys fs, ext2fs_block_bitmap map, blk64_t blk)
+{
+ if (blk >= ext2fs_get_generic_bmap_start(map) &&
+ blk <= ext2fs_get_generic_bmap_end(map))
+ ext2fs_mark_block_bitmap2(map, blk);
+ else
+ return EXT2_ET_INVALID_ARGUMENT;
+ return 0;
+}
+
+errcode_t e2fsck_readahead(ext2_filsys fs, int flags, dgrp_t start,
+ dgrp_t ngroups)
+{
+ blk64_t super, old_gdt, new_gdt;
+ blk_t blocks;
+ dgrp_t i;
+ ext2fs_block_bitmap ra_map = NULL;
+ dgrp_t end = start + ngroups;
+ errcode_t err = 0;
+
+ dbg_printf("%s: flags=0x%x start=%d groups=%d\n", __func__, flags,
+ start, ngroups);
+ if (flags & ~E2FSCK_READA_ALL_FLAGS)
+ return EXT2_ET_INVALID_ARGUMENT;
+
+ if (end > fs->group_desc_count)
+ end = fs->group_desc_count;
+
+ if (flags == 0)
+ return 0;
+
+ err = ext2fs_allocate_block_bitmap(fs, "readahead bitmap",
+ &ra_map);
+ if (err)
+ return err;
+
+ for (i = start; i < end; i++) {
+ err = ext2fs_super_and_bgd_loc2(fs, i, &super, &old_gdt,
+ &new_gdt, &blocks);
+ if (err)
+ break;
+
+ if (flags & E2FSCK_READA_SUPER) {
+ err = mark_bmap(fs, ra_map, super);
+ if (err)
+ break;
+ }
+
+ if (flags & E2FSCK_READA_GDT) {
+ err = mark_bmap_range(fs, ra_map,
+ old_gdt ? old_gdt : new_gdt,
+ blocks);
+ if (err)
+ break;
+ }
+
+ if ((flags & E2FSCK_READA_BBITMAP) &&
+ !ext2fs_bg_flags_test(fs, i, EXT2_BG_BLOCK_UNINIT) &&
+ ext2fs_bg_free_blocks_count(fs, i) <
+ fs->super->s_blocks_per_group) {
+ super = ext2fs_block_bitmap_loc(fs, i);
+ err = mark_bmap(fs, ra_map, super);
+ if (err)
+ break;
+ }
+
+ if ((flags & E2FSCK_READA_IBITMAP) &&
+ !ext2fs_bg_flags_test(fs, i, EXT2_BG_INODE_UNINIT) &&
+ ext2fs_bg_free_inodes_count(fs, i) <
+ fs->super->s_inodes_per_group) {
+ super = ext2fs_inode_bitmap_loc(fs, i);
+ err = mark_bmap(fs, ra_map, super);
+ if (err)
+ break;
+ }
+
+ if ((flags & E2FSCK_READA_ITABLE) &&
+ ext2fs_bg_free_inodes_count(fs, i) <
+ fs->super->s_inodes_per_group) {
+ super = ext2fs_inode_table_loc(fs, i);
+ blocks = fs->inode_blocks_per_group -
+ (ext2fs_bg_itable_unused(fs, i) *
+ EXT2_INODE_SIZE(fs->super) / fs->blocksize);
+ err = mark_bmap_range(fs, ra_map, super, blocks);
+ if (err)
+ break;
+ }
+ }
+
+ if (!err)
+ err = e2fsck_readahead_bitmap(fs, ra_map);
+
+ ext2fs_free_block_bitmap(ra_map);
+ return err;
+}
+
+int e2fsck_can_readahead(ext2_filsys fs)
+{
+ errcode_t err;
+
+ err = io_channel_cache_readahead(fs->io, 0, 1);
+ dbg_printf("%s: supp=%d\n", __func__, err != EXT2_ET_OP_NOT_SUPPORTED);
+ return err != EXT2_ET_OP_NOT_SUPPORTED;
+}
+
+unsigned long long e2fsck_guess_readahead(ext2_filsys fs)
+{
+ unsigned long long guess;
+
+ /*
+ * The optimal readahead sizes were experimentally determined by
+ * djwong in August 2014. Setting the RA size to one block group's
+ * worth of inode table blocks seems to yield the largest reductions
+ * in e2fsck runtime.
+ */
+ guess = fs->blocksize * fs->inode_blocks_per_group;
+
+ /* Disable RA if it'd use more than 1/100th of RAM. */
+ if (get_memory_size() > (guess * 100))
+ return guess / 1024;
+
+ return 0;
+}
diff --git a/e2fsck/util.c b/e2fsck/util.c
index 2de45f8..723dafb 100644
--- a/e2fsck/util.c
+++ b/e2fsck/util.c
@@ -37,6 +37,10 @@
#include <errno.h>
#endif
+#ifdef HAVE_SYS_SYSCTL_H
+#include <sys/sysctl.h>
+#endif
+
#include "e2fsck.h"
extern e2fsck_t e2fsck_global_ctx; /* Try your very best not to use this! */
@@ -795,3 +799,50 @@ errcode_t e2fsck_allocate_subcluster_bitmap(ext2_filsys fs, const char *descr,
fs->default_bitmap_type = save_type;
return retval;
}
+
+/* Return memory size in bytes */
+unsigned long long get_memory_size(void)
+{
+#if defined(_SC_PHYS_PAGES)
+# if defined(_SC_PAGESIZE)
+ return (unsigned long long)sysconf(_SC_PHYS_PAGES) *
+ (unsigned long long)sysconf(_SC_PAGESIZE);
+# elif defined(_SC_PAGE_SIZE)
+ return (unsigned long long)sysconf(_SC_PHYS_PAGES) *
+ (unsigned long long)sysconf(_SC_PAGE_SIZE);
+# endif
+#elif defined(CTL_HW)
+# if (defined(HW_MEMSIZE) || defined(HW_PHYSMEM64))
+# define CTL_HW_INT64
+# elif (defined(HW_PHYSMEM) || defined(HW_REALMEM))
+# define CTL_HW_UINT
+# endif
+ int mib[2];
+
+ mib[0] = CTL_HW;
+# if defined(HW_MEMSIZE)
+ mib[1] = HW_MEMSIZE;
+# elif defined(HW_PHYSMEM64)
+ mib[1] = HW_PHYSMEM64;
+# elif defined(HW_REALMEM)
+ mib[1] = HW_REALMEM;
+# elif defined(HW_PHYSMEM)
+ mib[1] = HW_PHYSMEM;
+# endif
+# if defined(CTL_HW_INT64)
+ unsigned long long size = 0;
+# elif defined(CTL_HW_UINT)
+ unsigned int size = 0;
+# endif
+# if defined(CTL_HW_INT64) || defined(CTL_HW_UINT)
+ size_t len = sizeof(size);
+
+ if (sysctl(mib, 2, &size, &len, NULL, 0) == 0)
+ return (unsigned long long)size;
+# endif
+ return 0;
+#else
+# warning "Don't know how to detect memory on your platform?"
+ return 0;
+#endif
+}
diff --git a/lib/config.h.in b/lib/config.h.in
index 0db010f..cd7ec90 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -509,6 +509,9 @@
/* Define to 1 if you have the <sys/syscall.h> header file. */
#undef HAVE_SYS_SYSCALL_H
+/* Define to 1 if you have the <sys/sysctl.h> header file. */
+#undef HAVE_SYS_SYSCTL_H
+
/* Define to 1 if you have the <sys/sysmacros.h> header file. */
#undef HAVE_SYS_SYSMACROS_H
diff --git a/lib/ext2fs/dblist.c b/lib/ext2fs/dblist.c
index 942c4f0..bbdb221 100644
--- a/lib/ext2fs/dblist.c
+++ b/lib/ext2fs/dblist.c
@@ -194,20 +194,25 @@ void ext2fs_dblist_sort2(ext2_dblist dblist,
/*
* This function iterates over the directory block list
*/
-errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
+errcode_t ext2fs_dblist_iterate3(ext2_dblist dblist,
int (*func)(ext2_filsys fs,
struct ext2_db_entry2 *db_info,
void *priv_data),
+ unsigned long long start,
+ unsigned long long count,
void *priv_data)
{
- unsigned long long i;
+ unsigned long long i, end;
int ret;
EXT2_CHECK_MAGIC(dblist, EXT2_ET_MAGIC_DBLIST);
+ end = start + count;
if (!dblist->sorted)
ext2fs_dblist_sort2(dblist, 0);
- for (i=0; i < dblist->count; i++) {
+ if (end > dblist->count)
+ end = dblist->count;
+ for (i = start; i < end; i++) {
ret = (*func)(dblist->fs, &dblist->list[i], priv_data);
if (ret & DBLIST_ABORT)
return 0;
@@ -215,6 +220,16 @@ errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
return 0;
}
+errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
+ int (*func)(ext2_filsys fs,
+ struct ext2_db_entry2 *db_info,
+ void *priv_data),
+ void *priv_data)
+{
+ return ext2fs_dblist_iterate3(dblist, func, 0, dblist->count,
+ priv_data);
+}
+
static EXT2_QSORT_TYPE dir_block_cmp2(const void *a, const void *b)
{
const struct ext2_db_entry2 *db_a =
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 42c4ce1..8c16ae5 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1051,11 +1051,17 @@ extern void ext2fs_dblist_sort2(ext2_dblist dblist,
extern errcode_t ext2fs_dblist_iterate(ext2_dblist dblist,
int (*func)(ext2_filsys fs, struct ext2_db_entry *db_info,
void *priv_data),
- void *priv_data);
+ void *priv_data);
extern errcode_t ext2fs_dblist_iterate2(ext2_dblist dblist,
int (*func)(ext2_filsys fs, struct ext2_db_entry2 *db_info,
void *priv_data),
- void *priv_data);
+ void *priv_data);
+extern errcode_t ext2fs_dblist_iterate3(ext2_dblist dblist,
+ int (*func)(ext2_filsys fs, struct ext2_db_entry2 *db_info,
+ void *priv_data),
+ unsigned long long start,
+ unsigned long long count,
+ void *priv_data);
extern errcode_t ext2fs_set_dir_block(ext2_dblist dblist, ext2_ino_t ino,
blk_t blk, int blockcnt);
extern errcode_t ext2fs_set_dir_block2(ext2_dblist dblist, ext2_ino_t ino,
* [PATCH 08/31] e2fsck: read-ahead metadata during passes 1, 2, and 4
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (6 preceding siblings ...)
2014-12-20 21:17 ` [PATCH 07/31] libext2fs/e2fsck: provide routines to read-ahead metadata Darrick J. Wong
@ 2014-12-20 21:17 ` Darrick J. Wong
2014-12-20 21:17 ` [PATCH 09/31] e2fsck: track directories to be rehashed with a bitmap Darrick J. Wong
` (27 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the page cache before they are
needed. We iterate through the block groups until we have accumulated
enough inode table blocks to fill the readahead buffer, then issue the
readahead. We then wait until the inode scan reaches the last inode
buffer block of the last group in that batch before fetching the next
batch.
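The group-accumulation step can be sketched roughly as below. This is a
standalone illustration rather than the patch's code: struct group_info and
groups_to_prefetch() are hypothetical stand-ins for the libext2fs group
descriptor accessors, and the budget is passed in bytes for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-group fields read from the descriptors. */
struct group_info {
	int	 inode_uninit;	/* nonzero if the inode table was never initialized */
	uint32_t itable_unused;	/* unused inodes at the end of the inode table */
};

/*
 * Starting at group `start`, add up the in-use inode table blocks of each
 * group until the total exceeds the readahead budget. Returns the index of
 * the last group in the batch, or ngroups if every remaining group fits.
 */
size_t groups_to_prefetch(const struct group_info *g, size_t ngroups,
			  size_t start, uint32_t inodes_per_group,
			  uint32_t inodes_per_block, uint32_t blocksize,
			  uint64_t budget_bytes)
{
	uint64_t blocks = 0;
	size_t grp;

	for (grp = start; grp < ngroups; grp++) {
		uint32_t in_use;

		if (g[grp].inode_uninit)
			continue;	/* nothing on disk worth reading */
		in_use = inodes_per_group - g[grp].itable_unused;
		/* Round up to whole inode table blocks. */
		blocks += (in_use + inodes_per_block - 1) / inodes_per_block;
		if (blocks * blocksize > budget_bytes)
			break;
	}
	return grp;
}
```

With 4KiB blocks, 256-byte inodes and a 2MiB budget, one completely full
group (512 inode table blocks) exactly fills the budget, so the batch only
ends once a later group adds at least one more block.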
pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1. We use the
"iterate a subset of a dblist" function to avoid copying the dblist. Directory
blocks are fetched incrementally as we walk through the directory
block list. In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it. Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.
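The incremental fetch can be pictured as a sliding window over the sorted
directory block list. The sketch below reproduces the window arithmetic used
by check_dir_block2() in this patch (start 1/8 of a window ahead, re-arm at
7/8 of a window), but struct ra_window and ra_window_step() are hypothetical
names, and the actual I/O is left to the caller.

```c
#include <stdint.h>

struct ra_window {
	uint64_t ra_entries;	/* entries per readahead request; 0 disables it */
	uint64_t next_ra_off;	/* list offset at which to issue the next request */
};

/*
 * Call before processing list entry `pos`. If it is time to prefetch, store
 * the range to read in *start and *count and return 1; otherwise return 0.
 */
int ra_window_step(struct ra_window *w, uint64_t pos,
		   uint64_t *start, uint64_t *count)
{
	if (!w->ra_entries || pos < w->next_ra_off)
		return 0;

	/* Read one window's worth, beginning slightly ahead of the checker. */
	*start = pos + w->ra_entries / 8;
	*count = w->ra_entries;

	/* Re-arm before the checker catches up with the end of the window. */
	w->next_ra_off = pos + w->ra_entries * 7 / 8;
	return 1;
}
```

Issuing the next request before the current window is exhausted is what lets
the I/O overlap with checking the blocks that have already arrived.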
pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.
In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs. Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost. (Single-issue USB mass storage disks seem to suffer badly.)
By default, the readahead buffer size will be set to the size of a block
group's inode table (which is 2MiB for a regular ext4 FS). The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.
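As a worked example of the default, the heuristic in e2fsck_guess_readahead()
amounts to the following. This is a standalone restatement for illustration
only; guess_readahead_kb() is a hypothetical name and the memory size is
passed in rather than detected.

```c
#include <stdint.h>

/*
 * Return the default readahead size in KiB, or 0 to disable readahead.
 * The guess is one block group's inode table, unless that would be more
 * than 1/100th of physical memory.
 */
uint64_t guess_readahead_kb(uint32_t blocksize,
			    uint32_t inode_blocks_per_group,
			    uint64_t mem_bytes)
{
	uint64_t guess = (uint64_t)blocksize * inode_blocks_per_group;

	if (mem_bytes > guess * 100)
		return guess / 1024;
	return 0;
}
```

For 4KiB blocks and 512 inode table blocks per group the guess is 2MiB, so
readahead stays enabled only on machines with more than 200MiB of RAM.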
v2: Fix an off-by-one error in the pass1 readahead which made the
readahead trigger one inode too late if the block groups are full.
v3: Use the dblist partial iterator function to read ahead parts
of the directory block list in pass 2, instead of making sublists.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
e2fsck/e2fsck.8.in | 7 +++++
e2fsck/e2fsck.conf.5.in | 15 +++++++++++
e2fsck/e2fsck.h | 3 ++
e2fsck/pass1.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++
e2fsck/pass2.c | 38 +++++++++++++++++++++++++++
e2fsck/pass4.c | 9 +++++++
e2fsck/unix.c | 28 ++++++++++++++++++++
lib/ext2fs/ext2fs.h | 1 +
lib/ext2fs/inode.c | 3 +-
9 files changed, 167 insertions(+), 2 deletions(-)
diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index f5ed758..84ae50f 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -207,6 +207,13 @@ option may prevent you from further manual data recovery.
.BI nodiscard
Do not attempt to discard free blocks and unused inode blocks. This option is
exactly the opposite of discard option. This is set as default.
+.TP
+.BI readahead_kb
+Use this many KiB of memory to pre-fetch metadata in the hopes of reducing
+e2fsck runtime. By default, this is set to the size of a block group's inode
+table (typically 2MiB on a regular ext4 filesystem); if this amount is more
+than 1/100 of total physical memory, readahead is disabled. Set this to zero
+to disable readahead entirely.
.RE
.TP
.B \-f
diff --git a/e2fsck/e2fsck.conf.5.in b/e2fsck/e2fsck.conf.5.in
index 9ebfbbf..e1d0518 100644
--- a/e2fsck/e2fsck.conf.5.in
+++ b/e2fsck/e2fsck.conf.5.in
@@ -205,6 +205,21 @@ of that type are squelched. This can be useful if the console is slow
(i.e., connected to a serial port) and so a large amount of output could
end up delaying the boot process for a long time (potentially hours).
.TP
+.I readahead_mem_pct
+Use this percentage of memory to try to read in metadata blocks ahead of the
+main e2fsck thread. This should reduce run times, depending on the speed of
+the underlying storage and the amount of free memory. There is no default, but
+see
+.B readahead_kb
+for more details.
+.TP
+.I readahead_kb
+Use this amount of memory to read in metadata blocks ahead of the main checking
+thread. Setting this value to zero disables readahead entirely. By default,
+this is set to the size of one block group's inode table (typically 2MiB on a
+regular ext4 filesystem); if this amount is more than 1/100th of total physical
+memory, readahead is disabled.
+.TP
.I report_features
If this boolean relation is true, e2fsck will print the file system
features as part of its verbose reporting (i.e., if the
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 0252824..e359515 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -378,6 +378,9 @@ struct e2fsck_struct {
*/
void *priv_data;
ext2fs_block_bitmap block_metadata_map; /* Metadata blocks */
+
+ /* How much are we allowed to readahead? */
+ unsigned long long readahead_kb;
};
/* Used by the region allocation code */
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index a861177..d3d6ca3 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -869,6 +869,60 @@ out:
return 0;
}
+static void pass1_readahead(e2fsck_t ctx, dgrp_t *group, ext2_ino_t *next_ino)
+{
+ ext2_ino_t inodes_in_group = 0, inodes_per_block, inodes_per_buffer;
+ dgrp_t start = *group, grp;
+ blk64_t blocks_to_read = 0;
+ errcode_t err = EXT2_ET_INVALID_ARGUMENT;
+
+ if (ctx->readahead_kb == 0)
+ goto out;
+
+ /* Keep iterating groups until we have enough to readahead */
+ inodes_per_block = EXT2_INODES_PER_BLOCK(ctx->fs->super);
+ for (grp = start; grp < ctx->fs->group_desc_count; grp++) {
+ if (ext2fs_bg_flags_test(ctx->fs, grp, EXT2_BG_INODE_UNINIT))
+ continue;
+ inodes_in_group = ctx->fs->super->s_inodes_per_group -
+ ext2fs_bg_itable_unused(ctx->fs, grp);
+ blocks_to_read += (inodes_in_group + inodes_per_block - 1) /
+ inodes_per_block;
+ if (blocks_to_read * ctx->fs->blocksize >
+ ctx->readahead_kb * 1024)
+ break;
+ }
+
+ err = e2fsck_readahead(ctx->fs, E2FSCK_READA_ITABLE, start,
+ grp - start + 1);
+ if (err == EAGAIN) {
+ ctx->readahead_kb /= 2;
+ err = 0;
+ }
+
+out:
+ if (err) {
+ /* Error; disable itable readahead */
+ *group = ctx->fs->group_desc_count;
+ *next_ino = ctx->fs->super->s_inodes_count;
+ } else {
+ /*
+ * Don't do more readahead until we've reached the first inode
+ * of the last inode scan buffer block for the last group.
+ */
+ *group = grp + 1;
+ inodes_per_buffer = (ctx->inode_buffer_blocks ?
+ ctx->inode_buffer_blocks :
+ EXT2_INODE_SCAN_DEFAULT_BUFFER_BLOCKS) *
+ ctx->fs->blocksize /
+ EXT2_INODE_SIZE(ctx->fs->super);
+ inodes_in_group--;
+ *next_ino = inodes_in_group -
+ (inodes_in_group % inodes_per_buffer) + 1 +
+ (grp * ctx->fs->super->s_inodes_per_group);
+ }
+}
+
void e2fsck_pass1(e2fsck_t ctx)
{
int i;
@@ -891,10 +945,19 @@ void e2fsck_pass1(e2fsck_t ctx)
int low_dtime_check = 1;
int inode_size;
int failed_csum = 0;
+ ext2_ino_t ino_threshold = 0;
+ dgrp_t ra_group = 0;
init_resource_track(&rtrack, ctx->fs->io);
clear_problem_context(&pctx);
+ /* If we can do readahead, figure out how many groups to pull in. */
+ if (!e2fsck_can_readahead(ctx->fs))
+ ctx->readahead_kb = 0;
+ else if (ctx->readahead_kb == ~0ULL)
+ ctx->readahead_kb = e2fsck_guess_readahead(ctx->fs);
+ pass1_readahead(ctx, &ra_group, &ino_threshold);
+
if (!(ctx->options & E2F_OPT_PREEN))
fix_problem(ctx, PR_1_PASS_HEADER, &pctx);
@@ -1074,6 +1137,8 @@ void e2fsck_pass1(e2fsck_t ctx)
old_op = ehandler_operation(_("getting next inode from scan"));
pctx.errcode = ext2fs_get_next_inode_full(scan, &ino,
inode, inode_size);
+ if (ino > ino_threshold)
+ pass1_readahead(ctx, &ra_group, &ino_threshold);
ehandler_operation(old_op);
if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
return;
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 7aaebce..cffaac4 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -61,6 +61,9 @@
* Keeps track of how many times an inode is referenced.
*/
static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf);
+static int check_dir_block2(ext2_filsys fs,
+ struct ext2_db_entry2 *dir_blocks_info,
+ void *priv_data);
static int check_dir_block(ext2_filsys fs,
struct ext2_db_entry2 *dir_blocks_info,
void *priv_data);
@@ -77,6 +80,9 @@ struct check_dir_struct {
struct problem_context pctx;
int count, max;
e2fsck_t ctx;
+ unsigned long long list_offset;
+ unsigned long long ra_entries;
+ unsigned long long next_ra_off;
};
void e2fsck_pass2(e2fsck_t ctx)
@@ -96,6 +102,9 @@ void e2fsck_pass2(e2fsck_t ctx)
int i, depth;
problem_t code;
int bad_dir;
+ int (*check_dir_func)(ext2_filsys fs,
+ struct ext2_db_entry2 *dir_blocks_info,
+ void *priv_data);
init_resource_track(&rtrack, ctx->fs->io);
clear_problem_context(&cd.pctx);
@@ -139,6 +148,9 @@ void e2fsck_pass2(e2fsck_t ctx)
cd.ctx = ctx;
cd.count = 1;
cd.max = ext2fs_dblist_count2(fs->dblist);
+ cd.list_offset = 0;
+ cd.ra_entries = ctx->readahead_kb * 1024 / ctx->fs->blocksize;
+ cd.next_ra_off = 0;
if (ctx->progress)
(void) (ctx->progress)(ctx, 2, 0, cd.max);
@@ -146,7 +158,8 @@ void e2fsck_pass2(e2fsck_t ctx)
if (fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX)
ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);
- cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_block,
+ check_dir_func = cd.ra_entries ? check_dir_block2 : check_dir_block;
+ cd.pctx.errcode = ext2fs_dblist_iterate2(fs->dblist, check_dir_func,
&cd);
if (ctx->flags & E2F_FLAG_SIGNAL_MASK || ctx->flags & E2F_FLAG_RESTART)
return;
@@ -825,6 +838,29 @@ err:
return retval;
}
+static int check_dir_block2(ext2_filsys fs,
+ struct ext2_db_entry2 *db,
+ void *priv_data)
+{
+ int err;
+ struct check_dir_struct *cd = priv_data;
+
+ if (cd->ra_entries && cd->list_offset >= cd->next_ra_off) {
+ err = e2fsck_readahead_dblist(fs,
+ E2FSCK_RA_DBLIST_IGNORE_BLOCKCNT,
+ fs->dblist,
+ cd->list_offset + cd->ra_entries / 8,
+ cd->ra_entries);
+ if (err)
+ cd->ra_entries = 0;
+ cd->next_ra_off = cd->list_offset + (cd->ra_entries * 7 / 8);
+ }
+
+ err = check_dir_block(fs, db, priv_data);
+ cd->list_offset++;
+ return err;
+}
+
static int check_dir_block(ext2_filsys fs,
struct ext2_db_entry2 *db,
void *priv_data)
diff --git a/e2fsck/pass4.c b/e2fsck/pass4.c
index 21d93f0..bc9a2c4 100644
--- a/e2fsck/pass4.c
+++ b/e2fsck/pass4.c
@@ -106,6 +106,15 @@ void e2fsck_pass4(e2fsck_t ctx)
#ifdef MTRACE
mtrace_print("Pass 4");
#endif
+ /*
+ * Since pass4 is mostly CPU bound, start readahead of bitmaps
+ * ahead of pass 5 if we haven't already loaded them.
+ */
+ if (ctx->readahead_kb &&
+ (fs->block_map == NULL || fs->inode_map == NULL))
+ e2fsck_readahead(fs, E2FSCK_READA_BBITMAP |
+ E2FSCK_READA_IBITMAP,
+ 0, fs->group_desc_count);
clear_problem_context(&pctx);
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 615d690..f3672c0 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -650,6 +650,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
char *buf, *token, *next, *p, *arg;
int ea_ver;
int extended_usage = 0;
+ unsigned long long reada_kb;
buf = string_copy(ctx, opts, 0);
for (token = buf; token && *token; token = next) {
@@ -678,6 +679,15 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
continue;
}
ctx->ext_attr_ver = ea_ver;
+ } else if (strcmp(token, "readahead_kb") == 0) {
+ reada_kb = strtoull(arg, &p, 0);
+ if (*p) {
+ fprintf(stderr, "%s",
+ _("Invalid readahead buffer size.\n"));
+ extended_usage++;
+ continue;
+ }
+ ctx->readahead_kb = reada_kb;
} else if (strcmp(token, "fragcheck") == 0) {
ctx->options |= E2F_OPT_FRAGCHECK;
continue;
@@ -717,6 +727,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
fputs(("\tjournal_only\n"), stderr);
fputs(("\tdiscard\n"), stderr);
fputs(("\tnodiscard\n"), stderr);
+ fputs(("\treadahead_kb=<buffer size>\n"), stderr);
fputc('\n', stderr);
exit(1);
}
@@ -750,6 +761,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
#ifdef CONFIG_JBD_DEBUG
char *jbd_debug;
#endif
+ unsigned long long phys_mem_kb;
retval = e2fsck_allocate_context(&ctx);
if (retval)
@@ -777,6 +789,8 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
else
ctx->program_name = "e2fsck";
+ phys_mem_kb = get_memory_size() / 1024;
+ ctx->readahead_kb = ~0ULL;
while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
switch (c) {
case 'C':
@@ -961,6 +975,20 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
if (c)
verbose = 1;
+ if (ctx->readahead_kb == ~0ULL) {
+ profile_get_integer(ctx->profile, "options",
+ "readahead_mem_pct", 0, -1, &c);
+ if (c >= 0 && c <= 100)
+ ctx->readahead_kb = phys_mem_kb * c / 100;
+ profile_get_integer(ctx->profile, "options",
+ "readahead_kb", 0, -1, &c);
+ if (c >= 0)
+ ctx->readahead_kb = c;
+ if (ctx->readahead_kb != ~0ULL &&
+ ctx->readahead_kb > phys_mem_kb)
+ ctx->readahead_kb = phys_mem_kb;
+ }
+
/* Turn off discard in read-only mode */
if ((ctx->options & E2F_OPT_NO) &&
(ctx->options & E2F_OPT_DISCARD))
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 8c16ae5..9c2259a 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1418,6 +1418,7 @@ extern errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan,
ext2_ino_t *ino,
struct ext2_inode *inode,
int bufsize);
+#define EXT2_INODE_SCAN_DEFAULT_BUFFER_BLOCKS 8
extern errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
ext2_inode_scan *ret_scan);
extern void ext2fs_close_inode_scan(ext2_inode_scan scan);
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index ff7009b..17e49d8 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -175,7 +175,8 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
scan->bytes_left = 0;
scan->current_group = 0;
scan->groups_left = fs->group_desc_count - 1;
- scan->inode_buffer_blocks = buffer_blocks ? buffer_blocks : 8;
+ scan->inode_buffer_blocks = buffer_blocks ? buffer_blocks :
+ EXT2_INODE_SCAN_DEFAULT_BUFFER_BLOCKS;
scan->current_block = ext2fs_inode_table_loc(scan->fs,
scan->current_group);
scan->inodes_left = EXT2_INODES_PER_GROUP(scan->fs->super);
* [PATCH 09/31] e2fsck: track directories to be rehashed with a bitmap
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (7 preceding siblings ...)
2014-12-20 21:17 ` [PATCH 08/31] e2fsck: read-ahead metadata during passes 1, 2, and 4 Darrick J. Wong
@ 2014-12-20 21:17 ` Darrick J. Wong
2014-12-20 21:17 ` [PATCH 10/31] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files Darrick J. Wong
` (26 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Use a bitmap to track which directories we want to rehash, since
bitmaps will use less memory. This enables us to clean up the
rehash-all case to use inode_dir_map, and we can free the dirinfo
memory sooner.
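The iteration pattern this enables (repeatedly asking for the first set bit
after the previous inode) can be sketched with a toy bitmap. Everything below
is hypothetical and standalone: struct ino_bitmap, MAX_INO and the ino_*()
helpers only imitate the shape of the libext2fs bitmap calls used in the
patch.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_INO 64	/* hypothetical tiny filesystem */

/* One bit per inode number; inode numbers start at 1. */
struct ino_bitmap {
	uint64_t bits;
};

void ino_mark(struct ino_bitmap *m, uint32_t ino)
{
	if (ino >= 1 && ino <= MAX_INO)
		m->bits |= 1ULL << (ino - 1);
}

/* Find the first marked inode in [start, end]; ENOENT if there is none. */
int ino_find_first_set(const struct ino_bitmap *m, uint32_t start,
		       uint32_t end, uint32_t *out)
{
	uint32_t ino;

	if (start < 1)
		start = 1;
	if (end > MAX_INO)
		end = MAX_INO;
	for (ino = start; ino <= end; ino++) {
		if (m->bits & (1ULL << (ino - 1))) {
			*out = ino;
			return 0;
		}
	}
	return ENOENT;
}

/* Visit every marked inode in ascending order; return how many were seen. */
int ino_count_marked(const struct ino_bitmap *m)
{
	uint32_t ino = 0;
	int n = 0;

	while (ino_find_first_set(m, ino + 1, MAX_INO, &ino) == 0)
		n++;	/* a real caller would rehash directory `ino` here */
	return n;
}
```

A side effect of the bitmap representation is that marking the same
directory twice is harmless, and directories are visited in inode order.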
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
e2fsck/e2fsck.c | 4 ++--
e2fsck/e2fsck.h | 2 +-
e2fsck/pass1.c | 8 ++++++-
e2fsck/pass2.c | 4 ++--
e2fsck/pass3.c | 4 ++++
e2fsck/rehash.c | 60 ++++++++++++++++++-------------------------------------
6 files changed, 35 insertions(+), 47 deletions(-)
diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index fcda7d7..7483072 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -125,8 +125,8 @@ errcode_t e2fsck_reset_context(e2fsck_t ctx)
ctx->inode_imagic_map = 0;
}
if (ctx->dirs_to_hash) {
- ext2fs_u32_list_free(ctx->dirs_to_hash);
- ctx->dirs_to_hash = 0;
+ ext2fs_free_inode_bitmap(ctx->dirs_to_hash);
+ ctx->dirs_to_hash = NULL;
}
/*
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index e359515..33dbcad 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -304,7 +304,7 @@ struct e2fsck_struct {
/*
* Directories to hash
*/
- ext2_u32_list dirs_to_hash;
+ ext2fs_inode_bitmap dirs_to_hash;
/*
* Tuning parameters
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index d3d6ca3..5d53a84 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -963,8 +963,12 @@ void e2fsck_pass1(e2fsck_t ctx)
if ((fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_DIR_INDEX) &&
!(ctx->options & E2F_OPT_NO)) {
- if (ext2fs_u32_list_create(&ctx->dirs_to_hash, 50))
- ctx->dirs_to_hash = 0;
+ if (e2fsck_allocate_inode_bitmap(fs,
+ _("directories to rehash"),
+ EXT2FS_BMAP64_AUTODIR,
+ "dirs_to_hash",
+ &ctx->dirs_to_hash))
+ ctx->dirs_to_hash = NULL;
}
#ifdef MTRACE
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index cffaac4..a7b2381 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -950,7 +950,7 @@ static int check_dir_block(ext2_filsys fs,
dot_state = 0;
if (ctx->dirs_to_hash &&
- ext2fs_u32_list_test(ctx->dirs_to_hash, ino))
+ ext2fs_test_inode_bitmap2(ctx->dirs_to_hash, ino))
dups_found++;
#if 0
@@ -1635,7 +1635,7 @@ static void clear_htree(e2fsck_t ctx, ext2_ino_t ino)
inode.i_flags = inode.i_flags & ~EXT2_INDEX_FL;
e2fsck_write_inode(ctx, ino, &inode, "clear_htree");
if (ctx->dirs_to_hash)
- ext2fs_u32_list_add(ctx->dirs_to_hash, ino);
+ ext2fs_mark_inode_bitmap2(ctx->dirs_to_hash, ino);
}
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 1d5255f..c331b98 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -119,6 +119,10 @@ void e2fsck_pass3(e2fsck_t ctx)
* If there are any directories that need to be indexed or
* optimized, do it here.
*/
+ if (iter)
+ e2fsck_dir_info_iter_end(ctx, iter);
+ iter = NULL;
+ e2fsck_free_dir_info(ctx);
e2fsck_rehash_directories(ctx);
abort_exit:
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index e37e871..348923e 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -56,9 +56,13 @@
void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino)
{
if (!ctx->dirs_to_hash)
- ext2fs_u32_list_create(&ctx->dirs_to_hash, 50);
+ e2fsck_allocate_inode_bitmap(ctx->fs,
+ _("directories to rehash"),
+ EXT2FS_BMAP64_AUTODIR,
+ "dirs_to_hash",
+ &ctx->dirs_to_hash);
if (ctx->dirs_to_hash)
- ext2fs_u32_list_add(ctx->dirs_to_hash, ino);
+ ext2fs_mark_inode_bitmap2(ctx->dirs_to_hash, ino);
}
/* Ask if a dir will be rebuilt during pass 3A. */
@@ -68,7 +72,7 @@ int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino)
return 1;
if (!ctx->dirs_to_hash)
return 0;
- return ext2fs_u32_list_test(ctx->dirs_to_hash, ino);
+ return ext2fs_test_inode_bitmap2(ctx->dirs_to_hash, ino);
}
struct fill_dir_struct {
@@ -911,12 +915,9 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
#ifdef RESOURCE_TRACK
struct resource_track rtrack;
#endif
- struct dir_info *dir;
- ext2_u32_iterate iter;
- struct dir_info_iter * dirinfo_iter = 0;
- ext2_ino_t ino;
- errcode_t retval;
- int cur, max, all_dirs, first = 1;
+ ext2_ino_t ino = 0;
+ int all_dirs, first = 1;
+ ext2fs_inode_bitmap hmap;
init_resource_track(&rtrack, ctx->fs->io);
all_dirs = ctx->options & E2F_OPT_COMPRESS_DIRS;
@@ -928,30 +929,12 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
clear_problem_context(&pctx);
- cur = 0;
- if (all_dirs) {
- dirinfo_iter = e2fsck_dir_info_iter_begin(ctx);
- max = e2fsck_get_num_dirinfo(ctx);
- } else {
- retval = ext2fs_u32_list_iterate_begin(ctx->dirs_to_hash,
- &iter);
- if (retval) {
- pctx.errcode = retval;
- fix_problem(ctx, PR_3A_OPTIMIZE_ITER, &pctx);
- return;
- }
- max = ext2fs_u32_list_count(ctx->dirs_to_hash);
- }
+ hmap = (all_dirs ? ctx->inode_dir_map : ctx->dirs_to_hash);
while (1) {
- if (all_dirs) {
- if ((dir = e2fsck_dir_info_iter(ctx,
- dirinfo_iter)) == 0)
- break;
- ino = dir->ino;
- } else {
- if (!ext2fs_u32_list_iterate(iter, &ino))
- break;
- }
+ if (ext2fs_find_first_set_inode_bitmap2(
+ hmap, ino + 1,
+ ctx->fs->super->s_inodes_count, &ino))
+ break;
pctx.dir = ino;
if (first) {
@@ -968,17 +951,14 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
}
if (ctx->progress && !ctx->progress_fd)
e2fsck_simple_progress(ctx, "Rebuilding directory",
- 100.0 * (float) (++cur) / (float) max, ino);
+ 100.0 * (float) ino /
+ (float) ctx->fs->super->s_inodes_count,
+ ino);
}
end_problem_latch(ctx, PR_LATCH_OPTIMIZE_DIR);
- if (all_dirs)
- e2fsck_dir_info_iter_end(ctx, dirinfo_iter);
- else
- ext2fs_u32_list_iterate_end(iter);
* [PATCH 10/31] e2fsck: rebuild sparse extent trees/convert non-extent ext3 files
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (8 preceding siblings ...)
2014-12-20 21:17 ` [PATCH 09/31] e2fsck: track directories to be rehashed with a bitmap Darrick J. Wong
@ 2014-12-20 21:17 ` Darrick J. Wong
2014-12-20 21:17 ` [PATCH 11/31] tests: verify proper rebuilding of sparse extent trees and block map file conversion Darrick J. Wong
` (25 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Teach e2fsck to (re)construct extent trees. This enables us to do
either of the following: compress a highly sparse extent tree into
fewer ETB blocks; or convert an ext3-style block mapped file to an
extent file. The reconstruction is performed during pass 1E or 3A,
as detailed below.
For files that are already extent based, this algorithm will
automatically run (pending user approval) if pass1 determines either
(1) that a whole level of extent tree will fit into a higher level of
the tree; (2) that the size of any level can be reduced by at least
one ETB block; or (3) the extent tree is unnecessarily deep. It will
not run at all if errors are found and the user declines to fix the
errors.
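To give a feel for condition (2), the sketch below estimates the minimum
number of ETB blocks one level needs and compares it with what the level
currently uses. It is a simplification, not the test the patch implements:
it assumes the usual 12-byte extent header and 12-byte entries, ignores the
block-splitting slack mentioned in v2 below, and all function names are
hypothetical.

```c
#include <stdint.h>

#define EXTENT_HDR_SIZE   12	/* bytes, on-disk extent header (assumed) */
#define EXTENT_ENTRY_SIZE 12	/* bytes, leaf or index entry (assumed) */

/* How many extent entries fit in one extent tree block. */
uint32_t entries_per_etb(uint32_t blocksize)
{
	return (blocksize - EXTENT_HDR_SIZE) / EXTENT_ENTRY_SIZE;
}

/* Fewest ETB blocks that can hold nr_entries entries at one level. */
uint64_t min_etb_blocks(uint64_t nr_entries, uint32_t blocksize)
{
	uint32_t per = entries_per_etb(blocksize);

	return (nr_entries + per - 1) / per;
}

/* Nonzero if the level could give back at least one ETB block. */
int level_can_shrink(uint64_t blocks_in_use, uint64_t nr_entries,
		     uint32_t blocksize)
{
	return min_etb_blocks(nr_entries, blocksize) < blocks_in_use;
}
```

With 4KiB blocks this gives 340 entries per ETB block, so a level holding
341 extents spread across three blocks could be packed into two.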
The option "-E bmap2extent" can be used to force e2fsck to convert all
block map files to extent trees, and to rebuild all extent files'
extent trees. After conversion, files larger than 12 blocks should be
defragmented to eliminate empty holes where a block lives.
The extent tree constructor is pretty dumb -- it creates a list of
leaf extents (adjacent extents are collapsed), marks all indirect
blocks / ETB blocks free, installs a new extent tree root in the
inode, then loads the leaf extents into the tree.
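The "adjacent extents are collapsed" step can be sketched on its own as follows. This is a simplified illustration, not the patch's code: struct ext and collapse_extents() are hypothetical names, and the list is assumed to be sorted by logical block already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record for this sketch. */
struct ext {
	uint64_t pblk;		/* first physical block */
	uint64_t lblk;		/* first logical block */
	uint32_t len;		/* length in blocks */
	int uninit;		/* nonzero if the extent is uninitialized */
};

/*
 * Merge neighbouring extents in place when they are contiguous both
 * physically and logically and share the same uninit state.
 * Returns the number of extents left in the array.
 */
size_t collapse_extents(struct ext *e, size_t n)
{
	size_t out = 0, i;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (out > 0) {
			struct ext *last = &e[out - 1];
			uint64_t total = (uint64_t)last->len + e[i].len;

			if (last->pblk + last->len == e[i].pblk &&
			    last->lblk + last->len == e[i].lblk &&
			    last->uninit == e[i].uninit &&
			    total < (1ULL << 32)) {
				/* Extend the previous extent instead. */
				last->len = (uint32_t)total;
				continue;
			}
		}
		e[out++] = e[i];
	}
	return out;
}
```

A merged extent can end up longer than a single on-disk extent is allowed to be, so anything that writes the list back out still has to split it into chunks of at most the maximum extent length.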
v2: Account for extent tree block slack that we create when splitting
a block, so that we don't repeatedly annoy the user to rebuild a tree
that we can't optimize further.
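The slack rule from v2 amounts to the following small helper. This is a sketch with a hypothetical name, effective_capacity(); level 0 is assumed to be the root stored in the inode.

```c
#include <assert.h>

/*
 * Capacity to credit to one extent block when deciding whether a
 * rebuild would help.  A non-root block that is not completely full
 * is treated as holding one entry fewer than it really can, because
 * a split while appending leaves exactly one free entry behind.
 */
unsigned int effective_capacity(int level, unsigned int num_entries,
				unsigned int max_entries)
{
	assert(num_entries <= max_entries);
	if (level > 0 && num_entries < max_entries)
		return max_entries - 1;
	return max_entries;
}
```

With this adjustment, a block left at 339 of 340 entries by a split counts as full, so the same tree is not offered for rebuilding again on every run.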
v3: For any directory being rebuilt during pass 3A, defer any extent
tree rebuilding until after the rehash. It's quite possible that the
act of compressing an aged directory will cause it to shrink far
enough to enable us to knock a level off the dir's extent tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
e2fsck/Makefile.in | 16 +
e2fsck/e2fsck.8.in | 3
e2fsck/e2fsck.c | 4
e2fsck/e2fsck.h | 35 ++
e2fsck/extents.c | 532 ++++++++++++++++++++++++++++++++
e2fsck/pass1.c | 62 ++++
e2fsck/problem.c | 48 +++
e2fsck/problem.h | 33 ++
e2fsck/rehash.c | 27 +-
e2fsck/super.c | 7
e2fsck/unix.c | 4
tests/f_extent_bad_node/expect.1 | 11 -
tests/f_extent_bad_node/expect.2 | 2
tests/f_extent_int_bad_magic/expect.1 | 5
tests/f_extent_leaf_bad_magic/expect.1 | 5
tests/f_extent_oobounds/expect.1 | 11 -
tests/f_extent_oobounds/expect.2 | 2
tests/f_extents/expect.1 | 9 +
18 files changed, 789 insertions(+), 27 deletions(-)
create mode 100644 e2fsck/extents.c
diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index e40e51b..a4413d9 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -62,7 +62,8 @@ OBJS= dict.o unix.o e2fsck.o super.o pass1.o pass1b.o pass2.o \
pass3.o pass4.o pass5.o journal.o badblocks.o util.o dirinfo.o \
dx_dirinfo.o ehandler.o problem.o message.o quota.o recovery.o \
region.o revoke.o ea_refcount.o rehash.o profile.o prof_err.o \
- logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o readahead.o
+ logfile.o sigcatcher.o $(MTRACE_OBJ) plausible.o readahead.o \
+ extents.o
PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
profiled/super.o profiled/pass1.o profiled/pass1b.o \
@@ -74,7 +75,7 @@ PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
profiled/ea_refcount.o profiled/rehash.o profiled/profile.o \
profiled/prof_err.o profiled/logfile.o \
profiled/sigcatcher.o profiled/plausible.o \
- profiled/sigcatcher.o profiled/readahead.o
+ profiled/sigcatcher.o profiled/readahead.o profiled/extents.o
SRCS= $(srcdir)/e2fsck.c \
$(srcdir)/dict.c \
@@ -106,6 +107,7 @@ SRCS= $(srcdir)/e2fsck.c \
prof_err.c \
$(srcdir)/quota.c \
$(srcdir)/../misc/plausible.c \
+ $(srcdir)/extents.c \
$(MTRACE_SRC)
all:: profiled $(PROGS) e2fsck $(MANPAGES) $(FMANPAGES)
@@ -308,6 +310,16 @@ pass1.o: $(srcdir)/pass1.c $(top_builddir)/lib/config.h \
$(srcdir)/profile.h prof_err.h $(top_srcdir)/lib/quota/quotaio.h \
$(top_srcdir)/lib/quota/dqblk_v2.h $(top_srcdir)/lib/quota/quotaio_tree.h \
$(top_srcdir)/lib/../e2fsck/dict.h $(srcdir)/problem.h
+extents.o: $(srcdir)/extents.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
+ $(srcdir)/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
+ $(top_srcdir)/lib/ext2fs/ext3_extents.h $(top_srcdir)/lib/ext2fs/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
+ $(srcdir)/profile.h prof_err.h $(top_srcdir)/lib/quota/quotaio.h \
+ $(top_srcdir)/lib/quota/dqblk_v2.h $(top_srcdir)/lib/quota/quotaio_tree.h \
+ $(top_srcdir)/lib/../e2fsck/dict.h $(srcdir)/problem.h $(srcdir)/dict.h
pass1b.o: $(srcdir)/pass1b.c $(top_builddir)/lib/config.h \
$(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
$(srcdir)/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 84ae50f..0c2725e 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -214,6 +214,9 @@ e2fsck runtime. By default, this is set to the size of a block group's inode
table (typically 2MiB on a regular ext4 filesystem); if this amount is more
than 1/100 of total physical memory, readahead is disabled. Set this to zero
to disable readahead entirely.
+.TP
+.BI bmap2extent
+Convert block-mapped files to extent-mapped files.
.RE
.TP
.B \-f
diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 7483072..8a7b041 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -204,8 +204,8 @@ void e2fsck_free_context(e2fsck_t ctx)
typedef void (*pass_t)(e2fsck_t ctx);
static pass_t e2fsck_passes[] = {
- e2fsck_pass1, e2fsck_pass2, e2fsck_pass3, e2fsck_pass4,
- e2fsck_pass5, 0 };
+ e2fsck_pass1, e2fsck_pass1e, e2fsck_pass2, e2fsck_pass3,
+ e2fsck_pass4, e2fsck_pass5, 0 };
#define E2F_FLAG_RUN_RETURN (E2F_FLAG_SIGNAL_MASK|E2F_FLAG_RESTART)
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 33dbcad..15d968e 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -167,6 +167,7 @@ struct resource_track {
#define E2F_OPT_FRAGCHECK 0x0800
#define E2F_OPT_JOURNAL_ONLY 0x1000 /* only replay the journal */
#define E2F_OPT_DISCARD 0x2000
+#define E2F_OPT_CONVERT_BMAP 0x4000 /* convert blockmap to extent */
/*
* E2fsck flags
@@ -190,6 +191,7 @@ struct resource_track {
#define E2F_FLAG_EXITING 0x1000 /* E2fsck exiting due to errors */
#define E2F_FLAG_TIME_INSANE 0x2000 /* Time is insane */
#define E2F_FLAG_PROBLEMS_FIXED 0x4000 /* At least one problem was fixed */
+#define E2F_FLAG_ALLOC_OK 0x8000 /* Can we allocate blocks? */
#define E2F_RESET_FLAGS (E2F_FLAG_TIME_INSANE | E2F_FLAG_PROBLEMS_FIXED)
@@ -381,6 +383,23 @@ struct e2fsck_struct {
/* How much are we allowed to readahead? */
unsigned long long readahead_kb;
+
+ /*
+ * Inodes to rebuild extent trees
+ */
+ ext2fs_inode_bitmap inodes_to_rebuild;
+};
+
+/* Data structures to evaluate whether an extent tree needs rebuilding. */
+struct extent_tree_level {
+ unsigned int num_extents;
+ unsigned int max_extents;
+};
+
+struct extent_tree_info {
+ int force_rebuild:1;
+ ext2_ino_t ino;
+ struct extent_tree_level ext_info[MAX_EXTENT_DEPTH_COUNT];
};
/* Used by the region allocation code */
@@ -456,6 +475,19 @@ extern blk64_t ea_refcount_intr_next(ext2_refcount_t refcount, int *ret);
extern const char *ehandler_operation(const char *op);
extern void ehandler_init(io_channel channel);
+/* extents.c */
+struct problem_context;
+errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino);
+int e2fsck_ino_will_be_rebuilt(e2fsck_t ctx, ext2_ino_t ino);
+void e2fsck_pass1e(e2fsck_t ctx);
+errcode_t e2fsck_check_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino,
+ struct ext2_inode *inode,
+ struct problem_context *pctx);
+errcode_t e2fsck_should_rebuild_extents(e2fsck_t ctx,
+ struct problem_context *pctx,
+ struct extent_tree_info *eti,
+ struct ext2_extent_info *info);
+
/* journal.c */
extern errcode_t e2fsck_check_ext3_journal(e2fsck_t ctx);
extern errcode_t e2fsck_run_ext3_journal(e2fsck_t ctx);
@@ -519,7 +551,8 @@ extern int region_allocate(region_t region, region_addr_t start, int n);
/* rehash.c */
void e2fsck_rehash_dir_later(e2fsck_t ctx, ext2_ino_t ino);
int e2fsck_dir_will_be_rehashed(e2fsck_t ctx, ext2_ino_t ino);
-errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino);
+errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
+ struct problem_context *pctx);
void e2fsck_rehash_directories(e2fsck_t ctx);
/* sigcatcher.c */
diff --git a/e2fsck/extents.c b/e2fsck/extents.c
new file mode 100644
index 0000000..9c2ae42
--- /dev/null
+++ b/e2fsck/extents.c
@@ -0,0 +1,532 @@
+/*
+ * extents.c --- rebuild extent tree
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include "e2fsck.h"
+#include "problem.h"
+
+#undef DEBUG
+#undef DEBUG_SUMMARY
+#undef DEBUG_FREE
+
+#define NUM_EXTENTS 341 /* about one ETB's worth of extents */
+
+static errcode_t e2fsck_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino);
+
+/* Schedule an inode to have its extent tree rebuilt during pass 1E. */
+errcode_t e2fsck_rebuild_extents_later(e2fsck_t ctx, ext2_ino_t ino)
+{
+ if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
+ EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+ (ctx->options & E2F_OPT_NO) ||
+ (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
+ return 0;
+
+ if (ctx->flags & E2F_FLAG_ALLOC_OK)
+ return e2fsck_rebuild_extents(ctx, ino);
+
+ if (!ctx->inodes_to_rebuild)
+ e2fsck_allocate_inode_bitmap(ctx->fs,
+ _("extent rebuild inode map"),
+ EXT2FS_BMAP64_RBTREE,
+ "inodes_to_rebuild",
+ &ctx->inodes_to_rebuild);
+ if (ctx->inodes_to_rebuild)
+ ext2fs_mark_inode_bitmap2(ctx->inodes_to_rebuild, ino);
+ return 0;
+}
+
+/* Ask if an inode will have its extents rebuilt during pass 1E. */
+int e2fsck_ino_will_be_rebuilt(e2fsck_t ctx, ext2_ino_t ino)
+{
+ if (!ctx->inodes_to_rebuild)
+ return 0;
+ return ext2fs_test_inode_bitmap2(ctx->inodes_to_rebuild, ino);
+}
+
+struct extent_list {
+ blk64_t blocks_freed;
+ struct ext2fs_extent *extents;
+ unsigned int count;
+ unsigned int size;
+ unsigned int ext_read;
+ errcode_t retval;
+ ext2_ino_t ino;
+};
+
+static errcode_t load_extents(e2fsck_t ctx, struct extent_list *list)
+{
+ ext2_filsys fs = ctx->fs;
+ ext2_extent_handle_t handle;
+ struct ext2fs_extent extent;
+ errcode_t retval;
+
+ retval = ext2fs_extent_open(fs, list->ino, &handle);
+ if (retval)
+ return retval;
+
+ retval = ext2fs_extent_get(handle, EXT2_EXTENT_ROOT, &extent);
+ if (retval)
+ goto out;
+
+ do {
+ if (extent.e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+ goto next;
+
+ /* Internal node; free it and we'll re-allocate it later */
+ if (!(extent.e_flags & EXT2_EXTENT_FLAGS_LEAF)) {
+#if defined(DEBUG) || defined(DEBUG_FREE)
+ printf("ino=%d free=%llu bf=%llu\n", list->ino,
+ extent.e_pblk, list->blocks_freed + 1);
+#endif
+ list->blocks_freed++;
+ ext2fs_block_alloc_stats2(fs, extent.e_pblk, -1);
+ goto next;
+ }
+
+ list->ext_read++;
+ /* Can we attach it to the previous extent? */
+ if (list->count) {
+ struct ext2fs_extent *last = list->extents +
+ list->count - 1;
+ blk64_t end = last->e_len + extent.e_len;
+
+ if (last->e_pblk + last->e_len == extent.e_pblk &&
+ last->e_lblk + last->e_len == extent.e_lblk &&
+ (last->e_flags & EXT2_EXTENT_FLAGS_UNINIT) ==
+ (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ end < (1ULL << 32)) {
+ last->e_len += extent.e_len;
+#ifdef DEBUG
+ printf("R: ino=%d len=%u\n", list->ino,
+ last->e_len);
+#endif
+ goto next;
+ }
+ }
+
+ /* Do we need to expand? */
+ if (list->count == list->size) {
+ unsigned int new_size = (list->size + NUM_EXTENTS) *
+ sizeof(struct ext2fs_extent);
+ retval = ext2fs_resize_mem(0, new_size, &list->extents);
+ if (retval)
+ goto out;
+ list->size += NUM_EXTENTS;
+ }
+
+ /* Add a new extent */
+ memcpy(list->extents + list->count, &extent, sizeof(extent));
+#ifdef DEBUG
+ printf("R: ino=%d pblk=%llu lblk=%llu len=%u\n", list->ino,
+ extent.e_pblk, extent.e_lblk, extent.e_len);
+#endif
+ list->count++;
+next:
+ retval = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT, &extent);
+ } while (retval == 0);
+
+out:
+ /* Ok if we run off the end */
+ if (retval == EXT2_ET_EXTENT_NO_NEXT)
+ retval = 0;
+ ext2fs_extent_free(handle);
+ return retval;
+}
+
+static int find_blocks(ext2_filsys fs, blk64_t *blocknr, e2_blkcnt_t blockcnt,
+ blk64_t ref_blk, int ref_offset, void *priv_data)
+{
+ struct extent_list *list = priv_data;
+
+ /* Internal node? */
+ if (blockcnt < 0) {
+#if defined(DEBUG) || defined(DEBUG_FREE)
+ printf("ino=%d free=%llu bf=%llu\n", list->ino, *blocknr,
+ list->blocks_freed + 1);
+#endif
+ list->blocks_freed++;
+ ext2fs_block_alloc_stats2(fs, *blocknr, -1);
+ return 0;
+ }
+
+ /* Can we attach it to the previous extent? */
+ if (list->count) {
+ struct ext2fs_extent *last = list->extents +
+ list->count - 1;
+ blk64_t end = last->e_len + 1;
+
+ if (last->e_pblk + last->e_len == *blocknr &&
+ end < (1ULL << 32)) {
+ last->e_len++;
+#ifdef DEBUG
+ printf("R: ino=%d len=%u\n", list->ino, last->e_len);
+#endif
+ return 0;
+ }
+ }
+
+ /* Do we need to expand? */
+ if (list->count == list->size) {
+ unsigned int new_size = (list->size + NUM_EXTENTS) *
+ sizeof(struct ext2fs_extent);
+ list->retval = ext2fs_resize_mem(0, new_size, &list->extents);
+ if (list->retval)
+ return BLOCK_ABORT;
+ list->size += NUM_EXTENTS;
+ }
+
+ /* Add a new extent */
+ list->extents[list->count].e_pblk = *blocknr;
+ list->extents[list->count].e_lblk = blockcnt;
+ list->extents[list->count].e_len = 1;
+ list->extents[list->count].e_flags = 0;
+#ifdef DEBUG
+ printf("R: ino=%d pblk=%llu lblk=%llu len=%u\n", list->ino, *blocknr,
+ blockcnt, 1);
+#endif
+ list->count++;
+
+ return 0;
+}
+
+static errcode_t rebuild_extent_tree(e2fsck_t ctx, struct extent_list *list,
+ ext2_ino_t ino)
+{
+ struct ext2_inode inode;
+ errcode_t retval;
+ ext2_extent_handle_t handle;
+ unsigned int i, ext_written;
+ struct ext2fs_extent *ex, extent;
+
+ list->count = 0;
+ list->blocks_freed = 0;
+ list->ino = ino;
+ list->ext_read = 0;
+ e2fsck_read_inode(ctx, ino, &inode, "rebuild_extents");
+
+ /* Skip deleted inodes and inline data files */
+ if (inode.i_links_count == 0 ||
+ inode.i_flags & EXT4_INLINE_DATA_FL)
+ return 0;
+
+ /* Collect lblk->pblk mappings */
+ if (inode.i_flags & EXT4_EXTENTS_FL) {
+ retval = load_extents(ctx, list);
+ goto extents_loaded;
+ }
+
+ retval = ext2fs_block_iterate3(ctx->fs, ino, BLOCK_FLAG_READ_ONLY, 0,
+ find_blocks, list);
+ if (retval)
+ goto err;
+ if (list->retval) {
+ retval = list->retval;
+ goto err;
+ }
+
+extents_loaded:
+ /* Reset extent tree */
+ inode.i_flags &= ~EXT4_EXTENTS_FL;
+ memset(inode.i_block, 0, sizeof(inode.i_block));
+
+ /* Make a note of freed blocks */
+ retval = ext2fs_iblk_sub_blocks(ctx->fs, &inode, list->blocks_freed);
+ if (retval)
+ goto err;
+
+ /* Now stuff extents into the file */
+ retval = ext2fs_extent_open2(ctx->fs, ino, &inode, &handle);
+ if (retval)
+ goto err;
+
+ ext_written = 0;
+ for (i = 0, ex = list->extents; i < list->count; i++, ex++) {
+ memcpy(&extent, ex, sizeof(struct ext2fs_extent));
+ extent.e_flags &= EXT2_EXTENT_FLAGS_UNINIT;
+ if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) {
+ if (extent.e_len > EXT_UNINIT_MAX_LEN) {
+ extent.e_len = EXT_UNINIT_MAX_LEN;
+ ex->e_pblk += EXT_UNINIT_MAX_LEN;
+ ex->e_lblk += EXT_UNINIT_MAX_LEN;
+ ex->e_len -= EXT_UNINIT_MAX_LEN;
+ ex--;
+ i--;
+ }
+ } else {
+ if (extent.e_len > EXT_INIT_MAX_LEN) {
+ extent.e_len = EXT_INIT_MAX_LEN;
+ ex->e_pblk += EXT_INIT_MAX_LEN;
+ ex->e_lblk += EXT_INIT_MAX_LEN;
+ ex->e_len -= EXT_INIT_MAX_LEN;
+ ex--;
+ i--;
+ }
+ }
+
+#ifdef DEBUG
+ printf("W: ino=%d pblk=%llu lblk=%llu len=%u\n", ino,
+ extent.e_pblk, extent.e_lblk, extent.e_len);
+#endif
+ retval = ext2fs_extent_insert(handle, EXT2_EXTENT_INSERT_AFTER,
+ &extent);
+ if (retval)
+ goto err2;
+ retval = ext2fs_extent_fix_parents(handle);
+ if (retval)
+ goto err2;
+ ext_written++;
+ }
+
+#if defined(DEBUG) || defined(DEBUG_SUMMARY)
+ printf("rebuild: ino=%d extents=%d->%d\n", ino, list->ext_read,
+ ext_written);
+#endif
+ e2fsck_write_inode(ctx, ino, &inode, "rebuild_extents");
+
+err2:
+ ext2fs_extent_free(handle);
+err:
+ return retval;
+}
+
+/* Rebuild the extents immediately */
+static errcode_t e2fsck_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino)
+{
+ struct extent_list list;
+ errcode_t err;
+
+ if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
+ EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+ (ctx->options & E2F_OPT_NO) ||
+ (ino != EXT2_ROOT_INO && ino < ctx->fs->super->s_first_ino))
+ return 0;
+
+ e2fsck_read_bitmaps(ctx);
+ memset(&list, 0, sizeof(list));
+ err = ext2fs_get_mem(sizeof(struct ext2fs_extent) * NUM_EXTENTS,
+ &list.extents);
+ if (err)
+ return err;
+ list.size = NUM_EXTENTS;
+ err = rebuild_extent_tree(ctx, &list, ino);
+ ext2fs_free_mem(&list.extents);
+
+ return err;
+}
+
+static void rebuild_extents(e2fsck_t ctx, const char *pass_name, int pr_header)
+{
+ struct problem_context pctx;
+#ifdef RESOURCE_TRACK
+ struct resource_track rtrack;
+#endif
+ struct extent_list list;
+ int first = 1;
+ ext2_ino_t ino = 0;
+ errcode_t retval;
+
+ if (!EXT2_HAS_INCOMPAT_FEATURE(ctx->fs->super,
+ EXT3_FEATURE_INCOMPAT_EXTENTS) ||
+ !ext2fs_test_valid(ctx->fs) ||
+ ctx->invalid_bitmaps) {
+ if (ctx->inodes_to_rebuild)
+ ext2fs_free_inode_bitmap(ctx->inodes_to_rebuild);
+ ctx->inodes_to_rebuild = NULL;
+ }
+
+ if (ctx->inodes_to_rebuild == NULL)
+ return;
+
+ init_resource_track(&rtrack, ctx->fs->io);
+ clear_problem_context(&pctx);
+ e2fsck_read_bitmaps(ctx);
+
+ memset(&list, 0, sizeof(list));
+ retval = ext2fs_get_mem(sizeof(struct ext2fs_extent) * NUM_EXTENTS,
+ &list.extents);
+ list.size = NUM_EXTENTS;
+ while (1) {
+ retval = ext2fs_find_first_set_inode_bitmap2(
+ ctx->inodes_to_rebuild, ino + 1,
+ ctx->fs->super->s_inodes_count, &ino);
+ if (retval)
+ break;
+ pctx.ino = ino;
+ if (first) {
+ fix_problem(ctx, pr_header, &pctx);
+ first = 0;
+ }
+ pctx.errcode = rebuild_extent_tree(ctx, &list, ino);
+ if (pctx.errcode) {
+ end_problem_latch(ctx, PR_LATCH_OPTIMIZE_EXT);
+ fix_problem(ctx, PR_1E_OPTIMIZE_EXT_ERR, &pctx);
+ }
+ if (ctx->progress && !ctx->progress_fd)
+ e2fsck_simple_progress(ctx, "Rebuilding extents",
+ 100.0 * (float) ino /
+ (float) ctx->fs->super->s_inodes_count,
+ ino);
+ }
+ end_problem_latch(ctx, PR_LATCH_OPTIMIZE_EXT);
+
+ ext2fs_free_inode_bitmap(ctx->inodes_to_rebuild);
+ ctx->inodes_to_rebuild = NULL;
+ ext2fs_free_mem(&list.extents);
+
+ print_resource_track(ctx, pass_name, &rtrack, ctx->fs->io);
+}
+
+/* Scan a file to see if we should rebuild its extent tree */
+errcode_t e2fsck_check_rebuild_extents(e2fsck_t ctx, ext2_ino_t ino,
+ struct ext2_inode *inode,
+ struct problem_context *pctx)
+{
+ struct extent_tree_info eti;
+ struct ext2_extent_info info, top_info;
+ struct ext2fs_extent extent;
+ ext2_extent_handle_t ehandle;
+ ext2_filsys fs = ctx->fs;
+ errcode_t retval;
+
+ /* block map file and we want extent conversion */
+ if (!(inode->i_flags & EXT4_EXTENTS_FL) &&
+ !(inode->i_flags & EXT4_INLINE_DATA_FL) &&
+ (ctx->options & E2F_OPT_CONVERT_BMAP)) {
+ return e2fsck_rebuild_extents_later(ctx, ino);
+ }
+
+ if (!(inode->i_flags & EXT4_EXTENTS_FL))
+ return 0;
+ memset(&eti, 0, sizeof(eti));
+ eti.ino = ino;
+
+ /* Otherwise, go scan the extent tree... */
+ retval = ext2fs_extent_open2(fs, ino, inode, &ehandle);
+ if (retval)
+ return 0;
+
+ retval = ext2fs_extent_get_info(ehandle, &top_info);
+ if (retval)
+ goto out;
+
+ /* Check maximum extent depth */
+ pctx->ino = ino;
+ pctx->blk = top_info.max_depth;
+ pctx->blk2 = ext2fs_max_extent_depth(ehandle);
+ if (pctx->blk2 < pctx->blk &&
+ fix_problem(ctx, PR_1_EXTENT_BAD_MAX_DEPTH, pctx))
+ eti.force_rebuild = 1;
+
+ /* Can we collect extent tree level stats? */
+ pctx->blk = MAX_EXTENT_DEPTH_COUNT;
+ if (pctx->blk2 > pctx->blk)
+ fix_problem(ctx, PR_1E_MAX_EXTENT_TREE_DEPTH, pctx);
+
+ retval = ext2fs_extent_get(ehandle, EXT2_EXTENT_ROOT, &extent);
+ if (retval)
+ goto out;
+
+ do {
+ retval = ext2fs_extent_get_info(ehandle, &info);
+ if (retval)
+ break;
+
+ /*
+ * If this is the first extent in an extent block that we
+ * haven't visited, collect stats on the block.
+ */
+ if (info.curr_entry == 1 &&
+ !(extent.e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT) &&
+ !eti.force_rebuild) {
+ struct extent_tree_level *etl;
+
+ etl = eti.ext_info + info.curr_level;
+ etl->num_extents += info.num_entries;
+ etl->max_extents += info.max_entries;
+ /*
+ * Implementation wart: Splitting extent blocks when
+ * appending will leave the old block with one free
+ * entry. Therefore unless the node is totally full,
+ * pretend that a non-root extent block can hold one
+ * fewer entry than it actually does, so that we don't
+ * repeatedly rebuild the extent tree.
+ */
+ if (info.curr_level &&
+ info.num_entries < info.max_entries)
+ etl->max_extents--;
+ }
+
+ /* Skip to the end of a block of leaf nodes */
+ if (extent.e_flags & EXT2_EXTENT_FLAGS_LEAF) {
+ retval = ext2fs_extent_get(ehandle,
+ EXT2_EXTENT_LAST_SIB,
+ &extent);
+ if (retval)
+ break;
+ }
+
+ retval = ext2fs_extent_get(ehandle, EXT2_EXTENT_NEXT, &extent);
+ } while (retval == 0);
+out:
+ ext2fs_extent_free(ehandle);
+ return e2fsck_should_rebuild_extents(ctx, pctx, &eti, &top_info);
+}
+
+/* Having scanned a file's extent tree, decide if we should rebuild it */
+errcode_t e2fsck_should_rebuild_extents(e2fsck_t ctx,
+ struct problem_context *pctx,
+ struct extent_tree_info *eti,
+ struct ext2_extent_info *info)
+{
+ struct extent_tree_level *ei;
+ int i, j, op;
+ unsigned int extents_per_block;
+
+ if (eti->force_rebuild)
+ goto rebuild;
+
+ extents_per_block = (ctx->fs->blocksize -
+ sizeof(struct ext3_extent_header)) /
+ sizeof(struct ext3_extent);
+ /*
+ * If we can consolidate a level or shorten the tree, schedule the
+ * extent tree to be rebuilt.
+ */
+ for (i = 0, ei = eti->ext_info; i < info->max_depth + 1; i++, ei++) {
+ if (ei->max_extents - ei->num_extents > extents_per_block) {
+ pctx->blk = i;
+ op = PR_1E_CAN_NARROW_EXTENT_TREE;
+ goto rebuild;
+ }
+ for (j = 0; j < i; j++) {
+ if (ei->num_extents < eti->ext_info[j].max_extents) {
+ pctx->blk = i;
+ op = PR_1E_CAN_COLLAPSE_EXTENT_TREE;
+ goto rebuild;
+ }
+ }
+ }
+ return 0;
+
+rebuild:
+ if (eti->force_rebuild || fix_problem(ctx, op, pctx))
+ return e2fsck_rebuild_extents_later(ctx, eti->ino);
+ return 0;
+}
+
+void e2fsck_pass1e(e2fsck_t ctx)
+{
+ rebuild_extents(ctx, "Pass 1E", PR_1E_PASS_HEADER);
+}
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 5d53a84..fd10c72 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -56,6 +56,8 @@
#define _INLINE_ inline
#endif
+#undef DEBUG
+
static int process_block(ext2_filsys fs, blk64_t *blocknr,
e2_blkcnt_t blockcnt, blk64_t ref_blk,
int ref_offset, void *priv_data);
@@ -95,6 +97,7 @@ struct process_block_struct {
e2fsck_t ctx;
blk64_t bad_ref;
region_t region;
+ struct extent_tree_info eti;
};
struct process_inode_block {
@@ -1805,6 +1808,7 @@ void e2fsck_pass1(e2fsck_t ctx)
}
e2fsck_pass1_dupblocks(ctx, block_buf);
}
+ ctx->flags |= E2F_FLAG_ALLOC_OK;
ext2fs_free_mem(&inodes_to_process);
endit:
e2fsck_use_inode_shortcuts(ctx, 0);
@@ -2429,6 +2433,22 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
pctx->errcode = ext2fs_extent_get_info(ehandle, &info);
if (pctx->errcode)
return;
+ if (!pb->eti.force_rebuild) {
+ struct extent_tree_level *etl;
+
+ etl = pb->eti.ext_info + info.curr_level;
+ etl->num_extents += info.num_entries;
+ etl->max_extents += info.max_entries;
+ /*
+ * Implementation wart: Splitting extent blocks when appending
+ * will leave the old block with one free entry. Therefore
+ * unless the node is totally full, pretend that a non-root
+ * extent block can hold one fewer entry than it actually does,
+ * so that we don't repeatedly rebuild the extent tree.
+ */
+ if (info.curr_level && info.num_entries < info.max_entries)
+ etl->max_extents--;
+ }
pctx->errcode = ext2fs_extent_get(ehandle, EXT2_EXTENT_FIRST_SIB,
&extent);
@@ -2765,11 +2785,27 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,
retval = ext2fs_extent_get_info(ehandle, &info);
if (retval == 0) {
- if (info.max_depth >= MAX_EXTENT_DEPTH_COUNT)
- info.max_depth = MAX_EXTENT_DEPTH_COUNT-1;
- ctx->extent_depth_count[info.max_depth]++;
+ int max_depth = info.max_depth;
+
+ if (max_depth >= MAX_EXTENT_DEPTH_COUNT)
+ max_depth = MAX_EXTENT_DEPTH_COUNT-1;
+ ctx->extent_depth_count[max_depth]++;
}
+ /* Check maximum extent depth */
+ pctx->blk = info.max_depth;
+ pctx->blk2 = ext2fs_max_extent_depth(ehandle);
+ if (pctx->blk2 < pctx->blk &&
+ fix_problem(ctx, PR_1_EXTENT_BAD_MAX_DEPTH, pctx))
+ pb->eti.force_rebuild = 1;
+
+ /* Can we collect extent tree level stats? */
+ pctx->blk = MAX_EXTENT_DEPTH_COUNT;
+ if (pctx->blk2 > pctx->blk)
+ fix_problem(ctx, PR_1E_MAX_EXTENT_TREE_DEPTH, pctx);
+ memset(pb->eti.ext_info, 0, sizeof(pb->eti.ext_info));
+ pb->eti.ino = pb->ino;
+
pb->region = region_create(0, info.max_lblk);
if (!pb->region) {
ext2fs_extent_free(ehandle);
@@ -2792,6 +2828,16 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,
region_free(pb->region);
pb->region = NULL;
ext2fs_extent_free(ehandle);
+
+ /* Rebuild unless it's a dir and we're rehashing it */
+ if (LINUX_S_ISDIR(inode->i_mode) &&
+ e2fsck_dir_will_be_rehashed(ctx, ino))
+ return;
+
+ if (ctx->options & E2F_OPT_CONVERT_BMAP)
+ e2fsck_rebuild_extents_later(ctx, ino);
+ else
+ e2fsck_should_rebuild_extents(ctx, pctx, &pb->eti, &info);
}
/*
@@ -2851,6 +2897,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
pb.ctx = ctx;
pb.inode_modified = 0;
pb.bad_ref = 0;
+ pb.eti.force_rebuild = 0;
pctx->ino = ino;
pctx->errcode = 0;
@@ -2914,6 +2961,15 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
"check_blocks");
fs->flags = (flags & EXT2_FLAG_IGNORE_CSUM_ERRORS) |
(fs->flags & ~EXT2_FLAG_IGNORE_CSUM_ERRORS);
+
+ if (ctx->options & E2F_OPT_CONVERT_BMAP) {
+#ifdef DEBUG
+ printf("bmap rebuild ino=%d\n", ino);
+#endif
+ if (!LINUX_S_ISDIR(inode->i_mode) ||
+ !e2fsck_dir_will_be_rehashed(ctx, ino))
+ e2fsck_rebuild_extents_later(ctx, ino);
+ }
}
}
end_problem_latch(ctx, PR_LATCH_BLOCK);
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index a63e61c..07d79f3 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1101,6 +1101,11 @@ static struct e2fsck_problem problem_table[] = {
N_("@i %i has a duplicate @x mapping\n\t(logical @b %c, @n physical @b %b, len %N)\n"),
PROMPT_CLEAR, 0 },
+ /* Inode extent tree could be more shallow */
+ { PR_1_EXTENT_BAD_MAX_DEPTH,
+ N_("@i %i @x tree could be more shallow (%b; could be <= %c)\n"),
+ PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
+
/* Pass 1b errors */
/* Pass 1B: Rescan for duplicate/bad blocks */
@@ -1198,6 +1203,48 @@ static struct e2fsck_problem problem_table[] = {
{ PR_1D_CLONE_ERROR,
N_("Couldn't clone file: %m\n"), PROMPT_NONE, 0 },
+ /* Pass 1E Extent tree optimization */
+
+ /* Pass 1E: Optimizing extent trees */
+ { PR_1E_PASS_HEADER,
+ N_("Pass 1E: Optimizing @x trees\n"),
+ PROMPT_NONE, PR_PREEN_NOMSG },
+
+ /* Failed to optimize extent tree */
+ { PR_1E_OPTIMIZE_EXT_ERR,
+ N_("Failed to optimize @x tree %p (%i): %m\n"),
+ PROMPT_NONE, 0 },
+
+ /* Optimizing extent trees */
+ { PR_1E_OPTIMIZE_EXT_HEADER,
+ N_("Optimizing @x trees: "),
+ PROMPT_NONE, PR_MSG_ONLY },
+
+ /* Rebuilding extent tree %d */
+ { PR_1E_OPTIMIZE_EXT,
+ " %i",
+ PROMPT_NONE, PR_LATCH_OPTIMIZE_EXT | PR_PREEN_NOHDR},
+
+ /* Rebuilding extent tree end */
+ { PR_1E_OPTIMIZE_EXT_END,
+ "\n",
+ PROMPT_NONE, PR_PREEN_NOHDR },
+
+ /* Internal error: extent tree depth too large */
+ { PR_1E_MAX_EXTENT_TREE_DEPTH,
+ N_("Internal error: max extent tree depth too large (%b; expected=%c).\n"),
+ PROMPT_NONE, PR_FATAL },
+
+ /* Inode extent tree could be shorter */
+ { PR_1E_CAN_COLLAPSE_EXTENT_TREE,
+ N_("@i %i @x tree could be shorter.\n\t(level %b is unnecessary)\n"),
+ PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
+
+ /* Inode extent tree could be narrower */
+ { PR_1E_CAN_NARROW_EXTENT_TREE,
+ N_("@i %i @x tree could be narrower.\n\t(level %b has unnecessary nodes)\n"),
+ PROMPT_FIX, PR_NO_OK | PR_PREEN_NO | PR_PREEN_OK },
+
/* Pass 2 errors */
/* Pass 2: Checking directory structure */
@@ -1946,6 +1993,7 @@ static struct latch_descr pr_latch_info[] = {
{ PR_LATCH_TOOBIG, PR_1_INODE_TOOBIG, 0 },
{ PR_LATCH_OPTIMIZE_DIR, PR_3A_OPTIMIZE_DIR_HEADER, PR_3A_OPTIMIZE_DIR_END },
{ PR_LATCH_BG_CHECKSUM, PR_0_GDT_CSUM_LATCH, 0 },
+ { PR_LATCH_OPTIMIZE_EXT, PR_1E_OPTIMIZE_EXT_HEADER, PR_1E_OPTIMIZE_EXT_END },
{ -1, 0, 0 },
};
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 3c28166..4678b23 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -40,6 +40,7 @@ struct problem_context {
#define PR_LATCH_TOOBIG 0x0080 /* Latch for file to big errors */
#define PR_LATCH_OPTIMIZE_DIR 0x0090 /* Latch for optimize directories */
#define PR_LATCH_BG_CHECKSUM 0x00A0 /* Latch for block group checksums */
+#define PR_LATCH_OPTIMIZE_EXT 0x00B0 /* Latch for rebuild extents */
#define PR_LATCH(x) ((((x) & PR_LATCH_MASK) >> 4) - 1)
@@ -641,6 +642,9 @@ struct problem_context {
/* leaf extent collision */
#define PR_1_EXTENT_COLLISION 0x01007D
+/* extent tree max depth too big */
+#define PR_1_EXTENT_BAD_MAX_DEPTH 0x01007E
+
/*
* Pass 1b errors
*/
@@ -704,6 +708,33 @@ struct problem_context {
#define PR_1D_CLONE_ERROR 0x013008
/*
+ * Pass 1e --- rebuilding extent trees
+ */
+/* Pass 1e: Rebuilding extent trees */
+#define PR_1E_PASS_HEADER 0x014000
+
+/* Failed to optimize extent tree */
+#define PR_1E_OPTIMIZE_EXT_ERR 0x014001
+
+/* Rebuilding extent trees */
+#define PR_1E_OPTIMIZE_EXT_HEADER 0x014002
+
+/* Rebuilding extent tree %d */
+#define PR_1E_OPTIMIZE_EXT 0x014003
+
+/* Rebuilding extent tree end */
+#define PR_1E_OPTIMIZE_EXT_END 0x014004
+
+/* Internal error: extent tree depth too large */
+#define PR_1E_MAX_EXTENT_TREE_DEPTH 0x014005
+
+/* Inode extent tree could be shorter */
+#define PR_1E_CAN_COLLAPSE_EXTENT_TREE 0x014006
+
+/* Inode extent tree could be narrower */
+#define PR_1E_CAN_NARROW_EXTENT_TREE 0x014007
+
+/*
* Pass 2 errors
*/
@@ -1032,6 +1063,8 @@ struct problem_context {
/* Rehashing dir end */
#define PR_3A_OPTIMIZE_DIR_END 0x031005
+/* Pass 3B is really just 1E */
+
/*
* Pass 4 errors
*/
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 348923e..dadcfc7 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -749,11 +749,11 @@ static int write_dir_block(ext2_filsys fs,
static errcode_t write_directory(e2fsck_t ctx, ext2_filsys fs,
struct out_dir *outdir,
- ext2_ino_t ino, int compress)
+ ext2_ino_t ino, struct ext2_inode *inode,
+ int compress)
{
struct write_dir_struct wd;
errcode_t retval;
- struct ext2_inode inode;
retval = e2fsck_expand_directory(ctx, ino, -1, outdir->num);
if (retval)
@@ -772,22 +772,23 @@ static errcode_t write_directory(e2fsck_t ctx, ext2_filsys fs,
if (wd.err)
return wd.err;
- e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
+ e2fsck_read_inode(ctx, ino, inode, "rehash_dir");
if (compress)
- inode.i_flags &= ~EXT2_INDEX_FL;
+ inode->i_flags &= ~EXT2_INDEX_FL;
else
- inode.i_flags |= EXT2_INDEX_FL;
- retval = ext2fs_inode_size_set(fs, &inode,
+ inode->i_flags |= EXT2_INDEX_FL;
+ retval = ext2fs_inode_size_set(fs, inode,
outdir->num * fs->blocksize);
if (retval)
return retval;
- ext2fs_iblk_sub_blocks(fs, &inode, wd.cleared);
- e2fsck_write_inode(ctx, ino, &inode, "rehash_dir");
+ ext2fs_iblk_sub_blocks(fs, inode, wd.cleared);
+ e2fsck_write_inode(ctx, ino, inode, "rehash_dir");
return 0;
}
-errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino)
+errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
+ struct problem_context *pctx)
{
ext2_filsys fs = ctx->fs;
errcode_t retval;
@@ -897,10 +898,14 @@ resort:
goto errout;
}
- retval = write_directory(ctx, fs, &outdir, ino, fd.compress);
+ retval = write_directory(ctx, fs, &outdir, ino, &inode, fd.compress);
if (retval)
goto errout;
+ if (ctx->options & E2F_OPT_CONVERT_BMAP)
+ retval = e2fsck_rebuild_extents_later(ctx, ino);
+ else
+ retval = e2fsck_check_rebuild_extents(ctx, ino, &inode, pctx);
errout:
free(dir_buf);
free(fd.harray);
@@ -944,7 +949,7 @@ void e2fsck_rehash_directories(e2fsck_t ctx)
#if 0
fix_problem(ctx, PR_3A_OPTIMIZE_DIR, &pctx);
#endif
- pctx.errcode = e2fsck_rehash_dir(ctx, ino);
+ pctx.errcode = e2fsck_rehash_dir(ctx, ino, &pctx);
if (pctx.errcode) {
end_problem_latch(ctx, PR_LATCH_OPTIMIZE_DIR);
fix_problem(ctx, PR_3A_OPTIMIZE_DIR_ERR, &pctx);
diff --git a/e2fsck/super.c b/e2fsck/super.c
index 1e7e749..e64262a 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -606,6 +606,13 @@ void check_super_block(e2fsck_t ctx)
ext2fs_mark_super_dirty(fs);
}
+ /* Did user ask us to convert files to extents? */
+ if (ctx->options & E2F_OPT_CONVERT_BMAP) {
+ fs->super->s_feature_incompat |=
+ EXT3_FEATURE_INCOMPAT_EXTENTS;
+ ext2fs_mark_super_dirty(fs);
+ }
+
if ((fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) &&
(fs->super->s_first_meta_bg > fs->desc_blocks)) {
pctx.group = fs->desc_blocks;
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index f3672c0..fe5127a 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -709,6 +709,9 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
else
ctx->log_fn = string_copy(ctx, arg, 0);
continue;
+ } else if (strcmp(token, "bmap2extent") == 0) {
+ ctx->options |= E2F_OPT_CONVERT_BMAP;
+ continue;
} else {
fprintf(stderr, _("Unknown extended option: %s\n"),
token);
@@ -728,6 +731,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
fputs(("\tdiscard\n"), stderr);
fputs(("\tnodiscard\n"), stderr);
fputs(("\treadahead_kb=<buffer size>\n"), stderr);
+ fputs(("\tbmap2extent\n"), stderr);
fputc('\n', stderr);
exit(1);
}
diff --git a/tests/f_extent_bad_node/expect.1 b/tests/f_extent_bad_node/expect.1
index 0c0bc28..c9643a1 100644
--- a/tests/f_extent_bad_node/expect.1
+++ b/tests/f_extent_bad_node/expect.1
@@ -2,8 +2,13 @@ Pass 1: Checking inodes, blocks, and sizes
Inode 12 has an invalid extent node (blk 22, lblk 0)
Clear? yes
+Inode 12 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
Inode 12, i_blocks is 16, should be 8. Fix? yes
+Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
@@ -11,13 +16,13 @@ Pass 5: Checking group summary information
Block bitmap differences: -(21--23) -25
Fix? yes
-Free blocks count wrong for group #0 (71, counted=75).
+Free blocks count wrong for group #0 (73, counted=77).
Fix? yes
-Free blocks count wrong (71, counted=75).
+Free blocks count wrong (73, counted=77).
Fix? yes
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
-test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
+test_filesys: 12/16 files (0.0% non-contiguous), 23/100 blocks
Exit status is 1
diff --git a/tests/f_extent_bad_node/expect.2 b/tests/f_extent_bad_node/expect.2
index 568c792..b78b193 100644
--- a/tests/f_extent_bad_node/expect.2
+++ b/tests/f_extent_bad_node/expect.2
@@ -3,5 +3,5 @@ Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
-test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
+test_filesys: 12/16 files (0.0% non-contiguous), 23/100 blocks
Exit status is 0
diff --git a/tests/f_extent_int_bad_magic/expect.1 b/tests/f_extent_int_bad_magic/expect.1
index 0e82e2b..3529636 100644
--- a/tests/f_extent_int_bad_magic/expect.1
+++ b/tests/f_extent_int_bad_magic/expect.1
@@ -2,8 +2,13 @@ Pass 1: Checking inodes, blocks, and sizes
Inode 12 has an invalid extent node (blk 1295, lblk 0)
Clear? yes
+Inode 12 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
Inode 12, i_blocks is 712, should be 0. Fix? yes
+Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
diff --git a/tests/f_extent_leaf_bad_magic/expect.1 b/tests/f_extent_leaf_bad_magic/expect.1
index 7b6dbf1..ae27ecc 100644
--- a/tests/f_extent_leaf_bad_magic/expect.1
+++ b/tests/f_extent_leaf_bad_magic/expect.1
@@ -2,8 +2,13 @@ Pass 1: Checking inodes, blocks, and sizes
Inode 12 has an invalid extent node (blk 1604, lblk 0)
Clear? yes
+Inode 12 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
Inode 12, i_blocks is 18, should be 0. Fix? yes
+Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
diff --git a/tests/f_extent_oobounds/expect.1 b/tests/f_extent_oobounds/expect.1
index 3164ea0..f0e282e 100644
--- a/tests/f_extent_oobounds/expect.1
+++ b/tests/f_extent_oobounds/expect.1
@@ -3,8 +3,13 @@ Inode 12, end of extent exceeds allowed value
(logical block 15, physical block 200, len 30)
Clear? yes
+Inode 12 extent tree could be narrower.
+ (level 1 has unnecessary nodes)
+Fix? yes
+
Inode 12, i_blocks is 154, should be 94. Fix? yes
+Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
@@ -12,13 +17,13 @@ Pass 5: Checking group summary information
Block bitmap differences: -(200--229)
Fix? yes
-Free blocks count wrong for group #0 (156, counted=186).
+Free blocks count wrong for group #0 (158, counted=188).
Fix? yes
-Free blocks count wrong (156, counted=186).
+Free blocks count wrong (158, counted=188).
Fix? yes
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
-test_filesys: 12/32 files (8.3% non-contiguous), 70/256 blocks
+test_filesys: 12/32 files (8.3% non-contiguous), 68/256 blocks
Exit status is 1
diff --git a/tests/f_extent_oobounds/expect.2 b/tests/f_extent_oobounds/expect.2
index 22c4f2c..0729283 100644
--- a/tests/f_extent_oobounds/expect.2
+++ b/tests/f_extent_oobounds/expect.2
@@ -3,5 +3,5 @@ Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
-test_filesys: 12/32 files (8.3% non-contiguous), 70/256 blocks
+test_filesys: 12/32 files (8.3% non-contiguous), 68/256 blocks
Exit status is 0
diff --git a/tests/f_extents/expect.1 b/tests/f_extents/expect.1
index aeebc7b..2751eb9 100644
--- a/tests/f_extents/expect.1
+++ b/tests/f_extents/expect.1
@@ -6,6 +6,10 @@ Inode 12 has an invalid extent
(logical block 0, invalid physical block 21994527527949, len 17)
Clear? yes
+Inode 12 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
Inode 12, i_blocks is 34, should be 0. Fix? yes
Inode 13 missing EXTENT_FL, but is in extents format
@@ -21,6 +25,10 @@ Inode 17 has an invalid extent
(logical block 0, invalid physical block 22011707397135, len 15)
Clear? yes
+Inode 17 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
Inode 17, i_blocks is 32, should be 0. Fix? yes
Error while reading over extent tree in inode 18: Corrupt extent header
@@ -31,6 +39,7 @@ Inode 18, i_blocks is 2, should be 0. Fix? yes
Special (device/socket/fifo) file (inode 19) has extents
or inline-data flag set. Clear? yes
+Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Entry 'fbad-flag' in / (2) has deleted/unused inode 18. Clear? yes
* [PATCH 11/31] tests: verify proper rebuilding of sparse extent trees and block map file conversion
From: Darrick J. Wong @ 2014-12-20 21:17 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
tests/f_collapse_extent_tree/expect.1 | 18 ++++
tests/f_collapse_extent_tree/expect.2 | 10 ++
tests/f_collapse_extent_tree/image.gz | Bin
tests/f_collapse_extent_tree/name | 1
tests/f_collapse_extent_tree/script | 118 +++++++++++++++++++++++++++
tests/f_compress_extent_tree_level/expect.1 | 25 ++++++
tests/f_compress_extent_tree_level/expect.2 | 17 ++++
tests/f_compress_extent_tree_level/image.gz | Bin
tests/f_compress_extent_tree_level/name | 1
tests/f_compress_extent_tree_level/script | 118 +++++++++++++++++++++++++++
tests/f_convert_bmap/expect.1 | 26 ++++++
tests/f_convert_bmap/expect.2 | 10 ++
tests/f_convert_bmap/image.gz | Bin
tests/f_convert_bmap/name | 1
tests/f_convert_bmap/script | 117 +++++++++++++++++++++++++++
tests/f_convert_bmap_and_extent/expect.1 | 33 +++++++
tests/f_convert_bmap_and_extent/expect.2 | 16 ++++
tests/f_convert_bmap_and_extent/image.gz | Bin
tests/f_convert_bmap_and_extent/name | 1
tests/f_convert_bmap_and_extent/script | 119 +++++++++++++++++++++++++++
tests/f_extent_too_deep/expect.1 | 23 +++++
tests/f_extent_too_deep/expect.2 | 10 ++
tests/f_extent_too_deep/image.gz | Bin
tests/f_extent_too_deep/name | 1
tests/f_extent_too_deep/script | 118 +++++++++++++++++++++++++++
tests/f_opt_extent/expect | 55 ++++++++++++
tests/f_opt_extent/name | 1
tests/f_opt_extent/script | 64 +++++++++++++++
tests/f_opt_extent_ext3/expect | 44 ++++++++++
tests/f_opt_extent_ext3/name | 1
tests/f_opt_extent_ext3/script | 65 +++++++++++++++
31 files changed, 1013 insertions(+)
create mode 100644 tests/f_collapse_extent_tree/expect.1
create mode 100644 tests/f_collapse_extent_tree/expect.2
create mode 100644 tests/f_collapse_extent_tree/image.gz
create mode 100644 tests/f_collapse_extent_tree/name
create mode 100644 tests/f_collapse_extent_tree/script
create mode 100644 tests/f_compress_extent_tree_level/expect.1
create mode 100644 tests/f_compress_extent_tree_level/expect.2
create mode 100644 tests/f_compress_extent_tree_level/image.gz
create mode 100644 tests/f_compress_extent_tree_level/name
create mode 100644 tests/f_compress_extent_tree_level/script
create mode 100644 tests/f_convert_bmap/expect.1
create mode 100644 tests/f_convert_bmap/expect.2
create mode 100644 tests/f_convert_bmap/image.gz
create mode 100644 tests/f_convert_bmap/name
create mode 100644 tests/f_convert_bmap/script
create mode 100644 tests/f_convert_bmap_and_extent/expect.1
create mode 100644 tests/f_convert_bmap_and_extent/expect.2
create mode 100644 tests/f_convert_bmap_and_extent/image.gz
create mode 100644 tests/f_convert_bmap_and_extent/name
create mode 100644 tests/f_convert_bmap_and_extent/script
create mode 100644 tests/f_extent_too_deep/expect.1
create mode 100644 tests/f_extent_too_deep/expect.2
create mode 100644 tests/f_extent_too_deep/image.gz
create mode 100644 tests/f_extent_too_deep/name
create mode 100644 tests/f_extent_too_deep/script
create mode 100644 tests/f_opt_extent/expect
create mode 100644 tests/f_opt_extent/name
create mode 100644 tests/f_opt_extent/script
create mode 100644 tests/f_opt_extent_ext3/expect
create mode 100644 tests/f_opt_extent_ext3/name
create mode 100644 tests/f_opt_extent_ext3/script
diff --git a/tests/f_collapse_extent_tree/expect.1 b/tests/f_collapse_extent_tree/expect.1
new file mode 100644
index 0000000..d76880c
--- /dev/null
+++ b/tests/f_collapse_extent_tree/expect.1
@@ -0,0 +1,18 @@
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 0 9 1
+ 1/ 1 1/ 1 0 - 0 10 - 10 1
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 1
diff --git a/tests/f_collapse_extent_tree/expect.2 b/tests/f_collapse_extent_tree/expect.2
new file mode 100644
index 0000000..a1d28b1
--- /dev/null
+++ b/tests/f_collapse_extent_tree/expect.2
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 1 0 - 0 10 - 10 1
diff --git a/tests/f_collapse_extent_tree/image.gz b/tests/f_collapse_extent_tree/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..97036cc597c5d2bcf6da6ffa68039b743b46aa20
GIT binary patch
literal 2537
zcmb2|=3r2~7aqdI{Pxc7?BGBdh6lxY##=IfImEQ^-3nbX`9VZ0->#??lNCe@FW4FQ
z#YVL}TII*Z*HrHjAfO&wvUze}my@ja$2)hVrIT2)v*+IzulqiySl;uUviknXEiDW(
zp{IYUa6fd>vO6yD>gk^~J=eHSyKdZJX8c#hmFMiUr|S)uTs`cRwf(*Jw5JI-Ev9#g
zzx10H^z+x-+1&PezM<j%wN-x#UY^?f?s?&}Z!v%C-)BF5eDl?n_HOO-adCV1e|+^f
zi)Zq}X`-jCz9%p5SsMMuaQ~#F154Q%4#=Ijd_P(9SoF_&+pXIBr(Hc<aafRvfuW&!
zy=4ribnWdMatsU%H@<&*|36zXO?crCT?gHin@3zeO5c0DaMs;f`-AOklmETD^R*_D
z<Gowbo@dFzK=l>@Y`c&Bzwh4l(H_VvIMHLp0;CiE@B>MS|I9!V1gcl=k~>^-C+GRq
zERBES2J6qaJ*xkI`|HBKUj^>6K=~WTAJ?~oRpKF>ikGbB1<K1<*)cNWQAa>3L0jxv
z+Ope)Q@;H^{nJEJ_gnqOo+Qus7x`2FDlW?`wQch{Ubb=Tv2VRgcl<QbdwS81f1O)`
zxz73}p2?fs|I45L_x?U_+1AJMzxKcTu9(hR=B|9cUak6ReWXR?Gy8?^|AIA>)j!SO
zTl3dGUh~(|qxV1UpRv3(_UHXHezOY~k3TI=558jc%31d7spkCte`^0vE7$)i)8ioE
m=uzp>5Eu=C(GVC7fzc2c4FPgP;J~%NEDm#Conv57U;qF<XAwLA
literal 0
HcmV?d00001
diff --git a/tests/f_collapse_extent_tree/name b/tests/f_collapse_extent_tree/name
new file mode 100644
index 0000000..83e506f
--- /dev/null
+++ b/tests/f_collapse_extent_tree/name
@@ -0,0 +1 @@
+extent tree can be collapsed one level
diff --git a/tests/f_collapse_extent_tree/script b/tests/f_collapse_extent_tree/script
new file mode 100644
index 0000000..ee18438
--- /dev/null
+++ b/tests/f_collapse_extent_tree/script
@@ -0,0 +1,118 @@
+if [ "$DESCRIPTION"x != x ]; then
+ test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+ IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+ FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+ SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+ OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+ OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+ if [ -f $test_dir/expect.1.gz ]; then
+ EXP1=$test_name.1.tmp
+ gunzip < $test_dir/expect.1.gz > $EXP1
+ else
+ EXP1=$test_dir/expect.1
+ fi
+fi
+
+if [ "$EXP2"x = x ]; then
+ if [ -f $test_dir/expect.2.gz ]; then
+ EXP2=$test_name.2.tmp
+ gunzip < $test_dir/expect.2.gz > $EXP2
+ else
+ EXP2=$test_dir/expect.2
+ fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+ gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+if [ "$ONE_PASS_ONLY" != "true" ]; then
+ $FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1
+ status=$?
+ echo Exit status is $status >> $OUT2.new
+ echo 'ex /a' > $TMPFILE.cmd
+ $DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+ rm -rf $TMPFILE.cmd
+ sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+ rm -f $OUT2.new
+fi
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+ rm -f $test_name.ok $test_name.failed
+ cmp -s $OUT1 $EXP1
+ status1=$?
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ cmp -s $OUT2 $EXP2
+ status2=$?
+ else
+ status2=0
+ fi
+ if [ "$PASS_ZERO" = "true" ]; then
+ cmp -s $test_name.0.log $test_dir/expect.0
+ status3=$?
+ else
+ status3=0
+ fi
+
+ if [ -z "$test_description" ] ; then
+ description="$test_name"
+ else
+ description="$test_name: $test_description"
+ fi
+
+ if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+ echo "$description: ok"
+ touch $test_name.ok
+ else
+ echo "$description: failed"
+ rm -f $test_name.failed
+ if [ "$PASS_ZERO" = "true" ]; then
+ diff $DIFF_OPTS $test_dir/expect.0 \
+ $test_name.0.log >> $test_name.failed
+ fi
+ diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+ fi
+ fi
+ rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+ unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2
+ unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+ unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_compress_extent_tree_level/expect.1 b/tests/f_compress_extent_tree_level/expect.1
new file mode 100644
index 0000000..45dcb39
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/expect.1
@@ -0,0 +1,25 @@
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 2 0 - 16 9 17
+ 1/ 1 1/ 4 0 - 0 10 - 10 1
+ 1/ 1 2/ 4 11 - 11 100 - 100 1
+ 1/ 1 3/ 4 13 - 13 101 - 101 1
+ 1/ 1 4/ 4 15 - 15 102 - 102 1
+ 0/ 1 2/ 2 17 - 21 12 5
+ 1/ 1 1/ 3 17 - 17 103 - 103 1
+ 1/ 1 2/ 3 19 - 19 104 - 104 1
+ 1/ 1 3/ 3 21 - 21 105 - 105 1
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 extent tree could be narrower.
+ (level 1 has unnecessary nodes)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (8.3% non-contiguous), 26/512 blocks
+Exit status is 1
diff --git a/tests/f_compress_extent_tree_level/expect.2 b/tests/f_compress_extent_tree_level/expect.2
new file mode 100644
index 0000000..07d1082
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/expect.2
@@ -0,0 +1,17 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (8.3% non-contiguous), 26/512 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 21 9 22
+ 1/ 1 1/ 7 0 - 0 10 - 10 1
+ 1/ 1 2/ 7 11 - 11 100 - 100 1
+ 1/ 1 3/ 7 13 - 13 101 - 101 1
+ 1/ 1 4/ 7 15 - 15 102 - 102 1
+ 1/ 1 5/ 7 17 - 17 103 - 103 1
+ 1/ 1 6/ 7 19 - 19 104 - 104 1
+ 1/ 1 7/ 7 21 - 21 105 - 105 1
diff --git a/tests/f_compress_extent_tree_level/image.gz b/tests/f_compress_extent_tree_level/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..a552a586ce20e4fbc01d6f332ec800c5c5b6011f
GIT binary patch
literal 2581
zcmb2|=3rR!C_IFT`RyHR9}!0hh6gkEYCmX@`O&3zM1osdIA^Jlw(gQiEytLI<J9c9
z&L3%E^Y!W3JGZlUZ`ds<>4n?Hq`jLtg`|b&q_WGat95BG-FN>tx4!H<U)lS-AJ6lO
z6&ZG4&Dp8pWHDvdcEgsixcDa#cN4c|Jrj3xXS?6icVxq+qB_I2E!8u;UtfEt{cM|d
zx!<y9uDd4Byy90XBb_r>eqZ_jr)RIH>*?If{BP|Sv-8%k+sAkPd2;YL_u=Paa{ab8
z7R65wT$X5aDi6&4{msJ6yZ%?_>ox0_zxZ;6nIVCD%G3ArMBKw){||e0eSc1N>&EnR
z0t^fcIrZA@uO9}lJ$3)OSwS^W;KBazx_|d&lMDq`Wi#$_4&>npx&GlDckiQ5TVJnz
zJo$QFjI93l)(@*BAG_wy2C6@>?b21V|Nk$ndG;T~<GcJ?1V}ghv<H$2|M-C<2;?k1
z7G&FVWc^>G+?H3>jHhdCT>a0f2bmPx&0M7aJXrJFvl^?T|K`LORWSmc@PEdCYgULx
zh7E5`r)9;j`(JJgG7%fdIpei?Ey$ub7WWu|+Q|hE%FpQb^sPCXXZ5ap`Q=61t^WO#
zxjHTCMMSQ(lh5g6K|Ejc^Tfly|0w>exlI4=-q>Te9zVFr@rQf!_Dhd`Ua)#xf9CQ1
zW2IZaKK4Jh|J#3OInf{H^QM12f1&DYykOsxU+Qau|6eK2ar?V@>iYlgPk)$;9sa}r
zHJ&GZ|LXr;C41uCH=MikZRv}1mHu*%)xaV1=-A3zpVnWgPo1*Aeytb_x&9i}HyQ$?
fAut*OqaiRF0;3^7AOs$KuVp*dzh0JsL4g4PCSfgc
literal 0
HcmV?d00001
diff --git a/tests/f_compress_extent_tree_level/name b/tests/f_compress_extent_tree_level/name
new file mode 100644
index 0000000..fde4f4a
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/name
@@ -0,0 +1 @@
+compress an extent tree level
diff --git a/tests/f_compress_extent_tree_level/script b/tests/f_compress_extent_tree_level/script
new file mode 100644
index 0000000..ee18438
--- /dev/null
+++ b/tests/f_compress_extent_tree_level/script
@@ -0,0 +1,118 @@
+if [ "$DESCRIPTION"x != x ]; then
+ test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+ IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+ FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+ SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+ OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+ OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+ if [ -f $test_dir/expect.1.gz ]; then
+ EXP1=$test_name.1.tmp
+ gunzip < $test_dir/expect.1.gz > $EXP1
+ else
+ EXP1=$test_dir/expect.1
+ fi
+fi
+
+if [ "$EXP2"x = x ]; then
+ if [ -f $test_dir/expect.2.gz ]; then
+ EXP2=$test_name.2.tmp
+ gunzip < $test_dir/expect.2.gz > $EXP2
+ else
+ EXP2=$test_dir/expect.2
+ fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+ gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+if [ "$ONE_PASS_ONLY" != "true" ]; then
+ $FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1
+ status=$?
+ echo Exit status is $status >> $OUT2.new
+ echo 'ex /a' > $TMPFILE.cmd
+ $DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+ rm -rf $TMPFILE.cmd
+ sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+ rm -f $OUT2.new
+fi
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+ rm -f $test_name.ok $test_name.failed
+ cmp -s $OUT1 $EXP1
+ status1=$?
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ cmp -s $OUT2 $EXP2
+ status2=$?
+ else
+ status2=0
+ fi
+ if [ "$PASS_ZERO" = "true" ]; then
+ cmp -s $test_name.0.log $test_dir/expect.0
+ status3=$?
+ else
+ status3=0
+ fi
+
+ if [ -z "$test_description" ] ; then
+ description="$test_name"
+ else
+ description="$test_name: $test_description"
+ fi
+
+ if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+ echo "$description: ok"
+ touch $test_name.ok
+ else
+ echo "$description: failed"
+ rm -f $test_name.failed
+ if [ "$PASS_ZERO" = "true" ]; then
+ diff $DIFF_OPTS $test_dir/expect.0 \
+ $test_name.0.log >> $test_name.failed
+ fi
+ diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+ fi
+ fi
+ rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+ unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2
+ unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+ unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_convert_bmap/expect.1 b/tests/f_convert_bmap/expect.1
new file mode 100644
index 0000000..7d2ca86
--- /dev/null
+++ b/tests/f_convert_bmap/expect.1
@@ -0,0 +1,26 @@
+debugfs: stat /a
+Inode: 12 Type: regular Mode: 0644 Flags: 0x0
+Generation: 1573716129 Version: 0x00000000:00000001
+User: 0 Group: 0 Size: 524288
+File ACL: 0 Directory ACL: 0
+Links: 1 Blockcount: 1030
+Fragment: Address: 0 Number: 0 Size: 0
+ ctime: 0x5457f87a:62ae2980 -- Mon Nov 3 21:49:46 2014
+ atime: 0x5457f87a:61ba0598 -- Mon Nov 3 21:49:46 2014
+ mtime: 0x5457f87a:62ae2980 -- Mon Nov 3 21:49:46 2014
+crtime: 0x5457f87a:61ba0598 -- Mon Nov 3 21:49:46 2014
+Size of extra inode fields: 28
+BLOCKS:
+(0-11):1025-1036, (IND):24, (12-267):1037-1292, (DIND):25, (IND):41, (268-511):1293-1536
+TOTAL: 515
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (8.3% non-contiguous), 570/2048 blocks
+Exit status is 1
diff --git a/tests/f_convert_bmap/expect.2 b/tests/f_convert_bmap/expect.2
new file mode 100644
index 0000000..632d411
--- /dev/null
+++ b/tests/f_convert_bmap/expect.2
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 570/2048 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 1 0 - 511 1025 - 1536 512
diff --git a/tests/f_convert_bmap/image.gz b/tests/f_convert_bmap/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..7c22532397ba8d42e928f75190dc545cae2140cf
GIT binary patch
literal 3548
zcmeH{TTD|27{|LAMF+@(0Tox+B<ivR0l73*3mPNpq{>AKJy;kpC>$)zXsxs>&77h{
z5d{R1Tf|{+oO3i%(L%x52w{Q@ZBMl=7l%+hhteTCTuLv?;vP1$_wAwI!}sN%d{4js
zKY#y=D@kV;l8$`5%sjNJl`~FyG$XVKkeS>DH}6f}NR0H7XZeLMHZ~n7S%IBv`1#=N
zvthAe6*sSJFDc(vQMvN!_C4jFe<j65!*^OUm={)9=G$@69F1o=F+fw#*9&#jyvB6W
zL}KW^E$$xb{J^bs+dN*RTpr^8kjSR!^SeC_oE_k?>(^gS=<hNObc@YMH>Kvk-p-E5
z35ksKaBDTb(PT!@CVwq0qb~h&i*n<*7OXbTS(15(N<89733W!X1W$_EjVBkiwdO+O
zgdm1$#XPIm+f@lYY^0qqFBCAP$X(HIv%QKlVn5zT?)`;ZVOpN;uzxq3w!?I(<aK$X
zaS!SJe5<2`goNateEzncYj1mM?y%OZ3HI8cvU9NDh)8IB%F@##rq|m2I-85nb<Z2w
zU2pNJ`vVSvg$htSs_DKP485(ZoNIkL`xm<MpHIzr^#yEhcYn0tc1iSaYjk?luhhO5
z90Xc?kWFAOAghA2z*hiVg?NKc04W0&pkFFLh%qk{=rBd}kQi-QLIK4oC}Cc{q-zD*
zP`R6C8NQns<?n~mydY0dt(W*QypVmG06dPE)IW%j((~@|V>qd0P#|_(kwxolh)vP1
z!4D|7w1*Ax2F-m!!v{D?hY_}Bj_Bhv%&`NoNpo3|U_RQefeE7?I)#|DeuRy`+sqVb
zhZToacT$fmp+(`UIb^NwAlA&?)i0^me$S019}PlkxcgO2tDz)Rj%kS-d=8m`$kjMO
z6!SC5aRssfTtZb|mQr*n*h?xr3>3)6@Uzsrhh!CaB~>w;YLUFa>Is<7Q;82Dp_q#3
z<csQ={t_?rJf<eDT62Ug&tzq~S40U_9La2yoxnS+C+pK1rS8~c>oFl5nU*FALaVW-
z#5F6(I+7;8h~wM?EMP?P6ssj5Wk)A#L~23336-^o#f*~pqh(kRQDM!sx4^PST@Y4H
z$d$?>mQGWE>8%_)T$Cn~;8+<UR(9q~qI9L)1bcHnoC0S2?i-o4t{d#wtbKDMepYK!
z><UbF*FFwcpxc%|-kFy%XsC0WSf9P?uV}qL5AcUQmFdYp`an;>8K9}Er5St++^&Mg
z!A0OEqvaYnu4F^;aunnuN*<J&rxAkvO3UQwWiHGAd&`@O7ag;OUt9kyt+v8kM|<{I
zIyy5D-6rU+%SLL#@YyT-x{F^mR@l1kJP-CwGEIen(_^EFs~!aO?fhmP`84O<$bd6b
zozXVSD_9JACTbIs-$@>k$83X+0lqWGS@2!*G{ZnA+mYmpn#n71w;rbdEhKizylA>_
zMx@ViZ17o^Z5wioybK<B(`9y+686({S#NlAG;TtcgFgWhA6R-TGwW2%xanw*RK0hH
z?Vcg(1Qg2i3)j1S%5oZoyZ;f(55(v*iZt4vOKqNzFP<OV(zIHAX4?0IAD46ya1n43
aa1n43a1r?55SY$(e6qN4MIedv8R-w=i-Ai3
literal 0
HcmV?d00001
diff --git a/tests/f_convert_bmap/name b/tests/f_convert_bmap/name
new file mode 100644
index 0000000..67e0d47
--- /dev/null
+++ b/tests/f_convert_bmap/name
@@ -0,0 +1 @@
+convert blockmap file to extents file
diff --git a/tests/f_convert_bmap/script b/tests/f_convert_bmap/script
new file mode 100644
index 0000000..f6b6f62
--- /dev/null
+++ b/tests/f_convert_bmap/script
@@ -0,0 +1,117 @@
+if [ "$DESCRIPTION"x != x ]; then
+ test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+ IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+ FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+ SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+ OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+ OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+ if [ -f $test_dir/expect.1.gz ]; then
+ EXP1=$test_name.1.tmp
+ gunzip < $test_dir/expect.1.gz > $EXP1
+ else
+ EXP1=$test_dir/expect.1
+ fi
+fi
+
+if [ "$EXP2"x = x ]; then
+ if [ -f $test_dir/expect.2.gz ]; then
+ EXP2=$test_name.2.tmp
+ gunzip < $test_dir/expect.2.gz > $EXP2
+ else
+ EXP2=$test_dir/expect.2
+ fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+ gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'stat /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$TUNE2FS -O extent $TMPFILE >> $OUT1.new 2>&1
+$FSCK $FSCK_OPT -E bmap2extent -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT2.new
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+rm -rf $TMPFILE.cmd
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+rm -f $OUT2.new
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+ rm -f $test_name.ok $test_name.failed
+ cmp -s $OUT1 $EXP1
+ status1=$?
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ cmp -s $OUT2 $EXP2
+ status2=$?
+ else
+ status2=0
+ fi
+ if [ "$PASS_ZERO" = "true" ]; then
+ cmp -s $test_name.0.log $test_dir/expect.0
+ status3=$?
+ else
+ status3=0
+ fi
+
+ if [ -z "$test_description" ] ; then
+ description="$test_name"
+ else
+ description="$test_name: $test_description"
+ fi
+
+ if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+ echo "$description: ok"
+ touch $test_name.ok
+ else
+ echo "$description: failed"
+ rm -f $test_name.failed
+ if [ "$PASS_ZERO" = "true" ]; then
+ diff $DIFF_OPTS $test_dir/expect.0 \
+ $test_name.0.log >> $test_name.failed
+ fi
+ diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+ fi
+ fi
+ rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+ unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2
+ unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+ unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_convert_bmap_and_extent/expect.1 b/tests/f_convert_bmap_and_extent/expect.1
new file mode 100644
index 0000000..7af91aa
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/expect.1
@@ -0,0 +1,33 @@
+debugfs: stat /a
+Inode: 12 Type: regular Mode: 0644 Flags: 0x0
+Generation: 1573716129 Version: 0x00000000:00000001
+User: 0 Group: 0 Size: 524288
+File ACL: 0 Directory ACL: 0
+Links: 1 Blockcount: 1030
+Fragment: Address: 0 Number: 0 Size: 0
+ ctime: 0x5457f87a:62ae2980 -- Mon Nov 3 21:49:46 2014
+ atime: 0x5457f87a:61ba0598 -- Mon Nov 3 21:49:46 2014
+ mtime: 0x5457f87a:62ae2980 -- Mon Nov 3 21:49:46 2014
+crtime: 0x5457f87a:61ba0598 -- Mon Nov 3 21:49:46 2014
+Size of extra inode fields: 28
+BLOCKS:
+(0-11):1025-1036, (IND):24, (12-267):1037-1292, (DIND):25, (IND):41, (268-511):1293-1536
+TOTAL: 515
+
+debugfs: ex /zero
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 8 28 9
+ 1/ 1 1/ 4 0 - 0 27 - 27 1
+ 1/ 1 2/ 4 2 - 2 29 - 29 1
+ 1/ 1 3/ 4 4 - 4 31 - 31 1
+ 1/ 1 4/ 4 6 - 6 33 - 33 1
+Pass 1: Checking inodes, blocks, and sizes
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 13/128 files (15.4% non-contiguous), 574/2048 blocks
+Exit status is 1
diff --git a/tests/f_convert_bmap_and_extent/expect.2 b/tests/f_convert_bmap_and_extent/expect.2
new file mode 100644
index 0000000..73765ea
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/expect.2
@@ -0,0 +1,16 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 13/128 files (7.7% non-contiguous), 574/2048 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 1 0 - 511 1025 - 1536 512
+debugfs: ex /zero
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 4 0 - 0 27 - 27 1
+ 0/ 0 2/ 4 2 - 2 29 - 29 1
+ 0/ 0 3/ 4 4 - 4 31 - 31 1
+ 0/ 0 4/ 4 6 - 6 33 - 33 1
diff --git a/tests/f_convert_bmap_and_extent/image.gz b/tests/f_convert_bmap_and_extent/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..916b493c710843030b453b548be0b6adcd12c48d
GIT binary patch
literal 3657
zcmeIyTToMX9tUvLscfwayK5@~0__Zy*%bt#ks1VQS!7xUxg-RLBtct56D}f#m|%je
z4(XO+RzbXANW}$m;UwG+mq0*^Ra<Uj&B>8)$s$(poDBi>PvC?kJG0$K-}i;z!}sAg
zzo*Z9=jT`PI~P@THFe`A^VGW?Il$%alCvc2Nobha)|k?9a!YJ>-XH&z5>s=9R_{gH
z=JkvJRiWZw+oAU_2l@M#P_IDM(l-Cwf0$Wym}>Wve?8-$k`pE4^;@Q`4(o`}>MR|w
z4HS&pOc6U=DwO;|Ui{Y0uAl7Z{hSrj`|?qOM|w{E%;}<nr@|MWujj^gdVW`FE;c_}
zIbB3d{t)XSn(BDRFBkc`_wl`deKYf>dUe8&ztkCUOE~=ZiD4akDn8#K%q?F0s3_lQ
z2o$Nhn0rf1-oZM7_`+f1(%mltm^$$DpoK@w-_Hx5y5y`Om3Xgu$Yh!L>f_nNS!{^u
zEs;3KFFan7jlSFNs`8ol-RSb)IUh05+S;l+BnaR4(%I>l__bIqn%)ukLwTuVpD-kp
zeCOr;wfXt)Jhzn&c0J3&<`1?X9~*yuK11x+W?JlA%8pyk*HWd?Ctp39w7b`42XEme
zJjvhZ(8xa7>nMx8|21#J4(ItL_qWC52Z6{_mbONb&FP`{)8rp;8t6SVMJflT>kM)d
z2htva7`R@`wbC^PO^H5|bHUiMPg%*S(x&Cn!`SV{z6FL=VKZ!pv!^r*tRsj4PKf;8
zH{D(7ipF5K;k>-G3Dga=7+Zar5_z2AFpwL2d>InOL&F}>ZG?+eH^7SEb2BrsIRL4`
z{|v{OGZG;kb`s{9DK>)$F7-)D)a2;Pz_5VkLv=+zBB|;aV#85V?+g75xFyJs)LfGH
zfHlEEpN!z<J8Y&8BUt$r_9HjfC+#5`k2Qcc0bf_o%+M&)P!0xjKbNMz#<_qH#7L<H
z%~f~;cp-?_!PB$?4F&T7qhxqG)uMiay%8kpx<=(z#D)(Op9x!{sD5gRJ_9@^(uIAa
z>O}p1@Do85YDb|lJs*q{nZo}5Bvx}Uat&W2z7)cH<nQ721f7w|(!^ulV1%Fvd;2NZ
z4B22x+ozUW5G(Eq28m>Qw<skVk|9^QsrEim@<1~SK>)686j4w^0nP(9ty-;#*8Aa!
z+!%Y0BPHCx!V7@a%R5S(=(+U}yJHvf{ANl8@(liKPzHrVPta8Ud*ji|q|W7$sLk6w
zm(RD%-!uk$h3giPML{ZO>#osvLT<4*_e$N}=EUMH6+zcoyb#fN%Mu@L`(Gh{Yo)WY
zp>gZxcr+Bw4N4nAx4_3B>TdKeFdL$&P%rp6l*&`V94WnPfr%poSE9|yOY(Lb3s3S!
zcX2hwj^p~{Fn_9dfntS%*h5;@)Ig^a;?xR0i)`_wx~s+d7|w)I-Kp`xj>2C>_W34D
znx$AWTokGGrAQzKRtR&<v@m=LTr}$H(i4>dD1|LCw%5^1kSA=5v1u8yvz_Gc>zHHm
zP52GaOqRQ;W7xFFUN>lvT|A-~#P@?q0n;bB2ww$%BeyZr<B&eGmPvi6ufU&@)t5Ba
z^znE$cqNF{wJ}pQYE;j`FOpkdpg0CVIax8C9)Q+jBcMjW(LvM9CVd1Z<+63EX+{(B
z64VI_q{$YQ6}<t<h%BKzD$`HVj@<#R#91jrCNILOfPy$9Jt9-Q#@ay-aZY%<Kb@rv
zMpwaAg3U<sFvJ2_+uhH+rVqzz0YnrFoAy9MSTC2)NH(b~s5>SGcZfp!KSlIt)oxr#
z@a^@Yi~*$zN4W|1Mm1zISTHM>!C;ljqYW~woG7w4<Tno(Oqhj4n++bdxsUW5wcdE|
zNcD0lnj5jBv~2cN@7$gf&uYGy)s*e@>VHsZbe^zt+{$OMTRL&jxt#}gZ{S_z$GMjW
zFl)~Z*`1HF#wS(_EN==HocWFe&y&j{9`FZeWg|{x$uysZ)CPtNS9*^HF^2S~Of8u~
ztlj!=;S?xSHThsx=A2uJ=5{;11NUApshAzOeD7G5lfM@J%i4(q<xixB@~&WJy#5La
zZbaV9^n}`OBHNp1tMA2@OdFTSCgz8Q<-eC~5aX<~foJ1B<#gL|tL^#PYM4joDOHo(
zu7%p$3bgXf^#i9X9-0lK+d@x<6xRJ?;?;_?$o|_?eS4mo3+X#Nw`i17Hg7}Cndd?Z
pk+F}bNN+!0->wU+3#<#Q3#<#Q3;h2DCVyPpWW8%MxVU`i@*iE}y4(N&
literal 0
HcmV?d00001
diff --git a/tests/f_convert_bmap_and_extent/name b/tests/f_convert_bmap_and_extent/name
new file mode 100644
index 0000000..c9394c6
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/name
@@ -0,0 +1 @@
+convert blockmap and extents files to extents files
diff --git a/tests/f_convert_bmap_and_extent/script b/tests/f_convert_bmap_and_extent/script
new file mode 100644
index 0000000..203ab25
--- /dev/null
+++ b/tests/f_convert_bmap_and_extent/script
@@ -0,0 +1,119 @@
+if [ "$DESCRIPTION"x != x ]; then
+ test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+ IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+ FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+ SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+ OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+ OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+ if [ -f $test_dir/expect.1.gz ]; then
+ EXP1=$test_name.1.tmp
+ gunzip < $test_dir/expect.1.gz > $EXP1
+ else
+ EXP1=$test_dir/expect.1
+ fi
+fi
+
+if [ "$EXP2"x = x ]; then
+ if [ -f $test_dir/expect.2.gz ]; then
+ EXP2=$test_name.2.tmp
+ gunzip < $test_dir/expect.2.gz > $EXP2
+ else
+ EXP2=$test_dir/expect.2
+ fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+ gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'stat /a' > $TMPFILE.cmd
+echo 'ex /zero' >> $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$TUNE2FS -O extent $TMPFILE >> $OUT1.new 2>&1
+$FSCK $FSCK_OPT -E bmap2extent -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+$FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT2.new
+echo 'ex /a' > $TMPFILE.cmd
+echo 'ex /zero' >> $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+rm -rf $TMPFILE.cmd
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+rm -f $OUT2.new
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+ rm -f $test_name.ok $test_name.failed
+ cmp -s $OUT1 $EXP1
+ status1=$?
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ cmp -s $OUT2 $EXP2
+ status2=$?
+ else
+ status2=0
+ fi
+ if [ "$PASS_ZERO" = "true" ]; then
+ cmp -s $test_name.0.log $test_dir/expect.0
+ status3=$?
+ else
+ status3=0
+ fi
+
+ if [ -z "$test_description" ] ; then
+ description="$test_name"
+ else
+ description="$test_name: $test_description"
+ fi
+
+ if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+ echo "$description: ok"
+ touch $test_name.ok
+ else
+ echo "$description: failed"
+ rm -f $test_name.failed
+ if [ "$PASS_ZERO" = "true" ]; then
+ diff $DIFF_OPTS $test_dir/expect.0 \
+ $test_name.0.log >> $test_name.failed
+ fi
+ diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+ fi
+ fi
+ rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+ unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2
+ unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+ unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_extent_too_deep/expect.1 b/tests/f_extent_too_deep/expect.1
new file mode 100644
index 0000000..a595482
--- /dev/null
+++ b/tests/f_extent_too_deep/expect.1
@@ -0,0 +1,23 @@
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 7 1/ 1 0 - 0 12 1
+ 1/ 7 1/ 1 0 - 0 13 1
+ 2/ 7 1/ 1 0 - 0 14 1
+ 3/ 7 1/ 1 0 - 0 15 1
+ 4/ 7 1/ 1 0 - 0 16 1
+ 5/ 7 1/ 1 0 - 0 17 1
+ 6/ 7 1/ 1 0 - 0 9 1
+ 7/ 7 1/ 1 0 - 0 10 - 10 1
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 extent tree could be more shallow (7; could be <= 4)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 1
diff --git a/tests/f_extent_too_deep/expect.2 b/tests/f_extent_too_deep/expect.2
new file mode 100644
index 0000000..a1d28b1
--- /dev/null
+++ b/tests/f_extent_too_deep/expect.2
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 19/512 blocks
+Exit status is 0
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 1 0 - 0 10 - 10 1
diff --git a/tests/f_extent_too_deep/image.gz b/tests/f_extent_too_deep/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..0f5adff562c7f45f275a4401344e784b6c61ef0b
GIT binary patch
literal 2592
zcmb2|=3wx?6duCF{PymCf8jtGh7aG@g)R_rT-?mre9>1}CPu>N$ctAhoXss6N_=iF
z5@Jgn)7j0{g?;~U$MjrWSXgN2BJ`haZVz)`*522$TUX9#P_RGs-1*+#-}m|t)ke$R
z5APFT+7Par(yh|6IP<$^TY&yHb=Pd!4HL>YcLz^?KYf{6a)z0HeXpXe&6y>)!{oHZ
z{w4_aEWdmEi{rbce>>ZI-7<e&n#^wgT>k9*+V6{hzqYV7kN<!F`meNSFQ+Q6*V|Y3
z^T*9EtryRSC(q!xs&fB&rR?lflfNCSbM%+z9`NR8Xvm!~b$|P%vOS^yuV3DA=UzC!
zdV2~70|P_Z{%7a+eb~FD@iQX>LxZgT@4xk?M^bd2d}j)Ap48L(^!o=nzNt^b-j@Bn
z(tB&}&(}Bi3E1%+uQ+>K4XAm;YM!#Q|M&Ag|6C8`EjZre0W^eR#(!oY`RqTC0D%K#
zt8|VVNY)>}nWdV0b3e-(v5!&LK$gwTUKeJq&J9t;uwnPp@BiQDA9oI70*csL0e#oU
z@4*RVe|hm||DC?}r7S>pRms2jtmDcsHU76gI%<N<lAHVgc$tlYDp1LV;#slvAMS%y
z_mw*UMaTsa{WFiQJUMmS^mV^a|12?_6#qZx;wRgs7w*6M?d&ss>QjkRvd@oR6O=Ze
zwT(YrEiNVZY2}A3mTiX){VlmTBO>)r|Ir`c-?Qz0^ke$R^}nA_NN3&Vro6gdEo}Aw
zNd;?G)F0CN>c2B%cHsY<=imQ-{{3@@$#3_+;+waWT&=Ht`+1dYa@5urll#5~9k0rL
z`R+&5tNm+I=hrV419@x|jE2By2#kinXb6mkz-S22A_Nlt)-hg8%bdf&puhkC@B%oQ
literal 0
HcmV?d00001
diff --git a/tests/f_extent_too_deep/name b/tests/f_extent_too_deep/name
new file mode 100644
index 0000000..7e8654a
--- /dev/null
+++ b/tests/f_extent_too_deep/name
@@ -0,0 +1 @@
+extent tree is deeper than it needs to be
diff --git a/tests/f_extent_too_deep/script b/tests/f_extent_too_deep/script
new file mode 100644
index 0000000..ee18438
--- /dev/null
+++ b/tests/f_extent_too_deep/script
@@ -0,0 +1,118 @@
+if [ "$DESCRIPTION"x != x ]; then
+ test_description="$DESCRIPTION"
+fi
+if [ "$IMAGE"x = x ]; then
+ IMAGE=$test_dir/image.gz
+fi
+
+if [ "$FSCK_OPT"x = x ]; then
+ FSCK_OPT=-yf
+fi
+
+if [ "$SECOND_FSCK_OPT"x = x ]; then
+ SECOND_FSCK_OPT=-yf
+fi
+
+if [ "$OUT1"x = x ]; then
+ OUT1=$test_name.1.log
+fi
+
+if [ "$OUT2"x = x ]; then
+ OUT2=$test_name.2.log
+fi
+
+if [ "$EXP1"x = x ]; then
+ if [ -f $test_dir/expect.1.gz ]; then
+ EXP1=$test_name.1.tmp
+ gunzip < $test_dir/expect.1.gz > $EXP1
+ else
+ EXP1=$test_dir/expect.1
+ fi
+fi
+
+if [ "$EXP2"x = x ]; then
+ if [ -f $test_dir/expect.2.gz ]; then
+ EXP2=$test_name.2.tmp
+ gunzip < $test_dir/expect.2.gz > $EXP2
+ else
+ EXP2=$test_dir/expect.2
+ fi
+fi
+
+if [ "$SKIP_GUNZIP" != "true" ] ; then
+ gunzip < $IMAGE > $TMPFILE
+fi
+
+cp /dev/null $OUT1
+
+eval $PREP_CMD
+
+echo 'ex /a' > $TMPFILE.cmd
+$DEBUGFS -f $TMPFILE.cmd $TMPFILE > $OUT1.new 2>&1
+rm -rf $TMPFILE.cmd
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT1.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT1.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT1.new >> $OUT1
+rm -f $OUT1.new
+
+if [ "$ONE_PASS_ONLY" != "true" ]; then
+ $FSCK $SECOND_FSCK_OPT -N test_filesys $TMPFILE > $OUT2.new 2>&1
+ status=$?
+ echo Exit status is $status >> $OUT2.new
+ echo 'ex /a' > $TMPFILE.cmd
+ $DEBUGFS -f $TMPFILE.cmd $TMPFILE >> $OUT2.new 2>&1
+ rm -rf $TMPFILE.cmd
+ sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT2.new > $OUT2
+ rm -f $OUT2.new
+fi
+
+eval $AFTER_CMD
+
+if [ "$SKIP_VERIFY" != "true" ] ; then
+ rm -f $test_name.ok $test_name.failed
+ cmp -s $OUT1 $EXP1
+ status1=$?
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ cmp -s $OUT2 $EXP2
+ status2=$?
+ else
+ status2=0
+ fi
+ if [ "$PASS_ZERO" = "true" ]; then
+ cmp -s $test_name.0.log $test_dir/expect.0
+ status3=$?
+ else
+ status3=0
+ fi
+
+ if [ -z "$test_description" ] ; then
+ description="$test_name"
+ else
+ description="$test_name: $test_description"
+ fi
+
+ if [ "$status1" -eq 0 -a "$status2" -eq 0 -a "$status3" -eq 0 ] ; then
+ echo "$description: ok"
+ touch $test_name.ok
+ else
+ echo "$description: failed"
+ rm -f $test_name.failed
+ if [ "$PASS_ZERO" = "true" ]; then
+ diff $DIFF_OPTS $test_dir/expect.0 \
+ $test_name.0.log >> $test_name.failed
+ fi
+ diff $DIFF_OPTS $EXP1 $OUT1 >> $test_name.failed
+ if [ "$ONE_PASS_ONLY" != "true" ]; then
+ diff $DIFF_OPTS $EXP2 $OUT2 >> $test_name.failed
+ fi
+ fi
+ rm -f tmp_expect
+fi
+
+if [ "$SKIP_CLEANUP" != "true" ] ; then
+ unset IMAGE FSCK_OPT SECOND_FSCK_OPT OUT1 OUT2 EXP1 EXP2
+ unset SKIP_VERIFY SKIP_CLEANUP SKIP_GUNZIP ONE_PASS_ONLY PREP_CMD
+ unset DESCRIPTION SKIP_UNLINK AFTER_CMD PASS_ZERO
+fi
+
diff --git a/tests/f_opt_extent/expect b/tests/f_opt_extent/expect
new file mode 100644
index 0000000..6d4863b
--- /dev/null
+++ b/tests/f_opt_extent/expect
@@ -0,0 +1,55 @@
+tune2fs metadata_csum test
+Creating filesystem with 524288 1k blocks and 65536 inodes
+Superblock backups stored on blocks:
+ 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
+
+Allocating group tables: \b\b\b\b\bdone
+Writing inode tables: \b\b\b\b\bdone
+Creating journal (16384 blocks): done
+Creating 477 huge file(s) with 1024 blocks each: done
+Writing superblocks and filesystem accounting information: \b\b\b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+
+
+Change in FS metadata:
+@@ -10,7 +10,7 @@
+ Inode count: 65536
+ Block count: 524288
+ Reserved block count: 26214
+-Free blocks: 570
++Free blocks: 567
+ Free inodes: 65047
+ First block: 1
+ Block size: 1024
+@@ -47,8 +47,8 @@
+ Block bitmap at 262 (+261)
+ Inode bitmap at 278 (+277)
+ Inode table at 294-549 (+293)
+- 21 free blocks, 535 free inodes, 3 directories, 535 unused inodes
+- Free blocks: 4414-4434
++ 18 free blocks, 535 free inodes, 3 directories, 535 unused inodes
++ Free blocks: 4417-4434
+ Free inodes: 490-1024
+ Group 1: (Blocks 8193-16384) [INODE_UNINIT]
+ Backup superblock at 8193, Group descriptors at 8194-8197
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
diff --git a/tests/f_opt_extent/name b/tests/f_opt_extent/name
new file mode 100644
index 0000000..7d4389c
--- /dev/null
+++ b/tests/f_opt_extent/name
@@ -0,0 +1 @@
+optimize extent tree
diff --git a/tests/f_opt_extent/script b/tests/f_opt_extent/script
new file mode 100644
index 0000000..2da5e91
--- /dev/null
+++ b/tests/f_opt_extent/script
@@ -0,0 +1,64 @@
+FSCK_OPT=-fn
+OUT=$test_name.log
+EXP=$test_dir/expect
+CONF=$TMPFILE.conf
+
+cat > $CONF << ENDL
+[fs_types]
+ ext4h = {
+ features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode,64bit,metadata_csum
+ blocksize = 1024
+ inode_size = 256
+ make_hugefiles = true
+ hugefiles_dir = /xyz
+ hugefiles_slack = 0
+ hugefiles_name = aaaaa
+ hugefiles_digits = 4
+ hugefiles_size = 1M
+ zero_hugefiles = false
+ }
+ENDL
+
+echo "tune2fs metadata_csum test" > $OUT
+
+MKE2FS_CONFIG=$CONF $MKE2FS -F -T ext4h $TMPFILE 524288 >> $OUT 2>&1
+rm -rf $CONF
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.before
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+# check
+$FSCK -fyD -N test_filesys $TMPFILE >> $OUT 2>&1
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.after
+echo "Change in FS metadata:" >> $OUT
+diff -u $OUT.before $OUT.after | tail -n +3 >> $OUT
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+rm $TMPFILE $OUT.before $OUT.after
+
+#
+# Do the verification
+#
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" -e 's/test_filesys:.*//g' < $OUT > $OUT.new
+mv $OUT.new $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+unset IMAGE FSCK_OPT OUT EXP CONF
diff --git a/tests/f_opt_extent_ext3/expect b/tests/f_opt_extent_ext3/expect
new file mode 100644
index 0000000..1761471
--- /dev/null
+++ b/tests/f_opt_extent_ext3/expect
@@ -0,0 +1,44 @@
+rebuild extent metadata_csum test
+Creating filesystem with 524288 1k blocks and 65536 inodes
+Superblock backups stored on blocks:
+ 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
+
+Allocating group tables: \b\b\b\b\bdone
+Writing inode tables: \b\b\b\b\bdone
+Creating journal (16384 blocks): done
+mke2fs: Operation not supported for inodes containing extents while creating huge files
+Writing superblocks and filesystem accounting information: \b\b\b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 3A: Optimizing directories
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+
+
+Change in FS metadata:
+@@ -2,7 +2,7 @@
+ Last mounted on: <not available>
+ Filesystem magic number: 0xEF53
+ Filesystem revision #: 1 (dynamic)
+-Filesystem features: has_journal ext_attr resize_inode dir_index filetype sparse_super large_file huge_file dir_nlink
++Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent sparse_super large_file huge_file dir_nlink
+ Default mount options: user_xattr acl
+ Filesystem state: clean
+ Errors behavior: Continue
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+Exit status is 0
diff --git a/tests/f_opt_extent_ext3/name b/tests/f_opt_extent_ext3/name
new file mode 100644
index 0000000..b369685
--- /dev/null
+++ b/tests/f_opt_extent_ext3/name
@@ -0,0 +1 @@
+convert ext3 to extent tree
diff --git a/tests/f_opt_extent_ext3/script b/tests/f_opt_extent_ext3/script
new file mode 100644
index 0000000..931eae7
--- /dev/null
+++ b/tests/f_opt_extent_ext3/script
@@ -0,0 +1,65 @@
+FSCK_OPT=-fn
+OUT=$test_name.log
+EXP=$test_dir/expect
+CONF=$TMPFILE.conf
+
+cat > $CONF << ENDL
+[fs_types]
+ ext4h = {
+ features = has_journal,^extent,huge_file,^flex_bg,^uninit_bg,dir_nlink,^extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode,^64bit,^metadata_csum
+ blocksize = 1024
+ inode_size = 256
+ make_hugefiles = true
+ hugefiles_dir = /
+ num_hugefiles = 100
+ hugefiles_slack = 0
+ hugefiles_name = aaaaa
+ hugefiles_digits = 4
+ hugefiles_size = 1M
+ zero_hugefiles = false
+ }
+ENDL
+
+echo "rebuild extent metadata_csum test" > $OUT
+
+MKE2FS_CONFIG=$CONF $MKE2FS -F -T ext4h $TMPFILE 524288 >> $OUT 2>&1
+rm -rf $CONF
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.before
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+# check
+$FSCK -fyD -N test_filesys -E bmap2extent $TMPFILE >> $OUT 2>&1
+
+# dump and check
+$DUMPE2FS $TMPFILE 2> /dev/null | grep '^Group 0:' -B99 -A20 | sed -f $cmd_dir/filter.sed > $OUT.after
+echo "Change in FS metadata:" >> $OUT
+diff -u $OUT.before $OUT.after | tail -n +3 >> $OUT
+$FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
+status=$?
+echo Exit status is $status >> $OUT
+
+rm $TMPFILE $OUT.before $OUT.after
+
+#
+# Do the verification
+#
+
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" -e 's/test_filesys:.*//g' < $OUT > $OUT.new
+mv $OUT.new $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+fi
+
+unset IMAGE FSCK_OPT OUT EXP CONF
* [PATCH 12/31] undo-io: add new calls to and speed up the undo io manager
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (10 preceding siblings ...)
2014-12-20 21:17 ` [PATCH 11/31] tests: verify proper rebuilding of sparse extent trees and block map file conversion Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 13/31] undo-io: be more flexible about setting block size Darrick J. Wong
` (23 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Implement pass-through calls for discard, zero-out, and readahead in
the IO manager so that we can take advantage of any underlying
support.
Furthermore, improve tdb write-out speed by disabling locking and only
fsyncing at the end -- we don't care about locking because having
multiple writers to the undo file will produce an undo database full
of garbage blocks; and we only need to fsync at the end because if we
fail before the end, our undo file will lack the necessary superblock
data that e2undo requires to do replay safely. Without this, we call
fsync four times per tdb update(!). This change reduces the overhead of using
undo_io while converting a 2TB FS to metadata_csum from 3+ hours to 55
minutes.
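
As a rough illustration of why deferring fsync matters, the sketch below
appends records to a file and either syncs after every record or once at
the end. It is not the tdb code: store_records(), count_syncs(), and
struct sync_stats are hypothetical names made up for this example, and it
models one fsync per record rather than the four per update that tdb
issues.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical counters, for illustration only. */
struct sync_stats {
	int writes;	/* successful write() calls */
	int syncs;	/* successful fsync() calls */
};

/*
 * Append nrecs copies of rec to fd.  With sync_each != 0 every record
 * is followed by fsync(); otherwise a single fsync() is issued after
 * the last record.  Returns 0 on success, -1 on error.
 */
static int store_records(int fd, const char *rec, size_t len, int nrecs,
			 int sync_each, struct sync_stats *st)
{
	int i;

	assert(fd >= 0 && rec != NULL && st != NULL);
	for (i = 0; i < nrecs; i++) {
		if (write(fd, rec, len) != (ssize_t) len)
			return -1;
		st->writes++;
		if (sync_each) {
			if (fsync(fd) != 0)
				return -1;
			st->syncs++;
		}
	}
	if (!sync_each) {
		/* Deferred mode: one flush covers everything written above. */
		if (fsync(fd) != 0)
			return -1;
		st->syncs++;
	}
	return 0;
}

/* Run the loop against an anonymous temp file; returns fsync count or -1. */
static int count_syncs(int nrecs, int sync_each)
{
	struct sync_stats st = { 0, 0 };
	FILE *fp = tmpfile();
	int ret;

	if (!fp)
		return -1;
	ret = store_records(fileno(fp), "blk", 3, nrecs, sync_each, &st);
	fclose(fp);
	return ret ? -1 : st.syncs;
}
```

Deferring the flush is only safe here for the reason given above: an undo
file that was never completed is unusable anyway, so intermediate
durability buys nothing.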
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/tdb.c | 10 ++++++
lib/ext2fs/tdb.h | 2 +
lib/ext2fs/undo_io.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 97 insertions(+), 2 deletions(-)
diff --git a/lib/ext2fs/tdb.c b/lib/ext2fs/tdb.c
index 61e30ed..a916768 100644
--- a/lib/ext2fs/tdb.c
+++ b/lib/ext2fs/tdb.c
@@ -4138,3 +4138,13 @@ int tdb_reopen_all(int parent_longlived)
return 0;
}
+
+/**
+ * Flush a database file from the page cache.
+ **/
+int tdb_flush(struct tdb_context *tdb)
+{
+ if (tdb->fd != -1)
+ return fsync(tdb->fd);
+ return 0;
+}
diff --git a/lib/ext2fs/tdb.h b/lib/ext2fs/tdb.h
index 732ef0e..6a4086c 100644
--- a/lib/ext2fs/tdb.h
+++ b/lib/ext2fs/tdb.h
@@ -129,6 +129,7 @@ typedef struct TDB_DATA {
#define tdb_lockall_nonblock ext2fs_tdb_lockall_nonblock
#define tdb_lockall_read_nonblock ext2fs_tdb_lockall_read_nonblock
#define tdb_lockall_unmark ext2fs_tdb_lockall_unmark
+#define tdb_flush ext2fs_tdb_flush
/* this is the context structure that is returned from a db open */
typedef struct tdb_context TDB_CONTEXT;
@@ -191,6 +192,7 @@ size_t tdb_map_size(struct tdb_context *tdb);
int tdb_get_flags(struct tdb_context *tdb);
void tdb_enable_seqnum(struct tdb_context *tdb);
void tdb_increment_seqnum_nonblock(struct tdb_context *tdb);
+int tdb_flush(struct tdb_context *tdb);
/* Low level locking functions: use with care */
int tdb_chainlock(struct tdb_context *tdb, TDB_DATA key);
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index d6beb02..94317cb 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -37,6 +37,7 @@
#if HAVE_SYS_RESOURCE_H
#include <sys/resource.h>
#endif
+#include <limits.h>
#include "tdb.h"
@@ -354,8 +355,12 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
data->real = 0;
}
+ if (data->real)
+ io->flags = (io->flags & ~CHANNEL_FLAGS_DISCARD_ZEROES) |
+ (data->real->flags & CHANNEL_FLAGS_DISCARD_ZEROES);
+
/* setup the tdb file */
- data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST,
+ data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST | TDB_NOLOCK | TDB_NOSYNC,
O_RDWR | O_CREAT | O_TRUNC | O_EXCL, 0600);
if (!data->tdb) {
retval = errno;
@@ -399,8 +404,10 @@ static errcode_t undo_close(io_channel channel)
return retval;
if (data->real)
retval = io_channel_close(data->real);
- if (data->tdb)
+ if (data->tdb) {
+ tdb_flush(data->tdb);
tdb_close(data->tdb);
+ }
ext2fs_free_mem(&channel->private_data);
if (channel->name)
ext2fs_free_mem(&channel->name);
@@ -510,6 +517,77 @@ static errcode_t undo_write_byte(io_channel channel, unsigned long offset,
return retval;
}
+static errcode_t undo_discard(io_channel channel, unsigned long long block,
+ unsigned long long count)
+{
+ struct undo_private_data *data;
+ errcode_t retval = 0;
+ int icount;
+
+ EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+ data = (struct undo_private_data *) channel->private_data;
+ EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+ if (count > INT_MAX)
+ return EXT2_ET_UNIMPLEMENTED;
+ icount = count;
+
+ /*
+ * First write the existing content into database
+ */
+ retval = undo_write_tdb(channel, block, icount);
+ if (retval)
+ return retval;
+ if (data->real)
+ retval = io_channel_discard(data->real, block, count);
+
+ return retval;
+}
+
+static errcode_t undo_zeroout(io_channel channel, unsigned long long block,
+ unsigned long long count)
+{
+ struct undo_private_data *data;
+ errcode_t retval = 0;
+ int icount;
+
+ EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+ data = (struct undo_private_data *) channel->private_data;
+ EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+ if (count > INT_MAX)
+ return EXT2_ET_UNIMPLEMENTED;
+ icount = count;
+
+ /*
+ * First write the existing content into database
+ */
+ retval = undo_write_tdb(channel, block, icount);
+ if (retval)
+ return retval;
+ if (data->real)
+ retval = io_channel_zeroout(data->real, block, count);
+
+ return retval;
+}
+
+static errcode_t undo_cache_readahead(io_channel channel,
+ unsigned long long block,
+ unsigned long long count)
+{
+ struct undo_private_data *data;
+ errcode_t retval = 0;
+
+ EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
+ data = (struct undo_private_data *) channel->private_data;
+ EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+
+ if (data->real)
+ retval = io_channel_cache_readahead(data->real, block, count);
+
+ return retval;
+}
+
/*
* Flush data buffers to disk.
*/
@@ -522,6 +600,8 @@ static errcode_t undo_flush(io_channel channel)
data = (struct undo_private_data *) channel->private_data;
EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
+ if (data->tdb)
+ tdb_flush(data->tdb);
if (data->real)
retval = io_channel_flush(data->real);
@@ -601,6 +681,9 @@ static struct struct_io_manager struct_undo_manager = {
.get_stats = undo_get_stats,
.read_blk64 = undo_read_blk64,
.write_blk64 = undo_write_blk64,
+ .discard = undo_discard,
+ .zeroout = undo_zeroout,
+ .cache_readahead = undo_cache_readahead,
};
io_manager undo_io_manager = &struct_undo_manager;
* [PATCH 13/31] undo-io: be more flexible about setting block size
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (11 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 12/31] undo-io: add new calls to and speed up the undo io manager Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 14/31] undo-io: use a bitmap to track what we've already written Darrick J. Wong
` (22 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Most of the e2fsprogs utilities set the IO block size multiple times
(once to 1k to read the superblock, then again to set the real block
size if we find a real superblock). Unfortunately, the undo IO
manager only lets the block size be set once. For the non-mke2fs
utilities we'd rather catch the real block size and use that. mke2fs
of course wants to use a really large block size since it's probably
writing a lot of data.
Therefore, if we haven't written any blocks to the undo file, it's
perfectly fine to allow block size changes. For mke2fs, we'll modify
the IO channel option that lets us set the huge size to lock that
in place. This greatly reduces index overhead for undo files for
e2fsck/tune2fs/resize2fs while continuing the practice of reducing
it even more for mke2fs.
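
The resulting three-state behaviour of tdb_written can be sketched in
isolation as follows. This is a simplified model, not the library code:
struct blksize_state and the sketch_* helpers are hypothetical stand-ins
for the relevant parts of undo_set_blksize(), undo_set_option(), and
undo_write_tdb(), and the meaning assigned to each value (0 = nothing
written, 1 = written, -1 = size locked by the option) is inferred from
the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of undo_private_data. */
struct blksize_state {
	int tdb_data_size;	/* undo record size; 0 means unset */
	int tdb_written;	/* 0 = nothing written, 1 = written,
				 * -1 = size locked by the option */
};

/* Models undo_set_blksize(): follow the caller until the size is pinned. */
static void sketch_set_blksize(struct blksize_state *s, int blksize)
{
	assert(s != NULL && blksize > 0);
	if (!s->tdb_data_size || !s->tdb_written)
		s->tdb_data_size = blksize;
}

/* Models the block-size option in undo_set_option(): set and lock. */
static void sketch_set_option(struct blksize_state *s, int size)
{
	assert(s != NULL && size > 0);
	if (!s->tdb_data_size || !s->tdb_written) {
		s->tdb_written = -1;
		s->tdb_data_size = size;
	}
}

/* Models the first write in undo_write_tdb(): the size is now final. */
static void sketch_first_write(struct blksize_state *s)
{
	assert(s != NULL);
	if (s->tdb_written != 1)
		s->tdb_written = 1;
}
```

In this model a utility that never uses the option keeps tracking the
latest block size until its first undo write, while a caller that sets
the option first keeps the large size regardless of later block size
changes.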
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/undo_io.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 94317cb..70b90d5 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -265,7 +265,7 @@ static errcode_t undo_write_tdb(io_channel channel,
tdb_data.dptr,
tdb_data.dsize);
#endif
- if (!data->tdb_written) {
+ if (data->tdb_written != 1) {
data->tdb_written = 1;
/* Write the blocksize to tdb file */
retval = write_block_size(data->tdb,
@@ -430,9 +430,8 @@ static errcode_t undo_set_blksize(io_channel channel, int blksize)
/*
* Set the block size used for tdb
*/
- if (!data->tdb_data_size) {
+ if (!data->tdb_data_size || !data->tdb_written)
data->tdb_data_size = blksize;
- }
channel->block_size = blksize;
return retval;
}
@@ -628,6 +627,7 @@ static errcode_t undo_set_option(io_channel channel, const char *option,
if (*end)
return EXT2_ET_INVALID_ARGUMENT;
if (!data->tdb_data_size || !data->tdb_written) {
+ data->tdb_written = -1;
data->tdb_data_size = tmp;
}
return 0;
* [PATCH 14/31] undo-io: use a bitmap to track what we've already written
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (12 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 13/31] undo-io: be more flexible about setting block size Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 15/31] e2undo: fix memory leaks and tweak the error messages somewhat Darrick J. Wong
` (21 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
It's really inefficient to (ab)use the TDB key store as a bitmap to
find out if we've already written a block to the undo file, because
the tdb code reads the database key btree disk blocks for *every*
query. Changing that logic to a bitmap reduces overhead by a large
margin -- the overhead of using undo_io while converting a 2TB FS to
metadata_csum is reduced from 55 minutes to 45.
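
To illustrate the idea only, here is a minimal in-memory "already
written" tracker. It assumes a flat, fixed-size bit array, whereas the
patch itself uses an rbtree-backed ext2fs block bitmap; struct
written_map and the written_map_* and blocks_to_save() helpers are
hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical flat bitmap; one bit per undo block. */
struct written_map {
	unsigned char *bits;
	unsigned long long nblocks;
};

/* Allocate a zeroed map covering nblocks blocks.  Returns 0 or -1. */
static int written_map_init(struct written_map *m, unsigned long long nblocks)
{
	if (nblocks == 0)
		return -1;
	m->bits = calloc((size_t) ((nblocks + CHAR_BIT - 1) / CHAR_BIT), 1);
	m->nblocks = nblocks;
	return m->bits ? 0 : -1;
}

/* Returns 1 if blk was already recorded, 0 if this call recorded it. */
static int written_map_test_and_set(struct written_map *m,
				    unsigned long long blk)
{
	unsigned char mask = (unsigned char) (1u << (blk % CHAR_BIT));
	int was_set;

	assert(m != NULL && m->bits != NULL && blk < m->nblocks);
	was_set = (m->bits[blk / CHAR_BIT] & mask) != 0;
	m->bits[blk / CHAR_BIT] |= mask;
	return was_set;
}

/* Count blocks in [start, start + count) that still need to be saved. */
static int blocks_to_save(struct written_map *m, unsigned long long start,
			  int count)
{
	int i, need = 0;

	for (i = 0; i < count; i++)
		if (!written_map_test_and_set(m, start + i))
			need++;
	return need;
}

static void written_map_free(struct written_map *m)
{
	free(m->bits);
	m->bits = NULL;
	m->nblocks = 0;
}
```

The lookup is a single memory access, with no disk reads per query. A
flat array like this would not scale to a sparse 64-bit block range,
which is presumably why the patch picks the rbtree bitmap type.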
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/undo_io.c | 69 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 51 insertions(+), 18 deletions(-)
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 70b90d5..9a01e30 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -70,6 +70,9 @@ struct undo_private_data {
/* to support offset in unix I/O manager */
ext2_loff_t offset;
+
+ ext2fs_block_bitmap written_block_map;
+ struct struct_ext2_filsys fake_fs;
};
static io_manager undo_io_backing_manager;
@@ -164,6 +167,38 @@ static errcode_t write_block_size(TDB_CONTEXT *tdb, int block_size)
return retval;
}
+static errcode_t undo_setup_tdb(struct undo_private_data *data)
+{
+ errcode_t retval;
+
+ if (data->tdb_written == 1)
+ return 0;
+
+ data->tdb_written = 1;
+
+ /* Make a bitmap to track what we've written */
+ memset(&data->fake_fs, 0, sizeof(data->fake_fs));
+ data->fake_fs.blocksize = data->tdb_data_size;
+ retval = ext2fs_alloc_generic_bmap(&data->fake_fs,
+ EXT2_ET_MAGIC_BLOCK_BITMAP64,
+ EXT2FS_BMAP64_RBTREE,
+ 0, ~1ULL, ~1ULL,
+ "undo block map", &data->written_block_map);
+ if (retval)
+ return retval;
+
+ /* Write the blocksize to tdb file */
+ tdb_transaction_start(data->tdb);
+ retval = write_block_size(data->tdb,
+ data->tdb_data_size);
+ if (retval) {
+ tdb_transaction_cancel(data->tdb);
+ return EXT2_ET_TDB_ERR_IO;
+ }
+ tdb_transaction_commit(data->tdb);
+ return 0;
+}
+
static errcode_t undo_write_tdb(io_channel channel,
unsigned long long block, int count)
@@ -194,6 +229,10 @@ static errcode_t undo_write_tdb(io_channel channel,
else
size = count * channel->block_size;
}
+
+ retval = undo_setup_tdb(data);
+ if (retval)
+ return retval;
/*
* Data is stored in tdb database as blocks of tdb_data_size size
* This helps in efficient lookup further.
@@ -212,11 +251,14 @@ static errcode_t undo_write_tdb(io_channel channel,
/*
* Check if we have the record already
*/
- if (tdb_exists(data->tdb, tdb_key)) {
+ if (ext2fs_test_block_bitmap2(data->written_block_map,
+ block_num)) {
/* Try the next block */
block_num++;
continue;
}
+ ext2fs_mark_block_bitmap2(data->written_block_map, block_num);
+
/*
* Read one block using the backing I/O manager
* The backing I/O manager block size may be
@@ -265,19 +307,7 @@ static errcode_t undo_write_tdb(io_channel channel,
tdb_data.dptr,
tdb_data.dsize);
#endif
- if (data->tdb_written != 1) {
- data->tdb_written = 1;
- /* Write the blocksize to tdb file */
- retval = write_block_size(data->tdb,
- data->tdb_data_size);
- if (retval) {
- tdb_transaction_cancel(data->tdb);
- retval = EXT2_ET_TDB_ERR_IO;
- free(read_ptr);
- return retval;
- }
- }
- retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_INSERT);
+ retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_REPLACE);
if (retval == -1) {
/*
* TDB_ERR_EXISTS cannot happen because we
@@ -345,6 +375,7 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
memset(data, 0, sizeof(struct undo_private_data));
data->magic = EXT2_ET_MAGIC_UNIX_IO_CHANNEL;
+ data->written_block_map = NULL;
if (undo_io_backing_manager) {
retval = undo_io_backing_manager->open(name, flags,
@@ -390,7 +421,7 @@ cleanup:
static errcode_t undo_close(io_channel channel)
{
struct undo_private_data *data;
- errcode_t retval = 0;
+ errcode_t err, retval = 0;
EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
data = (struct undo_private_data *) channel->private_data;
@@ -399,20 +430,22 @@ static errcode_t undo_close(io_channel channel)
if (--channel->refcount > 0)
return 0;
/* Before closing write the file system identity */
- retval = write_file_system_identity(channel, data->tdb);
- if (retval)
- return retval;
+ err = write_file_system_identity(channel, data->tdb);
if (data->real)
retval = io_channel_close(data->real);
if (data->tdb) {
tdb_flush(data->tdb);
tdb_close(data->tdb);
}
+ if (data->written_block_map)
+ ext2fs_free_generic_bitmap(data->written_block_map);
ext2fs_free_mem(&channel->private_data);
if (channel->name)
ext2fs_free_mem(&channel->name);
ext2fs_free_mem(&channel);
+ if (err)
+ return err;
return retval;
}
* [PATCH 15/31] e2undo: fix memory leaks and tweak the error messages somewhat
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (13 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 14/31] undo-io: use a bitmap to track what we've already written Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 16/31] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
` (20 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Fix memory leaks and improve the error messages to make it easier
to figure out why e2undo went wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
misc/e2undo.c | 43 ++++++++++++++++++++++++-------------------
1 file changed, 24 insertions(+), 19 deletions(-)
diff --git a/misc/e2undo.c b/misc/e2undo.c
index a43c26f..d828d3b 100644
--- a/misc/e2undo.c
+++ b/misc/e2undo.c
@@ -49,7 +49,7 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
if (retval) {
com_err(prg_name, retval,
- "%s", _("Failed to read the file system data \n"));
+ "%s", _("while reading filesystem superblock."));
return retval;
}
@@ -58,16 +58,16 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
tdb_data = tdb_fetch(tdb, tdb_key);
if (!tdb_data.dptr) {
retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- com_err(prg_name, retval,
- _("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+ com_err(prg_name, retval, "%s",
+ _("while fetching last mount time."));
return retval;
}
s_mtime = *(__u32 *)tdb_data.dptr;
+ free(tdb_data.dptr);
if (super.s_mtime != s_mtime) {
-
com_err(prg_name, 0,
- _("The file system Mount time didn't match %u\n"),
+ _("The filesystem last mount time didn't match %u."),
s_mtime);
return -1;
@@ -79,14 +79,14 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
tdb_data = tdb_fetch(tdb, tdb_key);
if (!tdb_data.dptr) {
retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- com_err(prg_name, retval,
- _("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+ com_err(prg_name, retval, "%s", _("while fetching UUID"));
return retval;
}
memcpy(s_uuid, tdb_data.dptr, sizeof(s_uuid));
+ free(tdb_data.dptr);
if (memcmp(s_uuid, super.s_uuid, sizeof(s_uuid))) {
com_err(prg_name, 0, "%s",
- _("The file system UUID didn't match \n"));
+ _("The filesystem UUID didn't match."));
return -1;
}
@@ -104,12 +104,12 @@ static int set_blk_size(TDB_CONTEXT *tdb, io_channel channel)
tdb_data = tdb_fetch(tdb, tdb_key);
if (!tdb_data.dptr) {
retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- com_err(prg_name, retval,
- _("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+ com_err(prg_name, retval, "%s", _("while fetching block size"));
return retval;
}
block_size = *(int *)tdb_data.dptr;
+ free(tdb_data.dptr);
#ifdef DEBUG
printf("Block size %d\n", block_size);
#endif
@@ -129,6 +129,7 @@ int main(int argc, char *argv[])
blk64_t blk_num;
char *device_name, *tdb_file;
io_manager manager = unix_io_manager;
+ void *old_dptr = NULL;
#ifdef ENABLE_NLS
setlocale(LC_MESSAGES, "");
@@ -160,20 +161,20 @@ int main(int argc, char *argv[])
if (!tdb) {
com_err(prg_name, errno,
- _("Failed tdb_open %s\n"), tdb_file);
+ _("while opening undo file `%s'\n"), tdb_file);
exit(1);
}
retval = ext2fs_check_if_mounted(device_name, &mount_flags);
if (retval) {
com_err(prg_name, retval, _("Error while determining whether "
- "%s is mounted.\n"), device_name);
+ "%s is mounted."), device_name);
exit(1);
}
if (mount_flags & EXT2_MF_MOUNTED) {
com_err(prg_name, retval, "%s", _("e2undo should only be run "
- "on unmounted file system\n"));
+ "on unmounted filesystems"));
exit(1);
}
@@ -181,7 +182,7 @@ int main(int argc, char *argv[])
IO_FLAG_EXCLUSIVE | IO_FLAG_RW, &channel);
if (retval) {
com_err(prg_name, retval,
- _("Failed to open %s\n"), device_name);
+ _("while opening `%s'"), device_name);
exit(1);
}
@@ -194,30 +195,34 @@ int main(int argc, char *argv[])
}
for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
+ free(old_dptr);
+ old_dptr = key.dptr;
if (!strcmp((char *) key.dptr, (char *) mtime_key) ||
!strcmp((char *) key.dptr, (char *) uuid_key) ||
!strcmp((char *) key.dptr, (char *) blksize_key)) {
continue;
}
+ blk_num = *(blk64_t *)key.dptr;
data = tdb_fetch(tdb, key);
if (!data.dptr) {
- com_err(prg_name, 0,
- _("Failed tdb_fetch %s\n"), tdb_errorstr(tdb));
+ retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
+ com_err(prg_name, retval,
+ _("while fetching block %llu."), blk_num);
exit(1);
}
- blk_num = *(blk64_t *)key.dptr;
printf(_("Replayed transaction of size %zd at location %llu\n"),
data.dsize, blk_num);
retval = io_channel_write_blk64(channel, blk_num,
-data.dsize, data.dptr);
+ free(data.dptr);
if (retval == -1) {
com_err(prg_name, retval,
- _("Failed write %s\n"),
- strerror(errno));
+ _("while writing block %llu."), blk_num);
exit(1);
}
}
+ free(old_dptr);
io_channel_close(channel);
tdb_close(tdb);
* [PATCH 16/31] e2undo: ditch tdb file, write everything to a flat file
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (14 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 15/31] e2undo: fix memory leaks and tweak the error messages somewhat Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2015-01-08 1:36 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 17/31] e2fsck: optionally create an undo file Darrick J. Wong
` (19 subsequent siblings)
35 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
The existing undo file format (which is based on tdb) has many
problems. First, its comparison of superblock fields is ineffective:
the last mount time is only written by the kernel, not the tools, so
undo files can be applied out of order, thus corrupting the
filesystem. Second, block numbers are written in CPU byte order,
which will cause silent failures if an undo file is moved from one
type of system to another. Third, using the tdb database costs us an
enormous amount of CPU overhead to maintain the key data structure.
Finally, the tdb database is unable to deal with databases larger
than 2GB. (Upstream tdb 1.2.12 can handle 4GB, but upgrading a 2TB FS
to 64bit,metadata_csum easily produces 4.5GB of undo files, so we
might as well move off of tdb now.)
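To illustrate the byte-order problem, here is a minimal sketch of
storing a block number in a fixed little-endian layout so that the
bytes on disk do not depend on the host CPU. The helper names
put_le64() and get_le64() are hypothetical and exist only for this
example; the patch itself uses the libext2fs conversions such as
ext2fs_cpu_to_le64().

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not part of libext2fs): store and load a
 * 64-bit block number as little-endian bytes, independent of the
 * byte order of the CPU running the code.
 */
static void put_le64(unsigned char *out, uint64_t v)
{
	int i;

	/* Least significant byte goes first. */
	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t get_le64(const unsigned char *in)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

A value written with put_le64() on one machine reads back unchanged
through get_le64() on any other, which is the property the tdb-based
format lacked for its block-number keys.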
The last problem is fatal if you want to use tune2fs to turn on
metadata checksumming, since that rewrites every block on the
filesystem. This can easily produce a many-gigabyte undo file, which
tdb cannot read back, and therefore the operation cannot be undone.
Therefore, rip all of that out in favor of writing to a flat file.
Old blocks are appended to a file and the index is written to the end
when we're done. Not having to maintain a hash index makes this
implementation much faster: the runtime overhead of
tune2fs -O metadata_csum drops from ~45min to ~20 seconds on a 2TB
filesystem.
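As a rough sketch of why lookups in the flat file need no hash index:
going by the key-loading loop in e2undo.c in this patch, each key
block is followed by one undo-file block per key, so the location of
any saved block is plain arithmetic. The helper names below are
hypothetical, and the one-data-block-per-key assumption is taken from
that loop rather than from any format specification.

```c
#include <stdint.h>

/* sizeof(struct undo_key): one __le64 plus two __le32 fields. */
#define UNDO_KEY_SIZE 16

/*
 * Keys that fit in one key block. One key-sized slot is given up to
 * the 16-byte key block header (magic, crc, reserved), mirroring the
 * KEYS_PER_BLOCK() macro in the patch.
 */
static uint64_t keys_per_block(uint32_t block_size)
{
	return block_size / UNDO_KEY_SIZE - 1;
}

/*
 * Hypothetical helper: undo-file block that holds the data for key
 * number n (0-based), assuming every key is followed by exactly one
 * data block and key blocks start at key_offset.
 */
static uint64_t data_block_for_key(uint64_t key_offset,
				   uint32_t block_size, uint64_t n)
{
	uint64_t kpb = keys_per_block(block_size);
	uint64_t chunk = n / kpb;	/* which key block owns key n */

	/* Skip earlier chunks, then this chunk's key block itself. */
	return key_offset + chunk * (kpb + 1) + 1 + n % kpb;
}
```

With 4096-byte undo blocks and key_offset 1, keys 0 through 254 map
to file blocks 2 through 256, block 257 is the next key block, and
key 255 lands in block 258.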
Several reasons factored into my decision not to repurpose the jbd2
file format for undo files. First, undo files in that format would be
limited to 2^32 blocks (16TB), which some day might not serve us
well. Second,
the journal block size is tied to the file system block size, but
mke2fs wants to be able to back up big chunks of old device contents.
This would require large changes to the e2fsck journal replay code,
which itself is derived from the kernel jbd2 driver, which I'd rather
not destabilize. Third, I want to require undo files to store the FS
superblock at the end of undo file creation so that e2undo can be
reasonably sure that an undo file is supposed to apply against the
given block device, and doing so would require changes to the jbd2
format. Fourth, it didn't seem like a good idea that external
journals should resemble undo files so closely.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/undo_io.c | 283 ++++++++++++++++++++++++--------------
misc/e2undo.8.in | 11 +
misc/e2undo.c | 371 +++++++++++++++++++++++++++++++++++++-------------
3 files changed, 461 insertions(+), 204 deletions(-)
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 9a01e30..a480246 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -39,8 +39,6 @@
#endif
#include <limits.h>
-#include "tdb.h"
-
#include "ext2_fs.h"
#include "ext2fs.h"
@@ -50,17 +48,67 @@
#define ATTR(x)
#endif
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
/*
* For checking structure magic numbers...
*/
#define EXT2_CHECK_MAGIC(struct, code) \
if ((struct)->magic != (code)) return (code)
+/*
+ * Undo file format: The file is cut up into undo_header.block_size blocks.
+ * The first block contains the header.
+ * There is then a repeating series of blocks as follows:
+ * A key block, which contains undo_keys to map the following data blocks.
+ * Data blocks
+ * The last block contains the superblock.
+ * (Note that there are pointers to the first key block and the sb, so this
+ * order isn't strictly necessary.)
+ */
+#define E2UNDO_MAGIC "E2UNDO02"
+#define KEYBLOCK_MAGIC 0xCADECADE
+
+struct undo_header {
+ char magic[8]; /* "E2UNDO02" */
+ __le64 num_keys; /* how many keys? */
+ __le64 super_offset; /* where in the file is the superblock copy? */
+ __le64 key_offset; /* where do the key/data block chunks start? */
+ __le32 block_size; /* block size of the undo file */
+ __le32 fs_block_size; /* block size of the target device */
+ __le32 sb_crc; /* crc32c of the superblock */
+ __u8 padding[464]; /* padding */
+ __le32 header_crc; /* crc32c of this header (but not this field) */
+};
+
+struct undo_key {
+ __le64 fsblk; /* where in the fs does the block go */
+ __le32 blk_crc; /* crc32c of the block */
+ __le32 size; /* how many bytes in this block? */
+};
+
+struct undo_key_block {
+ __le32 magic; /* KEYBLOCK_MAGIC number */
+ __le32 crc; /* block checksum */
+ __le64 reserved; /* zero */
+
+ struct undo_key keys[0]; /* keys, which come immediately after */
+};
struct undo_private_data {
int magic;
- TDB_CONTEXT *tdb;
- char *tdb_file;
+
+ /* the undo file io channel */
+ io_channel undo_file;
+ blk64_t undo_blk_num, key_blk_num;
+ struct undo_key_block *keyb;
+ size_t num_keys, keys_in_block;
/* The backing io channel */
io_channel real;
@@ -73,16 +121,15 @@ struct undo_private_data {
ext2fs_block_bitmap written_block_map;
struct struct_ext2_filsys fake_fs;
+
+ struct undo_header hdr;
};
+#define KEYS_PER_BLOCK(d) (((d)->tdb_data_size / sizeof(struct undo_key)) - 1)
static io_manager undo_io_backing_manager;
static char *tdb_file;
static int actual_size;
-static unsigned char mtime_key[] = "filesystem MTIME";
-static unsigned char blksize_key[] = "filesystem BLKSIZE";
-static unsigned char uuid_key[] = "filesystem UUID";
-
errcode_t set_undo_io_backing_manager(io_manager manager)
{
/*
@@ -103,17 +150,34 @@ errcode_t set_undo_io_backup_file(char *file_name)
return 0;
}
-static errcode_t write_file_system_identity(io_channel undo_channel,
- TDB_CONTEXT *tdb)
+static errcode_t write_undo_indexes(struct undo_private_data *data)
{
errcode_t retval;
struct ext2_super_block super;
- TDB_DATA tdb_key, tdb_data;
- struct undo_private_data *data;
io_channel channel;
- int block_size ;
+ int block_size;
+ __u32 sb_crc, hdr_crc;
+
+ /* Spit out a key block, if there's any data */
+ if (data->keys_in_block) {
+ data->keyb->magic = ext2fs_cpu_to_le32(KEYBLOCK_MAGIC);
+ data->keyb->crc = 0;
+ data->keyb->crc = ext2fs_cpu_to_le32(
+ ext2fs_crc32c_le(~0,
+ (unsigned char *)data->keyb,
+ data->tdb_data_size));
+ dbg_printf("Writing keyblock to blk %llu\n", data->key_blk_num);
+ retval = io_channel_write_blk64(data->undo_file,
+ data->key_blk_num,
+ 1, data->keyb);
+ if (retval)
+ return retval;
+ memset(data->keyb, 0, data->tdb_data_size);
+ data->keys_in_block = 0;
+ data->key_blk_num = data->undo_blk_num;
+ }
- data = (struct undo_private_data *) undo_channel->private_data;
+ /* Prepare superblock for write */
channel = data->real;
block_size = channel->block_size;
@@ -121,52 +185,42 @@ static errcode_t write_file_system_identity(io_channel undo_channel,
retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
if (retval)
goto err_out;
-
- /* Write to tdb file in the file system byte order */
- tdb_key.dptr = mtime_key;
- tdb_key.dsize = sizeof(mtime_key);
- tdb_data.dptr = (unsigned char *) &(super.s_mtime);
- tdb_data.dsize = sizeof(super.s_mtime);
-
- retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
- if (retval == -1) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
+ sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)&super, SUPERBLOCK_SIZE);
+
+ /* Write the undo header to disk. */
+ memcpy(data->hdr.magic, E2UNDO_MAGIC, sizeof(data->hdr.magic));
+ data->hdr.num_keys = ext2fs_cpu_to_le64(data->num_keys);
+ data->hdr.super_offset = ext2fs_cpu_to_le64(data->undo_blk_num);
+ data->hdr.key_offset = ext2fs_cpu_to_le64(1);
+ data->hdr.fs_block_size = ext2fs_cpu_to_le32(block_size);
+ data->hdr.sb_crc = ext2fs_cpu_to_le32(sb_crc);
+ hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&data->hdr,
+ sizeof(data->hdr) -
+ sizeof(data->hdr.header_crc));
+ data->hdr.header_crc = ext2fs_cpu_to_le32(hdr_crc);
+ retval = io_channel_write_blk64(data->undo_file, 0,
+ -(int)sizeof(data->hdr),
+ &data->hdr);
+ if (retval)
goto err_out;
- }
-
- tdb_key.dptr = uuid_key;
- tdb_key.dsize = sizeof(uuid_key);
- tdb_data.dptr = (unsigned char *)&(super.s_uuid);
- tdb_data.dsize = sizeof(super.s_uuid);
- retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
- if (retval == -1) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- }
+ /*
+ * Record the entire superblock (in FS byte order) so that we can't
+ * apply e2undo files to the wrong FS or out of order.
+ */
+ dbg_printf("Writing superblock to block %llu\n", data->undo_blk_num);
+ retval = io_channel_write_blk64(data->undo_file, data->undo_blk_num,
+ -SUPERBLOCK_SIZE, &super);
+ if (retval)
+ goto err_out;
+ data->undo_blk_num++;
+ retval = io_channel_flush(data->undo_file);
err_out:
io_channel_set_blksize(channel, block_size);
return retval;
}
-static errcode_t write_block_size(TDB_CONTEXT *tdb, int block_size)
-{
- errcode_t retval;
- TDB_DATA tdb_key, tdb_data;
-
- tdb_key.dptr = blksize_key;
- tdb_key.dsize = sizeof(blksize_key);
- tdb_data.dptr = (unsigned char *)&(block_size);
- tdb_data.dsize = sizeof(block_size);
-
- retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
- if (retval == -1) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- }
-
- return retval;
-}
-
static errcode_t undo_setup_tdb(struct undo_private_data *data)
{
errcode_t retval;
@@ -187,15 +241,17 @@ static errcode_t undo_setup_tdb(struct undo_private_data *data)
if (retval)
return retval;
- /* Write the blocksize to tdb file */
- tdb_transaction_start(data->tdb);
- retval = write_block_size(data->tdb,
- data->tdb_data_size);
- if (retval) {
- tdb_transaction_cancel(data->tdb);
- return EXT2_ET_TDB_ERR_IO;
- }
- tdb_transaction_commit(data->tdb);
+ /* Allocate key block */
+ retval = ext2fs_get_memzero(data->tdb_data_size, &data->keyb);
+ if (retval)
+ return retval;
+ data->key_blk_num = data->undo_blk_num++;
+
+ /* Record block size */
+ dbg_printf("Undo block size %d\n", data->tdb_data_size);
+ dbg_printf("Keys per block %zu\n", KEYS_PER_BLOCK(data));
+ data->hdr.block_size = ext2fs_cpu_to_le32(data->tdb_data_size);
+ io_channel_set_blksize(data->undo_file, data->tdb_data_size);
return 0;
}
@@ -208,13 +264,16 @@ static errcode_t undo_write_tdb(io_channel channel,
errcode_t retval = 0;
ext2_loff_t offset;
struct undo_private_data *data;
- TDB_DATA tdb_key, tdb_data;
unsigned char *read_ptr;
unsigned long long end_block;
+ unsigned long long data_size;
+ void *data_ptr;
+ struct undo_key *key;
+ __u32 blk_crc;
data = (struct undo_private_data *) channel->private_data;
- if (data->tdb == NULL) {
+ if (data->undo_file == NULL) {
/*
* Transaction database not initialized
*/
@@ -243,11 +302,7 @@ static errcode_t undo_write_tdb(io_channel channel,
block_num = offset / data->tdb_data_size;
end_block = (offset + size) / data->tdb_data_size;
- tdb_transaction_start(data->tdb);
- while (block_num <= end_block ) {
-
- tdb_key.dptr = (unsigned char *)&block_num;
- tdb_key.dsize = sizeof(block_num);
+ while (block_num <= end_block) {
/*
* Check if we have the record already
*/
@@ -259,6 +314,13 @@ static errcode_t undo_write_tdb(io_channel channel,
}
ext2fs_mark_block_bitmap2(data->written_block_map, block_num);
+ /* Spit out a key block */
+ if (data->keys_in_block == KEYS_PER_BLOCK(data)) {
+ retval = write_undo_indexes(data);
+ if (retval)
+ return retval;
+ }
+
/*
* Read one block using the backing I/O manager
* The backing I/O manager block size may be
@@ -273,7 +335,6 @@ static errcode_t undo_write_tdb(io_channel channel,
((offset - data->offset) % channel->block_size);
retval = ext2fs_get_mem(count, &read_ptr);
if (retval) {
- tdb_transaction_cancel(data->tdb);
return retval;
}
@@ -288,33 +349,43 @@ static errcode_t undo_write_tdb(io_channel channel,
if (retval) {
if (retval != EXT2_ET_SHORT_READ) {
free(read_ptr);
- tdb_transaction_cancel(data->tdb);
return retval;
}
/*
* short read so update the record size
* accordingly
*/
- tdb_data.dsize = actual_size;
+ data_size = actual_size;
} else {
- tdb_data.dsize = data->tdb_data_size;
+ data_size = data->tdb_data_size;
}
- tdb_data.dptr = read_ptr +
- ((offset - data->offset) % channel->block_size);
-#ifdef DEBUG
- printf("Printing with key %lld data %x and size %d\n",
+ if (data_size == 0) {
+ free(read_ptr);
+ block_num++;
+ continue;
+ }
+ dbg_printf("Read %llu bytes from FS block %llu (blk=%llu cnt=%u)\n",
+ data_size, backing_blk_num, block, count);
+ if ((data_size % data->undo_file->block_size) == 0)
+ sz = data_size / data->undo_file->block_size;
+ else
+ sz = -actual_size;
+ data_ptr = read_ptr + ((offset - data->offset) %
+ data->undo_file->block_size);
+ dbg_printf("Writing block %llu to offset %llu size %d\n",
block_num,
- tdb_data.dptr,
- tdb_data.dsize);
-#endif
- retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_REPLACE);
- if (retval == -1) {
- /*
- * TDB_ERR_EXISTS cannot happen because we
- * have already verified it doesn't exist
- */
- tdb_transaction_cancel(data->tdb);
- retval = EXT2_ET_TDB_ERR_IO;
+ data->undo_blk_num,
+ sz);
+ data->num_keys++;
+ key = &data->keyb->keys[data->keys_in_block++];
+ key->fsblk = ext2fs_cpu_to_le64(backing_blk_num);
+ blk_crc = ext2fs_crc32c_le(~0, (unsigned char *)data_ptr,
+ data_size);
+ key->blk_crc = ext2fs_cpu_to_le32(blk_crc);
+ key->size = ext2fs_cpu_to_le32(data_size);
+ retval = io_channel_write_blk64(data->undo_file,
+ data->undo_blk_num++, sz, data_ptr);
+ if (retval) {
free(read_ptr);
return retval;
}
@@ -322,7 +393,6 @@ static errcode_t undo_write_tdb(io_channel channel,
/* Next block */
block_num++;
}
- tdb_transaction_commit(data->tdb);
return retval;
}
@@ -375,29 +445,33 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
memset(data, 0, sizeof(struct undo_private_data));
data->magic = EXT2_ET_MAGIC_UNIX_IO_CHANNEL;
- data->written_block_map = NULL;
+ data->undo_blk_num = 1;
if (undo_io_backing_manager) {
retval = undo_io_backing_manager->open(name, flags,
&data->real);
if (retval)
goto cleanup;
+
+ retval = ext2fs_open_file(tdb_file,
+ O_RDWR | O_CREAT | O_TRUNC | O_EXCL,
+ 0600);
+ if (retval < 0)
+ goto cleanup;
+ close(retval);
+ retval = undo_io_backing_manager->open(tdb_file, IO_FLAG_RW,
+ &data->undo_file);
+ if (retval)
+ goto cleanup;
} else {
- data->real = 0;
+ data->real = NULL;
+ data->undo_file = NULL;
}
if (data->real)
io->flags = (io->flags & ~CHANNEL_FLAGS_DISCARD_ZEROES) |
(data->real->flags & CHANNEL_FLAGS_DISCARD_ZEROES);
- /* setup the tdb file */
- data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST | TDB_NOLOCK | TDB_NOSYNC,
- O_RDWR | O_CREAT | O_TRUNC | O_EXCL, 0600);
- if (!data->tdb) {
- retval = errno;
- goto cleanup;
- }
-
/*
* setup err handler for read so that we know
* when the backing manager fails do short read
@@ -409,6 +483,8 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
return 0;
cleanup:
+ if (data && data->undo_file)
+ io_channel_close(data->undo_file);
if (data && data->real)
io_channel_close(data->real);
if (data)
@@ -430,13 +506,12 @@ static errcode_t undo_close(io_channel channel)
if (--channel->refcount > 0)
return 0;
/* Before closing write the file system identity */
- err = write_file_system_identity(channel, data->tdb);
+ err = write_undo_indexes(data);
if (data->real)
retval = io_channel_close(data->real);
- if (data->tdb) {
- tdb_flush(data->tdb);
- tdb_close(data->tdb);
- }
+ if (data->undo_file)
+ io_channel_close(data->undo_file);
+ ext2fs_free_mem(&data->keyb);
if (data->written_block_map)
ext2fs_free_generic_bitmap(data->written_block_map);
ext2fs_free_mem(&channel->private_data);
@@ -632,8 +707,6 @@ static errcode_t undo_flush(io_channel channel)
data = (struct undo_private_data *) channel->private_data;
EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
- if (data->tdb)
- tdb_flush(data->tdb);
if (data->real)
retval = io_channel_flush(data->real);
diff --git a/misc/e2undo.8.in b/misc/e2undo.8.in
index 4bf0798..3677b83 100644
--- a/misc/e2undo.8.in
+++ b/misc/e2undo.8.in
@@ -10,6 +10,9 @@ e2undo \- Replay an undo log for an ext2/ext3/ext4 filesystem
[
.B \-f
]
+[
+.B \-n
+]
.I undo_log device
.SH DESCRIPTION
.B e2undo
@@ -24,13 +27,15 @@ used to undo a failed operation by an e2fsprogs program.
.B \-f
Normally,
.B e2undo
-will check the filesystem UUID and last modified time to make sure the
-undo log matches with the filesystem on the device. If they do not
-match,
+will check the filesystem superblock to make sure the undo log matches
+with the filesystem on the device. If they do not match,
.B e2undo
will refuse to apply the undo log as a safety mechanism. The
.B \-f
option disables this safety mechanism.
+.TP
+.B \-n
+Dry-run; do not actually write blocks back to the filesystem.
.SH AUTHOR
.B e2undo
was written by Aneesh Kumar K.V. (aneesh.kumar@linux.vnet.ibm.com)
diff --git a/misc/e2undo.c b/misc/e2undo.c
index d828d3b..00e8e27 100644
--- a/misc/e2undo.c
+++ b/misc/e2undo.c
@@ -20,13 +20,72 @@
#if HAVE_ERRNO_H
#include <errno.h>
#endif
-#include "ext2fs/tdb.h"
#include "ext2fs/ext2fs.h"
#include "nls-enable.h"
-static unsigned char mtime_key[] = "filesystem MTIME";
-static unsigned char uuid_key[] = "filesystem UUID";
-static unsigned char blksize_key[] = "filesystem BLKSIZE";
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Undo file format: The file is cut up into undo_header.block_size blocks.
+ * The first block contains the header.
+ * There is then a repeating series of blocks as follows:
+ * A key block, which contains undo_keys to map the following data blocks.
+ * Data blocks
+ * The last block contains the superblock.
+ * (Note that there are pointers to the first key block and the sb, so this
+ * order isn't strictly necessary.)
+ */
+#define E2UNDO_MAGIC "E2UNDO02"
+#define KEYBLOCK_MAGIC 0xCADECADE
+
+struct undo_header {
+ char magic[8]; /* "E2UNDO02" */
+ __le64 num_keys; /* how many keys? */
+ __le64 super_offset; /* where in the file is the superblock copy? */
+ __le64 key_offset; /* where do the key/data block chunks start? */
+ __le32 block_size; /* block size of the undo file */
+ __le32 fs_block_size; /* block size of the target device */
+ __le32 sb_crc; /* crc32c of the superblock */
+ __u8 padding[464]; /* padding */
+ __le32 header_crc; /* crc32c of the header (but not this field) */
+};
+
+struct undo_key {
+ __le64 fsblk; /* where in the fs does the block go */
+ __le32 blk_crc; /* crc32c of the block */
+ __le32 size; /* how many bytes in this block? */
+};
+
+struct undo_key_block {
+ __le32 magic; /* KEYBLOCK_MAGIC number */
+ __le32 crc; /* block checksum */
+ __le64 reserved; /* zero */
+
+ struct undo_key keys[0]; /* keys, which come immediately after */
+};
+
+struct undo_key_info {
+ blk64_t fsblk;
+ blk64_t fileblk;
+ __u32 blk_crc;
+ unsigned int size;
+};
+
+struct undo_context {
+ struct undo_header hdr;
+ io_channel undo_file;
+ unsigned int blocksize, fs_blocksize;
+ blk64_t super_block;
+ size_t num_keys;
+ struct undo_key_info *keys;
+};
+#define KEYS_PER_BLOCK(d) (((d)->blocksize / sizeof(struct undo_key)) - 1)
static char *prg_name;
@@ -37,13 +96,28 @@ static void usage(void)
exit(1);
}
-static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
+static void print_undo_mismatch(struct ext2_super_block *fs_super,
+ struct ext2_super_block *undo_super)
+{
+ printf("%s",
+ _("The file system superblock doesn't match the undo file.\n"));
+ if (memcmp(fs_super->s_uuid, undo_super->s_uuid,
+ sizeof(fs_super->s_uuid)))
+ printf("%s", _("UUID does not match.\n"));
+ if (fs_super->s_mtime != undo_super->s_mtime)
+ printf("%s", _("Last mount time does not match.\n"));
+ if (fs_super->s_wtime != undo_super->s_wtime)
+ printf("%s", _("Last write time does not match.\n"));
+ if (fs_super->s_kbytes_written != undo_super->s_kbytes_written)
+ printf("%s", _("Lifetime write counter does not match.\n"));
+}
+
+static int check_filesystem(struct undo_context *ctx, io_channel channel)
{
- __u32 s_mtime;
- __u8 s_uuid[16];
- errcode_t retval;
- TDB_DATA tdb_key, tdb_data;
struct ext2_super_block super;
+ char *buf;
+ __u32 sb_crc;
+ errcode_t retval;
io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
@@ -53,83 +127,64 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
return retval;
}
- tdb_key.dptr = mtime_key;
- tdb_key.dsize = sizeof(mtime_key);
- tdb_data = tdb_fetch(tdb, tdb_key);
- if (!tdb_data.dptr) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- com_err(prg_name, retval, "%s",
- _("while fetching last mount time."));
+ /*
+ * Compare the FS and the undo file superblock so that we can't apply
+ * e2undo "patches" out of order.
+ */
+ retval = ext2fs_get_mem(ctx->blocksize, &buf);
+ if (retval) {
+ com_err(prg_name, retval, "%s", _("while allocating memory"));
return retval;
}
-
- s_mtime = *(__u32 *)tdb_data.dptr;
- free(tdb_data.dptr);
- if (super.s_mtime != s_mtime) {
- com_err(prg_name, 0,
- _("The filesystem last mount time didn't match %u."),
- s_mtime);
-
- return -1;
- }
-
-
- tdb_key.dptr = uuid_key;
- tdb_key.dsize = sizeof(uuid_key);
- tdb_data = tdb_fetch(tdb, tdb_key);
- if (!tdb_data.dptr) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- com_err(prg_name, retval, "%s", _("while fetching UUID"));
+ retval = io_channel_read_blk64(ctx->undo_file, ctx->super_block,
+ -SUPERBLOCK_SIZE, buf);
+ if (retval) {
+ ext2fs_free_mem(&buf);
+ com_err(prg_name, retval, "%s", _("while fetching superblock"));
return retval;
}
- memcpy(s_uuid, tdb_data.dptr, sizeof(s_uuid));
- free(tdb_data.dptr);
- if (memcmp(s_uuid, super.s_uuid, sizeof(s_uuid))) {
- com_err(prg_name, 0, "%s",
- _("The filesystem UUID didn't match."));
+ if (memcmp(&super, buf, sizeof(super))) {
+ print_undo_mismatch(&super, (struct ext2_super_block *)buf);
+ ext2fs_free_mem(&buf);
+ return -1;
+ }
+ sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf, SUPERBLOCK_SIZE);
+ if (ext2fs_le32_to_cpu(ctx->hdr.sb_crc) != sb_crc) {
+ fprintf(stderr,
+ _("Undo file superblock checksum doesn't match.\n"));
return -1;
}
+ ext2fs_free_mem(&buf);
return 0;
}
-static int set_blk_size(TDB_CONTEXT *tdb, io_channel channel)
+static int key_compare(const void *a, const void *b)
{
- int block_size;
- errcode_t retval;
- TDB_DATA tdb_key, tdb_data;
-
- tdb_key.dptr = blksize_key;
- tdb_key.dsize = sizeof(blksize_key);
- tdb_data = tdb_fetch(tdb, tdb_key);
- if (!tdb_data.dptr) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
- com_err(prg_name, retval, "%s", _("while fetching block size"));
- return retval;
- }
-
- block_size = *(int *)tdb_data.dptr;
- free(tdb_data.dptr);
-#ifdef DEBUG
- printf("Block size %d\n", block_size);
-#endif
- io_channel_set_blksize(channel, block_size);
+ const struct undo_key_info *ka, *kb;
- return 0;
+ ka = a;
+ kb = b;
+ return (ka->fsblk > kb->fsblk) -
+ (ka->fsblk < kb->fsblk);
}
int main(int argc, char *argv[])
{
- int c,force = 0;
- TDB_CONTEXT *tdb;
- TDB_DATA key, data;
+ int c, force = 0, dry_run = 0;
io_channel channel;
errcode_t retval;
- int mount_flags;
- blk64_t blk_num;
+ int mount_flags, csum_error = 0;
+ size_t i, keys_per_block;
char *device_name, *tdb_file;
io_manager manager = unix_io_manager;
- void *old_dptr = NULL;
+ struct undo_context undo_ctx;
+ char *buf;
+ struct undo_key_block *keyb;
+ struct undo_key *dkey;
+ struct undo_key_info *ikey;
+ __u32 key_crc, blk_crc, hdr_crc;
+ blk64_t lblk;
#ifdef ENABLE_NLS
setlocale(LC_MESSAGES, "");
@@ -141,11 +196,14 @@ int main(int argc, char *argv[])
add_error_table(&et_ext2_error_table);
prg_name = argv[0];
- while((c = getopt(argc, argv, "f")) != EOF) {
+ while((c = getopt(argc, argv, "fn")) != EOF) {
switch (c) {
case 'f':
force = 1;
break;
+ case 'n':
+ dry_run = 1;
+ break;
default:
usage();
}
@@ -157,14 +215,41 @@ int main(int argc, char *argv[])
tdb_file = argv[optind];
device_name = argv[optind+1];
- tdb = tdb_open(tdb_file, 0, 0, O_RDONLY, 0600);
-
- if (!tdb) {
+ /* Interpret the undo file */
+ retval = manager->open(tdb_file, IO_FLAG_EXCLUSIVE,
+ &undo_ctx.undo_file);
+ if (retval) {
com_err(prg_name, errno,
_("while opening undo file `%s'\n"), tdb_file);
exit(1);
}
+ retval = io_channel_read_blk64(undo_ctx.undo_file, 0,
+ -(int)sizeof(undo_ctx.hdr),
+ &undo_ctx.hdr);
+ if (retval) {
+ com_err(prg_name, retval, _("while reading undo file"));
+ exit(1);
+ }
+ if (memcmp(undo_ctx.hdr.magic, E2UNDO_MAGIC,
+ sizeof(undo_ctx.hdr.magic))) {
+ fprintf(stderr, _("%s: Not an undo file.\n"), tdb_file);
+ exit(1);
+ }
+ hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&undo_ctx.hdr,
+ sizeof(struct undo_header) -
+ sizeof(__u32));
+ if (!force && ext2fs_le32_to_cpu(undo_ctx.hdr.header_crc) != hdr_crc) {
+ fprintf(stderr, _("%s: Header checksum doesn't match.\n"),
+ tdb_file);
+ exit(1);
+ }
+ undo_ctx.blocksize = ext2fs_le32_to_cpu(undo_ctx.hdr.block_size);
+ undo_ctx.fs_blocksize = ext2fs_le32_to_cpu(undo_ctx.hdr.fs_block_size);
+ undo_ctx.super_block = ext2fs_le64_to_cpu(undo_ctx.hdr.super_offset);
+ undo_ctx.num_keys = ext2fs_le64_to_cpu(undo_ctx.hdr.num_keys);
+ io_channel_set_blksize(undo_ctx.undo_file, undo_ctx.blocksize);
+ /* open the fs */
retval = ext2fs_check_if_mounted(device_name, &mount_flags);
if (retval) {
com_err(prg_name, retval, _("Error while determining whether "
@@ -179,52 +264,146 @@ int main(int argc, char *argv[])
}
retval = manager->open(device_name,
- IO_FLAG_EXCLUSIVE | IO_FLAG_RW, &channel);
+ IO_FLAG_EXCLUSIVE | (dry_run ? 0 : IO_FLAG_RW),
+ &channel);
if (retval) {
com_err(prg_name, retval,
_("while opening `%s'"), device_name);
exit(1);
}
- if (!force && check_filesystem(tdb, channel)) {
+ if (!force && check_filesystem(&undo_ctx, channel))
exit(1);
- }
- if (set_blk_size(tdb, channel)) {
+ /* prepare to read keys */
+ retval = ext2fs_get_mem(sizeof(struct undo_key_info) * undo_ctx.num_keys,
+ &undo_ctx.keys);
+ if (retval) {
+ com_err(prg_name, retval, "%s", _("while allocating memory"));
+ exit(1);
+ }
+ ikey = undo_ctx.keys;
+ retval = ext2fs_get_mem(undo_ctx.blocksize, &keyb);
+ if (retval) {
+ com_err(prg_name, retval, "%s", _("while allocating memory"));
+ exit(1);
+ }
+ retval = ext2fs_get_mem(undo_ctx.blocksize, &buf);
+ if (retval) {
+ com_err(prg_name, retval, "%s", _("while allocating memory"));
exit(1);
}
- for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
- free(old_dptr);
- old_dptr = key.dptr;
- if (!strcmp((char *) key.dptr, (char *) mtime_key) ||
- !strcmp((char *) key.dptr, (char *) uuid_key) ||
- !strcmp((char *) key.dptr, (char *) blksize_key)) {
- continue;
+ /* load keys */
+ keys_per_block = KEYS_PER_BLOCK(&undo_ctx);
+ lblk = ext2fs_le64_to_cpu(undo_ctx.hdr.key_offset);
+ dbg_printf("nr_keys=%lu, kpb=%zu, blksz=%u\n",
+ undo_ctx.num_keys, keys_per_block, undo_ctx.blocksize);
+ for (i = 0; i < undo_ctx.num_keys; i += keys_per_block) {
+ size_t j, max_j;
+ __le32 crc;
+
+ retval = io_channel_read_blk64(undo_ctx.undo_file,
+ lblk, 1, keyb);
+ if (retval) {
+ com_err(prg_name, retval, "%s", _("while reading keys"));
+ exit(1);
+ }
+
+ /* check keys */
+ if (!force &&
+ ext2fs_le32_to_cpu(keyb->magic) != KEYBLOCK_MAGIC) {
+ fprintf(stderr, _("%s: wrong key magic at %llu\n"),
+ tdb_file, lblk);
+ exit(1);
}
+ crc = keyb->crc;
+ keyb->crc = 0;
+ key_crc = ext2fs_crc32c_le(~0, (unsigned char *)keyb,
+ undo_ctx.blocksize);
+ if (!force && ext2fs_le32_to_cpu(crc) != key_crc) {
+ fprintf(stderr,
+ _("%s: key block checksum error at %llu.\n"),
+ tdb_file, lblk);
+ exit(1);
+ }
+
+ /* load keys from key block */
+ lblk++;
+ max_j = undo_ctx.num_keys - i;
+ if (max_j > keys_per_block)
+ max_j = keys_per_block;
+ for (j = 0, dkey = keyb->keys;
+ j < max_j;
+ j++, ikey++, dkey++) {
+ ikey->fsblk = ext2fs_le64_to_cpu(dkey->fsblk);
+ ikey->fileblk = lblk++;
+ ikey->blk_crc = ext2fs_le32_to_cpu(dkey->blk_crc);
+ ikey->size = ext2fs_le32_to_cpu(dkey->size);
+
+ /* check each block's crc */
+ retval = io_channel_read_blk64(undo_ctx.undo_file,
+ ikey->fileblk,
+ -(int)undo_ctx.blocksize,
+ buf);
+ if (retval) {
+ com_err(prg_name, retval,
+ _("while fetching block %llu."),
+ ikey->fileblk);
+ exit(1);
+ }
- blk_num = *(blk64_t *)key.dptr;
- data = tdb_fetch(tdb, key);
- if (!data.dptr) {
- retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
+ blk_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf,
+ ikey->size);
+ if (blk_crc != ikey->blk_crc) {
+ fprintf(stderr,
+ _("checksum error in filesystem block %llu\n"),
+ ikey->fsblk);
+ if (!force)
+ exit(1);
+ csum_error = 1;
+ }
+ }
+ }
+ ext2fs_free_mem(&keyb);
+
+ /* sort keys in fs block order */
+ qsort(undo_ctx.keys, undo_ctx.num_keys, sizeof(struct undo_key_info),
+ key_compare);
+
+ /* replay */
+ io_channel_set_blksize(channel, undo_ctx.fs_blocksize);
+ for (i = 0, ikey = undo_ctx.keys; i < undo_ctx.num_keys; i++, ikey++) {
+ retval = io_channel_read_blk64(undo_ctx.undo_file,
+ ikey->fileblk,
+ -(int)undo_ctx.blocksize,
+ buf);
+ if (retval) {
com_err(prg_name, retval,
- _("while fetching block %llu."), blk_num);
+ _("while fetching block %llu."),
+ ikey->fileblk);
exit(1);
}
- printf(_("Replayed transaction of size %zd at location %llu\n"),
- data.dsize, blk_num);
- retval = io_channel_write_blk64(channel, blk_num,
- -data.dsize, data.dptr);
- free(data.dptr);
- if (retval == -1) {
+
+ dbg_printf("Replayed transaction of size %u from %llu to %llu\n",
+ ikey->size, ikey->fileblk, ikey->fsblk);
+ if (dry_run)
+ continue;
+ retval = io_channel_write_blk64(channel, ikey->fsblk,
+ -(int)ikey->size, buf);
+ if (retval) {
com_err(prg_name, retval,
- _("while writing block %llu."), blk_num);
+ _("while writing block %llu."), ikey->fsblk);
exit(1);
}
}
- free(old_dptr);
+
+ if (csum_error)
+ fprintf(stderr, _("Undo file corruption; run e2fsck NOW!\n"));
+ ext2fs_free_mem(&buf);
+ ext2fs_free_mem(&undo_ctx.keys);
io_channel_close(channel);
- tdb_close(tdb);
+ io_channel_close(undo_ctx.undo_file);
- return 0;
+ return csum_error;
}
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 16/31] e2undo: ditch tdb file, write everything to a flat file
2014-12-20 21:18 ` [PATCH 16/31] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
@ 2015-01-08 1:36 ` Darrick J. Wong
0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2015-01-08 1:36 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
All of the comments below will be fixed in the v2 patch.
On Sat, Dec 20, 2014 at 01:18:37PM -0800, Darrick J. Wong wrote:
> The existing undo file format (which is based on tdb) has many
> problems. First, its comparison of superblock fields is ineffective,
> since the last mount time is only written by the kernel, not the tools
> (which means that undo files can be applied out of order, thus
> corrupting the filesystem); block numbers are written in CPU byte
> order, which will cause silent failures if an undo file is moved from
> one type of system to another; using the tdb database costs us an
> enormous amount of CPU overhead to maintain the key data structure,
> and finally, the tdb database is unable to deal with databases larger
> than 2GB. (Upstream tdb 1.2.12 can handle 4GB, but upgrading a 2TB FS
> to 64bit,metadata_csum easily produces 4.5GB of undo files, so we
> might as well move off of tdb now.)
>
> The last problem is fatal if you want to use tune2fs to turn on
> metadata checksumming, since that rewrites every block on the
> filesystem, which can easily produce a many-gigabyte undo file, which
> of course is unreadable and therefore the operation cannot be undone.
>
> Therefore, rip all of that out in favor of writing to a flat file.
> Old blocks are appended to a file and the index is written to the end
> when we're done. This implementation is much faster than wasting a
> considerable amount of time trying to maintain a hash index, which
> drops the runtime overhead of tune2fs -O metadata_csum from ~45min
> to ~20 seconds on a 2TB filesystem.
>
> I have a few reasons that factored in my decision not to repurpose the
> jbd2 file format for undo files. First, undo files are limited to
> 2^32 blocks (16TB) which some day might not serve us well. Second,
> the journal block size is tied to the file system block size, but
> mke2fs wants to be able to back up big chunks of old device contents.
> This would require large changes to the e2fsck journal replay code,
> which itself is derived from the kernel jbd2 driver, which I'd rather
> not destabilize. Third, I want to require undo files to store the FS
> superblock at the end of undo file creation so that e2undo can be
> reasonably sure that an undo file is supposed to apply against the
> given block device, and doing so would require changes to the jbd2
> format. Fourth, it didn't seem like a good idea that external
> journals should resemble undo files so closely.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> lib/ext2fs/undo_io.c | 283 ++++++++++++++++++++++++--------------
> misc/e2undo.8.in | 11 +
> misc/e2undo.c | 371 +++++++++++++++++++++++++++++++++++++-------------
> 3 files changed, 461 insertions(+), 204 deletions(-)
>
>
> diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
> index 9a01e30..a480246 100644
> --- a/lib/ext2fs/undo_io.c
> +++ b/lib/ext2fs/undo_io.c
> @@ -39,8 +39,6 @@
> #endif
> #include <limits.h>
>
> -#include "tdb.h"
> -
> #include "ext2_fs.h"
> #include "ext2fs.h"
>
> @@ -50,17 +48,67 @@
> #define ATTR(x)
> #endif
>
> +#undef DEBUG
> +
> +#ifdef DEBUG
> +# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
> +#else
> +# define dbg_printf(f, a...)
> +#endif
> +
> /*
> * For checking structure magic numbers...
> */
>
> #define EXT2_CHECK_MAGIC(struct, code) \
> if ((struct)->magic != (code)) return (code)
> +/*
> + * Undo file format: The file is cut up into undo_header.block_size blocks.
> + * The first block contains the header.
> + * There is then a repeating series of blocks as follows:
> + * A key block, which contains undo_keys to map the following data blocks.
> + * Data blocks
> + * The last block contains the superblock.
> + * (Note that there are pointers to the first key block and the sb, so this
> + * order isn't strictly necessary.)
> + */
> +#define E2UNDO_MAGIC "E2UNDO02"
> +#define KEYBLOCK_MAGIC 0xCADECADE
> +
> +struct undo_header {
> + char magic[8]; /* "E2UNDO02" */
> + __le64 num_keys; /* how many keys? */
> + __le64 super_offset; /* where in the file is the superblock copy? */
> + __le64 key_offset; /* where do the key/data block chunks start? */
> + __le32 block_size; /* block size of the undo file */
> + __le32 fs_block_size; /* block size of the target device */
> + __le32 sb_crc; /* crc32c of the superblock */
> + __u8 padding[464]; /* padding */
> + __le32 header_crc; /* crc32c of this header (but not this field) */
> +};
> +
> +struct undo_key {
> + __le64 fsblk; /* where in the fs does the block go */
> + __le32 blk_crc; /* crc32c of the block */
> + __le32 size; /* how many bytes in this block? */
> +};
> +
> +struct undo_key_block {
> + __le32 magic; /* KEYBLOCK_MAGIC number */
> + __le32 crc; /* block checksum */
> + __le64 reserved; /* zero */
> +
> + struct undo_key keys[0]; /* keys, which come immediately after */
> +};
>
> struct undo_private_data {
> int magic;
> - TDB_CONTEXT *tdb;
> - char *tdb_file;
> +
> + /* the undo file io channel */
> + io_channel undo_file;
> + blk64_t undo_blk_num, key_blk_num;
> + struct undo_key_block *keyb;
> + size_t num_keys, keys_in_block;
>
> /* The backing io channel */
> io_channel real;
> @@ -73,16 +121,15 @@ struct undo_private_data {
>
> ext2fs_block_bitmap written_block_map;
> struct struct_ext2_filsys fake_fs;
> +
> + struct undo_header hdr;
> };
> +#define KEYS_PER_BLOCK(d) (((d)->tdb_data_size / sizeof(struct undo_key)) - 1)
>
> static io_manager undo_io_backing_manager;
> static char *tdb_file;
> static int actual_size;
>
> -static unsigned char mtime_key[] = "filesystem MTIME";
> -static unsigned char blksize_key[] = "filesystem BLKSIZE";
> -static unsigned char uuid_key[] = "filesystem UUID";
> -
> errcode_t set_undo_io_backing_manager(io_manager manager)
> {
> /*
> @@ -103,17 +150,34 @@ errcode_t set_undo_io_backup_file(char *file_name)
> return 0;
> }
>
> -static errcode_t write_file_system_identity(io_channel undo_channel,
> - TDB_CONTEXT *tdb)
> +static errcode_t write_undo_indexes(struct undo_private_data *data)
> {
> errcode_t retval;
> struct ext2_super_block super;
> - TDB_DATA tdb_key, tdb_data;
> - struct undo_private_data *data;
> io_channel channel;
> - int block_size ;
> + int block_size;
> + __u32 sb_crc, hdr_crc;
> +
> + /* Spit out a key block, if there's any data */
> + if (data->keys_in_block) {
> + data->keyb->magic = ext2fs_cpu_to_le32(KEYBLOCK_MAGIC);
> + data->keyb->crc = 0;
> + data->keyb->crc = ext2fs_cpu_to_le32(
> + ext2fs_crc32c_le(~0,
> + (unsigned char *)data->keyb,
> + data->tdb_data_size));
> + dbg_printf("Writing keyblock to blk %llu\n", data->key_blk_num);
> + retval = io_channel_write_blk64(data->undo_file,
> + data->key_blk_num,
> + 1, data->keyb);
> + if (retval)
> + return retval;
> + memset(data->keyb, 0, data->tdb_data_size);
> + data->keys_in_block = 0;
> + data->key_blk_num = data->undo_blk_num;
> + }
>
> - data = (struct undo_private_data *) undo_channel->private_data;
> + /* Prepare superblock for write */
> channel = data->real;
> block_size = channel->block_size;
>
> @@ -121,52 +185,42 @@ static errcode_t write_file_system_identity(io_channel undo_channel,
> retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
> if (retval)
> goto err_out;
> -
> - /* Write to tdb file in the file system byte order */
> - tdb_key.dptr = mtime_key;
> - tdb_key.dsize = sizeof(mtime_key);
> - tdb_data.dptr = (unsigned char *) &(super.s_mtime);
> - tdb_data.dsize = sizeof(super.s_mtime);
> -
> - retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
> - if (retval == -1) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> + sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)&super, SUPERBLOCK_SIZE);
> +
> + /* Write the undo header to disk. */
> + memcpy(data->hdr.magic, E2UNDO_MAGIC, sizeof(data->hdr.magic));
> + data->hdr.num_keys = ext2fs_cpu_to_le64(data->num_keys);
> + data->hdr.super_offset = ext2fs_cpu_to_le64(data->undo_blk_num);
> + data->hdr.key_offset = ext2fs_cpu_to_le64(1);
> + data->hdr.fs_block_size = ext2fs_cpu_to_le32(block_size);
> + data->hdr.sb_crc = ext2fs_cpu_to_le32(sb_crc);
It would be useful to have a state bit here that could tell us if we made it
all the way to undo_close(), which implies that the operation completed
successfully. When the bit isn't set, that means that the program writing the
undo file likely errored out halfway through, and the user should be warned at
e2undo time to run fsck.
> + hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&data->hdr,
> + sizeof(data->hdr) -
> + sizeof(data->hdr.header_crc));
> + data->hdr.header_crc = ext2fs_cpu_to_le32(hdr_crc);
> + retval = io_channel_write_blk64(data->undo_file, 0,
> + -(int)sizeof(data->hdr),
> + &data->hdr);
> + if (retval)
> goto err_out;
> - }
> -
> - tdb_key.dptr = uuid_key;
> - tdb_key.dsize = sizeof(uuid_key);
> - tdb_data.dptr = (unsigned char *)&(super.s_uuid);
> - tdb_data.dsize = sizeof(super.s_uuid);
>
> - retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
> - if (retval == -1) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> - }
> + /*
> + * Record the entire superblock (in FS byte order) so that we can't
> + * apply e2undo files to the wrong FS or out of order.
> + */
> + dbg_printf("Writing superblock to block %llu\n", data->undo_blk_num);
> + retval = io_channel_write_blk64(data->undo_file, data->undo_blk_num,
> + -SUPERBLOCK_SIZE, &super);
It's confusing how this keeps writing the superblock wherever the last block
is and then lets the next block write overwrite it. Seeing as the header tells
us where the superblock is located, just put it somewhere at the beginning, and
perhaps twiddle s_magic so that blkid and libmagic won't get confused if the
superblock ends up at offset 1024.
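A rough sketch of the s_magic twiddling, assuming the superblock copy is handled
as a raw 1024-byte buffer. The helper names sb_toggle_magic() and
sb_store_masked() are hypothetical; the only layout fact relied on is that
s_magic is the 16-bit field at byte offset 56 of the ext2/3/4 superblock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SB_SIZE         1024  /* size of an ext2/3/4 superblock */
#define SB_MAGIC_OFFSET 56    /* byte offset of s_magic in the superblock */

/* Invert the two s_magic bytes in place; applying it twice restores them. */
static void sb_toggle_magic(unsigned char *sb)
{
	assert(sb != NULL);
	sb[SB_MAGIC_OFFSET] = (unsigned char)~sb[SB_MAGIC_OFFSET];
	sb[SB_MAGIC_OFFSET + 1] = (unsigned char)~sb[SB_MAGIC_OFFSET + 1];
}

/* Copy the live superblock into the undo-file buffer with the magic masked. */
static void sb_store_masked(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, SB_SIZE);
	sb_toggle_magic(dst);
}
```

The reader side would call sb_toggle_magic() on the stored copy before
comparing it with the superblock on the device, and sb_crc would have to be
defined over one form or the other consistently.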
> + if (retval)
> + goto err_out;
> + data->undo_blk_num++;
>
> + retval = io_channel_flush(data->undo_file);
> err_out:
> io_channel_set_blksize(channel, block_size);
> return retval;
> }
>
> -static errcode_t write_block_size(TDB_CONTEXT *tdb, int block_size)
> -{
> - errcode_t retval;
> - TDB_DATA tdb_key, tdb_data;
> -
> - tdb_key.dptr = blksize_key;
> - tdb_key.dsize = sizeof(blksize_key);
> - tdb_data.dptr = (unsigned char *)&(block_size);
> - tdb_data.dsize = sizeof(block_size);
> -
> - retval = tdb_store(tdb, tdb_key, tdb_data, TDB_INSERT);
> - if (retval == -1) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> - }
> -
> - return retval;
> -}
> -
> static errcode_t undo_setup_tdb(struct undo_private_data *data)
> {
> errcode_t retval;
> @@ -187,15 +241,17 @@ static errcode_t undo_setup_tdb(struct undo_private_data *data)
> if (retval)
> return retval;
>
> - /* Write the blocksize to tdb file */
> - tdb_transaction_start(data->tdb);
> - retval = write_block_size(data->tdb,
> - data->tdb_data_size);
> - if (retval) {
> - tdb_transaction_cancel(data->tdb);
> - return EXT2_ET_TDB_ERR_IO;
> - }
> - tdb_transaction_commit(data->tdb);
> + /* Allocate key block */
> + retval = ext2fs_get_memzero(data->tdb_data_size, &data->keyb);
> + if (retval)
> + return retval;
> + data->key_blk_num = data->undo_blk_num++;
> +
> + /* Record block size */
> + dbg_printf("Undo block size %d\n", data->tdb_data_size);
> + dbg_printf("Keys per block %zu\n", KEYS_PER_BLOCK(data));
> + data->hdr.block_size = ext2fs_cpu_to_le32(data->tdb_data_size);
> + io_channel_set_blksize(data->undo_file, data->tdb_data_size);
> return 0;
> }
>
> @@ -208,13 +264,16 @@ static errcode_t undo_write_tdb(io_channel channel,
> errcode_t retval = 0;
> ext2_loff_t offset;
> struct undo_private_data *data;
> - TDB_DATA tdb_key, tdb_data;
> unsigned char *read_ptr;
> unsigned long long end_block;
> + unsigned long long data_size;
> + void *data_ptr;
> + struct undo_key *key;
> + __u32 blk_crc;
>
> data = (struct undo_private_data *) channel->private_data;
>
> - if (data->tdb == NULL) {
> + if (data->undo_file == NULL) {
> /*
> * Transaction database not initialized
> */
> @@ -243,11 +302,7 @@ static errcode_t undo_write_tdb(io_channel channel,
> block_num = offset / data->tdb_data_size;
> end_block = (offset + size) / data->tdb_data_size;
>
> - tdb_transaction_start(data->tdb);
> - while (block_num <= end_block ) {
> -
> - tdb_key.dptr = (unsigned char *)&block_num;
> - tdb_key.dsize = sizeof(block_num);
> + while (block_num <= end_block) {
> /*
> * Check if we have the record already
> */
> @@ -259,6 +314,13 @@ static errcode_t undo_write_tdb(io_channel channel,
> }
> ext2fs_mark_block_bitmap2(data->written_block_map, block_num);
>
> + /* Spit out a key block */
> + if (data->keys_in_block == KEYS_PER_BLOCK(data)) {
> + retval = write_undo_indexes(data);
> + if (retval)
> + return retval;
> + }
> +
> /*
> * Read one block using the backing I/O manager
> * The backing I/O manager block size may be
> @@ -273,7 +335,6 @@ static errcode_t undo_write_tdb(io_channel channel,
> ((offset - data->offset) % channel->block_size);
> retval = ext2fs_get_mem(count, &read_ptr);
> if (retval) {
> - tdb_transaction_cancel(data->tdb);
> return retval;
> }
>
> @@ -288,33 +349,43 @@ static errcode_t undo_write_tdb(io_channel channel,
> if (retval) {
> if (retval != EXT2_ET_SHORT_READ) {
> free(read_ptr);
> - tdb_transaction_cancel(data->tdb);
> return retval;
> }
> /*
> * short read so update the record size
> * accordingly
> */
> - tdb_data.dsize = actual_size;
> + data_size = actual_size;
> } else {
> - tdb_data.dsize = data->tdb_data_size;
> + data_size = data->tdb_data_size;
> }
> - tdb_data.dptr = read_ptr +
> - ((offset - data->offset) % channel->block_size);
> -#ifdef DEBUG
> - printf("Printing with key %lld data %x and size %d\n",
> + if (data_size == 0) {
> + free(read_ptr);
> + block_num++;
> + continue;
> + }
> + dbg_printf("Read %llu bytes from FS block %llu (blk=%llu cnt=%u)\n",
> + data_size, backing_blk_num, block, count);
> + if ((data_size % data->undo_file->block_size) == 0)
> + sz = data_size / data->undo_file->block_size;
> + else
> + sz = -actual_size;
> + data_ptr = read_ptr + ((offset - data->offset) %
> + data->undo_file->block_size);
> + dbg_printf("Writing block %llu to offset %llu size %d\n",
> block_num,
> - tdb_data.dptr,
> - tdb_data.dsize);
> -#endif
> - retval = tdb_store(data->tdb, tdb_key, tdb_data, TDB_REPLACE);
> - if (retval == -1) {
> - /*
> - * TDB_ERR_EXISTS cannot happen because we
> - * have already verified it doesn't exist
> - */
> - tdb_transaction_cancel(data->tdb);
> - retval = EXT2_ET_TDB_ERR_IO;
> + data->undo_blk_num,
> + sz);
> + data->num_keys++;
> + key = &data->keyb->keys[data->keys_in_block++];
> + key->fsblk = ext2fs_cpu_to_le64(backing_blk_num);
> + blk_crc = ext2fs_crc32c_le(~0, (unsigned char *)data_ptr,
> + data_size);
> + key->blk_crc = ext2fs_cpu_to_le32(blk_crc);
> + key->size = ext2fs_cpu_to_le32(data_size);
Since the key size is a u32 byte count, you might as well collect a run of
contiguous blocks and save them all with one key. There's a slight risk that a
minor bitflip somewhere could cause the whole run to be marked bad, but
seeing as our current strategy for handling that is to re-run with -f to ignore
csum errors and force a fsck, I guess that's not /so/ bad.
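Roughly, the coalescing could look like the sketch below. struct undo_run and
coalesce_runs() are illustrative names, not anything in the patch; the input is
assumed to be the list of filesystem block numbers in the order they would be
saved, and max_bytes is an assumed cap that keeps each run within the 32-bit
size field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One key covering a run of contiguous filesystem blocks (illustrative). */
struct undo_run {
	uint64_t fsblk;  /* first filesystem block in the run */
	uint32_t size;   /* total bytes covered by the run */
};

/*
 * Collapse a list of block numbers into runs of adjacent blocks, capping
 * each run at max_bytes. Returns the number of runs written to out, which
 * must have room for nblks entries.
 */
static size_t coalesce_runs(const uint64_t *blks, size_t nblks,
			    uint32_t blocksize, uint32_t max_bytes,
			    struct undo_run *out)
{
	size_t i, nruns = 0;

	assert(blocksize > 0 && max_bytes >= blocksize);
	for (i = 0; i < nblks; i++) {
		struct undo_run *last = nruns ? &out[nruns - 1] : NULL;

		/* Extend the current run if this block follows it directly. */
		if (last &&
		    blks[i] == last->fsblk + last->size / blocksize &&
		    last->size <= max_bytes - blocksize) {
			last->size += blocksize;
			continue;
		}
		/* Otherwise start a new run. */
		out[nruns].fsblk = blks[i];
		out[nruns].size = blocksize;
		nruns++;
	}
	return nruns;
}
```

Each run would get a single blk_crc over all of its bytes, which is where the
tradeoff mentioned above comes from.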
> + retval = io_channel_write_blk64(data->undo_file,
> + data->undo_blk_num++, sz, data_ptr);
> + if (retval) {
> free(read_ptr);
> return retval;
> }
> @@ -322,7 +393,6 @@ static errcode_t undo_write_tdb(io_channel channel,
> /* Next block */
> block_num++;
> }
> - tdb_transaction_commit(data->tdb);
>
> return retval;
> }
> @@ -375,29 +445,33 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
>
> memset(data, 0, sizeof(struct undo_private_data));
> data->magic = EXT2_ET_MAGIC_UNIX_IO_CHANNEL;
> - data->written_block_map = NULL;
> + data->undo_blk_num = 1;
>
> if (undo_io_backing_manager) {
> retval = undo_io_backing_manager->open(name, flags,
> &data->real);
> if (retval)
> goto cleanup;
> +
> + retval = ext2fs_open_file(tdb_file,
> + O_RDWR | O_CREAT | O_TRUNC | O_EXCL,
> + 0600);
O_CREAT | O_TRUNC? Why not allow programs to reopen an undo file, particularly
if the user passes in a specific undo file via -z? You'd want to perform some
checking to ensure that the state of the FS when we finished writing the undo
file matches the current state of the FS (comparing superblocks would suffice
to capture UUID, mount time, write time, and lifetime write counters).
For big multi-program transitions (think resize2fs -b to enable 64bit, then
tune2fs -O to enable metadata_csum, then a full fsck -fyD to rebuild the
indexes) reusing the same undo file saves us the trouble of storing (and then
restoring) a lot of in-between-program-invocation state.
(In particular, doing exactly that on my 2TB test FS reduces the undo space
requirement from 4430MB to 2930MB.)
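Something along these lines might do for the open side. This is only a sketch
using plain open(2) as a stand-in for ext2fs_open_file(); open_undo_file() and
its reopened out-parameter are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open an existing undo file so more undo data can be added to it, or
 * create a new one if it does not exist. Sets *reopened to 1 when an
 * existing file was found. Returns a file descriptor, or -1 with errno set.
 */
static int open_undo_file(const char *path, int *reopened)
{
	int fd;

	assert(path != NULL && reopened != NULL);
	fd = open(path, O_RDWR);
	if (fd >= 0) {
		*reopened = 1;
		return fd;
	}
	if (errno != ENOENT)
		return -1;
	*reopened = 0;
	/* O_EXCL so we never clobber a file that appeared in the meantime. */
	return open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
}
```

When reopened comes back as 1, the caller would still have to validate the
header and compare the stored superblock against the device before adding
anything, and fall back to refusing (or truncating) if they don't match.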
> + if (retval < 0)
> + goto cleanup;
> + close(retval);
> + retval = undo_io_backing_manager->open(tdb_file, IO_FLAG_RW,
> + &data->undo_file);
> + if (retval)
> + goto cleanup;
> } else {
> - data->real = 0;
> + data->real = NULL;
> + data->undo_file = NULL;
> }
>
> if (data->real)
> io->flags = (io->flags & ~CHANNEL_FLAGS_DISCARD_ZEROES) |
> (data->real->flags & CHANNEL_FLAGS_DISCARD_ZEROES);
>
> - /* setup the tdb file */
> - data->tdb = tdb_open(tdb_file, 0, TDB_CLEAR_IF_FIRST | TDB_NOLOCK | TDB_NOSYNC,
> - O_RDWR | O_CREAT | O_TRUNC | O_EXCL, 0600);
> - if (!data->tdb) {
> - retval = errno;
> - goto cleanup;
> - }
> -
> /*
> * setup err handler for read so that we know
> * when the backing manager fails do short read
> @@ -409,6 +483,8 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
> return 0;
>
> cleanup:
> + if (data && data->undo_file)
> + io_channel_close(data->undo_file);
> if (data && data->real)
> io_channel_close(data->real);
> if (data)
> @@ -430,13 +506,12 @@ static errcode_t undo_close(io_channel channel)
> if (--channel->refcount > 0)
> return 0;
> /* Before closing write the file system identity */
> - err = write_file_system_identity(channel, data->tdb);
> + err = write_undo_indexes(data);
Most of the e2fsprogs client programs will exit() when they encounter errors.
Since we're no longer writing out the entire undo file index with every block
write and the client programs can't be relied upon to shut down the IO channel,
I suppose we ought to find a way to make sure that write_undo_indexes() gets
called at exit if undo_close() hasn't already been called.
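One way to do that would be an atexit(3) hook, sketched below. Everything here
is a placeholder: undo_flush_indexes() stands in for write_undo_indexes(), and
the single pair of globals assumes only one undo channel is open per process.

```c
#include <assert.h>
#include <stdlib.h>

static int undo_index_written;  /* set once the index has been written */
static int undo_flush_count;    /* how many times the stand-in flush ran */

/* Stand-in for write_undo_indexes(); the real one writes keys and header. */
static void undo_flush_indexes(void)
{
	undo_flush_count++;
}

/* Write the index exactly once, whether from undo_close() or at exit. */
static void undo_finish(void)
{
	if (undo_index_written)
		return;
	undo_flush_indexes();
	assert(undo_flush_count > 0);  /* the flush stand-in must have run */
	undo_index_written = 1;
}

/* Arrange for undo_finish() to run if the program calls exit() early. */
static int undo_register_exit_hook(void)
{
	return atexit(undo_finish);
}
```

undo_open() would call undo_register_exit_hook() and undo_close() would call
undo_finish() directly. Note that atexit handlers don't run on _exit() or when
the process dies from a signal, so this only covers the plain exit() case.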
> if (data->real)
> retval = io_channel_close(data->real);
> - if (data->tdb) {
> - tdb_flush(data->tdb);
> - tdb_close(data->tdb);
> - }
> + if (data->undo_file)
> + io_channel_close(data->undo_file);
> + ext2fs_free_mem(&data->keyb);
> if (data->written_block_map)
> ext2fs_free_generic_bitmap(data->written_block_map);
> ext2fs_free_mem(&channel->private_data);
> @@ -632,8 +707,6 @@ static errcode_t undo_flush(io_channel channel)
> data = (struct undo_private_data *) channel->private_data;
> EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);
>
> - if (data->tdb)
> - tdb_flush(data->tdb);
> if (data->real)
> retval = io_channel_flush(data->real);
>
> diff --git a/misc/e2undo.8.in b/misc/e2undo.8.in
> index 4bf0798..3677b83 100644
> --- a/misc/e2undo.8.in
> +++ b/misc/e2undo.8.in
> @@ -10,6 +10,9 @@ e2undo \- Replay an undo log for an ext2/ext3/ext4 filesystem
> [
> .B \-f
> ]
> +[
> +.B \-n
> +]
> .I undo_log device
> .SH DESCRIPTION
> .B e2undo
> @@ -24,13 +27,15 @@ used to undo a failed operation by an e2fsprogs program.
> .B \-f
> Normally,
> .B e2undo
> -will check the filesystem UUID and last modified time to make sure the
> -undo log matches with the filesystem on the device. If they do not
> -match,
> +will check the filesystem superblock to make sure the undo log matches
> +with the filesystem on the device. If they do not match,
> .B e2undo
> will refuse to apply the undo log as a safety mechanism. The
> .B \-f
> option disables this safety mechanism.
> +.TP
> +.B \-n
> +Dry-run; do not actually write blocks back to the filesystem.
> .SH AUTHOR
> .B e2undo
> was written by Aneesh Kumar K.V. (aneesh.kumar@linux.vnet.ibm.com)
> diff --git a/misc/e2undo.c b/misc/e2undo.c
> index d828d3b..00e8e27 100644
> --- a/misc/e2undo.c
> +++ b/misc/e2undo.c
> @@ -20,13 +20,72 @@
> #if HAVE_ERRNO_H
> #include <errno.h>
> #endif
> -#include "ext2fs/tdb.h"
> #include "ext2fs/ext2fs.h"
> #include "nls-enable.h"
>
> -static unsigned char mtime_key[] = "filesystem MTIME";
> -static unsigned char uuid_key[] = "filesystem UUID";
> -static unsigned char blksize_key[] = "filesystem BLKSIZE";
> +#undef DEBUG
> +
> +#ifdef DEBUG
> +# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
> +#else
> +# define dbg_printf(f, a...)
> +#endif
> +
> +/*
> + * Undo file format: The file is cut up into undo_header.block_size blocks.
> + * The first block contains the header.
> + * There is then a repeating series of blocks as follows:
> + * A key block, which contains undo_keys to map the following data blocks.
> + * Data blocks
> + * The last block contains the superblock.
> + * (Note that there are pointers to the first key block and the sb, so this
> + * order isn't strictly necessary.)
> + */
> +#define E2UNDO_MAGIC "E2UNDO02"
> +#define KEYBLOCK_MAGIC 0xCADECADE
> +
> +struct undo_header {
> + char magic[8]; /* "E2UNDO02" */
> + __le64 num_keys; /* how many keys? */
> + __le64 super_offset; /* where in the file is the superblock copy? */
> + __le64 key_offset; /* where do the key/data block chunks start? */
> + __le32 block_size; /* block size of the undo file */
> + __le32 fs_block_size; /* block size of the target device */
> + __le32 sb_crc; /* crc32c of the superblock */
> + __u8 padding[464]; /* padding */
> + __le32 header_crc; /* crc32c of the header (but not this field) */
> +};
> +
> +struct undo_key {
> + __le64 fsblk; /* where in the fs does the block go */
> + __le32 blk_crc; /* crc32c of the block */
> + __le32 size; /* how many bytes in this block? */
> +};
> +
> +struct undo_key_block {
> + __le32 magic; /* KEYBLOCK_MAGIC number */
> + __le32 crc; /* block checksum */
> + __le64 reserved; /* zero */
> +
> + struct undo_key keys[0]; /* keys, which come immediately after */
> +};
> +
> +struct undo_key_info {
> + blk64_t fsblk;
> + blk64_t fileblk;
> + __u32 blk_crc;
> + unsigned int size;
> +};
> +
> +struct undo_context {
> + struct undo_header hdr;
> + io_channel undo_file;
> + unsigned int blocksize, fs_blocksize;
> + blk64_t super_block;
> + size_t num_keys;
> + struct undo_key_info *keys;
> +};
> +#define KEYS_PER_BLOCK(d) (((d)->blocksize / sizeof(struct undo_key)) - 1)
>
> static char *prg_name;
>
> @@ -37,13 +96,28 @@ static void usage(void)
> exit(1);
> }
>
> -static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
> +static void print_undo_mismatch(struct ext2_super_block *fs_super,
> + struct ext2_super_block *undo_super)
> +{
> + printf("%s",
> + _("The file system superblock doesn't match the undo file.\n"));
> + if (memcmp(fs_super->s_uuid, undo_super->s_uuid,
> + sizeof(fs_super->s_uuid)))
> + printf("%s", _("UUID does not match.\n"));
> + if (fs_super->s_mtime != undo_super->s_mtime)
> + printf("%s", _("Last mount time does not match.\n"));
> + if (fs_super->s_wtime != undo_super->s_wtime)
> + printf("%s", _("Last write time does not match.\n"));
> + if (fs_super->s_kbytes_written != undo_super->s_kbytes_written)
> + printf("%s", _("Lifetime write counter does not match.\n"));
> +}
> +
> +static int check_filesystem(struct undo_context *ctx, io_channel channel)
> {
> - __u32 s_mtime;
> - __u8 s_uuid[16];
> - errcode_t retval;
> - TDB_DATA tdb_key, tdb_data;
> struct ext2_super_block super;
> + char *buf;
> + __u32 sb_crc;
> + errcode_t retval;
>
> io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
> retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);
> @@ -53,83 +127,64 @@ static int check_filesystem(TDB_CONTEXT *tdb, io_channel channel)
> return retval;
> }
>
> - tdb_key.dptr = mtime_key;
> - tdb_key.dsize = sizeof(mtime_key);
> - tdb_data = tdb_fetch(tdb, tdb_key);
> - if (!tdb_data.dptr) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> - com_err(prg_name, retval, "%s",
> - _("while fetching last mount time."));
> + /*
> + * Compare the FS and the undo file superblock so that we can't apply
> + * e2undo "patches" out of order.
> + */
> + retval = ext2fs_get_mem(ctx->blocksize, &buf);
> + if (retval) {
> + com_err(prg_name, retval, "%s", _("while allocating memory"));
> return retval;
> }
> -
> - s_mtime = *(__u32 *)tdb_data.dptr;
> - free(tdb_data.dptr);
> - if (super.s_mtime != s_mtime) {
> - com_err(prg_name, 0,
> - _("The filesystem last mount time didn't match %u."),
> - s_mtime);
> -
> - return -1;
> - }
> -
> -
> - tdb_key.dptr = uuid_key;
> - tdb_key.dsize = sizeof(uuid_key);
> - tdb_data = tdb_fetch(tdb, tdb_key);
> - if (!tdb_data.dptr) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> - com_err(prg_name, retval, "%s", _("while fetching UUID"));
> + retval = io_channel_read_blk64(ctx->undo_file, ctx->super_block,
> + -SUPERBLOCK_SIZE, buf);
> + if (retval) {
> + ext2fs_free_mem(&buf);
> + com_err(prg_name, retval, "%s", _("while fetching superblock"));
> return retval;
> }
> - memcpy(s_uuid, tdb_data.dptr, sizeof(s_uuid));
> - free(tdb_data.dptr);
> - if (memcmp(s_uuid, super.s_uuid, sizeof(s_uuid))) {
> - com_err(prg_name, 0, "%s",
> - _("The filesystem UUID didn't match."));
> + if (memcmp(&super, buf, sizeof(super))) {
> + print_undo_mismatch(&super, (struct ext2_super_block *)buf);
> + ext2fs_free_mem(&buf);
> + return -1;
> + }
> + sb_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf, SUPERBLOCK_SIZE);
> + if (ext2fs_le32_to_cpu(ctx->hdr.sb_crc) != sb_crc) {
> + fprintf(stderr,
> + _("Undo file superblock checksum doesn't match.\n"));
> return -1;
> }
> + ext2fs_free_mem(&buf);
>
> return 0;
> }
>
> -static int set_blk_size(TDB_CONTEXT *tdb, io_channel channel)
> +static int key_compare(const void *a, const void *b)
> {
> - int block_size;
> - errcode_t retval;
> - TDB_DATA tdb_key, tdb_data;
> -
> - tdb_key.dptr = blksize_key;
> - tdb_key.dsize = sizeof(blksize_key);
> - tdb_data = tdb_fetch(tdb, tdb_key);
> - if (!tdb_data.dptr) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> - com_err(prg_name, retval, "%s", _("while fetching block size"));
> - return retval;
> - }
> -
> - block_size = *(int *)tdb_data.dptr;
> - free(tdb_data.dptr);
> -#ifdef DEBUG
> - printf("Block size %d\n", block_size);
> -#endif
> - io_channel_set_blksize(channel, block_size);
> + const struct undo_key_info *ka, *kb;
>
> - return 0;
> + ka = a;
> + kb = b;
> + return ext2fs_le64_to_cpu(ka->fsblk) -
> + ext2fs_le64_to_cpu(kb->fsblk);
> }
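One more thing to watch in key_compare(): the undo_key_info fields have already
been converted with ext2fs_le64_to_cpu() when the keys were loaded, and
returning the difference of two 64-bit block numbers as an int can truncate and
give the wrong sign. A possible shape for the comparator, with struct key_info
and key_compare_sketch() as illustrative stand-ins for the real names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative stand-in for struct undo_key_info; fields are CPU order. */
struct key_info {
	uint64_t fsblk;
	uint64_t fileblk;
};

/* Three-way compare that never truncates a 64-bit difference to int. */
static int key_compare_sketch(const void *a, const void *b)
{
	const struct key_info *ka = a, *kb = b;

	return (ka->fsblk > kb->fsblk) - (ka->fsblk < kb->fsblk);
}

/* Sort keys into filesystem block order. */
static void sort_keys(struct key_info *keys, size_t n)
{
	assert(keys != NULL || n == 0);
	qsort(keys, n, sizeof(*keys), key_compare_sketch);
}
```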
>
> int main(int argc, char *argv[])
> {
> - int c,force = 0;
> - TDB_CONTEXT *tdb;
> - TDB_DATA key, data;
> + int c, force = 0, dry_run = 0;
> io_channel channel;
> errcode_t retval;
> - int mount_flags;
> - blk64_t blk_num;
> + int mount_flags, csum_error = 0;
> + size_t i, keys_per_block;
> char *device_name, *tdb_file;
> io_manager manager = unix_io_manager;
> - void *old_dptr = NULL;
> + struct undo_context undo_ctx;
> + char *buf;
> + struct undo_key_block *keyb;
> + struct undo_key *dkey;
> + struct undo_key_info *ikey;
> + __u32 key_crc, blk_crc, hdr_crc;
> + blk64_t lblk;
>
> #ifdef ENABLE_NLS
> setlocale(LC_MESSAGES, "");
> @@ -141,11 +196,14 @@ int main(int argc, char *argv[])
> add_error_table(&et_ext2_error_table);
>
> prg_name = argv[0];
> - while((c = getopt(argc, argv, "f")) != EOF) {
> + while((c = getopt(argc, argv, "fn")) != EOF) {
> switch (c) {
> case 'f':
> force = 1;
> break;
> + case 'n':
> + dry_run = 1;
> + break;
Could be useful to have a -v for verbose status messages and a -h to dump the
e2undo file header.
> default:
> usage();
> }
> @@ -157,14 +215,41 @@ int main(int argc, char *argv[])
> tdb_file = argv[optind];
> device_name = argv[optind+1];
>
> - tdb = tdb_open(tdb_file, 0, 0, O_RDONLY, 0600);
> -
> - if (!tdb) {
> + /* Interpret the undo file */
> + retval = manager->open(tdb_file, IO_FLAG_EXCLUSIVE,
> + &undo_ctx.undo_file);
> + if (retval) {
> com_err(prg_name, errno,
> _("while opening undo file `%s'\n"), tdb_file);
> exit(1);
> }
> + retval = io_channel_read_blk64(undo_ctx.undo_file, 0,
> + -(int)sizeof(undo_ctx.hdr),
> + &undo_ctx.hdr);
> + if (retval) {
> + com_err(prg_name, retval, _("while reading undo file"));
> + exit(1);
> + }
> + if (memcmp(undo_ctx.hdr.magic, E2UNDO_MAGIC,
> + sizeof(undo_ctx.hdr.magic))) {
> + fprintf(stderr, _("%s: Not an undo file.\n"), tdb_file);
> + exit(1);
> + }
> + hdr_crc = ext2fs_crc32c_le(~0, (unsigned char *)&undo_ctx.hdr,
> + sizeof(struct undo_header) -
> + sizeof(__u32));
> + if (!force && ext2fs_le32_to_cpu(undo_ctx.hdr.header_crc) != hdr_crc) {
> + fprintf(stderr, _("%s: Header checksum doesn't match.\n"),
> + tdb_file);
> + exit(1);
> + }
> + undo_ctx.blocksize = ext2fs_le32_to_cpu(undo_ctx.hdr.block_size);
> + undo_ctx.fs_blocksize = ext2fs_le32_to_cpu(undo_ctx.hdr.fs_block_size);
> + undo_ctx.super_block = ext2fs_le64_to_cpu(undo_ctx.hdr.super_offset);
> + undo_ctx.num_keys = ext2fs_le64_to_cpu(undo_ctx.hdr.num_keys);
> + io_channel_set_blksize(undo_ctx.undo_file, undo_ctx.blocksize);
>
> + /* open the fs */
> retval = ext2fs_check_if_mounted(device_name, &mount_flags);
> if (retval) {
> com_err(prg_name, retval, _("Error while determining whether "
> @@ -179,52 +264,146 @@ int main(int argc, char *argv[])
> }
>
> retval = manager->open(device_name,
> - IO_FLAG_EXCLUSIVE | IO_FLAG_RW, &channel);
> + IO_FLAG_EXCLUSIVE | (dry_run ? 0 : IO_FLAG_RW),
> + &channel);
> if (retval) {
> com_err(prg_name, retval,
> _("while opening `%s'"), device_name);
> exit(1);
> }
>
> - if (!force && check_filesystem(tdb, channel)) {
> + if (!force && check_filesystem(&undo_ctx, channel))
> exit(1);
> - }
>
> - if (set_blk_size(tdb, channel)) {
> + /* prepare to read keys */
> + retval = ext2fs_get_mem(sizeof(struct undo_key_info) * undo_ctx.num_keys,
> + &undo_ctx.keys);
> + if (retval) {
> + com_err(prg_name, retval, "%s", _("while allocating memory"));
> + exit(1);
> + }
> + ikey = undo_ctx.keys;
> + retval = ext2fs_get_mem(undo_ctx.blocksize, &keyb);
> + if (retval) {
> + com_err(prg_name, retval, "%s", _("while allocating memory"));
> + exit(1);
> + }
> + retval = ext2fs_get_mem(undo_ctx.blocksize, &buf);
> + if (retval) {
> + com_err(prg_name, retval, "%s", _("while allocating memory"));
> exit(1);
> }
>
> - for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
> - free(old_dptr);
> - old_dptr = key.dptr;
> - if (!strcmp((char *) key.dptr, (char *) mtime_key) ||
> - !strcmp((char *) key.dptr, (char *) uuid_key) ||
> - !strcmp((char *) key.dptr, (char *) blksize_key)) {
> - continue;
> + /* load keys */
> + keys_per_block = KEYS_PER_BLOCK(&undo_ctx);
> + lblk = ext2fs_le64_to_cpu(undo_ctx.hdr.key_offset);
> + dbg_printf("nr_keys=%lu, kpb=%zu, blksz=%u\n",
> + undo_ctx.num_keys, keys_per_block, undo_ctx.blocksize);
> + for (i = 0; i < undo_ctx.num_keys; i += keys_per_block) {
> + size_t j, max_j;
> + __le32 crc;
> +
> + retval = io_channel_read_blk64(undo_ctx.undo_file,
> + lblk, 1, keyb);
> + if (retval) {
> + com_err(prg_name, retval, "%s", _("while reading keys"));
> + exit(1);
> + }
> +
> + /* check keys */
> + if (!force &&
> + ext2fs_le32_to_cpu(keyb->magic) != KEYBLOCK_MAGIC) {
> + fprintf(stderr, _("%s: wrong key magic at %llu\n"),
> + tdb_file, lblk);
> + exit(1);
> }
> + crc = keyb->crc;
> + keyb->crc = 0;
> + key_crc = ext2fs_crc32c_le(~0, (unsigned char *)keyb,
> + undo_ctx.blocksize);
> + if (!force && ext2fs_le32_to_cpu(crc) != key_crc) {
> + fprintf(stderr,
> + _("%s: key block checksum error at %llu.\n"),
> + tdb_file, lblk);
> + exit(1);
> + }
> +
> + /* load keys from key block */
> + lblk++;
> + max_j = undo_ctx.num_keys - i;
> + if (max_j > keys_per_block)
> + max_j = keys_per_block;
> + for (j = 0, dkey = keyb->keys;
> + j < max_j;
> + j++, ikey++, dkey++) {
> + ikey->fsblk = ext2fs_le64_to_cpu(dkey->fsblk);
> + ikey->fileblk = lblk++;
> + ikey->blk_crc = ext2fs_le32_to_cpu(dkey->blk_crc);
> + ikey->size = ext2fs_le32_to_cpu(dkey->size);
Icky. Please validate ikey->size here; further down, the code assumes that
ikey->size < blocksize. For that matter, this key format makes it quite
possible to specify multiblock extents, but this code would explode if it ever
found such a thing.
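Something like the following might do as a first cut. It is only a sketch:
key_size_ok() is a made-up helper, it assumes that one key never covers more
than one undo block, and treating a zero size as invalid is my assumption
rather than something the format spells out. It accepts sizes up to and
including the block size, since the read buffer is blocksize bytes long.

```c
#include <stdint.h>

/*
 * Hypothetical helper: accept a key only if its payload fits in the single
 * block-sized buffer that the checksum and replay loops read into.
 * Rejecting size == 0 is an assumption, not something the patch specifies.
 */
static int key_size_ok(uint32_t size, uint32_t blocksize)
{
	return size != 0 && size <= blocksize;
}
```

The check would go right after ikey->size is loaded, before the CRC is
computed over ikey->size bytes of buf.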
> +
> + /* check each block's crc */
> + retval = io_channel_read_blk64(undo_ctx.undo_file,
> + ikey->fileblk,
> + -(int)undo_ctx.blocksize,
> + buf);
> + if (retval) {
> + com_err(prg_name, retval,
> + _("while fetching block %llu."),
> + ikey->fileblk);
> + exit(1);
> + }
>
> - blk_num = *(blk64_t *)key.dptr;
> - data = tdb_fetch(tdb, key);
> - if (!data.dptr) {
> - retval = EXT2_ET_TDB_SUCCESS + tdb_error(tdb);
> + blk_crc = ext2fs_crc32c_le(~0, (unsigned char *)buf,
> + ikey->size);
> + if (blk_crc != ikey->blk_crc) {
> + fprintf(stderr,
> + _("checksum error in filesystem block %llu\n"),
> + ikey->fsblk);
> + if (!force)
> + exit(1);
> + csum_error = 1;
> + }
> + }
> + }
> + ext2fs_free_mem(&keyb);
> +
> + /* sort keys in fs block order */
> + qsort(undo_ctx.keys, undo_ctx.num_keys, sizeof(struct undo_key_info),
> + key_compare);
> +
> + /* replay */
> + io_channel_set_blksize(channel, undo_ctx.fs_blocksize);
> + for (i = 0, ikey = undo_ctx.keys; i < undo_ctx.num_keys; i++, ikey++) {
> + retval = io_channel_read_blk64(undo_ctx.undo_file,
> + ikey->fileblk,
> + -(int)undo_ctx.blocksize,
> + buf);
> + if (retval) {
> com_err(prg_name, retval,
> - _("while fetching block %llu."), blk_num);
> + _("while fetching block %llu."),
> + ikey->fileblk);
> exit(1);
> }
> - printf(_("Replayed transaction of size %zd at location %llu\n"),
> - data.dsize, blk_num);
> - retval = io_channel_write_blk64(channel, blk_num,
> - -data.dsize, data.dptr);
> - free(data.dptr);
> - if (retval == -1) {
> +
> + dbg_printf("Replayed transaction of size %u from %llu to %llu\n",
> + ikey->size, ikey->fileblk, ikey->fsblk);
> + if (dry_run)
> + continue;
> + retval = io_channel_write_blk64(channel, ikey->fsblk,
> + -(int)ikey->size, buf);
> + if (retval) {
> com_err(prg_name, retval,
> - _("while writing block %llu."), blk_num);
> + _("while writing block %llu."), ikey->fsblk);
> exit(1);
> }
> }
> - free(old_dptr);
> +
> + if (csum_error)
> + fprintf(stderr, _("Undo file corruption; run e2fsck NOW!\n"));
In addition to whining about checksum errors, I think we should complain about
IO errors without aborting playback. In case of either error, we can keep
going in the hope of restoring as much of the FS as possible.
Then we could try to open the filesystem (unless we're undoing mkfs) to clear
the VALID_FS bit and set the ERROR_FS bit, which will force a full fsck.
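The keep-going part could be shaped roughly like this. It is a sketch under
stated assumptions: replay_fn, struct replay_stats, replay_keys() and the demo
writer are all hypothetical names, and the real loop would call
io_channel_write_blk64() where the callback is invoked.

```c
#include <stddef.h>

/* Hypothetical callback: write one block back, return 0 on success. */
typedef int (*replay_fn)(unsigned long long fsblk, void *priv);

struct replay_stats {
	size_t replayed;	/* blocks written back successfully */
	size_t io_errors;	/* blocks that failed but were skipped */
};

/*
 * Replay every key, counting failures instead of exiting on the first one.
 * Returns nonzero if anything failed, so the caller can still tell the
 * user to run fsck afterwards.
 */
static int replay_keys(const unsigned long long *fsblks, size_t nr,
		       replay_fn write_blk, void *priv,
		       struct replay_stats *st)
{
	size_t i;

	st->replayed = 0;
	st->io_errors = 0;
	for (i = 0; i < nr; i++) {
		if (write_blk(fsblks[i], priv)) {
			st->io_errors++;	/* complain, then keep going */
			continue;
		}
		st->replayed++;
	}
	return st->io_errors != 0;
}

/* Demo writer for the sketch: pretends every odd-numbered block fails. */
static int fail_odd_blocks(unsigned long long fsblk, void *priv)
{
	(void)priv;
	return (fsblk & 1) ? -1 : 0;
}
```

The nonzero return could feed the same exit status and warning that
csum_error drives today.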
--D
> + ext2fs_free_mem(&buf);
> + ext2fs_free_mem(&undo_ctx.keys);
> io_channel_close(channel);
> - tdb_close(tdb);
> + io_channel_close(undo_ctx.undo_file);
>
> - return 0;
> + return csum_error;
> }
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 17/31] e2fsck: optionally create an undo file
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (15 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 16/31] e2undo: ditch tdb file, write everything to a flat file Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 18/31] resize2fs: optionally create " Darrick J. Wong
` (18 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Provide the user with an option to create an undo file so that they
can roll back a failed repair operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
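Note (not part of the commit message): when -z is given an empty string,
e2fsck_setup_tdb() below looks for the undo directory in the environment
first and only then falls back to the "undo_dir" profile setting, whose
default is /var/lib/e2fsprogs. A minimal sketch of that lookup order follows.
resolve_undo_dir() is a hypothetical helper, and it takes the configured
directory as a plain string instead of calling profile_get_string().

```c
#include <stdlib.h>

/*
 * Hypothetical helper: pick the undo directory.
 * 1. E2FSPROGS_UNDO_DIR from the environment, if set (even if empty);
 * 2. the directory from the config file, if the caller has one;
 * 3. the built-in default.
 */
static const char *resolve_undo_dir(const char *conf_dir)
{
	const char *dir = getenv("E2FSPROGS_UNDO_DIR");

	if (dir)
		return dir;
	if (conf_dir)
		return conf_dir;
	return "/var/lib/e2fsprogs";
}
```

An empty or "none" result still has to be treated as "do not create an undo
file", as the function in the patch does.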
e2fsck/e2fsck.8.in | 10 +++++
e2fsck/e2fsck.h | 3 ++
e2fsck/unix.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 111 insertions(+), 2 deletions(-)
diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 0c2725e..11f72f8 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -321,6 +321,16 @@ may not be specified at the same time as the
or
.B \-p
options.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file. This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong. If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+e2fsck-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
.SH EXIT CODE
The exit code returned by
.B e2fsck
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 15d968e..04f7bac 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -388,6 +388,9 @@ struct e2fsck_struct {
* Inodes to rebuild extent trees
*/
ext2fs_inode_bitmap inodes_to_rebuild;
+
+ /* Undo file */
+ char *undo_file;
};
/* Data structures to evaluate whether an extent tree needs rebuilding. */
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index fe5127a..d9be549 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -45,6 +45,7 @@ extern int optind;
#ifdef HAVE_DIRENT_H
#include <dirent.h>
#endif
+#include <libgen.h>
#include "e2p/e2p.h"
#include "et/com_err.h"
@@ -75,7 +76,7 @@ static void usage(e2fsck_t ctx)
_("Usage: %s [-panyrcdfvtDFV] [-b superblock] [-B blocksize]\n"
"\t\t[-I inode_buffer_blocks] [-P process_inode_size]\n"
"\t\t[-l|-L bad_blocks_file] [-C fd] [-j external_journal]\n"
- "\t\t[-E extended-options] device\n"),
+ "\t\t[-E extended-options] [-z undo_file] device\n"),
ctx->program_name);
fprintf(stderr, "%s", _("\nEmergency help:\n"
@@ -91,6 +92,7 @@ static void usage(e2fsck_t ctx)
" -j external_journal Set location of the external journal\n"
" -l bad_blocks_file Add to badblocks list\n"
" -L bad_blocks_file Set badblocks list\n"
+ " -z undo_file Create an undo file\n"
));
exit(FSCK_USAGE);
@@ -795,7 +797,7 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
phys_mem_kb = get_memory_size() / 1024;
ctx->readahead_kb = ~0ULL;
- while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
+ while ((c = getopt(argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDkz:")) != EOF)
switch (c) {
case 'C':
ctx->progress = e2fsck_update_progress;
@@ -927,6 +929,9 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
case 'k':
keep_bad_blocks++;
break;
+ case 'z':
+ ctx->undo_file = optarg;
+ break;
default:
usage(ctx);
}
@@ -1205,6 +1210,91 @@ check_error:
return retval;
}
+static int e2fsck_setup_tdb(e2fsck_t ctx, io_manager *io_ptr)
+{
+ errcode_t retval = ENOMEM;
+ char *tdb_dir = NULL, *tdb_file = NULL;
+ char *dev_name, *tmp_name;
+ int free_tdb_dir = 0;
+
+ if (ctx->undo_file && ctx->undo_file[0] != 0) {
+ if ((unlink(ctx->undo_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto err;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(ctx->undo_file);
+ if (retval)
+ goto err;
+ printf(_("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n"),
+ ctx->undo_file, ctx->filesystem_name);
+ return 0;
+ }
+
+ /*
+ * Configuration via a conf file would be
+ * nice
+ */
+ tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+ if (!tdb_dir) {
+ profile_get_string(ctx->profile, "defaults",
+ "undo_dir", 0, "/var/lib/e2fsprogs",
+ &tdb_dir);
+ free_tdb_dir = 1;
+ }
+
+ if (!strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+ access(tdb_dir, W_OK)) {
+ if (free_tdb_dir)
+ free(tdb_dir);
+ return 0;
+ }
+
+ tmp_name = strdup(ctx->filesystem_name);
+ if (!tmp_name)
+ goto errout;
+ dev_name = basename(tmp_name);
+ tdb_file = malloc(strlen(tdb_dir) + 8 + strlen(dev_name) + 7 + 1);
+ if (!tdb_file) {
+ free(tmp_name);
+ goto errout;
+ }
+ sprintf(tdb_file, "%s/e2fsck-%s.e2undo", tdb_dir, dev_name);
+ free(tmp_name);
+
+ if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto errout;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(tdb_file);
+ if (retval)
+ goto errout;
+ printf(_("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n"), tdb_file, ctx->filesystem_name);
+
+ if (free_tdb_dir)
+ free(tdb_dir);
+ free(tdb_file);
+ return 0;
+
+errout:
+ if (free_tdb_dir)
+ free(tdb_dir);
+ free(tdb_file);
+err:
+ com_err(ctx->program_name, retval, "%s",
+ _("while trying to setup undo file\n"));
+ return retval;
+}
+
int main (int argc, char *argv[])
{
errcode_t retval = 0, retval2 = 0, orig_retval = 0;
@@ -1314,6 +1404,12 @@ restart:
flags &= ~EXT2_FLAG_EXCLUSIVE;
}
+ if (ctx->undo_file) {
+ retval = e2fsck_setup_tdb(ctx, &io_ptr);
+ if (retval)
+ exit(FSCK_ERROR);
+ }
+
ctx->openfs_flags = flags;
retval = try_open_fs(ctx, flags, io_ptr, &fs);
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 18/31] resize2fs: optionally create undo file
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (16 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 17/31] e2fsck: optionally create an undo file Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:18 ` [PATCH 19/31] tune2fs: " Darrick J. Wong
` (17 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Provide the user with an option to create an undo file so that they
can roll back a failed resize operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
resize/main.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++--
resize/resize2fs.8.in | 14 +++++++
2 files changed, 108 insertions(+), 3 deletions(-)
diff --git a/resize/main.c b/resize/main.c
index c25de61..ea9a91b 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -29,6 +29,7 @@ extern int optind;
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
+#include <libgen.h>
#include "e2p/e2p.h"
@@ -42,7 +43,8 @@ static char *device_name, *io_options;
static void usage (char *prog)
{
fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
- "[-p] device [-b|-s|new_size]\n\n"), prog);
+ "[-p] device [-b|-s|new_size] [-z undo_file]\n\n"),
+ prog);
exit (1);
}
@@ -162,6 +164,86 @@ static void bigalloc_check(ext2_filsys fs, int force)
}
}
+static int resize2fs_setup_tdb(const char *device_name, char *undo_file,
+ io_manager *io_ptr)
+{
+ errcode_t retval = ENOMEM;
+ char *tdb_dir = NULL, *tdb_file = NULL;
+ char *dev_name, *tmp_name;
+ int free_tdb_dir = 0;
+
+ if (undo_file && undo_file[0] != 0) {
+ if ((unlink(undo_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto err;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(undo_file);
+ if (retval)
+ goto err;
+ printf(_("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n"),
+ undo_file, device_name);
+ return 0;
+ }
+
+ /*
+ * Configuration via a conf file would be
+ * nice
+ */
+ tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+
+ if (tdb_dir == NULL || !strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+ access(tdb_dir, W_OK)) {
+ if (free_tdb_dir)
+ free(tdb_dir);
+ return 0;
+ }
+
+ tmp_name = strdup(device_name);
+ if (!tmp_name)
+ goto errout;
+ dev_name = basename(tmp_name);
+ tdb_file = malloc(strlen(tdb_dir) + 11 + strlen(dev_name) + 7 + 1);
+ if (!tdb_file) {
+ free(tmp_name);
+ goto errout;
+ }
+ sprintf(tdb_file, "%s/resize2fs-%s.e2undo", tdb_dir, dev_name);
+ free(tmp_name);
+
+ if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto errout;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(tdb_file);
+ if (retval)
+ goto errout;
+ printf(_("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n"), tdb_file, device_name);
+
+ if (free_tdb_dir)
+ free(tdb_dir);
+ free(tdb_file);
+ return 0;
+
+errout:
+ if (free_tdb_dir)
+ free(tdb_dir);
+ free(tdb_file);
+err:
+ com_err(program_name, retval, "%s",
+ _("while trying to setup undo file\n"));
+ return retval;
+}
+
int main (int argc, char ** argv)
{
errcode_t retval;
@@ -186,7 +268,7 @@ int main (int argc, char ** argv)
unsigned int blocksize;
long sysval;
int len, mount_flags;
- char *mtpt;
+ char *mtpt, *undo_file = NULL;
#ifdef ENABLE_NLS
setlocale(LC_MESSAGES, "");
@@ -203,7 +285,7 @@ int main (int argc, char ** argv)
if (argc && *argv)
program_name = *argv;
- while ((c = getopt(argc, argv, "d:fFhMPpS:bs")) != EOF) {
+ while ((c = getopt(argc, argv, "d:fFhMPpS:bsz:")) != EOF) {
switch (c) {
case 'h':
usage(program_name);
@@ -235,6 +317,9 @@ int main (int argc, char ** argv)
case 's':
flags |= RESIZE_DISABLE_64BIT;
break;
+ case 'z':
+ undo_file = optarg;
+ break;
default:
usage(program_name);
}
@@ -319,6 +404,12 @@ int main (int argc, char ** argv)
io_flags |= EXT2_FLAG_64BITS;
+ if (undo_file) {
+ retval = resize2fs_setup_tdb(device_name, undo_file, &io_ptr);
+ if (retval)
+ exit(1);
+ }
+
retval = ext2fs_open2(device_name, io_options, io_flags,
0, 0, io_ptr, &fs);
if (retval) {
diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
index 0129bfc..d2738e9 100644
--- a/resize/resize2fs.8.in
+++ b/resize/resize2fs.8.in
@@ -18,6 +18,10 @@ resize2fs \- ext2/ext3/ext4 file system resizer
.B \-S
.I RAID-stride
]
+[
+.B \-z
+.I undo_file
+]
.I device
[
.I size
@@ -149,6 +153,16 @@ The
program will heuristically determine the RAID stride that was specified
when the filesystem was created. This option allows the user to
explicitly specify a RAID stride setting to be used by resize2fs instead.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file. This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong. If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+resize2fs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
.SH KNOWN BUGS
The minimum size of the filesystem as estimated by resize2fs may be
incorrect, especially for filesystems with 1k and 2k blocksizes.
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 19/31] tune2fs: optionally create undo file
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (17 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 18/31] resize2fs: optionally create " Darrick J. Wong
@ 2014-12-20 21:18 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 20/31] mke2fs: " Darrick J. Wong
` (16 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:18 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Provide the user with an option to create an undo file so that they
can roll back a failed tuning operation. Previously, one would be
created for inode resize if a bunch of (undocumented) conditions were
met.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
misc/tune2fs.8.in | 14 ++++++++++++++
misc/tune2fs.c | 33 +++++++++++++++++++++++++++++----
2 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/misc/tune2fs.8.in b/misc/tune2fs.8.in
index c50d475..f6a475d 100644
--- a/misc/tune2fs.8.in
+++ b/misc/tune2fs.8.in
@@ -88,6 +88,10 @@ tune2fs \- adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems
.B \-U
.I UUID
]
+[
+.B \-z
+.I undo_file
+]
device
.SH DESCRIPTION
.BI tune2fs
@@ -684,6 +688,16 @@ or
.IR /dev/urandom ,
.B tune2fs
will automatically use a time-based UUID instead of a randomly-generated UUID.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file. This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong. If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+tune2fs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
.SH BUGS
We haven't found any bugs yet. That doesn't mean there aren't any...
.SH AUTHOR
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index f01b05b..3a73f20 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -97,6 +97,7 @@ static unsigned long new_inode_size;
static char *ext_mount_opts;
static int usrquota, grpquota;
static int rewrite_checksums;
+static char *undo_file;
int journal_size, journal_flags;
char *journal_device;
@@ -134,7 +135,8 @@ static void usage(void)
"\t[-Q quota_options]\n"
#endif
"\t[-E extended-option[,...]] [-T last_check_time] "
- "[-U UUID]\n\t[ -I new_inode_size ] device\n"), program_name);
+ "[-U UUID]\n\t[-I new_inode_size] [-z undo_file] device\n"),
+ program_name);
exit(1);
}
@@ -1498,7 +1500,7 @@ static void parse_tune2fs_options(int argc, char **argv)
char *tmp;
struct group *gr;
struct passwd *pw;
- char optstring[100] = "c:e:fg:i:jlm:o:r:s:u:C:E:I:J:L:M:O:T:U:";
+ char optstring[100] = "c:e:fg:i:jlm:o:r:s:u:C:E:I:J:L:M:O:T:U:z:";
#ifdef CONFIG_QUOTA
strcat(optstring, "Q:");
@@ -1732,6 +1734,9 @@ static void parse_tune2fs_options(int argc, char **argv)
open_flag = EXT2_FLAG_RW;
I_flag = 1;
break;
+ case 'z':
+ undo_file = optarg;
+ break;
default:
usage();
}
@@ -2452,6 +2457,21 @@ static int tune2fs_setup_tdb(const char *name, io_manager *io_ptr)
char *tdb_file;
char *dev_name, *tmp_name;
+ if (undo_file && undo_file[0] != 0) {
+ if ((unlink(undo_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto err;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ set_undo_io_backup_file(undo_file);
+ printf(_("To undo the tune2fs operation please run "
+ "the command\n e2undo %s %s\n\n"),
+ undo_file, name);
+ return retval;
+ }
+
#if 0 /* FIXME!! */
/*
* Configuration via a conf file would be
@@ -2499,6 +2519,7 @@ static int tune2fs_setup_tdb(const char *name, io_manager *io_ptr)
tdb_file, name);
free(tdb_file);
free(tmp_name);
+err:
return retval;
}
@@ -2647,7 +2668,7 @@ retry_open:
}
fs->default_bitmap_type = EXT2FS_BMAP64_RBTREE;
- if (I_flag && !io_ptr_orig) {
+ if (I_flag) {
/*
* Check the inode size is right so we can issue an
* error message and bail before setting up the tdb
@@ -2671,11 +2692,15 @@ retry_open:
rc = 1;
goto closefs;
}
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 20/31] mke2fs: optionally create undo file
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (18 preceding siblings ...)
2014-12-20 21:18 ` [PATCH 19/31] tune2fs: " Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 21/31] debugfs: " Darrick J. Wong
` (15 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Provide the user with an option to create an undo file so that they
can roll back a failed mkfs operation. Previously, one would be
created if force_undo was set in the configuration file and a bunch of
(undocumented) conditions were met.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
misc/mke2fs.8.in | 15 +++++++++++++++
misc/mke2fs.c | 29 ++++++++++++++++++++++++++---
2 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index aeb5caf..3230f65 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -117,6 +117,10 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
.B \-e
.I errors-behavior
]
+[
+.B \-z
+.I undo_file
+]
.I device
[
.I fs-size
@@ -738,6 +742,17 @@ Verbose execution.
Print the version number of
.B mke2fs
and exit.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file. This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong. If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+mke2fs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable or the \fIundo_dir\fR directive
+in the configuration file.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
.SH ENVIRONMENT
.TP
.BI MKE2FS_SYNC
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index aeb852f..c421afb 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -110,6 +110,7 @@ char *journal_device;
static int sync_kludge; /* Set using the MKE2FS_SYNC env. option */
char **fs_types;
const char *root_dir; /* Copy files from the specified directory */
+static char *undo_file;
static profile_t profile;
@@ -129,7 +130,8 @@ static void usage(void)
"[-M last-mounted-directory]\n\t[-O feature[,...]] "
"[-r fs-revision] [-E extended-option[,...]]\n"
"\t[-t fs-type] [-T usage-type ] [-U UUID] [-e errors_behavior]"
- "[-jnqvDFKSV] device [blocks-count]\n"),
+ "[-z undo_file]\n"
+ "\t[-jnqvDFKSV] device [blocks-count]\n"),
program_name);
exit(1);
}
@@ -1551,7 +1553,7 @@ profile_error:
}
while ((c = getopt (argc, argv,
- "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
+ "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:Vz:")) != EOF) {
switch (c) {
case 'b':
blocksize = parse_num_blocks2(optarg, -1);
@@ -1774,6 +1776,9 @@ profile_error:
/* Print version number and exit */
show_version_only++;
break;
+ case 'z':
+ undo_file = optarg;
+ break;
default:
usage();
}
@@ -2492,6 +2497,23 @@ static int mke2fs_setup_tdb(const char *name, io_manager *io_ptr)
char *dev_name, *tmp_name;
int free_tdb_dir = 0;
+ if (undo_file && undo_file[0] != 0) {
+ if ((unlink(undo_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto err;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(undo_file);
+ if (retval)
+ goto err;
+ printf(_("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n"), undo_file, name);
+ return 0;
+ }
+
/*
* Configuration via a conf file would be
* nice
@@ -2546,6 +2568,7 @@ errout:
if (free_tdb_dir)
free(tdb_dir);
free(tdb_file);
+err:
com_err(program_name, retval, "%s",
_("while trying to setup undo file\n"));
return retval;
@@ -2717,7 +2740,7 @@ int main (int argc, char *argv[])
#endif
io_ptr = unix_io_manager;
- if (should_do_undo(device_name)) {
+ if (undo_file != NULL || should_do_undo(device_name)) {
retval = mke2fs_setup_tdb(device_name, &io_ptr);
if (retval)
exit(1);
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 21/31] debugfs: optionally create undo file
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (19 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 20/31] mke2fs: " Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 22/31] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs Darrick J. Wong
` (14 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Provide the user with an option to create an undo file so that they
can roll back a failed debugfs expedition.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
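Note (not part of the commit message): when no explicit undo file name is
given, debugfs_setup_tdb() below silently skips undo file creation if the
directory is unset, empty, the literal string "none", or not writable. A
minimal sketch of that test as a standalone predicate follows;
undo_dir_usable() is a hypothetical name used only for illustration.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical predicate: return 1 if an undo file may be created in dir,
 * 0 if undo file creation should be skipped.
 */
static int undo_dir_usable(const char *dir)
{
	if (dir == NULL || dir[0] == '\0' || strcmp(dir, "none") == 0)
		return 0;
	/* access() returns 0 when the directory is writable by the caller. */
	return access(dir, W_OK) == 0;
}
```

Setting E2FSPROGS_UNDO_DIR=none is therefore a way to turn the default undo
file off without touching the command line.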
debugfs/debugfs.8.in | 16 +++++++
debugfs/debugfs.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 116 insertions(+), 8 deletions(-)
diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index 04c280d..81899a3 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -31,6 +31,10 @@ request
data_source_device
]
[
+.B \-z
+.I undo_file
+]
+[
device
]
.SH DESCRIPTION
@@ -130,6 +134,16 @@ and then exit.
print the version number of
.B debugfs
and exit.
+.TP
+.BI \-z " undo_file"
+Before overwriting a file system block, write the old contents of the block to
+an undo file. This undo file can be used with e2undo(8) to restore the old
+contents of the file system should something go wrong. If the empty string is
+passed as the undo_file argument, the undo file will be written to a file named
+debugfs-\fIdevice\fR.e2undo in the directory specified via the
+\fIE2FSPROGS_UNDO_DIR\fR environment variable.
+
+WARNING: The undo file cannot be used to recover from a power or system crash.
.SH SPECIFYING FILES
Many
.B debugfs
@@ -532,7 +546,7 @@ to those inodes. The
flag will enable checking the file type information in the directory
entry to make sure it matches the inode's type.
.TP
-.BI open " [-weficD] [-b blocksize] [-s superblock] device"
+.BI open " [-weficD] [-b blocksize] [-s superblock] [-z undo_file] device"
Open a filesystem for editing. The
.I -f
flag forces the filesystem to be opened even if there are some unknown
diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index fe57366..7576a1a 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -48,12 +48,92 @@ ext2_filsys current_fs;
quota_ctx_t current_qctx;
ext2_ino_t root, cwd;
+static int debugfs_setup_tdb(const char *device_name, char *undo_file,
+ io_manager *io_ptr)
+{
+ errcode_t retval = ENOMEM;
+ char *tdb_dir = NULL, *tdb_file = NULL;
+ char *dev_name, *tmp_name;
+ int free_tdb_dir = 0;
+
+ if (undo_file && undo_file[0] != 0) {
+ if ((unlink(undo_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto err;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(undo_file);
+ if (retval)
+ goto err;
+ printf("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n",
+ undo_file, device_name);
+ return 0;
+ }
+
+ /*
+ * Configuration via a conf file would be
+ * nice
+ */
+ tdb_dir = getenv("E2FSPROGS_UNDO_DIR");
+
+ if (tdb_dir == NULL || !strcmp(tdb_dir, "none") || (tdb_dir[0] == 0) ||
+ access(tdb_dir, W_OK)) {
+ if (free_tdb_dir)
+ free(tdb_dir);
+ return 0;
+ }
+
+ tmp_name = strdup(device_name);
+ if (!tmp_name)
+ goto errout;
+ dev_name = basename(tmp_name);
+ tdb_file = malloc(strlen(tdb_dir) + 9 + strlen(dev_name) + 7 + 1);
+ if (!tdb_file) {
+ free(tmp_name);
+ goto errout;
+ }
+ sprintf(tdb_file, "%s/debugfs-%s.e2undo", tdb_dir, dev_name);
+ free(tmp_name);
+
+ if ((unlink(tdb_file) < 0) && (errno != ENOENT)) {
+ retval = errno;
+ goto errout;
+ }
+
+ set_undo_io_backing_manager(*io_ptr);
+ *io_ptr = undo_io_manager;
+ retval = set_undo_io_backup_file(tdb_file);
+ if (retval)
+ goto errout;
+ printf("Overwriting existing filesystem; this can be undone "
+ "using the command:\n"
+ " e2undo %s %s\n\n", tdb_file, device_name);
+
+ if (free_tdb_dir)
+ free(tdb_dir);
+ free(tdb_file);
+ return 0;
+
+errout:
+ if (free_tdb_dir)
+ free(tdb_dir);
+ free(tdb_file);
+err:
+ com_err("debugfs", retval, "while trying to setup undo file\n");
+ return retval;
+}
+
static void open_filesystem(char *device, int open_flags, blk64_t superblock,
blk64_t blocksize, int catastrophic,
- char *data_filename)
+ char *data_filename, char *undo_file)
{
int retval;
io_channel data_io = 0;
+ io_manager io_ptr = unix_io_manager;
if (superblock != 0 && blocksize == 0) {
com_err(device, 0, "if you specify the superblock, you must also specify the block size");
@@ -84,8 +164,14 @@ static void open_filesystem(char *device, int open_flags, blk64_t superblock,
if (catastrophic)
open_flags |= EXT2_FLAG_SKIP_MMP;
+ if (undo_file) {
+ retval = debugfs_setup_tdb(device, undo_file, &io_ptr);
+ if (retval)
+ exit(1);
+ }
+
retval = ext2fs_open(device, open_flags, superblock, blocksize,
- unix_io_manager, &current_fs);
+ io_ptr, &current_fs);
if (retval) {
com_err(device, retval, "while opening filesystem");
if (retval == EXT2_ET_BAD_MAGIC)
@@ -136,9 +222,10 @@ void do_open_filesys(int argc, char **argv)
blk64_t blocksize = 0;
int open_flags = EXT2_FLAG_SOFTSUPP_FEATURES | EXT2_FLAG_64BITS;
char *data_filename = 0;
+ char *undo_file = NULL;
reset_getopt();
- while ((c = getopt (argc, argv, "iwfecb:s:d:D")) != EOF) {
+ while ((c = getopt (argc, argv, "iwfecb:s:d:Dz:")) != EOF) {
switch (c) {
case 'i':
open_flags |= EXT2_FLAG_IMAGE_FILE;
@@ -177,6 +264,9 @@ void do_open_filesys(int argc, char **argv)
if (err)
return;
break;
+ case 'z':
+ undo_file = optarg;
+ break;
default:
goto print_usage;
}
@@ -188,7 +278,7 @@ void do_open_filesys(int argc, char **argv)
return;
open_filesystem(argv[optind], open_flags,
superblock, blocksize, catastrophic,
- data_filename);
+ data_filename, undo_file);
return;
print_usage:
@@ -2219,7 +2309,7 @@ int main(int argc, char **argv)
"Usage: %s [-b blocksize] [-s superblock] [-f cmd_file] "
"[-R request] [-V] ["
#ifndef READ_ONLY
- "[-w] "
+ "[-w] [-z undo_file] "
#endif
"[-c] device]";
int c;
@@ -2234,7 +2324,8 @@ int main(int argc, char **argv)
#ifdef READ_ONLY
const char *opt_string = "nicR:f:b:s:Vd:D";
#else
- const char *opt_string = "niwcR:f:b:s:Vd:D";
+ const char *opt_string = "niwcR:f:b:s:Vd:Dz:";
+ char *undo_file = NULL;
#endif
if (debug_prog_name == 0)
@@ -2291,6 +2382,9 @@ int main(int argc, char **argv)
fprintf(stderr, "\tUsing %s\n",
error_message(EXT2_ET_BASE));
exit(0);
+ case 'z':
+ undo_file = optarg;
+ break;
default:
com_err(argv[0], 0, usage, debug_prog_name);
return 1;
@@ -2299,7 +2393,7 @@ int main(int argc, char **argv)
if (optind < argc)
open_filesystem(argv[optind], open_flags,
superblock, blocksize, catastrophic,
- data_filename);
+ data_filename, undo_file);
sci_idx = ss_create_invocation(debug_prog_name, "0.0", (char *) NULL,
&debug_cmds, &retval);
* [PATCH 22/31] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (20 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 21/31] debugfs: " Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 23/31] tests: test various features of the new e2undo format Darrick J. Wong
` (13 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Regression tests to ensure that we can create undo files and roll
things back if need be.
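
The u_*_opt scripts in this patch all share one shape: checksum the image, modify it while recording undo data, checksum again, roll back, and require that the final checksum matches the first and differs from the second. The sketch below shows only that pattern. It does not call mke2fs, tune2fs, or e2undo; modify_with_undo and rollback are hypothetical stand-ins that save and restore a single 1k block with dd, so it can run without e2fsprogs installed.

```shell
#!/bin/sh
# Sketch of the before / after / rollback checksum comparison.
# modify_with_undo and rollback are hypothetical stand-ins for
# "<tool> -z undo_file image" and "e2undo undo_file image".

img=$(mktemp)
undo=$(mktemp)

# Print only the CRC field of cksum's output.
crc() { cksum < "$1" | cut -d' ' -f1; }

# Save the first 1k block of the image, then overwrite part of it.
modify_with_undo() {
    dd if="$1" of="$2" bs=1k count=1 2>/dev/null
    printf 'abc123' | dd of="$1" conv=notrunc 2>/dev/null
}

# Write the saved block back over the start of the image.
rollback() {
    dd if="$1" of="$2" bs=1k count=1 conv=notrunc 2>/dev/null
}

dd if=/dev/zero of="$img" bs=1k count=512 2>/dev/null
crc0=$(crc "$img")

modify_with_undo "$img" "$undo"
crc1=$(crc "$img")

rollback "$undo" "$img"
crc2=$(crc "$img")

# Same pass condition as the u_*_opt scripts.
if [ "$crc0" = "$crc2" ] && [ "$crc1" != "$crc2" ]; then
    result=ok
else
    result=failed
fi
echo "undo sketch: $result"
rm -f "$img" "$undo"
```

The second comparison ($crc1 != $crc2) matters as much as the first: without it, a tool that silently wrote nothing would also pass.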
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
tests/test_config | 1
tests/u_compound_rollback/script | 62 ++++++++++++
tests/u_debugfs_opt/script | 32 ++++++
tests/u_e2fsck_opt/script | 32 ++++++
tests/u_mke2fs/script | 2
tests/u_mke2fs_opt/script | 32 ++++++
tests/u_mke2fs_opt_oddsize/script | 31 ++++++
tests/u_resize2fs_opt/script | 32 ++++++
tests/u_revert_upgrade_to_64bitmcsum/script | 136 +++++++++++++++++++++++++++
tests/u_tune2fs/script | 2
tests/u_tune2fs_opt/script | 32 ++++++
11 files changed, 392 insertions(+), 2 deletions(-)
create mode 100644 tests/u_compound_rollback/script
create mode 100644 tests/u_debugfs_opt/script
create mode 100644 tests/u_e2fsck_opt/script
create mode 100644 tests/u_mke2fs_opt/script
create mode 100644 tests/u_mke2fs_opt_oddsize/script
create mode 100644 tests/u_resize2fs_opt/script
create mode 100644 tests/u_revert_upgrade_to_64bitmcsum/script
create mode 100644 tests/u_tune2fs_opt/script
diff --git a/tests/test_config b/tests/test_config
index 2e3af6b..7f39157 100644
--- a/tests/test_config
+++ b/tests/test_config
@@ -17,6 +17,7 @@ TEST_BITS="../debugfs/debugfs"
RESIZE2FS_EXE="../resize/resize2fs"
RESIZE2FS="$USE_VALGRIND $RESIZE2FS_EXE"
E2UNDO_EXE="../misc/e2undo"
+E2UNDO="$USE_VALGRIND $E2UNDO_EXE"
TEST_REL=../tests/progs/test_rel
TEST_ICOUNT=../tests/progs/test_icount
CRCSUM=../tests/progs/crcsum
diff --git a/tests/u_compound_rollback/script b/tests/u_compound_rollback/script
new file mode 100644
index 0000000..df658e7
--- /dev/null
+++ b/tests/u_compound_rollback/script
@@ -0,0 +1,62 @@
+test_description="e2undo with mke2fs/tune2fs/resize2fs/e2fsck -z"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+echo compound e2undo rollback test > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc2 >> $OUT
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.2 $TMPFILE 512 >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc3 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+
+echo roll back e2fsck >> $OUT
+$E2UNDO $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc3_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck $crc3_2 >> $OUT
+
+echo roll back resize2fs >> $OUT
+$E2UNDO $TDB_FILE.2 $TMPFILE >> $OUT 2>&1
+crc2_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo resize2fs $crc2_2 >> $OUT
+
+echo roll back tune2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo tune2fs $crc1_2 >> $OUT
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+
+if [ $crc0 = $crc0_2 ] && [ $crc1 = $crc1_2 ] && [ $crc2 = $crc2_2 ] && [ $crc3 = $crc3_2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TDB_FILE.2 $TDB_FILE.3 $TMPFILE
+fi
diff --git a/tests/u_debugfs_opt/script b/tests/u_debugfs_opt/script
new file mode 100644
index 0000000..173e5d8
--- /dev/null
+++ b/tests/u_debugfs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with debugfs -z"
+if test -x $E2UNDO_EXE -a -x $DEBUGFS_EXE; then
+
+TDB_FILE=/tmp/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before debugfs $crc0 >> $OUT
+
+echo using debugfs to test e2undo >> $OUT
+$DEBUGFS -w -z $TDB_FILE -R 'zap -p 0x55 0' $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after debugfs $crc1 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_e2fsck_opt/script b/tests/u_e2fsck_opt/script
new file mode 100644
index 0000000..ad7aa8d
--- /dev/null
+++ b/tests/u_e2fsck_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with e2fsck -z"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/e2fsck-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before e2fsck $crc0 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc1 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_mke2fs/script b/tests/u_mke2fs/script
index d249ddd..e93cde6 100644
--- a/tests/u_mke2fs/script
+++ b/tests/u_mke2fs/script
@@ -19,7 +19,7 @@ $MKE2FS -q -F -o Linux -I 256 -O uninit_bg -E lazy_itable_init=1 -b 1024 $TMPFIL
new_crc=`$CRCSUM $TMPFILE`
echo $CRCSUM after mke2fs $new_crc >> $OUT
-$E2UNDO_EXE $TDB_FILE $TMPFILE >> $OUT 2>&1
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
new_crc=`$CRCSUM $TMPFILE`
echo $CRCSUM after e2undo $new_crc >> $OUT
diff --git a/tests/u_mke2fs_opt/script b/tests/u_mke2fs_opt/script
new file mode 100644
index 0000000..aa367f7
--- /dev/null
+++ b/tests/u_mke2fs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with mke2fs -z"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/mke2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -F -o Linux -I 128 -b 1024 test.img > $OUT
+$MKE2FS -F -o Linux -I 128 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+echo using mke2fs to test e2undo >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -E lazy_itable_init=1 -b 1024 -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_mke2fs_opt_oddsize/script b/tests/u_mke2fs_opt_oddsize/script
new file mode 100644
index 0000000..464fa9e
--- /dev/null
+++ b/tests/u_mke2fs_opt_oddsize/script
@@ -0,0 +1,31 @@
+test_description="e2undo with mke2fs -z and non-32k-aligned bdev size"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/mke2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+yes "abc123abc123abc" | dd bs=1k count=8 >> $TMPFILE 2> /dev/null
+
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 > $OUT
+
+echo using mke2fs to test e2undo >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -E lazy_itable_init=1 -b 1024 -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_resize2fs_opt/script b/tests/u_resize2fs_opt/script
new file mode 100644
index 0000000..0455bd0
--- /dev/null
+++ b/tests/u_resize2fs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with resize2fs -z"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE 256 > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE 256 >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before resize2fs $crc0 >> $OUT
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE $TMPFILE 512 >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc1 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_revert_upgrade_to_64bitmcsum/script b/tests/u_revert_upgrade_to_64bitmcsum/script
new file mode 100644
index 0000000..8c01558
--- /dev/null
+++ b/tests/u_revert_upgrade_to_64bitmcsum/script
@@ -0,0 +1,136 @@
+test_description="convert fs to 64bit,metadata_csum and revert both changes"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+fail=0
+
+echo convert fs to 64bit,metadata_csum and revert both changes > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+CONF=$TMPFILE.conf
+cat > $CONF << ENDL
+[fs_types]
+ ext4h = {
+ features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,sparse_super,filetype,dir_index,ext_attr,resize_inode
+ blocksize = 4096
+ inode_size = 256
+ make_hugefiles = true
+ hugefiles_dir = /
+ hugefiles_slack = 0
+ hugefiles_name = aaaaa
+ hugefiles_digits = 4
+ hugefiles_size = 1M
+ zero_hugefiles = false
+ }
+ENDL
+
+echo mke2fs -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE 524288 >> $OUT
+MKE2FS_CONFIG=$CONF $MKE2FS -q -F -o Linux -T ext4h -O ^metadata_csum,^64bit -E lazy_itable_init=1 -b 4096 -z $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should not have 64bit or metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.1 -b $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should have 64bit but not metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.2 $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc3 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should have 64bit and metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should have 64bit and metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back e2fsck >> $OUT
+$E2UNDO $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc3_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck $crc3_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -lt 1 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should have 64bit and metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back tune2fs >> $OUT
+$E2UNDO $TDB_FILE.2 $TMPFILE >> $OUT 2>&1
+crc2_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo tune2fs $crc2_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -lt 1 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should have 64bit but not metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back resize2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo resize2fs $crc1_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ "$(echo "${features}" | grep "metadata_csum" -c)" -gt 0 ] || [ "$(echo "${features}" | grep 64bit -c)" -gt 0 ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should not have 64bit or metadata_csum set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 || fail=1
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+features="$($DUMPE2FS -h $TMPFILE 2> /dev/null | grep 'Filesystem features:')"
+if [ -n "${features}" ]; then
+ echo "FS features: ${features}" >> $OUT
+ echo "Should not have any features set" >> $OUT
+ fail=1
+fi
+$FSCK -f -n $TMPFILE >> $OUT 2>&1 && fail=1
+
+if [ $fail -eq 0 ] && [ $crc0 = $crc0_2 ] && [ $crc1 = $crc1_2 ] && [ $crc2 = $crc2_2 ] && [ $crc3 = $crc3_2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TDB_FILE.2 $TDB_FILE.3 $TMPFILE $CONF
+fi
diff --git a/tests/u_tune2fs/script b/tests/u_tune2fs/script
index a443f5a..d6f5e66 100644
--- a/tests/u_tune2fs/script
+++ b/tests/u_tune2fs/script
@@ -19,7 +19,7 @@ $TUNE2FS -I 256 $TMPFILE >> $OUT 2>&1
new_crc=`$CRCSUM $TMPFILE`
echo $CRCSUM after tune2fs $new_crc >> $OUT
-$E2UNDO_EXE $TDB_FILE $TMPFILE >> $OUT 2>&1
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
new_crc=`$CRCSUM $TMPFILE`
echo $CRCSUM after e2undo $new_crc >> $OUT
diff --git a/tests/u_tune2fs_opt/script b/tests/u_tune2fs_opt/script
new file mode 100644
index 0000000..be1b6bf
--- /dev/null
+++ b/tests/u_tune2fs_opt/script
@@ -0,0 +1,32 @@
+test_description="e2undo with tune2fs -z"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 = $crc2 ] && [ $crc1 != $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
* [PATCH 23/31] tests: test various features of the new e2undo format
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (21 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 22/31] tests: test undo file creation in e2fsck/resize2fs/tune2fs/mke2fs Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 24/31] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
` (12 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Verify that the header, checksum, and wrong-order rollback detection
features of the new e2undo actually work.
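
The corruption tests below invert the usual pass condition: after the undo data is damaged, the rollback must exit non-zero and leave the image exactly as it was after the modification. A minimal sketch of that check follows, under loud assumptions: checked_rollback and the side file holding the checksum are hypothetical and do not reflect the real e2undo file layout or its checksum algorithm.

```shell
#!/bin/sh
# Sketch of "refuse to replay corrupt undo data".
# checked_rollback and the separate checksum file are hypothetical;
# the real e2undo format keeps its checksums inside the undo file.

img=$(mktemp)
undo=$(mktemp)
sum=$(mktemp)

# Print only the CRC field of cksum's output.
crc() { cksum < "$1" | cut -d' ' -f1; }

# $1 = undo payload, $2 = file holding its checksum, $3 = image.
# Return non-zero without touching the image if the payload changed.
checked_rollback() {
    [ "$(crc "$1")" = "$(cat "$2")" ] || return 1
    dd if="$1" of="$3" bs=1k count=1 conv=notrunc 2>/dev/null
}

dd if=/dev/zero of="$img" bs=1k count=8 2>/dev/null
crc0=$(crc "$img")

# Record the first block and its checksum, then modify the image.
dd if="$img" of="$undo" bs=1k count=1 2>/dev/null
crc "$undo" > "$sum"
printf 'changed' | dd of="$img" conv=notrunc 2>/dev/null
crc1=$(crc "$img")

# Damage one byte of the saved block.
printf 'X' | dd of="$undo" bs=1 seek=100 conv=notrunc 2>/dev/null

checked_rollback "$undo" "$sum" "$img"
res=$?
crc2=$(crc "$img")

# Same pass condition as u_corrupt_blk_csum.
if [ $res -ne 0 ] && [ "$crc2" = "$crc1" ] && [ "$crc2" != "$crc0" ]; then
    result=ok
else
    result=failed
fi
echo "corrupt undo sketch: $result"
rm -f "$img" "$undo" "$sum"
```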
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
tests/u_compound_bad_rollback/script | 62 +++++++++++++++++++++++++++++++++
tests/u_corrupt_blk_csum/script | 38 ++++++++++++++++++++
tests/u_corrupt_blk_csum_force/script | 38 ++++++++++++++++++++
tests/u_corrupt_hdr_csum/script | 37 ++++++++++++++++++++
tests/u_corrupt_key_csum/script | 37 ++++++++++++++++++++
tests/u_dryrun/script | 32 +++++++++++++++++
tests/u_force/script | 38 ++++++++++++++++++++
tests/u_force_dryrun/script | 38 ++++++++++++++++++++
tests/u_not_undo/script | 28 +++++++++++++++
tests/u_wrong_fs/script | 36 +++++++++++++++++++
10 files changed, 384 insertions(+)
create mode 100644 tests/u_compound_bad_rollback/script
create mode 100644 tests/u_corrupt_blk_csum/script
create mode 100644 tests/u_corrupt_blk_csum_force/script
create mode 100644 tests/u_corrupt_hdr_csum/script
create mode 100644 tests/u_corrupt_key_csum/script
create mode 100644 tests/u_dryrun/script
create mode 100644 tests/u_force/script
create mode 100644 tests/u_force_dryrun/script
create mode 100644 tests/u_not_undo/script
create mode 100644 tests/u_wrong_fs/script
diff --git a/tests/u_compound_bad_rollback/script b/tests/u_compound_bad_rollback/script
new file mode 100644
index 0000000..adb6e4a
--- /dev/null
+++ b/tests/u_compound_bad_rollback/script
@@ -0,0 +1,62 @@
+test_description="e2undo with mke2fs/tune2fs/resize2fs/e2fsck -z"
+if test -x $RESIZE2FS_EXE -a -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/resize2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+echo compound e2undo rollback test > $OUT
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before mke2fs $crc0 >> $OUT
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 -z $TDB_FILE.0 $TMPFILE 256 >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after mke2fs $crc1 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc2 >> $OUT
+
+echo using resize2fs to test e2undo >> $OUT
+$RESIZE2FS -z $TDB_FILE.2 $TMPFILE 512 >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after resize2fs $crc3 >> $OUT
+
+echo using e2fsck to test e2undo >> $OUT
+$FSCK -f -y -D -z $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc4=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2fsck $crc4 >> $OUT
+
+echo roll back mke2fs >> $OUT
+$E2UNDO $TDB_FILE.0 $TMPFILE >> $OUT 2>&1
+crc0_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo mke2fs $crc0_2 >> $OUT
+
+echo roll back tune2fs >> $OUT
+$E2UNDO $TDB_FILE.1 $TMPFILE >> $OUT 2>&1
+crc1_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo tune2fs $crc1_2 >> $OUT
+
+echo roll back resize2fs >> $OUT
+$E2UNDO $TDB_FILE.2 $TMPFILE >> $OUT 2>&1
+crc2_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo resize2fs $crc2_2 >> $OUT
+
+echo roll back e2fsck >> $OUT
+$E2UNDO $TDB_FILE.3 $TMPFILE >> $OUT 2>&1
+crc3_2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo e2fsck $crc3_2 >> $OUT
+
+if [ $crc4 = $crc0_2 ] && [ $crc4 = $crc1_2 ] && [ $crc4 = $crc2_2 ] && [ $crc3 = $crc3_2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE.0 $TDB_FILE.1 $TDB_FILE.2 $TDB_FILE.3 $TMPFILE
+fi
diff --git a/tests/u_corrupt_blk_csum/script b/tests/u_corrupt_blk_csum/script
new file mode 100644
index 0000000..40146fc
--- /dev/null
+++ b/tests/u_corrupt_blk_csum/script
@@ -0,0 +1,38 @@
+test_description="corrupt e2undo block data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=/tmp
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 2)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $res -ne 0 ] && [ $crc2 = $crc1 ] && [ $crc2 != $crc0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_corrupt_blk_csum_force/script b/tests/u_corrupt_blk_csum_force/script
new file mode 100644
index 0000000..f0b28d5
--- /dev/null
+++ b/tests/u_corrupt_blk_csum_force/script
@@ -0,0 +1,38 @@
+test_description="force replay of corrupt e2undo block data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=/tmp
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 2)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO -f $TDB_FILE $TMPFILE >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc2 != $crc1 ] && [ $crc2 != $crc0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_corrupt_hdr_csum/script b/tests/u_corrupt_hdr_csum/script
new file mode 100644
index 0000000..41c0cbe
--- /dev/null
+++ b/tests/u_corrupt_hdr_csum/script
@@ -0,0 +1,37 @@
+test_description="corrupt e2undo header"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=/tmp
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+dd if=/dev/zero of=$TDB_FILE bs=256 count=1 seek=1 conv=notrunc > /dev/null 2>&1
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $res -ne 0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_corrupt_key_csum/script b/tests/u_corrupt_key_csum/script
new file mode 100644
index 0000000..0664e17
--- /dev/null
+++ b/tests/u_corrupt_key_csum/script
@@ -0,0 +1,37 @@
+test_description="corrupt e2undo key data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=/tmp
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 1)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc0 != $crc1 ] && [ $crc1 = $crc2 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_dryrun/script b/tests/u_dryrun/script
new file mode 100644
index 0000000..d6de362
--- /dev/null
+++ b/tests/u_dryrun/script
@@ -0,0 +1,32 @@
+test_description="e2undo dry run"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$E2UNDO -n $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc1 = $crc2 ] && [ $crc1 != $crc0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_force/script b/tests/u_force/script
new file mode 100644
index 0000000..7402294
--- /dev/null
+++ b/tests/u_force/script
@@ -0,0 +1,38 @@
+test_description="e2undo force"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+dd if=/dev/zero of=$TDB_FILE bs=4 count=1 seek=127 conv=notrunc 2> /dev/null
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+$E2UNDO -f $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo -f $crc3 >> $OUT
+
+if [ $crc0 = $crc3 ] && [ $crc1 = $crc2 ] && [ $crc2 != $crc0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_force_dryrun/script b/tests/u_force_dryrun/script
new file mode 100644
index 0000000..9fd847e
--- /dev/null
+++ b/tests/u_force_dryrun/script
@@ -0,0 +1,38 @@
+test_description="force dry-run replay of corrupt e2undo block data"
+if test -x $E2UNDO_EXE; then
+
+E2FSPROGS_UNDO_DIR=/tmp
+export E2FSPROGS_UNDO_DIR
+TDB_FILE=$E2FSPROGS_UNDO_DIR/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -I 128 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -I 128 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -I 256 $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+undo_blks=$(( $(stat -c '%s' $TDB_FILE) / 1024 ))
+dd if=/dev/zero of=$TDB_FILE bs=1024 count=1 seek=$((undo_blks - 2)) conv=notrunc > /dev/null 2>&1
+
+$E2UNDO -f -n $TDB_FILE $TMPFILE >> $OUT 2>&1
+res=$?
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc2 >> $OUT
+
+if [ $crc2 = $crc1 ] && [ $crc2 != $crc0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_not_undo/script b/tests/u_not_undo/script
new file mode 100644
index 0000000..2331c55
--- /dev/null
+++ b/tests/u_not_undo/script
@@ -0,0 +1,28 @@
+test_description="e2undo a non-undo file"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+dd if=/dev/zero of=$TDB_FILE bs=1k count=512 > /dev/null 2>&1
+
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before e2undo $crc0 > $OUT
+
+od -tx1 -Ad -c < $TDB_FILE >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc3 >> $OUT
+
+if [ $crc3 = $crc0 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
diff --git a/tests/u_wrong_fs/script b/tests/u_wrong_fs/script
new file mode 100644
index 0000000..6a96b20
--- /dev/null
+++ b/tests/u_wrong_fs/script
@@ -0,0 +1,36 @@
+test_description="e2undo on the wrong fs"
+if test -x $E2UNDO_EXE; then
+
+TDB_FILE=/tmp/tune2fs-$(basename $TMPFILE).e2undo
+OUT=$test_name.log
+rm -f $TDB_FILE >/dev/null 2>&1
+
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+
+echo mke2fs -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE > $OUT
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc0=`$CRCSUM $TMPFILE`
+echo $CRCSUM before tune2fs $crc0 >> $OUT
+
+echo using tune2fs to test e2undo >> $OUT
+$TUNE2FS -O metadata_csum -z $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc1=`$CRCSUM $TMPFILE`
+echo $CRCSUM after tune2fs $crc1 >> $OUT
+
+$MKE2FS -q -F -o Linux -T ext4 -O ^metadata_csum,64bit -E lazy_itable_init=1 -b 1024 $TMPFILE >> $OUT 2>&1
+crc2=`$CRCSUM $TMPFILE`
+echo $CRCSUM after re-mke2fs $crc2 >> $OUT
+
+$E2UNDO $TDB_FILE $TMPFILE >> $OUT 2>&1
+crc3=`$CRCSUM $TMPFILE`
+echo $CRCSUM after e2undo $crc3 >> $OUT
+
+if [ $crc3 = $crc2 ] && [ $crc2 != $crc1 ]; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ ln -f $test_name.log $test_name.failed
+ echo "$test_name: $test_description: failed"
+fi
+rm -f $TDB_FILE $TMPFILE
+fi
* [PATCH 24/31] libext2fs: support allocating uninit blocks in bmap2()
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (22 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 23/31] tests: test various features of the new e2undo format Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 25/31] libext2fs: find/alloc a range of empty blocks Darrick J. Wong
` (11 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
As part of supporting fallocate-like functionality, extend
ext2fs_bmap2() with two flags -- BMAP_UNINIT and BMAP_ZERO. The first
causes the block to be marked uninitialized if it's part of an
extent-based file. For a block-mapped file, the mapping is put in,
but there is no way to remember the uninitialized status. The second
flag causes the block to be zeroed, to support the use case of
emulating uninitialized blocks on a block-mapped file by zeroing them.
Eventually fallocate or fuse2fs or somebody will use these.
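To illustrate how a caller might pick between the two flags, here is a
minimal sketch. The flag values are copied from the ext2fs.h hunk below;
prealloc_bmap_flags() is a hypothetical helper invented for this example
and is not part of libext2fs.

```c
#include <assert.h>

/* Flag values as defined in the ext2fs.h hunk of this patch. */
#define BMAP_ALLOC	0x0001
#define BMAP_SET	0x0002
#define BMAP_UNINIT	0x0004
#define BMAP_ZERO	0x0008

/*
 * Hypothetical helper (not in libext2fs): choose bmap flags for
 * preallocating one block.  Extent files can record the uninitialized
 * state, so ask for BMAP_UNINIT; block-mapped files cannot, so emulate
 * it by zeroing the block with BMAP_ZERO.
 */
int prealloc_bmap_flags(int is_extent_file)
{
	int flags = BMAP_ALLOC;

	/* The two preallocation flags must not share a bit. */
	assert((BMAP_UNINIT & BMAP_ZERO) == 0);

	flags |= is_extent_file ? BMAP_UNINIT : BMAP_ZERO;
	return flags;
}
```

The returned value would then be passed as the bmap_flags argument of
ext2fs_bmap2(); whether a real caller wants this exact policy is an
assumption of the sketch.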
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/bmap.c | 9 +++++++--
lib/ext2fs/ext2fs.h | 2 ++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/lib/ext2fs/bmap.c b/lib/ext2fs/bmap.c
index cb3f5a1..c18f742 100644
--- a/lib/ext2fs/bmap.c
+++ b/lib/ext2fs/bmap.c
@@ -214,10 +214,13 @@ static errcode_t extent_bmap(ext2_filsys fs, ext2_ino_t ino,
errcode_t retval = 0;
blk64_t blk64 = 0;
int alloc = 0;
+ int set_flags;
+
+ set_flags = bmap_flags & BMAP_UNINIT ? EXT2_EXTENT_SET_BMAP_UNINIT : 0;
if (bmap_flags & BMAP_SET) {
retval = ext2fs_extent_set_bmap(handle, block,
- *phys_blk, 0);
+ *phys_blk, set_flags);
return retval;
}
retval = ext2fs_extent_goto(handle, block);
@@ -254,7 +257,7 @@ got_block:
alloc++;
set_extent:
retval = ext2fs_extent_set_bmap(handle, block,
- blk64, 0);
+ blk64, set_flags);
if (retval) {
ext2fs_block_alloc_stats2(fs, blk64, -1);
return retval;
@@ -441,6 +444,8 @@ errcode_t ext2fs_bmap2(ext2_filsys fs, ext2_ino_t ino, struct ext2_inode *inode,
if (retval == 0)
*phys_blk = blk32;
done:
+ if (*phys_blk && retval == 0 && (bmap_flags & BMAP_ZERO))
+ retval = ext2fs_zero_blocks2(fs, *phys_blk, 1, NULL, NULL);
if (buf)
ext2fs_free_mem(&buf);
if (handle)
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 9c2259a..e117441 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -527,6 +527,8 @@ typedef struct ext2_icount *ext2_icount_t;
*/
#define BMAP_ALLOC 0x0001
#define BMAP_SET 0x0002
+#define BMAP_UNINIT 0x0004
+#define BMAP_ZERO 0x0008
/*
* Returned flags from ext2fs_bmap
* [PATCH 25/31] libext2fs: find/alloc a range of empty blocks
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (23 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 24/31] libext2fs: support allocating uninit blocks in bmap2() Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 26/31] libext2fs: add new hooks to support large allocations Darrick J. Wong
` (10 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Provide a function that, given a goal pblk and a range, will try to
find a run of free blocks to satisfy the allocation. By default the
function will look anywhere in the filesystem for the run, though this
can be constrained with optional flags. One flag indicates that the
range must start at the goal block; the other flag indicates that we
should not return a range shorter than len.
v2: Add a second function to allocate a range of blocks.
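The search described above can be modelled on a plain byte array. This
is only a sketch under stated assumptions: toy_new_range(), the TOY_*
flags, and the in_use[] array are invented for illustration and stand in
for ext2fs_new_range(), the EXT2_NEWRANGE_* flags, and the block bitmap.
It tries every candidate start instead of skipping past short runs, which
is slower than the real code but should pick the same range.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIXED_GOAL	0x1	/* stands in for EXT2_NEWRANGE_FIXED_GOAL */
#define TOY_MIN_LENGTH	0x2	/* stands in for EXT2_NEWRANGE_MIN_LENGTH */

/*
 * in_use[i] != 0 means block i is allocated.  Starting at goal and
 * wrapping around once, find a run of free blocks.  Returns 0 and fills
 * *pblk / *plen on success, -1 if nothing suitable was found.
 */
int toy_new_range(const unsigned char *in_use, size_t nblocks, int flags,
		  size_t goal, size_t len, size_t *pblk, size_t *plen)
{
	size_t i, start, run;

	assert(pblk != NULL && plen != NULL);
	if (len == 0 || nblocks == 0)
		return -1;
	if (goal >= nblocks)
		goal = 0;

	for (i = 0; i < nblocks; i++) {
		start = (goal + i) % nblocks;

		/* Count free blocks from start, capped at len and at the end. */
		run = 0;
		while (run < len && start + run < nblocks &&
		       !in_use[start + run])
			run++;

		/* Without MIN_LENGTH any non-empty run is acceptable. */
		if (run > 0 && (!(flags & TOY_MIN_LENGTH) || run >= len)) {
			*pblk = start;
			*plen = run;
			return 0;
		}

		/* With FIXED_GOAL only the goal block itself may be tried. */
		if (flags & TOY_FIXED_GOAL)
			break;
	}
	return -1;
}
```

With the map {1,1,0,0,1,0,0,0,1,1}, a goal of 2 and a length of 3, the
sketch hands back the short run at block 2 when no flags are set, moves
on to block 5 when TOY_MIN_LENGTH is set, and fails when TOY_FIXED_GOAL
is added on top.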
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/alloc.c | 141 +++++++++++++++++++++++++++++++++++++++++++++++++++
lib/ext2fs/ext2fs.h | 11 ++++
2 files changed, 152 insertions(+)
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 9901ca5..4c3b620 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -26,6 +26,16 @@
#include "ext2_fs.h"
#include "ext2fs.h"
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
/*
* Clear the uninit block bitmap flag if necessary
*/
@@ -346,3 +356,134 @@ no_blocks:
group = group & ~((1 << (log_flex)) - 1);
return ext2fs_group_first_block2(fs, group);
}
+
+/*
+ * Starting at _goal_, scan around the filesystem to find a run of free blocks
+ * that's at least _len_ blocks long. Possible flags:
+ * - EXT2_NEWRANGE_FIXED_GOAL: The range of blocks must start at _goal_.
+ * - EXT2_NEWRANGE_MIN_LENGTH: do not return an allocation shorter than _len_.
+ * To zero the range, use ext2fs_alloc_range() with EXT2_ALLOCRANGE_ZERO_BLOCKS.
+ *
+ * The starting block is returned in _pblk_ and the length is returned via
+ * _plen_. The blocks are not marked in the bitmap; the caller must mark
+ * however much of the returned run they actually use, hopefully via
+ * ext2fs_block_alloc_stats_range().
+ *
+ * This function can return a range that is longer than what was requested.
+ */
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+ blk64_t *plen)
+{
+ errcode_t retval;
+ blk64_t start, end, b;
+ int looped = 0;
+ blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+
+ dbg_printf("%s: flags=0x%x goal=%llu len=%llu\n", __func__, flags,
+ goal, len);
+ EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+ if (len == 0 || (flags & ~EXT2_NEWRANGE_ALL_FLAGS))
+ return EXT2_ET_INVALID_ARGUMENT;
+ if (!map)
+ map = fs->block_map;
+ if (!map)
+ return EXT2_ET_NO_BLOCK_BITMAP;
+ if (!goal || goal >= ext2fs_blocks_count(fs->super))
+ goal = fs->super->s_first_data_block;
+
+ start = goal;
+ while (!looped || start <= goal) {
+ retval = ext2fs_find_first_zero_block_bitmap2(map, start,
+ max_blocks - 1,
+ &start);
+ if (retval == ENOENT) {
+ /*
+ * If there are no free blocks beyond the starting
+ * point, try scanning the whole filesystem, unless the
+ * user told us only to allocate from _goal_, or if
+ * we're already scanning the whole filesystem.
+ */
+ if (flags & EXT2_NEWRANGE_FIXED_GOAL ||
+ start == fs->super->s_first_data_block)
+ goto fail;
+ start = fs->super->s_first_data_block;
+ continue;
+ } else if (retval)
+ goto errout;
+
+ if (flags & EXT2_NEWRANGE_FIXED_GOAL && start != goal)
+ goto fail;
+
+ b = min(start + len - 1, max_blocks - 1);
+ retval = ext2fs_find_first_set_block_bitmap2(map, start, b,
+ &end);
+ if (retval == ENOENT)
+ end = b + 1;
+ else if (retval)
+ goto errout;
+
+ if (!(flags & EXT2_NEWRANGE_MIN_LENGTH) ||
+ (end - start) >= len) {
+ /* Success! */
+ *pblk = start;
+ *plen = end - start;
+ dbg_printf("%s: new_range goal=%llu--%llu "
+ "blk=%llu--%llu %llu\n",
+ __func__, goal, goal + len - 1,
+ *pblk, *pblk + *plen - 1, *plen);
+
+ for (b = start; b < end;
+ b += fs->super->s_blocks_per_group)
+ clear_block_uninit(fs,
+ ext2fs_group_of_blk2(fs, b));
+ return 0;
+ }
+
+ if (flags & EXT2_NEWRANGE_FIXED_GOAL)
+ goto fail;
+ start = end;
+ if (start >= max_blocks) {
+ if (looped)
+ goto fail;
+ looped = 1;
+ start = fs->super->s_first_data_block;
+ }
+ }
+
+fail:
+ retval = EXT2_ET_BLOCK_ALLOC_FAIL;
+errout:
+ return retval;
+}
+
+errcode_t ext2fs_alloc_range(ext2_filsys fs, int flags, blk64_t goal,
+ blk_t len, blk64_t *ret)
+{
+ int newr_flags = EXT2_NEWRANGE_MIN_LENGTH;
+ errcode_t retval;
+ blk64_t plen;
+
+ EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
+ if (len == 0 || (flags & ~EXT2_ALLOCRANGE_ALL_FLAGS))
+ return EXT2_ET_INVALID_ARGUMENT;
+
+ if (flags & EXT2_ALLOCRANGE_FIXED_GOAL)
+ newr_flags |= EXT2_NEWRANGE_FIXED_GOAL;
+
+ retval = ext2fs_new_range(fs, newr_flags, goal, len, NULL, ret, &plen);
+ if (retval)
+ return retval;
+
+ if (plen < len)
+ return EXT2_ET_BLOCK_ALLOC_FAIL;
+
+ if (flags & EXT2_ALLOCRANGE_ZERO_BLOCKS) {
+ retval = ext2fs_zero_blocks2(fs, *ret, len, NULL, NULL);
+ if (retval)
+ return retval;
+ }
+
+ ext2fs_block_alloc_stats_range(fs, *ret, len, +1);
+ return retval;
+}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index e117441..bfe0ecf 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -693,6 +693,17 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
blk64_t *ret));
blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino,
struct ext2_inode *inode, blk64_t lblk);
+#define EXT2_NEWRANGE_FIXED_GOAL (0x1)
+#define EXT2_NEWRANGE_MIN_LENGTH (0x2)
+#define EXT2_NEWRANGE_ALL_FLAGS (0x3)
+errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, ext2fs_block_bitmap map, blk64_t *pblk,
+ blk64_t *plen);
+#define EXT2_ALLOCRANGE_FIXED_GOAL (0x1)
+#define EXT2_ALLOCRANGE_ZERO_BLOCKS (0x2)
+#define EXT2_ALLOCRANGE_ALL_FLAGS (0x3)
+errcode_t ext2fs_alloc_range(ext2_filsys fs, int flags, blk64_t goal,
+ blk_t len, blk64_t *ret);
/* alloc_sb.c */
extern int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
* [PATCH 26/31] libext2fs: add new hooks to support large allocations
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (24 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 25/31] libext2fs: find/alloc a range of empty blocks Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 27/31] libext2fs: implement fallocate Darrick J. Wong
` (9 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Add a new_range hook and a block_alloc_stats_range hook so
that e2fsck can capture allocation requests spanning more than a
block to its block_found_map.
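The patch guards against a hook that calls back into the allocator by
temporarily clearing the function pointer. A minimal standalone sketch of
that pattern follows; struct toy_fs, toy_new_range() and toy_hook() are
hypothetical names made up for this example and carry far fewer
arguments than the real ext2fs_new_range().

```c
#include <assert.h>
#include <stddef.h>

struct toy_fs;
typedef int (*toy_new_range_fn)(struct toy_fs *fs, unsigned long goal,
				unsigned long *pblk);

struct toy_fs {
	toy_new_range_fn new_range;	/* optional client hook */
	int default_calls;		/* times the built-in path ran */
	int hook_calls;			/* times the hook ran */
};

int toy_new_range(struct toy_fs *fs, unsigned long goal, unsigned long *pblk)
{
	toy_new_range_fn hook;
	int ret;

	assert(fs != NULL && pblk != NULL);
	if (fs->new_range) {
		/*
		 * Swap the hook out while it runs, so that a hook which
		 * calls toy_new_range() again lands in the built-in path
		 * instead of recursing forever.
		 */
		hook = fs->new_range;
		fs->new_range = NULL;
		ret = hook(fs, goal, pblk);
		fs->new_range = hook;
		return ret;
	}

	/* Built-in path: pretend the goal block is always free. */
	fs->default_calls++;
	*pblk = goal;
	return 0;
}

/* A hook that adjusts the goal and then re-enters the allocator. */
int toy_hook(struct toy_fs *fs, unsigned long goal, unsigned long *pblk)
{
	fs->hook_calls++;
	return toy_new_range(fs, goal + 100, pblk);
}
```

One consequence of this design is that the hook is invisible to any
nested call made while it runs, and it is restored afterwards whether or
not the hook succeeded.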
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
e2fsck/pass1.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
lib/ext2fs/alloc.c | 37 ++++++++++++++++++++++++++++++++++++-
lib/ext2fs/alloc_stats.c | 16 ++++++++++++++++
lib/ext2fs/ext2fs.h | 16 ++++++++++++++++
4 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index fd10c72..dc5c4c6 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -3871,6 +3871,26 @@ static errcode_t e2fsck_get_alloc_block(ext2_filsys fs, blk64_t goal,
return (0);
}
+static errcode_t e2fsck_new_range(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen)
+{
+ e2fsck_t ctx = (e2fsck_t) fs->priv_data;
+ errcode_t retval;
+
+ if (ctx->block_found_map)
+ return ext2fs_new_range(fs, flags, goal, len,
+ ctx->block_found_map, pblk, plen);
+
+ if (!fs->block_map) {
+ retval = ext2fs_read_block_bitmap(fs);
+ if (retval)
+ return retval;
+ }
+
+ return ext2fs_new_range(fs, flags, goal, len, fs->block_map,
+ pblk, plen);
+}
+
static void e2fsck_block_alloc_stats(ext2_filsys fs, blk64_t blk, int inuse)
{
e2fsck_t ctx = (e2fsck_t) fs->priv_data;
@@ -3890,6 +3910,28 @@ static void e2fsck_block_alloc_stats(ext2_filsys fs, blk64_t blk, int inuse)
}
}
+static void e2fsck_block_alloc_stats_range(ext2_filsys fs, blk64_t blk,
+ blk_t num, int inuse)
+{
+ e2fsck_t ctx = (e2fsck_t) fs->priv_data;
+
+ /* Never free a critical metadata block */
+ if (ctx->block_found_map &&
+ ctx->block_metadata_map &&
+ inuse < 0 &&
+ ext2fs_test_block_bitmap_range2(ctx->block_metadata_map, blk, num))
+ return;
+
+ if (ctx->block_found_map) {
+ if (inuse > 0)
+ ext2fs_mark_block_bitmap_range2(ctx->block_found_map,
+ blk, num);
+ else
+ ext2fs_unmark_block_bitmap_range2(ctx->block_found_map,
+ blk, num);
+ }
+}
+
void e2fsck_use_inode_shortcuts(e2fsck_t ctx, int use_shortcuts)
{
ext2_filsys fs = ctx->fs;
@@ -3913,4 +3955,7 @@ void e2fsck_intercept_block_allocations(e2fsck_t ctx)
ext2fs_set_alloc_block_callback(ctx->fs, e2fsck_get_alloc_block, 0);
ext2fs_set_block_alloc_stats_callback(ctx->fs,
e2fsck_block_alloc_stats, 0);
+ ext2fs_set_new_range_callback(ctx->fs, e2fsck_new_range, NULL);
+ ext2fs_set_block_alloc_stats_range_callback(ctx->fs,
+ e2fsck_block_alloc_stats_range, NULL);
}
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 4c3b620..86e7f99 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -379,12 +379,32 @@ errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
blk64_t start, end, b;
int looped = 0;
blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+ errcode_t (*nrf)(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen);
dbg_printf("%s: flags=0x%x goal=%llu len=%llu\n", __func__, flags,
goal, len);
EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
if (len == 0 || (flags & ~EXT2_NEWRANGE_ALL_FLAGS))
return EXT2_ET_INVALID_ARGUMENT;
+
+ if (!map && fs->new_range) {
+ /*
+ * In case there are clients out there whose new_range
+ * handlers call ext2fs_new_range with a NULL block map,
+ * temporarily swap out the function pointer so that we don't
+ * end up in an infinite loop.
+ */
+ nrf = fs->new_range;
+ fs->new_range = NULL;
+ retval = nrf(fs, flags, goal, len, pblk, plen);
+ fs->new_range = nrf;
+ if (retval)
+ return retval;
+ start = *pblk;
+ end = *pblk + *plen;
+ goto allocated;
+ }
if (!map)
map = fs->block_map;
if (!map)
@@ -432,7 +452,7 @@ errcode_t ext2fs_new_range(ext2_filsys fs, int flags, blk64_t goal,
"blk=%llu--%llu %llu\n",
__func__, goal, goal + len - 1,
*pblk, *pblk + *plen - 1, *plen);
-
+allocated:
for (b = start; b < end;
b += fs->super->s_blocks_per_group)
clear_block_uninit(fs,
@@ -457,6 +477,21 @@ errout:
return retval;
}
+void ext2fs_set_new_range_callback(ext2_filsys fs,
+ errcode_t (*func)(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen),
+ errcode_t (**old)(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen))
+{
+ if (!fs || fs->magic != EXT2_ET_MAGIC_EXT2FS_FILSYS)
+ return;
+
+ if (old)
+ *old = fs->new_range;
+
+ fs->new_range = func;
+}
+
errcode_t ext2fs_alloc_range(ext2_filsys fs, int flags, blk64_t goal,
blk_t len, blk64_t *ret)
{
diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
index aca5004..3949f61 100644
--- a/lib/ext2fs/alloc_stats.c
+++ b/lib/ext2fs/alloc_stats.c
@@ -145,4 +145,20 @@ void ext2fs_block_alloc_stats_range(ext2_filsys fs, blk64_t blk,
}
ext2fs_mark_super_dirty(fs);
ext2fs_mark_bb_dirty(fs);
+ if (fs->block_alloc_stats_range)
+ (fs->block_alloc_stats_range)(fs, blk, num, inuse);
+}
+
+void ext2fs_set_block_alloc_stats_range_callback(ext2_filsys fs,
+ void (*func)(ext2_filsys fs, blk64_t blk,
+ blk_t num, int inuse),
+ void (**old)(ext2_filsys fs, blk64_t blk,
+ blk_t num, int inuse))
+{
+ if (!fs || fs->magic != EXT2_ET_MAGIC_EXT2FS_FILSYS)
+ return;
+ if (old)
+ *old = fs->block_alloc_stats_range;
+
+ fs->block_alloc_stats_range = func;
}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index bfe0ecf..ea7e624 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -279,6 +279,12 @@ struct struct_ext2_filsys {
io_channel journal_io;
char *journal_name;
+
+ /* New block range allocation hooks */
+ errcode_t (*new_range)(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen);
+ void (*block_alloc_stats_range)(ext2_filsys fs, blk64_t blk, blk_t num,
+ int inuse);
};
#if EXT2_FLAT_INCLUDES
@@ -693,6 +699,16 @@ extern void ext2fs_set_alloc_block_callback(ext2_filsys fs,
blk64_t *ret));
blk64_t ext2fs_find_inode_goal(ext2_filsys fs, ext2_ino_t ino,
struct ext2_inode *inode, blk64_t lblk);
+extern void ext2fs_set_new_range_callback(ext2_filsys fs,
+ errcode_t (*func)(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen),
+ errcode_t (**old)(ext2_filsys fs, int flags, blk64_t goal,
+ blk64_t len, blk64_t *pblk, blk64_t *plen));
+extern void ext2fs_set_block_alloc_stats_range_callback(ext2_filsys fs,
+ void (*func)(ext2_filsys fs, blk64_t blk,
+ blk_t num, int inuse),
+ void (**old)(ext2_filsys fs, blk64_t blk,
+ blk_t num, int inuse));
#define EXT2_NEWRANGE_FIXED_GOAL (0x1)
#define EXT2_NEWRANGE_MIN_LENGTH (0x2)
#define EXT2_NEWRANGE_ALL_FLAGS (0x3)
* [PATCH 27/31] libext2fs: implement fallocate
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (25 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 26/31] libext2fs: add new hooks to support large allocations Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:19 ` [PATCH 28/31] libext2fs: use fallocate for creating journals and hugefiles Darrick J. Wong
` (8 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Create a library function to perform fallocation on arbitrary files,
and wire up a few users for this function. This is a bit more intense
than Ted's original mk_hugefiles implementation since we have to honor
any blocks that may already be allocated to the file.
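Honoring blocks that are already allocated amounts to finding the holes
between existing extents inside the requested range. The following is a
simplified sketch of that step only, assuming the extents are sorted by
logical block and do not overlap; struct toy_extent and toy_find_holes()
are hypothetical and do not appear in fallocate.c.

```c
#include <assert.h>
#include <stddef.h>

struct toy_extent {
	unsigned long lblk;	/* first logical block */
	unsigned long len;	/* number of blocks */
};

/*
 * Walk sorted, non-overlapping extents and record each run of unmapped
 * logical blocks inside [start, start + len).  At most max_holes entries
 * are written to holes[]; the total number of holes found is returned.
 */
size_t toy_find_holes(const struct toy_extent *ext, size_t n_ext,
		      unsigned long start, unsigned long len,
		      struct toy_extent *holes, size_t max_holes)
{
	unsigned long pos = start, end = start + len;
	size_t i, n = 0;

	assert(holes != NULL || max_holes == 0);
	for (i = 0; i < n_ext && pos < end; i++) {
		unsigned long ext_end = ext[i].lblk + ext[i].len;

		if (ext_end <= pos)
			continue;	/* extent lies entirely before pos */
		if (ext[i].lblk > pos) {
			/* Unmapped gap between pos and this extent. */
			unsigned long hole_end =
				ext[i].lblk < end ? ext[i].lblk : end;

			if (n < max_holes) {
				holes[n].lblk = pos;
				holes[n].len = hole_end - pos;
			}
			n++;
		}
		pos = ext_end;
	}
	if (pos < end) {
		/* Trailing gap after the last extent. */
		if (n < max_holes) {
			holes[n].lblk = pos;
			holes[n].len = end - pos;
		}
		n++;
	}
	return n;
}
```

Each hole found this way corresponds to one run that the real code then
tries to attach to a neighboring extent or to fill with a fresh
allocation; that part is not modelled here.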
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/Makefile.in | 8
lib/ext2fs/ext2fs.h | 10 +
lib/ext2fs/fallocate.c | 853 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 871 insertions(+)
create mode 100644 lib/ext2fs/fallocate.c
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 2706bfa..d1350b5 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -79,6 +79,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
expanddir.o \
ext_attr.o \
extent.o \
+ fallocate.o \
fileio.o \
finddev.o \
flushb.o \
@@ -763,6 +764,13 @@ extent.o: $(srcdir)/extent.c $(top_builddir)/lib/config.h \
$(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
$(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
$(srcdir)/bitops.h $(srcdir)/e2image.h
+fallocate.o: $(srcdir)/fallocate.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
+ $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fsP.h \
+ $(srcdir)/ext2fs.h $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(srcdir)/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h $(srcdir)/ext2_ext_attr.h \
+ $(srcdir)/bitops.h $(srcdir)/e2image.h
fileio.o: $(srcdir)/fileio.c $(top_builddir)/lib/config.h \
$(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
$(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index ea7e624..23f6fa5 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1255,6 +1255,16 @@ extern errcode_t ext2fs_extent_goto2(ext2_extent_handle_t handle,
extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);
size_t ext2fs_max_extent_depth(ext2_extent_handle_t handle);
+/* fallocate.c */
+#define EXT2_FALLOCATE_ZERO_BLOCKS (0x1)
+#define EXT2_FALLOCATE_FORCE_INIT (0x2)
+#define EXT2_FALLOCATE_FORCE_UNINIT (0x4)
+#define EXT2_FALLOCATE_INIT_BEYOND_EOF (0x8)
+#define EXT2_FALLOCATE_ALL_FLAGS (0xF)
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+ struct ext2_inode *inode, blk64_t goal,
+ blk64_t start, blk64_t len);
+
/* fileio.c */
extern errcode_t ext2fs_file_open2(ext2_filsys fs, ext2_ino_t ino,
struct ext2_inode *inode,
diff --git a/lib/ext2fs/fallocate.c b/lib/ext2fs/fallocate.c
new file mode 100644
index 0000000..8b502d5
--- /dev/null
+++ b/lib/ext2fs/fallocate.c
@@ -0,0 +1,853 @@
+/*
+ * fallocate.c -- Allocate large chunks of file.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+
+#include "ext2_fs.h"
+#include "ext2fs.h"
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#undef DEBUG
+
+#ifdef DEBUG
+# define dbg_printf(f, a...) do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Extent-based fallocate code.
+ *
+ * Find runs of unmapped logical blocks by starting at start and walking the
+ * extents until we reach the end of the range we want.
+ *
+ * For each run of unmapped blocks, try to find the extents on either side of
+ * the range. If there's a left extent that can grow by at least a cluster and
+ * there are lblocks between start and the next lcluster after start, see if
+ * there's an implied cluster allocation; if so, zero the blocks (if the left
+ * extent is initialized) and adjust the extent. Ditto for the blocks between
+ * the end of the last full lcluster and end, if there's a right extent.
+ *
+ * Try to attach as much as we can to the left extent, then try to attach as
+ * much as we can to the right extent. For the remainder, try to allocate the
+ * whole range; map in whatever we get; and repeat until we're done.
+ *
+ * To attach to a left extent, figure out the maximum amount we can add to the
+ * extent and try to allocate that much, and append if successful. To attach
+ * to a right extent, figure out the max we can add to the extent, try to
+ * allocate that much, and prepend if successful.
+ *
+ * We need an alloc_range function that tells us how much we can allocate given
+ * a maximum length and one of a suggested start, a fixed start, or a fixed end
+ * point.
+ *
+ * Every time we modify the extent tree we also need to update the block stats.
+ *
+ * At the end, update i_blocks and i_size appropriately.
+ */
+
+static void dbg_print_extent(char *desc, struct ext2fs_extent *extent)
+{
+#ifdef DEBUG
+ if (desc)
+ printf("%s: ", desc);
+ printf("extent: lblk %llu--%llu, len %u, pblk %llu, flags: ",
+ extent->e_lblk, extent->e_lblk + extent->e_len - 1,
+ extent->e_len, extent->e_pblk);
+ if (extent->e_flags & EXT2_EXTENT_FLAGS_LEAF)
+ fputs("LEAF ", stdout);
+ if (extent->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+ fputs("UNINIT ", stdout);
+ if (extent->e_flags & EXT2_EXTENT_FLAGS_SECOND_VISIT)
+ fputs("2ND_VISIT ", stdout);
+ if (!extent->e_flags)
+ fputs("(none)", stdout);
+ fputc('\n', stdout);
+ fflush(stdout);
+#endif
+}
+
+static errcode_t claim_range(ext2_filsys fs, struct ext2_inode *inode,
+ blk64_t blk, blk64_t len)
+{
+ blk64_t clusters;
+
+ clusters = (len + EXT2FS_CLUSTER_RATIO(fs) - 1) /
+ EXT2FS_CLUSTER_RATIO(fs);
+ ext2fs_block_alloc_stats_range(fs, blk,
+ clusters * EXT2FS_CLUSTER_RATIO(fs), +1);
+ return ext2fs_iblk_add_blocks(fs, inode, clusters);
+}
+
+static errcode_t ext_falloc_helper(ext2_filsys fs,
+ int flags,
+ ext2_ino_t ino,
+ struct ext2_inode *inode,
+ ext2_extent_handle_t handle,
+ struct ext2fs_extent *left_ext,
+ struct ext2fs_extent *right_ext,
+ blk64_t range_start, blk64_t range_len,
+ blk64_t alloc_goal)
+{
+ struct ext2fs_extent newex, ex;
+ int op;
+ blk64_t fillable, pblk, plen, x, y;
+ blk64_t eof_blk = 0, cluster_fill = 0;
+ errcode_t err;
+ blk_t max_extent_len, max_uninit_len, max_init_len;
+
+#ifdef DEBUG
+ printf("%s: ", __func__);
+ if (left_ext)
+ printf("left_ext=%llu--%llu, ", left_ext->e_lblk,
+ left_ext->e_lblk + left_ext->e_len - 1);
+ if (right_ext)
+ printf("right_ext=%llu--%llu, ", right_ext->e_lblk,
+ right_ext->e_lblk + right_ext->e_len - 1);
+ printf("start=%llu len=%llu, goal=%llu\n", range_start, range_len,
+ alloc_goal);
+ fflush(stdout);
+#endif
+ /* Can't create initialized extents past EOF? */
+ if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF))
+ eof_blk = EXT2_I_SIZE(inode) / fs->blocksize;
+
+ /* The allocation goal must be as far into a cluster as range_start. */
+ alloc_goal = (alloc_goal & ~EXT2FS_CLUSTER_MASK(fs)) |
+ (range_start & EXT2FS_CLUSTER_MASK(fs));
+
+ max_uninit_len = EXT_UNINIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+ max_init_len = EXT_INIT_MAX_LEN & ~EXT2FS_CLUSTER_MASK(fs);
+
+ /* We must lengthen the left extent to the end of the cluster */
+ if (left_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+ /* How many more blocks can be attached to left_ext? */
+ if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+ fillable = max_uninit_len - left_ext->e_len;
+ else
+ fillable = max_init_len - left_ext->e_len;
+
+ if (fillable > range_len)
+ fillable = range_len;
+ if (fillable == 0)
+ goto expand_right;
+
+ /*
+ * If range_start isn't on a cluster boundary, try an
+ * implied cluster allocation for left_ext.
+ */
+ cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+ (range_start & EXT2FS_CLUSTER_MASK(fs));
+ cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+ if (cluster_fill == 0)
+ goto expand_right;
+
+ if (cluster_fill > fillable)
+ cluster_fill = fillable;
+
+ /* Don't expand an initialized left_ext beyond EOF */
+ if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+ x = left_ext->e_lblk + left_ext->e_len - 1;
+ dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+ __func__, x, x + cluster_fill, eof_blk);
+ if (eof_blk >= x && eof_blk <= x + cluster_fill)
+ cluster_fill = eof_blk - x;
+ if (cluster_fill == 0)
+ goto expand_right;
+ }
+
+ err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+ if (err)
+ goto expand_right;
+ left_ext->e_len += cluster_fill;
+ range_start += cluster_fill;
+ range_len -= cluster_fill;
+ alloc_goal += cluster_fill;
+
+ dbg_print_extent("ext_falloc clus left+", left_ext);
+ err = ext2fs_extent_replace(handle, 0, left_ext);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ /* Zero blocks */
+ if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+ err = ext2fs_zero_blocks2(fs, left_ext->e_pblk +
+ left_ext->e_len -
+ cluster_fill, cluster_fill,
+ NULL, NULL);
+ if (err)
+ goto out;
+ }
+ }
+
+expand_right:
+ /* We must lengthen the right extent to the beginning of the cluster */
+ if (right_ext && EXT2FS_CLUSTER_RATIO(fs) > 1) {
+ /* How much can we attach to right_ext? */
+ if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+ fillable = max_uninit_len - right_ext->e_len;
+ else
+ fillable = max_init_len - right_ext->e_len;
+
+ if (fillable > range_len)
+ fillable = range_len;
+ if (fillable == 0)
+ goto try_merge;
+
+ /*
+ * If range_end isn't on a cluster boundary, try an implied
+ * cluster allocation for right_ext.
+ */
+ cluster_fill = right_ext->e_lblk & EXT2FS_CLUSTER_MASK(fs);
+ if (cluster_fill == 0)
+ goto try_merge;
+
+ err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+ if (err)
+ goto out;
+
+ if (cluster_fill > fillable)
+ cluster_fill = fillable;
+ right_ext->e_lblk -= cluster_fill;
+ right_ext->e_pblk -= cluster_fill;
+ right_ext->e_len += cluster_fill;
+ range_len -= cluster_fill;
+
+ dbg_print_extent("ext_falloc clus right+", right_ext);
+ err = ext2fs_extent_replace(handle, 0, right_ext);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ /* Zero blocks if necessary */
+ if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+ err = ext2fs_zero_blocks2(fs, right_ext->e_pblk,
+ cluster_fill, NULL, NULL);
+ if (err)
+ goto out;
+ }
+ }
+
+try_merge:
+ /* Merge both extents together, perhaps? */
+ if (left_ext && right_ext) {
+ /* Are the two extents mergeable? */
+ if ((left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) !=
+ (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT))
+ goto try_left;
+
+ /* User requires init/uninit but extent is uninit/init. */
+ if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+ (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+ ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+ !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+ goto try_left;
+
+ /*
+ * Skip initialized extent unless user wants to zero blocks
+ * or requires init extent.
+ */
+ if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (!(flags & EXT2_FALLOCATE_ZERO_BLOCKS) ||
+ !(flags & EXT2_FALLOCATE_FORCE_INIT)))
+ goto try_left;
+
+ /* Will it even fit? */
+ x = left_ext->e_len + range_len + right_ext->e_len;
+ if (x > (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT ?
+ max_uninit_len : max_init_len))
+ goto try_left;
+
+ err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+ if (err)
+ goto try_left;
+
+ /* Allocate blocks */
+ y = left_ext->e_pblk + left_ext->e_len;
+ err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+ EXT2_NEWRANGE_MIN_LENGTH, y,
+ right_ext->e_pblk - y + 1, NULL,
+ &pblk, &plen);
+ if (err)
+ goto try_left;
+ if (pblk + plen != right_ext->e_pblk)
+ goto try_left;
+ err = claim_range(fs, inode, pblk, plen);
+ if (err)
+ goto out;
+
+ /* Modify extents */
+ left_ext->e_len = x;
+ dbg_print_extent("ext_falloc merge", left_ext);
+ err = ext2fs_extent_replace(handle, 0, left_ext);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+ err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF, &newex);
+ if (err)
+ goto out;
+ err = ext2fs_extent_delete(handle, 0);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+ *right_ext = *left_ext;
+
+ /* Zero blocks */
+ if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+ err = ext2fs_zero_blocks2(fs, range_start, range_len,
+ NULL, NULL);
+ if (err)
+ goto out;
+ }
+
+ return 0;
+ }
+
+try_left:
+ /* Extend the left extent */
+ if (left_ext) {
+ /* How many more blocks can be attached to left_ext? */
+ if (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+ fillable = max_uninit_len - left_ext->e_len;
+ else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+ fillable = max_init_len - left_ext->e_len;
+ else
+ fillable = 0;
+
+ /* User requires init/uninit but extent is uninit/init. */
+ if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+ (left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+ ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+ !(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+ goto try_right;
+
+ if (fillable > range_len)
+ fillable = range_len;
+
+ /* Don't expand an initialized left_ext beyond EOF */
+ x = left_ext->e_lblk + left_ext->e_len - 1;
+ if (!(flags & EXT2_FALLOCATE_INIT_BEYOND_EOF)) {
+ dbg_printf("%s: lend=%llu newlend=%llu eofblk=%llu\n",
+ __func__, x, x + fillable, eof_blk);
+ if (eof_blk >= x && eof_blk <= x + fillable)
+ fillable = eof_blk - x;
+ }
+
+ if (fillable == 0)
+ goto try_right;
+
+ /* Test whether the right edge of the range is already mapped */
+ if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+ err = ext2fs_map_cluster_block(fs, ino, inode,
+ x + fillable, &pblk);
+ if (err)
+ goto out;
+ if (pblk)
+ fillable -= 1 + ((x + fillable)
+ & EXT2FS_CLUSTER_MASK(fs));
+ if (fillable == 0)
+ goto try_right;
+ }
+
+ /* Allocate range of blocks */
+ x = left_ext->e_pblk + left_ext->e_len;
+ err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+ EXT2_NEWRANGE_MIN_LENGTH,
+ x, fillable, NULL, &pblk, &plen);
+ if (err)
+ goto try_right;
+ err = claim_range(fs, inode, pblk, plen);
+ if (err)
+ goto out;
+
+ /* Modify left_ext */
+ err = ext2fs_extent_goto(handle, left_ext->e_lblk);
+ if (err)
+ goto out;
+ range_start += plen;
+ range_len -= plen;
+ left_ext->e_len += plen;
+ dbg_print_extent("ext_falloc left+", left_ext);
+ err = ext2fs_extent_replace(handle, 0, left_ext);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ /* Zero blocks if necessary */
+ if (!(left_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+ err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+ if (err)
+ goto out;
+ }
+ }
+
+try_right:
+ /* Extend the right extent */
+ if (right_ext) {
+ /* How much can we attach to right_ext? */
+ if (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)
+ fillable = max_uninit_len - right_ext->e_len;
+ else if (flags & EXT2_FALLOCATE_ZERO_BLOCKS)
+ fillable = max_init_len - right_ext->e_len;
+ else
+ fillable = 0;
+
+ /* User requires init/uninit but extent is uninit/init. */
+ if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+ (right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)) ||
+ ((flags & EXT2_FALLOCATE_FORCE_UNINIT) &&
+ !(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT)))
+ goto try_anywhere;
+
+ if (fillable > range_len)
+ fillable = range_len;
+ if (fillable == 0)
+ goto try_anywhere;
+
+ /* Test whether the left edge of the range is already mapped */
+ if (EXT2FS_CLUSTER_RATIO(fs) > 1) {
+ err = ext2fs_map_cluster_block(fs, ino, inode,
+ right_ext->e_lblk - fillable, &pblk);
+ if (err)
+ goto out;
+ if (pblk)
+ fillable -= EXT2FS_CLUSTER_RATIO(fs) -
+ ((right_ext->e_lblk - fillable)
+ & EXT2FS_CLUSTER_MASK(fs));
+ if (fillable == 0)
+ goto try_anywhere;
+ }
+
+ /*
+ * FIXME: It would be nice if we could handle allocating a
+ * variable range from a fixed end point instead of just
+ * skipping to the general allocator if the whole range is
+ * unavailable.
+ */
+ err = ext2fs_new_range(fs, EXT2_NEWRANGE_FIXED_GOAL |
+ EXT2_NEWRANGE_MIN_LENGTH,
+ right_ext->e_pblk - fillable,
+ fillable, NULL, &pblk, &plen);
+ if (err)
+ goto try_anywhere;
+ err = claim_range(fs, inode,
+ pblk & ~EXT2FS_CLUSTER_MASK(fs),
+ plen + (pblk & EXT2FS_CLUSTER_MASK(fs)));
+ if (err)
+ goto out;
+
+ /* Modify right_ext */
+ err = ext2fs_extent_goto(handle, right_ext->e_lblk);
+ if (err)
+ goto out;
+ range_len -= plen;
+ right_ext->e_lblk -= plen;
+ right_ext->e_pblk -= plen;
+ right_ext->e_len += plen;
+ dbg_print_extent("ext_falloc right+", right_ext);
+ err = ext2fs_extent_replace(handle, 0, right_ext);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ /* Zero blocks if necessary */
+ if (!(right_ext->e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+ err = ext2fs_zero_blocks2(fs, pblk,
+ plen + cluster_fill, NULL, NULL);
+ if (err)
+ goto out;
+ }
+ }
+
+try_anywhere:
+ /* Try implied cluster alloc on the left and right ends */
+ if (range_len > 0 && (range_start & EXT2FS_CLUSTER_MASK(fs))) {
+ cluster_fill = EXT2FS_CLUSTER_RATIO(fs) -
+ (range_start & EXT2FS_CLUSTER_MASK(fs));
+ cluster_fill &= EXT2FS_CLUSTER_MASK(fs);
+ if (cluster_fill > range_len)
+ cluster_fill = range_len;
+ newex.e_lblk = range_start;
+ err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+ &pblk);
+ if (err)
+ goto out;
+ if (pblk == 0)
+ goto try_right_implied;
+ newex.e_pblk = pblk;
+ newex.e_len = cluster_fill;
+ newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+ EXT2_EXTENT_FLAGS_UNINIT);
+ dbg_print_extent("ext_falloc iclus left+", &newex);
+ ext2fs_extent_goto(handle, newex.e_lblk);
+ err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+ &ex);
+ if (err == EXT2_ET_NO_CURRENT_NODE)
+ ex.e_lblk = 0;
+ else if (err)
+ goto out;
+
+ if (ex.e_lblk > newex.e_lblk)
+ op = 0; /* insert before */
+ else
+ op = EXT2_EXTENT_INSERT_AFTER;
+ dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+ __func__, op ? "after" : "before", ex.e_lblk,
+ newex.e_lblk);
+ err = ext2fs_extent_insert(handle, op, &newex);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+ err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+ newex.e_len, NULL, NULL);
+ if (err)
+ goto out;
+ }
+
+ range_start += cluster_fill;
+ range_len -= cluster_fill;
+ }
+
+try_right_implied:
+ y = range_start + range_len;
+ if (range_len > 0 && (y & EXT2FS_CLUSTER_MASK(fs))) {
+ cluster_fill = y & EXT2FS_CLUSTER_MASK(fs);
+ if (cluster_fill > range_len)
+ cluster_fill = range_len;
+ newex.e_lblk = y & ~EXT2FS_CLUSTER_MASK(fs);
+ err = ext2fs_map_cluster_block(fs, ino, inode, newex.e_lblk,
+ &pblk);
+ if (err)
+ goto out;
+ if (pblk == 0)
+ goto no_implied;
+ newex.e_pblk = pblk;
+ newex.e_len = cluster_fill;
+ newex.e_flags = (flags & EXT2_FALLOCATE_FORCE_INIT ? 0 :
+ EXT2_EXTENT_FLAGS_UNINIT);
+ dbg_print_extent("ext_falloc iclus right+", &newex);
+ ext2fs_extent_goto(handle, newex.e_lblk);
+ err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+ &ex);
+ if (err == EXT2_ET_NO_CURRENT_NODE)
+ ex.e_lblk = 0;
+ else if (err)
+ goto out;
+
+ if (ex.e_lblk > newex.e_lblk)
+ op = 0; /* insert before */
+ else
+ op = EXT2_EXTENT_INSERT_AFTER;
+ dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+ __func__, op ? "after" : "before", ex.e_lblk,
+ newex.e_lblk);
+ err = ext2fs_extent_insert(handle, op, &newex);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+ err = ext2fs_zero_blocks2(fs, newex.e_pblk,
+ newex.e_len, NULL, NULL);
+ if (err)
+ goto out;
+ }
+
+ range_len -= cluster_fill;
+ }
+
+no_implied:
+ if (range_len == 0)
+ return 0;
+
+ newex.e_lblk = range_start;
+ if (flags & EXT2_FALLOCATE_FORCE_INIT) {
+ max_extent_len = max_init_len;
+ newex.e_flags = 0;
+ } else {
+ max_extent_len = max_uninit_len;
+ newex.e_flags = EXT2_EXTENT_FLAGS_UNINIT;
+ }
+ pblk = alloc_goal;
+ y = range_len;
+ for (x = 0; x < y;) {
+ cluster_fill = newex.e_lblk & EXT2FS_CLUSTER_MASK(fs);
+ fillable = min(range_len + cluster_fill, max_extent_len);
+ err = ext2fs_new_range(fs, 0, pblk & ~EXT2FS_CLUSTER_MASK(fs),
+ fillable,
+ NULL, &pblk, &plen);
+ if (err)
+ goto out;
+ err = claim_range(fs, inode, pblk, plen);
+ if (err)
+ goto out;
+
+ /* Create extent */
+ newex.e_pblk = pblk + cluster_fill;
+ newex.e_len = plen - cluster_fill;
+ dbg_print_extent("ext_falloc create", &newex);
+ ext2fs_extent_goto(handle, newex.e_lblk);
+ err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT,
+ &ex);
+ if (err == EXT2_ET_NO_CURRENT_NODE)
+ ex.e_lblk = 0;
+ else if (err)
+ goto out;
+
+ if (ex.e_lblk > newex.e_lblk)
+ op = 0; /* insert before */
+ else
+ op = EXT2_EXTENT_INSERT_AFTER;
+ dbg_printf("%s: inserting %s lblk %llu newex=%llu\n",
+ __func__, op ? "after" : "before", ex.e_lblk,
+ newex.e_lblk);
+ err = ext2fs_extent_insert(handle, op, &newex);
+ if (err)
+ goto out;
+ err = ext2fs_extent_fix_parents(handle);
+ if (err)
+ goto out;
+
+ if (!(newex.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ (flags & EXT2_FALLOCATE_ZERO_BLOCKS)) {
+ err = ext2fs_zero_blocks2(fs, pblk, plen, NULL, NULL);
+ if (err)
+ goto out;
+ }
+
+ /* Update variables at end of loop */
+ x += plen - cluster_fill;
+ range_len -= plen - cluster_fill;
+ newex.e_lblk += plen - cluster_fill;
+ pblk += plen - cluster_fill;
+ if (pblk >= ext2fs_blocks_count(fs->super))
+ pblk = fs->super->s_first_data_block;
+ }
+
+out:
+ return err;
+}
+
+static errcode_t extent_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+ struct ext2_inode *inode, blk64_t goal,
+ blk64_t start, blk64_t len)
+{
+ ext2_extent_handle_t handle;
+ struct ext2fs_extent left_extent, right_extent;
+ struct ext2fs_extent *left_adjacent, *right_adjacent;
+ errcode_t err;
+ blk64_t range_start, range_end = 0, end, next;
+ blk64_t count, goal_distance;
+
+ end = start + len - 1;
+ err = ext2fs_extent_open2(fs, ino, inode, &handle);
+ if (err)
+ return err;
+
+ /*
+ * Find the extent closest to the start of the alloc range. We don't
+ * check the return value because _goto() sets the current node to the
+ * next-lowest extent if 'start' is in a hole; or the next-highest
+ * extent if there aren't any lower ones; or doesn't set a current node
+ * if there was a real error reading the extent tree. In that case,
+ * _get() will error out.
+ */
+start_again:
+ ext2fs_extent_goto(handle, start);
+ err = ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, &left_extent);
+ if (err == EXT2_ET_NO_CURRENT_NODE) {
+ blk64_t max_blocks = ext2fs_blocks_count(fs->super);
+
+ if (goal == ~0ULL)
+ goal = ext2fs_find_inode_goal(fs, ino, inode, start);
+ err = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
+ goal, max_blocks - 1, &goal);
+ goal += start;
+ err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+ NULL, start, len, goal);
+ goto errout;
+ } else if (err)
+ goto errout;
+
+ dbg_print_extent("ext_falloc initial", &left_extent);
+ next = left_extent.e_lblk + left_extent.e_len;
+ if (left_extent.e_lblk > start) {
+ /* The nearest extent we found was beyond start??? */
+ goal = left_extent.e_pblk - (left_extent.e_lblk - start);
+ err = ext_falloc_helper(fs, flags, ino, inode, handle, NULL,
+ &left_extent, start,
+ left_extent.e_lblk - start, goal);
+ if (err)
+ goto errout;
+
+ goto start_again;
+ } else if (next >= start) {
+ range_start = next;
+ left_adjacent = &left_extent;
+ } else {
+ range_start = start;
+ left_adjacent = NULL;
+ }
+ goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+ goal_distance = range_start - next;
+
+ do {
+ err = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT_LEAF,
+ &right_extent);
+ dbg_printf("%s: ino=%d get next =%d\n", __func__, ino,
+ (int)err);
+ dbg_print_extent("ext_falloc next", &right_extent);
+ /* Stop if we've seen this extent before */
+ if (!err && right_extent.e_lblk <= left_extent.e_lblk)
+ err = EXT2_ET_EXTENT_NO_NEXT;
+
+ if (err && err != EXT2_ET_EXTENT_NO_NEXT)
+ goto errout;
+ if (err == EXT2_ET_EXTENT_NO_NEXT ||
+ right_extent.e_lblk > end + 1) {
+ range_end = end;
+ right_adjacent = NULL;
+ } else {
+ /* Handle right_extent.e_lblk <= end */
+ range_end = right_extent.e_lblk - 1;
+ right_adjacent = &right_extent;
+ }
+ if (err != EXT2_ET_EXTENT_NO_NEXT &&
+ goal_distance > (range_end - right_extent.e_lblk)) {
+ goal = right_extent.e_pblk -
+ (right_extent.e_lblk - range_start);
+ goal_distance = range_end - right_extent.e_lblk;
+ }
+
+ dbg_printf("%s: ino=%d rstart=%llu rend=%llu\n", __func__, ino,
+ range_start, range_end);
+ err = 0;
+ if (range_start <= range_end) {
+ count = range_end - range_start + 1;
+ err = ext_falloc_helper(fs, flags, ino, inode, handle,
+ left_adjacent, right_adjacent,
+ range_start, count, goal);
+ if (err)
+ goto errout;
+ }
+
+ if (range_end == end)
+ break;
+
+ err = ext2fs_extent_goto(handle, right_extent.e_lblk);
+ if (err)
+ goto errout;
+ next = right_extent.e_lblk + right_extent.e_len;
+ left_extent = right_extent;
+ left_adjacent = &left_extent;
+ range_start = next;
+ goal = left_extent.e_pblk + (range_start - left_extent.e_lblk);
+ goal_distance = range_start - next;
+ } while (range_end < end);
+
+errout:
+ ext2fs_extent_free(handle);
+ return err;
+}
+
+/*
+ * Map physical blocks to a range of logical blocks within a file. The range
+ * of logical blocks is [start, start + len). If extents already exist, this
+ * function tries to extend them; otherwise, it tries to map start as if
+ * logical block 0 pointed to goal. If goal is ~0ULL, then the goal is
+ * calculated based on the inode group.
+ *
+ * Flags:
+ * - EXT2_FALLOCATE_ZERO_BLOCKS: Zero the blocks that are allocated.
+ * - EXT2_FALLOCATE_FORCE_INIT: Create only initialized extents.
+ * - EXT2_FALLOCATE_FORCE_UNINIT: Create only uninitialized extents.
+ * - EXT2_FALLOCATE_INIT_BEYOND_EOF: Create extents beyond EOF.
+ *
+ * If neither FORCE_INIT nor FORCE_UNINIT is specified, this function will
+ * try to expand any extents it finds, zeroing blocks as necessary.
+ */
+errcode_t ext2fs_fallocate(ext2_filsys fs, int flags, ext2_ino_t ino,
+ struct ext2_inode *inode, blk64_t goal,
+ blk64_t start, blk64_t len)
+{
+ struct ext2_inode inode_buf;
+ blk64_t blk, x;
+ errcode_t err;
+
+ if (((flags & EXT2_FALLOCATE_FORCE_INIT) &&
+ (flags & EXT2_FALLOCATE_FORCE_UNINIT)) ||
+ (flags & ~EXT2_FALLOCATE_ALL_FLAGS))
+ return EXT2_ET_INVALID_ARGUMENT;
+
+ if (len > ext2fs_blocks_count(fs->super))
+ return EXT2_ET_BLOCK_ALLOC_FAIL;
+ else if (len == 0)
+ return 0;
+
+ /* Read inode structure if necessary */
+ if (!inode) {
+ err = ext2fs_read_inode(fs, ino, &inode_buf);
+ if (err)
+ return err;
+ inode = &inode_buf;
+ }
+ dbg_printf("%s: ino=%d start=%llu len=%llu goal=%llu\n", __func__, ino,
+ start, len, goal);
+
+ if (inode->i_flags & EXT4_EXTENTS_FL) {
+ err = extent_fallocate(fs, flags, ino, inode, goal, start, len);
+ goto out;
+ }
+
+ /* XXX: Allocate a bunch of blocks the slow way */
+ for (blk = start; blk < start + len; blk++) {
+ err = ext2fs_bmap2(fs, ino, inode, NULL, 0, blk, 0, &x);
+ if (err)
+ return err;
+ if (x)
+ continue;
+
+ err = ext2fs_bmap2(fs, ino, inode, NULL,
+ BMAP_ALLOC | BMAP_UNINIT | BMAP_ZERO, blk,
+ 0, &x);
+ if (err)
+ return err;
+ }
+
+out:
+ if (inode == &inode_buf)
+ ext2fs_write_inode(fs, ino, inode);
+ return err;
+}
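As a reading aid, the argument checks at the top of ext2fs_fallocate() can be sketched in isolation. This is a minimal standalone sketch, not the library code: the SK_* flag values and the sk_result codes are placeholders invented for illustration (the real EXT2_FALLOCATE_* constants and EXT2_ET_* error codes are defined outside this excerpt), and sk_check_args() is a hypothetical name.

```c
#include <stdint.h>

/* Placeholder flag bits; the real EXT2_FALLOCATE_* values are not shown here. */
#define SK_ZERO_BLOCKS		0x1
#define SK_FORCE_INIT		0x2
#define SK_FORCE_UNINIT		0x4
#define SK_INIT_BEYOND_EOF	0x8
#define SK_ALL_FLAGS		0xF

/* Placeholder outcomes standing in for the errcode_t values in the patch. */
enum sk_result {
	SK_OK,			/* arguments accepted, allocation may proceed */
	SK_NOTHING_TO_DO,	/* len == 0: the patch returns 0 right away */
	SK_INVALID_ARGUMENT,	/* conflicting or unknown flags */
	SK_ALLOC_FAIL		/* request is larger than the filesystem */
};

/*
 * Same order of checks as the patch: flags first, then the length against
 * the total block count, then the zero-length early exit.
 */
static enum sk_result sk_check_args(int flags, uint64_t len,
				    uint64_t fs_blocks)
{
	if (((flags & SK_FORCE_INIT) && (flags & SK_FORCE_UNINIT)) ||
	    (flags & ~SK_ALL_FLAGS))
		return SK_INVALID_ARGUMENT;

	if (len > fs_blocks)
		return SK_ALLOC_FAIL;
	if (len == 0)
		return SK_NOTHING_TO_DO;

	return SK_OK;
}
```

With these placeholder values, sk_check_args(SK_FORCE_INIT | SK_FORCE_UNINIT, 8, 100) yields SK_INVALID_ARGUMENT, and a zero-length request yields SK_NOTHING_TO_DO, mirroring the early "return 0" in the patch.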
* [PATCH 28/31] libext2fs: use fallocate for creating journals and hugefiles
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (26 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 27/31] libext2fs: implement fallocate Darrick J. Wong
@ 2014-12-20 21:19 ` Darrick J. Wong
2014-12-20 21:20 ` [PATCH 29/31] debugfs: implement fallocate Darrick J. Wong
` (7 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:19 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Use the new fallocate API to allocate the journal and the files created
by the mk_hugefile feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
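For readers skimming the mkjournal.c hunk below: the journal is always allocated with initialized extents, and the blocks are zeroed unless lazy initialization was requested. A minimal standalone sketch of that flag selection follows. The SK_* values are placeholders chosen for illustration (the real EXT2_MKJOURNAL_LAZYINIT and EXT2_FALLOCATE_* constants are defined outside this excerpt), and sk_journal_falloc_flags() is a hypothetical helper, not a function in the patch.

```c
/* Placeholder bit values; the real constants live in the ext2fs headers. */
#define SK_MKJOURNAL_LAZYINIT		0x2
#define SK_FALLOCATE_ZERO_BLOCKS	0x1
#define SK_FALLOCATE_FORCE_INIT		0x2

/*
 * Translate journal-creation flags into fallocate flags the way
 * write_journal_inode() does in this patch: always force initialized
 * extents, and zero the blocks only when lazy init was not requested.
 */
static int sk_journal_falloc_flags(int mkjournal_flags)
{
	int falloc_flags = SK_FALLOCATE_FORCE_INIT;

	if (!(mkjournal_flags & SK_MKJOURNAL_LAZYINIT))
		falloc_flags |= SK_FALLOCATE_ZERO_BLOCKS;
	return falloc_flags;
}
```

In the patch, the journal superblock is still written to the first journal block explicitly after the allocation, so skipping the zeroing pass does not leave that block uninitialized.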
lib/ext2fs/mkjournal.c | 134 +++++----------------------------
misc/mk_hugefiles.c | 96 ++----------------------
tests/f_opt_extent/expect | 15 ----
tests/r_32to64bit_meta/expect | 4 -
tests/r_32to64bit_move_itable/expect | 4 -
tests/r_64to32bit/expect | 4 -
tests/r_64to32bit_meta/expect | 4 -
tests/t_disable_mcsum_noinitbg/expect | 6 +
tests/t_enable_mcsum/expect | 15 ----
tests/t_enable_mcsum_ext3/expect | 8 +-
tests/t_enable_mcsum_initbg/expect | 11 +--
tests/t_iexpand_full/expect | 4 -
12 files changed, 56 insertions(+), 249 deletions(-)
diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index c42cb98..02a65cb 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -227,89 +227,6 @@ errcode_t ext2fs_zero_blocks(ext2_filsys fs, blk_t blk, int num,
}
/*
- * Helper function for creating the journal using direct I/O routines
- */
-struct mkjournal_struct {
- int num_blocks;
- int newblocks;
- blk64_t goal;
- blk64_t blk_to_zero;
- int zero_count;
- int flags;
- char *buf;
- errcode_t err;
-};
-
-static int mkjournal_proc(ext2_filsys fs,
- blk64_t *blocknr,
- e2_blkcnt_t blockcnt,
- blk64_t ref_block EXT2FS_ATTR((unused)),
- int ref_offset EXT2FS_ATTR((unused)),
- void *priv_data)
-{
- struct mkjournal_struct *es = (struct mkjournal_struct *) priv_data;
- blk64_t new_blk;
- errcode_t retval;
-
- if (*blocknr) {
- es->goal = *blocknr;
- return 0;
- }
- if (blockcnt &&
- (EXT2FS_B2C(fs, es->goal) == EXT2FS_B2C(fs, es->goal+1)))
- new_blk = es->goal+1;
- else {
- es->goal &= ~EXT2FS_CLUSTER_MASK(fs);
- retval = ext2fs_new_block2(fs, es->goal, 0, &new_blk);
- if (retval) {
- es->err = retval;
- return BLOCK_ABORT;
- }
- ext2fs_block_alloc_stats2(fs, new_blk, +1);
- es->newblocks++;
- }
- if (blockcnt >= 0)
- es->num_blocks--;
-
- retval = 0;
- if (blockcnt <= 0)
- retval = io_channel_write_blk64(fs->io, new_blk, 1, es->buf);
- else if (!(es->flags & EXT2_MKJOURNAL_LAZYINIT)) {
- if (es->zero_count) {
- if ((es->blk_to_zero + es->zero_count == new_blk) &&
- (es->zero_count < 1024))
- es->zero_count++;
- else {
- retval = ext2fs_zero_blocks2(fs,
- es->blk_to_zero,
- es->zero_count,
- 0, 0);
- es->zero_count = 0;
- }
- }
- if (es->zero_count == 0) {
- es->blk_to_zero = new_blk;
- es->zero_count = 1;
- }
- }
-
- if (blockcnt == 0)
- memset(es->buf, 0, fs->blocksize);
-
- if (retval) {
- es->err = retval;
- return BLOCK_ABORT;
- }
- *blocknr = es->goal = new_blk;
-
- if (es->num_blocks == 0)
- return (BLOCK_CHANGED | BLOCK_ABORT);
- else
- return BLOCK_CHANGED;
-
-}
-
-/*
* Calculate the initial goal block to be roughly at the middle of the
* filesystem. Pick a group that has the largest number of free
* blocks.
@@ -350,7 +267,8 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
errcode_t retval;
struct ext2_inode inode;
unsigned long long inode_size;
- struct mkjournal_struct es;
+ int falloc_flags = EXT2_FALLOCATE_FORCE_INIT;
+ blk64_t zblk;
if ((retval = ext2fs_create_journal_superblock(fs, num_blocks, flags,
&buf)))
@@ -367,40 +285,16 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
goto out2;
}
- es.num_blocks = num_blocks;
- es.newblocks = 0;
- es.buf = buf;
- es.err = 0;
- es.flags = flags;
- es.zero_count = 0;
- es.goal = (goal != ~0ULL) ? goal : get_midpoint_journal_block(fs);
+ if (goal == ~0ULL)
+ goal = get_midpoint_journal_block(fs);
- if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
+ if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS)
inode.i_flags |= EXT4_EXTENTS_FL;
- if ((retval = ext2fs_write_inode(fs, journal_ino, &inode)))
- goto out2;
- }
- retval = ext2fs_block_iterate3(fs, journal_ino, BLOCK_FLAG_APPEND,
- 0, mkjournal_proc, &es);
- if (retval)
- goto out2;
- if (es.err) {
- retval = es.err;
- goto out2;
- }
- if (es.zero_count) {
- retval = ext2fs_zero_blocks2(fs, es.blk_to_zero,
- es.zero_count, 0, 0);
- if (retval)
- goto out2;
- }
-
- if ((retval = ext2fs_read_inode(fs, journal_ino, &inode)))
- goto out2;
+ if (!(flags & EXT2_MKJOURNAL_LAZYINIT))
+ falloc_flags |= EXT2_FALLOCATE_ZERO_BLOCKS;
inode_size = (unsigned long long)fs->blocksize * num_blocks;
- ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
inode.i_mtime = inode.i_ctime = fs->now ? fs->now : time(0);
inode.i_links_count = 1;
inode.i_mode = LINUX_S_IFREG | 0600;
@@ -408,9 +302,21 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino,
if (retval)
goto out2;
+ retval = ext2fs_fallocate(fs, falloc_flags, journal_ino,
+ &inode, goal, 0, num_blocks);
+ if (retval)
+ goto out2;
+
if ((retval = ext2fs_write_new_inode(fs, journal_ino, &inode)))
goto out2;
- retval = 0;
+
+ retval = ext2fs_bmap2(fs, journal_ino, &inode, NULL, 0, 0, NULL, &zblk);
+ if (retval)
+ goto out2;
+
+ retval = io_channel_write_blk64(fs->io, zblk, 1, buf);
+ if (retval)
+ goto out2;
memcpy(fs->super->s_jnl_blocks, inode.i_block, EXT2_N_BLOCKS*4);
fs->super->s_jnl_blocks[15] = inode.i_size_high;
diff --git a/misc/mk_hugefiles.c b/misc/mk_hugefiles.c
index e42c0b9..0978d55 100644
--- a/misc/mk_hugefiles.c
+++ b/misc/mk_hugefiles.c
@@ -258,12 +258,7 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
{
errcode_t retval;
- blk64_t lblk, bend = 0;
- __u64 size;
- blk64_t left;
- blk64_t count = 0;
struct ext2_inode inode;
- ext2_extent_handle_t handle;
retval = ext2fs_new_inode(fs, 0, LINUX_S_IFREG, NULL, ino);
if (retval)
@@ -283,85 +278,20 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
ext2fs_inode_alloc_stats2(fs, *ino, +1, 0);
- retval = ext2fs_extent_open2(fs, *ino, &inode, &handle);
+ if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+ EXT3_FEATURE_INCOMPAT_EXTENTS))
+ inode.i_flags |= EXT4_EXTENTS_FL;
+ retval = ext2fs_fallocate(fs,
+ EXT2_FALLOCATE_FORCE_INIT |
+ EXT2_FALLOCATE_ZERO_BLOCKS,
+ *ino, &inode, ~0ULL, 0, num);
if (retval)
return retval;
-
- lblk = 0;
- left = num ? num : 1;
- while (left) {
- blk64_t pblk, end;
- blk64_t n = left;
-
- retval = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
- goal, ext2fs_blocks_count(fs->super) - 1, &end);
- if (retval)
- goto errout;
- goal = end;
-
- retval = ext2fs_find_first_set_block_bitmap2(fs->block_map, goal,
- ext2fs_blocks_count(fs->super) - 1, &bend);
- if (retval == ENOENT) {
- bend = ext2fs_blocks_count(fs->super);
- if (num == 0)
- left = 0;
- }
- if (!num || bend - goal < left)
- n = bend - goal;
- pblk = goal;
- if (num)
- left -= n;
- goal += n;
- count += n;
- ext2fs_block_alloc_stats_range(fs, pblk, n, +1);
-
- if (zero_hugefile) {
- blk64_t ret_blk;
- retval = ext2fs_zero_blocks2(fs, pblk, n,
- &ret_blk, NULL);
-
- if (retval)
- com_err(program_name, retval,
- _("while zeroing block %llu "
- "for hugefile"), ret_blk);
- }
-
- while (n) {
- blk64_t l = n;
- struct ext2fs_extent newextent;
-
- if (l > EXT_INIT_MAX_LEN)
- l = EXT_INIT_MAX_LEN;
-
- newextent.e_len = l;
- newextent.e_pblk = pblk;
- newextent.e_lblk = lblk;
- newextent.e_flags = 0;
-
- retval = ext2fs_extent_insert(handle,
- EXT2_EXTENT_INSERT_AFTER, &newextent);
- if (retval)
- return retval;
- pblk += l;
- lblk += l;
- n -= l;
- }
- }
-
- retval = ext2fs_read_inode(fs, *ino, &inode);
- if (retval)
- goto errout;
-
- retval = ext2fs_iblk_add_blocks(fs, &inode,
- count / EXT2FS_CLUSTER_RATIO(fs));
- if (retval)
- goto errout;
- size = (__u64) count * fs->blocksize;
- retval = ext2fs_inode_size_set(fs, &inode, size);
+ retval = ext2fs_inode_size_set(fs, &inode, num * fs->blocksize);
if (retval)
- goto errout;
+ return retval;
- retval = ext2fs_write_new_inode(fs, *ino, &inode);
+ retval = ext2fs_write_inode(fs, *ino, &inode);
if (retval)
goto errout;
@@ -379,13 +309,7 @@ retry:
goto retry;
}
- if (retval)
- goto errout;
-
errout:
- if (handle)
- ext2fs_extent_free(handle);
-
return retval;
}
diff --git a/tests/f_opt_extent/expect b/tests/f_opt_extent/expect
index 6d4863b..f4ed7ff 100644
--- a/tests/f_opt_extent/expect
+++ b/tests/f_opt_extent/expect
@@ -30,22 +30,11 @@ Change in FS metadata:
Inode count: 65536
Block count: 524288
Reserved block count: 26214
--Free blocks: 570
-+Free blocks: 567
+-Free blocks: 569
++Free blocks: 566
Free inodes: 65047
First block: 1
Block size: 1024
-@@ -47,8 +47,8 @@
- Block bitmap at 262 (+261)
- Inode bitmap at 278 (+277)
- Inode table at 294-549 (+293)
-- 21 free blocks, 535 free inodes, 3 directories, 535 unused inodes
-- Free blocks: 4414-4434
-+ 18 free blocks, 535 free inodes, 3 directories, 535 unused inodes
-+ Free blocks: 4417-4434
- Free inodes: 490-1024
- Group 1: (Blocks 8193-16384) [INODE_UNINIT]
- Backup superblock at 8193, Group descriptors at 8194-8197
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
diff --git a/tests/r_32to64bit_meta/expect b/tests/r_32to64bit_meta/expect
index 0eacd45..8796503 100644
--- a/tests/r_32to64bit_meta/expect
+++ b/tests/r_32to64bit_meta/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
Inode count: 65536
Block count: 524288
Reserved block count: 26214
--Free blocks: 858
-+Free blocks: 852
+-Free blocks: 857
++Free blocks: 851
Free inodes: 65046
First block: 1
Block size: 1024
diff --git a/tests/r_32to64bit_move_itable/expect b/tests/r_32to64bit_move_itable/expect
index b51663d..999bb8d 100644
--- a/tests/r_32to64bit_move_itable/expect
+++ b/tests/r_32to64bit_move_itable/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
Inode count: 98304
Block count: 786432
Reserved block count: 39321
--Free blocks: 764
-+Free blocks: 734
+-Free blocks: 763
++Free blocks: 733
Free inodes: 97566
First block: 1
Block size: 1024
diff --git a/tests/r_64to32bit/expect b/tests/r_64to32bit/expect
index 13e94a2..5d2ea4b 100644
--- a/tests/r_64to32bit/expect
+++ b/tests/r_64to32bit/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
Inode count: 65536
Block count: 524288
Reserved block count: 26214
--Free blocks: 571
-+Free blocks: 589
+-Free blocks: 570
++Free blocks: 588
Free inodes: 65048
First block: 1
Block size: 1024
diff --git a/tests/r_64to32bit_meta/expect b/tests/r_64to32bit_meta/expect
index d6e2dcc..1400c6b 100644
--- a/tests/r_64to32bit_meta/expect
+++ b/tests/r_64to32bit_meta/expect
@@ -35,8 +35,8 @@ Change in FS metadata:
Inode count: 65536
Block count: 524288
Reserved block count: 26214
--Free blocks: 852
-+Free blocks: 858
+-Free blocks: 851
++Free blocks: 857
Free inodes: 65046
First block: 1
Block size: 1024
diff --git a/tests/t_disable_mcsum_noinitbg/expect b/tests/t_disable_mcsum_noinitbg/expect
index a022631..09e4ff1 100644
--- a/tests/t_disable_mcsum_noinitbg/expect
+++ b/tests/t_disable_mcsum_noinitbg/expect
@@ -40,9 +40,9 @@ Change in FS metadata:
Block bitmap at 262 (+261)
Inode bitmap at 278 (+277)
Inode table at 294-549 (+293)
-- 21 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-+ 21 free blocks, 536 free inodes, 2 directories
- Free blocks: 4413-4433
+- 0 free blocks, 536 free inodes, 2 directories, 536 unused inodes
++ 0 free blocks, 536 free inodes, 2 directories
+ Free blocks:
Free inodes: 489-1024
-Group 1: (Blocks 8193-16384) [INODE_UNINIT]
+Group 1: (Blocks 8193-16384)
diff --git a/tests/t_enable_mcsum/expect b/tests/t_enable_mcsum/expect
index 2ee3c27..81e1125 100644
--- a/tests/t_enable_mcsum/expect
+++ b/tests/t_enable_mcsum/expect
@@ -45,8 +45,8 @@ Change in FS metadata:
Inode count: 65536
Block count: 524288
Reserved block count: 26214
--Free blocks: 571
-+Free blocks: 568
+-Free blocks: 570
++Free blocks: 567
Free inodes: 65048
First block: 1
Block size: 1024
@@ -58,17 +58,6 @@ Change in FS metadata:
Journal features: (none)
Journal size: 16M
Journal length: 16384
-@@ -46,8 +47,8 @@
- Block bitmap at 262 (+261)
- Inode bitmap at 278 (+277)
- Inode table at 294-549 (+293)
-- 21 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-- Free blocks: 4413-4433
-+ 18 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-+ Free blocks: 4413, 4417-4433
- Free inodes: 489-1024
- Group 1: (Blocks 8193-16384) [INODE_UNINIT]
- Backup superblock at 8193, Group descriptors at 8194-8197
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
diff --git a/tests/t_enable_mcsum_ext3/expect b/tests/t_enable_mcsum_ext3/expect
index 5460482..0f761a9 100644
--- a/tests/t_enable_mcsum_ext3/expect
+++ b/tests/t_enable_mcsum_ext3/expect
@@ -49,8 +49,8 @@ Change in FS metadata:
Reserved GDT blocks at 4-259
Block bitmap at 260 (+259)
@@ -45,7 +46,7 @@
- 7789 free blocks, 1013 free inodes, 2 directories
- Free blocks: 404-8192
+ 0 free blocks, 1013 free inodes, 2 directories
+ Free blocks:
Free inodes: 12-1024
-Group 1: (Blocks 8193-16384)
+Group 1: (Blocks 8193-16384) [ITABLE_ZEROED]
@@ -58,8 +58,8 @@ Change in FS metadata:
Reserved GDT blocks at 8196-8451
Block bitmap at 8452 (+259)
@@ -54,6 +55,6 @@
- 7803 free blocks, 1024 free inodes, 0 directories
- Free blocks: 8582-16384
+ 0 free blocks, 1024 free inodes, 0 directories
+ Free blocks:
Free inodes: 1025-2048
-Group 2: (Blocks 16385-24576)
+Group 2: (Blocks 16385-24576) [ITABLE_ZEROED]
diff --git a/tests/t_enable_mcsum_initbg/expect b/tests/t_enable_mcsum_initbg/expect
index d3b4444..3cbb98f 100644
--- a/tests/t_enable_mcsum_initbg/expect
+++ b/tests/t_enable_mcsum_initbg/expect
@@ -45,8 +45,8 @@ Change in FS metadata:
Inode count: 65536
Block count: 524288
Reserved block count: 26214
--Free blocks: 571
-+Free blocks: 568
+-Free blocks: 570
++Free blocks: 567
Free inodes: 65048
First block: 1
Block size: 1024
@@ -69,10 +69,9 @@ Change in FS metadata:
Block bitmap at 262 (+261)
Inode bitmap at 278 (+277)
Inode table at 294-549 (+293)
-- 21 free blocks, 536 free inodes, 2 directories
-- Free blocks: 4413-4433
-+ 18 free blocks, 536 free inodes, 2 directories, 536 unused inodes
-+ Free blocks: 4413, 4417-4433
+- 0 free blocks, 536 free inodes, 2 directories
++ 0 free blocks, 536 free inodes, 2 directories, 536 unused inodes
+ Free blocks:
Free inodes: 489-1024
-Group 1: (Blocks 8193-16384)
+Group 1: (Blocks 8193-16384) [INODE_UNINIT, ITABLE_ZEROED]
diff --git a/tests/t_iexpand_full/expect b/tests/t_iexpand_full/expect
index 3eb1715..0474827 100644
--- a/tests/t_iexpand_full/expect
+++ b/tests/t_iexpand_full/expect
@@ -21,8 +21,8 @@ Setting inode size 256
Exit status is 0
Change in FS metadata:
@@ -13 +13 @@
--Free blocks: 12301
-+Free blocks: 12
+-Free blocks: 12299
++Free blocks: 10
@@ -22 +22 @@
-Inode blocks per group: 128
+Inode blocks per group: 256
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 29/31] debugfs: implement fallocate
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (27 preceding siblings ...)
2014-12-20 21:19 ` [PATCH 28/31] libext2fs: use fallocate for creating journals and hugefiles Darrick J. Wong
@ 2014-12-20 21:20 ` Darrick J. Wong
2014-12-20 21:20 ` [PATCH 30/31] tests: test debugfs punch command Darrick J. Wong
` (6 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:20 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Implement a fallocate function for debugfs, and add some tests to
demonstrate that it works (more or less).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
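A note on the range arithmetic in do_fallocate() below: the command takes an inclusive [start_block, end_block] pair and passes end - start + 1 as the length. The sketch here only illustrates that computation. It assumes blk64_t is an unsigned 64-bit type, and the names blk64_sketch_t, inclusive_range_len and range_len_selfcheck are made up for illustration and are not part of the patch. How the library treats a length that has wrapped to zero is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for libext2fs' blk64_t; assumed to be unsigned 64-bit. */
typedef uint64_t blk64_sketch_t;

/*
 * Length of the inclusive logical block range [start, end], computed
 * the same way do_fallocate() does: end - start + 1, unsigned.
 */
blk64_sketch_t inclusive_range_len(blk64_sketch_t start, blk64_sketch_t end)
{
	return end - start + 1;
}

/* Hypothetical self-check using ranges taken from the test scripts. */
void range_len_selfcheck(void)
{
	/* "fallocate /a 0 39" covers 40 blocks. */
	assert(inclusive_range_len(0, 39) == 40);
	/* "fallocate /e 19 20" covers 2 blocks. */
	assert(inclusive_range_len(19, 20) == 2);
	/* No end_block: end is all ones, so start 0 wraps the length to 0. */
	assert(inclusive_range_len(0, UINT64_MAX) == 0);
	/* A non-zero start does not wrap. */
	assert(inclusive_range_len(8, UINT64_MAX) == UINT64_MAX - 7);
}
```

The third assertion is the case worth keeping in mind when reading the "else end = ~0" branch: with start_block 0 and no end_block, the computed length is 0 rather than the maximum.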
debugfs/debug_cmds.ct | 3 +
debugfs/debugfs.8.in | 7 +
debugfs/debugfs.c | 36 +++++++
debugfs/debugfs.h | 1
tests/d_fallocate/expect.gz | Bin
tests/d_fallocate/name | 1
tests/d_fallocate/script | 175 ++++++++++++++++++++++++++++++++++
tests/d_fallocate_bigalloc/expect.gz | Bin
tests/d_fallocate_bigalloc/name | 1
tests/d_fallocate_bigalloc/script | 176 ++++++++++++++++++++++++++++++++++
tests/d_fallocate_blkmap/expect | 58 +++++++++++
tests/d_fallocate_blkmap/name | 1
tests/d_fallocate_blkmap/script | 85 ++++++++++++++++
13 files changed, 544 insertions(+)
create mode 100644 tests/d_fallocate/expect.gz
create mode 100644 tests/d_fallocate/name
create mode 100644 tests/d_fallocate/script
create mode 100644 tests/d_fallocate_bigalloc/expect.gz
create mode 100644 tests/d_fallocate_bigalloc/name
create mode 100644 tests/d_fallocate_bigalloc/script
create mode 100644 tests/d_fallocate_blkmap/expect
create mode 100644 tests/d_fallocate_blkmap/name
create mode 100644 tests/d_fallocate_blkmap/script
diff --git a/debugfs/debug_cmds.ct b/debugfs/debug_cmds.ct
index c6f6d6c..34dad9e 100644
--- a/debugfs/debug_cmds.ct
+++ b/debugfs/debug_cmds.ct
@@ -157,6 +157,9 @@ request do_dirsearch, "Search a directory for a particular filename",
request do_bmap, "Calculate the logical->physical block mapping for an inode",
bmap;
+request do_fallocate, "Allocate uninitialized blocks to an inode",
+ fallocate;
+
request do_punch, "Punch (or truncate) blocks from an inode by deallocating them",
punch, truncate;
diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index 81899a3..1342609 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -306,6 +306,13 @@ from the file \fIfilespec\fR.
Expand the directory
.IR filespec .
.TP
+.BI fallocate " filespec start_block [end_block]"
+Allocate and map uninitialized blocks into \fIfilespec\fR between
+logical block \fIstart_block\fR and \fIend_block\fR, inclusive. If
+\fIend_block\fR is not supplied, this function maps until it runs out
+of free disk blocks or the maximum file size is reached. Existing
+mappings are left alone.
+.TP
.BI feature " [fs_feature] [-fs_feature] ..."
Set or clear various filesystem features in the superblock. After setting
or clearing any filesystem features that were requested, print the current
diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 7576a1a..385ebbf 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -2199,6 +2199,42 @@ void do_punch(int argc, char *argv[])
return;
}
}
+
+void do_fallocate(int argc, char *argv[])
+{
+ ext2_ino_t ino;
+ blk64_t start, end;
+ int err;
+ errcode_t errcode;
+
+ if (common_args_process(argc, argv, 3, 4, argv[0],
+ "<file> start_blk [end_blk]",
+ CHECK_FS_RW | CHECK_FS_BITMAPS))
+ return;
+
+ ino = string_to_inode(argv[1]);
+ if (!ino)
+ return;
+ err = strtoblk(argv[0], argv[2], "logical block", &start);
+ if (err)
+ return;
+ if (argc == 4) {
+ err = strtoblk(argv[0], argv[3], "logical block", &end);
+ if (err)
+ return;
+ } else
+ end = ~0;
+
+ errcode = ext2fs_fallocate(current_fs, EXT2_FALLOCATE_INIT_BEYOND_EOF,
+ ino, NULL, ~0ULL, start, end - start + 1);
+
+ if (errcode) {
+ com_err(argv[0], errcode,
+ "while fallocating inode %u from %llu to %llu\n", ino,
+ (unsigned long long) start, (unsigned long long) end);
+ return;
+ }
+}
#endif /* READ_ONLY */
void do_symlink(int argc, char *argv[])
diff --git a/debugfs/debugfs.h b/debugfs/debugfs.h
index e163d0a..76bb22c 100644
--- a/debugfs/debugfs.h
+++ b/debugfs/debugfs.h
@@ -166,6 +166,7 @@ extern void do_imap(int argc, char **argv);
extern void do_set_current_time(int argc, char **argv);
extern void do_supported_features(int argc, char **argv);
extern void do_punch(int argc, char **argv);
+extern void do_fallocate(int argc, char **argv);
extern void do_symlink(int argc, char **argv);
extern void do_dump_mmp(int argc, char **argv);
diff --git a/tests/d_fallocate/expect.gz b/tests/d_fallocate/expect.gz
new file mode 100644
index 0000000000000000000000000000000000000000..3e6ffc38595b7c451e3de0df3a16f795cf33421b
GIT binary patch
literal 3770
zcmZ9Nc{~$-{KqA}=E^ZmjxiFcMP=1u5+X-hxpK>W6gKA^k(*pYteD9aQf!j@-ZF9@
zIa|z%<Svs4zxDn8{`ft9|GXaW_xt(29`DEJ@%Tt50odT_7N+b+X*TO=5_LlRIp3=+
zUYwp2x!ot`yF<Qymm`s%12x3vEwp8VZp|Eg<<)~-o3D>}6x37KoE6w$b-7eoz5PW1
zRlzvTCtl@Ca_ekW)!OBmnT6E_^6ti0RNcX3*ym3z4_+tK^0)YW4x6-X(}3NZ$?Yn;
zpW7c8^!D0oyV(e<b;fMYm3tHG>l)x3GR?cK)g#dQKH*A*28L(XgRF}W72n*d5h_i2
zZ9e<RJ+!35=U3AnvY+-bH}KeC?)G9uRcqsv*F=Bc;CI}<d=w+*;NFHzMDmsp7tF@i
zetoU(#QduFAt`U!TTRm8-R|%2DEDu{KlhuM%ss?&vdQ@kSnHz0&-`rSWEPjL&*eR4
zfZ-G_yMBM4N!t(4Q(fx0J<+sUyT86L(NDvT4f5@-PWVjhF}Fi#6n!VS*p{B8^qNL^
zv*)bWMxuGpGMU!4(UE`U#^Agv)gFc%tNV`Z53JjN2|o1p3vFxC2%{;vH$@F*+!#@?
zZQ1ein+V5NH0^1>y$bUTcpL5)v9~=G=Kb8RG-x^ESF`2eIHJj_#p<x}V5oWW^Xiq=
zJ4=fn*8=?>F=bkIe$#N;VNB1K7B8LOKgb_f9X^)1rFb413{FRo@;zIZ#KF_{?jy9b
zBb}Q*x|a?oy=!xu7{dbhq%|H{Xmp9(nn*bxTkCbB<Q^$SV+>h9MR$43`L9$(j&kDH
z&<}x?UNZ+vW)Dlz@(gyA%(g&#Ebx<%na`i1;W12@jrVioSD_zkpPIi)e+wf~4piyW
z_jVYTIbkb4kv}~UkFsBy-?cr&qEv@(aOQL}hDG9FVtfLcb+*~VaxZ$!N1Qb?-xT*a
z61)lW#%YIRI)o?#2A5J_Q@e>CBK;L3@hN9zs)BK%LZJ#`*aah-hmp0V0zEdI&jeqb
z>oNVoLCeI`6O|p_&!E8=-^Dj@fnZ-eN(%8j77@Jh<!tbeIYm5O-#$*^B?YKFyDE$1
zQyIu|#XEF0j3le{!SgRLoRuh<j}^+la0lYguENQwekEYHN%S>0>Q^m9M{3Zj?|S*N
z0HM=zJ<++_1e-8QWH1&3auoR0Di>FGE@==5IR)SeJZ6N*sA*5loz&PYHVPTs?gwF>
z+eYx(q~NBQTu9gP-e%VZK_m##@D-uIB0%`_V^Qe`huY_NX8+EBBDXIh#Ii6KO{tlY
z;43!&N5kS2f@BDMm<$8LeX3C`IEbWysa7(|5&gIPE#f!t=9dnWGS>PwXZF#N8cMzU
zo86i1ti_>7`b7lyO8qxg0>^(4ENapNT&c&C7*@6k3X|ya8V0+XQftOvbAxKe+AHKR
zmiO<StBH)-kyPtRwDnTjU!|v+tmI>=b8e$<B4`LHL)Y$%3m#hsrl+9J+~%LmLl_{I
zVi;(I1$ub|lalMa6qM&YvT;6ZteyI<ML8;edulCYn@#NMI%uE?thU~c?&8l2;w|!_
z4yd7286ibay9~GB0lP#de3X%j7l`jKd{9cwOqLMWEZyC7O_`gyX`Om7ye2oV4Yx~;
zi|Z|5DniQKd5x`Yj?JBx%02tlF~Y#jgI2NCyXor>H;AvNZAPcKWx=DDe`TL1Npjy*
zqR;!D-r?a1IeCfYQriRSOr8*fgmI`^s>j?dii5I@b!UT=TL?ZWQy$NKHxikTp3_m4
zB+I`&lOh^|->N^61f)NVE&XcriuxWYlAlniF&=5;fGCt&eC#JOD1M&sw2P6<i1{fL
z&t#13-W^X`M=QEgF$FNY@uVy5&nIWu++I#ZR@*hRi8`r9TS#9UJ%M8i2OU-hklKz8
zh%P*5&XdAr=1~hpB^1MP!*6Is5Czi;v#Sy{K*9q`KX^qvvj_7q&{d+ca|w>nuYxAW
zyPQ7M*Mw7q;@>(K88Vn5GHd7ymx)rO#C$uhaWP!w&&JX3Ycdo}C94b0--=c%zncm#
z5;;Q;X~7=3EqiQPp@hk>Q<iJDd?KLS!NoTkhz)JQaw;o%N2}(6t&5O@u8)_QT1x5y
zv~u(!xV`DI-9L8RC`;wfM?_nM`$n?~8hK3Nmg!o4kEXa^B7c@Xf?4p$Prhs75}n(M
zoHpZzdW?YVx6A8~&wexc6+(w=O`lOK0!<8-J{xKfl&@R1KE&+38(cF{3n+B@l^DNw
z>7go{5eJ1<^FX^Jk1llu)zlbi+NqB35D4;w002+a#Q;hx&hiZSUcYU`QM{ts?RIrI
zCfRUij4&|d8q>4*)E23qa80TpQC*d8Q62x88PwkJLyG@Xv|)F>)a}(JK5gClu#4nP
z+20a1qfLn);Q3CGwQXIVOHuc9Ex|_@GtokM)0tw%<FV%2M?4JxMYmTbNC#xbk_KNq
z4!A!Wb3ndx_)k8{zYC>%$gb-`I^f*J8SL|JDQE49ezONH+rNmgE|1{dpE20C{x+*L
z-WGe{RdN6}?=|~As|5}u9e}si7nz59zQP8ZDvOwxf*N}y?9noACmBE8p*z?k(xjSL
zYWGwcrx>LBJgB$_pdw<fH!_$gj5w~cmrgV;yy2-(ED$3Bh<SRnkVp#};qpF>eQK3H
zJ(?bt1{Vf6FpATiFtUP;$=7$=14!2ap<GKP>@q7yuFg6YT+iUjn|5KB;g5V~5P1SX
zfr1huuiq-UPHe+kLIqc#a&8=*qQVJpjsudHAzm(85c0Wmr(PG=hHYP#5#)-jf6tB<
zOsEn`h`;{FF%koy+~c8WL&8bd?~@eqLJ6&`kta){u7A>l@MThA=LD}<xyP*!09%^|
z66jhgg)#UUm0;+yE^q%vM{p+>F8PEKj6AbA(xW4iAaM>x#x0PP!6kGSCpNzxgw`|+
zY2{AG#w|HoHV>hUEkVk#Pjo!K#=n66u-cXtdf=xzIuaW<+8wy5MUJXgdpqP?5WD@3
zmmf%RU`2j2&esi*%Vb^j(Gu0!h3QMN<EVogtHvSmfqe4--Z%6=9`F1ME<p<jK?k2n
z!CT9CSnIW8Y5kPXgepJT#~0}H0bIyk4)K?U%RL)A6p!M`%*bclX$oQ7{G!AMy}vx2
z?6t^KV^+AJouUnk2Y`{j7%H$eChl#y0^Uet3Fr<_iRP#^iR_N$n0MtA?qVldxT`5(
z&w~KB2!W&Q+)k109%J6OAr4T%`p34`AiWd390M=^BX9s^kXtT!01309R-Pl|vjxdP
z;pNjvmW5_nxzWjOCJs}*fZjVLL}TCk6XV>y?G$mVGtlU05AA@?i>RyoA6cP>OEsnz
z|8R}QEyuXby%MhkW=EPl#FP{H@H3iEzLXO`#m1cS>x=@V#<)NPRZhN+Q3HYS)+X>x
zGbaZ+WulL;^NK{Ao6ng7+FxQ3kN;1^bN`p1{*oe~eKgQsC3L=`D=G$fty@R^cySvR
zLVgYLfa(c2{JyF=Vu3|<8a)HarTqK674*b;g%^qPecTzWicz1=n!lOw{lWzf{gawJ
zHkiA9QN`&H&W@-u56uxhY!vS(eoIYJPux}$uzjka;VeL$+fvY(FC!KiE2(tSc{oc8
z#KGGc@K+0QJeGpcgi2sF#6r+k!C8|C2`R_2T0Jw6Vc;7cRKz4hwEG-oBmh&UJYst<
z^h&ToV|o|LhQDiR5%@jlm|$)(&6GaQOF&#_)C=8DBZdx|cTQHW8CwXBB~&ztfH*GA
zXWQ}cMu~3*{F@O*D<MM4gxa=@Ef6s)%Sy4Fd!;drtSZ4%;XNg&F;4GWr-p2ft*aT~
zg4v*E*VsTEII{qO$M5CcMBO+oIqYjMp6!<1sm`+0WD#5Zac7~hsSmFqNprpCu77zT
z+8nE6j(yX|Q#~2v<-&4^ZlW%n@6EbWh+rB(TAeMIn%O4y;Si?SeY<;>@2d}_GQ6Ui
z_iNglVWrzV)c_$c7h|kWg8SLbD5%zrCb(j^?2Vc%eNB)|;hS53`RaTzBuQ}CT%OZ%
zWxVzxi`Ud7Wkkt}>hXNfa7|<S%N<78F9eoW7suT-CX+LArX0m}FYpT_dARnvpKyWM
zSn85b7ttNW`X3(QFM1po&k3W}v_*U|y5ho~|8&5g^TN#%l`vyf?bCS$4oiJ^hxzhF
z`URi3?%W#YlRn;0X+ATm;4z=LX`-W&ESqZz1FF=A<$p5o)K7B~L4JF-9@&nqH!XGw
zmCX3oBzEzMYh!JUpZdUyZgTM-VL3=Ys9#R93;5(YfBm>=$);8sFj+y-HWFKT{vIq7
zHTT}P`92j>GJmR2%IyO3!8{n=m3uBnA1+c<4@cr*B;^zwGs-#Q>xdcjKk)7Sd$l0E
zM87Dj?Gu7cGge2?Mk%}}1bswDMJRs1&gM@Bb3W9j5wuwkQe1^tDr<f%6w}@0#U<I7
zP4sFH-$FD*lU)stNmL}C6zRVqDOu~gY54c|E+k!=T>78`ZLw>V&5cP)Kju#pB_$^r
zgtQdPbyOPN<Y?FDcDr~?M@qgc9G>5&0}X%WE<b-q?xwIho8;gZSy0ZIoJ6=tQ-HoV
zEu2rfMA?yEUidsgd)UhD?g<^Jjgq57KKc>KXv=E@A(CVPUw+TjQ4y(A>I9jeaqJ1v
zBuNB=_tfSLbfA{X<&#E@{(-tehQGi$H2I2_0M_j#_cTP>YRsYFkl@7t|7*AGejO%N
zy$NfuuiEobNcpz#GkoGv=90I-n6%kXN=OmmWm}ccfn!UpeMD18!|&Cqe%$uxxnQ8@
zY|Ii}ow74sH@A>zvA+S5FVpeaq@1XYUXmEKBAoovg9|!n*z>SE*!iyWb2q>0@L;}l
y0k>{<MPq+8UfTX(XFh&w5ttwuF<UycB^qw5mO16~r}$&Uk<ipHmv5P~vHcHq|7W-W
literal 0
HcmV?d00001
diff --git a/tests/d_fallocate/name b/tests/d_fallocate/name
new file mode 100644
index 0000000..72d0ed3
--- /dev/null
+++ b/tests/d_fallocate/name
@@ -0,0 +1 @@
+fallocate sparse files and big files
diff --git a/tests/d_fallocate/script b/tests/d_fallocate/script
new file mode 100644
index 0000000..ae8956e
--- /dev/null
+++ b/tests/d_fallocate/script
@@ -0,0 +1,175 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+ EXP=$test_name.tmp
+ gunzip < $test_dir/expect.gz > $EXP
+else
+ EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+ base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+ blocksize = 1024
+ inode_size = 256
+ inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O ^bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+ name="$1"
+ start="$2"
+ flag="$3"
+
+ cat << ENDL
+write /dev/null $name
+sif /$name size 40960
+eo /$name
+set_bmap $flag 10 $((start + 10))
+set_bmap $flag 13 $((start + 13))
+set_bmap $flag 26 $((start + 26))
+set_bmap $flag 29 $((start + 29))
+ec
+sif /$name blocks 8
+setb $((start + 10))
+setb $((start + 13))
+setb $((start + 26))
+setb $((start + 29))
+ENDL
+}
+
+#Files we create:
+# a: fallocate a 40k file
+# b*: falloc sparse file starting at b*
+# c*: falloc sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+# g*: falloc sparse init file starting at g*
+# h*: falloc sparse init file ending at h*
+# i: midcluster to midcluster, surrounding sparse init
+# j: partial middle cluster alloc
+# k: one big init file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+sif /a size 40960
+fallocate /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+ make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+ echo "fallocate /b$i $i 39" >> $TMPFILE.cmd
+ echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+ make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+ echo "fallocate /c$i 0 $i" >> $TMPFILE.cmd
+ echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "fallocate /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "fallocate /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 2
+setb 9000
+fallocate /f 0 8999
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+# Now do it again, but with initialized blocks
+base=20000
+for i in 8 9 10 11 12 13 14 15; do
+ make_file g$i $(($base + (40 * ($i - 8)))) >> $TMPFILE.cmd
+ echo "fallocate /g$i $i 39" >> $TMPFILE.cmd
+ echo "ex /g$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+ make_file h$i $(($base + 320 + (40 * ($i - 24)))) >> $TMPFILE.cmd
+ echo "fallocate /h$i 0 $i" >> $TMPFILE.cmd
+ echo "ex /h$i" >> $TMPFILE.cmd2
+done
+
+make_file i $(($base + 640)) >> $TMPFILE.cmd
+echo "fallocate /i 4 35" >> $TMPFILE.cmd
+echo "ex /i" >> $TMPFILE.cmd2
+
+make_file j $(($base + 680)) >> $TMPFILE.cmd
+echo "fallocate /j 19 20" >> $TMPFILE.cmd
+echo "ex /j" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null k
+sif /k size 1024
+eo /k
+set_bmap 0 19000
+ec
+sif /k blocks 2
+setb 19000
+fallocate /k 0 8999
+sif /k size 9216000
+ENDL
+echo "ex /k" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+ rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+ echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/d_fallocate_bigalloc/expect.gz b/tests/d_fallocate_bigalloc/expect.gz
new file mode 100644
index 0000000000000000000000000000000000000000..8640bc29dc71b09810f110d07d84d883b1109b88
GIT binary patch
literal 2673
zcmV-%3Xb(3iwFQXkyum$1KnF)Z``;QeGTSU5GV@l7Fk=o<fTLdMX}vYfB;!^fwTzv
zR7`BeR&CE1GviItUtftl8dDm5kloQs2MB0octq(O>GE7&&g$cOx2ZO_SK@G2ici(;
z`r%=FEk2j)!}G2b)n4om)x(2$-rSTsaZ~<QK5U;J%gtf+p)AGS;qbJ-y1XpE9`wbR
z>b|<TFL#^r;bObHy{u};*NdmSr^~+{PrKcT$8B@oP5tNN`cQ2*tJP1tdbIi)#OLav
z+<)62%17~~I@}4vtc;NN;!_R2-;4F;M(E6%8os?L_p8;9%~?J6cDH?g5{LDt`V?29
z{=9ni>Snts#l&azaaTF8JKz5b-akK;yWk3gyI5aeZ=W|9;&{j52&>hv>-}EHEAi7^
zd3}GpnEh+q^lNv{ef6LEeqC1zy53Z~^7^pdeXCX4JzqDH`A%xE)Ae?<ssF5gs}A3M
zKOOYjm7mL9xw$TdyJ!r@K@ZRI{`v87U7sz~@KEj#|H6%52`Mj~OzuC#clJX5QEaxG
zAL{WB)$Q~4dH?<Ex|7zIUSWTyt2bY(LwzIllecdkyliz-etN$Bycb_;rb~ae_W!O#
z`Bhx5SMQrAQoPw5c2zx;|GeMcR@dtXfAH(wx4k=hUv6&eXX@?4`gXq(*`>(pZ{=nE
z&-Kg14^5-2a97qGNm(fOozBER>X)QCbVuK>AD<q|b6kRMx=CC@-3re9k!#&YwwKW2
z=!jFmDc4^Cr$%t9#gXy2x~=>BBi9b&^bu!)v#!4d&h~H}sy-EGqe&FFU8<SNvgW96
z>m&E!S(-?>{>tOX(<CQo(!dq2zXq-t8SjLq`iW}#TaQz=H(r+}dp??MNR!6!44QN#
zO$?e$B25M~nM9flt}~yeNoMDN>l{xu^W$6aq#}8;o+p{5@gy@1TpCX@^E^p8!;@Tw
zC#gJ7Ql;@E6?oE#JZbQx6M4dOFL{P1S|W<q<L2NgpO4}c@|2UuVFIeZjN_>QPeme6
z1$eSEJgH@PBI$Xe_}R|!L<r-Q7{&=<oD#!0A&k>Z7?*k>o)|oFB2S3plo-beahww4
zI3bSHTpZ`$!;=gRoXo=Ui>%|-Rkmt+wg5gQ7J<KMd&B}qN8F3uYWjm%t446@h-2W`
z^=IIC1gDQU6AEID!zoM{QuT4%;yQvTOc-i<!nh`VHbF!vh-!@oK6*e95za7*ydVlQ
z%;}Au2;!b}g&9Vj&M<vJj!(LF{WWlUPLRMiH~k@qHY2zbLC6t;nW<?feDgK&`5a+~
zUju&g6MiQniORum5g$n$PnelhVMZLoP=3+$2T#QePi7gObmn=|GL0vVnMqR`PZ~3m
zrZk>3W+qKDJmm}Vq`?y>@`Ra5C(KN_>kpnxK9bRl@PwJEX(xFq7UHP@Pd1S!Of;OB
zVK6cbj?XY0PmBqpX(xHI%kadA|E$t@V#I$=%rF@7pLHTnh~qpH$5plvPl)537{?iL
zoQp)B5XU(o!^m8J@YEVC8sSMU!;?WAH;HlFAdZ{F48tIfn}iHQy8fOgqhv4scX+xv
zA4@u932%XE?V62b2~pf6<`)K0+$7`|?D~VH9QHXMWT{*xyD+KQg+X><60-|~pl>3w
zi#90v@jN|w)_>Wi&1{|-;L^-&p0(pm2Eor~&n=oqq_DP|TL>A-Ax#HRy9JxMSkSca
z4#j{Rz`4a{>3Z7ZVza|F?SLE&PWqm1;mNJOVpUO(ji+0jPxhJx26qzRiZL9wIR92P
z`CW4o%$<_PaHhp60?w@hVJ?wo441b!cSlXzcrik3rBObMBM!W<OM<}*jq+k!9D^6*
zauDlR#}rnpPH-}7aR$8P4KG@GoK|Bv+2V5Wk~`0c)i|Ax;i$zG;H7AIVd-%!$8g8z
zr2sFM9XY`981DGISkFtd?%d3+z>V_KJ}-XVxo)MKbHIxk<)wXIq+g}1Upe>M`}-^W
zczID2gRlK<alZGyIELZ+q04$<m^xwD^om7<ViASKqD}xdodC?Prvif8*|-yWO=p2d
zxt_)&;1>qpb&m^XLPBAQ!opG~=$f8WAe5Bq_QffL@3%)N3JG&ZbSy%N5xwmB^s?db
zx~G>M(94E;Y24vU)oM;JhF%=47eg<TsF#6WCR#6pyUC;V%Ar?2t=I3XhK>0ga`VrG
z&oRBS5TSfVVfk!MFblzKv|!c?)|T301@j?~!cyCeV7|y!w=z~RA0FvgmKzmJE+beo
z@h1u9>v$B_@n!_`6}+a66-+7!rV|Cz5KKo4hIk=4S}=xSJR=w_B$!DMENcKc1bZnc
z4hTk}U<e|zO_lVlU^xUU5(O(DSP?B)0l{pvU>0{{X9QEr2u4|=V7{_RVP$jX4I_9C
ziuN1?&q0ZvgWx$R+H(**2Ss}hg6E(a&mr|fg6UMj7=m%MU@#pNZ8`|1gAz>#!E{ix
z=^&U6iZ&gDctLZfgBKEP+H~Zng28kIrQ42Rd2d;TP_(76XxkAi@0Du_<yx94*XE}}
z!>p)Lv%)Q|Bea59Q3Gbh91)?=3T8zOn3cRmbc9wgD{9)T+$~cQuE$koswABEmTd?n
z;ijEfwgE0o`xN=KuPr5qWU7+~gk4Vs#OWJ7EVP1=P-7|kfY7?>50f)qc7BojWu=Z#
zcBZiGd@6UaMn@<(Q&@2B3(X4&olgpFPK7oQ+Dr+3n%^q}q4Qy(PeZq_W;gAy(5InW
z!GMgeJq3YShl0Tz1@-UKK-?7!<|wFq_XRV{2&QGCV7^G+v}uBAEXZgXE0~5k(lS;s
z4FfW|=r<ymhB*of@iT&Hn4_Qs-xn-jNH9(nOhYi8D42#|94#2kk>+T@U_eG!TL%Pl
z_XKkk><E|>4CW}<3(yy=SV%CNDp&!*ibTN*2xg-NgE`VRRxpN<7+vNb5Uepd43onn
zCg)TzhRF#wCG-We%Lv9QQ80#CVM-H>VOChh3dS%itfB>jSz%5T3}%IOv|unDJZd^r
zwvb?{rh{QRIMH-4Ob17s4u<LAXw$(k9c&W?gX!QRLNJxNo)GM1n?_$Sxr|^Y)pQt`
z4wGm)3`~cKHXR10!$g}71JhwtnqUT|!%&Q1()F|$=_T*$2v(g>E*;9%vmD9{WN$i@
z3&F$alMQ>5*_(moFdUn`8CVX(vDuq}<uJKTO4AK2hcU5fx`E{|c}$w_?glfzX;)+Q
zV4D8#WmaQyNhw=h2wPn!-0Cv9$%U}Vg~ClPliOPe+gm7cdkbNE3x(TTCO5VaHnvc>
zv87A<{W<nwsa|Z8*r6iB9V(q2SSol;TrIxcn_{h>BAPuh13R!3wym@^{QAzkPbqR|
z-ilyP%m2milHcyivi<eALyIrFZSza>?GC6vWa2x{GkWd6FK_<LhRpXDtGCtHzld+m
zugu3z@?D*GN%I2KT&vJQ_I?-rr{*KxzI*>heE8>wzrXoKeEji)_~mauzkB=c&CmYp
f9)!C;rDnRY1HYwqzH#!$)N=J-JwLQM>OTMg)z2kH
literal 0
HcmV?d00001
diff --git a/tests/d_fallocate_bigalloc/name b/tests/d_fallocate_bigalloc/name
new file mode 100644
index 0000000..915645c
--- /dev/null
+++ b/tests/d_fallocate_bigalloc/name
@@ -0,0 +1 @@
+fallocate sparse files and big files with bigalloc
diff --git a/tests/d_fallocate_bigalloc/script b/tests/d_fallocate_bigalloc/script
new file mode 100644
index 0000000..6b6bf97
--- /dev/null
+++ b/tests/d_fallocate_bigalloc/script
@@ -0,0 +1,176 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+ EXP=$test_name.tmp
+ gunzip < $test_dir/expect.gz > $EXP
+else
+ EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+ cluster_size = 8192
+ base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+ blocksize = 1024
+ inode_size = 256
+ inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+ name="$1"
+ start="$2"
+ flag="$3"
+
+ cat << ENDL
+write /dev/null $name
+sif /$name size 40960
+eo /$name
+set_bmap $flag 10 $((start + 10))
+set_bmap $flag 13 $((start + 13))
+set_bmap $flag 26 $((start + 26))
+set_bmap $flag 29 $((start + 29))
+ec
+sif /$name blocks 32
+setb $((start + 10))
+setb $((start + 13))
+setb $((start + 26))
+setb $((start + 29))
+ENDL
+}
+
+#Files we create:
+# a: fallocate a 40k file
+# b*: falloc sparse file starting at b*
+# c*: falloc sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+# g*: falloc sparse init file starting at g*
+# h*: falloc sparse init file ending at h*
+# i: midcluster to midcluster, surrounding sparse init
+# j: partial middle cluster alloc
+# k: one big init file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+sif /a size 40960
+fallocate /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+ make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+ echo "fallocate /b$i $i 39" >> $TMPFILE.cmd
+ echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+ make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+ echo "fallocate /c$i 0 $i" >> $TMPFILE.cmd
+ echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "fallocate /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "fallocate /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 16
+setb 9000
+fallocate /f 0 8999
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+# Now do it again, but with initialized blocks
+base=20000
+for i in 8 9 10 11 12 13 14 15; do
+ make_file g$i $(($base + (40 * ($i - 8)))) >> $TMPFILE.cmd
+ echo "fallocate /g$i $i 39" >> $TMPFILE.cmd
+ echo "ex /g$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+ make_file h$i $(($base + 320 + (40 * ($i - 24)))) >> $TMPFILE.cmd
+ echo "fallocate /h$i 0 $i" >> $TMPFILE.cmd
+ echo "ex /h$i" >> $TMPFILE.cmd2
+done
+
+make_file i $(($base + 640)) >> $TMPFILE.cmd
+echo "fallocate /i 4 35" >> $TMPFILE.cmd
+echo "ex /i" >> $TMPFILE.cmd2
+
+make_file j $(($base + 680)) >> $TMPFILE.cmd
+echo "fallocate /j 19 20" >> $TMPFILE.cmd
+echo "ex /j" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null k
+sif /k size 1024
+eo /k
+set_bmap 0 19000
+ec
+sif /k blocks 16
+setb 19000
+fallocate /k 0 8999
+sif /k size 9216000
+ENDL
+echo "ex /k" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+ rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+ echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/d_fallocate_blkmap/expect b/tests/d_fallocate_blkmap/expect
new file mode 100644
index 0000000..f7ae606
--- /dev/null
+++ b/tests/d_fallocate_blkmap/expect
@@ -0,0 +1,58 @@
+Creating filesystem with 65536 1k blocks and 4096 inodes
+Superblock backups stored on blocks:
+ 8193, 24577, 40961, 57345
+
+Allocating group tables: \b\b\bdone
+Writing inode tables: \b\b\bdone
+Writing superblocks and filesystem accounting information: \b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/4096 files (0.0% non-contiguous), 2340/65536 blocks
+Exit status is 0
+debugfs write files
+debugfs: stat /a
+Inode: 12 Type: regular Mode: 0666 Flags: 0x0
+Generation: 0 Version: 0x00000000:00000000
+User: 0 Group: 0 Size: 40960
+File ACL: 0 Directory ACL: 0
+Links: 1 Blockcount: 82
+Fragment: Address: 0 Number: 0 Size: 0
+Size of extra inode fields: 28
+BLOCKS:
+(0-1):1312-1313, (2-11):8000-8009, (IND):8010, (12-39):8011-8038
+TOTAL: 41
+
+debugfs: stat /b
+Inode: 13 Type: regular Mode: 0666 Flags: 0x0
+Generation: 0 Version: 0x00000000:00000000
+User: 0 Group: 0 Size: 10240000
+File ACL: 0 Directory ACL: 0
+Links: 1 Blockcount: 20082
+Fragment: Address: 0 Number: 0 Size: 0
+Size of extra inode fields: 28
+BLOCKS:
+(0-11):10000-10011, (IND):10012, (12-267):10013-10268, (DIND):10269, (IND):10270, (268-523):10271-10526, (IND):10527, (524-779):10528-10783, (IND):10784, (780-1035):10785-11040, (IND):11041, (1036-1291):11042-11297, (IND):11298, (1292-1547):11299-11554, (IND):11555, (1548-1803):11556-11811, (IND):11812, (1804-2059):11813-12068, (IND):12069, (2060-2315):12070-12325, (IND):12326, (2316-2571):12327-12582, (IND):12583, (2572-2827):12584-12839, (IND):12840, (2828-3083):12841-13096, (IND):13097, (3084-3339):13098-13353, (IND):13354, (3340-3595):13355-13610, (IND):13611, (3596-3851):13612-13867, (IND):13868, (3852-4107):13869-14124, (IND):14125, (4108-4363):14126-14381, (IND):14382, (4364-4619):14383-14638, (IND):14639, (4620-4875):14640-14895, (IND):14896, (4876-5131):14897-15152, (IND):15153, (5132-5387):15154-15409, (IND):15410, (5388-5643):15411-15666, (IND):15667, (5644-5899):15668-15923, (IND):15924, (5900-6155):15925-16180, (IND):16181, (6156-6411):16182-16437, (IND):16438, (6412-6667):16439-16694, (IND):16695, (6668-6923):16696-16951, (IND):16952, (6924-7179):16953-17208, (IND):17209, (7180-7435):17210-17465, (IND):17466, (7436-7691):17467-17722, (IND):17723, (7692-7947):17724-17979, (IND):17980, (7948-8203):17981-18236, (IND):18237, (8204-8459):18238-18493, (IND):18494, (8460-8715):18495-18750, (IND):18751, (8716-8971):18752-19007, (IND):19008, (8972-9227):19009-19264, (IND):19265, (9228-9483):19266-19521, (IND):19522, (9484-9739):19523-19778, (IND):19779, (9740-9995):19780-20035, (IND):20036, (9996-9999):20037-20040
+TOTAL: 10041
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #0 (6841, counted=6840).
+Fix? yes
+
+Free blocks count wrong for group #1 (1551, counted=1550).
+Fix? yes
+
+Free blocks count wrong (53116, counted=53114).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 13/4096 files (7.7% non-contiguous), 12422/65536 blocks
+Exit status is 1
diff --git a/tests/d_fallocate_blkmap/name b/tests/d_fallocate_blkmap/name
new file mode 100644
index 0000000..ba2b61d
--- /dev/null
+++ b/tests/d_fallocate_blkmap/name
@@ -0,0 +1 @@
+fallocate sparse files and big files on a blockmap fs
diff --git a/tests/d_fallocate_blkmap/script b/tests/d_fallocate_blkmap/script
new file mode 100644
index 0000000..9c48cbc
--- /dev/null
+++ b/tests/d_fallocate_blkmap/script
@@ -0,0 +1,85 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+ EXP=$test_name.tmp
+ gunzip < $test_dir/expect.gz > $EXP
+else
+ EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+ base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,^extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,^64bit
+ blocksize = 1024
+ inode_size = 256
+ inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O ^bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+
+#Files we create:
+# a: fallocate a 40k file
+# b: one big file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+sif /a bmap[2] 8000
+sif /a size 40960
+sif /a i_blocks 2
+setb 8000
+fallocate /a 0 39
+
+write /dev/null b
+sif /b size 10240000
+sif /b bmap[0] 10000
+sif /b i_blocks 2
+setb 10000
+fallocate /b 0 9999
+ENDL
+echo "stat /a" >> $TMPFILE.cmd2
+echo "stat /b" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed -e '/^.*time:.*$/d' < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+ rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+ echo "$test_name: $test_description: skipped"
+fi
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 30/31] tests: test debugfs punch command
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (28 preceding siblings ...)
2014-12-20 21:20 ` [PATCH 29/31] debugfs: implement fallocate Darrick J. Wong
@ 2014-12-20 21:20 ` Darrick J. Wong
2014-12-22 18:53 ` [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs Darrick J. Wong
` (5 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-20 21:20 UTC (permalink / raw)
To: tytso, darrick.wong; +Cc: linux-ext4
Test punching out various parts of sparse files.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
tests/d_punch/expect | 208 +++++++++++++++++++++++++++++++++++++++++
tests/d_punch/name | 1
tests/d_punch/script | 129 +++++++++++++++++++++++++
tests/d_punch_bigalloc/expect | 207 +++++++++++++++++++++++++++++++++++++++++
tests/d_punch_bigalloc/name | 1
tests/d_punch_bigalloc/script | 130 ++++++++++++++++++++++++++
6 files changed, 676 insertions(+)
create mode 100644 tests/d_punch/expect
create mode 100644 tests/d_punch/name
create mode 100644 tests/d_punch/script
create mode 100644 tests/d_punch_bigalloc/expect
create mode 100644 tests/d_punch_bigalloc/name
create mode 100644 tests/d_punch_bigalloc/script
diff --git a/tests/d_punch/expect b/tests/d_punch/expect
new file mode 100644
index 0000000..764715e
--- /dev/null
+++ b/tests/d_punch/expect
@@ -0,0 +1,208 @@
+Creating filesystem with 65536 1k blocks and 4096 inodes
+Superblock backups stored on blocks:
+ 8193, 24577, 40961, 57345
+
+Allocating group tables: \b\b\bdone
+Writing inode tables: \b\b\bdone
+Writing superblocks and filesystem accounting information: \b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/4096 files (0.0% non-contiguous), 2345/65536 blocks
+Exit status is 0
+debugfs write files
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+debugfs: ex /sample
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1323 0
+ 1/ 1 1/ 5 0 - 9 1313 - 1322 10 Uninit
+ 1/ 1 2/ 5 11 - 12 1324 - 1325 2 Uninit
+ 1/ 1 3/ 5 14 - 25 1327 - 1338 12 Uninit
+ 1/ 1 4/ 5 27 - 28 1340 - 1341 2 Uninit
+ 1/ 1 5/ 5 30 - 39 1343 - 1352 10 Uninit
+debugfs: ex /b8
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1390 0
+ 1/ 1 1/ 4 0 - 0 1326 - 1326 1 Uninit
+ 1/ 1 2/ 4 1 - 1 1339 - 1339 1 Uninit
+ 1/ 1 3/ 4 2 - 2 1342 - 1342 1 Uninit
+ 1/ 1 4/ 4 3 - 7 1353 - 1357 5 Uninit
+debugfs: ex /b9
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1368 0
+ 1/ 1 1/ 1 0 - 8 1358 - 1366 9 Uninit
+debugfs: ex /b10
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1378 0
+ 1/ 1 1/ 2 0 - 0 1367 - 1367 1 Uninit
+ 1/ 1 2/ 2 1 - 9 1369 - 1377 9 Uninit
+debugfs: ex /b11
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1389 0
+ 1/ 1 1/ 1 0 - 9 1379 - 1388 10 Uninit
+debugfs: ex /b12
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1401 0
+ 1/ 1 1/ 2 0 - 9 1391 - 1400 10 Uninit
+ 1/ 1 2/ 2 11 - 11 1402 - 1402 1 Uninit
+debugfs: ex /b13
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1413 0
+ 1/ 1 1/ 2 0 - 9 1403 - 1412 10 Uninit
+ 1/ 1 2/ 2 11 - 12 1414 - 1415 2 Uninit
+debugfs: ex /b14
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1426 0
+ 1/ 1 1/ 2 0 - 9 1416 - 1425 10 Uninit
+ 1/ 1 2/ 2 11 - 12 1427 - 1428 2 Uninit
+debugfs: ex /b15
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1439 0
+ 1/ 1 1/ 3 0 - 9 1429 - 1438 10 Uninit
+ 1/ 1 2/ 3 11 - 12 1440 - 1441 2 Uninit
+ 1/ 1 3/ 3 14 - 14 1443 - 1443 1 Uninit
+debugfs: ex /c24
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 25 - 4294967295 1453 4294967271
+ 1/ 1 1/ 3 25 - 25 1468 - 1468 1 Uninit
+ 1/ 1 2/ 3 27 - 28 1470 - 1471 2 Uninit
+ 1/ 1 3/ 3 30 - 39 1473 - 1482 10 Uninit
+debugfs: ex /c25
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 27 - 4294967295 1483 4294967269
+ 1/ 1 1/ 2 27 - 28 1485 - 1486 2 Uninit
+ 1/ 1 2/ 2 30 - 39 1488 - 1497 10 Uninit
+debugfs: ex /c26
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 27 - 4294967295 1484 4294967269
+ 1/ 1 1/ 2 27 - 28 1498 - 1499 2 Uninit
+ 1/ 1 2/ 2 30 - 39 1501 - 1510 10 Uninit
+debugfs: ex /c27
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 28 - 4294967295 1487 4294967268
+ 1/ 1 1/ 2 28 - 28 1512 - 1512 1 Uninit
+ 1/ 1 2/ 2 30 - 39 1514 - 1523 10 Uninit
+debugfs: ex /c28
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 30 - 4294967295 1500 4294967266
+ 1/ 1 1/ 1 30 - 39 1526 - 1535 10 Uninit
+debugfs: ex /c29
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 30 - 4294967295 1511 4294967266
+ 1/ 1 1/ 1 30 - 39 1537 - 1546 10 Uninit
+debugfs: ex /c30
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 31 - 4294967295 1513 4294967265
+ 1/ 1 1/ 1 31 - 39 1549 - 1557 9 Uninit
+debugfs: ex /c31
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 32 - 4294967295 1524 4294967264
+ 1/ 1 1/ 1 32 - 39 1560 - 1567 8 Uninit
+debugfs: ex /d
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1525 0
+ 1/ 1 1/ 3 0 - 0 1442 - 1442 1 Uninit
+ 1/ 1 2/ 3 1 - 3 1444 - 1446 3 Uninit
+ 1/ 1 3/ 3 36 - 39 1573 - 1576 4 Uninit
+debugfs: ex /e
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1547 0
+ 1/ 1 1/ 11 0 - 5 1447 - 1452 6 Uninit
+ 1/ 1 2/ 11 6 - 9 1454 - 1457 4 Uninit
+ 1/ 1 3/ 11 11 - 12 1459 - 1460 2 Uninit
+ 1/ 1 4/ 11 14 - 18 1462 - 1466 5 Uninit
+ 1/ 1 5/ 11 21 - 21 1472 - 1472 1 Uninit
+ 1/ 1 6/ 11 22 - 22 1536 - 1536 1 Uninit
+ 1/ 1 7/ 11 23 - 23 1548 - 1548 1 Uninit
+ 1/ 1 8/ 11 24 - 25 1558 - 1559 2 Uninit
+ 1/ 1 9/ 11 27 - 28 1569 - 1570 2 Uninit
+ 1/ 1 10/ 11 30 - 30 1572 - 1572 1 Uninit
+ 1/ 1 11/ 11 31 - 39 1577 - 1585 9 Uninit
+debugfs: ex /f
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 2 0 - 0 9000 - 9000 1 Uninit
+ 0/ 0 2/ 2 8999 - 8999 17999 - 17999 1 Uninit
+Pass 1: Checking inodes, blocks, and sizes
+Inode 15 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 16 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 17 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 18 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 19 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 20 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 21 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 22 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 23 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 24 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 25 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 26 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 27 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 28 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 29 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 30 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #1 (7934, counted=7933).
+Fix? yes
+
+Free blocks count wrong (62939, counted=62938).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 32/4096 files (43.8% non-contiguous), 2598/65536 blocks
+Exit status is 1
diff --git a/tests/d_punch/name b/tests/d_punch/name
new file mode 100644
index 0000000..724639f
--- /dev/null
+++ b/tests/d_punch/name
@@ -0,0 +1 @@
+punch sparse files and big files
diff --git a/tests/d_punch/script b/tests/d_punch/script
new file mode 100644
index 0000000..7a77c69
--- /dev/null
+++ b/tests/d_punch/script
@@ -0,0 +1,129 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+ EXP=$test_name.tmp
+ gunzip < $test_dir/expect.gz > $EXP
+else
+ EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+ base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+ blocksize = 1024
+ inode_size = 256
+ inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O ^bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+ name="$1"
+ start="$2"
+ flag="$3"
+
+ cat << ENDL
+write /dev/null $name
+fallocate /$name 0 39
+punch /$name 10 10
+punch /$name 13 13
+punch /$name 26 26
+punch /$name 29 29
+ENDL
+}
+
+#Files we create:
+# a: punch a 40k file
+# b*: punch sparse file starting at b*
+# c*: punch sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+fallocate /a 0 39
+punch /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+ make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+ echo "punch /b$i $i 39" >> $TMPFILE.cmd
+ echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+ make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+ echo "punch /c$i 0 $i" >> $TMPFILE.cmd
+ echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "punch /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "punch /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 2
+setb 9000
+fallocate /f 0 8999
+punch /f 1 8998
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+ rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+ echo "$test_name: $test_description: skipped"
+fi
diff --git a/tests/d_punch_bigalloc/expect b/tests/d_punch_bigalloc/expect
new file mode 100644
index 0000000..21427d5
--- /dev/null
+++ b/tests/d_punch_bigalloc/expect
@@ -0,0 +1,207 @@
+
+Warning: the bigalloc feature is still under development
+See https://ext4.wiki.kernel.org/index.php/Bigalloc for more information
+
+Creating filesystem with 65536 1k blocks and 4096 inodes
+
+Allocating group tables: \b\b\bdone
+Writing inode tables: \b\b\bdone
+Writing superblocks and filesystem accounting information: \b\b\bdone
+
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 11/4096 files (9.1% non-contiguous), 1144/65536 blocks
+Exit status is 0
+debugfs write files
+debugfs: ex /a
+Level Entries Logical Physical Length Flags
+debugfs: ex /sample
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1184 0
+ 1/ 1 1/ 5 0 - 9 1144 - 1153 10 Uninit
+ 1/ 1 2/ 5 11 - 12 1155 - 1156 2 Uninit
+ 1/ 1 3/ 5 14 - 25 1158 - 1169 12 Uninit
+ 1/ 1 4/ 5 27 - 28 1171 - 1172 2 Uninit
+ 1/ 1 5/ 5 30 - 39 1174 - 1183 10 Uninit
+debugfs: ex /b8
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1232 0
+ 1/ 1 1/ 1 0 - 7 1192 - 1199 8 Uninit
+debugfs: ex /b9
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1248 0
+ 1/ 1 1/ 1 0 - 8 1200 - 1208 9 Uninit
+debugfs: ex /b10
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1272 0
+ 1/ 1 1/ 1 0 - 9 1216 - 1225 10 Uninit
+debugfs: ex /b11
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1296 0
+ 1/ 1 1/ 2 0 - 7 1240 - 1247 8 Uninit
+ 1/ 1 2/ 2 8 - 9 1256 - 1257 2 Uninit
+debugfs: ex /b12
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1320 0
+ 1/ 1 1/ 3 0 - 7 1264 - 1271 8 Uninit
+ 1/ 1 2/ 3 8 - 9 1280 - 1281 2 Uninit
+ 1/ 1 3/ 3 11 - 11 1283 - 1283 1 Uninit
+debugfs: ex /b13
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1344 0
+ 1/ 1 1/ 3 0 - 7 1288 - 1295 8 Uninit
+ 1/ 1 2/ 3 8 - 9 1304 - 1305 2 Uninit
+ 1/ 1 3/ 3 11 - 12 1307 - 1308 2 Uninit
+debugfs: ex /b14
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1368 0
+ 1/ 1 1/ 3 0 - 7 1312 - 1319 8 Uninit
+ 1/ 1 2/ 3 8 - 9 1328 - 1329 2 Uninit
+ 1/ 1 3/ 3 11 - 12 1331 - 1332 2 Uninit
+debugfs: ex /b15
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1392 0
+ 1/ 1 1/ 4 0 - 7 1336 - 1343 8 Uninit
+ 1/ 1 2/ 4 8 - 9 1352 - 1353 2 Uninit
+ 1/ 1 3/ 4 11 - 12 1355 - 1356 2 Uninit
+ 1/ 1 4/ 4 14 - 14 1358 - 1358 1 Uninit
+debugfs: ex /c24
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 25 - 4294967295 1416 4294967271
+ 1/ 1 1/ 3 25 - 25 1401 - 1401 1 Uninit
+ 1/ 1 2/ 3 27 - 28 1403 - 1404 2 Uninit
+ 1/ 1 3/ 3 30 - 39 1406 - 1415 10 Uninit
+debugfs: ex /c25
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 27 - 4294967295 1440 4294967269
+ 1/ 1 1/ 2 27 - 28 1427 - 1428 2 Uninit
+ 1/ 1 2/ 2 30 - 39 1430 - 1439 10 Uninit
+debugfs: ex /c26
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 27 - 4294967295 1464 4294967269
+ 1/ 1 1/ 2 27 - 28 1451 - 1452 2 Uninit
+ 1/ 1 2/ 2 30 - 39 1454 - 1463 10 Uninit
+debugfs: ex /c27
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 28 - 4294967295 1488 4294967268
+ 1/ 1 1/ 2 28 - 28 1476 - 1476 1 Uninit
+ 1/ 1 2/ 2 30 - 39 1478 - 1487 10 Uninit
+debugfs: ex /c28
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 30 - 4294967295 1512 4294967266
+ 1/ 1 1/ 1 30 - 39 1502 - 1511 10 Uninit
+debugfs: ex /c29
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 30 - 4294967295 1536 4294967266
+ 1/ 1 1/ 1 30 - 39 1526 - 1535 10 Uninit
+debugfs: ex /c30
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 31 - 4294967295 1560 4294967265
+ 1/ 1 1/ 1 31 - 39 1551 - 1559 9 Uninit
+debugfs: ex /c31
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 32 - 4294967295 1584 4294967264
+ 1/ 1 1/ 1 32 - 39 1576 - 1583 8 Uninit
+debugfs: ex /d
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1600 0
+ 1/ 1 1/ 2 0 - 3 1360 - 1363 4 Uninit
+ 1/ 1 2/ 2 36 - 39 1596 - 1599 4 Uninit
+debugfs: ex /e
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 4294967295 1624 0
+ 1/ 1 1/ 8 0 - 9 1376 - 1385 10 Uninit
+ 1/ 1 2/ 8 11 - 12 1387 - 1388 2 Uninit
+ 1/ 1 3/ 8 14 - 15 1390 - 1391 2 Uninit
+ 1/ 1 4/ 8 16 - 18 1568 - 1570 3 Uninit
+ 1/ 1 5/ 8 21 - 23 1573 - 1575 3 Uninit
+ 1/ 1 6/ 8 24 - 25 1608 - 1609 2 Uninit
+ 1/ 1 7/ 8 27 - 28 1611 - 1612 2 Uninit
+ 1/ 1 8/ 8 30 - 39 1614 - 1623 10 Uninit
+debugfs: ex /f
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 2 0 - 0 9000 - 9000 1 Uninit
+ 0/ 0 2/ 2 8999 - 8999 17999 - 17999 1 Uninit
+Pass 1: Checking inodes, blocks, and sizes
+Inode 14 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 15 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 16 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 17 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 18 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 19 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 20 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 22 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 23 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 24 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 25 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 26 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 27 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 28 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 29 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Inode 30 extent tree could be shorter.
+ (level 1 is unnecessary)
+Fix? yes
+
+Pass 1E: Optimizing extent trees
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #0 (8003, counted=8002).
+Fix? yes
+
+Free blocks count wrong (64024, counted=64016).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 32/4096 files (43.8% non-contiguous), 1520/65536 blocks
+Exit status is 1
diff --git a/tests/d_punch_bigalloc/name b/tests/d_punch_bigalloc/name
new file mode 100644
index 0000000..6d61ebe
--- /dev/null
+++ b/tests/d_punch_bigalloc/name
@@ -0,0 +1 @@
+punch sparse files and big files with bigalloc
diff --git a/tests/d_punch_bigalloc/script b/tests/d_punch_bigalloc/script
new file mode 100644
index 0000000..6eb0571
--- /dev/null
+++ b/tests/d_punch_bigalloc/script
@@ -0,0 +1,130 @@
+if test -x $DEBUGFS_EXE; then
+
+FSCK_OPT=-fy
+OUT=$test_name.log
+if [ -f $test_dir/expect.gz ]; then
+ EXP=$test_name.tmp
+ gunzip < $test_dir/expect.gz > $EXP
+else
+ EXP=$test_dir/expect
+fi
+
+cp /dev/null $OUT
+
+cat > $TMPFILE.conf << ENDL
+[fs_types]
+ext4 = {
+ cluster_size = 8192
+ base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr,^has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit
+ blocksize = 1024
+ inode_size = 256
+ inode_ratio = 16384
+}
+ENDL
+MKE2FS_CONFIG=$TMPFILE.conf $MKE2FS -F -o Linux -b 1024 -O bigalloc -T ext4 $TMPFILE 65536 2>&1 | sed -f $cmd_dir/filter.sed >> $OUT 2>&1
+rm -rf $TMPFILE.conf
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+echo "debugfs write files" >> $OUT
+make_file() {
+ name="$1"
+ start="$2"
+ flag="$3"
+
+ cat << ENDL
+write /dev/null $name
+fallocate /$name 0 39
+punch /$name 10 10
+punch /$name 13 13
+punch /$name 26 26
+punch /$name 29 29
+ENDL
+}
+
+#Files we create:
+# a: punch a 40k file
+# b*: punch sparse file starting at b*
+# c*: punch sparse file ending at c*
+# d: midcluster to midcluster, surrounding sparse
+# e: partial middle cluster alloc
+# f: one big file
+base=5000
+cat > $TMPFILE.cmd << ENDL
+write /dev/null a
+fallocate /a 0 39
+punch /a 0 39
+ENDL
+echo "ex /a" >> $TMPFILE.cmd2
+
+make_file sample $base --uninit >> $TMPFILE.cmd
+echo "ex /sample" >> $TMPFILE.cmd2
+base=10000
+
+for i in 8 9 10 11 12 13 14 15; do
+ make_file b$i $(($base + (40 * ($i - 8)))) --uninit >> $TMPFILE.cmd
+ echo "punch /b$i $i 39" >> $TMPFILE.cmd
+ echo "ex /b$i" >> $TMPFILE.cmd2
+done
+
+for i in 24 25 26 27 28 29 30 31; do
+ make_file c$i $(($base + 320 + (40 * ($i - 24)))) --uninit >> $TMPFILE.cmd
+ echo "punch /c$i 0 $i" >> $TMPFILE.cmd
+ echo "ex /c$i" >> $TMPFILE.cmd2
+done
+
+make_file d $(($base + 640)) --uninit >> $TMPFILE.cmd
+echo "punch /d 4 35" >> $TMPFILE.cmd
+echo "ex /d" >> $TMPFILE.cmd2
+
+make_file e $(($base + 680)) --uninit >> $TMPFILE.cmd
+echo "punch /e 19 20" >> $TMPFILE.cmd
+echo "ex /e" >> $TMPFILE.cmd2
+
+cat >> $TMPFILE.cmd << ENDL
+write /dev/null f
+sif /f size 1024
+eo /f
+set_bmap --uninit 0 9000
+ec
+sif /f blocks 16
+setb 9000
+fallocate /f 0 8999
+punch /f 1 8998
+ENDL
+echo "ex /f" >> $TMPFILE.cmd2
+
+$DEBUGFS_EXE -w -f $TMPFILE.cmd $TMPFILE > /dev/null 2>&1
+$DEBUGFS_EXE -f $TMPFILE.cmd2 $TMPFILE >> $OUT.new 2>&1
+sed -f $cmd_dir/filter.sed < $OUT.new >> $OUT
+rm -rf $OUT.new $TMPFILE.cmd $TMPFILE.cmd2
+
+$FSCK -fy -N test_filesys $TMPFILE > $OUT.new 2>&1
+status=$?
+echo Exit status is $status >> $OUT.new
+sed -f $cmd_dir/filter.sed -e "s;$TMPFILE;test.img;" $OUT.new >> $OUT
+rm -f $OUT.new
+
+rm -f $TMPFILE
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+ rm -f $test_name.tmp
+fi
+
+unset IMAGE FSCK_OPT OUT EXP
+
+else #if test -x $DEBUGFS_EXE; then
+ echo "$test_name: $test_description: skipped"
+fi
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (29 preceding siblings ...)
2014-12-20 21:20 ` [PATCH 30/31] tests: test debugfs punch command Darrick J. Wong
@ 2014-12-22 18:53 ` Darrick J. Wong
2014-12-22 22:22 ` Andreas Dilger
2014-12-22 22:55 ` [PATCH v2 " Darrick J. Wong
2014-12-22 18:55 ` [PATCH 33/31] e2fsck: on read error, don't rewrite blocks past the end of the fs Darrick J. Wong
` (4 subsequent siblings)
35 siblings, 2 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 18:53 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
If i_extra_isize is zero when we try to write extended attributes,
we'll end up writing the EA magic into the i_extra_isize field, which
causes a subsequent crash on big endian systems (when we try to write
0xEA02 bytes past the inode!). Therefore when the field is zero, set
i_extra_isize to the desired extra_isize size, zero those bytes, and
write the EAs after the end of the extended inode.
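To make the failure mode concrete, here is a minimal sketch, not the libext2fs code. It assumes the usual ext4 layout: a 128-byte base inode (EXT2_GOOD_OLD_INODE_SIZE), a 16-bit i_extra_isize as the first field after it, and the in-inode EA magic (0xEA020000) stored at offset 128 + i_extra_isize. The helper names are hypothetical. The last helper models a big-endian 32-bit store followed by a 16-bit read at the same address, which is one way to see where the 0xEA02 figure comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE 128u	/* same value as EXT2_GOOD_OLD_INODE_SIZE */
#define EA_MAGIC 0xEA020000u		/* same value as EXT2_EXT_ATTR_MAGIC */

/* Byte offset, from the start of the inode, of the in-inode EA magic. */
static size_t ea_magic_offset(unsigned int extra_isize)
{
	return GOOD_OLD_INODE_SIZE + extra_isize;
}

/* Nonzero if the EA magic would land on top of the i_extra_isize field. */
static int magic_overlaps_extra_isize(unsigned int extra_isize)
{
	size_t field_end = GOOD_OLD_INODE_SIZE + sizeof(uint16_t);

	return ea_magic_offset(extra_isize) < field_end;
}

/*
 * Lay out "value" most-significant byte first (a big-endian store), then
 * read the first two bytes back as a big-endian 16-bit number.
 */
static unsigned int be16_view_of_be32_store(uint32_t value)
{
	unsigned char bytes[4];

	assert(sizeof(bytes) == sizeof(value));
	bytes[0] = (unsigned char)(value >> 24);
	bytes[1] = (unsigned char)(value >> 16);
	bytes[2] = (unsigned char)(value >> 8);
	bytes[3] = (unsigned char)value;
	return ((unsigned int)bytes[0] << 8) | bytes[1];
}
```

With an extra_isize of 0 the magic lands at offset 128, directly on top of i_extra_isize; with an extra_isize of 28 it lands at offset 156 and the field is left alone.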
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/ext_attr.c | 11 +++++++++++
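tests/f_write_ea_no_extra_isize/expect.1 | 12 ++++++++++++
tests/f_write_ea_no_extra_isize/expect.2 | 7 +++++++
tests/f_write_ea_no_extra_isize/image.gz | Bin
tests/f_write_ea_no_extra_isize/name | 1 +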
5 files changed, 31 insertions(+)
create mode 100644 tests/f_write_ea_no_extra_isize/expect.1
create mode 100644 tests/f_write_ea_no_extra_isize/expect.2
create mode 100644 tests/f_write_ea_no_extra_isize/image.gz
create mode 100644 tests/f_write_ea_no_extra_isize/name
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 70bc3f9..551c1f2 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -519,6 +519,17 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
if (err)
goto out;
+ /* If extra_isize isn't set, we need to set it now */
+ if (inode->i_extra_isize == 0) {
+ char *p = (char *)inode;
+ size_t extra = handle->fs->super->s_want_extra_isize;
+
+ if (extra == 0)
+ extra = sizeof(inode->i_extra_isize);
+ memset(p + EXT2_GOOD_OLD_INODE_SIZE, 0, extra);
+ inode->i_extra_isize = extra;
+ }
+
move_inline_data_to_front(handle);
x = handle->attrs;
diff --git a/tests/f_write_ea_no_extra_isize/expect.1 b/tests/f_write_ea_no_extra_isize/expect.1
new file mode 100644
index 0000000..b7e7438
--- /dev/null
+++ b/tests/f_write_ea_no_extra_isize/expect.1
@@ -0,0 +1,12 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Directory inode 12, block #0, offset 4: directory corrupted
+Salvage? yes
+
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 1
diff --git a/tests/f_write_ea_no_extra_isize/expect.2 b/tests/f_write_ea_no_extra_isize/expect.2
new file mode 100644
index 0000000..3b6073e
--- /dev/null
+++ b/tests/f_write_ea_no_extra_isize/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 0
diff --git a/tests/f_write_ea_no_extra_isize/image.gz b/tests/f_write_ea_no_extra_isize/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..928daff1f344824d357e816883a98b2cdfdaffb3
GIT binary patch
literal 2516
zcmb2|=3qFkI6Z`k`Ry&+Y!OEZh6m-}^`s^_@O3Vjpj4;eVQ?ceQSj)oL#Gnz1=cLm
zv~lFfFtI=2=98SCs6XwXQ}-6Ju%@X>9fI9126HxlZ#2Be**0UwlghsG_L*~cr<Q$x
zcJJ<Oj)e8Ibj2!<JPfHhr<Aq!+aA$gzRhQoDpTqfm88@ubyxlMdNpU)@t~~i`mk$S
zzoVyaURhCSpQJX`e|`A<yOq_6FK*5jUa#jL8+)&B<F~W5)<2(II(+ff@7nX~@@wXt
zO0r@5^laDG$2W8iznog0p}+h^;VVXl4ZK}{zBqI(?)|f0Y^ht>t&nbG@wbW$3=B7Z
zzWX-qadqb7_v@K~-2ZzY{r}&1;?u?qJ8lDkN;|V_HErb@zdo&eIeY8H?91Epe8gTW
z#qp>8%-!w_RG#4e@_o_Y_#Xxt|AD*%>jF1&gXpjJK#~;jC;IEb(o&#~gzJUuETrf}
zRok%tV=6ElWvesq%s;o>RW(Vqw|cASS<|O4a;NN_xHR?tb)Bt$LbgmbOZ<IUQ%o$p
zUwq@2A5YK3-Tr^z+swV;J`Rs%pZ#BB`ad`3|E{WQssEz?TR&NG;97*nr}>xXe%f!t
zxAoKf7Zd(Zk&?Xh?1}x|{r`U+Uwh*Jxur#4xBt=4FJGCGo%NMBXYuokNl*KaC%k*G
zu}tK){;6+2PXDfdYaO{ZbnaGE_m1*LLtr!nMnhmU1V%$(Gz6#@0(bv1g#8Yk!N8!v
F003fX5hwrv
literal 0
HcmV?d00001
diff --git a/tests/f_write_ea_no_extra_isize/name b/tests/f_write_ea_no_extra_isize/name
new file mode 100644
index 0000000..200e365
--- /dev/null
+++ b/tests/f_write_ea_no_extra_isize/name
@@ -0,0 +1 @@
+write EA when i_extra_isize is zero
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs
2014-12-22 18:53 ` [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs Darrick J. Wong
@ 2014-12-22 22:22 ` Andreas Dilger
2014-12-22 22:32 ` Darrick J. Wong
2014-12-22 22:55 ` [PATCH v2 " Darrick J. Wong
1 sibling, 1 reply; 42+ messages in thread
From: Andreas Dilger @ 2014-12-22 22:22 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Theodore Ts'o, ext4 development
On Dec 22, 2014, at 11:53 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> If i_extra_isize is zero when we try to write extended attributes,
> we'll end up writing the EA magic into the i_extra_isize field, which
> causes a subsequent crash on big endian systems (when we try to write
> 0xEA02 bytes past the inode!). Therefore when the field is zero, set
> i_extra_isize to the desired extra_isize size, zero those bytes, and
> write the EAs after the end of the extended inode.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> lib/ext2fs/ext_attr.c | 11 +++++++++++
> tests/f_write_ea_no_extra_isize/expect.1 | 12 ++++++++++++
> tests/f_write_ea_no_extra_isize/expect.2 | 7 +++++++
> tests/f_write_ea_no_extra_isize/image.gz | Bin
> tests/f_write_ea_no_extra_isize/name | 1 +
> 5 files changed, 31 insertions(+)
> create mode 100644 tests/f_write_ea_no_extra_isize/expect.1
> create mode 100644 tests/f_write_ea_no_extra_isize/expect.2
> create mode 100644 tests/f_write_ea_no_extra_isize/image.gz
> create mode 100644 tests/f_write_ea_no_extra_isize/name
>
> diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> index 70bc3f9..551c1f2 100644
> --- a/lib/ext2fs/ext_attr.c
> +++ b/lib/ext2fs/ext_attr.c
> @@ -519,6 +519,17 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> if (err)
> goto out;
>
> + /* If extra_isize isn't set, we need to set it now */
> + if (inode->i_extra_isize == 0) {
> + char *p = (char *)inode;
> + size_t extra = handle->fs->super->s_want_extra_isize;
> +
> + if (extra == 0)
> + extra = sizeof(inode->i_extra_isize);
I don't think this is quite correct. At a minimum, i_extra_isize should
include the padding bytes (now i_checksum_hi) following it so that the
xattr magic and other fields will be properly 32-bit aligned. That said,
if we are going to use the large inode it probably makes sense to leave
space for the i_*time_extra fields.
Cheers, Andreas
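The alignment point can be checked with a little arithmetic. The sketch below is illustrative only, with hypothetical helper names; it assumes the in-inode EA header starts at 128 + i_extra_isize and should sit on a 4-byte boundary.

```c
#include <assert.h>

#define GOOD_OLD_INODE_SIZE 128u	/* same value as EXT2_GOOD_OLD_INODE_SIZE */

/* Nonzero if an EA header placed after "extra_isize" bytes is 32-bit aligned. */
static int ea_header_is_aligned(unsigned int extra_isize)
{
	return ((GOOD_OLD_INODE_SIZE + extra_isize) % 4u) == 0;
}

/* Round a candidate extra_isize up to the next multiple of 4. */
static unsigned int round_up_extra_isize(unsigned int extra_isize)
{
	assert(extra_isize <= 0xFFFFu);	/* i_extra_isize is a 16-bit field */
	return (extra_isize + 3u) & ~3u;
}
```

A fallback of sizeof(i_extra_isize), i.e. 2, would put the header at offset 130, which is not 32-bit aligned. Covering the two bytes that follow as well (4 in total) moves it to offset 132.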
> + memset(p + EXT2_GOOD_OLD_INODE_SIZE, 0, extra);
> + inode->i_extra_isize = extra;
> + }
> +
> move_inline_data_to_front(handle);
>
> x = handle->attrs;
> diff --git a/tests/f_write_ea_no_extra_isize/expect.1 b/tests/f_write_ea_no_extra_isize/expect.1
> new file mode 100644
> index 0000000..b7e7438
> --- /dev/null
> +++ b/tests/f_write_ea_no_extra_isize/expect.1
> @@ -0,0 +1,12 @@
> +Pass 1: Checking inodes, blocks, and sizes
> +Pass 2: Checking directory structure
> +Directory inode 12, block #0, offset 4: directory corrupted
> +Salvage? yes
> +
> +Pass 3: Checking directory connectivity
> +Pass 4: Checking reference counts
> +Pass 5: Checking group summary information
> +
> +test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
> +test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
> +Exit status is 1
> diff --git a/tests/f_write_ea_no_extra_isize/expect.2 b/tests/f_write_ea_no_extra_isize/expect.2
> new file mode 100644
> index 0000000..3b6073e
> --- /dev/null
> +++ b/tests/f_write_ea_no_extra_isize/expect.2
> @@ -0,0 +1,7 @@
> +Pass 1: Checking inodes, blocks, and sizes
> +Pass 2: Checking directory structure
> +Pass 3: Checking directory connectivity
> +Pass 4: Checking reference counts
> +Pass 5: Checking group summary information
> +test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
> +Exit status is 0
> diff --git a/tests/f_write_ea_no_extra_isize/image.gz b/tests/f_write_ea_no_extra_isize/image.gz
> new file mode 100644
> index 0000000000000000000000000000000000000000..928daff1f344824d357e816883a98b2cdfdaffb3
> GIT binary patch
> literal 2516
> zcmb2|=3qFkI6Z`k`Ry&+Y!OEZh6m-}^`s^_@O3Vjpj4;eVQ?ceQSj)oL#Gnz1=cLm
> zv~lFfFtI=2=98SCs6XwXQ}-6Ju%@X>9fI9126HxlZ#2Be**0UwlghsG_L*~cr<Q$x
> zcJJ<Oj)e8Ibj2!<JPfHhr<Aq!+aA$gzRhQoDpTqfm88@ubyxlMdNpU)@t~~i`mk$S
> zzoVyaURhCSpQJX`e|`A<yOq_6FK*5jUa#jL8+)&B<F~W5)<2(II(+ff@7nX~@@wXt
> zO0r@5^laDG$2W8iznog0p}+h^;VVXl4ZK}{zBqI(?)|f0Y^ht>t&nbG@wbW$3=B7Z
> zzWX-qadqb7_v@K~-2ZzY{r}&1;?u?qJ8lDkN;|V_HErb@zdo&eIeY8H?91Epe8gTW
> z#qp>8%-!w_RG#4e@_o_Y_#Xxt|AD*%>jF1&gXpjJK#~;jC;IEb(o&#~gzJUuETrf}
> zRok%tV=6ElWvesq%s;o>RW(Vqw|cASS<|O4a;NN_xHR?tb)Bt$LbgmbOZ<IUQ%o$p
> zUwq@2A5YK3-Tr^z+swV;J`Rs%pZ#BB`ad`3|E{WQssEz?TR&NG;97*nr}>xXe%f!t
> zxAoKf7Zd(Zk&?Xh?1}x|{r`U+Uwh*Jxur#4xBt=4FJGCGo%NMBXYuokNl*KaC%k*G
> zu}tK){;6+2PXDfdYaO{ZbnaGE_m1*LLtr!nMnhmU1V%$(Gz6#@0(bv1g#8Yk!N8!v
> F003fX5hwrv
>
> literal 0
> HcmV?d00001
>
> diff --git a/tests/f_write_ea_no_extra_isize/name b/tests/f_write_ea_no_extra_isize/name
> new file mode 100644
> index 0000000..200e365
> --- /dev/null
> +++ b/tests/f_write_ea_no_extra_isize/name
> @@ -0,0 +1 @@
> +write EA when i_extra_isize is zero
Cheers, Andreas
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs
2014-12-22 22:22 ` Andreas Dilger
@ 2014-12-22 22:32 ` Darrick J. Wong
0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 22:32 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Theodore Ts'o, ext4 development
On Mon, Dec 22, 2014 at 03:22:30PM -0700, Andreas Dilger wrote:
> On Dec 22, 2014, at 11:53 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > If i_extra_isize is zero when we try to write extended attributes,
> > we'll end up writing the EA magic into the i_extra_isize field, which
> > causes a subsequent crash on big endian systems (when we try to write
> > 0xEA02 bytes past the inode!). Therefore when the field is zero, set
> > i_extra_isize to the desired extra_isize size, zero those bytes, and
> > write the EAs after the end of the extended inode.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > lib/ext2fs/ext_attr.c | 11 +++++++++++
> > tests/f_write_ea_no_extra_isize/expect.1 | 12 ++++++++++++
> > tests/f_write_ea_no_extra_isize/expect.2 | 7 +++++++
> > tests/f_write_ea_no_extra_isize/image.gz | Bin
> > tests/f_write_ea_no_extra_isize/name | 1 +
> > 5 files changed, 31 insertions(+)
> > create mode 100644 tests/f_write_ea_no_extra_isize/expect.1
> > create mode 100644 tests/f_write_ea_no_extra_isize/expect.2
> > create mode 100644 tests/f_write_ea_no_extra_isize/image.gz
> > create mode 100644 tests/f_write_ea_no_extra_isize/name
> >
> > diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
> > index 70bc3f9..551c1f2 100644
> > --- a/lib/ext2fs/ext_attr.c
> > +++ b/lib/ext2fs/ext_attr.c
> > @@ -519,6 +519,17 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
> > if (err)
> > goto out;
> >
> > + /* If extra_isize isn't set, we need to set it now */
> > + if (inode->i_extra_isize == 0) {
> > + char *p = (char *)inode;
> > + size_t extra = handle->fs->super->s_want_extra_isize;
> > +
> > + if (extra == 0)
> > + extra = sizeof(inode->i_extra_isize);
>
> I don't think this is quite correct. At a minimum, i_extra_isize should
> include the padding bytes (now i_checksum_hi) following it so that the
> xattr magic and other fields will be properly 32-bit aligned. That said,
> if we are going to use the large inode it probably makes sense to leave
> space for the i_*time_extra fields.
s_want_extra_isize should be set to a sensible value -- mke2fs has
been setting it to 28 (i.e. big enough for i_version_hi) since 2008.
The if (extra == 0) fallback handles the case when the superblock
field is also zero.
Though, hmm, there is a bug; we ought to skip all this if
EXT2_INODE_SIZE == EXT2_GOOD_OLD_INODE_SIZE.
--D
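A minimal sketch of the selection logic as discussed so far, including the small-inode check. This is not the v2 patch; pick_extra_isize and its parameters are hypothetical stand-ins for the on-disk inode size, inode->i_extra_isize and s_want_extra_isize.

```c
#include <assert.h>

#define GOOD_OLD_INODE_SIZE 128u	/* same value as EXT2_GOOD_OLD_INODE_SIZE */

/*
 * Return the value i_extra_isize should have before EAs are written, or 0
 * when the inode is only 128 bytes and has no extra area at all.
 */
static unsigned int pick_extra_isize(unsigned int inode_size,
				     unsigned int cur_extra,
				     unsigned int want_extra)
{
	assert(inode_size >= GOOD_OLD_INODE_SIZE);

	if (inode_size == GOOD_OLD_INODE_SIZE)
		return 0;		/* small inode: nothing to set up */
	if (cur_extra != 0)
		return cur_extra;	/* already initialized, leave alone */
	if (want_extra == 0)
		want_extra = 2;		/* sizeof(i_extra_isize), as in the patch */
	return want_extra;
}
```

For a 256-byte inode with i_extra_isize == 0 and s_want_extra_isize == 28 this picks 28; for a 128-byte inode it picks nothing, which is the case the patch as posted does not yet handle.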
>
> Cheers, Andreas
>
> > + memset(p + EXT2_GOOD_OLD_INODE_SIZE, 0, extra);
> > + inode->i_extra_isize = extra;
> > + }
> > +
> > move_inline_data_to_front(handle);
> >
> > x = handle->attrs;
> > diff --git a/tests/f_write_ea_no_extra_isize/expect.1 b/tests/f_write_ea_no_extra_isize/expect.1
> > new file mode 100644
> > index 0000000..b7e7438
> > --- /dev/null
> > +++ b/tests/f_write_ea_no_extra_isize/expect.1
> > @@ -0,0 +1,12 @@
> > +Pass 1: Checking inodes, blocks, and sizes
> > +Pass 2: Checking directory structure
> > +Directory inode 12, block #0, offset 4: directory corrupted
> > +Salvage? yes
> > +
> > +Pass 3: Checking directory connectivity
> > +Pass 4: Checking reference counts
> > +Pass 5: Checking group summary information
> > +
> > +test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
> > +test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
> > +Exit status is 1
> > diff --git a/tests/f_write_ea_no_extra_isize/expect.2 b/tests/f_write_ea_no_extra_isize/expect.2
> > new file mode 100644
> > index 0000000..3b6073e
> > --- /dev/null
> > +++ b/tests/f_write_ea_no_extra_isize/expect.2
> > @@ -0,0 +1,7 @@
> > +Pass 1: Checking inodes, blocks, and sizes
> > +Pass 2: Checking directory structure
> > +Pass 3: Checking directory connectivity
> > +Pass 4: Checking reference counts
> > +Pass 5: Checking group summary information
> > +test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
> > +Exit status is 0
> > diff --git a/tests/f_write_ea_no_extra_isize/image.gz b/tests/f_write_ea_no_extra_isize/image.gz
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..928daff1f344824d357e816883a98b2cdfdaffb3
> > GIT binary patch
> > literal 2516
> > zcmb2|=3qFkI6Z`k`Ry&+Y!OEZh6m-}^`s^_@O3Vjpj4;eVQ?ceQSj)oL#Gnz1=cLm
> > zv~lFfFtI=2=98SCs6XwXQ}-6Ju%@X>9fI9126HxlZ#2Be**0UwlghsG_L*~cr<Q$x
> > zcJJ<Oj)e8Ibj2!<JPfHhr<Aq!+aA$gzRhQoDpTqfm88@ubyxlMdNpU)@t~~i`mk$S
> > zzoVyaURhCSpQJX`e|`A<yOq_6FK*5jUa#jL8+)&B<F~W5)<2(II(+ff@7nX~@@wXt
> > zO0r@5^laDG$2W8iznog0p}+h^;VVXl4ZK}{zBqI(?)|f0Y^ht>t&nbG@wbW$3=B7Z
> > zzWX-qadqb7_v@K~-2ZzY{r}&1;?u?qJ8lDkN;|V_HErb@zdo&eIeY8H?91Epe8gTW
> > z#qp>8%-!w_RG#4e@_o_Y_#Xxt|AD*%>jF1&gXpjJK#~;jC;IEb(o&#~gzJUuETrf}
> > zRok%tV=6ElWvesq%s;o>RW(Vqw|cASS<|O4a;NN_xHR?tb)Bt$LbgmbOZ<IUQ%o$p
> > zUwq@2A5YK3-Tr^z+swV;J`Rs%pZ#BB`ad`3|E{WQssEz?TR&NG;97*nr}>xXe%f!t
> > zxAoKf7Zd(Zk&?Xh?1}x|{r`U+Uwh*Jxur#4xBt=4FJGCGo%NMBXYuokNl*KaC%k*G
> > zu}tK){;6+2PXDfdYaO{ZbnaGE_m1*LLtr!nMnhmU1V%$(Gz6#@0(bv1g#8Yk!N8!v
> > F003fX5hwrv
> >
> > literal 0
> > HcmV?d00001
> >
> > diff --git a/tests/f_write_ea_no_extra_isize/name b/tests/f_write_ea_no_extra_isize/name
> > new file mode 100644
> > index 0000000..200e365
> > --- /dev/null
> > +++ b/tests/f_write_ea_no_extra_isize/name
> > @@ -0,0 +1 @@
> > +write EA when i_extra_size is zero
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 32/31] libext2fs: initialize i_extra_isize when writing EAs
2014-12-22 18:53 ` [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs Darrick J. Wong
2014-12-22 22:22 ` Andreas Dilger
@ 2014-12-22 22:55 ` Darrick J. Wong
1 sibling, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 22:55 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4, Andreas Dilger
If i_extra_isize is zero when we try to write extended attributes,
we'll end up writing the EA magic into the i_extra_isize field, which
causes a subsequent crash on big endian systems (when we try to write
0xEA02 bytes past the inode!). Therefore when the field is zero, set
i_extra_isize to the desired extra_isize size, zero those bytes, and
write the EAs after the end of the extended inode.
v2: Don't bother if we have 128b inodes; ensure that the value
is 32b-aligned so that the EA magic starts on a 32b boundary if
the superblock doesn't tell us otherwise.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/ext_attr.c | 12 ++++++++++++
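tests/f_write_ea_no_extra_isize/expect.1 | 12 ++++++++++++
tests/f_write_ea_no_extra_isize/expect.2 | 7 +++++++
tests/f_write_ea_no_extra_isize/image.gz | Bin
tests/f_write_ea_no_extra_isize/name | 1 +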
5 files changed, 32 insertions(+)
create mode 100644 tests/f_write_ea_no_extra_isize/expect.1
create mode 100644 tests/f_write_ea_no_extra_isize/expect.2
create mode 100644 tests/f_write_ea_no_extra_isize/image.gz
create mode 100644 tests/f_write_ea_no_extra_isize/name
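
A minimal sketch of the failure described in the commit message above, not the library code itself. It assumes i_extra_isize is the 16-bit field at byte 128 of the inode and that the EA magic is 0xEA020000 written at EXT2_GOOD_OLD_INODE_SIZE + i_extra_isize in host byte order; the helpers store(), load16() and extra_isize_after_magic() are hypothetical names, and the byte order is simulated so the result does not depend on the machine running it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GOOD_OLD_INODE_SIZE 128u        /* same value as EXT2_GOOD_OLD_INODE_SIZE */
#define EA_MAGIC            0xEA020000u /* same value as EXT2_EXT_ATTR_MAGIC */
#define EXTRA_ISIZE_OFFSET  128u        /* assumed offset of the 16-bit i_extra_isize */

/* Hypothetical helper: store an nbytes-wide value as a host of the given byte order would. */
static void store(unsigned char *p, uint32_t v, int nbytes, int big_endian)
{
	for (int i = 0; i < nbytes; i++) {
		int shift = big_endian ? 8 * (nbytes - 1 - i) : 8 * i;
		p[i] = (unsigned char)(v >> shift);
	}
}

/* Hypothetical helper: load a 16-bit value as a host of the given byte order would. */
static uint16_t load16(const unsigned char *p, int big_endian)
{
	return big_endian ? (uint16_t)((p[0] << 8) | p[1])
			  : (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Write the EA magic immediately after the extra inode fields of a simulated
 * in-memory inode, then report what i_extra_isize reads back as.
 */
static uint16_t extra_isize_after_magic(uint16_t extra_isize, int big_endian)
{
	unsigned char inode[256];

	assert(GOOD_OLD_INODE_SIZE + extra_isize + sizeof(uint32_t) <= sizeof(inode));
	memset(inode, 0, sizeof(inode));
	store(inode + EXTRA_ISIZE_OFFSET, extra_isize, 2, big_endian);
	store(inode + GOOD_OLD_INODE_SIZE + extra_isize, EA_MAGIC, 4, big_endian);
	return load16(inode + EXTRA_ISIZE_OFFSET, big_endian);
}
```

With extra_isize == 0 and big_endian == 1 the field reads back as 0xEA02, the figure quoted in the commit message; with big_endian == 0 it still reads 0, which is presumably why the problem went unnoticed on little endian hosts. Any extra_isize of 4 or more leaves the field intact in either byte order.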
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 70bc3f9..c6fcf54 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -519,6 +519,18 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
if (err)
goto out;
+ /* If extra_isize isn't set, we need to set it now */
+ if (inode->i_extra_isize == 0 &&
+ EXT2_INODE_SIZE(handle->fs->super) > EXT2_GOOD_OLD_INODE_SIZE) {
+ char *p = (char *)inode;
+ size_t extra = handle->fs->super->s_want_extra_isize;
+
+ if (extra == 0)
+ extra = sizeof(__u32);
+ memset(p + EXT2_GOOD_OLD_INODE_SIZE, 0, extra);
+ inode->i_extra_isize = extra;
+ }
+
move_inline_data_to_front(handle);
x = handle->attrs;
diff --git a/tests/f_write_ea_no_extra_isize/expect.1 b/tests/f_write_ea_no_extra_isize/expect.1
new file mode 100644
index 0000000..b7e7438
--- /dev/null
+++ b/tests/f_write_ea_no_extra_isize/expect.1
@@ -0,0 +1,12 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Directory inode 12, block #0, offset 4: directory corrupted
+Salvage? yes
+
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 1
diff --git a/tests/f_write_ea_no_extra_isize/expect.2 b/tests/f_write_ea_no_extra_isize/expect.2
new file mode 100644
index 0000000..3b6073e
--- /dev/null
+++ b/tests/f_write_ea_no_extra_isize/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 0
diff --git a/tests/f_write_ea_no_extra_isize/image.gz b/tests/f_write_ea_no_extra_isize/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..928daff1f344824d357e816883a98b2cdfdaffb3
GIT binary patch
literal 2516
zcmb2|=3qFkI6Z`k`Ry&+Y!OEZh6m-}^`s^_@O3Vjpj4;eVQ?ceQSj)oL#Gnz1=cLm
zv~lFfFtI=2=98SCs6XwXQ}-6Ju%@X>9fI9126HxlZ#2Be**0UwlghsG_L*~cr<Q$x
zcJJ<Oj)e8Ibj2!<JPfHhr<Aq!+aA$gzRhQoDpTqfm88@ubyxlMdNpU)@t~~i`mk$S
zzoVyaURhCSpQJX`e|`A<yOq_6FK*5jUa#jL8+)&B<F~W5)<2(II(+ff@7nX~@@wXt
zO0r@5^laDG$2W8iznog0p}+h^;VVXl4ZK}{zBqI(?)|f0Y^ht>t&nbG@wbW$3=B7Z
zzWX-qadqb7_v@K~-2ZzY{r}&1;?u?qJ8lDkN;|V_HErb@zdo&eIeY8H?91Epe8gTW
z#qp>8%-!w_RG#4e@_o_Y_#Xxt|AD*%>jF1&gXpjJK#~;jC;IEb(o&#~gzJUuETrf}
zRok%tV=6ElWvesq%s;o>RW(Vqw|cASS<|O4a;NN_xHR?tb)Bt$LbgmbOZ<IUQ%o$p
zUwq@2A5YK3-Tr^z+swV;J`Rs%pZ#BB`ad`3|E{WQssEz?TR&NG;97*nr}>xXe%f!t
zxAoKf7Zd(Zk&?Xh?1}x|{r`U+Uwh*Jxur#4xBt=4FJGCGo%NMBXYuokNl*KaC%k*G
zu}tK){;6+2PXDfdYaO{ZbnaGE_m1*LLtr!nMnhmU1V%$(Gz6#@0(bv1g#8Yk!N8!v
F003fX5hwrv
literal 0
HcmV?d00001
diff --git a/tests/f_write_ea_no_extra_isize/name b/tests/f_write_ea_no_extra_isize/name
new file mode 100644
index 0000000..200e365
--- /dev/null
+++ b/tests/f_write_ea_no_extra_isize/name
@@ -0,0 +1 @@
+write EA when i_extra_size is zero
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 33/31] e2fsck: on read error, don't rewrite blocks past the end of the fs
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (30 preceding siblings ...)
2014-12-22 18:53 ` [PATCH 32/31] libext2fs: initialize i_extra_isize when writing EAs Darrick J. Wong
@ 2014-12-22 18:55 ` Darrick J. Wong
2014-12-22 18:55 ` [PATCH 34/31] e2fsck: fix the journal recreation message Darrick J. Wong
` (3 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 18:55 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
If e2fsck encounters a read error on a block past the end of the
filesystem, don't bother trying to "rewrite" the block. We might
still want to re-try the read to capture FS data marooned past the end
of the filesystem, but in that case e2fsck ought to move the block
back inside the filesystem.
This enables e2fuzz to detect writes past the end of the FS due to
software bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
e2fsck/ehandler.c | 5 +++++
misc/e2fuzz.sh | 7 +++++++
2 files changed, 12 insertions(+)
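
The decision described in the commit message above can be sketched as a small predicate. This is an illustration only: classify_read_error() and the enum are hypothetical names, and blocks_count stands in for the value the patch obtains from ext2fs_blocks_count(fs->super).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of a read error, for illustration only. */
enum read_error_action {
	READ_ERROR_SKIP_REWRITE,	/* block lies past the end of the fs */
	READ_ERROR_ASK_USER		/* block is inside the fs; prompt as before */
};

/*
 * Blocks are numbered from 0, so the last valid block is blocks_count - 1
 * and anything at or beyond blocks_count is outside the filesystem.
 */
static enum read_error_action classify_read_error(uint64_t block,
						  uint64_t blocks_count)
{
	if (block >= blocks_count)
		return READ_ERROR_SKIP_REWRITE;
	return READ_ERROR_ASK_USER;
}
```

Because out-of-range blocks are never rewritten, the image file cannot grow as a side effect of the error handler, which is what lets the e2fuzz.sh size comparison flag genuine writes past the end.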
diff --git a/e2fsck/ehandler.c b/e2fsck/ehandler.c
index 6dddf9c..71ca301 100644
--- a/e2fsck/ehandler.c
+++ b/e2fsck/ehandler.c
@@ -58,6 +58,11 @@ static errcode_t e2fsck_handle_read_error(io_channel channel,
printf(_("Error reading block %lu (%s). "), block,
error_message(error));
preenhalt(ctx);
+
+ /* Don't rewrite a block past the end of the FS. */
+ if (block >= ext2fs_blocks_count(fs->super))
+ return 0;
+
if (ask(ctx, _("Ignore error"), 1)) {
if (ask(ctx, _("Force rewrite"), 1))
io_channel_write_blk64(channel, block, count, data);
diff --git a/misc/e2fuzz.sh b/misc/e2fuzz.sh
index 4cb7b61..d8d9a82 100755
--- a/misc/e2fuzz.sh
+++ b/misc/e2fuzz.sh
@@ -219,6 +219,7 @@ seq 1 "${PASSES}" | while read pass; do
fi
if [ "${RUN_FSCK}" -gt 0 ]; then
cp "${PASS_IMG}" "${FSCK_IMG}"
+ pass_img_sz="$(stat -c '%s' "${PASS_IMG}")"
seq 1 "${MAX_FSCK}" | while read fsck_pass; do
echo "++ fsck pass ${fsck_pass}: $(which e2fsck) -fy ${FSCK_IMG} ${EXTENDED_FSCK_OPTS}"
@@ -250,6 +251,12 @@ seq 1 "${PASSES}" | while read pass; do
exit 2
fi
fi
+
+ fsck_img_sz="$(stat -c '%s' "${FSCK_IMG}")"
+ if [ "${fsck_img_sz}" -ne "${pass_img_sz}" ]; then
+ echo "++ fsck image size changed"
+ exit 3
+ fi
done
fsck_loop_ret=$?
if [ "${fsck_loop_ret}" -gt 0 ]; then
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 34/31] e2fsck: fix the journal recreation message
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (31 preceding siblings ...)
2014-12-22 18:55 ` [PATCH 33/31] e2fsck: on read error, don't rewrite blocks past the end of the fs Darrick J. Wong
@ 2014-12-22 18:55 ` Darrick J. Wong
2014-12-22 18:57 ` [PATCH 35/31] libext2fs: avoid pointless EA block allocation Darrick J. Wong
` (2 subsequent siblings)
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 18:55 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
When we recreate the journal, don't say that the FS "is now ext3
again", since we could be merely fixing a damaged ext4 FS journal;
this does not magically convert the FS back to ext3.
Fix the po files too, though this string hasn't been translated.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
e2fsck/unix.c | 2 +-
po/ca.po | 2 +-
po/cs.po | 2 +-
po/de.po | 2 +-
po/eo.po | 2 +-
po/es.po | 2 +-
po/fr.po | 2 +-
po/id.po | 2 +-
po/it.po | 2 +-
po/nl.po | 2 +-
po/pl.po | 2 +-
po/sv.po | 2 +-
po/tr.po | 2 +-
po/uk.po | 2 +-
po/vi.po | 2 +-
po/zh_CN.po | 2 +-
tests/f_badjour_indblks/expect.1 | 2 +-
tests/f_badjourblks/expect.1 | 2 +-
tests/f_miss_journal/expect.1 | 2 +-
tests/j_corrupt_sb_magic/expect | 2 +-
tests/j_long_trans/expect | 2 +-
tests/j_long_trans_mcsum_32bit/expect | 2 +-
tests/j_long_trans_mcsum_64bit/expect | 2 +-
23 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index d9be549..ccaa247 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1803,7 +1803,7 @@ print_unsupp_features:
log_out(ctx, "%s", _(" Done.\n"));
log_out(ctx, "%s",
_("\n*** journal has been re-created - "
- "filesystem is now ext3 again ***\n"));
+ "filesystem is journalled again ***\n"));
}
}
no_journal:
diff --git a/po/ca.po b/po/ca.po
index 02388da..c9ff513 100644
--- a/po/ca.po
+++ b/po/ca.po
@@ -3178,7 +3178,7 @@ msgstr " Fet.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** registre de canvis recreat - el sist. de fitxers torna a ser ext3 ***\n"
diff --git a/po/cs.po b/po/cs.po
index be0a410..298c94e 100644
--- a/po/cs.po
+++ b/po/cs.po
@@ -3190,7 +3190,7 @@ msgstr " Hotovo.\n"
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** žurnál by znovu vytvořen – souborový systém se opět stal ext3 ***\n"
diff --git a/po/de.po b/po/de.po
index fd96bb1..8c2078b 100644
--- a/po/de.po
+++ b/po/de.po
@@ -3159,7 +3159,7 @@ msgstr " Erledigt.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** Journal wurde wiederhergestellt - Dateisystem ist nun wieder ext3 ***\n"
diff --git a/po/eo.po b/po/eo.po
index 33e1635..6ce44ad 100644
--- a/po/eo.po
+++ b/po/eo.po
@@ -3113,7 +3113,7 @@ msgstr " Pretas.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** kaŝprotokolo rekreiĝis -- dosiersistemo estas denove ext3 ***\n"
diff --git a/po/es.po b/po/es.po
index 4137940..8dd4fe4 100644
--- a/po/es.po
+++ b/po/es.po
@@ -3235,7 +3235,7 @@ msgstr " Hecho.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** el fichero de transacciones se ha creado de nuevo ***\n"
diff --git a/po/fr.po b/po/fr.po
index 44a482f..9670f86 100644
--- a/po/fr.po
+++ b/po/fr.po
@@ -3173,7 +3173,7 @@ msgstr "Compl
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** le journal a été re-créé - le système de fichiers est de nouveau ext3 ***\n"
diff --git a/po/id.po b/po/id.po
index 0b747ff..7d56ceb 100644
--- a/po/id.po
+++ b/po/id.po
@@ -3170,7 +3170,7 @@ msgstr " Selesai.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** jurnal telah dibuat kembali - sistem berkas sekarang ext3 lagi ***\n"
diff --git a/po/it.po b/po/it.po
index f6c7374..3736d90 100644
--- a/po/it.po
+++ b/po/it.po
@@ -3220,7 +3220,7 @@ msgstr ""
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
#: e2fsck/unix.c:1687
diff --git a/po/nl.po b/po/nl.po
index dcc716d..033ef05 100644
--- a/po/nl.po
+++ b/po/nl.po
@@ -3177,7 +3177,7 @@ msgstr " voltooid.\n"
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** journal is opnieuw aangemaakt -- het bestandssysteem is nu weer ext3 ***\n"
diff --git a/po/pl.po b/po/pl.po
index 4fef6fb..7d71292 100644
--- a/po/pl.po
+++ b/po/pl.po
@@ -3177,7 +3177,7 @@ msgstr " Wykonano.\n"
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** kronika została ponownie utworzona - system plików to znowu ext3 ***\n"
diff --git a/po/sv.po b/po/sv.po
index 3391aee..a8d171f 100644
--- a/po/sv.po
+++ b/po/sv.po
@@ -3176,7 +3176,7 @@ msgstr " Klar.\n"
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** journalen har återskapats - filsystemet är nu ext3 igen ***\n"
diff --git a/po/tr.po b/po/tr.po
index 0c715c0..4cd64ba 100644
--- a/po/tr.po
+++ b/po/tr.po
@@ -3249,7 +3249,7 @@ msgstr " Tamamlandı.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** günlük yeniden oluşturuldu - dosya sistemi yeniden ext3 ***\n"
diff --git a/po/uk.po b/po/uk.po
index f93282a..7deeafb 100644
--- a/po/uk.po
+++ b/po/uk.po
@@ -3177,7 +3177,7 @@ msgstr " Виконано.\n"
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** журнал створено повторно - тепер це знову файлова система ext3 ***\n"
diff --git a/po/vi.po b/po/vi.po
index 2300c19..e08e7f5 100644
--- a/po/vi.po
+++ b/po/vi.po
@@ -3139,7 +3139,7 @@ msgstr " Xong.\n"
#: e2fsck/unix.c:1664
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
"\n"
"*** nhật ký đã được tạo lại – hệ thống tập tin lúc này là ext3 lại ***\n"
diff --git a/po/zh_CN.po b/po/zh_CN.po
index 37c4aa7..d79f348 100644
--- a/po/zh_CN.po
+++ b/po/zh_CN.po
@@ -3003,7 +3003,7 @@ msgstr "宿.\n"
#: e2fsck/unix.c:1663
msgid ""
"\n"
-"*** journal has been re-created - filesystem is now ext3 again ***\n"
+"*** journal has been re-created - filesystem is journalled again ***\n"
msgstr ""
#: e2fsck/unix.c:1687
diff --git a/tests/f_badjour_indblks/expect.1 b/tests/f_badjour_indblks/expect.1
index 7ccc59b..a326900 100644
--- a/tests/f_badjour_indblks/expect.1
+++ b/tests/f_badjour_indblks/expect.1
@@ -25,7 +25,7 @@ Recreate journal? yes
Creating journal (1024 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 11/256 files (0.0% non-contiguous), 1111/8192 blocks
diff --git a/tests/f_badjourblks/expect.1 b/tests/f_badjourblks/expect.1
index 34c6658..dc38b2b 100644
--- a/tests/f_badjourblks/expect.1
+++ b/tests/f_badjourblks/expect.1
@@ -23,7 +23,7 @@ Recreate journal? yes
Creating journal (1024 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 11/256 files (0.0% non-contiguous), 1079/8192 blocks
diff --git a/tests/f_miss_journal/expect.1 b/tests/f_miss_journal/expect.1
index 6ec8b38..77d3223 100644
--- a/tests/f_miss_journal/expect.1
+++ b/tests/f_miss_journal/expect.1
@@ -21,7 +21,7 @@ Recreate journal? yes
Creating journal (1024 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 11/256 files (0.0% non-contiguous), 1079/2048 blocks
diff --git a/tests/j_corrupt_sb_magic/expect b/tests/j_corrupt_sb_magic/expect
index 2169a15..0d2e9b4 100644
--- a/tests/j_corrupt_sb_magic/expect
+++ b/tests/j_corrupt_sb_magic/expect
@@ -26,7 +26,7 @@ Recreate journal? yes
Creating journal (1024 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 12/128 files (0.0% non-contiguous), 1092/2048 blocks
diff --git a/tests/j_long_trans/expect b/tests/j_long_trans/expect
index 7638ef1..09da441 100644
--- a/tests/j_long_trans/expect
+++ b/tests/j_long_trans/expect
@@ -100,7 +100,7 @@ Recreate journal? yes
Creating journal (8192 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 11/16384 files (0.0% non-contiguous), 14420/262144 blocks
diff --git a/tests/j_long_trans_mcsum_32bit/expect b/tests/j_long_trans_mcsum_32bit/expect
index 0d141c1..0f35126 100644
--- a/tests/j_long_trans_mcsum_32bit/expect
+++ b/tests/j_long_trans_mcsum_32bit/expect
@@ -139,7 +139,7 @@ Recreate journal? yes
Creating journal (16384 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 11/32768 files (0.0% non-contiguous), 27039/524288 blocks
diff --git a/tests/j_long_trans_mcsum_64bit/expect b/tests/j_long_trans_mcsum_64bit/expect
index 94e9925..17c2517 100644
--- a/tests/j_long_trans_mcsum_64bit/expect
+++ b/tests/j_long_trans_mcsum_64bit/expect
@@ -138,7 +138,7 @@ Recreate journal? yes
Creating journal (16384 blocks): Done.
-*** journal has been re-created - filesystem is now ext3 again ***
+*** journal has been re-created - filesystem is journalled again ***
test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
test_filesys: 11/32768 files (0.0% non-contiguous), 27057/524288 blocks
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 35/31] libext2fs: avoid pointless EA block allocation
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (32 preceding siblings ...)
2014-12-22 18:55 ` [PATCH 34/31] e2fsck: fix the journal recreation message Darrick J. Wong
@ 2014-12-22 18:57 ` Darrick J. Wong
2014-12-22 18:57 ` [PATCH 36/31] libext2fs: strengthen i_extra_isize checks when reading/writing xattrs Darrick J. Wong
2014-12-22 18:57 ` [PATCH 37/31] libext2fs: fix tdb.c mmap leak Darrick J. Wong
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 18:57 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
Use qsort to move the inlinedata attribute to the front of the list
and the empty entries to the end. Then we can use handle->count to
decide if we're done writing xattrs, which helps us to avoid the
situation where we're midway through the attribute list so we
allocate an EA block to store more, but have no idea that there's
actually nothing left in the list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/ext_attr.c | 40 +++++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 19 deletions(-)
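
A self-contained sketch of the sorting idea in the commit message above, under stated assumptions: struct attr is a stand-in for struct ext2_xattr with only a name field, and sort_attrs() is a hypothetical wrapper. The comparator here is a variation on the one in the patch: it returns 0 when both entries are empty (or both are "system.data") so that the ordering stays consistent for qsort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct ext2_xattr; a NULL name marks an empty slot. */
struct attr {
	const char *name;
};

/* Empty slots sort last, "system.data" sorts first, everything else ties. */
static int attr_compare(const void *a, const void *b)
{
	const struct attr *xa = a, *xb = b;
	int a_empty = (xa->name == NULL);
	int b_empty = (xb->name == NULL);

	if (a_empty || b_empty)
		return a_empty - b_empty;

	int a_inline = !strcmp(xa->name, "system.data");
	int b_inline = !strcmp(xb->name, "system.data");

	return b_inline - a_inline;
}

/* Sort the array and return how many leading entries are non-empty. */
static size_t sort_attrs(struct attr *attrs, size_t length)
{
	size_t count = 0;

	qsort(attrs, length, sizeof(*attrs), attr_compare);
	while (count < length && attrs[count].name != NULL)
		count++;
	return count;
}
```

Once the empty slots are packed at the end, a writer can stop as soon as it has emitted the returned number of entries rather than walking the full array length, which is the condition the commit message relies on to avoid allocating an EA block for nothing.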
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 551c1f2..8210814 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -254,22 +254,20 @@ static struct ea_name_index ea_names[] = {
{0, NULL},
};
-static void move_inline_data_to_front(struct ext2_xattr_handle *h)
+/* Push empty attributes to the end and inlinedata to the front. */
+static int attr_compare(const void *a, const void *b)
{
- struct ext2_xattr *x;
- struct ext2_xattr tmp;
-
- for (x = h->attrs + 1; x < h->attrs + h->length; x++) {
- if (!x->name)
- continue;
-
- if (strcmp(x->name, "system.data") == 0) {
- memcpy(&tmp, x, sizeof(tmp));
- memcpy(x, h->attrs, sizeof(tmp));
- memcpy(h->attrs, &tmp, sizeof(tmp));
- return;
- }
- }
+ const struct ext2_xattr *xa = a, *xb = b;
+
+ if (xa->name == NULL)
+ return +1;
+ else if (xb->name == NULL)
+ return -1;
+ else if (!strcmp(xa->name, "system.data"))
+ return -1;
+ else if (!strcmp(xb->name, "system.data"))
+ return +1;
+ return 0;
}
static const char *find_ea_prefix(int index)
@@ -530,9 +528,13 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
inode->i_extra_isize = extra;
}
- move_inline_data_to_front(handle);
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 36/31] libext2fs: strengthen i_extra_isize checks when reading/writing xattrs
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (33 preceding siblings ...)
2014-12-22 18:57 ` [PATCH 35/31] libext2fs: avoid pointless EA block allocation Darrick J. Wong
@ 2014-12-22 18:57 ` Darrick J. Wong
2014-12-22 18:57 ` [PATCH 37/31] libext2fs: fix tdb.c mmap leak Darrick J. Wong
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 18:57 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
Strengthen the i_extra_isize checks to look for obviously too-small
values before trying to operate on inode EAs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/ext_attr.c | 10 ++++++----
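tests/f_write_ea_toobig_extra_isize/expect.1 | 12 ++++++++++++
tests/f_write_ea_toobig_extra_isize/expect.2 | 7 +++++++
tests/f_write_ea_toobig_extra_isize/image.gz | Bin
tests/f_write_ea_toobig_extra_isize/name | 1 +
tests/f_write_ea_toosmall_extra_isize/expect.1 | 15 +++++++++++++++
tests/f_write_ea_toosmall_extra_isize/expect.2 | 7 +++++++
tests/f_write_ea_toosmall_extra_isize/image.gz | Bin
tests/f_write_ea_toosmall_extra_isize/name | 1 +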
9 files changed, 49 insertions(+), 4 deletions(-)
create mode 100644 tests/f_write_ea_toobig_extra_isize/expect.1
create mode 100644 tests/f_write_ea_toobig_extra_isize/expect.2
create mode 100644 tests/f_write_ea_toobig_extra_isize/image.gz
create mode 100644 tests/f_write_ea_toobig_extra_isize/name
create mode 100644 tests/f_write_ea_toosmall_extra_isize/expect.1
create mode 100644 tests/f_write_ea_toosmall_extra_isize/expect.2
create mode 100644 tests/f_write_ea_toosmall_extra_isize/image.gz
create mode 100644 tests/f_write_ea_toosmall_extra_isize/name
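
The strengthened check described above can be read as a single predicate. The following is a sketch, assuming i_extra_isize is a 16-bit field and the in-inode EA area starts with a 32-bit magic; inode_has_ea_space() is a hypothetical name, and the real code branches to the EA-block path instead of returning a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE 128u	/* same value as EXT2_GOOD_OLD_INODE_SIZE */

/*
 * Return 1 if in-inode EAs can be used, 0 if the caller should fall back
 * to an external EA block.
 */
static int inode_has_ea_space(size_t inode_size, uint16_t extra_isize)
{
	/* Too small to even cover the i_extra_isize field itself. */
	if (extra_isize < sizeof(uint16_t))
		return 0;

	/* No room left after the extra fields and the 32-bit EA magic. */
	if (inode_size <= GOOD_OLD_INODE_SIZE + extra_isize + sizeof(uint32_t))
		return 0;

	return 1;
}
```

For a 256-byte inode, an extra size of 28 passes, while 0 or 1 (too small) and 124 (no room after the magic) both fall through to the EA block, which corresponds to the toosmall and toobig test images added by this patch.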
diff --git a/lib/ext2fs/ext_attr.c b/lib/ext2fs/ext_attr.c
index 8210814..099f17d 100644
--- a/lib/ext2fs/ext_attr.c
+++ b/lib/ext2fs/ext_attr.c
@@ -535,8 +535,9 @@ errcode_t ext2fs_xattrs_write(struct ext2_xattr_handle *handle)
x = handle->attrs;
qsort(x, handle->length, sizeof(struct ext2_xattr), attr_compare);
- /* Does the inode have size for EA? */
- if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
+ /* Does the inode have space for EA? */
+ if (inode->i_extra_isize < sizeof(inode->i_extra_isize) ||
+ EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
inode->i_extra_isize +
sizeof(__u32))
goto write_ea_block;
@@ -772,8 +773,9 @@ errcode_t ext2fs_xattrs_read(struct ext2_xattr_handle *handle)
xattrs_free_keys(handle);
- /* Does the inode have size for EA? */
- if (EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
+ /* Does the inode have space for EA? */
+ if (inode->i_extra_isize < sizeof(inode->i_extra_isize) ||
+ EXT2_INODE_SIZE(handle->fs->super) <= EXT2_GOOD_OLD_INODE_SIZE +
inode->i_extra_isize +
sizeof(__u32))
goto read_ea_block;
diff --git a/tests/f_write_ea_toobig_extra_isize/expect.1 b/tests/f_write_ea_toobig_extra_isize/expect.1
new file mode 100644
index 0000000..b7e7438
--- /dev/null
+++ b/tests/f_write_ea_toobig_extra_isize/expect.1
@@ -0,0 +1,12 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Directory inode 12, block #0, offset 4: directory corrupted
+Salvage? yes
+
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 1
diff --git a/tests/f_write_ea_toobig_extra_isize/expect.2 b/tests/f_write_ea_toobig_extra_isize/expect.2
new file mode 100644
index 0000000..3b6073e
--- /dev/null
+++ b/tests/f_write_ea_toobig_extra_isize/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 0
diff --git a/tests/f_write_ea_toobig_extra_isize/image.gz b/tests/f_write_ea_toobig_extra_isize/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..291924bf62477e5f9f18c198c9d478972590f345
GIT binary patch
literal 2518
zcmb2|=3tmxGd+Zf`Ry&+Y!OEZh6m-}^`s^_@O3Vjpj4;eVQ?ceQSj)oL#Gnz1=cLm
zv~lD(l2O;tJLA~BLvcFItzwzFYgn{h1(d})6D-o-GbiuS<}oxp`B`T3;hH<T#=FhG
zzq{wg@cU}a#*@l@%ieg41!iA;zV^xsgX?F)+{Dzpj*F>nT5)~-(sL`Tb(Zh?mY%wH
z&;C%0`J1NmE2Rbfdz1F{@0;D;&gbvXtEw%1_3M;h&({2Zzh5p7m+K9)m#HxPS@6|I
z?6Khl^V`?@lUMUB55Ko?-<+WSV19-M*?=R<+k>p`b^Ol`%yL=oyVaunCJ!S6!-03@
z>$l6;d(Ez|mt|mJ`0-x!|9^g$Y5lIBBO4k|eBQY6neiUyi_`sm@2~5<H)CIM_NI*P
z3ga11%kP>2bvMK=-GBOP{lmu#eu8+mi<7v3h8*~64<tzef1<x0<V^+YNH|}}&O(Y#
zRJ9HJKbEE>E}Hx8@(%lWpV?kVvZTI8X`4o$x_J9UwX*l^efg2AelA_1oqO<KdvH`t
zly&5ir|Rp|zTN)AY@PnDZX-i)dBp$C*M4oU{40Gba{AZ&$L}XBX1blcv7`Q6S=awh
z7MHr}7dih4&T{dH7XDxI|KIcZ+m!$9p73d_{iFSR?k=6QR{N^?#{PN7y{G2oZa!)_
z!@g$G+^)akmDPWqHt+vFf6cX3=b})ZJjxplfzc2c4S~@R7!85Z5TIHJG`y>0x~TET
Joq<7t0RSPO2<HF*
literal 0
HcmV?d00001
diff --git a/tests/f_write_ea_toobig_extra_isize/name b/tests/f_write_ea_toobig_extra_isize/name
new file mode 100644
index 0000000..a5ed718
--- /dev/null
+++ b/tests/f_write_ea_toobig_extra_isize/name
@@ -0,0 +1 @@
+write EA when i_extra_size is too big for EA
diff --git a/tests/f_write_ea_toosmall_extra_isize/expect.1 b/tests/f_write_ea_toosmall_extra_isize/expect.1
new file mode 100644
index 0000000..eecfc9d
--- /dev/null
+++ b/tests/f_write_ea_toosmall_extra_isize/expect.1
@@ -0,0 +1,15 @@
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 has a extra size (1) which is invalid
+Fix? yes
+
+Pass 2: Checking directory structure
+Directory inode 12, block #0, offset 4: directory corrupted
+Salvage? yes
+
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 1
diff --git a/tests/f_write_ea_toosmall_extra_isize/expect.2 b/tests/f_write_ea_toosmall_extra_isize/expect.2
new file mode 100644
index 0000000..3b6073e
--- /dev/null
+++ b/tests/f_write_ea_toosmall_extra_isize/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (0.0% non-contiguous), 17/512 blocks
+Exit status is 0
diff --git a/tests/f_write_ea_toosmall_extra_isize/image.gz b/tests/f_write_ea_toosmall_extra_isize/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..78a01497ec729dabc9406afb5914e76ce018cbb3
GIT binary patch
literal 2517
zcmb2|=3uBxpAo{u{Pvb@wuqwy!-MkgdQy`d_&OI%P^we#Fu0MKD0uYPp;HO<0&5m&
z+BkA#nAjh1n{@2np*Wr9R<TUoH7we$0?J~Y2^Q(^nUnWu^B5YQ{4BHiaLpY(<L&0(
z-`#U#_<c2J<4NVdWpBL20<*6^Uwh?+!R<3)ZenU)$Hmk(t+>8^>A98FI?H!`+qWs~
z-hHh({x?sz3mR+Imu>#_@7`{2=kxdHRn?Zh`gO{$XKVhu-!GSk%k_uZ%TyTt%=qdf
z_SkTO`R!}{$*Xymhu>ScZ%$BuFh4_sY`~G_?Lk)eI{s$|X1Of)-D**OlZTOk;lR7{
z{o7^iy=K?f%Q7%9{CF?=|3AOWw0_smkqwO}K5yLk%y^IU#p(XO_t*8_o3XDrds9Yt
zh4GB1<#)}1x*KAb?mzvt{^8>VKS4a(#YtR1Lk@hk2a=?KKha+g@}>fHB%Ci~XCXx=
zs@jJAA4^jb7tQ^4d53+x&up(FSyJDlv`wQ=UA%pwTG{*dzWm5lKbNl1&OP|CJvb^R
z$~tn%Q}y*}-){e5wod=%xRIf^JmUZ6YrnQv{*^uzIsI$?<M$K%S+^Ze68t~st;qkU
zg-b;LyR`pUex+l=Hno2*|NT90pX>goT=8jk{gL|0-QKFv;j8i=N!Wh&o|>0?bf5Th
zzWEu;AN_E>Cs+UT!TZ1JtGBM&EQadjQQl|>jE2By2#kinXb6mk0M$ZZ!S4Ay_uDlX
H7!())`D6t8
literal 0
HcmV?d00001
diff --git a/tests/f_write_ea_toosmall_extra_isize/name b/tests/f_write_ea_toosmall_extra_isize/name
new file mode 100644
index 0000000..718c12c
--- /dev/null
+++ b/tests/f_write_ea_toosmall_extra_isize/name
@@ -0,0 +1 @@
+write EA when i_extra_size is too small to make sense
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 37/31] libext2fs: fix tdb.c mmap leak
2014-12-20 21:16 [PATCH 00/31] e2fsprogs December 2014 patchbomb Darrick J. Wong
` (34 preceding siblings ...)
2014-12-22 18:57 ` [PATCH 36/31] libext2fs: strengthen i_extra_isize checks when reading/writing xattrs Darrick J. Wong
@ 2014-12-22 18:57 ` Darrick J. Wong
35 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2014-12-22 18:57 UTC (permalink / raw)
To: tytso; +Cc: linux-ext4
When undoing an expansion of an mmap'd database while cancelling a
transaction, the tdb code prematurely decreases the variable that
tracks the file size, which leads to a region leak during the
subsequent unmap. Fix this by maintaining a separate counter for the
region size.
(This is probably unnecessary since e2undo was the only user of tdb
transactions, but I suppose we could be proactive.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
lib/ext2fs/tdb.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
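
To illustrate the leak described above outside of tdb, here is a minimal sketch that keeps the mapped length separate from the logical size. struct mapping, mapping_map() and mapping_unmap() are hypothetical names, and an anonymous mapping stands in for the database file; only mmap() and munmap() are real interfaces.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant fields of struct tdb_context. */
struct mapping {
	void	*map_ptr;
	size_t	map_size;	/* logical size; may shrink before the unmap */
	size_t	real_map_size;	/* length that was actually passed to mmap() */
};

static int mapping_map(struct mapping *m, size_t size)
{
	void *p;

	assert(size > 0);
	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	m->map_size = size;
	if (p == MAP_FAILED) {
		m->map_ptr = NULL;
		m->real_map_size = 0;
		return -1;
	}
	m->map_ptr = p;
	m->real_map_size = size;
	return 0;
}

static int mapping_unmap(struct mapping *m)
{
	if (m->map_ptr) {
		/* Use the mapped length, not the possibly reduced logical size. */
		int ret = munmap(m->map_ptr, m->real_map_size);

		if (ret != 0)
			return ret;
		m->real_map_size = 0;
	}
	m->map_ptr = NULL;
	return 0;
}
```

If a caller maps two pages and then lowers map_size to one page, as the transaction-cancel path does with the file size, mapping_unmap() still releases both pages. Passing map_size to munmap() instead would leave the second page mapped.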
diff --git a/lib/ext2fs/tdb.c b/lib/ext2fs/tdb.c
index a916768..7317288 100644
--- a/lib/ext2fs/tdb.c
+++ b/lib/ext2fs/tdb.c
@@ -246,6 +246,7 @@ struct tdb_context {
int page_size;
int max_dead_records;
bool have_transaction_lock;
+ tdb_len_t real_map_size; /* how much space has been mapped */
};
@@ -970,9 +971,10 @@ int tdb_munmap(struct tdb_context *tdb)
#ifdef HAVE_MMAP
if (tdb->map_ptr) {
- int ret = munmap(tdb->map_ptr, tdb->map_size);
+ int ret = munmap(tdb->map_ptr, tdb->real_map_size);
if (ret != 0)
return ret;
+ tdb->real_map_size = 0;
}
#endif
tdb->map_ptr = NULL;
@@ -995,10 +997,12 @@ void tdb_mmap(struct tdb_context *tdb)
*/
if (tdb->map_ptr == MAP_FAILED) {
+ tdb->real_map_size = 0;
tdb->map_ptr = NULL;
TDB_LOG((tdb, TDB_DEBUG_WARNING, "tdb_mmap failed for size %d (%s)\n",
tdb->map_size, strerror(errno)));
}
+ tdb->real_map_size = tdb->map_size;
} else {
tdb->map_ptr = NULL;
}
^ permalink raw reply related [flat|nested] 42+ messages in thread