From: Theodore Ts'o <tytso@mit.edu>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file
Date: Tue, 5 May 2015 10:24:34 -0400
Message-ID: <20150505142434.GB9260@thunk.org>
In-Reply-To: <20150402023532.25243.532.stgit@birch.djwong.org>

On Wed, Apr 01, 2015 at 07:35:32PM -0700, Darrick J. Wong wrote:
> The existing undo file format (which is based on tdb) has many
> problems. First, its comparison of superblock fields is ineffective,
> since the last mount time is only written by the kernel, not the tools
> (which means that undo files can be applied out of order, thus
> corrupting the filesystem); block numbers are written in CPU byte
> order, which will cause silent failures if an undo file is moved from
> one type of system to another; using the tdb database costs us an
> enormous amount of CPU overhead to maintain the key data structure,
> and finally, the tdb database is unable to deal with databases larger
> than 2GB. (Upstream tdb 1.2.12 can handle 4GB, but upgrading a 2TB FS
> to 64bit,metadata_csum easily produces 2.9GB of undo files, so we
> might as well move off of tdb now.)
>
> The last problem is fatal if you want to use tune2fs to turn on
> metadata checksumming, since that rewrites every block on the
> filesystem, which can easily produce a many-gigabyte undo file, which
> of course is unreadable and therefore the operation cannot be undone.
>
> Therefore, rip all of that out in favor of writing to a flat file.
> Old blocks are appended to a file and the index is written to the end
> when we're done. This implementation is much faster than wasting a
> considerable amount of time trying to maintain a hash index, which
> drops the runtime overhead of tune2fs -O metadata_csum from ~45min
> to ~20 seconds on a 2TB filesystem.
>
> I have a few reasons that factored in my decision not to repurpose the
> jbd2 file format for undo files. First, undo files are limited to
> 2^32 blocks (16TB) which some day might not serve us well. Second,
> the journal block size is tied to the file system block size, but
> mke2fs wants to be able to back up big chunks of old device contents.
> This would require large changes to the e2fsck journal replay code,
> which itself is derived from the kernel jbd2 driver, which I'd rather
> not destabilize. Third, I want to require undo files to store the FS
> superblock at the end of undo file creation so that e2undo can be
> reasonably sure that an undo file is supposed to apply against the
> given block device, and doing so would require changes to the jbd2
> format. Fourth, it didn't seem like a good idea that external
> journals should resemble undo files so closely.
>
> v2: Provide a state bit that is only set when the undo channel is
> closed correctly so we can warn the user about potentially incomplete
> undo files. Straighten out the superblock handling so that undo files
> won't be confused for real ext* FS images. Record multi-block runs in
> each block key to reduce overhead even further. Support reopening an
> undo file so that we can combine multiple FS operations into one
> (overall smaller) transaction file, which will be easier to manage.
> Flush the undo index data if the program should terminate
> unexpectedly. Update the ext4 superblock bits if errors or -f is
> found to encourage fsck to do a full run the next time it's invoked.
> Enable undoing the undo.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.
- Ted
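
As a rough illustration of the scheme described in the quoted commit
message (old blocks appended to a flat file, multi-block runs recorded
per key, block numbers stored in a fixed byte order, index written at
the end), here is a minimal sketch in C. It is not the format
implemented by the patch: the sketch_* names, the 12-byte index record,
and the 4-byte trailer are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_KEYS 64	/* arbitrary limit for the sketch */

/* Hypothetical in-memory key: one run of consecutive saved blocks. */
struct sketch_key {
	uint64_t first_blk;	/* first filesystem block of the run */
	uint32_t nblocks;	/* number of blocks in the run */
};

/* Hypothetical undo state; none of this mirrors the real undo_io code. */
struct sketch_undo {
	FILE *fp;		/* flat file receiving old block contents */
	uint32_t blocksize;
	uint32_t nkeys;
	struct sketch_key keys[SKETCH_MAX_KEYS];
};

/* Store the low n bytes of v big-endian, independent of CPU byte order. */
static void put_be(unsigned char *p, uint64_t v, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Read an n-byte big-endian value. */
static uint64_t get_be(const unsigned char *p, int n)
{
	uint64_t v = 0;

	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Append one block's old contents; extend the last run if contiguous. */
static int sketch_save_block(struct sketch_undo *u, uint64_t blk,
			     const void *old)
{
	struct sketch_key *last;

	assert(u && u->fp && old);
	last = u->nkeys ? &u->keys[u->nkeys - 1] : NULL;
	if (last && last->first_blk + last->nblocks != blk)
		last = NULL;	/* not contiguous, so a new key is needed */
	if (!last && u->nkeys == SKETCH_MAX_KEYS)
		return -1;
	if (fwrite(old, u->blocksize, 1, u->fp) != 1)
		return -1;
	if (last) {
		last->nblocks++;
		return 0;
	}
	u->keys[u->nkeys].first_blk = blk;
	u->keys[u->nkeys].nblocks = 1;
	u->nkeys++;
	return 0;
}

/* Write the index after the block data, then a trailer with the key count. */
static int sketch_finish(struct sketch_undo *u)
{
	unsigned char rec[12];

	for (uint32_t i = 0; i < u->nkeys; i++) {
		put_be(rec, u->keys[i].first_blk, 8);
		put_be(rec + 8, u->keys[i].nblocks, 4);
		if (fwrite(rec, sizeof(rec), 1, u->fp) != 1)
			return -1;
	}
	put_be(rec, u->nkeys, 4);
	if (fwrite(rec, 4, 1, u->fp) != 1)
		return -1;
	return fflush(u->fp) == 0 ? 0 : -1;
}

/* Read the key count back from the trailer at the end of the file. */
static long sketch_read_nkeys(FILE *fp)
{
	unsigned char t[4];

	if (fseek(fp, -4L, SEEK_END) != 0 || fread(t, sizeof(t), 1, fp) != 1)
		return -1;
	return (long)get_be(t, 4);
}
```

In this sketch a caller would invoke sketch_save_block() before
overwriting each block and sketch_finish() once at the end. The index
lives in memory until then, so nothing like a hash table has to be
updated on disk for every write.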