From: Theodore Ts'o <tytso@mit.edu>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file
Date: Tue, 5 May 2015 10:24:34 -0400
Message-ID: <20150505142434.GB9260@thunk.org>
In-Reply-To: <20150402023532.25243.532.stgit@birch.djwong.org>
On Wed, Apr 01, 2015 at 07:35:32PM -0700, Darrick J. Wong wrote:
> The existing undo file format (which is based on tdb) has many
> problems. First, its comparison of superblock fields is ineffective,
> since the last mount time is only written by the kernel, not the tools
> (which means that undo files can be applied out of order, thus
> corrupting the filesystem); block numbers are written in CPU byte
> order, which will cause silent failures if an undo file is moved from
> one type of system to another; using the tdb database costs us an
> enormous amount of CPU overhead to maintain the key data structure,
> and finally, the tdb database is unable to deal with databases larger
> than 2GB. (Upstream tdb 1.2.12 can handle 4GB, but upgrading a 2TB FS
> to 64bit,metadata_csum easily produces 2.9GB of undo files, so we
> might as well move off of tdb now.)
>
> The last problem is fatal if you want to use tune2fs to turn on
> metadata checksumming, since that rewrites every block on the
> filesystem, which can easily produce a many-gigabyte undo file, which
> of course is unreadable and therefore the operation cannot be undone.
>
> Therefore, rip all of that out in favor of writing to a flat file.
> Old blocks are appended to a file and the index is written to the end
> when we're done. This is much faster than spending a considerable
> amount of time maintaining a hash index as we go, and it drops the
> runtime overhead of tune2fs -O metadata_csum from ~45min to ~20
> seconds on a 2TB filesystem.
>
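A minimal sketch of the append-then-index layout described in the paragraph
above, assuming a toy format: fixed-size data blocks first, then one 8-byte
big-endian block number per saved block, then an 8-byte big-endian key count
as a trailer. Every name here (sketch_undo, sketch_save_block, and so on) is
hypothetical, and this is not the actual e2undo on-disk format; it only
illustrates why appending and indexing at close is cheap, and how a fixed
byte order keeps the file portable across CPU types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16   /* tiny block size, for illustration only */
#define SKETCH_MAX_KEYS   64

/* Hypothetical writer state; not the real undo io manager structures. */
struct sketch_undo {
    FILE *fp;
    uint64_t keys[SKETCH_MAX_KEYS]; /* fs block number of each saved block */
    size_t nkeys;
};

/* Store a 64-bit value big-endian so the file reads the same on any CPU. */
static void put_be64(unsigned char out[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Append the old contents of one block; keep its number in memory only. */
static int sketch_save_block(struct sketch_undo *u, uint64_t blk,
                             const void *old_data)
{
    assert(u->fp != NULL);
    if (u->nkeys >= SKETCH_MAX_KEYS)
        return -1;
    if (fwrite(old_data, SKETCH_BLOCK_SIZE, 1, u->fp) != 1)
        return -1;
    u->keys[u->nkeys++] = blk;
    return 0;
}

/* On close: write every key, then a trailer holding the key count. */
static int sketch_close(struct sketch_undo *u)
{
    unsigned char buf[8];

    for (size_t i = 0; i < u->nkeys; i++) {
        put_be64(buf, u->keys[i]);
        if (fwrite(buf, sizeof(buf), 1, u->fp) != 1)
            return -1;
    }
    put_be64(buf, (uint64_t)u->nkeys);
    if (fwrite(buf, sizeof(buf), 1, u->fp) != 1)
        return -1;
    return fflush(u->fp) == 0 ? 0 : -1;
}

/* Reader: read the trailer, then key number idx, then its data block. */
static int sketch_read_entry(FILE *fp, size_t idx, uint64_t *blk, void *data)
{
    unsigned char buf[8];
    uint64_t n;

    if (fseek(fp, -8L, SEEK_END) != 0 || fread(buf, 8, 1, fp) != 1)
        return -1;
    n = get_be64(buf);
    if (idx >= n)
        return -1;
    /* The index starts right after the n data blocks. */
    if (fseek(fp, (long)(n * SKETCH_BLOCK_SIZE + idx * 8), SEEK_SET) != 0 ||
        fread(buf, 8, 1, fp) != 1)
        return -1;
    *blk = get_be64(buf);
    if (fseek(fp, (long)(idx * SKETCH_BLOCK_SIZE), SEEK_SET) != 0 ||
        fread(data, SKETCH_BLOCK_SIZE, 1, fp) != 1)
        return -1;
    return 0;
}
```

In this sketch the writer does no lookups and no rebalancing while blocks are
being saved; the only per-block cost is one append and one array store, and
the index is written in a single pass at close.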
> I have a few reasons that factored in my decision not to repurpose the
> jbd2 file format for undo files. First, undo files are limited to
> 2^32 blocks (16TB) which some day might not serve us well. Second,
> the journal block size is tied to the file system block size, but
> mke2fs wants to be able to back up big chunks of old device contents.
> This would require large changes to the e2fsck journal replay code,
> which itself is derived from the kernel jbd2 driver, which I'd rather
> not destabilize. Third, I want to require undo files to store the FS
> superblock at the end of undo file creation so that e2undo can be
> reasonably sure that an undo file is supposed to apply against the
> given block device, and doing so would require changes to the jbd2
> format. Fourth, it didn't seem like a good idea that external
> journals should resemble undo files so closely.
>
> v2: Provide a state bit that is only set when the undo channel is
> closed correctly so we can warn the user about potentially incomplete
> undo files. Straighten out the superblock handling so that undo files
> won't be confused for real ext* FS images. Record multi-block runs in
> each block key to reduce overhead even further. Support reopening an
> undo file so that we can combine multiple FS operations into one
> (overall smaller) transaction file, which will be easier to manage.
> Flush the undo index data if the program should terminate
> unexpectedly. Update the ext4 superblock bits if errors are found or
> -f is given, to encourage fsck to do a full run the next time it's
> invoked.
> Enable undoing the undo.
>
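The "multi-block runs in each block key" item in the v2 notes above can be
pictured roughly as follows. This is a sketch under assumptions: the key
layout (a start block plus a count) and the names sketch_run_key and
sketch_add_block are hypothetical, not taken from the patch. The idea shown
is only that a block immediately following the previous run extends that run
instead of consuming a new key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: `count` consecutive blocks beginning at `start`. */
struct sketch_run_key {
    uint64_t start;
    uint32_t count;
};

/*
 * Record that block `blk` was saved. If it directly follows the last run,
 * grow that run; otherwise start a new key. Returns the new key count, or
 * 0 if a new key is needed but the array is already full.
 */
static size_t sketch_add_block(struct sketch_run_key *keys, size_t nkeys,
                               size_t max_keys, uint64_t blk)
{
    assert(keys != NULL);
    if (nkeys > 0) {
        struct sketch_run_key *last = &keys[nkeys - 1];

        if (last->start + last->count == blk && last->count < UINT32_MAX) {
            last->count++;
            return nkeys;
        }
    }
    if (nkeys >= max_keys)
        return 0;
    keys[nkeys].start = blk;
    keys[nkeys].count = 1;
    return nkeys + 1;
}
```

With this scheme, saving blocks 100, 101 and 102 in order uses one key
(start 100, count 3) rather than three, which is where the index overhead
reduction would come from when an operation rewrites long contiguous ranges.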
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Applied, thanks.
- Ted