linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 14/35] e2undo: ditch tdb file, write everything to a flat file
Date: Tue, 5 May 2015 10:24:34 -0400
Message-ID: <20150505142434.GB9260@thunk.org>
In-Reply-To: <20150402023532.25243.532.stgit@birch.djwong.org>

On Wed, Apr 01, 2015 at 07:35:32PM -0700, Darrick J. Wong wrote:
> The existing undo file format (which is based on tdb) has many
> problems.  First, its comparison of superblock fields is ineffective,
> since the last mount time is only written by the kernel, not the tools
> (which means that undo files can be applied out of order, thus
> corrupting the filesystem).  Second, block numbers are written in CPU
> byte order, which will cause silent failures if an undo file is moved
> from one type of system to another.  Third, using the tdb database
> costs us an enormous amount of CPU overhead to maintain the key data
> structure.  Finally, the tdb database is unable to deal with databases
> larger than 2GB.  (Upstream tdb 1.2.12 can handle 4GB, but upgrading a
> 2TB FS to 64bit,metadata_csum easily produces 2.9GB of undo files, so
> we might as well move off of tdb now.)
> 
> The last problem is fatal if you want to use tune2fs to turn on
> metadata checksumming, since that rewrites every block on the
> filesystem, which can easily produce a many-gigabyte undo file, which
> of course is unreadable and therefore the operation cannot be undone.
> 
> Therefore, rip all of that out in favor of writing to a flat file.
> Old blocks are appended to a file and the index is written to the end
> when we're done.  This is much faster than maintaining a hash index
> as we go: the runtime overhead of tune2fs -O metadata_csum drops
> from ~45min to ~20 seconds on a 2TB filesystem.
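
The "append old blocks, write the index at the end" idea can be sketched as
follows. This is an illustration only, not the actual undo file layout: the
record formats, the trailer, the 4096-byte block size, and every name below
are assumptions made for the example.

```python
import io
import struct

BLOCK_SIZE = 4096                    # assumed block size
INDEX_ENTRY = struct.Struct(">QQ")   # (fs block number, offset in undo file)
TRAILER = struct.Struct(">QQ")       # (offset of index, number of entries)


class UndoWriter:
    """Hypothetical append-only undo writer with a trailing index."""

    def __init__(self, out):
        self.out = out
        self.index = []      # held in memory, written once at close()
        self.saved = set()   # blocks whose original contents are already saved

    def save_old_block(self, blk, old_data):
        # Only the first (original) contents of a block are worth keeping.
        if blk in self.saved:
            return
        self.saved.add(blk)
        self.index.append((blk, self.out.tell()))
        self.out.write(old_data)

    def close(self):
        # The index is appended after all block data, followed by a
        # fixed-size trailer that tells a reader where the index starts.
        index_offset = self.out.tell()
        for entry in self.index:
            self.out.write(INDEX_ENTRY.pack(*entry))
        self.out.write(TRAILER.pack(index_offset, len(self.index)))


def read_index(raw):
    """Locate the index via the trailer and return its entries."""
    index_offset, count = TRAILER.unpack_from(raw, len(raw) - TRAILER.size)
    return [
        INDEX_ENTRY.unpack_from(raw, index_offset + i * INDEX_ENTRY.size)
        for i in range(count)
    ]


buf = io.BytesIO()
writer = UndoWriter(buf)
writer.save_old_block(7, b"a" * BLOCK_SIZE)
writer.save_old_block(7, b"b" * BLOCK_SIZE)   # ignored: block 7 already saved
writer.save_old_block(3, b"c" * BLOCK_SIZE)
writer.close()
entries = read_index(buf.getvalue())
```

Each saved block costs one sequential write plus a list append, with no
on-disk key structure to maintain while the tool runs.
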
> 
> I have a few reasons that factored in my decision not to repurpose the
> jbd2 file format for undo files.  First, jbd2 files are limited to
> 2^32 blocks (16TB) which some day might not serve us well.  Second,
> the journal block size is tied to the file system block size, but
> mke2fs wants to be able to back up big chunks of old device contents.
> This would require large changes to the e2fsck journal replay code,
> which itself is derived from the kernel jbd2 driver, which I'd rather
> not destabilize.  Third, I want to require undo files to store the FS
> superblock at the end of undo file creation so that e2undo can be
> reasonably sure that an undo file is supposed to apply against the
> given block device, and doing so would require changes to the jbd2
> format.  Fourth, it didn't seem like a good idea that external
> journals should resemble undo files so closely.
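
The 16TB figure quoted for the 2^32-block limit matches a 4KiB block size,
which the message does not state and is assumed here. A quick check of the
arithmetic, with hypothetical names:

```python
MAX_BLOCKS = 2 ** 32   # block-count limit cited in the message
TIB = 2 ** 40


def max_size_tib(block_size: int) -> int:
    """Largest addressable size, in TiB, for a given block size in bytes."""
    return MAX_BLOCKS * block_size // TIB


limit_4k = max_size_tib(4096)   # 2^32 * 2^12 = 2^44 bytes
limit_1k = max_size_tib(1024)   # 2^32 * 2^10 = 2^42 bytes
```

With smaller blocks the ceiling is proportionally lower.
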
> 
> v2: Provide a state bit that is only set when the undo channel is
> closed correctly so we can warn the user about potentially incomplete
> undo files.  Straighten out the superblock handling so that undo files
> won't be confused for real ext* FS images.  Record multi-block runs in
> each block key to reduce overhead even further.  Support reopening an
> undo file so that we can combine multiple FS operations into one
> (overall smaller) transaction file, which will be easier to manage.
> Flush the undo index data if the program should terminate
> unexpectedly.  Update the ext4 superblock bits if errors are found or
> -f is given, to encourage fsck to do a full run the next time it's
> invoked.  Enable undoing the undo.
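
Recording multi-block runs in each key, as the v2 notes mention, amounts to
storing (first block, length) pairs instead of one key per block. A minimal
sketch under that reading; the function name is hypothetical, and sorting
the input is a simplification that the real code may not make:

```python
def coalesce_runs(blocks):
    """Collapse block numbers into (first_block, run_length) pairs."""
    runs = []
    for blk in sorted(set(blocks)):
        if runs and runs[-1][0] + runs[-1][1] == blk:
            runs[-1][1] += 1          # blk extends the current run
        else:
            runs.append([blk, 1])     # blk starts a new run
    return [tuple(run) for run in runs]


runs = coalesce_runs([10, 11, 12, 40, 13, 41])
```

Six blocks collapse into two keys here, and the saving grows with the length
of the contiguous ranges an operation touches.
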
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Applied, thanks.

					- Ted
