* [Q] ext3 mkfs: zeroing journal blocks @ 2009-05-11 15:03 Alexander Shishkin 2009-05-11 17:58 ` Eric Sandeen 0 siblings, 1 reply; 12+ messages in thread From: Alexander Shishkin @ 2009-05-11 15:03 UTC (permalink / raw) To: linux-ext4 Hi, Here's a question regarding ext3, jbd and mkfs. I'm not 100% confident this is the right list, got it from MAINTAINERS for ext3 and jbd. Please correct me if this is wrong. As far as I could tell from brief looking at jbd code, it seemed to me that the only thing that has to be reset during the filesystem creation time is journal superblock (talking about the default case when journal resides within an ext3 partition). However, currently mke2fs -j would zero every journal block no matter what. So, the question is: can this zeroing really be avoided in mkfs? I tried commenting-out ext2fs_zero_block() in mkjournal_proc() and it seems to speed up mkfs a great deal while the kernel is still able to mount the partition afterwards. Also, for the sake of experiment, I filled the partition with urandom's contents before doing the modified mkfs and it still works. My next step in this direction would be to go through jbd code, but before doing that, I thought, I'd ask here. Please CC me in replies as I'm not (yet) subscribed. Regards, -- Alex ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-11 15:03 [Q] ext3 mkfs: zeroing journal blocks Alexander Shishkin @ 2009-05-11 17:58 ` Eric Sandeen 2009-05-11 18:20 ` Andreas Dilger 0 siblings, 1 reply; 12+ messages in thread From: Eric Sandeen @ 2009-05-11 17:58 UTC (permalink / raw) To: Alexander Shishkin; +Cc: linux-ext4 Alexander Shishkin wrote: > Hi, > > Here's a question regarding ext3, jbd and mkfs. I'm not 100% > confident this is the right list, got it from MAINTAINERS for ext3 > and jbd. Please correct me if this is wrong. > > As far as I could tell from brief looking at jbd code, it seemed to > me that the only thing that has to be reset during the filesystem > creation time is journal superblock (talking about the default case > when journal resides within an ext3 partition). However, currently > mke2fs -j would zero every journal block no matter what. So, the > question is: can this zeroing really be avoided in mkfs? I tried > commenting-out ext2fs_zero_block() in mkjournal_proc() and it seems > to speed up mkfs a great deal while the kernel is still able to mount > the partition afterwards. Also, for the sake of experiment, I filled > the partition with urandom's contents before doing the modified mkfs > and it still works. My next step in this direction would be to go > through jbd code, but before doing that, I thought, I'd ask here. > > Please CC me in replies as I'm not (yet) subscribed. > > Regards, Hi Alexander - Yes, this is a fine list for these questions. Looks like commit 16ed5b3af43c72f60991222b9d7ab65cf53f203d added the block zeroing at the same time as external journal support went in way back in 2001 ... IOW, it wasn't added later to fix anything in particular. Also even at that time, internal journals were not zeroed, so it's not like that was removed in the meantime. Seems extraneous to me, but ... maybe Ted knows more ... -Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-11 17:58 ` Eric Sandeen @ 2009-05-11 18:20 ` Andreas Dilger 2009-05-11 18:44 ` Eric Sandeen 0 siblings, 1 reply; 12+ messages in thread From: Andreas Dilger @ 2009-05-11 18:20 UTC (permalink / raw) To: Eric Sandeen; +Cc: Alexander Shishkin, linux-ext4 On May 11, 2009 12:58 -0500, Eric Sandeen wrote: > Alexander Shishkin wrote: > > As far as I could tell from brief looking at jbd code, it seemed to > > me that the only thing that has to be reset during the filesystem > > creation time is journal superblock (talking about the default case > > when journal resides within an ext3 partition). However, currently > > mke2fs -j would zero every journal block no matter what. So, the > > question is: can this zeroing really be avoided in mkfs? I tried > > commenting-out ext2fs_zero_block() in mkjournal_proc() and it seems > > to speed up mkfs a great deal while the kernel is still able to mount > > the partition afterwards. Also, for the sake of experiment, I filled > > the partition with urandom's contents before doing the modified mkfs > > and it still works. My next step in this direction would be to go > > through jbd code, but before doing that, I thought, I'd ask here. > > Looks like commit 16ed5b3af43c72f60991222b9d7ab65cf53f203d added the > block zeroing at the same time as external journal support went in way > back in 2001 ... IOW, it wasn't added later to fix anything in > particular. Also even at that time, internal journals were not zeroed, > so it's not like that was removed in the meantime. Seems extraneous to > me, but ... maybe Ted knows more ... The reason that the journal is zeroed is because there is some chance that old (valid at the time) transaction headers and commit blocks might be in the journal and could accidentally be "recovered" and cause bad corruption of the filesystem. That said, the chance of this is relatively low, so if you are feeling lucky the zeroing of the journal could be skipped. This accidental journal recovery could only happen if a valid transaction completed at block X, then there was a stale transaction from the filesystem's previous life starting at block X+1 with the next consecutive transaction number. It is pretty unlikely I think. We could avoid this problem entirely if the journal checksum was computed to include the JBD UUID or something in the checksum value, since even old transactions with the correct location and transaction ID would fail the checksum because the new JBD UUID would be different. This could be implemented as part of the "V2 per-block journal checksum", if anyone had time to work on that. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-11 18:20 ` Andreas Dilger @ 2009-05-11 18:44 ` Eric Sandeen 2009-05-11 19:35 ` Theodore Tso ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Eric Sandeen @ 2009-05-11 18:44 UTC (permalink / raw) To: Andreas Dilger; +Cc: Alexander Shishkin, linux-ext4 Andreas Dilger wrote: > The reason that the journal is zeroed is because there is some chance > that old (valid at the time) transaction headers and commit blocks might > be in the journal and could accidentally be "recovered" and cause bad > corruption of the filesystem. But I guess the question is, why isn't a normal internal log zeroed? If I'm reading it right only external logs get this treatment, and I think that's what generated the original question from Alexander. -Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-11 18:44 ` Eric Sandeen @ 2009-05-11 19:35 ` Theodore Tso 2009-05-11 19:35 ` Andreas Dilger 2009-05-12 11:55 ` Alexander Shishkin 2 siblings, 0 replies; 12+ messages in thread From: Theodore Tso @ 2009-05-11 19:35 UTC (permalink / raw) To: Eric Sandeen; +Cc: Andreas Dilger, Alexander Shishkin, linux-ext4 On Mon, May 11, 2009 at 01:44:18PM -0500, Eric Sandeen wrote: > Andreas Dilger wrote: > > > The reason that the journal is zeroed is because there is some chance > > that old (valid at the time) transaction headers and commit blocks might > > be in the journal and could accidentally be "recovered" and cause bad > > corruption of the filesystem. > > But I guess the question is, why isn't a normal internal log zeroed? > > If I'm reading it right only external logs get this treatment, and I > think that's what generated the original question from Alexander. Internal journals are indeed cleared. Check out write_journal_inode() in lib/ext2fs/mkjournal.c, which calls ext2fs_block_iterate() passing in the callback function mkjournal_proc(), which calls ext2fs_zero_blocks(). - Ted ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-11 18:44 ` Eric Sandeen 2009-05-11 19:35 ` Theodore Tso @ 2009-05-11 19:35 ` Andreas Dilger 2009-05-12 11:55 ` Alexander Shishkin 2 siblings, 0 replies; 12+ messages in thread From: Andreas Dilger @ 2009-05-11 19:35 UTC (permalink / raw) To: Eric Sandeen; +Cc: Alexander Shishkin, linux-ext4 On May 11, 2009 13:44 -0500, Eric Sandeen wrote: > Andreas Dilger wrote: > > The reason that the journal is zeroed is because there is some chance > > that old (valid at the time) transaction headers and commit blocks might > > be in the journal and could accidentally be "recovered" and cause bad > > corruption of the filesystem. > > But I guess the question is, why isn't a normal internal log zeroed? > > If I'm reading it right only external logs get this treatment, and I > think that's what generated the original question from Alexander. Hmm, possibly because when ext3 was first allocated the internal journal created was "dd if=/dev/zero of=/mnt/fs/.journal bs=1M count={jnl_size}" on the filesystem mounted as ext2, so normal filesystem IO would handle the zeroing of the blocks. Even today if tune2fs adds a journal to a filesystem it does the zero filling of the journal. Looking at the mke2fs code it also appears to be doing zeroing of the journal inode in: mke2fs ->ext2fs_add_journal_inode ->write_journal_inode ->ext2fs_block_iterate ->mkjournal_proc (increment zero_count) ->ext2fs_zero_blocks Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-11 18:44 ` Eric Sandeen 2009-05-11 19:35 ` Theodore Tso 2009-05-11 19:35 ` Andreas Dilger @ 2009-05-12 11:55 ` Alexander Shishkin 2009-05-12 12:13 ` Theodore Tso 2 siblings, 1 reply; 12+ messages in thread From: Alexander Shishkin @ 2009-05-12 11:55 UTC (permalink / raw) To: Eric Sandeen; +Cc: Andreas Dilger, linux-ext4 On 11 May 2009 21:44, Eric Sandeen <sandeen@redhat.com> wrote: > Andreas Dilger wrote: > >> The reason that the journal is zeroed is because there is some chance >> that old (valid at the time) transaction headers and commit blocks might >> be in the journal and could accidentally be "recovered" and cause bad >> corruption of the filesystem. > > But I guess the question is, why isn't a normal internal log zeroed? > > If I'm reading it right only external logs get this treatment, and I > think that's what generated the original question from Alexander. My concern was basically if it is safe to skip zeroing for internal journal. Regards, -- Alex ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-12 11:55 ` Alexander Shishkin @ 2009-05-12 12:13 ` Theodore Tso 2009-05-12 12:49 ` Alexander Shishkin 0 siblings, 1 reply; 12+ messages in thread From: Theodore Tso @ 2009-05-12 12:13 UTC (permalink / raw) To: Alexander Shishkin; +Cc: Eric Sandeen, Andreas Dilger, linux-ext4 On Tue, May 12, 2009 at 02:55:10PM +0300, Alexander Shishkin wrote: > On 11 May 2009 21:44, Eric Sandeen <sandeen@redhat.com> wrote: > > Andreas Dilger wrote: > > > >> The reason that the journal is zeroed is because there is some chance > >> that old (valid at the time) transaction headers and commit blocks might > >> be in the journal and could accidentally be "recovered" and cause bad > >> corruption of the filesystem. > > > > But I guess the question is, why isn't a normal internal log zeroed? > > > > If I'm reading it right only external logs get this treatment, and I > > think that's what generated the original question from Alexander. > > My concern was basically if it is safe to skip zeroing for internal journal. Strictly speaking, no. Most of the time you'll get lucky. The place where you will get into trouble will be is if there is leftover uninitialized garbage (particularly if you are reformatting an existing ext3/4 filesystem) that looks like a journal log block, with the correct journal transaction number, *and* the system crashes before the journal has been completely written through at least once. What precisely is your concern? Normally the journal isn't that big, and it's a contiguous write --- so it doesn't take that long. Are you worried about the time it takes, or trying to avoid some writes to an SSD, or some other concern? If we know it's an SSD, where reads are fast, and writes are slow, I suppose we could scan the disk looking for potentially dangerous blocks and zero them manually. It's really not clear it's worth the effort though. - Ted ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-12 12:13 ` Theodore Tso @ 2009-05-12 12:49 ` Alexander Shishkin 2009-05-12 21:04 ` Theodore Tso 2009-07-29 15:58 ` [PATCH] [RFC] mkjournal: zero journal blocks only when necessary Alexander Shishkin 0 siblings, 2 replies; 12+ messages in thread From: Alexander Shishkin @ 2009-05-12 12:49 UTC (permalink / raw) To: Theodore Tso; +Cc: Eric Sandeen, Andreas Dilger, linux-ext4 On 12 May 2009 15:13, Theodore Tso <tytso@mit.edu> wrote: >> My concern was basically if it is safe to skip zeroing for internal journal. > > Strictly speaking, no. Most of the time you'll get lucky. The place > where you will get into trouble will be is if there is leftover > uninitialized garbage (particularly if you are reformatting an > existing ext3/4 filesystem) that looks like a journal log block, with > the correct journal transaction number, *and* the system crashes > before the journal has been completely written through at least once. So, what Andreas explained yesterday also applies to the internal log case. I see. Would you say it's possible to prevent this, for instance somehow say, by means checksums as Andreas suggested? > What precisely is your concern? Normally the journal isn't that big, > and it's a contiguous write --- so it doesn't take that long. Are you > worried about the time it takes, or trying to avoid some writes to an > SSD, or some other concern? If we know it's an SSD, where reads are It's an mmc and it (mkfs) runs almost two times faster without zeroing the journal. The only thing I'm worried about is the time that it takes for mke2fs -j to complete. I've done some caching trickery to unix_io.c which I'm going to post here separately, but most of the time seems to be taken by the journal. > fast, and writes are slow, I suppose we could scan the disk looking > for potentially dangerous blocks and zero them manually. It's really > not clear it's worth the effort though. Hm. This might work, I'll look into it. Thanks for the idea! Regards, -- Alex -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Q] ext3 mkfs: zeroing journal blocks 2009-05-12 12:49 ` Alexander Shishkin @ 2009-05-12 21:04 ` Theodore Tso 2009-07-29 15:58 ` [PATCH] [RFC] mkjournal: zero journal blocks only when necessary Alexander Shishkin 1 sibling, 0 replies; 12+ messages in thread From: Theodore Tso @ 2009-05-12 21:04 UTC (permalink / raw) To: Alexander Shishkin; +Cc: Eric Sandeen, Andreas Dilger, linux-ext4 On Tue, May 12, 2009 at 03:49:30PM +0300, Alexander Shishkin wrote: > > So, what Andreas explained yesterday also applies to the internal log > case. I see. Would you say it's possible to prevent this, for instance > somehow say, by means checksums as Andreas suggested? > It's *possible*, but it's not a trivial amount of work; it requires both kernel and userspace changes, though. > It's an mmc and it (mkfs) runs almost two times faster without zeroing > the journal. The only thing I'm worried about is the time that it > takes for mke2fs -j to complete. I've done some caching trickery to > unix_io.c which I'm going to post here separately, but most of the > time seems to be taken by the journal. But why do you care about the time it akes for mke2fs -j to complete? How much time is it taking? Normally mke2fs isn't one of those programs which gets run all the time.... - Ted ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] [RFC] mkjournal: zero journal blocks only when necessary 2009-05-12 12:49 ` Alexander Shishkin 2009-05-12 21:04 ` Theodore Tso @ 2009-07-29 15:58 ` Alexander Shishkin 2009-07-29 17:16 ` Theodore Tso 1 sibling, 1 reply; 12+ messages in thread From: Alexander Shishkin @ 2009-07-29 15:58 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-ext4, Alexander Shishkin Will something like this do? The only blocks that might theoretically (although very unlikely) be dangerous for newly-created filesystem's integrity are those that still contain valid signatures, others can be safely skipped. Since reads are generally faster (or at least, not slower), this gives some performance increase during mkfs run. Signed-off-by: Alexander Shishkin <alexander.shishckin@gmail.com> --- lib/ext2fs/mkjournal.c | 49 ++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 43 insertions(+), 6 deletions(-) diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c index f5a9dba..bb6e748 100644 --- a/lib/ext2fs/mkjournal.c +++ b/lib/ext2fs/mkjournal.c @@ -144,6 +144,44 @@ errout: * attempt to free the static zeroizing buffer. (This is to keep * programs that check for memory leaks happy.) */ +static errcode_t jzero_blocks(io_channel channel, unsigned long long block, + int count) +{ + errcode_t retval; + unsigned long long n; + static void *__buf = NULL; + struct unix_private_data *data; + + data = channel->private_data; + if (!__buf) + __buf = malloc(channel->block_size); + if (!__buf) + return -1; + + for (n = block; n < block + count; n++) { + journal_header_t *jr; + + retval = io_channel_read_blk(channel, n, 1, __buf); + if (retval) + goto out_err; + continue; + + jr = __buf; + if (jr->h_magic == htonl(JFS_MAGIC_NUMBER) && + jr->h_blocktype == htonl(JFS_REVOKE_BLOCK)) { + memset(__buf, 0, channel->block_size); + retval = io_channel_write_blk(channel, n, 1, __buf); + } + + if (retval) + goto out_err; + } + + return 0; +out_err: + return -1; +} + #define STRIDE_LENGTH 8 errcode_t ext2fs_zero_blocks(ext2_filsys fs, blk_t blk, int num, blk_t *ret_blk, int *ret_count) @@ -238,10 +276,9 @@ static int mkjournal_proc(ext2_filsys fs, (es->zero_count < 1024)) es->zero_count++; else { - retval = ext2fs_zero_blocks(fs, - es->blk_to_zero, - es->zero_count, - 0, 0); + retval = jzero_blocks(fs->io, + es->blk_to_zero, + es->zero_count); es->zero_count = 0; } } @@ -339,8 +376,8 @@ static errcode_t write_journal_inode(ext2_filsys fs, ext2_ino_t journal_ino, goto errout; } if (es.zero_count) { - retval = ext2fs_zero_blocks(fs, es.blk_to_zero, - es.zero_count, 0, 0); + retval = jzero_blocks(fs->io, es.blk_to_zero, + es.zero_count); if (retval) goto errout; } -- 1.6.3.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] [RFC] mkjournal: zero journal blocks only when necessary 2009-07-29 15:58 ` [PATCH] [RFC] mkjournal: zero journal blocks only when necessary Alexander Shishkin @ 2009-07-29 17:16 ` Theodore Tso 0 siblings, 0 replies; 12+ messages in thread From: Theodore Tso @ 2009-07-29 17:16 UTC (permalink / raw) To: Alexander Shishkin; +Cc: linux-ext4 On Wed, Jul 29, 2009 at 06:58:16PM +0300, Alexander Shishkin wrote: > Will something like this do? > > The only blocks that might theoretically (although very unlikely) be > dangerous for newly-created filesystem's integrity are those that > still contain valid signatures, others can be safely skipped. > > Since reads are generally faster (or at least, not slower), this > gives some performance increase during mkfs run. Did you bother benchmarking what this would do on normal disk drives? Previously we were writing out the blocks to be zeroed in large chunks at a time for speed reasons. This patch reduces it to reading the journal one block at a time, and if it contains a valid signature it writes a zero block. The patch also doesn't check for commit blocks, which are just as much a problem (if not more so) than revoke blocks. - Ted ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2009-07-29 17:17 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-11 15:03 [Q] ext3 mkfs: zeroing journal blocks Alexander Shishkin 2009-05-11 17:58 ` Eric Sandeen 2009-05-11 18:20 ` Andreas Dilger 2009-05-11 18:44 ` Eric Sandeen 2009-05-11 19:35 ` Theodore Tso 2009-05-11 19:35 ` Andreas Dilger 2009-05-12 11:55 ` Alexander Shishkin 2009-05-12 12:13 ` Theodore Tso 2009-05-12 12:49 ` Alexander Shishkin 2009-05-12 21:04 ` Theodore Tso 2009-07-29 15:58 ` [PATCH] [RFC] mkjournal: zero journal blocks only when necessary Alexander Shishkin 2009-07-29 17:16 ` Theodore Tso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).