From: Theodore Tso <tytso@mit.edu>
To: Frank Mayhar <fmayhar@google.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 1/1] Allow ext4 to run without a journal.
Date: Tue, 16 Dec 2008 22:12:10 -0500 [thread overview]
Message-ID: <20081217031210.GA17588@mit.edu> (raw)
In-Reply-To: <1228953270.14314.15.camel@bobble.smo.corp.google.com>
On Wed, Dec 10, 2008 at 03:54:30PM -0800, Frank Mayhar wrote:
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index b455c68..3a05ae7 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -19,6 +19,8 @@
> #include <linux/jbd2.h>
> #include "ext4.h"
>
> +#define EXT4_NOJOURNAL_MAGIC (void *)0x457874344e6a6e6c /* "Ext4Njnl" */
> +
This will cause warnings on systems where pointers are 32-bits. I'd
suggest using a 32-bit magic number. Also, the number you have picked
is divisble by 4, which means at least in theory it could be a valid
pointer. If we make EXT4_NOJOURNAL_MAGIC a 32-bit number, then a
number which is divisible by 4 definitely could be a valid pointer on
a 64-bit system. I'd suggest keeping it simple, and simply using 0x1.
This is known not be a valid pointer, since it's not 32-bit word
aligned, and there is precedent for using such a number --- for
example, SIG_IGN is ((void *) 1) on many architectures, and page 0 is
generally unmapped to catch null pointer dereferences. So I'd suggest
keeping things simple instead of using a fancy char string.
Also, why not use EXT4_NOJOURNAL_MAGIC constant directly, instead of
assigning it as a constant value to s_nojournal_flag in struct
ext4_sb_info? That bloats the data structure for no good reason that
I can tell.
> +static inline int ext4_handle_dirty_metadata(handle_t *handle,
> + struct inode *inode, struct buffer_head *bh)
> +{
> + int err;
> +
> + if (ext4_handle_valid(handle)) {
> + err = __ext4_journal_dirty_metadata(__func__, handle, bh);
> + } else {
> + err = __ext4_write_dirty_metadata(inode, bh);
> + }
> + return err;
> +}
I don't see the point in doing this as an inline function; either way,
each time the inline function is executed, it will result a function
call, either to __ext4_journal_dirty_metadata() or
__ext4_write_dirty_metadata(), both of which are called exactly once
by this inline function. So why not move the conditional and the
contents of _ext4_write_dirty_metadata() and
__ext4_journal_dirty_metadata() into __ext4_handle_dirty_metadata(),
and then do this:
#define ext4_handle_dirty_metadata(handle, inode, bh) \
__ext4_handle_dirty_metadata(__func__, handle, inode, bh)
Ext4_handle_dirty_metadata() gets called a *lot*, so this will shrink
the icache footprint of ext4 filesystem, and for CPU's that have
decent branch prediction, centralizing the condition into a single
location instead of an inline will also be a win, once again
demonstrating that with modern CPU's, optimizations that minimize code
size often also faster.
> /* super.c */
> @@ -208,6 +270,9 @@ int ext4_force_commit(struct super_block *sb);
>
> static inline int ext4_should_journal_data(struct inode *inode)
> {
> + BUG_ON(EXT4_JOURNAL(inode) == NULL &&
> + (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
> + EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL));
> if (!S_ISREG(inode->i_mode))
> return 1;
> if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
This shouldn't be a BUG_ON; this gets triggered if there is an inode
which is marked as needing data journalling. So it is not a *bug*,
but at worst either a filesystem corruption, or more charitably a
conflict between a mount option and a setting in an inode flag. In
any case, these should not cause a BUG; at worst an ext4_error(), or
ext4_warning(). However, today, if a filesystem with inodes
requesting data journalling are mounted on ext2, the request is simply
ignored. So I'd argue that a similar treatmeant should be used for
ext4 filesystem running w/o a journal. At worst perhaps there should
be a one-time ext4_warning(), but I'm not convinced even this is
warranted. If the system administrator mounts a filesystem without a
journal, it's pretty clear what was meant.
> @@ -219,6 +284,8 @@ static inline int ext4_should_journal_data(struct inode *inode)
>
> static inline int ext4_should_order_data(struct inode *inode)
> {
> + BUG_ON(EXT4_JOURNAL(inode) == NULL &&
> + EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL);
See above discussion.
> static inline int ext4_should_writeback_data(struct inode *inode)
> {
> + BUG_ON(EXT4_JOURNAL(inode) == NULL &&
> + EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL);
Again, see above.
> --- a/fs/ext4/ext4_sb.h
> +++ b/fs/ext4/ext4_sb.h
> @@ -28,6 +28,7 @@
> * fourth extended-fs super-block data in memory
> */
> struct ext4_sb_info {
> + int *s_nojournal_flag; /* Null to indicate "not a handle" */
> unsigned long s_desc_size; /* Size of a group descriptor in bytes */
See above discussion; I don't understand why this is necessary, and
why it is an int * here, especially given that it is assigned a
constant value which is a (void *), and it is eventually cast into a
(handle_t *).
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e4a241c..e0f433c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2284,7 +2314,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
printk(KERN_ERR
"ext4: No journal on filesystem on %s\n",
sb->s_id);
- goto failed_mount3;
+ clear_opt(sbi->s_mount_opt, DATA_FLAGS);
+ set_opt(sbi->s_mount_opt, WRITEBACK_DATA);
+ sbi->s_journal = NULL;
+ es->s_state &= cpu_to_le16(~EXT4_VALID_FS);
+ needs_recovery = 0;
+ goto no_journal;
}
In the case where there is no journal, please remove the KERN_ERR
printk; that's just excess noise, and some day, Google System
Administrators will thank you for getting rid of it. :-)
@@ -2748,6 +2798,9 @@ static int ext4_create_journal(struct super_block *sb,
EXT4_SB(sb)->s_journal = journal;
+ if (journal_inum == 0)
+ return 0;
+
ext4_update_dynamic_rev(sb);
EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
EXT4_SET_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL);
If journal_inum == 0, ext4_create_journal will never be called (see
line 2429 in ext4fill_super), so this check is not really necessary.
Actually, what we should probably do (as a separate patch before this
one), is to remove ext4_create_journal() and the journal_inum mount
option altogether. It's pointless code that was originally added to
ext3 by Stephen Tweedie back when he wanted to convert ext2
filesystems to ext3 before e2fsprogs supported tune2fs -j. It's been
obsolete for over eight years, in both ext3 and ext4.
In any case, these are all pretty tiny nits; on the whole I think this
patch is quite clean, so I'll add it to the ext4 patch queue for
testing. I would appreciate getting an updated patch which addresses
these suggested cleanups, though. Also, although I haven't tested it,
I suspect we need to add a check so that if there is no journal, and
the EXT4_FEATURE_INCOMPAT_RECOVERY feature is set, the mount should be
aborted with an error, instead of just clearning the recovery flag and
moving on. In actual practice, such a combination should never
happen, but if it does, failing the mount is probably a safer thing
to do.
Regards,
- Ted
next prev parent reply other threads:[~2008-12-17 3:12 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-10 23:54 [PATCH 1/1] Allow ext4 to run without a journal Frank Mayhar
2008-12-15 17:35 ` Frank Mayhar
2008-12-16 4:00 ` Theodore Tso
2008-12-17 6:00 ` Theodore Tso
2008-12-17 17:52 ` Frank Mayhar
2008-12-17 3:12 ` Theodore Tso [this message]
2008-12-17 17:50 ` Frank Mayhar
2009-01-03 15:52 ` [PATCH] " Theodore Tso
2009-01-03 20:19 ` Frank Mayhar
2009-01-04 1:17 ` Theodore Tso
2009-01-05 5:31 ` Theodore Tso
2008-12-18 6:51 ` [PATCH 1/1] " Andreas Dilger
2008-12-18 18:00 ` Jan Kara
2008-12-18 18:13 ` Michael Rubin
2008-12-18 18:27 ` Jan Kara
2008-12-18 18:29 ` Curt Wohlgemuth
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081217031210.GA17588@mit.edu \
--to=tytso@mit.edu \
--cc=fmayhar@google.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.