* [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
From: Jan Willem van den Brand @ 2008-07-09 15:24 UTC
To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1582 bytes --]

This patch makes ext3 mountable as ext2 even in case of a power
failure while mounted as ext3. We have tested it for kernel 2.6.13 but
it should be fairly easy to get it to work for other versions.

When mounting an ext3 file system as ext2 (without journalling) an
incompatibility flag is checked to ensure that the journal can be
safely ignored. This INCOMPAT_RECOVER flag is reset (cleared) when
ext3 is cleanly unmounted. The idea is that, at this point, all
checkpointing data has been transferred to disk.

In case of a power failure, the INCOMPAT flag is not reset. Systems
that suffer from frequent power failure (e.g. SD-cards that are
unsafely removed) will often not be mountable as ext2.

I think that ext3 can be mounted as ext2 when there is no
checkpointing data in the journal (no data being written from journal
to disk). The journal is then skipped by both e2fsck and ext3
mounting.

To make the aforementioned systems more frequently mountable as ext2
we reset the INCOMPAT flag when we are sure that there is no
checkpointing data in the journal. We set it again as soon as there
is. Furthermore, we observed that there is almost always checkpointing
data in the journal. Therefore, we flush the journal on every file
sync (journal flushing flushes checkpointing data) and we perform a
file sync after every file close.

Obviously, this solution will result in poor performance when many
small files are frequently closed after write but that is not the case
in our system (TomTom navigation device).

I'd like to hear opinions about this solution.

Best regards,
Jan Willem van den Brand

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 3874 bytes --]

==== linux-s3c24xx.org/fs/ext3/fsync.c#1 - linux-s3c24xx/fs/ext3/fsync.c ====
--- linux-s3c24xx.org/fs/ext3/fsync.c	2008-06-30 11:28:51.000000000 +0200
+++ linux-s3c24xx/fs/ext3/fsync.c	2008-06-30 11:28:27.000000000 +0200
@@ -29,6 +29,7 @@
 #include <linux/jbd.h>
 #include <linux/ext3_fs.h>
 #include <linux/ext3_jbd.h>
 
 /*
  * akpm: A new design for ext3_sync_file().
@@ -45,6 +46,7 @@
 int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
 {
 	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = inode->i_sb;
 	int ret = 0;
 
 	J_ASSERT(ext3_journal_current_handle() == 0);
@@ -84,5 +86,10 @@
 		ret = sync_inode(inode, &wbc);
 	}
 out:
+	/* After an fsync, we empty the journal (checkpoint all
+	 * T_FINISHED transactions to disk). */
+	journal_lock_updates(EXT3_SB(sb)->s_journal);
+	journal_flush(EXT3_SB(sb)->s_journal);
+	journal_unlock_updates(EXT3_SB(sb)->s_journal);
 	return ret;
 }
==== linux-s3c24xx.org/fs/jbd/journal.c#1 - /linux-s3c24xx/fs/jbd/journal.c ====
--- linux-s3c24xx.org/fs/jbd/journal.c	2008-06-30 11:28:51.000000000 +0200
+++ linux-s3c24xx/fs/jbd/journal.c	2008-06-30 11:26:16.000000000 +0200
@@ -22,10 +22,13 @@
  * journaling (ext2 can use a reserved inode for storing the log).
  */
 
+
+
 #include <linux/module.h>
 #include <linux/time.h>
 #include <linux/fs.h>
 #include <linux/jbd.h>
+#include <linux/ext3_fs.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
 #include <linux/smp_lock.h>
@@ -37,6 +40,23 @@
 #include <asm/page.h>
 #include <linux/proc_fs.h>
 
+static void ext3_commit_super (struct super_block * sb,
+			       struct ext3_super_block * es,
+			       int sync)
+{
+	struct buffer_head *sbh = EXT3_SB(sb)->s_sbh;
+
+	if (!sbh)
+		return;
+	es->s_wtime = cpu_to_le32(get_seconds());
+	es->s_free_blocks_count = cpu_to_le32(ext3_count_free_blocks(sb));
+	es->s_free_inodes_count = cpu_to_le32(ext3_count_free_inodes(sb));
+	BUFFER_TRACE(sbh, "marking dirty");
+	mark_buffer_dirty(sbh);
+	if (sync)
+		sync_dirty_buffer(sbh);
+}
+
 EXPORT_SYMBOL(journal_start);
 EXPORT_SYMBOL(journal_restart);
 EXPORT_SYMBOL(journal_extend);
@@ -938,6 +958,8 @@
 {
 	journal_superblock_t *sb = journal->j_superblock;
 	struct buffer_head *bh = journal->j_sb_buffer;
+	struct super_block * sb_p;
+	int s_start_was_zero = 0;
 
 	/*
 	 * As a special case, if the on-disk copy is already marked as needing
@@ -959,8 +981,23 @@
 	jbd_debug(1,"JBD: updating superblock (start %ld, seq %d, errno %d)\n",
 		  journal->j_tail, journal->j_tail_sequence, journal->j_errno);
 
+	if (sb->s_start==0)
+		s_start_was_zero = 1;
 	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
 	sb->s_start = cpu_to_be32(journal->j_tail);
+	/* If s_start gets a non-zero value here, the journal is no longer
+	 * empty, so we set the INCOMPAT_RECOVER flag again */
+	if (s_start_was_zero && sb->s_start!=0)
+	{
+		sb_p = (struct super_block *)(journal->j_private);
+		EXT3_SET_INCOMPAT_FEATURE(sb_p, EXT3_FEATURE_INCOMPAT_RECOVER);
+		/* flush super_block to disk. */
+		ext3_commit_super(sb_p, EXT3_SB(sb_p)->s_es, 1);
+	}
 	sb->s_errno = cpu_to_be32(journal->j_errno);
 	spin_unlock(&journal->j_state_lock);
 
@@ -1342,6 +1379,7 @@
 	int err = 0;
 	transaction_t *transaction = NULL;
 	unsigned long old_tail;
+	struct super_block * sb_p;
 
 	spin_lock(&journal->j_state_lock);
 
@@ -1390,6 +1428,17 @@
 	J_ASSERT(!journal->j_checkpoint_transactions);
 	J_ASSERT(journal->j_head == journal->j_tail);
 	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
+
+	/* The journal is empty, we can update the compatibility flags
+	 * in ext3's superblock and flush it to disk */
+	sb_p = (struct super_block *)(journal->j_private);
+	EXT3_CLEAR_INCOMPAT_FEATURE(sb_p, EXT3_FEATURE_INCOMPAT_RECOVER);
+	ext3_commit_super(sb_p, EXT3_SB(sb_p)->s_es, 1);
 	spin_unlock(&journal->j_state_lock);
 	return err;
 }
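For readers unfamiliar with the flag discussed in the message above, here
is a minimal user-space sketch of the check an ext2-only reader has to
make before it may ignore the journal. It is not taken from the patch or
from any ext2 driver. The helper name ext3_needs_recovery() and the
macro names for the byte offsets are made up for this illustration; the
offsets themselves (magic at byte 56, s_feature_incompat at byte 96 of
the superblock), the magic 0xEF53 and the flag value 0x0004 follow the
standard on-disk ext2/ext3 superblock layout.

```c
#include <stddef.h>
#include <stdint.h>

/* On-disk ext2/ext3 superblock layout (all fields are little-endian).
 * The macro names for the offsets are assumptions of this sketch. */
#define SB_SIZE                        1024   /* superblock is 1 KiB, stored 1 KiB into the volume */
#define SB_MAGIC_OFFSET                56     /* s_magic, 16 bit */
#define SB_FEATURE_INCOMPAT_OFFSET     96     /* s_feature_incompat, 32 bit */
#define EXT3_SUPER_MAGIC               0xEF53
#define EXT3_FEATURE_INCOMPAT_RECOVER  0x0004

static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: inspect a raw superblock image.
 * Returns  1 if INCOMPAT_RECOVER is set (journal must be replayed first),
 *          0 if it is clear (the journal can be ignored),
 *         -1 if the buffer is too short or is not an ext2/ext3 superblock.
 */
int ext3_needs_recovery(const uint8_t *sb, size_t len)
{
	if (sb == NULL || len < SB_SIZE)
		return -1;
	if (read_le16(sb + SB_MAGIC_OFFSET) != EXT3_SUPER_MAGIC)
		return -1;
	return (read_le32(sb + SB_FEATURE_INCOMPAT_OFFSET) &
		EXT3_FEATURE_INCOMPAT_RECOVER) ? 1 : 0;
}
```

A caller would read 1024 bytes starting at byte offset 1024 of the
volume (or of the loop file) and pass them to the helper. A result of 1
corresponds to the "not mountable as ext2" case described above.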
* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
From: Theodore Tso @ 2008-07-09 16:27 UTC
To: Jan Willem van den Brand; +Cc: linux-kernel

On Wed, Jul 09, 2008 at 05:24:25PM +0200, Jan Willem van den Brand wrote:
> This patch makes ext3 mountable as ext2 even in case of a power
> failure while mounted as ext3. We have tested it for kernel 2.6.13 but
> it should be fairly easy to get it to work for other versions.

What's the rationale for doing this? Why is it *useful* to have the
filesystem be mountable as ext2 in case of a power failure? The whole
point of ext3 is to be able to keep the filesystem consistent in case
of a power failure.

So the patch as given would never go into the kernel as-is, and in the
general case it is totally counterproductive. Maybe it could go in as
an optional behaviour enabled by a mount option, but that's assuming we
can be convinced it's a good idea.

> In case of a power failure, the INCOMPAT flag is not reset. Systems
> that suffer from frequent power failure (e.g. SD-cards that are
> unsafely removed) will often not be mountable as ext2.

So what? Why can't you just run the journal and check the filesystem
for consistency? If the user did a hot-eject while the SD-card was
being written, even with your patch there is no guarantee that the
card will even be readable. Some SD-cards go totally non-functional
due to corruption at the flash translation layer when they are yanked
in the middle of an update....

> Obviously, this solution will result in poor performance when many
> small files are frequently closed after write but that is not the case
> in our system (TomTom navigation device).

How often does your TomTom device need to update files? A better
(userspace-only) solution might be to keep the filesystem mounted
read-only, and when you need to write to the filesystem, turn on the
LED (which hopefully will be a hint to the users to keep their grubby
little paws off the eject button), remount it read/write, do your file
writes, then remount it read/only, and turn off the LED.

Regards,

						- Ted
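The userspace-only sequence suggested above could look roughly like the
following sketch. It assumes Linux and the mount(2) system call with
MS_REMOUNT; the function names remount_flags(), remount() and
write_window(), as well as the set_led and do_writes callbacks, are
hypothetical and stand in for whatever the device software actually
provides. The caller needs the privilege to remount (normally root).

```c
#include <stddef.h>
#include <sys/mount.h>

/* Flags for remounting an already mounted filesystem read-only or read/write. */
unsigned long remount_flags(int read_only)
{
	return MS_REMOUNT | (read_only ? MS_RDONLY : 0UL);
}

/* Remount the filesystem at mountpoint; source and type are ignored
 * by the kernel for MS_REMOUNT. Returns 0 on success, -1 on error. */
int remount(const char *mountpoint, int read_only)
{
	return mount(NULL, mountpoint, NULL, remount_flags(read_only), NULL);
}

/*
 * Hypothetical write window: LED on, remount read/write, run the
 * caller's writes, remount read-only, LED off.
 * Returns 0 only if both remounts and the writes succeeded.
 */
int write_window(const char *mountpoint,
		 void (*set_led)(int on),
		 int (*do_writes)(void *ctx), void *ctx)
{
	int ret;

	set_led(1);
	if (remount(mountpoint, 0) != 0) {
		set_led(0);
		return -1;
	}
	ret = do_writes(ctx);
	/* Always try to get back to read-only, even if the writes failed. */
	if (remount(mountpoint, 1) != 0)
		ret = -1;
	set_led(0);
	return ret;
}
```

Files that are open for reading stay open across both remounts. A
remount to read-only fails with EBUSY while any file is still open for
writing, so do_writes() must close everything it opened.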
* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
From: Jan Willem van den Brand @ 2008-07-09 21:04 UTC
To: Theodore Tso, Jan Willem van den Brand, linux-kernel

Thank you for your reply!

On 7/9/08, Theodore Tso <tytso@mit.edu> wrote:
> On Wed, Jul 09, 2008 at 05:24:25PM +0200, Jan Willem van den Brand wrote:
> > This patch makes ext3 mountable as ext2 even in case of a power
> > failure while mounted as ext3. We have tested it for kernel 2.6.13 but
> > it should be fairly easy to get it to work for other versions.
>
> What's the rationale for doing this. Why is it *useful* to have the
> filesystem be mountable as ext2 in case of a power failure? The whole
> point of ext3 is to be able to keep the filesystem consistent in case
> of a power failure.
>
> So the patch as given would never go into the kernel as-is, and in the
> general case it is totally counterproductive. Maybe it could go in as
> a optional behaviour enabled by a mount option, but that's assuming we
> can be convinced it's a good idea.

First of all, I totally agree that this should never be the default.
The patch is useful for us because the SD-card with the ext3 file
system is (possibly unsafely) pulled out of our TomTom device (which is
built into a car) and then inserted in a Windows PC which only supports
ext2. We are also considering ext3 on Windows, or running e2fsck to
replay the journal and then continuing as ext2. But for now we are
accessing the file system as ext2.

> > In case of a power failure, the INCOMPAT flag is not reset. Systems
> > that suffer from frequent power failure (e.g. SD-cards that are
> > unsafely removed) will often not be mountable as ext2.
>
> So what? Why can't you just run the journal and check the filesystem
> for consistency? If the user did a hot-eject while the SD-card was
> being written, even with your patch there is no guarantee that the
> card will even be readable. Some SD-cards go totally non-functional
> due to corruption at the flash translation layer when they are yanked
> in the middle of an update....

To run the journal, a user has to go from his PC to his car, which is
kind of hard to explain. We are aware of the fact that there are many
SD-cards that can get corrupted due to power failure. We are only
using SD-cards that do not exhibit this problem. So on the embedded
device we have power-fail safety covered for both the SD-card and the
file system. The solution passes extensive power-fail testing.

> > Obviously, this solution will result in poor performance when many
> > small files are frequently closed after write but that is not the case
> > in our system (TomTom navigation device).
>
> How often does your TomTom device need to update files? A better
> (userspace-only) solution might be keep the filesystem mounted
> read-only, and when you need to write to the filesystem, turn on the
> LED (which hopefully will be a hint to the users to keep their grubby
> little paws off the eject button), remount it read/write, do your file
> writes, then remount it read/only, and turn off the LED.

The amount of writing is indeed limited. We actually considered
unmounting and then mounting again but dropped that idea because we do
not want to close files that are opened for reading. I thought that
remounting ro did not flush the journal and reset the INCOMPAT_RECOVER
flag. I will experiment with remounting, thanks!

Best regards,
Jan Willem
* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
From: Theodore Tso @ 2008-07-09 21:30 UTC
To: Jan Willem van den Brand; +Cc: linux-kernel

On Wed, Jul 09, 2008 at 11:04:28PM +0200, Jan Willem van den Brand wrote:
> First of all, I totally agree that this should never be default. The
> patch is usefull for us because the SD-card with ext3 file system is
> (possibly unsafely) pulled out of our TomTom device (which is built
> into a car) and then inserted in a Windows PC which only supports
> ext2. We are also considering ext3 on Windows or running e2fsck to
> replay the journal and then continue as ext2. But for now we are
> accessing the file system as ext2.

Ah, that was the missing piece. You have an ext2-only driver under
Windows, and the SD card has to be usable there. It makes *much* more
sense now. :-)

I'll note that the code to run the journal is available in userspace,
and while I didn't originally write it so I can only offer it to you
under GPL, it wouldn't be that hard to make it work under Windows. At
various times I've taken patches to make parts of e2fsprogs work under
Windows. (In fact, the original version of resize2fs was paid for by
the folks who make the PartitionMagic program, and helped pay for the
down payment on my house. :-) Check out lib/ext2fs/dosio.c in the
e2fsprogs sources. I don't think anyone has tried building e2fsprogs
on a Windows/Dos environment in quite some time, so I'm sure some
patches will be necessary, but it may be quite a bit easier than you
think. (As part of the PartitionMagic contract I also made parts of
e2fsck and mke2fs work on Windows as well, although that was over ten
years ago by now.)

That being said, I have to ask the question --- if the goal is Windows
compatibility, why aren't you using FAT? Is the performance benefit
critical for your application? Or do you need a POSIX-compliant
filesystem?

> The amount of writing is indeed limited. We actually considered
> unmounting and then mounting again but dropped that idea because we do
> not want to close files that are opened for reading. I thought that
> remounting ro did not flush the journal and reset the INCOMPAT_RECOVER
> flag. I will experiment with remounting, thanks!

Yes, remounting read/only will flush the journal and clear the
INCOMPAT_RECOVER bit.

						- Ted
* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
From: Jan Willem van den Brand @ 2008-07-10 7:01 UTC
To: Theodore Tso, Jan Willem van den Brand, linux-kernel

> Ah, that was the missing piece. You have an ext2-only driver under
> Windows, and the SD card has to be usable there. It makes *much* more
> sense now. :-)

Actually, we have a different situation. I left out this part in my
explanation because I did not want to confuse people :). But here it
goes.

The SD-card is formatted with the FAT file system, which holds an
ext3-formatted loop file. In this way, the SD-card is "mounted" by
Windows without the need for a driver. Our Windows application
accesses the loop file and only supports ext2. The embedded device
does all its writing in the loop file to ensure power-fail safety.

Initially, the solution did not pass our power-fail tests. The problem
was in the page cache reordering. We have the following situation:

    user | ext3 | loop | FAT | disk

The ext3 file system writes into the page cache but ensures that
ordering constraints are obeyed. By default, loop writes directly into
the page cache of the FAT file. The pages are then written out of
order to disk, disobeying the ext3 ordering constraints. Enforcing
ordered page writes by mounting the underlying file system (FAT) in
sync mode did not work because loop.c writes directly into the page
cache. We solved this by not writing directly into the page cache in
loop.c if the underlying file system is mounted in sync mode. This
solution passes our power-fail tests. I will share this patch in a new
post.

> I'll note that the code to run the journal is available in userspace,
> and while I didn't originally write it so I can only offer it to you
> under GPL, it wouldn't be that hard to make it work under Windows. At
> various times I've taken patches to make parts of e2fsprogs work under
> Windows. (In fact, the original version of resize2fs was paid for by
> the folks who make PartitionMagic program, and helped pay for the down
> payment on my house. :-) Check out lib/ext2fs/dosio.c in the
> e2fsprogs sources. I don't think anyone has tried building e2fsprogs
> on a Windows/Dos environment in quite some time, so I'm sure some
> patches will be necessary, but it maybe quite a bit easier than you
> think. (As part of the PartitionMagic contract I also made parts of
> e2fsck and mke2fs work on Windows as well, although that was over ten
> years ago by now.)

Another solution is to compile e2fsprogs with cygwin. I actually have
compiled the latest version and it passes all tests.

> That being said, I have to ask the question --- if the goal is Windows
> compatibility, why aren't you using FAT? Is the performance benefit
> critical for your application? Or do you need a POSIX-compliant
> filesystem?

The embedded device has to be able to cope with power failure. FAT
becomes inconsistent after power failure. During our tests, the FAT
file system always broke within just a few power failures. Ext3, as
well as ext3 + loop (with the loop patch), runs for days without
breaking.

> Yes, remounting read/only will flush the journal and clear the
> INCOMPAT_RECOVER bit.

Thanks again! I will consider this alternative.

Best regards,
Jan Willem van den Brand
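The ordering constraint mentioned in the message above (the commit
record must not reach the disk before the blocks it covers) can be
illustrated in user space with the sketch below. This is only an
analogy under stated assumptions. It is not the loop.c change described
in the message and it does not reproduce jbd's on-disk format. The
names write_all() and ordered_commit() and the "COMMIT\n" marker are
invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical example: write a payload, force it to stable storage,
 * and only then append a commit marker. A reader that finds the marker
 * can rely on the payload before it being complete, provided every
 * layer below honours fsync().
 */
int ordered_commit(const char *path, const void *data, size_t len)
{
	static const char marker[] = "COMMIT\n";
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (write_all(fd, data, len) != 0 ||
	    fsync(fd) != 0 ||                      /* barrier: payload first */
	    write_all(fd, marker, sizeof marker - 1) != 0 ||
	    fsync(fd) != 0) {                      /* then the commit marker */
		close(fd);
		return -1;
	}
	return close(fd);
}
```

The guarantee is only as strong as the weakest layer. In the stack
described above, a layer that writes straight into the page cache of
the backing FAT file lets pages reach the disk in a different order,
which is the failure the message reports.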