public inbox for linux-kernel@vger.kernel.org
* [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
@ 2008-07-09 15:24 ` Jan Willem van den Brand
  2008-07-09 16:27   ` Theodore Tso
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Willem van den Brand @ 2008-07-09 15:24 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1582 bytes --]

This patch makes ext3 mountable as ext2 even in case of a power
failure while mounted as ext3. We have tested it for kernel 2.6.13 but
it should be fairly easy to get it to work for other versions.

When mounting an ext3 file system as ext2 (without journalling) an
incompatibility flag is checked to ensure that the journal can be
safely ignored. This INCOMPAT_RECOVER flag is cleared when ext3 is
unmounted. The idea is that, at this point, all checkpointing data has
been transferred to disk.

In case of a power failure, the INCOMPAT flag is not reset. Systems
that suffer from frequent power failure (e.g. SD-cards that are
unsafely removed) will often not be mountable as ext2.

I think that ext3 can be mounted as ext2 when there is no
checkpointing data in the journal (no data being written from journal
to disk). The journal is then skipped by both e2fsck and ext3
mounting. To make aforementioned systems more frequently mountable as
ext2 we reset the INCOMPAT flag when we are sure that there is no
checkpointing data in the journal. We set it again as soon as there
is.
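
The flag handling described above can be modelled as a small state
machine. This is only a sketch and not the patch itself: struct fs_model
and the model_* functions are made-up names for illustration, and the
two fields merely stand in for the ext3 superblock flag word and the
journal's s_start. Only the value 0x0004 for
EXT3_FEATURE_INCOMPAT_RECOVER is taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INCOMPAT_RECOVER 0x0004u /* value of EXT3_FEATURE_INCOMPAT_RECOVER */

/* Hypothetical in-memory stand-in for the two on-disk fields involved. */
struct fs_model {
	uint32_t feature_incompat; /* models the ext3 superblock flag word */
	uint32_t journal_start;    /* models the journal's s_start; 0 = empty */
};

/* Record a new journal tail; set RECOVER when the journal stops being empty. */
void model_update_tail(struct fs_model *fs, uint32_t new_start)
{
	int was_empty;

	assert(fs != NULL);
	was_empty = (fs->journal_start == 0);
	fs->journal_start = new_start;
	if (was_empty && new_start != 0)
		fs->feature_incompat |= INCOMPAT_RECOVER;
}

/* Flush the journal: nothing is left to replay, so RECOVER can be cleared. */
void model_flush(struct fs_model *fs)
{
	assert(fs != NULL);
	fs->journal_start = 0;
	fs->feature_incompat &= ~INCOMPAT_RECOVER;
}

/* An ext2 mount is only allowed while RECOVER is clear. */
int model_mountable_as_ext2(const struct fs_model *fs)
{
	return (fs->feature_incompat & INCOMPAT_RECOVER) == 0;
}
```

In this model the file system is mountable as ext2 exactly in the
windows between a flush and the next non-empty journal update, which is
why the patch also flushes on every file sync.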

Furthermore, we observed that there is almost always checkpointing
data in the journal. Therefore, we flush the journal on every file
sync (journal flushing flushes checkpointing data) and we perform a
file sync after every file close.

Obviously, this solution will result in poor performance when many
small files are frequently closed after write but that is not the case
in our system (TomTom navigation device).

I'd like to hear opinions about this solution.

Best regards,

Jan Willem van den Brand

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 3874 bytes --]

==== linux-s3c24xx.org/fs/ext3/fsync.c#1 - linux-s3c24xx/fs/ext3/fsync.c ====
--- linux-s3c24xx.org/fs/ext3/fsync.c	  2008-06-30 11:28:51.000000000 +0200
+++ linux-s3c24xx/fs/ext3/fsync.c	  2008-06-30 11:28:27.000000000 +0200
@@ -29,6 +29,7 @@
 #include <linux/jbd.h>
 #include <linux/ext3_fs.h>
 #include <linux/ext3_jbd.h>
 
 /*
  * akpm: A new design for ext3_sync_file().
@@ -45,6 +46,7 @@
 int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
 {
 	struct inode *inode = dentry->d_inode;
+	struct super_block *sb = inode->i_sb;
 	int ret = 0;
 
 	J_ASSERT(ext3_journal_current_handle() == 0);
@@ -84,5 +86,10 @@
 		ret = sync_inode(inode, &wbc);
 	}
 out:
+	/* After an fsync, we empty the journal (checkpoint all
+	 * T_FINISHED transactions to disk).  */
+	journal_lock_updates(EXT3_SB(sb)->s_journal);
+	journal_flush(EXT3_SB(sb)->s_journal);
+	journal_unlock_updates(EXT3_SB(sb)->s_journal);
 	return ret;
 }
==== linux-s3c24xx.org/fs/jbd/journal.c#1 - /linux-s3c24xx/fs/jbd/journal.c ====
--- linux-s3c24xx.org/fs/jbd/journal.c	   2008-06-30 11:28:51.000000000 +0200
+++ linux-s3c24xx/fs/jbd/journal.c	   2008-06-30 11:26:16.000000000 +0200
@@ -22,10 +22,13 @@
  * journaling (ext2 can use a reserved inode for storing the log).
  */
 
+
+
 #include <linux/module.h>
 #include <linux/time.h>
 #include <linux/fs.h>
 #include <linux/jbd.h>
+#include <linux/ext3_fs.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
 #include <linux/smp_lock.h>
@@ -37,6 +40,23 @@
 #include <asm/page.h>
 #include <linux/proc_fs.h>
 
+static void ext3_commit_super (struct super_block * sb,
+			       struct ext3_super_block * es,
+			       int sync)
+{
+	struct buffer_head *sbh = EXT3_SB(sb)->s_sbh;
+
+	if (!sbh)
+		return;
+	es->s_wtime = cpu_to_le32(get_seconds());
+	es->s_free_blocks_count = cpu_to_le32(ext3_count_free_blocks(sb));
+	es->s_free_inodes_count = cpu_to_le32(ext3_count_free_inodes(sb));
+	BUFFER_TRACE(sbh, "marking dirty");
+	mark_buffer_dirty(sbh);
+	if (sync)
+		sync_dirty_buffer(sbh);
+}
+
 EXPORT_SYMBOL(journal_start);
 EXPORT_SYMBOL(journal_restart);
 EXPORT_SYMBOL(journal_extend);
@@ -938,6 +958,8 @@
 {
 	journal_superblock_t *sb = journal->j_superblock;
 	struct buffer_head *bh = journal->j_sb_buffer;
+	struct super_block * sb_p;
+	int s_start_was_zero = 0;
 
 	/*
 	 * As a special case, if the on-disk copy is already marked as needing
@@ -959,8 +981,23 @@
 	jbd_debug(1,"JBD: updating superblock (start %ld, seq %d, errno %d)\n",
 		  journal->j_tail, journal->j_tail_sequence, journal->j_errno);
 
+	if (sb->s_start==0)
+	  s_start_was_zero = 1;
 	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
 	sb->s_start    = cpu_to_be32(journal->j_tail);
+	/* If s_start gets a non-zero value here, we set the RECOVER
+	 * incompatibility flag again */
+	if (s_start_was_zero && sb->s_start!=0)
+	  {
+	    sb_p = (struct super_block *)(journal->j_private);
+	    EXT3_SET_INCOMPAT_FEATURE(sb_p, EXT3_FEATURE_INCOMPAT_RECOVER);
+	    /* flush super_block to disk. */
+	    ext3_commit_super(sb_p, EXT3_SB(sb_p)->s_es, 1);

+	  }
 	sb->s_errno    = cpu_to_be32(journal->j_errno);
 	spin_unlock(&journal->j_state_lock);
 
@@ -1342,6 +1379,7 @@
 	int err = 0;
 	transaction_t *transaction = NULL;
 	unsigned long old_tail;
+	struct super_block * sb_p;
 
 	spin_lock(&journal->j_state_lock);
 
@@ -1390,6 +1428,17 @@
 	J_ASSERT(!journal->j_checkpoint_transactions);
 	J_ASSERT(journal->j_head == journal->j_tail);
 	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
+
+	/* The journal is empty, we can update the compatibility flags
+	 * in ext3's superblock and flush it to disk */
+	sb_p = (struct super_block *)(journal->j_private);
+	EXT3_CLEAR_INCOMPAT_FEATURE(sb_p, EXT3_FEATURE_INCOMPAT_RECOVER);

+	ext3_commit_super(sb_p, EXT3_SB(sb_p)->s_es, 1);

 	spin_unlock(&journal->j_state_lock);
 	return err;
 }


* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
  2008-07-09 15:24 ` [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty Jan Willem van den Brand
@ 2008-07-09 16:27   ` Theodore Tso
  2008-07-09 21:04     ` Jan Willem van den Brand
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Tso @ 2008-07-09 16:27 UTC
  To: Jan Willem van den Brand; +Cc: linux-kernel

On Wed, Jul 09, 2008 at 05:24:25PM +0200, Jan Willem van den Brand wrote:
> This patch makes ext3 mountable as ext2 even in case of a power
> failure while mounted as ext3. We have tested it for kernel 2.6.13 but
> it should be fairly easy to get it to work for other versions.

What's the rationale for doing this?  Why is it *useful* to have the
filesystem be mountable as ext2 in case of a power failure?  The whole
point of ext3 is to be able to keep the filesystem consistent in case
of a power failure.

So the patch as given would never go into the kernel as-is, and in the
general case it is totally counterproductive.  Maybe it could go in as
an optional behaviour enabled by a mount option, but that's assuming we
can be convinced it's a good idea.

> In case of a power failure, the INCOMPAT flag is not reset. Systems
> that suffer from frequent power failure (e.g. SD-cards that are
> unsafely removed) will often not be mountable as ext2.

So what?  Why can't you just run the journal and check the filesystem
for consistency?  If the user did a hot-eject while the SD-card was
being written, even with your patch there is no guarantee that the
card will even be readable.  Some SD-cards go totally non-functional
due to corruption at the flash translation layer when they are yanked
in the middle of an update....

> Obviously, this solution will result in poor performance when many
> small files are frequently closed after write but that is not the case
> in our system (TomTom navigation device).

How often does your TomTom device need to update files?  A better
(userspace-only) solution might be to keep the filesystem mounted
read-only, and when you need to write to the filesystem, turn on the
LED (which hopefully will be a hint to the users to keep their grubby
little paws off the eject button), remount it read/write, do your file
writes, then remount it read/only, and turn off the LED.
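
A minimal sketch of that remount-around-writes sequence, assuming a
Linux userspace process with permission to call mount(2). The names
remount_flags, remount and write_window, and the set_led and do_writes
callbacks, are hypothetical; only mount(), MS_REMOUNT and MS_RDONLY are
real interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/* Flags for remounting an existing mount read-only or read/write. */
unsigned long remount_flags(int read_only)
{
	return MS_REMOUNT | (read_only ? MS_RDONLY : 0UL);
}

/* Remount `mountpoint` in place; returns 0 on success, -1 with errno set. */
int remount(const char *mountpoint, int read_only)
{
	assert(mountpoint != NULL);
	/* For MS_REMOUNT the source and filesystem type are ignored. */
	return mount(NULL, mountpoint, NULL, remount_flags(read_only), NULL);
}

/*
 * Open a write window: LED on, remount read/write, run the writes,
 * remount read-only, LED off. `set_led` and `do_writes` are
 * hypothetical callbacks supplied by the caller.
 */
int write_window(const char *mountpoint,
		 void (*set_led)(int on),
		 int (*do_writes)(void *ctx), void *ctx)
{
	int ret;

	set_led(1);
	ret = remount(mountpoint, 0);
	if (ret == 0) {
		ret = do_writes(ctx);
		/* Always try to return to read-only; keep the first error. */
		if (remount(mountpoint, 1) != 0 && ret == 0)
			ret = -1;
	}
	set_led(0);
	return ret;
}
```

A real caller would probably also have to pass along any other mount
flags it wants to keep (noatime and so on), since a remount replaces the
flag set rather than amending it.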

Regards,

						- Ted


* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
  2008-07-09 16:27   ` Theodore Tso
@ 2008-07-09 21:04     ` Jan Willem van den Brand
  2008-07-09 21:30       ` Theodore Tso
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Willem van den Brand @ 2008-07-09 21:04 UTC
  To: Theodore Tso, Jan Willem van den Brand, linux-kernel

Thank you for your reply!

On 7/9/08, Theodore Tso <tytso@mit.edu> wrote:
> On Wed, Jul 09, 2008 at 05:24:25PM +0200, Jan Willem van den Brand wrote:
>  > This patch makes ext3 mountable as ext2 even in case of a power
>  > failure while mounted as ext3. We have tested it for kernel 2.6.13 but
>  > it should be fairly easy to get it to work for other versions.
>
>
> What's the rationale for doing this?  Why is it *useful* to have the
>  filesystem be mountable as ext2 in case of a power failure?  The whole
>  point of ext3 is to be able to keep the filesystem consistent in case
>  of a power failure.
>
>  So the patch as given would never go into the kernel as-is, and in the
>  general case it is totally counterproductive.  Maybe it could go in as
>  an optional behaviour enabled by a mount option, but that's assuming we
>  can be convinced it's a good idea.

First of all, I totally agree that this should never be the default. The
patch is useful for us because the SD-card with the ext3 file system is
(possibly unsafely) pulled out of our TomTom device (which is built
into a car) and then inserted in a Windows PC which only supports
ext2. We are also considering ext3 on Windows or running e2fsck to
replay the journal and then continue as ext2. But for now we are
accessing the file system as ext2.

>  > In case of a power failure, the INCOMPAT flag is not reset. Systems
>  > that suffer from frequent power failure (e.g. SD-cards that are
>  > unsafely removed) will often not be mountable as ext2.
>
>
> So what?  Why can't you just run the journal and check the filesystem
>  for consistency?  If the user did a hot-eject while the SD-card was
>  being written, even with your patch there is no guarantee that the
>  card will even be readable.  Some SD-cards go totally non-functional
>  due to corruption at the flash translation layer when they are yanked
>  in the middle of an update....

To run the journal, a user would have to go from his PC back to his car,
which is kind of hard to explain. We are aware of the fact that there are
many SD-cards that can get corrupted due to power failure. We are only
using SD-cards that do not exhibit this problem. So on the embedded
device we have power-fail safety covered for both the SD-card and the
file system. The solution passes extensive power-fail testing.

>  > Obviously, this solution will result in poor performance when many
>  > small files are frequently closed after write but that is not the case
>  > in our system (TomTom navigation device).
>
>
> How often does your TomTom device need to update files?  A better
>  (userspace-only) solution might be to keep the filesystem mounted
>  read-only, and when you need to write to the filesystem, turn on the
>  LED (which hopefully will be a hint to the users to keep their grubby
>  little paws off the eject button), remount it read/write, do your file
>  writes, then remount it read/only, and turn off the LED.

The amount of writing is indeed limited. We actually considered
unmounting and then mounting again but dropped that idea because we do
not want to close files that are opened for reading. I thought that
remounting ro did not flush the journal and reset the INCOMPAT_RECOVER
flag. I will experiment with remounting, thanks!

Best regards,

Jan Willem


* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
  2008-07-09 21:04     ` Jan Willem van den Brand
@ 2008-07-09 21:30       ` Theodore Tso
  2008-07-10  7:01         ` Jan Willem van den Brand
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Tso @ 2008-07-09 21:30 UTC
  To: Jan Willem van den Brand; +Cc: linux-kernel

On Wed, Jul 09, 2008 at 11:04:28PM +0200, Jan Willem van den Brand wrote:
> First of all, I totally agree that this should never be default. The
> patch is useful for us because the SD-card with ext3 file system is
> (possibly unsafely) pulled out of our TomTom device (which is built
> into a car) and then inserted in a Windows PC which only supports
> ext2. We are also considering ext3 on Windows or running e2fsck to
> replay the journal and then continue as ext2. But for now we are
> accessing the file system as ext2.

Ah, that was the missing piece.  You have an ext2-only driver under
Windows, and the SD card has to be usable there.  It makes *much* more
sense now.  :-)

I'll note that the code to run the journal is available in userspace,
and while I didn't originally write it so I can only offer it to you
under GPL, it wouldn't be that hard to make it work under Windows.  At
various times I've taken patches to make parts of e2fsprogs work under
Windows.  (In fact, the original version of resize2fs was paid for by
the folks who make the PartitionMagic program, and helped pay for the down
payment on my house.  :-) Check out lib/ext2fs/dosio.c in the
e2fsprogs sources.  I don't think anyone has tried building e2fsprogs
on a Windows/Dos environment in quite some time, so I'm sure some
patches will be necessary, but it may be quite a bit easier than you
think.  (As part of the PartitionMagic contract I also made parts of
e2fsck and mke2fs work on Windows as well, although that was over ten
years ago by now.)

That being said, I have to ask the question --- if the goal is Windows
compatibility, why aren't you using FAT?  Is the performance benefit
critical for your application?  Or do you need a POSIX-compliant
filesystem?

> The amount of writing is indeed limited. We actually considered
> unmounting and then mounting again but dropped that idea because we do
> not want to close files that are opened for reading. I thought that
> remounting ro did not flush the journal and reset the INCOMPAT_RECOVER
> flag. I will experiment with remounting, thanks!

Yes, remounting read/only will flush the journal and clear the
INCOMPAT_RECOVER bit.

							- Ted


* Re: [RFC] ext3/jbd, kernel 2.6.13, make ext3 mountable as ext2 when journal is empty.
  2008-07-09 21:30       ` Theodore Tso
@ 2008-07-10  7:01         ` Jan Willem van den Brand
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Willem van den Brand @ 2008-07-10  7:01 UTC
  To: Theodore Tso, Jan Willem van den Brand, linux-kernel

> Ah, that was the missing piece.  You have an ext2-only driver under
>  Windows, and the SD card has to be usable there.  It makes *much* more
>  sense now.  :-)

Actually, we have a different situation. I left out this part in my
explanation because I did not want to confuse people :).  But here it
goes. The SD-card is formatted with the FAT file system, which holds an
ext3-formatted loop file. In this way, the SD-card is "mounted" by
Windows without the need for a driver. Our Windows application
accesses the loop file and only supports ext2. The embedded device
does all its writing in the loop file to ensure power-fail safety.
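
For illustration, an application reading such a loop file could check
whether the image is safe to treat as ext2 by looking at the flag in the
superblock. The following is a rough sketch, not our actual Windows
code: sb_needs_recovery and image_needs_recovery are made-up names. The
offsets follow the ext2/ext3 on-disk layout (superblock at byte 1024,
s_magic at 0x38, s_feature_incompat at 0x60, both little-endian).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define EXT_SB_OFFSET       1024    /* superblock starts 1024 bytes in */
#define EXT_SB_SIZE         1024
#define EXT_MAGIC_OFFSET    0x38    /* s_magic, 16-bit little-endian */
#define EXT_INCOMPAT_OFFSET 0x60    /* s_feature_incompat, 32-bit LE */
#define EXT_MAGIC           0xEF53u
#define INCOMPAT_RECOVER    0x0004u /* EXT3_FEATURE_INCOMPAT_RECOVER */

/*
 * Inspect a raw superblock buffer of at least EXT_SB_SIZE bytes.
 * Returns 1 if the journal needs recovery, 0 if the flag is clear,
 * and -1 if the buffer does not carry the ext2/ext3 magic number.
 */
int sb_needs_recovery(const unsigned char *sb)
{
	uint32_t magic, incompat;

	assert(sb != NULL);
	magic = (uint32_t)sb[EXT_MAGIC_OFFSET] |
		((uint32_t)sb[EXT_MAGIC_OFFSET + 1] << 8);
	if (magic != EXT_MAGIC)
		return -1;

	incompat = (uint32_t)sb[EXT_INCOMPAT_OFFSET] |
		   ((uint32_t)sb[EXT_INCOMPAT_OFFSET + 1] << 8) |
		   ((uint32_t)sb[EXT_INCOMPAT_OFFSET + 2] << 16) |
		   ((uint32_t)sb[EXT_INCOMPAT_OFFSET + 3] << 24);
	return (incompat & INCOMPAT_RECOVER) != 0;
}

/* Same check on an image file; returns -1 if it cannot be read. */
int image_needs_recovery(const char *path)
{
	unsigned char sb[EXT_SB_SIZE];
	FILE *f = fopen(path, "rb");
	int ret = -1;

	if (!f)
		return -1;
	if (fseek(f, EXT_SB_OFFSET, SEEK_SET) == 0 &&
	    fread(sb, 1, sizeof(sb), f) == sizeof(sb))
		ret = sb_needs_recovery(sb);
	fclose(f);
	return ret;
}
```

A result of 1 means the journal would have to be replayed before the
image can be read as ext2.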

Initially, the solution did not pass our power fail tests. The problem
was in the page cache reordering. We have the following situation:

user | ext3 | loop | FAT | disk

The ext3 file system writes in the page cache but ensures that
ordering constraints are obeyed. By default, loop writes directly in
the page cache of the FAT file. The pages are then written out of
order to disk, disobeying the ext3 ordering constraints. Enforcing
ordered page writes by mounting the underlying file system (FAT) in
sync mode did not work because loop.c writes directly in the page
cache. We solved this by not writing directly in the page cache in
loop.c if the underlying file system is mounted in sync mode.

This solution passes our power fail tests. I will share this patch in
a new post.

>  I'll note that the code to run the journal is available in userspace,
>  and while I didn't originally write it so I can only offer it to you
>  under GPL, it wouldn't be that hard to make it work under Windows.  At
>  various times I've taken patches to make parts of e2fsprogs work under
>  Windows.  (In fact, the original version of resize2fs was paid for by
>  the folks who make the PartitionMagic program, and helped pay for the down
>  payment on my house.  :-) Check out lib/ext2fs/dosio.c in the
>  e2fsprogs sources.  I don't think anyone has tried building e2fsprogs
>  on a Windows/Dos environment in quite some time, so I'm sure some
>  patches will be necessary, but it may be quite a bit easier than you
>  think.  (As part of the PartitionMagic contract I also made parts of
>  e2fsck and mke2fs work on Windows as well, although that was over ten
>  years ago by now.)

Another solution is to compile e2fsprogs with cygwin. I actually have
compiled the latest version and it passes all tests.

>  That being said, I have to ask the question --- if the goal is Windows
>  compatibility, why aren't you using FAT?  Is the performance benefit
>  critical for your application?  Or do you need a POSIX-compliant
>  filesystem?

The embedded device has to be able to cope with power failure. FAT
becomes inconsistent after power failure. During our tests, the FAT
file system always broke after just a few power failures. Ext3 as well as
ext3 + loop (with the loop patch) run for days without breaking.

> Yes, remounting read/only will flush the journal and clear the
>  INCOMPAT_RECOVER bit.

Thanks again! I will consider this alternative.

Best regards,

Jan Willem van den Brand

