* EXT intent logging
@ 2004-08-06  4:55 Buddy Lumpkin
  2004-08-06  9:57 ` Daniel Pittman
  2004-08-06 19:56 ` Theodore Ts'o
From: Buddy Lumpkin @ 2004-08-06  4:55 UTC
  To: linux-kernel

I recently moved from a Sun/Solaris environment to a mostly Linux
environment.

A large NFS server went down recently, and as it rebooted, fsck ran for
a while before the data volumes could be mounted. I noticed the
filesystem was ext3 and asked: is journaling disabled? Why on earth is
fsck running at all? The admin assured me this is quite normal for
ext3, and after a few minutes the system was brought back online.

I looked at the configuration, and it turns out the filesystem was
mounted with data=ordered. The name "ordered" sounded to me like it
should do the kind of intent logging that I am accustomed to on UFS and
VxFS. I was very surprised to read that ext3 updates the standard
data/metadata blocks prior to updating the journal. While I'm sure this
achieves what the snippet from the ext3 FAQ below says ("this mode
guarantees that after a crash, files will never contain stale data
blocks from old files"), I don't see how fsck time can be eliminated
entirely with this journaling method.

To eliminate fsck on large filesystems, wouldn't you have to update the
journal first, then update the data blocks? That way, in the event of a
crash, the last entries in the log would represent the last I/O
operations that were "intended", and those blocks could be inspected
for consistency.
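
As a rough illustration of the journal-first idea described above, here
is a minimal Python sketch. It is a toy model, not how ext3, UFS, or
VxFS work internally; ToyDisk, log_then_write, and replay are
hypothetical names invented for this example.

```python
# Toy model of journal-first ("intent") logging. Hypothetical names
# throughout; this does not mirror any real filesystem's on-disk format.

class ToyDisk:
    def __init__(self):
        self.journal = []   # intent records that have not been checkpointed
        self.blocks = {}    # block number -> contents at the final location


def log_then_write(disk, updates, crash_after_log=False):
    """Record the intended updates in the journal, then apply them in place."""
    record = dict(updates)
    disk.journal.append(record)      # step 1: the intent reaches the log first
    if crash_after_log:
        return                       # simulate power loss before step 2
    disk.blocks.update(record)       # step 2: update the real blocks
    disk.journal.remove(record)      # step 3: the record is no longer needed


def replay(disk):
    """Recovery: re-apply whatever is still in the journal.

    Only the logged blocks are touched, so recovery cost depends on the
    journal size rather than on the size of the filesystem.
    """
    replayed = len(disk.journal)
    for record in disk.journal:
        disk.blocks.update(record)
    disk.journal.clear()
    return replayed


disk = ToyDisk()
log_then_write(disk, {7: "inode v2"}, crash_after_log=True)
replay(disk)   # re-applies the one outstanding record
```

Replaying a record twice gives the same result as replaying it once,
which is why recovery in this model never needs a full scan.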

This, of course, is my non-kernel-hacker understanding of how this
works, but I can say one thing: with UFS mounted with -o logging, I can
start a ton of reads and writes, kill the power on the system, and not
expect to see any delay when the system comes back up.

Of course, UFS logging does not log data, only metadata (as the
data=ordered and data=writeback options do).

Also, VxFS, which I believe behaves more like data=journal, spends very
little time replaying the journal after a nasty crash.

We wanted the journal to be updated first, but we couldn't understand
why we had to opt for data journaling to accomplish this.
Unfortunately, we have seen corruption as a result of the data=journal
option.

Could someone explain why there isn't an option in ext3 to log only
metadata, but completely avoid fsck by updating the log before the data
blocks?

And I'm sure I don't need to ask anyone to correct me if I am misguided
in my thinking. I have found that on lkml that kind of guidance usually
comes for free.

--Buddy


---------------------
    "mount -o data=ordered"
        Only journals metadata changes, but data updates are flushed to
        disk before any transactions commit. Data writes are not atomic
        but this mode still guarantees that after a crash, files will
        never contain stale data blocks from old files.
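
Going only by the FAQ wording quoted above, the ordering that
data=ordered promises can be sketched in Python as follows. This is an
illustrative model under that assumption, not ext3 code; commit_ordered
and the event names are hypothetical.

```python
# Toy model of the ordering described in the FAQ text: data blocks are
# flushed first, and only then is the metadata transaction committed.
# commit_ordered and the event tuples are hypothetical.

def commit_ordered(data_blocks, metadata_changes):
    """Return the sequence of disk events for one transaction."""
    events = []
    # File data goes straight to its final location; it is not journaled.
    for block_no in sorted(data_blocks):
        events.append(("write-data", block_no))
    # Only after every data block is on disk does the metadata commit,
    # so committed metadata never points at blocks with stale contents.
    events.append(("journal-commit", tuple(sorted(metadata_changes))))
    return events


events = commit_ordered({13: b"y", 12: b"x"}, {"inode 5": "size=8192"})
```

In this model a crash before the final event leaves the old metadata in
place, which matches the "no stale data blocks" guarantee but says
nothing about the data writes themselves being atomic.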




Thread overview: 8+ messages
2004-08-06  4:55 EXT intent logging Buddy Lumpkin
2004-08-06  9:57 ` Daniel Pittman
2004-08-06 13:22   ` Doug McNaught
2004-08-06 16:36     ` Buddy Lumpkin
2004-08-06 18:46       ` Bernd Eckenfels
2004-08-06 19:56 ` Theodore Ts'o
2004-08-07  4:50   ` Buddy Lumpkin
2004-08-08  1:54   ` Thomas Zimmerman
