From: "Stephen C. Tweedie" <sct@redhat.com>
To: Jakob Oestergaard <jakob@unthought.net>, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@zip.com.au>, Stephen Tweedie <sct@redhat.com>
Subject: Re: jbd bug(s) (?)
Date: Wed, 25 Sep 2002 17:36:05 +0100 [thread overview]
Message-ID: <20020925173605.A12911@redhat.com> (raw)
In-Reply-To: <20020924072117.GD2442@unthought.net>; from jakob@unthought.net on Tue, Sep 24, 2002 at 09:21:17AM +0200
Hi,
On Tue, Sep 24, 2002 at 09:21:17AM +0200, Jakob Oestergaard wrote:
> In Linux-2.4.19, I was wondering about the following:
>
> In fs/jbd/commit.c:583, we find the following:
> /* AKPM: buglet - add `i' to tmp! */
> for (i = 0; i < jh2bh(descriptor)->b_size; i += 512) {
> As I see it, this means that jbd using filesystems (ext3) will only
> remember writing *ONE* entry from the journal.
> Isn't this a problem ?
Nah. In fact I should just remove the loop entirely. For commit
processing, only the header at the very, very start of a commit block
is cared about --- that way, we get atomic commits even if the commit
block is partially written out-of-order on disk. As long as sector
writes within the fs block are atomic, the header remains intact.
> The jbd superblocks contains an index into the journal for the first
> transaction - but there is only *one* copy of the index, and there is no
> reasonable way to detect if it got written correctly to disk.
>
> If the system loses power while updating the superblock, and only *half*
> of this index is written correctly, we have a journal which we cannot
> reach.
Again, only the data in the first sector matters there, and we assume
that disks write individual sectors atomically, or return IO failure
if things get messed up. And the index sector is not updated all that
frequently anyway --- maybe once or twice per journal wrap, but it
doesn't have to be written for each transaction.
> Sort of removes the point of having the journal in the first place. (If
> my above assertion is true).
Actually, the number of single points of failure in a filesystem is
huge. If we lose, say, the root directory, we're toast too (and that
can be due to an inode block or a directory block failure); similarly,
other key directories are critical, and within the journal itself, an
unreadable metadata descriptor block will rend parts of the journal
unusable for recovery.
So if we detect incomplete sector writes, we can recover by forcing a
fsck, but if you want to be able to survive actual data loss, you need
raid.
> one can consider the two blocks and disregard the ones with invalid CRC.
> This leaves us with one or two blocks left - we then pick the one with
> the highest timestamp - and we are then guaranteed to *always* have a
> valid index.
> Wouldn't something like this be required for a journalling fs to be
> worth anything ?
No, ext3 just relies on the sector atomicity guarantees instead.
There _are_ multi-sector data structures in the journal, but those are
all protected by the sector-atomic commit block --- if we don't see
the commit sector, then we ignore all of the blocks of the prior
transaction in the log, and the commit sector is never written until
we've got a guarantee that the whole of the preceding blocks are
consistent on-disk.
Cheers,
Stephen
next prev parent reply other threads:[~2002-09-25 16:30 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-09-24 7:21 jbd bug(s) (?) Jakob Oestergaard
2002-09-25 16:36 ` Stephen C. Tweedie [this message]
2002-09-26 12:21 ` Jakob Oestergaard
2002-09-26 12:27 ` Stephen C. Tweedie
2002-09-26 12:56 ` Jakob Oestergaard
2002-09-26 13:44 ` Theodore Ts'o
2002-09-26 14:05 ` Christoph Hellwig
2002-09-26 14:25 ` Theodore Ts'o
2002-09-26 14:41 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20020925173605.A12911@redhat.com \
--to=sct@redhat.com \
--cc=akpm@zip.com.au \
--cc=jakob@unthought.net \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox