All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Becker <Joel.Becker@oracle.com>
To: Theodore Tso <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com, linux-ext4@vger.kernel.org, mfasheh@suse.com
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers
Date: Mon, 6 Oct 2008 18:01:54 -0700	[thread overview]
Message-ID: <20081007010154.GE26632@mail.oracle.com> (raw)
In-Reply-To: <20081006233248.GA9337@mit.edu>

On Mon, Oct 06, 2008 at 07:32:48PM -0400, Theodore Tso wrote:
> On Mon, Oct 06, 2008 at 02:42:52PM -0700, Joel Becker wrote:
> I'm not 100% sure.....  The other area that we should check very
> closely is jbd2_journal_commit_transaction(); in some cases, if
> jh->b_committed_data is NULL, the frozen data is thrown away (around
> line 850 in transaction.c).  I *think* this happens if b_frozen_data
> was only copied to escape the buffer, but I'm not certain; in any
> case, there's a potential that in that case you might lose the
> calculated checksum and the correct value wouldn't get written to the
> final location on disk.  

	I remember looking at this before.  Basically, if there is a
later transaction also using this buffer, b_frozen_data has this
transaction's version of said buffer.  We've now written that to the
journal.  b_committed_data exists for get_undo_access().  We've just
committed the buffer, so we know that we no longer need to worry about
the undo access.  If we have both, it's a new undo access, and it needs
b_frozen_data moved into b_committed_data.
	How does this affect our checksum?  This all actually falls out
from the fact that jbd skips checkpointing for buffers that are part of
a later transaction.
	If this buffer is only part of the currently committing
transaction, write_metadata_buffer() will have our trigger (from here on
in the place you suggest, not where I put it) modify b_data directly.
It may get copied to b_frozen_data for escaping, but that b_data will go
out with the checkpoint.  Thus, we have a valid checksum on both the
committed buffer and the checkpointed buffer.
	If the buffer is part of a later transaction as well,
write_metadata_buffer() will have our trigger fire on b_frozen_data.
This will put the checksum in the journal, but not modify b_data.
In a healthy system, the later transaction will eventually commit, and
it will put a good checksum in b_data, finally checkpointing that value.
ocfs2 will force this process via journal_flush() before it lets other
nodes look at the disk.
	Looking at the checkpoint part, though, I think we're not safe.
The buffer is attached to the original transaction's checkpoint list
after the commit.  This buffer has the un-checksummed b_data.  If the
later transaction commits before the checkpoint happens, all is good.
But if the buffer lazily writes to disk while the later transaction is
still running, the original transaction could be considered "done",
updating the journal superblock.  If we crash at that moment, we have a
bad checksum on disk.
	I suppose we could trigger on a checkpointed buffer going out?

Joel

-- 

Life's Little Instruction Book #222

	"Think twice before burdening a friend with a secret."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

WARNING: multiple messages have this Message-ID (diff)
From: Joel Becker <Joel.Becker@oracle.com>
To: Theodore Tso <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com, linux-ext4@vger.kernel.org, mfasheh@suse.com
Subject: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers
Date: Mon, 6 Oct 2008 18:01:54 -0700	[thread overview]
Message-ID: <20081007010154.GE26632@mail.oracle.com> (raw)
In-Reply-To: <20081006233248.GA9337@mit.edu>

On Mon, Oct 06, 2008 at 07:32:48PM -0400, Theodore Tso wrote:
> On Mon, Oct 06, 2008 at 02:42:52PM -0700, Joel Becker wrote:
> I'm not 100% sure.....  The other area that we should check very
> closely is jbd2_journal_commit_transaction(); in some cases, if
> jh->b_committed_data is NULL, the frozen data is thrown away (around
> line 850 in transaction.c).  I *think* this happens if b_frozen_data
> was only copied to escape the buffer, but I'm not certain; in any
> case, there's a potential that in that case you might lose the
> calculated checksum and the correct value wouldn't get written to the
> final location on disk.  

	I remember looking at this before.  Basically, if there is a
later transaction also using this buffer, b_frozen_data has this
transaction's version of said buffer.  We've now written that to the
journal.  b_committed_data exists for get_undo_access().  We've just
committed the buffer, so we know that we no longer need to worry about
the undo access.  If we have both, it's a new undo access, and it needs
b_frozen_data moved into b_committed_data.
	How does this affect our checksum?  This all actually falls out
from the fact that jbd skips checkpointing for buffers that are part of
a later transaction.
	If this buffer is only part of the currently committing
transaction, write_metadata_buffer() will have our trigger (from here on
in the place you suggest, not where I put it) modify b_data directly.
It may get copied to b_frozen_data for escaping, but that b_data will go
out with the checkpoint.  Thus, we have a valid checksum on both the
committed buffer and the checkpointed buffer.
	If the buffer is part of a later transaction as well,
write_metadata_buffer() will have our trigger fire on b_frozen_data.
This will put the checksum in the journal, but not modify b_data.
In a healthy system, the later transaction will eventually commit, and
it will put a good checksum in b_data, finally checkpointing that value.
ocfs2 will force this process via journal_flush() before it lets other
nodes look at the disk.
	Looking at the checkpoint part, though, I think we're not safe.
The buffer is attached to the original transaction's checkpoint list
after the commit.  This buffer has the un-checksummed b_data.  If the
later transaction commits before the checkpoint happens, all is good.
But if the buffer lazily writes to disk while the later transaction is
still running, the original transaction could be considered "done",
updating the journal superblock.  If we crash at that moment, we have a
bad checksum on disk.
	I suppose we could trigger on a checkpointed buffer going out?

Joel

-- 

Life's Little Instruction Book #222

	"Think twice before burdening a friend with a secret."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

  reply	other threads:[~2008-10-07  1:03 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-17 23:26 [PATCH] [RFC] jbd2: Add buffer triggers Joel Becker
2008-09-17 23:26 ` [Ocfs2-devel] " Joel Becker
2008-09-17 23:27 ` [PATCH] ocfs2: Use the new jbd_journal_set_triggers() to printk Joel Becker
2008-09-17 23:27   ` [Ocfs2-devel] " Joel Becker
2008-09-19 21:46 ` [PATCH] [RFC] jbd2: Add buffer triggers Andreas Dilger
2008-09-19 21:46   ` [Ocfs2-devel] " Andreas Dilger
2008-09-20  3:36   ` Joel Becker
2008-09-20  3:36     ` Joel Becker
2008-09-29  1:25 ` Theodore Tso
2008-09-29  1:25   ` [Ocfs2-devel] " Theodore Tso
2008-10-04  0:03   ` Theodore Tso
2008-10-04  0:03     ` [Ocfs2-devel] " Theodore Tso
2008-10-06 21:37     ` Joel Becker
2008-10-06 21:37       ` [Ocfs2-devel] " Joel Becker
2008-10-06 21:42       ` Joel Becker
2008-10-06 21:42         ` [Ocfs2-devel] " Joel Becker
2008-10-06 23:32         ` Theodore Tso
2008-10-06 23:32           ` [Ocfs2-devel] " Theodore Tso
2008-10-07  1:01           ` Joel Becker [this message]
2008-10-07  1:01             ` Joel Becker
2008-10-08 23:17             ` Joel Becker
2008-10-08 23:17               ` Joel Becker
2008-10-16 17:42               ` Theodore Tso
2008-10-16 17:42                 ` Theodore Tso
2008-10-16 19:40                 ` Joel Becker
2008-10-16 19:40                   ` Joel Becker
2008-10-16 19:44                   ` Joel Becker
2008-10-16 19:44                     ` [Ocfs2-devel] " Joel Becker
2008-10-17 12:28                   ` Theodore Tso
2008-10-17 12:28                     ` Theodore Tso
2008-10-17 17:11                     ` Mark Fasheh
2008-10-17 17:11                       ` Mark Fasheh
2008-10-07  1:04           ` Joel Becker
2008-10-07  1:04             ` Joel Becker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081007010154.GE26632@mail.oracle.com \
    --to=joel.becker@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.