Re: [PATCH 4/5] jbd: fix error handling for checkpoint io

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: akpm@linux-foundation.org, sct@redhat.com, adilger@clusterfs.com,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	jbacik@redhat.com, cmm@us.ibm.com, tytso@mit.edu,
	sugita <yumiko.sugita.yf@hitachi.com>,
	Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Subject: Re: [PATCH 4/5] jbd: fix error handling for checkpoint io
Date: Fri, 27 Jun 2008 12:24:20 +0200	[thread overview]
Message-ID: <20080627102419.GC3602@duck.suse.cz> (raw)
In-Reply-To: <48649FA0.70303@hitachi.com>

  Hi,

On Fri 27-06-08 17:06:56, Hidehiro Kawai wrote:
> Jan Kara wrote:
> 
> > On Tue 24-06-08 20:52:59, Hidehiro Kawai wrote:
> 
> >>>>3. is implemented as described below.
> >>>> (1) if log_do_checkpoint() detects an I/O error during
> >>>>     checkpointing, it calls journal_abort() to abort the journal
> >>>> (2) if the journal has aborted, don't update s_start and s_sequence
> >>>>     in the on-disk journal superblock
> >>>>
> >>>>So, if the journal aborts, journaled data will be replayed on the
> >>>>next mount.
> >>>>
> >>>>Now, please remember that some dirty metadata buffers are written
> >>>>back to the filesystem without journaling if the journal aborted.
> >>>>We are happy if all dirty metadata buffers are written to the disk,
> >>>>the integrity of the filesystem will be kept.  However, replaying
> >>>>the journaled data can overwrite the latest on-disk metadata blocks
> >>>>partly with old data.  It would break the filesystem.
> >>>
> >>>  Yes, it would. But how do you think it can happen that a metadata buffer
> >>>will be written back to the filesystem when it is a part of running
> >>>transaction? Note that checkpointing code specifically checks whether the
> >>>buffer being written back is part of a running transaction and if so, it
> >>>waits for commit before writing back the buffer. So I don't think this can
> >>>happen but maybe I miss something...
> >>
> >>Checkpointing code checks it and may call log_wait_commit(), but this
> >>problem is caused by transactions which have not started checkpointing.
> >>
> >>For example, the tail transaction has an old update for block_B and
> >>the running transaction has a new update for block_B.  Then, the
> >>committing transaction fails to write the commit record, it aborts the
> >>journal, and new block_B will be written back to the file system without
> >>journaling.  Because this patch doesn't separate between normal abort
> >>and checkpointing related abort, the tail transaction is left in the
> >>journal space.  So by replaying the tail transaction, new block_B is
> >>overwritten with old one.
> > 
> >   Yes, and this is expected an correct. When we cannot properly finish a
> > transaction, we have to discard everything in it. A bug would be (and I
> > think it could currently happen) if we already checkpointed the previous
> > transaction and then written over block_B new data from the uncommitted
> > transaction. I think we have to avoid that - i.e., in case we abort the
> > journal we should not mark buffers dirty when processing the forget loop.
> 
> Yes.
> 
> > But this is not too serious since fsck has to be run anyway and it will
> > fix the problems.
> 
> Yes.  The filesystem should be marked with an error, so fsck will check
> and recover the filesystem on boot.  But this means the filesystem loses
> some latest updates even if it was cleanly unmounted (although some file
> data has been lost.)  I'm a bit afraid that some people would think of
> this as a regression due to this PATCH 4/5.  At least, to avoid
> undesirable replay, we had better keep journaled data only when the
> journal has been aborted for checkpointing related reason.
  I don't think this makes any difference. Look: We have transaction A
modifying block B fully committed to the journal. Now there is a running
(or committing, it does not really matter) transaction R also modifying block
B. Until R gets fully committed, no block modified by R is checkpointed
to the device - checkpointing code takes care of that and it must be so
to satisfy journaling guarantees.
  So if we abort journal (for whatever reason) before R is fully committed,
no change in R will be seen on the filesystem regardless whether you
cleanup the journal or not.
  And you cannot do much differenly - the principle of journaling is that
either all changes in the transaction get to disk or none of them. So until
the transaction is properly committed, you have the only way to satisfy
this condition - take the "none of them" choice.
  Now fsck could of course be clever and try to use updates in not fully
committed transaction but personally I don't think it's worth the effort.
  Please correct me if I just misunderstood your point...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

next prev parent reply	other threads:[~2008-06-27 10:24 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-02 10:40 [PATCH 0/5] jbd: possible filesystem corruption fixes (take 2) Hidehiro Kawai
2008-06-02 10:43 ` [PATCH 1/5] jbd: strictly check for write errors on data buffers Hidehiro Kawai
2008-06-03 22:30   ` Andrew Morton
2008-06-04 10:19     ` Jan Kara
2008-06-04 18:19       ` Andrew Morton
2008-06-04 21:22         ` Theodore Tso
2008-06-04 21:58           ` Andrew Morton
2008-06-04 22:51             ` Theodore Tso
2008-06-05  9:35               ` Jan Kara
2008-06-05 11:33                 ` Hidehiro Kawai
2008-06-05 14:29                   ` Theodore Tso
2008-06-05 16:20                     ` Andrew Morton
2008-06-05 18:49                       ` Andreas Dilger
2008-06-09 10:09                         ` Hidehiro Kawai
2008-06-11 12:35                           ` Jan Kara
2008-06-12 13:19                             ` Hidehiro Kawai
2008-06-05  3:28           ` Mike Snitzer
2008-06-04 21:58         ` Andreas Dilger
2008-06-04 10:53     ` Hidehiro Kawai
2008-06-02 10:45 ` [PATCH 2/5] jbd: ordered data integrity fix Hidehiro Kawai
2008-06-02 11:59   ` Jan Kara
2008-06-03 22:33   ` Andrew Morton
2008-06-04 10:55     ` Hidehiro Kawai
2008-06-02 10:46 ` [PATCH 3/5] jbd: abort when failed to log metadata buffers Hidehiro Kawai
2008-06-02 12:00   ` Jan Kara
2008-06-03 22:35   ` Andrew Morton
2008-06-04 10:57     ` Hidehiro Kawai
2008-06-02 10:47 ` [PATCH 4/5] jbd: fix error handling for checkpoint io Hidehiro Kawai
2008-06-02 12:44   ` Jan Kara
2008-06-03  4:31     ` Hidehiro Kawai
2008-06-03  4:40     ` Hidehiro Kawai
2008-06-03  5:11       ` Hidehiro Kawai
2008-06-03  5:20         ` Andrew Morton
2008-06-03  8:02       ` Jan Kara
2008-06-23 11:14         ` Hidehiro Kawai
2008-06-23 12:22           ` Jan Kara
2008-06-24 11:52             ` Hidehiro Kawai
2008-06-24 13:33               ` Jan Kara
2008-06-27  8:06                 ` Hidehiro Kawai
2008-06-27 10:24                   ` Jan Kara [this message]
2008-06-30  5:09                     ` Hidehiro Kawai
2008-07-07 10:07                       ` Jan Kara
2008-06-02 10:48 ` [PATCH 5/5] ext3: abort ext3 if the journal has aborted Hidehiro Kawai
2008-06-02 12:49   ` Jan Kara
2008-06-02 12:05 ` [PATCH 0/5] jbd: possible filesystem corruption fixes (take 2) Jan Kara
2008-06-03  4:30   ` Hidehiro Kawai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080627102419.GC3602@duck.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger@clusterfs.com \
    --cc=akpm@linux-foundation.org \
    --cc=cmm@us.ibm.com \
    --cc=hidehiro.kawai.ez@hitachi.com \
    --cc=jbacik@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=satoshi.oshima.fk@hitachi.com \
    --cc=sct@redhat.com \
    --cc=tytso@mit.edu \
    --cc=yumiko.sugita.yf@hitachi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox