Re: + jbd-fix-error-handling-for-checkpoint-io.patch added to -mm tree

From: Jan Kara <jack@suse.cz>
To: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: akpm@linux-foundation.org, jack@suse.cz,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	jbacik@redhat.com, cmm@us.ibm.com, tytso@mit.edu, sct@redhat.com,
	adilger@clusterfs.com, mm-commits@vger.kernel.org,
	yumiko.sugita.yf@hitachi.com, satoshi.oshima.fk@hitachi.com
Subject: Re: + jbd-fix-error-handling-for-checkpoint-io.patch added to -mm tree
Date: Thu, 21 Aug 2008 13:51:33 +0200
Message-ID: <20080821115133.GC5428@duck.suse.cz>
In-Reply-To: <48AD3ED7.6050903@hitachi.com>

  Hello,

On Thu 21-08-08 19:09:27, Hidehiro Kawai wrote:
> > The patch titled
> >      jbd: fix error handling for checkpoint io
> > has been added to the -mm tree.  Its filename is
> >      jbd-fix-error-handling-for-checkpoint-io.patch
> 
> [snip]
> 
> > Subject: jbd: fix error handling for checkpoint io
> > From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> > 
> > When a checkpointing IO fails, current JBD code doesn't check the error
> > and continues journaling.  This means the latest metadata can be lost from
> > both the journal and the filesystem.
> > 
> > This patch leaves the failed metadata blocks in the journal space and
> > aborts journaling in the case of log_do_checkpoint().  To achieve this, we
> > need to do:
> > 
> > 1. don't remove the failed buffer from the checkpoint list in
> >    the case of __try_to_free_cp_buf(), because it may be released or
> >    overwritten by a later transaction
> > 2. log_do_checkpoint() is the last chance, remove the failed buffer
> >    from the checkpoint list and abort the journal
> > 3. when checkpointing fails, don't update the journal super block to
> >    prevent the journaled contents from being cleaned.  For safety,
> >    don't update j_tail and j_tail_sequence either
> > 4. when checkpointing fails, report this error to the ext3 layer so
> >    that ext3 doesn't clear the needs_recovery flag, otherwise the
> >    journaled contents are ignored and cleaned in the recovery phase
> > 5. if the recovery fails, keep the needs_recovery flag
> 
> > 6. prevent cleanup_journal_tail() from being called between
> >    __journal_drop_transaction() and journal_abort() (a race issue
> >    between journal_flush() and __log_wait_for_space())
> 
> When I read the source code again, I noticed the race condition described
> in 6 doesn't happen.  I had thought journal_flush() could invoke
> log_do_checkpoint() while __log_wait_for_space() is invoking
> log_do_checkpoint(), but that appears to be wrong.
> 
> First journal_flush() invokes __log_start_commit() and log_wait_commit()
> pair.  After this, there is no running transaction and no starting handle.
> New handles are also not created because j_barrier_count blocks it.
> Thus, when journal_flush() invokes log_do_checkpoint(), there is
> no other process which invokes __log_wait_for_space() and
> log_do_checkpoint() to get free log space.  So invocations of
> log_do_checkpoint() are always isolated, the race condition doesn't
> happen.
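  I'm not quite following you. j_barrier_count is increased only in
journal_lock_updates(). No one is forced to call journal_lock_updates()
before journal_flush() (although it is usually done that way). Without
that call nothing blocks new handles, so as far as I can see another
process can still end up in __log_wait_for_space() and log_do_checkpoint()
while journal_flush() is checkpointing. So I think taking the
j_checkpoint_mutex in journal_flush() is really a good thing to do.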
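
To illustrate the point, here is a rough userspace sketch, not kernel code.
It assumes pthreads in place of the kernel primitives, and every toy_* name
is hypothetical: toy_journal stands in for journal_t, toy_do_checkpoint()
for log_do_checkpoint(), and the two threads for a journal_flush() caller
and a __log_wait_for_space() caller. It only shows that once both paths
take the same mutex, the checkpoint body never runs concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for journal_t; not the kernel structure. */
struct toy_journal {
	pthread_mutex_t checkpoint_mutex; /* plays the role of j_checkpoint_mutex */
	atomic_int in_checkpoint;         /* callers currently inside the checkpoint */
	atomic_int overlaps;              /* times a second caller was seen inside */
};

/* Stand-in for log_do_checkpoint(): notes whether someone else is inside. */
static void toy_do_checkpoint(struct toy_journal *j)
{
	if (atomic_fetch_add(&j->in_checkpoint, 1) != 0)
		atomic_fetch_add(&j->overlaps, 1);
	atomic_fetch_sub(&j->in_checkpoint, 1);
}

/* Both the flush-like and the wait-for-space-like path take the mutex. */
static void *toy_caller(void *arg)
{
	struct toy_journal *j = arg;

	for (int i = 0; i < 100000; i++) {
		pthread_mutex_lock(&j->checkpoint_mutex);
		toy_do_checkpoint(j);
		pthread_mutex_unlock(&j->checkpoint_mutex);
	}
	return NULL;
}

/* Runs two concurrent callers; returns how many checkpoints overlapped. */
static int toy_run(void)
{
	struct toy_journal j;
	pthread_t flush, wait_for_space;
	int rc;

	rc = pthread_mutex_init(&j.checkpoint_mutex, NULL);
	assert(rc == 0);
	atomic_init(&j.in_checkpoint, 0);
	atomic_init(&j.overlaps, 0);

	rc = pthread_create(&flush, NULL, toy_caller, &j);
	assert(rc == 0);
	rc = pthread_create(&wait_for_space, NULL, toy_caller, &j);
	assert(rc == 0);
	pthread_join(flush, NULL);
	pthread_join(wait_for_space, NULL);

	pthread_mutex_destroy(&j.checkpoint_mutex);
	(void)rc; /* silence unused warning when built with NDEBUG */
	return atomic_load(&j.overlaps);
}
```

Calling toy_run() from a small main() should return 0 overlaps. Dropping the
lock/unlock pair in toy_caller() removes that guarantee, which is the
situation the mutex in journal_flush() is meant to rule out.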

> If my understanding is correct, adding mutex_lock() around
> log_do_checkpoint() (see below) is unneeded.
> 
> What do you think about this?
> 
> [snip]
> > @@ -1359,10 +1369,16 @@ int journal_flush(journal_t *journal)
> >  	spin_lock(&journal->j_list_lock);
> >  	while (!err && journal->j_checkpoint_transactions != NULL) {
> >  		spin_unlock(&journal->j_list_lock);
> > +		mutex_lock(&journal->j_checkpoint_mutex);
> >  		err = log_do_checkpoint(journal);
> > +		mutex_unlock(&journal->j_checkpoint_mutex);
> >  		spin_lock(&journal->j_list_lock);

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
