From: Jan Kara <jack@suse.cz>
To: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: akpm@linux-foundation.org, sct@redhat.com, adilger@clusterfs.com,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	jack@suse.cz, jbacik@redhat.com, cmm@us.ibm.com, tytso@mit.edu,
	sugita <yumiko.sugita.yf@hitachi.com>,
	Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Subject: Re: [PATCH 4/5] jbd: fix error handling for checkpoint io
Date: Mon, 2 Jun 2008 14:44:09 +0200
Message-ID: <20080602124409.GL30613@duck.suse.cz>
In-Reply-To: <4843CFBD.7040706@hitachi.com>

On Mon 02-06-08 19:47:25, Hidehiro Kawai wrote:
> Subject: [PATCH 4/5] jbd: fix error handling for checkpoint io
> 
> When a checkpointing IO fails, current JBD code doesn't check the
> error and continues journaling.  This means the latest metadata can be
> lost from both the journal and filesystem.
> 
> This patch leaves the failed metadata blocks in the journal space
> and aborts journaling in the case of log_do_checkpoint().
> To achieve this, we need to do:
> 
> 1. don't remove the failed buffer from the checkpoint list in
>    the case of __try_to_free_cp_buf() because it may be released or
>    overwritten by a later transaction
> 2. log_do_checkpoint() is the last chance, remove the failed buffer
>    from the checkpoint list and abort the journal
> 3. when checkpointing fails, don't update the journal super block to
>    prevent the journaled contents from being cleaned.  For safety,
>    don't update j_tail and j_tail_sequence either
> 4. when checkpointing fails, notify the ext3 layer of this error so
>    that ext3 doesn't clear the needs_recovery flag, otherwise the
>    journaled contents are ignored and cleaned in the recovery phase
> 5. if the recovery fails, keep the needs_recovery flag
> 6. prevent cleanup_journal_tail() from being called between
>    __journal_drop_transaction() and journal_abort() (a race issue
>    between journal_flush() and __log_wait_for_space())
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
  Just a few minor comments:

> 
> Index: linux-2.6.26-rc4/fs/jbd/checkpoint.c
> ===================================================================
> --- linux-2.6.26-rc4.orig/fs/jbd/checkpoint.c
> +++ linux-2.6.26-rc4/fs/jbd/checkpoint.c

<snip>

> @@ -318,6 +331,7 @@ int log_do_checkpoint(journal_t *journal
>  	 * OK, we need to start writing disk blocks.  Take one transaction
>  	 * and write it.
>  	 */
> +	result = 0;
>  	spin_lock(&journal->j_list_lock);
>  	if (!journal->j_checkpoint_transactions)
>  		goto out;
> @@ -334,7 +348,7 @@ restart:
>  		int batch_count = 0;
>  		struct buffer_head *bhs[NR_BATCH];
>  		struct journal_head *jh;
> -		int retry = 0;
> +		int retry = 0, err;
>  
>  		while (!retry && transaction->t_checkpoint_list) {
>  			struct buffer_head *bh;
> @@ -347,6 +361,8 @@ restart:
>  				break;
>  			}
>  			retry = __process_buffer(journal, jh, bhs,&batch_count);
> +			if (retry < 0)
> +				result = retry;
  Here you overwrite result whenever retry is < 0, but below you set it only
when result == 0. I think it's better to make these two consistent (not that
it currently makes any functional difference).
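
  For illustration only, one way to keep both sites consistent is to funnel
them through a single "first error wins" helper. This is an untested
user-space sketch, not jbd code: record_first_error() and checkpoint_result()
are made-up names, step_rc[] stands in for the __process_buffer() return
values and wait_rc for the __wait_cp_io() return value.

```c
#include <stddef.h>

/*
 * Hypothetical helper: store err in *result only if it is a real error
 * (negative) and no earlier error has been recorded yet.
 */
static void record_first_error(int *result, int err)
{
	if (err < 0 && *result == 0)
		*result = err;
}

/*
 * Hypothetical stand-in for the checkpoint loop.  Positive values in
 * step_rc[] mean "retry" and are not errors, so they are ignored here.
 */
static int checkpoint_result(const int *step_rc, size_t n, int wait_rc)
{
	int result = 0;
	size_t i;

	for (i = 0; i < n; i++)
		record_first_error(&result, step_rc[i]);
	record_first_error(&result, wait_rc);
	return result;
}
```

  With this, both the per-buffer path and the wait path report the first
error seen; keeping the last error instead would work just as well, as long
as both sites follow the same rule.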

>  			if (!retry && (need_resched() ||
>  				spin_needbreak(&journal->j_list_lock))) {
>  				spin_unlock(&journal->j_list_lock);
> @@ -371,14 +387,18 @@ restart:
>  		 * Now we have cleaned up the first transaction's checkpoint
>  		 * list. Let's clean up the second one
>  		 */
> -		__wait_cp_io(journal, transaction);
> +		err = __wait_cp_io(journal, transaction);
> +		if (!result)
> +			result = err;
>  	}

> @@ -1360,10 +1370,16 @@ int journal_flush(journal_t *journal)
>  	spin_lock(&journal->j_list_lock);
>  	while (!err && journal->j_checkpoint_transactions != NULL) {
>  		spin_unlock(&journal->j_list_lock);
> +		mutex_lock(&journal->j_checkpoint_mutex);
>  		err = log_do_checkpoint(journal);
> +		mutex_unlock(&journal->j_checkpoint_mutex);
>  		spin_lock(&journal->j_list_lock);
>  	}
>  	spin_unlock(&journal->j_list_lock);
> +
> +	if (is_journal_aborted(journal))
> +		return -EIO;
> +
>  	cleanup_journal_tail(journal);
>  
>  	/* Finally, mark the journal as really needing no recovery.
  OK, so this way you've basically serialized all callers of
log_do_checkpoint(). That should be fine because the only
performance-critical caller is __log_wait_for_space(), and that was already
serialized before. So this change is fine with me. Just please add a comment
in front of log_do_checkpoint() saying that it must be called with
j_checkpoint_mutex held so that EIO propagation works correctly.
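
  To make the calling convention concrete, here is a rough user-space sketch
of the pattern (pthread mutex instead of the kernel mutex; struct toy_journal,
toy_do_checkpoint() and toy_flush() are made-up names and a heavily
simplified model, not the real jbd structures):

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical, cut-down journal: only the fields this sketch needs. */
struct toy_journal {
	pthread_mutex_t checkpoint_mutex; /* stands in for j_checkpoint_mutex */
	int pending;      /* transactions still waiting for checkpoint */
	int fail_at;      /* value of pending at which the write fails; 0 = never */
	int aborted;      /* set once a checkpoint write has failed */
	int tail_cleaned; /* set when the journal tail has been advanced */
};

/*
 * Must be called with j->checkpoint_mutex held.  Because the mutex
 * serializes all callers, a caller that sees success knows no other
 * caller is still in the middle of a failing checkpoint.
 */
static int toy_do_checkpoint(struct toy_journal *j)
{
	if (j->pending == j->fail_at) {
		j->aborted = 1; /* models the journal abort on checkpoint failure */
		return -EIO;
	}
	j->pending--;
	return 0;
}

/* Models the journal_flush() loop from the hunk above. */
static int toy_flush(struct toy_journal *j)
{
	int err = 0;

	while (!err && j->pending > 0) {
		pthread_mutex_lock(&j->checkpoint_mutex);
		err = toy_do_checkpoint(j);
		pthread_mutex_unlock(&j->checkpoint_mutex);
	}

	/* Never advance the tail of an aborted journal. */
	if (j->aborted)
		return -EIO;

	j->tail_cleaned = 1;
	return 0;
}
```

  The point of the comment I'm asking for is the one above
toy_do_checkpoint(): the function itself doesn't take the mutex, so the
requirement has to be documented for its callers.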

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
