From: Jan Kara <jack@suse.cz>
To: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: akpm@linux-foundation.org, sct@redhat.com, adilger@clusterfs.com,
linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
jack@suse.cz, jbacik@redhat.com, cmm@us.ibm.com, tytso@mit.edu,
sugita <yumiko.sugita.yf@hitachi.com>,
Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Subject: Re: [PATCH 4/5] jbd: fix error handling for checkpoint io
Date: Mon, 2 Jun 2008 14:44:09 +0200
Message-ID: <20080602124409.GL30613@duck.suse.cz>
In-Reply-To: <4843CFBD.7040706@hitachi.com>
On Mon 02-06-08 19:47:25, Hidehiro Kawai wrote:
> Subject: [PATCH 4/5] jbd: fix error handling for checkpoint io
>
> When a checkpointing IO fails, current JBD code doesn't check the
> error and continues journaling. This means the latest metadata can be
> lost from both the journal and the filesystem.
>
> This patch leaves the failed metadata blocks in the journal space
> and aborts journaling in the case of log_do_checkpoint().
> To achieve this, we need to do the following:
>
> 1. in __try_to_free_cp_buf(), don't remove the failed buffer from
> the checkpoint list, because it may be released or overwritten by
> a later transaction
> 2. since log_do_checkpoint() is the last chance, remove the failed
> buffer from the checkpoint list there and abort the journal
> 3. when checkpointing fails, don't update the journal superblock, to
> prevent the journaled contents from being cleaned. For safety,
> don't update j_tail and j_tail_sequence either
> 4. when checkpointing fails, report the error to the ext3 layer so
> that ext3 doesn't clear the needs_recovery flag; otherwise the
> journaled contents are ignored and cleaned in the recovery phase
> 5. if the recovery fails, keep the needs_recovery flag
> 6. prevent cleanup_journal_tail() from being called between
> __journal_drop_transaction() and journal_abort() (a race
> between journal_flush() and __log_wait_for_space())
>
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Just a few minor comments:
>
> Index: linux-2.6.26-rc4/fs/jbd/checkpoint.c
> ===================================================================
> --- linux-2.6.26-rc4.orig/fs/jbd/checkpoint.c
> +++ linux-2.6.26-rc4/fs/jbd/checkpoint.c
<snip>
> @@ -318,6 +331,7 @@ int log_do_checkpoint(journal_t *journal
> * OK, we need to start writing disk blocks. Take one transaction
> * and write it.
> */
> + result = 0;
> spin_lock(&journal->j_list_lock);
> if (!journal->j_checkpoint_transactions)
> goto out;
> @@ -334,7 +348,7 @@ restart:
> int batch_count = 0;
> struct buffer_head *bhs[NR_BATCH];
> struct journal_head *jh;
> - int retry = 0;
> + int retry = 0, err;
>
> while (!retry && transaction->t_checkpoint_list) {
> struct buffer_head *bh;
> @@ -347,6 +361,8 @@ restart:
> break;
> }
> retry = __process_buffer(journal, jh, bhs,&batch_count);
> + if (retry < 0)
> + result = retry;
Here you update result whenever retry is < 0, but below only when
result == 0. I think it's better to make these two consistent (not that
it would currently make any functional difference).
> if (!retry && (need_resched() ||
> spin_needbreak(&journal->j_list_lock))) {
> spin_unlock(&journal->j_list_lock);
> @@ -371,14 +387,18 @@ restart:
> * Now we have cleaned up the first transaction's checkpoint
> * list. Let's clean up the second one
> */
> - __wait_cp_io(journal, transaction);
> + err = __wait_cp_io(journal, transaction);
> + if (!result)
> + result = err;
> }
> @@ -1360,10 +1370,16 @@ int journal_flush(journal_t *journal)
> spin_lock(&journal->j_list_lock);
> while (!err && journal->j_checkpoint_transactions != NULL) {
> spin_unlock(&journal->j_list_lock);
> + mutex_lock(&journal->j_checkpoint_mutex);
> err = log_do_checkpoint(journal);
> + mutex_unlock(&journal->j_checkpoint_mutex);
> spin_lock(&journal->j_list_lock);
> }
> spin_unlock(&journal->j_list_lock);
> +
> + if (is_journal_aborted(journal))
> + return -EIO;
> +
> cleanup_journal_tail(journal);
>
> /* Finally, mark the journal as really needing no recovery.
OK, so this way you've basically serialized all users of
log_do_checkpoint(). That should be fine because the only caller that
matters performance-wise is log_wait_for_space(), and that was already
serialized before. So this change is fine with me. Just please add a
comment in front of log_do_checkpoint() saying that it's supposed to be
called with j_checkpoint_mutex held so that EIO propagation works correctly.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR