public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: sct@redhat.com, adilger@clusterfs.com,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	jack@suse.cz, jbacik@redhat.com, cmm@us.ibm.com, tytso@mit.edu,
	yumiko.sugita.yf@hitachi.com, satoshi.oshima.fk@hitachi.com
Subject: Re: [PATCH 3/5] jbd: abort when failed to log metadata buffers
Date: Wed, 04 Jun 2008 19:57:50 +0900	[thread overview]
Message-ID: <4846752E.9080501@hitachi.com> (raw)
In-Reply-To: <20080603153506.8a9ca2a4.akpm@linux-foundation.org>

Hi,

Andrew Morton wrote:

> On Mon, 02 Jun 2008 19:46:02 +0900
> Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> wrote:
> 
>>Subject: [PATCH 3/5] jbd: abort when failed to log metadata buffers
>>
>>If we failed to write metadata buffers to the journal space and
>>succeeded to write the commit record, stale data can be written
>>back to the filesystem as metadata in the recovery phase.
>>
>>To avoid this, when we failed to write out metadata buffers,
>>abort the journal before writing the commit record.
>>
>>Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>>---
>> fs/jbd/commit.c |    3 +++
>> 1 file changed, 3 insertions(+)
>>
>>Index: linux-2.6.26-rc4/fs/jbd/commit.c
>>===================================================================
>>--- linux-2.6.26-rc4.orig/fs/jbd/commit.c
>>+++ linux-2.6.26-rc4/fs/jbd/commit.c
>>@@ -734,6 +734,9 @@ wait_for_iobuf:
>> 		/* AKPM: bforget here */
>> 	}
>> 
>>+	if (err)
>>+		journal_abort(journal, err);
>>+
>> 	jbd_debug(3, "JBD: commit phase 6\n");
>> 
>> 	if (journal_write_commit_record(journal, commit_transaction))
>>
> 
> 
> I assume this has all been tested?

Yes,  I tested all cases except for the following case (related to
PATCH 4/5):

> o journal_flush() uses j_checkpoint_mutex to avoid a race with
>  __log_wait_for_space()
> 
> The last item targets a newly found problem.  journal_flush() can be
> called while processing __log_wait_for_space().  In this case,
> cleanup_journal_tail() can be called between
> __journal_drop_transaction() and journal_abort(), then 
> the transaction with checkpointing failure is lost from the journal.
> Using j_checkpoint_mutex which is used by __log_wait_for_space(),
> we should avoid the race condition.  But the test is not so sufficient
> because it is very difficult to produce this race.  So I hope that
> this locking is reviewed carefully (including a possibility of
> deadlock.)

I caused invocations of journal_flush() and __log_wait_for_space() and
a write error simultaneously, but I haven't confirmed the race had
occurred.

> How are you finding these problems and testing the fixes?  Fault
> injection?

I found these problems by reading souce codes, then tested them
by the fault injection approach.  To inject a fault, I used a
SystemTap script at the bottom of this mail.

> Does it make sense to proceed into phase 6 here, or should we bale out
> of commit at this point?

What I really want to do is that don't write the commit record when
metadata buffers couldn't be written to the journal.
It should be no problem in the case of writing revoke records failure
because the recovery process detects the invalid control block with
a noncontiguous sequence number.
But it is nonsense to write the commit record even though we failed
to write control blocks to the journal.  So I think it makes sense
to catch errors for all writes to the journal here and abort the
journal to avoid writing the commit record.

* * * * * *

The following SystemTap script was used to inject a fault.
Please don't use this script without changing.  It is hard-coded
for my environment.


global target_inode_block = 64
/*
 * Inject a fault when a particular metadata buffer is journaled.
 */

%{
#include <linux/buffer_head.h>
#include <linux/jbd.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>

enum fi_state_bits {
	BH_Faulty = BH_Unshadow + 1,
};
%}

function fault_inject (scmd: long) %{
	struct scsi_cmnd *cmd = (void *)((unsigned long)THIS->scmd);
	cmd->cmnd[0] |= (7 << 5);
	cmd->cmd_len = 255;
%}

global do_fault_inject
global faulty_sector
probe module("jbd").function("journal_write_metadata_buffer") {
	if ($jh_in->b_bh->b_blocknr == target_inode_block) {
		do_fault_inject[tid()] = 1
	}
}
probe module("jbd").function("journal_write_metadata_buffer").return {
	do_fault_inject[tid()] = 0
}

probe module("jbd").function("journal_file_buffer") {
	if (do_fault_inject[tid()] && $jlist == 4 /* BJ_IO */) {
		faulty_sector[$jh->b_bh->b_blocknr * 8 + 63] = 1
		printf("mark faulty @ sector=%d\n",
			$jh->b_bh->b_blocknr * 8 + 63)
	}
}

probe kernel.function("scsi_dispatch_cmd") {
	host = $cmd->device->host->host_no
	id = $cmd->device->id
	lun = $cmd->device->lun
	ch = $cmd->device->channel
	sector = $cmd->request->bio->bi_sector
	len = $cmd->transfersize / 512

	if (id != 1) {
		next
	}
	printf("%d:%d:%d:%d, #%d+%d\n", host, ch, id, lun, sector, len)
	if ($cmd->request->cmd_flags & 1 == 1 && faulty_sector[sector]) {
		delete faulty_sector[sector]
		fault_inject($cmd)
		printf("fault injected\n")
	}
}

-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center


  reply	other threads:[~2008-06-04 10:58 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-02 10:40 [PATCH 0/5] jbd: possible filesystem corruption fixes (take 2) Hidehiro Kawai
2008-06-02 10:43 ` [PATCH 1/5] jbd: strictly check for write errors on data buffers Hidehiro Kawai
2008-06-03 22:30   ` Andrew Morton
2008-06-04 10:19     ` Jan Kara
2008-06-04 18:19       ` Andrew Morton
2008-06-04 21:22         ` Theodore Tso
2008-06-04 21:58           ` Andrew Morton
2008-06-04 22:51             ` Theodore Tso
2008-06-05  9:35               ` Jan Kara
2008-06-05 11:33                 ` Hidehiro Kawai
2008-06-05 14:29                   ` Theodore Tso
2008-06-05 16:20                     ` Andrew Morton
2008-06-05 18:49                       ` Andreas Dilger
2008-06-09 10:09                         ` Hidehiro Kawai
2008-06-11 12:35                           ` Jan Kara
2008-06-12 13:19                             ` Hidehiro Kawai
2008-06-05  3:28           ` Mike Snitzer
2008-06-04 21:58         ` Andreas Dilger
2008-06-04 10:53     ` Hidehiro Kawai
2008-06-02 10:45 ` [PATCH 2/5] jbd: ordered data integrity fix Hidehiro Kawai
2008-06-02 11:59   ` Jan Kara
2008-06-03 22:33   ` Andrew Morton
2008-06-04 10:55     ` Hidehiro Kawai
2008-06-02 10:46 ` [PATCH 3/5] jbd: abort when failed to log metadata buffers Hidehiro Kawai
2008-06-02 12:00   ` Jan Kara
2008-06-03 22:35   ` Andrew Morton
2008-06-04 10:57     ` Hidehiro Kawai [this message]
2008-06-02 10:47 ` [PATCH 4/5] jbd: fix error handling for checkpoint io Hidehiro Kawai
2008-06-02 12:44   ` Jan Kara
2008-06-03  4:31     ` Hidehiro Kawai
2008-06-03  4:40     ` Hidehiro Kawai
2008-06-03  5:11       ` Hidehiro Kawai
2008-06-03  5:20         ` Andrew Morton
2008-06-03  8:02       ` Jan Kara
2008-06-23 11:14         ` Hidehiro Kawai
2008-06-23 12:22           ` Jan Kara
2008-06-24 11:52             ` Hidehiro Kawai
2008-06-24 13:33               ` Jan Kara
2008-06-27  8:06                 ` Hidehiro Kawai
2008-06-27 10:24                   ` Jan Kara
2008-06-30  5:09                     ` Hidehiro Kawai
2008-07-07 10:07                       ` Jan Kara
2008-06-02 10:48 ` [PATCH 5/5] ext3: abort ext3 if the journal has aborted Hidehiro Kawai
2008-06-02 12:49   ` Jan Kara
2008-06-02 12:05 ` [PATCH 0/5] jbd: possible filesystem corruption fixes (take 2) Jan Kara
2008-06-03  4:30   ` Hidehiro Kawai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4846752E.9080501@hitachi.com \
    --to=hidehiro.kawai.ez@hitachi.com \
    --cc=adilger@clusterfs.com \
    --cc=akpm@linux-foundation.org \
    --cc=cmm@us.ibm.com \
    --cc=jack@suse.cz \
    --cc=jbacik@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=satoshi.oshima.fk@hitachi.com \
    --cc=sct@redhat.com \
    --cc=tytso@mit.edu \
    --cc=yumiko.sugita.yf@hitachi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox