From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Theodore Ts'o" <tytso@mit.edu>,
Eric Sandeen <sandeen@redhat.com>, Nix <nix@esperi.org.uk>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
gregkh@linuxfoundation.org
Subject: Re: [PATCH -v3] ext4: fix unjournaled inode bitmap modification
Date: Mon, 29 Oct 2012 10:08:15 -0700
Message-ID: <20121029170617.GB19576@blackbox.djwong.org>
In-Reply-To: <20121029023034.GA9365@thunk.org>

On Sun, Oct 28, 2012 at 10:30:34PM -0400, Theodore Ts'o wrote:
> On Sat, Oct 27, 2012 at 11:23:57PM -0500, Eric Sandeen wrote:
> > A little more going on here to try to properly handle error
> > cases & moving to the next group; despite
> > ext4_handle_release_buffer being a no-op, I've tried
> > to sprinkle it in at the right places. Double checking
> > on review is probably a fine idea ;)
>
> Sorry, I didn't see your newer version of your patch. I'm not
> convinced it's worth it to try to get the calls to
> ext4_handle_release_buffer() right. There are plenty of other places
> where we're not calling ext4_handle_release_buffer(), and I'm not
> convinced it would ever be useful to make it be something other than a
> no-op. In order to make it be useful, we'd have to enforce a rule
> that every single get_write_access() was matched with either a
> handle_dirty_metadata() or a handle_release_buffer(). That would be
> tricky; worse, we'd have to keep track of a refcount on each bh, which
> would cost us on the scalability front. The main benefit would be
> that we might be able to reclaim bh's where we called
> get_write_access() and then changed our mind, but that's relatively
> rare, and I think it's easier to simply be more careful about not calling
> get_write_access() until we're sure we're going to need write access.
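To make the bookkeeping cost concrete, here is a rough userspace sketch of the pairing rule described above. Everything in it (struct model_bh, model_get_write_access() and friends) is a hypothetical stand-in, not jbd2 code: each buffer carries a counter that every get_write_access() bumps and every dirty_metadata() or release_buffer() drops, and only a buffer with no outstanding access that was never dirtied could be reclaimed. In the kernel such a counter would be touched on every call from every CPU, which is presumably the scalability cost mentioned here.

```c
/*
 * Hypothetical model, not jbd2 code: the per-buffer counter that would be
 * needed to enforce "every get_write_access() is matched by either a
 * dirty_metadata() or a release_buffer()".
 */
struct model_bh {
	int access_refs;	/* outstanding get_write_access() calls */
	int dirty;		/* set once dirty_metadata() has been called */
};

static void model_get_write_access(struct model_bh *bh)
{
	bh->access_refs++;
}

/* Matching call when the buffer really was modified. */
static void model_dirty_metadata(struct model_bh *bh)
{
	if (bh->access_refs > 0)
		bh->access_refs--;
	bh->dirty = 1;
}

/* Matching call when the caller "changed its mind". */
static void model_release_buffer(struct model_bh *bh)
{
	if (bh->access_refs > 0)
		bh->access_refs--;
}

/* Reclaimable only if no access is outstanding and nothing was dirtied. */
static int model_can_reclaim(const struct model_bh *bh)
{
	return bh->access_refs == 0 && !bh->dirty;
}
```

The only payoff of all that counting is the release path: a buffer whose access was taken and then released becomes reclaimable, which Ted argues is too rare to justify the cost.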
>
> Hence in my version of the patch, I've waited until right before the
> call to ext4_lock_group() before calling get_write_access(). Note
> that it's safe to call get_write_access() on a bh twice; the second
> time the jbd2 layer will notice that the bh is already a part of the
> transaction.
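The ordering the patch settles on (join the transaction, then modify the bitmap under the group lock) and the claim that a second get_write_access() is harmless can be sketched as follows. This is a single-threaded model under stated assumptions: model_txn, model_bh and model_claim_inode() are hypothetical names, the group lock is only marked by a comment, and the "already part of this transaction" check is a simplification of what jbd2 actually does.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model; names are illustrative, not jbd2's API. */
struct model_txn {
	int id;
};

struct model_bh {
	unsigned char data[16];	/* stands in for the inode bitmap block */
	struct model_txn *txn;	/* transaction the buffer has joined */
	int joins;		/* times the buffer was newly attached */
};

/* Second and later calls within the same transaction are no-ops. */
static int model_get_write_access(struct model_txn *t, struct model_bh *bh)
{
	if (bh->txn == t)
		return 0;	/* already part of this transaction */
	bh->txn = t;
	bh->joins++;
	return 0;
}

/* Sets bit 'nr' in 'map' and returns its previous value. */
static bool model_test_and_set_bit(unsigned nr, unsigned char *map)
{
	unsigned char mask = (unsigned char)(1u << (nr % 8));
	bool old = (map[nr / 8] & mask) != 0;

	map[nr / 8] |= mask;
	return old;
}

/*
 * Mirrors the patched order: take write access first, then modify.
 * Returns 0 if the inode was claimed, 1 if it was already in use.
 */
static int model_claim_inode(struct model_txn *t, struct model_bh *bh,
			     unsigned ino)
{
	int err = model_get_write_access(t, bh);

	if (err)
		return err;
	/* ext4_lock_group()/ext4_unlock_group() would bracket this in ext4 */
	return model_test_and_set_bit(ino, bh->data) ? 1 : 0;
}
```

Calling model_claim_inode() repeatedly within one transaction attaches the buffer only once (joins stays at 1), which is the property that makes it acceptable to take write access inside the retry loop.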
>
> Also, leaving out the calls to ext4_handle_release_buffer() makes the
> patch easier to understand and reason about.
>
> What do you think of this version?

I _think_ it looks ok in terms of making sure we call
ext4_inode_bitmap_csum_set() before calling ext4_handle_dirty_metadata() on the
group descriptor, but this function is a bit tricky. :)
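The ordering constraint mentioned here (set the inode bitmap checksum in the group descriptor before dirtying the descriptor) can be illustrated with a toy model. The assumptions are loud: toy_csum() is a stand-in for ext4's real metadata checksum, and model_dirty_gdesc() captures the descriptor at dirty time, whereas jbd2 actually writes the buffer out at commit. The model only shows why the descriptor must already hold the new checksum when it is handed to the journal.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy checksum; NOT ext4's real metadata checksum. */
static uint32_t toy_csum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + p[i];
	return sum;
}

struct model_gdesc {
	uint32_t inode_bitmap_csum;
};

struct model_group {
	unsigned char bitmap[8];	/* inode bitmap */
	struct model_gdesc gd;		/* in-memory group descriptor */
	struct model_gdesc logged_gd;	/* copy captured when "dirtied" */
};

/* Simplification: capture the descriptor contents at dirty time. */
static void model_dirty_gdesc(struct model_group *g)
{
	g->logged_gd = g->gd;
}

/* Rough analogue of storing the bitmap checksum in the descriptor. */
static void model_csum_set(struct model_group *g)
{
	g->gd.inode_bitmap_csum = toy_csum(g->bitmap, sizeof(g->bitmap));
}

/* Would the logged descriptor validate against the current bitmap? */
static int model_logged_csum_ok(const struct model_group *g)
{
	return g->logged_gd.inode_bitmap_csum ==
	       toy_csum(g->bitmap, sizeof(g->bitmap));
}
```

Setting the checksum first and dirtying second leaves a logged descriptor that validates; reversing the two calls logs a stale checksum.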
--D
>
> - Ted
>
> commit 67d725143e9e7ea458a0c1c4a6625657c3dc7ba2
> Author: Eric Sandeen <sandeen@redhat.com>
> Date: Sun Oct 28 22:24:57 2012 -0400
>
> ext4: fix unjournaled inode bitmap modification
>
> commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed
> ext4_new_inode() such that the inode bitmap was being modified
> outside a transaction, which could lead to corruption, and was
> discovered when journal_checksum found a bad checksum in the
> journal during log replay.
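One simplified way to see why an unjournaled bitmap update can surface as a bad journal checksum: if the buffer is still being committed, the checksum may be computed over one version of the block while a later, unjournaled change lands before the block is written to the log. The sketch below models that race sequentially. All names are hypothetical; the "frozen" copy is only a rough analogue of how jbd2 preserves a committing buffer's contents when get_write_access() is called, and toy_csum() is not the real journal checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy checksum; NOT the real jbd2 journal checksum. */
static uint32_t toy_csum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + p[i];
	return sum;
}

struct model_buf {
	unsigned char data[8];		/* live buffer that ext4 modifies */
	unsigned char frozen[8];	/* copy kept for the committing txn */
	int has_frozen;
};

struct model_journal_block {
	unsigned char data[8];
	uint32_t csum;
};

/* Contents the commit will write: the frozen copy if one exists. */
static const unsigned char *model_commit_src(const struct model_buf *b)
{
	return b->has_frozen ? b->frozen : b->data;
}

/* Commit step 1: checksum the block. */
static uint32_t model_commit_csum(const struct model_buf *b)
{
	return toy_csum(model_commit_src(b), sizeof(b->data));
}

/* Commit step 2: copy the block into the journal with its checksum. */
static void model_commit_write(const struct model_buf *b, uint32_t csum,
			       struct model_journal_block *jb)
{
	memcpy(jb->data, model_commit_src(b), sizeof(jb->data));
	jb->csum = csum;
}

/* Rough analogue of get_write_access() on a buffer being committed. */
static void model_get_write_access(struct model_buf *b)
{
	if (!b->has_frozen) {
		memcpy(b->frozen, b->data, sizeof(b->data));
		b->has_frozen = 1;
	}
}

/* Replay-time check of a logged block. */
static int model_replay_ok(const struct model_journal_block *jb)
{
	return toy_csum(jb->data, sizeof(jb->data)) == jb->csum;
}
```

If the bit is set between the two commit steps without model_get_write_access(), the logged block no longer matches its checksum; with it, the modification goes to the live buffer while the commit writes the preserved copy.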
>
> Nix ran into this when using the journal_async_commit mount
> option, which enables journal checksumming. The ensuing
> journal replay failures due to the bad checksums led to
> filesystem corruption reported as the now infamous
> "Apparent serious progressive ext4 data corruption bug".
>
> [ Changed by tytso to call ext4_journal_get_write_access() only
> when we're fairly certain that we're going to allocate the inode. ]
>
> I've tested this by mounting with journal_checksum and
> running fsstress then dropping power; I've also tested by
> hacking DM to create snapshots w/o first quiescing, which
> allows me to test journal replay repeatedly w/o actually
> power-cycling the box. Without the patch I hit a journal
> checksum error every time. With this fix it survives
> many iterations.
>
> Reported-by: Nix <nix@esperi.org.uk>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: stable@vger.kernel.org
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 4facdd2..3a100e7 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -725,6 +725,10 @@ repeat_in_this_group:
> "inode=%lu", ino + 1);
> continue;
> }
> + BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
> + err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
> + if (err)
> + goto fail;
> ext4_lock_group(sb, group);
> ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
> ext4_unlock_group(sb, group);
> @@ -738,6 +742,11 @@ repeat_in_this_group:
> goto out;
>
> got:
> + BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> + err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
> + if (err)
> + goto fail;
> +
> /* We may have to initialize the block bitmap if it isn't already */
> if (ext4_has_group_desc_csum(sb) &&
> gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> @@ -771,11 +780,6 @@ got:
> goto fail;
> }
>
> - BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
> - err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
> - if (err)
> - goto fail;
> -
> BUFFER_TRACE(group_desc_bh, "get_write_access");
> err = ext4_journal_get_write_access(handle, group_desc_bh);
> if (err)
> @@ -823,11 +827,6 @@ got:
> }
> ext4_unlock_group(sb, group);
>
> - BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> - err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
> - if (err)
> - goto fail;
> -
> BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
> err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
> if (err)
>
>