From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
Theodore Ts'o <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, linux-ext4@vger.kernel.org,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: [patch 27/39] ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
Date: Wed, 18 Feb 2009 13:33:05 -0800 [thread overview]
Message-ID: <20090218213305.GB19814@kroah.com> (raw)
In-Reply-To: <20090218213021.GA19814@kroah.com>
[-- Attachment #1: ext4-fix-the-race-between-read_inode_bitmap-and-ext4_new_inode.patch --]
[-- Type: text/plain, Size: 7119 bytes --]
2.6.28-stable review patch. If anyone has any objections, please let us know.
------------------
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
(cherry picked from commit 393418676a7602e1d7d3f6e560159c65c8cbd50e)
We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
whether to initialize the inode bitmap each time it is called.
(introduced by commit c806e68f.)
ext4_read_inode_bitmap does:
spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
ext4_init_inode_bitmap(sb, bh, block_group, desc);
and ext4_new_inode does
if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
ino, inode_bitmap_bh->b_data))
......
...
spin_lock(sb_bgl_lock(sbi, group));
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
i.e., on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
parallel ext4_read_inode_bitmap can zero out the bitmap in between
the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)
The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/ext4/ialloc.c | 140 ++++++++++++++++++++++++++++++++-----------------------
1 file changed, 83 insertions(+), 57 deletions(-)
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -570,6 +570,77 @@ static int find_group_other(struct super
}
/*
+ * claim the inode from the inode bitmap. If the group
+ * is uninit we need to take the groups's sb_bgl_lock
+ * and clear the uninit flag. The inode bitmap update
+ * and group desc uninit flag clear should be done
+ * after holding sb_bgl_lock so that ext4_read_inode_bitmap
+ * doesn't race with the ext4_claim_inode
+ */
+static int ext4_claim_inode(struct super_block *sb,
+ struct buffer_head *inode_bitmap_bh,
+ unsigned long ino, ext4_group_t group, int mode)
+{
+ int free = 0, retval = 0;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, NULL);
+
+ spin_lock(sb_bgl_lock(sbi, group));
+ if (ext4_set_bit(ino, inode_bitmap_bh->b_data)) {
+ /* not a free inode */
+ retval = 1;
+ goto err_ret;
+ }
+ ino++;
+ if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
+ ino > EXT4_INODES_PER_GROUP(sb)) {
+ spin_unlock(sb_bgl_lock(sbi, group));
+ ext4_error(sb, __func__,
+ "reserved inode or inode > inodes count - "
+ "block_group = %lu, inode=%lu", group,
+ ino + group * EXT4_INODES_PER_GROUP(sb));
+ return 1;
+ }
+ /* If we didn't allocate from within the initialized part of the inode
+ * table then we need to initialize up to this inode. */
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+
+ if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
+ gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
+ /* When marking the block group with
+ * ~EXT4_BG_INODE_UNINIT we don't want to depend
+ * on the value of bg_itable_unused even though
+ * mke2fs could have initialized the same for us.
+ * Instead we calculated the value below
+ */
+
+ free = 0;
+ } else {
+ free = EXT4_INODES_PER_GROUP(sb) -
+ le16_to_cpu(gdp->bg_itable_unused);
+ }
+
+ /*
+ * Check the relative inode number against the last used
+ * relative inode number in this group. if it is greater
+ * we need to update the bg_itable_unused count
+ *
+ */
+ if (ino > free)
+ gdp->bg_itable_unused =
+ cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
+ }
+ le16_add_cpu(&gdp->bg_free_inodes_count, -1);
+ if (S_ISDIR(mode)) {
+ le16_add_cpu(&gdp->bg_used_dirs_count, 1);
+ }
+ gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+err_ret:
+ spin_unlock(sb_bgl_lock(sbi, group));
+ return retval;
+}
+
+/*
* There are two policies for allocating an inode. If the new inode is
* a directory, then a forward search is made for a block group with both
* free space and a low directory-to-inode ratio; if that fails, then of
@@ -652,8 +723,12 @@ repeat_in_this_group:
if (err)
goto fail;
- if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
- ino, bitmap_bh->b_data)) {
+ BUFFER_TRACE(bh2, "get_write_access");
+ err = ext4_journal_get_write_access(handle, bh2);
+ if (err)
+ goto fail;
+ if (!ext4_claim_inode(sb, bitmap_bh,
+ ino, group, mode)) {
/* we won it */
BUFFER_TRACE(bitmap_bh,
"call ext4_journal_dirty_metadata");
@@ -661,10 +736,13 @@ repeat_in_this_group:
bitmap_bh);
if (err)
goto fail;
+ /* zero bit is inode number 1*/
+ ino++;
goto got;
}
/* we lost it */
jbd2_journal_release_buffer(handle, bitmap_bh);
+ jbd2_journal_release_buffer(handle, bh2);
if (++ino < EXT4_INODES_PER_GROUP(sb))
goto repeat_in_this_group;
@@ -684,21 +762,6 @@ repeat_in_this_group:
goto out;
got:
- ino++;
- if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
- ino > EXT4_INODES_PER_GROUP(sb)) {
- ext4_error(sb, __func__,
- "reserved inode or inode > inodes count - "
- "block_group = %lu, inode=%lu", group,
- ino + group * EXT4_INODES_PER_GROUP(sb));
- err = -EIO;
- goto fail;
- }
-
- BUFFER_TRACE(bh2, "get_write_access");
- err = ext4_journal_get_write_access(handle, bh2);
- if (err) goto fail;
-
/* We may have to initialize the block bitmap if it isn't already */
if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
@@ -733,47 +796,10 @@ got:
if (err)
goto fail;
}
-
- spin_lock(sb_bgl_lock(sbi, group));
- /* If we didn't allocate from within the initialized part of the inode
- * table then we need to initialize up to this inode. */
- if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
- if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
- gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-
- /* When marking the block group with
- * ~EXT4_BG_INODE_UNINIT we don't want to depend
- * on the value of bg_itable_unused even though
- * mke2fs could have initialized the same for us.
- * Instead we calculated the value below
- */
-
- free = 0;
- } else {
- free = EXT4_INODES_PER_GROUP(sb) -
- le16_to_cpu(gdp->bg_itable_unused);
- }
-
- /*
- * Check the relative inode number against the last used
- * relative inode number in this group. if it is greater
- * we need to update the bg_itable_unused count
- *
- */
- if (ino > free)
- gdp->bg_itable_unused =
- cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
- }
next prev parent reply other threads:[~2009-02-18 21:35 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20090218212144.965748151@mini.kroah.org>
[not found] ` <20090218213021.GA19814@kroah.com>
2009-02-18 21:32 ` [patch 15/39] ext4: Add support for non-native signed/unsigned htree hash algorithms Greg KH
2009-02-18 21:32 ` [patch 16/39] ext4: tone down ext4_da_writepages warnings Greg KH
2009-02-18 21:32 ` [patch 17/39] ext4: Fix the delalloc writepages to allocate blocks at the right offset Greg KH
2009-02-18 21:32 ` [patch 18/39] ext4: avoid ext4_error when mounting a fs with a single bg Greg KH
2009-02-18 21:32 ` [patch 19/39] ext4: Widen type of ext4_sb_info.s_mb_maxs[] Greg KH
2009-02-18 21:32 ` [patch 20/39] jbd2: Add barrier not supported test to journal_wait_on_commit_record Greg KH
2009-02-18 21:32 ` [patch 21/39] ext4: Dont overwrite allocation_context ac_status Greg KH
2009-02-18 21:32 ` [patch 22/39] ext4: Add blocks added during resize to bitmap Greg KH
2009-02-18 21:32 ` [patch 23/39] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize Greg KH
2009-02-18 21:32 ` [patch 24/39] ext4: cleanup mballoc header files Greg KH
2009-02-18 21:33 ` [patch 25/39] ext4: dont use blocks freed but not yet committed in buddy cache init Greg KH
2009-02-18 21:33 ` [patch 26/39] ext4: Fix race between read_block_bitmap() and mark_diskspace_used() Greg KH
2009-02-18 21:33 ` Greg KH [this message]
2009-02-18 21:33 ` [patch 28/39] jbd2: Add BH_JBDPrivateStart Greg KH
2009-02-18 21:33 ` [patch 29/39] ext4: Use new buffer_head flag to check uninit group bitmaps initialization Greg KH
2009-02-18 21:33 ` [patch 30/39] ext4: mark the blocks/inode bitmap beyond end of group as used Greg KH
2009-02-18 21:33 ` [patch 31/39] ext4: Dont allow new groups to be added during block allocation Greg KH
2009-02-18 21:33 ` [patch 32/39] ext4: Init the complete page while building buddy cache Greg KH
2009-02-18 21:33 ` [patch 33/39] ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc Greg KH
2009-02-18 21:33 ` [patch 34/39] ext4: Add sanity checks for the superblock before mounting the filesystem Greg KH
2009-02-18 21:33 ` [patch 35/39] ext4: only use i_size_high for regular files Greg KH
2009-02-18 21:33 ` [patch 36/39] ext4: Add sanity check to make_indexed_dir Greg KH
2009-02-18 21:33 ` [patch 37/39] jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs" Greg KH
2009-02-18 21:33 ` [patch 38/39] ext4: Initialize the new group descriptor when resizing the filesystem Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090218213305.GB19814@kroah.com \
--to=gregkh@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=cavokz@gmail.com \
--cc=cebbert@redhat.com \
--cc=chuckw@quantumlinux.com \
--cc=davej@redhat.com \
--cc=eteo@redhat.com \
--cc=jake@lwn.net \
--cc=jmforbes@linuxtx.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkrufky@linuxtv.org \
--cc=rbranco@la.checkpoint.com \
--cc=rdunlap@xenotime.net \
--cc=reviews@ml.cw.f00f.org \
--cc=stable@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=w@1wt.eu \
--cc=zwane@arm.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).