linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Theodore Tso <tytso@MIT.EDU>
Cc: cmm@us.ibm.com, sandeen@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: [PATCH -V2 5/5] ext4: Fix the race between read_inode_bitmap and ext4_new_inode
Date: Mon, 24 Nov 2008 16:45:53 +0530	[thread overview]
Message-ID: <20081124111553.GB8462@skywalker> (raw)
In-Reply-To: <20081124040524.GD2163@mit.edu>

On Sun, Nov 23, 2008 at 11:05:24PM -0500, Theodore Tso wrote:
> On Fri, Nov 21, 2008 at 10:14:35PM +0530, Aneesh Kumar K.V wrote:
> > We need to make sure we update the inode bitmap and clear
> > EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held. We look
> > at EXT4_BG_INODE_UNINIT and reinit the inode bitmap each
> > time in ext4_read_inode_bitmap (introduced by
> > c806e68f5647109350ec546fee5b526962970fd2 )
> 
> OK, I believe I've checked in all of your patches in this series into
> the ext4 patch queue
> 
> Some of them have comments that still need to be cleared; this one in
> particular needs a better commit comment, and ideally a comment for
> the new function ext4_claim_inode().

I will add a comment to the above function.

> 
> Also, please don't rename variables unnecessarily; if you really think
> it's needed, please do so in a separate patch.  The renaming of
> variables makes it much harder to review the patch, since it bloats
> the patch, and obscures the true changes happening in the patch.
> Please explain why you are making some of the changes you made in the
> patch.  In particular, why does it matter the order in which you
> unlock the bh and sb_bgl_lock in balloc.c, mballoc.c and inode.c?
> 
> 

I will put the variable name cleanup and unlock cleanup into separate
patch. The unlock is done as a cleanup so that unlock appears in the
reverse order with which we did locking. It doesn't make any difference.

-aneesh

Updated patch below

ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held. We look
at EXT4_BG_INODE_UNINIT and reinit the inode bitmap each
time in ext4_read_inode_bitmap (introduced by
c806e68f5647109350ec546fee5b526962970fd2 )

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/ialloc.c |  146 ++++++++++++++++++++++++++++++++----------------------
 1 files changed, 86 insertions(+), 60 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 229708b..d1ccae5 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -573,6 +573,79 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
 }
 
 /*
+ * claim the inode from the inode bitmap. If the group
+ * is uninit we need to take the groups's sb_bgl_lock
+ * and clear the uninit flag. The inode bitmap update
+ * and group desc uninit flag clear should be done
+ * after holding sb_bgl_lock so that ext4_read_inode_bitmap
+ * doesn't race with the ext4_claim_inode
+ */
+static int ext4_claim_inode(struct super_block *sb,
+			struct buffer_head *inode_bitmap_bh,
+			unsigned long ino, ext4_group_t group, int mode)
+{
+	int free = 0, retval = 0, count;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, NULL);
+
+	spin_lock(sb_bgl_lock(sbi, group));
+	if (ext4_set_bit(ino, inode_bitmap_bh->b_data)) {
+		/* not a free inode */
+		retval = 1;
+		goto err_ret;
+	}
+	ino++;
+	if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
+			ino > EXT4_INODES_PER_GROUP(sb)) {
+		spin_unlock(sb_bgl_lock(sbi, group));
+		ext4_error(sb, __func__,
+			   "reserved inode or inode > inodes count - "
+			   "block_group = %u, inode=%lu", group,
+			   ino + group * EXT4_INODES_PER_GROUP(sb));
+		return 1;
+	}
+	/* If we didn't allocate from within the initialized part of the inode
+	 * table then we need to initialize up to this inode. */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+
+		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
+			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
+			/* When marking the block group with
+			 * ~EXT4_BG_INODE_UNINIT we don't want to depend
+			 * on the value of bg_itable_unused even though
+			 * mke2fs could have initialized the same for us.
+			 * Instead we calculated the value below
+			 */
+
+			free = 0;
+		} else {
+			free = EXT4_INODES_PER_GROUP(sb) -
+				ext4_itable_unused_count(sb, gdp);
+		}
+
+		/*
+		 * Check the relative inode number against the last used
+		 * relative inode number in this group. if it is greater
+		 * we need to  update the bg_itable_unused count
+		 *
+		 */
+		if (ino > free)
+			ext4_itable_unused_set(sb, gdp,
+					(EXT4_INODES_PER_GROUP(sb) - ino));
+	}
+	count = ext4_free_inodes_count(sb, gdp) - 1;
+	ext4_free_inodes_set(sb, gdp, count);
+	if (S_ISDIR(mode)) {
+		count = ext4_used_dirs_count(sb, gdp) + 1;
+		ext4_used_dirs_set(sb, gdp, count);
+	}
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+err_ret:
+	spin_unlock(sb_bgl_lock(sbi, group));
+	return retval;
+}
+
+/*
  * There are two policies for allocating an inode.  If the new inode is
  * a directory, then a forward search is made for a block group with both
  * free space and a low directory-to-inode ratio; if that fails, then of
@@ -594,7 +667,7 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 	struct ext4_super_block *es;
 	struct ext4_inode_info *ei;
 	struct ext4_sb_info *sbi;
-	int ret2, err = 0, count;
+	int ret2, err = 0;
 	struct inode *ret;
 	ext4_group_t i;
 	int free = 0;
@@ -657,8 +730,13 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 			if (err)
 				goto fail;
 
-			if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
-						ino, inode_bitmap_bh->b_data)) {
+			BUFFER_TRACE(group_desc_bh, "get_write_access");
+			err = ext4_journal_get_write_access(handle,
+								group_desc_bh);
+			if (err)
+				goto fail;
+			if (!ext4_claim_inode(sb, inode_bitmap_bh,
+						ino, group, mode)) {
 				/* we won it */
 				BUFFER_TRACE(inode_bitmap_bh,
 					"call ext4_journal_dirty_metadata");
@@ -666,10 +744,13 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 							inode_bitmap_bh);
 				if (err)
 					goto fail;
+				/* zero bit is inode number 1*/
+				ino++;
 				goto got;
 			}
 			/* we lost it */
 			jbd2_journal_release_buffer(handle, inode_bitmap_bh);
+			jbd2_journal_release_buffer(handle, group_desc_bh);
 
 			if (++ino < EXT4_INODES_PER_GROUP(sb))
 				goto repeat_in_this_group;
@@ -689,22 +770,6 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 	goto out;
 
 got:
-	ino++;
-	if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
-	    ino > EXT4_INODES_PER_GROUP(sb)) {
-		ext4_error(sb, __func__,
-			   "reserved inode or inode > inodes count - "
-			   "block_group = %u, inode=%lu", group,
-			   ino + group * EXT4_INODES_PER_GROUP(sb));
-		err = -EIO;
-		goto fail;
-	}
-
-	BUFFER_TRACE(group_desc_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, group_desc_bh);
-	if (err)
-		goto fail;
-
 	/* We may have to initialize the block bitmap if it isn't already */
 	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
@@ -741,49 +806,10 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 		if (err)
 			goto fail;
 	}
-
-	spin_lock(sb_bgl_lock(sbi, group));
-	/* If we didn't allocate from within the initialized part of the inode
-	 * table then we need to initialize up to this inode. */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
-			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-
-			/* When marking the block group with
-			 * ~EXT4_BG_INODE_UNINIT we don't want to depend
-			 * on the value of bg_itable_unused even though
-			 * mke2fs could have initialized the same for us.
-			 * Instead we calculated the value below
-			 */
-
-			free = 0;
-		} else {
-			free = EXT4_INODES_PER_GROUP(sb) -
-				ext4_itable_unused_count(sb, gdp);
-		}
-
-		/*
-		 * Check the relative inode number against the last used
-		 * relative inode number in this group. if it is greater
-		 * we need to  update the bg_itable_unused count
-		 *
-		 */
-		if (ino > free)
-			ext4_itable_unused_set(sb, gdp,
-					(EXT4_INODES_PER_GROUP(sb) - ino));
-	}

  reply	other threads:[~2008-11-24 11:16 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-21 16:44 [PATCH -V2 1/5] ext4: Remove unneeded code Aneesh Kumar K.V
2008-11-21 16:44 ` [PATCH -V2 2/5] ext4: unlock group before ext4_error Aneesh Kumar K.V
2008-11-21 16:44   ` [PATCH -V2 3/5] ext4: Fix the race between read_block_bitmap and mark_diskspace_used Aneesh Kumar K.V
2008-11-21 16:44     ` [PATCH -V2 4/5] ext4: Use both hi and lo bits of the group desc values Aneesh Kumar K.V
2008-11-21 16:44       ` [PATCH -V2 5/5] ext4: Fix the race between read_inode_bitmap and ext4_new_inode Aneesh Kumar K.V
2008-11-21 17:30         ` Eric Sandeen
2008-11-23 19:26         ` Theodore Tso
2008-11-24  4:05         ` Theodore Tso
2008-11-24 11:15           ` Aneesh Kumar K.V [this message]
2008-11-21 17:29       ` [PATCH -V2 4/5] ext4: Use both hi and lo bits of the group desc values Eric Sandeen
2008-11-21 17:41         ` Aneesh Kumar K.V
2008-11-21 17:53           ` Eric Sandeen
2008-11-23  4:09             ` Andreas Dilger
2008-11-24  1:21               ` Theodore Tso
2008-11-24  2:13           ` Theodore Tso
2008-11-24 10:38             ` Aneesh Kumar K.V
2008-11-21 17:22     ` [PATCH -V2 3/5] ext4: Fix the race between read_block_bitmap and mark_diskspace_used Eric Sandeen
2008-11-21 17:31       ` Aneesh Kumar K.V
2008-11-21 17:39         ` Aneesh Kumar K.V
2008-11-21 17:40           ` Eric Sandeen
2008-11-21 17:39         ` Eric Sandeen
2008-11-23 19:02         ` Theodore Tso
2008-11-24  6:40           ` Aneesh Kumar K.V
2008-11-23 14:00     ` Theodore Tso
2008-11-24  7:14       ` Alex Zhuravlev
2008-11-24 11:33         ` Aneesh Kumar K.V
2008-11-24 16:36           ` Alex Zhuravlev
2008-11-24 16:43             ` Aneesh Kumar K.V
2008-11-24 18:03               ` Alex Zhuravlev
2008-11-24 18:12                 ` Aneesh Kumar K.V
2008-11-24 18:17                   ` Alex Zhuravlev
2008-11-24 18:21                     ` Aneesh Kumar K.V
2008-11-24 18:28                       ` Alex Zhuravlev
2008-11-24 18:41                       ` Alex Zhuravlev
2008-11-25 14:29           ` Frédéric Bohé
2008-11-25 16:38             ` Alex Zhuravlev
2008-11-23 13:37   ` [PATCH -V2 2/5] ext4: unlock group before ext4_error Theodore Tso
2008-11-23 13:43     ` Theodore Tso
2008-11-23 13:59       ` Aneesh Kumar K.V
2008-11-21 17:20 ` [PATCH -V2 1/5] ext4: Remove unneeded code Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081124111553.GB8462@skywalker \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cmm@us.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=tytso@MIT.EDU \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).