From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Theodore Tso <tytso@MIT.EDU>
Cc: cmm@us.ibm.com, sandeen@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: [PATCH -V2 5/5] ext4: Fix the race between read_inode_bitmap and ext4_new_inode
Date: Mon, 24 Nov 2008 16:45:53 +0530
Message-ID: <20081124111553.GB8462@skywalker>
In-Reply-To: <20081124040524.GD2163@mit.edu>

On Sun, Nov 23, 2008 at 11:05:24PM -0500, Theodore Tso wrote:
> On Fri, Nov 21, 2008 at 10:14:35PM +0530, Aneesh Kumar K.V wrote:
> > We need to make sure we update the inode bitmap and clear
> > EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held. We look
> > at EXT4_BG_INODE_UNINIT and reinit the inode bitmap each
> > time in ext4_read_inode_bitmap (introduced by
> > c806e68f5647109350ec546fee5b526962970fd2 )
> 
> OK, I believe I've checked in all of your patches in this series into
> the ext4 patch queue
> 
> Some of them have comments that still need to be cleared; this one in
> particular needs a better commit comment, and ideally a comment for
> the new function ext4_claim_inode().

I will add a comment to the above function.

> 
> Also, please don't rename variables unnecessarily; if you really think
> it's needed, please do so in a separate patch.  The renaming of
> variables makes it much harder to review the patch, since it bloats
> the patch, and obscures the true changes happening in the patch.
> Please explain why you are making some of the changes you made in the
> patch.  In particular, why does it matter the order in which you
> unlock the bh and sb_bgl_lock in balloc.c, mballoc.c and inode.c?
> 
> 

I will move the variable renaming and the unlock reordering into a
separate patch. The unlock change is only a cleanup, so that the locks
are released in the reverse of the order in which they were taken. It
makes no functional difference.

-aneesh

Updated patch below

ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

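ext4_read_inode_bitmap() checks EXT4_BG_INODE_UNINIT and reinitializes
the inode bitmap every time it is called while the flag is set
(behavior introduced by c806e68f5647109350ec546fee5b526962970fd2).
ext4_new_inode() must therefore set the bit in the inode bitmap and
clear EXT4_BG_INODE_UNINIT with sb_bgl_lock held across both updates;
otherwise a concurrent bitmap read can reinitialize the bitmap and
lose the newly claimed inode. Move both updates into a new helper,
ext4_claim_inode(), which performs them under sb_bgl_lock.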

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
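
Note for reviewers (below the "---" line, so not part of the commit
message): the race can be modelled in user space roughly as follows.
This is only a sketch. The toy_* names, the one-byte-per-inode bitmap
and INODES_PER_GROUP = 8 are made up for illustration, and the second
task is simulated by calling the reader inline at the point where it
could be scheduled. It assumes the bitmap reader checks the uninit
flag under the same per-group lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define INODES_PER_GROUP 8	/* hypothetical, tiny for illustration */

/* Toy stand-in for one block group; not the kernel's structures. */
struct toy_group {
	bool inode_uninit;			/* models EXT4_BG_INODE_UNINIT */
	unsigned char bitmap[INODES_PER_GROUP];	/* one byte per inode bit */
};

/* Models the bitmap read path: while the group is flagged uninit,
 * the in-memory bitmap is rebuilt (zeroed) on every read. */
void toy_read_bitmap(struct toy_group *g)
{
	if (g->inode_uninit)
		memset(g->bitmap, 0, sizeof(g->bitmap));
}

/* Old ordering: the bit is set first and the flag is cleared later,
 * in a separate critical section. A reader can run in between. */
bool toy_claim_racy(struct toy_group *g, unsigned int ino)
{
	assert(ino < INODES_PER_GROUP);
	g->bitmap[ino] = 1;		/* step 1: claim the bit */
	toy_read_bitmap(g);		/* another task reads the bitmap here */
	g->inode_uninit = false;	/* step 2: flag cleared too late */
	return g->bitmap[ino];		/* is our claim still recorded? */
}

/* Patched ordering: both updates form one critical section, so the
 * reader runs either before both or after both. */
bool toy_claim_locked(struct toy_group *g, unsigned int ino)
{
	assert(ino < INODES_PER_GROUP);
	/* the per-group lock would be taken here */
	g->bitmap[ino] = 1;
	g->inode_uninit = false;
	/* ... and released here */
	toy_read_bitmap(g);		/* reader after the critical section */
	return g->bitmap[ino];
}
```

Starting from a group with inode_uninit set, toy_claim_racy() reports
that the claimed bit was wiped by the intervening read, while
toy_claim_locked() keeps it.
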
 fs/ext4/ialloc.c |  146 ++++++++++++++++++++++++++++++++----------------------
 1 files changed, 86 insertions(+), 60 deletions(-)
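
The bg_itable_unused bookkeeping that moves into ext4_claim_inode()
can be restated as plain arithmetic. The helper below is a
hypothetical user-space restatement, not kernel code: the function
name and its parameters are invented here, and "ino" is the 1-based
inode number relative to the group, as it is after the ino++ in the
patch.

```c
#include <assert.h>

/*
 * Return the bg_itable_unused value after allocating relative inode
 * number `ino` (1-based) in a group of `inodes_per_group` inodes.
 */
unsigned int toy_itable_unused_after(unsigned int inodes_per_group,
				     unsigned int itable_unused,
				     int group_was_uninit,
				     unsigned int ino)
{
	unsigned int used_upto;

	assert(ino >= 1 && ino <= inodes_per_group);
	assert(itable_unused <= inodes_per_group);

	/*
	 * For a group that was still uninit, the stored count is not
	 * trusted and the used region is treated as empty. Otherwise
	 * the used region ends where the unused tail begins. (The
	 * patch keeps this value in a variable named `free`.)
	 */
	used_upto = group_was_uninit ? 0 : inodes_per_group - itable_unused;

	/* Allocating past the used region shrinks the unused tail. */
	if (ino > used_upto)
		return inodes_per_group - ino;
	return itable_unused;
}
```

For example, with 8192 inodes per group and 8000 of them unused, the
used region ends at 192. Allocating relative inode 200 lowers the
unused count to 7992, while allocating relative inode 100 leaves it
at 8000.
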

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 229708b..d1ccae5 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -573,6 +573,79 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
 }
 
 /*
+ * Claim the inode in the inode bitmap. This takes the
+ * group's sb_bgl_lock and, if the group is uninit,
+ * clears the uninit flag. The inode bitmap update and the
+ * group descriptor uninit flag clear must both be done
+ * with sb_bgl_lock held so that ext4_read_inode_bitmap()
+ * doesn't race with ext4_claim_inode().
+ */
+static int ext4_claim_inode(struct super_block *sb,
+			struct buffer_head *inode_bitmap_bh,
+			unsigned long ino, ext4_group_t group, int mode)
+{
+	int free = 0, retval = 0, count;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, NULL);
+
+	spin_lock(sb_bgl_lock(sbi, group));
+	if (ext4_set_bit(ino, inode_bitmap_bh->b_data)) {
+		/* not a free inode */
+		retval = 1;
+		goto err_ret;
+	}
+	ino++;
+	if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
+			ino > EXT4_INODES_PER_GROUP(sb)) {
+		spin_unlock(sb_bgl_lock(sbi, group));
+		ext4_error(sb, __func__,
+			   "reserved inode or inode > inodes count - "
+			   "block_group = %u, inode=%lu", group,
+			   ino + group * EXT4_INODES_PER_GROUP(sb));
+		return 1;
+	}
+	/* If we didn't allocate from within the initialized part of the inode
+	 * table then we need to initialize up to this inode. */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+
+		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
+			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
+			/* When marking the block group with
+			 * ~EXT4_BG_INODE_UNINIT we don't want to depend
+			 * on the value of bg_itable_unused even though
+			 * mke2fs could have initialized the same for us.
+			 * Instead we calculated the value below
+			 */
+
+			free = 0;
+		} else {
+			free = EXT4_INODES_PER_GROUP(sb) -
+				ext4_itable_unused_count(sb, gdp);
+		}
+
+		/*
+		 * Check the relative inode number against the last used
+		 * relative inode number in this group. if it is greater
+		 * we need to  update the bg_itable_unused count
+		 *
+		 */
+		if (ino > free)
+			ext4_itable_unused_set(sb, gdp,
+					(EXT4_INODES_PER_GROUP(sb) - ino));
+	}
+	count = ext4_free_inodes_count(sb, gdp) - 1;
+	ext4_free_inodes_set(sb, gdp, count);
+	if (S_ISDIR(mode)) {
+		count = ext4_used_dirs_count(sb, gdp) + 1;
+		ext4_used_dirs_set(sb, gdp, count);
+	}
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+err_ret:
+	spin_unlock(sb_bgl_lock(sbi, group));
+	return retval;
+}
+
+/*
  * There are two policies for allocating an inode.  If the new inode is
  * a directory, then a forward search is made for a block group with both
  * free space and a low directory-to-inode ratio; if that fails, then of
@@ -594,7 +667,7 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 	struct ext4_super_block *es;
 	struct ext4_inode_info *ei;
 	struct ext4_sb_info *sbi;
-	int ret2, err = 0, count;
+	int ret2, err = 0;
 	struct inode *ret;
 	ext4_group_t i;
 	int free = 0;
@@ -657,8 +730,13 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 			if (err)
 				goto fail;
 
-			if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
-						ino, inode_bitmap_bh->b_data)) {
+			BUFFER_TRACE(group_desc_bh, "get_write_access");
+			err = ext4_journal_get_write_access(handle,
+								group_desc_bh);
+			if (err)
+				goto fail;
+			if (!ext4_claim_inode(sb, inode_bitmap_bh,
+						ino, group, mode)) {
 				/* we won it */
 				BUFFER_TRACE(inode_bitmap_bh,
 					"call ext4_journal_dirty_metadata");
@@ -666,10 +744,13 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 							inode_bitmap_bh);
 				if (err)
 					goto fail;
+				/* zero bit is inode number 1*/
+				ino++;
 				goto got;
 			}
 			/* we lost it */
 			jbd2_journal_release_buffer(handle, inode_bitmap_bh);
+			jbd2_journal_release_buffer(handle, group_desc_bh);
 
 			if (++ino < EXT4_INODES_PER_GROUP(sb))
 				goto repeat_in_this_group;
@@ -689,22 +770,6 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 	goto out;
 
 got:
-	ino++;
-	if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
-	    ino > EXT4_INODES_PER_GROUP(sb)) {
-		ext4_error(sb, __func__,
-			   "reserved inode or inode > inodes count - "
-			   "block_group = %u, inode=%lu", group,
-			   ino + group * EXT4_INODES_PER_GROUP(sb));
-		err = -EIO;
-		goto fail;
-	}
-
-	BUFFER_TRACE(group_desc_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, group_desc_bh);
-	if (err)
-		goto fail;
-
 	/* We may have to initialize the block bitmap if it isn't already */
 	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
@@ -741,49 +806,10 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)
 		if (err)
 			goto fail;
 	}
-
-	spin_lock(sb_bgl_lock(sbi, group));
-	/* If we didn't allocate from within the initialized part of the inode
-	 * table then we need to initialize up to this inode. */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
-			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-
-			/* When marking the block group with
-			 * ~EXT4_BG_INODE_UNINIT we don't want to depend
-			 * on the value of bg_itable_unused even though
-			 * mke2fs could have initialized the same for us.
-			 * Instead we calculated the value below
-			 */
-
-			free = 0;
-		} else {
-			free = EXT4_INODES_PER_GROUP(sb) -
-				ext4_itable_unused_count(sb, gdp);
-		}
-
-		/*
-		 * Check the relative inode number against the last used
-		 * relative inode number in this group. if it is greater
-		 * we need to  update the bg_itable_unused count
-		 *
-		 */
-		if (ino > free)
-			ext4_itable_unused_set(sb, gdp,
-					(EXT4_INODES_PER_GROUP(sb) - ino));
-	}
