Re: BUG_ON at mballoc.c:3752 - Aneesh Kumar K.V

Linux EXT4 FS development
 help / color / mirror / Atom feed

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Eric Sesterhenn <snakebyte@gmx.de>
Cc: linux-ext4@vger.kernel.org,
	Dave Kleikamp <shaggy@linux.vnet.ibm.com>,
	Eric Sandeen <sandeen@redhat.com>
Subject: Re: BUG_ON at mballoc.c:3752
Date: Thu, 7 Feb 2008 18:25:48 +0530	[thread overview]
Message-ID: <20080207125548.GA8701@skywalker> (raw)
In-Reply-To: <1202335188.6886.15.camel@norville.austin.ibm.com>

On Wed, Feb 06, 2008 at 03:59:48PM -0600, Dave Kleikamp wrote:
> 
> File systems should not call BUG() due to a corrupt file system.
> Instead the code should fail the operation, possibly marking the file
> system read-only (or panicking) depending on the errors= mount option.
> 

Eric Sandeen explained me the same on IRC. I was busy with the migrate
locking bug. That's why i didn't update here. Today i tried to reproduce
the problem using the image provided. But in my case it is not hitting
the BUG_ON (mostly due to single cpu). I did look at the code and am not
still not clear how we can hit that BUG_ON. prealloc free space pa_free is
generated out of bitmap. So only if something corrupted bitmap after we
initialized prealloc space we will hit this case. In mballoc we error out
if the block allocated or fall in system zone. One thing i noticed is,
the journal is corrupt. So the only possibility that i have is journal write
resulted in bitmap corruption.

I also looked at the mballoc to make sure we don't panic in case of a
corrupt bitmap. Below is the patch that i have now. This one is yet to
go through the ABAT test but it would be nice to see whether the below
change cause any other issues.

Eric ,
can you run the test with below patch and see if this makes any
difference ?. I know we are not fixing any bugs in the below patch.


ext4: Don't panic in case of corrupt bitmap

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Multiblock allocator was calling BUG_ON in many case if the free and used
blocks count obtained looking at the bitmap is different from what
the allocator internally accounted for. Use ext4_error in such case
and don't panic the system.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---

 fs/ext4/mballoc.c |   35 +++++++++++++++++++++--------------
 1 files changed, 21 insertions(+), 14 deletions(-)


diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 06d1f52..656729b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -680,7 +680,6 @@ static void *mb_find_buddy(struct ext4_buddy *e4b, int order, int *max)
 {
 	char *bb;
 
-	/* FIXME!! is this needed */
 	BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
 	BUG_ON(max == NULL);
 
@@ -964,7 +963,7 @@ static void ext4_mb_generate_buddy(struct super_block *sb,
 	grp->bb_fragments = fragments;
 
 	if (free != grp->bb_free) {
-		printk(KERN_DEBUG
+		ext4_error(sb, __FUNCTION__,
 			"EXT4-fs: group %lu: %u blocks in bitmap, %u in gd\n",
 			group, free, grp->bb_free);
 		grp->bb_free = free;
@@ -1821,13 +1820,24 @@ static void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
 		i = ext4_find_next_zero_bit(bitmap,
 						EXT4_BLOCKS_PER_GROUP(sb), i);
 		if (i >= EXT4_BLOCKS_PER_GROUP(sb)) {
-			BUG_ON(free != 0);
+			/*
+			 * IF we corrupt the bitmap  we won't find any
+			 * free blocks even though group info says we
+			 * we have free blocks
+			 */
+			ext4_error(sb, __FUNCTION__, "%d free blocks as per "
+					"group info. But bitmap says 0\n",
+					free);
 			break;
 		}
 
 		mb_find_extent(e4b, 0, i, ac->ac_g_ex.fe_len, &ex);
 		BUG_ON(ex.fe_len <= 0);
-		BUG_ON(free < ex.fe_len);
+		if (free < ex.fe_len) {
+			ext4_error(sb, __FUNCTION__, "%d free blocks as per "
+					"group info. But got %d blocks\n",
+					free, ex.fe_len);
+		}
 
 		ext4_mb_measure_extent(ac, &ex, e4b);
 
@@ -3354,13 +3364,10 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
 	ac->ac_pa = pa;
 
 	/* we don't correct pa_pstart or pa_plen here to avoid
-	 * possible race when tte group is being loaded concurrently
+	 * possible race when the group is being loaded concurrently
 	 * instead we correct pa later, after blocks are marked
-	 * in on-disk bitmap -- see ext4_mb_release_context() */
-	/*
-	 * FIXME!! but the other CPUs can look at this particular
-	 * pa and think that it have enought free blocks if we
-	 * don't update pa_free here right ?
+	 * in on-disk bitmap -- see ext4_mb_release_context()
+	 * Other CPUs are prevented from allocating from this pa by lg_mutex
 	 */
 	mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
 }
@@ -3743,13 +3750,13 @@ static int ext4_mb_release_inode_pa(struct ext4_buddy *e4b,
 		bit = next + 1;
 	}
 	if (free != pa->pa_free) {
-		printk(KERN_ERR "pa %p: logic %lu, phys. %lu, len %lu\n",
+		printk(KERN_CRIT "pa %p: logic %lu, phys. %lu, len %lu\n",
 			pa, (unsigned long) pa->pa_lstart,
 			(unsigned long) pa->pa_pstart,
 			(unsigned long) pa->pa_len);
-		printk(KERN_ERR "free %u, pa_free %u\n", free, pa->pa_free);
+		ext4_error(sb, __FUNCTION__, "free %u, pa_free %u\n",
+						free, pa->pa_free);
 	}
-	BUG_ON(free != pa->pa_free);
 	atomic_add(free, &sbi->s_mb_discarded);
 
 	return err;
@@ -4405,7 +4412,7 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 			unsigned long block, unsigned long count,
 			int metadata, unsigned long *freed)
 {
-	struct buffer_head *bitmap_bh = 0;
+	struct buffer_head *bitmap_bh = NULL;
 	struct super_block *sb = inode->i_sb;
 	struct ext4_allocation_context ac;
 	struct ext4_group_desc *gdp;

next prev parent reply	other threads:[~2008-02-07 12:55 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-31 14:01 BUG_ON at mballoc.c:3752 Eric Sesterhenn
2008-01-31 15:42 ` Eric Sesterhenn
2008-02-04  6:00   ` Aneesh Kumar K.V
2008-02-04  8:29     ` Eric Sesterhenn
2008-02-06 21:59     ` Dave Kleikamp
2008-02-07 12:55       ` Aneesh Kumar K.V [this message]
2008-02-08  1:30         ` Mingming Cao
2008-02-08  3:14           ` Aneesh Kumar K.V
2008-02-08  3:30             ` Eric Sandeen
2008-02-08 13:47         ` Eric Sesterhenn
2008-02-08 15:07         ` Eric Sesterhenn
2008-02-08 15:26           ` Eric Sandeen
2008-02-08 15:31             ` Eric Sesterhenn
2008-02-08 15:33               ` Eric Sandeen
2008-02-01 19:34 ` Mingming Cao

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:06d1f52 dfblob:656729b )
 OR (
bs:"Re: BUG_ON at mballoc.c:3752" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080207125548.GA8701@skywalker \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=shaggy@linux.vnet.ibm.com \
    --cc=snakebyte@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox