From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Eric Sandeen <sandeen@redhat.com>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH] fix bb_prealloc_list corruption due to wrong group locking
Date: Mon, 16 Mar 2009 11:14:27 +0530 [thread overview]
Message-ID: <20090316054427.GA17376@skywalker> (raw)
In-Reply-To: <49BAD6D9.3010505@redhat.com>
On Fri, Mar 13, 2009 at 04:57:45PM -0500, Eric Sandeen wrote:
> This is for Red Hat bug 490026,
> EXT4 panic, list corruption in ext4_mb_new_inode_pa
>
> (this was on backported ext4 from 2.6.29)
>
> We hit a BUG() in __list_add from ext4_mb_new_inode_pa()
> because the list head pointed to a removed item:
>
> list_add corruption. next->prev should be ffff81042f2fe158,
> but was 0000000000200200
>
> (0000000000200200 is LIST_POISON2, set when the item is deleted)
>
> ext4_lock_group(sb, group) is supposed to protect this list for
> each group, and a common code flow is this:
>
> ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
> ext4_lock_group(sb, grp);
> list_del(&pa->pa_group_list);
> ext4_unlock_group(sb, grp);
>
> so its critical that we get the right group number back for
> this pa->pa_pstart block.
>
> however, ext4_mb_put_pa passes in (pa->pa_pstart - 1) with a
> comment, "-1 is to protect from crossing allocation group"
>
> Other list-manipulators do not use the "-1" so we have the
> potential to lock the wrong group and race. Given how the
> ext4_get_group_no_and_offset() function works, it doesn't seem
> to me that the subtraction is correct.
>
> I've not been able to reproduce the bug, so this is by inspection.
>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> Index: linux-2.6/fs/ext4/mballoc.c
> ===================================================================
> --- linux-2.6.orig/fs/ext4/mballoc.c
> +++ linux-2.6/fs/ext4/mballoc.c
> @@ -3603,8 +3603,7 @@ static void ext4_mb_put_pa(struct ext4_a
> pa->pa_deleted = 1;
> spin_unlock(&pa->pa_lock);
>
> - /* -1 is to protect from crossing allocation group */
> - ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
> + ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
>
> /*
> * possible race:
>
But the change is needed for lg prealloc space because locality group
prealloc reduce pa_pstart on block allocation and once fully allocated
pa_pstart can point to the next block group. But what you found is also
correct for inode prealloc space. I guess the code broke due to FLEX_BG
because without FLEX_BG pa_pstart will never be the first block in the
group so even for inode prealloc space pa_pstart - 1 would be in the
same group. You may want to do
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4415bee..b4656f7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3585,11 +3585,10 @@ static void ext4_mb_pa_callback(struct rcu_head *head)
* drops a reference to preallocated space descriptor
* if this was the last reference and the space is consumed
*/
-static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
- struct super_block *sb, struct ext4_prealloc_space *pa)
+static void ext4_mb_put_pa(struct super_block *sb,
+ ext4_group_t grp, struct ext4_prealloc_space *pa)
{
- ext4_group_t grp;
-
+
if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
return;
@@ -3602,10 +3601,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
pa->pa_deleted = 1;
spin_unlock(&pa->pa_lock);
-
- /* -1 is to protect from crossing allocation group */
- ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
-
+
/*
* possible race:
*
@@ -4469,8 +4465,11 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
*/
static int ext4_mb_release_context(struct ext4_allocation_context *ac)
{
+ ext4_group_t grp;
struct ext4_prealloc_space *pa = ac->ac_pa;
if (pa) {
+ ext4_get_group_no_and_offset(ac->ac_sb,
+ pa->pa_pstart, &grp, NULL);
if (pa->pa_linear) {
/* see comment in ext4_mb_use_group_pa() */
spin_lock(&pa->pa_lock);
@@ -4497,7 +4496,7 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
spin_unlock(pa->pa_obj_lock);
ext4_mb_add_n_trim(ac);
}
- ext4_mb_put_pa(ac, ac->ac_sb, pa);
+ ext4_mb_put_pa(ac->ac_sb, grp, pa);
}
if (ac->ac_bitmap_page)
page_cache_release(ac->ac_bitmap_page);
next prev parent reply other threads:[~2009-03-16 5:44 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-13 21:57 [PATCH] fix bb_prealloc_list corruption due to wrong group locking Eric Sandeen
2009-03-13 22:20 ` Eric Sandeen
2009-03-14 4:41 ` Eric Sandeen
2009-03-16 5:44 ` Aneesh Kumar K.V [this message]
2009-03-16 15:03 ` Eric Sandeen
2009-03-16 16:47 ` [PATCH V2] " Eric Sandeen
2009-03-16 17:12 ` Aneesh Kumar K.V
2009-03-16 17:28 ` [PATCH V3] " Eric Sandeen
2009-03-16 17:42 ` Frank Mayhar
2009-03-16 17:48 ` Eric Sandeen
2009-03-16 17:53 ` Frank Mayhar
2009-03-18 16:11 ` Frank Mayhar
2009-03-18 16:17 ` Eric Sandeen
2009-03-18 18:11 ` Theodore Tso
2009-03-17 3:30 ` Theodore Tso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090316054427.GA17376@skywalker \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=sandeen@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.