All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@redhat.com>
To: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH] fix bb_prealloc_list corruption due to wrong group locking
Date: Fri, 13 Mar 2009 23:41:00 -0500	[thread overview]
Message-ID: <49BB355C.2070802@redhat.com> (raw)
In-Reply-To: <49BADC23.3060605@redhat.com>

Eric Sandeen wrote:
> Eric Sandeen wrote:
>> This is for Red Hat bug 490026,
>> EXT4 panic, list corruption in ext4_mb_new_inode_pa
>>
>> (this was on backported ext4 from 2.6.29)
>>
>> We hit a BUG() in __list_add from  ext4_mb_new_inode_pa()
>> because the list head pointed to a removed item:
>>
>> list_add corruption. next->prev should be ffff81042f2fe158,
>> but was 0000000000200200
>>
>> (0000000000200200 is LIST_POISON2, set when the item is deleted)
>>
>> ext4_lock_group(sb, group) is supposed to protect this list for
>> each group, and a common code flow is this:
>>
>>     ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
>>     ext4_lock_group(sb, grp);
>>     list_del(&pa->pa_group_list);
>>     ext4_unlock_group(sb, grp);
>>
>> so its critical that we get the right group number back for
>> this pa->pa_pstart block.
>>
>> however, ext4_mb_put_pa passes in (pa->pa_pstart - 1) with a 
>> comment, "-1 is to protect from crossing allocation group"
>>
>> Other list-manipulators do not use the "-1" so we have the 
>> potential to lock the wrong group and race.  Given how the 
>> ext4_get_group_no_and_offset() function works, it doesn't seem
>> to me that the subtraction is correct.
> 
> Hm, unless pa_pstart gets advanced to the point where it's in the next
> group when it's used up...  might be more reading to do here.

Ok I think I was on the right track here.  It looks like for group_pa
(with pa_linear == 1), pa->pa_pstart is advanced as it's used (actually
in ext4_mb_release_context(), but that's a detail) so by the time it is
used up, pa->pa_pstart has advanced into the next group* and therefore
subtracting one to find the group it belong(ed) to is correct.

However, for inode_pa (with pa_linear == 0) only pa_free is decremented,
and pa_pstart does not move.  Therefore subtracting one from pa_pstart
in ext4_mb_put_pa is actually grabbing the previous block group's lock
in the inode case, and we open a race with other threads which are
locking the correct group.

I'll do a bit more testing/reading but I think that what we probably
need is something like:

static void ext4_mb_put_pa(struct ext4_allocation_context *ac, ...)
{
...
        /* -1 is to protect from crossing allocation group */
	if (pa->pa_linear)
		pa->pa_pstart--;
        ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
...

Could probably come up with something clearer, but that's the gist of it.

-Eric


*i.e. if pa_start began at 0, and the group had 512 blocks, when all
blocks are used, pa_start has advanced by 512, and "512" is the first
block in the *next* group, so we need to trim one off in that case.

  reply	other threads:[~2009-03-14  4:41 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-13 21:57 [PATCH] fix bb_prealloc_list corruption due to wrong group locking Eric Sandeen
2009-03-13 22:20 ` Eric Sandeen
2009-03-14  4:41   ` Eric Sandeen [this message]
2009-03-16  5:44 ` Aneesh Kumar K.V
2009-03-16 15:03   ` Eric Sandeen
2009-03-16 16:47 ` [PATCH V2] " Eric Sandeen
2009-03-16 17:12   ` Aneesh Kumar K.V
2009-03-16 17:28   ` [PATCH V3] " Eric Sandeen
2009-03-16 17:42     ` Frank Mayhar
2009-03-16 17:48       ` Eric Sandeen
2009-03-16 17:53         ` Frank Mayhar
2009-03-18 16:11           ` Frank Mayhar
2009-03-18 16:17             ` Eric Sandeen
2009-03-18 18:11               ` Theodore Tso
2009-03-17  3:30     ` Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49BB355C.2070802@redhat.com \
    --to=sandeen@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.