From: Dave Chinner <david@fromorbit.com>
To: Mark Tinguely <tinguely@sgi.com>
Cc: Dave Jones <davej@redhat.com>, CAI Qian <caiqian@redhat.com>,
xfs@oss.sgi.com
Subject: Re: xfs_efi_item slab corruption. (v3.9-10936-g51a26ae)
Date: Wed, 8 May 2013 08:22:56 +1000 [thread overview]
Message-ID: <20130507222256.GD24635@dastard> (raw)
In-Reply-To: <518962FC.2060509@sgi.com>
On Tue, May 07, 2013 at 03:24:28PM -0500, Mark Tinguely wrote:
> On 05/07/13 15:22, Dave Jones wrote:
> >On Tue, May 07, 2013 at 03:04:33PM -0500, Mark Tinguely wrote:
> > > On 05/07/13 14:59, Dave Jones wrote:
> > > > On Tue, May 07, 2013 at 02:58:15PM -0500, Mark Tinguely wrote:
> > > >
> > > > > > I can hit this almost instantly with fsx. I'll do a bisect, though
> > > > > > it sounds like you already have a suspect.
> > > > > >
> > > > >
> > > > > If you want to try kmem debug of Linux 3.8 that would help.
> > > >
> > > > I'm not sure what that is.
> > >
> > > Sorry, if you would test Linux 3.8 with "CONFIG_DEBUG_SLAB=y".
> >
> >Ah, done that. (I pretty much always run with it).
> >
> >This is something new. Even 3.9 was fine. It's only since
> >the recent xfs merge.
> >
> > Dave
> >
>
> git revert 666d644cd72a9ec58b353209ff191d7430f3b357
That won't prevent the use after free. That commit fixed a problem
that could lead to a use after free, but what we are seeing here is
that it has ultimately exposed a previously unknown issue that
causes the use after free.
Basically what is happening is that there are two commits for the
EFD being processed, when there should only be one. I'm not sure how
this is happening yet, but these three traces came out from my debug
sequentially when running generic/006:
__xfs_efi_release 0xffff88006c4d7058 2
Pid: 4756, comm: kworker/1:1H Tainted: G B 3.9.0-rc8-dgc+ #592
Call Trace:
[<ffffffff814e43a5>] __xfs_efi_release+0x35/0x80
[<ffffffff814e444d>] xfs_efi_item_unpin+0x5d/0x70
[<ffffffff814dbb88>] xfs_trans_committed_bulk+0x248/0x2e0
[<ffffffff814e18db>] xlog_cil_committed+0x3b/0x140
[<ffffffff814ddfae>] xlog_state_do_callback+0x19e/0x3a0
.....
That's the EFI being unpinned and it's reference going away during
transaction commit completion procesing.
__xfs_efi_release 0xffff88006c4d7058 1
Pid: 4756, comm: kworker/1:1H Tainted: G B 3.9.0-rc8-dgc+ #592
Call Trace:
[<ffffffff814e43a5>] __xfs_efi_release+0x35/0x80
[<ffffffff814e4620>] xfs_efi_release+0x60/0x80
[<ffffffff814e46a6>] xfs_efd_item_committed+0x26/0x40
[<ffffffff814dba21>] xfs_trans_committed_bulk+0xe1/0x2e0
[<ffffffff814e18db>] xlog_cil_committed+0x3b/0x140
[<ffffffff814ddfae>] xlog_state_do_callback+0x19e/0x3a0
.....
That's the EFD releasing the EFI, which drops it's reference count
and frees the both the EFI and the EFD.
__xfs_efi_release 0xffff88006c4d7058 1802201963
Pid: 4756, comm: kworker/1:1H Tainted: G B 3.9.0-rc8-dgc+ #592
Call Trace:
[<ffffffff814e43a5>] __xfs_efi_release+0x35/0x80
[<ffffffff814e4630>] xfs_efi_release+0x70/0x80
[<ffffffff814e46a6>] xfs_efd_item_committed+0x26/0x40
[<ffffffff814dba21>] xfs_trans_committed_bulk+0xe1/0x2e0
[<ffffffff814e18db>] xlog_cil_committed+0x3b/0x140
.....
And there's the EFI being released by an EFD after it has been
freed. It's definitely been freed because 1802201963 =
0x6b6b6b6b....
Like I said, I don't understand how this is happening yet, but just
seeing this behaviour explains an awful lot about why we've had long
term issues with this code....
FWIW, I used to run with slab debugging turned on for all my debug
builds. I switched my .config from SLAB to SLUB a few months ago,
and didn't realise this silently turned of memory poisoning. That's
how it slipped through my testing....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-05-07 22:23 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-07 13:37 xfs_efi_item slab corruption. (v3.9-10936-g51a26ae) Dave Jones
2013-05-07 19:04 ` Mark Tinguely
2013-05-07 19:07 ` Dave Jones
2013-05-07 19:24 ` Mark Tinguely
2013-05-07 19:31 ` Dave Jones
2013-05-07 19:58 ` Mark Tinguely
2013-05-07 19:59 ` Dave Jones
2013-05-07 20:04 ` Mark Tinguely
2013-05-07 20:22 ` Dave Jones
2013-05-07 20:24 ` Mark Tinguely
2013-05-07 22:22 ` Dave Chinner [this message]
2013-05-07 22:45 ` Mark Tinguely
2013-05-07 23:54 ` Dave Chinner
2013-05-08 0:43 ` Dave Jones
2013-05-08 13:24 ` Mark Tinguely
2013-05-10 1:38 ` Dave Chinner
2013-05-11 12:09 ` Dmitry Monakhov
2013-05-12 0:40 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130507222256.GD24635@dastard \
--to=david@fromorbit.com \
--cc=caiqian@redhat.com \
--cc=davej@redhat.com \
--cc=tinguely@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox