From: Dave Chinner <david@fromorbit.com>
To: "Rafał Kupka" <kupson@kupson.net>
Cc: Christoph Hellwig <hch@infradead.org>, xfs@oss.sgi.com
Subject: Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
Date: Tue, 27 Mar 2012 14:28:43 +1100 [thread overview]
Message-ID: <20120327032843.GV5091@dastard> (raw)
In-Reply-To: <1332777925.14696.194.camel@utemp.mieszkanie>
On Mon, Mar 26, 2012 at 06:05:25PM +0200, Rafał Kupka wrote:
> On Thu, 2012-03-08 at 20:10 +0100, Rafał Kupka wrote:
> > On Wed, 2012-03-07 at 07:30 +1100, Dave Chinner wrote:
>
> Hi,
>
> Finally I've captured this event trace. Hope it will help.
>
> > > Also, it might be worthwhile to capture an event trace (e.g. all the
> > > xfs_buf events) to see what occurs just before the error is
> > > triggered. That might tell us exactl what sequence has occurred
> > > leading up to the error.
> >
> > I will try to provide this information in a few days.
>
> Kernel version - Linus tree up to:
> 0e6e847f "xfs: stop using the page cache to back the buffer cache"
>
> It's uniprocessor virtual machine (KVM) with 384 MB RAM.
Ok, so limited memory and no parallelism. Can you send me you
.config so I can have a look at other various config options?
> Full event trace is available at https://gist.github.com/2206109
> (sorry for it's *.perl extension, my mistake).
>
> Kernel log entry:
> [ 495.197571] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80 ("xfs_trans_read_buf") error 5 buf count 8192
> [ 495.198283] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> [ 501.540477] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80 ("xfs_trans_read_buf") error 5 buf count 8192
> [ 501.543047] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
>
> It's possible that this "I/O error" is caused by memory pressure?
>
> Related trace line:
> xfsdump-2498 [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
>
> Events related to "bno 0x13d80":
> xfsdump-2498 [000] 24195.782266: xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
> xfsdump-2498 [000] 24195.782266: xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf
> xfsdump-2498 [000] 24195.782267: xfs_buf_iorequest: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller _xfs_buf_read
> xfsdump-2498 [000] 24195.782267: xfs_buf_hold: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
> xfsdump-2498 [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
> xfsdump-2498 [000] 24195.782270: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
> xfsdump-2498 [000] 24195.782985: xfs_buf_unlock: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_buf_relse
> xfsdump-2498 [000] 24195.782986: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_trans_read_buf
>
> Do this data put some light on this issue?
Yes, that it was errored out because the bio that was built from the
buffer was zero length. That means the buffer was not set up
properly in the first place. Hmmm, this is unusual - the two lines
above your traces:
xfs_buf_cond_lock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|TRYLOCK|PAGES caller _xfs_buf_find
xfs_buf_find: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf
The lookup found an uninitialised buffer in the cache. That's
exceedingly curious - the match is done on bp->b_file_offset, and
after it has been inserted it the bno is then filled out.
Oh, wow, I think I see the bug. The key is the initial state of that
uninitialised buffer: READ|ASYNC|TRYLOCK|PAGES, and the state of
the it from the xfs_buf_get() trace READ|LOCK|DONT_BLOCK. Those two
sets of flags do not match up - the initial set indicate the buffer
has been inserted into the cache for readahead, the second set
indicate a blocking read.
You've only got a small amount of memory, so I suspect that a memory
allocation has failed in a specific place to cause this. I'm not
exactly sure how the second thread trips over it and initialises it
somewhat (there's 3 initialisation steps to a buffer) but the
initial failure is probably this:
async readhead started
_xfs_buf_find fails
allocate new buffer A
partially initialise A
_xfs_buf_find
fail to find match
insert buffer A into rbtree
returns buffer A
A == A
fails allocate pages to buffer
unlock buffer
release buffer
<uninitialised buffer now sitting in cache. THIS IS WRONG!>
That would lead to a second lookup finding it and using it, but I'm
still not quite sure how the buffer bno gets initialised. I can
construct valid scenarios that will cause it to happen, but none
match the event trace you've posted.
And just looking at the way we clean up this error, this race
condition could lead to a use after free if we immediately free
if we fail to allocate the pages to the buffer. Messy. I'll have to
think on this a bit - maybe a "broken" state is necessary.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-03-27 3:28 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-29 18:20 XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598 Rafał Kupka
2012-03-02 10:43 ` Christoph Hellwig
2012-03-03 11:37 ` Rafał Kupka
2012-03-03 11:44 ` Christoph Hellwig
2012-03-05 21:40 ` Rafał Kupka
2012-03-06 20:30 ` Dave Chinner
2012-03-08 19:10 ` Rafał Kupka
2012-03-26 16:05 ` Rafał Kupka
2012-03-27 3:28 ` Dave Chinner [this message]
2012-03-27 10:17 ` Dave Chinner
2012-03-30 16:02 ` Rafał Kupka
2012-04-04 9:19 ` Rafał Kupka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120327032843.GV5091@dastard \
--to=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=kupson@kupson.net \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox