From: Dave Jones <davej@codemonkey.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Mason <clm@fb.com>, Jens Axboe <axboe@fb.com>,
Al Viro <viro@zeniv.linux.org.uk>, Josef Bacik <jbacik@fb.com>,
David Sterba <dsterba@suse.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Andrew Lutomirski <luto@kernel.org>
Subject: Re: bio linked list corruption.
Date: Thu, 20 Oct 2016 18:48:16 -0400
Message-ID: <20161020224816.tshmynpaj7ekbh6t@codemonkey.org.uk>
In-Reply-To: <CA+55aFxA1QrO2sBwwtEQVp3FFgs6CGWzJg+U5kif+-msFc90uA@mail.gmail.com>
On Tue, Oct 18, 2016 at 05:28:44PM -0700, Linus Torvalds wrote:
> On Tue, Oct 18, 2016 at 5:10 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Adding Andy to the cc, because this *might* be triggered by the
> > vmalloc stack code itself. Maybe the re-use of stacks showing some
> > problem? Maybe Chris (who can't see the problem) doesn't have
> > CONFIG_VMAP_STACK enabled?
>
> I bet it's the plug itself that is the stack address. In fact, it's
> probably that mq_list head pointer
So I've done a few experiments over the last couple of days.

1. I see some kind of disaster happen with every filesystem:
ext4, btrfs, xfs. For some reason I can repro it faster on btrfs
(though xfs blew up pretty quickly too, but I don't know if it's
the same as this list corruption bug).

2. I ran for 24 hours with VMAP_STACK turned off. I saw some
_different_ btrfs problems, but I never hit that list debug corruption
once.

3. I turned vmap stacks back on, and got this pretty quickly.
Another new flavor of crash, but Chris recommended I post this one
because it looks interesting.
[ 3943.514961] BUG: Bad page state in process kworker/u8:14 pfn:482244
[ 3943.532400] page:ffffea0012089100 count:0 mapcount:0 mapping:ffff8804c40d6ae0 index:0x2f
[ 3943.551865] flags: 0x4000000000000008(uptodate)
[ 3943.561652] page dumped because: non-NULL mapping
[ 3943.587698] CPU: 2 PID: 26823 Comm: kworker/u8:14 Not tainted 4.9.0-rc1-think+ #9
[ 3943.607409] Workqueue: writeback wb_workfn
[ 3943.617194] (flush-btrfs-2)
[ 3943.617260] ffffc90001bf7870
[ 3943.627007] ffffffff8130c93c
[ 3943.627075] ffffea0012089100
[ 3943.627112] ffffffff819ff37c
[ 3943.627149] ffffc90001bf7898
[ 3943.636918] ffffffff81150fef
[ 3943.636985] 0000000000000000
[ 3943.637021] ffffea0012089100
[ 3943.637059] 4000000000000008
[ 3943.646965] ffffc90001bf78a8
[ 3943.647041] ffffffff811510aa
[ 3943.647081] ffffc90001bf78f0
[ 3943.647126] Call Trace:
[ 3943.657068] [<ffffffff8130c93c>] dump_stack+0x4f/0x73
[ 3943.666996] [<ffffffff81150fef>] bad_page+0xbf/0x120
[ 3943.676839] [<ffffffff811510aa>] free_pages_check_bad+0x5a/0x70
[ 3943.686646] [<ffffffff8115355b>] free_hot_cold_page+0x20b/0x270
[ 3943.696402] [<ffffffff8115387b>] free_hot_cold_page_list+0x2b/0x50
[ 3943.706092] [<ffffffff8115c1fd>] release_pages+0x2bd/0x350
[ 3943.715726] [<ffffffff8115d732>] __pagevec_release+0x22/0x30
[ 3943.725358] [<ffffffffa00a0d4e>] extent_write_cache_pages.isra.48.constprop.63+0x32e/0x400 [btrfs]
[ 3943.735126] [<ffffffffa00a1199>] extent_writepages+0x49/0x60 [btrfs]
[ 3943.744808] [<ffffffffa0081840>] ? btrfs_releasepage+0x40/0x40 [btrfs]
[ 3943.754457] [<ffffffffa007e993>] btrfs_writepages+0x23/0x30 [btrfs]
[ 3943.764085] [<ffffffff8115a91c>] do_writepages+0x1c/0x30
[ 3943.773667] [<ffffffff811f65f3>] __writeback_single_inode+0x33/0x180
[ 3943.783233] [<ffffffff811f6de8>] writeback_sb_inodes+0x2a8/0x5b0
[ 3943.792870] [<ffffffff811f733b>] wb_writeback+0xeb/0x1f0
[ 3943.802326] [<ffffffff811f7972>] wb_workfn+0xd2/0x280
[ 3943.811673] [<ffffffff810906e5>] process_one_work+0x1d5/0x490
[ 3943.821044] [<ffffffff81090685>] ? process_one_work+0x175/0x490
[ 3943.830447] [<ffffffff810909e9>] worker_thread+0x49/0x490
[ 3943.839756] [<ffffffff810909a0>] ? process_one_work+0x490/0x490
[ 3943.849074] [<ffffffff810909a0>] ? process_one_work+0x490/0x490
[ 3943.858264] [<ffffffff81095b5e>] kthread+0xee/0x110
[ 3943.867451] [<ffffffff81095a70>] ? kthread_park+0x60/0x60
[ 3943.876616] [<ffffffff81095a70>] ? kthread_park+0x60/0x60
[ 3943.885624] [<ffffffff81095a70>] ? kthread_park+0x60/0x60
[ 3943.894580] [<ffffffff81790492>] ret_from_fork+0x22/0x30
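
For anyone skimming the dump above: the report fires because the page reached the free path with its mapping pointer still set ("non-NULL mapping"), while count and mapcount are both 0. A minimal sketch of pulling those fields out of the log text follows. The function name, the regexes, and the flag-bit table are illustrative assumptions and not part of any kernel tooling; the bit names are assumed to match the low bits of enum pageflags in kernels of this era.

```python
import re

# Assumed names for the low page-flag bits (4.9-era enum pageflags).
PAGE_FLAG_BITS = {0: "locked", 1: "error", 2: "referenced", 3: "uptodate",
                  4: "dirty", 5: "lru", 6: "active", 7: "slab"}

PAGE_RE = re.compile(
    r"page:(?P<page>[0-9a-f]+)\s+count:(?P<count>-?\d+)\s+"
    r"mapcount:(?P<mapcount>-?\d+)\s+mapping:\s*(?P<mapping>\S+)\s+"
    r"index:(?P<index>0x[0-9a-f]+)")
FLAGS_RE = re.compile(r"flags: (?P<flags>0x[0-9a-f]+)")


def summarize_bad_page(log_text):
    """Extract the fields of a 'Bad page state' dump into a dict."""
    page = PAGE_RE.search(log_text)
    flags = FLAGS_RE.search(log_text)
    if page is None or flags is None:
        raise ValueError("no page dump found in log text")
    mapping = page.group("mapping")
    # A NULL pointer is printed as "(null)" by %p in this kernel version.
    mapping_val = 0 if mapping == "(null)" else int(mapping, 16)
    flag_val = int(flags.group("flags"), 16)
    low_flags = [name for bit, name in sorted(PAGE_FLAG_BITS.items())
                 if flag_val & (1 << bit)]
    return {
        "count": int(page.group("count")),
        "mapcount": int(page.group("mapcount")),
        "mapping_set": mapping_val != 0,   # the condition reported here
        "index": int(page.group("index"), 16),
        "low_flags": low_flags,
    }
```

Fed the two lines beginning "page:" and "flags:" from the dump, this should report mapping_set as true with only the uptodate bit among the low flags, which agrees with the kernel's own "(uptodate)" annotation. The high bits of the flags word are left undecoded.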
This feels like chasing a moving target, because the crash keeps changing.
I'm going to spend some time trying to at least pin down a selection
of syscalls that trinity can reproduce this with quickly.

Early on, it seemed like this was xattr related, but now I'm not so sure.
Once or twice, I was able to repro it within a few minutes using just
writev, fsync, lsetxattr and lremovexattr. Then a day later, I found I
could run for a day before seeing it. Position of the moon or something.
Or it could have been entirely unrelated to the actual syscalls being run,
and based just on how contended the cpu/memory was.
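
To make the syscall mix concrete, here is a rough sketch of a loop over the same four calls on a single scratch file. This is not trinity and is not claimed to reproduce the crash: trinity fuzzes with randomized arguments across many children, whereas this only issues well-formed calls from one process. The function name, file name, xattr name and buffer sizes are all made up for illustration, and the xattr calls are Linux-only in Python's os module.

```python
import os


def exercise_syscalls(directory, iterations=100):
    """Loop writev, fsync, lsetxattr and lremovexattr on one scratch file.

    Returns (final file size, number of failed xattr rounds).
    """
    path = os.path.join(directory, "scratch")   # hypothetical file name
    xattr_errors = 0
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        for i in range(iterations):
            # Two buffers per call so the vectored path is actually used.
            os.writev(fd, [b"a" * 4096, b"b" * 512])
            os.fsync(fd)
            try:
                # follow_symlinks=False selects the l* variants.
                os.setxattr(path, "user.test", b"x" * (i % 64 + 1),
                            follow_symlinks=False)
                os.removexattr(path, "user.test", follow_symlinks=False)
            except OSError:
                # The filesystem may not support user xattrs; keep going.
                xattr_errors += 1
    finally:
        os.close(fd)
    return os.path.getsize(path), xattr_errors
```

Pointing the directory argument at a mount of the filesystem under test, and running several copies in parallel, would be one way to add the contention mentioned above, though whether that matters here is an open question.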
Dave