linux-ext4.vger.kernel.org archive mirror
From: Eryu Guan <eguan-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
Cc: Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Subject: Re: [v4.12-rc1 regression] nfs server crashed in fstests run
Date: Wed, 28 Jun 2017 10:58:42 +0800
Message-ID: <20170628025842.GZ23360@eguan.usersys.redhat.com>
In-Reply-To: <20170626123949.GP17542@dastard>

On Mon, Jun 26, 2017 at 10:39:50PM +1000, Dave Chinner wrote:
> On Fri, Jun 23, 2017 at 09:51:56AM +0200, Michal Hocko wrote:
> > On Fri 23-06-17 09:43:34, Michal Hocko wrote:
> > > [Let's add Jack and keep the full email for reference]
> > > 
> > > On Fri 23-06-17 15:26:56, Eryu Guan wrote:
> > [...]
> > > > Then I did further confirmation tests:
> > > > 1. switch to a new branch with that jbd2 patch as HEAD and compile
> > > > kernel, run test with both ext4 and XFS exported on this newly compiled
> > > > kernel, it crashed within 5 iterations.
> > > > 
> > > > 2. revert that jbd2 patch (when it was HEAD), run test with both ext4
> > > > and XFS exported, kernel survived 20 iterations of full fstests run.
> > > > 
> > > > 3. kernel from step 1 survived 20 iterations of full fstests run, if I
> > > > export XFS only (create XFS on /dev/sda4 and mount it at /export/test).
> > > > 
> > > > 4. 4.12-rc1 kernel survived the same test if I export ext4 only (both
> > > > /export/test and /export/scratch were mounted as ext4, and this was done
> > > > on another test host because I don't have another spare test partition)
> > > > 
> > > > 
> > > > All these facts seem to confirm that commit 81378da64de6 really is the
> > > > culprit, I just don't see how..
> > 
> > AFAIR, no follow up patches to remove GFP_NOFS have been merged into
> > ext4, so currently we only have 81378da64de6, and all it does is make
> > _all_ allocations from the transaction context implicitly GFP_NOFS.
> > I can imagine that if there is a GFP_KERNEL allocation in this context
> > (which would be incorrect AFAIU) some shrinkers will not be called as a
> > result and that might lead to an observable behavior change. But this
> > sounds like a wild speculation. The mere fact that xfs oopses and there
> > is no ext code in the backtrace is suspicious on its own. Does this oops
> > sound familiar to xfs guys?
> 
> Nope, but if it's in write_cache_pages() then it's not actually
> crashing in XFS code, but in generic page cache and radix tree
> traversal code. Which means objects that are allocated from slabs
> and pools that are shared by both XFS and ext4.
> 
> We've had problems in the past where use after free of bufferheads
> in reiserfs was discovered by corruption of bufferheads in XFS code,
> so maybe there's a similar issue being exposed by the ext4
> GFP_NOFS changes? i.e. try debugging this by treating it as memory
> corruption until we know more...
> 
> > > > > [88901.418500]  write_cache_pages+0x26f/0x510
> 
> Knowing what line of code is failing would help identify what object
> is problematic....

This is what I replied to Darrick when he first asked for the same
information:

"
I managed to reproduce it again with the 4.12-rc4 kernel; the call trace is

[  704.811107] Call Trace:
[  704.811107]  do_trap+0x16a/0x190
[  704.811107]  do_error_trap+0x89/0x110
[  704.811107]  ? xfs_do_writepage+0x6c7/0x6d0 [xfs]
[  704.811107]  ? check_preempt_curr+0x7d/0x90
[  704.811107]  ? ttwu_do_wakeup+0x1e/0x150
[  704.811107]  do_invalid_op+0x20/0x30
[  704.811107]  invalid_op+0x1e/0x30

and xfs_do_writepage+0x6c7 is

(gdb) l *(xfs_do_writepage+0x6c7)
0x679e7 is in xfs_do_writepage (fs/xfs/xfs_aops.c:850).
845             int                     error = 0;
846             int                     count = 0;
847             int                     uptodate = 1;
848             unsigned int            new_type;
849
850             bh = head = page_buffers(page);
851             offset = page_offset(page);
852             do {
853                     if (offset >= end_offset)
854                             break;
"

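A minimal userspace model of the buffer-head walk in the listing above, to show why this loop is a natural place for a freed or corrupted object to surface. The structs and names (bh_model, page_model, walk_page_buffers) are hypothetical simplifications; only the head/b_this_page ring pattern is taken from the real code. In kernels of this era page_buffers() wraps BUG_ON(!PagePrivate(page)), and an x86 BUG() is reported through invalid_op/do_invalid_op, which would be consistent with the trace above (an inference, not a confirmed diagnosis).

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct buffer_head
 * and struct page; only the fields needed for the walk are modelled. */
struct bh_model {
    unsigned long blocknr;
    struct bh_model *b_this_page;   /* next buffer on the same page (circular) */
};

struct page_model {
    int private_set;                /* models PagePrivate(page) */
    struct bh_model *private_data;  /* models page_private(page) */
};

#define WALK_NO_BUFFERS  (-1)  /* where the kernel would hit BUG_ON in page_buffers() */
#define WALK_RING_BROKEN (-2)  /* debugging aid only, no kernel equivalent */

/* Walk the page's buffer ring: start at head, follow b_this_page,
 * and stop once we are back at head. Returns the number of buffers. */
static int walk_page_buffers(const struct page_model *page, int max_steps)
{
    struct bh_model *head, *bh;
    int count = 0;

    if (!page->private_set)
        return WALK_NO_BUFFERS;

    bh = head = page->private_data;
    do {
        /* A NULL link or a ring that never returns to head means corruption. */
        if (bh == NULL || count >= max_steps)
            return WALK_RING_BROKEN;
        count++;
        bh = bh->b_this_page;
    } while (bh != head);

    return count;
}
```

The max_steps bound exists only so the model can report a broken ring instead of spinning forever; the real loop also stops early at end_offset, which this sketch leaves out.
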
Later on, I repeated this several times, and the crash landed on
different lines of code each time. I can't remember the exact line
numbers now, but it always involved dealing with buffer heads.

Thanks,
Eryu

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org