From: Brian Foster <bfoster@redhat.com>
To: Alex Lyakas <alex@zadarastorage.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: fix double free of trans in log recovery on I/O error
Date: Sun, 31 Aug 2014 17:05:08 -0400 [thread overview]
Message-ID: <20140831210507.GA11913@bfoster.bfoster> (raw)
In-Reply-To: <AFED541EB7FA490C8E16DCB81F400B1C@alyakaslap>
On Sun, Aug 31, 2014 at 11:50:52AM +0300, Alex Lyakas wrote:
> Hi Brian, Dave,
> I tested this patch on kernel 3.16, top commit:
>
> commit 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Sun Aug 3 15:25:02 2014 -0700
>
> Linux 3.16
>
> and, yes, it appears to fix the issue.
>
Thanks. That settles that then, I think. We're reproducing different
problems on the 3.8 stable kernel vs. a recent kernel using the same
test case.
> Trouble is that our production kernel is 3.8.13, and we cannot upgrade to
> mainline kernel easily. Question is whether we can expect some patch
> suitable for our kernel, or, since our kernel is EOL and not a long-term
> one, we cannot?
>
Dave wrote a patch specifically to resolve this problem on older
kernels:
http://oss.sgi.com/archives/xfs/2014-08/msg00204.html
Brian
> Thanks for your help,
> Alex.
>
>
> -----Original Message----- From: Brian Foster
> Sent: 25 August, 2014 5:20 PM
> To: Alex Lyakas
> Cc: xfs@oss.sgi.com ; Dave Chinner
> Subject: Re: [PATCH] xfs: fix double free of trans in log recovery on I/O
> error
>
> On Sun, Aug 24, 2014 at 12:20:20PM +0300, Alex Lyakas wrote:
> >Hi Brian,
> >
> >On Thu, Aug 21, 2014 at 10:18 PM, Brian Foster <bfoster@redhat.com> wrote:
> >> XFS log recovery builds up an xlog_recover object as it passes through
> >> the log operations on the physical log. These structures are managed via
> >> a hash table and are allocated when a new transaction is encountered and
> >> freed once a commit operation for the transaction is encountered.
> >>
> >> This state machine for active transactions is implemented by a
> >> combination of xlog_do_recovery_pass(), which walks through the log
> >> buffers and xlog_recover_process_data() which processes log operations
> >> within each buffer. The latter function decides whether to allocate a
> >> new xlog_recover, add to it or commit and ultimately free it. If an
> >> error occurs at any point during the lifecycle of a particular
> >> xlog_recover, xlog_recover_process_data() frees the object and returns
> >> an error.
> >>
> >> xlog_recover_commit_trans() handles the final processing of the
> >> transaction. It submits whatever I/O is required for the transaction and
> >> frees xlog_recover object along with the transaction items it tracks. If
> >> an error occurs at the final stages of the commit operation, such as I/O
> >> failure, both xlog_recover_commit_trans() and
> >> xlog_recover_process_data() attempt to free the trans object.
> >>
> >> Modify xlog_recover_commit_trans() to only free the trans object on
> >> successful completion of the trans, including any I/O errors that might
> >> occur when recovering the log.
> >>
> >> Signed-off-by: Brian Foster <bfoster@redhat.com>
> >> ---
> >>
> >> Hi all,
> >>
> >> I found that the recent buffer I/O rework fixes didn't address the crash
> >> reproduced by the dm-flakey/log recovery test case I posted recently. I
> >> tracked the crash down to this, which allows the test to pass. This
> >> addresses the crash I saw when running the reproducer manually with the
> >> metadump that Alex posted as well.
> >>
> >> FWIW, I also went back and tested the xfs_buf_iowait() experiment in
> >> both scenarios (Alex's metadump and xfstests test) and they all
> >> reproduce the same crash for me. I think that either I'm still not
> >> reproducing the original problem, something else might have contaminated
> >> the original xfs_buf_iowait() test to give a false positive, or
> >> something else entirely is going on.
> >>
> >> Alex,
> >>
> >> If you have a chance, I think it might be interesting to see whether you
> >> reproduce any problems with this patch. It looks like this is a
> >> regression introduced by:
> >>
> >> 2a84108f xfs: free the list of recovery items on error
> >>
> >> ... but I have no idea if that's in whatever kernel you're running.
> >I am running kernel 3.8.13 with some changes (published at
> >https://github.com/zadarastorage/zadara-xfs-pushback), but this
> >problem also happens on pristine 3.8.13 from
> >git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git,
> >branch linux-stable-3.8.y.
> >
> >I do not have commit 2a84108f in this kernel. It was introduced in 3.14.
> >I applied your patch to 3.8.13, but it doesn't fix the issue. Same
> >problem happens when testing scenario that I described in
> >http://oss.sgi.com/pipermail/xfs/2014-August/037637.html.
> >
>
> Ok, thanks. Yeah, I don't see the double free regression in the 3.8.13
> stable branch. I went back to that kernel to try and confirm some
> things. I do reproduce the problem with your metadump as well as the
> test case I put together. I tested Dave's buf hold across sync I/O patch
> and that does indeed prevent the problem.
>
> For whatever reason, neither the test case nor your metadump reproduce
> the same problem on latest kernels. Instead, they reproduce this double
> free regression. I suspect this is what you ran into when you reproduced
> on a more recent kernel. If you'd like, feel free to try and verify that
> by running your reproducer again on a recent kernel with this patch and
> see if you can still reproduce a crash as with the 3.8.13 kernel.
>
> Brian
>
> >Thanks,
> >Alex.
> >
> >>
> >> Brian
> >>
> >> fs/xfs/xfs_log_recover.c | 11 ++++++++---
> >> 1 file changed, 8 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> >> index 176c4b3..daca9a6 100644
> >> --- a/fs/xfs/xfs_log_recover.c
> >> +++ b/fs/xfs/xfs_log_recover.c
> >> @@ -3528,10 +3528,15 @@ out:
> >> if (!list_empty(&done_list))
> >> list_splice_init(&done_list, &trans->r_itemq);
> >>
> >> - xlog_recover_free_trans(trans);
> >> -
> >> error2 = xfs_buf_delwri_submit(&buffer_list);
> >> - return error ? error : error2;
> >> +
> >> + if (!error)
> >> + error = error2;
> >> + /* caller frees trans on error */
> >> + if (!error)
> >> + xlog_recover_free_trans(trans);
> >> +
> >> + return error;
> >> }
> >>
> >> STATIC int
> >> --
> >> 1.8.3.1
> >>
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-08-31 21:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-21 19:18 [PATCH] xfs: fix double free of trans in log recovery on I/O error Brian Foster
2014-08-24 9:20 ` Alex Lyakas
2014-08-24 9:21 ` Alex Lyakas
2014-08-25 14:20 ` Brian Foster
2014-08-31 8:50 ` Alex Lyakas
2014-08-31 21:05 ` Brian Foster [this message]
2014-09-02 9:51 ` Alex Lyakas
2014-09-02 12:36 ` Brian Foster
2014-09-02 22:02 ` Dave Chinner
2014-09-16 16:01 ` Alex Lyakas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140831210507.GA11913@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=alex@zadarastorage.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox