public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
Date: Mon, 18 Jun 2018 23:06:52 -0700
Message-ID: <20180619060652.GW8128@magnolia>
In-Reply-To: <20180619052759.GH19934@dastard>

On Tue, Jun 19, 2018 at 03:27:59PM +1000, Dave Chinner wrote:
> On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 19, 2018 at 12:41:27PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > If we are punching out a delalloc extent, xfs_bunmapi() does not
> > > have a transaction context and should not ever need to convert the
> > > on-disk extent format. If such a thing is attempted (e.g. via a
> > > corrupt inode extent count in extent format) then we should abort
> > > with an EFSCORRUPTED error. Unfortunately, we don't do that and
> > > crash instead:
> > > 
> > >  XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
> > >  ==================================================================
> > >  BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
> > >  Read of size 8 at addr 0000000000000028 by task a.out/1406
> > >  CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
> > >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > >  Call Trace:
> > >   dump_stack+0x7b/0xb5
> > >   kasan_report+0x10c/0x390
> > >   __asan_load8+0x54/0x90
> > >   xfs_alloc_get_freelist+0x115/0x350
> > >   xfs_alloc_fix_freelist+0x35b/0x830
> > >   xfs_alloc_vextent+0x215/0x990
> > >   xfs_bmap_extents_to_btree+0x30d/0x940
> > > .....
> > > 
> > > By returning an error here, we avoid such crashes when punching out
> > > a delalloc page because we don't try to fix up an AG freelist
> > > without a transaction. Hence we get an error like so:
> > 
> > Um, isn't erroring out here leaving a dirty bomb in the in-core metadata?
> 
> Not that I can tell. We've already trashed the dirty page state by
> this point, so the page cache can safely reclaim the page and the
> delalloc range over it will never get written.  And the XFS inode
> cleanup code didn't have any issues with the way the error was
> handled, either, because the delalloc range was actually removed
> before the fork format error was triggered.
> 
> IOWs, there is no dirty, stale page state or delalloc extents
> hanging around if this error fires.

Hmmm, well I guess I'll pull this one in and look for problems.

I wonder, is there a <cough> testcase for this?  Or a fuzz-o-matic to
turn all these things into regression tests?

(Yeah, I know there won't be one for syzbot, I dug through its code and
had to reset my brain by reading mballoc.c. :P)

> > Like you say:
> > 
> > > XFS (loop0): page discard on page ffffea00040ae640, inode 0x75e5, offset 0.
> > > XFS (loop0): page discard unable to remove delalloc mapping.
> > 
> > We know the fs is corrupt, we might as well shut down now rather than
> > let this burp out later.
> 
> xfs_bunmapi() doesn't do shutdowns - the higher level code does a
> shutdown on error if it is necessary, otherwise it just propagates
> the error. In this case it has cleaned up correctly, propagates the
> error and it gets back to userspace on the next fsync, and we're
> fine to continue onwards as there was no unrecoverable error....

Fair enough.

> > I get that people don't want to touch well seasoned code, but
> > xfs_bunmapi is this big unwieldy function that's crying out for a
> > refactor.  It's 330 lines long and can be called from various contexts
> > (data/attr fork, punch delalloc, etc.)...
> >
> > ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> > with no transaction and a xfs_defer that we dump on the ground.
> 
> Yes, and yes.
> 
> > So yes, I think the patch does fix the crash, but it's kinda gross.
> 
> Yes, it is.
> 
> But OTOH, I don't want to risk a bunch of filesystem corrupting
> regressions across the entire XFS userbase just to fix a trivially
> simple crash that requires an extremely unlikely co-ordinated
> corruption of an inode data fork and an AGFL, and to simultaneously
> have ENOSPC in every other AGF in the filesystem.
> 
> Put "refactor xfs_bunmapi()" on the list of "things to do when
> there's nothing else to do"...

So in 2066 after the polar ice caps melt after the XFS LOGHAMMER attack
has finally been put down?  Ok. :)

(But no, seriously, if anyone's looking for a little refactoring +
domain knowledge enhancement of the bmapi code...)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
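
The behaviour discussed above (a transactionless xfs_bunmapi() returning
EFSCORRUPTED instead of attempting a format conversion, with the caller
deciding what to do about the error) can be modelled in standalone
userspace C. This is only a rough sketch, not the patch itself: every
name here (sketch_fork, sketch_trans, sketch_bunmapi,
sketch_punch_delalloc, SKETCH_EFSCORRUPTED) is hypothetical, and the
"needs conversion" test is a deliberate oversimplification of the real
fork format check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's EFSCORRUPTED error code. */
#define SKETCH_EFSCORRUPTED 117

/* Hypothetical, heavily simplified stand-ins for XFS structures. */
struct sketch_trans { int id; };

struct sketch_fork {
    unsigned int nextents;        /* extent count recorded for the fork */
    unsigned int max_inline;      /* most extents that fit in extent format */
    unsigned int delalloc_blocks; /* delalloc blocks still mapped */
};

/*
 * Model of the low-level unmap step. The delalloc mapping is removed first;
 * only afterwards is the fork format examined. If a conversion would be
 * needed and there is no transaction, report corruption to the caller
 * instead of trying to allocate blocks.
 */
int sketch_bunmapi(struct sketch_trans *tp, struct sketch_fork *fork)
{
    bool needs_conversion;

    assert(fork != NULL);

    fork->delalloc_blocks = 0;  /* the range itself is always removed */

    needs_conversion = fork->nextents > fork->max_inline;
    if (needs_conversion && tp == NULL)
        return -SKETCH_EFSCORRUPTED;

    /* ... with a transaction, any conversion would happen here ... */
    return 0;
}

/*
 * Model of a higher-level caller: it owns the policy decision. Here it
 * only logs and propagates the error; it does not shut anything down.
 */
int sketch_punch_delalloc(struct sketch_fork *fork)
{
    int error = sketch_bunmapi(NULL, fork);

    if (error)
        fprintf(stderr, "page discard unable to remove delalloc mapping.\n");
    return error;
}
```

Calling sketch_punch_delalloc() on a fork whose nextents exceeds
max_inline returns -SKETCH_EFSCORRUPTED, yet delalloc_blocks is already
zero by then, which mirrors the point made above that nothing stale is
left behind when the error fires.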
