Re: xfs: garbage file data inclusion bug under memory pressure

linux-xfs.vger.kernel.org archive mirror

From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	linux-xfs@vger.kernel.org
Subject: Re: xfs: garbage file data inclusion bug under memory pressure
Date: Tue, 30 Jul 2019 07:56:57 +1000
Message-ID: <20190729215657.GI7777@dread.disaster.area>
In-Reply-To: <20190729112335.GA23942@bfoster>

On Mon, Jul 29, 2019 at 07:23:35AM -0400, Brian Foster wrote:
> On Mon, Jul 29, 2019 at 12:50:11PM +0900, Tetsuo Handa wrote:
> > Dave Chinner wrote:
> > > > > But I have to ask: what is causing the IO to fail? OOM conditions
> > > > > should not cause writeback errors - XFS will retry memory
> > > > > allocations until they succeed, and the block layer is supposed to
> > > > > be resilient against memory shortages, too. Hence I'd be interested
> > > > > to know what is actually failing here...
> > > > 
> > > > Yeah. It is strange that this problem occurs when close-to-OOM.
> > > > But no failure messages at all (except OOM killer messages and writeback
> > > > error messages).
> > > 
> > > Perhaps using things like trace_kmalloc and friends to isolate the
> > > location of memory allocation failures would help....
> > > 
> > 
> > I checked using below diff, and confirmed that XFS writeback failure is triggered by ENOMEM.
> > 
> > When fsync() is called, xfs_submit_ioend() is called. xfs_submit_ioend() invokes
> > xfs_setfilesize_trans_alloc(), but xfs_trans_alloc() fails with -ENOMEM because
> > xfs_log_reserve() from xfs_trans_reserve() fails with -ENOMEM because
> > xlog_ticket_alloc() is using KM_SLEEP | KM_MAYFAIL which is mapped to
> > GFP_NOFS|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP which will fail under close-to-OOM.
> > 
> > As a result, bio_endio() is immediately called due to -ENOMEM, and
> > xfs_destroy_ioend() from xfs_end_bio() from bio_endio() is printing
> > writeback error message due to -ENOMEM error.
> > (By the way, why not to print error code when printing writeback error message?)
> > 
> > ----------------------------------------
> 
> Ah, that makes sense. Thanks for tracking that down Tetsuo. For context,
> it looks like that flag goes back to commit eb01c9cd87 ("[XFS] Remove
> the xlog_ticket allocator") that replaces some old internal ticket
> allocation mechanism (that I'm not familiar with) with a standard kmem
> cache.
> 
> ISTM we can just remove that KM_MAYFAIL from ticket allocation. We're
> already in NOFS context in this particular caller (writeback), though
> that's probably not the case for most other transaction allocations. If
> we had a reason to get more elaborate, I suppose we could conditionalize
> use of the KM_MAYFAIL flag and/or lift bits of ticket allocation to
> earlier in xfs_trans_alloc(), but it's not clear to me that's necessary.
> Dave?

That was a long time ago, and it predates the pre-allocation of
transactions for file size updates at IO submission. The log ticket
rework itself is irrelevant - it just replaced an open-coded slab
allocator - what mattered was that it handled allocation failure.
That was done because we were slowly reducing the number of blocking
allocations back then, trying to reduce our reliance on looping
until an allocation succeeds, so MAYFAIL was used for quite a lot of
new allocations.

This is perfectly fine for transactions in syscall context - if we
don't have memory available for the log ticket, we may as well give
up now before we really start creating memory demand and getting
into a state where we are halfway through a transaction and
completely out of memory and can't go forwards or backwards.

The trans alloc/trans reserve/log reserve code was somewhat
different back then, as was the writeback code. I suspect it dates
back to when writeback had trylock semantics, so a memory allocation
error like this would simply have redirtied the page and writeback
would have been retried later. Hence, historically, I don't think
this was an issue, either.

The code has morphed so much since then that I don't think we can
"blame" this commit for introducing this problem. It looks more like
we have removed all the protection it had as we've simplified the
writeback and transaction allocation/reservation code over time, and
now the failure is exposed directly in writeback.

----

As for how to fix it, I'd just remove KM_MAYFAIL from the log ticket
allocation. We've just done the transaction allocation itself with
plain KM_SLEEP, so we may as well do the same for the log ticket....

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


Thread overview: 23+ messages
2019-07-25 10:06 xfs: garbage file data inclusion bug under memory pressure Tetsuo Handa
2019-07-25 10:53 ` Brian Foster
2019-07-25 12:30   ` Tetsuo Handa
2019-07-25 16:00     ` Brian Foster
2019-07-25 11:32 ` Dave Chinner
2019-07-25 12:44   ` Tetsuo Handa
2019-07-25 17:28     ` Darrick J. Wong
2019-07-25 22:07     ` Dave Chinner
2019-07-29  3:50       ` Tetsuo Handa
2019-07-29 11:23         ` Brian Foster
2019-07-29 21:56           ` Dave Chinner [this message]
2019-07-30 11:30             ` Brian Foster
2019-08-01 10:06             ` [PATCH] fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve() Tetsuo Handa
2019-08-01 10:56               ` Brian Foster
2019-08-01 11:00                 ` Tetsuo Handa
2019-08-01 18:50               ` Luis Chamberlain
2019-08-01 20:46                 ` Darrick J. Wong
2019-08-02 22:21                   ` Luis Chamberlain
2019-08-12 10:57                     ` Tetsuo Handa
2019-08-12 19:55                       ` Darrick J. Wong
2019-08-01 21:13                 ` Tetsuo Handa
2019-08-01 21:55                   ` Dave Chinner
2019-08-01 20:46               ` Darrick J. Wong
