From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, Eryu Guan <eguan@redhat.com>
Subject: Re: [PATCH 1/5] xfs: bypass dquot reclaim to avoid quotacheck deadlock
Date: Sat, 18 Feb 2017 10:12:15 +1100
Message-ID: <20170217231215.GD15349@dastard>
In-Reply-To: <20170217183009.GB20429@bfoster.bfoster>
On Fri, Feb 17, 2017 at 01:30:09PM -0500, Brian Foster wrote:
> On Fri, Feb 17, 2017 at 09:37:05AM +1100, Dave Chinner wrote:
> > On Wed, Feb 15, 2017 at 10:40:43AM -0500, Brian Foster wrote:
> > > Reclaim during quotacheck can lead to deadlocks on the dquot flush lock:
> > >
> > > - Quotacheck populates a local delwri queue with the physical dquot
> > > buffers.
> > > - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and dirties
> > > all of the dquots.
> > > - Reclaim kicks in and attempts to flush a dquot whose buffer is
> > > already queued on the quotacheck queue. The flush succeeds but
> > > queueing to the reclaim delwri queue fails as the backing buffer is
> > > already queued. The flush unlock is now deferred to I/O completion of
> > > the buffer from the quotacheck queue.
> > > - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires the
> > > flush lock to update the backing buffers with the in-core
> > > recalculated values. This deadlocks as the flush lock was acquired by
> > > reclaim but the buffer never submitted for I/O as it already resided
> > > on the quotacheck queue.
> > >
> > > This is reproduced by running quotacheck on a filesystem with a couple
> > > million inodes in low memory (512MB-1GB) situations.
> > >
> > > Quotacheck first resets and collects the physical dquot buffers in a
> > > delwri queue. Then, it traverses the filesystem inodes via bulkstat,
> > > updates the in-core dquots, flushes the corrected dquots to the backing
> > > buffers and finally submits the delwri queue for I/O. Since the backing
> > > buffers are queued across the entire quotacheck operation, dquot reclaim
> > > cannot possibly complete a dquot flush before quotacheck completes.
> > > Therefore, dquot reclaim provides no real value during quotacheck.
> >
> > Which is an OOM vector on systems with lots of dquots and low memory
> > at mount time. Turning off dquot reclaim doesn't change that.
> >
>
> It's not intended to. The inefficient design of quotacheck wrt memory
> usage is a separate problem. AFAICT, that problem goes back as far as my
> git tree does (2.6.xx).
>
> Looking back through the git history, this deadlock appears to be a
> regression as of commit 43ff2122e6 ("xfs: on-stack delayed write buffer
> lists") in 2012, which replaced a trylock/push sequence from the
> quotacheck flush walk with a synchronous flush lock.
Right - that's the commit that changed the code to what it is now.
That's what I implied needs fixing....
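
To make that failure mode concrete, here is a minimal userspace sketch
(my own model, not the kernel code; the structures and helpers are
invented for illustration) of the two rules that interact badly: a
buffer can sit on at most one delwri queue, and a dquot's flush lock is
only dropped when its backing buffer's I/O completes:

```c
#include <stdbool.h>

/* Illustrative userspace model only -- not the real XFS structures. */
struct buf {
	bool on_delwri_queue;	/* a buffer can be on at most one delwri queue */
};

struct dquot {
	bool flush_locked;	/* dropped only at buffer I/O completion */
	struct buf *bp;
};

/* Reclaim flushes the dquot, then tries to queue its backing buffer. */
void reclaim_flush(struct dquot *dq)
{
	dq->flush_locked = true;	/* the flush itself succeeds */
	if (dq->bp->on_delwri_queue) {
		/*
		 * Queueing fails: the buffer is already on quotacheck's
		 * queue, so the flush unlock is deferred to I/O that
		 * quotacheck has not yet submitted.
		 */
		return;
	}
	dq->bp->on_delwri_queue = true;
}

/* True if xfs_qm_flush_one() would block forever on the flush lock. */
bool quotacheck_deadlocks(void)
{
	struct buf bp = { .on_delwri_queue = true };	/* queued by quotacheck */
	struct dquot dq = { .flush_locked = false, .bp = &bp };

	reclaim_flush(&dq);
	/*
	 * Flush lock held, but the only unlock path is I/O completion of
	 * a buffer still sitting on quotacheck's unsubmitted queue.
	 */
	return dq.flush_locked && bp.on_delwri_queue;
}
```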
> > Address the root cause - the buffer list is never flushed and so
> > pins the memory quotacheck requires, so we need to flush the buffer
> > list more often. We also need to submit the buffer list before the
> > flush walks begin, thereby unlocking all the dquots before doing the
> > flush walk and avoiding the deadlock.
> >
>
> I acknowledge that this patch doesn't address the root problem, but
> neither does this proposal. The root cause is not that quotacheck uses
> too much memory, it's that the serialization between quotacheck and
> dquot reclaim is bogus.
>
> For example, if we drop the buffer list being held across the adjust and
> flush walks, we do indeed unpin the buffers and dquots for memory
> reclaim, particularly during the dqadjust bulkstat. The flush walk then
> still has to cycle the flush lock of every dquot to guarantee we wait on
> dquots being flushed by reclaim (otherwise we risk corruption with the
> xfs_qm_flush_one() logic as it is today). The same deadlock vector still
> exists, albeit with a smaller window, if reclaim happens to flush a
> dquot that is backed by a buffer that has already been queued by the
> flush walk. AFAICT, the only requirements for this to occur are a couple
> of dquots on a single backing buffer and some memory pressure from
> somewhere, so the local memory usage is irrelevant.
Except that the window is *tiny* because the dquots are walked in
order and so all the dquots are processed in ascending order. This
means all the dquots in a buffer are flushed sequentially, and each
dquot lookup resets the reclaim reference count on the dquot
buffer. Hence it is extremely unlikely that a dquot buffer will be
reclaimed while it is partially flushed during the flush walk.
> I'm specifically trying to address the deadlock problem here. If we
> don't want to bypass reclaim, the other options I see are to rework the
> logic on the reclaim side to check the buffer state before we flush the
> dquot, or push on the buffer from the quotacheck side like we used to
> before the commit above.
The latter is effectively what flushing the buffer list before the
flush walk is run does.
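
That is, in rough userspace-sketch form (invented names, not the kernel
API): once the buffer list has been submitted and its I/O has completed,
every flush lock has been dropped, so the flush walk cannot block on a
lock that only reclaim's deferred I/O completion could release:

```c
#include <stdbool.h>

/* Illustrative userspace model only; helper names are invented. */
struct buf { bool on_delwri_queue; };
struct dquot { bool flush_locked; struct buf *bp; };

/*
 * Submitting the delwri queue completes the buffer I/O -- the point
 * at which any held flush lock is dropped.
 */
void delwri_submit(struct dquot *dq)
{
	dq->bp->on_delwri_queue = false;	/* I/O completes */
	dq->flush_locked = false;		/* flush unlock at completion */
}

/* The flush walk only succeeds if the flush lock is free. */
bool flush_one(struct dquot *dq)
{
	if (dq->flush_locked)
		return false;			/* would wait forever: the deadlock */
	dq->flush_locked = true;		/* flush, requeue the buffer */
	dq->bp->on_delwri_queue = true;
	return true;
}

/*
 * Start from the bad state (reclaim already flushed the dquot while its
 * buffer sat on quotacheck's queue), submit the list first, then walk.
 */
bool fix_avoids_deadlock(void)
{
	struct buf bp = { .on_delwri_queue = true };
	struct dquot dq = { .flush_locked = true, .bp = &bp };

	delwri_submit(&dq);		/* flush the list before the walk */
	return flush_one(&dq);
}
```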
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com