public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Wengang Wang <wen.gang.wang@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: don't change to infinate lock to avoid dead lock
Date: Sat, 25 Apr 2020 07:37:29 +1000	[thread overview]
Message-ID: <20200424213729.GC2040@dread.disaster.area> (raw)
In-Reply-To: <676ecd15-d8ea-0e18-6075-3cb11f8c2e15@oracle.com>

On Fri, Apr 24, 2020 at 09:58:09AM -0700, Wengang Wang wrote:
> On 4/23/20 6:39 PM, Dave Chinner wrote:
> > On Thu, Apr 23, 2020 at 04:19:52PM -0700, Wengang Wang wrote:
> > > On 4/23/20 4:14 PM, Wengang Wang wrote:
> > > > The real case I hit is that process A is waiting for inode unpin on
> > > > XFS A, which is a loop-device-backed mount.
> > > And actually, there is dm-thin on top of the loop device...
> > Makes no difference, really, because it's still the loop device
> > that is doing the IO to the underlying filesystem...
> I mentioned the IO path here, not the IO itself.  In this case, the IO
> path includes dm-thin.
> 
> We have to consider it as long as we are not sure whether a GFP_KERNEL
> allocation (or one with any flags lacking NOFS/NOIO) happens in dm-thin.
> 
> If dm-thin does a GFP_KERNEL allocation and goes into direct memory
> reclaim, the deadlock forms.

If that happens, then that is a bug in dm-thin, not a bug in XFS.
There are rules to how memory allocation must be done to avoid
deadlocks, and one of those is that block device level IO path
allocations *must* use GFP_NOIO. This prevents reclaim from
recursing into subsystems that might require IO to reclaim memory
and hence self deadlock because the IO layer requires allocation to
succeed to make forwards progress.

That's why we have mempools and GFP_NOIO at the block and device
layers....


> > > > And the backing file is from a different (X)FS B mount. So the IO is
> > > > going through loop device, (direct) writes to (X)FS B.
> > > > 
> > > > The (direct) writes to (X)FS B do memory allocations and then enter
> > > > direct memory reclaim...
> > The loop device issues IO to the lower filesystem in
> > memalloc_noio_save() context, which means all memory allocations in
> > its IO path are done with GFP_NOIO context. Hence those allocations
> > will not recurse into reclaim on -any filesystem- and hence will not
> > deadlock on filesystem reclaim. So what I said originally is correct
> > even when we take filesystems stacked via loop devices into account.
> You are right here. It seems the loop device is doing NOFS|NOIO allocations.
> 
> The deadlock happened on a somewhat older kernel version that lacks the
> loop device patch doing NOFS|NOIO allocations.

Right, the loop device used to have an allocation context bug, but
that has been fixed. Either way, this is not an XFS or even a
filesystem layer issue.

> Well, here you are only talking about the loop device; that's not enough
> to say it's also safe when memory reclaim happens at a higher layer,
> above the loop device in the IO path.

Yes it is.

Block devices and device drivers are *required* to use GFP_NOIO
context for memory allocations in the IO path. IOWs, any block
device that is doing GFP_KERNEL context allocation violates the
memory allocation rules we have for the IO path.  This architectural
constraint exists exclusively to avoid this entire class of IO-based
memory reclaim recursion deadlocks.

> > Hence I'll ask again: do you have stack traces of the deadlock or a
> > lockdep report? If not, can you please describe the storage setup
> > from top to bottom and lay out exactly where in what layers trigger
> > this deadlock?
> 
> Sharing the call traces:

<snip>

Yeah, so the loop device is doing GFP_KERNEL allocation in a
GFP_NOIO context. You need to fix the loop device in whatever kernel
you are testing, which you have conveniently never mentioned. I'm
betting this is a vendor kernel that is missing fixes from the
upstream kernel. In which case you need to talk to your OS vendor,
not upstream...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2020-04-24 21:37 UTC|newest]

Thread overview: 8+ messages
2020-04-23 17:23 [PATCH] xfs: don't change to infinate lock to avoid dead lock Wengang Wang
2020-04-23 23:05 ` Dave Chinner
2020-04-23 23:14   ` Wengang Wang
2020-04-23 23:19     ` Wengang Wang
2020-04-24  1:39       ` Dave Chinner
2020-04-24 16:58         ` Wengang Wang
2020-04-24 21:37           ` Dave Chinner [this message]
2020-04-24 21:45             ` Wengang Wang
