Re: Infinite loop with DAX PMD faults

From: "Kani, Toshimitsu" <toshi.kani@hpe.com>
To: "ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"jack@suse.cz" <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: Infinite loop with DAX PMD faults
Date: Fri, 28 Oct 2016 13:51:30 +0000
Message-ID: <1477662579.20881.108.camel@hpe.com>
In-Reply-To: <20161028081759.GD30952@quack2.suse.cz>

On Fri, 2016-10-28 at 10:17 +0200, Jan Kara wrote:
> On Thu 27-10-16 22:13:00, Ross Zwisler wrote:
> > 
> > On Thu, Oct 27, 2016 at 09:48:41PM +0000, Kani, Toshimitsu wrote:
> > > 
> > > On Thu, 2016-10-27 at 15:03 -0600, Ross Zwisler wrote:
> > > > 
> > > > On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> > > > > 
> > > > > On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz>
> > > > > wrote:
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > When testing my DAX patches rebased on top of Ross' DAX PMD
> > > > > > series, I've come across the following issue with the
> > > > > > generic/344 test from xfstests. The test ends in an
> > > > > > infinite fault loop: we fault index 0 over and over
> > > > > > again, never finishing the fault. The problem is that we do
> > > > > > a write fault for index 0 when there is a PMD for that index,
> > > > > > so we enter wp_huge_pmd(). For whatever reason that returns
> > > > > > VM_FAULT_FALLBACK, so we continue to handle_pte_fault().
> > > > > > There we do
> > > > > > 
> > > > > >         if (pmd_trans_unstable(vmf->pmd) ||
> > > > > >             pmd_devmap(*vmf->pmd))
> > > > > > 
> > > > > > check, which is true (the PMD we have is pmd_trans_huge()),
> > > > > > so we 'return 0', which results in retrying the fault, and
> > > > > > everything starts from the beginning again.
> > > > > > 
> > > > > > It isn't quite obvious to me how to break that cycle. The
> > > > > > comment before pmd_none_or_trans_huge_or_clear_bad() goes
> > > > > > to great lengths explaining possible races when the PMD is
> > > > > > pmd_trans_huge(), so what needs to be done for DAX requires
> > > > > > careful evaluation. Ross, any idea?
> > > > > 
> > > > > Can you bisect it with CONFIG_BROKEN removed from older
> > > > > kernels?
> > > > > 
> > > > > I remember tracking down something like this when initially
> > > > > implementing pmd support.  It ended up being a missed
> > > > > pmd_devmap() check in the fault path, so it may not be the
> > > > > same issue.  It would at least be interesting to see if 4.6
> > > > > fails in a similar manner with this test and FS_DAX_PMD
> > > > > enabled.
> > > > 
> > > > I've been able to reproduce this with my v4.9-rc2 branch, but
> > > > it doesn't reproduce with the old v4.6 kernel.
> > > 
> > > Not sure if it's relevant, but FYI, I fixed a similar issue
> > > before:
> > > 
> > > commit 59bf4fb9d386601cbaa70a9b00159abb846dedaa
> > > dax: Split pmd map when fallback on COW
> > > 
> > > -Toshi
> > 
> > Thanks! Applying a similar patch fixes this infinite
> > loop. Unfortunately I don't (yet?) understand this well enough
> > to say whether this is the correct solution, but it makes
> > generic/344 + PMDs pass. :)
> > 
> > Does anyone with more mm knowledge have time to review?
> 
> I'm not really much into huge pages, but AFAICT that should fix the
> problem. I'm just not sure whether we need something similar in the
> other cases where we return VM_FAULT_FALLBACK. This will probably
> need some experiments ;).

Good to know it worked. :-) I think we need to split the pmd mapping
in the case of a COW fallback. The pte handler cannot make progress
while a pmd mapping is still in place.
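To make the loop and the split-on-fallback idea concrete, here is a minimal user-space model in C. It is only a sketch under stated assumptions: model_pmd, model_wp_huge_pmd(), model_handle_pte_fault(), model_split_pmd() and model_fault() are hypothetical stand-ins, not kernel APIs, and the real fault path (locking, page-table walks, the races described before pmd_none_or_trans_huge_or_clear_bad()) is ignored. It assumes, as in the report, that the huge-PMD write handler always falls back for this COW fault and that the PTE handler bails out and retries whenever the PMD is still huge.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the DAX write-fault loop discussed in
 * this thread. None of these names exist in the kernel; locking, page-table
 * walks and races are deliberately ignored.
 */

enum fault_result { FAULT_DONE, FAULT_FALLBACK, FAULT_RETRY };

struct model_pmd {
	bool huge; /* stands in for pmd_trans_huge()/pmd_devmap() being true */
};

/* Stands in for split_huge_pmd(): afterwards the range is mapped by PTEs. */
static void model_split_pmd(struct model_pmd *pmd)
{
	pmd->huge = false;
}

/*
 * Models wp_huge_pmd() declining a COW write fault. If split_on_fallback is
 * set, the PMD is split before falling back, as suggested in the thread.
 */
static enum fault_result model_wp_huge_pmd(struct model_pmd *pmd,
					   bool split_on_fallback)
{
	if (split_on_fallback)
		model_split_pmd(pmd);
	return FAULT_FALLBACK;
}

/* Models the "return 0" bail-out in handle_pte_fault() while the PMD is huge. */
static enum fault_result model_handle_pte_fault(const struct model_pmd *pmd)
{
	return pmd->huge ? FAULT_RETRY : FAULT_DONE;
}

/*
 * Returns the attempt on which the fault completed, or -1 if it still had
 * not completed after max_attempts attempts (i.e. the loop never ends).
 */
static int model_fault(struct model_pmd *pmd, bool split_on_fallback,
		       int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (pmd->huge &&
		    model_wp_huge_pmd(pmd, split_on_fallback) != FAULT_FALLBACK)
			return attempt;
		if (model_handle_pte_fault(pmd) == FAULT_DONE)
			return attempt;
		/* FAULT_RETRY: nothing changed, so the next attempt repeats. */
	}
	return -1;
}
```

With a huge PMD and split_on_fallback false, model_fault() never completes and returns -1 once the attempt cap is hit, which is the bounded version of the generic/344 loop. With split_on_fallback true, the first attempt splits the PMD, so the PTE handler no longer bails out and the fault completes on attempt 1.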

Thanks,
-Toshi
