From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx2.suse.de ([195.135.220.15]:58005 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753446AbcJ1IMN (ORCPT ); Fri, 28 Oct 2016 04:12:13 -0400
Date: Fri, 28 Oct 2016 10:12:05 +0200
From: Jan Kara
To: Dan Williams
Cc: Jan Kara, Ross Zwisler, linux-fsdevel, "linux-nvdimm@lists.01.org"
Subject: Re: Infinite loop with DAX PMD faults
Message-ID: <20161028081205.GC30952@quack2.suse.cz>
References: <20161027190750.GA28888@quack2.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Thu 27-10-16 12:46:32, Dan Williams wrote:
> On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara wrote:
> > Hello,
> >
> > When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> > come across the following issue with the generic/344 test from xfstests.
> > The test ends in an infinite fault loop where we fault index 0 over and
> > over again, never finishing the fault. The problem is that we do a write
> > fault for index 0 when there is a PMD for that index, so we enter
> > wp_huge_pmd(). For whatever reason that returns VM_FAULT_FALLBACK, so we
> > continue to handle_pte_fault(). There we do the
> >
> > 	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> >
> > check, which is true - the PMD we have is pmd_trans_huge() - so we
> > 'return 0', and that results in retrying the fault, and everything
> > happens from the beginning again.
> >
> > It isn't quite obvious to me how to break that cycle. The comment before
> > pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> > possible races when the PMD is pmd_trans_huge(), so it needs careful
> > evaluation what needs to be done for DAX. Ross, any idea?
>
> Can you bisect it with CONFIG_BROKEN removed from older kernels?

I can try (but likely won't get to it before Kernel Summit, and I'm not
sure I'll have time for it there).
> I remember tracking down something like this when initially doing the
> pmd support. It ended up being a missed pmd_devmap() check in the
> fault path, so it may not be the same issue. It would at least be
> interesting to see if 4.6 fails in a similar manner with this test and
> FS_DAX_PMD enabled.

BTW, the results of the checks for the PMD are:

	pmd_devmap(*vmf->pmd) == 0
	pmd_trans_huge(*vmf->pmd) == 1
	pmd_bad(*vmf->pmd) == 1

I'll see if I can get any meaningful test running based on 4.6...

								Honza
-- 
Jan Kara
SUSE Labs, CR