linux-nvdimm.lists.01.org archive mirror
* Infinite loop with DAX PMD faults
@ 2016-10-27 19:07 Jan Kara
  2016-10-27 19:46 ` Dan Williams
  2016-10-27 19:54 ` Ross Zwisler
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Kara @ 2016-10-27 19:07 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: linux-fsdevel, linux-nvdimm

Hello,

When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
come across the following issue with the generic/344 test from xfstests.
The test ends in an infinite fault loop: we fault on index 0 over and over
again, never completing the fault. The problem is that we take a write
fault for index 0 when there is already a PMD for that index, so we enter
wp_huge_pmd(). For whatever reason that returns VM_FAULT_FALLBACK, so we
continue to handle_pte_fault(). There we do

	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))

check, which is true - the PMD we have is pmd_trans_huge() - so we
'return 0', which results in retrying the fault, and everything happens
from the beginning again.

It isn't quite obvious to me how to break that cycle. The comment before
pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining the
possible races when the PMD is pmd_trans_huge(), so what needs to be done
for DAX requires careful evaluation. Ross, any idea?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Infinite loop with DAX PMD faults
  2016-10-27 19:07 Infinite loop with DAX PMD faults Jan Kara
@ 2016-10-27 19:46 ` Dan Williams
  2016-10-27 21:03   ` Ross Zwisler
  2016-10-28  8:12   ` Jan Kara
  2016-10-27 19:54 ` Ross Zwisler
  1 sibling, 2 replies; 11+ messages in thread
From: Dan Williams @ 2016-10-27 19:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-nvdimm@lists.01.org

On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> come across the following issue with generic/344 test from xfstests. The
> test ends in an infinite fault loop when we fault index 0 over and over
> again never finishing the fault. The problem is that we do a write fault
> for index 0 when there is PMD for that index. So we enter wp_huge_pmd().
> For whatever reason that returns VM_FAULT_FALLBACK so we continue to
> handle_pte_fault(). There we do
>
>         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
>
> check which is true - the PMD we have is pmd_trans_huge() - so we 'return
> 0' and that results in retrying the fault and all happens from the
> beginning again.
>
> It isn't quite obvious how to break that cycle to me. The comment before
> pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> possible races when PMD is pmd_trans_huge() so it needs careful evaluation
> what needs to be done for DAX. Ross, any idea?

Can you bisect it with CONFIG_BROKEN removed from older kernels?

I remember tracking down something like this when initially doing the
pmd support.  It ended up being a missed pmd_devmap() check in the
fault path, so it may not be the same issue.  It would at least be
interesting to see if 4.6 fails in a similar manner with this test and
FS_DAX_PMD enabled.

* Re: Infinite loop with DAX PMD faults
  2016-10-27 19:07 Infinite loop with DAX PMD faults Jan Kara
  2016-10-27 19:46 ` Dan Williams
@ 2016-10-27 19:54 ` Ross Zwisler
  2016-10-28  8:02   ` Jan Kara
  1 sibling, 1 reply; 11+ messages in thread
From: Ross Zwisler @ 2016-10-27 19:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-nvdimm

On Thu, Oct 27, 2016 at 09:07:50PM +0200, Jan Kara wrote:
> Hello,
> 
> When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> come across the following issue with generic/344 test from xfstests. The
> test ends in an infinite fault loop when we fault index 0 over and over
> again never finishing the fault. The problem is that we do a write fault
> for index 0 when there is PMD for that index. So we enter wp_huge_pmd().
> For whatever reason that returns VM_FAULT_FALLBACK so we continue to
> handle_pte_fault(). There we do
> 
> 	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> 
> check which is true - the PMD we have is pmd_trans_huge() - so we 'return
> 0' and that results in retrying the fault and all happens from the
> beginning again.
> 
> It isn't quite obvious how to break that cycle to me. The comment before
> pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> possible races when PMD is pmd_trans_huge() so it needs careful evaluation
> what needs to be done for DAX. Ross, any idea?

I'll try & reproduce this, and I'll get back to you.

* Re: Infinite loop with DAX PMD faults
  2016-10-27 19:46 ` Dan Williams
@ 2016-10-27 21:03   ` Ross Zwisler
  2016-10-27 21:48     ` Kani, Toshimitsu
  2016-10-28  8:12   ` Jan Kara
  1 sibling, 1 reply; 11+ messages in thread
From: Ross Zwisler @ 2016-10-27 21:03 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-fsdevel, Jan Kara, linux-nvdimm@lists.01.org

On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> > come across the following issue with generic/344 test from xfstests. The
> > test ends in an infinite fault loop when we fault index 0 over and over
> > again never finishing the fault. The problem is that we do a write fault
> > for index 0 when there is PMD for that index. So we enter wp_huge_pmd().
> > For whatever reason that returns VM_FAULT_FALLBACK so we continue to
> > handle_pte_fault(). There we do
> >
> >         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> >
> > check which is true - the PMD we have is pmd_trans_huge() - so we 'return
> > 0' and that results in retrying the fault and all happens from the
> > beginning again.
> >
> > It isn't quite obvious how to break that cycle to me. The comment before
> > pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> > possible races when PMD is pmd_trans_huge() so it needs careful evaluation
> > what needs to be done for DAX. Ross, any idea?
> 
> Can you bisect it with CONFIG_BROKEN removed from older kernels?
> 
> I remember tracking down something like this when initially doing the
> pmd support.  It ended up being a missed pmd_devmap() check in the
> fault path, so it may not be the same issue.  It would at least be
> interesting to see if 4.6 fails in a similar manner with this test and
> FS_DAX_PMD enabled.

I've been able to reproduce this with my v4.9-rc2 branch, but it doesn't
reproduce with the old v4.6 kernel.

My guess is that this might be because in the old v4.6 kernel, PMD faults
don't actually work most of the time, because most users don't pass a
2MiB-aligned address to mmap().  This was fixed by Toshi's patches:

dbe6ec8 ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings
74d2fad thp, dax: add thp_get_unmapped_area for pmd mappings

Anyway, I'm off to try and understand this failure more deeply.

* Re: Infinite loop with DAX PMD faults
  2016-10-27 21:03   ` Ross Zwisler
@ 2016-10-27 21:48     ` Kani, Toshimitsu
  2016-10-28  4:13       ` Ross Zwisler
  0 siblings, 1 reply; 11+ messages in thread
From: Kani, Toshimitsu @ 2016-10-27 21:48 UTC (permalink / raw)
  To: dan.j.williams@intel.com, ross.zwisler@linux.intel.com
  Cc: linux-fsdevel@vger.kernel.org, jack@suse.cz,
	linux-nvdimm@lists.01.org

On Thu, 2016-10-27 at 15:03 -0600, Ross Zwisler wrote:
> On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> > 
> > On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> > > 
> > > Hello,
> > > 
> > > When testing my DAX patches rebased on top of Ross' DAX PMD
> > > series, I've come across the following issue with generic/344
> > > test from xfstests. The test ends in an infinite fault loop when
> > > we fault index 0 over and over again never finishing the fault.
> > > The problem is that we do a write fault
> > > for index 0 when there is PMD for that index. So we enter
> > > wp_huge_pmd(). For whatever reason that returns VM_FAULT_FALLBACK
> > > so we continue to handle_pte_fault(). There we do
> > > 
> > >         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf-
> > > >pmd))
> > > 
> > > check which is true - the PMD we have is pmd_trans_huge() - so we
> > > 'return 0' and that results in retrying the fault and all happens
> > > from the beginning again.
> > > 
> > > It isn't quite obvious how to break that cycle to me. The comment
> > > before pmd_none_or_trans_huge_or_clear_bad() goes to great
> > > lengths explaining possible races when PMD is pmd_trans_huge() so
> > > it needs careful evaluation what needs to be done for DAX. Ross,
> > > any idea?
> > 
> > Can you bisect it with CONFIG_BROKEN removed from older kernels?
> > 
> > I remember tracking down something like this when initially doing
> > the pmd support.  It ended up being a missed pmd_devmap() check in
> > the fault path, so it may not be the same issue.  It would at least
> > be interesting to see if 4.6 fails in a similar manner with this
> > test and FS_DAX_PMD enabled.
> 
> I've been able to reproduce this with my v4.9-rc2 branch, but it
> doesn't reproduce with the old v4.6 kernel.

Not sure if it's relevant, but FYI, I fixed a similar issue before.

commit 59bf4fb9d386601cbaa70a9b00159abb846dedaa
dax: Split pmd map when fallback on COW

-Toshi

* Re: Infinite loop with DAX PMD faults
  2016-10-27 21:48     ` Kani, Toshimitsu
@ 2016-10-28  4:13       ` Ross Zwisler
  2016-10-28  8:17         ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Ross Zwisler @ 2016-10-28  4:13 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-fsdevel@vger.kernel.org, jack@suse.cz,
	linux-nvdimm@lists.01.org

On Thu, Oct 27, 2016 at 09:48:41PM +0000, Kani, Toshimitsu wrote:
> On Thu, 2016-10-27 at 15:03 -0600, Ross Zwisler wrote:
> > On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> > > 
> > > On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> > > > 
> > > > Hello,
> > > > 
> > > > When testing my DAX patches rebased on top of Ross' DAX PMD
> > > > series, I've come across the following issue with generic/344
> > > > test from xfstests. The test ends in an infinite fault loop when
> > > > we fault index 0 over and over again never finishing the fault.
> > > > The problem is that we do a write fault
> > > > for index 0 when there is PMD for that index. So we enter
> > > > wp_huge_pmd(). For whatever reason that returns VM_FAULT_FALLBACK
> > > > so we continue to handle_pte_fault(). There we do
> > > > 
> > > >         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf-
> > > > >pmd))
> > > > 
> > > > check which is true - the PMD we have is pmd_trans_huge() - so we
> > > > 'return 0' and that results in retrying the fault and all happens
> > > > from the beginning again.
> > > > 
> > > > It isn't quite obvious how to break that cycle to me. The comment
> > > > before pmd_none_or_trans_huge_or_clear_bad() goes to great
> > > > lengths explaining possible races when PMD is pmd_trans_huge() so
> > > > it needs careful evaluation what needs to be done for DAX. Ross,
> > > > any idea?
> > > 
> > > Can you bisect it with CONFIG_BROKEN removed from older kernels?
> > > 
> > > I remember tracking down something like this when initially doing
> > > the pmd support.  It ended up being a missed pmd_devmap() check in
> > > the fault path, so it may not be the same issue.  It would at least
> > > be interesting to see if 4.6 fails in a similar manner with this
> > > test and FS_DAX_PMD enabled.
> > 
> > I've been able to reproduce this with my v4.9-rc2 branch, but it
> > doesn't reproduce with the old v4.6 kernel.
> 
> Not sure if it's relevant, but as FYI I fixed a similar issue before.
> 
> commit 59bf4fb9d386601cbaa70a9b00159abb846dedaa
> dax: Split pmd map when fallback on COW
> 
> -Toshi

Thanks!  Applying a similar patch solves this deadlock.  Unfortunately I don't
(yet?) understand this well enough to say whether this is the correct
solution, but it makes generic/344 + PMDs pass. :)

Does anyone with more mm knowledge have time to review?

* Re: Infinite loop with DAX PMD faults
  2016-10-27 19:54 ` Ross Zwisler
@ 2016-10-28  8:02   ` Jan Kara
  2016-10-28 15:35     ` Ross Zwisler
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2016-10-28  8:02 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: linux-fsdevel, Jan Kara, linux-nvdimm

On Thu 27-10-16 13:54:49, Ross Zwisler wrote:
> On Thu, Oct 27, 2016 at 09:07:50PM +0200, Jan Kara wrote:
> > When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> > come across the following issue with generic/344 test from xfstests. The
> > test ends in an infinite fault loop when we fault index 0 over and over
> > again never finishing the fault. The problem is that we do a write fault
> > for index 0 when there is PMD for that index. So we enter wp_huge_pmd().
> > For whatever reason that returns VM_FAULT_FALLBACK so we continue to
> > handle_pte_fault(). There we do
> > 
> > 	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> > 
> > check which is true - the PMD we have is pmd_trans_huge() - so we 'return
> > 0' and that results in retrying the fault and all happens from the
> > beginning again.
> > 
> > It isn't quite obvious how to break that cycle to me. The comment before
> > pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> > possible races when PMD is pmd_trans_huge() so it needs careful evaluation
> > what needs to be done for DAX. Ross, any idea?
> 
> I'll try & reproduce this, and I'll get back to you.

For me it happened with ext4, which returned VM_FAULT_FALLBACK from its
pmd_fault handler on a write fault (likely we were not able to allocate a
sufficiently large contiguous hunk). So I'm not sure you will be able to
reproduce it easily with just your series. However, tweaking XFS to return
VM_FAULT_FALLBACK when FAULT_FLAG_WRITE is set should do the trick.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: Infinite loop with DAX PMD faults
  2016-10-27 19:46 ` Dan Williams
  2016-10-27 21:03   ` Ross Zwisler
@ 2016-10-28  8:12   ` Jan Kara
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Kara @ 2016-10-28  8:12 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-fsdevel, Jan Kara, linux-nvdimm@lists.01.org

On Thu 27-10-16 12:46:32, Dan Williams wrote:
> On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> > come across the following issue with generic/344 test from xfstests. The
> > test ends in an infinite fault loop when we fault index 0 over and over
> > again never finishing the fault. The problem is that we do a write fault
> > for index 0 when there is PMD for that index. So we enter wp_huge_pmd().
> > For whatever reason that returns VM_FAULT_FALLBACK so we continue to
> > handle_pte_fault(). There we do
> >
> >         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> >
> > check which is true - the PMD we have is pmd_trans_huge() - so we 'return
> > 0' and that results in retrying the fault and all happens from the
> > beginning again.
> >
> > It isn't quite obvious how to break that cycle to me. The comment before
> > pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> > possible races when PMD is pmd_trans_huge() so it needs careful evaluation
> > what needs to be done for DAX. Ross, any idea?
> 
> Can you bisect it with CONFIG_BROKEN removed from older kernels?

I can try (but likely won't get to it before Kernel Summit, not sure if
I'll have time for that there).

> I remember tracking down something like this when initially doing the
> pmd support.  It ended up being a missed pmd_devmap() check in the
> fault path, so it may not be the same issue.  It would at least be
> interesting to see if 4.6 fails in a similar manner with this test and
> FS_DAX_PMD enabled.

BTW, the results of checks for the PMD are:

pmd_devmap(*vmf->pmd) == 0
pmd_trans_huge(*vmf->pmd) == 1
pmd_bad(*vmf->pmd) == 1

I'll see if I can get any meaningful test running based on 4.6...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: Infinite loop with DAX PMD faults
  2016-10-28  4:13       ` Ross Zwisler
@ 2016-10-28  8:17         ` Jan Kara
  2016-10-28 13:51           ` Kani, Toshimitsu
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2016-10-28  8:17 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-fsdevel@vger.kernel.org, jack@suse.cz,
	linux-nvdimm@lists.01.org

On Thu 27-10-16 22:13:00, Ross Zwisler wrote:
> On Thu, Oct 27, 2016 at 09:48:41PM +0000, Kani, Toshimitsu wrote:
> > On Thu, 2016-10-27 at 15:03 -0600, Ross Zwisler wrote:
> > > On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> > > > 
> > > > On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > When testing my DAX patches rebased on top of Ross' DAX PMD
> > > > > series, I've come across the following issue with generic/344
> > > > > test from xfstests. The test ends in an infinite fault loop when
> > > > > we fault index 0 over and over again never finishing the fault.
> > > > > The problem is that we do a write fault
> > > > > for index 0 when there is PMD for that index. So we enter
> > > > > wp_huge_pmd(). For whatever reason that returns VM_FAULT_FALLBACK
> > > > > so we continue to handle_pte_fault(). There we do
> > > > > 
> > > > >         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf-
> > > > > >pmd))
> > > > > 
> > > > > check which is true - the PMD we have is pmd_trans_huge() - so we
> > > > > 'return 0' and that results in retrying the fault and all happens
> > > > > from the beginning again.
> > > > > 
> > > > > It isn't quite obvious how to break that cycle to me. The comment
> > > > > before pmd_none_or_trans_huge_or_clear_bad() goes to great
> > > > > lengths explaining possible races when PMD is pmd_trans_huge() so
> > > > > it needs careful evaluation what needs to be done for DAX. Ross,
> > > > > any idea?
> > > > 
> > > > Can you bisect it with CONFIG_BROKEN removed from older kernels?
> > > > 
> > > > I remember tracking down something like this when initially doing
> > > > the pmd support.  It ended up being a missed pmd_devmap() check in
> > > > the fault path, so it may not be the same issue.  It would at least
> > > > be interesting to see if 4.6 fails in a similar manner with this
> > > > test and FS_DAX_PMD enabled.
> > > 
> > > I've been able to reproduce this with my v4.9-rc2 branch, but it
> > > doesn't reproduce with the old v4.6 kernel.
> > 
> > Not sure if it's relevant, but as FYI I fixed a similar issue before.
> > 
> > commit 59bf4fb9d386601cbaa70a9b00159abb846dedaa
> > dax: Split pmd map when fallback on COW
> > 
> > -Toshi
> 
> Thanks!  Applying a similar patch solves this deadlock.  Unfortunately I don't
> (yet?) understand this well enough to say whether this is the correct
> solution, but it makes generic/344 + PMDs pass. :)
> 
> Does anyone with more mm knowledge have time to review?

I'm not really much into huge pages, but AFAICT that should fix the
problem. I'm just not sure whether we need something similar in the other
cases where we return VM_FAULT_FALLBACK. Probably this will need some
experiments ;).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: Infinite loop with DAX PMD faults
  2016-10-28  8:17         ` Jan Kara
@ 2016-10-28 13:51           ` Kani, Toshimitsu
  0 siblings, 0 replies; 11+ messages in thread
From: Kani, Toshimitsu @ 2016-10-28 13:51 UTC (permalink / raw)
  To: ross.zwisler@linux.intel.com, jack@suse.cz
  Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org

On Fri, 2016-10-28 at 10:17 +0200, Jan Kara wrote:
> On Thu 27-10-16 22:13:00, Ross Zwisler wrote:
> > 
> > On Thu, Oct 27, 2016 at 09:48:41PM +0000, Kani, Toshimitsu wrote:
> > > 
> > > On Thu, 2016-10-27 at 15:03 -0600, Ross Zwisler wrote:
> > > > 
> > > > On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> > > > > 
> > > > > 
> > > > > On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz>
> > > > > wrote:
> > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > When testing my DAX patches rebased on top of Ross' DAX PMD
> > > > > > series, I've come across the following issue with
> > > > > > generic/344 test from xfstests. The test ends in an
> > > > > > infinite fault loop when we fault index 0 over and over
> > > > > > again never finishing the fault. The problem is that we do
> > > > > > a write fault for index 0 when there is PMD for that index.
> > > > > > So we enter wp_huge_pmd(). For whatever reason that returns
> > > > > > VM_FAULT_FALLBACK so we continue to handle_pte_fault().
> > > > > > There we do
> > > > > > 
> > > > > >         if (pmd_trans_unstable(vmf->pmd) ||
> > > > > > pmd_devmap(*vmf-
> > > > > > > 
> > > > > > > pmd))
> > > > > > 
> > > > > > check which is true - the PMD we have is pmd_trans_huge() -
> > > > > > so we 'return 0' and that results in retrying the fault and
> > > > > > all happens from the beginning again.
> > > > > > 
> > > > > > It isn't quite obvious how to break that cycle to me. The
> > > > > > comment before pmd_none_or_trans_huge_or_clear_bad() goes
> > > > > > to great lengths explaining possible races when PMD is
> > > > > > pmd_trans_huge() so it needs careful evaluation what needs
> > > > > > to be done for DAX. Ross, any idea?
> > > > > 
> > > > > Can you bisect it with CONFIG_BROKEN removed from older
> > > > > kernels?
> > > > > 
> > > > > I remember tracking down something like this when initially
> > > > > doing the pmd support.  It ended up being a missed
> > > > > pmd_devmap() check in the fault path, so it may not be the
> > > > > same issue.  It would at least be interesting to see if 4.6
> > > > > fails in a similar manner with this test and FS_DAX_PMD
> > > > > enabled.
> > > > 
> > > > I've been able to reproduce this with my v4.9-rc2 branch, but
> > > > it doesn't reproduce with the old v4.6 kernel.
> > > 
> > > Not sure if it's relevant, but as FYI I fixed a similar issue
> > > before.
> > > 
> > > commit 59bf4fb9d386601cbaa70a9b00159abb846dedaa
> > > dax: Split pmd map when fallback on COW
> > > 
> > > -Toshi
> > 
> > Thanks!  Applying a similar patch solves this
> > deadlock.  Unfortunately I don't (yet?) understand this well enough
> > to say whether this is the correct solution, but it makes
> > generic/344 + PMDs pass. :)
> > 
> > Does anyone with more mm knowledge have time to review?
> 
> I'm not really much into huge pages but AFAICT that should fix the
> problem. I'm just not sure whether in other cases when we return
> VM_FAULT_FALLBACK we don't need something similar. Probably this will
> need some experiments
> ;).

Good to know it worked. :-) I think we need to split a pmd mapping in
the case of COW fallback. The pte handler may not proceed while a pmd
mapping is still in place.

Thanks,
-Toshi


* Re: Infinite loop with DAX PMD faults
  2016-10-28  8:02   ` Jan Kara
@ 2016-10-28 15:35     ` Ross Zwisler
  0 siblings, 0 replies; 11+ messages in thread
From: Ross Zwisler @ 2016-10-28 15:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-nvdimm

On Fri, Oct 28, 2016 at 10:02:36AM +0200, Jan Kara wrote:
> On Thu 27-10-16 13:54:49, Ross Zwisler wrote:
> > On Thu, Oct 27, 2016 at 09:07:50PM +0200, Jan Kara wrote:
> > > When testing my DAX patches rebased on top of Ross' DAX PMD series, I've
> > > come across the following issue with generic/344 test from xfstests. The
> > > test ends in an infinite fault loop when we fault index 0 over and over
> > > again never finishing the fault. The problem is that we do a write fault
> > > for index 0 when there is PMD for that index. So we enter wp_huge_pmd().
> > > For whatever reason that returns VM_FAULT_FALLBACK so we continue to
> > > handle_pte_fault(). There we do
> > > 
> > > 	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> > > 
> > > check which is true - the PMD we have is pmd_trans_huge() - so we 'return
> > > 0' and that results in retrying the fault and all happens from the
> > > beginning again.
> > > 
> > > It isn't quite obvious how to break that cycle to me. The comment before
> > > pmd_none_or_trans_huge_or_clear_bad() goes to great lengths explaining
> > > possible races when PMD is pmd_trans_huge() so it needs careful evaluation
> > > what needs to be done for DAX. Ross, any idea?
> > 
> > I'll try & reproduce this, and I'll get back to you.
> 
> For me it happened with ext4 which returned VM_FAULT_FALLBACK from its
> pmd_fault handler on write fault (likely we were not able to allocate
> sufficiently large contiguous hunk). So I'm not sure you will be able to
> easily reproduce just with your series. However tweaking XFS to return
> VM_FAULT_FALLBACK when FAULT_FLAG_WRITE is set should do the trick.

I was able to reproduce it with XFS, with just my series.

The fallback check that's failing for me is this one:

        if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR)

in dax_pmd_insert_mapping().

end of thread, other threads:[~2016-10-28 15:35 UTC | newest]

Thread overview: 11+ messages
2016-10-27 19:07 Infinite loop with DAX PMD faults Jan Kara
2016-10-27 19:46 ` Dan Williams
2016-10-27 21:03   ` Ross Zwisler
2016-10-27 21:48     ` Kani, Toshimitsu
2016-10-28  4:13       ` Ross Zwisler
2016-10-28  8:17         ` Jan Kara
2016-10-28 13:51           ` Kani, Toshimitsu
2016-10-28  8:12   ` Jan Kara
2016-10-27 19:54 ` Ross Zwisler
2016-10-28  8:02   ` Jan Kara
2016-10-28 15:35     ` Ross Zwisler
