From: "Kani, Toshimitsu" <toshi.kani@hpe.com>
To: "ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"jack@suse.cz" <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: Infinite loop with DAX PMD faults
Date: Fri, 28 Oct 2016 13:51:30 +0000
Message-ID: <1477662579.20881.108.camel@hpe.com>
In-Reply-To: <20161028081759.GD30952@quack2.suse.cz>

On Fri, 2016-10-28 at 10:17 +0200, Jan Kara wrote:
> On Thu 27-10-16 22:13:00, Ross Zwisler wrote:
> > 
> > On Thu, Oct 27, 2016 at 09:48:41PM +0000, Kani, Toshimitsu wrote:
> > > 
> > > On Thu, 2016-10-27 at 15:03 -0600, Ross Zwisler wrote:
> > > > 
> > > > On Thu, Oct 27, 2016 at 12:46:32PM -0700, Dan Williams wrote:
> > > > > 
> > > > > 
> > > > > > On Thu, Oct 27, 2016 at 12:07 PM, Jan Kara <jack@suse.cz> wrote:
> > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > When testing my DAX patches rebased on top of Ross' DAX PMD
> > > > > > series, I've come across the following issue with
> > > > > > generic/344 test from xfstests. The test ends in an
> > > > > > infinite fault loop when we fault index 0 over and over
> > > > > > again never finishing the fault. The problem is that we do
> > > > > > a write fault for index 0 when there is PMD for that index.
> > > > > > So we enter wp_huge_pmd(). For whatever reason that returns
> > > > > > VM_FAULT_FALLBACK so we continue to handle_pte_fault().
> > > > > > There we do
> > > > > > 
> > > > > > >         if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> > > > > > 
> > > > > > check which is true - the PMD we have is pmd_trans_huge() -
> > > > > > so we 'return 0' and that results in retrying the fault and
> > > > > > all happens from the beginning again.
> > > > > > 
> > > > > > It isn't quite obvious how to break that cycle to me. The
> > > > > > comment before pmd_none_or_trans_huge_or_clear_bad() goes
> > > > > > to great lengths explaining possible races when PMD is
> > > > > > pmd_trans_huge() so it needs careful evaluation what needs
> > > > > > to be done for DAX. Ross, any idea?
> > > > > 
> > > > > Can you bisect it with CONFIG_BROKEN removed from older
> > > > > kernels?
> > > > > 
> > > > > I remember tracking down something like this when initially
> > > > > doing the pmd support.  It ended up being a missed
> > > > > pmd_devmap() check in the fault path, so it may not be the
> > > > > same issue.  It would at least be interesting to see if 4.6
> > > > > fails in a similar manner with this test and FS_DAX_PMD
> > > > > enabled.
> > > > 
> > > > I've been able to reproduce this with my v4.9-rc2 branch, but
> > > > it doesn't reproduce with the old v4.6 kernel.
> > > 
> > > Not sure if it's relevant, but as FYI I fixed a similar issue
> > > before.
> > > 
> > > commit 59bf4fb9d386601cbaa70a9b00159abb846dedaa
> > > dax: Split pmd map when fallback on COW
> > > 
> > > -Toshi
> > 
> > Thanks!  Applying a similar patch solves this
> > deadlock.  Unfortunately I don't (yet?) understand this well enough
> > to say whether this is the correct solution, but it makes
> > generic/344 + PMDs pass. :)
> > 
> > Does anyone with more mm knowledge have time to review?
> 
> I'm not really much into huge pages but AFAICT that should fix the
> problem. I'm just not sure whether in other cases when we return
> VM_FAULT_FALLBACK we don't need something similar. Probably this will
> need some experiments ;).

Good to know it worked. :-) I think we need to split the pmd mapping in
the case of COW fallback. The pte handler cannot proceed while a pmd
mapping is still in place.
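
To make the loop concrete, here is a minimal user-space sketch in C. It
is only a toy model under stated assumptions: every toy_* name is
hypothetical and none of this is kernel code or the actual patch. It
just shows that if the huge-PMD handler falls back without splitting,
the pte handler sees a huge pmd on every attempt and asks for a retry,
whereas splitting before the fallback lets the first attempt finish.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT the kernel's types or return codes. */
enum toy_result { TOY_DONE, TOY_RETRY, TOY_FALLBACK };

struct toy_pmd {
	bool huge;	/* true while a PMD-sized mapping is installed */
};

/* Models the huge-PMD write-protect handler giving up on COW. */
static enum toy_result toy_wp_huge_pmd(struct toy_pmd *pmd,
				       bool split_on_fallback)
{
	if (split_on_fallback)
		pmd->huge = false;	/* drop the huge mapping first */
	return TOY_FALLBACK;		/* always defer to the pte path */
}

/* Models the pte handler: it bails out while the pmd is still huge. */
static enum toy_result toy_handle_pte_fault(const struct toy_pmd *pmd)
{
	return pmd->huge ? TOY_RETRY : TOY_DONE;
}

/*
 * Models repeated fault attempts on one address.
 * Returns the attempt number on which the fault completed,
 * or -1 if it never completed within max_attempts.
 */
static int toy_fault(bool split_on_fallback, int max_attempts)
{
	struct toy_pmd pmd = { .huge = true };

	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		/* Huge path first; anything but a fallback ends the fault. */
		if (pmd.huge &&
		    toy_wp_huge_pmd(&pmd, split_on_fallback) != TOY_FALLBACK)
			return attempt;
		if (toy_handle_pte_fault(&pmd) == TOY_DONE)
			return attempt;
		/* TOY_RETRY: the fault is taken again from the top. */
	}
	return -1;
}
```

In this model toy_fault(false, n) never completes for any n, while
toy_fault(true, n) completes on the first attempt. Whether other
VM_FAULT_FALLBACK paths need the same treatment is not something this
sketch can answer.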

Thanks,
-Toshi
