linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	Ross Zwisler <ross.zwisler@intel.com>
Subject: Re: dax pmd fault handler never returns to userspace
Date: Fri, 20 Nov 2015 09:34:58 +1100
Message-ID: <20151119223458.GE19199@dastard>
In-Reply-To: <CAPcyv4i9o4Uznpi3z=FUGZJ14GVnM6dWxyXbgi-1v1YPo=jKqg@mail.gmail.com>

On Wed, Nov 18, 2015 at 10:58:29AM -0800, Dan Williams wrote:
> On Wed, Nov 18, 2015 at 10:53 AM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > On Wed, Nov 18, 2015 at 01:32:46PM -0500, Jeff Moyer wrote:
> >> Ross Zwisler <ross.zwisler@linux.intel.com> writes:
> >>
> >> > Yea, my first round of testing was broken, sorry about that.
> >> >
> >> > It looks like this test causes the PMD fault handler to be called repeatedly
> >> > over and over until you kill the userspace process.  This doesn't happen for
> >> > XFS because when using XFS this test doesn't hit PMD faults, only PTE faults.
> >>
> >> Hmm, I wonder why not?
> >
> > Well, whether or not you get PMDs is dependent on the block allocator for the
> > filesystem.  We ask the FS how much space is contiguous via get_blocks(), and
> > if it's less than PMD_SIZE (2 MiB) we fall back to the regular 4k page fault
> > path.   This code all lives in __dax_pmd_fault().  There are also a bunch of
> > other reasons why we'd fall back to 4k faults - the virtual address isn't 2
> > MiB aligned, etc.   It's actually pretty hard to get everything right so you
> > actually get PMD faults.
> >
> > Anyway, my guess is that we're failing to meet one of our criteria in XFS, so
> > we just always fall back to PTEs for this test.
> >
> >> Sounds like that will need investigating as well, right?
> >
> > Yep, on it.
> 
> XFS can do pmd faults just fine, you just need to use fiemap to find a
> 2MiB aligned physical offset.  See the ndctl pmd test I posted.

This comes under the topic of "XFS and Storage Alignment 101".
There's nothing new here; it's just like aligning your filesystem
to RAID5/6 geometries for optimal sequential IO patterns:

# mkfs.xfs -f -d su=2m,sw=1 /dev/pmem0
....
# mount /dev/pmem0 /mnt/xfs
# xfs_io -c "extsize 2m" /mnt/xfs

And now XFS will allocate stripe unit (2MB) aligned extents of 2MB
in all files created in that filesystem. Now all you have to care
about is correctly aligning the base address of /dev/pmem0 to 2MB so
that all the stripe units (and hence file extent allocations) are
correctly aligned to the page tables.
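
The alignment requirement itself is plain arithmetic, so a rough
sketch may help when checking a device base address or an extent's
physical offset by hand. This is only an illustration, not something
from the thread: the helper names is_pmd_aligned and round_up_pmd are
made up, and where the byte value comes from (sysfs, fiemap output,
and so on) is deliberately left out.

```shell
#!/bin/sh
# 2 MiB in bytes: the PMD mapping size discussed in this thread.
PMD_SIZE=$((2 * 1024 * 1024))

# is_pmd_aligned BYTES  (hypothetical helper)
# Succeeds (exit status 0) when BYTES is a whole multiple of 2 MiB.
# BYTES may be decimal or 0x-prefixed hex.
is_pmd_aligned() {
    [ $(( $1 % PMD_SIZE )) -eq 0 ]
}

# round_up_pmd BYTES  (hypothetical helper)
# Prints the smallest multiple of 2 MiB that is >= BYTES.
round_up_pmd() {
    echo $(( ($1 + PMD_SIZE - 1) / PMD_SIZE * PMD_SIZE ))
}
```

For example, "is_pmd_aligned 0x100000000" succeeds, since 4 GiB is a
multiple of 2 MiB, while "is_pmd_aligned 0x100100000" fails because
that address sits 1 MiB past a 2 MiB boundary.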

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
