From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
linux-xfs@vger.kernel.org,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 2/3] xfs: consolidate the various page fault handlers
Date: Thu, 24 Aug 2017 15:29:42 -0600
Message-ID: <20170824212942.GB20489@linux.intel.com>
In-Reply-To: <20170824152207.30729-3-hch@lst.de>
On Thu, Aug 24, 2017 at 05:22:06PM +0200, Christoph Hellwig wrote:
> Add a new __xfs_filemap_fault helper that implements all four page fault
> callouts, and make these methods themselves small stubs that set the
> correct write_fault flag, and exit early for the non-DAX case for the
> hugepage related ones.
>
> Life would be so much simpler if we only had one method for all this.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
A few nits and a question below. This looks much simpler and gets rid of a
lot of duplication.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
<>
> STATIC int
> xfs_filemap_huge_fault(
> struct vm_fault *vmf,
> enum page_entry_size pe_size)
> {
> - struct inode *inode = file_inode(vmf->vma->vm_file);
> - struct xfs_inode *ip = XFS_I(inode);
> - int ret;
> -
> - if (!IS_DAX(inode))
> + if (!IS_DAX(file_inode(vmf->vma->vm_file)))
> return VM_FAULT_FALLBACK;
>
> - trace_xfs_filemap_huge_fault(ip);
> -
> - if (vmf->flags & FAULT_FLAG_WRITE) {
> - sb_start_pagefault(inode->i_sb);
> - file_update_time(vmf->vma->vm_file);
> - }
> -
> - xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> - ret = dax_iomap_fault(vmf, pe_size, NULL, &xfs_iomap_ops);
> - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> -
> - if (vmf->flags & FAULT_FLAG_WRITE)
> - sb_end_pagefault(inode->i_sb);
> + /* DAX can shortcut the normal fault path on write faults! */
Does this comment still make sense for the PMD case? Here's what I *think* it
means: For DAX write faults we make the PTE writeable in the ->fault() code
and don't rely on a follow-up ->page_mkwrite() call. This happens in
do_shared_fault(), where we return before the ->page_mkwrite() call because
the DAX write fault returns VM_FAULT_NOPAGE.
This is in contrast to the normal page fault case where the ->fault() call
ends up calling filemap_fault(), which populates the page read-only, and then
do_shared_fault() does a follow-up ->page_mkwrite() call which makes the page
writable.
First, is the above correct? :) If so, I don't think the comment makes
sense for the PMD fault path: there is no follow-up *_mkwrite() call for
PMDs, so we always make the PMD writable in the first ->huge_fault() call.
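To make the PTE control flow described above concrete, here is a minimal user-space sketch. It is a model under stated assumptions, not the kernel code: every model_* and MODEL_* name is hypothetical, the flag values are illustrative, and only the early return on NOPAGE is meant to mirror do_shared_fault().

```c
#include <stdbool.h>

/* Illustrative stand-ins for VM_FAULT_* bits; values are NOT the kernel's. */
enum { MODEL_FAULT_NOPAGE = 0x1, MODEL_FAULT_LOCKED = 0x2 };

/* Hypothetical, reduced version of the two vm_ops callbacks of interest. */
struct model_vma_ops {
	int (*fault)(bool *pte_writable);
	int (*page_mkwrite)(bool *pte_writable);
};

/* Counts how often the follow-up mkwrite step actually ran. */
static int mkwrite_calls;

/* DAX-style ->fault(): installs a writable PTE itself and reports NOPAGE. */
static int dax_style_fault(bool *pte_writable)
{
	*pte_writable = true;
	return MODEL_FAULT_NOPAGE;
}

/* Page-cache-style ->fault(): hands back a page that is still read-only. */
static int pagecache_style_fault(bool *pte_writable)
{
	*pte_writable = false;
	return MODEL_FAULT_LOCKED;
}

/* Follow-up step that makes the mapping writable. */
static int model_page_mkwrite(bool *pte_writable)
{
	mkwrite_calls++;
	*pte_writable = true;
	return MODEL_FAULT_LOCKED;
}

/* Model of a shared write fault: ->fault() first, then maybe ->page_mkwrite(). */
static int model_shared_write_fault(const struct model_vma_ops *ops,
				    bool *pte_writable)
{
	int ret = ops->fault(pte_writable);

	if (ret & MODEL_FAULT_NOPAGE)
		return ret;	/* PTE already installed: mkwrite is skipped */
	if (ops->page_mkwrite)
		ret |= ops->page_mkwrite(pte_writable);
	return ret;
}
```

With dax_style_fault() the mapping ends up writable and mkwrite_calls stays at zero; with pagecache_style_fault() the mapping only becomes writable after the follow-up call.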
> + return __xfs_filemap_fault(vmf, pe_size,
> + (vmf->flags & FAULT_FLAG_WRITE));
> +}
>
> - return ret;
> +STATIC int
> +xfs_filemap_page_mkwrite(
> + struct vm_fault *vmf)
> +{
> + return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
> }
>
> -/*
> - * pfn_mkwrite was originally inteneded to ensure we capture time stamp
> - * updates on write faults. In reality, it's need to serialise against
> - * truncate similar to page_mkwrite. Hence we cycle the XFS_MMAPLOCK_SHARED
> - * to ensure we serialise the fault barrier in place.
> - */
> static int
Nit: this should be STATIC, to match the other fault handlers.
> xfs_filemap_pfn_mkwrite(
> struct vm_fault *vmf)
> {
> -
> - struct inode *inode = file_inode(vmf->vma->vm_file);
> - struct xfs_inode *ip = XFS_I(inode);
> - int ret = VM_FAULT_NOPAGE;
> - loff_t size;
> -
> - trace_xfs_filemap_pfn_mkwrite(ip);
> -
> - sb_start_pagefault(inode->i_sb);
> - file_update_time(vmf->vma->vm_file);
> -
> - /* check if the faulting page hasn't raced with truncate */
> - xfs_ilock(ip, XFS_MMAPLOCK_SHARED);
> - size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
I see that this size check is gone from the new code, and we now rely on the
equivalent check in dax_iomap_fault(). Nice.
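For reference, the arithmetic of the removed check can be sketched in user space as follows. This is only an illustration: the MODEL_* constants and model_* helpers are hypothetical, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page geometry for this sketch (4 KiB pages). */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

/* Number of pages needed to cover i_size bytes, rounding up. */
static uint64_t model_size_in_pages(uint64_t i_size)
{
	return (i_size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/*
 * True when the faulting page offset lies at or beyond EOF, i.e. the case
 * in which the old handler returned VM_FAULT_SIGBUS after a racing truncate.
 */
static bool model_fault_beyond_eof(uint64_t pgoff, uint64_t i_size)
{
	return pgoff >= model_size_in_pages(i_size);
}
```

For a 4097-byte file the size rounds up to two pages, so page offset 1 is still valid and page offset 2 is not.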
> - if (vmf->pgoff >= size)
> - ret = VM_FAULT_SIGBUS;
> - else if (IS_DAX(inode))
> - ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, &xfs_iomap_ops);
> - xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);
> - sb_end_pagefault(inode->i_sb);
> - return ret;
> -
> + if (!IS_DAX(file_inode(vmf->vma->vm_file)))
> + return VM_FAULT_NOPAGE;
> + return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
> }
>
> static const struct vm_operations_struct xfs_file_vm_ops = {
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index bcc3cdf8e1c5..3f41d5340eb8 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -689,10 +689,26 @@ DEFINE_INODE_EVENT(xfs_inode_set_cowblocks_tag);
> DEFINE_INODE_EVENT(xfs_inode_clear_cowblocks_tag);
> DEFINE_INODE_EVENT(xfs_inode_free_cowblocks_invalid);
>
> -DEFINE_INODE_EVENT(xfs_filemap_fault);
> -DEFINE_INODE_EVENT(xfs_filemap_huge_fault);
> -DEFINE_INODE_EVENT(xfs_filemap_page_mkwrite);
> -DEFINE_INODE_EVENT(xfs_filemap_pfn_mkwrite);
> +TRACE_EVENT(xfs_filemap_fault,
> + TP_PROTO(struct xfs_inode *ip, enum page_entry_size pe_size,
> + bool write_fault),
> + TP_ARGS(ip, pe_size, write_fault),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_ino_t, ino)
> + __field(enum page_entry_size, pe_size)
> + __field(bool, write_fault)
> + ),
> + TP_fast_assign(
> + __entry->dev = VFS_I(ip)->i_sb->s_dev;
> + __entry->ino = ip->i_ino;
> + __entry->pe_size = pe_size;
> + __entry->write_fault = write_fault;
> + ),
> + TP_printk("dev %d:%d ino 0x%llx pe_size %d write_fault %d",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->ino, __entry->pe_size, __entry->write_fault)
> +)
I wonder if pe_size would be more readable as a string via __print_symbolic()
so we see "pe_size PMD" instead of "pe_size 1" from enum page_entry_size?
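In the tracepoint itself this would presumably mean printing pe_size with "%s" and passing something like __print_symbolic(__entry->pe_size, { PE_SIZE_PTE, "PTE" }, { PE_SIZE_PMD, "PMD" }, { PE_SIZE_PUD, "PUD" }). The user-space sketch below only models that value-to-name lookup. The enum is redeclared here so the sketch builds standalone, and the model_* names and the "UNKNOWN" fallback are assumptions of this sketch, not kernel behaviour.

```c
#include <stddef.h>

/* Redeclared to mirror the kernel's enum page_entry_size for this sketch. */
enum page_entry_size { PE_SIZE_PTE = 0, PE_SIZE_PMD, PE_SIZE_PUD };

/* One value/name pair, in the spirit of a __print_symbolic() entry. */
struct model_symbol {
	int value;
	const char *name;
};

static const struct model_symbol pe_size_names[] = {
	{ PE_SIZE_PTE, "PTE" },
	{ PE_SIZE_PMD, "PMD" },
	{ PE_SIZE_PUD, "PUD" },
};

/* Linear lookup of the symbolic name; the fallback is this sketch's choice. */
static const char *model_pe_size_name(int pe_size)
{
	size_t i;

	for (i = 0; i < sizeof(pe_size_names) / sizeof(pe_size_names[0]); i++)
		if (pe_size_names[i].value == pe_size)
			return pe_size_names[i].name;
	return "UNKNOWN";
}
```

A trace line would then read "pe_size PMD" rather than "pe_size 1".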