linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	Andy Lutomirski <luto@kernel.org>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Boaz Harrosh <boazh@netapp.com>
Subject: Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults
Date: Mon, 21 Aug 2017 15:09:16 -0600	[thread overview]
Message-ID: <20170821210916.GF26220@linux.intel.com> (raw)
In-Reply-To: <20170817160815.30466-12-jack@suse.cz>

On Thu, Aug 17, 2017 at 06:08:13PM +0200, Jan Kara wrote:
> Add a flag to iomap interface informing the caller that inode needs
> fdstasync(2) for returned extent to become persistent and use it in DAX
> fault code so that we map such extents only read only. We propagate the
> information that the page table entry has been inserted write-protected
> from dax_iomap_fault() with a new VM_FAULT_RO flag. Filesystem fault
> handler is then responsible for calling fdatasync(2) and updating page
> tables to map pfns read-write. dax_iomap_fault() also takes care of
> updating vmf->orig_pte to match the PTE that was inserted so that we can
> safely recheck that PTE did not change while write-enabling it.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/dax.c              | 31 +++++++++++++++++++++++++++++++
>  include/linux/iomap.h |  2 ++
>  include/linux/mm.h    |  6 +++++-
>  3 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index bc040e654cc9..ca88fc356786 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1177,6 +1177,22 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf,
>  			goto error_finish_iomap;
>  		}
>  
> +		/*
> +		 * If we are doing synchronous page fault and inode needs fsync,
> +		 * we can insert PTE into page tables only after that happens.
> +		 * Skip insertion for now and return the pfn so that caller can
> +		 * insert it after fsync is done.
> +		 */
> +		if (write && (vma->vm_flags & VM_SYNC) &&
> +		    (iomap.flags & IOMAP_F_NEEDDSYNC)) {
> +			if (WARN_ON_ONCE(!pfnp)) {
> +				error = -EIO;
> +				goto error_finish_iomap;
> +			}
> +			*pfnp = pfn;
> +			vmf_ret = VM_FAULT_NEEDDSYNC | major;
> +			goto finish_iomap;
> +		}

Sorry for the second reply, but I spotted this during my testing.

The radix tree entry is inserted and marked as dirty by the
dax_insert_mapping_entry() call a few lines above this newly added block.

I think that this patch should prevent the radix tree entry that we insert
from being marked as dirty, and let the dax_insert_pfn_mkwrite() handler do
that work.  Right now it is being made dirty twice, which we don't need.

Just inserting the entry as clean here and then marking it as dirty later in
dax_insert_pfn_mkwrite() keeps the radix tree entry dirty state consistent
with the PTE dirty state.  It also solves an issue we have right now where the
newly inserted dirty entry will immediately be flushed as part of the
vfs_fsync_range() call that the filesystem will do before
dax_insert_pfn_mkwrite(). 

For example, here's a trace of a PMD write fault on a completely sparse file:

  dax_pmd_fault: dev 259:0 ino 0xc shared WRITE|ALLOW_RETRY|KILLABLE|USER
  address 0x7feab8e00000 vm_start 0x7feab8e00000 vm_end 0x7feab9000000 pgoff
  0x0 max_pgoff 0x1ff 
  
  dax_pmd_fault_done: dev 259:0 ino 0xc shared WRITE|ALLOW_RETRY|KILLABLE|USER
  address 0x7feab8e00000 vm_start 0x7feab8e00000 vm_end 0x7feab9000000 pgoff
  0x0 max_pgoff 0x1ff NEEDDSYNC
  
  dax_writeback_range: dev 259:0 ino 0xc pgoff 0x0-0x1ff
  
  dax_writeback_one: dev 259:0 ino 0xc pgoff 0x0 pglen 0x200
  
  dax_writeback_range_done: dev 259:0 ino 0xc pgoff 0x1-0x1ff
  
  dax_insert_pfn_mkwrite: dev 259:0 ino 0xc shared
  WRITE|ALLOW_RETRY|KILLABLE|USER address 0x7feab8e00000 pgoff 0x0 NOPAGE

The PMD that we are writing back with dax_writeback_one() is the one that we
just made dirty via the first 1/2 of the sync fault, before we've installed a
page table entry.  This fix might speed up some of your test measurements as
well.

  parent reply	other threads:[~2017-08-21 21:09 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-17 16:08 [RFC PATCH 0/13 v2] dax, ext4: Synchronous page faults Jan Kara
2017-08-17 16:08 ` [PATCH 01/13] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK Jan Kara
2017-08-17 16:08 ` [PATCH 02/13] dax: Simplify arguments of dax_insert_mapping() Jan Kara
2017-08-17 16:08 ` [PATCH 03/13] dax: Factor out getting of pfn out of iomap Jan Kara
2017-08-18 22:06   ` Ross Zwisler
2017-08-23 18:30   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault() Jan Kara
2017-08-18 22:08   ` Ross Zwisler
2017-08-23 18:30   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test Jan Kara
2017-08-18 22:08   ` Ross Zwisler
2017-08-23 18:31   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite Jan Kara
2017-08-18 22:10   ` Ross Zwisler
2017-08-23 18:31   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() " Jan Kara
2017-08-18 22:12   ` Ross Zwisler
2017-08-23 18:32   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 08/13] dax: Fix comment describing dax_iomap_fault() Jan Kara
2017-08-18 22:12   ` Ross Zwisler
2017-08-23 18:32   ` Christoph Hellwig
2017-08-17 16:08 ` [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn Jan Kara
2017-08-21 18:45   ` Ross Zwisler
2017-08-23 18:34   ` Christoph Hellwig
2017-08-24  7:26     ` Jan Kara
2017-08-17 16:08 ` [PATCH 10/13] mm: Wire up MAP_SYNC Jan Kara
2017-08-21 21:37   ` Ross Zwisler
2017-08-22  9:36     ` Jan Kara
2017-08-21 21:57   ` Ross Zwisler
2017-08-22  9:34     ` Jan Kara
2017-08-22 17:27     ` Dan Williams
2017-08-23 18:43   ` Christoph Hellwig
2017-08-24  7:16     ` Jan Kara
2017-08-17 16:08 ` [PATCH 11/13] dax, iomap: Add support for synchronous faults Jan Kara
2017-08-21 18:58   ` Ross Zwisler
2017-08-22  9:46     ` Jan Kara
2017-08-21 21:09   ` Ross Zwisler [this message]
2017-08-22 10:08     ` Jan Kara
2017-08-24 12:27   ` Christoph Hellwig
2017-08-24 12:34     ` Jan Kara
2017-08-24 13:38       ` Christoph Hellwig
2017-08-24 16:45         ` Jan Kara
2017-08-17 16:08 ` [PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite() Jan Kara
2017-08-21 19:01   ` Ross Zwisler
2017-08-17 16:08 ` [PATCH 13/13] ext4: Support for synchronous DAX faults Jan Kara
2017-08-21 19:19   ` Ross Zwisler
2017-08-22 10:18     ` Jan Kara
2017-08-23 18:37   ` Christoph Hellwig
2017-08-24  7:18     ` Jan Kara
2017-08-24 12:31   ` Christoph Hellwig
2017-08-24 12:34     ` Christoph Hellwig
2017-08-24 12:36     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170821210916.GF26220@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=boazh@netapp.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=luto@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).