From: Matthew Wilcox <willy@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 07/22] Replace the XIP page fault handler with the DAX page fault handler
Date: Wed, 9 Apr 2014 16:48:06 -0400
Message-ID: <20140409204806.GF5727@linux.intel.com>
In-Reply-To: <20140408220525.GC26019@quack.suse.cz>
On Wed, Apr 09, 2014 at 12:05:25AM +0200, Jan Kara wrote:
> > + if (!page)
> > + return VM_FAULT_OOM;
> > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + if (vmf->pgoff >= size) {
> Maybe comment here that we have to recheck i_size so that we don't create
> pages in the area truncate_pagecache() has already evicted.
Done.
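For reference, the recheck quoted above rounds i_size up to a whole number of pages and refuses any fault whose page offset lands at or past that count. A minimal userspace sketch of the arithmetic follows. The SKETCH_ constants and both function names are made up for illustration, and a 4096-byte page is an assumption (the kernel's PAGE_SHIFT depends on the architecture).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Number of pages needed to cover i_size bytes, rounding up. */
static uint64_t sketch_pages_covering(uint64_t i_size)
{
	/* Guard the addition below against wrapping around. */
	assert(i_size <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
	return (i_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* True if a fault at page offset pgoff lies beyond the end of the file. */
static bool sketch_fault_beyond_eof(uint64_t pgoff, uint64_t i_size)
{
	return pgoff >= sketch_pages_covering(i_size);
}
```

With these assumptions a 4097-byte file covers two pages, so page offsets 0 and 1 are valid and offset 2 is not.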
> > + dax_get_addr(inode, bh, &vfrom); /* XXX: error handling */
> The error handling here is missing as the comment suggests :)
Added.
> > + if (buffer_unwritten(&bh) || buffer_new(&bh))
> > + dax_clear_blocks(inode, bh.b_blocknr, bh.b_size);
> Where is dax_clear_blocks() defined?
Er ... patch 11. I'll reorder the patches ;-)
> > +
> > + error = dax_get_pfn(inode, &bh, &pfn);
> > + if (error > 0)
> > + error = vm_insert_mixed(vma, vaddr, pfn);
> When there's a hole (thus page != NULL) and we are called from
> dax_mkwrite(), this will always return EBUSY, correct?
Erm ... it will return -EBUSY if this was the task that previously
faulted on it. Drat. See below.
> > + mutex_unlock(&mapping->i_mmap_mutex);
> > +
> > + if (page) {
> > + delete_from_page_cache(page);
> > + unmap_mapping_range(mapping, vmf->pgoff << PAGE_SHIFT,
> > + PAGE_CACHE_SIZE, 0);
> Here we unmap the PTE pointing to the hole page but then we'll have to
> retry the fault again to fill in the pfn we've got? This seems wrong. I'd
> say we want to remap the PTE from the hole page to a pfn we've got while
> holding i_mmap_mutex. remap_pfn_range() almost does what you need, except
> that you also need that to work for normal pages. So you might need to
> create a new helper in mm layer for that.
I think it's easier than that. How does this look?
@@ -390,9 +389,8 @@ static int do_dax_fault(struct vm_area_struct *vma, struct v
 		dax_clear_blocks(inode, bh.b_blocknr, bh.b_size);
 
 	error = dax_get_pfn(&bh, &pfn, blkbits);
-	if (error > 0)
-		error = vm_insert_mixed(vma, vaddr, pfn);
-	mutex_unlock(&mapping->i_mmap_mutex);
+	if (error <= 0)
+		goto unlock;
 
 	if (page) {
 		delete_from_page_cache(page);
@@ -402,6 +400,9 @@ static int do_dax_fault(struct vm_area_struct *vma, struct v
 		page_cache_release(page);
 	}
 
+	error = vm_insert_mixed(vma, vaddr, pfn);
+	mutex_unlock(&mapping->i_mmap_mutex);
+
 	if (error == -ENOMEM)
 		return VM_FAULT_OOM;
 	/* -EBUSY is fine, somebody else faulted on the same PTE */
@@ -409,6 +410,8 @@ static int do_dax_fault(struct vm_area_struct *vma, struct v
 		BUG_ON(error);
 	return VM_FAULT_NOPAGE | major;
 
+ unlock:
+	mutex_unlock(&mapping->i_mmap_mutex);
  sigbus:
 	if (page) {
 		unlock_page(page);
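The control flow of that change, reduced to a userspace model, might look like the sketch below. Everything prefixed sketch_ or SKETCH_ is hypothetical: the struct stands in for i_mmap_mutex, the two integer arguments stand in for the return values of dax_get_pfn() and vm_insert_mixed(), and the return codes are placeholders for the VM_FAULT_* values. It also assumes the sigbus path ends by returning a SIGBUS-style code. The point is only that the lock is dropped exactly once on every path, and that the insert now happens after the hole page has been torn down but before the lock is released.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder return codes, not the kernel's VM_FAULT_* values. */
#define SKETCH_FAULT_SIGBUS 1
#define SKETCH_FAULT_OOM    2
#define SKETCH_FAULT_NOPAGE 3

/* Stand-in for i_mmap_mutex; only tracks whether it is held. */
struct sketch_mutex { int held; };

static void sketch_mutex_lock(struct sketch_mutex *m)
{
	assert(!m->held);
	m->held = 1;
}

static void sketch_mutex_unlock(struct sketch_mutex *m)
{
	assert(m->held);
	m->held = 0;
}

/*
 * get_pfn_result models dax_get_pfn() (> 0 means a pfn was found);
 * insert_result models vm_insert_mixed().
 */
static int sketch_fault(struct sketch_mutex *m, int get_pfn_result,
			int insert_result)
{
	int error;

	sketch_mutex_lock(m);
	error = get_pfn_result;
	if (error <= 0)
		goto unlock;

	/* Hole-page teardown would happen here, still under the lock. */

	error = insert_result;
	sketch_mutex_unlock(m);

	if (error == -ENOMEM)
		return SKETCH_FAULT_OOM;
	/* -EBUSY is tolerated: another fault already filled the PTE. */
	if (error != -EBUSY)
		assert(error == 0);	/* models BUG_ON(error) */
	return SKETCH_FAULT_NOPAGE;

unlock:
	sketch_mutex_unlock(m);
	/* The real code falls through to the sigbus cleanup here. */
	return SKETCH_FAULT_SIGBUS;
}
```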
> > +int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> > + get_block_t get_block)
> > +{
> > + int result;
> > + struct super_block *sb = file_inode(vma->vm_file)->i_sb;
> > +
> > + sb_start_pagefault(sb);
> You don't need any filesystem freeze protection for the fault handler
> since that's not going to modify the filesystem.
Err ... we might allocate a block as a result of doing a write to a hole.
Or does that not count as 'modifying the filesystem' in this context?
> > + file_update_time(vma->vm_file);
> Why do you update m/ctime? We are only reading the file...
... except that it might be a write fault. I think we modify the file
iff we return VM_FAULT_MAJOR from do_dax_fault(). So I'd be open to
something like this:
	sb_start_pagefault(sb);
	result = do_dax_fault(vma, vmf, get_block);
	if (result & VM_FAULT_MAJOR)
		file_update_time(vma->vm_file);
	sb_end_pagefault(sb);
Would that work better for you?
> > @@ -70,7 +101,7 @@ const struct file_operations ext2_file_operations = {
> > #ifdef CONFIG_COMPAT
> > .compat_ioctl = ext2_compat_ioctl,
> > #endif
> > - .mmap = generic_file_mmap,
> > + .mmap = ext2_file_mmap,
> So what's the point of ext2_file_operations ever handling IS_DAX()
> inodes? Actually ext2_file_operations and ext2_xip_file_operations seem to
> be the same after this patch so either you drop ext2_xip_file_operations
> (I'm for this) or you can leave generic_file_mmap here and assume
> ext2_file_mmap is always called for IS_DAX() inodes.
The goal is to get them the same. At this point, the only sticky point is:
	.splice_read	= generic_file_splice_read,
	.splice_write	= generic_file_splice_write,
And splice is pretty damn sticky for DAX.