linux-mm.kvack.org archive mirror
* [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping
@ 2016-01-10 13:59 Boaz Harrosh
  2016-01-10 14:02 ` [PATCH 1/2] mm: Allow single pagefault on mmap-write with VM_MIXEDMAP Boaz Harrosh
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Boaz Harrosh @ 2016-01-10 13:59 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dan Williams, Andrew Morton, Matthew Wilcox,
	linux-mm@kvack.org
  Cc: Ross Zwisler, Oleg Nesterov, Mel Gorman, Johannes Weiner

Hi

Today, any VM_MIXEDMAP or VM_PFNMAP mapping that enables write access
will take a double page fault for every new write access.

This is because vma->vm_page_prot defines how a page/pfn is inserted into
the page table (see vma_wants_writenotify in mm/mmap.c).

This means the page/pfn is always inserted read-only, under the
assumption that we want to be notified when write access occurs.

But this is not always true, and it adds an unnecessary page fault on
every new mmap-write access.

This patchset tries to give the fault handler more choice by passing
a pgprot_t to vm_insert_mixed() via a new vm_insert_mixed_prot() API.

If the mm guys feel that pgprot_t and its helpers and flags are private
to mm/memory.c, I can easily add a vm_insert_mixed_rw() instead of the
vm_insert_mixed_prot() above, which allows arbitrary control, not only
write access.
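
To illustrate the intended use, a driver ->fault handler could do roughly
the following. This is only a sketch, not part of the patches;
example_get_pfn() is a made-up placeholder, the rest is existing kernel
API plus the vm_insert_mixed_prot() added by patch 1:

/*
 * Sketch of a ->fault handler using the proposed vm_insert_mixed_prot().
 * example_get_pfn() stands in for however the driver resolves the pfn.
 */
static int example_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	unsigned long vaddr = (unsigned long)vmf->virtual_address;
	pgprot_t prot = vma->vm_page_prot;
	unsigned long pfn = example_get_pfn(vmf->pgoff);	/* placeholder */
	int error;

	/*
	 * On a write fault insert the pfn writable up front, so no second
	 * (pfn_mkwrite) fault is needed on the first write to this page.
	 */
	if (vmf->flags & FAULT_FLAG_WRITE)
		prot = pgprot_modify(prot, PAGE_SHARED);

	error = vm_insert_mixed_prot(vma, vaddr, pfn, prot);
	if (error == -ENOMEM)
		return VM_FAULT_OOM;
	if (error && error != -EBUSY)
		return VM_FAULT_SIGBUS;
	return VM_FAULT_NOPAGE;
}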

Following is a patch to DAX to optimize out the extra page-fault.

TODO: I only handled the 4K mapping; perhaps the 2M mapping can enjoy the
same single fault on write access. If this is of interest to anyone, I can
attempt a fix.

Dan, Andrew: who needs to pick this up, please?

list of patches:
[PATCH 1/2] mm: Allow single pagefault on mmap-write with VM_MIXEDMAP
[PATCH 2/2] dax: Only fault once on mmap write access

Thank you
Boaz


* [PATCH 1/2] mm: Allow single pagefault on mmap-write with VM_MIXEDMAP
  2016-01-10 13:59 [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Boaz Harrosh
@ 2016-01-10 14:02 ` Boaz Harrosh
  2016-01-10 14:03 ` [PATCH 2/2] dax: Only fault once on mmap write access Boaz Harrosh
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Boaz Harrosh @ 2016-01-10 14:02 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dan Williams, Andrew Morton, Matthew Wilcox,
	linux-mm@kvack.org
  Cc: Ross Zwisler, Oleg Nesterov, Mel Gorman, Johannes Weiner


Until now, vma->vm_page_prot defined how a page/pfn is inserted into
the page table (see vma_wants_writenotify in mm/mmap.c).

This meant that it was always inserted read-only, under the assumption
that we want to be notified when write access occurs. This is not
always true and adds an unnecessary page fault on every new mmap-write.

This patch adds a more granular approach and lets the fault handler
decide how it wants to map the mixedmap pfn.

The old vm_insert_mixed() now receives a new pgprot_t prot parameter
and is renamed to vm_insert_mixed_prot(). A new inline vm_insert_mixed()
is defined as a wrapper over vm_insert_mixed_prot(), defaulting to
vma->vm_page_prot as before, so as to satisfy all current users.

CC: Andrew Morton <akpm@linux-foundation.org>
CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Matthew Wilcox <willy@linux.intel.com>
CC: linux-mm@kvack.org (open list:MEMORY MANAGEMENT)

Reviewed-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 include/linux/mm.h |  8 +++++++-
 mm/memory.c        | 10 +++++-----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80001de..46a9a19 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2108,8 +2108,14 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
+int vm_insert_mixed_prot(struct vm_area_struct *vma, unsigned long addr,
+			 unsigned long pfn, pgprot_t prot);
+static inline
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
-			unsigned long pfn);
+		    unsigned long pfn)
+{
+	return vm_insert_mixed_prot(vma, addr, pfn, vma->vm_page_prot);
+}
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
 
 
diff --git a/mm/memory.c b/mm/memory.c
index deb679c..c716913 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1589,8 +1589,8 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_pfn);
 
-int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
-			unsigned long pfn)
+int vm_insert_mixed_prot(struct vm_area_struct *vma, unsigned long addr,
+			 unsigned long pfn, pgprot_t prot)
 {
 	BUG_ON(!(vma->vm_flags & VM_MIXEDMAP));
 
@@ -1608,11 +1608,11 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 		struct page *page;
 
 		page = pfn_to_page(pfn);
-		return insert_page(vma, addr, page, vma->vm_page_prot);
+		return insert_page(vma, addr, page, prot);
 	}
-	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+	return insert_pfn(vma, addr, pfn, prot);
 }
-EXPORT_SYMBOL(vm_insert_mixed);
+EXPORT_SYMBOL(vm_insert_mixed_prot);
 
 /*
  * maps a range of physical memory into the requested pages. the old
-- 
1.9.3



* [PATCH 2/2] dax: Only fault once on mmap write access
  2016-01-10 13:59 [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Boaz Harrosh
  2016-01-10 14:02 ` [PATCH 1/2] mm: Allow single pagefault on mmap-write with VM_MIXEDMAP Boaz Harrosh
@ 2016-01-10 14:03 ` Boaz Harrosh
  2016-01-11  1:19 ` [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Dan Williams
  2016-01-11 19:35 ` Matthew Wilcox
  3 siblings, 0 replies; 8+ messages in thread
From: Boaz Harrosh @ 2016-01-10 14:03 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dan Williams, Andrew Morton, Matthew Wilcox,
	linux-mm@kvack.org
  Cc: Ross Zwisler, Oleg Nesterov, Mel Gorman, Johannes Weiner


In the current code, any mmap-write access takes two page faults:
one that maps the pfn into the vma (vm_insert_mixed()), and a second
one that converts the read-only mapping to read-write (via pfn_mkwrite).

But since we already know that this is a write access, we can map the
pfn read-write and save the extra fault.

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 fs/dax.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index a86d3cc..3fee696 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -289,6 +289,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
 	void __pmem *addr;
+	pgprot_t prot = vma->vm_page_prot;
 	unsigned long pfn;
 	pgoff_t size;
 	int error;
@@ -321,7 +322,10 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 		wmb_pmem();
 	}
 
-	error = vm_insert_mixed(vma, vaddr, pfn);
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		prot = pgprot_modify(prot, PAGE_SHARED);
+
+	error = vm_insert_mixed_prot(vma, vaddr, pfn, prot);
 
  out:
 	i_mmap_unlock_read(mapping);
-- 
1.9.3



* Re: [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping
  2016-01-10 13:59 [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Boaz Harrosh
  2016-01-10 14:02 ` [PATCH 1/2] mm: Allow single pagefault on mmap-write with VM_MIXEDMAP Boaz Harrosh
  2016-01-10 14:03 ` [PATCH 2/2] dax: Only fault once on mmap write access Boaz Harrosh
@ 2016-01-11  1:19 ` Dan Williams
  2016-01-11  9:22   ` Boaz Harrosh
  2016-01-11 19:35 ` Matthew Wilcox
  3 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2016-01-11  1:19 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox,
	linux-mm@kvack.org, Ross Zwisler, Oleg Nesterov, Mel Gorman,
	Johannes Weiner

On Sun, Jan 10, 2016 at 5:59 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> Hi
>
> Today, any VM_MIXEDMAP or VM_PFNMAP mapping that enables write access
> will take a double page fault for every new write access.
>
> This is because vma->vm_page_prot defines how a page/pfn is inserted into
> the page table (see vma_wants_writenotify in mm/mmap.c).
>
> This means the page/pfn is always inserted read-only, under the
> assumption that we want to be notified when write access occurs.
>
> But this is not always true, and it adds an unnecessary page fault on
> every new mmap-write access.
>
> This patchset tries to give the fault handler more choice by passing
> a pgprot_t to vm_insert_mixed() via a new vm_insert_mixed_prot() API.
>
> If the mm guys feel that pgprot_t and its helpers and flags are private
> to mm/memory.c, I can easily add a vm_insert_mixed_rw() instead of the
> vm_insert_mixed_prot() above, which allows arbitrary control, not only
> write access.
>
> Following is a patch to DAX to optimize out the extra page-fault.
>
> TODO: I only handled the 4K mapping; perhaps the 2M mapping can enjoy the
> same single fault on write access. If this is of interest to anyone, I can
> attempt a fix.
>
> Dan, Andrew: who needs to pick this up, please?

This collides with the patches currently pending in -mm for 4.5; let's
take a look at this for 4.6.


* Re: [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping
  2016-01-11  1:19 ` [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Dan Williams
@ 2016-01-11  9:22   ` Boaz Harrosh
  2016-01-11 16:37     ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Boaz Harrosh @ 2016-01-11  9:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox,
	linux-mm@kvack.org, Ross Zwisler, Oleg Nesterov, Mel Gorman,
	Johannes Weiner

On 01/11/2016 03:19 AM, Dan Williams wrote:
> On Sun, Jan 10, 2016 at 5:59 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
>> Hi
>>
>> Today, any VM_MIXEDMAP or VM_PFNMAP mapping that enables write access
>> will take a double page fault for every new write access.
>>
>> This is because vma->vm_page_prot defines how a page/pfn is inserted into
>> the page table (see vma_wants_writenotify in mm/mmap.c).
>>
>> This means the page/pfn is always inserted read-only, under the
>> assumption that we want to be notified when write access occurs.
>>
>> But this is not always true, and it adds an unnecessary page fault on
>> every new mmap-write access.
>>
>> This patchset tries to give the fault handler more choice by passing
>> a pgprot_t to vm_insert_mixed() via a new vm_insert_mixed_prot() API.
>>
>> If the mm guys feel that pgprot_t and its helpers and flags are private
>> to mm/memory.c, I can easily add a vm_insert_mixed_rw() instead of the
>> vm_insert_mixed_prot() above, which allows arbitrary control, not only
>> write access.
>>
>> Following is a patch to DAX to optimize out the extra page-fault.
>>
>> TODO: I only handled the 4K mapping; perhaps the 2M mapping can enjoy the
>> same single fault on write access. If this is of interest to anyone, I can
>> attempt a fix.
>>
>> Dan, Andrew: who needs to pick this up, please?
> 
> This collides with the patches currently pending in -mm for 4.5; let's
> take a look at this for 4.6.
> 

OK, thanks. I will try to rework this on top of current linux-next and,
sure, we will wait for 4.5-rc1 to look at this again.

Do you have any comments in general about this?

Thanks
Boaz


* Re: [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping
  2016-01-11  9:22   ` Boaz Harrosh
@ 2016-01-11 16:37     ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2016-01-11 16:37 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox,
	linux-mm@kvack.org, Ross Zwisler, Oleg Nesterov, Mel Gorman,
	Johannes Weiner

On Mon, Jan 11, 2016 at 1:22 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 01/11/2016 03:19 AM, Dan Williams wrote:
>> On Sun, Jan 10, 2016 at 5:59 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
>>> Hi
>>>
>>> Today, any VM_MIXEDMAP or VM_PFNMAP mapping that enables write access
>>> will take a double page fault for every new write access.
>>>
>>> This is because vma->vm_page_prot defines how a page/pfn is inserted into
>>> the page table (see vma_wants_writenotify in mm/mmap.c).
>>>
>>> This means the page/pfn is always inserted read-only, under the
>>> assumption that we want to be notified when write access occurs.
>>>
>>> But this is not always true, and it adds an unnecessary page fault on
>>> every new mmap-write access.
>>>
>>> This patchset tries to give the fault handler more choice by passing
>>> a pgprot_t to vm_insert_mixed() via a new vm_insert_mixed_prot() API.
>>>
>>> If the mm guys feel that pgprot_t and its helpers and flags are private
>>> to mm/memory.c, I can easily add a vm_insert_mixed_rw() instead of the
>>> vm_insert_mixed_prot() above, which allows arbitrary control, not only
>>> write access.
>>>
>>> Following is a patch to DAX to optimize out the extra page-fault.
>>>
>>> TODO: I only handled the 4K mapping; perhaps the 2M mapping can enjoy the
>>> same single fault on write access. If this is of interest to anyone, I can
>>> attempt a fix.
>>>
>>> Dan, Andrew: who needs to pick this up, please?
>>
>> This collides with the patches currently pending in -mm for 4.5; let's
>> take a look at this for 4.6.
>>
>
> OK, thanks. I will try to rework this on top of current linux-next and,
> sure, we will wait for 4.5-rc1 to look at this again.
>
> Do you have any comments in general about this?

Looks worthwhile at first glance; the only concern that comes to mind
is integration with Ross' fsync/msync enabling.  How much does this
change matter in practice?  If the mapping is long-standing then I
expect this cost gets hidden?
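
One rough way to put a number on that question (a userspace sketch, not
from this thread; the DAX file path and the 64 MiB size are placeholders,
and the ~2-vs-~1 faults-per-page expectation is taken from the cover
letter) is to count the faults taken on the first write to each page of a
fresh MAP_SHARED mapping:

/*
 * Sketch: count the page faults taken on the first write to each page of
 * a freshly mmap()ed DAX file.  The path and size are placeholders and
 * the file must already be at least that large.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
	const size_t len = 64UL << 20;			/* 64 MiB, placeholder */
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	struct rusage before, after;
	size_t i;

	int fd = open("/mnt/dax/testfile", O_RDWR);	/* placeholder path */
	if (fd < 0)
		return 1;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	getrusage(RUSAGE_SELF, &before);
	for (i = 0; i < len; i += pg)
		p[i] = 1;				/* first write to each page */
	getrusage(RUSAGE_SELF, &after);

	printf("faults per page: %.2f\n",
	       (double)((after.ru_minflt - before.ru_minflt) +
		       (after.ru_majflt - before.ru_majflt)) / (len / pg));
	munmap(p, len);
	close(fd);
	return 0;
}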


* Re: [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping
  2016-01-10 13:59 [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Boaz Harrosh
                   ` (2 preceding siblings ...)
  2016-01-11  1:19 ` [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Dan Williams
@ 2016-01-11 19:35 ` Matthew Wilcox
  2016-01-12 13:29   ` Matthew Wilcox
  3 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2016-01-11 19:35 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Kirill A. Shutemov, Dan Williams, Andrew Morton,
	linux-mm@kvack.org, Ross Zwisler, Oleg Nesterov, Mel Gorman,
	Johannes Weiner

On Sun, Jan 10, 2016 at 03:59:22PM +0200, Boaz Harrosh wrote:
> Today, any VM_MIXEDMAP or VM_PFNMAP mapping that enables write access
> will take a double page fault for every new write access.
>
> This is because vma->vm_page_prot defines how a page/pfn is inserted into
> the page table (see vma_wants_writenotify in mm/mmap.c).
>
> This means the page/pfn is always inserted read-only, under the
> assumption that we want to be notified when write access occurs.
>
> But this is not always true, and it adds an unnecessary page fault on
> every new mmap-write access.
>
> This patchset tries to give the fault handler more choice by passing
> a pgprot_t to vm_insert_mixed() via a new vm_insert_mixed_prot() API.
>
> If the mm guys feel that pgprot_t and its helpers and flags are private
> to mm/memory.c, I can easily add a vm_insert_mixed_rw() instead of the
> vm_insert_mixed_prot() above, which allows arbitrary control, not only
> write access.

We've known about this one for a while, and it's never been terribly
high on the priority list to fix it.  This is the obvious way to fix
it but, as you note, it might be seen as increasing the leak between
the abstractions.

I would rather see the memory.c code move in the direction of the
huge_memory.c code.  How about something like this?

diff --git a/fs/dax.c b/fs/dax.c
index a610cbe..09b6c8c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -498,6 +498,7 @@ EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 			struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	const bool write = vmf->flags & FAULT_FLAG_WRITE;
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
 	struct address_space *mapping = inode->i_mapping;
 	struct block_device *bdev = bh->b_bdev;
@@ -534,12 +535,11 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	}
 	dax_unmap_atomic(bdev, &dax);
 
-	error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false,
-			vmf->flags & FAULT_FLAG_WRITE);
+	error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false, write);
 	if (error)
 		goto out;
 
-	error = vm_insert_mixed(vma, vaddr, dax.pfn);
+	error = vmf_insert_pfn(vma, vaddr, dax.pfn, write);
 
  out:
 	i_mmap_unlock_read(mapping);
@@ -559,7 +559,7 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	unsigned blkbits = inode->i_blkbits;
 	sector_t block;
 	pgoff_t size;
-	int error;
+	int result, error;
 	int major = 0;
 
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -660,13 +660,14 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * indicate what the callback should do via the uptodate variable, same
 	 * as for normal BH based IO completions.
 	 */
-	error = dax_insert_mapping(inode, &bh, vma, vmf);
+	result = dax_insert_mapping(inode, &bh, vma, vmf);
 	if (buffer_unwritten(&bh)) {
 		if (complete_unwritten)
-			complete_unwritten(&bh, !error);
+			complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
 		else
 			WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE));
 	}
+	return result | major;
 
  out:
 	if (error == -ENOMEM)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27dbd1b..a95242c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2170,8 +2170,10 @@ struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
-int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
+int vm_insert_pfn(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn);
+int vmf_insert_pfn(struct vm_area_struct *, unsigned long addr,
+			pfn_t pfn, bool write);
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			pfn_t pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index 708a0c7c..b93bcba 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1505,8 +1505,15 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+static pte_t maybe_pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_flags & VM_WRITE))
+		pte = pte_mkwrite(pte);
+	return pte;
+}
+
 static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
-			pfn_t pfn, pgprot_t prot)
+			pfn_t pfn, pgprot_t prot, bool write)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int retval;
@@ -1526,6 +1533,10 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 		entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
 	else
 		entry = pte_mkspecial(pfn_t_pte(pfn, prot));
+	if (write) {
+		entry = pte_mkyoung(pte_mkdirty(entry));
+		entry = maybe_pte_mkwrite(entry, vma);
+	}
 	set_pte_at(mm, addr, pte, entry);
 	update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
 
@@ -1537,26 +1548,28 @@ out:
 }
 
 /**
- * vm_insert_pfn - insert single pfn into user vma
+ * vmf_insert_pfn - insert single pfn into user vma
  * @vma: user vma to map to
  * @addr: target user address of this page
  * @pfn: source kernel pfn
+ * @write: Whether to insert a writable entry
  *
  * Similar to vm_insert_page, this allows drivers to insert individual pages
  * they've allocated into a user vma. Same comments apply.
  *
  * This function should only be called from a vm_ops->fault handler, and
- * in that case the handler should return NULL.
+ * the return value from this function is suitable for returning from that
+ * handler.
  *
  * vma cannot be a COW mapping.
  *
  * As this is called only for pages that do not currently exist, we
  * do not need to flush old virtual caches or the TLB.
  */
-int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
-			unsigned long pfn)
+int vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
+			pfn_t pfn, bool write)
 {
-	int ret;
+	int error;
 	pgprot_t pgprot = vma->vm_page_prot;
 	/*
 	 * Technically, architectures with pte_special can avoid all these
@@ -1568,16 +1581,29 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
 	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
-	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn));
+	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_t_valid(pfn));
 
 	if (addr < vma->vm_start || addr >= vma->vm_end)
-		return -EFAULT;
-	if (track_pfn_insert(vma, &pgprot, __pfn_to_pfn_t(pfn, PFN_DEV)))
-		return -EINVAL;
+		return VM_FAULT_SIGBUS;
+	if (track_pfn_insert(vma, &pgprot, pfn))
+		return VM_FAULT_SIGBUS;
 
-	ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot);
+	error = insert_pfn(vma, addr, pfn, pgprot, write);
+	if (error == -EBUSY || !error)
+		return VM_FAULT_NOPAGE;
+	return VM_FAULT_SIGBUS;
+}
+EXPORT_SYMBOL(vmf_insert_pfn);
 
-	return ret;
+/* TODO: Convert users to vmf_insert_pfn */
+int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn)
+{
+	int result = vmf_insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV),
+								false);
+	if (result & VM_FAULT_ERROR)
+		return -EFAULT;
+	return 0;
 }
 EXPORT_SYMBOL(vm_insert_pfn);
 
@@ -1602,7 +1628,7 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 		page = pfn_t_to_page(pfn);
 		return insert_page(vma, addr, page, vma->vm_page_prot);
 	}
-	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+	return insert_pfn(vma, addr, pfn, vma->vm_page_prot, false);
 }
 EXPORT_SYMBOL(vm_insert_mixed);
 


* Re: [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping
  2016-01-11 19:35 ` Matthew Wilcox
@ 2016-01-12 13:29   ` Matthew Wilcox
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2016-01-12 13:29 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Kirill A. Shutemov, Dan Williams, Andrew Morton,
	linux-mm@kvack.org, Ross Zwisler, Oleg Nesterov, Mel Gorman,
	Johannes Weiner

On Mon, Jan 11, 2016 at 02:35:23PM -0500, Matthew Wilcox wrote:
> I would rather see the memory.c code move in the direction of the
> huge_memory.c code.  How about something like this?

Whoops, missed some bits in the DAX conversion where we'd return an
-errno instead of VM_FAULT flags.  Take two.

diff --git a/fs/dax.c b/fs/dax.c
index a610cbe..deff70f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -498,6 +498,7 @@ EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 			struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	const bool write = vmf->flags & FAULT_FLAG_WRITE;
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
 	struct address_space *mapping = inode->i_mapping;
 	struct block_device *bdev = bh->b_bdev;
@@ -506,7 +507,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 		.size = bh->b_size,
 	};
 	pgoff_t size;
-	int error;
+	int result;
 
 	i_mmap_lock_read(mapping);
 
@@ -518,15 +519,11 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	 * allocated past the end of the file.
 	 */
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (unlikely(vmf->pgoff >= size)) {
-		error = -EIO;
-		goto out;
-	}
+	if (unlikely(vmf->pgoff >= size))
+		goto sigbus;
 
-	if (dax_map_atomic(bdev, &dax) < 0) {
-		error = PTR_ERR(dax.addr);
-		goto out;
-	}
+	if (dax_map_atomic(bdev, &dax) < 0)
+		goto sigbus;
 
 	if (buffer_unwritten(bh) || buffer_new(bh)) {
 		clear_pmem(dax.addr, PAGE_SIZE);
@@ -534,17 +531,19 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	}
 	dax_unmap_atomic(bdev, &dax);
 
-	error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false,
-			vmf->flags & FAULT_FLAG_WRITE);
-	if (error)
-		goto out;
+	if (dax_radix_entry(mapping, vmf->pgoff, dax.sector, false, write))
+		goto sigbus;
 
-	error = vm_insert_mixed(vma, vaddr, dax.pfn);
+	result = vmf_insert_pfn(vma, vaddr, dax.pfn, write);
 
  out:
 	i_mmap_unlock_read(mapping);
 
-	return error;
+	return result;
+
+ sigbus:
+	result = VM_FAULT_SIGBUS;
+	goto out;
 }
 
 static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
@@ -559,7 +558,7 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	unsigned blkbits = inode->i_blkbits;
 	sector_t block;
 	pgoff_t size;
-	int error;
+	int result, error;
 	int major = 0;
 
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -660,13 +659,14 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * indicate what the callback should do via the uptodate variable, same
 	 * as for normal BH based IO completions.
 	 */
-	error = dax_insert_mapping(inode, &bh, vma, vmf);
+	result = dax_insert_mapping(inode, &bh, vma, vmf);
 	if (buffer_unwritten(&bh)) {
 		if (complete_unwritten)
-			complete_unwritten(&bh, !error);
+			complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
 		else
 			WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE));
 	}
+	return result | major;
 
  out:
 	if (error == -ENOMEM)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27dbd1b..a95242c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2170,8 +2170,10 @@ struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
-int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
+int vm_insert_pfn(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn);
+int vmf_insert_pfn(struct vm_area_struct *, unsigned long addr,
+			pfn_t pfn, bool write);
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			pfn_t pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index 708a0c7c..b93bcba 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1505,8 +1505,15 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+static pte_t maybe_pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_flags & VM_WRITE))
+		pte = pte_mkwrite(pte);
+	return pte;
+}
+
 static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
-			pfn_t pfn, pgprot_t prot)
+			pfn_t pfn, pgprot_t prot, bool write)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int retval;
@@ -1526,6 +1533,10 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 		entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
 	else
 		entry = pte_mkspecial(pfn_t_pte(pfn, prot));
+	if (write) {
+		entry = pte_mkyoung(pte_mkdirty(entry));
+		entry = maybe_pte_mkwrite(entry, vma);
+	}
 	set_pte_at(mm, addr, pte, entry);
 	update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
 
@@ -1537,26 +1548,28 @@ out:
 }
 
 /**
- * vm_insert_pfn - insert single pfn into user vma
+ * vmf_insert_pfn - insert single pfn into user vma
  * @vma: user vma to map to
  * @addr: target user address of this page
  * @pfn: source kernel pfn
+ * @write: Whether to insert a writable entry
  *
  * Similar to vm_insert_page, this allows drivers to insert individual pages
  * they've allocated into a user vma. Same comments apply.
  *
  * This function should only be called from a vm_ops->fault handler, and
- * in that case the handler should return NULL.
+ * the return value from this function is suitable for returning from that
+ * handler.
  *
  * vma cannot be a COW mapping.
  *
  * As this is called only for pages that do not currently exist, we
  * do not need to flush old virtual caches or the TLB.
  */
-int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
-			unsigned long pfn)
+int vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
+			pfn_t pfn, bool write)
 {
-	int ret;
+	int error;
 	pgprot_t pgprot = vma->vm_page_prot;
 	/*
 	 * Technically, architectures with pte_special can avoid all these
@@ -1568,16 +1581,29 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
 	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
-	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn));
+	BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_t_valid(pfn));
 
 	if (addr < vma->vm_start || addr >= vma->vm_end)
-		return -EFAULT;
-	if (track_pfn_insert(vma, &pgprot, __pfn_to_pfn_t(pfn, PFN_DEV)))
-		return -EINVAL;
+		return VM_FAULT_SIGBUS;
+	if (track_pfn_insert(vma, &pgprot, pfn))
+		return VM_FAULT_SIGBUS;
 
-	ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot);
+	error = insert_pfn(vma, addr, pfn, pgprot, write);
+	if (error == -EBUSY || !error)
+		return VM_FAULT_NOPAGE;
+	return VM_FAULT_SIGBUS;
+}
+EXPORT_SYMBOL(vmf_insert_pfn);
 
-	return ret;
+/* TODO: Convert users to vmf_insert_pfn */
+int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn)
+{
+	int result = vmf_insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV),
+								false);
+	if (result & VM_FAULT_ERROR)
+		return -EFAULT;
+	return 0;
 }
 EXPORT_SYMBOL(vm_insert_pfn);
 
@@ -1602,7 +1628,7 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 		page = pfn_t_to_page(pfn);
 		return insert_page(vma, addr, page, vma->vm_page_prot);
 	}
-	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+	return insert_pfn(vma, addr, pfn, vma->vm_page_prot, false);
 }
 EXPORT_SYMBOL(vm_insert_mixed);
 



Thread overview: 8+ messages
2016-01-10 13:59 [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Boaz Harrosh
2016-01-10 14:02 ` [PATCH 1/2] mm: Allow single pagefault on mmap-write with VM_MIXEDMAP Boaz Harrosh
2016-01-10 14:03 ` [PATCH 2/2] dax: Only fault once on mmap write access Boaz Harrosh
2016-01-11  1:19 ` [PATCHSET 0/2] Allow single pagefault in write access of a VM_MIXEDMAP mapping Dan Williams
2016-01-11  9:22   ` Boaz Harrosh
2016-01-11 16:37     ` Dan Williams
2016-01-11 19:35 ` Matthew Wilcox
2016-01-12 13:29   ` Matthew Wilcox
