From: Nick Piggin <piggin@cyberone.com.au>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@osdl.org>, Andi Kleen <ak@suse.de>,
	Rohit Seth <rohitseth@google.com>, Andrew Morton <akpm@osdl.org>,
	clameter@sgi.com, mbligh@google.com, hugh@veritas.com,
	riel@redhat.com, andrea@suse.de, arjan@infradead.org,
	apw@shadowen.org, mel@csn.ul.ie, marcelo@kvack.org,
	anton@samba.org, paulmck@us.ibm.com,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC][PATCH] tracking dirty pages in shared mappings
Date: Sat, 06 May 2006 23:18:35 +1000
Message-ID: <445CA22B.8030807@cyberone.com.au>
In-Reply-To: <1146861313.3561.13.camel@lappy>

Peter Zijlstra wrote:

>People expressed the need to track dirty pages in shared mappings.
>Linus outlined the general idea of doing that through making clean
>writable pages write-protected and taking the write fault.
>
>This patch does exactly that, it makes pages in a shared writable
>mapping write-protected. On write-fault the pages are marked dirty and
>made writable. When the pages get synced with their backing store, the
>write-protection is re-instated.
>
>It survives a simple test and shows the dirty pages in /proc/vmstat.
>
>Comments?
>

Looks pretty good. Christoph and I were looking at ways to reduce the
performance impact of this, and skipping the extra work for particular
(e.g. shmem) mappings might be a good idea?
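
To make the idea concrete, here is a rough userspace toy model of the
cycle (write-protect, fault, dirty, writeback, re-protect) with the skip
for mappings that do not account dirty pages. It is only a sketch: the
toy_* names and structs are made up for illustration, the account_dirty
flag merely stands in for a check like mapping_cap_account_dirty(), and
the hardware pte dirty bit is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
struct toy_mapping {
	bool account_dirty;	/* stands in for mapping_cap_account_dirty() */
	unsigned long nr_dirty;	/* dirty pages accounted to this mapping */
};

struct toy_page {
	struct toy_mapping *mapping;
	bool dirty;
};

struct toy_pte {
	bool writable;
};

/* Map a page into a shared writable vma: write-protect only if accounted. */
static void toy_map_page(struct toy_pte *pte, const struct toy_page *page)
{
	pte->writable = !page->mapping->account_dirty;
}

/* Mark the page dirty once and bump the counter for accounted mappings. */
static void toy_set_page_dirty(struct toy_page *page)
{
	if (page->dirty)
		return;
	page->dirty = true;
	if (page->mapping->account_dirty)
		page->mapping->nr_dirty++;
}

/* Store through the pte. Returns true if a write fault had to be taken. */
static bool toy_write(struct toy_pte *pte, struct toy_page *page)
{
	if (pte->writable)
		return false;		/* fast path: no fault, no extra work */
	toy_set_page_dirty(page);
	pte->writable = true;
	return true;
}

/* Writeback: clean the page and re-arm the write fault if accounted. */
static bool toy_clear_dirty_for_io(struct toy_pte *pte, struct toy_page *page)
{
	if (!page->dirty)
		return false;
	page->dirty = false;
	if (page->mapping->account_dirty) {
		assert(page->mapping->nr_dirty > 0);
		page->mapping->nr_dirty--;
		pte->writable = false;
	}
	return true;
}
```

In this model a page in an accounted mapping takes one fault per
clean-to-dirty transition, while a page in an unaccounted (shmem-like)
mapping is mapped writable from the start and never takes the extra
fault.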

Attached is a patch with a couple of things I've currently got.

In the long run, I'd like to be able to set_page_dirty and
balance_dirty_pages outside of both ptl and mmap_sem, for performance
reasons. That will require a reworking of arch code though :(
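
The shape I have in mind is roughly "decide under the lock, do the dirty
work after dropping it". A minimal sketch of that pattern, assuming a
single toy lock in place of ptl/mmap_sem; the sketch_* names are
hypothetical and nothing here is an existing kernel interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel interfaces. */
struct sketch_lock { bool held; };
struct sketch_page { bool dirty; };

static void sketch_lock_acquire(struct sketch_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* The expensive part: in this sketch it must run with the lock dropped. */
static void sketch_set_page_dirty(struct sketch_lock *ptl,
				  struct sketch_page *page)
{
	assert(!ptl->held);
	page->dirty = true;
}

/* Returns true if this write fault dirtied the page. */
static bool sketch_write_fault(struct sketch_lock *ptl,
			       struct sketch_page *page, bool *pte_writable)
{
	struct sketch_page *dirty_page = NULL;

	sketch_lock_acquire(ptl);
	if (!*pte_writable) {
		*pte_writable = true;
		/*
		 * Only remember the page here. Real code would also need
		 * to pin it so it cannot go away once the lock is dropped.
		 */
		dirty_page = page;
	}
	sketch_lock_release(ptl);

	if (dirty_page)
		sketch_set_page_dirty(ptl, dirty_page);
	return dirty_page != NULL;
}
```

The pte update stays under the lock; only the dirtying (and, by
extension, any throttling) moves out from under it.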



[-- Attachment #2: mm-track-dirty-mmap-fixes.patch --]
[-- Type: text/plain, Size: 6983 bytes --]

Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c	2006-05-06 23:05:10.000000000 +1000
+++ linux-2.6/mm/memory.c	2006-05-06 23:13:16.000000000 +1000
@@ -48,6 +48,7 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/backing-dev.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -1466,18 +1467,6 @@ static int do_wp_page(struct mm_struct *
 		}
 	}
 
-	if (VM_SharedWritable(vma)) {
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = pte_mkyoung(orig_pte);
-		entry = pte_mkwrite(pte_mkdirty(entry));
-		ptep_set_access_flags(vma, address, page_table, entry, 1);
-		update_mmu_cache(vma, address, entry);
-		lazy_mmu_prot_update(entry);
-		ret |= VM_FAULT_WRITE;
-		set_page_dirty(old_page);
-		goto unlock;
-	}
-
 	/*
 	 * Ok, we need to copy. Oh, well..
 	 */
@@ -2131,8 +2120,11 @@ retry:
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		else if (VM_SharedWritable(vma))
-			entry = pte_wrprotect(entry);
+		else if (VM_SharedWritable(vma)) {
+			struct address_space *mapping = page_mapping(new_page);
+			if (mapping && mapping_cap_account_dirty(mapping))
+				entry = pte_wrprotect(entry);
+		}
 		set_pte_at(mm, address, page_table, entry);
 		if (anon) {
 			inc_mm_counter(mm, anon_rss);
@@ -2241,12 +2233,22 @@ static inline int handle_pte_fault(struc
 	if (unlikely(!pte_same(*pte, entry)))
 		goto unlock;
 	if (write_access) {
-		if (!pte_write(entry))
-			return do_wp_page(mm, vma, address,
-					pte, pmd, ptl, entry);
+		if (!pte_write(entry)) {
+			if (!VM_SharedWritable(vma)) {
+				return do_wp_page(mm, vma, address,
+						pte, pmd, ptl, entry);
+			} else {
+				struct page *page;
+				entry = pte_mkwrite(entry);
+				page = vm_normal_page(vma, address, entry);
+				if (page)
+					set_page_dirty(page);
+			}
+		}
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
+
 	if (!pte_same(old_entry, entry)) {
 		ptep_set_access_flags(vma, address, pte, entry, write_access);
 		update_mmu_cache(vma, address, entry);
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2006-05-06 23:05:10.000000000 +1000
+++ linux-2.6/include/linux/mm.h	2006-05-06 23:06:17.000000000 +1000
@@ -183,8 +183,7 @@ extern unsigned int kobjsize(const void 
 #define VM_SequentialReadHint(v)	((v)->vm_flags & VM_SEQ_READ)
 #define VM_RandomReadHint(v)		((v)->vm_flags & VM_RAND_READ)
 
-#define VM_SharedWritable(v)		(((v)->vm_flags & (VM_SHARED | VM_MAYSHARE)) && \
-					 ((v)->vm_flags & VM_WRITE))
+#define VM_SharedWritable(v)		(((v)->vm_flags & (VM_SHARED|VM_WRITE)) == (VM_SHARED|VM_WRITE))
 
 /*
  * mapping from the currently active vm_flags protection bits (the
@@ -724,7 +723,7 @@ static inline void unmap_shared_mapping_
 
 extern int vmtruncate(struct inode * inode, loff_t offset);
 extern int vmtruncate_range(struct inode * inode, loff_t offset, loff_t end);
-extern int install_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot, int wrprotect);
+extern int install_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot);
 extern int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long pgoff, pgprot_t prot);
 
 #ifdef CONFIG_MMU
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2006-05-06 23:05:10.000000000 +1000
+++ linux-2.6/mm/filemap.c	2006-05-06 23:06:17.000000000 +1000
@@ -1582,8 +1582,7 @@ repeat:
 		return -ENOMEM;
 
 	if (page) {
-		err = install_page(mm, vma, addr, page, prot,
-				VM_SharedWritable(vma));
+		err = install_page(mm, vma, addr, page, prot);
 		if (err) {
 			page_cache_release(page);
 			return err;
Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c	2006-05-06 23:05:10.000000000 +1000
+++ linux-2.6/mm/fremap.c	2006-05-06 23:07:17.000000000 +1000
@@ -15,6 +15,7 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
+#include <linux/backing-dev.h>
 
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>
@@ -49,8 +50,7 @@ static int zap_pte(struct mm_struct *mm,
  * previously existing mapping.
  */
 int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long addr, struct page *page, pgprot_t prot,
-		int wrprotect)
+		unsigned long addr, struct page *page, pgprot_t prot)
 {
 	struct inode *inode;
 	pgoff_t size;
@@ -81,8 +81,11 @@ int install_page(struct mm_struct *mm, s
 
 	flush_icache_page(vma, page);
 	pte_val = mk_pte(page, prot);
-	if (wrprotect)
-		pte_val = pte_wrprotect(pte_val);
+	if (VM_SharedWritable(vma)) {
+		struct address_space *mapping = page_mapping(page);
+		if (mapping && mapping_cap_account_dirty(mapping))
+			pte_val = pte_wrprotect(pte_val);
+	}
 	set_pte_at(mm, addr, pte, pte_val);
 	page_add_file_rmap(page);
 	update_mmu_cache(vma, addr, pte_val);
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c	2006-05-06 23:05:10.000000000 +1000
+++ linux-2.6/mm/shmem.c	2006-05-06 23:06:17.000000000 +1000
@@ -1270,7 +1270,7 @@ static int shmem_populate(struct vm_area
 		/* Page may still be null, but only if nonblock was set. */
 		if (page) {
 			mark_page_accessed(page);
-			err = install_page(mm, vma, addr, page, prot, 0);
+			err = install_page(mm, vma, addr, page, prot);
 			if (err) {
 				page_cache_release(page);
 				return err;
Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c	2006-05-06 23:05:10.000000000 +1000
+++ linux-2.6/mm/page-writeback.c	2006-05-06 23:06:28.000000000 +1000
@@ -29,6 +29,7 @@
 #include <linux/sysctl.h>
 #include <linux/cpu.h>
 #include <linux/syscalls.h>
+#include <linux/rmap.h>
 
 /*
  * The maximum number of pages to writeout in a single bdflush/kupdate
@@ -725,9 +726,10 @@ int test_clear_page_dirty(struct page *p
 						page_index(page),
 						PAGECACHE_TAG_DIRTY);
 			write_unlock_irqrestore(&mapping->tree_lock, flags);
-			page_wrprotect(page);
-			if (mapping_cap_account_dirty(mapping))
+			if (mapping_cap_account_dirty(mapping)) {
+				page_wrprotect(page);
 				dec_page_state(nr_dirty);
+			}
 			return 1;
 		}
 		write_unlock_irqrestore(&mapping->tree_lock, flags);
@@ -757,9 +759,10 @@ int clear_page_dirty_for_io(struct page 
 
 	if (mapping) {
 		if (TestClearPageDirty(page)) {
-			page_wrprotect(page);
-			if (mapping_cap_account_dirty(mapping))
+			if (mapping_cap_account_dirty(mapping)) {
+				page_wrprotect(page);
 				dec_page_state(nr_dirty);
+			}
 			return 1;
 		}
 		return 0;
