From: Mel Gorman <mgorman@suse.de>
To: Linux-MM <linux-mm@kvack.org>
Cc: Linux-FSDevel <linux-fsdevel@vger.kernel.org>
Subject: [PATCH 15/16] mm: Non-atomically mark page accessed in write_begin where possible
Date: Fri, 18 Apr 2014 15:50:42 +0100
Message-ID: <1397832643-14275-16-git-send-email-mgorman@suse.de>
In-Reply-To: <1397832643-14275-1-git-send-email-mgorman@suse.de>

aops->write_begin may allocate a new page and make it visible just to
have mark_page_accessed called almost immediately after. Once the page
is visible, atomic operations are necessary, which is noticeable
overhead when writing to an in-memory filesystem like tmpfs, but it
should also be noticeable with fast storage.
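
Roughly, the pre-patch write path looks like this (simplified from
generic_perform_write() in mm/filemap.c; the hunk below deletes the
mark_page_accessed() call):

	status = a_ops->write_begin(file, mapping, pos, bytes, flags,
					&page, &fsdata);
	/* the page is visible in the page cache from here on */
	copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
	flush_dcache_page(page);
	mark_page_accessed(page);	/* atomic op on a visible page */
	status = a_ops->write_end(file, mapping, pos, bytes, copied,
					page, fsdata);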

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin() or find_or_create_page() for the initial
allocation of a page cache page. This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
be called before the page is visible, and so can be done non-atomically.
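
The helper itself is tiny; as the mm/swap.c hunk below shows, for a
page that nothing else can yet reference it reduces to a non-atomic
flag set:

	void init_page_accessed(struct page *page)
	{
		if (!PageReferenced(page))
			__SetPageReferenced(page);
	}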

In this patch, new allocations in grab_cache_page_write_begin() or
find_or_create_page() use init_page_accessed() and existing pages use
mark_page_accessed().

This places a burden on filesystems: they need to ensure they either
use these helpers or update the helpers they do use to call
init_page_accessed() or mark_page_accessed() as appropriate. There is
also a snag in that the timing of the mark_page_accessed() has now
changed, so in rare cases it is possible for a page to reach the end of
the LRU as PageReferenced whereas previously it might have been
repromoted. This is expected to be rare, but it is worth the filesystem
people thinking about in case they see a problem with the timing
change.
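
As a sketch of what that means in practice, a filesystem whose
write_begin allocates pages through its own path would follow the same
pattern as the shmem hunk below. myfs_write_begin() and myfs_getpage()
are hypothetical stand-ins for the fs-specific lookup, not part of this
patch:

	static int myfs_write_begin(struct file *file,
			struct address_space *mapping,
			loff_t pos, unsigned len, unsigned flags,
			struct page **pagep, void **fsdata)
	{
		pgoff_t index = pos >> PAGE_CACHE_SHIFT;
		/* hypothetical fs-specific lookup, cf. shmem_getpage() */
		int ret = myfs_getpage(mapping->host, index, pagep);

		/* non-atomic init instead of mark_page_accessed(); see
		 * init_page_accessed() in the mm/swap.c hunk */
		if (*pagep)
			init_page_accessed(*pagep);
		return ret;
	}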

In a profiled run measuring dd to tmpfs, the overhead of mark_page_accessed was

25142     0.7055  vmlinux-3.15.0-rc1-vanilla vmlinux-3.15.0-rc1-vanilla shmem_write_end
107830    3.0256  vmlinux-3.15.0-rc1-vanilla vmlinux-3.15.0-rc1-vanilla mark_page_accessed

3.73% overall. With the patch applied, it becomes

118185    3.1712  vmlinux-3.15.0-rc1-microopt-v1r11 vmlinux-3.15.0-rc1-microopt-v1r11 shmem_write_end
2395      0.0643  vmlinux-3.15.0-rc1-microopt-v1r11 vmlinux-3.15.0-rc1-microopt-v1r11 init_page_accessed
159       0.0043  vmlinux-3.15.0-rc1-microopt-v1r11 vmlinux-3.15.0-rc1-microopt-v1r11 mark_page_accessed

3.23% overall. shmem_write_end increases in apparent cost because its
SetPageUptodate now writes to a cache line that mark_page_accessed has
not already dirtied for it. Even with that taken into account, there
are still fewer atomic operations overall.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/page-flags.h |  1 +
 include/linux/swap.h       |  1 +
 mm/filemap.c               | 55 +++++++++++++++++++++++++++-------------------
 mm/shmem.c                 |  6 ++++-
 mm/swap.c                  | 11 ++++++++++
 5 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4d4b39a..2093eb7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -198,6 +198,7 @@ struct page;	/* forward declaration */
 TESTPAGEFLAG(Locked, locked)
 PAGEFLAG(Error, error) TESTCLEARFLAG(Error, error)
 PAGEFLAG(Referenced, referenced) TESTCLEARFLAG(Referenced, referenced)
+	__SETPAGEFLAG(Referenced, referenced)
 PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty)
 PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru)
 PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4a9ac85..e54312d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -314,6 +314,7 @@ extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			 struct lruvec *lruvec, struct list_head *head);
 extern void activate_page(struct page *);
 extern void mark_page_accessed(struct page *);
+extern void init_page_accessed(struct page *page);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
diff --git a/mm/filemap.c b/mm/filemap.c
index a82fbe4..c28f69c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1059,24 +1059,31 @@ struct page *find_or_create_page(struct address_space *mapping,
 	int err;
 repeat:
 	page = find_lock_page(mapping, index);
-	if (!page) {
-		page = __page_cache_alloc(gfp_mask);
-		if (!page)
-			return NULL;
-		/*
-		 * We want a regular kernel memory (not highmem or DMA etc)
-		 * allocation for the radix tree nodes, but we need to honour
-		 * the context-specific requirements the caller has asked for.
-		 * GFP_RECLAIM_MASK collects those requirements.
-		 */
-		err = add_to_page_cache_lru(page, mapping, index,
-			(gfp_mask & GFP_RECLAIM_MASK));
-		if (unlikely(err)) {
-			page_cache_release(page);
-			page = NULL;
-			if (err == -EEXIST)
-				goto repeat;
-		}
+	if (page) {
+		mark_page_accessed(page);
+		return page;
+	}
+
+	page = __page_cache_alloc(gfp_mask);
+	if (!page)
+		return NULL;
+
+	/* Init accessed to avoid an atomic mark_page_accessed later */
+	init_page_accessed(page);
+
+	/*
+	 * We want a regular kernel memory (not highmem or DMA etc)
+	 * allocation for the radix tree nodes, but we need to honour
+	 * the context-specific requirements the caller has asked for.
+	 * GFP_RECLAIM_MASK collects those requirements.
+	 */
+	err = add_to_page_cache_lru(page, mapping, index,
+		(gfp_mask & GFP_RECLAIM_MASK));
+	if (unlikely(err)) {
+		page_cache_release(page);
+		page = NULL;
+		if (err == -EEXIST)
+			goto repeat;
 	}
 	return page;
 }
@@ -2372,7 +2379,6 @@ int pagecache_write_end(struct file *file, struct address_space *mapping,
 {
 	const struct address_space_operations *aops = mapping->a_ops;
 
-	mark_page_accessed(page);
 	return aops->write_end(file, mapping, pos, len, copied, page, fsdata);
 }
 EXPORT_SYMBOL(pagecache_write_end);
@@ -2466,12 +2472,18 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping,
 		gfp_notmask = __GFP_FS;
 repeat:
 	page = find_lock_page(mapping, index);
-	if (page)
+	if (page) {
+		mark_page_accessed(page);
 		goto found;
+	}
 
 	page = __page_cache_alloc(gfp_mask & ~gfp_notmask);
 	if (!page)
 		return NULL;
+
+	/* Init accessed to avoid an atomic mark_page_accessed later */
+	init_page_accessed(page);
+
 	status = add_to_page_cache_lru(page, mapping, index,
 						GFP_KERNEL & ~gfp_notmask);
 	if (unlikely(status)) {
@@ -2530,7 +2542,7 @@ again:
 
 		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
 						&page, &fsdata);
-		if (unlikely(status))
+		if (unlikely(status < 0))
 			break;
 
 		if (mapping_writably_mapped(mapping))
@@ -2539,7 +2551,6 @@ again:
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 		flush_dcache_page(page);
 
-		mark_page_accessed(page);
 		status = a_ops->write_end(file, mapping, pos, bytes, copied,
 						page, fsdata);
 		if (unlikely(status < 0))
diff --git a/mm/shmem.c b/mm/shmem.c
index f47fb38..700a4ad 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1372,9 +1372,13 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata)
 {
+	int ret;
 	struct inode *inode = mapping->host;
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
-	return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL);
+	ret = shmem_getpage(inode, index, pagep, SGP_WRITE, NULL);
+	if (*pagep)
+		init_page_accessed(*pagep);
+	return ret;
 }
 
 static int
diff --git a/mm/swap.c b/mm/swap.c
index fed4caf..2490dfe 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -583,6 +583,17 @@ void mark_page_accessed(struct page *page)
 EXPORT_SYMBOL(mark_page_accessed);
 
 /*
+ * Used in place of the first mark_page_accessed() on a page that is
+ * not yet visible, while it is still safe to use non-atomic ops
+ */
+void init_page_accessed(struct page *page)
+{
+	if (!PageReferenced(page))
+		__SetPageReferenced(page);
+}
+EXPORT_SYMBOL(init_page_accessed);
+
+/*
  * Queue the page for addition to the LRU via pagevec. The decision on whether
  * to add the page to the [in]active [file|anon] list is deferred until the
  * pagevec is drained. This gives a chance for the caller of __lru_cache_add()
-- 
1.8.4.5
