From: Andrew Morton <akpm@linux-foundation.org>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Nick Piggin <npiggin@kernel.dk>
Subject: Re: [RFC 1/2] deactive invalidated pages
Date: Mon, 22 Nov 2010 14:14:49 -0800
Message-ID: <20101122141449.9de58a2c.akpm@linux-foundation.org>
In-Reply-To: <bdd6628e81c06f6871983c971d91160fca3f8b5e.1290349672.git.minchan.kim@gmail.com>

On Sun, 21 Nov 2010 23:30:23 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> Recently, a thrashing problem was reported.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It is triggered by backup workloads (e.g. a nightly rsync).
> Such a workload creates use-once pages but touches each
> page twice. That promotes the pages to the active list,
> which in turn evicts working set pages.
>
> Some application developers want POSIX_FADV_NOREUSE to be
> supported, but other OSes don't support it either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>
> As an alternative, application developers use POSIX_FADV_DONTNEED,
> but it has a problem: if the kernel finds a page that is being
> written during invalidate_mapping_pages, it cannot invalidate it.
> This makes the hint very hard for application programmers to use,
> because they always have to sync the data before calling
> fadvise(..POSIX_FADV_DONTNEED) to make sure the pages are
> discardable. As a result, they cannot use the kernel's deferred
> writes and may see a performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
>
> In fact, invalidation is a very strong hint to the reclaimer:
> it means we won't use the page any more. So let's move
> the page being written to the head of the inactive list.
>
> If the page really is part of the working set, it has enough time
> to be activated again, since we always try to keep many pages on
> the inactive list.
>
> I reuse Peter's lru_demote with some changes.
>
>
> ...
>
> +/*
> + * Function used to forecefully demote a page to the head of the inactive
> + * list.
> + */

This comment is wrong?  The page gets moved to the _tail_ of the
inactive list?

> +void lru_deactive_page(struct page *page)

Should be "deactivate" throughout the patch, IMO.

> +{
> +	if (likely(get_page_unless_zero(page))) {
> +		struct pagevec *pvec = &get_cpu_var(lru_deactive_pvecs);
> +
> +		if (!pagevec_add(pvec, page))
> +			__pagevec_lru_deactive(pvec);
> +		put_cpu_var(lru_deactive_pvecs);
> +	}
>  }
>
> +
>  void lru_add_drain(void)
>  {
>  	drain_cpu_pagevecs(get_cpu());
> diff --git a/mm/truncate.c b/mm/truncate.c
> index cd94607..c73fb19 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -332,7 +332,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>  {
>  	struct pagevec pvec;
>  	pgoff_t next = start;
> -	unsigned long ret = 0;
> +	unsigned long ret;
> +	unsigned long count = 0;
>  	int i;
>
>  	pagevec_init(&pvec, 0);
> @@ -359,8 +360,10 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>  			if (lock_failed)
>  				continue;
>
> -			ret += invalidate_inode_page(page);
> -
> +			ret = invalidate_inode_page(page);
> +			if (!ret)
> +				lru_deactive_page(page);

This is the core part of the patch and it needs a code comment to
explain the reasons for doing this.

I wonder about the page_mapped() case.  We were unable to invalidate
the page because it was mapped into pagetables.  But was it really
appropriate to deactivate the page in that case?

> +			count += ret;
>  			unlock_page(page);
>  			if (next > end)
>  				break;
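
To make the page_mapped() question concrete, here is a minimal userspace sketch of the patched loop body. It is not kernel code: `struct toy_page`, `toy_invalidate()` and `toy_invalidate_pages()` are hypothetical stand-ins, and the real invalidate_inode_page() has more failure conditions than the three modelled here. The point it illustrates is that a bare `if (!ret)` test cannot tell a dirty page from one that is merely mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the state this loop looks at. */
struct toy_page {
	bool dirty;
	bool writeback;
	bool mapped;
	bool deactivated;	/* set when the toy loop "moves" the page */
};

/* Toy model of invalidate_inode_page(): 1 if the page was dropped, else 0. */
static int toy_invalidate(const struct toy_page *p)
{
	if (p->dirty || p->writeback)
		return 0;	/* data still has to reach disk */
	if (p->mapped)
		return 0;	/* still mapped into some process's pagetables */
	return 1;
}

/* Mirrors the patched loop: deactivate every page that was not invalidated. */
static unsigned long toy_invalidate_pages(struct toy_page *pages, size_t n)
{
	unsigned long count = 0;
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = toy_invalidate(&pages[i]);

		if (!ret)
			pages[i].deactivated = true;	/* dirty and mapped alike */
		count += ret;
	}
	return count;
}
```

Running this over one dirty page, one mapped page and one clean unmapped page invalidates only the last and marks both of the others deactivated, which is the behaviour being questioned above. Under these assumptions, distinguishing the cases would need either a richer return value or an explicit mapped check before deactivating.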

Suggested updates:

 include/linux/swap.h |    2 +-
 mm/swap.c            |   13 ++++++-------
 mm/truncate.c        |    7 ++++++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff -puN include/linux/swap.h~mm-deactivate-invalidated-pages-fix include/linux/swap.h
--- a/include/linux/swap.h~mm-deactivate-invalidated-pages-fix
+++ a/include/linux/swap.h
@@ -213,7 +213,7 @@ extern void mark_page_accessed(struct pa
 extern void lru_add_drain(void);
 extern int lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
-extern void lru_deactive_page(struct page *page);
+extern void lru_deactivate_page(struct page *page);
 extern void swap_setup(void);
 extern void add_page_to_unevictable_list(struct page *page);
diff -puN mm/swap.c~mm-deactivate-invalidated-pages-fix mm/swap.c
--- a/mm/swap.c~mm-deactivate-invalidated-pages-fix
+++ a/mm/swap.c
@@ -39,7 +39,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
-static DEFINE_PER_CPU(struct pagevec, lru_deactive_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 /*
@@ -334,23 +334,22 @@ static void drain_cpu_pagevecs(int cpu)
 		local_irq_restore(flags);
 	}
-	pvec = &per_cpu(lru_deactive_pvecs, cpu);
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
 	if (pagevec_count(pvec))
 		__pagevec_lru_deactive(pvec);
 }
 /*
- * Function used to forecefully demote a page to the head of the inactive
- * list.
+ * Forcefully demote a page to the tail of the inactive list.
  */
-void lru_deactive_page(struct page *page)
+void lru_deactivate_page(struct page *page)
 {
 	if (likely(get_page_unless_zero(page))) {
-		struct pagevec *pvec = &get_cpu_var(lru_deactive_pvecs);
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
 		if (!pagevec_add(pvec, page))
 			__pagevec_lru_deactive(pvec);
-		put_cpu_var(lru_deactive_pvecs);
+		put_cpu_var(lru_deactivate_pvecs);
 	}
 }
diff -puN mm/truncate.c~mm-deactivate-invalidated-pages-fix mm/truncate.c
--- a/mm/truncate.c~mm-deactivate-invalidated-pages-fix
+++ a/mm/truncate.c
@@ -361,8 +361,13 @@ unsigned long invalidate_mapping_pages(s
 				continue;
 			ret = invalidate_inode_page(page);
+			/*
+			 * If the page was dirty or under writeback we cannot
+			 * invalidate it now.  Move it to the tail of the
+			 * inactive LRU so that reclaim will free it promptly.
+			 */
 			if (!ret)
-				lru_deactive_page(page);
+				lru_deactivate_page(page);
 			count += ret;
 			unlock_page(page);
 			if (next > end)
_
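
For reference, the userspace workaround that the quoted changelog describes (sync first, then POSIX_FADV_DONTNEED) might look roughly like the sketch below. `drop_file_cache()` is a hypothetical helper name, not something taken from rsync or from this patch; fdatasync() and posix_fadvise() are the standard POSIX calls. It assumes the caller wants the whole file dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write back a file's dirty pages, then hint that its
 * cached pages are no longer needed.  Returns 0 or an error number.
 */
static int drop_file_cache(int fd)
{
	assert(fd >= 0);

	/* Dirty or writeback pages cannot be invalidated, so flush first. */
	if (fdatasync(fd) != 0)
		return errno;

	/*
	 * offset 0 with len 0 covers the whole file.  posix_fadvise()
	 * returns the error number directly rather than setting errno.
	 */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

The fdatasync() call is the cost the changelog complains about: it forces synchronous writeback where the application would otherwise have relied on the kernel's deferred writes. The intent of the patch, as described above, is that pages which cannot be invalidated yet are at least moved to the inactive list, so callers would depend less on the explicit sync.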