From: Wu Fengguang <fengguang.wu@intel.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Ben Gamari <bgamari.foss@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Nick Piggin <npiggin@kernel.dk>, Mel Gorman <mel@csn.ul.ie>
Subject: Re: [PATCH v2 1/3] deactivate invalidated pages
Date: Mon, 29 Nov 2010 15:49:54 +0800 [thread overview]
Message-ID: <20101129074954.GB22803@localhost> (raw)
In-Reply-To: <7b50614882592047dfd96f6ca2bb2d0baa8f5367.1290956059.git.minchan.kim@gmail.com>
On Sun, Nov 28, 2010 at 11:02:55PM +0800, Minchan Kim wrote:
> This patch is based on mmotm-11-23.
I cannot find __pagevec_lru_deactive() in mmotm-11-23.
Do you have any more patches?
> Recently, there are reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens by backup workloads(ex, nightly rsync).
> That's because the workload makes just use-once pages
> and touches pages twice. It promotes the page into
> active list so that it results in working set page eviction.
>
> Some app developer want to support POSIX_FADV_NOREUSE.
> But other OSes don't support it, either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>
> By Other approach, app developer uses POSIX_FADV_DONTNEED.
> But it has a problem. If kernel meets page is writing
> during invalidate_mapping_pages, it can't work.
> It is very hard for application programmer to use it.
> Because they always have to sync data before calling
> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could
> be discardable. At last, they can't use deferred write of kernel
> so that they could see performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
>
> In fact, invalidation is very big hint to reclaimer.
> It means we don't use the page any more. So let's move
> the writing page into inactive list's head.
>
> Why I need the page to head, Dirty/Writeback page would be flushed
> sooner or later. This patch uses trick PG_reclaim so the page would
> be moved into tail of inactive list when the page writeout completes.
>
> It can prevent writeout of pageout which is less effective than
> flusher's writeout.
>
> This patch considers page_mappged(page) with working set.
> So the page could leave head of inactive to get a change to activate.
>
> Originally, I reused lru_demote of Peter with some change so added
> his Signed-off-by.
>
> Note :
> PG_reclaim trick of writeback page could race with end_page_writeback
> so this patch check PageWriteback one more. It makes race window time
> reall small. But by theoretical, it still have a race. But it's a trivial.
>
> Quote from fe3cba17 and some modification
> "If some page PG_reclaim unintentionally, it will confuse readahead and
> make it restart the size rampup process. But it's a trivial problem, and
> can mostly be avoided by checking PageWriteback(page) first in readahead"
>
> PG_reclaim trick of dirty page don't work now since clear_page_dirty_for_io
> always clears PG_reclaim. Next patch will fix it.
>
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
>
> Changelog since v1:
> - modify description
> - correct typo
> - add some comment
> - change deactivation policy
> ---
> mm/swap.c | 84 +++++++++++++++++++++++++++++++++++++++++++++---------------
> 1 files changed, 63 insertions(+), 21 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
> spin_unlock_irq(&zone->lru_lock);
> }
>
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into inative list's head or tail to reclaim ASAP and evict
> + * working set page.
> + *
> + * PG_reclaim means when the page's writeback completes, the page
> + * will move into tail of inactive for reclaiming ASAP.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
> {
> - int i, lru, file;
> + int lru, file;
> + int active = 0;
> +
> + if (!PageLRU(page))
> + return;
> +
> + if (PageActive(page))
> + active = 1;
> + /* Some processes are using the page */
> + if (page_mapped(page) && !active)
> + return;
It's good to check such protections if doing heuristic demotion.
However if requested explicitly by the user, I'm _much more_ inclined
to act stupid&dumb and meet the user's expectation. Or will this code
be called by someone other than DONTNEED? Sorry I have no context of
the full code.
> + else if (PageWriteback(page)) {
> + SetPageReclaim(page);
> + /* Check race with end_page_writeback */
> + if (!PageWriteback(page))
> + ClearPageReclaim(page);
Does the double check help a lot?
> + } else if (PageDirty(page))
> + SetPageReclaim(page);
Typically there are much more dirty pages than writeback pages.
I guess it's a good place to call bdi_start_inode_writeback() which
was posted here: http://www.spinics.net/lists/linux-mm/msg10833.html
Thanks,
Fengguang
> +
> + file = page_is_file_cache(page);
> + lru = page_lru_base_type(page);
> + del_page_from_lru_list(zone, page, lru + active);
> + ClearPageActive(page);
> + ClearPageReferenced(page);
> + add_page_to_lru_list(zone, page, lru);
> + if (active)
> + __count_vm_event(PGDEACTIVATE);
> +
> + update_page_reclaim_stat(zone, page, file, 0);
> +}
>
> +/*
> + * This function must be called with preemption disable.
> + */
> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
> +{
> + int i;
> struct zone *zone = NULL;
>
> for (i = 0; i < pagevec_count(pvec); i++) {
> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
> zone = pagezone;
> spin_lock_irq(&zone->lru_lock);
> }
> -
> - if (PageLRU(page)) {
> - if (PageActive(page)) {
> - file = page_is_file_cache(page);
> - lru = page_lru_base_type(page);
> - del_page_from_lru_list(zone, page,
> - lru + LRU_ACTIVE);
> - ClearPageActive(page);
> - ClearPageReferenced(page);
> - add_page_to_lru_list(zone, page, lru);
> - __count_vm_event(PGDEACTIVATE);
> -
> - update_page_reclaim_stat(zone, page, file, 0);
> - }
> - }
> + __lru_deactivate(page, zone);
> }
> if (zone)
> spin_unlock_irq(&zone->lru_lock);
> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>
> pvec = &per_cpu(lru_deactivate_pvecs, cpu);
> if (pagevec_count(pvec))
> - __pagevec_lru_deactive(pvec);
> + __pagevec_lru_deactivate(pvec);
> }
>
> /*
> - * Forecfully demote a page to the tail of the inactive list.
> + * Forcefully deactivate a page.
> + * This function is used for reclaiming the page ASAP when the page
> + * can't be invalidated by Dirty/Writeback.
> */
> void lru_deactivate_page(struct page *page)
> {
> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
> struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>
> if (!pagevec_add(pvec, page))
> - __pagevec_lru_deactive(pvec);
> + __pagevec_lru_deactivate(pvec);
> put_cpu_var(lru_deactivate_pvecs);
> }
> }
>
> -
> void lru_add_drain(void)
> {
> drain_cpu_pagevecs(get_cpu());
> --
> 1.7.0.4
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Ben Gamari <bgamari.foss@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Nick Piggin <npiggin@kernel.dk>, Mel Gorman <mel@csn.ul.ie>
Subject: Re: [PATCH v2 1/3] deactivate invalidated pages
Date: Mon, 29 Nov 2010 15:49:54 +0800 [thread overview]
Message-ID: <20101129074954.GB22803@localhost> (raw)
In-Reply-To: <7b50614882592047dfd96f6ca2bb2d0baa8f5367.1290956059.git.minchan.kim@gmail.com>
On Sun, Nov 28, 2010 at 11:02:55PM +0800, Minchan Kim wrote:
> This patch is based on mmotm-11-23.
I cannot find __pagevec_lru_deactive() in mmotm-11-23.
Do you have any more patches?
> Recently, there are reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens by backup workloads(ex, nightly rsync).
> That's because the workload makes just use-once pages
> and touches pages twice. It promotes the page into
> active list so that it results in working set page eviction.
>
> Some app developer want to support POSIX_FADV_NOREUSE.
> But other OSes don't support it, either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>
> By Other approach, app developer uses POSIX_FADV_DONTNEED.
> But it has a problem. If kernel meets page is writing
> during invalidate_mapping_pages, it can't work.
> It is very hard for application programmer to use it.
> Because they always have to sync data before calling
> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could
> be discardable. At last, they can't use deferred write of kernel
> so that they could see performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
>
> In fact, invalidation is very big hint to reclaimer.
> It means we don't use the page any more. So let's move
> the writing page into inactive list's head.
>
> Why I need the page to head, Dirty/Writeback page would be flushed
> sooner or later. This patch uses trick PG_reclaim so the page would
> be moved into tail of inactive list when the page writeout completes.
>
> It can prevent writeout of pageout which is less effective than
> flusher's writeout.
>
> This patch considers page_mappged(page) with working set.
> So the page could leave head of inactive to get a change to activate.
>
> Originally, I reused lru_demote of Peter with some change so added
> his Signed-off-by.
>
> Note :
> PG_reclaim trick of writeback page could race with end_page_writeback
> so this patch check PageWriteback one more. It makes race window time
> reall small. But by theoretical, it still have a race. But it's a trivial.
>
> Quote from fe3cba17 and some modification
> "If some page PG_reclaim unintentionally, it will confuse readahead and
> make it restart the size rampup process. But it's a trivial problem, and
> can mostly be avoided by checking PageWriteback(page) first in readahead"
>
> PG_reclaim trick of dirty page don't work now since clear_page_dirty_for_io
> always clears PG_reclaim. Next patch will fix it.
>
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
>
> Changelog since v1:
> - modify description
> - correct typo
> - add some comment
> - change deactivation policy
> ---
> mm/swap.c | 84 +++++++++++++++++++++++++++++++++++++++++++++---------------
> 1 files changed, 63 insertions(+), 21 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
> spin_unlock_irq(&zone->lru_lock);
> }
>
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into inative list's head or tail to reclaim ASAP and evict
> + * working set page.
> + *
> + * PG_reclaim means when the page's writeback completes, the page
> + * will move into tail of inactive for reclaiming ASAP.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
> {
> - int i, lru, file;
> + int lru, file;
> + int active = 0;
> +
> + if (!PageLRU(page))
> + return;
> +
> + if (PageActive(page))
> + active = 1;
> + /* Some processes are using the page */
> + if (page_mapped(page) && !active)
> + return;
It's good to check such protections if doing heuristic demotion.
However if requested explicitly by the user, I'm _much more_ inclined
to act stupid&dumb and meet the user's expectation. Or will this code
be called by someone other than DONTNEED? Sorry I have no context of
the full code.
> + else if (PageWriteback(page)) {
> + SetPageReclaim(page);
> + /* Check race with end_page_writeback */
> + if (!PageWriteback(page))
> + ClearPageReclaim(page);
Does the double check help a lot?
> + } else if (PageDirty(page))
> + SetPageReclaim(page);
Typically there are much more dirty pages than writeback pages.
I guess it's a good place to call bdi_start_inode_writeback() which
was posted here: http://www.spinics.net/lists/linux-mm/msg10833.html
Thanks,
Fengguang
> +
> + file = page_is_file_cache(page);
> + lru = page_lru_base_type(page);
> + del_page_from_lru_list(zone, page, lru + active);
> + ClearPageActive(page);
> + ClearPageReferenced(page);
> + add_page_to_lru_list(zone, page, lru);
> + if (active)
> + __count_vm_event(PGDEACTIVATE);
> +
> + update_page_reclaim_stat(zone, page, file, 0);
> +}
>
> +/*
> + * This function must be called with preemption disable.
> + */
> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
> +{
> + int i;
> struct zone *zone = NULL;
>
> for (i = 0; i < pagevec_count(pvec); i++) {
> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
> zone = pagezone;
> spin_lock_irq(&zone->lru_lock);
> }
> -
> - if (PageLRU(page)) {
> - if (PageActive(page)) {
> - file = page_is_file_cache(page);
> - lru = page_lru_base_type(page);
> - del_page_from_lru_list(zone, page,
> - lru + LRU_ACTIVE);
> - ClearPageActive(page);
> - ClearPageReferenced(page);
> - add_page_to_lru_list(zone, page, lru);
> - __count_vm_event(PGDEACTIVATE);
> -
> - update_page_reclaim_stat(zone, page, file, 0);
> - }
> - }
> + __lru_deactivate(page, zone);
> }
> if (zone)
> spin_unlock_irq(&zone->lru_lock);
> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>
> pvec = &per_cpu(lru_deactivate_pvecs, cpu);
> if (pagevec_count(pvec))
> - __pagevec_lru_deactive(pvec);
> + __pagevec_lru_deactivate(pvec);
> }
>
> /*
> - * Forecfully demote a page to the tail of the inactive list.
> + * Forcefully deactivate a page.
> + * This function is used for reclaiming the page ASAP when the page
> + * can't be invalidated by Dirty/Writeback.
> */
> void lru_deactivate_page(struct page *page)
> {
> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
> struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>
> if (!pagevec_add(pvec, page))
> - __pagevec_lru_deactive(pvec);
> + __pagevec_lru_deactivate(pvec);
> put_cpu_var(lru_deactivate_pvecs);
> }
> }
>
> -
> void lru_add_drain(void)
> {
> drain_cpu_pagevecs(get_cpu());
> --
> 1.7.0.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-11-29 7:50 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
2010-11-28 15:02 ` Minchan Kim
2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
2010-11-28 15:02 ` Minchan Kim
2010-11-29 4:25 ` Rik van Riel
2010-11-29 4:25 ` Rik van Riel
2010-11-29 7:29 ` Wu Fengguang
2010-11-29 7:29 ` Wu Fengguang
2010-11-29 8:16 ` Minchan Kim
2010-11-29 8:16 ` Minchan Kim
2010-11-29 8:26 ` Wu Fengguang
2010-11-29 8:26 ` Wu Fengguang
2010-11-28 15:02 ` [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed Minchan Kim
2010-11-28 15:02 ` Minchan Kim
2010-11-29 4:28 ` Rik van Riel
2010-11-29 4:28 ` Rik van Riel
2010-11-29 4:30 ` Minchan Kim
2010-11-29 4:30 ` Minchan Kim
2010-11-28 15:13 ` [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
2010-11-28 15:13 ` Minchan Kim
2010-11-28 19:02 ` Rik van Riel
2010-11-28 19:02 ` Rik van Riel
2010-11-29 0:33 ` KOSAKI Motohiro
2010-11-29 0:33 ` KOSAKI Motohiro
2010-11-29 1:58 ` Ben Gamari
2010-11-29 1:58 ` Ben Gamari
2010-11-29 2:13 ` KOSAKI Motohiro
2010-11-29 2:13 ` KOSAKI Motohiro
2010-11-29 2:26 ` Ben Gamari
2010-11-29 2:26 ` Ben Gamari
2010-11-29 2:13 ` Minchan Kim
2010-11-29 2:13 ` Minchan Kim
2010-11-29 2:35 ` KOSAKI Motohiro
2010-11-29 2:35 ` KOSAKI Motohiro
2010-11-29 7:49 ` Wu Fengguang [this message]
2010-11-29 7:49 ` Wu Fengguang
2010-11-29 8:09 ` Minchan Kim
2010-11-29 8:09 ` Minchan Kim
2010-11-29 12:07 ` Mel Gorman
2010-11-29 12:07 ` Mel Gorman
2010-11-29 15:28 ` Minchan Kim
2010-11-29 15:28 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101129074954.GB22803@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=bgamari.foss@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=minchan.kim@gmail.com \
--cc=npiggin@kernel.dk \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.