From: Johannes Weiner <hannes@cmpxchg.org>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org, mel@csn.ul.ie,
akpm@linux-foundation.org
Subject: Re: [rfc] vmscan: serialize aggressive reclaimers
Date: Sat, 29 Nov 2008 16:39:04 +0100
Message-ID: <20081129153902.GA1944@cmpxchg.org>
In-Reply-To: <20081129164322.8131.KOSAKI.MOTOHIRO@jp.fujitsu.com>
On Sat, Nov 29, 2008 at 04:46:24PM +0900, KOSAKI Motohiro wrote:
> > Since we have to pull through a reclaim cycle once we committed to it,
> > what do you think about serializing the lower priority levels
> > completely?
> >
> > The idea is that when one reclaimer has done a low priority level
> > iteration with a huge reclaim target, chances are that succeeding
> > reclaimers don't even need to drop to lower levels at all because
> > enough memory has already been freed.
> >
> > My testprogram maps and faults in a file that is about as large as my
> > physical memory. Then it spawns off n processes that try to allocate
> > 1/2n of total memory in anon pages, i.e. half of it in sum. After it
> > ran, I check how much memory has been reclaimed. But my zone sizes
> > are too small to induce enormous reclaim targets so I don't see vast
> > over-reclaims.
> >
> > I have measured the time of other tests on an SMP machine with 4 cores
> > and the following patch applied. I couldn't see any performance
> > degradation. But since the bug is not triggerable here, I can not
> > prove it helps the original problem, either.
>
> I wonder why none of the vmscan folks write actual performance improvement
> numbers in the patch description.

That's why I made it an RFC. I haven't seriously tested it; I just
wanted to know what people who understand this better than I do think
of the idea.

> I think this patch points in the right direction,
> but unfortunately this implementation isn't fast, as far as I measured.
Fair enough.
> > The level where it starts serializing is chosen pretty arbitrarily.
> > Suggestions welcome :)
> >
> > Hannes
> >
> > ---
> >
> > Prevent over-reclaiming by serializing direct reclaimers below a
> > certain priority level.
> >
> > Over-reclaiming happens when the sum of the reclaim targets of all
> > reclaiming processes is larger than the sum of the needed free pages,
> > thus leading to excessive eviction of more cache and anonymous pages
> > than required.
> >
> > A scan iteration over all zones can not be aborted intermittently when
> > enough pages are reclaimed because that would mess up the scan balance
> > between the zones. Instead, prevent that too many processes
> > simultaneously commit themselves to lower priority level scans in the
> > first place.
> >
> > Chances are that after the exclusive reclaimer has finished, enough
> > memory has been freed that succeeding scanners don't need to drop to
> > lower priority levels at all anymore.
> >
> > Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
> > ---
> > mm/vmscan.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -35,6 +35,7 @@
> > #include <linux/notifier.h>
> > #include <linux/rwsem.h>
> > #include <linux/delay.h>
> > +#include <linux/wait.h>
> > #include <linux/kthread.h>
> > #include <linux/freezer.h>
> > #include <linux/memcontrol.h>
> > @@ -42,6 +43,7 @@
> > #include <linux/sysctl.h>
> >
> > #include <asm/tlbflush.h>
> > +#include <asm/atomic.h>
> > #include <asm/div64.h>
> >
> > #include <linux/swapops.h>
> > @@ -1546,10 +1548,15 @@ static unsigned long shrink_zones(int pr
> > * returns: 0, if no pages reclaimed
> > * else, the number of pages reclaimed
> > */
> > +
> > +static DECLARE_WAIT_QUEUE_HEAD(reclaim_wait);
> > +static atomic_t reclaim_exclusive = ATOMIC_INIT(0);
> > +
> > static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> > struct scan_control *sc)
> > {
> > int priority;
> > + int exclusive = 0;
> > unsigned long ret = 0;
> > unsigned long total_scanned = 0;
> > unsigned long nr_reclaimed = 0;
> > @@ -1580,6 +1587,14 @@ static unsigned long do_try_to_free_page
> > sc->nr_scanned = 0;
> > if (!priority)
> > disable_swap_token();
> > + /*
> > + * Serialize aggressive reclaimers
> > + */
> > + if (priority <= DEF_PRIORITY / 2 && !exclusive) {
>
> On a large machine, DEF_PRIORITY / 2 is a really catastrophic situation.
> 2^6 = 64.
> If a zone has 64GB of memory, it means 1GB of reclaim.
> I think an earlier restriction is better.

I am just afraid that it kills parallelism.

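To make the arithmetic above concrete, here is a rough userspace sketch of how the per-pass scan size falls out of the priority. It assumes DEF_PRIORITY is 12, 4 KiB pages, and that a pass scans roughly the zone size shifted right by the priority; scan_target_pages() and pages_to_bytes() are hypothetical helper names for illustration, not functions in mm/vmscan.c.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_PRIORITY 12      /* assumed value of the kernel constant */
#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: approximate number of pages one pass looks at
 * in a zone of zone_pages pages at the given priority (size >> priority).
 */
static uint64_t scan_target_pages(uint64_t zone_pages, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return zone_pages >> priority;
}

/* Hypothetical helper: convert a page count to bytes. */
static uint64_t pages_to_bytes(uint64_t pages)
{
	return pages << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions a 64GB zone is 16777216 pages, so priority DEF_PRIORITY / 2 = 6 gives 262144 pages, i.e. 1GB per pass, which matches the figure quoted above.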
> > + wait_event(reclaim_wait,
> > + !atomic_cmpxchg(&reclaim_exclusive, 0, 1));
> > + exclusive = 1;
> > + }
>
> If you want to restrict to one task, you can use a mutex.
> and this wait_queue should put on global variable. it should be zone variable.
Hm, global or per-zone? Rik suggested to do it per-node and I like
that idea.
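
For illustration only, a per-node variant of the flag could look roughly like the userspace sketch below, using C11 atomics in place of the kernel's atomic_cmpxchg(). NR_NODES, reclaim_try_exclusive() and reclaim_release() are made-up names for this sketch, and the waiting and wakeup side is left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_NODES 4 /* hypothetical node count for this sketch */

/* One flag per node; static storage starts out as 0 (nobody exclusive). */
static atomic_int reclaim_exclusive[NR_NODES];

/*
 * Same idea as !atomic_cmpxchg(&reclaim_exclusive, 0, 1): only the
 * caller that flips the flag from 0 to 1 gets true back.
 */
static bool reclaim_try_exclusive(int nid)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&reclaim_exclusive[nid],
					      &expected, 1);
}

/* Drop exclusivity on a node; a real version would also wake waiters. */
static void reclaim_release(int nid)
{
	atomic_store(&reclaim_exclusive[nid], 0);
}
```

A second caller on the same node gets false until the first one releases, while callers on other nodes are not held up, which is the part that would keep some of the parallelism.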
> In addition, you don't consider recursive reclaim, and several tasks can't sleep there.
>
>
> Please believe me. I have the richest experience with reclaim throttling on the planet.

Hehe, okay. Then I am glad you don't hate the idea completely. Do
you have any patches floating around that do something similar?

	Hannes