* [PATCH] - Improve drain pages performance on large systems
@ 2011-02-15 22:38 Jack Steiner
2011-02-16 0:00 ` Minchan Kim
2011-02-16 0:17 ` Andi Kleen
0 siblings, 2 replies; 5+ messages in thread
From: Jack Steiner @ 2011-02-15 22:38 UTC (permalink / raw)
To: linux-mm, akpm; +Cc: linux-kernel
Heavy swapping within a cpuset causes frequent calls to drain_all_pages().
This sends IPIs to all cpus to free PCP pages. In most cases, there are
no pages to be freed on cpus outside of the swapping cpuset.
Add checks to minimize locking and updates to potentially hot cachelines.
Before acquiring locks, do a quick check to see if any pages are in the PCP
queues. Exit if none.
On a 128 node SGI UV system, this reduced the IPI overhead to cpus outside of the
swapping cpuset by 38% and improved time to run a pass of the swapping test
from 98 sec to 51 sec. These times are obviously test & configuration
dependent but the improvements are significant.
Signed-off-by: Jack Steiner <steiner@sgi.com>
---
mm/page_alloc.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c 2011-02-15 16:28:36.165921713 -0600
+++ linux/mm/page_alloc.c 2011-02-15 16:29:43.085502487 -0600
@@ -592,10 +592,24 @@ static void free_pcppages_bulk(struct zo
int batch_free = 0;
int to_free = count;
+ /*
+ * Quick scan of zones. If all are empty, there is nothing to do.
+ */
+ for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++) {
+ struct list_head *list;
+
+ list = &pcp->lists[migratetype];
+ if (!list_empty(list))
+ break;
+ }
+ if (migratetype == MIGRATE_PCPTYPES)
+ return;
+
spin_lock(&zone->lock);
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
+ migratetype = 0;
while (to_free) {
struct page *page;
struct list_head *list;
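
For illustration only, the early-exit idea in the hunk above can be modeled in userspace C. This is a minimal sketch, not kernel code: struct pcp_model, struct zone_model, pcp_lists_empty() and free_bulk_model() are hypothetical stand-ins, the value of MIGRATE_PCPTYPES is assumed, and a counter takes the place of spin_lock(&zone->lock). Unlike the kernel loop, the model drains the lists in order rather than round-robin.

```c
/* Userspace model only: simplified stand-ins, not the kernel's types. */
#include <assert.h>
#include <stdbool.h>

#define MIGRATE_PCPTYPES 3	/* name taken from the patch; value assumed */

struct pcp_model {
	int count;				/* total pages across all lists */
	int list_len[MIGRATE_PCPTYPES];		/* stand-in for lists[] */
};

struct zone_model {
	int lock_acquisitions;	/* how often the "zone lock" was taken */
	int free_pages;		/* pages handed back to the zone */
};

/* The quick scan: true if every per-migratetype list is empty. */
static bool pcp_lists_empty(const struct pcp_model *pcp)
{
	for (int mt = 0; mt < MIGRATE_PCPTYPES; mt++)
		if (pcp->list_len[mt] != 0)
			return false;
	return true;
}

/* Model of the bulk free with the early exit; returns pages freed. */
static int free_bulk_model(struct zone_model *zone, int count,
			   struct pcp_model *pcp)
{
	int freed = 0;

	assert(count >= 0);
	if (pcp_lists_empty(pcp))
		return 0;	/* nothing queued: no lock, zone untouched */

	zone->lock_acquisitions++;	/* stands in for spin_lock() */
	for (int mt = 0; mt < MIGRATE_PCPTYPES && freed < count; mt++) {
		while (pcp->list_len[mt] > 0 && freed < count) {
			pcp->list_len[mt]--;
			freed++;
		}
	}
	zone->free_pages += freed;
	pcp->count -= freed;
	return freed;
}
```

Calling free_bulk_model() on a pcp_model whose lists are all empty leaves lock_acquisitions at zero, which is the effect the patch aims for on cpus outside the swapping cpuset.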
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] - Improve drain pages performance on large systems
2011-02-15 22:38 [PATCH] - Improve drain pages performance on large systems Jack Steiner
@ 2011-02-16 0:00 ` Minchan Kim
2011-02-16 3:01 ` David Rientjes
2011-02-16 15:43 ` Jack Steiner
2011-02-16 0:17 ` Andi Kleen
1 sibling, 2 replies; 5+ messages in thread
From: Minchan Kim @ 2011-02-16 0:00 UTC (permalink / raw)
To: Jack Steiner; +Cc: linux-mm, akpm, linux-kernel, Mel Gorman, David Rientjes
On Wed, Feb 16, 2011 at 7:38 AM, Jack Steiner <steiner@sgi.com> wrote:
>
> Heavy swapping within a cpuset causes frequent calls to drain_all_pages()
* Re: [PATCH] - Improve drain pages performance on large systems
2011-02-15 22:38 [PATCH] - Improve drain pages performance on large systems Jack Steiner
2011-02-16 0:00 ` Minchan Kim
@ 2011-02-16 0:17 ` Andi Kleen
1 sibling, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2011-02-16 0:17 UTC (permalink / raw)
To: Jack Steiner; +Cc: linux-mm, akpm, linux-kernel
Jack Steiner <steiner@sgi.com> writes:
> Heavy swapping within a cpuset causes frequent calls to drain_all_pages().
I suspect drain_all_pages should be really made more zone aware in the
first place and only drain what is actually needed (e.g.
work off a zonelist). I was fighting with this for hwpoison too.
That said your patch looks reasonable.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
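
For illustration only, one way to read the "work off a zonelist" suggestion is sketched below as a userspace C model. Everything here is an assumption made for the example: struct zone_pcp_model, MODEL_NR_CPUS and drain_zonelist_model() are hypothetical names, and the thread does not show how such a change would actually be implemented in the kernel.

```c
/* Userspace model only: hypothetical names, not a kernel interface. */
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4		/* assumed CPU count for the model */

struct zone_pcp_model {
	int pcp_count[MODEL_NR_CPUS];	/* pages cached per CPU for this zone */
	int free_pages;			/* pages returned to the zone */
};

/*
 * Drain only the zones named in zonelist[], and within those only the
 * CPUs that actually hold pages. Returns the number of (zone, cpu)
 * pairs that needed work; all other zones are left untouched.
 */
static int drain_zonelist_model(struct zone_pcp_model *zones,
				const size_t *zonelist, size_t nr_listed)
{
	int touched = 0;

	assert(zones != NULL && zonelist != NULL);
	for (size_t i = 0; i < nr_listed; i++) {
		struct zone_pcp_model *zone = &zones[zonelist[i]];

		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
			if (zone->pcp_count[cpu] == 0)
				continue;	/* nothing cached on this CPU */
			zone->free_pages += zone->pcp_count[cpu];
			zone->pcp_count[cpu] = 0;
			touched++;
		}
	}
	return touched;
}
```

In this model a caller that only cares about two zones passes their indices and the remaining zones keep their per-CPU pages, which is the kind of narrowing the reply asks for.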
* Re: [PATCH] - Improve drain pages performance on large systems
2011-02-16 0:00 ` Minchan Kim
@ 2011-02-16 3:01 ` David Rientjes
2011-02-16 15:43 ` Jack Steiner
1 sibling, 0 replies; 5+ messages in thread
From: David Rientjes @ 2011-02-16 3:01 UTC (permalink / raw)
To: Minchan Kim
Cc: Jack Steiner, linux-mm, Andrew Morton, linux-kernel, Mel Gorman
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2392 bytes --]
On Wed, 16 Feb 2011, Minchan Kim wrote:
> > Index: linux/mm/page_alloc.c
> > ===================================================================
> > --- linux.orig/mm/page_alloc.c 2011-02-15 16:28:36.165921713 -0600
> > +++ linux/mm/page_alloc.c 2011-02-15 16:29:43.085502487 -0600
> > @@ -592,10 +592,24 @@ static void free_pcppages_bulk(struct zo
> > int batch_free = 0;
> > int to_free = count;
> >
> > + /*
> > + * Quick scan of zones. If all are empty, there is nothing to do.
> > + */
> > + for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++) {
> > + struct list_head *list;
> > +
> > + list = &pcp->lists[migratetype];
> > + if (!list_empty(list))
> > + break;
> > + }
> > + if (migratetype == MIGRATE_PCPTYPES)
> > + return;
> > +
> > spin_lock(&zone->lock);
> > zone->all_unreclaimable = 0;
> > zone->pages_scanned = 0;
> >
> > + migratetype = 0;
> > while (to_free) {
> > struct page *page;
> > struct list_head *list;
>
> It does make sense to me.
> Although the new code looks rather costly on a small box, we use the
> same logic in the while loop anyway, so the cache would be hot and the
> cost would be small.
>
I was going to mention the implications for small machines as well; this
doesn't look good for callers that know free_pcppages_bulk() will do
something.
> But how about this? This one never affects the fast critical path.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ff7e158..2dfb61a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1095,8 +1095,10 @@ static void drain_pages(unsigned int cpu)
> pset = per_cpu_ptr(zone->pageset, cpu);
>
> pcp = &pset->pcp;
> - free_pcppages_bulk(zone, pcp->count, pcp);
> - pcp->count = 0;
> + if (pcp->count > 0) {
> + free_pcppages_bulk(zone, pcp->count, pcp);
> + pcp->count = 0;
> + }
> local_irq_restore(flags);
> }
> }
Right, this is 2ff754fa upstream. I'm wondering if Jack still sees the
same problem since 2.6.38-rc3.
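
For illustration only, the caller-side guard in the quoted drain_pages() hunk can be modeled in userspace C. This is a minimal sketch under stated assumptions: struct pcp_counter, struct zone_stats, bulk_free_model() and drain_cpus_model() are hypothetical stand-ins, and the model only counts how often the expensive bulk-free path is entered.

```c
/* Userspace model only: simplified stand-ins, not the kernel's types. */
#include <assert.h>

struct pcp_counter {
	int count;		/* pages cached on one CPU */
};

struct zone_stats {
	int bulk_calls;		/* times the bulk-free path was entered */
	int free_pages;		/* pages returned to the zone */
};

/* Stand-in for the bulk free; callers are expected to pass count > 0. */
static void bulk_free_model(struct zone_stats *zone, int count)
{
	assert(count > 0);
	zone->bulk_calls++;	/* the real path would take zone->lock here */
	zone->free_pages += count;
}

/*
 * Model of the guarded drain: the check sits in the caller, so the
 * bulk-free path itself carries no extra test. Returns the number of
 * CPUs that actually had pages to free.
 */
static int drain_cpus_model(struct zone_stats *zone,
			    struct pcp_counter *pcps, int ncpus)
{
	int drained = 0;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (pcps[cpu].count > 0) {
			bulk_free_model(zone, pcps[cpu].count);
			pcps[cpu].count = 0;
			drained++;
		}
	}
	return drained;
}
```

With four modeled CPUs of which only one holds pages, drain_cpus_model() enters the bulk-free path once, and callers that already know there is work to do pay nothing extra inside bulk_free_model().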
* Re: [PATCH] - Improve drain pages performance on large systems
2011-02-16 0:00 ` Minchan Kim
2011-02-16 3:01 ` David Rientjes
@ 2011-02-16 15:43 ` Jack Steiner
1 sibling, 0 replies; 5+ messages in thread
From: Jack Steiner @ 2011-02-16 15:43 UTC (permalink / raw)
To: Minchan Kim; +Cc: linux-mm, akpm, linux-kernel, Mel Gorman, David Rientjes
On Wed, Feb 16, 2011 at 09:00:59AM +0900, Minchan Kim wrote:
> On Wed, Feb 16, 2011 at 7:38 AM, Jack Steiner <steiner@sgi.com> wrote:
> >
> > Heavy swapping within a cpuset causes frequent calls to drain_all_pages().
> > This sends IPIs to all cpus to free PCP pages. In most cases, there are
> > no pages to be freed on cpus outside of the swapping cpuset.
> >
> > Add checks to minimize locking and updates to potentially hot cachelines.
> > Before acquiring locks, do a quick check to see if any pages are in the PCP
> > queues. Exit if none.
> >
> > On a 128 node SGI UV system, this reduced the IPI overhead to cpus outside of the
> > swapping cpuset by 38% and improved time to run a pass of the swapping test
> > from 98 sec to 51 sec. These times are obviously test & configuration
> > dependent but the improvements are significant.
> >
> >
> > Signed-off-by: Jack Steiner <steiner@sgi.com>
> >
> > ---
> > mm/page_alloc.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > Index: linux/mm/page_alloc.c
> > ===================================================================
> > --- linux.orig/mm/page_alloc.c 2011-02-15 16:28:36.165921713 -0600
> > +++ linux/mm/page_alloc.c 2011-02-15 16:29:43.085502487 -0600
> > @@ -592,10 +592,24 @@ static void free_pcppages_bulk(struct zo
> > int batch_free = 0;
> > int to_free = count;
> >
> > + /*
> > + * Quick scan of zones. If all are empty, there is nothing to do.
> > + */
> > + for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++) {
> > + struct list_head *list;
> > +
> > + list = &pcp->lists[migratetype];
> > + if (!list_empty(list))
> > + break;
> > + }
> > + if (migratetype == MIGRATE_PCPTYPES)
> > + return;
> > +
> > spin_lock(&zone->lock);
> > zone->all_unreclaimable = 0;
> > zone->pages_scanned = 0;
> >
> > + migratetype = 0;
> > while (to_free) {
> > struct page *page;
> > struct list_head *list;
> >
> >
>
> It does make sense to me.
> Although the new code looks rather costly on a small box, we use the
> same logic in the while loop anyway, so the cache would be hot and the
> cost would be small.
>
> But how about this? This one never affects the fast critical path.
Yes. Much cleaner. And even better, as David points out, it is already in the tree.
I did my original testing on a 2.6.32 distro kernel & missed the fact that this was
recently fixed upstream.
My patch from yesterday can be discarded.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ff7e158..2dfb61a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1095,8 +1095,10 @@ static void drain_pages(unsigned int cpu)
> pset = per_cpu_ptr(zone->pageset, cpu);
>
> pcp = &pset->pcp;
> - free_pcppages_bulk(zone, pcp->count, pcp);
> - pcp->count = 0;
> + if (pcp->count > 0) {
> + free_pcppages_bulk(zone, pcp->count, pcp);
> + pcp->count = 0;
> + }
> local_irq_restore(flags);
> }
> }
>
>
>
> --
> Kind regards,
> Minchan Kim