From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: linux-kernel@vger.kernel.org, Chris Metcalf <cmetcalf@tilera.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Russell King <linux@arm.linux.org.uk>,
	linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
	Matt Mackall <mpm@selenic.com>,
	Sasha Levin <levinsasha928@gmail.com>,
	Rik van Riel <riel@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Mel Gorman <mel@csn.ul.ie>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
Date: Tue, 03 Jan 2012 12:45:45 -0500
Message-ID: <4F033EC9.4050909@gmail.com>
In-Reply-To: <1325499859-2262-8-git-send-email-gilad@benyossef.com>

(1/2/12 5:24 AM), Gilad Ben-Yossef wrote:
> Calculate a cpumask of CPUs with per-cpu pages in any zone
> and only send an IPI requesting CPUs to drain these pages
> to the buddy allocator if they actually have pages when
> asked to flush.
> 
> This patch saves 99% of IPIs asking to drain per-cpu
> pages in case of severe memory pressure that leads
> to OOM. In these cases multiple, possibly concurrent,
> allocation requests end up in the direct reclaim code
> path, so once the per-cpu pages have been reclaimed on the
> first allocation failure, most of the subsequent allocation
> attempts, until the memory pressure is off (possibly via
> the OOM killer), find no per-cpu pages on most CPUs
> (and there can easily be hundreds of them).
> 
> This also has the side effect of shortening the average
> latency of direct reclaim by one or more orders of magnitude
> since waiting for all the CPUs to ACK the IPI takes a
> long time.
> 
> Tested by running "hackbench 400" on a 4 CPU x86 otherwise
> idle VM and observing the difference between the number
> of direct reclaim attempts that end up in drain_all_pages()
> and those where more than 1/2 of the online CPUs had any
> per-cpu pages in them, using the vmstat counters introduced
> in the next patch in the series and using /proc/interrupts.
> 
> In the test scenario, this saved around 500 global IPIs.
> After triggering an OOM:
> 
> $ cat /proc/vmstat
> ...
> pcp_global_drain 627
> pcp_global_ipi_saved 578
> 
> I've also seen the number of drains reach 15k calls
> with the saved percentage reaching 99% when there
> are more tasks running during an OOM kill.
> 
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> CC: Chris Metcalf <cmetcalf@tilera.com>
> CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: linux-mm@kvack.org
> CC: Pekka Enberg <penberg@kernel.org>
> CC: Matt Mackall <mpm@selenic.com>
> CC: Sasha Levin <levinsasha928@gmail.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: linux-fsdevel@vger.kernel.org
> CC: Avi Kivity <avi@redhat.com>
> ---
>   Christoph's Ack was for a previous version that allocated
>   the cpumask in drain_all_pages().

When you change a patch's design and implementation, ACKs
should be dropped. Otherwise you miss the chance to get a good
review.



>   mm/page_alloc.c |   26 +++++++++++++++++++++++++-
>   1 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2b8ba3a..092c331 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -67,6 +67,14 @@ DEFINE_PER_CPU(int, numa_node);
>   EXPORT_PER_CPU_SYMBOL(numa_node);
>   #endif
> 
> +/*
> + * A global cpumask of CPUs with per-cpu pages that gets
> + * recomputed on each drain. We use a global cpumask
> + * for to avoid allocation on direct reclaim code path
> + * for CONFIG_CPUMASK_OFFSTACK=y
> + */
> +static cpumask_var_t cpus_with_pcps;
> +
>   #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>   /*
>    * N.B., Do NOT reference the '_numa_mem_' per cpu variable directly.
> @@ -1119,7 +1127,19 @@ void drain_local_pages(void *arg)
>    */
>   void drain_all_pages(void)
>   {
> -	on_each_cpu(drain_local_pages, NULL, 1);
> +	int cpu;
> +	struct per_cpu_pageset *pcp;
> +	struct zone *zone;
> +

Doesn't this need get_online_cpus()?

> +	for_each_online_cpu(cpu)
> +		for_each_populated_zone(zone) {
> +			pcp = per_cpu_ptr(zone->pageset, cpu);
> +			if (pcp->pcp.count)
> +				cpumask_set_cpu(cpu, cpus_with_pcps);
> +			else
> +				cpumask_clear_cpu(cpu, cpus_with_pcps);

Can the cpumask_* functions be used locklessly on a global mask here?

> +		}
> +	on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
>   }
> 
>   #ifdef CONFIG_HIBERNATION
> @@ -3623,6 +3643,10 @@ static void setup_zone_pageset(struct zone *zone)
>   void __init setup_per_cpu_pageset(void)
>   {
>   	struct zone *zone;
> +	int ret;
> +
> +	ret = zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL);
> +	BUG_ON(!ret);
> 
>   	for_each_populated_zone(zone)
>   		setup_zone_pageset(zone);

