linux-mm.kvack.org archive mirror
* [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch
@ 2013-05-13 19:08 Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 01/11] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
                   ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

"Problems" with the current code:
 1. there is a lack of synchronization in setting ->high and ->batch in
    percpu_pagelist_fraction_sysctl_handler()
 2. stop_machine() in zone_pcp_update() is unnecessary.
 3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero

To fix:
 1. add memory barriers, a safe ->batch value, an update side mutex when
    updating ->high and ->batch, and use ACCESS_ONCE() for ->batch users that
    expect a stable value.
 2. avoid draining pages in zone_pcp_update(), rely upon the memory barriers added to fix #1
 3. factor out several helper functions, then call the appropriate one.

Note that it results in a change to the behavior of zone_pcp_update(), which is
used by memory_hotplug. I'm rather certain that I've discerned (and preserved)
the essential behavior (changing ->high and ->batch), and only eliminated
unneeded actions (draining the per cpu pages), but this may not be the case.

Further note that the draining of pages that previously took place in
zone_pcp_update() occurred after repeated draining when attempting to offline a
page, and after the offline had "succeeded". It appears that the draining was
added to zone_pcp_update() to avoid refactoring setup_pageset() into 2
functions.

--

Since v2: https://lkml.org/lkml/2013/4/9/718

 - note ACCESS_ONCE() in fix #1 above.
 - consolidate ->batch & ->high update protocol into a single function (Gilad).
 - add missing ACCESS_ONCE() on ->batch

Since v1: https://lkml.org/lkml/2013/4/5/444

 - instead of using on_each_cpu(), use memory barriers (Gilad) and an update side mutex.
 - add "Problem" #3 above, and fix.
 - rename function to match naming style of similar function
 - move unrelated comment

--

Cody P Schafer (11):
  mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high
  mm/page_alloc: insert memory barriers to allow async update of pcp
    batch and high
  mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE
  mm/page_alloc: convert zone_pcp_update() to rely on memory barriers
    instead of stop_machine()
  mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly
    recalculate high
  mm/page_alloc: factor setup_pageset() into pageset_init() and
    pageset_set_batch()
  mm/page_alloc: relocate comment to be directly above code it refers
    to.
  mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset()
  mm/page_alloc: in zone_pcp_update(), use zone_pageset_init()
  mm/page_alloc: rename setup_pagelist_highmark() to match naming of
    pageset_set_batch()

 mm/page_alloc.c | 151 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 90 insertions(+), 61 deletions(-)

-- 
1.8.2.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* [PATCH RESEND v3 01/11] mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 02/11] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high Cody P Schafer
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Creates pageset_set_batch() for use in setup_pageset().
pageset_set_batch() imitates the functionality of
setup_pagelist_highmark(), but uses the boot time
(percpu_pagelist_fraction == 0) calculations for determining ->high
based on ->batch.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98cbdf6..9e556e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4030,6 +4030,14 @@ static int __meminit zone_batchsize(struct zone *zone)
 #endif
 }
 
+/* a companion to setup_pagelist_highmark() */
+static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+{
+	struct per_cpu_pages *pcp = &p->pcp;
+	pcp->high = 6 * batch;
+	pcp->batch = max(1UL, 1 * batch);
+}
+
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
@@ -4039,8 +4047,7 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 
 	pcp = &p->pcp;
 	pcp->count = 0;
-	pcp->high = 6 * batch;
-	pcp->batch = max(1UL, 1 * batch);
+	pageset_set_batch(p, batch);
 	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
@@ -4049,7 +4056,6 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
  * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
  */
-
 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 				unsigned long high)
 {
-- 
1.8.2.2


* [PATCH RESEND v3 02/11] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 01/11] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 03/11] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Because we are going to rely upon a careful transition between old and
new ->high and ->batch values using memory barriers and will remove
stop_machine(), we need to prevent multiple updaters from interweaving
their memory writes.

Add a simple mutex to protect both update loops.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9e556e6..cea883d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -65,6 +65,9 @@
 #include <asm/div64.h>
 #include "internal.h"
 
+/* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
+static DEFINE_MUTEX(pcp_batch_high_lock);
+
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -5555,6 +5558,8 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret < 0))
 		return ret;
+
+	mutex_lock(&pcp_batch_high_lock);
 	for_each_populated_zone(zone) {
 		for_each_possible_cpu(cpu) {
 			unsigned long  high;
@@ -5563,6 +5568,7 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 				per_cpu_ptr(zone->pageset, cpu), high);
 		}
 	}
+	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
 }
 
@@ -6076,7 +6082,9 @@ static int __meminit __zone_pcp_update(void *data)
 
 void __meminit zone_pcp_update(struct zone *zone)
 {
+	mutex_lock(&pcp_batch_high_lock);
 	stop_machine(__zone_pcp_update, zone, NULL);
+	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
 
-- 
1.8.2.2


* [PATCH RESEND v3 03/11] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 01/11] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 02/11] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 04/11] mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE Cody P Schafer
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Introduce pageset_update() to perform a safe transition from one set of
pcp->{batch,high} to a new set using memory barriers.

This ensures that batch is always set to a safe value (1) prior to
updating high, and ensures that high is fully updated before setting the
real value of batch. It prevents ->batch from ever rising above ->high.

Suggested by Gilad Ben-Yossef <gilad@benyossef.com> in these threads:

	https://lkml.org/lkml/2013/4/9/23
	https://lkml.org/lkml/2013/4/10/49

Also reproduces his proposed comment.

Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cea883d..7e45b91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4033,12 +4033,37 @@ static int __meminit zone_batchsize(struct zone *zone)
 #endif
 }
 
+/*
+ * pcp->high and pcp->batch values are related and dependent on one another:
+ * ->batch must never be higher than ->high.
+ * The following function updates them in a safe manner without read side
+ * locking.
+ *
+ * Any new users of pcp->batch and pcp->high should ensure they can cope with
+ * those fields changing asynchronously (according to the above rule).
+ *
+ * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
+ * outside of boot time (or some other assurance that no concurrent updaters
+ * exist).
+ */
+static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
+		unsigned long batch)
+{
+	/* start with a fail safe value for batch */
+	pcp->batch = 1;
+	smp_wmb();
+
+	/* Update high, then batch, in order */
+	pcp->high = high;
+	smp_wmb();
+
+	pcp->batch = batch;
+}
+
 /* a companion to setup_pagelist_highmark() */
 static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
 {
-	struct per_cpu_pages *pcp = &p->pcp;
-	pcp->high = 6 * batch;
-	pcp->batch = max(1UL, 1 * batch);
+	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
 }
 
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
@@ -4062,13 +4087,11 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 				unsigned long high)
 {
-	struct per_cpu_pages *pcp;
+	unsigned long batch = max(1UL, high / 4);
+	if ((high / 4) > (PAGE_SHIFT * 8))
+		batch = PAGE_SHIFT * 8;
 
-	pcp = &p->pcp;
-	pcp->high = high;
-	pcp->batch = max(1UL, high/4);
-	if ((high/4) > (PAGE_SHIFT * 8))
-		pcp->batch = PAGE_SHIFT * 8;
+	pageset_update(&p->pcp, high, batch);
 }
 
 static void __meminit setup_zone_pageset(struct zone *zone)
-- 
1.8.2.2


* [PATCH RESEND v3 04/11] mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (2 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 03/11] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 05/11] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine() Cody P Schafer
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

pcp->batch could change at any point, avoid relying on it being a stable value.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7e45b91..71d843d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1182,10 +1182,12 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 {
 	unsigned long flags;
 	int to_drain;
+	unsigned long batch;
 
 	local_irq_save(flags);
-	if (pcp->count >= pcp->batch)
-		to_drain = pcp->batch;
+	batch = ACCESS_ONCE(pcp->batch);
+	if (pcp->count >= batch)
+		to_drain = batch;
 	else
 		to_drain = pcp->count;
 	if (to_drain > 0) {
@@ -1353,8 +1355,9 @@ void free_hot_cold_page(struct page *page, int cold)
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
 	if (pcp->count >= pcp->high) {
-		free_pcppages_bulk(zone, pcp->batch, pcp);
-		pcp->count -= pcp->batch;
+		unsigned long batch = ACCESS_ONCE(pcp->batch);
+		free_pcppages_bulk(zone, batch, pcp);
+		pcp->count -= batch;
 	}
 
 out:
-- 
1.8.2.2


* [PATCH RESEND v3 05/11] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine()
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (3 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 04/11] mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 06/11] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high Cody P Schafer
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
percpu pageset based on a zone's ->managed_pages. We don't need to drain
the entire percpu pageset just to modify these fields.

This lets us avoid calling setup_pageset() (and the draining required to
call it) and instead allows simply setting the fields' values (with some
attention paid to memory barriers to prevent the relationship between
->batch and ->high from being thrown off).

This does change the behavior of zone_pcp_update() as the percpu
pagesets will not be drained when zone_pcp_update() is called (they will
end up being shrunk, not completely drained, later when a 0-order page
is freed in free_hot_cold_page()).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 33 +++++++++------------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 71d843d..5a00195 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6083,33 +6083,18 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int __meminit __zone_pcp_update(void *data)
-{
-	struct zone *zone = data;
-	int cpu;
-	unsigned long batch = zone_batchsize(zone), flags;
-
-	for_each_possible_cpu(cpu) {
-		struct per_cpu_pageset *pset;
-		struct per_cpu_pages *pcp;
-
-		pset = per_cpu_ptr(zone->pageset, cpu);
-		pcp = &pset->pcp;
-
-		local_irq_save(flags);
-		if (pcp->count > 0)
-			free_pcppages_bulk(zone, pcp->count, pcp);
-		drain_zonestat(zone, pset);
-		setup_pageset(pset, batch);
-		local_irq_restore(flags);
-	}
-	return 0;
-}
-
+/*
+ * The zone indicated has a new number of managed_pages; batch sizes and percpu
+ * page high values need to be recalculated.
+ */
 void __meminit zone_pcp_update(struct zone *zone)
 {
+	unsigned cpu;
+	unsigned long batch;
 	mutex_lock(&pcp_batch_high_lock);
-	stop_machine(__zone_pcp_update, zone, NULL);
+	batch = zone_batchsize(zone);
+	for_each_possible_cpu(cpu)
+		pageset_set_batch(per_cpu_ptr(zone->pageset, cpu), batch);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.8.2.2


* [PATCH RESEND v3 06/11] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (4 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 05/11] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine() Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 07/11] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch() Cody P Schafer
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Simply moves the calculation of the new 'high' value outside the
for_each_possible_cpu() loop, as it does not depend on the cpu.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5a00195..3583281 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5573,7 +5573,6 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
  * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
  * can have before it gets flushed back to buddy allocator.
  */
-
 int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -5587,12 +5586,11 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 
 	mutex_lock(&pcp_batch_high_lock);
 	for_each_populated_zone(zone) {
-		for_each_possible_cpu(cpu) {
-			unsigned long  high;
-			high = zone->managed_pages / percpu_pagelist_fraction;
+		unsigned long  high;
+		high = zone->managed_pages / percpu_pagelist_fraction;
+		for_each_possible_cpu(cpu)
 			setup_pagelist_highmark(
-				per_cpu_ptr(zone->pageset, cpu), high);
-		}
+					per_cpu_ptr(zone->pageset, cpu), high);
 	}
 	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
-- 
1.8.2.2


* [PATCH RESEND v3 07/11] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch()
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (5 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 06/11] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 08/11] mm/page_alloc: relocate comment to be directly above code it refers to Cody P Schafer
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3583281..696ce96 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4069,7 +4069,7 @@ static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
 	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_init(struct per_cpu_pageset *p)
 {
 	struct per_cpu_pages *pcp;
 	int migratetype;
@@ -4078,11 +4078,16 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 
 	pcp = &p->pcp;
 	pcp->count = 0;
-	pageset_set_batch(p, batch);
 	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
+static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+{
+	pageset_init(p);
+	pageset_set_batch(p, batch);
+}
+
 /*
  * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
-- 
1.8.2.2


* [PATCH RESEND v3 08/11] mm/page_alloc: relocate comment to be directly above code it refers to.
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (6 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 07/11] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch() Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 09/11] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset() Cody P Schafer
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 696ce96..53c62c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3709,12 +3709,12 @@ void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
 		mminit_verify_zonelist();
 		cpuset_init_current_mems_allowed();
 	} else {
-		/* we have to stop all cpus to guarantee there is no user
-		   of zonelist */
 #ifdef CONFIG_MEMORY_HOTPLUG
 		if (zone)
 			setup_zone_pageset(zone);
 #endif
+		/* we have to stop all cpus to guarantee there is no user
+		   of zonelist */
 		stop_machine(__build_all_zonelists, pgdat, NULL);
 		/* cpuset refresh routine should be here */
 	}
-- 
1.8.2.2


* [PATCH RESEND v3 09/11] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset()
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (7 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 08/11] mm/page_alloc: relocate comment to be directly above code it refers to Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 10/11] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init() Cody P Schafer
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 53c62c5..0c3cdbb6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4102,22 +4102,25 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 	pageset_update(&p->pcp, high, batch);
 }
 
+static void __meminit zone_pageset_init(struct zone *zone, int cpu)
+{
+	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+
+	pageset_init(pcp);
+	if (percpu_pagelist_fraction)
+		setup_pagelist_highmark(pcp,
+			(zone->managed_pages /
+				percpu_pagelist_fraction));
+	else
+		pageset_set_batch(pcp, zone_batchsize(zone));
+}
+
 static void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
-
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-
-	for_each_possible_cpu(cpu) {
-		struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
-
-		setup_pageset(pcp, zone_batchsize(zone));
-
-		if (percpu_pagelist_fraction)
-			setup_pagelist_highmark(pcp,
-				(zone->managed_pages /
-					percpu_pagelist_fraction));
-	}
+	for_each_possible_cpu(cpu)
+		zone_pageset_init(zone, cpu);
 }
 
 /*
-- 
1.8.2.2


* [PATCH RESEND v3 10/11] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init()
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (8 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 09/11] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset() Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:08 ` [PATCH RESEND v3 11/11] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch() Cody P Schafer
  2013-05-13 19:20 ` [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Pekka Enberg
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Previously, zone_pcp_update() called pageset_set_batch() directly,
essentially assuming that percpu_pagelist_fraction == 0. Correct this by
calling zone_pageset_init(), which chooses the appropriate ->batch and
->high calculations.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c3cdbb6..251fb5f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6096,11 +6096,9 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 void __meminit zone_pcp_update(struct zone *zone)
 {
 	unsigned cpu;
-	unsigned long batch;
 	mutex_lock(&pcp_batch_high_lock);
-	batch = zone_batchsize(zone);
 	for_each_possible_cpu(cpu)
-		pageset_set_batch(per_cpu_ptr(zone->pageset, cpu), batch);
+		zone_pageset_init(zone, cpu);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.8.2.2


* [PATCH RESEND v3 11/11] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch()
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (9 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 10/11] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init() Cody P Schafer
@ 2013-05-13 19:08 ` Cody P Schafer
  2013-05-13 19:20 ` [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Pekka Enberg
  11 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 251fb5f..b335c98 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4063,7 +4063,7 @@ static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 	pcp->batch = batch;
 }
 
-/* a companion to setup_pagelist_highmark() */
+/* a companion to pageset_set_high() */
 static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
 {
 	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
@@ -4089,10 +4089,10 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 }
 
 /*
- * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
+ * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
  */
-static void setup_pagelist_highmark(struct per_cpu_pageset *p,
+static void pageset_set_high(struct per_cpu_pageset *p,
 				unsigned long high)
 {
 	unsigned long batch = max(1UL, high / 4);
@@ -4108,7 +4108,7 @@ static void __meminit zone_pageset_init(struct zone *zone, int cpu)
 
 	pageset_init(pcp);
 	if (percpu_pagelist_fraction)
-		setup_pagelist_highmark(pcp,
+		pageset_set_high(pcp,
 			(zone->managed_pages /
 				percpu_pagelist_fraction));
 	else
@@ -5597,8 +5597,8 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 		unsigned long  high;
 		high = zone->managed_pages / percpu_pagelist_fraction;
 		for_each_possible_cpu(cpu)
-			setup_pagelist_highmark(
-					per_cpu_ptr(zone->pageset, cpu), high);
+			pageset_set_high(per_cpu_ptr(zone->pageset, cpu),
+					 high);
 	}
 	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
-- 
1.8.2.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (10 preceding siblings ...)
  2013-05-13 19:08 ` [PATCH RESEND v3 11/11] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch() Cody P Schafer
@ 2013-05-13 19:20 ` Pekka Enberg
  2013-05-13 20:47   ` Cody P Schafer
  11 siblings, 1 reply; 14+ messages in thread
From: Pekka Enberg @ 2013-05-13 19:20 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Andrew Morton, Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro,
	Mel Gorman, Linux MM, LKML

Hi Cody,

On Mon, May 13, 2013 at 10:08 PM, Cody P Schafer
<cody@linux.vnet.ibm.com> wrote:
> "Problems" with the current code:
>  1. there is a lack of synchronization in setting ->high and ->batch in
>     percpu_pagelist_fraction_sysctl_handler()
>  2. stop_machine() in zone_pcp_update() is unnecessary.
>  3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero

Maybe it's just me but I find the above problem description confusing.
How does the problem manifest itself? How did you find about it? Why
do we need to fix all three problems in the same patch set?

                        Pekka


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-05-13 19:20 ` [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Pekka Enberg
@ 2013-05-13 20:47   ` Cody P Schafer
  0 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-05-13 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro,
	Mel Gorman, Linux MM, LKML

On 05/13/2013 12:20 PM, Pekka Enberg wrote:
> Hi Cody,
>
> On Mon, May 13, 2013 at 10:08 PM, Cody P Schafer
> <cody@linux.vnet.ibm.com> wrote:
>> "Problems" with the current code:
>>   1. there is a lack of synchronization in setting ->high and ->batch in
>>      percpu_pagelist_fraction_sysctl_handler()
>>   2. stop_machine() in zone_pcp_update() is unnecessary.
>>   3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero
>
> Maybe it's just me but I find the above problem description confusing.
> How does the problem manifest itself?

1. I've not reproduced this causing issues.
2. Calling zone_pcp_update() is slow.
3. Not reproduced either, but would cause percpu_pagelist_fraction (set 
via sysctl) to be ignored after a call to zone_pcp_update() (for 
example, after a memory hotplug).
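To make issue 3 concrete, here is a quick shell check of the arithmetic the 
sysctl handler performs; the zone size is hypothetical, and the helper name 
refers to the post-rename pageset_set_high() from this series:

```shell
# Hypothetical zone managing 1 GiB of 4 KiB pages.
managed_pages=262144
fraction=8                       # value written to vm.percpu_pagelist_fraction
high=$((managed_pages / fraction))
batch=$((high / 4))              # derivation done by pageset_set_high()
echo "high=$high batch=$batch"   # → high=32768 batch=8192
```

Without the fix, a later zone_pcp_update() would silently replace these 
fraction-derived values with the fraction-unaware defaults.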

> How did you find about it?

I'm writing some code that resizes zones and thus uses 
zone_pcp_update(), and fixing broken things along the way.

> Why
> do we need to fix all three problems in the same patch set?

They all affect the same bit of code, and fixing all of them means 
restructuring both of the sites where ->high and ->batch are set.

Additionally, splitting it out (if possible) would make it less clear 
what the overall goal is, and would mean a few inter-patchset 
dependencies, which are undesirable.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-05-13 20:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-13 19:08 [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 01/11] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 02/11] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 03/11] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 04/11] mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 05/11] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine() Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 06/11] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 07/11] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch() Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 08/11] mm/page_alloc: relocate comment to be directly above code it refers to Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 09/11] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset() Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 10/11] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init() Cody P Schafer
2013-05-13 19:08 ` [PATCH RESEND v3 11/11] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch() Cody P Schafer
2013-05-13 19:20 ` [PATCH RESEND v3 00/11] mm: fixup changers of per cpu pageset's ->high and ->batch Pekka Enberg
2013-05-13 20:47   ` Cody P Schafer
