linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC 1/3] mm, kswapd: remove bogus check of balance_classzone_idx
@ 2016-01-26 15:36 Vlastimil Babka
  2016-01-26 15:36 ` [RFC 2/3] mm, compaction: introduce kcompactd Vlastimil Babka
  2016-01-26 15:36 ` [RFC 3/3] mm, kswapd: stop performing compaction from kswapd Vlastimil Babka
  0 siblings, 2 replies; 5+ messages in thread
From: Vlastimil Babka @ 2016-01-26 15:36 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Andrea Arcangeli, Kirill A. Shutemov,
	Rik van Riel, Joonsoo Kim, Mel Gorman, David Rientjes,
	Michal Hocko, Johannes Weiner, Vlastimil Babka

During work on kcompactd integration I have spotted a confusing check of
balance_classzone_idx, which I believe is bogus.

The balanced_classzone_idx is filled by balance_pgdat() as the highest zone
it attempted to balance. This was introduced by commit dc83edd941f4 ("mm:
kswapd: use the classzone idx that kswapd was using for
sleeping_prematurely()"). The intention is that (as expressed in today's
function names), the value used for kswapd_shrink_zone() calls in
balance_pgdat() is the same as for the decisions in kswapd_try_to_sleep().
An unwanted side-effect of that commit was breaking the checks in kswapd()
whether there was another kswapd_wakeup with a tighter (=lower) classzone_idx.
Commits 215ddd6664ce ("mm: vmscan: only read new_classzone_idx from pgdat
when reclaiming successfully") and d2ebd0f6b895 ("kswapd: avoid unnecessary
rebalance after an unsuccessful balancing") tried to fixed, but apparently
introduced a bogus check that this patch removes.

Consider zone indexes X < Y < Z, where:
- Z is the value used for the first kswapd wakeup.
- Y is returned as balanced_classzone_idx, which means zones with index higher
  than Y (including Z) were found to be unreclaimable.
- X is the value used for the second kswapd wakeup

The new wakeup with value X means that kswapd is now supposed to balance harder
all zones with index <= X. But instead, due to Y < Z, it will go sleep and
won't read the new value X. This is subtly wrong.

The effect of this patch is that kswapd will react better in some situations,
where e.g. the first wakeup is for ZONE_DMA32, the second is for ZONE_DMA, and
due to unreclaimable ZONE_NORMAL. Before this patch, kswapd would go sleep
instead of reclaiming ZONE_DMA harder. I expect these situations are very rare,
and more value is in better maintainability due to the removal of confusing
and bogus check.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/vmscan.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb3dd37ccd7c..72d52d3aef74 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3481,8 +3481,7 @@ static int kswapd(void *p)
 		 * new request of a similar or harder type will succeed soon
 		 * so consider going to sleep on the basis we reclaimed at
 		 */
-		if (balanced_classzone_idx >= new_classzone_idx &&
-					balanced_order == new_order) {
+		if (balanced_order == new_order) {
 			new_order = pgdat->kswapd_max_order;
 			new_classzone_idx = pgdat->classzone_idx;
 			pgdat->kswapd_max_order =  0;
-- 
2.7.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC 2/3] mm, compaction: introduce kcompactd
  2016-01-26 15:36 [RFC 1/3] mm, kswapd: remove bogus check of balance_classzone_idx Vlastimil Babka
@ 2016-01-26 15:36 ` Vlastimil Babka
  2016-01-27  9:10   ` Mel Gorman
  2016-01-26 15:36 ` [RFC 3/3] mm, kswapd: stop performing compaction from kswapd Vlastimil Babka
  1 sibling, 1 reply; 5+ messages in thread
From: Vlastimil Babka @ 2016-01-26 15:36 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Andrea Arcangeli, Kirill A. Shutemov,
	Rik van Riel, Joonsoo Kim, Mel Gorman, David Rientjes,
	Michal Hocko, Johannes Weiner, Vlastimil Babka

Memory compaction can be currently performed in several contexts:

- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP page
  fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc

The purpose of compaction is two-fold. The obvious purpose is to satisfy a
(pending or future) high-order allocation, and is easy to evaluate. The other
purpose is to keep overal memory fragmentation low and help the
anti-fragmentation mechanism. The success wrt the latter purpose is more
difficult to evaluate though.

The current situation wrt the purposes has a few drawbacks:

- compaction is invoked only when a high-order page or hugepage is not
  available (or manually). This might be too late for the purposes of keeping
  memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would be
  better if compaction was performed asynchronously to keep fragmentation low,
  before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP page
  faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in some
  scenarios

To improve the situation, we should be able to benefit from an equivalent of
kswapd, but for compaction - i.e. a background thread which responds to
fragmentation and the need for high-order allocations (including hugepages)
somewhat proactively.

One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let kswapd
handle reclaim, as order-0 allocations are often more critical than high-order
ones.

Another possibility is to extend khugepaged, but this kthread is a single
instance and tied to THP configs.

This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new tunables.
The lifecycle mimics kswapd kthreads, including the memory hotplug hooks.

Waking up of the kcompactd threads is also tied to kswapd activity and follows
these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from the
  slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so don't
  invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd

The kswapd compact/reclaim loop for high-order pages will be removed in the
next patch with the description of what's wrong with it.

In this patch, kcompactd uses the standard compaction_suitable() and
compact_finished() criteria, which means it will most likely have nothing left
to do after kswapd finishes, until the next patch. Kcompactd also mimics
direct compaction somewhat by trying async compaction first and sync compaction
afterwards, and uses the deferred compaction functionality.

Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are not
available (currently not done due to __GFP_NO_KSWAPD) or when a fragmentation
event (i.e. __rmqueue_fallback()) occurs. It's also possible to perform
periodic compaction with kcompactd.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/compaction.h        |  16 +++
 include/linux/mmzone.h            |   6 +
 include/linux/vm_event_item.h     |   1 +
 include/trace/events/compaction.h |  55 +++++++++
 mm/compaction.c                   | 227 ++++++++++++++++++++++++++++++++++++++
 mm/memory_hotplug.c               |  15 ++-
 mm/page_alloc.c                   |   3 +
 mm/vmscan.c                       |   6 +
 mm/vmstat.c                       |   1 +
 9 files changed, 325 insertions(+), 5 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 4cd4ddf64cc7..1367c0564d42 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -52,6 +52,10 @@ extern void compaction_defer_reset(struct zone *zone, int order,
 				bool alloc_success);
 extern bool compaction_restarting(struct zone *zone, int order);
 
+extern int kcompactd_run(int nid);
+extern void kcompactd_stop(int nid);
+extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
+
 #else
 static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
@@ -84,6 +88,18 @@ static inline bool compaction_deferred(struct zone *zone, int order)
 	return true;
 }
 
+static int kcompactd_run(int nid)
+{
+	return 0;
+}
+static void kcompactd_stop(int nid)
+{
+}
+
+static void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx)
+{
+}
+
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 33bb1b19273e..412613a91e15 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -664,6 +664,12 @@ typedef struct pglist_data {
 					   mem_hotplug_begin/end() */
 	int kswapd_max_order;
 	enum zone_type classzone_idx;
+#ifdef CONFIG_COMPACTION
+	int kcompactd_max_order;
+	enum zone_type kcompactd_classzone_idx;
+	wait_queue_head_t kcompactd_wait;
+	struct task_struct *kcompactd;
+#endif
 #ifdef CONFIG_NUMA_BALANCING
 	/* Lock serializing the migrate rate limiting window */
 	spinlock_t numabalancing_migrate_lock;
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 67c1dbd19c6d..58ecc056ee45 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -53,6 +53,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		COMPACTMIGRATE_SCANNED, COMPACTFREE_SCANNED,
 		COMPACTISOLATED,
 		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+		KCOMPACTD_WAKE,
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index c92d1e1cbad9..223450aeb49e 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -350,6 +350,61 @@ DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset,
 );
 #endif
 
+TRACE_EVENT(mm_compaction_kcompactd_sleep,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+DECLARE_EVENT_CLASS(kcompactd_wake_template,
+
+	TP_PROTO(int nid, int order, enum zone_type classzone_idx),
+
+	TP_ARGS(nid, order, classzone_idx),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, order)
+		__field(enum zone_type, classzone_idx)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->order = order;
+		__entry->classzone_idx = classzone_idx;
+	),
+
+	TP_printk("nid=%d order=%d classzone_idx=%-8s",
+		__entry->nid,
+		__entry->order,
+		__print_symbolic(__entry->classzone_idx, ZONE_TYPE))
+);
+
+DEFINE_EVENT(kcompactd_wake_template, mm_compaction_wakeup_kcompactd,
+
+	TP_PROTO(int nid, int order, enum zone_type classzone_idx),
+
+	TP_ARGS(nid, order, classzone_idx)
+);
+
+DEFINE_EVENT(kcompactd_wake_template, mm_compaction_kcompactd_wake,
+
+	TP_PROTO(int nid, int order, enum zone_type classzone_idx),
+
+	TP_ARGS(nid, order, classzone_idx)
+);
+
 #endif /* _TRACE_COMPACTION_H */
 
 /* This part must be outside protection */
diff --git a/mm/compaction.c b/mm/compaction.c
index 585de54dbe8c..7452975fa481 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -17,6 +17,9 @@
 #include <linux/balloon_compaction.h>
 #include <linux/page-isolation.h>
 #include <linux/kasan.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+#include <linux/module.h>
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
@@ -29,6 +32,7 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 {
 	count_vm_events(item, delta);
 }
+
 #else
 #define count_compact_event(item) do { } while (0)
 #define count_compact_events(item, delta) do { } while (0)
@@ -1759,4 +1763,227 @@ void compaction_unregister_node(struct node *node)
 }
 #endif /* CONFIG_SYSFS && CONFIG_NUMA */
 
+static bool kcompactd_work_requested(pg_data_t *pgdat)
+{
+	return pgdat->kcompactd_max_order > 0;
+}
+
+static bool kcompactd_node_suitable(pg_data_t *pgdat)
+{
+	int zoneid;
+	struct zone *zone;
+
+	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+		zone = &pgdat->node_zones[zoneid];
+
+		if (!populated_zone(zone))
+			continue;
+
+		if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
+					pgdat->kcompactd_classzone_idx)
+							== COMPACT_CONTINUE)
+			return true;
+	}
+
+	return false;
+}
+
+static void kcompactd_do_work(pg_data_t *pgdat)
+{
+	/*
+	 * With no special task, compact all zones so that a page of requested
+	 * order is allocatable.
+	 */
+	int zoneid;
+	struct zone *zone;
+	struct compact_control cc = {
+		.order = pgdat->kcompactd_max_order,
+		.classzone_idx = pgdat->kcompactd_classzone_idx,
+		.mode = MIGRATE_ASYNC,
+		.ignore_skip_hint = true,
+
+	};
+	bool success = false;
+
+	trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order,
+							cc.classzone_idx);
+	count_vm_event(KCOMPACTD_WAKE);
+
+retry:
+	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+		int status;
+
+		zone = &pgdat->node_zones[zoneid];
+		if (!populated_zone(zone))
+			continue;
+
+		if (compaction_deferred(zone, cc.order))
+			continue;
+
+		if (compaction_suitable(zone, cc.order, 0, zoneid) !=
+							COMPACT_CONTINUE)
+			continue;
+
+		cc.nr_freepages = 0;
+		cc.nr_migratepages = 0;
+		cc.zone = zone;
+		INIT_LIST_HEAD(&cc.freepages);
+		INIT_LIST_HEAD(&cc.migratepages);
+
+		status = compact_zone(zone, &cc);
+
+		if (zone_watermark_ok(zone, cc.order, low_wmark_pages(zone),
+						cc.classzone_idx, 0)) {
+			success = true;
+			compaction_defer_reset(zone, cc.order, false);
+		} else if (cc.mode != MIGRATE_ASYNC &&
+						status == COMPACT_COMPLETE) {
+			defer_compaction(zone, cc.order);
+		}
+
+		VM_BUG_ON(!list_empty(&cc.freepages));
+		VM_BUG_ON(!list_empty(&cc.migratepages));
+	}
+
+	if (!success && cc.mode == MIGRATE_ASYNC) {
+		cc.mode = MIGRATE_SYNC_LIGHT;
+		goto retry;
+	}
+
+	/*
+	 * Regardless of success, we are done until woken up next. But remember
+	 * the requested order/classzone_idx in case it was higher/tighter than
+	 * our current ones
+	 */
+	if (pgdat->kcompactd_max_order <= cc.order)
+		pgdat->kcompactd_max_order = 0;
+	if (pgdat->classzone_idx >= cc.classzone_idx)
+		pgdat->classzone_idx = pgdat->nr_zones - 1;
+}
+
+void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx)
+{
+	if (!order)
+		return;
+
+	if (pgdat->kcompactd_max_order < order)
+		pgdat->kcompactd_max_order = order;
+
+	if (pgdat->kcompactd_classzone_idx > classzone_idx)
+		pgdat->kcompactd_classzone_idx = classzone_idx;
+
+	if (!waitqueue_active(&pgdat->kcompactd_wait))
+		return;
+
+	if (!kcompactd_node_suitable(pgdat))
+		return;
+
+	trace_mm_compaction_wakeup_kcompactd(pgdat->node_id, order,
+							classzone_idx);
+	wake_up_interruptible(&pgdat->kcompactd_wait);
+}
+
+/*
+ * The background compaction daemon, started as a kernel thread
+ * from the init process.
+ */
+static int kcompactd(void *p)
+{
+	pg_data_t *pgdat = (pg_data_t*)p;
+	struct task_struct *tsk = current;
+
+	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+
+	if (!cpumask_empty(cpumask))
+		set_cpus_allowed_ptr(tsk, cpumask);
+
+	set_freezable();
+
+	pgdat->kcompactd_max_order = 0;
+	pgdat->kcompactd_classzone_idx = pgdat->nr_zones - 1;
+
+	while (!kthread_should_stop()) {
+		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
+		wait_event_freezable(pgdat->kcompactd_wait,
+				kcompactd_work_requested(pgdat));
+
+		kcompactd_do_work(pgdat);
+	}
+
+	return 0;
+}
+
+/*
+ * This kcompactd start function will be called by init and node-hot-add.
+ * On node-hot-add, kcompactd will moved to proper cpus if cpus are hot-added.
+ */
+int kcompactd_run(int nid)
+{
+	pg_data_t *pgdat = NODE_DATA(nid);
+	int ret = 0;
+
+	if (pgdat->kcompactd)
+		return 0;
+
+	pgdat->kcompactd = kthread_run(kcompactd, pgdat, "kcompactd%d", nid);
+	if (IS_ERR(pgdat->kcompactd)) {
+		pr_err("Failed to start kcompactd on node %d\n", nid);
+		ret = PTR_ERR(pgdat->kcompactd);
+		pgdat->kcompactd = NULL;
+	}
+	return ret;
+}
+
+/*
+ * Called by memory hotplug when all memory in a node is offlined. Caller must
+ * hold mem_hotplug_begin/end().
+ */
+void kcompactd_stop(int nid)
+{
+	struct task_struct *kcompactd = NODE_DATA(nid)->kcompactd;
+
+	if (kcompactd) {
+		kthread_stop(kcompactd);
+		NODE_DATA(nid)->kcompactd = NULL;
+	}
+}
+
+/*
+ * It's optimal to keep kcompactd on the same CPUs as their memory, but
+ * not required for correctness. So if the last cpu in a node goes
+ * away, we get changed to run anywhere: as the first one comes back,
+ * restore their cpu bindings.
+ */
+static int cpu_callback(struct notifier_block *nfb, unsigned long action,
+			void *hcpu)
+{
+	int nid;
+
+	if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
+		for_each_node_state(nid, N_MEMORY) {
+			pg_data_t *pgdat = NODE_DATA(nid);
+			const struct cpumask *mask;
+
+			mask = cpumask_of_node(pgdat->node_id);
+
+			if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
+				/* One of our CPUs online: restore mask */
+				set_cpus_allowed_ptr(pgdat->kcompactd, mask);
+		}
+	}
+	return NOTIFY_OK;
+}
+
+static int __init kcompactd_init(void)
+{
+	int nid;
+
+	for_each_node_state(nid, N_MEMORY)
+		kcompactd_run(nid);
+	hotcpu_notifier(cpu_callback, 0);
+	return 0;
+}
+
+module_init(kcompactd_init)
+
 #endif /* CONFIG_COMPACTION */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4af58a3a8ffa..bb5651b4ecf3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -33,6 +33,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memblock.h>
 #include <linux/bootmem.h>
+#include <linux/compaction.h>
 
 #include <asm/tlbflush.h>
 
@@ -1042,7 +1043,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_online(nr_pages, zone, &arg);
 
-	nid = pfn_to_nid(pfn);
+	nid = zone_to_nid(zone);
 
 	ret = memory_notify(MEM_GOING_ONLINE, &arg);
 	ret = notifier_to_errno(ret);
@@ -1082,7 +1083,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
 	if (onlined_pages) {
-		node_states_set_node(zone_to_nid(zone), &arg);
+		node_states_set_node(nid, &arg);
 		if (need_zonelists_rebuild)
 			build_all_zonelists(NULL, NULL);
 		else
@@ -1093,8 +1094,10 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 
 	init_per_zone_wmark_min();
 
-	if (onlined_pages)
-		kswapd_run(zone_to_nid(zone));
+	if (onlined_pages) {
+		kswapd_run(nid);
+		kcompactd_run(nid);
+	}
 
 	vm_total_pages = nr_free_pagecache_pages();
 
@@ -1858,8 +1861,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		zone_pcp_update(zone);
 
 	node_states_clear_node(node, &arg);
-	if (arg.status_change_nid >= 0)
+	if (arg.status_change_nid >= 0) {
 		kswapd_stop(node);
+		kcompactd_stop(node);
+	}
 
 	vm_total_pages = nr_free_pagecache_pages();
 	writeback_set_ratelimit();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 63358d9f9aa9..7747eb36e789 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5212,6 +5212,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 #endif
 	init_waitqueue_head(&pgdat->kswapd_wait);
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
+#ifdef CONFIG_COMPACTION
+	init_waitqueue_head(&pgdat->kcompactd_wait);
+#endif
 	pgdat_page_ext_init(pgdat);
 
 	for (j = 0; j < MAX_NR_ZONES; j++) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 72d52d3aef74..1449e21c55cc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3408,6 +3408,12 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
 		 */
 		reset_isolation_suitable(pgdat);
 
+		/*
+		 * We have freed the memory, now we should compact it to make
+		 * allocation of the requested order possible.
+		 */
+		wakeup_kcompactd(pgdat, order, classzone_idx);
+
 		if (!kthread_should_stop())
 			schedule();
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2c74ddf16..25a0a65c65cc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -826,6 +826,7 @@ const char * const vmstat_text[] = {
 	"compact_stall",
 	"compact_fail",
 	"compact_success",
+	"compact_kcompatd_wake",
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-- 
2.7.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC 3/3] mm, kswapd: stop performing compaction from kswapd
  2016-01-26 15:36 [RFC 1/3] mm, kswapd: remove bogus check of balance_classzone_idx Vlastimil Babka
  2016-01-26 15:36 ` [RFC 2/3] mm, compaction: introduce kcompactd Vlastimil Babka
@ 2016-01-26 15:36 ` Vlastimil Babka
  1 sibling, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2016-01-26 15:36 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Andrea Arcangeli, Kirill A. Shutemov,
	Rik van Riel, Joonsoo Kim, Mel Gorman, David Rientjes,
	Michal Hocko, Johannes Weiner, Vlastimil Babka

Similarly to direct reclaim/compaction, kswapd attempts to combine reclaim and
compaction to attempt making memory allocation of given order available. The
details differ from direct reclaim e.g. in having high watermark as a goal.
The code involved in kswapd's reclaim/compaction decisions has evolved to be
quite complex. Testing reveals that it doesn't actually work in at least one
scenario, and closer inspection suggests that it could be greatly simplified
without compromising on the goal (make high-order page available) or efficiency
(don't reclaim too much). The simplification relieas of doing all compaction in
kcompactd, which is simply woken up when high watermarks are reached by
kswapd's reclaim.

The scenario where kswapd compaction doesn't work was found with mmtests test
stress-highalloc configured to attempt order-9 allocations without direct
reclaim, just waking up kswapd. There was no compaction attempt from kswapd
during the whole test. Some added instrumentation shows what happens:

- balance_pgdat() sets end_zone to Normal, as it's not balanced
- reclaim is attempted on DMA zone, which sets nr_attempted to 99, but it
  cannot reclaim anything, so sc.nr_reclaimed is 0
- for zones DMA32 and Normal, kswapd_shrink_zone uses testorder=0, so it
  merely checks if high watermarks were reached for base pages. This is true,
  so no reclaim is attempted. For DMA, testorder=0 wasn't used, as
  compaction_suitable() returned COMPACT_SKIPPED
- even though the pgdat_needs_compaction flag wasn't set to false, no
  compaction happens due to the condition sc.nr_reclaimed > nr_attempted
  being false (as 0 < 99)
- priority-- due to nr_reclaimed being 0, repeat until priority reaches 0
  pgdat_balanced() is false as only the small zone DMA appears balanced
  (curiously in that check, watermark appears OK and compaction_suitable()
  returns COMPACT_PARTIAL, because a lower classzone_idx is used there)

Now, even if it was decided that reclaim shouldn't be attempted on the DMA
zone, the scenario would be the same, as (sc.nr_reclaimed=0 > nr_attempted=0)
is also false. The condition really should use >= as the comment suggests.
Then there is a mismatch in the check for setting pgdat_needs_compaction to
false using low watermark, while the rest uses high watermark, and who knows
what other subtlety. Hopefully this demonstrates that this is unsustainable.

Luckily we can simplify this a lot. The reclaim/compaction decisions make
sense for direct reclaim scenario, but in kswapd, our primary goal is to reach
high watermark in order-0 pages. Afterwards we can attempt compaction just
once. Unlike direct reclaim, we don't reclaim extra pages (over the high
watermark), the current code already disallows it for good reasons.

After this patch, we simply wake up kcompactd to process the pgdat, after we
have either succeeded or failed to reach the high watermarks in kswapd, which
goes to sleep. Kcompactd will apply the usual criteria to determine which
zones are worth compacting. The key element is adding a "highorder" parameter
to zone_balanced, which, when set to false, makes it consider only order-0
watermark instead of the desired higher order (this was done previously by
kswapd_shrink_zone(), but not elsewhere). This false is passed for example
in pgdat_balanced(). Importantly, wakeup_kswapd() uses true to make sure kswapd
and thus kcompactd are woken up for a high-order allocation failure.

For testing, I used stress-highalloc configured to do order-9 allocations with
GFP_NOWAIT|__GFP_HIGH|__GFP_COMP, so they relied just on kswapd/kcompactd
reclaim/compaction (the interfering kernel builds in phases 1 and 2 work as
usual):

stress-highalloc
                            4.4                   4.4                   4.4
                         1-test                2-test                3-test
Success 1        4.00 (  0.00%)        3.00 ( 25.00%)        9.00 (-125.00%)
Success 2        4.00 (  0.00%)        4.00 (  0.00%)       10.00 (-150.00%)
Success 3       38.00 (  0.00%)       38.00 (  0.00%)       76.00 (-100.00%)

                 4.4         4.4         4.4
              1-test      2-test      3-test
User         2953.35     3093.48     3093.42
System       1122.04     1143.88     1156.99
Elapsed      1868.16     1874.77     1899.16

                                      4.4         4.4         4.4
                                   1-test      2-test      3-test
Minor Faults                    108895488   109087527   109388744
Major Faults                          620         604         750
Swap Ins                              237         280         412
Swap Outs                            3722        4585        3691
Allocation stalls                     321         313         260
DMA allocs                            110         159           7
DMA32 allocs                     76286159    76181606    76855688
Normal allocs                    26568981    26483685    26719273
Movable allocs                          0           0           0
Direct pages scanned                53308       46054       47504
Kswapd pages scanned              2126926     2156583     2180757
Kswapd pages reclaimed            2119616     2134575     2173845
Direct pages reclaimed              53141       45355       47336
Kswapd efficiency                     99%         98%         99%
Kswapd velocity                  1138.514    1150.319    1148.275
Direct efficiency                     99%         98%         99%
Direct velocity                    28.535      24.565      25.013
Percentage direct scans                2%          2%          2%
Zone normal velocity              284.759     275.373     280.962
Zone dma32 velocity               882.278     899.486     892.326
Zone dma velocity                   0.012       0.025       0.000
Page writes by reclaim           3722.000    4585.000    3691.000
Page writes file                        0           0           0
Page writes anon                     3722        4585        3691
Page reclaim immediate                139         293         245
Sector Reads                      4392000     4389436     4434052
Sector Writes                    11082116    11088032    11102164
Page rescued immediate                  0           0           0
Slabs scanned                     1554185     1546949     1573151
Direct inode steals                 17675       27398        3110
Kswapd inode steals                 49005       38837       63606
Kswapd skipped wait                     0           0           0
THP fault alloc                       816         725         784
THP collapse alloc                    429         351         368
THP splits                              5           4           7
THP fault fallback                    583         619         561
THP collapse fail                      12          12          13
Compaction stalls                    1082        1064         924
Compaction success                    185         134         113
Compaction failures                   897         930         811
Page migrate success               547663      555349     1357157
Page migrate failure                23289        9299       23159
Compaction pages isolated         1143892     1146019     2810572
Compaction migrate scanned        1518500     1463701     7626063
Compaction free scanned          35478038    32735896   326240264
Compaction cost                       601         608        1515

We can see that just adding kcompactd (second column) didn't help, as kswapd
was failing to do anything and went sleep with balanced_order=0, therefore
kcompactd didn't receive the correct order=9. After this patch (third column)
we see improvements in allocation success rate along with increased compaction
activity. The compaction stalls (direct compaction) in the interfering kernel
builds (probably THP's) also decreased by 10% thanks to kcompactd activity.
THP alloc successes are somewhat unstable (there should be no difference
between patch 1 and 2) but it doesn't look like a regression for this patch.

We can also configure stress-highalloc to perform both direct
reclaim/compaction and wakeup kswapd/kcompactd, by using
GFP_KERNEL|__GFP_HIGH|__GFP_COMP:

stress-highalloc
                            4.4                   4.4                   4.4
                        1-test2               2-test2               3-test2
Success 1       13.00 (  0.00%)       11.00 ( 15.38%)        7.00 ( 46.15%)
Success 2       14.00 (  0.00%)       12.00 ( 14.29%)        7.00 ( 50.00%)
Success 3       77.00 (  0.00%)       80.00 ( -3.90%)       79.00 ( -2.60%)

                 4.4         4.4         4.4
             1-test2     2-test2     3-test2
User         3109.78     3160.65     3141.30
System       1156.71     1169.76     1164.30
Elapsed      1890.71     1931.35     1883.46

                                      4.4         4.4         4.4
                                  1-test2     2-test2     3-test2
Minor Faults                    110079446   111295014   110632582
Major Faults                          612         612         627
Swap Ins                              223         191         174
Swap Outs                            4829        4285        4796
Allocation stalls                    4516        4571        4577
DMA allocs                            129         164          12
DMA32 allocs                     77336335    78049863    77568017
Normal allocs                    26946355    27371021    26879992
Movable allocs                          0           0           0
Direct pages scanned               164893      159692      113793
Kswapd pages scanned              2115489     2076764     2099869
Kswapd pages reclaimed            2107476     2069146     2091612
Direct pages reclaimed             164268      158978      113527
Kswapd efficiency                     99%         99%         99%
Kswapd velocity                  1118.886    1075.291    1114.900
Direct efficiency                     99%         99%         99%
Direct velocity                    87.212      82.684      60.417
Percentage direct scans                7%          7%          5%
Zone normal velocity              289.128     301.114     280.090
Zone dma32 velocity               916.944     856.826     895.227
Zone dma velocity                   0.026       0.036       0.000
Page writes by reclaim           4829.000    4285.000    4796.000
Page writes file                        0           0           0
Page writes anon                     4829        4285        4796
Page reclaim immediate                274         379          30
Sector Reads                      4465640     4469740     4431932
Sector Writes                    11117364    11126004    11117220
Page rescued immediate                  0           0           0
Slabs scanned                     1677445     1882706     1655883
Direct inode steals                  8364       16551        5725
Kswapd inode steals                 65367       61029       67560
Kswapd skipped wait                     0           0           0
THP fault alloc                       769         686         685
THP collapse alloc                    332         356         368
THP splits                              3           2           2
THP fault fallback                    691         731         658
THP collapse fail                      20          20          21
Compaction stalls                    2686        2749        2256
Compaction success                    889         955         512
Compaction failures                  1797        1794        1744
Page migrate success              2442538     2396245     3044899
Page migrate failure                31276       32952       58085
Compaction pages isolated         4963155     4875568     6204653
Compaction migrate scanned       17004834    13387834    20155758
Compaction free scanned         269507706   259687893   437874337
Compaction cost                      2749        2674        3420

Here, this patch does lower the success rate for phases 1 and 2 with
interference, but importantly phase 3 (no interference) is the same.  There's
however significant reduction in direct compaction stalls, made entirely of the
successful stalls. This means the offload to kcompactd is working as expected,
and direct compaction is reduced either due to detecting contention, on
compaction deferred by kcompactd. The apparent regression of alloc success rate
is likely due to races - a direct compaction attempt immediately followed by
allocation attempt is more likely to succeed than when the allocation attempts
are asynchronous to kcompactd activity. This is the price for reduced
latencies.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/vmscan.c | 131 +++++++++++++++++-------------------------------------------
 1 file changed, 36 insertions(+), 95 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1449e21c55cc..dd4ccce93509 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2981,18 +2981,23 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
 	} while (memcg);
 }
 
-static bool zone_balanced(struct zone *zone, int order,
-			  unsigned long balance_gap, int classzone_idx)
+static bool zone_balanced(struct zone *zone, int order, bool highorder,
+			unsigned long balance_gap, int classzone_idx)
 {
-	if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone) +
-				    balance_gap, classzone_idx))
-		return false;
+	unsigned long mark = high_wmark_pages(zone) + balance_gap;
 
-	if (IS_ENABLED(CONFIG_COMPACTION) && order && compaction_suitable(zone,
-				order, 0, classzone_idx) == COMPACT_SKIPPED)
-		return false;
+	/*
+	 * When checking from pgdat_balanced(), kswapd should stop and sleep
+	 * when it reaches the high order-0 watermark and let kcompactd take
+	 * over. Other callers such as wakeup_kswapd() want to determine the
+	 * true high-order watermark.
+	 */
+	if (IS_ENABLED(CONFIG_COMPACTION) && !highorder) {
+		mark += (1UL << order);
+		order = 0;
+	}
 
-	return true;
+	return zone_watermark_ok_safe(zone, order, mark, classzone_idx);
 }
 
 /*
@@ -3042,7 +3047,7 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
 			continue;
 		}
 
-		if (zone_balanced(zone, order, 0, i))
+		if (zone_balanced(zone, order, false, 0, i))
 			balanced_pages += zone->managed_pages;
 		else if (!order)
 			return false;
@@ -3096,8 +3101,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
  */
 static bool kswapd_shrink_zone(struct zone *zone,
 			       int classzone_idx,
-			       struct scan_control *sc,
-			       unsigned long *nr_attempted)
+			       struct scan_control *sc)
 {
 	int testorder = sc->order;
 	unsigned long balance_gap;
@@ -3107,17 +3111,6 @@ static bool kswapd_shrink_zone(struct zone *zone,
 	sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
 
 	/*
-	 * Kswapd reclaims only single pages with compaction enabled. Trying
-	 * too hard to reclaim until contiguous free pages have become
-	 * available can hurt performance by evicting too much useful data
-	 * from memory. Do not reclaim more than needed for compaction.
-	 */
-	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
-			compaction_suitable(zone, sc->order, 0, classzone_idx)
-							!= COMPACT_SKIPPED)
-		testorder = 0;
-
-	/*
 	 * We put equal pressure on every zone, unless one zone has way too
 	 * many pages free already. The "too many pages" is defined as the
 	 * high wmark plus a "gap" where the gap is either the low
@@ -3131,15 +3124,12 @@ static bool kswapd_shrink_zone(struct zone *zone,
 	 * reclaim is necessary
 	 */
 	lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
-	if (!lowmem_pressure && zone_balanced(zone, testorder,
+	if (!lowmem_pressure && zone_balanced(zone, testorder, false,
 						balance_gap, classzone_idx))
 		return true;
 
 	shrink_zone(zone, sc, zone_idx(zone) == classzone_idx);
 
-	/* Account for the number of pages attempted to reclaim */
-	*nr_attempted += sc->nr_to_reclaim;
-
 	clear_bit(ZONE_WRITEBACK, &zone->flags);
 
 	/*
@@ -3149,7 +3139,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
 	 * waits.
 	 */
 	if (zone_reclaimable(zone) &&
-	    zone_balanced(zone, testorder, 0, classzone_idx)) {
+	    zone_balanced(zone, testorder, false, 0, classzone_idx)) {
 		clear_bit(ZONE_CONGESTED, &zone->flags);
 		clear_bit(ZONE_DIRTY, &zone->flags);
 	}
@@ -3161,7 +3151,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at high_wmark_pages(zone).
  *
- * Returns the final order kswapd was reclaiming at
+ * Returns the highest zone idx kswapd was reclaiming at
  *
  * There is special handling here for zones which are full of pinned pages.
  * This can happen if the pages are all mlocked, or if they are all used by
@@ -3178,8 +3168,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
  * interoperates with the page allocator fallback scheme to ensure that aging
  * of pages is balanced across the zones.
  */
-static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
-							int *classzone_idx)
+static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 {
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
@@ -3196,9 +3185,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	count_vm_event(PAGEOUTRUN);
 
 	do {
-		unsigned long nr_attempted = 0;
 		bool raise_priority = true;
-		bool pgdat_needs_compaction = (order > 0);
 
 		sc.nr_reclaimed = 0;
 
@@ -3233,7 +3220,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 				break;
 			}
 
-			if (!zone_balanced(zone, order, 0, 0)) {
+			if (!zone_balanced(zone, order, true, 0, 0)) {
 				end_zone = i;
 				break;
 			} else {
@@ -3249,24 +3236,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 		if (i < 0)
 			goto out;
 
-		for (i = 0; i <= end_zone; i++) {
-			struct zone *zone = pgdat->node_zones + i;
-
-			if (!populated_zone(zone))
-				continue;
-
-			/*
-			 * If any zone is currently balanced then kswapd will
-			 * not call compaction as it is expected that the
-			 * necessary pages are already available.
-			 */
-			if (pgdat_needs_compaction &&
-					zone_watermark_ok(zone, order,
-						low_wmark_pages(zone),
-						*classzone_idx, 0))
-				pgdat_needs_compaction = false;
-		}
-
 		/*
 		 * If we're getting trouble reclaiming, start doing writepage
 		 * even in laptop mode.
@@ -3310,8 +3279,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 			 * that that high watermark would be met at 100%
 			 * efficiency.
 			 */
-			if (kswapd_shrink_zone(zone, end_zone,
-					       &sc, &nr_attempted))
+			if (kswapd_shrink_zone(zone, end_zone, &sc))
 				raise_priority = false;
 		}
 
@@ -3324,46 +3292,25 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 				pfmemalloc_watermark_ok(pgdat))
 			wake_up_all(&pgdat->pfmemalloc_wait);
 
-		/*
-		 * Fragmentation may mean that the system cannot be rebalanced
-		 * for high-order allocations in all zones. If twice the
-		 * allocation size has been reclaimed and the zones are still
-		 * not balanced then recheck the watermarks at order-0 to
-		 * prevent kswapd reclaiming excessively. Assume that a
-		 * process requested a high-order can direct reclaim/compact.
-		 */
-		if (order && sc.nr_reclaimed >= 2UL << order)
-			order = sc.order = 0;
-
 		/* Check if kswapd should be suspending */
 		if (try_to_freeze() || kthread_should_stop())
 			break;
 
 		/*
-		 * Compact if necessary and kswapd is reclaiming at least the
-		 * high watermark number of pages as requsted
-		 */
-		if (pgdat_needs_compaction && sc.nr_reclaimed > nr_attempted)
-			compact_pgdat(pgdat, order);
-
-		/*
 		 * Raise priority if scanning rate is too low or there was no
 		 * progress in reclaiming pages
 		 */
 		if (raise_priority || !sc.nr_reclaimed)
 			sc.priority--;
 	} while (sc.priority >= 1 &&
-		 !pgdat_balanced(pgdat, order, *classzone_idx));
+			!pgdat_balanced(pgdat, order, classzone_idx));
 
 out:
 	/*
-	 * Return the order we were reclaiming at so prepare_kswapd_sleep()
-	 * makes a decision on the order we were last reclaiming at. However,
-	 * if another caller entered the allocator slow path while kswapd
-	 * was awake, order will remain at the higher level
+	 * Return the highest zone idx we were reclaiming at so
+	 * prepare_kswapd_sleep() makes the same decisions as here.
 	 */
-	*classzone_idx = end_zone;
-	return order;
+	return end_zone;
 }
 
 static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
@@ -3443,7 +3390,6 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
 static int kswapd(void *p)
 {
 	unsigned long order, new_order;
-	unsigned balanced_order;
 	int classzone_idx, new_classzone_idx;
 	int balanced_classzone_idx;
 	pg_data_t *pgdat = (pg_data_t*)p;
@@ -3476,23 +3422,19 @@ static int kswapd(void *p)
 	set_freezable();
 
 	order = new_order = 0;
-	balanced_order = 0;
 	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
 	balanced_classzone_idx = classzone_idx;
 	for ( ; ; ) {
 		bool ret;
 
 		/*
-		 * If the last balance_pgdat was unsuccessful it's unlikely a
-		 * new request of a similar or harder type will succeed soon
-		 * so consider going to sleep on the basis we reclaimed at
+		 * While we were reclaiming, there might have been another
+		 * wakeup, so check the values.
 		 */
-		if (balanced_order == new_order) {
-			new_order = pgdat->kswapd_max_order;
-			new_classzone_idx = pgdat->classzone_idx;
-			pgdat->kswapd_max_order =  0;
-			pgdat->classzone_idx = pgdat->nr_zones - 1;
-		}
+		new_order = pgdat->kswapd_max_order;
+		new_classzone_idx = pgdat->classzone_idx;
+		pgdat->kswapd_max_order =  0;
+		pgdat->classzone_idx = pgdat->nr_zones - 1;
 
 		if (order < new_order || classzone_idx > new_classzone_idx) {
 			/*
@@ -3502,7 +3444,7 @@ static int kswapd(void *p)
 			order = new_order;
 			classzone_idx = new_classzone_idx;
 		} else {
-			kswapd_try_to_sleep(pgdat, balanced_order,
+			kswapd_try_to_sleep(pgdat, order,
 						balanced_classzone_idx);
 			order = pgdat->kswapd_max_order;
 			classzone_idx = pgdat->classzone_idx;
@@ -3522,9 +3464,8 @@ static int kswapd(void *p)
 		 */
 		if (!ret) {
 			trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
-			balanced_classzone_idx = classzone_idx;
-			balanced_order = balance_pgdat(pgdat, order,
-						&balanced_classzone_idx);
+			balanced_classzone_idx = balance_pgdat(pgdat, order,
+								classzone_idx);
 		}
 	}
 
@@ -3554,7 +3495,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
 	}
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
-	if (zone_balanced(zone, order, 0, 0))
+	if (zone_balanced(zone, order, true, 0, 0))
 		return;
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
-- 
2.7.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC 2/3] mm, compaction: introduce kcompactd
  2016-01-26 15:36 ` [RFC 2/3] mm, compaction: introduce kcompactd Vlastimil Babka
@ 2016-01-27  9:10   ` Mel Gorman
  2016-01-27 12:58     ` Vlastimil Babka
  0 siblings, 1 reply; 5+ messages in thread
From: Mel Gorman @ 2016-01-27  9:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Andrew Morton, Andrea Arcangeli,
	Kirill A. Shutemov, Rik van Riel, Joonsoo Kim, David Rientjes,
	Michal Hocko, Johannes Weiner

On Tue, Jan 26, 2016 at 04:36:14PM +0100, Vlastimil Babka wrote:
> Memory compaction can be currently performed in several contexts:
> 
> - kswapd balancing a zone after a high-order allocation failure
> - direct compaction to satisfy a high-order allocation, including THP page
>   fault attemps
> - khugepaged trying to collapse a hugepage
> - manually from /proc
> 
> The purpose of compaction is two-fold. The obvious purpose is to satisfy a
> (pending or future) high-order allocation, and is easy to evaluate. The other
> purpose is to keep overal memory fragmentation low and help the
> anti-fragmentation mechanism. The success wrt the latter purpose is more
> difficult to evaluate though.
> 
> The current situation wrt the purposes has a few drawbacks:
> 
> - compaction is invoked only when a high-order page or hugepage is not
>   available (or manually). This might be too late for the purposes of keeping
>   memory fragmentation low.
> - direct compaction increases latency of allocations. Again, it would be
>   better if compaction was performed asynchronously to keep fragmentation low,
>   before the allocation itself comes.
> - (a special case of the previous) the cost of compaction during THP page
>   faults can easily offset the benefits of THP.
> - kswapd compaction appears to be complex, fragile and not working in some
>   scenarios
> 

An addendum to that is that kswapd can be compacting for a high-order
allocation request when it should be reclaiming memory for an order-0
request.

My recollection is that kswapd compacting was meant to help atomic
high-order allocations but I wonder if the same problem even exists with
the revised watermark handling.

> To improve the situation, we should be able to benefit from an equivalent of
> kswapd, but for compaction - i.e. a background thread which responds to
> fragmentation and the need for high-order allocations (including hugepages)
> somewhat proactively.
> 
> One possibility is to extend the responsibilities of kswapd, which could
> however complicate its design too much. It should be better to let kswapd
> handle reclaim, as order-0 allocations are often more critical than high-order
> ones.
> 
> Another possibility is to extend khugepaged, but this kthread is a single
> instance and tied to THP configs.
> 

That also would not handle the atomic high-order allocation case.

> This patch goes with the option of a new set of per-node kthreads called
> kcompactd, and lays the foundations, without introducing any new tunables.
> The lifecycle mimics kswapd kthreads, including the memory hotplug hooks.
> 
> Waking up of the kcompactd threads is also tied to kswapd activity and follows
> these rules:
> - we don't want to affect any fastpaths, so wake up kcompactd only from the
>   slowpath, as it's done for kswapd

Ok

> - if kswapd is doing reclaim, it's more important than compaction, so don't
>   invoke kcompactd until kswapd goes to sleep

This makes sense given that kswapd can be reclaiming order-0 pages so
compaction can even start with a reasonable chance of success.

> - the target order used for kswapd is passed to kcompactd
> 
> The kswapd compact/reclaim loop for high-order pages will be removed in the
> next patch with the description of what's wrong with it.
> 
> In this patch, kcompactd uses the standard compaction_suitable() and
> compact_finished() criteria, which means it will most likely have nothing left
> to do after kswapd finishes, until the next patch. Kcompactd also mimics
> direct compaction somewhat by trying async compaction first and sync compaction
> afterwards, and uses the deferred compaction functionality.
> 

Why should it try async compaction first? The deferred compaction makes
sense as kcompact will need some sort of limitation on the amount of
CPU it can use.

> Future possible future uses for kcompactd include the ability to wake up
> kcompactd on demand in special situations, such as when hugepages are not
> available (currently not done due to __GFP_NO_KSWAPD) or when a fragmentation
> event (i.e. __rmqueue_fallback()) occurs. It's also possible to perform
> periodic compaction with kcompactd.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/compaction.h        |  16 +++
>  include/linux/mmzone.h            |   6 +
>  include/linux/vm_event_item.h     |   1 +
>  include/trace/events/compaction.h |  55 +++++++++
>  mm/compaction.c                   | 227 ++++++++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c               |  15 ++-
>  mm/page_alloc.c                   |   3 +
>  mm/vmscan.c                       |   6 +
>  mm/vmstat.c                       |   1 +
>  9 files changed, 325 insertions(+), 5 deletions(-)
> 
> <SNIP>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 585de54dbe8c..7452975fa481 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -17,6 +17,9 @@
>  #include <linux/balloon_compaction.h>
>  #include <linux/page-isolation.h>
>  #include <linux/kasan.h>
> +#include <linux/kthread.h>
> +#include <linux/freezer.h>
> +#include <linux/module.h>
>  #include "internal.h"
>  
>  #ifdef CONFIG_COMPACTION
> @@ -29,6 +32,7 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
>  {
>  	count_vm_events(item, delta);
>  }
> +
>  #else
>  #define count_compact_event(item) do { } while (0)
>  #define count_compact_events(item, delta) do { } while (0)

Spurious whitespace change.

> @@ -1759,4 +1763,227 @@ void compaction_unregister_node(struct node *node)
>  }
>  #endif /* CONFIG_SYSFS && CONFIG_NUMA */
>  
> +static bool kcompactd_work_requested(pg_data_t *pgdat)
> +{
> +	return pgdat->kcompactd_max_order > 0;
> +}
> +

inline

> +static bool kcompactd_node_suitable(pg_data_t *pgdat)
> +{
> +	int zoneid;
> +	struct zone *zone;
> +
> +	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
> +		zone = &pgdat->node_zones[zoneid];
> +
> +		if (!populated_zone(zone))
> +			continue;
> +
> +		if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
> +					pgdat->kcompactd_classzone_idx)
> +							== COMPACT_CONTINUE)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +

Why does this traverse all zones and not just the ones within the
classzone_idx?

> +static void kcompactd_do_work(pg_data_t *pgdat)
> +{
> +	/*
> +	 * With no special task, compact all zones so that a page of requested
> +	 * order is allocatable.
> +	 */
> +	int zoneid;
> +	struct zone *zone;
> +	struct compact_control cc = {
> +		.order = pgdat->kcompactd_max_order,
> +		.classzone_idx = pgdat->kcompactd_classzone_idx,
> +		.mode = MIGRATE_ASYNC,
> +		.ignore_skip_hint = true,
> +
> +	};
> +	bool success = false;
> +
> +	trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order,
> +							cc.classzone_idx);
> +	count_vm_event(KCOMPACTD_WAKE);
> +
> +retry:
> +	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {

Again, why is classzone_idx not taken into account?

> +		int status;
> +
> +		zone = &pgdat->node_zones[zoneid];
> +		if (!populated_zone(zone))
> +			continue;
> +
> +		if (compaction_deferred(zone, cc.order))
> +			continue;
> +
> +		if (compaction_suitable(zone, cc.order, 0, zoneid) !=
> +							COMPACT_CONTINUE)
> +			continue;
> +
> +		cc.nr_freepages = 0;
> +		cc.nr_migratepages = 0;
> +		cc.zone = zone;
> +		INIT_LIST_HEAD(&cc.freepages);
> +		INIT_LIST_HEAD(&cc.migratepages);
> +
> +		status = compact_zone(zone, &cc);
> +
> +		if (zone_watermark_ok(zone, cc.order, low_wmark_pages(zone),
> +						cc.classzone_idx, 0)) {
> +			success = true;
> +			compaction_defer_reset(zone, cc.order, false);
> +		} else if (cc.mode != MIGRATE_ASYNC &&
> +						status == COMPACT_COMPLETE) {
> +			defer_compaction(zone, cc.order);
> +		}
> +
> +		VM_BUG_ON(!list_empty(&cc.freepages));
> +		VM_BUG_ON(!list_empty(&cc.migratepages));
> +	}
> +
> +	if (!success && cc.mode == MIGRATE_ASYNC) {
> +		cc.mode = MIGRATE_SYNC_LIGHT;
> +		goto retry;
> +	}
> +

Still not getting why kcompactd should concern itself with async
compaction. It's really direct compaction that cared and was trying to
avoid stalls.

> +	 * Regardless of success, we are done until woken up next. But remember
> +	 * the requested order/classzone_idx in case it was higher/tighter than
> +	 * our current ones
> +	 */
> +	if (pgdat->kcompactd_max_order <= cc.order)
> +		pgdat->kcompactd_max_order = 0;
> +	if (pgdat->classzone_idx >= cc.classzone_idx)
> +		pgdat->classzone_idx = pgdat->nr_zones - 1;
> +}
> +
>
> <SNIP>
>
> @@ -1042,7 +1043,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>  	arg.nr_pages = nr_pages;
>  	node_states_check_changes_online(nr_pages, zone, &arg);
>  
> -	nid = pfn_to_nid(pfn);
> +	nid = zone_to_nid(zone);
>  
>  	ret = memory_notify(MEM_GOING_ONLINE, &arg);
>  	ret = notifier_to_errno(ret);
> @@ -1082,7 +1083,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>  	pgdat_resize_unlock(zone->zone_pgdat, &flags);
>  
>  	if (onlined_pages) {
> -		node_states_set_node(zone_to_nid(zone), &arg);
> +		node_states_set_node(nid, &arg);
>  		if (need_zonelists_rebuild)
>  			build_all_zonelists(NULL, NULL);
>  		else

Why are these two hunks necessary?

> @@ -1093,8 +1094,10 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>  
>  	init_per_zone_wmark_min();
>  
> -	if (onlined_pages)
> -		kswapd_run(zone_to_nid(zone));
> +	if (onlined_pages) {
> +		kswapd_run(nid);
> +		kcompactd_run(nid);
> +	}
>  
>  	vm_total_pages = nr_free_pagecache_pages();
>  
> @@ -1858,8 +1861,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		zone_pcp_update(zone);
>  
>  	node_states_clear_node(node, &arg);
> -	if (arg.status_change_nid >= 0)
> +	if (arg.status_change_nid >= 0) {
>  		kswapd_stop(node);
> +		kcompactd_stop(node);
> +	}
>  
>  	vm_total_pages = nr_free_pagecache_pages();
>  	writeback_set_ratelimit();
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 63358d9f9aa9..7747eb36e789 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5212,6 +5212,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  #endif
>  	init_waitqueue_head(&pgdat->kswapd_wait);
>  	init_waitqueue_head(&pgdat->pfmemalloc_wait);
> +#ifdef CONFIG_COMPACTION
> +	init_waitqueue_head(&pgdat->kcompactd_wait);
> +#endif
>  	pgdat_page_ext_init(pgdat);
>  
>  	for (j = 0; j < MAX_NR_ZONES; j++) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 72d52d3aef74..1449e21c55cc 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3408,6 +3408,12 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
>  		 */
>  		reset_isolation_suitable(pgdat);
>  
> +		/*
> +		 * We have freed the memory, now we should compact it to make
> +		 * allocation of the requested order possible.
> +		 */
> +		wakeup_kcompactd(pgdat, order, classzone_idx);
> +
>  		if (!kthread_should_stop())
>  			schedule();
>  

This initially confused me but it's due to patch ordering. It's silly
but when this patch is applied then both kswapd and kcompactd are
compacting memory. I would prefer if the patches were in reverse order
but that is purely taste.

While this was not a comprehensive review, I think the patch is ok in
principal. While deferred compaction will keep the CPU usage under control,
the main concern is that kcompactd consumes too much CPU but I do not
see a case where that would trigger that kswapd would not have
encountered already.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC 2/3] mm, compaction: introduce kcompactd
  2016-01-27  9:10   ` Mel Gorman
@ 2016-01-27 12:58     ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2016-01-27 12:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, Andrew Morton, Andrea Arcangeli,
	Kirill A. Shutemov, Rik van Riel, Joonsoo Kim, David Rientjes,
	Michal Hocko, Johannes Weiner

On 01/27/2016 10:10 AM, Mel Gorman wrote:
> On Tue, Jan 26, 2016 at 04:36:14PM +0100, Vlastimil Babka wrote:
>> Memory compaction can be currently performed in several contexts:
>> 
>> - kswapd balancing a zone after a high-order allocation failure
>> - direct compaction to satisfy a high-order allocation, including THP page
>>   fault attemps
>> - khugepaged trying to collapse a hugepage
>> - manually from /proc
>> 
>> The purpose of compaction is two-fold. The obvious purpose is to satisfy a
>> (pending or future) high-order allocation, and is easy to evaluate. The other
>> purpose is to keep overal memory fragmentation low and help the
>> anti-fragmentation mechanism. The success wrt the latter purpose is more
>> difficult to evaluate though.
>> 
>> The current situation wrt the purposes has a few drawbacks:
>> 
>> - compaction is invoked only when a high-order page or hugepage is not
>>   available (or manually). This might be too late for the purposes of keeping
>>   memory fragmentation low.
>> - direct compaction increases latency of allocations. Again, it would be
>>   better if compaction was performed asynchronously to keep fragmentation low,
>>   before the allocation itself comes.
>> - (a special case of the previous) the cost of compaction during THP page
>>   faults can easily offset the benefits of THP.
>> - kswapd compaction appears to be complex, fragile and not working in some
>>   scenarios
>> 
> 
> An addendum to that is that kswapd can be compacting for a high-order
> allocation request when it should be reclaiming memory for an order-0
> request.

Right, thanks.

> My recollection is that kswapd compacting was meant to help atomic
> high-order allocations but I wonder if the same problem even exists with
> the revised watermark handling.

Well, certainly nobody noticed kswapd compaction being dysfunctional.

> 
>> - the target order used for kswapd is passed to kcompactd
>> 
>> The kswapd compact/reclaim loop for high-order pages will be removed in the
>> next patch with the description of what's wrong with it.
>> 
>> In this patch, kcompactd uses the standard compaction_suitable() and
>> compact_finished() criteria, which means it will most likely have nothing left
>> to do after kswapd finishes, until the next patch. Kcompactd also mimics
>> direct compaction somewhat by trying async compaction first and sync compaction
>> afterwards, and uses the deferred compaction functionality.
>> 
> 
> Why should it try async compaction first? The deferred compaction makes
> sense as kcompact will need some sort of limitation on the amount of
> CPU it can use.

I was just being conservative, but good point. Unlike direct compaction, latency
doesn't bother kcompactd.

> 
>> @@ -1759,4 +1763,227 @@ void compaction_unregister_node(struct node *node)
>>  }
>>  #endif /* CONFIG_SYSFS && CONFIG_NUMA */
>>  
>> +static bool kcompactd_work_requested(pg_data_t *pgdat)
>> +{
>> +	return pgdat->kcompactd_max_order > 0;
>> +}
>> +
> 
> inline
> 
>> +static bool kcompactd_node_suitable(pg_data_t *pgdat)
>> +{
>> +	int zoneid;
>> +	struct zone *zone;
>> +
>> +	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
>> +		zone = &pgdat->node_zones[zoneid];
>> +
>> +		if (!populated_zone(zone))
>> +			continue;
>> +
>> +		if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
>> +					pgdat->kcompactd_classzone_idx)
>> +							== COMPACT_CONTINUE)
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
> 
> Why does this traverse all zones and not just the ones within the
> classzone_idx?

Hmm, guess I didn't revisit it after previous submission where kswapd compaction
wasn't being replaced by kcompactd. But kswapd also tries to balance higher
zones than those given by classzone_idx, if they needed. I'll rethink this.

>> +static void kcompactd_do_work(pg_data_t *pgdat)
>> +{
>> +	/*
>> +	 * With no special task, compact all zones so that a page of requested
>> +	 * order is allocatable.
>> +	 */
>> +	int zoneid;
>> +	struct zone *zone;
>> +	struct compact_control cc = {
>> +		.order = pgdat->kcompactd_max_order,
>> +		.classzone_idx = pgdat->kcompactd_classzone_idx,
>> +		.mode = MIGRATE_ASYNC,
>> +		.ignore_skip_hint = true,
>> +
>> +	};
>> +	bool success = false;
>> +
>> +	trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order,
>> +							cc.classzone_idx);
>> +	count_vm_event(KCOMPACTD_WAKE);
>> +
>> +retry:
>> +	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
> 
> Again, why is classzone_idx not taken into account?

Might be worth to just do everything once we've woken up, like kswapd.
Deferred+suitable checks should prevent wasted attempts in either case.

[...]

>> +
>> +	if (!success && cc.mode == MIGRATE_ASYNC) {
>> +		cc.mode = MIGRATE_SYNC_LIGHT;
>> +		goto retry;
>> +	}
>> +
> 
> Still not getting why kcompactd should concern itself with async
> compaction. It's really direct compaction that cared and was trying to
> avoid stalls.

Right

>> +	 * Regardless of success, we are done until woken up next. But remember
>> +	 * the requested order/classzone_idx in case it was higher/tighter than
>> +	 * our current ones
>> +	 */
>> +	if (pgdat->kcompactd_max_order <= cc.order)
>> +		pgdat->kcompactd_max_order = 0;
>> +	if (pgdat->classzone_idx >= cc.classzone_idx)
>> +		pgdat->classzone_idx = pgdat->nr_zones - 1;
>> +}
>> +
>>
>> <SNIP>
>>
>> @@ -1042,7 +1043,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>>  	arg.nr_pages = nr_pages;
>>  	node_states_check_changes_online(nr_pages, zone, &arg);
>>  
>> -	nid = pfn_to_nid(pfn);
>> +	nid = zone_to_nid(zone);
>>  
>>  	ret = memory_notify(MEM_GOING_ONLINE, &arg);
>>  	ret = notifier_to_errno(ret);
>> @@ -1082,7 +1083,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>>  	pgdat_resize_unlock(zone->zone_pgdat, &flags);
>>  
>>  	if (onlined_pages) {
>> -		node_states_set_node(zone_to_nid(zone), &arg);
>> +		node_states_set_node(nid, &arg);
>>  		if (need_zonelists_rebuild)
>>  			build_all_zonelists(NULL, NULL);
>>  		else
> 
> Why are these two hunks necessary?

Just a drive-by cleanup/optimization that didn't seem worth separate patch. But
I probably should?

> 
>> @@ -1093,8 +1094,10 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>>  
>>  	init_per_zone_wmark_min();
>>  
>> -	if (onlined_pages)
>> -		kswapd_run(zone_to_nid(zone));
>> +	if (onlined_pages) {
>> +		kswapd_run(nid);
>> +		kcompactd_run(nid);
>> +	}
>>  
>>  	vm_total_pages = nr_free_pagecache_pages();
>>  
>> @@ -1858,8 +1861,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  		zone_pcp_update(zone);
>>  
>>  	node_states_clear_node(node, &arg);
>> -	if (arg.status_change_nid >= 0)
>> +	if (arg.status_change_nid >= 0) {
>>  		kswapd_stop(node);
>> +		kcompactd_stop(node);
>> +	}
>>  
>>  	vm_total_pages = nr_free_pagecache_pages();
>>  	writeback_set_ratelimit();
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 63358d9f9aa9..7747eb36e789 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5212,6 +5212,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>>  #endif
>>  	init_waitqueue_head(&pgdat->kswapd_wait);
>>  	init_waitqueue_head(&pgdat->pfmemalloc_wait);
>> +#ifdef CONFIG_COMPACTION
>> +	init_waitqueue_head(&pgdat->kcompactd_wait);
>> +#endif
>>  	pgdat_page_ext_init(pgdat);
>>  
>>  	for (j = 0; j < MAX_NR_ZONES; j++) {
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 72d52d3aef74..1449e21c55cc 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3408,6 +3408,12 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
>>  		 */
>>  		reset_isolation_suitable(pgdat);
>>  
>> +		/*
>> +		 * We have freed the memory, now we should compact it to make
>> +		 * allocation of the requested order possible.
>> +		 */
>> +		wakeup_kcompactd(pgdat, order, classzone_idx);
>> +
>>  		if (!kthread_should_stop())
>>  			schedule();
>>  
> 
> This initially confused me but it's due to patch ordering. It's silly
> but when this patch is applied then both kswapd and kcompactd are
> compacting memory. I would prefer if the patches were in reverse order
> but that is purely taste.

In reverse order there would be a case where neither is compacting. Guess I'll
just move the wakeup to the next patch. The separation is mainly for making
review more tractable.

> While this was not a comprehensive review, I think the patch is ok in
> principal. While deferred compaction will keep the CPU usage under control,
> the main concern is that kcompactd consumes too much CPU but I do not
> see a case where that would trigger that kswapd would not have
> encountered already.

Thanks! On the opposite, kswapd didn't consider deferred compaction, so it could
consume too much CPU if it wasn't otherwise broken.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-01-27 12:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-26 15:36 [RFC 1/3] mm, kswapd: remove bogus check of balance_classzone_idx Vlastimil Babka
2016-01-26 15:36 ` [RFC 2/3] mm, compaction: introduce kcompactd Vlastimil Babka
2016-01-27  9:10   ` Mel Gorman
2016-01-27 12:58     ` Vlastimil Babka
2016-01-26 15:36 ` [RFC 3/3] mm, kswapd: stop performing compaction from kswapd Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).