[RFC 0/2] Larger Order Protection V1

public inbox for linux-rdma@vger.kernel.org

* [RFC 0/2] Larger Order Protection V1
@ 2018-02-16 16:01 Christoph Lameter
  2018-02-16 16:01 ` [RFC 1/2] Protect larger order pages from breaking up Christoph Lameter
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Christoph Lameter @ 2018-02-16 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz

For years we have discussed how to allocate large contiguous memory segments
more reliably and how to avoid fragmentation. This is an ad hoc scheme based on
reserving higher order pages in the page allocator. It is fully transparent and
integrated into the page allocator.

This approach goes back to the meeting on contiguous memory at the Plumbers conference
in 2017 and the effort by Guy and Mike Kravetz to establish an API to map contiguous
memory segments into user space. Reservations will allow the contiguous memory allocations
to work even after the system has run for a considerable time.

Contiguous memory is also important for general system performance. F.e. slab
allocators can be made to use large frames in order to optimize performance.
See patch 1.

Other use cases are jumbo frames or device driver specific allocations.

For more on this see Mike Kravetz's patches, in particular his
MMAP CONTIG flag support at

https://lkml.org/lkml/2017/10/3/992

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 16:01 [RFC 0/2] Larger Order Protection V1 Christoph Lameter
@ 2018-02-16 16:01 ` Christoph Lameter
  2018-02-16 18:02   ` Randy Dunlap
                     ` (2 more replies)
  2018-02-16 16:01 ` [RFC 2/2] Page order diagnostics Christoph Lameter
  2018-02-16 18:27 ` [RFC 0/2] Larger Order Protection V1 Christopher Lameter
  2 siblings, 3 replies; 17+ messages in thread
From: Christoph Lameter @ 2018-02-16 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz

[-- Attachment #1: limit_order --]
[-- Type: text/plain, Size: 10140 bytes --]

Over time as the kernel is churning through memory it will break
up larger pages and as time progresses larger contiguous allocations
will no longer be possible. This is an approach to preserve these
large pages and prevent them from being broken up.

This is useful for example for the use of jumbo pages and can
satisfy various needs of subsystems and device drivers that require
large contiguous allocation to operate properly.

The idea is to reserve a pool of pages of the required order
so that the kernel is not allowed to use the pages for allocations
of a different order. This is a pool that is fully integrated
into the page allocator and therefore transparently usable.
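
As a rough illustration of the idea (not the patch's code), here is a
simplified userspace model of the order selection. The struct and function
names are made up for this sketch, and MODEL_MAX_ORDER is a stand-in value.
Exempting the requested order itself from the reserve check is an assumption
based on the description above; as far as the __rmqueue_smallest() hunk shows,
the posted code applies the comparison to every order it visits.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 11	/* hypothetical stand-in for MAX_ORDER */

/* Simplified stand-in for struct free_area: counters only, no free lists. */
struct model_area {
	unsigned long nr_free;	/* free pages of this order */
	unsigned long min;	/* pages of this order protected from breakup */
};

/*
 * Pick the order to take a page from for a request of `order`.
 * Returns the chosen order, or -1 if nothing suitable is free.
 */
static int model_pick_order(const struct model_area *areas, int order)
{
	int cur;

	assert(order >= 0 && order < MODEL_MAX_ORDER);

	for (cur = order; cur < MODEL_MAX_ORDER; cur++) {
		const struct model_area *a = &areas[cur];

		if (a->nr_free == 0)
			continue;
		/*
		 * A larger page would have to be split. Refuse while that
		 * order holds fewer free pages than its reserve (same
		 * comparison as in the patch). Assumption of this sketch:
		 * a request of exactly this order may still use the pool.
		 */
		if (cur != order && a->nr_free < a->min)
			continue;
		return cur;
	}
	return -1;
}
```

With 5 free order-2 pages and a reserve of 10, an order-0 request skips
order 2 and falls through to a higher order, while an order-2 request is
still served from the pool.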

Control over this feature is by writing to /proc/zoneinfo.

F.e. to ensure that 2000 16K pages stay available for jumbo
frames do

	echo "2=2000" >/proc/zoneinfo

or through the order=<page spec> on the kernel command line.
F.e.

	order=2=2000,4N2=500

These pages will be subject to reclaim etc as usual but will not
be broken up.
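
The spec syntax used above, <order>[N<node>]=<number> with entries separated
by commas, can be sketched as a small userspace parser. This is only an
illustration of the format. The names order_spec, parse_number and
parse_order_specs are hypothetical, integer overflow is not checked, and a
trailing newline (as written by echo) would have to be stripped by the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct order_spec {
	int order;
	int node;	/* 0 when no N<node> part is given, as in the patch */
	int pages;
};

/* Read a run of decimal digits at *p; returns -1 if none are present. */
static int parse_number(const char **p)
{
	int value = 0;

	if (!isdigit((unsigned char)**p))
		return -1;
	while (isdigit((unsigned char)**p))
		value = value * 10 + (*(*p)++ - '0');
	return value;
}

/*
 * Parse "<order>[N<node>]=<number>{,<order>[N<node>]=<number>}" into out[].
 * Returns the number of entries parsed, or -1 on a syntax error or when
 * more than max entries are present.
 */
static int parse_order_specs(const char *s, struct order_spec *out, size_t max)
{
	size_t n = 0;

	assert(s != NULL && out != NULL);

	for (;;) {
		struct order_spec spec = { 0, 0, 0 };

		if (n == max)
			return -1;
		spec.order = parse_number(&s);
		if (spec.order < 0)
			return -1;
		if (*s == 'N') {	/* optional node specification */
			s++;
			spec.node = parse_number(&s);
			if (spec.node < 0)
				return -1;
		}
		if (*s++ != '=')
			return -1;
		spec.pages = parse_number(&s);
		if (spec.pages < 0)
			return -1;
		out[n++] = spec;
		if (*s != ',')
			break;
		s++;
	}
	return *s == '\0' ? (int)n : -1;
}
```

Parsing "2=2000,4N2=500" yields two entries: order 2 on node 0 with 2000
pages, and order 4 on node 2 with 500 pages.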

One can then also f.e. operate the slub allocator with
64k pages. Specify "slub_max_order=4 slub_min_order=4" on
the kernel command line and all slab allocator allocations
will occur in 64K page sizes.
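
The sizes quoted above follow from order arithmetic: a page of order n spans
base_page_size << n bytes. A minimal sketch, assuming 4K base pages as on
x86; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by one page of the given order: base_page_size << order. */
static size_t order_to_bytes(unsigned int order, size_t base_page_size)
{
	assert(order < 8 * sizeof(size_t));	/* keep the shift in range */
	return base_page_size << order;
}

/* Total memory set aside by reserving nr_pages pages of that order. */
static size_t reservation_bytes(unsigned int order, size_t nr_pages,
				size_t base_page_size)
{
	return nr_pages * order_to_bytes(order, base_page_size);
}
```

With 4096 byte base pages, order 2 gives 16384 bytes and order 4 gives 65536
bytes, so "2=2000" sets aside 32768000 bytes (31.25 MiB).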

Note that this will reduce the memory available to the application
in some cases. Reclaim may occur more often. If more than
the reserved number of higher order pages are being used then
allocations will still fail as normal.

In order to make this work just right one needs to be able to
know the workload well enough to reserve the right amount
of pages. This is comparable to other reservation schemes.

Well, that f.e. brings up huge pages. You can of course
also use this to reserve those and can then be sure that
you can dynamically resize your huge page pools even after
the system has been up for a long time.

The idea for this patch came from Thomas Schoebel-Theuer whom I met
at the LCA and who described the approach to me promising
a patch that would do this. Sadly he has vanished somehow.
However, he has been using this approach to support a
production environment for numerous years.

So I redid his patch and this is the first draft of it.


Idea-by: Thomas Schoebel-Theuer <tst-0Nly+W1lFbFDiq0p6IFu4YQuADTiUCJX@public.gmane.org>

First performance tests in a virtual environment show
a hackbench improvement by 6% just by increasing
the page size used by the page allocator to order 3.

Signed-off-by: Christopher Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Index: linux/include/linux/mmzone.h
===================================================================
--- linux.orig/include/linux/mmzone.h
+++ linux/include/linux/mmzone.h
@@ -96,6 +96,11 @@ extern int page_group_by_mobility_disabl
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
+	/* We stop breaking up pages of this order if less than
+	 * min are available. At that point the pages can only
+	 * be used for allocations of that particular order.
+	 */
+	unsigned long		min;
 };
 
 struct pglist_data;
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -1844,7 +1844,12 @@ struct page *__rmqueue_smallest(struct z
 		area = &(zone->free_area[current_order]);
 		page = list_first_entry_or_null(&area->free_list[migratetype],
 							struct page, lru);
-		if (!page)
+		/*
+		 * Continue if no page is found or if our freelist contains
+		 * less than the minimum pages of that order. In that case
+		 * we better look for a different order.
+		 */
+		if (!page || area->nr_free < area->min)
 			continue;
 		list_del(&page->lru);
 		rmv_page_order(page);
@@ -5190,6 +5195,57 @@ static void build_zonelists(pg_data_t *p
 
 #endif	/* CONFIG_NUMA */
 
+int set_page_order_min(int node, int order, unsigned min)
+{
+	int i, o;
+	long min_pages = 0;			/* Pages already reserved */
+	long managed_pages = 0;			/* Pages managed on the node */
+	struct zone *last;
+	unsigned remaining;
+
+	/*
+	 * Determine already reserved memory for orders
+	 * plus the total of the pages on the node
+	 */
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		struct zone *z = &NODE_DATA(node)->node_zones[i];
+		if (managed_zone(z)) {
+			for (o = 0; o < MAX_ORDER; o++) {
+				if (o != order)
+					min_pages += z->free_area[o].min << o;
+
+			}
+			managed_pages += z->managed_pages;
+		}
+	}
+
+	if (min_pages + (min << order) > managed_pages / 2)
+		return -ENOMEM;
+
+	/* Set the min values for all zones on the node */
+	remaining = min;
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		struct zone *z = &NODE_DATA(node)->node_zones[i];
+		if (managed_zone(z)) {
+			u64 tmp;
+
+			tmp = (u64)z->managed_pages * (min << order);
+			do_div(tmp, managed_pages);
+			tmp >>= order;
+			z->free_area[order].min = tmp;
+
+			last = z;
+			remaining -= tmp;
+		}
+	}
+
+	/* Deal with rounding errors */
+	if (remaining)
+		last->free_area[order].min += remaining;
+
+	return 0;
+}
+
 /*
  * Boot pageset table. One per cpu which is going to be used for all
  * zones and all nodes. The parameters will be set in such a way
@@ -5424,6 +5480,7 @@ static void __meminit zone_init_free_lis
 	for_each_migratetype_order(order, t) {
 		INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
 		zone->free_area[order].nr_free = 0;
+		zone->free_area[order].min = 0;
 	}
 }
 
@@ -6998,6 +7055,7 @@ static void __setup_per_zone_wmarks(void
 	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
 	unsigned long lowmem_pages = 0;
 	struct zone *zone;
+	int order;
 	unsigned long flags;
 
 	/* Calculate total number of !ZONE_HIGHMEM pages */
@@ -7012,6 +7070,10 @@ static void __setup_per_zone_wmarks(void
 		spin_lock_irqsave(&zone->lock, flags);
 		tmp = (u64)pages_min * zone->managed_pages;
 		do_div(tmp, lowmem_pages);
+
+		for (order = 0; order < MAX_ORDER; order++)
+			tmp += zone->free_area[order].min << order;
+
 		if (is_highmem(zone)) {
 			/*
 			 * __GFP_HIGH and PF_MEMALLOC allocations usually don't
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -27,6 +27,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/ctype.h>
 
 #include "internal.h"
 
@@ -1614,6 +1615,11 @@ static void zoneinfo_show_print(struct s
 				zone_numa_state_snapshot(zone, i));
 #endif
 
+	for (i = 0; i < MAX_ORDER; i++)
+		if (zone->free_area[i].min)
+			seq_printf(m, "\nPreserve %lu pages of order %d from breaking up.",
+				zone->free_area[i].min, i);
+
 	seq_printf(m, "\n  pagesets");
 	for_each_online_cpu(i) {
 		struct per_cpu_pageset *pageset;
@@ -1641,6 +1647,122 @@ static void zoneinfo_show_print(struct s
 	seq_putc(m, '\n');
 }
 
+static int __order_protect(char *p)
+{
+	char c;
+
+	do {
+		int order = 0;
+		int pages = 0;
+		int node = 0;
+		int rc;
+
+		/* Syntax <order>[N<node>]=number */
+		if (!isdigit(*p))
+			return -EFAULT;
+
+		while (true) {
+			c = *p++;
+
+			if (!isdigit(c))
+				break;
+
+			order = order * 10 + c - '0';
+		}
+
+		/* Check for optional node specification */
+		if (c == 'N') {
+			if (!isdigit(*p))
+				return -EFAULT;
+
+			while (true) {
+				c = *p++;
+				if (!isdigit(c))
+					break;
+				node = node * 10 + c - '0';
+			}
+		}
+
+		if (c != '=')
+			return -EINVAL;
+
+		if (!isdigit(*p))
+			return -EINVAL;
+
+		while (true) {
+			c = *p++;
+			if (!isdigit(c))
+				break;
+			pages = pages * 10 + c - '0';
+		}
+
+		if (order == 0 || order >= MAX_ORDER)
+		       return -EINVAL;
+
+		if (!node_online(node))
+			return -ENOSYS;
+
+		rc = set_page_order_min(node, order, pages);
+		if (rc)
+			return rc;
+
+	} while (c == ',');
+
+	if (c)
+		return -EINVAL;
+
+	setup_per_zone_wmarks();
+
+	return 0;
+}
+
+/*
+ * Writing to /proc/zoneinfo allows to setup the large page breakup
+ * protection.
+ *
+ * Syntax:
+ * 	<order>[N<node>]=<number>{,<order>[N<node>]=<number>}
+ *
+ * F.e. Protecting 500 pages of order 2 (16K on intel) and 300 of
+ * order 4 (64K) on node 1
+ *
+ * 	echo "2=500,4N1=300" >/proc/zoneinfo
+ *
+ */
+static ssize_t zoneinfo_write(struct file *file, const char __user *buffer,
+			size_t count, loff_t *ppos)
+{
+	char zinfo[200];
+	int rc;
+
+	if (count > sizeof(zinfo))
+		return -EINVAL;
+
+	if (copy_from_user(zinfo, buffer, count))
+		return -EFAULT;
+
+	zinfo[count - 1] = 0;
+
+	rc = __order_protect(zinfo);
+
+	if (rc)
+		return rc;
+
+	return count;
+}
+
+static int order_protect(char *s)
+{
+	int rc;
+
+	rc = __order_protect(s);
+	if (rc)
+		printk("Invalid order=%s rc=%d\n",s, rc);
+
+	return 1;
+}
+__setup("order=", order_protect);
+
 /*
  * Output information about zones in @pgdat.  All zones are printed regardless
  * of whether they are populated or not: lowmem_reserve_ratio operates on the
@@ -1672,6 +1794,7 @@ static const struct file_operations zone
 	.read		= seq_read,
 	.llseek		= seq_lseek,
 	.release	= seq_release,
+	.write		= zoneinfo_write,
 };
 
 enum writeback_stat_item {
@@ -2016,7 +2139,7 @@ void __init init_mm_internals(void)
 	proc_create("buddyinfo", 0444, NULL, &buddyinfo_file_operations);
 	proc_create("pagetypeinfo", 0444, NULL, &pagetypeinfo_file_operations);
 	proc_create("vmstat", 0444, NULL, &vmstat_file_operations);
-	proc_create("zoneinfo", 0444, NULL, &zoneinfo_file_operations);
+	proc_create("zoneinfo", 0644, NULL, &zoneinfo_file_operations);
 #endif
 }
 
Index: linux/include/linux/gfp.h
===================================================================
--- linux.orig/include/linux/gfp.h
+++ linux/include/linux/gfp.h
@@ -543,6 +543,7 @@ void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
+int set_page_order_min(int node, int order, unsigned min);
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [RFC 2/2] Page order diagnostics
  2018-02-16 16:01 [RFC 0/2] Larger Order Protection V1 Christoph Lameter
  2018-02-16 16:01 ` [RFC 1/2] Protect larger order pages from breaking up Christoph Lameter
@ 2018-02-16 16:01 ` Christoph Lameter
       [not found]   ` <20180216160121.583566579-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2018-02-16 18:27 ` [RFC 0/2] Larger Order Protection V1 Christopher Lameter
  2 siblings, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2018-02-16 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz

[-- Attachment #1: order_stats --]
[-- Type: text/plain, Size: 5595 bytes --]

It is beneficial to know about the contiguous memory segments
available on a system and the number of allocations failing
for each page order.

This patch adds detailed per-order statistics to /proc/meminfo
so the current memory use can be determined.

Also adds counters to /proc/vmstat to show allocation
failures for each page order.
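
For tools that want to consume the new /proc/meminfo lines, a minimal
userspace sketch of parsing one line follows. It assumes the format string
used in the meminfo hunk of this patch ("Order%2d Pages:     %5lu"); the
function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one per-order line as printed by the patch, e.g.
 * "Order 2 Pages:       150". Whitespace in the sscanf format matches any
 * amount of padding. Returns 1 on a match, 0 otherwise.
 */
static int parse_order_line(const char *line, int *order, unsigned long *pages)
{
	assert(line != NULL && order != NULL && pages != NULL);
	return sscanf(line, "Order %d Pages: %lu", order, pages) == 2;
}
```

Lines that do not start with "Order", such as the MemFree line, are rejected
and can be skipped by the caller.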

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Index: linux/include/linux/mmzone.h
===================================================================
--- linux.orig/include/linux/mmzone.h
+++ linux/include/linux/mmzone.h
@@ -185,6 +185,10 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+#ifdef CONFIG_ORDER_STATS
+	NR_ORDER,
+	NR_ORDER_MAX = NR_ORDER + MAX_ORDER - 1,
+#endif
 	NR_VM_NODE_STAT_ITEMS
 };
 
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -828,6 +828,10 @@ static inline void __free_one_page(struc
 	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
+#ifdef CONFIG_ORDER_STATS
+	dec_node_page_state(page, NR_ORDER + order);
+#endif
+
 continue_merging:
 	while (order < max_order - 1) {
 		buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -1285,6 +1289,9 @@ static void __init __free_pages_boot_cor
 	page_zone(page)->managed_pages += nr_pages;
 	set_page_refcounted(page);
 	__free_pages(page, order);
+#ifdef CONFIG_ORDER_STATS
+	inc_node_page_state(page, NR_ORDER + order);
+#endif
 }
 
 #if defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) || \
@@ -1855,6 +1862,9 @@ struct page *__rmqueue_smallest(struct z
 		rmv_page_order(page);
 		area->nr_free--;
 		expand(zone, page, order, current_order, area, migratetype);
+#ifdef CONFIG_ORDER_STATS
+		inc_node_page_state(page, NR_ORDER + order);
+#endif
 		set_pcppage_migratetype(page, migratetype);
 		return page;
 	}
@@ -4169,6 +4179,11 @@ nopage:
 fail:
 	warn_alloc(gfp_mask, ac->nodemask,
 			"page allocation failure: order:%u", order);
+
+#ifdef CONFIG_ORDER_STATS
+	count_vm_event(ORDER0_ALLOC_FAIL + order);
+#endif
+
 got_pg:
 	return page;
 }
Index: linux/fs/proc/meminfo.c
===================================================================
--- linux.orig/fs/proc/meminfo.c
+++ linux/fs/proc/meminfo.c
@@ -51,6 +51,7 @@ static int meminfo_proc_show(struct seq_
 	long available;
 	unsigned long pages[NR_LRU_LISTS];
 	int lru;
+	int order;
 
 	si_meminfo(&i);
 	si_swapinfo(&i);
@@ -155,6 +156,11 @@ static int meminfo_proc_show(struct seq_
 		    global_zone_page_state(NR_FREE_CMA_PAGES));
 #endif
 
+#ifdef CONFIG_ORDER_STATS
+	for (order = 0; order < MAX_ORDER; order++)
+		seq_printf(m, "Order%2d Pages:     %5lu\n",
+			order, global_node_page_state(NR_ORDER + order));
+#endif
 	hugetlb_report_meminfo(m);
 
 	arch_report_meminfo(m);
Index: linux/mm/Kconfig
===================================================================
--- linux.orig/mm/Kconfig
+++ linux/mm/Kconfig
@@ -752,6 +752,15 @@ config PERCPU_STATS
 	  information includes global and per chunk statistics, which can
 	  be used to help understand percpu memory usage.
 
+config ORDER_STATS
+	bool "Statistics for different sized allocations"
+	default n
+	help
+	  Create statistics about the contiguous memory segments allocated
+	  through the page allocator. This creates statistics about the
+	  memory segments in use in /proc/meminfo and the node meminfo files
+	  as well as allocation failure statistics in /proc/vmstat.
+
 config GUP_BENCHMARK
 	bool "Enable infrastructure for get_user_pages_fast() benchmarking"
 	default n
Index: linux/include/linux/vm_event_item.h
===================================================================
--- linux.orig/include/linux/vm_event_item.h
+++ linux/include/linux/vm_event_item.h
@@ -111,6 +111,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 		SWAP_RA,
 		SWAP_RA_HIT,
 #endif
+#ifdef CONFIG_ORDER_STATS
+		ORDER0_ALLOC_FAIL,
+		ORDER_MAX_FAIL = ORDER0_ALLOC_FAIL + MAX_ORDER -1,
+#endif
 		NR_VM_EVENT_ITEMS
 };
 
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1289,6 +1289,52 @@ const char * const vmstat_text[] = {
 	"swap_ra",
 	"swap_ra_hit",
 #endif
+#ifdef CONFIG_ORDER_STATS
+	"order0_failure",
+	"order1_failure",
+	"order2_failure",
+	"order3_failure",
+	"order4_failure",
+	"order5_failure",
+	"order6_failure",
+	"order7_failure",
+	"order8_failure",
+	"order9_failure",
+	"order10_failure",
+#ifdef CONFIG_FORCE_MAX_ZONEORDER
+#if MAX_ORDER > 11
+	"order11_failure",
+#endif
+#if MAX_ORDER > 12
+	"order12_failure",
+#endif
+#if MAX_ORDER > 13
+	"order13_failure",
+#endif
+#if MAX_ORDER > 14
+	"order14_failure",
+#endif
+#if MAX_ORDER > 15
+	"order15_failure",
+#endif
+#if MAX_ORDER > 16
+	"order16_failure",
+#endif
+#if MAX_ORDER > 17
+	"order17_failure",
+#endif
+#if MAX_ORDER > 18
+	"order18_failure",
+#endif
+#if MAX_ORDER > 19
+	"order19_failure",
+#endif
+#if MAX_ORDER > 20
+#error Please add more lines...
+#endif
+
+#endif /* CONFIG_FORCE_MAX_ZONEORDER */
+#endif /* CONFIG_ORDER_STATS */
 #endif /* CONFIG_VM_EVENTS_COUNTERS */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA */


* Re: [RFC 1/2] Protect larger order pages from breaking up
       [not found]   ` <20180216160121.519788537-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
@ 2018-02-16 17:03     ` Andi Kleen
       [not found]       ` <20180216170354.vpbuugzqsrrfc4js-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
  2018-02-16 19:01     ` Dave Hansen
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2018-02-16 17:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz

> First performance tests in a virtual environment show
> a hackbench improvement by 6% just by increasing
> the page size used by the page allocator to order 3.

So why is hackbench improving? Is that just for kernel stacks?

-Andi

* Re: [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 16:01 ` [RFC 1/2] Protect larger order pages from breaking up Christoph Lameter
@ 2018-02-16 18:02   ` Randy Dunlap
       [not found]     ` <b76028c6-c755-8178-2dfc-81c7db1f8bed-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2018-02-16 18:59   ` Mike Kravetz
       [not found]   ` <20180216160121.519788537-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2018-02-16 18:02 UTC (permalink / raw)
  To: Christoph Lameter, Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz

On 02/16/2018 08:01 AM, Christoph Lameter wrote:
> Control over this feature is by writing to /proc/zoneinfo.
> 
> F.e. to ensure that 2000 16K pages stay available for jumbo
> frames do
> 
> 	echo "2=2000" >/proc/zoneinfo
> 
> or through the order=<page spec> on the kernel command line.
> F.e.
> 
> 	order=2=2000,4N2=500


Please document the kernel command line option in
Documentation/admin-guide/kernel-parameters.txt.

I suppose that /proc/zoneinfo should be added somewhere in Documentation/vm/
but I'm not sure where that would be.

thanks,
-- 
~Randy


* Re: [RFC 1/2] Protect larger order pages from breaking up
       [not found]       ` <20180216170354.vpbuugzqsrrfc4js-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
@ 2018-02-16 18:25         ` Christopher Lameter
  0 siblings, 0 replies; 17+ messages in thread
From: Christopher Lameter @ 2018-02-16 18:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer,
	Rik van Riel, Michal Hocko, Guy Shattah, Anshuman Khandual,
	Michal Nazarewicz, Vlastimil Babka, David Nellans, Laura Abbott,
	Pavel Machek, Dave Hansen, Mike Kravetz

On Fri, 16 Feb 2018, Andi Kleen wrote:

> > First performance tests in a virtual environment show
> > a hackbench improvement by 6% just by increasing
> > the page size used by the page allocator to order 3.
>
> So why is hackbench improving? Is that just for kernel stacks?

Less stack overhead. The larger the page size, the less metadata needs to be
handled. The freelists get larger and the chance of hitting the per cpu
freelist increases.


* Re: [RFC 0/2] Larger Order Protection V1
  2018-02-16 16:01 [RFC 0/2] Larger Order Protection V1 Christoph Lameter
  2018-02-16 16:01 ` [RFC 1/2] Protect larger order pages from breaking up Christoph Lameter
  2018-02-16 16:01 ` [RFC 2/2] Page order diagnostics Christoph Lameter
@ 2018-02-16 18:27 ` Christopher Lameter
  2 siblings, 0 replies; 17+ messages in thread
From: Christopher Lameter @ 2018-02-16 18:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz


Why are the patches not making linux-mm? They are on other mailing lists.



* Re: [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 16:01 ` [RFC 1/2] Protect larger order pages from breaking up Christoph Lameter
  2018-02-16 18:02   ` Randy Dunlap
@ 2018-02-16 18:59   ` Mike Kravetz
       [not found]     ` <5108eb20-2b20-bd48-903e-bce312e96974-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
       [not found]   ` <20180216160121.519788537-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 17+ messages in thread
From: Mike Kravetz @ 2018-02-16 18:59 UTC (permalink / raw)
  To: Christoph Lameter, Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen

On 02/16/2018 08:01 AM, Christoph Lameter wrote:
> Over time as the kernel is churning through memory it will break
> up larger pages and as time progresses larger contiguous allocations
> will no longer be possible. This is an approach to preserve these
> large pages and prevent them from being broken up.
> 
> This is useful for example for the use of jumbo pages and can
> satisfy various needs of subsystems and device drivers that require
> large contiguous allocation to operate properly.
> 
> The idea is to reserve a pool of pages of the required order
> so that the kernel is not allowed to use the pages for allocations
> of a different order. This is a pool that is fully integrated
> into the page allocator and therefore transparently usable.
> 
> Control over this feature is by writing to /proc/zoneinfo.
> 
> F.e. to ensure that 2000 16K pages stay available for jumbo
> frames do
> 
> 	echo "2=2000" >/proc/zoneinfo
> 
> or through the order=<page spec> on the kernel command line.
> F.e.
> 
> 	order=2=2000,4N2=500
> 
> These pages will be subject to reclaim etc as usual but will not
> be broken up.
> 
> One can then also f.e. operate the slub allocator with
> 64k pages. Specify "slub_max_order=4 slub_min_order=4" on
> the kernel command line and all slab allocator allocations
> will occur in 64K page sizes.
> 
> Note that this will reduce the memory available to the application
> in some cases. Reclaim may occur more often. If more than
> the reserved number of higher order pages are being used then
> allocations will still fail as normal.
> 
> In order to make this work just right one needs to be able to
> know the workload well enough to reserve the right amount
> of pages. This is comparable to other reservation schemes.

Yes.

I like the idea that this only comes into play as the result of explicit
user/sysadmin action.  It does remind me of hugetlbfs reservations.  So,
we hope that only people who really know their workload and know what
they are doing would use this feature.

> Well that f.e brings up huge pages. You can of course
> also use this to reserve those and can then be sure that
> you can dynamically resize your huge page pools even after
> a long time of system up time.

Yes, and no.  Doesn't that assume nobody else is doing allocations
of that size?  For example, I could imagine THP using huge page sized
reservations.  Then when it comes time to resize your hugetlbfs pool
there may not be enough.  Although, we may quickly split THP pages
in this case.  I am not sure.

IIRC, Guy Shattah's use case was for allocations greater than MAX_ORDER.
This would not directly address that.  A huge contiguous area (2GB) is
the 'sweet spot' for best performance in his case.  However, I think he
could still benefit from using a set of larger (such as 2MB) size
allocations which this scheme could help with.
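
For reference, the orders behind these sizes can be worked out with a small
helper. This is only a sketch: the function name is hypothetical, and 4K base
pages and a MAX_ORDER of 11 are assumed here rather than taken from the patch.

```c
#include <assert.h>

/* Smallest order whose block (base_page_size << order) covers `bytes`. */
static unsigned int order_for_bytes(unsigned long long bytes,
				    unsigned long long base_page_size)
{
	unsigned int order = 0;

	assert(base_page_size > 0);
	assert(bytes <= (1ULL << 62));	/* keeps the shift below from overflowing */

	while ((base_page_size << order) < bytes)
		order++;
	return order;
}
```

Under those assumptions a 2MB allocation is order 9 and a 2GB allocation is
order 19. With MAX_ORDER at 11 the largest order the page allocator hands
out is 10 (4MB), so 2GB is out of reach while 2MB is covered.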

-- 
Mike Kravetz


* Re: [RFC 1/2] Protect larger order pages from breaking up
       [not found]   ` <20180216160121.519788537-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
  2018-02-16 17:03     ` Andi Kleen
@ 2018-02-16 19:01     ` Dave Hansen
  2018-02-16 20:15       ` Christopher Lameter
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-02-16 19:01 UTC (permalink / raw)
  To: Christoph Lameter, Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Mike Kravetz

On 02/16/2018 08:01 AM, Christoph Lameter wrote:
> In order to make this work just right one needs to be able to
> know the workload well enough to reserve the right amount
> of pages. This is comparable to other reservation schemes.

Yes, but it's a reservation scheme that doesn't show up in MemFree, for
instance.  Even hugetlbfs-reserved memory subtracts from that.

This has the potential to be really confusing to apps.  If this memory
is now not available to normal apps, they might plow into the invisible
memory limits and get into nasty reclaim scenarios.

Shouldn't this subtract the memory for MemFree and friends?

* Re: [RFC 1/2] Protect larger order pages from breaking up
       [not found]     ` <5108eb20-2b20-bd48-903e-bce312e96974-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2018-02-16 20:13       ` Christopher Lameter
  2018-02-18  9:00         ` Guy Shattah
  0 siblings, 1 reply; 17+ messages in thread
From: Christopher Lameter @ 2018-02-16 20:13 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen

On Fri, 16 Feb 2018, Mike Kravetz wrote:

> > Well that f.e brings up huge pages. You can of course
> > also use this to reserve those and can then be sure that
> > you can dynamically resize your huge page pools even after
> > a long time of system up time.
>
> Yes, and no.  Doesn't that assume nobody else is doing allocations
> of that size?  For example, I could imagine THP using huge page sized
> reservations.  Then when it comes time to resize your hugetlbfs pool
> there may not be enough.  Although, we may quickly split THP pages
> in this case.  I am not sure.

Yup it has a pool for everyone. Question is how to divide the loot ;-)

> IIRC, Guy Shattah's use case was for allocations greater than MAX_ORDER.
> This would not directly address that.  A huge contiguous area (2GB) is
> the 'sweet spot' for best performance in his case.  However, I think he
> could still benefit from using a set of larger (such as 2MB) size
> allocations which this scheme could help with.

MAX_ORDER can be increased to allow for larger allocations. IA64 has f.e.
a much larger MAX_ORDER size. So does powerpc. And then the reservation
scheme will work.



* Re: [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 19:01     ` Dave Hansen
@ 2018-02-16 20:15       ` Christopher Lameter
  2018-02-16 21:08         ` Dave Hansen
  0 siblings, 1 reply; 17+ messages in thread
From: Christopher Lameter @ 2018-02-16 20:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Mike Kravetz

On Fri, 16 Feb 2018, Dave Hansen wrote:

> On 02/16/2018 08:01 AM, Christoph Lameter wrote:
> > In order to make this work just right one needs to be able to
> > know the workload well enough to reserve the right amount
> > of pages. This is comparable to other reservation schemes.
>
> Yes, but it's a reservation scheme that doesn't show up in MemFree, for
> instance.  Even hugetlbfs-reserved memory subtracts from that.

Ok. There is the question if we can get all these reservation schemes
under one hood instead of having page order specific ones in subsystems
like hugetlb.

> This has the potential to be really confusing to apps.  If this memory
> is now not available to normal apps, they might plow into the invisible
> memory limits and get into nasty reclaim scenarios.
>
> Shouldn't this subtract the memory for MemFree and friends?

Ok certainly we could do that. But on the other hand the memory is
available if those subsystems ask for the right order. It's not clear to me
what the right way of handling this is. Right now it adds the reserved
pages to the watermarks. But then under some circumstances the memory is
available. What is the best solution here?
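
For illustration only, the accounting Dave asks about could be sketched
roughly as follows. This is a minimal sketch, not code from the patch:
NR_ORDERS, reserved_blocks[] and reported_free_pages() are hypothetical
names, and it assumes each reserved block of a given order counts as
(1 << order) base pages that get hidden from the reported free count.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS 11	/* assumed: orders 0..10, i.e. the default MAX_ORDER */

/*
 * Hypothetical helper: pages to report as free once the per-order
 * reservations are subtracted. reserved_blocks[order] is the number of
 * blocks of that order being held back.
 */
static unsigned long reported_free_pages(unsigned long free_pages,
					 const unsigned long *reserved_blocks)
{
	unsigned long reserved_pages = 0;
	size_t order;

	assert(reserved_blocks != NULL);

	/* A block of order N spans 2^N base pages. */
	for (order = 0; order < NR_ORDERS; order++)
		reserved_pages += reserved_blocks[order] << order;

	/* Never report a negative amount of free memory. */
	return free_pages > reserved_pages ? free_pages - reserved_pages : 0;
}
```

With 2000 order-2 blocks reserved (the jumbo frame example from the
patch description), 8000 base pages would vanish from the reported
figure. Matthew's objection later in the thread still applies: the
number would then not move when those reserved blocks are handed out.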


* Re: [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 20:15       ` Christopher Lameter
@ 2018-02-16 21:08         ` Dave Hansen
  2018-02-16 21:43           ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-02-16 21:08 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Mike Kravetz

On 02/16/2018 12:15 PM, Christopher Lameter wrote:
>> This has the potential to be really confusing to apps.  If this memory
>> is now not available to normal apps, they might plow into the invisible
>> memory limits and get into nasty reclaim scenarios.
>> Shouldn't this subtract the memory for MemFree and friends?
> Ok certainly we could do that. But on the other hand the memory is
> available if those subsystems ask for the right order. It's not clear to me
> what the right way of handling this is. Right now it adds the reserved
> pages to the watermarks. But then under some circumstances the memory is
> available. What is the best solution here?

There's definitely no perfect solution.

But, in general, I think we should cater to the dumbest users.  Folks
doing higher-order allocations are not that.  I say we make the picture
the most clear for the traditional 4k users.

* Re: [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 21:08         ` Dave Hansen
@ 2018-02-16 21:43           ` Matthew Wilcox
       [not found]             ` <20180216214353.GA32655-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2018-02-16 21:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christopher Lameter, Mel Gorman, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Mike Kravetz

On Fri, Feb 16, 2018 at 01:08:11PM -0800, Dave Hansen wrote:
> On 02/16/2018 12:15 PM, Christopher Lameter wrote:
> >> This has the potential to be really confusing to apps.  If this memory
> >> is now not available to normal apps, they might plow into the invisible
> >> memory limits and get into nasty reclaim scenarios.
> >> Shouldn't this subtract the memory for MemFree and friends?
> > Ok certainly we could do that. But on the other hand the memory is
> > available if those subsystems ask for the right order. It's not clear to me
> > what the right way of handling this is. Right now it adds the reserved
> > pages to the watermarks. But then under some circumstances the memory is
> > available. What is the best solution here?
> 
> There's definitely no perfect solution.
> 
> But, in general, I think we should cater to the dumbest users.  Folks
> doing higher-order allocations are not that.  I say we make the picture
> the most clear for the traditional 4k users.

Your way might be confusing -- if there's a system which is under varying
amounts of jumboframe load and all the 16k pages get gobbled up by the
ethernet driver, MemFree won't change at all, for example.


* Re: [RFC 1/2] Protect larger order pages from breaking up
       [not found]             ` <20180216214353.GA32655-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2018-02-16 21:47               ` Dave Hansen
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Hansen @ 2018-02-16 21:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christopher Lameter, Mel Gorman, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Mike Kravetz

On 02/16/2018 01:43 PM, Matthew Wilcox wrote:
>> There's definitely no perfect solution.
>>
>> But, in general, I think we should cater to the dumbest users.  Folks
>> doing higher-order allocations are not that.  I say we make the picture
>> the most clear for the traditional 4k users.
> Your way might be confusing -- if there's a system which is under varying
> amounts of jumboframe load and all the 16k pages get gobbled up by the
> ethernet driver, MemFree won't change at all, for example.

IOW, you agree that "there's definitely no perfect solution." :)

* Re: [RFC 1/2] Protect larger order pages from breaking up
       [not found]     ` <b76028c6-c755-8178-2dfc-81c7db1f8bed-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2018-02-17 16:07       ` Mike Rapoprt
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Rapoprt @ 2018-02-17 16:07 UTC (permalink / raw)
  To: Randy Dunlap, Christoph Lameter, Mel Gorman
  Cc: Matthew Wilcox, linux-mm, linux-rdma, akpm, Thomas Schoebel-Theuer,
	andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen, Mike Kravetz



On February 16, 2018 7:02:53 PM GMT+01:00, Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
>On 02/16/2018 08:01 AM, Christoph Lameter wrote:
>> Control over this feature is by writing to /proc/zoneinfo.
>> 
>> F.e. to ensure that 2000 16K pages stay available for jumbo
>> frames do
>> 
>> 	echo "2=2000" >/proc/zoneinfo
>> 
>> or through the order=<page spec> on the kernel command line.
>> F.e.
>> 
>> 	order=2=2000,4N2=500
>
>
>Please document the kernel command line option in
>Documentation/admin-guide/kernel-parameters.txt.
>
>I suppose that /proc/zoneinfo should be added somewhere in
>Documentation/vm/
>but I'm not sure where that would be.

It's in Documentation/sysctl/vm.txt and in 'man proc' [1]

[1] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man5/proc.5

>thanks,

-- 
Sincerely yours,
Mike.


* Re: [RFC 2/2] Page order diagnostics
       [not found]   ` <20180216160121.583566579-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
@ 2018-02-17 21:17     ` Pavel Machek
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Machek @ 2018-02-17 21:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel, Michal Hocko,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Dave Hansen,
	Mike Kravetz


Hi!

> @@ -1289,6 +1289,52 @@ const char * const vmstat_text[] = {
>  	"swap_ra",
>  	"swap_ra_hit",
>  #endif
> +#ifdef CONFIG_ORDER_STATS
> +	"order0_failure",
> +	"order1_failure",
> +	"order2_failure",
> +	"order3_failure",
> +	"order4_failure",
> +	"order5_failure",
> +	"order6_failure",
> +	"order7_failure",
> +	"order8_failure",
> +	"order9_failure",
> +	"order10_failure",
> +#ifdef CONFIG_FORCE_MAX_ZONEORDER
> +#if MAX_ORDER > 11
> +	"order11_failure"
> +#endif
> +#if MAX_ORDER > 12
> +	"order12_failure"
> +#endif
> +#if MAX_ORDER > 13
> +	"order13_failure"
> +#endif
> +#if MAX_ORDER > 14
> +	"order14_failure"
> +#endif
> +#if MAX_ORDER > 15
> +	"order15_failure"
> +#endif
> +#if MAX_ORDER > 16
> +	"order16_failure"
> +#endif
> +#if MAX_ORDER > 17
> +	"order17_failure"
> +#endif
> +#if MAX_ORDER > 18
> +	"order18_failure"
> +#endif
> +#if MAX_ORDER > 19
> +	"order19_failure"
> +#endif

I don't think this does what you want it to do. Commas are missing.
									Pavel
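
The effect Pavel points at can be reproduced outside the kernel. In C,
adjacent string literals are concatenated by the compiler, so a missing
comma silently merges two array entries into one. The sketch below is a
standalone illustration, not the patched vmstat_text[] itself; the array
names and helper functions are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Same shape as the quoted hunk: no comma after "order11_failure". */
static const char * const broken_text[] = {
	"order10_failure",
	"order11_failure"
	"order12_failure"
};

/* With the commas in place every counter gets its own entry. */
static const char * const fixed_text[] = {
	"order10_failure",
	"order11_failure",
	"order12_failure",
};

static size_t broken_count(void)
{
	return ARRAY_LEN(broken_text);
}

static size_t fixed_count(void)
{
	return ARRAY_LEN(fixed_text);
}

/* True if the two literals were merged into a single entry. */
static int second_entry_is_merged(void)
{
	assert(broken_count() >= 2);
	return strcmp(broken_text[1], "order11_failureorder12_failure") == 0;
}
```

The broken array ends up with two entries instead of three, and the
second one is the merged string. In a name table indexed by counter
number, every name after the first missing comma would be shifted
relative to its counter.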
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* RE: [RFC 1/2] Protect larger order pages from breaking up
  2018-02-16 20:13       ` Christopher Lameter
@ 2018-02-18  9:00         ` Guy Shattah
  0 siblings, 0 replies; 17+ messages in thread
From: Guy Shattah @ 2018-02-18  9:00 UTC (permalink / raw)
  To: Christopher Lameter, Mike Kravetz
  Cc: Mel Gorman, Matthew Wilcox, linux-mm, linux-rdma, akpm,
	Thomas Schoebel-Theuer, andi, Rik van Riel,
	Michal Hocko, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen

> 
> Yup it has a pool for everyone. Question is how to divide the loot ;-)
> 
> > IIRC, Guy Shattah's use case was for allocations greater than MAX_ORDER.
> > This would not directly address that.  A huge contiguous area (2GB) is
> > the 'sweet spot' for best performance in his case.  However, I think he
> > could still benefit from using a set of larger (such as 2MB) size
> > allocations which this scheme could help with.
> 
> MAX_ORDER can be increased to allow for larger allocations. IA64 has f.e.
> a much larger MAX_ORDER size. So does powerpc. And then the reservation
> scheme will work.
> 

MAX_ORDER can be increased only if the kernel is recompiled.
That won't work for code running in the general case / for the typical user.
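
To put numbers on the gap, here is a small sketch of the arithmetic. It
assumes 4K base pages (page shift 12) and the convention in kernels of
this period that the largest buddy allocation has order MAX_ORDER - 1;
the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest single buddy allocation in bytes: one block of order max_order - 1. */
static uint64_t max_alloc_bytes(unsigned int page_shift, unsigned int max_order)
{
	assert(max_order >= 1);
	return (uint64_t)1 << (page_shift + max_order - 1);
}

/* Smallest order whose block is at least `bytes` large. */
static unsigned int order_for_bytes(uint64_t bytes, unsigned int page_shift)
{
	unsigned int order = 0;

	while (((uint64_t)1 << (page_shift + order)) < bytes)
		order++;
	return order;
}
```

With the default MAX_ORDER of 11 the largest block is 4MB. A 2MB
allocation is order 9 and fits, but a contiguous 2GB area would be
order 19, so MAX_ORDER would have to be at least 20.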

end of thread, other threads:[~2018-02-18  9:00 UTC | newest]

Thread overview: 17+ messages
2018-02-16 16:01 [RFC 0/2] Larger Order Protection V1 Christoph Lameter
2018-02-16 16:01 ` [RFC 1/2] Protect larger order pages from breaking up Christoph Lameter
2018-02-16 18:02   ` Randy Dunlap
     [not found]     ` <b76028c6-c755-8178-2dfc-81c7db1f8bed-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2018-02-17 16:07       ` Mike Rapoprt
2018-02-16 18:59   ` Mike Kravetz
     [not found]     ` <5108eb20-2b20-bd48-903e-bce312e96974-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2018-02-16 20:13       ` Christopher Lameter
2018-02-18  9:00         ` Guy Shattah
     [not found]   ` <20180216160121.519788537-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
2018-02-16 17:03     ` Andi Kleen
     [not found]       ` <20180216170354.vpbuugzqsrrfc4js-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2018-02-16 18:25         ` Christopher Lameter
2018-02-16 19:01     ` Dave Hansen
2018-02-16 20:15       ` Christopher Lameter
2018-02-16 21:08         ` Dave Hansen
2018-02-16 21:43           ` Matthew Wilcox
     [not found]             ` <20180216214353.GA32655-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2018-02-16 21:47               ` Dave Hansen
2018-02-16 16:01 ` [RFC 2/2] Page order diagnostics Christoph Lameter
     [not found]   ` <20180216160121.583566579-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
2018-02-17 21:17     ` Pavel Machek
2018-02-16 18:27 ` [RFC 0/2] Larger Order Protection V1 Christopher Lameter
