* [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
@ 2006-09-07 19:03 Mel Gorman
2006-09-07 19:04 ` [PATCH 1/8] Add __GFP_EASYRCLM flag and update callers Mel Gorman
` (8 more replies)
0 siblings, 9 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:03 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
This is the latest version of anti-fragmentation based on sub-zones (previously
called list-based anti-fragmentation), based on top of 2.6.18-rc5-mm1. When it
was last released, it was decided that the scheme should be implemented with
zones to avoid affecting the page allocator hot paths. However, at the VM Summit,
it was made clear that zones may not be the right answer either because zones
have their own issues. Hence, this is a reintroduction of the first approach.
The purpose of these patches is to reduce external fragmentation by grouping
pages of related types together. The objective is that when page reclaim
occurs, there is a greater chance that large contiguous pages will be
free. Note that this is not defragmentation, which would obtain contiguous
pages by moving pages around.
This patch works by categorising allocations by their reclaimability:
EasyReclaimable - These are userspace pages that are easily reclaimable. This
  flag is set when it is known that the pages will be trivially reclaimed
  by writing the page out to swap or syncing with backing storage.
KernelReclaimable - These are allocations for some kernel caches that are
  reclaimable, or allocations that are known to be very short-lived.
KernelNonReclaimable - These are pages allocated by the kernel that are not
  trivially reclaimed. For example, the memory allocated for a loaded module
  would be in this category. By default, allocations are considered to be
  of this type.
Instead of having one MAX_ORDER-sized array of free lists in struct free_area,
there is one for each type of reclaimability. Once a 2^MAX_ORDER block of
pages is split for a type of allocation, it is added to the free-lists for
that type, in effect reserving it. Hence, over time, pages of the different
types can be clustered together. When a page is allocated, the page-flags
are updated with a value indicating its type of reclaimability so that it
is placed on the correct list when it is freed.
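As a rough userspace illustration of that indexing (simplified names and
counters standing in for the real list_heads; the actual kernel code is in
patch 2):
/*
 * Userspace illustration only: counters stand in for the kernel's
 * list_heads, and GFP_EASYRCLM stands in for __GFP_EASYRCLM. The point
 * is the indexing: free_area[order].free_list[rclmtype] in patch 2.
 */
#include <stdio.h>

#define MAX_ORDER	11
#define RCLM_NORCLM	0
#define RCLM_EASY	1
#define RCLM_TYPES	2

#define GFP_EASYRCLM	0x1	/* stand-in for the real gfp bit */

struct free_area_sim {
	int nr_free[RCLM_TYPES];	/* the kernel keeps list_heads here */
};

static struct free_area_sim free_area[MAX_ORDER];

/* Mirrors gfpflags_to_rclmtype() in patch 2: one gfp bit picks the list */
static int gfpflags_to_rclmtype(unsigned long gfp_flags)
{
	return (gfp_flags & GFP_EASYRCLM) != 0;
}

int main(void)
{
	/* Pretend one block of each type sits on the order-0 free lists */
	free_area[0].nr_free[RCLM_NORCLM] = 1;
	free_area[0].nr_free[RCLM_EASY] = 1;

	/* An allocation's gfp flags select which list is searched first */
	printf("easy-reclaim alloc uses list %d (%d free), "
		"kernel alloc uses list %d (%d free)\n",
		gfpflags_to_rclmtype(GFP_EASYRCLM),
		free_area[0].nr_free[gfpflags_to_rclmtype(GFP_EASYRCLM)],
		gfpflags_to_rclmtype(0),
		free_area[0].nr_free[gfpflags_to_rclmtype(0)]);
	return 0;
}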
When the preferred freelists are exhausted, the largest possible block is taken
from an alternative list. Buddies that are split from that large block are
placed on the preferred allocation-type freelists to mitigate fragmentation.
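A simplified userspace sketch of that fallback policy (the real code is
__rmqueue_fallback() in patch 2; counters again stand in for the lists):
/*
 * Userspace illustration of the fallback policy: when the preferred
 * type has nothing at or above the requested order, the largest block
 * belonging to the other type is chosen so its split buddies can seed
 * the preferred type's lists.
 */
#include <stdio.h>

#define MAX_ORDER	11
#define RCLM_TYPES	2

static int nr_free[MAX_ORDER][RCLM_TYPES];

static int pick_fallback_order(int order, int preferred)
{
	int other = !preferred;
	int current_order;

	/* Largest first: stealing a big block fragments the donor least */
	for (current_order = MAX_ORDER - 1; current_order >= order;
						current_order--)
		if (nr_free[current_order][other])
			return current_order;

	return -1;	/* nothing to fall back to */
}

int main(void)
{
	nr_free[9][1] = 1;	/* the other type holds one order-9 block */
	nr_free[3][1] = 4;	/* and a few order-3 blocks */

	/* an order-0 request for type 0 would steal the order-9 block */
	printf("fallback picks order %d\n", pick_fallback_order(0, 0));
	return 0;
}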
This implementation gives a best-effort reduction of fragmentation in all
zones. To be effective, min_free_kbytes needs to be set to a value of about
10% of physical memory (10% was found by experimentation and may be workload
dependent). To get that value lower, anti-fragmentation would need to be
significantly more invasive, so it is best to find out what sorts of workloads
still cause fragmentation before taking further steps.
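For reference, a small userspace example of applying that tuning (not part of
the patches; it uses the 10% figure above and needs root to write the sysctl):
/* Sets min_free_kbytes to roughly 10% of physical memory, per the
 * recommendation above. Illustrative only; run as root. */
#include <stdio.h>
#include <sys/sysinfo.h>

int main(void)
{
	struct sysinfo si;
	unsigned long long total_kb, min_free_kb;
	FILE *f;

	if (sysinfo(&si) != 0)
		return 1;

	/* total RAM in kilobytes, then 10% of it */
	total_kb = (unsigned long long)si.totalram * si.mem_unit / 1024;
	min_free_kb = total_kb / 10;

	f = fopen("/proc/sys/vm/min_free_kbytes", "w");
	if (!f) {
		fprintf(stderr, "cannot open min_free_kbytes (wanted %llu)\n",
			min_free_kb);
		return 1;
	}
	fprintf(f, "%llu\n", min_free_kb);
	fclose(f);
	return 0;
}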
Our tests show that about 60-70% of physical memory can be allocated on
a desktop after a few days' uptime. In benchmarks and stress tests, we are
finding that 80% of memory is available as contiguous blocks at the end of
the test. For comparison, a standard kernel was getting < 1% of memory as large
pages on a desktop and about 8-12% of memory as large pages at the end of
stress tests.
Performance tests are within 0.1% for kbuild on a number of test machines. aim9
is usually within 1%, except on x86_64 where aim9 results are unreliable.
I have never been able to show it, but it is possible that the main allocator
path is adversely affected by anti-fragmentation and that this may be exposed
by using different compilers or benchmarks. If any regressions are detected due
to anti-fragmentation, it may simply be disabled via the kernel configuration,
and I would appreciate a report detailing the regression and how to trigger it.
Following this email are 8 patches. The early patches introduce the split
between user and kernel allocations. Later patches introduce a further split
for kernel allocations, into KernRclm and KernNoRclm. Note that although
the early patches consume an additional page flag, later patches reuse
the suspend bits, releasing this bit again.
Comments?
--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* [PATCH 1/8] Add __GFP_EASYRCLM flag and update callers
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
@ 2006-09-07 19:04 ` Mel Gorman
2006-09-07 19:04 ` [PATCH 2/8] Split the free lists into kernel and user parts Mel Gorman
` (7 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:04 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
This patch adds a flag __GFP_EASYRCLM. Allocations using the __GFP_EASYRCLM
flag are expected to be easily reclaimed by syncing with backing storage (be
it a file or swap) or cleaning the buffers and discarding.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
fs/block_dev.c | 3 ++-
fs/buffer.c | 3 ++-
fs/compat.c | 3 ++-
fs/exec.c | 3 ++-
fs/inode.c | 3 ++-
include/asm-i386/page.h | 4 +++-
include/linux/gfp.h | 12 +++++++++++-
include/linux/highmem.h | 4 +++-
mm/memory.c | 8 ++++++--
mm/swap_state.c | 4 +++-
10 files changed, 36 insertions(+), 11 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/fs/block_dev.c linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/block_dev.c
--- linux-2.6.18-rc5-mm1-clean/fs/block_dev.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/block_dev.c 2006-09-04 18:36:09.000000000 +0100
@@ -380,7 +380,8 @@ struct block_device *bdget(dev_t dev)
inode->i_rdev = dev;
inode->i_bdev = bdev;
inode->i_data.a_ops = &def_blk_aops;
- mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+ mapping_set_gfp_mask(&inode->i_data,
+ set_rclmflags(GFP_USER, __GFP_EASYRCLM));
inode->i_data.backing_dev_info = &default_backing_dev_info;
spin_lock(&bdev_lock);
list_add(&bdev->bd_list, &all_bdevs);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/fs/buffer.c linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/buffer.c
--- linux-2.6.18-rc5-mm1-clean/fs/buffer.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/buffer.c 2006-09-04 18:36:09.000000000 +0100
@@ -986,7 +986,8 @@ grow_dev_page(struct block_device *bdev,
struct page *page;
struct buffer_head *bh;
- page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+ page = find_or_create_page(inode->i_mapping, index,
+ set_rclmflags(GFP_NOFS, __GFP_EASYRCLM));
if (!page)
return NULL;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/fs/compat.c linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/compat.c
--- linux-2.6.18-rc5-mm1-clean/fs/compat.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/compat.c 2006-09-04 18:36:09.000000000 +0100
@@ -1419,7 +1419,8 @@ static int compat_copy_strings(int argc,
page = bprm->page[i];
new = 0;
if (!page) {
- page = alloc_page(GFP_HIGHUSER);
+ page = alloc_page(set_rclmflags(GFP_HIGHUSER,
+ __GFP_EASYRCLM));
bprm->page[i] = page;
if (!page) {
ret = -ENOMEM;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/fs/exec.c linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/exec.c
--- linux-2.6.18-rc5-mm1-clean/fs/exec.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/exec.c 2006-09-04 18:36:09.000000000 +0100
@@ -238,7 +238,8 @@ static int copy_strings(int argc, char _
page = bprm->page[i];
new = 0;
if (!page) {
- page = alloc_page(GFP_HIGHUSER);
+ page = alloc_page(set_rclmflags(GFP_HIGHUSER,
+ __GFP_EASYRCLM));
bprm->page[i] = page;
if (!page) {
ret = -ENOMEM;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/fs/inode.c linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/inode.c
--- linux-2.6.18-rc5-mm1-clean/fs/inode.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/fs/inode.c 2006-09-04 18:36:09.000000000 +0100
@@ -145,7 +145,8 @@ static struct inode *alloc_inode(struct
mapping->a_ops = &empty_aops;
mapping->host = inode;
mapping->flags = 0;
- mapping_set_gfp_mask(mapping, GFP_HIGHUSER);
+ mapping_set_gfp_mask(mapping,
+ set_rclmflags(GFP_HIGHUSER, __GFP_EASYRCLM));
mapping->assoc_mapping = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/include/asm-i386/page.h linux-2.6.18-rc5-mm1-001_antifrag_flags/include/asm-i386/page.h
--- linux-2.6.18-rc5-mm1-clean/include/asm-i386/page.h 2006-08-28 04:41:48.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/include/asm-i386/page.h 2006-09-04 18:36:09.000000000 +0100
@@ -35,7 +35,9 @@
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define alloc_zeroed_user_highpage(vma, vaddr) \
+ alloc_page_vma(set_rclmflags(GFP_HIGHUSER|__GFP_ZERO, __GFP_EASYRCLM),\
+ vma, vaddr)
#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
/*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/include/linux/gfp.h linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/gfp.h
--- linux-2.6.18-rc5-mm1-clean/include/linux/gfp.h 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/gfp.h 2006-09-04 18:36:09.000000000 +0100
@@ -46,6 +46,7 @@ struct vm_area_struct;
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
#define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
#define __GFP_THISNODE ((__force gfp_t)0x40000u)/* No fallback, no policies */
+#define __GFP_EASYRCLM ((__force gfp_t)0x80000u) /* Easily reclaimed page */
#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -54,7 +55,11 @@ struct vm_area_struct;
#define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
- __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE)
+ __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|\
+ __GFP_EASYRCLM)
+
+/* This mask makes up all the RCLM-related flags */
+#define GFP_RECLAIM_MASK (__GFP_EASYRCLM)
/* This equals 0, but use constants in case they ever change */
#define GFP_NOWAIT (GFP_ATOMIC & ~__GFP_HIGH)
@@ -93,6 +98,11 @@ static inline enum zone_type gfp_zone(gf
return ZONE_NORMAL;
}
+static inline gfp_t set_rclmflags(gfp_t gfp, gfp_t reclaim_flags)
+{
+ return (gfp & ~(GFP_RECLAIM_MASK)) | reclaim_flags;
+}
+
/*
* There is only one page-allocator function, and two main namespaces to
* it. The alloc_page*() variants return 'struct page *' and as such
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/include/linux/highmem.h linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/highmem.h
--- linux-2.6.18-rc5-mm1-clean/include/linux/highmem.h 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/highmem.h 2006-09-04 18:36:09.000000000 +0100
@@ -61,7 +61,9 @@ static inline void clear_user_highpage(s
static inline struct page *
alloc_zeroed_user_highpage(struct vm_area_struct *vma, unsigned long vaddr)
{
- struct page *page = alloc_page_vma(GFP_HIGHUSER, vma, vaddr);
+ struct page *page = alloc_page_vma(
+ set_rclmflags(GFP_HIGHUSER, __GFP_EASYRCLM),
+ vma, vaddr);
if (page)
clear_user_highpage(page, vaddr);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/mm/memory.c linux-2.6.18-rc5-mm1-001_antifrag_flags/mm/memory.c
--- linux-2.6.18-rc5-mm1-clean/mm/memory.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/mm/memory.c 2006-09-04 18:36:09.000000000 +0100
@@ -1562,7 +1562,9 @@ gotten:
if (!new_page)
goto oom;
} else {
- new_page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+ new_page = alloc_page_vma(
+ set_rclmflags(GFP_HIGHUSER, __GFP_EASYRCLM),
+ vma, address);
if (!new_page)
goto oom;
cow_user_page(new_page, old_page, address);
@@ -2177,7 +2179,9 @@ retry:
if (unlikely(anon_vma_prepare(vma)))
goto oom;
- page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+ page = alloc_page_vma(
+ set_rclmflags(GFP_HIGHUSER, __GFP_EASYRCLM),
+ vma, address);
if (!page)
goto oom;
copy_user_highpage(page, new_page, address);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-clean/mm/swap_state.c linux-2.6.18-rc5-mm1-001_antifrag_flags/mm/swap_state.c
--- linux-2.6.18-rc5-mm1-clean/mm/swap_state.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-001_antifrag_flags/mm/swap_state.c 2006-09-04 18:36:09.000000000 +0100
@@ -343,7 +343,9 @@ struct page *read_swap_cache_async(swp_e
* Get a new page to read into from swap.
*/
if (!new_page) {
- new_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+ new_page = alloc_page_vma(
+ set_rclmflags(GFP_HIGHUSER, __GFP_EASYRCLM),
+ vma, addr);
if (!new_page)
break; /* Out of memory */
}
* [PATCH 2/8] Split the free lists into kernel and user parts
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
2006-09-07 19:04 ` [PATCH 1/8] Add __GFP_EASYRCLM flag and update callers Mel Gorman
@ 2006-09-07 19:04 ` Mel Gorman
2006-09-08 7:54 ` Peter Zijlstra
2006-09-07 19:04 ` [PATCH 3/8] Split the per-cpu " Mel Gorman
` (6 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:04 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
This patch adds the core of the anti-fragmentation strategy. It works by
grouping related allocation types together. The idea is that large groups of
pages that may be reclaimed are placed near each other. Each free list in
zone->free_area is split into RCLM_TYPES separate lists.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
include/linux/mmzone.h | 10 +++
include/linux/page-flags.h | 7 ++
mm/page_alloc.c | 109 +++++++++++++++++++++++++++++++---------
3 files changed, 102 insertions(+), 24 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.18-rc5-mm1-002_fragcore/include/linux/mmzone.h
--- linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/mmzone.h 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-002_fragcore/include/linux/mmzone.h 2006-09-04 18:37:59.000000000 +0100
@@ -24,8 +24,16 @@
#endif
#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
+#define RCLM_NORCLM 0
+#define RCLM_EASY 1
+#define RCLM_TYPES 2
+
+#define for_each_rclmtype_order(type, order) \
+ for (order = 0; order < MAX_ORDER; order++) \
+ for (type = 0; type < RCLM_TYPES; type++)
+
struct free_area {
- struct list_head free_list;
+ struct list_head free_list[RCLM_TYPES];
unsigned long nr_free;
};
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/page-flags.h linux-2.6.18-rc5-mm1-002_fragcore/include/linux/page-flags.h
--- linux-2.6.18-rc5-mm1-001_antifrag_flags/include/linux/page-flags.h 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-002_fragcore/include/linux/page-flags.h 2006-09-04 18:37:59.000000000 +0100
@@ -92,6 +92,7 @@
#define PG_buddy 19 /* Page is free, on buddy lists */
#define PG_readahead 20 /* Reminder to do readahead */
+#define PG_easyrclm 21 /* Page is an easy reclaim block */
#if (BITS_PER_LONG > 32)
@@ -254,6 +255,12 @@
#define SetPageReadahead(page) set_bit(PG_readahead, &(page)->flags)
#define TestClearPageReadahead(page) test_and_clear_bit(PG_readahead, &(page)->flags)
+#define PageEasyRclm(page) test_bit(PG_easyrclm, &(page)->flags)
+#define SetPageEasyRclm(page) set_bit(PG_easyrclm, &(page)->flags)
+#define ClearPageEasyRclm(page) clear_bit(PG_easyrclm, &(page)->flags)
+#define __SetPageEasyRclm(page) __set_bit(PG_easyrclm, &(page)->flags)
+#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
+
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-001_antifrag_flags/mm/page_alloc.c linux-2.6.18-rc5-mm1-002_fragcore/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-001_antifrag_flags/mm/page_alloc.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-002_fragcore/mm/page_alloc.c 2006-09-04 18:37:59.000000000 +0100
@@ -133,6 +133,16 @@ static unsigned long __initdata dma_rese
unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+static inline int get_pageblock_type(struct page *page)
+{
+ return (PageEasyRclm(page) != 0);
+}
+
+static inline int gfpflags_to_rclmtype(unsigned long gfp_flags)
+{
+ return ((gfp_flags & __GFP_EASYRCLM) != 0);
+}
+
#ifdef CONFIG_DEBUG_VM
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
{
@@ -402,11 +412,13 @@ static inline void __free_one_page(struc
{
unsigned long page_idx;
int order_size = 1 << order;
+ int rclmtype = get_pageblock_type(page);
if (unlikely(PageCompound(page)))
destroy_compound_page(page, order);
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
+ __SetPageEasyRclm(page);
VM_BUG_ON(page_idx & (order_size - 1));
VM_BUG_ON(bad_range(zone, page));
@@ -414,7 +426,6 @@ static inline void __free_one_page(struc
zone->free_pages += order_size;
while (order < MAX_ORDER-1) {
unsigned long combined_idx;
- struct free_area *area;
struct page *buddy;
buddy = __page_find_buddy(page, page_idx, order);
@@ -422,8 +433,7 @@ static inline void __free_one_page(struc
break; /* Move the buddy up one level. */
list_del(&buddy->lru);
- area = zone->free_area + order;
- area->nr_free--;
+ zone->free_area[order].nr_free--;
rmv_page_order(buddy);
combined_idx = __find_combined_index(page_idx, order);
page = page + (combined_idx - page_idx);
@@ -431,7 +441,7 @@ static inline void __free_one_page(struc
order++;
}
set_page_order(page, order);
- list_add(&page->lru, &zone->free_area[order].free_list);
+ list_add(&page->lru, &zone->free_area[order].free_list[rclmtype]);
zone->free_area[order].nr_free++;
}
@@ -567,7 +577,8 @@ void fastcall __init __free_pages_bootme
* -- wli
*/
static inline void expand(struct zone *zone, struct page *page,
- int low, int high, struct free_area *area)
+ int low, int high, struct free_area *area,
+ int rclmtype)
{
unsigned long size = 1 << high;
@@ -576,7 +587,7 @@ static inline void expand(struct zone *z
high--;
size >>= 1;
VM_BUG_ON(bad_range(zone, &page[size]));
- list_add(&page[size].lru, &area->free_list);
+ list_add(&page[size].lru, &area->free_list[rclmtype]);
area->nr_free++;
set_page_order(&page[size], high);
}
@@ -627,31 +638,80 @@ static int prep_new_page(struct page *pa
return 0;
}
+/* Remove an element from the buddy allocator from the fallback list */
+static struct page *__rmqueue_fallback(struct zone *zone, int order,
+ gfp_t gfp_flags)
+{
+ struct free_area * area;
+ int current_order;
+ struct page *page;
+ int rclmtype = gfpflags_to_rclmtype(gfp_flags);
+
+ /* Find the largest possible block of pages in the other list */
+ rclmtype = !rclmtype;
+ for (current_order = MAX_ORDER-1; current_order >= order;
+ --current_order) {
+ area = &(zone->free_area[current_order]);
+ if (list_empty(&area->free_list[rclmtype]))
+ continue;
+
+ page = list_entry(area->free_list[rclmtype].next,
+ struct page, lru);
+ area->nr_free--;
+
+ /*
+ * If breaking a large block of pages, place the buddies
+ * on the preferred allocation list
+ */
+ if (unlikely(current_order >= MAX_ORDER / 2))
+ rclmtype = !rclmtype;
+
+ /* Remove the page from the freelists */
+ list_del(&page->lru);
+ rmv_page_order(page);
+ zone->free_pages -= 1UL << order;
+ expand(zone, page, order, current_order, area, rclmtype);
+ return page;
+ }
+
+ return NULL;
+}
+
/*
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
*/
-static struct page *__rmqueue(struct zone *zone, unsigned int order)
+static struct page *__rmqueue(struct zone *zone, unsigned int order,
+ gfp_t gfp_flags)
{
struct free_area * area;
unsigned int current_order;
struct page *page;
+ int rclmtype = gfpflags_to_rclmtype(gfp_flags);
+ /* Find a page of the appropriate size in the preferred list */
for (current_order = order; current_order < MAX_ORDER; ++current_order) {
- area = zone->free_area + current_order;
- if (list_empty(&area->free_list))
+ area = &(zone->free_area[current_order]);
+ if (list_empty(&area->free_list[rclmtype]))
continue;
- page = list_entry(area->free_list.next, struct page, lru);
+ page = list_entry(area->free_list[rclmtype].next,
+ struct page, lru);
list_del(&page->lru);
rmv_page_order(page);
area->nr_free--;
zone->free_pages -= 1UL << order;
- expand(zone, page, order, current_order, area);
- return page;
+ expand(zone, page, order, current_order, area, rclmtype);
+ goto got_page;
}
- return NULL;
+ page = __rmqueue_fallback(zone, order, gfp_flags);
+
+got_page:
+ if (unlikely(rclmtype == RCLM_NORCLM) && page)
+ __ClearPageEasyRclm(page);
+
+ return page;
}
/*
@@ -660,13 +720,14 @@ static struct page *__rmqueue(struct zon
* Returns the number of new pages which were placed at *list.
*/
static int rmqueue_bulk(struct zone *zone, unsigned int order,
- unsigned long count, struct list_head *list)
+ unsigned long count, struct list_head *list,
+ gfp_t gfp_flags)
{
int i;
spin_lock(&zone->lock);
for (i = 0; i < count; ++i) {
- struct page *page = __rmqueue(zone, order);
+ struct page *page = __rmqueue(zone, order, gfp_flags);
if (unlikely(page == NULL))
break;
list_add_tail(&page->lru, list);
@@ -741,7 +802,7 @@ void mark_free_pages(struct zone *zone)
{
unsigned long pfn, max_zone_pfn;
unsigned long flags;
- int order;
+ int order, t;
struct list_head *curr;
if (!zone->spanned_pages)
@@ -758,14 +819,15 @@ void mark_free_pages(struct zone *zone)
ClearPageNosaveFree(page);
}
- for (order = MAX_ORDER - 1; order >= 0; --order)
- list_for_each(curr, &zone->free_area[order].free_list) {
+ for_each_rclmtype_order(t, order) {
+ list_for_each(curr, &zone->free_area[order].free_list[t]) {
unsigned long i;
pfn = page_to_pfn(list_entry(curr, struct page, lru));
for (i = 0; i < (1UL << order); i++)
SetPageNosaveFree(pfn_to_page(pfn + i));
}
+ }
spin_unlock_irqrestore(&zone->lock, flags);
}
@@ -864,7 +926,7 @@ again:
local_irq_save(flags);
if (!pcp->count) {
pcp->count += rmqueue_bulk(zone, 0,
- pcp->batch, &pcp->list);
+ pcp->batch, &pcp->list, gfp_flags);
if (unlikely(!pcp->count))
goto failed;
}
@@ -873,7 +935,7 @@ again:
pcp->count--;
} else {
spin_lock_irqsave(&zone->lock, flags);
- page = __rmqueue(zone, order);
+ page = __rmqueue(zone, order, gfp_flags);
spin_unlock(&zone->lock);
if (!page)
goto failed;
@@ -1782,6 +1844,7 @@ void __meminit memmap_init_zone(unsigned
init_page_count(page);
reset_page_mapcount(page);
SetPageReserved(page);
+ SetPageEasyRclm(page);
INIT_LIST_HEAD(&page->lru);
#ifdef WANT_PAGE_VIRTUAL
/* The shift won't overflow because ZONE_NORMAL is below 4G. */
@@ -1797,9 +1860,9 @@ void __meminit memmap_init_zone(unsigned
void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone,
unsigned long size)
{
- int order;
- for (order = 0; order < MAX_ORDER ; order++) {
- INIT_LIST_HEAD(&zone->free_area[order].free_list);
+ int order, rclmtype;
+ for_each_rclmtype_order(rclmtype, order) {
+ INIT_LIST_HEAD(&zone->free_area[order].free_list[rclmtype]);
zone->free_area[order].nr_free = 0;
}
}
* [PATCH 3/8] Split the per-cpu lists into kernel and user parts
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
2006-09-07 19:04 ` [PATCH 1/8] Add __GFP_EASYRCLM flag and update callers Mel Gorman
2006-09-07 19:04 ` [PATCH 2/8] Split the free lists into kernel and user parts Mel Gorman
@ 2006-09-07 19:04 ` Mel Gorman
2006-09-07 19:05 ` [PATCH 4/8] Add a configure option for anti-fragmentation Mel Gorman
` (5 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:04 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
The freelists for each allocation type can slowly become fragmented due to
the per-cpu lists. Consider what happens in the following sequence:
1. A 2^(MAX_ORDER-1) block of pages is reserved for __GFP_EASYRCLM pages
2. An order-0 page is allocated from the newly reserved block
3. The page is freed and placed on the per-cpu list
4. alloc_page() is called with GFP_KERNEL as the gfp_mask
5. The per-cpu list is used to satisfy the allocation
This results in a kernel page sitting in the middle of an RCLM_EASY region.
It means that over long periods of time, the anti-fragmentation scheme
slowly degrades to the standard allocator.
This patch divides the per-cpu lists into RCLM_TYPES lists, one per
allocation type.
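A simplified userspace illustration of why the split closes the hole above
(counters standing in for the real per-cpu lists; not the patch code itself):
/*
 * Userspace illustration: with one per-cpu count/list per reclaim type,
 * a page freed as RCLM_EASY can no longer satisfy a kernel allocation
 * from the per-cpu cache (step 5 in the sequence above). Counters stand
 * in for the real list_heads in struct per_cpu_pages.
 */
#include <stdio.h>

#define RCLM_NORCLM	0
#define RCLM_EASY	1
#define RCLM_TYPES	2

struct per_cpu_pages_sim {
	int counts[RCLM_TYPES];
};

static void pcp_free(struct per_cpu_pages_sim *pcp, int page_type)
{
	pcp->counts[page_type]++;	/* page returns to its own type's list */
}

static int pcp_alloc(struct per_cpu_pages_sim *pcp, int wanted_type)
{
	if (!pcp->counts[wanted_type])
		return 0;		/* miss: fall back to the buddy lists */
	pcp->counts[wanted_type]--;
	return 1;
}

int main(void)
{
	struct per_cpu_pages_sim pcp = { { 0, 0 } };

	pcp_free(&pcp, RCLM_EASY);			/* step 3 above */
	printf("kernel alloc served from pcp: %d\n",	/* steps 4-5: a miss */
		pcp_alloc(&pcp, RCLM_NORCLM));
	return 0;
}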
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
include/linux/mmzone.h | 16 +++++++++--
mm/page_alloc.c | 63 +++++++++++++++++++++++++++-----------------
mm/vmstat.c | 4 +-
3 files changed, 56 insertions(+), 27 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-002_fragcore/include/linux/mmzone.h linux-2.6.18-rc5-mm1-003_percpu/include/linux/mmzone.h
--- linux-2.6.18-rc5-mm1-002_fragcore/include/linux/mmzone.h 2006-09-04 18:37:59.000000000 +0100
+++ linux-2.6.18-rc5-mm1-003_percpu/include/linux/mmzone.h 2006-09-04 18:39:39.000000000 +0100
@@ -28,6 +28,8 @@
#define RCLM_EASY 1
#define RCLM_TYPES 2
+#define for_each_rclmtype(type) \
+ for (type = 0; type < RCLM_TYPES; type++)
#define for_each_rclmtype_order(type, order) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < RCLM_TYPES; type++)
@@ -77,10 +79,10 @@ enum zone_stat_item {
NR_VM_ZONE_STAT_ITEMS };
struct per_cpu_pages {
- int count; /* number of pages in the list */
+ int counts[RCLM_TYPES]; /* number of pages in the list */
int high; /* high watermark, emptying needed */
int batch; /* chunk size for buddy add/remove */
- struct list_head list; /* the list of pages */
+ struct list_head list[RCLM_TYPES]; /* the list of pages */
};
struct per_cpu_pageset {
@@ -91,6 +93,16 @@ struct per_cpu_pageset {
#endif
} ____cacheline_aligned_in_smp;
+static inline int pcp_count(struct per_cpu_pages *pcp)
+{
+ int rclmtype, count = 0;
+
+ for_each_rclmtype(rclmtype)
+ count += pcp->counts[rclmtype];
+
+ return count;
+}
+
#ifdef CONFIG_NUMA
#define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
#else
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-002_fragcore/mm/page_alloc.c linux-2.6.18-rc5-mm1-003_percpu/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-002_fragcore/mm/page_alloc.c 2006-09-04 18:37:59.000000000 +0100
+++ linux-2.6.18-rc5-mm1-003_percpu/mm/page_alloc.c 2006-09-04 18:39:39.000000000 +0100
@@ -745,7 +745,7 @@ static int rmqueue_bulk(struct zone *zon
*/
void drain_node_pages(int nodeid)
{
- int i;
+ int i, pindex;
enum zone_type z;
unsigned long flags;
@@ -761,10 +761,14 @@ void drain_node_pages(int nodeid)
struct per_cpu_pages *pcp;
pcp = &pset->pcp[i];
- if (pcp->count) {
+ if (pcp_count(pcp)) {
local_irq_save(flags);
- free_pages_bulk(zone, pcp->count, &pcp->list, 0);
- pcp->count = 0;
+ for_each_rclmtype(pindex) {
+ free_pages_bulk(zone,
+ pcp->counts[pindex],
+ &pcp->list[pindex], 0);
+ pcp->counts[pindex] = 0;
+ }
local_irq_restore(flags);
}
}
@@ -777,7 +781,7 @@ static void __drain_pages(unsigned int c
{
unsigned long flags;
struct zone *zone;
- int i;
+ int i, pindex;
for_each_zone(zone) {
struct per_cpu_pageset *pset;
@@ -788,8 +792,13 @@ static void __drain_pages(unsigned int c
pcp = &pset->pcp[i];
local_irq_save(flags);
- free_pages_bulk(zone, pcp->count, &pcp->list, 0);
- pcp->count = 0;
+ for_each_rclmtype(pindex) {
+ free_pages_bulk(zone,
+ pcp->counts[pindex],
+ &pcp->list[pindex], 0);
+
+ pcp->counts[pindex] = 0;
+ }
local_irq_restore(flags);
}
}
@@ -851,6 +860,7 @@ void drain_local_pages(void)
static void fastcall free_hot_cold_page(struct page *page, int cold)
{
struct zone *zone = page_zone(page);
+ int pindex = get_pageblock_type(page);
struct per_cpu_pages *pcp;
unsigned long flags;
@@ -866,11 +876,11 @@ static void fastcall free_hot_cold_page(
pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
local_irq_save(flags);
__count_vm_event(PGFREE);
- list_add(&page->lru, &pcp->list);
- pcp->count++;
- if (pcp->count >= pcp->high) {
- free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
- pcp->count -= pcp->batch;
+ list_add(&page->lru, &pcp->list[pindex]);
+ pcp->counts[pindex]++;
+ if (pcp->counts[pindex] >= pcp->high) {
+ free_pages_bulk(zone, pcp->batch, &pcp->list[pindex], 0);
+ pcp->counts[pindex] -= pcp->batch;
}
local_irq_restore(flags);
put_cpu();
@@ -916,6 +926,7 @@ static struct page *buffered_rmqueue(str
struct page *page;
int cold = !!(gfp_flags & __GFP_COLD);
int cpu;
+ int rclmtype = gfpflags_to_rclmtype(gfp_flags);
again:
cpu = get_cpu();
@@ -924,15 +935,15 @@ again:
pcp = &zone_pcp(zone, cpu)->pcp[cold];
local_irq_save(flags);
- if (!pcp->count) {
- pcp->count += rmqueue_bulk(zone, 0,
- pcp->batch, &pcp->list, gfp_flags);
- if (unlikely(!pcp->count))
+ if (!pcp->counts[rclmtype]) {
+ pcp->counts[rclmtype] += rmqueue_bulk(zone, 0,
+ pcp->batch, &pcp->list[rclmtype], gfp_flags);
+ if (unlikely(!pcp->counts[rclmtype]))
goto failed;
}
- page = list_entry(pcp->list.next, struct page, lru);
+ page = list_entry(pcp->list[rclmtype].next, struct page, lru);
list_del(&page->lru);
- pcp->count--;
+ pcp->counts[rclmtype]--;
} else {
spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order, gfp_flags);
@@ -1480,7 +1491,7 @@ void show_free_areas(void)
temperature ? "cold" : "hot",
pageset->pcp[temperature].high,
pageset->pcp[temperature].batch,
- pageset->pcp[temperature].count);
+ pcp_count(&pageset->pcp[temperature]));
}
}
@@ -1921,20 +1932,26 @@ static int __cpuinit zone_batchsize(stru
inline void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
{
struct per_cpu_pages *pcp;
+ int rclmtype;
memset(p, 0, sizeof(*p));
pcp = &p->pcp[0]; /* hot */
- pcp->count = 0;
+ for_each_rclmtype(rclmtype) {
+ pcp->counts[rclmtype] = 0;
+ INIT_LIST_HEAD(&pcp->list[rclmtype]);
+ }
pcp->high = 6 * batch;
pcp->batch = max(1UL, 1 * batch);
- INIT_LIST_HEAD(&pcp->list);
+ INIT_LIST_HEAD(&pcp->list[RCLM_EASY]);
pcp = &p->pcp[1]; /* cold*/
- pcp->count = 0;
+ for_each_rclmtype(rclmtype) {
+ pcp->counts[rclmtype] = 0;
+ INIT_LIST_HEAD(&pcp->list[rclmtype]);
+ }
pcp->high = 2 * batch;
pcp->batch = max(1UL, batch/2);
- INIT_LIST_HEAD(&pcp->list);
}
/*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-002_fragcore/mm/vmstat.c linux-2.6.18-rc5-mm1-003_percpu/mm/vmstat.c
--- linux-2.6.18-rc5-mm1-002_fragcore/mm/vmstat.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-003_percpu/mm/vmstat.c 2006-09-04 18:39:39.000000000 +0100
@@ -562,7 +562,7 @@ static int zoneinfo_show(struct seq_file
pageset = zone_pcp(zone, i);
for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
- if (pageset->pcp[j].count)
+ if (pcp_count(&pageset->pcp[j]))
break;
}
if (j == ARRAY_SIZE(pageset->pcp))
@@ -574,7 +574,7 @@ static int zoneinfo_show(struct seq_file
"\n high: %i"
"\n batch: %i",
i, j,
- pageset->pcp[j].count,
+ pcp_count(&pageset->pcp[j]),
pageset->pcp[j].high,
pageset->pcp[j].batch);
}
* [PATCH 4/8] Add a configure option for anti-fragmentation
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
` (2 preceding siblings ...)
2006-09-07 19:04 ` [PATCH 3/8] Split the per-cpu " Mel Gorman
@ 2006-09-07 19:05 ` Mel Gorman
2006-09-07 19:05 ` [PATCH 5/8] Drain per-cpu lists when high-order allocations fail Mel Gorman
` (4 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:05 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
The anti-fragmentation strategy has memory overhead. This patch allows
the strategy to be disabled for small-memory systems or if it is known that
the workload suffers because of the strategy. It also serves to show where
the anti-fragmentation strategy interacts with the standard buddy allocator.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
init/Kconfig | 14 ++++++++++++++
mm/page_alloc.c | 20 ++++++++++++++++++++
2 files changed, 34 insertions(+)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-003_percpu/init/Kconfig linux-2.6.18-rc5-mm1-004_configurable/init/Kconfig
--- linux-2.6.18-rc5-mm1-003_percpu/init/Kconfig 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-004_configurable/init/Kconfig 2006-09-04 18:41:13.000000000 +0100
@@ -478,6 +478,20 @@ config SLOB
default !SLAB
bool
+config PAGEALLOC_ANTIFRAG
+ bool "Avoid fragmentation in the page allocator"
+ def_bool n
+ help
+ The standard allocator will fragment memory over time which means
+ that high order allocations will fail even if kswapd is running. If
+ this option is set, the allocator will try and group page types into
+ two groups, kernel and easy reclaimable. The gain is a best effort
+ attempt at lowering fragmentation which a few workloads care about.
+ The loss is a more complex allocator that may perform slower. If
+ you are interested in working with large pages, say Y and set
+ /proc/sys/vm/min_free_kbytes to be 10% of physical memory. Otherwise
+ say N
+
menu "Loadable module support"
config MODULES
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-003_percpu/mm/page_alloc.c linux-2.6.18-rc5-mm1-004_configurable/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-003_percpu/mm/page_alloc.c 2006-09-04 18:39:39.000000000 +0100
+++ linux-2.6.18-rc5-mm1-004_configurable/mm/page_alloc.c 2006-09-04 18:41:13.000000000 +0100
@@ -133,6 +133,7 @@ static unsigned long __initdata dma_rese
unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
static inline int get_pageblock_type(struct page *page)
{
return (PageEasyRclm(page) != 0);
@@ -142,6 +143,17 @@ static inline int gfpflags_to_rclmtype(u
{
return ((gfp_flags & __GFP_EASYRCLM) != 0);
}
+#else
+static inline int get_pageblock_type(struct page *page)
+{
+ return RCLM_NORCLM;
+}
+
+static inline int gfpflags_to_rclmtype(unsigned long gfp_flags)
+{
+ return RCLM_NORCLM;
+}
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
#ifdef CONFIG_DEBUG_VM
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
@@ -638,6 +650,7 @@ static int prep_new_page(struct page *pa
return 0;
}
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
/* Remove an element from the buddy allocator from the fallback list */
static struct page *__rmqueue_fallback(struct zone *zone, int order,
gfp_t gfp_flags)
@@ -676,6 +689,13 @@ static struct page *__rmqueue_fallback(s
return NULL;
}
+#else
+static struct page *__rmqueue_fallback(struct zone *zone, unsigned int order,
+ int rclmtype)
+{
+ return NULL;
+}
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
/*
* Do the hard work of removing an element from the buddy allocator.
* [PATCH 5/8] Drain per-cpu lists when high-order allocations fail
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
` (3 preceding siblings ...)
2006-09-07 19:05 ` [PATCH 4/8] Add a configure option for anti-fragmentation Mel Gorman
@ 2006-09-07 19:05 ` Mel Gorman
2006-09-07 19:05 ` [PATCH 6/8] Move free pages between lists on steal Mel Gorman
` (3 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:05 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
Per-cpu pages can accidentally cause fragmentation because they are free,
but pinned, pages sitting in an otherwise contiguous block. When this patch
is applied, the per-cpu caches are drained after direct reclaim is entered if
the requested order is greater than 0. It simply reuses the code used by
suspend and hotplug.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
Kconfig | 4 ++++
page_alloc.c | 32 +++++++++++++++++++++++++++++---
2 files changed, 33 insertions(+), 3 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-004_configurable/mm/Kconfig linux-2.6.18-rc5-mm1-005_drainpercpu/mm/Kconfig
--- linux-2.6.18-rc5-mm1-004_configurable/mm/Kconfig 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-005_drainpercpu/mm/Kconfig 2006-09-07 19:16:11.000000000 +0100
@@ -242,3 +242,7 @@ config READAHEAD_SMOOTH_AGING
- have the danger of readahead thrashing(i.e. memory tight)
This feature is only available on non-NUMA systems.
+
+config NEED_DRAIN_PERCPU_PAGES
+ def_bool y
+ depends on PM || HOTPLUG_CPU || PAGEALLOC_ANTIFRAG
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-004_configurable/mm/page_alloc.c linux-2.6.18-rc5-mm1-005_drainpercpu/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-004_configurable/mm/page_alloc.c 2006-09-07 19:14:30.000000000 +0100
+++ linux-2.6.18-rc5-mm1-005_drainpercpu/mm/page_alloc.c 2006-09-07 19:16:38.000000000 +0100
@@ -796,7 +796,7 @@ void drain_node_pages(int nodeid)
}
#endif
-#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU)
+#ifdef CONFIG_NEED_DRAIN_PERCPU_PAGES
static void __drain_pages(unsigned int cpu)
{
unsigned long flags;
@@ -823,7 +823,7 @@ static void __drain_pages(unsigned int c
}
}
}
-#endif /* CONFIG_PM || CONFIG_HOTPLUG_CPU */
+#endif /* CONFIG_NEED_DRAIN_PERCPU_PAGES */
#ifdef CONFIG_PM
@@ -860,7 +860,9 @@ void mark_free_pages(struct zone *zone)
spin_unlock_irqrestore(&zone->lock, flags);
}
+#endif /* CONFIG_PM */
+#if defined(CONFIG_PM) || defined(CONFIG_PAGEALLOC_ANTIFRAG)
/*
* Spill all of this CPU's per-cpu pages back into the buddy allocator.
*/
@@ -872,7 +874,28 @@ void drain_local_pages(void)
__drain_pages(smp_processor_id());
local_irq_restore(flags);
}
-#endif /* CONFIG_PM */
+
+void smp_drain_local_pages(void *arg)
+{
+ drain_local_pages();
+}
+
+/*
+ * Spill all the per-cpu pages from all CPUs back into the buddy allocator
+ */
+void drain_all_local_pages(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __drain_pages(smp_processor_id());
+ local_irq_restore(flags);
+
+ smp_call_function(smp_drain_local_pages, NULL, 0, 1);
+}
+#else
+void drain_all_local_pages(void) {}
+#endif /* CONFIG_PM || CONFIG_PAGEALLOC_ANTIFRAG */
/*
* Free a 0-order page
@@ -1232,6 +1255,9 @@ rebalance:
cond_resched();
+ if (order != 0)
+ drain_all_local_pages();
+
if (likely(did_some_progress)) {
page = get_page_from_freelist(gfp_mask, order,
zonelist, alloc_flags);
* [PATCH 6/8] Move free pages between lists on steal
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
` (4 preceding siblings ...)
2006-09-07 19:05 ` [PATCH 5/8] Drain per-cpu lists when high-order allocations fail Mel Gorman
@ 2006-09-07 19:05 ` Mel Gorman
2006-09-07 19:06 ` [PATCH 7/8] Introduce the RCLM_KERN allocation type Mel Gorman
` (2 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:05 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
When a fallback occurs, free pages for one allocation type end up stored
on the lists of another. When a large block is stolen, this patch moves all
the free pages within that MAX_ORDER block to the lists of the stealing
allocation type.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
page_alloc.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 75 insertions(+), 9 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-005_drainpercpu/mm/page_alloc.c linux-2.6.18-rc5-mm1-006_movefree/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-005_drainpercpu/mm/page_alloc.c 2006-09-04 18:42:43.000000000 +0100
+++ linux-2.6.18-rc5-mm1-006_movefree/mm/page_alloc.c 2006-09-04 18:44:14.000000000 +0100
@@ -651,6 +651,62 @@ static int prep_new_page(struct page *pa
}
#ifdef CONFIG_PAGEALLOC_ANTIFRAG
+/*
+ * Move the free pages in a range to the free lists of the requested type.
+ * Note that start_page and end_page are not aligned on a MAX_ORDER_NR_PAGES
+ * boundary. If alignment is required, use move_freepages_block()
+ */
+int move_freepages(struct zone *zone,
+ struct page *start_page, struct page *end_page,
+ int rclmtype)
+{
+ struct page *page;
+ unsigned long order;
+ int blocks_moved = 0;
+
+ BUG_ON(page_zone(start_page) != page_zone(end_page));
+
+ for (page = start_page; page < end_page;) {
+ if (!PageBuddy(page)) {
+ page++;
+ continue;
+ }
+#ifdef CONFIG_HOLES_IN_ZONE
+ if (!pfn_valid(page_to_pfn(page))) {
+ page++;
+ continue;
+ }
+#endif
+
+ order = page_order(page);
+ list_del(&page->lru);
+ list_add(&page->lru,
+ &zone->free_area[order].free_list[rclmtype]);
+ page += 1 << order;
+ blocks_moved++;
+ }
+
+ return blocks_moved;
+}
+
+int move_freepages_block(struct zone *zone, struct page *page, int rclmtype)
+{
+ unsigned long start_pfn;
+ struct page *start_page, *end_page;
+
+ start_pfn = page_to_pfn(page);
+ start_pfn = ALIGN(start_pfn, MAX_ORDER_NR_PAGES);
+ start_page = pfn_to_page(start_pfn);
+ end_page = start_page + MAX_ORDER_NR_PAGES;
+
+ if (page_zone(page) != page_zone(start_page))
+ start_page = page;
+ if (page_zone(page) != page_zone(end_page))
+ return 0;
+
+ return move_freepages(zone, start_page, end_page, rclmtype);
+}
+
/* Remove an element from the buddy allocator from the fallback list */
static struct page *__rmqueue_fallback(struct zone *zone, int order,
gfp_t gfp_flags)
@@ -658,10 +714,10 @@ static struct page *__rmqueue_fallback(s
struct free_area * area;
int current_order;
struct page *page;
- int rclmtype = gfpflags_to_rclmtype(gfp_flags);
+ int start_rclmtype = gfpflags_to_rclmtype(gfp_flags);
+ int rclmtype = !start_rclmtype;
/* Find the largest possible block of pages in the other list */
- rclmtype = !rclmtype;
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
area = &(zone->free_area[current_order]);
@@ -672,24 +728,34 @@ static struct page *__rmqueue_fallback(s
struct page, lru);
area->nr_free--;
- /*
- * If breaking a large block of pages, place the buddies
- * on the preferred allocation list
- */
- if (unlikely(current_order >= MAX_ORDER / 2))
- rclmtype = !rclmtype;
-
/* Remove the page from the freelists */
list_del(&page->lru);
rmv_page_order(page);
zone->free_pages -= 1UL << order;
expand(zone, page, order, current_order, area, rclmtype);
+
+ /* Move free pages between lists if stealing a large block */
+ if (current_order > MAX_ORDER / 2)
+ move_freepages_block(zone, page, start_rclmtype);
+
return page;
}
return NULL;
}
#else
+int move_freepages(struct zone *zone,
+ struct page *start_page, struct page *end_page,
+ int rclmtype)
+{
+ return 0;
+}
+
+int move_freepages_block(struct zone *zone, struct page *page, int rclmtype)
+{
+ return 0;
+}
+
static struct page *__rmqueue_fallback(struct zone *zone, unsigned int order,
int rclmtype)
{
* [PATCH 7/8] Introduce the RCLM_KERN allocation type
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
` (5 preceding siblings ...)
2006-09-07 19:05 ` [PATCH 6/8] Move free pages between lists on steal Mel Gorman
@ 2006-09-07 19:06 ` Mel Gorman
2006-09-07 19:06 ` [PATCH 8/8] [DEBUG] Add statistics Mel Gorman
2006-09-08 0:58 ` [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Andrew Morton
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:06 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
Some kernel allocations, such as inode caches, are easily reclaimable, and
these reclaimable kernel allocations are by far the most common type of
kernel allocation. This patch marks those types of allocation explicitly
and tries to group them together.
As another page bit would normally be required, it was decided to reuse the
suspend-related page bits and make anti-fragmentation and software suspend
mutually exclusive. More details are in the patch.
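For illustration, one way the three types can be derived from the two reused
bits (a userspace sketch using the bit numbers from the patch; the decode
helper below is only an illustration, not lifted from the patch):
/*
 * Userspace sketch: PG_kernrclm reuses bit 13 (PG_nosave) and
 * PG_easyrclm reuses bit 18 (PG_nosave_free), as in the patch. The
 * decode shows how two independent bits cover the three RCLM types.
 */
#include <stdio.h>

#define PG_kernrclm	13	/* reuses PG_nosave */
#define PG_easyrclm	18	/* reuses PG_nosave_free */

#define RCLM_NORCLM	0
#define RCLM_KERN	1
#define RCLM_EASY	2

static int rclmtype_from_flags(unsigned long flags)
{
	if (flags & (1UL << PG_easyrclm))
		return RCLM_EASY;
	if (flags & (1UL << PG_kernrclm))
		return RCLM_KERN;
	return RCLM_NORCLM;	/* neither bit set: default, not reclaimable */
}

int main(void)
{
	printf("kernrclm page -> type %d\n",
		rclmtype_from_flags(1UL << PG_kernrclm));	/* 1 */
	printf("easyrclm page -> type %d\n",
		rclmtype_from_flags(1UL << PG_easyrclm));	/* 2 */
	printf("plain kernel page -> type %d\n",
		rclmtype_from_flags(0));			/* 0 */
	return 0;
}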
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
arch/x86_64/kernel/e820.c | 8 +++++
fs/buffer.c | 5 +--
fs/dcache.c | 3 +
fs/ext2/super.c | 3 +
fs/ext3/super.c | 3 +
fs/jbd/journal.c | 6 ++-
fs/jbd/revoke.c | 6 ++-
fs/ntfs/inode.c | 6 ++-
fs/reiserfs/super.c | 3 +
include/linux/gfp.h | 10 +++---
include/linux/mmzone.h | 5 +--
include/linux/page-flags.h | 60 +++++++++++++++++++++++++++++++------
init/Kconfig | 1
lib/radix-tree.c | 6 ++-
mm/page_alloc.c | 64 ++++++++++++++++++++++++++--------------
mm/shmem.c | 8 +++--
net/core/skbuff.c | 1
17 files changed, 144 insertions(+), 54 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/arch/x86_64/kernel/e820.c linux-2.6.18-rc5-mm1-007_kernrclm/arch/x86_64/kernel/e820.c
--- linux-2.6.18-rc5-mm1-006_movefree/arch/x86_64/kernel/e820.c 2006-09-04 18:34:30.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/arch/x86_64/kernel/e820.c 2006-09-04 18:45:50.000000000 +0100
@@ -235,6 +235,13 @@ e820_register_active_regions(int nid, un
}
}
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
+static void __init
+e820_mark_nosave_range(unsigned long start, unsigned long end)
+{
+ printk("Nosave not set when anti-frag is enabled\n");
+}
+#else
/* Mark pages corresponding to given address range as nosave */
static void __init
e820_mark_nosave_range(unsigned long start, unsigned long end)
@@ -250,6 +257,7 @@ e820_mark_nosave_range(unsigned long sta
if (pfn_valid(pfn))
SetPageNosave(pfn_to_page(pfn));
}
+#endif
/*
* Find the ranges of physical addresses that do not correspond to
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/buffer.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/buffer.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/buffer.c 2006-09-04 18:36:09.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/buffer.c 2006-09-04 18:45:50.000000000 +0100
@@ -2642,7 +2642,7 @@ int submit_bh(int rw, struct buffer_head
* from here on down, it's all bio -- do the initial mapping,
* submit_bio -> generic_make_request may further map this bio around
*/
- bio = bio_alloc(GFP_NOIO, 1);
+ bio = bio_alloc(set_rclmflags(GFP_NOIO, __GFP_EASYRCLM), 1);
bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
@@ -2922,7 +2922,8 @@ static void recalc_bh_state(void)
struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
{
- struct buffer_head *ret = kmem_cache_alloc(bh_cachep, gfp_flags);
+ struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+ set_rclmflags(gfp_flags, __GFP_KERNRCLM));
if (ret) {
get_cpu_var(bh_accounting).nr++;
recalc_bh_state();
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/dcache.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/dcache.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/dcache.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/dcache.c 2006-09-04 18:45:50.000000000 +0100
@@ -853,7 +853,8 @@ struct dentry *d_alloc(struct dentry * p
struct dentry *dentry;
char *dname;
- dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
+ dentry = kmem_cache_alloc(dentry_cache,
+ set_rclmflags(GFP_KERNEL, __GFP_KERNRCLM));
if (!dentry)
return NULL;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/ext2/super.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/ext2/super.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/ext2/super.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/ext2/super.c 2006-09-04 18:45:50.000000000 +0100
@@ -140,7 +140,8 @@ static kmem_cache_t * ext2_inode_cachep;
static struct inode *ext2_alloc_inode(struct super_block *sb)
{
struct ext2_inode_info *ei;
- ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep, SLAB_KERNEL);
+ ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep,
+ set_rclmflags(SLAB_KERNEL, __GFP_KERNRCLM));
if (!ei)
return NULL;
#ifdef CONFIG_EXT2_FS_POSIX_ACL
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/ext3/super.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/ext3/super.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/ext3/super.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/ext3/super.c 2006-09-04 18:45:50.000000000 +0100
@@ -444,7 +444,8 @@ static struct inode *ext3_alloc_inode(st
{
struct ext3_inode_info *ei;
- ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS);
+ ei = kmem_cache_alloc(ext3_inode_cachep,
+ set_rclmflags(SLAB_NOFS, __GFP_KERNRCLM));
if (!ei)
return NULL;
#ifdef CONFIG_EXT3_FS_POSIX_ACL
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/jbd/journal.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/jbd/journal.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/jbd/journal.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/jbd/journal.c 2006-09-04 18:45:50.000000000 +0100
@@ -1735,7 +1735,8 @@ static struct journal_head *journal_allo
#ifdef CONFIG_JBD_DEBUG
atomic_inc(&nr_journal_heads);
#endif
- ret = kmem_cache_alloc(journal_head_cache, GFP_NOFS);
+ ret = kmem_cache_alloc(journal_head_cache,
+ set_rclmflags(GFP_NOFS, __GFP_KERNRCLM));
if (ret == 0) {
jbd_debug(1, "out of memory for journal_head\n");
if (time_after(jiffies, last_warning + 5*HZ)) {
@@ -1745,7 +1746,8 @@ static struct journal_head *journal_allo
}
while (ret == 0) {
yield();
- ret = kmem_cache_alloc(journal_head_cache, GFP_NOFS);
+ ret = kmem_cache_alloc(journal_head_cache,
+ set_rclmflags(GFP_NOFS, __GFP_KERNRCLM));
}
}
return ret;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/jbd/revoke.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/jbd/revoke.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/jbd/revoke.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/jbd/revoke.c 2006-09-04 18:45:50.000000000 +0100
@@ -206,7 +206,8 @@ int journal_init_revoke(journal_t *journ
while((tmp >>= 1UL) != 0UL)
shift++;
- journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
+ journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache,
+ set_rclmflags(GFP_KERNEL, __GFP_KERNRCLM));
if (!journal->j_revoke_table[0])
return -ENOMEM;
journal->j_revoke = journal->j_revoke_table[0];
@@ -229,7 +230,8 @@ int journal_init_revoke(journal_t *journ
for (tmp = 0; tmp < hash_size; tmp++)
INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
- journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
+ journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache,
+ set_rclmflags(GFP_KERNEL, __GFP_KERNRCLM));
if (!journal->j_revoke_table[1]) {
kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/ntfs/inode.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/ntfs/inode.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/ntfs/inode.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/ntfs/inode.c 2006-09-04 18:45:50.000000000 +0100
@@ -324,7 +324,8 @@ struct inode *ntfs_alloc_big_inode(struc
ntfs_inode *ni;
ntfs_debug("Entering.");
- ni = kmem_cache_alloc(ntfs_big_inode_cache, SLAB_NOFS);
+ ni = kmem_cache_alloc(ntfs_big_inode_cache,
+ set_rclmflags(SLAB_NOFS, __GFP_KERNRCLM));
if (likely(ni != NULL)) {
ni->state = 0;
return VFS_I(ni);
@@ -349,7 +350,8 @@ static inline ntfs_inode *ntfs_alloc_ext
ntfs_inode *ni;
ntfs_debug("Entering.");
- ni = kmem_cache_alloc(ntfs_inode_cache, SLAB_NOFS);
+ ni = kmem_cache_alloc(ntfs_inode_cache,
+ set_rclmflags(SLAB_NOFS, __GFP_KERNRCLM));
if (likely(ni != NULL)) {
ni->state = 0;
return ni;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/fs/reiserfs/super.c linux-2.6.18-rc5-mm1-007_kernrclm/fs/reiserfs/super.c
--- linux-2.6.18-rc5-mm1-006_movefree/fs/reiserfs/super.c 2006-09-04 18:34:32.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/fs/reiserfs/super.c 2006-09-04 18:45:50.000000000 +0100
@@ -496,7 +496,8 @@ static struct inode *reiserfs_alloc_inod
{
struct reiserfs_inode_info *ei;
ei = (struct reiserfs_inode_info *)
- kmem_cache_alloc(reiserfs_inode_cachep, SLAB_KERNEL);
+ kmem_cache_alloc(reiserfs_inode_cachep,
+ set_rclmflags(SLAB_KERNEL, __GFP_KERNRCLM));
if (!ei)
return NULL;
return &ei->vfs_inode;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/include/linux/gfp.h linux-2.6.18-rc5-mm1-007_kernrclm/include/linux/gfp.h
--- linux-2.6.18-rc5-mm1-006_movefree/include/linux/gfp.h 2006-09-04 18:36:09.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/include/linux/gfp.h 2006-09-04 18:45:50.000000000 +0100
@@ -46,9 +46,10 @@ struct vm_area_struct;
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
#define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
#define __GFP_THISNODE ((__force gfp_t)0x40000u)/* No fallback, no policies */
-#define __GFP_EASYRCLM ((__force gfp_t)0x80000u) /* Easily reclaimed page */
+#define __GFP_KERNRCLM ((__force gfp_t)0x80000u) /* Kernel reclaimable page */
+#define __GFP_EASYRCLM ((__force gfp_t)0x100000u) /* Easily reclaimed page */
-#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 21 /* Room for 21 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/* if you forget to add the bitmask here kernel will crash, period */
@@ -56,10 +57,10 @@ struct vm_area_struct;
__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|\
- __GFP_EASYRCLM)
+ __GFP_KERNRCLM|__GFP_EASYRCLM)
/* This mask makes up all the RCLM-related flags */
-#define GFP_RECLAIM_MASK (__GFP_EASYRCLM)
+#define GFP_RECLAIM_MASK (__GFP_KERNRCLM|__GFP_EASYRCLM)
/* This equals 0, but use constants in case they ever change */
#define GFP_NOWAIT (GFP_ATOMIC & ~__GFP_HIGH)
@@ -100,6 +101,7 @@ static inline enum zone_type gfp_zone(gf
static inline gfp_t set_rclmflags(gfp_t gfp, gfp_t reclaim_flags)
{
+ BUG_ON((gfp & GFP_RECLAIM_MASK) == GFP_RECLAIM_MASK);
return (gfp & ~(GFP_RECLAIM_MASK)) | reclaim_flags;
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/include/linux/mmzone.h linux-2.6.18-rc5-mm1-007_kernrclm/include/linux/mmzone.h
--- linux-2.6.18-rc5-mm1-006_movefree/include/linux/mmzone.h 2006-09-04 18:39:39.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/include/linux/mmzone.h 2006-09-04 18:45:50.000000000 +0100
@@ -25,8 +25,9 @@
#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
#define RCLM_NORCLM 0
-#define RCLM_EASY 1
-#define RCLM_TYPES 2
+#define RCLM_KERN 1
+#define RCLM_EASY 2
+#define RCLM_TYPES 3
#define for_each_rclmtype(type) \
for (type = 0; type < RCLM_TYPES; type++)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/include/linux/page-flags.h linux-2.6.18-rc5-mm1-007_kernrclm/include/linux/page-flags.h
--- linux-2.6.18-rc5-mm1-006_movefree/include/linux/page-flags.h 2006-09-04 18:37:59.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/include/linux/page-flags.h 2006-09-04 18:45:50.000000000 +0100
@@ -82,18 +82,37 @@
#define PG_private 11 /* If pagecache, has fs-private data */
#define PG_writeback 12 /* Page is under writeback */
-#define PG_nosave 13 /* Used for system suspend/resume */
#define PG_compound 14 /* Part of a compound page */
#define PG_swapcache 15 /* Swap page: swp_entry_t in private */
#define PG_mappedtodisk 16 /* Has blocks allocated on-disk */
#define PG_reclaim 17 /* To be reclaimed asap */
-#define PG_nosave_free 18 /* Used for system suspend/resume */
#define PG_buddy 19 /* Page is free, on buddy lists */
#define PG_readahead 20 /* Reminder to do readahead */
-#define PG_easyrclm 21 /* Page is an easy reclaim block */
+/*
+ * As anti-fragmentation requires two flags, it was best to reuse the suspend
+ * flags and make anti-fragmentation depend on !SOFTWARE_SUSPEND. This works
+ * on the assumption that machines being suspended do not really care about
+ * large contiguous allocations. There are two alternatives for where the
+ * anti-fragmentation flags could be stored:
+ *
+ * 1. Use the lower two bits of page->lru and remove direct references to
+ * page->lru
+ * 2. Use the page->flags of the struct page backing the page storing the
+ * mem_map
+ *
+ * The first option may be difficult to read. The second option would require
+ * an additional cache line
+ */
+#ifndef CONFIG_PAGEALLOC_ANTIFRAG
+#define PG_nosave 13 /* Used for system suspend/resume */
+#define PG_nosave_free 18 /* Free, should not be written */
+#else
+#define PG_kernrclm 13 /* Page is a kernel reclaim block */
+#define PG_easyrclm 18 /* Page is an easy reclaim block */
+#endif
#if (BITS_PER_LONG > 32)
/*
@@ -212,6 +231,7 @@
ret; \
})
+#ifndef CONFIG_PAGEALLOC_ANTIFRAG
#define PageNosave(page) test_bit(PG_nosave, &(page)->flags)
#define SetPageNosave(page) set_bit(PG_nosave, &(page)->flags)
#define TestSetPageNosave(page) test_and_set_bit(PG_nosave, &(page)->flags)
@@ -222,6 +242,34 @@
#define SetPageNosaveFree(page) set_bit(PG_nosave_free, &(page)->flags)
#define ClearPageNosaveFree(page) clear_bit(PG_nosave_free, &(page)->flags)
+#define PageKernRclm(page) (0)
+#define SetPageKernRclm(page) do {} while (0)
+#define ClearPageKernRclm(page) do {} while (0)
+#define __SetPageKernRclm(page) do {} while (0)
+#define __ClearPageKernRclm(page) do {} while (0)
+
+#define PageEasyRclm(page) (0)
+#define SetPageEasyRclm(page) do {} while (0)
+#define ClearPageEasyRclm(page) do {} while (0)
+#define __SetPageEasyRclm(page) do {} while (0)
+#define __ClearPageEasyRclm(page) do {} while (0)
+
+#else
+
+#define PageKernRclm(page) test_bit(PG_kernrclm, &(page)->flags)
+#define SetPageKernRclm(page) set_bit(PG_kernrclm, &(page)->flags)
+#define ClearPageKernRclm(page) clear_bit(PG_kernrclm, &(page)->flags)
+#define __SetPageKernRclm(page) __set_bit(PG_kernrclm, &(page)->flags)
+#define __ClearPageKernRclm(page) __clear_bit(PG_kernrclm, &(page)->flags)
+
+#define PageEasyRclm(page) test_bit(PG_easyrclm, &(page)->flags)
+#define SetPageEasyRclm(page) set_bit(PG_easyrclm, &(page)->flags)
+#define ClearPageEasyRclm(page) clear_bit(PG_easyrclm, &(page)->flags)
+#define __SetPageEasyRclm(page) __set_bit(PG_easyrclm, &(page)->flags)
+#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
+
+
#define PageBuddy(page) test_bit(PG_buddy, &(page)->flags)
#define __SetPageBuddy(page) __set_bit(PG_buddy, &(page)->flags)
#define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags)
@@ -255,12 +303,6 @@
#define SetPageReadahead(page) set_bit(PG_readahead, &(page)->flags)
#define TestClearPageReadahead(page) test_and_clear_bit(PG_readahead, &(page)->flags)
-#define PageEasyRclm(page) test_bit(PG_easyrclm, &(page)->flags)
-#define SetPageEasyRclm(page) set_bit(PG_easyrclm, &(page)->flags)
-#define ClearPageEasyRclm(page) clear_bit(PG_easyrclm, &(page)->flags)
-#define __SetPageEasyRclm(page) __set_bit(PG_easyrclm, &(page)->flags)
-#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
-
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/init/Kconfig linux-2.6.18-rc5-mm1-007_kernrclm/init/Kconfig
--- linux-2.6.18-rc5-mm1-006_movefree/init/Kconfig 2006-09-04 18:41:13.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/init/Kconfig 2006-09-04 18:45:50.000000000 +0100
@@ -491,6 +491,7 @@ config PAGEALLOC_ANTIFRAG
you are interested in working with large pages, say Y and set
/proc/sys/vm/min_free_kbytes to be 10% of physical memory. Otherwise
say N
+ depends on !SOFTWARE_SUSPEND
menu "Loadable module support"
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/lib/radix-tree.c linux-2.6.18-rc5-mm1-007_kernrclm/lib/radix-tree.c
--- linux-2.6.18-rc5-mm1-006_movefree/lib/radix-tree.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/lib/radix-tree.c 2006-09-04 18:45:50.000000000 +0100
@@ -93,7 +93,8 @@ radix_tree_node_alloc(struct radix_tree_
struct radix_tree_node *ret;
gfp_t gfp_mask = root_gfp_mask(root);
- ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+ ret = kmem_cache_alloc(radix_tree_node_cachep,
+ set_rclmflags(gfp_mask, __GFP_KERNRCLM));
if (ret == NULL && !(gfp_mask & __GFP_WAIT)) {
struct radix_tree_preload *rtp;
@@ -137,7 +138,8 @@ int radix_tree_preload(gfp_t gfp_mask)
rtp = &__get_cpu_var(radix_tree_preloads);
while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
preempt_enable();
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+ node = kmem_cache_alloc(radix_tree_node_cachep,
+ set_rclmflags(gfp_mask, __GFP_KERNRCLM));
if (node == NULL)
goto out;
preempt_disable();
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/mm/page_alloc.c linux-2.6.18-rc5-mm1-007_kernrclm/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-006_movefree/mm/page_alloc.c 2006-09-04 18:44:14.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/mm/page_alloc.c 2006-09-04 18:45:50.000000000 +0100
@@ -136,12 +136,16 @@ static unsigned long __initdata dma_rese
#ifdef CONFIG_PAGEALLOC_ANTIFRAG
static inline int get_pageblock_type(struct page *page)
{
- return (PageEasyRclm(page) != 0);
+ return ((PageEasyRclm(page) != 0) << 1) | (PageKernRclm(page) != 0);
}
static inline int gfpflags_to_rclmtype(unsigned long gfp_flags)
{
- return ((gfp_flags & __GFP_EASYRCLM) != 0);
+ gfp_t badflags = (__GFP_EASYRCLM | __GFP_KERNRCLM);
+ WARN_ON((gfp_flags & badflags) == badflags);
+
+ return (((gfp_flags & __GFP_EASYRCLM) != 0) << 1) |
+ ((gfp_flags & __GFP_KERNRCLM) != 0);
}
#else
static inline int get_pageblock_type(struct page *page)
@@ -431,6 +435,7 @@ static inline void __free_one_page(struc
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
__SetPageEasyRclm(page);
+ __ClearPageKernRclm(page);
VM_BUG_ON(page_idx & (order_size - 1));
VM_BUG_ON(bad_range(zone, page));
@@ -707,6 +712,12 @@ int move_freepages_block(struct zone *zo
return move_freepages(zone, start_page, end_page, rclmtype);
}
+static int fallbacks[RCLM_TYPES][RCLM_TYPES] = {
+ { RCLM_NORCLM, RCLM_KERN, RCLM_EASY }, /* RCLM_NORCLM Fallback */
+ { RCLM_KERN, RCLM_NORCLM, RCLM_EASY }, /* RCLM_KERN Fallback */
+ { RCLM_EASY, RCLM_KERN, RCLM_NORCLM} /* RCLM_EASY Fallback */
+};
+
/* Remove an element from the buddy allocator from the fallback list */
static struct page *__rmqueue_fallback(struct zone *zone, int order,
gfp_t gfp_flags)
@@ -715,30 +726,36 @@ static struct page *__rmqueue_fallback(s
int current_order;
struct page *page;
int start_rclmtype = gfpflags_to_rclmtype(gfp_flags);
- int rclmtype = !start_rclmtype;
+ int rclmtype, i;
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
- area = &(zone->free_area[current_order]);
- if (list_empty(&area->free_list[rclmtype]))
- continue;
+ for (i = 0; i < RCLM_TYPES; i++) {
+ rclmtype = fallbacks[start_rclmtype][i];
- page = list_entry(area->free_list[rclmtype].next,
- struct page, lru);
- area->nr_free--;
+ area = &(zone->free_area[current_order]);
+ if (list_empty(&area->free_list[rclmtype]))
+ continue;
- /* Remove the page from the freelists */
- list_del(&page->lru);
- rmv_page_order(page);
- zone->free_pages -= 1UL << order;
- expand(zone, page, order, current_order, area, rclmtype);
+ page = list_entry(area->free_list[rclmtype].next,
+ struct page, lru);
+ area->nr_free--;
- /* Move free pages between lists if stealing a large block */
- if (current_order > MAX_ORDER / 2)
- move_freepages_block(zone, page, start_rclmtype);
+ /* Remove the page from the freelists */
+ list_del(&page->lru);
+ rmv_page_order(page);
+ zone->free_pages -= 1UL << order;
+ expand(zone, page, order, current_order, area,
+ start_rclmtype);
+
+ /* Move free pages between lists for large blocks */
+ if (current_order >= MAX_ORDER / 2)
+ move_freepages_block(zone, page,
+ start_rclmtype);
- return page;
+ return page;
+ }
}
return NULL;
@@ -794,9 +811,12 @@ static struct page *__rmqueue(struct zon
page = __rmqueue_fallback(zone, order, gfp_flags);
got_page:
- if (unlikely(rclmtype == RCLM_NORCLM) && page)
+ if (unlikely(rclmtype != RCLM_EASY) && page)
__ClearPageEasyRclm(page);
+ if (rclmtype == RCLM_KERN && page)
+ SetPageKernRclm(page);
+
return page;
}
@@ -891,7 +911,7 @@ static void __drain_pages(unsigned int c
}
#endif /* CONFIG_DRAIN_PERCPU_PAGES */
-#ifdef CONFIG_PM
+#ifdef CONFIG_SOFTWARE_SUSPEND
void mark_free_pages(struct zone *zone)
{
unsigned long pfn, max_zone_pfn;
@@ -2052,7 +2072,7 @@ inline void setup_pageset(struct per_cpu
pcp->counts[rclmtype] = 0;
INIT_LIST_HEAD(&pcp->list[rclmtype]);
}
- pcp->high = 6 * batch;
+ pcp->high = 3 * batch;
pcp->batch = max(1UL, 1 * batch);
INIT_LIST_HEAD(&pcp->list[RCLM_EASY]);
@@ -2061,7 +2081,7 @@ inline void setup_pageset(struct per_cpu
pcp->counts[rclmtype] = 0;
INIT_LIST_HEAD(&pcp->list[rclmtype]);
}
- pcp->high = 2 * batch;
+ pcp->high = batch;
pcp->batch = max(1UL, batch/2);
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/mm/shmem.c linux-2.6.18-rc5-mm1-007_kernrclm/mm/shmem.c
--- linux-2.6.18-rc5-mm1-006_movefree/mm/shmem.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/mm/shmem.c 2006-09-04 18:45:50.000000000 +0100
@@ -91,7 +91,8 @@ static inline struct page *shmem_dir_all
* BLOCKS_PER_PAGE on indirect pages, assume PAGE_CACHE_SIZE:
* might be reconsidered if it ever diverges from PAGE_SIZE.
*/
- return alloc_pages(gfp_mask, PAGE_CACHE_SHIFT-PAGE_SHIFT);
+ return alloc_pages(set_rclmflags(gfp_mask, __GFP_KERNRCLM),
+ PAGE_CACHE_SHIFT-PAGE_SHIFT);
}
static inline void shmem_dir_free(struct page *page)
@@ -968,7 +969,8 @@ shmem_alloc_page(gfp_t gfp, struct shmem
pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx);
pvma.vm_pgoff = idx;
pvma.vm_end = PAGE_SIZE;
- page = alloc_page_vma(gfp | __GFP_ZERO, &pvma, 0);
+ page = alloc_page_vma(set_rclmflags(gfp | __GFP_ZERO, __GFP_KERNRCLM),
+ &pvma, 0);
mpol_free(pvma.vm_policy);
return page;
}
@@ -988,7 +990,7 @@ shmem_swapin(struct shmem_inode_info *in
static inline struct page *
shmem_alloc_page(gfp_t gfp,struct shmem_inode_info *info, unsigned long idx)
{
- return alloc_page(gfp | __GFP_ZERO);
+ return alloc_page(set_rclmflags(gfp | __GFP_ZERO, __GFP_KERNRCLM));
}
#endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-006_movefree/net/core/skbuff.c linux-2.6.18-rc5-mm1-007_kernrclm/net/core/skbuff.c
--- linux-2.6.18-rc5-mm1-006_movefree/net/core/skbuff.c 2006-09-04 18:34:33.000000000 +0100
+++ linux-2.6.18-rc5-mm1-007_kernrclm/net/core/skbuff.c 2006-09-04 18:45:50.000000000 +0100
@@ -148,6 +148,7 @@ struct sk_buff *__alloc_skb(unsigned int
u8 *data;
cache = fclone ? skbuff_fclone_cache : skbuff_head_cache;
+ gfp_mask |= __GFP_KERNRCLM;
/* Get the HEAD */
skb = kmem_cache_alloc(cache, gfp_mask & ~__GFP_DMA);
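As a purely illustrative aside, the reclaim-type encoding this patch introduces
can be restated in a few lines of user-space C. The flag values and helper names
below are copied from the gfp.h and mmzone.h hunks above; the GFP_KERNEL
stand-in value and the main() harness are made up for the example and are not
part of the patch.
/*
 * Sketch of the reclaim-type encoding from this patch.  The constants
 * mirror include/linux/gfp.h and include/linux/mmzone.h above; the
 * rest is scaffolding for a user-space demonstration.
 */
#include <assert.h>
#include <stdio.h>

typedef unsigned int gfp_t;

#define __GFP_KERNRCLM   ((gfp_t)0x80000u)    /* Kernel reclaimable page */
#define __GFP_EASYRCLM   ((gfp_t)0x100000u)   /* Easily reclaimed page   */
#define GFP_RECLAIM_MASK (__GFP_KERNRCLM | __GFP_EASYRCLM)

enum { RCLM_NORCLM = 0, RCLM_KERN = 1, RCLM_EASY = 2, RCLM_TYPES = 3 };

/* Replace any existing reclaim hint with the requested one */
static gfp_t set_rclmflags(gfp_t gfp, gfp_t reclaim_flags)
{
        return (gfp & ~GFP_RECLAIM_MASK) | reclaim_flags;
}

/* Map the gfp hint onto one of the free-list types */
static int gfpflags_to_rclmtype(gfp_t gfp_flags)
{
        return (((gfp_flags & __GFP_EASYRCLM) != 0) << 1) |
               ((gfp_flags & __GFP_KERNRCLM) != 0);
}

int main(void)
{
        gfp_t gfp = 0x10u;      /* arbitrary stand-in for GFP_KERNEL */

        assert(gfpflags_to_rclmtype(gfp) == RCLM_NORCLM);
        assert(gfpflags_to_rclmtype(set_rclmflags(gfp, __GFP_KERNRCLM)) == RCLM_KERN);
        assert(gfpflags_to_rclmtype(set_rclmflags(gfp, __GFP_EASYRCLM)) == RCLM_EASY);
        printf("norclm=%d kern=%d easy=%d\n", RCLM_NORCLM, RCLM_KERN, RCLM_EASY);
        return 0;
}
In short, the absence of both bits means RCLM_NORCLM, __GFP_KERNRCLM maps to
RCLM_KERN and __GFP_EASYRCLM maps to RCLM_EASY, matching the free-list indices
in mmzone.h.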
* [PATCH 8/8] [DEBUG] Add statistics
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
` (6 preceding siblings ...)
2006-09-07 19:06 ` [PATCH 7/8] Introduce the RCLM_KERN allocation type Mel Gorman
@ 2006-09-07 19:06 ` Mel Gorman
2006-09-08 0:58 ` [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Andrew Morton
8 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-07 19:06 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Mel Gorman
This patch is strictly debug only. With static markers from SystemTap (what is
the current story with these?) or any other type of static marking of probe
points, this could be replaced by a relatively trivial script. Until such
static probes exist, this patch outputs some information to /proc/buddyinfo
that may help explain what went wrong if the anti-fragmentation strategy fails.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
page_alloc.c | 20 ++++++++++++++++++++
vmstat.c | 16 ++++++++++++++++
2 files changed, 36 insertions(+)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-007_kernrclm/mm/page_alloc.c linux-2.6.18-rc5-mm1-009_stats/mm/page_alloc.c
--- linux-2.6.18-rc5-mm1-007_kernrclm/mm/page_alloc.c 2006-09-04 18:45:50.000000000 +0100
+++ linux-2.6.18-rc5-mm1-009_stats/mm/page_alloc.c 2006-09-04 18:47:33.000000000 +0100
@@ -56,6 +56,10 @@ unsigned long totalram_pages __read_most
unsigned long totalreserve_pages __read_mostly;
long nr_swap_pages;
int percpu_pagelist_fraction;
+int split_count[RCLM_TYPES];
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
+int fallback_counts[RCLM_TYPES];
+#endif
static void __free_pages_ok(struct page *page, unsigned int order);
@@ -742,6 +746,12 @@ static struct page *__rmqueue_fallback(s
struct page, lru);
area->nr_free--;
+ /* Account for a MAX_ORDER block being split */
+ if (current_order == MAX_ORDER - 1 &&
+ order < MAX_ORDER - 1) {
+ split_count[start_rclmtype]++;
+ }
+
/* Remove the page from the freelists */
list_del(&page->lru);
rmv_page_order(page);
@@ -754,6 +764,12 @@ static struct page *__rmqueue_fallback(s
move_freepages_block(zone, page,
start_rclmtype);
+ /* Account for fallbacks */
+ if (order < MAX_ORDER - 1 &&
+ current_order != MAX_ORDER - 1) {
+ fallback_counts[start_rclmtype]++;
+ }
+
return page;
}
}
@@ -804,6 +820,10 @@ static struct page *__rmqueue(struct zon
rmv_page_order(page);
area->nr_free--;
zone->free_pages -= 1UL << order;
+
+ if (current_order == MAX_ORDER - 1 && order < MAX_ORDER - 1)
+ split_count[rclmtype]++;
+
expand(zone, page, order, current_order, area, rclmtype);
goto got_page;
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc5-mm1-007_kernrclm/mm/vmstat.c linux-2.6.18-rc5-mm1-009_stats/mm/vmstat.c
--- linux-2.6.18-rc5-mm1-007_kernrclm/mm/vmstat.c 2006-09-04 18:39:39.000000000 +0100
+++ linux-2.6.18-rc5-mm1-009_stats/mm/vmstat.c 2006-09-04 18:47:33.000000000 +0100
@@ -13,6 +13,11 @@
#include <linux/module.h>
#include <linux/cpu.h>
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
+extern int split_count[RCLM_TYPES];
+extern int fallback_counts[RCLM_TYPES];
+#endif
+
void __get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free, struct pglist_data *pgdat)
{
@@ -427,6 +432,17 @@ static int frag_show(struct seq_file *m,
spin_unlock_irqrestore(&zone->lock, flags);
seq_putc(m, '\n');
}
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
+ seq_printf(m, "Fallback counts\n");
+ seq_printf(m, "KernNoRclm: %8d\n", fallback_counts[RCLM_NORCLM]);
+ seq_printf(m, "KernRclm: %8d\n", fallback_counts[RCLM_KERN]);
+ seq_printf(m, "EasyRclm: %8d\n", fallback_counts[RCLM_EASY]);
+
+ seq_printf(m, "\nSplit counts\n");
+ seq_printf(m, "KernNoRclm: %8d\n", split_count[RCLM_NORCLM]);
+ seq_printf(m, "KernRclm: %8d\n", split_count[RCLM_KERN]);
+ seq_printf(m, "EasyRclm: %8d\n", split_count[RCLM_EASY]);
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
return 0;
}
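For anyone who wants to watch the new counters, a minimal user-space reader
could look like the sketch below. It is not part of the patch; it only assumes
that the "Fallback counts" label matches the seq_printf() string in the hunk
above and simply echoes everything from that header onwards.
/*
 * Dump the fallback/split counters appended to /proc/buddyinfo by the
 * debug patch above.  Illustrative only; prints nothing on a kernel
 * built without CONFIG_PAGEALLOC_ANTIFRAG.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        FILE *fp = fopen("/proc/buddyinfo", "r");
        char line[256];
        int dumping = 0;

        if (!fp) {
                perror("/proc/buddyinfo");
                return 1;
        }
        while (fgets(line, sizeof(line), fp)) {
                if (strncmp(line, "Fallback counts", 15) == 0)
                        dumping = 1;
                if (dumping)
                        fputs(line, stdout);
        }
        fclose(fp);
        return 0;
}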
* Re: [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
` (7 preceding siblings ...)
2006-09-07 19:06 ` [PATCH 8/8] [DEBUG] Add statistics Mel Gorman
@ 2006-09-08 0:58 ` Andrew Morton
2006-09-08 8:30 ` Peter Zijlstra
2006-09-08 8:36 ` Mel Gorman
8 siblings, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2006-09-08 0:58 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, linux-kernel
On Thu, 7 Sep 2006 20:03:42 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> When a page is allocated, the page-flags
> are updated with a value indicating it's type of reclaimability so that it
> is placed on the correct list on free.
We're getting awful tight on page-flags.
Would it be possible to avoid adding the flag? Say, have a per-zone bitmap
of size (zone->present_pages/(1<<MAX_ORDER)) bits, then do a lookup in
there to work out whether a particular page is within a MAX_ORDER clump of
easy-reclaimable pages?
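For illustration only, the sort of lookup being suggested might look like the
user-space sketch below: one bit per MAX_ORDER_NR_PAGES block, indexed by the
page frame number's offset into the zone. Nothing like this exists in the
series as posted, and every structure and helper name here is invented for the
example.
/*
 * Hypothetical per-zone bitmap with one bit per MAX_ORDER_NR_PAGES
 * block, as suggested above.  User-space model only; names invented.
 */
#include <limits.h>
#include <stdlib.h>

#define MAX_ORDER      11
#define BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)

struct zone_model {
        unsigned long zone_start_pfn;
        unsigned long present_pages;
        unsigned long *easyrclm_bitmap;   /* 1 bit per MAX_ORDER block */
};

static unsigned long block_index(struct zone_model *z, unsigned long pfn)
{
        return (pfn - z->zone_start_pfn) >> (MAX_ORDER - 1);
}

static void set_block_easyrclm(struct zone_model *z, unsigned long pfn)
{
        unsigned long idx = block_index(z, pfn);

        z->easyrclm_bitmap[idx / BITS_PER_LONG] |= 1UL << (idx % BITS_PER_LONG);
}

static int block_is_easyrclm(struct zone_model *z, unsigned long pfn)
{
        unsigned long idx = block_index(z, pfn);

        return (z->easyrclm_bitmap[idx / BITS_PER_LONG] >>
                (idx % BITS_PER_LONG)) & 1;
}

int main(void)
{
        struct zone_model z = { .zone_start_pfn = 0, .present_pages = 1UL << 20 };
        unsigned long nblocks = z.present_pages >> (MAX_ORDER - 1);

        z.easyrclm_bitmap = calloc((nblocks + BITS_PER_LONG - 1) / BITS_PER_LONG,
                                   sizeof(unsigned long));
        if (!z.easyrclm_bitmap)
                return 1;
        set_block_easyrclm(&z, 4096);
        return block_is_easyrclm(&z, 4096) ? 0 : 1;
}
The obvious costs are the extra cache line touched per lookup and resizing the
bitmap on memory hot-add, which is exactly what the follow-ups below discuss.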
* Re: [PATCH 2/8] Split the free lists into kernel and user parts
2006-09-07 19:04 ` [PATCH 2/8] Split the free lists into kernel and user parts Mel Gorman
@ 2006-09-08 7:54 ` Peter Zijlstra
2006-09-08 9:20 ` Mel Gorman
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2006-09-08 7:54 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, linux-kernel
Hi Mel,
Looking good, some small nits follow.
On Thu, 2006-09-07 at 20:04 +0100, Mel Gorman wrote:
> +#define for_each_rclmtype_order(type, order) \
> + for (order = 0; order < MAX_ORDER; order++) \
> + for (type = 0; type < RCLM_TYPES; type++)
It seems odd to me that you have the for loops in reverse order of the
arguments.
> +static inline int get_pageblock_type(struct page *page)
> +{
> + return (PageEasyRclm(page) != 0);
> +}
I find the naming a little odd, I would have suspected something like:
get_page_blocktype() or thereabout since you're getting a page
attribute.
> +static inline int gfpflags_to_rclmtype(unsigned long gfp_flags)
> +{
> + return ((gfp_flags & __GFP_EASYRCLM) != 0);
> +}
gfp_t argument?
* Re: [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
2006-09-08 0:58 ` [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Andrew Morton
@ 2006-09-08 8:30 ` Peter Zijlstra
2006-09-08 9:24 ` Mel Gorman
2006-09-08 8:36 ` Mel Gorman
1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2006-09-08 8:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mel Gorman, linux-mm, linux-kernel
On Thu, 2006-09-07 at 17:58 -0700, Andrew Morton wrote:
> On Thu, 7 Sep 2006 20:03:42 +0100 (IST)
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > When a page is allocated, the page-flags
> > are updated with a value indicating it's type of reclaimability so that it
> > is placed on the correct list on free.
>
> We're getting awful tight on page-flags.
>
> Would it be possible to avoid adding the flag? Say, have a per-zone bitmap
> of size (zone->present_pages/(1<<MAX_ORDER)) bits, then do a lookup in
> there to work out whether a particular page is within a MAX_ORDER clump of
> easy-reclaimable pages?
That would not actually work; the fallback allocation path can move
blocks smaller than MAX_ORDER to another reclaim type.
But yeah, page flags are getting tight; perhaps Rafael can use his
recently introduced bitmaps to rid us of the swsusp flags?
* Re: [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
2006-09-08 0:58 ` [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Andrew Morton
2006-09-08 8:30 ` Peter Zijlstra
@ 2006-09-08 8:36 ` Mel Gorman
2006-09-08 13:06 ` Peter Zijlstra
1 sibling, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2006-09-08 8:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Thu, 7 Sep 2006, Andrew Morton wrote:
> On Thu, 7 Sep 2006 20:03:42 +0100 (IST)
> Mel Gorman <mel@csn.ul.ie> wrote:
>
>> When a page is allocated, the page-flags
>> are updated with a value indicating it's type of reclaimability so that it
>> is placed on the correct list on free.
>
> We're getting awful tight on page-flags.
>
Yeah, I know :(
> Would it be possible to avoid adding the flag? Say, have a per-zone bitmap
> of size (zone->present_pages/(1<<MAX_ORDER)) bits, then do a lookup in
> there to work out whether a particular page is within a MAX_ORDER clump of
> easy-reclaimable pages?
>
An early version of the patches created such a bitmap and it was heavily
resisted for two reasons. It put more pressure on the cache and it needed
to be resized during hot-add and hot-remove. It was the latter issue
people had more problems with. However, I can reimplement it if people
want to take a look. As I see it currently, there are five choices that
could be taken to avoid using an additional pageflag
1. Re-use existing page flags. This is what I currently do in a later
patch for the software suspend flags
pros: Straight-forward implementation, appears to use no additional flags
cons: When swsusp stops using the flags, anti-frag takes them right back
Makes anti-frag mutually exclusive with swsusp
2. Create a per-zone bitmap for every MAX_ORDER block
pros: Straight-forward implementation initially
cons: Needs resizing during hotadd which could get complicated
Bit more cache pressure
3. Use the low two bits of page->lru
pros: Uses existing struct page field
cons: It's a bit funky looking
4. Use the page->flags of the struct page backing the pages used
for the memmap.
pros: Similar to the bitmap idea except with less hotadd problems
cons: Bit more cache pressure
5. Add an additional field, page->hintsflags, for non-critical flags.
There are patches out there, like guest page hinting, that want to
consume flags but not for any vital purpose, and usually on machines
that have ample amounts of memory. Those features could share such
a field.
pros: Straight-forward to implement
cons: Increases struct page size for some kernel features.
I am leaning towards option 3 because it uses no additional memory but I'm
not sure how people feel about using pointer magic like this.
Any opinions?
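As a rough idea of what that pointer magic could look like, here is a
user-space sketch of option 3 in the spirit of PAGE_MAPPING_ANON: the low two
bits of lru.next carry the reclaim type and every reader of the pointer must
mask them off. The names and harness are invented for the example; this is not
code from the series.
/*
 * Hypothetical option 3: stash the reclaim type in the low two bits
 * of page->lru.next, relying on list_head alignment.  User-space
 * model only; names invented for the example.
 */
#include <assert.h>
#include <stdint.h>

#define RCLM_MASK ((uintptr_t)0x3)    /* 0=norclm, 1=kern, 2=easy */

struct list_head_model {
        struct list_head_model *next, *prev;
};

struct page_model {
        struct list_head_model lru;
};

static void set_rclmtype(struct page_model *page, unsigned long type)
{
        uintptr_t p = (uintptr_t)page->lru.next;

        page->lru.next = (struct list_head_model *)((p & ~RCLM_MASK) | type);
}

static unsigned long get_rclmtype(struct page_model *page)
{
        return (uintptr_t)page->lru.next & RCLM_MASK;
}

static struct list_head_model *lru_next(struct page_model *page)
{
        /* every reader of the pointer has to strip the tag bits */
        return (struct list_head_model *)
                ((uintptr_t)page->lru.next & ~RCLM_MASK);
}

int main(void)
{
        static struct page_model pages[2];

        pages[0].lru.next = &pages[1].lru;
        set_rclmtype(&pages[0], 2);              /* mark as easy-reclaim */
        assert(get_rclmtype(&pages[0]) == 2);
        assert(lru_next(&pages[0]) == &pages[1].lru);
        return 0;
}
The funky part is exactly the masking in lru_next(): every list walker would
need it, which is the readability concern noted in option 3 above.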
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/8] Split the free lists into kernel and user parts
2006-09-08 7:54 ` Peter Zijlstra
@ 2006-09-08 9:20 ` Mel Gorman
0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-08 9:20 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm, linux-kernel
On Fri, 8 Sep 2006, Peter Zijlstra wrote:
> Hi Mel,
>
> Looking good, some small nits follow.
>
> On Thu, 2006-09-07 at 20:04 +0100, Mel Gorman wrote:
>
>> +#define for_each_rclmtype_order(type, order) \
>> + for (order = 0; order < MAX_ORDER; order++) \
>> + for (type = 0; type < RCLM_TYPES; type++)
>
> It seems odd to me that you have the for loops in reverse order of the
> arguments.
>
I'll fix that.
>> +static inline int get_pageblock_type(struct page *page)
>> +{
>> + return (PageEasyRclm(page) != 0);
>> +}
>
> I find the naming a little odd, I would have suspected something like:
> get_page_blocktype() or thereabout since you're getting a page
> attribute.
>
This is a throwback from an early version when I used a bitmap that used
one bit per MAX_ORDER_NR_PAGES block of pages. Many pages in a block
shared one bit - hence get_pageblock_type(). The name is now stupid. I'll
fix it.
>> +static inline int gfpflags_to_rclmtype(unsigned long gfp_flags)
>> +{
>> + return ((gfp_flags & __GFP_EASYRCLM) != 0);
>> +}
>
> gfp_t argument?
>
doh, yes, it should be gfp_t
Thanks
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
2006-09-08 8:30 ` Peter Zijlstra
@ 2006-09-08 9:24 ` Mel Gorman
0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-08 9:24 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Andrew Morton, linux-mm, linux-kernel
On Fri, 8 Sep 2006, Peter Zijlstra wrote:
> On Thu, 2006-09-07 at 17:58 -0700, Andrew Morton wrote:
>> On Thu, 7 Sep 2006 20:03:42 +0100 (IST)
>> Mel Gorman <mel@csn.ul.ie> wrote:
>>
>>> When a page is allocated, the page-flags
>>> are updated with a value indicating it's type of reclaimability so that it
>>> is placed on the correct list on free.
>>
>> We're getting awful tight on page-flags.
>>
>> Would it be possible to avoid adding the flag? Say, have a per-zone bitmap
>> of size (zone->present_pages/(1<<MAX_ORDER)) bits, then do a lookup in
>> there to work out whether a particular page is within a MAX_ORDER clump of
>> easy-reclaimable pages?
>
> That would not actually work, the fallback allocation path can move
> blocks smaller than MAX_ORDER to another recaim type.
>
Believe it or not, it may be desirable to have a whole block represented
by one or two bits. If a fallback allocation occurs and I move blocks
between lists, I want pages that are freed later to go to the new list as
well. Currently that doesn't happen because the flags are set per-page, but
it used to happen in an early version of anti-frag.
> But yeah, page flags are getting right, perhaps Rafael can use his
> recently introduced bitmaps to rid us of the swsusp flags?
>
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
2006-09-08 8:36 ` Mel Gorman
@ 2006-09-08 13:06 ` Peter Zijlstra
2006-09-08 13:16 ` Mel Gorman
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2006-09-08 13:06 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel
On Fri, 2006-09-08 at 09:36 +0100, Mel Gorman wrote:
> On Thu, 7 Sep 2006, Andrew Morton wrote:
>
> > On Thu, 7 Sep 2006 20:03:42 +0100 (IST)
> > Mel Gorman <mel@csn.ul.ie> wrote:
> >
> >> When a page is allocated, the page-flags
> >> are updated with a value indicating it's type of reclaimability so that it
> >> is placed on the correct list on free.
> >
> > We're getting awful tight on page-flags.
> >
>
> Yeah, I know :(
>
> > Would it be possible to avoid adding the flag? Say, have a per-zone bitmap
> > of size (zone->present_pages/(1<<MAX_ORDER)) bits, then do a lookup in
> > there to work out whether a particular page is within a MAX_ORDER clump of
> > easy-reclaimable pages?
> >
>
> An early version of the patches created such a bitmap and it was heavily
> resisted for two reasons. It put more pressure on the cache and it needed
> to be resized during hot-add and hot-remove. It was the latter issue
> people had more problems with. However, I can reimplement it if people
> want to take a look. As I see it currently, there are five choices that
> could be taken to avoid using an additional pageflag
>
> 1. Re-use existing page flags. This is what I currently do in a later
> patch for the software suspend flags
> pros: Straight-forward implementation, appears to use no additional flags
> cons: When swsusp stops using the flags, anti-frag takes them right back
> Makes anti-frag mutually exclusive with swsusp
>
> 2. Create a per-zone bitmap for every MAX_ORDER block
> pros: Straight-forward implementation initially
> cons: Needs resizing during hotadd which could get complicated
> Bit more cache pressure
>
> 3. Use the low two bits of page->lru
> pros: Uses existing struct page field
> cons: It's a bit funky looking
>
> 4. Use the page->flags of the struct page backing the pages used
> for the memmap.
> pros: Similar to the bitmap idea except with less hotadd problems
> cons: Bit more cache pressure
>
> 5. Add an additional field page->hintsflags used for non-critical flags.
> There are patches out there like guest page hinting that want to
> consume flags but not for any vital purpose and usually for machines
> that have ample amounts of memory. For these features, add an
> additional page->hintsflags
> pros: Straight-forward to implement
> cons: Increses struct page size for some kernel features.
>
> I am leaning towards option 3 because it uses no additional memory but I'm
> not sure how people feel about using pointer magic like this.
>
> Any opinions?
If, as you stated in a previous mail, you'd like to have flags per
MAX_ORDER block, you'd already have to suffer the extra cache pressure.
In that case I vote for 4.
Otherwise 3 sounds doable; we already hide PAGE_MAPPING_ANON in a
pointer, so hiding flags is not new to struct page. It's just a question
of how good the implementation will look; I hope you won't have to
visit all the list ops.
* Re: [PATCH 0/8] Avoiding fragmentation with subzone groupings v25
2006-09-08 13:06 ` Peter Zijlstra
@ 2006-09-08 13:16 ` Mel Gorman
0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2006-09-08 13:16 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Andrew Morton, linux-mm, linux-kernel
On Fri, 8 Sep 2006, Peter Zijlstra wrote:
> On Fri, 2006-09-08 at 09:36 +0100, Mel Gorman wrote:
>> On Thu, 7 Sep 2006, Andrew Morton wrote:
>>
>>> On Thu, 7 Sep 2006 20:03:42 +0100 (IST)
>>> Mel Gorman <mel@csn.ul.ie> wrote:
>>>
>>>> When a page is allocated, the page-flags
>>>> are updated with a value indicating it's type of reclaimability so that it
>>>> is placed on the correct list on free.
>>>
>>> We're getting awful tight on page-flags.
>>>
>>
>> Yeah, I know :(
>>
>>> Would it be possible to avoid adding the flag? Say, have a per-zone bitmap
>>> of size (zone->present_pages/(1<<MAX_ORDER)) bits, then do a lookup in
>>> there to work out whether a particular page is within a MAX_ORDER clump of
>>> easy-reclaimable pages?
>>>
>>
>> An early version of the patches created such a bitmap and it was heavily
>> resisted for two reasons. It put more pressure on the cache and it needed
>> to be resized during hot-add and hot-remove. It was the latter issue
>> people had more problems with. However, I can reimplement it if people
>> want to take a look. As I see it currently, there are five choices that
>> could be taken to avoid using an additional pageflag
>>
>> 1. Re-use existing page flags. This is what I currently do in a later
>> patch for the software suspend flags
>> pros: Straight-forward implementation, appears to use no additional flags
>> cons: When swsusp stops using the flags, anti-frag takes them right back
>> Makes anti-frag mutually exclusive with swsusp
>>
>> 2. Create a per-zone bitmap for every MAX_ORDER block
>> pros: Straight-forward implementation initially
>> cons: Needs resizing during hotadd which could get complicated
>> Bit more cache pressure
>>
>> 3. Use the low two bits of page->lru
>> pros: Uses existing struct page field
>> cons: It's a bit funky looking
>>
>> 4. Use the page->flags of the struct page backing the pages used
>> for the memmap.
>> pros: Similar to the bitmap idea except with less hotadd problems
>> cons: Bit more cache pressure
>>
>> 5. Add an additional field page->hintsflags used for non-critical flags.
>> There are patches out there like guest page hinting that want to
>> consume flags but not for any vital purpose and usually for machines
>> that have ample amounts of memory. For these features, add an
>> additional page->hintsflags
>> pros: Straight-forward to implement
>> cons: Increses struct page size for some kernel features.
>>
>> I am leaning towards option 3 because it uses no additional memory but I'm
>> not sure how people feel about using pointer magic like this.
>>
>> Any opinions?
>
> If, as you stated in a previous mail, you'd like to have flags per
> MAX_ORDER block, you'd already have to suffer the extra cache pressure.
> In that case I vote for 4.
>
Originally, I wanted flags per MAX_ORDER block but I no longer have data
on whether this is a good idea or not. It could turn out that we steal
back and forth a lot when pageblock flags are used.
> Otherwise 3 sounds doable, we already hide PAGE_MAPPING_ANON in a
> pointer, so hiding flags is not new to struct page. It's just a question
> of how good the implementation will look, I hope you'll not have to
> visit all the list ops.
>
One way to find out for sure! I reckon I'll go off and implement options 3
and 4 as add-on patches that avoid the use of page->flags and see what
they look like. As you said, pointer magic in struct page is not new.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
Thread overview: 17+ messages
2006-09-07 19:03 [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Mel Gorman
2006-09-07 19:04 ` [PATCH 1/8] Add __GFP_EASYRCLM flag and update callers Mel Gorman
2006-09-07 19:04 ` [PATCH 2/8] Split the free lists into kernel and user parts Mel Gorman
2006-09-08 7:54 ` Peter Zijlstra
2006-09-08 9:20 ` Mel Gorman
2006-09-07 19:04 ` [PATCH 3/8] Split the per-cpu " Mel Gorman
2006-09-07 19:05 ` [PATCH 4/8] Add a configure option for anti-fragmentation Mel Gorman
2006-09-07 19:05 ` [PATCH 5/8] Drain per-cpu lists when high-order allocations fail Mel Gorman
2006-09-07 19:05 ` [PATCH 6/8] Move free pages between lists on steal Mel Gorman
2006-09-07 19:06 ` [PATCH 7/8] Introduce the RCLM_KERN allocation type Mel Gorman
2006-09-07 19:06 ` [PATCH 8/8] [DEBUG] Add statistics Mel Gorman
2006-09-08 0:58 ` [PATCH 0/8] Avoiding fragmentation with subzone groupings v25 Andrew Morton
2006-09-08 8:30 ` Peter Zijlstra
2006-09-08 9:24 ` Mel Gorman
2006-09-08 8:36 ` Mel Gorman
2006-09-08 13:06 ` Peter Zijlstra
2006-09-08 13:16 ` Mel Gorman