From: Christoph Lameter <clameter@sgi.com>
To: Andy Whitcroft <apw@shadowen.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>, Mel Gorman <mel@skynet.ie>
Cc: David Chinner <dgc@sgi.com>
Subject: [RFC 12/26] SLUB: Slab reclaim through Lumpy reclaim
Date: Fri, 31 Aug 2007 18:41:19 -0700
Message-ID: <20070901014222.073887169@sgi.com>
In-Reply-To: <20070901014107.719506437@sgi.com>
Creates two functions, kmem_cache_isolate_slab() and kmem_cache_reclaim(),
to support lumpy reclaim.
In order to isolate pages we have to handle slab page allocations in
such a way that we can determine whether a slab is valid whenever we access
it, regardless of where it is in its lifetime.
A valid slab that can be freed has PageSlab(page) set and page->inuse > 0.
So we need to make sure in new_slab() that page->inuse is zero before
PageSlab is set; otherwise kmem_cache_vacate() may operate on a slab that
has not been properly set up yet.
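In outline, the publication on the allocation side then pairs with the
check on the reclaim side as follows (a condensed sketch of the new_slab()
and kmem_cache_isolate_slab() hunks below, not the literal code):

	/* Allocation side (new_slab): make inuse visible before PG_slab. */
	page->inuse = 0;
	page->slab = s;
	smp_wmb();		/* order the stores above before PG_slab */
	__SetPageSlab(page);

	/* Reclaim side (kmem_cache_isolate_slab): once PageSlab() is seen,
	   page->inuse is valid; inuse == 0 means the slab is not in use. */
	if (!PageSlab(page) || SlabFrozen(page) || !page->inuse)
		goto out;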
kmem_cache_isolate_slab() is called from lumpy reclaim to isolate pages
neighboring a page cache page that is being reclaimed. Lumpy reclaim will
gather the slab pages and call kmem_cache_reclaim() on the list. This
means that we can remove a slab that is in the way of coalescing a
higher order page.
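In outline, the caller side in lumpy reclaim then becomes (a condensed
sketch of the vmscan.c hunks below, not the literal code):

	LIST_HEAD(slab_list);

	/* While scanning the pages neighboring the reclaim target ... */
	if (kmem_cache_isolate_slab(cursor_page) == 0)
		list_add(&cursor_page->lru, &slab_list);

	/* ... then, after the LRU lock has been dropped: */
	kmem_cache_reclaim(&slab_list);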
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slab.h |    2 +
 mm/slab.c            |   13 +++++++
 mm/slub.c            |   88 +++++++++++++++++++++++++++++++++++++++++++++++----
 mm/vmscan.c          |   15 ++++++--
 4 files changed, 109 insertions(+), 9 deletions(-)
Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h 2007-08-28 20:05:42.000000000 -0700
+++ linux-2.6/include/linux/slab.h 2007-08-28 20:06:22.000000000 -0700
@@ -62,6 +62,8 @@ unsigned int kmem_cache_size(struct kmem
const char *kmem_cache_name(struct kmem_cache *);
int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
int kmem_cache_defrag(int node);
+int kmem_cache_isolate_slab(struct page *);
+int kmem_cache_reclaim(struct list_head *);
/*
* Please use this macro to create slab caches. Simply specify the
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c 2007-08-28 20:04:54.000000000 -0700
+++ linux-2.6/mm/slab.c 2007-08-28 20:06:22.000000000 -0700
@@ -2532,6 +2532,19 @@ int kmem_cache_defrag(int node)
return 0;
}
+/*
+ * SLAB does not support slab defragmentation
+ */
+int kmem_cache_isolate_slab(struct page *page)
+{
+ return -ENOSYS;
+}
+
+int kmem_cache_reclaim(struct list_head *zaplist)
+{
+ return 0;
+}
+
/**
* kmem_cache_destroy - delete a cache
* @cachep: the cache to destroy
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-08-28 20:04:54.000000000 -0700
+++ linux-2.6/mm/slub.c 2007-08-28 20:10:37.000000000 -0700
@@ -1006,6 +1006,7 @@ static inline int slab_pad_check(struct
static inline int check_object(struct kmem_cache *s, struct page *page,
void *object, int active) { return 1; }
static inline void add_full(struct kmem_cache_node *n, struct page *page) {}
+static inline void remove_full(struct kmem_cache *s, struct page *page) {}
static inline void kmem_cache_open_debug_check(struct kmem_cache *s) {}
#define slub_debug 0
#endif
@@ -1068,11 +1069,9 @@ static struct page *new_slab(struct kmem
n = get_node(s, page_to_nid(page));
if (n)
atomic_long_inc(&n->nr_slabs);
+
+ page->inuse = 0;
page->slab = s;
- page->flags |= 1 << PG_slab;
- if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
- SLAB_STORE_USER | SLAB_TRACE))
- SetSlabDebug(page);
start = page_address(page);
end = start + s->objects * s->size;
@@ -1090,8 +1089,18 @@ static struct page *new_slab(struct kmem
set_freepointer(s, last, NULL);
page->freelist = start;
- page->inuse = 0;
-out:
+
+ /*
+ * page->inuse must be 0 when PageSlab(page) becomes
+ * true so that defrag knows that this slab is not in use.
+ */
+ smp_wmb();
+ __SetPageSlab(page);
+ if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
+ SLAB_STORE_USER | SLAB_TRACE))
+ SetSlabDebug(page);
+
+ out:
if (flags & __GFP_WAIT)
local_irq_disable();
return page;
@@ -2638,6 +2647,73 @@ static unsigned long count_partial(struc
return x;
}
+ /*
+ * Isolate a page from the slab partial lists. Return 0 if successful.
+ *
+ * After isolation the LRU field can be used to put the page onto
+ * a reclaim list.
+ */
+int kmem_cache_isolate_slab(struct page *page)
+{
+ unsigned long flags;
+ struct kmem_cache *s;
+ int rc = -ENOENT;
+
+ if (!PageSlab(page) || SlabFrozen(page))
+ return rc;
+
+ /*
+ * Get a reference to the page. Return if it is freed or being freed.
+ * This is necessary to make sure that the page does not vanish
+ * from under us before we are able to check the result.
+ */
+ if (!get_page_unless_zero(page))
+ return rc;
+
+ local_irq_save(flags);
+ slab_lock(page);
+
+ /*
+ * Check a variety of conditions to ensure that the page was not
+ * 1. Freed
+ * 2. Frozen
+ * 3. In the process of being freed (at least one object must remain)
+ */
+ if (!PageSlab(page) || SlabFrozen(page) || !page->inuse) {
+ slab_unlock(page);
+ put_page(page);
+ goto out;
+ }
+
+ /*
+ * Drop reference. There are objects remaining and therefore
+ * the slab lock will be taken before the last object can
+ * be removed. So we cannot be in the process of freeing the
+ * slab.
+ *
+ * We set the slab frozen before releasing the lock. This means
+ * that no free action will be performed. If it becomes empty
+ * then we will free it during kmem_cache_reclaim().
+ */
+ BUG_ON(page_count(page) <= 1);
+ put_page(page);
+
+ /*
+ * Remove the slab from the lists and mark it frozen
+ */
+ s = page->slab;
+ if (page->inuse < s->objects)
+ remove_partial(s, page);
+ else if (s->flags & SLAB_STORE_USER)
+ remove_full(s, page);
+ SetSlabFrozen(page);
+ slab_unlock(page);
+ rc = 0;
+out:
+ local_irq_restore(flags);
+ return rc;
+}
+
/*
* Vacate all objects in the given slab.
*
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2007-08-28 20:05:42.000000000 -0700
+++ linux-2.6/mm/vmscan.c 2007-08-28 20:06:22.000000000 -0700
@@ -657,6 +657,7 @@ static int __isolate_lru_page(struct pag
*/
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
+ struct list_head *slab_pages,
unsigned long *scanned, int order, int mode)
{
unsigned long nr_taken = 0;
@@ -730,7 +731,13 @@ static unsigned long isolate_lru_pages(u
case -EBUSY:
/* else it is being freed elsewhere */
list_move(&cursor_page->lru, src);
+ break;
+
default:
+ if (slab_pages &&
+ kmem_cache_isolate_slab(cursor_page) == 0)
+ list_add(&cursor_page->lru,
+ slab_pages);
break;
}
}
@@ -766,6 +773,7 @@ static unsigned long shrink_inactive_lis
struct zone *zone, struct scan_control *sc)
{
LIST_HEAD(page_list);
+ LIST_HEAD(slab_list);
struct pagevec pvec;
unsigned long nr_scanned = 0;
unsigned long nr_reclaimed = 0;
@@ -783,7 +791,7 @@ static unsigned long shrink_inactive_lis
nr_taken = isolate_lru_pages(sc->swap_cluster_max,
&zone->inactive_list,
- &page_list, &nr_scan, sc->order,
+ &page_list, &slab_list, &nr_scan, sc->order,
(sc->order > PAGE_ALLOC_COSTLY_ORDER)?
ISOLATE_BOTH : ISOLATE_INACTIVE);
nr_active = clear_active_flags(&page_list);
@@ -793,6 +801,7 @@ static unsigned long shrink_inactive_lis
-(nr_taken - nr_active));
zone->pages_scanned += nr_scan;
spin_unlock_irq(&zone->lru_lock);
+ kmem_cache_reclaim(&slab_list);
nr_scanned += nr_scan;
nr_freed = shrink_page_list(&page_list, sc);
@@ -934,8 +943,8 @@ force_reclaim_mapped:
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
- &l_hold, &pgscanned, sc->order, ISOLATE_ACTIVE);
+ pgmoved = isolate_lru_pages(nr_pages, &zone->active_list, &l_hold,
+ NULL, &pgscanned, sc->order, ISOLATE_ACTIVE);
zone->pages_scanned += pgscanned;
__mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
spin_unlock_irq(&zone->lru_lock);
--