Message-Id: <20171227220636.361857279@linux.com>
Date: Wed, 27 Dec 2017 16:06:36 -0600
From: Christoph Lameter
Subject: [RFC 0/8] Xarray object migration V1
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Pekka Enberg, akpm@linux-foundation.org, Mel Gorman, andi@firstfloor.org, Rik van Riel, Dave Chinner, Christoph Hellwig

This is a patchset on top of Matthew Wilcox's XArray code that implements
object migration for xarray nodes. The migration is integrated into the
defragmentation and shrinking logic of the slab allocator.

Defragmentation migrates the objects out of xarray slab pages in which no
more than the slab defrag ratio (a percentage) of the objects are in use,
so that those sparsely populated pages can be freed.

Slab shrinking will create a slab cache with optimal object density: only
one slab page per node will still have free objects.

To test, apply this patchset on top of Matthew Wilcox's XArray code from
Dec 11th (see the infradead GitHub repository). Then go to
/sys/kernel/slab/radix_tree

Inspect the number of partial slab pages:

	cat partial

And then perform a cache shrink operation:

	echo 1 >shrink

This is just a barebones approach using a special mode of the slab
migration patchset that does not require refcounts. If this is acceptable,
then additional functionality can be added:

1. Migration of objects to a specific node.
2. Dispersion of objects across all nodes (MPOL_INTERLEAVE).
3. Subsystems can request to move an object to a specific node.
4. Tying into the page migration and page defragmentation logic, so that
   pages that are currently unmovable and stand in the way of creating a
   contiguous block of memory become movable.

This is only possible for xarray for now, but it would be worthwhile to
extend this to dentries and inodes.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

Message-Id: <20171227220652.402842142@linux.com>
Date: Wed, 27 Dec 2017 16:06:39 -0600
From: Christoph Lameter
Subject: [RFC 3/8] slub: Add isolate() and migrate() methods
References: <20171227220636.361857279@linux.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Pekka Enberg, akpm@linux-foundation.org, Mel Gorman, andi@firstfloor.org, Rik van Riel, Dave Chinner, Christoph Hellwig

Add the two methods needed for moving objects and enable the display of
the callbacks via the /sys/kernel/slab interface.

Add documentation explaining the use of these methods and the prototypes
for slab.h. Add functions to set up the callback methods for a slab cache.
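The division of labour between the two callbacks can be illustrated with a minimal userspace sketch of the isolate()/migrate() protocol. This is not kernel code: `struct obj`, its single `owner` back-pointer, `struct toy_cache`, `toy_setup_mobility()`, `toy_isolate()` and `toy_migrate()` are all hypothetical stand-ins, and slab pages, locking and NUMA placement are not modelled. The isolate step only pins live objects and NULLs out entries that are already being freed; the migrate step then allocates a replacement, re-points the single reference and frees the old object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object; exactly one owner slot points at it. */
struct obj {
	int refcount;          /* 0 means a concurrent free is in progress */
	int payload;
	struct obj **owner;    /* the reference to re-point when moving */
};

/* Callback shapes loosely mirroring kmem_isolate_func/kmem_migrate_func. */
typedef void *toy_isolate_func(void **objs, int nr);
typedef void toy_migrate_func(void **objs, int nr, void *private);

/* Stand-in for the two new struct kmem_cache fields. */
struct toy_cache {
	toy_isolate_func *isolate;
	toy_migrate_func *migrate;
};

/* Like kmem_cache_setup_mobility(): just record both callbacks. */
static void toy_setup_mobility(struct toy_cache *c,
			       toy_isolate_func *isolate,
			       toy_migrate_func *migrate)
{
	c->isolate = isolate;
	c->migrate = migrate;
}

/* "Atomic" step: only pin live objects or drop dying ones. */
static void *toy_isolate(void **objs, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct obj *o = objs[i];

		if (o->refcount == 0)
			objs[i] = NULL;  /* being freed: let the free win */
		else
			o->refcount++;   /* pin until migrate() has run */
	}
	return NULL;                     /* no private state needed here */
}

/* Unlocked step: copy each pinned object, re-point its owner, free it. */
static void toy_migrate(void **objs, int nr, void *private)
{
	(void)private;
	for (int i = 0; i < nr; i++) {
		struct obj *old = objs[i];
		struct obj *moved;

		if (!old)
			continue;                /* dropped by toy_isolate() */
		assert(old->refcount >= 2);      /* owner's reference + our pin */
		moved = malloc(sizeof(*moved));
		if (!moved) {
			old->refcount--;         /* cannot move: just unpin */
			continue;
		}
		*moved = *old;
		moved->refcount--;               /* the pin is not carried over */
		*old->owner = moved;             /* publish the new location */
		free(old);
	}
}
```

The opaque return value of the isolate step is where a real subsystem could record which objects it actually managed to pin, as the slab.h comments in this patch describe.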
Add empty functions for SLAB/SLOB. The API is generic, so it could in
theory be implemented for these allocators as well.

Signed-off-by: Christoph Lameter

---
 include/linux/slab.h     |   50 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/slub_def.h |    3 ++
 mm/slub.c                |   29 ++++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h
+++ linux/include/linux/slub_def.h
@@ -98,6 +98,9 @@ struct kmem_cache {
 	gfp_t allocflags;	/* gfp flags to use on each alloc */
 	int refcount;		/* Refcount for slab cache destroy */
 	void (*ctor)(void *);
+	kmem_isolate_func *isolate;
+	kmem_migrate_func *migrate;
+
 	int inuse;		/* Offset to metadata */
 	int align;		/* Alignment */
 	int reserved;		/* Reserved bytes at the end of slabs */
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -3479,7 +3479,6 @@ static int calculate_sizes(struct kmem_c
 	else
 		s->flags &= ~__OBJECT_POISON;
 
-
 	/*
 	 * If we are Redzoning then check if there is some space between the
 	 * end of the object and the free pointer. If not then add an
@@ -4275,6 +4274,25 @@ int __kmem_cache_create(struct kmem_cach
 	return err;
 }
 
+void kmem_cache_setup_mobility(struct kmem_cache *s,
+	kmem_isolate_func isolate, kmem_migrate_func migrate)
+{
+	/*
+	 * Defragmentable slabs must have a ctor otherwise objects may be
+	 * in an undetermined state after they are allocated.
+	 */
+	BUG_ON(!s->ctor);
+	s->isolate = isolate;
+	s->migrate = migrate;
+	/*
+	 * Sadly serialization requirements currently mean that we have
+	 * to disable fast cmpxchg based processing.
+	 */
+	s->flags &= ~__CMPXCHG_DOUBLE;
+
+}
+EXPORT_SYMBOL(kmem_cache_setup_mobility);
+
 void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
 {
 	struct kmem_cache *s;
@@ -4969,6 +4987,20 @@ static ssize_t ops_show(struct kmem_cach
 
 	if (s->ctor)
 		x += sprintf(buf + x, "ctor : %pS\n", s->ctor);
+
+	if (s->isolate) {
+		x += sprintf(buf + x, "isolate : ");
+		x += sprint_symbol(buf + x,
+				(unsigned long)s->isolate);
+		x += sprintf(buf + x, "\n");
+	}
+
+	if (s->migrate) {
+		x += sprintf(buf + x, "migrate : ");
+		x += sprint_symbol(buf + x,
+				(unsigned long)s->migrate);
+		x += sprintf(buf + x, "\n");
+	}
 	return x;
 }
 SLAB_ATTR_RO(ops);
Index: linux/include/linux/slab.h
===================================================================
--- linux.orig/include/linux/slab.h
+++ linux/include/linux/slab.h
@@ -146,6 +146,68 @@ void memcg_deactivate_kmem_caches(struct
 void memcg_destroy_kmem_caches(struct mem_cgroup *);
 
 /*
+ * Function prototypes passed to kmem_cache_setup_mobility() to enable mobile
+ * objects and targeted reclaim in slab caches.
+ */
+
+/*
+ * kmem_isolate_func() is called with locks held so that the slab
+ * objects cannot be freed. We are in an atomic context and no slab
+ * operations may be performed. The purpose of kmem_isolate_func()
+ * is to pin the objects so that they cannot be freed until
+ * kmem_migrate_func() has processed them. This may be accomplished
+ * by increasing the refcount or setting a flag.
+ *
+ * Parameters passed are the cache, an array of pointers to the objects
+ * which are intended to be moved and the number of objects in the array.
+ *
+ * Returns a pointer that is passed to the migrate function. If any objects
+ * cannot be touched at this point then the pointer may indicate a
+ * failure and then the migration function can simply remove the references
+ * that were already obtained. The private data could be used to track
+ * the objects that were already pinned.
+ *
+ * The object pointer array passed is also passed to kmem_migrate_func().
+ * The function may remove objects from the array by setting pointers to
+ * NULL. This is useful if we can determine that an object is being freed
+ * because kmem_isolate_func() was called while the subsystem
+ * was calling kmem_cache_free().
+ * In that case it is not necessary to increase the refcount or
+ * specially mark the object because the release of the slab lock
+ * will lead to the immediate freeing of the object.
+ */
+typedef void *kmem_isolate_func(struct kmem_cache *, void **, int);
+
+/*
+ * kmem_migrate_func() is called with no locks held and interrupts
+ * enabled. Sleeping is possible. Any operation may be performed in
+ * migrate(). kmem_migrate_func() should allocate new objects and
+ * free all the old objects.
+ *
+ * Parameters passed are the cache, the array of pointers to the objects,
+ * the number of objects in the array, the NUMA node where the objects
+ * should be allocated and the pointer returned by kmem_isolate_func().
+ *
+ * Success is checked by examining the number of remaining objects in
+ * the slab. If the number is zero then the slab page will be freed.
+ */
+typedef void kmem_migrate_func(struct kmem_cache *, void **, int nr, int node, void *private);
+
+/*
+ * kmem_cache_setup_mobility() is used to set up callbacks for a slab cache.
+ */
+#ifdef CONFIG_SLUB
+void kmem_cache_setup_mobility(struct kmem_cache *, kmem_isolate_func,
+	kmem_migrate_func);
+#else
+static inline void kmem_cache_setup_mobility(struct kmem_cache *s,
+	kmem_isolate_func isolate, kmem_migrate_func migrate) {}
+#endif
+
+/*
+ * Allocator specific definitions. These are mainly used to establish optimized
+ * ways to convert kmalloc() calls to kmem_cache_alloc() invocations by
+ * selecting the appropriate general cache at compile time.
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.
  *
Index: linux/mm/slab_common.c
===================================================================
--- linux.orig/mm/slab_common.c
+++ linux/mm/slab_common.c
@@ -278,7 +278,7 @@ int slab_unmergeable(struct kmem_cache *
 	if (!is_root_cache(s))
 		return 1;
 
-	if (s->ctor)
+	if (s->ctor || s->isolate || s->migrate)
 		return 1;
 
 	/*

Message-Id: <20171227220652.487092808@linux.com>
Date: Wed, 27 Dec 2017 16:06:40 -0600
From: Christoph Lameter
Subject: [RFC 4/8] slub: Sort slab cache list and establish maximum objects for defrag slabs
References: <20171227220636.361857279@linux.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Pekka Enberg, akpm@linux-foundation.org, Mel Gorman, andi@firstfloor.org, Rik van Riel, Dave Chinner, Christoph Hellwig

It is advantageous to have all defragmentable slab caches together at the
beginning of the list of slab caches so that there is no need to scan the
complete list.
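Why the ordering pays off can be shown with a small userspace model. It is a sketch under stated assumptions: a hand-rolled circular list stands in for the kernel's list_head, and `struct toy_cache`, `toy_add_cache()`, `toy_make_mobile()` and `toy_count_mobile()` are hypothetical names. New caches are appended at the tail (as create_cache() does with list_add_tail() in this patch), enabling mobility moves a cache to the head (as kmem_cache_setup_mobility() does with list_move()), so a scan can stop at the first cache that is not mobile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kmem_cache on the slab_caches list. */
struct toy_cache {
	const char *name;
	bool mobile;                    /* plays the role of s->migrate != NULL */
	struct toy_cache *prev, *next;  /* circular list, like list_head */
};

static void toy_list_init(struct toy_cache *head)
{
	head->prev = head->next = head;
}

static void toy_link(struct toy_cache *c, struct toy_cache *prev,
		     struct toy_cache *next)
{
	c->prev = prev;
	c->next = next;
	prev->next = c;
	next->prev = c;
}

static void toy_unlink(struct toy_cache *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
}

/* New caches are appended, as create_cache() does with list_add_tail(). */
static void toy_add_cache(struct toy_cache *head, struct toy_cache *c)
{
	toy_link(c, head->prev, head);
}

/* Enabling mobility moves the cache to the front, like list_move(). */
static void toy_make_mobile(struct toy_cache *head, struct toy_cache *c)
{
	c->mobile = true;
	toy_unlink(c);
	toy_link(c, head, head->next);
}

/* Walk mobile caches only; stop at the first non-mobile cache. */
static int toy_count_mobile(const struct toy_cache *head)
{
	int n = 0;

	for (const struct toy_cache *c = head->next; c != head; c = c->next) {
		if (!c->mobile)
			break;
		n++;
	}
	return n;
}
```

The early `break` is only correct because of the invariant that every mobile cache sits before every non-mobile one; appending new caches at the tail is what preserves it.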
Put defragmentable caches first when adding a slab cache and others last.

Determine the maximum number of objects in defragmentable slabs. This
allows the array holding references to the objects in a slab to be sized
later.

Signed-off-by: Christoph Lameter

---
 mm/slub.c |   26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -197,6 +197,9 @@ static inline bool kmem_cache_has_cpu_pa
 /* Use cmpxchg_double */
 #define __CMPXCHG_DOUBLE	((slab_flags_t __force)0x40000000U)
 
+/* Maximum objects in defragmentable slabs */
+static unsigned int max_defrag_slab_objects;
+
 /*
  * Tracking user of a slab.
  */
@@ -4274,22 +4278,45 @@ int __kmem_cache_create(struct kmem_cach
 	return err;
 }
 
+/*
+ * Allocate a slab scratch space that is sufficient to keep at least
+ * max_defrag_slab_objects pointers to individual objects and also a bitmap
+ * for max_defrag_slab_objects.
+ */
+static inline void *alloc_scratch(void)
+{
+	return kmalloc(max_defrag_slab_objects * sizeof(void *) +
+		BITS_TO_LONGS(max_defrag_slab_objects) * sizeof(unsigned long),
+		GFP_KERNEL);
+}
+
 void kmem_cache_setup_mobility(struct kmem_cache *s,
 	kmem_isolate_func isolate, kmem_migrate_func migrate)
 {
+	int max_objects = oo_objects(s->max);
+
 	/*
 	 * Defragmentable slabs must have a ctor otherwise objects may be
 	 * in an undetermined state after they are allocated.
 	 */
 	BUG_ON(!s->ctor);
+
+	mutex_lock(&slab_mutex);
+
 	s->isolate = isolate;
 	s->migrate = migrate;
+
 	/*
 	 * Sadly serialization requirements currently mean that we have
 	 * to disable fast cmpxchg based processing.
 	 */
 	s->flags &= ~__CMPXCHG_DOUBLE;
 
+	list_move(&s->list, &slab_caches);	/* Move to top */
+	if (max_objects > max_defrag_slab_objects)
+		max_defrag_slab_objects = max_objects;
+
+	mutex_unlock(&slab_mutex);
 }
 EXPORT_SYMBOL(kmem_cache_setup_mobility);
 
Index: linux/mm/slab_common.c
===================================================================
--- linux.orig/mm/slab_common.c
+++ linux/mm/slab_common.c
@@ -392,7 +392,7 @@ static struct kmem_cache *create_cache(c
 		goto out_free_cache;
 
 	s->refcount = 1;
-	list_add(&s->list, &slab_caches);
+	list_add_tail(&s->list, &slab_caches);
 	memcg_link_cache(s);
 out:
 	if (err)
Message-Id: <20171227220652.718663523@linux.com>
Date: Wed, 27 Dec 2017 16:06:43 -0600
From: Christoph Lameter
Subject: [RFC 7/8] xarray: Implement migration function for objects
References: <20171227220636.361857279@linux.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Pekka Enberg, akpm@linux-foundation.org, Mel Gorman, andi@firstfloor.org, Rik van Riel, Dave Chinner, Christoph Hellwig

Implement functions to migrate objects. This is based on initial code by
Matthew Wilcox and was modified to work with slab object migration.

Signed-off-by: Christoph Lameter

Index: linux/lib/radix-tree.c
===================================================================
--- linux.orig/lib/radix-tree.c
+++ linux/lib/radix-tree.c
@@ -1754,6 +1754,18 @@ static int radix_tree_cpu_dead(unsigned
 	return 0;
 }
 
+
+extern void xa_object_migrate(void *tree_node, int numa_node);
+
+static void radix_tree_migrate(struct kmem_cache *s, void **objects, int nr,
+	int node, void *private)
+{
+	int i;
+
+	for (i=0; iprivate_list));
+	node->array = XA_FREE;
 	call_rcu(&node->rcu_head, radix_tree_node_rcu_free);
 }
 
@@ -1569,6 +1570,51 @@ void xa_destroy(struct xarray *xa)
 }
 EXPORT_SYMBOL(xa_destroy);
 
+void xa_object_migrate(struct xa_node *node, int numa_node)
+{
+	struct xarray *xa = READ_ONCE(node->array);
+	void __rcu **slot;
+	struct xa_node *new_node;
+	int i;
+
+	/* Freed or not yet in tree then skip */
+	if (!xa || xa == XA_FREE)
+		return;
+
+	new_node = kmem_cache_alloc_node(radix_tree_node_cachep, GFP_KERNEL, numa_node);
+
+	xa_lock_irq(xa);
+
+	/* Check again now that the xa_lock is held */
+	if (xa != node->array || !list_empty(&node->private_list)) {
+		node = new_node;
+		goto unlock;
+	}
+
+	memcpy(new_node, node, sizeof(struct xa_node));
+
+	/* Move pointers to new node */
+	INIT_LIST_HEAD(&new_node->private_list);
+	for (i = 0; i < XA_CHUNK_SIZE; i++) {
+		void *x = xa_entry_locked(xa, new_node, i);
+
+		if (xa_is_node(x))
+			rcu_assign_pointer(xa_to_node(x)->parent, new_node);
+	}
+	if (!new_node->parent)
+		slot = &xa->xa_head;
+	else
+		slot = &xa_parent_locked(xa, new_node)->slots[new_node->offset];
+	rcu_assign_pointer(*slot, xa_mk_node(new_node));
+
+unlock:
+	xa_unlock_irq(xa);
+	xa_node_free(node);
+	rcu_barrier();
+	return;
+
+}
+
 #ifdef XA_DEBUG
 void xa_dump_node(const struct xa_node *node)
 {
Message-Id: <20171227220652.322991754@linux.com>
Date: Wed, 27 Dec 2017 16:06:38 -0600
From: Christoph Lameter
Subject: [RFC 2/8] slub: Add defrag_ratio field and sysfs support
References: <20171227220636.361857279@linux.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Pekka Enberg, akpm@linux-foundation.org, Mel Gorman, andi@firstfloor.org, Rik van Riel, Dave Chinner, Christoph Hellwig

"defrag_ratio" sets the threshold at which defragmentation is attempted
on a slab page. "defrag_ratio" is a percentage in the range 1 - 100. If
no more than that percentage of the slots in a slab page is in use, the
slab page becomes subject to defragmentation.

Add a defrag_ratio field and set it to 30% by default. A limit of 30%
means that a slab page with room for 10 objects is considered for
defragmentation only when at most 3 of its slots are in use; slab
defragmentation is then attempted on the remaining objects.
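The threshold is evaluated with integer arithmetic. The sketch below is a hypothetical helper, `defrag_candidate()`, that mirrors the density test applied by __move() later in this series (a partial slab page is skipped when `page->inuse > ratio * page->objects / 100`). Completely empty pages are freed outright by __move() and never reach this test.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when a partial slab page with `inuse` of
 * `objects` slots in use would be picked for object migration under
 * `ratio` percent. Mirrors the integer test in __move(), which skips
 * the page when inuse > ratio * objects / 100.
 */
static bool defrag_candidate(unsigned int inuse, unsigned int objects,
			     unsigned int ratio)
{
	assert(inuse <= objects && ratio <= 100);
	return inuse <= ratio * objects / 100;
}
```

With the default of 30, a 10-object page qualifies while at most 3 objects are in use (30 * 10 / 100 = 3). For a 32-object page, 30 * 32 / 100 truncates to 9, so the effective cutoff is slightly below an exact 30%.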
Signed-off-by: Christoph Lameter

---
 Documentation/ABI/testing/sysfs-kernel-slab |   13 +++++++++++++
 include/linux/slub_def.h                    |    6 ++++++
 mm/slub.c                                   |   23 +++++++++++++++++++++++
 3 files changed, 42 insertions(+)

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -3613,6 +3613,7 @@ static int kmem_cache_open(struct kmem_c
 
 	set_cpu_partial(s);
 
+	s->defrag_ratio = 30;
 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
 #endif
@@ -5078,6 +5079,27 @@ static ssize_t reserved_show(struct kmem
 }
 SLAB_ATTR_RO(reserved);
 
+static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n", s->defrag_ratio);
+}
+
+static ssize_t defrag_ratio_store(struct kmem_cache *s,
+				const char *buf, size_t length)
+{
+	unsigned long ratio;
+	int err;
+
+	err = kstrtoul(buf, 10, &ratio);
+	if (err)
+		return err;
+
+	if (ratio >= 1 && ratio <= 100)
+		s->defrag_ratio = ratio;
+	return length;
+}
+SLAB_ATTR(defrag_ratio);
+
 #ifdef CONFIG_SLUB_DEBUG
 static ssize_t slabs_show(struct kmem_cache *s, char *buf)
 {
@@ -5402,6 +5424,7 @@ static struct attribute *slab_attrs[] =
 	&validate_attr.attr,
 	&alloc_calls_attr.attr,
 	&free_calls_attr.attr,
+	&defrag_ratio_attr.attr,
 #endif
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
Index: linux/Documentation/ABI/testing/sysfs-kernel-slab
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-kernel-slab
+++ linux/Documentation/ABI/testing/sysfs-kernel-slab
@@ -180,6 +180,19 @@ Description:
 		list. It can be written to clear the current count.
 		Available when CONFIG_SLUB_STATS is enabled.
 
+What:		/sys/kernel/slab/cache/defrag_ratio
+Date:		December 2017
+KernelVersion:	4.16
+Contact:	Christoph Lameter
+		Pekka Enberg ,
+Description:
+		The defrag_ratio file controls how aggressively slab
+		fragmentation reduction reclaims objects from sparsely
+		populated slabs. This is a percentage. If no more than
+		this percentage of the objects in a slab are in use,
+		reclaim will attempt to reclaim objects so that the
+		whole slab page can be freed. The default is 30%.
+
 What:		/sys/kernel/slab/cache/deactivate_to_tail
 Date:		February 2008
 KernelVersion:	2.6.25
Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h
+++ linux/include/linux/slub_def.h
@@ -104,6 +104,12 @@ struct kmem_cache {
 	int red_left_pad;	/* Left redzone padding size */
 	const char *name;	/* Name (only for display!) */
 	struct list_head list;	/* List of slab caches */
+	int defrag_ratio;	/*
+				 * Ratio used to check the percentage of
+				 * objects allocated in a slab page.
+				 * If at most this ratio is allocated
+				 * then reclaim attempts are made.
+				 */
 #ifdef CONFIG_SYSFS
 	struct kobject kobj;	/* For sysfs */
 	struct work_struct kobj_remove_work;
Message-Id: <20171227220652.570663500@linux.com>
Date: Wed, 27 Dec 2017 16:06:41 -0600
From: Christoph Lameter
Subject: [RFC 5/8] slub: Slab defrag core
References: <20171227220636.361857279@linux.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Pekka Enberg, akpm@linux-foundation.org, Mel Gorman, andi@firstfloor.org, Rik van Riel, Dave Chinner, Christoph Hellwig

Slab defragmentation may occur:

1. Unconditionally, when the kernel calls kmem_cache_shrink() on a
   slab cache.

2. Through the use of the slabinfo command.

3. Per node and conditionally, when kmem_cache_defrag() is called
   (it can be called from reclaim code with a later patch).

Defragmentation is only performed on slab pages whose object density is
at or below the specified percentage. The density is calculated as the
percentage of objects in use compared to the total number of objects
that the slab page can accommodate.

The scanning of slab caches is optimized because the defragmentable
slab caches come first on the list. Thus a scan can terminate at the
first slab cache encountered that does not support defragmentation.

kmem_cache_defrag() takes a node parameter. This can either be -1 if
defragmentation should be performed on all nodes, or a node number.

Two callbacks must be set up via a call to kmem_cache_setup_mobility()
for a slab cache to support defragmentation. These are:

kmem_isolate_func (void *isolate(struct kmem_cache *s, void **objects, int nr))

	Must stabilize the objects and ensure that they will not be
	freed until the migration function is complete.
	SLUB guarantees that the objects are still allocated. However,
	other threads may be blocked in slab_free() attempting to free
	objects in the slab. These may succeed as soon as isolate()
	returns to the slab allocator. The function must be able to
	detect such situations and skip those objects (for example by
	setting the corresponding entry in the objects array to NULL).

	No slab operations may be performed in isolate(). Interrupts
	are disabled and the slab lock for the page that contains the
	objects is held, so what can be done is very limited. Any
	attempt to perform a slab operation may lead to a deadlock.

	kmem_isolate_func returns a private pointer that is passed to
	kmem_migrate_func. Should we be unable to obtain all references,
	that pointer may indicate to the migrate() function that it
	should not attempt any object removal or move but simply undo
	the measures that were used to stabilize the objects.

kmem_migrate_func (void migrate(struct kmem_cache *, void **objects, int nr, int node, void *private))

	After SLUB has stabilized the objects in a slab, it drops all
	locks and uses migrate() to move objects out of the slab. The
	existence of the objects is guaranteed by virtue of the
	references obtained earlier via isolate().

	The callback may perform any slab operation since no locks are
	held at the time of the call. The callback should remove the
	object from the slab in some way. This may be accomplished by
	reclaiming the object and then running kmem_cache_free(), or by
	reallocating it and then running kmem_cache_free() on the old
	object. Reallocation is advantageous because the partial slabs
	were just sorted to have the partial slabs with the most
	objects first. Reallocation is therefore likely to fill up a
	slab in addition to freeing up one slab. A filled up slab can
	also be removed from the partial list, so there could be a
	double effect.

	kmem_migrate_func() does not return a result.
	SLUB will check the number of remaining objects in the slab.
	If all objects were removed then the slab is freed and we have
	reduced the overall fragmentation of the slab cache.

Signed-off-by: Christoph Lameter

---
 include/linux/slab.h |    3
 mm/slub.c            |  265 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 215 insertions(+), 53 deletions(-)

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -353,6 +353,12 @@ static __always_inline void slab_lock(st
 	bit_spin_lock(PG_locked, &page->flags);
 }
 
+static __always_inline int slab_trylock(struct page *page)
+{
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	return bit_spin_trylock(PG_locked, &page->flags);
+}
+
 static __always_inline void slab_unlock(struct page *page)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
@@ -3903,79 +3909,6 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);
 
-#define SHRINK_PROMOTE_MAX 32
-
-/*
- * kmem_cache_shrink discards empty slabs and promotes the slabs filled
- * up most to the head of the partial lists. New allocations will then
- * fill those up and thus they can be removed from the partial lists.
- *
- * The slabs with the least items are placed last. This results in them
- * being allocated from last increasing the chance that the last objects
- * are freed in them.
- */
-int __kmem_cache_shrink(struct kmem_cache *s)
-{
-	int node;
-	int i;
-	struct kmem_cache_node *n;
-	struct page *page;
-	struct page *t;
-	struct list_head discard;
-	struct list_head promote[SHRINK_PROMOTE_MAX];
-	unsigned long flags;
-	int ret = 0;
-
-	flush_all(s);
-	for_each_kmem_cache_node(s, node, n) {
-		INIT_LIST_HEAD(&discard);
-		for (i = 0; i < SHRINK_PROMOTE_MAX; i++)
-			INIT_LIST_HEAD(promote + i);
-
-		spin_lock_irqsave(&n->list_lock, flags);
-
-		/*
-		 * Build lists of slabs to discard or promote.
-		 *
-		 * Note that concurrent frees may occur while we hold the
-		 * list_lock. page->inuse here is the upper limit.
-		 */
-		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			int free = page->objects - page->inuse;
-
-			/* Do not reread page->inuse */
-			barrier();
-
-			/* We do not keep full slabs on the list */
-			BUG_ON(free <= 0);
-
-			if (free == page->objects) {
-				list_move(&page->lru, &discard);
-				n->nr_partial--;
-			} else if (free <= SHRINK_PROMOTE_MAX)
-				list_move(&page->lru, promote + free - 1);
-		}
-
-		/*
-		 * Promote the slabs filled up most to the head of the
-		 * partial list.
-		 */
-		for (i = SHRINK_PROMOTE_MAX - 1; i >= 0; i--)
-			list_splice(promote + i, &n->partial);
-
-		spin_unlock_irqrestore(&n->list_lock, flags);
-
-		/* Release empty slabs */
-		list_for_each_entry_safe(page, t, &discard, lru)
-			discard_slab(s, page);
-
-		if (slabs_node(s, node))
-			ret = 1;
-	}
-
-	return ret;
-}
-
 #ifdef CONFIG_MEMCG
 static void kmemcg_cache_deact_after_rcu(struct kmem_cache *s)
 {
@@ -4289,14 +4222,270 @@ static inline void *alloc_scratch(void)
 		GFP_KERNEL);
 }
 
+/*
+ * Move all objects in the given slab.
+ *
+ * If the target node is the current node then the objects are moved
+ * elsewhere on the same node. This is an effective way of defragmenting
+ * since the current slab page is frozen and thus exempt from allocation.
+ *
+ * The scratch area passed in (see alloc_scratch()) holds the vector of
+ * object pointers (one void * per object in the slab) followed by a
+ * bitmap with one bit per object.
+ */
+static void kmem_cache_move(struct page *page, void *scratch, int node)
+{
+	void **vector = scratch;
+	void *p;
+	void *addr = page_address(page);
+	struct kmem_cache *s;
+	unsigned long *map;
+	int count;
+	void *private;
+	unsigned long flags;
+	unsigned long objects;
+
+	local_irq_save(flags);
+	slab_lock(page);
+
+	BUG_ON(!PageSlab(page));	/* Must be a slab page */
+	BUG_ON(!page->frozen);		/* Slab must have been frozen earlier */
+
+	s = page->slab_cache;
+	objects = page->objects;
+	map = scratch + objects * sizeof(void *);
+
+	/* Determine used objects */
+	bitmap_fill(map, objects);
+	for (p = page->freelist; p; p = get_freepointer(s, p))
+		__clear_bit(slab_index(p, s, addr), map);
+
+	/* Build vector of pointers to objects */
+	count = 0;
+	memset(vector, 0, objects * sizeof(void *));
+	for_each_object(p, s, addr, objects)
+		if (test_bit(slab_index(p, s, addr), map))
+			vector[count++] = p;
+
+	if (s->isolate)
+		private = s->isolate(s, vector, count);
+	else
+		/*
+		 * Objects do not need to be isolated.
+		 */
+		private = NULL;
+
+	/*
+	 * Pinned the objects. Now we can drop the slab lock. The slab
+	 * is frozen so it cannot vanish from under us nor will
+	 * allocations be performed on the slab. However, unlocking the
+	 * slab will allow concurrent slab_frees to proceed. So
+	 * the subsystem must have a way to tell from the content
+	 * of the object that it was freed.
+	 *
+	 * If neither RCU nor ctor is being used then the object
+	 * may be modified by the allocator after being freed
+	 * which may disrupt the ability of the migrate function
+	 * to tell if the object is free or not.
+	 */
+	slab_unlock(page);
+	local_irq_restore(flags);
+
+	/*
+	 * Perform callbacks to move the objects.
+	 */
+	s->migrate(s, vector, count, node, private);
+}
+
+/*
+ * Move slab objects on a particular node of the cache.
+ * Release slabs with zero objects and try to call the move function for
+ * slabs with at most the configured percentage of objects allocated.
+ *
+ * Returns the number of slabs left on the node after the operation.
+ */
+static unsigned long __move(struct kmem_cache *s, int node,
+		int target_node, int ratio)
+{
+	unsigned long flags;
+	struct page *page, *page2;
+	LIST_HEAD(move_list);
+	struct kmem_cache_node *n = get_node(s, node);
+
+	if (node == target_node && n->nr_partial <= 1)
+		/*
+		 * Trying to reduce fragmentation on a node but there is
+		 * only a single or no partial slab page. This is already
+		 * the optimal object density that we can reach.
+		 */
+		goto out;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &n->partial, lru) {
+		if (!slab_trylock(page))
+			/* Busy slab. Get out of the way */
+			continue;
+
+		if (page->inuse) {
+			if (page->inuse > ratio * page->objects / 100) {
+				slab_unlock(page);
+				/*
+				 * Skip slab because the object density
+				 * in the slab page is high enough
+				 */
+				continue;
+			}
+
+			list_move(&page->lru, &move_list);
+			if (s->migrate) {
+				/* Remove page from being considered for allocations */
+				n->nr_partial--;
+				page->frozen = 1;
+			}
+			slab_unlock(page);
+		} else {
+			/* Empty slab page */
+			list_del(&page->lru);
+			n->nr_partial--;
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
+	}
+
+	if (!s->migrate)
+		/*
+		 * No defrag method. By simply putting the zaplist at the
+		 * end of the partial list we can let them simmer longer
+		 * and thus increase the chance of all objects being
+		 * reclaimed.
+		 *
+		 * We have effectively sorted the partial list and put
+		 * the slabs with more objects first. As soon as they
+		 * are allocated they are going to be removed from the
+		 * partial list.
+		 */
+		list_splice(&move_list, n->partial.prev);
+
+
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	if (s->migrate && !list_empty(&move_list)) {
+		void **scratch = alloc_scratch();
+		struct page *page;
+		struct page *page2;
+
+		if (scratch) {
+			/* Try to remove / move the objects left */
+			list_for_each_entry(page, &move_list, lru) {
+				if (page->inuse)
+					kmem_cache_move(page, scratch, target_node);
+			}
+			kfree(scratch);
+		}
+
+		/* Inspect results and dispose of pages */
+		spin_lock_irqsave(&n->list_lock, flags);
+		list_for_each_entry_safe(page, page2, &move_list, lru) {
+			slab_lock(page);
+			page->frozen = 0;
+
+			if (page->inuse) {
+				/*
+				 * Objects left in slab page move it to
+				 * the tail of the partial list to
+				 * increase the chance that the freeing
+				 * of the remaining objects will
+				 * free the slab page
+				 */
+				n->nr_partial++;
+				list_add_tail(&page->lru, &n->partial);
+				slab_unlock(page);
+
+			} else {
+				slab_unlock(page);
+				discard_slab(s, page);
+			}
+		}
+		spin_unlock_irqrestore(&n->list_lock, flags);
+	}
+out:
+	return atomic_long_read(&n->nr_slabs);
+}
+
+/*
+ * Defrag slabs conditional on the amount of fragmentation in a page.
+ */
+int kmem_cache_defrag(int node)
+{
+	struct kmem_cache *s;
+	unsigned long left = 0;
+
+	/*
+	 * kmem_cache_defrag may be called from the reclaim path which may be
+	 * called for any page allocator alloc. So there is the danger that we
+	 * get called in a situation where the slab_mutex is already held
+	 * for other purposes.
+	 */
+	if (!mutex_trylock(&slab_mutex))
+		return 0;
+
+	list_for_each_entry(s, &slab_caches, list) {
+		/*
+		 * Defragmentable caches come first. If the slab cache is not
+		 * defragmentable then we can stop traversing the list.
+		 */
+		if (!s->migrate)
+			break;
+
+		if (node == -1) {
+			int nid;
+
+			for_each_node_state(nid, N_NORMAL_MEMORY)
+				if (s->node[nid]->nr_partial > MAX_PARTIAL)
+					left += __move(s, nid, nid, s->defrag_ratio);
+		} else
+			left += __move(s, node, node, 100);
+
+	}
+	mutex_unlock(&slab_mutex);
+	return left;
+}
+EXPORT_SYMBOL(kmem_cache_defrag);
+
+/*
+ * kmem_cache_shrink reduces the memory footprint of a slab cache
+ * by as much as possible. This works by removing empty slabs from
+ * the partial list, migrating slab objects to denser slab pages
+ * (if the slab cache supports that) or reorganizing the partial
+ * list so that denser slab pages come first and less dense
+ * allocated slab pages are at the end.
+ */
+int __kmem_cache_shrink(struct kmem_cache *s)
+{
+	int node;
+	int left = 0;
+
+	flush_all(s);
+	for_each_node_state(node, N_NORMAL_MEMORY)
+		left += __move(s, node, node, 100);
+
+	return 0;
+}
+EXPORT_SYMBOL(__kmem_cache_shrink);
+
 void kmem_cache_setup_mobility(struct kmem_cache *s, kmem_isolate_func isolate, kmem_migrate_func migrate)
 {
 	int max_objects = oo_objects(s->max);
 
 	/*
-	 * Defragmentable slabs must have a ctor otherwise objects may be
-	 * in an undetermined state after they are allocated.
+	 * Mobile objects must have a ctor otherwise the
+	 * object may be in an undefined state on allocation.
+	 *
+	 * Since the object may need to be inspected by the
+	 * migration function at any time after allocation we
+	 * must ensure that the object always has a defined
+	 * state.
 	 */
 	BUG_ON(!s->ctor);
-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
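The page selection in __move() and the "determine used objects" step in kmem_cache_move() can be modelled in userspace. What follows is a minimal sketch, not kernel code. struct toy_slab, toy_classify() and toy_collect_used() are hypothetical names. A page is modelled as an array of object indices with an index-linked freelist. Locking, freezing and the isolate/migrate callbacks are left out.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical toy model of one slab page: objects are named by index and
 * the freelist is a chain of indices terminated by -1. */
#define TOY_MAX_OBJECTS 64
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MAP_LONGS ((TOY_MAX_OBJECTS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

struct toy_slab {
	int objects;                    /* slots in the page, like page->objects */
	int inuse;                      /* allocated objects, like page->inuse */
	int freelist;                   /* first free index, -1 if none */
	int next_free[TOY_MAX_OBJECTS]; /* freelist links, like get_freepointer() */
};

enum toy_action { TOY_DISCARD, TOY_SKIP, TOY_MOVE };

/* Same decision as the partial-list walk in __move(). */
static enum toy_action toy_classify(const struct toy_slab *s, int ratio)
{
	if (s->inuse == 0)
		return TOY_DISCARD;     /* empty page: discard_slab() */
	if (s->inuse > ratio * s->objects / 100)
		return TOY_SKIP;        /* object density already high enough */
	return TOY_MOVE;                /* freeze the page and migrate its objects */
}

static void toy_set(unsigned long *map, int i)
{
	map[i / TOY_BITS_PER_LONG] |= 1UL << (i % TOY_BITS_PER_LONG);
}

static void toy_clear(unsigned long *map, int i)
{
	map[i / TOY_BITS_PER_LONG] &= ~(1UL << (i % TOY_BITS_PER_LONG));
}

static int toy_test(const unsigned long *map, int i)
{
	return (map[i / TOY_BITS_PER_LONG] >> (i % TOY_BITS_PER_LONG)) & 1UL;
}

/* Mirrors kmem_cache_move(): mark every slot used, clear the slots on the
 * freelist, then collect the indices that are still marked into vector. */
static int toy_collect_used(const struct toy_slab *s, int *vector)
{
	unsigned long map[TOY_MAP_LONGS];
	int count = 0;

	assert(s->objects <= TOY_MAX_OBJECTS);
	memset(map, 0, sizeof(map));
	for (int i = 0; i < s->objects; i++)            /* bitmap_fill() */
		toy_set(map, i);
	for (int p = s->freelist; p >= 0; p = s->next_free[p])
		toy_clear(map, p);                      /* free slots are not in use */
	for (int i = 0; i < s->objects; i++)            /* for_each_object() + test_bit() */
		if (toy_test(map, i))
			vector[count++] = i;
	return count;
}
```

Consider a page with 8 slots whose freelist is 1, 4, 5, 6, 7. toy_collect_used() returns the three live objects 0, 2 and 3. Both __kmem_cache_shrink() and the node-specific branch of kmem_cache_defrag() pass a ratio of 100. Since inuse can never exceed objects, every non-empty partial page then qualifies for migration.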
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk0-f71.google.com (mail-vk0-f71.google.com [209.85.213.71]) by kanga.kvack.org (Postfix) with ESMTP id 171686B0261 for ; Wed, 27 Dec 2017 17:11:48 -0500 (EST) Received: by mail-vk0-f71.google.com with SMTP id y16so10298931vkd.16 for ; Wed, 27 Dec 2017 14:11:48 -0800 (PST) Received: from resqmta-ch2-08v.sys.comcast.net (resqmta-ch2-08v.sys.comcast.net. [2001:558:fe21:29:69:252:207:40]) by mx.google.com with ESMTPS id y36si14271681uac.0.2017.12.27.14.11.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Dec 2017 14:11:47 -0800 (PST) Message-Id: <20171227220652.651198943@linux.com> Date: Wed, 27 Dec 2017 16:06:42 -0600 From: Christoph Lameter Subject: [RFC 6/8] slub: Extend slabinfo to support -D and -F options References: <20171227220636.361857279@linux.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline; filename=extend_slabinfo Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig -F lists caches that support moving and defragmentation -C lists caches that use a ctor. Change field names for defrag_ratio and remote_node_defrag_ratio. Add determination of the allocation ratio for a slab. The allocation ratio is the percentage of available slots for objects in use. 
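Here is a worked example of the allocation ratio defined above. Take a cache with 10 slabs of 40 objects each and 300 allocated objects. It has 300 * 100 / (10 * 40) = 75% of its object slots in use. The calculation is sketched below with a hypothetical helper, alloc_ratio(), which is not part of slabinfo.c. Note that the slabcache() hunk in this patch divides s->partial (a count of partial slabs) by the number of slots, not s->objects. It may therefore not print the ratio described here.

```c
#include <assert.h>

/* Hypothetical helper, not part of slabinfo.c: percentage of object slots
 * in use. objects, slabs and objs_per_slab mirror the slabinfo fields. */
static unsigned long alloc_ratio(unsigned long objects, unsigned long slabs,
				 unsigned long objs_per_slab)
{
	unsigned long slots = slabs * objs_per_slab;

	if (slots == 0)
		return 100;     /* same fallback the patch uses when s->slabs == 0 */
	/* A consistent snapshot never has more objects than slots. */
	assert(objects <= slots);
	return objects * 100 / slots;
}
```

Integer division truncates, so 5 objects in a single 16-object slab report 31 rather than 31.25.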
Signed-off-by: Christoph Lameter --- Documentation/vm/slabinfo.c | 48 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 43 insertions(+), 5 deletions(-) Index: linux/tools/vm/slabinfo.c =================================================================== --- linux.orig/tools/vm/slabinfo.c +++ linux/tools/vm/slabinfo.c @@ -33,6 +33,8 @@ struct slabinfo { int hwcache_align, object_size, objs_per_slab; int sanity_checks, slab_size, store_user, trace; int order, poison, reclaim_account, red_zone; + int movable, ctor; + int defrag_ratio, remote_node_defrag_ratio; unsigned long partial, objects, slabs, objects_partial, objects_total; unsigned long alloc_fastpath, alloc_slowpath; unsigned long free_fastpath, free_slowpath; @@ -67,6 +69,8 @@ int show_report; int show_alias; int show_slab; int skip_zero = 1; +int show_movable; +int show_ctor; int show_numa; int show_track; int show_first_alias; @@ -109,14 +113,16 @@ static void fatal(const char *x, ...) static void usage(void) { - printf("slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.\n\n" - "slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n" + printf("slabinfo 4/15/2017. (c) 2007 sgi/(c) 2011 Linux Foundation/(c) 2017 Jump Trading LLC.\n\n" + "slabinfo [-aCdDefFhnpvtsz] [-d debugopts] [slab-regexp]\n" "-a|--aliases Show aliases\n" "-A|--activity Most active slabs first\n" "-d|--debug= Set/Clear Debug options\n" + "-C|--ctor Show slabs with ctors\n" "-D|--display-active Switch line format to activity\n" "-e|--empty Show empty slabs\n" "-f|--first-alias Show first alias\n" + "-F|--movable Show caches that support movable objects\n" "-h|--help Show usage information\n" "-i|--inverted Inverted list\n" "-l|--slabs Show slabs\n" @@ -369,7 +375,7 @@ static void slab_numa(struct slabinfo *s return; if (!line) { - printf("\n%-21s:", mode ? "NUMA nodes" : "Slab"); + printf("\n%-21s: Rto ", mode ? 
"NUMA nodes" : "Slab"); for(node = 0; node <= highest_node; node++) printf(" %4d", node); printf("\n----------------------"); @@ -378,6 +384,7 @@ static void slab_numa(struct slabinfo *s printf("\n"); } printf("%-21s ", mode ? "All slabs" : s->name); + printf("%3d ", s->remote_node_defrag_ratio); for(node = 0; node <= highest_node; node++) { char b[20]; @@ -535,6 +542,8 @@ static void report(struct slabinfo *s) printf("** Slabs are destroyed via RCU\n"); if (s->reclaim_account) printf("** Reclaim accounting active\n"); + if (s->movable) + printf("** Defragmentation at %d%%\n", s->defrag_ratio); printf("\nSizes (bytes) Slabs Debug Memory\n"); printf("------------------------------------------------------------------------\n"); @@ -585,6 +594,12 @@ static void slabcache(struct slabinfo *s if (show_empty && s->slabs) return; + if (show_movable && !s->movable) + return; + + if (show_ctor && !s->ctor) + return; + if (sort_loss == 0) store_size(size_str, slab_size(s)); else @@ -599,6 +614,10 @@ static void slabcache(struct slabinfo *s *p++ = '*'; if (s->cache_dma) *p++ = 'd'; + if (s->movable) + *p++ = 'F'; + if (s->ctor) + *p++ = 'C'; if (s->hwcache_align) *p++ = 'A'; if (s->poison) @@ -633,7 +652,8 @@ static void slabcache(struct slabinfo *s printf("%-21s %8ld %7d %15s %14s %4d %1d %3ld %3ld %s\n", s->name, s->objects, s->object_size, size_str, dist_str, s->objs_per_slab, s->order, - s->slabs ? (s->partial * 100) / s->slabs : 100, + s->slabs ? (s->partial * 100) / + (s->slabs * s->objs_per_slab) : 100, s->slabs ? 
(s->objects * s->object_size * 100) / (s->slabs * (page_size << s->order)) : 100, flags); @@ -1252,7 +1272,17 @@ static void read_slab_dir(void) slab->cpu_partial_free = get_obj("cpu_partial_free"); slab->alloc_node_mismatch = get_obj("alloc_node_mismatch"); slab->deactivate_bypass = get_obj("deactivate_bypass"); + slab->defrag_ratio = get_obj("defrag_ratio"); + slab->remote_node_defrag_ratio = + get_obj("remote_node_defrag_ratio"); chdir(".."); + if (read_slab_obj(slab, "ops")) { + if (strstr(buffer, "ctor :")) + slab->ctor = 1; + if (strstr(buffer, "migrate :")) + slab->movable = 1; + } + if (slab->name[0] == ':') alias_targets++; slab++; @@ -1329,6 +1359,8 @@ static void xtotals(void) } struct option opts[] = { + { "ctor", no_argument, NULL, 'C' }, + { "movable", no_argument, NULL, 'F' }, { "aliases", no_argument, NULL, 'a' }, { "activity", no_argument, NULL, 'A' }, { "debug", optional_argument, NULL, 'd' }, @@ -1364,7 +1396,7 @@ int main(int argc, char *argv[]) page_size = getpagesize(); - while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXBU", + while ((c = getopt_long(argc, argv, "aACd::DefFhil1noprstvzTSN:LXBU", opts, NULL)) != -1) switch (c) { case '1': @@ -1420,6 +1452,12 @@ int main(int argc, char *argv[]) case 'z': skip_zero = 0; break; + case 'C': + show_ctor = 1; + break; + case 'F': + show_movable = 1; + break; case 'T': show_totals = 1; break; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f199.google.com (mail-qt0-f199.google.com [209.85.216.199]) by kanga.kvack.org (Postfix) with ESMTP id EFE966B0268 for ; Wed, 27 Dec 2017 17:11:49 -0500 (EST) Received: by mail-qt0-f199.google.com with SMTP id n31so30381428qtc.2 for ; Wed, 27 Dec 2017 14:11:49 -0800 (PST) Received: from resqmta-ch2-03v.sys.comcast.net (resqmta-ch2-03v.sys.comcast.net. [2001:558:fe21:29:69:252:207:35]) by mx.google.com with ESMTPS id k3si4478961qkd.362.2017.12.27.14.11.49 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Dec 2017 14:11:49 -0800 (PST) Message-Id: <20171227220652.804369136@linux.com> Date: Wed, 27 Dec 2017 16:06:44 -0600 From: Christoph Lameter Subject: [RFC 8/8] Add debugging output References: <20171227220636.361857279@linux.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline; filename=debug Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig Useful to see what's going on. Signed-off-by: Christoph Lameter Index: linux/lib/xarray.c =================================================================== --- linux.orig/lib/xarray.c +++ linux/lib/xarray.c @@ -1583,11 +1583,13 @@ void xa_object_migrate(struct xa_node *n new_node = kmem_cache_alloc_node(radix_tree_node_cachep, GFP_KERNEL, numa_node); + printk(KERN_INFO "xa_object_migrate(%px, %d)\n", node, numa_node); xa_lock_irq(xa); /* Check again.....
*/ if (xa != node->array || !list_empty(&node->private_list)) { node = new_node; + printk(KERN_ERR "Skip temporary object\n"); goto unlock; } @@ -1606,6 +1608,7 @@ void xa_object_migrate(struct xa_node *n else slot = &xa_parent_locked(xa, new_node)->slots[new_node->offset]; rcu_assign_pointer(*slot, xa_mk_node(new_node)); + printk(KERN_ERR "Success\n"); unlock: xa_unlock_irq(xa); Index: linux/mm/slub.c =================================================================== --- linux.orig/mm/slub.c +++ linux/mm/slub.c @@ -4245,6 +4245,7 @@ static void kmem_cache_move(struct page unsigned long flags; unsigned long objects; + printk(KERN_ERR "kmem_cache_move in: page=%px inuse=%d\n", page, page->inuse); local_irq_save(flags); slab_lock(page); @@ -4267,6 +4268,7 @@ static void kmem_cache_move(struct page if (test_bit(slab_index(p, s, addr), map)) vector[count++] = p; + printk(KERN_ERR "Vector of %d items\n", count); if (s->isolate) private = s->isolate(s, vector, count); else @@ -4295,6 +4297,7 @@ static void kmem_cache_move(struct page * Perform callbacks to move the objects. */ s->migrate(s, vector, count, node, private); + printk(KERN_ERR "kmem_cache_move out: page=%px inuse=%d\n", page, page->inuse); } /* @@ -4312,6 +4315,7 @@ static unsigned long __move(struct kmem_ LIST_HEAD(move_list); struct kmem_cache_node *n = get_node(s, node); + printk(KERN_ERR "__move(%s, %d, %d, %d) migrate=%px\n", s->name, node, target_node, ratio, s->migrate); if (node == target_node && n->nr_partial <= 1) /* * Trying to reduce fragmentataion on a node but there is @@ -4322,9 +4326,16 @@ static unsigned long __move(struct kmem_ spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry_safe(page, page2, &n->partial, lru) { - if (!slab_trylock(page)) + printk(KERN_ERR "Slab page %px inuse=%d ", page, page->inuse); + if (page->inuse > 1000) { + printk("Page->inuse too high....\n"); + break; + } + if (!slab_trylock(page)) { + printk("Locked\n"); /* Busy slab. 
Get out of the way */ continue; + } if (page->inuse) { if (page->inuse > ratio * page->objects / 100) { @@ -4333,10 +4344,13 @@ static unsigned long __move(struct kmem_ * Skip slab because the object density * in the slab page is high enough */ + printk("Below ratio. Skipping\n"); continue; } list_move(&page->lru, &move_list); + printk("Added to list to move\n"); + if (s->migrate) { /* Remove page from being considered for allocations */ n->nr_partial--; @@ -4345,6 +4359,7 @@ static unsigned long __move(struct kmem_ slab_unlock(page); } else { /* Empty slab page */ + printk("Empty\n"); list_del(&page->lru); n->nr_partial--; slab_unlock(page); @@ -4374,11 +4389,17 @@ static unsigned long __move(struct kmem_ struct page *page; struct page *page2; + printk(KERN_ERR "Beginning to migrate pages\n"); if (scratch) { /* Try to remove / move the objects left */ list_for_each_entry(page, &move_list, lru) { - if (page->inuse) + if (page->inuse) { kmem_cache_move(page, scratch, target_node); + if (page->inuse > 1000) { + printk(KERN_ERR "Page corrupted. Abort\n"); + break; + } + } } kfree(scratch); } @@ -4404,9 +4425,11 @@ static unsigned long __move(struct kmem_ } else { slab_unlock(page); discard_slab(s, page); + printk(KERN_ERR "Freed one page %px\n", page); } } spin_unlock_irqrestore(&n->list_lock, flags); + printk(KERN_ERR "Finished migrating slab objects\n"); } out: return atomic_long_read(&n->nr_slabs); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f70.google.com (mail-pg0-f70.google.com [74.125.83.70]) by kanga.kvack.org (Postfix) with ESMTP id BA96E6B0033 for ; Thu, 28 Dec 2017 00:19:29 -0500 (EST) Received: by mail-pg0-f70.google.com with SMTP id l20so8026306pgc.10 for ; Wed, 27 Dec 2017 21:19:29 -0800 (PST) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id q13si6439482pgc.706.2017.12.27.21.19.28 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 27 Dec 2017 21:19:28 -0800 (PST) Subject: Re: [RFC 0/8] Xarray object migration V1 References: <20171227220636.361857279@linux.com> From: Randy Dunlap Message-ID: Date: Wed, 27 Dec 2017 21:19:11 -0800 MIME-Version: 1.0 In-Reply-To: <20171227220636.361857279@linux.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter , Matthew Wilcox Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On 12/27/2017 02:06 PM, Christoph Lameter wrote: > This is a patchset on top of Matthew Wilcox Xarray code and implements > object migration of xarray nodes. The migration is integrated into > the defragmetation and shrinking logic of the slab allocator. > > Defragmentation will ensure that all xarray slab pages have > less objects available than specified by the slab defrag ratio. > > Slab shrinking will create a slab cache with optimal object > density. Only one slab page will have available objects per node. > > To test apply this patchset on top of Matthew Wilcox Xarray code > from Dec 11th (See infradead github). linux-mm archive is missing patch 1/8 and so am I. 
https://marc.info/?l=linux-mm -- ~Randy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f198.google.com (mail-io0-f198.google.com [209.85.223.198]) by kanga.kvack.org (Postfix) with ESMTP id AE85F6B0033 for ; Thu, 28 Dec 2017 09:59:25 -0500 (EST) Received: by mail-io0-f198.google.com with SMTP id g81so33528194ioa.14 for ; Thu, 28 Dec 2017 06:59:25 -0800 (PST) Received: from resqmta-ch2-02v.sys.comcast.net (resqmta-ch2-02v.sys.comcast.net. [2001:558:fe21:29:69:252:207:34]) by mx.google.com with ESMTPS id g137si4859227ioe.172.2017.12.28.06.59.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 Dec 2017 06:59:24 -0800 (PST) Date: Thu, 28 Dec 2017 08:57:21 -0600 (CST) From: Christopher Lameter Subject: Re: [RFC 0/8] Xarray object migration V1 In-Reply-To: Message-ID: References: <20171227220636.361857279@linux.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Randy Dunlap Cc: Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Wed, 27 Dec 2017, Randy Dunlap wrote: > > To test apply this patchset on top of Matthew Wilcox Xarray code > > from Dec 11th (See infradead github). > > linux-mm archive is missing patch 1/8 and so am I. > > https://marc.info/?l=linux-mm Duh. How can you troubleshoot that one? First patch: Subject: slub: Replace ctor field with ops field in /sys/slab/* Create an ops field in /sys/slab/*/ops to contain all the callback operations defined for a slab cache. This will be used to display the additional callbacks that will be defined soon to enable defragmentation. Display the existing ctor callback in the ops fields contents. 
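The ops file is meant to hold one "name : symbol" line per callback. The slabinfo patch in this series detects features by searching that text for "ctor :" and "migrate :". Below is a minimal userspace sketch of both sides, under these assumptions:
- format_ops(), parse_ops() and struct ops_flags are hypothetical names.
- Plain strings stand in for the kernel's %pS symbol formatting.
- The "migrate :" line is inferred from the string slabinfo searches for. This patch itself only prints the ctor.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model of ops_show(): emit one "name : symbol" line
 * per callback that is set; NULL means the callback is absent. */
static int format_ops(char *buf, size_t len, const char *ctor,
		      const char *migrate)
{
	int x = 0;

	assert(len > 0);
	buf[0] = '\0';          /* no callbacks means an empty file */
	if (ctor)
		x += snprintf(buf + x, len - x, "ctor : %s\n", ctor);
	if (migrate && (size_t)x < len)
		x += snprintf(buf + x, len - x, "migrate : %s\n", migrate);
	return x;
}

struct ops_flags {
	int ctor;
	int movable;
};

/* The same substring tests slabinfo's read_slab_dir() applies to the ops file. */
static struct ops_flags parse_ops(const char *buf)
{
	struct ops_flags f;

	f.ctor = strstr(buf, "ctor :") != NULL;
	f.movable = strstr(buf, "migrate :") != NULL;
	return f;
}
```

Keeping one labelled line per callback lets the tool test for features with a plain substring search. The file format can also grow as new callbacks are added, without breaking older readers.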
Signed-off-by: Christoph Lameter --- mm/slub.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) Index: linux/mm/slub.c =================================================================== --- linux.orig/mm/slub.c +++ linux/mm/slub.c @@ -4959,13 +4959,18 @@ static ssize_t cpu_partial_store(struct } SLAB_ATTR(cpu_partial); -static ssize_t ctor_show(struct kmem_cache *s, char *buf) +static ssize_t ops_show(struct kmem_cache *s, char *buf) { + int x = 0; + if (!s->ctor) return 0; - return sprintf(buf, "%pS\n", s->ctor); + + if (s->ctor) + x += sprintf(buf + x, "ctor : %pS\n", s->ctor); + return x; } -SLAB_ATTR_RO(ctor); +SLAB_ATTR_RO(ops); static ssize_t aliases_show(struct kmem_cache *s, char *buf) { @@ -5377,7 +5382,7 @@ static struct attribute *slab_attrs[] = &objects_partial_attr.attr, &partial_attr.attr, &cpu_slabs_attr.attr, - &ctor_attr.attr, + &ops_attr.attr, &aliases_attr.attr, &align_attr.attr, &hwcache_align_attr.attr, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f72.google.com (mail-pg0-f72.google.com [74.125.83.72]) by kanga.kvack.org (Postfix) with ESMTP id CC6AC6B0033 for ; Thu, 28 Dec 2017 12:18:59 -0500 (EST) Received: by mail-pg0-f72.google.com with SMTP id a10so24049257pgq.3 for ; Thu, 28 Dec 2017 09:18:59 -0800 (PST) Message-ID: <1514481533.3040.6.camel@HansenPartnership.com> Subject: Re: [RFC 0/8] Xarray object migration V1 From: James Bottomley Date: Thu, 28 Dec 2017 09:18:53 -0800 In-Reply-To: References: <20171227220636.361857279@linux.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Christopher Lameter , Randy Dunlap Cc: Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig , Benjamin LaHaise On Thu, 2017-12-28 at 08:57 -0600, Christopher Lameter wrote: > On Wed, 27 Dec 2017, Randy Dunlap wrote: > > > > > > > > > To test apply this patchset on top of Matthew Wilcox Xarray code > > > from Dec 11th (See infradead github). > > > > linux-mm archive is missing patch 1/8 and so am I. > > > > https://marc.info/?l=linux-mm > > Duh. How can you troubleshoot that one? Well you can ask for expert help. The mm list also ate one of my bug reports (although the followup made it).
This is the lost email: From: James Bottomley To: Linux Memory Management List Subject: Hang with v4.15-rc trying to swap back in Date: Wed, 27 Dec 2017 10:12:20 -0800 Message-Id: <1514398340.3986.10.camel@HansenPartnership.com> This is the accepting MTA line from postfix: Dec 27 10:12:23 bedivere postfix/smtp[15670]: CFB7E8EE190: to=, relay=aspmx.l.google.com[74.125.28.26]:25, delay=1.2, delays=0.09/0.03/0.6/0.42, dsn=2.0.0, status=sent (250 2.0.0 OK 1514398342 z21si24644492plo.126 - gsmtp) The one that made it is: Message-Id: <1514407817.4169.4.camel@HansenPartnership.com> I've cc'd Ben because I think the list is still on his systems. James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 28 Dec 2017 12:33:51 -0500 From: Benjamin LaHaise Subject: Re: [RFC 0/8] Xarray object migration V1 Message-ID: <20171228173351.GK24310@kvack.org> References: <20171227220636.361857279@linux.com> <1514481533.3040.6.camel@HansenPartnership.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1514481533.3040.6.camel@HansenPartnership.com> Sender: owner-linux-mm@kvack.org List-ID: To: James Bottomley Cc: Christopher Lameter , Randy Dunlap , Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Thu, Dec 28, 2017 at 09:18:53AM -0800, James Bottomley wrote: ... > Well you can ask for expert help. The mm list also ate one of my bug > reports (although the followup made it).
This is the lost email: > > From: James Bottomley > To: Linux Memory Management List > Subject: Hang with v4.15-rc trying to swap back in > Date: Wed, 27 Dec 2017 10:12:20 -0800 > Message-Id: <1514398340.3986.10.camel@HansenPartnership.com> ... > I've cc'd Ben because I think the list is still on his systems. ... Looks like Google's anti-spam service filtered it, so my system never even saw it. Not much I can do when that happens other than try to manually track it down. -ben -- "Thought is the essence of where you are now." -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f70.google.com (mail-pl0-f70.google.com [209.85.160.70]) by kanga.kvack.org (Postfix) with ESMTP id 855D86B0253 for ; Thu, 28 Dec 2017 12:40:24 -0500 (EST) Received: by mail-pl0-f70.google.com with SMTP id z3so23732726pln.6 for ; Thu, 28 Dec 2017 09:40:24 -0800 (PST) Received: from bedivere.hansenpartnership.com (bedivere.hansenpartnership.com. 
[66.63.167.143]) by mx.google.com with ESMTPS id p3si27024755pld.717.2017.12.28.09.40.23 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 28 Dec 2017 09:40:23 -0800 (PST) Message-ID: <1514482820.3040.13.camel@HansenPartnership.com> Subject: Re: [RFC 0/8] Xarray object migration V1 From: James Bottomley Date: Thu, 28 Dec 2017 09:40:20 -0800 In-Reply-To: <20171228173351.GK24310@kvack.org> References: <20171227220636.361857279@linux.com> <1514481533.3040.6.camel@HansenPartnership.com> <20171228173351.GK24310@kvack.org> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Benjamin LaHaise Cc: Christopher Lameter , Randy Dunlap , Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Thu, 2017-12-28 at 12:33 -0500, Benjamin LaHaise wrote: > On Thu, Dec 28, 2017 at 09:18:53AM -0800, James Bottomley wrote: > ... > > > > Well you can ask for expert help. The mm list also ate one of my > > bug reports (although the followup made it). This is the lost > > email: > > > > From: James Bottomley > > > > To: Linux Memory Management List > > Subject: Hang with v4.15-rc trying to swap back in > > Date: Wed, 27 Dec 2017 10:12:20 -0800 > > Message-Id: <1514398340.3986.10.camel@HansenPartnership.com> > ... > > > > I've cc'd Ben because I think the list is still on his systems. > ... > > Looks like Google's anti-spam service filtered it, so my system never > even saw it. Not much I can do when that happens other than try to > manually track it down. I honestly don't think it's safe to host a public email list on google: their "spam" filter is eccentric to say the least and is far too willing to generate false positives for reasons no-one seems to be able to fix. What about moving the list to reliable infrastructure, like vger?
James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 28 Dec 2017 14:17:48 -0500 From: Benjamin LaHaise Subject: Re: [RFC 0/8] Xarray object migration V1 Message-ID: <20171228191748.GO24310@kvack.org> References: <20171227220636.361857279@linux.com> <1514481533.3040.6.camel@HansenPartnership.com> <20171228173351.GK24310@kvack.org> <1514482820.3040.13.camel@HansenPartnership.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1514482820.3040.13.camel@HansenPartnership.com> Sender: owner-linux-mm@kvack.org List-ID: To: James Bottomley Cc: Christopher Lameter , Randy Dunlap , Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Thu, Dec 28, 2017 at 09:40:20AM -0800, James Bottomley wrote: > On Thu, 2017-12-28 at 12:33 -0500, Benjamin LaHaise wrote: > > On Thu, Dec 28, 2017 at 09:18:53AM -0800, James Bottomley wrote: > > ... > > > > > > Well you can ask for expert help. The mm list also ate one of my > > > bug reports (although the followup made it). This is the lost > > > email: > > > > > > From: James Bottomley > > > > > > To: Linux Memory Management List > > > Subject: Hang with v4.15-rc trying to swap back in > > > Date: Wed, 27 Dec 2017 10:12:20 -0800 > > > Message-Id: <1514398340.3986.10.camel@HansenPartnership.com> > > ... > > > > > > I've cc'd Ben because I think the list is still on his systems. > > ... > > > > Looks like Google's anti-spam service filtered it, so my system never > > even saw it. Not much I can do when that happens other than try to > > manually track it down. 
> > I honestly don't think it's safe to host a public email list on google: > their "spam" filter is eccentric to say the least and is far too > willing to generate false positives for reasons no-one seems to be able > to fix. What about moving the list to reliable infrastructure, like > vger? The list is not hosted on Google - Google's anti-spam service is only used for ingress filtering. Spamassassin is not a "Good Enough" option these days given the volume and nature of spam that comes into kvack.org. If you can point me at a better anti-spam solution that actually works (and is not RBL based), I'd be happy to try it since the Google service is absolutely awful with pretty much no way to get a human to fix obviously broken things. -ben > James > > -- "Thought is the essence of where you are now." -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f70.google.com (mail-pg0-f70.google.com [74.125.83.70]) by kanga.kvack.org (Postfix) with ESMTP id 3B6156B0033 for ; Thu, 28 Dec 2017 15:01:02 -0500 (EST) Received: by mail-pg0-f70.google.com with SMTP id r8so5070806pgq.1 for ; Thu, 28 Dec 2017 12:01:02 -0800 (PST) Received: from bedivere.hansenpartnership.com (bedivere.hansenpartnership.com. 
[66.63.167.143]) by mx.google.com with ESMTPS id g1si27287003pfk.52.2017.12.28.12.01.00 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 28 Dec 2017 12:01:00 -0800 (PST) Message-ID: <1514491258.3040.28.camel@HansenPartnership.com> Subject: Re: [RFC 0/8] Xarray object migration V1 From: James Bottomley Date: Thu, 28 Dec 2017 12:00:58 -0800 In-Reply-To: <20171228191748.GO24310@kvack.org> References: <20171227220636.361857279@linux.com> <1514481533.3040.6.camel@HansenPartnership.com> <20171228173351.GK24310@kvack.org> <1514482820.3040.13.camel@HansenPartnership.com> <20171228191748.GO24310@kvack.org> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Benjamin LaHaise Cc: Christopher Lameter , Randy Dunlap , Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Thu, 2017-12-28 at 14:17 -0500, Benjamin LaHaise wrote: > On Thu, Dec 28, 2017 at 09:40:20AM -0800, James Bottomley wrote: > > > > On Thu, 2017-12-28 at 12:33 -0500, Benjamin LaHaise wrote: > > > > > > On Thu, Dec 28, 2017 at 09:18:53AM -0800, James Bottomley wrote: > > > ... > > > > > > > > > > > > Well you can ask for expert help. The mm list also ate one of > > > > my bug reports (although the followup made it). This is the > > > > lost email: > > > > > > > > From: James Bottomley > > > .com > > > > > > > > > > > > > > To: Linux Memory Management List > > > > Subject: Hang with v4.15-rc trying to swap back in > > > > Date: Wed, 27 Dec 2017 10:12:20 -0800 > > > > Message-Id: <1514398340.3986.10.camel@HansenPartnership. > > > > com> > > > ... > > > > > > > > > > > > I've cc'd Ben because I think the list is still on his systems. > > > ...
> > > > > > Looks like Google's anti-spam service filtered it, so my system > > > never even saw it. Not much I can do when that happens other > > > than try to manually track it down. > > > > I honestly don't think it's safe to host a public email list on > > google: their "spam" filter is eccentric to say the least and is > > far too willing to generate false positives for reasons no-one > > seems to be able to fix. What about moving the list to reliable > > infrastructure, like vger? > > The list is not hosted on Google - Google's anti-spam service is only > used for ingress filtering. OK, but that is the problem: you're relying on google infrastructure for a service it does incredibly poorly. > Spamassassin is not a "Good Enough" option these days given the > volume and nature of spam that comes into kvack.org. If you can > point me at a better anti-spam solution that actually works (and is > not RBL based), I'd be happy to try it since the Google service is > absolutely awful with pretty much no way to get a human to fix > obviously broken things. Well, to be honest, I find spamassassin to be incredibly useful (it's what I use), especially being rules and points based instead of absolute (meaning I can use the RBL but not rely on it). It hasn't given me a false positive on anything for over a year and its false negative rate is about 2% with my current configuration. However, I think the best solution is to use vger ... it already has an efficient ingress filter and it doesn't rely on google, so it doesn't suffer the arbitrary mail loss problem of google. James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 28 Dec 2017 15:33:52 -0500 From: Benjamin LaHaise Subject: Re: [RFC 0/8] Xarray object migration V1 Message-ID: <20171228203352.GP24310@kvack.org> References: <20171227220636.361857279@linux.com> <1514481533.3040.6.camel@HansenPartnership.com> <20171228173351.GK24310@kvack.org> <1514482820.3040.13.camel@HansenPartnership.com> <20171228191748.GO24310@kvack.org> <1514491258.3040.28.camel@HansenPartnership.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1514491258.3040.28.camel@HansenPartnership.com> Sender: owner-linux-mm@kvack.org List-ID: To: James Bottomley Cc: Christopher Lameter , Randy Dunlap , Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Thu, Dec 28, 2017 at 12:00:58PM -0800, James Bottomley wrote: ... > > The list is not hosted on Google - Google's anti-spam service is only > > used for ingress filtering. > > OK, but that is the problem: you're relying on google infrastructure > for a service it does incredibly poorly. I did say that I am open to changing how spam filtering is done. > > Spamassassin is not a "Good Enough" option these days given the > > volume and nature of spam that comes into kvack.org. If you can > > point me at a better anti-spam solution that actually works (and is > > not RBL based), I'd be happy to try it since the Google service is > > absolutely awful with pretty much no way to get a human to fix > > obviously broken things. > > Well, to be honest, I find spamassassin to be incredibly useful (it's > what I use), especially being rules and points based instead of > absolute (meaning I can use the RBL but not rely on it). 
It hasn't > given me a false positive on anything for over a year and its false > negative rate is about 2% with my current configuration. The false negative rate the last time I used SpamAssassin was way more than 10-15%, and it mostly failed to filter out the phishing scams, which tend to be the bigger problem of late. That isn't so much of a mailing list concern, but it is a significant issue for user accounts. > However, I think the best solution is to use vger ... it already has an > efficient ingress filter and it doesn't rely on google, so it doesn't > suffer the arbitrary mail loss problem of google. This is the first time anyone has complained to me about messages being filtered in a number of years, and I have added whitelists for people over the years that had problems ending up on various blacklists. I don't have time these days to actively scan mailing lists for issues, so reporting them on-list without directly Cc'ing me will not get my attention. I'll look into options over the next few days and see if there are any better solutions available now than the last time I looked at the spam problem. Please give me at least a little bit of time to look into possible fixes before going nuclear. -ben > James > > -- "Thought is the essence of where you are now." From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72]) by kanga.kvack.org (Postfix) with ESMTP id 8F3F96B0038 for ; Thu, 28 Dec 2017 17:24:35 -0500 (EST) Received: by mail-oi0-f72.google.com with SMTP id y195so8211724oia.22 for ; Thu, 28 Dec 2017 14:24:35 -0800 (PST) Received: from mx1.redhat.com (mx1.redhat.com.
[209.132.183.28]) by mx.google.com with ESMTPS id d70si3861312oig.310.2017.12.28.14.24.34 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 Dec 2017 14:24:34 -0800 (PST) Date: Fri, 29 Dec 2017 09:24:20 +1100 From: Dave Chinner Subject: Re: [RFC 0/8] Xarray object migration V1 Message-ID: <20171228222419.GQ1871@rh> References: <20171227220636.361857279@linux.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171227220636.361857279@linux.com> Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Christoph Hellwig On Wed, Dec 27, 2017 at 04:06:36PM -0600, Christoph Lameter wrote: > This is a patchset on top of Matthew Wilcox Xarray code and implements > object migration of xarray nodes. The migration is integrated into > the defragmetation and shrinking logic of the slab allocator. ..... > This is only possible for xarray for now but it would be worthwhile > to extend this to dentries and inodes. Christoph, you keep saying this is the goal, but I'm yet to see a solution proposed for the atomic replacement of all the pointers to an inode from external objects. An inode that has no active references still has an awful lot of passive and internal references that need to be dealt with. e.g. racing page operations accessing mapping->host, the inode in various lists (e.g. superblock inode list, writeback lists, etc), the inode lookup cache(s), backpointers from LSMs, fsnotify marks, crypto information, internal filesystem pointers (e.g. log items, journal handles, buffer references, etc) and so on. And each filesystem has a different set of passive references, too. Oh, and I haven't even mentioned deadlocks yet, either. 
:P IOWs, just saying "it would be worthwhile to extend this to dentries and inodes" completely misrepresents the sheer complexity of doing so. We've known that atomic replacement is the big problem for defragging inodes and dentries since this work was started, what, more than 10 years? And while there's been many revisions of the core defrag code since then, there has been no credible solution presented for atomic replacement of objects with complex external references. This is a show-stopper for inode/dentry slab defrag, and I don't see that this new patchset is any different... Cheers, Dave. -- Dave Chinner dchinner@redhat.com From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id 3E7CC6B0033 for ; Thu, 28 Dec 2017 19:19:18 -0500 (EST) Received: by mail-it0-f72.google.com with SMTP id z142so24900130itc.6 for ; Thu, 28 Dec 2017 16:19:18 -0800 (PST) Received: from resqmta-ch2-03v.sys.comcast.net (resqmta-ch2-03v.sys.comcast.net.
[2001:558:fe21:29:69:252:207:35]) by mx.google.com with ESMTPS id f11si17474317ite.9.2017.12.28.16.19.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 Dec 2017 16:19:17 -0800 (PST) Date: Thu, 28 Dec 2017 18:19:15 -0600 (CST) From: Christopher Lameter Subject: Re: [RFC 0/8] Xarray object migration V1 In-Reply-To: <20171228222419.GQ1871@rh> Message-ID: References: <20171227220636.361857279@linux.com> <20171228222419.GQ1871@rh> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Chinner Cc: Matthew Wilcox , linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Christoph Hellwig On Fri, 29 Dec 2017, Dave Chinner wrote: > IOWs, just saying "it would be worthwhile to extend this to dentries > and inodes" completely misrepresents the sheer complexity of doing > so. We've known that atomic replacement is the big problem for > defragging inodes and dentries since this work was started, what, > more than 10 years? And while there's been many revisions of the > core defrag code since then, there has been no credible solution > presented for atomic replacement of objects with complex external > references. This is a show-stopper for inode/dentry slab defrag, and > I don't see that this new patchset is any different... Well this is a chance here to start an implementation since the radix tree is being reworked anyways. This is not dealing with dentries and inodes but it brings in the basic infrastructure into the slab allocators that can then be used to add other slab caches. Same warnings were given to me when we did page migration and it languished for 5 years. I have not had time to really focus on memory management issues since I left SGI about 9 years ago but it seems that I may now have the chance in 2018 to put a significant amount of time into making some progress. 
Large memory in servers has become a significant problem for my employer, and the ability to allocate and manage contiguous memory blocks is essential to preserve performance and avoid constant reboots. So I will be looking for ways to address these issues, maybe with a couple of approaches. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f71.google.com (mail-pl0-f71.google.com [209.85.160.71]) by kanga.kvack.org (Postfix) with ESMTP id CE1646B0069 for ; Sat, 30 Dec 2017 01:21:00 -0500 (EST) Received: by mail-pl0-f71.google.com with SMTP id g33so26061221plb.13 for ; Fri, 29 Dec 2017 22:21:00 -0800 (PST) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id y20si26332133pgv.291.2017.12.29.22.20.59 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 29 Dec 2017 22:20:59 -0800 (PST) Date: Fri, 29 Dec 2017 22:20:52 -0800 From: Matthew Wilcox Subject: Re: [RFC 2/8] slub: Add defrag_ratio field and sysfs support Message-ID: <20171230062052.GB27959@bombadil.infradead.org> References: <20171227220636.361857279@linux.com> <20171227220652.322991754@linux.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171227220652.322991754@linux.com> Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Wed, Dec 27, 2017 at 04:06:38PM -0600, Christoph Lameter wrote: > +++ linux/Documentation/ABI/testing/sysfs-kernel-slab > @@ -180,6 +180,19 @@ Description: > list. It can be written to clear the current count. > Available when CONFIG_SLUB_STATS is enabled.
> > +What: /sys/kernel/slab/cache/defrag_ratio > +Date: December 2017 > +KernelVersion: 4.16 > +Contact: Christoph Lameter > + Pekka Enberg , > +Description: > + The defrag_ratio files allows the control of how agressive > + slab fragmentation reduction works at reclaiming objects from > + sparsely populated slabs. This is a percentage. If a slab > + has more than this percentage of available object then reclaim > + will attempt to reclaim objects so that the whole slab > + page can be freed. The default is 30%. > + > What: /sys/kernel/slab/cache/deactivate_to_tail > Date: February 2008 > KernelVersion: 2.6.25 Should this documentation mention it's SLUB-only? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f69.google.com (mail-pl0-f69.google.com [209.85.160.69]) by kanga.kvack.org (Postfix) with ESMTP id EFC616B0069 for ; Sat, 30 Dec 2017 01:42:49 -0500 (EST) Received: by mail-pl0-f69.google.com with SMTP id 33so26070845pll.9 for ; Fri, 29 Dec 2017 22:42:49 -0800 (PST) Received: from bombadil.infradead.org (bombadil.infradead.org.
[65.50.211.133]) by mx.google.com with ESMTPS id s76si7838826pgc.768.2017.12.29.22.42.48 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 29 Dec 2017 22:42:48 -0800 (PST) Date: Fri, 29 Dec 2017 22:42:46 -0800 From: Matthew Wilcox Subject: Re: [RFC 3/8] slub: Add isolate() and migrate() methods Message-ID: <20171230064246.GC27959@bombadil.infradead.org> References: <20171227220636.361857279@linux.com> <20171227220652.402842142@linux.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171227220652.402842142@linux.com> Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Wed, Dec 27, 2017 at 04:06:39PM -0600, Christoph Lameter wrote: > @@ -98,6 +98,9 @@ struct kmem_cache { > gfp_t allocflags; /* gfp flags to use on each alloc */ > int refcount; /* Refcount for slab cache destroy */ > void (*ctor)(void *); > + kmem_isolate_func *isolate; > + kmem_migrate_func *migrate; > + > int inuse; /* Offset to metadata */ > int align; /* Alignment */ > int reserved; /* Reserved bytes at the end of slabs */ [...] > +/* > + * kmem_cache_setup_mobility() is used to setup callbacks for a slab cache. > + */ > +#ifdef CONFIG_SLUB > +void kmem_cache_setup_mobility(struct kmem_cache *, kmem_isolate_func, > + kmem_migrate_func); > +#else > +static inline void kmem_cache_setup_mobility(struct kmem_cache *s, > + kmem_isolate_func isolate, kmem_migrate_func migrate) {} > +#endif Is this the right approach? I could imagine there being more ops in the future. 
I suspect we should bite the bullet now and do: struct kmem_cache_operations { void (*ctor)(void *); void *(*isolate)(struct kmem_cache *, void **objs, int nr); void (*migrate)(struct kmem_cache *, void **objs, int nr, int node, void *private); }; Not sure how best to convert the existing constructor users to this scheme. Perhaps cheat ... - void (*ctor)(void *); + union { + void (*ctor)(void *); + const struct kmem_cache_operations *ops; + }; and use a slab flag to tell you which to use. > @@ -4969,6 +4987,20 @@ static ssize_t ops_show(struct kmem_cach > > if (s->ctor) > x += sprintf(buf + x, "ctor : %pS\n", s->ctor); > + > + if (s->isolate) { > + x += sprintf(buf + x, "isolate : "); > + x += sprint_symbol(buf + x, > + (unsigned long)s->isolate); > + x += sprintf(buf + x, "\n"); > + } Here you could print the symbol of the ops vector instead of the function pointer ... From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f197.google.com (mail-pf0-f197.google.com [209.85.192.197]) by kanga.kvack.org (Postfix) with ESMTP id B605F6B0289 for ; Mon, 1 Jan 2018 16:20:46 -0500 (EST) Received: by mail-pf0-f197.google.com with SMTP id p89so21369007pfk.5 for ; Mon, 01 Jan 2018 13:20:46 -0800 (PST) Received: from bombadil.infradead.org (bombadil.infradead.org.
[65.50.211.133]) by mx.google.com with ESMTPS id f89si32914580plb.110.2018.01.01.13.20.45 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 01 Jan 2018 13:20:45 -0800 (PST) Date: Mon, 1 Jan 2018 13:20:39 -0800 From: Matthew Wilcox Subject: Re: [RFC 3/8] slub: Add isolate() and migrate() methods Message-ID: <20180101212039.GA13116@bombadil.infradead.org> References: <20171227220636.361857279@linux.com> <20171227220652.402842142@linux.com> <20171230064246.GC27959@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171230064246.GC27959@bombadil.infradead.org> Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Fri, Dec 29, 2017 at 10:42:46PM -0800, Matthew Wilcox wrote: > Is this the right approach? I could imagine there being more ops in > the future. I suspect we should bite the bullet now and do: I thought of a cute additional slab operation we could define, print(). We could do something like this ... struct page *page = virt_to_head_page(ptr); if (!PageSlab(page)) return false; slab = page->slab_cache; if (!(slab->flags & SLAB_FLAGS_OPS) || !slab->ops->print) return false; slab->ops->print(ptr); return true; and get nice debugging output like we have for VM_BUG_ON_PAGE, only for any type that's implemented a slab operations vec. Of course, this won't replace VM_BUG_ON_PAGE because struct pages aren't slab-allocated (but could we pretend they are?)
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f70.google.com (mail-it0-f70.google.com [209.85.214.70]) by kanga.kvack.org (Postfix) with ESMTP id 1B0B96B02B2 for ; Tue, 2 Jan 2018 09:53:37 -0500 (EST) Received: by mail-it0-f70.google.com with SMTP id w125so33804872itf.0 for ; Tue, 02 Jan 2018 06:53:37 -0800 (PST) Received: from resqmta-ch2-03v.sys.comcast.net (resqmta-ch2-03v.sys.comcast.net. [2001:558:fe21:29:69:252:207:35]) by mx.google.com with ESMTPS id p127si5941381iop.174.2018.01.02.06.53.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 02 Jan 2018 06:53:36 -0800 (PST) Date: Tue, 2 Jan 2018 08:53:34 -0600 (CST) From: Christopher Lameter Subject: Re: [RFC 2/8] slub: Add defrag_ratio field and sysfs support In-Reply-To: <20171230062052.GB27959@bombadil.infradead.org> Message-ID: References: <20171227220636.361857279@linux.com> <20171227220652.322991754@linux.com> <20171230062052.GB27959@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Fri, 29 Dec 2017, Matthew Wilcox wrote: > > What: /sys/kernel/slab/cache/deactivate_to_tail > > Date: February 2008 > > KernelVersion: 2.6.25 > > Should this documentation mention it's SLUB-only? It could, but /sys/kernel/slab is only supported for SLUB at this point. In the long term, though, sysfs handling should move into slab_common.c so that it works for any allocator. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199]) by kanga.kvack.org (Postfix) with ESMTP id 79C286B02B4 for ; Tue, 2 Jan 2018 09:56:18 -0500 (EST) Received: by mail-io0-f199.google.com with SMTP id a2so20996237ioc.12 for ; Tue, 02 Jan 2018 06:56:18 -0800 (PST) Received: from resqmta-ch2-06v.sys.comcast.net (resqmta-ch2-06v.sys.comcast.net. [2001:558:fe21:29:69:252:207:38]) by mx.google.com with ESMTPS id n138si23688650itb.16.2018.01.02.06.56.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 02 Jan 2018 06:56:17 -0800 (PST) Date: Tue, 2 Jan 2018 08:56:15 -0600 (CST) From: Christopher Lameter Subject: Re: [RFC 3/8] slub: Add isolate() and migrate() methods In-Reply-To: <20180101212039.GA13116@bombadil.infradead.org> Message-ID: References: <20171227220636.361857279@linux.com> <20171227220652.402842142@linux.com> <20171230064246.GC27959@bombadil.infradead.org> <20180101212039.GA13116@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Mon, 1 Jan 2018, Matthew Wilcox wrote: > I thought of a cute additional slab operation we could define, print(). > We could do something like this ... > > struct page *page = virt_to_head_page(ptr); > if (!PageSlab(page)) > return false; > slab = page->slab_cache; > if (!(slab->flags & SLAB_FLAGS_OPS) || !slab->ops->print) > return false; > slab->ops->print(ptr); > return true; > > and get nice debugging output like we have for VM_BUG_ON_PAGE, only > for any type that's implemented a slab operations vec. Of course, this > won't replace VM_BUG_ON_PAGE because struct pages aren't slab-allocated > (but could we pretend they are?) Cute... 
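A minimal userspace sketch of the print() hook, combined with the ctor/ops union and "use a slab flag to tell you which to use" idea from earlier in this thread, may make the dispatch easier to follow. Everything in it is hypothetical: `struct cache`, `struct cache_ops`, `CACHE_FLAG_OPS`, `cache_run_ctor()` and `cache_print_object()` are illustrative stand-ins, not SLUB's `struct kmem_cache`, `SLAB_FLAGS_OPS` or any existing kernel API. The kernel version would find the cache from the object pointer via `virt_to_head_page()`, `PageSlab()` and `page->slab_cache`; here the caller passes the cache in directly, and a NULL cache stands for "not a slab object".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag playing the role of SLAB_FLAGS_OPS in the proposal. */
#define CACHE_FLAG_OPS 0x1u

/* Hypothetical operations vector; only the hooks this sketch needs. */
struct cache_ops {
	void (*ctor)(void *obj);
	void (*print)(const void *obj);   /* optional debug dumper */
};

/* Toy cache: either a bare constructor (existing users) or an ops vector,
 * sharing storage as in the proposed union; the flag says which is live. */
struct cache {
	unsigned int flags;
	union {
		void (*ctor)(void *obj);
		const struct cache_ops *ops;
	};
};

/* Run whichever constructor is configured, if any. */
static void cache_run_ctor(const struct cache *c, void *obj)
{
	assert(c != NULL);
	if (c->flags & CACHE_FLAG_OPS) {
		if (c->ops->ctor)
			c->ops->ctor(obj);
	} else if (c->ctor) {
		c->ctor(obj);
	}
}

/* Shape of the proposed print() hook: returns false when there is no
 * cache, no ops vector, or no print method, true after printing. */
static bool cache_print_object(const struct cache *c, const void *obj)
{
	if (!c)
		return false;
	if (!(c->flags & CACHE_FLAG_OPS) || !c->ops->print)
		return false;
	c->ops->print(obj);
	return true;
}

/* Example object type that implements both hooks. */
struct point { int x, y; };

static void point_ctor(void *obj)
{
	struct point *p = obj;
	p->x = 0;
	p->y = 0;
}

static void point_print(const void *obj)
{
	const struct point *p = obj;
	printf("point { x = %d, y = %d }\n", p->x, p->y);
}

static const struct cache_ops point_ops = {
	.ctor = point_ctor,
	.print = point_print,
};
```

With this layout, existing constructor-only caches keep working unchanged because the flag stays clear; only caches that opt in to an ops vector gain print() and any future methods, which is roughly the trade-off debated above.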
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f199.google.com (mail-qt0-f199.google.com [209.85.216.199]) by kanga.kvack.org (Postfix) with ESMTP id E562D6B02B6 for ; Tue, 2 Jan 2018 09:58:17 -0500 (EST) Received: by mail-qt0-f199.google.com with SMTP id b26so34774811qtb.18 for ; Tue, 02 Jan 2018 06:58:17 -0800 (PST) Received: from resqmta-ch2-10v.sys.comcast.net (resqmta-ch2-10v.sys.comcast.net. [2001:558:fe21:29:69:252:207:42]) by mx.google.com with ESMTPS id k3si5626745qkd.362.2018.01.02.06.58.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 02 Jan 2018 06:58:17 -0800 (PST) Date: Tue, 2 Jan 2018 08:55:45 -0600 (CST) From: Christopher Lameter Subject: Re: [RFC 3/8] slub: Add isolate() and migrate() methods In-Reply-To: <20171230064246.GC27959@bombadil.infradead.org> Message-ID: References: <20171227220636.361857279@linux.com> <20171227220652.402842142@linux.com> <20171230064246.GC27959@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: linux-mm@kvack.org, Pekka Enberg , akpm@linux-foundation.org, Mel Gorman , andi@firstfloor.org, Rik van Riel , Dave Chinner , Christoph Hellwig On Fri, 29 Dec 2017, Matthew Wilcox wrote: > Is this the right approach? I could imagine there being more ops in > the future. I suspect we should bite the bullet now and do: > > struct kmem_cache_operations { > void (*ctor)(void *); > void *(*isolate)(struct kmem_cache *, void **objs, int nr); > void (*migrate)(struct kmem_cache *, void **objs, int nr, int node, > void *private); > }; Well yes but that would mean converting the existing call sites. > Not sure how best to convert the existing constructor users to this scheme. > Perhaps cheat ...
One of the prior releases of slab defragmentation did this. We could do it at some point. For now the approach avoids changing the API. > > @@ -4969,6 +4987,20 @@ static ssize_t ops_show(struct kmem_cach > > > > if (s->ctor) > > x += sprintf(buf + x, "ctor : %pS\n", s->ctor); > > + > > + if (s->isolate) { > > + x += sprintf(buf + x, "isolate : "); > > + x += sprint_symbol(buf + x, > > + (unsigned long)s->isolate); > > + x += sprintf(buf + x, "\n"); > > + } > > Here you could print the symbol of the ops vector instead of the function > pointer ... Well yes, if we had it, and then we could avoid printing individual fields.
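The two-phase isolate()/migrate() protocol, and why xarray nodes look like a tractable first target compared with inodes and dentries, can be sketched in userspace under heavy assumptions. `struct toy_node`, `toy_isolate()` and `toy_migrate()` are hypothetical; their argument lists loosely follow the `isolate`/`migrate` prototypes quoted in this thread, minus the `struct kmem_cache *` argument, and this is not the patchset's implementation. Each toy object has exactly one external reference (a slot in its parent), so moving it means copying it and rewriting that single pointer. An inode, as Dave Chinner points out earlier in the thread, has many passive references that would all need the same atomic treatment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object that, like an xarray node, is referenced from
 * exactly one place: a slot in its parent. */
struct toy_node {
	struct toy_node **parent_slot;  /* the only external reference */
	int pins;                       /* > 0 while isolated for migration */
	int payload;
};

/* Phase 1: pin each object so it cannot be freed underneath us.
 * Returns private state for migrate(); this toy needs none. */
static void *toy_isolate(void **objs, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct toy_node *n = objs[i];
		n->pins++;
	}
	return NULL;
}

/* Phase 2: move each pinned object to a fresh allocation, repoint the
 * parent slot at the copy, and free the original. objs[] is updated
 * in place to point at the new locations. */
static void toy_migrate(void **objs, int nr, int node, void *private)
{
	(void)node;     /* a real allocator would allocate on this NUMA node */
	(void)private;

	for (int i = 0; i < nr; i++) {
		struct toy_node *old = objs[i];
		struct toy_node *copy;

		assert(old->pins > 0);      /* must have been isolated first */
		copy = malloc(sizeof(*copy));
		if (!copy) {                /* cannot move it: just unpin */
			old->pins--;
			continue;
		}
		memcpy(copy, old, sizeof(*copy));
		copy->pins--;               /* the new copy starts unpinned */
		*copy->parent_slot = copy;  /* the single pointer to rewrite */
		free(old);
		objs[i] = copy;
	}
}
```

The sketch is single-threaded. A real implementation would have to exclude concurrent lookups (for example by holding the tree's lock) while the parent slot is rewritten, and that atomic-replacement step is exactly what becomes hard once an object has more than one external reference.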