* [PATCH -mm 0/3] slub: make dead caches discard free slabs immediately
  From: Vladimir Davydov @ 2015-01-26 12:55 UTC
  To: Andrew Morton
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

Hi,

The kmem extension of the memory cgroup is almost usable now. In fact, only one
serious issue is left: per-memcg kmem caches may pin the owner cgroup
indefinitely. This is because a slab cache may keep empty slab pages in its
private structures to optimize performance, while we take a css reference for
each charged kmem page. The issue is only relevant to SLUB, because SLAB
periodically reaps empty slabs.

This patch set fixes this issue for SLUB. For details, please see patch 3.

Thanks,

Vladimir Davydov (3):
  slub: don't fail kmem_cache_shrink if slab placement optimization fails
  slab: zap kmem_cache_shrink return value
  slub: make dead caches discard free slabs immediately

 include/linux/slab.h |  2 +-
 mm/slab.c            |  9 +++++++--
 mm/slab.h            |  2 +-
 mm/slab_common.c     | 21 +++++++++++++-------
 mm/slob.c            |  3 +--
 mm/slub.c            | 53 +++++++++++++++++++++++++++++++++++---------------
 6 files changed, 61 insertions(+), 29 deletions(-)

-- 
1.7.10.4
* [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails 2015-01-26 12:55 [PATCH -mm 0/3] slub: make dead caches discard free slabs immediately Vladimir Davydov @ 2015-01-26 12:55 ` Vladimir Davydov 2015-01-26 15:48 ` Christoph Lameter 2015-01-26 12:55 ` [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value Vladimir Davydov 2015-01-26 12:55 ` [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately Vladimir Davydov 2 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 12:55 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel SLUB's kmem_cache_shrink not only removes empty slabs from the cache, but also sorts slabs by the number of objects in-use to cope with fragmentation. To achieve that, it tries to allocate a temporary array. If it fails, it will abort the whole procedure. This is unacceptable for kmemcg, where we want to be sure that all empty slabs are removed from the cache on memcg offline, so let's just skip the slab placement optimization step if the allocation fails, but still get rid of empty slabs. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> --- mm/slub.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 5ed1a73e2ec8..770bea3ed445 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3376,12 +3376,19 @@ int __kmem_cache_shrink(struct kmem_cache *s) struct page *page; struct page *t; int objects = oo_objects(s->max); + struct list_head empty_slabs; struct list_head *slabs_by_inuse = kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); unsigned long flags; - if (!slabs_by_inuse) - return -ENOMEM; + if (!slabs_by_inuse) { + /* + * Do not abort if we failed to allocate a temporary array. + * Just skip the slab placement optimization then. + */ + slabs_by_inuse = &empty_slabs; + objects = 1; + } flush_all(s); for_each_kmem_cache_node(s, node, n) { @@ -3400,7 +3407,9 @@ int __kmem_cache_shrink(struct kmem_cache *s) * list_lock. page->inuse here is the upper limit. */ list_for_each_entry_safe(page, t, &n->partial, lru) { - list_move(&page->lru, slabs_by_inuse + page->inuse); + if (page->inuse < objects) + list_move(&page->lru, + slabs_by_inuse + page->inuse); if (!page->inuse) n->nr_partial--; } @@ -3419,7 +3428,8 @@ int __kmem_cache_shrink(struct kmem_cache *s) discard_slab(s, page); } - kfree(slabs_by_inuse); + if (slabs_by_inuse != &empty_slabs) + kfree(slabs_by_inuse); return 0; } -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 24+ messages in thread
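The diff above is easier to follow once the list juggling is seen end to end. Below is a compact userspace model of the shrink pass together with the fallback this patch introduces. Everything in it (struct toy_slab, toy_shrink(), the singly linked lists) is made up for illustration; it mirrors only the policy of the patched __kmem_cache_shrink(), not the kernel's data structures or locking.

	#include <stdio.h>
	#include <stdlib.h>

	struct toy_slab {
		int inuse;			/* objects allocated from this slab */
		struct toy_slab *next;
	};

	/* Model of __kmem_cache_shrink(): free empty slabs, sort the rest by inuse. */
	static struct toy_slab *toy_shrink(struct toy_slab *partial, int objects)
	{
		struct toy_slab **buckets = calloc(objects, sizeof(*buckets));
		struct toy_slab *slab, *next, *out = NULL;

		for (slab = partial; slab; slab = next) {
			next = slab->next;
			if (slab->inuse == 0) {
				free(slab);			/* discard_slab() */
			} else if (buckets) {
				slab->next = buckets[slab->inuse];
				buckets[slab->inuse] = slab;	/* placement optimization */
			} else {
				slab->next = out;		/* allocation failed: keep, unsorted */
				out = slab;
			}
		}

		/* Rebuild the list with the fullest slabs first, least used at the end. */
		for (int i = 1; buckets && i < objects; i++) {
			for (slab = buckets[i]; slab; slab = next) {
				next = slab->next;
				slab->next = out;
				out = slab;
			}
		}
		free(buckets);
		return out;
	}

	int main(void)
	{
		int inuse[] = { 3, 0, 1, 2 };	/* four slabs, 4 objects per slab */
		struct toy_slab *head = NULL, *s;

		for (int i = 0; i < 4; i++) {
			s = malloc(sizeof(*s));
			s->inuse = inuse[i];
			s->next = head;
			head = s;
		}
		for (s = toy_shrink(head, 4); s; s = s->next)
			printf("slab with inuse=%d\n", s->inuse);	/* prints 3, 2, 1 */
		return 0;
	}

The point of the patch shows up in the calloc() failure path: a failed allocation now only costs the sorting step, while the empty slabs are still freed, which is the property memcg needs on cgroup offline.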
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails
  From: Christoph Lameter @ 2015-01-26 15:48 UTC
  To: Vladimir Davydov
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Mon, 26 Jan 2015, Vladimir Davydov wrote:

> SLUB's kmem_cache_shrink not only removes empty slabs from the cache,
> but also sorts slabs by the number of objects in-use to cope with
> fragmentation. To achieve that, it tries to allocate a temporary array.
> If it fails, it will abort the whole procedure.

I do not think it's worth optimizing this. If we cannot allocate even a
small object then the system is in an extremely bad state anyways.

> @@ -3400,7 +3407,9 @@ int __kmem_cache_shrink(struct kmem_cache *s)
>  	 * list_lock. page->inuse here is the upper limit.
>  	 */
>  	list_for_each_entry_safe(page, t, &n->partial, lru) {
> -		list_move(&page->lru, slabs_by_inuse + page->inuse);
> +		if (page->inuse < objects)
> +			list_move(&page->lru,
> +				  slabs_by_inuse + page->inuse);
>  		if (!page->inuse)
>  			n->nr_partial--;
>  	}

The condition is always true. A page that has page->inuse == objects
would not be on the partial list.
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails 2015-01-26 15:48 ` Christoph Lameter @ 2015-01-26 17:01 ` Vladimir Davydov 2015-01-26 18:24 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 17:01 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel Hi Christoph, On Mon, Jan 26, 2015 at 09:48:00AM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > SLUB's kmem_cache_shrink not only removes empty slabs from the cache, > > but also sorts slabs by the number of objects in-use to cope with > > fragmentation. To achieve that, it tries to allocate a temporary array. > > If it fails, it will abort the whole procedure. > > I do not think its worth optimizing this. If we cannot allocate even a > small object then the system is in an extremely bad state anyways. Hmm, I've just checked my /proc/slabinfo and seen that I have 512 objects per slab at max, so that the temporary array will be 2 pages at max. So you're right - this kmalloc will never fail on my system, simply because we never fail GFP_KERNEL allocations of order < 3. However, theoretically we can have as much as MAX_OBJS_PER_PAGE=32767 objects per slab, which would result in a huge allocation. Anyways, I think that silently relying on the fact that the allocator never fails small allocations is kind of unreliable. What if this behavior will change one day? So I'd prefer to either make kmem_cache_shrink fall back to using a variable on stack in case of the kmalloc failure, like this patch does, or place an explicit BUG_ON after it. The latter looks dangerous to me, because, as I mentioned above, I'm not sure that we always have less than 2048 objects per slab. > > > @@ -3400,7 +3407,9 @@ int __kmem_cache_shrink(struct kmem_cache *s) > > * list_lock. page->inuse here is the upper limit. > > */ > > list_for_each_entry_safe(page, t, &n->partial, lru) { > > - list_move(&page->lru, slabs_by_inuse + page->inuse); > > + if (page->inuse < objects) > > + list_move(&page->lru, > > + slabs_by_inuse + page->inuse); > > if (!page->inuse) > > n->nr_partial--; > > } > > The condition is always true. A page that has page->inuse == objects > would not be on the partial list. > This is in case we failed to allocate the slabs_by_inuse array. We only have a list for empty slabs then (on stack). Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
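To put numbers on the sizes discussed above, here is a back-of-the-envelope check under the stated assumptions of a 16-byte struct list_head on a 64-bit kernel and 4 KB pages (this is illustrative arithmetic, not kernel output):

	#include <stdio.h>

	int main(void)
	{
		unsigned long list_head_size = 16;	/* two pointers on 64-bit */
		unsigned long page_size = 4096;
		unsigned long objs[] = { 512, 32767 };	/* typical max vs. MAX_OBJS_PER_PAGE */

		for (int i = 0; i < 2; i++) {
			unsigned long bytes = objs[i] * list_head_size;

			printf("%5lu objects -> %6lu bytes (%lu pages)\n", objs[i],
			       bytes, (bytes + page_size - 1) / page_size);
		}
		return 0;
	}

That works out to 8 KB (a small-order GFP_KERNEL allocation that in practice does not fail) for the common 512-object case, versus roughly half a megabyte for the theoretical MAX_OBJS_PER_PAGE case, which is well outside the small-order range the "never fails" assumption covers.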
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails 2015-01-26 17:01 ` Vladimir Davydov @ 2015-01-26 18:24 ` Christoph Lameter 2015-01-26 19:36 ` Vladimir Davydov 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2015-01-26 18:24 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, 26 Jan 2015, Vladimir Davydov wrote: > Anyways, I think that silently relying on the fact that the allocator > never fails small allocations is kind of unreliable. What if this We are not doing that though. If the allocation fails we do nothing. > > > + if (page->inuse < objects) > > > + list_move(&page->lru, > > > + slabs_by_inuse + page->inuse); > > > if (!page->inuse) > > > n->nr_partial--; > > > } > > > > The condition is always true. A page that has page->inuse == objects > > would not be on the partial list. > > > > This is in case we failed to allocate the slabs_by_inuse array. We only > have a list for empty slabs then (on stack). Ok in that case objects == 1. If you want to do this maybe do it in a more general way? You could allocate an array on the stack to deal with the common cases. I believe an array of 32 objects would be fine to allocate and cover most of the slab caches on the system? Would eliminate most of the allocations in kmem_cache_shrink. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails 2015-01-26 18:24 ` Christoph Lameter @ 2015-01-26 19:36 ` Vladimir Davydov 2015-01-26 19:53 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 19:36 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 12:24:49PM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > Anyways, I think that silently relying on the fact that the allocator > > never fails small allocations is kind of unreliable. What if this > > We are not doing that though. If the allocation fails we do nothing. Yeah, that's correct, but memcg/kmem wants it to always free empty slabs (see patch 3 for details), so I'm trying to be punctual and eliminate any possibility of failure, because a failure (if it ever happened) would result in a permanent memory leak (pinned mem_cgroup + its kmem_caches). > > > > > + if (page->inuse < objects) > > > > + list_move(&page->lru, > > > > + slabs_by_inuse + page->inuse); > > > > if (!page->inuse) > > > > n->nr_partial--; > > > > } > > > > > > The condition is always true. A page that has page->inuse == objects > > > would not be on the partial list. > > > > > > > This is in case we failed to allocate the slabs_by_inuse array. We only > > have a list for empty slabs then (on stack). > > Ok in that case objects == 1. If you want to do this maybe do it in a more > general way? > > You could allocate an array on the stack to deal with the common cases. I > believe an array of 32 objects would be fine to allocate and cover most of > the slab caches on the system? Would eliminate most of the allocations in > kmem_cache_shrink. We could do that, but IMO that would only complicate the code w/o yielding any real benefits. This function is slow and called rarely anyway, so I don't think there is any point to optimize out a page allocation here. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails
  From: Christoph Lameter @ 2015-01-26 19:53 UTC
  To: Vladimir Davydov
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Mon, 26 Jan 2015, Vladimir Davydov wrote:

> We could do that, but IMO that would only complicate the code w/o
> yielding any real benefits. This function is slow and called rarely
> anyway, so I don't think there is any point to optimize out a page
> allocation here.

I think you already have the code there. Simply allow the sizing of the
empty_page[] array. And rename it.
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails 2015-01-26 19:53 ` Christoph Lameter @ 2015-01-27 12:58 ` Vladimir Davydov 2015-01-27 17:02 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-27 12:58 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 01:53:32PM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > We could do that, but IMO that would only complicate the code w/o > > yielding any real benefits. This function is slow and called rarely > > anyway, so I don't think there is any point to optimize out a page > > allocation here. > > I think you already have the code there. Simply allow the sizeing of the > empty_page[] array. And rename it. > May be, we could remove this allocation at all then? I mean, always distribute slabs among constant number of buckets, say 32, like this: diff --git a/mm/slub.c b/mm/slub.c index 5ed1a73e2ec8..a43b213770b4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3358,6 +3358,8 @@ void kfree(const void *x) } EXPORT_SYMBOL(kfree); +#define SHRINK_BUCKETS 32 + /* * kmem_cache_shrink removes empty slabs from the partial lists and sorts * the remaining slabs by the number of items in use. The slabs with the @@ -3376,19 +3378,15 @@ int __kmem_cache_shrink(struct kmem_cache *s) struct page *page; struct page *t; int objects = oo_objects(s->max); - struct list_head *slabs_by_inuse = - kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); + struct list_head slabs_by_inuse[SHRINK_BUCKETS]; unsigned long flags; - if (!slabs_by_inuse) - return -ENOMEM; - flush_all(s); for_each_kmem_cache_node(s, node, n) { if (!n->nr_partial) continue; - for (i = 0; i < objects; i++) + for (i = 0; i < SHRINK_BUCKETS; i++) INIT_LIST_HEAD(slabs_by_inuse + i); spin_lock_irqsave(&n->list_lock, flags); @@ -3400,7 +3398,9 @@ int __kmem_cache_shrink(struct kmem_cache *s) * list_lock. page->inuse here is the upper limit. */ list_for_each_entry_safe(page, t, &n->partial, lru) { - list_move(&page->lru, slabs_by_inuse + page->inuse); + i = DIV_ROUND_UP(page->inuse * (SHRINK_BUCKETS - 1), + objects); + list_move(&page->lru, slabs_by_inuse + i); if (!page->inuse) n->nr_partial--; } @@ -3409,7 +3409,7 @@ int __kmem_cache_shrink(struct kmem_cache *s) * Rebuild the partial list with the slabs filled up most * first and the least used slabs at the end. */ - for (i = objects - 1; i > 0; i--) + for (i = SHRINK_BUCKETS - 1; i > 0; i--) list_splice(slabs_by_inuse + i, n->partial.prev); spin_unlock_irqrestore(&n->list_lock, flags); @@ -3419,7 +3419,6 @@ int __kmem_cache_shrink(struct kmem_cache *s) discard_slab(s, page); } - kfree(slabs_by_inuse); return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 24+ messages in thread
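For reference, the DIV_ROUND_UP() mapping in the diff above spreads the inuse range over the 32 buckets as follows; objects = 512 is picked purely as an example of the formula, nothing more:

	#include <stdio.h>

	#define SHRINK_BUCKETS	32
	#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

	int main(void)
	{
		int objects = 512;
		int samples[] = { 0, 1, 16, 17, 256, 511 };

		for (int i = 0; i < 6; i++)
			printf("inuse=%3d -> bucket %2d\n", samples[i],
			       DIV_ROUND_UP(samples[i] * (SHRINK_BUCKETS - 1), objects));
		return 0;
	}

This prints buckets 0, 1, 1, 2, 16 and 31 for the sample values. Empty slabs stay in bucket 0 and are discarded; everything else lands in buckets 1..31, so the sort becomes coarser for large slabs but never needs a temporary allocation.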
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails
  From: Christoph Lameter @ 2015-01-27 17:02 UTC
  To: Vladimir Davydov
  Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

On Tue, 27 Jan 2015, Vladimir Davydov wrote:

> May be, we could remove this allocation at all then? I mean, always
> distribute slabs among constant number of buckets, say 32, like this:

The point of the sorting is to have the slab pages that only have a few
objects available at the beginning of the list. Allocations can then
easily reduce the size of the partial page list.

What you could do is simply put all slab pages with more than 32 objects
available at the end of the list.
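A sketch of this suggestion, building on the toy_slab model shown after patch 1 (only the policy is meant to be accurate: the buckets are keyed on free objects rather than inuse, the constant 32 and the helper name are illustrative):

	#define NR_SHRINK_BUCKETS 32

	static struct toy_slab *toy_shrink_fixed(struct toy_slab *partial, int objects)
	{
		/* buckets[f - 1] holds slabs with exactly f free objects */
		struct toy_slab *buckets[NR_SHRINK_BUCKETS] = { NULL };
		struct toy_slab *unsorted = NULL, *out, *slab, *next;

		for (slab = partial; slab; slab = next) {
			next = slab->next;
			int free_objs = objects - slab->inuse;

			if (free_objs == objects) {
				free(slab);				/* empty: discard */
			} else if (free_objs <= NR_SHRINK_BUCKETS) {
				slab->next = buckets[free_objs - 1];	/* nearly full: sort */
				buckets[free_objs - 1] = slab;
			} else {
				slab->next = unsorted;			/* mostly free: keep, unsorted */
				unsorted = slab;
			}
		}

		/* Fullest slabs first, then everything with more than 32 objects free. */
		out = unsorted;
		for (int i = NR_SHRINK_BUCKETS - 1; i >= 0; i--) {
			for (slab = buckets[i]; slab; slab = next) {
				next = slab->next;
				slab->next = out;
				out = slab;
			}
		}
		return out;
	}

The on-stack array stays 32 entries regardless of the cache's object count, so the kmalloc() and its failure path disappear entirely; this is the direction Vladimir agrees to pursue in the follow-up below.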
* Re: [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails 2015-01-27 17:02 ` Christoph Lameter @ 2015-01-28 15:00 ` Vladimir Davydov 0 siblings, 0 replies; 24+ messages in thread From: Vladimir Davydov @ 2015-01-28 15:00 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Tue, Jan 27, 2015 at 11:02:12AM -0600, Christoph Lameter wrote: > What you could do is simply put all slab pages with more than 32 objects > available at the end of the list. OK, got it, will redo. Thanks! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 12:55 [PATCH -mm 0/3] slub: make dead caches discard free slabs immediately Vladimir Davydov 2015-01-26 12:55 ` [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails Vladimir Davydov @ 2015-01-26 12:55 ` Vladimir Davydov 2015-01-26 15:49 ` Christoph Lameter 2015-01-26 12:55 ` [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately Vladimir Davydov 2 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 12:55 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel The kmem_cache_shrink() return value is inconsistent: for SLAB it returns 0 iff the cache is empty, while for SLUB and SLOB it always returns 0. So let's zap it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> --- include/linux/slab.h | 2 +- mm/slab.c | 9 +++++++-- mm/slab.h | 2 +- mm/slab_common.c | 8 ++------ mm/slob.c | 3 +-- mm/slub.c | 12 ++++-------- 6 files changed, 16 insertions(+), 20 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index ed2ffaab59ea..18430ed916b1 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -116,7 +116,7 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t, unsigned long, void (*)(void *)); void kmem_cache_destroy(struct kmem_cache *); -int kmem_cache_shrink(struct kmem_cache *); +void kmem_cache_shrink(struct kmem_cache *); void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *); void memcg_deactivate_kmem_caches(struct mem_cgroup *); diff --git a/mm/slab.c b/mm/slab.c index 7894017bc160..279c44d6d8e1 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2382,7 +2382,7 @@ out: return nr_freed; } -int __kmem_cache_shrink(struct kmem_cache *cachep) +static int __cache_shrink(struct kmem_cache *cachep) { int ret = 0; int node; @@ -2400,11 +2400,16 @@ int __kmem_cache_shrink(struct kmem_cache *cachep) return (ret ? 1 : 0); } +void __kmem_cache_shrink(struct kmem_cache *cachep) +{ + __cache_shrink(cachep); +} + int __kmem_cache_shutdown(struct kmem_cache *cachep) { int i; struct kmem_cache_node *n; - int rc = __kmem_cache_shrink(cachep); + int rc = __cache_shrink(cachep); if (rc) return rc; diff --git a/mm/slab.h b/mm/slab.h index 0a56d76ac0e9..c036e520d2cf 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -138,7 +138,7 @@ static inline unsigned long kmem_cache_flags(unsigned long object_size, #define CACHE_CREATE_MASK (SLAB_CORE_FLAGS | SLAB_DEBUG_FLAGS | SLAB_CACHE_FLAGS) int __kmem_cache_shutdown(struct kmem_cache *); -int __kmem_cache_shrink(struct kmem_cache *); +void __kmem_cache_shrink(struct kmem_cache *); void slab_kmem_cache_release(struct kmem_cache *); struct seq_file; diff --git a/mm/slab_common.c b/mm/slab_common.c index 0dd9eb4e0f87..6803639fdff0 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -641,18 +641,14 @@ EXPORT_SYMBOL(kmem_cache_destroy); * @cachep: The cache to shrink. * * Releases as many slabs as possible for a cache. - * To help debugging, a zero exit status indicates all slabs were released. 
*/ -int kmem_cache_shrink(struct kmem_cache *cachep) +void kmem_cache_shrink(struct kmem_cache *cachep) { - int ret; - get_online_cpus(); get_online_mems(); - ret = __kmem_cache_shrink(cachep); + __kmem_cache_shrink(cachep); put_online_mems(); put_online_cpus(); - return ret; } EXPORT_SYMBOL(kmem_cache_shrink); diff --git a/mm/slob.c b/mm/slob.c index 96a86206a26b..043a14b6ccbe 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -618,9 +618,8 @@ int __kmem_cache_shutdown(struct kmem_cache *c) return 0; } -int __kmem_cache_shrink(struct kmem_cache *d) +void __kmem_cache_shrink(struct kmem_cache *c) { - return 0; } struct kmem_cache kmem_cache_boot = { diff --git a/mm/slub.c b/mm/slub.c index 770bea3ed445..c09d93dde40e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3368,7 +3368,7 @@ EXPORT_SYMBOL(kfree); * being allocated from last increasing the chance that the last objects * are freed in them. */ -int __kmem_cache_shrink(struct kmem_cache *s) +void __kmem_cache_shrink(struct kmem_cache *s) { int node; int i; @@ -3430,7 +3430,6 @@ int __kmem_cache_shrink(struct kmem_cache *s) if (slabs_by_inuse != &empty_slabs) kfree(slabs_by_inuse); - return 0; } static int slab_mem_going_offline_callback(void *arg) @@ -4696,12 +4695,9 @@ static ssize_t shrink_show(struct kmem_cache *s, char *buf) static ssize_t shrink_store(struct kmem_cache *s, const char *buf, size_t length) { - if (buf[0] == '1') { - int rc = kmem_cache_shrink(s); - - if (rc) - return rc; - } else + if (buf[0] == '1') + kmem_cache_shrink(s); + else return -EINVAL; return length; } -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 12:55 ` [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value Vladimir Davydov @ 2015-01-26 15:49 ` Christoph Lameter 2015-01-26 17:04 ` Vladimir Davydov 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2015-01-26 15:49 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, 26 Jan 2015, Vladimir Davydov wrote: > @@ -2400,11 +2400,16 @@ int __kmem_cache_shrink(struct kmem_cache *cachep) > return (ret ? 1 : 0); > } > > +void __kmem_cache_shrink(struct kmem_cache *cachep) > +{ > + __cache_shrink(cachep); > +} > + Why do we need this wrapper? Rename __cache_shrink to __kmem_cache_shrink instead? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 15:49 ` Christoph Lameter @ 2015-01-26 17:04 ` Vladimir Davydov 2015-01-26 18:26 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 17:04 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 09:49:47AM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > @@ -2400,11 +2400,16 @@ int __kmem_cache_shrink(struct kmem_cache *cachep) > > return (ret ? 1 : 0); > > } > > > > +void __kmem_cache_shrink(struct kmem_cache *cachep) > > +{ > > + __cache_shrink(cachep); > > +} > > + > > Why do we need this wrapper? Rename __cache_shrink to __kmem_cache_shrink > instead? > __cache_shrink() is used not only in __kmem_cache_shrink(), but also in SLAB's __kmem_cache_shutdown(), where we do need its return value to check if the cache is empty. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 17:04 ` Vladimir Davydov @ 2015-01-26 18:26 ` Christoph Lameter 2015-01-26 19:48 ` Vladimir Davydov 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2015-01-26 18:26 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, 26 Jan 2015, Vladimir Davydov wrote: > __cache_shrink() is used not only in __kmem_cache_shrink(), but also in > SLAB's __kmem_cache_shutdown(), where we do need its return value to > check if the cache is empty. It could be useful to know if a slab is empty. So maybe leave kmem_cache_shrink the way it is and instead fix up slub to return the proper value? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 18:26 ` Christoph Lameter @ 2015-01-26 19:48 ` Vladimir Davydov 2015-01-26 19:55 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 19:48 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 12:26:57PM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > __cache_shrink() is used not only in __kmem_cache_shrink(), but also in > > SLAB's __kmem_cache_shutdown(), where we do need its return value to > > check if the cache is empty. > > It could be useful to know if a slab is empty. So maybe leave > kmem_cache_shrink the way it is and instead fix up slub to return the > proper value? Hmm, why? The return value has existed since this function was introduced, but nobody seems to have ever used it outside the slab core. Besides, this check is racy, so IMO we shouldn't encourage users of the API to rely on it. That said, I believe we should drop the return value for now. If anybody ever needs it, we can reintroduce it. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 19:48 ` Vladimir Davydov @ 2015-01-26 19:55 ` Christoph Lameter 2015-01-26 20:16 ` Vladimir Davydov 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2015-01-26 19:55 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, 26 Jan 2015, Vladimir Davydov wrote: > Hmm, why? The return value has existed since this function was > introduced, but nobody seems to have ever used it outside the slab core. > Besides, this check is racy, so IMO we shouldn't encourage users of the > API to rely on it. That said, I believe we should drop the return value > for now. If anybody ever needs it, we can reintroduce it. The check is only racy if you have concurrent users. It is not racy if a subsystem shuts down access to the slabs and then checks if everything is clean before closing the cache. Slab creation and destruction are not serialized. It is the responsibility of the subsystem to make sure that there are no concurrent users and that there are no objects remaining before destroying a slab. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 19:55 ` Christoph Lameter @ 2015-01-26 20:16 ` Vladimir Davydov 2015-01-26 20:28 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 20:16 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 01:55:14PM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > Hmm, why? The return value has existed since this function was > > introduced, but nobody seems to have ever used it outside the slab core. > > Besides, this check is racy, so IMO we shouldn't encourage users of the > > API to rely on it. That said, I believe we should drop the return value > > for now. If anybody ever needs it, we can reintroduce it. > > The check is only racy if you have concurrent users. It is not racy if a > subsystem shuts down access to the slabs and then checks if everything is > clean before closing the cache. > > Slab creation and destruction are not serialized. It is the responsibility > of the subsystem to make sure that there are no concurrent users and that > there are no objects remaining before destroying a slab. Right, but I just don't see why a subsystem using a kmem_cache would need to check whether there are any objects left in the cache. I mean, it should somehow keep track of the objects it's allocated anyway, e.g. by linking them in a list. That means it must already have a way to check if it is safe to destroy its cache or not. Suppose we leave the return value as is. A subsystem, right before going to destroy a cache, calls kmem_cache_shrink, which returns 1 (slab is not empty). What is it supposed to do then? Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
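The pattern being described here, a subsystem that can tell by itself whether its cache still has live objects, looks roughly like the following sketch. The "foo" subsystem, its object layout and the cache name are hypothetical, and locking is omitted for brevity; only the standard slab API calls are real.

	#include <linux/slab.h>
	#include <linux/list.h>

	struct foo {				/* hypothetical subsystem object */
		struct list_head link;
		int payload;
	};

	static struct kmem_cache *foo_cachep;
	static LIST_HEAD(foo_objects);

	static int foo_init(void)
	{
		foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0, 0, NULL);
		return foo_cachep ? 0 : -ENOMEM;
	}

	static struct foo *foo_alloc(void)
	{
		struct foo *f = kmem_cache_alloc(foo_cachep, GFP_KERNEL);

		if (f)
			list_add(&f->link, &foo_objects);
		return f;
	}

	static void foo_free(struct foo *f)
	{
		list_del(&f->link);
		kmem_cache_free(foo_cachep, f);
	}

	static void foo_exit(void)
	{
		/* The subsystem already knows whether any objects are still live, */
		WARN_ON(!list_empty(&foo_objects));
		/* so it has no use for a return value from kmem_cache_shrink(). */
		kmem_cache_destroy(foo_cachep);
	}

A subsystem without such bookkeeping, on the other hand, is exactly the case Christoph raises next: it may want the slab allocator itself to report whether anything is left.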
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 20:16 ` Vladimir Davydov @ 2015-01-26 20:28 ` Christoph Lameter 2015-01-26 20:43 ` Vladimir Davydov 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2015-01-26 20:28 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, 26 Jan 2015, Vladimir Davydov wrote: > Right, but I just don't see why a subsystem using a kmem_cache would > need to check whether there are any objects left in the cache. I mean, > it should somehow keep track of the objects it's allocated anyway, e.g. > by linking them in a list. That means it must already have a way to > check if it is safe to destroy its cache or not. The acpi subsystem did that at some point. > Suppose we leave the return value as is. A subsystem, right before going > to destroy a cache, calls kmem_cache_shrink, which returns 1 (slab is > not empty). What is it supposed to do then? That is up to the subsystem. If it has a means of tracking down the missing object then it can deal with it. If not then it cannot shutdown the cache and do a proper recovery action. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value 2015-01-26 20:28 ` Christoph Lameter @ 2015-01-26 20:43 ` Vladimir Davydov 0 siblings, 0 replies; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 20:43 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 02:28:33PM -0600, Christoph Lameter wrote: > On Mon, 26 Jan 2015, Vladimir Davydov wrote: > > > Right, but I just don't see why a subsystem using a kmem_cache would > > need to check whether there are any objects left in the cache. I mean, > > it should somehow keep track of the objects it's allocated anyway, e.g. > > by linking them in a list. That means it must already have a way to > > check if it is safe to destroy its cache or not. > > The acpi subsystem did that at some point. > > > Suppose we leave the return value as is. A subsystem, right before going > > to destroy a cache, calls kmem_cache_shrink, which returns 1 (slab is > > not empty). What is it supposed to do then? > > That is up to the subsystem. If it has a means of tracking down the > missing object then it can deal with it. If not then it cannot shutdown > the cache and do a proper recovery action. Hmm, we could make kmem_cache_destroy return EBUSY for the purpose. However, since it spits warnings on failure, which is reasonable, we have this check in kmem_cache_shrink... Anyways, I see your point now, thank you for pointing it out. I will fix SLUB's __kmem_cache_shrink retval instead of removing it altogether in the next iteration. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately 2015-01-26 12:55 [PATCH -mm 0/3] slub: make dead caches discard free slabs immediately Vladimir Davydov 2015-01-26 12:55 ` [PATCH -mm 1/3] slub: don't fail kmem_cache_shrink if slab placement optimization fails Vladimir Davydov 2015-01-26 12:55 ` [PATCH -mm 2/3] slab: zap kmem_cache_shrink return value Vladimir Davydov @ 2015-01-26 12:55 ` Vladimir Davydov 2015-01-27 8:00 ` Joonsoo Kim 2 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-26 12:55 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel To speed up further allocations SLUB may store empty slabs in per cpu/node partial lists instead of freeing them immediately. This prevents per memcg caches destruction, because kmem caches created for a memory cgroup are only destroyed after the last page charged to the cgroup is freed. To fix this issue, this patch resurrects approach first proposed in [1]. It forbids SLUB to cache empty slabs after the memory cgroup that the cache belongs to was destroyed. It is achieved by setting kmem_cache's cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so that it would drop frozen empty slabs immediately if cpu_partial = 0. The runtime overhead is minimal. From all the hot functions, we only touch relatively cold put_cpu_partial(): we make it call unfreeze_partials() after freezing a slab that belongs to an offline memory cgroup. Since slab freezing exists to avoid moving slabs from/to a partial list on free/alloc, and there can't be allocations from dead caches, it shouldn't cause any overhead. We do have to disable preemption for put_cpu_partial() to achieve that though. The original patch was accepted well and even merged to the mm tree. However, I decided to withdraw it due to changes happening to the memcg core at that time. I had an idea of introducing per-memcg shrinkers for kmem caches, but now, as memcg has finally settled down, I do not see it as an option, because SLUB shrinker would be too costly to call since SLUB does not keep free slabs on a separate list. Besides, we currently do not even call per-memcg shrinkers for offline memcgs. Overall, it would introduce much more complexity to both SLUB and memcg than this small patch. Regarding to SLAB, there's no problem with it, because it shrinks per-cpu/node caches periodically. Thanks to list_lru reparenting, we no longer keep entries for offline cgroups in per-memcg arrays (such as memcg_cache_params->memcg_caches), so we do not have to bother if a per-memcg cache will be shrunk a bit later than it could be. [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650 Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> --- mm/slab.c | 2 +- mm/slab.h | 2 +- mm/slab_common.c | 15 +++++++++++++-- mm/slob.c | 2 +- mm/slub.c | 25 ++++++++++++++++++++----- 5 files changed, 36 insertions(+), 10 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 279c44d6d8e1..f0514df07b85 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2400,7 +2400,7 @@ static int __cache_shrink(struct kmem_cache *cachep) return (ret ? 
1 : 0); } -void __kmem_cache_shrink(struct kmem_cache *cachep) +void __kmem_cache_shrink(struct kmem_cache *cachep, bool deactivate) { __cache_shrink(cachep); } diff --git a/mm/slab.h b/mm/slab.h index c036e520d2cf..041260197984 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -138,7 +138,7 @@ static inline unsigned long kmem_cache_flags(unsigned long object_size, #define CACHE_CREATE_MASK (SLAB_CORE_FLAGS | SLAB_DEBUG_FLAGS | SLAB_CACHE_FLAGS) int __kmem_cache_shutdown(struct kmem_cache *); -void __kmem_cache_shrink(struct kmem_cache *); +void __kmem_cache_shrink(struct kmem_cache *, bool); void slab_kmem_cache_release(struct kmem_cache *); struct seq_file; diff --git a/mm/slab_common.c b/mm/slab_common.c index 6803639fdff0..472ab7fcffd4 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -549,10 +549,13 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg) { int idx; struct memcg_cache_array *arr; - struct kmem_cache *s; + struct kmem_cache *s, *c; idx = memcg_cache_id(memcg); + get_online_cpus(); + get_online_mems(); + mutex_lock(&slab_mutex); list_for_each_entry(s, &slab_caches, list) { if (!is_root_cache(s)) @@ -560,9 +563,17 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg) arr = rcu_dereference_protected(s->memcg_params.memcg_caches, lockdep_is_held(&slab_mutex)); + c = arr->entries[idx]; + if (!c) + continue; + + __kmem_cache_shrink(c, true); arr->entries[idx] = NULL; } mutex_unlock(&slab_mutex); + + put_online_mems(); + put_online_cpus(); } void memcg_destroy_kmem_caches(struct mem_cgroup *memcg) @@ -646,7 +657,7 @@ void kmem_cache_shrink(struct kmem_cache *cachep) { get_online_cpus(); get_online_mems(); - __kmem_cache_shrink(cachep); + __kmem_cache_shrink(cachep, false); put_online_mems(); put_online_cpus(); } diff --git a/mm/slob.c b/mm/slob.c index 043a14b6ccbe..e63ff9d926dc 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -618,7 +618,7 @@ int __kmem_cache_shutdown(struct kmem_cache *c) return 0; } -void __kmem_cache_shrink(struct kmem_cache *c) +void __kmem_cache_shrink(struct kmem_cache *c, bool deactivate) { } diff --git a/mm/slub.c b/mm/slub.c index c09d93dde40e..6f57824af019 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2007,6 +2007,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain) int pages; int pobjects; + preempt_disable(); do { pages = 0; pobjects = 0; @@ -2040,6 +2041,14 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain) } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage); + if (unlikely(!s->cpu_partial)) { + unsigned long flags; + + local_irq_save(flags); + unfreeze_partials(s, this_cpu_ptr(s->cpu_slab)); + local_irq_restore(flags); + } + preempt_enable(); #endif } @@ -3368,7 +3377,7 @@ EXPORT_SYMBOL(kfree); * being allocated from last increasing the chance that the last objects * are freed in them. */ -void __kmem_cache_shrink(struct kmem_cache *s) +void __kmem_cache_shrink(struct kmem_cache *s, bool deactivate) { int node; int i; @@ -3381,6 +3390,15 @@ void __kmem_cache_shrink(struct kmem_cache *s) kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); unsigned long flags; + if (deactivate) { + /* + * Disable empty slabs caching. Used to avoid pinning offline + * memory cgroups by freeable kmem pages. + */ + s->cpu_partial = 0; + s->min_partial = 0; + } + if (!slabs_by_inuse) { /* * Do not abort if we failed to allocate a temporary array. 
@@ -3392,9 +3410,6 @@ void __kmem_cache_shrink(struct kmem_cache *s) flush_all(s); for_each_kmem_cache_node(s, node, n) { - if (!n->nr_partial) - continue; - for (i = 0; i < objects; i++) INIT_LIST_HEAD(slabs_by_inuse + i); @@ -3438,7 +3453,7 @@ static int slab_mem_going_offline_callback(void *arg) mutex_lock(&slab_mutex); list_for_each_entry(s, &slab_caches, list) - __kmem_cache_shrink(s); + __kmem_cache_shrink(s, false); mutex_unlock(&slab_mutex); return 0; -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately 2015-01-26 12:55 ` [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately Vladimir Davydov @ 2015-01-27 8:00 ` Joonsoo Kim 2015-01-27 8:23 ` Vladimir Davydov 0 siblings, 1 reply; 24+ messages in thread From: Joonsoo Kim @ 2015-01-27 8:00 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel On Mon, Jan 26, 2015 at 03:55:29PM +0300, Vladimir Davydov wrote: > To speed up further allocations SLUB may store empty slabs in per > cpu/node partial lists instead of freeing them immediately. This > prevents per memcg caches destruction, because kmem caches created for a > memory cgroup are only destroyed after the last page charged to the > cgroup is freed. > > To fix this issue, this patch resurrects approach first proposed in [1]. > It forbids SLUB to cache empty slabs after the memory cgroup that the > cache belongs to was destroyed. It is achieved by setting kmem_cache's > cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() > so that it would drop frozen empty slabs immediately if cpu_partial = 0. > > The runtime overhead is minimal. From all the hot functions, we only > touch relatively cold put_cpu_partial(): we make it call > unfreeze_partials() after freezing a slab that belongs to an offline > memory cgroup. Since slab freezing exists to avoid moving slabs from/to > a partial list on free/alloc, and there can't be allocations from dead > caches, it shouldn't cause any overhead. We do have to disable > preemption for put_cpu_partial() to achieve that though. > > The original patch was accepted well and even merged to the mm tree. > However, I decided to withdraw it due to changes happening to the memcg > core at that time. I had an idea of introducing per-memcg shrinkers for > kmem caches, but now, as memcg has finally settled down, I do not see it > as an option, because SLUB shrinker would be too costly to call since > SLUB does not keep free slabs on a separate list. Besides, we currently > do not even call per-memcg shrinkers for offline memcgs. Overall, it > would introduce much more complexity to both SLUB and memcg than this > small patch. > > Regarding to SLAB, there's no problem with it, because it shrinks > per-cpu/node caches periodically. Thanks to list_lru reparenting, we no > longer keep entries for offline cgroups in per-memcg arrays (such as > memcg_cache_params->memcg_caches), so we do not have to bother if a > per-memcg cache will be shrunk a bit later than it could be. > > [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650 > > Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> > --- > mm/slab.c | 2 +- > mm/slab.h | 2 +- > mm/slab_common.c | 15 +++++++++++++-- > mm/slob.c | 2 +- > mm/slub.c | 25 ++++++++++++++++++++----- > 5 files changed, 36 insertions(+), 10 deletions(-) > > diff --git a/mm/slab.c b/mm/slab.c > index 279c44d6d8e1..f0514df07b85 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -2400,7 +2400,7 @@ static int __cache_shrink(struct kmem_cache *cachep) > return (ret ? 
1 : 0); > } > > -void __kmem_cache_shrink(struct kmem_cache *cachep) > +void __kmem_cache_shrink(struct kmem_cache *cachep, bool deactivate) > { > __cache_shrink(cachep); > } > diff --git a/mm/slab.h b/mm/slab.h > index c036e520d2cf..041260197984 100644 > --- a/mm/slab.h > +++ b/mm/slab.h > @@ -138,7 +138,7 @@ static inline unsigned long kmem_cache_flags(unsigned long object_size, > #define CACHE_CREATE_MASK (SLAB_CORE_FLAGS | SLAB_DEBUG_FLAGS | SLAB_CACHE_FLAGS) > > int __kmem_cache_shutdown(struct kmem_cache *); > -void __kmem_cache_shrink(struct kmem_cache *); > +void __kmem_cache_shrink(struct kmem_cache *, bool); > void slab_kmem_cache_release(struct kmem_cache *); > > struct seq_file; > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 6803639fdff0..472ab7fcffd4 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -549,10 +549,13 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg) > { > int idx; > struct memcg_cache_array *arr; > - struct kmem_cache *s; > + struct kmem_cache *s, *c; > > idx = memcg_cache_id(memcg); > > + get_online_cpus(); > + get_online_mems(); > + > mutex_lock(&slab_mutex); > list_for_each_entry(s, &slab_caches, list) { > if (!is_root_cache(s)) > @@ -560,9 +563,17 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg) > > arr = rcu_dereference_protected(s->memcg_params.memcg_caches, > lockdep_is_held(&slab_mutex)); > + c = arr->entries[idx]; > + if (!c) > + continue; > + > + __kmem_cache_shrink(c, true); > arr->entries[idx] = NULL; > } > mutex_unlock(&slab_mutex); > + > + put_online_mems(); > + put_online_cpus(); > } > > void memcg_destroy_kmem_caches(struct mem_cgroup *memcg) > @@ -646,7 +657,7 @@ void kmem_cache_shrink(struct kmem_cache *cachep) > { > get_online_cpus(); > get_online_mems(); > - __kmem_cache_shrink(cachep); > + __kmem_cache_shrink(cachep, false); > put_online_mems(); > put_online_cpus(); > } > diff --git a/mm/slob.c b/mm/slob.c > index 043a14b6ccbe..e63ff9d926dc 100644 > --- a/mm/slob.c > +++ b/mm/slob.c > @@ -618,7 +618,7 @@ int __kmem_cache_shutdown(struct kmem_cache *c) > return 0; > } > > -void __kmem_cache_shrink(struct kmem_cache *c) > +void __kmem_cache_shrink(struct kmem_cache *c, bool deactivate) > { > } > > diff --git a/mm/slub.c b/mm/slub.c > index c09d93dde40e..6f57824af019 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -2007,6 +2007,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain) > int pages; > int pobjects; > > + preempt_disable(); > do { > pages = 0; > pobjects = 0; > @@ -2040,6 +2041,14 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain) > > } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) > != oldpage); > + if (unlikely(!s->cpu_partial)) { > + unsigned long flags; > + > + local_irq_save(flags); > + unfreeze_partials(s, this_cpu_ptr(s->cpu_slab)); > + local_irq_restore(flags); > + } > + preempt_enable(); > #endif > } > > @@ -3368,7 +3377,7 @@ EXPORT_SYMBOL(kfree); > * being allocated from last increasing the chance that the last objects > * are freed in them. > */ > -void __kmem_cache_shrink(struct kmem_cache *s) > +void __kmem_cache_shrink(struct kmem_cache *s, bool deactivate) > { > int node; > int i; > @@ -3381,6 +3390,15 @@ void __kmem_cache_shrink(struct kmem_cache *s) > kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); > unsigned long flags; > > + if (deactivate) { > + /* > + * Disable empty slabs caching. Used to avoid pinning offline > + * memory cgroups by freeable kmem pages. 
> + */ > + s->cpu_partial = 0; > + s->min_partial = 0; > + } > + Hello, Maybe, kick_all_cpus_sync() is needed here since object would be freed asynchronously so they can't see this updated value. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
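Concretely, the suggestion amounts to adding a synchronization point right after the two stores in __kmem_cache_shrink(). A sketch of the idea (not the hunk that was eventually merged) would be:

	if (deactivate) {
		/*
		 * Disable empty slabs caching. Used to avoid pinning offline
		 * memory cgroups by freeable kmem pages.
		 */
		s->cpu_partial = 0;
		s->min_partial = 0;

		/*
		 * s->cpu_partial is checked locklessly in put_cpu_partial(),
		 * so a CPU that is concurrently freeing objects may still see
		 * the old value and stash an empty slab on its per-cpu partial
		 * list. kick_all_cpus_sync() sends an IPI to every online CPU,
		 * making sure they all observe cpu_partial == 0 before the
		 * flush below runs.
		 */
		kick_all_cpus_sync();
	}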
* Re: [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately 2015-01-27 8:00 ` Joonsoo Kim @ 2015-01-27 8:23 ` Vladimir Davydov 2015-01-27 9:21 ` Joonsoo Kim 0 siblings, 1 reply; 24+ messages in thread From: Vladimir Davydov @ 2015-01-27 8:23 UTC (permalink / raw) To: Joonsoo Kim Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes, Johannes Weiner, Michal Hocko, linux-mm, linux-kernel Hi Joonsoo, On Tue, Jan 27, 2015 at 05:00:09PM +0900, Joonsoo Kim wrote: > On Mon, Jan 26, 2015 at 03:55:29PM +0300, Vladimir Davydov wrote: > > @@ -3381,6 +3390,15 @@ void __kmem_cache_shrink(struct kmem_cache *s) > > kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); > > unsigned long flags; > > > > + if (deactivate) { > > + /* > > + * Disable empty slabs caching. Used to avoid pinning offline > > + * memory cgroups by freeable kmem pages. > > + */ > > + s->cpu_partial = 0; > > + s->min_partial = 0; > > + } > > + > > Maybe, kick_all_cpus_sync() is needed here since object would > be freed asynchronously so they can't see this updated value. I thought flush_all() should do the trick, no? Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately
  From: Joonsoo Kim @ 2015-01-27 9:21 UTC
  To: Vladimir Davydov
  Cc: Joonsoo Kim, Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes, Johannes Weiner, Michal Hocko, Linux Memory Management List, LKML

2015-01-27 17:23 GMT+09:00 Vladimir Davydov <vdavydov@parallels.com>:
> Hi Joonsoo,
>
> On Tue, Jan 27, 2015 at 05:00:09PM +0900, Joonsoo Kim wrote:
>> On Mon, Jan 26, 2015 at 03:55:29PM +0300, Vladimir Davydov wrote:
>> > @@ -3381,6 +3390,15 @@ void __kmem_cache_shrink(struct kmem_cache *s)
>> >  		kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL);
>> >  	unsigned long flags;
>> >
>> > +	if (deactivate) {
>> > +		/*
>> > +		 * Disable empty slabs caching. Used to avoid pinning offline
>> > +		 * memory cgroups by freeable kmem pages.
>> > +		 */
>> > +		s->cpu_partial = 0;
>> > +		s->min_partial = 0;
>> > +
>>
>> Maybe, kick_all_cpus_sync() is needed here since object would
>> be freed asynchronously so they can't see this updated value.
>
> I thought flush_all() should do the trick, no?

Unfortunately, it doesn't.

flush_all() does not send an IPI to every cpu. It only sends IPIs to cpus
where certain conditions are met, and freeing can occur on the other ones.

Thanks.
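For context on why flush_all() does not reach every CPU: at the time it was built on on_each_cpu_cond() with a per-cpu predicate roughly like the following. This is paraphrased from memory of the SLUB code of that era, so treat the exact shape as an assumption rather than a citation.

	static bool has_cpu_slab(int cpu, void *info)
	{
		struct kmem_cache *s = info;
		struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

		/* only CPUs that currently hold a cpu slab or a partial list get an IPI */
		return c->page || c->partial;
	}

	static void flush_all(struct kmem_cache *s)
	{
		on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
	}

A CPU whose per-cpu slab and partial list are both empty is skipped, yet nothing stops it from freeing objects a moment later while still seeing the stale cpu_partial value, hence the need for an unconditional kick.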
* Re: [PATCH -mm 3/3] slub: make dead caches discard free slabs immediately 2015-01-27 9:21 ` Joonsoo Kim @ 2015-01-27 9:28 ` Vladimir Davydov 0 siblings, 0 replies; 24+ messages in thread From: Vladimir Davydov @ 2015-01-27 9:28 UTC (permalink / raw) To: Joonsoo Kim Cc: Joonsoo Kim, Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes, Johannes Weiner, Michal Hocko, Linux Memory Management List, LKML On Tue, Jan 27, 2015 at 06:21:14PM +0900, Joonsoo Kim wrote: > 2015-01-27 17:23 GMT+09:00 Vladimir Davydov <vdavydov@parallels.com>: > > Hi Joonsoo, > > > > On Tue, Jan 27, 2015 at 05:00:09PM +0900, Joonsoo Kim wrote: > >> On Mon, Jan 26, 2015 at 03:55:29PM +0300, Vladimir Davydov wrote: > >> > @@ -3381,6 +3390,15 @@ void __kmem_cache_shrink(struct kmem_cache *s) > >> > kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); > >> > unsigned long flags; > >> > > >> > + if (deactivate) { > >> > + /* > >> > + * Disable empty slabs caching. Used to avoid pinning offline > >> > + * memory cgroups by freeable kmem pages. > >> > + */ > >> > + s->cpu_partial = 0; > >> > + s->min_partial = 0; > >> > + } > >> > + > >> > >> Maybe, kick_all_cpus_sync() is needed here since object would > >> be freed asynchronously so they can't see this updated value. > > > > I thought flush_all() should do the trick, no? > > Unfortunately, it doesn't. > > flush_all() sends IPI to not all cpus. It only sends IPI to cpus where > some conditions > are met and freeing could occur on the other ones. Oh, true, missed that. Yeah, we should kick all cpus explicitly then. Will fix in the next iteration. Thanks for catching this! Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread