From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f169.google.com (mail-lb0-f169.google.com [209.85.217.169]) by kanga.kvack.org (Postfix) with ESMTP id 026796B0035 for ; Tue, 1 Apr 2014 03:38:54 -0400 (EDT) Received: by mail-lb0-f169.google.com with SMTP id q8so6620061lbi.14 for ; Tue, 01 Apr 2014 00:38:54 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id le2si10216912lbc.61.2014.04.01.00.38.52 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Apr 2014 00:38:53 -0700 (PDT) From: Vladimir Davydov Subject: [PATCH -mm v2 0/2] cleanup kmemcg charging (was: "kmemcg: get rid of __GFP_KMEMCG") Date: Tue, 1 Apr 2014 11:38:43 +0400 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, gthelen@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Hi, Currently we charge kmem to memcg in alloc_pages if __GFP_KMEMCG is passed. However, since there are only a few places where we actually want to charge kmem, we could call kmemcg charge function explicitly instead. That would remove all kmemcg-related stuff from the general allocation path and make all kmem charges easier to follow. So let's charge kmem explicitly where we want it to be charged (slab, threadinfo) and remove __GFP_KMEMCG. Changes in v2: - use static key optimization in memcg_(un)charge_slab to avoid any overhead if kmemcg is not used; - introduce helper functions, alloc/free_kmem_pages, which charge newly allocated pages to kmemcg, to avoid code duplication; - do not remove accounting of kmalloc_large allocations (as discussed in the comments to v1). v1 can be found at lkml.org/lkml/2014/3/26/228 Thanks, Vladimir Davydov (2): sl[au]b: charge slabs to kmemcg explicitly mm: get rid of __GFP_KMEMCG include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 17 ++++-------- include/linux/slab.h | 11 -------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/memcontrol.c | 4 +-- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab.c | 7 ++++- mm/slab.h | 29 ++++++++++++++++++++ mm/slab_common.c | 18 +++++++++---- mm/slub.c | 30 ++++++++++++++------- 12 files changed, 119 insertions(+), 72 deletions(-) -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by kanga.kvack.org (Postfix) with ESMTP id 58BE66B0035 for ; Tue, 1 Apr 2014 03:38:55 -0400 (EDT) Received: by mail-lb0-f176.google.com with SMTP id 10so6639079lbg.35 for ; Tue, 01 Apr 2014 00:38:54 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id g7si10198760lag.186.2014.04.01.00.38.53 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Apr 2014 00:38:53 -0700 (PDT) From: Vladimir Davydov Subject: [PATCH -mm v2 2/2] mm: get rid of __GFP_KMEMCG Date: Tue, 1 Apr 2014 11:38:45 +0400 Message-ID: In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, gthelen@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. Signed-off-by: Vladimir Davydov --- include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 2 +- include/linux/slab.h | 11 -------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab_common.c | 12 +++++++++ mm/slub.c | 6 ++--- 9 files changed, 60 insertions(+), 46 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 29068dd26c3d..13acdb5259f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; diff --git a/include/linux/slab.h b/include/linux/slab.h index 3dd389aa91c7..6d6959292e00 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -358,17 +358,6 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} - #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); #else diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/kernel/fork.c b/kernel/fork.c index f4b09bc15f3a..a1632c878037 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0327f9d5a8c0..41378986a1e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2810,8 +2802,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - if (page) set_page_owner(page, order, gfp_mask); @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/slab_common.c b/mm/slab_common.c index 6673597ac967..cab4c49b3e8c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index b203cfceff95..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3405,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f180.google.com (mail-lb0-f180.google.com [209.85.217.180]) by kanga.kvack.org (Postfix) with ESMTP id E645B6B0038 for ; Tue, 1 Apr 2014 03:38:55 -0400 (EDT) Received: by mail-lb0-f180.google.com with SMTP id 10so6463835lbg.25 for ; Tue, 01 Apr 2014 00:38:55 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id e6si10220422lah.16.2014.04.01.00.38.53 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Apr 2014 00:38:54 -0700 (PDT) From: Vladimir Davydov Subject: [PATCH -mm v2 1/2] sl[au]b: charge slabs to kmemcg explicitly Date: Tue, 1 Apr 2014 11:38:44 +0400 Message-ID: <031f9e2374dcb4cb6c2e7d509d1276623d5b1fba.1396335798.git.vdavydov@parallels.com> In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, gthelen@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org We have only a few places where we actually want to charge kmem so instead of intruding into the general page allocation path with __GFP_KMEMCG it's better to explictly charge kmem there. All kmem charges will be easier to follow that way. This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG from memcg caches' allocflags. Instead it makes slab allocation path call memcg_charge_kmem directly getting memcg to charge from the cache's memcg params. This also eliminates any possibility of misaccounting an allocation going from one memcg's cache to another memcg, because now we always charge slabs against the memcg the cache belongs to. That's why this patch removes the big comment to memcg_kmem_get_cache. Signed-off-by: Vladimir Davydov --- include/linux/memcontrol.h | 15 ++++----------- mm/memcontrol.c | 4 ++-- mm/slab.c | 7 ++++++- mm/slab.h | 29 +++++++++++++++++++++++++++++ mm/slab_common.c | 6 +----- mm/slub.c | 24 +++++++++++++++++------- 6 files changed, 59 insertions(+), 26 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e9dfcdad24c5..29068dd26c3d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -512,6 +512,9 @@ void memcg_update_array_size(int num_groups); struct kmem_cache * __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp); +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size); +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size); + void mem_cgroup_destroy_cache(struct kmem_cache *cachep); int __kmem_cache_destroy_memcg_children(struct kmem_cache *s); @@ -589,17 +592,7 @@ memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order) * @cachep: the original global kmem cache * @gfp: allocation flags. * - * This function assumes that the task allocating, which determines the memcg - * in the page allocator, belongs to the same cgroup throughout the whole - * process. Misacounting can happen if the task calls memcg_kmem_get_cache() - * while belonging to a cgroup, and later on changes. This is considered - * acceptable, and should only happen upon task migration. - * - * Before the cache is created by the memcg core, there is also a possible - * imbalance: the task belongs to a memcg, but the cache being allocated from - * is the global cache, since the child cache is not yet guaranteed to be - * ready. This case is also fine, since in this case the GFP_KMEMCG will not be - * passed and the page allocator will not attempt any cgroup accounting. + * All memory allocated from a per-memcg cache is charged to the owner memcg. */ static __always_inline struct kmem_cache * memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b4b6aef562fa..71e1e11ab6a2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2944,7 +2944,7 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v) } #endif -static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) { struct res_counter *fail_res; int ret = 0; @@ -2982,7 +2982,7 @@ static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) return ret; } -static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) { res_counter_uncharge(&memcg->res, size); if (do_swap_account) diff --git a/mm/slab.c b/mm/slab.c index eebc619ae33c..af126a37dafd 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1664,8 +1664,12 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, if (cachep->flags & SLAB_RECLAIM_ACCOUNT) flags |= __GFP_RECLAIMABLE; + if (memcg_charge_slab(cachep, flags, cachep->gfporder)) + return NULL; + page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder); if (!page) { + memcg_uncharge_slab(cachep, cachep->gfporder); if (!(flags & __GFP_NOWARN) && printk_ratelimit()) slab_out_of_memory(cachep, flags, nodeid); return NULL; @@ -1724,7 +1728,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page) memcg_release_pages(cachep, cachep->gfporder); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += nr_freed; - __free_memcg_kmem_pages(page, cachep->gfporder); + __free_pages(page, cachep->gfporder); + memcg_uncharge_slab(cachep, cachep->gfporder); } static void kmem_rcu_free(struct rcu_head *head) diff --git a/mm/slab.h b/mm/slab.h index 3045316b7c9d..3db3c52f80a2 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -191,6 +191,26 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) return s; return s->memcg_params->root_cache; } + +static __always_inline int memcg_charge_slab(struct kmem_cache *s, + gfp_t gfp, int order) +{ + if (!memcg_kmem_enabled()) + return 0; + if (is_root_cache(s)) + return 0; + return memcg_charge_kmem(s->memcg_params->memcg, gfp, + PAGE_SIZE << order); +} + +static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order) +{ + if (!memcg_kmem_enabled()) + return; + if (is_root_cache(s)) + return; + memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order); +} #else static inline bool is_root_cache(struct kmem_cache *s) { @@ -226,6 +246,15 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) { return s; } + +static inline int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order) +{ + return 0; +} + +static inline void memcg_uncharge_slab(struct kmem_cache *s, int order) +{ +} #endif static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) diff --git a/mm/slab_common.c b/mm/slab_common.c index f3cfccf76dda..6673597ac967 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -290,12 +290,8 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c root_cache->size, root_cache->align, root_cache->flags, root_cache->ctor, memcg, root_cache); - if (IS_ERR(s)) { + if (IS_ERR(s)) kfree(cache_name); - goto out_unlock; - } - - s->allocflags |= __GFP_KMEMCG; out_unlock: mutex_unlock(&slab_mutex); diff --git a/mm/slub.c b/mm/slub.c index 5e234f1f8853..b203cfceff95 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1317,17 +1317,26 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) /* * Slab allocation and freeing */ -static inline struct page *alloc_slab_page(gfp_t flags, int node, - struct kmem_cache_order_objects oo) +static inline struct page *alloc_slab_page(struct kmem_cache *s, + gfp_t flags, int node, struct kmem_cache_order_objects oo) { + struct page *page; int order = oo_order(oo); flags |= __GFP_NOTRACK; + if (memcg_charge_slab(s, flags, order)) + return NULL; + if (node == NUMA_NO_NODE) - return alloc_pages(flags, order); + page = alloc_pages(flags, order); else - return alloc_pages_exact_node(node, flags, order); + page = alloc_pages_exact_node(node, flags, order); + + if (!page) + memcg_uncharge_slab(s, order); + + return page; } static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) @@ -1349,7 +1358,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) */ alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL; - page = alloc_slab_page(alloc_gfp, node, oo); + page = alloc_slab_page(s, alloc_gfp, node, oo); if (unlikely(!page)) { oo = s->min; alloc_gfp = flags; @@ -1357,7 +1366,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) * Allocation may have failed due to fragmentation. * Try a lower order alloc if possible */ - page = alloc_slab_page(alloc_gfp, node, oo); + page = alloc_slab_page(s, alloc_gfp, node, oo); if (page) stat(s, ORDER_FALLBACK); @@ -1473,7 +1482,8 @@ static void __free_slab(struct kmem_cache *s, struct page *page) page_mapcount_reset(page); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += pages; - __free_memcg_kmem_pages(page, order); + __free_pages(page, order); + memcg_uncharge_slab(s, order); } #define need_reserve_slab_rcu \ -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41]) by kanga.kvack.org (Postfix) with ESMTP id 56A026B0038 for ; Tue, 1 Apr 2014 20:48:44 -0400 (EDT) Received: by mail-pa0-f41.google.com with SMTP id fa1so10638922pad.14 for ; Tue, 01 Apr 2014 17:48:44 -0700 (PDT) Received: from mail-pa0-x249.google.com (mail-pa0-x249.google.com [2607:f8b0:400e:c03::249]) by mx.google.com with ESMTPS id ub3si141095pac.276.2014.04.01.17.48.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 01 Apr 2014 17:48:43 -0700 (PDT) Received: by mail-pa0-f73.google.com with SMTP id kq14so1404100pab.4 for ; Tue, 01 Apr 2014 17:48:42 -0700 (PDT) References: From: Greg Thelen Subject: Re: [PATCH -mm v2 2/2] mm: get rid of __GFP_KMEMCG In-reply-to: Date: Tue, 01 Apr 2014 17:48:41 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On Tue, Apr 01 2014, Vladimir Davydov wrote: > Currently to allocate a page that should be charged to kmemcg (e.g. > threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page > allocated is then to be freed by free_memcg_kmem_pages. Apart from > looking asymmetrical, this also requires intrusion to the general > allocation path. So let's introduce separate functions that will > alloc/free pages charged to kmemcg. > > The new functions are called alloc_kmem_pages and free_kmem_pages. They > should be used when the caller actually would like to use kmalloc, but > has to fall back to the page allocator for the allocation is large. They > only differ from alloc_pages and free_pages in that besides allocating > or freeing pages they also charge them to the kmem resource counter of > the current memory cgroup. > > Signed-off-by: Vladimir Davydov > --- > include/linux/gfp.h | 10 ++++--- > include/linux/memcontrol.h | 2 +- > include/linux/slab.h | 11 -------- > include/linux/thread_info.h | 2 -- > include/trace/events/gfpflags.h | 1 - > kernel/fork.c | 6 ++--- > mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- > mm/slab_common.c | 12 +++++++++ > mm/slub.c | 6 ++--- > 9 files changed, 60 insertions(+), 46 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 39b81dc7d01a..d382db71e300 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -31,7 +31,6 @@ struct vm_area_struct; > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > -#define ___GFP_KMEMCG 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > @@ -91,7 +90,6 @@ struct vm_area_struct; > > #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) > #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ > -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ > #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ > > /* > @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, > #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ > alloc_pages_vma(gfp_mask, 0, vma, addr, node) > > +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); > +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, > + unsigned int order); > + > extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); > extern unsigned long get_zeroed_page(gfp_t gfp_mask); > > @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); > extern void free_hot_cold_page(struct page *page, int cold); > extern void free_hot_cold_page_list(struct list_head *list, int cold); > > -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); > -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); > +extern void __free_kmem_pages(struct page *page, unsigned int order); > +extern void free_kmem_pages(unsigned long addr, unsigned int order); > > #define __free_page(page) __free_pages((page), 0) > #define free_page(addr) free_pages((addr), 0) > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 29068dd26c3d..13acdb5259f5 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) > * res_counter_charge_nofail, but we hope those allocations are rare, > * and won't be worth the trouble. > */ > - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) > + if (gfp & __GFP_NOFAIL) > return true; > if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) > return true; > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 3dd389aa91c7..6d6959292e00 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -358,17 +358,6 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, > #include > #endif > > -static __always_inline void * > -kmalloc_order(size_t size, gfp_t flags, unsigned int order) > -{ > - void *ret; > - > - flags |= (__GFP_COMP | __GFP_KMEMCG); > - ret = (void *) __get_free_pages(flags, order); > - kmemleak_alloc(ret, size, 1, flags); > - return ret; > -} > - Removing this from the header file breaks builds without CONFIG_TRACING. Example: % make allnoconfig && make -j4 mm/ [...] include/linux/slab.h: In function a??kmalloc_order_tracea??: include/linux/slab.h:367:2: error: implicit declaration of function a??kmalloc_ordera?? [-Werror=implicit-function-declaration] > #ifdef CONFIG_TRACING > extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); > #else > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h > index fddbe2023a5d..1807bb194816 100644 > --- a/include/linux/thread_info.h > +++ b/include/linux/thread_info.h > @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); > # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) > #endif > > -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) > - > /* > * flag set/clear/test wrappers > * - pass TIF_xxxx constants to these functions > diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h > index 1eddbf1557f2..d6fd8e5b14b7 100644 > --- a/include/trace/events/gfpflags.h > +++ b/include/trace/events/gfpflags.h > @@ -34,7 +34,6 @@ > {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ > {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ > {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ > - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ > {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ > {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ > {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ > diff --git a/kernel/fork.c b/kernel/fork.c > index f4b09bc15f3a..a1632c878037 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) > static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, > int node) > { > - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, > - THREAD_SIZE_ORDER); > + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, > + THREAD_SIZE_ORDER); > > return page ? page_address(page) : NULL; > } > > static inline void free_thread_info(struct thread_info *ti) > { > - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); > + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); > } > # else > static struct kmem_cache *thread_info_cache; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0327f9d5a8c0..41378986a1e6 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > int migratetype = allocflags_to_migratetype(gfp_mask); > unsigned int cpuset_mems_cookie; > int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; > - struct mem_cgroup *memcg = NULL; > > gfp_mask &= gfp_allowed_mask; > > @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > if (unlikely(!zonelist->_zonerefs->zone)) > return NULL; > > - /* > - * Will only have any effect when __GFP_KMEMCG is set. This is > - * verified in the (always inline) callee > - */ > - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) > - return NULL; > - > retry_cpuset: > cpuset_mems_cookie = read_mems_allowed_begin(); > > @@ -2810,8 +2802,6 @@ out: > if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) > goto retry_cpuset; > > - memcg_kmem_commit_charge(page, memcg, order); > - > if (page) > set_page_owner(page, order, gfp_mask); > > @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) > EXPORT_SYMBOL(free_pages); > > /* > - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free > - * pages allocated with __GFP_KMEMCG. > + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter > + * of the current memory cgroup. > * > - * Those pages are accounted to a particular memcg, embedded in the > - * corresponding page_cgroup. To avoid adding a hit in the allocator to search > - * for that information only to find out that it is NULL for users who have no > - * interest in that whatsoever, we provide these functions. > - * > - * The caller knows better which flags it relies on. > + * It should be used when the caller would like to use kmalloc, but since the > + * allocation is large, it has to fall back to the page allocator. > + */ > +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) > +{ > + struct page *page; > + struct mem_cgroup *memcg = NULL; > + > + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) > + return NULL; > + page = alloc_pages(gfp_mask, order); > + memcg_kmem_commit_charge(page, memcg, order); > + return page; > +} > + > +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) > +{ > + struct page *page; > + struct mem_cgroup *memcg = NULL; > + > + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) > + return NULL; > + page = alloc_pages_node(nid, gfp_mask, order); > + memcg_kmem_commit_charge(page, memcg, order); > + return page; > +} > + > +/* > + * __free_kmem_pages and free_kmem_pages will free pages allocated with > + * alloc_kmem_pages. > */ > -void __free_memcg_kmem_pages(struct page *page, unsigned int order) > +void __free_kmem_pages(struct page *page, unsigned int order) > { > memcg_kmem_uncharge_pages(page, order); > __free_pages(page, order); > } > > -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) > +void free_kmem_pages(unsigned long addr, unsigned int order) > { > if (addr != 0) { > VM_BUG_ON(!virt_addr_valid((void *)addr)); > - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); > + __free_kmem_pages(virt_to_page((void *)addr), order); > } > } > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 6673597ac967..cab4c49b3e8c 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) > } > #endif /* !CONFIG_SLOB */ > > +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) > +{ > + void *ret; > + struct page *page; > + > + flags |= __GFP_COMP; > + page = alloc_kmem_pages(flags, order); > + ret = page ? page_address(page) : NULL; > + kmemleak_alloc(ret, size, 1, flags); > + return ret; > +} > + > #ifdef CONFIG_TRACING > void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) > { > diff --git a/mm/slub.c b/mm/slub.c > index b203cfceff95..fa7a1817835e 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) > struct page *page; > void *ptr = NULL; > > - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; > - page = alloc_pages_node(node, flags, get_order(size)); > + flags |= __GFP_COMP | __GFP_NOTRACK; > + page = alloc_kmem_pages_node(node, flags, get_order(size)); > if (page) > ptr = page_address(page); > > @@ -3405,7 +3405,7 @@ void kfree(const void *x) > if (unlikely(!PageSlab(page))) { > BUG_ON(!PageCompound(page)); > kfree_hook(x); > - __free_memcg_kmem_pages(page, compound_order(page)); > + __free_kmem_pages(page, compound_order(page)); > return; > } > slab_free(page->slab_cache, page, object, _RET_IP_); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yh0-f41.google.com (mail-yh0-f41.google.com [209.85.213.41]) by kanga.kvack.org (Postfix) with ESMTP id 7F31E6B0098 for ; Tue, 1 Apr 2014 20:49:35 -0400 (EDT) Received: by mail-yh0-f41.google.com with SMTP id v1so9875898yhn.14 for ; Tue, 01 Apr 2014 17:49:35 -0700 (PDT) Received: from mail-yh0-x24a.google.com (mail-yh0-x24a.google.com [2607:f8b0:4002:c01::24a]) by mx.google.com with ESMTPS id k22si305724yhj.107.2014.04.01.17.49.35 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 01 Apr 2014 17:49:35 -0700 (PDT) Received: by mail-yh0-f74.google.com with SMTP id f10so1468769yha.3 for ; Tue, 01 Apr 2014 17:49:35 -0700 (PDT) References: <031f9e2374dcb4cb6c2e7d509d1276623d5b1fba.1396335798.git.vdavydov@parallels.com> From: Greg Thelen Subject: Re: [PATCH -mm v2 1/2] sl[au]b: charge slabs to kmemcg explicitly In-reply-to: <031f9e2374dcb4cb6c2e7d509d1276623d5b1fba.1396335798.git.vdavydov@parallels.com> Date: Tue, 01 Apr 2014 17:49:34 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On Tue, Apr 01 2014, Vladimir Davydov wrote: > We have only a few places where we actually want to charge kmem so > instead of intruding into the general page allocation path with > __GFP_KMEMCG it's better to explictly charge kmem there. All kmem > charges will be easier to follow that way. > > This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG > from memcg caches' allocflags. Instead it makes slab allocation path > call memcg_charge_kmem directly getting memcg to charge from the cache's > memcg params. > > This also eliminates any possibility of misaccounting an allocation > going from one memcg's cache to another memcg, because now we always > charge slabs against the memcg the cache belongs to. That's why this > patch removes the big comment to memcg_kmem_get_cache. > > Signed-off-by: Vladimir Davydov Acked-by: Greg Thelen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f169.google.com (mail-lb0-f169.google.com [209.85.217.169]) by kanga.kvack.org (Postfix) with ESMTP id CDC486B007B for ; Wed, 2 Apr 2014 02:11:23 -0400 (EDT) Received: by mail-lb0-f169.google.com with SMTP id q8so7974360lbi.28 for ; Tue, 01 Apr 2014 23:11:22 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id le2si465303lbc.124.2014.04.01.23.11.21 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Apr 2014 23:11:22 -0700 (PDT) Message-ID: <533BAA03.9000406@parallels.com> Date: Wed, 2 Apr 2014 10:11:15 +0400 From: Vladimir Davydov MIME-Version: 1.0 Subject: Re: [PATCH -mm v2 2/2] mm: get rid of __GFP_KMEMCG References: In-Reply-To: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Greg Thelen Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On 04/02/2014 04:48 AM, Greg Thelen wrote: > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 3dd389aa91c7..6d6959292e00 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -358,17 +358,6 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, > #include > #endif > > -static __always_inline void * > -kmalloc_order(size_t size, gfp_t flags, unsigned int order) > -{ > - void *ret; > - > - flags |= (__GFP_COMP | __GFP_KMEMCG); > - ret = (void *) __get_free_pages(flags, order); > - kmemleak_alloc(ret, size, 1, flags); > - return ret; > -} > - > Removing this from the header file breaks builds without > CONFIG_TRACING. > Example: > % make allnoconfig && make -j4 mm/ > [...] > include/linux/slab.h: In function a??kmalloc_order_tracea??: > include/linux/slab.h:367:2: error: implicit declaration of function a??kmalloc_ordera?? [-Werror=implicit-function-declaration] Oh, my bad - forgot to add the function declaration :-( The fixed version is on its way. Thank you for catching this! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f47.google.com (mail-la0-f47.google.com [209.85.215.47]) by kanga.kvack.org (Postfix) with ESMTP id 0D8C46B0080 for ; Wed, 2 Apr 2014 02:16:10 -0400 (EDT) Received: by mail-la0-f47.google.com with SMTP id pn19so3480159lab.6 for ; Tue, 01 Apr 2014 23:16:10 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id jg10si465608lbc.164.2014.04.01.23.16.08 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Apr 2014 23:16:09 -0700 (PDT) From: Vladimir Davydov Subject: [PATCH -mm v2.1] mm: get rid of __GFP_KMEMCG Date: Wed, 2 Apr 2014 10:16:05 +0400 Message-ID: <1396419365-351-1-git-send-email-vdavydov@parallels.com> In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, gthelen@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Glauber Costa Cc: Christoph Lameter Cc: Pekka Enberg --- Changes in v2.1: - add missing kmalloc_order forward declaration; lacking it caused compilation breakage with CONFIG_TRACING=n include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 2 +- include/linux/slab.h | 11 +------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab_common.c | 12 +++++++++ mm/slub.c | 6 ++--- 9 files changed, 61 insertions(+), 45 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 29068dd26c3d..13acdb5259f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; diff --git a/include/linux/slab.h b/include/linux/slab.h index 3dd389aa91c7..eaaf2f69c54d 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/kernel/fork.c b/kernel/fork.c index f4b09bc15f3a..a1632c878037 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0327f9d5a8c0..41378986a1e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2810,8 +2802,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - if (page) set_page_owner(page, order, gfp_mask); @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/slab_common.c b/mm/slab_common.c index 6673597ac967..cab4c49b3e8c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index b203cfceff95..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3405,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f175.google.com (mail-yk0-f175.google.com [209.85.160.175]) by kanga.kvack.org (Postfix) with ESMTP id E52A86B011C for ; Wed, 2 Apr 2014 17:26:05 -0400 (EDT) Received: by mail-yk0-f175.google.com with SMTP id 131so763317ykp.34 for ; Wed, 02 Apr 2014 14:26:05 -0700 (PDT) Received: from mail-yh0-x24a.google.com (mail-yh0-x24a.google.com [2607:f8b0:4002:c01::24a]) by mx.google.com with ESMTPS id u56si3646606yhg.74.2014.04.02.14.26.05 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 02 Apr 2014 14:26:05 -0700 (PDT) Received: by mail-yh0-f74.google.com with SMTP id f10so116410yha.1 for ; Wed, 02 Apr 2014 14:26:05 -0700 (PDT) References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> From: Greg Thelen Subject: Re: [PATCH -mm v2.1] mm: get rid of __GFP_KMEMCG In-reply-to: <1396419365-351-1-git-send-email-vdavydov@parallels.com> Date: Wed, 02 Apr 2014 14:25:47 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On Tue, Apr 01 2014, Vladimir Davydov wrote: > Currently to allocate a page that should be charged to kmemcg (e.g. > threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page > allocated is then to be freed by free_memcg_kmem_pages. Apart from > looking asymmetrical, this also requires intrusion to the general > allocation path. So let's introduce separate functions that will > alloc/free pages charged to kmemcg. > > The new functions are called alloc_kmem_pages and free_kmem_pages. They > should be used when the caller actually would like to use kmalloc, but > has to fall back to the page allocator for the allocation is large. They > only differ from alloc_pages and free_pages in that besides allocating > or freeing pages they also charge them to the kmem resource counter of > the current memory cgroup. > > Signed-off-by: Vladimir Davydov One comment nit below, otherwise looks good to me. Acked-by: Greg Thelen > Cc: Johannes Weiner > Cc: Michal Hocko > Cc: Glauber Costa > Cc: Christoph Lameter > Cc: Pekka Enberg > --- > Changes in v2.1: > - add missing kmalloc_order forward declaration; lacking it caused > compilation breakage with CONFIG_TRACING=n > > include/linux/gfp.h | 10 ++++--- > include/linux/memcontrol.h | 2 +- > include/linux/slab.h | 11 +------- > include/linux/thread_info.h | 2 -- > include/trace/events/gfpflags.h | 1 - > kernel/fork.c | 6 ++--- > mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- > mm/slab_common.c | 12 +++++++++ > mm/slub.c | 6 ++--- > 9 files changed, 61 insertions(+), 45 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 39b81dc7d01a..d382db71e300 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -31,7 +31,6 @@ struct vm_area_struct; > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > -#define ___GFP_KMEMCG 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > @@ -91,7 +90,6 @@ struct vm_area_struct; > > #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) > #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ > -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ > #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ > > /* > @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, > #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ > alloc_pages_vma(gfp_mask, 0, vma, addr, node) > > +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); > +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, > + unsigned int order); > + > extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); > extern unsigned long get_zeroed_page(gfp_t gfp_mask); > > @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); > extern void free_hot_cold_page(struct page *page, int cold); > extern void free_hot_cold_page_list(struct list_head *list, int cold); > > -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); > -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); > +extern void __free_kmem_pages(struct page *page, unsigned int order); > +extern void free_kmem_pages(unsigned long addr, unsigned int order); > > #define __free_page(page) __free_pages((page), 0) > #define free_page(addr) free_pages((addr), 0) > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 29068dd26c3d..13acdb5259f5 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) > * res_counter_charge_nofail, but we hope those allocations are rare, > * and won't be worth the trouble. > */ Just a few lines higher in first memcg_kmem_newpage_charge() comment, there is a leftover reference to GFP_KMEMCG which should be removed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f43.google.com (mail-la0-f43.google.com [209.85.215.43]) by kanga.kvack.org (Postfix) with ESMTP id D6A096B0031 for ; Thu, 3 Apr 2014 11:04:12 -0400 (EDT) Received: by mail-la0-f43.google.com with SMTP id e16so1437231lan.2 for ; Thu, 03 Apr 2014 08:04:11 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id y8si3676657lae.133.2014.04.03.08.04.09 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 03 Apr 2014 08:04:09 -0700 (PDT) Message-ID: <533D7864.7080907@parallels.com> Date: Thu, 3 Apr 2014 19:04:04 +0400 From: Vladimir Davydov MIME-Version: 1.0 Subject: Re: [PATCH -mm v2.1] mm: get rid of __GFP_KMEMCG References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Greg Thelen Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On 04/03/2014 01:25 AM, Greg Thelen wrote: > On Tue, Apr 01 2014, Vladimir Davydov wrote: > >> Currently to allocate a page that should be charged to kmemcg (e.g. >> threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page >> allocated is then to be freed by free_memcg_kmem_pages. Apart from >> looking asymmetrical, this also requires intrusion to the general >> allocation path. So let's introduce separate functions that will >> alloc/free pages charged to kmemcg. >> >> The new functions are called alloc_kmem_pages and free_kmem_pages. They >> should be used when the caller actually would like to use kmalloc, but >> has to fall back to the page allocator for the allocation is large. They >> only differ from alloc_pages and free_pages in that besides allocating >> or freeing pages they also charge them to the kmem resource counter of >> the current memory cgroup. >> >> Signed-off-by: Vladimir Davydov > One comment nit below, otherwise looks good to me. > > Acked-by: Greg Thelen > >> Cc: Johannes Weiner >> Cc: Michal Hocko >> Cc: Glauber Costa >> Cc: Christoph Lameter >> Cc: Pekka Enberg >> --- >> Changes in v2.1: >> - add missing kmalloc_order forward declaration; lacking it caused >> compilation breakage with CONFIG_TRACING=n >> >> include/linux/gfp.h | 10 ++++--- >> include/linux/memcontrol.h | 2 +- >> include/linux/slab.h | 11 +------- >> include/linux/thread_info.h | 2 -- >> include/trace/events/gfpflags.h | 1 - >> kernel/fork.c | 6 ++--- >> mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- >> mm/slab_common.c | 12 +++++++++ >> mm/slub.c | 6 ++--- >> 9 files changed, 61 insertions(+), 45 deletions(-) >> >> diff --git a/include/linux/gfp.h b/include/linux/gfp.h >> index 39b81dc7d01a..d382db71e300 100644 >> --- a/include/linux/gfp.h >> +++ b/include/linux/gfp.h >> @@ -31,7 +31,6 @@ struct vm_area_struct; >> #define ___GFP_HARDWALL 0x20000u >> #define ___GFP_THISNODE 0x40000u >> #define ___GFP_RECLAIMABLE 0x80000u >> -#define ___GFP_KMEMCG 0x100000u >> #define ___GFP_NOTRACK 0x200000u >> #define ___GFP_NO_KSWAPD 0x400000u >> #define ___GFP_OTHER_NODE 0x800000u >> @@ -91,7 +90,6 @@ struct vm_area_struct; >> >> #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) >> #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ >> -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ >> #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ >> >> /* >> @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, >> #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ >> alloc_pages_vma(gfp_mask, 0, vma, addr, node) >> >> +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); >> +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, >> + unsigned int order); >> + >> extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); >> extern unsigned long get_zeroed_page(gfp_t gfp_mask); >> >> @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); >> extern void free_hot_cold_page(struct page *page, int cold); >> extern void free_hot_cold_page_list(struct list_head *list, int cold); >> >> -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); >> -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); >> +extern void __free_kmem_pages(struct page *page, unsigned int order); >> +extern void free_kmem_pages(unsigned long addr, unsigned int order); >> >> #define __free_page(page) __free_pages((page), 0) >> #define free_page(addr) free_pages((addr), 0) >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 29068dd26c3d..13acdb5259f5 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) >> * res_counter_charge_nofail, but we hope those allocations are rare, >> * and won't be worth the trouble. >> */ > Just a few lines higher in first memcg_kmem_newpage_charge() comment, > there is a leftover reference to GFP_KMEMCG which should be removed. Good catch, will resend. Thank you for the review! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f43.google.com (mail-la0-f43.google.com [209.85.215.43]) by kanga.kvack.org (Postfix) with ESMTP id A072B6B0031 for ; Thu, 3 Apr 2014 11:06:09 -0400 (EDT) Received: by mail-la0-f43.google.com with SMTP id e16so1470513lan.30 for ; Thu, 03 Apr 2014 08:06:09 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id qs7si3685820lbb.111.2014.04.03.08.06.07 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 03 Apr 2014 08:06:08 -0700 (PDT) From: Vladimir Davydov Subject: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG Date: Thu, 3 Apr 2014 19:05:59 +0400 Message-ID: <1396537559-17453-1-git-send-email-vdavydov@parallels.com> In-Reply-To: <1396419365-351-1-git-send-email-vdavydov@parallels.com> References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Christoph Lameter , Pekka Enberg Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. Signed-off-by: Vladimir Davydov Acked-by: Greg Thelen Cc: Johannes Weiner Cc: Michal Hocko Cc: Glauber Costa Cc: Christoph Lameter Cc: Pekka Enberg --- Changes in v2.2: - remove mentioning of GFP_KMEMCG in the comment to __memcg_kmem_newpage_charge Changes in v2.1: - add missing kmalloc_order forward declaration; lacking it caused compilation breakage with CONFIG_TRACING=n include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 2 +- include/linux/slab.h | 11 +------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/memcontrol.c | 11 ++++---- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab_common.c | 12 +++++++++ mm/slub.c | 6 ++--- 10 files changed, 67 insertions(+), 50 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 29068dd26c3d..13acdb5259f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; diff --git a/include/linux/slab.h b/include/linux/slab.h index 3dd389aa91c7..eaaf2f69c54d 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/kernel/fork.c b/kernel/fork.c index f4b09bc15f3a..a1632c878037 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 71e1e11ab6a2..ba30b26fc483 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3531,11 +3531,12 @@ __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order) /* * Disabling accounting is only relevant for some specific memcg * internal allocations. Therefore we would initially not have such - * check here, since direct calls to the page allocator that are marked - * with GFP_KMEMCG only happen outside memcg core. We are mostly - * concerned with cache allocations, and by having this test at - * memcg_kmem_get_cache, we are already able to relay the allocation to - * the root cache and bypass the memcg cache altogether. + * check here, since direct calls to the page allocator that are + * accounted to kmemcg (alloc_kmem_pages and friends) only happen + * outside memcg core. We are mostly concerned with cache allocations, + * and by having this test at memcg_kmem_get_cache, we are already able + * to relay the allocation to the root cache and bypass the memcg cache + * altogether. * * There is one exception, though: the SLUB allocator does not create * large order caches, but rather service large kmallocs directly from diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0327f9d5a8c0..41378986a1e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2810,8 +2802,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - if (page) set_page_owner(page, order, gfp_mask); @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/slab_common.c b/mm/slab_common.c index 6673597ac967..cab4c49b3e8c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index b203cfceff95..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3405,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) by kanga.kvack.org (Postfix) with ESMTP id 254BA6B0035 for ; Thu, 10 Apr 2014 19:38:34 -0400 (EDT) Received: by mail-pa0-f46.google.com with SMTP id kx10so4561952pab.5 for ; Thu, 10 Apr 2014 16:38:33 -0700 (PDT) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTP id ic8si3026595pad.300.2014.04.10.16.38.32 for ; Thu, 10 Apr 2014 16:38:33 -0700 (PDT) Date: Thu, 10 Apr 2014 16:38:31 -0700 From: Andrew Morton Subject: Re: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG Message-Id: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> In-Reply-To: <1396537559-17453-1-git-send-email-vdavydov@parallels.com> References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> <1396537559-17453-1-git-send-email-vdavydov@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Christoph Lameter , Pekka Enberg On Thu, 3 Apr 2014 19:05:59 +0400 Vladimir Davydov wrote: > Currently to allocate a page that should be charged to kmemcg (e.g. > threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page > allocated is then to be freed by free_memcg_kmem_pages. Apart from > looking asymmetrical, this also requires intrusion to the general > allocation path. So let's introduce separate functions that will > alloc/free pages charged to kmemcg. > > The new functions are called alloc_kmem_pages and free_kmem_pages. They > should be used when the caller actually would like to use kmalloc, but > has to fall back to the page allocator for the allocation is large. They > only differ from alloc_pages and free_pages in that besides allocating > or freeing pages they also charge them to the kmem resource counter of > the current memory cgroup. > > ... > > +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) > +{ > + void *ret; > + struct page *page; > + > + flags |= __GFP_COMP; > + page = alloc_kmem_pages(flags, order); > + ret = page ? page_address(page) : NULL; > + kmemleak_alloc(ret, size, 1, flags); > + return ret; > +} While we're in there it wouldn't hurt to document this: why it exists, what it does, etc. And why it sets __GFP_COMP. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f53.google.com (mail-la0-f53.google.com [209.85.215.53]) by kanga.kvack.org (Postfix) with ESMTP id 020486B0036 for ; Fri, 11 Apr 2014 08:52:28 -0400 (EDT) Received: by mail-la0-f53.google.com with SMTP id b8so3488916lan.12 for ; Fri, 11 Apr 2014 05:52:26 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id y6si5532842lal.5.2014.04.11.05.52.25 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Apr 2014 05:52:25 -0700 (PDT) From: Vladimir Davydov Subject: [PATCH -mm] slab: document kmalloc_order Date: Fri, 11 Apr 2014 16:52:16 +0400 Message-ID: <1397220736-13840-1-git-send-email-vdavydov@parallels.com> In-Reply-To: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> References: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: cl@linux-foundation.org, penberg@kernel.org, gthelen@google.com, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Signed-off-by: Vladimir Davydov --- mm/slab_common.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/mm/slab_common.c b/mm/slab_common.c index cab4c49b3e8c..3ffd2e76b5d2 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,11 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +/* + * To avoid unnecessary overhead, we pass through large allocation requests + * directly to the page allocator. We use __GFP_COMP, because we will need to + * know the allocation order to free the pages properly in kfree. + */ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) { void *ret; -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f53.google.com (mail-pb0-f53.google.com [209.85.160.53]) by kanga.kvack.org (Postfix) with ESMTP id C58C66B0035 for ; Fri, 11 Apr 2014 11:58:00 -0400 (EDT) Received: by mail-pb0-f53.google.com with SMTP id rp16so5515181pbb.26 for ; Fri, 11 Apr 2014 08:57:59 -0700 (PDT) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net. [2001:558:fe2d:43:76:96:30:96]) by mx.google.com with ESMTP id ic8si4469276pad.95.2014.04.11.08.57.58 for ; Fri, 11 Apr 2014 08:57:58 -0700 (PDT) Date: Fri, 11 Apr 2014 10:57:54 -0500 (CDT) From: Christoph Lameter Subject: Re: [PATCH -mm] slab: document kmalloc_order In-Reply-To: <1397220736-13840-1-git-send-email-vdavydov@parallels.com> Message-ID: References: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> <1397220736-13840-1-git-send-email-vdavydov@parallels.com> Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: akpm@linux-foundation.org, penberg@kernel.org, gthelen@google.com, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On Fri, 11 Apr 2014, Vladimir Davydov wrote: > diff --git a/mm/slab_common.c b/mm/slab_common.c > index cab4c49b3e8c..3ffd2e76b5d2 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -573,6 +573,11 @@ void __init create_kmalloc_caches(unsigned long flags) > } > #endif /* !CONFIG_SLOB */ > > +/* > + * To avoid unnecessary overhead, we pass through large allocation requests > + * directly to the page allocator. We use __GFP_COMP, because we will need to > + * know the allocation order to free the pages properly in kfree. > + */ > void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) > { > void *ret; > ??? kmalloc_order is defined in include/linux/slab.h -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f170.google.com (mail-pd0-f170.google.com [209.85.192.170]) by kanga.kvack.org (Postfix) with ESMTP id 4EED06B0035 for ; Fri, 11 Apr 2014 12:07:52 -0400 (EDT) Received: by mail-pd0-f170.google.com with SMTP id v10so5480255pde.15 for ; Fri, 11 Apr 2014 09:07:51 -0700 (PDT) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net. [2001:558:fe2d:43:76:96:30:96]) by mx.google.com with ESMTP id bi5si4524375pbb.148.2014.04.11.09.07.51 for ; Fri, 11 Apr 2014 09:07:51 -0700 (PDT) Date: Fri, 11 Apr 2014 11:07:48 -0500 (CDT) From: Christoph Lameter Subject: Re: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG In-Reply-To: <1396537559-17453-1-git-send-email-vdavydov@parallels.com> Message-ID: References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> <1396537559-17453-1-git-send-email-vdavydov@parallels.com> Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Pekka Enberg On Thu, 3 Apr 2014, Vladimir Davydov wrote: > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, > #include > #endif > > -static __always_inline void * > -kmalloc_order(size_t size, gfp_t flags, unsigned int order) > -{ > - void *ret; > - > - flags |= (__GFP_COMP | __GFP_KMEMCG); > - ret = (void *) __get_free_pages(flags, order); > - kmemleak_alloc(ret, size, 1, flags); > - return ret; > -} > +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); Hmmm... This was intentional inlined to allow inline expansion for calls to kmalloc with large constants. The inline expansion directly converts these calls to page allocator calls avoiding slab overhead. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f50.google.com (mail-la0-f50.google.com [209.85.215.50]) by kanga.kvack.org (Postfix) with ESMTP id 372046B0035 for ; Fri, 11 Apr 2014 13:24:48 -0400 (EDT) Received: by mail-la0-f50.google.com with SMTP id pv20so3837470lab.9 for ; Fri, 11 Apr 2014 10:24:46 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id g7si5992777lab.82.2014.04.11.10.24.44 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Apr 2014 10:24:45 -0700 (PDT) Message-ID: <53482555.4070603@parallels.com> Date: Fri, 11 Apr 2014 21:24:37 +0400 From: Vladimir Davydov MIME-Version: 1.0 Subject: Re: [PATCH -mm] slab: document kmalloc_order References: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> <1397220736-13840-1-git-send-email-vdavydov@parallels.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: akpm@linux-foundation.org, penberg@kernel.org, gthelen@google.com, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org On 04/11/2014 07:57 PM, Christoph Lameter wrote: > On Fri, 11 Apr 2014, Vladimir Davydov wrote: > >> diff --git a/mm/slab_common.c b/mm/slab_common.c >> index cab4c49b3e8c..3ffd2e76b5d2 100644 >> --- a/mm/slab_common.c >> +++ b/mm/slab_common.c >> @@ -573,6 +573,11 @@ void __init create_kmalloc_caches(unsigned long flags) >> } >> #endif /* !CONFIG_SLOB */ >> >> +/* >> + * To avoid unnecessary overhead, we pass through large allocation requests >> + * directly to the page allocator. We use __GFP_COMP, because we will need to >> + * know the allocation order to free the pages properly in kfree. >> + */ >> void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) >> { >> void *ret; >> > ??? kmalloc_order is defined in include/linux/slab.h I moved it to slab_common.c in "[PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG" -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f43.google.com (mail-la0-f43.google.com [209.85.215.43]) by kanga.kvack.org (Postfix) with ESMTP id 378BE6B0038 for ; Fri, 11 Apr 2014 13:33:22 -0400 (EDT) Received: by mail-la0-f43.google.com with SMTP id e16so3728847lan.2 for ; Fri, 11 Apr 2014 10:33:20 -0700 (PDT) Received: from relay.parallels.com (relay.parallels.com. [195.214.232.42]) by mx.google.com with ESMTPS id h8si6012699lam.53.2014.04.11.10.33.18 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Apr 2014 10:33:19 -0700 (PDT) Message-ID: <53482754.1090102@parallels.com> Date: Fri, 11 Apr 2014 21:33:08 +0400 From: Vladimir Davydov MIME-Version: 1.0 Subject: Re: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> <1396537559-17453-1-git-send-email-vdavydov@parallels.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Pekka Enberg On 04/11/2014 08:07 PM, Christoph Lameter wrote: > On Thu, 3 Apr 2014, Vladimir Davydov wrote: > >> --- a/include/linux/slab.h >> +++ b/include/linux/slab.h >> @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, >> #include >> #endif >> >> -static __always_inline void * >> -kmalloc_order(size_t size, gfp_t flags, unsigned int order) >> -{ >> - void *ret; >> - >> - flags |= (__GFP_COMP | __GFP_KMEMCG); >> - ret = (void *) __get_free_pages(flags, order); >> - kmemleak_alloc(ret, size, 1, flags); >> - return ret; >> -} >> +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); > > Hmmm... This was intentional inlined to allow inline expansion for calls > to kmalloc with large constants. The inline expansion directly converts > these calls to page allocator calls avoiding slab overhead. I moved kmalloc_order() to slab_common.c, because I can't call alloc_kmem_pages() directly from the header (we don't have page_address() defined there, and including mm.h to slab.h wouldn't be good I think), and I don't want to introduce __get_free_kmem_pages(). Sorry that I didn't state this explicitly in the comment to the patch. However, would it be any better if I introduced __get_free_kmem_pages() and called it from kmalloc_order(), which could be inlined then? I don't think so, because I would have to place __get_free_kmem_pages() in page_alloc.c just like __get_free_pages() (again, because including mm.h to gfp.h for page_address() isn't an option), and we would get exactly the same number of function calls. I admit that this patch adds one extra function call to large kmallocs: - before: __get_free_pages -> alloc_pages - after: kmalloc_order -> alloc_kmem_pages -> alloc_pages but that's not because I move kmalloc_order from the header, but rather because I introduce alloc_kmem_pages, which is not inline. What can we do to eliminate it? We could place alloc_kmem_pages() definition to a header file, but since it needs memcontrol.h for kmemcg charging functions, that would lead to slab.h depending on memcontrol.h eventually, which is not good IMO. Alternatively, we could #ifndef MEMCG_KMEM # define alloc_kmem_pages alloc_pages #endif so that we would avoid any additional overhead if kmemcg is compiled out. However, do we need to bother about one function call at all? My point is that one function call can be neglected in case of large kmem allocations, which are rather rare. Any thoughts/objections? Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751509AbaDAHiz (ORCPT ); Tue, 1 Apr 2014 03:38:55 -0400 Received: from relay.parallels.com ([195.214.232.42]:56767 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751222AbaDAHiv (ORCPT ); Tue, 1 Apr 2014 03:38:51 -0400 From: Vladimir Davydov To: CC: , , , , , , Subject: [PATCH -mm v2 2/2] mm: get rid of __GFP_KMEMCG Date: Tue, 1 Apr 2014 11:38:45 +0400 Message-ID: X-Mailer: git-send-email 1.7.10.4 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. Signed-off-by: Vladimir Davydov --- include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 2 +- include/linux/slab.h | 11 -------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab_common.c | 12 +++++++++ mm/slub.c | 6 ++--- 9 files changed, 60 insertions(+), 46 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 29068dd26c3d..13acdb5259f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; diff --git a/include/linux/slab.h b/include/linux/slab.h index 3dd389aa91c7..6d6959292e00 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -358,17 +358,6 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} - #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); #else diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/kernel/fork.c b/kernel/fork.c index f4b09bc15f3a..a1632c878037 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0327f9d5a8c0..41378986a1e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2810,8 +2802,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - if (page) set_page_owner(page, order, gfp_mask); @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/slab_common.c b/mm/slab_common.c index 6673597ac967..cab4c49b3e8c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index b203cfceff95..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3405,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); -- 1.7.10.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751683AbaDAHjB (ORCPT ); Tue, 1 Apr 2014 03:39:01 -0400 Received: from relay.parallels.com ([195.214.232.42]:56769 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751214AbaDAHiv (ORCPT ); Tue, 1 Apr 2014 03:38:51 -0400 From: Vladimir Davydov To: CC: , , , , , , Subject: [PATCH -mm v2 1/2] sl[au]b: charge slabs to kmemcg explicitly Date: Tue, 1 Apr 2014 11:38:44 +0400 Message-ID: <031f9e2374dcb4cb6c2e7d509d1276623d5b1fba.1396335798.git.vdavydov@parallels.com> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We have only a few places where we actually want to charge kmem so instead of intruding into the general page allocation path with __GFP_KMEMCG it's better to explictly charge kmem there. All kmem charges will be easier to follow that way. This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG from memcg caches' allocflags. Instead it makes slab allocation path call memcg_charge_kmem directly getting memcg to charge from the cache's memcg params. This also eliminates any possibility of misaccounting an allocation going from one memcg's cache to another memcg, because now we always charge slabs against the memcg the cache belongs to. That's why this patch removes the big comment to memcg_kmem_get_cache. Signed-off-by: Vladimir Davydov --- include/linux/memcontrol.h | 15 ++++----------- mm/memcontrol.c | 4 ++-- mm/slab.c | 7 ++++++- mm/slab.h | 29 +++++++++++++++++++++++++++++ mm/slab_common.c | 6 +----- mm/slub.c | 24 +++++++++++++++++------- 6 files changed, 59 insertions(+), 26 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index e9dfcdad24c5..29068dd26c3d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -512,6 +512,9 @@ void memcg_update_array_size(int num_groups); struct kmem_cache * __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp); +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size); +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size); + void mem_cgroup_destroy_cache(struct kmem_cache *cachep); int __kmem_cache_destroy_memcg_children(struct kmem_cache *s); @@ -589,17 +592,7 @@ memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order) * @cachep: the original global kmem cache * @gfp: allocation flags. * - * This function assumes that the task allocating, which determines the memcg - * in the page allocator, belongs to the same cgroup throughout the whole - * process. Misacounting can happen if the task calls memcg_kmem_get_cache() - * while belonging to a cgroup, and later on changes. This is considered - * acceptable, and should only happen upon task migration. - * - * Before the cache is created by the memcg core, there is also a possible - * imbalance: the task belongs to a memcg, but the cache being allocated from - * is the global cache, since the child cache is not yet guaranteed to be - * ready. This case is also fine, since in this case the GFP_KMEMCG will not be - * passed and the page allocator will not attempt any cgroup accounting. + * All memory allocated from a per-memcg cache is charged to the owner memcg. */ static __always_inline struct kmem_cache * memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b4b6aef562fa..71e1e11ab6a2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2944,7 +2944,7 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v) } #endif -static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) { struct res_counter *fail_res; int ret = 0; @@ -2982,7 +2982,7 @@ static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) return ret; } -static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) { res_counter_uncharge(&memcg->res, size); if (do_swap_account) diff --git a/mm/slab.c b/mm/slab.c index eebc619ae33c..af126a37dafd 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1664,8 +1664,12 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, if (cachep->flags & SLAB_RECLAIM_ACCOUNT) flags |= __GFP_RECLAIMABLE; + if (memcg_charge_slab(cachep, flags, cachep->gfporder)) + return NULL; + page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder); if (!page) { + memcg_uncharge_slab(cachep, cachep->gfporder); if (!(flags & __GFP_NOWARN) && printk_ratelimit()) slab_out_of_memory(cachep, flags, nodeid); return NULL; @@ -1724,7 +1728,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page) memcg_release_pages(cachep, cachep->gfporder); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += nr_freed; - __free_memcg_kmem_pages(page, cachep->gfporder); + __free_pages(page, cachep->gfporder); + memcg_uncharge_slab(cachep, cachep->gfporder); } static void kmem_rcu_free(struct rcu_head *head) diff --git a/mm/slab.h b/mm/slab.h index 3045316b7c9d..3db3c52f80a2 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -191,6 +191,26 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) return s; return s->memcg_params->root_cache; } + +static __always_inline int memcg_charge_slab(struct kmem_cache *s, + gfp_t gfp, int order) +{ + if (!memcg_kmem_enabled()) + return 0; + if (is_root_cache(s)) + return 0; + return memcg_charge_kmem(s->memcg_params->memcg, gfp, + PAGE_SIZE << order); +} + +static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order) +{ + if (!memcg_kmem_enabled()) + return; + if (is_root_cache(s)) + return; + memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order); +} #else static inline bool is_root_cache(struct kmem_cache *s) { @@ -226,6 +246,15 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) { return s; } + +static inline int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order) +{ + return 0; +} + +static inline void memcg_uncharge_slab(struct kmem_cache *s, int order) +{ +} #endif static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) diff --git a/mm/slab_common.c b/mm/slab_common.c index f3cfccf76dda..6673597ac967 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -290,12 +290,8 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c root_cache->size, root_cache->align, root_cache->flags, root_cache->ctor, memcg, root_cache); - if (IS_ERR(s)) { + if (IS_ERR(s)) kfree(cache_name); - goto out_unlock; - } - - s->allocflags |= __GFP_KMEMCG; out_unlock: mutex_unlock(&slab_mutex); diff --git a/mm/slub.c b/mm/slub.c index 5e234f1f8853..b203cfceff95 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1317,17 +1317,26 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) /* * Slab allocation and freeing */ -static inline struct page *alloc_slab_page(gfp_t flags, int node, - struct kmem_cache_order_objects oo) +static inline struct page *alloc_slab_page(struct kmem_cache *s, + gfp_t flags, int node, struct kmem_cache_order_objects oo) { + struct page *page; int order = oo_order(oo); flags |= __GFP_NOTRACK; + if (memcg_charge_slab(s, flags, order)) + return NULL; + if (node == NUMA_NO_NODE) - return alloc_pages(flags, order); + page = alloc_pages(flags, order); else - return alloc_pages_exact_node(node, flags, order); + page = alloc_pages_exact_node(node, flags, order); + + if (!page) + memcg_uncharge_slab(s, order); + + return page; } static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) @@ -1349,7 +1358,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) */ alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL; - page = alloc_slab_page(alloc_gfp, node, oo); + page = alloc_slab_page(s, alloc_gfp, node, oo); if (unlikely(!page)) { oo = s->min; alloc_gfp = flags; @@ -1357,7 +1366,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) * Allocation may have failed due to fragmentation. * Try a lower order alloc if possible */ - page = alloc_slab_page(alloc_gfp, node, oo); + page = alloc_slab_page(s, alloc_gfp, node, oo); if (page) stat(s, ORDER_FALLBACK); @@ -1473,7 +1482,8 @@ static void __free_slab(struct kmem_cache *s, struct page *page) page_mapcount_reset(page); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += pages; - __free_memcg_kmem_pages(page, order); + __free_pages(page, order); + memcg_uncharge_slab(s, order); } #define need_reserve_slab_rcu \ -- 1.7.10.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751585AbaDAHi6 (ORCPT ); Tue, 1 Apr 2014 03:38:58 -0400 Received: from relay.parallels.com ([195.214.232.42]:56765 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751204AbaDAHiv (ORCPT ); Tue, 1 Apr 2014 03:38:51 -0400 From: Vladimir Davydov To: CC: , , , , , , Subject: [PATCH -mm v2 0/2] cleanup kmemcg charging (was: "kmemcg: get rid of __GFP_KMEMCG") Date: Tue, 1 Apr 2014 11:38:43 +0400 Message-ID: X-Mailer: git-send-email 1.7.10.4 MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Currently we charge kmem to memcg in alloc_pages if __GFP_KMEMCG is passed. However, since there are only a few places where we actually want to charge kmem, we could call kmemcg charge function explicitly instead. That would remove all kmemcg-related stuff from the general allocation path and make all kmem charges easier to follow. So let's charge kmem explicitly where we want it to be charged (slab, threadinfo) and remove __GFP_KMEMCG. Changes in v2: - use static key optimization in memcg_(un)charge_slab to avoid any overhead if kmemcg is not used; - introduce helper functions, alloc/free_kmem_pages, which charge newly allocated pages to kmemcg, to avoid code duplication; - do not remove accounting of kmalloc_large allocations (as discussed in the comments to v1). v1 can be found at lkml.org/lkml/2014/3/26/228 Thanks, Vladimir Davydov (2): sl[au]b: charge slabs to kmemcg explicitly mm: get rid of __GFP_KMEMCG include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 17 ++++-------- include/linux/slab.h | 11 -------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/memcontrol.c | 4 +-- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab.c | 7 ++++- mm/slab.h | 29 ++++++++++++++++++++ mm/slab_common.c | 18 +++++++++---- mm/slub.c | 30 ++++++++++++++------- 12 files changed, 119 insertions(+), 72 deletions(-) -- 1.7.10.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757385AbaDBAzd (ORCPT ); Tue, 1 Apr 2014 20:55:33 -0400 Received: from mail-pa0-f73.google.com ([209.85.220.73]:45129 "EHLO mail-pa0-f73.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754271AbaDBAzb (ORCPT ); Tue, 1 Apr 2014 20:55:31 -0400 References: User-agent: mu4e 0.9.9.6pre2; emacs 24.3.1 From: Greg Thelen To: Vladimir Davydov Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Subject: Re: [PATCH -mm v2 2/2] mm: get rid of __GFP_KMEMCG In-reply-to: Date: Tue, 01 Apr 2014 17:48:41 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 01 2014, Vladimir Davydov wrote: > Currently to allocate a page that should be charged to kmemcg (e.g. > threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page > allocated is then to be freed by free_memcg_kmem_pages. Apart from > looking asymmetrical, this also requires intrusion to the general > allocation path. So let's introduce separate functions that will > alloc/free pages charged to kmemcg. > > The new functions are called alloc_kmem_pages and free_kmem_pages. They > should be used when the caller actually would like to use kmalloc, but > has to fall back to the page allocator for the allocation is large. They > only differ from alloc_pages and free_pages in that besides allocating > or freeing pages they also charge them to the kmem resource counter of > the current memory cgroup. > > Signed-off-by: Vladimir Davydov > --- > include/linux/gfp.h | 10 ++++--- > include/linux/memcontrol.h | 2 +- > include/linux/slab.h | 11 -------- > include/linux/thread_info.h | 2 -- > include/trace/events/gfpflags.h | 1 - > kernel/fork.c | 6 ++--- > mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- > mm/slab_common.c | 12 +++++++++ > mm/slub.c | 6 ++--- > 9 files changed, 60 insertions(+), 46 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 39b81dc7d01a..d382db71e300 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -31,7 +31,6 @@ struct vm_area_struct; > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > -#define ___GFP_KMEMCG 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > @@ -91,7 +90,6 @@ struct vm_area_struct; > > #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) > #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ > -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ > #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ > > /* > @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, > #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ > alloc_pages_vma(gfp_mask, 0, vma, addr, node) > > +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); > +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, > + unsigned int order); > + > extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); > extern unsigned long get_zeroed_page(gfp_t gfp_mask); > > @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); > extern void free_hot_cold_page(struct page *page, int cold); > extern void free_hot_cold_page_list(struct list_head *list, int cold); > > -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); > -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); > +extern void __free_kmem_pages(struct page *page, unsigned int order); > +extern void free_kmem_pages(unsigned long addr, unsigned int order); > > #define __free_page(page) __free_pages((page), 0) > #define free_page(addr) free_pages((addr), 0) > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 29068dd26c3d..13acdb5259f5 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) > * res_counter_charge_nofail, but we hope those allocations are rare, > * and won't be worth the trouble. > */ > - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) > + if (gfp & __GFP_NOFAIL) > return true; > if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) > return true; > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 3dd389aa91c7..6d6959292e00 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -358,17 +358,6 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, > #include > #endif > > -static __always_inline void * > -kmalloc_order(size_t size, gfp_t flags, unsigned int order) > -{ > - void *ret; > - > - flags |= (__GFP_COMP | __GFP_KMEMCG); > - ret = (void *) __get_free_pages(flags, order); > - kmemleak_alloc(ret, size, 1, flags); > - return ret; > -} > - Removing this from the header file breaks builds without CONFIG_TRACING. Example: % make allnoconfig && make -j4 mm/ [...] include/linux/slab.h: In function ‘kmalloc_order_trace’: include/linux/slab.h:367:2: error: implicit declaration of function ‘kmalloc_order’ [-Werror=implicit-function-declaration] > #ifdef CONFIG_TRACING > extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); > #else > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h > index fddbe2023a5d..1807bb194816 100644 > --- a/include/linux/thread_info.h > +++ b/include/linux/thread_info.h > @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); > # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) > #endif > > -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) > - > /* > * flag set/clear/test wrappers > * - pass TIF_xxxx constants to these functions > diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h > index 1eddbf1557f2..d6fd8e5b14b7 100644 > --- a/include/trace/events/gfpflags.h > +++ b/include/trace/events/gfpflags.h > @@ -34,7 +34,6 @@ > {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ > {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ > {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ > - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ > {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ > {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ > {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ > diff --git a/kernel/fork.c b/kernel/fork.c > index f4b09bc15f3a..a1632c878037 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) > static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, > int node) > { > - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, > - THREAD_SIZE_ORDER); > + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, > + THREAD_SIZE_ORDER); > > return page ? page_address(page) : NULL; > } > > static inline void free_thread_info(struct thread_info *ti) > { > - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); > + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); > } > # else > static struct kmem_cache *thread_info_cache; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0327f9d5a8c0..41378986a1e6 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > int migratetype = allocflags_to_migratetype(gfp_mask); > unsigned int cpuset_mems_cookie; > int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; > - struct mem_cgroup *memcg = NULL; > > gfp_mask &= gfp_allowed_mask; > > @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, > if (unlikely(!zonelist->_zonerefs->zone)) > return NULL; > > - /* > - * Will only have any effect when __GFP_KMEMCG is set. This is > - * verified in the (always inline) callee > - */ > - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) > - return NULL; > - > retry_cpuset: > cpuset_mems_cookie = read_mems_allowed_begin(); > > @@ -2810,8 +2802,6 @@ out: > if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) > goto retry_cpuset; > > - memcg_kmem_commit_charge(page, memcg, order); > - > if (page) > set_page_owner(page, order, gfp_mask); > > @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) > EXPORT_SYMBOL(free_pages); > > /* > - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free > - * pages allocated with __GFP_KMEMCG. > + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter > + * of the current memory cgroup. > * > - * Those pages are accounted to a particular memcg, embedded in the > - * corresponding page_cgroup. To avoid adding a hit in the allocator to search > - * for that information only to find out that it is NULL for users who have no > - * interest in that whatsoever, we provide these functions. > - * > - * The caller knows better which flags it relies on. > + * It should be used when the caller would like to use kmalloc, but since the > + * allocation is large, it has to fall back to the page allocator. > + */ > +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) > +{ > + struct page *page; > + struct mem_cgroup *memcg = NULL; > + > + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) > + return NULL; > + page = alloc_pages(gfp_mask, order); > + memcg_kmem_commit_charge(page, memcg, order); > + return page; > +} > + > +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) > +{ > + struct page *page; > + struct mem_cgroup *memcg = NULL; > + > + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) > + return NULL; > + page = alloc_pages_node(nid, gfp_mask, order); > + memcg_kmem_commit_charge(page, memcg, order); > + return page; > +} > + > +/* > + * __free_kmem_pages and free_kmem_pages will free pages allocated with > + * alloc_kmem_pages. > */ > -void __free_memcg_kmem_pages(struct page *page, unsigned int order) > +void __free_kmem_pages(struct page *page, unsigned int order) > { > memcg_kmem_uncharge_pages(page, order); > __free_pages(page, order); > } > > -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) > +void free_kmem_pages(unsigned long addr, unsigned int order) > { > if (addr != 0) { > VM_BUG_ON(!virt_addr_valid((void *)addr)); > - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); > + __free_kmem_pages(virt_to_page((void *)addr), order); > } > } > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 6673597ac967..cab4c49b3e8c 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) > } > #endif /* !CONFIG_SLOB */ > > +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) > +{ > + void *ret; > + struct page *page; > + > + flags |= __GFP_COMP; > + page = alloc_kmem_pages(flags, order); > + ret = page ? page_address(page) : NULL; > + kmemleak_alloc(ret, size, 1, flags); > + return ret; > +} > + > #ifdef CONFIG_TRACING > void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) > { > diff --git a/mm/slub.c b/mm/slub.c > index b203cfceff95..fa7a1817835e 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) > struct page *page; > void *ptr = NULL; > > - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; > - page = alloc_pages_node(node, flags, get_order(size)); > + flags |= __GFP_COMP | __GFP_NOTRACK; > + page = alloc_kmem_pages_node(node, flags, get_order(size)); > if (page) > ptr = page_address(page); > > @@ -3405,7 +3405,7 @@ void kfree(const void *x) > if (unlikely(!PageSlab(page))) { > BUG_ON(!PageCompound(page)); > kfree_hook(x); > - __free_memcg_kmem_pages(page, compound_order(page)); > + __free_kmem_pages(page, compound_order(page)); > return; > } > slab_free(page->slab_cache, page, object, _RET_IP_); From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755463AbaDBBRc (ORCPT ); Tue, 1 Apr 2014 21:17:32 -0400 Received: from mail-qa0-f74.google.com ([209.85.216.74]:35295 "EHLO mail-qa0-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751676AbaDBBRa (ORCPT ); Tue, 1 Apr 2014 21:17:30 -0400 References: <031f9e2374dcb4cb6c2e7d509d1276623d5b1fba.1396335798.git.vdavydov@parallels.com> User-agent: mu4e 0.9.9.6pre2; emacs 24.3.1 From: Greg Thelen To: Vladimir Davydov Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Subject: Re: [PATCH -mm v2 1/2] sl[au]b: charge slabs to kmemcg explicitly In-reply-to: <031f9e2374dcb4cb6c2e7d509d1276623d5b1fba.1396335798.git.vdavydov@parallels.com> Date: Tue, 01 Apr 2014 17:49:34 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 01 2014, Vladimir Davydov wrote: > We have only a few places where we actually want to charge kmem so > instead of intruding into the general page allocation path with > __GFP_KMEMCG it's better to explictly charge kmem there. All kmem > charges will be easier to follow that way. > > This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG > from memcg caches' allocflags. Instead it makes slab allocation path > call memcg_charge_kmem directly getting memcg to charge from the cache's > memcg params. > > This also eliminates any possibility of misaccounting an allocation > going from one memcg's cache to another memcg, because now we always > charge slabs against the memcg the cache belongs to. That's why this > patch removes the big comment to memcg_kmem_get_cache. > > Signed-off-by: Vladimir Davydov Acked-by: Greg Thelen From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757432AbaDBGLV (ORCPT ); Wed, 2 Apr 2014 02:11:21 -0400 Received: from relay.parallels.com ([195.214.232.42]:44962 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756703AbaDBGLU (ORCPT ); Wed, 2 Apr 2014 02:11:20 -0400 Message-ID: <533BAA03.9000406@parallels.com> Date: Wed, 2 Apr 2014 10:11:15 +0400 From: Vladimir Davydov MIME-Version: 1.0 To: Greg Thelen CC: , , , , , , Subject: Re: [PATCH -mm v2 2/2] mm: get rid of __GFP_KMEMCG References: In-Reply-To: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/02/2014 04:48 AM, Greg Thelen wrote: > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 3dd389aa91c7..6d6959292e00 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -358,17 +358,6 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, > #include > #endif > > -static __always_inline void * > -kmalloc_order(size_t size, gfp_t flags, unsigned int order) > -{ > - void *ret; > - > - flags |= (__GFP_COMP | __GFP_KMEMCG); > - ret = (void *) __get_free_pages(flags, order); > - kmemleak_alloc(ret, size, 1, flags); > - return ret; > -} > - > Removing this from the header file breaks builds without > CONFIG_TRACING. > Example: > % make allnoconfig && make -j4 mm/ > [...] > include/linux/slab.h: In function ‘kmalloc_order_trace’: > include/linux/slab.h:367:2: error: implicit declaration of function ‘kmalloc_order’ [-Werror=implicit-function-declaration] Oh, my bad - forgot to add the function declaration :-( The fixed version is on its way. Thank you for catching this! From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757335AbaDBGQJ (ORCPT ); Wed, 2 Apr 2014 02:16:09 -0400 Received: from relay.parallels.com ([195.214.232.42]:45671 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755713AbaDBGQI (ORCPT ); Wed, 2 Apr 2014 02:16:08 -0400 From: Vladimir Davydov To: CC: , , , , , , Subject: [PATCH -mm v2.1] mm: get rid of __GFP_KMEMCG Date: Wed, 2 Apr 2014 10:16:05 +0400 Message-ID: <1396419365-351-1-git-send-email-vdavydov@parallels.com> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. Signed-off-by: Vladimir Davydov Cc: Johannes Weiner Cc: Michal Hocko Cc: Glauber Costa Cc: Christoph Lameter Cc: Pekka Enberg --- Changes in v2.1: - add missing kmalloc_order forward declaration; lacking it caused compilation breakage with CONFIG_TRACING=n include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 2 +- include/linux/slab.h | 11 +------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab_common.c | 12 +++++++++ mm/slub.c | 6 ++--- 9 files changed, 61 insertions(+), 45 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 29068dd26c3d..13acdb5259f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; diff --git a/include/linux/slab.h b/include/linux/slab.h index 3dd389aa91c7..eaaf2f69c54d 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/kernel/fork.c b/kernel/fork.c index f4b09bc15f3a..a1632c878037 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0327f9d5a8c0..41378986a1e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2810,8 +2802,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - if (page) set_page_owner(page, order, gfp_mask); @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/slab_common.c b/mm/slab_common.c index 6673597ac967..cab4c49b3e8c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index b203cfceff95..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3405,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); -- 1.7.10.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030422AbaDBV0J (ORCPT ); Wed, 2 Apr 2014 17:26:09 -0400 Received: from mail-yh0-f74.google.com ([209.85.213.74]:56692 "EHLO mail-yh0-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1030257AbaDBV0G (ORCPT ); Wed, 2 Apr 2014 17:26:06 -0400 References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> User-agent: mu4e 0.9.9.6pre2; emacs 24.3.1 From: Greg Thelen To: Vladimir Davydov Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Subject: Re: [PATCH -mm v2.1] mm: get rid of __GFP_KMEMCG In-reply-to: <1396419365-351-1-git-send-email-vdavydov@parallels.com> Date: Wed, 02 Apr 2014 14:25:47 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 01 2014, Vladimir Davydov wrote: > Currently to allocate a page that should be charged to kmemcg (e.g. > threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page > allocated is then to be freed by free_memcg_kmem_pages. Apart from > looking asymmetrical, this also requires intrusion to the general > allocation path. So let's introduce separate functions that will > alloc/free pages charged to kmemcg. > > The new functions are called alloc_kmem_pages and free_kmem_pages. They > should be used when the caller actually would like to use kmalloc, but > has to fall back to the page allocator for the allocation is large. They > only differ from alloc_pages and free_pages in that besides allocating > or freeing pages they also charge them to the kmem resource counter of > the current memory cgroup. > > Signed-off-by: Vladimir Davydov One comment nit below, otherwise looks good to me. Acked-by: Greg Thelen > Cc: Johannes Weiner > Cc: Michal Hocko > Cc: Glauber Costa > Cc: Christoph Lameter > Cc: Pekka Enberg > --- > Changes in v2.1: > - add missing kmalloc_order forward declaration; lacking it caused > compilation breakage with CONFIG_TRACING=n > > include/linux/gfp.h | 10 ++++--- > include/linux/memcontrol.h | 2 +- > include/linux/slab.h | 11 +------- > include/linux/thread_info.h | 2 -- > include/trace/events/gfpflags.h | 1 - > kernel/fork.c | 6 ++--- > mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- > mm/slab_common.c | 12 +++++++++ > mm/slub.c | 6 ++--- > 9 files changed, 61 insertions(+), 45 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 39b81dc7d01a..d382db71e300 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -31,7 +31,6 @@ struct vm_area_struct; > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > -#define ___GFP_KMEMCG 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > @@ -91,7 +90,6 @@ struct vm_area_struct; > > #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) > #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ > -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ > #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ > > /* > @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, > #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ > alloc_pages_vma(gfp_mask, 0, vma, addr, node) > > +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); > +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, > + unsigned int order); > + > extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); > extern unsigned long get_zeroed_page(gfp_t gfp_mask); > > @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); > extern void free_hot_cold_page(struct page *page, int cold); > extern void free_hot_cold_page_list(struct list_head *list, int cold); > > -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); > -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); > +extern void __free_kmem_pages(struct page *page, unsigned int order); > +extern void free_kmem_pages(unsigned long addr, unsigned int order); > > #define __free_page(page) __free_pages((page), 0) > #define free_page(addr) free_pages((addr), 0) > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 29068dd26c3d..13acdb5259f5 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) > * res_counter_charge_nofail, but we hope those allocations are rare, > * and won't be worth the trouble. > */ Just a few lines higher in first memcg_kmem_newpage_charge() comment, there is a leftover reference to GFP_KMEMCG which should be removed. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752658AbaDCPEM (ORCPT ); Thu, 3 Apr 2014 11:04:12 -0400 Received: from relay.parallels.com ([195.214.232.42]:32781 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752302AbaDCPEI (ORCPT ); Thu, 3 Apr 2014 11:04:08 -0400 Message-ID: <533D7864.7080907@parallels.com> Date: Thu, 3 Apr 2014 19:04:04 +0400 From: Vladimir Davydov MIME-Version: 1.0 To: Greg Thelen CC: , , , , , , Subject: Re: [PATCH -mm v2.1] mm: get rid of __GFP_KMEMCG References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/03/2014 01:25 AM, Greg Thelen wrote: > On Tue, Apr 01 2014, Vladimir Davydov wrote: > >> Currently to allocate a page that should be charged to kmemcg (e.g. >> threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page >> allocated is then to be freed by free_memcg_kmem_pages. Apart from >> looking asymmetrical, this also requires intrusion to the general >> allocation path. So let's introduce separate functions that will >> alloc/free pages charged to kmemcg. >> >> The new functions are called alloc_kmem_pages and free_kmem_pages. They >> should be used when the caller actually would like to use kmalloc, but >> has to fall back to the page allocator for the allocation is large. They >> only differ from alloc_pages and free_pages in that besides allocating >> or freeing pages they also charge them to the kmem resource counter of >> the current memory cgroup. >> >> Signed-off-by: Vladimir Davydov > One comment nit below, otherwise looks good to me. > > Acked-by: Greg Thelen > >> Cc: Johannes Weiner >> Cc: Michal Hocko >> Cc: Glauber Costa >> Cc: Christoph Lameter >> Cc: Pekka Enberg >> --- >> Changes in v2.1: >> - add missing kmalloc_order forward declaration; lacking it caused >> compilation breakage with CONFIG_TRACING=n >> >> include/linux/gfp.h | 10 ++++--- >> include/linux/memcontrol.h | 2 +- >> include/linux/slab.h | 11 +------- >> include/linux/thread_info.h | 2 -- >> include/trace/events/gfpflags.h | 1 - >> kernel/fork.c | 6 ++--- >> mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- >> mm/slab_common.c | 12 +++++++++ >> mm/slub.c | 6 ++--- >> 9 files changed, 61 insertions(+), 45 deletions(-) >> >> diff --git a/include/linux/gfp.h b/include/linux/gfp.h >> index 39b81dc7d01a..d382db71e300 100644 >> --- a/include/linux/gfp.h >> +++ b/include/linux/gfp.h >> @@ -31,7 +31,6 @@ struct vm_area_struct; >> #define ___GFP_HARDWALL 0x20000u >> #define ___GFP_THISNODE 0x40000u >> #define ___GFP_RECLAIMABLE 0x80000u >> -#define ___GFP_KMEMCG 0x100000u >> #define ___GFP_NOTRACK 0x200000u >> #define ___GFP_NO_KSWAPD 0x400000u >> #define ___GFP_OTHER_NODE 0x800000u >> @@ -91,7 +90,6 @@ struct vm_area_struct; >> >> #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) >> #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ >> -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ >> #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ >> >> /* >> @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, >> #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ >> alloc_pages_vma(gfp_mask, 0, vma, addr, node) >> >> +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); >> +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, >> + unsigned int order); >> + >> extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); >> extern unsigned long get_zeroed_page(gfp_t gfp_mask); >> >> @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); >> extern void free_hot_cold_page(struct page *page, int cold); >> extern void free_hot_cold_page_list(struct list_head *list, int cold); >> >> -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); >> -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); >> +extern void __free_kmem_pages(struct page *page, unsigned int order); >> +extern void free_kmem_pages(unsigned long addr, unsigned int order); >> >> #define __free_page(page) __free_pages((page), 0) >> #define free_page(addr) free_pages((addr), 0) >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 29068dd26c3d..13acdb5259f5 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) >> * res_counter_charge_nofail, but we hope those allocations are rare, >> * and won't be worth the trouble. >> */ > Just a few lines higher in first memcg_kmem_newpage_charge() comment, > there is a leftover reference to GFP_KMEMCG which should be removed. Good catch, will resend. Thank you for the review! From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752682AbaDCPGO (ORCPT ); Thu, 3 Apr 2014 11:06:14 -0400 Received: from relay.parallels.com ([195.214.232.42]:33020 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752570AbaDCPGE (ORCPT ); Thu, 3 Apr 2014 11:06:04 -0400 From: Vladimir Davydov To: CC: , , , Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Christoph Lameter , Pekka Enberg Subject: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG Date: Thu, 3 Apr 2014 19:05:59 +0400 Message-ID: <1396537559-17453-1-git-send-email-vdavydov@parallels.com> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1396419365-351-1-git-send-email-vdavydov@parallels.com> References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. Signed-off-by: Vladimir Davydov Acked-by: Greg Thelen Cc: Johannes Weiner Cc: Michal Hocko Cc: Glauber Costa Cc: Christoph Lameter Cc: Pekka Enberg --- Changes in v2.2: - remove mentioning of GFP_KMEMCG in the comment to __memcg_kmem_newpage_charge Changes in v2.1: - add missing kmalloc_order forward declaration; lacking it caused compilation breakage with CONFIG_TRACING=n include/linux/gfp.h | 10 ++++--- include/linux/memcontrol.h | 2 +- include/linux/slab.h | 11 +------- include/linux/thread_info.h | 2 -- include/trace/events/gfpflags.h | 1 - kernel/fork.c | 6 ++--- mm/memcontrol.c | 11 ++++---- mm/page_alloc.c | 56 ++++++++++++++++++++++++--------------- mm/slab_common.c | 12 +++++++++ mm/slub.c | 6 ++--- 10 files changed, 67 insertions(+), 50 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 29068dd26c3d..13acdb5259f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -543,7 +543,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; diff --git a/include/linux/slab.h b/include/linux/slab.h index 3dd389aa91c7..eaaf2f69c54d 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/kernel/fork.c b/kernel/fork.c index f4b09bc15f3a..a1632c878037 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 71e1e11ab6a2..ba30b26fc483 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3531,11 +3531,12 @@ __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order) /* * Disabling accounting is only relevant for some specific memcg * internal allocations. Therefore we would initially not have such - * check here, since direct calls to the page allocator that are marked - * with GFP_KMEMCG only happen outside memcg core. We are mostly - * concerned with cache allocations, and by having this test at - * memcg_kmem_get_cache, we are already able to relay the allocation to - * the root cache and bypass the memcg cache altogether. + * check here, since direct calls to the page allocator that are + * accounted to kmemcg (alloc_kmem_pages and friends) only happen + * outside memcg core. We are mostly concerned with cache allocations, + * and by having this test at memcg_kmem_get_cache, we are already able + * to relay the allocation to the root cache and bypass the memcg cache + * altogether. * * There is one exception, though: the SLUB allocator does not create * large order caches, but rather service large kmallocs directly from diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0327f9d5a8c0..41378986a1e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2723,7 +2723,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2742,13 +2741,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2810,8 +2802,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - if (page) set_page_owner(page, order, gfp_mask); @@ -2868,27 +2858,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/slab_common.c b/mm/slab_common.c index 6673597ac967..cab4c49b3e8c 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,18 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index b203cfceff95..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3335,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3405,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); -- 1.7.10.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759319AbaDJXie (ORCPT ); Thu, 10 Apr 2014 19:38:34 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:50029 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754206AbaDJXic (ORCPT ); Thu, 10 Apr 2014 19:38:32 -0400 Date: Thu, 10 Apr 2014 16:38:31 -0700 From: Andrew Morton To: Vladimir Davydov Cc: , , , Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Christoph Lameter , Pekka Enberg Subject: Re: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG Message-Id: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> In-Reply-To: <1396537559-17453-1-git-send-email-vdavydov@parallels.com> References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> <1396537559-17453-1-git-send-email-vdavydov@parallels.com> X-Mailer: Sylpheed 3.2.0beta5 (GTK+ 2.24.10; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 3 Apr 2014 19:05:59 +0400 Vladimir Davydov wrote: > Currently to allocate a page that should be charged to kmemcg (e.g. > threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page > allocated is then to be freed by free_memcg_kmem_pages. Apart from > looking asymmetrical, this also requires intrusion to the general > allocation path. So let's introduce separate functions that will > alloc/free pages charged to kmemcg. > > The new functions are called alloc_kmem_pages and free_kmem_pages. They > should be used when the caller actually would like to use kmalloc, but > has to fall back to the page allocator for the allocation is large. They > only differ from alloc_pages and free_pages in that besides allocating > or freeing pages they also charge them to the kmem resource counter of > the current memory cgroup. > > ... > > +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) > +{ > + void *ret; > + struct page *page; > + > + flags |= __GFP_COMP; > + page = alloc_kmem_pages(flags, order); > + ret = page ? page_address(page) : NULL; > + kmemleak_alloc(ret, size, 1, flags); > + return ret; > +} While we're in there it wouldn't hurt to document this: why it exists, what it does, etc. And why it sets __GFP_COMP. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756620AbaDKMw0 (ORCPT ); Fri, 11 Apr 2014 08:52:26 -0400 Received: from relay.parallels.com ([195.214.232.42]:56982 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754275AbaDKMwY (ORCPT ); Fri, 11 Apr 2014 08:52:24 -0400 From: Vladimir Davydov To: CC: , , , , , , , , Subject: [PATCH -mm] slab: document kmalloc_order Date: Fri, 11 Apr 2014 16:52:16 +0400 Message-ID: <1397220736-13840-1-git-send-email-vdavydov@parallels.com> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> References: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.30.16.96] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Signed-off-by: Vladimir Davydov --- mm/slab_common.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/mm/slab_common.c b/mm/slab_common.c index cab4c49b3e8c..3ffd2e76b5d2 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,11 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +/* + * To avoid unnecessary overhead, we pass through large allocation requests + * directly to the page allocator. We use __GFP_COMP, because we will need to + * know the allocation order to free the pages properly in kfree. + */ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) { void *ret; -- 1.7.10.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757288AbaDKP57 (ORCPT ); Fri, 11 Apr 2014 11:57:59 -0400 Received: from qmta10.emeryville.ca.mail.comcast.net ([76.96.30.17]:47731 "EHLO qmta10.emeryville.ca.mail.comcast.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754045AbaDKP56 (ORCPT ); Fri, 11 Apr 2014 11:57:58 -0400 Date: Fri, 11 Apr 2014 10:57:54 -0500 (CDT) From: Christoph Lameter X-X-Sender: cl@nuc To: Vladimir Davydov cc: akpm@linux-foundation.org, penberg@kernel.org, gthelen@google.com, hannes@cmpxchg.org, mhocko@suse.cz, glommer@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org Subject: Re: [PATCH -mm] slab: document kmalloc_order In-Reply-To: <1397220736-13840-1-git-send-email-vdavydov@parallels.com> Message-ID: References: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> <1397220736-13840-1-git-send-email-vdavydov@parallels.com> Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 11 Apr 2014, Vladimir Davydov wrote: > diff --git a/mm/slab_common.c b/mm/slab_common.c > index cab4c49b3e8c..3ffd2e76b5d2 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -573,6 +573,11 @@ void __init create_kmalloc_caches(unsigned long flags) > } > #endif /* !CONFIG_SLOB */ > > +/* > + * To avoid unnecessary overhead, we pass through large allocation requests > + * directly to the page allocator. We use __GFP_COMP, because we will need to > + * know the allocation order to free the pages properly in kfree. > + */ > void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) > { > void *ret; > ??? kmalloc_order is defined in include/linux/slab.h From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759687AbaDKQJ0 (ORCPT ); Fri, 11 Apr 2014 12:09:26 -0400 Received: from qmta10.emeryville.ca.mail.comcast.net ([76.96.30.17]:44697 "EHLO qmta10.emeryville.ca.mail.comcast.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759445AbaDKQHv (ORCPT ); Fri, 11 Apr 2014 12:07:51 -0400 Date: Fri, 11 Apr 2014 11:07:48 -0500 (CDT) From: Christoph Lameter X-X-Sender: cl@nuc To: Vladimir Davydov cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, devel@openvz.org, Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Pekka Enberg Subject: Re: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG In-Reply-To: <1396537559-17453-1-git-send-email-vdavydov@parallels.com> Message-ID: References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> <1396537559-17453-1-git-send-email-vdavydov@parallels.com> Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 3 Apr 2014, Vladimir Davydov wrote: > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, > #include > #endif > > -static __always_inline void * > -kmalloc_order(size_t size, gfp_t flags, unsigned int order) > -{ > - void *ret; > - > - flags |= (__GFP_COMP | __GFP_KMEMCG); > - ret = (void *) __get_free_pages(flags, order); > - kmemleak_alloc(ret, size, 1, flags); > - return ret; > -} > +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); Hmmm... This was intentional inlined to allow inline expansion for calls to kmalloc with large constants. The inline expansion directly converts these calls to page allocator calls avoiding slab overhead. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030230AbaDKRYq (ORCPT ); Fri, 11 Apr 2014 13:24:46 -0400 Received: from relay.parallels.com ([195.214.232.42]:40105 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754451AbaDKRYn (ORCPT ); Fri, 11 Apr 2014 13:24:43 -0400 Message-ID: <53482555.4070603@parallels.com> Date: Fri, 11 Apr 2014 21:24:37 +0400 From: Vladimir Davydov MIME-Version: 1.0 To: Christoph Lameter CC: , , , , , , , , Subject: Re: [PATCH -mm] slab: document kmalloc_order References: <20140410163831.c76596b0f8d0bef39a42c63f@linux-foundation.org> <1397220736-13840-1-git-send-email-vdavydov@parallels.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [81.5.110.170] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/11/2014 07:57 PM, Christoph Lameter wrote: > On Fri, 11 Apr 2014, Vladimir Davydov wrote: > >> diff --git a/mm/slab_common.c b/mm/slab_common.c >> index cab4c49b3e8c..3ffd2e76b5d2 100644 >> --- a/mm/slab_common.c >> +++ b/mm/slab_common.c >> @@ -573,6 +573,11 @@ void __init create_kmalloc_caches(unsigned long flags) >> } >> #endif /* !CONFIG_SLOB */ >> >> +/* >> + * To avoid unnecessary overhead, we pass through large allocation requests >> + * directly to the page allocator. We use __GFP_COMP, because we will need to >> + * know the allocation order to free the pages properly in kfree. >> + */ >> void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) >> { >> void *ret; >> > ??? kmalloc_order is defined in include/linux/slab.h I moved it to slab_common.c in "[PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG" From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753687AbaDKRdS (ORCPT ); Fri, 11 Apr 2014 13:33:18 -0400 Received: from relay.parallels.com ([195.214.232.42]:41099 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750768AbaDKRdR (ORCPT ); Fri, 11 Apr 2014 13:33:17 -0400 Message-ID: <53482754.1090102@parallels.com> Date: Fri, 11 Apr 2014 21:33:08 +0400 From: Vladimir Davydov MIME-Version: 1.0 To: Christoph Lameter CC: , , , , Greg Thelen , Johannes Weiner , Michal Hocko , Glauber Costa , Pekka Enberg Subject: Re: [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG References: <1396419365-351-1-git-send-email-vdavydov@parallels.com> <1396537559-17453-1-git-send-email-vdavydov@parallels.com> In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [81.5.110.170] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/11/2014 08:07 PM, Christoph Lameter wrote: > On Thu, 3 Apr 2014, Vladimir Davydov wrote: > >> --- a/include/linux/slab.h >> +++ b/include/linux/slab.h >> @@ -358,16 +358,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, >> #include >> #endif >> >> -static __always_inline void * >> -kmalloc_order(size_t size, gfp_t flags, unsigned int order) >> -{ >> - void *ret; >> - >> - flags |= (__GFP_COMP | __GFP_KMEMCG); >> - ret = (void *) __get_free_pages(flags, order); >> - kmemleak_alloc(ret, size, 1, flags); >> - return ret; >> -} >> +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); > > Hmmm... This was intentional inlined to allow inline expansion for calls > to kmalloc with large constants. The inline expansion directly converts > these calls to page allocator calls avoiding slab overhead. I moved kmalloc_order() to slab_common.c, because I can't call alloc_kmem_pages() directly from the header (we don't have page_address() defined there, and including mm.h to slab.h wouldn't be good I think), and I don't want to introduce __get_free_kmem_pages(). Sorry that I didn't state this explicitly in the comment to the patch. However, would it be any better if I introduced __get_free_kmem_pages() and called it from kmalloc_order(), which could be inlined then? I don't think so, because I would have to place __get_free_kmem_pages() in page_alloc.c just like __get_free_pages() (again, because including mm.h to gfp.h for page_address() isn't an option), and we would get exactly the same number of function calls. I admit that this patch adds one extra function call to large kmallocs: - before: __get_free_pages -> alloc_pages - after: kmalloc_order -> alloc_kmem_pages -> alloc_pages but that's not because I move kmalloc_order from the header, but rather because I introduce alloc_kmem_pages, which is not inline. What can we do to eliminate it? We could place alloc_kmem_pages() definition to a header file, but since it needs memcontrol.h for kmemcg charging functions, that would lead to slab.h depending on memcontrol.h eventually, which is not good IMO. Alternatively, we could #ifndef MEMCG_KMEM # define alloc_kmem_pages alloc_pages #endif so that we would avoid any additional overhead if kmemcg is compiled out. However, do we need to bother about one function call at all? My point is that one function call can be neglected in case of large kmem allocations, which are rather rare. Any thoughts/objections? Thanks.