* [patch 0/8] memcg: charge path cleanups
@ 2014-02-07 17:04 Johannes Weiner
2014-02-07 17:04 ` [patch 1/8] mm: memcg: remove unnecessary preemption disabling Johannes Weiner
` (9 more replies)
0 siblings, 10 replies; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
Hi Michal,
I took some of your patches and combined them with the charge path
cleanups I already had and the changes I made after our discussion.
I'm really happy about where this is going:
mm/memcontrol.c | 298 ++++++++++++++++--------------------------------------
1 file changed, 87 insertions(+), 211 deletions(-)
let me know what you think!
Johannes
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [patch 1/8] mm: memcg: remove unnecessary preemption disabling
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-10 14:17 ` Michal Hocko
2014-02-07 17:04 ` [patch 2/8] mm: memcg: remove mem_cgroup_move_account_page_stat() Johannes Weiner
` (8 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
lock_page_cgroup() disables preemption, remove explicit preemption
disabling for code paths holding this lock.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 53385cd4e6f0..befb3dd9d46c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -921,8 +921,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
struct page *page,
bool anon, int nr_pages)
{
- preempt_disable();
-
/*
* Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
* counted as CACHE even if it's on ANON LRU.
@@ -947,8 +945,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
}
__this_cpu_add(memcg->stat->nr_page_events, nr_pages);
-
- preempt_enable();
}
unsigned long
@@ -3780,17 +3776,14 @@ void mem_cgroup_split_huge_fixup(struct page *head)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline
-void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
- struct mem_cgroup *to,
- unsigned int nr_pages,
- enum mem_cgroup_stat_index idx)
+static void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
+ struct mem_cgroup *to,
+ unsigned int nr_pages,
+ enum mem_cgroup_stat_index idx)
{
/* Update stat data for mem_cgroup */
- preempt_disable();
__this_cpu_sub(from->stat->count[idx], nr_pages);
__this_cpu_add(to->stat->count[idx], nr_pages);
- preempt_enable();
}
/**
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 2/8] mm: memcg: remove mem_cgroup_move_account_page_stat()
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
2014-02-07 17:04 ` [patch 1/8] mm: memcg: remove unnecessary preemption disabling Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-10 14:19 ` Michal Hocko
2014-02-07 17:04 ` [patch 3/8] memcg: update comment about charge reparenting on cgroup exit Johannes Weiner
` (7 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
It used to disable preemption and run sanity checks but now it's only
taking a number out of one percpu counter and putting it into another.
Do this directly in the callsite and save the indirection.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index befb3dd9d46c..639cf58b2643 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3776,16 +3776,6 @@ void mem_cgroup_split_huge_fixup(struct page *head)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-static void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
- struct mem_cgroup *to,
- unsigned int nr_pages,
- enum mem_cgroup_stat_index idx)
-{
- /* Update stat data for mem_cgroup */
- __this_cpu_sub(from->stat->count[idx], nr_pages);
- __this_cpu_add(to->stat->count[idx], nr_pages);
-}
-
/**
* mem_cgroup_move_account - move account of the page
* @page: the page
@@ -3831,13 +3821,19 @@ static int mem_cgroup_move_account(struct page *page,
move_lock_mem_cgroup(from, &flags);
- if (!anon && page_mapped(page))
- mem_cgroup_move_account_page_stat(from, to, nr_pages,
- MEM_CGROUP_STAT_FILE_MAPPED);
+ if (!anon && page_mapped(page)) {
+ __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
+ nr_pages);
+ __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
+ nr_pages);
+ }
- if (PageWriteback(page))
- mem_cgroup_move_account_page_stat(from, to, nr_pages,
- MEM_CGROUP_STAT_WRITEBACK);
+ if (PageWriteback(page)) {
+ __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_WRITEBACK],
+ nr_pages);
+ __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_WRITEBACK],
+ nr_pages);
+ }
mem_cgroup_charge_statistics(from, page, anon, -nr_pages);
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 3/8] memcg: update comment about charge reparenting on cgroup exit
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
2014-02-07 17:04 ` [patch 1/8] mm: memcg: remove unnecessary preemption disabling Johannes Weiner
2014-02-07 17:04 ` [patch 2/8] mm: memcg: remove mem_cgroup_move_account_page_stat() Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-10 14:23 ` Michal Hocko
2014-02-07 17:04 ` [patch 4/8] memcg: !memcg && !mm is not allowed for __mem_cgroup_try_charge Johannes Weiner
` (6 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
Reparenting memory charges in the css_free() callback was meant as a
temporary fix for charges that race with offlining, but after some
follow-up discussion, it turns out that this is really the right place
to reparent charges because it guarantees none are in-flight.
Make clear that the reparenting in css_offline() is an optimistic
sweep of established charges because swapout records might hold up
css_free() indefinitely, but that in fact the css_free() reparenting
is the properly synchronized one.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 52 +++++++++++++++-------------------------------------
1 file changed, 15 insertions(+), 37 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 639cf58b2643..b8a96c7d1167 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6600,51 +6600,29 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
kmem_cgroup_css_offline(memcg);
mem_cgroup_invalidate_reclaim_iterators(memcg);
- mem_cgroup_reparent_charges(memcg);
mem_cgroup_destroy_all_caches(memcg);
vmpressure_cleanup(&memcg->vmpressure);
+ /*
+ * Memcg gets css references while charging the res_counter,
+ * so we reparent charges in .css_free() when the references
+ * are gone and we know there are no in-flight charges.
+ *
+ * However, at this time, swapout records also hold css refs
+ * indefinitely beyond offlining, which prevent .css_free()
+ * from being called. But after offlining, css_tryget() is
+ * disabled, which means that all the left-over page cache in
+ * the group would be stuck without being reclaimable. Clear
+ * out all those already established charges optimistically
+ * here, and catch any raced charges in .css_free() later on.
+ */
+ mem_cgroup_reparent_charges(memcg);
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
- /*
- * XXX: css_offline() would be where we should reparent all
- * memory to prepare the cgroup for destruction. However,
- * memcg does not do css_tryget() and res_counter charging
- * under the same RCU lock region, which means that charging
- * could race with offlining. Offlining only happens to
- * cgroups with no tasks in them but charges can show up
- * without any tasks from the swapin path when the target
- * memcg is looked up from the swapout record and not from the
- * current task as it usually is. A race like this can leak
- * charges and put pages with stale cgroup pointers into
- * circulation:
- *
- * #0 #1
- * lookup_swap_cgroup_id()
- * rcu_read_lock()
- * mem_cgroup_lookup()
- * css_tryget()
- * rcu_read_unlock()
- * disable css_tryget()
- * call_rcu()
- * offline_css()
- * reparent_charges()
- * res_counter_charge()
- * css_put()
- * css_free()
- * pc->mem_cgroup = dead memcg
- * add page to lru
- *
- * The bulk of the charges are still moved in offline_css() to
- * avoid pinning a lot of pages in case a long-term reference
- * like a swapout record is deferring the css_free() to long
- * after offlining. But this makes sure we catch any charges
- * made after offlining:
- */
- mem_cgroup_reparent_charges(memcg);
+ mem_cgroup_reparent_charges(memcg);
memcg_destroy_kmem(memcg);
__mem_cgroup_free(memcg);
}
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 4/8] memcg: !memcg && !mm is not allowed for __mem_cgroup_try_charge
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (2 preceding siblings ...)
2014-02-07 17:04 ` [patch 3/8] memcg: update comment about charge reparenting on cgroup exit Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-07 17:04 ` [patch 5/8] memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm() Johannes Weiner
` (5 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
From: Michal Hocko <mhocko@suse.cz>
An ancient comment tries to explain that a given mm might be NULL when a
task is migrated. It has been introduced by 8a9f3ccd (Memory controller:
memory accounting) along with other bigger changes so it is not much
more specific about the conditions.
Anyway, Even if the task is migrated to another memcg there is no way we
can see NULL mm struct. So either this was not correct from the very
beginning or it is not true anymore.
The only remaining case would be seeing charges after exit_mm but that
would be a bug on its own as the task doesn't have an address space
anymore.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b8a96c7d1167..e1d7f33227e4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2737,15 +2737,6 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
if (gfp_mask & __GFP_NOFAIL)
oom = false;
-
- /*
- * We always charge the cgroup the mm_struct belongs to.
- * The mm_struct's mem_cgroup changes on task migration if the
- * thread group leader migrates. It's possible that mm is not
- * set, if so charge the root memcg (happens for pagecache usage).
- */
- if (!*ptr && !mm)
- *ptr = root_mem_cgroup;
again:
if (*ptr) { /* css should be a valid one */
memcg = *ptr;
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 5/8] memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm()
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (3 preceding siblings ...)
2014-02-07 17:04 ` [patch 4/8] memcg: !memcg && !mm is not allowed for __mem_cgroup_try_charge Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-10 14:29 ` Michal Hocko
2014-02-07 17:04 ` [patch 6/8] memcg: get_mem_cgroup_from_mm() Johannes Weiner
` (4 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
Users pass either a mm that has been established under task lock, or
use a verified current->mm, which means the task can't be exiting.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e1d7f33227e4..689fffdee471 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1075,13 +1075,6 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
{
struct mem_cgroup *memcg = NULL;
- if (!mm)
- return NULL;
- /*
- * Because we have no locks, mm->owner's may be being moved to other
- * cgroup. We use css_tryget() here even if this looks
- * pessimistic (rather than adding locks here).
- */
rcu_read_lock();
do {
memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 6/8] memcg: get_mem_cgroup_from_mm()
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (4 preceding siblings ...)
2014-02-07 17:04 ` [patch 5/8] memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm() Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-10 14:41 ` Michal Hocko
2014-02-07 17:04 ` [patch 7/8] memcg: do not replicate get_mem_cgroup_from_mm in __mem_cgroup_try_charge Johannes Weiner
` (3 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
Instead of returning NULL from try_get_mem_cgroup_from_mm() when the
mm owner is exiting, just return root_mem_cgroup. This makes sense
for all callsites and gets rid of some of them having to fallback
manually.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 689fffdee471..37946635655c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1071,7 +1071,7 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
return mem_cgroup_from_css(task_css(p, mem_cgroup_subsys_id));
}
-struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
+struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
{
struct mem_cgroup *memcg = NULL;
@@ -1079,7 +1079,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
do {
memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
if (unlikely(!memcg))
- break;
+ memcg = root_mem_cgroup;
} while (!css_tryget(&memcg->css));
rcu_read_unlock();
return memcg;
@@ -1475,7 +1475,7 @@ bool task_in_mem_cgroup(struct task_struct *task,
p = find_lock_task_mm(task);
if (p) {
- curr = try_get_mem_cgroup_from_mm(p->mm);
+ curr = get_mem_cgroup_from_mm(p->mm);
task_unlock(p);
} else {
/*
@@ -1489,8 +1489,6 @@ bool task_in_mem_cgroup(struct task_struct *task,
css_get(&curr->css);
rcu_read_unlock();
}
- if (!curr)
- return false;
/*
* We should check use_hierarchy of "memcg" not "curr". Because checking
* use_hierarchy of "curr" here make this function true if hierarchy is
@@ -3649,15 +3647,7 @@ __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order)
if (!current->mm || current->memcg_kmem_skip_account)
return true;
- memcg = try_get_mem_cgroup_from_mm(current->mm);
-
- /*
- * very rare case described in mem_cgroup_from_task. Unfortunately there
- * isn't much we can do without complicating this too much, and it would
- * be gfp-dependent anyway. Just let it go
- */
- if (unlikely(!memcg))
- return true;
+ memcg = get_mem_cgroup_from_mm(current->mm);
if (!memcg_can_account_kmem(memcg)) {
css_put(&memcg->css);
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 7/8] memcg: do not replicate get_mem_cgroup_from_mm in __mem_cgroup_try_charge
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (5 preceding siblings ...)
2014-02-07 17:04 ` [patch 6/8] memcg: get_mem_cgroup_from_mm() Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-07 17:04 ` [patch 8/8] memcg: sanitize __mem_cgroup_try_charge() call protocol Johannes Weiner
` (2 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
From: Michal Hocko <mhocko@suse.cz>
__mem_cgroup_try_charge duplicates get_mem_cgroup_from_mm for charges
which came without a memcg. The only reason seems to be a tiny
optimization when css_tryget is not called if the charge can be
consumed from the stock. Nevertheless css_tryget is very cheap since
it has been reworked to use per-cpu counting so this optimization
doesn't give us anything these days.
So let's drop the code duplication so that the code is more readable.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 50 ++++++--------------------------------------------
1 file changed, 6 insertions(+), 44 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 37946635655c..9eabe4473649 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2731,52 +2731,14 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
again:
if (*ptr) { /* css should be a valid one */
memcg = *ptr;
- if (mem_cgroup_is_root(memcg))
- goto done;
- if (consume_stock(memcg, nr_pages))
- goto done;
css_get(&memcg->css);
} else {
- struct task_struct *p;
-
- rcu_read_lock();
- p = rcu_dereference(mm->owner);
- /*
- * Because we don't have task_lock(), "p" can exit.
- * In that case, "memcg" can point to root or p can be NULL with
- * race with swapoff. Then, we have small risk of mis-accouning.
- * But such kind of mis-account by race always happens because
- * we don't have cgroup_mutex(). It's overkill and we allo that
- * small race, here.
- * (*) swapoff at el will charge against mm-struct not against
- * task-struct. So, mm->owner can be NULL.
- */
- memcg = mem_cgroup_from_task(p);
- if (!memcg)
- memcg = root_mem_cgroup;
- if (mem_cgroup_is_root(memcg)) {
- rcu_read_unlock();
- goto done;
- }
- if (consume_stock(memcg, nr_pages)) {
- /*
- * It seems dagerous to access memcg without css_get().
- * But considering how consume_stok works, it's not
- * necessary. If consume_stock success, some charges
- * from this memcg are cached on this cpu. So, we
- * don't need to call css_get()/css_tryget() before
- * calling consume_stock().
- */
- rcu_read_unlock();
- goto done;
- }
- /* after here, we may be blocked. we need to get refcnt */
- if (!css_tryget(&memcg->css)) {
- rcu_read_unlock();
- goto again;
- }
- rcu_read_unlock();
+ memcg = get_mem_cgroup_from_mm(mm);
}
+ if (mem_cgroup_is_root(memcg))
+ goto done;
+ if (consume_stock(memcg, nr_pages))
+ goto done;
do {
bool invoke_oom = oom && !nr_oom_retries;
@@ -2812,8 +2774,8 @@ again:
if (batch > nr_pages)
refill_stock(memcg, batch - nr_pages);
- css_put(&memcg->css);
done:
+ css_put(&memcg->css);
*ptr = memcg;
return 0;
nomem:
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 8/8] memcg: sanitize __mem_cgroup_try_charge() call protocol
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (6 preceding siblings ...)
2014-02-07 17:04 ` [patch 7/8] memcg: do not replicate get_mem_cgroup_from_mm in __mem_cgroup_try_charge Johannes Weiner
@ 2014-02-07 17:04 ` Johannes Weiner
2014-02-10 15:28 ` Michal Hocko
2014-02-07 17:07 ` [patch 0/8] memcg: charge path cleanups Michal Hocko
2014-03-10 15:55 ` Michal Hocko
9 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2014-02-07 17:04 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linux-kernel
Some callsites pass a memcg directly, some callsites pass a mm that
first has to be translated to an mm. This makes for a terrible
function interface.
Just push the mm-to-memcg translation into the respective callsites
and always pass a memcg to mem_cgroup_try_charge().
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 135 +++++++++++++++++++++++---------------------------------
1 file changed, 54 insertions(+), 81 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9eabe4473649..0049081b834d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2609,7 +2609,7 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
}
-/* See __mem_cgroup_try_charge() for details */
+/* See mem_cgroup_try_charge() for details */
enum {
CHARGE_OK, /* success */
CHARGE_RETRY, /* need to retry but retry is not bad */
@@ -2682,45 +2682,34 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
return CHARGE_NOMEM;
}
-/*
- * __mem_cgroup_try_charge() does
- * 1. detect memcg to be charged against from passed *mm and *ptr,
- * 2. update res_counter
- * 3. call memory reclaim if necessary.
- *
- * In some special case, if the task is fatal, fatal_signal_pending() or
- * has TIF_MEMDIE, this function returns -EINTR while writing root_mem_cgroup
- * to *ptr. There are two reasons for this. 1: fatal threads should quit as soon
- * as possible without any hazards. 2: all pages should have a valid
- * pc->mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
- * pointer, that is treated as a charge to root_mem_cgroup.
- *
- * So __mem_cgroup_try_charge() will return
- * 0 ... on success, filling *ptr with a valid memcg pointer.
- * -ENOMEM ... charge failure because of resource limits.
- * -EINTR ... if thread is fatal. *ptr is filled with root_mem_cgroup.
+/**
+ * mem_cgroup_try_charge - try charging a memcg
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ * @oom: trigger OOM if reclaim fails
*
- * Unlike the exported interface, an "oom" parameter is added. if oom==true,
- * the oom-killer can be invoked.
+ * Returns 0 if @memcg was charged successfully, -EINTR if the charge
+ * was bypassed to root_mem_cgroup, and -ENOMEM if the charge failed.
*/
-static int __mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t gfp_mask,
- unsigned int nr_pages,
- struct mem_cgroup **ptr,
- bool oom)
+static int mem_cgroup_try_charge(struct mem_cgroup *memcg,
+ gfp_t gfp_mask,
+ unsigned int nr_pages,
+ bool oom)
{
unsigned int batch = max(CHARGE_BATCH, nr_pages);
int nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
- struct mem_cgroup *memcg = NULL;
int ret;
+ if (mem_cgroup_is_root(memcg))
+ goto done;
/*
- * Unlike gloval-vm's OOM-kill, we're not in memory shortage
- * in system level. So, allow to go ahead dying process in addition to
- * MEMDIE process.
+ * Unlike in global OOM situations, memcg is not in a physical
+ * memory shortage. Allow dying and OOM-killed tasks to
+ * bypass the last charges so that they can exit quickly and
+ * free their memory.
*/
- if (unlikely(test_thread_flag(TIF_MEMDIE)
- || fatal_signal_pending(current)))
+ if (unlikely(test_thread_flag(TIF_MEMDIE) ||
+ fatal_signal_pending(current)))
goto bypass;
if (unlikely(task_in_memcg_oom(current)))
@@ -2729,14 +2718,6 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
if (gfp_mask & __GFP_NOFAIL)
oom = false;
again:
- if (*ptr) { /* css should be a valid one */
- memcg = *ptr;
- css_get(&memcg->css);
- } else {
- memcg = get_mem_cgroup_from_mm(mm);
- }
- if (mem_cgroup_is_root(memcg))
- goto done;
if (consume_stock(memcg, nr_pages))
goto done;
@@ -2744,10 +2725,8 @@ again:
bool invoke_oom = oom && !nr_oom_retries;
/* If killed, bypass charge */
- if (fatal_signal_pending(current)) {
- css_put(&memcg->css);
+ if (fatal_signal_pending(current))
goto bypass;
- }
ret = mem_cgroup_do_charge(memcg, gfp_mask, batch,
nr_pages, invoke_oom);
@@ -2756,17 +2735,12 @@ again:
break;
case CHARGE_RETRY: /* not in OOM situation but retry */
batch = nr_pages;
- css_put(&memcg->css);
- memcg = NULL;
goto again;
case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
- css_put(&memcg->css);
goto nomem;
case CHARGE_NOMEM: /* OOM routine works */
- if (!oom || invoke_oom) {
- css_put(&memcg->css);
+ if (!oom || invoke_oom)
goto nomem;
- }
nr_oom_retries--;
break;
}
@@ -2775,16 +2749,11 @@ again:
if (batch > nr_pages)
refill_stock(memcg, batch - nr_pages);
done:
- css_put(&memcg->css);
- *ptr = memcg;
return 0;
nomem:
- if (!(gfp_mask & __GFP_NOFAIL)) {
- *ptr = NULL;
+ if (!(gfp_mask & __GFP_NOFAIL))
return -ENOMEM;
- }
bypass:
- *ptr = root_mem_cgroup;
return -EINTR;
}
@@ -2983,20 +2952,17 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v)
static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
{
struct res_counter *fail_res;
- struct mem_cgroup *_memcg;
int ret = 0;
ret = res_counter_charge(&memcg->kmem, size, &fail_res);
if (ret)
return ret;
- _memcg = memcg;
- ret = __mem_cgroup_try_charge(NULL, gfp, size >> PAGE_SHIFT,
- &_memcg, oom_gfp_allowed(gfp));
-
+ ret = mem_cgroup_try_charge(memcg, gfp, size >> PAGE_SHIFT,
+ oom_gfp_allowed(gfp));
if (ret == -EINTR) {
/*
- * __mem_cgroup_try_charge() chosed to bypass to root due to
+ * mem_cgroup_try_charge() chosed to bypass to root due to
* OOM kill or fatal signal. Since our only options are to
* either fail the allocation or charge it to this cgroup, do
* it as a temporary condition. But we can't fail. From a
@@ -3006,7 +2972,7 @@ static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
*
* This condition will only trigger if the task entered
* memcg_charge_kmem in a sane state, but was OOM-killed during
- * __mem_cgroup_try_charge() above. Tasks that were already
+ * mem_cgroup_try_charge() above. Tasks that were already
* dying when the allocation triggers should have been already
* directed to the root cgroup in memcontrol.h
*/
@@ -3864,8 +3830,8 @@ out:
static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask, enum charge_type ctype)
{
- struct mem_cgroup *memcg = NULL;
unsigned int nr_pages = 1;
+ struct mem_cgroup *memcg;
bool oom = true;
int ret;
@@ -3879,8 +3845,14 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
oom = false;
}
- ret = __mem_cgroup_try_charge(mm, gfp_mask, nr_pages, &memcg, oom);
- if (ret == -ENOMEM)
+ memcg = get_mem_cgroup_from_mm(mm);
+ ret = mem_cgroup_try_charge(memcg, gfp_mask, nr_pages, oom);
+ css_put(&memcg->css);
+ if (ret == -EINTR) {
+ memcg = root_mem_cgroup;
+ ret = 0;
+ }
+ if (ret)
return ret;
__mem_cgroup_commit_charge(memcg, page, nr_pages, ctype, false);
return 0;
@@ -3909,7 +3881,7 @@ static int __mem_cgroup_try_charge_swapin(struct mm_struct *mm,
gfp_t mask,
struct mem_cgroup **memcgp)
{
- struct mem_cgroup *memcg;
+ struct mem_cgroup *memcg = NULL;
struct page_cgroup *pc;
int ret;
@@ -3923,21 +3895,17 @@ static int __mem_cgroup_try_charge_swapin(struct mm_struct *mm,
*/
if (PageCgroupUsed(pc))
return 0;
- if (!do_swap_account)
- goto charge_cur_mm;
- memcg = try_get_mem_cgroup_from_page(page);
+ if (do_swap_account)
+ memcg = try_get_mem_cgroup_from_page(page);
if (!memcg)
- goto charge_cur_mm;
- *memcgp = memcg;
- ret = __mem_cgroup_try_charge(NULL, mask, 1, memcgp, true);
+ memcg = get_mem_cgroup_from_mm(mm);
+ ret = mem_cgroup_try_charge(memcg, mask, 1, true);
css_put(&memcg->css);
- if (ret == -EINTR)
- ret = 0;
- return ret;
-charge_cur_mm:
- ret = __mem_cgroup_try_charge(mm, mask, 1, memcgp, true);
- if (ret == -EINTR)
+ if (ret == -EINTR) {
+ memcg = root_mem_cgroup;
ret = 0;
+ }
+ *memcgp = memcg;
return ret;
}
@@ -3954,11 +3922,17 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm, struct page *page,
* there's also a KSM case which does need to charge the page.
*/
if (!PageSwapCache(page)) {
+ struct mem_cgroup *memcg;
int ret;
- ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, memcgp, true);
- if (ret == -EINTR)
+ memcg = get_mem_cgroup_from_mm(mm);
+ ret = mem_cgroup_try_charge(memcg, gfp_mask, 1, true);
+ css_put(&memcg->css);
+ if (ret == -EINTR) {
+ memcg = root_mem_cgroup;
ret = 0;
+ }
+ *memcgp = memcg;
return ret;
}
return __mem_cgroup_try_charge_swapin(mm, page, gfp_mask, memcgp);
@@ -6607,8 +6581,7 @@ one_by_one:
batch_count = PRECHARGE_COUNT_AT_ONCE;
cond_resched();
}
- ret = __mem_cgroup_try_charge(NULL,
- GFP_KERNEL, 1, &memcg, false);
+ ret = mem_cgroup_try_charge(memcg, GFP_KERNEL, 1, false);
if (ret)
/* mem_cgroup_clear_mc() will do uncharge later */
return ret;
--
1.8.5.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [patch 0/8] memcg: charge path cleanups
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (7 preceding siblings ...)
2014-02-07 17:04 ` [patch 8/8] memcg: sanitize __mem_cgroup_try_charge() call protocol Johannes Weiner
@ 2014-02-07 17:07 ` Michal Hocko
2014-03-10 15:55 ` Michal Hocko
9 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-02-07 17:07 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:17, Johannes Weiner wrote:
> Hi Michal,
>
> I took some of your patches and combined them with the charge path
> cleanups I already had and the changes I made after our discussion.
>
> I'm really happy about where this is going:
>
> mm/memcontrol.c | 298 ++++++++++++++++--------------------------------------
> 1 file changed, 87 insertions(+), 211 deletions(-)
The diffstat looks really nice. I will not get to the patches earlier
than on Monday.
> let me know what you think!
>
> Johannes
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 1/8] mm: memcg: remove unnecessary preemption disabling
2014-02-07 17:04 ` [patch 1/8] mm: memcg: remove unnecessary preemption disabling Johannes Weiner
@ 2014-02-10 14:17 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-02-10 14:17 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:18, Johannes Weiner wrote:
> lock_page_cgroup() disables preemption, remove explicit preemption
> disabling for code paths holding this lock.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
It would be better to document the dependency on lock_page_cgroup. But
the patch looks correct.
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 15 ++++-----------
> 1 file changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 53385cd4e6f0..befb3dd9d46c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -921,8 +921,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
> struct page *page,
> bool anon, int nr_pages)
> {
> - preempt_disable();
> -
> /*
> * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
> * counted as CACHE even if it's on ANON LRU.
> @@ -947,8 +945,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
> }
>
> __this_cpu_add(memcg->stat->nr_page_events, nr_pages);
> -
> - preempt_enable();
> }
>
> unsigned long
> @@ -3780,17 +3776,14 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> -static inline
> -void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
> - struct mem_cgroup *to,
> - unsigned int nr_pages,
> - enum mem_cgroup_stat_index idx)
> +static void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
> + struct mem_cgroup *to,
> + unsigned int nr_pages,
> + enum mem_cgroup_stat_index idx)
> {
> /* Update stat data for mem_cgroup */
> - preempt_disable();
> __this_cpu_sub(from->stat->count[idx], nr_pages);
> __this_cpu_add(to->stat->count[idx], nr_pages);
> - preempt_enable();
> }
>
> /**
> --
> 1.8.5.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 2/8] mm: memcg: remove mem_cgroup_move_account_page_stat()
2014-02-07 17:04 ` [patch 2/8] mm: memcg: remove mem_cgroup_move_account_page_stat() Johannes Weiner
@ 2014-02-10 14:19 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-02-10 14:19 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:19, Johannes Weiner wrote:
> It used to disable preemption and run sanity checks but now it's only
> taking a number out of one percpu counter and putting it into another.
> Do this directly in the callsite and save the indirection.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
OK, why not.
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 28 ++++++++++++----------------
> 1 file changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index befb3dd9d46c..639cf58b2643 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3776,16 +3776,6 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> -static void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
> - struct mem_cgroup *to,
> - unsigned int nr_pages,
> - enum mem_cgroup_stat_index idx)
> -{
> - /* Update stat data for mem_cgroup */
> - __this_cpu_sub(from->stat->count[idx], nr_pages);
> - __this_cpu_add(to->stat->count[idx], nr_pages);
> -}
> -
> /**
> * mem_cgroup_move_account - move account of the page
> * @page: the page
> @@ -3831,13 +3821,19 @@ static int mem_cgroup_move_account(struct page *page,
>
> move_lock_mem_cgroup(from, &flags);
>
> - if (!anon && page_mapped(page))
> - mem_cgroup_move_account_page_stat(from, to, nr_pages,
> - MEM_CGROUP_STAT_FILE_MAPPED);
> + if (!anon && page_mapped(page)) {
> + __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
> + nr_pages);
> + __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
> + nr_pages);
> + }
>
> - if (PageWriteback(page))
> - mem_cgroup_move_account_page_stat(from, to, nr_pages,
> - MEM_CGROUP_STAT_WRITEBACK);
> + if (PageWriteback(page)) {
> + __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_WRITEBACK],
> + nr_pages);
> + __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_WRITEBACK],
> + nr_pages);
> + }
>
> mem_cgroup_charge_statistics(from, page, anon, -nr_pages);
>
> --
> 1.8.5.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 3/8] memcg: update comment about charge reparenting on cgroup exit
2014-02-07 17:04 ` [patch 3/8] memcg: update comment about charge reparenting on cgroup exit Johannes Weiner
@ 2014-02-10 14:23 ` Michal Hocko
2014-02-10 20:12 ` Hugh Dickins
0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2014-02-10 14:23 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:20, Johannes Weiner wrote:
> Reparenting memory charges in the css_free() callback was meant as a
> temporary fix for charges that race with offlining, but after some
> follow-up discussion, it turns out that this is really the right place
> to reparent charges because it guarantees none are in-flight.
>
> Make clear that the reparenting in css_offline() is an optimistic
> sweep of established charges because swapout records might hold up
> css_free() indefinitely, but that in fact the css_free() reparenting
> is the properly synchronized one.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
OK, I am still thinking about 2 stage reparenting. LRU drain part called
from css_offline and charge drain from css_free. But this is a
sufficient for now.
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 52 +++++++++++++++-------------------------------------
> 1 file changed, 15 insertions(+), 37 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 639cf58b2643..b8a96c7d1167 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6600,51 +6600,29 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
> kmem_cgroup_css_offline(memcg);
>
> mem_cgroup_invalidate_reclaim_iterators(memcg);
> - mem_cgroup_reparent_charges(memcg);
> mem_cgroup_destroy_all_caches(memcg);
> vmpressure_cleanup(&memcg->vmpressure);
> + /*
> + * Memcg gets css references while charging the res_counter,
> + * so we reparent charges in .css_free() when the references
> + * are gone and we know there are no in-flight charges.
> + *
> + * However, at this time, swapout records also hold css refs
> + * indefinitely beyond offlining, which prevent .css_free()
> + * from being called. But after offlining, css_tryget() is
> + * disabled, which means that all the left-over page cache in
> + * the group would be stuck without being reclaimable. Clear
> + * out all those already established charges optimistically
> + * here, and catch any raced charges in .css_free() later on.
> + */
> + mem_cgroup_reparent_charges(memcg);
> }
>
> static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> - /*
> - * XXX: css_offline() would be where we should reparent all
> - * memory to prepare the cgroup for destruction. However,
> - * memcg does not do css_tryget() and res_counter charging
> - * under the same RCU lock region, which means that charging
> - * could race with offlining. Offlining only happens to
> - * cgroups with no tasks in them but charges can show up
> - * without any tasks from the swapin path when the target
> - * memcg is looked up from the swapout record and not from the
> - * current task as it usually is. A race like this can leak
> - * charges and put pages with stale cgroup pointers into
> - * circulation:
> - *
> - * #0 #1
> - * lookup_swap_cgroup_id()
> - * rcu_read_lock()
> - * mem_cgroup_lookup()
> - * css_tryget()
> - * rcu_read_unlock()
> - * disable css_tryget()
> - * call_rcu()
> - * offline_css()
> - * reparent_charges()
> - * res_counter_charge()
> - * css_put()
> - * css_free()
> - * pc->mem_cgroup = dead memcg
> - * add page to lru
> - *
> - * The bulk of the charges are still moved in offline_css() to
> - * avoid pinning a lot of pages in case a long-term reference
> - * like a swapout record is deferring the css_free() to long
> - * after offlining. But this makes sure we catch any charges
> - * made after offlining:
> - */
> - mem_cgroup_reparent_charges(memcg);
>
> + mem_cgroup_reparent_charges(memcg);
> memcg_destroy_kmem(memcg);
> __mem_cgroup_free(memcg);
> }
> --
> 1.8.5.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 5/8] memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm()
2014-02-07 17:04 ` [patch 5/8] memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm() Johannes Weiner
@ 2014-02-10 14:29 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-02-10 14:29 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:22, Johannes Weiner wrote:
> Users pass either a mm that has been established under task lock, or
> use a verified current->mm, which means the task can't be exiting.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e1d7f33227e4..689fffdee471 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1075,13 +1075,6 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
> {
> struct mem_cgroup *memcg = NULL;
>
> - if (!mm)
> - return NULL;
> - /*
> - * Because we have no locks, mm->owner's may be being moved to other
> - * cgroup. We use css_tryget() here even if this looks
> - * pessimistic (rather than adding locks here).
> - */
> rcu_read_lock();
> do {
> memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> --
> 1.8.5.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 6/8] memcg: get_mem_cgroup_from_mm()
2014-02-07 17:04 ` [patch 6/8] memcg: get_mem_cgroup_from_mm() Johannes Weiner
@ 2014-02-10 14:41 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-02-10 14:41 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:23, Johannes Weiner wrote:
> Instead of returning NULL from try_get_mem_cgroup_from_mm() when the
> mm owner is exiting, just return root_mem_cgroup. This makes sense
> for all callsites and gets rid of some of them having to fallback
> manually.
It makes sense now that css reference counting is basically for free
so we do not have to prevent reference counting on the root.
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/memcontrol.c | 18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 689fffdee471..37946635655c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1071,7 +1071,7 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
> return mem_cgroup_from_css(task_css(p, mem_cgroup_subsys_id));
> }
>
> -struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
> +struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
> {
> struct mem_cgroup *memcg = NULL;
>
> @@ -1079,7 +1079,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
> do {
> memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> if (unlikely(!memcg))
> - break;
> + memcg = root_mem_cgroup;
> } while (!css_tryget(&memcg->css));
> rcu_read_unlock();
> return memcg;
> @@ -1475,7 +1475,7 @@ bool task_in_mem_cgroup(struct task_struct *task,
>
> p = find_lock_task_mm(task);
> if (p) {
> - curr = try_get_mem_cgroup_from_mm(p->mm);
> + curr = get_mem_cgroup_from_mm(p->mm);
> task_unlock(p);
> } else {
> /*
> @@ -1489,8 +1489,6 @@ bool task_in_mem_cgroup(struct task_struct *task,
> css_get(&curr->css);
> rcu_read_unlock();
> }
> - if (!curr)
> - return false;
> /*
> * We should check use_hierarchy of "memcg" not "curr". Because checking
> * use_hierarchy of "curr" here make this function true if hierarchy is
> @@ -3649,15 +3647,7 @@ __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order)
> if (!current->mm || current->memcg_kmem_skip_account)
> return true;
>
> - memcg = try_get_mem_cgroup_from_mm(current->mm);
> -
> - /*
> - * very rare case described in mem_cgroup_from_task. Unfortunately there
> - * isn't much we can do without complicating this too much, and it would
> - * be gfp-dependent anyway. Just let it go
> - */
> - if (unlikely(!memcg))
> - return true;
> + memcg = get_mem_cgroup_from_mm(current->mm);
>
> if (!memcg_can_account_kmem(memcg)) {
> css_put(&memcg->css);
> --
> 1.8.5.3
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 8/8] memcg: sanitize __mem_cgroup_try_charge() call protocol
2014-02-07 17:04 ` [patch 8/8] memcg: sanitize __mem_cgroup_try_charge() call protocol Johannes Weiner
@ 2014-02-10 15:28 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-02-10 15:28 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:25, Johannes Weiner wrote:
> Some callsites pass a memcg directly, some callsites pass a mm that
> first has to be translated to an mm. This makes for a terrible
> function interface.
>
> Just push the mm-to-memcg translation into the respective callsites
> and always pass a memcg to mem_cgroup_try_charge().
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
OK if you really think that repeating the same pattern when charging
against mm is better then I can live with that. It is a trivial amount
of code.
[...]
> @@ -3923,21 +3895,17 @@ static int __mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> */
> if (PageCgroupUsed(pc))
> return 0;
I think that we should set *memcgp in this path rather than rely on
caller to do the right thing. Both current callers do so but it is not
nice. And the detail why we need NULL is quite subtle so a comment
mentioning that __mem_cgroup_commit_charge_swapin has to ignore such a
page would be more than welcome.
[...]
Other than that looks good to me.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 3/8] memcg: update comment about charge reparenting on cgroup exit
2014-02-10 14:23 ` Michal Hocko
@ 2014-02-10 20:12 ` Hugh Dickins
2014-02-11 18:48 ` Johannes Weiner
0 siblings, 1 reply; 20+ messages in thread
From: Hugh Dickins @ 2014-02-10 20:12 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Michal Hocko, linux-mm, linux-kernel
On Mon, 10 Feb 2014, Michal Hocko wrote:
> On Fri 07-02-14 12:04:20, Johannes Weiner wrote:
> > Reparenting memory charges in the css_free() callback was meant as a
> > temporary fix for charges that race with offlining, but after some
> > follow-up discussion, it turns out that this is really the right place
> > to reparent charges because it guarantees none are in-flight.
Perhaps: I'm not as gung-ho for this new orthodoxy as you are.
> >
> > Make clear that the reparenting in css_offline() is an optimistic
> > sweep of established charges because swapout records might hold up
> > css_free() indefinitely, but that in fact the css_free() reparenting
> > is the properly synchronized one.
It worries me that you keep referring to the memsw usage, but
forget the kmem usage, which also delays css_free() indefinitely.
Or am I out-of-date? Seems not, mem_cgroup_reparent_chages() still
waits for memcg->res - memcg->kmem to reach 0, knowing there's not
much certainty that kmem will reach 0 any time soon.
I think you need a plan for what to do with the kmem pinning,
before going much further in reworking the memsw pinning.
Or at the least, please mention it in this patch's comment.
Hugh
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> OK, I am still thinking about 2 stage reparenting. LRU drain part called
> from css_offline and charge drain from css_free. But this is a
> sufficient for now.
>
> Acked-by: Michal Hocko <mhocko@suse.cz>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 3/8] memcg: update comment about charge reparenting on cgroup exit
2014-02-10 20:12 ` Hugh Dickins
@ 2014-02-11 18:48 ` Johannes Weiner
0 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2014-02-11 18:48 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Michal Hocko, linux-mm, linux-kernel
On Mon, Feb 10, 2014 at 12:12:42PM -0800, Hugh Dickins wrote:
> On Mon, 10 Feb 2014, Michal Hocko wrote:
> > On Fri 07-02-14 12:04:20, Johannes Weiner wrote:
> > > Reparenting memory charges in the css_free() callback was meant as a
> > > temporary fix for charges that race with offlining, but after some
> > > follow-up discussion, it turns out that this is really the right place
> > > to reparent charges because it guarantees none are in-flight.
>
> Perhaps: I'm not as gung-ho for this new orthodoxy as you are.
>
> > > Make clear that the reparenting in css_offline() is an optimistic
> > > sweep of established charges because swapout records might hold up
> > > css_free() indefinitely, but that in fact the css_free() reparenting
> > > is the properly synchronized one.
>
> It worries me that you keep referring to the memsw usage, but
> forget the kmem usage, which also delays css_free() indefinitely.
>
> Or am I out-of-date? Seems not, mem_cgroup_reparent_chages() still
> waits for memcg->res - memcg->kmem to reach 0, knowing there's not
> much certainty that kmem will reach 0 any time soon.
>
> I think you need a plan for what to do with the kmem pinning,
> before going much further in reworking the memsw pinning.
>
> Or at the least, please mention it in this patch's comment.
It think the discussion from the other thread bled over into this one
a little bit, this patch was merely about clarifying that .css_free()
reparenting is not the crude hack it was described as.
Yes, I forgot about kmem and it should be mentioned in this patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 0/8] memcg: charge path cleanups
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
` (8 preceding siblings ...)
2014-02-07 17:07 ` [patch 0/8] memcg: charge path cleanups Michal Hocko
@ 2014-03-10 15:55 ` Michal Hocko
9 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2014-03-10 15:55 UTC (permalink / raw)
To: Johannes Weiner; +Cc: linux-mm, linux-kernel
On Fri 07-02-14 12:04:17, Johannes Weiner wrote:
> Hi Michal,
>
> I took some of your patches and combined them with the charge path
> cleanups I already had and the changes I made after our discussion.
>
> I'm really happy about where this is going:
>
> mm/memcontrol.c | 298 ++++++++++++++++--------------------------------------
> 1 file changed, 87 insertions(+), 211 deletions(-)
>
> let me know what you think!
>
> Johannes
What's happened with this series? Are you planning to repost it
Johannes? I know it is late in the cycle but it would be nice to have it
at least in mmotm tree because it shuffles the code a lot.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [patch 0/8] memcg: charge path cleanups
@ 2014-03-12 1:28 Johannes Weiner
0 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2014-03-12 1:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Michal Hocko, linux-mm, cgroups, linux-kernel
Hi Andrew,
here are some cleanups and refactoring efforts of the memcg charge
path for 3.15 from Michal and me.
mm/memcontrol.c | 319 +++++++++++++++++++-----------------------------------
1 file changed, 112 insertions(+), 207 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2014-03-12 1:28 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-07 17:04 [patch 0/8] memcg: charge path cleanups Johannes Weiner
2014-02-07 17:04 ` [patch 1/8] mm: memcg: remove unnecessary preemption disabling Johannes Weiner
2014-02-10 14:17 ` Michal Hocko
2014-02-07 17:04 ` [patch 2/8] mm: memcg: remove mem_cgroup_move_account_page_stat() Johannes Weiner
2014-02-10 14:19 ` Michal Hocko
2014-02-07 17:04 ` [patch 3/8] memcg: update comment about charge reparenting on cgroup exit Johannes Weiner
2014-02-10 14:23 ` Michal Hocko
2014-02-10 20:12 ` Hugh Dickins
2014-02-11 18:48 ` Johannes Weiner
2014-02-07 17:04 ` [patch 4/8] memcg: !memcg && !mm is not allowed for __mem_cgroup_try_charge Johannes Weiner
2014-02-07 17:04 ` [patch 5/8] memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm() Johannes Weiner
2014-02-10 14:29 ` Michal Hocko
2014-02-07 17:04 ` [patch 6/8] memcg: get_mem_cgroup_from_mm() Johannes Weiner
2014-02-10 14:41 ` Michal Hocko
2014-02-07 17:04 ` [patch 7/8] memcg: do not replicate get_mem_cgroup_from_mm in __mem_cgroup_try_charge Johannes Weiner
2014-02-07 17:04 ` [patch 8/8] memcg: sanitize __mem_cgroup_try_charge() call protocol Johannes Weiner
2014-02-10 15:28 ` Michal Hocko
2014-02-07 17:07 ` [patch 0/8] memcg: charge path cleanups Michal Hocko
2014-03-10 15:55 ` Michal Hocko
-- strict thread matches above, loose matches on Subject: below --
2014-03-12 1:28 Johannes Weiner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).