From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [RFC 1/5] memcg: cleanup charge routines
Date: Thu, 30 Jan 2014 12:18:37 -0500 [thread overview]
Message-ID: <20140130171837.GD6963@cmpxchg.org> (raw)
In-Reply-To: <1387295130-19771-2-git-send-email-mhocko@suse.cz>
On Tue, Dec 17, 2013 at 04:45:26PM +0100, Michal Hocko wrote:
> The current core of memcg charging is wild to say the least.
> __mem_cgroup_try_charge which is in the center tries to be too clever
> and it handles two independent cases
> * when the memcg to be charged is known in advance
> * when the given mm_struct is charged
> The resulting callchains are quite complex:
>
> memcg_charge_kmem(mm=NULL, memcg) mem_cgroup_newpage_charge(mm)
> | | _________________________________________ mem_cgroup_cache_charge(current->mm)
> | |/ |
> | ______________________________ mem_cgroup_charge_common(mm, memcg=NULL) |
> |/ /
> | /
> | ____________________________ mem_cgroup_try_charge_swapin(mm, memcg=NULL) /
> |/ | _________________________________________/
> | |/
> | | /* swap accounting */ /* no swap accounting */
> | _____________________________ __mem_cgroup_try_charge_swapin(mm=NULL, memcg) || (mm, memcg=NULL)
> |/
> | ____________________________ mem_cgroup_do_precharge(mm=NULL, memcg)
> |/
> __mem_cgroup_try_charge
> mem_cgroup_do_charge
> res_counter_charge
> mem_cgroup_reclaim
> mem_cgroup_wait_acct_move
> mem_cgroup_oom
>
> This patch splits __mem_cgroup_try_charge into two logical parts.
> mem_cgroup_try_charge_mm which is responsible for charges for the given
> mm_struct and it returns the charged memcg or NULL under OOM while
> mem_cgroup_try_charge_memcg charges a known memcg and returns an error
> code.
>
> The only tricky part which remains is __mem_cgroup_try_charge_swapin
> because it can return 0 if PageCgroupUsed is already set and then we do
> not want to commit the charge. This is done with a magic combination of
> memcg = NULL and ret = 0. So the function preserves its memcgp parameter
> and sets the given memcg to NULL when it sees PageCgroupUsed
> (__mem_cgroup_commit_charge_swapin then ignores such a commit).
>
> Not only the code is easier to follow the change reduces the code size
> too:
> $ size mm/built-in.o.before
> text data bss dec hex filename
> 457463 83162 49824 590449 90271 mm/built-in.o.before
>
> $ size mm/built-in.o.after
> text data bss dec hex filename
> 456794 83162 49824 589780 8ffd4 mm/built-in.o.after
Nice!
> @@ -2655,37 +2655,68 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> }
>
> /*
> - * __mem_cgroup_try_charge() does
> - * 1. detect memcg to be charged against from passed *mm and *ptr,
> - * 2. update res_counter
> - * 3. call memory reclaim if necessary.
> + * __mem_cgroup_try_charge_memcg - core of the memcg charging code. The caller
> + * keeps a css reference to the given memcg. We do not charge root_mem_cgroup.
> + * OOM is triggered only if allowed by the given oom parameter (except for
> + * __GFP_NOFAIL when it is ignored).
> *
> - * In some special case, if the task is fatal, fatal_signal_pending() or
> - * has TIF_MEMDIE, this function returns -EINTR while writing root_mem_cgroup
> - * to *ptr. There are two reasons for this. 1: fatal threads should quit as soon
> - * as possible without any hazards. 2: all pages should have a valid
> - * pc->mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
> - * pointer, that is treated as a charge to root_mem_cgroup.
> - *
> - * So __mem_cgroup_try_charge() will return
> - * 0 ... on success, filling *ptr with a valid memcg pointer.
> - * -ENOMEM ... charge failure because of resource limits.
> - * -EINTR ... if thread is fatal. *ptr is filled with root_mem_cgroup.
> - *
> - * Unlike the exported interface, an "oom" parameter is added. if oom==true,
> - * the oom-killer can be invoked.
> + * Returns 0 on success, -ENOMEM when the given memcg is under OOM and -EINTR
> + * when the charge is bypassed (either when fatal signals are pending or
> + * __GFP_NOFAIL allocation cannot be charged).
> */
> -static int __mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t gfp_mask,
> +static int __mem_cgroup_try_charge_memcg(gfp_t gfp_mask,
> unsigned int nr_pages,
> - struct mem_cgroup **ptr,
> + struct mem_cgroup *memcg,
> bool oom)
Why not keep the __mem_cgroup_try_charge() name? It's shorter and
just as descriptive.
> {
> unsigned int batch = max(CHARGE_BATCH, nr_pages);
> int nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
> - struct mem_cgroup *memcg = NULL;
> int ret;
>
> + VM_BUG_ON(!memcg || memcg == root_mem_cgroup);
> +
> + if (unlikely(task_in_memcg_oom(current)))
> + goto nomem;
> +
> + if (gfp_mask & __GFP_NOFAIL)
> + oom = false;
> +
> + do {
> + bool invoke_oom = oom && !nr_oom_retries;
> +
> + /* If killed, bypass charge */
> + if (fatal_signal_pending(current))
> + goto bypass;
> +
> + ret = mem_cgroup_do_charge(memcg, gfp_mask, batch,
> + nr_pages, invoke_oom);
> + switch (ret) {
> + case CHARGE_RETRY: /* not in OOM situation but retry */
> + batch = nr_pages;
> + break;
> + case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
> + goto nomem;
> + case CHARGE_NOMEM: /* OOM routine works */
> + if (!oom || invoke_oom)
> + goto nomem;
> + nr_oom_retries--;
> + break;
> + }
> + } while (ret != CHARGE_OK);
> +
> + if (batch > nr_pages)
> + refill_stock(memcg, batch - nr_pages);
> +
> + return 0;
> +nomem:
> + if (!(gfp_mask & __GFP_NOFAIL))
> + return -ENOMEM;
> +bypass:
> + return -EINTR;
> +}
> +
> +static bool mem_cgroup_bypass_charge(void)
The name and parameter list suggests this consults some global memory
cgroup state. current_bypass_charge()? I think ultimately we want to
move away from all these mem_cgroup prefixes of static functions in
there, they add nothing of value.
> +{
> /*
> * Unlike gloval-vm's OOM-kill, we're not in memory shortage
> * in system level. So, allow to go ahead dying process in addition to
> @@ -2693,13 +2724,23 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> */
> if (unlikely(test_thread_flag(TIF_MEMDIE)
> || fatal_signal_pending(current)))
> - goto bypass;
> + return true;
>
> - if (unlikely(task_in_memcg_oom(current)))
> - goto nomem;
> + return false;
> +}
>
> - if (gfp_mask & __GFP_NOFAIL)
> - oom = false;
> +/*
> + * Charges and returns memcg associated with the given mm (or root_mem_cgroup
> + * if mm is NULL). Returns NULL if memcg is under OOM.
> + */
> +static struct mem_cgroup *mem_cgroup_try_charge_mm(struct mm_struct *mm,
> + gfp_t gfp_mask,
> + unsigned int nr_pages,
> + bool oom)
We already have a try_get_mem_cgroup_from_mm(). After this series,
this function basically duplicates that and it would be much cleaner
if we only had one try_charge() function and let all the callers use
the appropriate try_get_mem_cgroup_from_wherever() themselves.
If you pull the patch that moves consume_stock() back into
try_charge() up front, I think this cleanup would be more obvious and
the result even better.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-01-30 17:18 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 15:45 [RFC] memcg: some charge path cleanups + css offline vs. charge race fix Michal Hocko
2013-12-17 15:45 ` [RFC 1/5] memcg: cleanup charge routines Michal Hocko
2014-01-30 17:18 ` Johannes Weiner [this message]
2014-02-03 13:20 ` Michal Hocko
2014-02-03 15:51 ` Johannes Weiner
2014-02-03 16:41 ` Michal Hocko
2013-12-17 15:45 ` [RFC 2/5] memcg: move stock charge into __mem_cgroup_try_charge_memcg Michal Hocko
2013-12-17 15:45 ` [RFC 3/5] memcg: mm == NULL is not allowed for mem_cgroup_try_charge_mm Michal Hocko
2013-12-17 15:45 ` [RFC 4/5] memcg: make sure that memcg is not offline when charging Michal Hocko
2014-01-30 17:29 ` Johannes Weiner
2014-02-03 13:33 ` Michal Hocko
2014-02-03 16:18 ` Johannes Weiner
2014-02-03 16:44 ` Michal Hocko
2013-12-17 15:45 ` [RFC 5/5] Revert "mm: memcg: fix race condition between memcg teardown and swapin" Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140130171837.GD6963@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).